Mirror of https://github.com/NousResearch/atropos.git, synced 2026-04-19 12:57:58 +00:00
[pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
This commit is contained in:
parent: a0979eb08e
commit: 52b505296c

2 changed files with 243 additions and 121 deletions
@@ -626,7 +626,7 @@ Please act as an impartial judge and evaluate the quality of the responses provi
**Evaluation Methodology:**

1. **Model Response Generation**: Generate a response to the Arena-Hard prompt using the configured temperature/tokens
2. **Thinking Validation**: If thinking mode is enabled, validate that there is exactly one `<think></think>` pair and extract the content after the tags
3. **Dual-Round Judging**:
   - Round 1: Judge model response (A) vs GPT-4 baseline (B)
   - Round 2: Judge GPT-4 baseline (A) vs model response (B)
4. **Score Combination**: Average the two judgment scores using Arena-Hard logic
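The dual-round judging and score-combination steps above can be sketched as follows. This is an illustrative assumption of how the flow fits together, not the environment's actual API: `judge_fn`, the verdict labels, and the score table are all hypothetical.

```python
# Hypothetical mapping from judge verdicts to a score for position A.
VERDICT_SCORES = {"A>>B": 1.0, "A>B": 0.5, "A=B": 0.0, "B>A": -0.5, "B>>A": -1.0}


def dual_round_score(judge_fn, prompt, model_answer, baseline_answer):
    """Average two judgments with swapped positions to reduce position bias."""
    # Round 1: model response sits in position A, baseline in position B.
    round1 = VERDICT_SCORES[judge_fn(prompt, model_answer, baseline_answer)]
    # Round 2: positions swapped; negate so a verdict favouring the
    # model (now in position B) still contributes positively.
    round2 = -VERDICT_SCORES[judge_fn(prompt, baseline_answer, model_answer)]
    return (round1 + round2) / 2
```

One consequence of the swap: a judge that always prefers position A regardless of content nets out to a tie (0.0), which is the point of running both orderings.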
@@ -635,7 +635,7 @@ Please act as an impartial judge and evaluate the quality of the responses provi
**Reward Function:**

- **Training**: Scores range from -1.0 to 1.0 based on the combined judgment results
  - 1.0: Model response clearly better than baseline
  - 0.0: Tie between model and baseline
  - -1.0: Baseline clearly better than model response
- **Invalid Thinking**: Automatic 0.0 score for malformed `<think></think>` tags
- **Evaluation**: Converted to Arena-Hard winrate format (0.0 to 1.0)
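A minimal sketch of this reward logic, assuming a simple tag count for thinking validation and a linear map for the winrate conversion; the environment's actual validation and conversion details may differ.

```python
import re


def thinking_is_valid(text: str) -> bool:
    """Exactly one <think></think> pair, opened before it is closed."""
    return (
        len(re.findall(r"<think>", text)) == 1
        and len(re.findall(r"</think>", text)) == 1
        and text.index("<think>") < text.index("</think>")
    )


def training_reward(combined_score: float, text: str, thinking_mode: bool = True) -> float:
    """Combined judgment score in [-1.0, 1.0]; malformed thinking forces 0.0."""
    if thinking_mode and not thinking_is_valid(text):
        return 0.0
    return combined_score


def to_winrate(score: float) -> float:
    # Assumed linear mapping from the [-1.0, 1.0] training score
    # onto the [0.0, 1.0] Arena-Hard winrate scale.
    return (score + 1.0) / 2.0
```

Under this mapping a tie (0.0) becomes a 0.5 winrate, and the invalid-thinking penalty is applied before any judgment score is considered.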