mirror of
https://github.com/NousResearch/atropos.git
synced 2026-04-19 12:57:58 +00:00
Update README.md
parent 320614e294
commit baa6a1feef
1 changed file with 2 additions and 2 deletions
@@ -44,7 +44,7 @@ visualization_dir: "./rubiks_visualizations/"
 
 ## Performance Metrics & Training (150 words)
 
-[View WandB Run Results](https://wandb.ai/team/project/runs/abc123)
+[View WandB Run Results]([https://wandb.ai/team/project/runs/abc123](https://wandb.ai/joshuaxjerin-uc/atropos-environments?nw=nwuserjoshuaxjerin))
 
 Our environment tracks several key metrics:
 
@@ -78,4 +78,4 @@ Our reward function combines:
 3. Move efficiency compared to optimal solve
 4. Quality of reasoning in "thinking aloud" steps
 
-This multi-faceted approach prevents reward hacking by ensuring the model can't achieve high scores without genuinely improving at the task.
+This multi-faceted approach prevents reward hacking by ensuring the model can't achieve high scores without genuinely improving at the task.