Update README.md

Joshua Jerin 2025-05-18 20:50:53 -04:00 committed by GitHub
parent 320614e294
commit baa6a1feef
No known key found for this signature in database
GPG key ID: B5690EEEBB952194

@@ -44,7 +44,7 @@ visualization_dir: "./rubiks_visualizations/"
## Performance Metrics & Training
[View WandB Run Results](https://wandb.ai/joshuaxjerin-uc/atropos-environments?nw=nwuserjoshuaxjerin)
Our environment tracks several key metrics:
@@ -78,4 +78,4 @@ Our reward function combines:
3. Move efficiency compared to optimal solve
4. Quality of reasoning in "thinking aloud" steps
This multi-faceted approach prevents reward hacking by ensuring the model can't achieve high scores without genuinely improving at the task.
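The README does not show what components 1 and 2 are or how the components are weighted, so the following is only a minimal sketch under stated assumptions: the `RewardWeights` class, its 0.5/0.3/0.2 weights, the binary `solved` flag used as a correctness term, and the externally supplied `reasoning_score` are all hypothetical. It scores move efficiency as the ratio of optimal to actual move count and grants efficiency credit only for a finished solve, which illustrates one way a combined reward can keep a short but incomplete move sequence from scoring well.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RewardWeights:
    # Hypothetical weights: the README does not say how components are weighted.
    correctness: float = 0.5
    efficiency: float = 0.3
    reasoning: float = 0.2


def move_efficiency(actual_moves: int, optimal_moves: int) -> float:
    """Score a solve by optimal_moves / actual_moves, clipped to [0, 1].

    1.0 means the solve matched the optimal move count; longer solves score lower.
    """
    if actual_moves <= 0:
        return 0.0
    return min(1.0, optimal_moves / actual_moves)


def combined_reward(
    solved: bool,
    actual_moves: int,
    optimal_moves: int,
    reasoning_score: float,
    weights: RewardWeights = RewardWeights(),
) -> float:
    """Weighted sum of hypothetical reward components, each in [0, 1]."""
    # Hypothetical correctness term: 1.0 if the cube ends up solved.
    correctness = 1.0 if solved else 0.0
    # Efficiency only counts for a finished solve, so a short but incomplete
    # move sequence cannot earn efficiency credit.
    efficiency = move_efficiency(actual_moves, optimal_moves) if solved else 0.0
    # Assumes some external grader produced reasoning_score; clip it to [0, 1].
    reasoning = max(0.0, min(1.0, reasoning_score))
    return (
        weights.correctness * correctness
        + weights.efficiency * efficiency
        + weights.reasoning * reasoning
    )


if __name__ == "__main__":
    # Solved in twice the optimal number of moves, with middling reasoning.
    print(round(combined_reward(True, 40, 20, 0.5), 3))  # 0.75
    # Unsolved 5-move attempt: no efficiency credit despite being short.
    print(round(combined_reward(False, 5, 20, 1.0), 3))  # 0.2
```

Gating efficiency on `solved` is a design choice in this sketch, not something the README specifies; without it, an unfinished 5-move attempt would receive full efficiency credit against a 20-move optimal solve.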