diff --git a/environments/hack0/README.md b/environments/hack0/README.md
index 4a111fce..fd6bfca4 100644
--- a/environments/hack0/README.md
+++ b/environments/hack0/README.md
@@ -44,7 +44,7 @@ visualization_dir: "./rubiks_visualizations/"
 
 ## Performance Metrics & Training (150 words)
 
-[View WandB Run Results](https://wandb.ai/team/project/runs/abc123)
+[View WandB Run Results](https://wandb.ai/joshuaxjerin-uc/atropos-environments?nw=nwuserjoshuaxjerin)
 
 Our environment tracks several key metrics:
 
@@ -78,4 +78,4 @@ Our reward function combines:
 3. Move efficiency compared to optimal solve
 4. Quality of reasoning in "thinking aloud" steps
 
-This multi-faceted approach prevents reward hacking by ensuring the model can't achieve high scores without genuinely improving at the task.
\ No newline at end of file
+This multi-faceted approach prevents reward hacking by ensuring the model can't achieve high scores without genuinely improving at the task.