Zafir Stojanovski
49b1dbbcce
Fix misleading instruction in shortest_path asking for "length" instead of path (#523)
The prompt asked to "find the length of the shortest path" but the expected
answer is a sequence of directions. This caused models to answer with a number
instead of directions, degrading evaluation results.
Closes #522
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-25 13:02:23 +01:00
Zafir Stojanovski
c6663cdb81
fix(training): Prepend <think> token in format reward (#396)
* prepend think token in format reward
* pre commit + fix some default vals
* add checkpoint config
2025-03-28 09:45:17 +01:00
Andreas Koepf
2802066233
remove data/ from main .gitignore
2025-03-07 16:16:40 +01:00
Zafir Stojanovski
5109ed89c9
pre-commit
2025-02-23 13:11:31 +01:00
Zafir Stojanovski
6bbec2ac4e
exploratory notebook
2025-02-22 00:46:33 +01:00
tohskai
847442ef0a
Add PolynomialMultiplicationDataset (#64)
* Add PolynomialMultiplicationDataset
2025-02-07 14:06:41 +01:00
abdulhakeem
715102c277
Remove .DS_Store
2025-02-01 20:39:37 -06:00
Rich Jones
99bf648989
initial bf working, contrib not committed
2025-01-30 15:38:03 +01:00
Andreas Koepf (aider)
3f80fd7b80
build: Initialize reasoning_gym package structure with packaging and development setup
2025-01-23 10:50:54 +01:00
Andreas Koepf
530cb523c8
chore: Add .gitignore with .aider and .env files
2025-01-23 10:50:53 +01:00