Andreas Köpf
b2904ccab9
Minor question template & score_answer improvements (#261)
* math prompt improvements
* ignore brackets in complex_arithmetic results
* improve additional instruction in prompt of polynomial_equations
* more strict tests for score_answer in polynomial_equations
* simplify special reward handling
* fix test_intermediate_integration
* fix sokoban dataset
* add common dataset score_answer consistency test
2025-03-04 21:55:09 +01:00
Andreas Köpf
28dc0932c4
Merge pull request #178 from olliestanley/feature/unsloth-train
Add minimal working GRPO training example with Unsloth
2025-02-21 15:37:24 +01:00
Andreas Koepf
ff5b210106
use native types List->list, Dict->dict, Set->set, Tuple->tuple
2025-02-21 15:15:38 +01:00
Oliver
31941d09e6
Answer scoring fixes to address edge cases
2025-02-20 22:04:01 +00:00
Oliver
47321936a5
Add docstring
2025-02-18 21:38:46 +00:00
Oliver
47b4f29c6a
Remove now redundant is_valid function
2025-02-18 21:37:37 +00:00
Oliver
43ccddf1ac
Remove comment
2025-02-18 21:32:15 +00:00
Oliver
368d13d470
Optimise Sudoku uniqueness checks
2025-02-18 21:30:59 +00:00
Oliver
c1d2e555ee
Fix Sudoku generator uniqueness and scoring
2025-02-18 21:02:49 +00:00
Andreas Koepf
519e411fa5
add reasoning_gym.create_dataset({name}, ...) global factory function
2025-01-25 00:58:34 +01:00
Andreas Koepf
aaabc05ace
formatting
2025-01-24 10:34:07 +01:00
Andreas Koepf (aider)
87d1db2a1a
feat: Add Sudoku puzzle generator with configurable difficulty
2025-01-23 22:55:09 +01:00