reasoning-gym

mirror of https://github.com/open-thought/reasoning-gym.git synced 2026-04-19 12:58:07 +00:00

Author	SHA1	Message	Date
Ritvik Rastogi	49b07130b3	feat: add scoring cascade for reducing false negatives (#526) * feat: add scoring cascade for reducing false negatives in answer verification * style: fix black and isort formatting Run black and isort to satisfy pre-commit checks. Made-with: Cursor * docs: add scoring cascade example to Quickstart section Mention the experimental scoring cascade feature at the end of the Quickstart section with a disclaimer and complete usage examples showing both the dataset method and standalone function. Made-with: Cursor * docs: shorten scoring cascade section in README Trim to a concise standalone example per review feedback. Made-with: Cursor * docs: simplify scoring cascade description in README Made-with: Cursor * update readme --------- Co-authored-by: Zafir Stojanovski <zaf.stojano@gmail.com>	2026-04-17 21:39:15 +02:00
Andreas Köpf	5d7fbac0ad	Minor question template & score_answer improvements (#261) * math prompt improvements * ignore brackets in complex_arithmetic results * improve additional instruction in prompt of polynomial_equations * more strict tests for score_answer in polynomial_equations * simplify special reward handling * fix test_intermediate_integration * fix sokoban dataset * add common dataset score_answer consistency test	2025-03-04 21:55:09 +01:00
Andreas Köpf	24828e1889	Remove strip from ProceduralDataset::core score_answer() (#250) * remove strip from ProceduralDataset::core score_answer(), strip in extract answer (optional, default=True) * test: Move test_extract_answer() from test_dataset.py to test_utils.py * refactor: Improve decimal reward computation with more flexible comparison * fix: Implement rounding for format_number when round_if_needed is True * test: Add test case for compute_decimal_reward with sign and zeros	2025-03-02 08:46:36 +01:00
Andreas Köpf	e71d2a96b6	feat: Add `category` property to `ProceduralDataset` to extract category name (#248)	2025-03-01 23:11:40 +01:00
Andreas Köpf	850c1cf6f4	Eval script consolidation (#238) The script now supports: YAML and JSON configurations; dataset-specific parameters; overriding configuration via command line; detailed logging and error handling	2025-02-27 17:39:14 +01:00
Andreas Koepf	3e7ff3b084	use native types List->list, Dict->dict, Set->set, Tuple->tuple	2025-02-21 15:15:38 +01:00
Zafir Stojanovski	beb509b6f6	strip answer and solution	2025-02-16 15:39:10 +01:00
Andreas Koepf	3f98afe47d	more tolerant parsing of futoshiki answers	2025-02-16 14:23:40 +01:00
Andreas Koepf	0a660a3409	ignore single whitespace at beginning and end of answer, use reward = len(oracle_answer) / len(answer)	2025-02-14 15:40:12 +01:00
Andreas Koepf	5a88cf2529	add simple dataset gallery generation script	2025-01-30 22:30:26 +01:00
Andreas Koepf (aider)	e2d3f4b4e6	feat: Add seed wrapping at 2^32 to prevent unbounded growth	2025-01-30 22:05:14 +01:00
Andreas Koepf (aider)	dc54a7672f	refactor: Use self.dataset.seed directly for chunk seed generation	2025-01-30 22:02:21 +01:00
Andreas Koepf (aider)	66f99be4a3	feat: Add score_answer method to ReseedingDataset	2025-01-30 21:59:50 +01:00
Andreas Koepf (aider)	6d59648264	feat: Add ReseedingDataset wrapper for infinite procedural datasets	2025-01-30 21:56:43 +01:00
Andreas Koepf	c196d622e0	extract answer from last answer tag	2025-01-28 16:37:19 +00:00
Andreas Koepf	cc0312e446	add first example with OpenRLHF	2025-01-28 14:40:06 +00:00
Andreas Koepf	e9549f2a63	pass config to ProceduralDataset base	2025-01-25 00:23:05 +01:00
Andreas Koepf (aider)	2befe97151	feat: Add dataset factory with registration and creation functions	2025-01-25 00:00:22 +01:00
Andreas Koepf	20069b2a7d	formatting	2025-01-24 10:34:07 +01:00
Andreas Koepf (aider)	719d760eea	feat: Add return type annotation to ProceduralDataset.__next__()	2025-01-24 10:16:27 +01:00
Andreas Koepf (aider)	2a6a9655d7	feat: Add Sized and Iterable base classes to ProceduralDataset	2025-01-24 10:14:42 +01:00
Andreas Koepf (aider)	018bdcef6f	feat: Add ProceduralDataset abstract base class for dataset generators	2025-01-24 09:51:04 +01:00

22 commits