21 commits

Author SHA1 Message Date
Andreas Köpf
5d7fbac0ad
Minor question template & score_answer improvements (#261)
* math prompt improvements
* ignore brackets in complex_arithmetic results
* improve additional instruction in prompt of polynomial_equations
* more strict tests for score_answer in polynomial_equations
* simplify special reward handling
* fix test_intermediate_integration
* fix sokoban dataset
* add common dataset score_answer consistency test
2025-03-04 21:55:09 +01:00
Andreas Köpf
24828e1889
Remove strip from ProceduralDataset::core score_answer() (#250)
* remove strip from ProceduralDataset::core score_answer(), strip in extract answer (optional, default=True)
* test: Move test_extract_answer() from test_dataset.py to test_utils.py
* refactor: Improve decimal reward computation with more flexible comparison
* fix: Implement rounding for format_number when round_if_needed is True
* test: Add test case for compute_decimal_reward with sign and zeros
2025-03-02 08:46:36 +01:00
Andreas Köpf
e71d2a96b6
feat: Add category property to ProceduralDataset to extract category name (#248) 2025-03-01 23:11:40 +01:00
Andreas Köpf
850c1cf6f4
Eval script consolidation (#238)
The script now supports:
   - YAML and JSON configurations
   - Dataset-specific parameters
   - Overriding configuration via command line
   - Detailed logging and error handling
2025-02-27 17:39:14 +01:00
Andreas Koepf
3e7ff3b084 use native types List->list, Dict->dict, Set->set, Tuple->tuple 2025-02-21 15:15:38 +01:00
Zafir Stojanovski
beb509b6f6 strip answer and solution 2025-02-16 15:39:10 +01:00
Andreas Koepf
3f98afe47d more tolerant parsing of futoshiki answers 2025-02-16 14:23:40 +01:00
Andreas Koepf
0a660a3409 ignore single whitespace at beginning and end of answer, use reward = len(oracle_answer) / len(answer) 2025-02-14 15:40:12 +01:00
Andreas Koepf
5a88cf2529 add simple dataset gallery generation script 2025-01-30 22:30:26 +01:00
Andreas Koepf (aider)
e2d3f4b4e6 feat: Add seed wrapping at 2^32 to prevent unbounded growth 2025-01-30 22:05:14 +01:00
Andreas Koepf (aider)
dc54a7672f refactor: Use self.dataset.seed directly for chunk seed generation 2025-01-30 22:02:21 +01:00
Andreas Koepf (aider)
66f99be4a3 feat: Add score_answer method to ReseedingDataset 2025-01-30 21:59:50 +01:00
Andreas Koepf (aider)
6d59648264 feat: Add ReseedingDataset wrapper for infinite procedural datasets 2025-01-30 21:56:43 +01:00
Andreas Koepf
c196d622e0 extract answer from last answer tag 2025-01-28 16:37:19 +00:00
Andreas Koepf
cc0312e446 add first example with OpenRLHF 2025-01-28 14:40:06 +00:00
Andreas Koepf
e9549f2a63 pass config to ProceduralDataset base 2025-01-25 00:23:05 +01:00
Andreas Koepf (aider)
2befe97151 feat: Add dataset factory with registration and creation functions 2025-01-25 00:00:22 +01:00
Andreas Koepf
20069b2a7d formatting 2025-01-24 10:34:07 +01:00
Andreas Koepf (aider)
719d760eea feat: Add return type annotation to ProceduralDataset.__next__() 2025-01-24 10:16:27 +01:00
Andreas Koepf (aider)
2a6a9655d7 feat: Add Sized and Iterable base classes to ProceduralDataset 2025-01-24 10:14:42 +01:00
Andreas Koepf (aider)
018bdcef6f feat: Add ProceduralDataset abstract base class for dataset generators 2025-01-24 09:51:04 +01:00