Commit graph

14 commits

Author SHA1 Message Date
Andreas Köpf
b2904ccab9 Minor question template & score_answer improvements (#261)
* math prompt improvements
* ignore brackets in complex_arithmetic results
* improve additional instruction in prompt of polynomial_equations
* more strict tests for score_answer in polynomial_equations
* simplify special reward handling
* fix test_intermediate_integration
* fix sokoban dataset
* add common dataset score_answer consistency test
2025-03-04 21:55:09 +01:00
Andreas Köpf
17f87476a3 add Chain of Draft and direct system prompt styles (#255) 2025-03-03 21:56:31 +01:00
Andreas Köpf
ece6990709 Remove strip from ProceduralDataset::core score_answer() (#250)
* remove strip from ProceduralDataset::core score_answer(), strip in extract answer (optional, default=True)
* test: Move test_extract_answer() from test_dataset.py to test_utils.py
* refactor: Improve decimal reward computation with more flexible comparison
* fix: Implement rounding for format_number when round_if_needed is True
* test: Add test case for compute_decimal_reward with sign and zeros
2025-03-02 08:46:36 +01:00
Andreas Köpf
1ea9a657a7 Eval script consolidation (#238)
The script now supports:
   - YAML and JSON configurations
   - Dataset-specific parameters
   - Overriding configuration via command line
   - Detailed logging and error handling
2025-02-27 17:39:14 +01:00
AhmedSaif2
75cbfb8783 fix parameter name in compute_decimal_reward docstring 2025-02-21 17:01:59 +02:00
Andreas Koepf
476e37e70b use Decimal class for numeric comparison e.g. +0123.100 == 123.1 2025-02-21 15:36:06 +01:00
AhmedSaif2
6b5c7a8637 add a helper function to handle redundant code 2025-02-21 15:54:00 +02:00
Zafir Stojanovski
b5f5733052 update system prompt 2025-02-15 17:41:05 +01:00
Andreas Koepf
4ddd04a825 incorporate prompt changes suggested by Miserlou 2025-02-14 15:44:00 +01:00
Andreas Koepf
2726caf2fe ignore single whitespace at beginning and end of answer, use reward = len(oracle_answer) / len(answer) 2025-02-14 15:40:12 +01:00
Zafir Stojanovski
52a56cbc4f system prompt for structured output, and parse such outputs 2025-02-12 10:44:42 +01:00
Andreas Koepf
3ca9a709e8 gsm_symbolic generator changes 2025-02-05 20:58:01 +01:00
Andreas Koepf
1bc56b8559 extract answer from last answer tag 2025-01-28 16:37:19 +00:00
Andreas Koepf
655de7a7f3 add first example with OpenRLHF 2025-01-28 14:40:06 +00:00
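Several commit messages above name concrete answer-handling rules:

- extracting the answer from the last answer tag (1bc56b8559);
- optional stripping during extraction (ece6990709);
- numeric comparison with Python's `Decimal`, so that `+0123.100 == 123.1` (476e37e70b);
- a length-ratio reward `len(oracle_answer) / len(answer)` that ignores a single whitespace at each end (2726caf2fe).

The Python sketch below illustrates those rules. It is not the repository's implementation. Several details are assumptions:

- the `<answer>` tag name;
- the helper names `extract_last_answer`, `decimals_equal` and `length_ratio_reward`;
- the condition for partial credit (the oracle answer appearing inside a longer answer);
- the 0.0 fallback.

```python
import re
from decimal import Decimal, InvalidOperation
from typing import Optional


def extract_last_answer(text: str, tag: str = "answer", strip: bool = True) -> Optional[str]:
    """Hypothetical helper: return the content of the LAST <tag>...</tag> pair, or None."""
    matches = re.findall(rf"<{tag}>(.*?)</{tag}>", text, flags=re.DOTALL)
    if not matches:
        return None
    answer = matches[-1]
    # Stripping is optional (default True), echoing the note in ece6990709.
    return answer.strip() if strip else answer


def decimals_equal(a: str, b: str) -> bool:
    """Hypothetical helper: numeric equality via Decimal, so "+0123.100" equals "123.1"."""
    try:
        return Decimal(a) == Decimal(b)
    except InvalidOperation:
        # Non-numeric strings are treated as not equal.
        return False


def length_ratio_reward(oracle_answer: str, answer: Optional[str]) -> float:
    """Hypothetical reward: 1.0 for an exact match; len(oracle_answer) / len(answer)
    when the oracle answer is contained in a longer answer (assumed condition); else 0.0."""
    if not answer:
        return 0.0
    # Ignore a single whitespace character at the beginning and at the end.
    if answer[0].isspace():
        answer = answer[1:]
    if answer and answer[-1].isspace():
        answer = answer[:-1]
    if answer == oracle_answer:
        return 1.0
    if oracle_answer and oracle_answer in answer:
        return len(oracle_answer) / len(answer)
    return 0.0


if __name__ == "__main__":
    text = "<answer>7</answer> ... final: <answer> +0123.100 </answer>"
    print(extract_last_answer(text))                # +0123.100
    print(decimals_equal("+0123.100", "123.1"))     # True
    print(length_ratio_reward("42", "x = 42"))      # 0.3333333333333333
```

Comparing via `Decimal` parses each string as an exact decimal value. A leading plus sign, leading zeros and trailing fractional zeros therefore do not affect equality, and binary floating-point rounding is avoided.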