Commit graph

784 commits

Author SHA1 Message Date
Andreas Koepf
fa1bf7910a update gallery, pypi release, bump version 2025-03-05 23:45:45 +01:00
joesharratt1229
1893691c57 updated algorithmics dataset (#269)
* updated algorithmic datasets
* added changes to symbolic and power
* updated power function test
2025-03-05 23:32:53 +01:00
Zafir Stojanovski
f843ac1b82 shortest path curriculum (#271) 2025-03-05 22:46:10 +01:00
Zafir Stojanovski
a048084009 largest island curriculum (#270) 2025-03-05 22:45:35 +01:00
Zafir Stojanovski
3d9bb382aa feat(env): Count Bits Curriculum (#267)
* add min n
* count bits
2025-03-05 22:44:04 +01:00
Zafir Stojanovski
84158df1c7 feat(env): Course Schedule Curriculum (#266)
* course schedule curriculum
* update levels
* update comments
* lint
2025-03-05 22:42:46 +01:00
joesharratt1229
2c524c0c6f Added puzzle24 closes #208 (#268)
* added puzzle24
2025-03-05 22:36:37 +01:00
Oliver Stanley
3286a68361 First version of CodeI/O reasoning data (#264)
* notebook for prepping first set of raw code files
* updated codeio processing notebook for repo-level processing
* fix for edge case in codeio scoring
* Add reformat notebook
* filtering pass
* add non-determinism filtering
* Tweak CodeIODataset & include first real data
* add basic codeio test, metadata
2025-03-05 22:34:11 +01:00
joesharratt1229
7458dbc95d Fixed countdown score_answer (#265)
* fixed countdown score ans
* checked solution uses all numbers
2025-03-05 22:30:12 +01:00
Zafir Stojanovski
3c544aba20 feat(env): Mahjong Puzzle Curriculum (#263)
* mahjong curriculum
* typo
* update levels
2025-03-05 22:28:02 +01:00
Zafir Stojanovski
19ca54da72 feat(env): NQueens Curriculum (#262)
* curriculum & tests
2025-03-05 15:05:17 +01:00
Andreas Köpf
b2904ccab9 Minor question template & score_answer improvements (#261)
* math prompt improvements
* ignore brackets in complex_arithmetic results
* improve additional instruction in prompt of polynomial_equations
* more strict tests for score_answer in polynomial_equations
* simplify special reward handling
* fix test_intermediate_integration
* fix sokoban dataset
* add common dataset score_answer consistency test
2025-03-04 21:55:09 +01:00
joesharratt1229
bf24999bb0 implemented family_relationships score ans (#260) 2025-03-04 21:37:57 +01:00
Rich Jones
e3b7365f50 Game of Life partial scoring and rule-clarification (#258)
* partial scoring and rule clarification
* better ql scoring
* word seq reverse typos
2025-03-03 22:22:39 +01:00
Andreas Köpf
07388767a2 Reduce precision from 28 to 6 in DecimalArithmeticDataset (#256) 2025-03-03 21:57:08 +01:00
Andreas Köpf
17f87476a3 add Chain of Draft and direct system prompt styles (#255) 2025-03-03 21:56:31 +01:00
Zafir Stojanovski
2f9d94c1e7 fix: Unify Prompts (#254)
* remove cot
* fix prompt template
* fix pool matrix
* spiral matrix fixed
2025-03-03 21:55:53 +01:00
joesharratt1229
976e1710a6 small change to word sequence reversal prompt (#252)
corrected answer format
2025-03-02 17:34:35 +01:00
vncntt
8992037ecc fixed problems in knights_knaves (#251)
* remove unnecessary variables
* added depth logic
* add depth tests
2025-03-02 08:47:54 +01:00
Andreas Köpf
ece6990709 Remove strip from ProceduralDataset::core score_answer() (#250)
* remove strip from ProceduralDataset::core score_answer(), strip in extract answer (optional, default=True)
* test: Move test_extract_answer() from test_dataset.py to test_utils.py
* refactor: Improve decimal reward computation with more flexible comparison
* fix: Implement rounding for format_number when round_if_needed is True
* test: Add test case for compute_decimal_reward with sign and zeros
2025-03-02 08:46:36 +01:00
Andreas Köpf
1b1c04bb70 feat: Add category property to ProceduralDataset to extract category name (#248) 2025-03-01 23:11:40 +01:00
Zafir Stojanovski
1bc9f6f09f fix manipulate matrix (#247) 2025-03-01 23:00:29 +01:00
Rich Jones
80aafda8e5 more dynamic scoring for jumble (#246) 2025-03-01 18:50:59 +01:00
Zafir Stojanovski
78c92d7056 Mahjong Puzzle (#241)
* mahjong
2025-03-01 16:27:26 +01:00
Andreas Koepf
d83e53115a fix prompt for arc_1d 2025-02-28 08:07:59 +01:00
Andreas Köpf
1ea9a657a7 Eval script consolidation (#238)
The script now supports:
   - YAML and JSON configurations
   - Dataset-specific parameters
   - Overriding configuration via command line
   - Detailed logging and error handling
2025-02-27 17:39:14 +01:00
Andreas Köpf
bd745ae959 Merge pull request #237 from open-thought/rich/richmorevalfixes2
Fix graph color example template
2025-02-27 16:08:23 +01:00
Rich Jones
ca5372dcc1 rm typo 2025-02-27 13:44:33 +01:00
Rich Jones
9a8e398f22 fix graph color example template 2025-02-27 13:43:01 +01:00
Andreas Köpf
ed90fff3fa Merge pull request #220 from open-thought/rich/cubeinstructions
Make Rubiks Cube Output Format More Explicit
2025-02-27 12:16:09 +01:00
Rich Jones
a1b1272e8d sm fixes 2025-02-27 11:54:04 +01:00
Rich Jones
9daaccc208 expand more 2025-02-27 10:41:30 +01:00
Andreas Koepf (aider)
a92dcd4a75 feat: Add comprehensive unit tests for parse_string_to_complex() method 2025-02-26 21:44:32 +01:00
Andreas Koepf
2362b52d24 add markdown triple backticks around tsumego board 2025-02-26 19:39:05 +01:00
Andreas Koepf
2ddcb7c3c7 fix & simplify score_answer() of TsumegoDataset 2025-02-26 19:04:30 +01:00
Andreas Koepf
3bdf531122 bump version, pypi release of 0.1.12 2025-02-26 18:25:16 +01:00
Oliver Stanley
a0d466765a Merge pull request #188 from olliestanley/codeio-sampler
Procedural dataset for generating reasoning problems from CodeI/O-style data
2025-02-26 16:51:45 +00:00
Andreas Köpf
7c4ab296fd Merge pull request #231 from AhmedSaif2/count-primes
Fix primes representation in count_primes dataset metadata
2025-02-26 17:49:50 +01:00
Andreas Köpf
42d42aae89 Merge pull request #219 from open-thought/rich/fix_ccc
Fix Cube Rotation Scoring
2025-02-26 17:41:18 +01:00
AhmedSaif2
e9e36f3a23 Fix primes representation in count_primes dataset metadata 2025-02-26 14:58:21 +02:00
Rich Jones
214e9d4957 support expanded notation anyway 2025-02-26 13:17:03 +01:00
Rich Jones
b252937f99 rubiks cube instructions 2025-02-26 13:07:17 +01:00
Rich Jones
f2479fcacc fix CCC scoring 2025-02-26 12:54:40 +01:00
Oliver
8f05e6108c Fix 2025-02-26 11:17:23 +00:00
Andreas Köpf
6b923d5ea0 Fix PoolMatrixConfigs::score_answer(), add unit tests (#215) 2025-02-26 00:43:18 +01:00
Andreas Koepf
ba6bdb7d6b fix score_answer of pool_matrix (if -> elif), remove print 2025-02-25 23:43:29 +01:00
Andreas Koepf
969ec6a208 add try-except to GraphColorDataset.score_answer() 2025-02-25 23:43:29 +01:00
Andreas Koepf
d1f2f30d8a add None/empty check to score_answer of cryptarithm 2025-02-25 23:43:29 +01:00
Oliver
58caf1fbea Merge branch 'main' into codeio-sampler 2025-02-25 22:41:47 +00:00
Oliver
4bdb8c7d6b Add note on code execution to CodeIODataset 2025-02-25 22:39:06 +00:00