Commit graph

780 commits

Author SHA1 Message Date
Zafir Stojanovski
9bb6d028a3
feat(env): Count Bits Curriculum (#267)
* add min n
* count bits
2025-03-05 22:44:04 +01:00
Zafir Stojanovski
8ccc4d7b0c
feat(env): Course Schedule Curriculum (#266)
* course schedule curriculum
* update levels
* update comments
* lint
2025-03-05 22:42:46 +01:00
joesharratt1229
f3ee9a91a2
Added puzzle24 closes #208 (#268)
* added puzzle24
2025-03-05 22:36:37 +01:00
Oliver Stanley
d1e505a8e9
First version of CodeI/O reasoning data (#264)
* notebook for prepping first set of raw code files
* updated codeio processing notebook for repo-level processing
* fix for edge case in codeio scoring
* Add reformat notebook
* filtering pass
* add non-determinism filtering
* Tweak CodeIODataset & include first real data
* add basic codeio test, metadata
2025-03-05 22:34:11 +01:00
joesharratt1229
e30be066ec
Fixed countdown score_answer (#265)
* fixed countdown score ans
* checked solution uses all numbers
2025-03-05 22:30:12 +01:00
Zafir Stojanovski
d0a42116fb
feat(env): Mahjong Puzzle Curriculum (#263)
* mahjong curriculum
* typo
* update levels
2025-03-05 22:28:02 +01:00
Zafir Stojanovski
8ecc723607
feat(env): NQueens Curriculum (#262)
* curriculum & tests
2025-03-05 15:05:17 +01:00
Andreas Köpf
5d7fbac0ad
Minor question template & score_answer improvements (#261)
* math prompt improvements
* ignore brackets in complex_arithmetic results
* improve additional instruction in prompt of polynomial_equations
* more strict tests for score_answer in polynomial_equations
* simplify special reward handling
* fix test_intermediate_integration
* fix sokoban dataset
* add common dataset score_answer consistency test
2025-03-04 21:55:09 +01:00
joesharratt1229
061282e373
implemented family_relationships score ans (#260) 2025-03-04 21:37:57 +01:00
Rich Jones
0ba6119850
Game of Life partial scoring and rule-clarification (#258)
* partial scoring and rule clarification
* better ql scoring
* word seq reverse typos
2025-03-03 22:22:39 +01:00
Andreas Köpf
c0cf237474
Reduce precision from 28 to 6 in DecimalArithmeticDataset (#256) 2025-03-03 21:57:08 +01:00
Andreas Köpf
68ecdca2bb
add Chain of Draft and direct system prompt styles (#255) 2025-03-03 21:56:31 +01:00
Zafir Stojanovski
01e1c8f9af
fix: Unify Prompts (#254)
* remove cot
* fix prompt template
* fix pool matrix
* spiral matrix fixed
2025-03-03 21:55:53 +01:00
joesharratt1229
49db4ed761
small change to word sequence reversal prompt (#252)
corrected answer format
2025-03-02 17:34:35 +01:00
vncntt
3149edf2c4
fixed problems in knights_knaves (#251)
* remove unnecessary variables
* added depth logic
* add depth tests
2025-03-02 08:47:54 +01:00
Andreas Köpf
24828e1889
Remove strip from ProceduralDataset::core score_answer() (#250)
* remove strip from ProceduralDataset::core score_answer(), strip in extract answer (optional, default=True)
* test: Move test_extract_answer() from test_dataset.py to test_utils.py
* refactor: Improve decimal reward computation with more flexible comparison
* fix: Implement rounding for format_number when round_if_needed is True
* test: Add test case for compute_decimal_reward with sign and zeros
2025-03-02 08:46:36 +01:00
Andreas Köpf
e71d2a96b6
feat: Add category property to ProceduralDataset to extract category name (#248) 2025-03-01 23:11:40 +01:00
Zafir Stojanovski
f549909c3d
fix manipulate matrix (#247) 2025-03-01 23:00:29 +01:00
Rich Jones
39f151ad14
more dynamic scoring for jumble (#246) 2025-03-01 18:50:59 +01:00
Zafir Stojanovski
9c581f1be1
Mahjong Puzzle (#241)
* mahjong
2025-03-01 16:27:26 +01:00
Andreas Koepf
b1c8840129 fix prompt for arc_1d 2025-02-28 08:07:59 +01:00
Andreas Köpf
850c1cf6f4
Eval script consolidation (#238)
The script now supports:
   - YAML and JSON configurations
   - Dataset-specific parameters
   - Overriding configuration via command line
   - Detailed logging and error handling
2025-02-27 17:39:14 +01:00
Andreas Köpf
8a66d2a216
Merge pull request #237 from open-thought/rich/richmorevalfixes2
Fix graph color example template
2025-02-27 16:08:23 +01:00
Rich Jones
a6c90f40a1 rm typo 2025-02-27 13:44:33 +01:00
Rich Jones
1b95cd3206 fix graph color example template 2025-02-27 13:43:01 +01:00
Andreas Köpf
c98cc5fcd6
Merge pull request #220 from open-thought/rich/cubeinstructions
Make Rubiks Cube Output Format More Explicit
2025-02-27 12:16:09 +01:00
Rich Jones
253e49aecf sm fixes 2025-02-27 11:54:04 +01:00
Rich Jones
633a1aa1ba expand more 2025-02-27 10:41:30 +01:00
Andreas Koepf (aider)
941da618d8 feat: Add comprehensive unit tests for parse_string_to_complex() method 2025-02-26 21:44:32 +01:00
Andreas Koepf
6511725711 add markdown triple backticks around tsumego board 2025-02-26 19:39:05 +01:00
Andreas Koepf
f97bf94caa fix & simplify score_answer() of TsumegoDataset 2025-02-26 19:04:30 +01:00
Andreas Koepf
72233fc2ea bump version, pypi release of 0.1.12 2025-02-26 18:25:16 +01:00
Oliver Stanley
ac4ce13369
Merge pull request #188 from olliestanley/codeio-sampler
Procedural dataset for generating reasoning problems from CodeI/O-style data
2025-02-26 16:51:45 +00:00
Andreas Köpf
5e1594da16
Merge pull request #231 from AhmedSaif2/count-primes
Fix primes representation in count_primes dataset metadata
2025-02-26 17:49:50 +01:00
Andreas Köpf
e351d302a3
Merge pull request #219 from open-thought/rich/fix_ccc
Fix Cube Rotation Scoring
2025-02-26 17:41:18 +01:00
AhmedSaif2
dcdc38b15d Fix primes representation in count_primes dataset metadata 2025-02-26 14:58:21 +02:00
Rich Jones
f0ca949aaf support expanded notation anyway 2025-02-26 13:17:03 +01:00
Rich Jones
285e2b20cc rubiks cube instructions 2025-02-26 13:07:17 +01:00
Rich Jones
229086131a fix CCC scoring 2025-02-26 12:54:40 +01:00
Oliver
5fa06c961f Fix 2025-02-26 11:17:23 +00:00
Andreas Köpf
48f082663a
Fix PoolMatrixConfigs::score_answer(), add unit tests (#215) 2025-02-26 00:43:18 +01:00
Andreas Koepf
bba128ffd0 fix score_answer of pool_matrix (if -> elif), remove print 2025-02-25 23:43:29 +01:00
Andreas Koepf
f9e8f8b064 add try-except to GraphColorDataset.score_answer() 2025-02-25 23:43:29 +01:00
Andreas Koepf
65d17b9850 add None/empty check to score_answer of cryptarithm 2025-02-25 23:43:29 +01:00
Oliver
aa6759c160 Merge branch 'main' into codeio-sampler 2025-02-25 22:41:47 +00:00
Oliver
81c77a495d Add note on code execution to CodeIODataset 2025-02-25 22:39:06 +00:00
Oliver
0252dd905f Move data file & load into memory on first object creation 2025-02-25 22:36:38 +00:00
vncntt
5f01049607
Add KnightsKnavesDataset (knights_knaves)
Adapted code from https://github.com/AlphaPav/mem-kk-logic/blob/main/data_prep/lib_kk.py

---------

Co-authored-by: Andreas Koepf (aider) <andreas.koepf@provisio.com>
2025-02-25 20:15:38 +01:00
Oliver
fe502d5eb2 Register CodeIODataset 2025-02-24 18:28:35 +00:00
Oliver
43daec67ea Initial scoring algo for codeio 2025-02-24 18:27:53 +00:00