Commit graph

438 commits

Author SHA1 Message Date
Zafir Stojanovski
194f08cad2
pool matrix curriculum (#298) 2025-03-08 20:57:22 +01:00
Zafir Stojanovski
5963cbd59e
rotten oranges curriculum (#297) 2025-03-08 20:56:46 +01:00
Zafir Stojanovski
6270e835bb
spiral matrix curriculum (#296) 2025-03-08 20:56:08 +01:00
Andreas Köpf
6615d8e662
Show curricula (#295)
* feat: Add debug_curricula.py script to generate CURRICULA.md with dataset curriculum details
2025-03-08 14:21:50 +01:00
Zafir Stojanovski
edab0389b6
rotate matrix curriculum (#294) 2025-03-08 01:58:54 +01:00
Zafir Stojanovski
8d4e9030c0
manipulate matrix curriculum (#293) 2025-03-08 01:57:37 +01:00
Zafir Stojanovski
e69ed78c26
feat(env): Isomorphic Strings Curriculum (#292)
* isomorphic strings curriculum

---------

Co-authored-by: Andreas Köpf <andreas.koepf@xamla.com>
2025-03-08 01:56:14 +01:00
joesharratt1229
a7dd5f6680
added power function exponent (#291)
* added power function exponent

* register PowerFunctionCurriculum

---------

Co-authored-by: Andreas Koepf <andreas.koepf@provisio.com>
2025-03-08 01:54:36 +01:00
joesharratt1229
af5a6533c8
added word sort curriculum (#289) 2025-03-08 01:50:13 +01:00
Zafir Stojanovski
2d05a48f9b
feat(env): Group Anagrams Curriculum (#288)
* group anagrams curriculum
2025-03-08 01:49:12 +01:00
Zafir Stojanovski
9fc9cf4597
feat(env): Count Primes Curriculum (#287)
* count primes curriculum
2025-03-08 01:48:00 +01:00
Zafir Stojanovski
adf8cd8f6d
base conversion curriculum (#286) 2025-03-08 01:46:32 +01:00
vncntt
7199363339
dice curriculum (#284)
* curriculum + unit tests
* add difficulty to metadata

---------

Co-authored-by: Andreas Koepf <andreas.koepf@provisio.com>
2025-03-08 01:43:45 +01:00
vncntt
8c80bf6bec
Calendar arithmetic curriculum (#283)
* calendar arithmetic curriculum
* add difficulty to metadata
* register CalendarArithmeticCurriculum

---------

Co-authored-by: Andreas Koepf <andreas.koepf@provisio.com>
2025-03-08 01:38:22 +01:00
vncntt
775a42e9e4
Bitwise arithmetic curriculum (#282)
* bitwise_arithmetic curriculum
* register BitwiseArithmeticCurriculum

---------

Co-authored-by: Andreas Koepf <andreas.koepf@provisio.com>
2025-03-08 01:32:00 +01:00
joesharratt1229
444c793d3f
added Decimal curriculum (#280)
* added decimal curricula

* added chain sum decimal curriculum

* register DecimalArithmeticCurriculum & DecimalChainSumCurriculum

---------

Co-authored-by: Andreas Koepf <andreas.koepf@provisio.com>
2025-03-07 23:02:57 +01:00
Zafir Stojanovski
25b8e35589
feat(env): Binary Matrix Curriculum (#279)
* binary matrix curriculum

* register BinaryMatrixCurriculum

---------

Co-authored-by: Andreas Koepf <andreas.koepf@provisio.com>
2025-03-07 22:58:47 +01:00
joesharratt1229
1888fe2bb4
added basic arith curricula (#276)
* added basic arith curricula
* register BasicArithmeticCurriculum

---------

Co-authored-by: Andreas Koepf <andreas.koepf@provisio.com>
2025-03-07 22:54:49 +01:00
Zafir Stojanovski
a8e920b552
feat(env): Binary Alternation Curriculum (#278)
* binary alternation

---------

Co-authored-by: Andreas Koepf <andreas.koepf@provisio.com>
2025-03-07 22:44:32 +01:00
Zafir Stojanovski
e560cb3c46
feat(env): Leg Counting Curriculum (#275)
* leg counting curriculum

---------

Co-authored-by: Andreas Koepf <andreas.koepf@provisio.com>
2025-03-07 19:15:18 +01:00
Andreas Köpf
c69bc5d4e6
Basic curriculum (#198)
* feat: Add optional curriculum support to dataset registration and creation
* docs: Add docstrings to create_curriculum() and register_dataset()
* feat: Add curriculum configuration classes for CurriculumExperiment
* feat: Add weight parameter to CurriculumAttributeConfig and use in DatasetSpec
* refactor: Simplify CurriculumAttributeConfig with "*" attribute level support
* test: Add unit tests for CurriculumExperiment class
* feat: Add from_yaml() method to CurriculumExperimentConfig with unit test
2025-03-07 11:22:12 +01:00
Rich Jones
cbfdf097a0
Add Modulo Grid Task (#273)
* add modulo_grid dataset
* ensure the pattern is mathematical, not just spatial

---------

Co-authored-by: Andreas Koepf <andreas.koepf@provisio.com>
2025-03-07 11:11:41 +01:00
Rich Jones
07dc01ad87
[Env] Game of Life Halting Prediction (#272)
This is a variant of the Game of Life task, which rather than trying to test the algorithmic simulation, tests the ability of the model to do explanatory reasoning of the board. The idea is that a model with good explanatory reasoning will be able to see that a game will not halt without simulating it into the future.

The task presents a GoL board, and the model is asked to predict if the board will halt (die, all cells zero) after n steps. Sometimes, the board will be made up of 'oscillators', isolated structures which never die. Other times, it is filled with non-oscillators, structures which will always die after a few steps. The model should deduce which case the presented board is.
2025-03-07 10:05:12 +01:00
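The halting check described in the commit message above can be illustrated with a brute-force reference. This is only a minimal sketch, not the code from the commit: the function names `step` and `halts_within` are hypothetical, and it assumes a bounded, non-wrapping grid with the standard Conway rules. It simulates the board directly, which is exactly what the task hopes a model can avoid by recognising oscillators.

```python
from typing import List

Board = List[List[int]]


def step(board: Board) -> Board:
    """Advance a bounded (non-wrapping) Game of Life board by one generation."""
    rows, cols = len(board), len(board[0])
    nxt = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Count live neighbours inside the grid, excluding the cell itself.
            live = sum(
                board[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            )
            # Standard rules: birth on 3, survival on 2 or 3.
            if live == 3 or (board[r][c] == 1 and live == 2):
                nxt[r][c] = 1
    return nxt


def halts_within(board: Board, max_steps: int) -> bool:
    """Return True if every cell is dead after at most max_steps generations."""
    for _ in range(max_steps):
        if not any(any(row) for row in board):
            return True
        board = step(board)
    return not any(any(row) for row in board)


# A blinker is an oscillator: it flips between horizontal and vertical forever.
blinker = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]

# Three cells on a diagonal are a non-oscillator: they die out in two steps.
diagonal = [
    [1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]

print(halts_within(blinker, 20))   # False
print(halts_within(diagonal, 2))   # True
```

A simulator like this is useful for generating ground-truth labels; the task itself asks the model to reach the same verdict by reasoning about the structures on the board.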
joesharratt1229
d9638df79c
updated algorithmics dataset (#269)
* updated algorithmic datasets
* added changes to symbolic and power
* updated power function test
2025-03-05 23:32:53 +01:00
Zafir Stojanovski
f426db90ec
shortest path curriculum (#271) 2025-03-05 22:46:10 +01:00
Zafir Stojanovski
5bac641650
largest island curriculum (#270) 2025-03-05 22:45:35 +01:00
Zafir Stojanovski
9bb6d028a3
feat(env): Count Bits Curriculum (#267)
* add min n

* count bits
2025-03-05 22:44:04 +01:00
Zafir Stojanovski
8ccc4d7b0c
feat(env): Course Schedule Curriculum (#266)
* course schedule curriculum

* update levels

* update comments

* lint
2025-03-05 22:42:46 +01:00
joesharratt1229
f3ee9a91a2
Added puzzle24 closes #208 (#268)
* added puzzle24
2025-03-05 22:36:37 +01:00
Oliver Stanley
d1e505a8e9
First version of CodeI/O reasoning data (#264)
* notebook for prepping first set of raw code files
* updated codeio processing notebook for repo-level processing
* fix for edge case in codeio scoring
* Add reformat notebook
* filtering pass
* add non-determinism filtering
* Tweak CodeIODataset & include first real data
* add basic codeio test, metadata
2025-03-05 22:34:11 +01:00
joesharratt1229
e30be066ec
Fixed countdown score_answer (#265)
* fixed countdown score ans
* checked solution uses all numbers
2025-03-05 22:30:12 +01:00
Zafir Stojanovski
d0a42116fb
feat(env): Mahjong Puzzle Curriculum (#263)
* mahjong curriculum

* typo

* update levels
2025-03-05 22:28:02 +01:00
Zafir Stojanovski
8ecc723607
feat(env): NQueens Curriculum (#262)
* curriculum & tests
2025-03-05 15:05:17 +01:00
Andreas Köpf
5d7fbac0ad
Minor question template & score_answer improvements (#261)
* math prompt improvements
* ignore brackets in complex_arithmetic results
* improve additional instruction in prompt of polynomial_equations
* more strict tests for score_answer in polynomial_equations
* simplify special reward handling
* fix test_intermediate_integration
* fix sokoban dataset
* add common dataset score_answer consistency test
2025-03-04 21:55:09 +01:00
Rich Jones
0ba6119850
Game of Life partial scoring and rule-clarification (#258)
* partial scoring and rule clarification
* better ql scoring
* word seq reverse typos
2025-03-03 22:22:39 +01:00
vncntt
3149edf2c4
fixed problems in knights_knaves (#251)
* remove unnecessary variables

* added depth logic

* add depth tests
2025-03-02 08:47:54 +01:00
Andreas Köpf
24828e1889
Remove strip from ProceduralDataset::core score_answer() (#250)
* remove strip from ProceduralDataset::core score_answer(), strip in extract answer (optional, default=True)
* test: Move test_extract_answer() from test_dataset.py to test_utils.py
* refactor: Improve decimal reward computation with more flexible comparison
* fix: Implement rounding for format_number when round_if_needed is True
* test: Add test case for compute_decimal_reward with sign and zeros
2025-03-02 08:46:36 +01:00
Zafir Stojanovski
f549909c3d
fix manipulate matrix (#247) 2025-03-01 23:00:29 +01:00
Rich Jones
39f151ad14
more dynamic scoring for jumble (#246) 2025-03-01 18:50:59 +01:00
Zafir Stojanovski
9c581f1be1
Mahjong Puzzle (#241)
* mahjong
2025-03-01 16:27:26 +01:00
Andreas Köpf
c98cc5fcd6
Merge pull request #220 from open-thought/rich/cubeinstructions
Make Rubiks Cube Output Format More Explicit
2025-02-27 12:16:09 +01:00
Rich Jones
52d6b2efd2 seed test config 2025-02-27 10:44:28 +01:00
Rich Jones
633a1aa1ba expand more 2025-02-27 10:41:30 +01:00
Andreas Koepf (aider)
941da618d8 feat: Add comprehensive unit tests for parse_string_to_complex() method 2025-02-26 21:44:32 +01:00
Andreas Koepf
f97bf94caa fix & simplify score_answer() of TsumegoDataset 2025-02-26 19:04:30 +01:00
Andreas Köpf
5e1594da16
Merge pull request #231 from AhmedSaif2/count-primes
Fix primes representation in count_primes dataset metadata
2025-02-26 17:49:50 +01:00
Andreas Köpf
e351d302a3
Merge pull request #219 from open-thought/rich/fix_ccc
Fix Cube Rotation Scoring
2025-02-26 17:41:18 +01:00
AhmedSaif2
dcdc38b15d Fix primes representation in count_primes dataset metadata 2025-02-26 14:58:21 +02:00
Rich Jones
229086131a fix CCC scoring 2025-02-26 12:54:40 +01:00
Andreas Köpf
48f082663a
Fix PoolMatrixConfigs::score_answer(), add unit tests (#215) 2025-02-26 00:43:18 +01:00