212 commits

Author SHA1 Message Date
Zafir Stojanovski
e8601a63b4 feat(env): Group Anagrams Curriculum (#288)
* group anagrams curriculum
2025-03-08 01:49:12 +01:00
Zafir Stojanovski
07eb434d61 feat(env): Count Primes Curriculum (#287)
* count primes curriculum
2025-03-08 01:48:00 +01:00
Zafir Stojanovski
488b72f6f1 base conversion curriculum (#286) 2025-03-08 01:46:32 +01:00
Zafir Stojanovski
dc657b5ed4 feat(env): Binary Matrix Curriculum (#279)
* binary matrix curriculum

* register BinaryMatrixCurriculum

---------

Co-authored-by: Andreas Koepf <andreas.koepf@provisio.com>
2025-03-07 22:58:47 +01:00
Zafir Stojanovski
dfc28c94d6 feat(env): Binary Alternation Curriculum (#278)
* binary alternation

---------

Co-authored-by: Andreas Koepf <andreas.koepf@provisio.com>
2025-03-07 22:44:32 +01:00
Andreas Koepf
2b1f7ce5ee use relative import for reasoning_gym.data 2025-03-07 15:56:45 +01:00
Rich Jones
11c9790a25 [Env] Game of Life Halting Prediction (#272)
This is a variant of the Game of Life task which, rather than testing algorithmic simulation, tests the model's ability to reason explanatorily about the board. The idea is that a model with good explanatory reasoning will be able to see that a game will not halt without simulating it into the future.

The task presents a GoL board, and the model is asked to predict whether the board will halt (die, all cells zero) after n steps. Sometimes the board is made up of 'oscillators', isolated structures which never die. Other times it is filled with non-oscillators, structures which will always die after a few steps. The model should deduce which case the presented board falls into.
2025-03-07 10:05:12 +01:00
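
For reference, the halting check described in the commit above can be sketched by brute-force simulation. This is only an illustrative sketch, not the code from the pull request: the function names `step` and `halts_within` are hypothetical, the board is assumed to be a list of lists of 0/1, and cells outside the grid are assumed to be dead (the actual environment may wrap around the edges).

```python
def step(board):
    """Advance the board one generation; cells outside the grid count as dead (assumption)."""
    rows, cols = len(board), len(board[0])
    nxt = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Count live cells among the up-to-eight in-bounds neighbours.
            live = sum(
                board[r + dr][c + dc]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr or dc) and 0 <= r + dr < rows and 0 <= c + dc < cols
            )
            # Standard Conway rules: birth on 3, survival on 2 or 3.
            nxt[r][c] = 1 if live == 3 or (board[r][c] and live == 2) else 0
    return nxt


def halts_within(board, n):
    """Return True if all cells are dead at some point within n steps."""
    for _ in range(n):
        if not any(any(row) for row in board):
            return True
        board = step(board)
    return not any(any(row) for row in board)


# A blinker is an oscillator and never dies.
blinker = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]

# Two adjacent cells are a non-oscillator: each has one neighbour, so both die.
pair = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
```

Here `halts_within(blinker, 20)` is false and `halts_within(pair, 20)` is true. The task asks the model to reach the same verdict by recognising the structures rather than by running the simulation.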
joesharratt1229
1893691c57 updated algorithmics dataset (#269)
* updated algorithmic datasets
* added changes to symbolic and power
* updated power function test
2025-03-05 23:32:53 +01:00
Andreas Köpf
b2904ccab9 Minor question template & score_answer improvements (#261)
* math prompt improvements
* ignore brackets in complex_arithmetic results
* improve additional instruction in prompt of polynomial_equations
* more strict tests for score_answer in polynomial_equations
* simplify special reward handling
* fix test_intermediate_integration
* fix sokoban dataset
* add common dataset score_answer consistency test
2025-03-04 21:55:09 +01:00
Rich Jones
e3b7365f50 Game of Life partial scoring and rule-clarification (#258)
* partial scoring and rule clarification
* better ql scoring
* word seq reverse typos
2025-03-03 22:22:39 +01:00
Zafir Stojanovski
2f9d94c1e7 fix: Unify Prompts (#254)
* remove cot
* fix prompt template
* fix pool matrix
* spiral matrix fixed
2025-03-03 21:55:53 +01:00
joesharratt1229
976e1710a6 small change to word sequence reversal prompt (#252)
corrected answer format
2025-03-02 17:34:35 +01:00
Zafir Stojanovski
1bc9f6f09f fix manipulate matrix (#247) 2025-03-01 23:00:29 +01:00
Rich Jones
80aafda8e5 more dynamic scoring for jumble (#246) 2025-03-01 18:50:59 +01:00
Rich Jones
ca5372dcc1 rm typo 2025-02-27 13:44:33 +01:00
Rich Jones
9a8e398f22 fix graph color example template 2025-02-27 13:43:01 +01:00
AhmedSaif2
e9e36f3a23 Fix primes representation in count_primes dataset metadata 2025-02-26 14:58:21 +02:00
Andreas Köpf
6b923d5ea0 Fix PoolMatrixConfigs::score_answer(), add unit tests (#215) 2025-02-26 00:43:18 +01:00
Andreas Koepf
ba6bdb7d6b fix score_answer of pool_matrix (if -> elif), remove print 2025-02-25 23:43:29 +01:00
Andreas Koepf
969ec6a208 add try-except to GraphColorDataset.score_answer() 2025-02-25 23:43:29 +01:00
Andreas Koepf
d1f2f30d8a add None/empty check to score_answer of cryptarithm 2025-02-25 23:43:29 +01:00
Andreas Koepf
74f590e24f more native type hints 2025-02-21 21:23:14 +01:00
Andreas Köpf
de362fb76f Merge pull request #182 from zafstojano/env/binary-alternation
feat(env): Binary Alternation
2025-02-21 17:27:16 +01:00
Andreas Köpf
1e0f67f7a2 Merge pull request #175 from AhmedSaif2/fix-format
Add score_answer function to handle comma-formatted numbers
2025-02-21 15:36:21 +01:00
Andreas Koepf
ff5b210106 use native types List->list, Dict->dict, Set->set, Tuple->tuple 2025-02-21 15:15:38 +01:00
Zafir Stojanovski
a168605fc7 pre-commit 2025-02-21 13:39:05 +01:00
Zafir Stojanovski
6c46b93ae2 binary alternation 2025-02-21 13:09:21 +01:00
Andreas Köpf
07587d1647 Merge branch 'main' into env/rotten-oranges 2025-02-20 22:51:07 +01:00
Andreas Koepf
c1aeacad0b store possible answer in entry 'answer' field 2025-02-20 22:47:21 +01:00
Andreas Koepf (aider)
96bd177b9b docs: Add descriptive comments for num_jugs and difficulty parameters 2025-02-20 22:39:22 +01:00
Andreas Koepf (aider)
d8dac6272c feat: Add type hints to generate_puzzle and min_moves_n functions 2025-02-20 22:38:11 +01:00
Andreas Koepf
5f985d61c5 refactor: Simplify jug puzzle dataset generation and solution verification 2025-02-20 22:38:10 +01:00
Zafir Stojanovski
67617a0a42 remove empty space 2025-02-20 22:35:53 +01:00
Zafir Stojanovski
51ea7778ee rotten oranges 2025-02-20 22:33:39 +01:00
Rich Jones
3dad5f9eca jugs jugs jugs lint 2025-02-20 16:15:29 +01:00
Rich Jones
3eef5841f6 basic jugs 2025-02-20 15:24:46 +01:00
Andreas Köpf
3adf5b6c22 Merge pull request #158 from open-thought/rich/decimalmath
Decimal Arithmetic
2025-02-20 12:35:41 +01:00
Andreas Koepf
67bf9d10cb use correct signature for CryptarithmDataset.score_answer() method 2025-02-20 11:55:32 +01:00
Andreas Köpf
c6e0e5a6a2 Merge pull request #155 from theblackcat102/cryptarithm
Cryptarithm add score_answer function
2025-02-20 11:28:03 +01:00
Rich Jones
988b093800 fix weird GoL fmt 2025-02-20 11:09:29 +01:00
theblackcat102
daa3b309f2 [fix] precommit not happy 2025-02-20 17:00:18 +08:00
theblackcat102
9c955f5b3c [feat] remove answer parsing since its already handled 2025-02-20 16:57:51 +08:00
joesharratt1229
edfe7f19b1 cleaned up caesar cipher 2025-02-19 18:11:59 +00:00
theblackcat102
c95a163ea6 [fix] normalize to <answer></answer> 2025-02-19 08:40:31 +08:00
theblackcat102
822c096e73 [fix] pre-commit fix 2025-02-18 21:48:54 +08:00
theblackcat102
e714d5e395 [feat] add test case 2025-02-18 21:45:51 +08:00
theblackcat102
0e49bd8180 Merge remote-tracking branch 'upstream/main' into cryptarithm 2025-02-18 21:34:42 +08:00
theblackcat102
3975f78343 [feat] added score_answer function 2025-02-18 21:33:14 +08:00
Zafir Stojanovski
6b0ea7a14c lint 2025-02-18 14:24:07 +01:00
Zafir Stojanovski
fd5c47d634 update generation of input string 2025-02-18 14:23:05 +01:00