Andreas Köpf
5d7fbac0ad
Minor question template & score_answer improvements (#261)
...
* math prompt improvements
* ignore brackets in complex_arithmetic results
* improve additional instruction in prompt of polynomial_equations
* more strict tests for score_answer in polynomial_equations
* simplify special reward handling
* fix test_intermediate_integration
* fix sokoban dataset
* add common dataset score_answer consistency test
2025-03-04 21:55:09 +01:00
Rich Jones
0ba6119850
Game of Life partial scoring and rule-clarification (#258)
...
* partial scoring and rule clarification
* better ql scoring
* word seq reverse typos
2025-03-03 22:22:39 +01:00
vncntt
3149edf2c4
fixed problems in knights_knaves (#251)
...
* remove unnecessary variables
* added depth logic
* add depth tests
2025-03-02 08:47:54 +01:00
Andreas Köpf
24828e1889
Remove strip from ProceduralDataset::core score_answer() (#250)
...
* remove strip from ProceduralDataset::core score_answer(), strip in extract answer (optional, default=True)
* test: Move test_extract_answer() from test_dataset.py to test_utils.py
* refactor: Improve decimal reward computation with more flexible comparison
* fix: Implement rounding for format_number when round_if_needed is True
* test: Add test case for compute_decimal_reward with sign and zeros
2025-03-02 08:46:36 +01:00
Zafir Stojanovski
f549909c3d
fix manipulate matrix (#247)
2025-03-01 23:00:29 +01:00
Rich Jones
39f151ad14
more dynamic scoring for jumble (#246)
2025-03-01 18:50:59 +01:00
Zafir Stojanovski
9c581f1be1
Mahjong Puzzle (#241)
...
* mahjong
2025-03-01 16:27:26 +01:00
Andreas Köpf
c98cc5fcd6
Merge pull request #220 from open-thought/rich/cubeinstructions
...
Make Rubik's Cube Output Format More Explicit
2025-02-27 12:16:09 +01:00
Rich Jones
52d6b2efd2
seed test config
2025-02-27 10:44:28 +01:00
Rich Jones
633a1aa1ba
expand more
2025-02-27 10:41:30 +01:00
Andreas Koepf (aider)
941da618d8
feat: Add comprehensive unit tests for parse_string_to_complex() method
2025-02-26 21:44:32 +01:00
Andreas Koepf
f97bf94caa
fix & simplify score_answer() of TsumegoDataset
2025-02-26 19:04:30 +01:00
Andreas Köpf
5e1594da16
Merge pull request #231 from AhmedSaif2/count-primes
...
Fix primes representation in count_primes dataset metadata
2025-02-26 17:49:50 +01:00
Andreas Köpf
e351d302a3
Merge pull request #219 from open-thought/rich/fix_ccc
...
Fix Cube Rotation Scoring
2025-02-26 17:41:18 +01:00
AhmedSaif2
dcdc38b15d
Fix primes representation in count_primes dataset metadata
2025-02-26 14:58:21 +02:00
Rich Jones
229086131a
fix CCC scoring
2025-02-26 12:54:40 +01:00
Andreas Köpf
48f082663a
Fix PoolMatrixConfigs::score_answer(), add unit tests (#215)
2025-02-26 00:43:18 +01:00
vncntt
5f01049607
Add KnightsKnavesDataset (knights_knaves)
...
Adapted code from https://github.com/AlphaPav/mem-kk-logic/blob/main/data_prep/lib_kk.py
---------
Co-authored-by: Andreas Koepf (aider) <andreas.koepf@provisio.com>
2025-02-25 20:15:38 +01:00
Andreas Köpf
d115655f0a
Merge pull request #191 from zafstojano/env/shortest-path
...
feat(env): Shortest Path
2025-02-23 22:28:43 +01:00
Zafir Stojanovski
c5f37d5e9f
predict actual path
2025-02-23 18:24:23 +01:00
Andreas Koepf
469934d9b7
minor arc_1d tweaks
2025-02-23 16:37:40 +01:00
Andreas Koepf
ba56aa0092
add arc_1d size range test
2025-02-23 12:58:51 +01:00
Andreas Koepf
7a45b14a49
fix index out of range in arc_1d dataset (#190)
2025-02-23 12:51:41 +01:00
Zafir Stojanovski
97b3097984
shortest path
2025-02-23 11:25:00 +01:00
Andreas Köpf
c56045b9a7
Merge branch 'main' into feat/emoji-mystery
2025-02-21 20:58:39 +01:00
joesharratt1229
9b9554e489
added tests
2025-02-21 17:58:13 +00:00
Andreas Köpf
1c6359f1f3
Merge pull request #181 from open-thought/rich/bitwise
...
Add Bitwise Arithmetic
2025-02-21 17:27:45 +01:00
Andreas Köpf
2947038557
Merge pull request #182 from zafstojano/env/binary-alternation
...
feat(env): Binary Alternation
2025-02-21 17:27:16 +01:00
Andreas Koepf (aider)
af4d79e947
fix: Handle negative hex number prefix variations in bitwise arithmetic test
2025-02-21 17:23:50 +01:00
Andreas Koepf (aider)
e846c53347
test: Update bitwise arithmetic difficulty levels to [1, 2, 3]
2025-02-21 17:22:36 +01:00
Andreas Koepf (aider)
660f7e6f03
test: Add comprehensive unit tests for BitwiseArithmeticDataset
2025-02-21 17:21:00 +01:00
Andreas Köpf
700aab6114
Merge pull request #180 from Adefioye/list-functions
...
Add induction-based tasks for list functions
2025-02-21 16:20:49 +01:00
Andreas Köpf
802b8c4bed
Merge branch 'main' into fix/prop_logix
2025-02-21 15:38:29 +01:00
Andreas Köpf
a6a5d30f1c
Merge pull request #175 from AhmedSaif2/fix-format
...
Add score_answer function to handle comma-formatted numbers
2025-02-21 15:36:21 +01:00
Andreas Koepf
3e7ff3b084
use native types List->list, Dict->dict, Set->set, Tuple->tuple
2025-02-21 15:15:38 +01:00
AhmedSaif2
5c45e55340
extend format tests to allow questions that end with question marks
2025-02-21 15:50:03 +02:00
Zafir Stojanovski
96464388bb
pre-commit
2025-02-21 13:39:05 +01:00
Zafir Stojanovski
941085e0c5
binary alternation
2025-02-21 13:09:21 +01:00
Rich Jones
17088e9b42
add bitwise arithmetic
2025-02-21 12:02:41 +01:00
abdulhakeem
624594bb1a
Commit more changes
2025-02-21 00:37:29 -06:00
joesharratt1229
ed10c5f9bc
added testing func for prop logic
2025-02-20 23:59:07 +00:00
Andreas Koepf
bedee59616
fix jugs unit test
2025-02-20 23:09:46 +01:00
Andreas Köpf
000e179781
Merge branch 'main' into env/rotten-oranges
2025-02-20 22:51:07 +01:00
Andreas Köpf
a8ce2747c1
Merge pull request #172 from open-thought/rich/jugs
...
Add Water Jug Puzzles
2025-02-20 22:48:12 +01:00
Zafir Stojanovski
0d65bf3668
rotten oranges
2025-02-20 22:33:39 +01:00
Andreas Köpf
994ffa8459
Merge pull request #170 from open-thought/rich/needle
...
Adds Needle in a Haystack problems
2025-02-20 22:12:47 +01:00
Rich Jones
0f798457ed
jugs jugs jugs lint
2025-02-20 16:15:29 +01:00
Rich Jones
6f00690ae1
basic jugs
2025-02-20 15:24:46 +01:00
Andreas Köpf
e25973b118
Merge pull request #158 from open-thought/rich/decimalmath
...
Decimal Arithmetic
2025-02-20 12:35:41 +01:00
Rich Jones
621c20d8d8
adds Needle in a Haystack problems
2025-02-20 12:28:30 +01:00