Andreas Köpf
b2904ccab9
Minor question template & score_answer improvements (#261)
* math prompt improvements
* ignore brackets in complex_arithmetic results
* improve additional instruction in prompt of polynomial_equations
* more strict tests for score_answer in polynomial_equations
* simplify special reward handling
* fix test_intermediate_integration
* fix sokoban dataset
* add common dataset score_answer consistency test
2025-03-04 21:55:09 +01:00
Andreas Koepf
0a487030ec
minor arc_1d tweaks
2025-02-23 16:37:40 +01:00
Andreas Koepf
f600c7eb30
add arc_1d size range test
2025-02-23 12:58:51 +01:00
Andreas Koepf
e444bbf7a1
fix index out of range of arc_1d dataset (#190)
2025-02-23 12:51:41 +01:00
Andreas Koepf
9a21e6776d
adapt unit tests to partial match changes
2025-02-14 21:30:50 +01:00
Andreas Koepf
2ad0965fdc
move arc_1d from cognition into arc folder
2025-02-08 19:37:26 +01:00
Andreas Koepf
8dc496bc35
add attribution for arc-1d and unit tests
2025-02-02 23:45:25 +01:00
Andreas Koepf (aider)
5cf57500d6
test: Add scoring tests for Arc1D dataset answer evaluation
2025-02-02 23:31:20 +01:00
Andreas Koepf
47fa699745
test: Remove test_arc_1d.py file from tests directory
2025-02-02 23:30:15 +01:00
Andreas Koepf (aider)
2120e4ed1b
feat: Add missing task transformation imports to test_arc_1d.py
2025-02-02 22:42:43 +01:00
Andreas Koepf (aider)
017148d78d
feat: Add task augmentation functions mirror, inverse, and identity to arc_1d.py
2025-02-02 22:42:21 +01:00
Andreas Koepf
604db012c3
change parameter order for basic arc tasks
2025-02-02 17:25:37 +01:00
Andreas Koepf (aider)
dad72bc6d0
fix: Correct argument passing in ARC 1D task test lambda functions
2025-02-02 16:43:25 +01:00
Andreas Koepf (aider)
1da869862a
fix: Update test_arc_1d.py to handle task function argument order
2025-02-02 16:42:46 +01:00
Andreas Koepf (aider)
6e9f0879ac
fix: Remove redundant parameters in ARC 1D task test suite
2025-02-02 16:42:21 +01:00
Andreas Koepf (aider)
6cef2589fe
test: Add comprehensive unittest for arc_1d task functions
2025-02-02 16:40:39 +01:00