Commit graph

708 commits

Author SHA1 Message Date
Zafir Stojanovski
52a56cbc4f system prompt for structured output, and parse such outputs 2025-02-12 10:44:42 +01:00
Andreas Koepf
6c5ee5b915 update GALLERY.md & bump version 2025-02-11 23:43:56 +01:00
Andreas Koepf (aider)
59461aaec8 fix: Add validation for size parameter in ABConfig 2025-02-11 23:39:57 +01:00
Andreas Koepf (aider)
38922c7e6e fix: Add missing random import in test_ab.py 2025-02-11 23:37:49 +01:00
Andreas Koepf (aider)
2e3e01eda0 test: Add comprehensive unit tests for ABDataset 2025-02-11 23:37:40 +01:00
Andreas Koepf
4b7abd0ffd test: Add test for ABConfig dataset generation 2025-02-11 23:37:38 +01:00
Andreas Köpf
8eee8b3770 Merge pull request #112 from open-thought/rich/ab: Add A::B Challenges 2025-02-11 23:34:55 +01:00
Andreas Köpf
18df4d33e8 Merge branch 'main' into rich/ab 2025-02-11 23:34:48 +01:00
Andreas Köpf
96558ddea5 Merge pull request #115 from zafstojano/env/count-primes: (env): Count primes in an interval 2025-02-11 23:19:03 +01:00
Andreas Köpf
6d6c731ffb Merge pull request #114 from zafstojano/fix/simplify-rotate-matrix: fix(env): Simplify rotate matrix core function 2025-02-11 23:03:17 +01:00
Andreas Köpf
36e8228ff2 Merge pull request #111 from open-thought/rich/rectanglecount: Add Rectangle Count Dataset 2025-02-11 22:57:44 +01:00
Andreas Köpf
8d917d133d Merge pull request #110 from open-thought/rich/dice: Adds Dice Probability Dataset 2025-02-11 22:54:02 +01:00
Andreas Köpf
fa2a11ae56 Merge pull request #99 from open-thought/curriculum_basics: Add foundation for auto-curriculum 2025-02-11 22:52:14 +01:00
Zafir Stojanovski
b39184d09e pool matrix 2025-02-11 22:22:39 +01:00
Rich Jones
cb4baab029 Add A::B Challenges 2025-02-11 18:08:25 +01:00
Rich Jones
16bf151786 clarity 2025-02-11 16:22:53 +01:00
Andreas Köpf
05ec556ede Merge pull request #108 from rishabhranawat/eval-v2: Eval V1: improve speed using async 2025-02-11 16:07:47 +01:00
Zafir Stojanovski
d647498c43 lint 2025-02-11 14:44:46 +01:00
Zafir Stojanovski
3873c50ac6 count primes 2025-02-11 14:44:38 +01:00
Rich Jones
2fa1ea106d add rectangle count dataset 2025-02-11 13:56:27 +01:00
Zafir Stojanovski
40cb67f009 Merge branch 'main' of https://github.com/open-thought/reasoning-gym into fix/simplify-rotate-matrix 2025-02-11 13:55:05 +01:00
Zafir Stojanovski
08d39bca81 simplify rotate method 2025-02-11 13:54:54 +01:00
Rich Jones
88efd99b60 lint again 2025-02-11 13:00:12 +01:00
Rich Jones
48c7712007 commit test 2025-02-11 12:59:16 +01:00
Rich Jones
9cd4e825d4 fmt 2025-02-11 12:54:23 +01:00
Rich Jones
93a7a58023 add dice dataset 2025-02-11 12:53:13 +01:00
Andreas Köpf
a174f44361 Merge pull request #109 from joesharratt1229/feat/r1-evals: added r1 evaluation logic 2025-02-11 11:35:46 +01:00
Andreas Koepf
6e22e5d56d fix typo 2025-02-11 11:03:55 +01:00
joesharratt1229
ecddc3aa9f corrected small linting err in cognition.yaml 2025-02-11 06:56:04 +00:00
joesharratt1229
9df5f45083 converted answer to string 2025-02-11 06:48:59 +00:00
rishabhranawat
2d57beb517 commit formatting 2025-02-10 22:05:45 -08:00
rishabhranawat
6e3d049fed [eval-v1] benchmark with 50 samples 2025-02-10 22:05:09 -08:00
rishabhranawat
06cabcfdee [eval-v1] add a simple readme with some details 2025-02-10 21:57:00 -08:00
rishabhranawat
615c63d2f9 [eval-v1] pre commit formatting 2025-02-10 21:50:22 -08:00
rishabhranawat
88c875c00f [eval-v1] add timer 2025-02-10 21:48:44 -08:00
rishabhranawat
be3d04e7cb [eval-v1] async to speed up inference/evaluation 2025-02-10 21:35:46 -08:00
joesharratt1229
a3ea4449d1 added r1 evaluation logic 2025-02-11 03:46:56 +00:00
Andreas Koepf
eb25ab9656 update gallery, lower default config values for PowerFunctionDataset 2025-02-10 22:42:04 +01:00
Andreas Köpf
898dc0754a Merge pull request #100 from zafstojano/env/matrix-manipulation: Matrix Manipulation Dataset 2025-02-10 22:37:37 +01:00
Zafir Stojanovski
f255831f1c add more config params 2025-02-10 22:30:36 +01:00
Zafir Stojanovski
3e42d9588e count bits (#101) 2025-02-10 22:12:50 +01:00
Andreas Koepf
690dc03131 add chain_sum curriculum unit test 2025-02-10 22:09:18 +01:00
Zafir Stojanovski
178895ab1b Power Function (#102): power function dataset + tests 2025-02-10 22:04:58 +01:00
Zafir Stojanovski
696fdf8be7 Merge branch 'main' of https://github.com/open-thought/reasoning-gym into env/matrix-manipulation 2025-02-10 20:40:41 +01:00
Andreas Koepf
357a89fe8c Add attributes for curriculum
Co-authored-by: EduardDurech <39579228+EduardDurech@users.noreply.github.com>
2025-02-10 18:58:07 +01:00
Adefioye
767c34297f Add score_answer method to word_ladder (#93)
* Add score_answer method to word_ladder
* add unit test for WordLadderDataset::score_answer()
Co-authored-by: Andreas Koepf <andreas.koepf@provisio.com>
2025-02-10 15:15:23 +01:00
Zafir Stojanovski
111f4c9170 matrix manipulation 2025-02-10 13:51:39 +01:00
Andreas Köpf
3150f9d9aa Merge pull request #97 from rishabhranawat/eval-v1: [eval-basic] initial scripts for evaluating models on reasoning gym 2025-02-10 11:59:49 +01:00
rishabhranawat
03f87dbc07 [eval-basic] remove large results files, add gitignore, only leave summary 2025-02-09 22:52:10 -08:00
rishabhranawat
2308ed99fb [eval-basic] run precommit formatting 2025-02-09 22:40:45 -08:00