Zafir Stojanovski
52a56cbc4f
system prompt for structured output, and parse such outputs
2025-02-12 10:44:42 +01:00
Andreas Koepf
6c5ee5b915
update GALLERY.md & bump version
2025-02-11 23:43:56 +01:00
Andreas Koepf (aider)
59461aaec8
fix: Add validation for size parameter in ABConfig
2025-02-11 23:39:57 +01:00
Andreas Koepf (aider)
38922c7e6e
fix: Add missing random import in test_ab.py
2025-02-11 23:37:49 +01:00
Andreas Koepf (aider)
2e3e01eda0
test: Add comprehensive unit tests for ABDataset
2025-02-11 23:37:40 +01:00
Andreas Koepf
4b7abd0ffd
test: Add test for ABConfig dataset generation
2025-02-11 23:37:38 +01:00
Andreas Köpf
8eee8b3770
Merge pull request #112 from open-thought/rich/ab
Add A::B Challenges
2025-02-11 23:34:55 +01:00
Andreas Köpf
18df4d33e8
Merge branch 'main' into rich/ab
2025-02-11 23:34:48 +01:00
Andreas Köpf
96558ddea5
Merge pull request #115 from zafstojano/env/count-primes
(env): Count primes in an interval
2025-02-11 23:19:03 +01:00
Andreas Köpf
6d6c731ffb
Merge pull request #114 from zafstojano/fix/simplify-rotate-matrix
fix(env): Simplify rotate matrix core function
2025-02-11 23:03:17 +01:00
Andreas Köpf
36e8228ff2
Merge pull request #111 from open-thought/rich/rectanglecount
Add Rectangle Count Dataset
2025-02-11 22:57:44 +01:00
Andreas Köpf
8d917d133d
Merge pull request #110 from open-thought/rich/dice
Adds Dice Probability Dataset
2025-02-11 22:54:02 +01:00
Andreas Köpf
fa2a11ae56
Merge pull request #99 from open-thought/curriculum_basics
Add foundation for auto-curriculum
2025-02-11 22:52:14 +01:00
Zafir Stojanovski
b39184d09e
pool matrix
2025-02-11 22:22:39 +01:00
Rich Jones
cb4baab029
Add A::B Challenges
2025-02-11 18:08:25 +01:00
Rich Jones
16bf151786
clarity
2025-02-11 16:22:53 +01:00
Andreas Köpf
05ec556ede
Merge pull request #108 from rishabhranawat/eval-v2
Eval V1: improve speed using async
2025-02-11 16:07:47 +01:00
Zafir Stojanovski
d647498c43
lint
2025-02-11 14:44:46 +01:00
Zafir Stojanovski
3873c50ac6
count primes
2025-02-11 14:44:38 +01:00
Rich Jones
2fa1ea106d
add rectangle count dataset
2025-02-11 13:56:27 +01:00
Zafir Stojanovski
40cb67f009
Merge branch 'main' of https://github.com/open-thought/reasoning-gym into fix/simplify-rotate-matrix
2025-02-11 13:55:05 +01:00
Zafir Stojanovski
08d39bca81
simplify rotate method
2025-02-11 13:54:54 +01:00
Rich Jones
88efd99b60
lint again
2025-02-11 13:00:12 +01:00
Rich Jones
48c7712007
commit test
2025-02-11 12:59:16 +01:00
Rich Jones
9cd4e825d4
fmt
2025-02-11 12:54:23 +01:00
Rich Jones
93a7a58023
add dice dataset
2025-02-11 12:53:13 +01:00
Andreas Köpf
a174f44361
Merge pull request #109 from joesharratt1229/feat/r1-evals
added r1 evaluation logic
2025-02-11 11:35:46 +01:00
Andreas Koepf
6e22e5d56d
fix typo
2025-02-11 11:03:55 +01:00
joesharratt1229
ecddc3aa9f
corrected small linting err in cognition.yaml
2025-02-11 06:56:04 +00:00
joesharratt1229
9df5f45083
converted answer to string
2025-02-11 06:48:59 +00:00
rishabhranawat
2d57beb517
commit formatting
2025-02-10 22:05:45 -08:00
rishabhranawat
6e3d049fed
[eval-v1] benchmark with 50 samples
2025-02-10 22:05:09 -08:00
rishabhranawat
06cabcfdee
[eval-v1] add a simple readme with some details
2025-02-10 21:57:00 -08:00
rishabhranawat
615c63d2f9
[eval-v1] pre commit formatting
2025-02-10 21:50:22 -08:00
rishabhranawat
88c875c00f
[eval-v1] add timer
2025-02-10 21:48:44 -08:00
rishabhranawat
be3d04e7cb
[eval-v1] async to speed up inference/evaluation
2025-02-10 21:35:46 -08:00
joesharratt1229
a3ea4449d1
added r1 evaluation logic
2025-02-11 03:46:56 +00:00
Andreas Koepf
eb25ab9656
update gallery, lower default config values for PowerFunctionDataset
2025-02-10 22:42:04 +01:00
Andreas Köpf
898dc0754a
Merge pull request #100 from zafstojano/env/matrix-manipulation
Matrix Manipulation Dataset
2025-02-10 22:37:37 +01:00
Zafir Stojanovski
f255831f1c
add more config params
2025-02-10 22:30:36 +01:00
Zafir Stojanovski
3e42d9588e
count bits (#101)
2025-02-10 22:12:50 +01:00
Andreas Koepf
690dc03131
add chain_sum curriculum unit test
2025-02-10 22:09:18 +01:00
Zafir Stojanovski
178895ab1b
Power Function (#102)
* power function dataset + tests
2025-02-10 22:04:58 +01:00
Zafir Stojanovski
696fdf8be7
Merge branch 'main' of https://github.com/open-thought/reasoning-gym into env/matrix-manipulation
2025-02-10 20:40:41 +01:00
Andreas Koepf
357a89fe8c
Add attributes for curriculum
Co-authored-by: EduardDurech <39579228+EduardDurech@users.noreply.github.com>
2025-02-10 18:58:07 +01:00
Adefioye
767c34297f
Add score_answer method to word_ladder (#93)
* Add score_answer method to word_ladder
* add unit test for WordLadderDataset::score_answer()
---------
Co-authored-by: Andreas Koepf <andreas.koepf@provisio.com>
2025-02-10 15:15:23 +01:00
Zafir Stojanovski
111f4c9170
matrix manipulation
2025-02-10 13:51:39 +01:00
Andreas Köpf
3150f9d9aa
Merge pull request #97 from rishabhranawat/eval-v1
[eval-basic] initial scripts for evaluating models on reasoning gym
2025-02-10 11:59:49 +01:00
rishabhranawat
03f87dbc07
[eval-basic] remove large results files, add gitignore, only leave summary
2025-02-09 22:52:10 -08:00
rishabhranawat
2308ed99fb
[eval-basic] run precommit formatting
2025-02-09 22:40:45 -08:00