Commit graph

1254 commits

Author SHA1 Message Date
Andreas Köpf
5b89a3a2d0
Merge pull request #217 from open-thought/feat/o3-mini-eun
added o3 mini yaml configuration
2025-02-26 09:38:11 +01:00
vncntt
29179f783e
fix sonnet eval_dir (#216)
* fix eval_dir

* add logging
2025-02-26 09:37:09 +01:00
joesharratt1229
7d7e44d1af added o3 mini yaml 2025-02-26 08:09:12 +00:00
Andreas Köpf
48f082663a
Fix PoolMatrixConfigs::score_answer(), add unit tests (#215) 2025-02-26 00:43:18 +01:00
Andreas Köpf
99fec3425d
Merge pull request #212 from open-thought/eval_consolidation_2
Add llama-3.3-70b-instruct algebra, algorithmic eval configs
2025-02-25 23:46:08 +01:00
Andreas Koepf
bba128ffd0 fix score_answer of pool_matrix (if -> elif), remove print 2025-02-25 23:43:29 +01:00
Andreas Koepf
f9e8f8b064 add try-except to GraphColorDataset.score_answer() 2025-02-25 23:43:29 +01:00
Andreas Koepf
65d17b9850 add None/empty check to score_answer of cryptarithm 2025-02-25 23:43:29 +01:00
Andreas Koepf
6d5168d1e5 add llama-3.3-70b-instruct algebra, algorithmic eval configs 2025-02-25 23:43:29 +01:00
Andreas Koepf
92c8be1699 fix formatting of NOTICE.txt 2025-02-25 23:43:12 +01:00
Oliver
3028492933 Fix pre-commit 2025-02-25 22:42:06 +00:00
Oliver
aa6759c160 Merge branch 'main' into codeio-sampler 2025-02-25 22:41:47 +00:00
Oliver
81c77a495d Add note on code execution to CodeIODataset 2025-02-25 22:39:06 +00:00
Oliver
0252dd905f Move data file & load into memory on first object creation 2025-02-25 22:36:38 +00:00
Zafir Stojanovski
b47bf882ce filtering 2025-02-25 22:21:26 +01:00
Andreas Koepf (aider)
8ccf077faf docs: Add BibTeX citation for Re-ARC dataset in NOTICE.txt 2025-02-25 20:19:11 +01:00
vncntt
5f01049607
Add KnightsKnavesDataset (knights_knaves)
Adapted code from https://github.com/AlphaPav/mem-kk-logic/blob/main/data_prep/lib_kk.py

---------

Co-authored-by: Andreas Koepf (aider) <andreas.koepf@provisio.com>
2025-02-25 20:15:38 +01:00
Andreas Köpf
ed9292a7f4
Merge pull request #205 from open-thought/consolidate_eval_script
Consolidate eval scripts to have single eval.py
2025-02-25 19:45:05 +01:00
Andreas Koepf
791f16ec0f use results folder name for eval results 2025-02-25 19:41:21 +01:00
joesharratt1229
ffe60ef112 finalised readme 2025-02-25 18:14:39 +00:00
joesharratt1229
56cc111ab3 Merge remote-tracking branch 'origin/consolidate_eval_script' into fix/eval 2025-02-25 18:10:07 +00:00
joesharratt1229
9ac6ea4eb2 changed structure 2025-02-25 16:32:42 +00:00
joesharratt1229
52c3c430b9 updated config and read me 2025-02-25 16:25:16 +00:00
joesharratt1229
7b39f4a3c7 updated read me 2025-02-25 15:51:31 +00:00
joesharratt1229
046c46c0bb updated read me 2025-02-25 15:46:43 +00:00
Andreas Koepf
878f9bbc76 move r1 configs into r1 yaml/r1 subfolder 2025-02-25 16:24:30 +01:00
Andreas Koepf
e7ae82a831 consolidate eval scripts to have single eval.py 2025-02-25 16:13:22 +01:00
Andreas Köpf
bea806fe3c
Merge pull request #204 from open-thought/requirements_txt_eval
Add eval/requirements-eval.txt
2025-02-25 15:55:09 +01:00
Andreas Koepf
8291956554 add aiohttp & tenacity deps to requirements-eval.txt 2025-02-25 15:50:11 +01:00
Andreas Koepf (aider)
e48c1f82cd docs: Update installation instructions in eval README 2025-02-25 15:37:09 +01:00
Andreas Koepf (aider)
a1b0a0414e docs: Add dependency installation step to eval README setup instructions 2025-02-25 15:19:38 +01:00
Andreas Koepf
574edb5c5b remove eval results from main repo 2025-02-25 11:02:02 +01:00
Andreas Koepf (aider)
205174c532 docs: Add info about reasoning-gym-eval repository for evaluation results 2025-02-25 10:53:21 +01:00
Zafir Stojanovski
5ed4395613 async 2025-02-24 22:07:35 +01:00
Oliver
fe502d5eb2 Register CodeIODataset 2025-02-24 18:28:35 +00:00
Oliver
43daec67ea Initial scoring algo for codeio 2025-02-24 18:27:53 +00:00
Oliver
1795c8ea7a Add tiny sample dataset & efficient sampling 2025-02-24 17:58:31 +00:00
Zafir Stojanovski
aac7175c69 generate inputs synchronously 2025-02-24 15:58:06 +01:00
Andreas Köpf
a4b767fa0e
Merge pull request #197 from open-thought/notice_txt_first_version
docs: Add NOTICE.txt file to project
2025-02-24 15:30:28 +01:00
Andreas Koepf
0bea658c94 docs: Add NOTICE.txt file to project 2025-02-24 12:57:28 +01:00
Andreas Köpf
3c589f99bd
Merge pull request #195 from open-thought/fix/eval
pinned provider to nebius fixes #192
2025-02-24 08:34:45 +01:00
joesharratt1229
cffbff935c pinned provider to nebius 2025-02-24 05:01:22 +00:00
Oliver
7b5a12a92c Remove outdated comment 2025-02-23 22:24:13 +00:00
Oliver
e07287e1f9 Add validation 2025-02-23 22:23:45 +00:00
Andreas Koepf
b5f6f7d753 bump version, update gallery 2025-02-23 22:36:39 +01:00
Andreas Köpf
d115655f0a
Merge pull request #191 from zafstojano/env/shortest-path
feat(env): Shortest Path
2025-02-23 22:28:43 +01:00
Andreas Koepf
45e452bff6 reduce size of default shortest_path maze grid 2025-02-23 22:27:17 +01:00
Oliver
342902683f Merge branch 'main' into codeio-sampler 2025-02-23 20:28:06 +00:00
Oliver
f787069fd2 Add input prediction 2025-02-23 20:27:27 +00:00
Zafir Stojanovski
c5f37d5e9f predict actual path 2025-02-23 18:24:23 +01:00