Commit graph

1035 commits

Author SHA1 Message Date
AhmedSaif2
e9e36f3a23 Fix primes representation in count_primes dataset metadata 2025-02-26 14:58:21 +02:00
Andreas Köpf
c0e5941fe5 Merge pull request #217 from open-thought/feat/o3-mini-eun
added o3 mini yaml rconfiguration
2025-02-26 09:38:11 +01:00
vncntt
98af865309 fix sonnet eval_dir (#216)
* fix eval_dir

* add logging
2025-02-26 09:37:09 +01:00
joesharratt1229
8eaece6f05 added o3 mini yaml 2025-02-26 08:09:12 +00:00
Andreas Köpf
6b923d5ea0 Fix PoolMatrixConfigs::score_answer(), add unit tests (#215) 2025-02-26 00:43:18 +01:00
Andreas Köpf
7317c6f0b4 Merge pull request #212 from open-thought/eval_consolidation_2
Add llama-3.3-70b-instruct algebra, algorithmic eval configs
2025-02-25 23:46:08 +01:00
Andreas Koepf
ba6bdb7d6b fix score_answer of pool_matrix (if -> elif), remove print 2025-02-25 23:43:29 +01:00
Andreas Koepf
969ec6a208 add try-except to GraphColorDataset.score_answer() 2025-02-25 23:43:29 +01:00
Andreas Koepf
d1f2f30d8a add None/empty check to score_answer of cryptarithm 2025-02-25 23:43:29 +01:00
Andreas Koepf
9b7eec2d64 add llama-3.3-70b-instruct algebra, algorithmic eval configs 2025-02-25 23:43:29 +01:00
Andreas Koepf
70b9cc813e fix formatting of NOTICE.txt 2025-02-25 23:43:12 +01:00
Andreas Koepf (aider)
6f17eca9da docs: Add BibTeX citation for Re-ARC dataset in NOTICE.txt 2025-02-25 20:19:11 +01:00
vncntt
465db5c5c7 Add KnightsKnavesDataset (knights_knaves)
Adapted code from https://github.com/AlphaPav/mem-kk-logic/blob/main/data_prep/lib_kk.py

---------

Co-authored-by: Andreas Koepf (aider) <andreas.koepf@provisio.com>
2025-02-25 20:15:38 +01:00
Andreas Köpf
0438efd0c7 Merge pull request #205 from open-thought/consolidate_eval_script
Consolidate eval scripts to have single eval.py
2025-02-25 19:45:05 +01:00
Andreas Koepf
a60cdb0775 use results folder name for eval results 2025-02-25 19:41:21 +01:00
joesharratt1229
3a2de98b1c finalised readme 2025-02-25 18:14:39 +00:00
joesharratt1229
e0e8bab09c Merge remote-tracking branch 'origin/consolidate_eval_script' into fix/eval 2025-02-25 18:10:07 +00:00
joesharratt1229
68e8ea89d8 changed structure 2025-02-25 16:32:42 +00:00
joesharratt1229
ce8877167d updated config and read me 2025-02-25 16:25:16 +00:00
joesharratt1229
2eea347296 updated read me 2025-02-25 15:51:31 +00:00
joesharratt1229
93b95d748b updated read me 2025-02-25 15:46:43 +00:00
Andreas Koepf
11fb7e0edf move r1 configs into r1 yaml/r1 subfolder 2025-02-25 16:24:30 +01:00
Andreas Koepf
7f0047667f consolidate eval scripts to have single eval.py 2025-02-25 16:13:22 +01:00
Andreas Köpf
7bb791b338 Merge pull request #204 from open-thought/requirements_txt_eval
Add eval/requirements-eval.txt
2025-02-25 15:55:09 +01:00
Andreas Koepf
4eb4933647 add aiohttp & tenacity deps to requirements-eval.txt 2025-02-25 15:50:11 +01:00
Andreas Koepf (aider)
795685f30e docs: Update installation instructions in eval README 2025-02-25 15:37:09 +01:00
Andreas Koepf (aider)
bb0d1f0a82 docs: Add dependency installation step to eval README setup instructions 2025-02-25 15:19:38 +01:00
Andreas Koepf
d40da704db remove eval results from main repo 2025-02-25 11:02:02 +01:00
Andreas Koepf (aider)
a073a2792b docs: Add info about reasoning-gym-eval repository for evaluation results 2025-02-25 10:53:21 +01:00
Andreas Köpf
1d25601f15 Merge pull request #197 from open-thought/notice_txt_first_version
docs: Add NOTICE.txt file to project
2025-02-24 15:30:28 +01:00
Andreas Koepf
6fffa4ad27 docs: Add NOTICE.txt file to project 2025-02-24 12:57:28 +01:00
Andreas Köpf
2cfd123ec0 Merge pull request #195 from open-thought/fix/eval
pinned provider to nebius fixes #192
2025-02-24 08:34:45 +01:00
joesharratt1229
1b0f774974 pinned provider to nebius 2025-02-24 05:01:22 +00:00
Andreas Koepf
80eff8acb6 bump version, update gallery 2025-02-23 22:36:39 +01:00
Andreas Köpf
8b0a3e2c95 Merge pull request #191 from zafstojano/env/shortest-path
feat(env): Shortest Path
2025-02-23 22:28:43 +01:00
Andreas Koepf
0b8c4bce0c reduce size of default shortest_path maze grid 2025-02-23 22:27:17 +01:00
Zafir Stojanovski
915a0f1f51 predict actual path 2025-02-23 18:24:23 +01:00
Andreas Köpf
f68d7e533c Merge pull request #194 from open-thought/190_fix_arc_1d_out_of_range
minor arc_1d tweaks
2025-02-23 16:40:09 +01:00
Andreas Koepf
0a487030ec minor arc_1d tweaks 2025-02-23 16:37:40 +01:00
Andreas Köpf
b71a051f6a Merge pull request #193 from open-thought/190_fix_arc_1d_out_of_range
Fix index out of range for arc_1d dataset
2025-02-23 13:20:08 +01:00
Andreas Koepf
696769a3d6 remove unnecessary checks, use tuples 2025-02-23 13:17:48 +01:00
Andreas Koepf
f600c7eb30 add arc_1d size range test 2025-02-23 12:58:51 +01:00
Andreas Koepf
e444bbf7a1 fix index out of range of arc_1d dataset (#190) 2025-02-23 12:51:41 +01:00
Zafir Stojanovski
df914dfb49 shortest path 2025-02-23 11:25:00 +01:00
Andreas Koepf
a1a305c8d7 dev minor version one ahead of PyPI released version 2025-02-22 16:54:05 +01:00
Andreas Köpf
5c73043a1e Merge pull request #176 from olliestanley/codeio-experiments
Experiments with CodeI/O techniques for synthesising reasoning data
2025-02-22 16:24:17 +01:00
Oliver
94cd3c4d43 Add steps to synthesize CoTs with DeepSeekV3 2025-02-21 23:36:19 +00:00
Oliver
3297fc1bc0 Improve prompt for better LLM adherence 2025-02-21 23:00:48 +00:00
Andreas Koepf
74f590e24f more native type hints 2025-02-21 21:23:14 +01:00
Andreas Köpf
ae26704d05 Merge pull request #185 from joesharratt1229/feat/emoji-mystery
Implements #173
2025-02-21 21:09:26 +01:00