Commit graph

1227 commits

Author SHA1 Message Date
Andreas Köpf
bea806fe3c Merge pull request #204 from open-thought/requirements_txt_eval: Add eval/requirements-eval.txt 2025-02-25 15:55:09 +01:00
Andreas Koepf
8291956554 add aiohttp & tenacity deps to requirements-eval.txt 2025-02-25 15:50:11 +01:00
Andreas Koepf (aider)
e48c1f82cd docs: Update installation instructions in eval README 2025-02-25 15:37:09 +01:00
Andreas Koepf (aider)
a1b0a0414e docs: Add dependency installation step to eval README setup instructions 2025-02-25 15:19:38 +01:00
Andreas Koepf
574edb5c5b remove eval results from main repo 2025-02-25 11:02:02 +01:00
Andreas Koepf (aider)
205174c532 docs: Add info about reasoning-gym-eval repository for evaluation results 2025-02-25 10:53:21 +01:00
Zafir Stojanovski
5ed4395613 async 2025-02-24 22:07:35 +01:00
Oliver
fe502d5eb2 Register CodeIODataset 2025-02-24 18:28:35 +00:00
Oliver
43daec67ea Initial scoring algo for codeio 2025-02-24 18:27:53 +00:00
Oliver
1795c8ea7a Add tiny sample dataset & efficient sampling 2025-02-24 17:58:31 +00:00
Zafir Stojanovski
aac7175c69 generate inputs synchronously 2025-02-24 15:58:06 +01:00
Andreas Köpf
a4b767fa0e Merge pull request #197 from open-thought/notice_txt_first_version: docs: Add NOTICE.txt file to project 2025-02-24 15:30:28 +01:00
Andreas Koepf
0bea658c94 docs: Add NOTICE.txt file to project 2025-02-24 12:57:28 +01:00
Andreas Köpf
3c589f99bd Merge pull request #195 from open-thought/fix/eval: pinned provider to nebius fixes #192 2025-02-24 08:34:45 +01:00
joesharratt1229
cffbff935c pinned provider to nebius 2025-02-24 05:01:22 +00:00
Oliver
7b5a12a92c Remove outdated comment 2025-02-23 22:24:13 +00:00
Oliver
e07287e1f9 Add validation 2025-02-23 22:23:45 +00:00
Andreas Koepf
b5f6f7d753 bump version, update gallery 2025-02-23 22:36:39 +01:00
Andreas Köpf
d115655f0a Merge pull request #191 from zafstojano/env/shortest-path: feat(env): Shortest Path 2025-02-23 22:28:43 +01:00
Andreas Koepf
45e452bff6 reduce size of default shortest_path maze grid 2025-02-23 22:27:17 +01:00
Oliver
342902683f Merge branch 'main' into codeio-sampler 2025-02-23 20:28:06 +00:00
Oliver
f787069fd2 Add input prediction 2025-02-23 20:27:27 +00:00
Zafir Stojanovski
c5f37d5e9f predict actual path 2025-02-23 18:24:23 +01:00
Andreas Köpf
eaa8f5253b Merge pull request #194 from open-thought/190_fix_arc_1d_out_of_range: minor arc_1d tweaks 2025-02-23 16:40:09 +01:00
Andreas Koepf
469934d9b7 minor arc_1d tweaks 2025-02-23 16:37:40 +01:00
Andreas Köpf
8e4ed9bae9 Merge pull request #193 from open-thought/190_fix_arc_1d_out_of_range: Fix index out of range for arc_1d dataset 2025-02-23 13:20:08 +01:00
Andreas Koepf
ec3050a4f6 remove unnecessary checks, use tuples 2025-02-23 13:17:48 +01:00
Zafir Stojanovski
5109ed89c9 pre-commit 2025-02-23 13:11:31 +01:00
Andreas Koepf
ba56aa0092 add arc_1d size range test 2025-02-23 12:58:51 +01:00
Andreas Koepf
7a45b14a49 fix index out of range of arc_1d dataset (#190) 2025-02-23 12:51:41 +01:00
Zafir Stojanovski
97b3097984 shortest path 2025-02-23 11:25:00 +01:00
Zafir Stojanovski
96dad6c7f3 sampling code 2025-02-23 00:40:11 +01:00
Andreas Koepf
e4102a44f6 dev minor version one ahead of PyPI released version 2025-02-22 16:54:05 +01:00
Andreas Köpf
7a1e387d6e Merge pull request #176 from olliestanley/codeio-experiments: Experiments with CodeI/O techniques for synthesising reasoning data 2025-02-22 16:24:17 +01:00
Zafir Stojanovski
e04ca72809 greedy coreset sampling 2025-02-22 16:15:14 +01:00
Oliver
e718168428 Draft CodeIO-derived reasoning problems dataset 2025-02-22 00:56:52 +00:00
Oliver
563480329e Outline CodeIO dataset classes 2025-02-22 00:21:17 +00:00
Zafir Stojanovski
6bbec2ac4e exploratory notebook 2025-02-22 00:46:33 +01:00
Oliver
081f84dec6 Add steps to synthesize CoTs with DeepSeekV3 2025-02-21 23:36:19 +00:00
Oliver
cce6002c70 Improve prompt for better LLM adherence 2025-02-21 23:00:48 +00:00
Andreas Koepf
eeb9fa31d5 more native type hints 2025-02-21 21:23:14 +01:00
Andreas Köpf
90a1181285 Merge pull request #185 from joesharratt1229/feat/emoji-mystery: Implements #173 2025-02-21 21:09:26 +01:00
Andreas Koepf
51808210aa add markdown tripple backtick code block for emoji_mystry hint 2025-02-21 21:06:07 +01:00
Andreas Köpf
c56045b9a7 Merge branch 'main' into feat/emoji-mystery 2025-02-21 20:58:39 +01:00
Oliver
cb1f634078 Prompt tweak 2025-02-21 18:34:13 +00:00
joesharratt1229
1fb73011f8 added answer format spec in prompt 2025-02-21 18:03:05 +00:00
joesharratt1229
9b9554e489 added tests 2025-02-21 17:58:13 +00:00
joesharratt1229
5e64d1c24c added emoji dataset 2025-02-21 17:57:41 +00:00
Oliver
a0ccfa5144 Merge branch 'main' into codeio-experiments 2025-02-21 17:25:08 +00:00
Andreas Koepf
97b30f5f53 update GALLERY.md 2025-02-21 17:30:33 +01:00