Andreas Koepf
|
4cd5bd42c3
|
verify that OPENROUTER_API_KEY env var is set
|
2025-02-26 22:15:30 +01:00 |
|
Andreas Koepf (aider)
|
a92dcd4a75
|
feat: Add comprehensive unit tests for parse_string_to_complex() method
|
2025-02-26 21:44:32 +01:00 |
|
Andreas Koepf
|
726ba114dc
|
add llama-3.3-70b-instruct eval yaml files
|
2025-02-26 20:54:07 +01:00 |
|
Andreas Koepf
|
2362b52d24
|
add markdown triple backticks around tsumego board
|
2025-02-26 19:39:05 +01:00 |
|
Andreas Köpf
|
95821a72bc
|
Merge pull request #232 from open-thought/211_fix_tsumego_score_answer
Fix & simplify score_answer() of TsumegoDataset
|
2025-02-26 19:07:32 +01:00 |
|
Andreas Koepf
|
2ddcb7c3c7
|
fix & simplify score_answer() of TsumegoDataset
|
2025-02-26 19:04:30 +01:00 |
|
Andreas Koepf
|
3bdf531122
|
bump version, pypi release of 0.1.12
|
2025-02-26 18:25:16 +01:00 |
|
Andreas Koepf
|
3c16f1c195
|
update gallery
|
2025-02-26 18:23:06 +01:00 |
|
Oliver Stanley
|
a0d466765a
|
Merge pull request #188 from olliestanley/codeio-sampler
Procedural dataset for generating reasoning problems from CodeI/O-style data
|
2025-02-26 16:51:45 +00:00 |
|
Andreas Köpf
|
7c4ab296fd
|
Merge pull request #231 from AhmedSaif2/count-primes
Fix primes representation in count_primes dataset metadata
|
2025-02-26 17:49:50 +01:00 |
|
Andreas Köpf
|
42d42aae89
|
Merge pull request #219 from open-thought/rich/fix_ccc
Fix Cube Rotation Scoring
|
2025-02-26 17:41:18 +01:00 |
|
AhmedSaif2
|
e9e36f3a23
|
Fix primes representation in count_primes dataset metadata
|
2025-02-26 14:58:21 +02:00 |
|
Rich Jones
|
f2479fcacc
|
fix CCC scoring
|
2025-02-26 12:54:40 +01:00 |
|
Oliver
|
8f05e6108c
|
Fix
|
2025-02-26 11:17:23 +00:00 |
|
Andreas Köpf
|
c0e5941fe5
|
Merge pull request #217 from open-thought/feat/o3-mini-eun
added o3 mini yaml configuration
|
2025-02-26 09:38:11 +01:00 |
|
vncntt
|
98af865309
|
fix sonnet eval_dir (#216)
* fix eval_dir
* add logging
|
2025-02-26 09:37:09 +01:00 |
|
joesharratt1229
|
8eaece6f05
|
added o3 mini yaml
|
2025-02-26 08:09:12 +00:00 |
|
Andreas Köpf
|
6b923d5ea0
|
Fix PoolMatrixConfigs::score_answer(), add unit tests (#215)
|
2025-02-26 00:43:18 +01:00 |
|
Andreas Köpf
|
7317c6f0b4
|
Merge pull request #212 from open-thought/eval_consolidation_2
Add llama-3.3-70b-instruct algebra, algorithmic eval configs
|
2025-02-25 23:46:08 +01:00 |
|
Andreas Koepf
|
ba6bdb7d6b
|
fix score_answer of pool_matrix (if -> elif), remove print
|
2025-02-25 23:43:29 +01:00 |
|
Andreas Koepf
|
969ec6a208
|
add try-except to GraphColorDataset.score_answer()
|
2025-02-25 23:43:29 +01:00 |
|
Andreas Koepf
|
d1f2f30d8a
|
add None/empty check to score_answer of cryptarithm
|
2025-02-25 23:43:29 +01:00 |
|
Andreas Koepf
|
9b7eec2d64
|
add llama-3.3-70b-instruct algebra, algorithmic eval configs
|
2025-02-25 23:43:29 +01:00 |
|
Andreas Koepf
|
70b9cc813e
|
fix formatting of NOTICE.txt
|
2025-02-25 23:43:12 +01:00 |
|
Oliver
|
15d4c41457
|
Fix pre-commit
|
2025-02-25 22:42:06 +00:00 |
|
Oliver
|
58caf1fbea
|
Merge branch 'main' into codeio-sampler
|
2025-02-25 22:41:47 +00:00 |
|
Oliver
|
4bdb8c7d6b
|
Add note on code execution to CodeIODataset
|
2025-02-25 22:39:06 +00:00 |
|
Oliver
|
ef2f8d1978
|
Move data file & load into memory on first object creation
|
2025-02-25 22:36:38 +00:00 |
|
Andreas Koepf (aider)
|
6f17eca9da
|
docs: Add BibTeX citation for Re-ARC dataset in NOTICE.txt
|
2025-02-25 20:19:11 +01:00 |
|
vncntt
|
465db5c5c7
|
Add KnightsKnavesDataset (knights_knaves)
Adapted code from https://github.com/AlphaPav/mem-kk-logic/blob/main/data_prep/lib_kk.py
---------
Co-authored-by: Andreas Koepf (aider) <andreas.koepf@provisio.com>
|
2025-02-25 20:15:38 +01:00 |
|
Andreas Köpf
|
0438efd0c7
|
Merge pull request #205 from open-thought/consolidate_eval_script
Consolidate eval scripts to have single eval.py
|
2025-02-25 19:45:05 +01:00 |
|
Andreas Koepf
|
a60cdb0775
|
use results folder name for eval results
|
2025-02-25 19:41:21 +01:00 |
|
joesharratt1229
|
3a2de98b1c
|
finalised readme
|
2025-02-25 18:14:39 +00:00 |
|
joesharratt1229
|
e0e8bab09c
|
Merge remote-tracking branch 'origin/consolidate_eval_script' into fix/eval
|
2025-02-25 18:10:07 +00:00 |
|
joesharratt1229
|
68e8ea89d8
|
changed structure
|
2025-02-25 16:32:42 +00:00 |
|
joesharratt1229
|
ce8877167d
|
updated config and README
|
2025-02-25 16:25:16 +00:00 |
|
joesharratt1229
|
2eea347296
|
updated README
|
2025-02-25 15:51:31 +00:00 |
|
joesharratt1229
|
93b95d748b
|
updated README
|
2025-02-25 15:46:43 +00:00 |
|
Andreas Koepf
|
11fb7e0edf
|
move r1 configs into r1 yaml/r1 subfolder
|
2025-02-25 16:24:30 +01:00 |
|
Andreas Koepf
|
7f0047667f
|
consolidate eval scripts to have single eval.py
|
2025-02-25 16:13:22 +01:00 |
|
Andreas Köpf
|
7bb791b338
|
Merge pull request #204 from open-thought/requirements_txt_eval
Add eval/requirements-eval.txt
|
2025-02-25 15:55:09 +01:00 |
|
Andreas Koepf
|
4eb4933647
|
add aiohttp & tenacity deps to requirements-eval.txt
|
2025-02-25 15:50:11 +01:00 |
|
Andreas Koepf (aider)
|
795685f30e
|
docs: Update installation instructions in eval README
|
2025-02-25 15:37:09 +01:00 |
|
Andreas Koepf (aider)
|
bb0d1f0a82
|
docs: Add dependency installation step to eval README setup instructions
|
2025-02-25 15:19:38 +01:00 |
|
Andreas Koepf
|
d40da704db
|
remove eval results from main repo
|
2025-02-25 11:02:02 +01:00 |
|
Andreas Koepf (aider)
|
a073a2792b
|
docs: Add info about reasoning-gym-eval repository for evaluation results
|
2025-02-25 10:53:21 +01:00 |
|
Oliver
|
f895a458c7
|
Register CodeIODataset
|
2025-02-24 18:28:35 +00:00 |
|
Oliver
|
efbcfb6eed
|
Initial scoring algo for codeio
|
2025-02-24 18:27:53 +00:00 |
|
Oliver
|
5a222a398b
|
Add tiny sample dataset & efficient sampling
|
2025-02-24 17:58:31 +00:00 |
|
Andreas Köpf
|
1d25601f15
|
Merge pull request #197 from open-thought/notice_txt_first_version
docs: Add NOTICE.txt file to project
|
2025-02-24 15:30:28 +01:00 |
|