reasoning-gym

mirror of https://github.com/open-thought/reasoning-gym.git synced 2026-04-28 17:29:39 +00:00

Author	SHA1	Message	Date
Andreas Koepf	4cd5bd42c3	verify that OPENROUTER_API_KEY env var is set	2025-02-26 22:15:30 +01:00
Andreas Koepf (aider)	a92dcd4a75	feat: Add comprehensive unit tests for parse_string_to_complex() method	2025-02-26 21:44:32 +01:00
Andreas Koepf	726ba114dc	add llama-3.3-70b-instruct eval yaml files	2025-02-26 20:54:07 +01:00
Andreas Koepf	2362b52d24	add markdown tripple backticks around tsumego board	2025-02-26 19:39:05 +01:00
Andreas Köpf	95821a72bc	Merge pull request #232 from open-thought/211_fix_tsumego_score_answer Fix & simplify score_answer() of TsumegoDataset	2025-02-26 19:07:32 +01:00
Andreas Koepf	2ddcb7c3c7	fix & simplify score_answer() of TsumegoDataset	2025-02-26 19:04:30 +01:00
Andreas Koepf	3bdf531122	bump version, pypi release of 0.1.12	2025-02-26 18:25:16 +01:00
Andreas Koepf	3c16f1c195	update gallery	2025-02-26 18:23:06 +01:00
Oliver Stanley	a0d466765a	Merge pull request #188 from olliestanley/codeio-sampler Procedural dataset for generating reasoning problems from CodeI/O-style data	2025-02-26 16:51:45 +00:00
Andreas Köpf	7c4ab296fd	Merge pull request #231 from AhmedSaif2/count-primes Fix primes representation in count_primes dataset metadata	2025-02-26 17:49:50 +01:00
Andreas Köpf	42d42aae89	Merge pull request #219 from open-thought/rich/fix_ccc Fix Cube Rotation Scoring	2025-02-26 17:41:18 +01:00
AhmedSaif2	e9e36f3a23	Fix primes representation in count_primes dataset metadata	2025-02-26 14:58:21 +02:00
Rich Jones	f2479fcacc	fix CCC scoring	2025-02-26 12:54:40 +01:00
Oliver	8f05e6108c	Fix	2025-02-26 11:17:23 +00:00
Andreas Köpf	c0e5941fe5	Merge pull request #217 from open-thought/feat/o3-mini-eun added o3 mini yaml rconfiguration	2025-02-26 09:38:11 +01:00
vncntt	98af865309	fix sonnet eval_dir (#216) * fix eval_dir * add logging	2025-02-26 09:37:09 +01:00
joesharratt1229	8eaece6f05	added o3 mini yaml	2025-02-26 08:09:12 +00:00
Andreas Köpf	6b923d5ea0	Fix PoolMatrixConfigs::score_answer(), add unit tests (#215)	2025-02-26 00:43:18 +01:00
Andreas Köpf	7317c6f0b4	Merge pull request #212 from open-thought/eval_consolidation_2 Add llama-3.3-70b-instruct algebra, algorithmic eval configs	2025-02-25 23:46:08 +01:00
Andreas Koepf	ba6bdb7d6b	fix score_answer of pool_matrix (if -> elif), remove print	2025-02-25 23:43:29 +01:00
Andreas Koepf	969ec6a208	add try-except to GraphColorDataset.score_answer()	2025-02-25 23:43:29 +01:00
Andreas Koepf	d1f2f30d8a	add None/empty check to score_answer of cryptarithm	2025-02-25 23:43:29 +01:00
Andreas Koepf	9b7eec2d64	add llama-3.3-70b-instruct algebra, algorithmic eval configs	2025-02-25 23:43:29 +01:00
Andreas Koepf	70b9cc813e	fix formatting of NOTICE.txt	2025-02-25 23:43:12 +01:00
Oliver	15d4c41457	Fix pre-commit	2025-02-25 22:42:06 +00:00
Oliver	58caf1fbea	Merge branch 'main' into codeio-sampler	2025-02-25 22:41:47 +00:00
Oliver	4bdb8c7d6b	Add note on code execution to CodeIODataset	2025-02-25 22:39:06 +00:00
Oliver	ef2f8d1978	Move data file & load into memory on first object creation	2025-02-25 22:36:38 +00:00
Andreas Koepf (aider)	6f17eca9da	docs: Add BibTeX citation for Re-ARC dataset in NOTICE.txt	2025-02-25 20:19:11 +01:00
vncntt	465db5c5c7	Add KnightsKnavesDataset (knights_knaves) Adapted code from https://github.com/AlphaPav/mem-kk-logic/blob/main/data_prep/lib_kk.py --------- Co-authored-by: Andreas Koepf (aider) <andreas.koepf@provisio.com>	2025-02-25 20:15:38 +01:00
Andreas Köpf	0438efd0c7	Merge pull request #205 from open-thought/consolidate_eval_script Consolidate eval scripts to have single eval.py	2025-02-25 19:45:05 +01:00
Andreas Koepf	a60cdb0775	use results folder name for eval results	2025-02-25 19:41:21 +01:00
joesharratt1229	3a2de98b1c	finalised readme	2025-02-25 18:14:39 +00:00
joesharratt1229	e0e8bab09c	Merge remote-tracking branch 'origin/consolidate_eval_script' into fix/eval	2025-02-25 18:10:07 +00:00
joesharratt1229	68e8ea89d8	changed structure	2025-02-25 16:32:42 +00:00
joesharratt1229	ce8877167d	updated config and read me	2025-02-25 16:25:16 +00:00
joesharratt1229	2eea347296	updated read me	2025-02-25 15:51:31 +00:00
joesharratt1229	93b95d748b	updated read me	2025-02-25 15:46:43 +00:00
Andreas Koepf	11fb7e0edf	move r1 configs into r1 yaml/r1 subfolder	2025-02-25 16:24:30 +01:00
Andreas Koepf	7f0047667f	consolidate eval scripts to have single eval.py	2025-02-25 16:13:22 +01:00
Andreas Köpf	7bb791b338	Merge pull request #204 from open-thought/requirements_txt_eval Add eval/requirements-eval.txt	2025-02-25 15:55:09 +01:00
Andreas Koepf	4eb4933647	add aiohttp & tenacity deps to requirements-eval.txt	2025-02-25 15:50:11 +01:00
Andreas Koepf (aider)	795685f30e	docs: Update installation instructions in eval README	2025-02-25 15:37:09 +01:00
Andreas Koepf (aider)	bb0d1f0a82	docs: Add dependency installation step to eval README setup instructions	2025-02-25 15:19:38 +01:00
Andreas Koepf	d40da704db	remove eval results from main repo	2025-02-25 11:02:02 +01:00
Andreas Koepf (aider)	a073a2792b	docs: Add info about reasoning-gym-eval repository for evaluation results	2025-02-25 10:53:21 +01:00
Oliver	f895a458c7	Register CodeIODataset	2025-02-24 18:28:35 +00:00
Oliver	efbcfb6eed	Initial scoring algo for codeio	2025-02-24 18:27:53 +00:00
Oliver	5a222a398b	Add tiny sample dataset & efficient sampling	2025-02-24 17:58:31 +00:00
Andreas Köpf	1d25601f15	Merge pull request #197 from open-thought/notice_txt_first_version docs: Add NOTICE.txt file to project	2025-02-24 15:30:28 +01:00

1061 commits