reasoning-gym

mirror of https://github.com/open-thought/reasoning-gym.git synced 2026-04-19 12:58:07 +00:00

Author	SHA1	Message	Date
Andreas Köpf	850c1cf6f4	Eval script consolidation (#238). The script now supports: YAML and JSON configurations; dataset-specific parameters; overriding configuration via command line; detailed logging and error handling	2025-02-27 17:39:14 +01:00
Andreas Köpf	8a66d2a216	Merge pull request #237 from open-thought/rich/richmorevalfixes2 Fix graph color example template	2025-02-27 16:08:23 +01:00
Rich Jones	a6c90f40a1	rm typo	2025-02-27 13:44:33 +01:00
Rich Jones	1b95cd3206	fix graph color example template	2025-02-27 13:43:01 +01:00
Andreas Köpf	a56b3b6c5c	Merge pull request #186 from zafstojano/feat/codeio feat(env): CodeIO	2025-02-27 12:18:13 +01:00
Andreas Köpf	c98cc5fcd6	Merge pull request #220 from open-thought/rich/cubeinstructions Make Rubiks Cube Output Format More Explicit	2025-02-27 12:16:09 +01:00
Andreas Köpf	7f64a1bb7c	Merge pull request #236 from open-thought/rich/moreevalfixes Trivial Fixes	2025-02-27 12:14:43 +01:00
Rich Jones	253e49aecf	sm fixes	2025-02-27 11:54:04 +01:00
Rich Jones	52d6b2efd2	seed test config	2025-02-27 10:44:28 +01:00
Rich Jones	633a1aa1ba	expand more	2025-02-27 10:41:30 +01:00
Zafir Stojanovski	4c637c3b13	final tweaks	2025-02-27 08:38:34 +01:00
Andreas Köpf	359feb47c3	Merge pull request #233 from open-thought/llama-3.3-70_eval_config Llama 3.3 70 eval config	2025-02-26 22:56:33 +01:00
Andreas Koepf	477e1f85cc	verify that OPENROUTER_API_KEY env var is set	2025-02-26 22:15:30 +01:00
Andreas Koepf (aider)	941da618d8	feat: Add comprehensive unit tests for parse_string_to_complex() method	2025-02-26 21:44:32 +01:00
Andreas Koepf	acb2d7eb53	add llama-3.3-70b-instruct eval yaml files	2025-02-26 20:54:07 +01:00
Zafir Stojanovski	1ec625cbd9	update timeout	2025-02-26 20:27:43 +01:00
Zafir Stojanovski	2ce450486d	e2b testing	2025-02-26 20:19:52 +01:00
Andreas Koepf	6511725711	add markdown tripple backticks around tsumego board	2025-02-26 19:39:05 +01:00
Andreas Köpf	fb556c39ca	Merge pull request #232 from open-thought/211_fix_tsumego_score_answer Fix & simplify score_answer() of TsumegoDataset	2025-02-26 19:07:32 +01:00
Andreas Koepf	f97bf94caa	fix & simplify score_answer() of TsumegoDataset	2025-02-26 19:04:30 +01:00
Andreas Koepf	72233fc2ea	bump version, pypi release of 0.1.12	2025-02-26 18:25:16 +01:00
Andreas Koepf	b5742de5e5	update gallery	2025-02-26 18:23:06 +01:00
Oliver Stanley	ac4ce13369	Merge pull request #188 from olliestanley/codeio-sampler Procedural dataset for generating reasoning problems from CodeI/O-style data	2025-02-26 16:51:45 +00:00
Andreas Köpf	5e1594da16	Merge pull request #231 from AhmedSaif2/count-primes Fix primes representation in count_primes dataset metadata	2025-02-26 17:49:50 +01:00
Andreas Köpf	e351d302a3	Merge pull request #219 from open-thought/rich/fix_ccc Fix Cube Rotation Scoring	2025-02-26 17:41:18 +01:00
AhmedSaif2	dcdc38b15d	Fix primes representation in count_primes dataset metadata	2025-02-26 14:58:21 +02:00
Rich Jones	f0ca949aaf	support expanded notation anyway	2025-02-26 13:17:03 +01:00
Rich Jones	285e2b20cc	rubiks cube instructions	2025-02-26 13:07:17 +01:00
Rich Jones	229086131a	fix CCC scoring	2025-02-26 12:54:40 +01:00
Oliver	5fa06c961f	Fix	2025-02-26 11:17:23 +00:00
Andreas Köpf	5b89a3a2d0	Merge pull request #217 from open-thought/feat/o3-mini-eun added o3 mini yaml rconfiguration	2025-02-26 09:38:11 +01:00
vncntt	29179f783e	fix sonnet eval_dir (#216): fix eval_dir; add logging	2025-02-26 09:37:09 +01:00
joesharratt1229	7d7e44d1af	added o3 mini yaml	2025-02-26 08:09:12 +00:00
Andreas Köpf	48f082663a	Fix PoolMatrixConfigs::score_answer(), add unit tests (#215)	2025-02-26 00:43:18 +01:00
Andreas Köpf	99fec3425d	Merge pull request #212 from open-thought/eval_consolidation_2 Add llama-3.3-70b-instruct algebra, algorithmic eval configs	2025-02-25 23:46:08 +01:00
Andreas Koepf	bba128ffd0	fix score_answer of pool_matrix (if -> elif), remove print	2025-02-25 23:43:29 +01:00
Andreas Koepf	f9e8f8b064	add try-except to GraphColorDataset.score_answer()	2025-02-25 23:43:29 +01:00
Andreas Koepf	65d17b9850	add None/empty check to score_answer of cryptarithm	2025-02-25 23:43:29 +01:00
Andreas Koepf	6d5168d1e5	add llama-3.3-70b-instruct algebra, algorithmic eval configs	2025-02-25 23:43:29 +01:00
Andreas Koepf	92c8be1699	fix formatting of NOTICE.txt	2025-02-25 23:43:12 +01:00
Oliver	3028492933	Fix pre-commit	2025-02-25 22:42:06 +00:00
Oliver	aa6759c160	Merge branch 'main' into codeio-sampler	2025-02-25 22:41:47 +00:00
Oliver	81c77a495d	Add note on code execution to CodeIODataset	2025-02-25 22:39:06 +00:00
Oliver	0252dd905f	Move data file & load into memory on first object creation	2025-02-25 22:36:38 +00:00
Zafir Stojanovski	b47bf882ce	filtering	2025-02-25 22:21:26 +01:00
Andreas Koepf (aider)	8ccf077faf	docs: Add BibTeX citation for Re-ARC dataset in NOTICE.txt	2025-02-25 20:19:11 +01:00
vncntt	5f01049607	Add KnightsKnavesDataset (knights_knaves). Adapted code from https://github.com/AlphaPav/mem-kk-logic/blob/main/data_prep/lib_kk.py. Co-authored-by: Andreas Koepf (aider) <andreas.koepf@provisio.com>	2025-02-25 20:15:38 +01:00
Andreas Köpf	ed9292a7f4	Merge pull request #205 from open-thought/consolidate_eval_script Consolidate eval scripts to have single eval.py	2025-02-25 19:45:05 +01:00
Andreas Koepf	791f16ec0f	use results folder name for eval results	2025-02-25 19:41:21 +01:00
joesharratt1229	ffe60ef112	finalised readme	2025-02-25 18:14:39 +00:00

1084 commits