reasoning-gym

mirror of https://github.com/open-thought/reasoning-gym.git synced 2026-04-28 17:29:39 +00:00

Author	SHA1	Message	Date
Andreas Köpf	1b1c04bb70	feat: Add `category` property to `ProceduralDataset` to extract category name (#248)	2025-03-01 23:11:40 +01:00
Zafir Stojanovski	1bc9f6f09f	fix manipulate matrix (#247)	2025-03-01 23:00:29 +01:00
Rich Jones	80aafda8e5	more dynamic scoring for jumble (#246)	2025-03-01 18:50:59 +01:00
Zafir Stojanovski	78c92d7056	Mahjong Puzzle (#241) * mahjong	2025-03-01 16:27:26 +01:00
Andreas Köpf	dbd2ac723e	Add base_url and api_key command line args for eval.py script (#244) * feat: Add base URL command line parameter to eval.py script * feat: Add API key parameter and CLI option to AsyncModelEvaluator	2025-02-28 18:32:58 +01:00
Rich Jones	27e66ba6dd	log error message on bad api response (#243)	2025-02-28 15:32:27 +01:00
Andreas Köpf	59922486c6	Eval sampling settings for generation (temperature, top-p, max_tokens) (#242) * feat: Add sampling parameters to eval configuration and API call * feat: Add support for system_prompt_id and optional system_prompt configuration	2025-02-28 11:48:37 +01:00
Andreas Koepf	d83e53115a	fix prompt for arc_1d	2025-02-28 08:07:59 +01:00
Andreas Koepf (aider)	82e79d672e	feat: Add system prompt to dataset results and summary output	2025-02-28 00:26:06 +01:00
Andreas Köpf	0b108efac1	Generate eval config tool (#240) * feat: Add generate_config.py script to create eval configurations	2025-02-27 21:40:53 +01:00
Andreas Köpf	1ea9a657a7	Eval script consolidation (#238) The script now supports: - YAML and JSON configurations - Dataset-specific parameters - Overriding configuration via command line - Detailed logging and error handling	2025-02-27 17:39:14 +01:00
Andreas Köpf	bd745ae959	Merge pull request #237 from open-thought/rich/richmorevalfixes2 Fix graph color example template	2025-02-27 16:08:23 +01:00
Rich Jones	ca5372dcc1	rm typo	2025-02-27 13:44:33 +01:00
Rich Jones	9a8e398f22	fix graph color example template	2025-02-27 13:43:01 +01:00
Andreas Köpf	ba9d625ef4	Merge pull request #186 from zafstojano/feat/codeio feat(env): CodeIO	2025-02-27 12:18:13 +01:00
Andreas Köpf	ed90fff3fa	Merge pull request #220 from open-thought/rich/cubeinstructions Make Rubiks Cube Output Format More Explicit	2025-02-27 12:16:09 +01:00
Andreas Köpf	1cc6eded6a	Merge pull request #236 from open-thought/rich/moreevalfixes Trivial Fixes	2025-02-27 12:14:43 +01:00
Rich Jones	a1b1272e8d	sm fixes	2025-02-27 11:54:04 +01:00
Rich Jones	b2b2311329	seed test config	2025-02-27 10:44:28 +01:00
Rich Jones	9daaccc208	expand more	2025-02-27 10:41:30 +01:00
Zafir Stojanovski	2c566f76ea	final tweaks	2025-02-27 08:38:34 +01:00
Andreas Köpf	6ceb03f224	Merge pull request #233 from open-thought/llama-3.3-70_eval_config Llama 3.3 70 eval config	2025-02-26 22:56:33 +01:00
Andreas Koepf	4cd5bd42c3	verify that OPENROUTER_API_KEY env var is set	2025-02-26 22:15:30 +01:00
Andreas Koepf (aider)	a92dcd4a75	feat: Add comprehensive unit tests for parse_string_to_complex() method	2025-02-26 21:44:32 +01:00
Andreas Koepf	726ba114dc	add llama-3.3-70b-instruct eval yaml files	2025-02-26 20:54:07 +01:00
Zafir Stojanovski	4a59d13100	update timeout	2025-02-26 20:27:43 +01:00
Zafir Stojanovski	20c8392417	e2b testing	2025-02-26 20:19:52 +01:00
Andreas Koepf	2362b52d24	add markdown tripple backticks around tsumego board	2025-02-26 19:39:05 +01:00
Andreas Köpf	95821a72bc	Merge pull request #232 from open-thought/211_fix_tsumego_score_answer Fix & simplify score_answer() of TsumegoDataset	2025-02-26 19:07:32 +01:00
Andreas Koepf	2ddcb7c3c7	fix & simplify score_answer() of TsumegoDataset	2025-02-26 19:04:30 +01:00
Andreas Koepf	3bdf531122	bump version, pypi release of 0.1.12	2025-02-26 18:25:16 +01:00
Andreas Koepf	3c16f1c195	update gallery	2025-02-26 18:23:06 +01:00
Oliver Stanley	a0d466765a	Merge pull request #188 from olliestanley/codeio-sampler Procedural dataset for generating reasoning problems from CodeI/O-style data	2025-02-26 16:51:45 +00:00
Andreas Köpf	7c4ab296fd	Merge pull request #231 from AhmedSaif2/count-primes Fix primes representation in count_primes dataset metadata	2025-02-26 17:49:50 +01:00
Andreas Köpf	42d42aae89	Merge pull request #219 from open-thought/rich/fix_ccc Fix Cube Rotation Scoring	2025-02-26 17:41:18 +01:00
AhmedSaif2	e9e36f3a23	Fix primes representation in count_primes dataset metadata	2025-02-26 14:58:21 +02:00
Rich Jones	214e9d4957	support expanded notation anyway	2025-02-26 13:17:03 +01:00
Rich Jones	b252937f99	rubiks cube instructions	2025-02-26 13:07:17 +01:00
Rich Jones	f2479fcacc	fix CCC scoring	2025-02-26 12:54:40 +01:00
Oliver	8f05e6108c	Fix	2025-02-26 11:17:23 +00:00
Andreas Köpf	c0e5941fe5	Merge pull request #217 from open-thought/feat/o3-mini-eun added o3 mini yaml rconfiguration	2025-02-26 09:38:11 +01:00
vncntt	98af865309	fix sonnet eval_dir (#216) * fix eval_dir * add logging	2025-02-26 09:37:09 +01:00
joesharratt1229	8eaece6f05	added o3 mini yaml	2025-02-26 08:09:12 +00:00
Andreas Köpf	6b923d5ea0	Fix PoolMatrixConfigs::score_answer(), add unit tests (#215)	2025-02-26 00:43:18 +01:00
Andreas Köpf	7317c6f0b4	Merge pull request #212 from open-thought/eval_consolidation_2 Add llama-3.3-70b-instruct algebra, algorithmic eval configs	2025-02-25 23:46:08 +01:00
Andreas Koepf	ba6bdb7d6b	fix score_answer of pool_matrix (if -> elif), remove print	2025-02-25 23:43:29 +01:00
Andreas Koepf	969ec6a208	add try-except to GraphColorDataset.score_answer()	2025-02-25 23:43:29 +01:00
Andreas Koepf	d1f2f30d8a	add None/empty check to score_answer of cryptarithm	2025-02-25 23:43:29 +01:00
Andreas Koepf	9b7eec2d64	add llama-3.3-70b-instruct algebra, algorithmic eval configs	2025-02-25 23:43:29 +01:00
Andreas Koepf	70b9cc813e	fix formatting of NOTICE.txt	2025-02-25 23:43:12 +01:00

1144 commits