reasoning-gym

mirror of https://github.com/open-thought/reasoning-gym.git synced 2026-04-25 17:10:51 +00:00

Author	SHA1	Message	Date
Andreas Köpf	52b44c47d5	reasoning-gym-server & cli tool (#154)	2025-02-19 22:41:33 +01:00
	* feat: Add initial server structure with configuration, registry, and middleware
	* feat: Add chain_sum dataset to experiment registry test
	* fix: Update test_registry to use DatasetSpec for composite config validation
	* refactor: Update Pydantic config to use json_schema_extra and ConfigDict
	* feat: Add Pydantic models for API request/response data
	* feat: Implement basic experiment management endpoints with tests
	* feat: Implement composite configuration endpoints for experiments
	* fix: Add missing DatasetConfigUpdate import in server.py
	* refactor: Update dataset config update method to properly merge config updates
	* fix: Correctly retrieve current dataset config in composite endpoint
	* feat: Add basic CLI structure with experiments and config commands
	* feat: Add initial CLI tool with basic experiment management commands
	* refactor: Reorganize CLI package structure and fix import paths
	* refactor: Implement initial CLI commands for experiment management
	* feat: Implement HTTP client for Reasoning Gym server in RGC CLI tool
	* fix: Move print statements inside try block to resolve SyntaxError
	* fix: Resolve SyntaxError in edit_config function by adding missing except block
	* feat: Add default app instance in server module for easier uvicorn startup
	* docs: Add README.md with server and RGC tool documentation
	* remove unused files
	* refactor: Remove unsupported type annotation in registry.py
	* refactor: Move ExperimentRegistry to coaching module and add Experiment class
	* fix: Add missing CompositeDataset import in test_registry.py
	* refactor: Implement lazy ASGI app creation for server initialization
	* feat: Add health check command to RGC CLI for server connection
	* feat: Add version tracking support to CompositeDataset
	* feat: Add DatasetVersionManager for tracking dataset versions
	* feat: Add entry_id metadata and score_answer_with_id method to CompositeDataset
	* feat: Add entry_id metadata combining version and index
	* fix: Resolve undefined variable by storing version_id before use
	* test: Add comprehensive unit tests for score_answer_with_id() function
	* test: Add comprehensive version tracking test for dataset config updates
	* feat: Validate dataset weights are positive in CompositeDataset initialization
	* feat: Add weight update and normalization methods to CompositeDataset
	* refactor: Centralize weight normalization in CompositeDataset and allow zero-weight datasets
	* feat: Add negative weight validation to CompositeDataset constructor
	* feat: Add duplicate dataset name check in CompositeDataset and update test
	* refactor: Move duplicate dataset name check inside dataset iteration loop
	* refactor: Update CompositeDataset weight management to use config as source of truth
	* refactor: Move duplicate dataset name check to CompositeConfig.validate()
	* test: Update composite dataset weight test assertions and validation
	* feat: Add methods to add and remove datasets in CompositeDataset
	* refactor: Remove weight normalization and use unnormalized weights directly
	* refactor: Remove redundant total weight check in update_dataset_weights
	* feat: Add batch generation and scoring endpoints to server
	* fix: Import BatchEntry in server.py to resolve undefined name error
	* refactor: Update ReasoningGymDataset to use server for batch generation and scoring
	* fix: Add missing List and Dict type imports
	* feat: Add get_batch() and score_outputs() methods to RGClient
	* test: Add unit tests for generate_batch and score_outputs endpoints
	* refactor: Add DatasetVersionManager to Experiment class and CompositeDataset constructor
	* feat: Add validation for base_index and batch_size in generate_batch endpoint
	* refactor: Remove unused BatchRequest type from imports
	* refactor: Convert models to use Pydantic exclusively
	* test: Update scoring endpoint tests to use correct request model format
	* refactor: Rename ScoreItem to AnswerItem and update related code
	* feat: Update scoring endpoint to return ordered ScoringResponse with scores and entry_ids
	* fix: Add missing ScoringResponse import in server.py
	* move verl ppo sample with server into own file
	* refactor: Use Pydantic models for get_batch() and score_outputs() in RGClient
	* refactor: Update client methods to use Pydantic models for type safety
	* refactor: Use Pydantic models for experiment and dataset config operations
	* refactor: Clean up duplicate methods and improve error handling in main.py
	* first bits of rg server use for verl
	* refactor: Optimize scoring with single HTTP request in _score_output
	* fix: Correct experiment creation with ExperimentCreate object
	* grpo tests with server
Zafir Stojanovski	fec058d905	number format	2025-02-19 11:41:06 +01:00
Andreas Koepf	e9a2097a71	remove redundant assert in ChainSumConfig.validate()	2025-02-19 09:42:32 +01:00
Andreas Köpf	9fb231dde9	Merge pull request #161 from olliestanley/fix/sudoku-unique Fix Sudoku generator for uniqueness, implement scoring	2025-02-18 22:55:43 +01:00
Oliver	47321936a5	Add docstring	2025-02-18 21:38:46 +00:00
Oliver	47b4f29c6a	Remove now redundant is_valid function	2025-02-18 21:37:37 +00:00
Oliver	43ccddf1ac	Remove comment	2025-02-18 21:32:15 +00:00
Oliver	368d13d470	Optimise Sudoku uniqueness checks	2025-02-18 21:30:59 +00:00
Oliver	c1d2e555ee	Fix Sudoku generator uniqueness and scoring	2025-02-18 21:02:49 +00:00
Oliver	8a7c782c73	Tweak mini sudoku config	2025-02-18 18:46:14 +00:00
Oliver	90a77d0f5a	Tweak mini sudoku config	2025-02-18 18:43:19 +00:00
Oliver	0ccb3cbdfd	Tweak num_empty logic	2025-02-18 18:36:12 +00:00
Oliver	bf4c3d26d3	Ensure unique mini sudokus	2025-02-18 18:31:30 +00:00
Zafir Stojanovski	6b0ea7a14c	lint	2025-02-18 14:24:07 +01:00
Zafir Stojanovski	fd5c47d634	update generation of input string	2025-02-18 14:23:05 +01:00
Zafir Stojanovski	ed606631bb	Merge branch 'main' of https://github.com/open-thought/reasoning-gym into env/palindrome-partitioning	2025-02-18 14:08:00 +01:00
Oliver	49081e44bc	Constrain reward	2025-02-17 19:20:45 +00:00
Oliver	0de0044d52	Formatting/scoring improvements for BF & family	2025-02-17 19:08:15 +00:00
Oliver	b40c44059d	Cleanup question & add scoring for mini sudoku	2025-02-17 18:37:07 +00:00
Andreas Köpf	54d52ef6ae	Merge pull request #147 from Adefioye/koko/question-template-tinkering Tweaking some question templates	2025-02-17 18:22:49 +01:00
Andreas Köpf	680665992f	Merge pull request #150 from zafstojano/fix/rectangle-count fix(env): Rectangle Count	2025-02-17 18:22:35 +01:00
Andreas Köpf	97a444040b	Merge pull request #149 from zafstojano/fix/base-conversion fix(env): Base Conversion	2025-02-17 18:21:39 +01:00
Andreas Koepf	aa794253fe	minor formatting changes	2025-02-17 18:20:18 +01:00
Andreas Köpf	5654f300b9	Merge pull request #106 from tohskai/multivariate-polynomial-multiplication Better support for multivariate polynomials in PolynomialMultiplicationDataset	2025-02-17 18:08:36 +01:00
abdulhakeem	d3323b1a77	Remove python code reference in count_primes prompt	2025-02-17 10:29:14 -06:00
tohskai	28fcf4d481	Refactor PolynomialMultiplicationDataset and fix issues with score_answer	2025-02-17 17:04:48 +01:00
Zafir Stojanovski	1c75f7cfd2	fix prompt	2025-02-17 16:12:50 +01:00
Zafir Stojanovski	38e8e371b6	fix prompt	2025-02-17 14:21:10 +01:00
Zafir Stojanovski	c9895f1023	fix prompt and scoring function	2025-02-17 13:17:29 +01:00
abdulhakeem	5b1e42f878	Tweaked some question templates	2025-02-17 02:58:42 -06:00
Andreas Koepf	99b49f868f	fix question templates	2025-02-16 23:04:24 +01:00
Andreas Köpf	79758a31de	Merge pull request #105 from open-thought/circuit_logic initial draft for circuit_logic dataset generator	2025-02-16 22:54:43 +01:00
Andreas Koepf	5348d9ed69	fix comment: legend no longer part of metadata	2025-02-16 22:53:18 +01:00
Andreas Koepf (aider)	0051c266d4	feat: Add scoring method & unit tests for circuit logic dataset	2025-02-16 22:48:51 +01:00
Andreas Köpf	2f3f2c7ccc	Merge pull request #146 from zafstojano/fix/word-sorting fix(env): Word Sorting	2025-02-16 21:55:52 +01:00
Andreas Köpf	6997c71a8b	Merge pull request #145 from joesharratt1229/fix/games Fix/games	2025-02-16 21:54:10 +01:00
Zafir Stojanovski	a27704b44e	fix template	2025-02-16 19:51:24 +01:00
joesharratt1229	2c032a2500	restrcutured maze prompt template	2025-02-16 18:26:24 +00:00
joesharratt1229	94e07ddbf2	updated tower of hanoi question template	2025-02-16 17:54:33 +00:00
joesharratt1229	ca07c8584e	updated countdown question template	2025-02-16 17:53:54 +00:00
joesharratt1229	cf6c15d0ee	Merge pull request #144 from joesharratt1229/fix/arithmetic Added fixes for arithmetic environments	2025-02-16 16:34:09 +00:00
joesharratt1229	230fffe0a4	formatted answer as str	2025-02-16 15:56:58 +00:00
Andreas Köpf	33f888dce7	Merge pull request #140 from theblackcat102/cryptarithm New task : Verbal arithmetic	2025-02-16 16:41:44 +01:00
Andreas Köpf	7f42ec7b19	Merge pull request #143 from zafstojano/fix/string-synthesis fix(env): String Synthesis	2025-02-16 16:39:30 +01:00
Andreas Koepf	b78408200c	cryptarithm change defaults: size=500, include_example=True	2025-02-16 16:38:43 +01:00
Andreas Koepf	446913fee6	import CryptarithmDataset in algorithmic/__init__.py	2025-02-16 16:32:17 +01:00
Andreas Koepf	839c830c2a	formatting	2025-02-16 16:30:28 +01:00
joesharratt1229	fff40f4f36	adjusted gsm symbolic question template	2025-02-16 15:28:44 +00:00
Andreas Koepf	6fae41a6e1	ensure reward is float	2025-02-16 16:27:12 +01:00
Andreas Köpf	66ddb41bbd	Merge pull request #141 from joesharratt1229/feat/score-answer-impl Added score answer implementations `spell_backward` and `sentence reordering`	2025-02-16 16:24:48 +01:00

626 commits