Mirror of https://github.com/NousResearch/atropos.git (synced 2026-04-19 12:57:58 +00:00).
Arithmetic Chain
Self-contained RL environment: procedurally generated multi-step integer problems (addition, subtraction, and multiplication applied in sequence to a starting value). The model must give its final answer as \boxed{integer}; rewards are computed with the same math_verify path as the GSM8K environment.
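To make the answer format concrete, here is a minimal, stdlib-only sketch of extracting a \boxed{integer} answer and turning it into a binary reward. It is a simplified stand-in, not the math_verify path the environment actually uses: the function name boxed_reward, the 1.0/0.0 reward values, and the choice to score the last boxed answer are assumptions for illustration.

```python
import re

# Matches \boxed{...} wrapping an optionally signed integer, allowing spaces inside the braces.
BOXED_INT = re.compile(r"\\boxed\{\s*([+-]?\d+)\s*\}")


def boxed_reward(completion: str, gold: int) -> float:
    """Return 1.0 if the last \\boxed{integer} in completion equals gold, else 0.0.

    Simplified stand-in for math_verify (assumption): only plain integers are
    recognized, so answers such as \\boxed{4.0} or \\boxed{\\frac{8}{2}} get no credit.
    """
    matches = BOXED_INT.findall(completion)
    if not matches:
        return 0.0  # no boxed integer at all: no credit
    # If several boxed integers appear, score only the last one.
    return 1.0 if int(matches[-1]) == gold else 0.0
```

A real verifier like math_verify handles equivalent forms that this regex rejects, which is why the environment relies on it rather than on string matching.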
No Hugging Face dataset is required; training items are sampled on the fly.
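Because items are generated rather than loaded, the sampler can be very small. The sketch below is a hypothetical illustration of the idea, not the environment's implementation: the names ChainProblem and sample_chain, the operand ranges, the default step count, and the prompt wording are all assumptions.

```python
import random
from dataclasses import dataclass


@dataclass
class ChainProblem:
    prompt: str  # natural-language problem shown to the model
    answer: int  # exact integer result, used as the gold answer


def sample_chain(rng: random.Random, num_steps: int = 4) -> ChainProblem:
    """Sample one multi-step integer problem and compute its exact answer.

    Hypothetical sampler: ranges and wording are illustrative assumptions.
    """
    value = rng.randint(1, 50)
    clauses = [f"Start with {value}."]
    for _ in range(num_steps):
        op = rng.choice(["add", "subtract", "multiply"])
        if op == "multiply":
            k = rng.randint(2, 5)  # small factors keep intermediate values readable
            value *= k
            clauses.append(f"Multiply by {k}.")
        else:
            k = rng.randint(1, 20)
            if op == "add":
                value += k
                clauses.append(f"Add {k}.")
            else:
                value -= k
                clauses.append(f"Subtract {k}.")
    clauses.append("Give the final result as \\boxed{integer}.")
    return ChainProblem(prompt=" ".join(clauses), answer=value)
```

Passing a seeded random.Random makes a given sample reproducible, and the gold answer comes from the same arithmetic that builds the prompt, so no answer key has to be stored.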
Run (serve)
From the repository root, with the Atropos API and an OpenAI-compatible inference server configured (in config_init or via CLI overrides), start the environment server:
python environments/community/arithmetic_chain/arithmetic_chain_server.py serve --slurm false
Process (debug rollouts)
python environments/community/arithmetic_chain/arithmetic_chain_server.py process \
--env.data_path_to_save_groups rollouts.jsonl \
--slurm false
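The rollout file's schema is not described here, so a safe first look is to count its top-level keys. This sketch assumes rollouts.jsonl holds one JSON object per line; summarize_jsonl is a hypothetical helper, not part of Atropos.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_jsonl(path: str) -> Counter:
    """Count how many records in a JSONL file contain each top-level key."""
    key_counts: Counter = Counter()
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            if isinstance(record, dict):  # skip non-object records
                key_counts.update(record.keys())
    return key_counts
```

For example, summarize_jsonl("rollouts.jsonl").most_common() lists each key with the number of records that contain it, which is a quick way to see what a debug run actually saved.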
The environment uses ManagedServer for token and logprob tracking, so its output is compatible with trainers that expect Atropos’ standard scored groups.