atropos/environments/community/arithmetic_chain
2026-03-21 17:59:57 -04:00
..
arithmetic_chain_server.py Apply isort import order (ruff) to arithmetic_chain_server 2026-03-21 17:59:57 -04:00
README.md Add arithmetic_chain community environment 2026-03-21 17:42:34 -04:00

Arithmetic Chain

Self-contained RL environment: procedurally generated multi-step integer problems (add / subtract / multiply from a starting value). The model must answer with \boxed{integer}; rewards use the same math_verify path as GSM8K.

No Hugging Face dataset — training items are sampled on the fly.

Run (serve)

From the repo root, with Atropos API and an OpenAI-compatible inference server configured in config_init or via CLI overrides:

python environments/community/arithmetic_chain/arithmetic_chain_server.py serve --slurm false

Process (debug rollouts)

python environments/community/arithmetic_chain/arithmetic_chain_server.py process \
  --env.data_path_to_save_groups rollouts.jsonl \
  --slurm false

Uses ManagedServer for token/logprob tracking (compatible with trainers that expect Atropos standard scored groups).