atropos/environments/community/arithmetic_chain/README.md
nevasini1 e6bc008545 Add arithmetic_chain community environment
Procedural multi-step integer chains with boxed answers; uses ManagedServer
and math_verify for scoring. No external dataset required.

Made-with: Cursor
2026-03-21 17:42:34 -04:00


Arithmetic Chain

A self-contained RL environment that procedurally generates multi-step integer problems: a starting value followed by a chain of add, subtract, and multiply steps. The model must give its final answer as \boxed{integer}; rewards are computed through the same math_verify scoring path as the GSM8K environment.
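To make the answer format concrete, here is a minimal sketch of a binary reward on a boxed integer. It is a simplified, regex-based stand-in and not the math_verify path the environment actually uses; the helper names `extract_boxed_int` and `score_completion` are hypothetical, and taking the last boxed value as the final answer is an assumption.

```python
import re
from typing import Optional

# Matches \boxed{...} holding an optionally signed integer.
# Hypothetical helper: a simplified stand-in for math_verify parsing.
_BOXED_INT = re.compile(r"\\boxed\{\s*([+-]?\d+)\s*\}")


def extract_boxed_int(completion: str) -> Optional[int]:
    """Return the last boxed integer in the completion, or None if absent."""
    matches = _BOXED_INT.findall(completion)
    return int(matches[-1]) if matches else None


def score_completion(completion: str, gold: int) -> float:
    """Binary reward: 1.0 if the boxed integer equals gold, else 0.0."""
    predicted = extract_boxed_int(completion)
    # A missing box gives predicted=None, which never equals an int.
    return 1.0 if predicted == gold else 0.0
```

For example, `score_completion("So the result is \\boxed{42}.", 42)` gives `1.0`, while a completion with no boxed answer scores `0.0`.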

No Hugging Face dataset is required; training items are sampled on the fly.
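On-the-fly sampling could look roughly like the sketch below. This is an illustration under stated assumptions, not the environment's own generator: the function name `sample_chain`, the operand ranges, the default chain length, and the prompt wording are all hypothetical.

```python
import random
from typing import List, Tuple

OPS = ("add", "subtract", "multiply")


def sample_chain(rng: random.Random, num_steps: int = 3) -> Tuple[str, int]:
    """Sample one problem; return (prompt text, gold integer answer)."""
    start = rng.randint(1, 20)  # assumed range for the starting value
    value = start
    steps: List[str] = []
    for _ in range(num_steps):
        op = rng.choice(OPS)
        operand = rng.randint(2, 9)  # assumed operand range
        if op == "add":
            value += operand
            steps.append(f"add {operand}")
        elif op == "subtract":
            value -= operand
            steps.append(f"subtract {operand}")
        else:
            value *= operand
            steps.append(f"multiply by {operand}")
    question = f"Start with {start}. Then " + ", then ".join(steps) + "."
    # Plain string (not an f-string) so the braces are kept literally.
    instruction = " Give the final value as \\boxed{integer}."
    return question + instruction, value
```

Passing a seeded `random.Random` keeps sampling reproducible, which helps when comparing debug rollouts across runs.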

Run (serve)

From the repo root, with the Atropos API and an OpenAI-compatible inference server configured in config_init or via CLI overrides:

python environments/community/arithmetic_chain/arithmetic_chain_server.py serve --slurm false

Process (debug rollouts)

python environments/community/arithmetic_chain/arithmetic_chain_server.py process \
  --env.data_path_to_save_groups rollouts.jsonl \
  --slurm false
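A quick way to sanity-check the saved rollouts is to load the JSONL file and average the scores. The sketch below assumes each line is one JSON object with a `scores` list, one entry per completion; that field name is an assumption about the saved group format, so adjust it to match what the file actually contains. The function `summarize_rollouts` is hypothetical.

```python
import json
from pathlib import Path
from typing import Dict


def summarize_rollouts(path: str) -> Dict[str, float]:
    """Count groups and average the per-completion scores in a JSONL file."""
    num_groups = 0
    all_scores = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        group = json.loads(line)
        num_groups += 1
        # ASSUMPTION: each group carries a "scores" list; skipped if missing.
        all_scores.extend(group.get("scores") or [])
    mean = sum(all_scores) / len(all_scores) if all_scores else 0.0
    return {
        "groups": num_groups,
        "completions": len(all_scores),
        "mean_score": mean,
    }
```

Calling `summarize_rollouts("rollouts.jsonl")` after a `process` run returns the group count, the completion count, and the mean score.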

Uses ManagedServer for token and logprob tracking, which keeps the environment compatible with trainers that expect standard Atropos scored groups.