mirror of https://github.com/NousResearch/atropos.git synced 2026-04-19 12:57:58 +00:00

nevasini1 e6bc008545 Add arithmetic_chain community environment

Procedural multi-step integer chains with boxed answers; uses ManagedServer
and math_verify for scoring. No external dataset required.

Made-with: Cursor

2026-03-21 17:42:34 -04:00

889 B


Arithmetic Chain

A self-contained RL environment that procedurally generates multi-step integer problems (add, subtract, or multiply, applied in sequence to a starting value). The model must give its final answer as \boxed{integer}; rewards are computed through the same math_verify scoring path as GSM8K.
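To make the answer format concrete, here is a minimal sketch of a boxed-integer check. It is a simplified regex stand-in, not the math_verify path the environment actually uses, and the helper names `extract_boxed_int` and `score` are hypothetical.

```python
import re
from typing import Optional

# Simplified stand-in for illustration only: the real environment scores
# with math_verify. Helper names here are hypothetical.
_BOXED_INT = re.compile(r"\\boxed\{\s*(-?\d+)\s*\}")


def extract_boxed_int(completion: str) -> Optional[int]:
    """Return the integer inside the last \\boxed{...} in the text, or None."""
    matches = _BOXED_INT.findall(completion)
    return int(matches[-1]) if matches else None


def score(completion: str, gold: int) -> float:
    """Return 1.0 if the last boxed integer equals the gold answer, else 0.0."""
    return 1.0 if extract_boxed_int(completion) == gold else 0.0
```

Taking the last boxed value is a design choice in this sketch: a completion that boxes an intermediate result and then a final one is judged on the final one, and an unboxed answer scores 0.0.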

No Hugging Face dataset — training items are sampled on the fly.
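On-the-fly sampling of this kind can be sketched as follows. This is an illustration under assumptions, not the environment's actual generator: the function name `sample_chain`, the value ranges, the step count, and the prompt wording are all hypothetical.

```python
import operator
import random
from typing import Tuple

# Hypothetical step templates paired with the operation they describe.
STEPS = [
    ("Add {n}.", operator.add),
    ("Subtract {n}.", operator.sub),
    ("Multiply by {n}.", operator.mul),
]


def sample_chain(rng: random.Random, num_steps: int = 3) -> Tuple[str, int]:
    """Sample one problem; return (prompt text, gold integer answer)."""
    value = rng.randint(1, 20)  # assumed range for the starting value
    parts = [f"Start with {value}."]
    for _ in range(num_steps):
        template, fn = rng.choice(STEPS)
        operand = rng.randint(2, 9)  # assumed range for operands
        value = fn(value, operand)   # track the running result as the gold answer
        parts.append(template.format(n=operand))
    parts.append("Give the final value as \\boxed{integer}.")
    return " ".join(parts), value
```

Passing an explicit `random.Random` instance keeps sampling reproducible: the same seed yields the same prompt and gold answer, which helps when debugging rollouts.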

Run (serve)

From the repo root, with the Atropos API and an OpenAI-compatible inference server configured in config_init or via CLI overrides:

python environments/community/arithmetic_chain/arithmetic_chain_server.py serve --slurm false

Process (debug rollouts)

python environments/community/arithmetic_chain/arithmetic_chain_server.py process \
  --env.data_path_to_save_groups rollouts.jsonl \
  --slurm false

The environment uses ManagedServer for token and logprob tracking, so its output is compatible with trainers that expect Atropos’ standard scored groups.
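After a process run, the saved rollouts.jsonl can be inspected with a short script. The sketch below assumes only that the file holds one JSON object per line; it makes no assumption about field names and simply reports which top-level keys appear. The function name `summarize_jsonl` is hypothetical.

```python
import json
from collections import Counter
from pathlib import Path
from typing import Tuple


def summarize_jsonl(path: str) -> Tuple[int, Counter]:
    """Return (record count, how often each top-level key appears)."""
    count = 0
    keys: Counter = Counter()
    with Path(path).open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            count += 1
            if isinstance(record, dict):
                keys.update(record.keys())
    return count, keys
```

Calling `summarize_jsonl("rollouts.jsonl")` returns the number of saved records together with a key histogram, which is a quick way to confirm the run wrote data before pointing a trainer at it.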
