# Arithmetic Chain

Self-contained RL environment that procedurally generates multi-step integer problems: starting from an initial value, each step adds, subtracts, or multiplies. The model must give its final answer as `\boxed{integer}`; rewards use the same `math_verify` path as GSM8K.

**No Hugging Face dataset** — training items are sampled on the fly.
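To make the idea of on-the-fly sampling concrete, here is a minimal sketch of how such a problem could be generated and checked. It is not the environment's actual code: `sample_problem`, `extract_boxed_int`, `score`, the value ranges, and the prompt wording are all hypothetical, and the regex check is a simplified stand-in for the `math_verify` scoring the environment really uses.

```python
import random
import re
from typing import Optional, Tuple

# Hypothetical operation table; the real environment may define these differently.
OPS = {
    "add": lambda x, y: x + y,
    "subtract": lambda x, y: x - y,
    "multiply": lambda x, y: x * y,
}
PHRASES = {
    "add": "Add {n}.",
    "subtract": "Subtract {n}.",
    "multiply": "Multiply by {n}.",
}


def sample_problem(rng: random.Random, num_steps: int = 3) -> Tuple[str, int]:
    """Sample one chain problem; returns (prompt, ground-truth answer)."""
    value = rng.randint(1, 20)  # assumed range for the starting value
    parts = [f"Start with {value}."]
    for _ in range(num_steps):
        op = rng.choice(sorted(OPS))
        operand = rng.randint(2, 9)  # assumed range for operands
        value = OPS[op](value, operand)
        parts.append(PHRASES[op].format(n=operand))
    parts.append("Give the final value as \\boxed{integer}.")
    return " ".join(parts), value


# Matches \boxed{<integer>}, allowing a leading minus sign and inner whitespace.
BOXED_RE = re.compile(r"\\boxed\{\s*(-?\d+)\s*\}")


def extract_boxed_int(text: str) -> Optional[int]:
    """Return the integer in the last \\boxed{...} of the text, or None."""
    matches = BOXED_RE.findall(text)
    return int(matches[-1]) if matches else None


def score(response: str, answer: int) -> float:
    """Simplified stand-in for math_verify: 1.0 on an exact match, else 0.0."""
    return 1.0 if extract_boxed_int(response) == answer else 0.0


if __name__ == "__main__":
    prompt, answer = sample_problem(random.Random(0))
    print(prompt)
    print(score(f"The result is \\boxed{{{answer}}}", answer))
```

Passing an explicit `random.Random` instance keeps sampling reproducible per seed, which helps when comparing rollouts across runs.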

## Run (serve)

From the repo root, with the Atropos API and an OpenAI-compatible inference server configured (in `config_init` or via CLI overrides):

```bash
python environments/community/arithmetic_chain/arithmetic_chain_server.py serve --slurm false
```

## Process (debug rollouts)

```bash
python environments/community/arithmetic_chain/arithmetic_chain_server.py process \
  --env.data_path_to_save_groups rollouts.jsonl \
  --slurm false
```

The environment uses `ManagedServer` for token and logprob tracking, so its output is compatible with trainers that expect Atropos' standard scored groups.
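For a quick look at what `process` wrote, a small inspection script can help. The sketch below assumes only that `rollouts.jsonl` holds one JSON object per line; it does not assume any particular field names, and `summarize_jsonl` is a hypothetical helper, not part of Atropos.

```python
import json
from pathlib import Path
from typing import List


def summarize_jsonl(path: str) -> List[List[str]]:
    """Return the sorted top-level keys of each JSON record in the file."""
    summaries = []
    with Path(path).open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            # Non-dict records (e.g. a bare list) get an empty key list.
            summaries.append(sorted(record) if isinstance(record, dict) else [])
    return summaries


if __name__ == "__main__":
    target = Path("rollouts.jsonl")  # the path passed to the process command
    if target.exists():
        for i, keys in enumerate(summarize_jsonl(str(target))):
            print(f"record {i}: {keys}")
```

Printing the keys first, rather than indexing into assumed fields, avoids guessing the record schema before you have seen it.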