Add arithmetic_chain community environment

Procedural multi-step integer chains with boxed answers; uses ManagedServer
for token/logprob tracking and math_verify for scoring. No external dataset
required.

Made-with: Cursor
nevasini1 2026-03-21 17:42:01 -04:00
parent c421582b6f
commit e6bc008545
2 changed files with 347 additions and 0 deletions

@@ -0,0 +1,23 @@
# Arithmetic Chain
Self-contained RL environment: procedurally generated multi-step integer problems (add / subtract / multiply, applied in sequence to a starting value). The model must answer with `\boxed{integer}`; rewards use the same `math_verify` path as the GSM8K environment.
**No Hugging Face dataset** — training items are sampled on the fly.
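To make the idea concrete, here is a minimal sketch of on-the-fly chain sampling and boxed-answer checking. It is not the code in `arithmetic_chain_server.py`: the function names, value ranges, and prompt wording are hypothetical, and a plain regex stands in for `math_verify`, which the real environment uses for scoring.

```python
import random
import re

# Hypothetical operation table; the real environment may word these differently.
OPS = {
    "add": lambda x, k: x + k,
    "subtract": lambda x, k: x - k,
    "multiply by": lambda x, k: x * k,
}

# Matches \boxed{<integer>} with an optional minus sign.
BOXED = re.compile(r"\\boxed\{\s*(-?\d+)\s*\}")


def make_chain(rng, steps=3):
    """Sample one problem and return (prompt, answer). Ranges are assumptions."""
    value = rng.randint(1, 20)
    parts = [f"Start with {value}."]
    for _ in range(steps):
        name = rng.choice(sorted(OPS))
        operand = rng.randint(2, 9)
        value = OPS[name](value, operand)
        parts.append(f"Then {name} {operand}.")
    parts.append("Give the final integer as \\boxed{...}.")
    return " ".join(parts), value


def extract_boxed(text):
    """Return the last boxed integer in text, or None if there is none."""
    matches = BOXED.findall(text)
    return int(matches[-1]) if matches else None


def score(completion, answer):
    """Binary reward: 1.0 if the last boxed integer equals the answer."""
    return 1.0 if extract_boxed(completion) == answer else 0.0


if __name__ == "__main__":
    rng = random.Random(0)
    prompt, answer = make_chain(rng)
    print(prompt)
    print(score(f"The result is \\boxed{{{answer}}}", answer))
```

Because every item is derived from a random generator, a seeded `random.Random` is enough to reproduce a batch of problems without storing any data.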
## Run (serve)
From the repo root, with the Atropos API and an OpenAI-compatible inference server configured in `config_init` or via CLI overrides:
```bash
python environments/community/arithmetic_chain/arithmetic_chain_server.py serve --slurm false
```
## Process (debug rollouts)
```bash
python environments/community/arithmetic_chain/arithmetic_chain_server.py process \
--env.data_path_to_save_groups rollouts.jsonl \
--slurm false
```
The environment uses `ManagedServer` for token and logprob tracking, so its output is compatible with trainers that expect standard Atropos scored groups.
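For a quick look at the file written by the `process` command, a small script along these lines may help. It is only a sketch: it assumes each line of `rollouts.jsonl` is a JSON object with a `scores` list, which is an assumption about the saved group format rather than something stated here, and `summarize` is a hypothetical helper.

```python
import json
from pathlib import Path


def summarize(path):
    """Return (group_count, score_count, mean_score) for a JSONL rollout file.

    Assumes one JSON object per line with a "scores" list (an assumption).
    """
    n_groups = 0
    scores = []
    with Path(path).open() as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            group = json.loads(line)
            n_groups += 1
            scores.extend(group.get("scores", []))
    mean = sum(scores) / len(scores) if scores else None
    return n_groups, len(scores), mean


if __name__ == "__main__":
    path = Path("rollouts.jsonl")
    if path.exists():
        print(summarize(path))
```

If the saved groups use a different key for rewards, adjust the `"scores"` lookup accordingly.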