atropos/environments/community/arithmetic_chain/README.md
nevasini1 e6bc008545 Add arithmetic_chain community environment
Procedural multi-step integer chains with boxed answers; uses ManagedServer
and math_verify for scoring. No external dataset required.

Made-with: Cursor
2026-03-21 17:42:34 -04:00

# Arithmetic Chain
A self-contained RL environment that procedurally generates multi-step integer problems (add, subtract, or multiply, applied in sequence to a starting value). The model must give its final answer as `\boxed{integer}`; rewards are computed with the same `math_verify` path as the GSM8K environment.
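As a rough illustration of the boxed-answer check, the sketch below pulls the last `\boxed{...}` integer out of a completion with a regular expression and compares it to the target. It is a simplified stand-in, not the environment's code: the environment itself scores with `math_verify`, and the names `extract_boxed_int` and `score` are hypothetical.

```python
import re
from typing import Optional

# Matches \boxed{<integer>} with optional whitespace and an optional minus sign.
BOXED_INT = re.compile(r"\\boxed\{\s*(-?\d+)\s*\}")


def extract_boxed_int(completion: str) -> Optional[int]:
    # Use the last boxed value, since models often restate the answer at the end.
    matches = BOXED_INT.findall(completion)
    return int(matches[-1]) if matches else None


def score(completion: str, target: int) -> float:
    # Binary reward: 1.0 for a correct boxed integer, 0.0 otherwise
    # (including the case where no boxed integer is present).
    return 1.0 if extract_boxed_int(completion) == target else 0.0
```

An unboxed answer scores 0.0 here even when the number is right, which mirrors the requirement that the answer be boxed.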
**No Hugging Face dataset**: training items are sampled on the fly.
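On-the-fly sampling of this kind can be sketched as follows. This is a minimal illustration under assumed settings, not the environment's actual generator: the value ranges, the prompt wording, and the names `apply_chain` and `sample_chain` are all hypothetical.

```python
import operator
import random
from typing import List, Tuple

# Operation name -> (function, prompt template). The wording is an assumption.
OPS = {
    "add": (operator.add, "Add {n}."),
    "subtract": (operator.sub, "Subtract {n}."),
    "multiply": (operator.mul, "Multiply by {n}."),
}
OP_NAMES = sorted(OPS)  # fixed order keeps sampling reproducible for a given seed
INSTRUCTION = r"Give the final value as \boxed{integer}."


def apply_chain(start: int, steps: List[Tuple[str, int]]) -> int:
    # Apply each (operation, operand) pair in order to get the ground truth.
    value = start
    for op_name, operand in steps:
        value = OPS[op_name][0](value, operand)
    return value


def sample_chain(rng: random.Random, num_steps: int = 3) -> Tuple[str, int]:
    # Draw a starting value and a sequence of steps (ranges are assumptions),
    # then render the prompt and compute the answer.
    start = rng.randint(1, 20)
    steps = [(rng.choice(OP_NAMES), rng.randint(2, 9)) for _ in range(num_steps)]
    parts = [f"Start with {start}."]
    parts += [OPS[name][1].format(n=operand) for name, operand in steps]
    parts.append(INSTRUCTION)
    return " ".join(parts), apply_chain(start, steps)
```

Passing a seeded `random.Random` makes each item reproducible, which helps when debugging a specific rollout.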
## Run (serve)
From the repo root, with the Atropos API and an OpenAI-compatible inference server configured in `config_init` or via CLI overrides:
```bash
python environments/community/arithmetic_chain/arithmetic_chain_server.py serve --slurm false
```
## Process (debug rollouts)
```bash
python environments/community/arithmetic_chain/arithmetic_chain_server.py process \
    --env.data_path_to_save_groups rollouts.jsonl \
    --slurm false
```
The environment uses `ManagedServer` for token/logprob tracking, so its output is compatible with trainers that expect standard Atropos scored groups.
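To take a first look at the file written by `process`, a small JSONL reader is usually enough. The sketch below assumes only that `rollouts.jsonl` holds one JSON object per line. It makes no assumption about field names and just reports which top-level keys appear; `iter_jsonl` and `summarize` are hypothetical helper names.

```python
import json
from pathlib import Path
from typing import Any, Dict, Iterator, List, Tuple


def iter_jsonl(path: str) -> Iterator[Dict[str, Any]]:
    # Yield one parsed record per non-empty line.
    with Path(path).open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)


def summarize(path: str) -> Tuple[int, List[str]]:
    # Count records and collect the top-level keys seen across them.
    count = 0
    keys = set()
    for record in iter_jsonl(path):
        count += 1
        if isinstance(record, dict):
            keys.update(record)
    return count, sorted(keys)
```

For example, `print(summarize("rollouts.jsonl"))` prints the number of saved groups and the keys they contain.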