# Arithmetic Chain

Self-contained RL environment: procedurally generated multi-step integer problems (add / subtract / multiply from a starting value). The model must answer with `\boxed{integer}`; rewards use the same `math_verify` path as GSM8K.

**No Hugging Face dataset required**: training items are sampled on the fly.
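
To make the idea of on-the-fly items concrete, here is a minimal sketch of how a chain could be sampled and a boxed answer checked. It is an illustration only, not the environment's actual code: `sample_chain`, `score`, the value ranges, and the problem wording are all hypothetical, and a plain regex stands in for the `math_verify` scoring path.

```python
import random
import re

# Hypothetical operation table; the real environment may weight or word these differently.
OPS = {
    "add": (lambda x, k: x + k, "Add {k}."),
    "subtract": (lambda x, k: x - k, "Subtract {k}."),
    "multiply": (lambda x, k: x * k, "Multiply by {k}."),
}


def sample_chain(rng, num_steps=3):
    """Return (problem_text, answer) for one procedurally generated chain."""
    value = rng.randint(1, 20)  # assumed range for the starting value
    parts = [f"Start with {value}."]
    for _ in range(num_steps):
        name = rng.choice(sorted(OPS))
        k = rng.randint(2, 9)  # assumed range for each operand
        fn, template = OPS[name]
        value = fn(value, k)
        parts.append(template.format(k=k))
    return " ".join(parts), value


# Matches \boxed{<integer>}; a simplified stand-in for math_verify parsing.
BOXED = re.compile(r"\\boxed\{\s*(-?\d+)\s*\}")


def score(response, answer):
    """Return 1.0 if the last boxed integer in the response equals the answer, else 0.0."""
    matches = BOXED.findall(response)
    return 1.0 if matches and int(matches[-1]) == answer else 0.0


if __name__ == "__main__":
    rng = random.Random(0)  # seeded only so the demo is repeatable
    problem, answer = sample_chain(rng)
    print(problem)
    print(score(f"The result is \\boxed{{{answer}}}", answer))
```

Because each item is generated from a random source rather than read from disk, the sketch needs no dataset download; seeding the generator is only a convenience for reproducing a particular item while debugging.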
## Run (serve)

From the repo root, with the Atropos API and an OpenAI-compatible inference server configured in `config_init` or via CLI overrides:

```bash
python environments/community/arithmetic_chain/arithmetic_chain_server.py serve --slurm false
```

## Process (debug rollouts)

```bash
python environments/community/arithmetic_chain/arithmetic_chain_server.py process \
    --env.data_path_to_save_groups rollouts.jsonl \
    --slurm false
```

The environment uses `ManagedServer` for token/logprob tracking, so its output is compatible with trainers that expect Atropos’ standard scored groups.