Mirror of https://github.com/NousResearch/atropos.git (synced 2026-04-19 12:57:58 +00:00)
Add arithmetic_chain community environment
Procedural multi-step integer chains with boxed answers; uses ManagedServer and math_verify for scoring. No external dataset required.

Made-with: Cursor
parent c421582b6f
commit e6bc008545
2 changed files with 347 additions and 0 deletions
23
environments/community/arithmetic_chain/README.md
Normal file
@ -0,0 +1,23 @@
# Arithmetic Chain

Self-contained RL environment: procedurally generated multi-step integer problems (add, subtract, or multiply, applied in sequence to a starting value). The model must answer with `\boxed{integer}`; rewards are computed through the same `math_verify` path as the GSM8K environment.

**No Hugging Face dataset**: training items are sampled on the fly.
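
As a rough illustration of what "sampled on the fly" and "boxed answer" can mean here, the sketch below builds a random integer chain and checks a completion against it. Everything in it is hypothetical: the names `make_chain`, `ChainProblem`, and `score`, the value ranges, and the prompt wording are assumptions made for this example, and the regex check is a simplified stand-in for `math_verify`, not the scoring path this environment actually uses.

```python
import random
import re
from dataclasses import dataclass

# Hypothetical operation table: name -> (prompt template, integer function).
OPS = {
    "add": ("Add {n}.", lambda x, n: x + n),
    "subtract": ("Subtract {n}.", lambda x, n: x - n),
    "multiply": ("Multiply by {n}.", lambda x, n: x * n),
}


@dataclass
class ChainProblem:
    prompt: str
    answer: int


def make_chain(rng: random.Random, steps: int = 3) -> ChainProblem:
    """Sample a start value, then apply `steps` random integer operations."""
    value = rng.randint(1, 20)  # assumed range, chosen for readability
    parts = [f"Start with {value}."]
    for _ in range(steps):
        name = rng.choice(list(OPS))
        operand = rng.randint(2, 9)  # assumed range
        template, fn = OPS[name]
        value = fn(value, operand)
        parts.append(template.format(n=operand))
    parts.append("Give the final result as \\boxed{integer}.")
    return ChainProblem(prompt=" ".join(parts), answer=value)


# Simplified stand-in for math_verify: match \boxed{<integer>} only.
BOXED = re.compile(r"\\boxed\{\s*(-?\d+)\s*\}")


def score(completion: str, answer: int) -> float:
    """Return 1.0 if the last boxed integer equals the answer, else 0.0."""
    matches = BOXED.findall(completion)
    return 1.0 if matches and int(matches[-1]) == answer else 0.0


problem = make_chain(random.Random(0))
print(problem.prompt)
print(score(f"The result is \\boxed{{{problem.answer}}}", problem.answer))
```

Seeding the `random.Random` instance makes a sampled problem reproducible, which is convenient when debugging a single rollout.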
## Run (serve)

From the repo root, with the Atropos API and an OpenAI-compatible inference server configured in `config_init` or via CLI overrides:
```bash
python environments/community/arithmetic_chain/arithmetic_chain_server.py serve --slurm false
```
## Process (debug rollouts)

```bash
python environments/community/arithmetic_chain/arithmetic_chain_server.py process \
  --env.data_path_to_save_groups rollouts.jsonl \
  --slurm false
```
The environment uses `ManagedServer` for token/logprob tracking, so its output is compatible with trainers that expect Atropos' standard scored groups.
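
To get a quick look at what the `process` command wrote, a small script along these lines may help. It assumes only that `rollouts.jsonl` is in JSON Lines format (one JSON value per line, inferred from the file extension) and makes no assumption about which fields a scored group contains; it simply reports the top-level keys it finds. The helper name `summarize_jsonl` is made up for this sketch.

```python
import json
from collections import Counter
from pathlib import Path


def summarize_jsonl(lines):
    """Return (record_count, key_counts) for an iterable of JSON Lines strings."""
    key_counts = Counter()
    records = 0
    for raw in lines:
        raw = raw.strip()
        if not raw:  # tolerate blank lines
            continue
        record = json.loads(raw)
        records += 1
        if isinstance(record, dict):
            key_counts.update(record.keys())
    return records, dict(key_counts)


# Path taken from the `process` command; skipped if the file is absent.
path = Path("rollouts.jsonl")
if path.exists():
    with path.open(encoding="utf-8") as fh:
        count, keys = summarize_jsonl(fh)
    print(f"{count} records")
    for key, seen in sorted(keys.items()):
        print(f"  {key}: present in {seen} records")
```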