mirror of
https://github.com/NousResearch/atropos.git
synced 2026-04-19 12:57:58 +00:00
Add regex generation environment for community
parent 81b2d4daab
commit 86d5163316
4 changed files with 822 additions and 0 deletions
61
environments/community/regex_generation/README.md
Normal file
@ -0,0 +1,61 @@
# Regex Generation Environment

An RL environment that trains language models to generate correct Python-compatible regular expressions from natural language descriptions and example test cases.

## How it works

Each problem gives the model:
- A natural language description of the pattern to match
- A set of strings that **should** match
- A set of strings that **should not** match

The model must produce a regex pattern inside `<answer>` tags. The pattern is tested using `re.fullmatch()` against all provided examples.
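The check described above can be sketched as follows. This is a minimal illustration, not the code in `regex_env.py`: the helper names `extract_pattern` and `passes`, and the choice to take the last `<answer>` block, are assumptions made for the example.

```python
import re
from typing import Optional

# Hypothetical helper names; the extraction logic in regex_env.py may differ.
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def extract_pattern(response: str) -> Optional[str]:
    """Return the contents of the last <answer>...</answer> block, or None."""
    matches = ANSWER_RE.findall(response)
    return matches[-1].strip() if matches else None


def passes(pattern: str, text: str, should_match: bool) -> bool:
    """True when re.fullmatch agrees with the expected label for this string."""
    return (re.fullmatch(pattern, text) is not None) == should_match


response = "The string must contain digits only.\n<answer>\\d+</answer>"
pattern = extract_pattern(response)

print(pattern)                        # \d+
print(passes(pattern, "123", True))   # True: the whole string matches
print(passes(pattern, "12a", False))  # True: fullmatch rejects the trailing "a"
```

Because `re.fullmatch()` requires the entire string to match, a pattern such as `\d+` rejects `12a` even though `re.search()` would find `12` inside it.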

## Reward signal

The reward is the fraction of test cases passed, counting positive and negative examples together. A score of 1.0 means the regex matches every positive example and rejects every negative one. Groups in which all rollouts receive the same score are discarded, because identical scores carry no learning signal.
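A rough sketch of this scoring rule, under stated assumptions: the function names are hypothetical, and scoring an invalid regex as 0.0 is a guess made for the example rather than something this README specifies.

```python
import re
from typing import List, Sequence


def score_pattern(pattern: str, positives: Sequence[str], negatives: Sequence[str]) -> float:
    """Fraction of examples on which re.fullmatch gives the expected result."""
    try:
        compiled = re.compile(pattern)
    except re.error:
        return 0.0  # assumption: a pattern that does not compile earns no reward
    passed = sum(compiled.fullmatch(s) is not None for s in positives)
    passed += sum(compiled.fullmatch(s) is None for s in negatives)
    total = len(positives) + len(negatives)
    return passed / total if total else 0.0


def has_learning_signal(group_scores: List[float]) -> bool:
    """A group is kept only if its rollouts do not all share one score."""
    return len(set(group_scores)) > 1


positives = ["123", "7"]
negatives = ["12a", ""]

scores = [
    score_pattern(r"\d+", positives, negatives),  # 1.0
    score_pattern(r"\d*", positives, negatives),  # 0.75: wrongly accepts ""
    score_pattern("[", positives, negatives),     # 0.0: invalid regex
]
print(scores)
print(has_learning_signal(scores))      # True
print(has_learning_signal([1.0, 1.0]))  # False: such a group would be discarded
```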

## Problem set

The environment ships with 28 hand-crafted regex problems across three difficulty levels:

- **Easy**: Basic patterns (digits only, starts with X, exact match)
- **Medium**: Emails, dates, phone numbers, hex colors, zip codes
- **Hard**: IPv4 addresses, semantic versioning, URLs, repeated words

Problems are split 80/20 into train/test sets.
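One way such a split could be done is shown below. The seeded shuffle, the truncating rounding, and the name `split_problems` are all assumptions for illustration; the README does not say how the environment orders or rounds the split.

```python
import random
from typing import List, Sequence, Tuple


def split_problems(
    problems: Sequence[str], train_fraction: float = 0.8, seed: int = 0
) -> Tuple[List[str], List[str]]:
    """Shuffle deterministically, then cut at train_fraction (rounded down)."""
    shuffled = list(problems)
    random.Random(seed).shuffle(shuffled)  # assumption: a seeded shuffle
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Placeholder problem IDs standing in for the 28 shipped problems.
problems = [f"problem_{i}" for i in range(28)]
train, test = split_problems(problems)
print(len(train), len(test))  # 22 6 with this rounding choice
```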

## Running

```bash
# Basic training
python regex_env.py serve \
    --env.tokenizer_name="NousResearch/DeepHermes-3-Llama-3-3B-Preview" \
    --openai.base_url http://localhost:9001/v1

# Only easy/medium problems
python regex_env.py serve \
    --env.difficulties='["easy", "medium"]'
```

## Config options

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `difficulties` | list[str] | `["easy", "medium", "hard"]` | Difficulty levels to include |
| `score_threshold` | float | `1.0` | Minimum score to count as "correct" in metrics |

Standard `BaseEnvConfig` options (`group_size`, `max_token_length`, etc.) also apply.
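To illustrate how the two options might be used, here is a standalone sketch. `RegexConfigSketch` is a hypothetical stand-in, not the environment's real config class (which builds on `BaseEnvConfig`), and the `difficulty` key on each problem is likewise assumed.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RegexConfigSketch:
    """Hypothetical stand-in mirroring the two options in the table above."""
    difficulties: List[str] = field(default_factory=lambda: ["easy", "medium", "hard"])
    score_threshold: float = 1.0


def select_problems(problems: List[Dict[str, str]], config: RegexConfigSketch) -> List[Dict[str, str]]:
    """Keep only problems whose difficulty is enabled in the config."""
    allowed = set(config.difficulties)
    return [p for p in problems if p["difficulty"] in allowed]


def is_correct(score: float, config: RegexConfigSketch) -> bool:
    """A rollout counts as correct once its score reaches the threshold."""
    return score >= config.score_threshold


# Assumed problem shape: each problem carries a "difficulty" label.
problems = [
    {"name": "digits_only", "difficulty": "easy"},
    {"name": "hex_color", "difficulty": "medium"},
    {"name": "ipv4", "difficulty": "hard"},
]

config = RegexConfigSketch(difficulties=["easy", "medium"])
print([p["name"] for p in select_problems(problems, config)])  # ['digits_only', 'hex_color']
print(is_correct(1.0, config), is_correct(0.9, config))        # True False
```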

## Eval metrics

| Metric | Description |
|--------|-------------|
| `eval/avg_score` | Average fraction of test cases passed |
| `eval/percent_perfect` | Fraction of problems with all tests passing |
| `eval/percent_valid_regex` | Fraction of responses with syntactically valid regex |
| `train/percent_correct` | Training accuracy (problems scoring at or above `score_threshold`) |
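
The three eval metrics can be computed from per-problem results roughly as follows. This is a sketch only: the `(pattern, score)` input shape and the `summarize` helper are assumptions, and the environment's own aggregation may differ.

```python
import re
from typing import Dict, List, Optional, Tuple


def is_valid_regex(pattern: Optional[str]) -> bool:
    """True if a pattern was extracted and compiles with Python's re module."""
    if pattern is None:
        return False
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


def summarize(results: List[Tuple[Optional[str], float]]) -> Dict[str, float]:
    """Aggregate (pattern, score) pairs into the eval metrics listed above."""
    n = len(results)
    if n == 0:
        return {}
    return {
        "eval/avg_score": sum(score for _, score in results) / n,
        "eval/percent_perfect": sum(score == 1.0 for _, score in results) / n,
        "eval/percent_valid_regex": sum(is_valid_regex(p) for p, _ in results) / n,
    }


# Hypothetical eval results: (extracted pattern or None, score in [0, 1]).
results = [(r"\d+", 1.0), (r"[a-z]*", 0.5), ("[", 0.0), (None, 0.0)]
metrics = summarize(results)
print(metrics)
```

With these four sample results the averages come out to 0.375, 0.25, and 0.5 respectively.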

## Dependencies

No extra dependencies beyond what Atropos already provides. Uses only Python's built-in `re` module for regex validation.