
# Regex Generation Environment
An RL environment that trains language models to generate correct Python-compatible regular expressions from natural language descriptions and example test cases.
## How it works
Each problem gives the model:
- A natural language description of the pattern to match
- A set of strings that **should** match
- A set of strings that **should not** match

The model must produce a regex pattern inside `<answer>` tags. The pattern is tested with `re.fullmatch()` against every provided example, so it must match the entire string, not just a substring.
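
A minimal sketch of the answer check follows. It makes two assumptions: the pattern is taken from the last `<answer>...</answer>` pair, and surrounding whitespace is stripped. `extract_pattern` and `check_example` are hypothetical names, not the environment's API.

```python
import re
from typing import Optional

# Hypothetical: non-greedy capture of each <answer>...</answer> span, across newlines.
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def extract_pattern(response: str) -> Optional[str]:
    """Return the stripped text inside the last <answer> tag, or None if absent."""
    matches = ANSWER_RE.findall(response)
    if not matches:
        return None
    # Assumption: the last answer wins, and surrounding whitespace is not meaningful.
    return matches[-1].strip()


def check_example(pattern: str, text: str) -> bool:
    """True only if the pattern matches the whole string, not a prefix or substring."""
    return re.fullmatch(pattern, text) is not None


response = "Digits only, so: <answer>\\d+</answer>"
pattern = extract_pattern(response)
print(pattern)                          # \d+
print(check_example(pattern, "12345"))  # True
print(check_example(pattern, "123a"))   # False
```

`re.fullmatch` is what makes `"123a"` fail here. `re.search` would find `123` inside it, and `re.match` would accept the `123` prefix. Either would let an unanchored pattern pass negatives it should reject.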
## Reward signal
The reward is the fraction of all test cases passed, counting positive and negative examples together. A score of 1.0 means the regex matches every positive example and rejects every negative one. Groups in which every rollout receives the same score are discarded, since identical scores give no relative learning signal.
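
The reward and the group filter can be sketched as below. `score_pattern` and `keep_group` are hypothetical names. Giving a pattern that fails to compile a reward of 0.0 is an assumption; the README does not say how invalid regexes are scored.

```python
import re
from typing import List, Optional


def score_pattern(
    pattern: Optional[str], should_match: List[str], should_not_match: List[str]
) -> float:
    """Fraction of test cases handled correctly: positives matched, negatives rejected."""
    total = len(should_match) + len(should_not_match)
    if pattern is None or total == 0:
        return 0.0
    try:
        compiled = re.compile(pattern)
    except re.error:
        # Assumption: a pattern that fails to compile earns zero reward.
        return 0.0
    passed = sum(compiled.fullmatch(s) is not None for s in should_match)
    passed += sum(compiled.fullmatch(s) is None for s in should_not_match)
    return passed / total


def keep_group(scores: List[float]) -> bool:
    """Keep a group only if its rollouts did not all receive the same score."""
    return len(set(scores)) > 1
```

Take `should_match = ["123", "7"]` and `should_not_match = ["12a", ""]` as an example. The pattern `\d+` scores 1.0, while `\d*` scores 0.75 because it also matches the empty string. A group scoring `[1.0, 1.0, 1.0]` would be dropped by `keep_group`.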
## Problem set
The environment ships with 28 hand-crafted regex problems across three difficulty levels:
- **Easy**: Basic patterns (digits only, starts with X, exact match)
- **Medium**: Emails, dates, phone numbers, hex colors, zip codes
- **Hard**: IPv4 addresses, semantic versioning, URLs, repeated words

Problems are split 80/20 into train and test sets.
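
One way to combine the `difficulties` filter with the 80/20 split is sketched below. It rests on three assumptions: problems are plain dicts with a `difficulty` key, the split is a seeded shuffle, and `filter_and_split` is a hypothetical helper rather than the environment's own function.

```python
import random
from typing import Dict, List, Sequence, Tuple

# Hypothetical problem shape: {"description", "should_match", "should_not_match", "difficulty"}
Problem = Dict[str, object]


def filter_and_split(
    problems: Sequence[Problem],
    difficulties: Sequence[str] = ("easy", "medium", "hard"),
    train_fraction: float = 0.8,
    seed: int = 0,
) -> Tuple[List[Problem], List[Problem]]:
    """Keep only the requested difficulty levels, then shuffle and split train/test."""
    allowed = set(difficulties)
    selected = [p for p in problems if p["difficulty"] in allowed]
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(selected)      # shuffles the filtered copy, not the caller's sequence
    cut = int(len(selected) * train_fraction)
    return selected[:cut], selected[cut:]
```

In this sketch, seeding the shuffle keeps the test set identical across runs, so eval numbers from different runs compare the same problems.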
## Running
```bash
# Basic training
python regex_env.py serve \
--env.tokenizer_name="NousResearch/DeepHermes-3-Llama-3-3B-Preview" \
--openai.base_url http://localhost:9001/v1
# Only easy/medium problems
python regex_env.py serve \
--env.difficulties='["easy", "medium"]'
```
## Config options
| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `difficulties` | list[str] | `["easy", "medium", "hard"]` | Difficulty levels to include |
| `score_threshold` | float | `1.0` | Min score to count as "correct" in metrics |

Standard `BaseEnvConfig` options (`group_size`, `max_token_length`, etc.) also apply.
## Eval metrics
| Metric | Description |
|--------|-------------|
| `eval/avg_score` | Average fraction of test cases passed |
| `eval/percent_perfect` | Fraction of problems with all tests passing |
| `eval/percent_valid_regex` | Fraction of responses with syntactically valid regex |
| `train/percent_correct` | Training accuracy (fraction of problems scoring at or above `score_threshold`) |
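
The metrics above can be approximated from per-response results as sketched here. This is not the environment's code: `EvalResult`, `eval_metrics`, and `percent_correct` are hypothetical names. The sketch also assumes the correctness check is `score >= score_threshold`, since a strict "above" could never be satisfied at the default threshold of 1.0.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class EvalResult:
    score: float       # fraction of test cases passed by one response
    valid_regex: bool  # whether the extracted pattern compiled


def eval_metrics(results: List[EvalResult]) -> Dict[str, float]:
    """Aggregate eval results under the metric names listed in the table."""
    n = len(results)
    if n == 0:
        return {}
    return {
        "eval/avg_score": sum(r.score for r in results) / n,
        "eval/percent_perfect": sum(r.score == 1.0 for r in results) / n,
        "eval/percent_valid_regex": sum(r.valid_regex for r in results) / n,
    }


def percent_correct(scores: List[float], score_threshold: float = 1.0) -> float:
    """Fraction of training scores at or above the threshold (assumed >= comparison)."""
    if not scores:
        return 0.0
    return sum(s >= score_threshold for s in scores) / len(scores)
```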
## Dependencies
No extra dependencies beyond what Atropos already provides. Uses only Python's built-in `re` module for regex validation.