Regex Generation Environment
An RL environment that trains language models to generate correct Python-compatible regular expressions from natural language descriptions and example test cases.
How it works
Each problem gives the model:
- A natural language description of the pattern to match
- A set of strings that should match
- A set of strings that should not match
The model must produce a regex pattern inside <answer> tags. The pattern is tested using re.fullmatch() against all provided examples.
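As a rough illustration of the step above, the sketch below pulls a pattern out of `<answer>` tags and checks a single example with `re.fullmatch()`. The helper names `extract_pattern` and `passes` are hypothetical, and taking the last `<answer>` pair and stripping surrounding whitespace are assumptions; the environment's actual parsing may differ.

```python
import re
from typing import Optional

# Non-greedy capture of whatever sits between <answer> and </answer>.
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def extract_pattern(response: str) -> Optional[str]:
    """Return the text inside the last <answer>...</answer> pair, or None.

    Assumption: the last pair wins and surrounding whitespace is stripped.
    """
    matches = ANSWER_RE.findall(response)
    return matches[-1].strip() if matches else None


def passes(pattern: str, text: str, should_match: bool) -> bool:
    """True when re.fullmatch agrees with the expected outcome for one example."""
    return (re.fullmatch(pattern, text) is not None) == should_match


response = "Digits only, so: <answer>\\d+</answer>"
pattern = extract_pattern(response)
print(pattern)                          # \d+
print(passes(pattern, "12345", True))   # True: the whole string is digits
print(passes(pattern, "12a45", False))  # True: correctly rejected
```

Because `re.fullmatch()` requires the whole string to match, a pattern such as `\d+` rejects `12a45` even though it contains digits.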
Reward signal
The reward is the fraction of test cases passed (both positive and negative). A score of 1.0 means the regex correctly matches all positive examples and rejects all negative ones. Groups where all rollouts score identically are discarded (no learning signal).
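The scoring rule can be sketched as follows. This is a minimal illustration, not the environment's code: `score_regex` and `has_learning_signal` are hypothetical names, and giving an invalid regex a score of 0.0 is an assumption.

```python
import re
from typing import List


def score_regex(pattern: str, positives: List[str], negatives: List[str]) -> float:
    """Fraction of examples handled correctly (positives matched, negatives rejected)."""
    try:
        compiled = re.compile(pattern)
    except re.error:
        return 0.0  # assumption: a pattern that does not compile scores zero
    total = len(positives) + len(negatives)
    if total == 0:
        return 0.0
    passed = sum(compiled.fullmatch(s) is not None for s in positives)
    passed += sum(compiled.fullmatch(s) is None for s in negatives)
    return passed / total


def has_learning_signal(scores: List[float]) -> bool:
    """A group is worth keeping only if its rollouts do not all score the same."""
    return len(set(scores)) > 1


positives, negatives = ["1", "42"], ["a", "4b"]
print(score_regex(r"\d+", positives, negatives))  # 1.0
print(score_regex(r"\d", positives, negatives))   # 0.75 ("42" is not a single digit)
print(has_learning_signal([1.0, 1.0, 1.0]))       # False: group would be discarded
print(has_learning_signal([1.0, 0.75, 0.0]))      # True
```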
Problem set
The environment ships with 28 hand-crafted regex problems across three difficulty levels:
- Easy: Basic patterns (digits only, starts with X, exact match)
- Medium: Emails, dates, phone numbers, hex colors, zip codes
- Hard: IPv4 addresses, semantic versioning, URLs, repeated words
Problems are split 80/20 into train/test sets.
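One way such a split could be done is sketched below. The function name `split_problems`, the seeded shuffle, and truncating the cut point with `int()` are all assumptions made for illustration; the README does not say how the environment orders or rounds the split.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_problems(
    problems: Sequence[T], train_fraction: float = 0.8, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Shuffle deterministically, then cut into train and test lists."""
    shuffled = list(problems)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is reproducible
    cut = int(len(shuffled) * train_fraction)  # assumption: truncate, not round
    return shuffled[:cut], shuffled[cut:]


train, test = split_problems(range(28))
print(len(train), len(test))  # 22 6
```

Under these assumptions, 28 problems yield 22 training and 6 test problems.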
Running
# Basic training
python regex_env.py serve \
  --env.tokenizer_name="NousResearch/DeepHermes-3-Llama-3-3B-Preview" \
  --openai.base_url http://localhost:9001/v1

# Only easy/medium problems
python regex_env.py serve \
  --env.difficulties='["easy", "medium"]'
Config options
| Option | Type | Default | Description |
|---|---|---|---|
| difficulties | list[str] | ["easy", "medium", "hard"] | Difficulty levels to include |
| score_threshold | float | 1.0 | Min score to count as "correct" in metrics |
Standard BaseEnvConfig options (group_size, max_token_length, etc.) also apply.
Eval metrics
| Metric | Description |
|---|---|
| eval/avg_score | Average fraction of test cases passed |
| eval/percent_perfect | Fraction of problems with all tests passing |
| eval/percent_valid_regex | Fraction of responses with syntactically valid regex |
| train/percent_correct | Training accuracy (problems scoring at or above score_threshold) |
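The three eval metrics could be aggregated along these lines. This is a sketch under stated assumptions: `is_valid_regex` and `summarize` are hypothetical helpers, a missing answer (`None`) is counted as an invalid regex, and the values are returned as fractions in [0, 1] to match the descriptions in the table.

```python
import re
from typing import Dict, List, Optional


def is_valid_regex(pattern: Optional[str]) -> bool:
    """True if the pattern exists and compiles with Python's re module."""
    if pattern is None:
        return False  # assumption: no extracted answer counts as invalid
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


def summarize(patterns: List[Optional[str]], scores: List[float]) -> Dict[str, float]:
    """Aggregate per-problem patterns and scores into the eval metrics."""
    n = len(scores)
    if n == 0:
        return {}
    return {
        "eval/avg_score": sum(scores) / n,
        "eval/percent_perfect": sum(s == 1.0 for s in scores) / n,
        "eval/percent_valid_regex": sum(is_valid_regex(p) for p in patterns) / n,
    }


metrics = summarize([r"\d+", "(", None, "[a-z]+"], [1.0, 0.0, 0.0, 0.5])
print(metrics)
# {'eval/avg_score': 0.375, 'eval/percent_perfect': 0.25, 'eval/percent_valid_regex': 0.5}
```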
Dependencies
No extra dependencies beyond what Atropos already provides. Uses only Python's built-in re module for regex validation.