Regex Generation Environment

An RL environment that trains language models to generate correct Python-compatible regular expressions from natural language descriptions and example test cases.

How it works

Each problem gives the model:

  • A natural language description of the pattern to match
  • A set of strings that should match
  • A set of strings that should not match

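The model must produce a regex pattern inside <answer> tags. The pattern is tested with re.fullmatch() against all provided examples, so it must match each positive example in its entirety and must not fully match any negative example.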
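The extraction and matching step can be sketched as follows. This is a minimal illustration, not the code in regex_env.py: the helper names `extract_pattern` and `passes` are hypothetical, and taking the last <answer> block and stripping whitespace are assumptions.

```python
import re
from typing import Optional

# Assumption: the answer is the content of the last <answer>...</answer> block.
ANSWER_RE = re.compile(r"<answer>(.*?)</answer>", re.DOTALL)


def extract_pattern(response: str) -> Optional[str]:
    """Return the text inside the last <answer> block, or None if absent."""
    matches = ANSWER_RE.findall(response)
    return matches[-1].strip() if matches else None


def passes(pattern: str, text: str, should_match: bool) -> bool:
    """True if re.fullmatch agrees with the expected outcome for one example."""
    try:
        matched = re.fullmatch(pattern, text) is not None
    except re.error:
        # Assumption: a syntactically invalid pattern fails every test case.
        return False
    return matched == should_match


pattern = extract_pattern(r"Digits only. <answer>\d+</answer>")
print(pattern, passes(pattern, "123", True), passes(pattern, "123abc", False))
```

Because re.fullmatch() anchors at both ends, `\d+` accepts "123" but rejects "123abc" without needing explicit `^` and `$`.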

Reward signal

The reward is the fraction of test cases passed, counting both positive and negative examples. A score of 1.0 means the regex matches every positive example and rejects every negative one. Groups in which all rollouts receive the same score are discarded because they carry no learning signal.
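A minimal sketch of this scoring rule, under stated assumptions: `score_pattern` and `has_learning_signal` are hypothetical names, and giving an invalid regex a score of 0.0 is an assumption rather than something this README confirms.

```python
import re
from typing import List


def score_pattern(pattern: str, positives: List[str], negatives: List[str]) -> float:
    """Fraction of examples the pattern classifies correctly (0.0 to 1.0)."""
    total = len(positives) + len(negatives)
    if total == 0:
        return 0.0
    try:
        compiled = re.compile(pattern)
    except re.error:
        return 0.0  # assumption: an invalid regex earns no credit
    # Positive examples pass when fully matched; negatives pass when not.
    passed = sum(compiled.fullmatch(s) is not None for s in positives)
    passed += sum(compiled.fullmatch(s) is None for s in negatives)
    return passed / total


def has_learning_signal(scores: List[float]) -> bool:
    """A group is kept only if at least two rollouts scored differently."""
    return len(set(scores)) > 1


positives, negatives = ["1", "22"], ["a", "1a"]
print(score_pattern(r"\d+", positives, negatives))  # 1.0
print(score_pattern(r"\d", positives, negatives))   # 0.75
print(has_learning_signal([0.75, 0.75, 0.75]))      # False
```

Note that under this rule an overly permissive pattern such as `.*` still earns partial credit (0.5 on the examples above), since it passes every positive case while failing every negative one.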

Problem set

The environment ships with 28 hand-crafted regex problems across three difficulty levels:

  • Easy: Basic patterns (digits only, starts with X, exact match)
  • Medium: Emails, dates, phone numbers, hex colors, zip codes
  • Hard: IPv4 addresses, semantic versioning, URLs, repeated words

Problems are split 80/20 into train/test sets.
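One way such a split could be implemented is shown below. This is a sketch only: the function name `split_problems`, the fixed seed, the shuffle, and the rounding rule are all assumptions, and the actual environment may split differently.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_problems(
    problems: Sequence[T], train_frac: float = 0.8, seed: int = 42
) -> Tuple[List[T], List[T]]:
    """Shuffle deterministically, then cut into train and test portions."""
    shuffled = list(problems)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    cut = int(len(shuffled) * train_frac)  # assumption: round down
    return shuffled[:cut], shuffled[cut:]


train, test = split_problems(list(range(28)))
print(len(train), len(test))  # 22 6
```

With 28 problems and round-down truncation, this sketch yields 22 training and 6 test problems.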

Running

# Basic training
python regex_env.py serve \
    --env.tokenizer_name="NousResearch/DeepHermes-3-Llama-3-3B-Preview" \
    --openai.base_url http://localhost:9001/v1

# Only easy/medium problems
python regex_env.py serve \
    --env.difficulties='["easy", "medium"]'

Config options

| Option | Type | Default | Description |
|---|---|---|---|
| difficulties | list[str] | ["easy", "medium", "hard"] | Difficulty levels to include |
| score_threshold | float | 1.0 | Min score to count as "correct" in metrics |

Standard BaseEnvConfig options (group_size, max_token_length, etc.) also apply.
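To illustrate how the two options might be applied, here is a small sketch. `RegexOptions` is a hypothetical stand-in dataclass, not the environment's real config class (which extends BaseEnvConfig), and the `"difficulty"` key on each problem is likewise an assumed field name.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RegexOptions:
    """Hypothetical stand-in mirroring the two documented options."""

    difficulties: List[str] = field(
        default_factory=lambda: ["easy", "medium", "hard"]
    )
    score_threshold: float = 1.0


def select_problems(
    problems: List[Dict[str, str]], opts: RegexOptions
) -> List[Dict[str, str]]:
    """Keep only problems whose (assumed) 'difficulty' field is enabled."""
    allowed = set(opts.difficulties)
    return [p for p in problems if p["difficulty"] in allowed]


def is_correct(score: float, opts: RegexOptions) -> bool:
    """A rollout counts as correct when it reaches the minimum score."""
    return score >= opts.score_threshold


problems = [
    {"difficulty": "easy", "description": "digits only"},
    {"difficulty": "hard", "description": "IPv4 address"},
]
opts = RegexOptions(difficulties=["easy", "medium"])
print([p["description"] for p in select_problems(problems, opts)])
```

With the default score_threshold of 1.0, only a regex that passes every test case counts as correct; lowering the threshold would also count partially correct patterns.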

Eval metrics

| Metric | Description |
|---|---|
| eval/avg_score | Average fraction of test cases passed |
| eval/percent_perfect | Fraction of problems with all tests passing |
| eval/percent_valid_regex | Fraction of responses with syntactically valid regex |
| train/percent_correct | Training accuracy (fraction of rollouts scoring at or above score_threshold) |
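The eval metrics can be derived from per-problem results roughly as follows. This is a sketch under assumptions: `summarize_eval` is a hypothetical helper, and reporting the "percent" metrics as fractions in [0, 1] follows the descriptions above rather than the actual implementation.

```python
from typing import Dict, List


def summarize_eval(scores: List[float], valid: List[bool]) -> Dict[str, float]:
    """Aggregate per-problem scores and regex validity flags into metrics."""
    n = len(scores)
    if n == 0:
        return {
            "eval/avg_score": 0.0,
            "eval/percent_perfect": 0.0,
            "eval/percent_valid_regex": 0.0,
        }
    return {
        "eval/avg_score": sum(scores) / n,
        # A problem is "perfect" when every test case passed (score of 1.0).
        "eval/percent_perfect": sum(s == 1.0 for s in scores) / n,
        "eval/percent_valid_regex": sum(valid) / n,
    }


metrics = summarize_eval([1.0, 0.5, 0.0, 1.0], [True, True, False, True])
print(metrics)
```

For the four sample results above, the average score is 0.625, half of the problems are perfect, and three quarters of the responses contain a valid regex.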

Dependencies

No extra dependencies beyond what Atropos already provides. Uses only Python's built-in re module for regex validation.