# Regex Generation Environment

An RL environment that trains language models to generate correct Python-compatible regular expressions from natural language descriptions and example test cases.

## How it works

Each problem gives the model:

- A natural language description of the pattern to match
- A set of strings that **should** match
- A set of strings that **should not** match

The model must produce a regex pattern inside `` tags. The pattern is tested using `re.fullmatch()` against all provided examples.

## Reward signal

The reward is the fraction of test cases passed (both positive and negative). A score of 1.0 means the regex correctly matches all positive examples and rejects all negative ones.

Groups where all rollouts score identically are discarded (no learning signal).

## Problem set

The environment ships with 28 hand-crafted regex problems across three difficulty levels:

- **Easy**: Basic patterns (digits only, starts with X, exact match)
- **Medium**: Emails, dates, phone numbers, hex colors, zip codes
- **Hard**: IPv4 addresses, semantic versioning, URLs, repeated words

Problems are split 80/20 into train/test sets.

## Running

```bash
# Basic training
python regex_env.py serve \
  --env.tokenizer_name="NousResearch/DeepHermes-3-Llama-3-3B-Preview" \
  --openai.base_url http://localhost:9001/v1

# Only easy/medium problems
python regex_env.py serve \
  --env.difficulties='["easy", "medium"]'
```

## Config options

| Option | Type | Default | Description |
|--------|------|---------|-------------|
| `difficulties` | list[str] | `["easy", "medium", "hard"]` | Difficulty levels to include |
| `score_threshold` | float | `1.0` | Min score to count as "correct" in metrics |

Standard `BaseEnvConfig` options (`group_size`, `max_token_length`, etc.) also apply.

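
To make the scoring rule and the `score_threshold` option concrete, here is a minimal sketch of how a single pattern could be scored with `re.fullmatch()`. It is an illustration, not the environment's actual code: the function names (`score_regex`, `is_correct`, `has_learning_signal`) are hypothetical, and giving an invalid pattern a score of 0.0 is an assumption.

```python
import re


def score_regex(pattern, should_match, should_not_match):
    """Return the fraction of test cases the pattern passes (0.0 to 1.0)."""
    try:
        compiled = re.compile(pattern)
    except re.error:
        # Assumption: a pattern that does not compile earns no reward.
        return 0.0

    total = len(should_match) + len(should_not_match)
    if total == 0:
        return 0.0

    # Positive cases pass when the whole string matches.
    passed = sum(compiled.fullmatch(s) is not None for s in should_match)
    # Negative cases pass when the whole string does not match.
    passed += sum(compiled.fullmatch(s) is None for s in should_not_match)
    return passed / total


def is_correct(score, score_threshold=1.0):
    """Treat score_threshold as the minimum score that counts as correct."""
    return score >= score_threshold


def has_learning_signal(group_scores):
    """A group whose rollouts all score identically carries no signal."""
    return len(set(group_scores)) > 1
```

For example, `score_regex(r"\d*", ["123"], ["", "a"])` passes two of three cases, because `\d*` also matches the empty string, which is listed as a negative example.
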
## Eval metrics

| Metric | Description |
|--------|-------------|
| `eval/avg_score` | Average fraction of test cases passed |
| `eval/percent_perfect` | Fraction of problems with all tests passing |
| `eval/percent_valid_regex` | Fraction of responses with syntactically valid regex |
| `train/percent_correct` | Training accuracy (problems scoring at or above `score_threshold`) |

## Dependencies

No extra dependencies beyond what Atropos already provides. Uses only Python's built-in `re` module for regex validation.
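
As a rough illustration of how the eval metrics relate to per-problem results, the sketch below checks syntactic validity with `re.compile()` and aggregates a list of `(pattern, score)` pairs. The helper names (`is_valid_regex`, `summarize_eval`) and the input shape are assumptions made for this example. Treating a missing pattern (`None`) as invalid is also an assumption.

```python
import re


def is_valid_regex(pattern):
    """Return True if the pattern compiles with Python's re module."""
    if pattern is None:
        # Assumption: a response with no extractable pattern is invalid.
        return False
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


def summarize_eval(results):
    """Aggregate (pattern, score) pairs, one per eval problem, into metrics."""
    n = len(results)
    if n == 0:
        return {
            "eval/avg_score": 0.0,
            "eval/percent_perfect": 0.0,
            "eval/percent_valid_regex": 0.0,
        }
    return {
        # Mean fraction of test cases passed across problems.
        "eval/avg_score": sum(score for _, score in results) / n,
        # Share of problems where every test case passed.
        "eval/percent_perfect": sum(score == 1.0 for _, score in results) / n,
        # Share of responses whose pattern compiles.
        "eval/percent_valid_regex": sum(is_valid_regex(p) for p, _ in results) / n,
    }
```

All three values here are fractions between 0 and 1, following the table's wording. Whether the environment reports them as fractions or percentages is not specified.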