Add regex generation environment for community

2026-04-19 12:57:58 +00:00 · 2026-02-11 23:01:48 +03:30 · 2026-02-11 23:01:48 +03:30 · 86d5163316
commit 86d5163316
parent 81b2d4daab
4 changed files with 822 additions and 0 deletions
--- a/environments/community/regex_generation/README.md
+++ b/environments/community/regex_generation/README.md
@ -0,0 +1,61 @@
+# Regex Generation Environment
+
+An RL environment that trains language models to generate correct Python-compatible regular expressions from natural language descriptions and example test cases.
+
+## How it works
+
+Each problem gives the model:
+- A natural language description of the pattern to match
+- A set of strings that **should** match
+- A set of strings that **should not** match
+
+The model must produce a regex pattern inside `<answer>` tags. The pattern is tested using `re.fullmatch()` against all provided examples.
+
+## Reward signal
+
+The reward is the fraction of test cases passed (both positive and negative). A score of 1.0 means the regex correctly matches all positive examples and rejects all negative ones. Groups where all rollouts score identically are discarded (no learning signal).
+
+## Problem set
+
+The environment ships with 28 hand-crafted regex problems across three difficulty levels:
+
+- **Easy**: Basic patterns (digits only, starts with X, exact match)
+- **Medium**: Emails, dates, phone numbers, hex colors, zip codes
+- **Hard**: IPv4 addresses, semantic versioning, URLs, repeated words
+
+Problems are split 80/20 into train/test sets.
+
+## Running
+
+```bash
+# Basic training
+python regex_env.py serve \
+    --env.tokenizer_name="NousResearch/DeepHermes-3-Llama-3-3B-Preview" \
+    --openai.base_url http://localhost:9001/v1
+
+# Only easy/medium problems
+python regex_env.py serve \
+    --env.difficulties='["easy", "medium"]'
+```
+
+## Config options
+
+| Option | Type | Default | Description |
+|--------|------|---------|-------------|
+| `difficulties` | list[str] | `["easy", "medium", "hard"]` | Difficulty levels to include |
+| `score_threshold` | float | `1.0` | Min score to count as "correct" in metrics |
+
+Standard `BaseEnvConfig` options (`group_size`, `max_token_length`, etc.) also apply.
+
+## Eval metrics
+
+| Metric | Description |
+|--------|-------------|
+| `eval/avg_score` | Average fraction of test cases passed |
+| `eval/percent_perfect` | Fraction of problems with all tests passing |
+| `eval/percent_valid_regex` | Fraction of responses with syntactically valid regex |
+| `train/percent_correct` | Training accuracy (problems scoring above threshold) |
+
+## Dependencies
+
+No extra dependencies beyond what Atropos already provides. Uses only Python's built-in `re` module for regex validation.