Add SQL Query Generation Environment

2026-04-19 12:57:58 +00:00 · 2026-01-05 16:43:34 +01:00 · 2026-01-05 16:43:34 +01:00 · c927794248
commit c927794248
parent 6d79d4c7ad
6 changed files with 1427 additions and 0 deletions
--- a/environments/community/sql_query_env/README.md
+++ b/environments/community/sql_query_env/README.md
@ -0,0 +1,115 @@
+# SQL Query Generation Environment
+
+Train LLMs to generate correct SQL queries from natural language questions.
+
+## Overview
+
+This environment uses the [Salesforce/WikiSQL](https://huggingface.co/datasets/Salesforce/wikisql) dataset to train language models on text-to-SQL tasks. Queries are verified by **executing** the generated SQL against in-memory SQLite databases and comparing results to ground truth.
+
+## Dataset
+
+- **Source**: [Salesforce/WikiSQL](https://huggingface.co/datasets/Salesforce/wikisql)
+- **Size**: 80,654 examples (train + validation + test)
+- **Format**: Natural language questions with table schemas and ground truth SQL
+
+## Usage
+
+### Training Mode (with API Server)
+
+```bash
+# Terminal 1: Start the Atropos API
+run-api
+
+# Terminal 2: Run the environment
+python sql_query_env.py serve --slurm False
+```
+
+### Local Testing (without API)
+
+```bash
+python sql_query_env.py process --env.data_path_to_save_groups sql_output.jsonl
+```
+
+This generates `sql_output.jsonl` and `sql_output.html` for inspection.
+
+### With Local vLLM Server
+
+```bash
+python sql_query_env.py process \
+    --env.data_path_to_save_groups sql_output.jsonl \
+    --openai.base_url http://localhost:9001/v1 \
+    --openai.model_name YOUR_MODEL_NAME
+```
+
+## Reward Function
+
+| Score | Condition |
+|-------|-----------|
+| **1.0** | Generated SQL executes and returns same result as gold SQL |
+| **-1.0** | SQL fails to execute or returns incorrect result |
+
+When all responses in a group are correct, a length penalty is applied to encourage concise solutions.
+
+## Prompt Format
+
+The model receives a table schema and question:
+
+```
+Table: data
+Columns: col1, col2, col3
+Sample data:
+  value1 | value2 | value3
+
+Question: What is the value of col1 where col2 equals X?
+```
+
+Output should be in boxed format:
+```
+<think>
+[Chain of thought reasoning]
+</think>
+
+\boxed{SELECT col1 FROM data WHERE col2 = 'X'}
+```
+
+## Unit Tests
+
+```bash
+# Run unit tests
+python -m pytest test_sql_executor.py -v
+```
+
+All 19 tests cover:
+- Table creation with special column names
+- SQL execution and error handling
+- `\boxed{}` extraction patterns
+- Result comparison and normalization
+- End-to-end scoring integration
+
+## LLM Integration Test
+
+The environment has been verified with Qwen3-8B on an NVIDIA H200:
+
+```bash
+# Run integration test with a local vLLM server
+python test_integration.py --base_url http://localhost:8000/v1 --model Qwen/Qwen3-8B
+```
+
+Test results:
+- **40% accuracy** on 10 random WikiSQL examples
+- SQL extraction from `\boxed{}` working correctly
+- Execution-based scoring producing correct reward signals
+
+## Files
+
+| File | Description |
+|------|-------------|
+| `sql_query_env.py` | Main environment implementation |
+| `sql_executor.py` | SQLite execution and scoring utilities |
+| `wikisql_loader.py` | WikiSQL dataset loader (from GitHub) |
+| `test_sql_executor.py` | Unit tests (19 tests) |
+| `test_integration.py` | LLM integration test |
+
+## Author
+
+Community contribution to Atropos.