SQL Query Generation Environment

Train LLMs to generate correct SQL queries from natural language questions.

Overview

This environment uses the Salesforce/WikiSQL dataset to train language models on text-to-SQL tasks. Queries are verified by executing the generated SQL against in-memory SQLite databases and comparing results to ground truth.
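As a rough illustration of execution-based verification, the sketch below loads a table into an in-memory SQLite database, runs both queries, and compares the returned rows. The helper names `run_query` and `results_match` are hypothetical and are not taken from sql_executor.py; the table name `data` follows the prompt format used by this environment.

```python
import sqlite3


def run_query(columns, rows, sql):
    """Run `sql` against a throwaway in-memory table named "data".

    Returns the fetched rows, or None if SQLite rejects the query.
    (Hypothetical helper for illustration only.)
    """
    conn = sqlite3.connect(":memory:")
    try:
        # Quote column names so spaces and punctuation are accepted
        quoted = ", ".join('"' + c.replace('"', '""') + '"' for c in columns)
        conn.execute(f"CREATE TABLE data ({quoted})")
        placeholders = ", ".join("?" for _ in columns)
        conn.executemany(f"INSERT INTO data VALUES ({placeholders})", rows)
        return conn.execute(sql).fetchall()
    except (sqlite3.Error, sqlite3.Warning):
        return None
    finally:
        conn.close()


def results_match(columns, rows, generated_sql, gold_sql):
    """True when both queries execute and return the same rows."""
    predicted = run_query(columns, rows, generated_sql)
    expected = run_query(columns, rows, gold_sql)
    if predicted is None or expected is None:
        return False
    # Sort stringified rows so that row order does not affect the comparison
    return sorted(map(repr, predicted)) == sorted(map(repr, expected))
```

Building a fresh database per query keeps a faulty or destructive generated statement from affecting the gold query's result.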

Dataset

  • Source: Salesforce/WikiSQL
  • Size: 80,654 examples (train + validation + test)
  • Format: Natural language questions with table schemas and ground truth SQL
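For orientation, the raw WikiSQL files published on GitHub store each query in a structured form (a selected column index, an aggregation index, and a list of conditions) rather than as a SQL string. The sketch below shows one way such a record could be rendered as SQL. It assumes the raw JSONL layout with `sel`, `agg`, and `conds` keys; the function `wikisql_to_sql` is hypothetical and may differ from what wikisql_loader.py actually does.

```python
# Operator tables as defined by the WikiSQL reference code
AGG_OPS = ["", "MAX", "MIN", "COUNT", "SUM", "AVG"]
COND_OPS = ["=", ">", "<", "OP"]


def quote_ident(name):
    """Double-quote an identifier, escaping embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def wikisql_to_sql(query, columns, table_name="data"):
    """Render a structured WikiSQL query as a SQL string (illustrative)."""
    select = quote_ident(columns[query["sel"]])
    agg = AGG_OPS[query["agg"]]
    if agg:
        select = f"{agg}({select})"
    sql = f"SELECT {select} FROM {table_name}"

    clauses = []
    for col_idx, op_idx, value in query["conds"]:
        if isinstance(value, str):
            # Escape single quotes inside string literals
            literal = "'" + value.replace("'", "''") + "'"
        else:
            literal = str(value)
        clauses.append(f"{quote_ident(columns[col_idx])} {COND_OPS[op_idx]} {literal}")
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql
```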

Usage

Training Mode (with API Server)

# Terminal 1: Start the Atropos API
run-api

# Terminal 2: Run the environment
python sql_query_env.py serve --slurm False

Local Testing (without API)

python sql_query_env.py process --env.data_path_to_save_groups sql_output.jsonl

This generates sql_output.jsonl and sql_output.html for inspection.

With Local vLLM Server

python sql_query_env.py process \
    --env.data_path_to_save_groups sql_output.jsonl \
    --openai.base_url http://localhost:9001/v1 \
    --openai.model_name YOUR_MODEL_NAME

Reward Function

Score | Condition
1.0   | Generated SQL executes and returns the same result as the gold SQL
-1.0  | SQL fails to execute or returns an incorrect result

When all responses in a group are correct, a length penalty is applied to encourage concise solutions.
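The scoring rule can be sketched as follows. The base scores of 1.0 and -1.0 come from the table above, but the exact form of the length penalty is not specified here, so the linear scaling and the `max_penalty` parameter below are assumptions made purely for illustration; `score_group` is a hypothetical name.

```python
def score_group(correct_flags, token_lengths, max_penalty=0.5):
    """Assign a reward to each response in a group.

    correct_flags: one bool per response (True if the SQL result matched).
    token_lengths: one length per response, in tokens.
    max_penalty:   ASSUMED cap on the deduction for the longest response.
    """
    scores = [1.0 if ok else -1.0 for ok in correct_flags]
    if not scores or not all(correct_flags):
        # Mixed or empty groups keep the plain +1 / -1 scores
        return scores

    shortest, longest = min(token_lengths), max(token_lengths)
    if longest == shortest:
        return scores

    # All responses are correct: deduct more from longer responses
    # (assumed linear scheme, not necessarily the environment's own)
    span = longest - shortest
    return [1.0 - max_penalty * (n - shortest) / span for n in token_lengths]
```

Without some penalty, a group in which every response is correct would receive identical scores and provide no learning signal; ranking by length restores a preference within the group.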

Prompt Format

The model receives a table schema and question:

Table: data
Columns: col1, col2, col3
Sample data:
  value1 | value2 | value3

Question: What is the value of col1 where col2 equals X?
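A prompt in this layout could be assembled along the following lines. This is a minimal sketch: `build_prompt` is a hypothetical helper, and the real environment may format sample rows or whitespace differently.

```python
def build_prompt(columns, sample_rows, question, table_name="data"):
    """Format a table schema, sample rows, and a question as a prompt."""
    lines = [
        f"Table: {table_name}",
        "Columns: " + ", ".join(columns),
        "Sample data:",
    ]
    for row in sample_rows:
        # Each sample row is indented and pipe-separated
        lines.append("  " + " | ".join(str(value) for value in row))
    lines.append("")
    lines.append(f"Question: {question}")
    return "\n".join(lines)
```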

Output should be in boxed format:

<think>
[Chain of thought reasoning]
</think>

\boxed{SELECT col1 FROM data WHERE col2 = 'X'}
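Pulling the query back out of such a response might look like the sketch below. It skips anything inside the reasoning block, takes the last \boxed{...} occurrence, and matches braces so that a `{` or `}` inside the SQL does not truncate the result. The function name `extract_boxed_sql` is hypothetical, and the extraction logic in sql_executor.py may handle more patterns.

```python
def extract_boxed_sql(response):
    """Return the contents of the last \\boxed{...} after the reasoning block.

    Returns None when no complete, non-empty boxed answer is found.
    """
    marker = "\\boxed{"
    # Keep only the text after the final </think>, if one is present
    text = response.rpartition("</think>")[2]

    start = text.rfind(marker)
    if start == -1:
        return None

    begin = start + len(marker)
    depth = 1
    for i in range(begin, len(text)):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                # Found the brace that closes \boxed{
                return text[begin:i].strip() or None
    return None  # unbalanced braces
```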

Unit Tests

# Run unit tests
python -m pytest test_sql_executor.py -v

The 19 unit tests cover:

  • Table creation with special column names
  • SQL execution and error handling
  • \boxed{} extraction patterns
  • Result comparison and normalization
  • End-to-end scoring integration
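To illustrate what result comparison and normalization can involve, the sketch below treats numeric strings and numbers as equal, ignores case and surrounding whitespace in text, and compares rows as multisets so that ordering is irrelevant while duplicates still count. These specific rules and the names `normalize_value` and `normalize_rows` are assumptions for illustration; the tested behavior in sql_executor.py may differ.

```python
import math
from collections import Counter


def normalize_value(value):
    """Map equivalent cell values to a single canonical form (assumed rules)."""
    if value is None:
        return None
    if isinstance(value, (int, float)):
        return float(value)
    text = str(value).strip().lower()
    try:
        number = float(text.replace(",", ""))
    except ValueError:
        return text
    # Keep text such as "nan" or "inf" as text rather than a special float
    return number if math.isfinite(number) else text


def normalize_rows(rows):
    """Convert a result set into an order-insensitive multiset of rows."""
    return Counter(tuple(normalize_value(v) for v in row) for row in rows)
```

Two result sets are then considered equal when `normalize_rows(a) == normalize_rows(b)`.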

LLM Integration Test

The environment has been verified with Qwen3-8B on an NVIDIA H200:

# Run integration test with a local vLLM server
python test_integration.py --base_url http://localhost:8000/v1 --model Qwen/Qwen3-8B

Test results:

  • 40% accuracy (4 of 10 correct) on 10 random WikiSQL examples
  • SQL extraction from \boxed{} working correctly
  • Execution-based scoring producing correct reward signals

Files

File                 | Description
sql_query_env.py     | Main environment implementation
sql_executor.py      | SQLite execution and scoring utilities
wikisql_loader.py    | WikiSQL dataset loader (from GitHub)
test_sql_executor.py | Unit tests (19 tests)
test_integration.py  | LLM integration test

Author

Community contribution to Atropos.