# 🎓 ExamCraft: Adaptive LLM Teacher Training Environment

**Hackathon Submission**: Train language models to become adaptive teachers through reinforcement learning.

## 🌟 Overview

ExamCraft trains LLMs to be better teachers by generating adaptive questions, providing explanations, and creating personalized lesson plans. The environment rewards effective teaching strategies and penalizes poor ones.

### Key Features

- **Adaptive Question Generation**: Targets student weak areas automatically
- **Real-time Difficulty Adjustment**: Matches challenge level to student ability
- **Comprehensive Teaching Actions**: Questions, explanations, and lesson plans
- **Sophisticated Reward System**: Multi-factor scoring for teaching effectiveness
- **Student Learning Simulation**: Realistic proficiency progression
## 🚀 Quick Start

### Prerequisites

```bash
pip install -r requirements.txt
```
### Running the Environment

1. **Start the Atropos trajectory API**:

```bash
run-api
```

2. **Run the environment in serve mode**:

```bash
python examcraft_server.py serve --slurm false
```

3. **Generate inference-only rollouts**:

```bash
python examcraft_server.py process --env.data_path_to_save_groups demo_output.jsonl
```
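The `process` command writes one JSON object per rollout group to `demo_output.jsonl`. A minimal sketch for inspecting that file follows; it assumes only that the output is JSON Lines, since the exact fields inside each group depend on your Atropos version:

```python
# Peek at the rollout groups saved by the `process` command above.
# Assumes only that demo_output.jsonl is JSON Lines; field names vary by Atropos version.
import json

with open("demo_output.jsonl") as f:
    for i, line in enumerate(f):
        group = json.loads(line)
        print(f"group {i}: keys = {sorted(group.keys())}")
        if i >= 4:  # just sample the first five groups
            break
```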
### Training with Atropos

```bash
# Generate SFT data
atropos-sft-gen examcraft_sft.jsonl --tokenizer NousResearch/Hermes-3-Llama-3.1-8B

# Generate DPO data
atropos-dpo-gen examcraft_dpo.jsonl --tokenizer NousResearch/Hermes-3-Llama-3.1-8B
```
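As a quick sanity check, the generated file can be loaded with the Hugging Face `datasets` library. This sketch assumes `examcraft_sft.jsonl` is the output path passed above; the column names will be whatever `atropos-sft-gen` emits:

```python
# Sanity-check the SFT file generated above (column names depend on atropos-sft-gen).
from datasets import load_dataset

ds = load_dataset("json", data_files="examcraft_sft.jsonl", split="train")
print(ds)      # row count and column names
print(ds[0])   # first record
```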
## 🎯 Environment Design

### Teaching Actions

- **QUESTION**: Generate adaptive multiple-choice questions
- **EXPLANATION**: Provide detailed concept explanations
- **LESSON_PLAN**: Create personalized study plans

A sketch of what these actions might look like as structured outputs is shown below.
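It is purely illustrative: the field names are assumptions, not the environment's actual schema, which lives in `examcraft_server.py`.

```python
# Hypothetical teaching-action payloads -- illustrative only; the real schema
# is defined by the environment and may differ.
QUESTION_ACTION = {
    "action": "QUESTION",
    "topic": "matrices",
    "difficulty": 0.6,  # assumed 0.0 (easy) to 1.0 (hard) scale
    "question": "Which of the following is the 2x2 identity matrix?",
    "choices": ["[[1,0],[0,1]]", "[[0,1],[1,0]]", "[[1,1],[1,1]]", "[[0,0],[0,0]]"],
    "correct_choice": 0,
}

EXPLANATION_ACTION = {
    "action": "EXPLANATION",
    "topic": "vectors",
    "explanation": "A vector has both magnitude and direction...",
}

LESSON_PLAN_ACTION = {
    "action": "LESSON_PLAN",
    "focus_topics": ["matrices"],  # weakest topics first
    "steps": ["Review matrix notation", "Practice 2x2 multiplication", "Quiz on determinants"],
}
```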
### Reward Components

1. **Correctness Reward**: Base reward when the student answers a question correctly
2. **Targeting Bonus**: Extra points for focusing on weak topics
3. **Difficulty Appropriateness**: Rewards for matching difficulty to ability
4. **Quality Bonus**: Higher scores for detailed, well-structured content
5. **Learning Impact**: Bonuses for explanations that boost understanding

The sketch below shows one way these components could combine into a single score.
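The weights and formulas here are made up for illustration; the actual values are defined in the environment code.

```python
# Illustrative reward combination -- weights are assumptions, not the environment's values.
def teaching_reward(
    student_correct: bool,      # 1. correctness: did the student answer correctly?
    targeted_weak_topic: bool,  # 2. targeting: did the action address the weakest topic?
    difficulty_gap: float,      # 3. |question difficulty - student proficiency|
    quality_score: float,       # 4. content quality, in [0, 1]
    proficiency_gain: float,    # 5. learning impact: proficiency change after the turn
) -> float:
    reward = 1.0 if student_correct else 0.0
    reward += 0.5 if targeted_weak_topic else 0.0
    reward += 0.5 * max(0.0, 1.0 - difficulty_gap)  # highest when difficulty matches ability
    reward += 0.25 * quality_score
    reward += 2.0 * max(0.0, proficiency_gain)      # bonus for measurable learning
    return reward


# Example: a correct answer on a weak topic at a well-matched difficulty.
print(teaching_reward(True, True, difficulty_gap=0.1, quality_score=0.8, proficiency_gain=0.03))
```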
### Student Simulation

- Probabilistic responses based on topic proficiency
- Dynamic learning from good teaching
- Realistic difficulty sensitivity
- Session momentum effects

A rough sketch of this kind of student model is given below.
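All constants and update rules are illustrative assumptions; the real simulator in `examcraft_server.py` may differ.

```python
import random


# Illustrative student simulator -- constants and update rules are assumptions.
class SimulatedStudent:
    def __init__(self, proficiency: dict, learning_rate: float = 0.03):
        self.proficiency = dict(proficiency)  # topic name -> proficiency in [0, 1]
        self.learning_rate = learning_rate
        self.momentum = 0.0                   # small boost after recent successes

    def answer(self, topic: str, difficulty: float) -> bool:
        """Probabilistic response: higher proficiency and easier questions help."""
        p = self.proficiency.get(topic, 0.5) - 0.4 * (difficulty - 0.5) + self.momentum
        correct = random.random() < min(max(p, 0.05), 0.95)
        self.momentum = min(self.momentum + 0.02, 0.10) if correct else 0.0
        return correct

    def learn(self, topic: str, teaching_quality: float) -> None:
        """Good teaching nudges proficiency upward."""
        gain = self.learning_rate * teaching_quality
        self.proficiency[topic] = min(1.0, self.proficiency.get(topic, 0.5) + gain)


student = SimulatedStudent({"vectors": 0.65, "matrices": 0.50})
print(student.answer("matrices", difficulty=0.5))
```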
## 📊 Example Metrics

The environment tracks:

- Student accuracy improvement across topics
- Teaching effectiveness scores
- Adaptive difficulty selection
- Content quality metrics
- Learning progression over time
## 🏆 Why This Matters

Adaptive AI tutoring can revolutionize education by:

- Personalizing learning experiences at scale
- Identifying knowledge gaps automatically
- Providing instant, detailed feedback
- Making quality education globally accessible
## 🔧 Configuration

### Student Profile Format

```json
{
  "student_id": "student001",
  "target_grade": "11th grade",
  "learning_goal": "Master linear algebra basics",
  "current_avg_score": 73,
  "topics": [
    {"name": "vectors", "proficiency": 0.65},
    {"name": "matrices", "proficiency": 0.50}
  ],
  "preferred_learning_style": "visual"
}
```
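A minimal sketch of loading such a profile and picking the weakest topic to target first; it assumes the JSON above is saved as `student_profile.json`, and the helper itself is hypothetical rather than part of the environment:

```python
# Hypothetical helper -- not part of the environment.
import json

with open("student_profile.json") as f:
    profile = json.load(f)

# The lowest-proficiency topic is the natural first target for adaptive questions.
weakest = min(profile["topics"], key=lambda t: t["proficiency"])
print(
    f"{profile['student_id']}: weakest topic is {weakest['name']} "
    f"(proficiency {weakest['proficiency']:.2f})"
)
```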
### Environment Parameters

- `max_questions_per_episode`: 8
- `student_learning_rate`: 0.03
- `enable_lesson_plans`: true

These defaults are restated as a small config sketch below.
## 📈 Results Preview

After training, teachers learn to:

- Prioritize topics where students struggle most
- Adapt question difficulty based on recent performance
- Generate detailed explanations that boost understanding
- Create comprehensive lesson plans targeting weak areas

Built for the **Nous Research RL Environments Hackathon** 🚀