atropos/environments/community/examcraft/README.md

116 lines
3.5 KiB
Markdown

# 🎓 ExamCraft: Adaptive LLM Teacher Training Environment
**Hackathon Submission**: Train language models to become adaptive teachers through reinforcement learning.
## 🌟 Overview
ExamCraft trains LLMs to be better teachers by generating adaptive questions, providing explanations, and creating personalized lesson plans. The environment rewards effective teaching strategies and penalizes poor ones.
### Key Features
- **Adaptive Question Generation**: Targets student weak areas automatically
- **Real-time Difficulty Adjustment**: Matches challenge level to student ability
- **Comprehensive Teaching Actions**: Questions, explanations, and lesson plans
- **Sophisticated Reward System**: Multi-factor scoring for teaching effectiveness
- **Student Learning Simulation**: Realistic proficiency progression
## 🚀 Quick Start
### Prerequisites
```bash
pip install -r requirements.txt
```
### Running the Environment
1. **Start Atropos trajectory API**:
```bash
run-api
```
2. **Run environment in serve mode**:
```bash
python examcraft_server.py serve --slurm false
```
3. **Generate inference-only rollouts**:
```bash
python examcraft_server.py process --env.data_path_to_save_groups demo_output.jsonl
```
### Training with Atropos
```bash
# Generate SFT data
atropos-sft-gen examcraft_sft.jsonl --tokenizer NousResearch/Hermes-3-Llama-3.1-8B
# Generate DPO data
atropos-dpo-gen examcraft_dpo.jsonl --tokenizer NousResearch/Hermes-3-Llama-3.1-8B
```
## 🎯 Environment Design
### Teaching Actions
- **QUESTION**: Generate adaptive multiple-choice questions
- **EXPLANATION**: Provide detailed concept explanations
- **LESSON_PLAN**: Create personalized study plans
### Reward Components
1. **Correctness Reward**: Base reward for student getting questions right
2. **Targeting Bonus**: Extra points for focusing on weak topics
3. **Difficulty Appropriateness**: Rewards for matching difficulty to ability
4. **Quality Bonus**: Higher scores for detailed, well-structured content
5. **Learning Impact**: Bonuses for explanations that boost understanding
### Student Simulation
- Probabilistic responses based on topic proficiency
- Dynamic learning from good teaching
- Realistic difficulty sensitivity
- Session momentum effects
## 📊 Example Metrics
The environment tracks:
- Student accuracy improvement across topics
- Teaching effectiveness scores
- Adaptive difficulty selection
- Content quality metrics
- Learning progression over time
## 🏆 Why This Matters
Adaptive AI tutoring can revolutionize education by:
- Personalizing learning experiences at scale
- Identifying knowledge gaps automatically
- Providing instant, detailed feedback
- Making quality education globally accessible
## 🔧 Configuration
### Student Profile Format
```json
{
"student_id": "student001",
"target_grade": "11th grade",
"learning_goal": "Master linear algebra basics",
"current_avg_score": 73,
"topics": [
{"name": "vectors", "proficiency": 0.65},
{"name": "matrices", "proficiency": 0.50}
],
"preferred_learning_style": "visual"
}
```
### Environment Parameters
- `max_questions_per_episode`: 8
- `student_learning_rate`: 0.03
- `enable_lesson_plans`: true
## 📈 Results Preview
After training, teachers learn to:
- Prioritize topics where students struggle most
- Adapt question difficulty based on recent performance
- Generate detailed explanations that boost understanding
- Create comprehensive lesson plans targeting weak areas
Built for the **Nous Research RL Environments Hackathon** 🚀