# 🎓 ExamCraft: Adaptive LLM Teacher Training Environment

**Hackathon Submission**: Train language models to become adaptive teachers through reinforcement learning.

## 🌟 Overview

ExamCraft trains LLMs to be better teachers by generating adaptive questions, providing explanations, and creating personalized lesson plans. The environment rewards effective teaching strategies and penalizes poor ones.

### Key Features

- **Adaptive Question Generation**: Targets student weak areas automatically
- **Real-time Difficulty Adjustment**: Matches challenge level to student ability
- **Comprehensive Teaching Actions**: Questions, explanations, and lesson plans
- **Sophisticated Reward System**: Multi-factor scoring for teaching effectiveness
- **Student Learning Simulation**: Realistic proficiency progression
## 🚀 Quick Start

### Prerequisites

```bash
pip install -r requirements.txt
```
### Running the Environment

1. **Start the Atropos trajectory API**:

```bash
run-api
```

2. **Run the environment in serve mode**:

```bash
python examcraft_server.py serve --slurm false
```

3. **Generate inference-only rollouts**:

```bash
python examcraft_server.py process --env.data_path_to_save_groups demo_output.jsonl
```
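The `process` command writes one JSON object per rollout group to `demo_output.jsonl`. A minimal sketch for inspecting that file follows; it assumes only that the output is JSON Lines, since the exact fields inside each group depend on your Atropos version:

```python
# Peek at the rollout groups saved by the `process` command above.
# Assumes only that demo_output.jsonl is JSON Lines; field names vary by Atropos version.
import json

with open("demo_output.jsonl") as f:
    for i, line in enumerate(f):
        group = json.loads(line)
        print(f"group {i}: keys = {sorted(group.keys())}")
        if i >= 4:  # just sample the first five groups
            break
```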
### Training with Atropos

```bash
# Generate SFT data
atropos-sft-gen examcraft_sft.jsonl --tokenizer NousResearch/Hermes-3-Llama-3.1-8B

# Generate DPO data
atropos-dpo-gen examcraft_dpo.jsonl --tokenizer NousResearch/Hermes-3-Llama-3.1-8B
```
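As a quick sanity check, the generated file can be loaded with the Hugging Face `datasets` library. This sketch assumes `examcraft_sft.jsonl` is the output path passed above; the column names will be whatever `atropos-sft-gen` emits:

```python
# Sanity-check the SFT file generated above (column names depend on atropos-sft-gen).
from datasets import load_dataset

ds = load_dataset("json", data_files="examcraft_sft.jsonl", split="train")
print(ds)      # row count and column names
print(ds[0])   # first record
```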
## 🎯 Environment Design

### Teaching Actions

- **QUESTION**: Generate adaptive multiple-choice questions
- **EXPLANATION**: Provide detailed concept explanations
- **LESSON_PLAN**: Create personalized study plans

A sketch of what these actions might look like as structured outputs is shown below.
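It is purely illustrative: the field names are assumptions, not the environment's actual schema, which lives in `examcraft_server.py`.

```python
# Hypothetical teaching-action payloads -- illustrative only; the real schema
# is defined by the environment and may differ.
QUESTION_ACTION = {
    "action": "QUESTION",
    "topic": "matrices",
    "difficulty": 0.6,  # assumed 0.0 (easy) to 1.0 (hard) scale
    "question": "Which of the following is the 2x2 identity matrix?",
    "choices": ["[[1,0],[0,1]]", "[[0,1],[1,0]]", "[[1,1],[1,1]]", "[[0,0],[0,0]]"],
    "correct_choice": 0,
}

EXPLANATION_ACTION = {
    "action": "EXPLANATION",
    "topic": "vectors",
    "explanation": "A vector has both magnitude and direction...",
}

LESSON_PLAN_ACTION = {
    "action": "LESSON_PLAN",
    "focus_topics": ["matrices"],  # weakest topics first
    "steps": ["Review matrix notation", "Practice 2x2 multiplication", "Quiz on determinants"],
}
```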
### Reward Components

1. **Correctness Reward**: Base reward when the student answers a question correctly
2. **Targeting Bonus**: Extra points for focusing on weak topics
3. **Difficulty Appropriateness**: Rewards for matching difficulty to ability
4. **Quality Bonus**: Higher scores for detailed, well-structured content
5. **Learning Impact**: Bonuses for explanations that boost understanding

The sketch below shows one way these components could combine into a single score.
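The weights and formulas here are made up for illustration; the actual values are defined in the environment code.

```python
# Illustrative reward combination -- weights are assumptions, not the environment's values.
def teaching_reward(
    student_correct: bool,      # 1. correctness: did the student answer correctly?
    targeted_weak_topic: bool,  # 2. targeting: did the action address the weakest topic?
    difficulty_gap: float,      # 3. |question difficulty - student proficiency|
    quality_score: float,       # 4. content quality, in [0, 1]
    proficiency_gain: float,    # 5. learning impact: proficiency change after the turn
) -> float:
    reward = 1.0 if student_correct else 0.0
    reward += 0.5 if targeted_weak_topic else 0.0
    reward += 0.5 * max(0.0, 1.0 - difficulty_gap)  # highest when difficulty matches ability
    reward += 0.25 * quality_score
    reward += 2.0 * max(0.0, proficiency_gain)      # bonus for measurable learning
    return reward


# Example: a correct answer on a weak topic at a well-matched difficulty.
print(teaching_reward(True, True, difficulty_gap=0.1, quality_score=0.8, proficiency_gain=0.03))
```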
### Student Simulation

- Probabilistic responses based on topic proficiency
- Dynamic learning from good teaching
- Realistic difficulty sensitivity
- Session momentum effects

A rough sketch of this kind of student model is given below.
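All constants and update rules are illustrative assumptions; the real simulator in `examcraft_server.py` may differ.

```python
import random


# Illustrative student simulator -- constants and update rules are assumptions.
class SimulatedStudent:
    def __init__(self, proficiency: dict, learning_rate: float = 0.03):
        self.proficiency = dict(proficiency)  # topic name -> proficiency in [0, 1]
        self.learning_rate = learning_rate
        self.momentum = 0.0                   # small boost after recent successes

    def answer(self, topic: str, difficulty: float) -> bool:
        """Probabilistic response: higher proficiency and easier questions help."""
        p = self.proficiency.get(topic, 0.5) - 0.4 * (difficulty - 0.5) + self.momentum
        correct = random.random() < min(max(p, 0.05), 0.95)
        self.momentum = min(self.momentum + 0.02, 0.10) if correct else 0.0
        return correct

    def learn(self, topic: str, teaching_quality: float) -> None:
        """Good teaching nudges proficiency upward."""
        gain = self.learning_rate * teaching_quality
        self.proficiency[topic] = min(1.0, self.proficiency.get(topic, 0.5) + gain)


student = SimulatedStudent({"vectors": 0.65, "matrices": 0.50})
print(student.answer("matrices", difficulty=0.5))
```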
## 📊 Example Metrics

The environment tracks:

- Student accuracy improvement across topics
- Teaching effectiveness scores
- Adaptive difficulty selection
- Content quality metrics
- Learning progression over time
## 🏆 Why This Matters

Adaptive AI tutoring can revolutionize education by:

- Personalizing learning experiences at scale
- Identifying knowledge gaps automatically
- Providing instant, detailed feedback
- Making quality education globally accessible
## 🔧 Configuration

### Student Profile Format

```json
{
  "student_id": "student001",
  "target_grade": "11th grade",
  "learning_goal": "Master linear algebra basics",
  "current_avg_score": 73,
  "topics": [
    {"name": "vectors", "proficiency": 0.65},
    {"name": "matrices", "proficiency": 0.50}
  ],
  "preferred_learning_style": "visual"
}
```
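A minimal sketch of loading such a profile and picking the weakest topic to target first; it assumes the JSON above is saved as `student_profile.json`, and the helper itself is hypothetical rather than part of the environment:

```python
# Hypothetical helper -- not part of the environment.
import json

with open("student_profile.json") as f:
    profile = json.load(f)

# The lowest-proficiency topic is the natural first target for adaptive questions.
weakest = min(profile["topics"], key=lambda t: t["proficiency"])
print(
    f"{profile['student_id']}: weakest topic is {weakest['name']} "
    f"(proficiency {weakest['proficiency']:.2f})"
)
```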
### Environment Parameters

- `max_questions_per_episode`: 8
- `student_learning_rate`: 0.03
- `enable_lesson_plans`: true

These defaults are restated as a small config sketch below.
## 📈 Results Preview

After training, teachers learn to:

- Prioritize topics where students struggle most
- Adapt question difficulty based on recent performance
- Generate detailed explanations that boost understanding
- Create comprehensive lesson plans targeting weak areas

Built for the **Nous Research RL Environments Hackathon** 🚀