atropos/environments/community/goofy_math/README.md

# GoofyMath 😂➗

A reinforcement learning environment that trains math models to be both *accurate* and *entertaining*.

## Demo Video

🎬 [Watch the 1-minute demo on YouTube]
( https://www.loom.com/share/8704f63e2d2e4b4db23eab673d7990a2?sid=3b78d63d-7cb0-44b2-a279-281c1be702b9 )

## Motivation & Design

Can a math tutor be both correct AND entertaining? We believe humor can dramatically improve learning outcomes.

The **GoofyMath** environment:
1. Takes standard GSM8K math problems
2. Uses a two-stage judging system:
   - First filters for mathematically correct solutions
   - Then ranks solutions by "goofiness" to reward entertaining explanations
3. Combines RLAIF (AI feedback) with objective correctness verification

The reward function: `score = correctness_score + (goofiness_bonus * 0.5)`
- Solutions MUST be correct (pass verification)
- Extra points (up to +0.5) for humor, sound effects, and creative explanations

## Quickstart

```bash
# Install requirements
pip install -r requirements.txt

# Run process mode to generate examples
export OPENAI_API_KEY=your_key_here
cd atropos
python environments/hack0/goofy_math_server.py process \
  --env.data_path_to_save_groups goofy_math_demo.jsonl \
  --env.total_steps 3
```

## WandB Run

📊 [View our WandB run](https://wandb.ai/goofymath/goofy_math/runs/z92gd2j4)

### Added Metrics
- **train/avg_goofiness_score**: Average goofiness score across solutions (0-1)
- **train/goofiness_histogram**: Distribution of goofiness scores
- **train/judgement_table**: Comparison table showing goofy vs standard solutions
- **train/percent_correct**: Accuracy rate (must maintain high performance)

## Technical Details

### Reward Hacking Prevention
- Goofiness is only rewarded AFTER correctness is verified
- Position bias eliminated by swapping solutions A/B in judgments
- Goofiness bonus capped at 50% of base reward

### Implementation Notes
- Uses RLAIF pattern with a novel twist: combining objective verification with subjective personality scoring
- Differentiator: most math tutoring systems optimize ONLY for correctness
- High-quality goofiness prompting designed to make explanations entertaining without sacrificing clarity

### Future Work
- Context-aware humor (different tones for different math concepts)
- Age-appropriate adjustments for younger vs. older students
- Personalized humor adaptation based on student feedback