Integrate chinguun101 goofy math (#145)

* Add GoofyMath environment for fun, engaging math learning
* linting, moved to community folder
* linting

---------

Co-authored-by: chinguun101 <chinguun@uni.minerva.edu>
2026-04-25 17:10:42 +00:00 · 2025-05-28 12:11:02 +10:00 · 2025-05-28 12:11:02 +10:00 · ea304892ee
commit ea304892ee
parent 1a79132809
4 changed files with 649 additions and 5 deletions
--- a/environments/community/goofy_math/README.md
+++ b/environments/community/goofy_math/README.md
@@ -0,0 +1,64 @@
+# GoofyMath 😂➗
+
+A reinforcement learning environment that trains math models to be both *accurate* and *entertaining*.
+
+## Demo Video
+
+🎬 [Watch the 1-minute demo on Loom](https://www.loom.com/share/8704f63e2d2e4b4db23eab673d7990a2?sid=3b78d63d-7cb0-44b2-a279-281c1be702b9)
+
+## Motivation & Design
+
+Can a math tutor be both correct AND entertaining? We believe humor can dramatically improve learning outcomes.
+
+The **GoofyMath** environment:
+1. Takes standard GSM8K math problems
+2. Uses a two-stage judging system:
+   - First filters for mathematically correct solutions
+   - Then ranks solutions by "goofiness" to reward entertaining explanations
+3. Combines RLAIF (AI feedback) with objective correctness verification
+
+The reward function: `score = correctness_score + (goofiness_bonus * 0.5)`
+- Solutions MUST be correct (pass verification)
+- Extra points (up to +0.5) for humor, sound effects, and creative explanations
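The reward rule above can be sketched in a few lines of Python. This is a minimal illustration, not the environment's actual code: the function name `goofy_reward`, the default `correctness_score` of 1.0, and the choice to give incorrect solutions a reward of 0.0 are all assumptions made for the example.

```python
def goofy_reward(
    is_correct: bool,
    goofiness: float,
    correctness_score: float = 1.0,  # assumed base reward for a correct answer
    bonus_weight: float = 0.5,       # the 0.5 factor from the README formula
) -> float:
    """Hypothetical helper: score = correctness_score + goofiness_bonus * 0.5."""
    if not is_correct:
        # Assumption: a solution that fails verification earns nothing,
        # so humor alone can never be rewarded.
        return 0.0
    # Clamp the judged goofiness to [0, 1] so the bonus never exceeds bonus_weight.
    goofiness = max(0.0, min(1.0, goofiness))
    return correctness_score + bonus_weight * goofiness
```

With these assumed defaults, a correct but dry solution scores 1.0, a correct and maximally goofy one scores 1.5, and any incorrect solution scores 0.0 regardless of how funny it is.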
+
+## Quickstart
+
+```bash
+# Install requirements
+pip install -r requirements.txt
+
+# Run process mode to generate examples
+export OPENAI_API_KEY=your_key_here
+cd atropos
+python environments/community/goofy_math/goofy_math_server.py process \
+  --env.data_path_to_save_groups goofy_math_demo.jsonl \
+  --env.total_steps 3
+```
+
+## WandB Run
+
+📊 [View our WandB run](https://wandb.ai/goofymath/goofy_math/runs/z92gd2j4)
+
+### Added Metrics
+- **train/avg_goofiness_score**: Average goofiness score across solutions (0-1)
+- **train/goofiness_histogram**: Distribution of goofiness scores
+- **train/judgement_table**: Comparison table showing goofy vs standard solutions
+- **train/percent_correct**: Fraction of solutions that pass correctness verification (should stay high as goofiness increases)
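As a rough illustration of what the scalar and histogram metrics summarize, the sketch below aggregates per-solution results into a plain dictionary. The function name `summarize_batch`, the bin count, and the input shapes are assumptions for the example; how the environment actually computes and logs these values to WandB is not shown here.

```python
from statistics import mean
from typing import Dict, List, Union

def summarize_batch(
    goofiness_scores: List[float],
    correct_flags: List[bool],
    bins: int = 5,  # assumed number of histogram buckets over [0, 1]
) -> Dict[str, Union[float, List[int]]]:
    """Hypothetical aggregation of per-solution results into metric values."""
    avg = mean(goofiness_scores) if goofiness_scores else 0.0

    # Count scores per equal-width bucket; a score of exactly 1.0 goes in the last one.
    histogram = [0] * bins
    for score in goofiness_scores:
        clamped = max(0.0, min(1.0, score))
        histogram[min(int(clamped * bins), bins - 1)] += 1

    percent_correct = (
        sum(correct_flags) / len(correct_flags) if correct_flags else 0.0
    )
    return {
        "train/avg_goofiness_score": avg,
        "train/goofiness_histogram": histogram,
        "train/percent_correct": percent_correct,
    }
```

The judgement table is omitted because its columns are not specified in this README.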
+
+## Technical Details
+
+### Reward Hacking Prevention
+- Goofiness is only rewarded AFTER correctness is verified
+- Position bias is mitigated by swapping solutions A/B in judgments
+- Goofiness bonus capped at 50% of base reward
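The A/B swap can be sketched as follows: the judge is asked twice, once per ordering, and the two verdicts are averaged. This is an illustrative sketch under assumptions, not the environment's implementation. The `judge` callable, its `"A"`/`"B"` return convention, and the name `debiased_preference` are all hypothetical; in practice the judge would be an LLM call.

```python
from typing import Callable

def debiased_preference(
    judge: Callable[[str, str], str],
    sol_1: str,
    sol_2: str,
) -> float:
    """Return how strongly sol_1 is preferred, in {0.0, 0.5, 1.0}.

    `judge(a, b)` is a hypothetical callable that returns "A" if the solution
    shown first is goofier and "B" if the solution shown second is goofier.
    """
    first_verdict = judge(sol_1, sol_2)   # sol_1 is shown in slot A
    second_verdict = judge(sol_2, sol_1)  # sol_1 is shown in slot B
    votes_for_sol_1 = (first_verdict == "A") + (second_verdict == "B")
    return votes_for_sol_1 / 2


# Stub judges standing in for an LLM, used only to show the effect of the swap.
def always_picks_a(a: str, b: str) -> str:
    return "A"  # a judge with pure position bias

def picks_longer(a: str, b: str) -> str:
    return "A" if len(a) >= len(b) else "B"  # a judge with a real preference
```

A judge that always answers "A" ends up at 0.5 for every pair, so its position bias contributes no preference signal, while a judge with a consistent preference keeps its verdict under both orderings.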
+
+### Implementation Notes
+- Uses RLAIF pattern with a novel twist: combining objective verification with subjective personality scoring
+- Differentiator: most math tutoring systems optimize ONLY for correctness
+- Goofiness prompts are designed to make explanations entertaining without sacrificing clarity
+
+### Future Work
+- Context-aware humor (different tones for different math concepts)
+- Age-appropriate adjustments for younger vs. older students
+- Personalized humor adaptation based on student feedback