mirror of
https://github.com/NousResearch/atropos.git
synced 2026-04-25 17:10:42 +00:00
Integrate chinguun101 goofy math (#145)
* Add GoofyMath environment for fun, engaging math learning
* linting, moved to community folder
* linting

---------

Co-authored-by: chinguun101 <chinguun@uni.minerva.edu>
parent 1a79132809
commit ea304892ee
4 changed files with 649 additions and 5 deletions
64
environments/community/goofy_math/README.md
Normal file
@@ -0,0 +1,64 @@
# GoofyMath 😂➗

A reinforcement learning environment that trains math models to be both *accurate* and *entertaining*.

## Demo Video

🎬 [Watch the 1-minute demo on Loom](https://www.loom.com/share/8704f63e2d2e4b4db23eab673d7990a2?sid=3b78d63d-7cb0-44b2-a279-281c1be702b9)

## Motivation & Design

Can a math tutor be both correct AND entertaining? We believe humor can dramatically improve learning outcomes.

The **GoofyMath** environment:
1. Takes standard GSM8K math problems
2. Uses a two-stage judging system:
   - First filters for mathematically correct solutions
   - Then ranks solutions by "goofiness" to reward entertaining explanations
3. Combines RLAIF (AI feedback) with objective correctness verification

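
The two-stage judging described above can be sketched roughly as follows. This is a minimal illustration, not the environment's actual code: `Candidate`, `two_stage_rank`, and `toy_judge` are hypothetical names, correctness is assumed to be a simple string match against the gold answer, and the toy judge stands in for the LLM-based goofiness judge.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Candidate:
    text: str          # full model explanation
    final_answer: str  # answer parsed from the explanation (parsing not shown)


def two_stage_rank(
    candidates: List[Candidate],
    gold_answer: str,
    goofiness_judge: Callable[[str], float],
) -> List[Candidate]:
    """Stage 1: drop incorrect solutions. Stage 2: rank the rest by goofiness."""
    correct = [c for c in candidates if c.final_answer.strip() == gold_answer.strip()]
    return sorted(correct, key=lambda c: goofiness_judge(c.text), reverse=True)


def toy_judge(text: str) -> float:
    # Hypothetical stand-in for an LLM judge: more "!" means goofier, capped at 1.0.
    return min(text.count("!") / 3.0, 1.0)


candidates = [
    Candidate("Boring: 2 + 3 = 5.", "5"),
    Candidate("KABOOM! Two apples crash into three! That makes 5!", "5"),
    Candidate("Wheee! The answer is 6!", "6"),  # goofy but wrong, so filtered out
]
ranked = two_stage_rank(candidates, "5", toy_judge)
```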
The reward function: `score = correctness_score + (goofiness_bonus * 0.5)`
- Solutions MUST be correct (pass verification) before any goofiness bonus is added
- Extra points (up to +0.5) for humor, sound effects, and creative explanations

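
As a rough sketch of the formula above: the function name `goofy_math_score` is hypothetical, and it assumes `correctness_score` is 1.0 for a verified solution and 0.0 otherwise, that `goofiness_bonus` lies in [0, 1], and that incorrect solutions receive no bonus at all. The actual environment may differ in these details.

```python
def goofy_math_score(is_correct: bool, goofiness: float, bonus_weight: float = 0.5) -> float:
    """score = correctness_score + goofiness_bonus * bonus_weight, gated on correctness."""
    if not is_correct:
        # Assumption: an incorrect solution earns nothing, however goofy it is.
        return 0.0
    clamped = max(0.0, min(1.0, goofiness))  # keep the bonus input within [0, 1]
    return 1.0 + clamped * bonus_weight


plain = goofy_math_score(True, 0.0)        # correct, not goofy
goofy = goofy_math_score(True, 1.0)        # correct, maximally goofy
wrong = goofy_math_score(False, 1.0)       # goofy but incorrect
```

Under these assumptions a correct solution scores between 1.0 and 1.5, so the bonus never exceeds half of the base reward.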
## Quickstart

```bash
# Install requirements
pip install -r requirements.txt

# Run process mode to generate examples
export OPENAI_API_KEY=your_key_here
cd atropos
# Note: this README lives in environments/community/goofy_math/; adjust the
# script path below if goofy_math_server.py was moved there as well.
python environments/hack0/goofy_math_server.py process \
  --env.data_path_to_save_groups goofy_math_demo.jsonl \
  --env.total_steps 3
```

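
To get a quick look at the file written by process mode, a small helper like the one below may be useful. It is only a sketch: `summarize_jsonl` is a hypothetical name, and it makes no assumption about the record schema beyond one JSON value per line.

```python
import json
from collections import Counter
from typing import Tuple


def summarize_jsonl(path: str) -> Tuple[int, Counter]:
    """Count records and how often each top-level key appears."""
    n_records = 0
    key_counts: Counter = Counter()
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            n_records += 1
            if isinstance(record, dict):
                key_counts.update(record.keys())
    return n_records, key_counts
```

For example, `summarize_jsonl("goofy_math_demo.jsonl")` returns the number of saved groups and a count of the top-level keys found in them.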
## WandB Run

📊 [View our WandB run](https://wandb.ai/goofymath/goofy_math/runs/z92gd2j4)

### Added Metrics
- **train/avg_goofiness_score**: Average goofiness score across solutions (0-1)
- **train/goofiness_histogram**: Distribution of goofiness scores
- **train/judgement_table**: Comparison table showing goofy vs standard solutions
- **train/percent_correct**: Accuracy rate (should stay high as goofiness increases)

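
The scalar and histogram metrics listed above could be computed along these lines. This is a sketch under assumptions: `goofy_metrics` is a hypothetical helper, goofiness scores are assumed to lie in [0, 1], and the histogram is returned as plain bin counts rather than whatever object the environment actually logs to WandB.

```python
from statistics import mean
from typing import Dict, List, Sequence


def goofy_metrics(
    correct_flags: Sequence[bool],
    goofiness_scores: Sequence[float],
    n_bins: int = 5,
) -> Dict[str, object]:
    """Compute accuracy, mean goofiness, and equal-width bin counts over [0, 1]."""
    if not correct_flags or not goofiness_scores:
        raise ValueError("need at least one correctness flag and one goofiness score")
    histogram: List[int] = [0] * n_bins
    for score in goofiness_scores:
        # Map the score to a bin index; a score of exactly 1.0 falls in the last bin.
        idx = max(0, min(int(score * n_bins), n_bins - 1))
        histogram[idx] += 1
    return {
        "train/percent_correct": sum(correct_flags) / len(correct_flags),
        "train/avg_goofiness_score": mean(goofiness_scores),
        "train/goofiness_histogram": histogram,
    }


metrics = goofy_metrics([True, True, False, True], [0.1, 0.5, 1.0])
```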
## Technical Details

### Reward Hacking Prevention
- Goofiness is only rewarded AFTER correctness is verified
- Position bias is mitigated by swapping the A/B order of solutions in judgments
- Goofiness bonus capped at 50% of base reward

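
One common way to implement the A/B swap is to ask the judge twice, once in each order, and only count a win when both verdicts agree. The sketch below assumes that variant; whether the environment judges both orders or swaps at random is not stated here, and `debiased_preference` and the judge signature are hypothetical.

```python
from typing import Callable

# A judge takes (solution_shown_as_A, solution_shown_as_B) and returns "A" or "B".
Judge = Callable[[str, str], str]


def debiased_preference(judge: Judge, solution_x: str, solution_y: str) -> str:
    """Return "x", "y", or "tie" after judging the pair in both presentation orders."""
    first = judge(solution_x, solution_y)   # x is shown in position A
    second = judge(solution_y, solution_x)  # y is shown in position A
    x_wins = int(first == "A") + int(second == "B")
    if x_wins == 2:
        return "x"
    if x_wins == 0:
        return "y"
    return "tie"  # verdict flipped with position, so treat it as position bias


# Toy judges for illustration only.
always_first: Judge = lambda a, b: "A"                            # purely position-biased
prefers_longer: Judge = lambda a, b: "A" if len(a) >= len(b) else "B"

biased_result = debiased_preference(always_first, "short", "a much longer answer")
consistent_result = debiased_preference(prefers_longer, "short", "a much longer answer")
```

A judge that always picks the first position yields a tie, while a judge with a genuine preference gives the same winner in both orders.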
### Implementation Notes
- Uses RLAIF pattern with a novel twist: combining objective verification with subjective personality scoring
- Differentiator: most math tutoring systems optimize ONLY for correctness
- High-quality goofiness prompting designed to make explanations entertaining without sacrificing clarity

### Future Work
- Context-aware humor (different tones for different math concepts)
- Age-appropriate adjustments for younger vs. older students
- Personalized humor adaptation based on student feedback