Mirror of https://github.com/NousResearch/atropos.git, synced 2026-04-19 12:57:58 +00:00
Integrate chinguun101 goofy math (#145)
* Add GoofyMath environment for fun, engaging math learning
* linting, moved to community folder
* linting

---------

Co-authored-by: chinguun101 <chinguun@uni.minerva.edu>
parent 1a79132809
commit ea304892ee
4 changed files with 649 additions and 5 deletions
@@ -338,12 +338,8 @@ Every *SEARCH/REPLACE* edit must use this format:
Here is an example:
```python
### mathweb/flask/app.py
<<<<<<< SEARCH
from flask import Flask
=======
import math
from flask import Flask
>>>>>>> REPLACE
```
Please note that the *SEARCH/REPLACE* edit REQUIRES PROPER INDENTATION. If you would like to add the line ’ print(x)’, you must fully write that out, with all those spaces before the code!
Wrap each *SEARCH/REPLACE* edit in a code block as shown in the example above. If you have multiple *SEARCH/REPLACE* edits, use a separate code block for each one.
@@ -507,7 +503,62 @@ python -m atroposlib.cli.dpo \
- **Combined Scoring**: Overall article score in [-1, 1] range balancing quality and accuracy
- **W&B Integration**: Complete research session tracking with tool usage analytics

-## 33. Options Implied Volatility Prediction Environment
+## 33. Goofy Math Environment

**Location:** `environments/community/goofy_math/`
**Contributor:** [chinguun101](https://github.com/chinguun101)
**PR:** [#91](https://github.com/NousResearch/atropos/pull/91)

### Core Features
- **Dual Reward System**: Mathematical correctness verification + goofiness scoring
- **RLAIF-Based Judging**: AI feedback system for ranking entertaining vs. standard solutions
- **GSM8K Integration**: Uses standard math dataset with humor enhancement overlay
- **Position Bias Elimination**: Forward/reverse judgment pairs to ensure fair evaluation

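The forward/reverse judgment pairing listed under Core Features can be sketched roughly as follows. This is a minimal illustration, not the environment's actual code: `judge` is a hypothetical callable standing in for the LLM judge, the `"A"`/`"B"` verdict labels are invented for the example, and scoring a split verdict as 0.5 is an assumption.

```python
from typing import Callable

# Hypothetical judge signature (an assumption for this sketch): given two
# solutions in display order, return "A" if the first one shown is goofier,
# or "B" if the second one shown is goofier.
Judge = Callable[[str, str], str]


def debiased_preference(judge: Judge, sol_1: str, sol_2: str) -> float:
    """Return a preference for sol_1 over sol_2 in [0, 1].

    The judge is queried twice, once per presentation order, so a judge
    that always favors one slot ends up at a neutral 0.5.
    """
    forward = judge(sol_1, sol_2)  # sol_1 is shown in slot A
    reverse = judge(sol_2, sol_1)  # sol_1 is shown in slot B

    wins = 0
    if forward == "A":
        wins += 1
    if reverse == "B":
        wins += 1
    return wins / 2


def first_slot_judge(a: str, b: str) -> str:
    """A maximally position-biased stand-in judge: always picks slot A."""
    return "A"


def length_judge(a: str, b: str) -> str:
    """A position-independent stand-in judge: picks the longer text."""
    return "A" if len(a) > len(b) else "B"


if __name__ == "__main__":
    print(debiased_preference(first_slot_judge, "plain", "BOING! goofy"))
    print(debiased_preference(length_judge, "plain", "BOING! goofy"))
```

With the purely position-biased judge the two verdicts cancel, while a judge whose choice depends only on content produces the same winner in both orders.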
### Technical Implementation
- **Environment Name**: `goofy_math`
- **Correctness Verification**: Uses `math_verify` and `latex2sympy2_extended` for objective scoring
- **Goofiness Assessment**: LLM judge evaluates entertainment value of mathematically correct solutions
- **Reward Formula**: `score = correctness_score + (goofiness_bonus * 0.5)`
- **Output Format**: `<think>...</think>` reasoning + `\boxed{answer}` format

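To make the output format concrete, here is a small standalone sketch of splitting a completion into its `<think>` reasoning and its `\boxed{...}` answer. It does not use `math_verify` or `latex2sympy2_extended`, which the environment relies on for actual verification; the function names `extract_boxed` and `parse_completion` are hypothetical, and taking the last boxed expression after the reasoning block is an assumption.

```python
import re
from typing import Optional, Tuple

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
BOXED_PREFIX = "\\boxed{"


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last boxed expression, allowing nested braces."""
    start = text.rfind(BOXED_PREFIX)
    if start == -1:
        return None
    depth = 1
    chars = []
    for ch in text[start + len(BOXED_PREFIX):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars)
        chars.append(ch)
    return None  # braces never balanced


def parse_completion(completion: str) -> Optional[Tuple[str, str]]:
    """Split a completion into (reasoning, answer), or None if malformed."""
    match = THINK_RE.search(completion)
    if match is None:
        return None
    answer = extract_boxed(completion[match.end():])
    if answer is None:
        return None
    return match.group(1).strip(), answer


if __name__ == "__main__":
    sample = "<think>Boing! 2 + 2 makes...</think> The answer is \\boxed{4}"
    print(parse_completion(sample))
```

A brace counter is used instead of a single regular expression so that answers such as `\boxed{\frac{1}{2}}` are returned whole.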
### Research Applications
- **Educational AI**: Training math tutors that are both accurate and engaging
- **Personality Injection**: Adding entertainment value while maintaining technical correctness
- **Multi-Objective Optimization**: Balancing objective accuracy with subjective entertainment
- **Humor in AI**: Systematic approach to training models for appropriate comedic timing

### Setup and Usage
```bash
# Install requirements
pip install -r environments/community/goofy_math/requirements.txt

# Environment variables
export OPENAI_API_KEY="your-key"

# Process mode for examples
python environments/community/goofy_math/goofy_math_server.py process \
    --env.data_path_to_save_groups goofy_math_demo.jsonl \
    --env.total_steps 3

# Training mode
python -m atroposlib.cli.dpo \
    --env-module "environments.community.goofy_math.goofy_math_server"
```

### Performance Characteristics
- **Correctness Requirement**: Solutions must pass mathematical verification to receive any reward
- **Goofiness Scoring**: 0-1 range based on humor, sound effects, and creative explanations
- **Reward Distribution**: Base 1.0 for correctness + up to 0.5 bonus for entertainment value
- **Anti-Reward Hacking**: Goofiness only evaluated after correctness verification
- **W&B Integration**: Tracks goofiness histograms, judgment tables, and accuracy metrics

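The reward distribution described above can be written out as a short function. This is only a sketch of the stated rules: the name `goofy_math_reward` is hypothetical, returning 0.0 for an incorrect solution is an assumption (the text says only that such solutions receive no reward), and clamping the goofiness score to [0, 1] is added for safety.

```python
GOOFINESS_WEIGHT = 0.5  # weight applied to the goofiness bonus
BASE_CORRECT_REWARD = 1.0  # base reward for a verified-correct solution


def goofy_math_reward(is_correct: bool, goofiness: float) -> float:
    """Combine correctness and goofiness into a single score.

    Goofiness is only counted once correctness has been verified, which
    is what keeps a funny but wrong answer from earning reward.
    """
    if not is_correct:
        # Assumption: the exact value for incorrect answers is not stated.
        return 0.0
    clamped = min(max(goofiness, 0.0), 1.0)  # keep the judge score in [0, 1]
    return BASE_CORRECT_REWARD + GOOFINESS_WEIGHT * clamped


if __name__ == "__main__":
    for correct, goofy in [(True, 0.0), (True, 1.0), (False, 1.0)]:
        print(correct, goofy, goofy_math_reward(correct, goofy))
```

Under these assumptions a correct solution scores between 1.0 and 1.5, and an incorrect one scores the same regardless of how entertaining it is.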
### Demo and Results
- **Video Demo**: [1-minute demonstration](https://www.loom.com/share/8704f63e2d2e4b4db23eab673d7990a2)
- **WandB Run**: [Experiment tracking](https://wandb.ai/goofymath/goofy_math/runs/z92gd2j4)
- **Unique Metrics**: `train/avg_goofiness_score`, `train/goofiness_histogram`, `train/judgement_table`

## 34. Options Implied Volatility Prediction Environment

**Location:** `environments/community/options_iv_prediction/`
**Contributor:** [michaelwaves](https://github.com/michaelwaves)