# Humor Generation Environment

## Overview

A reinforcement learning environment for training language models to generate humor in the style of specific comedians and formats. The environment uses a multi-dimensional scoring rubric to evaluate joke quality across relevance, style consistency, creativity, humor effectiveness, virality, and cognitive coherence.

## Features

- **Multi-Comedian Training**: Supports various comedian styles (Norm Macdonald, John Mulaney, Hasan Minhaj, Dave Chappelle, Ali Wong, Chris Rock)
- **Format Diversity**: Trains on different humor formats (haiku, one-liner, q/a over SMS)
- **Comprehensive Scoring**: 6-dimensional evaluation rubric for joke quality assessment
- **Dataset Generation**: Automated dataset creation using GPT-4o-mini
- **WandB Integration**: Comprehensive experiment tracking and visualization

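To make the comedian/format grid concrete, the sketch below enumerates every pairing of the six comedians and three formats listed above. It assumes the dataset covers the full cross product, which this README does not state, and `build_prompt` is a hypothetical template rather than the wording used in `humor_dataset.jsonl`.

```python
from itertools import product

# Names copied from the Features list above.
COMEDIANS = [
    "Norm Macdonald",
    "John Mulaney",
    "Hasan Minhaj",
    "Dave Chappelle",
    "Ali Wong",
    "Chris Rock",
]
FORMATS = ["haiku", "one-liner", "q/a over SMS"]


def build_prompt(comedian: str, fmt: str) -> str:
    """Hypothetical prompt template; the real prompts live in the dataset."""
    return f"Write a {fmt} joke in the style of {comedian}."


# Assumption: every comedian is paired with every format.
COMBINATIONS = list(product(COMEDIANS, FORMATS))
PROMPTS = [build_prompt(comedian, fmt) for comedian, fmt in COMBINATIONS]

if __name__ == "__main__":
    print(len(COMBINATIONS))
    print(PROMPTS[0])
```

Under that assumption the grid has 6 x 3 = 18 comedian/format pairs.
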
## Environment Structure

- `humor_env.py`: Main environment implementation with scoring logic
- `generate_humor_dataset.py`: Script for creating training datasets
- `humor_dataset.jsonl`: Pre-generated dataset with comedian/format combinations

## Scoring Rubric

The environment evaluates generated jokes across six dimensions, each scored from 0 up to a maximum of 2 or 3 points as listed:

1. **Relevance to Format** (0-2): How well the joke fits the specified format
2. **Style Consistency** (0-2): Adherence to the target comedian's style
3. **Creativity** (0-3): Originality and inventiveness of the humor
4. **Humor Effectiveness** (0-3): How funny and engaging the joke is
5. **Virality** (0-3): Potential for widespread appeal and sharing
6. **Cognitive Coherence** (0-3): Logical structure and comprehensibility

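As a rough illustration of how six per-dimension scores could be folded into a single reward, the sketch below clamps each score to its range from the rubric, sums them, and divides by the maximum total of 16. The dictionary keys, the `normalized_score` helper, and the sum-and-normalize rule are all assumptions for illustration; `humor_env.py` may aggregate differently.

```python
from typing import Dict

# Maximum points per dimension, copied from the rubric list above.
# The key names are hypothetical, not taken from humor_env.py.
RUBRIC_MAX: Dict[str, int] = {
    "relevance": 2,
    "style_consistency": 2,
    "creativity": 3,
    "humor_effectiveness": 3,
    "virality": 3,
    "cognitive_coherence": 3,
}


def normalized_score(scores: Dict[str, float]) -> float:
    """Clamp each dimension to its range, sum, and scale to [0, 1].

    Missing dimensions count as 0. This aggregation rule is an assumption.
    """
    total = 0.0
    for name, max_points in RUBRIC_MAX.items():
        raw = scores.get(name, 0.0)
        total += min(max(raw, 0.0), float(max_points))
    return total / sum(RUBRIC_MAX.values())
```

Clamping guards against a judge model returning a value outside the rubric's range, for example a 5 on a 0-3 dimension.
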
## Usage

### Running the Environment

```bash
python environments/community/humor_generation/humor_env.py serve
```

### Generating New Datasets

```bash
cd environments/community/humor_generation/
python generate_humor_dataset.py
```

## Configuration

- **Model**: GPT-4o-mini for both generation and evaluation
- **Group Size**: 2 completions per prompt
- **Max Tokens**: 2048 for joke generation, 512 for scoring
- **Evaluation**: LLM-based scoring using detailed rubric prompts

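The values above can be collected into one settings object, as in the sketch below. `HumorEnvSettings` and its field names are hypothetical and not taken from `humor_env.py`, and the `"gpt-4o-mini"` string is assumed to be the identifier passed to the API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HumorEnvSettings:
    """Hypothetical container for the values listed under Configuration."""

    model_name: str = "gpt-4o-mini"  # assumed API identifier for GPT-4o-mini
    group_size: int = 2  # completions sampled per prompt
    max_generation_tokens: int = 2048  # token budget for joke generation
    max_scoring_tokens: int = 512  # token budget for the scoring call

    def completions_for(self, num_prompts: int) -> int:
        """Number of completions generated for a batch of prompts."""
        return num_prompts * self.group_size


if __name__ == "__main__":
    settings = HumorEnvSettings()
    print(settings.completions_for(10))
```

With a group size of 2, a batch of 10 prompts yields 20 completions to score.
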
## Requirements

- OpenAI API key (set as `OPENAI_API_KEY` environment variable)
- Standard Atropos dependencies
- WandB account for experiment tracking

## Dataset Format

Each record contains:

- `comedian`: Target comedian style
- `format`: Humor format (haiku, one-liner, q/a over SMS)
- `question`: Prompt asking for model recommendations and example jokes
- `response`: GPT-4o-mini generated response with explanations and examples

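A minimal sketch of reading records with these four fields is shown below, assuming one JSON object per line, as the `.jsonl` extension suggests. `parse_records` is a hypothetical helper and the sample record is made up for illustration; neither comes from the repository.

```python
import json
from typing import Dict, Iterable, Iterator

# Field names copied from the list above.
REQUIRED_FIELDS = ("comedian", "format", "question", "response")


def parse_records(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per non-empty JSONL line, checking the four fields."""
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        record = json.loads(line)
        missing = [name for name in REQUIRED_FIELDS if name not in record]
        if missing:
            raise ValueError(f"line {line_number}: missing fields {missing}")
        yield record


# Made-up sample record, for illustration only.
SAMPLE_LINE = json.dumps(
    {
        "comedian": "Norm Macdonald",
        "format": "one-liner",
        "question": "placeholder question text",
        "response": "placeholder response text",
    }
)

if __name__ == "__main__":
    for record in parse_records([SAMPLE_LINE]):
        print(record["comedian"], record["format"])
```

Because `parse_records` accepts any iterable of strings, an open file handle for `humor_dataset.jsonl` can be passed to it directly.
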
## Training Applications

- **Style Transfer**: Learning to mimic specific comedian voices
- **Format Adaptation**: Generating humor in constrained formats
- **Quality Assessment**: Training models to evaluate humor effectiveness
- **Creative Writing**: Developing AI systems for entertainment content creation