Community Environments
This directory is home to community-contributed training environments for Atropos. Environments submitted by the community will be placed here after an initial code review.
Note: Environments in this directory are pending full testing and integration. While they have passed a basic code check, they may not yet have been rigorously validated on our compute cluster.
Contributing Your Environment
We encourage you to contribute your own RL environments! When developing a new environment, please follow these guidelines:
- Create your environment in a subdirectory of `environments/community/`. This helps us keep new submissions organized.
- Preferred import style: treat your environment's directory as the package root for imports within your environment code. For example, if you need to import `SomeClass` from a file in your environment, import it directly:

  `from some_file_in_my_env import SomeClass`

  This helps maintain consistency and makes your environment easier to integrate.
Environment Standards
Community environments should:
- Include clear documentation and setup instructions
- Specify all dependencies in requirements files
- Provide example configurations and usage
- Follow the AtroposBaseEnv pattern for consistency
- Include appropriate error handling and validation
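As a hedged sketch, an environment following these standards might look like the code below. The base-class name and method signatures are illustrative stand-ins, not the actual Atropos API; a minimal stub base class is included so the sketch is self-contained.

```python
"""Illustrative skeleton of a community environment (hypothetical API)."""


class AtroposBaseEnv:
    """Stand-in for the real base class; the actual interface may differ."""

    def reset(self):
        raise NotImplementedError

    def step(self, action):
        raise NotImplementedError


class MyCommunityEnv(AtroposBaseEnv):
    """Example environment with basic validation and error handling."""

    def __init__(self, max_turns: int = 8):
        # Validate configuration up front, per the standards above.
        if max_turns <= 0:
            raise ValueError("max_turns must be positive")
        self.max_turns = max_turns
        self.turn = 0

    def reset(self):
        self.turn = 0
        return {"prompt": "Say hello.", "turn": self.turn}

    def step(self, action):
        # Reject malformed actions instead of failing deep in scoring code.
        if not isinstance(action, str):
            raise TypeError("action must be a string")
        self.turn += 1
        reward = 1.0 if "hello" in action.lower() else 0.0
        done = self.turn >= self.max_turns
        return {"turn": self.turn}, reward, done


env = MyCommunityEnv(max_turns=2)
obs = env.reset()
obs, reward, done = env.step("Hello there!")
```

The stub exists only to make the example runnable; a real submission would subclass the actual base environment shipped with Atropos.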
Submission Process
To contribute a new environment to the community collection:
- Fork the repository and create a new branch
- Add your environment to this `community/` directory
- Include comprehensive documentation:
- README with setup instructions
- Requirements file for dependencies
- Example usage and configuration
- Follow naming conventions:
- Use descriptive directory names for complex environments
- Single-file environments should have descriptive filenames
- Test thoroughly before submitting
- Submit a pull request with a clear description
Once your environment is ready, please follow the guidelines in our main CONTRIBUTING.md to submit your contribution.
Available Environments
1. Lean Proof Environment (lean_proof_env/)
Author: GabinFay
Purpose: Testing large language models (LLMs) on Lean theorem proving tasks
A comprehensive environment for evaluating LLMs on formal mathematical reasoning using the Lean theorem prover. Features include:
- Support for custom problem datasets or MiniF2F benchmark
- Integration with Lean 4 theorem prover
- Configurable difficulty levels and problem sets
- Automated proof validation
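For context, a task in such an environment might ask a model to complete a small Lean 4 proof like the illustrative one below, which the environment could then check automatically with the Lean compiler:

```lean
-- A toy theorem of the kind a model might be asked to prove.
theorem add_comm' (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```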
Requirements: Lean 4 installation, OpenAI API key
2. Router Environment (router_env/)
Author: GabinFay
Purpose: Multi-agent routing and coordination system
A sophisticated environment for testing agent routing and coordination capabilities. Includes:
- Multiple specialized agents (calendar, contact, Gmail, telephony, etc.)
- Model Context Protocol (MCP) tools integration
- Spotify, Google Maps, and Perplexity integrations
- Complex multi-turn conversation handling
Features:
- Telephony agent with inbound/outbound call handling
- Calendar and contact management
- Memory and calculation agents
- Router agent for intelligent task delegation
3. Philosophical RLAIF Environment (philosophical_rlaif_env.py)
Author: GabinFay
Purpose: Reinforcement Learning from AI Feedback (RLAIF) for philosophical reasoning
An environment focused on training models for deep philosophical inquiry and reasoning. Features:
- Deep thinking prompts with systematic reasoning processes
- Preference learning for philosophical depth and nuance
- Multi-perspective analysis and assumption questioning
- Evaluation of response quality for philosophical discussions
Capabilities:
- Generates paired responses for preference comparison
- Uses judge models to evaluate philosophical depth
- Tracks preference consistency and reasoning quality
- Supports WandB logging for training insights
4. Playwright Agent Environment (playwright_agent_env.py)
Author: erikqu
Purpose: Web automation and browser interaction for LLM agents
A comprehensive environment for training LLMs to interact with web pages through browser automation. Features:
- Playwright-based browser control with headless operation
- Screenshot-based visual input for LLM decision making
- JSON-based action commands (navigate, click, type, finish)
- Video recording of browser sessions for evaluation
- Google Gemini integration for success evaluation
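The JSON action format mentioned above might look like the following sketch. The field names (`action`, `url`, `selector`, `text`, `answer`) are assumptions for illustration, not the environment's actual schema:

```python
import json

# Hypothetical action commands an LLM agent might emit (field names illustrative).
raw_actions = """
[
  {"action": "navigate", "url": "https://example.com"},
  {"action": "click", "selector": "#search-button"},
  {"action": "type", "selector": "input[name=q]", "text": "atropos"},
  {"action": "finish", "answer": "done"}
]
"""

actions = json.loads(raw_actions)
VALID = {"navigate", "click", "type", "finish"}

for cmd in actions:
    # Reject unknown actions before dispatching to the browser driver.
    if cmd["action"] not in VALID:
        raise ValueError(f"unknown action: {cmd['action']}")
```

Parsing and validating the action before touching the browser keeps malformed model output from crashing the Playwright session.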
Capabilities:
- Loads tasks from WebVoyager dataset or custom task definitions
- Supports development mode for testing without LLM calls
- Automatic reward computation based on success and efficiency
- Comprehensive error handling and fallback mechanisms
- Integration with Atropos training pipeline
Requirements: Playwright, optional Google Gemini API for evaluation
5. Metric Card Generator Environment (metric_card_generator/)
Author: vivek100
Purpose: Structured JSON generation for AI model evaluation dashboards
A comprehensive environment for training LLMs to generate well-structured JSON configurations for Metric Card UI components. Features:
- Closed-loop generation, evaluation, and visualization pipeline
- Schema validation for JSON metric card configurations
- Multi-dimensional evaluation (validity, compliance, semantic quality)
- Support for various business domains and metric types
- WandB integration for performance tracking
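A minimal sketch of the kind of schema check described above, using only the standard library. The required fields here are assumptions for illustration; the real environment validates against its own schema (via Pydantic, per the requirements):

```python
# Hypothetical metric-card schema (field names and types are illustrative).
REQUIRED_FIELDS = {"title": str, "value": (int, float), "unit": str, "trend": str}


def validate_metric_card(card: dict) -> list:
    """Return a list of validation errors; an empty list means valid."""
    errors = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in card:
            errors.append(f"missing field: {field}")
        elif not isinstance(card[field], expected_type):
            errors.append(f"wrong type for {field}")
    return errors


good = {"title": "Revenue", "value": 1.2, "unit": "M USD", "trend": "up"}
bad = {"title": "Revenue", "value": "1.2M"}
```

Returning a list of errors rather than raising on the first one lets the evaluation loop score partial compliance, which matches the multi-dimensional evaluation described above.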
Capabilities:
- Generates metric cards for diverse business contexts (e-commerce, finance, healthcare, etc.)
- Validates JSON structure against predefined schemas
- Evaluates semantic quality and formatting consistency
- Provides training data extraction and filtering utilities
- Includes visualization tools for score distribution analysis
Components:
- `metric_card_generator.py`: Main environment implementation
- `extract_metric_training.py`: Training data extraction utility
- `trainingDataScript.py`: Dataset creation from collected examples
- `show_score_distribution.py`: Performance analysis visualization
Requirements: Pydantic, tqdm
6. UFC Prediction Environment (ufc_prediction_env/)
Author: edmundman
Repository: UFC_FIGHT_PREDICTOR
Purpose: UFC fight prediction with entertaining TTS-ready commentary generation
A creative environment that transforms traditional fight prediction into engaging entertainment by generating dynamic, broadcast-style UFC fight commentary. Features both text-based and image-based prediction modes:
Text-Based Predictor (ufc_server.py):
- Uses comprehensive fighter statistics (wins/losses, physical attributes, performance metrics)
- Generates dramatic fight commentary with commentator personalities
- TTS-ready output with natural speech patterns and emphasis markers
- Statistical analysis wrapped in entertaining storytelling
Image-Based Predictor (ufc_image_env.py):
- Multimodal prediction using fighter profile images
- Visual analysis transformed into engaging commentary
- Base64 image encoding for API compatibility
- Creates dramatic narratives from fighter appearances
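The base64 encoding step mentioned above can be done with the standard library alone. A minimal sketch follows; the data-URL prefix is a common convention for vision-model APIs, assumed here for illustration:

```python
import base64

# In the real environment the bytes would be read from fighter_images/;
# a few in-memory bytes are used here so the sketch is self-contained.
image_bytes = b"\x89PNG\r\n\x1a\n"  # the first bytes of a PNG header

encoded = base64.b64encode(image_bytes).decode("ascii")
data_url = f"data:image/png;base64,{encoded}"
```

The decoded bytes round-trip exactly, which is what makes base64 safe for embedding binary images in JSON API payloads.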
Key Features:
- Entertainment-first approach with broadcast-style commentary
- Direct TTS integration compatibility (designed for models like DIA)
- Dramatic elements including commentator phrases and pauses
- Proper formatting for voice synthesis applications
- Comprehensive scoring system for prediction accuracy and entertainment value
Data Components:
- `fighter_stats.csv`: Detailed fighter statistics and performance metrics
- `large_dataset.csv`: Sample historical fight data (799 records from the original 7,440)
- `fighter_images/`: Profile images for visual-based predictions
- `get_images.py`: Web scraping utility for fighter image collection
Note: The included dataset is a sample for demonstration. The full dataset (7,440 fight records) is available in the original UFC_FIGHT_PREDICTOR repository.
Additional Tools:
- `ufc_predictor_ui.py`: Flask-based web interface for interactive predictions
- Video demonstrations and example runs available
- WandB integration for training tracking
Requirements: PIL, OpenAI API, Flask (for UI), BeautifulSoup4 (for image scraping)
Support
For questions or issues with community environments:
- Check the individual environment's README first
- Open an issue in the main repository
- Tag the environment author if possible
These environments are community contributions and may have different maintenance levels and support compared to core Atropos environments.