Community Environments
This directory is home to community-contributed training environments for Atropos. Environments submitted by the community will be placed here after an initial code review.
Note: Environments in this directory are pending full testing and integration. While they have passed a basic code check, they may not yet have been rigorously validated on our compute cluster.
Contributing Your Environment
We encourage you to contribute your own RL environments! When developing a new environment, please follow these guidelines:
- Create your environment in a subdirectory of `environments/community/`. This helps us keep new submissions organized.
- Preferred import style: treat your environment's directory as the package root for imports within your environment code. For example, if you need to import `SomeClass` from a file in your environment, import it directly:

  `from some_file_in_my_env import SomeClass`

  This helps maintain consistency and makes your environment easier to integrate.
Environment Standards
Community environments should:
- Include clear documentation and setup instructions
- Specify all dependencies in requirements files
- Provide example configurations and usage
- Follow the AtroposBaseEnv pattern for consistency
- Include appropriate error handling and validation
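As a hedged sketch, an environment following these standards might look like the code below. The base-class name and method signatures are illustrative stand-ins, not the actual Atropos API; a minimal stub base class is included so the sketch is self-contained.

```python
"""Illustrative skeleton of a community environment (hypothetical API)."""


class AtroposBaseEnv:
    """Stand-in for the real base class; the actual interface may differ."""

    def reset(self):
        raise NotImplementedError

    def step(self, action):
        raise NotImplementedError


class MyCommunityEnv(AtroposBaseEnv):
    """Example environment with basic validation and error handling."""

    def __init__(self, max_turns: int = 8):
        # Validate configuration up front, per the standards above.
        if max_turns <= 0:
            raise ValueError("max_turns must be positive")
        self.max_turns = max_turns
        self.turn = 0

    def reset(self):
        self.turn = 0
        return {"prompt": "Say hello.", "turn": self.turn}

    def step(self, action):
        # Reject malformed actions instead of failing deep in scoring code.
        if not isinstance(action, str):
            raise TypeError("action must be a string")
        self.turn += 1
        reward = 1.0 if "hello" in action.lower() else 0.0
        done = self.turn >= self.max_turns
        return {"turn": self.turn}, reward, done


env = MyCommunityEnv(max_turns=2)
obs = env.reset()
obs, reward, done = env.step("Hello there!")
```

The stub exists only to make the example runnable; a real submission would subclass the actual base environment shipped with Atropos.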
Submission Process
To contribute a new environment to the community collection:
- Fork the repository and create a new branch
- Add your environment to this `community/` directory
- Include comprehensive documentation:
- README with setup instructions
- Requirements file for dependencies
- Example usage and configuration
- Follow naming conventions:
- Use descriptive directory names for complex environments
- Single-file environments should have descriptive filenames
- Test thoroughly before submitting
- Submit a pull request with a clear description
Once your environment is ready, please follow the guidelines in our main CONTRIBUTING.md to submit your contribution.
Available Environments
1. Lean Proof Environment (lean_proof_env/)
Author: GabinFay
Purpose: Testing large language models (LLMs) on Lean theorem proving tasks
A comprehensive environment for evaluating LLMs on formal mathematical reasoning using the Lean theorem prover. Features include:
- Support for custom problem datasets or MiniF2F benchmark
- Integration with Lean 4 theorem prover
- Configurable difficulty levels and problem sets
- Automated proof validation
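For context, a task in such an environment might ask a model to complete a small Lean 4 proof like the illustrative one below, which the environment could then check automatically with the Lean compiler:

```lean
-- A toy theorem of the kind a model might be asked to prove.
theorem add_comm' (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```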
Requirements: Lean 4 installation, OpenAI API key
2. Router Environment (router_env/)
Author: GabinFay
Purpose: Multi-agent routing and coordination system
A sophisticated environment for testing agent routing and coordination capabilities. Includes:
- Multiple specialized agents (calendar, contact, Gmail, telephony, etc.)
- Model Context Protocol (MCP) tools integration
- Spotify, Google Maps, and Perplexity integrations
- Complex multi-turn conversation handling
Features:
- Telephony agent with inbound/outbound call handling
- Calendar and contact management
- Memory and calculation agents
- Router agent for intelligent task delegation
3. Philosophical RLAIF Environment (philosophical_rlaif_env.py)
Author: GabinFay
Purpose: Reinforcement Learning from AI Feedback (RLAIF) for philosophical reasoning
An environment focused on training models for deep philosophical inquiry and reasoning. Features:
- Deep thinking prompts with systematic reasoning processes
- Preference learning for philosophical depth and nuance
- Multi-perspective analysis and assumption questioning
- Evaluation of response quality for philosophical discussions
Capabilities:
- Generates paired responses for preference comparison
- Uses judge models to evaluate philosophical depth
- Tracks preference consistency and reasoning quality
- Supports WandB logging for training insights
4. Playwright Agent Environment (playwright_agent_env.py)
Author: erikqu
Purpose: Web automation and browser interaction for LLM agents
A comprehensive environment for training LLMs to interact with web pages through browser automation. Features:
- Playwright-based browser control with headless operation
- Screenshot-based visual input for LLM decision making
- JSON-based action commands (navigate, click, type, finish)
- Video recording of browser sessions for evaluation
- Google Gemini integration for success evaluation
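The JSON action format mentioned above might look like the following sketch. The field names (`action`, `url`, `selector`, `text`, `answer`) are assumptions for illustration, not the environment's actual schema:

```python
import json

# Hypothetical action commands an LLM agent might emit (field names illustrative).
raw_actions = """
[
  {"action": "navigate", "url": "https://example.com"},
  {"action": "click", "selector": "#search-button"},
  {"action": "type", "selector": "input[name=q]", "text": "atropos"},
  {"action": "finish", "answer": "done"}
]
"""

actions = json.loads(raw_actions)
VALID = {"navigate", "click", "type", "finish"}

for cmd in actions:
    # Reject unknown actions before dispatching to the browser driver.
    if cmd["action"] not in VALID:
        raise ValueError(f"unknown action: {cmd['action']}")
```

Parsing and validating the action before touching the browser keeps malformed model output from crashing the Playwright session.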
Capabilities:
- Loads tasks from WebVoyager dataset or custom task definitions
- Supports development mode for testing without LLM calls
- Automatic reward computation based on success and efficiency
- Comprehensive error handling and fallback mechanisms
- Integration with Atropos training pipeline
Requirements: Playwright, optional Google Gemini API for evaluation
5. Metric Card Generator Environment (metric_card_generator/)
Author: vivek100
Purpose: Structured JSON generation for AI model evaluation dashboards
A comprehensive environment for training LLMs to generate well-structured JSON configurations for Metric Card UI components. Features:
- Closed-loop generation, evaluation, and visualization pipeline
- Schema validation for JSON metric card configurations
- Multi-dimensional evaluation (validity, compliance, semantic quality)
- Support for various business domains and metric types
- WandB integration for performance tracking
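A minimal sketch of the kind of schema check described above, using only the standard library. The required fields here are assumptions for illustration; the real environment validates against its own schema (via Pydantic, per the requirements):

```python
# Hypothetical metric-card schema (field names and types are illustrative).
REQUIRED_FIELDS = {"title": str, "value": (int, float), "unit": str, "trend": str}


def validate_metric_card(card: dict) -> list:
    """Return a list of validation errors; an empty list means valid."""
    errors = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in card:
            errors.append(f"missing field: {field}")
        elif not isinstance(card[field], expected_type):
            errors.append(f"wrong type for {field}")
    return errors


good = {"title": "Revenue", "value": 1.2, "unit": "M USD", "trend": "up"}
bad = {"title": "Revenue", "value": "1.2M"}
```

Returning a list of errors rather than raising on the first one lets the evaluation loop score partial compliance, which matches the multi-dimensional evaluation described above.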
Capabilities:
- Generates metric cards for diverse business contexts (e-commerce, finance, healthcare, etc.)
- Validates JSON structure against predefined schemas
- Evaluates semantic quality and formatting consistency
- Provides training data extraction and filtering utilities
- Includes visualization tools for score distribution analysis
Components:
- `metric_card_generator.py`: Main environment implementation
- `extract_metric_training.py`: Training data extraction utility
- `trainingDataScript.py`: Dataset creation from collected examples
- `show_score_distribution.py`: Performance analysis visualization
Requirements: Pydantic, tqdm
6. UFC Prediction Environment (ufc_prediction_env/)
Author: edmundman
Repository: UFC_FIGHT_PREDICTOR
Purpose: UFC fight prediction with entertaining TTS-ready commentary generation
A creative environment that transforms traditional fight prediction into engaging entertainment by generating dynamic, broadcast-style UFC fight commentary. Features both text-based and image-based prediction modes:
Text-Based Predictor (ufc_server.py):
- Uses comprehensive fighter statistics (wins/losses, physical attributes, performance metrics)
- Generates dramatic fight commentary with commentator personalities
- TTS-ready output with natural speech patterns and emphasis markers
- Statistical analysis wrapped in entertaining storytelling
Image-Based Predictor (ufc_image_env.py):
- Multimodal prediction using fighter profile images
- Visual analysis transformed into engaging commentary
- Base64 image encoding for API compatibility
- Creates dramatic narratives from fighter appearances
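The base64 encoding step mentioned above can be done with the standard library alone. A minimal sketch follows; the data-URL prefix is a common convention for vision-model APIs, assumed here for illustration:

```python
import base64

# In the real environment the bytes would be read from fighter_images/;
# a few in-memory bytes are used here so the sketch is self-contained.
image_bytes = b"\x89PNG\r\n\x1a\n"  # the first bytes of a PNG header

encoded = base64.b64encode(image_bytes).decode("ascii")
data_url = f"data:image/png;base64,{encoded}"
```

The decoded bytes round-trip exactly, which is what makes base64 safe for embedding binary images in JSON API payloads.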
Key Features:
- Entertainment-first approach with broadcast-style commentary
- Direct TTS integration compatibility (designed for models like DIA)
- Dramatic elements including commentator phrases and pauses
- Proper formatting for voice synthesis applications
- Comprehensive scoring system for prediction accuracy and entertainment value
Data Components:
- `fighter_stats.csv`: Detailed fighter statistics and performance metrics
- `large_dataset.csv`: Sample historical fight data (799 records from the original 7,440)
- `fighter_images/`: Profile images for visual-based predictions
- `get_images.py`: Web scraping utility for fighter image collection
Note: The included dataset is a sample for demonstration. The full dataset (7,440 fight records) is available in the original UFC_FIGHT_PREDICTOR repository.
Additional Tools:
- `ufc_predictor_ui.py`: Flask-based web interface for interactive predictions
- Video demonstrations and example runs available
- WandB integration for training tracking
Requirements: PIL, OpenAI API, Flask (for UI), BeautifulSoup4 (for image scraping)
Support
For questions or issues with community environments:
- Check the individual environment's README first
- Open an issue in the main repository
- Tag the environment author if possible
These environments are community contributions and may have different maintenance levels and support compared to core Atropos environments.