# Community Environments

This directory is home to community-contributed training environments for Atropos. Environments submitted by the community will be placed here after an initial code review.

**Note:** Environments in this directory are pending full testing and integration. While they have passed a basic code check, they may not yet have been rigorously validated on our compute cluster.

## Contributing Your Environment

We encourage you to contribute your own RL environments! When developing a new environment, please follow these guidelines:

1. **Create your environment in this `environments/community/` subdirectory.** This helps us keep new submissions organized.
2. **Preferred import style:** We prefer that you treat your environment's directory as the package root for imports within your environment code. For example, if you need to import `SomeClass`, you can do so directly:

   ```python
   from some_file_in_my_env import SomeClass
   ```

   This helps maintain consistency and makes it easier to integrate your environment.

### Environment Standards

Community environments should:

- Include clear documentation and setup instructions
- Specify all dependencies in requirements files
- Provide example configurations and usage
- Follow the `AtroposBaseEnv` pattern for consistency
- Include appropriate error handling and validation

### Submission Process

To contribute a new environment to the community collection:

1. **Fork the repository** and create a new branch
2. **Add your environment** to this `community/` directory
3. **Include comprehensive documentation**:
   - README with setup instructions
   - Requirements file for dependencies
   - Example usage and configuration
4. **Follow naming conventions**:
   - Use descriptive directory names for complex environments
   - Single-file environments should have descriptive names
5. **Test thoroughly** before submitting
6. **Submit a pull request** with a clear description

Once your environment is ready, please follow the guidelines in our main [CONTRIBUTING.md](../../CONTRIBUTING.md) to submit your contribution.

---

## Available Environments

### 1. Lean Proof Environment (`lean_proof_env/`)

**Author**: [GabinFay](https://github.com/GabinFay)
**Purpose**: Testing large language models (LLMs) on Lean theorem proving tasks

A comprehensive environment for evaluating LLMs on formal mathematical reasoning using the Lean theorem prover. Features include:

- Support for custom problem datasets or the MiniF2F benchmark
- Integration with the Lean 4 theorem prover
- Configurable difficulty levels and problem sets
- Automated proof validation

**Requirements**: Lean 4 installation, OpenAI API key

### 2. Router Environment (`router_env/`)

**Author**: [GabinFay](https://github.com/GabinFay)
**Purpose**: Multi-agent routing and coordination system

A sophisticated environment for testing agent routing and coordination capabilities. Includes:

- Multiple specialized agents (calendar, contact, Gmail, telephony, etc.)
- Model Context Protocol (MCP) tools integration
- Spotify, Google Maps, and Perplexity integrations
- Complex multi-turn conversation handling

**Features**:
- Telephony agent with inbound/outbound call handling
- Calendar and contact management
- Memory and calculation agents
- Router agent for intelligent task delegation

### 3. Philosophical RLAIF Environment (`philosophical_rlaif_env.py`)

**Author**: [GabinFay](https://github.com/GabinFay)
**Purpose**: Reinforcement Learning from AI Feedback (RLAIF) for philosophical reasoning

An environment focused on training models for deep philosophical inquiry and reasoning.
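The generate-and-judge preference loop can be sketched in miniature. `generate_pair` and `judge_depth` below are illustrative stand-ins, not the environment's actual API; a real judge would be an LLM call scoring philosophical depth:

```python
# Hypothetical sketch of RLAIF-style preference collection.

def generate_pair(prompt):
    # Stand-in: in the real environment these would be two model samples.
    return (
        "Free will is an illusion.",
        "Whether free will exists depends on how we define 'will': "
        "compatibilists argue determinism and freedom can coexist, "
        "while libertarians require genuine alternative possibilities.",
    )

def judge_depth(prompt, a, b):
    """Toy judge: prefer the response that engages more perspectives.

    A real judge would be an LLM evaluating philosophical depth.
    """
    def score(text):
        markers = ("depends", "argue", "while", "however", "assume")
        return sum(text.lower().count(m) for m in markers)
    return 0 if score(a) >= score(b) else 1

prompt = "Do humans have free will?"
a, b = generate_pair(prompt)
preferred = judge_depth(prompt, a, b)  # index of the preferred response
```

The resulting (prompt, chosen, rejected) triples are what preference learning consumes.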
Features:

- Deep thinking prompts with systematic reasoning processes
- Preference learning for philosophical depth and nuance
- Multi-perspective analysis and assumption questioning
- Evaluation of response quality for philosophical discussions

**Capabilities**:
- Generates paired responses for preference comparison
- Uses judge models to evaluate philosophical depth
- Tracks preference consistency and reasoning quality
- Supports WandB logging for training insights

### 4. Playwright Agent Environment (`playwright_agent_env.py`)

**Author**: [erikqu](https://github.com/erikqu)
**Purpose**: Web automation and browser interaction for LLM agents

A comprehensive environment for training LLMs to interact with web pages through browser automation. Features:

- Playwright-based browser control with headless operation
- Screenshot-based visual input for LLM decision making
- JSON-based action commands (navigate, click, type, finish)
- Video recording of browser sessions for evaluation
- Google Gemini integration for success evaluation

**Capabilities**:
- Loads tasks from the WebVoyager dataset or custom task definitions
- Supports development mode for testing without LLM calls
- Automatic reward computation based on success and efficiency
- Comprehensive error handling and fallback mechanisms
- Integration with the Atropos training pipeline

**Requirements**: Playwright, optional Google Gemini API for evaluation

### 5. Metric Card Generator Environment (`metric_card_generator/`)

**Author**: [vivek100](https://github.com/vivek100)
**Purpose**: Structured JSON generation for AI model evaluation dashboards

A comprehensive environment for training LLMs to generate well-structured JSON configurations for Metric Card UI components.
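To give a sense of the task, here is a hypothetical metric card payload and the kind of minimal structural check the schema validation performs. The field names are illustrative only; the real environment defines its schemas with Pydantic:

```python
import json

# Hypothetical required fields for a metric card (illustrative, not the
# environment's actual schema).
REQUIRED_FIELDS = {"title", "value", "unit", "trend"}

card = {
    "title": "Monthly Active Users",
    "value": 48210,
    "unit": "users",
    "trend": "+4.2%",
}

def is_valid_card(raw: str) -> bool:
    """Reject model output that is not JSON or is missing required fields."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        return False
    return isinstance(parsed, dict) and REQUIRED_FIELDS <= parsed.keys()

print(is_valid_card(json.dumps(card)))        # True
print(is_valid_card('{"title": "Revenue"}'))  # False: missing fields
```

Validity checks like this form the first scoring dimension; semantic quality and formatting are judged separately.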
Features:

- Closed-loop generation, evaluation, and visualization pipeline
- Schema validation for JSON metric card configurations
- Multi-dimensional evaluation (validity, compliance, semantic quality)
- Support for various business domains and metric types
- WandB integration for performance tracking

**Capabilities**:
- Generates metric cards for diverse business contexts (e-commerce, finance, healthcare, etc.)
- Validates JSON structure against predefined schemas
- Evaluates semantic quality and formatting consistency
- Provides training data extraction and filtering utilities
- Includes visualization tools for score distribution analysis

**Components**:
- `metric_card_generator.py`: Main environment implementation
- `extract_metric_training.py`: Training data extraction utility
- `trainingDataScript.py`: Dataset creation from collected examples
- `show_score_distribution.py`: Performance analysis visualization

**Requirements**: Pydantic, tqdm

### 6. UFC Prediction Environment (`ufc_prediction_env/`)

**Author**: [edmundman](https://github.com/edmundman)
**Repository**: [UFC_FIGHT_PREDICTOR](https://github.com/edmundman/UFC_FIGHT_PREDICTOR)
**Purpose**: UFC fight prediction with entertaining TTS-ready commentary generation

A creative environment that transforms traditional fight prediction into engaging entertainment by generating dynamic, broadcast-style UFC fight commentary.
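The image-based predictor described below sends fighter images to the model as base64 text. A minimal sketch of that encoding step, using Python's standard `base64` module (the file path and payload shape are illustrative, not the environment's exact API):

```python
import base64

def encode_image(path: str) -> str:
    """Read an image file and return base64 text for a JSON API payload."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")

# Demonstrated here with in-memory bytes instead of a real fighter image:
fake_png = b"\x89PNG\r\n\x1a\n" + b"\x00" * 8
encoded = base64.b64encode(fake_png).decode("utf-8")

# Many vision APIs accept the result embedded as a data URL:
data_url = f"data:image/png;base64,{encoded}"
```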
It features both text-based and image-based prediction modes:

**Text-Based Predictor (`ufc_server.py`)**:
- Uses comprehensive fighter statistics (wins/losses, physical attributes, performance metrics)
- Generates dramatic fight commentary with commentator personalities
- TTS-ready output with natural speech patterns and emphasis markers
- Statistical analysis wrapped in entertaining storytelling

**Image-Based Predictor (`ufc_image_env.py`)**:
- Multimodal prediction using fighter profile images
- Visual analysis transformed into engaging commentary
- Base64 image encoding for API compatibility
- Creates dramatic narratives from fighter appearances

**Key Features**:
- Entertainment-first approach with broadcast-style commentary
- Direct TTS integration compatibility (designed for models like DIA)
- Dramatic elements including commentator phrases and pauses
- Proper formatting for voice synthesis applications
- Comprehensive scoring system for prediction accuracy and entertainment value

**Data Components**:
- `fighter_stats.csv`: Detailed fighter statistics and performance metrics
- `large_dataset.csv`: Sample historical fight data (799 records from the original 7,440)
- `fighter_images/`: Profile images for visual-based predictions
- `get_images.py`: Web scraping utility for fighter image collection

**Note**: The included dataset is a sample for demonstration. The full dataset (7,440 fight records) is available in the original [UFC_FIGHT_PREDICTOR repository](https://github.com/edmundman/UFC_FIGHT_PREDICTOR).

**Additional Tools**:
- `ufc_predictor_ui.py`: Flask-based web interface for interactive predictions
- Video demonstrations and example runs available
- WandB integration for training tracking

**Requirements**: PIL, OpenAI API, Flask (for UI), BeautifulSoup4 (for image scraping)

### 7. Accessibility Auto-Fixer Environment (`accessibility_env/`)

**Author**: [joshgarza](https://github.com/joshgarza)
**Purpose**: Automated web accessibility remediation using WCAG guidelines

A specialized environment for training LLMs to automatically identify and fix web accessibility issues in HTML snippets. The environment focuses on objective, rule-based WCAG compliance improvements with minimal code changes.

**Features**:
- Rule-based scoring system for WCAG 2.1 AA compliance
- Support for multiple accessibility criteria (alt text, form labels, link text)
- BeautifulSoup-based HTML parsing and validation
- Automated scoring for accessibility improvements
- Integration with common accessibility testing patterns

**Targeted WCAG Criteria**:
- **Images**: Missing or empty `alt` attributes (WCAG 1.1.1)
- **Form Labels**: Improper `