Add vivek100's Metric Card Generator Environment to community - Added comprehensive environment for JSON UI component generation with schema validation and multi-dimensional evaluation - Fixed all linting issues and updated community README with proper attribution

2026-04-19 12:57:58 +00:00 · 2025-05-23 15:11:03 +10:00 · 2025-05-23 15:11:03 +10:00 · 27ea15e544
commit 27ea15e544
parent 881b2f522f
11 changed files with 16363 additions and 0 deletions
--- a/environments/community/metric_card_generator/README.md
+++ b/environments/community/metric_card_generator/README.md
@ -0,0 +1,65 @@
+# Metric Card Generator Environment
+
+## Design and Motivation
+
+This environment generates structured JSON configurations for Metric Card UI components for AI model evaluation dashboards. It demonstrates a closed-loop generation, evaluation, and visualization pipeline using Atropos.
+
+The environment challenges language models to produce well-structured, valid JSON metric card configurations that can be directly used in front-end applications. This tests the model's ability to:
+- Follow specific schema requirements
+- Generate complex nested structures
+- Maintain consistent JSON formatting
+- Create semantically meaningful metric descriptions
+
+## Quickstart
+
+```bash
+# Install dependencies
+pip install -r requirements.txt
+
+# Run the environment with process command to generate rollouts
+python metric_card_generator.py process --env.data_path_to_save_groups artifacts/metric_rollouts.jsonl
+
+# View the generated HTML visualization
+# Open artifacts/metric_rollouts.html in a browser
+```
+
+## Environment Components
+
+- **metric_card_generator.py**: Main environment implementation with prompting and evaluation logic
+- **extract_metric_training.py**: Utility to extract high-quality examples for training
+- **trainingDataScript.py**: Creates training datasets from collected examples
+- **show_score_distribution.py**: Visualization tool for analyzing model performance
+
+## Artifacts
+
+The artifacts folder includes:
+- **metric_rollouts.jsonl**: Raw model outputs with scores
+- **metric_rollouts.html**: Visualization of model outputs and scores
+- **metric_training.jsonl**: Processed examples suitable for fine-tuning
+- **metric_training_high_quality.jsonl**: Filtered high-quality examples
+
+## Evaluation Metrics
+
+The environment evaluates model outputs on several dimensions:
+
+- **JSON Validity**: Whether the output is valid, parseable JSON
+- **Schema Compliance**: Whether the output follows the required structure
+- **Semantic Quality**: Whether the metrics described make sense for the given context
+- **Formatting**: Proper nesting, field types, and attribute consistency
+
+## WandB Integration
+
+Performance metrics are logged to Weights & Biases, including:
+- Percent of valid JSON responses
+- Average scores across evaluation criteria
+- Token usage efficiency
+- Examples of best and worst performing generations
+
+## Use with Training
+
+This environment can be integrated into the Atropos training loop to improve a model's ability to generate structured JSON output:
+
+```bash
+# Example training command
+python example_trainer/trainer.py --environment metric_card_generator --model your_model --iterations 1000
+```