mirror of
https://github.com/NousResearch/atropos.git
synced 2026-04-19 12:57:58 +00:00
Add vivek100's Metric Card Generator Environment to community - Added comprehensive environment for JSON UI component generation with schema validation and multi-dimensional evaluation - Fixed all linting issues and updated community README with proper attribution
This commit is contained in:
parent
881b2f522f
commit
27ea15e544
11 changed files with 16363 additions and 0 deletions
65
environments/community/metric_card_generator/README.md
Normal file
65
environments/community/metric_card_generator/README.md
Normal file
|
|
@ -0,0 +1,65 @@
|
|||
# Metric Card Generator Environment
|
||||
|
||||
## Design and Motivation
|
||||
|
||||
This environment generates structured JSON configurations for Metric Card UI components for AI model evaluation dashboards. It demonstrates a closed-loop generation, evaluation, and visualization pipeline using Atropos.
|
||||
|
||||
The environment challenges language models to produce well-structured, valid JSON metric card configurations that can be directly used in front-end applications. This tests the model's ability to:
|
||||
- Follow specific schema requirements
|
||||
- Generate complex nested structures
|
||||
- Maintain consistent JSON formatting
|
||||
- Create semantically meaningful metric descriptions
|
||||
|
||||
## Quickstart
|
||||
|
||||
```bash
|
||||
# Install dependencies
|
||||
pip install -r requirements.txt
|
||||
|
||||
# Run the environment with process command to generate rollouts
|
||||
python metric_card_generator.py process --env.data_path_to_save_groups artifacts/metric_rollouts.jsonl
|
||||
|
||||
# View the generated HTML visualization
|
||||
# Open artifacts/metric_rollouts.html in a browser
|
||||
```
|
||||
|
||||
## Environment Components
|
||||
|
||||
- **metric_card_generator.py**: Main environment implementation with prompting and evaluation logic
|
||||
- **extract_metric_training.py**: Utility to extract high-quality examples for training
|
||||
- **trainingDataScript.py**: Creates training datasets from collected examples
|
||||
- **show_score_distribution.py**: Visualization tool for analyzing model performance
|
||||
|
||||
## Artifacts
|
||||
|
||||
The artifacts folder includes:
|
||||
- **metric_rollouts.jsonl**: Raw model outputs with scores
|
||||
- **metric_rollouts.html**: Visualization of model outputs and scores
|
||||
- **metric_training.jsonl**: Processed examples suitable for fine-tuning
|
||||
- **metric_training_high_quality.jsonl**: Filtered high-quality examples
|
||||
|
||||
## Evaluation Metrics
|
||||
|
||||
The environment evaluates model outputs on several dimensions:
|
||||
|
||||
- **JSON Validity**: Whether the output is valid, parseable JSON
|
||||
- **Schema Compliance**: Whether the output follows the required structure
|
||||
- **Semantic Quality**: Whether the metrics described make sense for the given context
|
||||
- **Formatting**: Proper nesting, field types, and attribute consistency
|
||||
|
||||
## WandB Integration
|
||||
|
||||
Performance metrics are logged to Weights & Biases, including:
|
||||
- Percent of valid JSON responses
|
||||
- Average scores across evaluation criteria
|
||||
- Token usage efficiency
|
||||
- Examples of best and worst performing generations
|
||||
|
||||
## Use with Training
|
||||
|
||||
This environment can be integrated into the Atropos training loop to improve a model's ability to generate structured JSON output:
|
||||
|
||||
```bash
|
||||
# Example training command
|
||||
python example_trainer/trainer.py --environment metric_card_generator --model your_model --iterations 1000
|
||||
```
|
||||
Loading…
Add table
Add a link
Reference in a new issue