mirror of
https://github.com/NousResearch/atropos.git
synced 2026-04-24 17:04:55 +00:00
Move BLEUBERI environment to community folder
- Moved environments/bleuberi to environments/community/bleuberi
- Updated .gitmodules to reflect new submodule path
- Fixed pre-commit formatting issues
- Cleaned up test output files
This commit is contained in:
parent
532024d01e
commit
0f6c06bb56
8 changed files with 16 additions and 9 deletions
104
environments/community/bleuberi/README.md
Normal file
# BLEUBERI Environment for Atropos

This environment implements the BLEUBERI approach to instruction following, using BLEU scores as rewards. BLEUBERI (BLEU-Based Enhanced Utility for Better Evaluating Reward in Instruction-following) demonstrates that BLEU, when paired with high-quality references from strong LLMs, can be a highly effective reward for training models to follow instructions.

## Overview

BLEUBERI uses BLEU scores (a simple n-gram matching metric) directly as rewards in a Group Relative Policy Optimization (GRPO) training framework. The approach:

1. Collects high-quality reference responses from top LLMs (Claude, Gemini, etc.)
2. Computes BLEU scores by comparing model outputs to these references
3. Uses these scores as rewards to train models through GRPO
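The three steps above can be sketched end to end. This is an illustrative toy, not the environment's code: it substitutes a simplified smoothed sentence BLEU for `sacrebleu`, and the group-relative normalization mirrors how GRPO turns a group's raw rewards into advantages.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Simplified sentence BLEU: smoothed clipped n-gram precision + brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_prec = 0.0
    for n in range(1, max_n + 1):
        c_grams, r_grams = ngrams(cand, n), ngrams(ref, n)
        overlap = sum((c_grams & r_grams).values())  # clipped matches
        total = max(sum(c_grams.values()), 1)
        log_prec += math.log((overlap + 1) / (total + 1)) / max_n  # add-1 smoothing
    brevity = min(1.0, math.exp(1 - len(ref) / len(cand)))  # penalize short outputs
    return brevity * math.exp(log_prec)


def group_relative_advantages(rewards):
    """GRPO-style normalization: advantage = (reward - group mean) / group std."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var) or 1.0  # avoid division by zero for identical rewards
    return [(r - mean) / std for r in rewards]


# One "group" of sampled responses scored against a gold reference
reference = "the cat sat on the mat"
group = ["the cat sat on the mat", "a cat is on a mat", "dogs run fast"]
rewards = [bleu(c, reference) for c in group]
advantages = group_relative_advantages(rewards)
```

A response that matches the reference exactly gets reward 1.0 and a positive advantage; off-topic responses fall below the group mean and get negative advantages, which is the signal GRPO trains on.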
## Installation

Before using the BLEUBERI environment, install its dependencies:

```bash
# Install the required dependencies
pip install -r environments/community/bleuberi/requirements.txt
```

The key dependencies include:

- `model2vec`: for embedding-based similarity metrics
- `bert-score`: for semantic similarity evaluation
- `sacrebleu`: for BLEU score calculation
- `evaluate`: for evaluation metrics
- `datasets`: for dataset handling
## Features

- BLEU-based reward functions (with support for multiple reference models)
- Compatible with the Atropos asynchronous environment framework
- Support for both SFT and GRPO training approaches
- Evaluation on instruction-following benchmarks
## Usage

```bash
# Run the BLEUBERI environment as a service
python -m environments.community.bleuberi.bleuberi_env serve --config environments/community/bleuberi/configs/default.yaml

# Generate data with pre-collected references (for testing and debugging)
python -m environments.community.bleuberi.bleuberi_env process --config environments/community/bleuberi/configs/default.yaml --env.data_path_to_save_groups bleuberi_rollouts.jsonl
```
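The `process` command saves rollout groups as JSON Lines, one JSON object per line. A minimal reader sketch; the field names used here (`prompt`, `responses`, `scores`) are assumptions for illustration, not the environment's actual schema.

```python
import json


def read_groups(path):
    """Yield one rollout group per non-blank line of a JSONL file."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)


# Tiny demo with a synthetic file in the assumed schema
with open("demo_rollouts.jsonl", "w", encoding="utf-8") as fh:
    fh.write(json.dumps({"prompt": "Say hi", "responses": ["hi"], "scores": [0.9]}) + "\n")
    fh.write(json.dumps({"prompt": "Add 2+2", "responses": ["4"], "scores": [1.0]}) + "\n")

groups = list(read_groups("demo_rollouts.jsonl"))
```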
## Testing with OpenAI API

The BLEUBERI environment can be tested against the OpenAI API or any compatible API server. The API key is loaded securely from an environment variable:

1. Set your OpenAI API key as an environment variable:

   ```bash
   export OPENAI_API_KEY=your-api-key
   ```

2. Create or modify a configuration file for OpenAI (e.g., `environments/community/bleuberi/configs/openai.yaml`):

   ```yaml
   env:
     # Standard environment configuration
     wandb_name: bleuberi
     dataset_name: "allenai/tulu-3-sft-mixture"
     reward_funcs:
       - "bleu"
     ref_models:
       - "gold"

   openai:
     base_url: "https://api.openai.com/v1"  # Or your custom server URL
     model: "gpt-4o"  # Or your preferred model
     temperature: 0.7
     max_tokens: 1024
     top_p: 0.95
   ```

3. Run the environment in process mode to test with OpenAI:

   ```bash
   python -m environments.community.bleuberi.bleuberi_env process \
     --config environments/community/bleuberi/configs/openai.yaml \
     --env.data_path_to_save_groups bleuberi_openai_test.jsonl
   ```

   This will create two files:
   - `bleuberi_openai_test.jsonl`: raw data containing prompts, responses, and scores
   - `bleuberi_openai_test.html`: a visual rendering of the interactions for easy review

4. For local inference server testing:
   - Set `base_url` to your local server (e.g., `"http://localhost:8000/v1"`)
   - Specify the model name as expected by your server

5. For custom reference models:
   - Configure `ref_models` in the YAML to use specific models
   - Available options include: `gold` (default), `claude-3-7-sonnet@20250219`, `deepseek-chat-v3`, `gemini-2.5-pro-exp-03-25`, `o4-mini-2025-04-16`, `Llama-3.1-8B-Instruct`
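The steps above rely on the standard OpenAI chat-completions request shape. A minimal sketch of how a client might assemble such a request from these config values; the helper name and config dict are illustrative, not the environment's actual API.

```python
import os


def build_chat_request(cfg, prompt):
    """Assemble an OpenAI-compatible chat-completions request (illustrative)."""
    api_key = os.environ.get("OPENAI_API_KEY")  # key comes from the environment, never the config file
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    payload = {
        "model": cfg["model"],
        "messages": [{"role": "user", "content": prompt}],
        "temperature": cfg["temperature"],
        "max_tokens": cfg["max_tokens"],
        "top_p": cfg["top_p"],
    }
    url = cfg["base_url"].rstrip("/") + "/chat/completions"
    return url, headers, payload


cfg = {"base_url": "https://api.openai.com/v1", "model": "gpt-4o",
       "temperature": 0.7, "max_tokens": 1024, "top_p": 0.95}
os.environ.setdefault("OPENAI_API_KEY", "test-key")  # demo only
url, headers, payload = build_chat_request(cfg, "Write a haiku about BLEU.")
```

Because only `base_url` and `model` change, the same request builder works unmodified against a local inference server.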
## Configuration

See the `configs/` directory for example configurations. The environment supports:

- Using pre-collected references or generating references on the fly
- Multiple reference models for more robust BLEU scoring
- Various BLEU calculation parameters
- Different dataset sources (default: the Tulu 3 mixture)
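Multi-reference scoring is what makes the `ref_models` list useful: n-gram counts are clipped against the best-matching reference, so a candidate is not penalized for agreeing with any one of several good answers. A stdlib-only sketch of the idea at the unigram level (real BLEU applies this to 1-grams through 4-grams; this helper is illustrative):

```python
from collections import Counter


def clipped_unigram_precision(candidate, references):
    """Unigram precision with counts clipped by the per-token max across references."""
    cand = Counter(candidate.split())
    max_ref = Counter()
    for ref in references:  # take the element-wise max count over all references
        for tok, cnt in Counter(ref.split()).items():
            max_ref[tok] = max(max_ref[tok], cnt)
    clipped = sum(min(cnt, max_ref[tok]) for tok, cnt in cand.items())
    return clipped / max(sum(cand.values()), 1)


candidate = "the cat sat on the mat"
single = clipped_unigram_precision(candidate, ["the cat slept on a rug"])
multi = clipped_unigram_precision(candidate, ["the cat slept on a rug",
                                             "a cat sat on the mat"])
```

Adding a second reference can only raise the clipped counts, so the multi-reference score is never lower than the single-reference one.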
## References

This implementation is based on the paper [BLEUBERI: BLEU is a surprisingly effective reward for instruction following](https://arxiv.org/abs/2505.11080) and its [original implementation](https://github.com/lilakk/BLEUBERI).