mirror of
https://github.com/NousResearch/atropos.git
synced 2026-04-24 17:04:55 +00:00
Move BLEUBERI environment to community folder
- Moved environments/bleuberi to environments/community/bleuberi
- Updated .gitmodules to reflect new submodule path
- Fixed pre-commit formatting issues
- Cleaned up test output files
This commit is contained in:
parent
532024d01e
commit
0f6c06bb56
8 changed files with 16 additions and 9 deletions
104
environments/community/bleuberi/README.md
Normal file
# BLEUBERI Environment for Atropos

This environment implements the BLEUBERI approach to instruction following, using BLEU scores as rewards. BLEUBERI (BLEU-Based Enhanced Utility for Better Evaluating Reward in Instruction-following) demonstrates that BLEU, when paired with high-quality references from strong LLMs, can be a highly effective reward for training models to follow instructions.

## Overview

BLEUBERI uses BLEU scores (a simple n-gram matching metric) directly as rewards in a Group Relative Policy Optimization (GRPO) training framework. The approach:

1. Collects high-quality reference responses from top LLMs (Claude, Gemini, etc.)
2. Computes BLEU scores by comparing model outputs to these references
3. Uses these scores as rewards to train models through GRPO
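The three steps above can be sketched end to end. This is an illustrative toy, not the environment's code: it substitutes a simplified smoothed sentence BLEU for `sacrebleu`, and the group-relative normalization mirrors how GRPO turns a group's raw rewards into advantages.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate, reference, max_n=4):
    """Simplified sentence BLEU: smoothed clipped n-gram precision + brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_prec = 0.0
    for n in range(1, max_n + 1):
        c_grams, r_grams = ngrams(cand, n), ngrams(ref, n)
        overlap = sum((c_grams & r_grams).values())  # clipped matches
        total = max(sum(c_grams.values()), 1)
        log_prec += math.log((overlap + 1) / (total + 1)) / max_n  # add-1 smoothing
    brevity = min(1.0, math.exp(1 - len(ref) / len(cand)))  # penalize short outputs
    return brevity * math.exp(log_prec)


def group_relative_advantages(rewards):
    """GRPO-style normalization: advantage = (reward - group mean) / group std."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = math.sqrt(var) or 1.0  # avoid division by zero for identical rewards
    return [(r - mean) / std for r in rewards]


# One "group" of sampled responses scored against a gold reference
reference = "the cat sat on the mat"
group = ["the cat sat on the mat", "a cat is on a mat", "dogs run fast"]
rewards = [bleu(c, reference) for c in group]
advantages = group_relative_advantages(rewards)
```

A response that matches the reference exactly gets reward 1.0 and a positive advantage; off-topic responses fall below the group mean and get negative advantages, which is the signal GRPO trains on.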
## Installation

Before using the BLEUBERI environment, install its dependencies:

```bash
# Install the required dependencies
pip install -r environments/community/bleuberi/requirements.txt
```

The key dependencies include:

- `model2vec`: for embedding-based similarity metrics
- `bert-score`: for semantic similarity evaluation
- `sacrebleu`: for BLEU score calculation
- `evaluate`: for evaluation metrics
- `datasets`: for dataset handling
## Features

- BLEU-based reward functions (with support for multiple reference models)
- Compatible with the Atropos asynchronous environment framework
- Support for both SFT and GRPO training approaches
- Evaluation on instruction-following benchmarks
## Usage

```bash
# Run the BLEUBERI environment as a service
python -m environments.community.bleuberi.bleuberi_env serve --config environments/community/bleuberi/configs/default.yaml

# Generate data with pre-collected references (for testing and debugging)
python -m environments.community.bleuberi.bleuberi_env process --config environments/community/bleuberi/configs/default.yaml --env.data_path_to_save_groups bleuberi_rollouts.jsonl
```
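The `process` command saves rollout groups as JSON Lines, one JSON object per line. A minimal reader sketch; the field names used here (`prompt`, `responses`, `scores`) are assumptions for illustration, not the environment's actual schema.

```python
import json


def read_groups(path):
    """Yield one rollout group per non-blank line of a JSONL file."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)


# Tiny demo with a synthetic file in the assumed schema
with open("demo_rollouts.jsonl", "w", encoding="utf-8") as fh:
    fh.write(json.dumps({"prompt": "Say hi", "responses": ["hi"], "scores": [0.9]}) + "\n")
    fh.write(json.dumps({"prompt": "Add 2+2", "responses": ["4"], "scores": [1.0]}) + "\n")

groups = list(read_groups("demo_rollouts.jsonl"))
```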
## Testing with OpenAI API

The BLEUBERI environment can be tested against the OpenAI API or any compatible API server. The API key is loaded securely from an environment variable:

1. Set your OpenAI API key as an environment variable:

   ```bash
   export OPENAI_API_KEY=your-api-key
   ```

2. Create or modify a configuration file for OpenAI (e.g., `environments/community/bleuberi/configs/openai.yaml`):

   ```yaml
   env:
     # Standard environment configuration
     wandb_name: bleuberi
     dataset_name: "allenai/tulu-3-sft-mixture"
     reward_funcs:
       - "bleu"
     ref_models:
       - "gold"

   openai:
     base_url: "https://api.openai.com/v1"  # Or your custom server URL
     model: "gpt-4o"  # Or your preferred model
     temperature: 0.7
     max_tokens: 1024
     top_p: 0.95
   ```

3. Run the environment in process mode to test with OpenAI:

   ```bash
   python -m environments.community.bleuberi.bleuberi_env process \
     --config environments/community/bleuberi/configs/openai.yaml \
     --env.data_path_to_save_groups bleuberi_openai_test.jsonl
   ```

   This will create two files:
   - `bleuberi_openai_test.jsonl`: raw data containing prompts, responses, and scores
   - `bleuberi_openai_test.html`: a visual rendering of the interactions for easy review

4. For local inference server testing:
   - Set `base_url` to your local server (e.g., `"http://localhost:8000/v1"`)
   - Specify the model name as expected by your server

5. For custom reference models:
   - Configure `ref_models` in the YAML to use specific models
   - Available options include: `gold` (default), `claude-3-7-sonnet@20250219`, `deepseek-chat-v3`, `gemini-2.5-pro-exp-03-25`, `o4-mini-2025-04-16`, `Llama-3.1-8B-Instruct`
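The steps above rely on the standard OpenAI chat-completions request shape. A minimal sketch of how a client might assemble such a request from these config values; the helper name and config dict are illustrative, not the environment's actual API.

```python
import os


def build_chat_request(cfg, prompt):
    """Assemble an OpenAI-compatible chat-completions request (illustrative)."""
    api_key = os.environ.get("OPENAI_API_KEY")  # key comes from the environment, never the config file
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    headers = {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    payload = {
        "model": cfg["model"],
        "messages": [{"role": "user", "content": prompt}],
        "temperature": cfg["temperature"],
        "max_tokens": cfg["max_tokens"],
        "top_p": cfg["top_p"],
    }
    url = cfg["base_url"].rstrip("/") + "/chat/completions"
    return url, headers, payload


cfg = {"base_url": "https://api.openai.com/v1", "model": "gpt-4o",
       "temperature": 0.7, "max_tokens": 1024, "top_p": 0.95}
os.environ.setdefault("OPENAI_API_KEY", "test-key")  # demo only
url, headers, payload = build_chat_request(cfg, "Write a haiku about BLEU.")
```

Because only `base_url` and `model` change, the same request builder works unmodified against a local inference server.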
## Configuration

See the `configs/` directory for example configurations. The environment supports:

- Using pre-collected references or generating references on the fly
- Multiple reference models for more robust BLEU scoring
- Various BLEU calculation parameters
- Different dataset sources (default: the Tulu 3 mixture)
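Multi-reference scoring is what makes the `ref_models` list useful: n-gram counts are clipped against the best-matching reference, so a candidate is not penalized for agreeing with any one of several good answers. A stdlib-only sketch of the idea at the unigram level (real BLEU applies this to 1-grams through 4-grams; this helper is illustrative):

```python
from collections import Counter


def clipped_unigram_precision(candidate, references):
    """Unigram precision with counts clipped by the per-token max across references."""
    cand = Counter(candidate.split())
    max_ref = Counter()
    for ref in references:  # take the element-wise max count over all references
        for tok, cnt in Counter(ref.split()).items():
            max_ref[tok] = max(max_ref[tok], cnt)
    clipped = sum(min(cnt, max_ref[tok]) for tok, cnt in cand.items())
    return clipped / max(sum(cand.values()), 1)


candidate = "the cat sat on the mat"
single = clipped_unigram_precision(candidate, ["the cat slept on a rug"])
multi = clipped_unigram_precision(candidate, ["the cat slept on a rug",
                                             "a cat sat on the mat"])
```

Adding a second reference can only raise the clipped counts, so the multi-reference score is never lower than the single-reference one.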
## References

This implementation is based on the paper [BLEUBERI: BLEU is a surprisingly effective reward for instruction following](https://arxiv.org/abs/2505.11080) and its [original implementation](https://github.com/lilakk/BLEUBERI).