Update BLEUBERI README with OpenAI API instructions and remove redundant reward functions

This commit is contained in:
Allan Niemerg 2025-06-09 07:07:28 -05:00
parent a520f5f663
commit 3109fe349b
2 changed files with 51 additions and 198 deletions


@@ -20,13 +20,60 @@ BLEUBERI uses BLEU scores (a simple n-gram matching metric) directly as rewards
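For intuition, the BLEU-as-reward idea can be sketched as a toy, smoothed sentence-level BLEU in pure Python. This is an illustration only, not the environment's actual scoring code, which may differ in tokenization and smoothing:

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    """Toy sentence-level BLEU: smoothed n-gram precision with a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_prec = 0.0
    for n in range(1, max_n + 1):
        c_grams, r_grams = ngrams(cand, n), ngrams(ref, n)
        overlap = sum((c_grams & r_grams).values())  # clipped n-gram matches
        total = max(sum(c_grams.values()), 1)
        # Add-one smoothing so one empty n-gram order does not zero the score.
        log_prec += math.log((overlap + 1) / (total + 1))
    # Brevity penalty discourages overly short candidates.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / max(len(cand), 1))
    return bp * math.exp(log_prec / max_n)
```

A higher score means more n-gram overlap with the reference, which is exactly the signal used as the reward.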
## Usage
```bash
# Run the BLEUBERI environment as a service
python -m environments.bleuberi.bleuberi_env serve --config environments/bleuberi/configs/default.yaml
# Generate data with pre-collected references (for testing and debugging)
python -m environments.bleuberi.bleuberi_env process --config environments/bleuberi/configs/default.yaml --env.data_path_to_save_groups bleuberi_rollouts.jsonl
```
## Testing with OpenAI API
The BLEUBERI environment can be tested with the OpenAI API or any OpenAI-compatible API server. The API key is loaded securely from environment variables:
1. Set your OpenAI API key as an environment variable:
```bash
export OPENAI_API_KEY=your-api-key
```
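The key is read from the process environment at startup. A minimal sketch of that pattern (the helper name `load_api_key` is illustrative, not the environment's actual function):

```python
import os

def load_api_key(var: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment, failing loudly if it is unset."""
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(f"{var} is not set; export it before starting the environment")
    return key
```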
2. Create or modify a configuration file for OpenAI (e.g., `environments/bleuberi/configs/openai.yaml`):
```yaml
env:
  # Standard environment configuration
  wandb_name: bleuberi
  dataset_name: "allenai/tulu-3-sft-mixture"
  reward_funcs:
    - "bleu"
  ref_models:
    - "gold"

openai:
  base_url: "https://api.openai.com/v1"  # Or your custom server URL
  model: "gpt-4o"  # Or your preferred model
  temperature: 0.7
  max_tokens: 1024
  top_p: 0.95
```
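To sanity-check a config before launching, it can be parsed with PyYAML (assuming PyYAML is installed; the keys below mirror the example above):

```python
import yaml  # PyYAML

CONFIG = """
env:
  wandb_name: bleuberi
  reward_funcs:
    - "bleu"
openai:
  base_url: "https://api.openai.com/v1"
  model: "gpt-4o"
"""

# Parse the YAML and confirm the sections the environment expects are present.
cfg = yaml.safe_load(CONFIG)
print(cfg["openai"]["model"], cfg["env"]["reward_funcs"])
```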
3. Run the environment in process mode to test with OpenAI:
```bash
python -m environments.bleuberi.bleuberi_env process \
--config environments/bleuberi/configs/openai.yaml \
--env.data_path_to_save_groups bleuberi_openai_test.jsonl
```
This will create two files:
- `bleuberi_openai_test.jsonl`: Raw data containing prompts, responses, and scores
- `bleuberi_openai_test.html`: A visual representation of the interactions for easy review
4. For local inference server testing:
- Set `base_url` to your local server (e.g., "http://localhost:8000/v1")
- Specify the model name as expected by your server
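For instance, the `openai` block for a local server might look like this (the model name shown is a hypothetical placeholder; use whatever name your server exposes):

```yaml
openai:
  base_url: "http://localhost:8000/v1"
  model: "my-local-model"  # hypothetical; match your server's registered name
```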
5. For custom reference models:
- Configure `ref_models` in the YAML to use specific models
- Available options include: gold (default), claude-3-7-sonnet@20250219, deepseek-chat-v3, gemini-2.5-pro-exp-03-25, o4-mini-2025-04-16, Llama-3.1-8B-Instruct
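For example, to score against both the gold references and one of the listed models, the `env` block can be extended like so (same schema as the config above):

```yaml
env:
  ref_models:
    - "gold"
    - "deepseek-chat-v3"
```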
## Configuration
See the `configs/` directory for example configurations. The environment supports: