Introduces a `log_eval_sample()` method that stream-writes individual
evaluation samples to `samples.jsonl` during evaluation, with lazy
writer initialization and automatic HTML generation on completion.
Updates the GSM8K environment to use this streaming approach instead of
batching samples.
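For illustration, a minimal Python sketch of the lazy streaming pattern described above. The `EvalSampleLogger` class, its `close()` method, and the `on_complete` callback (standing in for the HTML generation step) are hypothetical names, not the actual implementation; only `log_eval_sample()` and `samples.jsonl` come from this change.

```python
import json
import os
from typing import Any, Callable, Optional, TextIO


class EvalSampleLogger:
    """Hypothetical sketch: stream evaluation samples to samples.jsonl."""

    def __init__(self, out_dir: str,
                 on_complete: Optional[Callable[[str], None]] = None) -> None:
        self.path = os.path.join(out_dir, "samples.jsonl")
        # Assumed hook for post-run work such as generating an HTML report.
        self.on_complete = on_complete
        self._writer: Optional[TextIO] = None  # opened lazily on first sample

    def log_eval_sample(self, sample: dict[str, Any]) -> None:
        # Lazy initialization: the file is only created once a sample arrives.
        if self._writer is None:
            os.makedirs(os.path.dirname(self.path) or ".", exist_ok=True)
            self._writer = open(self.path, "w", encoding="utf-8")
        # One JSON object per line (JSONL); flush so each sample is written out now.
        self._writer.write(json.dumps(sample) + "\n")
        self._writer.flush()

    def close(self) -> None:
        # Only finalize (and trigger the completion hook) if something was logged.
        if self._writer is not None:
            self._writer.close()
            self._writer = None
            if self.on_complete is not None:
                self.on_complete(self.path)
```

Opening the file on the first call means a run that never logs a sample leaves no empty `samples.jsonl` behind, and flushing after each line keeps already-written samples readable if the process is interrupted mid-evaluation.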
Automatically save the final merged evaluation configuration to
`evaluate_config.yaml` in the `data_dir_to_save_evals` directory. This
includes the env config, the OpenAI/server configs, and the server manager
settings, making evaluation runs reproducible and easier to debug.
The config is saved after all merging (CLI args > YAML > defaults), so it
captures the exact configuration used for the evaluation.
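A minimal sketch of the CLI args > YAML > defaults precedence, assuming each layer has already been parsed into a nested dict. `deep_merge`, `merge_configs`, and the example keys are hypothetical; the merged result is what would then be written to `evaluate_config.yaml` (for example with PyYAML's `yaml.safe_dump`, omitted here to keep the sketch dependency-free).

```python
from typing import Any


def deep_merge(base: dict[str, Any], override: dict[str, Any]) -> dict[str, Any]:
    """Return a new dict where values in `override` win over those in `base`."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Recurse so nested sections (env, server, ...) merge key by key.
            merged[key] = deep_merge(merged[key], value)
        elif value is not None:
            # None means "not set" (e.g. an omitted CLI flag) and never overrides.
            merged[key] = value
    return merged


def merge_configs(defaults: dict[str, Any], yaml_cfg: dict[str, Any],
                  cli_args: dict[str, Any]) -> dict[str, Any]:
    """Hypothetical helper: apply CLI args > YAML > defaults precedence."""
    return deep_merge(deep_merge(defaults, yaml_cfg), cli_args)


defaults = {"env": {"temperature": 0.0, "num_samples": 100}, "server": {"timeout": 30}}
yaml_cfg = {"env": {"num_samples": 500}}
cli_args = {"env": {"temperature": 0.7}, "server": {"timeout": None}}
print(merge_configs(defaults, yaml_cfg, cli_args))
# {'env': {'temperature': 0.7, 'num_samples': 500}, 'server': {'timeout': 30}}
```

Treating `None` as "unset" is one design choice among several; it keeps flags the user never passed from clobbering values that came from the YAML file.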
* removed changes to other files
* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
* fail when scores are empty
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
- Add `min_batch_allocation` parameter to ensure environments contribute a minimum proportion to each batch
- Implement `grab_batch_with_minimum_allocations` function, which scales allocations down when they sum to more than 100%
- Add mixed-size group buffering to handle variable-sized data submissions
- Update the server to use the minimum allocation logic when any env has `min_batch_allocation` set
- Add comprehensive tests for minimum allocation scenarios
- Update documentation in the API README and CONFIG.md
- Update example environments to demonstrate the feature
This feature allows critical environments to guarantee they contribute at least a specified proportion (0.0-1.0) to each training batch, ensuring important data sources are always represented during training.
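A minimal sketch of how per-environment minimums might be turned into slot counts for a single batch. `minimum_slots` is a hypothetical helper, not the body of `grab_batch_with_minimum_allocations`, and proportional scaling is an assumption about what the over-100% case does; the rest of each batch would presumably be filled from the regular queue.

```python
import math


def minimum_slots(batch_size: int, min_allocations: dict[str, float]) -> dict[str, int]:
    """Hypothetical sketch: minimum items each env must contribute to one batch.

    Assumes each allocation is a fraction in [0.0, 1.0]. If the fractions sum to
    more than 1.0, they are scaled down proportionally so the minimums still fit.
    """
    for env, frac in min_allocations.items():
        if not 0.0 <= frac <= 1.0:
            raise ValueError(f"{env}: min_batch_allocation must be in [0.0, 1.0], got {frac}")
    total = sum(min_allocations.values())
    # Proportional scaling only applies when the minimums are over-committed.
    divisor = total if total > 1.0 else 1.0
    # floor() rounds down, so the minimums never exceed batch_size in total.
    return {env: math.floor(frac / divisor * batch_size)
            for env, frac in min_allocations.items()}


print(minimum_slots(64, {"env_a": 0.25, "env_b": 0.5}))   # {'env_a': 16, 'env_b': 32}
print(minimum_slots(64, {"env_a": 0.75, "env_b": 0.75}))  # {'env_a': 32, 'env_b': 32}
```

In the second call the allocations sum to 150%, so each is scaled to 50% of the batch; flooring may leave a few slots unassigned, which the regular (non-minimum) sampling would then fill.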
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>