Feat/open instruct example (#381)

* added open-instruct

* fixed hooks

* GRPO

---------

Co-authored-by: Andreas Koepf <andreas.koepf@provisio.com>
joesharratt1229 2025-03-17 22:20:11 +00:00 committed by GitHub
parent 1511c5e301
commit 9234aa77bf
5 changed files with 629 additions and 0 deletions

# Reasoning Gym Open-Instruct Example
This example demonstrates how to use [open-instruct](https://github.com/allenai/open-instruct) with **reasoning-gym** for training language models through reinforcement learning.
## Environment Setup
Before running the training script, you may want to set the following environment variables:
```bash
export CUDA_VISIBLE_DEVICES=0
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
```
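- `CUDA_VISIBLE_DEVICES=0` selects which GPUs the process can see (here, only GPU 0)
- `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True` tells PyTorch's CUDA caching allocator to create memory segments that can be expanded later, which can reduce memory fragmentation when allocation sizes change frequently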
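To use more than one GPU, list several device indices separated by commas. CUDA renumbers the visible devices starting from 0, so PyTorch and vLLM see them as `cuda:0`, `cuda:1`, and so on. The sketch below assumes a machine with at least two GPUs and only exposes them and prints the resulting device count. Whether `grpo_config.sh` also needs matching changes (for example, to how many GPUs it assigns to training versus vLLM) depends on its own settings, which are not covered here.

```bash
#!/usr/bin/env bash
# Hypothetical multi-GPU setup: expose GPUs 0 and 1 (adjust to your machine)
export CUDA_VISIBLE_DEVICES=0,1
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True

# Split the comma-separated list to count how many devices are visible
IFS=',' read -r -a visible_gpus <<< "$CUDA_VISIBLE_DEVICES"
num_gpus=${#visible_gpus[@]}
echo "Visible GPUs: ${num_gpus} (${CUDA_VISIBLE_DEVICES})"
```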
## Running the Training Script
To start training, run:
```bash
bash grpo_config.sh
```
This will train a DeepSeek-R1-Distill-Qwen-1.5B model using GRPO (Group Relative Policy Optimization) on reasoning tasks.
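The "group relative" part refers to how GRPO, as introduced in the DeepSeekMath paper, estimates advantages without a learned value model. For each prompt the trainer samples a group of $G$ responses and scores each one with the reward functions. With outcome-level rewards, it then normalizes each reward $r_i$ against the rest of its group to get the advantage $\hat{A}_i$:

```math
\hat{A}_i = \frac{r_i - \operatorname{mean}(\{r_1, \dots, r_G\})}{\operatorname{std}(\{r_1, \dots, r_G\})}, \qquad i = 1, \dots, G
```

Responses that score above their group's average receive a positive advantage and are reinforced, while responses below the average receive a negative one. The exact normalization open-instruct applies (for example, whether it divides by the standard deviation) may depend on its configuration.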
## Key Features
- Uses vLLM for fast response generation (rollouts) during training
- Supports multi-GPU training
- Implements reward functions for both answer correctness and response formatting
- Includes evaluation during training
- Supports model checkpointing and pushing to Hugging Face Hub
- Integrates with Weights & Biases for experiment tracking
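To illustrate the idea of separate rewards for answer correctness and response formatting, the shell sketch below scores a single response. It is a toy example, not the reward code used in this example. The `<answer>...</answer>` tag convention, the exact-match rule, and the equal weighting in the hypothetical `total_reward` helper are all assumptions.

```bash
#!/usr/bin/env bash
# Illustrative reward sketch; the <answer>...</answer> convention is an assumption.

# Formatting reward: 1 if the response has an <answer> tag followed later by </answer>
format_reward() {
  local response="$1"
  if [[ "$response" == *"<answer>"*"</answer>"* ]]; then echo 1; else echo 0; fi
}

# Correctness reward: 1 if the text inside the first answer block equals the expected answer
correctness_reward() {
  local response="$1" expected="$2" answer
  # A response without a well-formed answer block earns no correctness reward
  if [[ "$(format_reward "$response")" -eq 0 ]]; then echo 0; return; fi
  answer="${response#*<answer>}"   # drop everything up to the first <answer>
  answer="${answer%%</answer>*}"   # drop everything from the first </answer> after it
  # Trim leading and trailing whitespace before comparing
  answer="${answer#"${answer%%[![:space:]]*}"}"
  answer="${answer%"${answer##*[![:space:]]}"}"
  if [[ "$answer" == "$expected" ]]; then echo 1; else echo 0; fi
}

# Hypothetical combined reward: unweighted sum of the two components
total_reward() {
  echo $(( $(format_reward "$1") + $(correctness_reward "$1" "$2") ))
}

total_reward "<think>2 + 2 = 4</think> <answer> 4 </answer>" "4"   # prints 2
```

Keeping the two components separate makes it easy to see whether a model is failing on the output format or on the task itself. A real setup would typically use task-specific answer checking rather than exact string matching.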