# Reasoning Gym Open-Instruct Example
This example demonstrates how to use open-instruct with reasoning-gym for training language models through reinforcement learning.
## Environment Setup
Before running the training script, you may want to set the following environment variables:
```bash
export CUDA_VISIBLE_DEVICES=0
export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
```
- `CUDA_VISIBLE_DEVICES=0` specifies which GPU to use (in this case, GPU 0)
- `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True` enables dynamic GPU memory allocation
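If you prefer to configure this from Python rather than the shell, the same variables can be set before `torch` is imported; a minimal sketch (values mirror the exports above, and `setdefault` lets an existing shell export win):

```python
import os

# Must run before importing torch, or the allocator config is ignored.
# Adjust the GPU index for your machine.
os.environ.setdefault("CUDA_VISIBLE_DEVICES", "0")
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "expandable_segments:True")
```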
## Running the Training Script
To start training, run:
```bash
bash grpo_config.sh
```
This will train a DeepSeek-R1-Distill-Qwen-1.5B model using GRPO (Group Relative Policy Optimization) on reasoning tasks.
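The "group relative" part of GRPO can be sketched in a few lines: for each prompt, a group of completions is sampled, and each completion's reward is normalized by the group's mean and standard deviation to form its advantage. This is an illustrative sketch only, not the open-instruct implementation (the function name is made up):

```python
import statistics

def group_relative_advantages(rewards):
    """GRPO-style advantages: normalize each sampled completion's reward
    by the mean and (population) std of its group. Illustrative sketch."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0:
        # All completions scored the same: no learning signal for this group.
        return [0.0 for _ in rewards]
    return [(r - mean) / std for r in rewards]
```

Note that unlike PPO, no learned value network is needed: the group statistics serve as the baseline.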
## Key Features
- Uses vLLM for efficient inference
- Supports multi-GPU training
- Implements reward functions for both answer correctness and response formatting
- Includes evaluation during training
- Supports model checkpointing and pushing to Hugging Face Hub
- Integrates with Weights & Biases for experiment tracking
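The two reward signals listed above, answer correctness and response formatting, might look roughly like the sketch below. The `<think>`/`<answer>` tag layout and both function names are assumptions for illustration, not the script's actual reward code:

```python
import re

def format_reward(completion: str) -> float:
    """1.0 if the completion follows a <think>...</think><answer>...</answer>
    layout, else 0.0. (Hypothetical pattern for illustration.)"""
    pattern = r"^<think>.*?</think>\s*<answer>.*?</answer>\s*$"
    return 1.0 if re.match(pattern, completion, re.DOTALL) else 0.0

def answer_reward(completion: str, expected: str) -> float:
    """1.0 if the text inside <answer> tags matches the expected answer."""
    m = re.search(r"<answer>(.*?)</answer>", completion, re.DOTALL)
    return 1.0 if m and m.group(1).strip() == expected.strip() else 0.0
```

In practice the two signals are combined (e.g. summed or weighted) into the scalar reward that GRPO normalizes within each group.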