
Training with TRL

Training stack:

  • TRL for reinforcement learning training
  • Accelerate (with DeepSpeed) for distributed training
  • vLLM for rollouts

Setup

This tutorial uses CUDA 11.8, Python 3.10, and PyTorch 2.5.1.

We also assume your machine has 2 GPUs, the last of which is reserved for vLLM rollouts.

If you have more than 2 GPUs, adjust ./config/grpo.yaml so that vllm_device is set to the index of your last GPU. For example, with 4 GPUs, set it to 3:

vllm_device: 3  # If you have 4 GPUs, set this to 3
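Since device indices are zero-based, the last index is simply the GPU count minus one. A minimal sketch of computing it (NUM_GPUS is a placeholder; on your machine you could obtain it with `nvidia-smi -L | wc -l`):

```shell
# NUM_GPUS is a placeholder; substitute your actual GPU count,
# e.g. NUM_GPUS=$(nvidia-smi -L | wc -l)
NUM_GPUS=4
# vLLM should use the last zero-based device index:
VLLM_DEVICE=$((NUM_GPUS - 1))
echo "vllm_device: ${VLLM_DEVICE}"   # with 4 GPUs, prints: vllm_device: 3
```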

You also need to update the CUDA_VISIBLE_DEVICES environment variable in the train.sh script to include all of your available GPUs. For example, with 4 GPUs:

# ./train.sh

# ... beginning of the script
export CUDA_VISIBLE_DEVICES=0,1,2,3
# ... rest of the script
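The device list can also be generated rather than typed by hand. A sketch, assuming GNU `seq` is available (NUM_GPUS is a placeholder for your actual GPU count):

```shell
# NUM_GPUS is a placeholder; substitute your actual GPU count.
NUM_GPUS=4
# Build the comma-separated list 0,1,...,NUM_GPUS-1 with GNU seq:
export CUDA_VISIBLE_DEVICES=$(seq -s, 0 $((NUM_GPUS - 1)))
echo "${CUDA_VISIBLE_DEVICES}"   # with 4 GPUs, prints: 0,1,2,3
```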
  1. Install the required packages:
# First, give execute permissions to the script
chmod +x ./set_env.sh

# Then, run the setup script
./set_env.sh
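To verify the environment came up correctly, you can try importing the stack's core packages. This is only a sketch, not part of set_env.sh, and the package names (trl, accelerate, vllm) are assumed from the stack list above:

```shell
# Quick import check; package names assumed from the training stack list.
if python3 -c "import trl, accelerate, vllm" 2>/dev/null; then
  echo "training stack ready"
else
  echo "environment incomplete; re-run ./set_env.sh"
fi
```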
  2. (Optional) Log in to Weights & Biases for experiment tracking:
# First, set your WANDB_API_KEY as an environment variable
export WANDB_API_KEY=your_wandb_api_key

# Set the project name
export WANDB_PROJECT=your_wandb_project_name
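If you opt into tracking, a quick guard before launching catches a forgotten variable early. A sketch (the placeholder values mirror the exports above):

```shell
# Placeholders shown for illustration; use your real values.
export WANDB_API_KEY="your_wandb_api_key"
export WANDB_PROJECT="your_wandb_project_name"
# Fail fast if either variable is unset or empty:
: "${WANDB_API_KEY:?set WANDB_API_KEY}"
: "${WANDB_PROJECT:?set WANDB_PROJECT}"
echo "W&B tracking enabled for project: ${WANDB_PROJECT}"
```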
  3. Run the training script:
# First, give execute permissions to the script
chmod +x ./train.sh

# Then, run the training script
./train.sh