Mirror of https://github.com/open-thought/reasoning-gym.git, synced 2026-04-19 12:58:07 +00:00.
# Training with TRL
Training stack:
- TRL for reinforcement learning training
- Accelerate (with DeepSpeed) for distributed training
- vLLM for rollouts
## Setup
This tutorial uses CUDA 11.8, Python 3.10, and PyTorch 2.5.1.
We assume your machine has 2 GPUs, the last of which is used for vLLM rollouts.
If you have more than 2 GPUs, adjust the ./config/grpo.yaml file so that vllm_device is set to the index of your last GPU. For example, if you have 4 GPUs (indices 0-3), set it to 3:

```yaml
vllm_device: 3 # If you have 4 GPUs, set this to 3
```
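If you prefer to script this adjustment, here is a minimal sketch. Only the `vllm_device` key is taken from the config above; the helper name and everything else about the file layout of config/grpo.yaml are assumptions.

```python
# Hypothetical helper: point vllm_device at the last GPU index.
# Only the "vllm_device" key is taken from the config shown above;
# the rest of config/grpo.yaml's layout is assumed.
import re

def set_vllm_device(config_text: str, num_gpus: int) -> str:
    """Rewrite the vllm_device line to use the last GPU index."""
    return re.sub(
        r"(?m)^vllm_device:\s*\S+",
        f"vllm_device: {num_gpus - 1}",
        config_text,
    )

print(set_vllm_device("vllm_device: 1", 4))  # -> vllm_device: 3
```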
You also need to update the CUDA_VISIBLE_DEVICES environment variable in the train.sh script to include all your available GPUs. For example, if you have 4 GPUs, set it to:

```shell
# ./train.sh
# ... beginning of the script
export CUDA_VISIBLE_DEVICES=0,1,2,3
# ... rest of the script
```
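The two settings above must stay in sync: vllm_device should be the last index within the set listed by CUDA_VISIBLE_DEVICES. A small sanity check (a hypothetical helper, not part of the repo) makes the relationship explicit:

```python
# Hypothetical sanity check: vllm_device should equal the number of
# GPUs listed in CUDA_VISIBLE_DEVICES minus one (the last index).
def expected_vllm_device(visible_devices: str) -> int:
    """E.g. '0,1,2,3' -> 3, matching the 4-GPU example above."""
    return len(visible_devices.split(",")) - 1

print(expected_vllm_device("0,1,2,3"))  # -> 3
print(expected_vllm_device("0,1"))      # -> 1 (the default 2-GPU setup)
```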
- Install the required packages:

```shell
# First, give execute permissions to the script
chmod +x ./set_env.sh
# Then, run the setup script
./set_env.sh
```
- (Optional) Log in to Weights & Biases for experiment tracking:

```shell
# First, set your WANDB_API_KEY as an environment variable
export WANDB_API_KEY=your_wandb_api_key
# Set the project name
export WANDB_PROJECT=your_wandb_project_name
```
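If you want training to fail fast when tracking is only half-configured, a small pre-flight check (hypothetical, not part of the repo) could verify both variables before launching:

```python
# Hypothetical pre-flight check: both W&B variables from the step
# above must be set for experiment tracking to work.
import os

def wandb_configured(env=os.environ) -> bool:
    """True when WANDB_API_KEY and WANDB_PROJECT are both set."""
    return all(env.get(k) for k in ("WANDB_API_KEY", "WANDB_PROJECT"))

print(wandb_configured({"WANDB_API_KEY": "key", "WANDB_PROJECT": "proj"}))  # -> True
print(wandb_configured({}))  # -> False
```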
- Run the training script:

```shell
# First, give execute permissions to the script
chmod +x ./train.sh
# Then, run the training script
./train.sh
```