mirror of
https://github.com/open-thought/reasoning-gym.git
synced 2026-04-19 12:58:07 +00:00
* v0 * 2 gpu setup * improve parsing from yaml * update yaml dataset example * remove restriction on flash attn * more comments * first version of the readme * pin torch * simplify requirements * just flash attn * use set env instead * simpler set env * readme * add wandb project to setup * update template * update model id * post init to capture the config and weight * extract metadata * update config * update dataset config * move env for wandb project * pre-commit * remove qwen-math from training * more instructions * unused import * remove trl old * warmup ratio * warmup ratio * change model id * change model_id * add info about CUDA_VISIBLE_DEVICES
56 lines
1.4 KiB
Markdown
56 lines
1.4 KiB
Markdown
# Training with TRL
|
|
|
|
Training stack:
|
|
- TRL for reinforcement learning training
|
|
- Accelerate (with DeepSpeed) for distributed training
|
|
- vLLM for rollouts
|
|
|
|
|
|
## Setup
|
|
|
|
This tutorial uses CUDA 11.8, Python 3.10, and PyTorch 2.5.1
|
|
|
|
Moreover, we assume that you have 2 GPUs on your machine, the last of which is used for vLLM rollouts.
|
|
|
|
If you have more than 2 GPUs, adjust the `./config/grpo.yaml` file so that the `vllm_device` is set to the last index of your GPU. For example, if you have 4 GPUs, set it to 3:
|
|
```yaml
|
|
vllm_device: 3 # If you have 4 GPUs, set this to 3
|
|
```
|
|
|
|
Moreover, you would need to update the `CUDA_VISIBLE_DEVICES` environment variable in the `train.sh` script to include all your available GPUs. For example, if you have 4 GPUs, set it to:
|
|
```bash
|
|
# ./train.sh
|
|
|
|
# ... beginning of the script
|
|
export CUDA_VISIBLE_DEVICES=0,1,2,3
|
|
# ... rest of the script
|
|
```
|
|
|
|
|
|
|
|
1. Install the required packages:
|
|
```bash
|
|
# First, give execute permissions to the script
|
|
# chmod +x ./set_env.sh
|
|
|
|
# Then, run the setup script
|
|
./set_env.sh
|
|
```
|
|
|
|
2. (Optional) Log in to Weights & Biases for experiment tracking:
|
|
```bash
|
|
# First, set your WANDB_API_KEY as an environment variable
|
|
export WANDB_API_KEY=your_wandb_api_key
|
|
|
|
# Set the project name
|
|
export WANDB_PROJECT=your_wandb_project_name
|
|
```
|
|
|
|
3. Run the training script
|
|
```bash
|
|
# First, give execute permissions to the script
|
|
# chmod +x ./train.sh
|
|
|
|
# Then, run the training script
|
|
./train.sh
|
|
```
|