reasoning-gym/examples/verifiers/README.md

## Setup

Prepare virtual environment, e.g.

```bash
python -m venv venv
source venv/bin/activate
```

Install dependencies

```bash
pip install -r requirements.txt
pip install flash-attn --no-build-isolation
```

Login to W&B and HuggingFace if desired

```bash
wandb login
huggingface-cli login
```

## Training

Here we assume two GPUs, with one used for inference (vLLM) and the other for training (accelerate). You may need to adjust some settings for different GPU configs.

Run the vLLM server for inference:

```bash
CUDA_VISIBLE_DEVICES=0 vf-vllm --model Qwen/Qwen2.5-1.5B-Instruct --tensor-parallel-size 1
```

Run the training script using accelerate:

```bash
CUDA_VISIBLE_DEVICES=1 accelerate launch --config-file zero3.yaml --num-processes 1 vf_rg.py
```