## Setup

Prepare a virtual environment, e.g.:

```shell
python -m venv venv
source venv/bin/activate
```

Install dependencies:

```shell
pip install -r requirements.txt
pip install flash-attn --no-build-isolation
```

Log in to W&B and HuggingFace if desired:

```shell
wandb login
huggingface-cli login
```

## Training

Here we assume two GPUs: one used for inference (vLLM) and the other for training (accelerate). You may need to adjust these settings for other GPU configurations.
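The split between the two processes is controlled entirely by `CUDA_VISIBLE_DEVICES`: each process only sees the GPU indices listed in its own environment variable. A small sketch of how such a value is interpreted (the `visible_gpus` helper is hypothetical, for illustration only):

```python
def visible_gpus(env_value):
    """Parse a CUDA_VISIBLE_DEVICES-style string into a list of GPU indices.

    Returns None when the variable is unset, meaning all GPUs are visible.
    """
    if env_value is None:
        return None
    return [int(x) for x in env_value.split(",") if x.strip()]

# The split used below: GPU 0 for the vLLM server, GPU 1 for training.
print(visible_gpus("0"))    # GPUs seen by the inference process → [0]
print(visible_gpus("1"))    # GPUs seen by the training process → [1]
# On a larger machine you could, e.g., give the server two GPUs
# (and raise --tensor-parallel-size accordingly):
print(visible_gpus("0,1"))  # → [0, 1]
```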

Run the vLLM server for inference:

```shell
CUDA_VISIBLE_DEVICES=0 vf-vllm --model Qwen/Qwen2.5-1.5B-Instruct --tensor-parallel-size 1
```

Run the training script using accelerate:

```shell
CUDA_VISIBLE_DEVICES=1 accelerate launch --config-file zero3.yaml --num-processes 1 vf_rg.py
```