## Setup

Prepare a virtual environment, e.g.:

```shell
python -m venv venv
source venv/bin/activate
```

Install dependencies:

```shell
pip install -r requirements.txt
pip install flash-attn --no-build-isolation
```

Log in to W&B and HuggingFace if desired:

```shell
wandb login
huggingface-cli login
```

## Training

Here we assume two GPUs: one used for inference (vLLM) and the other for training (accelerate). You may need to adjust these settings for other GPU configurations.
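The split between the two processes is controlled entirely by `CUDA_VISIBLE_DEVICES`: each process only sees the GPU indices listed in its own environment variable. A small sketch of how such a value is interpreted (the `visible_gpus` helper is hypothetical, for illustration only):

```python
def visible_gpus(env_value):
    """Parse a CUDA_VISIBLE_DEVICES-style string into a list of GPU indices.

    Returns None when the variable is unset, meaning all GPUs are visible.
    """
    if env_value is None:
        return None
    return [int(x) for x in env_value.split(",") if x.strip()]

# The split used below: GPU 0 for the vLLM server, GPU 1 for training.
print(visible_gpus("0"))    # GPUs seen by the inference process → [0]
print(visible_gpus("1"))    # GPUs seen by the training process → [1]
# On a larger machine you could, e.g., give the server two GPUs
# (and raise --tensor-parallel-size accordingly):
print(visible_gpus("0,1"))  # → [0, 1]
```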

Run the vLLM server for inference:

```shell
CUDA_VISIBLE_DEVICES=0 vf-vllm --model Qwen/Qwen2.5-1.5B-Instruct --tensor-parallel-size 1
```

Run the training script using accelerate:

```shell
CUDA_VISIBLE_DEVICES=1 accelerate launch --config-file zero3.yaml --num-processes 1 vf_rg.py
```