Mirror of https://github.com/open-thought/reasoning-gym.git (synced 2026-04-19 12:58:07 +00:00)
add minimal verifiers example (#472)
parent 9e79fc84b6
commit 49f3821098
4 changed files with 97 additions and 0 deletions
examples/verifiers/README.md (normal file, 38 additions)

@@ -0,0 +1,38 @@

## Setup

Prepare a virtual environment, e.g.:

```bash
python -m venv venv
source venv/bin/activate
```

Install dependencies:

```bash
pip install -r requirements.txt
pip install flash-attn --no-build-isolation
```

Log in to W&B and HuggingFace if desired:

```bash
wandb login
huggingface-cli login
```
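
For non-interactive runs (for example inside a job script), credentials can usually be supplied through environment variables instead of the interactive login commands. The following is a minimal sketch, assuming the standard `WANDB_API_KEY` variable read by `wandb` and the `HF_TOKEN` variable read by recent versions of `huggingface_hub`; the values are placeholders, not real keys.

```bash
# Placeholder values (assumptions for illustration): substitute your own keys,
# or omit a line to skip that service.
export WANDB_API_KEY="your-wandb-api-key"
export HF_TOKEN="your-huggingface-token"

# Optional: disable W&B logging instead of providing a key.
# export WANDB_MODE=disabled
```

Avoid committing real keys to the repository; exporting them in the shell session or a private job script is enough.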

## Training

This example assumes two GPUs: one for inference (vLLM) and one for training (accelerate). Other GPU configurations may require adjusting settings such as `CUDA_VISIBLE_DEVICES`, `--tensor-parallel-size`, and `--num-processes`.
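
Before starting either process, it can help to confirm that the machine actually exposes two GPUs. The following is a rough pre-flight sketch, not part of this example's files: `REQUIRED_GPUS`, `count_gpus`, and `has_enough_gpus` are illustrative names, and the check assumes NVIDIA GPUs with `nvidia-smi` on the `PATH`. Note that `nvidia-smi` lists the GPUs on the machine regardless of `CUDA_VISIBLE_DEVICES`.

```bash
#!/usr/bin/env bash
# Pre-flight sketch: check that enough GPUs are present for the two-GPU layout.
# REQUIRED_GPUS, count_gpus and has_enough_gpus are illustrative names only.

REQUIRED_GPUS=2

# Print the number of GPUs listed by nvidia-smi (0 if the tool is missing or fails).
count_gpus() {
    nvidia-smi --list-gpus 2>/dev/null | grep -c '^GPU ' || true
}

# Succeed when the available count ($1) is at least the required count ($2).
has_enough_gpus() {
    [ "$1" -ge "$2" ]
}

if has_enough_gpus "$(count_gpus)" "$REQUIRED_GPUS"; then
    echo "OK: at least $REQUIRED_GPUS GPUs found."
else
    echo "Fewer than $REQUIRED_GPUS GPUs found; adjust the launch settings."
fi
```

If the check reports fewer GPUs than expected, the device indices and process counts in the commands for this section are the first settings to revisit.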

Run the vLLM server for inference:

```bash
CUDA_VISIBLE_DEVICES=0 vf-vllm --model Qwen/Qwen2.5-1.5B-Instruct --tensor-parallel-size 1
```

Run the training script using accelerate:

```bash
CUDA_VISIBLE_DEVICES=1 accelerate launch --config-file zero3.yaml --num-processes 1 vf_rg.py
```