tutorial(training): Add a minimal example with trl (#473)

* v0
* 2 gpu setup
* improve parsing from yaml
* update yaml dataset example
* remove restriction on flash attn
* more comments
* first version of the readme
* pin torch
* simplify requirements
* just flash attn
* use set env instead
* simpler set env
* readme
* add wandb project to setup
* update template
* update model id
* post init to capture the config and weight
* extract metadata
* update config
* update dataset config
* move env for wandb project
* pre-commit
* remove qwen-math from training
* more instructions
* unused import
* remove trl old
* warmup ratio
* warmup ratio
* change model id
* change model_id
* add info about CUDA_VISIBLE_DEVICES
2026-04-23 16:55:05 +00:00 · 2025-06-21 00:01:31 +02:00 · 56ce2e79a7
commit 56ce2e79a7
parent 49f3821098
59 changed files with 382 additions and 155340 deletions
--- a/examples/trl/set_env.sh
+++ b/examples/trl/set_env.sh
@@ -0,0 +1,23 @@
+#!/bin/bash
+# python 3.10 + cuda 11.8.0
+# the order in which the following commands are executed matters
+
+conda clean -a -y
+pip install --upgrade pip
+pip cache purge
+
+# torch
+pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 --index-url https://download.pytorch.org/whl/cu118
+
+# xformers
+pip install xformers==0.0.28.post3 --index-url https://download.pytorch.org/whl/cu118
+
+# vLLM pre-compiled with CUDA 11.8
+pip install https://github.com/vllm-project/vllm/releases/download/v0.7.2/vllm-0.7.2+cu118-cp38-abi3-manylinux1_x86_64.whl
+
+pip install deepspeed
+pip install flash-attn==2.7.3 --no-build-isolation
+pip install "trl==0.15.2"
+pip install "transformers==4.49.0"
+pip install wandb
+pip install reasoning-gym
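
Because the script pins several interdependent packages and notes that install order matters, a quick check after running it can confirm that the pins held and were not overridden by a later install. The following is a minimal sketch and not part of this commit: the helper names `version_matches`, `installed_version`, and `check_pin` are hypothetical, and it assumes `python` on the PATH is the interpreter of the environment that `set_env.sh` populated. Wheels built for a specific CUDA version may report a local suffix such as `+cu118`, so the comparison ignores everything after a `+`.

```shell
# Hypothetical post-install check; not part of examples/trl/set_env.sh.

# Succeeds when the installed version equals the pin, ignoring a local
# version suffix such as "+cu118".
version_matches() {
    local pinned="$1" installed="$2"
    [ "${installed%%+*}" = "$pinned" ]
}

# Prints the installed version of a distribution, or nothing if it is
# missing (assumes `python` is the target environment's interpreter).
installed_version() {
    python -c "from importlib.metadata import version; print(version('$1'))" 2>/dev/null
}

# Reports whether one distribution matches its pinned version.
check_pin() {
    local name="$1" pinned="$2" installed
    installed="$(installed_version "$name")"
    if [ -z "$installed" ]; then
        echo "$name: not installed"
    elif version_matches "$pinned" "$installed"; then
        echo "$name: ok ($installed)"
    else
        echo "$name: expected $pinned, found $installed"
    fi
}

# Pins copied from set_env.sh above.
check_pin torch 2.5.1
check_pin torchvision 0.20.1
check_pin torchaudio 2.5.1
check_pin xformers 0.0.28.post3
check_pin vllm 0.7.2
check_pin flash-attn 2.7.3
check_pin trl 0.15.2
check_pin transformers 4.49.0
```

Running this after `set_env.sh` prints one line per pinned package. A mismatch would suggest that a later `pip install` replaced an earlier pin, which is the kind of problem the ordering comment in the script warns about.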