Chain Sum LoRA Training with unsloth
This example demonstrates how to fine-tune an LLM with RL on a reasoning-gym environment using the unsloth framework. Unsloth is an efficient open-source library for fine-tuning and RL. Unsloth's default training path uses quantised low-rank adaptation (QLoRA), which results in a significantly lower memory footprint (roughly 3x), so you can substantially increase batch sizes and context length without risking OOM errors.
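The memory saving comes from training only small low-rank factors instead of the full weight matrices. A minimal back-of-the-envelope sketch (illustrative only, not part of the training script; the hidden size and rank below are example values):

```python
# Why low-rank adaptation shrinks the trainable-parameter count:
# for a d_out x d_in weight W, LoRA trains two small factors
# A (d_out x r) and B (r x d_in) instead of W itself.

def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters for one LoRA-adapted weight matrix."""
    return d_out * rank + rank * d_in

def full_finetune_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when fine-tuning the full matrix."""
    return d_out * d_in

d = 4096   # example hidden size for a 7B-class model
r = 16     # a commonly used LoRA rank
full = full_finetune_params(d, d)
lora = lora_trainable_params(d, d, r)
print(f"full: {full:,} params, LoRA r={r}: {lora:,} params "
      f"({full / lora:.0f}x fewer trainable)")
```

Quantising the frozen base weights (the "Q" in QLoRA) then reduces the footprint of the untrained parameters on top of this.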
Requirements:
python >= 3.10
Installation
- Install reasoning-gym:

  pip install reasoning-gym

- Install unsloth dependencies:

  pip install -r requirements.txt

- Run the training script. To start training with unsloth on RG environments using the default arguments, run:

  python train_grpo_lora.py

  To customise or override any of the default arguments:

  python train_grpo_lora.py --dataset-name chain_sum --max-seq-length 512 --model-id Qwen/Qwen2.5-7B-Instruct
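The flags above are typically wired up with argparse. A hedged sketch of how such a parser might look (the defaults shown here are placeholders; the actual train_grpo_lora.py may define different defaults and more options):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Illustrative CLI matching the flags shown in the README."""
    parser = argparse.ArgumentParser(
        description="GRPO LoRA training on a reasoning-gym environment"
    )
    parser.add_argument("--dataset-name", default="chain_sum",
                        help="reasoning-gym dataset to train on")
    parser.add_argument("--max-seq-length", type=int, default=1024,
                        help="maximum sequence length in tokens")
    parser.add_argument("--model-id", default="Qwen/Qwen2.5-7B-Instruct",
                        help="Hugging Face model id to fine-tune")
    return parser

# Parsing the example invocation from above:
args = build_parser().parse_args(["--dataset-name", "chain_sum",
                                  "--max-seq-length", "512"])
print(args.dataset_name, args.max_seq_length, args.model_id)
```

Hyphenated flag names become underscored attributes (`--max-seq-length` → `args.max_seq_length`), which is standard argparse behaviour.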
Note that the free open-source version of unsloth currently supports training on a single GPU only.
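To make the task concrete, a chain_sum sample is a short arithmetic chain whose exact answer yields a simple binary reward for GRPO. The generator and scorer below are an illustrative, self-contained sketch of the task shape, not the reasoning-gym API:

```python
# Hypothetical chain_sum-style generator and reward function
# (illustrative names; reasoning-gym's own implementation differs).
import random

def make_chain_sum(rng: random.Random, n_terms: int = 4, max_val: int = 99) -> dict:
    """Generate one chain-sum question like '3 + 41 - 7 + 12 ='."""
    terms = [rng.randint(1, max_val) for _ in range(n_terms)]
    ops = [rng.choice(["+", "-"]) for _ in range(n_terms - 1)]
    question = str(terms[0])
    answer = terms[0]
    for op, t in zip(ops, terms[1:]):
        question += f" {op} {t}"
        answer = answer + t if op == "+" else answer - t
    return {"question": question + " =", "answer": str(answer)}

def score_answer(model_output: str, entry: dict) -> float:
    """Binary reward: 1.0 if the model's final answer matches exactly."""
    return 1.0 if model_output.strip() == entry["answer"] else 0.0

rng = random.Random(42)
sample = make_chain_sum(rng)
print(sample["question"], sample["answer"])
```

During GRPO training, the reward from a scorer like this is what drives the policy update on each sampled completion.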