Feat/unsloth example (#482)

* cleaned up examples

* updated failing hooks

* updated readme

* corrected linting checks
This commit is contained in:
joesharratt1229 2025-06-28 17:04:38 +01:00 committed by GitHub
parent d9cd20c174
commit 1c98584f28
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
29 changed files with 122 additions and 2857 deletions

View file

@ -0,0 +1,30 @@
# Chain Sum LORA Training with unsloth
This example demonstrates how to fine-tune an LLM with RL on a reasoning gym environment using the **unsloth** framework. Unsloth is a efficient open-source library for fine-tuning & RL. Unsloths default training path uses quantised low rank adaption (QLORA) which results in a signficantly lower memory footprint ($\approx 3x$) and means you can significantly increase batch sizes and context length without risking OOM errors.
Requirements:
python >= 3.10
## Installation
1. **Install reasoning-gym**:
```bash
pip install reasoning-gym
```
2. **Install unsloth dependencies**:
```bash
pip install -r requirements.txt
```
3. **Run training script**
To start training with unsloth with RG environments using default arguments run the following:
```bash
python train_grpo_lora.py
```
To customise/override any default arguments you can simply:
```bash
python train_grpo_lora.py --dataset-name chain_sum --max-seq-length 512 --model-id Qwen/Qwen2.5-7B-Instruct
**Note** the free open-source version of unsloth is currently built to train models in single GPU environments only.