Chain Sum LoRA Training with unsloth
This example demonstrates how to fine-tune an LLM with RL on a reasoning-gym environment using the unsloth framework. Unsloth is an efficient open-source library for fine-tuning and RL. Unsloth's default training path uses quantised low-rank adaptation (QLoRA), which results in a significantly lower memory footprint (roughly 3x), so you can substantially increase batch sizes and context length without risking OOM errors.
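The memory saving comes from training only small low-rank factors instead of the full weight matrices. A minimal back-of-the-envelope sketch (illustrative only, not part of the training script; the hidden size and rank below are example values):

```python
# Why low-rank adaptation shrinks the trainable-parameter count:
# for a d_out x d_in weight W, LoRA trains two small factors
# A (d_out x r) and B (r x d_in) instead of W itself.

def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters for one LoRA-adapted weight matrix."""
    return d_out * rank + rank * d_in

def full_finetune_params(d_out: int, d_in: int) -> int:
    """Trainable parameters when fine-tuning the full matrix."""
    return d_out * d_in

d = 4096   # example hidden size for a 7B-class model
r = 16     # a commonly used LoRA rank
full = full_finetune_params(d, d)
lora = lora_trainable_params(d, d, r)
print(f"full: {full:,} params, LoRA r={r}: {lora:,} params "
      f"({full / lora:.0f}x fewer trainable)")
```

Quantising the frozen base weights (the "Q" in QLoRA) then reduces the footprint of the untrained parameters on top of this.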
Requirements:
python >= 3.10
Installation
- Install reasoning-gym:

  pip install reasoning-gym

- Install unsloth dependencies:

  pip install -r requirements.txt

- Run the training script. To start training with unsloth on RG environments using the default arguments, run:

  python train_grpo_lora.py

  To customise or override any of the default arguments:

  python train_grpo_lora.py --dataset-name chain_sum --max-seq-length 512 --model-id Qwen/Qwen2.5-7B-Instruct
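The flags above are typically wired up with argparse. A hedged sketch of how such a parser might look (the defaults shown here are placeholders; the actual train_grpo_lora.py may define different defaults and more options):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Illustrative CLI matching the flags shown in the README."""
    parser = argparse.ArgumentParser(
        description="GRPO LoRA training on a reasoning-gym environment"
    )
    parser.add_argument("--dataset-name", default="chain_sum",
                        help="reasoning-gym dataset to train on")
    parser.add_argument("--max-seq-length", type=int, default=1024,
                        help="maximum sequence length in tokens")
    parser.add_argument("--model-id", default="Qwen/Qwen2.5-7B-Instruct",
                        help="Hugging Face model id to fine-tune")
    return parser

# Parsing the example invocation from above:
args = build_parser().parse_args(["--dataset-name", "chain_sum",
                                  "--max-seq-length", "512"])
print(args.dataset_name, args.max_seq_length, args.model_id)
```

Hyphenated flag names become underscored attributes (`--max-seq-length` → `args.max_seq_length`), which is standard argparse behaviour.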
Note that the free open-source version of unsloth currently supports training on a single GPU only.
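To make the task concrete, a chain_sum sample is a short arithmetic chain whose exact answer yields a simple binary reward for GRPO. The generator and scorer below are an illustrative, self-contained sketch of the task shape, not the reasoning-gym API:

```python
# Hypothetical chain_sum-style generator and reward function
# (illustrative names; reasoning-gym's own implementation differs).
import random

def make_chain_sum(rng: random.Random, n_terms: int = 4, max_val: int = 99) -> dict:
    """Generate one chain-sum question like '3 + 41 - 7 + 12 ='."""
    terms = [rng.randint(1, max_val) for _ in range(n_terms)]
    ops = [rng.choice(["+", "-"]) for _ in range(n_terms - 1)]
    question = str(terms[0])
    answer = terms[0]
    for op, t in zip(ops, terms[1:]):
        question += f" {op} {t}"
        answer = answer + t if op == "+" else answer - t
    return {"question": question + " =", "answer": str(answer)}

def score_answer(model_output: str, entry: dict) -> float:
    """Binary reward: 1.0 if the model's final answer matches exactly."""
    return 1.0 if model_output.strip() == entry["answer"] else 0.0

rng = random.Random(42)
sample = make_chain_sum(rng)
print(sample["question"], sample["answer"])
```

During GRPO training, the reward from a scorer like this is what drives the policy update on each sampled completion.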