
# Chain Sum LoRA Training with unsloth

This example demonstrates how to fine-tune an LLM with RL on a reasoning-gym environment using the unsloth framework. Unsloth is an efficient open-source library for fine-tuning & RL. Unsloth's default training path uses quantised low-rank adaptation (QLoRA), which results in a significantly lower memory footprint (≈3x), so you can substantially increase batch sizes and context lengths without risking OOM errors.
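As a rough illustration of where the saving comes from, here is a back-of-envelope estimate for the base weights alone (the figures and helper below are illustrative, not taken from unsloth; real usage also includes activations, optimizer state, and the LoRA adapter):

```python
def approx_weights_gb(n_params_billion: float, bits_per_param: int) -> float:
    """Approximate GB needed to hold the base model weights alone.

    n_params_billion * 1e9 params * (bits / 8) bytes, divided by 1e9 bytes/GB.
    """
    return n_params_billion * bits_per_param / 8

# A 7B-parameter model, weights only:
fp16_gb = approx_weights_gb(7, 16)  # 16-bit baseline
q4_gb = approx_weights_gb(7, 4)     # 4-bit quantised base model, as in QLoRA

print(f"7B weights @ 16-bit: {fp16_gb:.1f} GB")  # 14.0 GB
print(f"7B weights @ 4-bit:  {q4_gb:.1f} GB")    # 3.5 GB
```

The weights-only saving is 4x; the overall footprint reduction is smaller (≈3x) because activations and optimizer state are not quantised.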

## Requirements

- python >= 3.10

## Installation

  1. Install reasoning-gym:

    pip install reasoning-gym
    
  2. Install unsloth dependencies:

    pip install -r requirements.txt
    
  3. Run the training script. To start training with unsloth on RG environments using the default arguments, run:

    python train_grpo_lora.py
    

    To customise or override any of the default arguments, pass them as flags:

    python train_grpo_lora.py  --dataset-name chain_sum --max-seq-length 512 --model-id Qwen/Qwen2.5-7B-Instruct
    
    

Note: the free open-source version of unsloth currently supports training on a single GPU only.