cleaned up examples

2026-04-22 16:49:06 +00:00 · 2025-06-27 07:58:46 +00:00 · 2025-06-27 07:58:46 +00:00 · 799eb51800
commit 799eb51800
parent d9cd20c174
33 changed files with 117 additions and 2954 deletions
--- a/examples/unsloth/README.md
+++ b/examples/unsloth/README.md
@@ -0,0 +1,30 @@
+# Chain Sum LoRA Training with unsloth
+
+This example demonstrates how to fine-tune an LLM with RL on a reasoning-gym environment using the **unsloth** framework. Unsloth is an efficient open-source library for fine-tuning and RL. Unsloth's default training path uses quantised low-rank adaptation (QLoRA), which results in a significantly lower memory footprint ($\approx 3\times$ smaller), so you can substantially increase batch sizes and context length without risking out-of-memory (OOM) errors.
+
+Requirements:
+
+Python >= 3.10
+
+## Installation
+
+1. **Install reasoning-gym**:
+   ```bash
+   pip install reasoning-gym
+   ```
+2. **Install unsloth dependencies**:
+   ```bash
+   pip install -r requirements.txt
+   ```
+3. **Run the training script**:
+   To start training with unsloth on reasoning-gym (RG) environments using the default arguments, run the following:
+
+   ```bash
+   python train_grpo_lora.py
+   ```
+
+   To override any of the default arguments, pass them on the command line, for example:
+   ```bash
+   python train_grpo_lora.py --dataset-name chain_sum --max-seq-length 512 --model-id Qwen/Qwen2.5-7B-Instruct
+   ```
+**Note:** The free open-source version of unsloth is currently built to train models in single-GPU environments only.