Feat/open instruct example (#381)

* added open-instruct
* fixed hooks
* GRPO

---------

Co-authored-by: Andreas Koepf <andreas.koepf@provisio.com>
2026-04-19 12:58:07 +00:00 · 2025-03-17 22:20:11 +00:00 · 2025-03-17 22:20:11 +00:00 · 1da84a0b41
commit 1da84a0b41
parent eaef88b45b
5 changed files with 629 additions and 0 deletions
--- a/examples/open-instruct/README.md
+++ b/examples/open-instruct/README.md
@@ -0,0 +1,34 @@
+# Reasoning Gym Open-Instruct Example
+
+This example demonstrates how to use [open-instruct](https://github.com/allenai/open-instruct) with **reasoning-gym** for training language models through reinforcement learning.
+
+## Environment Setup
+
+Before running the training script, you may want to set the following environment variables:
+
+```bash
+export CUDA_VISIBLE_DEVICES=0
+export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
+```
+
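+- `CUDA_VISIBLE_DEVICES=0` restricts the process to the listed GPUs (here, only GPU 0)
+- `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True` lets PyTorch's CUDA caching allocator grow existing memory segments on demand, which can reduce fragmentation-related out-of-memory errors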
+
+## Running the Training Script
+
+To start training, run:
+
+```bash
+bash grpo_config.sh
+```
+
+This will train a DeepSeek-R1-Distill-Qwen-1.5B model using GRPO (Group Relative Policy Optimization) on reasoning tasks.
+
+## Key Features
+
+- Uses vLLM for efficient inference
+- Supports multi-GPU training
+- Implements reward functions for both answer correctness and response formatting
+- Includes evaluation during training
+- Supports model checkpointing and pushing to Hugging Face Hub
+- Integrates with Weights & Biases for experiment tracking
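
Note on the README above: it mentions reward functions for answer correctness and response formatting, and GRPO's group-relative scoring. The following is a minimal, self-contained sketch of how those pieces could fit together; it is not the code from this commit. The `<think>`/`<answer>` tag layout, the exact-match comparison, the `format_weight` value, and all function names are assumptions made for illustration. The actual example may score answers and normalize advantages differently.

```python
import re
import statistics
from typing import List, Optional

# Assumed output layout (hypothetical): reasoning in <think> tags,
# followed by the final answer in <answer> tags.
FORMAT_RE = re.compile(
    r"^<think>.*?</think>\s*<answer>(.*?)</answer>$", re.DOTALL
)


def extract_answer(completion: str) -> Optional[str]:
    """Return the text inside <answer> tags, or None if the layout is wrong."""
    match = FORMAT_RE.match(completion.strip())
    return match.group(1).strip() if match else None


def format_reward(completion: str) -> float:
    """1.0 if the completion follows the assumed tag layout, else 0.0."""
    return 1.0 if extract_answer(completion) is not None else 0.0


def correctness_reward(completion: str, expected: str) -> float:
    """1.0 on an exact match with the expected answer (a stand-in for
    dataset-specific scoring), else 0.0."""
    answer = extract_answer(completion)
    return 1.0 if answer is not None and answer == expected.strip() else 0.0


def total_reward(completion: str, expected: str, format_weight: float = 0.2) -> float:
    """Correctness reward plus a smaller, weighted formatting reward."""
    return correctness_reward(completion, expected) + format_weight * format_reward(completion)


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize rewards within one group of completions for the same prompt:
    subtract the group mean and divide by the group standard deviation."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Example: three sampled completions for one prompt whose expected answer is "4".
completions = [
    "<think>2 + 2</think><answer>4</answer>",  # correct and well formatted
    "<think>guess</think><answer>5</answer>",  # well formatted but wrong
    "4",                                       # no tags, so no reward here
]
rewards = [total_reward(c, "4") for c in completions]
advantages = group_relative_advantages(rewards)
```

Because advantages are computed relative to the other completions in the same group, a completion is only reinforced when it scores above its siblings; if every completion in a group receives the same reward, all advantages in this sketch come out as zero.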