mirror of https://github.com/open-thought/reasoning-gym.git synced 2026-04-19 12:58:07 +00:00


Chain Sum Training with veRL

This example demonstrates how to train a language model using veRL (Volcano Engine Reinforcement Learning) with the reasoning-gym environment for chain sum problems.
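
For readers unfamiliar with the task, the following is a minimal sketch of what a chain sum problem and its reward could look like. The helper names `make_chain_sum` and `score` are hypothetical and written only for illustration; the actual generator and scoring in reasoning-gym may differ in format and options.

```python
import random


def make_chain_sum(rng: random.Random, num_terms: int = 3, num_digits: int = 2) -> dict:
    """Build one problem: a left-to-right chain of + and - over integers.

    Hypothetical helper, not the reasoning-gym implementation.
    """
    low, high = 10 ** (num_digits - 1), 10 ** num_digits - 1
    terms = [rng.randint(low, high) for _ in range(num_terms)]
    ops = [rng.choice("+-") for _ in range(num_terms - 1)]

    expression = str(terms[0])
    total = terms[0]
    for op, term in zip(ops, terms[1:]):
        expression += f" {op} {term}"
        total = total + term if op == "+" else total - term

    return {"question": f"{expression} =", "answer": str(total)}


def score(model_answer: str, entry: dict) -> float:
    """Binary reward: 1.0 for an exact match with the reference answer, else 0.0."""
    return 1.0 if model_answer.strip() == entry["answer"] else 0.0


entry = make_chain_sum(random.Random(0))
print(entry["question"], entry["answer"])
```

A verifiable reward of this kind is what makes such tasks convenient for reinforcement learning: the answer can be checked programmatically without a learned reward model.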

Requirements:

Python >= 3.10

Installation

Install veRL: Follow the installation instructions at the veRL repository
Install reasoning-gym:
```
pip install reasoning-gym
```
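
Before launching training, it can help to confirm that the requirements above are met. The following is a small preflight sketch using only the standard library; it assumes the packages are importable as `reasoning_gym` and `verl`, and the function name `check_environment` is hypothetical.

```python
import importlib.util
import sys

REQUIRED_PYTHON = (3, 10)
# Import names assumed for the two packages installed above.
PACKAGES = ("reasoning_gym", "verl")


def check_environment() -> dict:
    """Report whether the Python version and each package look usable."""
    status = {"python": sys.version_info[:2] >= REQUIRED_PYTHON}
    for name in PACKAGES:
        # find_spec returns None when a top-level package cannot be found.
        status[name] = importlib.util.find_spec(name) is not None
    return status


for item, ok in check_environment().items():
    print(f"{item}: {'ok' if ok else 'missing'}")
```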

Training

To start training the model on chain sum problems:

python grpo_train.py --config-path config --config-name grpo_trainer

Configuration

You can modify the training run by editing the configuration file or by overriding arguments directly on the command line.

# Change dataset
The easiest approach is to edit the `config/grpo_trainer.yaml` file and define a custom training composite. The following example experiment uses a composite of algorithmic training tasks:
```yaml
reasoning_gym:
  dataset_size: 20000
  developer_prompt: DeepSeekZero
  datasets:
    ab:
      weight: 1
    base_conversion:
      weight: 1
    binary_alternation:
      weight: 1
      config:
        p_solvable: 0.9
    binary_matrix:
      weight: 1
      config:
        min_n: 2
        max_n: 6
    caesar_cipher:
      weight: 1
      config:
        max_words: 10
    cryptarithm:
      weight: 1
    isomorphic_strings:
      weight: 1
      config:
        max_string_length: 8
```

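To get a feel for how the `weight` values shape the training mix, here is a rough sketch. It assumes that each training example's source dataset is drawn with probability proportional to its weight; this is an assumption about the composite's behaviour, and the helper names are hypothetical. With seven datasets at weight 1 and `dataset_size: 20000`, each task would contribute about 2857 examples on average.

```python
import random
from collections import Counter

# Weights copied from the example composite above.
WEIGHTS = {
    "ab": 1,
    "base_conversion": 1,
    "binary_alternation": 1,
    "binary_matrix": 1,
    "caesar_cipher": 1,
    "cryptarithm": 1,
    "isomorphic_strings": 1,
}
DATASET_SIZE = 20000


def expected_counts(weights: dict, size: int) -> dict:
    """Expected number of examples per dataset under weight-proportional sampling."""
    total = sum(weights.values())
    return {name: size * w / total for name, w in weights.items()}


def sample_counts(weights: dict, size: int, seed: int = 42) -> Counter:
    """Simulate one draw of `size` examples (assumed sampling scheme)."""
    rng = random.Random(seed)
    names = list(weights)
    picks = rng.choices(names, weights=[weights[n] for n in names], k=size)
    return Counter(picks)


expected = expected_counts(WEIGHTS, DATASET_SIZE)
simulated = sample_counts(WEIGHTS, DATASET_SIZE)
for name in WEIGHTS:
    print(f"{name:20s} expected={expected[name]:8.1f} simulated={simulated[name]}")
```

Raising one dataset's weight to 2 in this sketch would roughly double its share relative to the others.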
# Change configuration
Set `project_name` and `experiment_name` if you log your runs to W&B.

This config assumes a single GPU node, but you can configure this too. The following command would be for 2 GPUs, with 1 used for vLLM rollouts:

python3 -u grpo_train.py --config-path config --config-name grpo_trainer \
    actor_rollout_ref.rollout.tensor_model_parallel_size=1 \
    trainer.n_gpus_per_node=2 \
    trainer.project_name=rg-grpo \
    trainer.experiment_name=algorithmic_qwen2.5_3b

Alternatively, you can set these values directly in a config file.
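
To see how the dotted command-line overrides correspond to nested config-file keys, consider the sketch below. It is only an illustration of the mapping, not how the configuration library implements it, and `apply_overrides` is a hypothetical helper.

```python
import json


def apply_overrides(config: dict, overrides: list[str]) -> dict:
    """Write each `a.b.c=value` override into the nested dict `config`."""
    for item in overrides:
        dotted_key, raw_value = item.split("=", 1)
        *parents, leaf = dotted_key.split(".")
        node = config
        for part in parents:
            # Create intermediate sections on demand.
            node = node.setdefault(part, {})
        # Simplification: only plain non-negative integers are converted.
        node[leaf] = int(raw_value) if raw_value.isdigit() else raw_value
    return config


config = apply_overrides(
    {},
    [
        "actor_rollout_ref.rollout.tensor_model_parallel_size=1",
        "trainer.n_gpus_per_node=2",
        "trainer.project_name=rg-grpo",
        "trainer.experiment_name=algorithmic_qwen2.5_3b",
    ],
)
print(json.dumps(config, indent=2))
```

The printed structure mirrors the nesting these keys would have in the YAML file: `n_gpus_per_node`, `project_name`, and `experiment_name` sit under `trainer`, and `tensor_model_parallel_size` sits under `actor_rollout_ref.rollout`.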