initial verl training codebase (#389)

* fixes for latest verl
* composite dataset training experiment
* use stateful dataloaders to match verl changes
* training readme
* add formatting reward
* length reward impl
* standalone reasoning_gym config section
* curriculum learning, new length reward, more config
Oliver Stanley 2025-03-20 15:04:57 +00:00 committed by GitHub
parent ce0a6c4878
commit eb69916c1b
8 changed files with 910 additions and 0 deletions

@@ -0,0 +1,3 @@
from .datasets import ReasoningGymDataset, make_dataset

__all__ = ["ReasoningGymDataset", "make_dataset"]