Commit graph

14 commits

Author SHA1 Message Date
Andreas Köpf
7b72c3470b docs: Update TRL README with GRPO example details and usage instructions (#76) 2025-02-07 07:56:22 +01:00
joesharratt1229
a8e11e71be Test training with trl (#70) 2025-02-07 07:42:32 +01:00
* first trl grpo implementation
* added config yaml file
* added read me and dependencies
* updated reward format func
Cavit Erginsoy
aff0fecef4 lint 2025-02-03 11:35:30 +00:00
Cavit Erginsoy
9b1068ea39 Merge remote-tracking branch 'upstream/main' 2025-02-03 07:44:32 +00:00
Cavit Erginsoy
7b61fc5043 Completed: full example suite 2025-02-03 07:21:12 +00:00
Andreas Koepf
8202f234be reduce veRL example size 2025-02-01 23:56:11 +00:00
Andreas Koepf
3f24df31dc add deps for veRL experiment in README 2025-02-01 21:27:33 +00:00
Andreas Koepf
e671b97ab4 first bits of veRL example 2025-02-01 21:20:36 +00:00
Andreas Koepf
5ae329becd lint 2025-01-30 23:14:32 +01:00
Cavit Erginsoy
df3c4580ee INIT 2025-01-30 21:32:46 +00:00
Andreas Koepf
fc775eda7e lint, seed & size for figlet 2025-01-30 00:58:34 +01:00
Andreas Koepf
f7313d409c use more realistic hparams for OpenRLHF example 2025-01-28 22:20:22 +00:00
Andreas Koepf
c196d622e0 extract answer from last answer tag 2025-01-28 16:37:19 +00:00
Andreas Koepf
cc0312e446 add first example with OpenRLHF 2025-01-28 14:40:06 +00:00