Commit graph

14 commits

Author SHA1 Message Date
Andreas Köpf
a8f9eafd43 docs: Update TRL README with GRPO example details and usage instructions (#76) 2025-02-07 07:56:22 +01:00
joesharratt1229
d61db3772a Test training with trl (#70)
* first trl grpo implementation
* added config yaml file
* added read me and dependencies
* updated reward format func
2025-02-07 07:42:32 +01:00
Cavit Erginsoy
6c564b3dd9 lint 2025-02-03 11:35:30 +00:00
Cavit Erginsoy
1e27021e11 Merge remote-tracking branch 'upstream/main' 2025-02-03 07:44:32 +00:00
Cavit Erginsoy
de7d37f14f Completed: full example suite 2025-02-03 07:21:12 +00:00
Andreas Koepf
69a0c620f9 reduce veRL example size 2025-02-01 23:56:11 +00:00
Andreas Koepf
703cce274f add deps for veRL experiment in README 2025-02-01 21:27:33 +00:00
Andreas Koepf
c17f17fa9d first bits of veRL example 2025-02-01 21:20:36 +00:00
Andreas Koepf
bf62f631dd lint 2025-01-30 23:14:32 +01:00
Cavit Erginsoy
d57a7947a4 INIT 2025-01-30 21:32:46 +00:00
Andreas Koepf
9450768aad lint, seed & size for figlet 2025-01-30 00:58:34 +01:00
Andreas Koepf
33977f75f6 use more realistic hparams for OpenRLHF example 2025-01-28 22:20:22 +00:00
Andreas Koepf
1bc56b8559 extract answer from last answer tag 2025-01-28 16:37:19 +00:00
Andreas Koepf
655de7a7f3 add first example with OpenRLHF 2025-01-28 14:40:06 +00:00