Andreas Köpf | a8f9eafd43 | docs: Update TRL README with GRPO example details and usage instructions (#76) | 2025-02-07 07:56:22 +01:00
joesharratt1229 | d61db3772a | Test training with trl (#70): first trl grpo implementation; added config yaml file; added read me and dependencies; updated reward format func | 2025-02-07 07:42:32 +01:00
Cavit Erginsoy | 6c564b3dd9 | lint | 2025-02-03 11:35:30 +00:00
Cavit Erginsoy | 1e27021e11 | Merge remote-tracking branch 'upstream/main' | 2025-02-03 07:44:32 +00:00
Cavit Erginsoy | de7d37f14f | Completed: full example suite | 2025-02-03 07:21:12 +00:00
Andreas Koepf | 69a0c620f9 | reduce veRL example size | 2025-02-01 23:56:11 +00:00
Andreas Koepf | 703cce274f | add deps for veRL experiment in README | 2025-02-01 21:27:33 +00:00
Andreas Koepf | c17f17fa9d | first bits of veRL example | 2025-02-01 21:20:36 +00:00
Andreas Koepf | bf62f631dd | lint | 2025-01-30 23:14:32 +01:00
Cavit Erginsoy | d57a7947a4 | INIT | 2025-01-30 21:32:46 +00:00
Andreas Koepf | 9450768aad | lint, seed & size for figlet | 2025-01-30 00:58:34 +01:00
Andreas Koepf | 33977f75f6 | use more realistic hparams for OpenRLHF example | 2025-01-28 22:20:22 +00:00
Andreas Koepf | 1bc56b8559 | extract answer from last answer tag | 2025-01-28 16:37:19 +00:00
Andreas Koepf | 655de7a7f3 | add first example with OpenRLHF | 2025-01-28 14:40:06 +00:00