| Author | Commit | Message | Date |
|---|---|---|---|
| Andreas Köpf | 7b72c3470b | docs: Update TRL README with GRPO example details and usage instructions (#76) | 2025-02-07 07:56:22 +01:00 |
| joesharratt1229 | a8e11e71be | Test training with trl (#70): first trl grpo implementation; added config yaml file; added read me and dependencies; updated reward format func | 2025-02-07 07:42:32 +01:00 |
| Cavit Erginsoy | aff0fecef4 | lint | 2025-02-03 11:35:30 +00:00 |
| Cavit Erginsoy | 9b1068ea39 | Merge remote-tracking branch 'upstream/main' | 2025-02-03 07:44:32 +00:00 |
| Cavit Erginsoy | 7b61fc5043 | Completed: full example suite | 2025-02-03 07:21:12 +00:00 |
| Andreas Koepf | 8202f234be | reduce veRL example size | 2025-02-01 23:56:11 +00:00 |
| Andreas Koepf | 3f24df31dc | add deps for veRL experiment in README | 2025-02-01 21:27:33 +00:00 |
| Andreas Koepf | e671b97ab4 | first bits of veRL example | 2025-02-01 21:20:36 +00:00 |
| Andreas Koepf | 5ae329becd | lint | 2025-01-30 23:14:32 +01:00 |
| Cavit Erginsoy | df3c4580ee | INIT | 2025-01-30 21:32:46 +00:00 |
| Andreas Koepf | fc775eda7e | lint, seed & size for figlet | 2025-01-30 00:58:34 +01:00 |
| Andreas Koepf | f7313d409c | use more realistic hparams for OpenRLHF example | 2025-01-28 22:20:22 +00:00 |
| Andreas Koepf | c196d622e0 | extract answer from last answer tag | 2025-01-28 16:37:19 +00:00 |
| Andreas Koepf | cc0312e446 | add first example with OpenRLHF | 2025-01-28 14:40:06 +00:00 |