Commit graph

24 commits

Author SHA1 Message Date
Jai Suphavadeeprasit
80d2608c4e basic changes 2026-03-02 11:18:52 -05:00
Jai Suphavadeeprasit
14ebf7a492 changes 2026-03-02 11:18:51 -05:00
Jai Suphavadeeprasit
5640d7de25 error handling 2026-03-02 11:18:51 -05:00
Jai Suphavadeeprasit
ff8eaf9e3c param locations update 2026-03-02 11:18:51 -05:00
Jai Suphavadeeprasit
27b122a415 changes based on torchtitan 2026-03-02 11:18:51 -05:00
Jai Suphavadeeprasit
533f0bf286 IPC updates 2026-03-02 11:18:51 -05:00
Jai Suphavadeeprasit
78ea8bc3e7 health changes 2026-03-02 11:18:51 -05:00
Jai Suphavadeeprasit
3b469f2445 add missing parameter 2026-03-02 11:18:51 -05:00
Jai Suphavadeeprasit
689055f0ec standardize the training approach 2026-03-02 11:18:51 -05:00
Jai Suphavadeeprasit
b1b9943473 tracking 2026-03-02 11:18:51 -05:00
Jai Suphavadeeprasit
e4fc514763 training bug 2026-03-02 11:18:51 -05:00
Jai Suphavadeeprasit
c336d981ce smol changes 2026-03-02 11:18:51 -05:00
Jai Suphavadeeprasit
a1725e4ae2 design choice - LoRA and shared vLLM through the bridge 2026-03-02 11:18:51 -05:00
Jai Suphavadeeprasit
e202e2c288 gradient checkpointing issue for LoRAs 2026-03-02 11:18:51 -05:00
Jai Suphavadeeprasit
a7bdc0270d stuff 2026-03-02 11:18:51 -05:00
Jai Suphavadeeprasit
79842edba7 local version 2026-03-02 11:18:51 -05:00
Jai Suphavadeeprasit
61221dd1a2 initial commit 2026-03-02 11:18:49 -05:00
ropresearch
6a20b90549 added gen params for latest examples endpoint 2025-10-01 13:05:37 -04:00
ropresearch
1a68f691f6 linting fixes 2025-09-25 17:17:44 -04:00
ropresearch
c3fc68879c group temps, sample temps, and logprob api params 2025-09-25 16:41:58 -04:00
Brawn
eb179e7fca Update grpo.py 2025-08-14 20:20:41 +03:00
Brawn
6dccdcc67e fix: division-by-zero in gradient calculation 2025-08-14 14:33:46 +03:00
dmahan93
40b12dae60 run pre-commit on all files 2025-05-09 09:54:20 -05:00
Dakota Nous
621d00dd80 first commit 2025-04-29 12:10:10 -07:00