Commit graph

1348 commits

Author SHA1 Message Date
Jai Suphavadeeprasit
80d2608c4e basic changes 2026-03-02 11:18:52 -05:00
Jai Suphavadeeprasit
14ebf7a492 changes 2026-03-02 11:18:51 -05:00
Jai Suphavadeeprasit
5640d7de25 error handling 2026-03-02 11:18:51 -05:00
Jai Suphavadeeprasit
ff8eaf9e3c param locations update 2026-03-02 11:18:51 -05:00
Jai Suphavadeeprasit
e2c99f7f97 daemon errors 2026-03-02 11:18:51 -05:00
Jai Suphavadeeprasit
4348345dac monkey patch fixes 2026-03-02 11:18:51 -05:00
Jai Suphavadeeprasit
0d71de18d8 changes based on torchtitan 2 2026-03-02 11:18:51 -05:00
Jai Suphavadeeprasit
27b122a415 changes based on torchtitan 2026-03-02 11:18:51 -05:00
Jai Suphavadeeprasit
67e27def11 Cleanup 2026-03-02 11:18:51 -05:00
Jai Suphavadeeprasit
9512177d0a weight updates async 2026-03-02 11:18:51 -05:00
Jai Suphavadeeprasit
e033e24c64 vllm underlying weights 2026-03-02 11:18:51 -05:00
Jai Suphavadeeprasit
533f0bf286 IPC updates 2026-03-02 11:18:51 -05:00
Jai Suphavadeeprasit
78ea8bc3e7 health changes 2026-03-02 11:18:51 -05:00
Jai Suphavadeeprasit
3b469f2445 add missing parameter 2026-03-02 11:18:51 -05:00
Jai Suphavadeeprasit
12c182f3d4 readme updates 2026-03-02 11:18:51 -05:00
Jai Suphavadeeprasit
689055f0ec standardize the training approach 2026-03-02 11:18:51 -05:00
Jai Suphavadeeprasit
b1b9943473 tracking 2026-03-02 11:18:51 -05:00
Jai Suphavadeeprasit
e4fc514763 training bug 2026-03-02 11:18:51 -05:00
Jai Suphavadeeprasit
c336d981ce smol changes 2026-03-02 11:18:51 -05:00
Jai Suphavadeeprasit
a1725e4ae2 design choice - LoRA and shared vLLM through the bridge 2026-03-02 11:18:51 -05:00
Jai Suphavadeeprasit
e202e2c288 gradient checkpointing issue for LoRAs 2026-03-02 11:18:51 -05:00
Jai Suphavadeeprasit
a7bdc0270d stuff 2026-03-02 11:18:51 -05:00
Jai Suphavadeeprasit
f5c847d39d generate endpoint with logprobs 2026-03-02 11:18:51 -05:00
Jai Suphavadeeprasit
2b240bbd2e changes 2026-03-02 11:18:51 -05:00
Jai Suphavadeeprasit
79842edba7 local version 2026-03-02 11:18:51 -05:00
Jai Suphavadeeprasit
2d3c07dcae correction 2026-03-02 11:18:51 -05:00
Jai Suphavadeeprasit
61221dd1a2 initial commit 2026-03-02 11:18:49 -05:00
Jai Suphavadeeprasit
6e975dd951 Save the eval to the disk 2026-03-02 11:17:44 -05:00
J-SUPHA
b763b4e20d Merge pull request #387 from NousResearch/opd-filtered
Opd filtered
2026-02-27 21:40:03 -05:00
pre-commit-ci[bot]
216c1f5899 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-02-27 21:17:58 +00:00
Jai Suphavadeeprasit
35587cbdc0 logger changes 2026-02-27 16:17:03 -05:00
dmahan93
1bc4b8a680 Merge pull request #400 from prestoalvarez/patch-1
docs: fix typo
2026-02-27 14:47:04 -06:00
pre-commit-ci[bot]
64d3ee1bd6 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-02-27 18:16:06 +00:00
Jai Suphavadeeprasit
836c346406 narrow down scope further 2026-02-27 13:15:23 -05:00
Jai Suphavadeeprasit
f343b24a6a narrow down scope 2026-02-27 11:14:42 -05:00
Alvarez
d762c229e2 Update instructions.py 2026-02-27 10:23:47 +01:00
dmahan93
7ceed9b6d9 Merge pull request #388 from milord12345/fix/replace-print-with-logger-reasoning-gym
refactor: replace print statements with self.logger in reasoning_gym_environment.py
2026-02-24 14:24:12 -06:00
dmahan93
7a3b619190 Merge pull request #392 from Ocheretovich/main
fix: pass num_steps to register_to_api
2026-02-24 14:23:06 -06:00
Jai Suphavadeeprasit
e8d0e74877 gsm8k cleanup 2026-02-24 12:16:00 -05:00
Ocheretovich Oksana
aec5552db6 fix: pass num_steps to register_to_api
Signed-off-by: Ocheretovich Oksana <ocheretovich@gmail.com>
2026-02-24 11:22:18 +02:00
dmahan93
329a233bba Merge pull request #389 from CreeptoGengar/fix/validate-without-train
fix: handle validation without training
2026-02-23 14:21:40 -06:00
dmahan93
e4974561bf Merge pull request #390 from VolodymyrBg/fix/blackjack-env-resource-leak
fix: add try/finally to guarantee gym environment cleanup
2026-02-23 14:20:57 -06:00
dmahan93
67514d1f51 Merge pull request #391 from NousResearch/pre-commit-ci-update-config
[pre-commit.ci] pre-commit autoupdate
2026-02-23 12:57:12 -06:00
pre-commit-ci[bot]
186b86151c [pre-commit.ci] pre-commit autoupdate
updates:
- [github.com/astral-sh/ruff-pre-commit: v0.15.1 → v0.15.2](https://github.com/astral-sh/ruff-pre-commit/compare/v0.15.1...v0.15.2)
2026-02-23 16:42:41 +00:00
pre-commit-ci[bot]
a930d3db12 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-02-21 19:25:14 +00:00
VolodymyrBg
7e5ddbce06 fix: add try/finally to guarantee gym environment cleanup 2026-02-21 21:23:46 +02:00
pre-commit-ci[bot]
929980185d [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-02-21 13:54:38 +00:00
Gengar
34c8c87f0f fix: handle validation without training
Added validation functionality to the training process and refactored validation method to use a dedicated validator instance.
2026-02-21 15:53:37 +02:00
Jai Suphavadeeprasit
e5297148f9 dynamic system prompt fixed 2026-02-20 14:50:43 -05:00
Jai Suphavadeeprasit
fc248dd65b clean 2026-02-20 12:01:50 -05:00