| Author | Commit | Message | Date |
|---|---|---|---|
| Jai Suphavadeeprasit | 80d2608c4e | basic changes | 2026-03-02 11:18:52 -05:00 |
| Jai Suphavadeeprasit | 14ebf7a492 | changes | 2026-03-02 11:18:51 -05:00 |
| Jai Suphavadeeprasit | 5640d7de25 | error handling | 2026-03-02 11:18:51 -05:00 |
| Jai Suphavadeeprasit | ff8eaf9e3c | param locations update | 2026-03-02 11:18:51 -05:00 |
| Jai Suphavadeeprasit | e2c99f7f97 | daemon errors | 2026-03-02 11:18:51 -05:00 |
| Jai Suphavadeeprasit | 4348345dac | monkey patch fixes | 2026-03-02 11:18:51 -05:00 |
| Jai Suphavadeeprasit | 0d71de18d8 | changes based on torchtitan 2 | 2026-03-02 11:18:51 -05:00 |
| Jai Suphavadeeprasit | 27b122a415 | changes based on torchtitan | 2026-03-02 11:18:51 -05:00 |
| Jai Suphavadeeprasit | 67e27def11 | Cleanup | 2026-03-02 11:18:51 -05:00 |
| Jai Suphavadeeprasit | 9512177d0a | weight updates async | 2026-03-02 11:18:51 -05:00 |
| Jai Suphavadeeprasit | e033e24c64 | vllm underlying weights | 2026-03-02 11:18:51 -05:00 |
| Jai Suphavadeeprasit | 533f0bf286 | IPC updates | 2026-03-02 11:18:51 -05:00 |
| Jai Suphavadeeprasit | 78ea8bc3e7 | health changes | 2026-03-02 11:18:51 -05:00 |
| Jai Suphavadeeprasit | 3b469f2445 | add missing parameter | 2026-03-02 11:18:51 -05:00 |
| Jai Suphavadeeprasit | 12c182f3d4 | readme updates | 2026-03-02 11:18:51 -05:00 |
| Jai Suphavadeeprasit | 689055f0ec | standardize the training approach | 2026-03-02 11:18:51 -05:00 |
| Jai Suphavadeeprasit | b1b9943473 | tracking | 2026-03-02 11:18:51 -05:00 |
| Jai Suphavadeeprasit | e4fc514763 | training bug | 2026-03-02 11:18:51 -05:00 |
| Jai Suphavadeeprasit | c336d981ce | smol changes | 2026-03-02 11:18:51 -05:00 |
| Jai Suphavadeeprasit | a1725e4ae2 | design choice - LoRA and shared vLLM through the bridge | 2026-03-02 11:18:51 -05:00 |
| Jai Suphavadeeprasit | e202e2c288 | gradient checkpointing issue for LoRAs | 2026-03-02 11:18:51 -05:00 |
| Jai Suphavadeeprasit | a7bdc0270d | stuff | 2026-03-02 11:18:51 -05:00 |
| Jai Suphavadeeprasit | f5c847d39d | generate endpoint with logprobs | 2026-03-02 11:18:51 -05:00 |
| Jai Suphavadeeprasit | 2b240bbd2e | changes | 2026-03-02 11:18:51 -05:00 |
| Jai Suphavadeeprasit | 79842edba7 | local version | 2026-03-02 11:18:51 -05:00 |
| Jai Suphavadeeprasit | 2d3c07dcae | correction | 2026-03-02 11:18:51 -05:00 |
| Jai Suphavadeeprasit | 61221dd1a2 | initial commit | 2026-03-02 11:18:49 -05:00 |
| Jai Suphavadeeprasit | 6e975dd951 | Save the eval to the disk | 2026-03-02 11:17:44 -05:00 |
| J-SUPHA | b763b4e20d | Merge pull request #387 from NousResearch/opd-filtered; Opd filtered | 2026-02-27 21:40:03 -05:00 |
| pre-commit-ci[bot] | 216c1f5899 | [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci | 2026-02-27 21:17:58 +00:00 |
| Jai Suphavadeeprasit | 35587cbdc0 | logger changes | 2026-02-27 16:17:03 -05:00 |
| dmahan93 | 1bc4b8a680 | Merge pull request #400 from prestoalvarez/patch-1; docs: fix typo | 2026-02-27 14:47:04 -06:00 |
| pre-commit-ci[bot] | 64d3ee1bd6 | [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci | 2026-02-27 18:16:06 +00:00 |
| Jai Suphavadeeprasit | 836c346406 | narrow down scope further | 2026-02-27 13:15:23 -05:00 |
| Jai Suphavadeeprasit | f343b24a6a | narrow down scope | 2026-02-27 11:14:42 -05:00 |
| Alvarez | d762c229e2 | Update instructions.py | 2026-02-27 10:23:47 +01:00 |
| dmahan93 | 7ceed9b6d9 | Merge pull request #388 from milord12345/fix/replace-print-with-logger-reasoning-gym; refactor: replace print statements with self.logger in reasoning_gym_environment.py | 2026-02-24 14:24:12 -06:00 |
| dmahan93 | 7a3b619190 | Merge pull request #392 from Ocheretovich/main; fix: pass num_steps to register_to_api | 2026-02-24 14:23:06 -06:00 |
| Jai Suphavadeeprasit | e8d0e74877 | gsm8k cleanup | 2026-02-24 12:16:00 -05:00 |
| Ocheretovich Oksana | aec5552db6 | fix: pass num_steps to register_to_api; Signed-off-by: Ocheretovich Oksana <ocheretovich@gmail.com> | 2026-02-24 11:22:18 +02:00 |
| dmahan93 | 329a233bba | Merge pull request #389 from CreeptoGengar/fix/validate-without-train; fix: handle validation without training | 2026-02-23 14:21:40 -06:00 |
| dmahan93 | e4974561bf | Merge pull request #390 from VolodymyrBg/fix/blackjack-env-resource-leak; fix: add try/finally to guarantee gym environment cleanup | 2026-02-23 14:20:57 -06:00 |
| dmahan93 | 67514d1f51 | Merge pull request #391 from NousResearch/pre-commit-ci-update-config; [pre-commit.ci] pre-commit autoupdate | 2026-02-23 12:57:12 -06:00 |
| pre-commit-ci[bot] | 186b86151c | [pre-commit.ci] pre-commit autoupdate; updates: [github.com/astral-sh/ruff-pre-commit: v0.15.1 → v0.15.2](https://github.com/astral-sh/ruff-pre-commit/compare/v0.15.1...v0.15.2) | 2026-02-23 16:42:41 +00:00 |
| pre-commit-ci[bot] | a930d3db12 | [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci | 2026-02-21 19:25:14 +00:00 |
| VolodymyrBg | 7e5ddbce06 | fix: add try/finally to guarantee gym environment cleanup | 2026-02-21 21:23:46 +02:00 |
| pre-commit-ci[bot] | 929980185d | [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci | 2026-02-21 13:54:38 +00:00 |
| Gengar | 34c8c87f0f | fix: handle validation without training; Added validation functionality to the training process and refactored validation method to use a dedicated validator instance. | 2026-02-21 15:53:37 +02:00 |
| Jai Suphavadeeprasit | e5297148f9 | dynamic system prompt fixed | 2026-02-20 14:50:43 -05:00 |
| Jai Suphavadeeprasit | fc248dd65b | clean | 2026-02-20 12:01:50 -05:00 |