Jai Suphavadeeprasit
|
a171358f2e
|
structural changes
|
2026-03-13 18:49:30 -04:00 |
|
pre-commit-ci[bot]
|
6c564799bc
|
[pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
|
2026-03-13 21:02:08 +00:00 |
|
Jai Suphavadeeprasit
|
697c594c72
|
changes
|
2026-03-13 16:58:37 -04:00 |
|
Jai Suphavadeeprasit
|
a8cdb53a4d
|
address problems
|
2026-03-13 16:12:05 -04:00 |
|
Jai Suphavadeeprasit
|
322e7e6623
|
remove comments
|
2026-03-13 13:30:04 -04:00 |
|
Jai Suphavadeeprasit
|
a1b545c734
|
remove cross tokenization and fix location of configs
|
2026-03-13 13:19:28 -04:00 |
|
pre-commit-ci[bot]
|
d1b0dee8f7
|
[pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
|
2026-03-13 15:14:09 +00:00 |
|
Jai Suphavadeeprasit
|
e79af5ff69
|
testing config
|
2026-03-13 11:06:02 -04:00 |
|
Jai Suphavadeeprasit
|
64794e7c72
|
sneaky bug
|
2026-03-13 11:06:00 -04:00 |
|
Jai Suphavadeeprasit
|
4f33ab8bf4
|
apparently not so easy
|
2026-03-13 11:04:57 -04:00 |
|
Jai Suphavadeeprasit
|
530fed2877
|
testing set up
|
2026-03-13 11:04:57 -04:00 |
|
dmahan93
|
c421582b6f
|
Merge pull request #408 from daspartho/verl-integration-fixes
fix: re-append stop string in math training path
|
2026-03-10 23:08:58 -05:00 |
|
Partho Das
|
632ab0161c
|
Revert "rm hardcoded same score check"
This reverts commit f02c24204d.
|
2026-03-10 01:42:44 +05:30 |
|
Partho Das
|
cd3a9163c7
|
Revert "eval max_token_length consistent with training config"
This reverts commit 5f52befd38.
|
2026-03-08 04:42:02 +05:30 |
|
dmahan93
|
f4875c5dc6
|
make preserve thinking optional
|
2026-03-04 15:44:12 -06:00 |
|
dmahan93
|
12d61d197f
|
add env using the tool api stuff
|
2026-03-03 19:51:30 -06:00 |
|
Partho Das
|
5f52befd38
|
eval max_token_length consistent with training config
instead of hardcoding, follows other envs pattern
|
2026-03-03 18:03:04 +05:30 |
|
dmahan93
|
be73d92723
|
Merge branch 'main' into pipelineRL
|
2026-03-02 16:43:32 -06:00 |
|
Jai Suphavadeeprasit
|
585244559e
|
more readme changes
|
2026-03-02 11:18:52 -05:00 |
|
Jai Suphavadeeprasit
|
4a7da8049f
|
README changes
|
2026-03-02 11:18:52 -05:00 |
|
pre-commit-ci[bot]
|
91afc9e46e
|
[pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
|
2026-03-02 11:18:52 -05:00 |
|
Jai Suphavadeeprasit
|
d2ea8cd612
|
remove KL
|
2026-03-02 11:18:52 -05:00 |
|
Jai Suphavadeeprasit
|
dbf6026165
|
remove reqs and update community readme
|
2026-03-02 11:18:52 -05:00 |
|
Jai Suphavadeeprasit
|
45708b4b25
|
packageification
|
2026-03-02 11:18:52 -05:00 |
|
Jai Suphavadeeprasit
|
c33f9170c3
|
nccl loras
|
2026-03-02 11:18:52 -05:00 |
|
Jai Suphavadeeprasit
|
2e5fe8bb44
|
math server
|
2026-03-02 11:18:52 -05:00 |
|
pre-commit-ci[bot]
|
5cfd1929f1
|
[pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
|
2026-03-02 11:18:52 -05:00 |
|
Jai Suphavadeeprasit
|
d07ab3e3ce
|
math zero work arounds
|
2026-03-02 11:18:52 -05:00 |
|
Jai Suphavadeeprasit
|
119721ef3d
|
evals errors
|
2026-03-02 11:18:52 -05:00 |
|
Jai Suphavadeeprasit
|
fb1d983757
|
evals errors
|
2026-03-02 11:18:52 -05:00 |
|
Jai Suphavadeeprasit
|
00801646d7
|
evals erros
|
2026-03-02 11:18:52 -05:00 |
|
Jai Suphavadeeprasit
|
dedb399911
|
evals
|
2026-03-02 11:18:52 -05:00 |
|
Jai Suphavadeeprasit
|
f78c821b8b
|
evals
|
2026-03-02 11:18:52 -05:00 |
|
Jai Suphavadeeprasit
|
58a3fb8b14
|
pipelineRL
|
2026-03-02 11:18:52 -05:00 |
|
Jai Suphavadeeprasit
|
6e975dd951
|
Save the eval to the disk
|
2026-03-02 11:17:44 -05:00 |
|
dmahan93
|
b3065841c1
|
add code-spell and secrects precommit
|
2026-02-27 20:17:19 -06:00 |
|
Alvarez
|
d762c229e2
|
Update instructions.py
|
2026-02-27 10:23:47 +01:00 |
|
dmahan93
|
7ceed9b6d9
|
Merge pull request #388 from milord12345/fix/replace-print-with-logger-reasoning-gym
refactor: replace print statements with self.logger in reasoning_gym_environment.py
|
2026-02-24 14:24:12 -06:00 |
|
Partho Das
|
adf075112c
|
re-append stop in math training path
|
2026-02-24 12:29:57 +05:30 |
|
Partho Das
|
f02c24204d
|
rm hardcoded same score check
|
2026-02-24 12:29:52 +05:30 |
|
dmahan93
|
329a233bba
|
Merge pull request #389 from CreeptoGengar/fix/validate-without-train
fix: handle validation without training
|
2026-02-23 14:21:40 -06:00 |
|
pre-commit-ci[bot]
|
a930d3db12
|
[pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
|
2026-02-21 19:25:14 +00:00 |
|
VolodymyrBg
|
7e5ddbce06
|
fix: add try/finally to guarantee gym environment cleanup
|
2026-02-21 21:23:46 +02:00 |
|
pre-commit-ci[bot]
|
929980185d
|
[pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
|
2026-02-21 13:54:38 +00:00 |
|
Gengar
|
34c8c87f0f
|
fix: handle validation without training
Added validation functionality to the training process and refactored validation method to use a dedicated validator instance.
|
2026-02-21 15:53:37 +02:00 |
|
pre-commit-ci[bot]
|
623dadc5cd
|
[pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
|
2026-02-20 16:43:18 +00:00 |
|
milord1234
|
853703ffc5
|
refactor: replace print statements with self.logger in reasoning_gym_environment.py
Replace 20 print() calls with appropriate logging levels:
- Error messages -> self.logger.error()
- Warnings -> self.logger.warning()
- Info/status messages -> self.logger.info()
- Debug messages -> self.logger.debug()
Left 2 top-level print() calls untouched (no logger access).
|
2026-02-20 19:57:43 +03:30 |
|
dmahan93
|
708b42a00f
|
Merge pull request #378 from johnh4098/add-regex-generation-env
Add regex generation environment for community
|
2026-02-18 12:37:32 -08:00 |
|
pre-commit-ci[bot]
|
53a69d30e1
|
[pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
|
2026-02-11 19:47:28 +00:00 |
|
johnh4098
|
86d5163316
|
Add regex generation environment for community
|
2026-02-11 23:04:47 +03:30 |
|