Commit graph

790 commits

Author SHA1 Message Date
dmahan93
c421582b6f Merge pull request #408 from daspartho/verl-integration-fixes
fix: re-append stop string in math training path
2026-03-10 23:08:58 -05:00
Partho Das
632ab0161c Revert "rm hardcoded same score check"
This reverts commit f02c24204d.
2026-03-10 01:42:44 +05:30
Partho Das
cd3a9163c7 Revert "eval max_token_length consistent with training config"
This reverts commit 5f52befd38.
2026-03-08 04:42:02 +05:30
dmahan93
f4875c5dc6 make preserve thinking optional 2026-03-04 15:44:12 -06:00
dmahan93
12d61d197f add env using the tool api stuff 2026-03-03 19:51:30 -06:00
Partho Das
5f52befd38 eval max_token_length consistent with training config
instead of hardcoding, follows other envs pattern
2026-03-03 18:03:04 +05:30
dmahan93
be73d92723 Merge branch 'main' into pipelineRL 2026-03-02 16:43:32 -06:00
Jai Suphavadeeprasit
585244559e more readme changes 2026-03-02 11:18:52 -05:00
Jai Suphavadeeprasit
4a7da8049f README changes 2026-03-02 11:18:52 -05:00
pre-commit-ci[bot]
91afc9e46e [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-03-02 11:18:52 -05:00
Jai Suphavadeeprasit
d2ea8cd612 remove KL 2026-03-02 11:18:52 -05:00
Jai Suphavadeeprasit
dbf6026165 remove reqs and update community readme 2026-03-02 11:18:52 -05:00
Jai Suphavadeeprasit
45708b4b25 packageification 2026-03-02 11:18:52 -05:00
Jai Suphavadeeprasit
c33f9170c3 nccl loras 2026-03-02 11:18:52 -05:00
Jai Suphavadeeprasit
2e5fe8bb44 math server 2026-03-02 11:18:52 -05:00
pre-commit-ci[bot]
5cfd1929f1 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-03-02 11:18:52 -05:00
Jai Suphavadeeprasit
d07ab3e3ce math zero work arounds 2026-03-02 11:18:52 -05:00
Jai Suphavadeeprasit
119721ef3d evals errors 2026-03-02 11:18:52 -05:00
Jai Suphavadeeprasit
fb1d983757 evals errors 2026-03-02 11:18:52 -05:00
Jai Suphavadeeprasit
00801646d7 evals erros 2026-03-02 11:18:52 -05:00
Jai Suphavadeeprasit
dedb399911 evals 2026-03-02 11:18:52 -05:00
Jai Suphavadeeprasit
f78c821b8b evals 2026-03-02 11:18:52 -05:00
Jai Suphavadeeprasit
58a3fb8b14 pipelineRL 2026-03-02 11:18:52 -05:00
Jai Suphavadeeprasit
6e975dd951 Save the eval to the disk 2026-03-02 11:17:44 -05:00
dmahan93
b3065841c1 add code-spell and secrects precommit 2026-02-27 20:17:19 -06:00
Alvarez
d762c229e2 Update instructions.py 2026-02-27 10:23:47 +01:00
dmahan93
7ceed9b6d9 Merge pull request #388 from milord12345/fix/replace-print-with-logger-reasoning-gym
refactor: replace print statements with self.logger in reasoning_gym_environment.py
2026-02-24 14:24:12 -06:00
Partho Das
adf075112c re-append stop in math training path 2026-02-24 12:29:57 +05:30
Partho Das
f02c24204d rm hardcoded same score check 2026-02-24 12:29:52 +05:30
dmahan93
329a233bba Merge pull request #389 from CreeptoGengar/fix/validate-without-train
fix: handle validation without training
2026-02-23 14:21:40 -06:00
pre-commit-ci[bot]
a930d3db12 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-02-21 19:25:14 +00:00
VolodymyrBg
7e5ddbce06 fix: add try/finally to guarantee gym environment cleanup 2026-02-21 21:23:46 +02:00
pre-commit-ci[bot]
929980185d [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-02-21 13:54:38 +00:00
Gengar
34c8c87f0f fix: handle validation without training
Added validation functionality to the training process and refactored validation method to use a dedicated validator instance.
2026-02-21 15:53:37 +02:00
pre-commit-ci[bot]
623dadc5cd [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-02-20 16:43:18 +00:00
milord1234
853703ffc5 refactor: replace print statements with self.logger in reasoning_gym_environment.py
Replace 20 print() calls with appropriate logging levels:
- Error messages -> self.logger.error()
- Warnings -> self.logger.warning()
- Info/status messages -> self.logger.info()
- Debug messages -> self.logger.debug()

Left 2 top-level print() calls untouched (no logger access).
2026-02-20 19:57:43 +03:30
dmahan93
708b42a00f Merge pull request #378 from johnh4098/add-regex-generation-env
Add regex generation environment for community
2026-02-18 12:37:32 -08:00
pre-commit-ci[bot]
53a69d30e1 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-02-11 19:47:28 +00:00
johnh4098
86d5163316 Add regex generation environment for community 2026-02-11 23:04:47 +03:30
victlop
4c18b883c5 Merge branch 'main' into cleanup/remove-redundant-import-comments 2026-02-11 19:14:59 +03:30
victlop
a1823f99fb chore: remove redundant inline comments from swe_rl_env.py imports 2026-02-11 19:08:47 +03:30
dmahan93
1580ab5934 Merge pull request #365 from alireza78a/fix/replace-debug-prints-with-logger
fix: replace debug print statements with logger
2026-02-09 21:01:38 -08:00
Alireza
6b92ee16ec fix duplicate code + add safety checks 2026-02-09 10:58:49 +03:30
alireza78a
1303cb59e8 fix: replace debug print statements with logger in dataset_env and infinimath_env 2026-02-07 14:51:33 +00:00
Teknium
462abbebf7 Merge pull request #339 from VolodymyrBg/bg
chore: fix typos
2026-01-31 09:03:17 -08:00
Teknium
efc85528bc Merge pull request #338 from windlgrass/fix-init-current-item
fix: initialize current_item in __init__ to prevent AttributeError
2026-01-31 09:02:06 -08:00
Teknium
8b22416dd4 Merge branch 'main' into fix-duplicate-code 2026-01-31 08:52:43 -08:00
VolodymyrBg
f285bbd417 Update refusalbench_environment.py 2026-01-29 12:43:15 +02:00
VolodymyrBg
94f29eac18 Update simpleqa_eval.py 2026-01-29 12:42:28 +02:00
VolodymyrBg
347edc9188 Update instructions.py 2026-01-29 12:31:52 +02:00