J-SUPHA
b763b4e20d
Merge pull request #387 from NousResearch/opd-filtered
...
Opd filtered
2026-02-27 21:40:03 -05:00
pre-commit-ci[bot]
216c1f5899
[pre-commit.ci] auto fixes from pre-commit.com hooks
...
for more information, see https://pre-commit.ci
2026-02-27 21:17:58 +00:00
Jai Suphavadeeprasit
35587cbdc0
logger changes
2026-02-27 16:17:03 -05:00
dmahan93
1bc4b8a680
Merge pull request #400 from prestoalvarez/patch-1
...
docs: fix typo
2026-02-27 14:47:04 -06:00
pre-commit-ci[bot]
64d3ee1bd6
[pre-commit.ci] auto fixes from pre-commit.com hooks
...
for more information, see https://pre-commit.ci
2026-02-27 18:16:06 +00:00
Jai Suphavadeeprasit
836c346406
narrow down scope further
2026-02-27 13:15:23 -05:00
Jai Suphavadeeprasit
f343b24a6a
narrow down scope
2026-02-27 11:14:42 -05:00
Alvarez
d762c229e2
Update instructions.py
2026-02-27 10:23:47 +01:00
dmahan93
7ceed9b6d9
Merge pull request #388 from milord12345/fix/replace-print-with-logger-reasoning-gym
...
refactor: replace print statements with self.logger in reasoning_gym_environment.py
2026-02-24 14:24:12 -06:00
dmahan93
7a3b619190
Merge pull request #392 from Ocheretovich/main
...
fix: pass num_steps to register_to_api
2026-02-24 14:23:06 -06:00
Jai Suphavadeeprasit
e8d0e74877
gsm8k cleanup
2026-02-24 12:16:00 -05:00
Ocheretovich Oksana
aec5552db6
fix: pass num_steps to register_to_api
...
Signed-off-by: Ocheretovich Oksana <ocheretovich@gmail.com>
2026-02-24 11:22:18 +02:00
dmahan93
329a233bba
Merge pull request #389 from CreeptoGengar/fix/validate-without-train
...
fix: handle validation without training
2026-02-23 14:21:40 -06:00
dmahan93
e4974561bf
Merge pull request #390 from VolodymyrBg/fix/blackjack-env-resource-leak
...
fix: add try/finally to guarantee gym environment cleanup
2026-02-23 14:20:57 -06:00
dmahan93
67514d1f51
Merge pull request #391 from NousResearch/pre-commit-ci-update-config
...
[pre-commit.ci] pre-commit autoupdate
2026-02-23 12:57:12 -06:00
pre-commit-ci[bot]
186b86151c
[pre-commit.ci] pre-commit autoupdate
...
updates:
- [github.com/astral-sh/ruff-pre-commit: v0.15.1 → v0.15.2](https://github.com/astral-sh/ruff-pre-commit/compare/v0.15.1...v0.15.2)
2026-02-23 16:42:41 +00:00
pre-commit-ci[bot]
a930d3db12
[pre-commit.ci] auto fixes from pre-commit.com hooks
...
for more information, see https://pre-commit.ci
2026-02-21 19:25:14 +00:00
VolodymyrBg
7e5ddbce06
fix: add try/finally to guarantee gym environment cleanup
2026-02-21 21:23:46 +02:00
pre-commit-ci[bot]
929980185d
[pre-commit.ci] auto fixes from pre-commit.com hooks
...
for more information, see https://pre-commit.ci
2026-02-21 13:54:38 +00:00
Gengar
34c8c87f0f
fix: handle validation without training
...
Added validation functionality to the training process and refactored validation method to use a dedicated validator instance.
2026-02-21 15:53:37 +02:00
Jai Suphavadeeprasit
e5297148f9
dynamic system prompt fixed
2026-02-20 14:50:43 -05:00
Jai Suphavadeeprasit
fc248dd65b
clean
2026-02-20 12:01:50 -05:00
pre-commit-ci[bot]
623dadc5cd
[pre-commit.ci] auto fixes from pre-commit.com hooks
...
for more information, see https://pre-commit.ci
2026-02-20 16:43:18 +00:00
milord1234
853703ffc5
refactor: replace print statements with self.logger in reasoning_gym_environment.py
...
Replace 20 print() calls with appropriate logging levels:
- Error messages -> self.logger.error()
- Warnings -> self.logger.warning()
- Info/status messages -> self.logger.info()
- Debug messages -> self.logger.debug()
Left 2 top-level print() calls untouched (no logger access).
2026-02-20 19:57:43 +03:30
Jai Suphavadeeprasit
63007d1209
dynamic system prompts
2026-02-20 03:16:27 -05:00
Jai Suphavadeeprasit
55f7cbd091
dynamic system prompts
2026-02-20 03:14:05 -05:00
Jai Suphavadeeprasit
e615eb1f50
assertions
2026-02-20 02:16:49 -05:00
Jai Suphavadeeprasit
559d649a26
proper fallback
2026-02-20 01:45:41 -05:00
Jai Suphavadeeprasit
3910a58f9b
refactor base
2026-02-20 01:45:41 -05:00
Jai Suphavadeeprasit
1c90fc71b0
on policy clean up
2026-02-20 01:45:41 -05:00
Jai Suphavadeeprasit
79e392c446
post merge changes
2026-02-20 01:45:41 -05:00
Jai Suphavadeeprasit
c89854a350
debug changes
2026-02-20 01:45:41 -05:00
Jai Suphavadeeprasit
0510ca9b72
found bug
2026-02-20 01:45:41 -05:00
Jai Suphavadeeprasit
fb23014dcc
base env debugging
2026-02-20 01:45:41 -05:00
Jai Suphavadeeprasit
ea2b388435
base env debugging
2026-02-20 01:45:41 -05:00
Jai Suphavadeeprasit
e814007575
base env debugging
2026-02-20 01:45:41 -05:00
Jai Suphavadeeprasit
b492ac4fce
on policy changes
2026-02-20 01:45:41 -05:00
Jai Suphavadeeprasit
3fdaff9bb4
Fix math_server_zero.py to support CLI OpenAI arguments
...
Change ServerBaseline to APIServerConfig in config_init() so that
--openai.base_url and other CLI arguments work for on-policy distillation.
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-20 01:45:41 -05:00
Jai Suphavadeeprasit
6bc962c746
initial commit
2026-02-20 01:45:41 -05:00
dmahan93
708b42a00f
Merge pull request #378 from johnh4098/add-regex-generation-env
...
Add regex generation environment for community
2026-02-18 12:37:32 -08:00
dmahan93
e2abc5e1a0
Merge pull request #377 from victlop/cleanup/remove-redundant-import-comments
...
chore: remove redundant inline comments from swe_rl_env.py imports
2026-02-18 12:35:22 -08:00
dmahan93
169bf92845
Merge pull request #382 from NousResearch/pre-commit-ci-update-config
...
[pre-commit.ci] pre-commit autoupdate
2026-02-18 12:26:54 -08:00
pre-commit-ci[bot]
5dd52af0ef
[pre-commit.ci] pre-commit autoupdate
...
updates:
- [github.com/astral-sh/ruff-pre-commit: v0.15.0 → v0.15.1](https://github.com/astral-sh/ruff-pre-commit/compare/v0.15.0...v0.15.1)
2026-02-16 16:40:51 +00:00
pre-commit-ci[bot]
53a69d30e1
[pre-commit.ci] auto fixes from pre-commit.com hooks
...
for more information, see https://pre-commit.ci
2026-02-11 19:47:28 +00:00
johnh4098
86d5163316
Add regex generation environment for community
2026-02-11 23:04:47 +03:30
victlop
4c18b883c5
Merge branch 'main' into cleanup/remove-redundant-import-comments
2026-02-11 19:14:59 +03:30
victlop
a1823f99fb
chore: remove redundant inline comments from swe_rl_env.py imports
2026-02-11 19:08:47 +03:30
dmahan93
81b2d4daab
Merge pull request #375 from NousResearch/pre-commit-ci-update-config
...
[pre-commit.ci] pre-commit autoupdate
2026-02-09 21:09:44 -08:00
dmahan93
9ffd4de275
Merge pull request #362 from ansulx/fix/trl-vllm-completion-test
...
Add regression test for TRL vLLM completion wrapper
2026-02-09 21:06:12 -08:00
dmahan93
1580ab5934
Merge pull request #365 from alireza78a/fix/replace-debug-prints-with-logger
...
fix: replace debug print statements with logger
2026-02-09 21:01:38 -08:00