Jai Suphavadeeprasit
3910a58f9b
refactor base
2026-02-20 01:45:41 -05:00
Jai Suphavadeeprasit
1c90fc71b0
on policy clean up
2026-02-20 01:45:41 -05:00
Jai Suphavadeeprasit
79e392c446
post merge changes
2026-02-20 01:45:41 -05:00
Jai Suphavadeeprasit
c89854a350
debug changes
2026-02-20 01:45:41 -05:00
Jai Suphavadeeprasit
0510ca9b72
found bug
2026-02-20 01:45:41 -05:00
Jai Suphavadeeprasit
fb23014dcc
base env debugging
2026-02-20 01:45:41 -05:00
Jai Suphavadeeprasit
ea2b388435
base env debugging
2026-02-20 01:45:41 -05:00
Jai Suphavadeeprasit
e814007575
base env debugging
2026-02-20 01:45:41 -05:00
Jai Suphavadeeprasit
b492ac4fce
on policy changes
2026-02-20 01:45:41 -05:00
Jai Suphavadeeprasit
3fdaff9bb4
Fix math_server_zero.py to support CLI OpenAI arguments
Change ServerBaseline to APIServerConfig in config_init() so that
--openai.base_url and other CLI arguments work for on-policy distillation.
Co-authored-by: Cursor <cursoragent@cursor.com>
2026-02-20 01:45:41 -05:00
Jai Suphavadeeprasit
6bc962c746
initial commit
2026-02-20 01:45:41 -05:00
dmahan93
708b42a00f
Merge pull request #378 from johnh4098/add-regex-generation-env
Add regex generation environment for community
2026-02-18 12:37:32 -08:00
dmahan93
e2abc5e1a0
Merge pull request #377 from victlop/cleanup/remove-redundant-import-comments
chore: remove redundant inline comments from swe_rl_env.py imports
2026-02-18 12:35:22 -08:00
dmahan93
169bf92845
Merge pull request #382 from NousResearch/pre-commit-ci-update-config
[pre-commit.ci] pre-commit autoupdate
2026-02-18 12:26:54 -08:00
pre-commit-ci[bot]
5dd52af0ef
[pre-commit.ci] pre-commit autoupdate
updates:
- [github.com/astral-sh/ruff-pre-commit: v0.15.0 → v0.15.1](https://github.com/astral-sh/ruff-pre-commit/compare/v0.15.0...v0.15.1)
2026-02-16 16:40:51 +00:00
pre-commit-ci[bot]
53a69d30e1
[pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-02-11 19:47:28 +00:00
johnh4098
86d5163316
Add regex generation environment for community
2026-02-11 23:04:47 +03:30
victlop
4c18b883c5
Merge branch 'main' into cleanup/remove-redundant-import-comments
2026-02-11 19:14:59 +03:30
victlop
a1823f99fb
chore: remove redundant inline comments from swe_rl_env.py imports
2026-02-11 19:08:47 +03:30
dmahan93
81b2d4daab
Merge pull request #375 from NousResearch/pre-commit-ci-update-config
[pre-commit.ci] pre-commit autoupdate
2026-02-09 21:09:44 -08:00
dmahan93
9ffd4de275
Merge pull request #362 from ansulx/fix/trl-vllm-completion-test
Add regression test for TRL vLLM completion wrapper
2026-02-09 21:06:12 -08:00
dmahan93
1580ab5934
Merge pull request #365 from alireza78a/fix/replace-debug-prints-with-logger
fix: replace debug print statements with logger
2026-02-09 21:01:38 -08:00
dmahan93
31a1cd1a8e
Merge pull request #355 from Ridwannurudeen/docs/improve-setup-and-troubleshooting
[docs] Clarify prerequisites, fix Python version inconsistency, and add troubleshooting section
2026-02-09 20:58:49 -08:00
dmahan93
17015f5f96
Merge pull request #373 from NousResearch/add-tokenizer-config-to-servers
add tokenizer name config to set the vllm/sglang tokenizer
2026-02-09 20:47:51 -08:00
pre-commit-ci[bot]
41df2a3701
[pre-commit.ci] pre-commit autoupdate
updates:
- [github.com/astral-sh/ruff-pre-commit: v0.14.14 → v0.15.0](https://github.com/astral-sh/ruff-pre-commit/compare/v0.14.14...v0.15.0)
2026-02-09 23:25:15 +00:00
Dakota
7d6aeb9bbf
add tokenizer name config to set the vllm/sglang tokenizer to something different if needed
2026-02-09 15:26:29 -06:00
dmahan93
13f282aabc
Merge pull request #370 from alireza78a/fix/minor-bug-fixes
fix duplicate code + add safety checks
2026-02-09 13:10:28 -08:00
Alireza
6b92ee16ec
fix duplicate code + add safety checks
2026-02-09 10:58:49 +03:30
Ridwan Nurudeen
b03b09febc
Merge branch 'main' into docs/improve-setup-and-troubleshooting
2026-02-07 19:33:44 +01:00
alireza78a
1303cb59e8
fix: replace debug print statements with logger in dataset_env and infinimath_env
2026-02-07 14:51:33 +00:00
Ansul
3b9b67a3ad
Merge branch 'main' into fix/trl-vllm-completion-test
2026-02-06 02:13:29 +05:30
ansulx
d97f366ae0
Add regression test for TRL vLLM completion wrapper
Ensure the TRL vLLM completion wrapper returns a Completion with text so issue #183 stays covered.
2026-02-06 01:57:16 +05:30
dmahan93
7da681ec46
Merge pull request #359 from NousResearch/add-dummy-managed-server-for-openai
Add dummy openai managed server
2026-02-04 14:28:22 -08:00
Dakota
9ff24bf370
change to 128 tokens to support low length rejection
2026-02-04 16:23:30 -06:00
Dakota
10f651289c
Add dummy openai managed server
2026-02-04 15:16:36 -06:00
Ridwan Nurudeen
cc4b1f61a3
Revert badge change per reviewer request
2026-02-02 22:05:09 +01:00
Ridwannurudeen
5e2e84835b
[docs] Clarify prerequisites, fix Python version inconsistency, and add troubleshooting section
2026-02-01 23:39:37 +01:00
Teknium
462abbebf7
Merge pull request #339 from VolodymyrBg/bg
chore: fix typos
2026-01-31 09:03:17 -08:00
Teknium
efc85528bc
Merge pull request #338 from windlgrass/fix-init-current-item
fix: initialize current_item in __init__ to prevent AttributeError
2026-01-31 09:02:06 -08:00
Teknium
a2330dc099
Merge pull request #334 from HusseinAdeiza/fix-typos-docs
Fix typos in SLURM.md
2026-01-31 08:59:53 -08:00
Teknium
c2f0de563e
Merge branch 'main' into fix-typos-docs
2026-01-31 08:57:23 -08:00
Teknium
4bbea4ec8e
Merge pull request #330 from windlgrass/fix-duplicate-code
fix: remove duplicate code in instruction files
2026-01-31 08:55:26 -08:00
Teknium
8b22416dd4
Merge branch 'main' into fix-duplicate-code
2026-01-31 08:52:43 -08:00
VolodymyrBg
f285bbd417
Update refusalbench_environment.py
2026-01-29 12:43:15 +02:00
VolodymyrBg
94f29eac18
Update simpleqa_eval.py
2026-01-29 12:42:28 +02:00
VolodymyrBg
347edc9188
Update instructions.py
2026-01-29 12:31:52 +02:00
VolodymyrBg
466fd96b41
Update patient.py
2026-01-29 12:16:31 +02:00
VolodymyrBg
39f3509965
Update instruction_following_algorithm_environment.py
2026-01-29 11:22:05 +02:00
VolodymyrBg
1eb0d72099
Update FAQ.md
2026-01-29 10:43:47 +02:00
VolodymyrBg
e0744adf28
Update README.md
2026-01-29 10:23:53 +02:00