Commit graph

91 commits

Author SHA1 Message Date
pre-commit-ci[bot]
5cfd1929f1 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-03-02 11:18:52 -05:00
Jai Suphavadeeprasit
119721ef3d evals errors 2026-03-02 11:18:52 -05:00
Jai Suphavadeeprasit
fb1d983757 evals errors 2026-03-02 11:18:52 -05:00
Jai Suphavadeeprasit
00801646d7 evals erros 2026-03-02 11:18:52 -05:00
Jai Suphavadeeprasit
dedb399911 evals 2026-03-02 11:18:52 -05:00
Jai Suphavadeeprasit
f78c821b8b evals 2026-03-02 11:18:52 -05:00
Jai Suphavadeeprasit
58a3fb8b14 pipelineRL 2026-03-02 11:18:52 -05:00
Alvarez
d762c229e2
Update instructions.py 2026-02-27 10:23:47 +01:00
Teknium
462abbebf7
Merge pull request #339 from VolodymyrBg/bg
chore: fix typos
2026-01-31 09:03:17 -08:00
Teknium
8b22416dd4
Merge branch 'main' into fix-duplicate-code 2026-01-31 08:52:43 -08:00
VolodymyrBg
f285bbd417
Update refusalbench_environment.py 2026-01-29 12:43:15 +02:00
VolodymyrBg
94f29eac18
Update simpleqa_eval.py 2026-01-29 12:42:28 +02:00
VolodymyrBg
347edc9188
Update instructions.py 2026-01-29 12:31:52 +02:00
dmahan93
cf2b280d52
Merge pull request #325 from crStiv/typo
fix: multiple typos of different importance
2026-01-26 11:00:44 -08:00
Wind
42601e2325
Update instructions_utils.py 2026-01-26 17:24:12 +07:00
Wind
7feb826fed
Update instructions_registry.py 2026-01-26 17:23:39 +07:00
Wind
883043de49
Update instructions.py 2026-01-26 17:14:57 +07:00
balyan.sid@gmail.com
4ba69d3a80 revert to using evalbase 2026-01-23 23:41:32 +05:30
balyan.sid@gmail.com
5a20abdce7 switch eval to use managed server adapter impl. moved managed server adapter 2026-01-23 23:26:29 +05:30
crStiv
ee97038408
Fix typos in instruction description methods
Corrected typos in the docstring for build_description and another function.
2026-01-19 23:58:55 +02:00
Siddharth Balyan
7f28c52994
Merge branch 'main' into sid/verifiers 2026-01-16 11:50:27 +05:30
balyan.sid@gmail.com
c56af35eaa switch to evalbase for verifiers_eval.py 2026-01-15 11:34:40 +05:30
teknium
00a0f5397a Merge branch 'add_reasoning_handling_draft' of https://github.com/NousResearch/atropos into add_reasoning_handling_draft 2026-01-14 13:38:08 +00:00
teknium
3a854cc3af fix linter 2026-01-14 13:38:04 +00:00
pre-commit-ci[bot]
79a55ff186 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-01-13 07:30:33 +00:00
teknium
2a7dd49328 Merge branch 'add_reasoning_handling_draft' of https://github.com/NousResearch/atropos into add_reasoning_handling_draft 2026-01-13 07:29:48 +00:00
teknium
b33cb7f943 A bit more updates for robustness 2026-01-13 07:29:43 +00:00
Teknium
837fc237ee
Merge branch 'main' into add_reasoning_handling_draft 2026-01-12 09:45:38 -08:00
pre-commit-ci[bot]
7907ffd0ad [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-01-12 05:05:11 +00:00
balyan.sid@gmail.com
24b4488c60 clean up eval, pin verifiers version 2026-01-12 10:34:05 +05:30
pre-commit-ci[bot]
d98bc6d9fc [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-01-12 10:34:05 +05:30
balyan.sid@gmail.com
cf636595d2 rework server and eval for rl rollout. add in asyncmanagedserver for verifiers 2026-01-12 10:34:05 +05:30
pre-commit-ci[bot]
3449a4c23d [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2026-01-12 10:34:05 +05:30
balyan.sid@gmail.com
5b09ad86f4 update readme, add sft-datagen to verifiers_server 2026-01-09 19:20:41 +05:30
balyan.sid@gmail.com
636715bb08 add wandb to eval 2026-01-09 16:51:19 +05:30
balyan.sid@gmail.com
dda85430da fix docstrings 2026-01-09 16:25:44 +05:30
balyan.sid@gmail.com
9d5cd2b593 fix: improve verifiers environments consistency and correctness
- verifiers_server.py: consistent dataset column selection for train/test,
  remove redundant comments, preserve float precision for scores
- verifiers_eval.py: add env_config_cls, fix constructor signature to match
  BaseEnv (slurm bool), make stub methods raise NotImplementedError
2026-01-09 16:21:12 +05:30
Teknium
11ebecd93f
Merge branch 'main' into add-eval-runner 2026-01-05 15:46:39 -08:00
teknium
cb6bf37e68 update name of eval example 2026-01-05 23:46:27 +00:00
teknium
747fbc9285 fix linting 2025-12-30 11:56:21 +00:00
teknium
62fa51240c Add support for reasoning models and their variety of providers/endpoints 2025-12-30 00:23:00 +00:00
pre-commit-ci[bot]
f7fe9d612b [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-12-28 12:32:56 +00:00
teknium
b912983e5e Merge branch 'port_many_evals' of https://github.com/NousResearch/atropos into port_many_evals 2025-12-28 12:32:14 +00:00
teknium
c3f7c8dea6 final 2025-12-28 12:32:12 +00:00
pre-commit-ci[bot]
55e50f5782 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-12-28 12:29:37 +00:00
teknium
b975a315fe linters 2025-12-28 12:28:52 +00:00
pre-commit-ci[bot]
1d4275d441 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-12-28 04:12:17 +00:00
teknium
ea6db6fe92 Merge branch 'port_many_evals' of https://github.com/NousResearch/atropos into port_many_evals 2025-12-28 04:11:32 +00:00
teknium
bcfbd647e3 fix some bugs 2025-12-28 04:09:34 +00:00
pre-commit-ci[bot]
52110f3fb4 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-12-28 01:45:06 +00:00