Commit graph

784 commits

Author SHA1 Message Date
dmahan93
8284a0b95c Update pyproject.toml 2025-07-16 15:38:16 -05:00
dmahan93
b65bc1b4ba version update 2025-07-16 15:38:04 -05:00
Teknium
62cee8ac66 Merge pull request #209 from NousResearch/add-pairwise-judge-environment
Add LLM as a judge environment for eval and train based on RewardBench
2025-07-16 13:37:09 -07:00
pre-commit-ci[bot]
6455c305e6 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-07-16 17:51:15 +00:00
teknium
542185bbcc Merge branch 'add-pairwise-judge-environment' of https://github.com/NousResearch/atropos into add-pairwise-judge-environment 2025-07-16 17:48:44 +00:00
teknium
a43520e619 one last linter... 2025-07-16 17:48:43 +00:00
pre-commit-ci[bot]
eab2c938ea [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-07-16 16:58:42 +00:00
teknium
18f228615d linter stuff 2025-07-16 16:57:51 +00:00
pre-commit-ci[bot]
ffc210e470 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-07-16 16:51:19 +00:00
teknium
2f37714e84 Merge branch 'add-pairwise-judge-environment' of https://github.com/NousResearch/atropos into add-pairwise-judge-environment 2025-07-16 16:50:04 +00:00
teknium
0113dc906b add a bunch of extra debugging traces - configurable 2025-07-16 16:49:42 +00:00
Teknium
145762f64a Merge pull request #212 from sky-coderay/main
fix: correct quantum environment repository URL
2025-07-16 02:45:34 -07:00
Skylar Ray
e889324171 fix: correct quantum environment repository URL 2025-07-16 11:00:45 +03:00
pre-commit-ci[bot]
1af508b27f [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-07-16 07:46:52 +00:00
teknium
10bb22f557 adding debugging 2025-07-16 07:46:17 +00:00
pre-commit-ci[bot]
7d980372d3 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-07-15 18:40:26 +00:00
teknium
02ad3e8661 Merge branch 'add-pairwise-judge-environment' of https://github.com/NousResearch/atropos into add-pairwise-judge-environment 2025-07-15 18:39:52 +00:00
teknium
8aa540275b add to the envs readme 2025-07-15 18:39:50 +00:00
pre-commit-ci[bot]
9f3e2ee460 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-07-15 18:24:13 +00:00
teknium
856a8455b1 please the precommit gods 2025-07-15 18:20:44 +00:00
dmahan93
b1914cdf93 Merge pull request #211 from Myashka/main
Correct completion length calculation in BaseEnv
2025-07-15 10:07:38 -05:00
pre-commit-ci[bot]
3d2d9e67fa [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-07-15 11:42:46 +00:00
pre-commit-ci[bot]
c053a9f134 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-07-15 11:40:22 +00:00
teknium
ce1f72059c Merge branch 'add-pairwise-judge-environment' of https://github.com/NousResearch/atropos into add-pairwise-judge-environment 2025-07-15 11:39:46 +00:00
teknium
47c396c43f switch to chat completions endpoint to eval closed lab stuff 2025-07-15 11:39:29 +00:00
Alexey Gorbatovski
53984580c8 Bug fix 2025-07-15 14:37:55 +03:00
pre-commit-ci[bot]
818ec9d7c1 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-07-15 11:18:03 +00:00
teknium
982645ce73 Implement proper ties category scoring 2025-07-15 11:16:15 +00:00
dmahan93
9dbef4e552 Merge pull request #210 from NousResearch/pre-commit-ci-update-config
[pre-commit.ci] pre-commit autoupdate
2025-07-14 17:59:05 -05:00
pre-commit-ci[bot]
110066a700 [pre-commit.ci] pre-commit autoupdate
updates:
- [github.com/astral-sh/ruff-pre-commit: v0.12.2 → v0.12.3](https://github.com/astral-sh/ruff-pre-commit/compare/v0.12.2...v0.12.3)
2025-07-14 16:36:32 +00:00
pre-commit-ci[bot]
41c847ddf4 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-07-14 09:43:20 +00:00
teknium
ef04098718 glitch 2025-07-14 09:42:44 +00:00
teknium
51d4d52765 Merge branch 'add-pairwise-judge-environment' of https://github.com/NousResearch/atropos into add-pairwise-judge-environment 2025-07-14 09:42:21 +00:00
teknium
9607880f3d Lots of updates to the environment to cleanup, add more metrics, make more robust - ties has an issue though 2025-07-14 09:39:00 +00:00
hjc-puro
f5f7a77608 Merge branch 'add-pairwise-judge-environment' of github.com:NousResearch/atropos into add-pairwise-judge-environment 2025-07-12 22:51:54 +00:00
hjc-puro
04e69d4a19 appease precommit 2025-07-12 22:51:39 +00:00
pre-commit-ci[bot]
0e743f7921 [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-07-12 22:41:48 +00:00
hjc-puro
a94e4c9bf0 autoscale metrics table 2025-07-12 22:41:14 +00:00
pre-commit-ci[bot]
107809260d [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci
2025-07-12 11:23:13 +00:00
teknium
e83d796c74 add pairwise judgement environment 2025-07-12 11:15:56 +00:00
hjc-puro
75a4264f8d Merge pull request #208 from NousResearch/2025-07-08-evals
Add `evaluate_log` method, gsm8k example
2025-07-12 06:45:05 +08:00
hjc-puro
6e9baaf9d8 table 2025-07-11 09:52:19 +00:00
hjc-puro
72210cf4ad rename fn 2025-07-11 04:04:55 +00:00
hjc-puro
352e1b8f88 comments 2025-07-11 03:55:16 +00:00
hjc-puro
d133ba3867 comment 2025-07-11 03:54:03 +00:00
hjc-puro
ccb8eaf230 move table to util 2025-07-11 03:52:24 +00:00
hjc-puro
5e61331360 simplify schema 2025-07-11 03:49:49 +00:00
hjc-puro
0d4ce37b73 add eval types 2025-07-11 03:36:55 +00:00
hjc-puro
290e087fc5 remove some imports 2025-07-11 03:25:10 +00:00
hjc-puro
68da3809e2 move table to display util 2025-07-11 02:06:56 +00:00