teknium | 47c396c43f | switch to chat completions endpoint to eval closed lab stuff | 2025-07-15 11:39:29 +00:00 |
teknium | 982645ce73 | Implement proper ties category scoring | 2025-07-15 11:16:15 +00:00 |
pre-commit-ci[bot] | 41c847ddf4 | [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci | 2025-07-14 09:43:20 +00:00 |
teknium | ef04098718 | glitch | 2025-07-14 09:42:44 +00:00 |
teknium | 51d4d52765 | Merge branch 'add-pairwise-judge-environment' of https://github.com/NousResearch/atropos into add-pairwise-judge-environment | 2025-07-14 09:42:21 +00:00 |
teknium | 9607880f3d | Lots of updates to the environment to cleanup, add more metrics, make more robust - ties has an issue though | 2025-07-14 09:39:00 +00:00 |
pre-commit-ci[bot] | 107809260d | [pre-commit.ci] auto fixes from pre-commit.com hooks; for more information, see https://pre-commit.ci | 2025-07-12 11:23:13 +00:00 |
teknium | e83d796c74 | add pairwise judgement environment | 2025-07-12 11:15:56 +00:00 |