| Author | Commit | Message | Date |
|---|---|---|---|
| Andreas Koepf | 477e1f85cc | verify that OPENROUTER_API_KEY env var is set | 2025-02-26 22:15:30 +01:00 |
| vncntt | 29179f783e | fix sonnet eval_dir (#216); fix eval_dir; add logging | 2025-02-26 09:37:09 +01:00 |
| Andreas Koepf | 6d5168d1e5 | add llama-3.3-70b-instruct algebra, algorithmic eval configs | 2025-02-25 23:43:29 +01:00 |
| joesharratt1229 | 56cc111ab3 | Merge remote-tracking branch 'origin/consolidate_eval_script' into fix/eval | 2025-02-25 18:10:07 +00:00 |
| joesharratt1229 | 046c46c0bb | updated read me | 2025-02-25 15:46:43 +00:00 |
| Andreas Koepf | 878f9bbc76 | move r1 configs into r1 yaml/r1 subfolder | 2025-02-25 16:24:30 +01:00 |
| Andreas Koepf | e7ae82a831 | consolidate eval scripts to have single eval.py | 2025-02-25 16:13:22 +01:00 |
| Andreas Köpf | 2947038557 | Merge pull request #182 from zafstojano/env/binary-alternation; feat(env): Binary Alternation | 2025-02-21 17:27:16 +01:00 |
| Andreas Koepf | 3e7ff3b084 | use native types List->list, Dict->dict, Set->set, Tuple->tuple | 2025-02-21 15:15:38 +01:00 |
| Zafir Stojanovski | 77789257d3 | include pre-parsed responses in json | 2025-02-21 13:50:48 +01:00 |
| Zafir Stojanovski | 3d84816f95 | system prompt for structured output, and parse such outputs | 2025-02-12 10:44:42 +01:00 |
| rishabhranawat | 9e4870125d | [eval-v1] pre commit formatting | 2025-02-10 21:50:22 -08:00 |
| rishabhranawat | df5438498e | [eval-v1] add timer | 2025-02-10 21:48:44 -08:00 |
| rishabhranawat | 247464a47d | [eval-v1] async to speed up inference/evaluation | 2025-02-10 21:35:46 -08:00 |
| rishabhranawat | 0657222a8f | [eval-basic] remove large results files, add gitignore, only leave summary | 2025-02-09 22:52:10 -08:00 |