BLEUBERI/eval
2025-06-04 20:36:43 +00:00
- arena-hard
- arena-hard-v2.0
- FastChat
- WildBench
- README.md
- run_all_evals.sh
- show_eval_results.sh

To display the benchmark results for the models reported in the paper, run `show_eval_results.sh`.

To run a model on all benchmarks, see `run_all_evals.sh`.