BLEUBERI/eval at 70a66e4a167577167ff796678d8a264265d4f3cd

lilakk/BLEUBERI

mirror of https://github.com/lilakk/BLEUBERI.git synced 2026-04-19 12:58:12 +00:00

yapei 70a66e4a16 initial commit		2025-06-04 20:36:43 +00:00
arena-hard	initial commit	2025-06-04 20:36:43 +00:00
arena-hard-v2.0	initial commit	2025-06-04 20:36:43 +00:00
FastChat	initial commit	2025-06-04 20:36:43 +00:00
WildBench	initial commit	2025-06-04 20:36:43 +00:00
README.md	initial commit	2025-06-04 20:36:43 +00:00
run_all_evals.sh	initial commit	2025-06-04 20:36:43 +00:00
show_eval_results.sh	initial commit	2025-06-04 20:36:43 +00:00

README.md

To display benchmark results for models reported in the paper, run show_eval_results.sh.

To run a model on all benchmarks, see run_all_evals.sh.