mirror of
https://github.com/lilakk/BLEUBERI.git
synced 2026-04-19 12:58:12 +00:00
Human agreement results are in arena_1k_final_results/arena_1k_new_filtered_aggregate.json. They were obtained by running run_arena_analysis.py.
To compute the human agreement of the BLEU+RM combined metrics, run compute_bleu_plus_rm.py.
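
To get a quick look at the aggregate results file before running anything else, a small inspection script along these lines may help. It is only a sketch: the JSON schema is not documented here, so the script assumes nothing about the keys and just prints the nested structure. The `summarize` helper is hypothetical and not part of this repository.

```python
import json
from pathlib import Path

# Path taken from the README; adjust it if the repository layout differs.
RESULTS_PATH = Path("arena_1k_final_results/arena_1k_new_filtered_aggregate.json")


def summarize(obj, depth=0, max_depth=2):
    """Return lines describing the shape of a parsed JSON value.

    Hypothetical helper: it makes no assumption about the file's schema.
    """
    indent = "  " * depth
    lines = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            lines.append(f"{indent}{key}: {type(value).__name__}")
            if depth < max_depth:
                lines.extend(summarize(value, depth + 1, max_depth))
    elif isinstance(obj, list):
        lines.append(f"{indent}[list of {len(obj)} items]")
        # Only the first element is inspected, as a representative sample.
        if obj and depth < max_depth:
            lines.extend(summarize(obj[0], depth + 1, max_depth))
    return lines


if __name__ == "__main__":
    if RESULTS_PATH.exists():
        with RESULTS_PATH.open(encoding="utf-8") as f:
            data = json.load(f)
        print("\n".join(summarize(data)))
    else:
        print(f"{RESULTS_PATH} not found; run run_arena_analysis.py first.")
```

Run it from the repository root so the relative path resolves.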