BLEUBERI/arena_analysis/README.md
2025-06-04 20:36:43 +00:00

Human agreement results are in arena_1k_final_results/arena_1k_new_filtered_aggregate.json. They were obtained by running run_arena_analysis.py.
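
As a quick way to inspect the results file, the following is a minimal sketch that loads it and reports its top-level structure. The file's schema is not documented here, so the code makes no assumptions about field names; the helper names `load_results` and `summarize` are hypothetical and not part of this repository.

```python
import json
from pathlib import Path

# Path taken from this README; adjust if you run from another directory.
RESULTS_PATH = Path("arena_1k_final_results/arena_1k_new_filtered_aggregate.json")


def load_results(path=RESULTS_PATH):
    """Parse the aggregate results file and return the decoded JSON value."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def summarize(results):
    """Describe the top level of the data without assuming a schema.

    Returns the sorted keys for a JSON object, the item count for a
    JSON array, and the Python type name for anything else.
    """
    if isinstance(results, dict):
        return sorted(results.keys())
    if isinstance(results, list):
        return len(results)
    return type(results).__name__


if __name__ == "__main__":
    print(summarize(load_results()))
```

Running this from the arena_analysis directory prints either the top-level keys or the number of entries, which is usually enough to decide how to explore the file further.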

To compute the human agreement of the BLEU+RM combined metrics, run compute_bleu_plus_rm.py.
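
For readers unfamiliar with the idea, here is a rough sketch of what human agreement for a combined metric could look like. It is not the logic of compute_bleu_plus_rm.py. It assumes that agreement means the fraction of pairwise comparisons in which the higher-scoring response is the one the human preferred, and that BLEU and RM scores are combined by a weighted sum. The record fields (`bleu_a`, `rm_a`, `bleu_b`, `rm_b`, `human`) and the `weight` parameter are hypothetical.

```python
def combined_score(bleu, rm, weight=0.5):
    """Weighted sum of a BLEU score and a reward-model score (assumed combination)."""
    return weight * bleu + (1.0 - weight) * rm


def human_agreement(pairs, weight=0.5):
    """Fraction of decided pairs where the combined metric picks the human's choice.

    Each pair is a dict with hypothetical keys: bleu_a, rm_a, bleu_b, rm_b,
    and human ("a" or "b"). Pairs where the combined scores tie are skipped.
    """
    agree = 0
    decided = 0
    for p in pairs:
        score_a = combined_score(p["bleu_a"], p["rm_a"], weight)
        score_b = combined_score(p["bleu_b"], p["rm_b"], weight)
        if score_a == score_b:
            continue  # the metric expresses no preference
        decided += 1
        metric_choice = "a" if score_a > score_b else "b"
        if metric_choice == p["human"]:
            agree += 1
    return agree / decided if decided else 0.0


if __name__ == "__main__":
    # Toy data for illustration only; not taken from the arena results.
    toy_pairs = [
        {"bleu_a": 0.8, "rm_a": 0.2, "bleu_b": 0.1, "rm_b": 0.3, "human": "a"},
        {"bleu_a": 0.1, "rm_a": 0.1, "bleu_b": 0.5, "rm_b": 0.5, "human": "a"},
    ]
    print(human_agreement(toy_pairs))
```

In practice the two scores may sit on different scales, so some normalization before combining them would likely be needed; the script in this directory remains the reference for how the reported numbers are computed.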