BLEUBERI/arena_analysis
2025-06-04 20:36:43 +00:00
arena_1k_final
arena_1k_final_results
ref_outputs
compute_bleu_plus_rm.py
README.md
run_arena_analysis.py

Human agreement results are in arena_1k_final_results/arena_1k_new_filtered_aggregate.json. They were obtained by running run_arena_analysis.py.
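The layout of the aggregate file is not described here, so a quick way to get oriented is to load it and look at its top-level structure. The sketch below assumes only that the file is valid JSON; the helper name describe_json is hypothetical and not part of this repository.

```python
import json
from pathlib import Path

# Path as given in this README; adjust it if you run from another directory.
RESULTS_PATH = Path("arena_1k_final_results/arena_1k_new_filtered_aggregate.json")


def describe_json(path):
    """Return a shallow summary of a JSON file (hypothetical helper).

    Reports only the top-level type plus its keys or length, because the
    actual schema of the results file is not documented here.
    """
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    if isinstance(data, dict):
        return {"type": "dict", "keys": sorted(data.keys())}
    if isinstance(data, list):
        return {"type": "list", "length": len(data)}
    return {"type": type(data).__name__}


if __name__ == "__main__":
    if RESULTS_PATH.exists():
        print(describe_json(RESULTS_PATH))
    else:
        print(f"{RESULTS_PATH} not found; run from the arena_analysis directory.")
```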

To compute the human agreement of the BLEU+RM combined metrics, run compute_bleu_plus_rm.py.
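For readers unfamiliar with the term, human agreement for a scoring metric is commonly measured as the fraction of response pairs on which the metric prefers the same response as the human annotator. The sketch below illustrates that idea only. It is not the code in compute_bleu_plus_rm.py: the function names, the treatment of ties as disagreement, and the weighted-sum combination of BLEU and reward-model scores are all assumptions made for illustration, and the script may combine the two signals differently.

```python
def combine_scores(bleu_scores, rm_scores, weight=0.5):
    """Hypothetical BLEU+RM combination: a weighted sum per response.

    Assumption: both score lists are already on comparable scales.
    """
    if len(bleu_scores) != len(rm_scores):
        raise ValueError("score lists must have the same length")
    return [weight * b + (1.0 - weight) * r for b, r in zip(bleu_scores, rm_scores)]


def human_agreement(human_prefs, scores_a, scores_b):
    """Fraction of pairs where the metric picks the human-preferred response.

    human_prefs: list of "a" or "b", the response the human preferred.
    scores_a, scores_b: metric scores for response A and response B.
    Assumption: a metric tie counts as disagreement.
    """
    if not (len(human_prefs) == len(scores_a) == len(scores_b)):
        raise ValueError("inputs must have the same length")
    if not human_prefs:
        raise ValueError("need at least one comparison")
    agree = 0
    for pref, score_a, score_b in zip(human_prefs, scores_a, scores_b):
        if score_a == score_b:
            continue  # tie: counted as disagreement
        metric_pref = "a" if score_a > score_b else "b"
        if metric_pref == pref:
            agree += 1
    return agree / len(human_prefs)


if __name__ == "__main__":
    # Toy numbers for illustration only, not results from this repository.
    prefs = ["a", "b", "a"]
    combined_a = combine_scores([0.4, 0.1, 0.2], [0.9, 0.3, 0.1])
    combined_b = combine_scores([0.2, 0.5, 0.3], [0.1, 0.8, 0.6])
    print(human_agreement(prefs, combined_a, combined_b))
```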