BLEUBERI/arena_analysis/README.md
2025-06-04 20:36:43 +00:00

Human agreement results are in [`arena_1k_final_results/arena_1k_new_filtered_aggregate.json`](./arena_1k_final_results/arena_1k_new_filtered_aggregate.json). They were obtained by running [`run_arena_analysis.py`](./run_arena_analysis.py).
To compute the human agreement of the combined BLEU+RM metrics, run [`compute_bleu_plus_rm.py`](./compute_bleu_plus_rm.py).
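
To take a quick look at the aggregate results file, a minimal sketch like the one below may help. The schema of the JSON is not documented here, so the helper `describe_top_level` (a hypothetical name, not part of this repo) only lists the top-level entries and their types and assumes no particular keys. It also assumes you run it from the `arena_analysis` directory.

```python
import json
from pathlib import Path

# Path taken from this README; assumes the current directory is arena_analysis.
RESULTS_PATH = Path("arena_1k_final_results/arena_1k_new_filtered_aggregate.json")


def describe_top_level(path):
    """Return (name, type name) pairs for the top-level entries of a JSON file.

    Hypothetical helper: it reports structure only and does not assume
    any particular keys in the results file.
    """
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    if isinstance(data, dict):
        return [(key, type(value).__name__) for key, value in data.items()]
    if isinstance(data, list):
        # Only the first few items, to keep the output short.
        return [(f"[{i}]", type(item).__name__) for i, item in enumerate(data[:5])]
    return [("<root>", type(data).__name__)]


if __name__ == "__main__":
    if RESULTS_PATH.exists():
        for name, kind in describe_top_level(RESULTS_PATH):
            print(f"{name}: {kind}")
    else:
        print(f"{RESULTS_PATH} not found; run this from the arena_analysis directory")
```

Once the top-level layout is known, the individual agreement values can be read with ordinary dictionary or list indexing.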