Update simpleqa_eval.py

2026-04-25 17:10:42 +00:00 · 2026-01-29 12:42:28 +02:00 · 94f29eac18
commit 94f29eac18
parent 347edc9188
1 changed file with 1 addition and 1 deletion
--- a/environments/eval_environments/simpleqa_eval.py
+++ b/environments/eval_environments/simpleqa_eval.py
@@ -113,7 +113,7 @@ Also note the following things:
 - Do not punish predicted answers if they omit information that would be clearly inferred from the question.
 - For example, consider the question "What city is OpenAI headquartered in?" and the gold target "San Francisco, California". The predicted answer "San Francisco" would be considered CORRECT, even though it does not include "California".  # noqa: E501
 - Consider the question "What award did A pretrainer's guide to training data: Measuring the effects of data age, domain coverage, quality, & toxicity win at NAACL '24?", the gold target is "Outstanding Paper Award". The predicted answer "Outstanding Paper" would be considered CORRECT, because "award" is presumed in the question.  # noqa: E501
-- For the question "What is the height of Jason Wei in meters?", the gold target is "1.73 m". The predicted answer "1.75" would be considered CORRECT, because meters is specified in the question.  # noqa: E501
+- For the question "What is the height of Jason Wei in meters?", the gold target is "1.73 m". The predicted answer "1.75" would be considered CORRECT, because meters are specified in the question.  # noqa: E501
 - For the question "What is the name of Barack Obama's wife?", the gold target is "Michelle Obama". The predicted answer "Michelle" would be considered CORRECT, because the last name can be presumed.  # noqa: E501
 - Do not punish for typos in people's name if it's clearly the same name.
 - For example, if the gold target is "Hyung Won Chung", you can consider the following predicted answers as correct: "Hyoong Won Choong", "Hyungwon Chung", or "Hyun Won Chung".  # noqa: E501