Mirror of https://github.com/open-thought/reasoning-gym.git (synced 2026-04-27 17:23:19 +00:00)
update training dir with external eval details (#437)
* added games
* added llama 3b training conf
* update readme with details of external evals
* readme update

---------

Co-authored-by: joesharratt1229 <joesharratt1229@gmail.com>
parent 5961a10145
commit add527ada1
5 changed files with 374 additions and 0 deletions
training/evaluations/lmeh/llama_math_algebra.yaml (normal file, 26 lines added)
@@ -0,0 +1,26 @@
task: llama_math_algebra
dataset_path: EleutherAI/hendrycks_math
process_docs: !function utils.process_docs
dataset_name: algebra
output_type: generate_until
training_split: train
test_split: test
doc_to_text: "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\nYou are a helpful AI Assistant that provides well-reasoned and detailed responses.\nYou first think about the reasoning process as an internal monologue and then provide the user with the answer.\nRespond in the following format:\n<think>\n...\n</think>\n<answer>\n...\n</answer><|eot_id|><|start_header_id|>user<|end_header_id|>\n\nSolve the following math problem efficiently and clearly:\n\n- For simple problems (2 steps or fewer):\nProvide a concise solution with minimal explanation.\n\n- For complex problems (3 steps or more):\nUse this step-by-step format:\n\n## Step 1: [Concise description]\n[Brief explanation and calculations]\n\n## Step 2: [Concise description]\n[Brief explanation and calculations]\n\n...\n\nRegardless of the approach, always conclude with:\n\nTherefore, the final answer is: $\\\\boxed{answer}$. I hope it is correct.\n\nWhere [answer] is just the final number or expression that solves the problem.\n\nProblem: {{ problem }}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
process_results: !function utils.process_results
doc_to_target: "{{answer if few_shot is undefined else solution}}"
generation_kwargs:
  until:
    - "Problem:"
    - "</answer>"
  max_gen_toks: 4096
  do_sample: false
  temperature: 0
metric_list:
  - metric: exact_match
    aggregation: mean
    higher_is_better: true
num_fewshot: 0
metadata:
  version: 1.0
dataset_kwargs:
  trust_remote_code: true
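
Note on the `!function` hooks: `utils.process_docs` and `utils.process_results` are referenced by the config but are not shown in this diff. The sketch below is only an illustration of how a results hook could score the `exact_match` metric, assuming the model follows the prompt and ends with `$\boxed{answer}$`, and assuming each processed doc carries an `answer` field (as `doc_to_target` suggests). The helper `extract_last_boxed` and the plain string comparison are hypothetical and are not the repository's implementation.

```python
from typing import Optional


def extract_last_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in text, or None if absent."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth = 1  # we are inside the opening brace of \boxed{
    chars = []
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars).strip()
        chars.append(ch)
    return None  # braces never closed


def process_results(doc: dict, results: list) -> dict:
    """Hypothetical scoring hook: 1 if the boxed answer equals doc['answer'], else 0."""
    predicted = extract_last_boxed(results[0])
    target = doc["answer"].strip()
    return {"exact_match": int(predicted is not None and predicted == target)}
```

A real hook would likely normalize equivalent expressions (for example `0.5` versus `\frac{1}{2}`) before comparing; the strict string match here keeps the sketch short.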