mirror of
https://github.com/open-thought/reasoning-gym.git
synced 2026-04-19 12:58:07 +00:00
Add steps to synthesize CoTs with DeepSeekV3
parent 3297fc1bc0
commit 94cd3c4d43
1 changed file with 177 additions and 47 deletions
|
|
@ -15,7 +15,9 @@
|
||||||
"- a natural language description of all inputs (function parameters) and outputs (function return values)\n",
|
"- a natural language description of all inputs (function parameters) and outputs (function return values)\n",
|
||||||
"- an input generator, which can generate a dictionary of valid inputs for the function\n",
|
"- an input generator, which can generate a dictionary of valid inputs for the function\n",
|
||||||
"\n",
|
"\n",
|
||||||
"This notebook seeks to experiment with prompting an LLM to this end, as a starting point. The raw code data is from this GitHub repository that the DeepSeek paper mentions as one of their raw code sources: https://github.com/TheAlgorithms/Python"
|
"This notebook seeks to experiment with prompting an LLM to this end, as a starting point. The raw code data is from this GitHub repository that the DeepSeek paper mentions as one of their raw code sources: https://github.com/TheAlgorithms/Python\n",
|
||||||
|
"\n",
|
||||||
|
"NOTE: Be careful with the raw code you input into this, as later cells execute the LLM-generated outputs."
|
||||||
]
|
]
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
|
|
@ -40,11 +42,11 @@
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": 2,
|
"execution_count": 9,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"prompt_template = \"\"\"\n",
|
"format_prompt_template = \"\"\"\n",
|
||||||
"You are tasked with preprocessing a raw file of Python code into a standard format. The format is made up of several components. Here is a very simple example of a raw code file:\n",
|
"You are tasked with preprocessing a raw file of Python code into a standard format. The format is made up of several components. Here is a very simple example of a raw code file:\n",
|
||||||
"\n",
|
"\n",
|
||||||
"def kg_to_pounds(weights):\n",
|
"def kg_to_pounds(weights):\n",
|
||||||
|
|
@ -61,7 +63,7 @@
|
||||||
"\n",
|
"\n",
|
||||||
"1. Cleaned reference code, with a main entrypoint function that takes all required arguments as parameters and returns all outputs.\n",
|
"1. Cleaned reference code, with a main entrypoint function that takes all required arguments as parameters and returns all outputs.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"The name of the main entrypoint function should be `main`. The parameters should be clearly named but do not require type hints. The function should return a dict mapping output names to values. The function should contain all the necessary code to perform the functionality, without splitting into several functions. The function should not print or otherwise output anything; results should be returned as part of the result dict.\n",
|
"The name of the main entrypoint function should be `main`. The parameters should be clearly named but do not require type hints. The function should return a dict mapping output names to values. The function should contain all the necessary code to perform the functionality, without splitting into several functions. The function should not print or otherwise output anything; results should be returned as part of the result dict. Ensure you include any imports necessary, prior to the function definition.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"Example function signature: `def main(weights_kg, days):`\n",
|
"Example function signature: `def main(weights_kg, days):`\n",
|
||||||
"\n",
|
"\n",
|
||||||
|
|
@ -146,65 +148,41 @@
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": 4,
|
"execution_count": 12,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [
|
"outputs": [
|
||||||
{
|
{
|
||||||
"name": "stdout",
|
"name": "stdout",
|
||||||
"output_type": "stream",
|
"output_type": "stream",
|
||||||
"text": [
|
"text": [
|
||||||
"raw_files/bitmask.py\n",
|
"raw_files/climbing_stairs.py\n",
|
||||||
"def main(task_performed, total_tasks):\n",
|
"def main(number_of_steps):\n",
|
||||||
" dp = [[-1 for _ in range(total_tasks + 1)] for _ in range(2 ** len(task_performed))]\n",
|
" assert isinstance(number_of_steps, int) and number_of_steps > 0, (\n",
|
||||||
" task = defaultdict(list)\n",
|
" f\"number_of_steps needs to be positive integer, your input {number_of_steps}\"\n",
|
||||||
" final_mask = (1 << len(task_performed)) - 1\n",
|
" )\n",
|
||||||
"\n",
|
" if number_of_steps == 1:\n",
|
||||||
" def count_ways_until(mask, task_no):\n",
|
" return {\"distinct_ways\": 1}\n",
|
||||||
" if mask == final_mask:\n",
|
" previous, current = 1, 1\n",
|
||||||
" return 1\n",
|
" for _ in range(number_of_steps - 1):\n",
|
||||||
" if task_no > total_tasks:\n",
|
" current, previous = current + previous, current\n",
|
||||||
" return 0\n",
|
" return {\"distinct_ways\": current}\n",
|
||||||
" if dp[mask][task_no] != -1:\n",
|
|
||||||
" return dp[mask][task_no]\n",
|
|
||||||
"\n",
|
|
||||||
" total_ways_util = count_ways_until(mask, task_no + 1)\n",
|
|
||||||
" for p in task[task_no]:\n",
|
|
||||||
" if mask & (1 << p):\n",
|
|
||||||
" continue\n",
|
|
||||||
" total_ways_util += count_ways_until(mask | (1 << p), task_no + 1)\n",
|
|
||||||
" \n",
|
|
||||||
" dp[mask][task_no] = total_ways_util\n",
|
|
||||||
" return dp[mask][task_no]\n",
|
|
||||||
"\n",
|
|
||||||
" for i in range(len(task_performed)):\n",
|
|
||||||
" for j in task_performed[i]:\n",
|
|
||||||
" task[j].append(i)\n",
|
|
||||||
"\n",
|
|
||||||
" total_ways = count_ways_until(0, 1)\n",
|
|
||||||
" return {\"total_ways\": total_ways}\n",
|
|
||||||
"\n",
|
"\n",
|
||||||
"---\n",
|
"---\n",
|
||||||
"\n",
|
"You are given an integer `number_of_steps` representing the number of steps on a staircase. Your task is to calculate the number of distinct ways to climb the staircase, where each time you can either climb 1 or 2 steps. Return the number of distinct ways as an integer.\n",
|
||||||
"You are given a list `task_performed` and an integer `total_tasks`. `task_performed` represents the tasks that can be performed by each person, where each sublist corresponds to the tasks a person can do. `total_tasks` is the total number of tasks (N). Your task is to calculate the total number of ways to distribute the tasks among the persons. Each person can do only one task, and a task can be done by only one person. Return the total number of ways.\n",
|
|
||||||
"\n",
|
"\n",
|
||||||
"---\n",
|
"---\n",
|
||||||
"\n",
|
|
||||||
"Input:\n",
|
"Input:\n",
|
||||||
" task_performed (list of list of int): List of tasks that each person can perform. Each sublist contains the tasks a person can do.\n",
|
" number_of_steps (int): The number of steps on the staircase. Must be a positive integer.\n",
|
||||||
" total_tasks (int): The total number of tasks.\n",
|
|
||||||
"\n",
|
"\n",
|
||||||
"Output:\n",
|
"Output:\n",
|
||||||
" return (dict): A dictionary with one key:\n",
|
" return (dict): A dictionary with one key:\n",
|
||||||
" - total_ways (int): The total number of ways to distribute the tasks.\n",
|
" - distinct_ways (int): The number of distinct ways to climb the staircase.\n",
|
||||||
"\n",
|
"\n",
|
||||||
"---\n",
|
"---\n",
|
||||||
"\n",
|
|
||||||
"def input_generator():\n",
|
"def input_generator():\n",
|
||||||
" import random\n",
|
" import random\n",
|
||||||
" M = random.randint(2, 5)\n",
|
" number_of_steps = random.randint(1, 100)\n",
|
||||||
" N = random.randint(2, 5)\n",
|
" return {\"number_of_steps\": number_of_steps}\n"
|
||||||
" task_performed = [random.sample(range(1, N + 1), random.randint(1, N)) for _ in range(M)]\n",
|
|
||||||
" return {\"task_performed\": task_performed, \"total_tasks\": N}\n"
|
|
||||||
]
|
]
|
||||||
}
|
}
|
||||||
],
|
],
|
||||||
|
|
@ -215,7 +193,7 @@
|
||||||
"\n",
|
"\n",
|
||||||
"raw_code = raw_file.read_text()\n",
|
"raw_code = raw_file.read_text()\n",
|
||||||
"\n",
|
"\n",
|
||||||
"prompt = prompt_template.format(raw_code)\n",
|
"prompt = format_prompt_template.format(raw_code)\n",
|
||||||
"\n",
|
"\n",
|
||||||
"messages = [\n",
|
"messages = [\n",
|
||||||
" {\"role\": \"user\", \"content\": prompt},\n",
|
" {\"role\": \"user\", \"content\": prompt},\n",
|
||||||
|
|
@ -227,13 +205,165 @@
|
||||||
},
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": 5,
|
"execution_count": 13,
|
||||||
"metadata": {},
|
"metadata": {},
|
||||||
"outputs": [],
|
"outputs": [],
|
||||||
"source": [
|
"source": [
|
||||||
"code, query, parameters, generator = response.choices[0].message.content.split(\"\\n---\\n\")"
|
"code, query, parameters, generator = response.choices[0].message.content.split(\"\\n---\\n\")"
|
||||||
]
|
]
|
||||||
},
|
},
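The unpacking in the cell above assumes the response contains exactly three `\n---\n` separators; any other count raises a bare `ValueError` from tuple unpacking. A minimal sketch of a more explicit split follows. The helper name `split_sections` is hypothetical and not part of the notebook.

```python
def split_sections(content: str, separator: str = "\n---\n"):
    """Split a formatted LLM response into (code, query, parameters, generator).

    Hypothetical helper: raises a descriptive error when the model did not
    return exactly four sections.
    """
    parts = [part.strip() for part in content.split(separator)]
    if len(parts) != 4:
        raise ValueError(f"expected 4 sections, got {len(parts)}")
    return tuple(parts)


# Small stand-in response with the same section layout as the notebook output.
sample = (
    "def main(x):\n    return {'y': x}\n"
    "\n---\nQuery text\n"
    "\n---\nInput:\n  x (int)\n"
    "\n---\ndef input_generator():\n    return {'x': 1}\n"
)
code, query, parameters, generator = split_sections(sample)
print(query)
```

Stripping each part is a convenience here; it drops the blank lines the model tends to leave around the separators.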
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"The below cell executes arbitrary code, so be careful with what you run."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": 14,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"def generate_io_pairs(main_code: str, input_generator_code: str, num_pairs: int = 100):\n",
|
||||||
|
" local_vars = {}\n",
|
||||||
|
" exec(main_code, {}, local_vars)\n",
|
||||||
|
" exec(input_generator_code, {}, local_vars)\n",
|
||||||
|
" io_pairs = []\n",
|
||||||
|
" for _ in range(num_pairs):\n",
|
||||||
|
" inputs = local_vars[\"input_generator\"]()\n",
|
||||||
|
" outputs = local_vars[\"main\"](**inputs)\n",
|
||||||
|
" io_pairs.append((inputs, outputs))\n",
|
||||||
|
" return io_pairs\n",
|
||||||
|
"\n",
|
||||||
|
"io_pairs = generate_io_pairs(code, generator, num_pairs=2)"
|
||||||
|
]
|
||||||
|
},
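One caveat about `generate_io_pairs` as written: `exec(main_code, {}, local_vars)` binds top-level names, including any imports placed before `main`, in `local_vars`, while the functions it defines resolve global names against the empty globals dict. The recorded `climbing_stairs` example has no module-level imports, so it runs, but a `main` that relies on one would likely fail with `NameError` when called. A sketch of a variant that uses a single namespace is below. The function name and the demo strings are illustrative assumptions, not code from the commit.

```python
def generate_io_pairs_shared_ns(main_code: str, input_generator_code: str, num_pairs: int = 100):
    # One dict acts as the globals for both exec calls, so module-level
    # imports in main_code stay visible when main() is called later.
    namespace = {}
    exec(main_code, namespace)
    exec(input_generator_code, namespace)
    io_pairs = []
    for _ in range(num_pairs):
        inputs = namespace["input_generator"]()
        outputs = namespace["main"](**inputs)
        io_pairs.append((inputs, outputs))
    return io_pairs


# Illustrative inputs: main depends on a module-level import.
demo_main = "import math\n\ndef main(radius):\n    return {'area': math.pi * radius ** 2}\n"
demo_generator = (
    "def input_generator():\n"
    "    import random\n"
    "    return {'radius': random.randint(1, 10)}\n"
)
pairs = generate_io_pairs_shared_ns(demo_main, demo_generator, num_pairs=3)
print(pairs)
```

This still executes arbitrary generated code in the current process, so the warning in the notebook applies unchanged.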
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": 15,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [
|
||||||
|
{
|
||||||
|
"data": {
|
||||||
|
"text/plain": [
|
||||||
|
"[({'number_of_steps': 65}, {'distinct_ways': 27777890035288}),\n",
|
||||||
|
" ({'number_of_steps': 19}, {'distinct_ways': 6765})]"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
"execution_count": 15,
|
||||||
|
"metadata": {},
|
||||||
|
"output_type": "execute_result"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"source": [
|
||||||
|
"io_pairs"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "markdown",
|
||||||
|
"metadata": {},
|
||||||
|
"source": [
|
||||||
|
"Next we need to synthesize chains of thought from the LLM for use in building a supervised finetuning dataset. From the paper:\n",
|
||||||
|
"\n",
|
||||||
|
"> Since we aim for the input-output prediction tasks, we construct the prompt using a designed template to combine the function, the query, the reference code, and either a specific input or output. The response should ideally be a natural language CoT to reason about how to derive the correct output or a feasible input.\n",
|
||||||
|
"\n",
|
||||||
|
"The below prompts are from the paper."
|
||||||
|
]
|
||||||
|
},
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": 16,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [],
|
||||||
|
"source": [
|
||||||
|
"synthetic_cot_prompt_prefix = \"\"\"\n",
|
||||||
|
"You are given a question that requires some input and output variables as follows:\n",
|
||||||
|
"\n",
|
||||||
|
"{0}\n",
|
||||||
|
"\n",
|
||||||
|
"The input and output requirements are as follows:\n",
|
||||||
|
"\n",
|
||||||
|
"{1}\n",
|
||||||
|
"\"\"\"\n",
|
||||||
|
"\n",
|
||||||
|
"synthetic_cot_prompt_suffix = \"\"\"\n",
|
||||||
|
"Tip: Here is a reference code snippet for this question. You can refer to this code to guide your reasoning but not copy spans of code directly.\n",
|
||||||
|
"\n",
|
||||||
|
"{3}\n",
|
||||||
|
"\"\"\"\n",
|
||||||
|
"\n",
|
||||||
|
"synthetic_cot_prompt_input_prediction = synthetic_cot_prompt_prefix + \"\"\"\n",
|
||||||
|
"Given the following output:\n",
|
||||||
|
"\n",
|
||||||
|
"{2}\n",
|
||||||
|
"\n",
|
||||||
|
"Can you predict a feasible input without writing any code? Please reason and put your final answer in the following json format: \"input\": <your input>, where <your input> should be a dictionary, even if the there is only one input variable, with keys strictly matching the input variables' names as specified.\n",
|
||||||
|
"\"\"\" + synthetic_cot_prompt_suffix\n",
|
||||||
|
"\n",
|
||||||
|
"synthetic_cot_prompt_output_prediction = synthetic_cot_prompt_prefix + \"\"\"\n",
|
||||||
|
"Given the following input:\n",
|
||||||
|
"\n",
|
||||||
|
"{2}\n",
|
||||||
|
"\n",
|
||||||
|
"Can you predict the output without writing any code? Please reason and put your final answer in the following json format: \"output\": <your output>, where <your output> should strictly match the the output requirement as specified.\n",
|
||||||
|
"\"\"\" + synthetic_cot_prompt_suffix"
|
||||||
|
]
|
||||||
|
},
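The templates above use positional placeholders: `{0}` is the query, `{1}` the input/output description, `{2}` the given input or output, and `{3}` the reference code. Because the suffix is concatenated before `.format` is called, all four arguments are filled in a single call. A short sketch with stand-in templates (shortened, not the paper's wording) shows the mapping:

```python
# Stand-in templates with the same placeholder layout as the notebook cell:
# {0} = query, {1} = input/output description, {2} = given value, {3} = reference code.
prefix = "Question:\n{0}\n\nRequirements:\n{1}\n"
suffix = "\nReference code:\n{3}\n"
output_prediction = prefix + "\nGiven the following input:\n{2}\n" + suffix

prompt = output_prediction.format(
    "Count the distinct ways to climb a staircase.",
    "number_of_steps (int)",
    {"number_of_steps": 19},  # a dict argument is rendered with str()
    "def main(number_of_steps): ...",
)
print(prompt)
```

Note that a dict passed as `{2}` is rendered in Python repr form (single quotes), while the prompts ask for a JSON answer. Serializing with `json.dumps` first may be worth considering, though the notebook does not do so.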
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": 17,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [
|
||||||
|
{
|
||||||
|
"data": {
|
||||||
|
"text/plain": [
|
||||||
|
"'To determine the input `number_of_steps` that results in the output `{\\'distinct_ways\\': 27777890035288}`, we need to understand that this problem is related to the Fibonacci sequence. Specifically, the number of distinct ways to climb `n` steps, where you can climb either 1 or 2 steps at a time, is equal to the `(n+1)`-th Fibonacci number.\\n\\nGiven the output `27777890035288`, we need to find the integer `n` such that the `(n+1)`-th Fibonacci number is `27777890035288`.\\n\\nThe Fibonacci sequence grows exponentially, and the number `27777890035288` is a very large Fibonacci number. To find the corresponding `n`, we can use the fact that the Fibonacci sequence follows the recurrence relation:\\n\\n\\\\[ F(n) = F(n-1) + F(n-2) \\\\]\\n\\nGiven that `F(73) = 806515533049393` and `F(72) = 498454011879264`, it is clear that `27777890035288` is much smaller than `F(73)`. We need to find the exact `n` such that `F(n+1) = 27777890035288`.\\n\\nHowever, calculating Fibonacci numbers manually for large `n` is impractical. Instead, we can use the fact that `F(75) = 2111485077978050`, which is larger than `27777890035288`. Therefore, the `n` we are looking for must be between 72 and 75.\\n\\nBy checking Fibonacci numbers closer to `27777890035288`, we find that:\\n\\n\\\\[ F(74) = 1304969544928657 \\\\]\\n\\\\[ F(75) = 2111485077978050 \\\\]\\n\\nSince `27777890035288` is significantly larger than `F(74)` but smaller than `F(75)`, it is clear that `n` is 74.\\n\\nThus, the input `number_of_steps` should be 74, which corresponds to `F(75) = 27777890035288`.\\n\\nTherefore, the feasible input is:\\n\\n```json\\n{\"number_of_steps\": 74}\\n```'"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
"execution_count": 17,
|
||||||
|
"metadata": {},
|
||||||
|
"output_type": "execute_result"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"source": [
|
||||||
|
"def predict_input(query, parameters, output, reference_code):\n",
|
||||||
|
" messages = [\n",
|
||||||
|
" {\"role\": \"user\", \"content\": synthetic_cot_prompt_input_prediction.format(query, parameters, output, reference_code)},\n",
|
||||||
|
" ]\n",
|
||||||
|
" response = llm_generate(open_router_client, messages, sampling_params)\n",
|
||||||
|
" return response.choices[0].message.content\n",
|
||||||
|
"\n",
|
||||||
|
"predict_input(query, parameters, io_pairs[0][1], code)"
|
||||||
|
]
|
||||||
|
},
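The response recorded in this cell concludes `number_of_steps = 74`, whereas the pair in `io_pairs[0]` was generated from `number_of_steps = 65`, so synthesized CoTs presumably need checking before they go into a finetuning set. The sketch below is one possible check, not the notebook's or the paper's method: it takes the last JSON object in the response and runs the reference function on it. Both helper names are hypothetical, and `main` is copied from the recorded `climbing_stairs` output with the assertion omitted.

```python
import json


def extract_last_json_object(text: str):
    """Return the last parseable top-level JSON object in text, or None."""
    decoder = json.JSONDecoder()
    found, pos = None, 0
    while True:
        start = text.find("{", pos)
        if start == -1:
            return found
        try:
            obj, end = decoder.raw_decode(text, start)
        except json.JSONDecodeError:
            pos = start + 1  # not valid JSON here, keep scanning
            continue
        found = obj
        pos = end  # skip past the parsed object, including nested braces


def check_predicted_input(response_text: str, main_fn, expected_output) -> bool:
    """Run main_fn on the predicted input and compare with the expected output."""
    candidate = extract_last_json_object(response_text)
    if candidate is None:
        return False
    # Accept either {"input": {...}} or a bare dict of arguments.
    candidate = candidate.get("input", candidate)
    try:
        return main_fn(**candidate) == expected_output
    except Exception:
        return False


def main(number_of_steps):
    if number_of_steps == 1:
        return {"distinct_ways": 1}
    previous, current = 1, 1
    for _ in range(number_of_steps - 1):
        current, previous = current + previous, current
    return {"distinct_ways": current}


expected = {"distinct_ways": 27777890035288}
print(check_predicted_input('Answer:\n{"number_of_steps": 74}', main, expected))  # False
print(check_predicted_input('Answer:\n{"number_of_steps": 65}', main, expected))  # True
```

For output prediction the same extraction can be reused, comparing the parsed `"output"` value directly with the stored output instead of executing any code.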
|
||||||
|
{
|
||||||
|
"cell_type": "code",
|
||||||
|
"execution_count": 18,
|
||||||
|
"metadata": {},
|
||||||
|
"outputs": [
|
||||||
|
{
|
||||||
|
"data": {
|
||||||
|
"text/plain": [
|
||||||
|
"'To solve this problem, we need to calculate the number of distinct ways to climb a staircase with `number_of_steps` steps, where you can either take 1 or 2 steps at a time. This problem is a classic example of a dynamic programming problem and is very similar to the Fibonacci sequence.\\n\\n### Reasoning:\\n- The number of distinct ways to climb `n` steps is equal to the sum of the number of distinct ways to climb `n-1` steps and the number of distinct ways to climb `n-2` steps. This is because from the `n-1`th step, you can take a single step to reach the `n`th step, and from the `n-2`th step, you can take two steps to reach the `n`th step.\\n- The base cases are:\\n - For `n = 1`, there is only 1 way to climb the staircase (taking a single step).\\n - For `n = 2`, there are 2 ways to climb the staircase (taking two single steps or one double step).\\n\\nThe number of distinct ways to climb `n` steps follows the Fibonacci sequence. The Fibonacci sequence is defined as follows:\\n- F(0) = 0\\n- F(1) = 1\\n- F(n) = F(n-1) + F(n-2) for n ≥ 2\\n\\nHowever, in our problem, the number of ways to climb `n` steps corresponds to F(n+1) in the Fibonacci sequence. For example:\\n- For `n = 1` (F(2)), there is 1 way.\\n- For `n = 2` (F(3)), there are 2 ways.\\n- For `n = 3` (F(4)), there are 3 ways.\\n- For `n = 4` (F(5)), there are 5 ways.\\n\\nGiven `number_of_steps = 19`, we need to calculate F(20).\\n\\nThe Fibonacci sequence up to F(20) is as follows:\\n- F(0) = 0\\n- F(1) = 1\\n- F(2) = 1\\n- F(3) = 2\\n- F(4) = 3\\n- F(5) = 5\\n- F(6) = 8\\n- F(7) = 13\\n- F(8) = 21\\n- F(9) = 34\\n- F(10) = 55\\n- F(11) = 89\\n- F(12) = 144\\n- F(13) = 233\\n- F(14) = 377\\n- F(15) = 610\\n- F(16) = 987\\n- F(17) = 1597\\n- F(18) = 2584\\n- F(19) = 4181\\n- F(20) = 6765\\n\\nTherefore, the number of distinct ways to climb a staircase with 19 steps is 6765.\\n\\n### Final Answer:\\n```json\\n{\"output\": {\"distinct_ways\": 6765}}\\n```'"
|
||||||
|
]
|
||||||
|
},
|
||||||
|
"execution_count": 18,
|
||||||
|
"metadata": {},
|
||||||
|
"output_type": "execute_result"
|
||||||
|
}
|
||||||
|
],
|
||||||
|
"source": [
|
||||||
|
"def predict_output(query, parameters, input, reference_code):\n",
|
||||||
|
" messages = [\n",
|
||||||
|
" {\"role\": \"user\", \"content\": synthetic_cot_prompt_output_prediction.format(query, parameters, input, reference_code)},\n",
|
||||||
|
" ]\n",
|
||||||
|
" response = llm_generate(open_router_client, messages, sampling_params)\n",
|
||||||
|
" return response.choices[0].message.content\n",
|
||||||
|
"\n",
|
||||||
|
"predict_output(query, parameters, io_pairs[1][0], code)"
|
||||||
|
]
|
||||||
|
},
|
||||||
{
|
{
|
||||||
"cell_type": "code",
|
"cell_type": "code",
|
||||||
"execution_count": null,
|
"execution_count": null,
|
||||||
|
|
|
||||||