first steps for automatic generation of gsm generator functions

This commit is contained in:
Andreas Koepf 2025-01-29 17:55:37 +01:00
parent f7313d409c
commit d6c9a534af

View file

@ -0,0 +1,258 @@
{
"cells": [
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [],
"source": [
"prompt_template = \"\"\"You need to generate python code for a synthetic procedural dataset. The dataset is similar to OpenAI's GSM8K which contains grade-school level math questions in natural language.\n",
"\n",
"Here is the SOURCE item from the dataset which you should translate into a python generator:\n",
"\n",
"```json\n",
"{0}\n",
"```\n",
"\n",
"As you can see we have already `question_annotated` and `answer_annotated`, but they are not really native python functions. The variable names have these `$` prefixes etc. Could you generate python code which would generate synthetic questions and answers, i.e. with numerical variables from some ranges which would make sense for the given case? Beside the question and answer I also need some metadata e.g. in a dict about the variables used.\n",
"\n",
"I would like to use the generator function later to generate many different variants of questions and answers based on the same template.\n",
"To control the difficulty I want to provide a floating point `difficulty` factor which could be used to scale the numeric ranges .. but please ensure the values are integers (e.g. cast back to int). If there are variables for which no values are provided like male_names, objects etc. please generate a list of values to sample from that fits in this context.\n",
"\n",
"1. To make it modular and testable let's split the generator into one function called `generate_from_variables()` which gets the input variables and generates the question and answer texts. It should calculate the answer value from the inputs and the main randomized generator `generate_example()` (see below). \n",
"\n",
"2. The generator function should have a signature like `def generate_example(rng: Random, difficulty: float = 1.0) -> dict`.\n",
"\n",
"The output dict should contain:\n",
"{{\n",
" 'question': '<the generated question>',\n",
" 'answer': '<the_final_answer>', # here only the final answer, e.g. the number\n",
" 'metadata': {{\n",
" 'difficulty': difficulty,\n",
" 'answer_value': <numeric_answer_value>,\n",
" 'answer_cot': '<full_long_form_answer>', # chain of thought, similar to 'answer' in the SOURCE\n",
" 'variables': {{\n",
" ... # the variable used\n",
" }}\n",
" }}\n",
"}}\n",
"\n",
"3. Write a simple `original_example()` function which calls `generate_from_variables()` and passes the original input values from SOURCE used in the json example above (in order to compare the output).\n",
"\n",
"Your task:\n",
"\n",
"- Generate reasonable random values for all the variables\n",
"- Ensure mathematical consistency (total distance is divisible by distance per interval)\n",
"- Create natural language question and answer texts\n",
"- Include metadata about the variables and solution\n",
"\n",
"\n",
"I'll provide an example:\n",
"\n",
"INPUT: Original entry from dataset:\n",
"```json\n",
"{{\n",
" \"question\": \"A fog bank rolls in from the ocean to cover a city. It takes 10 minutes to cover every 3 miles of the city. If the city is 42 miles across from the oceanfront to the opposite inland edge, how many minutes will it take for the fog bank to cover the whole city?\",\n",
" \"answer\": \"The city will be covered in 42 / 3 = <<42/3=14>>14 intervals of 10 minutes.\\nThus, it will take 14 * 10 = <<14*10=140>>140 minutes for the fog to cover the whole city.\\n#### 140\",\n",
" \"id_orig\": 103,\n",
" \"id_shuffled\": 1,\n",
" \"question_annotated\": \"A fog bank rolls in from the ocean to cover a city. It takes {{t,10}} minutes to cover every {{d,3}} miles of the city. If the city is {{y,42}} miles across from the oceanfront to the opposite inland edge, how many minutes will it take for the fog bank to cover the whole city?\\n\\n#init:\\n- $t = range(2, 500)\\n- $d = range(2, 100)\\n- $y=range(2, 100)\\n\\n#conditions:\\n- is_int(y/d)\\n\\n#answer: y//d*t\",\n",
" \"answer_annotated\": \"The city will be covered in {{y}}/ {{d}} = <<{{y}}/{{d}}={{y//d}}>>{{y//d}} intervals of {{t}} minutes.\\nThus, it will take {{y//d}} * {{t}} = <<{{y//d}}*{{t}}={{y//d*t}}>>{{y//d*t}} minutes for the fog to cover the whole city.\\n#### {{y//d*t}}\"\n",
"}}\n",
"```\n",
"\n",
"OUTPUT: Output in the form which should be generated\n",
"```python\n",
"from random import Random\n",
"from typing import Dict, Any\n",
"\n",
"def generate_from_variables(time_per_interval: int, distance_per_interval: int, total_distance: int) -> Dict[str, Any]:\n",
" intervals = total_distance // distance_per_interval\n",
" total_time = intervals * time_per_interval\n",
" \n",
" question = f\"A fog bank rolls in from the ocean to cover a city. It takes {{time_per_interval}} minutes to cover every {{distance_per_interval}} miles of the city. If the city is {{total_distance}} miles across from the oceanfront to the opposite inland edge, how many minutes will it take for the fog bank to cover the whole city?\"\n",
" \n",
" answer_cot = f\"The city will be covered in {{total_distance}} / {{distance_per_interval}} = {{intervals}} intervals of {{time_per_interval}} minutes.\\nThus, it will take {{intervals}} * {{time_per_interval}} = {{total_time}} minutes for the fog to cover the whole city.\\n#### {{total_time}}\"\n",
" \n",
" return {{\n",
" 'question': question,\n",
" 'answer': f'{{total_time}}',\n",
" 'answer_cot': answer_cot,\n",
" 'answer_value': total_time,\n",
" 'variables': {{\n",
" 'time_per_interval': time_per_interval,\n",
" 'distance_per_interval': distance_per_interval, \n",
" 'total_distance': total_distance,\n",
" 'intervals': intervals\n",
" }}\n",
" }}\n",
"\n",
"def generate_example(rng: Random, difficulty: float = 1.0) -> Dict[str, Any]:\n",
" # Generate random values scaled by difficulty\n",
" distance_per_interval = int(rng.randint(2, int(10 * difficulty)))\n",
" time_per_interval = int(rng.randint(5, int(30 * difficulty)))\n",
" \n",
" # Ensure total distance is divisible by distance_per_interval\n",
" num_intervals = rng.randint(2, int(20 * difficulty))\n",
" total_distance = distance_per_interval * num_intervals\n",
" \n",
" result = generate_from_variables(time_per_interval, distance_per_interval, total_distance)\n",
" \n",
" return {{\n",
" 'question': result['question'],\n",
" 'answer': result['answer'],\n",
" 'metadata': {{\n",
" 'answer_cot': result['answer_cot'],\n",
" 'difficulty': difficulty,\n",
" 'variables': result['variables']\n",
" }}\n",
" }}\n",
"\n",
"def original_example() -> Dict[str, Any]:\n",
" return generate_from_variables(10, 3, 42)\n",
"```\n",
"\n",
"Just generate the three python functions for the SOURCE dataset item - no additional explanation.\n",
"\"\"\""
]
},
{
"cell_type": "code",
"execution_count": 19,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"The dotenv extension is already loaded. To reload it, use:\n",
" %reload_ext dotenv\n"
]
}
],
"source": [
"# create open-router client, place your OPENROUTER_API_KEY in .env file\n",
"%load_ext dotenv\n",
"%dotenv\n",
"import os\n",
"import re\n",
"from pathlib import Path\n",
"from typing import Any, Iterable, Optional\n",
"from openai import OpenAI\n",
"from openai.types.chat import ChatCompletion, ChatCompletionMessageParam\n",
"import time\n",
"\n",
"def llm_generate(\n",
" client: OpenAI,\n",
" messages: Iterable[ChatCompletionMessageParam],\n",
" sampling_params: dict[str, Any],\n",
") -> ChatCompletion:\n",
" max_retry = 3\n",
" for trial in range(max_retry):\n",
" try:\n",
" return client.chat.completions.create(\n",
" messages=messages,\n",
" **sampling_params,\n",
" )\n",
" except Exception as e:\n",
" print(\"failure response:\", e)\n",
" time.sleep(trial * trial) # quadratic backoff\n",
" if trial == max_retry - 1:\n",
" raise\n",
"\n",
"open_router_client = OpenAI(\n",
" base_url=\"https://openrouter.ai/api/v1\",\n",
" api_key=os.getenv(\"OPENROUTER_API_KEY\"),\n",
" timeout=90.0,\n",
")\n",
"\n",
"sampling_params = {\n",
" \"model\": \"anthropic/claude-3.5-sonnet\",\n",
" \"max_tokens\": 4096,\n",
"}"
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"/home/koepf/code/open-thought/reasoning-gym/notebooks/../../../ml-gsm-symbolic/templates/symbolic\n"
]
},
{
"data": {
"text/plain": [
"'from random import Random\\nfrom typing import Dict, Any\\n\\ndef generate_from_variables(item: str, n1: int, p: int, c1: str, c2: str, c3: str) -> Dict[str, Any]:\\n more_cards = int(p/100 * n1)\\n n2 = n1 + more_cards\\n n3 = n1 + n2\\n total = n3 + n3\\n\\n question = f\"In a set of {item}\\'s cards, there are {n1} {c1} cards, and {p}% more {c2} cards. {c3} cards are as many as the sum of {c1} and {c2} cards. How many cards of all mentioned colors are there?\"\\n\\n answer_cot = f\"There are {p}/100 * {n1} = {more_cards} more {c2} cards than {c1} cards.\\\\n\" \\\\\\n f\"Which means there are {n1} + {more_cards} = {n2} {c2} cards.\\\\n\" \\\\\\n f\"{c3} cards make up to {n1} + {n2} = {n3} cards.\\\\n\" \\\\\\n f\"So in total, there are {n3} + {n3} = {total} cards of different colors.\\\\n\" \\\\\\n f\"#### {total}\"\\n\\n return {\\n \\'question\\': question,\\n \\'answer\\': str(total),\\n \\'answer_cot\\': answer_cot,\\n \\'answer_value\\': total,\\n \\'variables\\': {\\n \\'item\\': item,\\n \\'n1\\': n1,\\n \\'p\\': p,\\n \\'c1\\': c1,\\n \\'c2\\': c2, \\n \\'c3\\': c3,\\n \\'more_cards\\': more_cards,\\n \\'total\\': total\\n }\\n }\\n\\ndef generate_example(rng: Random, difficulty: float = 1.0) -> Dict[str, Any]:\\n items = [\"magician\", \"artist\", \"chef\", \"scientist\", \"athlete\"]\\n colors = [\"red\", \"blue\", \"green\", \"yellow\", \"purple\", \"orange\"]\\n \\n item = rng.choice(items)\\n c1, c2, c3 = rng.sample(colors, 3)\\n \\n n1 = int(rng.randint(20, int(81 * difficulty)))\\n \\n # Generate p ensuring p/100 * n1 is an integer\\n while True:\\n p = int(rng.randint(20, min(90, int(90 * difficulty))))\\n if (p/100 * n1).is_integer():\\n break\\n \\n result = generate_from_variables(item, n1, p, c1, c2, c3)\\n \\n return {\\n \\'question\\': result[\\'question\\'],\\n \\'answer\\': result[\\'answer\\'],\\n \\'metadata\\': {\\n \\'difficulty\\': difficulty,\\n \\'answer_value\\': result[\\'answer_value\\'],\\n \\'answer_cot\\': 
result[\\'answer_cot\\'],\\n \\'variables\\': result[\\'variables\\']\\n }\\n }\\n\\ndef original_example() -> Dict[str, Any]:\\n return generate_from_variables(\"magician\", 15, 60, \"red\", \"green\", \"yellow\")\\n'"
]
},
"execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"def generate_simple_request(user_prompt: str, developer_prompt: Optional[str] = None) -> list[dict]:\n",
" prompt = []\n",
" if developer_prompt is not None:\n",
" prompt.append( { \"role\": \"system\", \"content\": developer_prompt } )\n",
" \n",
" prompt.append( { \"role\": \"user\", \"content\": user_prompt })\n",
" return prompt\n",
" \n",
"\n",
"def produce_generator(json_path: Path):\n",
" json = json_path.read_text()\n",
" user_request = prompt_template.format(json)\n",
" \n",
" input_messages = generate_simple_request(user_prompt=user_request)\n",
" output = llm_generate(open_router_client, input_messages, sampling_params)\n",
"\n",
" response = output.choices[0].message.content\n",
"\n",
" return response\n",
" \n",
"\n",
"# clone the gsm-symbolic from apple somewhere and set the path here, `git clone https://github.com/apple/ml-gsm-symbolic.git`\n",
"path_to_gsmsym = Path(\"../../../ml-gsm-symbolic/templates/symbolic/\")\n",
"print(\"Reading templates from path: \", path_to_gsmsym.absolute())\n",
"\n",
"template_files = list(path_to_gsmsym.glob(\"*.json\"))\n",
"\n",
"# for testing just do it for the first entry\n",
"response_text = produce_generator(template_files[0])\n",
"\n",
"# extract python source section\n",
"result_match = re.search(r\"^```.*\\n((.*\\n)+)```\", response_text, flags=re.MULTILINE)\n",
"\n",
"python_source = result_match.group(1)\n",
"python_source"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "open-thought",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.5"
}
},
"nbformat": 4,
"nbformat_minor": 2
}