diff --git a/GALLERY.md b/GALLERY.md index eeec0119..9e56a205 100644 --- a/GALLERY.md +++ b/GALLERY.md @@ -18,7 +18,6 @@ This gallery shows examples from all available datasets using their default conf - [fraction_simplification](#fraction_simplification) - [game_of_life](#game_of_life) - [gcd](#gcd) -- [gsm_symbolic](#gsm_symbolic) - [intermediate_integration](#intermediate_integration) - [lcm](#lcm) - [leg_counting](#leg_counting) @@ -843,32 +842,6 @@ Metadata: {'numbers': [297, 30], 'result': 3} ```` -### gsm_symbolic -Default configuration: -```python -seed = 42 -size = 500 -``` - -Example tasks: -```` -Example 1: -Question: There are currently 16 orange balls, 12 yellow balls, and 44 blue balls in the shop. orange balls cost $13, yellow balls cost $10 and blue balls cost $6. How much will the shop have received after all the balls are sold? -Answer: 592 -Metadata: {'difficulty': 1.0, 'answer_value': 592, 'answer_cot': 'For the orange balls, 16 balls * $13/ball = $208.\nFor the yellow balls, 12 balls * $10/ball = $120.\nFor the blue balls, 44 balls * $6/ball = $264.\nFor all balls, $208 + $120 + $264 = $592.\n#### 592', 'variables': {'store': 'shop', 'colors': ['orange', 'yellow', 'blue'], 'quantities': [16, 12, 44], 'prices': [13, 10, 6], 'currency': '$', 'subtotals': [208, 120, 264], 'total': 592}} - -Example 2: -Question: A plumber works for 3 weeks every month and for 4 days every week. If he gets paid £150 every day, how much does he earn if he works for a year? 
-Answer: 21600 -Metadata: {'difficulty': 1.0, 'answer_value': 21600, 'answer_cot': 'The plumber works for 4 days every week and works for 3 weeks every month so he works for 4 days/week * 3 weeks/month = 12 days/month\nIf he earns £150 every day he then earns £150/day * 12 days/month = £1800/month\nA year is equal to 12 months so every year he earns £1800/month * 12 months/year = £21600\n#### 21600', 'variables': {'occupation': 'plumber', 'weeks_per_month': 3, 'days_per_week': 4, 'pay_per_day': 150, 'currency': '£', 'days_per_month': 12, 'monthly_pay': 1800}} - -Example 3: -Question: Ava sliced an mango into 33 pieces. She ate 5 slice, her cousin ate 7 more than her, and her brother ate 4 more than her cousin. How many slices of mango did they all eat? -Answer: 33 -Metadata: {'difficulty': 1.0, 'answer_value': 33, 'answer_cot': 'Her cousin ate 5 + 7 = 12 slices.\nHer brother ate 12 + 4 = 16 slices.\nThey ate a total of 5 + 12 + 16 = 33 slices.\n#### 33', 'variables': {'name': 'Ava', 'fruit': 'mango', 'total_slices': 33, 'first_person_slices': 5, 'second_person_extra': 7, 'third_person_extra': 4, 'sibling1': 'cousin', 'sibling2': 'brother', 'total_eaten': 33}} - -```` - ### intermediate_integration Generates intermediate integration problem - either by substitution or by parts @@ -2061,12 +2034,12 @@ Example tasks: ```` Example 1: Question: Transform the word ladder 'HAND' to 'GLEE' by changing one letter at a time. -Answer: HAND,RAND,REND,FEND,FEED,FLED,FLEE,GLEE +Answer: HAND,RAND,REND,REED,FEED,FLED,FLEE,GLEE Metadata: {'start_word': 'HAND', 'end_word': 'GLEE', 'word_length': 4, 'chain_length': 8} Example 2: Question: Transform the word ladder 'JAZZ' to 'DORM' by changing one letter at a time. 
-Answer: JAZZ,JIZZ,FIZZ,FUZZ,FUZE,FAZE,FARE,FARM,FORM,DORM +Answer: JAZZ,JIZZ,FIZZ,FUZZ,FUZE,FAZE,FARE,FORE,FORM,DORM Metadata: {'start_word': 'JAZZ', 'end_word': 'DORM', 'word_length': 4, 'chain_length': 10} Example 3: @@ -2157,21 +2130,21 @@ Example tasks: ```` Example 1: Question: This is a logic puzzle. There are 4 houses (numbered 1 on the left, 4 on the right), from the perspective of someone standing across the street from them. Each has a different person in them. They have different characteristics: - - Each person has a unique name: arnold, eric, alice, peter - - People use different phone models: samsung galaxy s21, iphone 13, google pixel 6, oneplus 9 - - Each person has a favorite drink: tea, water, milk, coffee - - The people keep different animals: fish, cat, horse, bird + - Each person has a unique name: alice, eric, arnold, peter + - People use different phone models: samsung galaxy s21, oneplus 9, google pixel 6, iphone 13 + - Each person has a favorite drink: coffee, water, tea, milk + - The people keep different animals: horse, cat, fish, bird -1. The one who only drinks water is Peter. -2. The cat lover is in the second house. -3. The coffee drinker is the fish enthusiast. -4. The person who uses a OnePlus 9 is the tea drinker. -5. Peter is directly left of Arnold. -6. The person who keeps horses is in the fourth house. -7. The person who keeps horses is Alice. -8. Alice is the person who uses a Google Pixel 6. -9. The person who uses a Samsung Galaxy S21 is the one who only drinks water. -10. Peter is in the first house. +1. Peter is the one who only drinks water. +9. The fish enthusiast is directly left of the person who keeps horses. +7. The bird keeper is Peter. +6. Alice is in the fourth house. +2. The tea drinker is the person who uses a OnePlus 9. +3. The person who uses an iPhone 13 is the fish enthusiast. +5. The person who uses an iPhone 13 is directly left of the person who uses a Google Pixel 6. +8. 
The coffee drinker is the person who uses an iPhone 13. +4. The tea drinker and the person who uses an iPhone 13 are next to each other. +10. Eric and the person who uses a Google Pixel 6 are next to each other. What is Name of the person who lives in House 1? Answer: peter @@ -2179,20 +2152,21 @@ Metadata: {'num_people': 4, 'num_characteristics': 4} Example 2: Question: This is a logic puzzle. There are 4 houses (numbered 1 on the left, 4 on the right), from the perspective of someone standing across the street from them. Each has a different person in them. They have different characteristics: - - Each person has a unique name: alice, eric, arnold, peter - - Each mother is accompanied by their child: fred, bella, samantha, meredith - - The people are of nationalities: norwegian, swede, brit, dane - - Everyone has something different for lunch: spaghetti, grilled cheese, pizza, stew + - Each person has a unique name: arnold, peter, eric, alice + - Each mother is accompanied by their child: meredith, samantha, bella, fred + - The people are of nationalities: dane, norwegian, brit, swede + - Everyone has something different for lunch: stew, spaghetti, pizza, grilled cheese -1. The person who loves the stew is Eric. -2. The person's child is named Fred is directly left of the person who loves the spaghetti eater. -3. The person's child is named Samantha is Peter. -4. The person who is a pizza lover is the person's child is named Meredith. -5. The person's child is named Meredith is directly left of Eric. -6. The British person is the person's child is named Meredith. +1. The Dane is in the second house. +8. The Norwegian is the person who loves the spaghetti eater. +5. The person who is a pizza lover is the person's child is named Meredith. +2. Peter is directly left of the person who loves eating grilled cheese. +3. The British person is Alice. +9. The Swedish person is in the fourth house. +6. The person who is a pizza lover and Eric are next to each other. 7. 
The person's child is named Samantha is in the third house. -8. Arnold is the Swedish person. -9. The person's child is named Samantha is the Norwegian. +10. The person who is a pizza lover is in the first house. +4. Eric is the person's child is named Fred. What is Name of the person who lives in House 1? Answer: alice @@ -2200,21 +2174,21 @@ Metadata: {'num_people': 4, 'num_characteristics': 4} Example 3: Question: This is a logic puzzle. There are 4 houses (numbered 1 on the left, 4 on the right), from the perspective of someone standing across the street from them. Each has a different person in them. They have different characteristics: - - Each person has a unique name: alice, peter, eric, arnold - - Everyone has a different favorite cigar: prince, dunhill, pall mall, blue master - - Everyone has something different for lunch: stew, pizza, spaghetti, grilled cheese - - Each person has a favorite color: green, red, yellow, white + - Each person has a unique name: arnold, eric, peter, alice + - Everyone has a different favorite cigar: blue master, pall mall, dunhill, prince + - Everyone has something different for lunch: spaghetti, pizza, stew, grilled cheese + - Each person has a favorite color: yellow, white, red, green -1. Eric is the person who loves white. -2. Alice and the Dunhill smoker are next to each other. +7. The person whose favorite color is green is the person who loves the spaghetti eater. +5. The Dunhill smoker is the person who loves the stew. +4. The person who loves yellow is the Dunhill smoker. 3. The person who loves the stew is Arnold. -4. The person whose favorite color is green is directly left of the person who loves the stew. -5. The person who smokes Blue Master is Alice. -6. Alice is the person who loves the spaghetti eater. -7. The person partial to Pall Mall is directly left of Eric. -8. The Prince smoker is in the fourth house. -9. The person who loves yellow is in the second house. -10. 
Arnold and the person who loves eating grilled cheese are next to each other. +1. The person whose favorite color is green is Alice. +2. The person partial to Pall Mall is Peter. +9. The person who smokes Blue Master is in the first house. +10. Peter is directly left of the person who loves white. +8. The person who loves eating grilled cheese is the person whose favorite color is red. +6. The person partial to Pall Mall is in the third house. What is Name of the person who lives in House 1? Answer: alice diff --git a/reasoning_gym/arithmetic/__init__.py b/reasoning_gym/arithmetic/__init__.py index 9a6d775a..6d615efa 100644 --- a/reasoning_gym/arithmetic/__init__.py +++ b/reasoning_gym/arithmetic/__init__.py @@ -12,7 +12,8 @@ from .calendar_arithmetic import CalendarArithmeticConfig, CalendarArithmeticDat from .chain_sum import ChainSum, ChainSumConfig from .fraction_simplification import FractionSimplificationConfig, FractionSimplificationDataset from .gcd import GCDConfig, GCDDataset -from .gsm_symbolic.gsm_symbolic_datasets import GSMSymbolicDataset, GSMSymbolicDatasetConfig + +# from .gsm_symbolic.gsm_symbolic_datasets import GSMSymbolicDataset, GSMSymbolicDatasetConfig from .lcm import LCMConfig, LCMDataset from .leg_counting import LegCountingConfig, LegCountingDataset from .prime_factorization import PrimeFactorizationConfig, PrimeFactorizationDataset @@ -38,8 +39,8 @@ __all__ = [ "LegCountingDataset", "PrimeFactorizationConfig", "PrimeFactorizationDataset", - "GSMSymbolicDatasetConfig", - "GSMSymbolicDataset", + # "GSMSymbolicDatasetConfig", + # "GSMSymbolicDataset", "TimeIntervalsConfig", "TimeIntervalsDataset", ] diff --git a/reasoning_gym/logic/contrib/logic_puzzle/generate.py b/reasoning_gym/logic/contrib/logic_puzzle/generate.py index 28f679cd..ab365235 100644 --- a/reasoning_gym/logic/contrib/logic_puzzle/generate.py +++ b/reasoning_gym/logic/contrib/logic_puzzle/generate.py @@ -4,14 +4,10 @@ puzzle_generator.py This is a driver script that can be 
used to generate new zebra puzzles. """ -import json -import pickle -import sys - # from tqdm import tqdm from itertools import product -from random import choices, randint, sample, seed, shuffle -from typing import Dict, Iterable, List, Optional, Set, Tuple, Type +from random import Random +from typing import Dict, Iterable, List, Set, Tuple, Type from tabulate import tabulate @@ -72,7 +68,7 @@ def generate_consecutive_beside(puzzle: Puzzle, solution: Dict[Literal, int]) -> for pair in pairs: # consecutive is just a more informative version of beside, but they have same structure # because of this, don't include both - if randint(0, 1) == 0: + if puzzle.rng.randint(0, 1) == 0: clues.add(consecutive(pair[0], pair[1], puzzle.houses)) else: clues.add(beside(pair[0], pair[1], puzzle.houses)) @@ -169,7 +165,7 @@ def try_to_remove(puzzle: Puzzle, clues: Set[Clue], n: int, must_have=set()) -> return weights.get(type(clue), 1) weights = [weight(clue) for clue in clues] - candidates: Set[Clue] = set(choices(list(clues), weights, k=n)) + candidates: Set[Clue] = set(puzzle.rng.choices(list(clues), weights, k=n)) candidates = candidates - must_have clues = clues.difference(candidates) if has_unique_solution(puzzle, clues): @@ -191,7 +187,7 @@ def reduce_individually( and added to `removed`. If no clues can be removed, we return the original two sets. 
""" - candidates = set(sample(list(clues), len(clues))) + candidates = set(puzzle.rng.sample(list(clues), len(clues))) for clue in candidates: if clue not in must_have: clues.remove(clue) @@ -239,7 +235,7 @@ def reduce_clues(puzzle: Puzzle, clues: Set[Clue], must_have=set()) -> Tuple[Set """ # this is a stupid way to shuffle the set of clues without modifying it - minimal_clues = set(sample(list(clues), k=len(clues))) + minimal_clues = set(puzzle.rng.sample(list(clues), k=len(clues))) while True: # print(f"There are {len(minimal_clues)} clues in ba sing se") @@ -278,7 +274,7 @@ def reduce_clues(puzzle: Puzzle, clues: Set[Clue], must_have=set()) -> Tuple[Set return minimal_clues, removed_clues -def question_generation(col_name, table_data): +def question_generation(rng: Random, col_name, table_data): values_by_cols = {} for row in table_data: for idx, value in enumerate(row): @@ -294,7 +290,7 @@ def question_generation(col_name, table_data): continue question = f"What is {col} of the person who lives in House {row[0]}?" 
options = values_by_cols[col][:] - shuffle(options) + rng.shuffle(options) truth = row[cid] assert truth in options questions_data.append( @@ -306,18 +302,18 @@ def question_generation(col_name, table_data): return questions_data -def generate_solution_dict(selected_elements: List[Literal], n: int) -> Dict[Literal, int]: +def generate_solution_dict(rng: Random, selected_elements: List[Literal], n: int) -> Dict[Literal, int]: solution = {} house_ids = list(range(1, n + 1)) for element in selected_elements: - shuffle(house_ids) - attributes: List[element] = list(element.__members__.values()) + rng.shuffle(house_ids) + attributes: List[Literal] = list(element.__members__.values()) for i in range(n): solution[attributes[i]] = house_ids[i] return solution -def wrap_up_dict(random_elements, solution, puzzle, reduced, extra_clues, context, K, M): +def wrap_up_dict(rng: Random, random_elements, solution, puzzle, reduced, extra_clues, context, K, M): col_names = [e.__name__ for e in random_elements] house_data = {} for item, house in solution.items(): @@ -337,7 +333,7 @@ def wrap_up_dict(random_elements, solution, puzzle, reduced, extra_clues, contex table = tabulate(table_data, headers=col_names, tablefmt="grid") ## Generate multiple-choice questions - q_data = question_generation(col_names, table_data) + q_data = question_generation(rng, col_names, table_data) all_in_one = {} all_in_one["size"] = f"{K}*{M}" all_in_one["puzzle_context"] = context @@ -358,7 +354,7 @@ def check_correctness(p): return set(solution_set) == set(_first_solution) -def generate_puzzle(K=2, M=3, mode="train"): +def generate_puzzle(rng: Random, K=2, M=3): elements = [Color, Nationality, Animal, Drink, Cigar, Food, Flower, PhoneModel, Children, Smoothie] clue_types = [ generate_found_at, @@ -366,12 +362,12 @@ def generate_puzzle(K=2, M=3, mode="train"): generate_consecutive_beside, ] - shuffle(elements) + rng.shuffle(elements) random_elements = [Name] + elements[: M - 1] - solution = 
generate_solution_dict(random_elements, K) + solution = generate_solution_dict(rng, random_elements, K) # set up the puzzle with default constraints - puzzle = Puzzle(element_types=random_elements, elements=solution.keys(), n_houses=K).set_constraints() + puzzle = Puzzle(rng=rng, element_types=random_elements, elements=solution.keys(), n_houses=K).set_constraints() puzzle.solution = solution context = str(puzzle) @@ -383,68 +379,11 @@ def generate_puzzle(K=2, M=3, mode="train"): reduced, _ = reduce_clues(puzzle, clues) extra_clues = clues - reduced - extra_clues = set(sample(list(extra_clues), min(len(extra_clues), 30))) + extra_clues = set(rng.sample(list(extra_clues), min(len(extra_clues), 30))) for clue in reduced: puzzle.add_clue(clue) assert has_unique_solution(puzzle, puzzle.clues, remove_after=False) assert check_correctness(puzzle) - all_in_one = wrap_up_dict(random_elements, solution, puzzle, reduced, extra_clues, context, K, M) + all_in_one = wrap_up_dict(rng, random_elements, solution, puzzle, reduced, extra_clues, context, K, M) return all_in_one, puzzle - - -# def main(): -# mode = sys.argv[1] -# print(f"mode={mode}") -# if mode.startswith("train"): -# seed(1337) -# N = 30 -# if mode.endswith("_large"): -# N = 150 -# if mode.endswith("_xl"): -# N = 1000 -# Ks = [2,3,4] -# Ms = [2,3,4] - -# if mode.endswith("_xxl"): -# N = 500 -# Ks = [2,3,4,5,6] -# Ms = [2,3,4,5,6] - -# elif mode == "dev" or mode.startswith("test_"): -# seed(42+len(mode)) -# N = 10 -# Ks = [2,3,4,5] -# Ms = [2,3,4,5] -# if mode.startswith("test_id_xl"): -# Ks = [2,3,4,5,6] -# Ms = [2,3,4,5,6] -# if mode.startswith("test_id_xxl"): -# Ks = [2,3,4,5,6,7] -# Ms = [2,3,4,5,6,7] -# if mode.endswith("_50"): -# N = 50 - -# instances = [] -# puzzle_objs = [] -# for K, M, idx in tqdm(list(product(Ks, Ms, list(range(N))))): -# if mode.startswith("test_id_xl"): -# if K != 6 and M != 6: -# continue -# if mode.startswith("test_id_xxl"): -# if K != 7 and M != 7: -# continue -# instance, puzzle = 
generate_puzzle(K, M, mode) -# instance["idx"] = f"lgp-{mode}-{K}x{M}-{idx}" -# instances.append(instance) -# puzzle_objs.append({"idx": instance["idx"], "puzzle": puzzle}) - -# with open(f"logic_grid_puzzles.{mode}.pkl", "wb") as f: -# pickle.dump(puzzle_objs, f) - -# with open(f"logic_grid_puzzles.{mode}.json", "w") as f: -# json.dump(instances, f, indent=2) - - -if __name__ == "__main__": - main() diff --git a/reasoning_gym/logic/contrib/logic_puzzle/graph/reasoning_path.py b/reasoning_gym/logic/contrib/logic_puzzle/graph/reasoning_path.py index 18a286c9..cc63a80e 100644 --- a/reasoning_gym/logic/contrib/logic_puzzle/graph/reasoning_path.py +++ b/reasoning_gym/logic/contrib/logic_puzzle/graph/reasoning_path.py @@ -26,7 +26,6 @@ def logic_grid_puzzle(inputfile, ground_truth, size, lower_part, higher_part): reasoning_result = [] answers = json.load(open(ground_truth, "r")) puzzles = pickle.load(open(inputfile, "rb")) - cell_difficulty = {} mode = inputfile[inputfile.find("puzzles.") + 8 : inputfile.find(".pkl")] print("Number of puzzles", len(answers)) assert len(answers) == len(puzzles) diff --git a/reasoning_gym/logic/contrib/logic_puzzle/puzzle.py b/reasoning_gym/logic/contrib/logic_puzzle/puzzle.py index a72c2bf7..a6e5f018 100644 --- a/reasoning_gym/logic/contrib/logic_puzzle/puzzle.py +++ b/reasoning_gym/logic/contrib/logic_puzzle/puzzle.py @@ -6,8 +6,8 @@ Solve the Einstein puzzle using Raymond Hettinger's approach. from __future__ import annotations from contextlib import contextmanager -from random import shuffle -from typing import Dict, Generator, Iterable, List, Set, Tuple, Type +from random import Random +from typing import Generator, Iterable, List, Set, Tuple, Type from reasoning_gym.logic.contrib.logic_puzzle.clues import ( Clue, @@ -58,6 +58,7 @@ class Puzzle: def __init__( self, *, + rng: Random, element_types: Iterable[Type[Literal]], elements: Iterable[Literal] = None, n_houses: int = 5, @@ -73,6 +74,7 @@ class Puzzle: ones. 
""" + self.rng = rng self.element_classes = list(element_types) if elements is None: self.literals = [el for el_class in self.element_classes for el in el_class] @@ -145,15 +147,17 @@ class Puzzle: s += f"They have different characteristics:\n" for element_type in self.element_classes: literals = [l for l in self.literals if isinstance(l, element_type)] - shuffle(literals) + self.rng.shuffle(literals) desc = element_type.description() idx = desc.index(":") if ":" in desc else None desc = desc[:idx] s += f" - {desc}: " + ", ".join(e.name.replace("_", " ") for e in literals) + "\n" s += "\n" - for i, clue in enumerate(self.clues): - s += f"{i + 1}. {clue}\n" + + clues = sorted(f"{i + 1}. {clue}\n" for i, clue in enumerate(self.clues)) + self.rng.shuffle(clues) + s += "".join(clues) return s @@ -196,7 +200,7 @@ if __name__ == "__main__": literals: List[Literal] = [el for group in enum_classes for el in group] # set up the puzzle with constraints and clues - puzzle = Puzzle(element_types=[Color, Nationality, Drink, Cigar, Animal]) + puzzle = Puzzle(rng=Random(), element_types=[Color, Nationality, Drink, Cigar, Animal]) puzzle = ( puzzle.set_constraints() @@ -246,7 +250,7 @@ if __name__ == "__main__": literals = [el for group in enum_classes for el in group] # set up the puzzle with constraints and clues - puzzle = Puzzle(element_types=[Mother, Children, Flower, Food]) + puzzle = Puzzle(rng=Random(), element_types=[Mother, Children, Flower, Food]) puzzle = ( puzzle.set_constraints() diff --git a/reasoning_gym/logic/zebra_puzzles.py b/reasoning_gym/logic/zebra_puzzles.py index 3ba177a7..9992f25e 100644 --- a/reasoning_gym/logic/zebra_puzzles.py +++ b/reasoning_gym/logic/zebra_puzzles.py @@ -1,6 +1,6 @@ from dataclasses import dataclass -from random import Random, seed -from typing import Dict, List, Optional, Tuple +from random import Random +from typing import Dict, Optional from ..factory import ProceduralDataset, register_dataset from .contrib.logic_puzzle.generate 
import generate_puzzle @@ -36,11 +36,11 @@ class ZebraDataset(ProceduralDataset): - answer: str, a solution string - metadata: dict with generation parameters """ - seed(self.seed + idx) + rng = Random(self.seed + idx) K = self.config.num_people M = self.config.num_characteristics - instance, puzzle = generate_puzzle(K, M, "train") + instance, puzzle = generate_puzzle(rng, K, M) q = instance["questions"][0]["question"] answer = instance["questions"][0]["answer"] question = str(puzzle) + "\n" + q
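
Review note on the seeding change: the patch replaces the module-level `seed(self.seed + idx)` call with a private `Random(self.seed + idx)` instance that is passed down through `generate_puzzle` and `Puzzle`. The sketch below illustrates why that should make each item reproducible and independent of other users of the global generator. `make_item` is a hypothetical stand-in for the dataset's item generation, not the actual `ZebraDataset` code.

```python
import random
from random import Random
from typing import List


def make_item(base_seed: int, idx: int) -> List[int]:
    """Hypothetical stand-in for generating one dataset item."""
    # Private generator keyed on seed + idx, mirroring the patched pattern.
    rng = Random(base_seed + idx)
    houses = list(range(1, 5))
    rng.shuffle(houses)  # uses only the private generator's state
    return houses


first = make_item(42, 0)

# Disturb the module-level generator, as unrelated code might.
random.seed(0)
random.random()

# The item is unchanged because it never touches global state.
again = make_item(42, 0)
assert first == again
```

With the earlier global `seed(...)` approach, any call into `random` between seeding and use could shift the sequence, and seeding itself would reset state for every other caller in the process.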