use explicit rng for zebra generation (not yet fully deterministic)
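
A note on the "not yet fully deterministic" caveat. The commit does not say what the remaining source of non-determinism is. One plausible cause, offered here as an assumption, is that several call sites still pass `list(clues)` built from a set to `rng.sample` / `rng.choices`, and set iteration order can differ between runs. A minimal sketch of the idea, using a hypothetical helper `sample_from_set` that is not part of this change:

```python
from random import Random


def sample_from_set(rng: Random, items: set, k: int) -> list:
    """Hypothetical helper: draw k items from a set reproducibly.

    Sorting gives the set a canonical order, so the seeded rng always
    sees the same sequence regardless of set iteration order.
    """
    return rng.sample(sorted(items), k)


# Plain strings stand in for clues here (an illustration only).
clues = {"bird", "cat", "fish", "horse"}
first = sample_from_set(Random(42), clues, 2)
second = sample_from_set(Random(42), set(reversed(sorted(clues))), 2)
assert first == second  # same seed + canonical order -> same draw
```

Real `Clue` objects would need a stable sort key (for example their string form) for this to apply; whether that is enough to make generation fully deterministic is not confirmed by this commit.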

2026-04-28 17:29:39 +00:00 · 2025-02-04 00:00:54 +01:00 · 2025-02-04 00:00:54 +01:00 · 04cd81dd76
commit 04cd81dd76
parent d0760926d0
6 changed files with 80 additions and 163 deletions
--- a/GALLERY.md
+++ b/GALLERY.md
@@ -18,7 +18,6 @@ This gallery shows examples from all available datasets using their default conf
 - [fraction_simplification](#fraction_simplification)
 - [game_of_life](#game_of_life)
 - [gcd](#gcd)
- - [gsm_symbolic](#gsm_symbolic)
 - [intermediate_integration](#intermediate_integration)
 - [lcm](#lcm)
 - [leg_counting](#leg_counting)
@@ -843,32 +842,6 @@ Metadata: {'numbers': [297, 30], 'result': 3}
 ````
-### gsm_symbolic
-Default configuration:
-```python
-seed = 42
-size = 500
-```
-Example tasks:
-````
-Example 1:
-Question: There are currently 16 orange balls, 12 yellow balls, and 44 blue balls in the shop. orange balls cost $13, yellow balls cost $10 and blue balls cost $6. How much will the shop have received after all the balls are sold?
-Answer: 592
-Metadata: {'difficulty': 1.0, 'answer_value': 592, 'answer_cot': 'For the orange balls, 16 balls * $13/ball = $208.\nFor the yellow balls, 12 balls * $10/ball = $120.\nFor the blue balls, 44 balls * $6/ball = $264.\nFor all balls, $208 + $120 + $264 = $592.\n#### 592', 'variables': {'store': 'shop', 'colors': ['orange', 'yellow', 'blue'], 'quantities': [16, 12, 44], 'prices': [13, 10, 6], 'currency': '$', 'subtotals': [208, 120, 264], 'total': 592}}
-Example 2:
-Question: A plumber works for 3 weeks every month and for 4 days every week. If he gets paid £150 every day, how much does he earn if he works for a year?
-Answer: 21600
-Metadata: {'difficulty': 1.0, 'answer_value': 21600, 'answer_cot': 'The plumber works for 4 days every week and works for 3 weeks every month so he works for 4 days/week * 3 weeks/month = 12 days/month\nIf he earns £150 every day he then earns £150/day * 12 days/month = £1800/month\nA year is equal to 12 months so every year he earns £1800/month * 12 months/year = £21600\n#### 21600', 'variables': {'occupation': 'plumber', 'weeks_per_month': 3, 'days_per_week': 4, 'pay_per_day': 150, 'currency': '£', 'days_per_month': 12, 'monthly_pay': 1800}}
-Example 3:
-Question: Ava sliced an mango into 33 pieces. She ate 5 slice, her cousin ate 7 more than her, and her brother ate 4 more than her cousin. How many slices of mango did they all eat?
-Answer: 33
-Metadata: {'difficulty': 1.0, 'answer_value': 33, 'answer_cot': 'Her cousin ate 5 + 7 = 12 slices.\nHer brother ate 12 + 4 = 16 slices.\nThey ate a total of 5 + 12 + 16 = 33 slices.\n#### 33', 'variables': {'name': 'Ava', 'fruit': 'mango', 'total_slices': 33, 'first_person_slices': 5, 'second_person_extra': 7, 'third_person_extra': 4, 'sibling1': 'cousin', 'sibling2': 'brother', 'total_eaten': 33}}
-````
 ### intermediate_integration
 Generates intermediate integration problem - either
    by substitution or by parts
@@ -2061,12 +2034,12 @@ Example tasks:
 ````
 Example 1:
 Question: Transform the word ladder 'HAND' to 'GLEE' by changing one letter at a time.
-Answer: HAND,RAND,REND,FEND,FEED,FLED,FLEE,GLEE
+Answer: HAND,RAND,REND,REED,FEED,FLED,FLEE,GLEE
 Metadata: {'start_word': 'HAND', 'end_word': 'GLEE', 'word_length': 4, 'chain_length': 8}
 Example 2:
 Question: Transform the word ladder 'JAZZ' to 'DORM' by changing one letter at a time.
-Answer: JAZZ,JIZZ,FIZZ,FUZZ,FUZE,FAZE,FARE,FARM,FORM,DORM
+Answer: JAZZ,JIZZ,FIZZ,FUZZ,FUZE,FAZE,FARE,FORE,FORM,DORM
 Metadata: {'start_word': 'JAZZ', 'end_word': 'DORM', 'word_length': 4, 'chain_length': 10}
 Example 3:
@@ -2157,21 +2130,21 @@ Example tasks:
 ````
 Example 1:
 Question: This is a logic puzzle. There are 4 houses (numbered 1 on the left, 4 on the right), from the perspective of someone standing across the street from them. Each has a different person in them. They have different characteristics:
- - Each person has a unique name: arnold, eric, alice, peter
+ - Each person has a unique name: alice, eric, arnold, peter
- - People use different phone models: samsung galaxy s21, iphone 13, google pixel 6, oneplus 9
+ - People use different phone models: samsung galaxy s21, oneplus 9, google pixel 6, iphone 13
- - Each person has a favorite drink: tea, water, milk, coffee
+ - Each person has a favorite drink: coffee, water, tea, milk
- - The people keep different animals: fish, cat, horse, bird
+ - The people keep different animals: horse, cat, fish, bird
-1. The one who only drinks water is Peter.
+1. Peter is the one who only drinks water.
-2. The cat lover is in the second house.
+9. The fish enthusiast is directly left of the person who keeps horses.
-3. The coffee drinker is the fish enthusiast.
+7. The bird keeper is Peter.
-4. The person who uses a OnePlus 9 is the tea drinker.
+6. Alice is in the fourth house.
-5. Peter is directly left of Arnold.
+2. The tea drinker is the person who uses a OnePlus 9.
-6. The person who keeps horses is in the fourth house.
+3. The person who uses an iPhone 13 is the fish enthusiast.
-7. The person who keeps horses is Alice.
+5. The person who uses an iPhone 13 is directly left of the person who uses a Google Pixel 6.
-8. Alice is the person who uses a Google Pixel 6.
+8. The coffee drinker is the person who uses an iPhone 13.
-9. The person who uses a Samsung Galaxy S21 is the one who only drinks water.
+4. The tea drinker and the person who uses an iPhone 13 are next to each other.
-10. Peter is in the first house.
+10. Eric and the person who uses a Google Pixel 6 are next to each other.
 What is Name of the person who lives in House 1?
 Answer: peter
@@ -2179,20 +2152,21 @@ Metadata: {'num_people': 4, 'num_characteristics': 4}
 Example 2:
 Question: This is a logic puzzle. There are 4 houses (numbered 1 on the left, 4 on the right), from the perspective of someone standing across the street from them. Each has a different person in them. They have different characteristics:
- - Each person has a unique name: alice, eric, arnold, peter
+ - Each person has a unique name: arnold, peter, eric, alice
- - Each mother is accompanied by their child: fred, bella, samantha, meredith
+ - Each mother is accompanied by their child: meredith, samantha, bella, fred
- - The people are of nationalities: norwegian, swede, brit, dane
+ - The people are of nationalities: dane, norwegian, brit, swede
- - Everyone has something different for lunch: spaghetti, grilled cheese, pizza, stew
+ - Everyone has something different for lunch: stew, spaghetti, pizza, grilled cheese
-1. The person who loves the stew is Eric.
+1. The Dane is in the second house.
-2. The person's child is named Fred is directly left of the person who loves the spaghetti eater.
+8. The Norwegian is the person who loves the spaghetti eater.
-3. The person's child is named Samantha is Peter.
+5. The person who is a pizza lover is the person's child is named Meredith.
-4. The person who is a pizza lover is the person's child is named Meredith.
+2. Peter is directly left of the person who loves eating grilled cheese.
-5. The person's child is named Meredith is directly left of Eric.
+3. The British person is Alice.
-6. The British person is the person's child is named Meredith.
+9. The Swedish person is in the fourth house.
 6. The person who is a pizza lover and Eric are next to each other.
 7. The person's child is named Samantha is in the third house.
-8. Arnold is the Swedish person.
+10. The person who is a pizza lover is in the first house.
-9. The person's child is named Samantha is the Norwegian.
+4. Eric is the person's child is named Fred.
 What is Name of the person who lives in House 1?
 Answer: alice
@@ -2200,21 +2174,21 @@ Metadata: {'num_people': 4, 'num_characteristics': 4}
 Example 3:
 Question: This is a logic puzzle. There are 4 houses (numbered 1 on the left, 4 on the right), from the perspective of someone standing across the street from them. Each has a different person in them. They have different characteristics:
- - Each person has a unique name: alice, peter, eric, arnold
+ - Each person has a unique name: arnold, eric, peter, alice
- - Everyone has a different favorite cigar: prince, dunhill, pall mall, blue master
+ - Everyone has a different favorite cigar: blue master, pall mall, dunhill, prince
- - Everyone has something different for lunch: stew, pizza, spaghetti, grilled cheese
+ - Everyone has something different for lunch: spaghetti, pizza, stew, grilled cheese
- - Each person has a favorite color: green, red, yellow, white
+ - Each person has a favorite color: yellow, white, red, green
-1. Eric is the person who loves white.
+7. The person whose favorite color is green is the person who loves the spaghetti eater.
-2. Alice and the Dunhill smoker are next to each other.
+5. The Dunhill smoker is the person who loves the stew.
 4. The person who loves yellow is the Dunhill smoker.
 3. The person who loves the stew is Arnold.
-4. The person whose favorite color is green is directly left of the person who loves the stew.
+1. The person whose favorite color is green is Alice.
-5. The person who smokes Blue Master is Alice.
+2. The person partial to Pall Mall is Peter.
-6. Alice is the person who loves the spaghetti eater.
+9. The person who smokes Blue Master is in the first house.
-7. The person partial to Pall Mall is directly left of Eric.
+10. Peter is directly left of the person who loves white.
-8. The Prince smoker is in the fourth house.
+8. The person who loves eating grilled cheese is the person whose favorite color is red.
-9. The person who loves yellow is in the second house.
+6. The person partial to Pall Mall is in the third house.
 10. Arnold and the person who loves eating grilled cheese are next to each other.
 What is Name of the person who lives in House 1?
 Answer: alice
--- a/reasoning_gym/arithmetic/__init__.py
+++ b/reasoning_gym/arithmetic/__init__.py
@@ -12,7 +12,8 @@ from .calendar_arithmetic import CalendarArithmeticConfig, CalendarArithmeticDat
 from .chain_sum import ChainSum, ChainSumConfig
 from .fraction_simplification import FractionSimplificationConfig, FractionSimplificationDataset
 from .gcd import GCDConfig, GCDDataset
-from .gsm_symbolic.gsm_symbolic_datasets import GSMSymbolicDataset, GSMSymbolicDatasetConfig
+
+# from .gsm_symbolic.gsm_symbolic_datasets import GSMSymbolicDataset, GSMSymbolicDatasetConfig
 from .lcm import LCMConfig, LCMDataset
 from .leg_counting import LegCountingConfig, LegCountingDataset
 from .prime_factorization import PrimeFactorizationConfig, PrimeFactorizationDataset
@@ -38,8 +39,8 @@ __all__ = [
    "LegCountingDataset",
    "PrimeFactorizationConfig",
    "PrimeFactorizationDataset",
-    "GSMSymbolicDatasetConfig",
+    # "GSMSymbolicDatasetConfig",
-    "GSMSymbolicDataset",
+    # "GSMSymbolicDataset",
    "TimeIntervalsConfig",
    "TimeIntervalsDataset",
 ]
--- a/reasoning_gym/logic/contrib/logic_puzzle/generate.py
+++ b/reasoning_gym/logic/contrib/logic_puzzle/generate.py
@@ -4,14 +4,10 @@ puzzle_generator.py
 This is a driver script that can be used to generate new zebra puzzles.
 """
 import json
 import pickle
 import sys
 # from tqdm import tqdm
 from itertools import product
-from random import choices, randint, sample, seed, shuffle
+from random import Random
-from typing import Dict, Iterable, List, Optional, Set, Tuple, Type
+from typing import Dict, Iterable, List, Set, Tuple, Type
 from tabulate import tabulate
@@ -72,7 +68,7 @@ def generate_consecutive_beside(puzzle: Puzzle, solution: Dict[Literal, int]) ->
        for pair in pairs:
            # consecutive is just a more informative version of beside, but they have same structure
            # because of this, don't include both
-            if randint(0, 1) == 0:
+            if puzzle.rng.randint(0, 1) == 0:
                clues.add(consecutive(pair[0], pair[1], puzzle.houses))
            else:
                clues.add(beside(pair[0], pair[1], puzzle.houses))
@@ -169,7 +165,7 @@ def try_to_remove(puzzle: Puzzle, clues: Set[Clue], n: int, must_have=set()) ->
        return weights.get(type(clue), 1)
    weights = [weight(clue) for clue in clues]
-    candidates: Set[Clue] = set(choices(list(clues), weights, k=n))
+    candidates: Set[Clue] = set(puzzle.rng.choices(list(clues), weights, k=n))
    candidates = candidates - must_have
    clues = clues.difference(candidates)
    if has_unique_solution(puzzle, clues):
@@ -191,7 +187,7 @@ def reduce_individually(
    and added to `removed`. If no clues can be removed, we return the original two sets.
    """
-    candidates = set(sample(list(clues), len(clues)))
+    candidates = set(puzzle.rng.sample(list(clues), len(clues)))
    for clue in candidates:
        if clue not in must_have:
            clues.remove(clue)
@@ -239,7 +235,7 @@ def reduce_clues(puzzle: Puzzle, clues: Set[Clue], must_have=set()) -> Tuple[Set
    """
    # this is a stupid way to shuffle the set of clues without modifying it
-    minimal_clues = set(sample(list(clues), k=len(clues)))
+    minimal_clues = set(puzzle.rng.sample(list(clues), k=len(clues)))
    while True:
        # print(f"There are {len(minimal_clues)} clues in ba sing se")
@@ -278,7 +274,7 @@ def reduce_clues(puzzle: Puzzle, clues: Set[Clue], must_have=set()) -> Tuple[Set
    return minimal_clues, removed_clues
-def question_generation(col_name, table_data):
+def question_generation(rng: Random, col_name, table_data):
    values_by_cols = {}
    for row in table_data:
        for idx, value in enumerate(row):
@@ -294,7 +290,7 @@ def question_generation(col_name, table_data):
                continue
            question = f"What is {col} of the person who lives in House {row[0]}?"
            options = values_by_cols[col][:]
-            shuffle(options)
+            rng.shuffle(options)
            truth = row[cid]
            assert truth in options
            questions_data.append(
@@ -306,18 +302,18 @@ def question_generation(col_name, table_data):
    return questions_data
-def generate_solution_dict(selected_elements: List[Literal], n: int) -> Dict[Literal, int]:
+def generate_solution_dict(rng: Random, selected_elements: List[Literal], n: int) -> Dict[Literal, int]:
    solution = {}
    house_ids = list(range(1, n + 1))
    for element in selected_elements:
-        shuffle(house_ids)
+        rng.shuffle(house_ids)
-        attributes: List[element] = list(element.__members__.values())
+        attributes: List[Literal] = list(element.__members__.values())
        for i in range(n):
            solution[attributes[i]] = house_ids[i]
    return solution
-def wrap_up_dict(random_elements, solution, puzzle, reduced, extra_clues, context, K, M):
+def wrap_up_dict(rng: Random, random_elements, solution, puzzle, reduced, extra_clues, context, K, M):
    col_names = [e.__name__ for e in random_elements]
    house_data = {}
    for item, house in solution.items():
@@ -337,7 +333,7 @@ def wrap_up_dict(random_elements, solution, puzzle, reduced, extra_clues, contex
    table = tabulate(table_data, headers=col_names, tablefmt="grid")
    ## Generate multiple-choice questions
-    q_data = question_generation(col_names, table_data)
+    q_data = question_generation(rng, col_names, table_data)
    all_in_one = {}
    all_in_one["size"] = f"{K}*{M}"
    all_in_one["puzzle_context"] = context
@@ -358,7 +354,7 @@ def check_correctness(p):
    return set(solution_set) == set(_first_solution)
-def generate_puzzle(K=2, M=3, mode="train"):
+def generate_puzzle(rng: Random, K=2, M=3):
    elements = [Color, Nationality, Animal, Drink, Cigar, Food, Flower, PhoneModel, Children, Smoothie]
    clue_types = [
        generate_found_at,
@@ -366,12 +362,12 @@ def generate_puzzle(K=2, M=3, mode="train"):
        generate_consecutive_beside,
    ]
-    shuffle(elements)
+    rng.shuffle(elements)
    random_elements = [Name] + elements[: M - 1]
-    solution = generate_solution_dict(random_elements, K)
+    solution = generate_solution_dict(rng, random_elements, K)
    # set up the puzzle with default constraints
-    puzzle = Puzzle(element_types=random_elements, elements=solution.keys(), n_houses=K).set_constraints()
+    puzzle = Puzzle(rng=rng, element_types=random_elements, elements=solution.keys(), n_houses=K).set_constraints()
    puzzle.solution = solution
    context = str(puzzle)
@@ -383,68 +379,11 @@ def generate_puzzle(K=2, M=3, mode="train"):
    reduced, _ = reduce_clues(puzzle, clues)
    extra_clues = clues - reduced
-    extra_clues = set(sample(list(extra_clues), min(len(extra_clues), 30)))
+    extra_clues = set(rng.sample(list(extra_clues), min(len(extra_clues), 30)))
    for clue in reduced:
        puzzle.add_clue(clue)
    assert has_unique_solution(puzzle, puzzle.clues, remove_after=False)
    assert check_correctness(puzzle)
-    all_in_one = wrap_up_dict(random_elements, solution, puzzle, reduced, extra_clues, context, K, M)
+    all_in_one = wrap_up_dict(rng, random_elements, solution, puzzle, reduced, extra_clues, context, K, M)
    return all_in_one, puzzle
-# def main():
-#     mode = sys.argv[1]
-#     print(f"mode={mode}")
-#     if mode.startswith("train"):
-#         seed(1337)
-#         N = 30
-#         if mode.endswith("_large"):
-#             N = 150
-#         if mode.endswith("_xl"):
-#             N = 1000
-#         Ks = [2,3,4]
-#         Ms = [2,3,4]
-#         if mode.endswith("_xxl"):
-#             N = 500
-#             Ks = [2,3,4,5,6]
-#             Ms = [2,3,4,5,6]
-#     elif mode == "dev" or mode.startswith("test_"):
-#         seed(42+len(mode))
-#         N = 10
-#         Ks = [2,3,4,5]
-#         Ms = [2,3,4,5]
-#         if mode.startswith("test_id_xl"):
-#             Ks = [2,3,4,5,6]
-#             Ms = [2,3,4,5,6]
-#         if mode.startswith("test_id_xxl"):
-#             Ks = [2,3,4,5,6,7]
-#             Ms = [2,3,4,5,6,7]
-#         if mode.endswith("_50"):
-#             N = 50
-#     instances = []
-#     puzzle_objs = []
-#     for K, M, idx in tqdm(list(product(Ks, Ms, list(range(N))))):
-#         if mode.startswith("test_id_xl"):
-#             if K != 6 and M != 6:
-#                 continue
-#         if mode.startswith("test_id_xxl"):
-#             if K != 7 and M != 7:
-#                 continue
-#         instance, puzzle = generate_puzzle(K, M, mode)
-#         instance["idx"] = f"lgp-{mode}-{K}x{M}-{idx}"
-#         instances.append(instance)
-#         puzzle_objs.append({"idx": instance["idx"], "puzzle": puzzle})
-#     with open(f"logic_grid_puzzles.{mode}.pkl", "wb") as f:
-#         pickle.dump(puzzle_objs, f)
-#     with open(f"logic_grid_puzzles.{mode}.json", "w") as f:
-#         json.dump(instances, f, indent=2)
-if __name__ == "__main__":
-    main()
--- a/reasoning_gym/logic/contrib/logic_puzzle/graph/reasoning_path.py
+++ b/reasoning_gym/logic/contrib/logic_puzzle/graph/reasoning_path.py
@@ -26,7 +26,6 @@ def logic_grid_puzzle(inputfile, ground_truth, size, lower_part, higher_part):
    reasoning_result = []
    answers = json.load(open(ground_truth, "r"))
    puzzles = pickle.load(open(inputfile, "rb"))
    cell_difficulty = {}
    mode = inputfile[inputfile.find("puzzles.") + 8 : inputfile.find(".pkl")]
    print("Number of puzzles", len(answers))
    assert len(answers) == len(puzzles)
--- a/reasoning_gym/logic/contrib/logic_puzzle/puzzle.py
+++ b/reasoning_gym/logic/contrib/logic_puzzle/puzzle.py
@@ -6,8 +6,8 @@ Solve the Einstein puzzle using Raymond Hettinger's approach.
 from __future__ import annotations
 from contextlib import contextmanager
-from random import shuffle
+from random import Random
-from typing import Dict, Generator, Iterable, List, Set, Tuple, Type
+from typing import Generator, Iterable, List, Set, Tuple, Type
 from reasoning_gym.logic.contrib.logic_puzzle.clues import (
    Clue,
@@ -58,6 +58,7 @@ class Puzzle:
    def __init__(
        self,
        *,
+        rng: Random,
        element_types: Iterable[Type[Literal]],
        elements: Iterable[Literal] = None,
        n_houses: int = 5,
@@ -73,6 +74,7 @@ class Puzzle:
        ones.
        """
+        self.rng = rng
        self.element_classes = list(element_types)
        if elements is None:
            self.literals = [el for el_class in self.element_classes for el in el_class]
@@ -145,15 +147,17 @@ class Puzzle:
        s += f"They have different characteristics:\n"
        for element_type in self.element_classes:
            literals = [l for l in self.literals if isinstance(l, element_type)]
-            shuffle(literals)
+            self.rng.shuffle(literals)
            desc = element_type.description()
            idx = desc.index(":") if ":" in desc else None
            desc = desc[:idx]
            s += f" - {desc}: " + ", ".join(e.name.replace("_", " ") for e in literals) + "\n"
        s += "\n"
-        for i, clue in enumerate(self.clues):
+
-            s += f"{i + 1}. {clue}\n"
+        clues = sorted(f"{i + 1}. {clue}\n" for i, clue in enumerate(self.clues))
+        self.rng.shuffle(clues)
+        s += "".join(clues)
        return s
@@ -196,7 +200,7 @@ if __name__ == "__main__":
    literals: List[Literal] = [el for group in enum_classes for el in group]
    # set up the puzzle with constraints and clues
-    puzzle = Puzzle(element_types=[Color, Nationality, Drink, Cigar, Animal])
+    puzzle = Puzzle(rng=Random(), element_types=[Color, Nationality, Drink, Cigar, Animal])
    puzzle = (
        puzzle.set_constraints()
@@ -246,7 +250,7 @@ if __name__ == "__main__":
    literals = [el for group in enum_classes for el in group]
    # set up the puzzle with constraints and clues
-    puzzle = Puzzle(element_types=[Mother, Children, Flower, Food])
+    puzzle = Puzzle(rng=Random(), element_types=[Mother, Children, Flower, Food])
    puzzle = (
        puzzle.set_constraints()
--- a/reasoning_gym/logic/zebra_puzzles.py
+++ b/reasoning_gym/logic/zebra_puzzles.py
@@ -1,6 +1,6 @@
 from dataclasses import dataclass
-from random import Random, seed
+from random import Random
-from typing import Dict, List, Optional, Tuple
+from typing import Dict, Optional
 from ..factory import ProceduralDataset, register_dataset
 from .contrib.logic_puzzle.generate import generate_puzzle
@@ -36,11 +36,11 @@ class ZebraDataset(ProceduralDataset):
                - answer: str, a solution string
                - metadata: dict with generation parameters
        """
-        seed(self.seed + idx)
+        rng = Random(self.seed + idx)
        K = self.config.num_people
        M = self.config.num_characteristics
-        instance, puzzle = generate_puzzle(K, M, "train")
+        instance, puzzle = generate_puzzle(rng, K, M)
        q = instance["questions"][0]["question"]
        answer = instance["questions"][0]["answer"]
        question = str(puzzle) + "\n" + q
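
The last hunk replaces the global `seed(self.seed + idx)` call with a local `Random(self.seed + idx)` that is passed down to `generate_puzzle`. A minimal sketch of why a per-item generator helps, using a hypothetical `TinyDataset` class as a stand-in (it is not the real `ZebraDataset` and generates no puzzles):

```python
from random import Random


class TinyDataset:
    """Hypothetical stand-in for a procedural dataset (not the real ZebraDataset)."""

    def __init__(self, seed: int, size: int):
        self.seed = seed
        self.size = size

    def __getitem__(self, idx: int) -> list:
        # A fresh generator per item: no global state is touched, so
        # items do not depend on the order in which they are requested.
        rng = Random(self.seed + idx)
        houses = [1, 2, 3, 4]
        rng.shuffle(houses)
        return houses


ds = TinyDataset(seed=42, size=3)
forward = [ds[i] for i in range(3)]
backward = [ds[i] for i in reversed(range(3))][::-1]
assert forward == backward  # access order does not change any item
```

Because the generator is local, other code that calls the module-level `random` functions between two lookups cannot change what an item looks like, which is not guaranteed when seeding the shared global generator.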