Merge pull request #31 from cavit99/main

feat: Add Word Ladder dataset generator
2026-04-19 12:58:07 +00:00 · 2025-01-30 23:11:58 +01:00 · 6117162bad
commit 6117162bad
parent 072f292661 893ef97c2e
6 changed files with 13541 additions and 0 deletions
--- a/README.md
+++ b/README.md
@@ -95,6 +95,7 @@ See the [Dataset Gallery](GALLERY.md) for a complete list of available datasets
 - `SentenceReorderingDataset`: Reorder sentence after words in it have been randomly shuffled
 - `SpellBackwardDataset`: Spell individual words backward (e.g. "sun" -> "nus")
 - `WordSequenceReversalDataset`: Reverse word order in text spans
+- `WordLadderDataset`: Generate word ladder puzzles where one word is transformed into another by changing one letter at a time

 ### <small>Code Tasks</small>

--- a/examples/generate_word_ladder_examples.py
+++ b/examples/generate_word_ladder_examples.py
@@ -0,0 +1,220 @@
+# Generates a dataset of word ladder examples, then generates simulated chain-of-thought reasoning for each example
+
+import reasoning_gym
+from openai import OpenAI
+import os
+
+# Configuration for the dataset
+config = {
+    'dataset_name': 'word_ladder',
+    'dataset_config': {
+        'min_word_length': 5,
+        'max_word_length': 5,
+        'min_chain_length': 3,  # set to -1 for shortest possible path, increase to generate more examples
+        'max_chain_length': 5,
+        'size': 1,  # Generate a small dataset for demonstration
+    }
+}
+
+system_prompt = """Word Ladder puzzles involve transforming a start word into an end word. 
+You are allowed to change only one letter at a time and you must keep the number of letters constant. 
+Each time you change a letter, the resulting word in the chain must be a valid English word. 
+Plurals are allowed, but not proper nouns. 
+Given a start and an end word, generate a detailed step-by-step chain of thought reasoning of the transformation process.
+You will be given the word ladder question, as well as the correct solution path. So you don't need to solve the problem, you have the solution.
+Your task is to provide a perfectly simulated chain of thought reasoning exactly in the style of the example below, 
+including considering multiple possibilities, validating words, and showing the final path. 
+Use casual, verbose thinking with markdown formatting.
+Example chain of thought reasoning: 
+{sample_CoT}
+"""
+
+# sample CoT for demonstration, extracted from real inference data from deepseek r1
+sample_CoT = """ 
+Word Ladder puzzles involve transforming a start word into an end word. 
+You are allowed to change only one letter at a time and you must keep the number of letters constant. 
+Each time you change a letter, the resulting word in the chain must be a valid English word. 
+Plurals are allowed, but not proper nouns. 
+Given a start and an end word, generate a detailed step-by-step chain of thought reasoning of the transformation process.
+You will be given the word ladder question, as well as the correct solution path. So you don't need to solve the problem, you have the solution already, which should be your 'cheat sheet' to refer to as you go along.
+Your task is to provide a perfectly simulated chain of thought reasoning exactly in the style of the example below, 
+including considering multiple possibilities, validating words, and showing the final path. 
+Use casual, verbose thinking with markdown formatting.
+VERY IMPORTANT ***NEVER MENTION YOU HAVE BEEN PROVIDED WITH THE ANSWER, YOU NEED TO WRITE ONLY AS IF YOU'RE SOLVING THE QUESTION STEP BY STEP ***
+
+Example chain of thought reasoning: 
+<think>
+Okay, so I need to figure out how to transform the word "CARK" into "ZIPS" by changing one letter at a time, making sure each intermediate step is a valid English word. Let me start by breaking down the problem.
+
+First, both words are four letters long, so each step must maintain four letters. Each change can only alter one letter, either adding, removing, or replacing, but since the length must stay the same, it has to be a replacement each time. So, I need to find a path from CARK to ZIPS where each intermediate word is valid.
+
+Let me write down the starting and target words:
+
+CARK
+
+...
+
+ZIPS
+
+I need to see which letters need to change. Let's compare the letters:
+
+C -> Z
+
+A -> I
+
+R -> P
+
+K -> S
+
+So all four letters need to change. But since each step can only change one letter, I need to find a sequence where each single-letter change results in a valid word. Maybe I can approach this by changing one letter at a time, checking possible words each step.
+
+Starting with CARK. Let's look at possible changes for each letter:
+
+First letter C: changing to other letters to form a word. Possible options could be BARK (C to B), DARK (C to D), LARK (C to L), MARK (C to M), PARK (C to P), etc. Wait, but CARK is a valid word? Let me confirm. CARK means to worry or burden, so yes, it's valid.
+
+So, possible first steps from CARK:
+
+Change C to B: BARK
+
+Change C to D: DARK
+
+Change C to L: LARK
+
+Change C to M: MARK
+
+Change C to P: PARK
+
+Change C to W: WARK (is that a word? Wark might be a Scottish term for work, but I'm not sure if it's accepted in standard English. Maybe not.)
+
+Other possibilities: changing the A, R, or K. Let's see:
+
+Second letter A: changing to other vowels or consonants. For example, CORK (changing A to O), but CORK is valid. Or CIRK (changing A to I; cirk is a variant of circus, but I think "cirk" isn't standard. Maybe CIRC? No, that's not right. Wait, maybe CIRK is a valid word in some contexts, like a circular trench in fortification? Hmm, not sure. Better to stick with more common words.
+
+Third letter R: changing to another letter. For example, CAKE (changing R to K), but that would require changing both the third letter and the fourth letter? Wait, no: CARK to CAKE would be changing R to E and K to E? Wait, no. Wait, CARK is C-A-R-K. To get CAKE, you'd need to change R to K and K to E, which is two changes. So not possible in one step. Maybe changing R to another letter. For example, CASH (changing R to S and K to H? No, that's two letters again. Wait, no. Let's see: CARK. Changing the third letter R to another letter, like CASK (changing R to S). But that would be changing the third letter from R to S, resulting in CASK. Yes, that's a valid word. So CARK -> CASK.
+
+Fourth letter K: changing to another letter. For example, CARD (changing K to D), CARP (changing K to P), CARS (changing K to S), CART (changing K to T). All of these are valid.
+
+So possible first steps from CARK:
+
+- BARK, DARK, LARK, MARK, PARK, CASK, CARD, CARP, CARS, CART.
+
+Now, which of these would lead towards ZIPS? Let's think about the target word. ZIPS has a Z, which is the first letter. So maybe we need to get to a word starting with Z eventually. But changing from C to Z in one step would be a big jump. Alternatively, maybe we can change other letters first to reach intermediate steps that can approach Z.
+
+Alternatively, maybe work on changing the last letters first. Let's see. For example, changing K to S (CARS) might be useful since the target ends with S. Let's try that.
+
+CARK -> CARS (changing K to S). Now, from CARS, we need to change the third letter R to P and the first letter C to Z, and the second A to I. Let's see.
+
+From CARS, possible changes:
+
+Change C to Z: ZARS (is that a word? ZAR is a currency, plural ZARS? Maybe, but not sure. Alternatively, ZAPS (changing R to P and S to something? Wait, no. Let's check each letter.
+
+Alternatively, from CARS, change A to I: CIRS (is that a word? CIR is a three-letter word, but CIRS? Maybe not. Or CURS (changing A to U), which is valid. But how does that help?
+
+Alternatively, from CARS, change R to P: CAPS. Yes, that's valid. So CARS -> CAPS (changing R to P). Then from CAPS, we can change C to Z: ZAPS. Yes, ZAPS is a word. Then ZAPS -> ZIPS by changing A to I. So that would be a path:
+
+CARK -> CARS -> CAPS -> ZAPS -> ZIPS.
+
+Wait, let's check each step:
+
+1. CARK to CARS: change K to S. Valid.
+
+2. CARS to CAPS: change R to P. Valid.
+
+3. CAPS to ZAPS: change C to Z. Valid.
+
+4. ZAPS to ZIPS: change A to I. Valid.
+
+That's four steps. But wait, starting from CARK (step 0), then steps 1-4 would be four changes. But the problem says "transform CARK into ZIPS by changing one letter at a time", so the number of steps would be the number of letters changed. Since all four letters are different, at minimum it would take four steps. But perhaps the path is longer if there are no shorter paths.
+
+Wait, let me check if this works:
+
+CARK → CARS → CAPS → ZAPS → ZIPS.
+
+Yes, that's four steps. Each step changes one letter, and all intermediate words are valid. Let me verify each word:
+
+- CARK: valid.
+
+- CARS: valid.
+
+- CAPS: valid.
+
+- ZAPS: valid.
+
+- ZIPS: valid.
+
+Yes, that seems to work. But let me check if there's a shorter path or if this is correct.
+
+Alternatively, is there a three-step path? Probably not since four letters need to change. Each step can only change one letter, so at least four steps are needed. So the path above is four steps, which is the minimum. Let me confirm each step again.
+
+Another possible path: changing the first letter earlier. For example:
+
+CARK → PARK (change C to P). Then PARK → PARKS (but that's five letters; can't do that. Alternatively, changing K to S: PARS. Then PARS -> PAWS (changing R to W?), but not helpful. Or PARK → PARS (change K to S), then PARS → PAIS (change R to I?), but PAIS is valid. Then PAIS → ZIPS (change P to Z and A to I? No, that's two letters. Hmm, not helpful.
+
+Alternatively, CARK → DARK → DANK → DINK → ZINK → ZIPS? Let's see. CARK to DARK (C→D). DARK to DANK (R→N). DANK to DINK (A→I). DINK to ZINK (D→Z). ZINK is a variant spelling of zinc, but ZINK is a valid word. Then ZINK to ZIPS (K→S and N→P? Wait, no. ZINK is Z-I-N-K. To get to ZIPS, we need Z-I-P-S. So need to change N to P and K to S. That's two steps. So that path would be longer.
+
+Another path: CARK → CARD → ZARD? ZARD isn't a word. CARD → WARD → WARD → ... not helpful.
+
+Alternatively, CARK → CARK to CIRK (A→I). CIRK is a variant of circus, but maybe not standard. Then CIRK → ZIRK (C→Z). ZIRK is a surname or a brand, not sure if it's a valid word. Then ZIRK → ZIPS (R→P and K→S). But again, two letters at once. Not helpful.
+
+So the first path I thought of seems viable: CARK → CARS → CAPS → ZAPS → ZIPS.
+
+Another check: From CARK, can we go to CASK? CARK → CASK (R→S). Then CASK → ZASK (C→Z). ZASK? Not a word. Alternatively, CASK → CASK → CASH (K→H). Then CASH → ZASH (C→Z). Not a word. Not helpful.
+
+Alternatively, CARK → CARP (K→P). CARP is valid. Then CARP → CARP → ZARP (C→Z). ZARP? Not a word. Or CARP → HARp (C→H), but HARP is valid. Then HARP → HIPS (A→I and R→P). But that's two changes. Not allowed.
+
+So the initial path seems best. Let me confirm each step again:
+
+1. CARK → CARS: change K to S. Valid.
+
+2. CARS → CAPS: change R to P. Valid.
+
+3. CAPS → ZAPS: change C to Z. Valid.
+
+4. ZAPS → ZIPS: change A to I. Valid.
+
+Yes, that's four steps, each changing one letter, all words are valid. So that should be the correct sequence.
+</think>
+"""
+
+client = OpenAI(
+  base_url="https://openrouter.ai/api/v1", # openrouter used in this example
+  api_key=os.environ["OPENROUTER_API_KEY"], # set your key in environment variable
+)
+
+
+def generate_cot(question: str, answer: str) -> str:
+    """Generate chain of thought reasoning for word ladder"""
+    prompt = f"""The question is {question}. The correct solution is {answer}. 
+    Provide the verbose chain of thought reasoning to transform the start word into the end word exactly in the style and length required."""
+    
+    completion = client.chat.completions.create(
+        model="microsoft/phi-4", # choose model
+        messages=[
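+            {
+                "role": "system",
+                # system_prompt is a plain string, so fill its {sample_CoT} placeholder explicitly
+                "content": system_prompt.format(sample_CoT=sample_CoT)
+            },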
+            {
+                "role": "user",
+                "content": prompt
+            }
+        ],
+        temperature=0.6,
+        max_tokens=10000
+    )
+    return completion.choices[0].message.content
+
+
+# Create the word ladder dataset
+dataset = reasoning_gym.create_dataset(config['dataset_name'], **config['dataset_config'])
+print(f"Generated {len(dataset)} examples, moving on to generate CoT reasoning...")
+
+# Generate and print examples with CoT
+for item in dataset:
+    # Generate CoT reasoning demo
+    item['reasoning'] = generate_cot(item['question'], item['answer'])
+    
+    print("\n--- Example ---")
+    print("Question:", item['question'])
+    print("Answer:", item['answer'])
+    print("\nChain of Thought:")
+    print(item['reasoning'])
+    print("\nMetadata:", item['metadata']) 
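A note on the `{sample_CoT}` placeholder in `system_prompt`: because `system_prompt` is a plain triple-quoted string rather than an f-string, the placeholder stays literal until `str.format` is called on it. The sketch below illustrates this with shortened stand-in strings; `template` and `sample` are hypothetical names and are not part of this PR.

```python
# Stand-in strings; the real prompt and sample are much longer.
template = """Explain the ladder step by step.
Example chain of thought reasoning:
{sample_CoT}
"""
sample = "<think>CARK -> CARS -> CAPS -> ZAPS -> ZIPS</think>"

# Without formatting, the model would receive the literal placeholder text.
unformatted = template
# str.format substitutes the named field; the substituted text is not re-parsed.
formatted = template.format(sample_CoT=sample)

print("{sample_CoT}" in unformatted)  # True
print(sample in formatted)            # True
```

Because the substituted value is inserted as-is, any braces inside the sample text would not need escaping; only braces in the template itself would.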
--- a/reasoning_gym/algorithmic/__init__.py
+++ b/reasoning_gym/algorithmic/__init__.py
@@ -16,6 +16,7 @@ from .sentence_reordering import SentenceReorderingConfig, SentenceReorderingDat
 from .spell_backward import SpellBackwardConfig, SpellBackwardDataset
 from .word_sequence_reversal import WordSequenceReversalConfig, WordSequenceReversalDataset
 from .word_sorting import TextTransformation, WordSortingConfig, WordSortingDataset
+from .word_ladder import WordLadderConfig, WordLadderDataset

 __all__ = [
    "SpellBackwardConfig",
@@ -39,4 +40,6 @@ __all__ = [
    "WordSortingConfig",
    "WordSortingDataset",
    "TextTransformation",
+    "WordLadderConfig",
+    "WordLadderDataset",
 ]
--- a/reasoning_gym/algorithmic/word_ladder.py
+++ b/reasoning_gym/algorithmic/word_ladder.py
@@ -0,0 +1,207 @@
+"""Word ladder task generator"""
+
+from dataclasses import dataclass
+from random import Random
+from typing import List, Optional, Set, Dict, Tuple
+from collections import deque
+from reasoning_gym.data import read_data_file
+
+from ..factory import ProceduralDataset, register_dataset
+
+@dataclass
+class WordLadderConfig:
+    """Configuration for word ladder task generation"""
+    
+    min_word_length: int = 3       # Minimum word length
+    max_word_length: int = 5       # Maximum word length
+    min_chain_length: int = -1     # -1 for shortest path, otherwise a minimum of 3
+    max_chain_length: int = -1     # -1 only if min_chain_length is -1, otherwise >= max(3, min_chain_length)
+    seed: Optional[int] = None
+    size: int = 500                # Virtual dataset size
+
+    def validate(self) -> None:
+        """Validate configuration parameters"""
+        assert self.min_word_length > 2, "min_word_length must be >= 3"
+        assert self.max_word_length >= self.min_word_length, "max_word_length must be >= min_word_length"
+        assert self.max_word_length <= 5, "max_word_length must be <= 5"
+
+        # Chain-length rules: -1 means "use the shortest path"
+        if self.min_chain_length == -1:
+            if self.max_chain_length != -1:
+                assert self.max_chain_length >= 3, "When min_chain_length=-1 (shortest path), max_chain_length must be -1 or >=3"
+        elif self.max_chain_length == -1:
+            raise AssertionError("max_chain_length cannot be -1 unless min_chain_length is also -1")
+        else:
+            assert self.min_chain_length >= 3, "min_chain_length must be 3 or -1"
+            assert self.max_chain_length >= self.min_chain_length, "max_chain_length must be >= min_chain_length"
+
+class WordLadderDataset(ProceduralDataset):
+    """Generates word ladder transformation tasks"""
+
+    def __init__(self, config: WordLadderConfig):
+        super().__init__(config=config, seed=config.seed, size=config.size)
+        
+        # Load words from CSV file
+        self.word_sets = self._load_words_from_csv()
+
+    def _load_words_from_csv(self) -> Dict[int, Set[str]]:
+        """Load words from CSV file organized by length"""
+        import csv
+        from io import StringIO
+        word_sets = {}
+        
+        try:
+            # Get CSV content as string
+            csv_content = read_data_file("words.csv")
+            
+            # Use StringIO to create a file-like object from the string
+            csv_file = StringIO(csv_content)
+            reader = csv.DictReader(csv_file)
+            
+            for row in reader:
+                # Process each word length column
+                for length in range(3, 6):
+                    col_name = f'{length}_letter'
+                    word = row.get(col_name, '')
+                    
+                    if not word:  # Skip empty entries
+                        continue
+                        
+                    if self.config.min_word_length <= length <= self.config.max_word_length:
+                        word_sets.setdefault(length, set()).add(word.upper())
+                        
+        except Exception as e:
+            raise RuntimeError(f"Error processing words.csv content: {e}") from e
+        
+        # Validate we have words for each length
+        for length in range(self.config.min_word_length, self.config.max_word_length + 1):
+            if length not in word_sets or not word_sets[length]:
+                raise ValueError(f"No valid words found for length {length}")
+                
+        return word_sets
+
+    def _differs_by_one(self, word1: str, word2: str) -> bool:
+        """Check if two words differ by exactly one letter"""
+        if len(word1) != len(word2):
+            return False
+        differences = 0
+        for c1, c2 in zip(word1, word2):
+            if c1 != c2:
+                differences += 1
+                if differences > 1:
+                    return False
+        return differences == 1
+
+    def _find_path(self, start: str, end: str, word_set: Set[str]) -> Optional[List[str]]:
+        """Find path between start and end words that meets length requirements"""
+        if start == end:
+            return [start]
+        
+        # First find shortest path length
+        shortest_path = self._bfs_shortest_path(start, end, word_set)
+        if not shortest_path:
+            return None
+            
+        min_length = self.config.min_chain_length
+        if min_length == -1 or len(shortest_path) >= min_length:
+            return shortest_path  # Shortest path already meets the minimum length
+            
+        # Now look for longer paths using DFS with depth constraint
+        return self._dfs_with_depth(start, end, word_set, min_length)
+
+    def _bfs_shortest_path(self, start: str, end: str, word_set: Set[str]) -> Optional[List[str]]:
+        """BFS implementation to find shortest path"""
+        queue = deque([(start, [start])])
+        visited = {start}
+        
+        while queue:
+            current, path = queue.popleft()
+            if current == end:
+                return path
+                
+            for neighbor in self._get_neighbors(current, word_set):
+                if neighbor not in visited:
+                    visited.add(neighbor)
+                    queue.append((neighbor, path + [neighbor]))
+        return None
+
+    def _dfs_with_depth(self, start: str, end: str, word_set: Set[str], target_length: int) -> Optional[List[str]]:
+        """DFS implementation looking for paths of exact length"""
+        # Seed from the endpoints so the same word pair always yields the same path
+        rng = Random(f"{start}:{end}")
+        stack = [(start, [start], {start})]
+
+        while stack:
+            current, path, visited = stack.pop()
+
+            if len(path) == target_length:
+                if current == end:
+                    return path
+                continue
+
+            if len(path) > target_length:
+                continue
+
+            # Explore neighbors in shuffled order; sort first so the shuffle is reproducible
+            neighbors = sorted(self._get_neighbors(current, word_set))
+            rng.shuffle(neighbors)
+            
+            for neighbor in neighbors:
+                if neighbor not in visited:
+                    new_visited = set(visited)
+                    new_visited.add(neighbor)
+                    stack.append((neighbor, path + [neighbor], new_visited))
+                    
+        return None
+
+    def _get_neighbors(self, word: str, word_set: Set[str]) -> Set[str]:
+        """Get all valid neighbors that differ by one letter"""
+        neighbors = set()
+        word_chars = list(word)
+        
+        for i in range(len(word_chars)):
+            original = word_chars[i]
+            for c in 'ABCDEFGHIJKLMNOPQRSTUVWXYZ':
+                if c == original:
+                    continue
+                word_chars[i] = c
+                new_word = ''.join(word_chars)
+                if new_word in word_set:
+                    neighbors.add(new_word)
+            word_chars[i] = original
+            
+        return neighbors
+
+    def _generate_word_pair(self, rng: Random, length: int) -> Tuple[str, str, List[str]]:
+        """Generate valid start/end words with solution path"""
+        word_set = self.word_sets[length]
+        words = sorted(word_set)  # Sort once so sampling is reproducible for a given seed
+        max_attempts = 500
+
+        for _ in range(max_attempts):
+            start, end = rng.sample(words, 2)
+            path = self._find_path(start, end, word_set)
+            if path and (
+                (self.config.min_chain_length == -1 and self.config.max_chain_length == -1) or
+                (self.config.min_chain_length <= len(path) <= self.config.max_chain_length)
+            ):
+                return start, end, path
+        
+        raise RuntimeError(f"Failed to find valid pair for length {length} after {max_attempts} attempts")
+
+    def __getitem__(self, idx: int) -> dict:
+        """Generate a single word ladder task"""
+        rng = Random(self.seed + idx)
+        length = rng.randint(self.config.min_word_length, self.config.max_word_length)
+        start, end, path = self._generate_word_pair(rng, length)
+        
+        return {
+            "question": f"Transform the word '{start}' into '{end}' by changing one letter at a time. Each step must create a valid English word (including plurals) and keep the same word length. Show the sequence of words needed.",
+            "answer": ",".join(path),
+            "metadata": {
+                "start_word": start,
+                "end_word": end,
+                "word_length": length,
+                "chain_length": len(path)
+            }
+        }
+
+
+register_dataset("word_ladder", WordLadderDataset, WordLadderConfig)
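The shortest-path search in `word_ladder.py` can be tried in isolation. The following is a minimal standalone sketch of the same breadth-first idea, assuming upper-case words and a tiny hand-made word set; `neighbors`, `shortest_ladder`, and `toy_words` are illustrative names rather than the PR's methods, and neighbors are sorted here only to make the visiting order reproducible.

```python
from collections import deque
from string import ascii_uppercase
from typing import List, Optional, Set


def neighbors(word: str, word_set: Set[str]) -> List[str]:
    """Return words in word_set that differ from word in exactly one position."""
    found = set()
    for i, original in enumerate(word):
        for c in ascii_uppercase:
            if c != original:
                candidate = word[:i] + c + word[i + 1:]
                if candidate in word_set:
                    found.add(candidate)
    return sorted(found)  # sorted for a reproducible visiting order


def shortest_ladder(start: str, end: str, word_set: Set[str]) -> Optional[List[str]]:
    """Breadth-first search; the first time `end` is dequeued, its path is shortest."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == end:
            return path
        for nxt in neighbors(path[-1], word_set):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None


toy_words = {"CAT", "BAT", "BAR", "CAR", "DOG"}
print(shortest_ladder("CAT", "BAR", toy_words))  # ['CAT', 'BAT', 'BAR']
print(shortest_ladder("CAT", "DOG", toy_words))  # None
```

Generating candidates by substituting each letter costs 25 lookups per position, which is usually cheaper than comparing the word against every entry in a large word set.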
--- a/reasoning_gym/data/words.csv
+++ b/reasoning_gym/data/words.csv
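The contents of `words.csv` are not shown in this diff. Judging from `_load_words_from_csv`, the loader expects a header row with columns named `3_letter`, `4_letter`, and `5_letter`, and skips empty cells. The sketch below shows that layout with made-up sample rows; `sample_csv` and `load_words` are hypothetical and only mirror the loading logic.

```python
import csv
from io import StringIO
from typing import Dict, Set

# Hypothetical sample rows; the real words.csv contents are not shown in this diff.
sample_csv = "3_letter,4_letter,5_letter\ncat,cark,zippy\nbat,cars,\n,zips,\n"


def load_words(csv_text: str, min_len: int = 3, max_len: int = 5) -> Dict[int, Set[str]]:
    """Group upper-cased words by length, reading one column per word length."""
    word_sets: Dict[int, Set[str]] = {}
    for row in csv.DictReader(StringIO(csv_text)):
        for length in range(min_len, max_len + 1):
            word = row.get(f"{length}_letter") or ""
            if word:  # empty cells pad the shorter columns
                word_sets.setdefault(length, set()).add(word.upper())
    return word_sets


words = load_words(sample_csv)
print(sorted(words[3]))  # ['BAT', 'CAT']
print(sorted(words[4]))  # ['CARK', 'CARS', 'ZIPS']
print(sorted(words[5]))  # ['ZIPPY']
```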
--- a/tests/test_word_ladder.py
+++ b/tests/test_word_ladder.py
@@ -0,0 +1,147 @@
+import pytest
+
+from reasoning_gym.algorithmic.word_ladder import WordLadderConfig, WordLadderDataset
+
+
+def test_word_ladder_config_validation():
+    """Test that invalid configs raise appropriate errors"""
+    # Test min_word_length validation
+    with pytest.raises(AssertionError):
+        config = WordLadderConfig(min_word_length=2)
+        config.validate()
+
+    # Test max_word_length validation
+    with pytest.raises(AssertionError):
+        config = WordLadderConfig(max_word_length=6)
+        config.validate()
+
+    # Test word length relationship
+    with pytest.raises(AssertionError):
+        config = WordLadderConfig(min_word_length=5, max_word_length=3)
+        config.validate()
+
+    # Test min_chain_length validation (max_chain_length is set so the minimum check is what fails)
+    with pytest.raises(AssertionError):
+        config = WordLadderConfig(min_chain_length=2, max_chain_length=5)
+        config.validate()
+
+    # Test chain length relationship
+    with pytest.raises(AssertionError):
+        config = WordLadderConfig(min_chain_length=5, max_chain_length=3)
+        config.validate()
+
+
+def test_word_ladder_dataset_deterministic():
+    """Test that dataset generates same items with same seed"""
+    config = WordLadderConfig(seed=42, size=10)
+    dataset1 = WordLadderDataset(config)
+    dataset2 = WordLadderDataset(config)
+
+    for i in range(len(dataset1)):
+        assert dataset1[i] == dataset2[i]
+
+
+def test_word_ladder_dataset_items():
+    """Test basic properties of generated items"""
+    config = WordLadderConfig(
+        min_word_length=3,
+        max_word_length=5,
+        min_chain_length=3,
+        max_chain_length=5,
+        size=10,
+        seed=42
+    )
+    dataset = WordLadderDataset(config)
+
+    for i in range(len(dataset)):
+        item = dataset[i]
+        # Check item structure
+        assert isinstance(item, dict)
+        assert "question" in item
+        assert "answer" in item
+        assert "metadata" in item
+
+        # Check metadata
+        metadata = item["metadata"]
+        assert "start_word" in metadata
+        assert "end_word" in metadata
+        assert "word_length" in metadata
+        assert "chain_length" in metadata
+
+        # Verify word length constraints
+        word_length = metadata["word_length"]
+        assert config.min_word_length <= word_length <= config.max_word_length
+        assert len(metadata["start_word"]) == word_length
+        assert len(metadata["end_word"]) == word_length
+
+        # Verify solution chain from answer
+        solution_chain = item["answer"].split(",")
+        
+        # Handle chain length validation based on whether shortest-path mode (-1) is configured
+        if config.min_chain_length == -1:
+            # For shortest path, just ensure it's a valid path (we can't predict exact length)
+            assert len(solution_chain) >= 2  # Must have at least start and end words
+        else:
+            # For specified length, ensure it matches config constraints
+            assert config.min_chain_length <= len(solution_chain) <= config.max_chain_length
+            assert len(solution_chain) == metadata["chain_length"]
+        
+        assert solution_chain[0] == metadata["start_word"]
+        assert solution_chain[-1] == metadata["end_word"]
+        assert all(len(word) == word_length for word in solution_chain)
+
+        # Verify each step differs by only one letter
+        for j in range(len(solution_chain) - 1):
+            differences = sum(1 for a, b in zip(solution_chain[j], solution_chain[j + 1]) if a != b)
+            assert differences == 1
+
+
+def test_word_ladder_differs_by_one():
+    """Test the _differs_by_one helper method"""
+    config = WordLadderConfig()
+    dataset = WordLadderDataset(config)
+
+    # Test words that differ by one letter
+    assert dataset._differs_by_one("CAT", "BAT")
+    assert dataset._differs_by_one("DOG", "LOG")
+    assert dataset._differs_by_one("WORD", "WARD")
+
+    # Test words that differ by more than one letter
+    assert not dataset._differs_by_one("CAT", "DOG")
+    assert not dataset._differs_by_one("WORD", "WAND")
+
+    # Test words of different lengths
+    assert not dataset._differs_by_one("CAT", "CATS")
+    assert not dataset._differs_by_one("DOG", "DO")
+
+    # Test identical words
+    assert not dataset._differs_by_one("CAT", "CAT")
+
+
+def test_word_ladder_find_path():
+    """Test the _find_path helper method"""
+    config = WordLadderConfig()
+    dataset = WordLadderDataset(config)
+
+    # Create a small test word set
+    word_set = {"CAT", "BAT", "BAR", "CAR"}
+
+    # Test finding valid paths
+    path1 = dataset._find_path("CAT", "BAR", word_set)
+    assert path1 is not None
+    assert path1[0] == "CAT"
+    assert path1[-1] == "BAR"
+    assert all(word in word_set for word in path1)
+
+    # Test when no path exists
+    word_set = {"CAT", "DOG"}
+    path2 = dataset._find_path("CAT", "DOG", word_set)
+    assert path2 is None
+
+    # Test path to same word
+    path3 = dataset._find_path("CAT", "CAT", word_set)
+    assert path3 == ["CAT"]
+
+
+if __name__ == "__main__":
+    pytest.main([__file__])
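The per-item checks in `test_word_ladder_dataset_items` amount to validating a comma-separated answer string. A small standalone checker along those lines might look like the sketch below; `is_valid_ladder` and `toy_words` are hypothetical names, and the word-set membership check is an extra assumption that the test above does not make.

```python
from typing import Set


def is_valid_ladder(answer: str, start: str, end: str, word_set: Set[str]) -> bool:
    """Check a comma-separated ladder: right endpoints, known words, one change per step."""
    chain = answer.split(",")
    if chain[0] != start or chain[-1] != end:
        return False
    if any(word not in word_set for word in chain):
        return False
    for prev, curr in zip(chain, chain[1:]):
        if len(prev) != len(curr):
            return False
        if sum(a != b for a, b in zip(prev, curr)) != 1:
            return False
    return True


toy_words = {"CARK", "CARS", "CAPS", "ZAPS", "ZIPS"}
print(is_valid_ladder("CARK,CARS,CAPS,ZAPS,ZIPS", "CARK", "ZIPS", toy_words))  # True
print(is_valid_ladder("CARK,CAPS,ZIPS", "CARK", "ZIPS", toy_words))            # False
```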