LeetCode 433: Minimum Genetic Mutation Solution in Python – A Step-by-Step Guide

Imagine you’re a geneticist tasked with tweaking a DNA sequence—like changing "AACCGGTT" into "AACCGGTA"—but you can only swap one letter at a time (A, C, G, T), and each step must match a valid gene from a given bank, such as ["AACCGGTA"]. How many swaps do you need to reach the target? For this pair, it’s just one swap (T to A at the end). That’s the fascinating challenge of LeetCode 433: Minimum Genetic Mutation, a medium-level problem that’s all about finding the shortest path through a genetic maze. Using Python, we’ll tackle it two ways: the Best Solution, a BFS with queue that finds the minimum steps efficiently, and an Alternative Solution, a DFS with backtracking that explores all paths. With examples, code, and a friendly vibe, this guide will help you mutate that gene, whether you’re new to coding or leveling up your skills. Let’s tweak those bases and dive in!

What Is LeetCode 433: Minimum Genetic Mutation?

Section link icon

In LeetCode 433: Minimum Genetic Mutation, you’re given:

  • startGene: A string of 8 characters (A, C, G, T) representing the initial gene (e.g., "AACCGGTT").
  • endGene: A target gene string of the same length (e.g., "AACCGGTA").
  • bank: A list of valid gene strings (e.g., ["AACCGGTA"]).

Your task is to return the minimum number of mutations (single-character changes) needed to transform startGene into endGene, where each intermediate gene must be in bank. If impossible, return -1. For example, "AACCGGTT" to "AACCGGTA" with bank ["AACCGGTA"] takes 1 mutation, as "AACCGGTT" → "AACCGGTA" is valid.

Problem Statement

  • Input: startGene (str), endGene (str), bank (List[str]).
  • Output: An integer—minimum mutations or -1 if impossible.
  • Rules:
    • Each mutation changes one character (A, C, G, T).
    • Each intermediate gene must be in bank.
    • Start gene may or may not be in bank.

Constraints

  • startGene.length == 8.
  • endGene.length == 8.
  • 0 <= bank.length <= 10.
  • bank[i].length == 8.
  • All strings consist of 'A', 'C', 'G', 'T'.

Examples

  • Input: startGene = "AACCGGTT", endGene = "AACCGGTA", bank = ["AACCGGTA"]
    • Output: 1 (one mutation: T → A).
  • Input: startGene = "AACCGGTT", endGene = "AAACGGTA", bank = ["AACCGGTA","AACCGCTA","AAACGGTA"]
    • Output: 2 (e.g., "AACCGGTT" → "AACCGGTA" → "AAACGGTA").
  • Input: startGene = "AAAAACCC", endGene = "AACCCCCC", bank = ["AAAACCCC","AAACCCCC","AACCCCCC"]
    • Output: 3 (e.g., "AAAAACCC" → "AAAACCCC" → "AAACCCCC" → "AACCCCCC").

Understanding the Problem: Finding the Shortest Mutation Path

Section link icon

To solve LeetCode 433: Minimum Genetic Mutation in Python, we need to find the minimum number of single-character changes to transform startGene into endGene, ensuring each step is a valid gene from bank. A naive idea might be to try every possible mutation—but with 8 positions and 4 choices (A, C, G, T), that’s inefficient! We’ll use:

  • Best Solution (BFS with Queue): O(n * 8 * 4) time, O(n) space—finds shortest path efficiently.
  • Alternative Solution (DFS with Backtracking): O(4⁸) time worst case, O(h) space—explores all paths.

Let’s dive into the BFS solution with a clear, step-by-step explanation.

Best Solution: BFS with Queue

Section link icon

Why This Is the Best Solution

The BFS (Breadth-First Search) with queue method is the top pick because it’s efficient—O(n * 8 * 4) time (n = bank size, 8 positions, 4 bases), O(n) space—and guarantees the shortest path by exploring level-by-level, ensuring we find the minimum mutations first. It uses a queue to track genes at each step, a set to avoid revisiting, and checks valid mutations against the bank. It’s like mapping a genetic maze, finding the quickest route to the target!

How It Works

Think of the genes as nodes in a graph, connected by single mutations:

  • Step 1: Initialize:
    • Queue: Start with (startGene, 0) (gene, mutations).
    • Seen: Set to track visited genes (avoid cycles).
    • Bank: Convert to set for O(1) lookup.
  • Step 2: BFS Loop:
    • Dequeue gene and mutation count.
    • If gene is endGene, return count (shortest path found).
    • For each position (0-7):
      • Try replacing with A, C, G, T.
      • If new gene is in bank and unseen, enqueue with count + 1.
  • Step 3: Return:
    • If queue empties without finding endGene, return -1.
  • Step 4: Why It Works:
    • BFS ensures shortest path (level = mutations).
    • Queue explores all valid mutations systematically.
    • It’s like a genetic GPS finding the fastest mutation route!

Step-by-Step Example

Example: startGene = "AACCGGTT", endGene = "AACCGGTA", bank = ["AACCGGTA"]

  • Setup:
    • Queue: [("AACCGGTT", 0)].
    • Seen: {"AACCGGTT"}.
    • Bank: {"AACCGGTA"}.
  • Level 0:
    • Dequeue ("AACCGGTT", 0):
      • Pos 7: T → A: "AACCGGTA" in bank, not seen.
      • Enqueue ("AACCGGTA", 1), seen = {"AACCGGTT", "AACCGGTA"}.
      • Other mutations (e.g., T → C) not in bank, skip.
  • Level 1:
    • Dequeue ("AACCGGTA", 1):
      • Matches "AACCGGTA", return 1.
  • Result: 1.

Example: startGene = "AACCGGTT", endGene = "AAACGGTA", bank = ["AACCGGTA","AACCGCTA","AAACGGTA"]

  • Queue: [("AACCGGTT", 0)], seen = {"AACCGGTT"}.
  • Level 0:
    • "AACCGGTT":
      • Pos 7: T → A: "AACCGGTA", enqueue ("AACCGGTA", 1).
  • Level 1:
    • "AACCGGTA":
      • Pos 2: C → A: "AAACGGTA", enqueue ("AAACGGTA", 2).
  • Level 2:
    • "AAACGGTA": Matches, return 2.
  • Result: 2.

Code with Detailed Line-by-Line Explanation

Here’s the Python code, broken down so you can follow every step:

from collections import deque

class Solution:
    def minMutation(self, startGene: str, endGene: str, bank: List[str]) -> int:
        if endGene not in bank:
            return -1

        # Step 1: Initialize
        bank_set = set(bank)  # O(1) lookup
        queue = deque([(startGene, 0)])  # (gene, mutations)
        seen = {startGene}  # Visited genes
        bases = ['A', 'C', 'G', 'T']  # Possible mutations

        # Step 2: BFS
        while queue:
            gene, mutations = queue.popleft()
            if gene == endGene:
                return mutations

            # Try mutating each position
            for i in range(8):
                for base in bases:
                    if gene[i] != base:
                        new_gene = gene[:i] + base + gene[i+1:]
                        if new_gene in bank_set and new_gene not in seen:
                            queue.append((new_gene, mutations + 1))
                            seen.add(new_gene)

        # Step 3: If not found
        return -1
  • Line 1: Import deque for queue.
  • Line 5-7: Check if endGene is in bank, return -1 if not (impossible).
  • Line 10-13: Initialize:
    • bank_set: Set for O(1) lookup (e.g., {"AACCGGTA"}).
    • queue: Start with (startGene, 0) (e.g., ("AACCGGTT", 0)).
    • seen: Track visited genes (e.g., {"AACCGGTT"}).
    • bases: List of mutation options (A, C, G, T).
  • Line 16-27: BFS loop:
    • Line 17-18: Dequeue gene and count (e.g., ("AACCGGTT", 0)).
    • Line 19-20: If matches endGene, return mutations (e.g., 1).
    • Line 22-27: For each position and base:
      • Line 24: Skip if same (e.g., T → T).
      • Line 25: Create new gene (e.g., "AACCGGTA").
      • Line 26-27: If valid and unseen, enqueue with +1 mutation (e.g., ("AACCGGTA", 1)).
  • Line 29: Return -1 if queue empties without match.
  • Time Complexity: O(n * 8 * 4)—n bank checks, 8 positions, 4 bases.
  • Space Complexity: O(n)—queue and seen set.

This is like navigating a genetic maze with a BFS-guided compass!

Alternative Solution: DFS with Backtracking

Section link icon

Why an Alternative Approach?

The DFS (Depth-First Search) with backtracking method explores all possible mutation paths recursively, tracking the minimum steps to reach endGene. It’s O(4⁸) time in the worst case (exponential due to branching) and O(h) space (h = recursion depth)—less efficient but intuitive for path exploration. It’s like a genetic adventurer diving deep into every mutation possibility!

How It Works

  • Step 1: Start at startGene, track steps and visited genes.
  • Step 2: Recurse on each valid mutation:
    • Change one position to A, C, G, T.
    • If in bank and unseen, explore deeper.
    • Update min steps if endGene reached.
  • Step 3: Return min steps or -1.

Step-by-Step Example (Simplified)

Example: startGene = "AACCGGTT", endGene = "AACCGGTA", bank = ["AACCGGTA"]

  • DFS:
    • Start "AACCGGTT", steps = 0:
      • Pos 7: T → A: "AACCGGTA" (in bank), steps = 1, matches → min = 1.
    • Other paths (e.g., T → C) not in bank, prune.
  • Result: 1.

Code for DFS Approach

class Solution:
    def minMutation(self, startGene: str, endGene: str, bank: List[str]) -> int:
        if endGene not in bank:
            return -1

        bank_set = set(bank)
        bases = ['A', 'C', 'G', 'T']
        self.min_steps = float('inf')

        def dfs(gene, steps, seen):
            if gene == endGene:
                self.min_steps = min(self.min_steps, steps)
                return
            if steps >= self.min_steps:
                return

            for i in range(8):
                for base in bases:
                    if gene[i] != base:
                        new_gene = gene[:i] + base + gene[i+1:]
                        if new_gene in bank_set and new_gene not in seen:
                            seen.add(new_gene)
                            dfs(new_gene, steps + 1, seen)
                            seen.remove(new_gene)

        dfs(startGene, 0, {startGene})
        return self.min_steps if self.min_steps != float('inf') else -1
  • Time Complexity: O(4⁸)—exponential worst-case branching.
  • Space Complexity: O(h)—recursion stack.

It’s a deep-diving genetic explorer!

Comparing the Two Solutions

Section link icon
  • BFS with Queue (Best):
    • Pros: O(n * 8 * 4), O(n), guarantees shortest path, efficient.
    • Cons: Queue overhead.
  • DFS with Backtracking (Alternative):
    • Pros: O(4⁸), O(h), intuitive exploration.
    • Cons: Exponential time, slower.

BFS wins for efficiency and optimality.

Additional Examples and Edge Cases

Section link icon
  • No path: -1 (e.g., "AACCGGTT", "AACCGGAA", ["AACCGGTA"]).
  • Empty bank: -1 if endGene not reachable.
  • Direct match: 0 if startGene == endGene in bank.

BFS handles all.

Complexity Breakdown

Section link icon
  • BFS: Time O(n * 8 * 4), Space O(n).
  • DFS: Time O(4⁸), Space O(h).

BFS is superior.

Key Takeaways

Section link icon
  • BFS: Shortest path fast!
  • DFS: Explore it all!
  • Mutations: Steps matter.
  • Python Tip: Queues rock—see [Python Basics](/python/basics).

Final Thoughts: Mutate That Gene

Section link icon

LeetCode 433: Minimum Genetic Mutation in Python is a genetic pathfinding quest. BFS with queue zips to the minimum, while DFS with backtracking explores every twist. Want more graph fun? Try LeetCode 207: Course Schedule or LeetCode 752: Open the Lock. Ready to mutate? Head to Solve LeetCode 433 on LeetCode and tweak that gene today!