LeetCode 30: Substring with Concatenation of All Words Solution in Python Explained

Problem Statement

Section link icon

LeetCode 30, Substring with Concatenation of All Words, is a hard-level problem where you’re given a string s and an array of words words, all of the same length. Your task is to find all starting indices of substrings in s that are a concatenation of each word in words exactly once, in any order. The result can be returned in any order.

Constraints

  • 1 <= s.length <= 10^4: String length is between 1 and 10,000.
  • 1 <= words.length <= 5000: Number of words is between 1 and 5000.
  • 1 <= words[i].length <= 30: Each word’s length is between 1 and 30.
  • s and words[i] consist of lowercase English letters.
  • All words in words have the same length.

Example

Input: s = "barfoothefoobarman", words = ["foo","bar"]
Output: [0,9]
Explanation: "barfoo" (index 0) and "foobar" (index 9) are concatenations of "foo" and "bar".

Input: s = "wordgoodgoodgoodbestword", words = ["word","good","best","word"]
Output: []
Explanation: No substring contains exactly one "word", "good", "best", "word".

Input: s = "catfoxcat", words = ["cat","fox"]
Output: [0,6]
Explanation: "catfox" (index 0) and "foxcat" (index 6) match.

Understanding the Problem

Section link icon

How do you solve LeetCode 30: Substring with Concatenation of All Words in Python? For s = "barfoothefoobarman" and words = ["foo","bar"], you need to find substrings like "barfoo" (index 0) and "foobar" (index 9) that use each word exactly once. The words are fixed-length, and order doesn’t matter. We’ll explore two approaches: a Sliding Window with HashMap Solution (optimal and primary) and an Alternative with Brute Force (intuitive but slower). The sliding window uses a hash map to track word counts efficiently.

Solution 1: Sliding Window with HashMap Approach (Primary)

Section link icon

Explanation

Use a sliding window of size len(words) * len(words[0]) (total length of all words concatenated). Slide through s, checking substrings by splitting them into word-sized chunks and comparing their counts against words using a HashMap. Since words can start at different offsets (e.g., 0, 1, 2 for word length 3), check each offset up to the word length.

Here’s a visual for s = "barfoothefoobarman", words = ["foo","bar"]:

Window at 0: "barfoo" -> {"bar":1, "foo":1} (match)
Window at 1: "arfoot" -> no match
...
Window at 9: "foobar" -> {"foo":1, "bar":1} (match)
  1. Handle Edge Case.
  • If s or words is empty, return empty list.
  1. Setup Parameters.
  • Word length, total window size, HashMap of words.
  1. Slide Window for Each Offset.
  • Check substrings starting at each position modulo word length.
  1. Track Word Counts.
  • Use HashMap to match substring words with words.
  1. Return Indices.
  • Collect all valid starting indices.

Step-by-Step Example

Example 1: s = "barfoothefoobarman", words = ["foo","bar"]

We need indices [0, 9].

  • Goal: Find substrings with "foo" and "bar" once each.
  • Result for Reference: [0, 9].
  • Start: s = "barfoothefoobarman", words = ["foo","bar"], word_len = 3, total_len = 6.
  • Step 1: Edge cases pass.
  • Step 2: word_count = {"foo":1, "bar":1}, n = 18, check offsets 0 to 2.
  • Step 3: Offset 0.
    • i = 0, "barfoo" -> {"bar":1, "foo":1}, matches, add 0.
    • i = 6, "thefoo" -> no match.
    • i = 12, "foobar" -> {"foo":1, "bar":1}, matches, add 9 (12-3).
  • Step 4: Offset 1, 2 (e.g., "arf", "rfo") – no full matches.
  • Finish: Return [0, 9].

How the Code Works (Sliding Window)

Here’s the Python code with line-by-line explanation:

from collections import Counter

def findSubstring(s: str, words: list[str]) -> list[int]:
    # Line 1: Handle empty inputs
    if not s or not words:
        return []

    # Line 2: Setup parameters
    word_len = len(words[0])          # Length of each word
    total_len = word_len * len(words) # Total length of substring
    word_count = Counter(words)       # HashMap of word frequencies
    result = []
    n = len(s)

    # Line 3: Check each offset up to word_len
    for offset in range(word_len):
        # Line 4: Slide window starting at offset
        left = offset
        seen = Counter()
        count = 0  # Number of valid words seen

        # Line 5: Iterate through possible windows
        for right in range(offset, n - word_len + 1, word_len):
            # Line 6: Get current word
            curr_word = s[right:right + word_len]
            # Line 7: If word in words
            if curr_word in word_count:
                seen[curr_word] += 1
                count += 1
                # Line 8: Shrink window if too many words
                while seen[curr_word] > word_count[curr_word]:
                    seen[s[left:left + word_len]] -= 1
                    left += word_len
                    count -= 1
                # Line 9: If exact match
                if count == len(words):
                    result.append(left)
                    seen[s[left:left + word_len]] -= 1
                    left += word_len
                    count -= 1
            else:
                # Line 10: Reset if word not in words
                seen.clear()
                count = 0
                left = right + word_len

    # Line 11: Return all valid indices
    return result

Alternative: Brute Force Approach

Section link icon

Explanation

Check every possible substring of length total_len and verify if it’s a permutation of words by splitting and counting. This is simpler but much slower.

  1. Setup Parameters.
  2. Check Each Substring.
  3. Validate with HashMap.
  4. Return Indices.

Step-by-Step Example (Alternative)

For s = "barfoothefoobarman", words = ["foo","bar"]:

  • Goal: Indices [0, 9].
  • Start: total_len = 6.
  • Step 1: i = 0, "barfoo" -> {"bar":1, "foo":1}, match, add 0.
  • Step 2: i = 1, "arfoot" -> no match.
  • Step 3: i = 9, "foobar" -> {"foo":1, "bar":1}, match, add 9.
  • Finish: Return [0, 9].

How the Code Works (Brute Force)

from collections import Counter

def findSubstringBrute(s: str, words: list[str]) -> list[int]:
    # Line 1: Handle empty inputs
    if not s or not words:
        return []

    # Line 2: Setup
    word_len = len(words[0])
    total_len = word_len * len(words)
    word_count = Counter(words)
    result = []

    # Line 3: Check every substring
    for i in range(len(s) - total_len + 1):
        curr_str = s[i:i + total_len]
        # Line 4: Split into words
        temp = [curr_str[j:j + word_len] for j in range(0, total_len, word_len)]
        # Line 5: Compare counts
        if Counter(temp) == word_count:
            result.append(i)

    # Line 6: Return result
    return result

Complexity

  • Sliding Window:
    • Time: O(n * k) – n is string length, k is word length (for substring checks).
    • Space: O(m) – m is number of unique words in words.
  • Brute Force:
    • Time: O(n * m * k) – n substrings, m words, k for comparison.
    • Space: O(m).

Efficiency Notes

Sliding Window is far more efficient (O(n * k) vs. O(n * m * k)), making it ideal for LeetCode 30. Brute force is too slow for large inputs but helps build intuition.

Key Insights

Tips to master LeetCode 30:

  • HashMap is Your Friend: Track word frequencies efficiently.
  • Window Size is Fixed: Total length = words count * word length.
  • Offsets Matter: Check each starting position up to word length.

Additional Example

Section link icon

For s = "catfoxcat", words = ["cat","fox"]:

  • Goal: Indices [0, 6].
  • Sliding Window: "catfox" (0), "foxcat" (6), return [0, 6].
  • Result: [0, 6].

Edge Cases

Section link icon
  • Empty Input: s = "", words = [] – Return [].
  • No Match: s = "abc", words = ["def"] – Return [].
  • Repeated Words: s = "wordword", words = ["word","word"] – Return [0].

Final Thoughts

Section link icon

The Sliding Window with HashMap solution shines for LeetCode 30 in Python—efficient and clever, perfect for handling complex string matching. For a related challenge, try LeetCode 29: Divide Two Integers to test your bit manipulation skills! Ready to tackle this beast? Solve LeetCode 30 on LeetCode now and experiment with tricky inputs like repeated words to master it. Jump in and conquer the challenge!