LeetCode 30: Substring with Concatenation of All Words Solution in Python Explained
Problem Statement
LeetCode 30, Substring with Concatenation of All Words, is a hard-level problem where you’re given a string s
and an array of words words
, all of the same length. Your task is to find all starting indices of substrings in s
that are a concatenation of each word in words
exactly once, in any order. The result can be returned in any order.
Constraints
- 1 <= s.length <= 10^4: String length is between 1 and 10,000.
- 1 <= words.length <= 5000: Number of words is between 1 and 5000.
- 1 <= words[i].length <= 30: Each word’s length is between 1 and 30.
- s and words[i] consist of lowercase English letters.
- All words in words have the same length.
Example
Input: s = "barfoothefoobarman", words = ["foo","bar"]
Output: [0,9]
Explanation: "barfoo" (index 0) and "foobar" (index 9) are concatenations of "foo" and "bar".
Input: s = "wordgoodgoodgoodbestword", words = ["word","good","best","word"]
Output: []
Explanation: No substring contains exactly one "word", "good", "best", "word".
Input: s = "catfoxcat", words = ["cat","fox"]
Output: [0,6]
Explanation: "catfox" (index 0) and "foxcat" (index 6) match.
Understanding the Problem
How do you solve LeetCode 30: Substring with Concatenation of All Words in Python? For s = "barfoothefoobarman"
and words = ["foo","bar"]
, you need to find substrings like "barfoo" (index 0) and "foobar" (index 9) that use each word exactly once. The words are fixed-length, and order doesn’t matter. We’ll explore two approaches: a Sliding Window with HashMap Solution (optimal and primary) and an Alternative with Brute Force (intuitive but slower). The sliding window uses a hash map to track word counts efficiently.
Solution 1: Sliding Window with HashMap Approach (Primary)
Explanation
Use a sliding window of size len(words) * len(words[0])
(total length of all words concatenated). Slide through s
, checking substrings by splitting them into word-sized chunks and comparing their counts against words
using a HashMap. Since words can start at different offsets (e.g., 0, 1, 2 for word length 3), check each offset up to the word length.
Here’s a visual for s = "barfoothefoobarman", words = ["foo","bar"]
:
Window at 0: "barfoo" -> {"bar":1, "foo":1} (match)
Window at 1: "arfoot" -> no match
...
Window at 9: "foobar" -> {"foo":1, "bar":1} (match)
- Handle Edge Case.
- If s or words is empty, return empty list.
- Setup Parameters.
- Word length, total window size, HashMap of words.
- Slide Window for Each Offset.
- Check substrings starting at each position modulo word length.
- Track Word Counts.
- Use HashMap to match substring words with words.
- Return Indices.
- Collect all valid starting indices.
Step-by-Step Example
Example 1: s = "barfoothefoobarman", words = ["foo","bar"]
We need indices [0, 9].
- Goal: Find substrings with "foo" and "bar" once each.
- Result for Reference: [0, 9].
- Start: s = "barfoothefoobarman", words = ["foo","bar"], word_len = 3, total_len = 6.
- Step 1: Edge cases pass.
- Step 2: word_count = {"foo":1, "bar":1}, n = 18, check offsets 0 to 2.
- Step 3: Offset 0.
- i = 0, "barfoo" -> {"bar":1, "foo":1}, matches, add 0.
- i = 6, "thefoo" -> no match.
- i = 12, "foobar" -> {"foo":1, "bar":1}, matches, add 9 (12-3).
- Step 4: Offset 1, 2 (e.g., "arf", "rfo") – no full matches.
- Finish: Return [0, 9].
How the Code Works (Sliding Window)
Here’s the Python code with line-by-line explanation:
from collections import Counter
def findSubstring(s: str, words: list[str]) -> list[int]:
# Line 1: Handle empty inputs
if not s or not words:
return []
# Line 2: Setup parameters
word_len = len(words[0]) # Length of each word
total_len = word_len * len(words) # Total length of substring
word_count = Counter(words) # HashMap of word frequencies
result = []
n = len(s)
# Line 3: Check each offset up to word_len
for offset in range(word_len):
# Line 4: Slide window starting at offset
left = offset
seen = Counter()
count = 0 # Number of valid words seen
# Line 5: Iterate through possible windows
for right in range(offset, n - word_len + 1, word_len):
# Line 6: Get current word
curr_word = s[right:right + word_len]
# Line 7: If word in words
if curr_word in word_count:
seen[curr_word] += 1
count += 1
# Line 8: Shrink window if too many words
while seen[curr_word] > word_count[curr_word]:
seen[s[left:left + word_len]] -= 1
left += word_len
count -= 1
# Line 9: If exact match
if count == len(words):
result.append(left)
seen[s[left:left + word_len]] -= 1
left += word_len
count -= 1
else:
# Line 10: Reset if word not in words
seen.clear()
count = 0
left = right + word_len
# Line 11: Return all valid indices
return result
Alternative: Brute Force Approach
Explanation
Check every possible substring of length total_len
and verify if it’s a permutation of words
by splitting and counting. This is simpler but much slower.
- Setup Parameters.
- Check Each Substring.
- Validate with HashMap.
- Return Indices.
Step-by-Step Example (Alternative)
For s = "barfoothefoobarman", words = ["foo","bar"]
:
- Goal: Indices [0, 9].
- Start: total_len = 6.
- Step 1: i = 0, "barfoo" -> {"bar":1, "foo":1}, match, add 0.
- Step 2: i = 1, "arfoot" -> no match.
- Step 3: i = 9, "foobar" -> {"foo":1, "bar":1}, match, add 9.
- Finish: Return [0, 9].
How the Code Works (Brute Force)
from collections import Counter
def findSubstringBrute(s: str, words: list[str]) -> list[int]:
# Line 1: Handle empty inputs
if not s or not words:
return []
# Line 2: Setup
word_len = len(words[0])
total_len = word_len * len(words)
word_count = Counter(words)
result = []
# Line 3: Check every substring
for i in range(len(s) - total_len + 1):
curr_str = s[i:i + total_len]
# Line 4: Split into words
temp = [curr_str[j:j + word_len] for j in range(0, total_len, word_len)]
# Line 5: Compare counts
if Counter(temp) == word_count:
result.append(i)
# Line 6: Return result
return result
Complexity
- Sliding Window:
- Time: O(n * k) – n is string length, k is word length (for substring checks).
- Space: O(m) – m is number of unique words in words.
- Brute Force:
- Time: O(n * m * k) – n substrings, m words, k for comparison.
- Space: O(m).
Efficiency Notes
Sliding Window is far more efficient (O(n * k) vs. O(n * m * k)), making it ideal for LeetCode 30. Brute force is too slow for large inputs but helps build intuition.
Key Insights
Tips to master LeetCode 30:
- HashMap is Your Friend: Track word frequencies efficiently.
- Window Size is Fixed: Total length = words count * word length.
- Offsets Matter: Check each starting position up to word length.
Additional Example
For s = "catfoxcat", words = ["cat","fox"]
:
- Goal: Indices [0, 6].
- Sliding Window: "catfox" (0), "foxcat" (6), return [0, 6].
- Result: [0, 6].
Edge Cases
- Empty Input: s = "", words = [] – Return [].
- No Match: s = "abc", words = ["def"] – Return [].
- Repeated Words: s = "wordword", words = ["word","word"] – Return [0].
Final Thoughts
The Sliding Window with HashMap solution shines for LeetCode 30 in Python—efficient and clever, perfect for handling complex string matching. For a related challenge, try LeetCode 29: Divide Two Integers to test your bit manipulation skills! Ready to tackle this beast? Solve LeetCode 30 on LeetCode now and experiment with tricky inputs like repeated words to master it. Jump in and conquer the challenge!