LeetCode 192: Word Frequency Solution in Shell Explained

Counting word frequencies in a text file might feel like tallying votes in a linguistic election, and LeetCode 192: Word Frequency is a medium-level challenge that makes it approachable! Given a text file words.txt, you need to write a bash command to output each word and its frequency, sorted by frequency in descending order, with ties broken by ascending alphabetical order, in the format word frequency. In this blog, we’ll solve it with shell commands, exploring two solutions—awk Command (our best solution) and grep with sort (a practical alternative). With step-by-step examples, detailed command breakdowns, and tips, you’ll master this problem. While this is a shell challenge, you can explore Python basics at Python Basics for related coding skills. Let’s count those words!

Problem Statement

Section link icon

In LeetCode 192, you’re given a text file words.txt containing space-separated words. Your task is to write a one-line bash command to:

  • Read the file.
  • Count the frequency of each word.
  • Sort by frequency descending, then alphabetically ascending for ties.
  • Output each word followed by its frequency (e.g., "the 5").

This differs from bit manipulation like LeetCode 191: Number of 1 Bits, focusing on text processing with shell tools rather than integer operations.

Constraints

  • File exists and is readable (words.txt).
  • Words are space-separated, no punctuation or special formatting.
  • Output format: word frequency, one per line.

Example

Let’s see a case:

words.txt:
the day is sunny the sunny is day

Output:
the 2
day 2
is 2
sunny 2
Explanation:
<ul>
<li>"the": 2 times.</li>
<li>"day": 2 times.</li>
<li>"is": 2 times.</li>
<li>"sunny": 2 times.</li>
<li>Sorted by frequency (all 2), then alphabetically.</li>
</ul>
words.txt:
cat bat cat bat dog

Output:
bat 2
cat 2
dog 1
Explanation:
<ul>
<li>"bat": 2, "cat": 2, "dog": 1.</li>
<li>Sorted: frequency descending (2, 1), then alpha (bat, cat).</li>
</ul>

These examples show we’re counting and sorting word frequencies.

Understanding the Problem

Section link icon

How do you solve LeetCode 192: Word Frequency in a bash command? We need to:

  • Read words.txt and split into words.
  • Count occurrences of each word.
  • Sort by frequency (descending) and word (ascending).
  • Output in word frequency format.

For "the day is sunny the sunny is day", all words appear twice, so we sort alphabetically: "day 2", "is 2", "sunny 2", "the 2". We need a concise command-line solution using tools like awk, grep, sort, and uniq, not a Python task like LeetCode 190: Reverse Bits. We’ll use: 1. awk Command: Efficient, concise—our best solution. 2. grep with sort: Alternative approach.

Let’s dive into the best solution.

Best Solution: awk Command Approach

Section link icon

Explanation

awk Command processes the file by:

  • Splitting input into words (cat words.txt).
  • Using awk to count frequencies in an associative array.
  • Sorting in the END block by frequency (descending) and word (ascending).
  • Printing each word and its count.

This ensures O(n) time for counting (n = words), O(m log m) for sorting (m = unique words), and O(m) space (m = unique words), making it efficient and compact in one line.

For "the day is sunny the sunny is day":

  • Count: the=2, day=2, is=2, sunny=2.
  • Sort: day, is, sunny, the (all 2).
  • Output: "day 2", "is 2", "sunny 2", "the 2".

Step-by-Step Example

Example 1: words.txt = "the day is sunny the sunny is day"

Goal: Return day 2, is 2, sunny 2, the 2.

  • Step 1: Read file.
    • cat words.txt: "the day is sunny the sunny is day".
  • Step 2: Process with awk.
    • Split into words: "the", "day", "is", "sunny", "the", "sunny", "is", "day".
    • Count: cnt["the"]=2, cnt["day"]=2, cnt["is"]=2, cnt["sunny"]=2.
  • Step 3: Sort and output in END block.
    • Array: {"the":2, "day":2, "is":2, "sunny":2{.
    • Sort by value (desc, all 2), then key (asc): "day", "is", "sunny", "the".
    • Print: "day 2", "is 2", "sunny 2", "the 2".
  • Finish: Output as required.

Example 2: words.txt = "cat bat cat bat dog"

Goal: Return bat 2, cat 2, dog 1.

  • Step 1: cat words.txt: "cat bat cat bat dog".
  • Step 2: awk counts:
    • cnt["cat"]=2, cnt["bat"]=2, cnt["dog"]=1.
  • Step 3: Sort:
    • Freq desc: 2 (bat, cat), 1 (dog).
    • Alpha asc: "bat", "cat".
    • Result: "bat 2", "cat 2", "dog 1".
  • Finish: Output as required.

How the Code Works (awk Command) – Detailed Line-by-Line Explanation

Here’s the bash command with a thorough breakdown (as a one-liner):

# Command: cat words.txt | awk '{for(i=1;i<=NF;i++)cnt[$i]++{ END{for(w in cnt)print w,cnt[w]{' | sort -k2nr -k1

# Line 1: Read file
cat words.txt |
    # Input file (e.g., "the day is sunny the sunny is day")

# Line 2: awk processes words
awk '{for(i=1;i<=NF;i++)cnt[$i]++{ END{for(w in cnt)print w,cnt[w]{' |
    # {for(i=1;i<=NF;i++)cnt[$i]++{: Count each word (e.g., cnt["the"]+=1)
    # END{for(w in cnt)print w,cnt[w]{: Print word and count (e.g., "the 2")

# Line 3: Sort output
sort -k2nr -k1
    # -k2nr: Sort by 2nd field (freq), numeric, reverse (desc)
    # -k1: Then by 1st field (word), ascending
    # e.g., "day 2", "is 2", "sunny 2", "the 2"

This detailed breakdown clarifies how the awk command efficiently counts and sorts word frequencies.

Alternative: grep with sort Approach

Section link icon

Explanation

grep with sort uses multiple tools:

  • tr to split words into lines.
  • sort and uniq -c to count frequencies.
  • sort again for final ordering.
  • awk to reformat output.

It’s a practical alternative, O(n log n) time (sorting dominates), O(n) space (temporary files), but less concise and involves more steps.

For "cat bat cat bat dog":

  • Split: "cat", "bat", "cat", "bat", "dog".
  • Count: "bat 2", "cat 2", "dog 1".
  • Sort: "bat 2", "cat 2", "dog 1".

Step-by-Step Example (Alternative)

For "cat bat cat bat dog":

  • Step 1: tr ' ' '\n': "cat\nbat\ncat\nbat\ndog".
  • Step 2: sort | uniq -c: " 1 dog", " 2 bat", " 2 cat".
  • Step 3: sort -k1nr: " 2 bat", " 2 cat", " 1 dog".
  • Step 4: awk '{print $2 " " $1{': "bat 2", "cat 2", "dog 1".
  • Finish: Output as required.

How the Code Works (grep with sort)

# Command: cat words.txt | tr ' ' '\n' | sort | uniq -c | sort -k1nr | awk '{print $2 " " $1{'

cat words.txt | tr ' ' '\n' | sort | uniq -c | sort -k1nr | awk '{print $2 " " $1{'

Complexity

  • awk Command:
    • Time: O(n + m log m) – counting + sorting (n=words, m=unique).
    • Space: O(m) – hash table.
  • grep with sort:
    • Time: O(n log n) – multiple sorts.
    • Space: O(n) – temporary lines.

Efficiency Notes

awk Command is the best solution with O(n + m log m) time and O(m) space, offering conciseness and efficiency in one tool—grep with sort uses O(n log n) time and O(n) space, requiring multiple steps, making it less elegant but still effective.

Key Insights

  • awk: All-in-one processing.
  • tr/sort: Pipeline approach.
  • Sorting: Dual criteria.

Additional Example

Section link icon
words.txt:
dog dog cat
Output:
dog 2
cat 1
Explanation: "dog" twice, "cat" once.

Edge Cases

Section link icon
  • Empty File: Empty output.
  • Single Word: Single line.
  • All Same: One line with count.

Both solutions handle these well.

Final Thoughts

Section link icon

LeetCode 192: Word Frequency in shell is a practical text-processing challenge. The awk Command solution excels with its efficiency and brevity, while grep with sort offers a multi-tool approach. Want more shell? Try LeetCode 193: Valid Phone Numbers for pattern matching or explore Python Basics for coding skills. Ready to practice? Solve LeetCode 192 on LeetCode with "the day is sunny the sunny is day", aiming for the sorted frequencies—test your skills now!