LeetCode 182: Duplicate Emails Solution in SQL Explained

Finding duplicate emails in a table might feel like spotting repeat offenders in a contact list, and LeetCode 182: Duplicate Emails is an easy-level challenge that makes it approachable! Given a Person table with columns id and email, you need to write an SQL query to report all duplicate email addresses, returning a result with a single column Email, ordered arbitrarily. In this blog, we’ll solve it with SQL, exploring two solutions—GROUP BY with HAVING (our best solution) and Self Join (a practical alternative). With step-by-step examples, detailed query breakdowns, and tips, you’ll master this problem. While this is an SQL challenge, you can explore Python basics at Python Basics for related coding skills. Let’s hunt down those duplicates!

Problem Statement

Section link icon

In LeetCode 182, you’re given a Person table:

  • id (int, primary key)
  • email (varchar)

Your task is to write an SQL query to find all email addresses that appear more than once in the table, returning a result with a single column Email, ordered arbitrarily. This differs from hierarchical comparison like LeetCode 181: Employees Earning More Than Their Managers, focusing on frequency detection within a single table.

Constraints

  • Table has at least 1 row.
  • email is a string (case-sensitive in SQL).

Example

Let’s see a case:

Person table:
+----+---------+
| id | email   |
+----+---------+
| 1  | a@b.com |
| 2  | c@d.com |
| 3  | a@b.com |
+----+---------+
Output:
+---------+
| Email   |
+---------+
| a@b.com |
+---------+
Explanation:
<ul>
<li>a@b.com appears twice (ids 1, 3).</li>
<li>c@d.com appears once (id 2).</li>
<li>Only a@b.com is duplicated.</li>
</ul>

This shows we’re identifying repeated emails.

Understanding the Problem

Section link icon

How do you solve LeetCode 182: Duplicate Emails in SQL? We need to:

  • Find emails that appear more than once in the Person table.
  • Return only those emails in a column named Email.
  • Handle any number of rows, ensuring duplicates are detected.

For [1:a@b.com, 2:c@d.com, 3:a@b.com], a@b.com appears twice, so it’s included; c@d.com appears once, so it’s not. We need an efficient query to count or compare occurrences, not a sequence task like LeetCode 180: Consecutive Numbers. We’ll use: 1. GROUP BY with HAVING: Simple, efficient—our best solution. 2. Self Join: Alternative approach.

Let’s dive into the best solution.

Best Solution: GROUP BY with HAVING Approach

Section link icon

Explanation

GROUP BY with HAVING finds duplicate emails by:

  • Grouping the Person table by email.
  • Using HAVING COUNT(*) > 1 to filter groups with more than one occurrence.
  • Selecting the email column as Email.

This ensures O(n) time with indexing on email, O(1) extra space beyond result, and clarity with a single aggregation, making it the most efficient and straightforward solution.

For the example:

  • Group by email: a@b.com (2), c@d.com (1).
  • Filter: a@b.com > 1, output "a@b.com".

Step-by-Step Example

Example: Person = [1:a@b.com, 2:c@d.com, 3:a@b.com]

Goal: Return table with Email = "a@b.com".

  • Step 1: Group by email.
    • a@b.com: ids 1, 3 → count = 2.
    • c@d.com: id 2 → count = 1.
  • Step 2: Apply HAVING COUNT(*) > 1.
    • a@b.com: 2 > 1 → include.
    • c@d.com: 1 ≤ 1 → exclude.
  • Step 3: Select email as Email.
    • Result: "a@b.com".
  • Finish: Return ["a@b.com"].

How the Code Works (GROUP BY with HAVING) – Detailed Line-by-Line Explanation

Here’s the SQL query with a thorough breakdown:

-- Line 1: Select email column
SELECT 
    email AS Email
    -- Email column renamed (e.g., "a@b.com")

-- Line 2: From Person table
FROM 
    Person
    -- Source table (e.g., all rows)

-- Line 3: Group by email
GROUP BY 
    email
    -- Aggregate by email (e.g., a@b.com:2, c@d.com:1)

-- Line 4: Filter for duplicates
HAVING 
    COUNT(*) > 1
    -- Keep groups with count > 1 (e.g., 2 > 1 for a@b.com)

This detailed breakdown clarifies how GROUP BY with HAVING efficiently identifies duplicate emails.

Alternative: Self Join Approach

Section link icon

Explanation

Self Join finds duplicate emails by:

  • Joining Person with itself (p1, p2).
  • Matching p1.email = p2.email where p1.id != p2.id (different rows).
  • Selecting distinct p1.email as Email.

It’s a practical alternative, O(n²) time without indexing (O(n) with indexing on email), O(n) space for join results, but more complex and less efficient than aggregation, though widely compatible.

For the example:

  • Join: (1,a@b.com) with (3,a@b.com) → "a@b.com".
  • Distinct: "a@b.com".

Step-by-Step Example (Alternative)

For the same example:

  • Step 1: Self-join Person.
    • (1,a@b.com)(3,a@b.com): same email, different ids → match.
    • (2,c@d.com) → no match (no duplicate).
  • Step 2: Select distinct emails.
    • "a@b.com".
  • Finish: Return ["a@b.com"].

How the Code Works (Self Join)

SELECT DISTINCT 
    p1.email AS Email
FROM 
    Person p1
JOIN 
    Person p2
ON 
    p1.email = p2.email
    AND p1.id != p2.id

Complexity

  • GROUP BY with HAVING:
    • Time: O(n) – aggregation with indexing.
    • Space: O(1) – minimal extra space.
  • Self Join:
    • Time: O(n²) – join without indexing (O(n) with indexing).
    • Space: O(n) – join result.

Efficiency Notes

GROUP BY with HAVING is the best solution with O(n) time and O(1) space, offering simplicity and efficiency with a single pass—Self Join uses O(n²) time without optimization (O(n) with indexing), requiring more space and complexity, making it a viable but less preferred alternative.

Key Insights

  • GROUP BY: Counts occurrences.
  • Self Join: Pairs duplicates.
  • DISTINCT: Avoids repeats.

Additional Example

Section link icon
Person:
| 1 | x@y.com |
| 2 | z@w.com |
| 3 | x@y.com |
| 4 | z@w.com |
Output:
| x@y.com |
| z@w.com |
Explanation: Both x@y.com and z@w.com appear twice.

Edge Cases

Section link icon
  • No Duplicates: [a@b.com, c@d.com][].
  • Single Row: [a@b.com][].
  • All Duplicates: [a@b.com, a@b.com, a@b.com][a@b.com].

Both solutions handle these well.

Final Thoughts

Section link icon

LeetCode 182: Duplicate Emails in SQL is a foundational frequency challenge. The GROUP BY with HAVING solution excels with its efficiency and clarity, while Self Join offers a traditional approach. Want more SQL? Try LeetCode 181: Employees Earning More Than Their Managers for joins or explore Python Basics for coding skills. Ready to practice? Solve LeetCode 182 on LeetCode with [a@b.com, c@d.com, a@b.com], aiming for "a@b.com"—test your skills now!