LeetCode 196: Delete Duplicate Emails Solution in SQL Explained

Deleting duplicate email entries from a table might feel like cleaning up a cluttered inbox, and LeetCode 196: Delete Duplicate Emails is an easy-level challenge that makes it approachable! Given a Person table with columns id and email, you need to write an SQL query to delete all duplicate email rows, keeping only the one with the smallest id for each email. The query modifies the table in-place. In this blog, we’ll solve it with SQL, exploring two solutions—Self Join with DELETE (our best solution) and Subquery with DELETE (a practical alternative). With step-by-step examples, detailed query breakdowns, and tips, you’ll master this problem. While this is an SQL challenge, you can explore Python basics at Python Basics for related coding skills. Let’s clean up those duplicates!

Problem Statement

In LeetCode 196, you’re given a Person table:

id (int, primary key)
email (varchar)

Your task is to write an SQL query to:

Identify rows with duplicate email values.
Delete all duplicates, keeping only the row with the smallest id for each email.
Modify the table in-place.

This differs from line extraction like LeetCode 195: Tenth Line, focusing on data deduplication rather than text filtering.

Constraints

Table has at least 1 row.
email is a string (case-sensitive).
id is unique.

Example

Let’s see a case:

Person table:
+----+------------------+
| id | email            |
+----+------------------+
| 1  | john@example.com |
| 2  | bob@example.com  |
| 3  | john@example.com |
+----+------------------+

After query:
+----+------------------+
| id | email            |
+----+------------------+
| 1  | john@example.com |
| 2  | bob@example.com  |
+----+------------------+
Explanation:
<ul>
<li>"john@example.com" appears at id=1 and id=3; keep id=1 (smallest), delete id=3.</li>
<li>"bob@example.com" appears once, unchanged.</li>
</ul>

Person table:
+----+------------------+
| id | email            |
+----+------------------+
| 1  | a@b.com          |
| 2  | a@b.com          |
| 3  | c@d.com          |
+----+------------------+

After query:
+----+------------------+
| id | email            |
+----+------------------+
| 1  | a@b.com          |
| 3  | c@d.com          |
+----+------------------+
Explanation:
<ul>
<li>"a@b.com" at id=1 and id=2; keep id=1, delete id=2.</li>
</ul>

These examples show we’re removing duplicate emails, keeping the lowest id.

Understanding the Problem

How do you solve LeetCode 196: Delete Duplicate Emails in SQL? We need to:

Identify rows where email appears more than once.
For each duplicate email, keep the row with the smallest id.
Delete all other rows with that email.
Modify the table directly.

For "id=1, john@example.com; id=2, bob@example.com; id=3, john@example.com", we keep id=1 for "john@example.com" (smallest id) and delete id=3. We need a query to compare rows and delete efficiently, not a file transposition task like LeetCode 194: Transpose File. We’ll use: 1. Self Join with DELETE: Clear, efficient—our best solution. 2. Subquery with DELETE: Alternative approach.

Let’s dive into the best solution.

Best Solution: Self Join with DELETE Approach

Explanation

Self Join with DELETE removes duplicates by:

Joining Person with itself (p1 and p2) on matching email.
Filtering where p1.id > p2.id (keep smaller id, delete larger).
Using DELETE to remove the p1 rows (higher ids).

This ensures O(n²) time without indexing (O(n log n) with email index), O(1) space beyond query execution, and clarity by directly comparing rows to identify duplicates for deletion.

For "id=1, john@example.com; id=3, john@example.com":

Join: (1, john@example.com) with (3, john@example.com).
1 > 3 is false, but 3 > 1 is true → delete id=3.

Step-by-Step Example

Example 1: Person = {1, "john@example.com"{, {2, "bob@example.com"{, {3, "john@example.com"{

Goal: Delete id=3 row.

Step 1: Self-join Person as p1 and p2.

p1.email = p2.email:

(1, john@example.com) joins (3, john@example.com).
(3, john@example.com) joins (1, john@example.com).
(2, bob@example.com) joins itself (no duplicate).

Step 2: Filter p1.id > p2.id.

(1, john) vs (3, john): 1 > 3 → false.
(3, john) vs (1, john): 3 > 1 → true → mark p1 (id=3) for deletion.
(2, bob) vs (2, bob): 2 > 2 → false.

Step 3: Delete marked rows.

Delete id=3 row.

Step 4: Resulting table:

{1, "john@example.com"{, {2, "bob@example.com"{.

Finish: Table updated in-place.

Example 2: Person = {1, "a@b.com"{, {2, "a@b.com"{, {3, "c@d.com"{

Goal: Delete id=2 row.

Step 1: Join:

(1, a@b.com) joins (2, a@b.com).
(2, a@b.com) joins (1, a@b.com).
(3, c@d.com) joins itself.

Step 2: Filter:

(1, a@b.com) vs (2, a@b.com): 1 > 2 → false.
(2, a@b.com) vs (1, a@b.com): 2 > 1 → true → delete id=2.

Step 3: Delete:

Remove id=2 row.

Finish: {1, "a@b.com"{, {3, "c@d.com"{.

How the Code Works (Self Join with DELETE) – Detailed Line-by-Line Explanation

Here’s the SQL query with a thorough breakdown:

-- Line 1: Delete from Person with alias p1
DELETE p1 
    -- Target rows in p1 to delete (e.g., id=3)

-- Line 2: Join Person with itself
FROM 
    Person p1
    -- First instance (e.g., all rows)
INNER JOIN 
    Person p2
    -- Second instance (e.g., all rows)

-- Line 3: Match on email
ON 
    p1.email = p2.email
    -- Same email (e.g., "john@example.com" in id=1 and id=3)

-- Line 4: Filter higher id
WHERE 
    p1.id > p2.id
    -- Keep smaller id, delete larger (e.g., 3 > 1 → delete id=3)

This detailed breakdown clarifies how the self join with DELETE efficiently removes duplicate emails.

Alternative: Subquery with DELETE Approach

Explanation

Subquery with DELETE removes duplicates by:

Using a subquery to find the minimum id for each email.
Deleting rows where id is not in the set of minimum ids.

It’s a practical alternative, O(n²) time without indexing (O(n log n) with email index), O(n) space for subquery results, but less intuitive due to nested logic, though equally effective.

For "id=1, john@example.com; id=3, john@example.com":

Subquery: Min id for "john@example.com" = 1.
Delete where id not in (1) → remove id=3.

Step-by-Step Example (Alternative)

For {1, "john@example.com"{, {2, "bob@example.com"{, {3, "john@example.com"{:

Step 1: Subquery:

SELECT MIN(id) FROM Person GROUP BY email:

"john@example.com" → 1.
"bob@example.com" → 2.

Step 2: Delete:

DELETE FROM Person WHERE id NOT IN (1, 2):

id=3 not in (1, 2) → delete id=3.

Step 3: Result:

{1, "john@example.com"{, {2, "bob@example.com"{.

Finish: Table updated.

How the Code Works (Subquery)

DELETE FROM Person 
WHERE id NOT IN (
    SELECT MIN(id) 
    FROM Person 
    GROUP BY email
)

Complexity

Self Join with DELETE:

Time: O(n²) – join without index (O(n log n) with index).
Space: O(1) – minimal overhead.

Subquery with DELETE:

Time: O(n²) – subquery per row (O(n log n) with index).
Space: O(n) – subquery result.

Efficiency Notes

Self Join with DELETE is the best solution with O(n²) time (optimized with indexing) and O(1) space, offering clarity and directness—Subquery with DELETE matches time complexity but uses O(n) space for the subquery result and is less intuitive, making it a viable but less preferred alternative.

Key Insights

Self Join: Compares rows.
Subquery: Filters by min id.
DELETE: In-place modification.

Additional Example

Person:
| 1 | x@y.com |
| 2 | x@y.com |
| 3 | z@w.com |
After:
| 1 | x@y.com |
| 3 | z@w.com |
Explanation: Delete id=2 (duplicate).

Edge Cases

No Duplicates: No deletions.
All Duplicates: Keep smallest id.
Single Row: Unchanged.

Both solutions handle these well.

Final Thoughts

LeetCode 196: Delete Duplicate Emails in SQL is a practical deduplication challenge. The Self Join with DELETE solution excels with its efficiency and clarity, while Subquery with DELETE offers an alternative approach. Want more SQL? Try LeetCode 182: Duplicate Emails for finding duplicates or explore Python Basics for coding skills. Ready to practice? Solve LeetCode 196 on LeetCode with the example table, aiming to remove duplicates—test your skills now!