MySQL & Database Design

Subqueries Deep Dive

13 min Lesson 7 of 40

Subqueries Deep Dive

Subqueries (also known as nested queries or inner queries) are powerful SQL constructs that allow you to use the result of one query inside another. In this lesson, we'll explore the different types of subqueries and when to use them effectively.

What are Subqueries?

A subquery is a SELECT statement nested inside another SQL statement. Subqueries can be used in SELECT, INSERT, UPDATE, DELETE, and FROM clauses.

Key Concept: Subqueries are executed first, and their results are passed to the outer query. Think of them as temporary tables that exist only for the duration of the query.

Types of Subqueries

MySQL supports several types of subqueries, each serving different purposes:

Scalar Subqueries: Return a single value (one row, one column)
Column Subqueries: Return a single column with multiple rows
Row Subqueries: Return a single row with multiple columns
Table Subqueries: Return a complete result set (multiple rows and columns)
Correlated Subqueries: Reference columns from the outer query

1. Scalar Subqueries

Scalar subqueries return exactly one value and can be used wherever a single value is expected:

-- Find products with price above average
SELECT product_name, price
FROM products
WHERE price > (
    SELECT AVG(price)
    FROM products
);

-- Display product with its price difference from average
SELECT
    product_name,
    price,
    price - (SELECT AVG(price) FROM products) AS price_diff
FROM products;

2. Column Subqueries (IN, ANY, ALL)

Column subqueries return multiple values in a single column and are typically used with IN, ANY, or ALL operators:

-- Find customers who placed orders in 2023
SELECT customer_name
FROM customers
WHERE customer_id IN (
    SELECT DISTINCT customer_id
    FROM orders
    WHERE YEAR(order_date) = 2023
);

-- Find products more expensive than ALL products in category "Books"
SELECT product_name, price
FROM products
WHERE price > ALL (
    SELECT price
    FROM products
    WHERE category = 'Books'
);

Tip: Use IN when checking membership in a set. Use ANY/SOME when comparing with at least one value. Use ALL when comparing with every value in the set.

3. Row Subqueries

Row subqueries return a single row with multiple columns:

-- Find product matching specific price and stock combination
SELECT product_name
FROM products
WHERE (price, stock_quantity) = (
    SELECT MIN(price), MAX(stock_quantity)
    FROM products
    WHERE category = 'Electronics'
);

4. Table Subqueries (FROM Clause)

Table subqueries return multiple rows and columns, often used in the FROM clause as derived tables:

-- Calculate average order value by customer
SELECT
    customer_name,
    avg_order_value
FROM customers c
JOIN (
    SELECT
        customer_id,
        AVG(total_amount) AS avg_order_value
    FROM orders
    GROUP BY customer_id
) AS order_stats ON c.customer_id = order_stats.customer_id
WHERE avg_order_value > 100;

-- Find top 3 products in each category
SELECT *
FROM (
    SELECT
        product_name,
        category,
        price,
        ROW_NUMBER() OVER (PARTITION BY category ORDER BY price DESC) AS rank_in_category
    FROM products
) AS ranked_products
WHERE rank_in_category <= 3;

5. Correlated Subqueries

Correlated subqueries reference columns from the outer query and are executed once for each row in the outer query:

-- Find products that are the most expensive in their category
SELECT p1.product_name, p1.category, p1.price
FROM products p1
WHERE p1.price = (
    SELECT MAX(p2.price)
    FROM products p2
    WHERE p2.category = p1.category
);

-- Find customers who have placed more than their average order value
SELECT
    c.customer_name,
    o.order_id,
    o.total_amount
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE o.total_amount > (
    SELECT AVG(total_amount)
    FROM orders o2
    WHERE o2.customer_id = c.customer_id
);

Performance Warning: Correlated subqueries can be slow because they execute once per row in the outer query. Consider using JOINs or window functions as alternatives when possible.

EXISTS and NOT EXISTS

EXISTS tests whether a subquery returns any rows. It's often more efficient than IN for large datasets:

-- Find customers who have placed at least one order
SELECT customer_name
FROM customers c
WHERE EXISTS (
    SELECT 1
    FROM orders o
    WHERE o.customer_id = c.customer_id
);

-- Find products that have never been ordered
SELECT product_name
FROM products p
WHERE NOT EXISTS (
    SELECT 1
    FROM order_items oi
    WHERE oi.product_id = p.product_id
);

Optimization Tip: SELECT 1 in EXISTS clauses is a convention - MySQL only checks if rows exist, not what's selected. This makes EXISTS very efficient.

Subqueries in Different Clauses

-- Subquery in SELECT (scalar)
SELECT
    product_name,
    (SELECT COUNT(*) FROM order_items WHERE product_id = p.product_id) AS times_ordered
FROM products p;

-- Subquery in WHERE
SELECT product_name
FROM products
WHERE category_id IN (SELECT category_id FROM categories WHERE is_active = 1);

-- Subquery in HAVING
SELECT category, COUNT(*) AS product_count
FROM products
GROUP BY category
HAVING COUNT(*) > (SELECT AVG(cnt) FROM (SELECT COUNT(*) AS cnt FROM products GROUP BY category) AS counts);

Performance Considerations

Best Practices:

Use EXISTS instead of IN for correlated checks with large datasets
Consider JOINs as alternatives - they're often faster
Avoid subqueries in SELECT clause when processing many rows
Use EXPLAIN to analyze subquery performance
Create appropriate indexes on columns used in subqueries

Real-World Example: E-Commerce Analytics

-- Find products performing better than category average
SELECT
    p.product_name,
    p.category,
    p.sales_count,
    (SELECT AVG(sales_count)
     FROM products p2
     WHERE p2.category = p.category) AS category_avg,
    p.sales_count - (SELECT AVG(sales_count)
                     FROM products p2
                     WHERE p2.category = p.category) AS diff_from_avg
FROM products p
WHERE p.sales_count > (
    SELECT AVG(sales_count)
    FROM products p2
    WHERE p2.category = p.category
)
ORDER BY category, diff_from_avg DESC;

Practice Exercise:

Challenge: Write queries for the following scenarios:

Find all customers who have spent more than the average total spending
List products that have never been ordered using EXISTS
Find the second highest salary in each department

Solution Hints:

-- 1. Customer spending above average
SELECT customer_name, total_spent
FROM (
    SELECT c.customer_name, SUM(o.total_amount) AS total_spent
    FROM customers c
    JOIN orders o ON c.customer_id = o.customer_id
    GROUP BY c.customer_id, c.customer_name
) AS customer_spending
WHERE total_spent > (SELECT AVG(total_spent) FROM (...));

-- 2. Products never ordered
SELECT product_name FROM products p
WHERE NOT EXISTS (SELECT 1 FROM order_items WHERE product_id = p.product_id);

-- 3. Second highest salary
SELECT department, MAX(salary) AS second_highest
FROM employees e1
WHERE salary < (SELECT MAX(salary) FROM employees e2 WHERE e2.department = e1.department)
GROUP BY department;

Summary

In this lesson, you learned:

Five types of subqueries: scalar, column, row, table, and correlated
How to use IN, ANY, ALL, EXISTS, and NOT EXISTS with subqueries
Correlated subqueries reference outer query columns
Performance considerations and optimization techniques
When to use subqueries vs JOINs

Next Up: In the next lesson, we'll master advanced JOIN techniques including self joins, multiple table joins, and join optimization strategies!