Subqueries Deep Dive
Subqueries (also known as nested queries or inner queries) are powerful SQL constructs that allow you to use the result of one query inside another. In this lesson, we'll explore the different types of subqueries and when to use them effectively.
What are Subqueries?
A subquery is a SELECT statement nested inside another SQL statement. Subqueries can be used in SELECT, INSERT, UPDATE, DELETE, and FROM clauses.
Key Concept: Subqueries are executed first, and their results are passed to the outer query. Think of them as temporary tables that exist only for the duration of the query.
Types of Subqueries
MySQL supports several types of subqueries, each serving different purposes:
1. Scalar Subqueries: Return a single value (one row, one column)
2. Column Subqueries: Return a single column with multiple rows
3. Row Subqueries: Return a single row with multiple columns
4. Table Subqueries: Return a complete result set (multiple rows and columns)
5. Correlated Subqueries: Reference columns from the outer query
1. Scalar Subqueries
Scalar subqueries return exactly one value and can be used wherever a single value is expected:
-- Find products with price above average
SELECT product_name, price
FROM products
WHERE price > (
SELECT AVG(price)
FROM products
);
-- Display product with its price difference from average
SELECT
product_name,
price,
price - (SELECT AVG(price) FROM products) AS price_diff
FROM products;
2. Column Subqueries (IN, ANY, ALL)
Column subqueries return multiple values in a single column and are typically used with IN, ANY, or ALL operators:
-- Find customers who placed orders in 2023
SELECT customer_name
FROM customers
WHERE customer_id IN (
SELECT DISTINCT customer_id
FROM orders
WHERE YEAR(order_date) = 2023
);
-- Find products more expensive than ALL products in category "Books"
SELECT product_name, price
FROM products
WHERE price > ALL (
SELECT price
FROM products
WHERE category = 'Books'
);
Tip: Use IN when checking membership in a set. Use ANY/SOME when comparing with at least one value. Use ALL when comparing with every value in the set.
3. Row Subqueries
Row subqueries return a single row with multiple columns:
-- Find product matching specific price and stock combination
SELECT product_name
FROM products
WHERE (price, stock_quantity) = (
SELECT MIN(price), MAX(stock_quantity)
FROM products
WHERE category = 'Electronics'
);
4. Table Subqueries (FROM Clause)
Table subqueries return multiple rows and columns, often used in the FROM clause as derived tables:
-- Calculate average order value by customer
SELECT
customer_name,
avg_order_value
FROM customers c
JOIN (
SELECT
customer_id,
AVG(total_amount) AS avg_order_value
FROM orders
GROUP BY customer_id
) AS order_stats ON c.customer_id = order_stats.customer_id
WHERE avg_order_value > 100;
-- Find top 3 products in each category
SELECT *
FROM (
SELECT
product_name,
category,
price,
ROW_NUMBER() OVER (PARTITION BY category ORDER BY price DESC) AS rank_in_category
FROM products
) AS ranked_products
WHERE rank_in_category <= 3;
5. Correlated Subqueries
Correlated subqueries reference columns from the outer query and are executed once for each row in the outer query:
-- Find products that are the most expensive in their category
SELECT p1.product_name, p1.category, p1.price
FROM products p1
WHERE p1.price = (
SELECT MAX(p2.price)
FROM products p2
WHERE p2.category = p1.category
);
-- Find customers who have placed more than their average order value
SELECT
c.customer_name,
o.order_id,
o.total_amount
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE o.total_amount > (
SELECT AVG(total_amount)
FROM orders o2
WHERE o2.customer_id = c.customer_id
);
Performance Warning: Correlated subqueries can be slow because they execute once per row in the outer query. Consider using JOINs or window functions as alternatives when possible.
EXISTS and NOT EXISTS
EXISTS tests whether a subquery returns any rows. It's often more efficient than IN for large datasets:
-- Find customers who have placed at least one order
SELECT customer_name
FROM customers c
WHERE EXISTS (
SELECT 1
FROM orders o
WHERE o.customer_id = c.customer_id
);
-- Find products that have never been ordered
SELECT product_name
FROM products p
WHERE NOT EXISTS (
SELECT 1
FROM order_items oi
WHERE oi.product_id = p.product_id
);
Optimization Tip: SELECT 1 in EXISTS clauses is a convention - MySQL only checks if rows exist, not what's selected. This makes EXISTS very efficient.
Subqueries in Different Clauses
-- Subquery in SELECT (scalar)
SELECT
product_name,
(SELECT COUNT(*) FROM order_items WHERE product_id = p.product_id) AS times_ordered
FROM products p;
-- Subquery in WHERE
SELECT product_name
FROM products
WHERE category_id IN (SELECT category_id FROM categories WHERE is_active = 1);
-- Subquery in HAVING
SELECT category, COUNT(*) AS product_count
FROM products
GROUP BY category
HAVING COUNT(*) > (SELECT AVG(cnt) FROM (SELECT COUNT(*) AS cnt FROM products GROUP BY category) AS counts);
Performance Considerations
Best Practices:
- Use EXISTS instead of IN for correlated checks with large datasets
- Consider JOINs as alternatives - they're often faster
- Avoid subqueries in SELECT clause when processing many rows
- Use EXPLAIN to analyze subquery performance
- Create appropriate indexes on columns used in subqueries
Real-World Example: E-Commerce Analytics
-- Find products performing better than category average
SELECT
p.product_name,
p.category,
p.sales_count,
(SELECT AVG(sales_count)
FROM products p2
WHERE p2.category = p.category) AS category_avg,
p.sales_count - (SELECT AVG(sales_count)
FROM products p2
WHERE p2.category = p.category) AS diff_from_avg
FROM products p
WHERE p.sales_count > (
SELECT AVG(sales_count)
FROM products p2
WHERE p2.category = p.category
)
ORDER BY category, diff_from_avg DESC;
Practice Exercise:
Challenge: Write queries for the following scenarios:
- Find all customers who have spent more than the average total spending
- List products that have never been ordered using EXISTS
- Find the second highest salary in each department
Solution Hints:
-- 1. Customer spending above average
SELECT customer_name, total_spent
FROM (
SELECT c.customer_name, SUM(o.total_amount) AS total_spent
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.customer_name
) AS customer_spending
WHERE total_spent > (SELECT AVG(total_spent) FROM (...));
-- 2. Products never ordered
SELECT product_name FROM products p
WHERE NOT EXISTS (SELECT 1 FROM order_items WHERE product_id = p.product_id);
-- 3. Second highest salary
SELECT department, MAX(salary) AS second_highest
FROM employees e1
WHERE salary < (SELECT MAX(salary) FROM employees e2 WHERE e2.department = e1.department)
GROUP BY department;
Summary
In this lesson, you learned:
- Five types of subqueries: scalar, column, row, table, and correlated
- How to use IN, ANY, ALL, EXISTS, and NOT EXISTS with subqueries
- Correlated subqueries reference outer query columns
- Performance considerations and optimization techniques
- When to use subqueries vs JOINs
Next Up: In the next lesson, we'll master advanced JOIN techniques including self joins, multiple table joins, and join optimization strategies!