MySQL & Database Design

Subqueries Deep Dive

13 min Lesson 7 of 40

Subqueries Deep Dive

Subqueries (also known as nested queries or inner queries) are powerful SQL constructs that allow you to use the result of one query inside another. In this lesson, we'll explore the different types of subqueries and when to use them effectively.

What are Subqueries?

A subquery is a SELECT statement nested inside another SQL statement. Subqueries can be used in SELECT, INSERT, UPDATE, DELETE, and FROM clauses.

Key Concept: Subqueries are executed first, and their results are passed to the outer query. Think of them as temporary tables that exist only for the duration of the query.

Types of Subqueries

MySQL supports several types of subqueries, each serving different purposes:

1. Scalar Subqueries: Return a single value (one row, one column) 2. Column Subqueries: Return a single column with multiple rows 3. Row Subqueries: Return a single row with multiple columns 4. Table Subqueries: Return a complete result set (multiple rows and columns) 5. Correlated Subqueries: Reference columns from the outer query

1. Scalar Subqueries

Scalar subqueries return exactly one value and can be used wherever a single value is expected:

-- Find products with price above average SELECT product_name, price FROM products WHERE price > ( SELECT AVG(price) FROM products );
-- Display product with its price difference from average SELECT product_name, price, price - (SELECT AVG(price) FROM products) AS price_diff FROM products;

2. Column Subqueries (IN, ANY, ALL)

Column subqueries return multiple values in a single column and are typically used with IN, ANY, or ALL operators:

-- Find customers who placed orders in 2023 SELECT customer_name FROM customers WHERE customer_id IN ( SELECT DISTINCT customer_id FROM orders WHERE YEAR(order_date) = 2023 );
-- Find products more expensive than ALL products in category "Books" SELECT product_name, price FROM products WHERE price > ALL ( SELECT price FROM products WHERE category = 'Books' );
Tip: Use IN when checking membership in a set. Use ANY/SOME when comparing with at least one value. Use ALL when comparing with every value in the set.

3. Row Subqueries

Row subqueries return a single row with multiple columns:

-- Find product matching specific price and stock combination SELECT product_name FROM products WHERE (price, stock_quantity) = ( SELECT MIN(price), MAX(stock_quantity) FROM products WHERE category = 'Electronics' );

4. Table Subqueries (FROM Clause)

Table subqueries return multiple rows and columns, often used in the FROM clause as derived tables:

-- Calculate average order value by customer SELECT customer_name, avg_order_value FROM customers c JOIN ( SELECT customer_id, AVG(total_amount) AS avg_order_value FROM orders GROUP BY customer_id ) AS order_stats ON c.customer_id = order_stats.customer_id WHERE avg_order_value > 100;
-- Find top 3 products in each category SELECT * FROM ( SELECT product_name, category, price, ROW_NUMBER() OVER (PARTITION BY category ORDER BY price DESC) AS rank_in_category FROM products ) AS ranked_products WHERE rank_in_category <= 3;

5. Correlated Subqueries

Correlated subqueries reference columns from the outer query and are executed once for each row in the outer query:

-- Find products that are the most expensive in their category SELECT p1.product_name, p1.category, p1.price FROM products p1 WHERE p1.price = ( SELECT MAX(p2.price) FROM products p2 WHERE p2.category = p1.category );
-- Find customers who have placed more than their average order value SELECT c.customer_name, o.order_id, o.total_amount FROM customers c JOIN orders o ON c.customer_id = o.customer_id WHERE o.total_amount > ( SELECT AVG(total_amount) FROM orders o2 WHERE o2.customer_id = c.customer_id );
Performance Warning: Correlated subqueries can be slow because they execute once per row in the outer query. Consider using JOINs or window functions as alternatives when possible.

EXISTS and NOT EXISTS

EXISTS tests whether a subquery returns any rows. It's often more efficient than IN for large datasets:

-- Find customers who have placed at least one order SELECT customer_name FROM customers c WHERE EXISTS ( SELECT 1 FROM orders o WHERE o.customer_id = c.customer_id );
-- Find products that have never been ordered SELECT product_name FROM products p WHERE NOT EXISTS ( SELECT 1 FROM order_items oi WHERE oi.product_id = p.product_id );
Optimization Tip: SELECT 1 in EXISTS clauses is a convention - MySQL only checks if rows exist, not what's selected. This makes EXISTS very efficient.

Subqueries in Different Clauses

-- Subquery in SELECT (scalar) SELECT product_name, (SELECT COUNT(*) FROM order_items WHERE product_id = p.product_id) AS times_ordered FROM products p; -- Subquery in WHERE SELECT product_name FROM products WHERE category_id IN (SELECT category_id FROM categories WHERE is_active = 1); -- Subquery in HAVING SELECT category, COUNT(*) AS product_count FROM products GROUP BY category HAVING COUNT(*) > (SELECT AVG(cnt) FROM (SELECT COUNT(*) AS cnt FROM products GROUP BY category) AS counts);

Performance Considerations

Best Practices:
  • Use EXISTS instead of IN for correlated checks with large datasets
  • Consider JOINs as alternatives - they're often faster
  • Avoid subqueries in SELECT clause when processing many rows
  • Use EXPLAIN to analyze subquery performance
  • Create appropriate indexes on columns used in subqueries

Real-World Example: E-Commerce Analytics

-- Find products performing better than category average SELECT p.product_name, p.category, p.sales_count, (SELECT AVG(sales_count) FROM products p2 WHERE p2.category = p.category) AS category_avg, p.sales_count - (SELECT AVG(sales_count) FROM products p2 WHERE p2.category = p.category) AS diff_from_avg FROM products p WHERE p.sales_count > ( SELECT AVG(sales_count) FROM products p2 WHERE p2.category = p.category ) ORDER BY category, diff_from_avg DESC;

Practice Exercise:

Challenge: Write queries for the following scenarios:

  1. Find all customers who have spent more than the average total spending
  2. List products that have never been ordered using EXISTS
  3. Find the second highest salary in each department

Solution Hints:

-- 1. Customer spending above average SELECT customer_name, total_spent FROM ( SELECT c.customer_name, SUM(o.total_amount) AS total_spent FROM customers c JOIN orders o ON c.customer_id = o.customer_id GROUP BY c.customer_id, c.customer_name ) AS customer_spending WHERE total_spent > (SELECT AVG(total_spent) FROM (...)); -- 2. Products never ordered SELECT product_name FROM products p WHERE NOT EXISTS (SELECT 1 FROM order_items WHERE product_id = p.product_id); -- 3. Second highest salary SELECT department, MAX(salary) AS second_highest FROM employees e1 WHERE salary < (SELECT MAX(salary) FROM employees e2 WHERE e2.department = e1.department) GROUP BY department;

Summary

In this lesson, you learned:

  • Five types of subqueries: scalar, column, row, table, and correlated
  • How to use IN, ANY, ALL, EXISTS, and NOT EXISTS with subqueries
  • Correlated subqueries reference outer query columns
  • Performance considerations and optimization techniques
  • When to use subqueries vs JOINs
Next Up: In the next lesson, we'll master advanced JOIN techniques including self joins, multiple table joins, and join optimization strategies!

ES
Edrees Salih
14 hours ago

We are still cooking the magic in the way!