We are still cooking the magic in the way!
MySQL & Database Design
Common Table Expressions (CTEs)
Common Table Expressions (CTEs)
Common Table Expressions (CTEs) are temporary named result sets that exist only during query execution. They make complex queries more readable, maintainable, and can handle recursive data structures that would be difficult or impossible with traditional queries.
What are CTEs?
A CTE is defined using the WITH clause and can be referenced multiple times in the main query:
-- Basic CTE syntax
WITH cte_name AS (
SELECT column1, column2
FROM table_name
WHERE condition
)
SELECT *
FROM cte_name;
Key Benefits: CTEs improve readability by breaking complex queries into logical steps, allow reusing the same subquery multiple times, and enable recursive queries for hierarchical data.
Simple CTEs
Simple CTEs replace complex subqueries with named, reusable result sets:
-- Without CTE (hard to read)
SELECT c.customer_name, o.total_spent
FROM customers c
JOIN (
SELECT customer_id, SUM(total_amount) AS total_spent
FROM orders
WHERE order_date >= '2024-01-01'
GROUP BY customer_id
) o ON c.customer_id = o.customer_id
WHERE o.total_spent > 1000;
-- With CTE (much clearer)
WITH customer_spending AS (
SELECT customer_id, SUM(total_amount) AS total_spent
FROM orders
WHERE order_date >= '2024-01-01'
GROUP BY customer_id
)
SELECT c.customer_name, cs.total_spent
FROM customers c
JOIN customer_spending cs ON c.customer_id = cs.customer_id
WHERE cs.total_spent > 1000;
-- Calculating statistics with CTE
WITH product_stats AS (
SELECT
category,
AVG(price) AS avg_price,
COUNT(*) AS product_count,
MAX(price) AS max_price
FROM products
GROUP BY category
)
SELECT
p.product_name,
p.category,
p.price,
ps.avg_price,
ROUND((p.price - ps.avg_price) / ps.avg_price * 100, 2) AS pct_diff_from_avg
FROM products p
JOIN product_stats ps ON p.category = ps.category
WHERE p.price > ps.avg_price
ORDER BY pct_diff_from_avg DESC;
Multiple CTEs
You can define multiple CTEs in a single query, separated by commas:
-- Multiple CTEs for complex analysis
WITH
high_value_customers AS (
SELECT customer_id, SUM(total_amount) AS total_spent
FROM orders
GROUP BY customer_id
HAVING total_spent > 5000
),
recent_orders AS (
SELECT customer_id, COUNT(*) AS recent_order_count
FROM orders
WHERE order_date >= DATE_SUB(CURDATE(), INTERVAL 90 DAY)
GROUP BY customer_id
),
product_diversity AS (
SELECT o.customer_id, COUNT(DISTINCT oi.product_id) AS unique_products
FROM orders o
JOIN order_items oi ON o.order_id = oi.order_id
GROUP BY o.customer_id
)
SELECT
c.customer_name,
c.email,
hvc.total_spent,
COALESCE(ro.recent_order_count, 0) AS recent_orders,
COALESCE(pd.unique_products, 0) AS products_purchased
FROM customers c
JOIN high_value_customers hvc ON c.customer_id = hvc.customer_id
LEFT JOIN recent_orders ro ON c.customer_id = ro.customer_id
LEFT JOIN product_diversity pd ON c.customer_id = pd.customer_id
ORDER BY hvc.total_spent DESC;
Best Practice: Use descriptive CTE names that clearly indicate what data they contain. This makes queries self-documenting and easier to maintain.
CTEs Referencing Other CTEs
Later CTEs can reference earlier ones in the same WITH clause:
-- CTEs building on each other
WITH
monthly_sales AS (
SELECT
DATE_FORMAT(order_date, '%Y-%m') AS month,
SUM(total_amount) AS revenue
FROM orders
GROUP BY DATE_FORMAT(order_date, '%Y-%m')
),
sales_with_growth AS (
SELECT
month,
revenue,
LAG(revenue) OVER (ORDER BY month) AS prev_month_revenue
FROM monthly_sales
),
growth_analysis AS (
SELECT
month,
revenue,
prev_month_revenue,
revenue - prev_month_revenue AS growth_amount,
ROUND((revenue - prev_month_revenue) * 100.0 / prev_month_revenue, 2) AS growth_pct
FROM sales_with_growth
WHERE prev_month_revenue IS NOT NULL
)
SELECT *
FROM growth_analysis
WHERE ABS(growth_pct) > 10
ORDER BY month;
Recursive CTEs
Recursive CTEs can reference themselves, perfect for hierarchical or tree-structured data:
-- Basic recursive CTE syntax
WITH RECURSIVE cte_name AS (
-- Anchor member (base case)
SELECT columns
FROM table
WHERE base_condition
UNION ALL
-- Recursive member
SELECT columns
FROM table
JOIN cte_name ON join_condition
WHERE recursive_condition
)
SELECT * FROM cte_name;
-- Employee hierarchy (org chart)
WITH RECURSIVE employee_hierarchy AS (
-- Anchor: Start with CEO (no manager)
SELECT
employee_id,
employee_name,
manager_id,
position,
1 AS level,
CAST(employee_name AS CHAR(500)) AS hierarchy_path
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- Recursive: Add employees reporting to previous level
SELECT
e.employee_id,
e.employee_name,
e.manager_id,
e.position,
eh.level + 1,
CONCAT(eh.hierarchy_path, ' > ', e.employee_name)
FROM employees e
JOIN employee_hierarchy eh ON e.manager_id = eh.employee_id
WHERE eh.level < 10 -- Prevent infinite loops
)
SELECT
REPEAT(' ', level - 1) AS indent,
employee_name,
position,
level,
hierarchy_path
FROM employee_hierarchy
ORDER BY hierarchy_path;
Important: Always include a termination condition in recursive CTEs to prevent infinite loops. Use a level counter or limit depth with WHERE clause.
Real-World Recursive Examples
-- Category tree with all ancestors
WITH RECURSIVE category_tree AS (
-- Anchor: Leaf categories
SELECT
category_id,
category_name,
parent_category_id,
1 AS depth,
category_name AS full_path
FROM categories
WHERE category_id = 42 -- Start from specific category
UNION ALL
-- Recursive: Add parent categories
SELECT
c.category_id,
c.category_name,
c.parent_category_id,
ct.depth + 1,
CONCAT(c.category_name, ' > ', ct.full_path)
FROM categories c
JOIN category_tree ct ON c.category_id = ct.parent_category_id
)
SELECT * FROM category_tree
ORDER BY depth DESC;
-- Find all subordinates of a manager
WITH RECURSIVE subordinates AS (
-- Anchor: The manager
SELECT employee_id, employee_name, manager_id, 0 AS levels_down
FROM employees
WHERE employee_id = 5 -- Manager ID
UNION ALL
-- Recursive: Their reports
SELECT e.employee_id, e.employee_name, e.manager_id, s.levels_down + 1
FROM employees e
JOIN subordinates s ON e.manager_id = s.employee_id
WHERE s.levels_down < 5
)
SELECT
levels_down,
employee_name,
(SELECT COUNT(*) - 1 FROM subordinates) AS total_subordinates
FROM subordinates
ORDER BY levels_down, employee_name;
Generating Sequences with Recursive CTEs
-- Generate date range
WITH RECURSIVE date_range AS (
SELECT DATE('2024-01-01') AS date
UNION ALL
SELECT DATE_ADD(date, INTERVAL 1 DAY)
FROM date_range
WHERE date < '2024-12-31'
)
SELECT
dr.date,
COALESCE(SUM(o.total_amount), 0) AS daily_revenue
FROM date_range dr
LEFT JOIN orders o ON DATE(o.order_date) = dr.date
GROUP BY dr.date
ORDER BY dr.date;
-- Generate number sequence (1 to 100)
WITH RECURSIVE numbers AS (
SELECT 1 AS n
UNION ALL
SELECT n + 1
FROM numbers
WHERE n < 100
)
SELECT n FROM numbers;
CTEs vs Subqueries vs Temporary Tables
When to Use Each:
- CTEs: Improve readability, need to reference result multiple times, recursive queries
- Subqueries: Simple one-time use, small result sets
- Temporary Tables: Very large result sets, need indexes, multiple queries using same data
-- CTE vs Subquery performance
-- CTE (can be referenced multiple times)
WITH top_products AS (
SELECT product_id, product_name, sales_count
FROM products
WHERE sales_count > 100
)
SELECT * FROM top_products WHERE product_name LIKE 'A%'
UNION ALL
SELECT * FROM top_products WHERE product_name LIKE 'B%';
-- Subquery (computed twice - less efficient)
SELECT * FROM (SELECT ...) WHERE product_name LIKE 'A%'
UNION ALL
SELECT * FROM (SELECT ...) WHERE product_name LIKE 'B%';
Performance Implications
Performance Tips:
- CTEs are typically materialized once and reused (MySQL 8.0+)
- Add indexes to base tables, not CTEs (they're temporary)
- For very large result sets, consider temporary tables with indexes
- Recursive CTEs can be slow - always limit recursion depth
- Use EXPLAIN to understand how MySQL executes CTE queries
Complex Real-World Example
-- Customer cohort retention analysis
WITH RECURSIVE
months AS (
SELECT DATE_FORMAT(MIN(order_date), '%Y-%m-01') AS month
FROM orders
UNION ALL
SELECT DATE_ADD(month, INTERVAL 1 MONTH)
FROM months
WHERE month < DATE_FORMAT(CURDATE(), '%Y-%m-01')
),
first_purchases AS (
SELECT
customer_id,
DATE_FORMAT(MIN(order_date), '%Y-%m-01') AS cohort_month
FROM orders
GROUP BY customer_id
),
customer_months AS (
SELECT
fp.customer_id,
fp.cohort_month,
DATE_FORMAT(o.order_date, '%Y-%m-01') AS purchase_month,
TIMESTAMPDIFF(MONTH, fp.cohort_month, DATE_FORMAT(o.order_date, '%Y-%m-01')) AS months_since_first
FROM first_purchases fp
JOIN orders o ON fp.customer_id = o.customer_id
)
SELECT
cohort_month,
COUNT(DISTINCT customer_id) AS cohort_size,
SUM(CASE WHEN months_since_first = 0 THEN 1 ELSE 0 END) AS month_0,
SUM(CASE WHEN months_since_first = 1 THEN 1 ELSE 0 END) AS month_1,
SUM(CASE WHEN months_since_first = 2 THEN 1 ELSE 0 END) AS month_2,
SUM(CASE WHEN months_since_first = 3 THEN 1 ELSE 0 END) AS month_3,
ROUND(SUM(CASE WHEN months_since_first = 3 THEN 1 ELSE 0 END) * 100.0 /
COUNT(DISTINCT customer_id), 2) AS retention_3month_pct
FROM customer_months
GROUP BY cohort_month
ORDER BY cohort_month;
Practice Exercise:
Challenge: Write queries using CTEs:
- Find customers who purchased in 3 consecutive months (use CTE with window function)
- Create a recursive CTE to show product category hierarchy
- Generate a report showing monthly revenue with 3-month moving average (multiple CTEs)
Solution Hints:
-- 1. Consecutive purchases
WITH monthly_purchases AS (
SELECT DISTINCT
customer_id,
DATE_FORMAT(order_date, '%Y-%m-01') AS month
FROM orders
),
with_sequence AS (
SELECT
customer_id,
month,
ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY month) AS rn,
DATE_SUB(month, INTERVAL ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY month) MONTH) AS grp
FROM monthly_purchases
)
SELECT customer_id, MIN(month) AS start_month, COUNT(*) AS consecutive_months
FROM with_sequence
GROUP BY customer_id, grp
HAVING consecutive_months >= 3;
-- 2. Category hierarchy
WITH RECURSIVE category_path AS (
SELECT category_id, category_name, parent_id, 1 AS level,
category_name AS path
FROM categories WHERE parent_id IS NULL
UNION ALL
SELECT c.category_id, c.category_name, c.parent_id, cp.level + 1,
CONCAT(cp.path, ' > ', c.category_name)
FROM categories c
JOIN category_path cp ON c.parent_id = cp.category_id
)
SELECT * FROM category_path ORDER BY path;
Summary
In this lesson, you mastered:
- CTE syntax with WITH clause for named result sets
- Multiple CTEs and CTEs referencing each other
- Recursive CTEs for hierarchical data structures
- Practical applications: org charts, category trees, date ranges
- When to use CTEs vs subqueries vs temporary tables
- Performance considerations and best practices
Next Up: In the next lesson, we'll explore advanced string and date functions for powerful data manipulation and transformation!