MySQL & Database Design

Common Table Expressions (CTEs)

13 min Lesson 11 of 40

Common Table Expressions (CTEs)

Common Table Expressions (CTEs) are temporary named result sets that exist only during query execution. They make complex queries more readable, maintainable, and can handle recursive data structures that would be difficult or impossible with traditional queries.

What are CTEs?

A CTE is defined using the WITH clause and can be referenced multiple times in the main query:

-- Basic CTE syntax WITH cte_name AS ( SELECT column1, column2 FROM table_name WHERE condition ) SELECT * FROM cte_name;
Key Benefits: CTEs improve readability by breaking complex queries into logical steps, allow reusing the same subquery multiple times, and enable recursive queries for hierarchical data.

Simple CTEs

Simple CTEs replace complex subqueries with named, reusable result sets:

-- Without CTE (hard to read) SELECT c.customer_name, o.total_spent FROM customers c JOIN ( SELECT customer_id, SUM(total_amount) AS total_spent FROM orders WHERE order_date >= '2024-01-01' GROUP BY customer_id ) o ON c.customer_id = o.customer_id WHERE o.total_spent > 1000; -- With CTE (much clearer) WITH customer_spending AS ( SELECT customer_id, SUM(total_amount) AS total_spent FROM orders WHERE order_date >= '2024-01-01' GROUP BY customer_id ) SELECT c.customer_name, cs.total_spent FROM customers c JOIN customer_spending cs ON c.customer_id = cs.customer_id WHERE cs.total_spent > 1000;
-- Calculating statistics with CTE WITH product_stats AS ( SELECT category, AVG(price) AS avg_price, COUNT(*) AS product_count, MAX(price) AS max_price FROM products GROUP BY category ) SELECT p.product_name, p.category, p.price, ps.avg_price, ROUND((p.price - ps.avg_price) / ps.avg_price * 100, 2) AS pct_diff_from_avg FROM products p JOIN product_stats ps ON p.category = ps.category WHERE p.price > ps.avg_price ORDER BY pct_diff_from_avg DESC;

Multiple CTEs

You can define multiple CTEs in a single query, separated by commas:

-- Multiple CTEs for complex analysis WITH high_value_customers AS ( SELECT customer_id, SUM(total_amount) AS total_spent FROM orders GROUP BY customer_id HAVING total_spent > 5000 ), recent_orders AS ( SELECT customer_id, COUNT(*) AS recent_order_count FROM orders WHERE order_date >= DATE_SUB(CURDATE(), INTERVAL 90 DAY) GROUP BY customer_id ), product_diversity AS ( SELECT o.customer_id, COUNT(DISTINCT oi.product_id) AS unique_products FROM orders o JOIN order_items oi ON o.order_id = oi.order_id GROUP BY o.customer_id ) SELECT c.customer_name, c.email, hvc.total_spent, COALESCE(ro.recent_order_count, 0) AS recent_orders, COALESCE(pd.unique_products, 0) AS products_purchased FROM customers c JOIN high_value_customers hvc ON c.customer_id = hvc.customer_id LEFT JOIN recent_orders ro ON c.customer_id = ro.customer_id LEFT JOIN product_diversity pd ON c.customer_id = pd.customer_id ORDER BY hvc.total_spent DESC;
Best Practice: Use descriptive CTE names that clearly indicate what data they contain. This makes queries self-documenting and easier to maintain.

CTEs Referencing Other CTEs

Later CTEs can reference earlier ones in the same WITH clause:

-- CTEs building on each other WITH monthly_sales AS ( SELECT DATE_FORMAT(order_date, '%Y-%m') AS month, SUM(total_amount) AS revenue FROM orders GROUP BY DATE_FORMAT(order_date, '%Y-%m') ), sales_with_growth AS ( SELECT month, revenue, LAG(revenue) OVER (ORDER BY month) AS prev_month_revenue FROM monthly_sales ), growth_analysis AS ( SELECT month, revenue, prev_month_revenue, revenue - prev_month_revenue AS growth_amount, ROUND((revenue - prev_month_revenue) * 100.0 / prev_month_revenue, 2) AS growth_pct FROM sales_with_growth WHERE prev_month_revenue IS NOT NULL ) SELECT * FROM growth_analysis WHERE ABS(growth_pct) > 10 ORDER BY month;

Recursive CTEs

Recursive CTEs can reference themselves, perfect for hierarchical or tree-structured data:

-- Basic recursive CTE syntax WITH RECURSIVE cte_name AS ( -- Anchor member (base case) SELECT columns FROM table WHERE base_condition UNION ALL -- Recursive member SELECT columns FROM table JOIN cte_name ON join_condition WHERE recursive_condition ) SELECT * FROM cte_name;
-- Employee hierarchy (org chart) WITH RECURSIVE employee_hierarchy AS ( -- Anchor: Start with CEO (no manager) SELECT employee_id, employee_name, manager_id, position, 1 AS level, CAST(employee_name AS CHAR(500)) AS hierarchy_path FROM employees WHERE manager_id IS NULL UNION ALL -- Recursive: Add employees reporting to previous level SELECT e.employee_id, e.employee_name, e.manager_id, e.position, eh.level + 1, CONCAT(eh.hierarchy_path, ' > ', e.employee_name) FROM employees e JOIN employee_hierarchy eh ON e.manager_id = eh.employee_id WHERE eh.level < 10 -- Prevent infinite loops ) SELECT REPEAT(' ', level - 1) AS indent, employee_name, position, level, hierarchy_path FROM employee_hierarchy ORDER BY hierarchy_path;
Important: Always include a termination condition in recursive CTEs to prevent infinite loops. Use a level counter or limit depth with WHERE clause.

Real-World Recursive Examples

-- Category tree with all ancestors WITH RECURSIVE category_tree AS ( -- Anchor: Leaf categories SELECT category_id, category_name, parent_category_id, 1 AS depth, category_name AS full_path FROM categories WHERE category_id = 42 -- Start from specific category UNION ALL -- Recursive: Add parent categories SELECT c.category_id, c.category_name, c.parent_category_id, ct.depth + 1, CONCAT(c.category_name, ' > ', ct.full_path) FROM categories c JOIN category_tree ct ON c.category_id = ct.parent_category_id ) SELECT * FROM category_tree ORDER BY depth DESC;
-- Find all subordinates of a manager WITH RECURSIVE subordinates AS ( -- Anchor: The manager SELECT employee_id, employee_name, manager_id, 0 AS levels_down FROM employees WHERE employee_id = 5 -- Manager ID UNION ALL -- Recursive: Their reports SELECT e.employee_id, e.employee_name, e.manager_id, s.levels_down + 1 FROM employees e JOIN subordinates s ON e.manager_id = s.employee_id WHERE s.levels_down < 5 ) SELECT levels_down, employee_name, (SELECT COUNT(*) - 1 FROM subordinates) AS total_subordinates FROM subordinates ORDER BY levels_down, employee_name;

Generating Sequences with Recursive CTEs

-- Generate date range WITH RECURSIVE date_range AS ( SELECT DATE('2024-01-01') AS date UNION ALL SELECT DATE_ADD(date, INTERVAL 1 DAY) FROM date_range WHERE date < '2024-12-31' ) SELECT dr.date, COALESCE(SUM(o.total_amount), 0) AS daily_revenue FROM date_range dr LEFT JOIN orders o ON DATE(o.order_date) = dr.date GROUP BY dr.date ORDER BY dr.date;
-- Generate number sequence (1 to 100) WITH RECURSIVE numbers AS ( SELECT 1 AS n UNION ALL SELECT n + 1 FROM numbers WHERE n < 100 ) SELECT n FROM numbers;

CTEs vs Subqueries vs Temporary Tables

When to Use Each:
  • CTEs: Improve readability, need to reference result multiple times, recursive queries
  • Subqueries: Simple one-time use, small result sets
  • Temporary Tables: Very large result sets, need indexes, multiple queries using same data
-- CTE vs Subquery performance -- CTE (can be referenced multiple times) WITH top_products AS ( SELECT product_id, product_name, sales_count FROM products WHERE sales_count > 100 ) SELECT * FROM top_products WHERE product_name LIKE 'A%' UNION ALL SELECT * FROM top_products WHERE product_name LIKE 'B%'; -- Subquery (computed twice - less efficient) SELECT * FROM (SELECT ...) WHERE product_name LIKE 'A%' UNION ALL SELECT * FROM (SELECT ...) WHERE product_name LIKE 'B%';

Performance Implications

Performance Tips:
  • CTEs are typically materialized once and reused (MySQL 8.0+)
  • Add indexes to base tables, not CTEs (they're temporary)
  • For very large result sets, consider temporary tables with indexes
  • Recursive CTEs can be slow - always limit recursion depth
  • Use EXPLAIN to understand how MySQL executes CTE queries

Complex Real-World Example

-- Customer cohort retention analysis WITH RECURSIVE months AS ( SELECT DATE_FORMAT(MIN(order_date), '%Y-%m-01') AS month FROM orders UNION ALL SELECT DATE_ADD(month, INTERVAL 1 MONTH) FROM months WHERE month < DATE_FORMAT(CURDATE(), '%Y-%m-01') ), first_purchases AS ( SELECT customer_id, DATE_FORMAT(MIN(order_date), '%Y-%m-01') AS cohort_month FROM orders GROUP BY customer_id ), customer_months AS ( SELECT fp.customer_id, fp.cohort_month, DATE_FORMAT(o.order_date, '%Y-%m-01') AS purchase_month, TIMESTAMPDIFF(MONTH, fp.cohort_month, DATE_FORMAT(o.order_date, '%Y-%m-01')) AS months_since_first FROM first_purchases fp JOIN orders o ON fp.customer_id = o.customer_id ) SELECT cohort_month, COUNT(DISTINCT customer_id) AS cohort_size, SUM(CASE WHEN months_since_first = 0 THEN 1 ELSE 0 END) AS month_0, SUM(CASE WHEN months_since_first = 1 THEN 1 ELSE 0 END) AS month_1, SUM(CASE WHEN months_since_first = 2 THEN 1 ELSE 0 END) AS month_2, SUM(CASE WHEN months_since_first = 3 THEN 1 ELSE 0 END) AS month_3, ROUND(SUM(CASE WHEN months_since_first = 3 THEN 1 ELSE 0 END) * 100.0 / COUNT(DISTINCT customer_id), 2) AS retention_3month_pct FROM customer_months GROUP BY cohort_month ORDER BY cohort_month;

Practice Exercise:

Challenge: Write queries using CTEs:

  1. Find customers who purchased in 3 consecutive months (use CTE with window function)
  2. Create a recursive CTE to show product category hierarchy
  3. Generate a report showing monthly revenue with 3-month moving average (multiple CTEs)

Solution Hints:

-- 1. Consecutive purchases WITH monthly_purchases AS ( SELECT DISTINCT customer_id, DATE_FORMAT(order_date, '%Y-%m-01') AS month FROM orders ), with_sequence AS ( SELECT customer_id, month, ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY month) AS rn, DATE_SUB(month, INTERVAL ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY month) MONTH) AS grp FROM monthly_purchases ) SELECT customer_id, MIN(month) AS start_month, COUNT(*) AS consecutive_months FROM with_sequence GROUP BY customer_id, grp HAVING consecutive_months >= 3; -- 2. Category hierarchy WITH RECURSIVE category_path AS ( SELECT category_id, category_name, parent_id, 1 AS level, category_name AS path FROM categories WHERE parent_id IS NULL UNION ALL SELECT c.category_id, c.category_name, c.parent_id, cp.level + 1, CONCAT(cp.path, ' > ', c.category_name) FROM categories c JOIN category_path cp ON c.parent_id = cp.category_id ) SELECT * FROM category_path ORDER BY path;

Summary

In this lesson, you mastered:

  • CTE syntax with WITH clause for named result sets
  • Multiple CTEs and CTEs referencing each other
  • Recursive CTEs for hierarchical data structures
  • Practical applications: org charts, category trees, date ranges
  • When to use CTEs vs subqueries vs temporary tables
  • Performance considerations and best practices
Next Up: In the next lesson, we'll explore advanced string and date functions for powerful data manipulation and transformation!

ES
Edrees Salih
19 hours ago

We are still cooking the magic in the way!