MySQL & Database Design

UNION, INTERSECT, and Set Operations

13 min Lesson 9 of 40

UNION, INTERSECT, and Set Operations

Set operations allow you to combine, compare, and manipulate result sets from multiple queries. These operations treat query results as mathematical sets, enabling powerful data analysis and reporting capabilities.

Understanding Set Operations

Set operations combine rows from two or more SELECT statements based on set theory principles:

UNION: Combines results, removes duplicates UNION ALL: Combines results, keeps duplicates (faster) INTERSECT: Returns only common rows (MySQL 8.0.31+) EXCEPT/MINUS: Returns rows in first set but not in second
Important: All queries in a set operation must have the same number of columns with compatible data types. Column names come from the first query.

UNION - Combining Result Sets

UNION combines results from multiple queries and removes duplicate rows:

-- Combine active and archived customers SELECT customer_id, customer_name, email, 'active' AS status FROM customers WHERE is_active = 1 UNION SELECT customer_id, customer_name, email, 'archived' AS status FROM archived_customers;
-- Products from multiple sources SELECT product_name, price, 'warehouse_a' AS source FROM warehouse_a_products WHERE stock > 0 UNION SELECT product_name, price, 'warehouse_b' AS source FROM warehouse_b_products WHERE stock > 0 ORDER BY product_name;
Tip: ORDER BY and LIMIT can only be used at the end of the entire UNION query, not on individual SELECT statements. Use parentheses with ORDER BY/LIMIT for individual queries if needed.

UNION ALL - Keeping Duplicates

UNION ALL is faster because it doesn't remove duplicates. Use it when you know there are no duplicates or when duplicates are desired:

-- All transactions from multiple payment systems SELECT transaction_id, customer_id, amount, transaction_date, 'stripe' AS payment_system FROM stripe_transactions WHERE transaction_date >= '2024-01-01' UNION ALL SELECT transaction_id, customer_id, amount, transaction_date, 'paypal' AS payment_system FROM paypal_transactions WHERE transaction_date >= '2024-01-01' ORDER BY transaction_date DESC;
Performance Note: UNION ALL is significantly faster than UNION because it skips the duplicate removal step. Use UNION ALL whenever possible.

Complex UNION Queries

-- Comprehensive sales report from multiple channels SELECT order_date, 'online' AS channel, COUNT(*) AS order_count, SUM(total_amount) AS revenue FROM online_orders WHERE order_date BETWEEN '2024-01-01' AND '2024-12-31' GROUP BY order_date UNION ALL SELECT sale_date AS order_date, 'retail' AS channel, COUNT(*) AS order_count, SUM(total_amount) AS revenue FROM retail_sales WHERE sale_date BETWEEN '2024-01-01' AND '2024-12-31' GROUP BY sale_date UNION ALL SELECT order_date, 'wholesale' AS channel, COUNT(*) AS order_count, SUM(total_amount) AS revenue FROM wholesale_orders WHERE order_date BETWEEN '2024-01-01' AND '2024-12-31' GROUP BY order_date ORDER BY order_date, channel;

INTERSECT - Finding Common Rows

INTERSECT returns only rows that appear in both result sets. MySQL 8.0.31+ supports INTERSECT natively:

-- Customers who placed orders in both 2023 and 2024 (MySQL 8.0.31+) SELECT customer_id, customer_name FROM customers JOIN orders ON customers.customer_id = orders.customer_id WHERE YEAR(order_date) = 2023 INTERSECT SELECT customer_id, customer_name FROM customers JOIN orders ON customers.customer_id = orders.customer_id WHERE YEAR(order_date) = 2024;

Simulating INTERSECT (Pre-8.0.31)

For older MySQL versions, use INNER JOIN or EXISTS to simulate INTERSECT:

-- Method 1: Using INNER JOIN SELECT DISTINCT c.customer_id, c.customer_name FROM customers c JOIN orders o1 ON c.customer_id = o1.customer_id AND YEAR(o1.order_date) = 2023 JOIN orders o2 ON c.customer_id = o2.customer_id AND YEAR(o2.order_date) = 2024; -- Method 2: Using IN with subqueries SELECT customer_id, customer_name FROM customers WHERE customer_id IN ( SELECT customer_id FROM orders WHERE YEAR(order_date) = 2023 ) AND customer_id IN ( SELECT customer_id FROM orders WHERE YEAR(order_date) = 2024 );

EXCEPT - Finding Differences

EXCEPT (or MINUS in Oracle) returns rows from the first query that don't appear in the second:

-- Customers who ordered in 2023 but not in 2024 (MySQL 8.0.31+) SELECT customer_id, customer_name FROM customers JOIN orders ON customers.customer_id = orders.customer_id WHERE YEAR(order_date) = 2023 EXCEPT SELECT customer_id, customer_name FROM customers JOIN orders ON customers.customer_id = orders.customer_id WHERE YEAR(order_date) = 2024;

Simulating EXCEPT

For older MySQL versions, use LEFT JOIN with NULL check or NOT IN:

-- Method 1: Using LEFT JOIN SELECT DISTINCT c.customer_id, c.customer_name FROM customers c JOIN orders o1 ON c.customer_id = o1.customer_id AND YEAR(o1.order_date) = 2023 LEFT JOIN orders o2 ON c.customer_id = o2.customer_id AND YEAR(o2.order_date) = 2024 WHERE o2.order_id IS NULL; -- Method 2: Using NOT IN SELECT DISTINCT customer_id, customer_name FROM customers JOIN orders ON customers.customer_id = orders.customer_id WHERE YEAR(order_date) = 2023 AND customer_id NOT IN ( SELECT customer_id FROM orders WHERE YEAR(order_date) = 2024 ); -- Method 3: Using NOT EXISTS (most efficient) SELECT DISTINCT c.customer_id, c.customer_name FROM customers c JOIN orders o1 ON c.customer_id = o1.customer_id WHERE YEAR(o1.order_date) = 2023 AND NOT EXISTS ( SELECT 1 FROM orders o2 WHERE o2.customer_id = c.customer_id AND YEAR(o2.order_date) = 2024 );
Best Practice: NOT EXISTS is generally more efficient than NOT IN for large datasets, especially when the subquery might contain NULL values.

Order of Operations

When combining multiple set operations, understand operator precedence:

-- INTERSECT has higher precedence than UNION/EXCEPT -- This query: (A UNION B) INTERSECT C (SELECT product_id FROM featured_products UNION SELECT product_id FROM trending_products) INTERSECT SELECT product_id FROM in_stock_products; -- Use parentheses for clarity and control (SELECT product_id FROM featured_products INTERSECT SELECT product_id FROM in_stock_products) UNION (SELECT product_id FROM trending_products INTERSECT SELECT product_id FROM in_stock_products);

Real-World Use Cases

-- Customer segmentation report SELECT 'high_value' AS segment, customer_id, customer_name, total_spent FROM ( SELECT c.customer_id, c.customer_name, SUM(o.total_amount) AS total_spent FROM customers c JOIN orders o ON c.customer_id = o.customer_id GROUP BY c.customer_id, c.customer_name HAVING total_spent > 10000 ) AS high_value_customers UNION ALL SELECT 'medium_value' AS segment, customer_id, customer_name, total_spent FROM ( SELECT c.customer_id, c.customer_name, SUM(o.total_amount) AS total_spent FROM customers c JOIN orders o ON c.customer_id = o.customer_id GROUP BY c.customer_id, c.customer_name HAVING total_spent BETWEEN 1000 AND 10000 ) AS medium_value_customers UNION ALL SELECT 'low_value' AS segment, customer_id, customer_name, total_spent FROM ( SELECT c.customer_id, c.customer_name, SUM(o.total_amount) AS total_spent FROM customers c JOIN orders o ON c.customer_id = o.customer_id GROUP BY c.customer_id, c.customer_name HAVING total_spent < 1000 ) AS low_value_customers ORDER BY total_spent DESC;
-- Multi-source inventory consolidation SELECT product_name, SUM(quantity) AS total_quantity, GROUP_CONCAT(DISTINCT location) AS locations FROM ( SELECT product_name, quantity, 'Warehouse A' AS location FROM warehouse_a_inventory UNION ALL SELECT product_name, quantity, 'Warehouse B' AS location FROM warehouse_b_inventory UNION ALL SELECT product_name, quantity, 'Store Network' AS location FROM store_inventory ) AS all_inventory GROUP BY product_name HAVING total_quantity > 0 ORDER BY total_quantity DESC;

Performance Considerations

Optimization Tips:
  • Use UNION ALL instead of UNION when duplicates don't matter (much faster)
  • Add WHERE clauses to filter data before set operations
  • Use indexes on columns used in WHERE and JOIN conditions
  • Consider creating temporary tables for complex multi-step operations
  • LIMIT can be applied to the final result to reduce data transfer
  • Ensure column types match to avoid implicit conversions
-- Efficient UNION with filtering SELECT product_id, product_name, price FROM products WHERE category_id = 5 AND is_active = 1 UNION ALL SELECT product_id, product_name, price FROM archived_products WHERE category_id = 5 AND archived_date > DATE_SUB(CURDATE(), INTERVAL 1 YEAR) ORDER BY price DESC LIMIT 100;

Practice Exercise:

Challenge: Write queries for these scenarios:

  1. Find products available in all three warehouses (simulate INTERSECT)
  2. Create a unified contact list from customers, suppliers, and employees
  3. Find users who signed up but never made a purchase (simulate EXCEPT)

Solutions:

-- 1. Products in all warehouses SELECT product_id, product_name FROM products p WHERE EXISTS (SELECT 1 FROM warehouse_a WHERE product_id = p.product_id) AND EXISTS (SELECT 1 FROM warehouse_b WHERE product_id = p.product_id) AND EXISTS (SELECT 1 FROM warehouse_c WHERE product_id = p.product_id); -- 2. Unified contact list SELECT name, email, phone, 'customer' AS type FROM customers UNION SELECT company_name, contact_email, contact_phone, 'supplier' FROM suppliers UNION SELECT CONCAT(first_name, ' ', last_name), email, phone, 'employee' FROM employees ORDER BY name; -- 3. Users who never purchased SELECT user_id, username, email FROM users WHERE NOT EXISTS ( SELECT 1 FROM orders WHERE orders.user_id = users.user_id );

Summary

In this lesson, you mastered:

  • UNION and UNION ALL for combining result sets
  • INTERSECT for finding common rows (MySQL 8.0.31+)
  • EXCEPT for finding differences between sets
  • Simulating INTERSECT and EXCEPT in older MySQL versions
  • Order of operations and using parentheses for control
  • Real-world applications and performance optimization
Next Up: In the next lesson, we'll explore Window Functions and Analytics for advanced data analysis and reporting!

ES
Edrees Salih
16 hours ago

We are still cooking the magic in the way!