UNION, INTERSECT, and Set Operations
Set operations allow you to combine, compare, and manipulate result sets from multiple queries. These operations treat query results as mathematical sets, enabling powerful data analysis and reporting capabilities.
Understanding Set Operations
Set operations combine rows from two or more SELECT statements based on set theory principles:
UNION: Combines results, removes duplicates
UNION ALL: Combines results, keeps duplicates (faster)
INTERSECT: Returns only common rows (MySQL 8.0.31+)
EXCEPT/MINUS: Returns rows in first set but not in second
Important: All queries in a set operation must have the same number of columns with compatible data types. Column names come from the first query.
UNION - Combining Result Sets
UNION combines results from multiple queries and removes duplicate rows:
-- Combine active and archived customers
SELECT customer_id, customer_name, email, 'active' AS status
FROM customers
WHERE is_active = 1
UNION
SELECT customer_id, customer_name, email, 'archived' AS status
FROM archived_customers;
-- Products from multiple sources
SELECT product_name, price, 'warehouse_a' AS source
FROM warehouse_a_products
WHERE stock > 0
UNION
SELECT product_name, price, 'warehouse_b' AS source
FROM warehouse_b_products
WHERE stock > 0
ORDER BY product_name;
Tip: ORDER BY and LIMIT can only be used at the end of the entire UNION query, not on individual SELECT statements. Use parentheses with ORDER BY/LIMIT for individual queries if needed.
UNION ALL - Keeping Duplicates
UNION ALL is faster because it doesn't remove duplicates. Use it when you know there are no duplicates or when duplicates are desired:
-- All transactions from multiple payment systems
SELECT
transaction_id,
customer_id,
amount,
transaction_date,
'stripe' AS payment_system
FROM stripe_transactions
WHERE transaction_date >= '2024-01-01'
UNION ALL
SELECT
transaction_id,
customer_id,
amount,
transaction_date,
'paypal' AS payment_system
FROM paypal_transactions
WHERE transaction_date >= '2024-01-01'
ORDER BY transaction_date DESC;
Performance Note: UNION ALL is significantly faster than UNION because it skips the duplicate removal step. Use UNION ALL whenever possible.
Complex UNION Queries
-- Comprehensive sales report from multiple channels
SELECT
order_date,
'online' AS channel,
COUNT(*) AS order_count,
SUM(total_amount) AS revenue
FROM online_orders
WHERE order_date BETWEEN '2024-01-01' AND '2024-12-31'
GROUP BY order_date
UNION ALL
SELECT
sale_date AS order_date,
'retail' AS channel,
COUNT(*) AS order_count,
SUM(total_amount) AS revenue
FROM retail_sales
WHERE sale_date BETWEEN '2024-01-01' AND '2024-12-31'
GROUP BY sale_date
UNION ALL
SELECT
order_date,
'wholesale' AS channel,
COUNT(*) AS order_count,
SUM(total_amount) AS revenue
FROM wholesale_orders
WHERE order_date BETWEEN '2024-01-01' AND '2024-12-31'
GROUP BY order_date
ORDER BY order_date, channel;
INTERSECT - Finding Common Rows
INTERSECT returns only rows that appear in both result sets. MySQL 8.0.31+ supports INTERSECT natively:
-- Customers who placed orders in both 2023 and 2024 (MySQL 8.0.31+)
SELECT customer_id, customer_name
FROM customers
JOIN orders ON customers.customer_id = orders.customer_id
WHERE YEAR(order_date) = 2023
INTERSECT
SELECT customer_id, customer_name
FROM customers
JOIN orders ON customers.customer_id = orders.customer_id
WHERE YEAR(order_date) = 2024;
Simulating INTERSECT (Pre-8.0.31)
For older MySQL versions, use INNER JOIN or EXISTS to simulate INTERSECT:
-- Method 1: Using INNER JOIN
SELECT DISTINCT c.customer_id, c.customer_name
FROM customers c
JOIN orders o1 ON c.customer_id = o1.customer_id AND YEAR(o1.order_date) = 2023
JOIN orders o2 ON c.customer_id = o2.customer_id AND YEAR(o2.order_date) = 2024;
-- Method 2: Using IN with subqueries
SELECT customer_id, customer_name
FROM customers
WHERE customer_id IN (
SELECT customer_id FROM orders WHERE YEAR(order_date) = 2023
)
AND customer_id IN (
SELECT customer_id FROM orders WHERE YEAR(order_date) = 2024
);
EXCEPT - Finding Differences
EXCEPT (or MINUS in Oracle) returns rows from the first query that don't appear in the second:
-- Customers who ordered in 2023 but not in 2024 (MySQL 8.0.31+)
SELECT customer_id, customer_name
FROM customers
JOIN orders ON customers.customer_id = orders.customer_id
WHERE YEAR(order_date) = 2023
EXCEPT
SELECT customer_id, customer_name
FROM customers
JOIN orders ON customers.customer_id = orders.customer_id
WHERE YEAR(order_date) = 2024;
Simulating EXCEPT
For older MySQL versions, use LEFT JOIN with NULL check or NOT IN:
-- Method 1: Using LEFT JOIN
SELECT DISTINCT c.customer_id, c.customer_name
FROM customers c
JOIN orders o1 ON c.customer_id = o1.customer_id AND YEAR(o1.order_date) = 2023
LEFT JOIN orders o2 ON c.customer_id = o2.customer_id AND YEAR(o2.order_date) = 2024
WHERE o2.order_id IS NULL;
-- Method 2: Using NOT IN
SELECT DISTINCT customer_id, customer_name
FROM customers
JOIN orders ON customers.customer_id = orders.customer_id
WHERE YEAR(order_date) = 2023
AND customer_id NOT IN (
SELECT customer_id
FROM orders
WHERE YEAR(order_date) = 2024
);
-- Method 3: Using NOT EXISTS (most efficient)
SELECT DISTINCT c.customer_id, c.customer_name
FROM customers c
JOIN orders o1 ON c.customer_id = o1.customer_id
WHERE YEAR(o1.order_date) = 2023
AND NOT EXISTS (
SELECT 1
FROM orders o2
WHERE o2.customer_id = c.customer_id
AND YEAR(o2.order_date) = 2024
);
Best Practice: NOT EXISTS is generally more efficient than NOT IN for large datasets, especially when the subquery might contain NULL values.
Order of Operations
When combining multiple set operations, understand operator precedence:
-- INTERSECT has higher precedence than UNION/EXCEPT
-- This query: (A UNION B) INTERSECT C
(SELECT product_id FROM featured_products
UNION
SELECT product_id FROM trending_products)
INTERSECT
SELECT product_id FROM in_stock_products;
-- Use parentheses for clarity and control
(SELECT product_id FROM featured_products
INTERSECT
SELECT product_id FROM in_stock_products)
UNION
(SELECT product_id FROM trending_products
INTERSECT
SELECT product_id FROM in_stock_products);
Real-World Use Cases
-- Customer segmentation report
SELECT
'high_value' AS segment,
customer_id,
customer_name,
total_spent
FROM (
SELECT c.customer_id, c.customer_name, SUM(o.total_amount) AS total_spent
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.customer_name
HAVING total_spent > 10000
) AS high_value_customers
UNION ALL
SELECT
'medium_value' AS segment,
customer_id,
customer_name,
total_spent
FROM (
SELECT c.customer_id, c.customer_name, SUM(o.total_amount) AS total_spent
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.customer_name
HAVING total_spent BETWEEN 1000 AND 10000
) AS medium_value_customers
UNION ALL
SELECT
'low_value' AS segment,
customer_id,
customer_name,
total_spent
FROM (
SELECT c.customer_id, c.customer_name, SUM(o.total_amount) AS total_spent
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id, c.customer_name
HAVING total_spent < 1000
) AS low_value_customers
ORDER BY total_spent DESC;
-- Multi-source inventory consolidation
SELECT
product_name,
SUM(quantity) AS total_quantity,
GROUP_CONCAT(DISTINCT location) AS locations
FROM (
SELECT product_name, quantity, 'Warehouse A' AS location
FROM warehouse_a_inventory
UNION ALL
SELECT product_name, quantity, 'Warehouse B' AS location
FROM warehouse_b_inventory
UNION ALL
SELECT product_name, quantity, 'Store Network' AS location
FROM store_inventory
) AS all_inventory
GROUP BY product_name
HAVING total_quantity > 0
ORDER BY total_quantity DESC;
Performance Considerations
Optimization Tips:
- Use UNION ALL instead of UNION when duplicates don't matter (much faster)
- Add WHERE clauses to filter data before set operations
- Use indexes on columns used in WHERE and JOIN conditions
- Consider creating temporary tables for complex multi-step operations
- LIMIT can be applied to the final result to reduce data transfer
- Ensure column types match to avoid implicit conversions
-- Efficient UNION with filtering
SELECT product_id, product_name, price
FROM products
WHERE category_id = 5 AND is_active = 1
UNION ALL
SELECT product_id, product_name, price
FROM archived_products
WHERE category_id = 5 AND archived_date > DATE_SUB(CURDATE(), INTERVAL 1 YEAR)
ORDER BY price DESC
LIMIT 100;
Practice Exercise:
Challenge: Write queries for these scenarios:
- Find products available in all three warehouses (simulate INTERSECT)
- Create a unified contact list from customers, suppliers, and employees
- Find users who signed up but never made a purchase (simulate EXCEPT)
Solutions:
-- 1. Products in all warehouses
SELECT product_id, product_name
FROM products p
WHERE EXISTS (SELECT 1 FROM warehouse_a WHERE product_id = p.product_id)
AND EXISTS (SELECT 1 FROM warehouse_b WHERE product_id = p.product_id)
AND EXISTS (SELECT 1 FROM warehouse_c WHERE product_id = p.product_id);
-- 2. Unified contact list
SELECT name, email, phone, 'customer' AS type FROM customers
UNION
SELECT company_name, contact_email, contact_phone, 'supplier' FROM suppliers
UNION
SELECT CONCAT(first_name, ' ', last_name), email, phone, 'employee' FROM employees
ORDER BY name;
-- 3. Users who never purchased
SELECT user_id, username, email
FROM users
WHERE NOT EXISTS (
SELECT 1 FROM orders WHERE orders.user_id = users.user_id
);
Summary
In this lesson, you mastered:
- UNION and UNION ALL for combining result sets
- INTERSECT for finding common rows (MySQL 8.0.31+)
- EXCEPT for finding differences between sets
- Simulating INTERSECT and EXCEPT in older MySQL versions
- Order of operations and using parentheses for control
- Real-world applications and performance optimization
Next Up: In the next lesson, we'll explore Window Functions and Analytics for advanced data analysis and reporting!