Window Functions & Analytics
Window functions (also called analytic functions) perform calculations across a set of table rows related to the current row. Unlike GROUP BY, window functions don't collapse rows - they add calculated columns while preserving all original rows.
What are Window Functions?
Window functions operate on a "window" of rows and return a value for each row based on calculations over that window:
Key Difference: GROUP BY reduces rows to groups. Window functions keep all rows and add analytical columns. This makes them perfect for rankings, running totals, and comparative analytics.
-- Basic window function syntax
SELECT
column1,
column2,
WINDOW_FUNCTION() OVER (
[PARTITION BY partition_column]
[ORDER BY sort_column]
[ROWS or RANGE frame_specification]
) AS result_column
FROM table_name;
ROW_NUMBER() - Sequential Numbering
ROW_NUMBER() assigns a unique sequential number to each row within a partition:
-- Number all orders for each customer
SELECT
customer_id,
order_id,
order_date,
total_amount,
ROW_NUMBER() OVER (
PARTITION BY customer_id
ORDER BY order_date
) AS order_sequence
FROM orders;
-- Find each customer's most recent order
SELECT *
FROM (
SELECT
customer_id,
order_id,
order_date,
total_amount,
ROW_NUMBER() OVER (
PARTITION BY customer_id
ORDER BY order_date DESC
) AS rn
FROM orders
) AS ranked_orders
WHERE rn = 1;
RANK() and DENSE_RANK()
RANK() and DENSE_RANK() assign rankings with different handling of ties:
-- Compare RANK vs DENSE_RANK
SELECT
product_name,
sales_count,
RANK() OVER (ORDER BY sales_count DESC) AS rank_position,
DENSE_RANK() OVER (ORDER BY sales_count DESC) AS dense_rank_position
FROM products;
-- RANK(): 1, 2, 2, 4, 5 (skips numbers after ties)
-- DENSE_RANK(): 1, 2, 2, 3, 4 (no gaps)
-- Top 3 products per category
SELECT *
FROM (
SELECT
category,
product_name,
price,
DENSE_RANK() OVER (
PARTITION BY category
ORDER BY price DESC
) AS price_rank
FROM products
) AS ranked_products
WHERE price_rank <= 3;
When to Use: Use RANK() for traditional rankings with gaps after ties. Use DENSE_RANK() when you want consecutive rankings. Use ROW_NUMBER() when you need unique numbers (even for ties).
PARTITION BY - Creating Windows
PARTITION BY divides result sets into groups (partitions) for the window function:
-- Sales ranking within each region
SELECT
region,
salesperson_name,
total_sales,
RANK() OVER (
PARTITION BY region
ORDER BY total_sales DESC
) AS regional_rank,
RANK() OVER (
ORDER BY total_sales DESC
) AS overall_rank
FROM sales_data;
-- Percentage of category total
SELECT
category,
product_name,
revenue,
SUM(revenue) OVER (PARTITION BY category) AS category_total,
ROUND(revenue * 100.0 / SUM(revenue) OVER (PARTITION BY category), 2) AS pct_of_category
FROM product_revenue;
LAG() and LEAD() - Accessing Adjacent Rows
LAG() accesses previous rows, LEAD() accesses following rows:
-- Compare each month's sales to previous month
SELECT
sale_month,
monthly_revenue,
LAG(monthly_revenue) OVER (ORDER BY sale_month) AS previous_month,
monthly_revenue - LAG(monthly_revenue) OVER (ORDER BY sale_month) AS month_over_month_change,
ROUND((monthly_revenue - LAG(monthly_revenue) OVER (ORDER BY sale_month)) * 100.0 /
LAG(monthly_revenue) OVER (ORDER BY sale_month), 2) AS pct_change
FROM monthly_sales
ORDER BY sale_month;
-- Compare current price to previous and next price changes
SELECT
product_name,
price_date,
price,
LAG(price, 1) OVER (PARTITION BY product_id ORDER BY price_date) AS previous_price,
LEAD(price, 1) OVER (PARTITION BY product_id ORDER BY price_date) AS next_price,
price - LAG(price, 1) OVER (PARTITION BY product_id ORDER BY price_date) AS price_change
FROM price_history;
Parameters: LAG(column, offset, default) and LEAD(column, offset, default). Offset defaults to 1, default value is NULL if no row exists at that offset.
Running Totals and Cumulative Aggregates
Window functions can calculate running totals, moving averages, and cumulative statistics:
-- Running total of sales
SELECT
order_date,
daily_revenue,
SUM(daily_revenue) OVER (
ORDER BY order_date
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS cumulative_revenue
FROM daily_sales
ORDER BY order_date;
-- 7-day moving average
SELECT
sale_date,
daily_sales,
AVG(daily_sales) OVER (
ORDER BY sale_date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS moving_avg_7day
FROM daily_sales
ORDER BY sale_date;
-- Running count and percentage
SELECT
order_date,
order_count,
SUM(order_count) OVER (ORDER BY order_date) AS cumulative_orders,
ROUND(SUM(order_count) OVER (ORDER BY order_date) * 100.0 /
SUM(order_count) OVER (), 2) AS cumulative_pct
FROM daily_order_counts
ORDER BY order_date;
Frame Specifications
Frame specifications define which rows to include in window calculations:
-- Frame specification syntax
ROWS BETWEEN start AND end
-- Common frame specifications:
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW -- From start to current
ROWS BETWEEN 3 PRECEDING AND CURRENT ROW -- Last 4 rows including current
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING -- Current row +/- 1
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING -- Current to end
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING -- All rows (default)
-- 3-month rolling metrics
SELECT
month,
revenue,
AVG(revenue) OVER (
ORDER BY month
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) AS avg_3month,
MAX(revenue) OVER (
ORDER BY month
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) AS max_3month
FROM monthly_revenue;
NTILE() - Percentiles and Quartiles
NTILE() divides rows into a specified number of equal groups:
-- Divide customers into quartiles by spending
SELECT
customer_name,
total_spent,
NTILE(4) OVER (ORDER BY total_spent DESC) AS spending_quartile
FROM customer_spending;
-- Percentile analysis
SELECT
spending_percentile,
COUNT(*) AS customer_count,
MIN(total_spent) AS min_spending,
MAX(total_spent) AS max_spending
FROM (
SELECT
customer_name,
total_spent,
NTILE(100) OVER (ORDER BY total_spent) AS spending_percentile
FROM customer_spending
) AS percentile_data
GROUP BY spending_percentile
ORDER BY spending_percentile;
Multiple Window Functions
You can use multiple window functions in the same query:
-- Comprehensive product analysis
SELECT
product_name,
category,
price,
sales_count,
ROW_NUMBER() OVER (PARTITION BY category ORDER BY sales_count DESC) AS sales_rank_in_category,
DENSE_RANK() OVER (ORDER BY sales_count DESC) AS overall_sales_rank,
ROUND(price * 100.0 / AVG(price) OVER (PARTITION BY category), 2) AS pct_of_avg_category_price,
sales_count - AVG(sales_count) OVER (PARTITION BY category) AS diff_from_category_avg,
NTILE(10) OVER (ORDER BY price) AS price_decile
FROM products;
Real-World Analytics Examples
-- Customer cohort analysis
SELECT
customer_id,
first_purchase_month,
purchase_month,
DATEDIFF(purchase_month, first_purchase_month) / 30 AS months_since_first,
monthly_revenue,
SUM(monthly_revenue) OVER (
PARTITION BY customer_id
ORDER BY purchase_month
) AS lifetime_value,
ROW_NUMBER() OVER (
PARTITION BY customer_id
ORDER BY purchase_month
) AS purchase_sequence
FROM (
SELECT
customer_id,
DATE_FORMAT(order_date, '%Y-%m-01') AS purchase_month,
MIN(DATE_FORMAT(order_date, '%Y-%m-01')) OVER (PARTITION BY customer_id) AS first_purchase_month,
SUM(total_amount) AS monthly_revenue
FROM orders
GROUP BY customer_id, DATE_FORMAT(order_date, '%Y-%m-01')
) AS customer_purchases;
-- Sales performance dashboard
SELECT
salesperson_name,
region,
monthly_sales,
RANK() OVER (PARTITION BY region ORDER BY monthly_sales DESC) AS regional_rank,
ROUND(monthly_sales * 100.0 / SUM(monthly_sales) OVER (PARTITION BY region), 2) AS pct_of_regional_sales,
monthly_sales - LAG(monthly_sales) OVER (PARTITION BY salesperson_name ORDER BY sale_month) AS month_over_month,
AVG(monthly_sales) OVER (
PARTITION BY salesperson_name
ORDER BY sale_month
ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
) AS avg_3month
FROM salesperson_monthly_sales
ORDER BY region, regional_rank;
Performance Note: Window functions can be resource-intensive on large datasets. Always add appropriate indexes on PARTITION BY and ORDER BY columns.
Practice Exercise:
Challenge: Write queries using window functions:
- Find products whose price is above their category average
- Calculate year-over-year growth rate for monthly sales
- Identify top 20% of customers by revenue (use NTILE)
Solutions:
-- 1. Products above category average
SELECT product_name, category, price, category_avg
FROM (
SELECT
product_name,
category,
price,
AVG(price) OVER (PARTITION BY category) AS category_avg
FROM products
) AS price_comparison
WHERE price > category_avg;
-- 2. Year-over-year growth
SELECT
sale_month,
monthly_revenue,
LAG(monthly_revenue, 12) OVER (ORDER BY sale_month) AS same_month_last_year,
ROUND((monthly_revenue - LAG(monthly_revenue, 12) OVER (ORDER BY sale_month)) * 100.0 /
LAG(monthly_revenue, 12) OVER (ORDER BY sale_month), 2) AS yoy_growth_pct
FROM monthly_sales;
-- 3. Top 20% customers
SELECT customer_name, total_revenue
FROM (
SELECT
customer_name,
total_revenue,
NTILE(5) OVER (ORDER BY total_revenue DESC) AS revenue_quintile
FROM customer_revenue
) AS customer_segments
WHERE revenue_quintile = 1;
Summary
In this lesson, you mastered:
- Window function syntax with OVER, PARTITION BY, and ORDER BY
- ROW_NUMBER(), RANK(), and DENSE_RANK() for rankings
- LAG() and LEAD() for accessing adjacent rows
- Running totals and moving averages with frame specifications
- NTILE() for percentile and quartile analysis
- Real-world analytics patterns using window functions
Next Up: In the next lesson, we'll explore Common Table Expressions (CTEs) for organizing complex queries and handling recursive data!