MySQL & Database Design

Window Functions & Analytics

13 min Lesson 10 of 40

Window Functions & Analytics

Window functions (also called analytic functions) perform calculations across a set of table rows related to the current row. Unlike GROUP BY, window functions don't collapse rows - they add calculated columns while preserving all original rows.

What are Window Functions?

Window functions operate on a "window" of rows and return a value for each row based on calculations over that window:

Key Difference: GROUP BY reduces rows to groups. Window functions keep all rows and add analytical columns. This makes them perfect for rankings, running totals, and comparative analytics.
-- Basic window function syntax SELECT column1, column2, WINDOW_FUNCTION() OVER ( [PARTITION BY partition_column] [ORDER BY sort_column] [ROWS or RANGE frame_specification] ) AS result_column FROM table_name;

ROW_NUMBER() - Sequential Numbering

ROW_NUMBER() assigns a unique sequential number to each row within a partition:

-- Number all orders for each customer SELECT customer_id, order_id, order_date, total_amount, ROW_NUMBER() OVER ( PARTITION BY customer_id ORDER BY order_date ) AS order_sequence FROM orders;
-- Find each customer's most recent order SELECT * FROM ( SELECT customer_id, order_id, order_date, total_amount, ROW_NUMBER() OVER ( PARTITION BY customer_id ORDER BY order_date DESC ) AS rn FROM orders ) AS ranked_orders WHERE rn = 1;

RANK() and DENSE_RANK()

RANK() and DENSE_RANK() assign rankings with different handling of ties:

-- Compare RANK vs DENSE_RANK SELECT product_name, sales_count, RANK() OVER (ORDER BY sales_count DESC) AS rank_position, DENSE_RANK() OVER (ORDER BY sales_count DESC) AS dense_rank_position FROM products; -- RANK(): 1, 2, 2, 4, 5 (skips numbers after ties) -- DENSE_RANK(): 1, 2, 2, 3, 4 (no gaps)
-- Top 3 products per category SELECT * FROM ( SELECT category, product_name, price, DENSE_RANK() OVER ( PARTITION BY category ORDER BY price DESC ) AS price_rank FROM products ) AS ranked_products WHERE price_rank <= 3;
When to Use: Use RANK() for traditional rankings with gaps after ties. Use DENSE_RANK() when you want consecutive rankings. Use ROW_NUMBER() when you need unique numbers (even for ties).

PARTITION BY - Creating Windows

PARTITION BY divides result sets into groups (partitions) for the window function:

-- Sales ranking within each region SELECT region, salesperson_name, total_sales, RANK() OVER ( PARTITION BY region ORDER BY total_sales DESC ) AS regional_rank, RANK() OVER ( ORDER BY total_sales DESC ) AS overall_rank FROM sales_data;
-- Percentage of category total SELECT category, product_name, revenue, SUM(revenue) OVER (PARTITION BY category) AS category_total, ROUND(revenue * 100.0 / SUM(revenue) OVER (PARTITION BY category), 2) AS pct_of_category FROM product_revenue;

LAG() and LEAD() - Accessing Adjacent Rows

LAG() accesses previous rows, LEAD() accesses following rows:

-- Compare each month's sales to previous month SELECT sale_month, monthly_revenue, LAG(monthly_revenue) OVER (ORDER BY sale_month) AS previous_month, monthly_revenue - LAG(monthly_revenue) OVER (ORDER BY sale_month) AS month_over_month_change, ROUND((monthly_revenue - LAG(monthly_revenue) OVER (ORDER BY sale_month)) * 100.0 / LAG(monthly_revenue) OVER (ORDER BY sale_month), 2) AS pct_change FROM monthly_sales ORDER BY sale_month;
-- Compare current price to previous and next price changes SELECT product_name, price_date, price, LAG(price, 1) OVER (PARTITION BY product_id ORDER BY price_date) AS previous_price, LEAD(price, 1) OVER (PARTITION BY product_id ORDER BY price_date) AS next_price, price - LAG(price, 1) OVER (PARTITION BY product_id ORDER BY price_date) AS price_change FROM price_history;
Parameters: LAG(column, offset, default) and LEAD(column, offset, default). Offset defaults to 1, default value is NULL if no row exists at that offset.

Running Totals and Cumulative Aggregates

Window functions can calculate running totals, moving averages, and cumulative statistics:

-- Running total of sales SELECT order_date, daily_revenue, SUM(daily_revenue) OVER ( ORDER BY order_date ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW ) AS cumulative_revenue FROM daily_sales ORDER BY order_date;
-- 7-day moving average SELECT sale_date, daily_sales, AVG(daily_sales) OVER ( ORDER BY sale_date ROWS BETWEEN 6 PRECEDING AND CURRENT ROW ) AS moving_avg_7day FROM daily_sales ORDER BY sale_date;
-- Running count and percentage SELECT order_date, order_count, SUM(order_count) OVER (ORDER BY order_date) AS cumulative_orders, ROUND(SUM(order_count) OVER (ORDER BY order_date) * 100.0 / SUM(order_count) OVER (), 2) AS cumulative_pct FROM daily_order_counts ORDER BY order_date;

Frame Specifications

Frame specifications define which rows to include in window calculations:

-- Frame specification syntax ROWS BETWEEN start AND end -- Common frame specifications: ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW -- From start to current ROWS BETWEEN 3 PRECEDING AND CURRENT ROW -- Last 4 rows including current ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING -- Current row +/- 1 ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING -- Current to end ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING -- All rows (default)
-- 3-month rolling metrics SELECT month, revenue, AVG(revenue) OVER ( ORDER BY month ROWS BETWEEN 2 PRECEDING AND CURRENT ROW ) AS avg_3month, MAX(revenue) OVER ( ORDER BY month ROWS BETWEEN 2 PRECEDING AND CURRENT ROW ) AS max_3month FROM monthly_revenue;

NTILE() - Percentiles and Quartiles

NTILE() divides rows into a specified number of equal groups:

-- Divide customers into quartiles by spending SELECT customer_name, total_spent, NTILE(4) OVER (ORDER BY total_spent DESC) AS spending_quartile FROM customer_spending;
-- Percentile analysis SELECT spending_percentile, COUNT(*) AS customer_count, MIN(total_spent) AS min_spending, MAX(total_spent) AS max_spending FROM ( SELECT customer_name, total_spent, NTILE(100) OVER (ORDER BY total_spent) AS spending_percentile FROM customer_spending ) AS percentile_data GROUP BY spending_percentile ORDER BY spending_percentile;

Multiple Window Functions

You can use multiple window functions in the same query:

-- Comprehensive product analysis SELECT product_name, category, price, sales_count, ROW_NUMBER() OVER (PARTITION BY category ORDER BY sales_count DESC) AS sales_rank_in_category, DENSE_RANK() OVER (ORDER BY sales_count DESC) AS overall_sales_rank, ROUND(price * 100.0 / AVG(price) OVER (PARTITION BY category), 2) AS pct_of_avg_category_price, sales_count - AVG(sales_count) OVER (PARTITION BY category) AS diff_from_category_avg, NTILE(10) OVER (ORDER BY price) AS price_decile FROM products;

Real-World Analytics Examples

-- Customer cohort analysis SELECT customer_id, first_purchase_month, purchase_month, DATEDIFF(purchase_month, first_purchase_month) / 30 AS months_since_first, monthly_revenue, SUM(monthly_revenue) OVER ( PARTITION BY customer_id ORDER BY purchase_month ) AS lifetime_value, ROW_NUMBER() OVER ( PARTITION BY customer_id ORDER BY purchase_month ) AS purchase_sequence FROM ( SELECT customer_id, DATE_FORMAT(order_date, '%Y-%m-01') AS purchase_month, MIN(DATE_FORMAT(order_date, '%Y-%m-01')) OVER (PARTITION BY customer_id) AS first_purchase_month, SUM(total_amount) AS monthly_revenue FROM orders GROUP BY customer_id, DATE_FORMAT(order_date, '%Y-%m-01') ) AS customer_purchases;
-- Sales performance dashboard SELECT salesperson_name, region, monthly_sales, RANK() OVER (PARTITION BY region ORDER BY monthly_sales DESC) AS regional_rank, ROUND(monthly_sales * 100.0 / SUM(monthly_sales) OVER (PARTITION BY region), 2) AS pct_of_regional_sales, monthly_sales - LAG(monthly_sales) OVER (PARTITION BY salesperson_name ORDER BY sale_month) AS month_over_month, AVG(monthly_sales) OVER ( PARTITION BY salesperson_name ORDER BY sale_month ROWS BETWEEN 2 PRECEDING AND CURRENT ROW ) AS avg_3month FROM salesperson_monthly_sales ORDER BY region, regional_rank;
Performance Note: Window functions can be resource-intensive on large datasets. Always add appropriate indexes on PARTITION BY and ORDER BY columns.

Practice Exercise:

Challenge: Write queries using window functions:

  1. Find products whose price is above their category average
  2. Calculate year-over-year growth rate for monthly sales
  3. Identify top 20% of customers by revenue (use NTILE)

Solutions:

-- 1. Products above category average SELECT product_name, category, price, category_avg FROM ( SELECT product_name, category, price, AVG(price) OVER (PARTITION BY category) AS category_avg FROM products ) AS price_comparison WHERE price > category_avg; -- 2. Year-over-year growth SELECT sale_month, monthly_revenue, LAG(monthly_revenue, 12) OVER (ORDER BY sale_month) AS same_month_last_year, ROUND((monthly_revenue - LAG(monthly_revenue, 12) OVER (ORDER BY sale_month)) * 100.0 / LAG(monthly_revenue, 12) OVER (ORDER BY sale_month), 2) AS yoy_growth_pct FROM monthly_sales; -- 3. Top 20% customers SELECT customer_name, total_revenue FROM ( SELECT customer_name, total_revenue, NTILE(5) OVER (ORDER BY total_revenue DESC) AS revenue_quintile FROM customer_revenue ) AS customer_segments WHERE revenue_quintile = 1;

Summary

In this lesson, you mastered:

  • Window function syntax with OVER, PARTITION BY, and ORDER BY
  • ROW_NUMBER(), RANK(), and DENSE_RANK() for rankings
  • LAG() and LEAD() for accessing adjacent rows
  • Running totals and moving averages with frame specifications
  • NTILE() for percentile and quartile analysis
  • Real-world analytics patterns using window functions
Next Up: In the next lesson, we'll explore Common Table Expressions (CTEs) for organizing complex queries and handling recursive data!

ES
Edrees Salih
18 hours ago

We are still cooking the magic in the way!