Time-Series & Analytics Database Design

Time-series data is essential for tracking metrics, logs, sensor data, and user analytics. In this lesson, we'll design databases optimized for high-volume time-stamped data, implement efficient partitioning strategies, and explore data warehousing concepts for analytics.

Time-Series Data Characteristics

Time-series databases have unique requirements:

Characteristics:

  • High write volume (thousands of writes per second)
  • Time-based queries (last hour, day, month)
  • Mostly append-only (old data is rarely updated)
  • Data grows continuously
  • Aggregations over time periods

Common Use Cases:

  • Application performance monitoring (APM)
  • IoT sensor data
  • Financial trading data
  • User behavior analytics
  • System metrics and logs
  • Website traffic analytics

Time-Series Schema Design

Optimized schema for time-series data:

-- Metrics data (optimized for time-series)
CREATE TABLE metrics (
    id BIGINT UNSIGNED AUTO_INCREMENT,
    metric_name VARCHAR(100) NOT NULL,
    metric_value DOUBLE NOT NULL,
    tags JSON COMMENT 'Additional dimensions: {host, region, etc}',
    recorded_at TIMESTAMP(3) NOT NULL DEFAULT CURRENT_TIMESTAMP(3),
    PRIMARY KEY (recorded_at, id),
    INDEX idx_id (id),  -- InnoDB requires the AUTO_INCREMENT column to lead an index
    INDEX idx_metric_time (metric_name, recorded_at)
    -- A separate index on recorded_at alone would be redundant with the primary key
) ENGINE=InnoDB
PARTITION BY RANGE (UNIX_TIMESTAMP(recorded_at)) (
    PARTITION p202401 VALUES LESS THAN (UNIX_TIMESTAMP('2024-02-01')),
    PARTITION p202402 VALUES LESS THAN (UNIX_TIMESTAMP('2024-03-01')),
    PARTITION p202403 VALUES LESS THAN (UNIX_TIMESTAMP('2024-04-01')),
    PARTITION p202404 VALUES LESS THAN (UNIX_TIMESTAMP('2024-05-01')),
    PARTITION p_future VALUES LESS THAN MAXVALUE
);

-- Page views tracking (viewed_at is DATETIME: MySQL does not allow TO_DAYS()
-- to partition on a timezone-dependent TIMESTAMP column)
CREATE TABLE page_views (
    id BIGINT UNSIGNED AUTO_INCREMENT,
    user_id BIGINT UNSIGNED NULL,
    session_id VARCHAR(100) NOT NULL,
    page_url VARCHAR(500) NOT NULL,
    referrer_url VARCHAR(500),
    user_agent TEXT,
    ip_address VARCHAR(45),
    country_code CHAR(2),
    device_type ENUM('desktop', 'mobile', 'tablet') NOT NULL,
    viewed_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (viewed_at, id),
    INDEX idx_id (id),
    INDEX idx_user_time (user_id, viewed_at),
    INDEX idx_session (session_id, viewed_at),
    INDEX idx_url_time (page_url(100), viewed_at)
) ENGINE=InnoDB
PARTITION BY RANGE (TO_DAYS(viewed_at)) (
    PARTITION p_week1 VALUES LESS THAN (TO_DAYS('2024-01-08')),
    PARTITION p_week2 VALUES LESS THAN (TO_DAYS('2024-01-15')),
    PARTITION p_week3 VALUES LESS THAN (TO_DAYS('2024-01-22')),
    PARTITION p_week4 VALUES LESS THAN (TO_DAYS('2024-01-29')),
    PARTITION p_future VALUES LESS THAN MAXVALUE
);

-- Application events/logs
CREATE TABLE app_events (
    id BIGINT UNSIGNED AUTO_INCREMENT,
    event_type VARCHAR(50) NOT NULL,
    event_category VARCHAR(50) NOT NULL,
    user_id BIGINT UNSIGNED NULL,
    properties JSON,
    severity ENUM('info', 'warning', 'error', 'critical') DEFAULT 'info',
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (created_at, id),
    INDEX idx_id (id),
    INDEX idx_type_time (event_type, created_at),
    INDEX idx_user_time (user_id, created_at),
    INDEX idx_severity (severity, created_at)
) ENGINE=InnoDB
PARTITION BY RANGE (UNIX_TIMESTAMP(created_at)) (
    PARTITION p_day1 VALUES LESS THAN (UNIX_TIMESTAMP('2024-01-02')),
    PARTITION p_day2 VALUES LESS THAN (UNIX_TIMESTAMP('2024-01-03')),
    PARTITION p_day3 VALUES LESS THAN (UNIX_TIMESTAMP('2024-01-04')),
    PARTITION p_future VALUES LESS THAN MAXVALUE
);

-- System metrics (CPU, memory, disk)
CREATE TABLE system_metrics (
    id BIGINT UNSIGNED AUTO_INCREMENT,
    hostname VARCHAR(100) NOT NULL,
    metric_type ENUM('cpu', 'memory', 'disk', 'network') NOT NULL,
    metric_value DOUBLE NOT NULL,
    unit VARCHAR(20) NOT NULL COMMENT '%, MB, GB, etc',
    recorded_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
    PRIMARY KEY (recorded_at, id),
    INDEX idx_id (id),
    INDEX idx_host_type (hostname, metric_type, recorded_at)
) ENGINE=InnoDB
PARTITION BY RANGE (UNIX_TIMESTAMP(recorded_at)) (
    PARTITION p_hour1 VALUES LESS THAN (UNIX_TIMESTAMP('2024-01-01 01:00:00')),
    PARTITION p_hour2 VALUES LESS THAN (UNIX_TIMESTAMP('2024-01-01 02:00:00')),
    PARTITION p_hour3 VALUES LESS THAN (UNIX_TIMESTAMP('2024-01-01 03:00:00')),
    PARTITION p_future VALUES LESS THAN MAXVALUE
);
Partitioning Strategy: Partition by time to enable fast queries on recent data and easy deletion of old data by dropping partitions.
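
Time-based partitions need upkeep: new partitions must exist before data arrives, and expired ones get dropped. A minimal sketch of a monthly maintenance pass for the metrics table above (it assumes p_future is still empty or nearly so, since REORGANIZE rewrites that partition's rows):

-- Split next month out of the catch-all partition
ALTER TABLE metrics REORGANIZE PARTITION p_future INTO (
    PARTITION p202405 VALUES LESS THAN (UNIX_TIMESTAMP('2024-06-01')),
    PARTITION p_future VALUES LESS THAN MAXVALUE
);

-- Retire the oldest month: a near-instant metadata operation, unlike DELETE
ALTER TABLE metrics DROP PARTITION p202401;

In production, statements like these are usually generated and run by a scheduled job rather than by hand.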

Time-Series Queries

Common patterns for querying time-series data:

-- Get metrics for the last hour
SELECT
    metric_name,
    AVG(metric_value) AS avg_value,
    MAX(metric_value) AS max_value,
    MIN(metric_value) AS min_value,
    COUNT(*) AS data_points
FROM metrics
WHERE recorded_at >= NOW() - INTERVAL 1 HOUR
GROUP BY metric_name;

-- Get page views per hour for the last 24 hours
SELECT
    DATE_FORMAT(viewed_at, '%Y-%m-%d %H:00:00') AS hour,
    COUNT(*) AS views,
    COUNT(DISTINCT session_id) AS unique_sessions,
    COUNT(DISTINCT user_id) AS unique_users
FROM page_views
WHERE viewed_at >= NOW() - INTERVAL 24 HOUR
GROUP BY hour
ORDER BY hour;

-- Get top pages by views
SELECT
    page_url,
    COUNT(*) AS view_count,
    COUNT(DISTINCT user_id) AS unique_visitors
FROM page_views
WHERE viewed_at >= NOW() - INTERVAL 7 DAY
GROUP BY page_url
ORDER BY view_count DESC
LIMIT 20;

-- Downsample into 5-minute buckets (tumbling windows, not a true moving average)
SELECT
    FROM_UNIXTIME(FLOOR(UNIX_TIMESTAMP(recorded_at) / 300) * 300) AS time_bucket,
    metric_name,
    AVG(metric_value) AS avg_value
FROM metrics
WHERE recorded_at >= NOW() - INTERVAL 1 HOUR
  AND metric_name = 'cpu_usage'
GROUP BY time_bucket, metric_name
ORDER BY time_bucket;

-- Detect anomalies: values more than 2 standard deviations from the mean
-- (the CTE requires MySQL 8.0+)
WITH metric_stats AS (
    SELECT
        metric_name,
        AVG(metric_value) AS mean_value,
        STDDEV(metric_value) AS stddev_value
    FROM metrics
    WHERE recorded_at >= NOW() - INTERVAL 1 DAY
    GROUP BY metric_name
)
SELECT
    m.recorded_at,
    m.metric_name,
    m.metric_value,
    ms.mean_value,
    ms.stddev_value
FROM metrics m
JOIN metric_stats ms ON m.metric_name = ms.metric_name
WHERE m.recorded_at >= NOW() - INTERVAL 1 HOUR
  AND ABS(m.metric_value - ms.mean_value) > (2 * ms.stddev_value)
ORDER BY m.recorded_at DESC;
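
It is worth confirming that such queries actually prune partitions. In MySQL 8.0, EXPLAIN includes a partitions column; for a query constrained to March it should list only p202403 (if every partition appears, the WHERE clause is not prunable):

-- Confirm partition pruning: the partitions column should show only p202403
EXPLAIN SELECT COUNT(*)
FROM metrics
WHERE recorded_at >= '2024-03-01'
  AND recorded_at < '2024-04-01';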

Data Aggregation Tables

Pre-aggregate data for faster queries:

-- Hourly aggregations
CREATE TABLE metrics_hourly (
    metric_name VARCHAR(100) NOT NULL,
    hour_start TIMESTAMP NOT NULL,
    avg_value DOUBLE NOT NULL,
    min_value DOUBLE NOT NULL,
    max_value DOUBLE NOT NULL,
    sum_value DOUBLE NOT NULL,
    count_value BIGINT UNSIGNED NOT NULL,
    PRIMARY KEY (metric_name, hour_start),
    INDEX idx_hour (hour_start)
) ENGINE=InnoDB;

-- Daily aggregations
CREATE TABLE metrics_daily (
    metric_name VARCHAR(100) NOT NULL,
    date DATE NOT NULL,
    avg_value DOUBLE NOT NULL,
    min_value DOUBLE NOT NULL,
    max_value DOUBLE NOT NULL,
    sum_value DOUBLE NOT NULL,
    count_value BIGINT UNSIGNED NOT NULL,
    PRIMARY KEY (metric_name, date),
    INDEX idx_date (date)
) ENGINE=InnoDB;

-- Populate hourly aggregations (run every hour; note that VALUES() in
-- ON DUPLICATE KEY UPDATE is deprecated as of MySQL 8.0.20 in favor of a row alias)
INSERT INTO metrics_hourly
SELECT
    metric_name,
    DATE_FORMAT(recorded_at, '%Y-%m-%d %H:00:00') AS hour_start,
    AVG(metric_value) AS avg_value,
    MIN(metric_value) AS min_value,
    MAX(metric_value) AS max_value,
    SUM(metric_value) AS sum_value,
    COUNT(*) AS count_value
FROM metrics
WHERE recorded_at >= DATE_FORMAT(NOW() - INTERVAL 1 HOUR, '%Y-%m-%d %H:00:00')
  AND recorded_at < DATE_FORMAT(NOW(), '%Y-%m-%d %H:00:00')
GROUP BY metric_name, hour_start
ON DUPLICATE KEY UPDATE
    avg_value = VALUES(avg_value),
    min_value = VALUES(min_value),
    max_value = VALUES(max_value),
    sum_value = VALUES(sum_value),
    count_value = VALUES(count_value);

-- Query aggregated data (much faster than scanning raw rows)
-- Note: AVG of hourly averages is unweighted; use SUM(sum_value) / SUM(count_value)
-- when an exact average is needed
SELECT
    metric_name,
    AVG(avg_value) AS avg_value,
    MAX(max_value) AS max_value,
    MIN(min_value) AS min_value
FROM metrics_hourly
WHERE hour_start >= NOW() - INTERVAL 7 DAY
GROUP BY metric_name;
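
The daily table needs populating too. One sketch: roll it up from metrics_hourly rather than from raw rows, which keeps the nightly job cheap and, by weighting with count_value, keeps the daily average exact:

-- Roll yesterday's hourly rows up into the daily table
INSERT INTO metrics_daily
SELECT
    metric_name,
    DATE(hour_start) AS date,
    SUM(sum_value) / SUM(count_value) AS avg_value,  -- weighted, so exact
    MIN(min_value) AS min_value,
    MAX(max_value) AS max_value,
    SUM(sum_value) AS sum_value,
    SUM(count_value) AS count_value
FROM metrics_hourly
WHERE hour_start >= CURDATE() - INTERVAL 1 DAY
  AND hour_start < CURDATE()
GROUP BY metric_name, date
ON DUPLICATE KEY UPDATE
    avg_value = VALUES(avg_value),
    min_value = VALUES(min_value),
    max_value = VALUES(max_value),
    sum_value = VALUES(sum_value),
    count_value = VALUES(count_value);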
Performance Tip: Always query aggregated tables for historical data (older than 24 hours) and only query raw tables for real-time data.
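
In practice that often means stitching the two sources together: aggregated rows for history plus raw rows for the hour that has not been rolled up yet. A sketch along those lines:

-- 7-day series: rolled-up history plus raw data for the current hour
SELECT hour_start, avg_value
FROM metrics_hourly
WHERE metric_name = 'cpu_usage'
  AND hour_start >= NOW() - INTERVAL 7 DAY
UNION ALL
SELECT
    DATE_FORMAT(recorded_at, '%Y-%m-%d %H:00:00') AS hour_start,
    AVG(metric_value) AS avg_value
FROM metrics
WHERE metric_name = 'cpu_usage'
  AND recorded_at >= DATE_FORMAT(NOW(), '%Y-%m-%d %H:00:00')
GROUP BY hour_start
ORDER BY hour_start;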

Analytics Database (OLAP)

Data warehouse schema for business intelligence:

-- Fact table: Sales transactions
CREATE TABLE fact_sales (
    sale_id BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    date_key INT UNSIGNED NOT NULL,
    product_key INT UNSIGNED NOT NULL,
    customer_key INT UNSIGNED NOT NULL,
    store_key INT UNSIGNED NOT NULL,
    quantity INT UNSIGNED NOT NULL,
    unit_price DECIMAL(10, 2) NOT NULL,
    discount_amount DECIMAL(10, 2) DEFAULT 0,
    total_amount DECIMAL(10, 2) NOT NULL,
    profit_amount DECIMAL(10, 2) NOT NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    INDEX idx_date (date_key),
    INDEX idx_product (product_key),
    INDEX idx_customer (customer_key),
    INDEX idx_store (store_key)
) ENGINE=InnoDB;

-- Dimension table: Date
CREATE TABLE dim_date (
    date_key INT UNSIGNED PRIMARY KEY,
    full_date DATE NOT NULL,
    year INT NOT NULL,
    quarter INT NOT NULL,
    month INT NOT NULL,
    week INT NOT NULL,
    day_of_month INT NOT NULL,
    day_of_week INT NOT NULL,
    day_name VARCHAR(20) NOT NULL,
    month_name VARCHAR(20) NOT NULL,
    is_weekend BOOLEAN NOT NULL,
    is_holiday BOOLEAN DEFAULT FALSE,
    UNIQUE KEY unique_date (full_date),
    INDEX idx_year_month (year, month)
) ENGINE=InnoDB;

-- Dimension table: Product
CREATE TABLE dim_product (
    product_key INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    product_id BIGINT UNSIGNED NOT NULL COMMENT 'ID from OLTP system',
    product_name VARCHAR(255) NOT NULL,
    category VARCHAR(100) NOT NULL,
    subcategory VARCHAR(100),
    brand VARCHAR(100),
    unit_cost DECIMAL(10, 2) NOT NULL,
    is_active BOOLEAN DEFAULT TRUE,
    INDEX idx_category (category),
    INDEX idx_brand (brand)
) ENGINE=InnoDB;

-- Dimension table: Customer
CREATE TABLE dim_customer (
    customer_key INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    customer_id BIGINT UNSIGNED NOT NULL COMMENT 'ID from OLTP system',
    customer_name VARCHAR(200) NOT NULL,
    customer_type ENUM('individual', 'business') NOT NULL,
    country VARCHAR(100) NOT NULL,
    region VARCHAR(100) NOT NULL,
    city VARCHAR(100) NOT NULL,
    customer_since DATE NOT NULL,
    lifetime_value DECIMAL(12, 2) DEFAULT 0,
    INDEX idx_country (country),
    INDEX idx_region (region)
) ENGINE=InnoDB;

-- Dimension table: Store
CREATE TABLE dim_store (
    store_key INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    store_id INT UNSIGNED NOT NULL COMMENT 'ID from OLTP system',
    store_name VARCHAR(200) NOT NULL,
    store_type ENUM('physical', 'online', 'warehouse') NOT NULL,
    country VARCHAR(100) NOT NULL,
    region VARCHAR(100) NOT NULL,
    city VARCHAR(100) NOT NULL,
    opened_date DATE NOT NULL,
    INDEX idx_country (country),
    INDEX idx_type (store_type)
) ENGINE=InnoDB;
Star Schema: Fact table at center connected to dimension tables. Optimized for analytical queries with denormalized dimensions.
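
The date dimension is usually pre-populated years in advance. A sketch that fills dim_date for 2024 with a recursive CTE (MySQL 8.0+); date_key follows the YYYYMMDD convention that the fact table load relies on:

INSERT INTO dim_date
    (date_key, full_date, year, quarter, month, week,
     day_of_month, day_of_week, day_name, month_name, is_weekend)
WITH RECURSIVE dates AS (
    SELECT DATE('2024-01-01') AS d
    UNION ALL
    SELECT d + INTERVAL 1 DAY FROM dates WHERE d < '2024-12-31'
)
SELECT
    CAST(DATE_FORMAT(d, '%Y%m%d') AS UNSIGNED),
    d,
    YEAR(d), QUARTER(d), MONTH(d), WEEK(d),
    DAYOFMONTH(d), DAYOFWEEK(d), DAYNAME(d), MONTHNAME(d),
    DAYOFWEEK(d) IN (1, 7)  -- 1 = Sunday, 7 = Saturday
FROM dates;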

Analytics Queries (OLAP)

Complex analytical queries:

-- Sales by month and category
SELECT
    d.year,
    d.month_name,
    p.category,
    SUM(f.total_amount) AS total_sales,
    SUM(f.profit_amount) AS total_profit,
    COUNT(DISTINCT f.customer_key) AS unique_customers
FROM fact_sales f
JOIN dim_date d ON f.date_key = d.date_key
JOIN dim_product p ON f.product_key = p.product_key
WHERE d.year = 2024
GROUP BY d.year, d.month, d.month_name, p.category
ORDER BY d.month, total_sales DESC;

-- Top products by profit
SELECT
    p.product_name,
    p.category,
    SUM(f.quantity) AS units_sold,
    SUM(f.total_amount) AS revenue,
    SUM(f.profit_amount) AS profit,
    ROUND(SUM(f.profit_amount) / SUM(f.total_amount) * 100, 2) AS profit_margin
FROM fact_sales f
JOIN dim_product p ON f.product_key = p.product_key
JOIN dim_date d ON f.date_key = d.date_key
WHERE d.year = 2024
GROUP BY p.product_key, p.product_name, p.category
ORDER BY profit DESC
LIMIT 20;

-- Customer segmentation by value
SELECT
    CASE
        WHEN c.lifetime_value >= 10000 THEN 'VIP'
        WHEN c.lifetime_value >= 5000 THEN 'High Value'
        WHEN c.lifetime_value >= 1000 THEN 'Medium Value'
        ELSE 'Low Value'
    END AS customer_segment,
    COUNT(DISTINCT c.customer_key) AS customer_count,
    AVG(c.lifetime_value) AS avg_lifetime_value,
    SUM(f.total_amount) AS total_sales
FROM dim_customer c
LEFT JOIN fact_sales f
    ON c.customer_key = f.customer_key
    AND f.date_key >= 20240101
GROUP BY customer_segment
ORDER BY avg_lifetime_value DESC;

-- Year-over-year growth
SELECT
    d.month_name,
    SUM(CASE WHEN d.year = 2024 THEN f.total_amount ELSE 0 END) AS sales_2024,
    SUM(CASE WHEN d.year = 2023 THEN f.total_amount ELSE 0 END) AS sales_2023,
    ROUND(
        (SUM(CASE WHEN d.year = 2024 THEN f.total_amount ELSE 0 END) -
         SUM(CASE WHEN d.year = 2023 THEN f.total_amount ELSE 0 END)) /
        NULLIF(SUM(CASE WHEN d.year = 2023 THEN f.total_amount ELSE 0 END), 0) * 100,
    2) AS growth_percentage
FROM fact_sales f
JOIN dim_date d ON f.date_key = d.date_key
WHERE d.year IN (2023, 2024)
GROUP BY d.month, d.month_name
ORDER BY d.month;

-- Cohort analysis
SELECT
    DATE_FORMAT(c.customer_since, '%Y-%m') AS cohort_month,
    COUNT(DISTINCT c.customer_key) AS cohort_size,
    SUM(f.total_amount) AS total_revenue,
    -- divide by cohort size, not by sale count, for a true per-customer figure
    SUM(f.total_amount) / COUNT(DISTINCT c.customer_key) AS avg_revenue_per_customer
FROM dim_customer c
LEFT JOIN fact_sales f ON c.customer_key = f.customer_key
WHERE c.customer_since >= '2024-01-01'
GROUP BY cohort_month
ORDER BY cohort_month;
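
Window functions (MySQL 8.0+) extend these patterns, for instance a running year-to-date total layered over the monthly aggregation. A sketch:

-- Monthly sales with a running year-to-date total
SELECT
    d.month_name,
    SUM(f.total_amount) AS monthly_sales,
    SUM(SUM(f.total_amount)) OVER (ORDER BY d.month) AS ytd_sales
FROM fact_sales f
JOIN dim_date d ON f.date_key = d.date_key
WHERE d.year = 2024
GROUP BY d.month, d.month_name
ORDER BY d.month;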

Data Archiving Strategy

-- Option 1: Simply drop expired partitions (a metadata operation, no row-by-row delete)
ALTER TABLE metrics DROP PARTITION p202401;

-- Option 2: Copy the partition into an archive table first, then drop it
CREATE TABLE metrics_archive LIKE metrics;
ALTER TABLE metrics_archive REMOVE PARTITIONING;

INSERT INTO metrics_archive
SELECT * FROM metrics PARTITION (p202401);

ALTER TABLE metrics DROP PARTITION p202401;

-- Automated cleanup (delete data older than 90 days)
DELETE FROM app_events
WHERE created_at < NOW() - INTERVAL 90 DAY
LIMIT 10000;

-- Batch delete to avoid long-held locks
DELIMITER $$

CREATE PROCEDURE cleanup_old_events()
BEGIN
    DECLARE rows_deleted INT DEFAULT 1;
    WHILE rows_deleted > 0 DO
        DELETE FROM app_events
        WHERE created_at < NOW() - INTERVAL 90 DAY
        LIMIT 1000;
        SET rows_deleted = ROW_COUNT();
        DO SLEEP(1);
    END WHILE;
END$$

DELIMITER ;
Performance Warning: Deleting millions of rows in a single statement holds locks for a long time, bloats the undo log, and can stall replication. Prefer dropping partitions, or use batch deletes with sleep intervals.
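
The batch-delete procedure above still needs to be scheduled. A sketch using the MySQL event scheduler to run it nightly (assumes the event_scheduler system variable is ON):

CREATE EVENT ev_cleanup_old_events
ON SCHEDULE EVERY 1 DAY
STARTS CURRENT_TIMESTAMP + INTERVAL 1 DAY
DO
    CALL cleanup_old_events();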

ETL Process (OLTP to OLAP)

-- Extract, Transform, Load from OLTP to OLAP

-- Step 1: Extract new sales from the OLTP system
CREATE TEMPORARY TABLE temp_new_sales AS
SELECT
    o.id AS order_id,
    DATE(o.created_at) AS sale_date,
    oi.product_id,
    o.user_id AS customer_id,
    1 AS store_id,
    oi.quantity,
    oi.unit_price,
    0 AS discount_amount,
    oi.total_price AS total_amount,
    (oi.total_price - (oi.quantity * p.cost_price)) AS profit_amount
FROM orders o
JOIN order_items oi ON o.id = oi.order_id
JOIN products p ON oi.product_id = p.id
WHERE o.created_at >= CURDATE() - INTERVAL 1 DAY
  AND o.status = 'completed';

-- Step 2: Load into the fact table
INSERT INTO fact_sales
    (date_key, product_key, customer_key, store_key, quantity,
     unit_price, discount_amount, total_amount, profit_amount)
SELECT
    CAST(DATE_FORMAT(t.sale_date, '%Y%m%d') AS UNSIGNED) AS date_key,
    dp.product_key,
    dc.customer_key,
    ds.store_key,
    t.quantity,
    t.unit_price,
    t.discount_amount,
    t.total_amount,
    t.profit_amount
FROM temp_new_sales t
JOIN dim_product dp ON t.product_id = dp.product_id
JOIN dim_customer dc ON t.customer_id = dc.customer_id
JOIN dim_store ds ON t.store_id = ds.store_id;
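
One caveat: the fact load assumes every referenced dimension row already exists, so real pipelines run a "Step 0" that syncs the dimensions first. A sketch for dim_product, assuming a unique key on dim_product.product_id and that the OLTP products table has name, category, subcategory, brand, cost_price, and is_active columns (only cost_price appears in the schema above; the rest are assumptions):

-- Step 0: Upsert products into the dimension (simple overwrite, i.e. SCD type 1)
INSERT INTO dim_product
    (product_id, product_name, category, subcategory, brand, unit_cost, is_active)
SELECT id, name, category, subcategory, brand, cost_price, is_active
FROM products
ON DUPLICATE KEY UPDATE
    product_name = VALUES(product_name),
    category = VALUES(category),
    subcategory = VALUES(subcategory),
    brand = VALUES(brand),
    unit_cost = VALUES(unit_cost),
    is_active = VALUES(is_active);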

Practice Exercise:

Task: Create a user session analytics system.

Requirements:

  1. Track user sessions with start/end times
  2. Record pages viewed per session
  3. Calculate session duration
  4. Identify bounce rate (single page sessions)

Solution:

CREATE TABLE user_sessions (
    id BIGINT UNSIGNED AUTO_INCREMENT,
    session_id VARCHAR(100) NOT NULL,
    user_id BIGINT UNSIGNED NULL,
    started_at TIMESTAMP NOT NULL,
    ended_at TIMESTAMP NULL,
    page_count INT UNSIGNED DEFAULT 0,
    duration_seconds INT UNSIGNED DEFAULT 0,
    device_type ENUM('desktop', 'mobile', 'tablet') NOT NULL,
    PRIMARY KEY (started_at, id),
    INDEX idx_id (id),  -- InnoDB requires the AUTO_INCREMENT column to lead an index
    UNIQUE KEY uk_session (session_id),  -- also covers lookups; no extra index needed
    INDEX idx_user (user_id, started_at)
) ENGINE=InnoDB;

-- Calculate bounce rate (single-page sessions)
SELECT
    DATE(started_at) AS date,
    COUNT(*) AS total_sessions,
    SUM(CASE WHEN page_count = 1 THEN 1 ELSE 0 END) AS bounce_sessions,
    ROUND(SUM(CASE WHEN page_count = 1 THEN 1 ELSE 0 END) / COUNT(*) * 100, 2) AS bounce_rate
FROM user_sessions
WHERE started_at >= CURDATE() - INTERVAL 7 DAY
GROUP BY date
ORDER BY date;

-- Average session duration by device
SELECT
    device_type,
    COUNT(*) AS sessions,
    ROUND(AVG(duration_seconds), 0) AS avg_duration_seconds,
    ROUND(AVG(page_count), 1) AS avg_pages_per_session
FROM user_sessions
WHERE started_at >= CURDATE() - INTERVAL 30 DAY
  AND duration_seconds > 0
GROUP BY device_type;
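
The table alone does not satisfy requirements 2 and 3; something has to maintain page_count, ended_at, and duration_seconds as page views arrive. One sketch: upsert on the unique session_id for every page view (the literal values are placeholders the application would supply):

-- Run once per page view; the UNIQUE key on session_id turns this into an upsert
INSERT INTO user_sessions
    (session_id, user_id, started_at, ended_at, page_count, device_type)
VALUES
    ('sess-placeholder', 42, NOW(), NOW(), 1, 'desktop')
ON DUPLICATE KEY UPDATE
    ended_at = NOW(),
    page_count = page_count + 1,
    duration_seconds = TIMESTAMPDIFF(SECOND, started_at, NOW());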

Summary

In this lesson, you learned:

  • Designing databases for time-series data
  • Implementing table partitioning for performance
  • Creating pre-aggregated summary tables
  • Building star schema for data warehousing
  • Writing analytical queries for business intelligence
  • ETL processes from OLTP to OLAP
  • Data archiving and cleanup strategies
Next Up: In the final lesson, we'll review all database design patterns and best practices covered throughout this tutorial!