MySQL & Database Design
Time-Series & Analytics Database Design
Time-series data is essential for tracking metrics, logs, sensor data, and user analytics. In this lesson, we'll design databases optimized for high-volume time-stamped data, implement efficient partitioning strategies, and explore data warehousing concepts for analytics.
Time-Series Data Characteristics
Time-series databases have unique requirements:
Characteristics:
- High write volume (thousands of writes per second; see the batched-insert sketch below)
- Time-based queries (last hour, day, month)
- Mostly append-only (rarely update old data)
- Data grows continuously
- Aggregations over time periods
Common Use Cases:
- Application performance monitoring (APM)
- IoT sensor data
- Financial trading data
- User behavior analytics
- System metrics and logs
- Website traffic analytics
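Sustaining thousands of writes per second in MySQL usually means batching: one multi-row INSERT replaces hundreds of network round trips. A minimal sketch against the metrics table defined in the next section:
INSERT INTO metrics (metric_name, metric_value, tags) VALUES
('cpu_usage', 72.5, '{"host": "web-01", "region": "us-east"}'),
('cpu_usage', 68.1, '{"host": "web-02", "region": "us-east"}'),
('memory_usage', 41.3, '{"host": "web-01", "region": "us-east"}');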
Time-Series Schema Design
Optimized schema for time-series data:
-- Metrics data (optimized for time-series)
CREATE TABLE metrics (
id BIGINT UNSIGNED AUTO_INCREMENT,
metric_name VARCHAR(100) NOT NULL,
metric_value DOUBLE NOT NULL,
tags JSON COMMENT 'Additional dimensions: {host, region, etc}',
recorded_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP, -- plain TIMESTAMP: UNIX_TIMESTAMP() partitioning needs an integer, which fractional TIMESTAMP(3) would break
PRIMARY KEY (recorded_at, id),
INDEX idx_id (id), -- InnoDB: the AUTO_INCREMENT column must lead an index
INDEX idx_metric_time (metric_name, recorded_at) -- no separate recorded_at index needed: the primary key already leads with it
) ENGINE=InnoDB
PARTITION BY RANGE (UNIX_TIMESTAMP(recorded_at)) (
PARTITION p202401 VALUES LESS THAN (UNIX_TIMESTAMP('2024-02-01')),
PARTITION p202402 VALUES LESS THAN (UNIX_TIMESTAMP('2024-03-01')),
PARTITION p202403 VALUES LESS THAN (UNIX_TIMESTAMP('2024-04-01')),
PARTITION p202404 VALUES LESS THAN (UNIX_TIMESTAMP('2024-05-01')),
PARTITION p_future VALUES LESS THAN MAXVALUE
);
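Partition pruning is what makes this layout pay off: a query bounded on recorded_at touches only the matching partitions. EXPLAIN's partitions column confirms it (exact output varies by MySQL version):
EXPLAIN SELECT COUNT(*)
FROM metrics
WHERE recorded_at >= '2024-02-01'
AND recorded_at < '2024-03-01';
-- the partitions column should list only p202402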
-- Page views tracking
CREATE TABLE page_views (
id BIGINT UNSIGNED AUTO_INCREMENT,
user_id BIGINT UNSIGNED NULL,
session_id VARCHAR(100) NOT NULL,
page_url VARCHAR(500) NOT NULL,
referrer_url VARCHAR(500),
user_agent TEXT,
ip_address VARCHAR(45),
country_code CHAR(2),
device_type ENUM('desktop', 'mobile', 'tablet') NOT NULL,
viewed_at DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP, -- DATETIME, not TIMESTAMP: TO_DAYS() partitioning rejects timezone-dependent columns
PRIMARY KEY (viewed_at, id),
INDEX idx_id (id), -- InnoDB: the AUTO_INCREMENT column must lead an index
INDEX idx_user_time (user_id, viewed_at),
INDEX idx_session (session_id, viewed_at),
INDEX idx_url_time (page_url(100), viewed_at)
) ENGINE=InnoDB
PARTITION BY RANGE (TO_DAYS(viewed_at)) (
PARTITION p_week1 VALUES LESS THAN (TO_DAYS('2024-01-08')),
PARTITION p_week2 VALUES LESS THAN (TO_DAYS('2024-01-15')),
PARTITION p_week3 VALUES LESS THAN (TO_DAYS('2024-01-22')),
PARTITION p_week4 VALUES LESS THAN (TO_DAYS('2024-01-29')),
PARTITION p_future VALUES LESS THAN MAXVALUE
);
-- Application events/logs
CREATE TABLE app_events (
id BIGINT UNSIGNED AUTO_INCREMENT,
event_type VARCHAR(50) NOT NULL,
event_category VARCHAR(50) NOT NULL,
user_id BIGINT UNSIGNED NULL,
properties JSON,
severity ENUM('info', 'warning', 'error', 'critical') DEFAULT 'info',
created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (created_at, id),
INDEX idx_id (id), -- InnoDB: the AUTO_INCREMENT column must lead an index
INDEX idx_type_time (event_type, created_at),
INDEX idx_user_time (user_id, created_at),
INDEX idx_severity (severity, created_at)
) ENGINE=InnoDB
PARTITION BY RANGE (UNIX_TIMESTAMP(created_at)) (
PARTITION p_day1 VALUES LESS THAN (UNIX_TIMESTAMP('2024-01-02')),
PARTITION p_day2 VALUES LESS THAN (UNIX_TIMESTAMP('2024-01-03')),
PARTITION p_day3 VALUES LESS THAN (UNIX_TIMESTAMP('2024-01-04')),
PARTITION p_future VALUES LESS THAN MAXVALUE
);
-- System metrics (CPU, memory, disk)
CREATE TABLE system_metrics (
id BIGINT UNSIGNED AUTO_INCREMENT,
hostname VARCHAR(100) NOT NULL,
metric_type ENUM('cpu', 'memory', 'disk', 'network') NOT NULL,
metric_value DOUBLE NOT NULL,
unit VARCHAR(20) NOT NULL COMMENT '%, MB, GB, etc',
recorded_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (recorded_at, id),
INDEX idx_id (id), -- InnoDB: the AUTO_INCREMENT column must lead an index
INDEX idx_host_type (hostname, metric_type, recorded_at)
) ENGINE=InnoDB
PARTITION BY RANGE (UNIX_TIMESTAMP(recorded_at)) (
PARTITION p_hour1 VALUES LESS THAN (UNIX_TIMESTAMP('2024-01-01 01:00:00')),
PARTITION p_hour2 VALUES LESS THAN (UNIX_TIMESTAMP('2024-01-01 02:00:00')),
PARTITION p_hour3 VALUES LESS THAN (UNIX_TIMESTAMP('2024-01-01 03:00:00')),
PARTITION p_future VALUES LESS THAN MAXVALUE
);
Partitioning Strategy: Partition by time to enable fast queries on recent data and easy deletion of old data by dropping partitions.
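One maintenance chore this strategy implies: new partitions must exist before their period's data arrives. Splitting p_future with REORGANIZE PARTITION adds the next month without touching existing data:
-- Add May 2024 by splitting the (empty) p_future partition
ALTER TABLE metrics REORGANIZE PARTITION p_future INTO (
PARTITION p202405 VALUES LESS THAN (UNIX_TIMESTAMP('2024-06-01')),
PARTITION p_future VALUES LESS THAN MAXVALUE
);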
Time-Series Queries
Common patterns for querying time-series data:
-- Get metrics for last hour
SELECT
metric_name,
AVG(metric_value) AS avg_value,
MAX(metric_value) AS max_value,
MIN(metric_value) AS min_value,
COUNT(*) AS data_points
FROM metrics
WHERE recorded_at >= NOW() - INTERVAL 1 HOUR
GROUP BY metric_name;
-- Get page views per hour for last 24 hours
SELECT
DATE_FORMAT(viewed_at, '%Y-%m-%d %H:00:00') AS hour,
COUNT(*) AS views,
COUNT(DISTINCT session_id) AS unique_sessions,
COUNT(DISTINCT user_id) AS unique_users
FROM page_views
WHERE viewed_at >= NOW() - INTERVAL 24 HOUR
GROUP BY hour
ORDER BY hour;
-- Get top pages by views
SELECT
page_url,
COUNT(*) AS view_count,
COUNT(DISTINCT user_id) AS unique_visitors
FROM page_views
WHERE viewed_at >= NOW() - INTERVAL 7 DAY
GROUP BY page_url
ORDER BY view_count DESC
LIMIT 20;
-- Average over 5-minute buckets (a true moving average with window functions is sketched after these queries)
SELECT
FROM_UNIXTIME(FLOOR(UNIX_TIMESTAMP(recorded_at) / 300) * 300) AS time_bucket,
metric_name,
AVG(metric_value) AS avg_value
FROM metrics
WHERE recorded_at >= NOW() - INTERVAL 1 HOUR
AND metric_name = 'cpu_usage'
GROUP BY time_bucket, metric_name
ORDER BY time_bucket;
-- Detect anomalies: values more than 2 standard deviations from the mean (CTEs require MySQL 8.0+)
WITH metric_stats AS (
SELECT
metric_name,
AVG(metric_value) AS mean_value,
STDDEV(metric_value) AS stddev_value
FROM metrics
WHERE recorded_at >= NOW() - INTERVAL 1 DAY
GROUP BY metric_name
)
SELECT
m.recorded_at,
m.metric_name,
m.metric_value,
ms.mean_value,
ms.stddev_value
FROM metrics m
JOIN metric_stats ms ON m.metric_name = ms.metric_name
WHERE m.recorded_at >= NOW() - INTERVAL 1 HOUR
AND ABS(m.metric_value - ms.mean_value) > (2 * ms.stddev_value)
ORDER BY m.recorded_at DESC;
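The bucketed averages above approximate a trend line; for a true moving average, MySQL 8.0 window functions average over the last N readings per metric:
-- Moving average over each metric's 10 most recent readings (MySQL 8.0+)
SELECT
recorded_at,
metric_name,
metric_value,
AVG(metric_value) OVER (
PARTITION BY metric_name
ORDER BY recorded_at
ROWS BETWEEN 9 PRECEDING AND CURRENT ROW
) AS moving_avg
FROM metrics
WHERE recorded_at >= NOW() - INTERVAL 1 HOUR
ORDER BY metric_name, recorded_at;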
Data Aggregation Tables
Pre-aggregate data for faster queries:
-- Hourly aggregations
CREATE TABLE metrics_hourly (
metric_name VARCHAR(100) NOT NULL,
hour_start TIMESTAMP NOT NULL,
avg_value DOUBLE NOT NULL,
min_value DOUBLE NOT NULL,
max_value DOUBLE NOT NULL,
sum_value DOUBLE NOT NULL,
count_value BIGINT UNSIGNED NOT NULL,
PRIMARY KEY (metric_name, hour_start),
INDEX idx_hour (hour_start)
) ENGINE=InnoDB;
-- Daily aggregations
CREATE TABLE metrics_daily (
metric_name VARCHAR(100) NOT NULL,
date DATE NOT NULL,
avg_value DOUBLE NOT NULL,
min_value DOUBLE NOT NULL,
max_value DOUBLE NOT NULL,
sum_value DOUBLE NOT NULL,
count_value BIGINT UNSIGNED NOT NULL,
PRIMARY KEY (metric_name, date),
INDEX idx_date (date)
) ENGINE=InnoDB;
-- Populate hourly aggregations (run every hour)
INSERT INTO metrics_hourly
(metric_name, hour_start, avg_value, min_value, max_value, sum_value, count_value)
SELECT
metric_name,
DATE_FORMAT(recorded_at, '%Y-%m-%d %H:00:00') AS hour_start,
AVG(metric_value) AS avg_value,
MIN(metric_value) AS min_value,
MAX(metric_value) AS max_value,
SUM(metric_value) AS sum_value,
COUNT(*) AS count_value
FROM metrics
WHERE recorded_at >= DATE_FORMAT(NOW() - INTERVAL 1 HOUR, '%Y-%m-%d %H:00:00')
AND recorded_at < DATE_FORMAT(NOW(), '%Y-%m-%d %H:00:00')
GROUP BY metric_name, hour_start
ON DUPLICATE KEY UPDATE
avg_value = VALUES(avg_value),
min_value = VALUES(min_value),
max_value = VALUES(max_value),
sum_value = VALUES(sum_value),
count_value = VALUES(count_value);
-- Query aggregated data (much faster)
SELECT
metric_name,
AVG(avg_value) AS avg_value,
MAX(max_value) AS max_value,
MIN(min_value) AS min_value
FROM metrics_hourly
WHERE hour_start >= NOW() - INTERVAL 7 DAY
GROUP BY metric_name;
Performance Tip: Always query aggregated tables for historical data (older than 24 hours) and only query raw tables for real-time data.
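To keep the rollup current without an external cron job, MySQL's event scheduler can run it hourly. A sketch, assuming the scheduler is enabled (event_scheduler=ON); the event name is illustrative:
CREATE EVENT ev_rollup_metrics_hourly
ON SCHEDULE EVERY 1 HOUR
STARTS CURRENT_TIMESTAMP + INTERVAL 1 HOUR
DO
INSERT INTO metrics_hourly
(metric_name, hour_start, avg_value, min_value, max_value, sum_value, count_value)
SELECT
metric_name,
DATE_FORMAT(recorded_at, '%Y-%m-%d %H:00:00') AS hour_start,
AVG(metric_value), MIN(metric_value), MAX(metric_value),
SUM(metric_value), COUNT(*)
FROM metrics
WHERE recorded_at >= DATE_FORMAT(NOW() - INTERVAL 1 HOUR, '%Y-%m-%d %H:00:00')
AND recorded_at < DATE_FORMAT(NOW(), '%Y-%m-%d %H:00:00')
GROUP BY metric_name, hour_start
ON DUPLICATE KEY UPDATE
avg_value = VALUES(avg_value),
min_value = VALUES(min_value),
max_value = VALUES(max_value),
sum_value = VALUES(sum_value),
count_value = VALUES(count_value);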
Analytics Database (OLAP)
Data warehouse schema for business intelligence:
-- Fact table: Sales transactions
CREATE TABLE fact_sales (
sale_id BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
date_key INT UNSIGNED NOT NULL,
product_key INT UNSIGNED NOT NULL,
customer_key INT UNSIGNED NOT NULL,
store_key INT UNSIGNED NOT NULL,
quantity INT UNSIGNED NOT NULL,
unit_price DECIMAL(10, 2) NOT NULL,
discount_amount DECIMAL(10, 2) DEFAULT 0,
total_amount DECIMAL(10, 2) NOT NULL,
profit_amount DECIMAL(10, 2) NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
INDEX idx_date (date_key),
INDEX idx_product (product_key),
INDEX idx_customer (customer_key),
INDEX idx_store (store_key)
) ENGINE=InnoDB;
-- Dimension table: Date
CREATE TABLE dim_date (
date_key INT UNSIGNED PRIMARY KEY,
full_date DATE NOT NULL,
year INT NOT NULL,
quarter INT NOT NULL,
month INT NOT NULL,
week INT NOT NULL,
day_of_month INT NOT NULL,
day_of_week INT NOT NULL,
day_name VARCHAR(20) NOT NULL,
month_name VARCHAR(20) NOT NULL,
is_weekend BOOLEAN NOT NULL,
is_holiday BOOLEAN DEFAULT FALSE,
UNIQUE KEY unique_date (full_date),
INDEX idx_year_month (year, month)
) ENGINE=InnoDB;
-- Dimension table: Product
CREATE TABLE dim_product (
product_key INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
product_id BIGINT UNSIGNED NOT NULL COMMENT 'ID from OLTP system',
product_name VARCHAR(255) NOT NULL,
category VARCHAR(100) NOT NULL,
subcategory VARCHAR(100),
brand VARCHAR(100),
unit_cost DECIMAL(10, 2) NOT NULL,
is_active BOOLEAN DEFAULT TRUE,
INDEX idx_category (category),
INDEX idx_brand (brand)
) ENGINE=InnoDB;
-- Dimension table: Customer
CREATE TABLE dim_customer (
customer_key INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
customer_id BIGINT UNSIGNED NOT NULL COMMENT 'ID from OLTP system',
customer_name VARCHAR(200) NOT NULL,
customer_type ENUM('individual', 'business') NOT NULL,
country VARCHAR(100) NOT NULL,
region VARCHAR(100) NOT NULL,
city VARCHAR(100) NOT NULL,
customer_since DATE NOT NULL,
lifetime_value DECIMAL(12, 2) DEFAULT 0,
INDEX idx_country (country),
INDEX idx_region (region)
) ENGINE=InnoDB;
-- Dimension table: Store
CREATE TABLE dim_store (
store_key INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
store_id INT UNSIGNED NOT NULL COMMENT 'ID from OLTP system',
store_name VARCHAR(200) NOT NULL,
store_type ENUM('physical', 'online', 'warehouse') NOT NULL,
country VARCHAR(100) NOT NULL,
region VARCHAR(100) NOT NULL,
city VARCHAR(100) NOT NULL,
opened_date DATE NOT NULL,
INDEX idx_country (country),
INDEX idx_type (store_type)
) ENGINE=InnoDB;
Star Schema: Fact table at center connected to dimension tables. Optimized for analytical queries with denormalized dimensions.
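dim_date is normally populated once, ahead of time. A sketch using a recursive CTE (MySQL 8.0+) to generate one row per day of 2024:
INSERT INTO dim_date
(date_key, full_date, year, quarter, month, week,
day_of_month, day_of_week, day_name, month_name, is_weekend, is_holiday)
WITH RECURSIVE calendar (d) AS (
SELECT CAST('2024-01-01' AS DATE)
UNION ALL
SELECT d + INTERVAL 1 DAY FROM calendar WHERE d < '2024-12-31'
)
SELECT
CAST(DATE_FORMAT(d, '%Y%m%d') AS UNSIGNED), -- surrogate key, e.g. 20240101
d,
YEAR(d), QUARTER(d), MONTH(d), WEEK(d),
DAYOFMONTH(d), DAYOFWEEK(d),
DAYNAME(d), MONTHNAME(d),
DAYOFWEEK(d) IN (1, 7), -- DAYOFWEEK: 1 = Sunday, 7 = Saturday
FALSE -- holidays flagged separately, per business calendar
FROM calendar;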
Analytics Queries (OLAP)
Complex analytical queries:
-- Sales by month and category
SELECT
d.year,
d.month_name,
p.category,
SUM(f.total_amount) AS total_sales,
SUM(f.profit_amount) AS total_profit,
COUNT(DISTINCT f.customer_key) AS unique_customers
FROM fact_sales f
JOIN dim_date d ON f.date_key = d.date_key
JOIN dim_product p ON f.product_key = p.product_key
WHERE d.year = 2024
GROUP BY d.year, d.month, d.month_name, p.category
ORDER BY d.month, total_sales DESC;
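GROUP BY ... WITH ROLLUP adds subtotal and grand-total rows to queries like the one above without a second pass:
-- Category subtotals plus a grand total (NULL marks the rollup rows)
SELECT
d.year,
p.category,
SUM(f.total_amount) AS total_sales
FROM fact_sales f
JOIN dim_date d ON f.date_key = d.date_key
JOIN dim_product p ON f.product_key = p.product_key
WHERE d.year = 2024
GROUP BY d.year, p.category WITH ROLLUP;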
-- Top products by profit
SELECT
p.product_name,
p.category,
SUM(f.quantity) AS units_sold,
SUM(f.total_amount) AS revenue,
SUM(f.profit_amount) AS profit,
ROUND(SUM(f.profit_amount) / NULLIF(SUM(f.total_amount), 0) * 100, 2) AS profit_margin
FROM fact_sales f
JOIN dim_product p ON f.product_key = p.product_key
JOIN dim_date d ON f.date_key = d.date_key
WHERE d.year = 2024
GROUP BY p.product_key, p.product_name, p.category
ORDER BY profit DESC
LIMIT 20;
-- Customer segmentation by value
-- (sales are pre-aggregated per customer so the join fan-out cannot skew the averages)
SELECT
CASE
WHEN c.lifetime_value >= 10000 THEN 'VIP'
WHEN c.lifetime_value >= 5000 THEN 'High Value'
WHEN c.lifetime_value >= 1000 THEN 'Medium Value'
ELSE 'Low Value'
END AS customer_segment,
COUNT(*) AS customer_count,
AVG(c.lifetime_value) AS avg_lifetime_value,
SUM(COALESCE(s.total_sales, 0)) AS total_sales
FROM dim_customer c
LEFT JOIN (
SELECT customer_key, SUM(total_amount) AS total_sales
FROM fact_sales
WHERE date_key >= 20240101 -- 2024 sales only
GROUP BY customer_key
) s ON c.customer_key = s.customer_key
GROUP BY customer_segment
ORDER BY avg_lifetime_value DESC;
-- Year-over-year growth
SELECT
d.month_name,
SUM(CASE WHEN d.year = 2024 THEN f.total_amount ELSE 0 END) AS sales_2024,
SUM(CASE WHEN d.year = 2023 THEN f.total_amount ELSE 0 END) AS sales_2023,
ROUND(
(SUM(CASE WHEN d.year = 2024 THEN f.total_amount ELSE 0 END) -
SUM(CASE WHEN d.year = 2023 THEN f.total_amount ELSE 0 END)) /
NULLIF(SUM(CASE WHEN d.year = 2023 THEN f.total_amount ELSE 0 END), 0) * 100,
2) AS growth_percentage
FROM fact_sales f
JOIN dim_date d ON f.date_key = d.date_key
WHERE d.year IN (2023, 2024)
GROUP BY d.month, d.month_name
ORDER BY d.month;
-- Cohort analysis
SELECT
DATE_FORMAT(c.customer_since, '%Y-%m') AS cohort_month,
COUNT(DISTINCT c.customer_key) AS cohort_size,
SUM(f.total_amount) AS total_revenue,
SUM(f.total_amount) / COUNT(DISTINCT c.customer_key) AS avg_revenue_per_customer
FROM dim_customer c
LEFT JOIN fact_sales f ON c.customer_key = f.customer_key
WHERE c.customer_since >= '2024-01-01'
GROUP BY cohort_month
ORDER BY cohort_month;
Data Archiving Strategy
-- Option 1: drop an expired partition outright (instant, reclaims space)
ALTER TABLE metrics DROP PARTITION p202401;
-- Option 2: copy the partition into an archive table first, then drop it
CREATE TABLE metrics_archive LIKE metrics;
ALTER TABLE metrics_archive REMOVE PARTITIONING;
INSERT INTO metrics_archive
SELECT * FROM metrics PARTITION (p202401);
ALTER TABLE metrics DROP PARTITION p202401;
-- One-shot cleanup: delete up to 10,000 rows older than 90 days
DELETE FROM app_events
WHERE created_at < NOW() - INTERVAL 90 DAY
LIMIT 10000;
-- Batch delete to avoid locks
DELIMITER $$
CREATE PROCEDURE cleanup_old_events()
BEGIN
DECLARE rows_deleted INT DEFAULT 1;
WHILE rows_deleted > 0 DO
DELETE FROM app_events
WHERE created_at < NOW() - INTERVAL 90 DAY
LIMIT 1000;
SET rows_deleted = ROW_COUNT();
DO SLEEP(1);
END WHILE;
END$$
DELIMITER ;
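The procedure can then be scheduled to run nightly, again assuming event_scheduler=ON (the event name and 3 AM start time are illustrative):
CREATE EVENT ev_cleanup_old_events
ON SCHEDULE EVERY 1 DAY
STARTS TIMESTAMP(CURDATE() + INTERVAL 1 DAY, '03:00:00')
DO CALL cleanup_old_events();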
Performance Warning: Deleting millions of rows can lock tables. Use partitioning or batch deletes with sleep intervals.
ETL Process (OLTP to OLAP)
-- Extract, Transform, Load from OLTP to OLAP
-- Step 1: Extract new sales from OLTP
CREATE TEMPORARY TABLE temp_new_sales AS
SELECT
o.id AS order_id,
DATE(o.created_at) AS sale_date,
oi.product_id,
o.user_id AS customer_id,
1 AS store_id,
oi.quantity,
oi.unit_price,
0 AS discount_amount,
oi.total_price AS total_amount,
(oi.total_price - (oi.quantity * p.cost_price)) AS profit_amount
FROM orders o
JOIN order_items oi ON o.id = oi.order_id
JOIN products p ON oi.product_id = p.id
WHERE o.created_at >= CURDATE() - INTERVAL 1 DAY
AND o.status = 'completed';
-- Step 2: Load into fact table
INSERT INTO fact_sales
(date_key, product_key, customer_key, store_key,
quantity, unit_price, discount_amount, total_amount, profit_amount)
SELECT
CAST(DATE_FORMAT(t.sale_date, '%Y%m%d') AS UNSIGNED) AS date_key,
dp.product_key,
dc.customer_key,
ds.store_key,
t.quantity,
t.unit_price,
t.discount_amount,
t.total_amount,
t.profit_amount
FROM temp_new_sales t
JOIN dim_product dp ON t.product_id = dp.product_id
JOIN dim_customer dc ON t.customer_id = dc.customer_id
JOIN dim_store ds ON t.store_id = ds.store_id;
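One caveat: re-running this job within the same day would load the same orders twice. A common guard (the etl_watermark table and job name here are illustrative, not part of the earlier schema) is a high-water mark advanced alongside the load:
-- Hypothetical watermark table: remembers how far the last run got
CREATE TABLE etl_watermark (
job_name VARCHAR(100) PRIMARY KEY,
last_loaded_at DATETIME NOT NULL
);
-- In Step 1, extract strictly after the stored mark instead of a fixed window:
-- WHERE o.created_at > (SELECT last_loaded_at FROM etl_watermark WHERE job_name = 'fact_sales_load')
-- After Step 2 succeeds, advance the mark in the same transaction as the fact load:
UPDATE etl_watermark
SET last_loaded_at = NOW()
WHERE job_name = 'fact_sales_load';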
Practice Exercise:
Task: Create a user session analytics system.
Requirements:
- Track user sessions with start/end times
- Record pages viewed per session
- Calculate session duration
- Identify bounce rate (single page sessions)
Solution:
CREATE TABLE user_sessions (
id BIGINT UNSIGNED AUTO_INCREMENT,
session_id VARCHAR(100) UNIQUE NOT NULL,
user_id BIGINT UNSIGNED NULL,
started_at TIMESTAMP NOT NULL,
ended_at TIMESTAMP NULL,
page_count INT UNSIGNED DEFAULT 0,
duration_seconds INT UNSIGNED DEFAULT 0,
device_type ENUM('desktop', 'mobile', 'tablet') NOT NULL,
PRIMARY KEY (started_at, id),
INDEX idx_id (id), -- InnoDB: the AUTO_INCREMENT column must lead an index; session_id already has a unique index
INDEX idx_user (user_id, started_at)
) ENGINE=InnoDB;
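One way to maintain these rows is a single upsert per page view, keyed on the UNIQUE session_id (the ? placeholders stand for application-supplied values):
INSERT INTO user_sessions
(session_id, user_id, started_at, page_count, duration_seconds, device_type)
VALUES (?, ?, NOW(), 1, 0, ?)
ON DUPLICATE KEY UPDATE
ended_at = NOW(),
page_count = page_count + 1,
duration_seconds = TIMESTAMPDIFF(SECOND, started_at, NOW());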
-- Calculate bounce rate
SELECT
DATE(started_at) AS date,
COUNT(*) AS total_sessions,
SUM(CASE WHEN page_count = 1 THEN 1 ELSE 0 END) AS bounce_sessions,
ROUND(SUM(CASE WHEN page_count = 1 THEN 1 ELSE 0 END) / COUNT(*) * 100, 2) AS bounce_rate
FROM user_sessions
WHERE started_at >= CURDATE() - INTERVAL 7 DAY
GROUP BY date
ORDER BY date;
-- Average session duration by device
SELECT
device_type,
COUNT(*) AS sessions,
ROUND(AVG(duration_seconds), 0) AS avg_duration_seconds,
ROUND(AVG(page_count), 1) AS avg_pages_per_session
FROM user_sessions
WHERE started_at >= CURDATE() - INTERVAL 30 DAY
AND duration_seconds > 0
GROUP BY device_type;
Summary
In this lesson, you learned:
- Designing databases for time-series data
- Implementing table partitioning for performance
- Creating pre-aggregated summary tables
- Building star schema for data warehousing
- Writing analytical queries for business intelligence
- ETL processes from OLTP to OLAP
- Data archiving and cleanup strategies
Next Up: In the final lesson, we'll review all database design patterns and best practices covered throughout this tutorial!