Performance Best Practices

In this final lesson of the Indexes & Performance module, we'll explore comprehensive performance best practices including connection pooling, batch operations, pagination optimization, and strategies to avoid common performance pitfalls.

Query Caching (Removed in MySQL 8.0)

The query cache was deprecated in MySQL 5.7.20 and removed entirely in MySQL 8.0 due to scalability issues. Understanding why helps you avoid similar patterns:

Old Approach (MySQL 5.7 and earlier):

-- The query cache stored exact SELECT results
-- A second, identical query returned the cached result
-- Problem: the cache was invalidated on ANY change to the table

Why it was removed:
- Poor scalability on multi-core systems
- Cache invalidation was too aggressive
- It didn't work well with write-heavy workloads

Modern Alternative: application-level caching (Redis, Memcached)
+ More flexible cache invalidation
+ Works across multiple MySQL servers
+ Better control over what to cache
Best Practice: Implement caching at the application layer using Redis or Memcached. This gives you fine-grained control over cache expiration and invalidation strategies.
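As an illustration, here is a minimal cache-aside sketch using PHP's phpredis extension. The user:{id} key scheme, the 300-second TTL, and the selected columns are illustrative assumptions, not fixed recommendations.

// Cache-aside pattern: check Redis first, fall back to MySQL,
// then populate the cache with a TTL.
function getUserCached(Redis $redis, PDO $db, int $userId)
{
    $key = "user:{$userId}";

    // 1. Try the cache first
    $cached = $redis->get($key);
    if ($cached !== false) {
        return json_decode($cached, true);
    }

    // 2. Cache miss: query MySQL
    $stmt = $db->prepare('SELECT id, name, email FROM users WHERE id = ?');
    $stmt->execute([$userId]);
    $user = $stmt->fetch(PDO::FETCH_ASSOC);

    // 3. Store with a TTL so stale entries expire on their own
    if ($user !== false) {
        $redis->setex($key, 300, json_encode($user));
    }
    return $user;
}

// Invalidate explicitly whenever the underlying row changes
function updateUserEmail(Redis $redis, PDO $db, int $userId, string $email)
{
    $stmt = $db->prepare('UPDATE users SET email = ? WHERE id = ?');
    $stmt->execute([$email, $userId]);
    $redis->del("user:{$userId}");
}

The explicit del() on write is exactly what the query cache could never do well: you decide which entries a change invalidates, instead of flushing everything.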

Connection Pooling

Opening and closing database connections is expensive. Connection pooling reuses connections for better performance:

Without Connection Pooling (Inefficient)

// Bad: Creating a new connection for each request
function getUserData($userId) {
    $conn = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');
    $stmt = $conn->prepare('SELECT * FROM users WHERE id = ?');
    $stmt->execute([$userId]);
    $result = $stmt->fetch();
    $conn = null; // Close connection
    return $result;
}

// Problem: For 100 requests, this creates 100 connections!
// Each connection takes ~10-50ms to establish

With Connection Pooling (Efficient)

// Good: Reuse a persistent connection
function getUserData($userId) {
    // PDO persistent connections are kept open and reused across
    // requests within the same PHP worker process (not a full pool,
    // but they avoid the connection-setup cost)
    $conn = new PDO(
        'mysql:host=localhost;dbname=mydb',
        'user',
        'pass',
        [PDO::ATTR_PERSISTENT => true]
    );
    $stmt = $conn->prepare('SELECT * FROM users WHERE id = ?');
    $stmt->execute([$userId]);
    return $stmt->fetch();
    // The connection stays open for the next request instead of closing
}

// For 100 requests, a handful of persistent connections are reused,
// eliminating most of the connection overhead

One caution: a persistent connection carries its session state (open transactions, temporary tables, session variables) over to the next request, so always leave it in a clean state.

Connection Pool Configuration

MySQL Server Settings (my.cnf):

[mysqld]
max_connections = 200        # Maximum concurrent connections
max_connect_errors = 10      # Block a host after N failed connection attempts
wait_timeout = 28800         # Close idle connections after 8 hours
interactive_timeout = 28800  # Same for interactive clients

Best Practices:
✓ Set max_connections based on your traffic (100-500 is typical)
✓ Use persistent connections in PHP/Python/Node.js
✓ Close connections explicitly in long-running scripts
✓ Monitor connection usage with SHOW PROCESSLIST
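To see how close you are to that limit, you can also poll MySQL's status counters from application code. A small sketch: Threads_connected, Max_used_connections, and max_connections are standard MySQL names, while the 80% alert threshold and the monitoring credentials are assumptions.

// Compare current and peak connection counts against max_connections.
// SHOW STATUS rows come back as (Variable_name, Value) pairs.
$pdo = new PDO('mysql:host=localhost', 'monitor', 'pass');

$current = (int) $pdo->query("SHOW STATUS LIKE 'Threads_connected'")
                     ->fetch(PDO::FETCH_ASSOC)['Value'];
$peak    = (int) $pdo->query("SHOW STATUS LIKE 'Max_used_connections'")
                     ->fetch(PDO::FETCH_ASSOC)['Value'];
$limit   = (int) $pdo->query("SHOW VARIABLES LIKE 'max_connections'")
                     ->fetch(PDO::FETCH_ASSOC)['Value'];

echo "Connections: {$current} now, {$peak} peak, limit {$limit}\n";

// The 80% threshold is an arbitrary assumption; tune it to your traffic
if ($current > 0.8 * $limit) {
    error_log('MySQL connection usage above 80% of max_connections');
}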

Batch Operations

Processing multiple rows in a single query is much faster than individual queries:

Slow: Individual INSERTs

-- Bad: 1000 separate INSERT statements
INSERT INTO users (name, email) VALUES ('John', 'john@example.com');
INSERT INTO users (name, email) VALUES ('Jane', 'jane@example.com');
INSERT INTO users (name, email) VALUES ('Bob', 'bob@example.com');
-- ... 997 more times

Performance: 1000 queries × 5ms per query = 5000ms (5 seconds)

Fast: Batch INSERT

-- Good: Single INSERT with multiple value lists
INSERT INTO users (name, email) VALUES
    ('John', 'john@example.com'),
    ('Jane', 'jane@example.com'),
    ('Bob', 'bob@example.com'),
    -- ... up to 1000 rows
    ('Alice', 'alice@example.com');

Performance: 1 query = ~50ms (100x faster!)

Batch UPDATE Example

-- Slow: Individual UPDATEs
UPDATE users SET status = 'active' WHERE id = 1;
UPDATE users SET status = 'active' WHERE id = 2;
UPDATE users SET status = 'active' WHERE id = 3;
-- ... many more

-- Fast: Single UPDATE with an IN clause
UPDATE users SET status = 'active'
WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10);

-- Even better: Use CASE for different values
UPDATE users
SET status = CASE
    WHEN id IN (1, 2, 3) THEN 'active'
    WHEN id IN (4, 5, 6) THEN 'inactive'
    WHEN id IN (7, 8, 9) THEN 'pending'
END
WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9);
Batch Size Limits: MySQL has a max_allowed_packet setting (64MB by default in MySQL 8.0; older versions default much lower). For very large batches, break them into chunks of 1000-5000 rows to stay under this limit, as sketched below.
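A chunked bulk insert might look like the following sketch. The 1000-row chunk size follows the guidance above; the table, columns, and row format are illustrative assumptions.

// Insert a large array of rows in chunks so each statement stays
// well under max_allowed_packet. Rows are ['name' => ..., 'email' => ...].
function batchInsertUsers(PDO $db, array $rows, int $chunkSize = 1000)
{
    foreach (array_chunk($rows, $chunkSize) as $chunk) {
        // One "(?, ?)" group per row: "(?, ?), (?, ?), ..."
        $placeholders = implode(', ', array_fill(0, count($chunk), '(?, ?)'));
        $stmt = $db->prepare(
            "INSERT INTO users (name, email) VALUES {$placeholders}"
        );

        // Flatten the chunk into one flat parameter list
        $params = [];
        foreach ($chunk as $row) {
            $params[] = $row['name'];
            $params[] = $row['email'];
        }
        $stmt->execute($params);
    }
}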

Pagination Optimization

Naive pagination becomes slow as offset increases. Here are better approaches:

Naive Pagination (Slow for Large Offsets)

-- Page 1 (fast)
SELECT * FROM articles ORDER BY id LIMIT 20 OFFSET 0;

-- Page 1000 (VERY SLOW)
SELECT * FROM articles ORDER BY id LIMIT 20 OFFSET 20000;
-- MySQL must scan 20,020 rows to return 20!

Problem: as the page number increases, performance degrades.
Page 10,000 scans 200,000 rows just to return 20.

Keyset Pagination (Fast for All Pages)

-- Page 1: Get the first 20 articles
SELECT * FROM articles ORDER BY id LIMIT 20;
-- Returns IDs 1-20

-- Page 2: Use the last ID from the previous page (id = 20)
SELECT * FROM articles WHERE id > 20 ORDER BY id LIMIT 20;
-- Returns IDs 21-40 (fast! only scans 20 rows)

-- Page 3: Use the last ID from page 2 (id = 40)
SELECT * FROM articles WHERE id > 40 ORDER BY id LIMIT 20;
-- Returns IDs 41-60 (still fast!)

Advantages:
✓ Consistent performance regardless of page number
✓ Works efficiently with indexes
✓ Perfect for "Load More" or infinite-scroll UIs
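In application code, the client simply sends back the last ID it received. A minimal "load more" sketch: the $afterId parameter (0 for the first page), the page size, and the column list are assumptions.

// Keyset-paginated page fetch. $afterId is the last ID the client
// saw (0 for the first page). $pageSize is an int, so it is safe
// to interpolate into LIMIT (PDO placeholders in LIMIT are awkward).
function getArticlesPage(PDO $db, int $afterId = 0, int $pageSize = 20)
{
    $stmt = $db->prepare(
        'SELECT id, title FROM articles WHERE id > ? ORDER BY id LIMIT ' . $pageSize
    );
    $stmt->execute([$afterId]);
    $articles = $stmt->fetchAll(PDO::FETCH_ASSOC);

    // The client echoes nextAfterId back to request the next page
    $nextAfterId = $articles ? end($articles)['id'] : null;
    return ['articles' => $articles, 'nextAfterId' => $nextAfterId];
}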

Pagination with Composite Key

-- Paginating by created_at (not unique)
CREATE INDEX idx_created_id ON articles(created_at, id);

-- Page 1
SELECT * FROM articles
ORDER BY created_at DESC, id DESC
LIMIT 20;
-- Last item: created_at = '2024-01-15', id = 458

-- Page 2: Use both values
SELECT * FROM articles
WHERE (created_at, id) < ('2024-01-15', 458)
ORDER BY created_at DESC, id DESC
LIMIT 20;

This approach:
✓ Works with non-unique ordering columns
✓ Uses the composite index efficiently
✓ Maintains consistent performance

Note: if your MySQL version doesn't use the index for the row-constructor comparison, the expanded form is an equivalent fallback:
created_at < '2024-01-15' OR (created_at = '2024-01-15' AND id < 458)

The N+1 Query Problem

One of the most common performance killers in database applications:

The Problem

-- Get 100 users (1 query)
SELECT * FROM users LIMIT 100;

-- Then, for each user, get their posts (100 queries!)
-- Loop:
SELECT * FROM posts WHERE user_id = 1;
SELECT * FROM posts WHERE user_id = 2;
SELECT * FROM posts WHERE user_id = 3;
-- ... 97 more queries

Total: 1 + 100 = 101 queries
If each query takes 5ms: 101 × 5ms = 505ms

Solution 1: JOIN

-- Single query with a JOIN (1 query total!)
SELECT
    u.id, u.name, u.email,
    p.id AS post_id, p.title, p.content
FROM users u
LEFT JOIN posts p ON u.id = p.user_id
LIMIT 100;

Total: 1 query
Query time: ~20ms (25x faster!)

Note: with the JOIN, each user's columns repeat once per post, and LIMIT 100 caps the joined rows rather than the number of users. If you need exactly 100 users, limit the users in a derived table first.

Solution 2: IN Clause (Eager Loading)

-- Get users (1 query)
SELECT * FROM users LIMIT 100;
-- Result: user IDs 1, 2, 3, ..., 100

-- Get all posts for these users (1 query)
SELECT * FROM posts WHERE user_id IN (1, 2, 3, 4, 5, ..., 100);

Total: 2 queries
Much better than 101 queries!
Common in ORMs: The N+1 problem is extremely common when using ORMs (Eloquent, Doctrine, etc.) if you don't use eager loading. Always use the ORM's eager-loading mechanism, such as `with()` in Laravel's Eloquent or fetch joins in Doctrine DQL.
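To make the difference concrete, here is a hedged Eloquent sketch, assuming a Laravel `User` model with a `posts()` relationship:

// Lazy loading (N+1): one query for the users,
// then one query per user the first time ->posts is touched
$users = User::limit(100)->get();
foreach ($users as $user) {
    $count = $user->posts->count(); // triggers a query per user
}

// Eager loading: 2 queries total (users, then one IN query for posts)
$users = User::with('posts')->limit(100)->get();
foreach ($users as $user) {
    $count = $user->posts->count(); // already in memory, no extra query
}

Under the hood, with() runs exactly the IN-clause pattern shown above.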

Database Profiling

MySQL provides tools to identify slow queries and performance bottlenecks:

Enable Profiling for Current Session

-- Enable profiling for the current session
SET profiling = 1;

-- Run your queries
SELECT * FROM users WHERE country = 'USA';
SELECT * FROM orders WHERE status = 'pending';
SELECT COUNT(*) FROM products;

-- View the query list with timings
SHOW PROFILES;

+----------+------------+------------------------------------------------+
| Query_ID | Duration   | Query                                          |
+----------+------------+------------------------------------------------+
|        1 | 0.00234500 | SELECT * FROM users WHERE country = 'USA'      |
|        2 | 0.15678900 | SELECT * FROM orders WHERE status = 'pending'  |
|        3 | 0.00089200 | SELECT COUNT(*) FROM products                  |
+----------+------------+------------------------------------------------+

Analysis: query 2 is slow (~157ms) and needs optimization.

Note: SHOW PROFILES has been deprecated since MySQL 5.6.7. It still works, but the Performance Schema (covered below) is the long-term replacement.

Detailed Profile Information

-- Get a detailed breakdown for a specific query
SHOW PROFILE FOR QUERY 2;

+----------------------+----------+
| Status               | Duration |
+----------------------+----------+
| starting             | 0.000087 |
| checking permissions | 0.000012 |
| Opening tables       | 0.000034 |
| init                 | 0.000045 |
| System lock          | 0.000015 |
| optimizing           | 0.000023 |
| statistics           | 0.000098 |
| preparing            | 0.000034 |
| executing            | 0.150234 |  ← 96% of the time is here!
| end                  | 0.000012 |
| query end            | 0.000008 |
| closing tables       | 0.000011 |
| freeing items        | 0.000098 |
| cleaning up          | 0.000023 |
+----------------------+----------+

Bottleneck identified: the "executing" phase (a full table scan is the likely cause).

Performance Monitoring Tools

1. SHOW PROCESSLIST - View Active Queries

SHOW FULL PROCESSLIST;
-- Shows all current connections and their queries
-- Useful for spotting long-running or locked queries

2. Performance Schema (MySQL 5.5+)

-- Most expensive queries by total time
SELECT
    DIGEST_TEXT,
    COUNT_STAR AS executions,
    AVG_TIMER_WAIT / 1000000000 AS avg_ms,
    SUM_TIMER_WAIT / 1000000000 AS total_ms
FROM performance_schema.events_statements_summary_by_digest
ORDER BY SUM_TIMER_WAIT DESC
LIMIT 10;

3. Slow Query Log (covered in the previous lesson)

SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;  -- Log queries taking longer than 1 second

4. sys Schema (MySQL 5.7+)

-- User-friendly views over the Performance Schema
SELECT * FROM sys.statement_analysis LIMIT 10;
SELECT * FROM sys.statements_with_full_table_scans;
SELECT * FROM sys.schema_index_statistics;

Performance Optimization Checklist

Database Design:
✓ Normalize to 3NF, denormalize strategically
✓ Choose appropriate data types (INT vs BIGINT)
✓ Use ENUM for small, fixed sets of values
✓ Consider partitioning for very large tables

Indexing:
✓ Index foreign keys
✓ Index columns used in WHERE, JOIN, and ORDER BY
✓ Use composite indexes for multi-column queries
✓ Remove unused or redundant indexes
✓ Monitor index usage with sys.schema_unused_indexes

Query Optimization:
✓ Use EXPLAIN for all important queries
✓ Avoid SELECT * (select only the columns you need)
✓ Avoid functions on indexed columns in WHERE clauses
✓ Use EXISTS instead of IN for large subqueries
✓ Batch INSERT/UPDATE operations
✓ Use keyset pagination instead of OFFSET

Connection Management:
✓ Use connection pooling
✓ Close connections in long-running scripts
✓ Set an appropriate max_connections
✓ Monitor active connections

Caching:
✓ Implement application-level caching (Redis)
✓ Cache expensive queries
✓ Set appropriate cache expiration times
✓ Invalidate the cache on data changes

Monitoring:
✓ Enable the slow query log
✓ Review slow queries weekly
✓ Use the Performance Schema for detailed analysis
✓ Monitor server resources (CPU, RAM, disk I/O)
✓ Set up alerts for performance degradation

Common Performance Anti-Patterns

1. SELECT * in Loops

-- Bad: fetches all columns and creates an N+1 problem
-- for each user:
--     SELECT * FROM users WHERE id = ?

-- Good: a single query with JOIN or IN
SELECT u.*, p.*
FROM users u
LEFT JOIN posts p ON u.id = p.user_id;

2. Inefficient Counting

-- Bad: without an index on status, this scans the entire table
SELECT COUNT(*) FROM orders WHERE status = 'pending';

-- Good: let a covering index satisfy the count
CREATE INDEX idx_status ON orders(status);
-- Or maintain counters in a separate summary table

3. Using OR Instead of UNION

-- Often slower: OR across two different columns can prevent
-- efficient index use (MySQL's index merge sometimes helps,
-- but not reliably)
SELECT * FROM users WHERE email = ? OR username = ?;

-- Often faster: each branch uses its own index
SELECT * FROM users WHERE email = ?
UNION
SELECT * FROM users WHERE username = ?;

4. Wildcard at Start of LIKE

-- Can't use an index
SELECT * FROM products WHERE name LIKE '%phone%';

-- Can use an index
SELECT * FROM products WHERE name LIKE 'phone%';

5. Not Using LIMIT

-- Bad: returns every matching row
SELECT * FROM logs ORDER BY created_at DESC;

-- Good: limits the result set
SELECT * FROM logs ORDER BY created_at DESC LIMIT 100;

Practice Exercise:

Scenario: Your application is slow. Here's the problematic code:

// Get all users
$users = query("SELECT * FROM users LIMIT 1000");

// For each user, get their order count and total spent
foreach ($users as $user) {
    $orderCount = query(
        "SELECT COUNT(*) FROM orders WHERE customer_id = " . $user['id']
    );
    $totalSpent = query(
        "SELECT SUM(total) FROM orders WHERE customer_id = " . $user['id']
    );
    echo $user['name'] . ": " . $orderCount . " orders, $" . $totalSpent;
}

Problems:
1. SELECT * fetches unnecessary columns
2. N+1 query problem (1 + 1000 + 1000 = 2001 queries!)
3. No indexes mentioned
4. Not using prepared statements (SQL injection risk)

Task: Optimize this code.

Solution:

Step 1: Create the necessary index

CREATE INDEX idx_customer_id ON orders(customer_id);

Step 2: Rewrite as a single query with a JOIN

$results = query("
    SELECT
        u.id,
        u.name,
        COUNT(o.id) AS order_count,
        COALESCE(SUM(o.total), 0) AS total_spent
    FROM users u
    LEFT JOIN orders o ON u.id = o.customer_id
    GROUP BY u.id, u.name
    LIMIT 1000
");

foreach ($results as $row) {
    echo $row['name'] . ": " . $row['order_count']
        . " orders, $" . $row['total_spent'];
}

Performance Improvement:
Before: 2001 queries × 5ms = 10,005ms (10 seconds)
After: 1 query = ~50ms (200x faster!)

Alternative: If the JOIN is too complex, use 2 queries

// Query 1: Get users
$users = query("SELECT id, name FROM users LIMIT 1000");
$userIds = array_column($users, 'id');

// Query 2: Get aggregated order data for all users at once
$orderStats = query("
    SELECT customer_id, COUNT(*) AS order_count, SUM(total) AS total_spent
    FROM orders
    WHERE customer_id IN (" . implode(',', $userIds) . ")
    GROUP BY customer_id
");

// Merge the results in application code
// Still much better: 2 queries vs 2001!
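The alternative above still interpolates IDs into the SQL string. That is safe here because the IDs come straight from the database, but problem 4 deserves an explicit fix. A sketch with PDO placeholders, assuming a PDO handle $db in place of the exercise's generic query() helper:

// Same two-query approach, but with bound parameters, so it stays
// safe even if the IDs ever come from user input.
$users = $db->query('SELECT id, name FROM users LIMIT 1000')
            ->fetchAll(PDO::FETCH_ASSOC);
$userIds = array_column($users, 'id');
// (Assumes at least one user; IN () with an empty list is a syntax error)

// One "?" placeholder per ID
$in = implode(', ', array_fill(0, count($userIds), '?'));
$stmt = $db->prepare("
    SELECT customer_id, COUNT(*) AS order_count, SUM(total) AS total_spent
    FROM orders
    WHERE customer_id IN ({$in})
    GROUP BY customer_id
");
$stmt->execute($userIds);

// Index the stats by customer_id, then merge with the user list
$stats = [];
foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $row) {
    $stats[$row['customer_id']] = $row;
}
foreach ($users as $user) {
    $s = $stats[$user['id']] ?? ['order_count' => 0, 'total_spent' => 0];
    echo $user['name'] . ': ' . $s['order_count']
        . ' orders, $' . $s['total_spent'] . "\n";
}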

Summary

In this lesson, you learned:

  • Query cache was removed in MySQL 8.0 - use application-level caching instead
  • Connection pooling reuses connections for significant performance gains
  • Batch operations (bulk INSERT/UPDATE) are 100x faster than individual queries
  • Keyset pagination maintains performance for all page numbers
  • The N+1 query problem kills performance - use JOINs or eager loading
  • Profile queries with SHOW PROFILES to identify bottlenecks
  • Use Performance Schema and sys views for comprehensive monitoring
  • Follow the optimization checklist for database design, indexing, and queries
  • Avoid common anti-patterns like SELECT * in loops and inefficient LIKE patterns
Module Complete! You've mastered indexes and performance optimization. In the next module, we'll explore advanced database objects including views, stored procedures, functions, and triggers!