Performance Best Practices

In this final lesson of the Indexes & Performance module, we'll explore comprehensive performance best practices including connection pooling, batch operations, pagination optimization, and strategies to avoid common performance pitfalls.

Query Caching (Removed in MySQL 8.0)

The query cache was deprecated in MySQL 5.7.20 and removed entirely in MySQL 8.0 due to scalability issues. Understanding why helps you avoid similar patterns:

Old Approach (MySQL 5.7 and earlier):

-- The query cache stored exact SELECT results
-- A second, identical query returned the cached result
-- Problem: the cache was invalidated on ANY change to the table

Why it was removed:
- Poor scalability on multi-core systems
- Cache invalidation was too aggressive
- It didn't work well with write-heavy workloads

Modern Alternative: application-level caching (Redis, Memcached)
+ More flexible cache invalidation
+ Works across multiple MySQL servers
+ Better control over what to cache
Best Practice: Implement caching at the application layer using Redis or Memcached. This gives you fine-grained control over cache expiration and invalidation strategies.
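As an illustration, here is a minimal cache-aside sketch using PHP's phpredis extension. The user:{id} key scheme, the 300-second TTL, and the selected columns are illustrative assumptions, not fixed recommendations.

// Cache-aside pattern: check Redis first, fall back to MySQL,
// then populate the cache with a TTL.
function getUserCached(Redis $redis, PDO $db, int $userId)
{
    $key = "user:{$userId}";

    // 1. Try the cache first
    $cached = $redis->get($key);
    if ($cached !== false) {
        return json_decode($cached, true);
    }

    // 2. Cache miss: query MySQL
    $stmt = $db->prepare('SELECT id, name, email FROM users WHERE id = ?');
    $stmt->execute([$userId]);
    $user = $stmt->fetch(PDO::FETCH_ASSOC);

    // 3. Store with a TTL so stale entries expire on their own
    if ($user !== false) {
        $redis->setex($key, 300, json_encode($user));
    }
    return $user;
}

// Invalidate explicitly whenever the underlying row changes
function updateUserEmail(Redis $redis, PDO $db, int $userId, string $email)
{
    $stmt = $db->prepare('UPDATE users SET email = ? WHERE id = ?');
    $stmt->execute([$email, $userId]);
    $redis->del("user:{$userId}");
}

The explicit del() on write is exactly what the query cache could never do well: you decide which entries a change invalidates, instead of flushing everything.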

Connection Pooling

Opening and closing database connections is expensive. Connection pooling reuses connections for better performance:

Without Connection Pooling (Inefficient)

// Bad: Creating a new connection for each request
function getUserData($userId) {
    $conn = new PDO('mysql:host=localhost;dbname=mydb', 'user', 'pass');
    $stmt = $conn->prepare('SELECT * FROM users WHERE id = ?');
    $stmt->execute([$userId]);
    $result = $stmt->fetch();
    $conn = null; // Close connection
    return $result;
}

// Problem: For 100 requests, this creates 100 connections!
// Each connection takes ~10-50ms to establish

With Connection Pooling (Efficient)

// Good: Reuse a persistent connection
function getUserData($userId) {
    // PDO persistent connections are kept open and reused across
    // requests within the same PHP worker process (not a full pool,
    // but they avoid the connection-setup cost)
    $conn = new PDO(
        'mysql:host=localhost;dbname=mydb',
        'user',
        'pass',
        [PDO::ATTR_PERSISTENT => true]
    );
    $stmt = $conn->prepare('SELECT * FROM users WHERE id = ?');
    $stmt->execute([$userId]);
    return $stmt->fetch();
    // The connection stays open for the next request instead of closing
}

// For 100 requests, a handful of persistent connections are reused,
// eliminating most of the connection overhead

One caution: a persistent connection carries its session state (open transactions, temporary tables, session variables) over to the next request, so always leave it in a clean state.

Connection Pool Configuration

MySQL Server Settings (my.cnf):

[mysqld]
max_connections = 200        # Maximum concurrent connections
max_connect_errors = 10      # Block a host after N failed connection attempts
wait_timeout = 28800         # Close idle connections after 8 hours
interactive_timeout = 28800  # Same for interactive clients

Best Practices:
✓ Set max_connections based on your traffic (100-500 is typical)
✓ Use persistent connections in PHP/Python/Node.js
✓ Close connections explicitly in long-running scripts
✓ Monitor connection usage with SHOW PROCESSLIST
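To see how close you are to that limit, you can also poll MySQL's status counters from application code. A small sketch: Threads_connected, Max_used_connections, and max_connections are standard MySQL names, while the 80% alert threshold and the monitoring credentials are assumptions.

// Compare current and peak connection counts against max_connections.
// SHOW STATUS rows come back as (Variable_name, Value) pairs.
$pdo = new PDO('mysql:host=localhost', 'monitor', 'pass');

$current = (int) $pdo->query("SHOW STATUS LIKE 'Threads_connected'")
                     ->fetch(PDO::FETCH_ASSOC)['Value'];
$peak    = (int) $pdo->query("SHOW STATUS LIKE 'Max_used_connections'")
                     ->fetch(PDO::FETCH_ASSOC)['Value'];
$limit   = (int) $pdo->query("SHOW VARIABLES LIKE 'max_connections'")
                     ->fetch(PDO::FETCH_ASSOC)['Value'];

echo "Connections: {$current} now, {$peak} peak, limit {$limit}\n";

// The 80% threshold is an arbitrary assumption; tune it to your traffic
if ($current > 0.8 * $limit) {
    error_log('MySQL connection usage above 80% of max_connections');
}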

Batch Operations

Processing multiple rows in a single query is much faster than individual queries:

Slow: Individual INSERTs

-- Bad: 1000 separate INSERT statements
INSERT INTO users (name, email) VALUES ('John', 'john@example.com');
INSERT INTO users (name, email) VALUES ('Jane', 'jane@example.com');
INSERT INTO users (name, email) VALUES ('Bob', 'bob@example.com');
-- ... 997 more times

Performance: 1000 queries × 5ms per query = 5000ms (5 seconds)

Fast: Batch INSERT

-- Good: Single INSERT with multiple value lists
INSERT INTO users (name, email) VALUES
    ('John', 'john@example.com'),
    ('Jane', 'jane@example.com'),
    ('Bob', 'bob@example.com'),
    -- ... up to 1000 rows
    ('Alice', 'alice@example.com');

Performance: 1 query = ~50ms (100x faster!)

Batch UPDATE Example

-- Slow: Individual UPDATEs
UPDATE users SET status = 'active' WHERE id = 1;
UPDATE users SET status = 'active' WHERE id = 2;
UPDATE users SET status = 'active' WHERE id = 3;
-- ... many more

-- Fast: Single UPDATE with an IN clause
UPDATE users SET status = 'active'
WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9, 10);

-- Even better: Use CASE for different values
UPDATE users
SET status = CASE
    WHEN id IN (1, 2, 3) THEN 'active'
    WHEN id IN (4, 5, 6) THEN 'inactive'
    WHEN id IN (7, 8, 9) THEN 'pending'
END
WHERE id IN (1, 2, 3, 4, 5, 6, 7, 8, 9);
Batch Size Limits: MySQL has a max_allowed_packet setting (64MB by default in MySQL 8.0; older versions default much lower). For very large batches, break them into chunks of 1000-5000 rows to stay under this limit, as sketched below.
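A chunked bulk insert might look like the following sketch. The 1000-row chunk size follows the guidance above; the table, columns, and row format are illustrative assumptions.

// Insert a large array of rows in chunks so each statement stays
// well under max_allowed_packet. Rows are ['name' => ..., 'email' => ...].
function batchInsertUsers(PDO $db, array $rows, int $chunkSize = 1000)
{
    foreach (array_chunk($rows, $chunkSize) as $chunk) {
        // One "(?, ?)" group per row: "(?, ?), (?, ?), ..."
        $placeholders = implode(', ', array_fill(0, count($chunk), '(?, ?)'));
        $stmt = $db->prepare(
            "INSERT INTO users (name, email) VALUES {$placeholders}"
        );

        // Flatten the chunk into one flat parameter list
        $params = [];
        foreach ($chunk as $row) {
            $params[] = $row['name'];
            $params[] = $row['email'];
        }
        $stmt->execute($params);
    }
}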

Pagination Optimization

Naive pagination becomes slow as offset increases. Here are better approaches:

Naive Pagination (Slow for Large Offsets)

-- Page 1 (fast)
SELECT * FROM articles ORDER BY id LIMIT 20 OFFSET 0;

-- Page 1000 (VERY SLOW)
SELECT * FROM articles ORDER BY id LIMIT 20 OFFSET 20000;
-- MySQL must scan 20,020 rows to return 20!

Problem: as the page number increases, performance degrades.
Page 10,000 scans 200,000 rows just to return 20.

Keyset Pagination (Fast for All Pages)

-- Page 1: Get the first 20 articles
SELECT * FROM articles ORDER BY id LIMIT 20;
-- Returns IDs 1-20

-- Page 2: Use the last ID from the previous page (id = 20)
SELECT * FROM articles WHERE id > 20 ORDER BY id LIMIT 20;
-- Returns IDs 21-40 (fast! only scans 20 rows)

-- Page 3: Use the last ID from page 2 (id = 40)
SELECT * FROM articles WHERE id > 40 ORDER BY id LIMIT 20;
-- Returns IDs 41-60 (still fast!)

Advantages:
✓ Consistent performance regardless of page number
✓ Works efficiently with indexes
✓ Perfect for "Load More" or infinite-scroll UIs
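In application code, the client simply sends back the last ID it received. A minimal "load more" sketch: the $afterId parameter (0 for the first page), the page size, and the column list are assumptions.

// Keyset-paginated page fetch. $afterId is the last ID the client
// saw (0 for the first page). $pageSize is an int, so it is safe
// to interpolate into LIMIT (PDO placeholders in LIMIT are awkward).
function getArticlesPage(PDO $db, int $afterId = 0, int $pageSize = 20)
{
    $stmt = $db->prepare(
        'SELECT id, title FROM articles WHERE id > ? ORDER BY id LIMIT ' . $pageSize
    );
    $stmt->execute([$afterId]);
    $articles = $stmt->fetchAll(PDO::FETCH_ASSOC);

    // The client echoes nextAfterId back to request the next page
    $nextAfterId = $articles ? end($articles)['id'] : null;
    return ['articles' => $articles, 'nextAfterId' => $nextAfterId];
}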

Pagination with Composite Key

-- Paginating by created_at (not unique)
CREATE INDEX idx_created_id ON articles(created_at, id);

-- Page 1
SELECT * FROM articles
ORDER BY created_at DESC, id DESC
LIMIT 20;
-- Last item: created_at = '2024-01-15', id = 458

-- Page 2: Use both values
SELECT * FROM articles
WHERE (created_at, id) < ('2024-01-15', 458)
ORDER BY created_at DESC, id DESC
LIMIT 20;

This approach:
✓ Works with non-unique ordering columns
✓ Uses the composite index efficiently
✓ Maintains consistent performance

Note: if your MySQL version doesn't use the index for the row-constructor comparison, the expanded form is an equivalent fallback:
created_at < '2024-01-15' OR (created_at = '2024-01-15' AND id < 458)

The N+1 Query Problem

One of the most common performance killers in database applications:

The Problem

-- Get 100 users (1 query)
SELECT * FROM users LIMIT 100;

-- Then, for each user, get their posts (100 queries!)
-- Loop:
SELECT * FROM posts WHERE user_id = 1;
SELECT * FROM posts WHERE user_id = 2;
SELECT * FROM posts WHERE user_id = 3;
-- ... 97 more queries

Total: 1 + 100 = 101 queries
If each query takes 5ms: 101 × 5ms = 505ms

Solution 1: JOIN

-- Single query with a JOIN (1 query total!)
SELECT
    u.id, u.name, u.email,
    p.id AS post_id, p.title, p.content
FROM users u
LEFT JOIN posts p ON u.id = p.user_id
LIMIT 100;

Total: 1 query
Query time: ~20ms (25x faster!)

Note: with the JOIN, each user's columns repeat once per post, and LIMIT 100 caps the joined rows rather than the number of users. If you need exactly 100 users, limit the users in a derived table first.

Solution 2: IN Clause (Eager Loading)

-- Get users (1 query)
SELECT * FROM users LIMIT 100;
-- Result: user IDs 1, 2, 3, ..., 100

-- Get all posts for these users (1 query)
SELECT * FROM posts WHERE user_id IN (1, 2, 3, 4, 5, ..., 100);

Total: 2 queries
Much better than 101 queries!
Common in ORMs: The N+1 problem is extremely common when using ORMs (Eloquent, Doctrine, etc.) if you don't use eager loading. Always use the ORM's eager-loading mechanism, such as `with()` in Laravel's Eloquent or fetch joins in Doctrine DQL.
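To make the difference concrete, here is a hedged Eloquent sketch, assuming a Laravel `User` model with a `posts()` relationship:

// Lazy loading (N+1): one query for the users,
// then one query per user the first time ->posts is touched
$users = User::limit(100)->get();
foreach ($users as $user) {
    $count = $user->posts->count(); // triggers a query per user
}

// Eager loading: 2 queries total (users, then one IN query for posts)
$users = User::with('posts')->limit(100)->get();
foreach ($users as $user) {
    $count = $user->posts->count(); // already in memory, no extra query
}

Under the hood, with() runs exactly the IN-clause pattern shown above.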

Database Profiling

MySQL provides tools to identify slow queries and performance bottlenecks:

Enable Profiling for Current Session

-- Enable profiling for the current session
SET profiling = 1;

-- Run your queries
SELECT * FROM users WHERE country = 'USA';
SELECT * FROM orders WHERE status = 'pending';
SELECT COUNT(*) FROM products;

-- View the query list with timings
SHOW PROFILES;

+----------+------------+------------------------------------------------+
| Query_ID | Duration   | Query                                          |
+----------+------------+------------------------------------------------+
|        1 | 0.00234500 | SELECT * FROM users WHERE country = 'USA'      |
|        2 | 0.15678900 | SELECT * FROM orders WHERE status = 'pending'  |
|        3 | 0.00089200 | SELECT COUNT(*) FROM products                  |
+----------+------------+------------------------------------------------+

Analysis: query 2 is slow (~157ms) and needs optimization.

Note: SHOW PROFILES has been deprecated since MySQL 5.6.7. It still works, but the Performance Schema (covered below) is the long-term replacement.

Detailed Profile Information

-- Get a detailed breakdown for a specific query
SHOW PROFILE FOR QUERY 2;

+----------------------+----------+
| Status               | Duration |
+----------------------+----------+
| starting             | 0.000087 |
| checking permissions | 0.000012 |
| Opening tables       | 0.000034 |
| init                 | 0.000045 |
| System lock          | 0.000015 |
| optimizing           | 0.000023 |
| statistics           | 0.000098 |
| preparing            | 0.000034 |
| executing            | 0.150234 |  ← 96% of the time is here!
| end                  | 0.000012 |
| query end            | 0.000008 |
| closing tables       | 0.000011 |
| freeing items        | 0.000098 |
| cleaning up          | 0.000023 |
+----------------------+----------+

Bottleneck identified: the "executing" phase (a full table scan is the likely cause).

Performance Monitoring Tools

1. SHOW PROCESSLIST - View Active Queries

SHOW FULL PROCESSLIST;
-- Shows all current connections and their queries
-- Useful for spotting long-running or locked queries

2. Performance Schema (MySQL 5.5+)

-- Most expensive queries by total time
SELECT
    DIGEST_TEXT,
    COUNT_STAR AS executions,
    AVG_TIMER_WAIT / 1000000000 AS avg_ms,
    SUM_TIMER_WAIT / 1000000000 AS total_ms
FROM performance_schema.events_statements_summary_by_digest
ORDER BY SUM_TIMER_WAIT DESC
LIMIT 10;

3. Slow Query Log (covered in the previous lesson)

SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;  -- Log queries taking longer than 1 second

4. sys Schema (MySQL 5.7+)

-- User-friendly views over the Performance Schema
SELECT * FROM sys.statement_analysis LIMIT 10;
SELECT * FROM sys.statements_with_full_table_scans;
SELECT * FROM sys.schema_index_statistics;

Performance Optimization Checklist

Database Design:
✓ Normalize to 3NF, denormalize strategically
✓ Choose appropriate data types (INT vs BIGINT)
✓ Use ENUM for small, fixed sets of values
✓ Consider partitioning for very large tables

Indexing:
✓ Index foreign keys
✓ Index columns used in WHERE, JOIN, and ORDER BY
✓ Use composite indexes for multi-column queries
✓ Remove unused or redundant indexes
✓ Monitor index usage with sys.schema_unused_indexes

Query Optimization:
✓ Use EXPLAIN for all important queries
✓ Avoid SELECT * (select only the columns you need)
✓ Avoid functions on indexed columns in WHERE clauses
✓ Use EXISTS instead of IN for large subqueries
✓ Batch INSERT/UPDATE operations
✓ Use keyset pagination instead of OFFSET

Connection Management:
✓ Use connection pooling
✓ Close connections in long-running scripts
✓ Set an appropriate max_connections
✓ Monitor active connections

Caching:
✓ Implement application-level caching (Redis)
✓ Cache expensive queries
✓ Set appropriate cache expiration times
✓ Invalidate the cache on data changes

Monitoring:
✓ Enable the slow query log
✓ Review slow queries weekly
✓ Use the Performance Schema for detailed analysis
✓ Monitor server resources (CPU, RAM, disk I/O)
✓ Set up alerts for performance degradation

Common Performance Anti-Patterns

1. SELECT * in Loops

-- Bad: fetches all columns and creates an N+1 problem
-- for each user:
--     SELECT * FROM users WHERE id = ?

-- Good: a single query with JOIN or IN
SELECT u.*, p.*
FROM users u
LEFT JOIN posts p ON u.id = p.user_id;

2. Inefficient Counting

-- Bad: without an index on status, this scans the entire table
SELECT COUNT(*) FROM orders WHERE status = 'pending';

-- Good: let a covering index satisfy the count
CREATE INDEX idx_status ON orders(status);
-- Or maintain counters in a separate summary table

3. Using OR Instead of UNION

-- Often slower: OR across two different columns can prevent
-- efficient index use (MySQL's index merge sometimes helps,
-- but not reliably)
SELECT * FROM users WHERE email = ? OR username = ?;

-- Often faster: each branch uses its own index
SELECT * FROM users WHERE email = ?
UNION
SELECT * FROM users WHERE username = ?;

4. Wildcard at Start of LIKE

-- Can't use an index
SELECT * FROM products WHERE name LIKE '%phone%';

-- Can use an index
SELECT * FROM products WHERE name LIKE 'phone%';

5. Not Using LIMIT

-- Bad: returns every matching row
SELECT * FROM logs ORDER BY created_at DESC;

-- Good: limits the result set
SELECT * FROM logs ORDER BY created_at DESC LIMIT 100;

Practice Exercise:

Scenario: Your application is slow. Here's the problematic code:

// Get all users
$users = query("SELECT * FROM users LIMIT 1000");

// For each user, get their order count and total spent
foreach ($users as $user) {
    $orderCount = query(
        "SELECT COUNT(*) FROM orders WHERE customer_id = " . $user['id']
    );
    $totalSpent = query(
        "SELECT SUM(total) FROM orders WHERE customer_id = " . $user['id']
    );
    echo $user['name'] . ": " . $orderCount . " orders, $" . $totalSpent;
}

Problems:
1. SELECT * fetches unnecessary columns
2. N+1 query problem (1 + 1000 + 1000 = 2001 queries!)
3. No indexes mentioned
4. Not using prepared statements (SQL injection risk)

Task: Optimize this code.

Solution:

Step 1: Create the necessary index

CREATE INDEX idx_customer_id ON orders(customer_id);

Step 2: Rewrite as a single query with a JOIN

$results = query("
    SELECT
        u.id,
        u.name,
        COUNT(o.id) AS order_count,
        COALESCE(SUM(o.total), 0) AS total_spent
    FROM users u
    LEFT JOIN orders o ON u.id = o.customer_id
    GROUP BY u.id, u.name
    LIMIT 1000
");

foreach ($results as $row) {
    echo $row['name'] . ": " . $row['order_count']
        . " orders, $" . $row['total_spent'];
}

Performance Improvement:
Before: 2001 queries × 5ms = 10,005ms (10 seconds)
After: 1 query = ~50ms (200x faster!)

Alternative: If the JOIN is too complex, use 2 queries

// Query 1: Get users
$users = query("SELECT id, name FROM users LIMIT 1000");
$userIds = array_column($users, 'id');

// Query 2: Get aggregated order data for all users at once
$orderStats = query("
    SELECT customer_id, COUNT(*) AS order_count, SUM(total) AS total_spent
    FROM orders
    WHERE customer_id IN (" . implode(',', $userIds) . ")
    GROUP BY customer_id
");

// Merge the results in application code
// Still much better: 2 queries vs 2001!
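The alternative above still interpolates IDs into the SQL string. That is safe here because the IDs come straight from the database, but problem 4 deserves an explicit fix. A sketch with PDO placeholders, assuming a PDO handle $db in place of the exercise's generic query() helper:

// Same two-query approach, but with bound parameters, so it stays
// safe even if the IDs ever come from user input.
$users = $db->query('SELECT id, name FROM users LIMIT 1000')
            ->fetchAll(PDO::FETCH_ASSOC);
$userIds = array_column($users, 'id');
// (Assumes at least one user; IN () with an empty list is a syntax error)

// One "?" placeholder per ID
$in = implode(', ', array_fill(0, count($userIds), '?'));
$stmt = $db->prepare("
    SELECT customer_id, COUNT(*) AS order_count, SUM(total) AS total_spent
    FROM orders
    WHERE customer_id IN ({$in})
    GROUP BY customer_id
");
$stmt->execute($userIds);

// Index the stats by customer_id, then merge with the user list
$stats = [];
foreach ($stmt->fetchAll(PDO::FETCH_ASSOC) as $row) {
    $stats[$row['customer_id']] = $row;
}
foreach ($users as $user) {
    $s = $stats[$user['id']] ?? ['order_count' => 0, 'total_spent' => 0];
    echo $user['name'] . ': ' . $s['order_count']
        . ' orders, $' . $s['total_spent'] . "\n";
}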

Summary

In this lesson, you learned:

  • Query cache was removed in MySQL 8.0 - use application-level caching instead
  • Connection pooling reuses connections for significant performance gains
  • Batch operations (bulk INSERT/UPDATE) are 100x faster than individual queries
  • Keyset pagination maintains performance for all page numbers
  • The N+1 query problem kills performance - use JOINs or eager loading
  • Profile queries with SHOW PROFILES to identify bottlenecks
  • Use Performance Schema and sys views for comprehensive monitoring
  • Follow the optimization checklist for database design, indexing, and queries
  • Avoid common anti-patterns like SELECT * in loops and inefficient LIKE patterns
Module Complete! You've mastered indexes and performance optimization. In the next module, we'll explore advanced database objects including views, stored procedures, functions, and triggers!