MySQL & Database Design

Query Optimization Basics

13 min Lesson 16 of 40

Query Optimization Basics

Query optimization is the process of improving database query performance to make them execute faster and use fewer resources. In this lesson, you'll master the EXPLAIN statement and learn to identify and fix slow queries.

The EXPLAIN Statement

EXPLAIN is your primary tool for understanding how MySQL executes a query. It shows the query execution plan without actually running the query:

-- Basic EXPLAIN syntax EXPLAIN SELECT * FROM users WHERE email = 'john@example.com'; -- EXPLAIN with complex query EXPLAIN SELECT u.name, COUNT(o.id) as order_count FROM users u LEFT JOIN orders o ON u.id = o.customer_id WHERE u.country = 'USA' GROUP BY u.id;
Pro Tip: Always run EXPLAIN on any query you suspect might be slow. It only takes milliseconds and provides invaluable insights into query performance.

Understanding EXPLAIN Output

EXPLAIN returns several columns that describe how MySQL will execute your query:

EXPLAIN SELECT * FROM users WHERE email = 'john@example.com'; +----+-------------+-------+------+---------------+-------+---------+-------+------+-------+ | id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+-------------+-------+------+---------------+-------+---------+-------+------+-------+ | 1 | SIMPLE | users | ref | idx_email | idx | 767 | const | 1 | NULL | +----+-------------+-------+------+---------------+-------+---------+-------+------+-------+

Let's break down each column:

1. id - Query Identifier

-- Simple query id = 1 -- Subquery id = 1 (outer query) id = 2 (inner query) -- Union id = 1 (first SELECT) id = 2 (second SELECT) id = NULL (UNION result)

2. select_type - Type of SELECT

SIMPLE: Simple SELECT (no subqueries or unions) PRIMARY: Outermost SELECT in complex query SUBQUERY: Subquery in SELECT or WHERE DERIVED: Derived table (subquery in FROM) UNION: Second or later SELECT in UNION UNION RESULT: Result of UNION

3. table - Table Being Accessed

-- Shows which table is being read table = users table = orders table = <derived2> (derived table)

4. type - Join Type (MOST IMPORTANT)

The "type" column shows how MySQL accesses rows. From best to worst:

system: Table has only one row (extremely fast) const: At most one matching row (PRIMARY KEY or UNIQUE lookup) Example: WHERE id = 123 eq_ref: One row per combination from previous tables (JOIN on PRIMARY/UNIQUE) Example: JOIN orders ON users.id = orders.customer_id ref: Multiple rows with matching index value (good performance) Example: WHERE country = 'USA' (with index) range: Index used for range operations (decent performance) Example: WHERE age BETWEEN 18 AND 30 index: Full index scan (slow - scans entire index) Example: SELECT email FROM users (covering index) ALL: Full table scan (VERY SLOW - avoid!) Example: WHERE country = 'USA' (without index)
Performance Alert: If you see type = "ALL" on a large table, that's a red flag! The query is scanning every row and needs optimization.

5. possible_keys - Indexes That Could Be Used

-- Shows indexes MySQL might consider possible_keys = idx_email,idx_country -- NULL means no suitable index exists possible_keys = NULL

6. key - Index Actually Used

-- Shows which index MySQL chose key = idx_email ✓ Good (index is used) key = NULL ✗ Bad (no index used)

7. key_len - Index Length Used

-- Shows how many bytes of the index are used -- Useful for composite indexes -- Index: idx_country_city (country, city) key_len = 152 (only country used) key_len = 458 (both country and city used) -- Higher key_len for composite index = more of the index is utilized

8. ref - Columns Compared to Index

-- Shows what is being compared to the index ref = const (constant value: WHERE id = 123) ref = dbname.orders.customer_id (column from another table in JOIN) ref = func (function result)

9. rows - Estimated Rows to Examine

-- Estimated number of rows MySQL must examine rows = 1 ✓ Excellent (exact lookup) rows = 100 ✓ Good (small subset) rows = 10000 ⚠ Warning (needs review) rows = 1000000 ✗ Poor (full table scan likely) -- Lower is better!

10. Extra - Additional Information

Good indicators: "Using index" - Covering index (no table access needed) ✓ "Using index condition" - Index pushdown optimization ✓ "Using where" - WHERE clause filtering ✓ Warning indicators: "Using filesort" - External sort needed (add index for ORDER BY) ⚠ "Using temporary" - Temporary table needed (expensive) ⚠ Bad indicators: "Using where; Using join buffer" - No index for join (add index!) ✗ "Range checked for each record" - No good index found ✗

Real-World EXPLAIN Examples

Example 1: Optimized Query

EXPLAIN SELECT * FROM users WHERE email = 'john@example.com'; +----+--------+-------+-------+----------------+-----------+---------+-------+------+-------+ | id | type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+--------+-------+-------+----------------+-----------+---------+-------+------+-------+ | 1 | SIMPLE | users | const | idx_email | idx_email | 767 | const | 1 | NULL | +----+--------+-------+-------+----------------+-----------+---------+-------+------+-------+ Analysis: EXCELLENT ✓ - type = "const" (fastest possible) - key = idx_email (index is used) - rows = 1 (exactly one row examined) - No warnings in Extra

Example 2: Unoptimized Query

EXPLAIN SELECT * FROM users WHERE YEAR(created_at) = 2024; +----+--------+-------+------+---------------+------+---------+------+--------+-------------+ | id | type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+--------+-------+------+---------------+------+---------+------+--------+-------------+ | 1 | SIMPLE | users | ALL | NULL | NULL | NULL | NULL | 500000 | Using where | +----+--------+-------+------+---------------+------+---------+------+--------+-------------+ Analysis: VERY BAD ✗ - type = "ALL" (full table scan) - key = NULL (no index used) - rows = 500000 (scanning entire table) - Problem: Function on indexed column prevents index usage Fix: Rewrite query -- Bad: Using function on column WHERE YEAR(created_at) = 2024 -- Good: Compare without function WHERE created_at >= '2024-01-01' AND created_at < '2025-01-01' -- Now index can be used!

Example 3: JOIN Without Index

EXPLAIN SELECT u.name, o.total FROM users u JOIN orders o ON u.id = o.customer_id WHERE u.country = 'USA'; +----+--------+--------+------+---------------+-------------+---------+----------------+-------+------------------+ | id | type | table | type | possible_keys | key | key_len | ref | rows | Extra | +----+--------+--------+------+---------------+-------------+---------+----------------+-------+------------------+ | 1 | SIMPLE | u | ref | idx_country | idx_country | 152 | const | 5000 | NULL | | 1 | SIMPLE | o | ALL | NULL | NULL | NULL | NULL | 100000| Using where;join | +----+--------+--------+------+---------------+-------------+---------+----------------+-------+------------------+ Analysis: PARTIALLY OPTIMIZED ⚠ - users table: Good (using idx_country) - orders table: Bad (full table scan, no index on customer_id) Fix: Add index CREATE INDEX idx_customer_id ON orders(customer_id); -- After adding index: | 1 | SIMPLE | o | ref | idx_customer_id | idx_customer_id | 4 | u.id | 10 | NULL | Now type = "ref" and rows = 10 ✓

EXPLAIN ANALYZE (MySQL 8.0.18+)

EXPLAIN ANALYZE goes beyond EXPLAIN by actually executing the query and showing real timing data:

EXPLAIN ANALYZE SELECT * FROM users WHERE country = 'USA'; -> Filter: (users.country = 'USA') (cost=450.25 rows=5000) (actual time=0.123..12.456 rows=4892 loops=1) -> Table scan on users (cost=450.25 rows=50000) (actual time=0.089..10.234 rows=50000 loops=1) Key Information: - cost: Estimated cost (internal MySQL metric) - rows: Estimated rows (before) vs actual rows (after execution) - actual time: Real execution time in milliseconds - loops: Number of times the operation was executed
When to use EXPLAIN vs EXPLAIN ANALYZE: - EXPLAIN: Quick analysis without running query (safe for production) - EXPLAIN ANALYZE: Detailed timing data but actually runs query (use on test data)

Identifying Slow Queries

MySQL provides a slow query log to track queries that take too long:

-- Enable slow query log SET GLOBAL slow_query_log = 'ON'; -- Set threshold (queries slower than 2 seconds) SET GLOBAL long_query_time = 2; -- Set log file location SET GLOBAL slow_query_log_file = '/var/log/mysql/slow-query.log'; -- Check current settings SHOW VARIABLES LIKE 'slow_query%'; SHOW VARIABLES LIKE 'long_query_time';

Analyze slow query log:

-- View recent slow queries SELECT * FROM mysql.slow_log ORDER BY start_time DESC LIMIT 10; -- Use mysqldumpslow tool (command line) $ mysqldumpslow /var/log/mysql/slow-query.log -- Shows most frequent slow queries: Count: 45 Time=3.21s (144s) Lock=0.00s (0s) Rows=500.0 (22500) SELECT * FROM orders WHERE customer_id = N

Query Optimization Techniques

1. Avoid Functions on Indexed Columns

-- BAD: Function prevents index usage SELECT * FROM users WHERE LOWER(email) = 'john@example.com'; SELECT * FROM orders WHERE YEAR(created_at) = 2024; -- GOOD: Rewrite without functions SELECT * FROM users WHERE email = 'john@example.com'; SELECT * FROM orders WHERE created_at >= '2024-01-01' AND created_at < '2025-01-01';

2. Use LIMIT for Large Result Sets

-- BAD: Returns all rows (memory intensive) SELECT * FROM users ORDER BY created_at DESC; -- GOOD: Limit results SELECT * FROM users ORDER BY created_at DESC LIMIT 100;

3. Select Only Needed Columns

-- BAD: Transfers unnecessary data SELECT * FROM users WHERE country = 'USA'; -- GOOD: Select only needed columns SELECT id, name, email FROM users WHERE country = 'USA'; -- BETTER: May use covering index CREATE INDEX idx_country_id_name_email ON users(country, id, name, email);

4. Use EXISTS Instead of IN for Subqueries

-- SLOWER: IN with subquery SELECT * FROM users WHERE id IN (SELECT customer_id FROM orders WHERE total > 1000); -- FASTER: EXISTS (stops at first match) SELECT * FROM users u WHERE EXISTS ( SELECT 1 FROM orders o WHERE o.customer_id = u.id AND o.total > 1000 );

5. Avoid SELECT DISTINCT When Possible

-- SLOW: DISTINCT on large result set SELECT DISTINCT country FROM users; -- FASTER: Use GROUP BY SELECT country FROM users GROUP BY country; -- FASTEST: If you just need to check existence SELECT country FROM users GROUP BY country LIMIT 1;

Practice Exercise:

Scenario: This query is running slowly on a blog with 100,000 articles:

SELECT a.title, a.content, COUNT(c.id) as comment_count FROM articles a LEFT JOIN comments c ON a.id = c.article_id WHERE MONTH(a.published_at) = 12 AND a.is_published = 1 GROUP BY a.id ORDER BY comment_count DESC; -- Run EXPLAIN EXPLAIN [the query above]; -- Output shows: +----+--------+----------+------+---------------+------+------+--------+------+--------------------------+ | id | type | table | type | possible_keys | key | rows | Extra | +----+--------+----------+------+---------------+------+------+--------+------+--------------------------+ | 1 | SIMPLE | a | ALL | NULL | NULL | NULL | 100000 | Using where; Using filesort | | 1 | SIMPLE | c | ALL | NULL | NULL | NULL | 500000 | Using where; Using temporary | +----+--------+----------+------+---------------+------+------+--------+------+--------------------------+

Questions:

  1. What are the performance problems?
  2. How would you fix them?

Answers:

Problems identified: 1. type = "ALL" on both tables (full table scans) 2. "Using filesort" (ORDER BY not using index) 3. "Using temporary" (GROUP BY creating temp table) 4. MONTH(published_at) prevents index usage 5. No index on comments.article_id for JOIN Solutions: -- Fix 1: Remove MONTH() function WHERE a.published_at >= '2024-12-01' AND a.published_at < '2025-01-01' AND a.is_published = 1 -- Fix 2: Add indexes CREATE INDEX idx_published ON articles(is_published, published_at); CREATE INDEX idx_article_id ON comments(article_id); -- Fix 3: Optimize query structure SELECT a.id, a.title, COUNT(c.id) as comment_count FROM articles a LEFT JOIN comments c ON a.id = c.article_id WHERE a.published_at >= '2024-12-01' AND a.published_at < '2025-01-01' AND a.is_published = 1 GROUP BY a.id ORDER BY comment_count DESC LIMIT 20; -- Added LIMIT -- Fix 4: Consider pre-aggregated data -- Create a comment_count column in articles table -- Update it with triggers or scheduled jobs for even better performance

Summary

In this lesson, you learned:

  • EXPLAIN shows how MySQL executes queries without running them
  • The "type" column is most important: aim for const, eq_ref, or ref
  • type = "ALL" indicates full table scan (usually needs optimization)
  • The "rows" column shows how many rows will be examined
  • EXPLAIN ANALYZE (MySQL 8.0.18+) provides actual execution timing
  • Enable slow query log to identify problematic queries
  • Avoid functions on indexed columns - they prevent index usage
  • Rewrite queries to be index-friendly
  • Always SELECT only the columns you need
Next Up: In the next lesson, we'll dive deeper into index optimization strategies including selectivity, leftmost prefix rule, and index hints!