MySQL & Database Design

Query Optimization Basics

13 min Lesson 16 of 40

Query Optimization Basics

Query optimization is the process of improving database query performance to make them execute faster and use fewer resources. In this lesson, you'll master the EXPLAIN statement and learn to identify and fix slow queries.

The EXPLAIN Statement

EXPLAIN is your primary tool for understanding how MySQL executes a query. It shows the query execution plan without actually running the query:

-- Basic EXPLAIN syntax
EXPLAIN SELECT * FROM users WHERE email = 'john@example.com';

-- EXPLAIN with complex query
EXPLAIN
SELECT u.name, COUNT(o.id) as order_count
FROM users u
LEFT JOIN orders o ON u.id = o.customer_id
WHERE u.country = 'USA'
GROUP BY u.id;

Pro Tip: Always run EXPLAIN on any query you suspect might be slow. It only takes milliseconds and provides invaluable insights into query performance.

Understanding EXPLAIN Output

EXPLAIN returns several columns that describe how MySQL will execute your query:

EXPLAIN SELECT * FROM users WHERE email = 'john@example.com';

+----+-------------+-------+------+---------------+-------+---------+-------+------+-------+
| id | select_type | table | type | possible_keys | key   | key_len | ref   | rows | Extra |
+----+-------------+-------+------+---------------+-------+---------+-------+------+-------+
| 1  | SIMPLE      | users | ref  | idx_email     | idx   | 767     | const | 1    | NULL  |
+----+-------------+-------+------+---------------+-------+---------+-------+------+-------+

Let's break down each column:

1. id - Query Identifier

-- Simple query
id = 1

-- Subquery
id = 1  (outer query)
id = 2  (inner query)

-- Union
id = 1  (first SELECT)
id = 2  (second SELECT)
id = NULL (UNION result)

2. select_type - Type of SELECT

SIMPLE: Simple SELECT (no subqueries or unions)
PRIMARY: Outermost SELECT in complex query
SUBQUERY: Subquery in SELECT or WHERE
DERIVED: Derived table (subquery in FROM)
UNION: Second or later SELECT in UNION
UNION RESULT: Result of UNION

3. table - Table Being Accessed

-- Shows which table is being read
table = users
table = orders
table = <derived2> (derived table)

4. type - Join Type (MOST IMPORTANT)

The "type" column shows how MySQL accesses rows. From best to worst:

system: Table has only one row (extremely fast)
const: At most one matching row (PRIMARY KEY or UNIQUE lookup)
    Example: WHERE id = 123

eq_ref: One row per combination from previous tables (JOIN on PRIMARY/UNIQUE)
    Example: JOIN orders ON users.id = orders.customer_id

ref: Multiple rows with matching index value (good performance)
    Example: WHERE country = 'USA' (with index)

range: Index used for range operations (decent performance)
    Example: WHERE age BETWEEN 18 AND 30

index: Full index scan (slow - scans entire index)
    Example: SELECT email FROM users (covering index)

ALL: Full table scan (VERY SLOW - avoid!)
    Example: WHERE country = 'USA' (without index)

Performance Alert: If you see type = "ALL" on a large table, that's a red flag! The query is scanning every row and needs optimization.

5. possible_keys - Indexes That Could Be Used

-- Shows indexes MySQL might consider
possible_keys = idx_email,idx_country

-- NULL means no suitable index exists
possible_keys = NULL

6. key - Index Actually Used

-- Shows which index MySQL chose
key = idx_email        ✓ Good (index is used)
key = NULL             ✗ Bad (no index used)

7. key_len - Index Length Used

-- Shows how many bytes of the index are used
-- Useful for composite indexes

-- Index: idx_country_city (country, city)
key_len = 152    (only country used)
key_len = 458    (both country and city used)

-- Higher key_len for composite index = more of the index is utilized

8. ref - Columns Compared to Index

-- Shows what is being compared to the index
ref = const              (constant value: WHERE id = 123)
ref = dbname.orders.customer_id  (column from another table in JOIN)
ref = func               (function result)

9. rows - Estimated Rows to Examine

-- Estimated number of rows MySQL must examine
rows = 1         ✓ Excellent (exact lookup)
rows = 100       ✓ Good (small subset)
rows = 10000     ⚠ Warning (needs review)
rows = 1000000   ✗ Poor (full table scan likely)

-- Lower is better!

10. Extra - Additional Information

Good indicators:
"Using index"           - Covering index (no table access needed) ✓
"Using index condition" - Index pushdown optimization ✓
"Using where"           - WHERE clause filtering ✓

Warning indicators:
"Using filesort"        - External sort needed (add index for ORDER BY) ⚠
"Using temporary"       - Temporary table needed (expensive) ⚠

Bad indicators:
"Using where; Using join buffer" - No index for join (add index!) ✗
"Range checked for each record"  - No good index found ✗

Real-World EXPLAIN Examples

Example 1: Optimized Query

EXPLAIN SELECT * FROM users WHERE email = 'john@example.com';

+----+--------+-------+-------+----------------+-----------+---------+-------+------+-------+
| id | type   | table | type  | possible_keys  | key       | key_len | ref   | rows | Extra |
+----+--------+-------+-------+----------------+-----------+---------+-------+------+-------+
| 1  | SIMPLE | users | const | idx_email      | idx_email | 767     | const | 1    | NULL  |
+----+--------+-------+-------+----------------+-----------+---------+-------+------+-------+

Analysis: EXCELLENT ✓
- type = "const" (fastest possible)
- key = idx_email (index is used)
- rows = 1 (exactly one row examined)
- No warnings in Extra

Example 2: Unoptimized Query

EXPLAIN SELECT * FROM users WHERE YEAR(created_at) = 2024;

+----+--------+-------+------+---------------+------+---------+------+--------+-------------+
| id | type   | table | type | possible_keys | key  | key_len | ref  | rows   | Extra       |
+----+--------+-------+------+---------------+------+---------+------+--------+-------------+
| 1  | SIMPLE | users | ALL  | NULL          | NULL | NULL    | NULL | 500000 | Using where |
+----+--------+-------+------+---------------+------+---------+------+--------+-------------+

Analysis: VERY BAD ✗
- type = "ALL" (full table scan)
- key = NULL (no index used)
- rows = 500000 (scanning entire table)
- Problem: Function on indexed column prevents index usage

Fix: Rewrite query
-- Bad: Using function on column
WHERE YEAR(created_at) = 2024

-- Good: Compare without function
WHERE created_at >= '2024-01-01' AND created_at < '2025-01-01'

-- Now index can be used!

Example 3: JOIN Without Index

EXPLAIN
SELECT u.name, o.total
FROM users u
JOIN orders o ON u.id = o.customer_id
WHERE u.country = 'USA';

+----+--------+--------+------+---------------+-------------+---------+----------------+-------+------------------+
| id | type   | table  | type | possible_keys | key         | key_len | ref            | rows  | Extra            |
+----+--------+--------+------+---------------+-------------+---------+----------------+-------+------------------+
| 1  | SIMPLE | u      | ref  | idx_country   | idx_country | 152     | const          | 5000  | NULL             |
| 1  | SIMPLE | o      | ALL  | NULL          | NULL        | NULL    | NULL           | 100000| Using where;join |
+----+--------+--------+------+---------------+-------------+---------+----------------+-------+------------------+

Analysis: PARTIALLY OPTIMIZED ⚠
- users table: Good (using idx_country)
- orders table: Bad (full table scan, no index on customer_id)

Fix: Add index
CREATE INDEX idx_customer_id ON orders(customer_id);

-- After adding index:
| 1  | SIMPLE | o      | ref  | idx_customer_id | idx_customer_id | 4       | u.id  | 10    | NULL |
Now type = "ref" and rows = 10 ✓

EXPLAIN ANALYZE (MySQL 8.0.18+)

EXPLAIN ANALYZE goes beyond EXPLAIN by actually executing the query and showing real timing data:

EXPLAIN ANALYZE
SELECT * FROM users WHERE country = 'USA';

-> Filter: (users.country = 'USA')
   (cost=450.25 rows=5000)
   (actual time=0.123..12.456 rows=4892 loops=1)
   -> Table scan on users
      (cost=450.25 rows=50000)
      (actual time=0.089..10.234 rows=50000 loops=1)

Key Information:
- cost: Estimated cost (internal MySQL metric)
- rows: Estimated rows (before) vs actual rows (after execution)
- actual time: Real execution time in milliseconds
- loops: Number of times the operation was executed

When to use EXPLAIN vs EXPLAIN ANALYZE: - EXPLAIN: Quick analysis without running query (safe for production) - EXPLAIN ANALYZE: Detailed timing data but actually runs query (use on test data)

Identifying Slow Queries

MySQL provides a slow query log to track queries that take too long:

-- Enable slow query log
SET GLOBAL slow_query_log = 'ON';

-- Set threshold (queries slower than 2 seconds)
SET GLOBAL long_query_time = 2;

-- Set log file location
SET GLOBAL slow_query_log_file = '/var/log/mysql/slow-query.log';

-- Check current settings
SHOW VARIABLES LIKE 'slow_query%';
SHOW VARIABLES LIKE 'long_query_time';

Analyze slow query log:

-- View recent slow queries
SELECT * FROM mysql.slow_log ORDER BY start_time DESC LIMIT 10;

-- Use mysqldumpslow tool (command line)
$ mysqldumpslow /var/log/mysql/slow-query.log

-- Shows most frequent slow queries:
Count: 45  Time=3.21s (144s)  Lock=0.00s (0s)  Rows=500.0 (22500)
  SELECT * FROM orders WHERE customer_id = N

Query Optimization Techniques

1. Avoid Functions on Indexed Columns

-- BAD: Function prevents index usage
SELECT * FROM users WHERE LOWER(email) = 'john@example.com';
SELECT * FROM orders WHERE YEAR(created_at) = 2024;

-- GOOD: Rewrite without functions
SELECT * FROM users WHERE email = 'john@example.com';
SELECT * FROM orders
WHERE created_at >= '2024-01-01' AND created_at < '2025-01-01';

2. Use LIMIT for Large Result Sets

-- BAD: Returns all rows (memory intensive)
SELECT * FROM users ORDER BY created_at DESC;

-- GOOD: Limit results
SELECT * FROM users ORDER BY created_at DESC LIMIT 100;

3. Select Only Needed Columns

-- BAD: Transfers unnecessary data
SELECT * FROM users WHERE country = 'USA';

-- GOOD: Select only needed columns
SELECT id, name, email FROM users WHERE country = 'USA';

-- BETTER: May use covering index
CREATE INDEX idx_country_id_name_email ON users(country, id, name, email);

4. Use EXISTS Instead of IN for Subqueries

-- SLOWER: IN with subquery
SELECT * FROM users
WHERE id IN (SELECT customer_id FROM orders WHERE total > 1000);

-- FASTER: EXISTS (stops at first match)
SELECT * FROM users u
WHERE EXISTS (
    SELECT 1 FROM orders o
    WHERE o.customer_id = u.id AND o.total > 1000
);

5. Avoid SELECT DISTINCT When Possible

-- SLOW: DISTINCT on large result set
SELECT DISTINCT country FROM users;

-- FASTER: Use GROUP BY
SELECT country FROM users GROUP BY country;

-- FASTEST: If you just need to check existence
SELECT country FROM users GROUP BY country LIMIT 1;

Practice Exercise:

Scenario: This query is running slowly on a blog with 100,000 articles:

SELECT
    a.title,
    a.content,
    COUNT(c.id) as comment_count
FROM articles a
LEFT JOIN comments c ON a.id = c.article_id
WHERE MONTH(a.published_at) = 12
  AND a.is_published = 1
GROUP BY a.id
ORDER BY comment_count DESC;

-- Run EXPLAIN
EXPLAIN [the query above];

-- Output shows:
+----+--------+----------+------+---------------+------+------+--------+------+--------------------------+
| id | type   | table    | type | possible_keys | key  | rows | Extra                            |
+----+--------+----------+------+---------------+------+------+--------+------+--------------------------+
| 1  | SIMPLE | a        | ALL  | NULL          | NULL | NULL | 100000 | Using where; Using filesort  |
| 1  | SIMPLE | c        | ALL  | NULL          | NULL | NULL | 500000 | Using where; Using temporary |
+----+--------+----------+------+---------------+------+------+--------+------+--------------------------+

Questions:

What are the performance problems?
How would you fix them?

Answers:

Problems identified:
1. type = "ALL" on both tables (full table scans)
2. "Using filesort" (ORDER BY not using index)
3. "Using temporary" (GROUP BY creating temp table)
4. MONTH(published_at) prevents index usage
5. No index on comments.article_id for JOIN

Solutions:

-- Fix 1: Remove MONTH() function
WHERE a.published_at >= '2024-12-01'
  AND a.published_at < '2025-01-01'
  AND a.is_published = 1

-- Fix 2: Add indexes
CREATE INDEX idx_published ON articles(is_published, published_at);
CREATE INDEX idx_article_id ON comments(article_id);

-- Fix 3: Optimize query structure
SELECT
    a.id,
    a.title,
    COUNT(c.id) as comment_count
FROM articles a
LEFT JOIN comments c ON a.id = c.article_id
WHERE a.published_at >= '2024-12-01'
  AND a.published_at < '2025-01-01'
  AND a.is_published = 1
GROUP BY a.id
ORDER BY comment_count DESC
LIMIT 20;  -- Added LIMIT

-- Fix 4: Consider pre-aggregated data
-- Create a comment_count column in articles table
-- Update it with triggers or scheduled jobs for even better performance

Summary

In this lesson, you learned:

EXPLAIN shows how MySQL executes queries without running them
The "type" column is most important: aim for const, eq_ref, or ref
type = "ALL" indicates full table scan (usually needs optimization)
The "rows" column shows how many rows will be examined
EXPLAIN ANALYZE (MySQL 8.0.18+) provides actual execution timing
Enable slow query log to identify problematic queries
Avoid functions on indexed columns - they prevent index usage
Rewrite queries to be index-friendly
Always SELECT only the columns you need

Next Up: In the next lesson, we'll dive deeper into index optimization strategies including selectivity, leftmost prefix rule, and index hints!