Understanding Indexes
Database indexes are one of the most powerful tools for improving query performance. In this lesson, we'll explore how indexes work, when to use them, and the trade-offs involved in implementing them.
What is a Database Index?
A database index is a data structure that improves the speed of data retrieval operations on a table. Think of it like an index in a book - instead of reading every page to find a topic, you can look it up in the index and jump directly to the relevant pages.
Key Concept: Indexes speed up SELECT queries but add overhead to INSERT, UPDATE, and DELETE operations because the index must also be updated when data changes.
How Indexes Work: B-Tree Structure
Most MySQL indexes use a B-Tree (Balanced Tree) data structure. Here's how it works:
Without Index:
SELECT * FROM users WHERE email = 'john@example.com';
→ Scans ALL rows sequentially (Full Table Scan)
→ 1,000,000 rows = 1,000,000 comparisons
With Index on email:
SELECT * FROM users WHERE email = 'john@example.com';
→ Uses B-Tree to find exact location
→ 1,000,000 rows = ~20 comparisons (log₂ operations)
Performance Gain: An index can reduce query time from seconds to milliseconds. A table with 1 million rows might require 1 million comparisons without an index, but only 20 comparisons with a B-Tree index!
Index Types in MySQL
MySQL supports several types of indexes, each designed for specific use cases:
1. PRIMARY KEY Index
CREATE TABLE users (
id INT PRIMARY KEY,
email VARCHAR(255)
);
-- Primary key automatically creates a UNIQUE index
-- NULL values are not allowed
-- Only ONE primary key per table
2. UNIQUE Index
CREATE TABLE users (
id INT PRIMARY KEY,
email VARCHAR(255) UNIQUE
);
-- Ensures all values in the column are different
-- NULL values are allowed (but only one NULL in most cases)
-- Can have multiple UNIQUE indexes per table
3. INDEX (Regular Index)
CREATE TABLE users (
id INT PRIMARY KEY,
email VARCHAR(255),
country VARCHAR(50),
INDEX idx_country (country)
);
-- Allows duplicate values
-- Speeds up searches on the indexed column(s)
-- Most commonly used index type
4. FULLTEXT Index
CREATE TABLE articles (
id INT PRIMARY KEY,
title VARCHAR(255),
content TEXT,
FULLTEXT idx_content (title, content)
);
-- Used for full-text searches
-- Works with MATCH() AGAINST() syntax
-- Only available for CHAR, VARCHAR, TEXT columns
Single-Column vs Multi-Column Indexes
Single-Column Index
CREATE INDEX idx_email ON users(email);
-- Good for: WHERE email = 'john@example.com'
-- Good for: ORDER BY email
Multi-Column (Composite) Index
CREATE INDEX idx_country_city ON users(country, city);
-- Optimized for queries using BOTH columns
-- Follows the "leftmost prefix" rule
The leftmost prefix rule means the index can be used for:
✓ WHERE country = 'USA'
✓ WHERE country = 'USA' AND city = 'New York'
✗ WHERE city = 'New York' (can't use index - city is not leftmost)
Important: The order of columns in a composite index matters! Always place the most selective column first, or the column most frequently used in WHERE clauses.
When to Use Indexes
Indexes are beneficial in these scenarios:
✓ Index These:
- Columns used frequently in WHERE clauses
- Columns used in JOIN conditions
- Columns used in ORDER BY or GROUP BY
- Columns with high cardinality (many unique values)
- Foreign key columns
✗ Don't Index These:
- Small tables (under 1000 rows)
- Columns with low cardinality (few unique values like gender: M/F)
- Columns that are frequently updated
- Columns that are rarely queried
- Very large TEXT or BLOB columns
Index Overhead and Trade-offs
While indexes improve SELECT performance, they come with costs:
Benefits:
+ Faster SELECT queries (sometimes 100x faster)
+ Faster JOIN operations
+ Faster sorting (ORDER BY)
+ Enforces uniqueness (UNIQUE indexes)
Costs:
- Slower INSERT operations (index must be updated)
- Slower UPDATE operations (if indexed columns change)
- Slower DELETE operations (index must be updated)
- Additional disk space (indexes can be large)
- Maintenance overhead
Rule of Thumb: For read-heavy applications (many SELECTs), use more indexes. For write-heavy applications (many INSERTs/UPDATEs), use fewer indexes.
Viewing Indexes on a Table
You can view all indexes on a table using several methods:
-- Method 1: SHOW INDEXES
SHOW INDEXES FROM users;
-- Method 2: SHOW INDEX
SHOW INDEX FROM users;
-- Method 3: SHOW KEYS
SHOW KEYS FROM users;
-- Method 4: information_schema
SELECT
INDEX_NAME,
COLUMN_NAME,
SEQ_IN_INDEX,
NON_UNIQUE
FROM information_schema.STATISTICS
WHERE TABLE_NAME = 'users'
ORDER BY INDEX_NAME, SEQ_IN_INDEX;
Output example:
+------------+------------+-------------+-----------+
| Table | Non_unique | Key_name | Column |
+------------+------------+-------------+-----------+
| users | 0 | PRIMARY | id |
| users | 0 | email | email |
| users | 1 | idx_country | country |
+------------+------------+-------------+-----------+
Non_unique = 0: UNIQUE index (PRIMARY, UNIQUE)
Non_unique = 1: Regular index (allows duplicates)
Checking Index Usage
You can verify if a query is using an index with EXPLAIN:
EXPLAIN SELECT * FROM users WHERE email = 'john@example.com';
+----+------+---------+------+---------+-------+
| id | type | key | rows | filtered| Extra |
+----+------+---------+------+---------+-------+
| 1 | ref | email | 1 | 100.00 | NULL |
+----+------+---------+------+---------+-------+
Key column shows "email": Index is being used ✓
Type is "ref": Index lookup (fast) ✓
Rows is 1: Only 1 row examined ✓
EXPLAIN SELECT * FROM users WHERE country = 'USA';
+----+------+------+--------+---------+-------+
| id | type | key | rows | filtered| Extra |
+----+------+------+--------+---------+-------+
| 1 | ALL | NULL | 100000 | 10.00 | Using |
+----+------+------+--------+---------+-------+
Type is "ALL": Full table scan (slow) ✗
Key is NULL: No index used ✗
Rows is 100000: All rows examined ✗
Real-World Example
Let's see the performance difference with and without indexes:
-- Create table with 1 million users
CREATE TABLE users (
id INT PRIMARY KEY AUTO_INCREMENT,
email VARCHAR(255),
country VARCHAR(50),
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Without index on email
SELECT * FROM users WHERE email = 'john@example.com';
-- Query time: 2.5 seconds (full table scan)
-- Add index
CREATE INDEX idx_email ON users(email);
-- With index on email
SELECT * FROM users WHERE email = 'john@example.com';
-- Query time: 0.001 seconds (index lookup)
Performance improvement: 2500x faster!
Practice Exercise:
Scenario: You have a blog with 500,000 articles. Users frequently search by category, and you need to list articles ordered by published date.
CREATE TABLE articles (
id INT PRIMARY KEY,
title VARCHAR(255),
category VARCHAR(50),
author_id INT,
published_at DATETIME,
views INT
);
-- Common query:
SELECT * FROM articles
WHERE category = 'Technology'
ORDER BY published_at DESC
LIMIT 20;
Question: What indexes would you create to optimize this query?
Answer:
-- Create composite index with category first, then published_at
CREATE INDEX idx_category_published ON articles(category, published_at);
-- This index helps with:
-- 1. WHERE category = 'Technology' (uses leftmost column)
-- 2. ORDER BY published_at (uses second column)
-- 3. Both conditions together (uses full index)
-- Alternative if you need separate queries:
CREATE INDEX idx_category ON articles(category);
CREATE INDEX idx_published ON articles(published_at);
-- But composite index is more efficient for this specific query
Summary
In this lesson, you learned:
- Indexes use B-Tree structure to speed up data retrieval dramatically
- MySQL supports PRIMARY KEY, UNIQUE, INDEX, and FULLTEXT index types
- Composite indexes follow the leftmost prefix rule
- Index columns used in WHERE, JOIN, ORDER BY, and GROUP BY clauses
- Indexes improve SELECT performance but slow down INSERT/UPDATE/DELETE
- Use SHOW INDEXES and EXPLAIN to view and verify index usage
- Proper indexing can improve query performance by 100x or more
Next Up: In the next lesson, we'll learn how to create and manage indexes effectively, including advanced techniques like covering indexes and invisible indexes!