MySQL & Database Design

Understanding Indexes

13 min Lesson 14 of 40

Understanding Indexes

Database indexes are one of the most powerful tools for improving query performance. In this lesson, we'll explore how indexes work, when to use them, and the trade-offs involved in implementing them.

What is a Database Index?

A database index is a data structure that improves the speed of data retrieval operations on a table. Think of it like an index in a book - instead of reading every page to find a topic, you can look it up in the index and jump directly to the relevant pages.

Key Concept: Indexes speed up SELECT queries but add overhead to INSERT, UPDATE, and DELETE operations because the index must also be updated when data changes.

How Indexes Work: B-Tree Structure

Most MySQL indexes use a B-Tree (Balanced Tree) data structure. Here's how it works:

Without Index: SELECT * FROM users WHERE email = 'john@example.com'; → Scans ALL rows sequentially (Full Table Scan) → 1,000,000 rows = 1,000,000 comparisons With Index on email: SELECT * FROM users WHERE email = 'john@example.com'; → Uses B-Tree to find exact location → 1,000,000 rows = ~20 comparisons (log₂ operations)
Performance Gain: An index can reduce query time from seconds to milliseconds. A table with 1 million rows might require 1 million comparisons without an index, but only 20 comparisons with a B-Tree index!

Index Types in MySQL

MySQL supports several types of indexes, each designed for specific use cases:

1. PRIMARY KEY Index

CREATE TABLE users ( id INT PRIMARY KEY, email VARCHAR(255) ); -- Primary key automatically creates a UNIQUE index -- NULL values are not allowed -- Only ONE primary key per table

2. UNIQUE Index

CREATE TABLE users ( id INT PRIMARY KEY, email VARCHAR(255) UNIQUE ); -- Ensures all values in the column are different -- NULL values are allowed (but only one NULL in most cases) -- Can have multiple UNIQUE indexes per table

3. INDEX (Regular Index)

CREATE TABLE users ( id INT PRIMARY KEY, email VARCHAR(255), country VARCHAR(50), INDEX idx_country (country) ); -- Allows duplicate values -- Speeds up searches on the indexed column(s) -- Most commonly used index type

4. FULLTEXT Index

CREATE TABLE articles ( id INT PRIMARY KEY, title VARCHAR(255), content TEXT, FULLTEXT idx_content (title, content) ); -- Used for full-text searches -- Works with MATCH() AGAINST() syntax -- Only available for CHAR, VARCHAR, TEXT columns

Single-Column vs Multi-Column Indexes

Single-Column Index

CREATE INDEX idx_email ON users(email); -- Good for: WHERE email = 'john@example.com' -- Good for: ORDER BY email

Multi-Column (Composite) Index

CREATE INDEX idx_country_city ON users(country, city); -- Optimized for queries using BOTH columns -- Follows the "leftmost prefix" rule

The leftmost prefix rule means the index can be used for:

✓ WHERE country = 'USA' ✓ WHERE country = 'USA' AND city = 'New York' ✗ WHERE city = 'New York' (can't use index - city is not leftmost)
Important: The order of columns in a composite index matters! Always place the most selective column first, or the column most frequently used in WHERE clauses.

When to Use Indexes

Indexes are beneficial in these scenarios:

✓ Index These: - Columns used frequently in WHERE clauses - Columns used in JOIN conditions - Columns used in ORDER BY or GROUP BY - Columns with high cardinality (many unique values) - Foreign key columns ✗ Don't Index These: - Small tables (under 1000 rows) - Columns with low cardinality (few unique values like gender: M/F) - Columns that are frequently updated - Columns that are rarely queried - Very large TEXT or BLOB columns

Index Overhead and Trade-offs

While indexes improve SELECT performance, they come with costs:

Benefits: + Faster SELECT queries (sometimes 100x faster) + Faster JOIN operations + Faster sorting (ORDER BY) + Enforces uniqueness (UNIQUE indexes) Costs: - Slower INSERT operations (index must be updated) - Slower UPDATE operations (if indexed columns change) - Slower DELETE operations (index must be updated) - Additional disk space (indexes can be large) - Maintenance overhead
Rule of Thumb: For read-heavy applications (many SELECTs), use more indexes. For write-heavy applications (many INSERTs/UPDATEs), use fewer indexes.

Viewing Indexes on a Table

You can view all indexes on a table using several methods:

-- Method 1: SHOW INDEXES SHOW INDEXES FROM users; -- Method 2: SHOW INDEX SHOW INDEX FROM users; -- Method 3: SHOW KEYS SHOW KEYS FROM users; -- Method 4: information_schema SELECT INDEX_NAME, COLUMN_NAME, SEQ_IN_INDEX, NON_UNIQUE FROM information_schema.STATISTICS WHERE TABLE_NAME = 'users' ORDER BY INDEX_NAME, SEQ_IN_INDEX;

Output example:

+------------+------------+-------------+-----------+ | Table | Non_unique | Key_name | Column | +------------+------------+-------------+-----------+ | users | 0 | PRIMARY | id | | users | 0 | email | email | | users | 1 | idx_country | country | +------------+------------+-------------+-----------+ Non_unique = 0: UNIQUE index (PRIMARY, UNIQUE) Non_unique = 1: Regular index (allows duplicates)

Checking Index Usage

You can verify if a query is using an index with EXPLAIN:

EXPLAIN SELECT * FROM users WHERE email = 'john@example.com'; +----+------+---------+------+---------+-------+ | id | type | key | rows | filtered| Extra | +----+------+---------+------+---------+-------+ | 1 | ref | email | 1 | 100.00 | NULL | +----+------+---------+------+---------+-------+ Key column shows "email": Index is being used ✓ Type is "ref": Index lookup (fast) ✓ Rows is 1: Only 1 row examined ✓
EXPLAIN SELECT * FROM users WHERE country = 'USA'; +----+------+------+--------+---------+-------+ | id | type | key | rows | filtered| Extra | +----+------+------+--------+---------+-------+ | 1 | ALL | NULL | 100000 | 10.00 | Using | +----+------+------+--------+---------+-------+ Type is "ALL": Full table scan (slow) ✗ Key is NULL: No index used ✗ Rows is 100000: All rows examined ✗

Real-World Example

Let's see the performance difference with and without indexes:

-- Create table with 1 million users CREATE TABLE users ( id INT PRIMARY KEY AUTO_INCREMENT, email VARCHAR(255), country VARCHAR(50), created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ); -- Without index on email SELECT * FROM users WHERE email = 'john@example.com'; -- Query time: 2.5 seconds (full table scan) -- Add index CREATE INDEX idx_email ON users(email); -- With index on email SELECT * FROM users WHERE email = 'john@example.com'; -- Query time: 0.001 seconds (index lookup) Performance improvement: 2500x faster!

Practice Exercise:

Scenario: You have a blog with 500,000 articles. Users frequently search by category, and you need to list articles ordered by published date.

CREATE TABLE articles ( id INT PRIMARY KEY, title VARCHAR(255), category VARCHAR(50), author_id INT, published_at DATETIME, views INT ); -- Common query: SELECT * FROM articles WHERE category = 'Technology' ORDER BY published_at DESC LIMIT 20;

Question: What indexes would you create to optimize this query?

Answer:

-- Create composite index with category first, then published_at CREATE INDEX idx_category_published ON articles(category, published_at); -- This index helps with: -- 1. WHERE category = 'Technology' (uses leftmost column) -- 2. ORDER BY published_at (uses second column) -- 3. Both conditions together (uses full index) -- Alternative if you need separate queries: CREATE INDEX idx_category ON articles(category); CREATE INDEX idx_published ON articles(published_at); -- But composite index is more efficient for this specific query

Summary

In this lesson, you learned:

  • Indexes use B-Tree structure to speed up data retrieval dramatically
  • MySQL supports PRIMARY KEY, UNIQUE, INDEX, and FULLTEXT index types
  • Composite indexes follow the leftmost prefix rule
  • Index columns used in WHERE, JOIN, ORDER BY, and GROUP BY clauses
  • Indexes improve SELECT performance but slow down INSERT/UPDATE/DELETE
  • Use SHOW INDEXES and EXPLAIN to view and verify index usage
  • Proper indexing can improve query performance by 100x or more
Next Up: In the next lesson, we'll learn how to create and manage indexes effectively, including advanced techniques like covering indexes and invisible indexes!