Full-Text Search
Full-text search in MySQL allows you to perform complex text searches much faster and more effectively than using LIKE patterns. In this lesson, we'll explore FULLTEXT indexes and the MATCH AGAINST syntax for powerful text searching capabilities.
What is Full-Text Search?
Full-text search is optimized for searching large amounts of text data. It's much more efficient than pattern matching with LIKE:
Traditional LIKE search (SLOW):
SELECT * FROM articles
WHERE content LIKE '%database%'
OR content LIKE '%performance%'
OR content LIKE '%optimization%';
-- Problem: Full table scan, can't use regular index
Full-text search (FAST):
SELECT * FROM articles
WHERE MATCH(content) AGAINST('database performance optimization');
-- Uses FULLTEXT index, returns relevance-ranked results
Key Advantage: Full-text search returns results ranked by relevance score, automatically handles word boundaries, and can search multiple columns simultaneously.
Creating FULLTEXT Indexes
FULLTEXT indexes can only be created on CHAR, VARCHAR, and TEXT columns:
Single Column FULLTEXT Index
-- Create table with FULLTEXT index
CREATE TABLE articles (
id INT PRIMARY KEY AUTO_INCREMENT,
title VARCHAR(255),
content TEXT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
FULLTEXT KEY ft_content (content)
);
-- Add FULLTEXT index to existing table
ALTER TABLE articles ADD FULLTEXT INDEX ft_content (content);
-- Or using CREATE INDEX
CREATE FULLTEXT INDEX ft_content ON articles(content);
Multi-Column FULLTEXT Index
-- Search across multiple columns
CREATE TABLE articles (
id INT PRIMARY KEY AUTO_INCREMENT,
title VARCHAR(255),
description TEXT,
content TEXT,
FULLTEXT KEY ft_all (title, description, content)
);
-- This allows searching across all three columns at once
SELECT * FROM articles
WHERE MATCH(title, description, content) AGAINST('MySQL optimization');
Best Practice: Create separate FULLTEXT indexes for different search scenarios. One for title-only searches, another for content-only, and a third for searching all fields together.
MATCH AGAINST Syntax
The MATCH AGAINST clause is used to query FULLTEXT indexes:
Basic Syntax:
SELECT * FROM table
WHERE MATCH(column1, column2, ...) AGAINST('search terms');
Important Rules:
1. Columns in MATCH() must exactly match a FULLTEXT index
2. Can't mix MATCH columns from different FULLTEXT indexes
3. Returns rows sorted by relevance score (highest first)
Full-Text Search Modes
MySQL offers three full-text search modes, each with different capabilities:
1. Natural Language Mode (Default)
-- Natural language search (default)
SELECT *, MATCH(title, content) AGAINST('database performance') as relevance
FROM articles
WHERE MATCH(title, content) AGAINST('database performance')
ORDER BY relevance DESC;
How it works:
- Searches for any of the words: "database" OR "performance"
- Automatically ignores very common words (stopwords)
- Returns results ranked by relevance
- Words in quotes are treated as phrases
Example matches:
✓ "Database optimization techniques"
✓ "Improving query performance"
✓ "Database and performance tuning"
✗ "Getting started with MySQL" (no matching words)
2. Boolean Mode
Boolean mode gives you precise control over search behavior:
-- Boolean mode search
SELECT * FROM articles
WHERE MATCH(title, content) AGAINST('+database +performance' IN BOOLEAN MODE);
Boolean Operators:
+ Word MUST be present
Example: +database
Matches: "Database design" ✓
Not: "SQL queries" ✗
- Word MUST NOT be present
Example: +database -MySQL
Matches: "PostgreSQL database" ✓
Not: "MySQL database" ✗
* Wildcard (word prefix)
Example: optim*
Matches: "optimization", "optimize", "optimal" ✓
"" Exact phrase
Example: "database design"
Matches: "...database design..." ✓
Not: "database and design" ✗
> Increase word relevance
Example: +database >performance
"performance" increases relevance score
< Decrease word relevance
Example: +database <legacy
"legacy" decreases relevance score
() Grouping
Example: +(database mysql) +performance
Must have (database OR mysql) AND performance
Boolean Mode Examples
-- Must contain "database" and "performance"
WHERE MATCH(content) AGAINST('+database +performance' IN BOOLEAN MODE);
-- Must contain "database", must not contain "Oracle"
WHERE MATCH(content) AGAINST('+database -Oracle' IN BOOLEAN MODE);
-- Wildcard search: optimization, optimize, optimal
WHERE MATCH(content) AGAINST('optim*' IN BOOLEAN MODE);
-- Exact phrase search
WHERE MATCH(content) AGAINST('"database performance"' IN BOOLEAN MODE);
-- Complex: Must have MySQL, prefer performance, exclude deprecated
WHERE MATCH(content) AGAINST('+MySQL >performance -deprecated' IN BOOLEAN MODE);
-- Either database OR performance, but not both required
WHERE MATCH(content) AGAINST('database performance' IN BOOLEAN MODE);
-- Must have database AND (optimization OR tuning)
WHERE MATCH(content) AGAINST('+database +(optimization tuning)' IN BOOLEAN MODE);
3. Query Expansion Mode
Query expansion performs a two-pass search to find related terms:
-- Query expansion search
SELECT * FROM articles
WHERE MATCH(content) AGAINST('database' WITH QUERY EXPANSION);
How it works:
Pass 1: Search for "database"
Pass 2: Find common words in Pass 1 results, search again
Example:
Initial search: "database"
Pass 1 results contain: MySQL, SQL, tables, queries
Pass 2 searches: "database MySQL SQL tables queries"
When to use:
✓ User enters very short search terms
✓ You want to find related content
✓ Dealing with synonyms and related concepts
Warning:
Can return less relevant results if your dataset is diverse
Use carefully - test with your actual data
Relevance Scoring
MySQL calculates a relevance score for each result. Higher scores indicate better matches:
-- Get relevance scores
SELECT
id,
title,
MATCH(title, content) AGAINST('MySQL performance') as relevance_score
FROM articles
WHERE MATCH(title, content) AGAINST('MySQL performance')
ORDER BY relevance_score DESC;
+----+--------------------------------+-----------------+
| id | title | relevance_score |
+----+--------------------------------+-----------------+
| 15 | MySQL Performance Tuning Guide | 3.45678 |
| 23 | Optimizing MySQL Queries | 2.89123 |
| 8 | Database Performance Tips | 1.23456 |
+----+--------------------------------+-----------------+
Factors affecting relevance:
- Word frequency in document (more occurrences = higher score)
- Word rarity across all documents (rare words = higher score)
- Document length (shorter documents score higher)
- Word position (words in title may score higher with multi-column index)
Tip: Always include the relevance score in your SELECT to understand why results are ranked the way they are. This helps you improve your search queries.
Stopwords
Stopwords are common words that are ignored by full-text search:
Default stopwords include:
a, an, and, are, as, at, be, by, for, from, in, is, it, of, on, or,
that, the, this, to, was, will, with, ...
-- These searches are equivalent:
MATCH(content) AGAINST('the database performance');
MATCH(content) AGAINST('database performance');
-- "the" is automatically ignored
Minimum word length:
- Default: 4 characters (InnoDB), 3 characters (MyISAM)
- Words shorter than minimum are ignored
-- "SQL" (3 chars) might be ignored with InnoDB default settings
MATCH(content) AGAINST('SQL database'); -- Only searches "database"
Configuring Stopwords and Word Length
-- View current settings
SHOW VARIABLES LIKE 'ft_%';
-- Change minimum word length (requires server restart)
-- In my.cnf or my.ini:
[mysqld]
innodb_ft_min_token_size = 3 -- Minimum word length for InnoDB
ft_min_word_len = 3 -- Minimum word length for MyISAM
-- Use custom stopword list
innodb_ft_server_stopword_table = 'your_db/custom_stopwords';
-- After changing, rebuild FULLTEXT indexes
ALTER TABLE articles DROP INDEX ft_content;
ALTER TABLE articles ADD FULLTEXT INDEX ft_content (content);
Full-Text Search Limitations
Limitations to be aware of:
1. Only for specific column types:
✓ CHAR, VARCHAR, TEXT
✗ INT, DATE, BLOB
2. Storage engine restrictions:
✓ InnoDB (MySQL 5.6+)
✓ MyISAM
✗ Memory, CSV engines
3. Case insensitive:
"MySQL" and "mysql" are treated the same
4. No substring matching:
"data" won't match "database" (use wildcard: data*)
5. 50% threshold rule:
Words appearing in >50% of rows are ignored
(Less relevant for boolean mode)
6. Performance on very large datasets:
FULLTEXT is fast, but dedicated search engines (Elasticsearch)
are better for millions of documents
Real-World Use Cases
Example 1: Blog Search
-- Blog with title, excerpt, and content
CREATE TABLE blog_posts (
id INT PRIMARY KEY AUTO_INCREMENT,
title VARCHAR(255),
excerpt TEXT,
content TEXT,
author VARCHAR(100),
created_at TIMESTAMP,
FULLTEXT KEY ft_title (title),
FULLTEXT KEY ft_content (content),
FULLTEXT KEY ft_all (title, excerpt, content)
);
-- Search only titles (fast, most relevant)
SELECT * FROM blog_posts
WHERE MATCH(title) AGAINST('MySQL performance');
-- Search all fields (comprehensive)
SELECT
id,
title,
MATCH(title, excerpt, content) AGAINST('MySQL performance') as score
FROM blog_posts
WHERE MATCH(title, excerpt, content) AGAINST('MySQL performance')
ORDER BY score DESC
LIMIT 10;
Example 2: Product Search
CREATE TABLE products (
id INT PRIMARY KEY,
name VARCHAR(255),
description TEXT,
category VARCHAR(50),
price DECIMAL(10,2),
FULLTEXT KEY ft_search (name, description)
);
-- Search products with boolean mode
SELECT
id,
name,
price,
MATCH(name, description) AGAINST('+laptop +gaming -refurbished' IN BOOLEAN MODE) as relevance
FROM products
WHERE MATCH(name, description) AGAINST('+laptop +gaming -refurbished' IN BOOLEAN MODE)
AND price BETWEEN 500 AND 2000
ORDER BY relevance DESC
LIMIT 20;
-- Combines full-text search with regular WHERE conditions
Example 3: Knowledge Base
CREATE TABLE kb_articles (
id INT PRIMARY KEY,
title VARCHAR(255),
content TEXT,
tags VARCHAR(255),
views INT DEFAULT 0,
FULLTEXT KEY ft_search (title, content, tags)
);
-- Advanced search with relevance boosting
SELECT
id,
title,
views,
MATCH(title, content, tags) AGAINST('+MySQL >performance >optimization' IN BOOLEAN MODE) as relevance
FROM kb_articles
WHERE MATCH(title, content, tags) AGAINST('+MySQL >performance >optimization' IN BOOLEAN MODE)
ORDER BY relevance DESC, views DESC
LIMIT 10;
-- Ranks by relevance first, then by popularity (views)
Practice Exercise:
Scenario: You're building a recipe website with full-text search.
CREATE TABLE recipes (
id INT PRIMARY KEY AUTO_INCREMENT,
title VARCHAR(255),
ingredients TEXT,
instructions TEXT,
cuisine VARCHAR(50),
difficulty ENUM('easy', 'medium', 'hard'),
prep_time INT, -- minutes
FULLTEXT KEY ft_search (title, ingredients, instructions)
);
-- Sample data
INSERT INTO recipes (title, ingredients, instructions, cuisine, difficulty, prep_time)
VALUES
('Classic Spaghetti Carbonara', 'pasta, eggs, bacon, parmesan, black pepper',
'Cook pasta. Fry bacon. Mix eggs and cheese. Combine all ingredients...',
'Italian', 'easy', 20),
('Chicken Tikka Masala', 'chicken, yogurt, tomato, cream, spices, garlic, ginger',
'Marinate chicken. Make sauce. Cook chicken in sauce...',
'Indian', 'medium', 45);
Tasks:
- Write a query to find recipes containing "chicken" and "garlic"
- Find recipes with "pasta" but NOT "cream"
- Search for recipes with "spice" or "spicy" (wildcard)
- Find "easy Italian" recipes (combine full-text with regular WHERE)
Solutions:
-- 1. Must contain "chicken" AND "garlic"
SELECT title, cuisine,
MATCH(title, ingredients, instructions) AGAINST('+chicken +garlic' IN BOOLEAN MODE) as score
FROM recipes
WHERE MATCH(title, ingredients, instructions) AGAINST('+chicken +garlic' IN BOOLEAN MODE)
ORDER BY score DESC;
-- 2. "pasta" but NOT "cream"
SELECT title, ingredients
FROM recipes
WHERE MATCH(title, ingredients, instructions) AGAINST('+pasta -cream' IN BOOLEAN MODE);
-- 3. Wildcard search for spice/spicy
SELECT title
FROM recipes
WHERE MATCH(title, ingredients, instructions) AGAINST('spic*' IN BOOLEAN MODE);
-- 4. Combine full-text with regular conditions
SELECT title, cuisine, difficulty, prep_time
FROM recipes
WHERE MATCH(title, ingredients, instructions) AGAINST('Italian')
AND difficulty = 'easy'
AND prep_time <= 30
ORDER BY MATCH(title, ingredients, instructions) AGAINST('Italian') DESC;
Summary
In this lesson, you learned:
- FULLTEXT indexes enable fast, relevance-ranked text searching
- Create FULLTEXT indexes on CHAR, VARCHAR, and TEXT columns
- Natural language mode searches for any words (default)
- Boolean mode provides precise control (+, -, *, "", operators)
- Query expansion finds related terms automatically
- Relevance scores rank results by match quality
- Stopwords and minimum word length affect search behavior
- Full-text search is much faster than LIKE for text searches
- Combine full-text search with regular WHERE conditions for powerful queries
Next Up: In the final lesson of this module, we'll explore performance best practices including connection pooling, batch operations, and avoiding common performance pitfalls!