MySQL & Database Design

Full-Text Search

13 min Lesson 18 of 40

Full-Text Search

Full-text search in MySQL allows you to perform complex text searches much faster and more effectively than using LIKE patterns. In this lesson, we'll explore FULLTEXT indexes and the MATCH AGAINST syntax for powerful text searching capabilities.

What is Full-Text Search?

Full-text search is optimized for searching large amounts of text data. It's much more efficient than pattern matching with LIKE:

Traditional LIKE search (SLOW): SELECT * FROM articles WHERE content LIKE '%database%' OR content LIKE '%performance%' OR content LIKE '%optimization%'; -- Problem: Full table scan, can't use regular index Full-text search (FAST): SELECT * FROM articles WHERE MATCH(content) AGAINST('database performance optimization'); -- Uses FULLTEXT index, returns relevance-ranked results
Key Advantage: Full-text search returns results ranked by relevance score, automatically handles word boundaries, and can search multiple columns simultaneously.

Creating FULLTEXT Indexes

FULLTEXT indexes can only be created on CHAR, VARCHAR, and TEXT columns:

Single Column FULLTEXT Index

-- Create table with FULLTEXT index CREATE TABLE articles ( id INT PRIMARY KEY AUTO_INCREMENT, title VARCHAR(255), content TEXT, created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, FULLTEXT KEY ft_content (content) ); -- Add FULLTEXT index to existing table ALTER TABLE articles ADD FULLTEXT INDEX ft_content (content); -- Or using CREATE INDEX CREATE FULLTEXT INDEX ft_content ON articles(content);

Multi-Column FULLTEXT Index

-- Search across multiple columns CREATE TABLE articles ( id INT PRIMARY KEY AUTO_INCREMENT, title VARCHAR(255), description TEXT, content TEXT, FULLTEXT KEY ft_all (title, description, content) ); -- This allows searching across all three columns at once SELECT * FROM articles WHERE MATCH(title, description, content) AGAINST('MySQL optimization');
Best Practice: Create separate FULLTEXT indexes for different search scenarios. One for title-only searches, another for content-only, and a third for searching all fields together.

MATCH AGAINST Syntax

The MATCH AGAINST clause is used to query FULLTEXT indexes:

Basic Syntax: SELECT * FROM table WHERE MATCH(column1, column2, ...) AGAINST('search terms'); Important Rules: 1. Columns in MATCH() must exactly match a FULLTEXT index 2. Can't mix MATCH columns from different FULLTEXT indexes 3. Returns rows sorted by relevance score (highest first)

Full-Text Search Modes

MySQL offers three full-text search modes, each with different capabilities:

1. Natural Language Mode (Default)

-- Natural language search (default) SELECT *, MATCH(title, content) AGAINST('database performance') as relevance FROM articles WHERE MATCH(title, content) AGAINST('database performance') ORDER BY relevance DESC; How it works: - Searches for any of the words: "database" OR "performance" - Automatically ignores very common words (stopwords) - Returns results ranked by relevance - Words in quotes are treated as phrases Example matches: ✓ "Database optimization techniques" ✓ "Improving query performance" ✓ "Database and performance tuning" ✗ "Getting started with MySQL" (no matching words)

2. Boolean Mode

Boolean mode gives you precise control over search behavior:

-- Boolean mode search SELECT * FROM articles WHERE MATCH(title, content) AGAINST('+database +performance' IN BOOLEAN MODE); Boolean Operators: + Word MUST be present Example: +database Matches: "Database design" ✓ Not: "SQL queries" ✗ - Word MUST NOT be present Example: +database -MySQL Matches: "PostgreSQL database" ✓ Not: "MySQL database" ✗ * Wildcard (word prefix) Example: optim* Matches: "optimization", "optimize", "optimal" ✓ "" Exact phrase Example: "database design" Matches: "...database design..." ✓ Not: "database and design" ✗ > Increase word relevance Example: +database >performance "performance" increases relevance score < Decrease word relevance Example: +database <legacy "legacy" decreases relevance score () Grouping Example: +(database mysql) +performance Must have (database OR mysql) AND performance

Boolean Mode Examples

-- Must contain "database" and "performance" WHERE MATCH(content) AGAINST('+database +performance' IN BOOLEAN MODE); -- Must contain "database", must not contain "Oracle" WHERE MATCH(content) AGAINST('+database -Oracle' IN BOOLEAN MODE); -- Wildcard search: optimization, optimize, optimal WHERE MATCH(content) AGAINST('optim*' IN BOOLEAN MODE); -- Exact phrase search WHERE MATCH(content) AGAINST('"database performance"' IN BOOLEAN MODE); -- Complex: Must have MySQL, prefer performance, exclude deprecated WHERE MATCH(content) AGAINST('+MySQL >performance -deprecated' IN BOOLEAN MODE); -- Either database OR performance, but not both required WHERE MATCH(content) AGAINST('database performance' IN BOOLEAN MODE); -- Must have database AND (optimization OR tuning) WHERE MATCH(content) AGAINST('+database +(optimization tuning)' IN BOOLEAN MODE);

3. Query Expansion Mode

Query expansion performs a two-pass search to find related terms:

-- Query expansion search SELECT * FROM articles WHERE MATCH(content) AGAINST('database' WITH QUERY EXPANSION); How it works: Pass 1: Search for "database" Pass 2: Find common words in Pass 1 results, search again Example: Initial search: "database" Pass 1 results contain: MySQL, SQL, tables, queries Pass 2 searches: "database MySQL SQL tables queries" When to use: ✓ User enters very short search terms ✓ You want to find related content ✓ Dealing with synonyms and related concepts Warning: Can return less relevant results if your dataset is diverse Use carefully - test with your actual data

Relevance Scoring

MySQL calculates a relevance score for each result. Higher scores indicate better matches:

-- Get relevance scores SELECT id, title, MATCH(title, content) AGAINST('MySQL performance') as relevance_score FROM articles WHERE MATCH(title, content) AGAINST('MySQL performance') ORDER BY relevance_score DESC; +----+--------------------------------+-----------------+ | id | title | relevance_score | +----+--------------------------------+-----------------+ | 15 | MySQL Performance Tuning Guide | 3.45678 | | 23 | Optimizing MySQL Queries | 2.89123 | | 8 | Database Performance Tips | 1.23456 | +----+--------------------------------+-----------------+ Factors affecting relevance: - Word frequency in document (more occurrences = higher score) - Word rarity across all documents (rare words = higher score) - Document length (shorter documents score higher) - Word position (words in title may score higher with multi-column index)
Tip: Always include the relevance score in your SELECT to understand why results are ranked the way they are. This helps you improve your search queries.

Stopwords

Stopwords are common words that are ignored by full-text search:

Default stopwords include: a, an, and, are, as, at, be, by, for, from, in, is, it, of, on, or, that, the, this, to, was, will, with, ... -- These searches are equivalent: MATCH(content) AGAINST('the database performance'); MATCH(content) AGAINST('database performance'); -- "the" is automatically ignored Minimum word length: - Default: 4 characters (InnoDB), 3 characters (MyISAM) - Words shorter than minimum are ignored -- "SQL" (3 chars) might be ignored with InnoDB default settings MATCH(content) AGAINST('SQL database'); -- Only searches "database"

Configuring Stopwords and Word Length

-- View current settings SHOW VARIABLES LIKE 'ft_%'; -- Change minimum word length (requires server restart) -- In my.cnf or my.ini: [mysqld] innodb_ft_min_token_size = 3 -- Minimum word length for InnoDB ft_min_word_len = 3 -- Minimum word length for MyISAM -- Use custom stopword list innodb_ft_server_stopword_table = 'your_db/custom_stopwords'; -- After changing, rebuild FULLTEXT indexes ALTER TABLE articles DROP INDEX ft_content; ALTER TABLE articles ADD FULLTEXT INDEX ft_content (content);

Full-Text Search Limitations

Limitations to be aware of: 1. Only for specific column types: ✓ CHAR, VARCHAR, TEXT ✗ INT, DATE, BLOB 2. Storage engine restrictions: ✓ InnoDB (MySQL 5.6+) ✓ MyISAM ✗ Memory, CSV engines 3. Case insensitive: "MySQL" and "mysql" are treated the same 4. No substring matching: "data" won't match "database" (use wildcard: data*) 5. 50% threshold rule: Words appearing in >50% of rows are ignored (Less relevant for boolean mode) 6. Performance on very large datasets: FULLTEXT is fast, but dedicated search engines (Elasticsearch) are better for millions of documents

Real-World Use Cases

Example 1: Blog Search

-- Blog with title, excerpt, and content CREATE TABLE blog_posts ( id INT PRIMARY KEY AUTO_INCREMENT, title VARCHAR(255), excerpt TEXT, content TEXT, author VARCHAR(100), created_at TIMESTAMP, FULLTEXT KEY ft_title (title), FULLTEXT KEY ft_content (content), FULLTEXT KEY ft_all (title, excerpt, content) ); -- Search only titles (fast, most relevant) SELECT * FROM blog_posts WHERE MATCH(title) AGAINST('MySQL performance'); -- Search all fields (comprehensive) SELECT id, title, MATCH(title, excerpt, content) AGAINST('MySQL performance') as score FROM blog_posts WHERE MATCH(title, excerpt, content) AGAINST('MySQL performance') ORDER BY score DESC LIMIT 10;

Example 2: Product Search

CREATE TABLE products ( id INT PRIMARY KEY, name VARCHAR(255), description TEXT, category VARCHAR(50), price DECIMAL(10,2), FULLTEXT KEY ft_search (name, description) ); -- Search products with boolean mode SELECT id, name, price, MATCH(name, description) AGAINST('+laptop +gaming -refurbished' IN BOOLEAN MODE) as relevance FROM products WHERE MATCH(name, description) AGAINST('+laptop +gaming -refurbished' IN BOOLEAN MODE) AND price BETWEEN 500 AND 2000 ORDER BY relevance DESC LIMIT 20; -- Combines full-text search with regular WHERE conditions

Example 3: Knowledge Base

CREATE TABLE kb_articles ( id INT PRIMARY KEY, title VARCHAR(255), content TEXT, tags VARCHAR(255), views INT DEFAULT 0, FULLTEXT KEY ft_search (title, content, tags) ); -- Advanced search with relevance boosting SELECT id, title, views, MATCH(title, content, tags) AGAINST('+MySQL >performance >optimization' IN BOOLEAN MODE) as relevance FROM kb_articles WHERE MATCH(title, content, tags) AGAINST('+MySQL >performance >optimization' IN BOOLEAN MODE) ORDER BY relevance DESC, views DESC LIMIT 10; -- Ranks by relevance first, then by popularity (views)

Practice Exercise:

Scenario: You're building a recipe website with full-text search.

CREATE TABLE recipes ( id INT PRIMARY KEY AUTO_INCREMENT, title VARCHAR(255), ingredients TEXT, instructions TEXT, cuisine VARCHAR(50), difficulty ENUM('easy', 'medium', 'hard'), prep_time INT, -- minutes FULLTEXT KEY ft_search (title, ingredients, instructions) ); -- Sample data INSERT INTO recipes (title, ingredients, instructions, cuisine, difficulty, prep_time) VALUES ('Classic Spaghetti Carbonara', 'pasta, eggs, bacon, parmesan, black pepper', 'Cook pasta. Fry bacon. Mix eggs and cheese. Combine all ingredients...', 'Italian', 'easy', 20), ('Chicken Tikka Masala', 'chicken, yogurt, tomato, cream, spices, garlic, ginger', 'Marinate chicken. Make sauce. Cook chicken in sauce...', 'Indian', 'medium', 45);

Tasks:

  1. Write a query to find recipes containing "chicken" and "garlic"
  2. Find recipes with "pasta" but NOT "cream"
  3. Search for recipes with "spice" or "spicy" (wildcard)
  4. Find "easy Italian" recipes (combine full-text with regular WHERE)

Solutions:

-- 1. Must contain "chicken" AND "garlic" SELECT title, cuisine, MATCH(title, ingredients, instructions) AGAINST('+chicken +garlic' IN BOOLEAN MODE) as score FROM recipes WHERE MATCH(title, ingredients, instructions) AGAINST('+chicken +garlic' IN BOOLEAN MODE) ORDER BY score DESC; -- 2. "pasta" but NOT "cream" SELECT title, ingredients FROM recipes WHERE MATCH(title, ingredients, instructions) AGAINST('+pasta -cream' IN BOOLEAN MODE); -- 3. Wildcard search for spice/spicy SELECT title FROM recipes WHERE MATCH(title, ingredients, instructions) AGAINST('spic*' IN BOOLEAN MODE); -- 4. Combine full-text with regular conditions SELECT title, cuisine, difficulty, prep_time FROM recipes WHERE MATCH(title, ingredients, instructions) AGAINST('Italian') AND difficulty = 'easy' AND prep_time <= 30 ORDER BY MATCH(title, ingredients, instructions) AGAINST('Italian') DESC;

Summary

In this lesson, you learned:

  • FULLTEXT indexes enable fast, relevance-ranked text searching
  • Create FULLTEXT indexes on CHAR, VARCHAR, and TEXT columns
  • Natural language mode searches for any words (default)
  • Boolean mode provides precise control (+, -, *, "", operators)
  • Query expansion finds related terms automatically
  • Relevance scores rank results by match quality
  • Stopwords and minimum word length affect search behavior
  • Full-text search is much faster than LIKE for text searches
  • Combine full-text search with regular WHERE conditions for powerful queries
Next Up: In the final lesson of this module, we'll explore performance best practices including connection pooling, batch operations, and avoiding common performance pitfalls!