We are still cooking the magic in the way!
MySQL & Database Design
Normalization - 3NF & BCNF
Normalization - 3NF & BCNF
In this lesson, we'll complete our journey through normalization by exploring Third Normal Form (3NF) and Boyce-Codd Normal Form (BCNF). We'll also discuss when denormalization might be necessary for performance.
Third Normal Form (3NF)
A table is in Third Normal Form if:
3NF Rules:
1. Must be in 2NF
- All 2NF rules satisfied
- No partial dependencies
2. No transitive dependencies
- Non-key columns cannot depend on other non-key columns
- Every non-key column must depend directly on the primary key
- Eliminate "A → B → C" dependencies
Key Concept: In 3NF, every piece of information is stored in only one place, and every non-key column provides a fact about the key, the whole key, and nothing but the key.
Understanding Transitive Dependencies
A transitive dependency occurs when a non-key column depends on another non-key column, which in turn depends on the primary key.
Example of Transitive Dependency:
employee_id → department_id → department_name
- department_name depends on department_id
- department_id depends on employee_id
- Therefore: department_name transitively depends on employee_id
Problem:
If you change a department name, you must update it in every employee record,
leading to update anomalies and data inconsistency.
3NF Violation Example
❌ NOT in 3NF (Has Transitive Dependency):
CREATE TABLE employees_bad (
employee_id INT PRIMARY KEY,
employee_name VARCHAR(100),
department_id INT,
department_name VARCHAR(100), -- ❌ Depends on department_id
department_location VARCHAR(100) -- ❌ Depends on department_id
);
INSERT INTO employees_bad VALUES
(1, 'John Doe', 10, 'Engineering', 'Building A'),
(2, 'Jane Smith', 10, 'Engineering', 'Building A'),
(3, 'Bob Johnson', 20, 'Marketing', 'Building B');
Problems:
1. department_name stored redundantly for every employee
2. If department name changes, must update multiple rows
3. Cannot store department info without an employee
4. Deleting last employee in department loses department data
Converting to 3NF
To convert to 3NF, remove transitive dependencies by creating separate tables:
✓ In 3NF (No Transitive Dependencies):
CREATE TABLE employees (
employee_id INT PRIMARY KEY,
employee_name VARCHAR(100),
department_id INT,
hire_date DATE,
salary DECIMAL(10,2),
FOREIGN KEY (department_id) REFERENCES departments(department_id)
);
CREATE TABLE departments (
department_id INT PRIMARY KEY,
department_name VARCHAR(100),
department_location VARCHAR(100),
manager_id INT
);
INSERT INTO employees VALUES
(1, 'John Doe', 10, '2023-01-15', 75000.00),
(2, 'Jane Smith', 10, '2023-02-20', 82000.00),
(3, 'Bob Johnson', 20, '2023-03-10', 68000.00);
INSERT INTO departments VALUES
(10, 'Engineering', 'Building A', 1),
(20, 'Marketing', 'Building B', 3);
Benefits:
- No redundancy: department info stored once
- Update in one place: change department name once
- Can add departments without employees
- No deletion anomalies
Memory Aid: "The key, the whole key, and nothing but the key, so help me Codd" (after Edgar F. Codd, creator of the relational model).
Real-World 3NF Example: Product Catalog
❌ NOT in 3NF:
CREATE TABLE products_bad (
product_id INT PRIMARY KEY,
product_name VARCHAR(100),
category_id INT,
category_name VARCHAR(100), -- ❌ Transitive dependency
supplier_id INT,
supplier_name VARCHAR(100), -- ❌ Transitive dependency
supplier_country VARCHAR(50) -- ❌ Transitive dependency
);
✓ In 3NF:
CREATE TABLE products (
product_id INT PRIMARY KEY,
product_name VARCHAR(100),
category_id INT,
supplier_id INT,
price DECIMAL(10,2),
stock_quantity INT,
FOREIGN KEY (category_id) REFERENCES categories(category_id),
FOREIGN KEY (supplier_id) REFERENCES suppliers(supplier_id)
);
CREATE TABLE categories (
category_id INT PRIMARY KEY,
category_name VARCHAR(100),
description TEXT
);
CREATE TABLE suppliers (
supplier_id INT PRIMARY KEY,
supplier_name VARCHAR(100),
supplier_country VARCHAR(50),
contact_email VARCHAR(100),
phone VARCHAR(20)
);
Boyce-Codd Normal Form (BCNF)
BCNF is a stricter version of 3NF. A table is in BCNF if:
BCNF Rules:
1. Must be in 3NF
- All 3NF rules satisfied
2. For every functional dependency (X → Y), X must be a super key
- The left side of every dependency must be a candidate key
- No column or set of columns determines another unless it's a key
Note: Most tables in 3NF are also in BCNF. BCNF violations are rare but important to recognize.
BCNF Violation Example
❌ In 3NF but NOT in BCNF:
CREATE TABLE course_instructors (
student_id INT,
course VARCHAR(100),
instructor VARCHAR(100),
PRIMARY KEY (student_id, course)
);
Scenario:
- Each student can take a course only once
- Each course is taught by only one instructor
- Different students in the same course have the same instructor
Functional Dependencies:
- (student_id, course) → instructor ✓ (primary key determines instructor)
- course → instructor ❌ (course determines instructor, but course is not a key)
Problem:
The second dependency violates BCNF because "course" is not a super key,
yet it determines "instructor".
Converting to BCNF
✓ In BCNF:
CREATE TABLE enrollments (
student_id INT,
course VARCHAR(100),
enrollment_date DATE,
grade VARCHAR(2),
PRIMARY KEY (student_id, course),
FOREIGN KEY (course) REFERENCES course_instructors(course)
);
CREATE TABLE course_instructors (
course VARCHAR(100) PRIMARY KEY,
instructor VARCHAR(100),
semester VARCHAR(20)
);
Now all dependencies are valid:
- course → instructor (course is a key ✓)
- (student_id, course) → all columns (composite key ✓)
3NF vs BCNF Comparison
3NF:
- No transitive dependencies
- Every non-key attribute depends on the key
- Easier to achieve
- Most common in practice
BCNF:
- Stricter than 3NF
- Every determinant must be a candidate key
- Prevents all redundancy (theoretically)
- Harder to achieve
- Sometimes requires more joins
When to use which:
- Use 3NF for most practical applications
- Use BCNF when data integrity is critical
- Consider denormalization for performance
When to Denormalize
Normalization is ideal for data integrity, but sometimes denormalization is necessary for performance:
Valid Reasons to Denormalize:
✓ Performance Requirements
- Queries become too slow with many joins
- Read-heavy applications (reports, dashboards)
- Caching frequently accessed data
✓ Simplify Common Queries
- Reduce complexity of frequently-run queries
- Avoid 5+ table joins
- Improve response time for critical features
✓ Data Warehousing
- Analytics and reporting databases
- Star schema and snowflake schema designs
- Historical data storage
✓ Calculated Values
- Store aggregate counts (total_orders, total_spent)
- Cache expensive calculations
- Update via triggers or batch jobs
Warning: Denormalization introduces data redundancy. Always ensure you have mechanisms (triggers, application logic, or batch jobs) to keep denormalized data synchronized.
Denormalization Example
Normalized (3NF):
-- Orders table
CREATE TABLE orders (
order_id INT PRIMARY KEY,
customer_id INT,
order_date DATE
);
-- Order items table
CREATE TABLE order_items (
order_id INT,
product_id INT,
quantity INT,
price DECIMAL(10,2)
);
-- Query to get order total (requires JOIN and SUM)
SELECT o.order_id, SUM(oi.quantity * oi.price) as total
FROM orders o
JOIN order_items oi ON o.order_id = oi.order_id
GROUP BY o.order_id;
Denormalized (for performance):
CREATE TABLE orders (
order_id INT PRIMARY KEY,
customer_id INT,
order_date DATE,
total_amount DECIMAL(10,2) -- ✓ Denormalized: stores calculated value
);
-- Update total_amount when items change (using trigger)
CREATE TRIGGER update_order_total
AFTER INSERT ON order_items
FOR EACH ROW
UPDATE orders
SET total_amount = (
SELECT SUM(quantity * price)
FROM order_items
WHERE order_id = NEW.order_id
)
WHERE order_id = NEW.order_id;
Trade-off:
- Faster reads (no JOIN or SUM needed)
- Slower writes (trigger overhead)
- Data redundancy (total stored twice)
- Must maintain consistency
Normalization Trade-offs
Pros of Full Normalization (3NF/BCNF):
✓ No data redundancy
✓ Easier to maintain data integrity
✓ Smaller database size
✓ Faster writes (fewer places to update)
✓ Prevents anomalies
Cons of Full Normalization:
❌ More complex queries (many JOINs)
❌ Slower read performance
❌ Harder to understand for non-technical users
❌ More foreign keys to manage
Best Practice:
- Start with 3NF
- Denormalize strategically where proven necessary
- Document denormalization decisions
- Implement safeguards to maintain consistency
Practical Denormalization Patterns
1. Counter Caches
users table: add "posts_count" column
UPDATE users SET posts_count = posts_count + 1 WHERE id = ?;
2. Snapshot Data
Store product price at time of order (price may change later)
order_items: product_id, price_at_purchase
3. Materialized Views
Pre-calculate complex aggregations
CREATE VIEW monthly_sales AS SELECT...
4. Redundant Timestamps
Store both created_at and updated_at for quick filtering
INDEX on updated_at for recent changes queries
Practice Exercise:
Identify the normal form and convert to 3NF:
CREATE TABLE book_sales (
sale_id INT PRIMARY KEY,
book_isbn VARCHAR(13),
book_title VARCHAR(200),
author_name VARCHAR(100),
author_country VARCHAR(50),
publisher_name VARCHAR(100),
publisher_city VARCHAR(50),
quantity INT,
sale_date DATE
);
Analysis:
Current State:
- In 1NF: ✓ All values atomic
- In 2NF: ✓ Single-column primary key (no partial dependencies)
- In 3NF: ❌ Multiple transitive dependencies
Transitive Dependencies:
- book_isbn → book_title (book info depends on ISBN)
- book_isbn → author_name (author depends on book)
- author_name → author_country (country depends on author)
- book_isbn → publisher_name (publisher depends on book)
- publisher_name → publisher_city (city depends on publisher)
3NF Solution:
CREATE TABLE books (
book_isbn VARCHAR(13) PRIMARY KEY,
book_title VARCHAR(200),
author_id INT,
publisher_id INT,
FOREIGN KEY (author_id) REFERENCES authors(author_id),
FOREIGN KEY (publisher_id) REFERENCES publishers(publisher_id)
);
CREATE TABLE authors (
author_id INT PRIMARY KEY,
author_name VARCHAR(100),
author_country VARCHAR(50)
);
CREATE TABLE publishers (
publisher_id INT PRIMARY KEY,
publisher_name VARCHAR(100),
publisher_city VARCHAR(50)
);
CREATE TABLE book_sales (
sale_id INT PRIMARY KEY,
book_isbn VARCHAR(13),
quantity INT,
sale_date DATE,
FOREIGN KEY (book_isbn) REFERENCES books(book_isbn)
);
Summary
In this lesson, you learned:
- 3NF eliminates transitive dependencies between non-key columns
- In 3NF, non-key columns depend only on the primary key
- BCNF is stricter: every determinant must be a candidate key
- Most tables in 3NF are also in BCNF
- Denormalization trades data integrity for query performance
- Denormalize strategically when proven necessary
- Use triggers or application logic to maintain denormalized data
Next Up: In the next lesson, we'll learn about choosing the right data types and column design best practices!