MySQL & Database Design

Normalization - 3NF & BCNF

13 min Lesson 4 of 40

Normalization - 3NF & BCNF

In this lesson, we'll complete our journey through normalization by exploring Third Normal Form (3NF) and Boyce-Codd Normal Form (BCNF). We'll also discuss when denormalization might be necessary for performance.

Third Normal Form (3NF)

A table is in Third Normal Form if:

3NF Rules: 1. Must be in 2NF - All 2NF rules satisfied - No partial dependencies 2. No transitive dependencies - Non-key columns cannot depend on other non-key columns - Every non-key column must depend directly on the primary key - Eliminate "A → B → C" dependencies
Key Concept: In 3NF, every piece of information is stored in only one place, and every non-key column provides a fact about the key, the whole key, and nothing but the key.

Understanding Transitive Dependencies

A transitive dependency occurs when a non-key column depends on another non-key column, which in turn depends on the primary key.

Example of Transitive Dependency: employee_id → department_id → department_name - department_name depends on department_id - department_id depends on employee_id - Therefore: department_name transitively depends on employee_id Problem: If you change a department name, you must update it in every employee record, leading to update anomalies and data inconsistency.

3NF Violation Example

❌ NOT in 3NF (Has Transitive Dependency): CREATE TABLE employees_bad ( employee_id INT PRIMARY KEY, employee_name VARCHAR(100), department_id INT, department_name VARCHAR(100), -- ❌ Depends on department_id department_location VARCHAR(100) -- ❌ Depends on department_id ); INSERT INTO employees_bad VALUES (1, 'John Doe', 10, 'Engineering', 'Building A'), (2, 'Jane Smith', 10, 'Engineering', 'Building A'), (3, 'Bob Johnson', 20, 'Marketing', 'Building B'); Problems: 1. department_name stored redundantly for every employee 2. If department name changes, must update multiple rows 3. Cannot store department info without an employee 4. Deleting last employee in department loses department data

Converting to 3NF

To convert to 3NF, remove transitive dependencies by creating separate tables:

✓ In 3NF (No Transitive Dependencies): CREATE TABLE employees ( employee_id INT PRIMARY KEY, employee_name VARCHAR(100), department_id INT, hire_date DATE, salary DECIMAL(10,2), FOREIGN KEY (department_id) REFERENCES departments(department_id) ); CREATE TABLE departments ( department_id INT PRIMARY KEY, department_name VARCHAR(100), department_location VARCHAR(100), manager_id INT ); INSERT INTO employees VALUES (1, 'John Doe', 10, '2023-01-15', 75000.00), (2, 'Jane Smith', 10, '2023-02-20', 82000.00), (3, 'Bob Johnson', 20, '2023-03-10', 68000.00); INSERT INTO departments VALUES (10, 'Engineering', 'Building A', 1), (20, 'Marketing', 'Building B', 3); Benefits: - No redundancy: department info stored once - Update in one place: change department name once - Can add departments without employees - No deletion anomalies
Memory Aid: "The key, the whole key, and nothing but the key, so help me Codd" (after Edgar F. Codd, creator of the relational model).

Real-World 3NF Example: Product Catalog

❌ NOT in 3NF: CREATE TABLE products_bad ( product_id INT PRIMARY KEY, product_name VARCHAR(100), category_id INT, category_name VARCHAR(100), -- ❌ Transitive dependency supplier_id INT, supplier_name VARCHAR(100), -- ❌ Transitive dependency supplier_country VARCHAR(50) -- ❌ Transitive dependency ); ✓ In 3NF: CREATE TABLE products ( product_id INT PRIMARY KEY, product_name VARCHAR(100), category_id INT, supplier_id INT, price DECIMAL(10,2), stock_quantity INT, FOREIGN KEY (category_id) REFERENCES categories(category_id), FOREIGN KEY (supplier_id) REFERENCES suppliers(supplier_id) ); CREATE TABLE categories ( category_id INT PRIMARY KEY, category_name VARCHAR(100), description TEXT ); CREATE TABLE suppliers ( supplier_id INT PRIMARY KEY, supplier_name VARCHAR(100), supplier_country VARCHAR(50), contact_email VARCHAR(100), phone VARCHAR(20) );

Boyce-Codd Normal Form (BCNF)

BCNF is a stricter version of 3NF. A table is in BCNF if:

BCNF Rules: 1. Must be in 3NF - All 3NF rules satisfied 2. For every functional dependency (X → Y), X must be a super key - The left side of every dependency must be a candidate key - No column or set of columns determines another unless it's a key
Note: Most tables in 3NF are also in BCNF. BCNF violations are rare but important to recognize.

BCNF Violation Example

❌ In 3NF but NOT in BCNF: CREATE TABLE course_instructors ( student_id INT, course VARCHAR(100), instructor VARCHAR(100), PRIMARY KEY (student_id, course) ); Scenario: - Each student can take a course only once - Each course is taught by only one instructor - Different students in the same course have the same instructor Functional Dependencies: - (student_id, course) → instructor ✓ (primary key determines instructor) - course → instructor ❌ (course determines instructor, but course is not a key) Problem: The second dependency violates BCNF because "course" is not a super key, yet it determines "instructor".

Converting to BCNF

✓ In BCNF: CREATE TABLE enrollments ( student_id INT, course VARCHAR(100), enrollment_date DATE, grade VARCHAR(2), PRIMARY KEY (student_id, course), FOREIGN KEY (course) REFERENCES course_instructors(course) ); CREATE TABLE course_instructors ( course VARCHAR(100) PRIMARY KEY, instructor VARCHAR(100), semester VARCHAR(20) ); Now all dependencies are valid: - course → instructor (course is a key ✓) - (student_id, course) → all columns (composite key ✓)

3NF vs BCNF Comparison

3NF: - No transitive dependencies - Every non-key attribute depends on the key - Easier to achieve - Most common in practice BCNF: - Stricter than 3NF - Every determinant must be a candidate key - Prevents all redundancy (theoretically) - Harder to achieve - Sometimes requires more joins When to use which: - Use 3NF for most practical applications - Use BCNF when data integrity is critical - Consider denormalization for performance

When to Denormalize

Normalization is ideal for data integrity, but sometimes denormalization is necessary for performance:

Valid Reasons to Denormalize:Performance Requirements - Queries become too slow with many joins - Read-heavy applications (reports, dashboards) - Caching frequently accessed data ✓ Simplify Common Queries - Reduce complexity of frequently-run queries - Avoid 5+ table joins - Improve response time for critical features ✓ Data Warehousing - Analytics and reporting databases - Star schema and snowflake schema designs - Historical data storage ✓ Calculated Values - Store aggregate counts (total_orders, total_spent) - Cache expensive calculations - Update via triggers or batch jobs
Warning: Denormalization introduces data redundancy. Always ensure you have mechanisms (triggers, application logic, or batch jobs) to keep denormalized data synchronized.

Denormalization Example

Normalized (3NF): -- Orders table CREATE TABLE orders ( order_id INT PRIMARY KEY, customer_id INT, order_date DATE ); -- Order items table CREATE TABLE order_items ( order_id INT, product_id INT, quantity INT, price DECIMAL(10,2) ); -- Query to get order total (requires JOIN and SUM) SELECT o.order_id, SUM(oi.quantity * oi.price) as total FROM orders o JOIN order_items oi ON o.order_id = oi.order_id GROUP BY o.order_id; Denormalized (for performance): CREATE TABLE orders ( order_id INT PRIMARY KEY, customer_id INT, order_date DATE, total_amount DECIMAL(10,2) -- ✓ Denormalized: stores calculated value ); -- Update total_amount when items change (using trigger) CREATE TRIGGER update_order_total AFTER INSERT ON order_items FOR EACH ROW UPDATE orders SET total_amount = ( SELECT SUM(quantity * price) FROM order_items WHERE order_id = NEW.order_id ) WHERE order_id = NEW.order_id; Trade-off: - Faster reads (no JOIN or SUM needed) - Slower writes (trigger overhead) - Data redundancy (total stored twice) - Must maintain consistency

Normalization Trade-offs

Pros of Full Normalization (3NF/BCNF): ✓ No data redundancy ✓ Easier to maintain data integrity ✓ Smaller database size ✓ Faster writes (fewer places to update) ✓ Prevents anomalies Cons of Full Normalization: ❌ More complex queries (many JOINs) ❌ Slower read performance ❌ Harder to understand for non-technical users ❌ More foreign keys to manage Best Practice: - Start with 3NF - Denormalize strategically where proven necessary - Document denormalization decisions - Implement safeguards to maintain consistency

Practical Denormalization Patterns

1. Counter Caches users table: add "posts_count" column UPDATE users SET posts_count = posts_count + 1 WHERE id = ?; 2. Snapshot Data Store product price at time of order (price may change later) order_items: product_id, price_at_purchase 3. Materialized Views Pre-calculate complex aggregations CREATE VIEW monthly_sales AS SELECT... 4. Redundant Timestamps Store both created_at and updated_at for quick filtering INDEX on updated_at for recent changes queries

Practice Exercise:

Identify the normal form and convert to 3NF:

CREATE TABLE book_sales ( sale_id INT PRIMARY KEY, book_isbn VARCHAR(13), book_title VARCHAR(200), author_name VARCHAR(100), author_country VARCHAR(50), publisher_name VARCHAR(100), publisher_city VARCHAR(50), quantity INT, sale_date DATE );

Analysis:

Current State: - In 1NF: ✓ All values atomic - In 2NF: ✓ Single-column primary key (no partial dependencies) - In 3NF: ❌ Multiple transitive dependencies Transitive Dependencies: - book_isbn → book_title (book info depends on ISBN) - book_isbn → author_name (author depends on book) - author_name → author_country (country depends on author) - book_isbn → publisher_name (publisher depends on book) - publisher_name → publisher_city (city depends on publisher) 3NF Solution: CREATE TABLE books ( book_isbn VARCHAR(13) PRIMARY KEY, book_title VARCHAR(200), author_id INT, publisher_id INT, FOREIGN KEY (author_id) REFERENCES authors(author_id), FOREIGN KEY (publisher_id) REFERENCES publishers(publisher_id) ); CREATE TABLE authors ( author_id INT PRIMARY KEY, author_name VARCHAR(100), author_country VARCHAR(50) ); CREATE TABLE publishers ( publisher_id INT PRIMARY KEY, publisher_name VARCHAR(100), publisher_city VARCHAR(50) ); CREATE TABLE book_sales ( sale_id INT PRIMARY KEY, book_isbn VARCHAR(13), quantity INT, sale_date DATE, FOREIGN KEY (book_isbn) REFERENCES books(book_isbn) );

Summary

In this lesson, you learned:

  • 3NF eliminates transitive dependencies between non-key columns
  • In 3NF, non-key columns depend only on the primary key
  • BCNF is stricter: every determinant must be a candidate key
  • Most tables in 3NF are also in BCNF
  • Denormalization trades data integrity for query performance
  • Denormalize strategically when proven necessary
  • Use triggers or application logic to maintain denormalized data
Next Up: In the next lesson, we'll learn about choosing the right data types and column design best practices!

ES
Edrees Salih
10 hours ago

We are still cooking the magic in the way!