Normalization - 1NF & 2NF
Normalization is the process of organizing database tables to reduce redundancy and improve data integrity. In this lesson, we'll explore the first two normal forms and learn how to apply them to your database design.
What is Normalization and Why Normalize?
Normalization is a systematic approach to decomposing tables to eliminate data redundancy and undesirable characteristics like insertion, update, and deletion anomalies.
Key Goal: Normalization ensures that each piece of data is stored in exactly one place, making your database more efficient and easier to maintain.
Benefits of Normalization:
✓ Eliminates Data Redundancy
- No duplicate information
- Saves storage space
- Reduces inconsistencies
✓ Improves Data Integrity
- Single source of truth
- Easier to enforce constraints
- Fewer update errors
✓ Simplifies Maintenance
- Changes made in one place
- Less complex update logic
- Reduced risk of errors
✓ Better Query Performance
- Smaller table sizes
- More efficient indexes
- Faster searches
Understanding Anomalies
Before normalization, databases often suffer from three types of anomalies:
❌ Example: Unnormalized Student Table
CREATE TABLE students_bad (
student_id INT,
student_name VARCHAR(100),
courses VARCHAR(200),
instructors VARCHAR(200),
departments VARCHAR(200)
);
INSERT INTO students_bad VALUES
(1, 'John', 'Math,Physics', 'Dr. Smith,Dr. Jones', 'Science,Science'),
(2, 'Jane', 'Math', 'Dr. Smith', 'Science');
Problems:
1. Multiple values stored in single columns (violates 1NF)
2. Difficult to query specific courses
3. Cannot easily add a new course without a student
4. Updating instructor name requires multiple changes
Types of Anomalies
1. Insertion Anomaly
Problem: Cannot add data without adding unrelated data
Example: Cannot add a new course unless a student enrolls
2. Update Anomaly
Problem: Must update data in multiple places
Example: If instructor name changes, must update all student records
3. Deletion Anomaly
Problem: Deleting data causes loss of other unrelated data
Example: If last student drops a course, we lose course information
Real-World Impact: A major e-commerce site once had product names stored with each order. When they fixed a typo in a product name, thousands of past orders still showed the old spelling!
First Normal Form (1NF)
A table is in First Normal Form if it satisfies these rules:
1NF Rules:
1. Each column contains atomic values
- No multiple values in one cell
- No comma-separated lists
- No arrays or sets
2. Each column contains values of the same type
- All values in a column have the same data type
- Consistent format
3. Each column has a unique name
- No duplicate column names
- Clear, descriptive names
4. Order of rows does not matter
- Results should be same regardless of row order
- Use ORDER BY for sorting, not physical order
Converting to 1NF
Let's fix our students table to comply with 1NF:
❌ NOT in 1NF (Multiple Values in Cells):
CREATE TABLE students_bad (
student_id INT,
student_name VARCHAR(100),
courses VARCHAR(200) -- ❌ Stores "Math,Physics,Chemistry"
);
✓ In 1NF (Atomic Values):
CREATE TABLE students (
student_id INT,
student_name VARCHAR(100),
course VARCHAR(100) -- ✓ One value per cell
);
INSERT INTO students VALUES
(1, 'John', 'Math'),
(1, 'John', 'Physics'),
(1, 'John', 'Chemistry'),
(2, 'Jane', 'Math');
Result:
- Each cell contains exactly one value
- No comma-separated lists
- Data is now queryable and maintainable
Quick Test: If you need to use LIKE '%value%' or string functions to find data, you probably violate 1NF.
1NF Example: E-Commerce Orders
❌ NOT in 1NF:
CREATE TABLE orders_bad (
order_id INT PRIMARY KEY,
customer_name VARCHAR(100),
products VARCHAR(500), -- "Laptop,Mouse,Keyboard"
quantities VARCHAR(100) -- "1,2,1"
);
✓ In 1NF:
CREATE TABLE orders (
order_id INT,
customer_name VARCHAR(100),
product VARCHAR(100),
quantity INT,
PRIMARY KEY (order_id, product)
);
INSERT INTO orders VALUES
(1001, 'Alice', 'Laptop', 1),
(1001, 'Alice', 'Mouse', 2),
(1001, 'Alice', 'Keyboard', 1);
Benefits:
- Easy to query: SELECT SUM(quantity) FROM orders WHERE product = 'Mouse'
- Easy to update: UPDATE orders SET quantity = 3 WHERE product = 'Mouse'
- No parsing needed
Second Normal Form (2NF)
A table is in Second Normal Form if:
2NF Rules:
1. Must be in 1NF
- All 1NF rules satisfied
2. No partial dependencies
- Every non-key column must depend on the ENTIRE primary key
- Applies only to tables with composite keys
- Non-key attributes cannot depend on part of the key
Understanding Partial Dependencies
A partial dependency occurs when a non-key column depends on only part of a composite primary key.
❌ NOT in 2NF (Has Partial Dependency):
CREATE TABLE order_details (
order_id INT,
product_id INT,
customer_name VARCHAR(100), -- ❌ Depends only on order_id
customer_email VARCHAR(100), -- ❌ Depends only on order_id
product_name VARCHAR(100), -- ❌ Depends only on product_id
product_price DECIMAL(10,2), -- ❌ Depends only on product_id
quantity INT, -- ✓ Depends on BOTH keys
PRIMARY KEY (order_id, product_id)
);
Problems:
- customer_name depends only on order_id (not product_id)
- product_name depends only on product_id (not order_id)
- This creates redundancy and anomalies
Converting to 2NF
To convert to 2NF, separate the partially dependent columns into their own tables:
✓ In 2NF (No Partial Dependencies):
-- Customer info depends only on order
CREATE TABLE orders (
order_id INT PRIMARY KEY,
customer_name VARCHAR(100),
customer_email VARCHAR(100),
order_date DATE
);
-- Product info depends only on product
CREATE TABLE products (
product_id INT PRIMARY KEY,
product_name VARCHAR(100),
product_price DECIMAL(10,2),
stock_quantity INT
);
-- Junction table - only columns that depend on BOTH keys
CREATE TABLE order_items (
order_id INT,
product_id INT,
quantity INT,
price_at_purchase DECIMAL(10,2),
PRIMARY KEY (order_id, product_id),
FOREIGN KEY (order_id) REFERENCES orders(order_id),
FOREIGN KEY (product_id) REFERENCES products(product_id)
);
Benefits:
- No redundancy: customer_name stored once per order
- No update anomalies: changing product price doesn't affect old orders
- Clear separation of concerns
Important: Tables with single-column primary keys automatically satisfy 2NF because there are no partial dependencies possible!
Practical 2NF Example: Course Enrollment
❌ NOT in 2NF:
CREATE TABLE enrollment_bad (
student_id INT,
course_id INT,
student_name VARCHAR(100), -- Partial dependency on student_id
student_email VARCHAR(100), -- Partial dependency on student_id
course_name VARCHAR(100), -- Partial dependency on course_id
instructor VARCHAR(100), -- Partial dependency on course_id
grade VARCHAR(2), -- Full dependency ✓
enrollment_date DATE, -- Full dependency ✓
PRIMARY KEY (student_id, course_id)
);
✓ In 2NF:
CREATE TABLE students (
student_id INT PRIMARY KEY,
student_name VARCHAR(100),
student_email VARCHAR(100)
);
CREATE TABLE courses (
course_id INT PRIMARY KEY,
course_name VARCHAR(100),
instructor VARCHAR(100),
credits INT
);
CREATE TABLE enrollments (
student_id INT,
course_id INT,
grade VARCHAR(2),
enrollment_date DATE,
PRIMARY KEY (student_id, course_id),
FOREIGN KEY (student_id) REFERENCES students(student_id),
FOREIGN KEY (course_id) REFERENCES courses(course_id)
);
Testing for 1NF and 2NF
1NF Checklist:
□ No repeating groups or arrays
□ Each cell contains exactly one value
□ All entries in a column are of the same type
□ Each column has a unique name
□ Each row is unique (has a primary key)
2NF Checklist:
□ Table is in 1NF
□ Table has a primary key defined
□ If composite key exists:
□ All non-key columns depend on entire key
□ No column depends on only part of the key
□ If single-column key: automatically in 2NF
Practice Exercise:
Identify the problems and convert to 2NF:
CREATE TABLE project_assignments (
employee_id INT,
project_id INT,
employee_name VARCHAR(100),
department VARCHAR(50),
project_name VARCHAR(100),
project_budget DECIMAL(12,2),
hours_worked INT,
PRIMARY KEY (employee_id, project_id)
);
Solution:
Problems:
- employee_name, department depend only on employee_id (partial)
- project_name, project_budget depend only on project_id (partial)
2NF Solution:
CREATE TABLE employees (
employee_id INT PRIMARY KEY,
employee_name VARCHAR(100),
department VARCHAR(50)
);
CREATE TABLE projects (
project_id INT PRIMARY KEY,
project_name VARCHAR(100),
project_budget DECIMAL(12,2)
);
CREATE TABLE assignments (
employee_id INT,
project_id INT,
hours_worked INT,
PRIMARY KEY (employee_id, project_id),
FOREIGN KEY (employee_id) REFERENCES employees(employee_id),
FOREIGN KEY (project_id) REFERENCES projects(project_id)
);
Summary
In this lesson, you learned:
- Normalization eliminates redundancy and prevents data anomalies
- 1NF requires atomic values - no multiple values in one cell
- 1NF eliminates repeating groups and ensures each cell has one value
- 2NF eliminates partial dependencies on composite keys
- 2NF requires all non-key columns to depend on the entire primary key
- Tables with single-column keys automatically satisfy 2NF
Next Up: In the next lesson, we'll continue with Third Normal Form (3NF) and Boyce-Codd Normal Form (BCNF)!