We are still cooking the magic in the way!
MySQL & Database Design
Data Types & Column Design
Data Types & Column Design
Choosing the right data types is crucial for database performance, storage efficiency, and data integrity. In this lesson, we'll explore MySQL data types and learn how to select the optimal type for each column.
Why Data Types Matter
Impact of Data Type Choices:
✓ Storage Efficiency
- Right-sized types save disk space
- Smaller tables = faster queries
- Important for large datasets
✓ Performance
- Smaller data types = faster comparisons
- Better index performance
- More rows fit in memory
✓ Data Integrity
- Type constraints prevent invalid data
- Automatic validation
- Clearer data semantics
✓ Future-Proofing
- Choose types that accommodate growth
- Avoid painful migrations later
- Balance between current needs and future scaling
Golden Rule: Use the smallest data type that can reliably hold your data, with room for reasonable growth.
Integer Types: INT vs BIGINT vs TINYINT
MySQL provides several integer types with different storage sizes and value ranges:
Integer Data Types:
TINYINT - 1 byte
Range: -128 to 127 (signed) or 0 to 255 (unsigned)
Use for: Age, status codes, boolean (0/1), small counts
SMALLINT - 2 bytes
Range: -32,768 to 32,767 (signed) or 0 to 65,535 (unsigned)
Use for: Years, small inventory counts, port numbers
MEDIUMINT - 3 bytes
Range: -8,388,608 to 8,388,607 (signed) or 0 to 16,777,215 (unsigned)
Use for: Medium-sized sequences, product IDs
INT - 4 bytes (most common)
Range: -2,147,483,648 to 2,147,483,647 (signed) or 0 to 4,294,967,295 (unsigned)
Use for: Primary keys, foreign keys, user IDs, order IDs
BIGINT - 8 bytes
Range: -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 (signed)
Use for: Very large sequences, timestamps in milliseconds, financial amounts in cents
Practical Integer Examples
Good Choices:
CREATE TABLE users (
id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY, -- ✓ 4 billion users max
age TINYINT UNSIGNED, -- ✓ 0-255 is plenty
login_count INT UNSIGNED DEFAULT 0, -- ✓ Handles millions of logins
created_at TIMESTAMP -- ✓ Built-in type for dates
);
CREATE TABLE orders (
id BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY, -- ✓ For high-volume systems
user_id INT UNSIGNED, -- ✓ Matches users.id
status TINYINT UNSIGNED, -- ✓ 0=pending, 1=paid, 2=shipped
total_cents BIGINT, -- ✓ Store money in cents
FOREIGN KEY (user_id) REFERENCES users(id)
);
Bad Choices:
CREATE TABLE users_bad (
id TINYINT AUTO_INCREMENT PRIMARY KEY, -- ❌ Only 127 users max!
age VARCHAR(10), -- ❌ Should be numeric
login_count BIGINT, -- ❌ Overkill for most cases
created_at VARCHAR(50) -- ❌ Use TIMESTAMP/DATETIME
);
Common Mistake: Using VARCHAR for numeric data like phone numbers or ZIP codes. While they may contain non-digits, if you never perform math operations on them, VARCHAR is acceptable. But for true numbers, use numeric types.
String Types: VARCHAR vs CHAR vs TEXT
Choosing the right string type affects storage and performance:
CHAR(n) - Fixed length
- Always uses exactly n bytes (padded with spaces)
- Faster for fixed-length data
- Use for: Country codes (CHAR(2)), status flags, MD5 hashes (CHAR(32))
Example: CHAR(2) for "US", "UK", "FR"
VARCHAR(n) - Variable length
- Uses 1-2 bytes + actual string length
- Most common and flexible
- Maximum: 65,535 bytes
- Use for: Names, emails, URLs, descriptions
Example: VARCHAR(100) for email addresses
TEXT - Large text
- Cannot have default value
- Stored outside the row (slower)
- Variants: TINYTEXT (255), TEXT (64KB), MEDIUMTEXT (16MB), LONGTEXT (4GB)
- Use for: Blog posts, comments, rich content
Example: TEXT for article_body
Key Differences:
- CHAR: Fixed, padded, fast for short strings
- VARCHAR: Variable, efficient, most common
- TEXT: Large content, no default, slower indexing
String Type Examples
Optimal String Usage:
CREATE TABLE products (
id INT PRIMARY KEY,
sku CHAR(10), -- ✓ Fixed format: "PROD000123"
name VARCHAR(200), -- ✓ Product names vary
description TEXT, -- ✓ Long descriptions
slug VARCHAR(255) UNIQUE, -- ✓ URL-friendly identifier
short_desc VARCHAR(500) -- ✓ Brief summary
);
CREATE TABLE users (
id INT PRIMARY KEY,
email VARCHAR(255) UNIQUE, -- ✓ Standard email length
first_name VARCHAR(50), -- ✓ Reasonable name length
last_name VARCHAR(50),
country_code CHAR(2), -- ✓ Always 2 chars: "US", "CA"
phone VARCHAR(20), -- ✓ International formats vary
bio TEXT, -- ✓ User biography can be long
password_hash CHAR(60) -- ✓ bcrypt always 60 chars
);
Poor Choices:
CREATE TABLE products_bad (
name CHAR(200), -- ❌ Wastes space if name is "Pen"
description VARCHAR(100), -- ❌ Too short for descriptions
price VARCHAR(10) -- ❌ Should be DECIMAL for money
);
Pro Tip: For email addresses, VARCHAR(255) is standard because the maximum email length according to RFC 5321 is 254 characters.
Date and Time Types
MySQL provides several types for temporal data, each with specific use cases:
DATE - Date only (no time)
- Format: YYYY-MM-DD
- Range: 1000-01-01 to 9999-12-31
- Storage: 3 bytes
- Use for: Birth dates, publication dates, deadlines
DATETIME - Date and time
- Format: YYYY-MM-DD HH:MM:SS
- Range: 1000-01-01 00:00:00 to 9999-12-31 23:59:59
- Storage: 5-8 bytes
- No timezone awareness
- Use for: Event timestamps, scheduled times
TIMESTAMP - Unix timestamp
- Format: YYYY-MM-DD HH:MM:SS
- Range: 1970-01-01 00:00:01 to 2038-01-19 03:14:07
- Storage: 4 bytes
- Automatically updates on row change (if DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP)
- Timezone-aware (converts to UTC)
- Use for: created_at, updated_at, last_login
TIME - Time only (no date)
- Format: HH:MM:SS
- Range: -838:59:59 to 838:59:59
- Use for: Duration, time of day, opening hours
YEAR - Year only
- Format: YYYY
- Range: 1901 to 2155
- Storage: 1 byte
- Use for: Year of manufacture, graduation year
Date/Time Best Practices
Standard Pattern:
CREATE TABLE posts (
id INT PRIMARY KEY,
title VARCHAR(200),
content TEXT,
published_date DATE, -- ✓ Date only
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, -- ✓ Auto-set on insert
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
ON UPDATE CURRENT_TIMESTAMP -- ✓ Auto-update on change
);
CREATE TABLE events (
id INT PRIMARY KEY,
event_name VARCHAR(100),
event_datetime DATETIME, -- ✓ Specific date and time
duration TIME, -- ✓ Event length (e.g., "02:30:00")
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE users (
id INT PRIMARY KEY,
birth_date DATE, -- ✓ No time needed
registered_at TIMESTAMP, -- ✓ Exact registration time
last_login TIMESTAMP -- ✓ Track last activity
);
TIMESTAMP vs DATETIME:
- Use TIMESTAMP for: created_at, updated_at (auto-updating)
- Use DATETIME for: Scheduled events, appointments (no auto-update)
- Use DATE for: Birth dates, deadlines (no time component)
Important: TIMESTAMP has a 2038 problem (32-bit limit). For dates beyond 2038, use DATETIME. MySQL 8.0+ extends TIMESTAMP support, but DATETIME is safer for far-future dates.
ENUM and SET Types
ENUM and SET store predefined sets of values:
ENUM - Choose ONE value from a list
- Storage: 1-2 bytes (stored as integer internally)
- Example: ENUM('draft', 'published', 'archived')
- Use for: Status fields with fixed options
CREATE TABLE posts (
id INT PRIMARY KEY,
title VARCHAR(200),
status ENUM('draft', 'published', 'archived') DEFAULT 'draft',
priority ENUM('low', 'medium', 'high') DEFAULT 'medium'
);
-- Query examples
SELECT * FROM posts WHERE status = 'published';
UPDATE posts SET status = 'archived' WHERE id = 1;
SET - Choose MULTIPLE values from a list
- Storage: 1-8 bytes depending on number of values
- Example: SET('email', 'sms', 'push')
- Use for: Multiple selections, permissions, flags
CREATE TABLE users (
id INT PRIMARY KEY,
name VARCHAR(100),
notifications SET('email', 'sms', 'push', 'newsletter') DEFAULT 'email'
);
INSERT INTO users VALUES (1, 'John', 'email,sms,push');
-- Query examples
SELECT * FROM users WHERE FIND_IN_SET('email', notifications);
ENUM Limitations: Adding or removing values requires ALTER TABLE. For frequently changing options, consider a lookup table instead. ENUM is best for truly static lists (like status codes).
JSON Data Type (MySQL 5.7+)
MySQL's native JSON type stores and validates JSON documents:
JSON Type Benefits:
- Automatic validation (rejects invalid JSON)
- Efficient storage (binary format)
- Fast access with JSON functions
- Flexible schema for variable data
Example Usage:
CREATE TABLE products (
id INT PRIMARY KEY,
name VARCHAR(200),
attributes JSON -- Store flexible product attributes
);
INSERT INTO products VALUES (
1,
'Laptop',
'{"brand": "Dell", "ram": "16GB", "storage": "512GB SSD", "color": "Silver"}'
);
-- Query JSON data
SELECT
name,
JSON_EXTRACT(attributes, '$.brand') as brand,
JSON_EXTRACT(attributes, '$.ram') as ram
FROM products;
-- Shorter syntax (-> operator)
SELECT name, attributes->'$.brand' as brand FROM products;
-- Update JSON field
UPDATE products
SET attributes = JSON_SET(attributes, '$.color', 'Black')
WHERE id = 1;
When to Use JSON:
✓ Flexible attributes (product specs, user settings)
✓ Varying structure across rows
✓ API response storage
✓ Configuration data
When NOT to Use JSON:
❌ Frequently queried fields (use regular columns)
❌ Data requiring complex indexes
❌ Strictly structured data (normalize instead)
Decimal vs Float vs Double
DECIMAL(M,D) - Exact precision
- M = total digits, D = digits after decimal
- Example: DECIMAL(10,2) stores 12345678.90
- No rounding errors
- Use for: Money, prices, quantities requiring exactness
FLOAT - Approximate (4 bytes)
- Precision: ~7 decimal digits
- Use for: Scientific measurements, non-critical decimals
DOUBLE - Approximate (8 bytes)
- Precision: ~15 decimal digits
- Use for: Scientific calculations, coordinates
For Money - ALWAYS use DECIMAL:
CREATE TABLE orders (
id INT PRIMARY KEY,
subtotal DECIMAL(10,2), -- ✓ Exact: $99,999,999.99 max
tax DECIMAL(10,2),
total DECIMAL(10,2),
discount_percent DECIMAL(5,2) -- ✓ 0.00 to 100.00
);
-- Or store as integers (cents)
CREATE TABLE orders_alt (
id INT PRIMARY KEY,
total_cents BIGINT -- ✓ $21,474,836.47 max (if using INT)
);
Practice Exercise:
Fix the data type issues in this table:
CREATE TABLE employees_bad (
id TINYINT PRIMARY KEY,
name VARCHAR(10),
email TEXT,
age VARCHAR(5),
salary FLOAT,
hire_date VARCHAR(50),
is_active VARCHAR(10),
department VARCHAR(255)
);
Improved Version:
CREATE TABLE employees (
id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY, -- ✓ Support more than 127 employees
name VARCHAR(100), -- ✓ Longer names
email VARCHAR(255) UNIQUE, -- ✓ Standard email length
age TINYINT UNSIGNED, -- ✓ Numeric type, 0-255
salary DECIMAL(10,2), -- ✓ Exact precision for money
hire_date DATE, -- ✓ Proper date type
is_active BOOLEAN DEFAULT TRUE, -- ✓ TINYINT(1) for true/false
department ENUM('HR','IT','Sales','Finance'), -- ✓ Fixed list of departments
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP
);
Storage Optimization Tips
1. Use UNSIGNED for non-negative values
INT UNSIGNED stores 0 to 4.2 billion (vs -2.1B to 2.1B)
2. Avoid VARCHAR(255) everywhere
Analyze actual data length and choose appropriately
3. Use ENUM for fixed lists
ENUM('yes','no') uses 1 byte vs VARCHAR(3)
4. Store booleans as TINYINT(1) or BOOLEAN
More efficient than VARCHAR('true'/'false')
5. Use appropriate date types
DATE (3 bytes) vs DATETIME (8 bytes) vs VARCHAR(20)
6. Consider CHAR for fixed-length data
CHAR(2) for country codes, CHAR(32) for MD5 hashes
Summary
In this lesson, you learned:
- Choose the smallest data type that reliably holds your data
- INT for most IDs, BIGINT for high-volume or timestamps in milliseconds
- VARCHAR for variable text, CHAR for fixed-length, TEXT for large content
- TIMESTAMP for created_at/updated_at, DATETIME for events, DATE for birth dates
- ENUM for fixed lists, JSON for flexible attributes
- DECIMAL for money (never FLOAT), store cents as BIGINT alternative
- Use UNSIGNED for non-negative values to double positive range
Next Up: In the next lesson, we'll explore database design best practices and complete patterns for real-world applications!