MySQL & Database Design

Data Types & Column Design

13 min Lesson 5 of 40

Data Types & Column Design

Choosing the right data types is crucial for database performance, storage efficiency, and data integrity. In this lesson, we'll explore MySQL data types and learn how to select the optimal type for each column.

Why Data Types Matter

Impact of Data Type Choices:Storage Efficiency - Right-sized types save disk space - Smaller tables = faster queries - Important for large datasets ✓ Performance - Smaller data types = faster comparisons - Better index performance - More rows fit in memory ✓ Data Integrity - Type constraints prevent invalid data - Automatic validation - Clearer data semantics ✓ Future-Proofing - Choose types that accommodate growth - Avoid painful migrations later - Balance between current needs and future scaling
Golden Rule: Use the smallest data type that can reliably hold your data, with room for reasonable growth.

Integer Types: INT vs BIGINT vs TINYINT

MySQL provides several integer types with different storage sizes and value ranges:

Integer Data Types: TINYINT - 1 byte Range: -128 to 127 (signed) or 0 to 255 (unsigned) Use for: Age, status codes, boolean (0/1), small counts SMALLINT - 2 bytes Range: -32,768 to 32,767 (signed) or 0 to 65,535 (unsigned) Use for: Years, small inventory counts, port numbers MEDIUMINT - 3 bytes Range: -8,388,608 to 8,388,607 (signed) or 0 to 16,777,215 (unsigned) Use for: Medium-sized sequences, product IDs INT - 4 bytes (most common) Range: -2,147,483,648 to 2,147,483,647 (signed) or 0 to 4,294,967,295 (unsigned) Use for: Primary keys, foreign keys, user IDs, order IDs BIGINT - 8 bytes Range: -9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 (signed) Use for: Very large sequences, timestamps in milliseconds, financial amounts in cents

Practical Integer Examples

Good Choices: CREATE TABLE users ( id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY, -- ✓ 4 billion users max age TINYINT UNSIGNED, -- ✓ 0-255 is plenty login_count INT UNSIGNED DEFAULT 0, -- ✓ Handles millions of logins created_at TIMESTAMP -- ✓ Built-in type for dates ); CREATE TABLE orders ( id BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY, -- ✓ For high-volume systems user_id INT UNSIGNED, -- ✓ Matches users.id status TINYINT UNSIGNED, -- ✓ 0=pending, 1=paid, 2=shipped total_cents BIGINT, -- ✓ Store money in cents FOREIGN KEY (user_id) REFERENCES users(id) ); Bad Choices: CREATE TABLE users_bad ( id TINYINT AUTO_INCREMENT PRIMARY KEY, -- ❌ Only 127 users max! age VARCHAR(10), -- ❌ Should be numeric login_count BIGINT, -- ❌ Overkill for most cases created_at VARCHAR(50) -- ❌ Use TIMESTAMP/DATETIME );
Common Mistake: Using VARCHAR for numeric data like phone numbers or ZIP codes. While they may contain non-digits, if you never perform math operations on them, VARCHAR is acceptable. But for true numbers, use numeric types.

String Types: VARCHAR vs CHAR vs TEXT

Choosing the right string type affects storage and performance:

CHAR(n) - Fixed length - Always uses exactly n bytes (padded with spaces) - Faster for fixed-length data - Use for: Country codes (CHAR(2)), status flags, MD5 hashes (CHAR(32)) Example: CHAR(2) for "US", "UK", "FR" VARCHAR(n) - Variable length - Uses 1-2 bytes + actual string length - Most common and flexible - Maximum: 65,535 bytes - Use for: Names, emails, URLs, descriptions Example: VARCHAR(100) for email addresses TEXT - Large text - Cannot have default value - Stored outside the row (slower) - Variants: TINYTEXT (255), TEXT (64KB), MEDIUMTEXT (16MB), LONGTEXT (4GB) - Use for: Blog posts, comments, rich content Example: TEXT for article_body Key Differences: - CHAR: Fixed, padded, fast for short strings - VARCHAR: Variable, efficient, most common - TEXT: Large content, no default, slower indexing

String Type Examples

Optimal String Usage: CREATE TABLE products ( id INT PRIMARY KEY, sku CHAR(10), -- ✓ Fixed format: "PROD000123" name VARCHAR(200), -- ✓ Product names vary description TEXT, -- ✓ Long descriptions slug VARCHAR(255) UNIQUE, -- ✓ URL-friendly identifier short_desc VARCHAR(500) -- ✓ Brief summary ); CREATE TABLE users ( id INT PRIMARY KEY, email VARCHAR(255) UNIQUE, -- ✓ Standard email length first_name VARCHAR(50), -- ✓ Reasonable name length last_name VARCHAR(50), country_code CHAR(2), -- ✓ Always 2 chars: "US", "CA" phone VARCHAR(20), -- ✓ International formats vary bio TEXT, -- ✓ User biography can be long password_hash CHAR(60) -- ✓ bcrypt always 60 chars ); Poor Choices: CREATE TABLE products_bad ( name CHAR(200), -- ❌ Wastes space if name is "Pen" description VARCHAR(100), -- ❌ Too short for descriptions price VARCHAR(10) -- ❌ Should be DECIMAL for money );
Pro Tip: For email addresses, VARCHAR(255) is standard because the maximum email length according to RFC 5321 is 254 characters.

Date and Time Types

MySQL provides several types for temporal data, each with specific use cases:

DATE - Date only (no time) - Format: YYYY-MM-DD - Range: 1000-01-01 to 9999-12-31 - Storage: 3 bytes - Use for: Birth dates, publication dates, deadlines DATETIME - Date and time - Format: YYYY-MM-DD HH:MM:SS - Range: 1000-01-01 00:00:00 to 9999-12-31 23:59:59 - Storage: 5-8 bytes - No timezone awareness - Use for: Event timestamps, scheduled times TIMESTAMP - Unix timestamp - Format: YYYY-MM-DD HH:MM:SS - Range: 1970-01-01 00:00:01 to 2038-01-19 03:14:07 - Storage: 4 bytes - Automatically updates on row change (if DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP) - Timezone-aware (converts to UTC) - Use for: created_at, updated_at, last_login TIME - Time only (no date) - Format: HH:MM:SS - Range: -838:59:59 to 838:59:59 - Use for: Duration, time of day, opening hours YEAR - Year only - Format: YYYY - Range: 1901 to 2155 - Storage: 1 byte - Use for: Year of manufacture, graduation year

Date/Time Best Practices

Standard Pattern: CREATE TABLE posts ( id INT PRIMARY KEY, title VARCHAR(200), content TEXT, published_date DATE, -- ✓ Date only created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, -- ✓ Auto-set on insert updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP -- ✓ Auto-update on change ); CREATE TABLE events ( id INT PRIMARY KEY, event_name VARCHAR(100), event_datetime DATETIME, -- ✓ Specific date and time duration TIME, -- ✓ Event length (e.g., "02:30:00") created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ); CREATE TABLE users ( id INT PRIMARY KEY, birth_date DATE, -- ✓ No time needed registered_at TIMESTAMP, -- ✓ Exact registration time last_login TIMESTAMP -- ✓ Track last activity ); TIMESTAMP vs DATETIME: - Use TIMESTAMP for: created_at, updated_at (auto-updating) - Use DATETIME for: Scheduled events, appointments (no auto-update) - Use DATE for: Birth dates, deadlines (no time component)
Important: TIMESTAMP has a 2038 problem (32-bit limit). For dates beyond 2038, use DATETIME. MySQL 8.0+ extends TIMESTAMP support, but DATETIME is safer for far-future dates.

ENUM and SET Types

ENUM and SET store predefined sets of values:

ENUM - Choose ONE value from a list - Storage: 1-2 bytes (stored as integer internally) - Example: ENUM('draft', 'published', 'archived') - Use for: Status fields with fixed options CREATE TABLE posts ( id INT PRIMARY KEY, title VARCHAR(200), status ENUM('draft', 'published', 'archived') DEFAULT 'draft', priority ENUM('low', 'medium', 'high') DEFAULT 'medium' ); -- Query examples SELECT * FROM posts WHERE status = 'published'; UPDATE posts SET status = 'archived' WHERE id = 1; SET - Choose MULTIPLE values from a list - Storage: 1-8 bytes depending on number of values - Example: SET('email', 'sms', 'push') - Use for: Multiple selections, permissions, flags CREATE TABLE users ( id INT PRIMARY KEY, name VARCHAR(100), notifications SET('email', 'sms', 'push', 'newsletter') DEFAULT 'email' ); INSERT INTO users VALUES (1, 'John', 'email,sms,push'); -- Query examples SELECT * FROM users WHERE FIND_IN_SET('email', notifications);
ENUM Limitations: Adding or removing values requires ALTER TABLE. For frequently changing options, consider a lookup table instead. ENUM is best for truly static lists (like status codes).

JSON Data Type (MySQL 5.7+)

MySQL's native JSON type stores and validates JSON documents:

JSON Type Benefits: - Automatic validation (rejects invalid JSON) - Efficient storage (binary format) - Fast access with JSON functions - Flexible schema for variable data Example Usage: CREATE TABLE products ( id INT PRIMARY KEY, name VARCHAR(200), attributes JSON -- Store flexible product attributes ); INSERT INTO products VALUES ( 1, 'Laptop', '{"brand": "Dell", "ram": "16GB", "storage": "512GB SSD", "color": "Silver"}' ); -- Query JSON data SELECT name, JSON_EXTRACT(attributes, '$.brand') as brand, JSON_EXTRACT(attributes, '$.ram') as ram FROM products; -- Shorter syntax (-> operator) SELECT name, attributes->'$.brand' as brand FROM products; -- Update JSON field UPDATE products SET attributes = JSON_SET(attributes, '$.color', 'Black') WHERE id = 1; When to Use JSON: ✓ Flexible attributes (product specs, user settings) ✓ Varying structure across rows ✓ API response storage ✓ Configuration data When NOT to Use JSON: ❌ Frequently queried fields (use regular columns) ❌ Data requiring complex indexes ❌ Strictly structured data (normalize instead)

Decimal vs Float vs Double

DECIMAL(M,D) - Exact precision - M = total digits, D = digits after decimal - Example: DECIMAL(10,2) stores 12345678.90 - No rounding errors - Use for: Money, prices, quantities requiring exactness FLOAT - Approximate (4 bytes) - Precision: ~7 decimal digits - Use for: Scientific measurements, non-critical decimals DOUBLE - Approximate (8 bytes) - Precision: ~15 decimal digits - Use for: Scientific calculations, coordinates For Money - ALWAYS use DECIMAL: CREATE TABLE orders ( id INT PRIMARY KEY, subtotal DECIMAL(10,2), -- ✓ Exact: $99,999,999.99 max tax DECIMAL(10,2), total DECIMAL(10,2), discount_percent DECIMAL(5,2) -- ✓ 0.00 to 100.00 ); -- Or store as integers (cents) CREATE TABLE orders_alt ( id INT PRIMARY KEY, total_cents BIGINT -- ✓ $21,474,836.47 max (if using INT) );

Practice Exercise:

Fix the data type issues in this table:

CREATE TABLE employees_bad ( id TINYINT PRIMARY KEY, name VARCHAR(10), email TEXT, age VARCHAR(5), salary FLOAT, hire_date VARCHAR(50), is_active VARCHAR(10), department VARCHAR(255) );

Improved Version:

CREATE TABLE employees ( id INT UNSIGNED AUTO_INCREMENT PRIMARY KEY, -- ✓ Support more than 127 employees name VARCHAR(100), -- ✓ Longer names email VARCHAR(255) UNIQUE, -- ✓ Standard email length age TINYINT UNSIGNED, -- ✓ Numeric type, 0-255 salary DECIMAL(10,2), -- ✓ Exact precision for money hire_date DATE, -- ✓ Proper date type is_active BOOLEAN DEFAULT TRUE, -- ✓ TINYINT(1) for true/false department ENUM('HR','IT','Sales','Finance'), -- ✓ Fixed list of departments created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP, updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP );

Storage Optimization Tips

1. Use UNSIGNED for non-negative values INT UNSIGNED stores 0 to 4.2 billion (vs -2.1B to 2.1B) 2. Avoid VARCHAR(255) everywhere Analyze actual data length and choose appropriately 3. Use ENUM for fixed lists ENUM('yes','no') uses 1 byte vs VARCHAR(3) 4. Store booleans as TINYINT(1) or BOOLEAN More efficient than VARCHAR('true'/'false') 5. Use appropriate date types DATE (3 bytes) vs DATETIME (8 bytes) vs VARCHAR(20) 6. Consider CHAR for fixed-length data CHAR(2) for country codes, CHAR(32) for MD5 hashes

Summary

In this lesson, you learned:

  • Choose the smallest data type that reliably holds your data
  • INT for most IDs, BIGINT for high-volume or timestamps in milliseconds
  • VARCHAR for variable text, CHAR for fixed-length, TEXT for large content
  • TIMESTAMP for created_at/updated_at, DATETIME for events, DATE for birth dates
  • ENUM for fixed lists, JSON for flexible attributes
  • DECIMAL for money (never FLOAT), store cents as BIGINT alternative
  • Use UNSIGNED for non-negative values to double positive range
Next Up: In the next lesson, we'll explore database design best practices and complete patterns for real-world applications!

ES
Edrees Salih
11 hours ago

We are still cooking the magic in the way!