Designing the Data Layer from the ERD
Designing the Data Layer from the ERD
By the time a project reaches design, you have an approved Entity-Relationship Diagram (ERD) that captures what data the business needs and how the entities relate. That ERD is a logical model — it describes the world in business terms, free of any database technology. The next step is to translate it into a physical schema: concrete tables, typed columns, indexes, and integrity constraints that a real database engine can enforce. This translation is called logical-to-physical mapping, and it is one of the most consequential decisions a systems analyst makes. Done well, the database supports the application for years. Done carelessly, it forces painful migrations, silently loses data, or grinds queries to a halt under load.
Step 1 — Map Each Entity to a Table
The starting rule is straightforward: every strong entity in the ERD becomes a table, and every attribute becomes a column. But "straightforward" does not mean "thoughtless." Each decision deserves deliberate consideration:
- Naming convention: table names should be consistent — plural snake_case (
clinic_appointments) or singular Pascal-case, but never mixed. Pick one and enforce it across the schema. - Primary key: every table needs a primary key. If the natural key (e.g., a national ID or email address) is stable and compact, it can serve as the PK. More often, a surrogate key — an auto-incrementing integer or a UUID — is chosen because natural keys can change and real-world identifiers are sometimes duplicated.
- Weak entities: a weak entity (one that cannot be identified without its owner) maps to a table whose primary key is a composite of its own partial key plus the FK pointing to the owner table.
Step 2 — Choose Column Data Types Precisely
Every ERD attribute must be assigned a concrete data type. This choice affects storage efficiency, query performance, and data integrity. Below are the most common mapping decisions analysts encounter:
- Short text identifiers (names, codes, status flags) →
VARCHAR(n)with a realistic upper bound — do not default toVARCHAR(255)everywhere; a country code is 2 characters, not 255. - Long text (notes, descriptions, addresses) →
TEXTorNVARCHAR(MAX). - Whole numbers (quantities, counts, age) →
INTorSMALLINT; useBIGINTonly when the range demands it. - Money and rates →
DECIMAL(p, s), neverFLOATorDOUBLE; floating-point arithmetic introduces rounding errors in financial calculations. - Dates and times →
DATE,TIME,DATETIME, orTIMESTAMP— choose based on what you actually store. An appointment date does not need a time component; a log entry does. Store all timestamps in UTC. - Boolean flags →
BOOLEAN(orTINYINT(1)in MySQL) — avoid storing'Y'/'N'strings; they are invisible to query optimizers. - Enumerated values (status, category) → an
ENUMtype or a lookup/reference table. Reference tables are more maintainable when the list changes over time.
Step 3 — Translate Relationships into Foreign Keys
Every relationship line on the ERD becomes one or more foreign-key columns in the physical schema. The cardinality determines where the FK lives:
- One-to-Many (1:N): the FK goes on the "many" side. In a clinic system, one
doctorhas manyappointments, soappointments.doctor_idreferencesdoctors.id. - One-to-One (1:1): the FK can go on either side; place it on the table that is optional or more frequently queried alone.
- Many-to-Many (M:N): create a junction table (also called a bridge or association table) whose PK is the composite of the two FKs. A course-enrollment system has
studentsandcoursesin an M:N relationship — the junction tableenrollmentsholdsstudent_idandcourse_id, plus any attributes of the enrollment (enrollment date, grade).
Step 4 — Define Constraints
Data integrity cannot be enforced by application code alone — users access databases through multiple paths (reporting tools, admin scripts, APIs). The database itself must be the last line of defence. The principal constraint types to specify in your design document are:
- NOT NULL: every column that must always have a value — do not leave nullability to chance. In the clinic schema,
appt_datecan never be null;notescan. - UNIQUE: a single-column or multi-column uniqueness constraint — a patient's phone number must be unique; an invoice line must be unique by
(invoice_id, line_number). - CHECK: inline value validation —
CHECK (price >= 0),CHECK (end_date >= start_date). Always specify these; they prevent impossible states from reaching the database. - FOREIGN KEY with referential action: define
ON DELETEandON UPDATEbehaviour explicitly. Options areRESTRICT,CASCADE,SET NULL, andNO ACTION. For the clinic, deleting a doctor should beRESTRICTif appointments exist — you do not want orphaned records silently created. - DEFAULT values:
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMPremoves the burden from the application and makes audit trailing automatic.
ON DELETE CASCADE is dangerous unless you genuinely want child rows destroyed when the parent is deleted. In an e-commerce system, deleting a product with CASCADE would silently erase all order lines referencing that product — destroying historical sales records. Use RESTRICT by default and justify every CASCADE in writing.
Step 5 — Design Indexes for Performance
A primary-key index is created automatically by every major database engine. But for any non-trivial system, you need to design additional indexes based on the queries the application will run. The principle is: index the columns you filter by and join on, not every column.
- Foreign-key columns: index every FK column. Without an index on
appointments.doctor_id, a query for "all appointments for Dr. Hana" does a full table scan every time. - Frequently filtered columns: if users search appointments by status or date range, those columns need indexes.
- Composite indexes: when queries always filter on two columns together (e.g.,
patient_id + appt_date), a composite index is faster than two separate indexes. - Unique indexes: any
UNIQUEconstraint is implemented as a unique index — declare both together.
Step 6 — Handle Derived Attributes and Multi-Valued Attributes
ERDs sometimes include attributes that do not map directly to a single column:
- Derived attributes (e.g., age, derived from date of birth): do not store them as columns. Store the source (
dob) and let the application or a database view compute the derived value. Storing derived data creates consistency problems as soon as the source changes. - Multi-valued attributes (e.g., a patient can have multiple phone numbers): do not stuff them into a single comma-delimited column. Create a child table —
patient_phones(id, patient_id, phone_number, phone_type)— with a FK back to the parent. This preserves normalization and makes each phone number independently queryable.
Documenting the Physical Schema
The output of this step is not code — it is a data layer specification that becomes a section of the Design Specification Document. For each table, document: table name and purpose, column names with types and nullability, primary key, all foreign keys and their referential actions, all check constraints, all non-PK indexes, and any relevant notes on default values or triggers. This document is reviewed by the DBA, the developers, and the lead analyst before a single line of SQL is written.
In a logistics company that manages shipments, this discipline prevents the silent bugs that arise when a developer assumes weight is an integer while the requirements say it can be fractional, or assumes status is unconstrained while the business has exactly five valid values. The physical schema specification closes those gaps before they become runtime failures.