Data Modeling & ERDs

Project: A Complete Data Model

18 min Lesson 10 of 10

Project: A Complete Data Model

Throughout this tutorial you have studied each component of data modeling in isolation: entities and attributes, cardinality notation, normalization rules, junction entities, keys, and the data dictionary. In this final lesson you bring all of those skills together on a single, realistic system — a clinic appointment booking system — and produce the three deliverables a professional analyst hands to a development team: a fully drawn ERD, a normalization audit, and a data dictionary.

Working through a complete example end-to-end is where the real learning happens. Isolated rules feel abstract; applying them to one coherent system reveals how every design decision affects every other one.

Step 1: Understand the Business Domain

Before drawing a single box, read the narrative carefully. Here is the scenario:

Scenario — City Clinic Booking System: The clinic employs several doctors, each with one or more specialties. Patients register once and can book many appointments. Each appointment is with exactly one doctor on a specific date and time. After the appointment the doctor records a diagnosis and can prescribe one or more medications. The clinic needs to track which medications were prescribed at each visit, along with dosage and duration. Receptionists log cancellations and rescheduling with a reason.

From this paragraph a skilled analyst extracts the key nouns (candidate entities), the key verbs (candidate relationships), and the descriptive words (candidate attributes). The shortlist is: Patient, Doctor, Specialty, Appointment, Diagnosis, Medication, Prescription.

Step 2: Identify Relationships and Cardinality

Work through every entity pair and ask: "Can one instance of A be related to many instances of B, and vice versa?" The results:

  • Patient — Appointment: One patient, many appointments (1:M).
  • Doctor — Appointment: One doctor, many appointments (1:M).
  • Appointment — Diagnosis: One appointment produces at most one diagnosis (1:1 or 1:0..1).
  • Diagnosis — Medication: Many-to-many — a diagnosis can involve multiple medications, and a medication appears in many diagnoses. Resolve via Prescription junction.
  • Doctor — Specialty: A doctor can have multiple specialties; a specialty is shared by many doctors. Resolve via DoctorSpecialty junction.

Step 3: The Complete ERD

The diagram below uses crow-foot notation. Each entity box lists its primary key (PK), foreign keys (FK), and main attributes. Relationship lines carry optionality markers (circle = optional, dash = mandatory) and crow-feet for "many".

Complete ERD for the Clinic Booking System Patient PK patientID fullName dateOfBirth phone email registeredAt Doctor PK doctorID fullName licenseNo phone hiredAt DoctorSpecialty FK doctorID FK specialtyID certifiedYear Specialty PK specialtyID specialtyName Appointment PK appointmentID FK patientID FK doctorID scheduledAt status cancelReason createdAt Diagnosis PK diagnosisID FK appointmentID icdCode notes Prescription PK prescriptionID FK diagnosisID FK medicationID dosage / days Medication PK medicationID brandName genericName formulation 1 M 1 M 0..1 1 M M 1 1 : M M : 1
Complete crow-foot ERD for the Clinic Booking System — 7 entities, all relationships resolved.
Reading the diagram: Follow any path from Patient rightward through Appointment down to Diagnosis and on to Prescription / Medication. That chain represents the full lifecycle of a patient visit — from booking to treatment.

Step 4: Normalization Audit

Before signing off on a model, always verify that each entity satisfies at least Third Normal Form (3NF). Run through the checklist:

  1. 1NF: Every cell holds one atomic value. Check that Appointment.status is a single value (not "confirmed, paid") and that there are no repeating groups of medications inside the Appointment row — and there are not, because we placed medications in their own Prescription junction. ✓
  2. 2NF: Every non-key attribute depends on the whole primary key. Prescription has a composite candidate key (diagnosisID, medicationID). The attributes dosage and days describe this specific prescription event — not the medication alone and not the diagnosis alone — so 2NF is satisfied. ✓
  3. 3NF: No transitive dependencies. In the original naive sketch, some analysts put doctorSpecialty directly on the Doctor table as a text field. That would introduce a transitive dependency when multiple specialties are stored. Moving it to DoctorSpecialty and Specialty eliminates the transitive path. ✓
Common pitfall: Storing the patient's age as a column instead of dateOfBirth violates the principle that derived data must not be persisted. Age changes every year; dateOfBirth never does. Always store the raw fact, not the calculated result.

Step 5: The Data Dictionary (excerpt)

The ERD shows structure; the data dictionary defines meaning. Below is a representative excerpt covering the Appointment entity — the most operationally significant table in this system.

Table: Appointment Primary Key: appointmentID Column | Type | Nullable | Constraints | Description ---------------|---------------|----------|----------------------|---------------------------------------------- appointmentID | INT UNSIGNED | No | PK, AUTO_INCREMENT | Surrogate key, system-generated. patientID | INT UNSIGNED | No | FK → Patient(PK) | Links to the registered patient. doctorID | INT UNSIGNED | No | FK → Doctor(PK) | Links to the attending doctor. scheduledAt | DATETIME | No | — | Date and time of the slot (UTC stored). status | ENUM | No | DEFAULT 'pending' | Values: pending | confirmed | completed | cancelled. cancelReason | VARCHAR(500) | Yes | NULL unless status | Receptionist note. Required when status = cancelled. | | | = cancelled | createdAt | DATETIME | No | DEFAULT NOW() | Row insertion timestamp.
Why the data dictionary matters: Two developers reading only the ERD might implement scheduledAt in local time on one server and UTC on another. The data dictionary prevents this ambiguity by specifying storage rules and valid value sets alongside the structural diagram.

Step 6: Traceability Back to Requirements

Every entity in this model can be traced to at least one business requirement from the scenario. This is the final quality check:

  • Patient — "Patients register once and can book many appointments."
  • Doctor / DoctorSpecialty / Specialty — "Each doctor has one or more specialties."
  • Appointment — "Each appointment is with exactly one doctor on a specific date and time."
  • Diagnosis — "The doctor records a diagnosis."
  • Prescription / Medication — "Can prescribe one or more medications … dosage and duration."
  • cancelReason on Appointment — "Receptionists log cancellations … with a reason."

If any entity or attribute cannot be traced to a requirement, it is either missing a requirement or it is speculative scope creep. Remove or flag it for stakeholder review.

Professional habit: Maintain a simple traceability matrix — a table mapping each requirement ID to the entity/attribute that satisfies it. This becomes invaluable during design reviews and change requests.

Delivering the Model

A complete data modeling deliverable for a development team consists of three documents: the conceptual ERD (business-level, no keys), the logical ERD (normalized, with keys and cardinality — this lesson's diagram), and the data dictionary (column-level specification). Together they give architects enough to design a physical schema and enough semantic depth to write meaningful validation rules, reports, and integration contracts.

Congratulations — you have now applied every concept from this tutorial to a single, coherent, production-grade data model. The same workflow — narrative extraction, entity identification, relationship analysis, ERD drawing, normalization audit, data dictionary — scales from a five-table system to an enterprise schema of five hundred tables.

Tutorial Complete!

Congratulations! You have completed all lessons in this tutorial.