Data Modeling & ERDs

The Data Dictionary

18 min Lesson 9 of 10

The Data Dictionary

An ERD is a picture. It shows structure — which entities exist, how they relate, and where foreign keys live. But a picture cannot tell you that status in the appointments table accepts only the values pending, confirmed, completed, and cancelled. It cannot tell you that email must be unique across the entire system, or that discount_rate is a decimal between 0.00 and 1.00, or that notes is optional while patient_id is mandatory. That information lives in the data dictionary.

A data dictionary is a structured catalog that documents every entity, every attribute, and every constraint in a data model. It is the written companion to the ERD — together they form a complete specification of the system's data. No developer, DBA, or future analyst should ever need to guess what a field means or what values it accepts.

What a Data Dictionary Contains

At minimum, a professional data dictionary entry covers the following for each attribute:

  • Attribute name — the exact name as it will appear in the database (snake_case by convention)
  • Description — a plain-language explanation of what the attribute means in business terms
  • Data type — the logical type: String, Integer, Decimal, Date, Boolean, Enum, etc. (not yet platform-specific)
  • Length / precision — maximum characters for strings; digits and decimal places for numbers
  • Required — whether the attribute is mandatory (NOT NULL) or optional (NULLable)
  • Unique — whether duplicate values are forbidden
  • Default value — the value assigned when none is provided
  • Allowed values / domain — for Enums and constrained fields: the exhaustive list of valid values
  • Key role — PK, FK, CK (candidate key), or none
  • Foreign key reference — for FKs: which entity and attribute are referenced
  • Business rule — any additional constraint that cannot be expressed by the type alone (e.g., "must be a future date", "must be greater than zero")
  • Example values — one or two representative samples that clarify ambiguous descriptions
Logical, not physical: A data dictionary at the logical modeling stage uses technology-neutral types — String, Integer, Date — not MySQL VARCHAR(255) or Oracle NUMBER(10,2). Platform-specific types appear only in the physical model. Keeping the dictionary logical preserves flexibility to target multiple platforms.

A Complete Example: Clinic Booking System

Consider the appointments entity from a clinic booking system. The ERD shows the entity and its relationships. The data dictionary tells you exactly what each attribute means, what type it holds, and what rules constrain it.

Data Dictionary — Appointments Entity Data Dictionary — Entity: Appointment Attribute Type Required Unique Key Description / Constraint appointment_id Integer Yes Yes PK System-generated unique identifier. Auto-increment. patient_id Integer Yes No FK References Patient(patient_id). Cannot be null. doctor_id Integer Yes No FK References Doctor(doctor_id). Cannot be null. appointment_date Date Yes No Calendar date of the appointment. Must be today or future. start_time Time Yes No Scheduled start time (24-hr). Within clinic opening hours. duration_minutes Integer Yes No Length in minutes. Default: 30. Range: 5–180. status Enum Yes No Allowed: pending | confirmed | completed | cancelled. Default: pending. reason String(500) No No Patient-provided reason for visit. Optional, max 500 chars. booked_at DateTime Yes No Timestamp when booking was created. System-set, read-only. cancelled_at DateTime No No Timestamp of cancellation. NULL when status is not cancelled. Set automatically on cancellation. cancellation_reason String(300) No No Free-text reason for cancellation. NULL if not cancelled.
A full data dictionary entry for the Appointment entity — each attribute documented with its type, constraints, key role, and business rules.

Documenting Enum Domains

Enumerated attributes deserve special attention. Wherever a field accepts a fixed set of values, the data dictionary must list every allowed value and explain what each means. Vague entries like "status (Enum)" without listing the values leave the developer guessing — and guessing produces inconsistent data.

For the status attribute of an online store's orders entity, a proper domain entry reads:

Attribute: status Type: Enum Allowed values: - pending : Order placed but not yet confirmed by the warehouse - processing : Payment verified; warehouse has started picking items - shipped : Package handed to carrier; tracking number issued - delivered : Carrier confirmed delivery to the customer address - cancelled : Order voided before shipping; refund triggered - returned : Customer returned goods after delivery; refund pending Default: pending Business rule: Once status reaches "delivered", it cannot revert to any prior state.

Notice how each value carries a business definition, not just a label. This is the difference between a data dictionary that guides implementation and one that merely lists names.

Documenting Relationships in the Dictionary

The data dictionary does not replace the ERD for relationships, but it should include a relationships section for each entity that summarizes the cardinality rules in plain language. This makes the model readable without needing to decode crow-foot notation.

Relationships Section of a Data Dictionary Relationships — Entity: Order (Online Store) Related Entity Cardinality FK in On Delete Business Description Customer Many Orders → one Customer Order Restrict Each order belongs to exactly one customer. A customer can have many orders. OrderLine One Order → many OrderLines OrderLine Cascade An order contains one or more line items. Deleting an order deletes its lines. Address Many Orders → one Address Order Set Null Shipping address at time of order. Address deletion nullifies reference (history preserved). Coupon Many Orders → zero/one Coupon Order Set Null Optional discount coupon applied to the order. Null when no coupon used. Payment One Order → zero/one Payment Payment Restrict Payment record created when order is paid. Null until payment attempt is made. On Delete options: Restrict — prevent deletion of parent if children exist Set Null — null the FK; child record survives Cascade — delete child records automatically
The relationships section of a data dictionary entry documents cardinality, foreign key location, referential integrity rules, and a plain-language description for each link.

How to Write Descriptions That Actually Help

The most neglected part of a data dictionary is the description column. Analysts sometimes fill it with obvious restatements: "customer_id — the ID of the customer." This adds no value. A useful description answers questions that the name alone cannot:

  • What does it mean in business terms? — "The unique identifier assigned by the CRM system when a customer account is created. Not the same as the loyalty card number."
  • When is it set? — "Populated by the booking engine at the moment of confirmation, not at the moment of request."
  • What edge cases exist? — "May be null for guest checkouts. Will always be non-null for registered customers."
  • What is the unit? — "Stored in whole minutes, not hours or seconds."
Interview source: The best descriptions come from the people who use the data daily — the clinic receptionist who knows that reason is sometimes used by doctors to pre-prepare equipment, or the warehouse manager who knows that processing means the pick list has been printed. Interview before you document.

Entity-Level Documentation

Before listing attributes, each entity in the data dictionary should have a short entity header that includes:

  • Entity name and corresponding table name
  • Business definition — one or two sentences explaining what this entity represents in the real world
  • Volume estimate — approximate number of rows today and in three years (helps the DBA plan indexes and partitioning)
  • Primary key strategy — surrogate key (auto-increment integer), natural key, or composite key, and why
  • Owning system — if this is a multi-system landscape: which system is the master record for this entity
  • Retention policy — how long records are kept before archiving or deletion (important for GDPR compliance)
Common omission: Analysts often document attributes thoroughly but skip the entity header. Six months later, a new developer asks why the table uses a surrogate key instead of the national ID number, or why rows are never deleted. Without the entity header, nobody knows — that decision is lost institutional knowledge. Always write the header.

Maintaining the Data Dictionary

A data dictionary that is created once and never updated becomes misleading — worse than having none, because it creates false confidence. The dictionary is a living document. Practices that keep it current:

  • Treat dictionary updates as part of the definition of done for any story that changes the data model
  • Store it in version control alongside the ERD so that changes are trackable and reversible
  • Link it to the requirements specification — every attribute should trace back to at least one requirement
  • Schedule a review at the end of each phase to catch drift between the dictionary and the actual database

The Data Dictionary and the ERD Together

Neither the ERD nor the data dictionary alone is sufficient. The ERD without the dictionary is a skeleton — you can see the structure but not the content. The dictionary without the ERD is a table of facts with no sense of how they connect. Together they form a complete logical specification that a developer can implement without ambiguity, a tester can use to verify that constraints are enforced, and a future analyst can use to understand the system years after the original team has moved on.

When you hand over a data model at the end of the analysis phase, you are handing over both artifacts. The ERD goes on the wall. The data dictionary goes in the specification document. Both get reviewed and signed off by stakeholders.

Summary

  • A data dictionary documents every entity and attribute: name, type, length, required, unique, default, allowed values, key role, foreign key reference, business rules, and examples.
  • Each attribute description should answer what the field means, when it is set, what edge cases exist, and what the unit is.
  • Every entity needs a header: business definition, volume estimate, PK strategy, owning system, and retention policy.
  • Enum attributes must list every allowed value with a plain-language definition for each.
  • The relationships section documents cardinality, FK location, and referential integrity rules (Restrict / Cascade / Set Null) in plain language.
  • The data dictionary is a living document: update it with every schema change, version-control it, and link it to requirements.
  • The ERD and data dictionary are complementary — neither is sufficient alone. Together they form a complete, unambiguous logical data specification.