The Data Dictionary
The Data Dictionary
An ERD is a picture. It shows structure — which entities exist, how they relate, and where foreign keys live. But a picture cannot tell you that status in the appointments table accepts only the values pending, confirmed, completed, and cancelled. It cannot tell you that email must be unique across the entire system, or that discount_rate is a decimal between 0.00 and 1.00, or that notes is optional while patient_id is mandatory. That information lives in the data dictionary.
A data dictionary is a structured catalog that documents every entity, every attribute, and every constraint in a data model. It is the written companion to the ERD — together they form a complete specification of the system's data. No developer, DBA, or future analyst should ever need to guess what a field means or what values it accepts.
What a Data Dictionary Contains
At minimum, a professional data dictionary entry covers the following for each attribute:
- Attribute name — the exact name as it will appear in the database (snake_case by convention)
- Description — a plain-language explanation of what the attribute means in business terms
- Data type — the logical type: String, Integer, Decimal, Date, Boolean, Enum, etc. (not yet platform-specific)
- Length / precision — maximum characters for strings; digits and decimal places for numbers
- Required — whether the attribute is mandatory (NOT NULL) or optional (NULLable)
- Unique — whether duplicate values are forbidden
- Default value — the value assigned when none is provided
- Allowed values / domain — for Enums and constrained fields: the exhaustive list of valid values
- Key role — PK, FK, CK (candidate key), or none
- Foreign key reference — for FKs: which entity and attribute are referenced
- Business rule — any additional constraint that cannot be expressed by the type alone (e.g., "must be a future date", "must be greater than zero")
- Example values — one or two representative samples that clarify ambiguous descriptions
A Complete Example: Clinic Booking System
Consider the appointments entity from a clinic booking system. The ERD shows the entity and its relationships. The data dictionary tells you exactly what each attribute means, what type it holds, and what rules constrain it.
Documenting Enum Domains
Enumerated attributes deserve special attention. Wherever a field accepts a fixed set of values, the data dictionary must list every allowed value and explain what each means. Vague entries like "status (Enum)" without listing the values leave the developer guessing — and guessing produces inconsistent data.
For the status attribute of an online store's orders entity, a proper domain entry reads:
Notice how each value carries a business definition, not just a label. This is the difference between a data dictionary that guides implementation and one that merely lists names.
Documenting Relationships in the Dictionary
The data dictionary does not replace the ERD for relationships, but it should include a relationships section for each entity that summarizes the cardinality rules in plain language. This makes the model readable without needing to decode crow-foot notation.
How to Write Descriptions That Actually Help
The most neglected part of a data dictionary is the description column. Analysts sometimes fill it with obvious restatements: "customer_id — the ID of the customer." This adds no value. A useful description answers questions that the name alone cannot:
- What does it mean in business terms? — "The unique identifier assigned by the CRM system when a customer account is created. Not the same as the loyalty card number."
- When is it set? — "Populated by the booking engine at the moment of confirmation, not at the moment of request."
- What edge cases exist? — "May be null for guest checkouts. Will always be non-null for registered customers."
- What is the unit? — "Stored in whole minutes, not hours or seconds."
reason is sometimes used by doctors to pre-prepare equipment, or the warehouse manager who knows that processing means the pick list has been printed. Interview before you document.
Entity-Level Documentation
Before listing attributes, each entity in the data dictionary should have a short entity header that includes:
- Entity name and corresponding table name
- Business definition — one or two sentences explaining what this entity represents in the real world
- Volume estimate — approximate number of rows today and in three years (helps the DBA plan indexes and partitioning)
- Primary key strategy — surrogate key (auto-increment integer), natural key, or composite key, and why
- Owning system — if this is a multi-system landscape: which system is the master record for this entity
- Retention policy — how long records are kept before archiving or deletion (important for GDPR compliance)
Maintaining the Data Dictionary
A data dictionary that is created once and never updated becomes misleading — worse than having none, because it creates false confidence. The dictionary is a living document. Practices that keep it current:
- Treat dictionary updates as part of the definition of done for any story that changes the data model
- Store it in version control alongside the ERD so that changes are trackable and reversible
- Link it to the requirements specification — every attribute should trace back to at least one requirement
- Schedule a review at the end of each phase to catch drift between the dictionary and the actual database
The Data Dictionary and the ERD Together
Neither the ERD nor the data dictionary alone is sufficient. The ERD without the dictionary is a skeleton — you can see the structure but not the content. The dictionary without the ERD is a table of facts with no sense of how they connect. Together they form a complete logical specification that a developer can implement without ambiguity, a tester can use to verify that constraints are enforced, and a future analyst can use to understand the system years after the original team has moved on.
When you hand over a data model at the end of the analysis phase, you are handing over both artifacts. The ERD goes on the wall. The data dictionary goes in the specification document. Both get reviewed and signed off by stakeholders.
Summary
- A data dictionary documents every entity and attribute: name, type, length, required, unique, default, allowed values, key role, foreign key reference, business rules, and examples.
- Each attribute description should answer what the field means, when it is set, what edge cases exist, and what the unit is.
- Every entity needs a header: business definition, volume estimate, PK strategy, owning system, and retention policy.
- Enum attributes must list every allowed value with a plain-language definition for each.
- The relationships section documents cardinality, FK location, and referential integrity rules (Restrict / Cascade / Set Null) in plain language.
- The data dictionary is a living document: update it with every schema change, version-control it, and link it to requirements.
- The ERD and data dictionary are complementary — neither is sufficient alone. Together they form a complete, unambiguous logical data specification.