Requirements Gathering Techniques

Document & Artifact Analysis

18 min Lesson 5 of 10

Before you schedule a single interview or send a survey, an enormous body of requirements evidence already exists — hiding in plain sight inside the organisation you are analysing. Every form a receptionist fills in, every report a manager prints on Monday morning, every screen a warehouse picker taps during a shift, every policy document approved by the board — each one is a frozen snapshot of what the business currently does, what it considers important, and where the pain points lie. Document and artifact analysis is the discipline of mining that evidence systematically before you speak to a single person.

This technique is especially valuable at the start of a project, when stakeholders are busy, when the domain is unfamiliar, or when you suspect that what people say they do differs from what they actually do. Documents do not have opinions, do not forget details, and cannot be embarrassed. They are the closest thing to an objective record of the current system.

Core principle: Existing documents and artifacts encode implicit requirements. Your job is to decode them — extracting the data fields, business rules, volumes, and pain signals that stakeholders take for granted and would never think to tell you.

What Counts as a Document or Artifact?

The category is broader than most analysts initially assume. Think in four groups:

  • Operational forms and templates — paper or digital forms that capture transactions. A clinic's patient intake form, a logistics firm's shipment manifest, an online store's return authorisation slip. These reveal every data field the business currently collects and (often) which fields staff actually fill in versus which are left blank.
  • Reports and dashboards — the outputs stakeholders consume. A warehouse manager's daily picking report, a finance director's monthly reconciliation PDF, a sales dashboard exported to Excel every Friday. Reports tell you which data aggregations matter, what KPIs drive decisions, and what information the current system fails to produce (often revealed by manual additions in the margins of a printed report).
  • Existing system interfaces — screens, menus, and error messages from legacy applications, ERPs, and CRMs. A screenshot of the current booking system tells you the data model in use. A list of validation error messages tells you the business rules already encoded in software.
  • Policy, procedure, and compliance documents — standard operating procedures (SOPs), regulatory filings, training manuals. These capture business rules and constraints that may not change in the new system, so they must flow forward as non-negotiable requirements.

The Analysis Process

Raw documents do not speak for themselves — you must interrogate them. A reliable five-step process follows:

  1. Inventory and classify. List every document type you can obtain. Note its format (paper, PDF, spreadsheet, screen), its owner (who produces it), its consumer (who reads it), and its frequency (daily, monthly, ad hoc). This inventory becomes your artifact register — a living log you update throughout the project.
  2. Extract data fields. For each form or report, list every distinct data element it captures or displays. A clinic intake form might yield: patient ID, full name, date of birth, national ID, insurance number, referring physician, chief complaint, allergy list, consent signature, and date/time of intake. These fields are candidate data requirements for the new system.
  3. Surface business rules. Every validation, every mandatory field, every dropdown list, every calculation in a report encodes a business rule. "Insurance number is mandatory only if the patient selects insurance as the payment method" is a conditional business rule hiding inside a form. "Total due = line items + 15% VAT − applicable discount" is a calculation rule hiding inside an invoice template.
  4. Identify volumes and frequencies. Count the instances. How many forms are submitted per day? How many line items does the average invoice contain? How many concurrent users does the report imply? A logistics firm processing 4,000 shipment manifests per day has very different non-functional requirements from one processing 40.
  5. Flag gaps, anomalies, and workarounds. This is where the richest insights live. Look for: fields that are always crossed out or overwritten, columns added in handwriting to a printed form, Excel macros that pre-process data before it enters a system, Post-it notes stuck to a monitor explaining a known bug, email threads cc'd to a distribution list as a substitute for a missing notification feature.
[Figure: Document Analysis, Five-Step Process. (1) Inventory: list and classify all artifacts. (2) Extract: data fields per document. (3) Rules: surface business rules and logic. (4) Volumes: count instances, rates, sizes. (5) Flag gaps and workarounds: overwritten fields, handwritten additions, Excel macros, sticky notes; each is a hidden pain signal or missing feature. Outputs: Artifact Register, Candidate Data Requirements, Business Rules Catalogue, Volume/Frequency Profile, Gap & Workaround Log.]
The five-step document analysis process, from artifact inventory through to the deliverables that feed your requirements specification.
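
The two rules quoted in step 3 can be written down as executable checks, which makes their ambiguities visible early. The sketch below is illustrative only: the `IntakeRecord` fields, the payment-method values, and the reading of "15% VAT" as 15% of the line-item subtotal (applied before the discount) are assumptions that would need confirming with stakeholders.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional

VAT_RATE = Decimal("0.15")  # 15% VAT, as in the invoice rule quoted in step 3


@dataclass
class IntakeRecord:
    # Hypothetical subset of intake-form fields, for illustration only.
    patient_id: str
    payment_method: str  # assumed values: "insurance", "cash", ...
    insurance_number: Optional[str] = None


def validate_intake(record: IntakeRecord) -> list[str]:
    """Return rule violations; an empty list means the record passes."""
    errors = []
    # Conditional rule: insurance number is mandatory only when
    # the patient pays by insurance.
    if record.payment_method == "insurance" and not record.insurance_number:
        errors.append(
            "insurance_number is required when payment_method is 'insurance'"
        )
    return errors


def total_due(line_items: list[Decimal], discount: Decimal = Decimal("0")) -> Decimal:
    """Total due = line items + 15% VAT - applicable discount.

    Assumption: VAT is charged on the undiscounted subtotal.
    """
    subtotal = sum(line_items, Decimal("0"))
    vat = subtotal * VAT_RATE
    return subtotal + vat - discount
```

Writing the calculation out forces a question the invoice template does not answer: is the discount applied before or after VAT? That question belongs in the business rules catalogue as an open item.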

A Worked Example: Clinic Booking System

A regional private clinic is replacing its paper-and-spreadsheet booking process with a digital system. Before speaking to anyone, the analyst collects and analyses three artifacts:

  • Patient intake form (paper, A4): 22 fields captured. Notable findings: "Referring physician" is pre-printed but 60% of entries are blank — indicating most patients self-refer; this field may be optional in the new system. The "Allergies" section has two lines of space but nurses regularly staple an extra page — the new system needs an unbounded allergy list.
  • Weekly appointment report (Excel, emailed every Monday): Six calculated columns including Utilisation % (booked slots / available slots × 100) and No-show Rate %. The formula is hard-coded per department — meaning each department has different definitions of "available slots". This is a business rule the new system must replicate per department, not globally.
  • Legacy booking screen (screenshot): The appointment type dropdown contains 47 entries, 11 of which begin with "DEPRECATED". Validation rules allow double-booking if the appointment type is marked as "Teleconsult". These are implicit rules — the new system must make them explicit and configurable.

In two hours of document analysis, the analyst identified 22 data fields, 4 business rules, 2 non-functional volume signals, and 3 workarounds — all before a single interview.
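
The per-department finding from the weekly appointment report can be sketched as a lookup of slot-counting rules, one per department. This is a minimal illustration, not the clinic's actual logic: the department names and the two definitions of "available slots" are hypothetical stand-ins for whatever each department's spreadsheet formula really does.

```python
from typing import Callable


# Hypothetical definitions of "available slots". The real ones would be
# read off each department's hard-coded spreadsheet formula.
def all_slots(total_slots: int, blocked_slots: int) -> int:
    """Every slot in the schedule counts as available."""
    return total_slots


def unblocked_slots(total_slots: int, blocked_slots: int) -> int:
    """Slots blocked for leave or maintenance are excluded."""
    return total_slots - blocked_slots


# Department names are invented for the example.
AVAILABLE_SLOT_RULES: dict[str, Callable[[int, int], int]] = {
    "Radiology": all_slots,
    "Physiotherapy": unblocked_slots,
}


def utilisation_pct(
    department: str, booked: int, total_slots: int, blocked_slots: int
) -> float:
    """Utilisation % = booked slots / available slots x 100, per department."""
    available = AVAILABLE_SLOT_RULES[department](total_slots, blocked_slots)
    if available == 0:
        return 0.0  # avoid division by zero when nothing was available
    return booked / available * 100
```

With identical inputs (30 booked, 40 total, 10 blocked) the two hypothetical departments report 75% and 100% utilisation, which is why the rule must be configurable per department rather than global.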

Analyst tip: Always ask for a sample of completed documents, not blank templates. A blank form shows you the intended fields. A completed form (or a batch of 50) shows you which fields are actually used, which are consistently blank, and where staff have added annotations that the system did not anticipate.
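
Once a batch of completed forms has been transcribed, the "which fields are actually used" question becomes a simple count. The sketch below assumes each form has been keyed in as a dictionary of field name to value; the function name and that representation are illustrative choices, not part of any standard tool.

```python
from collections import Counter


def field_fill_rates(forms: list[dict]) -> dict[str, float]:
    """Fraction of forms in which each field holds a non-blank value.

    A field missing from a form is treated the same as a blank one.
    """
    if not forms:
        return {}
    filled = Counter()
    fields = set()
    for form in forms:
        for name, value in form.items():
            fields.add(name)
            if value is not None and str(value).strip():
                filled[name] += 1
    return {name: filled[name] / len(forms) for name in sorted(fields)}


def rarely_used(rates: dict[str, float], threshold: float = 0.5) -> list[str]:
    """Fields filled in less often than the (assumed) threshold."""
    return [name for name, rate in rates.items() if rate < threshold]
```

A low fill rate does not prove a field is unnecessary. It only marks the field as one to challenge in interviews.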

Reading Reports as Requirement Proxies

Reports deserve special attention because they are specifications of information requirements in disguise. Every column in a report is a data element the system must store or compute. Every filter or sort order is a query pattern the system must support efficiently. Every scheduled report (daily at 08:00, monthly at month-end) is a non-functional timing requirement.

In a logistics firm, the operations director's "Exception Report", which lists shipments that missed a milestone by more than two hours, implies the system must: record a timestamp at every milestone, store a planned arrival time per milestone, calculate the deviation, and filter deviations above a threshold. That is two data requirements, one calculation, and one query filter, all deduced from a single two-page PDF.
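
The requirements implied by the Exception Report can be sketched as a small data model plus a filter. The class and field names are hypothetical, and the sketch assumes "missed" means arriving late, so early arrivals are never flagged.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Threshold taken from the report description: more than two hours late.
THRESHOLD = timedelta(hours=2)


@dataclass
class MilestoneEvent:
    # Hypothetical record shape; names are illustrative.
    shipment_id: str
    milestone: str
    planned: datetime  # planned arrival time for this milestone
    actual: datetime   # timestamp recorded when the milestone was reached

    @property
    def deviation(self) -> timedelta:
        """Positive when late, negative when early."""
        return self.actual - self.planned


def exception_report(
    events: list[MilestoneEvent], threshold: timedelta = THRESHOLD
) -> list[MilestoneEvent]:
    """Events whose lateness strictly exceeds the threshold."""
    return [e for e in events if e.deviation > threshold]
```

Passing the threshold as a parameter reflects a question worth raising in interviews: whether two hours is fixed or varies by customer or route.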

Spotting Workarounds: Where the Real Requirements Hide

A workaround is a human intervention that compensates for a gap in the current system. Workarounds are the highest-value targets in document analysis because each one directly implies a missing feature in the system you are about to build. Common workaround signals in documents:

  • Fields that are always the same value (a drop-down stuck on "Other") — the list does not match reality
  • A column in an Excel report labelled "Manual Adjustment" — the system cannot compute this automatically
  • A printed form where a section is always crossed out — that section is obsolete and should be removed
  • A note at the bottom of a procedure document: "For exceptions, email the operations manager directly" — the system has no exception-handling workflow
  • Two separate forms that capture the same patient ID — a missed integration point between two subsystems
Figure: Workaround Signals and the Requirements They Imply
  • Drop-down always set to "Other" (appointment type list is incomplete) → Configurable, extensible type catalogue: admin can add/retire values without code change
  • Excel "Manual Adjustment" column (system cannot auto-calculate overrides) → Override field with audit trail: record who adjusted, when, and why
  • "Email ops manager for exceptions" (no exception-handling workflow exists) → Exception escalation workflow: configurable thresholds + in-app notifications
  • Same patient ID on two separate forms (duplicate data entry across subsystems) → Single patient record, shared across modules: one source of truth, no re-entry
Four common workaround signals found in documents, and the new-system requirements each directly implies.
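
The first signal, a drop-down dominated by "Other", is easy to test for once the values from a batch of records are available. The sketch below is one possible check. The 50% cut-off is an arbitrary assumption, not an established benchmark, and the function names are illustrative.

```python
from collections import Counter


def catch_all_share(values: list[str], catch_all: str = "Other") -> float:
    """Share of non-blank entries that equal the catch-all option."""
    cleaned = [v.strip() for v in values if v and v.strip()]
    if not cleaned:
        return 0.0
    return Counter(cleaned)[catch_all] / len(cleaned)


def is_workaround_signal(
    values: list[str], catch_all: str = "Other", threshold: float = 0.5
) -> bool:
    """Flag a field when the catch-all dominates (threshold is an assumption)."""
    return catch_all_share(values, catch_all) >= threshold
```

A flagged field is a prompt for a follow-up question, such as what staff actually mean when they choose "Other", rather than a finding in itself.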

Limitations and Trade-offs

Document analysis is powerful but not complete on its own. Its primary limitation is that documents capture what was, not what should be. A form designed in 2009 encodes the business rules of 2009. If the business has changed — new regulations, new products, new competitors — the documents may describe an obsolete process. Use document analysis to establish a baseline, then validate whether that baseline still reflects current reality through interviews and observation.

A second limitation: documents capture the intended process but not always the actual one. The SOP says the manager must approve all returns over AED 500. In practice, the team processes them without approval because the manager is never available. The document gives you the rule; observation gives you the gap. Both are requirements.

Common pitfall: Do not treat every field in an existing form as a confirmed requirement for the new system. Many legacy forms contain obsolete fields kept out of inertia, duplicate fields that exist because two departments never integrated, and fields that staff have never understood and always leave blank. Challenge each field: who uses this, for what decision, and what happens if it is missing? If no one can answer, it is a candidate for removal.

Combining Document Analysis with Other Techniques

Document analysis works best as the first technique you apply on any project, because it gives you a concrete, evidence-based foundation before you enter any human conversation. Armed with your artifact register, your candidate data requirements, and your list of workarounds, you arrive at interviews and workshops as a prepared, credible analyst rather than a blank-slate questioner. Your questions become precise: "The intake form has a two-line 'Allergies' section, but many completed forms have an extra page stapled on. How many allergies does the average patient have, and what categories do you track?" That is a question only someone who studied the documents would think to ask.

In the next lesson we will look at workshops and JAD sessions — where you bring stakeholders together to resolve the ambiguities and conflicts that document analysis surfaces but cannot resolve on its own.