Configuration Management with Ansible

Playbooks

18 min Lesson 4 of 30

If ad-hoc commands are Ansible's screwdriver, playbooks are its architecture blueprint. A playbook is a YAML file that declares what should be true about a set of hosts: which packages are installed, which services are running, which config files contain which content. Ansible reads the playbook and drives the hosts to that state, idempotently and in order. In large fleets, playbooks capture accumulated operational knowledge in a reviewable form and can take hosts from a fresh install to fully configured in a single run.

This lesson covers the four structural elements that make up every professional playbook: plays, tasks, handlers, and the notify directive that connects them. Get these four right and you have the mental model for everything else Ansible does.

Playbook Anatomy: The Play

A playbook is a list of plays. Each play maps a set of hosts to a set of tasks and defines the execution context for those tasks. A playbook can have one play or fifty. Plays run sequentially, in file order. Within a play, each task runs on the matched hosts in parallel, up to the forks limit (default 5), before the next task starts.

The essential keys of a play:

name — human-readable label; shows up in output and is the single best documentation for what the play does.
hosts — inventory pattern (a group name, a glob, all, or a comma-separated list).
become — escalate to root via sudo for the entire play. Can be overridden per task.
gather_facts — default true; collects system facts (OS, IP, memory) before tasks run. Disable with false when facts are unused and you need speed.
vars — play-scoped variables (covered in depth in lesson 5).
tasks — ordered list of actions to execute on the matched hosts.
handlers — tasks triggered only when notified (covered below).
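The keys above can be sketched in a small play. This is an illustration only: the "cache" group and the task choices are placeholders, not part of the lesson's example inventory. It shows gather_facts turned off and a per-task override of the play-level become.

```yaml
---
# Sketch only: the "cache" host group is a hypothetical inventory group.
- name: Collect basic diagnostics from cache hosts
  hosts: cache
  become: true          # escalate for every task in this play by default
  gather_facts: false   # no task below reads facts, so skip fact collection

  tasks:
    - name: Read kernel log as root
      ansible.builtin.command: dmesg
      changed_when: false   # read-only command, never report a change

    - name: Check connectivity without privilege escalation
      ansible.builtin.ping:
      become: false         # per-task override of the play-level setting
```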

One playbook, multiple plays: A single YAML file can contain plays targeting different host groups in sequence. For example: play 1 configures database servers, play 2 configures app servers (which may depend on the databases being ready). This lets you orchestrate multi-tier deployments in one file with a single ansible-playbook invocation.

Tasks: The Unit of Work

A task is one call to an Ansible module. Each task has a name (Ansible does not require it, but treat it as mandatory in production), a module key, and the module's arguments. Tasks run top-to-bottom within a play. If a task fails on a host, Ansible runs no further tasks on that host by default. The failure is per host: other hosts in the play continue unless you set any_errors_fatal: true.
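As a sketch of the failure behaviour described above, the play below sets any_errors_fatal so that a failure on one host ends the play for every host. The "loadbalancers" group and the HAProxy package are assumptions chosen for illustration.

```yaml
---
# Sketch only: "loadbalancers" is a hypothetical inventory group.
- name: Configure load balancers, stopping everywhere on the first failure
  hosts: loadbalancers
  become: true
  any_errors_fatal: true   # one failed host aborts the play for all hosts

  tasks:
    - name: Ensure HAProxy is installed
      ansible.builtin.package:
        name: haproxy
        state: present

    # Not reached on any host if the task above failed on one of them.
    - name: Ensure HAProxy is started and enabled
      ansible.builtin.service:
        name: haproxy
        state: started
        enabled: true
```

Without any_errors_fatal, only the failing host would drop out, and the remaining hosts would continue to the second task.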

Here is a complete, production-quality playbook that installs Nginx, drops a configuration file, and ensures the service is running and enabled:

---
# site.yml — baseline web server configuration
- name: Configure web servers
  hosts: webservers
  become: true
  gather_facts: true

  vars:
    nginx_worker_processes: "auto"
    nginx_worker_connections: 1024

  tasks:
    - name: Ensure Nginx is installed
      ansible.builtin.package:
        name: nginx
        state: present

    - name: Deploy Nginx main config
      ansible.builtin.template:
        src: templates/nginx.conf.j2
        dest: /etc/nginx/nginx.conf
        owner: root
        group: root
        mode: "0644"
        validate: "nginx -t -c %s"
      notify: Reload Nginx

    - name: Ensure Nginx is started and enabled
      ansible.builtin.service:
        name: nginx
        state: started
        enabled: true

  handlers:
    - name: Reload Nginx
      ansible.builtin.service:
        name: nginx
        state: reloaded

Run it with:

# Dry-run first — see what would change without touching hosts
ansible-playbook site.yml --check --diff

# Real run against production inventory
ansible-playbook -i inventories/production/hosts.ini site.yml

# Limit to a single host for a canary test
ansible-playbook -i inventories/production/hosts.ini site.yml --limit web01.prod.example.com

# Verbose output — print task results
ansible-playbook site.yml -v        # module results
ansible-playbook site.yml -vvv      # connection + SSH details

Always use --check --diff before running against production. Check mode runs the playbook without making changes and reports what would change. Modules that cannot predict their effect, typically command and shell tasks, are skipped, so treat the report as an approximation. --diff shows the before/after diff of file content. Together they are your dry-run. At scale, enforce this in CI: run --check on every PR, apply only on merges to main.

Handlers and notify: Event-Driven Side Effects

Handlers are tasks that run at the end of a play — but only if at least one task notified them during that play's execution. This pattern solves a fundamental operations problem: you want to restart a service only when its configuration actually changed, not on every run.

The mechanics:

A task declares notify: <Handler Name>. The name must match the handler's name field exactly (case-sensitive).
If that task reports changed (not skipped, not ok — changed), Ansible marks the named handler as pending.
After all tasks complete, Ansible flushes pending handlers once, in the order they are declared in the handlers section (not in the order they were notified).
Even if ten tasks notify the same handler, it runs exactly once.

Tasks run sequentially; handlers fire once at the end — only when at least one task notified them with a CHANGED result.
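The mechanics above can be sketched with two tasks that notify the same pair of handlers. The snippet file names and template paths are placeholders. The point is that each handler runs at most once per host, and in declaration order rather than the order given in notify.

```yaml
---
# Sketch only: template paths and snippet names are hypothetical.
- name: Assemble Nginx config from several snippets
  hosts: webservers
  become: true

  tasks:
    - name: Deploy TLS settings snippet
      ansible.builtin.template:
        src: templates/tls.conf.j2
        dest: /etc/nginx/conf.d/tls.conf
        mode: "0644"
      notify:
        - Reload Nginx
        - Check Nginx syntax

    - name: Deploy gzip settings snippet
      ansible.builtin.template:
        src: templates/gzip.conf.j2
        dest: /etc/nginx/conf.d/gzip.conf
        mode: "0644"
      notify:
        - Reload Nginx
        - Check Nginx syntax

  handlers:
    # Declared first, so it runs first even though the tasks list it second.
    - name: Check Nginx syntax
      ansible.builtin.command: nginx -t
      changed_when: false

    # Runs once per host, no matter how many tasks notified it.
    - name: Reload Nginx
      ansible.builtin.service:
        name: nginx
        state: reloaded
```

If neither template task reports changed on a host, neither handler runs there.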

Handler Pitfalls in Production

Handlers are elegant but have sharp edges that trip up engineers in production:

Handlers do not run if the play fails mid-way. If Task 3 errors on a host before the task list completes, that host's pending handlers are skipped, unless the play sets force_handlers: true or you pass --force-handlers. Use meta: flush_handlers as a task to force handlers to run at a specific point in the play, for example right after deploying a config file but before starting dependent services.
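Handler name must match exactly. With default settings, notifying a handler name that does not exist fails the run with a "requested handler ... was not found" error (the error_on_missing_handler setting, default true). If that setting is turned off, the typo is reduced to a warning and the handler never runs. Because notification only happens when the task reports changed, a typo can stay hidden on hosts that are already configured. Test with --check against a host where the task will change, and look for RUNNING HANDLER lines in the output.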
Handler order is declaration order, not notification order. If Handler B depends on Handler A completing first, declare A before B.
One reload, not ten. Ten tasks all changing config snippets can all notify the same handler — it runs once at the end. This is the correct pattern for assembling config from multiple sources.
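A minimal sketch of the first pitfall's workaround follows. It assumes a hypothetical "myapp" service listening on port 8080. The handler is flushed mid-play so that the following task checks the restarted service, and force_handlers keeps already-notified handlers from being dropped if a later task fails.

```yaml
---
# Sketch only: the "myapp" service, its paths, and port 8080 are hypothetical.
- name: Deploy config and verify the service picked it up
  hosts: appservers
  become: true
  force_handlers: true   # run notified handlers even if a later task fails

  tasks:
    - name: Deploy application config
      ansible.builtin.template:
        src: templates/app.env.j2
        dest: /opt/myapp/.env
        mode: "0600"
      notify: Restart application

    - name: Run pending handlers now instead of at the end of the play
      ansible.builtin.meta: flush_handlers

    - name: Wait for the application port to accept connections
      ansible.builtin.wait_for:
        port: 8080
        timeout: 30

  handlers:
    - name: Restart application
      ansible.builtin.service:
        name: myapp
        state: restarted
```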

Never restart a service inside a task block to work around handlers. Engineers new to Ansible sometimes write a task that directly restarts a service unconditionally — to avoid learning handlers. This causes unnecessary downtime on every playbook run, even when nothing changed. Handlers exist precisely to solve this: only restart when something actually changed.

The Complete Multi-Play Playbook Pattern

Real infrastructure playbooks orchestrate multiple tiers. Here is the canonical pattern for a two-tier app stack deployment — databases first, then app servers:

---
# full-stack.yml
- name: Configure database tier
  hosts: dbservers
  become: true
  gather_facts: true
  tasks:
    - name: Install PostgreSQL
      ansible.builtin.package:
        name: postgresql
        state: present

    - name: Deploy pg_hba.conf
      ansible.builtin.template:
        src: templates/pg_hba.conf.j2
        dest: /etc/postgresql/15/main/pg_hba.conf
        owner: postgres
        group: postgres
        mode: "0640"
        validate: "/usr/local/bin/check-pg-hba %s"   # hypothetical custom validation script; %s is the staged file path
      notify: Reload PostgreSQL

    - name: Ensure PostgreSQL is started and enabled
      ansible.builtin.service:
        name: postgresql
        state: started
        enabled: true

  handlers:
    - name: Reload PostgreSQL
      ansible.builtin.service:
        name: postgresql
        state: reloaded

- name: Configure application tier
  hosts: appservers
  become: true
  gather_facts: true
  tasks:
    - name: Deploy application config
      ansible.builtin.template:
        src: templates/app.env.j2
        dest: /opt/myapp/.env
        owner: myapp
        group: myapp
        mode: "0600"
      notify: Restart application

    - name: Ensure application service is running
      ansible.builtin.service:
        name: myapp
        state: started
        enabled: true

  handlers:
    - name: Restart application
      ansible.builtin.service:
        name: myapp
        state: restarted

Use FQCN (Fully Qualified Collection Name) for all modules. Write ansible.builtin.package, not just package. FQCNs make it unambiguous which collection a module comes from, survive collection version upgrades without silent shadowing, and are checked by linters like ansible-lint. Many teams make FQCNs mandatory in shared playbook codebases for these reasons.
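For comparison, here is the same task written both ways. The chrony package is an arbitrary example. Both forms normally resolve to the same module, but only the second states that explicitly.

```yaml
# Short name: the source collection is implicit and depends on module lookup.
- name: Ensure chrony is installed (short module name)
  package:
    name: chrony
    state: present

# FQCN: always the module shipped in the ansible.builtin collection.
- name: Ensure chrony is installed (fully qualified name)
  ansible.builtin.package:
    name: chrony
    state: present
```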

Idempotency: The Contract You Must Uphold

Every task in a professional playbook must be idempotent — running the playbook ten times produces the same result as running it once. The package, service, and template modules are idempotent by design. Shell and command tasks are not — use creates, removes, or changed_when guards to make them idempotent or to suppress false-positive change reports:

# Idempotent command task: only runs if the output file does not exist
- name: Generate TLS certificate
  ansible.builtin.command:
    cmd: openssl req -x509 -newkey rsa:4096 -keyout /etc/ssl/private/app.key -out /etc/ssl/certs/app.crt -days 365 -nodes -subj "/CN=app.internal"
    creates: /etc/ssl/certs/app.crt   # skip if this file already exists

# Suppress spurious changed status for read-only checks
- name: Check kernel parameter
  ansible.builtin.command: sysctl net.ipv4.ip_forward
  register: sysctl_result
  changed_when: false   # this task never changes state — report as OK always

Breaking idempotency is one of the most common production Ansible bugs. A playbook that flips a config on every run will continually restart services, generate change-event noise in your CMDB, and make it impossible to tell from the run report whether something actually changed. Treat every CHANGED line in a playbook run as a real event that should require explanation.
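One way to keep CHANGED lines meaningful for command tasks is to check first and act only when needed. The sketch below extends the sysctl example under that assumption. The registered variable name ip_forward is arbitrary, and the script and file in the last task are hypothetical.

```yaml
# Read-only probe: never reports changed, and still runs in check mode
# so the registered value is available to the next task.
- name: Read current IP forwarding setting
  ansible.builtin.command: sysctl -n net.ipv4.ip_forward
  register: ip_forward
  changed_when: false
  check_mode: false

# Runs, and therefore reports changed, only when the value is not already 1.
# sysctl -w changes the running kernel only; it does not persist across reboots.
- name: Enable IP forwarding only when it is off
  ansible.builtin.command: sysctl -w net.ipv4.ip_forward=1
  when: ip_forward.stdout | trim != "1"

# "removes" is the mirror of "creates": the task is skipped once the file is gone.
# Hypothetical script that is assumed to delete pending.sql after applying it.
- name: Apply pending SQL file
  ansible.builtin.command:
    cmd: /opt/myapp/bin/apply-pending
    removes: /opt/myapp/pending.sql
```

On a second run with forwarding already enabled and no pending file, the first task reports ok and the other two are skipped, so the run shows no changes.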