Everything you have built across this tutorial — strict mode, variables, conditionals, loops, functions, redirection, text processing, error handling, and argument parsing — converges here. This capstone project builds a production-grade backup script from scratch, the kind you would find running nightly on database servers at mid-to-large engineering organizations. Backup scripts are deceptively simple to get wrong: they appear to work until the moment you actually need a restore, at which point you discover the archive was corrupt, the retention logic deleted everything, or the failure emails were swallowed silently.
We will build backup.sh in layers, explaining the production rationale at each step.
Layer 1: Safety Harness and Configuration Block
Every production backup script starts with the same two lines, a shebang and set -euo pipefail, followed by a readonly configuration block. The configuration block must be the single source of truth: no magic strings buried deep in the logic.
Key idea: configuration at the top, secrets from files. Every tunable value lives in the configuration block, and values that must not change at runtime are declared readonly. The database password is never hardcoded; it is read from a root-owned, mode-600 file (DB_PASS_FILE). This keeps the secret out of version control, where scanners like GitGuardian and truffleHog would flag it, and matches the pattern used by PostgreSQL's .pgpass and MySQL's --defaults-extra-file.
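The later layers refer to a number of configuration names (SCRIPT_NAME, LOG_FILE, BACKUP_ROOT and so on) without showing where they are set. A minimal sketch of what the harness and configuration block might look like follows; every path and value here is an illustrative assumption, not the tutorial's actual configuration. Note that DRY_RUN and VERBOSE are left writable, since parse_args needs to change them.

```bash
#!/usr/bin/env bash
set -euo pipefail   # exit on errors, unset variables, and failed pipeline stages

# --- Configuration: all values below are assumed for illustration ---
readonly SCRIPT_NAME="$(basename "$0")"
readonly TIMESTAMP="$(date +%Y%m%d_%H%M%S)"
readonly BACKUP_ROOT="/var/backups/myapp"
readonly LOG_FILE="${BACKUP_ROOT}/logs/backup.log"
readonly FILES_DIR="/var/www/myapp/storage"
readonly DB_NAME="myapp_production"
readonly DB_USER="backup_user"
readonly DB_PASS_FILE="/etc/backup/backup_user.pass"
readonly RETENTION_DAYS=14
readonly MIN_FREE_GB=10
readonly SLACK_WEBHOOK="${SLACK_WEBHOOK:-}"   # empty disables notifications

# Runtime flags: deliberately NOT readonly, because parse_args flips them
DRY_RUN=false
VERBOSE=false
```

Taking SLACK_WEBHOOK from the environment, with an empty default, keeps the webhook URL out of the script file in the same way the password file keeps the database secret out.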
Layer 2: Logging, Notification and Cleanup
A backup that fails silently is worse than no backup. Every log line must carry a timestamp, a severity level, and the script name so it can be found with grep or filtered in a log aggregator like Datadog or Splunk. The trap on EXIT removes a partial archive whenever the script exits with an error, including after SIGINT or SIGTERM. SIGKILL cannot be trapped, so a kill -9 can still leave a partial file behind.
_log() {
    local level="$1"; shift
    local msg="[${SCRIPT_NAME}] [${level}] $(date +%Y-%m-%dT%H:%M:%S) $*"
    echo "$msg" | tee -a "$LOG_FILE"
}

log_info()  { _log "INFO " "$@"; }
log_warn()  { _log "WARN " "$@" >&2; }
log_error() { _log "ERROR" "$@" >&2; }
log_debug() { $VERBOSE && _log "DEBUG" "$@" || true; }

die() {
    log_error "$*"
    notify_slack ":x: *Backup FAILED* on $(hostname): $*"
    exit 1
}

notify_slack() {
    [[ -z "$SLACK_WEBHOOK" ]] && return 0
    local payload
    payload="$(printf '{"text": "%s"}' "$1")"
    curl -sf --max-time 10 -X POST \
        -H 'Content-Type: application/json' \
        -d "$payload" "$SLACK_WEBHOOK" >/dev/null \
        || log_warn "Slack notification failed (non-fatal)"
}

# CURRENT_ARCHIVE is set later; cleanup removes it if the script dies mid-write
CURRENT_ARCHIVE=""

cleanup() {
    local exit_code=$?
    if [[ -n "$CURRENT_ARCHIVE" && -f "$CURRENT_ARCHIVE" && $exit_code -ne 0 ]]; then
        log_warn "Removing incomplete archive: $CURRENT_ARCHIVE"
        rm -f "$CURRENT_ARCHIVE"
    fi
    exit $exit_code
}

trap cleanup EXIT
trap 'die "Received SIGINT"' INT
trap 'die "Received SIGTERM"' TERM
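One limitation of notify_slack as written: the payload is built with printf, so a message containing a double quote, backslash, or newline produces invalid JSON and the notification is rejected. A minimal sketch of an escaping helper is shown here; json_escape is a hypothetical name, and it covers only the characters most likely to appear in an error message, not every control character.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: escape a string for use inside a JSON string literal.
# Backslashes must be handled first so later escapes are not doubled.
json_escape() {
    local s="$1"
    s=${s//\\/\\\\}       # \  -> \\
    s=${s//\"/\\\"}       # "  -> \"
    s=${s//$'\n'/\\n}     # newline -> \n
    s=${s//$'\r'/\\r}     # carriage return -> \r
    s=${s//$'\t'/\\t}     # tab -> \t
    printf '%s' "$s"
}

# Example: build a payload from a message that would otherwise break the JSON
message='mysqldump: Got error: "Access denied" for C:\path'
payload="$(printf '{"text": "%s"}' "$(json_escape "$message")")"
echo "$payload"
```

Inside notify_slack the same idea would apply by wrapping "$1" in json_escape before it reaches printf. Where jq is available, building the payload with it is another option.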
Layer 3: Preflight Checks
Fail fast and loud before touching any data. Preflight checks prevent the silent partial failures that give you an empty archive with a 0 exit code.
require_command() {
    command -v "$1" >/dev/null 2>&1 || die "Required command not found: $1"
}

check_free_disk() {
    local dir="$1"
    local required_gb="$2"
    local free_gb
    # -P keeps each filesystem on a single line even when the device name is
    # long, so the available-space column is always field 4 of line 2
    free_gb=$(df -P -BG "$dir" | awk 'NR==2 {gsub(/G/,"",$4); print $4}')
    (( free_gb >= required_gb )) \
        || die "Insufficient disk on $dir: ${free_gb}G free, need ${required_gb}G"
    log_info "Disk OK: ${free_gb}G free on $dir"
}

preflight() {
    log_info "--- Preflight checks ---"
    require_command mysqldump
    require_command gzip
    require_command tar
    require_command openssl
    [[ -f "$DB_PASS_FILE" ]] \
        || die "DB password file not found: $DB_PASS_FILE"
    [[ "$(stat -c %a "$DB_PASS_FILE")" == "600" ]] \
        || die "DB password file has unsafe permissions (must be 600): $DB_PASS_FILE"
    mkdir -p "${BACKUP_ROOT}/db" "${BACKUP_ROOT}/files" "${BACKUP_ROOT}/logs"
    check_free_disk "$BACKUP_ROOT" "$MIN_FREE_GB"
    log_info "Preflight passed"
}
Production pitfall: unchecked permissions on credential files. Many teams create the password file but forget to lock it down. If backup_user.pass is world-readable, any user or process on the host can read your database password. The stat -c %a check (GNU stat syntax) enforces mode 600 at runtime, so a misconfigured deploy fails loudly rather than silently leaking credentials.
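The preflight check verifies the mode but not the owner, even though the file is meant to be root-owned. A sketch of a stricter check follows, under the assumption that GNU stat is available; secret_file_is_safe is a hypothetical helper, and it also accepts mode 400, which is at least as restrictive as 600.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: succeeds only if the file exists, is owned by the
# expected numeric UID, and grants nothing to group or others.
# Assumes GNU stat (-c with %a for octal mode and %u for owner UID).
secret_file_is_safe() {
    local file="$1" expected_uid="$2"
    local mode owner_uid
    [[ -f "$file" ]] || return 1
    read -r mode owner_uid < <(stat -c '%a %u' "$file")
    [[ "$owner_uid" == "$expected_uid" ]] || return 1
    [[ "$mode" == "600" || "$mode" == "400" ]]
}

# Example: check a scratch file owned by the current user
demo_file="$(mktemp)"
chmod 600 "$demo_file"
if secret_file_is_safe "$demo_file" "$(id -u)"; then
    echo "safe: $demo_file"
fi
rm -f "$demo_file"
```

In the backup script the expected UID would be 0 for a root-owned file, and a failed check would call die just as the existing permission test does.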
Layer 4: Backup Functions
Separate the database dump from the file archive. This lets each step be retried, tested, or monitored independently. The --single-transaction flag for MySQL is critical in production: for InnoDB tables it takes a consistent snapshot without acquiring table locks, so your application keeps serving traffic during the backup window. Tables on non-transactional engines such as MyISAM do not get that consistency guarantee.
backup_database() {
    log_info "--- Database backup ---"
    local db_archive="${BACKUP_ROOT}/db/db_${TIMESTAMP}.sql.gz"
    CURRENT_ARCHIVE="$db_archive"   # register for cleanup trap
    local db_pass
    db_pass="$(cat "$DB_PASS_FILE")"
    if $DRY_RUN; then
        log_info "[DRY-RUN] Would dump $DB_NAME to $db_archive"
        return 0
    fi
    # --single-transaction: consistent snapshot without table locks (InnoDB)
    # --routines --events: include stored procedures and event scheduler objects
    # MYSQL_PWD keeps the password out of the argument list shown by `ps aux`.
    # MySQL 8.0 deprecates it; an option file passed via --defaults-extra-file
    # is the documented alternative.
    MYSQL_PWD="$db_pass" mysqldump \
        --user="$DB_USER" \
        --single-transaction \
        --routines \
        --events \
        --quick \
        "$DB_NAME" \
        | gzip -9 > "$db_archive"
    # Verify the compressed stream is intact. With pipefail set, a failed
    # mysqldump has already aborted the script before this point.
    gzip -t "$db_archive" || die "DB archive failed integrity check: $db_archive"
    local size_mb
    size_mb=$(du -m "$db_archive" | awk '{print $1}')
    log_info "DB archive OK: $db_archive (${size_mb}MB)"
    CURRENT_ARCHIVE=""   # clear: archive is complete
}

backup_files() {
    log_info "--- Files backup ---"
    local files_archive="${BACKUP_ROOT}/files/files_${TIMESTAMP}.tar.gz"
    CURRENT_ARCHIVE="$files_archive"
    if $DRY_RUN; then
        log_info "[DRY-RUN] Would archive $FILES_DIR to $files_archive"
        return 0
    fi
    local base
    base="$(basename "$FILES_DIR")"
    # Exclude caches and temp directories. Patterns are matched against member
    # names, which are relative to the -C directory, so they start with $base
    # rather than the absolute $FILES_DIR path.
    tar -czf "$files_archive" \
        --exclude="${base}/framework/cache" \
        --exclude="${base}/framework/sessions" \
        --exclude="${base}/logs" \
        -C "$(dirname "$FILES_DIR")" \
        "$base"
    tar -tzf "$files_archive" >/dev/null || die "Files archive failed integrity check: $files_archive"
    local size_mb
    size_mb=$(du -m "$files_archive" | awk '{print $1}')
    log_info "Files archive OK: $files_archive (${size_mb}MB)"
    CURRENT_ARCHIVE=""
}
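As an alternative to MYSQL_PWD, the password can be handed to mysqldump through a private option file, the --defaults-extra-file pattern mentioned earlier. The sketch below shows one way that might look; make_mysql_defaults_file is a hypothetical helper, and it assumes the password contains no double quotes or backslashes, which would need escaping inside the option file.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: write a temporary client option file and print its path.
# mktemp creates the file readable and writable by the owner only.
make_mysql_defaults_file() {
    local user="$1" pass_file="$2"
    local cnf
    cnf="$(mktemp)"
    {
        printf '[client]\n'
        printf 'user=%s\n' "$user"
        # Quoting the value protects characters such as '#' in the password
        printf 'password="%s"\n' "$(cat "$pass_file")"
    } > "$cnf"
    printf '%s\n' "$cnf"
}

# Intended use inside backup_database (shown commented out because it needs
# a live MySQL server). --defaults-extra-file must be the first option:
#   cnf="$(make_mysql_defaults_file "$DB_USER" "$DB_PASS_FILE")"
#   mysqldump --defaults-extra-file="$cnf" --single-transaction --routines \
#       --events --quick "$DB_NAME" | gzip -9 > "$db_archive"
#   rm -f "$cnf"
```

Because the option file is itself a secret, it should be removed once the dump finishes; registering it with the cleanup trap would cover the failure path as well.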
Layer 5: Retention Enforcement
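Retention logic is where backup scripts most often destroy themselves. The conventional tool is find with -mtime +N, which compares file modification times directly; avoid hand-rolling loops that compare dates as strings, because they break at month and year boundaries. Note that find truncates a file's age to whole days, so -mtime +N matches files that are at least N+1 days old. The function below feeds find's matches through a loop instead of using -delete so that each removal can be logged, or skipped in dry-run mode.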
enforce_retention() {
    log_info "--- Enforcing ${RETENTION_DAYS}-day retention ---"
    local deleted_count=0
    while IFS= read -r -d '' old_file; do
        if $DRY_RUN; then
            log_info "[DRY-RUN] Would delete: $old_file"
        else
            rm -f "$old_file"
            log_info "Deleted old backup: $old_file"
        fi
        (( deleted_count++ )) || true
    done < <(find "$BACKUP_ROOT" \
        \( -name "db_*.sql.gz" -o -name "files_*.tar.gz" \) \
        -mtime "+${RETENTION_DAYS}" \
        -print0)
    log_info "Retention sweep complete: removed ${deleted_count} old archive(s)"
}
Pro practice: use -print0 with read -d ''. Filenames can legally contain spaces and newlines. The find -print0 / read -d '' pair uses the null byte as a delimiter, which can never appear in a filename. Feeding the loop through process substitution, < <(find ...), rather than a pipe also keeps deleted_count in the current shell; the Google Shell Style Guide recommends process substitution over piping into while for that reason.
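The day-boundary behavior of -mtime is easy to verify in a scratch directory before trusting it with real backups. The following sketch assumes GNU touch (for -d with relative dates) and uses a hypothetical list_expired_archives helper that mirrors the find expression in enforce_retention.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print expired archives under $1, NUL-delimited.
list_expired_archives() {
    local root="$1" days="$2"
    find "$root" -type f \
        \( -name 'db_*.sql.gz' -o -name 'files_*.tar.gz' \) \
        -mtime "+${days}" -print0
}

# Build a scratch directory with backdated files (GNU touch -d assumed)
demo_dir="$(mktemp -d)"
touch -d '200 hours ago' "${demo_dir}/db_old.sql.gz"     # 8.3 days old: expired
touch -d '180 hours ago' "${demo_dir}/db_edge.sql.gz"    # 7.5 days old: kept
touch                    "${demo_dir}/files_new.tar.gz"  # fresh: kept
touch -d '200 hours ago' "${demo_dir}/notes.txt"         # old, but not a backup

# Collect matches with the same NUL-safe loop the script uses
expired=()
while IFS= read -r -d '' f; do
    expired+=("$f")
done < <(list_expired_archives "$demo_dir" 7)

printf 'expired: %s\n' "${expired[@]##*/}"
rm -rf "$demo_dir"
```

With a 7-day retention, only db_old.sql.gz is selected: the 7.5-day-old file truncates to 7 whole days, which is not greater than 7, and notes.txt does not match either name pattern.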
Layer 6: Argument Parsing and main()
The script accepts --dry-run and --verbose flags. Dry-run mode lets operators validate the backup plan in a new environment without touching any data or wasting disk space.
parse_args() {
    while [[ $# -gt 0 ]]; do
        case "$1" in
            --dry-run) DRY_RUN=true ; shift ;;
            --verbose) VERBOSE=true ; shift ;;
            --help|-h)
                echo "Usage: $SCRIPT_NAME [--dry-run] [--verbose]"
                exit 0
                ;;
            *) die "Unknown argument: $1" ;;
        esac
    done
}

main() {
    # Ensure the log directory exists before anything logs, including a
    # die() raised while parsing arguments
    mkdir -p "$(dirname "$LOG_FILE")"
    parse_args "$@"
    log_info "========================================="
    log_info "Backup started (host: $(hostname), user: $(whoami))"
    $DRY_RUN && log_warn "DRY-RUN mode: no files will be written"
    preflight
    backup_database
    backup_files
    enforce_retention
    local db_size files_size
    db_size=$(du -sh "${BACKUP_ROOT}/db/db_${TIMESTAMP}.sql.gz" 2>/dev/null | awk '{print $1}' || echo "N/A")
    files_size=$(du -sh "${BACKUP_ROOT}/files/files_${TIMESTAMP}.tar.gz" 2>/dev/null | awk '{print $1}' || echo "N/A")
    log_info "Backup complete. DB: $db_size Files: $files_size"
    log_info "========================================="
    notify_slack ":white_check_mark: *Backup succeeded* on $(hostname). DB: ${db_size}, Files: ${files_size}"
}

# Run main only when executed directly, not when the file is sourced
if [[ "${BASH_SOURCE[0]}" == "${0}" ]]; then
    main "$@"
fi
Figure: Execution flow of the production backup script. The trap safety net fires on every exit path the shell can intercept, so partial archives do not accumulate.
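The cleanup behavior can be exercised in isolation without a database. The sketch below is a reduced stand-in for the script's trap logic, not the script itself: simulate_backup is a hypothetical function whose body runs in a subshell so that its EXIT trap and exit status stay contained.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical stand-in for a backup run. The parentheses make the function
# body a subshell, so the EXIT trap fires when the function finishes.
# $1 = archive path, $2 = "fail" to abort halfway through the write.
simulate_backup() (
    archive="$1"
    cleanup() {
        local exit_code=$?
        # Remove the archive only when exiting with an error
        if [[ $exit_code -ne 0 && -f "$archive" ]]; then
            rm -f "$archive"
        fi
        exit "$exit_code"
    }
    trap cleanup EXIT
    echo "first half" > "$archive"
    if [[ "${2:-}" == "fail" ]]; then
        exit 1                      # simulate dying mid-write
    fi
    echo "second half" >> "$archive"
)

work_dir="$(mktemp -d)"
simulate_backup "${work_dir}/good.gz"
simulate_backup "${work_dir}/bad.gz" fail || echo "simulated failure handled"
ls "$work_dir"
rm -rf "$work_dir"
```

The listing shows only good.gz: the failed run's partial file was removed by the trap, while the successful run's archive was left in place because its exit code was 0.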
Running and Scheduling
Make the script executable and test it in dry-run mode before scheduling:
# One-time setup
chown root:root /opt/scripts/backup.sh
chmod 700 /opt/scripts/backup.sh   # owner-only; includes the execute bit
# Validate without writing data
/opt/scripts/backup.sh --dry-run --verbose
# Real run (test before adding to cron)
/opt/scripts/backup.sh --verbose
# Add to root's crontab — runs at 02:00 every day
# Append BOTH stdout and stderr to a persistent log
crontab -e
# Add this line:
# 0 2 * * * /opt/scripts/backup.sh >> /var/log/backup.log 2>&1
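# Monitor recent runs. Failures are logged at ERROR level by die();
# the "Backup FAILED" text is sent to Slack only, not written to the log
grep -E 'Backup complete|\[ERROR\]' /var/log/backup.log | tail -20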
Pro practice: verify restores on a schedule. A backup that has never been restored is a hypothesis, not a guarantee. Mature engineering teams automate restore tests, for example weekly in a staging environment: spin up a temporary database, load the most recent dump, run a smoke query, then tear it down. This is the only reliable proof that your backup is usable.
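A restore test along those lines might start with the sketch below. Both helpers are hypothetical: latest_db_dump assumes the timestamp in the filename sorts chronologically (for example YYYYmmdd_HHMMSS), and dump_looks_complete relies on mysqldump ending its output with a "-- Dump completed" comment, which holds unless comments were disabled when the dump was taken. The restore commands need a live server, so they appear only as comments, and scratch_db and the option file path are assumed names.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the newest dump in directory $1.
# Glob results are sorted, so the last element has the latest timestamp.
latest_db_dump() {
    local dir="$1"
    local dumps=("$dir"/db_*.sql.gz)
    [[ -e "${dumps[0]}" ]] || return 1      # the glob matched nothing
    printf '%s\n' "${dumps[${#dumps[@]}-1]}"
}

# Hypothetical helper: cheap check to run before attempting a restore.
# A missing trailer comment suggests the dump was cut short.
dump_looks_complete() {
    local dump="$1"
    gzip -t "$dump" || return 1
    gzip -dc "$dump" | tail -n 1 | grep -q '^-- Dump completed'
}

# Restore steps (commented out: they require a MySQL server):
#   dump="$(latest_db_dump /var/backups/myapp/db)"
#   dump_looks_complete "$dump" || exit 1
#   opts=/etc/backup/restore.cnf
#   mysql --defaults-extra-file="$opts" -e 'CREATE DATABASE scratch_db'
#   gzip -dc "$dump" | mysql --defaults-extra-file="$opts" scratch_db
#   mysql --defaults-extra-file="$opts" scratch_db -e 'SHOW TABLES'
#   mysql --defaults-extra-file="$opts" -e 'DROP DATABASE scratch_db'
```

A real smoke query would go further than SHOW TABLES, for instance by checking the row count of a table that should never be empty.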
This script embodies every lesson in the tutorial: set -euo pipefail catches silent failures, trap handles unexpected exits, functions isolate concerns, readonly prevents accidental mutation, --dry-run enables safe validation, and structured logging makes every run auditable. That is production-grade shell scripting.