Redis in Production
Running Redis in development is trivial. Running it at scale — where it serves millions of requests per second, backs session state for every authenticated user, and sits in the critical path of your checkout flow — is a different discipline entirely. This lesson covers the four pillars that separate a stable production Redis from one that pages you at 3 AM: eviction policy tuning, hot key and big key detection, latency spike diagnosis, and a monitoring stack that gives you actionable signal before users notice.
Eviction Policy Tuning
When Redis approaches its maxmemory limit, it must decide what to do. The answer is controlled by maxmemory-policy, and choosing the wrong one for your workload is one of the most common causes of subtle production bugs.
The eight policies split across two axes: what to evict (LRU — least recently used, LFU — least frequently used, TTL — closest to expiry, or random) and which key space to sample (all keys, or only keys that already have a TTL set). The practical matrix:
- noeviction: reject writes once memory is full. Use for primary data you cannot afford to lose (e.g. a queue backed by Redis lists). Your application will get OOM command not allowed errors and must handle them gracefully.
- allkeys-lru: evict the least-recently-used key across all keys. The right default for a cache where all keys are equally eligible for eviction.
- volatile-lru: evict only from keys that have a TTL. Safe for mixed workloads where some keys are persistent (counters, rate limiters) and some are ephemeral (session tokens). Only the ephemeral keys get evicted.
- allkeys-lfu: evict least-frequently-used keys. Superior to LRU for workloads with scan-once patterns (large datasets that are iterated periodically rather than hot-accessed), because a one-off scan does not make a key look hot. Redis 4.0+.
- volatile-ttl: evict the key closest to its TTL expiry. Useful when your TTL already encodes business priority, so that keys expiring soonest are the least valuable.
A common mistake is choosing noeviction thinking it means "never delete data." What it actually means is "stop accepting writes when full." If your application does not catch the OOM error and retry with backoff, it will silently drop events, fail requests, or crash. If you truly need durability, use persistence (RDB + AOF) and size your instance so it never hits maxmemory, or use a queue-backed architecture.
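A minimal sketch of the catch-and-retry handling under noeviction. The helper names, retry count, and delays are illustrative assumptions. How the OOM reply surfaces depends on your client library (recent redis-py versions are believed to raise an OutOfMemoryError class, which is worth verifying for your version), so the detection predicate is passed in rather than hard-coded.

```python
import random
import time


def looks_like_oom(exc):
    """Heuristic (assumption): match the server's 'OOM ...' text or an OutOfMemoryError class name."""
    return "OOM" in str(exc) or type(exc).__name__ == "OutOfMemoryError"


def write_with_backoff(write, retries=5, base_delay=0.05,
                       is_oom=looks_like_oom, sleep=time.sleep):
    """Call write(); on an OOM rejection, retry with exponential backoff plus jitter.

    `write` is any zero-argument callable that performs the Redis write.
    Non-OOM errors are re-raised immediately, and the last OOM error is
    re-raised once retries are exhausted, so failures are never swallowed.
    """
    for attempt in range(retries + 1):
        try:
            return write()
        except Exception as exc:
            if not is_oom(exc) or attempt == retries:
                raise
            # Delays grow as base, 2x, 4x, ... with up to 50% random jitter on top.
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay / 2))
```

With a real client this might be called as `write_with_backoff(lambda: client.rpush("jobs", payload))`, where `client`, `"jobs"`, and `payload` are placeholders.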
The LFU eviction policies rely on two configuration knobs: lfu-decay-time, which controls how quickly the frequency counter decays for idle keys, and lfu-log-factor, which controls how many accesses it takes to saturate the probabilistically incremented counter.
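The two directives are lfu-log-factor and lfu-decay-time. The following is a small simulation of how they shape the 8-bit frequency counter, modelled on my reading of the Redis source (new keys start at 5, and the counter is incremented with probability 1/((counter - 5) * factor + 1)). Treat the constants as assumptions to verify against your Redis version; the function names are illustrative.

```python
import random

LFU_INIT_VAL = 5  # assumed starting counter for a newly created key


def lfu_log_incr(counter, lfu_log_factor, rng):
    """Simulate one access: increment with a probability that shrinks as the counter grows."""
    if counter >= 255:
        return 255
    baseval = max(counter - LFU_INIT_VAL, 0)
    p = 1.0 / (baseval * lfu_log_factor + 1)
    return counter + 1 if rng.random() < p else counter


def counter_after_hits(hits, lfu_log_factor, seed=0):
    """Counter value after `hits` accesses with no decay in between."""
    rng = random.Random(seed)
    counter = LFU_INIT_VAL
    for _ in range(hits):
        counter = lfu_log_incr(counter, lfu_log_factor, rng)
    return counter


def lfu_decay(counter, idle_minutes, lfu_decay_time):
    """Subtract one per elapsed lfu_decay_time minutes; 0 is treated here as 'no decay'."""
    if lfu_decay_time <= 0:
        return counter
    periods = idle_minutes // lfu_decay_time
    return max(counter - periods, 0)
```

A higher lfu-log-factor means more hits are needed before two keys become distinguishable at the top of the range; a higher lfu-decay-time means formerly hot keys keep their score longer.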
maxmemory: Set it to 75–80% of the instance's physical RAM. Leave headroom for Redis's internal metadata (each key has ~50–90 bytes of overhead regardless of value size), replication buffers, and OS page cache. On a 16 GB instance, set maxmemory 12gb. Monitor used_memory_rss in your metrics stack — if RSS consistently exceeds physical RAM, you are swapping, which destroys latency.
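The sizing rule can be written as a two-line calculation. The function names are hypothetical, and the 75% default simply mirrors the lower end of the range given above.

```python
GIB = 1024 ** 3


def recommended_maxmemory(physical_ram_bytes, fraction=0.75):
    """Return a maxmemory value (in bytes) that leaves headroom for buffers and the OS."""
    if not 0 < fraction < 1:
        raise ValueError("fraction must be between 0 and 1")
    return int(physical_ram_bytes * fraction)


def rss_exceeds_ram(used_memory_rss, physical_ram_bytes):
    """True when the RSS reported by INFO is at or above physical RAM, i.e. swap risk."""
    return used_memory_rss >= physical_ram_bytes
```

For a 16 GiB instance, `recommended_maxmemory(16 * GIB)` gives 12 GiB, matching the maxmemory 12gb example.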
Hot Keys
A hot key is a single key receiving a disproportionate share of traffic — often thousands of times more requests per second than the average key. Because Redis is single-threaded (for command processing), a hot key can saturate CPU on a single shard, creating a queue of waiting commands and causing latency spikes across the entire instance even for keys that are not hot.
Classic hot key scenarios: a viral tweet's like counter, a sale item's inventory count, a feature flag that every API request checks, or a session key for a shared service account.
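redis-cli --hotkeys (available since the Redis 4.0 series) uses the LFU access counter to identify candidates. It requires an LFU maxmemory-policy and walks the keyspace with SCAN, so its overhead is modest rather than zero. For instances that cannot run an LFU policy, or for higher precision, sample live traffic instead. Keep in mind that keyspace notifications report writes and expirations but not reads, so read-heavy hot keys need a different source, such as a brief MONITOR capture.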
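One possible sampling sketch follows. It swaps in a short MONITOR capture in place of keyspace notifications, because notifications are not emitted for reads. It assumes the usual MONITOR line shape (timestamp, bracketed db and client address, then quoted arguments) and treats the first argument as the key, which holds for common single-key commands but not for multi-key commands or scripts. MONITOR is expensive on a busy instance, so capture for a few seconds at most.

```python
import shlex
from collections import Counter


def parse_monitor_line(line):
    """Return (COMMAND, first_argument) from one MONITOR line, or None if it has no argument."""
    _, sep, rest = line.partition("] ")
    if not sep:
        return None  # not a command line (e.g. the initial "OK")
    try:
        parts = shlex.split(rest)
    except ValueError:
        return None  # unbalanced quoting; skip rather than guess
    if len(parts) < 2:
        return None  # commands such as PING carry no key
    return parts[0].upper(), parts[1]


def top_keys(lines, n=10):
    """Count how often each key appears in the sample and return the n most frequent."""
    counts = Counter()
    for line in lines:
        parsed = parse_monitor_line(line)
        if parsed:
            counts[parsed[1]] += 1
    return counts.most_common(n)
```

The input can be any iterable of lines, for example a file written by redirecting `redis-cli MONITOR` output for a few seconds.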
Mitigating hot keys at production scale requires a strategy, not just detection:
- Client-side caching (Redis 6+): Use the client-tracking protocol so your application servers maintain a local in-memory copy. The Redis server sends invalidation messages when the key changes. This eliminates the network round-trip entirely for read-heavy hot keys.
- Key sharding: For a counter-like hot key, split it into N shards (counter:foo:0 through counter:foo:N-1), increment a random shard on write, and sum all shards on read. Facebook uses this at 10x+ key multiplication for their most contended counters.
- Read replicas with client-side routing: Route hot-key reads to replicas. Redis Cluster does not do this automatically; you must implement it at the client or proxy layer (Envoy, Twemproxy, or a custom client pool).
- Local in-process cache (L1): Cache the hot key in application memory with a short TTL (1–5 seconds). Stale for a second is acceptable for most read-mostly hot keys and eliminates Redis traffic entirely.
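Two of the mitigations above, key sharding and the in-process L1 cache, can be sketched as follows. The class names are hypothetical, and `client` is assumed to be any object exposing `incrby(key, amount)` and `mget(keys)` in the style of redis-py.

```python
import random
import time


class ShardedCounter:
    """Spread one logical counter over N keys: name:0 .. name:N-1."""

    def __init__(self, client, name, shards=16, rng=random):
        self.client = client
        self.keys = [f"{name}:{i}" for i in range(shards)]
        self.rng = rng

    def incr(self, amount=1):
        # Writes land on a random shard, so no single key takes all the traffic.
        return self.client.incrby(self.rng.choice(self.keys), amount)

    def value(self):
        # Reads sum every shard; shards that were never written come back as None.
        return sum(int(v) for v in self.client.mget(self.keys) if v is not None)


class LocalTTLCache:
    """In-process cache that reloads a key once its short TTL has passed."""

    def __init__(self, loader, ttl_seconds=2.0, clock=time.monotonic):
        self.loader = loader      # e.g. a function that calls client.get(key)
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}        # key -> (expires_at, value)

    def get(self, key):
        now = self.clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]
        value = self.loader(key)
        self._entries[key] = (now + self.ttl, value)
        return value
```

On Redis Cluster the shard keys map to different hash slots, which is what spreads the load, but it also means a plain MGET across them may be rejected; a cluster-aware client or per-key reads would be needed there. The cache is not thread-safe as written.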
Big Keys
A big key is a key whose value is large enough to cause operational problems: slow commands that block the event loop, large replication payload spikes, and slow RDB serialization. Redis's single-threaded model means a single DEL or LRANGE on a 10 MB value blocks every other client for tens of milliseconds.
The thresholds that matter at production scale: strings > 1 MB, lists/sets/hashes/sorted sets > 5,000 elements (or > 1 MB serialized). Anything beyond these is a candidate for redesign.
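These thresholds translate directly into a small classifier. This is a sketch: the function name is made up, and the inputs are assumed to come from TYPE, from MEMORY USAGE (which reports in-memory size, so it only approximates serialized size), and from LLEN / SCARD / HLEN / ZCARD.

```python
ONE_MB = 1024 * 1024
MAX_ELEMENTS = 5000
COLLECTION_TYPES = {"list", "set", "hash", "zset"}  # names as returned by TYPE


def is_big_key(key_type, size_bytes, element_count=0):
    """Apply the thresholds: strings > 1 MB, collections > 5,000 elements or > 1 MB."""
    if key_type == "string":
        return size_bytes > ONE_MB
    if key_type in COLLECTION_TYPES:
        return element_count > MAX_ELEMENTS or size_bytes > ONE_MB
    return False  # other types (e.g. streams) are not covered by these thresholds
```

Feeding it from a SCAN loop on a replica keeps the audit off the primary's critical path.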
Use UNLINK instead of DEL for any key larger than ~100 KB. DEL is synchronous and blocks the main thread while freeing memory. UNLINK (Redis 4.0+) does the same logical operation but defers the actual memory reclamation to a background thread. For very large keys (tens of MB), the difference can be 50–200 ms of blocking. In a production environment where your p99 SLO is 10 ms, a single DEL on such a key exceeds the latency target many times over for every client queued behind it.
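A sketch of the size-based choice, assuming a client that exposes `unlink(key)` and `delete(key)` in the style of redis-py; the helper name and the way the size is obtained are assumptions.

```python
UNLINK_THRESHOLD_BYTES = 100 * 1024  # the ~100 KB rule of thumb from the text


def remove_key(client, key, size_bytes):
    """Delete a key, using non-blocking UNLINK once the value is above the threshold."""
    if size_bytes > UNLINK_THRESHOLD_BYTES:
        return client.unlink(key)  # memory is reclaimed off the main thread
    return client.delete(key)      # small values are cheap to free inline
```

If the size is not known in advance, calling UNLINK unconditionally is a reasonable simplification, since the logical result is the same as DEL.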
Latency Spike Diagnosis
Redis latency spikes come from a finite set of causes. Knowing the taxonomy lets you go from "Redis is slow" to root cause in minutes, not hours.
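1. Slow commands. O(N) commands like KEYS *, SMEMBERS on a large set, SORT, and LRANGE with large ranges block the event loop. The slowlog captures any command whose execution time exceeds slowlog-log-slower-than (in microseconds), and SLOWLOG GET reads the entries back.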
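A sketch for turning slowlog entries into a per-command summary. It assumes entries shaped like the dictionaries redis-py returns from `slowlog_get()`, with a `command` field (bytes or str) and a `duration` in microseconds; the function name is illustrative.

```python
from collections import defaultdict


def summarize_slowlog(entries):
    """Group slowlog entries by command name, worst total duration first."""
    stats = defaultdict(lambda: {"count": 0, "total_us": 0, "max_us": 0})
    for entry in entries:
        command = entry["command"]
        if isinstance(command, bytes):
            command = command.decode("utf-8", errors="replace")
        # The command name is the first word, e.g. "KEYS" from "KEYS user:*".
        name = command.split(" ", 1)[0].upper() if command else "?"
        bucket = stats[name]
        bucket["count"] += 1
        bucket["total_us"] += entry["duration"]
        bucket["max_us"] = max(bucket["max_us"], entry["duration"])
    return sorted(stats.items(), key=lambda kv: kv[1]["total_us"], reverse=True)
```

With a live client this might be fed as `summarize_slowlog(client.slowlog_get(128))`, where `client` is a placeholder for your connection.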
2. Fork latency (RDB/AOF rewrite). When Redis forks for persistence, the OS must copy the page table. On a 20 GB instance the page table alone is roughly 40 MB (20 GB / 4 KB pages × 8 bytes per entry), and the fork itself can take 200–500 ms and block the main thread. Symptoms: a regular latency spike every save interval or AOF rewrite cycle. Mitigation: use smaller Redis instances (keep datasets under 10 GB per shard), disable THP (Transparent Huge Pages), for which Redis explicitly recommends echo never > /sys/kernel/mm/transparent_hugepage/enabled, and prefer replicas for RDB saves (bgsave on replica, not primary).
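The page-table cost of a fork can be estimated with simple arithmetic, assuming 4 KB pages and 8-byte page-table entries (typical for x86-64 Linux without huge pages). The per-GiB fork cost is left as a required parameter because it varies widely by hardware and virtualization; the 10 and 25 ms figures below are only what the 200–500 ms range for 20 GB implies.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2


def page_table_bytes(dataset_bytes, page_size=4096, pte_size=8):
    """Approximate size of the page table the kernel must copy on fork()."""
    return dataset_bytes // page_size * pte_size


def fork_time_ms(dataset_gib, ms_per_gib):
    """Rough fork duration given a measured (or assumed) per-GiB cost."""
    return dataset_gib * ms_per_gib


# 20 GiB of data maps to about 40 MiB of page table entries.
assert page_table_bytes(20 * GIB) == 40 * MIB
```

The latest_fork_usec field in INFO reports the last measured fork time, which is a better input for `ms_per_gib` than any generic figure.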
3. Active memory defragmentation. Long-running Redis instances accumulate memory fragmentation. When mem_fragmentation_ratio exceeds 1.5, you are wasting significant RAM. Redis 4.0+ has online defragmentation, but it consumes CPU and can cause brief latency spikes during compaction windows. Monitor the ratio and enable defragmentation only once fragmentation is confirmed.
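A sketch of the "confirm first, then enable" decision, working on a dictionary of fields from INFO memory. The directive names are real Redis settings, but the values are conservative starting points chosen for illustration, and the 100 MB wasted-bytes floor is an assumption added to avoid reacting to a high ratio on a nearly empty instance. Active defragmentation also depends on a jemalloc build.

```python
def defrag_settings(info_memory, ratio_threshold=1.5,
                    min_wasted_bytes=100 * 1024 * 1024):
    """Return CONFIG SET pairs to apply, or an empty dict if fragmentation is not confirmed."""
    ratio = float(info_memory["mem_fragmentation_ratio"])
    wasted = int(info_memory["used_memory_rss"]) - int(info_memory["used_memory"])
    if ratio <= ratio_threshold or wasted < min_wasted_bytes:
        return {}
    return {
        "activedefrag": "yes",
        "active-defrag-ignore-bytes": "100mb",   # skip if little memory is wasted
        "active-defrag-threshold-lower": "10",   # start at 10% fragmentation
        "active-defrag-cycle-min": "1",          # minimum CPU percentage
        "active-defrag-cycle-max": "25",         # cap CPU percentage to limit latency impact
    }

# Applying the result with a redis-py style client (placeholder name `client`):
#   for name, value in defrag_settings(client.info("memory")).items():
#       client.config_set(name, value)
```

Raising the cycle limits speeds up compaction at the cost of more CPU taken from command processing.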
4. Network and OS-level causes. When Redis itself cannot explain the latency, compare redis-cli --latency (measures round-trip, not command execution) with LATENCY HISTORY (measures command execution on the server, and is available once latency-monitor-threshold is set). If round-trip latency is high but command latency is low, the cause is network (NIC queues, TCP Nagle, high interrupt rate) or OS scheduling (VM steal time, NUMA imbalance). Run redis-cli --intrinsic-latency 30 on the server to establish an OS baseline.
Monitoring Redis in Production
A production Redis monitoring stack needs three layers: real-time metrics (Prometheus), structured alerting (Alertmanager rules on the metrics), and a capacity dashboard (Grafana). The redis_exporter by Oliver006 is the production standard — it exposes every INFO section as Prometheus metrics with no configuration beyond a Redis connection string.
Key metrics to track and their healthy ranges at production scale:
- instantaneous_ops_per_sec: baseline varies; track the rate of change, not the absolute value. A sudden 5x spike is more informative than any threshold.
- used_memory_rss: must stay below physical RAM. RSS growth without logical memory growth indicates fragmentation.
- evicted_keys rate: zero is the goal for a cache with a correctly sized maxmemory. Sustained eviction under a stable load means the instance is undersized.
- blocked_clients: clients waiting on BLPOP/BRPOP/BZPOPMIN. A non-zero value is expected for queue consumers; a growing value indicates producer/consumer imbalance.
- rejected_connections: any increase is a severity-1 incident. Redis has refused a client connection because the maxclients limit was hit.
- rdb_last_bgsave_status / aof_last_bgrewrite_status: persistence failures are silent by default; without alerting on these, you can lose your persistence safety net unnoticed.
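In a real stack these checks would live in Alertmanager rules, but the logic can be sketched in a few lines over two consecutive INFO snapshots. The function name is hypothetical, and apart from the severity-1 rating for rejected connections, the severity labels are assumptions. Counters such as rejected_connections and evicted_keys are cumulative, so the sketch looks at the change between snapshots. The INFO field for AOF rewrites is aof_last_bgrewrite_status.

```python
def evaluate_info(prev, curr, physical_ram_bytes):
    """Compare two INFO snapshots (dicts) and return a list of (severity, message)."""
    alerts = []

    def delta(field):
        return int(curr.get(field, 0)) - int(prev.get(field, 0))

    if delta("rejected_connections") > 0:
        alerts.append(("sev1", "connections rejected: maxclients limit hit"))
    if int(curr.get("used_memory_rss", 0)) >= physical_ram_bytes:
        alerts.append(("sev1", "used_memory_rss at or above physical RAM"))
    if delta("evicted_keys") > 0:
        alerts.append(("warn", f"{delta('evicted_keys')} keys evicted since last snapshot"))
    for field in ("rdb_last_bgsave_status", "aof_last_bgrewrite_status"):
        status = curr.get(field, "ok")
        if status != "ok":
            alerts.append(("sev2", f"{field} is {status}"))
    return alerts
```

Running it on snapshots taken a fixed interval apart gives rate-like behaviour without storing history.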
More mature setups go further: per-command statistics from the commandstats section of INFO, hot-key dashboards built from LFU data, and automated runbooks triggered by Alertmanager webhooks that page on-call with a pre-populated Jupyter notebook showing the relevant metrics window.
The operational practices covered in this lesson (eviction policy selection, hot-key mitigation, UNLINK for big keys, slowlog analysis, and a Prometheus/Alertmanager monitoring stack) form the baseline competency expected of any team running Redis under a production SLA. Master these before reaching for more complex solutions like Redis Cluster or external proxy layers; most Redis incidents at scale are preventable with these primitives applied correctly.