Services Deep Dive
In the previous lesson you learned that the Kubernetes network model gives every Pod a unique, routable IP. That IP is ephemeral: when a Pod is replaced — by a rolling update, a crash, or a node eviction — it gets a completely new address. The higher-level construct that gives your workloads a stable identity is the Service. But a Service is far more than a static IP: the mechanism that actually routes packets, the topology of the DNS entry it creates, and the session-affinity guarantees it provides are all choices that determine how your system behaves at production scale.
kube-proxy: The Dataplane Under Every Service
When you create a Service, kube-proxy programs the local network so that traffic sent to the Service's ClusterIP (a virtual IP, or VIP) is load-balanced to one of the ready backing Pods. kube-proxy is a node agent, typically run as a DaemonSet on every node. On Linux it supports three dataplane modes (iptables, IPVS, and the newer nftables mode), and the choice matters enormously at scale. The two most widely deployed, iptables and IPVS, are covered below.
Mode 1: iptables (Default in Most Clusters)
kube-proxy watches the EndpointSlice API (older versions watched Endpoints) and translates each Service into iptables NAT rules. The KUBE-SERVICES chain matches the VIP and jumps to a per-Service KUBE-SVC-* chain, which in turn jumps to one of several per-endpoint KUBE-SEP-* chains that perform the DNAT. A packet destined for the VIP walks these rules, one endpoint rule is selected at random (for example, each of 3 endpoints has a 1-in-3 chance), and the destination is rewritten to the chosen Pod IP before the packet is forwarded.
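The random selection in iptables mode is a cascade rather than a single weighted draw. kube-proxy emits one rule per endpoint using the statistic match in random mode: rule i (counting from 0) fires with probability 1/(n - i) if the earlier rules did not match, and the last rule matches unconditionally. The Python sketch below is a model for illustration, not kube-proxy code. It computes those per-rule probabilities and confirms that each endpoint ends up with an overall share of exactly 1/n.

```python
from fractions import Fraction
from typing import List


def rule_probabilities(n: int) -> List[Fraction]:
    """Per-rule match probability for n endpoints.

    Rule i is only evaluated if rules 0..i-1 did not match, so it fires
    with conditional probability 1/(n - i); the last rule is unconditional (1).
    """
    return [Fraction(1, n - i) for i in range(n)]


def effective_shares(n: int) -> List[Fraction]:
    """Overall probability that a new connection lands on each endpoint."""
    shares = []
    remaining = Fraction(1)  # probability that no earlier rule has matched
    for p in rule_probabilities(n):
        shares.append(remaining * p)
        remaining *= 1 - p
    return shares


if __name__ == "__main__":
    print([str(p) for p in rule_probabilities(3)])  # ['1/3', '1/2', '1']
    print([str(s) for s in effective_shares(3)])    # ['1/3', '1/3', '1/3']
```

Exact fractions are used instead of floats so the equal-share property can be checked without rounding noise; kube-proxy itself writes the probabilities into the rules as decimal values.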
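The critical operational characteristic of iptables mode is that the rules are evaluated as a flat, sequential list. The rule count grows linearly with Services and endpoints: a cluster with 10,000 Services and 50,000 endpoints carries well over a hundred thousand rules, and the first packet of every new connection is matched against the KUBE-SERVICES chain one rule at a time (conntrack handles the remaining packets of an established connection). Added connection latency and high kube-proxy CPU during rule syncs are the most common symptoms. Additionally, kube-proxy has historically rewritten the entire rule set on every endpoint change, which becomes very expensive on high-churn clusters; newer releases mitigate this by syncing only the chains that changed.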
Mode 2: IPVS (Production Standard for Large Clusters)
IPVS (IP Virtual Server) is a kernel-space load balancer purpose-built for this problem. Instead of a flat rule chain, IPVS stores virtual services in a hash table, so Service VIP lookup is O(1) regardless of how many Services exist. IPVS also supports richer load-balancing algorithms, such as round-robin, least-connection, and source-hashing, that kube-proxy's iptables mode does not offer. On any cluster above roughly 500 Services or 5,000 endpoints, IPVS mode is the sound default.
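The scaling difference between a flat chain and a hash table can be modelled in a few lines of Python. This is a toy model, not netfilter or IPVS code: each "rule" simply maps a VIP to a target, and the VIPs are hypothetical addresses under 10.96.0.0/12 (kubeadm's default Service CIDR, assumed here). It shows only how the work per lookup grows with the number of Services.

```python
from typing import Dict, List, Optional, Tuple

Rule = Tuple[str, str]  # (service VIP, target the rule jumps to)


def linear_lookup(rules: List[Rule], vip: str) -> Tuple[Optional[str], int]:
    """Model of a flat iptables-style chain: test rules in order until one matches.

    Returns (target, number_of_rules_evaluated).
    """
    for evaluated, (rule_vip, target) in enumerate(rules, start=1):
        if rule_vip == vip:
            return target, evaluated
    return None, len(rules)


def hashed_lookup(table: Dict[str, str], vip: str) -> Optional[str]:
    """Model of an IPVS-style hash table: one keyed lookup regardless of size."""
    return table.get(vip)


if __name__ == "__main__":
    n = 10_000
    rules = [(f"10.96.{i // 256}.{i % 256}", f"svc-{i}") for i in range(n)]
    table = dict(rules)
    worst_vip = rules[-1][0]
    print(linear_lookup(rules, worst_vip))  # ('svc-9999', 10000)
    print(hashed_lookup(table, worst_vip))  # svc-9999
```

The worst case for the linear model is a VIP near the end of the chain (or one that matches nothing), which costs one comparison per Service; the hashed lookup costs the same whether the table holds ten entries or ten thousand.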
Headless Services: Removing the VIP
A standard Service provides a single stable VIP and does the load-balancing inside the kernel. But this is the wrong model for a stateful workload: a Kafka client that must connect to the broker leading partition 2, a Redis Sentinel client that needs to discover the primary, or a StatefulSet where every Pod has its own identity. These clients need the actual Pod IPs, not an opaque VIP.
Setting clusterIP: None creates a headless Service. kube-proxy programs no rules for it; instead, the cluster DNS answers a lookup of the Service name with one A (or AAAA) record per ready endpoint. The client receives all Pod IPs and is responsible for selecting one, which enables client-side load-balancing, sticky connections, or topology-aware routing that the kernel dataplane cannot express.
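Client-side selection against a headless Service can be sketched with the Python standard library. The Service name my-db.default.svc.cluster.local and port 5432 are hypothetical, and cluster.local is only the default cluster domain. The selection policy shown (a stable hash of a caller-supplied key) is one illustrative choice, not something Kubernetes prescribes.

```python
import hashlib
import socket
from typing import List


def resolve_backends(name: str, port: int) -> List[str]:
    """Return every address the resolver gives for `name`, sorted and de-duplicated.

    For a headless Service, cluster DNS answers with one record per ready Pod.
    """
    infos = socket.getaddrinfo(name, port, type=socket.SOCK_STREAM)
    return sorted({info[4][0] for info in infos})


def pick_backend(ips: List[str], key: str) -> str:
    """Map `key` (e.g. a user or partition id) to one Pod IP.

    Sorting first and hashing the key keeps the same key on the same Pod
    for as long as the set of IPs is unchanged.
    """
    if not ips:
        raise ValueError("no backends resolved")
    ordered = sorted(ips)
    digest = hashlib.sha256(key.encode()).digest()
    return ordered[int.from_bytes(digest[:8], "big") % len(ordered)]


if __name__ == "__main__":
    name = "my-db.default.svc.cluster.local"  # hypothetical headless Service
    try:
        ips = resolve_backends(name, 5432)
        print(ips, "->", pick_backend(ips, "user-42"))
    except socket.gaierror:
        print(f"{name} does not resolve outside a cluster")
```

Plain modulo hashing reshuffles most keys whenever a Pod is added or removed; if that matters, consistent or rendezvous hashing is the usual refinement.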
Session Affinity: Pinning Clients to a Backend
By default, every new TCP connection from a client is independently load-balanced — there is no guarantee that a client will hit the same backend Pod twice. For stateless services this is desirable; for services that store per-session data in memory (shopping carts, WebSocket upgrade handshakes, ML inference servers that load a model per session), routing every request from the same client to the same Pod is critical.
Kubernetes Services support sessionAffinity: ClientIP, which makes kube-proxy remember which backend each client IP was sent to (using the iptables recent module in iptables mode, or IPVS persistence in IPVS mode) and reuse it. Any connection from the same source IP is routed to the same backend for the duration of the sessionAffinityConfig.clientIP.timeoutSeconds window (default 10800 seconds, i.e. 3 hours).
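The stickiness rule, and why shared source IPs are dangerous, is easy to see in a small Python model. The class name ClientIPAffinity and its method are hypothetical; real clusters implement this inside the dataplane, not in application code. The model assumes the timer is refreshed on every new connection from the client (a sliding window) and picks a random backend when an entry is new or expired.

```python
import random
from typing import Dict, List, Optional, Tuple


class ClientIPAffinity:
    """Toy model of sessionAffinity: ClientIP (not kube-proxy's implementation).

    Remembers which backend each source IP was sent to and reuses it while the
    entry is no older than timeout_seconds. Assumption: the timestamp is
    refreshed on every new connection.
    """

    def __init__(self, backends: List[str], timeout_seconds: int = 10800,
                 seed: Optional[int] = None) -> None:
        self.backends = list(backends)
        self.timeout = timeout_seconds
        self.rng = random.Random(seed)
        # client IP -> (backend, time of last connection)
        self.table: Dict[str, Tuple[str, float]] = {}

    def route(self, client_ip: str, now: float) -> str:
        entry = self.table.get(client_ip)
        if entry is not None and now - entry[1] <= self.timeout:
            backend = entry[0]  # sticky: reuse the remembered backend
        else:
            backend = self.rng.choice(self.backends)  # new or expired: pick afresh
        self.table[client_ip] = (backend, now)
        return backend


if __name__ == "__main__":
    svc = ClientIPAffinity(["pod-a", "pod-b", "pod-c"], seed=1)
    # 1,000 users behind one NAT gateway all present the same source IP.
    nat_ip = "203.0.113.7"
    hits = {svc.route(nat_ip, now=t) for t in range(1000)}
    print(hits)  # a single backend receives all of their traffic
```

The final lines show the hotspot: a thousand users sharing one address are indistinguishable to the dataplane, so one Pod absorbs all of their load while the others sit idle.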
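Beware the NAT-hotspot failure mode: every client behind a shared NAT gateway or corporate proxy presents the same source IP, so ClientIP affinity pins all of them to a single Pod. If you need per-user stickiness, use cookie-based affinity at the Ingress layer (for example, the ingress-nginx annotation nginx.ingress.kubernetes.io/affinity: cookie) or design the backend to be truly stateless and externalise session state to Redis.
EndpointSlices: Scalability for Large Backends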
Before EndpointSlices (beta in Kubernetes 1.17, GA in 1.21), a Service had a single Endpoints object containing all Pod IPs. On a Deployment with 1,000 replicas, every endpoint change (a single Pod restart) caused the entire 1,000-entry object to be rewritten, re-sent to every node, and re-processed by kube-proxy: O(N) work for an O(1) change. EndpointSlices shard the endpoint list into chunks of at most 100 entries by default. Each chunk is independent, so a Pod restart updates only one slice. This change reduced kube-proxy CPU by 90% in large-scale benchmarks at Google and Datadog. EndpointSlices are now the default and should never be disabled.
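The effect of sharding can be sketched with a simplified Python model. It assumes a replaced Pod's new IP simply takes its predecessor's slot in the same slice; the real EndpointSlice controller also packs and rebalances slices, and the 100-entry cap is configurable through kube-controller-manager's --max-endpoints-per-slice flag. The function names and the 10.244.x.y Pod IPs are hypothetical.

```python
from typing import List

MAX_ENDPOINTS_PER_SLICE = 100  # default cap per EndpointSlice


def build_slices(endpoints: List[str],
                 max_per_slice: int = MAX_ENDPOINTS_PER_SLICE) -> List[List[str]]:
    """Split the endpoint list into chunks of at most max_per_slice entries."""
    return [endpoints[i:i + max_per_slice]
            for i in range(0, len(endpoints), max_per_slice)]


def replace_endpoint(slices: List[List[str]], old_ip: str, new_ip: str) -> int:
    """Swap a replaced Pod's IP in place.

    Returns how many entries the rewritten object holds, i.e. how much data
    must be re-sent to every watching node for this one change.
    """
    for chunk in slices:
        if old_ip in chunk:
            chunk[chunk.index(old_ip)] = new_ip
            return len(chunk)
    raise KeyError(old_ip)


if __name__ == "__main__":
    ips = [f"10.244.{i // 256}.{i % 256}" for i in range(1000)]
    monolithic = build_slices(ips, max_per_slice=len(ips))  # one Endpoints-style object
    sliced = build_slices(ips)                               # ten EndpointSlices
    print(replace_endpoint(monolithic, ips[500], "10.244.9.9"))  # 1000
    print(replace_endpoint(sliced, ips[500], "10.244.9.9"))      # 100
```

The same single-Pod change rewrites 1,000 entries in the monolithic model but only 100 with slices, which is the O(N) versus bounded-cost difference described above.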
Setting the annotation service.kubernetes.io/topology-mode: Auto on a Service enables Topology Aware Routing (topology-aware hints). The EndpointSlice controller adds zone hints to endpoints, and kube-proxy then prefers endpoints in the same availability zone as the client, reducing cross-zone data transfer costs (which are real and measurable on AWS/GCP at scale) and lowering latency. Enable it on high-throughput Services once traffic is spread fairly evenly across zones and each zone has enough endpoints to absorb its share.
Putting It Together: Choosing the Right Service Shape
As a rule of thumb: use a standard ClusterIP Service for stateless workloads. Use a headless Service for StatefulSets, peer-discovery protocols (Elasticsearch, Cassandra, Kafka), and any case where the client needs to address individual Pods. Enable sessionAffinity: ClientIP sparingly, only when state genuinely lives in the Pod and cannot be moved to a shared store, and be aware of the NAT-hotspot failure mode. Run IPVS mode on any cluster with more than a few hundred Services. Made correctly, these choices are invisible to users; made incorrectly, they produce some of the hardest production bugs to diagnose.