API Gateways & Reverse Proxies

Most large-scale systems place a layer of infrastructure in front of their backend services, so every request passes through it before reaching an application server. That layer has two related but distinct incarnations: the reverse proxy and the API gateway. Understanding what each one does, why it exists, and where the boundary between them blurs is essential for designing real production architectures.

What Is a Reverse Proxy?

A reverse proxy is a server that sits between external clients and one or more backend servers. Clients talk to the proxy, the proxy forwards requests to a backend, and the backend's response comes back through the proxy. From the client's perspective there is only one server — the proxy hides everything behind it.

The word "reverse" distinguishes it from a forward proxy, which sits in front of clients (like a corporate network proxy that all outbound traffic passes through). A reverse proxy sits in front of servers.

Core responsibilities of a reverse proxy:

  • TLS termination — decrypt HTTPS once at the edge; internal traffic can stay plain HTTP on a private network, saving CPU on every backend server.
  • Load balancing — distribute incoming requests across a pool of backend instances using round-robin, least-connections, IP-hash, or weighted strategies.
  • Caching — cache static assets or API responses so repeat requests never hit the backend at all.
  • Compression — gzip or Brotli responses in one place instead of every backend.
  • Connection pooling — maintain persistent upstream connections, reducing TCP handshake overhead for each backend call.
  • DDoS / rate limiting — drop or throttle abusive traffic before it reaches your application code.
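
The load-balancing strategies in the list above can be sketched in a few lines each. This is a minimal illustration, not how Nginx or HAProxy implement them; the class names and the `pick`/`release` methods are hypothetical, and backends are represented as plain strings.

```python
import hashlib
import itertools


class RoundRobin:
    """Hand out backends in a fixed rotating order."""

    def __init__(self, backends):
        self._cycle = itertools.cycle(list(backends))

    def pick(self, client_ip=None):
        return next(self._cycle)


class LeastConnections:
    """Pick the backend with the fewest in-flight requests."""

    def __init__(self, backends):
        self.active = {b: 0 for b in backends}

    def pick(self, client_ip=None):
        # Ties resolve to the backend that was registered first.
        backend = min(self.active, key=self.active.get)
        self.active[backend] += 1
        return backend

    def release(self, backend):
        # Call when the proxied request has finished.
        self.active[backend] -= 1


class IPHash:
    """Map the same client IP to the same backend while the pool is unchanged."""

    def __init__(self, backends):
        self.backends = list(backends)

    def pick(self, client_ip):
        digest = hashlib.sha256(client_ip.encode()).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.backends)
        return self.backends[index]
```

Round-robin needs no state about the backends, least-connections needs the proxy to track request completion, and IP-hash trades even distribution for client stickiness.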

Popular reverse proxies: Nginx, HAProxy, Envoy, Caddy. Cloud equivalents: AWS ALB (Application Load Balancer), GCP Cloud Load Balancing, Cloudflare.

[Figure: Reverse proxy in front of a backend service pool. Clients connect over HTTPS (:443) to the reverse proxy (TLS termination, load balancing, caching and compression), which forwards plain HTTP to App Server Instances 1, 2, and 3 on a private network.]
A reverse proxy terminates TLS at the edge and distributes plain-HTTP traffic across backend instances on a private network.

What Is an API Gateway?

An API gateway is a specialised reverse proxy built specifically for managing API traffic. It does everything a reverse proxy does, plus a rich set of API-aware concerns:

  • Authentication & authorization — validate JWTs, API keys, or OAuth tokens before the request reaches any service. Services trust that the gateway has already verified the caller.
  • Rate limiting & quotas — enforce per-client or per-endpoint limits (e.g. 1,000 req/min per API key). Prevents a single noisy consumer from starving others.
  • Request / response transformation — translate between protocol versions, strip internal headers, inject correlation IDs, rewrite paths.
  • Routing — route /api/v1/users to the User Service and /api/v1/orders to the Order Service based on path, headers, or query parameters.
  • Request aggregation (BFF pattern) — fan out one client request to several upstream services and merge the results into a single response, reducing mobile round trips.
  • Observability — centralised access logs, latency metrics, and distributed trace headers injected on every request.
  • Circuit breaking & retries — automatically stop forwarding to a degraded upstream and return a fallback; retry transient failures with backoff.
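
Of the concerns listed above, circuit breaking is the least obvious from a one-line description, so here is a minimal sketch of the idea. The `CircuitBreaker` class, its thresholds, and the injectable `clock` parameter are assumptions made for illustration; production gateways typically add per-route state, sliding error-rate windows, and limits on concurrent trial requests.

```python
import time


class CircuitBreaker:
    """Stop calling a failing upstream for a cool-down period."""

    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, upstream, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                return fallback()  # open: fail fast without touching the upstream
            # Cool-down elapsed: fall through and allow one trial call (half-open).
        try:
            result = upstream()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            return fallback()
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Passing the clock in as a parameter keeps the breaker testable without real waiting; the fallback could return a cached response or a static error body.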

Popular API gateways: Kong (open-source, Lua/Go plugins), AWS API Gateway, Azure API Management, Apigee, Traefik, Envoy + Istio (service-mesh approach).

Key distinction: A reverse proxy is infrastructure — it makes traffic routing reliable and efficient. An API gateway is a product boundary — it makes your API a managed, governed surface with identity, metering, and visibility.

Architecture: Gateway in Front of Microservices

The most important architectural role of an API gateway is acting as the single entry point into a microservices cluster. Without it, every client would need to know the address and port of every service, and cross-cutting concerns (auth, logging, rate limits) would have to be reimplemented in every service. The gateway centralises all of that.

[Figure: API Gateway as single entry point to microservices. A web app, a mobile app, and third-party clients call the API Gateway (auth/JWT, rate limiting, routing, logging, transformation, circuit breaking), which routes /users, /orders, and /payments to the User, Order, and Payment Services. Internal services such as Inventory and Notification talk to each other directly over gRPC without the gateway. The gateway is for external clients only, or an internal "east-west" gateway is used for multi-team traffic.]
The API Gateway is the single entry point for all external clients. Internal service-to-service calls typically bypass it and communicate directly.
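
To make the single-entry-point idea concrete, the sketch below strings together three of the gateway's jobs: API-key authentication, header transformation, and path-based routing. Everything named here is hypothetical (the `ROUTES` table, the `demo-key` key store, the `x-internal-debug` header, and the `handle` function), and header names are assumed to be already lower-cased.

```python
import uuid

# Hypothetical routing table: path prefix -> upstream service name.
ROUTES = {
    "/api/v1/users": "user-service",
    "/api/v1/orders": "order-service",
    "/api/v1/payments": "payment-service",
}
API_KEYS = {"demo-key": "web-app"}     # hypothetical key store: key -> client id
STRIPPED_HEADERS = {"x-internal-debug", "x-api-key"}


def handle(path, headers):
    """Return (status, upstream, forwarded_headers) for one inbound request."""
    # 1. Authenticate before anything reaches a service.
    client = API_KEYS.get(headers.get("x-api-key", ""))
    if client is None:
        return 401, None, {}

    # 2. Route on the longest matching path prefix.
    candidates = [p for p in ROUTES if path == p or path.startswith(p + "/")]
    if not candidates:
        return 404, None, {}
    upstream = ROUTES[max(candidates, key=len)]

    # 3. Transform: drop headers services should not see, add identity and a correlation ID.
    forwarded = {k: v for k, v in headers.items() if k not in STRIPPED_HEADERS}
    forwarded["x-client-id"] = client
    forwarded.setdefault("x-correlation-id", str(uuid.uuid4()))
    return 200, upstream, forwarded
```

For example, `handle("/api/v1/orders/42", {"x-api-key": "demo-key"})` resolves to `order-service`, and the forwarded headers carry the client identity instead of the raw key, so the service never has to validate credentials itself.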

TLS Termination in Depth

One of the most universally valuable reverse proxy features is TLS termination. Here is what it saves you:

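  • Backend servers no longer each need their own TLS certificate. Certificates are managed in one place.
  • The TLS handshake cost is paid at the proxy tier instead of on every service instance. The asymmetric cryptography alone (for example an RSA-2048 signature) costs roughly a millisecond of CPU on modern hardware, and a full handshake including its network round trips can add tens of milliseconds of latency to a new connection.
  • Internal traffic stays on a trusted private network, so encryption overhead is avoided where the threat model does not require it. (For regulated industries or zero-trust environments, the proxy re-encrypts traffic to the backends. This is called TLS re-encryption or TLS bridging and is often combined with mTLS. TLS passthrough is the opposite arrangement: the proxy forwards the encrypted stream without terminating it.)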
  • Certificate rotation happens at the proxy tier without touching application deployments.
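
The difference between terminating TLS at the edge and re-encrypting toward the backends shows up in how the two TLS contexts are configured. The sketch below uses Python's standard `ssl` module purely as an illustration; the function names and the optional file-path parameters are assumptions, and a real proxy such as Nginx or Envoy expresses the same choices in its own configuration.

```python
import ssl


def build_edge_context(cert_file=None, key_file=None):
    """Server-side context: the proxy presents the public certificate to clients."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if cert_file:
        # The certificate and key live only on the proxy tier, so rotation happens here.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx


def build_upstream_context(ca_file=None):
    """Client-side context for re-encrypting traffic from the proxy to a backend."""
    # create_default_context enables certificate and hostname verification.
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH, cafile=ca_file)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx
```

With plain termination only the edge context is needed; the upstream context comes into play when the threat model requires encryption on the internal hop as well.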

Rate Limiting Strategies

API gateways implement rate limiting to protect services and enforce fair use. Common algorithms:

  • Token bucket — a bucket with capacity N refills at R tokens/second. Each request costs one token. Allows controlled bursts up to N. AWS API Gateway documents its throttling as a token bucket, and many other providers use the same model.
  • Leaky bucket — requests enter a fixed-size queue that drains at a constant rate; requests that arrive while the queue is full are dropped. Produces very smooth output, but bursts are delayed in the queue rather than passed through.
  • Fixed window counter — count requests per time window (e.g. 1,000/minute). Simple, but a burst at the boundary can effectively allow 2× the limit.
  • Sliding window log — precise, stores a timestamp log per client. Accurate but memory-heavy at scale.

Design tip: Return HTTP 429 Too Many Requests with a Retry-After header so well-behaved clients can back off automatically. Expose quota headers (X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset) so developers can track their usage proactively.
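
A token bucket that also produces the response headers from the design tip might look like the sketch below. The `TokenBucket` class and its `check` method are hypothetical, the clock is injectable so the behaviour can be tested without sleeping, and X-RateLimit-Reset is left out for brevity. A real gateway would keep one bucket per client or API key, usually in a shared store.

```python
import math
import time


class TokenBucket:
    """Capacity N, refilled at R tokens per second; each request costs one token."""

    def __init__(self, capacity, refill_rate, clock=time.monotonic):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.clock = clock
        self.tokens = float(capacity)  # start full so an initial burst is allowed
        self.updated = clock()

    def _refill(self):
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.updated = now

    def check(self):
        """Return (status, headers) for one incoming request."""
        self._refill()
        headers = {"X-RateLimit-Limit": str(self.capacity)}
        if self.tokens >= 1:
            self.tokens -= 1
            headers["X-RateLimit-Remaining"] = str(int(self.tokens))
            return 200, headers
        # Seconds until one whole token is available, rounded up.
        wait = math.ceil((1 - self.tokens) / self.refill_rate)
        headers["X-RateLimit-Remaining"] = "0"
        headers["Retry-After"] = str(wait)
        return 429, headers
```

With `TokenBucket(capacity=2, refill_rate=1.0)`, two back-to-back requests succeed, a third at the same instant is rejected with `Retry-After: 1`, and a request one second later succeeds again.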

The BFF Pattern (Backend for Frontend)

A Backend for Frontend is a specialised API gateway variant where you create a separate gateway instance per client type — one for the mobile app, one for the web app, one for third-party partners. Each BFF aggregates and shapes the data specifically for that consumer's needs.

Example: a mobile BFF might call the User Service, Order Service, and Recommendation Service in parallel and return a single merged JSON object, saving the mobile client three round trips and reducing the payload to only the fields the app actually renders. The web BFF can afford richer responses because it is on a faster connection with more bandwidth.
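
The mobile example above can be sketched with `asyncio.gather`, which runs the three upstream calls concurrently and returns their results in order. The `fetch_*` functions are stand-ins that simulate network calls with a short sleep, and the field names are invented for illustration.

```python
import asyncio


async def fetch_user(user_id):
    await asyncio.sleep(0.01)  # stand-in for a call to the User Service
    return {"id": user_id, "name": "Ada", "email": "ada@example.com"}


async def fetch_orders(user_id):
    await asyncio.sleep(0.01)  # stand-in for a call to the Order Service
    return [{"order_id": 1, "total": 25.0, "internal_sku": "X1"}]


async def fetch_recommendations(user_id):
    await asyncio.sleep(0.01)  # stand-in for a call to the Recommendation Service
    return [{"title": "Item A", "score": 0.9}]


async def mobile_home(user_id):
    """Fan out to three services in parallel and return only what the app renders."""
    user, orders, recs = await asyncio.gather(
        fetch_user(user_id),
        fetch_orders(user_id),
        fetch_recommendations(user_id),
    )
    return {
        "name": user["name"],
        "orders": [{"id": o["order_id"], "total": o["total"]} for o in orders],
        "recommended": [r["title"] for r in recs],
    }


if __name__ == "__main__":
    print(asyncio.run(mobile_home(42)))
```

Because the calls overlap, the total latency is close to that of the slowest upstream rather than the sum of all three, and fields such as `email` and `internal_sku` never leave the BFF.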

Netflix and SoundCloud popularised this pattern when they discovered that a single "generic" API forced every client to make many round trips and parse far more data than it needed.

Trade-offs and Failure Modes

The gateway is a powerful abstraction, but it comes with real costs:

  • Single point of failure — if the gateway goes down, every client loses access to every service. Mitigate with multiple gateway instances behind a load balancer and aggressive health checks.
  • Added latency — every request passes through one extra hop. A well-tuned gateway adds ~1–5 ms. A misconfigured one doing synchronous DB lookups for auth on every request can add 50+ ms.
  • Operational complexity — the gateway accumulates cross-cutting logic over time and can become a bottleneck for teams ("who owns the gateway config?"). Adopt a declarative, code-reviewed configuration process early.
  • Fan-out amplification — a single client request triggers multiple upstream calls. If one upstream is slow (tail latency), the whole aggregated response is held up. Use hedged requests or timeouts-per-upstream to bound this.

Common pitfall: Putting business logic in the API gateway. The gateway should route, authenticate, rate-limit, and transform, but it should not make domain decisions ("if the user's subscription tier is Gold, call this service instead"). Business logic belongs in your services. A gateway stuffed with logic becomes a distributed monolith in disguise.
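
One way to bound the fan-out problem described above is a per-upstream timeout with a fallback value, so a slow dependency degrades one part of the response instead of holding up all of it. The sketch below uses `asyncio.wait_for`; the upstream functions, the 0.2-second budget, and the fallback values are assumptions chosen for illustration.

```python
import asyncio


async def call_with_timeout(coro, timeout, fallback):
    """Await one upstream call, returning `fallback` if it exceeds its budget."""
    try:
        return await asyncio.wait_for(coro, timeout)
    except asyncio.TimeoutError:
        return fallback


async def fast_upstream():
    await asyncio.sleep(0.01)  # simulated healthy service
    return {"name": "Ada"}


async def slow_upstream():
    await asyncio.sleep(1.0)   # simulated tail-latency outlier
    return ["Item A"]


async def aggregate():
    user, recs = await asyncio.gather(
        call_with_timeout(fast_upstream(), 0.2, fallback=None),
        call_with_timeout(slow_upstream(), 0.2, fallback=[]),
    )
    # The slow upstream is cut off at 0.2 s, so recommendations degrade to an empty list.
    return {"user": user, "recommended": recs}
```

Which parts of a response may degrade is a product decision; a missing recommendations list is usually acceptable where a missing user profile would not be.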

Real-World Numbers

To calibrate your mental model:

  • Nginx on a single 4-core VM can reverse-proxy ~50,000 req/sec with <5 ms overhead at the 95th percentile.
  • Kong (built on Nginx + Lua) adds ~1–3 ms per plugin (auth, rate-limit) per request. At 10 plugins that is 10–30 ms — plan your plugin count accordingly.
  • AWS API Gateway has a default account limit of 10,000 req/sec and charges per million requests (~$3.50). For very high throughput, migrating to ALB + a self-managed gateway is often cheaper.
  • Twitter runs a custom API gateway layer (internally called "TFE — Twitter Front End") that handles auth, routing, and observability for ~300,000 requests per second at peak.
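
The plugin and pricing figures above are easier to reason about once multiplied out. The helper below is back-of-the-envelope arithmetic only: it assumes the flat ~$3.50 per million requests quoted above, a 30-day month, and constant traffic, and it ignores tiered discounts and data-transfer charges. The function names are hypothetical.

```python
PRICE_PER_MILLION_USD = 3.50  # flat rate taken from the figure above (assumption)
SECONDS_PER_DAY = 86_400


def monthly_cost(requests_per_second, days=30):
    """Request charges in USD for sustained traffic at a flat per-million price."""
    requests = requests_per_second * SECONDS_PER_DAY * days
    return requests / 1_000_000 * PRICE_PER_MILLION_USD


def plugin_overhead_ms(plugins, per_plugin_ms=(1, 3)):
    """Low and high latency estimates when each plugin adds 1 to 3 ms."""
    low, high = per_plugin_ms
    return plugins * low, plugins * high


if __name__ == "__main__":
    print(monthly_cost(100))        # about 907 USD per month at 100 req/sec
    print(monthly_cost(10_000))     # 90720.0 USD per month at the default account limit
    print(plugin_overhead_ms(10))   # (10, 30) ms for ten plugins
```

At a sustained 10,000 req/sec the request charges alone come to roughly $90,000 per month under these assumptions, which is why high-throughput systems often compare that against a self-managed gateway.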

Putting It Together

A production-ready entry point for a microservices system typically looks like:

  1. CDN / edge layer (Cloudflare, Fastly) — caches static assets, terminates TLS closest to the user, absorbs volumetric DDoS.
  2. API Gateway (Kong, AWS API GW, custom Envoy) — auth, rate limiting, routing, observability, retries.
  3. Load balancer — if the gateway itself is clustered, a layer-4 LB sits in front of it in the request path (between layers 1 and 2) and distributes connections among gateway instances.
  4. Upstream services — each service handles only its own domain logic, trusting the gateway to have already verified identity and enforced quotas.

The reverse proxy and API gateway are not optional conveniences: they are the control point for your API surface. Getting them right means your services stay simple, your security posture is consistent, and your observability is centralised from day one.