Networking Best Practices

Knowing how to open a socket or fire an HTTP request is only half the job. Professional networking code must also survive the realities of distributed systems: slow servers, transient failures, exhausted connection pools, and man-in-the-middle attacks. This lesson distills the four pillars that separate production-grade network code from tutorial code: timeouts, retries with back-off, connection reuse, and TLS fundamentals.

1. Timeouts — Always Set Them

Every network call can block forever unless you explicitly bound it. Java's HttpClient exposes two independent timeout types:

Connect timeout: how long to wait for the connection to the server to be established. Set on the HttpClient itself.
Request timeout: how long send() may wait for the response to a given request. Set per HttpRequest.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

HttpClient client = HttpClient.newBuilder()
        .connectTimeout(Duration.ofSeconds(5))   // TCP handshake limit
        .build();

HttpRequest request = HttpRequest.newBuilder()
        .uri(URI.create("https://api.example.com/data"))
        .timeout(Duration.ofSeconds(10))          // full request limit
        .GET()
        .build();

HttpResponse<String> response =
        client.send(request, HttpResponse.BodyHandlers.ofString());

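If the connect timeout fires you get an HttpConnectTimeoutException; if the request timeout fires you get an HttpTimeoutException. Because the former is a subclass of the latter, catch it first if you want to tell the two apart. A ConnectException, by contrast, usually means the connection was refused rather than timed out. Treat a timeout as a recoverable signal, not a fatal bug.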
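
As a rough illustration of telling these failures apart, the sketch below maps an exception thrown by send() to a coarse category. The class name TimeoutClassifier and the category strings are invented for this example; only the exception types come from the JDK, where HttpConnectTimeoutException extends HttpTimeoutException.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.net.http.HttpConnectTimeoutException;
import java.net.http.HttpTimeoutException;

// Hypothetical helper: maps a failure from HttpClient.send() to a coarse category.
class TimeoutClassifier {

    static String classify(IOException e) {
        // Most specific type first: HttpConnectTimeoutException is a subclass
        // of HttpTimeoutException, so the order of these checks matters.
        if (e instanceof HttpConnectTimeoutException) {
            return "connect-timeout";
        }
        if (e instanceof HttpTimeoutException) {
            return "request-timeout";
        }
        if (e instanceof ConnectException) {
            return "connect-failed";      // e.g. connection refused
        }
        return "other-io-error";
    }

    public static void main(String[] args) {
        System.out.println(classify(new HttpConnectTimeoutException("connect timed out")));
        System.out.println(classify(new HttpTimeoutException("request timed out")));
        System.out.println(classify(new ConnectException("Connection refused")));
    }
}
```

The same ordering applies to catch blocks: a catch for HttpTimeoutException placed first would also swallow connect timeouts.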

Never omit timeouts in production code. Without them a single slow upstream server can exhaust your thread pool as threads pile up waiting indefinitely. This is one of the most common causes of cascading failure in microservice architectures.

2. Retries with Exponential Back-off

Transient failures — a momentary DNS hiccup, a 503 Service Unavailable, a dropped TCP connection — are normal in distributed systems. The correct response is a bounded retry loop with exponential back-off and jitter.

Why exponential back-off? If a server is overloaded and all clients retry simultaneously at a fixed interval, they create a thundering herd that makes the situation worse. Doubling the wait time with each attempt spreads the load out. Adding random jitter prevents synchronised retry storms even when clients started at the same time.

import java.io.IOException;
import java.net.ConnectException;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.net.http.HttpTimeoutException;
import java.util.Set;
import java.util.concurrent.ThreadLocalRandom;

public static HttpResponse<String> sendWithRetry(
        HttpClient client, HttpRequest request, int maxAttempts)
        throws IOException, InterruptedException {

    Set<Integer> retryableStatuses = Set.of(429, 500, 502, 503, 504);
    long delayMs = 200;  // initial back-off
    int lastStatus = -1; // last retryable status seen, for the final error message

    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
        try {
            HttpResponse<String> resp =
                    client.send(request, HttpResponse.BodyHandlers.ofString());

            if (!retryableStatuses.contains(resp.statusCode())) {
                return resp;             // success or non-retryable error
            }

            lastStatus = resp.statusCode();
            System.err.printf("Attempt %d: retryable status %d%n", attempt, lastStatus);

        } catch (HttpTimeoutException | ConnectException e) {
            if (attempt == maxAttempts) throw e;   // out of attempts: propagate
            System.err.printf("Attempt %d: %s%n",
                    attempt, e.getClass().getSimpleName());
        }

        if (attempt == maxAttempts) break;   // do not sleep after the final attempt

        // jittered exponential back-off before the next attempt
        long jitter = (long) (ThreadLocalRandom.current().nextDouble() * delayMs * 0.3);
        Thread.sleep(delayMs + jitter);
        delayMs = Math.min(delayMs * 2, 8_000);  // cap at 8 s
    }
    throw new IOException("All " + maxAttempts
            + " attempts failed; last status " + lastStatus);
}
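
To make the delay arithmetic concrete, here is a small sketch that reproduces the schedule of a loop that starts at 200 ms, doubles after each failure, and caps at 8 s. The class and method names are hypothetical. Before jitter, the waits come out as 200, 400, 800, 1600, 3200, 6400, and then 8000 ms for every later retry; jitter adds up to 30% on top of each.

```java
// Hypothetical helper that reproduces the delay arithmetic of a capped exponential back-off.
class BackoffSchedule {

    /** Delay before retry number `retry` (counting from 1), without jitter. */
    static long baseDelayMs(int retry, long initialMs, long capMs) {
        long delay = initialMs;
        for (int i = 1; i < retry; i++) {
            delay = Math.min(delay * 2, capMs);   // double, but never exceed the cap
        }
        return delay;
    }

    /** Adds jitter of up to jitterFraction of the base delay; `random` is expected in [0, 1). */
    static long jitteredDelayMs(long baseMs, double jitterFraction, double random) {
        return baseMs + (long) (random * baseMs * jitterFraction);
    }

    public static void main(String[] args) {
        for (int retry = 1; retry <= 8; retry++) {
            long base = baseDelayMs(retry, 200, 8_000);
            long actual = jitteredDelayMs(base, 0.3, Math.random());
            System.out.printf("retry %d: base %d ms, with jitter %d ms%n",
                    retry, base, actual);
        }
    }
}
```

Passing the random value in as a parameter is a design choice for this sketch: it keeps the arithmetic deterministic and easy to unit-test.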

Only retry idempotent operations. A GET or DELETE is safe to repeat. A POST that creates a resource is not — replaying it may duplicate data. For non-idempotent calls, investigate idempotency keys (send a unique Idempotency-Key header the server can use to deduplicate).
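
A minimal sketch of the idempotency-key idea follows, assuming the server recognises an Idempotency-Key header and deduplicates on it. The endpoint URL, the JSON body, and the class name are placeholders, and whether a given API supports such keys is something to confirm in its documentation.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.UUID;

// Hypothetical helper: builds a POST that can be retried safely, provided the
// server deduplicates on the Idempotency-Key header (an assumption, not a given).
class IdempotentPost {

    static HttpRequest buildPost(String url, String jsonBody, String idempotencyKey) {
        return HttpRequest.newBuilder()
                .uri(URI.create(url))
                .header("Content-Type", "application/json")
                .header("Idempotency-Key", idempotencyKey)
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }

    public static void main(String[] args) {
        // Generate the key once per logical operation and reuse it on every retry.
        String key = UUID.randomUUID().toString();
        HttpRequest request = buildPost(
                "https://api.example.com/orders",   // placeholder endpoint
                "{\"item\":\"book\"}",
                key);
        System.out.println(request.method() + " " + request.uri());
        System.out.println("Idempotency-Key: "
                + request.headers().firstValue("Idempotency-Key").orElse("<missing>"));
    }
}
```

The key must stay the same across retries of one operation. Generating a fresh key per attempt would defeat the deduplication.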

Also honour the Retry-After response header when it is present. Servers that return 429 Too Many Requests or 503 Service Unavailable often tell you how long to wait, either as a number of seconds or as an HTTP date. Parse that header and wait at least that long instead of using your own back-off value.
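
One possible way to parse Retry-After is sketched below. It assumes the two standard forms of the header, a whole number of seconds or an HTTP date, and treats anything else as absent. The class name and the choice to pass the current time in as a parameter are decisions made for this example.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.ZonedDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Optional;

// Hypothetical helper: turns a Retry-After header value into a wait duration.
class RetryAfter {

    static Optional<Duration> parse(String headerValue, Instant now) {
        if (headerValue == null || headerValue.isBlank()) {
            return Optional.empty();
        }
        String value = headerValue.trim();

        // Form 1: delay in seconds, e.g. "120"
        try {
            long seconds = Long.parseLong(value);
            return Optional.of(Duration.ofSeconds(Math.max(0, seconds)));
        } catch (NumberFormatException notSeconds) {
            // not a number, try the date form next
        }

        // Form 2: HTTP date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
        try {
            Instant when = ZonedDateTime
                    .parse(value, DateTimeFormatter.RFC_1123_DATE_TIME)
                    .toInstant();
            Duration wait = Duration.between(now, when);
            return Optional.of(wait.isNegative() ? Duration.ZERO : wait);
        } catch (DateTimeParseException unparseable) {
            return Optional.empty();   // caller falls back to its own back-off
        }
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2015-10-21T07:27:00Z");
        System.out.println(parse("120", now));
        System.out.println(parse("Wed, 21 Oct 2015 07:28:00 GMT", now));
        System.out.println(parse("soon", now));
    }
}
```

In a retry loop, the header value could be read with resp.headers().firstValue("Retry-After") and the parsed duration used in place of the computed delay, ideally still bounded by a maximum so a misbehaving server cannot stall the caller indefinitely.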

3. Connection Reuse — The HttpClient Connection Pool

Opening a TCP connection is expensive: it requires a three-way handshake, and if TLS is involved, an additional 1–2 round trips for the handshake. Creating a new HttpClient per request throws that investment away every time.

Java's HttpClient maintains an internal connection pool automatically when you reuse the same instance. The pool supports HTTP/1.1 keep-alive and HTTP/2 multiplexing (many requests over a single connection).

// Correct — create ONCE, reuse everywhere
public class ApiService {

    private static final HttpClient HTTP = HttpClient.newBuilder()
            .connectTimeout(Duration.ofSeconds(5))
            .version(HttpClient.Version.HTTP_2)   // prefer HTTP/2 multiplexing
            .build();

    public String fetch(String url) throws Exception {
        HttpRequest req = HttpRequest.newBuilder()
                .uri(URI.create(url))
                .timeout(Duration.ofSeconds(10))
                .build();
        return HTTP.send(req, HttpResponse.BodyHandlers.ofString()).body();
    }
}

Do not create a new HttpClient inside a loop or per-request method. Each instance starts with an empty pool, so you pay the full connection overhead on every call and you may exhaust file descriptors under load.

In frameworks like Spring Boot you would inject a shared HttpClient bean (or use RestClient / WebClient which manage their own pools). The principle is the same: one pool per downstream service.

4. TLS Basics — Trust, Certificates, and Hostname Verification

Java's HttpClient enforces TLS by default for https:// URLs. Understanding what happens under the hood helps you handle the cases where the defaults need tuning.

Trust store: the JVM ships with a cacerts trust store containing well-known CA certificates. When your client connects, the server presents its certificate chain; the JVM verifies that it chains up to a trusted CA and that the certificate has not expired.
Hostname verification: after trust is established, the JVM checks that the hostname in the URL matches the Subject Alternative Name (or CN) in the certificate. This prevents a valid certificate for evil.com being presented for bank.com.
TLS version: by default the JVM negotiates the highest mutually supported version. Java 17 supports TLS 1.2 and TLS 1.3; TLS 1.0 and 1.1 are disabled by default. Never re-enable them in production.
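
To check what a particular JVM actually enables, or to pin a client to modern protocol versions explicitly, something like the following sketch can help. The class and method names are hypothetical, and the printed list depends on the JDK version and its security configuration.

```java
import java.net.http.HttpClient;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.List;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLParameters;

// Hypothetical helper for inspecting and restricting TLS protocol versions.
class TlsProtocols {

    // Protocol versions the JVM's default SSLContext enables for new connections.
    static List<String> defaultEnabledProtocols() throws NoSuchAlgorithmException {
        return Arrays.asList(
                SSLContext.getDefault().getDefaultSSLParameters().getProtocols());
    }

    // Parameters that restrict a client to TLS 1.3 and TLS 1.2.
    static SSLParameters modernOnly() {
        SSLParameters params = new SSLParameters();
        params.setProtocols(new String[] {"TLSv1.3", "TLSv1.2"});
        return params;
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        System.out.println("Enabled by default: " + defaultEnabledProtocols());

        HttpClient client = HttpClient.newBuilder()
                .sslParameters(modernOnly())
                .build();
        System.out.println("Client protocols: "
                + Arrays.toString(client.sslParameters().getProtocols()));
    }
}
```

On a recent JDK the defaults are usually sufficient; explicit pinning mainly documents intent and guards against a permissive JVM configuration.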

The most common TLS problem in internal or dev environments is a self-signed certificate. The correct fix is to add that certificate to a custom trust store — not to disable verification.

import java.io.FileInputStream;
import java.io.InputStream;
import java.net.http.HttpClient;
import java.security.KeyStore;
import java.security.cert.Certificate;
import java.security.cert.CertificateFactory;
import javax.net.ssl.SSLContext;
import javax.net.ssl.TrustManagerFactory;

// Load a custom CA certificate for an internal service
CertificateFactory cf = CertificateFactory.getInstance("X.509");
Certificate caCert;
try (InputStream in = new FileInputStream("internal-ca.crt")) {
    caCert = cf.generateCertificate(in);
}

// In-memory trust store holding only that CA. A client built from it
// trusts nothing else, so use it only for the internal service.
KeyStore ks = KeyStore.getInstance(KeyStore.getDefaultType());
ks.load(null, null);
ks.setCertificateEntry("internal-ca", caCert);

TrustManagerFactory tmf =
        TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
tmf.init(ks);

SSLContext sslContext = SSLContext.getInstance("TLS");
sslContext.init(null, tmf.getTrustManagers(), null);

// Built outside the try block so the client stays in scope for later requests
HttpClient client = HttpClient.newBuilder()
        .sslContext(sslContext)
        .build();

Never disable certificate validation in production. Code like trustAllCerts() or setting a no-op TrustManager removes every TLS guarantee. An attacker can intercept the connection with any certificate. This is occasionally seen in test code — keep it strictly in tests and behind a flag, never in a production path.

Putting It All Together

Production-grade networking combines all four concerns simultaneously. The pattern is: one shared, pooled HttpClient with a connect timeout; requests with individual timeouts; a retry helper that applies exponential back-off only for transient errors; and TLS with proper certificate validation.

Checklist before shipping network code:

Connect timeout set on the client.
Request timeout set on every request.
Retry loop with exponential back-off and jitter, applied only to idempotent requests and retryable status codes.
HttpClient created once and reused (or obtained from a DI container).
TLS certificate validation left enabled; custom CAs added to a trust store, not bypassed.
Structured logging of attempt counts, status codes, and elapsed time for observability.

Summary

Always set both a connect timeout and a request timeout — they guard against different failure modes. Use exponential back-off with jitter for retries, and only retry idempotent, transient failures. Reuse a single HttpClient instance to leverage its connection pool and avoid the overhead of repeated TCP and TLS handshakes. Understand the TLS trust chain and hostname verification that Java enforces by default; add custom CAs to a trust store rather than disabling validation. These practices make your networking code resilient, efficient, and secure.