Performance & Load Testing

JMeter & Other Tools

18 min Lesson 4 of 28

The previous lesson established k6 as the modern, developer-centric load generator. But k6 is not the only tool on the field, and at big-tech scale, tool choice has real consequences. Apache JMeter has been the industry workhorse for more than two decades. Gatling brings a statically typed Scala DSL (with Java and Kotlin variants) and detailed HTML reports generated at the end of each run. Locust lets you write test logic in pure Python. Each reflects a different philosophy about where the boundary between "test authoring" and "test execution" belongs. This lesson maps the landscape and gives you a judgment framework of the kind engineers at large companies use when selecting a load testing tool for a specific scenario.

Apache JMeter: Architecture and Mental Model

JMeter is a Java application that models load as a tree rooted in one or more Thread Groups. Each Thread Group simulates N concurrent users; each simulated user walks a chain of Samplers (HTTP Request, JDBC Request, JMS Publisher, or plugin-provided samplers such as gRPC) in sequence, optionally wrapped in Logic Controllers (If, While, Loop, Transaction) and augmented by Pre/Post Processors and Assertions. Listeners collect and render results.

The canonical test plan artifact is a .jmx file, an XML document that encodes the entire tree. For CI pipelines, you discard the GUI and run headless with jmeter -n -t plan.jmx -l results.jtl -e -o report/. The -e -o flags generate an HTML dashboard from the JTL file with no extra tooling; the output directory passed to -o must be empty or not yet exist.

# Install JMeter 5.6.3 (latest stable at the time of writing; requires Java 8 or later)
wget https://downloads.apache.org/jmeter/binaries/apache-jmeter-5.6.3.tgz
tar xzf apache-jmeter-5.6.3.tgz
export PATH=$PATH:$(pwd)/apache-jmeter-5.6.3/bin

# Run a test plan headless, output JTL + HTML dashboard
jmeter -n \
  -t checkout-flow.jmx \
  -l results/run-$(date +%Y%m%d-%H%M%S).jtl \
  -e -o results/html-report/ \
  -Jthreads=200 \
  -Jramp=60 \
  -Jduration=300

# -J sets a JMeter property at runtime (avoids editing the XML). A value only takes
# effect where the plan reads it, e.g. ${__P(threads,10)} in the Thread Group.
# threads=200  → number of concurrent virtual users
# ramp=60      → seconds to reach full concurrency
# duration=300 → total test duration in seconds

# Useful JMeter CLI properties for CI:
# -Jsummariser.interval=10   → print summary every 10 s
# -Jjmeter.save.saveservice.output_format=csv  → compact JTL

The GUI is for authoring, not for running tests. JMeter's GUI consumes substantial memory and its own threads skew results. Every team that has accidentally run a 500-VU test inside the GUI and wondered why throughput was capped at 200 RPS learns this lesson the hard way. The GUI is a plan editor. Production-scale execution is always jmeter -n (non-GUI / headless), ideally distributed across multiple injectors.

JMeter Distributed Mode

A single JMeter instance can drive roughly 1,000–1,500 concurrent threads before GC pressure and network I/O become the bottleneck. For higher concurrency, JMeter uses a controller/worker architecture: one Controller (the machine that holds the test plan and launches the run) orchestrates N Worker nodes (injectors). Workers receive the test plan at test start, each executes the full plan in parallel, and all stream results back to the controller.

# On each worker node — start JMeter server process
jmeter-server -Djava.rmi.server.hostname=<WORKER_IP>

# On the controller — run distributed test
jmeter -n \
  -t plan.jmx \
  -R 10.0.0.11,10.0.0.12,10.0.0.13 \
  -l results/distributed.jtl \
  -e -o results/html-report/ \
  -Gthreads=500 \
  -Gramp=120 \
  -Gduration=600

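# -R  → comma-separated list of worker hosts
# -G  → set a property on ALL remote workers (like -J but broadcast)
#       Each worker runs the full plan, so -Gthreads=500 across 3 workers
#       produces 1,500 threads in total.

# Firewall rules required:
# Controller → Workers: TCP 1099 (RMI registry, server_port) plus the port set by
#   server.rmi.localport on each worker (dynamic unless pinned, e.g. 4000)
# Workers → Controller: up to three consecutive ports starting at client.rmi.localport
#   on the controller (random by default; pin it in jmeter.properties, e.g. 60000)

# Since JMeter 4.0, RMI uses SSL by default. Either generate rmi_keystore.jks with
# bin/create-rmi-keystore.sh and copy it to every node, or set
# server.rmi.ssl.disable=true on all nodes when the test network is isolated.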

# Docker Compose for local distributed testing:
# services:
#   jmeter-controller:
#     image: justb4/jmeter:5.6.3
#     command: ["-n","-t","/tests/plan.jmx","-R","worker1,worker2","-l","/results/out.jtl"]
#   worker1:
#     image: justb4/jmeter:5.6.3
#     command: ["-s","-Djava.rmi.server.hostname=worker1"]
#   worker2:
#     image: justb4/jmeter:5.6.3
#     command: ["-s","-Djava.rmi.server.hostname=worker2"]

JMeter: Production-Grade Patterns

Teams running JMeter at scale develop a set of non-negotiable practices:

Parameterize everything. Never hardcode base URLs, credentials, or concurrency values inside the .jmx file. Use JMeter properties (${__P(baseUrl,http://localhost)}) injected at runtime with -J flags. This makes the same plan reusable across dev, staging, and prod environments without XML edits.
Use CSV Data Set Config for realistic user data. Feeding the same credentials for 500 VUs will hit a session-collision wall in any system with per-user state. A CSV Data Set Config element lets each VU read a distinct row from a CSV — unique users, tokens, order IDs.
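Add a Response Assertion or Duration Assertion to every critical sampler. By default, JMeter's HTTP sampler marks 4xx and 5xx responses as failed, but a 200 response that carries an error page or an empty body still counts as a "success" unless you tell it otherwise. Assert on response code, a response body substring, and a per-sample latency threshold (e.g., fail any sample slower than 2,000 ms). Percentile targets such as P95 are evaluated on the aggregated results, not by the Duration Assertion.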
Tune JVM heap size for worker nodes. Default JMeter heap is 1 GB. At 500 threads with large response bodies being stored, you will hit OutOfMemoryError. Set JVM_ARGS="-Xms2g -Xmx4g -XX:+UseG1GC" in the jmeter shell script, or pass as an environment variable.
Stream results to InfluxDB + Grafana in real time. The HTML report is post-hoc. For live test monitoring, configure the Backend Listener with InfluxdbBackendListenerClient and point it at your InfluxDB instance. Then import the standard JMeter Grafana dashboard (ID 5496).
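As a rough illustration of the CSV Data Set Config practice above, the sketch below generates a file of distinct synthetic users. The column names (email, password), the user_N@example.com naming scheme, and the shared password are assumptions made for this example; a real plan would use whatever columns its samplers reference.

```python
import csv
import io


def build_user_rows(count, domain="example.com"):
    """Return `count` unique (email, password) rows for synthetic test users."""
    return [(f"user_{i}@{domain}", "Passw0rd!") for i in range(1, count + 1)]


def write_users_csv(stream, rows):
    """Write a header line plus one row per virtual user to any text stream."""
    writer = csv.writer(stream, lineterminator="\n")
    writer.writerow(["email", "password"])
    writer.writerows(rows)


rows = build_user_rows(500)
buffer = io.StringIO()
write_users_csv(buffer, rows)
csv_text = buffer.getvalue()

# To write a real file instead of an in-memory buffer:
#   with open("users.csv", "w", newline="") as f:
#       write_users_csv(f, rows)
print(csv_text.splitlines()[:3])
```

With one row per virtual user, 500 VUs never share a session, which is the collision the practice is meant to avoid.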

Gatling: The Scala-DSL Contender

Gatling compiles your load scenario from Scala (or the newer Java and Kotlin DSLs) and runs it on an Akka- and Netty-based engine built around non-blocking I/O. The architectural difference from JMeter is fundamental: JMeter runs one Java thread per virtual user; Gatling multiplexes thousands of virtual users over a small, fixed thread pool sized to the CPU core count. This means a single Gatling injector can sustain 10,000+ concurrent connections, a load that would exhaust memory in a thread-per-user JMeter instance.
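The multiplexing idea can be sketched in a few lines of Python. This is not Gatling's engine, only an asyncio analogy: each simulated user spends its time waiting on (simulated) I/O, so a single event-loop thread can interleave thousands of them. The user count and wait time are arbitrary values chosen for the illustration.

```python
import asyncio
import threading
import time


async def virtual_user(user_id, think_time):
    """Simulate one VU waiting on I/O; awaiting hands the thread to other VUs."""
    await asyncio.sleep(think_time)
    return user_id


async def run_users(count, think_time):
    """Run `count` virtual users concurrently on the single event-loop thread."""
    results = await asyncio.gather(
        *(virtual_user(i, think_time) for i in range(count))
    )
    return results, threading.active_count()


start = time.perf_counter()
finished, threads_seen = asyncio.run(run_users(2000, 0.1))
elapsed = time.perf_counter() - start
print(f"{len(finished)} users, {threads_seen} thread(s), {elapsed:.2f}s")
```

Because the waits overlap, total wall time stays close to one wait rather than 2,000 of them, and no thread is created per user. A thread-per-user design would need 2,000 threads, each with its own stack, to achieve the same overlap.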

// Gatling 3.10 simulation written in Kotlin (uses the Java DSL)
// Gradle project: place under src/gatling/kotlin/
// build.gradle.kts (the Kotlin JVM plugin must also be applied):
// plugins { id("io.gatling.gradle") version "3.10.5" }

import io.gatling.javaapi.core.*
import io.gatling.javaapi.core.CoreDsl.*
import io.gatling.javaapi.http.*
import io.gatling.javaapi.http.HttpDsl.*
import java.time.Duration

class CheckoutSimulation : Simulation() {

    val httpProtocol = http
        .baseUrl("https://api-staging.example.com")
        .acceptHeader("application/json")
        .contentTypeHeader("application/json")
        .header("X-Load-Test", "true")

    val login = exec(
        http("POST /auth/login")
            .post("/auth/login")
            .body(StringBody("""{"email":"user_#{userId}@example.com","password":"Passw0rd!"}"""))
            .check(status().`is`(200))
            .check(jsonPath("$.token").saveAs("authToken"))
    )

    val checkout = exec(
        http("POST /orders")
            .post("/orders")
            .header("Authorization", "Bearer #{authToken}")
            .body(StringBody("""{"productId":42,"qty":1}"""))
            .check(status().`is`(201))
            .check(jsonPath("$.orderId").saveAs("orderId"))
    )

    val scn = scenario("Checkout Flow")
        .feed(csv("users.csv").circular())
        .exec(login)
        .pause(Duration.ofMillis(500), Duration.ofMillis(1500))
        .exec(checkout)

    init {
        setUp(
            scn.injectOpen(
                nothingFor(Duration.ofSeconds(5)),
                rampUsers(500).during(Duration.ofSeconds(60)),
                constantUsersPerSec(100.0).during(Duration.ofSeconds(300))
            )
        ).protocols(httpProtocol)
         .assertions(
             global().responseTime().percentile4().lte(2000),  // P99 <= 2s (by default percentile3 is P95, percentile4 is P99)
             global().failedRequests().percent().lte(1.0)
         )
    }
}

// Run from project root:
// ./gradlew gatlingRun
// Report generated in: build/reports/gatling/checkoutsimulation-<timestamp>/index.html

Gatling's built-in HTML report is the most detailed in the class: per-request response time distributions, active users over time, requests/sec charts, and error summaries, all in a self-contained report directory that you open via index.html. For CI, Gatling exits non-zero when assertions fail, making pass/fail gates trivial.

Locust: Python-Native Load Testing

Locust defines virtual user behavior as a Python class that inherits from HttpUser. Tasks are decorated with @task(weight); Locust's scheduler distributes tasks proportionally by weight. The runtime is a gevent-based cooperative-multitasking event loop — like Gatling, it does not allocate one OS thread per VU.

# locustfile.py — Locust 2.x

from locust import HttpUser, task, between
import random

class CheckoutUser(HttpUser):
    wait_time = between(0.5, 2.0)       # random think time between tasks
    host = "https://api-staging.example.com"

    def on_start(self):
        """Runs once when a simulated user spawns."""
        resp = self.client.post(
            "/auth/login",
            json={"email": f"user{random.randint(1,10000)}@example.com",
                  "password": "Passw0rd!"},
            name="/auth/login"
        )
        self.token = resp.json()["token"]
        self.client.headers.update({"Authorization": f"Bearer {self.token}"})

    @task(5)
    def browse_catalog(self):
        self.client.get(f"/products?page={random.randint(1,20)}", name="/products")

    @task(2)
    def view_product(self):
        pid = random.randint(1, 500)
        self.client.get(f"/products/{pid}", name="/products/[id]")

    @task(1)
    def checkout(self):
        self.client.post(
            "/orders",
            json={"productId": random.randint(1, 500), "qty": 1},
            name="/orders"
        )

# Run headless (CI mode):
# locust -f locustfile.py \
#   --headless \
#   --users 500 \
#   --spawn-rate 50 \
#   --run-time 5m \
#   --csv results/locust \
#   --html results/locust-report.html

# Distributed mode (1 master + N workers):
# locust -f locustfile.py --master
# locust -f locustfile.py --worker --master-host=<MASTER_IP>
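To see what the @task weights in the locustfile above imply for traffic mix, the following sketch computes the expected share per task and checks it against a seeded weighted random draw. It mimics weighted selection with the standard library; it is not Locust's scheduler code, and the pick count and seed are arbitrary.

```python
import random
from collections import Counter

# Weights copied from the @task decorators in the locustfile above.
TASK_WEIGHTS = {"browse_catalog": 5, "view_product": 2, "checkout": 1}


def expected_share(weights):
    """Fraction of task executions each task should receive over a long run."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


def simulate_picks(weights, picks, seed=42):
    """Draw `picks` weighted random task choices and count them per task."""
    rng = random.Random(seed)
    names = list(weights)
    chosen = rng.choices(names, weights=[weights[n] for n in names], k=picks)
    return Counter(chosen)


PICKS = 80_000
shares = expected_share(TASK_WEIGHTS)
counts = simulate_picks(TASK_WEIGHTS, PICKS)
for name in TASK_WEIGHTS:
    print(f"{name}: expected {shares[name]:.1%}, simulated {counts[name] / PICKS:.1%}")
```

With weights 5, 2, and 1, checkout receives one eighth of task executions, so a 500-user run sends far fewer POST /orders requests than the headline user count suggests. Size the weights from production traffic ratios rather than intuition.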

Locust's killer feature for platform teams is its web UI and live API. You can start, stop, or adjust user counts mid-test via the browser or with a POST request such as curl -X POST -d "user_count=200&spawn_rate=20" http://localhost:8089/swarm. This makes exploratory testing and gradual ramp-up experiments much faster than restarting a JMeter plan.
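For scripted ramp experiments, the same adjustment can be driven from Python. The sketch below only builds the request and leaves sending commented out, since it needs a running Locust web UI. The form field names user_count and spawn_rate are what recent Locust 2.x versions accept as far as I know; treat them as an assumption and verify against your installed version.

```python
import urllib.parse
import urllib.request


def build_swarm_request(base_url, user_count, spawn_rate):
    """Build (but do not send) a POST to the Locust web UI's /swarm endpoint."""
    form = urllib.parse.urlencode(
        {"user_count": user_count, "spawn_rate": spawn_rate}
    ).encode("ascii")
    return urllib.request.Request(f"{base_url}/swarm", data=form, method="POST")


request = build_swarm_request("http://localhost:8089", user_count=200, spawn_rate=20)
print(request.get_method(), request.full_url, request.data)

# To send it against a running Locust master:
#   with urllib.request.urlopen(request, timeout=5) as resp:
#       print(resp.status, resp.read().decode())
```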

Tool selection framework: JMeter dominates protocol breadth, Gatling and k6 dominate concurrency efficiency, Locust dominates Python-native ergonomics, and every major tool has a cloud-hosted SaaS tier for planet-scale injection.

Choosing a Tool: The Judgment Framework

Senior engineers at big-tech companies do not have religious loyalty to a single tool. They evaluate along four axes:

1. Protocol fit. If your system under test is a REST or gRPC service, every tool in this lesson can drive it, natively or through a plugin. If you need to load-test a message broker (ActiveMQ, Kafka), a relational database (connection pool exhaustion), or an LDAP server, JMeter's built-in samplers and plugin ecosystem are unmatched. For WebSocket-heavy applications, Gatling and k6 both have first-class WS support; JMeter relies on third-party WS plugins that tend to be more fragile.

2. Concurrency ceiling per node. Thread-per-VU models (JMeter) hit practical limits at roughly 1,000–1,500 threads per JVM instance, where thread stack memory and GC pauses start to dominate. Event-loop models (Gatling, k6, Locust-gevent) sustain 5,000–20,000 concurrent connections per node. For scenarios requiring 50,000+ VUs, you will distribute horizontally regardless of tool, but the event-loop tools distribute more cheaply (fewer, cheaper nodes).

3. Ecosystem integration. If your team already ships k6 scripts in the CI pipeline (previous lesson), adding JMeter for a new test suite introduces two tool chains. The consolidation cost is real. Conversely, if the QA team owns load testing and they live in JMeter, demanding they rewrite in k6 for a single CI gate is unreasonable. Meet the team where they are — unless the protocol ceiling or concurrency ceiling forces a change.

4. Test-as-code fidelity. JMeter .jmx files are XML. They are notoriously hard to diff, review in pull requests, or maintain in version control without a GUI. Gatling simulations, k6 scripts, and Locust files are real source code: reviewable, composable, refactorable. For teams running load tests on every pull request, code-based tools are strongly preferred.
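The concurrency-ceiling axis can be turned into a quick sizing estimate. The sketch below is back-of-the-envelope arithmetic: the per-node ceilings (1,500 for thread-per-VU, 10,000 for an event loop) are picked from the ranges quoted in this lesson, and the 80% headroom factor is an assumption, not a measured value.

```python
import math


def nodes_needed(target_vus, vus_per_node, headroom=0.8):
    """Injector count for `target_vus`, running each node at `headroom` of its ceiling."""
    usable = vus_per_node * headroom
    return math.ceil(target_vus / usable)


TARGET = 50_000
thread_per_vu_nodes = nodes_needed(TARGET, vus_per_node=1_500)
event_loop_nodes = nodes_needed(TARGET, vus_per_node=10_000)
print(f"thread-per-VU: {thread_per_vu_nodes} nodes, event loop: {event_loop_nodes} nodes")
```

Under these assumptions the thread-per-VU model needs 42 injectors and the event-loop model needs 7 for the same 50,000 VUs. The absolute numbers will shift with payload size and script complexity, so measure your own per-node ceiling before provisioning.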

Avoid tool sprawl; standardize per layer. A mature platform engineering team typically standardizes on one tool for developer-authored micro-benchmarks (often k6 in CI), one for QA-owned regression suites (often JMeter or Gatling in a scheduled pipeline), and a cloud-hosted tier for capacity planning runs at full production scale. This is three tiers with a clear ownership boundary — not five tools owned by nobody in particular.

Record-and-playback is a trap for complex flows. JMeter's HTTP(S) Test Script Recorder and Gatling's HAR recorder produce brittle scripts that capture cookies, CSRF tokens, and dynamic path segments as hard-coded strings. A script recorded Monday fails Tuesday when the session token format changes. Use recording only as a starting skeleton, then immediately parameterize every dynamic value. Teams that treat recorded scripts as production-ready pay the maintenance tax for years.
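Parameterizing a dynamic value usually means extracting it from the previous response instead of replaying the recorded string. A minimal sketch of that correlation step follows; the HTML snippet, the csrf_token field name, and the form layout are hypothetical, and in a real tool you would use its own extractor (a JMeter Regular Expression Extractor, a Gatling check, and so on).

```python
import re

# Hypothetical login page body; the token value changes on every request.
LOGIN_PAGE = '<form><input type="hidden" name="csrf_token" value="a1b2c3d4"></form>'

CSRF_PATTERN = re.compile(r'name="csrf_token"\s+value="([^"]+)"')


def extract_csrf(html):
    """Pull the current CSRF token out of a response body, or raise if absent."""
    match = CSRF_PATTERN.search(html)
    if match is None:
        raise ValueError("csrf_token not found in response body")
    return match.group(1)


def build_login_form(html, email, password):
    """Build the next request's form data from live values, not recorded ones."""
    return {"email": email, "password": password, "csrf_token": extract_csrf(html)}


form = build_login_form(LOGIN_PAGE, "user_1@example.com", "Passw0rd!")
print(form["csrf_token"])
```

Raising when the token is missing is deliberate: a script that silently submits a stale token produces misleading error rates, while a loud failure points straight at the broken extraction.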

Artillery and Other Ecosystem Tools

The landscape has more entrants worth knowing by name. Artillery (Node.js-based, YAML or JS scenarios) has gained significant traction for its simple YAML DSL and native AWS Lambda runner — you can burst to 50,000 VUs purely serverless without managing injector nodes. NBomber (.NET) is the right answer when the team ships C# and wants type-safe load tests. Vegeta (Go CLI) is the fastest way to answer a single-question benchmark: "what is the maximum sustainable RPS of this endpoint before P99 blows past SLO?" — it is not a scenario runner but an HTTP rate sender with excellent statistical output. For browser-level load testing (JavaScript execution, real rendering), k6 browser and Playwright-based Artillery scenarios are replacing legacy Selenium grid approaches.

The common thread across all these tools: they are inputs to the same analysis pipeline. Whether the raw output is a .jtl file, a .json Gatling report, or Locust CSV, the next step is always the same — extract P50/P95/P99 latency, error rate, and throughput, compare against your SLOs, and drive a decision. The tool generates the data; the engineer interprets it. That interpretation is the subject of Lesson 9.
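As a preview of that pipeline, here is a small sketch that reads JMeter-style CSV results, computes P50/P95/P99, error rate, and throughput, and compares them against thresholds. The inline sample uses only a subset of the default JTL columns, the nearest-rank percentile method is one of several reasonable definitions, and the SLO values are placeholders.

```python
import csv
import io
import math

# Tiny inline sample using a subset of JMeter's default CSV JTL columns.
SAMPLE_JTL = """timeStamp,elapsed,label,responseCode,success
1700000000000,120,POST /orders,201,true
1700000000500,180,POST /orders,201,true
1700000001000,240,POST /orders,201,true
1700000001500,2500,POST /orders,500,false
1700000002000,300,POST /orders,201,true
"""


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an ascending list (pct in 0-100)."""
    rank = max(1, math.ceil(pct / 100 * len(sorted_values)))
    return sorted_values[rank - 1]


def summarize(jtl_text):
    """Compute latency percentiles, error rate, and throughput from CSV JTL text."""
    rows = list(csv.DictReader(io.StringIO(jtl_text)))
    elapsed = sorted(int(r["elapsed"]) for r in rows)
    stamps = [int(r["timeStamp"]) for r in rows]
    errors = sum(1 for r in rows if r["success"].lower() != "true")
    window_s = max((max(stamps) - min(stamps)) / 1000, 1e-9)
    return {
        "p50": percentile(elapsed, 50),
        "p95": percentile(elapsed, 95),
        "p99": percentile(elapsed, 99),
        "error_rate": errors / len(rows),
        "rps": len(rows) / window_s,
    }


def check_slo(summary, p95_ms, max_error_rate):
    """Return a list of SLO violations; an empty list means the run passes."""
    violations = []
    if summary["p95"] > p95_ms:
        violations.append(f"p95 {summary['p95']} ms > {p95_ms} ms")
    if summary["error_rate"] > max_error_rate:
        violations.append(
            f"error rate {summary['error_rate']:.1%} > {max_error_rate:.1%}"
        )
    return violations


summary = summarize(SAMPLE_JTL)
violations = check_slo(summary, p95_ms=2000, max_error_rate=0.01)
print(summary)
print(violations)
```

In a CI job, a non-empty violations list would translate into a non-zero exit code. The same two functions work on Locust or Gatling output once the columns are mapped to elapsed time, timestamp, and success.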