JVM Internals & Performance

Profiling Tools

Benchmarking tells you how fast your code runs. Profiling tells you why it runs that way. A profiler attaches to a running JVM and continuously samples or instruments it, giving you a live breakdown of CPU time, heap allocations, thread states, garbage collection, and lock contention — exactly the data you need to fix a problem you have already measured.

In this lesson we cover the three essential profiling tools for production-grade Java work: Java Flight Recorder (JFR), VisualVM, and the art of reading a heap dump.

Java Flight Recorder

JFR is a production-safe, low-overhead profiler built into the JVM itself. It was open-sourced as part of OpenJDK in Java 11 and ships with every JDK since then — no extra download, no agent, no licence fee. The overhead is typically under 1 % for most workloads, which makes it safe to run continuously in production.

JFR works by recording events. Every JVM subsystem (GC, JIT, class loading, socket I/O, thread locks, file I/O, and more) fires events with timestamps and metadata. JFR writes those events in a compact binary format to in-memory buffers, flushes them to an on-disk repository as the buffers fill, and copies the retained data to a .jfr file when a recording ends or when you request a dump. You then open that file offline in JDK Mission Control (JMC) to analyse it.
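The same event mechanism is open to application code. The following is a minimal sketch, assuming JDK 11 or later: the event name demo.OrderProcessed, its orderId field, and the class names are hypothetical and exist only for illustration. It defines a custom event, emits it a few times while a recording is active, and counts how many copies ended up in the recording file.

```java
import jdk.jfr.Event;
import jdk.jfr.Label;
import jdk.jfr.Name;
import jdk.jfr.Recording;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordingFile;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class CustomEventDemo {

    // Hypothetical application event: the name and the orderId field are
    // illustrative and not part of the JDK.
    @Name("demo.OrderProcessed")
    @Label("Order Processed")
    static class OrderProcessed extends Event {
        @Label("Order ID")
        long orderId;
    }

    /** Emits one event per order while recording, then counts them in the file. */
    static long recordAndCount(int orders) {
        try {
            Path file = Files.createTempFile("custom-events", ".jfr");
            try (Recording rec = new Recording()) {
                rec.enable(OrderProcessed.class);
                rec.start();
                for (int i = 0; i < orders; i++) {
                    OrderProcessed event = new OrderProcessed();
                    event.begin();      // record the start timestamp
                    event.orderId = i;  // event payload
                    event.commit();     // end the event and hand it to JFR
                }
                rec.stop();
                rec.dump(file);
            }
            long count = 0;
            for (RecordedEvent e : RecordingFile.readAllEvents(file)) {
                if (e.getEventType().getName().equals("demo.OrderProcessed")) {
                    count++;
                }
            }
            Files.delete(file);
            return count;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("events recorded: " + recordAndCount(3));
    }
}
```

Built-in JVM events follow the same shape: a name, a timestamp, and a set of typed fields.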

Starting a recording

There are three ways to start a JFR recording.

1. At JVM startup (best for catching problems from the very beginning):

# JDK 11+: -XX:StartFlightRecording is enough; no separate -XX:+FlightRecorder flag is needed
java -XX:StartFlightRecording=duration=120s,filename=myapp.jfr,settings=profile \
     -jar myapp.jar

2. On demand via jcmd while the app is running (no restart needed):

# find the PID first
jps -l

# start a 60-second recording on PID 12345
jcmd 12345 JFR.start duration=60s filename=/tmp/myapp.jfr settings=profile

# or dump what is currently in the ring buffer at any time
jcmd 12345 JFR.dump filename=/tmp/snapshot.jfr

3. Programmatically inside the application:

import jdk.jfr.Recording;
import jdk.jfr.consumer.RecordingFile;
import java.nio.file.Path;
import java.time.Duration;

public class JfrDemo {
    public static void main(String[] args) throws Exception {
        try (Recording rec = new Recording()) {
            rec.enable("jdk.ExecutionSample").withPeriod(Duration.ofMillis(10)); // CPU (method) sampling
            rec.enable("jdk.GCHeapSummary");
            rec.enable("jdk.JavaMonitorEnter");   // lock contention
            rec.start();

            // ... run your workload ...
            Thread.sleep(5_000);

            rec.stop();
            rec.dump(Path.of("workload.jfr"));
        }

        // parse the file programmatically
        try (var rf = new RecordingFile(Path.of("workload.jfr"))) {
            while (rf.hasMoreEvents()) {
                var event = rf.readEvent();
                System.out.println(event.getEventType().getName()
                        + " @ " + event.getStartTime());
            }
        }
    }
}

settings=default vs settings=profile: Both are predefined configurations that ship with the JDK. The default configuration is tuned for continuous use, with overhead typically under 1 %; it records GC and I/O events with conservative thresholds and already samples executing methods, by default every 20 ms. The profile configuration samples more often (every 10 ms by default), lowers event thresholds, and records more allocation and lock detail, at a typical overhead closer to 2 %. Use default in production continuously; switch to profile for targeted investigations.
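To see exactly how the two configurations differ on a particular JDK, the settings can be loaded and compared through the jdk.jfr.Configuration API. This is a small sketch rather than an official tool; the class name JfrSettingsDiff is made up, and the output varies between JDK versions.

```java
import jdk.jfr.Configuration;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.text.ParseException;
import java.util.Map;
import java.util.TreeMap;

final class JfrSettingsDiff {

    /** Loads a predefined configuration bundled with the JDK ("default" or "profile"). */
    static Map<String, String> load(String name) {
        try {
            return Configuration.getConfiguration(name).getSettings();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ParseException e) {
            throw new IllegalStateException("Malformed JFR configuration: " + name, e);
        }
    }

    /** Returns every setting whose value differs, formatted as "default -> profile". */
    static Map<String, String> diff() {
        Map<String, String> def = load("default");
        Map<String, String> prof = load("profile");
        Map<String, String> out = new TreeMap<>();
        for (Map.Entry<String, String> e : prof.entrySet()) {
            String before = def.get(e.getKey());
            if (!e.getValue().equals(before)) {
                out.put(e.getKey(), before + " -> " + e.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Keys have the form "<event name>#<setting>", e.g. an event's "period" or "threshold".
        diff().forEach((key, change) -> System.out.println(key + ": " + change));
    }
}
```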

Reading a JFR file in JDK Mission Control

Download JMC from jdk.java.net/jmc (it is a separate download from the JDK itself). Open a .jfr file and explore:

Automated Analysis — JMC scans the recording and flags anomalies (high GC pause, lock contention hotspot, suspicious allocations). Start here.
Method Profiling — a flame graph / call tree of sampled CPU time. The hottest frames are where the sampled threads spent most of their time, which makes them the first candidates to investigate.
Memory — allocation profiling shows which call sites allocate the most, and heap live-set over time.
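Threads — a per-thread timeline of states and events such as running, blocked on a monitor, waiting, and sleeping. The colour coding depends on the JMC version, so check the legend. Wide bands of monitor blocking mean contention.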
Garbage Collections — every GC pause, its type, duration, heap before and after, and the trigger.
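The Method Profiling view is essentially an aggregation of jdk.ExecutionSample events by stack frame. The sketch below approximates the "hot methods" table by counting only the top frame of each sample; the class and method names are hypothetical, and JMC's own aggregation is considerably richer.

```java
import jdk.jfr.Recording;
import jdk.jfr.consumer.RecordedEvent;
import jdk.jfr.consumer.RecordedFrame;
import jdk.jfr.consumer.RecordedStackTrace;
import jdk.jfr.consumer.RecordingFile;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.time.Duration;
import java.util.HashMap;
import java.util.Map;

final class HotMethods {

    /** Counts how often each method was the top frame of an execution sample. */
    static Map<String, Integer> topFrames(Path jfrFile) {
        Map<String, Integer> counts = new HashMap<>();
        try (RecordingFile rf = new RecordingFile(jfrFile)) {
            while (rf.hasMoreEvents()) {
                RecordedEvent event = rf.readEvent();
                if (!event.getEventType().getName().equals("jdk.ExecutionSample")) {
                    continue;
                }
                RecordedStackTrace trace = event.getStackTrace();
                if (trace == null || trace.getFrames().isEmpty()) {
                    continue;
                }
                RecordedFrame top = trace.getFrames().get(0);
                String method = top.getMethod().getType().getName()
                        + "." + top.getMethod().getName();
                counts.merge(method, 1, Integer::sum);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return counts;
    }

    /** Records the workload with 10 ms execution sampling and summarises it. */
    static Map<String, Integer> profile(Runnable workload) {
        try {
            Path file = Files.createTempFile("hot-methods", ".jfr");
            try (Recording rec = new Recording()) {
                rec.enable("jdk.ExecutionSample").withPeriod(Duration.ofMillis(10));
                rec.start();
                workload.run();
                rec.stop();
                rec.dump(file);
            }
            Map<String, Integer> result = topFrames(file);
            Files.delete(file);
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Stand-in CPU-bound workload that spins for about half a second. */
    static void busyWork() {
        double sink = 0;
        long end = System.nanoTime() + 500_000_000L;
        while (System.nanoTime() < end) {
            sink += Math.sqrt(sink + 1);
        }
        if (sink < 0) System.out.println(sink); // keeps the loop from being optimised away
    }

    public static void main(String[] args) {
        profile(HotMethods::busyWork).entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
                .limit(5)
                .forEach(e -> System.out.println(e.getValue() + " samples  " + e.getKey()));
    }
}
```

A short workload may produce few or no samples, so an empty result is not an error.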

Continuous recording in production: Configure JFR with a fixed-size ring buffer (maxsize=250m, no duration) and dump it on demand when an incident occurs — you get the last few minutes of profiling data right up to the problem without having planned ahead. This is far more useful than attaching a profiler after the fact.
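The same always-on setup can be sketched from inside the application with the jdk.jfr.Recording API. The class name FlightRecorderRing and the size and age limits are assumptions chosen for illustration; in a real service the dump would be triggered by an alert or an admin endpoint.

```java
import jdk.jfr.Configuration;
import jdk.jfr.Recording;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Path;
import java.text.ParseException;
import java.time.Duration;

final class FlightRecorderRing {
    private final Recording recording;

    FlightRecorderRing(long maxBytes, Duration maxAge) {
        try {
            recording = new Recording(Configuration.getConfiguration("default"));
        } catch (IOException | ParseException e) {
            throw new IllegalStateException("Cannot load the default JFR configuration", e);
        }
        recording.setName("continuous");
        recording.setToDisk(true);       // keep data in the on-disk repository
        recording.setMaxSize(maxBytes);  // discard the oldest data beyond this size
        recording.setMaxAge(maxAge);     // ...or once it is older than this
        recording.start();               // no duration: runs until closed
    }

    /** Copies the currently retained data to a file without stopping the recording. */
    Path dumpOnIncident(Path target) {
        try {
            recording.dump(target);
            return target;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    void close() {
        recording.close();
    }

    public static void main(String[] args) throws InterruptedException {
        FlightRecorderRing ring = new FlightRecorderRing(250L * 1024 * 1024, Duration.ofMinutes(10));
        Thread.sleep(2_000); // stand-in for the application running normally
        System.out.println("dumped to " + ring.dumpOnIncident(Path.of("incident.jfr")));
        ring.close();
    }
}
```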

VisualVM

VisualVM is a free, GUI-based profiler that connects to a local or remote JVM over JMX. It is installed separately from the JDK (visualvm.github.io) and is the most accessible starting point for developers who want a visual, real-time view of a running process.

What VisualVM gives you at a glance:

Overview — JVM flags, system properties, uptime, PID.
Monitor — live CPU %, heap used/committed, thread count, class count. Instantly shows whether you have a heap growth trend or a CPU spike.
Threads — live thread timeline, thread dump on demand. Deadlocks are detected and highlighted.
Sampler — low-overhead CPU and memory sampling. CPU sampling shows a hot method table; memory sampling shows live instances and bytes per class. Use the Sampler for quick investigations that do not require instrumenting any code.
Profiler — instrumentation mode (every method entry/exit is counted). More accurate but adds overhead — avoid on production traffic.
Heap Dump — trigger or import a heap dump and browse object counts, retained sizes, and reference chains.
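Much of what the Monitor and Threads tabs display is available from the JVM's standard platform MXBeans. As a rough illustration, and not a description of how VisualVM is implemented, the sketch below reads a few of the same numbers from inside the current process; the class name MonitorSnapshot is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;
import java.lang.management.ThreadMXBean;

final class MonitorSnapshot {

    /** Returns a one-line summary similar to the numbers on a monitoring dashboard. */
    static String snapshot() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        long[] deadlocked = threads.findDeadlockedThreads(); // null when no deadlock exists
        return String.format(
                "heapUsed=%d MB, heapCommitted=%d MB, threads=%d, loadedClasses=%d, deadlockedThreads=%d",
                heap.getUsed() >> 20,
                heap.getCommitted() >> 20,
                threads.getThreadCount(),
                ManagementFactory.getClassLoadingMXBean().getLoadedClassCount(),
                deadlocked == null ? 0 : deadlocked.length);
    }

    public static void main(String[] args) {
        System.out.println(snapshot());
    }
}
```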

Instrumentation profiling adds non-trivial overhead. Instrumenting a large application can slow it by 5–20x. Use the Sampler tab for production-like investigation; reserve the Profiler tab for local micro-investigations on isolated code paths.

To connect to a remote process, start the JVM with JMX enabled:

java -Dcom.sun.management.jmxremote \
     -Dcom.sun.management.jmxremote.port=9010 \
     -Dcom.sun.management.jmxremote.rmi.port=9010 \
     -Dcom.sun.management.jmxremote.authenticate=false \
     -Dcom.sun.management.jmxremote.ssl=false \
     -jar myapp.jar

In VisualVM, choose File → Add JMX Connection and enter host:9010.

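Never expose unauthenticated JMX to the internet. The example above is safe only inside a trusted network or behind an SSH tunnel. For SSH tunnelling, the RMI data port must be pinned to a known value (-Dcom.sun.management.jmxremote.rmi.port=9010), otherwise the JVM picks a random second port that the tunnel does not forward, and the server should advertise itself as localhost (-Djava.rmi.server.hostname=localhost). Then run ssh -L 9010:localhost:9010 user@host and connect to localhost:9010 in VisualVM.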
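The host:9010 address entered in VisualVM corresponds to a standard JMX service URL, which any Java client can use. The sketch below assumes the unauthenticated setup shown earlier and a hypothetical class name RemoteJvmProbe; it connects, reads heap usage and thread count from the remote JVM, and disconnects.

```java
import javax.management.MBeanServerConnection;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.ThreadMXBean;

final class RemoteJvmProbe {

    /** Builds the conventional RMI-based JMX service URL for host:port. */
    static String serviceUrl(String host, int port) {
        return "service:jmx:rmi:///jndi/rmi://" + host + ":" + port + "/jmxrmi";
    }

    public static void main(String[] args) throws Exception {
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 9010;

        JMXServiceURL url = new JMXServiceURL(serviceUrl(host, port));
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection conn = connector.getMBeanServerConnection();

            // Proxies expose the remote JVM's MXBeans through the usual interfaces.
            MemoryMXBean memory = ManagementFactory.newPlatformMXBeanProxy(
                    conn, ManagementFactory.MEMORY_MXBEAN_NAME, MemoryMXBean.class);
            ThreadMXBean threads = ManagementFactory.newPlatformMXBeanProxy(
                    conn, ManagementFactory.THREAD_MXBEAN_NAME, ThreadMXBean.class);

            System.out.println("remote heap used: "
                    + (memory.getHeapMemoryUsage().getUsed() >> 20) + " MB");
            System.out.println("remote threads:   " + threads.getThreadCount());
        }
    }
}
```

With authentication enabled, credentials would be passed in the environment map of JMXConnectorFactory.connect; that variant is not shown here.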

Reading a Heap Dump

A heap dump is a snapshot of the objects in the JVM heap at a single point in time. Depending on the options used, it contains either only live (reachable) objects or unreachable ones as well. It is the definitive tool for diagnosing memory leaks: everything on the heap is visible, along with reference chains that explain why objects are alive.

Capturing a heap dump

# via jmap (attaches to a running process)
jmap -dump:format=b,file=heap.hprof <pid>

# on OutOfMemoryError (safest — automatic, zero extra tooling needed)
java -XX:+HeapDumpOnOutOfMemoryError \
     -XX:HeapDumpPath=/tmp/oom-heap.hprof \
     -jar myapp.jar

# via jcmd
jcmd <pid> GC.heap_dump /tmp/heap.hprof

Always use -XX:+HeapDumpOnOutOfMemoryError in production. If the app ever OOMs you get the dump automatically, which is often the only chance to know what was consuming the heap at that exact moment.
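A dump can also be triggered from inside the application, for example from an admin endpoint. The sketch below relies on the HotSpot-specific com.sun.management.HotSpotDiagnosticMXBean, so it is an assumption that the application runs on a HotSpot-based JVM; the class name HeapDumper and the default file name are illustrative.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Files;
import java.nio.file.Path;

final class HeapDumper {

    /**
     * Writes a heap dump to target, which should end in ".hprof".
     * With liveOnly=true only reachable objects are written.
     */
    static Path dump(Path target, boolean liveOnly) {
        try {
            Files.deleteIfExists(target); // dumpHeap fails if the file already exists
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            bean.dumpHeap(target.toString(), liveOnly);
            return target;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Path target = Path.of(args.length > 0 ? args[0] : "heap.hprof");
        System.out.println("heap dump written to " + dump(target, true).toAbsolutePath());
    }
}
```

Like jmap and jcmd, this pauses the application while the dump is written, so it is best kept behind an authenticated operation.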

Analysing the dump

Open the .hprof file in Eclipse Memory Analyzer Tool (MAT) or VisualVM's Heap Dump viewer. The key concepts to understand:

Shallow size — the memory used by the object itself (its header and fields), not counting anything it references. Useful for understanding object layout.
Retained size — the memory that would be freed if this object were garbage collected, i.e., the object plus everything exclusively reachable through it. This is what matters for leak analysis.
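Dominator tree — object X dominates object Y if every reference path from the GC roots to Y passes through X. In the dominator tree each object's parent is its immediate dominator, so an object's retained size is the total shallow size of its subtree. The top of the dominator tree shows the biggest memory consumers; these are usually where leaks originate.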
GC roots — the starting points from which the GC traces reachability: static fields, thread stacks, JNI references. An object is alive because a chain of references leads to it from a GC root. The leak is the link that should have been cleared.
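The relationship between shallow size, retained size, and reachability from a GC root can be made concrete on a toy object graph. The sketch below is a brute-force illustration with a single root and invented object names and sizes; it is not how MAT computes retained sizes on real dumps.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class RetainedSizeDemo {
    private final Map<String, List<String>> refs = new HashMap<>();
    private final Map<String, Long> shallow = new HashMap<>();

    /** Registers an object with its shallow size and outgoing references. */
    void object(String name, long shallowBytes, String... outgoing) {
        shallow.put(name, shallowBytes);
        refs.put(name, List.of(outgoing));
    }

    /** Objects reachable from root when 'removed' (may be null) is taken out of the graph. */
    private Set<String> reachable(String root, String removed) {
        Set<String> seen = new HashSet<>();
        Deque<String> stack = new ArrayDeque<>();
        if (!root.equals(removed)) stack.push(root);
        while (!stack.isEmpty()) {
            String current = stack.pop();
            if (!seen.add(current)) continue;
            for (String next : refs.getOrDefault(current, List.of())) {
                if (!next.equals(removed)) stack.push(next);
            }
        }
        return seen;
    }

    /** Retained size = shallow sizes of everything that becomes unreachable without target. */
    long retainedSize(String root, String target) {
        Set<String> before = reachable(root, null);
        Set<String> after = reachable(root, target);
        long total = 0;
        for (String name : before) {
            if (!after.contains(name)) total += shallow.get(name);
        }
        return total;
    }

    /** Toy heap: a cache holding two entries, one of which shares an object with the root. */
    static RetainedSizeDemo sample() {
        RetainedSizeDemo heap = new RetainedSizeDemo();
        heap.object("root", 16, "cache", "shared");
        heap.object("cache", 32, "entry1", "entry2");
        heap.object("entry1", 100, "shared");
        heap.object("entry2", 100);
        heap.object("shared", 50);
        return heap;
    }

    public static void main(String[] args) {
        RetainedSizeDemo heap = sample();
        for (String name : List.of("cache", "entry1", "shared")) {
            System.out.println(name + " retains " + heap.retainedSize("root", name) + " bytes");
        }
    }
}
```

In this graph the cache has a shallow size of 32 bytes but retains 232, because both entries are reachable only through it. The shared object is not counted because the root also references it directly.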

In MAT, run the Leak Suspects Report first. It auto-detects accumulator objects (collections or caches that hold tens of thousands of entries) and traces them back to the GC root. You then look at the reference chain to understand which component holds the reference and why it was never released.
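For orientation, this is the kind of accumulator such a report tends to surface. The example is deliberately buggy and entirely hypothetical (class SessionRegistry and its methods are invented): a static map gains an entry on every login and never loses one, so in a heap dump the map would sit near the top of the dominator tree with a reference chain leading back to the static field.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

final class SessionRegistry {
    // A static field is reachable from a GC root, so every entry stays alive.
    private static final Map<Long, byte[]> SESSIONS = new ConcurrentHashMap<>();
    private static final AtomicLong NEXT_ID = new AtomicLong();

    static long login() {
        long id = NEXT_ID.incrementAndGet();
        SESSIONS.put(id, new byte[1024]); // per-session state
        return id;
    }

    static void logout(long id) {
        // BUG (deliberate): the entry is never removed, so the map only grows.
        // The fix would be: SESSIONS.remove(id);
    }

    static int retainedSessions() {
        return SESSIONS.size();
    }

    /** Simulates users logging in and out, then reports how many sessions are still held. */
    static int simulate(int users) {
        for (int i = 0; i < users; i++) {
            logout(login());
        }
        return retainedSessions();
    }

    public static void main(String[] args) {
        System.out.println("sessions still retained: " + simulate(10_000));
    }
}
```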

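Large heap dumps take time and disk space. A 4 GB heap produces a .hprof file of roughly 4 GB (uncompressed). As a rough rule of thumb, MAT may need a Java heap of about 1–1.5x the dump size to analyse it, although the real requirement depends mostly on the number of objects. Use the 64-bit MAT binary and raise its heap in MemoryAnalyzer.ini (for example -Xmx6g) when analysing a dump of that size. VisualVM's built-in heap viewer works better for smaller dumps (under 1 GB).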

Choosing the Right Tool

JFR + JMC — always-on production profiling, accurate timing, minimal overhead, rich event ecosystem. First choice for production investigations.
VisualVM — fast, visual, great for local development and quick checks on staging. Easy to share findings as screenshots.
Heap dump + MAT/VisualVM — diagnosing memory leaks and understanding live object graphs. Not a real-time tool; used reactively.

A mature performance workflow combines all three: run JFR continuously, use VisualVM for live exploration during development, and have a heap dump ready to capture automatically on OOM.