JVM Internals & Performance

The JIT Compiler

15 min Lesson 5 of 13

The JVM starts by interpreting bytecode — executing one instruction at a time without compiling the whole program. This is flexible and fast to start, but it is significantly slower than natively compiled code for long-running workloads. The Just-In-Time (JIT) compiler bridges this gap: it observes which methods are executed most, compiles them to optimised native machine code on the fly, and replaces the interpreter path with the compiled version. Understanding how the JIT works lets you write code that the JIT can optimise aggressively and lets you diagnose the rare cases where JIT behaviour surprises you.

Interpretation vs. Compilation

When the JVM loads a class, methods start life as bytecode. The interpreter processes each bytecode opcode one by one. This is cheap to start (no compilation latency, small footprint) but CPU-intensive for hot paths, because the interpretation overhead is paid again on every instruction, every time the path runs.

The JIT compiler (in HotSpot: C1 and C2) watches method invocation and back-edge (loop iteration) counts. Once a counter passes a threshold it triggers compilation:

  • C1 (client compiler) — fast, lightly optimised compilation. Used first (tiered compilation tier 1–3) to replace the interpreter quickly with code that still collects profiling data.
  • C2 (server compiler) — slow, heavily optimised compilation. Fires at tier 4 for the hottest methods. Produces code that can rival hand-written C++.
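
Tiered Compilation (default since Java 8): Tier 0 is the interpreter, tier 1 is C1 without profiling, tiers 2 and 3 are C1 with limited and full profiling, and tier 4 is C2. A method does not have to visit every tier: the usual path is tier 0 to tier 3 to tier 4, trivial methods often stop at tier 1, and tier 2 is used mainly while C2 is busy. You do not need to choose; the JVM moves each method between tiers automatically based on its counters and profile.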

Hot Spots — How the JVM Finds Them

The JVM uses two independent counters per method:

  • Invocation counter — incremented every time the method is called.
  • Back-edge counter — incremented on every loop iteration within the method (enables On-Stack Replacement).

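In the classic non-tiered configuration, a method is enqueued for JIT compilation once its counters cross -XX:CompileThreshold (default 10,000 for the server compiler). With tiered compilation, which is the default, that flag is ignored and each tier has its own thresholds, such as -XX:Tier3InvocationThreshold and -XX:Tier4InvocationThreshold, so C1 compilation typically starts after a few hundred invocations and C2 after several thousand. In non-tiered mode the counters also decay over time, so a method that is called only occasionally may never accumulate enough counts to be compiled.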
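
To see which thresholds a particular JVM is really using, the flags can be read at run time. The following is a minimal sketch that assumes a HotSpot JVM, since it relies on the HotSpot-specific com.sun.management.HotSpotDiagnosticMXBean. The class name JitThresholds and the flag() helper are illustrative. The flag names other than CompileThreshold are HotSpot's tiered-compilation thresholds; their presence and defaults vary by build, so a missing flag is reported as n/a rather than treated as an error.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

/** Prints a few compilation thresholds as seen by the running JVM (HotSpot only). */
class JitThresholds {

    /** Returns the current value of a -XX flag, or "n/a" if this JVM does not expose it. */
    static String flag(String name) {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            return bean == null ? "n/a" : bean.getVMOption(name).getValue();
        } catch (IllegalArgumentException unknownFlag) {
            // getVMOption throws this when the flag does not exist in this JVM
            return "n/a";
        }
    }

    public static void main(String[] args) {
        String[] names = {
            "TieredCompilation",
            "CompileThreshold",
            "Tier3InvocationThreshold",
            "Tier3BackEdgeThreshold",
            "Tier4InvocationThreshold",
            "Tier4BackEdgeThreshold"
        };
        for (String name : names) {
            System.out.printf("%-26s = %s%n", name, flag(name));
        }
    }
}
```

The same values are listed by java -XX:+PrintFlagsFinal -version; reading them in-process is mainly useful when you cannot control how the JVM was launched.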

On-Stack Replacement (OSR) is the JIT mechanism that replaces a method while it is still executing. This matters for long loops: the JVM can compile and switch to the native version of a loop body mid-execution, without waiting for the next call.

    // This loop becomes an OSR candidate very quickly
    public long sumToN(int n) {
        long sum = 0;
        for (int i = 1; i <= n; i++) {
            sum += i; // back-edge counter incremented here
        }
        return sum;
    }
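
One way to watch OSR happen is to call such a method exactly once with a large argument, so that the invocation counter never gets a chance to trigger compilation. The sketch below wraps the loop in a hypothetical OsrDemo class; the class name and the argument value are illustrative choices. If it is compiled with javac and then run with -XX:+PrintCompilation, the OSR compilations of sumToN should appear with a % marker in the output, though the exact lines depend on the JVM version.

```java
/** A single long-running call: the loop gets hot while sumToN is still on the stack. */
class OsrDemo {

    static long sumToN(int n) {
        long sum = 0;
        for (int i = 1; i <= n; i++) {
            sum += i; // each iteration takes the loop's backward branch
        }
        return sum;
    }

    public static void main(String[] args) {
        // One invocation only, so any compilation of sumToN that happens
        // during this call has to be triggered by the back-edge counter.
        long result = sumToN(2_000_000_000);
        System.out.println(result);
    }
}
```

Compiling first (javac OsrDemo.java, then java -XX:+PrintCompilation OsrDemo) keeps the compiler's own activity out of the log.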

Inlining — The Most Important JIT Optimisation

Method inlining replaces a call site with the body of the called method. It eliminates the call overhead (stack frame creation, argument passing, return), but its real power is that it exposes the inlined code to the surrounding context, enabling further optimisations like constant folding, dead code elimination, and loop unrolling that would otherwise be impossible across method boundaries.

    // Before inlining (what you write)
    public int add(int a, int b) {
        return a + b;
    }

    public int compute(int x) {
        return add(x, 10) * 2; // call site
    }

    // After inlining (what the JIT sees internally, not real bytecode):
    // compute(x) becomes (x + 10) * 2
    // If x is a known constant, constant folding then computes the result at compile time.

The JIT inlines based on heuristics around bytecode size and call frequency. Key limits in HotSpot:

  • -XX:MaxInlineSize=35: methods whose bytecode is at most 35 bytes long are inlining candidates even at call sites that are not especially hot.
  • -XX:FreqInlineSize=325: at hot call sites, methods of up to 325 bytes of bytecode may also be inlined (the default is platform-dependent).
  • -XX:MaxRecursiveInlineLevel: controls recursive inlining depth.

Write small, focused methods. This is idiomatic Java and it directly enables aggressive inlining. A method whose bytecode runs far past FreqInlineSize, as a 400-line method almost certainly does, will not be inlined into its callers. Breaking it into smaller helpers that stay under the size thresholds gives the JIT the chance to inline them and optimise the combined code, subject to its other limits such as inlining depth.
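
A common way to apply this advice is to keep the frequently executed path of a method tiny and move the rarely executed work into a separate method. The sketch below uses a hypothetical IntList class (not a JDK type) to show the shape: add() stays a few bytecodes long, while the resize logic lives in grow(). Whether either method is actually inlined remains the JIT's decision; the split only keeps add() comfortably under the size limits. The OpenJDK source of ArrayList.add documents a similar split for the same reason.

```java
import java.util.Arrays;

/** Sketch: keep the hot path small and push the rare path into its own method. */
class IntList {
    private int[] data = new int[16];
    private int size;

    /** Hot path: one comparison and one store in the common case. */
    void add(int value) {
        if (size == data.length) {
            grow(); // rare, kept out of line so add() stays small
        }
        data[size++] = value;
    }

    /** Cold path: runs only when the backing array is full. */
    private void grow() {
        data = Arrays.copyOf(data, data.length * 2);
    }

    int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return data[index];
    }

    int size() {
        return size;
    }

    public static void main(String[] args) {
        IntList list = new IntList();
        for (int i = 0; i < 1_000; i++) {
            list.add(i);
        }
        System.out.println(list.size() + " elements, last = " + list.get(list.size() - 1));
    }
}
```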

Devirtualisation and Speculative Optimisations

Virtual dispatch (calling an interface or overridden method) prevents naive inlining because the JIT does not know at compile time which concrete class will be used. The JIT uses the profiling data collected by the interpreter and by C1-compiled code to make speculative assumptions:

  • Monomorphic call site: only one concrete type observed. The JIT inlines that type's implementation behind a type guard. If the guard later fails because a different type arrives, the compiled code is deoptimised: execution returns to the interpreter and the method may be recompiled with the updated profile.
  • Bimorphic call site: two concrete types observed. The JIT emits an if/else over the two inlined bodies.
  • Megamorphic call site: three or more types observed. The JIT generally gives up on inlining and emits a real virtual or interface call through the dispatch table. In a hot loop this can be a significant performance cliff.

    // Monomorphic: the JIT can inline.
    // The list is always an ArrayList, so the JIT inlines ArrayList iteration.
    List<String> list = new ArrayList<>();
    for (String s : list) {
        process(s);
    }

    // Megamorphic: the JIT cannot inline 'compute'
    interface Transformer {
        int compute(int x);
    }

    void run(Transformer[] transformers, int input) {
        for (Transformer t : transformers) {
            t.compute(input); // many different implementations: megamorphic, no inlining
        }
    }
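
The difference between call-site shapes can be made visible with a rough experiment. The sketch below is an illustration, not a benchmark: the four Transformer implementations, the CallSiteShapes class, and the loop sizes are all made up for the example. runMono and runMega are deliberately two textually identical methods, so that each has its own call site and therefore its own type profile. On a typical HotSpot JVM the later passes tend to show the megamorphic loop running slower, but the numbers vary by machine and JVM version and should not be read as measurements.

```java
/** Rough comparison of a monomorphic and a megamorphic call site. Not a benchmark. */
class CallSiteShapes {

    interface Transformer {
        int compute(int x);
    }

    static final class AddOne implements Transformer { public int compute(int x) { return x + 1; } }
    static final class Twice  implements Transformer { public int compute(int x) { return x * 2; } }
    static final class Negate implements Transformer { public int compute(int x) { return -x; } }
    static final class Square implements Transformer { public int compute(int x) { return x * x; } }

    /** Call site that will only ever see one receiver type. */
    static long runMono(Transformer[] ts, int rounds) {
        long sum = 0;
        for (int r = 0; r < rounds; r++) {
            for (Transformer t : ts) {
                sum += t.compute(r);
            }
        }
        return sum;
    }

    /** Same code, separate method, so it gets its own type profile. */
    static long runMega(Transformer[] ts, int rounds) {
        long sum = 0;
        for (int r = 0; r < rounds; r++) {
            for (Transformer t : ts) {
                sum += t.compute(r);
            }
        }
        return sum;
    }

    /** Fills an array of length n by cycling through the given implementations. */
    static Transformer[] fill(int n, Transformer... kinds) {
        Transformer[] out = new Transformer[n];
        for (int i = 0; i < n; i++) {
            out[i] = kinds[i % kinds.length];
        }
        return out;
    }

    public static void main(String[] args) {
        Transformer[] mono = fill(1024, new AddOne());
        Transformer[] mega = fill(1024, new AddOne(), new Twice(), new Negate(), new Square());
        for (int pass = 1; pass <= 5; pass++) { // early passes double as warm-up
            long t0 = System.nanoTime();
            long a = runMono(mono, 20_000);
            long t1 = System.nanoTime();
            long b = runMega(mega, 20_000);
            long t2 = System.nanoTime();
            System.out.printf("pass %d: mono %d ms, mega %d ms (sums %d, %d)%n",
                    pass, (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, a, b);
        }
    }
}
```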

Micro-benchmarks and JIT warm-up: C1 compilation begins after a few hundred invocations, but fully optimised C2 code typically appears only after thousands. Naively timing the first few calls of a method measures interpreter performance, not JIT performance. Always use a proper benchmarking harness (JMH) that includes warm-up iterations. Failure to warm up is one of the most common reasons for misleading Java benchmark results.
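
The warm-up effect is easy to see by timing the same workload several times in one process. The sketch below is a crude illustration and not a substitute for JMH; the WarmupDemo class, the mix() function, and the round and iteration counts are arbitrary choices for the example. On most JVMs the first round or two are noticeably slower than the rest, but the exact timings depend on the machine and JVM.

```java
/** Times the same workload repeatedly so that warm-up becomes visible. Not a benchmark. */
class WarmupDemo {

    /** Small arithmetic helper, likely to be inlined once the loop is hot. */
    static int mix(int x) {
        return (x ^ (x >>> 3)) * 31 + 7;
    }

    static long workload(int n) {
        long acc = 0;
        for (int i = 0; i < n; i++) {
            acc += mix(i);
        }
        return acc;
    }

    public static void main(String[] args) {
        long sink = 0; // consume results so the work cannot be discarded
        for (int round = 1; round <= 10; round++) {
            long start = System.nanoTime();
            sink += workload(1_000_000);
            long micros = (System.nanoTime() - start) / 1_000;
            System.out.printf("round %2d: %d us%n", round, micros);
        }
        System.out.println("checksum " + sink);
    }
}
```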

Escape Analysis and Scalar Replacement

The JIT performs escape analysis to determine whether an object's reference can ever leave the current method or thread. If an object does not escape, the JIT can:

  • Avoid the heap allocation: no allocation means no GC pressure. This is often described as stack allocation, although HotSpot's C2 does not place whole objects on the stack.
  • Scalar-replace it: decompose the object into its fields and hold them in CPU registers or stack slots, eliminating object overhead completely. This is the mechanism HotSpot actually uses.

    // Point does NOT escape, so the JIT may scalar-replace it
    public double distanceFromOrigin(double x, double y) {
        Point p = new Point(x, y); // candidate for scalar replacement: no heap allocation needed
        return Math.sqrt(p.x * p.x + p.y * p.y);
    }

    // Point DOES escape this method, so it must be heap-allocated here
    public Point createPoint(double x, double y) {
        return new Point(x, y); // returned to the caller
    }
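
Whether scalar replacement actually happens can be checked indirectly by measuring how many bytes a thread allocates around a hot loop. The sketch below assumes a HotSpot JVM, because it uses the HotSpot-specific com.sun.management.ThreadMXBean; the EscapeDemo class and the allocatedBytes() helper are illustrative names. If escape analysis succeeds, the later rounds should report far fewer allocated bytes than the first. Running the same program with -XX:-DoEscapeAnalysis is one way to compare, though the outcome depends on the JVM version and flags.

```java
import java.lang.management.ManagementFactory;

/** Measures per-thread allocation around a loop that creates non-escaping objects. */
class EscapeDemo {

    static final class Point {
        final double x;
        final double y;
        Point(double x, double y) { this.x = x; this.y = y; }
    }

    static double distanceFromOrigin(double x, double y) {
        Point p = new Point(x, y); // never leaves this method
        return Math.sqrt(p.x * p.x + p.y * p.y);
    }

    /** Bytes allocated so far by the current thread, or -1 if the JVM cannot report it. */
    static long allocatedBytes() {
        java.lang.management.ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        if (bean instanceof com.sun.management.ThreadMXBean) {
            com.sun.management.ThreadMXBean hs = (com.sun.management.ThreadMXBean) bean;
            if (hs.isThreadAllocatedMemorySupported() && hs.isThreadAllocatedMemoryEnabled()) {
                return hs.getThreadAllocatedBytes(Thread.currentThread().getId());
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        double sink = 0;
        for (int round = 1; round <= 5; round++) {
            long before = allocatedBytes();
            for (int i = 0; i < 2_000_000; i++) {
                sink += distanceFromOrigin(i, i + 1);
            }
            long after = allocatedBytes();
            // Printing happens outside the measured window on purpose
            System.out.printf("round %d: %d bytes allocated%n", round, after - before);
        }
        System.out.println("checksum " + sink);
    }
}
```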

Observing JIT Behaviour

You can observe what the JIT is doing without a profiler:

    # Print every method compilation event
    java -XX:+PrintCompilation -jar myapp.jar

    # Sample output:
    #   176    1       3       java.lang.String::hashCode (55 bytes)
    # Columns: timestamp (ms) | compile-id | tier | method | bytecode size
    # A trailing 'made not entrant' means that compiled version was invalidated,
    # for example because a speculative assumption failed or a higher-tier version replaced it.

    # Print inlining decisions (very verbose; use in a test run, not prod).
    # PrintInlining is a diagnostic flag, so it has to be unlocked first.
    java -XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining -jar myapp.jar
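
Compilation activity can also be read from inside the process through the standard java.lang.management API, which is handy when you cannot change the launch flags. The following is a small sketch; the JitTime class, its helper method, and the sample workload are illustrative. It reports only the total time spent compiling, not which methods were compiled, and a JVM that does not support the measurement yields -1.

```java
import java.lang.management.CompilationMXBean;
import java.lang.management.ManagementFactory;

/** Reports how much time the JIT has spent compiling, as seen from inside the JVM. */
class JitTime {

    /** Total JIT compilation time in milliseconds, or -1 if the JVM does not report it. */
    static long totalCompileMillis() {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        if (jit == null || !jit.isCompilationTimeMonitoringSupported()) {
            return -1;
        }
        return jit.getTotalCompilationTime();
    }

    public static void main(String[] args) {
        CompilationMXBean jit = ManagementFactory.getCompilationMXBean();
        System.out.println("JIT: " + (jit == null ? "none" : jit.getName()));

        long before = totalCompileMillis();
        long sink = 0;
        for (int i = 0; i < 50_000_000; i++) {
            sink += Integer.bitCount(i); // arbitrary work to give the JIT something to compile
        }
        long after = totalCompileMillis();

        System.out.println("checksum " + sink);
        System.out.println("JIT time during loop: " + (after - before) + " ms");
    }
}
```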

GraalVM Native Image vs. JIT: GraalVM can compile ahead-of-time (AOT) to a native binary, eliminating warm-up, typically at the cost of peak throughput because there is no runtime profiling feedback by default. For microservices with fast startup requirements, native compilation is compelling. For long-running servers that can afford warm-up, the JIT's profile-guided optimisations typically produce higher sustained throughput than AOT.

Practical Takeaways

  • Keep methods small to stay within inlining thresholds — good OOP design and JIT-friendliness align naturally.
  • Prefer monomorphic or bimorphic call sites in hot loops; avoid storing many different concrete types behind the same interface variable inside a tight loop.
  • Let the JVM warm up before measuring performance. Use JMH for any serious benchmarking.
  • Trust the JIT first; reach for manual micro-optimisations (e.g. avoiding autoboxing) only after profiling confirms a real bottleneck.
  • Use -XX:+PrintCompilation during testing to verify that your critical paths are being compiled at tier 4.

In the next lesson we will look at how to measure and benchmark performance correctly so that JIT warm-up, dead-code elimination, and other JIT effects do not produce misleading results.