How the JVM Works
Every Java developer knows the slogan "write once, run anywhere," but what actually makes that possible? The Java Virtual Machine (JVM) is the engine between your source code and the host operating system. Understanding its internals — class loading, bytecode interpretation, and the runtime data areas — lets you reason confidently about startup behaviour, memory pressure, and the moments when the JIT kicks in.
From Source to Execution: the Big Picture
The journey from a .java file to running instructions has three distinct phases:
- Compilation: javac compiles .java files to .class files containing bytecode, a compact, platform-neutral instruction set.
- Class Loading: at runtime the JVM reads .class files on demand, verifies them, and prepares their memory representation.
- Execution: the bytecode is either interpreted directly by the interpreter or compiled to native machine code by the Just-In-Time (JIT) compiler. (JIT is covered fully in Lesson 5; here we focus on the interpreter and the runtime data areas.)
A .class file contains instructions such as invokevirtual and iload_1, which the JVM specification defines. They are meaningless to the CPU until the JVM translates them, either by interpreting them or by JIT-compiling them to native instructions.
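As a rough illustration of what such instructions look like, the sketch below pairs two tiny methods with the bytecode javac typically emits for them. The class and method names (BytecodeDemo, add, length) are hypothetical, and the listings in the comments are abbreviated; the exact output can differ slightly between compiler versions.

```java
// Hypothetical example class. The bytecode in the comments is what javac
// typically emits, shown in abbreviated form.
class BytecodeDemo {

    // static int add(int, int):
    //   iload_0    // push parameter a (local slot 0) onto the operand stack
    //   iload_1    // push parameter b (local slot 1)
    //   iadd       // pop both values, push their sum
    //   ireturn    // return the int on top of the operand stack
    static int add(int a, int b) {
        return a + b;
    }

    // static int length(String):
    //   aload_0                                    // push the reference s
    //   invokevirtual java/lang/String.length()I   // virtual call on that reference
    //   ireturn
    static int length(String s) {
        return s.length();
    }

    public static void main(String[] args) {
        System.out.println(add(2, 3));
        System.out.println(length("jvm"));
    }
}
```

None of these instructions names a CPU register or a machine address; they only refer to local variable slots and the operand stack, which is what keeps the format platform-neutral.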
The Class Loading Subsystem
Class loading happens lazily: a class is loaded the first time something references it. The subsystem has three stages:
- Loading: a ClassLoader locates the binary .class data (from the file system, a JAR, the network, or generated at runtime) and creates a java.lang.Class object in the Heap.
- Linking
  - Verification: the bytecode verifier checks structural constraints such as correct operand types, no stack underflow, and valid jump targets. This is the security gate that prevents malformed bytecode from corrupting the JVM.
  - Preparation: static fields are allocated and given default values (0, null, false). Your initializers have not run yet.
  - Resolution: symbolic references (e.g., "class com.example.Order") are replaced with direct memory references. Resolution may happen eagerly or lazily depending on the JVM implementation.
- Initialization: the class's <clinit> method runs, executing static initializers and static field assignments in textual order. This is when a static field with a non-constant initializer is actually set. A compile-time constant such as static final String VERSION = "1.0"; is the exception: javac records it in the field's ConstantValue attribute and inlines it at use sites, so <clinit> does not assign it.
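The initialization stage can be observed from ordinary code. The sketch below is a minimal illustration with hypothetical names (InitOrderDemo, Config, EVENTS): it records what runs while a nested class is initialized, and contrasts reading a compile-time constant, which the Java Language Specification does not treat as a trigger for initialization, with reading an ordinary static field.

```java
import java.util.ArrayList;
import java.util.List;

class InitOrderDemo {
    // Records what happens while Config is being initialized.
    static final List<String> EVENTS = new ArrayList<>();

    static class Config {
        // Compile-time constant: javac inlines it at use sites, so reading it
        // does not initialize Config.
        static final String VERSION = "1.0";

        // Non-constant statics are assigned inside <clinit>, in textual order.
        static int counter = initCounter();

        static {
            EVENTS.add("static block");
        }

        private static int initCounter() {
            EVENTS.add("counter initializer");
            return 42;
        }
    }

    public static void main(String[] args) {
        System.out.println("VERSION = " + Config.VERSION);
        System.out.println("events after reading VERSION: " + EVENTS);
        System.out.println("counter = " + Config.counter);
        System.out.println("events after reading counter: " + EVENTS);
    }
}
```

On a fresh run the first events line should be empty, because Config has not been initialized yet. The second should list the counter initializer before the static block, matching their textual order in the class.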
The ClassLoader Delegation Model
ClassLoaders form a parent-child hierarchy and follow the delegation model: before loading a class themselves, they ask their parent. The standard chain is:
- Bootstrap ClassLoader: built into the JVM (native code); loads java.lang, java.util, and the rest of the core JDK modules.
- Platform ClassLoader (Java 9+, formerly Extension): loads optional JDK modules.
- Application ClassLoader: loads classes from the application classpath and modulepath.
- Custom ClassLoaders: frameworks like OSGi, servlet containers, and hot-reload tools add their own layers.
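The chain can be inspected at runtime by following getParent() from the loader that defined a class. The sketch below is a minimal illustration, assuming Java 9 or later (for ClassLoader.getName()) and a JVM such as HotSpot that represents the bootstrap loader as null; LoaderChainDemo and chainOf are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

class LoaderChainDemo {

    // Walks from the loader that defined `type` up through its parents.
    static List<String> chainOf(Class<?> type) {
        List<String> chain = new ArrayList<>();
        ClassLoader loader = type.getClassLoader();
        while (loader != null) {
            // getName() (Java 9+) may return null for unnamed loaders.
            String name = loader.getName();
            chain.add(name != null ? name : loader.getClass().getName());
            loader = loader.getParent();
        }
        // The bootstrap loader has no ClassLoader object; the API reports null.
        chain.add("bootstrap");
        return chain;
    }

    public static void main(String[] args) {
        System.out.println(chainOf(String.class));
        System.out.println(chainOf(LoaderChainDemo.class));
    }
}
```

When launched from the command line on a recent JDK, the second line typically shows the application and platform loaders ahead of the bootstrap placeholder. Inside a container or build tool, additional custom loaders may appear at the front.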
If an object outside a custom ClassLoader keeps a reference to anything that ClassLoader loaded (e.g., a ThreadLocal holding a class instance), that ClassLoader, and every class it loaded, cannot be garbage collected after the application is undeployed. The symptom is OutOfMemoryError: Metaspace after several hot redeployments.
Runtime Data Areas
The JVM specification defines six runtime data areas. Knowing which data lives where is the foundation of all performance and memory-leak analysis.
1. Program Counter (PC) Register
Each thread has its own PC register holding the address of the bytecode instruction currently being executed. For native methods the PC is undefined. This is one of the smallest, most invisible areas — you will never tune it — but it is what makes concurrent threads possible: each thread knows its own position.
2. JVM Stack (Thread Stack)
Each thread has its own JVM Stack. Every method call pushes a stack frame onto it; the frame is popped when the method returns or throws. A frame holds:
- Local variable array — slots for all local variables and method parameters. Primitives and object references are stored here (not the objects themselves).
- Operand stack — a work area the bytecode interpreter uses to evaluate expressions (think of it as a CPU register file, but stack-based).
- Frame data — a reference to the runtime constant pool for the class, and return-address information.
The default stack size is JVM-implementation-specific (typically 256 KB – 1 MB per thread). Tune it with -Xss if you need deeper recursion, but prefer iterative algorithms or trampolining first.
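To get a feel for how many frames fit on a thread's stack, the sketch below recurses until the JVM throws StackOverflowError and then contrasts that with an iterative loop, which stays in a single frame. It is an illustration only: the class and method names are hypothetical, and the measured depth varies with the JVM, the -Xss setting, and the size of each frame.

```java
class StackDepthDemo {
    private static int depth;

    // Each call pushes one more frame; there is deliberately no base case.
    private static void recurse() {
        depth++;
        recurse();
    }

    // Returns roughly how many frames fit on the current thread's stack.
    static int measureDepth() {
        depth = 0;
        try {
            recurse();
        } catch (StackOverflowError expected) {
            // The overflowing frames have been unwound by the time we get here.
        }
        return depth;
    }

    // An iterative loop uses a single frame no matter how large n is.
    static long sumTo(int n) {
        long total = 0;
        for (int i = 1; i <= n; i++) {
            total += i;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("frames before overflow: " + measureDepth());
        System.out.println("sumTo(1_000_000) = " + sumTo(1_000_000));
    }
}
```

Running it with different -Xss values (for example -Xss512k and -Xss4m) should change the first number noticeably, while the iterative sum is unaffected.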
3. Native Method Stack
This area supports the execution of native (C/C++) methods. In HotSpot it is unified with the JVM Stack in practice.
4. Heap
The Heap is shared across all threads and holds object instances and arrays. It is the primary target of the garbage collector. Generational GC algorithms divide the Heap into generations (Young, Old/Tenured), discussed in depth in Lessons 3 and 4.
Size the Heap with -Xms (initial size) and -Xmx (maximum size). Setting them equal (-Xms4g -Xmx4g) avoids pauses while the JVM grows the Heap at runtime, a common source of latency spikes in production services.
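The effect of these flags can be checked from inside the process with the standard Runtime API. The sketch below is a minimal illustration with hypothetical names (HeapInfoDemo and its helpers); note that maxMemory() only approximately reflects -Xmx, since some collectors exclude part of the Heap from the figure.

```java
class HeapInfoDemo {

    // Upper bound the JVM will try to use for the Heap (roughly -Xmx).
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    // Heap memory currently reserved from the OS (starts near -Xms).
    static long committedHeapBytes() {
        return Runtime.getRuntime().totalMemory();
    }

    // Committed memory minus the part that is currently free.
    static long usedHeapBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        long mib = 1024L * 1024L;
        System.out.println("max       (MiB): " + maxHeapBytes() / mib);
        System.out.println("committed (MiB): " + committedHeapBytes() / mib);
        System.out.println("used      (MiB): " + usedHeapBytes() / mib);
    }
}
```

With -Xms and -Xmx set to the same value, the committed figure should start out close to the maximum instead of growing toward it as the application allocates.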
5. Method Area (Metaspace)
The Method Area stores per-class data: the runtime constant pool, field and method metadata, and the bytecode of methods and constructors. In HotSpot before Java 8 this was called PermGen (Permanent Generation) and lived inside the Java Heap. Java 8 replaced it with Metaspace, which allocates from native memory outside the Heap.
Metaspace grows dynamically by default. Set -XX:MaxMetaspaceSize to impose a ceiling; without it the JVM can consume unlimited native memory in systems that load many classes dynamically (ORMs, scripting engines, annotation processors).
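Metaspace usage can be watched through the standard java.lang.management API. The sketch below lists the non-heap memory pools the running JVM reports; on HotSpot one of them is usually named "Metaspace", but pool names are implementation-specific, so treat that name as an assumption. MetaspaceDemo and nonHeapPools are hypothetical names.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.List;
import java.util.stream.Collectors;

class MetaspaceDemo {

    // Returns the memory pools that live outside the Java Heap.
    static List<MemoryPoolMXBean> nonHeapPools() {
        return ManagementFactory.getMemoryPoolMXBeans().stream()
                .filter(pool -> pool.getType() == MemoryType.NON_HEAP)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : nonHeapPools()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            // getMax() returns -1 when no maximum is defined for the pool.
            System.out.printf("%-35s used=%d max=%d%n",
                    pool.getName(), usage.getUsed(), usage.getMax());
        }
    }
}
```

Running it with and without -XX:MaxMetaspaceSize is a quick way to confirm whether a ceiling is in effect: without the flag, the reported maximum for the metaspace pool is typically -1.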
6. Runtime Constant Pool
A per-class subset of the Method Area. It holds the symbolic and numeric constants from the class file's constant_pool table: string literals, class names, field/method descriptors. Each string literal resolves to a single interned String instance (in HotSpot since Java 7, the intern pool itself lives in the Heap); that is why "hello" == "hello" is true but new String("hello") == new String("hello") is false.
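The identity comparisons mentioned above can be checked directly. The following sketch uses hypothetical names (InternDemo and its methods) and also shows String.intern(), which returns the pooled instance for a string's contents.

```java
class InternDemo {

    // Two identical literals resolve to the same interned instance.
    static boolean literalsShareInstance() {
        String a = "hello";
        String b = "hello";
        return a == b;
    }

    // Each `new String(...)` allocates a distinct object on the Heap.
    static boolean newStringsShareInstance() {
        String a = new String("hello");
        String b = new String("hello");
        return a == b;
    }

    // intern() maps any String back to the pooled instance.
    static boolean internedMatchesLiteral() {
        return new String("hello").intern() == "hello";
    }

    public static void main(String[] args) {
        System.out.println(literalsShareInstance());    // true
        System.out.println(newStringsShareInstance());  // false
        System.out.println(internedMatchesLiteral());   // true
    }
}
```

In application code, compare strings with equals(); the == checks here exist only to make object identity visible.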
Putting It Together: a Method Call Traced
When you call order.calculateTotal() the JVM:
- Resolves the symbolic reference calculateTotal in the constant pool to a direct method reference (if not already resolved).
- Checks that order's class is loaded; if not, triggers class loading.
- Pushes a new stack frame for calculateTotal onto the calling thread's JVM Stack.
- Executes the method's bytecode, using the frame's operand stack as a work area and reading and writing the local variable array.
- When the method returns, pops the frame and places the return value onto the caller's operand stack.
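The frame push described in these steps can be made visible with StackWalker (Java 9+), which exposes the frames currently on the calling thread's stack. In the sketch below, Order is a hypothetical stand-in for the class in the example, and checkout and framesSeen are invented for illustration.

```java
import java.util.List;
import java.util.stream.Collectors;

class FrameTraceDemo {

    // Hypothetical stand-in for the Order class used in the text.
    static class Order {
        List<String> framesSeen = List.of();

        long calculateTotal() {
            // Capture the two topmost frames while this method's frame is live.
            framesSeen = StackWalker.getInstance().walk(frames -> frames
                    .limit(2)
                    .map(StackWalker.StackFrame::getMethodName)
                    .collect(Collectors.toList()));
            return 42L; // placeholder total
        }
    }

    // The caller: its frame sits directly below calculateTotal's frame.
    static List<String> checkout() {
        Order order = new Order();
        order.calculateTotal();
        return order.framesSeen;
    }

    public static void main(String[] args) {
        System.out.println(checkout());
    }
}
```

The captured list starts with calculateTotal (the top frame) followed by checkout (its caller). Once calculateTotal returns, its frame is popped and a later walk from checkout would no longer include it.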
To see the bytecode for a compiled class, run javap -c -p MyClass.class (or javap -verbose for the constant pool). This is invaluable when diagnosing boxing overhead, unexpected object allocations, or inlining decisions, subjects covered in the upcoming lessons.
Summary
The JVM converts platform-neutral bytecode to executing programs through class loading (loading → linking → initialization), a delegation-based ClassLoader hierarchy, and six runtime data areas. The JVM Stack is per-thread and frame-based; the Heap is shared and GC-managed; Metaspace holds class metadata in native memory. This mental model is the prerequisite for every performance topic that follows: GC behaviour, JIT compilation, memory leaks, and profiling all make sense only when you know where the data lives.