Project: Diagnosing a Performance Problem
This capstone lesson walks you through a complete, realistic performance investigation: a service that starts slow, degrades under load, and eventually runs out of memory. You will apply every skill from this tutorial — profiling, GC analysis, JIT awareness, and systematic benchmarking — to find the root causes and fix them.
The Scenario
A team reports that their ReportService takes 12 seconds to generate a report with 50,000 rows and that heap usage climbs with every call. Your job is to diagnose and fix the problem without guessing. The starting code looks like this:
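A minimal sketch of code with these symptoms might look like the following. The Row type, the generate method, and the field names are illustrative assumptions rather than the team's actual source; only the class name ReportService and the overall pattern (string concatenation in the inner loop, an unbounded cache keyed on rows.hashCode()) come from the scenario.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical row type. It does not override equals() or hashCode(),
// so it inherits identity-based hashing from Object.
class Row {
    final List<String> cells;

    Row(List<String> cells) {
        this.cells = cells;
    }
}

class ReportService {
    // Unbounded cache keyed on rows.hashCode(); it retains the input lists.
    final Map<Integer, List<Row>> cache = new HashMap<>();

    String generate(List<Row> rows) {
        int key = rows.hashCode();      // hashes every row on every call
        cache.putIfAbsent(key, rows);   // entries are never evicted

        StringBuilder report = new StringBuilder();
        for (Row row : rows) {
            String line = "";
            for (String cell : row.cells) {
                line += cell + ",";     // allocates a new String for every cell
            }
            report.append(line).append('\n');
        }
        return report.toString();
    }
}
```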
Step 1 — Establish a Baseline Benchmark
Before touching anything, write a JMH benchmark so you have a repeatable, JIT-warmed number to compare against.
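A baseline benchmark could be sketched roughly as follows. It assumes JMH (org.openjdk.jmh) and its annotation processor are on the build path, and that ReportService exposes a generate(List<Row>) method and Row a constructor taking a list of cells; both signatures are assumptions made for illustration. The warmup, measurement, and fork counts are starting points, not recommendations.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Level;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;

@BenchmarkMode(Mode.AverageTime)          // report average time per call
@OutputTimeUnit(TimeUnit.MILLISECONDS)    // in ms/op
@State(Scope.Benchmark)
@Warmup(iterations = 5)                   // let the JIT compile the hot paths first
@Measurement(iterations = 10)
@Fork(2)                                  // fresh JVMs reduce run-to-run bias
public class ReportBenchmark {

    private ReportService service;        // class under test (assumed API)
    private List<Row> rows;

    @Setup(Level.Trial)
    public void setUp() {
        service = new ReportService();
        rows = new ArrayList<>(50_000);
        for (int i = 0; i < 50_000; i++) {
            // Assumed constructor: Row(List<String> cells)
            rows.add(new Row(List.of("id-" + i, "name-" + i, "value-" + i)));
        }
    }

    @Benchmark
    public String generateReport() {
        // Returning the result hands it to JMH, which prevents dead-code elimination.
        return service.generate(rows);
    }
}
```

Because this sketch reuses one input list for the whole trial, it measures formatting cost but will not reproduce the heap growth seen in production, where each call presumably receives freshly built rows.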
Step 2 — Attach a Profiler and Read the Flame Graph
Run the benchmark with async-profiler (or JFR) to collect a CPU flame graph:
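If you prefer to start the recording from code rather than from the command line, the JDK's jdk.jfr API offers one way to do it. The sketch below is an assumption about how you might wire it up: the JfrCapture class and its record method are hypothetical helpers, while Recording and the jdk.ExecutionSample event are part of the JDK (11 and later).

```java
import java.io.IOException;
import java.nio.file.Path;
import java.time.Duration;

import jdk.jfr.Recording;

class JfrCapture {

    /** Runs the task while JFR samples thread stacks, then writes the recording to output. */
    static Path record(Runnable task, Path output) throws IOException {
        try (Recording recording = new Recording()) {
            // Sample executing Java threads every 10 ms; this feeds the CPU flame graph.
            recording.enable("jdk.ExecutionSample").withPeriod(Duration.ofMillis(10));
            recording.start();
            task.run();
            recording.stop();
            recording.dump(output);   // write the collected events to a .jfr file
        }
        return output;
    }
}
```

A call such as JfrCapture.record(() -> service.generate(rows), Path.of("report.jfr")) would wrap a single report generation, where service and rows stand for whatever objects your benchmark already builds.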
Open the .jfr file in JDK Mission Control. The flame graph reveals three hot paths:
- String concatenation inside the inner loop: 54% of CPU time.
- rows.hashCode() on a 50,000-element list, called on every invocation: 28% of CPU time.
- HashMap.put with growing allocations: 11% of CPU time.
Step 3 — Analyse the Heap with a Heap Dump
Trigger a heap dump after ten report generations and open it in Eclipse MAT or VisualVM:
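One way to trigger the dump from inside the running JVM is the HotSpot diagnostic MXBean. This is a sketch that assumes a HotSpot-based JDK; the HeapDumper class is a hypothetical helper, and on recent JDKs the target file name must end in .hprof and must not already exist.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.nio.file.Path;

import com.sun.management.HotSpotDiagnosticMXBean;

class HeapDumper {

    /**
     * Writes a heap dump to the given file.
     * With liveOnly = true, only objects that are still reachable are included.
     */
    static void dump(Path file, boolean liveOnly) throws IOException {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        bean.dumpHeap(file.toString(), liveOnly);
    }
}
```

Dumping only live objects keeps the file smaller and makes retained-size figures in the dominator tree easier to interpret, since unreachable garbage is excluded.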
MAT's "Dominator Tree" shows ReportService.cache retaining 480 MB. It holds List<Row> objects, not the formatted strings. The cache key is rows.hashCode(), and because Row does not override hashCode(), every freshly built list of rows hashes to a different value even when its contents are identical. The cache therefore never hits and gains a new entry on every call. This is the memory leak.
The remedy is a LinkedHashMap with a max-size eviction policy or a proper cache like Caffeine.
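The key instability can be reproduced in isolation. In the sketch below, PlainRow and ValueRow are hypothetical stand-ins: the first inherits identity-based equals() and hashCode() from Object, as Row does in the scenario, and the second overrides both.

```java
import java.util.List;
import java.util.Objects;

// Stand-in for a row type that does not override equals()/hashCode().
class PlainRow {
    final String id;

    PlainRow(String id) {
        this.id = id;
    }
}

// Stand-in for a row type with value-based equality.
class ValueRow {
    final String id;

    ValueRow(String id) {
        this.id = id;
    }

    @Override
    public boolean equals(Object other) {
        return other instanceof ValueRow && Objects.equals(id, ((ValueRow) other).id);
    }

    @Override
    public int hashCode() {
        return Objects.hash(id);
    }
}

class HashDemo {
    /** True when two lists would map to the same hashCode-based cache key. */
    static boolean sameKey(List<?> a, List<?> b) {
        return a.hashCode() == b.hashCode();
    }
}
```

Two lists of ValueRow with the same ids always produce the same key. Two lists of PlainRow with the same ids are built from distinct objects, so their keys will almost always differ and the lists never compare equal. Even with value-based hashing, an int hash code is a weak cache key because unrelated inputs can collide, which is another reason to key a real cache on the value itself.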
Step 4 — Fix the Problems One at a Time
Fix each problem in isolation so you can measure the impact of each change independently.
Fix 1: Replace implicit String concatenation with a dedicated StringBuilder inside the inner loop.
Benchmark after Fix 1: 4,210 ms/op, a 64% reduction from the concatenation fix alone.
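Fix 1 can be sketched as a before-and-after pair. RowFormatter and its method names are hypothetical, and a row is modelled as a plain list of cell strings for brevity.

```java
import java.util.List;

class RowFormatter {

    // Before: each += builds a new String and copies everything accumulated so far.
    static String formatSlow(List<String> cells) {
        String line = "";
        for (String cell : cells) {
            line += cell + ",";
        }
        return line;
    }

    // After: a single StringBuilder grows in place and is converted once at the end.
    static String formatFast(List<String> cells) {
        StringBuilder line = new StringBuilder();
        for (String cell : cells) {
            line.append(cell).append(',');
        }
        return line.toString();
    }
}
```

Both methods return the same text; only the number of intermediate allocations differs.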
Fix 2: Remove the broken cache or replace it with a bounded, correct one.
Benchmark after Fix 2: 3,980 ms/op. Memory stabilises at ~20 MB regardless of how many calls are made.
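If the cache is kept, one standard-library way to bound it is to subclass LinkedHashMap and override removeEldestEntry. BoundedCache is a hypothetical name, and the sketch is not thread-safe on its own; a shared instance would need external synchronisation, for example via Collections.synchronizedMap.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class BoundedCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    BoundedCache(int maxEntries) {
        // accessOrder = true keeps entries in least-recently-used order.
        super(16, 0.75f, true);
        this.maxEntries = maxEntries;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        // Called after every insertion; returning true evicts the eldest entry.
        return size() > maxEntries;
    }
}
```

A bounded cache only helps if the key is stable, so it should be paired with a value-based key rather than rows.hashCode() on identity-hashed rows.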
Fix 3: Pre-size the outer StringBuilder.
Benchmark after Fix 3: 3,410 ms/op.
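Pre-sizing could look something like this. PresizedReport is a hypothetical helper, and the per-row estimate is an assumption you would derive from your own data, for example 80 characters per row.

```java
class PresizedReport {

    /** Creates a buffer large enough for the whole report so it never has to grow. */
    static StringBuilder newBuffer(int rowCount, int estimatedCharsPerRow) {
        // Compute in long to avoid int overflow, then clamp to a safe array size.
        long estimate = (long) rowCount * estimatedCharsPerRow;
        int capacity = (int) Math.min(estimate, Integer.MAX_VALUE - 8);
        return new StringBuilder(capacity);
    }
}
```

Without an initial capacity, a StringBuilder starts small and repeatedly reallocates and copies its backing array as the report grows; a reasonable estimate up front removes those copies.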
Fix 4: Stream the output instead of building one giant String.
Streaming avoids materialising the full result in memory. For a 50,000-row report this eliminates a 4 MB intermediate allocation.
Benchmark after all four fixes: 680 ms/op, roughly a 17x improvement over the original 11,843 ms.
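A streaming version might write each cell straight to a java.io.Writer supplied by the caller. StreamingReport is a hypothetical name, and rows are again modelled as lists of cell strings.

```java
import java.io.IOException;
import java.io.Writer;
import java.util.List;

class StreamingReport {

    /** Writes the report row by row; the full text is never held in memory at once. */
    static void write(List<List<String>> rows, Writer out) throws IOException {
        for (List<String> row : rows) {
            for (String cell : row) {
                out.write(cell);
                out.write(',');
            }
            out.write('\n');
        }
        out.flush();
    }
}
```

Passing a BufferedWriter wrapped around the response or file stream keeps the many small writes cheap, and the caller stays in control of where the bytes go.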
Step 5 — Verify Under Load
A single-threaded benchmark is not the whole story. Verify with concurrent load using a simple executor-based stress test:
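Such a stress test could be as small as the sketch below. StressTest and its run method are hypothetical; the task you pass in would wrap one report generation, and the thread and call counts are up to you.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class StressTest {

    /** Runs the task from several threads at once and returns the elapsed time in milliseconds. */
    static long run(Callable<?> task, int threads, int callsPerThread) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<?>> futures = new ArrayList<>();
            long start = System.nanoTime();
            for (int t = 0; t < threads; t++) {
                futures.add(pool.submit(() -> {
                    for (int i = 0; i < callsPerThread; i++) {
                        task.call();
                    }
                    return null;
                }));
            }
            for (Future<?> future : futures) {
                future.get();   // rethrows any failure from a worker thread
            }
            return (System.nanoTime() - start) / 1_000_000;
        } finally {
            pool.shutdownNow();
        }
    }
}
```

Watch heap usage and GC activity while this runs. A fix that looks good single-threaded can still show contention or memory growth once several threads call the service at the same time.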
Step 6 — Document the Investigation
Every professional performance fix should be accompanied by a brief write-up covering: the observed symptom, the profiling evidence that pointed to each root cause, the fix applied, and the before/after numbers. This creates institutional memory and prevents the same regression from sneaking back in through future code review.
Lessons Learned from This Project
- Measure first, fix second. The flame graph revealed that inner-loop string concatenation was responsible for more than half of CPU time — something no code review would have quantified.
- Heap dumps expose leaks that metrics miss. The cache was growing invisibly; only the dominator tree made it clear.
- Fix one thing at a time. If you apply all four fixes simultaneously you cannot tell which one delivered the most value.
- Streaming beats buffering for large outputs. Avoiding the large intermediate String was the single biggest remaining win after fixing the allocation hot spot.
- Bounded caches are mandatory. Any cache without a size cap is a memory leak waiting to happen in a long-running service.
Summary
A systematic performance investigation follows a repeatable workflow: baseline benchmark → CPU flame graph → heap analysis → targeted fixes measured individually → concurrent load verification → written record. The tools change (JFR, async-profiler, MAT, JMH) but the process is always the same. Master the process and you will find the problem every time.