Hibernate & Entity Mapping

The First-Level Cache

Every EntityManager in JPA carries a built-in cache called the first-level cache (also known as the session cache or persistence context cache). Unlike a shared cache that lives between requests, this cache is scoped to a single EntityManager instance — it lives and dies with one unit of work. Understanding it is essential because it shapes every query you write, every object you compare, and every performance decision you make.

What the First-Level Cache Is

Hibernate maintains an internal identity map keyed by entity type plus primary key. The moment you load or persist an entity, Hibernate registers it in this map (merge() registers the managed copy it returns, not the instance you pass in). Every subsequent lookup for the same type and ID within the same EntityManager returns the exact same Java object: no round-trip to the database, no second instantiation.

This cache is always on. There is no configuration switch to disable it. It is fundamental to how JPA's object-identity guarantee works: within a persistence context, the same database row is always represented by the same Java reference.

Persistence Context = Identity Map. The JPA specification requires that within a single persistence context two find() calls for the same entity class and primary key must return the same object instance. Hibernate fulfills this via the first-level cache.
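
The lookup rule described above can be modelled in a few lines of plain Java. This is only a sketch of the identity-map idea, not Hibernate's implementation: ToyPersistenceContext, ToyOrder, and the loader callback are hypothetical names invented for illustration, and the loader merely stands in for the SELECT.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in for a persistence context; not a Hibernate class.
final class ToyPersistenceContext {

    // Key = (entity type, primary key), mirroring the identity map described above.
    private final Map<List<Object>, Object> identityMap = new HashMap<>();
    private int loadCount = 0; // number of simulated database hits

    <T> T find(Class<T> type, Object id, Function<Object, T> loader) {
        List<Object> key = Arrays.asList(type, id);
        Object cached = identityMap.get(key);
        if (cached != null) {
            return type.cast(cached);   // cache hit: same instance, no load
        }
        loadCount++;                    // cache miss: simulate one SELECT
        T loaded = loader.apply(id);
        identityMap.put(key, loaded);
        return loaded;
    }

    int loadCount() {
        return loadCount;
    }
}

// Hypothetical entity used only by this sketch.
final class ToyOrder {
    final long id;
    String status = "NEW";

    ToyOrder(long id) {
        this.id = id;
    }
}

final class IdentityMapDemo {
    public static void main(String[] args) {
        ToyPersistenceContext ctx = new ToyPersistenceContext();
        ToyOrder first = ctx.find(ToyOrder.class, 42L, id -> new ToyOrder((Long) id));
        ToyOrder second = ctx.find(ToyOrder.class, 42L, id -> new ToyOrder((Long) id));

        System.out.println(first == second);  // true
        System.out.println(ctx.loadCount());  // 1
    }
}
```

Keying the map on both type and ID is what lets two entity types share the same numeric ID without colliding.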

Identity Guarantee in Practice

Consider this scenario inside a Spring service method annotated with @Transactional:

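// Imports assume Spring Boot 3 / Jakarta Persistence; older stacks use javax.persistence.
import jakarta.persistence.EntityManager;
import jakarta.persistence.PersistenceContext;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service
public class OrderService {

    @PersistenceContext
    private EntityManager em;  // transaction-scoped proxy injected by the container

    @Transactional
    public void demonstrateIdentity(Long orderId) {
        Order first  = em.find(Order.class, orderId);   // SELECT fires once
        Order second = em.find(Order.class, orderId);   // returns cached object, no SQL

        System.out.println(first == second);  // true: same reference
    }
}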

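Hibernate fires a single SELECT statement for the first find() call. On the second call it checks the identity map, finds an entry for (Order, orderId), and returns the same reference. No SQL is generated. You can confirm this by enabling SQL logging in application.properties. Note that show-sql prints to standard output while the org.hibernate.SQL logger writes to the application log, so with both enabled each statement appears twice; one of the two is enough: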

spring.jpa.show-sql=true
spring.jpa.properties.hibernate.format_sql=true
logging.level.org.hibernate.SQL=DEBUG

How It Interacts with Lazy Loading

The first-level cache does not just store root entities. When Hibernate resolves a lazy association the result is also stored. If two different entities both reference the same related entity (for example, two OrderItem rows that share the same Product), Hibernate returns a single shared Product instance rather than creating a duplicate:

@Transactional
public void sharedAssociation(Long firstItemId, Long secondItemId) {
    OrderItem item1 = em.find(OrderItem.class, firstItemId);
    OrderItem item2 = em.find(OrderItem.class, secondItemId);

    // If both items reference the same product row:
    Product p1 = item1.getProduct();   // proxy (or entity) registered for that Product ID
    Product p2 = item2.getProduct();   // same context, same ID: same reference

    // With a lazy association, the Product row itself is only selected
    // when a non-ID property is first accessed, and then only once.
    System.out.println(p1 == p2);  // true
}

This is not merely a performance convenience. It guarantees that one persistence context never holds two diverging in-memory copies of the same row: if you modify p1, p2 reflects the change immediately because they are the same object in memory.

Bypassing the Cache: When You Need Fresh Data

Because the first-level cache is scoped to the current EntityManager, it is never stale with respect to changes you make through managed entities in the same transaction. However, another transaction on another thread could modify the same row concurrently, and the cached instance will not notice. If you need to discard the cached state and reload from the database, use refresh():

@Transactional
public void reloadFromDatabase(Long orderId) {
    Order order = em.find(Order.class, orderId);

    // ... some time passes; another transaction may have updated this row ...

    em.refresh(order);   // issues SELECT, overwrites in-memory state
    System.out.println("Reloaded status: " + order.getStatus());
}

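Do not overuse refresh(). It always hits the database, and it overwrites the entity's in-memory state, including unflushed changes and the version attribute, so a conflict that optimistic locking would otherwise have reported at flush time can go unnoticed. Reserve it for situations where you know an external process has changed a row, for example after calling a stored procedure or a batch job that operates outside of your JPA session.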

Evicting Individual Entries

You can also evict a specific entity from the identity map without reloading it, using detach():

Order order = em.find(Order.class, 42L);
em.detach(order);  // removed from persistence context

Order fresh = em.find(Order.class, 42L);  // cache miss — new SELECT fired
System.out.println(order == fresh);  // false — different instances now

After detach(), changes made to the old reference are no longer tracked by Hibernate's dirty-checking mechanism, and changes made before the call that had not yet been flushed are not written either. The entity enters the detached state covered in the previous lesson.
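
To make the difference between refresh() and detach() concrete, here is a toy model in plain Java. It is a sketch under simplifying assumptions, not Hibernate code: ToyContextWithEviction and ToyEntity are hypothetical names, and a plain Map stands in for the database table.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical entity used only by this sketch.
final class ToyEntity {
    final long id;
    String status;

    ToyEntity(long id, String status) {
        this.id = id;
        this.status = status;
    }
}

// Hypothetical toy context; method names only mirror the JPA vocabulary.
final class ToyContextWithEviction {
    private final Map<Long, String> table;  // id -> status, stands in for a table
    private final Map<Long, ToyEntity> identityMap = new HashMap<>();

    ToyContextWithEviction(Map<Long, String> table) {
        this.table = table;
    }

    ToyEntity find(long id) {
        // Cache hit returns the managed instance; a miss reads the "table".
        return identityMap.computeIfAbsent(id, key -> new ToyEntity(key, table.get(key)));
    }

    void refresh(ToyEntity entity) {
        // Same instance stays managed; its state is overwritten from the "table".
        entity.status = table.get(entity.id);
    }

    void detach(ToyEntity entity) {
        // Drop the entry; the next find() for this id builds a new instance.
        identityMap.remove(entity.id, entity);
    }
}

final class EvictionDemo {
    public static void main(String[] args) {
        Map<Long, String> table = new HashMap<>();
        table.put(42L, "NEW");
        ToyContextWithEviction ctx = new ToyContextWithEviction(table);

        ToyEntity order = ctx.find(42L);
        table.put(42L, "SHIPPED");                     // simulated concurrent update

        System.out.println(ctx.find(42L).status);      // NEW (served from the map)
        ctx.refresh(order);
        System.out.println(order.status);              // SHIPPED (same instance, reloaded)

        ctx.detach(order);
        System.out.println(ctx.find(42L) == order);    // false (new instance)
    }
}
```

In this model, refresh() keeps the reference you already hold valid, while detach() leaves it untracked and forces the next lookup to build a new object.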

Clearing the Entire Context

For batch processing — importing thousands of rows in a loop — the first-level cache becomes a liability. Every entity you persist accumulates in memory, and Hibernate's flush-time dirty checking iterates all of them. The standard pattern is to flush and clear periodically:

@Transactional
public void batchImport(List<ProductDto> dtos) {
    int batchSize = 50;

    for (int i = 0; i < dtos.size(); i++) {
        Product p = new Product();
        p.setName(dtos.get(i).getName());
        p.setPrice(dtos.get(i).getPrice());
        em.persist(p);

        if ((i + 1) % batchSize == 0) {
            em.flush();   // write accumulated inserts to the DB
            em.clear();   // evict all entities — reset the identity map
        }
    }
}

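After clear(), the identity map is empty. Any entity reference you held before the call is now detached. Rows left over after the last full batch are flushed when the transaction commits. This pattern keeps the persistence context's memory footprint bounded by the batch size, regardless of how many rows you process.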

Combine flush() and clear() in batches, never clear() alone. If you clear without flushing first, pending inserts or updates are discarded and never written to the database. Always flush before you clear.
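
The warning above can be illustrated with a small plain-Java simulation. It is a deliberately simplified sketch, not Hibernate's action queue: ToyWriteBehindContext and runImport are hypothetical names, strings stand in for entities, and the final flush() call stands in for the flush that happens at commit.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of a context with write-behind inserts.
final class ToyWriteBehindContext {
    private final List<String> managed = new ArrayList<>();   // entities held in the context
    private final List<String> pending = new ArrayList<>();   // inserts not yet written
    private final List<String> database = new ArrayList<>();  // stands in for the table
    private int peakManaged = 0;

    void persist(String row) {
        managed.add(row);
        pending.add(row);
        peakManaged = Math.max(peakManaged, managed.size());
    }

    void flush() {
        database.addAll(pending);  // write queued inserts
        pending.clear();           // entities stay managed until clear()
    }

    void clear() {
        managed.clear();
        pending.clear();           // anything not flushed is lost
    }

    int databaseSize() {
        return database.size();
    }

    int peakManaged() {
        return peakManaged;
    }

    // Runs the batch loop from the lesson, with or without the flush step.
    static ToyWriteBehindContext runImport(int rows, int batchSize, boolean flushBeforeClear) {
        ToyWriteBehindContext ctx = new ToyWriteBehindContext();
        for (int i = 0; i < rows; i++) {
            ctx.persist("row-" + i);
            if ((i + 1) % batchSize == 0) {
                if (flushBeforeClear) {
                    ctx.flush();
                }
                ctx.clear();
            }
        }
        ctx.flush();  // simulated commit-time flush for the remainder
        return ctx;
    }
}

final class BatchDemo {
    public static void main(String[] args) {
        ToyWriteBehindContext good = ToyWriteBehindContext.runImport(120, 50, true);
        ToyWriteBehindContext bad = ToyWriteBehindContext.runImport(120, 50, false);

        System.out.println(good.databaseSize());  // 120
        System.out.println(good.peakManaged());   // 50
        System.out.println(bad.databaseSize());   // 20 (two full batches were discarded)
    }
}
```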

The Cache and JPQL Queries

There is an important subtlety: JPQL (and Criteria API) queries always go to the database. They do not consult the first-level cache before executing SQL. However, when the result rows come back, Hibernate resolves each one against the identity map. If an entity with the same type and ID is already managed, Hibernate returns that managed instance and ignores the column values in the result set; only an explicit refresh() overwrites it. Two consequences follow. First, a query never clobbers your unflushed in-memory changes. Second, if another transaction has committed a change to a row you already loaded, a later query in the same persistence context still hands you the old, stale state. The example shows the first effect, where you modify an entity and then query for it within the same transaction:

@Transactional
public void queryReturnsManagedInstance(Long orderId) {
    Order order = em.find(Order.class, orderId);
    order.setStatus("CANCELLED");  // dirty; no explicit flush yet

    // The query always executes SQL, but the row it returns is resolved to the
    // instance already in the identity map, so the result shows the in-memory state:
    List<Order> results = em.createQuery(
            "SELECT o FROM Order o WHERE o.id = :id", Order.class)
        .setParameter("id", orderId)
        .getResultList();

    System.out.println(results.get(0) == order);     // true: same managed instance
    System.out.println(results.get(0).getStatus());  // "CANCELLED": in-memory state, not re-read
}

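In FlushModeType.AUTO (the default), Hibernate flushes before a JPQL or Criteria query only when the query touches an entity type that has pending changes. In the example above, the Order change is pending and the query reads Order, so the UPDATE is sent before the SELECT runs. This is usually what you want, but understanding the mechanism helps you reason about query results in complex transactions.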
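
The way query results are resolved against the identity map can also be modelled in plain Java. The sketch below is an illustration only: ToyQueryContext, ToyManagedOrder, and queryAll are hypothetical names, a LinkedHashMap stands in for the table, and neither SQL nor auto-flush is modelled. It shows the stale-read case, where a simulated concurrent update stays invisible through an already-managed instance.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical entity used only by this sketch.
final class ToyManagedOrder {
    final long id;
    String status;

    ToyManagedOrder(long id, String status) {
        this.id = id;
        this.status = status;
    }
}

// Hypothetical toy context; not a Hibernate class.
final class ToyQueryContext {
    private final Map<Long, String> table;  // id -> status, stands in for a table
    private final Map<Long, ToyManagedOrder> identityMap = new HashMap<>();

    ToyQueryContext(Map<Long, String> table) {
        this.table = table;
    }

    ToyManagedOrder find(long id) {
        return identityMap.computeIfAbsent(id, key -> new ToyManagedOrder(key, table.get(key)));
    }

    // Always reads the "table", then resolves each row against the identity map.
    List<ToyManagedOrder> queryAll() {
        List<ToyManagedOrder> result = new ArrayList<>();
        for (Map.Entry<Long, String> row : table.entrySet()) {
            ToyManagedOrder managed = identityMap.get(row.getKey());
            if (managed == null) {
                // First time this row is seen: build and register a new instance.
                managed = new ToyManagedOrder(row.getKey(), row.getValue());
                identityMap.put(row.getKey(), managed);
            }
            // Already managed: the row's column values are ignored.
            result.add(managed);
        }
        return result;
    }
}

final class QueryResolutionDemo {
    public static void main(String[] args) {
        Map<Long, String> table = new LinkedHashMap<>();
        table.put(1L, "NEW");
        table.put(2L, "NEW");
        ToyQueryContext ctx = new ToyQueryContext(table);

        ToyManagedOrder order1 = ctx.find(1L);   // order 1 is now managed
        table.put(1L, "SHIPPED");                // simulated concurrent updates
        table.put(2L, "SHIPPED");

        List<ToyManagedOrder> results = ctx.queryAll();
        System.out.println(results.get(0) == order1);  // true
        System.out.println(results.get(0).status);     // NEW (stale: managed instance wins)
        System.out.println(results.get(1).status);     // SHIPPED (first load, read from the row)
    }
}
```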

Summary

The first-level cache is a mandatory, per-EntityManager identity map that eliminates redundant SQL lookups and guarantees object identity within a persistence context. Use refresh() to force a reload from the database, detach() to evict a single entity, and the flush() + clear() pattern to keep batch operations memory-efficient. Awareness of how JPQL queries interact with the cache — bypassing it on the way out but merging results into it on the way back — is essential for avoiding hard-to-diagnose stale-read bugs.