Spring Data JPA

The Repository Abstraction

The single biggest productivity win that Spring Data JPA brings is the repository abstraction: declare an interface, extend one of Spring Data's built-in repository interfaces, and you instantly have a full suite of data-access methods — with zero boilerplate implementation code. Understanding exactly what you get, how it works under the hood, and where the performance pitfalls hide is what separates a developer who merely uses Spring Data from one who uses it well.

The Core Hierarchy

Spring Data ships three repository interfaces you will work with directly, arranged in a hierarchy of increasing capability:

Repository<T, ID> — the root marker interface. Contains no methods at all. Useful when you want to opt into Spring Data's infrastructure (scanning, proxy creation) but expose only the methods you explicitly declare yourself.
CrudRepository<T, ID> — extends Repository and adds the 12 standard CRUD operations.
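JpaRepository<T, ID> — extends PagingAndSortingRepository (which adds sorting and pagination) and CrudRepository (in Spring Data 3.x, through their List-returning variants ListPagingAndSortingRepository and ListCrudRepository), then layers on JPA-specific operations: flush control, bulk deletes, and entity references.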

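Rule of thumb for choosing: Prefer JpaRepository for most production repositories, since it exposes everything. Dropping down to CrudRepository hides the paging, sorting, and JPA-specific methods, but it still exposes an unbounded findAll(). To keep callers from loading a large table in full by accident, extend the Repository marker interface instead and declare only the methods you want to expose.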

Declaring Your First Repository

Suppose you have an Order entity with a Long primary key. Declaring a repository is a single interface:

package com.example.shop.repository;

import com.example.shop.entity.Order;
import org.springframework.data.jpa.repository.JpaRepository;

public interface OrderRepository extends JpaRepository<Order, Long> {
    // Spring Data generates the implementation automatically at startup
}

That is all. Spring Boot's auto-configuration scans your packages, finds this interface, generates a JDK dynamic proxy implementing it, and registers it as a Spring bean. You inject it like any other bean:

import org.springframework.stereotype.Service;

@Service
public class OrderService {

    private final OrderRepository orders;

    public OrderService(OrderRepository orders) {
        this.orders = orders;
    }
}
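
The proxy step described above can be seen in isolation with nothing but the JDK. The following is a minimal sketch, not Spring Data's actual code: NoteRepository, ProxyDemo, and the map-backed handler are hypothetical names invented for illustration, and the real infrastructure routes calls to a JPA-backed implementation rather than to a HashMap.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical repository interface; no class implements it anywhere.
interface NoteRepository {
    String save(long id, String text);
    Optional<String> findById(long id);
    long count();
}

final class ProxyDemo {

    // Builds a JDK dynamic proxy that answers every NoteRepository call from a map.
    static NoteRepository createRepository() {
        Map<Long, String> store = new HashMap<>();
        InvocationHandler handler = (proxy, method, args) -> {
            switch (method.getName()) {
                case "save":
                    store.put((Long) args[0], (String) args[1]);
                    return args[1];
                case "findById":
                    return Optional.ofNullable(store.get((Long) args[0]));
                case "count":
                    return (long) store.size();
                default:
                    // Object methods such as toString() are not handled in this sketch.
                    throw new UnsupportedOperationException(method.getName());
            }
        };
        return (NoteRepository) Proxy.newProxyInstance(
                NoteRepository.class.getClassLoader(),
                new Class<?>[] { NoteRepository.class },
                handler);
    }
}
```

Calling ProxyDemo.createRepository() returns an object that satisfies the interface even though no implementing class was written, which is the same trick that lets you inject OrderRepository without providing an implementation.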

What CrudRepository Gives You for Free

By extending CrudRepository (which JpaRepository includes), you immediately have these methods without writing a single line of SQL or JPQL:

save(S entity) — persists a new entity or merges an existing one. Returns the managed entity.
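saveAll(Iterable<S> entities) — saves each entity in turn using the same persist-or-merge logic as save. Whether the resulting statements are sent as JDBC batches depends on Hibernate settings such as hibernate.jdbc.batch_size, not on this method.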
findById(ID id) — returns Optional<T>. Never returns null; models the absence explicitly.
existsById(ID id) — issues a SELECT COUNT — cheaper than loading the whole entity just to check existence.
findAll() — loads every row in the table. Use with caution on large tables.
findAllById(Iterable<ID> ids) — loads a specific set of entities by their primary keys, using an SQL IN clause.
count() — returns the total row count.
deleteById(ID id) — deletes by primary key. In Spring Data JPA 2.x it throws EmptyResultDataAccessException if the ID does not exist; since 3.0 a missing ID is silently ignored.
delete(T entity) — deletes the given entity; the entity must be managed or detached with the correct ID.
deleteAllById(Iterable<ID> ids) — deletes by a set of IDs.
deleteAll(Iterable<T> entities) — deletes the given entities.
deleteAll() — deletes every row. Rarely what you want in production.
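
Because findById returns an Optional, the caller chooses how to treat a missing row. The sketch below shows three common idioms in plain Java. OptionalHandling and its map-backed findById are hypothetical stand-ins for a real repository, used only so the example runs without a database.

```java
import java.math.BigDecimal;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Optional;

final class OptionalHandling {

    // Hypothetical stand-in for a repository: order ID -> order total.
    private static final Map<Long, BigDecimal> TOTALS =
            Map.of(1L, new BigDecimal("99.99"));

    // Mirrors the shape of findById: absence is an empty Optional, never null.
    static Optional<BigDecimal> findById(long id) {
        return Optional.ofNullable(TOTALS.get(id));
    }

    // Fail fast when the row must exist.
    static BigDecimal totalOrThrow(long id) {
        return findById(id)
                .orElseThrow(() -> new NoSuchElementException("Order " + id));
    }

    // Fall back to a default when absence is acceptable.
    static BigDecimal totalOrZero(long id) {
        return findById(id).orElse(BigDecimal.ZERO);
    }

    // Transform the value only if it is present.
    static Optional<String> formattedTotal(long id) {
        return findById(id).map(total -> "total=" + total.toPlainString());
    }
}
```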

Extra Methods Added by JpaRepository

JpaRepository goes further with JPA-specific operations:

findAll(Sort sort) / findAll(Pageable pageable) — sorted or paginated fetches (covered in the Pagination lesson).
saveAndFlush(S entity) — saves and immediately flushes to the database within the same transaction. Useful in tests that need to verify constraints before the transaction commits.
saveAllAndFlush(Iterable<S> entities) — batch save then flush.
deleteAllInBatch() / deleteAllByIdInBatch(Iterable<ID> ids) — issues a single bulk DELETE SQL statement rather than loading entities and calling delete on each one. Dramatically faster for bulk deletions, but it bypasses Hibernate lifecycle callbacks and cascades.
getReferenceById(ID id) — returns a Hibernate proxy without hitting the database. Useful when you need an entity reference solely to satisfy a foreign-key association on another entity you are about to persist.
flush() — forces Hibernate to synchronise the persistence context with the database immediately.

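Use getReferenceById to avoid unnecessary SELECTs. If you are setting the customer field on a new Order and you already have the customer ID, call customerRepo.getReferenceById(customerId) instead of findById. No SELECT is issued: Hibernate creates a proxy and only resolves it if you actually access its fields. The trade-off is that a missing row is not detected at the call site; it surfaces later, either as an EntityNotFoundException when the proxy is first initialised or as a foreign-key violation when the INSERT is flushed.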
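
The lazy behaviour can be modelled in a few lines of plain Java. This is a rough sketch of the idea only, not how Hibernate builds its proxies: LazyCustomerRef, LazyRefDemo, and the selects counter are hypothetical, and the supplier stands in for the SELECT that Hibernate would defer.

```java
import java.util.function.Supplier;

// Minimal lazy reference: knows the ID up front, loads the rest on demand.
final class LazyCustomerRef {
    private final long id;
    private final Supplier<String> nameLoader; // stands in for the deferred SELECT
    private String name;
    private boolean loaded;

    LazyCustomerRef(long id, Supplier<String> nameLoader) {
        this.id = id;
        this.nameLoader = nameLoader;
    }

    // In this model, reading the ID never triggers a load; the caller supplied it.
    long getId() { return id; }

    // First access to any other state runs the loader exactly once.
    String getName() {
        if (!loaded) {
            name = nameLoader.get();
            loaded = true;
        }
        return name;
    }
}

final class LazyRefDemo {
    static int selects = 0; // counts simulated SELECT statements

    static LazyCustomerRef getReference(long id) {
        return new LazyCustomerRef(id, () -> {
            selects++;
            return "customer-" + id;
        });
    }
}
```

Obtaining the reference and reading its ID leaves the counter at zero, which is all a foreign-key column needs. Only a call to getName() increments it.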

The save() Method — New vs. Existing Entities

save(entity) is deceptively simple but has important behaviour to understand:

// INSERT — entity with no ID set (null primary key)
Order newOrder = new Order();
newOrder.setTotal(BigDecimal.valueOf(99.99));
Order saved = orders.save(newOrder);
// saved.getId() is now populated

// UPDATE — entity with an existing ID
saved.setTotal(BigDecimal.valueOf(109.99));
orders.save(saved);   // Hibernate issues an UPDATE

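Under the hood, Spring Data's SimpleJpaRepository checks whether the entity is new by calling EntityInformation.isNew(entity). By default this checks whether the ID field is null; if the entity has a @Version attribute of a wrapper type, that attribute is checked for null instead. When new, it calls EntityManager.persist(); when existing, it calls EntityManager.merge(). One consequence: if you assign IDs yourself, a brand-new entity looks "existing", so save() goes through merge() and costs an extra SELECT. Implementing Persistable on the entity lets you supply your own isNew() logic for that case.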

Watch out for merge semantics with detached entities. Calling save(entity) on a detached entity triggers merge(), which copies the state into a new managed instance and returns it. Always work with the returned instance — changes made to the original detached object after the call will not be persisted.
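
The persist-or-merge decision and the "use the returned instance" rule can be illustrated with a toy model. This is a sketch under simplifying assumptions, not the real EntityManager: Draft and ToyContext are hypothetical, IDs come from a counter, and only the null-ID check is modelled.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical entity for the model; a null id means "not yet persisted".
final class Draft {
    Long id;
    String title;

    Draft(Long id, String title) {
        this.id = id;
        this.title = title;
    }
}

// Toy persistence context that mimics the persist/merge split described above.
final class ToyContext {
    final Map<Long, Draft> managed = new HashMap<>();
    final List<String> calls = new ArrayList<>();
    private long nextId = 1;

    Draft save(Draft entity) {
        if (entity.id == null) {            // the default "is new" check
            calls.add("persist");
            entity.id = nextId++;
            managed.put(entity.id, entity); // the same instance becomes managed
            return entity;
        }
        calls.add("merge");
        // merge copies state onto the managed instance and returns that instance
        Draft target = managed.computeIfAbsent(entity.id, id -> new Draft(id, null));
        target.title = entity.title;
        return target;
    }
}
```

In the model, saving a Draft with a null id returns the very same object, while saving a second Draft that carries an existing id returns the managed copy. Later changes to the passed-in object do not reach the managed one, which is why code against the real API should keep the return value, for example Order managed = orders.save(detached).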

deleteAllInBatch vs deleteAll — A Performance Critical Distinction

This is one of the most impactful trade-offs in Spring Data:

deleteAll() — loads every entity with findAll() (a single SELECT that pulls the whole table into memory), fires Hibernate lifecycle events (@PreRemove, cascades, orphan removal) for each one, then issues one DELETE per entity. Safe and correct, but O(n) DELETE statements and O(n) memory.
deleteAllInBatch() — executes a single DELETE FROM orders with no loading at all. Extremely fast, but it bypasses all Hibernate cascades and lifecycle callbacks.

Use batch deletion when you are clearing staging tables, test data, or collections without cascade dependencies. Use deleteAll() when correct cascade behaviour matters.
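
To make the cost difference concrete, here is a toy model that logs the statements each strategy would send. It is a sketch only: ToyOrdersTable is hypothetical, the SQL strings are illustrative, and it assumes deleteAll() is implemented as findAll() followed by a per-entity delete, which is how SimpleJpaRepository is written in the versions I am aware of.

```java
import java.util.ArrayList;
import java.util.List;

// Toy table that logs the statements each deletion strategy would send.
final class ToyOrdersTable {
    final List<Long> rows = new ArrayList<>();
    final List<String> statements = new ArrayList<>();
    int callbacksFired = 0; // stands in for @PreRemove and cascade handling

    ToyOrdersTable(int rowCount) {
        for (long id = 1; id <= rowCount; id++) {
            rows.add(id);
        }
    }

    // Entity-by-entity strategy: load everything, then remove one at a time.
    void deleteAll() {
        statements.add("SELECT * FROM orders");
        for (Long id : new ArrayList<>(rows)) {
            callbacksFired++;
            statements.add("DELETE FROM orders WHERE id = " + id);
        }
        rows.clear();
    }

    // Bulk strategy: one statement, no entities loaded, no callbacks.
    void deleteAllInBatch() {
        statements.add("DELETE FROM orders");
        rows.clear();
    }
}
```

For a table of n rows the first method logs n + 1 statements and fires n callbacks, while the second logs exactly one statement and fires none.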

How Spring Data Generates the Implementation

At application startup Spring Data creates, for each of your repository interfaces, a proxy that delegates to a SimpleJpaRepository<T, ID> backed by the EntityManager. SimpleJpaRepository is annotated with @Transactional(readOnly = true) at the class level, meaning all query methods run in a read-only transaction by default, which is a meaningful Hibernate optimisation (the session is not flushed at commit, so loaded entities are not dirty-checked). Write methods (save, delete) are individually annotated with @Transactional (read-write), overriding the class-level setting.

You inherit transactional defaults for free. You do not need to annotate your service methods just to run a simple find or save — the repository layer already manages the transaction. You only need @Transactional on your service method when it must span multiple repository calls in a single unit of work.
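
The rule that a method-level annotation overrides the class-level default can be sketched with plain reflection. This is an illustrative model, not Spring's transaction infrastructure: @Tx, ToyRepository, and TxResolver are hypothetical names, and the real attribute resolution handles many more cases (interfaces, meta-annotations, propagation settings).

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

// Hypothetical stand-in for a transactional annotation with a readOnly flag.
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.TYPE, ElementType.METHOD})
@interface Tx {
    boolean readOnly() default false;
}

// Class-level default is read-only; write methods override it individually.
@Tx(readOnly = true)
class ToyRepository {
    public String findById(long id) { return "row-" + id; }

    @Tx
    public void save(String row) { /* would run in a read-write transaction */ }
}

final class TxResolver {
    // A method-level annotation wins; otherwise fall back to the class-level one.
    static boolean isReadOnly(Class<?> type, String methodName, Class<?>... params) {
        Method method;
        try {
            method = type.getMethod(methodName, params);
        } catch (NoSuchMethodException e) {
            throw new IllegalArgumentException("No such method: " + methodName, e);
        }
        Tx onMethod = method.getAnnotation(Tx.class);
        if (onMethod != null) {
            return onMethod.readOnly();
        }
        Tx onClass = type.getAnnotation(Tx.class);
        return onClass != null && onClass.readOnly();
    }
}
```

Resolving findById yields read-only because it inherits the class-level setting, while save resolves to read-write because its own annotation takes precedence.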

Practical Example — OrderRepository in Use

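import java.math.BigDecimal;
import java.util.List;

import jakarta.persistence.EntityNotFoundException; // assumes Spring Boot 3; javax.persistence on Boot 2.x
import lombok.RequiredArgsConstructor;              // Lombok generates the constructor
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

@Service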
@RequiredArgsConstructor
public class OrderService {

    private final OrderRepository orders;
    private final CustomerRepository customers;

    // findById returns Optional — handle absence explicitly
    public Order getOrThrow(Long id) {
        return orders.findById(id)
                     .orElseThrow(() -> new EntityNotFoundException("Order " + id));
    }

    // Efficient association set via proxy — no SELECT on Customer
    @Transactional
    public Order createOrder(Long customerId, BigDecimal total) {
        Order order = new Order();
        order.setCustomer(customers.getReferenceById(customerId));
        order.setTotal(total);
        return orders.save(order);
    }

    // Bulk cleanup — single DELETE statement
    @Transactional
    public void clearDraftOrders(List<Long> draftIds) {
        orders.deleteAllByIdInBatch(draftIds);
    }
}

Summary

The repository abstraction removes the need for a hand-written DAO layer for standard data access. CrudRepository covers the 12 fundamental CRUD operations; JpaRepository adds pagination, flush control, and JPA-optimised batch operations. At startup Spring Data generates a proxy backed by SimpleJpaRepository, defaulting to read-only transactions for queries and read-write transactions for mutations. Master the distinction between save/merge semantics, the performance gap between deleteAll and deleteAllInBatch, and the zero-SELECT trick of getReferenceById: these are the details that matter in production. In the next lesson you will extend this foundation with derived query methods that let you query by entity fields without writing JPQL.