The Repository Abstraction
The single biggest productivity win that Spring Data JPA brings is the repository abstraction: declare an interface, extend one of Spring Data's built-in repository interfaces, and you instantly have a full suite of data-access methods — with zero boilerplate implementation code. Understanding exactly what you get, how it works under the hood, and where the performance pitfalls hide is what separates a developer who merely uses Spring Data from one who uses it well.
The Core Hierarchy
Spring Data ships three repository interfaces you will work with directly, arranged in a hierarchy of increasing capability:
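- Repository<T, ID>: the root marker interface. It contains no methods at all. Useful when you want to opt into Spring Data's infrastructure (scanning, proxy creation) but expose only the methods you explicitly declare yourself.
- CrudRepository<T, ID>: extends Repository and adds the 12 standard CRUD operations.
- JpaRepository<T, ID>: builds on CrudRepository and PagingAndSortingRepository (which adds paging and sorting), then layers on JPA-specific operations: batch saves, flush control, and entity references. In Spring Data 3.x it extends the List-returning variants, ListCrudRepository and ListPagingAndSortingRepository, because PagingAndSortingRepository no longer extends CrudRepository.
Use JpaRepository for most production repositories: you get everything. Drop down to CrudRepository when you want to hide the paging, sorting, and batch methods from callers. If the goal is to prevent accidental full-table loads on large datasets, go one step further and extend the bare Repository interface, declaring only the methods you need, because CrudRepository still exposes findAll().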
Declaring Your First Repository
Suppose you have an Order entity with a Long primary key. Declaring a repository takes a single interface, here called OrderRepository, that extends JpaRepository<Order, Long> and declares no methods of its own.
That is all. Spring Boot's auto-configuration scans your packages, finds this interface, generates a JDK dynamic proxy implementing it, and registers it as a Spring bean. You inject it like any other bean, typically through a constructor parameter.
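The proxy mechanism described above can be sketched in plain Java, without Spring on the classpath. Everything here is a hypothetical stand-in: MiniRepository is not the real CrudRepository, InMemoryBacking only plays the role that SimpleJpaRepository plays in Spring Data, and createRepository is an illustrative helper rather than anything Spring exposes. The point is only to show how an interface with an empty body can become a working object.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical entity used only for this sketch.
class Order {
    final Long id;
    final String item;
    Order(Long id, String item) { this.id = id; this.item = item; }
}

// Hypothetical stand-in for a Spring Data base interface; NOT the real CrudRepository.
interface MiniRepository<T, ID> {
    void put(ID id, T entity);
    Optional<T> findById(ID id);
    long count();
}

// What the application author writes: an interface with an empty body.
interface OrderRepository extends MiniRepository<Order, Long> {}

// Generic implementation shared by every repository (the role SimpleJpaRepository plays).
class InMemoryBacking<T, ID> implements MiniRepository<T, ID> {
    private final Map<ID, T> rows = new HashMap<>();
    @Override public void put(ID id, T entity) { rows.put(id, entity); }
    @Override public Optional<T> findById(ID id) { return Optional.ofNullable(rows.get(id)); }
    @Override public long count() { return rows.size(); }
}

// A consumer that receives the repository through its constructor, as a Spring bean would.
class OrderService {
    private final OrderRepository orders;
    OrderService(OrderRepository orders) { this.orders = orders; }
    String describe(long id) {
        return orders.findById(id).map(o -> o.item).orElse("unknown");
    }
}

class RepositoryProxyDemo {
    // Creates a JDK dynamic proxy implementing repoType that forwards every call to backing.
    static <R> R createRepository(Class<R> repoType, Object backing) {
        InvocationHandler handler = (self, method, args) -> method.invoke(backing, args);
        Object instance = Proxy.newProxyInstance(
                repoType.getClassLoader(), new Class<?>[] {repoType}, handler);
        return repoType.cast(instance);
    }

    public static void main(String[] args) {
        OrderRepository repo =
                createRepository(OrderRepository.class, new InMemoryBacking<Order, Long>());
        repo.put(1L, new Order(1L, "keyboard"));
        OrderService service = new OrderService(repo);
        System.out.println(service.describe(1L));
        System.out.println(Proxy.isProxyClass(repo.getClass()));
    }
}
```

No class in this sketch implements OrderRepository by hand; the proxy is the implementation. In a real application the equivalent wiring is done by Spring Data at startup, and the service simply declares the repository as a constructor parameter.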
What CrudRepository Gives You for Free
By extending CrudRepository (which JpaRepository includes), you immediately have these methods without writing a single line of SQL or JPQL:
- save(S entity): persists a new entity or merges an existing one. Returns the managed entity.
- saveAll(Iterable<S> entities): saves or merges each entity in turn.
- findById(ID id): returns Optional<T>. Never returns null; models the absence explicitly.
- existsById(ID id): issues a SELECT COUNT, which is cheaper than loading the whole entity just to check existence.
- findAll(): loads every row in the table. Use with caution on large tables.
- findAllById(Iterable<ID> ids): loads a specific set of entities by their primary keys, using an SQL IN clause.
- count(): returns the total row count.
- deleteById(ID id): deletes by primary key. In Spring Data JPA 2.x it throws EmptyResultDataAccessException if the ID does not exist; from 3.0 onward a missing ID is silently ignored.
- delete(T entity): deletes the given entity; the entity must be managed or detached with the correct ID.
- deleteAllById(Iterable<ID> ids): deletes by a set of IDs.
- deleteAll(Iterable<T> entities): deletes the given entities.
- deleteAll(): deletes every row. Rarely what you want in production.
Extra Methods Added by JpaRepository
JpaRepository goes further with JPA-specific operations:
- findAll(Sort sort) / findAll(Pageable pageable): sorted or paginated fetches (covered in the Pagination lesson).
- saveAndFlush(S entity): saves and immediately flushes to the database within the same transaction. Useful in tests that need to verify constraints before the transaction commits.
- saveAllAndFlush(Iterable<S> entities): batch save, then flush.
- deleteAllInBatch() / deleteAllByIdInBatch(Iterable<ID> ids): issues a single bulk DELETE SQL statement rather than loading entities and calling delete on each one. Dramatically faster for bulk deletions, but it bypasses Hibernate lifecycle callbacks and cascades.
- getReferenceById(ID id): returns a Hibernate proxy without hitting the database. Useful when you need an entity reference solely to satisfy a foreign-key association on another entity you are about to persist.
- flush(): forces Hibernate to synchronise the persistence context with the database immediately.
Use getReferenceById to avoid unnecessary SELECTs. If you are setting the customer field on a new Order and you already have the customer ID, call customerRepo.getReferenceById(customerId) instead of findById. No SELECT is issued: Hibernate creates a proxy and only resolves it if you actually access its fields.
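To make the deferred-loading idea concrete, here is a plain-Java model of a lazy reference. It is a sketch under stated assumptions, not Hibernate's proxy implementation: CustomerReference, FakeCustomerTable, and the selectCount counter are hypothetical names invented for illustration, and the "SELECT" is just a method call that increments a counter.

```java
import java.util.function.Function;

// Hypothetical entity used only for this sketch.
class Customer {
    final Long id;
    final String name;
    Customer(Long id, String name) { this.id = id; this.name = name; }
}

// Hypothetical stand-in for a lazy proxy: the ID is known up front,
// everything else is loaded on first access to a non-ID field.
class CustomerReference {
    private final Long id;
    private final Function<Long, Customer> loader;
    private Customer loaded; // stays null until a non-ID field is touched

    CustomerReference(Long id, Function<Long, Customer> loader) {
        this.id = id;
        this.loader = loader;
    }

    // Reading the ID never triggers a load.
    Long getId() { return id; }

    // Reading any other field loads the row once and caches it.
    String getName() {
        if (loaded == null) {
            loaded = loader.apply(id);
        }
        return loaded.name;
    }
}

// Hypothetical table that counts how many simulated SELECTs it has served.
class FakeCustomerTable {
    int selectCount = 0;

    Customer findById(Long id) {
        selectCount++; // simulates one SELECT round trip
        return new Customer(id, "customer-" + id);
    }

    CustomerReference getReferenceById(Long id) {
        return new CustomerReference(id, this::findById); // no SELECT here
    }
}

class LazyReferenceDemo {
    public static void main(String[] args) {
        FakeCustomerTable customers = new FakeCustomerTable();
        CustomerReference ref = customers.getReferenceById(42L);
        Long foreignKey = ref.getId(); // enough to fill a foreign-key column
        System.out.println("key " + foreignKey + ", SELECTs so far: " + customers.selectCount);
        ref.getName();
        System.out.println("SELECTs after reading the name: " + customers.selectCount);
    }
}
```

One difference from this model matters in practice: with a real Hibernate proxy, a reference to a row that does not exist fails only when the proxy is first resolved (or when the insert violates the foreign-key constraint), not at the getReferenceById call.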
The save() Method — New vs. Existing Entities
save(entity) looks simple, but its behaviour depends on whether Spring Data considers the entity new or existing.
Under the hood, Spring Data's SimpleJpaRepository checks whether the entity is new by calling EntityInformation.isNew(entity). By default an entity counts as new when its ID is null (or 0 for a primitive numeric ID); if the entity has a wrapper-typed @Version attribute, that attribute being null is checked instead. When new, it calls EntityManager.persist(); when existing, it calls EntityManager.merge().
save(entity) on a detached entity triggers merge(), which copies the state into a new managed instance and returns it. Always work with the returned instance — changes made to the original detached object after the call will not be persisted.
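The persist-or-merge decision, and the reason to keep the returned instance, can be modelled in a few lines of plain Java. This is a simplified sketch: MiniPersistenceContext is a hypothetical miniature, not the JPA EntityManager API, it covers only the null-ID check, and it assumes an in-memory map in place of a database.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical entity used only for this sketch.
class Order {
    Long id;        // null until first persisted
    String status;
    Order(Long id, String status) { this.id = id; this.status = status; }
}

// Hypothetical miniature of a persistence context; NOT the JPA EntityManager.
class MiniPersistenceContext {
    private final Map<Long, Order> managed = new HashMap<>();
    private long nextId = 1;

    // persist: the passed instance itself becomes the managed instance.
    void persist(Order order) {
        order.id = nextId++;
        managed.put(order.id, order);
    }

    // merge: state is copied onto a managed instance, which is returned.
    // The argument stays detached and is not tracked afterwards.
    Order merge(Order detached) {
        Order target = managed.computeIfAbsent(detached.id, key -> new Order(key, null));
        target.status = detached.status;
        return target;
    }

    // Mirrors the decision described above: a null ID means "new".
    Order save(Order order) {
        if (order.id == null) {
            persist(order);
            return order;
        }
        return merge(order);
    }

    String storedStatus(Long id) { return managed.get(id).status; }
}

class SaveSemanticsDemo {
    public static void main(String[] args) {
        MiniPersistenceContext ctx = new MiniPersistenceContext();

        Order fresh = new Order(null, "NEW");
        Order saved = ctx.save(fresh);               // persist path
        System.out.println(saved == fresh);          // same instance

        Order detached = new Order(saved.id, "PAID"); // e.g. rebuilt from a request body
        Order managed = ctx.save(detached);          // merge path
        System.out.println(managed == detached);     // different instance

        detached.status = "CANCELLED";               // change to the detached copy is lost
        System.out.println(ctx.storedStatus(saved.id));
    }
}
```

After the merge, the stored status is still PAID: only the instance returned by save is tracked, which is why later changes to the detached argument never reach the database.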
deleteAllInBatch vs deleteAll — A Performance Critical Distinction
This is one of the most impactful trade-offs in Spring Data:
- deleteAll(): loads all entities with a single findAll() query, then removes them one at a time, firing Hibernate lifecycle events (@PreRemove, cascades, orphan removal) for each one and issuing one DELETE per entity. Safe and correct, but it pulls the whole table into memory and costs O(n) DELETE statements.
- deleteAllInBatch(): executes a single DELETE FROM orders with no loading at all. Extremely fast, but it bypasses all Hibernate cascades and lifecycle callbacks.
Use batch deletion when you are clearing staging tables, test data, or collections without cascade dependencies. Use deleteAll() when correct cascade behaviour matters.
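The cost difference is easiest to see by counting statements. The following plain-Java model is only a sketch: FakeOrdersTable and its statement log are hypothetical, the SQL strings are illustrative rather than what Hibernate would emit verbatim, and it assumes the entity-by-entity path loads all rows with one query before deleting them individually.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical table with a statement log standing in for a database.
class FakeOrdersTable {
    final List<Long> rowIds = new ArrayList<>();
    final List<String> statements = new ArrayList<>();
    int preRemoveCallbacks = 0;

    FakeOrdersTable(int rows) {
        for (long id = 1; id <= rows; id++) {
            rowIds.add(id);
        }
    }

    // Entity-by-entity path: one SELECT to load everything,
    // then a lifecycle callback and a DELETE for every row.
    void deleteAll() {
        statements.add("SELECT * FROM orders");
        for (Long id : new ArrayList<>(rowIds)) {
            preRemoveCallbacks++; // stands in for @PreRemove and cascade handling
            statements.add("DELETE FROM orders WHERE id = " + id);
        }
        rowIds.clear();
    }

    // Bulk path: a single statement, no loading, no callbacks.
    void deleteAllInBatch() {
        statements.add("DELETE FROM orders");
        rowIds.clear();
    }
}

class DeleteComparisonDemo {
    public static void main(String[] args) {
        FakeOrdersTable slow = new FakeOrdersTable(1000);
        slow.deleteAll();
        System.out.println("deleteAll: " + slow.statements.size()
                + " statements, " + slow.preRemoveCallbacks + " callbacks");

        FakeOrdersTable fast = new FakeOrdersTable(1000);
        fast.deleteAllInBatch();
        System.out.println("deleteAllInBatch: " + fast.statements.size()
                + " statements, " + fast.preRemoveCallbacks + " callbacks");
    }
}
```

For 1,000 rows the model records 1,001 statements and 1,000 callbacks on the first path, against one statement and no callbacks on the second. The zero callbacks are the trade-off: anything that depends on cascades or @PreRemove simply does not happen.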
How Spring Data Generates the Implementation
At application startup Spring Data creates a SimpleJpaRepository<T, ID> backed by the EntityManager for each of your repository interfaces. This class is annotated with @Transactional(readOnly = true) at the class level, meaning all query methods run in a read-only transaction by default, which is a meaningful Hibernate optimisation (it disables dirty checking on loaded entities). Write methods (save, delete) are individually annotated with @Transactional (read-write), overriding the class-level setting.
Add @Transactional to your service method when it must span multiple repository calls in a single unit of work.
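The rule that a method-level annotation overrides the class-level default can be illustrated with a small reflection sketch. The @Tx annotation, MiniRepositoryImpl, and isReadOnly are all hypothetical stand-ins written for this example; this is not Spring's @Transactional or its attribute-resolution code, only the same precedence rule in miniature.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;

// Hypothetical stand-in for Spring's @Transactional.
@Retention(RetentionPolicy.RUNTIME)
@Target({ElementType.TYPE, ElementType.METHOD})
@interface Tx {
    boolean readOnly() default false;
}

// Class-level default: every method is read-only unless it says otherwise.
@Tx(readOnly = true)
class MiniRepositoryImpl {
    // Inherits the class-level read-only setting.
    public String findById(long id) { return "order-" + id; }

    // Method-level annotation overrides the class-level default (read-write).
    @Tx
    public void save(String order) { }
}

class TxResolutionDemo {
    // Resolves the effective setting: the method annotation wins, else the class annotation.
    static boolean isReadOnly(Class<?> type, String methodName) {
        for (Method method : type.getDeclaredMethods()) {
            if (method.getName().equals(methodName)) {
                Tx onMethod = method.getAnnotation(Tx.class);
                Tx effective = (onMethod != null) ? onMethod : type.getAnnotation(Tx.class);
                return effective != null && effective.readOnly();
            }
        }
        throw new IllegalArgumentException("no such method: " + methodName);
    }

    public static void main(String[] args) {
        System.out.println("findById read-only: " + isReadOnly(MiniRepositoryImpl.class, "findById"));
        System.out.println("save read-only: " + isReadOnly(MiniRepositoryImpl.class, "save"));
    }
}
```

The same precedence applies one level up: a transaction opened by your service method is the one the repository calls join, so several saves and deletes inside it commit or roll back together.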
Practical Example — OrderRepository in Use
Summary
The repository abstraction eliminates the hand-written DAO layer entirely. CrudRepository covers the 12 fundamental CRUD operations; JpaRepository adds pagination, flush control, and JPA-optimised batch operations. Spring Data generates a proxy backed by SimpleJpaRepository at startup, defaulting to read-only transactions for queries and read-write transactions for mutations. Master the distinction between save/merge semantics, the performance gap between deleteAll and deleteAllInBatch, and the zero-SELECT trick of getReferenceById: these are the details that matter in production. In the next lesson you will extend this foundation with derived query methods that let you query entity fields without writing JPQL.