The Streams API

sorted, distinct, limit & skip

You have already seen stateless intermediate operations: filter and map process each element independently, without looking at any other element. This lesson covers four stateful intermediate operations and introduces short-circuiting; both matter for performance and correctness.

Stateful vs Stateless operations

A stateless operation needs only the current element. A stateful operation must remember something about the elements it has already seen (a buffer, a set, or just a counter) before it can decide what to emit. The four covered in this lesson are:

  • sorted() — order all elements
  • distinct() — remove duplicates
  • limit(n) — keep only the first n elements
  • skip(n) — discard the first n elements

Why does stateful matter? In a parallel stream, a stateful operation may need to buffer large amounts of data or make several passes over it, so it is usually much more expensive than a stateless one. Prefer stateless operations in hot parallel paths. Where the logic allows, place limit and skip early so fewer elements reach the heavy operations downstream, but remember that moving limit or skip past filter, sorted, or distinct changes which elements are selected.

sorted

sorted() with no arguments sorts by natural order (elements must implement Comparable). Pass a Comparator to sort by any key:

    import java.util.Comparator;
    import java.util.List;

    List<String> words = List.of("banana", "apple", "cherry", "apricot");

    // natural alphabetical order
    words.stream()
         .sorted()
         .forEach(System.out::println);
    // apple, apricot, banana, cherry

    // sort by length descending, then alphabetically for ties
    words.stream()
         .sorted(Comparator.comparingInt(String::length)
                           .reversed()
                           .thenComparing(Comparator.naturalOrder()))
         .forEach(System.out::println);
    // apricot, banana, cherry, apple

sorted() buffers the entire stream. It cannot emit a single element until it has seen every element, so on an infinite stream it will loop forever (or until memory runs out). Always pair an infinite source with limit before sorted.
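
The buffering described above can be made visible with peek. The following is a minimal sketch, not part of the original lesson: the class name SortedBarrierDemo and the sample words are made up for illustration, and the recorded order is what a sequential stream typically produces rather than something the specification promises for peek.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical demo class: records when each element passes each stage.
class SortedBarrierDemo {

    static List<String> trace() {
        List<String> events = new ArrayList<>();
        Stream.of("pear", "fig", "kiwi")
              .peek(s -> events.add("before sorted: " + s)) // runs as elements arrive
              .sorted()                                     // buffers everything first
              .peek(s -> events.add("after sorted: " + s))  // runs only once sorting is done
              .forEach(s -> { });
        return events;
    }

    public static void main(String[] args) {
        trace().forEach(System.out::println);
    }
}
```

On a sequential stream, all three "before sorted" events are recorded before the first "after sorted" event, which is the barrier behaviour the paragraph above describes.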

distinct

distinct() removes duplicate elements, comparing them with equals (and, in practice, hashCode). For ordered streams the result is stable: the first occurrence of each value is kept. Unordered streams carry no such guarantee.

    import java.util.List;

    List<Integer> numbers = List.of(3, 1, 4, 1, 5, 9, 2, 6, 5, 3);

    numbers.stream()
           .distinct()
           .forEach(System.out::println);
    // 3, 1, 4, 5, 9, 2, 6 (first occurrence of each value)

This is especially useful when flattening collections where the same value can appear multiple times across different sub-lists.
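
As a rough illustration of that flattening case, the sketch below combines flatMap with distinct. The class FlattenDistinctDemo, the method uniqueTags, and the tag values are hypothetical names chosen for this example.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class: flattens nested lists and drops repeated values.
class FlattenDistinctDemo {

    static List<String> uniqueTags(List<List<String>> tagLists) {
        return tagLists.stream()
                .flatMap(List::stream)   // one stream of all tags from all sub-lists
                .distinct()              // keep the first occurrence of each tag
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<List<String>> tagLists = List.of(
                List.of("java", "streams"),
                List.of("streams", "lambda"),
                List.of("java"));
        System.out.println(uniqueTags(tagLists)); // [java, streams, lambda]
    }
}
```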

distinct() relies on equals/hashCode. If you call it on a stream of custom objects, make sure those objects have a correct equals / hashCode implementation, otherwise two logically identical objects will both pass through.
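
The difference is easy to see with two small classes. This is a sketch under stated assumptions: RawPoint and ValuePoint are hypothetical types invented for the example, the first inheriting identity-based equals/hashCode from Object and the second overriding both.

```java
import java.util.Objects;
import java.util.stream.Stream;

// Hypothetical demo class comparing distinct() with and without value equality.
class DistinctEqualsDemo {

    // No equals/hashCode override: two instances are equal only if they are the same object.
    static final class RawPoint {
        final int x, y;
        RawPoint(int x, int y) { this.x = x; this.y = y; }
    }

    // Value-based equality: two instances with the same coordinates are equal.
    static final class ValuePoint {
        final int x, y;
        ValuePoint(int x, int y) { this.x = x; this.y = y; }

        @Override public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof ValuePoint)) return false;
            ValuePoint p = (ValuePoint) o;
            return x == p.x && y == p.y;
        }

        @Override public int hashCode() { return Objects.hash(x, y); }
    }

    static long rawCount() {
        return Stream.of(new RawPoint(1, 2), new RawPoint(1, 2)).distinct().count();
    }

    static long valueCount() {
        return Stream.of(new ValuePoint(1, 2), new ValuePoint(1, 2)).distinct().count();
    }

    public static void main(String[] args) {
        System.out.println(rawCount());   // 2: both objects pass through
        System.out.println(valueCount()); // 1: the duplicate is removed
    }
}
```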

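limit and skip: slicing and short-circuiting

limit(n) passes at most the first n elements downstream and then ends the pipeline; it is a short-circuiting operation. skip(n) discards the first n elements and passes the rest downstream; it is not short-circuiting, because it still has to consume everything after the skipped prefix. Together they enable pagination of a stream: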

    import java.util.List;
    import java.util.stream.Collectors;

    List<String> items = List.of("a", "b", "c", "d", "e", "f", "g", "h");

    int page = 1;      // zero-based page index
    int pageSize = 3;

    List<String> page1 = items.stream()
            .skip((long) page * pageSize)   // skip first 3
            .limit(pageSize)                // keep next 3
            .collect(Collectors.toList());

    System.out.println(page1); // [d, e, f]
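
The same skip/limit pairing can be wrapped in a small helper to see how it behaves at the edges. This is only a sketch: the class PageDemo and the method page are hypothetical names, and the helper assumes a zero-based, non-negative page index and a non-negative page size.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper illustrating pagination with skip and limit.
class PageDemo {

    // Assumes pageIndex >= 0 and pageSize >= 0; skip and limit reject negative arguments.
    static <T> List<T> page(List<T> items, int pageIndex, int pageSize) {
        return items.stream()
                .skip((long) pageIndex * pageSize) // cast avoids int overflow for large indexes
                .limit(pageSize)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> items = List.of("a", "b", "c", "d", "e", "f", "g", "h");
        System.out.println(page(items, 0, 3)); // [a, b, c]
        System.out.println(page(items, 2, 3)); // [g, h]  (last page is shorter)
        System.out.println(page(items, 5, 3)); // []      (past the end)
    }
}
```

A page past the end of the data yields an empty list rather than an exception, because skip simply runs out of elements to discard.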

Short-circuiting means the pipeline does not need to process every source element. Streams are pulled by the terminal operation, so once limit has passed on its quota, no further elements are requested from the source. This is the same idea as && stopping early in boolean expressions:

    import java.util.stream.Stream;

    // Without limit this would run forever.
    // With limit(5) the stream stops after producing 5 elements.
    Stream.iterate(0, n -> n + 1)   // infinite: 0, 1, 2, 3 ...
          .limit(5)
          .forEach(System.out::println);
    // 0, 1, 2, 3, 4

Short-circuit terminal operations also exist. findFirst(), findAny(), anyMatch(), noneMatch(), and allMatch() can all stop the pipeline early. You will see them in a later lesson on Optional with Streams.
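
One way to observe short-circuiting is to count how many elements an infinite source actually produces. The sketch below is illustrative only: ShortCircuitDemo and its method names are hypothetical, and the exact count reflects how sequential streams behave in OpenJDK rather than a documented guarantee.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Stream;

// Hypothetical demo class: counts elements pulled from an infinite source.
class ShortCircuitDemo {

    // Counts how many elements pass the peek stage before limit(5) ends the pipeline.
    static int pulledWithLimit() {
        AtomicInteger pulled = new AtomicInteger();
        Stream.iterate(0, n -> n + 1)
              .peek(n -> pulled.incrementAndGet())
              .limit(5)
              .forEach(n -> { });
        return pulled.get();
    }

    // anyMatch is a short-circuiting terminal operation: it returns at the first match.
    static boolean containsValue(int target) {
        return Stream.iterate(0, n -> n + 1).anyMatch(n -> n == target);
    }

    public static void main(String[] args) {
        System.out.println(pulledWithLimit());  // 5
        System.out.println(containsValue(7));   // true, after inspecting 0..7
    }
}
```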

Combining all four in a real pipeline

A realistic scenario: from a list of log lines, find the top-5 unique error messages sorted alphabetically, skipping the first one (for some pagination use-case):

    import java.util.List;
    import java.util.stream.Collectors;

    List<String> logs = List.of(
        "ERROR: null pointer",
        "INFO: started",
        "ERROR: timeout",
        "ERROR: null pointer",      // duplicate
        "ERROR: disk full",
        "ERROR: timeout",           // duplicate
        "ERROR: out of memory",
        "ERROR: connection refused"
    );

    List<String> result = logs.stream()
        .filter(line -> line.startsWith("ERROR"))   // stateless
        .map(line -> line.substring(7))             // stateless: strip "ERROR: "
        .distinct()                                 // stateful: remove duplicates
        .sorted()                                   // stateful: alphabetical
        .skip(1)                                    // stateful: drop first
        .limit(5)                                   // short-circuit: keep at most 5
        .collect(Collectors.toList());

    System.out.println(result);
    // [disk full, null pointer, out of memory, timeout] (connection refused was skip(1)'d)

Performance tips

  • Put filter before sorted and distinct to reduce the number of elements the stateful operations have to buffer.
  • Put limit as early as the logic allows — every element cut before a heavy operation saves work.
  • Avoid sorted on large parallel streams unless truly necessary; it introduces a merge step that can negate the parallelism benefit.
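
Moving limit earlier is only safe when it does not change the answer. The sketch below shows the two orderings producing different results; the class LimitPlacementDemo, its method names, and the sample numbers are hypothetical and chosen just for this illustration.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical demo class: the position of limit changes which elements are selected.
class LimitPlacementDemo {

    // The three smallest values: sort everything, then keep the first three.
    static List<Integer> sortedThenLimit(List<Integer> xs) {
        return xs.stream().sorted().limit(3).collect(Collectors.toList());
    }

    // The first three values in encounter order, sorted afterwards.
    static List<Integer> limitThenSorted(List<Integer> xs) {
        return xs.stream().limit(3).sorted().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Integer> xs = List.of(5, 3, 8, 1, 9, 2);
        System.out.println(sortedThenLimit(xs)); // [1, 2, 3]
        System.out.println(limitThenSorted(xs)); // [3, 5, 8]
    }
}
```

The second pipeline does less work, since sorted only ever buffers three elements, but it answers a different question. The tip about placing limit early applies only when both orderings are acceptable for the task.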

Summary

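sorted and distinct are stateful: sorted must see every element before producing output, and distinct must remember every element it has already emitted. limit and skip are stateful too, but they only need a counter. limit is also short-circuiting: it ends the pipeline early, which makes infinite streams practical. skip discards a prefix but does not short-circuit. Reducing the element count early (with filter, and with limit where the logic allows) and placing expensive stateful operations late is a simple rule that keeps pipelines efficient.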