Having started with Java before Java 8, I have experienced how Streams and Lambda expressions completely transformed the way we handle collections and data processing. They provide a powerful, functional-style approach to manipulating sequences of elements, making code more readable and efficient. In this guide we will explore the core concepts of Java Streams and Collectors with practical examples, covering all key features up to and including Stream Gatherers, which were finalized in Java 24.
At a high level, a stream pipeline has three key components:
- A source (a collection, array, I/O channel, or generator) that supplies the elements
- Zero or more intermediate operations (such as filter, map, or sorted) that transform the stream and are evaluated lazily
- A terminal operation (such as collect, forEach, or reduce) that triggers processing and produces a result
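As a quick, minimal sketch (using a hard-coded list of names rather than the e-commerce dataset introduced later), every pipeline follows this shape:

List<String> longNames = List.of("alex", "brian", "charles").stream() // 1. Source: a collection
    .filter(name -> name.length() > 4)                                // 2. Intermediate operations: lazy, return a new stream
    .map(String::toUpperCase)
    .toList();                                                        // 3. Terminal operation: triggers processing, produces a result
// longNames: [BRIAN, CHARLES]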
Streams can be created from various data sources, including collections, arrays, and I/O channels. Here are some common ways to create streams:
Collections (like List, Set, and Map) are the most common source of streams. All collections in the Java Collections Framework provide a stream() method:
// Creating a stream from a List
List<String> nameList = List.of("Alex", "Brian", "Charles");
Stream<String> nameStream = nameList.stream();

// Creating a stream from a Set
Set<Integer> numberSet = Set.of(1, 2, 3, 4, 5);
Stream<Integer> numberStream = numberSet.stream();

// Creating a stream from a Map's keys, values, or entries
Map<String, Integer> ageMap = Map.of("Alex", 25, "Brian", 32, "Charles", 41);
Stream<String> keyStream = ageMap.keySet().stream();
Stream<Integer> valueStream = ageMap.values().stream();
Stream<Map.Entry<String, Integer>> entryStream = ageMap.entrySet().stream();
Arrays can be converted to streams using static methods in the Arrays class:
// Creating a stream from an array
String[] namesArray = {"Alex", "Brian", "Charles"};
Stream<String> arrayStream = Arrays.stream(namesArray);

// For primitive arrays, specialized streams are returned
int[] numbers = {1, 2, 3, 4, 5};
IntStream intStream = Arrays.stream(numbers);
Java NIO provides methods to create streams from files and other I/O sources:
// Stream of lines from a file
try (Stream<String> lineStream = Files.lines(Path.of("data.txt"))) {
    lineStream.forEach(System.out::println);
}

// Stream of all files in a directory
try (Stream<Path> pathStream = Files.list(Path.of("/test-directory"))) {
    pathStream.forEach(System.out::println);
}

// Walking a directory tree
try (Stream<Path> walkStream = Files.walk(Path.of("/test-directory"))) {
    walkStream.filter(Files::isRegularFile)
        .forEach(System.out::println);
}
Try-with-resources is used to ensure that the stream is closed after use, preventing resource leaks. File-backed streams are one of the rare cases where we need to close a stream explicitly.
The Stream interface provides several static methods to create streams:
// Stream of specific elements
Stream<String> elementStream = Stream.of("Alex", "Brian", "Charles");

// Empty stream
Stream<String> emptyStream = Stream.empty();

// Infinite stream generated by a supplier
Stream<UUID> uuidStream = Stream.generate(UUID::randomUUID).limit(5);

// Infinite stream of sequential elements
Stream<Integer> iterateStream = Stream.iterate(1, n -> n + 1).limit(10);

// Iterate with a predicate (Java 9+)
Stream<Integer> predicateStream = Stream.iterate(1, n -> n < 10, n -> n + 2);
There are specialized stream types for primitives to avoid boxing overhead:
// IntStream, LongStream, DoubleStream
IntStream intRangeStream = IntStream.range(1, 6);             // 1, 2, 3, 4, 5
IntStream intRangeClosedStream = IntStream.rangeClosed(1, 5); // 1, 2, 3, 4, 5

// From an object stream to a primitive stream
record Person(String name, int age) {}

List<Person> people = List.of(
    new Person("Alex", 25),
    new Person("Brian", 32),
    new Person("Charles", 41));

var average = people.stream()
    .mapToInt(Person::age) // Convert to IntStream
    .average()             // IntStream also offers sum(), max(), min(), etc.
    .orElse(0.0);
Multiple streams can be concatenated into a single stream:
Stream<String> stream1 = Stream.of("A", "B", "C");
Stream<String> stream2 = Stream.of("X", "Y", "Z");

Stream<String> combinedStream = Stream.concat(stream1, stream2); // A, B, C, X, Y, Z
We will learn about intermediate and terminal operations through the examples that follow.
We'll use a dataset of e-commerce orders throughout this tutorial. Here's our model:
record Customer(
    String id,
    String name,
    String email,
    LocalDate registrationDate,
    String tier // "standard", "premium", or "elite"
) {}

record Product(
    String id,
    String name,
    String category,
    BigDecimal price
) {}

record OrderItem(
    Product product,
    int quantity
) {}

record Order(
    String id,
    Customer customer,
    LocalDate orderDate,
    List<OrderItem> items,
    String status // "placed", "shipped", "delivered", or "canceled"
) {}
Let's create some sample data to work with:
List<Product> products = List.of(
    new Product("P1", "iPhone 14", "Electronics", new BigDecimal("999.99")),
    new Product("P2", "MacBook Pro", "Electronics", new BigDecimal("1999.99")),
    new Product("P3", "Coffee Maker", "Appliances", new BigDecimal("89.99")),
    new Product("P4", "Running Shoes", "Sportswear", new BigDecimal("129.99")),
    new Product("P5", "Yoga Mat", "Sportswear", new BigDecimal("25.99")),
    new Product("P6", "Water Bottle", "Sportswear", new BigDecimal("12.99")),
    new Product("P7", "Wireless Earbuds", "Electronics", new BigDecimal("159.99")),
    new Product("P8", "Smart Watch", "Electronics", new BigDecimal("349.99")),
    new Product("P9", "Blender", "Appliances", new BigDecimal("79.99")),
    new Product("P10", "Desk Lamp", "Home", new BigDecimal("34.99"))
);

List<Customer> customers = List.of(
    new Customer("C1", "John Smith", "john@example.com", LocalDate.of(2020, 1, 15), "elite"),
    new Customer("C2", "Emma Johnson", "emma@example.com", LocalDate.of(2021, 3, 20), "standard"),
    new Customer("C3", "Michael Brown", "michael@example.com", LocalDate.of(2019, 7, 5), "premium"),
    new Customer("C4", "Olivia Wilson", "olivia@example.com", LocalDate.of(2022, 2, 10), "standard"),
    new Customer("C5", "William Davis", "william@example.com", LocalDate.of(2020, 11, 25), "elite")
);

List<Order> orders = List.of(
    new Order("O1", customers.get(0), LocalDate.of(2023, 3, 15), List.of(
        new OrderItem(products.get(0), 1),
        new OrderItem(products.get(7), 1)
    ), "delivered"),
    new Order("O2", customers.get(2), LocalDate.of(2023, 4, 2), List.of(
        new OrderItem(products.get(1), 1)
    ), "delivered"),
    new Order("O3", customers.get(1), LocalDate.of(2023, 4, 15), List.of(
        new OrderItem(products.get(2), 1),
        new OrderItem(products.get(9), 2)
    ), "shipped"),
    new Order("O4", customers.get(0), LocalDate.of(2023, 5, 1), List.of(
        new OrderItem(products.get(3), 1),
        new OrderItem(products.get(4), 1),
        new OrderItem(products.get(5), 2)
    ), "placed"),
    new Order("O5", customers.get(4), LocalDate.of(2023, 5, 5), List.of(
        new OrderItem(products.get(6), 1)
    ), "canceled"),
    new Order("O6", customers.get(3), LocalDate.of(2023, 5, 10), List.of(
        new OrderItem(products.get(8), 1),
        new OrderItem(products.get(9), 1)
    ), "placed"),
    new Order("O7", customers.get(2), LocalDate.of(2023, 5, 15), List.of(
        new OrderItem(products.get(0), 1),
        new OrderItem(products.get(1), 1)
    ), "placed"),
    new Order("O8", customers.get(0), LocalDate.of(2023, 5, 20), List.of(
        new OrderItem(products.get(7), 1)
    ), "placed")
);
Scenario: Find all electronic products and create a list of their names.
List<String> electronicProductNames = products.stream()
    .filter(product -> product.category().equals("Electronics"))
    .map(Product::name)
    .collect(Collectors.toList());

System.out.println(electronicProductNames);
// Output: [iPhone 14, MacBook Pro, Wireless Earbuds, Smart Watch]
What's happening:
- stream(): Creates a stream from the products list
- filter(): An intermediate operation that filters products by category. Here we pass a predicate lambda that checks whether the product's category is "Electronics"
- map(): Another intermediate operation that transforms each Product into its name, using the method reference Product::name to extract the name from each Product object
- collect(): A terminal operation that gathers the results into a new List. Until a terminal operation is specified, the stream is not processed.

Scenario: Find the first product in the "Sportswear" category that costs less than $20.
Optional<Product> affordableSportswearProduct = products.stream()
    .filter(product -> product.category().equals("Sportswear"))
    .filter(product -> product.price().compareTo(new BigDecimal("20")) < 0)
    .findFirst();

affordableSportswearProduct.ifPresent(product ->
    System.out.println("Affordable sportswear: " + product.name() + " - $" + product.price()));
// Output: Affordable sportswear: Water Bottle - $12.99
What's happening:
- The two filter() operations chain together to find sportswear products under $20. You can also combine them into a single filter using logical AND (&&).
- findFirst() is a terminal operation that returns an Optional containing the first match
- ifPresent() executes the provided lambda only if a match was found

Optional is a container object which may or may not contain a non-null value. It is generally used to avoid null checks and NullPointerException. It is a good practice to use Optional in return types of methods that may not always return a value.
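As a small illustration of working with the returned Optional (a sketch reusing the products list from this tutorial; the variable names are illustrative), common follow-ups are map(), orElse(), and orElseThrow():

Optional<Product> firstSportswear = products.stream()
    .filter(p -> p.category().equals("Sportswear"))
    .findFirst();

// Transform the value if present, otherwise fall back to a default
String label = firstSportswear.map(Product::name).orElse("No sportswear found");

// Fail fast when the value is required
Product required = firstSportswear.orElseThrow(
    () -> new IllegalStateException("No sportswear product available"));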
Scenario: Create a list of order summaries showing order ID and total item count.
record OrderSummary(String orderId, int itemCount) {}

List<OrderSummary> orderSummaries = orders.stream()
    .map(order -> new OrderSummary(
        order.id(),
        order.items().stream().mapToInt(OrderItem::quantity).sum()
    ))
    .collect(Collectors.toList());

orderSummaries.forEach(System.out::println);
// Output:
// OrderSummary[orderId=O1, itemCount=2]
// OrderSummary[orderId=O2, itemCount=1]
// OrderSummary[orderId=O3, itemCount=3]
// ... and so on
What's happening:
- map() transforms each Order into an OrderSummary
- Inside the mapping function, a nested stream over order.items() sums the quantities for each order

Scenario: Show all products sorted by price (lowest to highest).
List<Product> sortedProducts = products.stream()
    .sorted(Comparator.comparing(Product::price))
    .collect(Collectors.toList());

sortedProducts.forEach(product ->
    System.out.println(product.name() + " - $" + product.price()));
// Output:
// Water Bottle - $12.99
// Yoga Mat - $25.99
// Desk Lamp - $34.99
// ... (remaining products in ascending price order)
What's happening:
- sorted() with a Comparator arranges products by price
- Comparator.comparing() creates a comparator based on the Product::price method reference

You can chain comparators to sort by multiple fields. For example, Comparator.comparing(Product::category).thenComparing(Product::price) would first sort by category and then by price within each category.
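In code, that multi-field sort might look like this (a sketch over the same products list; the extra tie-breaker on name is purely illustrative):

List<Product> byCategoryThenPrice = products.stream()
    .sorted(Comparator.comparing(Product::category)                       // primary key: category
        .thenComparing(Product::price)                                    // secondary key: price within each category
        .thenComparing(Product::name, Comparator.reverseOrder()))         // further tie-breakers can be chained
    .collect(Collectors.toList());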
Scenario: Implement a simple pagination for products, showing the second page with 3 products per page.
int pageSize = 3;
int pageNumber = 1; // 0-based index, so this is the second page

List<Product> paginatedProducts = products.stream()
    .skip(pageSize * pageNumber)
    .limit(pageSize)
    .collect(Collectors.toList());

paginatedProducts.forEach(product -> System.out.println(product.name()));
// Output:
// Running Shoes
// Yoga Mat
// Water Bottle
What's happening:
- skip() bypasses the first page of items
- limit() takes only enough items to fill the requested page

Scenario: Create a list of all products that have been ordered.
List<Product> orderedProducts = orders.stream()
    .flatMap(order -> order.items().stream())
    .map(OrderItem::product)
    .distinct()
    .collect(Collectors.toList());

orderedProducts.forEach(product -> System.out.println(product.name()));
// Output will show each product that appears in at least one order, without duplicates
What's happening:
- flatMap() converts each order's list of items into a stream, then flattens all into a single stream
- map() extracts just the product from each order item
- distinct() removes duplicate products that were ordered multiple times

Scenario: Calculate the total revenue from all orders.
BigDecimal totalRevenue = orders.stream()
    .flatMap(order -> order.items().stream())
    .map(item -> item.product().price().multiply(BigDecimal.valueOf(item.quantity())))
    .reduce(BigDecimal.ZERO, BigDecimal::add);

System.out.println("Total Revenue: $" + totalRevenue);
// Output: Total Revenue: $7316.84
What's happening:
- flatMap() flattens the order items into a single stream
- map() calculates the revenue for each item by multiplying its price with the quantity
- reduce() aggregates all revenues into a single total, starting from BigDecimal.ZERO

Scenario: Check if any, all, or none of the products are in the "Electronics" category, and find any product over $1000.
boolean anyElectronics = products.stream()
    .anyMatch(product -> product.category().equals("Electronics"));

boolean allExpensive = products.stream()
    .allMatch(product -> product.price().compareTo(new BigDecimal("100")) > 0);

boolean noneToys = products.stream()
    .noneMatch(product -> product.category().equals("Toys"));

Optional<Product> anyHighPriced = products.stream()
    .filter(product -> product.price().compareTo(new BigDecimal("1000")) > 0)
    .findAny();

System.out.println("Any electronics? " + anyElectronics);
System.out.println("All products expensive? " + allExpensive);
System.out.println("No toys? " + noneToys);
anyHighPriced.ifPresent(p -> System.out.println("A high-priced product: " + p.name()));
// Output:
// Any electronics? true
// All products expensive? false
// No toys? true
// A high-priced product: MacBook Pro
What's happening:
- anyMatch, allMatch, and noneMatch are terminal operations that short-circuit as soon as the result is determined
- findAny returns any matching element

Scenario: Trace the stream pipeline to debug filtering and mapping.
List<String> debuggedNames = products.stream()
    .peek(p -> System.out.println("Before filter: " + p.name()))
    .filter(product -> product.category().equals("Electronics"))
    .peek(p -> System.out.println("After filter: " + p.name()))
    .map(Product::name)
    .peek(name -> System.out.println("Mapped name: " + name))
    .collect(Collectors.toList());
What's happening:
- peek() is an intermediate operation for debugging or tracing elements as they flow through the pipeline
- Avoid using peek() for side effects in production code

Collectors are used to gather the results of stream operations into a collection or other data structure. Here are some common collectors:
Scenario: Create a comma-separated list of all product categories.
String categories = products.stream()
    .map(Product::category)
    .distinct()
    .sorted()
    .collect(Collectors.joining(", "));

System.out.println("Available categories: " + categories);
// Output: Available categories: Appliances, Electronics, Home, Sportswear
What's happening:
- map() extracts just the category from each product
- distinct() removes duplicate categories
- sorted() arranges them alphabetically
- Collectors.joining() combines all categories with the specified delimiter

Scenario: Calculate statistics for product prices.
DoubleSummaryStatistics priceStatistics = products.stream()
    .map(product -> product.price().doubleValue())
    .collect(Collectors.summarizingDouble(price -> price));

System.out.println("Product price statistics:");
System.out.println("Count: " + priceStatistics.getCount());
System.out.println("Average: $" + String.format("%.2f", priceStatistics.getAverage()));
System.out.println("Min: $" + String.format("%.2f", priceStatistics.getMin()));
System.out.println("Max: $" + String.format("%.2f", priceStatistics.getMax()));
System.out.println("Sum: $" + String.format("%.2f", priceStatistics.getSum()));
// Output:
// Product price statistics:
// Count: 10
// Average: $388.39
// Min: $12.99
// Max: $1999.99
// Sum: $3883.90
What's happening:
- map() converts BigDecimal prices to primitive doubles
- The summarizingDouble() collector computes the count, average, min, max, and sum as items are processed (a primitive-stream alternative is sketched below)
Scenario: Collect product names into an unmodifiable list (Java 10+).
List<String> unmodifiableNames = products.stream()
    .map(Product::name)
    .collect(Collectors.toUnmodifiableList());

// unmodifiableNames.add("New Product"); // Throws UnsupportedOperationException
What's happening:
- Collectors.toUnmodifiableList() returns an unmodifiable list. Similar collectors exist for Set and Map.

If the desired output is an unmodifiable list, we can also use Stream.toList() (Java 16+), which is a more concise alternative to collecting with Collectors.toUnmodifiableList():
List<String> unmodifiableNames = products.stream()
    .map(Product::name)
    .toList();

// unmodifiableNames.add("New Product"); // Throws UnsupportedOperationException
Scenario: Collect all product categories into a Set to remove duplicates.
Set<String> categorySet = products.stream()
    .map(Product::category)
    .collect(Collectors.toSet());

System.out.println(categorySet);
What's happening:
- Collectors.toSet() collects elements into a Set, automatically removing duplicates. The specific Set implementation is unspecified; a sketch using Collectors.toCollection() to choose one follows below.
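If a particular Set implementation is needed (for example, a sorted TreeSet), Collectors.toCollection() lets you pick it explicitly; a minimal sketch:

SortedSet<String> sortedCategories = products.stream()
    .map(Product::category)
    .collect(Collectors.toCollection(TreeSet::new)); // sorted, mutable TreeSet
// [Appliances, Electronics, Home, Sportswear]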
Scenario: Create a map of product names to their prices.
Map<String, BigDecimal> productPriceMap = products.stream()
    .collect(Collectors.toMap(
        Product::name,
        Product::price
    ));

System.out.println(productPriceMap);
// Output (entry order may vary): {iPhone 14=999.99, MacBook Pro=1999.99, Coffee Maker=89.99, ...}
What's happening:
- Collectors.toMap() creates a map; the first argument produces the key (product name) and the second produces the value (product price)
- If two elements produce the same key, a java.lang.IllegalStateException will be thrown. To handle this, we can provide a merge function as the third argument to toMap(), as shown in a later example.

Scenario: Use filtering and mapping as part of a collector.
Map<String, List<String>> electronicsNamesByCategory = products.stream()
    .collect(Collectors.groupingBy(
        Product::category,
        Collectors.filtering(
            p -> p.category().equals("Electronics"),
            Collectors.mapping(Product::name, Collectors.toList())
        )
    ));

System.out.println(electronicsNamesByCategory);
What's happening:
- Collectors.filtering() allows filtering within a collector
- Collectors.mapping() transforms elements during collection

Scenario: Safely collect products by price, merging names if prices are the same.
Map<BigDecimal, String> priceToNames = products.stream()
    .collect(Collectors.toMap(
        Product::price,
        Product::name,
        (name1, name2) -> name1 + ", " + name2
    ));

System.out.println(priceToNames);
What's happening:
- The third argument to toMap() is a merge function that handles duplicate keys, here by joining the product names. A variant that also controls the Map implementation is sketched below.
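toMap() also has a four-argument overload that additionally accepts a map factory, in case the Map implementation matters; a sketch using LinkedHashMap to preserve encounter order:

Map<BigDecimal, String> orderedPriceToNames = products.stream()
    .collect(Collectors.toMap(
        Product::price,
        Product::name,
        (name1, name2) -> name1 + ", " + name2, // merge names on duplicate prices
        LinkedHashMap::new));                   // choose the Map implementation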
Scenario: Count products in each category and collect their names.
Map<String, Long> countByCategory = products.stream()
    .collect(Collectors.groupingBy(Product::category, Collectors.counting()));

Map<String, List<String>> namesByCategory = products.stream()
    .collect(Collectors.groupingBy(Product::category,
        Collectors.mapping(Product::name, Collectors.toList())));
What's happening:
- counting() and mapping() can be used as downstream collectors with groupingBy() for advanced grouping

Scenario: Divide products into "expensive" (>$100) and "affordable" categories.
Map<Boolean, List<Product>> pricePartition = products.stream()
    .collect(Collectors.partitioningBy(
        product -> product.price().compareTo(new BigDecimal("100")) > 0
    ));

System.out.println("Expensive products:");
pricePartition.get(true).forEach(p ->
    System.out.println("- " + p.name() + " ($" + p.price() + ")"));

System.out.println("\nAffordable products:");
pricePartition.get(false).forEach(p ->
    System.out.println("- " + p.name() + " ($" + p.price() + ")"));
What's happening:
- partitioningBy() separates products into two groups based on the boolean predicate. It also accepts a downstream collector, as sketched below.
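For instance, counting the products on each side of the predicate instead of listing them (a small sketch over the same data):

Map<Boolean, Long> expensiveCount = products.stream()
    .collect(Collectors.partitioningBy(
        product -> product.price().compareTo(new BigDecimal("100")) > 0,
        Collectors.counting()));
// {false=5, true=5}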
Scenario: Group all products by their category.
Map<String, List<Product>> productsByCategory = products.stream()
    .collect(Collectors.groupingBy(Product::category));

productsByCategory.forEach((category, prods) -> {
    System.out.println(category + ":");
    prods.forEach(p -> System.out.println(" - " + p.name()));
});
What's happening:
- groupingBy() creates a map where each key is a category and each value is a list of products in that category

Scenario: Get the average price of products by category.
Map<String, Double> avgPriceByCategory = products.stream()
    .collect(Collectors.groupingBy(
        Product::category,
        Collectors.averagingDouble(p -> p.price().doubleValue())
    ));

avgPriceByCategory.forEach((category, avgPrice) ->
    System.out.println(category + ": $" + String.format("%.2f", avgPrice)));
// Output:
// Electronics: $877.49
// Appliances: $84.99
// Sportswear: $56.32
// Home: $34.99
What's happening:
- groupingBy() with a downstream collector combines two operations: it first groups products by category, then applies averagingDouble() to compute the average price within each group
Scenario: Analyze orders by customer tier and order status, showing order count and total items.
record OrderStats(long orderCount, int totalItems) {}

Map<String, Map<String, OrderStats>> orderAnalysisByTierAndStatus = orders.stream()
    .collect(Collectors.groupingBy(
        order -> order.customer().tier(),
        Collectors.groupingBy(
            Order::status,
            Collectors.collectingAndThen(
                Collectors.toList(),
                ordersList -> new OrderStats(
                    ordersList.size(),
                    ordersList.stream()
                        .flatMap(order -> order.items().stream())
                        .mapToInt(OrderItem::quantity)
                        .sum()
                )
            )
        )
    ));

orderAnalysisByTierAndStatus.forEach((tier, statusMap) -> {
    System.out.println("Customer Tier: " + tier);
    statusMap.forEach((status, stats) -> {
        System.out.println(" Status: " + status);
        System.out.println(" Order Count: " + stats.orderCount());
        System.out.println(" Total Items: " + stats.totalItems());
    });
});
What's happening:
- The outer groupingBy() separates orders by customer tier
- The nested groupingBy() further separates them by order status
- collectingAndThen() transforms each collected list into an OrderStats object

If the built-in collectors do not meet your needs, you can create a custom collector. This is useful for complex aggregations or when you need to maintain state across multiple elements.
Scenario: Build a custom collector to analyze orders by customer tier, showing average order value and total items.
record TierAnalysis(int orderCount, BigDecimal totalRevenue, long totalItems) {
    double getAverageOrderValue() {
        return orderCount == 0 ? 0 : totalRevenue.doubleValue() / orderCount;
    }
}

class TierAnalysisCollector implements Collector<Order, TierAnalysisCollector.Accumulator, TierAnalysis> {

    // Mutable accumulation state; a Collector's accumulator must mutate its container in place
    static class Accumulator {
        int orderCount;
        BigDecimal totalRevenue = BigDecimal.ZERO;
        long totalItems;
    }

    @Override
    public Supplier<Accumulator> supplier() {
        return Accumulator::new;
    }

    @Override
    public BiConsumer<Accumulator, Order> accumulator() {
        return (acc, order) -> {
            BigDecimal orderTotal = order.items().stream()
                .map(item -> item.product().price()
                    .multiply(BigDecimal.valueOf(item.quantity())))
                .reduce(BigDecimal.ZERO, BigDecimal::add);

            long itemCount = order.items().stream()
                .mapToLong(OrderItem::quantity)
                .sum();

            acc.orderCount++;
            acc.totalRevenue = acc.totalRevenue.add(orderTotal);
            acc.totalItems += itemCount;
        };
    }

    @Override
    public BinaryOperator<Accumulator> combiner() {
        return (left, right) -> {
            left.orderCount += right.orderCount;
            left.totalRevenue = left.totalRevenue.add(right.totalRevenue);
            left.totalItems += right.totalItems;
            return left;
        };
    }

    @Override
    public Function<Accumulator, TierAnalysis> finisher() {
        return acc -> new TierAnalysis(acc.orderCount, acc.totalRevenue, acc.totalItems);
    }

    @Override
    public Set<Characteristics> characteristics() {
        return Collections.emptySet(); // a real finisher is required, so IDENTITY_FINISH does not apply
    }
}

// Usage of the custom collector
Map<String, TierAnalysis> tierAnalysis = orders.stream()
    .collect(Collectors.groupingBy(
        order -> order.customer().tier(),
        new TierAnalysisCollector()
    ));

tierAnalysis.forEach((tier, analysis) -> {
    System.out.println("Tier: " + tier);
    System.out.println(" Orders: " + analysis.orderCount());
    System.out.println(" Total Revenue: $" + analysis.totalRevenue());
    System.out.println(" Total Items: " + analysis.totalItems());
    System.out.printf(" Avg. Order Value: $%.2f%n", analysis.getAverageOrderValue());
});

// Output for the sample data (map iteration order may vary), e.g. for the elite tier:
// Tier: elite
//  Orders: 4
//  Total Revenue: $2041.92
//  Total Items: 8
//  Avg. Order Value: $510.48
What's happening:
- supplier() creates the empty accumulation state for each group
- accumulator() folds each order into the running totals
- combiner() merges two partial accumulations, which is required for parallel execution
- finisher() converts the mutable accumulator into the final immutable TierAnalysis record
- characteristics() describes properties the stream framework can use for optimization
- Passing the custom collector as the downstream collector of groupingBy() produces one TierAnalysis per customer tier
Parallel streams allow for concurrent processing of data, which can improve performance on large datasets. We can use parallel streams by calling parallelStream() instead of stream().
Internally, parallel streams use the common ForkJoinPool to split the workload across multiple threads. This can lead to significant performance improvements for CPU-bound tasks.
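To see how much parallelism is available on a given machine, you can inspect the common pool (a small sketch; requires java.util.concurrent.ForkJoinPool). By default its parallelism is one less than the number of available processors, and it can be tuned with a system property at JVM startup:

System.out.println("Available processors: " + Runtime.getRuntime().availableProcessors());
System.out.println("Common pool parallelism: " + ForkJoinPool.commonPool().getParallelism());

// Tunable at startup, for example:
// -Djava.util.concurrent.ForkJoinPool.common.parallelism=4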
Scenario: Use parallel streams to find all electronics products ordered by premium customers.
List<Product> premiumElectronics = orders.parallelStream()
    .filter(order -> order.customer().tier().equals("premium"))
    .flatMap(order -> order.items().stream())
    .map(OrderItem::product)
    .filter(product -> product.category().equals("Electronics"))
    .distinct()
    .collect(Collectors.toList());

System.out.println("Electronics ordered by premium customers:");
premiumElectronics.forEach(p -> System.out.println("- " + p.name()));
// Output depends on the data, but might look like:
// Electronics ordered by premium customers:
// - iPhone 14
// - MacBook Pro
What's happening:
- parallelStream() processes the data in parallel, potentially using multiple CPU cores
- distinct() ensures no duplicate products in the result

Parallel streams can improve performance but aren't always the best choice. Use them when:
- The dataset is large enough for the parallelization overhead to pay off
- The per-element operations are computationally expensive
- The operations are stateless and independent of one another
- The source splits efficiently (for example, arrays or ArrayList rather than LinkedList)
Avoid parallel streams when:
- The dataset is small and the overhead outweighs any benefit
- The operations are I/O-bound or block on external resources
- The pipeline depends on encounter order or uses stateful lambdas
- The operations update shared mutable state
We must ensure we are not mutating shared data from within a parallel stream, since race conditions will lead to unpredictable results. Also, because different threads process the elements, context that is normally propagated through thread-local variables (such as a trace ID) may not be available as expected.
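To illustrate the shared-mutable-state point, compare an unsafe accumulation with letting the collector build the result (a sketch using the products list; the unsafe version may drop elements or throw under contention):

// Unsafe: multiple threads mutate the same ArrayList concurrently
List<String> unsafeNames = new ArrayList<>();
products.parallelStream()
    .map(Product::name)
    .forEach(unsafeNames::add); // results are unpredictable

// Safe: the collector manages per-thread accumulation and merging
List<String> safeNames = products.parallelStream()
    .map(Product::name)
    .collect(Collectors.toList());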
The teeing() collector allows you to combine the results of two separate collectors into a single result. This is useful when you want to perform two different aggregations on the same data.
Scenario: Calculate both the total revenue and count of products sold in a single stream operation.
record SalesStatistics(BigDecimal totalRevenue, long totalProductsSold) {}

SalesStatistics salesStats = orders.stream()
    .flatMap(order -> order.items().stream())
    .collect(Collectors.teeing(
        Collectors.mapping(
            item -> item.product().price().multiply(BigDecimal.valueOf(item.quantity())),
            Collectors.reducing(BigDecimal.ZERO, BigDecimal::add)
        ),
        Collectors.summingLong(OrderItem::quantity),
        SalesStatistics::new
    ));

System.out.println("Total Revenue: $" + salesStats.totalRevenue());
System.out.println("Total Products Sold: " + salesStats.totalProductsSold());
// Output:
// Total Revenue: $7316.84
// Total Products Sold: 16
What's happening:
- The teeing() collector feeds every element to two separate downstream collectors: the first computes the total revenue, the second sums the quantities sold
- The merger function combines both results into a single SalesStatistics record

While Streams and Collectors have provided powerful data processing capabilities since Java 8, Java 22 introduced Stream Gatherers (as a preview feature, finalized in Java 24), which take this to the next level.
Stream Gatherers allow developers to create custom intermediate operations for streams, enabling more complex data transformations that were previously difficult to express.
record Employee(String name, int age, String department) {}

List<Employee> employees = getEmployees();

// Using a fixed-window gatherer to process employee names in pairs
var employeePairs = employees.stream()
    .filter(employee -> employee.department().equals("Engineering"))
    .map(Employee::name)
    .gather(Gatherers.windowFixed(2)) // Group names in pairs
    .toList();
// Result: [[Alice, Mary], [John, Ramesh], [Jen]]
A Gatherer consists of:
- An initializer that creates an optional private state object used during processing
- An integrator that processes each incoming element, updates the state, and pushes zero or more elements downstream
- A combiner that merges two state objects when the gatherer is evaluated in parallel
- A finisher that can emit final elements once the upstream is exhausted
Java 22 includes several built-in gatherers (see the short sketch after this list):
- Gatherers.fold(): Combines all stream elements into a single result
- Gatherers.scan(): Performs incremental accumulation and produces intermediate results
- Gatherers.windowFixed(): Groups elements into fixed-size windows
- Gatherers.windowSliding(): Creates overlapping windows of elements
- Gatherers.mapConcurrent(): Maps elements concurrently with controlled parallelism

For more details on Stream Gatherers, including implementation examples and advanced usage, check out our dedicated blog post: Stream Gatherers using Java 22.
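As a quick sketch (Java 24, or Java 22/23 with preview features enabled; it reuses the order data from this tutorial), here is how two of the built-in gatherers fit into a pipeline:

// Running totals of item quantities across all orders
List<Integer> runningTotals = orders.stream()
    .flatMap(order -> order.items().stream())
    .map(OrderItem::quantity)
    .gather(Gatherers.scan(() -> 0, Integer::sum)) // emits each intermediate sum
    .toList();

// Overlapping windows of three product names
List<List<String>> nameWindows = products.stream()
    .map(Product::name)
    .gather(Gatherers.windowSliding(3))
    .toList();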
Avoid side effects in stream operations. Don't modify external state in lambdas as this breaks the functional programming model and can lead to unpredictable behavior.
Keep pipelines short and readable. Long chains of operations can be hard to follow. Break them into smaller methods if necessary.
Prefer method references over lambdas when possible for cleaner, more readable code: stream.map(Person::getName) instead of stream.map(person -> person.getName()).
Use specialized primitive streams (IntStream, LongStream, DoubleStream) when working with primitives to avoid boxing/unboxing overhead.
Choose the appropriate terminal operation for your needs - don't use collect() when a simpler operation like findFirst() or anyMatch() will do.
Be careful with stateful operations like sorted(), distinct(), and limit() as they can negatively impact performance with large datasets.
Use parallel streams judiciously. They aren't always faster and can sometimes be slower due to overhead. Benchmark your specific use case before committing to parallel processing.
Use peek() for debugging only, not for performing actual operations. It's intended for observation, not for changing state.
Remember that streams can be consumed only once. After a terminal operation is executed, the stream is closed and cannot be reused.
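A tiny illustration of the single-use rule (a sketch over the products list):

Stream<String> names = products.stream().map(Product::name);
names.forEach(System.out::println); // terminal operation consumes the stream

// names.count(); // IllegalStateException: stream has already been operated upon or closed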
Leverage built-in collectors before implementing custom ones. The Collectors class provides many powerful implementations for common use cases.
Handle infinite streams carefully by always including limiting operations like limit() or short-circuiting operations like findFirst() to prevent infinite processing.
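For example, both of these keep an otherwise infinite Stream.iterate() pipeline finite (a small sketch; takeWhile() requires Java 9+):

// Bound by element count
List<Integer> firstTenEvens = Stream.iterate(0, n -> n + 2)
    .limit(10)
    .toList();

// Bound by a condition
List<Integer> powersOfTwoBelow100 = Stream.iterate(1, n -> n * 2)
    .takeWhile(n -> n < 100)
    .toList();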
While Java Streams are powerful, there are scenarios where traditional loops may be more appropriate:
- Simple iterations over small collections, where a plain loop is clearer and has less overhead
- Logic that needs complex control flow such as break, continue, or early returns
- Code that must modify local variables or throw checked exceptions, which lambdas make awkward
- Performance-critical hot paths where the stream abstraction adds measurable overhead
- Step-by-step debugging, which is often easier with a plain loop
Java Streams and Collectors provide a powerful, declarative approach to data processing that can significantly improve code readability and maintainability. By understanding the various operations and collectors available, you can solve complex data transformation problems with elegant, concise code.