Java Streams API - Parallel programming

Overview

Parallelizing your code without understanding what happens under the hood can make it slower rather than faster.

By default, a stream pipeline in Java is processed sequentially. It runs in parallel only when explicitly requested, for example with parallelStream() on a collection or parallel() on an existing stream.

myShapesCollection.stream()
    .filter(e -> e.getColor() == Color.RED)
    .forEach(e -> System.out.println(e.getName()));

myShapesCollection.parallelStream() // parallel stream: forEach no longer guarantees the output order
    .filter(e -> e.getColor() == Color.RED)
    .forEach(e -> System.out.println(e.getName()));
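To check which mode a pipeline is in, a stream can be asked directly through isParallel(). The following is a minimal sketch, not part of the original note; the class name StreamModeDemo, the helper mode() and the sample list are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Stream;

class StreamModeDemo {
    // Hypothetical helper: reports the mode the pipeline would run in.
    static String mode(Stream<?> stream) {
        return stream.isParallel() ? "parallel" : "sequential";
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("circle", "square", "triangle");

        System.out.println(mode(names.stream()));                      // sequential
        System.out.println(mode(names.parallelStream()));              // parallel
        System.out.println(mode(names.stream().parallel()));           // parallel
        // The mode applies to the whole pipeline, so the last call wins.
        System.out.println(mode(names.parallelStream().sequential())); // sequential
    }
}
```

Because the mode is a property of the whole pipeline, it is not possible to run one stage in parallel and the next one sequentially within the same stream.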

Fork-Join Framework

Parallel streams make use of the fork-join framework and its common pool of worker threads.

The fork-join framework was added to java.util.concurrent in Java 7 to handle task management between multiple threads.
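One way to see the common pool at work is to record which threads execute a parallel pipeline. This is only a sketch under assumptions: the class name CommonPoolDemo and the helper workerThreadNames() are hypothetical, and the thread names and counts printed depend on the JVM and the machine.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ForkJoinPool;
import java.util.stream.IntStream;

class CommonPoolDemo {
    // Hypothetical helper: collects the names of the threads that ran the pipeline.
    static Set<String> workerThreadNames(int elements) {
        // Thread-safe set, because several threads add to it concurrently.
        Set<String> names = ConcurrentHashMap.newKeySet();
        IntStream.range(0, elements)
                .parallel()
                .forEach(i -> names.add(Thread.currentThread().getName()));
        return names;
    }

    public static void main(String[] args) {
        System.out.println("Common pool parallelism: "
                + ForkJoinPool.commonPool().getParallelism());
        System.out.println("Threads used: " + workerThreadNames(10_000));
    }
}
```

On a typical JVM the output lists the calling thread alongside common pool workers, since the caller takes part in the work as well. The pool is shared by every parallel stream in the JVM, and its size can be set with the system property java.util.concurrent.ForkJoinPool.common.parallelism.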

Considerations to decide between sequential and parallel streaming

  1. Splitting / decomposition costs – Sometimes splitting is more expensive than just doing the work!

  2. Task dispatch / management costs – A thread can get a lot of work done in the time it takes to hand that work to another thread.

  3. Result combination costs – Sometimes combination involves copying lots of data. For example, adding numbers is cheap whereas merging sets is expensive.

  4. Locality – The elephant in the room, and a point that is easy to miss. If a CPU stalls waiting for data because of cache misses, parallelization gains little. That’s why array-based sources parallelize best: the elements next to the current index are likely to be cached already, so the CPU is less likely to experience a cache miss.

  5. Workload size – There is a massive number of items to process, or the processing of each item takes time and is parallelizable.

  6. Actual need – There is a performance problem in the first place.

  7. Existing concurrency – The process does not already run in a multi-threaded environment. For example, in a web container that already handles many requests in parallel, adding another layer of parallelism inside each request could have more negative than positive effects.
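The splitting, combination and locality points above can be sketched by summing the same numbers from two different sources. This is an illustration under assumptions, not a benchmark: the class name ParallelCostSketch and its helpers are hypothetical, the element count is arbitrary, and the crude timing in main() only hints at the trend. A proper measurement would use a harness such as JMH.

```java
import java.util.LinkedList;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

class ParallelCostSketch {
    // Range source with primitive values: splits evenly and combines by plain addition.
    static long sumRange(long n, boolean parallel) {
        LongStream s = LongStream.rangeClosed(1, n);
        return (parallel ? s.parallel() : s).sum();
    }

    // Linked list of boxed values: costly to split, nodes scattered across memory.
    static long sumLinkedList(List<Long> values, boolean parallel) {
        return (parallel ? values.parallelStream() : values.stream())
                .mapToLong(Long::longValue)
                .sum();
    }

    // Crude wall-clock timing in milliseconds; results vary between runs and machines.
    static long millis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        long n = 1_000_000L; // arbitrary size chosen for the sketch
        List<Long> linked = LongStream.rangeClosed(1, n).boxed()
                .collect(Collectors.toCollection(LinkedList::new));

        System.out.println("range, sequential:       " + millis(() -> sumRange(n, false)) + " ms");
        System.out.println("range, parallel:         " + millis(() -> sumRange(n, true)) + " ms");
        System.out.println("linked list, sequential: " + millis(() -> sumLinkedList(linked, false)) + " ms");
        System.out.println("linked list, parallel:   " + millis(() -> sumLinkedList(linked, true)) + " ms");
    }
}
```

Both variants return the same sum in either mode; only the cost of getting there differs. Whether the parallel version wins depends on the source, the amount of work per element and the machine, which is why measuring before and after is the safer approach.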

Reading material

  1. https://www.baeldung.com/java-when-to-use-parallel-stream#:~:text=Parallel%20processing%20may%20be%20beneficial,source%20and%20merging%20the%20results.
  2. https://stackoverflow.com/questions/20375176/should-i-always-use-a-parallel-stream-when-possible
