A short look at Java 8 streams

Java 8 introduced lambdas as a new language feature and, alongside them, the new concept of streams. If you work with streams, you will almost certainly use lambdas, too.

The advantage of streams is that they make your code easier to understand and read. In theory, if you use parallel streams correctly, you can also speed up your processing, but from my observations that won’t happen if you only work with small datasets and/or simple operations.

To show you how to use streams, let’s implement a small task:

Imagine you have a record collection system and want to calculate the total value of your collection and the average price per record, counting only those records where you still know how much you paid (that’s not necessarily the case for all of your records!).

In a traditional approach, you would implement it more or less like this:

List<Medium> media = mediumRepository.findAll();

double sumValue = 0;
long boughtMediaCount=0;
for (Medium medium : media) {
  if (medium.getBuyPrice() != null) {
    sumValue += medium.getBuyPrice();
    boughtMediaCount++;
  }
}

System.out.println("Total price="+String.format("EUR %.02f", sumValue));
System.out.println("Average price="+String.format("EUR %.02f", (sumValue / (double) boughtMediaCount)));

Now, let’s analyze what we do here:

After retrieving the whole dataset as a list, we iterate over each element. In each iteration, we check whether the property “buyPrice” is set, and if it is, we add that value to the total and increase the counter for records where we know the price. At the end, we want to get two values: the total price and the average price.

In other words:

We look at each element (“stream”), process only one single property (“map”), use only those properties that actually have a value (“filter”) and calculate a result (“collect”).

That description can now be nicely transformed into Java 8 code, which is almost identical to the non-technical description above:

List<Medium> media = mediumRepository.findAll();

Averager averagePrice = media.stream().
    map(Medium::getBuyPrice).
    filter(v -> v != null).
    collect(Averager::new, Averager::accept, Averager::combine);

System.out.println("Total price="+String.format("EUR %.02f", averagePrice.getTotal()));
System.out.println("Average price="+String.format("EUR %.02f", averagePrice.getAverage()));

Isn’t that nice? No brace hell any more, no boring iterations.

OK, you do need an extra class, the Averager, which looks like this:

import java.util.function.DoubleConsumer;

// Mutable result container for collect(): keeps a running total and a count.
public class Averager implements DoubleConsumer {
    private double total = 0;
    private int count = 0;

    public double getAverage() {
        // Avoid dividing by zero when no value has been accepted.
        return count > 0 ? total / count : 0;
    }

    public int getCount() {
        return count;
    }

    public double getTotal() {
        return total;
    }

    // Merges a partial result; used as the combiner when the stream runs in parallel.
    public void combine(Averager other) {
        total += other.total;
        count += other.count;
    }

    // Called once for every price that passes the filter.
    @Override
    public void accept(double value) {
        total += value;
        count++;
    }
}

For a single use, you write a little more code here (quite a bit more, actually), but even then readability and testability improve, and that’s what counts in the end.
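To make the testability point a bit more concrete, here is a minimal, self-contained sketch of the same pipeline fed from a plain in-memory list instead of the repository. The Medium class below is only a hypothetical stand-in with a nullable price, the Averager is a condensed variant of the one above, and AveragerDemo with its averagePrice() helper is a name I made up for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.DoubleConsumer;

// Hypothetical stand-in for the Medium entity; only the price matters here.
class Medium {
    private final Double buyPrice; // null when the purchase price is unknown

    Medium(Double buyPrice) {
        this.buyPrice = buyPrice;
    }

    Double getBuyPrice() {
        return buyPrice;
    }
}

// Condensed variant of the Averager: a running total plus a count.
class Averager implements DoubleConsumer {
    private double total = 0;
    private int count = 0;

    @Override
    public void accept(double value) {
        total += value;
        count++;
    }

    void combine(Averager other) {
        total += other.total;
        count += other.count;
    }

    double getTotal() {
        return total;
    }

    double getAverage() {
        return count > 0 ? total / count : 0;
    }
}

class AveragerDemo {
    // Same pipeline as above, but the list is passed in, so no repository is needed.
    static Averager averagePrice(List<Medium> media) {
        return media.stream()
                .map(Medium::getBuyPrice)
                .filter(v -> v != null)
                .collect(Averager::new, Averager::accept, Averager::combine);
    }

    public static void main(String[] args) {
        // Sample data invented for this sketch; the second record has no known price.
        List<Medium> media = Arrays.asList(
                new Medium(10.0), new Medium(null), new Medium(20.0));
        Averager result = averagePrice(media);
        System.out.println(String.format("Total price=EUR %.2f", result.getTotal()));
        System.out.println(String.format("Average price=EUR %.2f", result.getAverage()));
    }
}
```

Because the calculation lives in a small method that takes a list and returns an Averager, a unit test can call it with a handful of hand-made records and compare the total and the average, without touching the database.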

A few final observations:

  • You can parallelize the work on your stream by using the parallel() method of the streams API. But be warned that, as with every parallelization, there can be cases where you actually lose performance.
  • The order in which you invoke the stream methods matters:
    In my example, using filter() before map() is faster on a sequentially executed stream, but equally fast or slower on a stream executed in parallel.
  • On small datasets (in my benchmarks, I worked with roughly 1000 items), the traditional approach with the for-loop is much faster than working with streams. I don’t know how much that changes on larger datasets and/or more complex items.
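The first two observations can be sketched in a few lines. This is only an illustration under some assumptions: PricedMedium is a hypothetical stand-in for the Medium entity, ParallelPriceStats and its stats() method are names invented for this sketch, and instead of the Averager it uses the JDK’s built-in DoubleSummaryStatistics to keep the example short. It is not a benchmark and makes no claim about which variant is faster.

```java
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.stream.Stream;

// Hypothetical stand-in for the Medium entity.
class PricedMedium {
    private final Double buyPrice; // null when the purchase price is unknown

    PricedMedium(Double buyPrice) {
        this.buyPrice = buyPrice;
    }

    Double getBuyPrice() {
        return buyPrice;
    }
}

class ParallelPriceStats {
    // filter() is called before the mapping step; the flag switches between
    // a sequential and a parallel stream over the same list.
    static DoubleSummaryStatistics stats(List<PricedMedium> media, boolean parallel) {
        Stream<PricedMedium> stream = parallel ? media.parallelStream() : media.stream();
        return stream
                .filter(m -> m.getBuyPrice() != null)
                .mapToDouble(PricedMedium::getBuyPrice)
                .summaryStatistics();
    }

    public static void main(String[] args) {
        // Sample data invented for this sketch.
        List<PricedMedium> media = Arrays.asList(
                new PricedMedium(10.0), new PricedMedium(null),
                new PricedMedium(20.5), new PricedMedium(4.5));
        DoubleSummaryStatistics sequential = stats(media, false);
        DoubleSummaryStatistics parallel = stats(media, true);
        System.out.println("Sequential: " + sequential);
        System.out.println("Parallel:   " + parallel);
    }
}
```

Both calls produce the same count, sum and average; only the execution strategy differs. Whether the parallel variant pays off depends on the size of the dataset and the cost per element, so it is worth measuring with your own data.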