Towards More Functional Java - Digging into Nested Data Structures

Posted on November 14, 2016 by Scott Leberknight

In the last post we saw an example that used a generator combined with a filter to find the first available port in a specific range. It returned an Optional to model the case when no open ports are found, as opposed to returning null. In this example, we'll look at how to use Java 8 streams to dig into a nested data structure and find objects of a specific type. We'll use map and filter operations on the stream, and also introduce a new concept, the flat-map.

In the original, pre-Java 8 code that I was working on in a project, the data structure was a three-level nested structure that was marshaled into Java objects from an XML file based on a schema from an external web service. The method needed to find objects of a specific type at the bottom level. For this article, to keep things simple we will work with a simple class structure in which class A contains a collection of class B, and B contains a collection of class C. The C class is a base class, and there are several subclasses C1, C2, and C3. In pseudo-code the classes look like:

class A {
  List<B> bs = []
}

class B {
  List<C> cs = []
}

class C {}
class C1 extends C {}
class C2 extends C {}
class C3 extends C {}

The goal here is to find the first C2 instance, given an instance of A. The pre-Java 8 code looks like the following:

public C2 findFirstC2(A a) {
    for (B b : a.getBs()) {
        for (C c : b.getCs()) {
            if (c instanceof C2) {
                return (C2) c;
            }
        }
    }
    return null;
}

In this code, I made the assumption that the collections are always non-null. The original code I was working on did not make that assumption, and was more complicated as a result. We will revisit the more complicated case later. This code is pretty straightforward: two loops and a conditional, plus an early exit if we find an instance of C2, and return null if we exit the loops without having found anything.

Refactoring to streams using Java 8's stream API is not too bad, though we need to introduce the flat-map concept. Martin Fowler's simple explanation is better than any I would come up with so I will repeat it here: "Map a function over a collection and flatten the result by one-level". In our example, each B has a collection of C. The flat-map operation over a collection of B will basically return a stream of all C for all B. For example, if there are two B instances in the collection, the first having 3 C and the second having 5 C, then the flat-map operation returns a stream of 8 C instances (effectively combining the 3 from the first C and 5 from the second C, and flattening by one level up). With the new flat-map tool in our tool belts, here is the Java 8 code using the stream API:

public Optional<C2> findFirstC2(A a) {
    return a.getBs().stream()
            .flatMap(b -> b.getCs().stream())
            .filter(C2.class::isInstance)
            .map(C2.class::cast)
            .findFirst();
}

In the above code, we first stream over the collection of B. Next is where we apply the flatMap method to get a stream of all C. The one somewhat tricky thing about the Java 8 flatMap method is that the mapper function must return a stream. In our example, we use b.getCs().stream() as the mapper function, thus returning a stream of C. From then on we can apply the filter and map operations, and close out with the findFirst short-circuiting (because it stops at the first C2 it finds) terminal operation which returns an Optional that either contains a value, or is empty.

If you have read any of my previous posts, you won't be surprised that I prefer the functional-style of the Java 8 stream API, for the same reasons I've listed previously (e.g. declarative code, no explicit loops or conditionals, etc.). And as we've seen before in previous posts, we can make the above example generic very easily:

public <T extends C> Optional<T> findFirst(A a, Class<T> clazz) {
    return a.getBs().stream()
            .flatMap(b -> b.getCs().stream())
            .filter(clazz::isInstance)
            .map(clazz::cast)
            .findFirst();
}

Of course, it is also not difficult to make the imperative version with loops generic, using the isAssignableFrom and cast methods in the Class class. And you can even make it just as short by removing the braces, as in the following:

public <T> T findFirstC2(A a, Class<T> clazz) {
    for (B b : a.getBs())
        for (C c : b.getCs())
            if (clazz.isAssignableFrom(c.getClass()))
                return clazz.cast(c);
    return null;
}

I never omit the braces even for one liners, because I believe it is a great way to introduce bugs (remember goto fail a few years ago?). Braces or no braces, why prefer the more functional style to the imperative style? Some is obviously personal preference, and what you are used to. Clearly if you are used to and comfortable with reading imperative code, it won't be an issue to read the above code. But the same goes for functional style, i.e. once you learn the basic concepts (map, filter, reduce, flat-map, etc.) it becomes very easy to quickly see what code is doing (and what is intended).

One other thing is that instead of using stream(), you can easily switch to parallelStream() which then automatically parallelizes the code. But simply using parallelStream() will not always (counter-intuitively) make code faster, e.g. for small collections it will probably make it slower due to context switching. But if things like transformation or filtering take a significant amount of time, then parallelizing the code can produce significant performance improvement. Unfortunately there are no hard rules though, and whether parallelizing speeds the code up depends on various and sundry factors.

The examples above were very simple. The original code was more complex because it did not make any assumptions about nullability of the original argument or the nested collections. Here is the code:

public C2 findFirstC2(A a) {
    if (a == null || a.getBs() == null) {
        return null;
    }

    for (B b : a.getBs()) {
        List<C> cs = b.getCs();
        if (cs == null) {
            continue;
        }

        for (C c : cs) {
            if (c instanceof C2) {
                return (C2) c;
            }
        }
    }
    return null;
}

This code is more difficult to read than the original code due to the additional null-checking conditionals. There are two loops, three conditionals, a loop continuation, and a short-circuit return form within a nested loop. So what does this look like using the Java 8 stream API? Here is one solution:

public Optional<C2> findFirstC2(A a) {
    return Optional.ofNullable(a)
            .map(A::getBs)
            .orElseGet(Lists::newArrayList)
            .stream()
            .flatMap(this::toStreamOfC)
            .filter(C2.class::isInstance)
            .map(C2.class::cast)
            .findFirst();
}

private Stream<? extends C> toStreamOfC(B b) {
    return Optional.ofNullable(b.getCs())
            .orElseGet(Lists::newArrayList)
            .stream();
}

That looks like a lot, so let's see what is going on. The main difference is that we need to account for possible null values. For that the code uses the Optional#ofNullable method which unsurprisingly returns an Optional. We are also using map operations on the Optional objects, which returns an empty Optional if it was originally empty, otherwise it applies the operation. We are also using the Optional#orElseGet method to ensure we are always operating on non-null collections, for example if a.getBs() returns null then the first orElseGet provides a new ArrayList. In this manner, the code always works the same way whether the intermediate collections are null or not. Instead of embedding a somewhat complicated map operation in the flatMap I extracted the toStreamOfC method, and then used a method reference. When writing code in functional style, often it helps to extract helper methods, even if that ends up creating more code because, in the end, the code is more easily understood.

The code in this more complex example illustrates the declarative nature of the functional style. Once you are familiar with the functional primitives (like map, flat-map, filter, and so on) reading this code is quite easy and fast, because it reads like a specification of the problem. Like reading code, writing code in the functional style takes some practice and getting used to, but once you get the hang of it, I think you will find you can often write the code faster. The main difference when writing code in functional style is that I do more thinking about what exactly I am trying to do before just slinging code. Until next time, auf Wiedersehen.

This blog was originally published on the Fortitude Technologies blog here.



Post a Comment:
Comments are closed for this entry.