Towards More Functional Java - Digging into Nested Data Structures

Posted on November 14, 2016 by Scott Leberknight

In the last post we saw an example that used a generator combined with a filter to find the first available port in a specific range. It returned an Optional to model the case when no open ports are found, as opposed to returning null. In this example, we'll look at how to use Java 8 streams to dig into a nested data structure and find objects of a specific type. We'll use map and filter operations on the stream, and also introduce a new concept, the flat-map.

In the original, pre-Java 8 code that I was working on in a project, the data structure was a three-level nested structure that was marshaled into Java objects from an XML file based on a schema from an external web service. The method needed to find objects of a specific type at the bottom level. For this article, to keep things simple we will work with a simple class structure in which class A contains a collection of class B, and B contains a collection of class C. The C class is a base class, and there are several subclasses C1, C2, and C3. In pseudo-code the classes look like:

class A {
  List<B> bs = []
}

class B {
  List<C> cs = []
}

class C {}
class C1 extends C {}
class C2 extends C {}
class C3 extends C {}

The goal here is to find the first C2 instance, given an instance of A. The pre-Java 8 code looks like the following:

public C2 findFirstC2(A a) {
    for (B b : a.getBs()) {
        for (C c : b.getCs()) {
            if (c instanceof C2) {
                return (C2) c;
            }
        }
    }
    return null;
}

In this code, I made the assumption that the collections are always non-null. The original code I was working on did not make that assumption, and was more complicated as a result. We will revisit the more complicated case later. This code is pretty straightforward: two loops and a conditional, plus an early exit if we find an instance of C2, and return null if we exit the loops without having found anything.

Refactoring to streams using Java 8's stream API is not too bad, though we need to introduce the flat-map concept. Martin Fowler's simple explanation is better than any I would come up with so I will repeat it here: "Map a function over a collection and flatten the result by one-level". In our example, each B has a collection of C. The flat-map operation over a collection of B will basically return a stream of all C for all B. For example, if there are two B instances in the collection, the first having 3 C and the second having 5 C, then the flat-map operation returns a stream of 8 C instances (effectively combining the 3 from the first C and 5 from the second C, and flattening by one level up). With the new flat-map tool in our tool belts, here is the Java 8 code using the stream API:

public Optional<C2> findFirstC2(A a) {
    return a.getBs().stream()
            .flatMap(b -> b.getCs().stream())
            .filter(C2.class::isInstance)
            .map(C2.class::cast)
            .findFirst();
}

In the above code, we first stream over the collection of B. Next is where we apply the flatMap method to get a stream of all C. The one somewhat tricky thing about the Java 8 flatMap method is that the mapper function must return a stream. In our example, we use b.getCs().stream() as the mapper function, thus returning a stream of C. From then on we can apply the filter and map operations, and close out with the findFirst short-circuiting (because it stops at the first C2 it finds) terminal operation which returns an Optional that either contains a value, or is empty.

If you have read any of my previous posts, you won't be surprised that I prefer the functional-style of the Java 8 stream API, for the same reasons I've listed previously (e.g. declarative code, no explicit loops or conditionals, etc.). And as we've seen before in previous posts, we can make the above example generic very easily:

public <T extends C> Optional<T> findFirst(A a, Class<T> clazz) {
    return a.getBs().stream()
            .flatMap(b -> b.getCs().stream())
            .filter(clazz::isInstance)
            .map(clazz::cast)
            .findFirst();
}

Of course, it is also not difficult to make the imperative version with loops generic, using the isAssignableFrom and cast methods in the Class class. And you can even make it just as short by removing the braces, as in the following:

public <T> T findFirstC2(A a, Class<T> clazz) {
    for (B b : a.getBs())
        for (C c : b.getCs())
            if (clazz.isAssignableFrom(c.getClass()))
                return clazz.cast(c);
    return null;
}

I never omit the braces even for one liners, because I believe it is a great way to introduce bugs (remember goto fail a few years ago?). Braces or no braces, why prefer the more functional style to the imperative style? Some is obviously personal preference, and what you are used to. Clearly if you are used to and comfortable with reading imperative code, it won't be an issue to read the above code. But the same goes for functional style, i.e. once you learn the basic concepts (map, filter, reduce, flat-map, etc.) it becomes very easy to quickly see what code is doing (and what is intended).

One other thing is that instead of using stream(), you can easily switch to parallelStream() which then automatically parallelizes the code. But simply using parallelStream() will not always (counter-intuitively) make code faster, e.g. for small collections it will probably make it slower due to context switching. But if things like transformation or filtering take a significant amount of time, then parallelizing the code can produce significant performance improvement. Unfortunately there are no hard rules though, and whether parallelizing speeds the code up depends on various and sundry factors.

The examples above were very simple. The original code was more complex because it did not make any assumptions about nullability of the original argument or the nested collections. Here is the code:

public C2 findFirstC2(A a) {
    if (a == null || a.getBs() == null) {
        return null;
    }

    for (B b : a.getBs()) {
        List<C> cs = b.getCs();
        if (cs == null) {
            continue;
        }

        for (C c : cs) {
            if (c instanceof C2) {
                return (C2) c;
            }
        }
    }
    return null;
}

This code is more difficult to read than the original code due to the additional null-checking conditionals. There are two loops, three conditionals, a loop continuation, and a short-circuit return form within a nested loop. So what does this look like using the Java 8 stream API? Here is one solution:

public Optional<C2> findFirstC2(A a) {
    return Optional.ofNullable(a)
            .map(A::getBs)
            .orElseGet(Lists::newArrayList)
            .stream()
            .flatMap(this::toStreamOfC)
            .filter(C2.class::isInstance)
            .map(C2.class::cast)
            .findFirst();
}

private Stream<? extends C> toStreamOfC(B b) {
    return Optional.ofNullable(b.getCs())
            .orElseGet(Lists::newArrayList)
            .stream();
}

That looks like a lot, so let's see what is going on. The main difference is that we need to account for possible null values. For that the code uses the Optional#ofNullable method which unsurprisingly returns an Optional. We are also using map operations on the Optional objects, which returns an empty Optional if it was originally empty, otherwise it applies the operation. We are also using the Optional#orElseGet method to ensure we are always operating on non-null collections, for example if a.getBs() returns null then the first orElseGet provides a new ArrayList. In this manner, the code always works the same way whether the intermediate collections are null or not. Instead of embedding a somewhat complicated map operation in the flatMap I extracted the toStreamOfC method, and then used a method reference. When writing code in functional style, often it helps to extract helper methods, even if that ends up creating more code because, in the end, the code is more easily understood.

The code in this more complex example illustrates the declarative nature of the functional style. Once you are familiar with the functional primitives (like map, flat-map, filter, and so on) reading this code is quite easy and fast, because it reads like a specification of the problem. Like reading code, writing code in the functional style takes some practice and getting used to, but once you get the hang of it, I think you will find you can often write the code faster. The main difference when writing code in functional style is that I do more thinking about what exactly I am trying to do before just slinging code. Until next time, auf Wiedersehen.

This blog was originally published on the Fortitude Technologies blog here.

Towards More Functional Java using Generators and Filters

Posted on October 12, 2016 by Scott Leberknight

Last time we saw how to use lambdas as predicates, and specifically how to use them with the Java 8 Collection#removeIf method in order to remove elements from a map based on the predicate. In this article we will use a predicate to filter elements from a stream, and combine it with a generator to find the first open port in a specific range. The use case is a (micro)service-based environment where each new service binds to the first open port it finds in a specific port range. For example, suppose we need to limit the port range of each service to the dynamic and/or private ports (49152 to 65535, as defined by IANA). Basically we want to choose a port at random in the dynamic port range and bind to that port if it is open, otherwise repeat the process until we find an open port or we have tried more than a pre-defined number of times.

The original pre-Java 8 code to accomplish this looked like the following:

public Integer findFreePort() {
    int assignedPort = -1;
    int count = 1;
    while (count <= MAX_PORT_CHECK_ATTEMPTS) {
        int checkPort = MIN_PORT + random.nextInt(PORTS_IN_RANGE);
        if (portChecker.isAvailable(checkPort)) {
            assignedPort = checkPort;
            break;
        }
        count++;
    }
    return assignedPort == -1 ? null : assignedPort;
}

There are a few things to note here. First, the method returns an Integer to indicate that it could not find an open port (as opposed to throwing an exception, which might or might not be better). Second, there are two mutable variables assignedPort and count, which are used to store the open port (if found) and to monitor the number of attempts made, respectively. Third, the while loop executes so long as as the maximum number of attempts has not been exceeded. Fourth, a conditional inside the loop uses a port checker object to determine port availability, breaking out of the loop if an open port is found. Finally, a ternary expression is used to check the assignedPort variable and return either null or the open port.

Taking a step back, all this code really does is loop until an open port is found, or until the maximum attempts has been exceeded. It then returns null (if no open port was found) or the open port as an Integer. There are two mutable variables, a loop, a conditional inside the loop with an early break, and another conditional (via the ternary) to determine the return value. I'm sure there are a few ways this code could be improved without using Java 8 streams. For example, we could simply return the open port from the conditional inside the loop and return null if we exit the loop without finding an open port, thereby eliminating the assignedPort variable. Even so it still contains a loop with a conditional and an early exit condition. And some people really dislike early returns and only want to see one return statement at the end of a method (I don't generally have a problem with early exits from methods, so long as the method is relatively short). Not to mention returning null when a port is not found forces a null check on callers; if a developer isn't paying attention or this isn't documented, perhaps they omit the null check causing a NullPointerException somewhere downstream.

Refactoring this to use the Java 8 stream API can be done relatively simply. In order to accomplish this we want to do the following, starting with generating a sequence of random ports. For each randomly generated port, filter on open ports and return the first open port we find. If no open ports are found after limiting our attempts to a pre-determined maximum, we want to return something that clearly indicates no open port was found, i.e. that the open port is empty. I chose the terminology here very specifically, to correspond to both general functional programming concepts as well as to the Java 8 API methods we can use.

Here is the code using the Java 8 APIs:

public OptionalInt findFreePort() {
    IntSupplier randomPorts = () -> MIN_PORT + random.nextInt(PORTS_IN_RANGE);
    return IntStream.generate(randomPorts)
            .limit(MAX_PORT_CHECK_ATTEMPTS)
            .filter(portChecker::isAvailable)
            .findFirst();
}

Without any explanation you can probably read the above code and tell generally what it does, because we are declaring what should happen, as opposed to listing the explicit instructions for how to do it. But let's dive in and look at the specifics anyway. The refactored method returns an OptionalInt to indicate the presence or absence of a value; OptionalInt is just the version of the Optional class specialized for primitive integers. This better matches the semantics we'd like, which is to clearly indicate to a caller that there may or may not be a value present. Next, we are using the generate method to create an infinite sequence of random values, using the specified IntSupplier (which is a specialization of Supplier for primitive integers). Suppliers do exactly what they say they do - supply a value, and in this case a random integer in a specific range. Note the supplier is defined using a lambda expression.

The infinite sequence is truncated (limited) using the limit method, which turns it into a finite sequence. The final two pieces are the filter and findFirst methods. The filter method uses a method reference to the isAvailable method on the portChecker object, which is just a shortcut for a lambda expression when the method accepts a single value that is the lambda argument. Finally, we use findFirst which is described by the Javadocs as a "short-circuiting terminal operation" which simply means it terminates a stream, and that as soon as its condition is met, it "short circuits" and terminates. The short-circuiting behavior is basically the same as the break statement in the original pre-Java 8 code.

So now we have a more functional version that finds free ports with no mutable variables and a more semantically correct return type. As we've seen in several of the previous articles in this ad-hoc series, we are seeing common patterns (i.e. map, filter, collect/reduce) recurring in a slightly different form. Instead of a map operation to transform an existing stream, we are generating a stream from scratch, limiting to a finite number of attempts, filtering the items we want to accept, and then using a short-circuiting terminal operation to return the value found, or an empty value as an OptionalInt.

As you can probably tell, I am biased toward the functional version for various reasons such as the declarative nature of the code, no explicit looping or variable mutation, and so on. In this case I think the more functional version is much more readable (though I am 100% sure there will be people who vehemently disagree, and that's OK). In addition, because we are using what are effectively building blocks (generators, map, filter, reduce/collect, etc.) we can much more easily make something generic to find the first thing that satisifies a filtering condition given a supplier and limit. For example:

public <T> Optional<T> findFirst(long maxAttempts,
                                 Supplier<T> generator,
                                 Predicate<T> condition) {
    return Stream.generate(generator)
            .limit(maxAttempts)
            .filter(condition)
            .findFirst();
}

Now we have a re-usable method that can accept any generator and any predicate. For example, suppose you want to find the first random number over two billion if it occurs within 10 attempts, or else default to 42 (naturally). Assuming you have a random number generator object rand, then you could call the findFirst method like this, making use of the orElse method on Optional to provide a default value:

Integer value = findFirst(10, rand::nextInt, value -> value > 2_000_000_000).orElse(42);

So as I mentioned in the last article on predicates, there is a separation of concerns achieved by using the functional style that simply is not possible using traditional control structures such as the while loop and explicit if conditional as in the first example of this article. (*) Essentially, the functional style is composable using basic building blocks, which is another huge win. Because of this composability, in general you tend to write less code, and the code that you do write tends to be more focused on the business logic you are actually trying to perform. And when you do see the same pattern repeated several times, it is much easier to extract the commonality using the functional style building blocks as we did to create the generic findFirst method in the last example. To paraphrase Yoda, once you start down the path to the functional side, forever will it dominate your destiny. Unlike the dark side of the Force, however, the functional side is much better and nicer. Until next time, arrivederci.

You can find all the sample code used in this blog and the others in this series on my GitHub in the java8-blog-code repository.

(*) Yes, you can simulate functional programming using anonymous inner classes prior to Java 8, or you can use a library like Guava and use its functional programming idioms. In general this tends to be verbose and you end up with more complicated and awkward-looking code. As the Guava team explains:

Excessive use of Guava's functional programming idioms can lead to verbose, confusing, unreadable, and inefficient code. These are by far the most easily (and most commonly) abused parts of Guava, and when you go to preposterous lengths to make your code "a one-liner," the Guava team weeps.

This blog was originally published on the Fortitude Technologies blog here.

Towards More Functional Java using Lambdas as Predicates

Posted on September 13, 2016 by Scott Leberknight

Previously I showed an example that transformed a map of query parameters into a SOLR search string. The pre-Java 8 code used a traditional for loop with a conditional and used a StringBuilder to incrementally build a string. The Java 8 code streamed over the map entries, mapping (transforming) each entry to a string of the form "key:value" and finally used a Collector to join those query fragments together. This is a common pattern in functional-style code, in which a for loop transforms one collection of objects into a collection of different objects, optionally filters some of them out, and optionally reduce the collection to a single element. These are common patterns in the functional style - map, filter, reduce, etc. You can almost always replace a for loop with conditional filtering and reduction into a Java 8 stream with map, filter, and reduce (collect) operations.

But in addition to the stream API, Java 8 also introduced some nice new API methods that make certain things much simpler. For example, suppose we have the following method to remove all map entries for a given set of keys. In the example code, dataCache is a ConcurrentMap and deleteKeys is the set of keys we want to remove from that cache. Here is the original code I came across:

public void deleteFromCache(Set<String> deleteKeys) {
    Iterator<Map.Entry<String, Object>> iterator = dataCache.entrySet().iterator();
    while (iterator.hasNext()) {
        Map.Entry<String, Object> entry = iterator.next();
        if (deleteKeys.contains(entry.getKey())) {
            iterator.remove();
        }
    }
}

Now, you could argue there are better ways to do this, e.g. iterate the delete keys and remove each mapping using the Map#remove(Object key) method. For example:

public void deleteFromCache(Set<String> deleteKeys) {
    for (String deleteKey : deleteKeys) {
        dataCache.remove(deleteKey);
    }
}

The code using the for loop certainly seems cleaner than using the Iterator in this case, though both are functionally equivalent. Can we do better? Java 8 introduced the removeIf method as a default method, not in Map but instead in the Collection interface. This new method "removes all of the elements of this collection that satisfy the given predicate", to quote from the Javadocs. This method accepts one argument, a Predicate, which is a functional interface introduced in Java 8, and which can therefore be used in lambda expressions. Let's first implement this a regular old anonymous inner class, which you can always do even in Java 8. It looks like:

public void deleteFromCache(Set<String> deleteKeys) {
    dataCache.entrySet().removeIf(new Predicate<Map.Entry<String, Object>>() {
        @Override
        public boolean test(Map.Entry<String, Object> entry) {
            return deleteKeys.contains(entry.getKey());
        }
    });
}

As you can see, we first get the map's entry set via the entrySet method and call removeIf on it, supplying a Predicate that tests whether the set of deleteKeys contains the entry key. If this test returns true, the entry is removed. Since Predicate is annotated with @FunctionalInterface it can act as a lambda expression, a method reference, or a constructor reference according to the Javadoc. So let's take the first step and convert the anonymous inner class into a lambda expression:

public void deleteFromCache(Set<String> deleteKeys) {
    dataCache.entrySet().removeIf((Map.Entry<String, Object> entry) ->
        deleteKeys.contains(entry.getKey()));
}

In the above, we've replaced the anonymous class with a lambda expression that takes a single Map.Entry argument. But, Java 8 can infer the argument types of lambda expressions, so we can remove the explicit (and a bit noisy) type declarations, leaving us with the following cleaner code:

public void deleteFromCache(Set<String> deleteKeys) {
    dataCache.entrySet().removeIf(entry -> deleteKeys.contains(entry.getKey()));
}

This code is quite a bit nicer than the original code using an explicit Iterator. But what about compared to the second code example that looped through the keys using a simple for loop, and calling remove to remove each element? The lines of code really aren't that different, so assuming they are functionally equivalent then perhaps it is just a style preference. The explicit for loop is a traditional imperative style, whereas the removeIf has a more functional flavor to it. If you look at the actual implementation of removeIf in the Collection interface, it actually uses an Iterator under the covers, just as with the first example in this post.

So practically there is no difference in functionality. But, removeIf could theoretically be implemented for certain types of collections to perform the operation in parallel, and perhaps only for collections over a certain size where it can be shown that parallelizing the operation has benefits. But this simple example is really more about separation of concerns, i.e. separating the logic of traversing the collection from the logic that determines whether or not an element is removed.

For example, if a code base needs to remove elements from collections in many difference places, chances are good that it will end up having similar loop traversal logic intertwined with remove logic in many different places. In contrast, using the removeIf function leads to only having the remove logic in the different locations - and the removal logic is really your business logic. And, if at some later point in time the traversal logic in the Java collections framework were to be improved somehow, e.g. parallelized for large collections, then all the locations using that function automatically receive the same benefit, whereas code that combines the traversal and remove logic using explicit Iterator or loops would not.

In this case, and many others, I'd argue the separation of concerns is a much better reason to prefer functional style to imperative style. Separation of concerns leads to better, cleaner code and easier code re-use precisely since those concerns can be implemented separately, and also tested separately, which results in not only cleaner production code but also cleaner test code. All of which leads to more maintainable code, which means new features and enhancements to existing code can be accomplished faster and with less chance of breaking existing code. Until the next post in this ad-hoc series on Java 8 features and a functional style, happy coding!

This blog was originally published on the Fortitude Technologies blog here.

Towards more functional Java using Streams and Lambdas

Posted on August 23, 2016 by Scott Leberknight

In the last post I showed how the Java 7 try-with-resources feature reduces boilerplate code, but probably more importantly how it removes errors related to unclosed resources, thereby eliminating an entire class of errors. In this post, the first in an ad-hoc series on Java 8 features, I'll show how the stream API can reduce the lines of code, but also how it can make the code more readable, maintainable, and less error-prone.

The following code is from a simple back-end service that lets us query metadata about messages flowing through various systems. It takes a map of key-value pairs and creates a Lucene query that can be submitted to SOLR to obtain results. It is primarily used by developers to verify behavior in a distributed system, and it does not support very sophisticated queries, since it only ANDs the key-value pairs together to form the query. For example, given a parameter map containing the (key, value) pairs (lastName, Smith) and (firstName, Bob), the method would generate the query "lastName:Smith AND firstName:Bob". As I said, not very sophisticated.

Here is the original code (where AND, COLON, and DEFAULT_QUERY are constants):

public String buildQueryString(Map<String, String> parameters) {
    int count = 0;
    StringBuilder query = new StringBuilder();

    for (Map.Entry<String, String> entry : parameters.entrySet()) {
        if (count > 0) {
            query.append(AND);
        }
        query.append(entry.getKey());
        query.append(COLON);
        query.append(entry.getValue());
        count++;
    }

    if (parameters.size() == 0) {
        query.append(DEFAULT_QUERY);
    }

    return query.toString();
}

The core business logic should be very simple, since we only need to iterate the parameter map, join the keys and values with a colon, and finally join them together. But the code above, while not terribly hard to understand, has a lot of noise. First off, it uses two mutable variables (count and query) that are modified within the for loop. The first thing in the loop is a conditional that is needed to determine whether we need to append the AND constant, as we only want to do that after the first key-value pair is added to the query. Next, joining the keys and values is done by concatenating them, one by one, to the StringBuilder holding the query. Finally the count must be incremented so that in subsequent loop iterations, we properly include the AND delimiter. After the loop there is another conditional which appends DEFAULT_QUERY if there are no parameters, and then we finally convert the StringBuilder to a String and return it.

Here is the buildQueryString method after refactoring it to use the Java 8 stream API:

public String buildQueryString(Map<String, String> parameters) {
    if (parameters.isEmpty()) {
        return DEFAULT_QUERY;
    }

    return parameters.entrySet().stream()
            .map(entry -> String.join(COLON, entry.getKey(), entry.getValue()))
            .collect(Collectors.joining(AND));
}

This code does the exact same thing, but in only 6 lines of code (counting the map and collect lines as separate even though technically they are part of the stream call chain) instead of 15. But just measuring lines of code isn't everything. The main difference here is the lack of mutable variables, no external iteration via explicit looping constructs, and no conditional statements other than the empty check which short circuits and returns DEFAULT_QUERY when there are no parameters. The code reads like a functional declaration of what we want to accomplish: stream over the parameters, convert each (key, value) to "key:value" and join them all together using the delimiter AND.

The specific Java 8 features we've used here start with the stream() method to convert the map entry set to a Java 8 java.util.stream.Stream. We then use the map operation on the stream, which applies a function (String.join) to each element (Map.Entry) in the stream. Finally, we use the collect method to reduce the elements using the joining collector into the resulting string that is the actual query we wanted to build. In the map method we've also made use of a lambda expression to specify exactly what transformation to perform on each map entry.

By removing explicit iteration and mutable variables, the code is more readable, in that a developer seeing this code for the first time will have an easier and quicker time understanding what it does. Note that much of the how it does things has been removed, for example the iteration is now implicit via the Stream, and the joining collector now does the work of inserting a delimiter between the elements. You're now declaring what you want to happen, instead of having to explicitly perform all the tedium yourself. This is more of a functional style than most Java developers are used to, and at first it can be a bit jarring, but as you practice and get used to it, the more you'll probably like it and you'll find youself able to read and write this style of code much more quickly than traditional code with lots of loops and conditionals. Generally there is also less code than when using traditional looping and control structures, which is another benefit for maintenance. I won't go so far as to say Java 8 is a functional language like Clojure or Haskell - since it isn't - but code like this has a more functional flavor to it.

There is now a metric ton of content on the internet related to Java 8 streams, but in case this is all new to you, or you're just looking for a decent place to begin learning more in-depth, the API documentation for the java.util.stream package is a good place to start. Venkat Subramaniam's Functional Programming in Java is another good resource, and at less than 200 pages can be digested pretty quickly. And for more on lambda expressions, the Lambda Expressions tutorial in the official Java Tutorials is a decent place to begin. In the next post, we'll see another example where a simple Java 8 API addition combined with a lambda expression simplifies code, making it more readable and maintainable.

This blog was originally published on the Fortitude Technologies blog here.