<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet type="text/xsl" href="http://www.sleberknight.com/blog/roller-ui/styles/rss.xsl" media="screen"?><rss version="2.0" 
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:atom="http://www.w3.org/2005/Atom" >
<channel>
  <title>Scott Leberknight&apos;s Weblog</title>
  <link>http://www.sleberknight.com/blog/sleberkn/</link>
      <atom:link rel="self" type="application/rss+xml" href="http://www.sleberknight.com/blog/sleberkn/feed/entries/rss?cat=%2FDevelopment" />
    <description>Some Day I&apos;ll Have More Time...</description>
  <language>en-us</language>
  <copyright>Copyright 2025</copyright>
  <lastBuildDate>Sun, 6 Jul 2025 19:35:31 +0000</lastBuildDate>
  <generator>Apache Roller (incubating) 4.0 (20071120033321:dave)</generator>
        <item>
    <guid isPermaLink="true">http://www.sleberknight.com/blog/sleberkn/entry/adding_a_mockwebserver_junit_jupiter</guid>
    <title>Adding a MockWebServer JUnit Jupiter Extension</title>
    <dc:creator>sleberkn</dc:creator>
    <link>http://www.sleberknight.com/blog/sleberkn/entry/adding_a_mockwebserver_junit_jupiter</link>
        <pubDate>Sat, 5 Jul 2025 19:19:05 +0000</pubDate>
    <category>Development</category>
    <category>mock</category>
    <category>jupiter</category>
    <category>junit</category>
    <category>web</category>
    <category>software</category>
    <category>testing</category>
    <category>java</category>
            <description>&lt;style type=&quot;text/css&quot;&gt;
.prettyprint {
background-color:#EFEFEF;
border:1px solid #CCCCCC;
font-size:small;
overflow:auto;
padding:5px;
}
&lt;/style&gt;

&lt;p&gt;In the &lt;a href=&quot;http://www.sleberknight.com/blog/sleberkn/entry/making_http_client_tests_cleaner&quot;&gt;last post&lt;/a&gt;, I used several utilities in the &lt;a href=&quot;https://github.com/kiwiproject/kiwi-test&quot;&gt;kiwi-test&lt;/a&gt; library to clean up and remove boilerplate from tests using OkHttp&apos;s &lt;code&gt;MockWebServer&lt;/code&gt;. But there&apos;s something else we can do to remove even more boilerplate from tests. The tests in the previous two blogs have the same code in the &lt;code&gt;@BeforeEach&lt;/code&gt; and &lt;code&gt;@AfterEach&lt;/code&gt; methods to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Create a new &lt;code&gt;MockWebServer&lt;/code&gt; instance and set an instance field&lt;/li&gt;
&lt;li&gt;Get the base &lt;code&gt;URI&lt;/code&gt; for the server where tests can send requests&lt;/li&gt;
&lt;li&gt;Close the server after each test completes&lt;/li&gt;
&lt;/ul&gt;
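
&lt;p&gt;As a rough sketch, that boilerplate looks like this in every test class (assuming OkHttp 4.x, where &lt;code&gt;MockWebServer&lt;/code&gt; implements &lt;code&gt;Closeable&lt;/code&gt;):&lt;/p&gt;

&lt;pre class=&quot;prettyprint&quot;&gt;
private MockWebServer server;
private URI baseUri;

@BeforeEach
void setUp() throws IOException {
    // create and start a fresh server, and capture its base URI
    server = new MockWebServer();
    server.start();
    baseUri = server.url(&quot;/&quot;).uri();
}

@AfterEach
void tearDown() throws IOException {
    // shut the server down so each test starts clean
    server.close();
}
&lt;/pre&gt;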

&lt;p&gt;This setup and teardown logic can be extracted into a &lt;a href=&quot;https://junit.org/junit5/docs/current/user-guide/#extensions&quot;&gt;JUnit Jupiter extension&lt;/a&gt; that will:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Before each test, create a new &lt;code&gt;MockWebServer&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Provide methods to get the server instance and the base &lt;code&gt;URI&lt;/code&gt; of the server&lt;/li&gt;
&lt;li&gt;After each test, close the server&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here is one implementation:&lt;/p&gt;

&lt;pre class=&quot;prettyprint&quot;&gt;
package org.kiwiproject.test.okhttp3.mockwebserver;

// imports...

public class MockWebServerExtension implements BeforeEachCallback, AfterEachCallback {

    @Getter
    @Accessors(fluent = true)
    private MockWebServer server;

    @Getter
    @Accessors(fluent = true)
    private URI uri;

    public MockWebServerExtension() {
        this(new MockWebServer());
    }

    public MockWebServerExtension(MockWebServer server) {
        this.server = KiwiPreconditions.requireNotNull(server, &quot;server must not be null&quot;);
    }

    @Override
    public void beforeEach(ExtensionContext context) throws IOException {
        // start the server provided via the constructor, so any customization is preserved
        server.start();
        uri = server.url(&quot;/&quot;).uri();
    }

    @Override
    public void afterEach(ExtensionContext context) {
        KiwiIO.closeQuietly(server);
    }
}
&lt;/pre&gt;

&lt;p&gt;This implementation provides two constructors. The no-arg constructor creates a &lt;code&gt;MockWebServer&lt;/code&gt; instance for you, while the one-arg constructor lets you supply your own instance with any customization your tests need, for example to support TLS.&lt;/p&gt;
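
&lt;p&gt;For instance, a test could pass in a server configured for HTTPS. Here is a sketch; the &lt;code&gt;sslSocketFactory()&lt;/code&gt; helper is assumed to build an &lt;code&gt;SSLSocketFactory&lt;/code&gt; from test certificates (e.g. using okhttp-tls):&lt;/p&gt;

&lt;pre class=&quot;prettyprint&quot;&gt;
@RegisterExtension
final MockWebServerExtension serverExtension = new MockWebServerExtension(newHttpsServer());

// sketch: sslSocketFactory() is assumed to create an SSLSocketFactory from test certificates
private static MockWebServer newHttpsServer() {
    var server = new MockWebServer();
    server.useHttps(sslSocketFactory(), false);
    return server;
}
&lt;/pre&gt;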

&lt;p&gt;It also provides the &lt;code&gt;server()&lt;/code&gt; and &lt;code&gt;uri()&lt;/code&gt; methods to easily get the &lt;code&gt;MockWebServer&lt;/code&gt; instance and the base &lt;code&gt;URI&lt;/code&gt; for use in your tests. Note these methods are generated using Lombok, though they would be easy enough to write manually.&lt;/p&gt;

&lt;p&gt;Using the extension in tests is straightforward. You add a &lt;code&gt;MockWebServerExtension&lt;/code&gt; instance field and annotate it with &lt;code&gt;@RegisterExtension&lt;/code&gt; (note that JUnit requires the field to be non-private):&lt;/p&gt;

&lt;pre class=&quot;prettyprint&quot;&gt;
@RegisterExtension
final MockWebServerExtension serverExtension = new MockWebServerExtension();
&lt;/pre&gt;

&lt;p&gt;For convenience, you can also declare a &lt;code&gt;MockWebServer&lt;/code&gt; field:&lt;/p&gt;

&lt;pre class=&quot;prettyprint&quot;&gt;
private MockWebServer server;
&lt;/pre&gt;

&lt;p&gt;Then in your test&apos;s &lt;code&gt;@BeforeEach&lt;/code&gt; method, you initialize the &lt;code&gt;server&lt;/code&gt; field, which can then be referenced in tests.&lt;/p&gt;

&lt;pre class=&quot;prettyprint&quot;&gt;
@BeforeEach
void setUp() {
    server = serverExtension.server();
    
    // additional initialization code...
}
&lt;/pre&gt;

&lt;p&gt;Alternatively, you can get the server in each test using the extension&apos;s &lt;code&gt;server()&lt;/code&gt; method:&lt;/p&gt;

&lt;pre class=&quot;prettyprint&quot;&gt;
@Test
void someTest() {
    var server = serverExtension.server();
    
    // test code...
}
&lt;/pre&gt;

&lt;p&gt;Since the extension takes care of closing the server, you don&apos;t need to have a custom &lt;code&gt;@AfterEach&lt;/code&gt; method to do that.&lt;/p&gt;

&lt;p&gt;Now, you can write a complete test that uses the extension like the following:&lt;/p&gt;

&lt;pre class=&quot;prettyprint&quot;&gt;
class MathApiTest {

    @RegisterExtension
    final MockWebServerExtension serverExtension = new MockWebServerExtension();
    
    private MathApiClient mathClient;
    private Client client;
    private MockWebServer server;
    
    @BeforeEach
    void setUp() {
        // Create the Jersey client
        client = ClientBuilder.newBuilder()
                .connectTimeout(500, TimeUnit.MILLISECONDS)
                .readTimeout(500, TimeUnit.MILLISECONDS)
                .build();
        
        server = serverExtension.server();
        var baseUri = serverExtension.uri();
        mathClient = new MathApiClient(client, baseUri);
    }

    @AfterEach
    void tearDown() {
        // Close the Jersey client
        client.close();
    }

    @Test
    void shouldAdd() {
        server.enqueue(new MockResponse()
                .setResponseCode(200)
                .setHeader(HttpHeaders.CONTENT_TYPE, &quot;text/plain&quot;)
                .setBody(&quot;42&quot;));

        assertThat(mathClient.add(40, 2)).isEqualTo(42);

        var recordedRequest = takeRequiredRequest(server);

        assertThatRecordedRequest(recordedRequest)
                .isGET()
                .hasPath(&quot;/math/add/40/2&quot;)
                .hasNoBody();
    }

    // ...more tests...
}
&lt;/pre&gt;

&lt;p&gt;This test&apos;s &lt;code&gt;@BeforeEach&lt;/code&gt; method gets the &lt;code&gt;MockWebServer&lt;/code&gt; and the base &lt;code&gt;URI&lt;/code&gt; directly from the &lt;code&gt;MockWebServerExtension&lt;/code&gt;. So the only initialization logic it needs to do is to create a Jersey client and an instance of the class being tested, &lt;code&gt;MathApiClient&lt;/code&gt;. As mentioned earlier, the test doesn&apos;t need to close the server in the &lt;code&gt;@AfterEach&lt;/code&gt; method, so all it needs to do is close the Jersey client.&lt;/p&gt;

&lt;p&gt;Each test is then the same as in the previous post, where we used &lt;code&gt;RecordedRequests&lt;/code&gt; and &lt;code&gt;RecordedRequestAssertions&lt;/code&gt; from &lt;code&gt;kiwi-test&lt;/code&gt; to keep the test code clean.&lt;/p&gt;

&lt;p&gt;And that&apos;s all there is to it! The extension code shown above provides what you need in the majority of testing situations. But you don&apos;t need to create your own or copy this code if you don&apos;t want to. &lt;a href=&quot;https://github.com/kiwiproject/kiwi-test&quot;&gt;kiwi-test&lt;/a&gt; version &lt;a href=&quot;https://github.com/kiwiproject/kiwi-test/milestone/53?closed=1&quot;&gt;3.9.0&lt;/a&gt; adds its own &lt;code&gt;MockWebServerExtension&lt;/code&gt;. It is very similar to the extension shown here, but adds a few additional features, such as the ability to specify a &quot;server customizer&quot;: a &lt;code&gt;Consumer&amp;lt;MockWebServer&amp;gt;&lt;/code&gt; that lets you customize the server, for example to add TLS support and allow only HTTP/1.1 and HTTP/2:&lt;/p&gt;

&lt;pre class=&quot;prettyprint&quot;&gt;
@RegisterExtension
final MockWebServerExtension serverExtension = new MockWebServerExtension(svr -&gt; {
    svr.setProtocols(List.of(Protocol.HTTP_2, Protocol.HTTP_1_1));
    svr.useHttps(getSocketFactory(), false);
});
&lt;/pre&gt;

&lt;p&gt;It also provides a &lt;code&gt;uri(path)&lt;/code&gt; method that lets you easily get a &lt;code&gt;URI&lt;/code&gt; relative to the base &lt;code&gt;URI&lt;/code&gt; of the server:&lt;/p&gt;

&lt;pre class=&quot;prettyprint&quot;&gt;
var statusURI = serverExtension.uri(&quot;/status&quot;);
&lt;/pre&gt;

&lt;h3&gt;Wrapping Up&lt;/h3&gt;

&lt;p&gt;Using a JUnit extension like the &lt;code&gt;MockWebServerExtension&lt;/code&gt; shown here is one more thing you can do to eliminate boilerplate code in your tests. It can also provide the flexibility needed by different tests by allowing customization of the &lt;code&gt;MockWebServer&lt;/code&gt;.&lt;/p&gt;

</description>          </item>
    <item>
    <guid isPermaLink="true">http://www.sleberknight.com/blog/sleberkn/entry/making_http_client_tests_cleaner</guid>
    <title>Making HTTP Client Tests Cleaner with MockWebServer and kiwi-test</title>
    <dc:creator>sleberkn</dc:creator>
    <link>http://www.sleberknight.com/blog/sleberkn/entry/making_http_client_tests_cleaner</link>
        <pubDate>Sun, 9 Mar 2025 19:50:04 +0000</pubDate>
    <category>Development</category>
    <category>web</category>
    <category>mock</category>
    <category>testing</category>
    <category>java</category>
    <category>junit</category>
    <category>jupiter</category>
    <category>software</category>
            <description>&lt;style type=&quot;text/css&quot;&gt;
.prettyprint {
background-color:#EFEFEF;
border:1px solid #CCCCCC;
font-size:small;
overflow:auto;
padding:5px;
}
&lt;/style&gt;

&lt;p&gt;
In the &lt;a href=&quot;http://www.sleberknight.com/blog/sleberkn/entry/testing_http_client_code_with&quot;&gt;previous blog&lt;/a&gt;, I showed using &lt;code&gt;MockWebServer&lt;/code&gt; (part of &lt;a href=&quot;https://square.github.io/okhttp/&quot;&gt;OkHttp&lt;/a&gt;) to test HTTP client code. The test code was pretty clean and simple, but there are a few &lt;em&gt;minor&lt;/em&gt; annoyances:
&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;The boilerplate invocation to get the &lt;code&gt;URI&lt;/code&gt; of the &lt;code&gt;MockWebServer&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Having to deal with &lt;code&gt;InterruptedException&lt;/code&gt; using &lt;code&gt;takeRequest&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Needing to assert that the &lt;code&gt;RecordedRequest&lt;/code&gt; returned from &lt;code&gt;takeRequest&lt;/code&gt; is not &lt;code&gt;null&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Wrapping the assertions on a &lt;code&gt;RecordedRequest&lt;/code&gt; in &lt;code&gt;assertAll&lt;/code&gt; versus having an AssertJ-style fluent API&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;I fully admit these are all minor. However, the more I used &lt;code&gt;MockWebServer&lt;/code&gt; the more I wanted to:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Reduce boilerplate code&lt;/li&gt;
&lt;li&gt;Not need to deal with &lt;code&gt;InterruptedException&lt;/code&gt; in tests&lt;/li&gt;
&lt;li&gt;Not have to null-check the &lt;code&gt;RecordedRequest&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Have a fluent assertion API for &lt;code&gt;RecordedRequest&lt;/code&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;In addition, there&amp;#39;s another &amp;quot;gotcha&amp;quot;: if you use the no-argument &lt;code&gt;takeRequest()&lt;/code&gt; method, your tests might &lt;em&gt;never end&lt;/em&gt;. From the Javadoc, &lt;code&gt;takeRequest()&lt;/code&gt; &amp;quot;will block until the request is available, &lt;em&gt;possibly forever&lt;/em&gt;&amp;quot; (emphasis mine). It happened to me a few times before I actually read the Javadoc! After that, I decided to only use the &lt;code&gt;takeRequest&lt;/code&gt; overload that accepts a timeout, which fixes the &amp;quot;never ends&amp;quot; problem. But whichever overload you use, both throw &lt;code&gt;InterruptedException&lt;/code&gt;, which you need to handle (unless you are using Kotlin, in which case you don&amp;#39;t need to worry about it).&lt;/p&gt;
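
&lt;p&gt;For example, without any helpers, a test that wants a bounded wait ends up with code like this (a sketch):&lt;/p&gt;

&lt;pre class=&quot;prettyprint&quot;&gt;
RecordedRequest recordedRequest;
try {
    // wait briefly instead of (possibly) forever; returns null on timeout
    recordedRequest = server.takeRequest(10, TimeUnit.MILLISECONDS);
} catch (InterruptedException e) {
    Thread.currentThread().interrupt();
    throw new IllegalStateException(e);
}
assertThat(recordedRequest).isNotNull();
&lt;/pre&gt;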

&lt;p&gt;To resolve the above &amp;quot;problems&amp;quot; I added several test utilities to &lt;a href=&quot;https://github.com/kiwiproject/kiwi-test&quot;&gt;kiwi-test&lt;/a&gt; in release &lt;a href=&quot;https://github.com/kiwiproject/kiwi-test/releases/tag/v3.5.0&quot;&gt;3.5.0&lt;/a&gt; last July:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;code&gt;MockWebServers&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;MockWebServerAssertions&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;RecordedRequests&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;&lt;code&gt;RecordedRequestAssertions&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;mockwebservers&quot;&gt;MockWebServers&lt;/h3&gt;

&lt;p&gt;This currently contains only two overloaded methods named &lt;code&gt;uri&lt;/code&gt;. These are convenience methods to get the URI for a &lt;code&gt;MockWebServer&lt;/code&gt;, either with or without a path. For example, instead of:&lt;/p&gt;

&lt;pre class=&quot;prettyprint&quot;&gt;
this.baseUri = server.url(&amp;quot;/math&amp;quot;).uri();
&lt;/pre&gt;

&lt;p&gt;you can do this:&lt;/p&gt;

&lt;pre class=&quot;prettyprint&quot;&gt;
this.baseUri = MockWebServers.uri(server, &amp;quot;/math&amp;quot;);
&lt;/pre&gt;

&lt;p&gt;And with a static import for &lt;code&gt;MockWebServers&lt;/code&gt;, the code is even shorter.&lt;/p&gt;

&lt;p&gt;Are these methods really worth it for such a small amount of boilerplate? Maybe, maybe not. Once I had written similar code a few dozen times, I decided it was worth having methods that accomplished the same thing.&lt;/p&gt;

&lt;p&gt;Generally, I use these methods in &lt;code&gt;@BeforeEach&lt;/code&gt; methods and store the value in a field, so that all tests can easily access it. Sometimes you don&amp;#39;t need to store it in a field, but instead just pass it to the HTTP client:&lt;/p&gt;

&lt;pre class=&quot;prettyprint&quot;&gt;
var baseUri = MockWebServers.uri(server, &amp;quot;/math&amp;quot;);
this.mathClient = new MathApiClient(client, baseUri);
&lt;/pre&gt;

&lt;p&gt;In this example, the &lt;code&gt;mathClient&lt;/code&gt; is stored in a field and each test uses it.&lt;/p&gt;

&lt;h3 id=&quot;mockwebserverassertions&quot;&gt;MockWebServerAssertions&lt;/h3&gt;

&lt;p&gt;This class is a starting point for assertions on a &lt;code&gt;MockWebServer&lt;/code&gt;. It contains a few static factory methods to start from, one named &lt;code&gt;assertThat&lt;/code&gt; and one named &lt;code&gt;assertThatMockWebServer&lt;/code&gt;. The reason for the second one is to avoid conflicts with AssertJ&amp;#39;s &lt;code&gt;Assertions#assertThat&lt;/code&gt; methods. It provides a way to assert the number of requests made to the &lt;code&gt;MockWebServer&lt;/code&gt; and has several other methods to assert on &lt;code&gt;RecordedRequest&lt;/code&gt;. For example, assuming you use a static import:&lt;/p&gt;

&lt;pre class=&quot;prettyprint&quot;&gt;
assertThatMockWebServer(server)
        .hasRequestCount(1)
        .recordedRequest()
        .isGET()
        .hasPath(&amp;quot;/status&amp;quot;);
&lt;/pre&gt;

&lt;p&gt;This code verifies that exactly one request was made, then uses the &lt;code&gt;recordedRequest()&lt;/code&gt; method to get the &lt;code&gt;RecordedRequest&lt;/code&gt;, and finally makes assertions that the request was a &lt;code&gt;GET&lt;/code&gt; with path &lt;code&gt;/status&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;If you want to verify more than one request, you can use the &lt;code&gt;hasRecordedRequest&lt;/code&gt; method. The following code verifies that two requests were made, and checks each one in the &lt;code&gt;Consumer&lt;/code&gt; that is passed to &lt;code&gt;hasRecordedRequest&lt;/code&gt;:&lt;/p&gt;

&lt;pre class=&quot;prettyprint&quot;&gt;
var path1 = &amp;quot;...&amp;quot;;
var path2 = &amp;quot;...&amp;quot;;
var requestBody = &amp;quot;{ ... }&amp;quot;;

assertThatMockWebServer(server)
        .hasRequestCount(2)
        .hasRecordedRequest(recordedRequest1 -&amp;gt; {
            assertThat(recordedRequest1.getMethod()).isEqualTo(&amp;quot;GET&amp;quot;);
            assertThat(recordedRequest1.getPath()).isEqualTo(path1);
        })
        .hasRecordedRequest(recordedRequest2 -&amp;gt; {
            assertThat(recordedRequest2.getMethod()).isEqualTo(&amp;quot;POST&amp;quot;);
            assertThat(recordedRequest2.getPath()).isEqualTo(path2);
            assertThat(recordedRequest2.getBody().readUtf8()).isEqualTo(requestBody);
        });
&lt;/pre&gt;

&lt;h3 id=&quot;recordedrequests&quot;&gt;RecordedRequests&lt;/h3&gt;

&lt;p&gt;While &lt;code&gt;MockWebServers&lt;/code&gt; and &lt;code&gt;MockWebServerAssertions&lt;/code&gt; are useful, &lt;code&gt;RecordedRequests&lt;/code&gt; and &lt;code&gt;RecordedRequestAssertions&lt;/code&gt; (discussed below) are the tools I use most when writing HTTP client tests.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;RecordedRequests&lt;/code&gt; contains several methods to get a &lt;code&gt;RecordedRequest&lt;/code&gt; from a &lt;code&gt;MockWebServer&lt;/code&gt;. The method to use depends on whether there &lt;em&gt;must be&lt;/em&gt; a request, or whether there &lt;em&gt;may or may not be&lt;/em&gt; a request. If a request is required, you can use &lt;code&gt;takeRequiredRequest&lt;/code&gt;:&lt;/p&gt;

&lt;pre class=&quot;prettyprint&quot;&gt;
var recordedRequest = takeRequiredRequest(server);

// make assertions on the RecordedRequest instance
&lt;/pre&gt;

&lt;p&gt;But if it&amp;#39;s possible that there might not be a request, you can use either &lt;code&gt;takeRequestOrEmpty&lt;/code&gt; or &lt;code&gt;takeRequestOrNull&lt;/code&gt;. The former returns &lt;code&gt;Optional&amp;lt;RecordedRequest&amp;gt;&lt;/code&gt; while the latter returns a (possibly &lt;code&gt;null&lt;/code&gt;) &lt;code&gt;RecordedRequest&lt;/code&gt;. For example, if some business logic makes a request but only when certain requirements are met, a test can use one of these two methods:&lt;/p&gt;

&lt;pre class=&quot;prettyprint&quot;&gt;
// work with an Optional&amp;lt;RecordedRequest&amp;gt;
var maybeRequest = takeRequestOrEmpty(server);
assertThat(maybeRequest).isEmpty();

// or with a RecordedRequest directly
var request = takeRequestOrNull(server);
assertThat(request).isNull();
&lt;/pre&gt;

&lt;p&gt;But wait, there&amp;#39;s more. Not much, but there is another method &lt;code&gt;assertNoMoreRequests&lt;/code&gt; that does what you expect: it verifies the &lt;code&gt;MockWebServer&lt;/code&gt; does not contain any additional requests. So, once you have checked one or more requests, you can call it to verify the client didn&amp;#39;t do anything else unexpected:&lt;/p&gt;

&lt;pre class=&quot;prettyprint&quot;&gt;
// get and assert one or more RecordedRequest

// now, verify there weren&amp;#39;t any additional requests
assertNoMoreRequests(server);
&lt;/pre&gt;

&lt;p&gt;As mentioned in the introduction, the &lt;code&gt;MockWebServer#takeRequest()&lt;/code&gt; method blocks, &lt;em&gt;possibly forever&lt;/em&gt;. &lt;code&gt;RecordedRequests&lt;/code&gt; avoids this problem by assuming all requests should already have been made by the time you want to get a request and make assertions on it.&lt;/p&gt;

&lt;p&gt;Under the hood, all &lt;code&gt;RecordedRequests&lt;/code&gt; methods call &lt;code&gt;takeRequest(timeout: Long, unit: TimeUnit)&lt;/code&gt; (it&amp;#39;s Kotlin, so the argument name is first and the type is second) and only wait 10 milliseconds before giving up. They handle &lt;code&gt;InterruptedException&lt;/code&gt; by catching it, re-interrupting the current thread, and throwing an &lt;code&gt;UncheckedInterruptedException&lt;/code&gt; (from the &lt;a href=&quot;https://github.com/kiwiproject/kiwi&quot;&gt;kiwi&lt;/a&gt; library). This allows for cleaner test code without needing to catch &lt;code&gt;InterruptedException&lt;/code&gt; or declare a &lt;code&gt;throws&lt;/code&gt; clause. So, your test code can just do this without worrying about timeouts:&lt;/p&gt;

&lt;pre class=&quot;prettyprint&quot;&gt;
var recordedRequest = RecordedRequests.takeRequiredRequest(server);
&lt;/pre&gt;
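
&lt;p&gt;A minimal sketch of how such a method can be implemented (the actual kiwi-test code differs in its details; &lt;code&gt;UncheckedInterruptedException&lt;/code&gt; comes from kiwi as described above):&lt;/p&gt;

&lt;pre class=&quot;prettyprint&quot;&gt;
public static RecordedRequest takeRequiredRequest(MockWebServer server) {
    try {
        // all requests should already have been made, so a short wait is enough
        var request = server.takeRequest(10, TimeUnit.MILLISECONDS);
        assertThat(request)
                .describedAs(&amp;quot;expected a request, but none was made&amp;quot;)
                .isNotNull();
        return request;
    } catch (InterruptedException e) {
        // re-interrupt the thread, then rethrow as an unchecked exception
        Thread.currentThread().interrupt();
        throw new UncheckedInterruptedException(e);
    }
}
&lt;/pre&gt;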

&lt;h3 id=&quot;recordedrequestassertions&quot;&gt;RecordedRequestAssertions&lt;/h3&gt;

&lt;p&gt;You use the methods in &lt;code&gt;RecordedRequests&lt;/code&gt; to get one or more &lt;code&gt;RecordedRequest&lt;/code&gt; to make assertions on. You can use &lt;code&gt;RecordedRequestAssertions&lt;/code&gt; to make these assertions in a fluent-style API like AssertJ. If you don&amp;#39;t like the AssertJ assertion chaining style, you can skip this section and move on with life. But if you like AssertJ, read on.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;RecordedRequestAssertions&lt;/code&gt; contains several static methods to start from, and a number of assertion methods to check things like the request method, path, URI, and body. For example, suppose you are using the &amp;quot;Math API&amp;quot; from the previous blog and want to test addition. You can do this:&lt;/p&gt;

&lt;pre class=&quot;prettyprint&quot;&gt;
assertThatRecordedRequest(recordedRequest)
        .isGET()
        .hasPath(&amp;quot;/math/add/40/2&amp;quot;)
        .hasNoBody();
&lt;/pre&gt;

&lt;p&gt;Here you are checking that a &lt;code&gt;GET&lt;/code&gt; request was made to the server with path &lt;code&gt;/math/add/40/2&lt;/code&gt;, and that there was no request body (since &lt;code&gt;GET&lt;/code&gt; requests should in general not have one).&lt;/p&gt;

&lt;p&gt;You can also verify the request body. Suppose you have a &amp;quot;User API&amp;quot; to perform various actions. To test a request sent to the &amp;quot;Create User&amp;quot; endpoint, you can write a test like this:&lt;/p&gt;

&lt;pre class=&quot;prettyprint&quot;&gt;
@Test
void shouldCreateUser() {
    var id = RandomGenerator.getDefault().nextLong(1, 501);
    var responseEntity = User.newWithRedactedPassword(id, &amp;quot;s_white&amp;quot;, &amp;quot;Shaun White&amp;quot;);

    server.enqueue(new MockResponse()
            .setResponseCode(201)
            .setHeader(HttpHeaders.CONTENT_TYPE, &amp;quot;application/json&amp;quot;)
            .setHeader(HttpHeaders.LOCATION, UriBuilder.fromUri(baseUri).path(&amp;quot;/users/{id}&amp;quot;).build(id))
            .setBody(JSON_HELPER.toJson(responseEntity)));

    var newUser = new User(null, &amp;quot;s_white&amp;quot;, &amp;quot;snowboarding&amp;quot;, &amp;quot;Shaun White&amp;quot;);
    var createdUser = apiClient.create(newUser);

    assertAll(
            () -&amp;gt; assertThat(createdUser.id()).isEqualTo(id),
            () -&amp;gt; assertThat(createdUser.username()).isEqualTo(&amp;quot;s_white&amp;quot;),
            () -&amp;gt; assertThat(createdUser.password()).isEqualTo(User.REDACTED_PASSWORD)
    );

    var recordedRequest = RecordedRequests.takeRequiredRequest(server);

    assertThatRecordedRequest(recordedRequest)
            .isPOST()
            .hasHeader(&amp;quot;Accept&amp;quot;, &amp;quot;application/json&amp;quot;)
            .hasPath(&amp;quot;/users&amp;quot;)
            .hasBody(JSON_HELPER.toJson(newUser));
            
    RecordedRequests.assertNoMoreRequests(server);
}
&lt;/pre&gt;

&lt;p&gt;This test does the following:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Create a sample User entity&lt;/li&gt;
&lt;li&gt;Set up the response that the &lt;code&gt;MockWebServer&lt;/code&gt; should return&lt;/li&gt;
&lt;li&gt;Call the &lt;code&gt;create&lt;/code&gt; method on the &amp;quot;User API&amp;quot; client&lt;/li&gt;
&lt;li&gt;Make some assertions on the returned &lt;code&gt;User&lt;/code&gt; object&lt;/li&gt;
&lt;li&gt;Get the recorded request from &lt;code&gt;MockWebServer&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Check the request&lt;/li&gt;
&lt;li&gt;Verify that there are no more requests&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;To check the request, we verify that the request was a &lt;code&gt;POST&lt;/code&gt; to &lt;code&gt;/users&lt;/code&gt;, that it contains the required &lt;code&gt;Accept&lt;/code&gt; header, and that it has the expected body. If the API is using JSON, then instead of doing the Object-to-JSON conversion manually, you can use &lt;code&gt;hasJsonBodyWithEntity&lt;/code&gt;:&lt;/p&gt;

&lt;pre class=&quot;prettyprint&quot;&gt;
assertThatRecordedRequest(recordedRequest)
        .isPOST()
        .hasHeader(&amp;quot;Accept&amp;quot;, &amp;quot;application/json&amp;quot;)
        .hasPath(&amp;quot;/users&amp;quot;)
        .hasJsonBodyWithEntity(newUser);
&lt;/pre&gt;

&lt;p&gt;This will use a default &lt;a href=&quot;https://github.com/kiwiproject/kiwi&quot;&gt;kiwi&lt;/a&gt; &lt;code&gt;JsonHelper&lt;/code&gt; instance. If you need control over the JSON serialization, you can use one of the overloaded &lt;code&gt;hasJsonBodyWithEntity&lt;/code&gt; methods, which accept either a &lt;code&gt;JsonHelper&lt;/code&gt; or a Jackson &lt;code&gt;ObjectMapper&lt;/code&gt;. For example:&lt;/p&gt;

&lt;pre class=&quot;prettyprint&quot;&gt;
ObjectMapper mapper = customObjectMapper();

assertThatRecordedRequest(recordedRequest)
        .isPOST()
        .hasHeader(&amp;quot;Accept&amp;quot;, &amp;quot;application/json&amp;quot;)
        .hasPath(&amp;quot;/users&amp;quot;)
        .hasJsonBodyWithEntity(newUser, mapper);
&lt;/pre&gt;

&lt;p&gt;There are various other methods in &lt;code&gt;RecordedRequestAssertions&lt;/code&gt; as well, for example methods to check the TLS version or whether there is a failure, perhaps because the inbound request was truncated. But the assertions in the above examples handle most of the use cases I&amp;#39;ve needed when writing HTTP client tests.&lt;/p&gt;

&lt;h3 id=&quot;wrapping-up&quot;&gt;Wrapping Up&lt;/h3&gt;

&lt;p&gt;The &lt;a href=&quot;https://github.com/kiwiproject/kiwi-test&quot;&gt;kiwi-test&lt;/a&gt; library contains test utilities for making HTTP client testing with &lt;code&gt;MockWebServer&lt;/code&gt; just a bit less tedious, with a little less boilerplate, and provides AssertJ-style fluent assertions for &lt;code&gt;RecordedRequest&lt;/code&gt;. You can use these utilities to write cleaner and less &amp;quot;boilerplate-y&amp;quot; tests.&lt;/p&gt;
</description>          </item>
    <item>
    <guid isPermaLink="true">http://www.sleberknight.com/blog/sleberkn/entry/testing_http_client_code_with</guid>
    <title>Testing HTTP Client Code with MockWebServer</title>
    <dc:creator>sleberkn</dc:creator>
    <link>http://www.sleberknight.com/blog/sleberkn/entry/testing_http_client_code_with</link>
        <pubDate>Thu, 16 Jan 2025 02:09:18 +0000</pubDate>
    <category>Development</category>
    <category>web</category>
    <category>mock</category>
    <category>jupiter</category>
    <category>junit</category>
    <category>software</category>
    <category>java</category>
    <category>testing</category>
            <description>&lt;p&gt;When testing HTTP client code, it can be challenging to verify your application&apos;s behavior. For example, if you have an HTTP client that makes calls to some third-party API, or even to another service that you control, you want to make sure that you are sending the correct requests and handling the responses properly. There are various libraries available to help, and many times the library or framework you&apos;re using provides some kind of test support.&lt;/p&gt;

&lt;p&gt;For example, I&apos;ve used Dropwizard to create REST-based web services for a number of years. Dropwizard uses Jersey, which is the reference implementation of Jakarta RESTful Web Services (formerly known as JAX-RS). Dropwizard provides a way to test HTTP client implementations by creating a resource within your test that acts as a &quot;test double&quot; of the real server you are trying to simulate. When the test executes, a real HTTP server is started that can respond to real HTTP requests. No mocking, which is important since mocks can&apos;t easily simulate all the various things that can happen with HTTP requests.&lt;/p&gt;

&lt;p&gt;Suppose you have an HTTP client that uses Jersey &lt;code&gt;Client&lt;/code&gt; to call a &quot;Math API&quot;. For now, you only care about adding two numbers, so your client looks like:&lt;/p&gt;

&lt;style type=&quot;text/css&quot;&gt;
.prettyprint {
background-color:#EFEFEF;
border:1px solid #CCCCCC;
font-size:small;
overflow:auto;
padding:5px;
}
&lt;/style&gt;

&lt;pre class=&quot;prettyprint&quot;&gt;
public class MathApiClient {

    private final Client client;
    private final URI baseUri;

    public MathApiClient(Client client, URI baseUri) {
        this.client = client;
        this.baseUri = baseUri;
    }

    public int add(int a, int b) {
        var response = client.target(baseUri)
                .path(&quot;/math/add/{a}/{b}&quot;)
                .resolveTemplate(&quot;a&quot;, a)
                .resolveTemplate(&quot;b&quot;, b)
                .request()
                .get();

        return response.readEntity(Integer.class);
    }
}
&lt;/pre&gt;

&lt;p&gt;You want to design the client for easy testing, so the constructor accepts a Jersey &lt;code&gt;Client&lt;/code&gt; and a &lt;code&gt;URI&lt;/code&gt;, which lets you easily change the target server location. That&apos;s important, since you need to be able to provide the URI of the test server.&lt;/p&gt;

&lt;p&gt;Here&apos;s an example of a Math API test class using Dropwizard&apos;s integration testing support:&lt;/p&gt;

&lt;pre class=&quot;prettyprint&quot;&gt;
@ExtendWith(DropwizardExtensionsSupport.class)
class DropwizardMathApiClientTest {

    @Path(&quot;/math&quot;)
    public static class MathStubResource {
        @GET
        @Path(&quot;/add/{a}/{b}&quot;)
        @Produces(MediaType.TEXT_PLAIN)
        public Response add(@PathParam(&quot;a&quot;) int a, @PathParam(&quot;b&quot;) int b) {
            var answer = a + b;
            return Response.ok(answer).build();
        }
    }

    private static final DropwizardClientExtension CLIENT_EXTENSION =
            new DropwizardClientExtension(new MathStubResource());

    private MathApiClient mathClient;
    private Client client;

    @BeforeEach
    void setUp() {
        client = ClientBuilder.newBuilder()
                .connectTimeout(500, TimeUnit.MILLISECONDS)
                .readTimeout(500, TimeUnit.MILLISECONDS)
                .build();
        var baseUri = CLIENT_EXTENSION.baseUri();
        mathClient = new MathApiClient(client, baseUri);
    }

    @AfterEach
    void tearDown() {
        client.close();
    }

    @Test
    void shouldAdd() {
        assertThat(mathClient.add(40, 2)).isEqualTo(42);
    }
}
&lt;/pre&gt;

&lt;p&gt;In this code, it&apos;s the &lt;code&gt;DropwizardClientExtension&lt;/code&gt; that provides all the real HTTP server functionality. You provide it the stub resource (a new &lt;code&gt;MathStubResource&lt;/code&gt; instance) and it takes care of starting a real application that responds to HTTP requests as you defined in the stub resource. Then you write tests that use the &lt;code&gt;MathApiClient&lt;/code&gt;, make assertions as you normally would, and so on.&lt;/p&gt;

&lt;p&gt;This works great, but there are some downsides. First, there is no way to (easily) verify the HTTP &lt;em&gt;requests&lt;/em&gt; that the HTTP client made. The client makes the HTTP request and handles the response, but unless it provides some way to access the requests it has made, there&apos;s not really any way to verify this. You can add code into the stub resource to capture the requests, and provide a way for test code to access them, but that adds complexity to the stub resource.&lt;/p&gt;
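
&lt;p&gt;For example, a request-capturing stub might record each request path in a field that tests can inspect afterward. This is a hypothetical sketch (the class and field names are mine, and the JAX-RS annotations are omitted to keep it minimal):&lt;/p&gt;

```java
// Hypothetical request-capturing stub (names are mine). The JAX-RS
// annotations (@Path, @GET, @PathParam) are omitted here; in a real stub
// resource the add method would be annotated just like MathStubResource.
public class CapturingMathStub {

    // The most recent request path, recorded so tests can verify it later.
    static volatile String lastRequestPath;

    public int add(int a, int b) {
        lastRequestPath = "/math/add/" + a + "/" + b;
        return a + b;
    }
}
```

&lt;p&gt;A test could then assert on the captured path after calling the client, but every additional detail you want to capture (headers, bodies, query parameters) means more code in the stub.&lt;/p&gt;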

&lt;p&gt;Second, while testing the &quot;happy path&quot; is straightforward, things quickly become more difficult if you want to test errors, invalid input, and other &quot;not happy path&quot; scenarios. For example, let&apos;s say you want to test how your client responds when it receives an error response such as a &lt;code&gt;400 Bad Request&lt;/code&gt; or &lt;code&gt;500 Internal Server Error&lt;/code&gt;. How can you do this? One way is &quot;magic input&quot; where the server responds with a &lt;code&gt;400&lt;/code&gt; when you provide one set of input (e.g., whenever &lt;code&gt;a&lt;/code&gt; is &lt;code&gt;84&lt;/code&gt;) and a &lt;code&gt;500&lt;/code&gt; when you provide a different input (e.g., whenever &lt;code&gt;a&lt;/code&gt; is &lt;code&gt;142&lt;/code&gt;). Depending on the number of error cases you want to test, the stub resource code can quickly get complicated with conditionals. Another way is to use some kind of &quot;flag&quot; field inside the test stub resource class, where each test can &quot;record&quot; the response it wants. But this starts to become a &quot;mini-framework&quot; as you need more and more features.&lt;/p&gt;
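
&lt;p&gt;To make the &quot;magic input&quot; approach concrete, here is a hypothetical sketch of the dispatch logic such a stub might contain, using the example values above (the class and method names are mine):&lt;/p&gt;

```java
// Hypothetical "magic input" dispatch (names are mine): certain reserved
// input values trigger error responses, everything else succeeds. Each new
// error scenario adds another branch to this conditional chain.
public class MagicInputMathStub {

    // Returns the HTTP status code the stub should respond with for input a.
    public int statusCodeFor(int a) {
        if (a == 84) {
            return 400;  // simulate a Bad Request
        }
        if (a == 142) {
            return 500;  // simulate an Internal Server Error
        }
        return 200;      // normal response
    }
}
```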

&lt;p&gt;Something else you can do is to create &lt;em&gt;separate&lt;/em&gt; tests with different stub resources for different scenarios. But again, this can get out of control quickly if your HTTP client has a lot of methods and you want to test each one thoroughly.&lt;/p&gt;

&lt;p&gt;Despite these shortcomings, you can still write good HTTP tests using what Dropwizard (and other similar libraries) provides. I&apos;ve used the Dropwizard test support for the vast majority of HTTP client testing over the past few years. But I&apos;ve recently come across the excellent &lt;code&gt;MockWebServer&lt;/code&gt; from OkHttp. &lt;em&gt;Basically, it is like a combination of a real HTTP server to test against and a mocking library such as Mockito.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;To test HTTP clients using &lt;code&gt;MockWebServer&lt;/code&gt;, you:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Record the responses you want to receive&lt;/li&gt;
&lt;li&gt;Run your HTTP client code&lt;/li&gt;
&lt;li&gt;Make assertions about the result from the client (if any)&lt;/li&gt;
&lt;li&gt;Verify the client made the expected &lt;em&gt;requests&lt;/em&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This is very similar to using a mocking library like Mockito, except that &lt;code&gt;MockWebServer&lt;/code&gt; lets you test against the full HTTP/HTTPS request/response lifecycle in a realistic manner. So, rewriting the above test to use &lt;code&gt;MockWebServer&lt;/code&gt; looks like:&lt;/p&gt;

&lt;pre class=&quot;prettyprint&quot;&gt;
class OkHttpMathApiClientTest {

    private MathApiClient mathClient;
    private Client client;
    private MockWebServer server;

    @BeforeEach
    void setUp() throws URISyntaxException {
        client = ClientBuilder.newBuilder()
                .connectTimeout(500, TimeUnit.MILLISECONDS)
                .readTimeout(500, TimeUnit.MILLISECONDS)
                .build();

        server = new MockWebServer();
        var baseUri = server.url(&quot;/&quot;).uri();

        mathClient = new MathApiClient(client, baseUri);
    }

    @AfterEach
    void tearDown() throws IOException {
        client.close();
        server.close();
    }

    @Test
    void shouldAdd() throws InterruptedException {
        server.enqueue(new MockResponse()
                .setResponseCode(200)
                .setHeader(HttpHeaders.CONTENT_TYPE, &quot;text/plain&quot;)
                .setBody(&quot;42&quot;));

        assertThat(mathClient.add(40, 2)).isEqualTo(42);

        var recordedRequest = server.takeRequest(1, TimeUnit.SECONDS);
        assertThat(recordedRequest).isNotNull();

        assertAll(
                () -&amp;gt; assertThat(recordedRequest.getMethod()).isEqualTo(&quot;GET&quot;),
                () -&amp;gt; assertThat(recordedRequest.getPath()).isEqualTo(&quot;/math/add/40/2&quot;),
                () -&amp;gt; assertThat(recordedRequest.getBodySize()).isZero()
        );
    }
}
&lt;/pre&gt;

&lt;p&gt;In this test, we first &lt;em&gt;record&lt;/em&gt; the response (or responses) we want to receive by calling &lt;code&gt;enqueue&lt;/code&gt; with a &lt;code&gt;MockResponse&lt;/code&gt;. Don&apos;t let the &quot;Mock&quot; in the name fool you, though, since this just tells &lt;code&gt;MockWebServer&lt;/code&gt; the response you want. It will take care of returning a real HTTP response from a real HTTP server. The next line in the test is the same as in the Dropwizard example above, where we call the HTTP client and assert the result. But after that, &lt;code&gt;MockWebServer&lt;/code&gt; lets you get the requests that the client code made using &lt;code&gt;takeRequest&lt;/code&gt;, so you can verify that it sent exactly what it should have, with the expected path, query parameters, headers, body, etc.&lt;/p&gt;

&lt;p&gt;One advantage of using &lt;code&gt;MockWebServer&lt;/code&gt; is that it is really easy to record different responses and test how your client responds. For example, suppose the Math API returns a &lt;code&gt;400&lt;/code&gt; response if you provide two numbers that add up to a number higher than the maximum value of a Java &lt;code&gt;int&lt;/code&gt;, or a &lt;code&gt;500&lt;/code&gt; response if there is a server error. Here are a few tests for those situations:&lt;/p&gt;

&lt;pre class=&quot;prettyprint&quot;&gt;
@Test
void shouldThrowIllegalArgumentException_ForInvalidInput() throws InterruptedException {
    server.enqueue(new MockResponse()
            .setResponseCode(400)
            .setHeader(HttpHeaders.CONTENT_TYPE, &quot;text/plain&quot;)
            .setBody(&quot;overflow&quot;));

    assertThatIllegalArgumentException()
            .isThrownBy(() -&amp;gt; mathClient.add(Integer.MAX_VALUE, 1))
            .withMessage(&quot;Invalid arguments: overflow&quot;);

    var recordedRequest = server.takeRequest(1, TimeUnit.SECONDS);
    assertThat(recordedRequest).isNotNull();

    assertAll(
            () -&amp;gt; assertThat(recordedRequest.getMethod()).isEqualTo(&quot;GET&quot;),
            () -&amp;gt; assertThat(recordedRequest.getPath()).isEqualTo(&quot;/math/add/&quot; + Integer.MAX_VALUE + &quot;/1&quot;)
    );
}

@Test
void shouldThrowIllegalStateException_ForServerError() throws InterruptedException {
    server.enqueue(new MockResponse()
            .setResponseCode(500)
            .setHeader(HttpHeaders.CONTENT_TYPE, &quot;text/plain&quot;)
            .setBody(&quot;Server error: can&apos;t add right now&quot;));

    assertThatIllegalStateException()
            .isThrownBy(() -&amp;gt; mathClient.add(2, 2))
            .withMessage(&quot;Unknown error: Server error: can&apos;t add right now&quot;);

    var recordedRequest = server.takeRequest(1, TimeUnit.SECONDS);
    assertThat(recordedRequest).isNotNull();

    assertAll(
            () -&amp;gt; assertThat(recordedRequest.getMethod()).isEqualTo(&quot;GET&quot;),
            () -&amp;gt; assertThat(recordedRequest.getPath()).isEqualTo(&quot;/math/add/2/2&quot;)
    );
}
&lt;/pre&gt;

&lt;p&gt;Each test defines the response(s) that &lt;code&gt;MockWebServer&lt;/code&gt; should send. This makes it possible to create clean, self-contained test code that is easy to understand and change.&lt;/p&gt;

&lt;p&gt;To make these tests pass, we should update the original implementation with some error handling code:&lt;/p&gt;

&lt;pre class=&quot;prettyprint&quot;&gt;
public int add(int a, int b) {
    var response = client.target(baseUri)
            .path(&quot;/math/add/{a}/{b}&quot;)
            .resolveTemplate(&quot;a&quot;, a)
            .resolveTemplate(&quot;b&quot;, b)
            .request()
            .get();

    if (successful(response)) {
        return response.readEntity(Integer.class);
    } else if (clientError(response)) {
        throw new IllegalArgumentException(&quot;Invalid arguments: &quot; + response.readEntity(String.class));
    }

    throw new IllegalStateException(&quot;Unknown error: &quot; + response.readEntity(String.class));
}
&lt;/pre&gt;
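
&lt;p&gt;The &lt;code&gt;successful&lt;/code&gt; and &lt;code&gt;clientError&lt;/code&gt; helpers aren&apos;t shown here. A minimal sketch, assuming they simply check the status code family (the real versions would accept the JAX-RS &lt;code&gt;Response&lt;/code&gt; rather than a bare status code):&lt;/p&gt;

```java
// Hypothetical implementations of the successful and clientError helpers
// (not shown in the post). This sketch assumes they just check the HTTP
// status code family: 2xx means success, 4xx means a client error.
public class StatusCheckers {

    public static boolean successful(int statusCode) {
        return statusCode / 100 == 2;
    }

    public static boolean clientError(int statusCode) {
        return statusCode / 100 == 4;
    }
}
```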

&lt;p&gt;The code examples (adding two numbers) I&apos;ve used are simple. In &quot;real life&quot; you are probably calling more complicated and expansive APIs, and need to test various success and failure scenarios. To recap, some of the advantages of using &lt;code&gt;MockWebServer&lt;/code&gt; in your HTTP client tests are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;You can record different responses for each test (similar to setting up mock objects, e.g., Mockito)&lt;/li&gt;
&lt;li&gt;You can avoid having to implement &quot;stub&quot; resources that are a &quot;shadow API&quot; of the remote API&lt;/li&gt;
&lt;li&gt;You can avoid complicating &quot;stub&quot; resources with logic that provides different responses based on inputs or other signals&lt;/li&gt;
&lt;li&gt;You can verify the &lt;em&gt;requests&lt;/em&gt; that were made, like how you verify method calls with mocking (e.g., Mockito)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;There are other things you can do with &lt;code&gt;MockWebServer&lt;/code&gt;; for example, you can throttle responses to simulate a slow network and test timeout and retry behavior. You can also test with and without HTTPS, require client certificates, and customize the supported protocols. These are all things you could build in custom code, but it&apos;s much nicer when they come out of the box.&lt;/p&gt;
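
&lt;p&gt;As a rough sketch (assuming okhttp3&apos;s &lt;code&gt;MockWebServer&lt;/code&gt; is on the classpath; the class and method names here are mine), simulating a slow response might look like this, using &lt;code&gt;setBodyDelay&lt;/code&gt; to pause before the body is sent:&lt;/p&gt;

```java
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.concurrent.TimeUnit;
import okhttp3.mockwebserver.MockResponse;
import okhttp3.mockwebserver.MockWebServer;

// A sketch of simulating a slow network (class and method names are mine).
// setBodyDelay makes the server pause before sending the body; throttleBody
// can similarly trickle the body out a few bytes at a time.
public class SlowResponseExample {

    // Returns how many milliseconds it took to read the delayed response body.
    public static long timeSlowResponse() throws Exception {
        try (MockWebServer server = new MockWebServer()) {
            server.enqueue(new MockResponse()
                    .setResponseCode(200)
                    .setBodyDelay(1, TimeUnit.SECONDS)  // pause before the body
                    .setBody("42"));

            var url = new URL(server.url("/math/add/40/2").toString());
            var conn = (HttpURLConnection) url.openConnection();

            long start = System.nanoTime();
            byte[] body = conn.getInputStream().readAllBytes();
            long elapsedMillis = (System.nanoTime() - start) / 1_000_000;

            if (!new String(body).equals("42")) {
                throw new IllegalStateException("unexpected body");
            }
            return elapsedMillis;
        }
    }
}
```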

&lt;p&gt;To sum up, &lt;code&gt;MockWebServer&lt;/code&gt; makes it simple to write tests for HTTP client code, allowing you to test the &quot;happy path&quot; and various failure scenarios, and provides support for more advanced testing situations such as when requiring client certificate authentication or simulating network slowness.&lt;/p&gt;

</description>          </item>
    <item>
    <guid isPermaLink="true">http://www.sleberknight.com/blog/sleberkn/entry/junit_pioneer_presentation_slides</guid>
    <title>JUnit Pioneer Presentation Slides</title>
    <dc:creator>sleberkn</dc:creator>
    <link>http://www.sleberknight.com/blog/sleberkn/entry/junit_pioneer_presentation_slides</link>
        <pubDate>Fri, 5 Mar 2021 17:07:49 +0000</pubDate>
    <category>Development</category>
    <category>junit</category>
    <category>software</category>
    <category>java</category>
    <category>jupiter</category>
    <category>testing</category>
            <description>&lt;p&gt;Recently I&apos;ve been using JUnit Pioneer, which is an extension library for JUnit Jupiter (JUnit 5). It contains a lot of useful annotations that are really easy to use in tests, for example to generate a range of numbers for input into a parameterized test. This is a presentation about Pioneer that I gave on March 4, 2021.&lt;/p&gt;

&lt;p&gt;&lt;iframe src=&quot;//www.slideshare.net/slideshow/embed_code/key/q7ESf4x7CvqX8v&quot; width=&quot;595&quot; height=&quot;485&quot; frameborder=&quot;0&quot; marginwidth=&quot;0&quot; marginheight=&quot;0&quot; scrolling=&quot;no&quot; style=&quot;border:1px solid #CCC; border-width:1px; margin-bottom:5px; max-width: 100%;&quot; allowfullscreen&gt; &lt;/iframe&gt; &lt;div style=&quot;margin-bottom:5px&quot;&gt; &lt;strong&gt; &lt;a href=&quot;//www.slideshare.net/scottleber/junit-pioneer&quot; title=&quot;JUnit Pioneer&quot; target=&quot;_blank&quot;&gt;JUnit Pioneer&lt;/a&gt; &lt;/strong&gt; from &lt;strong&gt;&lt;a href=&quot;//www.slideshare.net/scottleber&quot; target=&quot;_blank&quot;&gt;Scott Leberknight&lt;/a&gt;&lt;/strong&gt; &lt;/div&gt;&lt;/p&gt;

&lt;p&gt;In case the embedded slideshow doesn&#8217;t work properly here is a &lt;a href=&quot;https://www.slideshare.net/scottleber/junit-pioneer/&quot; target=&quot;_blank&quot;&gt;link&lt;/a&gt; to the slides (opens in a new window/tab).&lt;/p&gt;</description>          </item>
    <item>
    <guid isPermaLink="true">http://www.sleberknight.com/blog/sleberkn/entry/unit_testing_presentation_slides</guid>
    <title>Unit Testing Presentation Slides</title>
    <dc:creator>sleberkn</dc:creator>
    <link>http://www.sleberknight.com/blog/sleberkn/entry/unit_testing_presentation_slides</link>
        <pubDate>Thu, 11 Jul 2019 03:12:52 +0000</pubDate>
    <category>Development</category>
    <category>testing</category>
    <category>development</category>
    <category>junit</category>
    <category>software</category>
            <description>&lt;p&gt;We have several interns this summer, and each Friday we&apos;re doing a short presentation on a different software development topic. On June 28, I gave a short presentation on (unit) testing. This presentation is very light on code, and heavier on philosophy. I shared the slides on SlideShare and have embedded them below.&lt;/p&gt;

&lt;p&gt;&lt;iframe src=&quot;//www.slideshare.net/slideshow/embed_code/key/R6AW6ku316gqx&quot; width=&quot;595&quot; height=&quot;485&quot; frameborder=&quot;0&quot; marginwidth=&quot;0&quot; marginheight=&quot;0&quot; scrolling=&quot;no&quot; style=&quot;border:1px solid #CCC; border-width:1px; margin-bottom:5px; max-width: 100%;&quot; allowfullscreen&gt; &lt;/iframe&gt; &lt;div style=&quot;margin-bottom:5px&quot;&gt; &lt;strong&gt; &lt;a href=&quot;//www.slideshare.net/scottleber/unit-testing-153856579&quot; title=&quot;Unit Testing&quot; target=&quot;_blank&quot;&gt;Unit Testing&lt;/a&gt; &lt;/strong&gt; from &lt;strong&gt;&lt;a href=&quot;https://www.slideshare.net/scottleber&quot; target=&quot;_blank&quot;&gt;Scott Leberknight&lt;/a&gt;&lt;/strong&gt; &lt;/div&gt;&lt;/p&gt;

&lt;p&gt;In case the embedded slideshow doesn&#8217;t work properly here is a &lt;a href=&quot;https://www.slideshare.net/scottleber/unit-testing-153856579&quot; target=&quot;_blank&quot;&gt;link&lt;/a&gt; to the slides (opens in a new window/tab).&lt;/p&gt;</description>          </item>
    <item>
    <guid isPermaLink="true">http://www.sleberknight.com/blog/sleberkn/entry/sdkman_presentation_slides</guid>
    <title>SDKMAN! Presentation Slides</title>
    <dc:creator>sleberkn</dc:creator>
    <link>http://www.sleberknight.com/blog/sleberkn/entry/sdkman_presentation_slides</link>
        <pubDate>Sun, 7 Apr 2019 19:29:08 +0000</pubDate>
    <category>Development</category>
    <category>jdk</category>
    <category>java</category>
    <category>sdk</category>
            <description>&lt;p&gt;I&#8217;ve been using SDKMAN! for a while now to make it really easy to install and manage multiple versions of various SDKs like Java, Kotlin, Groovy, and so on. I recently gave a mini-talk on SDKMAN! and have embedded the slides below.&lt;/p&gt;

&lt;iframe src=&quot;//www.slideshare.net/slideshow/embed_code/key/edcaPm3tyd8IG8&quot; width=&quot;595&quot; height=&quot;485&quot; frameborder=&quot;0&quot; marginwidth=&quot;0&quot; marginheight=&quot;0&quot; scrolling=&quot;no&quot; style=&quot;border:1px solid #CCC; border-width:1px; margin-bottom:5px; max-width: 100%;&quot; allowfullscreen&gt; &lt;/iframe&gt; &lt;div style=&quot;margin-bottom:5px&quot;&gt; &lt;strong&gt; &lt;a href=&quot;//www.slideshare.net/scottleber/sdkman-138942408&quot; title=&quot;SDKMAN!&quot; target=&quot;_blank&quot;&gt;SDKMAN!&lt;/a&gt; &lt;/strong&gt; from &lt;strong&gt;&lt;a href=&quot;https://www.slideshare.net/scottleber&quot; target=&quot;_blank&quot;&gt;Scott Leberknight&lt;/a&gt;&lt;/strong&gt; &lt;/div&gt;

&lt;p&gt;In case the embedded slideshow doesn&#8217;t work properly here is a &lt;a href=&quot;https://www.slideshare.net/scottleber/sdkman-138942408&quot; target=&quot;_blank&quot;&gt;link&lt;/a&gt; to the slides (opens in a new window/tab).&lt;/p&gt;</description>          </item>
    <item>
    <guid isPermaLink="true">http://www.sleberknight.com/blog/sleberkn/entry/junit_5_presentation_slides</guid>
    <title>JUnit 5 Presentation Slides</title>
    <dc:creator>sleberkn</dc:creator>
    <link>http://www.sleberknight.com/blog/sleberkn/entry/junit_5_presentation_slides</link>
        <pubDate>Mon, 13 Aug 2018 12:36:08 +0000</pubDate>
    <category>Development</category>
    <category>java</category>
    <category>assertj</category>
    <category>junit</category>
    <category>testing</category>
            <description>&lt;p&gt;I just gave a short presentation on &lt;a href=&quot;https://junit.org/junit5/&quot;&gt;JUnit 5&lt;/a&gt; at my company, &lt;a href=&quot;https://www.fortitudetec.com&quot;&gt;Fortitude Technologies&lt;/a&gt;. JUnit 5 adds a bunch of useful features for developer testing such as parameterized tests, a more flexible extension model, and a lot more. Plus, it aims to provide a cleaner separation between the testing &lt;em&gt;platform&lt;/em&gt; that IDEs and build tools like Maven and Gradle use, versus the developer testing APIs. It also provides an easy migration path from JUnit 4 (or earlier) by letting you run JUnit 3, 4, &lt;em&gt;and&lt;/em&gt; 5 tests in the same project. Here are the slides:&lt;/p&gt;

&lt;iframe src=&quot;//www.slideshare.net/slideshow/embed_code/key/zPokgsfF8gLdBY&quot; width=&quot;595&quot; height=&quot;485&quot; frameborder=&quot;0&quot; marginwidth=&quot;0&quot; marginheight=&quot;0&quot; scrolling=&quot;no&quot; style=&quot;border:1px solid #CCC; border-width:1px; margin-bottom:5px; max-width: 100%;&quot; allowfullscreen&gt;
&lt;/iframe&gt;

&lt;div style=&quot;margin-bottom:5px&quot;&gt;
&lt;strong&gt;&lt;a href=&quot;//www.slideshare.net/scottleber/junit-5-108430614&quot; title=&quot;JUnit 5&quot; target=&quot;_blank&quot;&gt;JUnit 5&lt;/a&gt; &lt;/strong&gt; from &lt;strong&gt;&lt;a href=&quot;https://www.slideshare.net/scottleber&quot; target=&quot;_blank&quot;&gt;Scott Leberknight&lt;/a&gt;&lt;/strong&gt;
&lt;/div&gt;

&lt;p&gt;In case the embedded slide show does not display properly, here is a link to the &lt;a href=&quot;https://www.slideshare.net/scottleber/junit-5-108430614&quot;&gt;slides&lt;/a&gt; on Slideshare. The sample code for the presentation is on GitHub &lt;a href=&quot;https://github.com/sleberknight/junit5-presentation-code&quot;&gt;here&lt;/a&gt;.
&lt;/p&gt;</description>          </item>
    <item>
    <guid isPermaLink="true">http://www.sleberknight.com/blog/sleberkn/entry/process_api_improvements_in_jdk9</guid>
    <title>Process API Improvements in JDK9</title>
    <dc:creator>sleberkn</dc:creator>
    <link>http://www.sleberknight.com/blog/sleberkn/entry/process_api_improvements_in_jdk9</link>
        <pubDate>Tue, 4 Apr 2017 12:32:00 +0000</pubDate>
    <category>Development</category>
    <category>java</category>
    <category>jdk9</category>
            <description>&lt;p&gt;Over the past year, several microservices I have worked on responded to specific events and then executed native OS processes, for example launching custom C++ applications, Python scripts, etc. In addition to simply launching processes, those services also needed to obtain information about running processes upon request, or shut down processes upon receiving shutdown events. A lot of what the services were doing was controlling native processes in response to specific external events, whether via JMS queues, Kafka topics, or even XML files dropped in specific directories.&lt;/p&gt;

&lt;p&gt;Since the microservices were implemented in Java, I had to use the less-than-stellar &lt;code&gt;Process&lt;/code&gt; API, which provides only the most basic support. Even though a few additional features were added in Java 8 - such as being able to check if a process is alive using &lt;code&gt;Process#isAlive&lt;/code&gt; and waiting for process exit with a timeout - you still cannot obtain a handle to a running process by its process ID, nor can you even get the process ID of a &lt;code&gt;Process&lt;/code&gt; object. As a result of these limitations, I wrote a bunch of utilities that basically call out to native programs like &lt;code&gt;grep&lt;/code&gt; and &lt;code&gt;pgrep&lt;/code&gt; to gather information on running processes, child processes for a specific process ID, and so on. Even worse, to simply find the process ID for a &lt;code&gt;Process&lt;/code&gt; instance I used reflection to directly access the private &lt;code&gt;pid&lt;/code&gt; field in the &lt;code&gt;java.lang.UNIXProcess&lt;/code&gt; class (which first required checking that we were actually dealing with a &lt;code&gt;UNIXProcess&lt;/code&gt; instance, by comparing the class name as a string, since &lt;code&gt;UNIXProcess&lt;/code&gt; is package-private and thus you cannot obtain its &lt;code&gt;Class&lt;/code&gt; instance).&lt;/p&gt;

&lt;p&gt;Most people writing and talking about Java 9 are excited about things like the new module system in &lt;a href=&quot;http://openjdk.java.net/projects/jigsaw/&quot;&gt;Project Jigsaw&lt;/a&gt;; the &lt;a href=&quot;http://openjdk.java.net/jeps/222&quot;&gt;Java shell/REPL&lt;/a&gt;; the &lt;a href=&quot;http://openjdk.java.net/jeps/110&quot;&gt;HTTP/2 client&lt;/a&gt;; convenience &lt;a href=&quot;http://openjdk.java.net/jeps/269&quot;&gt;factory methods&lt;/a&gt; for collections; and so on. But I am maybe even more excited about the &lt;a href=&quot;http://openjdk.java.net/jeps/102&quot;&gt;process API improvements&lt;/a&gt;, since it means I can get rid of a lot of the hackery I used to obtain process information. Some of the information you can now obtain from a &lt;code&gt;Process&lt;/code&gt; instance includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Whether the process supports normal termination (i.e. any of the &quot;non-forcible&quot; kill signals in Linux)&lt;/li&gt;
&lt;li&gt;The process ID (i.e. the &quot;pid&quot;), and yes it&apos;s about time&lt;/li&gt;
&lt;li&gt;A handle to the &lt;em&gt;current&lt;/em&gt; process&lt;/li&gt;
&lt;li&gt;A handle to the &lt;em&gt;parent&lt;/em&gt; process, if one exists&lt;/li&gt;
&lt;li&gt;A stream of handles to the direct children of the process&lt;/li&gt;
&lt;li&gt;A stream of handles to the descendants (direct children, their children, and so on recursively)&lt;/li&gt;
&lt;li&gt;A stream of handles to &lt;em&gt;all&lt;/em&gt; processes visible to the current process&lt;/li&gt;
&lt;li&gt;Process metadata such as the full command line, arguments, start instant, owning user, and total CPU duration&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For example, to obtain the process ID (written as a unit test, and using &lt;a href=&quot;http://joel-costigliola.github.io/assertj/index.html&quot;&gt;AssertJ&lt;/a&gt; assertions):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;@Test
public void getPid() throws IOException {
    ProcessBuilder builder = new ProcessBuilder(&quot;/bin/sleep&quot;, &quot;5&quot;);
    Process proc = builder.start();
    assertThat(proc.pid()).isGreaterThan(0);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Or, to obtain all sorts of different process metadata using &lt;code&gt;ProcessHandle&lt;/code&gt; (which is also new in JDK 9 via the &lt;code&gt;info()&lt;/code&gt; method in &lt;code&gt;Process&lt;/code&gt;):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;@Test
public void processInfo() throws IOException {
    ProcessBuilder builder = new ProcessBuilder(&quot;/bin/sleep&quot;, &quot;5&quot;);
    Process proc = builder.start();
    ProcessHandle.Info info = proc.info();
    assertThat(info.arguments().orElse(new String[] {})).containsExactly(&quot;5&quot;);
    assertThat(info.command().orElse(null)).isEqualTo(&quot;/bin/sleep&quot;);
    assertThat(info.commandLine().orElse(null)).isEqualTo(&quot;/bin/sleep 5&quot;);
    assertThat(info.user().orElse(null)).isEqualTo(System.getProperty(&quot;user.name&quot;));
    assertThat(info.startInstant().orElse(null)).isLessThanOrEqualTo(Instant.now());
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Note in the above test that &lt;em&gt;every&lt;/em&gt; method in the &lt;code&gt;ProcessHandle.Info&lt;/code&gt; returns an &lt;code&gt;Optional&lt;/code&gt;, which is the reason for the &lt;code&gt;orElse&lt;/code&gt; in the assertions. Another thing that I really needed - and thankfully JDK 9 now provides - is the ability to get a handle to an &lt;em&gt;existing&lt;/em&gt; process by its process ID using the &lt;code&gt;ProcessHandle#of&lt;/code&gt; method. Here is a simple example as a unit test:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;@Test
public void getProcessHandleForExistingProcess() throws IOException {
    ProcessBuilder builder = new ProcessBuilder(&quot;/bin/sleep&quot;, &quot;5&quot;);
    Process proc = builder.start();
    long pid = proc.pid();

    ProcessHandle handle = ProcessHandle.of(pid).orElseThrow(IllegalStateException::new);
    assertThat(handle.pid()).isEqualTo(pid);
    assertThat(handle.info().commandLine().orElse(null)).isEqualTo(&quot;/bin/sleep 5&quot;);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;As with the &lt;code&gt;ProcessHandle.Info&lt;/code&gt; methods, &lt;code&gt;ProcessHandle#of&lt;/code&gt; returns an &lt;code&gt;Optional&lt;/code&gt; so again that is the reason for the &lt;code&gt;orElseThrow&lt;/code&gt;. In a real application you might take some other action if the returned &lt;code&gt;Optional&lt;/code&gt; is empty, or maybe you just throw an exception as the above test does.&lt;/p&gt;

&lt;p&gt;As a last example, here is a test that launches a &lt;code&gt;sleep&lt;/code&gt; process, then streams all visible processes and finds the launched &lt;code&gt;sleep&lt;/code&gt; process:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;@Test
public void allProcesses() throws IOException {
    ProcessBuilder builder = new ProcessBuilder(&quot;/bin/sleep&quot;, &quot;5&quot;);
    builder.start();

    String sleep = ProcessHandle.allProcesses()
            .map(handle -&amp;gt; handle.info().command().orElse(String.valueOf(handle.pid())))
            .filter(cmd -&amp;gt; cmd.equals(&quot;/bin/sleep&quot;))
            .findFirst()
            .orElse(null);
    assertThat(sleep).isNotNull();
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;In the above test, since &lt;code&gt;allProcesses&lt;/code&gt; returns a &lt;code&gt;Stream&lt;/code&gt; we can use normal Java 8 stream API features like &lt;code&gt;map&lt;/code&gt;, &lt;code&gt;filter&lt;/code&gt;, and so on. In this example, we first map (transform) the &lt;code&gt;ProcessHandle&lt;/code&gt; to the command (i.e. &quot;/bin/sleep&quot;) or the process ID if the command &lt;code&gt;Optional&lt;/code&gt; is empty. Next we filter on whether the command equals &lt;code&gt;/bin/sleep&lt;/code&gt; and call &lt;code&gt;findFirst&lt;/code&gt; which returns an &lt;code&gt;Optional&lt;/code&gt;, and finally use &lt;code&gt;orElse&lt;/code&gt; to return &lt;code&gt;null&lt;/code&gt; if the returned &lt;code&gt;Optional&lt;/code&gt; was empty. Of course the above test can fail if, for example, there already happens to be a &lt;code&gt;/bin/sleep 5&lt;/code&gt; process executing in the operating system, but we won&apos;t worry about that here.&lt;/p&gt;

&lt;p&gt;One last piece of information that might be needed is the &lt;em&gt;current&lt;/em&gt; process, i.e. a process needs to get a handle to itself. You can now accomplish this easily by calling &lt;code&gt;ProcessHandle.current()&lt;/code&gt;. The Javadoc notes that you cannot use the returned handle to destroy the current process, and says to use &lt;code&gt;System#exit&lt;/code&gt; instead.&lt;/p&gt;
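
&lt;p&gt;Here is a minimal sketch (the class name is mine; note that in the released JDK 9 API the process ID accessor is &lt;code&gt;pid()&lt;/code&gt;):&lt;/p&gt;

```java
public class CurrentProcessExample {

    public static void main(String[] args) {
        // Every process can obtain a handle to itself.
        ProcessHandle current = ProcessHandle.current();
        System.out.println("My pid is " + current.pid());

        // info() works on the current process just like on any other handle.
        current.info().command().ifPresent(cmd -> System.out.println("Command: " + cmd));
    }
}
```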

&lt;p&gt;In addition to the process information shown in the above examples, there is also a new &lt;code&gt;onExit&lt;/code&gt; method that returns a &lt;code&gt;CompletableFuture&lt;/code&gt; that &quot;provides the ability to trigger dependent functions or actions that may be run synchronously or asynchronously upon process termination&quot; according to the Javadoc. The following example uses the native &lt;code&gt;cmp&lt;/code&gt; program to compare two files, and upon exit applies a lambda expression to check whether the exit value is zero (meaning the two files are identical). Finally, it uses the &lt;code&gt;Future#get&lt;/code&gt; method with a 1 second timeout (to avoid blocking indefinitely) to obtain the result:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;Process proc = new ProcessBuilder(&quot;/usr/bin/cmp&quot;, &quot;/tmp/file1.txt&quot;, &quot;/tmp/file2.txt&quot;).start();
Future&amp;lt;Boolean&amp;gt; areIdentical = proc.onExit().thenApply(proc1 -&amp;gt; proc1.exitValue() == 0);
if (areIdentical.get(1, TimeUnit.SECONDS)) { ... }
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;So a big thanks to the Java team at Oracle (I can&apos;t believe I just thanked Oracle) for adding these new features! In the &quot;real world&quot; where systems are heterogeneous and need to integrate in myriad ways, having a much more featureful and robust process API helps a lot for any system that needs to launch, monitor, and destroy native processes.&lt;/p&gt;
</description>          </item>
    <item>
    <guid isPermaLink="true">http://www.sleberknight.com/blog/sleberkn/entry/aws_lambda_presentation_slides</guid>
    <title>AWS Lambda Presentation Slides</title>
    <dc:creator>sleberkn</dc:creator>
    <link>http://www.sleberknight.com/blog/sleberkn/entry/aws_lambda_presentation_slides</link>
        <pubDate>Mon, 20 Mar 2017 12:15:00 +0000</pubDate>
    <category>Development</category>
    <category>aws</category>
    <category>serverless</category>
    <category>lambda</category>
            <description>&lt;p&gt;A few months ago I gave a short presentation to &lt;a href=&quot;https://www.fortitudetec.com&quot;&gt;my company&lt;/a&gt; on &lt;a href=&quot;https://aws.amazon.com/lambda/&quot;&gt;AWS Lambda&lt;/a&gt;, which is basically a &quot;serverless&quot; framework that lets you deploy and run code in Amazon&apos;s cloud without managing, provisioning, or administering any servers whatsoever. Here are the slides:&lt;/p&gt;

&lt;iframe src=&quot;//www.slideshare.net/slideshow/embed_code/key/cgkGyi5XkpCxhw&quot; width=&quot;595&quot; height=&quot;485&quot; frameborder=&quot;0&quot; marginwidth=&quot;0&quot; marginheight=&quot;0&quot; scrolling=&quot;no&quot; style=&quot;border:1px solid #CCC; border-width:1px; margin-bottom:5px; max-width: 100%;&quot; allowfullscreen&gt; &lt;/iframe&gt; &lt;div style=&quot;margin-bottom:5px&quot;&gt; &lt;strong&gt; &lt;a href=&quot;//www.slideshare.net/scottleber/aws-lambda-73153540&quot; title=&quot;AWS Lambda&quot; target=&quot;_blank&quot;&gt;AWS Lambda&lt;/a&gt; &lt;/strong&gt; from &lt;strong&gt;&lt;a target=&quot;_blank&quot; href=&quot;//www.slideshare.net/scottleber&quot;&gt;Scott Leberknight&lt;/a&gt;&lt;/strong&gt; &lt;/div&gt;</description>          </item>
    <item>
    <guid isPermaLink="true">http://www.sleberknight.com/blog/sleberkn/entry/testing_http_clients_using_spark</guid>
    <title>Testing HTTP Clients Using Spark, Revisited</title>
    <dc:creator>sleberkn</dc:creator>
    <link>http://www.sleberknight.com/blog/sleberkn/entry/testing_http_clients_using_spark</link>
        <pubDate>Tue, 14 Mar 2017 20:09:22 +0000</pubDate>
    <category>Development</category>
    <category>sparkjava</category>
    <category>testing</category>
    <category>java</category>
            <description>&lt;p&gt;In a previous &lt;a href=&quot;http://www.sleberknight.com/blog/sleberkn/entry/testing_http_clients_using_the&quot;&gt;post&lt;/a&gt; I described the very small &lt;a href=&quot;https://github.com/sleberknight/sparkjava-testing/&quot;&gt;sparkjava-testing&lt;/a&gt; library I created to make it really simple to test HTTP client code using the &lt;a href=&quot;http://sparkjava.com/&quot;&gt;Spark&lt;/a&gt; micro-framework. It is basically one simple JUnit 4 rule (&lt;code&gt;SparkServerRule&lt;/code&gt;) that spins up a Spark HTTP server before tests run, and shuts it down once tests have executed. It can be used either as a &lt;code&gt;@ClassRule&lt;/code&gt; or as a &lt;code&gt;@Rule&lt;/code&gt;. Using &lt;code&gt;@ClassRule&lt;/code&gt; is normally what you want to do; it starts an HTTP server before any test has run, and shuts it down after &lt;em&gt;all&lt;/em&gt; tests have finished.&lt;/p&gt;

&lt;p&gt;In that post I mentioned that I needed to do an &quot;incredibly awful hack&quot; to reset the Spark HTTP server to non-secure mode so that, if tests run securely using a test keystore, other tests can also run either non-secure or secure, possibly with a different keystore. I also said the reason I did that was because &quot;there is no way I found to easily reset security&quot;. The reason for all that nonsense was that I was using the &lt;em&gt;static&lt;/em&gt; methods on the &lt;code&gt;Spark&lt;/code&gt; class such as &lt;code&gt;port&lt;/code&gt;, &lt;code&gt;secure&lt;/code&gt;, &lt;code&gt;get&lt;/code&gt;, &lt;code&gt;post&lt;/code&gt;, and so on. Using the static methods also implies only one server instance across &lt;em&gt;all&lt;/em&gt; tests, which is also not so great.&lt;/p&gt;

&lt;p&gt;Well, it turns out I didn&apos;t really dig deep enough into Spark&apos;s features, because there is a really simple way to spin up separate and independent Spark server instances. You simply call the static &lt;code&gt;Service#ignite&lt;/code&gt; method, which returns a new instance of &lt;code&gt;Service&lt;/code&gt;. You then configure the &lt;code&gt;Service&lt;/code&gt; however you want, e.g. change the port, add routes, filters, set the server to run securely, etc. Here&apos;s an example:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;Service http = Service.ignite();
http.port(56789);
http.get(&quot;/hello&quot;, (req, resp) -&amp;gt; &quot;Hello, Spark service!&quot;);
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;So now you can create as many servers as you want. This is exactly what is needed for the &lt;code&gt;SparkServerRule&lt;/code&gt;, which has been refactored to use &lt;code&gt;Service#ignite&lt;/code&gt; to get separate servers for each test. It now has only one constructor, which takes a &lt;code&gt;ServiceInitializer&lt;/code&gt; that can be used to do whatever configuration you need: add routes, filters, etc. Since &lt;code&gt;ServiceInitializer&lt;/code&gt; is a &lt;code&gt;@FunctionalInterface&lt;/code&gt; you can simply supply a lambda expression, which makes it cleaner. Here is a simple example:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;@ClassRule
public static final SparkServerRule SPARK_SERVER = new SparkServerRule(http -&amp;gt; {
    http.get(&quot;/ping&quot;, (request, response) -&amp;gt; &quot;pong&quot;);
    http.get(&quot;/health&quot;, (request, response) -&amp;gt; &quot;healthy&quot;);
});
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This is a rule that, before any test is run, spins up a Spark server on the default port &lt;code&gt;4567&lt;/code&gt; with two GET routes, and shuts the server down after all tests have completed. To do things like change the port and IP address in addition to adding routes, you just call the appropriate methods on the &lt;code&gt;Service&lt;/code&gt; instance (in the example above, the &lt;code&gt;http&lt;/code&gt; object passed to the lambda). Here&apos;s an example:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;@ClassRule
public static final SparkServerRule SPARK_SERVER = new SparkServerRule(https -&amp;gt; {
    https.ipAddress(&quot;127.0.0.1&quot;);
    https.port(56789);
    URL resource = Resources.getResource(&quot;sample-keystore.jks&quot;);
    https.secure(resource.getFile(), &quot;password&quot;, null, null);
    https.get(&quot;/ping&quot;, (request, response) -&amp;gt; &quot;pong&quot;);
    https.get(&quot;/health&quot;, (request, response) -&amp;gt; &quot;healthy&quot;);
});
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;In this example, tests will be able to access a server with two secure (https) endpoints at IP &lt;code&gt;127.0.0.1&lt;/code&gt; on port &lt;code&gt;56789&lt;/code&gt;. So that&apos;s it. On the off chance someone was actually using this rule other than me, the migration path is really simple. You just need to configure the &lt;code&gt;Service&lt;/code&gt; instance passed in the &lt;code&gt;SparkServerRule&lt;/code&gt; constructor as shown above. Now, each server is totally independent which allows tests to run in parallel (assuming they&apos;re on different ports). And better, I was able to remove the hack where I used reflection to go under the covers of Spark and manipulate fields, etc. So, test away on that HTTP client code!&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This blog was originally published on the &lt;a href=&quot;http://www.fortitudetec.com&quot;&gt;Fortitude Technologies&lt;/a&gt; blog &lt;a href=&quot;http://www.fortitudetec.com/blogs/2017/03/14/testing-http-clients-using-spark-revisited&quot;&gt;here&lt;/a&gt;&lt;/em&gt;.&lt;/p&gt;</description>          </item>
    <item>
    <guid isPermaLink="true">http://www.sleberknight.com/blog/sleberkn/entry/testing_http_clients_using_the</guid>
    <title>Testing HTTP Clients Using the Spark Micro Framework</title>
    <dc:creator>sleberkn</dc:creator>
    <link>http://www.sleberknight.com/blog/sleberkn/entry/testing_http_clients_using_the</link>
        <pubDate>Wed, 7 Dec 2016 14:25:00 +0000</pubDate>
    <category>Development</category>
    <category>dropwizard</category>
    <category>testing</category>
    <category>sparkjava</category>
    <category>java</category>
            <description>&lt;p&gt;Testing HTTP client code can be a hassle. Your tests either need to run against a live HTTP server, or you somehow need to figure out how to send mock requests, which is generally not easy in most libraries that I have used. The tests should also be fast, meaning you need a lightweight server that starts and stops quickly. Spinning up heavyweight web or application servers, or relying on a specialized test server, is generally error-prone, adds complexity, and slows tests down. In projects I&apos;m working on lately we are using &lt;a href=&quot;http://dropwizard.io&quot;&gt;Dropwizard&lt;/a&gt;, which provides first-class support for testing JAX-RS resources and clients via JUnit rules. For example, it provides &lt;a href=&quot;http://www.dropwizard.io/1.0.3/docs/manual/testing.html#testing-client-implementations&quot;&gt;DropwizardClientRule&lt;/a&gt;, a JUnit rule that lets you implement JAX-RS resources as test doubles and starts and stops a simple Dropwizard application containing those resources. This works great if you are already using Dropwizard, but if not then a great alternative is &lt;a href=&quot;http://sparkjava.com/&quot;&gt;Spark&lt;/a&gt;. Even if you are using Dropwizard, Spark can still work well as a test HTTP server.&lt;/p&gt;

&lt;p&gt;Spark is self-described as a &quot;micro framework for creating web applications in Java 8 with minimal effort&quot;. You can create the stereotypical &quot;Hello World&quot; in Spark like this (shamelessly copied from Spark&apos;s web site):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;import static spark.Spark.get;

public class HelloWorld {
    public static void main(String[] args) {
        get(&quot;/hello&quot;, (req, res) -&amp;gt; &quot;Hello World&quot;);
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;You can run this code and visit &lt;code&gt;http://localhost:4567/hello&lt;/code&gt; in a browser or using a client tool like curl or &lt;a href=&quot;https://github.com/jkbrzt/httpie&quot;&gt;httpie&lt;/a&gt;. Spark is a perfect fit for creating HTTP servers in tests (whether you call them unit tests, integration tests, or something else is up to you; I will just call them tests here). I have created a very simple library, &lt;a href=&quot;https://github.com/sleberknight/sparkjava-testing/&quot;&gt;sparkjava-testing&lt;/a&gt;, that contains a JUnit rule for spinning up a Spark server for functional testing of HTTP clients. This library consists of one JUnit rule, the &lt;code&gt;SparkServerRule&lt;/code&gt;. You can annotate this rule with &lt;code&gt;@ClassRule&lt;/code&gt; or just &lt;code&gt;@Rule&lt;/code&gt;. Using &lt;code&gt;@ClassRule&lt;/code&gt; will start a Spark server &lt;em&gt;one time&lt;/em&gt; before any test is run. Then your tests run, making requests to the HTTP server, and finally once all tests have finished the server is shut down. If you need true isolation between every single test, annotate the rule with &lt;code&gt;@Rule&lt;/code&gt; and a test Spark server will be started before &lt;em&gt;each&lt;/em&gt; test and shut down after &lt;em&gt;each&lt;/em&gt; test, meaning each test runs against a fresh server. (The &lt;code&gt;SparkServerRule&lt;/code&gt; is a JUnit 4 rule mainly because JUnit 5 is still in milestone releases, and because I have not actually used JUnit 5.)&lt;/p&gt;
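&lt;p&gt;As an aside, you don&apos;t strictly need Spark for this spin-up/shut-down pattern; the JDK ships a small HTTP server in &lt;code&gt;com.sun.net.httpserver&lt;/code&gt;. Here is a minimal, self-contained sketch of the same lifecycle a rule manages for you (the &lt;code&gt;TinyTestServer&lt;/code&gt; class and its method names are made up for illustration and are not part of sparkjava-testing):&lt;/p&gt;

```java
import com.sun.net.httpserver.HttpServer;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.Scanner;

public class TinyTestServer {

    // Start a throwaway server on an ephemeral port (port 0 lets the OS choose)
    public static HttpServer start() throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(0), 0);
        server.createContext("/ping", exchange -> {
            byte[] body = "pong".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    // Tiny helper to GET a URL and return the response body as a string
    public static String get(String uri) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(uri).openConnection();
        try (Scanner scanner = new Scanner(conn.getInputStream(), StandardCharsets.UTF_8.name())) {
            return scanner.useDelimiter("\\A").next();
        }
    }

    public static void main(String[] args) throws Exception {
        HttpServer server = start();                  // like a rule's "before" phase
        int port = server.getAddress().getPort();
        System.out.println(get("http://localhost:" + port + "/ping")); // prints "pong"
        server.stop(0);                               // like a rule's "after" phase
    }
}
```

&lt;p&gt;Binding to port &lt;code&gt;0&lt;/code&gt; lets the OS pick a free ephemeral port, which conveniently avoids port conflicts between parallel test runs.&lt;/p&gt;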

&lt;p&gt;To declare a class rule with a test Spark server with two endpoints, you can do this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;@ClassRule
public static final SparkServerRule SPARK_SERVER = new SparkServerRule(() -&amp;gt; {
    get(&quot;/ping&quot;, (request, response) -&amp;gt; &quot;pong&quot;);
    get(&quot;/healthcheck&quot;, (request, response) -&amp;gt; &quot;healthy&quot;);
});
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The &lt;code&gt;SparkServerRule&lt;/code&gt; constructor takes a &lt;code&gt;Runnable&lt;/code&gt; that defines the routes the server should respond to. In this example there are two HTTP GET routes, &lt;code&gt;/ping&lt;/code&gt; and &lt;code&gt;/healthcheck&lt;/code&gt;. You can of course implement the other HTTP verbs such as POST and PUT. You can then write tests using whatever client library you want. Here is an example test using a JAX-RS client:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;@Test
public void testSparkServerRule_HealthcheckRequest() {
    client = ClientBuilder.newBuilder().build();
    Response response = client.target(URI.create(&quot;http://localhost:4567/healthcheck&quot;))
            .request()
            .get();
    assertThat(response.getStatus()).isEqualTo(200);
    assertThat(response.readEntity(String.class)).isEqualTo(&quot;healthy&quot;);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;In the above test, &lt;code&gt;client&lt;/code&gt; is a JAX-RS &lt;code&gt;Client&lt;/code&gt; instance (it is an instance variable which is closed after each test). I&apos;m using &lt;a href=&quot;http://joel-costigliola.github.io/assertj/&quot;&gt;AssertJ&lt;/a&gt; assertions in this test. The main thing to note is that your client code must be parameterizable, so that the local Spark server URI can be injected instead of the actual production URI. When using the JAX-RS client as in this example, this means you need to be able to supply the test server URI to the &lt;code&gt;Client#target&lt;/code&gt; method. Spark runs on port 4567 by default, so the client in the test uses that port.&lt;/p&gt;
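&lt;p&gt;To make &quot;parameterizable&quot; concrete, here is a hypothetical client sketch. The &lt;code&gt;HealthClient&lt;/code&gt; name is made up for illustration, and it uses the JDK&apos;s &lt;code&gt;java.net.http.HttpClient&lt;/code&gt; (Java 11+) rather than JAX-RS so it has no external dependencies; the idea of injecting the base URI is the same either way:&lt;/p&gt;

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// A client whose base URI is injected, so tests can point it at a local test server
public class HealthClient {

    private final HttpClient client = HttpClient.newHttpClient();
    private final URI baseUri;

    public HealthClient(URI baseUri) {
        this.baseUri = baseUri;  // production passes the real URI; tests pass localhost
    }

    public String checkHealth() throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(baseUri.resolve("/healthcheck"))
                .GET()
                .build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }
}
```

&lt;p&gt;A test would then construct the client with &lt;code&gt;URI.create(&quot;http://localhost:4567&quot;)&lt;/code&gt;, while production wiring supplies the real service URI.&lt;/p&gt;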

&lt;p&gt;The &lt;code&gt;SparkServerRule&lt;/code&gt; has two other constructors: one that accepts a port in addition to the routes, and another that takes a &lt;code&gt;SparkInitializer&lt;/code&gt;. To start the test server on a different port, you can do this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;@ClassRule
public static final SparkServerRule SPARK_SERVER = new SparkServerRule(6543, () -&amp;gt; {
    get(&quot;/ping&quot;, (request, response) -&amp;gt; &quot;pong&quot;);
    get(&quot;/healthcheck&quot;, (request, response) -&amp;gt; &quot;healthy&quot;);
});
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;You can use the constructor that takes a &lt;code&gt;SparkInitializer&lt;/code&gt; to customize the Spark server; for example, in addition to changing the port you can also set the IP address and make the server secure. The &lt;code&gt;SparkInitializer&lt;/code&gt; is a &lt;code&gt;@FunctionalInterface&lt;/code&gt; with one method, &lt;code&gt;init()&lt;/code&gt;, so you can use a lambda expression. For example:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;@ClassRule
public static final SparkServerRule SPARK_SERVER = new SparkServerRule(
        () -&amp;gt; {
            Spark.ipAddress(&quot;127.0.0.1&quot;);
            Spark.port(9876);
            URL resource = Resources.getResource(&quot;sample-keystore.jks&quot;);
            String file = resource.getFile();
            Spark.secure(file, &quot;password&quot;, null, null);
        },
        () -&amp;gt; {
            get(&quot;/ping&quot;, (request, response) -&amp;gt; &quot;pong&quot;);
            get(&quot;/healthcheck&quot;, (request, response) -&amp;gt; &quot;healthy&quot;);
        });
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The first argument is the initializer. It sets the IP address and port, then loads a sample keystore and calls the &lt;code&gt;Spark#secure&lt;/code&gt; method to make the test server accept HTTPS connections. You might want to customize settings if running tests in parallel, specifically the port, to ensure parallel tests do not encounter port conflicts.&lt;/p&gt;

&lt;p&gt;The last thing to note is that &lt;code&gt;SparkServerRule&lt;/code&gt; resets the port, IP address, and secure settings to the default values (&lt;code&gt;4567&lt;/code&gt;, &lt;code&gt;0.0.0.0&lt;/code&gt;, and non-secure, respectively) when it shuts down the Spark server. If you use the &lt;code&gt;SparkInitializer&lt;/code&gt; to customize other settings (for example the server thread pool, static file location, before/after filters, etc.) those will not be reset, as they are not currently supported by &lt;code&gt;SparkServerRule&lt;/code&gt;. Finally, resetting to non-secure mode required an incredibly awful hack because there is no way I found to easily reset security - you cannot just pass a bunch of &lt;code&gt;null&lt;/code&gt; values to the &lt;code&gt;Spark#secure&lt;/code&gt; method as it will throw an exception, and there is no &lt;code&gt;unsecure&lt;/code&gt; method, probably because the server was not intended to set and reset things a bunch of times like we want to do in test scenarios. If you&apos;re interested, go look at the code for the &lt;code&gt;SparkServerRule&lt;/code&gt; in the &lt;a href=&quot;https://github.com/sleberknight/sparkjava-testing/&quot;&gt;sparkjava-testing&lt;/a&gt; repository, but prepare thyself and get some cleaning supplies ready to wash away the dirty feeling you&apos;re sure to have after seeing it.&lt;/p&gt;

&lt;p&gt;The ability to use &lt;code&gt;SparkServerRule&lt;/code&gt; to quickly and easily set up test HTTP servers, along with the ability to customize the port and IP address and run securely in tests, has worked very well for my testing needs thus far. Note that unlike the above toy examples, you can implement more complicated logic in the routes, for example to return a 200 or a 404 for a GET request depending on a path parameter or request parameter value. But at the same time, don&apos;t implement extremely complex logic either. Most times I simply create separate routes when I need the test server to behave differently, for example to test various error conditions. Or, I might even choose to implement separate JUnit test &lt;em&gt;classes&lt;/em&gt; for different server endpoints, so that each test focuses on only one endpoint and its various success and failure conditions. As is many times the case, the &lt;em&gt;context&lt;/em&gt; will determine the best way to implement your tests. &lt;em&gt;Happy testing!&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This blog was originally published on the &lt;a href=&quot;http://www.fortitudetec.com&quot;&gt;Fortitude Technologies&lt;/a&gt; blog &lt;a href=&quot;http://www.fortitudetec.com/blogs/2016/12/05/testing-http-clients-using-the-spark-micro-framework&quot;&gt;here&lt;/a&gt;&lt;/em&gt;.&lt;/p&gt;</description>          </item>
    <item>
    <guid isPermaLink="true">http://www.sleberknight.com/blog/sleberkn/entry/towards_more_functional_java_digging</guid>
    <title>Towards More Functional Java - Digging into Nested Data Structures</title>
    <dc:creator>sleberkn</dc:creator>
    <link>http://www.sleberknight.com/blog/sleberkn/entry/towards_more_functional_java_digging</link>
        <pubDate>Mon, 14 Nov 2016 13:15:00 +0000</pubDate>
    <category>Development</category>
    <category>java</category>
    <category>functional</category>
    <category>refactoring</category>
            <description>&lt;p&gt;In the &lt;a href=&quot;http://www.sleberknight.com/blog/sleberkn/entry/towards_more_functional_java_using2&quot;&gt;last post&lt;/a&gt; we saw an example that used a generator combined with a filter to find the first available port in a specific range. It returned an &lt;code&gt;Optional&lt;/code&gt; to model the case when no open ports are found, as opposed to returning &lt;code&gt;null&lt;/code&gt;. In this example, we&apos;ll look at how to use Java 8 streams to dig into a nested data structure and find objects of a specific type. We&apos;ll use map and filter operations on the stream, and also introduce a new concept, the flat-map.&lt;/p&gt;

&lt;p&gt;In the original, pre-Java 8 code that I was working on in a project, the data structure was a three-level nested structure that was marshaled into Java objects from an XML file based on a schema from an external web service. The method needed to find objects of a specific type at the bottom level. For this article, to keep things simple we will work with a simple class structure in which class &lt;code&gt;A&lt;/code&gt; contains a collection of class &lt;code&gt;B&lt;/code&gt;, and &lt;code&gt;B&lt;/code&gt; contains a collection of class &lt;code&gt;C&lt;/code&gt;. The &lt;code&gt;C&lt;/code&gt; class is a base class, and there are several subclasses &lt;code&gt;C1&lt;/code&gt;, &lt;code&gt;C2&lt;/code&gt;, and &lt;code&gt;C3&lt;/code&gt;. In pseudo-code the classes look like:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;class A {
  List&amp;lt;B&amp;gt; bs = []
}

class B {
  List&amp;lt;C&amp;gt; cs = []
}

class C {}
class C1 extends C {}
class C2 extends C {}
class C3 extends C {}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The goal here is to find the first C2 instance, given an instance of A. The pre-Java 8 code looks like the following:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;public C2 findFirstC2(A a) {
    for (B b : a.getBs()) {
        for (C c : b.getCs()) {
            if (c instanceof C2) {
                return (C2) c;
            }
        }
    }
    return null;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;In this code, I made the assumption that the collections are always non-null. The original code I was working on did not make that assumption, and was more complicated as a result. We will revisit the more complicated case later. This code is pretty straightforward: two loops and a conditional, plus an early exit if we find an instance of &lt;code&gt;C2&lt;/code&gt;, and return &lt;code&gt;null&lt;/code&gt; if we exit the loops without having found anything.&lt;/p&gt;

&lt;p&gt;Refactoring to streams using Java 8&apos;s stream API is not too bad, though we need to introduce the &lt;a href=&quot;http://martinfowler.com/articles/collection-pipeline/flat-map.html&quot;&gt;flat-map&lt;/a&gt; concept. Martin Fowler&apos;s simple explanation is better than any I would come up with so I will repeat it here: &quot;Map a function over a collection and flatten the result by one-level&quot;. In our example, each &lt;code&gt;B&lt;/code&gt; has a collection of &lt;code&gt;C&lt;/code&gt;. The flat-map operation over a &lt;em&gt;collection&lt;/em&gt; of &lt;code&gt;B&lt;/code&gt; will basically return a stream of &lt;em&gt;all&lt;/em&gt; &lt;code&gt;C&lt;/code&gt; for &lt;em&gt;all&lt;/em&gt; &lt;code&gt;B&lt;/code&gt;. For example, if there are two &lt;code&gt;B&lt;/code&gt; instances in the collection, the first having 3 &lt;code&gt;C&lt;/code&gt; and the second having 5 &lt;code&gt;C&lt;/code&gt;, then the flat-map operation returns a stream of 8 &lt;code&gt;C&lt;/code&gt; instances (effectively combining the 3 from the first &lt;code&gt;B&lt;/code&gt; and the 5 from the second &lt;code&gt;B&lt;/code&gt;, and &lt;em&gt;flattening&lt;/em&gt; the result by one level). With the new flat-map tool in our tool belts, here is the Java 8 code using the stream API:&lt;/p&gt;
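&lt;p&gt;To make the 3-plus-5 counting concrete, here is a tiny self-contained sketch that uses lists of strings as stand-ins for the &lt;code&gt;B&lt;/code&gt; and &lt;code&gt;C&lt;/code&gt; collections:&lt;/p&gt;

```java
import java.util.List;
import java.util.stream.Collectors;

public class FlatMapDemo {

    // Two "B"-like lists: the first holds 3 elements, the second holds 5
    public static List<String> flattened() {
        List<List<String>> bs = List.of(
                List.of("c1", "c2", "c3"),
                List.of("c4", "c5", "c6", "c7", "c8"));

        // flatMap turns a stream of lists into one flat stream of their elements
        return bs.stream()
                .flatMap(List::stream)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(flattened().size()); // prints 8 (3 + 5)
    }
}
```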

&lt;pre&gt;&lt;code&gt;public Optional&amp;lt;C2&amp;gt; findFirstC2(A a) {
    return a.getBs().stream()
            .flatMap(b -&amp;gt; b.getCs().stream())
            .filter(C2.class::isInstance)
            .map(C2.class::cast)
            .findFirst();
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;In the above code, we first stream over the collection of &lt;code&gt;B&lt;/code&gt;. Next is where we apply the &lt;code&gt;flatMap&lt;/code&gt; method to get a stream of &lt;em&gt;all&lt;/em&gt; &lt;code&gt;C&lt;/code&gt;. The one somewhat tricky thing about the Java 8 &lt;a href=&quot;https://docs.oracle.com/javase/8/docs/api/java/util/stream/Stream.html#flatMap-java.util.function.Function-&quot;&gt;flatMap&lt;/a&gt; method is that the mapper function must return a &lt;em&gt;stream&lt;/em&gt;. In our example, we use &lt;code&gt;b.getCs().stream()&lt;/code&gt; as the mapper function, thus returning a stream of &lt;code&gt;C&lt;/code&gt;. From then on we can apply the filter and map operations, and close out with &lt;code&gt;findFirst&lt;/code&gt;, a short-circuiting terminal operation (it stops at the first &lt;code&gt;C2&lt;/code&gt; it finds) that returns an &lt;code&gt;Optional&lt;/code&gt; which either contains a value or is empty.&lt;/p&gt;

&lt;p&gt;If you have read any of my previous posts, you won&apos;t be surprised that I prefer the functional-style of the Java 8 stream API, for the same reasons I&apos;ve listed previously (e.g. declarative code, no explicit loops or conditionals, etc.). And as we&apos;ve seen before in previous posts, we can make the above example generic very easily:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;public &amp;lt;T extends C&amp;gt; Optional&amp;lt;T&amp;gt; findFirst(A a, Class&amp;lt;T&amp;gt; clazz) {
    return a.getBs().stream()
            .flatMap(b -&amp;gt; b.getCs().stream())
            .filter(clazz::isInstance)
            .map(clazz::cast)
            .findFirst();
}
&lt;/code&gt;&lt;/pre&gt;
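&lt;p&gt;For the curious, here is the generic version fleshed out into a runnable whole. The nested classes below are throwaway stand-ins for the &lt;code&gt;A&lt;/code&gt;/&lt;code&gt;B&lt;/code&gt;/&lt;code&gt;C&lt;/code&gt; hierarchy sketched earlier, just enough to exercise the method:&lt;/p&gt;

```java
import java.util.List;
import java.util.Optional;

public class GenericFinderDemo {

    // Minimal stand-ins for the A/B/C hierarchy from the article
    static class C {}
    static class C1 extends C {}
    static class C2 extends C {}

    static class B {
        final List<C> cs;
        B(List<C> cs) { this.cs = cs; }
        List<C> getCs() { return cs; }
    }

    static class A {
        final List<B> bs;
        A(List<B> bs) { this.bs = bs; }
        List<B> getBs() { return bs; }
    }

    // The generic finder: flatten all C across all B, keep instances of clazz, take the first
    public static <T extends C> Optional<T> findFirst(A a, Class<T> clazz) {
        return a.getBs().stream()
                .flatMap(b -> b.getCs().stream())
                .filter(clazz::isInstance)
                .map(clazz::cast)
                .findFirst();
    }

    public static void main(String[] args) {
        A a = new A(List.of(
                new B(List.<C>of(new C1(), new C1())),
                new B(List.of(new C1(), new C2()))));

        System.out.println(findFirst(a, C2.class).isPresent()); // prints "true"
        System.out.println(findFirst(a, C1.class).isPresent()); // prints "true"
    }
}
```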

&lt;p&gt;Of course, it is also not difficult to make the imperative version with loops generic, using the &lt;code&gt;isAssignableFrom&lt;/code&gt; and &lt;code&gt;cast&lt;/code&gt; methods in the &lt;code&gt;Class&lt;/code&gt; class. And you can even make it just as short by removing the braces, as in the following:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;public &amp;lt;T extends C&amp;gt; T findFirst(A a, Class&amp;lt;T&amp;gt; clazz) {
    for (B b : a.getBs())
        for (C c : b.getCs())
            if (clazz.isAssignableFrom(c.getClass()))
                return clazz.cast(c);
    return null;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;I never omit the braces even for one liners, because I believe it is a great way to introduce bugs (remember &lt;a href=&quot;https://nakedsecurity.sophos.com/2014/02/24/anatomy-of-a-goto-fail-apples-ssl-bug-explained-plus-an-unofficial-patch/&quot;&gt;goto fail&lt;/a&gt; a few years ago?). Braces or no braces, why prefer the more functional style to the imperative style? Some is obviously personal preference, and what you are used to. Clearly if you are used to and comfortable with reading imperative code, it won&apos;t be an issue to read the above code. But the same goes for functional style, i.e. once you learn the basic concepts (map, filter, reduce, flat-map, etc.) it becomes very easy to quickly see what code is doing (and what is intended).&lt;/p&gt;

&lt;p&gt;One other thing is that instead of using &lt;code&gt;stream()&lt;/code&gt;, you can easily switch to &lt;code&gt;parallelStream()&lt;/code&gt;, which automatically parallelizes the code. Counter-intuitively, though, &lt;code&gt;parallelStream()&lt;/code&gt; will not always make code faster; for small collections it will probably make it slower due to the overhead of splitting up and coordinating the work. But if things like transformation or filtering take a significant amount of time, then parallelizing the code can produce significant performance improvements. Unfortunately there are no hard rules, and whether parallelizing speeds the code up depends on various and sundry factors.&lt;/p&gt;
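&lt;p&gt;The switch really is one call, and the sequential and parallel pipelines produce the same result; only the execution strategy differs. A quick sketch:&lt;/p&gt;

```java
import java.util.stream.IntStream;

public class ParallelDemo {

    // Sum of squares of 1..n; same pipeline either way, only the execution differs
    public static long sumOfSquares(int n, boolean parallel) {
        IntStream stream = IntStream.rangeClosed(1, n);
        if (parallel) {
            stream = stream.parallel();  // the one-call switch to a parallel stream
        }
        return stream.mapToLong(i -> (long) i * i).sum();
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(1_000, false)); // prints 333833500
        System.out.println(sumOfSquares(1_000, true));  // prints 333833500 (same value)
    }
}
```

&lt;p&gt;If you want to know whether the parallel version actually helps in your situation, measure it rather than guessing.&lt;/p&gt;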

&lt;p&gt;The examples above were very simple. The original code was more complex because it did not make any assumptions about nullability of the original argument or the nested collections. Here is the code:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;public C2 findFirstC2(A a) {
    if (a == null || a.getBs() == null) {
        return null;
    }

    for (B b : a.getBs()) {
        List&amp;lt;C&amp;gt; cs = b.getCs();
        if (cs == null) {
            continue;
        }

        for (C c : cs) {
            if (c instanceof C2) {
                return (C2) c;
            }
        }
    }
    return null;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This code is more difficult to read than the original code due to the additional null-checking conditionals. There are two loops, three conditionals, a loop continuation, and a short-circuit return form within a nested loop. So what does this look like using the Java 8 stream API? Here is one solution:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;public Optional&amp;lt;C2&amp;gt; findFirstC2(A a) {
    return Optional.ofNullable(a)
            .map(A::getBs)
            .orElseGet(Lists::newArrayList)
            .stream()
            .flatMap(this::toStreamOfC)
            .filter(C2.class::isInstance)
            .map(C2.class::cast)
            .findFirst();
}

private Stream&amp;lt;? extends C&amp;gt; toStreamOfC(B b) {
    return Optional.ofNullable(b.getCs())
            .orElseGet(Lists::newArrayList)
            .stream();
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;That looks like a lot, so let&apos;s see what is going on. The main difference is that we need to account for possible &lt;code&gt;null&lt;/code&gt; values. For that the code uses the &lt;code&gt;Optional#ofNullable&lt;/code&gt; method which unsurprisingly returns an &lt;code&gt;Optional&lt;/code&gt;. We are also using map operations on the &lt;code&gt;Optional&lt;/code&gt; objects, which returns an empty &lt;code&gt;Optional&lt;/code&gt; if it was originally empty, otherwise it applies the operation. We are also using the &lt;code&gt;Optional#orElseGet&lt;/code&gt; method to ensure we are always operating on non-null collections, for example if &lt;code&gt;a.getBs()&lt;/code&gt; returns &lt;code&gt;null&lt;/code&gt; then the first &lt;code&gt;orElseGet&lt;/code&gt; provides a new &lt;code&gt;ArrayList&lt;/code&gt;. In this manner, the code always works the same way whether the intermediate collections are null or not. Instead of embedding a somewhat complicated map operation in the &lt;code&gt;flatMap&lt;/code&gt; I extracted the &lt;code&gt;toStreamOfC&lt;/code&gt; method, and then used a method reference. When writing code in functional style, often it helps to extract helper methods, even if that ends up creating &lt;em&gt;more&lt;/em&gt; code because, in the end, the code is more easily understood.&lt;/p&gt;
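&lt;p&gt;The pipeline above needs nothing beyond the JDK if you swap Guava&apos;s &lt;code&gt;Lists::newArrayList&lt;/code&gt; for &lt;code&gt;Collections::emptyList&lt;/code&gt;. Here is a runnable sketch with minimal stand-in classes (again made up just for illustration), exercising the null cases:&lt;/p&gt;

```java
import java.util.Collections;
import java.util.List;
import java.util.Optional;
import java.util.stream.Stream;

public class NullSafeDemo {

    static class C {}
    static class C2 extends C {}

    static class B {
        final List<C> cs;
        B(List<C> cs) { this.cs = cs; }
        List<C> getCs() { return cs; }  // may return null
    }

    static class A {
        final List<B> bs;
        A(List<B> bs) { this.bs = bs; }
        List<B> getBs() { return bs; }  // may return null
    }

    public static Optional<C2> findFirstC2(A a) {
        return Optional.ofNullable(a)
                .map(A::getBs)
                .orElseGet(Collections::emptyList)   // null-safe: empty list when absent
                .stream()
                .flatMap(NullSafeDemo::toStreamOfC)
                .filter(C2.class::isInstance)
                .map(C2.class::cast)
                .findFirst();
    }

    private static Stream<? extends C> toStreamOfC(B b) {
        return Optional.ofNullable(b.getCs())
                .orElseGet(Collections::emptyList)   // null-safe: empty stream when absent
                .stream();
    }

    public static void main(String[] args) {
        System.out.println(findFirstC2(null).isPresent());                        // false
        System.out.println(findFirstC2(new A(null)).isPresent());                 // false
        System.out.println(findFirstC2(new A(List.of(new B(null)))).isPresent()); // false
        A a = new A(List.of(new B(null), new B(List.of(new C(), new C2()))));
        System.out.println(findFirstC2(a).isPresent());                           // true
    }
}
```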

&lt;p&gt;The code in this more complex example illustrates the declarative nature of the functional style. Once you are familiar with the functional primitives (like map, flat-map, filter, and so on) reading this code is quite easy and fast, because it reads like a specification of the problem. Like reading code, writing code in the functional style takes some practice and getting used to, but once you get the hang of it, I think you will find you can often write the code faster. The main difference when writing code in functional style is that I do more thinking about what exactly I am trying to do before just slinging code. Until next time, &lt;em&gt;auf Wiedersehen&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This blog was originally published on the &lt;a href=&quot;http://www.fortitudetec.com&quot;&gt;Fortitude Technologies&lt;/a&gt; blog &lt;a href=&quot;http://www.fortitudetec.com/blogs/2016/11/11/towards-more-functional-java-dig-data-structures&quot;&gt;here&lt;/a&gt;&lt;/em&gt;.&lt;/p&gt;</description>          </item>
    <item>
    <guid isPermaLink="true">http://www.sleberknight.com/blog/sleberkn/entry/towards_more_functional_java_using2</guid>
    <title>Towards More Functional Java using Generators and Filters</title>
    <dc:creator>sleberkn</dc:creator>
    <link>http://www.sleberknight.com/blog/sleberkn/entry/towards_more_functional_java_using2</link>
        <pubDate>Wed, 12 Oct 2016 12:30:00 +0000</pubDate>
    <category>Development</category>
    <category>java</category>
    <category>refactoring</category>
    <category>functional</category>
            <description>&lt;p&gt;Last time we saw how to use &lt;a href=&quot;http://www.sleberknight.com/blog/sleberkn/entry/towards_more_functional_java_using1&quot;&gt;lambdas as predicates&lt;/a&gt;, and specifically how to use them with the Java 8 &lt;a href=&quot;https://docs.oracle.com/javase/8/docs/api/java/util/Collection.html#removeIf-java.util.function.Predicate-&quot;&gt;Collection#removeIf&lt;/a&gt; method in order to remove elements from a map based on the predicate. In this article we will use a predicate to filter elements from a stream, and combine it with a generator to find the first open port in a specific range. The use case is a (micro)service-based environment where each new service binds to the first open port it finds in a specific port range. For example, suppose we need to limit the port range of each service to the dynamic and/or private ports (49152 to 65535, as &lt;a href=&quot;http://www.iana.org/assignments/port-numbers&quot;&gt;defined by IANA&lt;/a&gt;). Basically we want to choose a port at random in the dynamic port range and bind to that port if it is open, otherwise repeat the process until we find an open port or we have tried more than a pre-defined number of times.&lt;/p&gt;

&lt;p&gt;The original pre-Java 8 code to accomplish this looked like the following:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;public Integer findFreePort() {
    int assignedPort = -1;
    int count = 1;
    while (count &amp;lt;= MAX_PORT_CHECK_ATTEMPTS) {
        int checkPort = MIN_PORT + random.nextInt(PORTS_IN_RANGE);
        if (portChecker.isAvailable(checkPort)) {
            assignedPort = checkPort;
            break;
        }
        count++;
    }
    return assignedPort == -1 ? null : assignedPort;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;There are a few things to note here. First, the method&apos;s return type is &lt;code&gt;Integer&lt;/code&gt; rather than &lt;code&gt;int&lt;/code&gt; so that it can return &lt;code&gt;null&lt;/code&gt; to indicate that it could not find an open port (as opposed to throwing an exception, which might or might not be better). Second, there are two mutable variables, &lt;code&gt;assignedPort&lt;/code&gt; and &lt;code&gt;count&lt;/code&gt;, which are used to store the open port (if found) and to track the number of attempts made, respectively. Third, the &lt;code&gt;while&lt;/code&gt; loop executes so long as the maximum number of attempts has not been exceeded. Fourth, a conditional inside the loop uses a port checker object to determine port availability, breaking out of the loop if an open port is found. Finally, a ternary expression is used to check the &lt;code&gt;assignedPort&lt;/code&gt; variable and return either &lt;code&gt;null&lt;/code&gt; or the open port.&lt;/p&gt;

&lt;p&gt;Taking a step back, all this code really does is loop until an open port is found, or until the maximum attempts has been exceeded. It then returns &lt;code&gt;null&lt;/code&gt; (if no open port was found) or the open port as an &lt;code&gt;Integer&lt;/code&gt;. There are two mutable variables, a loop, a conditional inside the loop with an early break, and another conditional (via the ternary) to determine the return value. I&apos;m sure there are a few ways this code could be improved without using Java 8 streams. For example, we could simply return the open port from the conditional inside the loop and return null if we exit the loop without finding an open port, thereby eliminating the &lt;code&gt;assignedPort&lt;/code&gt; variable. Even so it still contains a loop with a conditional and an early exit condition. And some people really dislike early returns and only want to see one return statement at the end of a method (I don&apos;t generally have a problem with early exits from methods, so long as the method is relatively short). Not to mention returning null when a port is not found forces a null check on callers; if a developer isn&apos;t paying attention or this isn&apos;t documented, perhaps they omit the null check causing a &lt;code&gt;NullPointerException&lt;/code&gt; somewhere downstream.&lt;/p&gt;
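&lt;p&gt;As a concrete sketch of that intermediate cleanup, here is the early-return variant. The constants and the class name are stand-ins, and the port checker is replaced with an &lt;code&gt;IntPredicate&lt;/code&gt; so the snippet is self-contained; note that it still needs a loop, a conditional, and a &lt;code&gt;null&lt;/code&gt; return:&lt;/p&gt;

```java
import java.util.Random;
import java.util.function.IntPredicate;

public class PortFinder {
    // Stand-in values; the real constants live elsewhere in the service
    static final int MIN_PORT = 49152;
    static final int PORTS_IN_RANGE = 65535 - MIN_PORT + 1;
    static final int MAX_PORT_CHECK_ATTEMPTS = 5;
    static final Random random = new Random();

    // Early-return variant: the assignedPort variable is gone,
    // but the loop, the conditional, and the null return remain
    static Integer findFreePort(IntPredicate isAvailable) {
        for (int count = 1; count <= MAX_PORT_CHECK_ATTEMPTS; count++) {
            int checkPort = MIN_PORT + random.nextInt(PORTS_IN_RANGE);
            if (isAvailable.test(checkPort)) {
                return checkPort;
            }
        }
        return null;  // callers must remember to null-check this
    }
}
```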

&lt;p&gt;Refactoring this to use the Java 8 stream API can be done relatively simply. In order to accomplish this we want to do the following, starting with &lt;em&gt;generating&lt;/em&gt; a sequence of random ports. For each randomly generated port, &lt;em&gt;filter&lt;/em&gt; on open ports and return the &lt;em&gt;first&lt;/em&gt; open port we find. If no open ports are found after &lt;em&gt;limiting&lt;/em&gt; our attempts to a pre-determined maximum, we want to return something that clearly indicates no open port was found, i.e. that the open port is &lt;em&gt;empty&lt;/em&gt;. I chose the terminology here very specifically, to correspond to both general functional programming concepts as well as to the Java 8 API methods we can use.&lt;/p&gt;

&lt;p&gt;Here is the code using the Java 8 APIs:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;public OptionalInt findFreePort() {
    IntSupplier randomPorts = () -&amp;gt; MIN_PORT + random.nextInt(PORTS_IN_RANGE);
    return IntStream.generate(randomPorts)
            .limit(MAX_PORT_CHECK_ATTEMPTS)
            .filter(portChecker::isAvailable)
            .findFirst();
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Without any explanation you can probably read the above code and tell generally what it does, because we are &lt;em&gt;declaring&lt;/em&gt; what should happen, as opposed to listing the explicit instructions for how to do it. But let&apos;s dive in and look at the specifics anyway. The refactored method returns an &lt;code&gt;OptionalInt&lt;/code&gt; to indicate the presence or absence of a value; &lt;code&gt;OptionalInt&lt;/code&gt; is just the version of the &lt;code&gt;Optional&lt;/code&gt; class specialized for primitive integers. This better matches the semantics we&apos;d like, which is to clearly indicate to a caller that there may or may not be a value present. Next, we are using the &lt;code&gt;generate&lt;/code&gt; method to create an &lt;em&gt;infinite&lt;/em&gt; sequence of random values, using the specified &lt;code&gt;IntSupplier&lt;/code&gt; (which is a specialization of &lt;code&gt;Supplier&lt;/code&gt; for primitive integers). Suppliers do exactly what they say they do - supply a value, and in this case a random integer in a specific range. Note the supplier is defined using a lambda expression.&lt;/p&gt;

&lt;p&gt;The infinite sequence is truncated (&lt;em&gt;limited&lt;/em&gt;) using the &lt;code&gt;limit&lt;/code&gt; method, which turns it into a &lt;em&gt;finite&lt;/em&gt; sequence. The final two pieces are the &lt;code&gt;filter&lt;/code&gt; and &lt;code&gt;findFirst&lt;/code&gt; methods. The &lt;code&gt;filter&lt;/code&gt; method uses a method reference to the &lt;code&gt;isAvailable&lt;/code&gt; method on the &lt;code&gt;portChecker&lt;/code&gt; object; a method reference is just a shortcut for a lambda expression that simply passes its argument along to the referenced method. Finally, we use &lt;code&gt;findFirst&lt;/code&gt;, which the Javadocs describe as a &quot;short-circuiting terminal operation&quot;: a terminal operation ends the stream pipeline and produces a result, and &quot;short-circuiting&quot; means it can stop processing as soon as a matching element is found, even when the stream is infinite. The short-circuiting behavior is basically the same as the &lt;code&gt;break&lt;/code&gt; statement in the original pre-Java 8 code.&lt;/p&gt;
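&lt;p&gt;To see the &lt;code&gt;OptionalInt&lt;/code&gt; semantics from the caller&apos;s side, here is a hypothetical, self-contained sketch (the port checker is again replaced by an &lt;code&gt;IntPredicate&lt;/code&gt; stand-in). The caller must explicitly decide what absence means, so there is no way to stumble into an accidental &lt;code&gt;NullPointerException&lt;/code&gt;:&lt;/p&gt;

```java
import java.util.OptionalInt;
import java.util.Random;
import java.util.function.IntPredicate;
import java.util.stream.IntStream;

public class FreePorts {
    // Stand-in constants for illustration
    static final int MIN_PORT = 49152;
    static final int PORTS_IN_RANGE = 65535 - MIN_PORT + 1;
    static final int MAX_PORT_CHECK_ATTEMPTS = 5;
    static final Random random = new Random();

    static OptionalInt findFreePort(IntPredicate isAvailable) {
        return IntStream.generate(() -> MIN_PORT + random.nextInt(PORTS_IN_RANGE))
                .limit(MAX_PORT_CHECK_ATTEMPTS)
                .filter(isAvailable)
                .findFirst();
    }

    public static void main(String[] args) {
        // The caller chooses: throw, supply a default, or act only when present
        int port = findFreePort(p -> true)
                .orElseThrow(() -> new IllegalStateException("no open port found"));
        System.out.println("binding to port " + port);

        findFreePort(p -> false)
                .ifPresent(p -> System.out.println("never printed"));
    }
}
```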

&lt;p&gt;So now we have a more functional version that finds free ports with no mutable variables and a more semantically correct return type. As in several previous articles in this ad-hoc series, the same common patterns (i.e. map, filter, collect/reduce) recur in a slightly different form. Instead of a map operation to transform an existing stream, we are &lt;em&gt;generating&lt;/em&gt; a stream from scratch, limiting to a finite number of attempts, filtering the items we want to accept, and then using a short-circuiting terminal operation to return the value found, or an empty value as an &lt;code&gt;OptionalInt&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;As you can probably tell, I am biased toward the functional version for various reasons such as the declarative nature of the code, no explicit looping or variable mutation, and so on. In this case I think the more functional version is much more readable (though I am 100% sure there will be people who vehemently disagree, and that&apos;s OK). In addition, because we are using what are effectively building blocks (generators, map, filter, reduce/collect, etc.) we can much more easily extract a generic method that finds the first element satisfying a filtering condition, given a supplier and a limit. For example:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;public &amp;lt;T&amp;gt; Optional&amp;lt;T&amp;gt; findFirst(long maxAttempts,
                                 Supplier&amp;lt;T&amp;gt; generator,
                                 Predicate&amp;lt;T&amp;gt; condition) {
    return Stream.generate(generator)
            .limit(maxAttempts)
            .filter(condition)
            .findFirst();
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Now we have a re-usable method that can accept any generator and any predicate. For example, suppose you want to find the first random number over two billion if it occurs within 10 attempts, or else default to 42 (naturally). Assuming you have a random number generator object &lt;code&gt;rand&lt;/code&gt;, then you could call the &lt;code&gt;findFirst&lt;/code&gt; method like this, making use of the &lt;code&gt;orElse&lt;/code&gt; method on &lt;code&gt;Optional&lt;/code&gt; to provide a default value (note the lambda parameter must be named something other than &lt;code&gt;value&lt;/code&gt;, since a lambda parameter cannot shadow a local variable that is in scope):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;Integer value = findFirst(10, rand::nextInt, n -&amp;gt; n &amp;gt; 2_000_000_000).orElse(42);
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;So as I mentioned in the last article on predicates, there is a separation of concerns achieved by using the functional style that simply is not possible using traditional control structures such as the &lt;code&gt;while&lt;/code&gt; loop and explicit &lt;code&gt;if&lt;/code&gt; conditional as in the first example of this article. (*) Essentially, the functional style is &lt;em&gt;composable&lt;/em&gt; using basic building blocks, which is another huge win. Because of this composability, in general you tend to write less code, and the code that you do write tends to be more focused on the business logic you are actually trying to perform. And when you do see the same pattern repeated several times, it is much easier to extract the commonality using the functional style building blocks as we did to create the generic &lt;code&gt;findFirst&lt;/code&gt; method in the last example. To paraphrase Yoda, once you start down the path to the functional side, forever will it dominate your destiny. Unlike the dark side of the Force, however, the functional side is much better and nicer. Until next time, &lt;em&gt;arrivederci&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;You can find all the sample code used in this blog and the others in this series on my GitHub in the &lt;a href=&quot;https://github.com/sleberknight/java8-blog-code&quot;&gt;java8-blog-code&lt;/a&gt; repository.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;(*) Yes, you can simulate functional programming using anonymous inner classes prior to Java 8, or you can use a library like &lt;a href=&quot;https://github.com/google/guava&quot;&gt;Guava&lt;/a&gt; and use its functional programming idioms. In general this tends to be verbose and you end up with more complicated and awkward-looking code. As the Guava team &lt;a href=&quot;https://github.com/google/guava/wiki/FunctionalExplained&quot;&gt;explains&lt;/a&gt;:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;&lt;em&gt;Excessive use of Guava&apos;s functional programming idioms can lead to
verbose, confusing, unreadable, and inefficient code. These are by
far the most easily (and most commonly) abused parts of Guava, and
when you go to preposterous lengths to make your code &quot;a one-liner,&quot;
the Guava team weeps&lt;/em&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;em&gt;This blog was originally published on the &lt;a href=&quot;http://www.fortitudetec.com&quot;&gt;Fortitude Technologies&lt;/a&gt; blog &lt;a href=&quot;http://www.fortitudetec.com/blogs/2016/10/11/towards-more-functional-java-using-generators-and-filters&quot;&gt;here&lt;/a&gt;&lt;/em&gt;.&lt;/p&gt;</description>          </item>
    <item>
    <guid isPermaLink="true">http://www.sleberknight.com/blog/sleberkn/entry/towards_more_functional_java_using1</guid>
    <title>Towards More Functional Java using Lambdas as Predicates</title>
    <dc:creator>sleberkn</dc:creator>
    <link>http://www.sleberknight.com/blog/sleberkn/entry/towards_more_functional_java_using1</link>
        <pubDate>Tue, 13 Sep 2016 11:45:00 +0000</pubDate>
    <category>Development</category>
    <category>java</category>
    <category>refactoring</category>
            <description>&lt;p&gt;Previously I &lt;a href=&quot;http://www.sleberknight.com/blog/sleberkn/entry/towards_more_functional_java_using&quot;&gt;showed&lt;/a&gt; an example that transformed a map of query parameters into a SOLR search string. The pre-Java 8 code used a traditional &lt;code&gt;for&lt;/code&gt; loop with a conditional and used a &lt;code&gt;StringBuilder&lt;/code&gt; to incrementally build a string. The Java 8 code streamed over the map entries, mapping (transforming) each entry to a string of the form &lt;code&gt;&quot;key:value&quot;&lt;/code&gt; and finally used a &lt;code&gt;Collector&lt;/code&gt; to join those query fragments together. This is a common pattern in functional-style code, in which a for loop transforms one collection of objects into a collection of different objects, optionally filters some of them out, and optionally reduce the collection to a single element. These are common patterns in the functional style - map, filter, reduce, etc. You can almost always replace a for loop with conditional filtering and reduction into a Java 8 stream with map, filter, and reduce (collect) operations.&lt;/p&gt;

&lt;p&gt;But in addition to the stream API, Java 8 also introduced some nice new API methods that make certain things much simpler. For example, suppose we have the following method to remove all map entries for a given set of keys. In the example code, &lt;code&gt;dataCache&lt;/code&gt; is a &lt;code&gt;ConcurrentMap&lt;/code&gt; and &lt;code&gt;deleteKeys&lt;/code&gt; is the set of keys we want to remove from that cache. Here is the original code I came across:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;public void deleteFromCache(Set&amp;lt;String&amp;gt; deleteKeys) {
    Iterator&amp;lt;Map.Entry&amp;lt;String, Object&amp;gt;&amp;gt; iterator = dataCache.entrySet().iterator();
    while (iterator.hasNext()) {
        Map.Entry&amp;lt;String, Object&amp;gt; entry = iterator.next();
        if (deleteKeys.contains(entry.getKey())) {
            iterator.remove();
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Now, you could argue there are better ways to do this, e.g. iterate the delete keys and remove each mapping using the &lt;code&gt;Map#remove(Object key)&lt;/code&gt; method. For example:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;public void deleteFromCache(Set&amp;lt;String&amp;gt; deleteKeys) {
    for (String deleteKey : deleteKeys) {
        dataCache.remove(deleteKey);
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The code using the &lt;code&gt;for&lt;/code&gt; loop certainly seems cleaner than using the &lt;code&gt;Iterator&lt;/code&gt; in this case, though both are functionally equivalent. Can we do better? Java 8 introduced the &lt;code&gt;removeIf&lt;/code&gt; method as a default method, not in &lt;code&gt;Map&lt;/code&gt; but instead in the &lt;code&gt;Collection&lt;/code&gt; interface. This new method &quot;removes all of the elements of this collection that satisfy the given predicate&quot;, to quote from the Javadocs. This method accepts one argument, a &lt;code&gt;Predicate&lt;/code&gt;, which is a functional interface introduced in Java 8 and which can therefore be implemented with a lambda expression. Let&apos;s first implement this as a regular old anonymous inner class, which you can still do even in Java 8. It looks like:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;public void deleteFromCache(Set&amp;lt;String&amp;gt; deleteKeys) {
    dataCache.entrySet().removeIf(new Predicate&amp;lt;Map.Entry&amp;lt;String, Object&amp;gt;&amp;gt;() {
        @Override
        public boolean test(Map.Entry&amp;lt;String, Object&amp;gt; entry) {
            return deleteKeys.contains(entry.getKey());
        }
    });
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;As you can see, we first get the map&apos;s entry set via the &lt;code&gt;entrySet&lt;/code&gt; method and call &lt;code&gt;removeIf&lt;/code&gt; on it, supplying a &lt;code&gt;Predicate&lt;/code&gt; that tests whether the set of &lt;code&gt;deleteKeys&lt;/code&gt; contains the entry key. If this test returns true, the entry is removed. Since &lt;code&gt;Predicate&lt;/code&gt; is annotated with &lt;code&gt;@FunctionalInterface&lt;/code&gt; it can be implemented with a lambda expression, a method reference, or a constructor reference, according to &lt;a href=&quot;https://docs.oracle.com/javase/8/docs/api/java/lang/FunctionalInterface.html&quot;&gt;the Javadoc&lt;/a&gt;. So let&apos;s take the first step and convert the anonymous inner class into a lambda expression:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;public void deleteFromCache(Set&amp;lt;String&amp;gt; deleteKeys) {
    dataCache.entrySet().removeIf((Map.Entry&amp;lt;String, Object&amp;gt; entry) -&amp;gt;
        deleteKeys.contains(entry.getKey()));
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;In the above, we&apos;ve replaced the anonymous class with a lambda expression that takes a single &lt;code&gt;Map.Entry&lt;/code&gt; argument. But, Java 8 can &lt;em&gt;infer&lt;/em&gt; the argument types of lambda expressions, so we can remove the explicit (and a bit noisy) type declarations, leaving us with the following cleaner code:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;public void deleteFromCache(Set&amp;lt;String&amp;gt; deleteKeys) {
    dataCache.entrySet().removeIf(entry -&amp;gt; deleteKeys.contains(entry.getKey()));
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This code is quite a bit nicer than the original code using an explicit &lt;code&gt;Iterator&lt;/code&gt;. But what about compared to the second code example that looped through the keys using a simple &lt;code&gt;for&lt;/code&gt; loop, and calling &lt;code&gt;remove&lt;/code&gt; to remove each element? The lines of code really aren&apos;t that different, so assuming they are functionally equivalent then perhaps it is just a style preference. The explicit for loop is a traditional imperative style, whereas the &lt;code&gt;removeIf&lt;/code&gt; has a more functional flavor to it. If you look at the actual implementation of &lt;code&gt;removeIf&lt;/code&gt; in the &lt;code&gt;Collection&lt;/code&gt; interface, it actually uses an &lt;code&gt;Iterator&lt;/code&gt; under the covers, just as with the first example in this post.&lt;/p&gt;

&lt;p&gt;So practically there is no difference in functionality. But, &lt;code&gt;removeIf&lt;/code&gt; could &lt;em&gt;theoretically&lt;/em&gt; be implemented for certain types of collections to perform the operation in parallel, and perhaps only for collections over a certain size where it can be shown that parallelizing the operation has benefits. But this simple example is really more about &lt;em&gt;separation of concerns&lt;/em&gt;, i.e. separating the logic of traversing the collection from the logic that determines whether or not an element is removed.&lt;/p&gt;

&lt;p&gt;For example, if a code base needs to remove elements from collections in many different places, chances are good that similar traversal logic will end up intertwined with removal logic throughout the code base. In contrast, using &lt;code&gt;removeIf&lt;/code&gt; leaves only the removal logic in those locations - and the removal logic is really your business logic. And, if at some later point in time the traversal logic in the Java collections framework were to be improved somehow, e.g. parallelized for large collections, then &lt;em&gt;all&lt;/em&gt; the locations using that function &lt;em&gt;automatically&lt;/em&gt; receive the same benefit, whereas code that combines the traversal and remove logic using explicit &lt;code&gt;Iterator&lt;/code&gt; or loops would not.&lt;/p&gt;
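&lt;p&gt;That separation also makes the removal rule independently testable. As a small sketch (the class and method names here are illustrative, not from the original code), the predicate can be extracted into a named factory method and exercised without building a real cache or writing any traversal code:&lt;/p&gt;

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Predicate;

public class CacheCleaner {
    private final ConcurrentMap<String, Object> dataCache = new ConcurrentHashMap<>();

    // The removal rule (the business logic) as a named, reusable predicate
    static Predicate<Map.Entry<String, Object>> keyIn(Set<String> deleteKeys) {
        return entry -> deleteKeys.contains(entry.getKey());
    }

    public void put(String key, Object value) {
        dataCache.put(key, value);
    }

    // Traversal is delegated entirely to removeIf
    public void deleteFromCache(Set<String> deleteKeys) {
        dataCache.entrySet().removeIf(keyIn(deleteKeys));
    }

    public int size() {
        return dataCache.size();
    }
}
```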

&lt;p&gt;In this case, and many others, I&apos;d argue the separation of concerns is a much better reason to prefer functional style to imperative style. Separation of concerns leads to better, cleaner code and easier code &lt;em&gt;re-use&lt;/em&gt; precisely since those concerns can be implemented separately, and also tested separately, which results in not only cleaner production code but also cleaner test code. All of which leads to more maintainable code, which means new features and enhancements to existing code can be accomplished faster and with less chance of breaking existing code. Until the next post in this ad-hoc series on Java 8 features and a functional style, happy coding!&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This blog was originally published on the &lt;a href=&quot;http://www.fortitudetec.com&quot;&gt;Fortitude Technologies&lt;/a&gt; blog &lt;a href=&quot;http://www.fortitudetec.com/blogs/2016/9/12/towards-more-functional-java-using-lambdas-as-predicates&quot;&gt;here&lt;/a&gt;&lt;/em&gt;.&lt;/p&gt;</description>          </item>
    <item>
    <guid isPermaLink="true">http://www.sleberknight.com/blog/sleberkn/entry/towards_more_functional_java_using</guid>
    <title>Towards more functional Java using Streams and Lambdas</title>
    <dc:creator>sleberkn</dc:creator>
    <link>http://www.sleberknight.com/blog/sleberkn/entry/towards_more_functional_java_using</link>
        <pubDate>Tue, 23 Aug 2016 12:30:00 +0000</pubDate>
    <category>Development</category>
    <category>java</category>
    <category>refactoring</category>
            <description>&lt;p&gt;In the &lt;a href=&quot;http://www.sleberknight.com/blog/sleberkn/entry/reduce_java_boilerplate_using_try&quot;&gt;last post&lt;/a&gt; I showed how the Java 7 try-with-resources feature reduces boilerplate code, but probably more importantly how it removes errors related to unclosed resources, thereby eliminating an entire class of errors. In this post, the first in an ad-hoc series on Java 8 features, I&apos;ll show how the stream API can reduce the lines of code, but also how it can make the code more readable, maintainable, and less error-prone.&lt;/p&gt;

&lt;p&gt;The following code is from a simple back-end service that lets us query metadata about messages flowing through various systems. It takes a map of key-value pairs and creates a Lucene query that can be submitted to SOLR to obtain results. It is primarily used by developers to verify behavior in a distributed system, and it does not support very sophisticated queries, since it only ANDs the key-value pairs together to form the query. For example, given a parameter map containing the &lt;code&gt;(key, value)&lt;/code&gt; pairs &lt;code&gt;(lastName, Smith)&lt;/code&gt; and &lt;code&gt;(firstName, Bob)&lt;/code&gt;, the method would generate the query &lt;code&gt;&quot;lastName:Smith AND firstName:Bob&quot;&lt;/code&gt;. As I said, not very sophisticated.&lt;/p&gt;

&lt;p&gt;Here is the original code (where &lt;code&gt;AND&lt;/code&gt;, &lt;code&gt;COLON&lt;/code&gt;, and &lt;code&gt;DEFAULT_QUERY&lt;/code&gt; are constants):&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;public String buildQueryString(Map&amp;lt;String, String&amp;gt; parameters) {
    int count = 0;
    StringBuilder query = new StringBuilder();

    for (Map.Entry&amp;lt;String, String&amp;gt; entry : parameters.entrySet()) {
        if (count &amp;gt; 0) {
            query.append(AND);
        }
        query.append(entry.getKey());
        query.append(COLON);
        query.append(entry.getValue());
        count++;
    }

    if (parameters.size() == 0) {
        query.append(DEFAULT_QUERY);
    }

    return query.toString();
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The core business logic should be very simple, since we only need to iterate the parameter map, join the keys and values with a colon, and finally join them together. But the code above, while not terribly hard to understand, has a lot of noise. First off, it uses two mutable variables (&lt;code&gt;count&lt;/code&gt; and &lt;code&gt;query&lt;/code&gt;) that are modified within the &lt;code&gt;for&lt;/code&gt; loop. The first thing in the loop is a conditional that is needed to determine whether we need to append the &lt;code&gt;AND&lt;/code&gt; constant, as we only want to do that after the first key-value pair is added to the query. Next, joining the keys and values is done by concatenating them, one by one, to the &lt;code&gt;StringBuilder&lt;/code&gt; holding the query. Finally the count must be incremented so that in subsequent loop iterations, we properly include the &lt;code&gt;AND&lt;/code&gt; delimiter. After the loop there is another conditional which appends &lt;code&gt;DEFAULT_QUERY&lt;/code&gt; if there are no parameters, and then we finally convert the &lt;code&gt;StringBuilder&lt;/code&gt; to a &lt;code&gt;String&lt;/code&gt; and return it.&lt;/p&gt;

&lt;p&gt;Here is the &lt;code&gt;buildQueryString&lt;/code&gt; method after refactoring it to use the Java 8 stream API:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;public String buildQueryString(Map&amp;lt;String, String&amp;gt; parameters) {
    if (parameters.isEmpty()) {
        return DEFAULT_QUERY;
    }

    return parameters.entrySet().stream()
            .map(entry -&amp;gt; String.join(COLON, entry.getKey(), entry.getValue()))
            .collect(Collectors.joining(AND));
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;This code does the exact same thing, but in only 6 lines of code (counting the &lt;code&gt;map&lt;/code&gt; and &lt;code&gt;collect&lt;/code&gt; lines as separate even though technically they are part of the stream call chain) instead of 15. But just measuring lines of code isn&apos;t everything. The main difference here is the lack of mutable variables, no external iteration via explicit looping constructs, and no conditional statements other than the empty check which short circuits and returns &lt;code&gt;DEFAULT_QUERY&lt;/code&gt; when there are no parameters. The code reads like a functional declaration of what we want to accomplish: stream over the parameters, convert each (key, value) to &lt;code&gt;&quot;key:value&quot;&lt;/code&gt; and join them all together using the delimiter &lt;code&gt;AND&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;The specific Java 8 features we&apos;ve used here start with the &lt;code&gt;stream()&lt;/code&gt; method to convert the map entry set to a Java 8 &lt;code&gt;java.util.stream.Stream&lt;/code&gt;. We then use the &lt;code&gt;map&lt;/code&gt; operation on the stream, which applies a function (&lt;code&gt;String.join&lt;/code&gt;) to each element (&lt;code&gt;Map.Entry&lt;/code&gt;) in the stream. Finally, we use the &lt;code&gt;collect&lt;/code&gt; method to &lt;em&gt;reduce&lt;/em&gt; the elements using the &lt;code&gt;joining&lt;/code&gt; collector into the resulting string that is the actual query we wanted to build. In the &lt;code&gt;map&lt;/code&gt; method we&apos;ve also made use of a &lt;em&gt;lambda expression&lt;/em&gt; to specify exactly what transformation to perform on each map entry.&lt;/p&gt;
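&lt;p&gt;Putting it together as a runnable sketch: the constant values below are assumptions, since the post doesn&apos;t show them, and a &lt;code&gt;LinkedHashMap&lt;/code&gt; is used so the output order is deterministic:&lt;/p&gt;

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

public class QueryBuilder {
    // Assumed values -- the real constants are not shown in the post
    static final String AND = " AND ";
    static final String COLON = ":";
    static final String DEFAULT_QUERY = "*:*";

    static String buildQueryString(Map<String, String> parameters) {
        if (parameters.isEmpty()) {
            return DEFAULT_QUERY;
        }
        // map each entry to "key:value", then join the fragments with AND
        return parameters.entrySet().stream()
                .map(entry -> String.join(COLON, entry.getKey(), entry.getValue()))
                .collect(Collectors.joining(AND));
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("lastName", "Smith");
        params.put("firstName", "Bob");
        System.out.println(buildQueryString(params));  // lastName:Smith AND firstName:Bob
    }
}
```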


&lt;p&gt;By removing explicit iteration and mutable variables, the code is more readable, in that a developer seeing this code for the first time will have an easier and quicker time understanding what it does. Note that much of the &lt;em&gt;how&lt;/em&gt; it does things has been removed, for example the iteration is now implicit via the &lt;code&gt;Stream&lt;/code&gt;, and the &lt;code&gt;joining&lt;/code&gt; collector now does the work of inserting a delimiter between the elements. You&apos;re now &lt;em&gt;declaring&lt;/em&gt; what you want to happen, instead of having to explicitly perform all the tedium yourself. This is more of a &lt;em&gt;functional style&lt;/em&gt; than most Java developers are used to, and at first it can be a bit jarring, but the more you practice and get used to it, the more you&apos;ll probably like it, and you&apos;ll find yourself able to read and write this style of code much more quickly than traditional code with lots of loops and conditionals. Generally there is also less code than when using traditional looping and control structures, which is another benefit for maintenance. I won&apos;t go so far as to say Java 8 is a functional language like Clojure or Haskell - since it isn&apos;t - but code like this has a more functional &lt;em&gt;flavor&lt;/em&gt; to it.&lt;/p&gt;

&lt;p&gt;There is now a metric ton of content on the internet related to Java 8 streams, but in case this is all new to you, or you&apos;re just looking for a decent place to begin learning more in-depth, the &lt;a href=&quot;https://docs.oracle.com/javase/8/docs/api/java/util/stream/package-summary.html&quot;&gt;API documentation&lt;/a&gt; for the &lt;code&gt;java.util.stream&lt;/code&gt; package is a good place to start. Venkat Subramaniam&apos;s &lt;a href=&quot;https://pragprog.com/book/vsjava8/functional-programming-in-java&quot;&gt;Functional Programming in Java&lt;/a&gt; is another good resource, and at less than 200 pages can be digested pretty quickly. And for more on lambda expressions, the &lt;a href=&quot;https://docs.oracle.com/javase/tutorial/java/javaOO/lambdaexpressions.html&quot;&gt;Lambda Expressions&lt;/a&gt; tutorial in the official Java Tutorials is a decent place to begin. In the next post, we&apos;ll see another example where a simple Java 8 API addition combined with a lambda expression simplifies code, making it more readable and maintainable. &lt;/p&gt;

&lt;p&gt;&lt;em&gt;This blog was originally published on the &lt;a href=&quot;http://www.fortitudetec.com&quot;&gt;Fortitude Technologies&lt;/a&gt; blog &lt;a href=&quot;http://www.fortitudetec.com/blogs/2016/8/22/java8-query-builder&quot;&gt;here&lt;/a&gt;&lt;/em&gt;.&lt;/p&gt;</description>          </item>
    <item>
    <guid isPermaLink="true">http://www.sleberknight.com/blog/sleberkn/entry/reduce_java_boilerplate_using_try</guid>
    <title>Reduce Java boilerplate using try-with-resources</title>
    <dc:creator>sleberkn</dc:creator>
    <link>http://www.sleberknight.com/blog/sleberkn/entry/reduce_java_boilerplate_using_try</link>
        <pubDate>Thu, 11 Aug 2016 12:12:00 +0000</pubDate>
    <category>Development</category>
    <category>java</category>
            <description>&lt;p&gt;Java 8 has been out for a while, and Java 7 has been out even longer. But even so, many people still unfortunately are not taking advantage of some of the new features, many of which make reading and writing Java code much more pleasant. For example, Java 7 introduced some relatively simple things like strings in switch statements, underscores in numeric literals (e.g. &lt;code&gt;1_000_000&lt;/code&gt; is easier to read and see the magnitude than just &lt;code&gt;1000000&lt;/code&gt;), and the try-with-resources statement. Java 8 went a lot further and introduced lambda expressions, the streams API, a new date/time API based on the Joda Time library, &lt;code&gt;Optional&lt;/code&gt;, and more.&lt;/p&gt;

&lt;p&gt;In this blog and in a few subsequent posts, I will take a simple snippet of code from a real project, and show what the code looked like originally and what it looked like after refactoring it to be more readable and maintainable. To start, this blog will actually tackle the try-with-resources statement introduced in Java 7. Many people even in 2016 still seem not to be aware of this statement, which not only makes the code less verbose, but also eliminates an entire class of errors resulting from failure to close I/O or other resources.&lt;/p&gt;

&lt;p&gt;Without further ado (whatever ado actually means), here is a method that was used to check port availability when starting up services.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;public boolean isPortAvailable(final int port) {
    ServerSocket serverSocket = null;
    DatagramSocket dataSocket = null;

    try {
        serverSocket = new ServerSocket(port);
        serverSocket.setReuseAddress(true);
        dataSocket = new DatagramSocket(port);
        dataSocket.setReuseAddress(true);
        return true;
    } catch (IOException e) {
        return false;
    } finally {
        if (dataSocket != null) {
            dataSocket.close();
        }

        if (serverSocket != null) {
            try {
                serverSocket.close();
            } catch (IOException e) {
                // ignored
            }
        }
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The core logic for the above code is pretty simple: open a &lt;code&gt;ServerSocket&lt;/code&gt; and a &lt;code&gt;DatagramSocket&lt;/code&gt;, and if both open without throwing an exception, the port is available. It&apos;s all the extra boilerplate code and exception handling that makes the code so lengthy and error-prone, because we need to make sure to close the sockets in the &lt;code&gt;finally&lt;/code&gt; block, being careful to first check they are not null. For good measure, the &lt;code&gt;ServerSocket#close&lt;/code&gt; method throws yet another &lt;code&gt;IOException&lt;/code&gt;, which we simply ignore but are required to catch nonetheless. All of that extra code obscures the simple logic at the core of the method.&lt;/p&gt;

&lt;p&gt;Here&apos;s the refactored version which makes use of the try-with-resources statement from Java 7.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;public boolean isPortAvailable(final int port) {
    try (ServerSocket serverSocket = new ServerSocket(port); 
         DatagramSocket dataSocket = new DatagramSocket(port)) {
        serverSocket.setReuseAddress(true);
        dataSocket.setReuseAddress(true);
        return true;
    } catch (IOException e) {
        return false;
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;As you can hopefully see, this code has the same core logic, but much less of the boilerplate. There is not only less code (7 lines instead of 22), but the code is much more readable since only the core logic remains. We are still catching the &lt;code&gt;IOException&lt;/code&gt; that can be thrown by the &lt;code&gt;ServerSocket&lt;/code&gt; and &lt;code&gt;DatagramSocket&lt;/code&gt; constructors, but we no longer need to deal with the routine closing of those socket resources. The try-with-resources statement does that task for us, automatically closing any resources opened in the declaration statement that immediately follows the &lt;code&gt;try&lt;/code&gt; keyword.&lt;/p&gt;
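&lt;p&gt;One detail worth knowing: when multiple resources are declared, they are closed in the &lt;em&gt;reverse&lt;/em&gt; of their declaration order, so above, &lt;code&gt;dataSocket&lt;/code&gt; closes before &lt;code&gt;serverSocket&lt;/code&gt;, just as in the original &lt;code&gt;finally&lt;/code&gt; block. A tiny sketch with a made-up recording resource demonstrates the order:&lt;/p&gt;

```java
import java.util.ArrayList;
import java.util.List;

public class CloseOrderDemo {
    static final List<String> closed = new ArrayList<>();

    // A trivial AutoCloseable that records when it is closed
    static class Resource implements AutoCloseable {
        final String name;
        Resource(String name) { this.name = name; }
        @Override public void close() { closed.add(name); }
    }

    public static void main(String[] args) {
        try (Resource first = new Resource("first");
             Resource second = new Resource("second")) {
            // both resources are open here
        }
        System.out.println(closed);  // [second, first]
    }
}
```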

&lt;p&gt;The one catch is that the declared resources must implement the &lt;code&gt;AutoCloseable&lt;/code&gt; interface (the older &lt;code&gt;Closeable&lt;/code&gt; interface extends it, so &lt;code&gt;Closeable&lt;/code&gt; resources work too). Since the Java APIs make extensive use of &lt;code&gt;Closeable&lt;/code&gt; and &lt;code&gt;AutoCloseable&lt;/code&gt;, most things you&apos;ll want to use can be handled via try-with-resources. Classes that don&apos;t implement &lt;code&gt;AutoCloseable&lt;/code&gt; cannot be used directly in try-with-resources statements. For example, if you are unfortunate enough to still need to deal with XML using the old-school &lt;code&gt;XMLStreamReader&lt;/code&gt;, you are out of luck since it doesn&apos;t implement &lt;code&gt;Closeable&lt;/code&gt; or &lt;code&gt;AutoCloseable&lt;/code&gt;. I generally fix those types of things by creating a small wrapper/decorator class, e.g. &lt;code&gt;CloseableXMLStreamReader&lt;/code&gt;, but sometimes it simply isn&apos;t worth the trouble unless you are using it in many different places.&lt;/p&gt;
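
&lt;p&gt;Here is a minimal sketch of that wrapper/decorator idea. The class name &lt;code&gt;CloseableXMLStreamReader&lt;/code&gt; comes from the text above, but the implementation details are my own illustration, not a published class:&lt;/p&gt;

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical decorator that makes XMLStreamReader usable in try-with-resources
public class CloseableXMLStreamReader implements AutoCloseable {

    private final XMLStreamReader delegate;

    public CloseableXMLStreamReader(XMLStreamReader delegate) {
        this.delegate = delegate;
    }

    public XMLStreamReader reader() {
        return delegate;
    }

    @Override
    public void close() throws XMLStreamException {
        delegate.close();
    }

    // Example usage: find the name of the root element in an XML string
    public static String rootElementName(String xml) throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        try (CloseableXMLStreamReader closeable = new CloseableXMLStreamReader(
                factory.createXMLStreamReader(new StringReader(xml)))) {
            XMLStreamReader reader = closeable.reader();
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    return reader.getLocalName();
                }
            }
            return null;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(rootElementName("<blog><entry/></blog>")); // blog
    }
}
```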

&lt;p&gt;For more information on try-with-resources, the Java tutorial on Oracle&apos;s website has a more in-depth article &lt;a href=&quot;http://docs.oracle.com/javase/tutorial/essential/exceptions/tryResourceClose.html&quot;&gt;here&lt;/a&gt;. In subsequent posts, I&apos;ll show some before/after code that makes use of Java 8 features such as the stream API and lambda expressions.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;This blog was originally published on the &lt;a href=&quot;http://www.fortitudetec.com&quot;&gt;Fortitude Technologies&lt;/a&gt; blog &lt;a href=&quot;http://www.fortitudetec.com/blogs/2016/8/8/java-try-with-resources&quot;&gt;here&lt;/a&gt;&lt;/em&gt;.&lt;/p&gt;
</description>          </item>
    <item>
    <guid isPermaLink="true">http://www.sleberknight.com/blog/sleberkn/entry/slides_for_restful_web_services</guid>
    <title>Slides for RESTful Web Services with Jersey presentation</title>
    <dc:creator>sleberkn</dc:creator>
    <link>http://www.sleberknight.com/blog/sleberkn/entry/slides_for_restful_web_services</link>
        <pubDate>Tue, 10 Jun 2014 10:50:00 +0000</pubDate>
    <category>Development</category>
    <category>java</category>
    <category>rest</category>
    <category>jersey</category>
            <description>&lt;p&gt;While teaching a course on web development which included Ruby on Rails and Java segments, we used &lt;a href=&quot;https://jersey.java.net&quot; title=&quot;Jersey&quot;&gt;Jersey&lt;/a&gt; to expose a simple web services which the Rails application consumed. I put together a presentation on Jersey that I recently gave. Here are the slides:&lt;/p&gt;

&lt;p&gt;&lt;iframe src=&quot;http://www.slideshare.net/slideshow/embed_code/35586008&quot; width=&quot;427&quot; height=&quot;356&quot; frameborder=&quot;0&quot; marginwidth=&quot;0&quot; marginheight=&quot;0&quot; scrolling=&quot;no&quot; style=&quot;border:1px solid #CCC; border-width:1px 1px 0; margin-bottom:5px; max-width: 100%;&quot; allowfullscreen&gt; &lt;/iframe&gt; &lt;div style=&quot;margin-bottom:5px&quot;&gt; &lt;strong&gt; &lt;a href=&quot;https://www.slideshare.net/scottleber/jersey-35586008&quot; title=&quot;RESTful Web Services with Jersey&quot; target=&quot;_blank&quot;&gt;RESTful Web Services with Jersey&lt;/a&gt; &lt;/strong&gt; from &lt;strong&gt;&lt;a href=&quot;http://www.slideshare.net/scottleber&quot; target=&quot;_blank&quot;&gt;Scott Leberknight&lt;/a&gt;&lt;/strong&gt; &lt;/div&gt;&lt;/p&gt;
</description>          </item>
    <item>
    <guid isPermaLink="true">http://www.sleberknight.com/blog/sleberkn/entry/slides_for_httpie_presentation</guid>
    <title>Slides for httpie presentation</title>
    <dc:creator>sleberkn</dc:creator>
    <link>http://www.sleberknight.com/blog/sleberkn/entry/slides_for_httpie_presentation</link>
        <pubDate>Mon, 9 Jun 2014 09:40:00 +0000</pubDate>
    <category>Development</category>
    <category>curl</category>
    <category>rest</category>
    <category>httpie</category>
            <description>&lt;p&gt;I&apos;ve used &lt;a href=&quot;http://curl.haxx.se&quot; title=&quot;cURL&quot;&gt;cURL&lt;/a&gt; for a long time but I can never seem to remember all the various flags and settings. Recently I came across &lt;a href=&quot;http://httpie.org&quot; title=&quot;httpie&quot;&gt;httpie&lt;/a&gt; which is a simple command line tool for accessing HTTP resources. Here are the presentation slides:&lt;/p&gt;

&lt;p&gt;&lt;iframe src=&quot;http://www.slideshare.net/slideshow/embed_code/35585955&quot; width=&quot;427&quot; height=&quot;356&quot; frameborder=&quot;0&quot; marginwidth=&quot;0&quot; marginheight=&quot;0&quot; scrolling=&quot;no&quot; style=&quot;border:1px solid #CCC; border-width:1px 1px 0; margin-bottom:5px; max-width: 100%;&quot; allowfullscreen&gt; &lt;/iframe&gt; &lt;div style=&quot;margin-bottom:5px&quot;&gt; &lt;strong&gt; &lt;a href=&quot;https://www.slideshare.net/scottleber/htt-pie-minitalk&quot; title=&quot;httpie&quot; target=&quot;_blank&quot;&gt;httpie&lt;/a&gt; &lt;/strong&gt; from &lt;strong&gt;&lt;a href=&quot;http://www.slideshare.net/scottleber&quot; target=&quot;_blank&quot;&gt;Scott Leberknight&lt;/a&gt;&lt;/strong&gt; &lt;/div&gt;&lt;/p&gt;
</description>          </item>
    <item>
    <guid isPermaLink="true">http://www.sleberknight.com/blog/sleberkn/entry/building_a_distributed_lock_revisited</guid>
    <title>Building a Distributed Lock Revisited: Using Curator&apos;s InterProcessMutex</title>
    <dc:creator>sleberkn</dc:creator>
    <link>http://www.sleberknight.com/blog/sleberkn/entry/building_a_distributed_lock_revisited</link>
        <pubDate>Mon, 30 Dec 2013 00:00:00 +0000</pubDate>
    <category>Development</category>
    <category>curator</category>
    <category>distributed-computing</category>
    <category>java</category>
    <category>zookeeper</category>
            <description>&lt;p&gt;Last summer I wrote a series of blogs introducing &lt;a href=&quot;http://zookeeper.apache.org/&quot;&gt;Apache ZooKeeper&lt;/a&gt;, which is a distributed coordination service used in many open source projects like &lt;a href=&quot;http://hadoop.apache.org&quot;&gt;Hadoop&lt;/a&gt;, &lt;a href=&quot;http://hbase.apache.org&quot;&gt;HBase&lt;/a&gt;, and &lt;a href=&quot;http://storm-project.net&quot;&gt;Storm&lt;/a&gt; to manage clusters of machines. The &lt;a href=&quot;http://www.sleberknight.com/blog/sleberkn/entry/distributed_coordination_with_zookeeper_part4&quot;&gt;fifth blog&lt;/a&gt; described how to use ZooKeeper to implement a distributed lock. In that blog I explained that the goals of a distributed lock are &quot;to build a mutually exclusive lock between processes that could be running on different machines, possibly even on different networks or different data centers&quot;. I also mentioned that one significant benefit is that &quot;clients know nothing about each other; they only know they need to use the lock to access some shared resource, and that they should not access it unless they own the lock.&quot; That blog described how to use the ZooKeeper &lt;code&gt;WriteLock&lt;/code&gt; &quot;recipe&quot; that comes with ZooKeeper in the contrib modules to build a synchronous &lt;code&gt;BlockingWriteLock&lt;/code&gt; with easier semantics in which you simply call a &lt;code&gt;lock()&lt;/code&gt; method to acquire the lock, and call &lt;code&gt;unlock()&lt;/code&gt; to release the lock. Earlier in the series, we learned how to connect to ZooKeeper in the &lt;a href=&quot;http://www.sleberknight.com/blog/sleberkn/entry/distributed_coordination_with_zookeeper_part2&quot;&gt;Group Membership Example blog&lt;/a&gt; using a &lt;code&gt;Watcher&lt;/code&gt; and a &lt;code&gt;CountDownLatch&lt;/code&gt; to block until the &lt;code&gt;SyncConnected&lt;/code&gt; event was received. 
All that code wasn&apos;t terribly complex, but it also was fairly low-level, especially if you include the need to block until a connection event is received and the non-trivial implementation of the &lt;code&gt;WriteLock&lt;/code&gt; recipe.&lt;/p&gt;

&lt;p&gt;In the &lt;a href=&quot;http://www.sleberknight.com/blog/sleberkn/entry/distributed_coordination_with_zookeeper_part5&quot;&gt;wrap-up blog&lt;/a&gt; I mentioned the &lt;a href=&quot;http://curator.apache.org/&quot;&gt;Curator&lt;/a&gt; project, originally open sourced by Netflix and later donated by them to Apache. The Curator wiki describes Curator as &quot;a set of Java libraries that make using Apache ZooKeeper much easier&quot;. In this blog we&apos;ll see how to use Curator to implement a distributed lock, without needing to write any of our own wrapper code for obtaining a connection or to implement the lock itself. In the distributed lock blog we saw how sequential ephemeral child nodes (e.g. &lt;code&gt;child-lock-node-0000000000&lt;/code&gt;, &lt;code&gt;child-lock-node-0000000001&lt;/code&gt;, &lt;code&gt;child-lock-node-0000000002&lt;/code&gt;, etc.) are created under a persistent parent lock node. The client holding the lock on the child with the lowest sequence number owns the lock. We saw several potential gotchas: first, how does a client know whether it successfully created a child node in the case of a partial failure, i.e. a (temporary) connection loss, and how does it know which child node it created, i.e. the child with which sequence number? I noted that a solution was to embed the ZooKeeper session ID in the child node such that the client can easily identify the child node it created. Jordan Zimmerman (the creator of Curator) was kind enough to post a comment to that blog noting that using the session ID is &quot;not ideal&quot; because it &quot;prevents the same ZK connection from being used in multiple threads for the same lock&quot;. He said &quot;It&apos;s much better to use a GUID. This is what Curator uses.&quot;&lt;/p&gt;

&lt;p&gt;Second, we noted that distributed lock clients should watch only the immediately preceding child node rather than the parent node in order to prevent a &quot;herd effect&quot; in which every client is notified for every single child node event, when in reality each client only needs to care about the child immediately preceding it. Curator handles both of these cases and adds other goodies such as a retry policy for connecting to ZooKeeper. So without further comment, let&apos;s see how to use a distributed lock in Curator.&lt;/p&gt;

&lt;p&gt;First, we&apos;ll need to get an instance of &lt;code&gt;CuratorFramework&lt;/code&gt; - this is an interface that represents a higher level abstraction API for working with ZooKeeper. It provides automatic connection management including retry operations, a fluent-style API, as well as a bunch of recipes you can use out-of-the-box for distributed data structures like locks, queues, leader election, etc. We can use the &lt;code&gt;CuratorFrameworkFactory&lt;/code&gt; and a &lt;code&gt;RetryPolicy&lt;/code&gt; of our choosing to get one.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;String hosts = &quot;host-1:2181,host-2:2181,host-3:2181&quot;;
int baseSleepTimeMillis = 1000;
int maxRetries = 3;

RetryPolicy retryPolicy = new ExponentialBackoffRetry(baseSleepTimeMillis, maxRetries);
CuratorFramework client = CuratorFrameworkFactory.newClient(hosts, retryPolicy);
client.start();
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;In the above code we first create a retry policy - in this case an &lt;code&gt;ExponentialBackoffRetry&lt;/code&gt; using a base sleep time of 1000 milliseconds and up to 3 retries. Then we can use &lt;code&gt;CuratorFrameworkFactory.newClient()&lt;/code&gt; to obtain an instance of &lt;code&gt;CuratorFramework&lt;/code&gt;. Finally we need to call &lt;code&gt;start()&lt;/code&gt; (note we&apos;ll need to call &lt;code&gt;close()&lt;/code&gt; when we&apos;re done with the client). Now that we have a client instance, we can use an implementation of &lt;code&gt;InterProcessLock&lt;/code&gt; to create our distributed lock. The simplest one is &lt;code&gt;InterProcessMutex&lt;/code&gt;, which is a &lt;em&gt;re-entrant&lt;/em&gt; mutual exclusion lock that works across JVMs by using ZooKeeper to hold the lock.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;InterProcessLock lock = new InterProcessMutex(client, lockPath);
lock.acquire();
try {
  // do work while we hold the lock
} catch (Exception ex) {
  // handle exceptions as appropriate
} finally {
  lock.release();
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The above code simply creates an &lt;code&gt;InterProcessMutex&lt;/code&gt; for a specific lock path (&lt;code&gt;lockPath&lt;/code&gt;), acquires the lock, does some work, and releases the lock. In this case &lt;code&gt;acquire()&lt;/code&gt; will block until the lock becomes available. In many cases blocking indefinitely won&apos;t be a Good Thing, and Curator provides an overloaded version of &lt;code&gt;acquire()&lt;/code&gt; which requires a maximum time to wait for the lock and returns &lt;code&gt;true&lt;/code&gt; if the lock is obtained within the time limit and &lt;code&gt;false&lt;/code&gt; otherwise.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;InterProcessLock lock = new InterProcessMutex(client, lockPath);
if (lock.acquire(waitTimeSeconds, TimeUnit.SECONDS)) {
  try {
    // do work while we hold the lock
  } catch (Exception ex) {
    // handle exceptions as appropriate
  } finally {
    lock.release();
  }
} else {
  // we timed out waiting for lock, handle appropriately
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The above code demonstrates using the timeout version of &lt;code&gt;acquire&lt;/code&gt;. The code is slightly more complex since you need to check whether the lock is acquired or whether we timed out waiting for the lock. Regardless of which version of &lt;code&gt;acquire()&lt;/code&gt; you use, you&apos;ll need to &lt;code&gt;release()&lt;/code&gt; the lock in a &lt;code&gt;finally&lt;/code&gt; block. The final piece is to remember to close the client when you&apos;re done with it:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;client.close();
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;And that&apos;s pretty much it for using Curator&apos;s &lt;code&gt;InterProcessMutex&lt;/code&gt; to implement a distributed lock. All the complexity in handling connection management, partial failures, the &quot;herd effect&quot;, automatic retries, and so on is handled by the higher level Curator APIs. To paraphrase &lt;a href=&quot;https://www.google.com/#q=stu%20halloway&quot;&gt;Stu Halloway&lt;/a&gt;, you should always understand at least one layer beneath the one you&apos;re working at - in this case you should have a decent understanding of how ZooKeeper works under the covers and some of the potential issues of distributed computing. But having said that, go ahead and use Curator to work at a higher level of abstraction and gain the benefits of all the distributed computing experience at Netflix as well as Yahoo (which created ZooKeeper). And last, Happy New Year 2014!&lt;/p&gt;
</description>          </item>
    <item>
    <guid isPermaLink="true">http://www.sleberknight.com/blog/sleberkn/entry/handling_big_data_with_hbase5</guid>
    <title>Handling Big Data with HBase Part 6: Wrap-up</title>
    <dc:creator>sleberkn</dc:creator>
    <link>http://www.sleberknight.com/blog/sleberkn/entry/handling_big_data_with_hbase5</link>
        <pubDate>Fri, 20 Dec 2013 00:00:00 +0000</pubDate>
    <category>Development</category>
    <category>java</category>
    <category>hadoop</category>
    <category>distributed-computing</category>
    <category>hbase</category>
            <description>&lt;p&gt;&lt;em&gt;This is the sixth and final blog in an introduction to &lt;a href=&quot;http://hbase.apache.org/&quot;&gt;Apache HBase&lt;/a&gt;. In the &lt;a href=&quot;http://www.sleberknight.com/blog/sleberkn/entry/handling_big_data_with_hbase4&quot;&gt;fifth part&lt;/a&gt;, we learned the basics of schema design in HBase and several techniques you can use to make scanning and filtering data more efficient. We also saw two different basic design schemes (&quot;wide&quot; and &quot;tall&quot;) for storing information about the same entity, and briefly touched upon more advanced topics like adding full-text search and secondary indexes. In this part, we&apos;ll wrap up by summarizing the main points and then listing the (many) things we didn&apos;t cover in this introduction to HBase series.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;HBase is a distributed database providing inherent scalability, performance, and fault-tolerance across potentially massive clusters of commodity servers. It provides the means to store and efficiently scan large swaths of data. We&apos;ve looked at the HBase shell for basic interaction, covered the high-level HBase architecture and looked at using the Java API to create, get, scan, and delete data. We also considered how to design tables and row keys for efficient data access.&lt;/p&gt;

&lt;p&gt;One thing you certainly noticed when working with the HBase Java API is that it is much lower level than other data APIs you might be used to working with, for example JDBC or JPA. You get the basics of CRUD plus scanning data, and that&apos;s about it. In addition, you work directly with byte arrays which is about as low-level as it gets when you&apos;re trying to retrieve information from a datastore.&lt;/p&gt;

&lt;p&gt;If you are considering whether to use HBase, you should really think hard about how large the data is, i.e. does your app need to be able to accommodate ever-growing volumes of data? If it does, then you need to think carefully about what that data looks like and what the most likely data access patterns will be, as this will drive your schema design and data access patterns. For example, if you are designing a schema for a weather collection project, you will want to consider using a &quot;tall&quot; schema design such that the sensor readings for each sensor are split across rows as opposed to a &quot;wide&quot; design in which you keep adding columns to a column family in a single row. Unlike relational models in which you work hard to normalize data and then use SQL as a flexible way to join the data in various ways, with HBase you need to think much more up-front about the data access patterns, because retrieval by row key and table scans are the only two ways to access data. In other words, there is no joining across multiple HBase tables and projecting out the columns you need. When you retrieve data, you want to only ask HBase for the exact data you need.&lt;/p&gt;

&lt;h1&gt;Things We Didn&apos;t Cover&lt;/h1&gt;

&lt;p&gt;Now let&apos;s discuss a few things we didn&apos;t cover. First, coprocessors were a major addition to HBase in version 0.92, and were inspired by Google adding coprocessors to its Bigtable data store. You can, at a high level, think of coprocessors like triggers or stored procedures in relational databases. Basically you can have either trigger-like functionality via observers, or stored-procedure functionality via RPC endpoints. This allows many new things to be accomplished in an elegant fashion, for example maintaining secondary indexes via observing changes to data.&lt;/p&gt;

&lt;p&gt;We showed basic API usage, but there is more advanced usage possible with the API. For example, you can batch data and provide much more advanced filtering behavior than a simple paging filter like we showed. There is also the concept of &lt;em&gt;counters&lt;/em&gt;, which allows you to do atomic increments of numbers without requiring the client to perform explicit row locking. And if you&apos;re not really into Java, there are external APIs available via Thrift and REST gateways. There&apos;s even a C/C++ client available, and there are DSLs for Groovy, Jython, and Scala. These are all discussed on the HBase wiki.&lt;/p&gt;

&lt;p&gt;Cluster setup and configuration was not covered at all, nor was performance tuning. Obviously these are hugely important topics and the references below are good starting places. With HBase you not only need to worry about tuning HBase configuration, but also tuning Hadoop (or more specifically, the HDFS file system). For these topics definitely start with the HBase References Guide and also check out HBase: The Definitive Guide by Lars George.&lt;/p&gt;

&lt;p&gt;We also didn&apos;t cover how to Map/Reduce with HBase. Essentially you can use Hadoop&apos;s Map/Reduce framework to access HBase tables and perform tasks like aggregation in a Map/Reduce-style.&lt;/p&gt;

&lt;p&gt;Last there is security (which I suppose should be expected to come last for a developer, right?) in HBase. There are two types of security I&apos;m referring to here: first is access to HBase itself in order to create, read, update, and delete data, e.g. via requiring Kerberos authentication to connect to HBase. The second type of security is ACL-based access restrictions. As of this writing, HBase lets you restrict access via ACLs at the table and column family level. However, &lt;a href=&quot;https://communities.intel.com/community/datastack/blog/2013/10/29/hbase-cell-security&quot;&gt;HBase Cell Security&lt;/a&gt; describes how cell-level security features similar to those in &lt;a href=&quot;http://accumulo.apache.org&quot;&gt;Apache Accumulo&lt;/a&gt; are being added to HBase (tracked in &lt;a href=&quot;https://issues.apache.org/jira/browse/HBASE-8496&quot;&gt;this issue&lt;/a&gt;) and are scheduled for release in version 0.98 (the current version as of this writing is 0.96).&lt;/p&gt;

&lt;h1&gt;Goodbye!&lt;/h1&gt;

&lt;p&gt;With this background, you can now consider whether HBase makes sense on future projects with Big Data and high scalability requirements. I hope you found this series of posts useful as an introduction to HBase.&lt;/p&gt;

&lt;h1&gt;References&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;HBase web site, &lt;a href=&quot;http://hbase.apache.org/&quot;&gt;http://hbase.apache.org/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;HBase wiki, &lt;a href=&quot;http://wiki.apache.org/hadoop/Hbase&quot;&gt;http://wiki.apache.org/hadoop/Hbase&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;HBase Reference Guide &lt;a href=&quot;http://hbase.apache.org/book/book.html&quot;&gt;http://hbase.apache.org/book/book.html&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;HBase: The Definitive Guide, &lt;a href=&quot;http://bit.ly/hbase-definitive-guide&quot;&gt;http://bit.ly/hbase-definitive-guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Google Bigtable Paper, &lt;a href=&quot;http://labs.google.com/papers/bigtable.html&quot;&gt;http://labs.google.com/papers/bigtable.html&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Hadoop web site, &lt;a href=&quot;http://hadoop.apache.org/&quot;&gt;http://hadoop.apache.org/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Hadoop: The Definitive Guide, &lt;a href=&quot;http://bit.ly/hadoop-definitive-guide&quot;&gt;http://bit.ly/hadoop-definitive-guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Fallacies of Distributed Computing, &lt;a href=&quot;http://en.wikipedia.org/wiki/Fallacies_of_Distributed_Computing&quot;&gt;http://en.wikipedia.org/wiki/Fallacies_of_Distributed_Computing&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;HBase lightning talk slides, &lt;a href=&quot;http://www.slideshare.net/scottleber/hbase-lightningtalk&quot;&gt;http://www.slideshare.net/scottleber/hbase-lightningtalk&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Sample code, &lt;a href=&quot;https://github.com/sleberknight/basic-hbase-examples&quot;&gt;https://github.com/sleberknight/basic-hbase-examples&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description>          </item>
    <item>
    <guid isPermaLink="true">http://www.sleberknight.com/blog/sleberkn/entry/handling_big_data_with_hbase4</guid>
    <title>Handling Big Data with HBase Part 5: Data Modeling (or, Life without SQL)</title>
    <dc:creator>sleberkn</dc:creator>
    <link>http://www.sleberknight.com/blog/sleberkn/entry/handling_big_data_with_hbase4</link>
        <pubDate>Wed, 18 Dec 2013 00:00:00 +0000</pubDate>
    <category>Development</category>
    <category>distributed-computing</category>
    <category>hbase</category>
    <category>hadoop</category>
    <category>java</category>
            <description>&lt;p&gt;&lt;em&gt;This is the fifth of a series of blogs introducing &lt;a href=&quot;http://hbase.apache.org/&quot;&gt;Apache HBase&lt;/a&gt;. In the &lt;a href=&quot;http://www.sleberknight.com/blog/sleberkn/entry/handling_big_data_with_hbase3&quot;&gt;fourth part&lt;/a&gt;, we saw the basics of using the Java API to interact with HBase to create tables, retrieve data by row key, and do table scans. This part will discuss how to design schemas in HBase.&lt;/em&gt; &lt;/p&gt;

&lt;p&gt;HBase has nothing similar to a rich query capability like SQL from relational databases. Instead, it forgoes this capability and others like relationships, joins, etc. to focus on providing scalability with good performance and fault-tolerance. So when working with HBase you need to design the row keys and table structure in terms of rows and column families to match the data access patterns of your application. This is the complete opposite of what you do with relational databases, where you start out with a normalized database schema and separate tables, and then use SQL to perform joins to combine data in the ways you need. With HBase you design your tables specific to how they will be accessed by applications, so you need to think much more up-front about how data is accessed. You are much closer to the bare metal with HBase than with relational databases, which abstract implementation details and storage mechanisms. However, for applications needing to store massive amounts of data and have inherent scalability, performance characteristics and tolerance to server failures, the potential benefits can far outweigh the costs.&lt;/p&gt;

&lt;p&gt;In the &lt;a href=&quot;http://www.sleberknight.com/blog/sleberkn/entry/handling_big_data_with_hbase3&quot;&gt;last part on the Java API&lt;/a&gt;, I mentioned that when scanning data in HBase, the row key is critical since it is the primary means to restrict the rows scanned; there is no rich query language like SQL as in relational databases. Typically you create a scan using start and stop row keys and optionally add filters to further restrict the rows and columns data returned. In order to have some flexibility when scanning, the row key should be designed to contain the information you need to find specific subsets of data. In the blog and people examples we&apos;ve seen so far, the row keys were designed to allow scanning via the most common data access patterns. For the blogs, the row keys were simply the posting date. This would permit scans in ascending order of blog entries, which is probably not the most common way to view blogs; you&apos;d rather see the most recent blogs first. So a better row key design would be to use a reverse order timestamp, which you can get using the formula &lt;code&gt;(Long.MAX_VALUE - timestamp)&lt;/code&gt;, so scans return the most recent blog posts first. This makes it easy to scan specific time ranges, for example to show all blogs in the past week or month, which is a typical way to navigate blog entries in web applications.&lt;/p&gt;
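
&lt;p&gt;The reverse-order timestamp trick can be sketched in a couple of lines (the timestamp values below are just illustrative):&lt;/p&gt;

```java
public class ReverseTimestampDemo {

    // Row key component: later timestamps produce smaller values, so the most
    // recent entries sort first in HBase's ascending row key order
    public static long reverseTimestamp(long timestamp) {
        return Long.MAX_VALUE - timestamp;
    }

    public static void main(String[] args) {
        long older = 1386979200000L; // an earlier posting time (illustrative)
        long newer = 1387238400000L; // a later posting time (illustrative)

        // The newer post has the smaller key component, so it scans first
        System.out.println(reverseTimestamp(newer) < reverseTimestamp(older)); // true
    }
}
```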

&lt;p&gt;For the &lt;code&gt;people&lt;/code&gt; table examples, we used a composite row key composed of last name, first name, middle initial, and a (unique) person identifier to distinguish people with the exact same name, separated by dashes. For example, Brian M. Smith with identifier 12345 would have row key &lt;code&gt;smith-brian-m-12345&lt;/code&gt;. Scans for the &lt;code&gt;people&lt;/code&gt; table can then be composed using start and end rows designed to retrieve people with specific last names, last names starting with specific letter combinations, or people with the same last name and first name initial. For example, if you wanted to find people whose first name begins with &lt;code&gt;B&lt;/code&gt; and last name is &lt;code&gt;Smith&lt;/code&gt; you could use the start row key &lt;code&gt;smith-b&lt;/code&gt; and stop row key &lt;code&gt;smith-c&lt;/code&gt; (the start row key is inclusive while the stop row key is exclusive, so the stop key &lt;code&gt;smith-c&lt;/code&gt; ensures all Smiths with first name starting with the letter &quot;B&quot; are included). You can see that HBase supports the notion of partial keys, meaning you do not need to know the exact key, which provides more flexibility when creating scans. You can combine partial key scans with filters to retrieve only the specific data needed, thus optimizing data retrieval for the data access patterns specific to your application.&lt;/p&gt;
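
&lt;p&gt;Since HBase row keys sort lexicographically by bytes, the partial-key scan above can be sketched with plain string comparisons. The &lt;code&gt;rowKey&lt;/code&gt; and &lt;code&gt;inScanRange&lt;/code&gt; helpers are hypothetical illustrations, but they mirror HBase&apos;s inclusive-start/exclusive-stop scan semantics:&lt;/p&gt;

```java
public class PeopleRowKeyDemo {

    // Hypothetical helper building the composite row key described in the text,
    // e.g. rowKey("Smith", "Brian", "M", "12345") -> "smith-brian-m-12345"
    public static String rowKey(String last, String first, String middle, String id) {
        return (last + "-" + first + "-" + middle + "-" + id).toLowerCase();
    }

    // A row is returned by a scan if startKey <= row < stopKey (lexicographically):
    // the start key is inclusive and the stop key is exclusive
    public static boolean inScanRange(String row, String startKey, String stopKey) {
        return row.compareTo(startKey) >= 0 && row.compareTo(stopKey) < 0;
    }

    public static void main(String[] args) {
        String brian = rowKey("Smith", "Brian", "M", "12345");
        String alice = rowKey("Smith", "Alice", "K", "67890");

        // Partial-key scan: all Smiths whose first name starts with "b"
        System.out.println(inScanRange(brian, "smith-b", "smith-c")); // true
        System.out.println(inScanRange(alice, "smith-b", "smith-c")); // false
    }
}
```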

&lt;p&gt;So far the examples have involved only single tables containing one type of information and no related information. HBase does not have foreign key relationships like in relational databases, but because it supports rows having up to millions of columns, one way to design tables in HBase is to encapsulate related information in the same row - a &quot;wide&quot; table design. It is called a &quot;wide&quot; design since you are storing all information related to a row together in as many columns as there are data items. In our blog example, you might want to store comments for each blog. The &quot;wide&quot; way to design this would be to include a column family named &lt;code&gt;comments&lt;/code&gt; and then add columns to the &lt;code&gt;comments&lt;/code&gt; family where the qualifiers are the comment timestamps; the comment columns would look like &lt;code&gt;comments:20130704142510&lt;/code&gt; and &lt;code&gt;comments:20130707163045&lt;/code&gt;. Even better, when HBase retrieves columns it returns them in sorted order, just like row keys. So in order to display a blog entry and its comments, you can retrieve all the data from one row by asking for the &lt;code&gt;content&lt;/code&gt;, &lt;code&gt;info&lt;/code&gt;, and &lt;code&gt;comments&lt;/code&gt; column families. You could also add a filter to retrieve only a specific number of comments, adding pagination to them.&lt;/p&gt;
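
&lt;p&gt;Because columns within a family come back sorted by qualifier, a &lt;code&gt;TreeMap&lt;/code&gt; makes a reasonable stand-in to illustrate the ordering of the timestamp-based comment qualifiers (the qualifiers come from the example above; the &lt;code&gt;TreeMap&lt;/code&gt; simulation and comment text are mine):&lt;/p&gt;

```java
import java.util.TreeMap;

public class CommentsColumnDemo {

    // Simulates a row's "comments" column family: HBase returns column
    // qualifiers in sorted (byte-lexicographic) order, just like row keys,
    // so timestamp-based qualifiers come back oldest-first regardless of
    // insertion order
    public static String firstQualifier() {
        TreeMap<String, String> comments = new TreeMap<String, String>();
        comments.put("comments:20130707163045", "Great follow-up!");
        comments.put("comments:20130704142510", "Nice post!");
        return comments.firstKey();
    }

    public static void main(String[] args) {
        System.out.println(firstQualifier()); // comments:20130704142510
    }
}
```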

&lt;p&gt;The &lt;code&gt;people&lt;/code&gt; table column families could also be redesigned to store contact information such as separate addresses, phone numbers, and email addresses in column families allowing all of a person&apos;s information to be stored in one row. This kind of design can work well if the number of columns is relatively modest, as blog comments and a person&apos;s contact information would be. If instead you are modeling something like an email inbox, financial transactions, or massive amounts of automatically collected sensor data, you might choose instead to spread a user&apos;s emails, transactions, or sensor readings across multiple rows (a &quot;tall&quot; design) and design the row keys to allow efficient scanning and pagination. For an inbox the row key might look like &lt;code&gt;&amp;lt;user_id&amp;gt;-&amp;lt;reversed_email_timestamp&amp;gt;&lt;/code&gt; which would permit easily scanning and paginating a user&apos;s inbox, while for financial transactions the row key might be &lt;code&gt;&amp;lt;user_id&amp;gt;-&amp;lt;reversed_transaction_timestamp&amp;gt;&lt;/code&gt;. This kind of design can be called &quot;tall&quot; since you are spreading information about the same thing (e.g. readings from the same sensor, transactions in an account) across multiple rows, and is something to consider if there will be an ever-expanding amount of information, as would be the case in a scenario involving data collection from a huge network of sensors.&lt;/p&gt;

&lt;p&gt;Designing row keys and table structures is a key part of working with HBase, and will continue to be, given the fundamental architecture of HBase. There are other things you can do to add alternative schemes for data access within HBase. For example, you could implement full-text searching via Apache Lucene either within rows or external to HBase (search Google for HBASE-3529). You can also create (and maintain) secondary indexes to permit alternate row key schemes for tables; for example in our &lt;code&gt;people&lt;/code&gt; table the composite row key consists of the name and a unique identifier. But if we desire to access people by their birth date, telephone area code, email address, or any number of other ways, we could add secondary indexes to enable that form of interaction. Note, however, that adding secondary indexes is not something to be taken lightly; every time you write to the &quot;main&quot; table (e.g. &lt;code&gt;people&lt;/code&gt;) you will need to also update all the secondary indexes! (Yes, this is something that relational databases do very well, but remember that HBase is designed to accommodate a lot more data than traditional RDBMSs were.)&lt;/p&gt;

&lt;h1&gt;Conclusion to Part 5&lt;/h1&gt;

&lt;p&gt;In this part of the series, we got an introduction to schema design in HBase (without relations or SQL). Even though HBase lacks some features found in traditional RDBMS systems, such as foreign keys and referential integrity, multi-row transactions, multiple indexes, and so on, many applications that need HBase&apos;s inherent strengths, such as massive scale, can benefit from using it. As with anything complex, there are tradeoffs to be made. In the case of HBase, you give up some richness in schema design and query flexibility, but you gain the ability to scale to massive amounts of data by (more or less) simply adding servers to your cluster.&lt;/p&gt;

&lt;p&gt;In the next and last part of this series, we&apos;ll wrap up and mention a few (of the many) things we didn&apos;t cover in these introductory blogs.&lt;/p&gt;

&lt;h1&gt;References&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;HBase web site, &lt;a href=&quot;http://hbase.apache.org/&quot;&gt;http://hbase.apache.org/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;HBase wiki, &lt;a href=&quot;http://wiki.apache.org/hadoop/Hbase&quot;&gt;http://wiki.apache.org/hadoop/Hbase&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;HBase Reference Guide, &lt;a href=&quot;http://hbase.apache.org/book/book.html&quot;&gt;http://hbase.apache.org/book/book.html&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;HBase: The Definitive Guide, &lt;a href=&quot;http://bit.ly/hbase-definitive-guide&quot;&gt;http://bit.ly/hbase-definitive-guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Google Bigtable Paper, &lt;a href=&quot;http://labs.google.com/papers/bigtable.html&quot;&gt;http://labs.google.com/papers/bigtable.html&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Hadoop web site, &lt;a href=&quot;http://hadoop.apache.org/&quot;&gt;http://hadoop.apache.org/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Hadoop: The Definitive Guide, &lt;a href=&quot;http://bit.ly/hadoop-definitive-guide&quot;&gt;http://bit.ly/hadoop-definitive-guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Fallacies of Distributed Computing, &lt;a href=&quot;http://en.wikipedia.org/wiki/Fallacies_of_Distributed_Computing&quot;&gt;http://en.wikipedia.org/wiki/Fallacies_of_Distributed_Computing&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;HBase lightning talk slides, &lt;a href=&quot;http://www.slideshare.net/scottleber/hbase-lightningtalk&quot;&gt;http://www.slideshare.net/scottleber/hbase-lightningtalk&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Sample code, &lt;a href=&quot;https://github.com/sleberknight/basic-hbase-examples&quot;&gt;https://github.com/sleberknight/basic-hbase-examples&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description>          </item>
    <item>
    <guid isPermaLink="true">http://www.sleberknight.com/blog/sleberkn/entry/handling_big_data_with_hbase3</guid>
    <title>Handling Big Data with HBase Part 4: The Java API</title>
    <dc:creator>sleberkn</dc:creator>
    <link>http://www.sleberknight.com/blog/sleberkn/entry/handling_big_data_with_hbase3</link>
        <pubDate>Mon, 16 Dec 2013 00:00:00 +0000</pubDate>
    <category>Development</category>
    <category>distributed-computing</category>
    <category>hadoop</category>
    <category>java</category>
    <category>hbase</category>
            <description>&lt;p&gt;&lt;em&gt;This is the fourth of an introductory series of blogs on &lt;a href=&quot;http://hbase.apache.org/&quot;&gt;Apache HBase&lt;/a&gt;. In the &lt;a href=&quot;http://www.sleberknight.com/blog/sleberkn/entry/handling_big_data_with_hbase2&quot;&gt;third part&lt;/a&gt;, we saw a high level view of HBase architecture. In this part, we&apos;ll use the HBase Java API to create tables, insert new data, and retrieve data by row key. We&apos;ll also see how to set up a basic table scan which restricts the columns retrieved and uses a filter to page the results.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Having just learned about HBase&apos;s high-level architecture, let&apos;s now look at the Java client API, since it is how your applications interact with HBase. As mentioned earlier, you can also interact with HBase via several flavors of RPC technologies such as Apache Thrift, plus a REST gateway, but we&apos;re going to concentrate on the native Java API. The client API provides both DDL (data definition language) and DML (data manipulation language) semantics, very much like what you find in SQL for relational databases. Suppose we are going to store information about people in HBase and want to start by creating a new table. The following listing shows how to create a new table using the &lt;code&gt;HBaseAdmin&lt;/code&gt; class.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;Configuration conf = HBaseConfiguration.create();
HBaseAdmin admin = new HBaseAdmin(conf);
HTableDescriptor tableDescriptor = new HTableDescriptor(TableName.valueOf(&quot;people&quot;));
tableDescriptor.addFamily(new HColumnDescriptor(&quot;personal&quot;));
tableDescriptor.addFamily(new HColumnDescriptor(&quot;contactinfo&quot;));
tableDescriptor.addFamily(new HColumnDescriptor(&quot;creditcard&quot;));
admin.createTable(tableDescriptor);
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The &lt;code&gt;people&lt;/code&gt; table defined in preceding listing contains three column families: &lt;code&gt;personal&lt;/code&gt;, &lt;code&gt;contactinfo&lt;/code&gt;, and &lt;code&gt;creditcard&lt;/code&gt;. To create a table you create an &lt;code&gt;HTableDescriptor&lt;/code&gt; and add one or more column families by adding &lt;code&gt;HColumnDescriptor&lt;/code&gt; objects. You then call &lt;code&gt;createTable&lt;/code&gt; to create the table. Now we have a table, so let&apos;s add some data. The next listing shows how to use the &lt;code&gt;Put&lt;/code&gt; class to insert data on John Doe, specifically his name and email address (omitting proper error handling for brevity).&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;Configuration conf = HBaseConfiguration.create();
HTable table = new HTable(conf, &quot;people&quot;);
Put put = new Put(Bytes.toBytes(&quot;doe-john-m-12345&quot;));
put.add(Bytes.toBytes(&quot;personal&quot;), Bytes.toBytes(&quot;givenName&quot;), Bytes.toBytes(&quot;John&quot;));
put.add(Bytes.toBytes(&quot;personal&quot;), Bytes.toBytes(&quot;mi&quot;), Bytes.toBytes(&quot;M&quot;));
put.add(Bytes.toBytes(&quot;personal&quot;), Bytes.toBytes(&quot;surname&quot;), Bytes.toBytes(&quot;Doe&quot;));
put.add(Bytes.toBytes(&quot;contactinfo&quot;), Bytes.toBytes(&quot;email&quot;), Bytes.toBytes(&quot;john.m.doe@gmail.com&quot;));
table.put(put);
table.flushCommits();
table.close();
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;In the above listing we instantiate a &lt;code&gt;Put&lt;/code&gt;, providing the unique row key to the constructor. We then add values, each of which must include the column family, column qualifier, and value, all as &lt;em&gt;byte arrays&lt;/em&gt;. As you probably noticed, the HBase API&apos;s utility &lt;code&gt;Bytes&lt;/code&gt; class is used a lot; it provides methods to convert to and from &lt;code&gt;byte[]&lt;/code&gt; for primitive types and strings. (A static import for the &lt;code&gt;toBytes()&lt;/code&gt; method would cut out a lot of boilerplate.) We then put the data into the table, flush the commits to ensure locally buffered changes take effect, and finally close the table. Updating data is done via the &lt;code&gt;Put&lt;/code&gt; class in exactly the same manner as just shown. Unlike relational databases, in which updates must write entire rows even if only one column changed, in HBase if you only need to update a single column then that is all you specify in the &lt;code&gt;Put&lt;/code&gt; and HBase will update only that column. There is also a &lt;code&gt;checkAndPut&lt;/code&gt; operation, essentially a form of optimistic concurrency control: the operation puts the new data only if the current values are what the client says they should be.&lt;/p&gt;
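&lt;p&gt;To give a feel for what the &lt;code&gt;Bytes&lt;/code&gt; utility is doing, here is a rough standard-library sketch (not HBase&apos;s actual implementation, though the behavior is similar): strings become UTF-8 byte arrays, and primitives become fixed-width byte arrays:&lt;/p&gt;

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class BytesSketch {
    // Standard-library approximations of HBase's Bytes.toBytes overloads:
    // strings encode to UTF-8, longs to 8 big-endian bytes.
    static byte[] toBytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static byte[] toBytes(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }

    public static void main(String[] args) {
        System.out.println(toBytes("doe-john-m-12345").length); // 16 UTF-8 bytes
        System.out.println(toBytes(42L).length);                // always 8 bytes
    }
}
```

&lt;p&gt;Since every cell in HBase is just bytes, conversions like these (in both directions) appear throughout client code, which is why a static import of &lt;code&gt;toBytes()&lt;/code&gt; pays off.&lt;/p&gt;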

&lt;p&gt;Retrieving the row we just created is accomplished using the &lt;code&gt;Get&lt;/code&gt; class, as shown in the next listing. (From this point forward, listings will omit the boilerplate code to create a configuration, instantiate the &lt;code&gt;HTable&lt;/code&gt;, and the flush and close calls.)&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;Get get = new Get(Bytes.toBytes(&quot;doe-john-m-12345&quot;));
get.addFamily(Bytes.toBytes(&quot;personal&quot;));
get.setMaxVersions(3);
Result result = table.get(get);
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The code in the previous listing instantiates a &lt;code&gt;Get&lt;/code&gt; instance supplying the row key we want to find. Next we use &lt;code&gt;addFamily&lt;/code&gt; to instruct HBase that we only need data from the &lt;code&gt;personal&lt;/code&gt; column family, which also cuts down the amount of work HBase must do when reading information from disk. We also specify that we&apos;d like up to three versions of each column in our result, perhaps so we can list historical values of each column. Finally, calling &lt;code&gt;get&lt;/code&gt; returns a &lt;code&gt;Result&lt;/code&gt; instance which can then be used to inspect all the column values returned.&lt;/p&gt;

&lt;p&gt;In many cases you need to find more than one row. HBase lets you do this by scanning rows, as shown in the &lt;a href=&quot;http://www.sleberknight.com/blog/sleberkn/entry/handling_big_data_with_hbase1&quot;&gt;second part&lt;/a&gt;, which demonstrated scanning in an HBase shell session. The corresponding class is the &lt;code&gt;Scan&lt;/code&gt; class. You can specify various options, such as the start and stop row keys, which columns and column families to include, and the maximum number of versions to retrieve. You can also add filters, which allow you to implement custom filtering logic to further restrict which rows and columns are returned. A common use case for filters is pagination. For example, we might want to scan through all people whose last name is Smith one page (e.g. 25 people) at a time. The next listing shows how to perform a basic scan.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;Scan scan = new Scan(Bytes.toBytes(&quot;smith-&quot;));
scan.addColumn(Bytes.toBytes(&quot;personal&quot;), Bytes.toBytes(&quot;givenName&quot;));
scan.addColumn(Bytes.toBytes(&quot;contactinfo&quot;), Bytes.toBytes(&quot;email&quot;));
scan.setFilter(new PageFilter(25));
ResultScanner scanner = table.getScanner(scan);
for (Result result : scanner) {
    // ...
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;In the above listing we create a new &lt;code&gt;Scan&lt;/code&gt; that starts from the row key &lt;code&gt;smith-&lt;/code&gt; and we then use &lt;code&gt;addColumn&lt;/code&gt; to restrict the columns returned (thus reducing the amount of disk transfer HBase must perform) to &lt;code&gt;personal:givenName&lt;/code&gt; and &lt;code&gt;contactinfo:email&lt;/code&gt;. A &lt;code&gt;PageFilter&lt;/code&gt; is set on the scan to limit the number of rows scanned to 25. (An alternative to using the page filter would be to specify a stop row key when constructing the &lt;code&gt;Scan&lt;/code&gt;.) We then get a &lt;code&gt;ResultScanner&lt;/code&gt; for the &lt;code&gt;Scan&lt;/code&gt; just created, and loop through the results performing whatever actions are necessary. Since the only method in HBase to retrieve multiple rows of data is scanning by sorted row keys, how you design the row key values is very important. We&apos;ll come back to this topic later.&lt;/p&gt;
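&lt;p&gt;One detail worth knowing when paging with &lt;code&gt;PageFilter&lt;/code&gt;: to fetch the next page you typically start a new scan just after the last row you saw. Since start rows are inclusive, a common trick (sketched below in plain Java) is to append a zero byte to the last row key, producing the smallest key that sorts strictly after it:&lt;/p&gt;

```java
import java.util.Arrays;

public class ScanPaging {
    // Sketch of the usual "next page" trick when paging scans: the smallest
    // possible row key strictly after lastRow is lastRow with a single zero
    // byte appended, so the next Scan starts there.
    static byte[] nextStartRow(byte[] lastRow) {
        byte[] next = Arrays.copyOf(lastRow, lastRow.length + 1);
        next[lastRow.length] = 0; // trailing 0x00 byte
        return next;
    }

    public static void main(String[] args) {
        byte[] last = "smith-anne-44".getBytes();
        byte[] next = nextStartRow(last);
        System.out.println(next.length - last.length); // exactly one byte longer
    }
}
```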

&lt;p&gt;You can also delete data in HBase using the &lt;code&gt;Delete&lt;/code&gt; class, the analogue of the &lt;code&gt;Put&lt;/code&gt; class. With it you can delete all columns in a row (thus deleting the row itself), delete a column family, delete individual columns, or some combination of those.&lt;/p&gt;

&lt;h1&gt;Connection Handling&lt;/h1&gt;

&lt;p&gt;In the above examples not much attention was paid to connection handling and RPCs (remote procedure calls). HBase provides the &lt;code&gt;HConnection&lt;/code&gt; class, which offers functionality similar to a connection pool for sharing connections; for example, you use its &lt;code&gt;getTable()&lt;/code&gt; method to get a reference to an &lt;code&gt;HTable&lt;/code&gt; instance. Instances of &lt;code&gt;HConnection&lt;/code&gt; are obtained from the &lt;code&gt;HConnectionManager&lt;/code&gt; class. Just as with avoiding network round trips in web applications, effectively managing the number of RPCs and the amount of data returned is important, and something to consider when writing HBase applications.&lt;/p&gt;

&lt;h1&gt;Conclusion to Part 4&lt;/h1&gt;

&lt;p&gt;In this part we used the HBase Java API to create a &lt;code&gt;people&lt;/code&gt; table, insert a new person, and retrieve the newly inserted person&apos;s information. We also used the &lt;code&gt;Scan&lt;/code&gt; class to scan the &lt;code&gt;people&lt;/code&gt; table for people with last name &quot;Smith&quot;, showed how to restrict the data retrieved, and finally used a filter to limit the number of results.&lt;/p&gt;

&lt;p&gt;In the next part, we&apos;ll learn how to deal with the absence of SQL and relations when modeling schemas in HBase.&lt;/p&gt;

&lt;h1&gt;References&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;HBase web site, &lt;a href=&quot;http://hbase.apache.org/&quot;&gt;http://hbase.apache.org/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;HBase wiki, &lt;a href=&quot;http://wiki.apache.org/hadoop/Hbase&quot;&gt;http://wiki.apache.org/hadoop/Hbase&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;HBase Reference Guide, &lt;a href=&quot;http://hbase.apache.org/book/book.html&quot;&gt;http://hbase.apache.org/book/book.html&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;HBase: The Definitive Guide, &lt;a href=&quot;http://bit.ly/hbase-definitive-guide&quot;&gt;http://bit.ly/hbase-definitive-guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Google Bigtable Paper, &lt;a href=&quot;http://labs.google.com/papers/bigtable.html&quot;&gt;http://labs.google.com/papers/bigtable.html&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Hadoop web site, &lt;a href=&quot;http://hadoop.apache.org/&quot;&gt;http://hadoop.apache.org/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Hadoop: The Definitive Guide, &lt;a href=&quot;http://bit.ly/hadoop-definitive-guide&quot;&gt;http://bit.ly/hadoop-definitive-guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Fallacies of Distributed Computing, &lt;a href=&quot;http://en.wikipedia.org/wiki/Fallacies_of_Distributed_Computing&quot;&gt;http://en.wikipedia.org/wiki/Fallacies_of_Distributed_Computing&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;HBase lightning talk slides, &lt;a href=&quot;http://www.slideshare.net/scottleber/hbase-lightningtalk&quot;&gt;http://www.slideshare.net/scottleber/hbase-lightningtalk&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Sample code, &lt;a href=&quot;https://github.com/sleberknight/basic-hbase-examples&quot;&gt;https://github.com/sleberknight/basic-hbase-examples&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description>          </item>
    <item>
    <guid isPermaLink="true">http://www.sleberknight.com/blog/sleberkn/entry/handling_big_data_with_hbase2</guid>
    <title>Handling Big Data with HBase Part 3: Architecture Overview</title>
    <dc:creator>sleberkn</dc:creator>
    <link>http://www.sleberknight.com/blog/sleberkn/entry/handling_big_data_with_hbase2</link>
        <pubDate>Fri, 13 Dec 2013 00:00:00 +0000</pubDate>
    <category>Development</category>
    <category>distributed-computing</category>
    <category>java</category>
    <category>hadoop</category>
    <category>hbase</category>
            <description>&lt;p&gt;&lt;em&gt;This is the third blog in a series of introductory blogs on &lt;a href=&quot;http://hbase.apache.org/&quot;&gt;Apache HBase&lt;/a&gt;. In the &lt;a href=&quot;http://www.sleberknight.com/blog/sleberkn/entry/handling_big_data_with_hbase1&quot;&gt;second part&lt;/a&gt;, we saw how to interact with HBase via the shell. In this part, we&apos;ll look at the HBase architecture from a bird&apos;s eye view.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;HBase is a &lt;em&gt;distributed&lt;/em&gt; database, meaning it is designed to run on a cluster of dozens to possibly thousands of servers or more. As a result it is more complicated to install than a single RDBMS running on a single server, and all the typical problems of distributed computing come into play, such as coordination and management of remote processes, locking, data distribution, network latency, and the number of round trips between servers. Fortunately HBase makes use of several mature technologies, such as Apache Hadoop and Apache ZooKeeper, to solve many of these issues. The figure below shows the major architectural components in HBase.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;http://www.sleberknight.com/blog/sleberkn/resource/hbase-architecture.png&quot; alt=&quot;HBase Architecture&quot; title=&quot;HBase Architecture&quot; width=&quot;800&quot;/&gt;&lt;/p&gt;

&lt;p&gt;In the above figure you can see there is a single HBase master node and multiple region servers. (Note that it is possible to run HBase in a multiple master setup, in which there is a single active master.) HBase tables are partitioned into multiple regions with each region storing a range of the table&apos;s rows, and multiple regions are assigned by the master to a region server.&lt;/p&gt;

&lt;p&gt;HBase is a &lt;em&gt;column-oriented&lt;/em&gt; data store, meaning it stores data by columns rather than by rows. This makes certain data access patterns much less expensive than with traditional row-oriented relational database systems. For example, in HBase if there is no data for a given column family, it simply does not store anything at all; contrast this with a relational database which must store &lt;code&gt;null&lt;/code&gt; values explicitly. In addition, when retrieving data in HBase, you should only ask for the specific column families you need; because there can literally be millions of columns in a given row, you need to make sure you ask only for the data you actually need.&lt;/p&gt;

&lt;p&gt;HBase utilizes &lt;a href=&quot;http://www.sleberknight.com/blog/sleberkn/entry/distributed_coordination_with_zookeeper_part&quot;&gt;ZooKeeper&lt;/a&gt; (a distributed coordination service) to manage region assignments to region servers, and to recover from region server crashes by loading the crashed region server&apos;s regions onto other functioning region servers.&lt;/p&gt;

&lt;p&gt;Regions contain an in-memory data store (MemStore) and a persistent data store (HFile), and all regions on a region server share a reference to the write-ahead log (WAL) which is used to store new data that hasn&apos;t yet been persisted to permanent storage and to recover from region server crashes. Each region holds a specific range of row keys, and when a region exceeds a configurable size, HBase automatically splits the region into two child regions, which is the key to scaling HBase.&lt;/p&gt;

&lt;p&gt;As a table grows, more and more regions are created and spread across the entire cluster. When clients request a specific row key or scan a range of row keys, HBase tells them the regions on which those keys exist, and the clients then communicate directly with the region servers where those regions exist. This design minimizes the number of disk seeks required to find any given row, and optimizes HBase toward disk transfer when returning data. This is in contrast to relational databases, which might need to do a large number of disk seeks before transferring data from disk, even with indexes.&lt;/p&gt;
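&lt;p&gt;Conceptually, mapping a row key to the region that holds it is a &quot;floor&quot; lookup over the sorted region start keys. The following plain-Java sketch (illustrative only, not the actual HBase client code) shows the idea:&lt;/p&gt;

```java
import java.util.TreeMap;

public class RegionLookup {
    public static void main(String[] args) {
        // Sketch: regions cover sorted, contiguous row-key ranges; mapping a
        // row key to its region is a floor lookup on the region start keys.
        // (A raw TreeMap keeps the sketch free of generics.)
        TreeMap regions = new TreeMap();
        regions.put("", "region-1");   // keys from "" up to "h"
        regions.put("h", "region-2");  // keys from "h" up to "q"
        regions.put("q", "region-3");  // keys from "q" onward
        System.out.println(regions.floorEntry("doe-john-m-12345").getValue());
        System.out.println(regions.floorEntry("smith-anne-44").getValue());
    }
}
```

&lt;p&gt;Clients cache this mapping, which is why they can talk directly to the right region server without going through the master on every request.&lt;/p&gt;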

&lt;p&gt;The HDFS component is the Hadoop Distributed Filesystem, a distributed, fault-tolerant and scalable filesystem which guards against data loss by dividing files into blocks and spreading them across the cluster; it is where HBase actually stores data. Strictly speaking the persistent storage can be anything that implements the Hadoop &lt;code&gt;FileSystem&lt;/code&gt; API, but usually HBase is deployed onto Hadoop clusters running HDFS. In fact, when you first download and install HBase on a single machine, it uses the local filesystem until you change the configuration!&lt;/p&gt;

&lt;p&gt;Clients interact with HBase via one of several available APIs, including a native Java API as well as a REST-based interface and several RPC interfaces (Apache Thrift, Apache Avro). There are also DSLs for working with HBase from Groovy, Jython, and Scala.&lt;/p&gt;

&lt;h1&gt;Conclusion to Part 3&lt;/h1&gt;

&lt;p&gt;In this part, we got a pretty high level view of HBase architecture. In the next part, we&apos;ll dive into some real code and show the basics of working with HBase via its native Java API.&lt;/p&gt;

&lt;h1&gt;References&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;HBase web site, &lt;a href=&quot;http://hbase.apache.org/&quot;&gt;http://hbase.apache.org/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;HBase wiki, &lt;a href=&quot;http://wiki.apache.org/hadoop/Hbase&quot;&gt;http://wiki.apache.org/hadoop/Hbase&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;HBase Reference Guide, &lt;a href=&quot;http://hbase.apache.org/book/book.html&quot;&gt;http://hbase.apache.org/book/book.html&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;HBase: The Definitive Guide, &lt;a href=&quot;http://bit.ly/hbase-definitive-guide&quot;&gt;http://bit.ly/hbase-definitive-guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Google Bigtable Paper, &lt;a href=&quot;http://labs.google.com/papers/bigtable.html&quot;&gt;http://labs.google.com/papers/bigtable.html&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Hadoop web site, &lt;a href=&quot;http://hadoop.apache.org/&quot;&gt;http://hadoop.apache.org/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Hadoop: The Definitive Guide, &lt;a href=&quot;http://bit.ly/hadoop-definitive-guide&quot;&gt;http://bit.ly/hadoop-definitive-guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Fallacies of Distributed Computing, &lt;a href=&quot;http://en.wikipedia.org/wiki/Fallacies_of_Distributed_Computing&quot;&gt;http://en.wikipedia.org/wiki/Fallacies_of_Distributed_Computing&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;HBase lightning talk slides, &lt;a href=&quot;http://www.slideshare.net/scottleber/hbase-lightningtalk&quot;&gt;http://www.slideshare.net/scottleber/hbase-lightningtalk&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Sample code, &lt;a href=&quot;https://github.com/sleberknight/basic-hbase-examples&quot;&gt;https://github.com/sleberknight/basic-hbase-examples&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description>          </item>
    <item>
    <guid isPermaLink="true">http://www.sleberknight.com/blog/sleberkn/entry/handling_big_data_with_hbase1</guid>
    <title>Handling Big Data with HBase Part 2: First Steps</title>
    <dc:creator>sleberkn</dc:creator>
    <link>http://www.sleberknight.com/blog/sleberkn/entry/handling_big_data_with_hbase1</link>
        <pubDate>Thu, 12 Dec 2013 00:00:00 +0000</pubDate>
    <category>Development</category>
    <category>java</category>
    <category>hbase</category>
    <category>distributed-computing</category>
    <category>hadoop</category>
            <description>&lt;p&gt;&lt;em&gt;This is the second in a series of blogs that introduce &lt;a href=&quot;http://hbase.apache.org/&quot;&gt;Apache HBase&lt;/a&gt;. In the &lt;a href=&quot;http://www.sleberknight.com/blog/sleberkn/entry/handling_big_data_with_hbase&quot;&gt;first blog&lt;/a&gt;, we introduced HBase at a high level. In this part, we&apos;ll see how to interact with HBase via its command line shell.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Let&apos;s take a look at what working with HBase is like at the command line. HBase comes with a JRuby-based shell that lets you define and manage tables, execute CRUD operations on data, scan tables, and perform maintenance, among other things. When you&apos;re in the shell, just type &lt;code&gt;help&lt;/code&gt; to get an overall help page. You can get help on specific commands or groups of commands as well, using syntax like &lt;code&gt;help &amp;lt;group&amp;gt;&lt;/code&gt; and &lt;code&gt;help &amp;lt;command&amp;gt;&lt;/code&gt;. For example, &lt;code&gt;help &apos;create&apos;&lt;/code&gt; provides help on creating new tables. While HBase is deployed in production on clusters of servers, you can download it and get a standalone installation up and running in literally minutes. The first thing to do is fire up the HBase shell. The following listing shows a shell session in which we create a &lt;code&gt;blog&lt;/code&gt; table, list the available tables in HBase, add a blog entry, retrieve that entry, and scan the &lt;code&gt;blog&lt;/code&gt; table.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ bin/hbase shell
HBase Shell; enter &apos;help&amp;lt;RETURN&amp;gt;&apos; for list of supported commands.
Type &quot;exit&amp;lt;RETURN&amp;gt;&quot; to leave the HBase Shell
Version 0.96.0-hadoop2, r1531434, Fri Oct 11 15:28:08 PDT 2013

hbase(main):001:0&amp;gt; create &apos;blog&apos;, &apos;info&apos;, &apos;content&apos;
0 row(s) in 6.0670 seconds

=&amp;gt; Hbase::Table - blog

hbase(main):002:0&amp;gt; list
TABLE
blog
fakenames
my-table
3 row(s) in 0.0300 seconds

=&amp;gt; [&quot;blog&quot;, &quot;fakenames&quot;, &quot;my-table&quot;]

hbase(main):003:0&amp;gt; put &apos;blog&apos;, &apos;20130320162535&apos;, &apos;info:title&apos;, &apos;Why use HBase?&apos;
0 row(s) in 0.0650 seconds

hbase(main):004:0&amp;gt; put &apos;blog&apos;, &apos;20130320162535&apos;, &apos;info:author&apos;, &apos;Jane Doe&apos;
0 row(s) in 0.0230 seconds

hbase(main):005:0&amp;gt; put &apos;blog&apos;, &apos;20130320162535&apos;, &apos;info:category&apos;, &apos;Persistence&apos;
0 row(s) in 0.0230 seconds

hbase(main):006:0&amp;gt; put &apos;blog&apos;, &apos;20130320162535&apos;, &apos;content:&apos;, &apos;HBase is a column-oriented...&apos;
0 row(s) in 0.0220 seconds

hbase(main):007:0&amp;gt; get &apos;blog&apos;, &apos;20130320162535&apos;
COLUMN             CELL
 content:          timestamp=1386556660599, value=HBase is a column-oriented...
 info:author       timestamp=1386556649116, value=Jane Doe
 info:category     timestamp=1386556655032, value=Persistence
 info:title        timestamp=1386556643256, value=Why use HBase?
4 row(s) in 0.0380 seconds

hbase(main):008:0&amp;gt; scan &apos;blog&apos;, { STARTROW =&amp;gt; &apos;20130300&apos;, STOPROW =&amp;gt; &apos;20130400&apos; }
ROW                COLUMN+CELL
 20130320162535    column=content:, timestamp=1386556660599, value=HBase is a column-oriented...
 20130320162535    column=info:author, timestamp=1386556649116, value=Jane Doe
 20130320162535    column=info:category, timestamp=1386556655032, value=Persistence
 20130320162535    column=info:title, timestamp=1386556643256, value=Why use HBase?
1 row(s) in 0.0390 seconds
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;In the above listing we first create the &lt;code&gt;blog&lt;/code&gt; table having column families &lt;code&gt;info&lt;/code&gt; and &lt;code&gt;content&lt;/code&gt;. After listing the tables and seeing our new &lt;code&gt;blog&lt;/code&gt; table, we put some data in the table. The &lt;code&gt;put&lt;/code&gt; commands specify the table, the unique row key, the column key composed of the column family and a qualifier, and the value. For example, &lt;code&gt;info&lt;/code&gt; is the column family while &lt;code&gt;title&lt;/code&gt; and &lt;code&gt;author&lt;/code&gt; are qualifiers and so &lt;code&gt;info:title&lt;/code&gt; specifies the column &lt;code&gt;title&lt;/code&gt; in the &lt;code&gt;info&lt;/code&gt; family with value &quot;Why use HBase?&quot;. The &lt;code&gt;info:title&lt;/code&gt; is also referred to as a column key. Next we use the &lt;code&gt;get&lt;/code&gt; command to retrieve a single row and finally the &lt;code&gt;scan&lt;/code&gt; command to perform a scan over rows in the &lt;code&gt;blog&lt;/code&gt; table for a specific range of row keys. As you might have guessed, by specifying start row &lt;code&gt;20130300&lt;/code&gt; (inclusive) and end row &lt;code&gt;20130400&lt;/code&gt; (exclusive) we retrieve all rows whose row key falls within that range; in this &lt;code&gt;blog&lt;/code&gt; example this equates to all blog entries in March 2013 since the row keys are the time when an entry was published.&lt;/p&gt;
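&lt;p&gt;The start/stop row semantics are plain lexicographic comparison, which the following small Java sketch makes explicit (&lt;code&gt;STARTROW&lt;/code&gt; inclusive, &lt;code&gt;STOPROW&lt;/code&gt; exclusive):&lt;/p&gt;

```java
public class KeyRange {
    // A scan returns rows in [startRow, stopRow): the start row is inclusive
    // and the stop row is exclusive, using lexicographic ordering.
    static boolean inRange(String key, String start, String stop) {
        boolean atOrAfterStart = key.compareTo(start) >= 0;
        boolean beforeStop = stop.compareTo(key) > 0;
        if (atOrAfterStart) {
            return beforeStop;
        }
        return false;
    }

    public static void main(String[] args) {
        // The March 2013 entry falls inside [20130300, 20130400).
        System.out.println(inRange("20130320162535", "20130300", "20130400"));
        // An exact match on the stop row is excluded.
        System.out.println(inRange("20130400", "20130300", "20130400"));
    }
}
```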

&lt;p&gt;An important characteristic of HBase is that you define column families up front, but can then add any number of columns within a family, each identified by its column qualifier. HBase is optimized to store the columns of a family together on disk, allowing for more efficient storage since columns that don&apos;t exist don&apos;t take up any space, unlike in an RDBMS where null values must actually be stored. Rows are defined by the columns they contain; if there are no columns then the row, logically, does not exist. Continuing the above example in the following listing, we delete some specific columns from a row.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;hbase(main):009:0&amp;gt;  delete &apos;blog&apos;, &apos;20130320162535&apos;, &apos;info:category&apos;
0 row(s) in 0.0490 seconds

hbase(main):010:0&amp;gt; get &apos;blog&apos;, &apos;20130320162535&apos;
COLUMN             CELL
 content:          timestamp=1386556660599, value=HBase is a column-oriented...
 info:author       timestamp=1386556649116, value=Jane Doe
 info:title        timestamp=1386556643256, value=Why use HBase?
3 row(s) in 0.0260 seconds
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;As shown just above, you can delete a specific column from a table as we deleted the &lt;code&gt;info:category&lt;/code&gt; column. You can also delete all columns within a row and thereby delete the row using the &lt;code&gt;deleteall&lt;/code&gt; shell command. To update column values, you simply use the &lt;code&gt;put&lt;/code&gt; command again. By default HBase retains up to three versions of a column value, so if you &lt;code&gt;put&lt;/code&gt; a new value into &lt;code&gt;info:title&lt;/code&gt;, HBase will retain both the old and new version.&lt;/p&gt;

&lt;p&gt;The commands issued in the above examples show how to create, read, update, and delete data in HBase. Data retrieval comes in only two flavors: retrieving a row using &lt;code&gt;get&lt;/code&gt; and retrieving multiple rows via &lt;code&gt;scan&lt;/code&gt;. When retrieving data in HBase you should take care to retrieve only the information you actually require. Since HBase retrieves data from each column family separately, if you only need data for one column family, then you can specify to retrieve only that bit of information. In the next listing we retrieve only the blog titles for a specific row key range that equate to March through April 2013.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;hbase(main):011:0&amp;gt; scan &apos;blog&apos;, { STARTROW =&amp;gt; &apos;20130300&apos;, STOPROW =&amp;gt; &apos;20130500&apos;, COLUMNS =&amp;gt; &apos;info:title&apos; }
ROW                COLUMN+CELL
 20130320162535    column=info:title, timestamp=1386556643256, value=Why use HBase?
1 row(s) in 0.0290 seconds
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;So by setting row key ranges, restricting the columns we need, and restricting the number of versions to retrieve, you can optimize data access patterns in HBase. Of course in the above examples, all this is done from the shell, but you can do the same things, and much more, using the HBase APIs.&lt;/p&gt;

&lt;h1&gt;Conclusion to Part 2&lt;/h1&gt;

&lt;p&gt;In this second part of the HBase introductory series, we saw how to use the shell to create tables, insert data, retrieve data by row key, and saw a basic scan of data via row key range. You also saw how you can delete a specific column from a table row.&lt;/p&gt;

&lt;p&gt;In the next blog, we&apos;ll get an overview of HBase&apos;s high level architecture.&lt;/p&gt;

&lt;h1&gt;References&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;HBase web site, &lt;a href=&quot;http://hbase.apache.org/&quot;&gt;http://hbase.apache.org/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;HBase wiki, &lt;a href=&quot;http://wiki.apache.org/hadoop/Hbase&quot;&gt;http://wiki.apache.org/hadoop/Hbase&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;HBase Reference Guide, &lt;a href=&quot;http://hbase.apache.org/book/book.html&quot;&gt;http://hbase.apache.org/book/book.html&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;HBase: The Definitive Guide, &lt;a href=&quot;http://bit.ly/hbase-definitive-guide&quot;&gt;http://bit.ly/hbase-definitive-guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Google Bigtable Paper, &lt;a href=&quot;http://labs.google.com/papers/bigtable.html&quot;&gt;http://labs.google.com/papers/bigtable.html&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Hadoop web site, &lt;a href=&quot;http://hadoop.apache.org/&quot;&gt;http://hadoop.apache.org/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Hadoop: The Definitive Guide, &lt;a href=&quot;http://bit.ly/hadoop-definitive-guide&quot;&gt;http://bit.ly/hadoop-definitive-guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Fallacies of Distributed Computing, &lt;a href=&quot;http://en.wikipedia.org/wiki/Fallacies_of_Distributed_Computing&quot;&gt;http://en.wikipedia.org/wiki/Fallacies_of_Distributed_Computing&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;HBase lightning talk slides, &lt;a href=&quot;http://www.slideshare.net/scottleber/hbase-lightningtalk&quot;&gt;http://www.slideshare.net/scottleber/hbase-lightningtalk&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Sample code, &lt;a href=&quot;https://github.com/sleberknight/basic-hbase-examples&quot;&gt;https://github.com/sleberknight/basic-hbase-examples&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>          </item>
    <item>
    <guid isPermaLink="true">http://www.sleberknight.com/blog/sleberkn/entry/handling_big_data_with_hbase</guid>
    <title>Handling Big Data with HBase Part 1: Introduction</title>
    <dc:creator>sleberkn</dc:creator>
    <link>http://www.sleberknight.com/blog/sleberkn/entry/handling_big_data_with_hbase</link>
        <pubDate>Tue, 10 Dec 2013 00:00:00 +0000</pubDate>
    <category>Development</category>
    <category>java</category>
    <category>distributed-computing</category>
    <category>hbase</category>
    <category>hadoop</category>
            <description>&lt;p&gt;&lt;em&gt;This is the first in a series of blogs that will introduce &lt;a href=&quot;http://hbase.apache.org/&quot;&gt;Apache HBase&lt;/a&gt;. This blog provides a brief introduction to HBase. In later blogs you will see how the HBase shell can be used for quick and dirty data access via the command line, learn about the high-level architecture of HBase, learn the basics of the Java API, and learn how to live without SQL when designing HBase schemas.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;In the past few years we have seen a veritable explosion in ways to store and retrieve data, with the so-called NoSQL databases leading the charge and creating many of these new persistence choices. These alternatives have become popular in large part due to the rise of Big Data, led by companies such as Google, Amazon, Twitter, and Facebook, which have amassed vast amounts of data that must be stored, queried, and analyzed. But more and more companies now collect massive amounts of data and need to use it effectively to fuel their business. For example, social networks need to analyze large social graphs of people and recommend whom to link to next, while almost every large website now has a recommendation engine that tries to suggest ever more things you might want to purchase. As these businesses collect more data, they need a way to scale up easily without rewriting entire systems.&lt;/p&gt;

&lt;p&gt;Since the 1970s, relational database management systems (RDBMS) have dominated the data landscape. But as businesses collect, store and process more and more data, relational databases are harder and harder to scale. At first you might go from a single server to a master/slave setup, and add caching layers in front of the database to relieve load as more and more reads/writes hit the database. When performance of queries begins to degrade, usually the first thing to be dropped is indexes, followed quickly by denormalization to avoid joins as they become more costly. Later you might start to precompute (or materialize) the most costly queries so that queries then effectively become key lookups and perhaps distribute data in huge tables across multiple database shards. At this point if you step back, many of the key benefits of RDBMSs have been lost &#8212; referential integrity, ACID transactions, indexes, and so on. Of course, the scenario just described presumes you become very successful, very fast and need to handle more data with continually increasing data ingestion rates. In other words, you need to be the next Twitter.&lt;/p&gt;

&lt;p&gt;Or do you? Maybe you are working on an environment monitoring project that will deploy a network of sensors around the world, and all these sensors will produce huge amounts of data. Or maybe you are working on DNA sequencing. If you know, or think, you are going to have massive data storage requirements, where the number of rows runs into the billions and the number of columns potentially into the millions, you should consider alternative databases such as HBase. These new databases are designed from the ground up to scale horizontally across clusters of commodity servers, as opposed to vertical scaling, where you buy the next larger server (until there is no bigger one to buy, anyway).&lt;/p&gt;

&lt;h1&gt;Enter HBase&lt;/h1&gt;

&lt;p&gt;HBase is a database that provides real-time, random read and write access to tables meant to store billions of rows and millions of columns. It is designed to run on a cluster of commodity servers and to automatically scale as more servers are added, while retaining the same performance. In addition, it is fault tolerant precisely because data is divided across servers in the cluster and stored in a redundant file system such as the Hadoop Distributed File System (HDFS). When (not if) servers fail, your data is safe, and the data is automatically re-balanced over the remaining servers until replacements are online. HBase is a strongly consistent data store; changes you make are immediately visible to all other clients.&lt;/p&gt;

&lt;p&gt;HBase is modeled after Google&apos;s Bigtable, which was described in a paper written by Google in 2006 as a &quot;sparse, distributed, persistent multi-dimensional sorted map.&quot; So if you are used to relational databases, then HBase will at first seem foreign. While it has the concept of tables, they are not like relational tables, nor does HBase support the typical RDBMS concepts of joins, indexes, ACID transactions, etc. But even though you give those features up, you automatically and transparently gain scalability and fault-tolerance. HBase can be described as a key-value store with automatic data versioning.&lt;/p&gt;

&lt;p&gt;You can CRUD (create, read, update, and delete) data just as you would expect. You can also perform &lt;em&gt;scans&lt;/em&gt; of HBase table rows, which are always stored in ascending order by row key and are returned in that order when you scan. Each row consists of a unique, sorted row key (think primary key in RDBMS terms) and an arbitrary number of columns, each column residing in a column family and having one or more versioned values. Values are simply byte arrays, and it&apos;s up to the application to transform these byte arrays as necessary when storing and displaying them. HBase does not attempt to hide this column-oriented data model from developers, and the Java APIs are decidedly lower-level than other persistence APIs you might have worked with. For example, JPA (the Java Persistence API) and even JDBC are much more abstracted than what you find in the HBase APIs. You are working with bare metal when dealing with HBase.&lt;/p&gt;
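&lt;p&gt;Because every cell value is just a byte array, encoding and decoding is the application&apos;s responsibility. Here is a small illustration using only the JDK (the HBase client provides a &lt;code&gt;Bytes&lt;/code&gt; utility class that plays this role for common types):&lt;/p&gt;

```java
import java.nio.charset.StandardCharsets;

public class ByteValues {

    // What actually lives in an HBase cell is an uninterpreted byte[];
    // these helpers mirror what Bytes.toBytes / Bytes.toString do for strings.
    static byte[] encode(String value) {
        return value.getBytes(StandardCharsets.UTF_8);
    }

    static String decode(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] cell = encode("Why use HBase?");  // the stored form
        System.out.println(decode(cell));         // prints Why use HBase?
    }
}
```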

&lt;h1&gt;Conclusion to Part 1&lt;/h1&gt;

&lt;p&gt;In this introductory blog we&apos;ve learned that HBase is a non-relational, strongly consistent, distributed key-value store with automatic data versioning. It is horizontally scalable by adding servers to a cluster, and it provides fault tolerance so data is not lost when (not if) servers fail. We&apos;ve also discussed a bit about how data is organized within HBase tables: each row has a unique row key, some number of column families, and an arbitrary number of columns within a family. In the next blog, we&apos;ll take first steps with HBase by showing interaction via the HBase shell.&lt;/p&gt;

&lt;h1&gt;References&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;HBase web site, &lt;a href=&quot;http://hbase.apache.org/&quot;&gt;http://hbase.apache.org/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;HBase wiki, &lt;a href=&quot;http://wiki.apache.org/hadoop/Hbase&quot;&gt;http://wiki.apache.org/hadoop/Hbase&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;HBase Reference Guide &lt;a href=&quot;http://hbase.apache.org/book/book.html&quot;&gt;http://hbase.apache.org/book/book.html&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;HBase: The Definitive Guide, &lt;a href=&quot;http://bit.ly/hbase-definitive-guide&quot;&gt;http://bit.ly/hbase-definitive-guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Google Bigtable Paper, &lt;a href=&quot;http://labs.google.com/papers/bigtable.html&quot;&gt;http://labs.google.com/papers/bigtable.html&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Hadoop web site, &lt;a href=&quot;http://hadoop.apache.org/&quot;&gt;http://hadoop.apache.org/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Hadoop: The Definitive Guide, &lt;a href=&quot;http://bit.ly/hadoop-definitive-guide&quot;&gt;http://bit.ly/hadoop-definitive-guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Fallacies of Distributed Computing, &lt;a href=&quot;http://en.wikipedia.org/wiki/Fallacies_of_Distributed_Computing&quot;&gt;http://en.wikipedia.org/wiki/Fallacies_of_Distributed_Computing&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;HBase lightning talk slides, &lt;a href=&quot;http://www.slideshare.net/scottleber/hbase-lightningtalk&quot;&gt;http://www.slideshare.net/scottleber/hbase-lightningtalk&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Sample code, &lt;a href=&quot;https://github.com/sleberknight/basic-hbase-examples&quot;&gt;https://github.com/sleberknight/basic-hbase-examples&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description>          </item>
    <item>
    <guid isPermaLink="true">http://www.sleberknight.com/blog/sleberkn/entry/distributed_coordination_with_zookeeper_part5</guid>
    <title>Distributed Coordination With ZooKeeper Part 6: Wrapping Up</title>
    <dc:creator>sleberkn</dc:creator>
    <link>http://www.sleberknight.com/blog/sleberkn/entry/distributed_coordination_with_zookeeper_part5</link>
        <pubDate>Tue, 16 Jul 2013 00:00:00 +0000</pubDate>
    <category>Development</category>
    <category>hadoop</category>
    <category>distributed-computing</category>
    <category>java</category>
    <category>zookeeper</category>
            <description>&lt;p&gt;&lt;em&gt;This is the sixth (and last) in a series of blogs that introduce &lt;a href=&quot;http://zookeeper.apache.org/&quot;&gt;Apache ZooKeeper&lt;/a&gt;. In the &lt;a href=&quot;http://www.sleberknight.com/blog/sleberkn/entry/distributed_coordination_with_zookeeper_part4&quot;&gt;fifth blog&lt;/a&gt;, we implemented a distributed lock, dealing with the issues of partial failure due to connection loss and the &quot;herd effect&quot; along the way.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;In this final blog in the series you&apos;ll learn a few tips for administering and tuning ZooKeeper, and we&apos;ll introduce the Curator and Exhibitor frameworks.&lt;/em&gt;&lt;/p&gt;

&lt;h1&gt;Administration and Tuning&lt;/h1&gt;

&lt;p&gt;As with any complex distributed system, &lt;a href=&quot;http://zookeeper.apache.org/&quot;&gt;Apache ZooKeeper&lt;/a&gt; provides administrators plenty of knobs to control its behavior. Several important properties include the &lt;code&gt;tickTime&lt;/code&gt; (the fundamental unit of time in ZooKeeper, measured in milliseconds); the &lt;code&gt;initLimit&lt;/code&gt;, which is the time in ticks to allow followers to initially connect and sync to the leader; the &lt;code&gt;syncLimit&lt;/code&gt;, which is the time in ticks a follower may lag behind the leader before being dropped from the ensemble; and the &lt;code&gt;dataDir&lt;/code&gt; and &lt;code&gt;dataLogDir&lt;/code&gt;, which are the directories where ZooKeeper stores the in-memory database snapshots and the transaction log, respectively.&lt;/p&gt;
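&lt;p&gt;Pulling those properties together, a minimal &lt;code&gt;zoo.cfg&lt;/code&gt; for a three-node ensemble might look like the following sketch (the hostnames and paths are placeholders, and the numeric values are common starting points rather than recommendations):&lt;/p&gt;

```
# fundamental time unit, in milliseconds
tickTime=2000
# ticks allowed for followers to connect and sync to the leader
initLimit=10
# ticks a follower may fall behind the leader before being dropped
syncLimit=5
# snapshot and transaction log directories (ideally on separate devices)
dataDir=/var/lib/zookeeper/data
dataLogDir=/var/lib/zookeeper/datalog
clientPort=2181
# ensemble members: server.N=host:quorumPort:leaderElectionPort
server.1=zk1.example.com:2888:3888
server.2=zk2.example.com:2888:3888
server.3=zk3.example.com:2888:3888
```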

&lt;p&gt;Next, we&apos;ll cover just a few things you will want to be aware of when running a ZooKeeper ensemble in production.&lt;/p&gt;

&lt;p&gt;First, when creating a ZooKeeper ensemble you should run each node on a &lt;em&gt;dedicated&lt;/em&gt; server, meaning the only thing the server does is run an instance of ZooKeeper. The main reason you want to do this is to avoid any contention with other processes for both network and disk I/O. If you run other I/O and/or CPU-intensive processes on the same machines you are running a ZooKeeper node, you will likely see connection timeouts and other issues due to contention. I&apos;ve seen this happen in production systems, and as soon as the ZooKeeper nodes were moved to their own dedicated machines, the connection loss problems disappeared.&lt;/p&gt;

&lt;p&gt;Second, start with a three node ensemble and monitor the usage of those machines, for example using Ganglia and Nagios, to determine if your ensemble needs additional machines. Remember also to maintain an &lt;em&gt;odd&lt;/em&gt; number of machines in the ensemble, so that there can be a majority when nodes commit write operations and when they need to vote for a new leader. Another really useful tool is &lt;a href=&quot;https://github.com/phunt/zktop&quot;&gt;zktop&lt;/a&gt;, which is very similar to the &lt;code&gt;top&lt;/code&gt; command on *nix systems. It is a simple, quick and dirty way to easily start monitoring your ensemble.&lt;/p&gt;

&lt;p&gt;Third, watch out for session timeouts and adjust the &lt;code&gt;tickTime&lt;/code&gt; appropriately; for example, if your network experiences heavy traffic you might increase the &lt;code&gt;tickTime&lt;/code&gt; to 5 seconds.&lt;/p&gt;

&lt;p&gt;The above three tips are by no means the end of the story when it comes to administering and tuning ZooKeeper. For more in-depth information on setting up, running, administering, and monitoring a ZooKeeper ensemble, see the &lt;a href=&quot;http://zookeeper.apache.org/doc/trunk/zookeeperAdmin.html&quot;&gt;ZooKeeper Administrator&apos;s Guide&lt;/a&gt; on the ZooKeeper web site. Another resource is Kathleen Ting&apos;s &lt;a href=&quot;http://www.infoq.com/presentations/Misconfiguration-ZooKeeper&quot;&gt;Building an Impenetrable ZooKeeper&lt;/a&gt; presentation, which I attended at Strange Loop 2012 and which provides many very useful tips for running a ZooKeeper ensemble.&lt;/p&gt;

&lt;h1&gt;Getting a Curator&lt;/h1&gt;

&lt;p&gt;So far we&apos;ve seen everything ZooKeeper provides out of the box. But when using ZooKeeper in production, you may quickly find that building recipes like distributed locks and other distributed data structures is harder than it looks, because you must be aware of the many different kinds of problems that can arise &#8212; recall the connection loss and herd effect issues when constructing the distributed lock. You need to know when you can safely handle an exception and retry an operation. For example, if an idempotent operation fails during an automatic client failover event, you can simply retry the operation. The raw ZooKeeper library does not do much exception handling for you, and you need to implement retry logic yourself.&lt;/p&gt;

&lt;p&gt;Helpfully, Netflix uses ZooKeeper and developed a framework named &lt;code&gt;Curator&lt;/code&gt;, which it open sourced and later donated to Apache. The &lt;a href=&quot;http://curator.incubator.apache.org/&quot;&gt;Curator&lt;/a&gt; wiki page describes it as &quot;a set of Java libraries that make using Apache ZooKeeper much easier&quot;. While ZooKeeper comes bundled with the &lt;code&gt;ZooKeeper&lt;/code&gt; Java client, using it to develop &lt;em&gt;correct&lt;/em&gt; distributed data structures can be difficult and makes the code much harder to understand, due to problems such as connection loss and the &quot;herd effect&quot; we saw in the previous blog.&lt;/p&gt;

&lt;p&gt;Once you have a good understanding of ZooKeeper basics, check out Curator. It provides a client that replaces (wraps) the &lt;code&gt;ZooKeeper&lt;/code&gt; class; a framework that contains a high-level API and improved connection and exception handling, along with built-in retry logic in the form of retry policies. Last, it provides a bunch of recipes that implement distributed data structures including locks, barriers, queues, and more. Curator even provides useful testing servers to run a single embedded ZooKeeper server or a test ensemble in unit tests.&lt;/p&gt;

&lt;p&gt;Even better, Netflix also created &lt;a href=&quot;http://curator.incubator.apache.org/exhibitor.html&quot;&gt;Exhibitor&lt;/a&gt;, which is a &quot;supervisor&quot; for your ZooKeeper ensemble. It provides features such as monitoring, backups, a web-based interface for znode exploration, and a RESTful API.&lt;/p&gt;

&lt;h1&gt;Conclusion&lt;/h1&gt;

&lt;p&gt;In this series of blogs you were introduced to ZooKeeper; took a test drive in the ZooKeeper shell; worked with ZooKeeper&apos;s Java API to build a group membership application as well as a distributed lock; and toured the architecture and implementation details of ZooKeeper. If nothing else, remember that ZooKeeper is like a filesystem, except distributed and replicated. It allows you to build distributed coordination and data structures, and it is highly available, reliable, and fast due to its leader/follower design with no single point of failure, in-memory reads, and writes funneled through the leader to maintain sequential consistency. Last, it provides clients with (mostly) transparent and automatic session failover in case of server failure. After becoming comfortable with ZooKeeper, be sure to have a look at the Apache Curator framework (created and donated by Netflix) and also the Exhibitor monitoring application.&lt;/p&gt;

&lt;h1&gt;References&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;Source code for these blogs, &lt;a href=&quot;https://github.com/sleberknight/zookeeper-samples&quot;&gt;https://github.com/sleberknight/zookeeper-samples&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Presentation on ZooKeeper, &lt;a href=&quot;http://www.slideshare.net/scottleber/apache-zookeeper&quot;&gt;http://www.slideshare.net/scottleber/apache-zookeeper&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;ZooKeeper web site, &lt;a href=&quot;http://zookeeper.apache.org/&quot;&gt;http://zookeeper.apache.org/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;ZooKeeper Administrator&apos;s Guide &lt;a href=&quot;http://zookeeper.apache.org/doc/current/zookeeperAdmin.html&quot;&gt;http://zookeeper.apache.org/doc/current/zookeeperAdmin.html&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Projects powered by ZooKeeper, &lt;a href=&quot;https://cwiki.apache.org/confluence/display/ZOOKEEPER/PoweredBy&quot;&gt;https://cwiki.apache.org/confluence/display/ZOOKEEPER/PoweredBy&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Fallacies of Distributed Computing, &lt;a href=&quot;http://en.wikipedia.org/wiki/Fallacies_of_Distributed_Computing&quot;&gt;http://en.wikipedia.org/wiki/Fallacies_of_Distributed_Computing&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Hadoop web site, &lt;a href=&quot;http://hadoop.apache.org/&quot;&gt;http://hadoop.apache.org/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Hadoop: The Definitive Guide, &lt;a href=&quot;http://bit.ly/hadoop-definitive-guide&quot;&gt;http://bit.ly/hadoop-definitive-guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Apache Blur (incubating) web site, &lt;a href=&quot;http://incubator.apache.org/blur/&quot;&gt;http://incubator.apache.org/blur/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Apache Curator, &lt;a href=&quot;http://curator.incubator.apache.org/&quot;&gt;http://curator.incubator.apache.org/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Netflix Exhibitor, &lt;a href=&quot;https://github.com/Netflix/exhibitor/wiki&quot;&gt;https://github.com/Netflix/exhibitor/wiki&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;zktop, &lt;a href=&quot;https://github.com/phunt/zktop&quot;&gt;https://github.com/phunt/zktop&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Building an Impenetrable ZooKeeper, &lt;a href=&quot;http://www.infoq.com/presentations/Misconfiguration-ZooKeeper&quot;&gt;http://www.infoq.com/presentations/Misconfiguration-ZooKeeper&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>          </item>
    <item>
    <guid isPermaLink="true">http://www.sleberknight.com/blog/sleberkn/entry/distributed_coordination_with_zookeeper_part4</guid>
    <title>Distributed Coordination With ZooKeeper Part 5: Building a Distributed Lock</title>
    <dc:creator>sleberkn</dc:creator>
    <link>http://www.sleberknight.com/blog/sleberkn/entry/distributed_coordination_with_zookeeper_part4</link>
        <pubDate>Thu, 11 Jul 2013 00:00:00 +0000</pubDate>
    <category>Development</category>
    <category>distributed-computing</category>
    <category>zookeeper</category>
    <category>java</category>
    <category>hadoop</category>
            <description>&lt;p&gt;&lt;em&gt;This is the fifth in a series of blogs that introduce &lt;a href=&quot;http://zookeeper.apache.org/&quot;&gt;Apache ZooKeeper&lt;/a&gt;. In the &lt;a href=&quot;http://www.sleberknight.com/blog/sleberkn/entry/distributed_coordination_with_zookeeper_part3&quot;&gt;fourth blog&lt;/a&gt;, you saw a high-level view of ZooKeeper&apos;s architecture and data consistency guarantees. In this blog, we&apos;ll use all the knowledge we&apos;ve gained thus far to implement a distributed lock.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;You&apos;ve now seen how to interact with &lt;a href=&quot;http://zookeeper.apache.org/&quot;&gt;Apache ZooKeeper&lt;/a&gt; and learned about its architecture and consistency model. Let&apos;s now use that knowledge to build a distributed lock. The goals are to build a mutually exclusive lock between processes that could be running on different machines, possibly even on different networks or different data centers. This also has the benefit that clients know nothing about each other; they only know they need to use the lock to access some shared resource, and that they should not access it unless they own the lock.&lt;/p&gt;

&lt;p&gt;To build the lock, we&apos;ll create a persistent znode that will serve as the parent. Clients wishing to obtain the lock will create sequential, ephemeral child znodes under the parent znode. The lock is owned by the client process whose child znode has the lowest sequence number. In Figure 2, there are three children of the &lt;code&gt;lock-node&lt;/code&gt; and &lt;code&gt;child-1&lt;/code&gt; owns the lock at this point in time, since it has the lowest sequence number. After &lt;code&gt;child-1&lt;/code&gt; is removed, the lock is relinquished and then the client who owns &lt;code&gt;child-2&lt;/code&gt; owns the lock, and so on.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 2 - Parent lock znode and child znodes&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;http://www.sleberknight.com/blog/sleberkn/resource/dist-lock-nodes-small.png&quot; alt=&quot;Distributed Lock Nodes&quot; title=&quot;Distributed Lock Node&quot; width=&quot;800&quot; /&gt;&lt;/p&gt;

&lt;p&gt;The algorithm for clients to determine whether they own the lock is straightforward, on the surface anyway. A client creates a new sequential, ephemeral znode under the parent lock znode. The client then gets the children of the lock node and sets a watch on the lock node. If the child znode the client created has the lowest sequence number, the lock is acquired, and the client can perform whatever actions are necessary with the resource the lock is protecting. If its child znode does not have the lowest sequence number, the client waits for the watch to trigger a watch event, then performs the same logic of getting the children, setting a watch, and checking for lock acquisition via the lowest sequence number. The client continues this process until the lock is acquired.&lt;/p&gt;

&lt;p&gt;While this doesn&apos;t sound too bad, there are a few potential gotchas. First, how would the client know that it successfully created the child znode if there is a partial failure (e.g. due to connection loss) during znode creation? The solution is to embed the client&apos;s ZooKeeper session ID in the child znode name, for example &lt;code&gt;child-&amp;lt;sessionId&amp;gt;-&lt;/code&gt;; a failed-over client that retains the same session (and thus session ID) can easily determine whether the child znode was created by looking for its session ID amongst the child znodes. Second, in our earlier algorithm, every client sets a watch on the parent lock znode. But this has the potential to create a &quot;herd effect&quot; &#8212; if every client is watching the parent znode, then every client is notified when any change is made to the children, regardless of whether a client would be able to own the lock. If there are a small number of clients this probably doesn&apos;t matter, but if there are a large number it has the potential to cause a spike in network traffic. The solution is for each client to watch only the child znode immediately preceding its own. For example, the client owning &lt;code&gt;child-9&lt;/code&gt; need only watch the child immediately preceding it, which is most likely &lt;code&gt;child-8&lt;/code&gt; but could be an earlier child if the client owning the 8th znode somehow died. Then, notifications are sent only to the client that can actually take ownership of the lock.&lt;/p&gt;
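&lt;p&gt;The ordering logic at the heart of the algorithm can be shown in isolation. In this self-contained sketch the znode names and helper methods are illustrative (only the zero-padded sequence suffix is something ZooKeeper itself appends): given the current children, a client either learns it owns the lock or learns which single predecessor znode to watch:&lt;/p&gt;

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class LockOrder {

    // Sequential znodes end in a zero-padded counter, e.g.
    // "child-sessionA-0000000007". The suffix is what ZooKeeper appends.
    static int sequenceOf(String child) {
        return Integer.parseInt(child.substring(child.lastIndexOf('-') + 1));
    }

    // Returns null if `mine` holds the lock (lowest sequence number);
    // otherwise returns the single predecessor znode this client should
    // watch, which avoids the herd effect.
    static String predecessorToWatch(List<String> children, String mine) {
        List<String> sorted = children.stream()
                .sorted(Comparator.comparingInt(LockOrder::sequenceOf))
                .collect(Collectors.toList());
        int index = sorted.indexOf(mine);
        return index == 0 ? null : sorted.get(index - 1);
    }

    public static void main(String[] args) {
        List<String> children = List.of(
                "child-sessionB-0000000009",
                "child-sessionA-0000000007",
                "child-sessionC-0000000008");
        // sessionA has the lowest sequence, so it owns the lock (null);
        // sessionB watches only its immediate predecessor, sessionC.
        System.out.println(predecessorToWatch(children, "child-sessionA-0000000007"));
        System.out.println(predecessorToWatch(children, "child-sessionB-0000000009"));
    }
}
```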

&lt;p&gt;Fortunately for us, ZooKeeper comes with a lock &quot;recipe&quot; in the contrib modules called &lt;code&gt;WriteLock&lt;/code&gt;. &lt;code&gt;WriteLock&lt;/code&gt; implements a distributed lock using the above algorithm and takes into account partial failure and the herd effect. It uses an asynchronous callback model via a &lt;code&gt;LockListener&lt;/code&gt; instance, whose &lt;code&gt;lockAcquired&lt;/code&gt; method is called when the lock is acquired and &lt;code&gt;lockReleased&lt;/code&gt; method is called when the lock is released. We can build a synchronous lock class on top of &lt;code&gt;WriteLock&lt;/code&gt; by blocking until the lock is acquired. Listing 6 shows how we use a &lt;code&gt;CountDownLatch&lt;/code&gt; to block until the &lt;code&gt;lockAcquired&lt;/code&gt; method is called. (Sample code for this blog is available on GitHub at &lt;a href=&quot;https://github.com/sleberknight/zookeeper-samples&quot;&gt;https://github.com/sleberknight/zookeeper-samples&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Listing 6 - Creating BlockingWriteLock on top of WriteLock&lt;/em&gt;&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;public class BlockingWriteLock {
  private String path;
  private WriteLock writeLock;
  private CountDownLatch signal = new CountDownLatch(1);

  public BlockingWriteLock(ZooKeeper zookeeper,
          String path, List&amp;lt;ACL&amp;gt; acls) {
    this.path = path;
    this.writeLock =
        new WriteLock(zookeeper, path, acls, new SyncLockListener());
  }

  public void lock() throws InterruptedException, KeeperException {
    writeLock.lock();
    signal.await();
  }

  public void unlock() {
    writeLock.unlock();
  }

  class SyncLockListener implements LockListener {
    @Override public void lockAcquired() {
      signal.countDown();
    }

    @Override public void lockReleased() { /* ignored */ }
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;You can then use the &lt;code&gt;BlockingWriteLock&lt;/code&gt; as shown in Listing 7.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Listing 7 - Using BlockingWriteLock&lt;/em&gt;&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;BlockingWriteLock lock =
  new BlockingWriteLock(zooKeeper, path, ZooDefs.Ids.OPEN_ACL_UNSAFE);
try {
  lock.lock();
  // do something while we own the lock
} catch (Exception ex) {
  // handle appropriately
} finally {
  lock.unlock();
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;You can take this a step further, wrapping the try/catch/finally logic and creating a class that takes commands which implement an interface. For example, you can create a &lt;code&gt;DistributedLockOperationExecutor&lt;/code&gt; class that implements a &lt;code&gt;withLock&lt;/code&gt; method that takes a &lt;code&gt;DistributedLockOperation&lt;/code&gt; instance as an argument, as shown in Listing 8.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Listing 8 - Wrapping the BlockingWriteLock try/catch/finally logic&lt;/em&gt;&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;DistributedLockOperationExecutor executor =
  new DistributedLockOperationExecutor(zooKeeper);
executor.withLock(lockPath, ZooDefs.Ids.OPEN_ACL_UNSAFE,
  new DistributedLockOperation() {
    @Override public Object execute() {
      // do something while we have the lock
    }
  });
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The nice thing about wrapping try/catch/finally logic in &lt;code&gt;DistributedLockOperationExecutor&lt;/code&gt; is that when you call &lt;code&gt;withLock&lt;/code&gt; you eliminate boilerplate code and you cannot possibly forget to unlock the lock.&lt;/p&gt;
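&lt;p&gt;The executor itself is just the try/finally from Listing 7 moved behind a single method. Since the real class needs a ZooKeeper ensemble to run, this sketch demonstrates the same pattern against &lt;code&gt;java.util.concurrent.locks.Lock&lt;/code&gt; instead (the class and interface names here are stand-ins for illustration, not the actual API from the listings):&lt;/p&gt;

```java
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;

public class LockOperationExecutor {

    // Stand-in for DistributedLockOperation in Listing 8.
    interface LockOperation<T> {
        T execute();
    }

    // Acquire the lock, run the operation, and always release the lock
    // in a finally block, so callers cannot forget to unlock.
    static <T> T withLock(Lock lock, LockOperation<T> operation) {
        lock.lock();
        try {
            return operation.execute();
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        Lock lock = new ReentrantLock();
        String result = withLock(lock, () -> "did something while locked");
        System.out.println(result);
    }
}
```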

&lt;h1&gt;Conclusion to Part 5&lt;/h1&gt;

&lt;p&gt;In this fifth blog on ZooKeeper, you implemented a distributed lock and saw some of the potential problems that should be avoided such as partial failure on connection loss, and the &quot;herd effect&quot;.  We took our initial distributed lock and cleaned it up a bit, which resulted in a synchronous implementation using the &lt;code&gt;DistributedLockOperationExecutor&lt;/code&gt; and &lt;code&gt;DistributedLockOperation&lt;/code&gt; which ensures proper connection handling and lock release.&lt;/p&gt;

&lt;p&gt;In the next (and final) blog, we&apos;ll briefly touch on administration and tuning ZooKeeper and introduce the Apache Curator framework, and finally summarize what we&apos;ve learned.&lt;/p&gt;

&lt;h1&gt;References&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;Source code for these blogs, &lt;a href=&quot;https://github.com/sleberknight/zookeeper-samples&quot;&gt;https://github.com/sleberknight/zookeeper-samples&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Presentation on ZooKeeper, &lt;a href=&quot;http://www.slideshare.net/scottleber/apache-zookeeper&quot;&gt;http://www.slideshare.net/scottleber/apache-zookeeper&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;ZooKeeper web site, &lt;a href=&quot;http://zookeeper.apache.org/&quot;&gt;http://zookeeper.apache.org/&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description>          </item>
    <item>
    <guid isPermaLink="true">http://www.sleberknight.com/blog/sleberkn/entry/distributed_coordination_with_zookeeper_part3</guid>
    <title>Distributed Coordination With ZooKeeper Part 4: Architecture from 30,000 Feet</title>
    <dc:creator>sleberkn</dc:creator>
    <link>http://www.sleberknight.com/blog/sleberkn/entry/distributed_coordination_with_zookeeper_part3</link>
        <pubDate>Mon, 8 Jul 2013 00:00:00 +0000</pubDate>
    <category>Development</category>
    <category>distributed-computing</category>
    <category>java</category>
    <category>hadoop</category>
    <category>zookeeper</category>
            <description>&lt;p&gt;&lt;em&gt;This is the fourth in a series of blogs that introduce &lt;a href=&quot;http://zookeeper.apache.org/&quot;&gt;Apache ZooKeeper&lt;/a&gt;. In the &lt;a href=&quot;http://www.sleberknight.com/blog/sleberkn/entry/distributed_coordination_with_zookeeper_part2&quot;&gt;third blog&lt;/a&gt;, you implemented a group membership example using the ZooKeeper Java API. In this blog, we&apos;ll get an overview of ZooKeeper&apos;s architecture.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Now that we&apos;ve test driven &lt;a href=&quot;http://zookeeper.apache.org/&quot;&gt;Apache ZooKeeper&lt;/a&gt; in the shell and Java code, let&apos;s take a bird&apos;s eye view of the ZooKeeper architecture and expand on the core concepts discussed earlier. As previously mentioned, ZooKeeper is essentially a distributed, hierarchical filesystem comprised of znodes, which can be either persistent or ephemeral. Persistent znodes can have children and continue to exist after the client session that created them expires or disconnects. Ephemeral znodes, in contrast, cannot have children and are automatically destroyed as soon as the session in which they were created is closed. Both persistent and ephemeral znodes can have associated data, though the data must be less than 1MB (per znode). Any znode can optionally be sequential, in which case ZooKeeper maintains a monotonically increasing counter whose value is automatically appended to the znode name upon creation; each sequence number is guaranteed to be unique. Finally, all znode operations (reads and writes) are atomic; they either succeed or fail, and there is never a partial application of an operation. For example, if a client tries to set data on a znode, the operation will either set the data in its entirety, or no data will be changed at all.&lt;/p&gt;

&lt;p&gt;A key element of ZooKeeper&apos;s architecture is the ability to set watches on read operations such as &lt;code&gt;exists&lt;/code&gt;, &lt;code&gt;getChildren&lt;/code&gt;, and &lt;code&gt;getData&lt;/code&gt;. Write operations (i.e. &lt;code&gt;create&lt;/code&gt;, &lt;code&gt;delete&lt;/code&gt;, &lt;code&gt;setData&lt;/code&gt;) on znodes trigger any watches previously set on those znodes, and watchers are notified via a &lt;code&gt;WatchedEvent&lt;/code&gt;. How clients respond to events is entirely up to them, but setting watches and receiving notifications at some later point in time results in an event-driven, decoupled architecture. Suppose client A sets a watch on a znode. At some point in the future, when client B performs a write operation on the znode client A is watching, a &lt;code&gt;WatchedEvent&lt;/code&gt; is generated and client A is called back via the &lt;code&gt;process&lt;/code&gt; method of its &lt;code&gt;Watcher&lt;/code&gt;. Clients A and B are completely independent and need not know anything about each other, so long as each knows its own responsibilities in relation to specific znodes.&lt;/p&gt;

&lt;p&gt;It is important to remember that watches are &lt;em&gt;one-time notifications&lt;/em&gt; about changes to a znode. If a client receives a &lt;code&gt;WatchedEvent&lt;/code&gt; notification, it &lt;em&gt;must&lt;/em&gt; register a new &lt;code&gt;Watcher&lt;/code&gt; if it wants to be notified about future updates. Between receipt of the notification and registration of the new &lt;code&gt;Watcher&lt;/code&gt;, other clients can perform write operations on the znode that the watching client will never hear about. In other words, in a high write volume environment it is entirely possible for a client to miss updates during the time it takes to process an event and re-register a watch. Clients should assume updates can be missed, and should not rely on having a complete history of every event that occurs to a given znode.&lt;/p&gt;
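
&lt;p&gt;As a sketch of the re-registration pattern (this code is not from the original posts; it assumes an already-connected &lt;code&gt;ZooKeeper&lt;/code&gt; handle), a &lt;code&gt;Watcher&lt;/code&gt; can set a fresh watch from inside its own callback. Even so, writes that land before the new watch is registered go unobserved.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// Sketch: re-register a data watch on every notification.
public void watchData(final ZooKeeper zk, final String path)
        throws KeeperException, InterruptedException {
  zk.getData(path, new Watcher() {
    @Override
    public void process(WatchedEvent event) {
      try {
        // Set a new one-time watch; updates between the event
        // and this call are not observed.
        watchData(zk, path);
      } catch (Exception e) {
        // handle or log; omitted in this sketch
      }
    }
  }, null /* stat */);
}
&lt;/code&gt;&lt;/pre&gt;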

&lt;p&gt;ZooKeeper implements the hierarchical filesystem via an &quot;ensemble&quot; of servers. Figure 1 shows a three server ensemble with multiple clients reading and one client writing. The basic idea is that the filesystem state is replicated on each server in the ensemble, both on disk and in memory. &lt;/p&gt;

&lt;p&gt;&lt;em&gt;Figure 1 - ZooKeeper Ensemble&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;http://www.sleberknight.com/blog/sleberkn/resource/zk-architecture.png&quot; alt=&quot;ZooKeeper Architecture&quot; title=&quot;ZooKeeper Architecture&quot; width=&quot;800&quot; /&gt;&lt;/p&gt;

&lt;p&gt;In Figure 1 you can see that one of the servers in the ensemble acts as the leader, while the rest are followers. When an ensemble is first started, a &lt;em&gt;leader election&lt;/em&gt; is held. The election is complete once a simple majority of followers have synchronized their state with the newly elected leader. After leader election, all write requests are routed through the leader, and changes are broadcast to all followers - this is termed &lt;em&gt;atomic broadcast&lt;/em&gt;. Once a majority of followers have persisted the change (to disk and memory), the leader commits the change and notifies the client of a successful update. Because only a majority of followers is required for a successful update, followers can lag the leader, which means ZooKeeper is an &lt;em&gt;eventually consistent&lt;/em&gt; system. Thus, different clients reading the same znode at the same moment can receive different answers. Every write is assigned a globally unique, sequentially ordered identifier called a &lt;code&gt;zxid&lt;/code&gt;, or ZooKeeper transaction id, which guarantees a global order to all updates in a ZooKeeper ensemble. In addition, because &lt;em&gt;all&lt;/em&gt; writes go through the leader, write throughput does not scale as more nodes are added.&lt;/p&gt;
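
&lt;p&gt;As an aside on &lt;code&gt;zxid&lt;/code&gt; ordering: internally a zxid is a 64-bit number whose high 32 bits identify the leader epoch and whose low 32 bits are a counter within that epoch. The small class below (an illustration of that layout, not API from these posts) shows why comparing zxids as plain longs yields the global update order.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// Illustrative sketch of the zxid layout: leader epoch in the
// high 32 bits, per-epoch counter in the low 32 bits.
public class ZxidDemo {

  static long zxid(long epoch, long counter) {
    return (epoch &amp;lt;&amp;lt; 32) | counter;
  }

  public static void main(String[] args) {
    long a = zxid(1, 7); // seventh update under leader epoch 1
    long b = zxid(2, 0); // first update after a new leader election
    // Any update in a later epoch orders after all earlier updates.
    System.out.println(a &amp;lt; b); // true
  }
}
&lt;/code&gt;&lt;/pre&gt;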

&lt;p&gt;This leader/follower architecture is not a master/slave setup, however, since the leader is not a single point of failure. If the leader dies, a new leader election takes place and a new leader is elected; this is typically very fast and does not noticeably degrade performance. Because leader election and writes both require a simple majority of servers, ZooKeeper ensembles should contain an odd number of machines: in a five-node ensemble any two machines can fail and ZooKeeper remains available, while a six-node ensemble also tolerates only two failures, because if three nodes fail the remaining three are not a majority of the original six.&lt;/p&gt;
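
&lt;p&gt;The majority arithmetic can be made concrete in a few lines of Java. This is purely illustrative (the class and method names are invented for this sketch): a quorum is a strict majority, floor(n/2) + 1, so an n-server ensemble tolerates n minus quorum failures.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// Illustrative only: how many server failures an n-server
// ensemble can tolerate while still keeping a quorum.
public class QuorumMath {

  // A quorum is a strict majority: n / 2 + 1 (integer division).
  static int tolerableFailures(int n) {
    int quorum = n / 2 + 1;
    return n - quorum;
  }

  public static void main(String[] args) {
    System.out.println(tolerableFailures(5)); // 2
    System.out.println(tolerableFailures(6)); // also 2
  }
}
&lt;/code&gt;&lt;/pre&gt;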

&lt;p&gt;All client read requests are served directly from the memory of the server they are connected to, which makes reads very fast. In addition, clients have no knowledge about the server they are connected to and do not know if they are connected to a leader or follower. Because reads are from the in-memory representation of the filesystem, read throughput increases as servers are added to an ensemble. But recall that write throughput is limited by the leader, so you cannot simply add more and more ZooKeepers forever and expect performance to increase.&lt;/p&gt;

&lt;h1&gt;Data Consistency&lt;/h1&gt;

&lt;p&gt;With ZooKeeper&apos;s leader/follower architecture in mind, let&apos;s consider what guarantees it makes regarding data consistency.&lt;/p&gt;

&lt;h2&gt;Sequential Updates&lt;/h2&gt;

&lt;p&gt;ZooKeeper guarantees that updates are made to the filesystem in the order they are received from clients. Since all writes route through the leader, the global order is simply the order in which the leader receives write requests.&lt;/p&gt;

&lt;h2&gt;Atomicity&lt;/h2&gt;

&lt;p&gt;All updates either succeed or fail, just like transactions in ACID-compliant relational databases. ZooKeeper, as of version 3.4.0, supports transactions as a thin wrapper around the &lt;code&gt;multi&lt;/code&gt; operation, which performs a list of operations (instances of the &lt;code&gt;Op&lt;/code&gt; class) and either all operations succeed or none succeed. So if you need to ensure that multiple znodes are updated at the same time, for example if two znodes are part of a graph, then you can use &lt;code&gt;multi&lt;/code&gt; or the transaction wrapper around &lt;code&gt;multi&lt;/code&gt;.&lt;/p&gt;
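
&lt;p&gt;As an illustrative sketch (the znode paths and method name here are invented, and a connected &lt;code&gt;ZooKeeper&lt;/code&gt; handle is assumed), updating two znodes atomically with &lt;code&gt;multi&lt;/code&gt; looks roughly like this:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;public void linkNodes(ZooKeeper zk)
        throws KeeperException, InterruptedException {
  List&amp;lt;Op&amp;gt; ops = Arrays.asList(
      Op.create(&quot;/graph/node-a&quot;, null /* data */,
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT),
      Op.setData(&quot;/graph&quot;, &quot;updated&quot;.getBytes(), -1 /* any version */));
  // Either both operations are applied, or neither is; a failure
  // surfaces as a KeeperException.
  zk.multi(ops);
}
&lt;/code&gt;&lt;/pre&gt;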

&lt;h2&gt;Consistent client view&lt;/h2&gt;

&lt;p&gt;Consistent client view means that a client will see the same view of the system, regardless of which server it is connected to. The official ZooKeeper documentation calls this &quot;single system image&quot;. So, if a client fails over to a different server during a session, it will never see an older view of the system than it has previously seen. A server will not accept a connection from a client until it has caught up with the state of the server to which the client was previously connected.&lt;/p&gt;

&lt;h2&gt;Durability&lt;/h2&gt;

&lt;p&gt;If an update succeeds, ZooKeeper guarantees it has been persisted and will survive server failures, even if all ZooKeeper ensemble nodes were forcefully killed at the same time! (Admittedly this would be an extreme situation, but the update would survive such an apocalypse.)&lt;/p&gt;

&lt;h2&gt;Eventual consistency&lt;/h2&gt;

&lt;p&gt;Because followers may lag the leader, ZooKeeper is an eventually consistent system. But ZooKeeper limits the amount of time a follower can lag the leader, and a follower will take itself offline if it falls too far behind. Clients can force their server to catch up with the leader by calling the asynchronous &lt;code&gt;sync&lt;/code&gt; command. Although &lt;code&gt;sync&lt;/code&gt; is asynchronous, the server will not process operations issued after the &lt;code&gt;sync&lt;/code&gt; until it has caught up with the leader.&lt;/p&gt;

&lt;h1&gt;Conclusion to Part 4&lt;/h1&gt;

&lt;p&gt;In this fourth blog on ZooKeeper you saw a bird&apos;s eye view of ZooKeeper&apos;s architecture, and learned about its data consistency guarantees. You also learned that ZooKeeper is an &lt;em&gt;eventually consistent&lt;/em&gt; system.&lt;/p&gt;

&lt;p&gt;In the next blog, we&apos;ll dive back into some code and use what we&apos;ve learned so far to build a distributed lock.&lt;/p&gt;

&lt;h1&gt;References&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;Source code for these blogs, &lt;a href=&quot;https://github.com/sleberknight/zookeeper-samples&quot;&gt;https://github.com/sleberknight/zookeeper-samples&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Presentation on ZooKeeper, &lt;a href=&quot;http://www.slideshare.net/scottleber/apache-zookeeper&quot;&gt;http://www.slideshare.net/scottleber/apache-zookeeper&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;ZooKeeper web site, &lt;a href=&quot;http://zookeeper.apache.org/&quot;&gt;http://zookeeper.apache.org/&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description>          </item>
    <item>
    <guid isPermaLink="true">http://www.sleberknight.com/blog/sleberkn/entry/distributed_coordination_with_zookeeper_part2</guid>
    <title>Distributed Coordination With ZooKeeper Part 3: Group Membership Example</title>
    <dc:creator>sleberkn</dc:creator>
    <link>http://www.sleberknight.com/blog/sleberkn/entry/distributed_coordination_with_zookeeper_part2</link>
        <pubDate>Tue, 2 Jul 2013 00:00:00 +0000</pubDate>
    <category>Development</category>
    <category>java</category>
    <category>hadoop</category>
    <category>distributed-computing</category>
    <category>zookeeper</category>
            <description>&lt;p&gt;&lt;em&gt;This is the third in a series of blogs that introduce &lt;a href=&quot;http://zookeeper.apache.org/&quot;&gt;Apache ZooKeeper&lt;/a&gt;. In the &lt;a href=&quot;http://www.sleberknight.com/blog/sleberkn/entry/distributed_coordination_with_zookeeper_part1&quot;&gt;second blog&lt;/a&gt;, you took a test drive of ZooKeeper using its command-line shell. In this blog, we&apos;ll re-implement the group membership example using the ZooKeeper Java API.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href=&quot;http://zookeeper.apache.org/&quot;&gt;Apache ZooKeeper&lt;/a&gt; is implemented in Java, and its native API is also Java. ZooKeeper also provides a C language API, and the distribution provides contrib modules for Perl, Python, and RESTful clients. The ZooKeeper APIs come in two flavors, synchronous and asynchronous; which one you use depends on the situation. For example, you might choose the asynchronous Java API if you are implementing a Java application that processes a large number of child znodes independently of one another; in that case the asynchronous API lets you launch all the independent tasks in parallel. On the other hand, if you are implementing simple tasks that perform sequential operations in ZooKeeper, the synchronous API is easier to use and may be a better fit.&lt;/p&gt;

&lt;p&gt;For our group membership example, we&apos;ll use the synchronous Java API. The first thing we need to do is connect to ZooKeeper and get an instance of &lt;code&gt;ZooKeeper&lt;/code&gt;, which is the main client API through which you perform operations like creating znodes, setting data on znodes, listing znodes, and so on. The &lt;code&gt;ZooKeeper&lt;/code&gt; constructor launches a separate thread to connect, and returns immediately. As a result, you need to watch for the &lt;code&gt;SyncConnected&lt;/code&gt; event which indicates when the connection has been established. Listing 1 shows code to connect to ZooKeeper, in which we use a &lt;code&gt;CountDownLatch&lt;/code&gt; to block until we&apos;ve received the connected event. (Sample code for this blog is available on GitHub at &lt;a href=&quot;https://github.com/sleberknight/zookeeper-samples&quot;&gt;https://github.com/sleberknight/zookeeper-samples&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;Listing 1 - Connecting to ZooKeeper&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;public ZooKeeper connect(String hosts, int sessionTimeout)
        throws IOException, InterruptedException {
  final CountDownLatch connectedSignal = new CountDownLatch(1);
  ZooKeeper zk = new ZooKeeper(hosts, sessionTimeout, new Watcher() {
    @Override
    public void process(WatchedEvent event) {
      if (event.getState() == Watcher.Event.KeeperState.SyncConnected) {
        connectedSignal.countDown();
      }
    }
  });
  connectedSignal.await();
  return zk;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The next thing we need to do is create a znode for the group. As in the test drive, this znode should be persistent, so that it hangs around regardless of whether any clients are connected or not. Listing 2 shows creating a group znode.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Listing 2 - Creating the group znode&lt;/em&gt;&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;public void createGroup(String groupName)
        throws KeeperException, InterruptedException {
  String path = &quot;/&quot; + groupName;
  zk.create(path,
            null /* data */,
            ZooDefs.Ids.OPEN_ACL_UNSAFE,
            CreateMode.PERSISTENT);
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Note in Listing 2 that we prepended a leading slash to the group name since ZooKeeper requires that all paths be absolute. The &lt;code&gt;create&lt;/code&gt; operation takes arguments for the path, a &lt;code&gt;byte[]&lt;/code&gt; for data which is optional, a list of ACLs (access control list) to control who can access the znode, and finally the type of znode, in this case persistent. Creating the group member znodes is almost identical to creating the group znode, except we need to create an ephemeral, sequential znode. Let&apos;s also say that we need to store some information about each member, so we&apos;ll set data on the member znodes. This is shown in Listing 3.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Listing 3 - Creating group member znodes with data&lt;/em&gt;&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;public String joinGroup(String groupName, String memberName, byte[] data)
        throws KeeperException, InterruptedException {
  String path = &quot;/&quot; + groupName + &quot;/&quot; + memberName + &quot;-&quot;;
  String createdPath = zk.create(path,
          data,
          ZooDefs.Ids.OPEN_ACL_UNSAFE,
          CreateMode.EPHEMERAL_SEQUENTIAL);
  return createdPath;
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Now that we can create the group and allow members to join it, it would be nice to have some way to monitor the group membership. To do this we&apos;ll first list the children of the group znode and set a watch on it; whenever the watch triggers an event, we&apos;ll query ZooKeeper for the group&apos;s (updated) members, as shown in Listing 4. This process continues in an infinite loop, hence the class name &lt;code&gt;ListGroupForever&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Listing 4 - Listing a group&apos;s members indefinitely&lt;/em&gt;&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;public class ListGroupForever {
  private ZooKeeper zooKeeper;
  private Semaphore semaphore = new Semaphore(1);

  public ListGroupForever(ZooKeeper zooKeeper) {
    this.zooKeeper = zooKeeper;
  }

  public static void main(String[] args) throws Exception {
    ZooKeeper zk = new ConnectionHelper().connect(args[0]);
    new ListGroupForever(zk).listForever(args[1]);
  }

  public void listForever(String groupName)
          throws KeeperException, InterruptedException {
    semaphore.acquire();
    while (true) {
      list(groupName);
      semaphore.acquire();
    }
  }

  private void list(String groupName)
          throws KeeperException, InterruptedException {
    String path = &quot;/&quot; + groupName;
    List&amp;lt;String&amp;gt; children = zooKeeper.getChildren(path, new Watcher() {
      @Override
      public void process(WatchedEvent event) {
        if (event.getType() == Event.EventType.NodeChildrenChanged) {
          semaphore.release();
        }
      }
    });
    if (children.isEmpty()) {
      System.out.printf(&quot;No members in group %s\n&quot;, groupName);
      return;
    }
    Collections.sort(children);
    System.out.println(children);
    System.out.println(&quot;--------------------&quot;);
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The &lt;code&gt;ListGroupForever&lt;/code&gt; class in Listing 4 has some interesting characteristics. The &lt;code&gt;listForever&lt;/code&gt; method loops infinitely and uses a semaphore to block until changes occur to the group node. The &lt;code&gt;list&lt;/code&gt; method calls &lt;code&gt;getChildren&lt;/code&gt; to actually retrieve the child nodes from ZooKeeper, and critically sets a &lt;code&gt;Watcher&lt;/code&gt; to watch for changes of type &lt;code&gt;NodeChildrenChanged&lt;/code&gt;. When the &lt;code&gt;NodeChildrenChanged&lt;/code&gt; event occurs, the watcher releases the semaphore, which permits &lt;code&gt;listForever&lt;/code&gt; to re-acquire the semaphore and then retrieve and display the updated group znodes. This process continues until &lt;code&gt;ListGroupForever&lt;/code&gt; is terminated.&lt;/p&gt;

&lt;p&gt;To round out the example, we&apos;ll create a method to delete the group. As shown in the test drive, ZooKeeper doesn&apos;t permit znodes that have children to be deleted, so we first need to delete all the children, and then delete the group (parent) znode. This is shown in Listing 5.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Listing 5 - Deleting a group&lt;/em&gt;&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;public void delete(String groupName)
        throws KeeperException, InterruptedException {
  String path = &quot;/&quot; + groupName;
  try {
    List&amp;lt;String&amp;gt; children = zk.getChildren(path, false);
    for (String child : children) {
      zk.delete(path + &quot;/&quot; + child, -1);
    }
    zk.delete(path, -1);
  }
  catch (KeeperException.NoNodeException e) {
    System.out.printf(&quot;Group %s does not exist\n&quot;, groupName);
  }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;When deleting a group, we passed &lt;code&gt;-1&lt;/code&gt; to the &lt;code&gt;delete&lt;/code&gt; method to unconditionally delete the znodes. We could also have passed in a version, so that if we have the correct version number, the znode is deleted but otherwise we receive an optimistic locking violation in the form of a &lt;code&gt;BadVersionException&lt;/code&gt;.&lt;/p&gt;
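
&lt;p&gt;For completeness, here is a sketch of a version-checked delete (not from the original post; it assumes a connected &lt;code&gt;ZooKeeper&lt;/code&gt; handle and an invented method name), which fails with &lt;code&gt;BadVersionException&lt;/code&gt; if another client modified the znode in the meantime:&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;public void deleteIfUnchanged(ZooKeeper zk, String path)
        throws KeeperException, InterruptedException {
  // Read the current version, then delete only if it still matches.
  Stat stat = zk.exists(path, false /* no watch */);
  if (stat != null) {
    zk.delete(path, stat.getVersion());
  }
}
&lt;/code&gt;&lt;/pre&gt;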

&lt;h1&gt;Conclusion to Part 3&lt;/h1&gt;

&lt;p&gt;In this third blog on ZooKeeper, we implemented a group membership example using the Java API. You saw how to connect to ZooKeeper; how to create persistent, ephemeral, and sequential znodes; how to list znodes and set watches to receive events; and finally how to delete znodes.&lt;/p&gt;

&lt;p&gt;In the next blog, we&apos;ll back off from the code level and get an overview of ZooKeeper&apos;s architecture. &lt;/p&gt;

&lt;h1&gt;References&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;Source code for these blogs, &lt;a href=&quot;https://github.com/sleberknight/zookeeper-samples&quot;&gt;https://github.com/sleberknight/zookeeper-samples&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Presentation on ZooKeeper, &lt;a href=&quot;http://www.slideshare.net/scottleber/apache-zookeeper&quot;&gt;http://www.slideshare.net/scottleber/apache-zookeeper&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;ZooKeeper web site, &lt;a href=&quot;http://zookeeper.apache.org/&quot;&gt;http://zookeeper.apache.org/&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description>          </item>
    <item>
    <guid isPermaLink="true">http://www.sleberknight.com/blog/sleberkn/entry/distributed_coordination_with_zookeeper_part1</guid>
    <title>Distributed Coordination With ZooKeeper Part 2: Test Drive</title>
    <dc:creator>sleberkn</dc:creator>
    <link>http://www.sleberknight.com/blog/sleberkn/entry/distributed_coordination_with_zookeeper_part1</link>
        <pubDate>Fri, 28 Jun 2013 00:00:00 +0000</pubDate>
    <category>Development</category>
    <category>zookeeper</category>
    <category>java</category>
    <category>hadoop</category>
    <category>distributed-computing</category>
            <description>&lt;p&gt;&lt;em&gt;This is the second in a series of blogs that introduce &lt;a href=&quot;http://zookeeper.apache.org/&quot;&gt;Apache ZooKeeper&lt;/a&gt;. In the &lt;a href=&quot;http://www.sleberknight.com/blog/sleberkn/entry/distributed_coordination_with_zookeeper_part&quot;&gt;first blog&lt;/a&gt;, you got an introduction to ZooKeeper and its core concepts. In this blog, you&apos;ll take a brief test drive of ZooKeeper using its command line shell. This is a really fast and convenient way to get up and running with ZooKeeper immediately.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;To get an idea of some of the basic building blocks in &lt;a href=&quot;http://zookeeper.apache.org/&quot;&gt;Apache ZooKeeper&lt;/a&gt;, let&apos;s take a test drive. ZooKeeper comes with a command-line shell that lets you connect to and interact with the service. The following listing shows connecting to the shell, listing the znodes at the root level, and creating a znode named &lt;code&gt;/sample-group&lt;/code&gt;, which will serve as a parent znode for some other znodes that we&apos;ll create in a moment. All paths in ZooKeeper must be &lt;em&gt;absolute&lt;/em&gt; and begin with a &lt;code&gt;/&lt;/code&gt;. The first argument to the &lt;code&gt;create&lt;/code&gt; command is the path, while the second is the data associated with the znode. Note also that when a connection is established, the default watcher sends the &lt;code&gt;SyncConnected&lt;/code&gt; event, which you can see in the listing below.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ ./zkCli.sh
Connecting to localhost:2181
Welcome to ZooKeeper!
JLine support is enabled

WATCHER::

WatchedEvent state:SyncConnected type:None path:null
[zk: localhost:2181(CONNECTED) 0] ls /
[zookeeper]
[zk: localhost:2181(CONNECTED) 1] create /sample-group a-sample-group
Created /sample-group
[zk: localhost:2181(CONNECTED) 2] ls /
[sample-group, zookeeper]
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;At this point we want to create some child znodes under &lt;code&gt;/sample-group&lt;/code&gt;. ZooKeeper znodes can be either &lt;em&gt;persistent&lt;/em&gt; or &lt;em&gt;ephemeral&lt;/em&gt;. Persistent znodes are permanent and, once created, stick around until they are explicitly deleted. On the other hand, ephemeral znodes exist only as long as the client that created them is alive; once the client goes away for any reason, all ephemeral znodes it created are automatically destroyed. As you might imagine, if we want to build a group membership service for a distributed system, each client (which is a group member) should indicate its status via an ephemeral znode, so that if it dies, the znode representing its membership is destroyed, indicating the client is no longer a member of the group. When we created the group, we created a persistent znode. To create an ephemeral znode we use the &lt;code&gt;-e&lt;/code&gt; option. In addition, maybe we&apos;d like to know the order in which clients joined our group. ZooKeeper znodes can be automatically and uniquely ordered by their parent. In the shell we use &lt;code&gt;-s&lt;/code&gt; to indicate we want to create the child znode as a &lt;em&gt;sequential&lt;/em&gt; znode. Note also that we named the child nodes &lt;code&gt;/sample-group/child-&lt;/code&gt; in each case. When creating sequential znodes, it is typical to end the name with a dash, to which a unique, monotonically increasing integer is automatically appended.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;[zk: localhost:2181(CONNECTED) 3] create -s -e /sample-group/child- data-1
Created /sample-group/child-0000000000
[zk: localhost:2181(CONNECTED) 4] create -s -e /sample-group/child- data-2
Created /sample-group/child-0000000001
[zk: localhost:2181(CONNECTED) 5] create -s -e /sample-group/child- data-3
Created /sample-group/child-0000000002
&lt;/code&gt;&lt;/pre&gt;
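
&lt;p&gt;The zero-padded suffixes in the listing above can be illustrated in a couple of lines of Java. (This sketch assumes the 10-digit padding seen in the shell output; the class name is invented.)&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;// Illustrative: the server appends a 10-digit, zero-padded counter.
public class SequenceNames {

  static String sequentialName(String prefix, int counter) {
    return prefix + String.format(&quot;%010d&quot;, counter);
  }

  public static void main(String[] args) {
    // The third child created under /sample-group/child-
    System.out.println(sequentialName(&quot;/sample-group/child-&quot;, 2));
    // prints /sample-group/child-0000000002
  }
}
&lt;/code&gt;&lt;/pre&gt;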

&lt;p&gt;Now let&apos;s set a watch on the &lt;code&gt;/sample-group&lt;/code&gt; znode in order to receive change notifications whenever a child znode is added or removed. Setting the watch lets us monitor the group for changes and react accordingly. For example, if we are building a distributed search engine and a server in the search cluster dies, we need to know about that event and move the data held by the (now dead) server across the remaining servers, assuming the data is stored redundantly such as in Hadoop. This is exactly what the Apache Blur distributed search engine does in order to ensure data is not lost and that the cluster continues operating when one or more servers is lost. In ZooKeeper you set watches on read operations, for example when listing a znode or getting its data. We&apos;ll list the children under &lt;code&gt;/sample-group&lt;/code&gt; and set a watch, indicated by using &lt;code&gt;true&lt;/code&gt; as the second argument.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;[zk: localhost:2181(CONNECTED) 6] ls /sample-group true
[child-0000000001, child-0000000002, child-0000000000]
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Now if we create another child znode, the watch event will fire and notify us that a &lt;code&gt;NodeChildrenChanged&lt;/code&gt; event occurred.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;[zk: localhost:2181(CONNECTED) 7] create -s -e /sample-group/child- data-4

WATCHER::

WatchedEvent state:SyncConnected type:NodeChildrenChanged path:/sample-group
Created /sample-group/child-0000000003
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;The event does not tell us what actually changed, however. To get the updated list of children we need to again list the contents of &lt;code&gt;/sample-group&lt;/code&gt;. In addition, watchers are &lt;em&gt;one-time events&lt;/em&gt;, and clients must &lt;em&gt;re-register&lt;/em&gt; the watch to continue receiving change notifications. So if we now create another child znode, no watch will fire.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;[zk: localhost:2181(CONNECTED) 8] create -s -e /sample-group/child- data-5
Created /sample-group/child-0000000004
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;To finish off our test drive, let&apos;s delete our test group.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;[zk: localhost:2181(CONNECTED) 9] delete /sample-group
Node not empty: /sample-group
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Oops. ZooKeeper won&apos;t allow znodes to be deleted if they have children. In addition, updates (including deletes) are conditional on a specific version; this is a form of optimistic locking that ensures a client update succeeds only if it supplies the current version of the data. Otherwise the update fails with a &lt;code&gt;BadVersionException&lt;/code&gt;. You can bypass the version check by passing &lt;code&gt;-1&lt;/code&gt;, which tells ZooKeeper to perform the update unconditionally. So in order to delete our group, we first delete all the child znodes and then delete the group znode, all unconditionally.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;[zk: localhost:2181(CONNECTED) 10] delete /sample-group/child-0000000000 -1
[zk: localhost:2181(CONNECTED) 11] delete /sample-group/child-0000000001 -1
[zk: localhost:2181(CONNECTED) 12] delete /sample-group/child-0000000002 -1
[zk: localhost:2181(CONNECTED) 13] delete /sample-group/child-0000000003 -1
[zk: localhost:2181(CONNECTED) 14] delete /sample-group/child-0000000004 -1
[zk: localhost:2181(CONNECTED) 15] delete /sample-group -1                 
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;In addition to the shell, ZooKeeper also provides commands referred to as the &quot;four letter words&quot;. You issue the commands via telnet or nc (netcat). For example, let&apos;s ask ZooKeeper how it&apos;s feeling.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ echo &quot;ruok&quot; | nc localhost 2181
imok
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;You can also use the &lt;code&gt;stat&lt;/code&gt; command to get basic statistics on ZooKeeper.&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;$ echo &quot;stat&quot; | nc localhost 2181
Zookeeper version: 3.4.5-1392090, built on 09/30/2012 17:52 GMT
Clients:
 /0:0:0:0:0:0:0:1%0:63888[0](queued=0,recved=1,sent=0)

Latency min/avg/max: 0/0/157
Received: 338
Sent: 337
Connections: 1
Outstanding: 0
Zxid: 0xb
Mode: standalone
Node count: 17
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;In this test drive, we&apos;ve seen some basic but important aspects of ZooKeeper. We created persistent and sequential ephemeral znodes, set a watch and received a change notification event when a znode&apos;s children changed, and deleted znodes. We also saw how znodes can have associated data. When building real systems you obviously won&apos;t be using the command line shell to implement behavior, however, so let&apos;s translate this simple group membership example into Java code.&lt;/p&gt;

&lt;h1&gt;Conclusion to Part 2&lt;/h1&gt;

&lt;p&gt;In this second part of the ZooKeeper series of blogs, you took a test drive using the command-line shell available in ZooKeeper. You created both persistent and ephemeral znodes. You created the ephemeral znodes as children of the persistent znode, and made them sequential as well so that ZooKeeper maintains a monotonically increasing, unique order. Finally you saw how to delete znodes and use a few of the &quot;four letter words&quot; to check ZooKeeper&apos;s status.&lt;/p&gt;

&lt;p&gt;In the next blog, we&apos;ll recreate the group example you&apos;ve just seen using the ZooKeeper Java API.&lt;/p&gt;

&lt;h1&gt;References&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;Source code for these blogs, &lt;a href=&quot;https://github.com/sleberknight/zookeeper-samples&quot;&gt;https://github.com/sleberknight/zookeeper-samples&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Presentation on ZooKeeper, &lt;a href=&quot;http://www.slideshare.net/scottleber/apache-zookeeper&quot;&gt;http://www.slideshare.net/scottleber/apache-zookeeper&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;ZooKeeper web site, &lt;a href=&quot;http://zookeeper.apache.org/&quot;&gt;http://zookeeper.apache.org/&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;</description>          </item>
    <item>
    <guid isPermaLink="true">http://www.sleberknight.com/blog/sleberkn/entry/distributed_coordination_with_zookeeper_part</guid>
    <title>Distributed Coordination With ZooKeeper Part 1: Introduction</title>
    <dc:creator>sleberkn</dc:creator>
    <link>http://www.sleberknight.com/blog/sleberkn/entry/distributed_coordination_with_zookeeper_part</link>
        <pubDate>Tue, 25 Jun 2013 10:29:24 +0000</pubDate>
    <category>Development</category>
    <category>hadoop</category>
    <category>distributed-computing</category>
    <category>java</category>
    <category>zookeeper</category>
            <description>&lt;p&gt;&lt;em&gt;This is the first in a series of blogs that introduce &lt;a href=&quot;http://zookeeper.apache.org/&quot;&gt;Apache ZooKeeper&lt;/a&gt;. This blog provides an introduction to ZooKeeper and its core concepts and use cases. In later blogs you will test drive ZooKeeper, see some examples of the Java API, learn about its architecture, build a distributed data structure which can be used across independent processes and machines, and finally get a brief introduction to a higher-level API on top of ZooKeeper.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Consider a distributed system with multiple servers, each of which is responsible for holding data and performing operations on that data. This could be a distributed search engine, a distributed build system, or even something like Hadoop which has both a distributed file system and a Map/Reduce data processing framework that operates on the data in the file system. How would you determine which servers are alive and operating at any given moment in time? Or, how would you determine which servers are available to process a build in a distributed build system? Or for a distributed search system how would you know which servers are available to hold data and handle search requests? Most importantly, how would you do these things reliably in the face of the difficulties of distributed computing such as network failures, bandwidth limitations, variable latency connections, security concerns, and anything else that can go wrong in a networked environment, perhaps even across multiple data centers?&lt;/p&gt;

&lt;p&gt;These and similar questions are the focus of Apache ZooKeeper, which is a fast, highly available, fault tolerant, distributed coordination service. Using ZooKeeper you can build reliable, distributed data structures for group membership, leader election, coordinated workflow, and configuration services, as well as generalized distributed data structures like locks, queues, barriers, and latches.&lt;/p&gt;

&lt;p&gt;Many well-known and successful projects already rely on ZooKeeper. Just a few of them include HBase, Hadoop 2.0, Solr Cloud, Neo4J, Apache Blur (incubating), and Accumulo.&lt;/p&gt;

&lt;h1&gt;Core Concepts&lt;/h1&gt;

&lt;p&gt;ZooKeeper is a distributed, hierarchical file system that facilitates loose coupling between clients and provides an eventually consistent view of its znodes, which are like files and directories in a traditional file system.  It provides basic operations such as creating, deleting, and checking existence of znodes. It provides an event-driven model in which clients can watch for changes to specific znodes, for example if a new child is added to an existing znode. ZooKeeper achieves high availability by running multiple ZooKeeper servers, called an ensemble, with each server holding an in-memory copy of the distributed file system to service client read requests. Each server also holds a persistent copy on disk.&lt;/p&gt;
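
&lt;p&gt;To make the znode model concrete, here is a toy, in-memory sketch of a hierarchical namespace with one-shot child watches. This is an illustration only, not the real ZooKeeper API (we&apos;ll see the actual Java API in a later blog); the class and method names here are made up for the example.&lt;/p&gt;

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Consumer;

// Toy in-memory model of a znode namespace; illustration only, not the ZooKeeper API.
public class ZnodeTree {
    private final Map<String, byte[]> nodes = new TreeMap<>();
    private final Map<String, List<Consumer<String>>> childWatches = new HashMap<>();

    public ZnodeTree() {
        nodes.put("/", new byte[0]); // the root znode always exists
    }

    private static String parentOf(String path) {
        int i = path.lastIndexOf('/');
        return i == 0 ? "/" : path.substring(0, i);
    }

    // Create a znode under an existing parent, firing any one-shot child watches.
    public void create(String path, byte[] data) {
        if (!nodes.containsKey(parentOf(path))) {
            throw new IllegalStateException("parent does not exist: " + path);
        }
        nodes.put(path, data);
        List<Consumer<String>> watchers = childWatches.remove(parentOf(path));
        if (watchers != null) {
            watchers.forEach(w -> w.accept(path)); // watches fire once, like ZooKeeper's
        }
    }

    public boolean exists(String path) {
        return nodes.containsKey(path);
    }

    public void delete(String path) {
        nodes.remove(path);
    }

    // Register a one-shot watch that fires when a child is added under parent.
    public void watchChildren(String parent, Consumer<String> watcher) {
        childWatches.computeIfAbsent(parent, k -> new ArrayList<>()).add(watcher);
    }
}
```

&lt;p&gt;The one-shot watch mirrors real ZooKeeper behavior: a watch fires once and must be re-registered if the client wants further notifications.&lt;/p&gt;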

&lt;p&gt;One of the servers is elected as the leader, and all other servers are followers. The leader is responsible for all writes and for broadcasting changes to followers. Once a majority of followers commit a change successfully, the write succeeds and the data is durable even if the leader subsequently fails. This makes ZooKeeper an eventually consistent system: the followers may lag the leader by some small amount of time, so clients might not always see the most up-to-date information. Importantly, the leader is not a master as in a master/slave architecture and thus is not a single point of failure; rather, if the leader dies, the remaining followers hold an election for a new leader, and the new leader takes over where the old one left off.&lt;/p&gt;

&lt;p&gt;Each client connects to ZooKeeper, passing in the list of servers in the ensemble. The client tries servers from that list at random until a connection is established. Once connected, ZooKeeper creates a session with the client-specified timeout period. The ZooKeeper client automatically sends periodic heartbeats to keep the session alive if no operations are performed for a while, and it automatically handles failover: if the server a client is connected to fails, the client detects this and reconnects to a different server in the ensemble. The nice thing is that the same client session is retained across this failover event. However, during failover it is possible that client operations could fail, so, as with almost all ZooKeeper operations, client code must be vigilant, detect errors, and deal with them as necessary.&lt;/p&gt;

&lt;h1&gt;Partial Failure&lt;/h1&gt;

&lt;p&gt;One of the fallacies of distributed computing is that the network is reliable. Having worked for the past few years on a project with multiple Hadoop, Apache Blur, and ZooKeeper clusters comprising hundreds of servers, I can definitely say from experience that the network is not reliable. Simply put, things break, and you cannot assume the network is 100% reliable all the time. When designing distributed systems, you must keep this in mind and handle failure modes you would not even consider when building software for a single server. For example, suppose a client sends an update to a server, but the network connection is lost briefly before the response is received. You need to ask several questions in this case. Did the message get through to the server? If it did, did the operation actually complete successfully? Is it safe to retry an operation when you don&apos;t know whether it ever reached the server or whether it failed there? In other words, is the operation idempotent? You need to consider questions like these when building distributed systems. ZooKeeper cannot prevent network problems or partial failures, but once you are aware of the kinds of problems that can arise, you are much better prepared to deal with them when (not if) they occur. ZooKeeper does provide certain guarantees regarding data consistency and atomicity that can aid you when building systems, as you will see later.&lt;/p&gt;
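
&lt;p&gt;The idempotency question above can be captured in code. The sketch below is a hypothetical retry helper, not part of ZooKeeper or any library: it retries an operation only when the caller has declared it idempotent, since retrying a non-idempotent operation whose first attempt may already have reached the server risks applying it twice.&lt;/p&gt;

```java
import java.util.concurrent.Callable;

// Hypothetical helper: retry an operation only when it is safe (idempotent) to do so.
public class Retry {
    public static <T> T call(Callable<T> op, boolean idempotent, int maxAttempts) throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                last = e;
                if (!idempotent) {
                    // The first attempt may already have taken effect on the server,
                    // so blindly retrying could apply the operation twice.
                    throw e;
                }
            }
        }
        throw last; // all attempts failed
    }
}
```

&lt;p&gt;Real systems would also add backoff between attempts and distinguish connection loss from application errors, but the core decision stays the same: only retry when the operation is idempotent.&lt;/p&gt;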

&lt;h1&gt;Conclusion to Part 1&lt;/h1&gt;

&lt;p&gt;In this blog we&apos;ve learned that ZooKeeper is a distributed coordination service that facilitates loose coupling between distributed components. It is implemented as a distributed, hierarchical file system and you can use it to build distributed data structures such as locks, queues, and so on. In the next blog, we&apos;ll take a test drive of ZooKeeper using its command line shell.&lt;/p&gt;

&lt;h1&gt;References&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;Presentation on ZooKeeper, &lt;a href=&quot;http://www.slideshare.net/scottleber/apache-zookeeper&quot;&gt;http://www.slideshare.net/scottleber/apache-zookeeper&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;ZooKeeper web site, &lt;a href=&quot;http://zookeeper.apache.org/&quot;&gt;http://zookeeper.apache.org/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Projects powered by ZooKeeper, &lt;a href=&quot;https://cwiki.apache.org/confluence/display/ZOOKEEPER/PoweredBy&quot;&gt;https://cwiki.apache.org/confluence/display/ZOOKEEPER/PoweredBy&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Fallacies of Distributed Computing, &lt;a href=&quot;http://en.wikipedia.org/wiki/Fallacies_of_Distributed_Computing&quot;&gt;http://en.wikipedia.org/wiki/Fallacies_of_Distributed_Computing&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Hadoop web site, &lt;a href=&quot;http://hadoop.apache.org/&quot;&gt;http://hadoop.apache.org/&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Hadoop: The Definitive Guide, &lt;a href=&quot;http://bit.ly/hadoop-definitive-guide&quot;&gt;http://bit.ly/hadoop-definitive-guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Apache Blur (incubating) web site, &lt;a href=&quot;http://incubator.apache.org/blur/&quot;&gt;http://incubator.apache.org/blur/&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
</description>          </item>
    <item>
    <guid isPermaLink="true">http://www.sleberknight.com/blog/sleberkn/entry/hadoop_presentation_at_nova_dc</guid>
    <title>Hadoop Presentation at NOVA/DC Java Users Group</title>
    <dc:creator>sleberkn</dc:creator>
    <link>http://www.sleberknight.com/blog/sleberkn/entry/hadoop_presentation_at_nova_dc</link>
        <pubDate>Tue, 10 May 2011 01:02:46 +0000</pubDate>
    <category>Development</category>
    <category>hadoop</category>
    <category>java</category>
    <category>hive</category>
            <description>&lt;p&gt;Last Thursday (on Cinco de Mayo) I gave a presentation on &lt;a href=&quot;http://hadoop.apache.org/&quot;&gt;Hadoop&lt;/a&gt; and &lt;a href=&quot;http://hive.apache.org/&quot;&gt;Hive&lt;/a&gt; at the &lt;a href=&quot;http://www.meetup.com/dc-jug/&quot;&gt;Nova/DC Java Users Group&lt;/a&gt;. As several people asked about getting the slides, I&apos;ve shared them &lt;a href=&quot;http://www.slideshare.net/scottleber/hadoop-7904044&quot;&gt;here&lt;/a&gt; on Slideshare. I also posted the presentation sample code on Github at &lt;a href=&quot;https://github.com/sleberknight/basic-hadoop-examples&quot;&gt;basic-hadoop-examples&lt;/a&gt;.&lt;/p&gt;</description>          </item>
    <item>
    <guid isPermaLink="true">http://www.sleberknight.com/blog/sleberkn/entry/what_s_in_jdk_7</guid>
    <title>What&apos;s in JDK 7 Lightning Talk Slides</title>
    <dc:creator>sleberkn</dc:creator>
    <link>http://www.sleberknight.com/blog/sleberkn/entry/what_s_in_jdk_7</link>
        <pubDate>Sat, 16 Apr 2011 11:06:24 +0000</pubDate>
    <category>Development</category>
    <category>jdk7</category>
    <category>java</category>
            <description>&lt;p&gt;Yesterday at the &lt;a href=&quot;http://www.nearinfinity.com&quot;&gt;Near Infinity&lt;/a&gt; 2011 Spring Conference I gave a talk on CoffeeScript (see &lt;a href=&quot;http://www.nearinfinity.com/blogs/scott_leberknight/coffeescript_slides.html&quot;&gt; here&lt;/a&gt;) and a very short lightning talk on what exactly is in JDK 7. You can find the slides for the JDK 7 talk &lt;a href=&quot;http://www.slideshare.net/scottleber/wtf-is-in-javajdkwtf7&quot;&gt;here&lt;/a&gt; if you&apos;re interested.&lt;/p&gt;</description>          </item>
    <item>
    <guid isPermaLink="true">http://www.sleberknight.com/blog/sleberkn/entry/coffeescript_slides</guid>
    <title>CoffeeScript Slides</title>
    <dc:creator>sleberkn</dc:creator>
    <link>http://www.sleberknight.com/blog/sleberkn/entry/coffeescript_slides</link>
        <pubDate>Fri, 15 Apr 2011 15:30:14 +0000</pubDate>
    <category>Development</category>
    <category>coffeescript</category>
    <category>javascript</category>
            <description>&lt;p&gt;Today is the &lt;a href=&quot;http://www.nearinfinity.com&quot;&gt;Near Infinity&lt;/a&gt; Spring Conference. We have one conference in the fall and one in the spring for all our developers as well as invited guests. Today I gave a presentation on &lt;a href=&quot;http://coffeescript.org&quot;&gt;CoffeeScript&lt;/a&gt; and shared the slides &lt;a href=&quot;http://www.slideshare.net/scottleber/coffeescript-7642999&quot;&gt;here&lt;/a&gt;.&lt;/p&gt;</description>          </item>
    <item>
    <guid isPermaLink="true">http://www.sleberknight.com/blog/sleberkn/entry/introducing_rjava</guid>
    <title>Introducing RJava</title>
    <dc:creator>sleberkn</dc:creator>
    <link>http://www.sleberknight.com/blog/sleberkn/entry/introducing_rjava</link>
        <pubDate>Fri, 1 Apr 2011 00:00:00 +0000</pubDate>
    <category>Development</category>
    <category>jruby</category>
    <category>java</category>
    <category>ruby</category>
            <description>&lt;p&gt;You&#8217;ve no doubt heard about JRuby, which lets you run Ruby code on the JVM. This is nice, but wouldn&#8217;t it be nicer if you could write Java code on a Ruby VM? This would let you take advantage of the power of Ruby 1.9&#8217;s new YARV (Yet Another Ruby VM) interpreter while letting you write code in a statically-typed language. Without further ado, I&#8217;d like to introduce &lt;strong&gt;RJava&lt;/strong&gt;, which does just that!&lt;/p&gt;

&lt;p&gt;RJava lets you write code in Java and run it on a Ruby VM! And you still get the full benefit of the Java compiler to ensure your code is 100% correct. Of course with Java you also get checked exceptions and proper interfaces and abstract classes to ensure compliance with your design. You no longer need to worry about whether an object responds to a random message, because the Java compiler will enforce that it does.&lt;/p&gt;

&lt;p&gt;You get all this and more but on the power and flexibility of a Ruby VM. And because Java does not support closures, you are ensured that everything is properly designed since you&#8217;ll be able to define interfaces and then implement anonymous inner classes just like you&#8217;re used to doing! Even when JDK 8 arrives sometime in the future with lambdas, you can rest assured that they will be statically typed.&lt;/p&gt;

&lt;p&gt;As a first example, let&#8217;s see how you could filter a collection in RJava to find only the even numbers from one to ten. In Ruby you&#8217;d probably write something like this:&lt;/p&gt;

&lt;pre class=&quot;prettyprint&quot;&gt;
evens = (1..10).find_all { |n| n % 2 == 0 }
&lt;/pre&gt;

&lt;p&gt;With RJava, you&#8217;d write this:&lt;/p&gt;

&lt;pre class=&quot;prettyprint&quot;&gt;
List&amp;lt;Integer&amp;gt; evens = new ArrayList&amp;lt;Integer&amp;gt;();
for (int i = 1; i &amp;lt;= 10; i++) {
  if (i % 2 == 0) {
    evens.add(i);
  }
}
&lt;/pre&gt;

&lt;p&gt;This example shows the benefits of declaring variables with specific types, how you can use interfaces (e.g. List in the example) when declaring variables, and shows how you also get the benefits of Java generics to ensure your collections are always type-safe. Without any doubt you know that &#8220;evens&#8221; is a List containing Integers and that &#8220;i&#8221; is an int, so you can sleep soundly knowing your code is correct. You can also see Java&#8217;s powerful &#8220;for&#8221; loop at work here, to easily traverse from 1 to 10, inclusive. Finally, you saw how to effectively use Java&#8217;s braces to organize code to clearly show blocks, and semi-colons ensure you always know where lines terminate.&lt;/p&gt;

&lt;p&gt;I&#8217;ve just released &lt;a href=&quot;https://github.com/sleberknight/rjava&quot; onclick=&quot;alert(&apos;April Fools!&apos;); return false;&quot;&gt;RJava&lt;/a&gt; on GitHub, so go check it out. Please download RJava today and give it a try and let me know what you think!&lt;/p&gt;</description>          </item>
    <item>
    <guid isPermaLink="true">http://www.sleberknight.com/blog/sleberkn/entry/database_backed_refreshable_beans_with</guid>
    <title>Database-Backed Refreshable Beans with Groovy and Spring 3</title>
    <dc:creator>sleberkn</dc:creator>
    <link>http://www.sleberknight.com/blog/sleberkn/entry/database_backed_refreshable_beans_with</link>
        <pubDate>Sat, 30 Oct 2010 11:57:05 +0000</pubDate>
    <category>Development</category>
    <category>spring</category>
    <category>groovy</category>
            <description>&lt;p&gt;In 2009 I published a &lt;a href=&quot;http://www.ibm.com/developerworks/views/java/libraryview.jsp?search_by=groovier+spring&quot;&gt;two-part&lt;/a&gt; series of articles on IBM developerWorks entitled &lt;a href=&quot;http://www.ibm.com/developerworks/java/library/j-groovierspring1.html&quot;&gt;Groovier&lt;/a&gt; &lt;a href=&quot;http://www.ibm.com/developerworks/java/library/j-groovierspring2.html&quot;&gt;Spring&lt;/a&gt;. The articles showed how Spring supports implementing beans in Groovy whose behavior can be changed at runtime via the &quot;refreshable beans&quot; feature. This feature essentially detects when a Spring bean backed by a Groovy script has changed, recompiles it, and replaces the old bean with the new one. This feature is pretty powerful in certain scenarios, for example in PDF generation; mail or any kind of template generation; and as a way to implement runtime modifiable business rules. One specific use case I showed was how to implement PDF generation where the Groovy scripts reside in a database, allowing you to change how PDFs are generated by simply updating Groovy scripts in your database.&lt;/p&gt;

&lt;p&gt;In order to load Groovy scripts from a database, I showed how to implement custom &lt;code&gt;ScriptFactoryPostProcessor&lt;/code&gt; and &lt;code&gt;ScriptSource&lt;/code&gt; classes. The &lt;code&gt;CustomScriptFactoryPostProcessor&lt;/code&gt; extends the default Spring &lt;code&gt;ScriptFactoryPostProcessor&lt;/code&gt; and overrides the &lt;code&gt;convertToScriptSource&lt;/code&gt; method to recognize a database-based script, e.g. you could specify a script source of &lt;code&gt;database:com/nearinfinity/demo/GroovyPdfGenerator.groovy&lt;/code&gt;. There is also &lt;code&gt;DatabaseScriptSource&lt;/code&gt; that implements the &lt;code&gt;ScriptSource&lt;/code&gt; interface and which knows how to load Groovy scripts from a database.&lt;/p&gt;

&lt;p&gt;In order to put these pieces together, you need to do a bit of configuration. In the articles I used Spring 2.5.x which was &lt;i&gt;current at the time in early 2009&lt;/i&gt;. The configuration looked like this:&lt;/p&gt;

&lt;pre class=&quot;prettyprint&quot;&gt;
&amp;lt;bean id=&quot;dataSource&quot;
  class=&quot;org.springframework.jdbc.datasource.DriverManagerDataSource&quot;&amp;gt;
    &amp;lt;!-- set data source props, e.g. driverClassName, url, username, password... --&amp;gt;
&amp;lt;/bean&amp;gt;

&amp;lt;bean id=&quot;scriptFactoryPostProcessor&quot;
  class=&quot;com.nearinfinity.spring.scripting.support.CustomScriptFactoryPostProcessor&quot;&amp;gt;
    &amp;lt;property name=&quot;dataSource&quot; ref=&quot;dataSource&quot;/&amp;gt;
&amp;lt;/bean&amp;gt;

&amp;lt;lang:groovy id=&quot;pdfGenerator&quot;
  script-source=&quot;database:com/nearinfinity/demo/DemoGroovyPdfGenerator.groovy&quot;&amp;gt;
    &amp;lt;lang:property name=&quot;companyName&quot; value=&quot;Database Groovy Bookstore&quot;/&amp;gt;
&amp;lt;/lang:groovy&amp;gt;
&lt;/pre&gt;

&lt;p&gt;In Spring 2.5.x this works because the &lt;code&gt;&amp;lt;lang:groovy&amp;gt;&lt;/code&gt; tag looks for a Spring bean with id &quot;scriptFactoryPostProcessor&quot;; if one exists it uses it, and if not it creates one. In the above configuration we created our own &quot;scriptFactoryPostProcessor&quot; bean for &lt;code&gt;&amp;lt;lang:groovy&amp;gt;&lt;/code&gt; tags to utilize. So all&apos;s well...until you move to Spring 3.x, at which point the above configuration no longer works. This was pointed out to me by Jo&#227;o from Brazil, who tried the sample code in the articles with Spring 3.x, and it did not work. After trying a bunch of things, we eventually determined that in Spring 3.x the &lt;code&gt;&amp;lt;lang:groovy&amp;gt;&lt;/code&gt; tag looks for a &lt;code&gt;ScriptFactoryPostProcessor&lt;/code&gt; bean whose id is &quot;org.springframework.scripting.config.scriptFactoryPostProcessor&quot;, not just &quot;scriptFactoryPostProcessor.&quot; Once you figure this out, it is easy to change the above configuration to:&lt;/p&gt;

&lt;pre class=&quot;prettyprint&quot;&gt;
&amp;lt;bean id=&quot;org.springframework.scripting.config.scriptFactoryPostProcessor&quot;
  class=&quot;com.nearinfinity.spring.scripting.support.CustomScriptFactoryPostProcessor&quot;&amp;gt;
    &amp;lt;property name=&quot;dataSource&quot; ref=&quot;dataSource&quot;/&amp;gt;
&amp;lt;/bean&amp;gt;

&amp;lt;lang:groovy id=&quot;pdfGenerator&quot;
  script-source=&quot;database:com/nearinfinity/demo/DemoGroovyPdfGenerator.groovy&quot;&amp;gt;
    &amp;lt;lang:property name=&quot;companyName&quot; value=&quot;Database Groovy Bookstore&quot;/&amp;gt;
&amp;lt;/lang:groovy&amp;gt;
&lt;/pre&gt;

&lt;p&gt;Then, everything works as expected and the Groovy scripts can reside in your database and be automatically reloaded when you change them. So if you download the article sample code as-is, it will work since the bundled Spring version is 2.5.4, but if you update to Spring 3.x then you&apos;ll need to modify the configuration in applicationContext.xml for example #7 (EX #7) as shown above to change the &quot;scriptFactoryPostProcessor&quot; bean to be &quot;org.springframework.scripting.config.scriptFactoryPostProcessor.&quot; Note there is a scheduled JIRA issue &lt;a href=&quot;https://jira.springframework.org/browse/SPR-5106&quot;&gt;SPR-5106&lt;/a&gt; that will make the &lt;code&gt;ScriptFactoryPostProcessor&lt;/code&gt; mechanism pluggable, so that you won&apos;t need to extend the default &lt;code&gt;ScriptFactoryPostProcessor&lt;/code&gt; and replace the default bean, etc. But until then, this hack continues to work pretty well.&lt;/p&gt;</description>          </item>
    <item>
    <guid isPermaLink="true">http://www.sleberknight.com/blog/sleberkn/entry/rack_lightning_talk</guid>
    <title>Rack Lightning Talk</title>
    <dc:creator>sleberkn</dc:creator>
    <link>http://www.sleberknight.com/blog/sleberkn/entry/rack_lightning_talk</link>
        <pubDate>Thu, 21 Oct 2010 20:32:38 +0000</pubDate>
    <category>Development</category>
    <category>rack</category>
    <category>ruby</category>
    <category>middleware</category>
            <description>&lt;p&gt;I gave a short lightning talk on &lt;a href=&quot;http://rack.rubyforge.org/&quot;&gt;Rack&lt;/a&gt; tonight at the &lt;a href=&quot;http://novarug.org/&quot;&gt;NovaRUG&lt;/a&gt;. It&apos;s on Slideshare &lt;a href=&quot;http://www.slideshare.net/scottleber/rack-5521616&quot;&gt;here&lt;/a&gt;. Rack is really cool because it makes creating modular functionality easy. For example, if you want exceptions mailed to you, you can use the Rack::MailExceptions middleware, or if you want responses compressed, you can add one line of code to a Rails app to use Rack::Deflater. Cool.&lt;/p&gt;</description>          </item>
    <item>
    <guid isPermaLink="true">http://www.sleberknight.com/blog/sleberkn/entry/adding_each_line_method_to</guid>
    <title>Missing the each_line method in FakeFS version 0.2.1? Add it!</title>
    <dc:creator>sleberkn</dc:creator>
    <link>http://www.sleberknight.com/blog/sleberkn/entry/adding_each_line_method_to</link>
        <pubDate>Thu, 6 May 2010 23:21:28 +0000</pubDate>
    <category>Development</category>
    <category>rspec</category>
    <category>ruby</category>
    <category>fakefs</category>
            <description>&lt;p&gt;Recently we have been using the excellent &lt;a href=&quot;http://github.com/defunkt/fakefs&quot;&gt;FakeFS&lt;/a&gt; (fake filesystem) gem in some specs to test code that reads and writes files on the filesystem. We are using the latest &lt;em&gt;release&lt;/em&gt; version of this gem, which is 0.2.1 as I am writing this. Some of the code under test uses the &lt;code&gt;IO&lt;/code&gt; &lt;code&gt;each_line&lt;/code&gt; method to iterate lines in relatively largish files. But we found out quickly that this is a problem, since in version 0.2.1 the &lt;code&gt;FakeFS::File&lt;/code&gt; class does not extend &lt;code&gt;StringIO&lt;/code&gt; and so you don&apos;t get all its methods such as &lt;code&gt;each_line&lt;/code&gt;. (The &lt;a href=&quot;http://github.com/defunkt/fakefs/blob/master/lib/fakefs/file.rb&quot;&gt;version on master in GitHub&lt;/a&gt; as I write this does extend &lt;code&gt;StringIO&lt;/code&gt;, but it is not yet released as a formal version.)&lt;/p&gt;

&lt;p&gt;As an example, suppose we have the following code that prints out the size of each line in a file as stars (asterisks):&lt;/p&gt;

&lt;style type=&quot;text/css&quot;&gt;
.prettyprint {
background-color:#EFEFEF;
border:1px solid #CCCCCC;
font-size:small;
overflow:auto;
padding:5px;
}
&lt;/style&gt;

&lt;pre class=&quot;prettyprint&quot;&gt;
def lines_to_stars(file_path)
  File.open(file_path, &apos;r&apos;).each_line { |line| puts &apos;*&apos; * line.size }
end
&lt;/pre&gt;

&lt;p&gt;Let&apos;s say we use FakeFS to create a fake file like this:&lt;/p&gt;

&lt;pre class=&quot;prettyprint&quot;&gt;
require &apos;fakefs/safe&apos;
require &apos;stringio&apos;

FakeFS.activate!

File.open(&apos;/tmp/foo.txt&apos;, &apos;w&apos;) do |f|
  f.write &quot;The quick brown fox jumped over the lazy dog\n&quot;
  f.write &quot;The quick red fox jumped over the sleepy cat\n&quot;
  f.write &quot;Jack be nimble, Jack be quick, Jack jumped over the candle stick\n&quot;
  f.write &quot;Twinkle, twinkle little star, how I wonder what you are\n&quot;
  f.write &quot;The End.&quot;
end
&lt;/pre&gt;

&lt;p&gt;So far, so good. But now if we call &lt;code&gt;lines_to_stars&lt;/code&gt; we get an error:&lt;/p&gt;

&lt;pre class=&quot;prettyprint&quot;&gt;
NoMethodError: undefined method `each_line&apos; for #&amp;lt;FakeFS::File:0x000001012c22b8&amp;gt;
&lt;/pre&gt;

&lt;p&gt;Oops. No &lt;code&gt;each_line&lt;/code&gt;. If you don&apos;t want to use an unreleased version of the gem, you can add &lt;code&gt;each_line&lt;/code&gt; onto FakeFS::File using the following code:&lt;/p&gt;

&lt;pre class=&quot;prettyprint&quot;&gt;
module FakeFS
  class File
    def each_line
      File.readlines(self.path).each { |line| yield line }
    end
  end
end
&lt;/pre&gt;

&lt;p&gt;Basically all it does is define &lt;code&gt;each_line&lt;/code&gt; so that it reads all the lines from a (fake) file on the (fake) filesystem and then yields them up one by one, so code under test that iterates over a file works as expected. Now calling &lt;code&gt;lines_to_stars&lt;/code&gt; gives a nice bar chart of the line sizes represented by stars:&lt;/p&gt;

&lt;pre class=&quot;prettyprint&quot;&gt;
********************************************
********************************************
***************************************************************
*******************************************************
********
&lt;/pre&gt;

&lt;p&gt;Since we&apos;re using RSpec, to make this work nicely we added the above code that defines &lt;code&gt;each_line&lt;/code&gt; into a file named &lt;code&gt;fakefs.rb&lt;/code&gt; in the &lt;code&gt;spec/support&lt;/code&gt; directory, since &lt;code&gt;spec_helper&lt;/code&gt; requires supporting files in the &lt;code&gt;spec/support&lt;/code&gt; directory and its subdirectories. So now all our specs automatically get the &lt;code&gt;each_line&lt;/code&gt; behavior when using FakeFS.&lt;/p&gt;
</description>          </item>
    <item>
    <guid isPermaLink="true">http://www.sleberknight.com/blog/sleberkn/entry/hibernate_performance_tuning_part_2</guid>
    <title>Hibernate Performance Tuning Part 2 Article Published</title>
    <dc:creator>sleberkn</dc:creator>
    <link>http://www.sleberknight.com/blog/sleberkn/entry/hibernate_performance_tuning_part_2</link>
        <pubDate>Mon, 21 Dec 2009 14:23:06 +0000</pubDate>
    <category>Development</category>
    <category>performance</category>
    <category>java</category>
    <category>orm</category>
    <category>hibernate</category>
            <description> &lt;p&gt;I&apos;ve just published the second article of a two-part series in the December 2009 &lt;a href=&quot;http://www.nofluffjuststuff.com/home/magazine_subscribe?id=10&quot;&gt;NFJS Magazine&lt;/a&gt; on Hibernate Performance Tuning. Here&apos;s the abstract:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Tuning performance in Hibernate applications is all about reducing the number of database queries or eliminating them entirely using caching. In the first article in this two part series, you saw how to tune object retrieval using eager fetching techniques to optimize queries and avoid lazy-loads. In this second and final article, I&#8217;ll show you how inheritance strategy affects performance, how to eliminate queries using the Hibernate second-level cache, and show some simple but effective tools you can use to monitor and profile your applications.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;If you are using Hibernate and want to know more about how inheritance affects performance, how to use the second-level cache, and some simple monitoring and profiling techniques, check it out and let me know what you think. Note that NFJS Magazine does require a subscription.&lt;/p&gt;

</description>          </item>
    <item>
    <guid isPermaLink="true">http://www.sleberknight.com/blog/sleberkn/entry/making_cobertura_reports_show_groovy</guid>
    <title>Making Cobertura Reports Show Groovy Code with Maven</title>
    <dc:creator>sleberkn</dc:creator>
    <link>http://www.sleberknight.com/blog/sleberkn/entry/making_cobertura_reports_show_groovy</link>
        <pubDate>Tue, 15 Dec 2009 23:43:37 +0000</pubDate>
    <category>Development</category>
    <category>cobertura</category>
    <category>java</category>
    <category>groovy</category>
    <category>maven</category>
            <description>&lt;p&gt;A recent project started out life as an all-Java project that used Maven as the build tool. Initially we used &lt;a href=&quot;http://www.atlassian.com/software/clover/&quot;&gt;Atlassian Clover&lt;/a&gt; to measure unit test coverage. Clover is a great product, but unfortunately it only works with Java code because it operates at the Java source level. (This was the case as of Spring 2009, and I haven&apos;t checked since then.) As we started migrating existing code from Java to Groovy and writing new code in Groovy, we started to lose data about unit test coverage because Clover does not understand Groovy code. To remedy this problem we switched from Clover to &lt;a href=&quot;http://cobertura.sourceforge.net/&quot;&gt;Cobertura&lt;/a&gt;, which instruments at the bytecode level and thus works with Groovy code. Theoretically it would also work with any JVM-based language, but I&apos;m not sure whether it could handle something like Clojure.&lt;/p&gt;

&lt;p&gt;In any case, we only cared about Groovy so Cobertura was a good choice. With the &lt;a href=&quot;http://mojo.codehaus.org/cobertura-maven-plugin/&quot;&gt;Cobertura Maven&lt;/a&gt; plugin we quickly found a problem, which was that even though the code coverage was running, the reports only showed coverage for Java code, not Groovy. This blog shows you how to display coverage on Groovy code  when using Maven and the Cobertura plugin. In other words, I&apos;ll show how to get Cobertura reports to link to the real Groovy source code in Maven, so you can navigate Cobertura reports as you normally would.&lt;/p&gt;

&lt;p&gt;The core problem is pretty simple, though it took me a while to figure out how to fix it. Seems to be pretty standard in Maven: I know what I want to do, but finding out how to do it is the &lt;i&gt;really&lt;/i&gt; hard part. The only thing you need to do is tell Maven about the Groovy source code and where it lives. The way I did this is to use the Codehaus &lt;a href=&quot;http://mojo.codehaus.org/build-helper-maven-plugin/&quot;&gt;build-helper-maven-plugin&lt;/a&gt; which has an add-source goal. The add-source goal does just what you would expect; it adds a specified directory (or directories) as a source directory in your Maven build. Here&apos;s how you use it in your Maven pom.xml file:&lt;/p&gt;

&lt;style type=&quot;text/css&quot;&gt;
.prettyprint {
background-color:#EFEFEF;
border:1px solid #CCCCCC;
font-size:small;
overflow:auto;
padding:5px;
}
&lt;/style&gt;

&lt;pre class=&quot;prettyprint&quot;&gt;
&amp;lt;plugin&amp;gt;
    &amp;lt;groupId&amp;gt;org.codehaus.mojo&amp;lt;/groupId&amp;gt;
    &amp;lt;artifactId&amp;gt;build-helper-maven-plugin&amp;lt;/artifactId&amp;gt;
    &amp;lt;executions&amp;gt;
        &amp;lt;execution&amp;gt;
            &amp;lt;phase&amp;gt;generate-sources&amp;lt;/phase&amp;gt;
            &amp;lt;goals&amp;gt;
                &amp;lt;goal&amp;gt;add-source&amp;lt;/goal&amp;gt;
            &amp;lt;/goals&amp;gt;
            &amp;lt;configuration&amp;gt;
                &amp;lt;sources&amp;gt;
                    &amp;lt;source&amp;gt;src/main/groovy&amp;lt;/source&amp;gt;
                &amp;lt;/sources&amp;gt;
            &amp;lt;/configuration&amp;gt;
        &amp;lt;/execution&amp;gt;
    &amp;lt;/executions&amp;gt;
&amp;lt;/plugin&amp;gt;
&lt;/pre&gt;

&lt;p&gt;In the above code snippet, we&apos;re  using the &quot;build-helper-maven-plugin&quot; to add the src/main/groovy directory. That&apos;s pretty much it. Run Cobertura as normal, view the reports, and you should now see coverage on Groovy source code as well as Java.&lt;/p&gt;
</description>          </item>
    <item>
    <guid isPermaLink="true">http://www.sleberknight.com/blog/sleberkn/entry/hibernate_performance_tuning_part_11</guid>
    <title>Hibernate Performance Tuning Part 1 Article Published</title>
    <dc:creator>sleberkn</dc:creator>
    <link>http://www.sleberknight.com/blog/sleberkn/entry/hibernate_performance_tuning_part_11</link>
        <pubDate>Tue, 1 Dec 2009 19:39:51 +0000</pubDate>
    <category>Development</category>
    <category>performance</category>
    <category>java</category>
    <category>orm</category>
    <category>hibernate</category>
            <description> &lt;p&gt;I&apos;ve just published an article in the November 2009 &lt;a href=&quot;http://www.nofluffjuststuff.com/home/magazine_subscribe?id=9&quot;&gt;NFJS Magazine&lt;/a&gt; on Hibernate Performance Tuning. Here&apos;s the abstract:&lt;/p&gt;

&lt;p&gt;&lt;em&gt;Many developers treat Hibernate like a &quot;black box&quot; and assume it will simply &quot;Do the Right Thing&quot; when it comes to all things related to the underlying database. This is a faulty assumption because, while Hibernate is great at the mechanics of database interaction, it cannot and will likely not ever be able to figure out the specific details of your domain model and discern the most efficient and best performing data access strategies. In this first article of a two-part series, I&apos;ll show you how to achieve better performance in your Hibernate applications by focusing on tuning object retrieval, which forms the basis of your &quot;fetch plan&quot; for finding and storing objects in the database.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;If you are using Hibernate and want to know more about how to change how objects are fetched from the database, check it out and let me know what you think. Note that NFJS Magazine does require a subscription.&lt;/p&gt;</description>          </item>
    <item>
    <guid isPermaLink="true">http://www.sleberknight.com/blog/sleberkn/entry/can_java_be_saved</guid>
    <title>Can Java Be Saved?</title>
    <dc:creator>sleberkn</dc:creator>
    <link>http://www.sleberknight.com/blog/sleberkn/entry/can_java_be_saved</link>
        <pubDate>Mon, 9 Nov 2009 15:37:25 +0000</pubDate>
    <category>Development</category>
    <category>python</category>
    <category>closure</category>
    <category>java</category>
    <category>c#</category>
    <category>groovy</category>
    <category>clojure</category>
            <description>&lt;h3&gt;Java and Evolution&lt;/h3&gt;

&lt;p&gt;The Java language has been around for a pretty long time, and in my view is now a stagnant language. I don&apos;t consider it &lt;a href=&quot;http://codemonkeyism.com/java-dead/&quot;&gt;dead&lt;/a&gt; because I believe it will be around for probably decades if not longer. But it appears to have reached its evolutionary peak, and it doesn&apos;t look like it&apos;s going to evolve any further. This is not due to problems inherent in the language itself. Instead the problem seems to lie with Java&apos;s stewards (Sun and the JCP) and their unwillingness to evolve the language to keep it current and modern, and, more importantly, their goal of keeping backward compatibility at all costs. It&apos;s not just Sun: the large corporations with correspondingly large investments in Java, like IBM and Oracle, aren&apos;t exactly champing at the bit to improve Java either. I don&apos;t know whether they think it needs improvement at all. So really, the ultra-conservative attitude toward change and evolution is the problem with Java, from my admittedly limited view of things.&lt;/p&gt;

&lt;p&gt;That&apos;s why I don&apos;t hate Java. But I do hate the way it has been treated by the people charged with improving it. It is clear many in the Java community want things like closures and a native property syntax, but instead we got Project Coin. This, to me, is really sad. It is a shame that things like closures and native properties were not addressed in Java/JDK/whatever-it-is-called 7.&lt;/p&gt;

&lt;h3&gt;Why Not?&lt;/h3&gt;

&lt;p&gt;I want to know why Java can&apos;t be improved. We have concrete examples that it is possible to change a major language in major ways, even in ways that break backward compatibility, in order to evolve and improve. Out with the old, in with the new. Microsoft showed with C# that you can successfully evolve a language over time in major ways. For example, C# has always had a property syntax, but it now also has many features found in dynamically typed and functional languages, such as type inference and, effectively, closures. With LINQ it introduced functional concepts. When C# added generics it did so correctly and retained the type information in the compiled IL, whereas Java used type erasure and simply dropped the types from the compiled bytecode. There is a great irony here: though C# began life about five or six years after Java, it has not only caught up but surpassed Java in most if not all ways, and has continued to evolve while Java has become stagnant.&lt;/p&gt;

&lt;p&gt;C# is not the only example. Python 3 is a major overhaul of the Python language, and it introduced breaking changes that are not backwards compatible. I believe they provide a migration tool to assist you should you want to move from the 2.x series to version 3 and beyond. Microsoft has done this kind of thing as well. I remember when they made Visual Basic conform to the .NET platform and introduced some rather gut-wrenching (for VB developers anyway) changes, and they also provided a tool to aid the transition. One more recent example is Objective-C, which has experienced a resurgence in importance mainly because of the iPhone. Objective-C has been around since the 1980s, longer than Java, C#, Ruby, Python, and the rest. Apple has made improvements to Objective-C, and it now sports a way to define and synthesize properties and most recently added blocks (effectively closures). If a language that pre-dates Java can evolve (Python also pre-dates Java, by the way), I just don&apos;t get why Java can&apos;t.&lt;/p&gt;

&lt;p&gt;While it is certainly possible to remain on older versions of software, forcing yourself to upgrade can be a Good Thing, because it ensures you don&apos;t get the &quot;COBOL Syndrome&quot; where you end up with nothing but binaries that have to run on a specific hardware platform forever and you are trapped until you rewrite or you go out of business. The other side of this, of course, is that organizations don&apos;t have infinite time, money, and resources to update every single application. Sometimes this too can be good, because it forces you to triage older systems, and possibly consolidate or outright eliminate them if they have outlived their usefulness. In order to facilitate large transitions, I believe it is very important to use tools that help automate the upgrade process, e.g. tools that analyze code and fix it if possible (reporting all changes in a log) and which provide warnings and guidance when a simple fix isn&apos;t possible.&lt;/p&gt;

&lt;h3&gt;The JVM Platform&lt;/h3&gt;

&lt;p&gt;Before I get into the changes I&apos;d make to Java so that it doesn&apos;t feel like I&apos;m developing in a straitjacket while typing masses of unnecessary boilerplate code, I want to say that I think the JVM is a great place to be. Obviously the JVM itself facilitates developing all kinds of languages, as evidenced by the huge number of languages that run on the JVM. The most popular and most interesting ones these days are probably JRuby, Scala, Groovy, and Clojure, though there are probably hundreds more. So I suppose you could make an argument that Java doesn&apos;t need to evolve any more because we can simply use a more modern language that runs on the JVM.&lt;/p&gt;

&lt;p&gt;The main problem I have with that argument is simply that there is already a ton of Java code out there, and there are many organizations who are simply not going to allow other JVM-based languages; they&apos;re going to stick with Java for the long haul, right or wrong. This means that even if you can manage to convince someone to try writing that shiny new web app using Scala and its Lift framework, JRuby on Rails, Grails, or Clojure, chances are at some point you&apos;ll also need to maintain or enhance existing large Java codebases. Wouldn&apos;t you like to be able to first upgrade to a version of Java that has closures, native property syntax, method/property handles, etc.?&lt;/p&gt;

&lt;p&gt;Next I&apos;ll lay out my top three choices for making Java much better immediately.&lt;/p&gt;

&lt;h3&gt;Top Three Java Improvements&lt;/h3&gt;

&lt;p&gt;If given the chance to change just three things about Java to make it better, I would choose these:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Remove checked exceptions&lt;/li&gt;
&lt;li&gt;Add closures&lt;/li&gt;
&lt;li&gt;Add formal property support&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;I think these three changes alone would make coding in Java much, much better. Let&apos;s see how.&lt;/p&gt;

&lt;h4&gt;Remove Checked Exceptions&lt;/h4&gt;

&lt;p&gt;By removing checked exceptions you eliminate a ton of boilerplate try/catch blocks that do nothing except log a message or wrap and re-throw as a RuntimeException, throws clauses polluting the API all over the place, and, worst of all, empty catch blocks that can cause very subtle and evil bugs. With unchecked exceptions, developers still have the option to catch exceptions that they can actually handle. It would be interesting to see how many times in a typical Java codebase people actually handle exceptions and do something at the point of exception, or whether they simply punt it away for the caller to handle, who in turn also punts, and so forth all the way up the call stack until some global handler catches it or the program crashes. If I were a betting man, I&apos;d bet a lot of money that for most applications, developers punt the vast majority of the time. So why force people to handle something they cannot possibly handle?&lt;/p&gt;
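&lt;p&gt;To illustrate the punting pattern, here is a minimal sketch; the &lt;code&gt;readConfig&lt;/code&gt; and &lt;code&gt;doRead&lt;/code&gt; methods are hypothetical, purely for illustration:&lt;/p&gt;

```java
import java.io.IOException;

public class PuntExample {

    // Typical boilerplate: catch a checked exception we cannot actually
    // handle, wrap it, and re-throw as an unchecked RuntimeException.
    static String readConfig(boolean fail) {
        try {
            return doRead(fail);
        } catch (IOException e) {
            // Nothing useful to do here; punt up the call stack.
            throw new RuntimeException("Unable to read config", e);
        }
    }

    // Hypothetical low-level method that declares a checked exception
    static String doRead(boolean fail) throws IOException {
        if (fail) {
            throw new IOException("boom");
        }
        return "config-value";
    }

    public static void main(String[] args) {
        System.out.println(readConfig(false));
    }
}
```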

&lt;h4&gt;Add Closures&lt;/h4&gt;

&lt;p&gt;I specifically listed removing checked exceptions first, because to me it is the first step to being able to have a closure/block syntax that isn&apos;t totally horrendous. If you remove checked exceptions, then adding closures would seem to be much easier since you don&apos;t need to worry at all about what exceptions could possibly be thrown and there is obviously no need to declare exceptions. Closures/blocks would lead to better ability to handle collections, for example as in Groovy but in Java you would still have types (note I&apos;m also using a literal property syntax here):&lt;/p&gt;

&lt;style type=&quot;text/css&quot;&gt;
.prettyprint {
background-color:#EFEFEF;
border:1px solid #CCCCCC;
font-size:small;
overflow:auto;
padding:5px;
}
&lt;/style&gt;

&lt;pre class=&quot;prettyprint&quot;&gt;
// Find all people whose last name is &quot;Smith&quot;
List&amp;lt;Person&amp;gt; peeps = people.findAll { Person person -&gt; person.lastName.equals(&quot;Smith&quot;); }
&lt;/pre&gt;

&lt;p&gt;or&lt;/p&gt;

&lt;pre class=&quot;prettyprint&quot;&gt;
// Create a list of names by projecting the name property of a bunch of Person objects
List&amp;lt;String&amp;gt; names = people.collect { Person person -&gt; person.name; }
&lt;/pre&gt;

&lt;p&gt;Not quite as clean as Groovy but still much better than the for loops that would traditionally be required (or trying to shoehorn a functional style into Java using the &lt;a href=&quot;http://commons.apache.org/collections/&quot;&gt;Jakarta Commons Collections&lt;/a&gt; or &lt;a href=&quot;http://code.google.com/p/google-collections/&quot;&gt;Google Collections&lt;/a&gt;). Removal of checked exceptions would allow, as mentioned earlier, the block syntax to not have to deal with declaring exceptions all over the place. Having to declare checked exceptions in blocks makes the syntax worse instead of better, at least judging by the various closure proposals for Java/JDK/whatever 7, none of which got included. Requiring types in the blocks is still annoying, especially once you get used to Ruby and Groovy, but it would be passable.&lt;/p&gt;
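&lt;p&gt;For comparison, here is a sketch of the for-loop versions of the two snippets above that you would write in Java today (the minimal &lt;code&gt;Person&lt;/code&gt; class here is hypothetical, just enough to make the example self-contained):&lt;/p&gt;

```java
import java.util.ArrayList;
import java.util.List;

public class LoopExample {

    // Minimal hypothetical Person class for the example
    static class Person {
        final String firstName;
        final String lastName;

        Person(String firstName, String lastName) {
            this.firstName = firstName;
            this.lastName = lastName;
        }

        String getName() { return firstName + " " + lastName; }
        String getLastName() { return lastName; }
    }

    // Find all people whose last name is "Smith" -- the for-loop version
    static List<Person> findSmiths(List<Person> people) {
        List<Person> result = new ArrayList<Person>();
        for (Person person : people) {
            if (person.getLastName().equals("Smith")) {
                result.add(person);
            }
        }
        return result;
    }

    // Project the name of each Person -- the for-loop version
    static List<String> collectNames(List<Person> people) {
        List<String> names = new ArrayList<String>();
        for (Person person : people) {
            names.add(person.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        List<Person> people = new ArrayList<Person>();
        people.add(new Person("John", "Smith"));
        people.add(new Person("Jane", "Doe"));
        System.out.println(findSmiths(people).size());
        System.out.println(collectNames(people));
    }
}
```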

&lt;h4&gt;Native Property Syntax&lt;/h4&gt;

&lt;p&gt;The third change should do essentially what Groovy does for properties, but should introduce a &quot;property&quot; keyword (i.e. don&apos;t rely on whether someone accidentally put an access modifier in there, as Groovy does). The syntax could be very clean:&lt;/p&gt;

&lt;pre class=&quot;prettyprint&quot;&gt;
property String firstName;
property String lastName;
property Date dateOfBirth;
&lt;/pre&gt;

&lt;p&gt;The compiler could automatically generate the appropriate getter/setter for you like Groovy does. This obviates the need to manually code the getter/setter. Like Groovy you should be able to override either or both. It de-clutters code enormously and removes a ton of lines of silly getter/setter code (plus JavaDocs if you are actually still writing them for every get/set method). Then you could reference properties as you would expect: person.name is the &quot;getter&quot; and person.name = &quot;Fred&quot; is the &quot;setter.&quot; Much cleaner syntax, way less boilerplate code. By the way, if someone used the word &quot;property&quot; in their code, i.e. as a variable name, it is just not that difficult to rename refactor, especially with all the advanced &lt;a href=&quot;http://www.jetbrains.com/idea/&quot;&gt;IDEs&lt;/a&gt; in the Java community that do this kind of thing in their sleep.&lt;/p&gt;
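&lt;p&gt;For contrast, here is roughly what the hypothetical &lt;code&gt;firstName&lt;/code&gt; and &lt;code&gt;lastName&lt;/code&gt; property declarations above expand to in plain Java today (dateOfBirth omitted for brevity):&lt;/p&gt;

```java
// The boilerplate a "property" keyword would generate for you: a private
// field plus a hand-written getter and setter for each property.
public class Person {

    private String firstName;
    private String lastName;

    public String getFirstName() { return firstName; }
    public void setFirstName(String firstName) { this.firstName = firstName; }

    public String getLastName() { return lastName; }
    public void setLastName(String lastName) { this.lastName = lastName; }
}
```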

&lt;p&gt;Lots of other things could certainly be done, but if just these three were done I think Java would be much better off, and maybe it would even come into the 21st century like Objective-C. (See the very long but very good &lt;a href=&quot;http://arstechnica.com/apple/reviews/2009/08/mac-os-x-10-6.ars&quot;&gt;Ars Technica Snow Leopard review&lt;/a&gt; for information on Objective-C&apos;s new &lt;a href=&quot;http://arstechnica.com/apple/reviews/2009/08/mac-os-x-10-6.ars/10#blocks&quot;&gt;blocks&lt;/a&gt; feature.)&lt;/p&gt;

&lt;h3&gt;Dessert Improvements&lt;/h3&gt;

&lt;p&gt;If (as I suspect they certainly will :-) ) Sun/Oracle/whoever takes my suggestions and makes these changes and improves Java, then I&apos;m sure they&apos;ll want to add in a few more for dessert. After the main course which removes checked exceptions, adds closures, and adds native property support, dessert includes the following:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Remove type-erasure and clean up generics&lt;/li&gt;
&lt;li&gt;Add property/method handles&lt;/li&gt;
&lt;li&gt;String interpolation&lt;/li&gt;
&lt;li&gt;Type inference&lt;/li&gt;
&lt;li&gt;Remove &quot;new&quot; keyword&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;Clean Up Generics&lt;/h4&gt;

&lt;p&gt;Generics should simply not remove type information when compiled. If you&apos;re going to have generics in the first place, do it correctly and stop worrying about backward compatibility. Keep type information in the bytecode, allow reflection on it, and allow me to instantiate a &quot;new T()&quot; where T is some type passed into a factory method, for example. I think an improved generics implementation could basically copy the way C# does it and be done.&lt;/p&gt;
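&lt;p&gt;To make the &quot;new T()&quot; point concrete, here is a sketch of the workaround erasure forces on us today: the factory cannot say &lt;code&gt;new T()&lt;/code&gt;, so callers must pass a Class token to smuggle the type through to runtime (the &lt;code&gt;newInstance&lt;/code&gt; helper below is hypothetical):&lt;/p&gt;

```java
public class Factory {

    // Erasure workaround: "new T()" will not compile because T is erased,
    // so the caller supplies a Class<T> token to keep the type at runtime.
    static <T> T newInstance(Class<T> type) {
        try {
            return type.newInstance();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        StringBuilder sb = newInstance(StringBuilder.class);
        System.out.println(sb.length());
    }
}
```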

&lt;h4&gt;Property/Method Handles&lt;/h4&gt;

&lt;p&gt;Property/method &lt;a href=&quot;http://blogs.sun.com/jrose/entry/method_handles_in_a_nutshell&quot;&gt;handles&lt;/a&gt; would allow you to reference a property or method directly. They would make code that currently must use plain strings strongly typed and refactoring-safe, and much nicer (IDEs like IntelliJ already know how to search in text and strings, but they can never be perfect). For example, a particular pet peeve of mine, and I&apos;m sure of a lot of other developers, is writing Criteria queries in Hibernate. You are forced to reference properties as simple strings. If the lastName property is changed to surname then you better make sure to catch all the places the String &quot;lastName&quot; is referenced. So you could replace code like this:&lt;/p&gt;

&lt;pre class=&quot;prettyprint&quot;&gt;
session.createCriteria(Person.class)
	.add(Restrictions.eq(&quot;lastName&quot;, &quot;Smith&quot;))
	.addOrder(Order.asc(&quot;firstName&quot;))
	.list();
&lt;/pre&gt;

&lt;p&gt;with this using method/property handles:&lt;/p&gt;

&lt;pre class=&quot;prettyprint&quot;&gt;
session.createCriteria(Person.class)
	.add(Restrictions.eq(Person.lastName, &quot;Smith&quot;))
	.addOrder(Order.asc(Person.firstName))
	.list();
&lt;/pre&gt;

&lt;p&gt;Now the code is strongly-typed and refactoring-safe. JPA 2.0 tries mightily to overcome having strings in the new criteria query API with its metamodel. But I find it pretty much appalling to even look at, what with having to create or code-generate a separate &quot;metamodel&quot; class which you reference like &quot;_Person.lastName&quot; or some similar awful way. This metamodel class lives only to represent properties on your real model object for the sole purpose of making JPA 2.0 criteria queries strongly typed. It just isn&apos;t worth it and is total overkill. In fact, it reminds me of the bad old days of rampant over-engineering in Java (which apparently is still alive and well in many circles, but I try to avoid it as best I can). The right thing is to fix the language, not to invent something that adds yet more boilerplate and more complexity to an already overcomplicated ecosystem.&lt;/p&gt;

&lt;p&gt;Method handles could also be used to make calling methods using reflection much cleaner than it currently is, among other things. Similarly it would make accessing properties via reflection easier and cleaner. And with only unchecked exceptions you would not need to catch the four or five kinds of exceptions reflective code can throw.&lt;/p&gt;
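&lt;p&gt;As a concrete illustration of the current state, here is a sketch of reading a &quot;property&quot; reflectively today, along with the pile of checked exceptions that must be caught just to invoke a method (the &lt;code&gt;getProperty&lt;/code&gt; helper is hypothetical):&lt;/p&gt;

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

public class ReflectionExample {

    // Invoke a no-argument method by name. Note the checked exceptions
    // that reflective code forces you to catch or declare.
    static Object getProperty(Object target, String methodName) {
        try {
            Method method = target.getClass().getMethod(methodName);
            return method.invoke(target);
        } catch (NoSuchMethodException e) {
            throw new RuntimeException(e);
        } catch (IllegalAccessException e) {
            throw new RuntimeException(e);
        } catch (InvocationTargetException e) {
            throw new RuntimeException(e.getCause());
        }
    }

    public static void main(String[] args) {
        System.out.println(getProperty("hello", "length"));
    }
}
```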

&lt;h4&gt;String Interpolation&lt;/h4&gt;

&lt;p&gt;String interpolation is like the sorbet that you get at fancy restaurants to cleanse your palate. This would seem to be a no-brainer to add. You could make code like:&lt;/p&gt;

&lt;pre class=&quot;prettyprint&quot;&gt;
log.error(&quot;The object of type  [&quot;
    + foo.getClass().getName()
    + &quot;] and identifier [&quot;
    + foo.getId()
    + &quot;] does not exist.&quot;, cause);
&lt;/pre&gt;

&lt;p&gt;turn into this much more palatable version (using the native property syntax I mentioned earlier):&lt;/p&gt;

&lt;pre class=&quot;prettyprint&quot;&gt;
log.error(&quot;The object of type [${foo.class.name}] and identifier [${foo.id}] does not exist.&quot;, cause);
&lt;/pre&gt;
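&lt;p&gt;Short of real interpolation, the closest plain Java gets today is &lt;code&gt;String.format&lt;/code&gt; with positional placeholders; a sketch (the &lt;code&gt;describeMissing&lt;/code&gt; helper is just for illustration):&lt;/p&gt;

```java
public class FormatExample {

    // The nearest current approximation to interpolation: positional
    // %s placeholders instead of embedded ${...} expressions.
    static String describeMissing(Object foo, Object id) {
        return String.format(
                "The object of type [%s] and identifier [%s] does not exist.",
                foo.getClass().getName(), id);
    }

    public static void main(String[] args) {
        System.out.println(describeMissing("some object", 42L));
    }
}
```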

&lt;h4&gt;Type Inference&lt;/h4&gt;

&lt;p&gt;I&apos;d also suggest adding type inference, if only for local variables like C# does. Why do we have to repeat ourselves? Instead of writing:&lt;/p&gt;

&lt;pre class=&quot;prettyprint&quot;&gt;
Person person = new Person();
&lt;/pre&gt;

&lt;p&gt;why can&apos;t we just write:&lt;/p&gt;

&lt;pre class=&quot;prettyprint&quot;&gt;
var person = new Person();
&lt;/pre&gt;

&lt;p&gt;I have to believe the compiler and all the tools are smart enough to infer the type from the &quot;new Person()&quot;. Especially since other strongly-typed JVM languages like Scala do exactly this kind of thing.&lt;/p&gt;

&lt;h4&gt;Eliminate &quot;new&quot;&lt;/h4&gt;

&lt;p&gt;Last but not least, and actually not the last thing I can think of but definitely the last I&apos;m writing about here, let&apos;s get rid of the &quot;new&quot; keyword and either go with Ruby&apos;s new &lt;i&gt;method&lt;/i&gt; or Python&apos;s constructor syntax, like so:&lt;/p&gt;

&lt;pre class=&quot;prettyprint&quot;&gt;
// Ruby-like new method
var person = Person.new()

// or Python-like construction
var person = Person()
&lt;/pre&gt;

&lt;p&gt;This one came to me recently after hearing &lt;a href=&quot;http://en.wikipedia.org/wiki/Bruce_Eckel&quot;&gt;Bruce Eckel&lt;/a&gt; give an excellent talk on language evolution and archaeology. He had a ton of really interesting examples of why things are the way they are, and how Java and other languages like C++ evolved from C. One example was the reason for &quot;new&quot; in Java. In C++ you can allocate objects on the stack or the heap, so there is a stack-based constructor syntax that does not use &quot;new&quot; while the heap-based constructor syntax uses the &quot;new&quot; operator. Even though Java only has heap-based object allocation, it retained the &quot;new&quot; keyword, which is not only boilerplate code but also makes the entire process of object construction inflexible: you cannot change anything about it, nor can you easily add hooks into the object creation process.&lt;/p&gt;

&lt;p&gt;I am not an expert at all in the low-level details, and Bruce obviously knows what he is &lt;a href=&quot;http://www.amazon.com/Thinking-C-2-Practical-Programming/dp/0130353132/&quot;&gt;talking&lt;/a&gt; &lt;a href=&quot;http://www.amazon.com/Thinking-Java-4th-Bruce-Eckel/dp/0131872486/&quot;&gt;about&lt;/a&gt; way more than I do, but I can say that I believe the Ruby and Python syntaxes are not only nicer but more internally consistent, especially in the Ruby case because there is no special magic or sauce going on. In Ruby, new is just a method, on a class, just like everything else.&lt;/p&gt;

&lt;h3&gt;Conclusion to this Way Too Long Blog Entry&lt;/h3&gt;

&lt;p&gt;I did not actually set out to write a blog entry whose length is worthy of a &lt;a href=&quot;http://blogs.tedneward.com/&quot;&gt;Ted Neward&lt;/a&gt; blog. It just turned out that way. (And I do in fact like reading Ted&apos;s long blogs!) Plus, I found out that &lt;a href=&quot;http://en.wikipedia.org/wiki/Speculative_fiction&quot;&gt;speculative fiction&lt;/a&gt; can be pretty fun to write, since I don&apos;t think any of these things are going to make it into Java anytime soon, if ever, and I&apos;m sure there are lots of people in the Java world who hate things like Ruby and won&apos;t agree anyway.&lt;/p&gt;
    <item>
    <guid isPermaLink="true">http://www.sleberknight.com/blog/sleberkn/entry/several_must_have_firebug_related</guid>
    <title>Several Must Have Firebug-Related Firefox Extensions</title>
    <dc:creator>sleberkn</dc:creator>
    <link>http://www.sleberknight.com/blog/sleberkn/entry/several_must_have_firebug_related</link>
        <pubDate>Mon, 28 Sep 2009 12:54:20 +0000</pubDate>
    <category>Development</category>
    <category>firebug</category>
    <category>firefox</category>
            <description>&lt;p&gt;Last week while doing the usual (web development stuff) I discovered a few Firefox extensions I didn&apos;t even know I was missing until I found them by accident. The &quot;accident&quot; happened while adding Firebug to a Firefox that was running in a VMWare Fusion Windows virtual machine on which I was testing in, gasp, Windows. I went to find add-ons and searched for Firebug. And up came not only &lt;a href=&quot;http://getfirebug.com/&quot;&gt;Firebug&lt;/a&gt; but also results for &lt;a href=&quot;http://www.softwareishard.com/blog/firecookie/&quot;&gt;Firecookie&lt;/a&gt;, &lt;a href=&quot;http://robertnyman.com/firefinder/&quot;&gt;Firefinder&lt;/a&gt;, &lt;a href=&quot;http://robertnyman.com/inline-code-finder/&quot;&gt;Inline Code Finder for Firebug&lt;/a&gt;, and &lt;a href=&quot;http://tools.sitepoint.com/codeburner/firefox&quot;&gt;CodeBurner for Firebug&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Of course everyone doing web development uses Firebug (or really should anyway) since it rules. But these other extensions provide some &lt;i&gt;really&lt;/i&gt; nice functionality and complement Firebug perfectly. Here&apos;s a quick run down:&lt;/p&gt;

&lt;h3&gt;Firecookie&lt;/h3&gt;

&lt;p&gt;Firecookie lets you see all the cookies for a site, add new ones, remove existing cookies, etc. It gives useful information about each cookie like the name, value, raw value (if URI-encoded), domain, size, path, expiration, and security. Very cool.&lt;/p&gt;

&lt;style type=&quot;text/css&quot;&gt;
img.border {
   margin-bottom: 10px;
   border: 1px solid #021a40;
}
&lt;/style&gt;

&lt;img class=&quot;border&quot; src=&quot;http://www.sleberknight.com/blog/sleberkn/resource/firecookie.png&quot; alt=&quot;Firecookie Firefox Add-On&quot; title=&quot;Firecookie Firefox Add-On&quot;/&gt;

&lt;h3&gt;Firefinder&lt;/h3&gt; 

&lt;p&gt;Firefinder for Firebug lets you search for elements on a page using either CSS expressions or an XPath query. In the list of matching elements, you can expand each result, inspect the element by clicking the &quot;Inspect&quot; link, or click &quot;FriendlyFire&quot; which will copy the content you&apos;re looking at and post it up to &lt;a href=&quot;http://jsbin.com/&quot;&gt;JS Bin&lt;/a&gt;. (Be careful with this one if you have code you&apos;d rather not have going up over the wire to a different web site.) Firefinder also puts a dashed border around each matching element it found. As you hover over search results, it highlights the matching element in the page. This is really useful when you want to find all elements matching a CSS expression or when you&apos;d like to use XPath to find specific elements. Nice.&lt;/p&gt;

&lt;img class=&quot;border&quot; src=&quot;http://www.sleberknight.com/blog/sleberkn/resource/firefinder.png&quot; alt=&quot;Firefinder Firefox Add-On&quot; title=&quot;Firefinder Firefox Add-On&quot;/&gt;

&lt;h3&gt;Inline Code Finder for Firebug&lt;/h3&gt;

&lt;p&gt;The Inline Code Finder does just that. It finds inline CSS styles, JavaScript links, and inline events, and reports the number of each in its results pane. Even better, it highlights each of these problems on the page you are viewing with a thick red border, and as you hover over them it shows you what the problem is in a nice tooltip. This is really helpful for writing more unobtrusive JavaScript and avoiding inline styles. For older sites, or sites that weren&apos;t designed with &quot;unobtrusivity&quot; in mind, though, be warned that there might be a lot of red on the page!&lt;/p&gt;

&lt;img class=&quot;border&quot; src=&quot;http://www.sleberknight.com/blog/sleberkn/resource/inlinecodefinder.png&quot; alt=&quot;Inline Code Finder Firefox Add-On&quot; title=&quot;Inline Code Finder Firefox Add-On&quot;/&gt;

&lt;h3&gt;CodeBurner for Firebug&lt;/h3&gt;

&lt;p&gt;CodeBurner for Firebug provides an inline HTML and CSS reference within Firebug. It allows you to search for HTML elements or CSS styles and shows a definition and an example. It also provides links to the awesome &lt;a href=&quot;http://www.sitepoint.com/&quot;&gt;Sitepoint&lt;/a&gt; reference and even to the Sitepoint live demos of the feature you are learning about. It is so unbelievably useful to have HTML and CSS references directly within Firebug that it isn&apos;t even funny. Thanks, Sitepoint.&lt;/p&gt;

&lt;img class=&quot;border&quot; src=&quot;http://www.sleberknight.com/blog/sleberkn/resource/codeburner.png&quot; alt=&quot;CodeBurner Firefox Add-On&quot; title=&quot;CodeBurner Firefox Add-On&quot;/&gt;</description>          </item>
    <item>
    <guid isPermaLink="true">http://www.sleberknight.com/blog/sleberkn/entry/sorting_collections_in_hibernate_using</guid>
    <title>Sorting Collections in Hibernate Using SQL in @OrderBy</title>
    <dc:creator>sleberkn</dc:creator>
    <link>http://www.sleberknight.com/blog/sleberkn/entry/sorting_collections_in_hibernate_using</link>
        <pubDate>Tue, 15 Sep 2009 12:40:00 +0000</pubDate>
    <category>Development</category>
    <category>hibernate</category>
    <category>database</category>
    <category>java</category>
            <description>&lt;p&gt;When you have collections of associated objects in domain objects, you generally want to specify some kind of default sort order. For example, suppose I have domain objects &lt;code&gt;Timeline&lt;/code&gt; and &lt;code&gt;Event&lt;/code&gt;:&lt;/p&gt;

&lt;style type=&quot;text/css&quot;&gt;
.code {
background-color:#EFEFEF;
border:1px solid #CCCCCC;
font-size:small;
overflow:auto;
padding:5px;
}
&lt;/style&gt;

&lt;pre class=&quot;code&quot;&gt;
@Entity
class Timeline {

    @Required 
    String description

    @OneToMany(mappedBy = &quot;timeline&quot;)
    @javax.persistence.OrderBy(&quot;startYear, endYear&quot;)
    Set&amp;lt;Event&amp;gt; events
}

@Entity
class Event {

    @Required
    Integer startYear

    Integer endYear

    @Required
    String description

    @ManyToOne
    Timeline timeline
}
&lt;/pre&gt;

&lt;p&gt;In the above example I&apos;ve used the standard JPA (Java Persistence API) &lt;code&gt;@OrderBy&lt;/code&gt; annotation, which allows you to specify the order of a collection of objects via object properties, in this example a &lt;code&gt;@OneToMany&lt;/code&gt; association. I&apos;m ordering first by &lt;code&gt;startYear&lt;/code&gt; in ascending order and then by &lt;code&gt;endYear&lt;/code&gt;, also in ascending order. This is all well and good, but note that I&apos;ve specified that only the start year is required. (The &lt;a href=&quot;http://www.sleberknight.com/blog/sleberkn/entry/20070928&quot;&gt;@Required&lt;/a&gt; annotation is a custom Hibernate Validator annotation which does exactly what you would expect.) How are the events ordered when you have several events that start in the same year but some of them have no end year? The answer is that it depends on how your database sorts null values by default. Under Oracle 10g nulls will come last. For example, if two events both start in 2001 and one of them has no end year, here is how they are ordered:&lt;/p&gt;

&lt;pre class=&quot;code&quot;&gt;
2001 2002  Some event
2001 2003  Other event
2001       Event with no end year
&lt;/pre&gt;

&lt;p&gt;What if you want to control how null values are ordered so they come first rather than last? In Hibernate there are several ways you could do this. First, you could use the Hibernate-specific &lt;code&gt;@Sort&lt;/code&gt; annotation to perform in-memory (i.e. not in the database) sorting, using natural sorting or sorting using a &lt;code&gt;Comparator&lt;/code&gt; you supply. For example, assume I have an &lt;code&gt;EventComparator&lt;/code&gt; helper class that implements &lt;code&gt;Comparator&lt;/code&gt;. I could change &lt;code&gt;Timeline&lt;/code&gt;&apos;s collection of events to look like this:&lt;/p&gt;

&lt;pre class=&quot;code&quot;&gt;
@OneToMany(mappedBy = &quot;timeline&quot;)
@org.hibernate.annotations.Sort(type = SortType.COMPARATOR, comparator = EventComparator.class)
Set&amp;lt;Event&amp;gt; events
&lt;/pre&gt;
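&lt;p&gt;For reference, a hypothetical &lt;code&gt;EventComparator&lt;/code&gt; implementing a nulls-first ordering in memory might look something like this (with a minimal stand-in for the Event entity to keep the sketch self-contained):&lt;/p&gt;

```java
import java.util.Comparator;

// A sketch of the in-memory sort: order by startYear ascending, then by
// endYear ascending with null end years sorted first.
public class EventComparator implements Comparator<EventComparator.Event> {

    // Minimal stand-in for the Event entity from the mapping above
    static class Event {
        final Integer startYear;
        final Integer endYear;

        Event(Integer startYear, Integer endYear) {
            this.startYear = startYear;
            this.endYear = endYear;
        }
    }

    public int compare(Event a, Event b) {
        int byStart = a.startYear.compareTo(b.startYear);
        if (byStart != 0) {
            return byStart;
        }
        if (a.endYear == null) {
            return (b.endYear == null) ? 0 : -1;  // nulls first
        }
        if (b.endYear == null) {
            return 1;
        }
        return a.endYear.compareTo(b.endYear);
    }

    public static void main(String[] args) {
        EventComparator c = new EventComparator();
        System.out.println(c.compare(new Event(2001, null), new Event(2001, 2002)));
    }
}
```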

&lt;p&gt;Using &lt;code&gt;@Sort&lt;/code&gt; will perform sorting in-memory once the collection has been retrieved from the database. While you can certainly do this and implement arbitrarily complex sorting logic, it&apos;s probably better to sort in the database when you can. So we now turn to &lt;i&gt;Hibernate&apos;s&lt;/i&gt; &lt;code&gt;@OrderBy&lt;/code&gt; annotation, which lets you specify a &lt;i&gt;SQL fragment&lt;/i&gt; describing how to perform the sort. For example, you can change the events mapping to:&lt;/p&gt;

&lt;pre class=&quot;code&quot;&gt;
@OneToMany(mappedBy = &quot;timeline&quot;)
@org.hibernate.annotations.OrderBy(&quot;start_year, end_year&quot;)
Set&amp;lt;Event&amp;gt; events
&lt;/pre&gt;

&lt;p&gt;This sort order is the same as using the JPA &lt;code&gt;@OrderBy&lt;/code&gt; with the &quot;startYear, endYear&quot; sort order. But since you write actual SQL in Hibernate&apos;s &lt;code&gt;@OrderBy&lt;/code&gt; you can take advantage of whatever features your database has, at the possible expense of portability across databases. As an example, Oracle 10g supports a syntax like &quot;order by start_year, end_year nulls first&quot; to order null end years before non-null end years. You could also say &quot;order by start_year, end_year nulls last&quot; which sorts null end years last, as you would expect. This syntax is probably not portable, so another trick you can use is the NVL function, which is supported in a bunch of databases. You can rewrite &lt;code&gt;Timeline&lt;/code&gt;&apos;s collection of events like so:&lt;/p&gt;

&lt;pre class=&quot;code&quot;&gt;
@OneToMany(mappedBy = &quot;timeline&quot;)
@org.hibernate.annotations.OrderBy(&quot;start_year, nvl(end_year, start_year)&quot;)
Set&amp;lt;Event&amp;gt; events
&lt;/pre&gt;

&lt;p&gt;The expression &quot;nvl(end_year, start_year)&quot; simply says to use &lt;code&gt;end_year&lt;/code&gt; as the sort value if it is not null, and &lt;code&gt;start_year&lt;/code&gt; if it is null. So for sorting purposes you end up treating &lt;code&gt;end_year&lt;/code&gt; the same as &lt;code&gt;start_year&lt;/code&gt; when &lt;code&gt;end_year&lt;/code&gt; is null. In the contrived example earlier, applying the nvl-based sort using Hibernate&apos;s &lt;code&gt;@OrderBy&lt;/code&gt; to specify SQL sorting criteria, you now end up with the events sorted like this:&lt;/p&gt;

&lt;pre class=&quot;code&quot;&gt;
2001       Event with no end year
2001 2002  Some event
2001 2003  Other event
&lt;/pre&gt;

&lt;p&gt;Which is what you wanted in the first place. So if you need more complex sorting logic than what you can get out of the standard JPA &lt;code&gt;@javax.persistence.OrderBy&lt;/code&gt;, try one of the Hibernate sorting options, either &lt;code&gt;@org.hibernate.annotations.Sort&lt;/code&gt; or &lt;code&gt;@org.hibernate.annotations.OrderBy&lt;/code&gt;. Adding a SQL fragment into your domain class isn&apos;t necessarily the most &lt;i&gt;elegant&lt;/i&gt; thing in the world, but it might be the most &lt;i&gt;pragmatic&lt;/i&gt; thing.&lt;/p&gt;
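&lt;p&gt;As an aside, if you ever do need this null handling &lt;i&gt;in memory&lt;/i&gt; (for example when sorting with a custom comparator via Hibernate&apos;s &lt;code&gt;@Sort&lt;/code&gt;), the equivalent of the nvl-based ordering is straightforward to express in Java. This is just an illustrative sketch; the stripped-down &lt;code&gt;Event&lt;/code&gt; class here is an assumption, not the real mapped entity:&lt;/p&gt;

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class EventNvlSort {

    // Stripped-down stand-in for the mapped Event entity (assumed fields).
    static class Event {
        final Integer startYear;
        final Integer endYear; // null when the event has no end year

        Event(Integer startYear, Integer endYear) {
            this.startYear = startYear;
            this.endYear = endYear;
        }
    }

    // Mirrors "order by start_year, nvl(end_year, start_year)": a null
    // end_year sorts as if it were equal to start_year.
    static final Comparator<Event> BY_START_THEN_NVL_END =
            Comparator.<Event, Integer>comparing(e -> e.startYear)
                      .thenComparing(e -> e.endYear != null ? e.endYear : e.startYear);

    public static void main(String[] args) {
        List<Event> events = new ArrayList<>();
        events.add(new Event(2001, 2003));  // Other event
        events.add(new Event(2001, 2002));  // Some event
        events.add(new Event(2001, null));  // Event with no end year
        events.sort(BY_START_THEN_NVL_END);
        // The event with the null end year now sorts first, as with nvl.
        for (Event e : events) {
            System.out.println(e.startYear + " " + (e.endYear == null ? "" : e.endYear));
        }
    }
}
```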
</description>          </item>
    <item>
    <guid isPermaLink="true">http://www.sleberknight.com/blog/sleberkn/entry/groovification</guid>
    <title>Groovification</title>
    <dc:creator>sleberkn</dc:creator>
    <link>http://www.sleberknight.com/blog/sleberkn/entry/groovification</link>
        <pubDate>Mon, 4 May 2009 17:25:40 +0000</pubDate>
    <category>Development</category>
    <category>gmaven</category>
    <category>groovy</category>
    <category>java</category>
            <description>&lt;p&gt;Last week I tweeted about groovification, which is defined thusly:&lt;/p&gt;

&lt;p&gt;&lt;i&gt;groovification.&lt;/i&gt; noun. the process of converting java source code into groovy source code (usually done to make development more fun)&lt;/p&gt;

&lt;p&gt;On my main day-to-day project, we&apos;ve been writing unit tests in Groovy for quite a while now, and recently we decided to start implementing new code in Groovy rather than Java. The reason for doing this is to gain more flexibility in development, to make testing easier (i.e. in terms of the ability to mock dependencies in a trivial fashion), to eliminate a lot of Java boilerplate code and thus write less code, and of course to make development more fun. It&apos;s not that I hate Java; it&apos;s that Java simply isn&apos;t innovating anymore, and hasn&apos;t for a while. It isn&apos;t adding features I no longer want to live without, such as closures and the ability to do metaprogramming when I need to, and it isn&apos;t removing features that I don&apos;t want, such as checked exceptions. If I know, for a fact, that I can handle an exception, I&apos;ll handle it appropriately. Otherwise, when there&apos;s nothing I can do anyway, I want to let the damn thing propagate up and just show a generic error message to the user, log the error, and send the admin team an email with the problem details.&lt;/p&gt;

&lt;p&gt;This being, for better or worse, a Maven project, we&apos;ve had some interesting issues with mixed compilation of Java and Groovy code. The &lt;a href=&quot;http://groovy.codehaus.org/&quot;&gt;GMaven plugin&lt;/a&gt; is easy to install and works well, but it currently has some outstanding issues with Groovy stub generation; specifically, it cannot properly handle &lt;a href=&quot;http://jira.codehaus.org/browse/MGROOVY-108&quot;&gt;generics&lt;/a&gt; or &lt;a href=&quot;http://jira.codehaus.org/browse/MGROOVY-109&quot;&gt;enums&lt;/a&gt;. (Maybe someone will be less lazy than me and help them fix it instead of complaining about it.) Since many of our classes use generics, e.g. in service classes that return domain objects, we are currently not generating stubs. We&apos;ll convert existing classes and any other necessary dependencies to Groovy as we make updates to Java classes, and we are implementing new code in Groovy. Especially in the web controller code, this becomes trivial since the controllers generally depend on other Java and/or Groovy code, but no other classes depend on the controllers. So starting in the web tier seems to be a good choice. Groovy, combined with implementing controllers using the Spring @MVC annotation-based controller configuration style (i.e. no XML configuration), is making the controllers &lt;i&gt;really&lt;/i&gt; thin, lightweight, and easy to read, implement, and test.&lt;/p&gt;

&lt;p&gt;I estimate it will take a while to fully convert all the existing Java code to Groovy code. The point here is that we are doing it piecemeal rather than trying to do it all at once. Also, whenever we convert a Java file to a Groovy one, there are a few basics to make the classes Groovier without going totally overboard and spending loads of time. Once you&apos;ve used &lt;a href=&quot;http://www.jetbrains.com/idea/&quot;&gt;IntelliJ&apos;s&lt;/a&gt; move refactoring to move the .java file to the Groovy source tree (since we have src/main/java and src/main/groovy), you can use IntelliJ&apos;s handy-dandy &quot;Rename to Groovy&quot; refactoring. In IntelliJ 8.1 you need to use the &quot;Search - Find Action&quot; menu option or keystroke, type &quot;Rename to...&quot;, and select &quot;Rename to Groovy&quot;, since they goofed in version 8 and that option was left off a menu somehow. Once that&apos;s done you can do a few simple things to make the class a bit more groovy. First, get rid of all the semicolons. Second, replace getter/setter code with direct property access. Third, replace for loops with &quot;each&quot;-style internal iterators when you don&apos;t need the loop index, and with &quot;eachWithIndex&quot; when you do. Finally, get rid of redundant modifiers like the &quot;public&quot; on &quot;public class&quot;, since public is the Groovy default. That&apos;s not too much at once, doesn&apos;t take long, and makes your code Groovier. Over time you can do more groovification if you like.&lt;/p&gt;

&lt;p&gt;The most common gotchas I&apos;ve found have to do with code that uses anonymous or inner classes, since Groovy doesn&apos;t support those Java language features. In that case you can either make a non-public named class (which, unlike in Java, is OK to put in the same Groovy file as long as it&apos;s not public) or you can refactor the code some other way (using your creativity and expertise, since we are not &lt;a href=&quot;http://www.sleberknight.com/blog/sleberkn/entry/thinking_matters&quot;&gt;monkeys&lt;/a&gt;, right?). This can sometimes be a pain, especially if you are using a lot of them. So it goes. (And yes, that is a &lt;a href=&quot;http://en.wikipedia.org/wiki/Slaughterhouse-Five&quot;&gt;Slaughterhouse Five&lt;/a&gt; reference.)&lt;/p&gt;
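&lt;p&gt;To make the anonymous class gotcha concrete, here is an illustrative sketch (plain Java, not code from our project) of the kind of idiom that needs the named-class refactoring before the file can become Groovy:&lt;/p&gt;

```java
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

public class AnonymousClassGotcha {

    // The common Java idiom: an anonymous Comparator. Groovy (at the time
    // of this post) cannot compile this, so conversion needs a refactoring.
    static List<String> sortJavaStyle(List<String> names) {
        Collections.sort(names, new Comparator<String>() {
            public int compare(String a, String b) {
                return a.length() - b.length();
            }
        });
        return names;
    }

    // The Groovy-friendly refactoring: a named, non-public class, which can
    // live in the same Groovy file as long as it isn't public.
    static class ByLength implements Comparator<String> {
        public int compare(String a, String b) {
            return a.length() - b.length();
        }
    }

    static List<String> sortRefactored(List<String> names) {
        Collections.sort(names, new ByLength());
        return names;
    }
}
```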

&lt;p&gt;Happy groovification!&lt;/p&gt;</description>          </item>
    <item>
    <guid isPermaLink="true">http://www.sleberknight.com/blog/sleberkn/entry/thinking_matters</guid>
    <title>Thinking Matters</title>
    <dc:creator>sleberkn</dc:creator>
    <link>http://www.sleberknight.com/blog/sleberkn/entry/thinking_matters</link>
        <pubDate>Thu, 30 Apr 2009 16:10:30 +0000</pubDate>
    <category>Development</category>
    <category>programming</category>
    <category>cylon</category>
    <category>thinking</category>
    <category>logic</category>
            <description>&lt;p&gt;Aside from the fact that &lt;a href=&quot;http://www.forbes.com/2009/04/29/java-oracle-sun-technology-internet-infrastructure-java.html&quot;&gt;Oracle&apos;s Java Problem&lt;/a&gt; contains all kinds of factual and other errors (see the comments on the post) this sentence caught my eye in particular when referring to Java being &quot;quite hard to work with&quot; - &quot;Then, as now, you needed to be a highly trained programmer to make heads or tails of the language.&quot;&lt;/p&gt;

&lt;p&gt;What&apos;s the issue here? That Java is hard to work with? Perhaps more specifically, not just Java but perhaps the artificial complexity in developing &quot;Enterprise&quot; applications in Java? Nope. The problem is that this type of thinking epitomizes the attitude that business people and other &quot;professionals&quot; tend to have about software development in general, in that they believe it is or should be easy and that it is always the tools and rogue programmers that are the problem. Thus, with more and better tools, they reason, there won&apos;t be a need for skilled developers and the monkey-work of actually programming could be done by, well, &lt;a href=&quot;http://www.newtechusa.com/PPI/main.asp&quot;&gt;monkeys&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;I believe software development is one of the hardest activities humans currently do, and yes I suppose I do have some bias since I am a developer. Also contrary to what many people think, there is both art and engineering involved, and any given problem can be solved in an almost infinite variety of ways. Unlike more established disciplines that have literally been around for hundreds or thousands of years (law, medicine, accounting, architecture, certain branches of engineering like civil, etc.), the software industry hasn&apos;t even reached the century mark yet! As a result there isn&apos;t any kind of consensus whatsoever about a completely standardized &quot;body of knowledge&quot; and thus there isn&apos;t an industry-recognized set of standard exams and boards like you find in the medical and law professions for example. (That topic is for a future post.)&lt;/p&gt;

&lt;p&gt;One thing that is certain is that software development involves logic, and thus people who can solve problems using logic will always be needed, whether the primary medium stays in textual format (source code) or whether it evolves into some different representation like &lt;a href=&quot;http://martinfowler.com/bliki/IntentionalSoftware.html&quot;&gt;Intentional Software&lt;/a&gt; is trying to do. So the statement from the article that &quot;you needed to be a highly trained programmer to make heads or tails of the language&quot; is always going to be true in software development. More generally, highly skilled people are needed in any complex endeavor, and attempts to dumb down complex things will likely not succeed in any area, not just software development. Would you trust someone to perform surgery on you so long as they have a &quot;Dummies Guide to Surgery&quot; book? Or someone to represent you in court who stayed at a Holiday Inn Express last night?&lt;/p&gt;

&lt;p&gt;I hypothesize that things are becoming more complex as time moves on, not less. I also propose that unless we actually succeed in building &lt;a href=&quot;http://www.scifi.com/battlestar/&quot;&gt;Cylons&lt;/a&gt; who end up wiping us all out or enslaving us, we will never reach a point where we don&apos;t need people to actually think and use logic to solve problems. So even though many business-types would love to be able to hire a bunch of monkeys and pay them $0.01 per day to develop software, those who actually realize that highly skilled people are an asset and help their bottom line, and treat them as such, are the ones who will come out on top, because they will smash their competitors who think of software/IT purely as a cost center and not a profit center.&lt;/p&gt;</description>          </item>
    <item>
    <guid isPermaLink="true">http://www.sleberknight.com/blog/sleberkn/entry/running_visualvm_on_a_32</guid>
    <title>Running VisualVM on a 32-bit Macbook Pro</title>
    <dc:creator>sleberkn</dc:creator>
    <link>http://www.sleberknight.com/blog/sleberkn/entry/running_visualvm_on_a_32</link>
        <pubDate>Wed, 1 Apr 2009 11:03:44 +0000</pubDate>
    <category>Development</category>
    <category>visualvm</category>
    <category>osx</category>
    <category>macbook</category>
    <category>java</category>
            <description>&lt;p&gt;If you want/need to run &lt;a href=&quot;https://visualvm.dev.java.net/&quot;&gt;VisualVM&lt;/a&gt; on a &lt;i&gt;32-bit&lt;/i&gt; Macbook Pro you&apos;ll need to do a couple of things. First, download and install Soy Latte, using &lt;a href=&quot;http://landonf.bikemonkey.org/static/soylatte/&quot;&gt;these instructions&lt;/a&gt; - this gets you a Java 6 JDK/JRE on your 32-bit Macbook Pro. Second, download VisualVM and extract it wherever, e.g. /usr/local/visualvm. If you now try to run VisualVM you&apos;ll get the following error message:&lt;/p&gt;

&lt;pre class=&quot;code&quot;&gt;
$ ./visualvm
./..//platform9/lib/nbexec: line 489: /System/Library/Frameworks/JavaVM.framework/
Versions/1.6/Home/bin/java: Bad CPU type in executable
&lt;/pre&gt;

&lt;p&gt;Oops. After looking at the bin/visualvm script I noticed it is looking for an environment variable named &quot;jdkhome&quot;. So the third step is to export &apos;jdkhome&apos; so it points to wherever you installed Soy Latte:&lt;/p&gt;

&lt;pre class=&quot;code&quot;&gt;
export jdkhome=/usr/local/soylatte16-i386-1.0.3
&lt;/pre&gt;

&lt;p&gt;Now run the bin/visualvm script from the command line. Oh, almost forgot to mention that you should also have &lt;a href=&quot;http://developer.apple.com/opensource/tools/X11.html&quot;&gt;X11&lt;/a&gt; installed, which it will be by default on Mac OS X Leopard. Now if all went well, you should have VisualVM up and running!&lt;/p&gt;</description>          </item>
    <item>
    <guid isPermaLink="true">http://www.sleberknight.com/blog/sleberkn/entry/missing_aop_target_packages_in</guid>
    <title>Missing aop &apos;target&apos; packages in Spring 3.0.0.M1 zip file</title>
    <dc:creator>sleberkn</dc:creator>
    <link>http://www.sleberknight.com/blog/sleberkn/entry/missing_aop_target_packages_in</link>
        <pubDate>Thu, 15 Jan 2009 18:46:43 +0000</pubDate>
    <category>Development</category>
    <category>soylatte</category>
    <category>spring</category>
    <category>openjdk</category>
    <category>ivy</category>
            <description>&lt;p&gt;Today I was mucking around with the Spring 3.0.0.M1 source release I downloaded as a ZIP file. I wanted to simply get the sample PetClinic up and running and be able to load Spring as a project in IntelliJ. Note Spring now requires Java 6 to build, so if you&apos;re using an older 32-bit Macbook Pro you&apos;ll need to install JDK 6. I used &lt;a href=&quot;http://landonf.bikemonkey.org/static/soylatte/&quot;&gt;these instructions&lt;/a&gt; generously provided by Landon Fuller to install Soy Latte, which is a Java 6 port for Mac OS X (Tiger and Leopard). So I went to run the &quot;ant jar package&quot; command (after first setting up &lt;a href=&quot;http://ant.apache.org/ivy/&quot;&gt;Ivy&lt;/a&gt; since that is how Spring now manages dependencies) and everything went well until I got a compilation exception. There unfortunately wasn&apos;t any nice error message about why the compile failed.&lt;/p&gt;

&lt;p&gt;So next I loaded up the Spring project in IntelliJ and tried to compile from there. Aha! It tells me that the org.springframework.aop.target package is missing as well as the org.springframework.aop.framework.autoproxy.target package, and of course all the classes in those packages were also missing. I was fairly sure I didn&apos;t accidentally delete those two packages in the source code, so I checked the spring-framework-3.0.0.M1.zip file to be sure. Sure enough those two &apos;target&apos; packages are not present in the source code in the zip file. The resolution is to go grab the missing files from the Spring 3.0.0.M1 subversion &lt;a href=&quot;https://src.springframework.org/svn/spring-framework/tags/spring-framework-3.0.0.M1/&quot;&gt;repository&lt;/a&gt; and put them in the correct place in the source tree. The better resolution is to do an export of the 3.0.0.M1 tag from the Subversion repo directly, rather than be lazy like I was and download the zip file.&lt;/p&gt;

&lt;p&gt;I still am wondering why the &apos;target&apos; packages were missing, however. My guess is that whatever build process builds the zip file for distribution excluded directories named &apos;target&apos; since &apos;target&apos; is a common output directory name in build systems like Ant and Maven and usually should be excluded since it contains generated artifacts. If that assumption is correct and &lt;i&gt;all&lt;/i&gt; directories named &apos;target&apos; were excluded, then unfortunately the two aop subpackages named &apos;target&apos; got mistakenly excluded which caused a bit of head-scratching as to why Spring wouldn&apos;t compile.&lt;/p&gt;
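&lt;p&gt;To illustrate the guess, a &lt;i&gt;hypothetical&lt;/i&gt; Ant packaging fragment (not Spring&apos;s actual build file) with a blanket exclude would look something like this, and would happily swallow any source package named &apos;target&apos; along with the build output:&lt;/p&gt;

```xml
<!-- Hypothetical distribution packaging; illustrative only. -->
<zip destfile="spring-framework-3.0.0.M1.zip">
    <fileset dir="${checkout.dir}">
        <!-- Intended to strip Ant/Maven build output, but this pattern also
             matches source dirs like org/springframework/aop/target and
             org/springframework/aop/framework/autoproxy/target. -->
        <exclude name="**/target/**"/>
    </fileset>
</zip>
```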

</description>          </item>
    <item>
    <guid isPermaLink="true">http://www.sleberknight.com/blog/sleberkn/entry/groovy_spring_groovier_spring</guid>
    <title>Groovy + Spring = Groovier Spring</title>
    <dc:creator>sleberkn</dc:creator>
    <link>http://www.sleberknight.com/blog/sleberkn/entry/groovy_spring_groovier_spring</link>
        <pubDate>Tue, 6 Jan 2009 23:50:46 +0000</pubDate>
    <category>Development</category>
    <category>java</category>
    <category>dynamic</category>
    <category>spring</category>
    <category>groovy</category>
            <description>&lt;p&gt;If you&apos;re into Groovy and Spring, check out my &lt;a href=&quot;http://www.ibm.com/developerworks/views/java/libraryview.jsp?search_by=groovier+spring&quot;&gt;two-part&lt;/a&gt; series on &lt;a href=&quot;http://www.ibm.com/developerworks/&quot;&gt;IBM developerWorks&lt;/a&gt; on using Groovy together with Spring&apos;s dynamic language support for potentially more flexible (and interesting) applications. In &lt;a href=&quot;http://www.ibm.com/developerworks/java/library/j-groovierspring1.html&quot;&gt;Part 1&lt;/a&gt; I show how to easily integrate Groovy &lt;i&gt;scripts&lt;/i&gt; (i.e. .groovy files containing one or more classes) into Spring-based applications. In &lt;a href=&quot;http://www.ibm.com/developerworks/java/library/j-groovierspring2.html&quot;&gt;Part 2&lt;/a&gt; I show how to use the &quot;refreshable beans&quot; feature in Spring to automatically and transparently reload Spring beans implemented in Groovy from pretty much anywhere including a relational database, and why you might actually are to do something like that!&lt;/p&gt;</description>          </item>
    <item>
    <guid isPermaLink="true">http://www.sleberknight.com/blog/sleberkn/entry/iphone_bootcamp_summary</guid>
    <title>iPhone Bootcamp Summary</title>
    <dc:creator>sleberkn</dc:creator>
    <link>http://www.sleberknight.com/blog/sleberkn/entry/iphone_bootcamp_summary</link>
        <pubDate>Fri, 5 Dec 2008 17:08:33 +0000</pubDate>
    <category>Development</category>
    <category>nerd</category>
    <category>apple</category>
    <category>big</category>
    <category>cocoa</category>
    <category>ranch</category>
    <category>bootcamp</category>
    <category>iphone</category>
            <description>&lt;p&gt;So, after having actually written a blog entry covering &lt;a href=&quot;http://www.sleberknight.com/blog/sleberkn/entry/iphone_bootcamp_logs&quot;&gt;each day&lt;/a&gt; of the &lt;a href=&quot;http://www.bignerdranch.com/classes/iphone.shtml&quot;&gt;iPhone bootcamp&lt;/a&gt; at &lt;a href=&quot;http://www.bignerdranch.com/&quot;&gt;Big Nerd Ranch&lt;/a&gt;, I thought a more broad summary would be in order. (That, and I&apos;m sitting in the airport waiting for my flight this evening.) Anyway, the iPhone bootcamp was my second BNR class (I took the Cocoa bootcamp last April and wrote a summary blog about it &lt;a href=&quot;http://www.sleberknight.com/blog/sleberkn/entry/20080410&quot;&gt;here&lt;/a&gt;.)&lt;/p&gt;

&lt;p&gt;As with the Cocoa bootcamp, I had a great time and learned a ton about iPhone development. I met a lot of really cool and interesting people with a wide range of backgrounds and experiences. This seems to be a trend at BNR, that the people who attend are people who have a variety of knowledge and experience, and bring totally different perspectives to the class. The students who attend are also highly motivated people in general, which, when combined with excellent instruction and great lab coding exercises all week, makes for a great learning environment.&lt;/p&gt;

&lt;p&gt;Another interesting thing that happens at BNR is that in this environment, you somehow don&apos;t burn out and can basically write code all day every day and many people keep at it into the night hours. I think this is due to the way the BNR classes combine short, targeted lecture with lots and lots and lots of hands-on coding. In addition, taking an afternoon hike through untouched nature really helps to refresh you and keep energy levels up. (Maybe if more companies, and the USA for that matter, encouraged this kind of thing people would actually be &lt;i&gt;more&lt;/i&gt; productive rather than less.) And because of the diversity of the students, every meal combines good food with interesting conversation.&lt;/p&gt;

&lt;p&gt;So, thanks to our instructors Joe and Brian for a great week of learning and to all the students for making it a great experience. Can&apos;t wait to take the OpenGL bootcamp sometime in the future.&lt;/p&gt;</description>          </item>
  </channel>
</rss>