Thursday, September 8, 2016

Java 8's Optional

That's Optional

Discussions about Java 8's Optional class are popping up a lot lately around the office. It turns out that java.util.Optional has generated plenty of debate beyond it, too. The root of the discussion seems to be a Stack Overflow post by Brian Goetz, who is in as good a position as anyone to know Optional's back story.

...But we did have a clear intention when adding this feature, and it was not to be a general purpose Maybe or Some type, as much as many people would have liked us to do so. Our intention was to provide a limited mechanism for library method return types where there needed to be a clear way to represent "no result", and using null for such was overwhelmingly likely to cause errors.

Wait. What?

  Our intention was to provide a limited mechanism for library method return types.
I'm not sure what that means. Is a library method a method that ships with the Java Runtime Environment? Is Spring Boot a library? Is a JAR a library? And why is the mechanism limited? In what ways is it intended to be limited?

What are others saying?

How I Almost Used Optional. Twice!

I have submitted two pull requests that used Optional instead of null checks. The pull requests were a month or two apart, so I had time to think about it between each time.

In both cases, the code was peer-reviewed, and peer review led to long comment chains about Optional. In both cases I ended up ripping out Optional and submitting pull requests with null and null checks.

The code base is imperative, and null checks are the province of imperative programming. Optional is a monad, and monads are the province of functional programming. The two do not mix well.

When Java 5 if statements and for-each loops sit next to Java 8 monads and lambdas, you get cognitive dissonance.
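To make the dissonance concrete, here is a hypothetical sketch of the same lookup written both ways. The map, method, and value names are illustrative, not taken from the actual pull requests:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class MixedStyles {
    // Hypothetical data store standing in for whatever the real code queried.
    static final Map<String, String> USERS = new HashMap<>();
    static { USERS.put("alice", "Alice A."); }

    // Imperative style: null means "no result", handled with an if statement.
    static String displayNameImperative(String id) {
        String name = USERS.get(id);
        if (name == null) {
            return "<unknown>";
        }
        return name;
    }

    // Functional style: Optional threads the absent case through orElse.
    static String displayNameOptional(String id) {
        return Optional.ofNullable(USERS.get(id)).orElse("<unknown>");
    }

    public static void main(String[] args) {
        System.out.println(displayNameImperative("bob"));  // <unknown>
        System.out.println(displayNameOptional("alice"));  // Alice A.
    }
}
```

Each version is fine on its own; the friction comes when both styles handle absence in the same file and the reader has to switch mental models mid-method.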

TL;DR - Do Not Mix Imperative and Functional Styles

Sleek skyscrapers and gothic churches each have a place in the world, but that does not mean they should exist side-by-side.

Nothing is wrong with the pieces,
but something is not right with the whole.

Mixing monads and lambdas with traditional imperative code creates something surreal that is hard to read and ultimately dissatisfying. Just because Java lets you mix programming paradigms does not mean you should.

Pick a paradigm and stick to it. At least within a package. My advice is based on very limited experience, but it feels like packages should be either imperative or functional. It's fine for imperative and functional packages to interact. But within the package, there really needs to be a single programming paradigm. Right, Chava?


And a Lisp Shall Show Them The Way

Hungry for another way? Check out:
How do they address some of the same problems as Java's Optional? Do you find nil punning and destructuring more or less appealing than null checks? How about monads like Optional?

Tuesday, August 16, 2016

Machine learning in R


In two days I will present to the West Side Developers Legion (WSDL) about machine learning using R. It is intended to be introductory for people with no experience in R or machine learning. Surprisingly, several of the people attending list R as part of their professional expertise. I am not sure why they are coming, because this presentation assumes the audience cannot spell R. As long as they are not showing up to heckle, it's all good.

I am excited to give this presentation and put a lot of time into polishing it. I intend to limit the presentation to an hour. However, the material kept growing as I worked with it, so I broke it into three modules.

Module 1 - Introduction to R and Machine Learning ~ 1 hour
Module 2 - More challenging classification problem ~ 30 minutes
Module 3 - Classification with R and H2O machine learning service ~ 20 minutes


Projection of the first three principal components of a set of data on plant leaves


Monday, October 19, 2015

Log Levels

A question came up today: why would changing the log level affect application performance?

Loggers generally use an ordered set of log levels. When we ask the logger to log an event, we usually pass it two parameters: the message we want to log and the level (or priority) of the message. Sometimes we pass only the message, and the name of the log method implies the level. For example, the logger might implement methods named warn, info, or debug.
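Both calling styles can be seen in the JDK's own logging API. This sketch uses java.util.logging rather than log4j, simply because it ships with the runtime; the logger name is made up:

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class LogLevelDemo {
    public static void main(String[] args) {
        Logger logger = Logger.getLogger("demo");

        // Style 1: pass the level explicitly along with the message.
        logger.log(Level.WARNING, "disk almost full");

        // Style 2: let the method name imply the level.
        logger.warning("disk almost full");

        // The logger compares each message's level to its own configured level.
        logger.setLevel(Level.INFO);
        System.out.println(logger.isLoggable(Level.WARNING)); // true:  WARNING >= INFO
        System.out.println(logger.isLoggable(Level.FINE));    // false: FINE < INFO
    }
}
```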

The logging framework usually has a global value for level as well. This setting tells the logger how verbose it should be. Do you only want to see critical failures, or do you want to see all the data passing through the APIs? Different kinds of debugging demand different levels of detail. Additionally, the more verbose the logging, the more space it requires and the more time your application spends recording events instead of processing your users' requests.

When you ask the logger to record an event, the logger compares the level of the message to the global level and then decides whether to record the message or ignore it and do nothing. I read the documentation for log4j and (without actually testing it), this would be log4j's decision table:
╔═════════════════════════╦═════════════════════════════════════════════════════════╗
║ If I log an event as…   ║ …and my current logging level is…                       ║
╠═════════════════════════╬═════╤═══════╤═══════╤══════╤══════╤═══════╤═══════╤═════╣
║                         ║ OFF │ FATAL │ ERROR │ WARN │ INFO │ DEBUG │ TRACE │ ALL ║
╟─────────────────────────╫─────┼───────┼───────┼──────┼──────┼───────┼───────┼─────╢
║ FATAL                   ║ -   │ log   │ log   │ log  │ log  │ log   │ log   │ log ║
╟─────────────────────────╫─────┼───────┼───────┼──────┼──────┼───────┼───────┼─────╢
║ ERROR                   ║ -   │ -     │ log   │ log  │ log  │ log   │ log   │ log ║
╟─────────────────────────╫─────┼───────┼───────┼──────┼──────┼───────┼───────┼─────╢
║ WARN                    ║ -   │ -     │ -     │ log  │ log  │ log   │ log   │ log ║
╟─────────────────────────╫─────┼───────┼───────┼──────┼──────┼───────┼───────┼─────╢
║ INFO                    ║ -   │ -     │ -     │ -    │ log  │ log   │ log   │ log ║
╟─────────────────────────╫─────┼───────┼───────┼──────┼──────┼───────┼───────┼─────╢
║ DEBUG                   ║ -   │ -     │ -     │ -    │ -    │ log   │ log   │ log ║
╟─────────────────────────╫─────┼───────┼───────┼──────┼──────┼───────┼───────┼─────╢
║ TRACE                   ║ -   │ -     │ -     │ -    │ -    │ -     │ log   │ log ║
╚═════════════════════════╩═════╧═══════╧═══════╧══════╧══════╧═══════╧═══════╧═════╝

Logging a message here or there should not affect application performance. However, if the logging statement is inside a loop, it can make a difference. We changed a log statement from warn to trace today and it sped up our automated tests dramatically.

Imagine this:

for (int i = 0; i < 1000; ++i) {
    for (int j = 0; j < 10000; ++j) {
        int product = i * j;
        logger.debug(String.format("The product of i and j is %d", product));
    }
}

The computer is very fast at computing the product of i and j. But the logging statements cause the computer to perform I/O (i.e. write the message to a log file), and I/O is very slow compared to simple arithmetic. In the example above, the great majority of the time is spent writing to the log and not doing real work.
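Even when the message itself is suppressed, building the string still costs something. One common remedy is to guard the call, or to hand the logger a lambda so the string is built only if the level is enabled. A minimal sketch using java.util.logging (standing in for log4j; the logger name is made up, and FINE plays the role of DEBUG):

```java
import java.util.logging.Level;
import java.util.logging.Logger;

public class GuardedLogging {
    public static void main(String[] args) {
        Logger logger = Logger.getLogger("demo.loop");
        logger.setLevel(Level.INFO); // FINE (debug-level) messages are suppressed

        for (int i = 0; i < 1000; ++i) {
            for (int j = 0; j < 10000; ++j) {
                int product = i * j;

                // Explicit guard: String.format never runs while FINE is disabled.
                if (logger.isLoggable(Level.FINE)) {
                    logger.fine(String.format("The product of i and j is %d", product));
                }

                // Java 8 alternative: the Supplier is invoked only if FINE is enabled.
                logger.fine(() -> String.format("The product of i and j is %d", product));
            }
        }
    }
}
```

With the level at INFO, the loop above does almost nothing per iteration; raise it to FINE and the same loop pays for ten million string formats plus the I/O behind them.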

Logging levels exist to help us get to the information we need. Too much information is difficult to store, time-consuming to search, and as frustrating as looking for a needle in a haystack. Too little logging is frustrating because the information you need to understand the system's behavior is not available.

The trend is keeping more data for longer periods of time. Persistent storage gets cheaper all the time and that makes it easier to retain the logs. Logging tools like Splunk include machine learning capabilities to help you identify patterns and anomalies in the logs.

Happy debugging!

Friday, October 2, 2015

OSGi First Impressions

Lots of new technologies since starting this job in July. The most interesting ones turn the JVM into something more like a micro-kernel operating system. MBeans have shipped as part of the Java runtime since Java 1.5, and MXBeans since Java 1.6.
This application leverages that in a big way. Using Apache libraries that implement the OSGi specification, the application dynamically installs, starts, stops, and uninstalls features. Other Apache libraries provide a command-line style interface for interacting with the services and features.

The advantages of this approach are obvious. Uncaught or unrecoverable exceptions in one runtime component do not take down the entire JVM, and failed services are restarted quickly and quietly. The approach also provides a more declarative way to build applications. Apache Aries compiles blueprint XML files to provide dependency injection, and OSGi supports a more flexible dependency framework than base Java's class path and class loader.
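For context, a blueprint file is just an XML document describing beans and the services they expose. A minimal hypothetical sketch (the class and interface names are made up):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical OSGI-INF/blueprint/example.xml; com.example names are illustrative. -->
<blueprint xmlns="http://www.osgi.org/xmlns/blueprint/v1.0.0">
    <!-- The container instantiates and wires this bean... -->
    <bean id="greeter" class="com.example.GreeterImpl"/>
    <!-- ...and publishes it in the OSGi service registry under its interface. -->
    <service ref="greeter" interface="com.example.Greeter"/>
</blueprint>
```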

Of course, this flexibility comes at a cost. Debugging is more difficult than debugging POJOs. There is significant overhead and there are many configuration dependencies to bring up the containerized environment; it is slow and delicate. Also, automated tests become flaky very quickly. For example:
  • For years there was a single provider of the SecurityConfiguration service. Recently a second implementation was created to support external credential providers. Everything ran fine for a week, but then the CI builds started failing occasionally. At the time, no one had any idea what was going wrong or why the failures were intermittent. Turns out, the tests were simply taking whichever SecurityConfiguration provider registered itself first. In a more synchronous environment where it is easy to step through the code, this would have been obvious. But because of the mountain of frameworks and the loose coupling, finding the issue was tough.
  • I was blocked for a day trying to register a new MBean. Eventually, a long-time project member spotted the problem. The blueprint file that defined the bean was in a single directory literally named OSGI-INF.blueprint. All of the other blueprint files appeared to be in directories with that same name. Except they weren't. The IDE was concatenating the names of nested resource directories with dots, the same way it displays Java package directories. The blueprint builder would descend into the resource tree looking for a directory named "OSGI-INF" with a subdirectory "blueprint", not a directory named "OSGI-INF.blueprint". Not finding one, it moved on without any logging or notification.