Monday, October 19, 2015

Log Levels

The question came up today of why changing the log level would affect application performance.

Loggers generally use an ordered set of log levels. When we request the logger to log an event, we usually pass it two parameters: the message we want to log and the level (or priority) of the message. Sometimes we only pass it the message and the name of the log method tells us the level. For example, the logger might implement methods named warn, info, or debug.

The logging framework usually has a global value for level as well. This setting tells the logger how verbose it should be. Do you only want to see critical failures, or do you want to see all the data passing through the APIs? Different kinds of debugging demand different levels of detail. Additionally, the more verbose the logging, the more space it requires and the more time your application spends recording events instead of processing your users' requests.

When you ask the logger to record an event, the logger compares the level of the message to the global level and then decides whether to record the message or to ignore it and do nothing. I read the documentation for log4j and (without actually testing it), this would be log4j's decision table:
╔═════════════════════════╦═════════════════════════════════════════════════════════╗
║ If I log an event as…   ║ …and my current logging level is…                       ║
╠═════════════════════════╬═════╤═══════╤═══════╤══════╤══════╤═══════╤═══════╤═════╣
║                         ║ OFF │ FATAL │ ERROR │ WARN │ INFO │ DEBUG │ TRACE │ ALL ║
╟─────────────────────────╫─────┼───────┼───────┼──────┼──────┼───────┼───────┼─────╢
║ FATAL                   ║ -   │ log   │ log   │ log  │ log  │ log   │ log   │ log ║
╟─────────────────────────╫─────┼───────┼───────┼──────┼──────┼───────┼───────┼─────╢
║ ERROR                   ║ -   │ -     │ log   │ log  │ log  │ log   │ log   │ log ║
╟─────────────────────────╫─────┼───────┼───────┼──────┼──────┼───────┼───────┼─────╢
║ WARN                    ║ -   │ -     │ -     │ log  │ log  │ log   │ log   │ log ║
╟─────────────────────────╫─────┼───────┼───────┼──────┼──────┼───────┼───────┼─────╢
║ INFO                    ║ -   │ -     │ -     │ -    │ log  │ log   │ log   │ log ║
╟─────────────────────────╫─────┼───────┼───────┼──────┼──────┼───────┼───────┼─────╢
║ DEBUG                   ║ -   │ -     │ -     │ -    │ -    │ log   │ log   │ log ║
╟─────────────────────────╫─────┼───────┼───────┼──────┼──────┼───────┼───────┼─────╢
║ TRACE                   ║ -   │ -     │ -     │ -    │ -    │ -     │ log   │ log ║
╚═════════════════════════╩═════╧═══════╧═══════╧══════╧══════╧═══════╧═══════╧═════╝
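The table boils down to a simple comparison. Here is a sketch of the decision logic in plain Java — not log4j's actual implementation, just the common pattern: the levels form an ordered set, and an event is recorded only when its level is at least as severe as the configured threshold.

```java
// Hypothetical sketch of how a logger decides whether to record an event.
public enum Level {
    OFF, FATAL, ERROR, WARN, INFO, DEBUG, TRACE, ALL;

    // An event logs when its rank does not exceed the threshold's rank:
    // OFF (rank 0) admits nothing, ALL (rank 7) admits everything.
    public boolean isLoggableAt(Level threshold) {
        return this != OFF && this.ordinal() <= threshold.ordinal();
    }
}
```

So Level.ERROR.isLoggableAt(Level.WARN) is true while Level.DEBUG.isLoggableAt(Level.WARN) is false, matching the ERROR and DEBUG rows of the table above.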

Logging a message here or there should not affect application performance. However, if the logging statement is inside a loop, it can make a difference. We changed a log statement from warn to trace today and it sped up our automated tests dramatically.

Imagine this:

for (int i = 0; i < 1000; ++i) {
    for (int j = 0; j < 10000; ++j) {
        int product = i * j;
        logger.debug(String.format("The product of i and j is %d", product));
    }
}

The computer is very fast at computing the product of i and j. But the logging statements cause the computer to perform I/O (i.e. write the message to a log file), and I/O is very slow compared to simple arithmetic. In the example above, the great majority of the time is spent writing to the log and not doing real work.
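There is a second, subtler cost: the String.format call runs whether or not DEBUG messages are being recorded, because the message is built before the logger ever sees it. Loggers in the log4j style expose level-check methods such as isDebugEnabled() so you can skip that work. Here is a minimal sketch using a hypothetical SketchLogger class (not a real log4j API) that counts how many messages actually get formatted:

```java
// Hypothetical stand-in for a log4j-style logger, to show why guarding
// a log call matters: the argument to debug() is always evaluated,
// even when the message is ultimately discarded.
public class SketchLogger {
    private final boolean debugEnabled;

    public SketchLogger(boolean debugEnabled) { this.debugEnabled = debugEnabled; }

    public boolean isDebugEnabled() { return debugEnabled; }

    public void debug(String message) {
        if (debugEnabled) {
            System.out.println(message);  // stands in for the real I/O
        }
    }

    // Counts formatting calls so the hidden cost is observable.
    public static int formatCalls = 0;

    public static String expensiveMessage(int product) {
        formatCalls++;
        return String.format("The product of i and j is %d", product);
    }

    public static void main(String[] args) {
        SketchLogger logger = new SketchLogger(false);  // DEBUG is off

        // Unguarded: the message is formatted, then thrown away.
        logger.debug(expensiveMessage(6 * 7));

        // Guarded: the formatting never happens when DEBUG is off.
        if (logger.isDebugEnabled()) {
            logger.debug(expensiveMessage(6 * 7));
        }

        System.out.println("Messages formatted: " + formatCalls);  // prints 1
    }
}
```

In the nested-loop example, a guard like this (or lowering the statement's level, as we did with our warn-to-trace change) avoids ten million formatting calls when the message would be discarded anyway.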

Logging levels exist to help us get to the information we need. Too much information is difficult to store, time-consuming to search, and as frustrating as looking for a needle in a haystack. Too little logging is frustrating because the information you need to understand the system's behavior is not available.

The trend is toward keeping more data for longer periods of time. Persistent storage gets cheaper all the time, and that makes it easier to retain the logs. Logging tools like Splunk include machine learning capabilities to help you identify patterns and anomalies in the logs.

Happy debugging!

Friday, October 2, 2015

OSGi First Impressions

Lots of new technologies since starting this job in July. The most interesting ones turn the JVM into something more like a micro-kernel operating system. Since Java 1.5, MBeans have shipped as part of the Java runtime; since Java 1.6, MXBeans have too.
This application leverages that in a big way. Using Apache libraries that implement the OSGi specification, the application dynamically installs, starts, stops, and uninstalls features. Other Apache libraries provide a command-line-style interface for interacting with the services and features.

The advantages of this approach are obvious. Uncaught or unrecoverable exceptions in one runtime component do not take down the entire JVM, and failed services are restarted quickly and quietly. The approach also provides a more declarative way to build applications: Apache Aries compiles blueprint XML files to provide dependency injection, and OSGi supports a more flexible dependency framework than base Java's class path and class loader.

Of course, this flexibility comes at a cost. Debugging is more difficult than debugging POJOs. There is a big overhead, and lots of configuration dependencies, to bring up the containerized environment; it is slow and delicate. Also, automated tests become very flaky, very quickly. For example:
  • For years there was a single provider of the SecurityConfiguration service. Recently a second implementation was created to support external credential providers. Everything ran fine for a week, but then the CI builds started failing occasionally. At the time, no one had any idea what was going wrong or why the failures were intermittent. Turns out, the tests were simply taking whichever SecurityConfiguration provider registered itself first. In a more synchronous environment where it is easy to step through the code, this would have been obvious. But because of the mountain of frameworks and the loose coupling, finding the issue was tough.
  • I was blocked for a day trying to register a new MBean. Eventually, a long-time project member spotted the problem. The blueprint file that would be read to create the bean was in a directory named OSGI-INF.blueprint. All of the blueprint files were in directories with that name. Except they weren't. The IDE was concatenating the names of nested resource directories the same way it concatenated the names of Java package directories. The blueprint builder would descend into the proper resource directory, but it was looking for a directory named "OSGI-INF" with a subdirectory "blueprint", not a single directory named "OSGI-INF.blueprint". Not finding such a directory, it moved on without any logging or notification.