Finding Bugs in IBM Java

The first rule of programming is “it’s always your fault.” Which, like all good rules, has some exceptions. After eliminating all other possibilities, I discovered a few issues in IBM’s Java Runtime (also known as the IBM J9 JVM) that were definitely not my fault. I consider the discovery of these issues to have been incredibly interesting as it’s not every day that one finds multiple issues with such a well tested, widely deployed, and well supported piece of software as Java.

Background on the Project

The project was to build a new web site for Mackenzie Financial. The new site would be written in Groovy on Grails 2.4. A CMS (Content Management System), OpenText, would allow business users to modify pages and update the web site with zero downtime by publishing new GSPs (Groovy Server Pages) that the site would dynamically load.

One of the most interesting facets of this project was how different the production environment was from that used by the developers. The developers used a typical stack composed of Tomcat, Oracle JDK Java (version 6), Windows/Mac/Linux operating system, x86-64 hardware. The servers that would run staging and production were very different – they ran IBM WAS (WebSphere Application Server) 8.0, IBM Java 6, IBM AIX operating system, POWER hardware.

Developers ran Tomcat instead of WAS for a few reasons. First, Tomcat took seconds to start while WAS took ~30 minutes to start. Second, Tomcat was trivial to install (Grails included it) while WAS took hours to install and was onerous to automate. Third, licensing WAS for all of the developers was prohibitively expensive (which is also one of the reasons why the team used the Oracle Java distribution as opposed to the IBM one).

We ran into problems, as one would expect, due to the different application servers (Tomcat versus WAS). However, because of Java’s “write once, run anywhere” promise, enforced by the extensive testing of the TCK (Technology Compatibility Kit), we expected not to have problems with Java itself. That expectation proved to be incorrect.

IBM Java Bug #1: Non-Heap Memory Leak

The first bug I encountered was a memory leak, specifically an OutOfMemoryError in Non-Heap Memory. None of the developers encountered this bug on their systems but it readily occurred on the AIX servers whenever someone used the CMS to publish content. Thankfully, the AIX system administration staff was able to record memory dumps when the out of memory crash occurred and I was able to use YourKit Profiler (one of the only tools that could open IBM Java memory dumps) to analyze them.

The memory dump showed that classes were being loaded until memory was exhausted. On Oracle Java, when content was published, the number of classes loaded would grow then a GC (garbage collection) cycle would occur and the number of classes loaded would shrink back down to where it started. But, on IBM Java, the number of classes would grow, a GC would then occur, but no classes were unloaded.
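The loaded-class count described above can also be watched without a profiler, using the standard java.lang.management API. The following is a minimal sketch, not what was used on the project: the class name ClassCountMonitor and the one-second sampling interval are assumptions made for illustration, and to observe a real application the sampling would have to run inside that application's JVM (or be read remotely over JMX).

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper: prints the JVM's class-loading counters at intervals.
public class ClassCountMonitor {

    /** Formats one sample of the class-loading counters. */
    static String sample(ClassLoadingMXBean bean) {
        return "loaded now=" + bean.getLoadedClassCount()
                + " total loaded=" + bean.getTotalLoadedClassCount()
                + " unloaded=" + bean.getUnloadedClassCount();
    }

    public static void main(String[] args) throws InterruptedException {
        ClassLoadingMXBean bean = ManagementFactory.getClassLoadingMXBean();
        // Number of samples to take; defaults to 5 when no argument is given
        int samples = args.length > 0 ? Integer.parseInt(args[0]) : 5;
        for (int i = 0; i < samples; i++) {
            System.out.println(sample(bean));
            Thread.sleep(1000);
        }
    }
}
```

If classes are being unloaded, the "unloaded" counter increases over time and "loaded now" falls back after a collection. A "loaded now" value that only ever grows matches the behaviour seen on IBM Java.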

The first question was, “why are classes being loaded and unloaded?” The project was written in Groovy on Grails and the content was published in the form of GSPs. Grails reads GSPs and compiles them into Groovy classes. When a new page is published, the CMS overwrites the GSP, Grails notices the file change and reloads the GSP, and the JVM loads a new class (for the new GSP). The JVM should then unload the old class (for the old GSP).

The profiler can determine which references are being held to an object, which explains why the object is still in memory and not being garbage collected. Since classes are objects, this functionality also applies to them. However, the profiler showed that no references were being held to these classes, and yet they were never garbage collected. This situation should not happen: the GC should always clean up unreachable objects. At that point, it was clear I was experiencing a genuine bug in the IBM Java runtime.
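The idea of "no references held, so it should be collected" can be illustrated in plain Java with a WeakReference, which does not keep its referent alive. This is only a sketch using an ordinary object rather than a class; the name ReachabilityCheck is made up for the example, and because System.gc() is a request rather than a guarantee, the second line of output may vary between JVMs and runs.

```java
import java.lang.ref.WeakReference;

// Hypothetical illustration of reachability using a weak reference.
public class ReachabilityCheck {

    /** True if the garbage collector has not yet cleared the referent. */
    static boolean isStillInMemory(WeakReference<?> ref) {
        return ref.get() != null;
    }

    public static void main(String[] args) throws InterruptedException {
        byte[] payload = new byte[1024 * 1024];
        WeakReference<byte[]> ref = new WeakReference<byte[]>(payload);
        // payload is still used here, so the object is strongly reachable
        System.out.println("while referenced: " + isStillInMemory(ref)
                + ", payload length " + payload.length);

        payload = null; // drop the only strong reference
        // System.gc() is only a hint, so ask a few times before giving up
        for (int i = 0; i < 10 && isStillInMemory(ref); i++) {
            System.gc();
            Thread.sleep(50);
        }
        System.out.println("after dropping the reference: " + isStillInMemory(ref));
    }
}
```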

I needed a simple way to reproduce this problem as running all of Grails then modifying GSP files was too complex and too tedious. I created a reduced test case demonstrating the problem in the simplest way I could. Here’s the Java class:

import groovy.lang.ExpandoMetaClass;
import groovy.lang.GroovyClassLoader;
import groovy.lang.GroovySystem;
import groovy.lang.MetaClass;
import groovy.lang.MetaClassRegistry;

public class LeakTest {
    public static void main(String[] args) throws Exception {
        for (int i = 1; i <= 10000; i++) {
            // Each iteration compiles the script into a new class in its own class loader
            final GroovyClassLoader classLoader = new GroovyClassLoader(Thread.currentThread().getContextClassLoader());
            final Class<?> clazz = classLoader.parseClass("print 'hello world'");
            // Register an ExpandoMetaClass for the class, then remove it again
            getExpandoMetaClass(clazz);
            GroovySystem.getMetaClassRegistry().removeMetaClass(clazz);
            // Request a GC so that classes from earlier iterations can be unloaded
            System.gc();
            Thread.sleep(50);
            System.out.println("Done " + i + " iterations");
        }
    }

    public static MetaClassRegistry getRegistry() {
        return GroovySystem.getMetaClassRegistry();
    }

    // Returns the ExpandoMetaClass for aClass, creating and registering one if needed
    public static ExpandoMetaClass getExpandoMetaClass(Class<?> aClass) {
        MetaClassRegistry registry = getRegistry();

        MetaClass mc = registry.getMetaClass(aClass);
        if (mc instanceof ExpandoMetaClass) {
            ExpandoMetaClass emc = (ExpandoMetaClass) mc;
            registry.setMetaClass(aClass, emc); // make permanent
            return emc;
        }

        // Discard the existing metaclass and look it up again
        registry.removeMetaClass(aClass);
        mc = registry.getMetaClass(aClass);
        if (mc instanceof ExpandoMetaClass) {
            return (ExpandoMetaClass) mc;
        }

        // Still not an ExpandoMetaClass, so create and register one explicitly
        ExpandoMetaClass emc = new ExpandoMetaClass(aClass, true, true);
        emc.initialize();
        registry.setMetaClass(aClass, emc);
        return emc;
    }
}

To run the test:

  • Download groovy-all-2.3.6.jar
  • With either the Oracle or IBM JDK (it doesn’t matter which), run javac -cp groovy-all-2.3.6.jar LeakTest.java
  • Run java -Xmx32m -XX:MaxPermSize=32m -cp groovy-all-2.3.6.jar;. LeakTest (the ; classpath separator is for Windows; on Linux, Mac, or AIX use : instead)

On the IBM JDK, the test fails with an OutOfMemoryError after 813 iterations. On Oracle, it will run indefinitely. Using the YourKit profiler, I noticed that when running on IBM, the number of classes loaded keeps rising and never falls, and the non-heap memory likewise keeps rising and never falls. On Oracle, both show the desirable sawtooth pattern.
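The non-heap memory curve can also be sampled from code through the standard MemoryMXBean, which is handy when a profiler cannot be attached. This is a minimal sketch under assumptions: NonHeapMonitor is a hypothetical name, and the sampling would need to be called from within the test loop (or another thread of the same JVM) to see the rise or the sawtooth.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.MemoryUsage;

// Hypothetical helper: reports how much non-heap memory the JVM is using.
public class NonHeapMonitor {

    /** Formats a memory usage snapshot in KiB. */
    static String describe(MemoryUsage usage) {
        return "non-heap used=" + (usage.getUsed() / 1024) + " KiB"
                + ", committed=" + (usage.getCommitted() / 1024) + " KiB";
    }

    public static void main(String[] args) throws InterruptedException {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        // Number of samples to take; defaults to 5 when no argument is given
        int samples = args.length > 0 ? Integer.parseInt(args[0]) : 5;
        for (int i = 0; i < samples; i++) {
            System.out.println(describe(memory.getNonHeapMemoryUsage()));
            Thread.sleep(1000);
        }
    }
}
```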

I now knew that I had found a bug in IBM Java and I had the reduced test case to easily demonstrate it so I contacted IBM support by creating a PMR (Problem Management Report).

After a few days, IBM acknowledged the bug, and after a few weeks, provided a fix: IV61544: BEANCLASS LEAK DUE TO CACHE IN JAVA.BEANS.INTROSPECTOR.

IBM Java Bug #2: Poor Performance

The project progressed and reached the point of being ready for performance testing. Immediately, I noticed that a lot of time was being spent in the org.springframework.beans.BeanWrapperImpl.convertForProperty method.

The first thing I did was search the web, where I found that someone else had already experienced this performance issue and reported it to the Spring project as Regression: Slow TypeDescriptor lookups in CachedIntrospectionResults on IBM JVM 6 [SPR-12185]. I hadn’t discovered another new bug, but I had encountered an existing one that had never been reported to the vendor: the reporter of that issue didn’t report the bug to IBM. And since IBM didn’t know, they didn’t fix it, and that’s why I was still encountering it. Fellow developers, please report bugs so they get fixed!

I created a reduced test case and reported the issue to IBM by creating a PMR. After a bit of a wait, IBM fixed the issue: IV66463: OVERRIDING GETREAD()/GETWRITE() METHODS IN JAVA.BEANS.PROPERTYDESCRIPTOR CLASS MAKE ITS EQUALS() FUNCTION TO FAIL.
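Going only by the title of the fix, the problem involved equals() on a java.beans.PropertyDescriptor whose accessor getters are overridden in a subclass. The following is a guess at what a check of that behaviour could look like. It is not the reduced test case that was sent to IBM, and the Person bean, the OverridingDescriptor subclass, and the choice to pass null accessors to the superclass are all assumptions made for illustration.

```java
import java.beans.IntrospectionException;
import java.beans.PropertyDescriptor;
import java.lang.reflect.Method;

// Hypothetical sketch: compares descriptors whose accessors come from overridden getters.
public class DescriptorEqualsSketch {

    /** Hypothetical bean used only for this illustration. */
    public static class Person {
        private String name;
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
    }

    /** Hypothetical subclass that supplies its accessors by overriding the getters. */
    static class OverridingDescriptor extends PropertyDescriptor {
        private final Method read;
        private final Method write;

        OverridingDescriptor(String name, Method read, Method write) throws IntrospectionException {
            super(name, null, null); // the superclass stores no accessor methods
            this.read = read;
            this.write = write;
        }

        @Override public Method getReadMethod() { return read; }
        @Override public Method getWriteMethod() { return write; }
    }

    /** Builds a descriptor for Person.name using the overriding subclass. */
    static PropertyDescriptor overriding() throws Exception {
        Method read = Person.class.getMethod("getName");
        Method write = Person.class.getMethod("setName", String.class);
        return new OverridingDescriptor("name", read, write);
    }

    public static void main(String[] args) throws Exception {
        PropertyDescriptor plain = new PropertyDescriptor("name", Person.class);
        System.out.println("plain vs plain: "
                + plain.equals(new PropertyDescriptor("name", Person.class)));
        // Two separately built but logically identical descriptors
        System.out.println("overriding vs overriding: "
                + overriding().equals(overriding()));
    }
}
```

If equals() compares the results of getReadMethod() and getWriteMethod(), the two overriding descriptors should compare as equal. An implementation that failed to treat them as equal would presumably defeat any cache keyed on the descriptor, which would fit the slow lookups described in SPR-12185, though that link is an inference rather than something stated in the fix.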

Closing Thoughts

I’ve found lots of bugs in lots of software. But I had always thought of the Java implementation itself as being so high quality, so well tested, so well supported, that I would never find a bug in it. Sure, I knew there were bugs in it, but I thought only those developers who were really on the bleeding edge of development, doing truly experimental things, would find them. Surely I would never find any! Then, over the course of one project, I found more than one bug.

I consider the process of encountering these bugs, diving into them, figuring out simple ways to reproduce them, working with the vendor to produce the fix, and finally being able to close them out to have been an incredibly interesting and rewarding experience.

And the client was happy too. They got an easy to update, well designed, performant, maintainable web presence that won the 2015 Kasina Award as Top Canadian Website for Financial Advisors.

CC BY-SA 4.0 Finding Bugs in IBM Java by Craig Andrews is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
