If your car just wouldn’t start, it would be nonsensical for one of your passengers to say, “Let’s all just get out of the car, close the doors, and then lock the car. We’ll then wait 10 seconds and get back in.” It won’t work and you just lost some of the time you have left here on earth.
Restarting your production servers every time that recurring ‘OutofMemory’ issue appears would prove to be no more fruitful.
I recently tackled the task of debugging a potential memory leak in our java based enterprise web application. I felt like a consultant and like any consultant, I was expected to give a final recommendation. No memory leak was found; rather, an interesting series of events lead me to believe that our JVM options were not configured optimally at runtime. It turns out that we had many short lived objects that were prematurely being tenured from the heap’s young generation space (Eden and Survivor) to the heap’s old space (tenured). The reason behind the premature promotion had to do with the young space portion of the heap filling up too quickly, as the space allocated to the young generation space of the heap was next to nothing by default. The premature tenuring is also what caused the old space utilization to rise, something we watch for in our Nagios system monitoring charts. This takes us one step back as we ask, “Why was there a suspicion that we had a memory leak? What metric lead folks to believe we had a memory leak?”
Our web app’s performance was not underperforming so that’s not what led us to deduce that we may be experiencing a memory leak. Rather, we run ‘jstat –gcutil‘
The java process I’m interested in performing statistics on is the ‘Dictionary’ application and its process id is 5776. I’ll use this id when invoking jstat.
S0 S1 E O P YGC YGCT FGC FGCT GCT
0.00 0.00 2.09 97.00 0.15 66 0.021 25 0.271 0.293
The 4th field, ‘O’, represents old space utilization, or usage of the old space. Whenever this metric rises, our eyebrows also rise in suspicion that certain objects are living in memory much longer than they should be. The stat itself can be deceiving since it varies based upon the absolute size of the heap, particularly the size of the old generation space. If the old generation space exhibits a small size, then we’d expect the utilization metric to be quite high, any objects living in that space eating up a greater portion of that space. Aside from the absolute size caveat, the observation that our old space utilization was abnormally high, coupled with the fact that garbage collections in the old space are much more expensive, gave us due cause to worry.
So just how is the java heap organized? In a nutshell, the java heap has a young generation space
How exactly does garbage collection work? Well, that’s the topic of another discussion and likely beyond my scope as well. For now, it’s useful to know that there exist numerous garbage collection types and various algorithms for each type. At the end of the day, the goal is to iterate over the reachable objects and mark them as live while removing all non reachable objects.
Ok, back to the case study. Our application is similar to a search engine application. We get numerous short lived connection objects and slightly fewer stateful search objects, also short lived. It seems as if some of our connection objects and quick search objects were sneaking their way into the old space portion of the heap, simply because the Eden and Survivor spaces were filling up too quickly. How did we come to this conclusion? We used JConsole, java’s memory monitoring and management console, to visually inspect the behavior of the heap live.
To invoke JConsole, type ‘jconsole’ from the
First, before having your fun, choose the local or remote java process that you want to connect to.
JConsole didn’t give us any decisive information; however, it did lead us to an interesting conclusion. We noticed that after a garbage collection, most of the objects were being promoted from Eden and Survivor space to the Tenured space with very few objects being swept into garbage. It seemed like a garbage collection would simply copy all Eden objects and place them into Tenured space, unable to find many dead objects. It seems as if the live objects needed just a little bit more time to live in the young generation before they’d die off, as many of our objects are short lived connection objects.
To verify we weren’t overlooking the possibility of a coding issue that was causing a mem leak (perhaps there was code creating unnecessary objects that continued living into the old generation space), we ran ‘jmap’ to get a memory mapping of the heap. jmap is a kick butt toolas it can create a histogram of the live heap as well as create a heap dump file. You can then use a tool like ‘jhat’ to (java heap analysis tool) to visually profile the dump file. For this case study, we just ran jmap with the ‘-histo’ option to create a histogram of classes, number of instances, and number of bytes used by various objects. The output format is as follows (example):
num #instances #bytes class name
10: 1806 43344 java.lang.String
Cool tool, really cool tool but it does didly squat for us here.
Perhaps we weren’t allocating enough absolute memory to the young generation space, but this seemed a bit unnerving since we were giving the heap 1 gigabyte of space to work with.
Using JStat once more, we could gather how much heap space was being allocated to the Eden, Survivor, and Tenured portions of the heap.
Jstat’s gccapacity option can tell us how much of the heap is allocated to the eden, survivor, and tenured spaces.
C:Program FilesJavajdk1.6.0_04bin>jstat -gccapacity 2104
NGCMN NGCMX NGC S0C S1C EC OGCMN OGCMX OGC OC
1024.0 4992.0 1280.0 128.0 128.0 1024.0 4096.0 60544.0 15240 15240
NGCMN – minimum new generation capacity = 1024 kb
NGCMX – maximum new generation capacity = 4992 kb
NGC – current new generation capacity = 1280 kb
S0C – current survivor space 0 capacity = 128 kb !!
S1C – current survivor space 1 capacity = 128 kb !!
EC – current eden capacity = 1024 kb
OGCMN – minimum old generation capacity = 4096 kb
OGCMX – maximum old generation capacity = 60544 kb
OGC – current old generation capacity = 15240 kb
OC – current old space capacity = 15240 kb
It seems that by default, the Sun Hot Spot JVM allocates a tiny portion of the heap to the Eden space and an even tinier portion to the survivor space. This meant our short-lived objects would be quickly promoted to the tenured space since both the Eden and Survivor spaces were being filled quickly, their max capacities being extremely small. The old generation tenured space was given about 60mb, over 90% of the heap! A huge chunk of the remaining portion was given to the Eden space while only a sliver was given to survivor. Bingo! Not enough space is allocated to the young generation, causing the heap to fill up quickly and prematurely promote temporary objects into the old space. This is dangerous since garbage collection in the old space is EXPENSIVE.
So, to remedy this situation, all we needed to do was to set the ‘–Xmn’ option when starting the jvm. The ‘-Xmn’ option specifies the amount of heap space to allocate to the young generation. In Eclipse, when running a java program, this option may be specified in the ‘VM Arguments’ text box of the ‘Arguments’ tab in the ‘Run Dialog’
Here, we’re guaranteeing that the young generation is given 20 megabytes of the heap space. This will allow short lived objects more time to live in the young generation, with the hopes of eventually being removed by the garbage collector while still in the young generation. Running ‘jmap –gccapacity’ allows to verify that the heap’s memory allocation does indeed reflect our VM Arguments change.
C:Program FilesJavajdk1.6.0_04bin>jstat -gccapacity 4908
NGCMN NGCMX NGC S0C S1C EC OGCMN OGCMX OGC OC
20480.0 20480.0 20480 2048 2048 16384 4096 45056 4096 4096
The next post will detail how to use JConsole and JMAP to verify that a code issue is causing a memory leak. We’ll then show the code fix and use JConsole and JMAP once again to visually verify that the code fix plugs up the memory leak.
Updated to include comments from Aalap S. Updates include:
1. JConsole connection screen
2. Link for more JConsole info
OGC was mislabeled so I relabeled it to: OGC – current old generation capacity. Sorry bout that