June 7, 2009

Performance Tuning Twitcaps (Grails, JVM, GC, and CPU)

Twitcaps has been up for a full week and I had been thinking that the performance and stability was, generally, great. The site was online and available, speedy even, every time I loaded it up. I hadn’t load tested the app, and though I hadn’t spent much time publicising Twitcaps, it got a few nice press notices that generated not insignificant traffic.

Webmonkey Article

Killer Startups’ Write-up

So, this weekend, I deployed some new code, and sat back to give it a good test in the production environment (yes, this is how we freewheeling indie coders roll, no staging, no integration testing, just deploy and see what happens - gulp!)

The app seemed fine for five minute intervals, then, everything would appear to freeze for a minute or two, after which, we’re fine for another five minutes before the next seizure. Not a good sign.

So, what changes did I make to the application to induce this behavior?

Well, I changed the number and interval of my TwitterSluper jobs. Previously, I had one slurper that searched for all picture services - a single search string like “twitpic.com OR yfrog.com OR twitgoo.com”, etc. This slurper ran ONCE every 10 seconds.

In the course of my continued Twitter research, I found that, because of the ubiquity of twitpic links, I got much more numerous and complete results if I did separate searches for “twitpic.com”, “yfrog.com”, “twitgoo.com”, “flickr.com/photos/” and so on.

Accordingly, I adjusted the way my TwitterSlurper job works. Now, I would have 6 different jobs (one for each service), running concurrently, and scheduled ONCE per second. A hazy guess based on how long the searches actually take (they don’t really run once per second, because the searches sometimes take longer than 1 second) tells me that I’m in the neighborhood of 30 - 40 times more search calls with the new code.

Ok. This ran fine for extended test periods on my dev box. What is going on?

My main dev machine is a Vista 64-bit box w/6g of RAM running the Java 1.6.0_13 32-bit JVM. As a rule, I allocated the same amount of RAM to the JVM (-Xmx420m -Xms420m) as I do on my production box, a CentOS 512m VPS Slice over at Slicehost. Sometimes I’ll do some dev testing on my MacBook Pro, but mostly, I’m glued to my desktop. Ideally, I’d have a development environment to match production, but you work with what you got.

I fired up Lambda Probe (a Tomcat-specific profiler) and this is what I saw.

Pre-optimization

You can see the huge jaggy spikes in memory. When we reach the peaks, a stop-the-world garbage collection occurs and the site seizes for a couple of minutes. This is not a healthy profile when considered on the scale of 5 minute intervals. The one good thing is that the troughs do not appear to be growing over time, which would be indicative of a memory leak. Phew, nothing worse than trying to isolate a memory leak.

I’m hitting the peaks too quickly though, and normal garbage collection is not reclaiming any Old Gen memory. Curious.

Twitcaps is built on the Grails framework, which glues together some best-of-breed Java web application technologies - Hibernate, Spring - using Groovy. I love Grails, I do. I credit it (along with Groovy) with making development fun for me again.

But…it can be something of a hog out of the box. Luckily, Grails exposes the plumbing for you to dive down and take a closer look.

The TwitterSlurpers are now creating 30 to 40 times more domain objects in the same period of time, which are all backed by Hibernate’s second-level cache. Every time I load up a new object and save it, this object stays in memory, even if it is out of scope of the current request context. Perhaps all along, the site was having these seizures, just far more rarely and I never noticed them. Or, perhaps the memory was eventually reclaimed by the GC on a long enough time scale.

Reading up on Grails memory issues, I found this article which discusses some issues with loading Grails’ domain classes outside of the web request context. I implemented the fix discussed here as well as explicitly flushing and clearing the Hibernate session inside my service calls (session.flush() and session.clear()) after the batch processes. These changes seemed to do little to alleviate my problem, they may have just delayed it another minute or two. So I did a bit more reading.

Performance Tuning For Grails App In A Memory Constrained Environment

This guy was also running on a 512m slice @ Slicehost and it sounded like he had dealt with some of the same issues.

The key take away for me was that I had allocated TOO MUCH memory to the JVM. What was happening was that I was bleeding over into the SWAP and when the stop-the-world garbage collection was occuring, it was taking a very, very long time to reclaim the memory that had leaked out of RAM and into the swap.

While my numbers were slightly different than his, I lowered the amount of RAM until I honed in on the sweet spot, juuuuust before the RAM leaks to swap. Now my memory profile looked nearly the same as before (shorter with the same slopes), but the stop-the-world garbage collection was a total non-event.

He mentions the use of -XX:+UseParallelGC vs. -XX:UseConcMarkSweepGC, which although it is supposed to be good for multi-CPU systems (we both have 4), did not turn up good results for him. 

Myself, I found the Concurrent GC to be much better. The profile certainly looks healthier. Regarding CPU usage, I topped out at 129% (average about 80%) which is appx. 1/4 of the CPU available on a Quad Core box. Maybe Twitcaps’ performance will look a lot different under heavy load, but for the time being I’m going to stick with this and revert back to the Parallel GC (which worked fine, but looked not as nice) should the situation warrant it.

Here’s what my app now looks like over 2 hours:

Healthier Profile

With the application looking and behaving much better, I re-enabled the MetaCaps functionality and the site has been stable ever since.