Give me six hours to chop down a tree and I will spend the first four sharpening the axe. – Abraham Lincoln
I feel lucky: my boss cares deeply about performance, so he gave me enough time to optimize a software application.
This is just an after-action review (AAR) of what I did while testing and optimizing a medium-sized web-based Java application. The project I work on is a web-based online database built on a couple of frameworks: Spring, Hibernate, DWR and Ext.Js.
Experience tells us that coding practices which are perfectly legal in small projects should never be used in medium- or large-scale software. A large number of concurrent requests can change the regular behavior of the system: a fast method may become the slowest one. So performance tuning is a mandatory step in preparing software aimed at a large number of online users. Finding the areas that need to be optimized is the first step, and this can be done with load testing.
There are a number of tools for doing that. I always use JMeter to simulate real-world situations in a laboratory environment. It makes it easy to define test scenarios and scale them up to find the breaking points. JConsole also provides invaluable information about resource usage.
When I started testing, the first results were awful; we were far from a situation acceptable to the stakeholders. While adding new features I had kept noticing code that needed to be optimized, so I opted for a wide and deep optimization pass.
Finally, after a couple of weeks, the application was ready for the final tests. The following shows how much faster it became during optimization.
The following snapshot shows how we could not even finish the test with just 20 simultaneous users because the heap memory was exhausted.
I took the following steps to reduce the runtime memory footprint and increase performance at the same time:
Optimizing DTO Processors
By optimizing the DTO processors and reducing the size of the DTO objects, the result sets became smaller.
In one special case I even avoided DTO objects entirely and used a simple hash map to make loading faster. DTO processing is the kind of iterative work that consumes a lot of resources under pressure.
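To illustrate the idea (the names below are hypothetical, not the project's actual code), a hot read path can skip the DTO mapping step and emit a lightweight map per result row:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: instead of instantiating and populating a DTO for
// every result row, build a plain map keyed by column name.
public class RowMapSketch {

    // Pairs column names with the values of one simulated result row.
    static Map<String, Object> toRowMap(String[] columns, Object[] values) {
        Map<String, Object> row = new HashMap<>();
        for (int i = 0; i < columns.length; i++) {
            row.put(columns[i], values[i]); // no per-field setters, no DTO class
        }
        return row;
    }

    public static void main(String[] args) {
        String[] cols = {"id", "name"};
        List<Map<String, Object>> rows = new ArrayList<>();
        rows.add(toRowMap(cols, new Object[]{1L, "first"}));
        rows.add(toRowMap(cols, new Object[]{2L, "second"}));
        System.out.println(rows.size() + " rows, first name=" + rows.get(0).get("name"));
    }
}
```

The trade-off is obvious: you lose type safety, so this only pays off on paths hot enough to justify it.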
The DTO optimization wasn't enough for me, so I also changed the loading of model objects from eager to lazy.
Lazy Loading Instead Of Eager Loading
As I mentioned, I made object loading lazy. Spring and Hibernate made it easy for me: it was enough to mark an object with the @Lazy annotation.
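For Hibernate-mapped associations this is usually expressed declaratively, e.g. JPA's fetch = FetchType.LAZY; the underlying idea can be sketched in plain Java with a memoizing reference (all names below are hypothetical, not the project's code):

```java
import java.util.function.Supplier;

// Plain-Java sketch of the lazy-loading idea: the expensive value is not
// fetched until first access, then cached. In JPA/Hibernate the same effect
// comes from mappings such as @ManyToOne(fetch = FetchType.LAZY).
public class LazyRef<T> {
    private Supplier<T> loader;
    private T value;
    private boolean loaded;

    public LazyRef(Supplier<T> loader) {
        this.loader = loader;
    }

    public synchronized T get() {
        if (!loaded) {
            value = loader.get(); // first access triggers the load
            loaded = true;
            loader = null;        // let the loader and its captured state be GC'd
        }
        return value;
    }

    public synchronized boolean isLoaded() {
        return loaded;
    }

    public static void main(String[] args) {
        LazyRef<String> detail = new LazyRef<>(() -> "expensive detail record");
        System.out.println("loaded before access: " + detail.isLoaded());
        System.out.println(detail.get());
        System.out.println("loaded after access: " + detail.isLoaded());
    }
}
```

The win under load is that objects the request never touches are never fetched from the database at all.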
Replacing HQLs With SQLs
Hibernate's HQL is just an automatic gearbox, in my opinion: you may lose performance to gain ease of development. I rewrote the HQL statements in the critical method calls as optimized SQL queries. It was really effective, especially for pagination. I really don't know why Microsoft doesn't care about SQL Server pagination! Anyway, I implemented a more optimized method manually. Consider getFormRecord in the following snapshot:
And check out how much faster it became after query optimization:
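For reference, manual SQL Server pagination is commonly done with ROW_NUMBER(), which was the standard technique before OFFSET/FETCH arrived in SQL Server 2012. The project's actual getFormRecord query isn't shown here; the table and column names below are purely illustrative:

```java
// Hypothetical sketch of ROW_NUMBER()-based pagination for SQL Server.
// Only the requested window of rows is materialized and sent to the client.
public class SqlServerPagination {

    static String pageQuery(String table, String orderColumn, int page, int pageSize) {
        int first = (page - 1) * pageSize + 1;
        int last = page * pageSize;
        return "SELECT * FROM ("
             + "SELECT t.*, ROW_NUMBER() OVER (ORDER BY " + orderColumn + ") AS rn "
             + "FROM " + table + " t) paged "
             + "WHERE rn BETWEEN " + first + " AND " + last;
    }

    public static void main(String[] args) {
        // Page 3 of form_record, 20 rows per page -> rows 41..60.
        System.out.println(pageQuery("form_record", "id", 3, 20));
    }
}
```

In a real application the resulting string would be run as a Hibernate native SQL query (or through plain JDBC with parameter binding rather than concatenation).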
During testing I noticed that a small number of objects were accessed very frequently. While the focus of caching is on improving performance, caching also reduces load by cutting processing time. So those objects needed to be cached. I added EHCache as Hibernate's second-level cache, and it works really nicely. Terracotta lets us cluster EHCache instances when we need more servers, which is really cool.
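The wiring is mostly configuration; for a Hibernate 3.x-era setup it looks roughly like this (a sketch, since the exact property names and region factory class vary across Hibernate and EHCache versions):

```properties
# Enable the second-level cache and back it with EHCache
# (Hibernate 3.3+ style; older versions used hibernate.cache.provider_class).
hibernate.cache.use_second_level_cache=true
hibernate.cache.use_query_cache=true
hibernate.cache.region.factory_class=net.sf.ehcache.hibernate.EhCacheRegionFactory
```

On top of this, each entity to be cached is marked cacheable in its mapping, and cache regions (sizes, time-to-live) are tuned in ehcache.xml.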
We had some routines for exporting user tables that put a significant load on overall performance. I redeveloped the routine completely. Using a stream instead of a string made exporting much more reliable, but the large amount of processing it used during export still had to be managed. So I added a small adaptive mechanism that calculates the amount of free heap memory and decides how many rows should be fetched and exported to the stream.
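The adaptive sizing step can be sketched like this (a hypothetical reconstruction, not the project's actual code; the budget fraction, bounds, and per-row estimate are assumptions):

```java
// Hypothetical sketch of adaptive export batch sizing: look at the free heap
// and derive how many rows the next batch may safely fetch.
public class AdaptiveBatchSizer {

    static final int MIN_ROWS = 100;     // never stall the export completely
    static final int MAX_ROWS = 10_000;  // cap memory use per batch

    // bytesPerRow is an estimate of one exported row's in-memory footprint.
    static int rowsForNextBatch(long freeHeapBytes, long bytesPerRow) {
        // Spend at most a quarter of the currently free heap on one batch.
        long budget = freeHeapBytes / 4;
        long rows = budget / Math.max(1, bytesPerRow);
        return (int) Math.max(MIN_ROWS, Math.min(MAX_ROWS, rows));
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        // Free heap = currently free + not-yet-allocated headroom up to -Xmx.
        long free = rt.freeMemory() + (rt.maxMemory() - rt.totalMemory());
        System.out.println("next batch: " + rowsForNextBatch(free, 2_048) + " rows");
    }
}
```

Each batch is then fetched, written to the output stream, and released before the next batch size is recomputed, so a busy heap automatically shrinks the batches instead of causing an OutOfMemoryError.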
The results of everything we did were as good as I had hoped. Nothing is unachievable in the software world, and that is what makes it fantastic.
The following are snapshots of the optimized version. It became faster and more reliable compared with the previous charts:
The following is the most interesting result to me. This is the result of running the test with 200 simultaneous users and a 10-second ramp-up, with just a 2 GB heap. Such a scenario would normally only happen during an unmanaged DoS attack. Imagine 200 threads calling a number of methods continuously. Wow, I just love it.
And the following is a comparison of test runs with a range of users from 10 to 200.
The application that wasn't able to serve 20 concurrent users became able to host 200 users, all clicking through the same scenario continuously. Now it looks very stable, with a large safety margin. The tests all ran on my notebook; the DB was a minimum-size VM on an old-fashioned machine.
The most important lessons I took from this optimization are the following: