JVM-Tuning


Observing and tuning the JVM

Tools you can use: visualgc, jconsole

All graphs refer to OpenOlat 8.1 under a load of 700 users (session timeout 1 h), 200 of whom had clicked within the last 5 minutes.

3 GB RAM for Tomcat

The frequency of minor collections is very high, a full GC frees little memory, and the pauses are very long. The cause is that many live objects are promoted from the young generation space to the old generation space. These objects cannot be collected by the GC because they are still in use, so more RAM is needed.

4 GB

The frequency of minor collections is lower, but 4 GB was still not enough to run Olat with the load of 700/200 users.

6 GB

With 6 GB RAM the server runs fine. The only remaining problem is that a full GC stops the server every 30 minutes for approx. 15-20 s. The next step is to adjust the ratio between the young generation space (YGS) and the old generation space. The idea was to give more space to the YGS, so that more objects die while still in the YGS and are collected by minor collections.
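To put numbers on the full GC pauses, the GC log written by -Xloggc (see the JVM parameters below) can be summarised. This is only a rough sketch: the function name full_gc_summary is made up for illustration, and it assumes the older log format in which a full collection is logged on one line containing "Full GC" and ending in "real=N secs". Check a few lines of your own log before trusting the result.

```shell
# Summarise full GC pauses from a GC log read on stdin.
# Assumed line shape (illustrative only, depends on JVM version and flags):
#   1800.1: [Full GC ... [Times: user=15.00 sys=0.10, real=15.25 secs]
full_gc_summary() {
  awk '
    /Full GC/ {
      count++
      # Take the wall-clock pause from the "real=N" field of that line.
      if (match($0, /real=[0-9.]+/)) {
        total += substr($0, RSTART + 5, RLENGTH - 5)
      }
    }
    END { printf "full_gcs=%d total_pause_secs=%.2f\n", count, total }
  '
}
```

Usage would be something like: full_gc_summary < /tmp/gc.log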

6 GB with 2.5 GB for YGS and CMS garbage collector

Increasing the YGS had no visible effect. But changing the garbage collector to the concurrent mark sweep (CMS) collector improved performance significantly. The full GC pauses added up to 5 minutes over a period of 10 hours of running Olat.

Much better performance with OpenOlat 8.3

The frequency of minor GCs has decreased significantly with 8.3 (load: 570/150 users).

JVM parameters

CATALINA_OPTS="-server -Xss256k -Xms6144m -Xmx6144m -Djava.net.preferIPv4Stack=true -Djava.awt.headless=true -Xloggc:/tmp/gc.log -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:MaxPermSize=192m -XX:+UseConcMarkSweepGC -XX:NewSize=2560m -XX:SurvivorRatio=18 -XX:+HeapDumpOnOutOfMemoryError"
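As a quick sanity check, the heap layout these parameters ask for can be worked out by hand. The sketch below assumes the young generation stays at its NewSize and ignores the JVM's internal alignment, so the real sizes may differ slightly. SurvivorRatio is the ratio of eden to one survivor space, and the young generation consists of eden plus two survivor spaces.

```shell
# Values taken from the CATALINA_OPTS line (all in MB).
heap_mb=6144        # -Xms / -Xmx
young_mb=2560       # -XX:NewSize
survivor_ratio=18   # -XX:SurvivorRatio = eden size / size of one survivor space

# Young generation = eden + 2 survivor spaces = (ratio + 2) survivor-sized units.
survivor_mb=$(( young_mb / (survivor_ratio + 2) ))
eden_mb=$(( young_mb - 2 * survivor_mb ))
old_mb=$(( heap_mb - young_mb ))

echo "eden=${eden_mb}m survivor=${survivor_mb}m (x2) old=${old_mb}m"
# prints: eden=2304m survivor=128m (x2) old=3584m
```

So roughly 3.5 GB remain for the old generation. The permanent generation (MaxPermSize) is allocated outside this heap.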

Olat runs on 8 CPU cores.

Changing the garbage collector to the new G1GC had a strange effect: the number of open file handles grew very quickly.

Invoking lsof | grep java | wc -l at intervals of a few seconds showed the following results:
1810 / 3322 / 3917 / 4297 / 4486

After changing back to the CMS garbage collector, lsof showed the following results over a period of 2 hours:
706 / 815 / 665 / 772 / 888 / 716
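Measurements like the ones above can be collected with a small loop instead of typing the command repeatedly. The following is a minimal sketch: the helper names sample_counts and count_java_handles are made up for illustration, and the lsof pipeline is the same one used above.

```shell
# sample_counts N INTERVAL CMD...: run CMD N times, INTERVAL seconds apart,
# and print the results on one line separated by " / ".
sample_counts() {
  n=$1; interval=$2; shift 2
  line=""
  i=1
  while [ "$i" -le "$n" ]; do
    # Strip whitespace, since some wc implementations pad their output.
    value=$("$@" | tr -d '[:space:]')
    line="${line:+$line / }$value"
    # Wait between samples, but not after the last one.
    [ "$i" -lt "$n" ] && sleep "$interval"
    i=$(( i + 1 ))
  done
  echo "$line"
}

# Same pipeline as in the text, wrapped so it can be passed as one command.
count_java_handles() {
  lsof | grep java | wc -l
}
```

For example, sample_counts 5 5 count_java_handles takes five samples five seconds apart and prints them in the same "a / b / c" form as the lists above.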

Still need to restart Olat

Because the old generation space grows continuously, Olat has to be restarted regularly (every 2 days right now).