We have a customer who recently started getting memory problems in Alfresco. Problems of this kind can be very hard to pinpoint, but for this particular client we’re almost sure what is causing them. Unfortunately for us, knowing the cause doesn’t prevent the memory problem. The bad thing with the JVM (in this case) is that even after a memory problem has occurred, the JVM is left running, although not in a good state… This particular Alfresco solution is clustered, and the memory problems eject the server from Alfresco’s Hazelcast cluster, but the load balancer still thinks the server is in the cluster, which leads to a lot of problems down the road.
Our customer has a very good organization around Alfresco, and if the JVM in which Alfresco lives dies when a memory problem occurs, it can be restarted in a breeze. To make the JVM die, there is a JVM parameter (-XX:OnOutOfMemoryError) which can be used to execute a script when such an error occurs.
Below is how we solved this for our customer, as step-by-step instructions. The server OS is Ubuntu 12.04.
Install the package mailutils if not already installed.
sudo apt-get install mailutils
Create a shell script somewhere in your installation path.
nano -w /opt/alfresco/current_version/bin/mail-and-kill.sh
Paste this content into the script.
#!/bin/bash
# First argument  : process id
# Second argument : server port
# Third argument  : module (repo, solr or share)
PROCESSID="$1"
PORT="$2"
MODULE="$3"

FROM="firstname.lastname@example.org"
TO="email@example.com"
SUBJECT="Tomcat shut down on $HOSTNAME:$PORT ($MODULE)"
FILENAME="/tmp/mail-and-kill.txt"

# Build the mail body from scratch on every run
rm -f "$FILENAME"
echo "Server : $HOSTNAME" >> "$FILENAME"
echo "Port : $PORT" >> "$FILENAME"
echo "Module : $MODULE" >> "$FILENAME"
echo "Message : Server got a java.lang.OutOfMemoryError and the java process is killed" >> "$FILENAME"

# Send the notification first, then kill the JVM
mail -a "From: $FROM" -s "$SUBJECT" "$TO" < "$FILENAME"
kill -9 "$PROCESSID"
The script takes three parameters: the process id (of the JVM), the Tomcat port (HTTP port), and the module which caused the problem (repo, solr or share).
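One step the list does not spell out: the JVM launches the command as given, so the script presumably needs its execute bit set, and a syntax error in it would only surface at the worst possible moment. Below is a minimal sketch of a pre-flight check; `prepare_hook` is a hypothetical helper name, not part of the original setup.

```shell
# Hypothetical helper: syntax-check a hook script, then make it executable.
prepare_hook() {
    script="$1"
    # bash -n only parses the file; nothing in it is executed.
    bash -n "$script" || return 1
    chmod 755 "$script"
}

# Example call (path taken from the step above):
# prepare_hook /opt/alfresco/current_version/bin/mail-and-kill.sh
```

The syntax check runs first, so a script that does not parse is never marked executable.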
For this to work, a local mail server has to be installed. For our client we’ve installed postfix, which acts as a mail relay server.
Add the following to the Tomcat startup parameters (for example setenv.sh).
JAVA_OPTS="$JAVA_OPTS -XX:OnOutOfMemoryError='/opt/alfresco/current_version/bin/mail-and-kill.sh %p 8080 repo'"
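Since the script takes the port and module as arguments, the same hook can be reused for the other Tomcat instances. The helper below is a sketch of one way to build the option in setenv.sh; `oom_hook_opt` is a made-up name, and any port other than the 8080 used above would be specific to your installation.

```shell
# Hypothetical helper for setenv.sh: build the OnOutOfMemoryError option
# for a given Tomcat HTTP port and module (repo, solr or share).
oom_hook_opt() {
    hook="/opt/alfresco/current_version/bin/mail-and-kill.sh"
    # %%p prints a literal %p, which the JVM replaces with its own process id.
    printf "%s='%s %%p %s %s'" "-XX:OnOutOfMemoryError" "$hook" "$1" "$2"
}

# Same result as the hand-written line above, for the repo instance.
JAVA_OPTS="$JAVA_OPTS $(oom_hook_opt 8080 repo)"
```

Each Tomcat instance would call the helper with its own port and module name, so the mail tells you which one went down.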
Restart Alfresco, force some nasty code to run it out of memory, and watch how you get a mail and the JVM is killed.
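Before forcing a real OutOfMemoryError, the kill step can be rehearsed against a throwaway process instead of the JVM. This is only a sketch: `test_kill_hook` is a hypothetical helper, and a `sleep` process stands in for Tomcat. Pointing it at the real script would also send the mail.

```shell
# Hypothetical helper: run a hook against a disposable process and report
# whether the hook killed it with SIGKILL.
test_kill_hook() {
    hook="$1"
    sleep 300 >/dev/null 2>&1 &          # stand-in for the JVM
    victim=$!
    "$hook" "$victim" 8080 repo          # same argument order the JVM will use
    kill "$victim" 2>/dev/null || true   # fallback SIGTERM so wait cannot hang
    status=0
    wait "$victim" 2>/dev/null || status=$?
    # A process ended by a signal reports 128 + signal number; SIGKILL is 9.
    if [ "$status" -eq 137 ]; then
        echo "hook killed the process"
    else
        echo "hook did not kill the process (status $status)"
    fi
}

# Example call:
# test_kill_hook /opt/alfresco/current_version/bin/mail-and-kill.sh
```

The fallback SIGTERM is there so that a hook which does nothing cannot leave the rehearsal hanging for the full sleep.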