Tidying the alfresco workflow database

During an upgrade from Alfresco 4.2.5.1 to 4.2.6 for a client of ours we identified a problematic patch script which does some refactoring of data in the Activiti tables in the database. The customer system has been running for many years and the Activiti historic tables have grown large due to the fact that Alfresco never cleans these automatically. All workflow and task data is stored indefinitely in the database in the act_hi_*.

For this particular system the act_hi_detail table contained 2.1 million records (not that many for a database), however, the nasty SQL used to refactor data in the patch does not work well for a system of this size with that many workflows. The total number of Activiti processes (active and completed) were about 21000 and the patch script ran in our QA environment for many hours. Too many hours for us to have scheduled downtime, so we decided to cancel the upgrade script and find an alternate solution.

After some investigation and some trial and errors we found that when deleting a workflow using the workflow service it clears up the Activiti history tables. This can also be done from the UI per workflow after a workflow has completed. Then you have the option to delete the workflow from the workflow details page. This is nothing you do manually for 21k workflows so here comes the JavaScript console to the rescue!

var ctx = Packages.org.springframework.web.context.ContextLoader.getCurrentWebApplicationContext();
var log = Packages.org.apache.log4j.Logger.getLogger("RL_CANCEL_DELETED_WORKFLOWS");
var workflowService = ctx.getBean('WorkflowService');
logger.log("Starting cancel workflow script");
log.error("Starting cancel workflow script");

var completedWorkflows = workflowService.getCompletedWorkflows();
var limit = 5000;
logger.log("Limit is: "+limit);
log.error("Limit is: "+limit);
logger.log("Number of completed workflows: "+completedWorkflows.size());
log.error("Number of completed workflows: "+completedWorkflows.size());
if (completedWorkflows) {
	for (var i=0;i<completedWorkflows.size();i++) {
		var wf = completedWorkflows.get(i);
		if (wf.isActive()) {
			logger.log("Workflow is still active: "+wf.getId());
			log.error("Workflow is still active: "+wf.getId());
		} else {
			logger.log("Deleting workflow: "+wf.getId());
			log.error("Deleting workflow: "+wf.getId());
			workflowService.deleteWorkflow(wf.getId());
		}
		if (i>=limit) {
			break;
		}
	}
}
logger.log("Finished cancel workflow script");
log.error("Finished cancel workflow script");

This script will delete all completed workflows, both old jBPM workflows and Activiti workflows. The script will most likely time out when you run it so we added some log4j logging to it as well to get a log trail in our alfresco.log.

Since we are doing some workflow maintenance here we might as well delete all Active old jBPM workflows as well (in this case we knew for a fact that they will never be completed).

var ctx = Packages.org.springframework.web.context.ContextLoader.getCurrentWebApplicationContext();
var log = Packages.org.apache.log4j.Logger.getLogger("RL_DELETE_JBPM_WORKFLOWS");
var workflowService = ctx.getBean('WorkflowService');

var activeWorkflows = workflowService.getActiveWorkflows();
var limit = 2000;
logger.log("Limit is: "+limit);
log.error("Limit is: "+limit);
logger.log("Number of active workflows: "+activeWorkflows.size());
log.error("Number of active workflows: "+activeWorkflows.size());
if (activeWorkflows) {
	for (var i=0;i<activeWorkflows.size();i++) {
		var wf = activeWorkflows.get(i);
		if (!wf.isActive()) {
			logger.log("Workflow is not active: "+wf.getId());
			log.error("Workflow is not active: "+wf.getId());
		} else if (wf.getId().indexOf("activiti")===0) {
			logger.log("Activiti workflow: "+wf.getId());
			log.error("Activiti workflow: "+wf.getId());
		} else if (wf.getId().indexOf("jbpm")===0) {
			logger.log("Canceling jBPM workflow: "+wf.getId());
			log.error("Canceling jBPM workflow: "+wf.getId());
			workflowService.cancelWorkflow(wf.getId());
		} else {
			logger.log("Unknown workflow type"+wf.getId());
			log.error("Unknown workflow type"+wf.getId());
		}
		if (i>=limit) {
			break;
		}
	}
}

As a result of this maintenance job, we have about 1700 active activiti workflows and an act_hi_detail table with about 210000 rows (about 10% of the original count) and the patch went through in seconds.

This entry was posted in Uncategorized. Bookmark the permalink.

2 Responses to Tidying the alfresco workflow database

  1. I have bookmarked your website because this site contains valuable information in it. I am really happy with articles quality and presentation.

  2. Tk6iiW Thank you ever so for you article.Really thank you! Keep writing.

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>