Cluster node failure


One of the nodes of our clustered instance keeps going down with the following error in the kernel logs:

"Error while detecting node "" from database :: Last time stamp value not updated by cluster node since last "45" seconds "



This error is thrown when the primary node cannot connect to the secondary node. Possible causes:

- The date and time differ between the servers in the cluster

- The heartbeat period is too low

- A large number of jobs is causing the secondary node to go down
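
The first two causes can be illustrated with a minimal sketch of a timestamp-based liveness check. This is not the product's actual implementation: the function name, the 45-second default (taken only from the error message above), and the scenario are assumptions made for illustration. It shows how a node whose clock runs behind can look dead even though it has just written its heartbeat.

```python
from datetime import datetime, timedelta


def is_heartbeat_stale(last_heartbeat: datetime,
                       checker_now: datetime,
                       timeout_seconds: int = 45) -> bool:
    """Return True when the last recorded heartbeat is older than the timeout.

    Hypothetical helper: the real cluster check may work differently.
    """
    return (checker_now - last_heartbeat) > timedelta(seconds=timeout_seconds)


# Hypothetical scenario: the checking node reads the heartbeat at this instant.
real_now = datetime(2024, 1, 1, 12, 0, 0)

# Clocks in sync: the heartbeat was written 5 seconds ago.
in_sync = is_heartbeat_stale(real_now - timedelta(seconds=5), real_now)

# Secondary clock runs 60 seconds behind: a heartbeat written 5 seconds ago
# is stamped 65 seconds in the checker's past, so it looks stale.
skewed = is_heartbeat_stale(real_now - timedelta(seconds=65), real_now)

print(in_sync, skewed)  # False True
```

Under these assumptions, either synchronizing the clocks or allowing a longer period before a node is declared down would stop the false detection.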



1) Ensure the date and time are the same on all nodes in the cluster.

2) Increase the heartbeat period of the cluster. This is the "abpm.node.heartbeat.period" property in the file (ServerKernel/etc)

3) Turn on the Queue Processor and set a limit on how many process flows the server can execute concurrently. Please refer to the developer's guide for more information on the Record Queue Processor.
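
To make the idea behind step 3 concrete, here is a generic sketch of capping how many jobs run at once. It does not use or represent the Queue Processor itself; `run_with_limit` and its parameters are hypothetical names, and the thread pool merely stands in for whatever mechanism the server uses.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def run_with_limit(jobs, max_concurrent):
    """Run callables with at most max_concurrent executing at the same time.

    Returns (results, peak), where peak is the highest number of jobs
    observed running simultaneously. Hypothetical illustration only.
    """
    lock = threading.Lock()
    running = 0
    peak = 0

    def tracked(job):
        nonlocal running, peak
        with lock:
            running += 1
            peak = max(peak, running)
        try:
            return job()
        finally:
            with lock:
                running -= 1

    # The pool size is the concurrency limit; extra jobs wait in the queue.
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        results = list(pool.map(tracked, jobs))
    return results, peak


# Ten small jobs, but never more than three running at once.
jobs = [lambda i=i: i * i for i in range(10)]
results, peak = run_with_limit(jobs, max_concurrent=3)
print(results)
```

Queued jobs still complete, only later, so the limit trades throughput for a bounded load on the node.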

