Cluster node failure

Error:

One of the nodes of our clustered instance keeps going down, with the following error in the kernel logs:

    Error while detecting node "192.168.193.145" from database :: Last time stamp value not updated by cluster node since last "45" seconds
    com.adeptia.indigo.cluster.failure.detection.ClusterDBFailureDetection.run(ClusterDBFailureDetection.java:45)


Cause:

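This error is thrown when the primary node cannot confirm that the secondary node is alive. The secondary node has not updated its timestamp in the database within the allowed interval (45 seconds in the log above). Common causes are: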

- The date and time on the servers in the cluster are not synchronized

- The heartbeat period is too low

- A large number of jobs is causing the secondary node to go down
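To show why these causes produce the error, here is a minimal Java sketch of timestamp-based failure detection. It assumes the primary node compares the current time on its own clock with the last heartbeat timestamp that the secondary node wrote to the database. The class and method names (`HeartbeatCheck`, `isNodeConsideredDown`) are hypothetical, and the 45-second threshold is taken from the sample log. This is not Adeptia's actual ClusterDBFailureDetection code.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical illustration of database-based heartbeat failure detection.
// Not Adeptia's actual ClusterDBFailureDetection implementation.
public class HeartbeatCheck {

    // Threshold taken from the sample log message ("since last 45 seconds").
    static final Duration TIMEOUT = Duration.ofSeconds(45);

    // The primary judges the secondary's last written timestamp against its
    // OWN clock, so any clock difference between the servers is added to
    // the apparent age of the heartbeat.
    static boolean isNodeConsideredDown(Instant lastHeartbeat, Instant primaryNow) {
        Duration age = Duration.between(lastHeartbeat, primaryNow);
        return age.compareTo(TIMEOUT) > 0;
    }

    public static void main(String[] args) {
        Instant primaryNow = Instant.parse("2024-01-01T12:00:00Z");

        // Healthy secondary with synchronized clocks: heartbeat written 10 s ago.
        Instant inSync = primaryNow.minusSeconds(10);

        // Same healthy secondary, but its clock runs 60 s behind the primary's,
        // so the timestamp it just wrote looks 70 s old to the primary.
        Instant skewed = primaryNow.minusSeconds(10 + 60);

        System.out.println("in sync -> down? " + isNodeConsideredDown(inSync, primaryNow)); // false
        System.out.println("skewed  -> down? " + isNodeConsideredDown(skewed, primaryNow)); // true
    }
}
```

The same comparison covers the other causes. If a heavily loaded secondary writes its timestamp late, the heartbeat age can pass the threshold even though the node is still running. The same can happen if the configured period leaves too little margin.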


Recommendations:

1) Ensure that the system date and time are synchronized on all nodes in the cluster. See: http://support.adeptia.com/entries/25012448-Secondary-Server-is-down-

2) Increase the heartbeat period of the cluster. This is set by the "abpm.node.heartbeat.period" property in the server-configure.properties file (in ServerKernel/etc).

3) Turn on the Queue Processor and set a limit on how many process flows the server can execute concurrently. Refer to the developer's guide for more information on the Record Queue Processor.
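Before you change the heartbeat period in step 2, it can help to check the value that is currently configured. Below is a minimal Java sketch of a hypothetical helper, `ShowHeartbeatPeriod`, that prints it. Only the property name and file come from this article. The default relative path assumes the program is run from the installation directory that contains ServerKernel; pass a different path as the first argument if your layout differs.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Properties;

// Hypothetical helper: prints the configured cluster heartbeat period.
public class ShowHeartbeatPeriod {

    static final String KEY = "abpm.node.heartbeat.period";

    // Loads a .properties file and returns the value for KEY, or null if it is not set.
    static String readHeartbeatPeriod(Path file) throws IOException {
        Properties props = new Properties();
        try (InputStream in = Files.newInputStream(file)) {
            props.load(in);
        }
        return props.getProperty(KEY);
    }

    public static void main(String[] args) throws IOException {
        // Assumed default location; override with args[0].
        Path file = Paths.get(args.length > 0 ? args[0] : "ServerKernel/etc/server-configure.properties");
        String value = readHeartbeatPeriod(file);
        if (value == null) {
            System.out.println(KEY + " is not set in " + file);
        } else {
            System.out.println(KEY + " = " + value.trim());
        }
    }
}
```

With Java 11 or later you can run it directly from source, for example `java ShowHeartbeatPeriod.java /path/to/ServerKernel/etc/server-configure.properties`. Running it on each node is a quick way to confirm that all nodes use the same value.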
