This post answers some common questions about scheduling in a clustered environment:
1) Should the scheduler be running on only one node, or both? What is the intended behavior here?
The intended behavior is that the scheduler runs only on the primary node in a clustered environment. The primary node is responsible for scheduling all jobs and for load balancing the process flows they trigger. The scheduler never runs separately on each node (machine) in the cluster. The KernelApplication.log file you provided is from a secondary node, so the scheduler is not expected to start there.
2) We also noted there is a property in quartz.properties, called org.quartz.jobStore.isClustered, which is currently set to false. Should this be set to true in a clustered environment? And if so, what is the effect of this?
No, this property should not be set to “true” in a clustered environment, because the scheduler runs only on the primary node.
3) I assume that currently this is using a round robin style load balancing? Are there any other algorithms? And if not, are there any plans to add algorithms to distribute the load more evenly based on actual load rather than number of processes?
Yes, this is round-robin load balancing: the primary node delegates process flow execution requests to the cluster nodes one by one, irrespective of the type of process flow. As of now, we don’t support any other load balancing algorithms. Adding new load balancing algorithms for clustered environments is certainly on our product roadmap, but it is not in the v6.0 release plan.
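To illustrate the round-robin behavior described above, here is a minimal Python sketch. It is not the product's implementation. The class name, the node names, and the flow names are all hypothetical, and whether the primary node includes itself in the rotation is an assumption the answer above does not confirm.

```python
from itertools import cycle


class RoundRobinDispatcher:
    """Hypothetical dispatcher: hands each request to the next node in a fixed rotation."""

    def __init__(self, nodes):
        if not nodes:
            raise ValueError("at least one cluster node is required")
        # cycle() repeats the node list endlessly in the same order
        self._rotation = cycle(list(nodes))

    def assign(self, process_flow):
        # The type or cost of the process flow is ignored on purpose:
        # round robin only counts requests, it does not measure load.
        node = next(self._rotation)
        return node, process_flow


# Hypothetical node names; five requests spread over three nodes
dispatcher = RoundRobinDispatcher(["node-1", "node-2", "node-3"])
assignments = [dispatcher.assign(f"flow-{i}")[0] for i in range(5)]
print(assignments)
```

Because the rotation ignores how heavy each process flow is, one node can end up with several long-running flows while another receives only short ones. That is the uneven distribution the question refers to.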
4) What steps should be taken, with respect to the scheduler, when restarting the master node?
You don't have to take any manual steps to start the scheduler when restarting the master node. This is handled implicitly during node start-up in a clustered environment. If the master node comes back up within the configured heartbeat period, the scheduler is started implicitly on that node during boot-up. If it does not come back up within that period, one of the secondary nodes takes over the role of master node, and the scheduler is started on the new master node.
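The heartbeat rule above can be sketched as a small decision function in Python. This only illustrates the described behavior, not the product's code. The function and parameter names are hypothetical, and two details are assumptions: treating a downtime exactly equal to the heartbeat period as "within" it, and choosing the first secondary in the list as the new master.

```python
def scheduler_owner(master, secondaries, downtime_seconds, heartbeat_seconds):
    """Return the node expected to run the scheduler after the master goes down.

    Hypothetical helper: all names and the tie-breaking rules are assumptions.
    """
    # Master is back within the heartbeat period: it keeps the scheduler.
    # (Assumption: the boundary case counts as "within".)
    if downtime_seconds <= heartbeat_seconds:
        return master
    if not secondaries:
        raise ValueError("no secondary node is available to take over")
    # Heartbeat period elapsed: a secondary becomes the new master.
    # (Assumption: the first listed secondary is chosen.)
    return secondaries[0]


# Master restarts after 10 s with a 30 s heartbeat period: it stays the owner
quick_restart = scheduler_owner("node-1", ["node-2", "node-3"], 10, 30)
# Master is down for 45 s: a secondary takes over the scheduler
slow_restart = scheduler_owner("node-1", ["node-2", "node-3"], 45, 30)
print(quick_restart, slow_restart)
```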
5) If the master node were to fail for some reason (e.g. power failure), does the secondary node automatically take over the scheduling function?
Yes. If the master node fails for any reason, a secondary node automatically takes over the scheduling function.
6) When the master node is up and running again, does it automatically assume the scheduling role?
Yes. When the master node is up and running again, it automatically assumes the scheduling role.
7) If the entire cluster is to be restarted, will the scheduler start automatically once the primary node is started, or do we need to manually start it?
If the entire cluster is restarted, the scheduler starts automatically. No manual start is required.