NextGen Knowledge Center

Strict Mode Enabled

This option is enabled by default to preserve the behavior of the clustering plugin upon upgrade to 3.12 or later.

When this option is enabled, the Advanced Clustering extension ensures that all channel operations (deploy, undeploy, start, stop, and so on) are fully synchronized across the entire cluster, so only one such operation can run at any given time. In addition, new servers that start up and attempt to join the cluster are blocked by the following operations:

  • Other servers starting up / joining the cluster
  • Performing a channel deploy/undeploy/start/stop/pause/resume
  • Performing a connector start/stop task
  • Performing the Remove All Messages task for a channel
  • Performing the Remove Results task for a channel
  • Deleting a channel
  • Performing message recovery for an offline server

The clustering plugin does this by having each server obtain two locks as part of the startup process:

  • startupDeploy: This lock prevents other servers from entering the startup process at the same time, and also prevents the Remove All Messages task from being executed on a channel at the same time.
  • channelOperation: This lock prevents any other server from performing any deploy/undeploy/start/stop/pause/resume/halt/remove task on any channel across the entire cluster.

After acquiring these two cluster-wide locks, the server performs its initial deploy of all channels (whichever channels are currently deployed across the cluster). Once all of these channels have finished deploying, both locks are released. At that point, other new nodes can join, and deploy/start/etc. operations can be performed on channels again.
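
The startup sequence described above can be pictured with a small simulation. This is only a minimal sketch, not the plugin's actual code: the class and method names are hypothetical, the two cluster-wide locks are replaced by ordinary in-process locks, and the order in which the locks are taken (startupDeploy first) is an assumption, since the text above does not specify it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

public class StrictModeStartupSketch {
    // Stand-ins for the two cluster-wide locks named above. A real cluster shares
    // these between servers; plain in-process locks are used here (assumption).
    static final ReentrantLock startupDeploy = new ReentrantLock();
    static final ReentrantLock channelOperation = new ReentrantLock();

    // Simulates one server joining: take both locks, deploy every channel that is
    // currently deployed across the cluster, then release both locks.
    static List<String> joinCluster(String serverId, List<String> deployedChannels) {
        List<String> log = new ArrayList<>();
        startupDeploy.lock(); // acquisition order is an assumption
        try {
            log.add(serverId + " acquired startupDeploy");
            channelOperation.lock();
            try {
                log.add(serverId + " acquired channelOperation");
                for (String channel : deployedChannels) {
                    log.add(serverId + " deployed " + channel);
                }
            } finally {
                channelOperation.unlock();
                log.add(serverId + " released channelOperation");
            }
        } finally {
            startupDeploy.unlock();
            log.add(serverId + " released startupDeploy");
        }
        return log;
    }

    public static void main(String[] args) {
        // Server and channel names are made up for the example.
        List<String> channels = Arrays.asList("ADT Inbound", "Lab Results");
        joinCluster("server-2", channels).forEach(System.out::println);
    }
}
```

While joinCluster holds both locks, any other caller that needs either lock has to wait, which mirrors how a joining server blocks other joins and channel operations until its initial deploy finishes.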

Caveats:

This default strict mode is meant to prevent channels from getting out of sync across the cluster. However, it also means that a server joining the cluster may be stuck in the starting state if a channel operation is already running on another server. The opposite can also happen: if the new server has started joining the cluster and its initial deploy takes a long time, it blocks other servers from performing channel operations and blocks other new servers from joining.
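
The blocking described in these caveats can be reproduced with a short simulation. This is a hedged sketch under stated assumptions: the class and method names are hypothetical, an in-process lock stands in for the cluster-wide channelOperation lock, and the wait timeout exists only so the demonstration terminates. Nothing above says that the plugin itself applies a timeout.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public class StrictModeBlockingSketch {
    // In-process stand-in for the cluster-wide channelOperation lock (assumption).
    static final ReentrantLock channelOperation = new ReentrantLock();

    // Another "server" holds the lock for holdMillis while the joining server waits
    // up to waitMillis. Returns true if the joining server obtained the lock in time.
    static boolean joinWhileOperationRuns(long holdMillis, long waitMillis) {
        CountDownLatch held = new CountDownLatch(1);
        Thread operation = new Thread(() -> {
            channelOperation.lock();
            try {
                held.countDown();        // signal that the operation now holds the lock
                Thread.sleep(holdMillis); // simulate a long-running channel operation
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                channelOperation.unlock();
            }
        });
        operation.start();
        try {
            held.await(); // make sure the operation really started first
            boolean acquired = channelOperation.tryLock(waitMillis, TimeUnit.MILLISECONDS);
            if (acquired) {
                channelOperation.unlock();
            }
            operation.join();
            return acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        // Long operation, short wait: the joining server stays blocked.
        System.out.println("joined: " + joinWhileOperationRuns(500, 50));
        // Short operation, generous wait: the joining server gets through.
        System.out.println("joined: " + joinWhileOperationRuns(50, 2000));
    }
}
```

Swapping the roles, so that the joining server is the one holding the lock during a slow initial deploy, gives the second caveat: every other server's channel operation waits in the same way.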