Recover - EC and SC

When the CMS receives an exit code from the monitor script that indicates failure on an EC or SC, the following measures apply:

  1. Stop the pico instance by calling the offline script. All workflows are brought down immediately, which interrupts any batches currently being processed.

    Example

    $ ./offline <$JAVA_HOME> <$MZ_HOME> mzadmin ec1
    Shutting down ec1...done.
  2. The pico instance should now be down. To make sure that it is completely down, call the clean script.

    Example

    $ ./clean <$JAVA_HOME> <$MZ_HOME> mzadmin ec1
  3. Start the pico instance in an alternative container.

    Example

    $ ./online <$JAVA_HOME> <$MZ_HOME> mzadmin ec1
    Starting ec1...done.
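
The three steps above could be combined into a single wrapper. The following is only a sketch under stated assumptions: the function name recover_pico and its script_dir parameter are hypothetical, the offline, clean and online scripts are assumed to return a non-zero exit code on failure, and tolerating a failed offline call is a choice made for this sketch, not documented behavior.

```shell
#!/bin/sh
# Hypothetical wrapper that runs the three recovery steps in order.
# recover_pico and script_dir are illustrative names, not part of the product.

recover_pico() {
    script_dir=$1   # directory holding the offline, clean and online scripts
    java_home=$2
    mz_home=$3
    user=$4         # for example mzadmin
    pico=$5         # for example ec1

    # Step 1: stop the pico instance. A failure is tolerated here (assumption),
    # because the instance may already be down; the clean step runs either way.
    "$script_dir/offline" "$java_home" "$mz_home" "$user" "$pico" \
        || echo "offline reported a failure for $pico, continuing" >&2

    # Step 2: make sure the instance is completely down.
    # Assumes clean returns non-zero if it could not finish.
    "$script_dir/clean" "$java_home" "$mz_home" "$user" "$pico" || return 1

    # Step 3: start the pico instance again.
    "$script_dir/online" "$java_home" "$mz_home" "$user" "$pico"
}
```

With the scripts in the current directory, the function might be called as recover_pico . "$JAVA_HOME" "$MZ_HOME" mzadmin ec1. Its exit code is that of the online script, unless the clean step fails first.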



Note!

When an EC has recovered, workflows that are scheduled to be activated periodically are brought up automatically the next time they are due to start. The same applies to workflows scheduled by triggers. However, if chains of events are configured, the counting restarts after recovery, since triggered events are kept in memory only.

Unscheduled workflows are not started automatically. To have them restarted, modify the online script so that it logs in to $MZ_HOME/bin/mzsh and starts them.

A service that is running in an SC may fail over to another SC. If there are no redundant SCs, or if failover is not supported by the service, it will be restarted automatically when the SC recovers.