HA Monitor Server

The embedded HA Monitor Server in each pico instance is responsible for providing information about its availability.

To configure the HA Monitor Server, set the system properties described in the section High Availability. These properties control, for example, the listening port (HA port), the interface to bind to, and various threshold levels. To use the HA Monitor Server, you must set the property mz.ha.enabled to true in each pico configuration.

$ mzsh topo set topo://container:<container>/pico:<pico>/val:config.properties.mz.ha.enabled true

Example - Enabling HA Monitor Server

$ mzsh topo set topo://container:main1/pico:platform/val:config.properties.mz.ha.enabled true
$ mzsh topo set topo://container:main1/pico:ec1/val:config.properties.mz.ha.enabled true
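When several pico instances in a container need the property, the same command can be generated in a loop. The following is a minimal sketch and not part of the product: the helper name ha_enable_cmds is hypothetical, and it only prints the mzsh topo set commands so that they can be reviewed before they are run.

```shell
#!/bin/sh
# Hypothetical helper: print one "mzsh topo set" command per pico instance.
# Usage: ha_enable_cmds <container> <pico> [<pico> ...]
ha_enable_cmds() {
    container=$1
    shift
    for pico in "$@"; do
        # Same topo path as in the example above, with mz.ha.enabled set to true.
        echo "mzsh topo set topo://container:${container}/pico:${pico}/val:config.properties.mz.ha.enabled true"
    done
}
```

For example, ha_enable_cmds main1 platform ec1 prints the two commands from the example above, without the leading prompt.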

The HA Monitor Server is accessed via ha.jar, which is referenced by the HA script named monitor.

Additional scripts are available to start and stop pico instances based on the result of the monitor script.

The scripts take the following values as arguments. These values must be known by the CMS or any other entity that calls the HA scripts:

  • JAVA_HOME on the monitored hosts
  • MZ_HOME of the containers
  • HA ports of the pico instances


The HA script interface with the HA Monitor Server


The monitor script interfaces with the HA Monitor Server of a pico instance. By default, the script uses the following exit codes:

  • 100 - ha.jar returned a non-zero value, indicating a failed instance
  • 110 - ha.jar returned 0 (zero), indicating a healthy instance
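Because neither default code is 0, a caller that treats every non-zero exit status as an error would misread a healthy instance. The following minimal sketch shows how a caller might map the codes explicitly. The helper name monitor_status is hypothetical, and reporting "unknown" for any other code is an assumption of this sketch.

```shell
#!/bin/sh
# Hypothetical helper: translate the default monitor exit codes into a status word.
# Usage: monitor_status <exit_code>
monitor_status() {
    case "$1" in
        110) echo "healthy" ;;   # ha.jar returned 0
        100) echo "failed" ;;    # ha.jar returned a non-zero value
        *)   echo "unknown" ;;   # assumption: e.g. the script itself could not be run
    esac
}
```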

Extracting the HA Scripts

The script files are stored in the archive ha.tar in MZ_HOME/ha. Run the following command to extract the files:

$ tar -xvf $MZ_HOME/ha/ha.tar

Using the HA Scripts

The communication between the HA Monitor Server and the CMS works as follows:

  1. The CMS executes the monitor script. 

    $ ./monitor <$JAVA_HOME> <$MZ_HOME> <username> <mz.ha.port> [host] [sec_timeout] [debug]

    The username argument is included for use in modified monitor scripts. 

    By default, the host argument is omitted, which causes the HA Client to connect to localhost.

    Use the optional argument sec_timeout to specify a timeout threshold for the response from the HA Monitor Server. For additional debug information on stdout, add the argument debug.


  2. The monitor script calls the HA Client to poll the HA Monitor Server for information about the pico instance.

    ...
    # Execute the ping command
    $java_home/bin/java -cp $mz_home/lib/ha.jar com.digitalroute.ha.HAClient $port $@ 2>&1
    ...
  3. The monitor script uses the status from ha.jar to return the appropriate exit code to the CMS.

    ...
    # Collect the result
    result=$?

    # If the result differs from "0" then the command failed.
    if [ "$result" != "0" ] ; then
        echo "FAILED"
        exit 100
    else
        echo "OK"
        exit 110
    fi
    ...
  4. The CMS acts upon this information. In case of failure (indicated by the exit code), the offline script followed by the clean script should be executed.

    $ ./offline <$JAVA_HOME> <$MZ_HOME> mzadmin <pico> ...
    $ ./clean <$JAVA_HOME> <$MZ_HOME> mzadmin <pico> ...

    The offline script is used to gracefully stop the pico instances. It must be executed when the Platform availability evaluation fails.

    The clean script is used to terminate the pico instances immediately and should always be used if the offline script fails. It will also attempt to remove any terminated SCs from Akka clusters by calling the mzsh command akka down.


  5. The CMS executes the online script to start a new pico instance, e.g. in a different container.

    $ ./online <$JAVA_HOME> <$MZ_HOME> mzadmin <pico> ...
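The steps above can be sketched as a single CMS-side check for one pico instance. This is only an illustration under stated assumptions, not one of the shipped scripts: the function name ha_check_and_recover and the variable HA_SCRIPT_DIR are hypothetical, any exit code other than 110 is treated as a failure, the clean script is run after the offline script whether or not the offline script succeeds, and any further arguments that the offline, clean, and online scripts accept are left out.

```shell
#!/bin/sh
# Sketch of a CMS-side check for one pico instance; not one of the shipped scripts.
# HA_SCRIPT_DIR is a hypothetical variable pointing at the extracted HA scripts.
HA_SCRIPT_DIR=${HA_SCRIPT_DIR:-.}

# Usage: ha_check_and_recover <java_home> <mz_home> <username> <ha_port> <pico>
ha_check_and_recover() {
    java_home=$1; mz_home=$2; user=$3; port=$4; pico=$5

    # Steps 1-3: run the monitor script and capture its exit code.
    status=0
    "$HA_SCRIPT_DIR/monitor" "$java_home" "$mz_home" "$user" "$port" >/dev/null 2>&1 || status=$?

    # 110 is the default "healthy" code; anything else is treated as a
    # failure here, which is an assumption of this sketch.
    if [ "$status" -eq 110 ]; then
        echo "healthy"
        return 0
    fi

    # Step 4: stop gracefully, then clean up even if the graceful stop failed.
    "$HA_SCRIPT_DIR/offline" "$java_home" "$mz_home" "$user" "$pico" \
        || echo "offline failed, continuing with clean" >&2
    "$HA_SCRIPT_DIR/clean" "$java_home" "$mz_home" "$user" "$pico" \
        || echo "clean failed" >&2

    # Step 5: start the pico instance again. The same MZ_HOME is reused here;
    # starting in a different container would use that container's MZ_HOME.
    "$HA_SCRIPT_DIR/online" "$java_home" "$mz_home" "$user" "$pico"
}
```

The monitor call uses the `|| status=$?` form so that the sketch also works in callers that run with `set -e`, where a non-zero exit code would otherwise abort the calling script.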

Note!

The provided scripts should be seen as templates that must be modified for different types of CMS products.

Consider the following when adjusting the scripts:

  • Measure the response time from the HA Monitor Servers for different load conditions to be able to distinguish between unavailable and heavily loaded pico instances.
  • When starting a failed EC, scheduled workflows will be executed according to the schedule.
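One way to collect the response times mentioned above is to time the monitor call itself under different load conditions. The following is a minimal sketch: the helper name elapsed_seconds is hypothetical, it relies on date +%s (available in GNU and BSD date, but not required by POSIX), and it only has whole-second resolution.

```shell
#!/bin/sh
# Hypothetical helper: run a command and print how many whole seconds it took.
# Usage: elapsed_seconds <command> [args ...]
elapsed_seconds() {
    start=$(date +%s)
    # The exit code is ignored on purpose; only the duration matters here.
    "$@" >/dev/null 2>&1 || true
    end=$(date +%s)
    echo $((end - start))
}
```

Calling it as elapsed_seconds ./monitor "$JAVA_HOME" "$MZ_HOME" mzadmin <mz.ha.port> at different load levels gives a rough basis for choosing a sec_timeout value.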

The HA Monitor Server Interface

You can communicate with an HA Monitor Server via telnet. The following commands are supported:

Command            Description
help               Lists the available commands
ping <pico_name>   Checks the pico client(s) for availability.
quit               Closes the connection
exit               Closes the connection
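For scripted checks, the commands can be prepared as text and sent over the connection instead of being typed interactively. The following minimal sketch only builds the input for one session; the helper name ha_session_input is hypothetical, and the format of the server's reply is not described here, so the sketch does not attempt to parse it.

```shell
#!/bin/sh
# Hypothetical helper: build the command sequence for one availability check.
# Usage: ha_session_input <pico_name>
ha_session_input() {
    # Send a ping for the given pico, then close the connection.
    printf 'ping %s\nquit\n' "$1"
}
```

Assuming a tool such as nc is installed and the server accepts newline-terminated commands, the output could be piped to the HA port, for example ha_session_input ec1 | nc <host> <mz.ha.port>.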