ECD Examples (4.2)

This page contains a collection of examples of how YAML files for different kinds of ECDs may look:

ECD for Batch Scheduling

For batch use cases, you typically create an ECD without any workflows and then configure batch scheduling using the Workflow Group scheduling rules. An EC group with the same name as the ECD is created automatically, which makes it straightforward to start configuring scheduling rules. Since the ECD is inherently scalable, no additional configuration is needed to distribute batch scheduling over more machines; scaling out the ECD with the built-in autoscaler is enough.
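To make "scaling out with the autoscaler" concrete, the sketch below computes a replica count from a metric threshold. It assumes, for illustration only, that the ECD autoscaler follows the standard Kubernetes HorizontalPodAutoscaler rule, desired = ceil(current × currentMetric / targetMetric), clamped to the autoscaler's min and max settings. The function and parameter names are hypothetical, and the HPA tolerance band and stabilization windows are left out.

```python
import math


def desired_replicas(current: int, current_metric: float, target_metric: float,
                     min_replicas: int, max_replicas: int) -> int:
    """HPA-style replica calculation (assumed, not taken from Usage Engine).

    Ignores the HPA tolerance band and stabilization windows for brevity.
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    # Scale in proportion to how far the observed metric is from the target.
    proposed = math.ceil(current * current_metric / target_metric)
    # Clamp to the autoscaler's min/max settings.
    return max(min_replicas, min(max_replicas, proposed))


# Two ECs averaging 90% CPU against a 60% target: ceil(2 * 90 / 60) = 3 ECs.
print(desired_replicas(2, 90, 60, min_replicas=1, max_replicas=5))  # 3
```

The clamp is why raising or lowering the min and max settings is also a way to scale manually: with min equal to max, the metric no longer matters.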

ECDs for batch scheduling can be used together with the dynamic workflow REST API to manage the workflows executing on the ECD at runtime. For instance, workflows that collect from new sources can be added using the /ops/api/v1/workflows/{templateName}/{workflowName} API, with the relevant workflow parameters defining the collection point. When enough workflows have been added to over-utilise the CPU or memory of a single EC, the ECD can be scaled out, either automatically using autoscaling with a metric threshold, or manually by changing the min and max settings of the ECD autoscaler.
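A minimal sketch of calling the dynamic workflow API from a script is shown below. Only the path comes from this page; the HTTP method (POST), the JSON body layout, the base URL and the template, workflow and parameter names are assumptions for illustration, so verify them against the REST API reference. The sketch builds the request without sending it, so it runs without a live system.

```python
import json
import urllib.parse
import urllib.request


def build_add_workflow_request(base_url: str, template_name: str,
                               workflow_name: str, parameters: dict,
                               method: str = "POST") -> urllib.request.Request:
    """Build (but do not send) a request that adds a dynamic workflow.

    ASSUMPTIONS: the method, JSON body and absence of auth headers are
    illustrative; only the path comes from the documentation.
    """
    path = "/ops/api/v1/workflows/{}/{}".format(
        urllib.parse.quote(template_name, safe=""),  # escape '/', spaces, etc.
        urllib.parse.quote(workflow_name, safe=""),
    )
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=json.dumps(parameters).encode("utf-8"),
        method=method,
        headers={"Content-Type": "application/json"},
    )


# Hypothetical host, template, workflow and parameter names.
req = build_add_workflow_request(
    "https://platform.example.com",
    "batch_template",
    "sftp_source_42",
    {"host": "sftp.example.com", "directory": "/in"},
)
print(req.get_method(), req.full_url)
# Send with urllib.request.urlopen(req) once method, body and auth are confirmed.
```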

Example ECD: https://github.com/digitalroute/mz-example-workflows/blob/master/batchsftpcollect/ecd/batchsftpcollect.yaml

Example workflow export: https://github.com/digitalroute/mz-example-workflows/tree/master/batchsftpcollect/export

ECD for Realtime

For realtime use cases, the lifecycle of the workflow is tied to an EC rather than controlled by a scheduler as in the batch scenario. This means workflows are defined as part of the ECD rather than through a separate API. 

Realtime collection often involves exposing an IP port from the workflow, frequently fronted by an external load balancer or API gateway. The ECD concept supports defining the load balancer as part of the ECD resource.

Single connection TCP based collection workflows

First, let's consider the case where a realtime collection workflow exposes a raw TCP or UDP port without a load balancer in front of it. In this scenario, there is a one-to-one mapping between the client connection and the workflow, so scaling the ECD is not applicable. Instead, multiple connections are handled by creating additional ECDs, each configured with a different IP port.
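The "one ECD per connection, one port per ECD" rule can be sketched as a small planning helper. The ECD naming scheme, client names and starting port below are hypothetical; the point is only that every client gets its own ECD and no two ECDs share a port.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class EcdPlan:
    ecd_name: str  # hypothetical naming scheme
    port: int


def plan_single_connection_ecds(clients: list[str], base_name: str,
                                first_port: int) -> dict[str, EcdPlan]:
    """Give every client connection its own ECD listening on its own port."""
    if len(set(clients)) != len(clients):
        raise ValueError("each client must appear only once")
    if first_port + len(clients) - 1 > 65535:
        raise ValueError("not enough ports in range")
    return {
        client: EcdPlan(f"{base_name}-{i + 1}", first_port + i)
        for i, client in enumerate(clients)
    }


plans = plan_single_connection_ecds(["client-a", "client-b"], "tcpcollect", 3790)
for client, plan in plans.items():
    print(client, plan.ecd_name, plan.port)
```

Each resulting entry would correspond to one ECD YAML file like the single-connection example linked below, differing only in name and port.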

Example ECD: https://github.com/digitalroute/mz-example-workflows/blob/master/tcpcollect/ecd/tcpcollect_single.yaml

Example workflow export: https://github.com/digitalroute/mz-example-workflows/tree/master/tcpcollect/export

Scalable TCP based collection workflows

Next, let's consider a case where a TCP load balancer is used to distribute load across a number of backend workflows. An external load balancer, together with a Kubernetes Service resource, distributes the traffic across the workflows. All workflows can expose the same IP port, since it is a cluster-internal port, and Kubernetes networking takes care of routing the traffic to the correct EC. TCP-based load balancing is classified as OSI Layer 4 (transport layer) load balancing.
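The key property of Layer 4 balancing is that the backend is chosen once per TCP connection, not per message. The toy balancer below illustrates this with round robin, which is used purely for illustration; kube-proxy and cloud load balancers use their own selection algorithms. Backend names are hypothetical EC pod names.

```python
from __future__ import annotations

import itertools
from collections.abc import Hashable


class L4RoundRobin:
    """Toy Layer 4 balancer: chooses a backend once per TCP connection."""

    def __init__(self, backends: list[str]) -> None:
        if not backends:
            raise ValueError("at least one backend is required")
        self._next_backend = itertools.cycle(list(backends))
        self._by_connection: dict[Hashable, str] = {}

    def backend_for(self, connection: Hashable) -> str:
        # A connection is identified here by (client IP, client port); every
        # segment of an established connection goes to the same backend.
        if connection not in self._by_connection:
            self._by_connection[connection] = next(self._next_backend)
        return self._by_connection[connection]


lb = L4RoundRobin(["ec-0", "ec-1"])
print(lb.backend_for(("10.0.0.5", 40001)))  # ec-0
print(lb.backend_for(("10.0.0.6", 40002)))  # ec-1
print(lb.backend_for(("10.0.0.5", 40001)))  # ec-0 again: same connection
```

This also explains why scaling helps only with many client connections: a single long-lived connection always lands on one EC.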

Example ECD: https://github.com/digitalroute/mz-example-workflows/blob/master/tcpcollect/ecd/tcpcollect_scaled.yaml

Example workflow export: https://github.com/digitalroute/mz-example-workflows/tree/master/tcpcollect/export

Single connection peer-to-peer UDP based workflows

Certain low-level protocols, especially in the telecom domain, require the IP address of the sender to be known to the receiver. In such scenarios, it can be necessary to map connections from the client directly to a physical node, to avoid the traffic being proxied between cluster nodes. The ECD then needs to be tied to the physical node using a nodeSelector. This is not a scalable setup, but it can be used to support peer-to-peer protocols that rely on source address verification.
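Why proxying breaks these protocols can be seen in a simplified receiver that accepts a datagram only if its source IP belongs to a known peer, similar in spirit to how a RADIUS server selects a client's shared secret by the packet's source address. If a cluster node forwarded the traffic with its own address as the source, the check would see the node IP and reject the packet. The function name and peer set are hypothetical.

```python
from __future__ import annotations

import socket


def receive_verified(sock: socket.socket, known_peers: set[str],
                     bufsize: int = 4096) -> tuple[bytes, str] | None:
    """Receive one datagram; accept it only if its source IP is a known peer."""
    data, address = sock.recvfrom(bufsize)
    source_ip = address[0]
    if source_ip not in known_peers:
        # A proxied datagram would carry a node IP here and be rejected.
        return None
    return data, source_ip


with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as server, \
        socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as client:
    server.bind(("127.0.0.1", 0))  # ephemeral port on loopback
    server.settimeout(2.0)
    client.sendto(b"request", server.getsockname())
    print(receive_verified(server, {"127.0.0.1"}))
```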

Example ECD: https://github.com/digitalroute/mz-example-workflows/blob/master/radiuscollect/ecd/radiuscollect.yaml

Example workflow export: 

ECD for Scalable Realtime Processing

For a processing workflow, scaling can be very useful to dynamically distribute the load between worker nodes and adapt to traffic changes. A processing workflow must be connected to the collection node to receive payload data, and there are different tools to achieve this: standard protocols such as HTTP, or the proprietary Workflow Bridge or Inter Workflow features. If HTTP is used, the setup is very similar to the HTTP collection example, with the difference that the ports do not have to be exposed outside the cluster. HTTP has the advantage that standard Kubernetes tools for traffic management and similar tasks can be used. For instance, Istio can provide very powerful traffic shaping capabilities for use together with processing workflows.

Workflow Bridge, on the other hand, has the advantage of providing very high throughput and integrating efficiently with Usage Engine's concepts of UDRs, error routing, etc. Through its Dynamic Load Balancing capability, Workflow Bridge integrates well with the dynamic scaling capabilities of ECDs.

Inter Workflow does not provide the same raw throughput as Workflow Bridge, and it adds complexity by introducing a disk dependency. However, it offers Batch Collection, which provides transaction safety, an important capability in some distributed processing scenarios.
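This page does not describe how Workflow Bridge's Dynamic Load Balancing picks a destination, so the sketch below is a generic stand-in, not its implementation: a dispatcher that sends each record to the worker with the fewest unacknowledged records, while workers register and deregister at runtime as ECs do when the ECD scales out or in. All class, method and worker names are hypothetical.

```python
from __future__ import annotations


class DynamicDispatcher:
    """Generic least-outstanding-records dispatcher over a changing worker set."""

    def __init__(self) -> None:
        self._in_flight: dict[str, int] = {}

    def register(self, worker: str) -> None:
        # Called when a new processing EC appears (scale out).
        self._in_flight.setdefault(worker, 0)

    def deregister(self, worker: str) -> None:
        # Called when an EC goes away (scale in).
        self._in_flight.pop(worker, None)

    def dispatch(self) -> str:
        if not self._in_flight:
            raise RuntimeError("no processing workers registered")
        # Fewest unacknowledged records wins; ties go to the earliest
        # registered worker because dicts keep insertion order.
        worker = min(self._in_flight, key=self._in_flight.__getitem__)
        self._in_flight[worker] += 1
        return worker

    def ack(self, worker: str) -> None:
        if self._in_flight.get(worker, 0) > 0:
            self._in_flight[worker] -= 1


dispatcher = DynamicDispatcher()
dispatcher.register("ec-0")
print([dispatcher.dispatch() for _ in range(2)])  # ['ec-0', 'ec-0']
dispatcher.register("ec-1")  # the ECD scaled out
print([dispatcher.dispatch() for _ in range(3)])  # ['ec-1', 'ec-1', 'ec-0']
```

The design point is that a newly added EC starts receiving work immediately, without any reconfiguration of the sending side.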

Example ECD: https://github.com/digitalroute/mz-example-workflows/blob/master/wfbstream/ecd/wfbstream.yaml

Example workflow export: 

ECD for external HTTP interface

You can also configure an ECD to expose an HTTP interface externally using a DNS name and a path. This requires no changes to the HTTP server workflow and only minor changes to the ECD. You need to change the networking configuration to use an Ingress resource. A DNS-resolvable 'host' must also be assigned (in a public cloud setup, this is typically set up during installation), as well as a path. Together, the DNS name and the path form the URL on which the interface is exposed. Finally, to expose an interface publicly in a secure manner, you should use encryption, which requires a certificate. The certificate is stored in a Kubernetes secret. In the example below, the system certificate 'mz-cert' is used.

Example ECD: https://github.com/digitalroute/mz-example-workflows/blob/master/petstore/ecd/petstore.yaml

Example workflow export: 
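The relationship between the Ingress host, the path and the resulting URL can be sketched as follows. The host and path values are hypothetical, and the prefix check assumes the Ingress uses the Kubernetes pathType Prefix, which matches on '/'-separated path elements; whether an ECD's Ingress uses that path type is an assumption to verify in the generated resource.

```python
from urllib.parse import urlsplit, urlunsplit


def ingress_url(host: str, path: str, tls: bool = True) -> str:
    """Compose the external URL from the Ingress host and path."""
    if not path.startswith("/"):
        path = "/" + path
    # With a TLS certificate secret configured, clients use https.
    return urlunsplit(("https" if tls else "http", host, path, "", ""))


def prefix_matches(ingress_path: str, request_url: str) -> bool:
    """Kubernetes Ingress pathType Prefix: compare '/'-separated elements."""
    wanted = [part for part in ingress_path.split("/") if part]
    actual = [part for part in urlsplit(request_url).path.split("/") if part]
    return actual[:len(wanted)] == wanted


url = ingress_url("petstore.example.com", "/petstore")
print(url)  # https://petstore.example.com/petstore
print(prefix_matches("/petstore", url + "/pets/1"))  # True
print(prefix_matches("/petstore", "https://petstore.example.com/petstorefront"))  # False
```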

ECD for Batch Workflow Groups

In order to support running Batch workflows through an ECD specification, it is possible to add workflow groups in the ECD YAML file.

Disk Collection workflows added in a Workflow Group

This example shows how to add a Workflow Group to an ECD YAML file. The group has two members: one dynamic workflow inside a Workflow Package and one static workflow. The dynamic workflow is also created by the ECD. The workflow group has a simple schedule that runs it every minute, every day.
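As a quick sanity check of what "every minute, every day" means in practice, the sketch below lists upcoming start times. It assumes runs are triggered at the start of each minute; how the Workflow Group handles a run that is still active when the next one is due is governed by its own configuration and is not modelled here. The function name is hypothetical.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def next_runs(after: datetime, count: int) -> list[datetime]:
    """Next `count` start times of an 'every minute, every day' schedule.

    Assumes runs are triggered at second 0 of each minute.
    """
    first = after.replace(second=0, microsecond=0) + timedelta(minutes=1)
    return [first + timedelta(minutes=i) for i in range(count)]


for run in next_runs(datetime(2024, 1, 1, 23, 59, 30), 3):
    print(run.isoformat())
```

Because the schedule applies on all days, the run times simply continue across midnight.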

Example ECD: https://github.com/digitalroute/mz-example-workflows/blob/master/diskcollect/ecd/diskCollectEcd.yaml

Example workflow export: 

Note!

For certain workflows, such as Diameter workflows, you need to set up a hostname in the ECD. This can be done by adding hostAliases to the ECD. It looks like this:
hostAliases:
  - ip: "127.0.0.1"
    hostnames:
      - "server"

Here, hostAliases is a list of objects, each containing one IP address and a list of hostname strings.

This configuration will create entries in the /etc/hosts file (with IP and hostname mappings) in all the ECD pods.
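The mapping from hostAliases entries to /etc/hosts lines can be sketched as below. In Kubernetes, the kubelet writes these entries into each pod's /etc/hosts; the sketch approximates the tab-separated "IP, then hostnames" layout it uses, and the function name is hypothetical rather than part of Usage Engine.

```python
from __future__ import annotations


def render_host_aliases(host_aliases: list[dict]) -> list[str]:
    """Turn hostAliases entries into /etc/hosts-style lines.

    Each entry mirrors the hostAliases example above: one "ip" and a list
    of "hostnames".
    """
    lines = []
    for entry in host_aliases:
        hostnames = entry.get("hostnames", [])
        if hostnames:  # an IP without names adds nothing useful
            lines.append("\t".join([entry["ip"], *hostnames]))
    return lines


aliases = [{"ip": "127.0.0.1", "hostnames": ["server"]}]
print("\n".join(render_host_aliases(aliases)))
```

With this entry in place, a Diameter peer configured with the hostname "server" resolves to 127.0.0.1 inside every ECD pod.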