...

Usage Engine Private Edition now supports horizontal scaling of batch workflows, increasing or decreasing processing capacity as needed without manual intervention. As a general concept, batch scaling speeds up processing by splitting the workload between multiple ‘workers’ so that they complete tasks in parallel rather than sequentially. Usage Engine’s solution consists of two new agents, the Scalable Inter Workflow Forwarding agent and the Scalable Inter Workflow Collection agent (Scalable InterWF), and two new profiles, the Partition Profile and the Scalable Inter Workflow Profile. The feature also uses the existing Data Aggregator and Deduplication agents, which have been updated to support a Kafka storage type. Kafka must be configured for all storage within your scalable batch solution.

How it works

Scalable workflows (WFs) operate by splitting batch data into partitions so that multiple WFs can cooperate to process a batch. Each scaled WF is assigned one or more partitions and processes all the data assigned to it. When WFs are started or stopped, a rebalance is performed in which partitions are reassigned to the new set of WFs.
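The rebalance described above can be pictured with a minimal sketch. It assumes a simple round-robin assignment, which is an illustrative choice and not necessarily the strategy Usage Engine or Kafka applies; the function name and workflow IDs are hypothetical.

```python
def assign_partitions(num_partitions, workflow_ids):
    """Spread partition numbers round-robin over the running workflows.

    Illustrative only: the real assignment strategy is an assumption here.
    """
    if not workflow_ids:
        return {}
    assignment = {wf: [] for wf in workflow_ids}
    for partition in range(num_partitions):
        owner = workflow_ids[partition % len(workflow_ids)]
        assignment[owner].append(partition)
    return assignment


# Two workflows share six partitions.
before = assign_partitions(6, ["wf-1", "wf-2"])

# A third workflow starts, so the partitions are reassigned (rebalanced).
after = assign_partitions(6, ["wf-1", "wf-2", "wf-3"])
```

In this sketch every partition always has exactly one owner, and starting a third workflow reduces each workflow's share from three partitions to two.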

This example shows a batch processing setup that collects files and performs duplicate checks and aggregation. The goal is to make this solution scalable so that processing times improve during periods of high usage. The batch scaling solution uses two workflows.

...

  1. In the File collection workflow, the Scalable InterWF Forwarding agent sends data to the partitions. It uses one or more unique ID Fields (e.g. customer ID) to determine which partition a UDR belongs to.

  2. The maximum number of partitions created is determined by the Max Scale Factor parameter in the Partition Profile.
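As a rough illustration of the two points above, the sketch below maps a UDR to a partition by hashing its ID fields and taking the result modulo the Max Scale Factor. The hashing scheme (CRC32), the function name, and the field names are assumptions made for this example; the source does not state how the Forwarding agent computes the partition.

```python
import zlib


def partition_for(udr, id_fields, max_scale_factor):
    """Return a partition number in the range 0..max_scale_factor-1.

    CRC32 is used because it is stable across runs; the agent's real
    algorithm is not documented here, so treat this as a stand-in.
    """
    key = "|".join(str(udr[field]) for field in id_fields)
    return zlib.crc32(key.encode("utf-8")) % max_scale_factor


# Hypothetical UDRs: two from the same customer, one from another.
udr_a = {"customer_id": "C-1001", "usage": 10}
udr_b = {"customer_id": "C-1001", "usage": 25}
udr_c = {"customer_id": "C-2002", "usage": 7}

p_a = partition_for(udr_a, ["customer_id"], max_scale_factor=4)
p_b = partition_for(udr_b, ["customer_id"], max_scale_factor=4)
p_c = partition_for(udr_c, ["customer_id"], max_scale_factor=4)
```

UDRs that share the same ID field values always land in the same partition, so all the data for one customer is handled by whichever workflow owns that partition.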

Note!

The number of partitions is the same across all topics. Data is stored at several points, for example:

  • With the passing of UDRs between workflows.

  • When duplicate UDR keys are detected.

  • For aggregated sessions.

  1. In the Processing workflow, the Duplicate UDR agent checks for duplicates across all partitions. Checked UDRs are placed in an additional topic with the same partitions as the corresponding collection workflow topic. Any duplicate keys are saved in a separate topic.

  2. Also in the Processing workflow, the Aggregation agent collects data from an inter-workflow topic and uses a separate aggregation session storage topic.
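The two Processing workflow steps can be sketched for the share of data that one workflow owns. This is a simplified model under stated assumptions: an in-memory set and dictionary stand in for the duplicate-key and aggregation session topics, and the field names (udr_id, customer_id, usage) are hypothetical.

```python
from collections import defaultdict


def process_partition(udrs, seen_keys, sessions):
    """Drop UDRs whose key was already seen, then aggregate the rest.

    seen_keys and sessions model the persistent, partitioned storage;
    in the real solution that storage is Kafka, not process memory.
    """
    for udr in udrs:
        if udr["udr_id"] in seen_keys:
            continue  # duplicate: skip it
        seen_keys.add(udr["udr_id"])
        sessions[udr["customer_id"]] += udr["usage"]
    return sessions


seen_keys = set()
sessions = defaultdict(int)

batch = [
    {"udr_id": "u1", "customer_id": "C-1001", "usage": 10},
    {"udr_id": "u2", "customer_id": "C-1001", "usage": 25},
    {"udr_id": "u1", "customer_id": "C-1001", "usage": 10},  # duplicate of u1
]
totals = process_partition(batch, seen_keys, sessions)
```

Because the state is passed in rather than held globally, a later batch for the same partition can reuse seen_keys and sessions, which mirrors the idea of storage that persists between batches.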

Prerequisites for Kafka/batch scaling?

...

  1. is the workflow that scales, that is, you can run from one WF up to the Max Scale Factor number of WFs, which cooperate to do the processing. In this example, records go through DUP UDR and are aggregated. Persistent storage for DUP UDR and aggregation is also partitioned.

Subsections

This section contains the following subsections:

...