...
Overview
Usage Engine Private Edition now supports batch scaling, which lets you increase or decrease processing capacity as needed without manual intervention. Batch scaling enables parallelism within batch workflows by splitting a job so that it completes faster. The solution consists of two new agents, the scalable InterWF forwarder and collector, and a new profile, the partition profile. It also makes use of the existing agents, Data Aggregator and Deduplication. The solution uses Kafka both for transaction storage and as working storage for aggregation and duplication data.
Note!
You cannot mix standard agents with scaling agents in the same workflow. Workflows with standard agents save their state in PE. Workflows with scaling agents save their state in Kafka.
Prerequisites for Kafka?
Are there any prerequisites required to be able to configure automatic batch scaling?…
How it works
Assume that you have a batch use case where you collect files and need to perform duplication checks and aggregation, and you want to be able to scale. This requires two or three workflows. The picture below uses three workflows.
...
The File collection workflow(s) use the ID Fields (for example, a customer ID) to determine which shard/partition a UDR belongs to.
The number of partitions is determined by the Max Scale Factor parameter. The number of partitions is the same for all the storages needed:
Passing of UDRs between workflows.
Duplicate UDR keys.
Aggregation sessions.
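The mapping from ID fields to a partition can be illustrated with a minimal sketch. This is not the product's actual partitioning algorithm, which this page does not describe. The field name `customer_id`, the value of `MAX_SCALE_FACTOR`, and the use of CRC32 as the hash are all assumptions made for illustration.

```python
import zlib
from typing import Sequence

MAX_SCALE_FACTOR = 4  # hypothetical value; stands in for the Max Scale Factor parameter


def partition_for(udr: dict, id_fields: Sequence[str],
                  partitions: int = MAX_SCALE_FACTOR) -> int:
    """Map a UDR to a partition number based on the values of its ID fields."""
    key = "|".join(str(udr[field]) for field in id_fields)
    # crc32 gives the same result in every process, unlike Python's built-in
    # hash() for strings, which is randomized per interpreter run.
    return zlib.crc32(key.encode("utf-8")) % partitions


# Hypothetical UDRs; "customer_id" is an assumed ID field.
udrs = [
    {"customer_id": "A17", "bytes": 120},
    {"customer_id": "B42", "bytes": 300},
    {"customer_id": "A17", "bytes": 80},
]
assignments = [partition_for(u, ["customer_id"]) for u in udrs]
```

In this sketch, UDRs with identical ID field values always receive the same partition number, so the same partition number can be used for the inter-workflow data, the duplicate keys, and the aggregation sessions.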
The Duplication Check workflow(s) check for duplicates across all partitions. Checked UDRs are placed in another topic whose partitions correspond to those of the topic the workflow collected from. (The Duplication Keys are saved in a separate topic that has the same number of partitions and uses the same ID fields.)
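As a rough sketch of the idea, the duplicate check can be modeled with one key store per partition. The in-memory dictionary below is only a stand-in for the separate key topic. The key values and the `is_duplicate` helper are hypothetical, and the sketch assumes that two duplicate UDRs share the same ID field values and therefore arrive on the same partition.

```python
from collections import defaultdict

# partition number -> set of duplicate keys already seen.
# Stand-in for the separate key topic; a real deployment stores this in Kafka.
seen_keys = defaultdict(set)


def is_duplicate(partition: int, dup_key: str) -> bool:
    """Return True if dup_key was already seen in this partition; otherwise record it."""
    if dup_key in seen_keys[partition]:
        return True
    seen_keys[partition].add(dup_key)
    return False


# Hypothetical (partition, duplicate key) pairs read from the inter-workflow topic.
incoming = [(0, "rec-1"), (1, "rec-2"), (0, "rec-1")]

# Only UDRs that pass the check are forwarded, on the same partition number.
checked = [(p, k) for p, k in incoming if not is_duplicate(p, k)]
```

Because the key store is partitioned the same way as the UDR topic, each check only needs the keys of its own partition, which is what allows several Duplication Check workflows to run in parallel.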
The Aggregation workflow(s) collect from an inter-workflow topic and work against a separate aggregation session storage topic.
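The relationship between the inter-workflow partitions and the session storage can be sketched in the same way. This is an illustration under assumptions, not the Data Aggregator agent's behavior: the session key `customer_id`, the summed field `bytes`, and the `aggregate` helper are hypothetical, and the dictionary stands in for the aggregation session storage topic.

```python
from collections import defaultdict

# partition number -> {session key -> running total}.
# Stand-in for the aggregation session storage topic.
sessions = defaultdict(dict)


def aggregate(partition: int, udr: dict) -> int:
    """Add the UDR's bytes to its session and return the updated total."""
    partition_sessions = sessions[partition]
    key = udr["customer_id"]  # assumed session key
    partition_sessions[key] = partition_sessions.get(key, 0) + udr["bytes"]
    return partition_sessions[key]


# Hypothetical UDRs collected from partition 2 of the inter-workflow topic.
totals = [
    aggregate(2, {"customer_id": "A17", "bytes": 120}),
    aggregate(2, {"customer_id": "A17", "bytes": 80}),
    aggregate(2, {"customer_id": "B42", "bytes": 300}),
]
```

Each session lives in the partition that its UDRs arrive on, so a workflow instance handling one partition never needs the session data of another.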
Subsections
This section contains the following subsections:
...