...
Overview
...
Usage Engine Private Edition now supports horizontal scaling of batch workflows, increasing or decreasing processing capacity as needed, without configuration changes or manual intervention.
Batch scaling is not easily achieved in 9.2 for the following reasons:
Transactions are stored per workflow and file.
File collection cannot scale using the same workflow configuration (each workflow needs a different regular expression).
Aggregation uses separate storage (file or database). Only one workflow at a time can work on a given Profile and its storage. The only alternative is sharding (additional Profiles).
Duplicate UDR uses separate storage (file or database). Only one workflow at a time can work on a given Profile and its storage. The only alternative is sharding (additional Profiles).
The new solution uses Kafka both for transaction and state storage and as working storage for aggregation and duplicate UDR data.
Info
You cannot mix standard agents with scaling agents in the same workflow. Workflows with standard agents save their state in PE. Workflows with scaling agents save their state in Kafka.
As a general concept, batch scaling speeds up processing by splitting the workload between multiple ‘workers’, enabling them to complete tasks in parallel rather than sequentially. Usage Engine’s solution consists of two new agents, the Scalable Inter Workflow Forwarding agent and the Scalable Inter Workflow Collection agent (together referred to as Scalable InterWF), and two new profiles, the Partition Profile and the Scalable Inter Workflow Profile. The feature also uses the existing Data Aggregator and Duplicate UDR agents, which have been updated to support a Kafka storage type. Kafka must be configured for all storage within your scalable batch solution.
How it works
Scalable workflows operate by splitting batch data into partitions so that multiple workflows can cooperate to process a batch. Each scaled workflow is assigned one or more partitions and processes all the data in those partitions. When workflows are started or stopped, a rebalance is performed in which the partitions are reassigned to the new set of workflows.
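The rebalance described above can be pictured with a minimal sketch. It assumes a simple round-robin assignment, and the function name `assign_partitions` and the workflow names are hypothetical. The actual assignment strategy used by Usage Engine and Kafka is not described here and may differ.

```python
def assign_partitions(num_partitions, workflows):
    """Spread partition numbers round-robin over the running workflows.

    Illustration only: the real assignment strategy is an assumption.
    """
    if not workflows:
        return {}
    assignment = {wf: [] for wf in workflows}
    for partition in range(num_partitions):
        owner = workflows[partition % len(workflows)]
        assignment[owner].append(partition)
    return assignment


# Six partitions shared by two running workflows.
before = assign_partitions(6, ["wf-1", "wf-2"])
# A third workflow starts, so the partitions are reassigned.
after = assign_partitions(6, ["wf-1", "wf-2", "wf-3"])

print(before)  # {'wf-1': [0, 2, 4], 'wf-2': [1, 3, 5]}
print(after)   # {'wf-1': [0, 3], 'wf-2': [1, 4], 'wf-3': [2, 5]}
```

Every partition always has exactly one owner, so adding or removing a workflow changes who processes the data, not how the data is partitioned.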
This example shows a batch processing setup where you collect files and perform duplicate checks and aggregation. The batch scaling solution consists of two workflows: a File collection workflow and a Processing workflow.
...
In the File collection workflow, the Scalable InterWF Forwarding agent sends data to the partitions. It uses one or more unique ID fields (for example, customer ID) to determine which partition a UDR belongs to.
The number of partitions created is set by the Max Scale Factor parameter in the Partition Profile.
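As a rough illustration of how ID fields can map a UDR to a partition, the sketch below hashes the ID field values and takes the result modulo the number of partitions. The function `partition_for`, the field names, and the choice of SHA-256 are assumptions made for the example, not the agent's actual algorithm.

```python
import hashlib


def partition_for(udr, id_fields, max_scale_factor):
    """Map a UDR to a partition number from the values of its ID fields.

    Hypothetical helper: the hashing scheme is an assumption.
    """
    key = "|".join(str(udr[field]) for field in id_fields)
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % max_scale_factor


first = partition_for({"customer_id": "C-1001", "bytes": 10}, ["customer_id"], 4)
second = partition_for({"customer_id": "C-1001", "bytes": 99}, ["customer_id"], 4)

# UDRs with the same ID field values always land in the same partition.
print(first == second)  # True
```

Because the partition depends only on the ID fields, all UDRs for one customer are handled by the same workflow, which is what allows duplicate checks and aggregation to work on partitioned data.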
Note!
The number of partitions is the same across all topics. Data is stored at points such as the following:
When UDRs are passed between workflows.
When duplicate UDR keys are detected.
When sessions are aggregated.
The Processing workflow is the workflow that scales. You can run anywhere from one workflow up to the number set by Max Scale Factor, and these workflows cooperate to do the processing. In this example, records go through a duplicate check and are aggregated. Persistent storage for the Duplicate UDR check and aggregation is also partitioned.
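To illustrate why partitioned storage lets the Processing workflow scale, here is a minimal sketch that keeps separate duplicate-key and aggregation state per partition. In-memory Python structures stand in for the Kafka-backed storage, and the names `PartitionState`, `process`, `record_id`, `customer_id`, and `bytes` are hypothetical.

```python
from collections import defaultdict


class PartitionState:
    """In-memory stand-in for one partition's duplicate and aggregation storage."""

    def __init__(self):
        self.seen_keys = set()            # duplicate UDR keys
        self.sessions = defaultdict(int)  # aggregated bytes per customer


def process(udr, state):
    """Return False for a duplicate; otherwise aggregate the UDR and return True."""
    if udr["record_id"] in state.seen_keys:
        return False
    state.seen_keys.add(udr["record_id"])
    state.sessions[udr["customer_id"]] += udr["bytes"]
    return True


states = defaultdict(PartitionState)  # one state object per partition number

udrs = [
    (0, {"record_id": "r1", "customer_id": "A", "bytes": 100}),
    (0, {"record_id": "r1", "customer_id": "A", "bytes": 100}),  # duplicate
    (1, {"record_id": "r2", "customer_id": "B", "bytes": 50}),
    (0, {"record_id": "r3", "customer_id": "A", "bytes": 25}),
]
results = [process(udr, states[partition]) for partition, udr in udrs]

print(results)                   # [True, False, True, True]
print(dict(states[0].sessions))  # {'A': 125}
print(dict(states[1].sessions))  # {'B': 50}
```

Since each partition's state is touched only by the workflow that currently owns that partition, several workflows can process a batch in parallel without sharing a single storage.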
Subsections
This section contains the following subsections:
...
How it works
...
Configuration
...