Overview

Usage Engine Private Edition now supports horizontal scaling of batch workflows, increasing or decreasing processing capacity as needed without manual intervention. As a general concept, batch scaling is a way to speed up processes by splitting the workload between multiple "workers" or resources, enabling them to complete tasks in parallel rather than sequentially. Usage Engine's solution consists of two new agents, the Scalable Inter Workflow Forwarding agent and the Scalable Inter Workflow Collection agent (Scalable InterWF), and two new profiles, the Partition Profile and the Scalable Inter Workflow Profile. The feature also uses the existing Data Aggregator and Duplicate UDR agents, which have been updated to support a Kafka storage profile.

Prerequisites

Kafka must be configured for all storage within your scalable batch solution.

How it works

Scalable workflows operate by splitting batch data into partitions so that multiple workflows can cooperate to process a batch. Each scaled workflow is assigned one or more partitions and will process all the data assigned to them. When workflows are started or stopped, a rebalance is performed where partitions are reassigned to the new set of workflows.
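The rebalance described above can be pictured with a small sketch. It is an illustration only: the round-robin strategy, the function name `assign_partitions`, and the workflow names are assumptions made for this example, not the product's actual assignment logic.

```python
def assign_partitions(num_partitions, workflows):
    """Spread partition numbers over the running workflows, round-robin.

    Hypothetical helper: the real rebalance strategy is not described here.
    """
    if not workflows:
        return {}
    assignment = {wf: [] for wf in workflows}
    for partition in range(num_partitions):
        owner = workflows[partition % len(workflows)]
        assignment[owner].append(partition)
    return assignment


# Two workflows share six partitions.
before = assign_partitions(6, ["wf1", "wf2"])

# A third workflow starts, so the partitions are reassigned across all three.
after = assign_partitions(6, ["wf1", "wf2", "wf3"])

print(before)
print(after)
```

Every partition always has exactly one owner, so adding or removing a workflow changes who processes a partition but never leaves data unassigned.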

This example shows a batch processing setup where you collect files and perform duplication checks and aggregation. We have set up two workflows in our batch scaling solution.


  1. In the File collection workflow, the Scalable InterWF Forwarding agent sends data to the partitions. It uses one or more unique ID Fields (for example, customer ID) to determine which shard/partition a UDR belongs to.

  2. The number of partitions created is determined using the Max Scale Factor parameter in the Partition Profile.

Note!
The number of partitions will be the same across all topics. The points of storage will occur, for example:

  • With the passing of UDRs between workflows.
  • When duplicate UDR keys are detected.
  • For aggregated sessions.

  3. The Processing workflow is the workflow that scales: you can run anywhere from one workflow up to the number set by Max Scale Factor, and these workflows cooperate to do the processing. In this example, records go through a duplication check and are then aggregated. Persistent storage for the Duplicate UDR check and for aggregation is also partitioned.
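As a rough illustration of the steps above, the sketch below maps a UDR to a partition from its ID Fields. It assumes a stable hash of the ID field values taken modulo the Max Scale Factor. The hashing scheme, the function name `partition_for`, and the field names are hypothetical and chosen only for this example; the agent's real algorithm is not documented here.

```python
import hashlib


def partition_for(udr, id_fields, max_scale_factor):
    """Return a partition number in the range [0, max_scale_factor).

    Assumption: the partition is derived from a stable hash of the
    ID field values. This is an illustrative scheme only.
    """
    key = "|".join(str(udr[field]) for field in id_fields)
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % max_scale_factor


# Hypothetical UDRs, represented as plain dictionaries.
udr_a = {"customer_id": "C-1001", "bytes": 512}
udr_b = {"customer_id": "C-1001", "bytes": 2048}

# UDRs with the same ID field values always land in the same partition,
# so duplicate checks and aggregation for one customer stay together.
same = partition_for(udr_a, ["customer_id"], 8) == partition_for(udr_b, ["customer_id"], 8)
print(same)
```

Keeping the partition count fixed at the Max Scale Factor means a given key maps to the same partition no matter how many workflows are currently running; only the ownership of partitions changes during a rebalance.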

Subsections

This section contains the following subsections:

  • Configuration
  • Batch scaling agents
