...

This example shows a batch processing setup in which you collect files, perform duplication checks, and aggregate the data. The goal is to make the solution scalable so that processing times improve during periods of high usage. The batch scaling solution uses three workflows.

...

  1. The Scalable InterWF forwarder agent in the File collection workflow manages the partitions. It uses an ID Field (e.g. customer ID) to determine which partition a UDR belongs to.

  2. The maximum number of partitions created is determined by the Max Scale Factor parameter in the Partition profile.
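
As a rough illustration of the two points above, the following sketch shows one way an ID field could be mapped to a bounded number of partitions. It is not the product's implementation: the hash function (CRC32), the field name `customer_id`, and the value of `MAX_SCALE_FACTOR` are all assumptions made for this example. The only behaviour taken from this page is that the ID field decides the partition and that Max Scale Factor caps the partition count.

```python
import zlib

# Hypothetical stand-in for the Max Scale Factor parameter in the Partition profile.
MAX_SCALE_FACTOR = 4


def partition_for(id_value: str, max_scale_factor: int = MAX_SCALE_FACTOR) -> int:
    """Map an ID field value to a partition index in [0, max_scale_factor)."""
    # CRC32 is chosen only because it is stable across runs and processes.
    # The partitioning function actually used by the agent is an assumption here.
    return zlib.crc32(id_value.encode("utf-8")) % max_scale_factor


# Hypothetical UDRs, each carrying the ID field used for partitioning.
udrs = [
    {"customer_id": "C-1001", "bytes": 120},
    {"customer_id": "C-1002", "bytes": 300},
    {"customer_id": "C-1001", "bytes": 80},
    {"customer_id": "C-1003", "bytes": 50},
]

# Group UDRs by the partition their ID field maps to.
partitions = {}
for udr in udrs:
    partitions.setdefault(partition_for(udr["customer_id"]), []).append(udr)
```

In this sketch, every UDR with the same `customer_id` lands in the same partition, and no more than `MAX_SCALE_FACTOR` partitions can ever be created.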

...

  1. The Duplication Check workflow will check for duplicates across all partitions. Checked UDRs are placed in an additional topic with the same partitions as the corresponding Collection workflow topic. Any duplicate keys are saved in a separate topic with the same number of partitions and the same ID fields.

  1. The Aggregation workflow will collect data from an inter-workflow topic and use a separate aggregation session storage topic.

...

Your workflow has to be designed in a way that can process batch workflows. For example, there has to be at least one common denominator in the data that links individual records.
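
To make the idea of a common denominator concrete, here is a minimal sketch in which every record carries the same linking field. The field names (`customer_id`, `record_id`, `bytes`) and the helper `aggregate_by_id` are hypothetical and exist only for this illustration. They are not part of the product.

```python
from collections import defaultdict

# Hypothetical name of the field that links individual records.
ID_FIELD = "customer_id"

# Hypothetical records; the last one repeats the first.
records = [
    {"customer_id": "C-1001", "record_id": "r1", "bytes": 120},
    {"customer_id": "C-1002", "record_id": "r2", "bytes": 300},
    {"customer_id": "C-1001", "record_id": "r3", "bytes": 80},
    {"customer_id": "C-1001", "record_id": "r1", "bytes": 120},
]


def aggregate_by_id(records, id_field=ID_FIELD):
    """Drop repeated records, then sum bytes per value of the linking field."""
    seen = set()
    duplicates = []
    totals = defaultdict(int)
    for rec in records:
        if id_field not in rec:
            # Without the common field a record cannot be linked to the others.
            raise ValueError(f"record is missing the linking field {id_field!r}")
        key = (rec[id_field], rec["record_id"])
        if key in seen:
            duplicates.append(rec)
            continue
        seen.add(key)
        totals[rec[id_field]] += rec["bytes"]
    return dict(totals), duplicates


totals, duplicates = aggregate_by_id(records)
```

Because all records share `customer_id`, both the duplicate check and the aggregation can be keyed on it. A record without that field cannot be linked to any other record, which is why the sketch rejects it.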

...


Info

From chat with Michal:

How does the new solution differ from what users can configure now? The information on Automatic Scale Out and Rebalancing (4.3) is not related to batch scaling. It describes Kafka doing some of the partitioning work based on what is configured in the Kafka agent. DR's new batch scaling solution does the partitioning work within the inter-WF agents.

How does the new solution know when to scale? Is it based on the number of raw data files that get collected at any one time? Right now you have to manually configure your ECD to scale based on a known metric, e.g. if the number of data files is over 1000, then…

Look at the example image from the doc: 

Is it the File collection workflow that creates the partitions? Not exactly. It is the Scalable InterWF forwarder agent or, as Michal says, any agent using the Partition profile.

Does it create the partitions based on the Max Scale Factor parameter? True, says Michal. This parameter also sets the maximum number of parallel workflows.

Where is the Max Scale Factor parameter located? In the Partition profile configuration.

Our example shows three workflows. Does there have to be exactly three workflows in a solution? Is there a minimum or maximum number of workflows needed to create a working solution? There is no minimum or maximum number of workflows required.

Are there any prerequisites for configuring batch scaling using Kafka storage? Yes. Your workflow has to be designed in a way that can process batch workflows. For example, there has to be at least one common denominator in the data that links individual records.