...
These are the high-level steps for creating a scalable batch solution in Usage Engine. The following example solution is made up of several profiles, including the newly created Partition Profile (4.3) and Scalable Inter Workflow Profile (4.3), and two workflow types: Batch Scaling Collection and Batch Scaling Processing.
1. Create a Partition Profile.
2. Create the Aggregation, Duplicate UDR, and Scalable Inter Workflow profiles, and link the Partition Profile created in Step 1 to each.
3. Create the workflows:
   - Batch Scaling Collection Workflow
   - Batch Scaling Processing Workflow(s), which can be one workflow or a series of workflows.
...
Decide the maximum number of workflows that should execute in parallel, and think about how to distribute your data evenly between workers.
Note |
---|
...
Warning! |
...
Max Scale Factor that is divisible by many other numbers, like 6 or 12. You need to ensure that |
...
it is high enough to handle the data coming in, but not so high that you will overload resources. |
Finally, select an identifier that the workflow will use to distribute the UDRs. Typically, this is a field based on the record group, such as a customer ID or an account number. You can also create and populate a field using APL; see https://infozone.atlassian.net/wiki/x/xeckEg.
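As a rough illustration of how an identifier can drive the distribution, the following Python sketch maps an ID field value to a partition using a stable hash. It is not Usage Engine's actual algorithm: the function name `partition_for`, the CRC32 hash, the sample UDR fields, and the Max Scale Factor value of 12 are all assumptions made for this example.

```python
import zlib
from collections import defaultdict

# Hypothetical value; in Usage Engine this is set in the Partition Profile.
MAX_SCALE_FACTOR = 12


def partition_for(id_value: str, max_scale_factor: int = MAX_SCALE_FACTOR) -> int:
    """Map an ID field value to a partition index in [0, max_scale_factor).

    Assumption: a stable hash modulo the partition count. CRC32 is used here
    only because it gives the same result on every run.
    """
    return zlib.crc32(id_value.encode("utf-8")) % max_scale_factor


# Hypothetical UDRs keyed by a customer ID.
udrs = [
    {"customer_id": "C-1001", "bytes": 512},
    {"customer_id": "C-2002", "bytes": 128},
    {"customer_id": "C-1001", "bytes": 2048},
]

# Group the UDRs by the partition their ID field maps to.
by_partition = defaultdict(list)
for udr in udrs:
    by_partition[partition_for(udr["customer_id"])].append(udr)

for partition, records in sorted(by_partition.items()):
    print(partition, [r["customer_id"] for r in records])
```

The point of the sketch is that all UDRs with the same ID value land in the same partition, which is why a field based on the record group (customer ID, account number) is a natural choice.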
Note!
When creating a scalable workflow, you need to add the Kafka profile in the execution tab of the workflow properties.
UI Parameters
Parameter | Comment |
---|---|
ID Field | Defines how to match a UDR to a partition. |
Max Scale Factor (located in the Partition Profile configuration) | The number of partitions, which is also the maximum number of workflows that can execute in parallel. There can be fewer workflows, but not more. Note! If any of these parameters needs to be changed, the change is treated as a new configuration, which must start with empty topics. |
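To see why a Max Scale Factor with many divisors is convenient, the Python sketch below spreads a fixed number of partitions across a varying number of running workflows and reports which workflow counts give every workflow the same number of partitions. The round-robin assignment and the helper names `assign_partitions` and `balanced_workflow_counts` are assumptions for illustration only; how Usage Engine actually assigns partitions is not described here.

```python
def assign_partitions(max_scale_factor: int, workflows: int) -> list[list[int]]:
    """Round-robin the partition indexes across the running workflows.

    Assumption: a simple round-robin scheme, used only to illustrate balance.
    """
    if not 1 <= workflows <= max_scale_factor:
        # There can be fewer workflows than partitions, but not more.
        raise ValueError("workflows must be between 1 and max_scale_factor")
    return [list(range(w, max_scale_factor, workflows)) for w in range(workflows)]


def balanced_workflow_counts(max_scale_factor: int) -> list[int]:
    """Return the workflow counts for which every workflow gets equally many partitions."""
    counts = []
    for workflows in range(1, max_scale_factor + 1):
        sizes = {len(p) for p in assign_partitions(max_scale_factor, workflows)}
        if len(sizes) == 1:
            counts.append(workflows)
    return counts


print(balanced_workflow_counts(12))
print(balanced_workflow_counts(7))
print(assign_partitions(12, 5))
```

With 12 partitions, the balanced workflow counts are exactly the divisors of 12, whereas a prime value such as 7 is balanced only with 1 or 7 workflows. With 5 workflows over 12 partitions, some workflows handle three partitions and others two.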
Scaling Batch Workflows
Usage Engine scales scalable batch workflows out and in, and rebalances them, automatically. You can also schedule when a scale-out or scale-in starts.
...
Automatic Scaling | Manual Scaling |
---|---|
|
|
...
|
Warning |
---|
Questions: Presumably a scaling solution will contain multiple workflows and, based on the above, these workflows can be scheduled to start at specific times or based on metrics? How does scaling based on metrics work? Does this need to be configured somewhere? Is the key difference for customers that scaling workflows are quicker or more efficient at processing data than regular workflows? It's not more automated in general, but it is automatic at being efficient. (That sounds weird, but I think I understand what I mean.) This is why the metric question is interesting: sometimes a customer won't know at what times they'll need to scale workflows. |
...