The following guide can be used to assist you when creating your unique batch scaling solution. An important thing to remember is that you cannot mix standard agents with scaling agents in the same workflow: workflows with standard agents save their state in Usage Engine, while workflows with batch scaling agents save their state in Kafka. When creating a scalable batch workflow in Usage Engine, it is therefore important to ensure that all agents with storage capabilities are configured to use Kafka storage. Additionally, scalable workflows require the Scalable Inter Workflow Collection and Forwarding agents, as the regular Inter Workflow agents are not compatible. Mixing agents with different storage types within the same workflow, such as one Data Aggregator agent configured with Kafka storage and another with file storage, is not supported.
Creating a scalable solution (example)
These are high-level steps to creating a scalable batch solution in Usage Engine. The following example solution is made up of several profiles, including the newly created Partition Profile (4.3) and Scalable Inter Workflow Profile (4.3), and two workflow types: Batch Scaling Collection and Batch Scaling Processing.
Create a Partition Profile
Create an Aggregation, Duplicate UDR, and Scalable Inter Workflow profile and link the Partition Profile created in Step 1 to each.
Create the workflows.
Batch Scaling Collection Workflow
Batch Scaling Processing Workflow(s) - can be one or a series of workflows.
Note!
You can include multiple Aggregation and Duplicate UDR agents within the same workflow. These agents can either share the same Partition Profile or use different Aggregation and Duplicate UDR Profiles. For instance, you might use different profiles if you need to apply a different ID field as the Key in storage.
Decide how many maximum workflows should execute in parallel. Think about how you can evenly distribute your data between workers.
Decide on your scaling factor; this will be the maximum number of workflows that can effectively cooperate to process a batch. This is an important choice and will be difficult to change once your workflows are in production. The sketch after the warning below illustrates the effect of this choice.
Warning!
Choose a Max Scale Factor that is divisible by many other numbers, like 6 or 12. You need to ensure that it is high enough to handle the data coming in, but not so high that you will overload resources.
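As a rough illustration of why a highly divisible Max Scale Factor is convenient, the following sketch (plain Python with made-up numbers, not Usage Engine code) compares how 12 and 10 partitions spread across different numbers of parallel workflows:

```python
# Illustrative only: show how evenly a given Max Scale Factor (the number of
# partitions) divides across different numbers of parallel workflows.
def partition_spread(max_scale_factor: int, workflow_count: int) -> list[int]:
    """Return how many partitions each workflow would own."""
    base, remainder = divmod(max_scale_factor, workflow_count)
    # The first `remainder` workflows get one extra partition each.
    return [base + 1 if i < remainder else base for i in range(workflow_count)]

for factor in (12, 10):
    print(f"Max Scale Factor = {factor}")
    for workers in (1, 2, 3, 4, 6, 12):
        if workers <= factor:
            print(f"  {workers:2d} workflows -> {partition_spread(factor, workers)}")

# With 12 partitions the load is even at 1, 2, 3, 4, 6 and 12 workflows;
# with 10 partitions several of those scale levels leave the load uneven.
```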
You must choose one or more fields in your UDRs that will be used to partition data. These fields may be based on a record group, such as a customer ID or an account number. A sketch after the parameter table below illustrates one way such a mapping can work.
UI Parameters

Parameter | Comment
---|---
ID Field | Defines how to match a UDR to a partition.
Max Scale Factor | The number of partitions, which is the same as the maximum number of workflows that can execute in parallel. There can be fewer workflows, but not more. This setting is located in the Partition Profile configuration.
Note!
If any of the parameters need to be changed, it is considered a new configuration, which needs to start with empty topics.
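To make the ID Field and Max Scale Factor parameters concrete, here is a minimal sketch of how this kind of partitioning can work. The exact mapping Usage Engine uses is internal to the Partition Profile; the hypothetical function below only illustrates the two key properties: the same ID field value always maps to the same partition, and no partition number ever exceeds the Max Scale Factor.

```python
import hashlib

MAX_SCALE_FACTOR = 12  # assumed value, configured in the Partition Profile

def partition_for(id_field_value: str, max_scale_factor: int = MAX_SCALE_FACTOR) -> int:
    """Map an ID field value (e.g. a customer ID) to a stable partition number.

    Illustrative only: a stable hash modulo the Max Scale Factor guarantees that
    all UDRs carrying the same ID end up in the same partition.
    """
    digest = hashlib.sha256(id_field_value.encode("utf-8")).hexdigest()
    return int(digest, 16) % max_scale_factor

print(partition_for("CUST-1042"))   # always the same partition for this customer
print(partition_for("CUST-1042"))   # repeated calls give the same result
```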
Create a Kafka Profile pointing to your cluster
Create a Partition Profile where you define your Max Scale Factor and your partitioning fields.
Create the Aggregation, Duplicate UDR, and Scalable Inter Workflow profiles and link the Partition Profile created in Step 2 to each.
Create your workflows.
Standard workflows - prepare data for scaling by sending it to the Scalable InterWF Forwarder.
Scalable processing workflows - collect data with a Scalable InterWF Collector.
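The sketch below (plain Python with hypothetical names, no real Kafka client) illustrates the division of labour this setup aims for: the forwarding side keys every record on the partitioning field before handing it over, and each scalable processing workflow instance only consumes the partitions it owns.

```python
from collections import defaultdict
import hashlib

MAX_SCALE_FACTOR = 12      # assumed Partition Profile setting
PROCESSING_WORKFLOWS = 3   # assumed number of processing workflows currently running

def partition_for(key: str) -> int:
    # Stand-in for the Partition Profile mapping (same idea as the earlier sketch).
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16) % MAX_SCALE_FACTOR

# Forwarder side: the collection workflow keys each UDR on the ID field,
# so all records for the same customer land in the same partition.
batch = [{"customer_id": f"CUST-{n % 5}", "amount": n} for n in range(20)]
partitions = defaultdict(list)
for udr in batch:
    partitions[partition_for(udr["customer_id"])].append(udr)

# Collector side: each processing workflow instance owns a subset of the
# partitions, so the instances can process the batch in parallel.
for worker in range(PROCESSING_WORKFLOWS):
    owned = sorted(p for p in partitions if p % PROCESSING_WORKFLOWS == worker)
    records = sum(len(partitions[p]) for p in owned)
    print(f"processing workflow {worker}: partitions {owned}, {records} records")
```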
Note!
You can include multiple Aggregation and Duplicate UDR agents within the same workflow. These agents can either share the same Partition Profile or use different Aggregation and Duplicate UDR Profiles. For instance, you might use different profiles if you need to apply a different ID field as the Key in storage.
Scaling Batch Workflows
Usage Engine scales scalable batch workflows out and in, and re-balances them, automatically. You can also schedule when to start a scale-out or scale-in.
Deploying a scale-out configuration with ECDs:
Use the regular ECD definition (https://infozone.atlassian.net/wiki/x/IgMkEg) with Dynamic Workflows to define how to package a scale-out. You need to define when these ECDs will activate.
A Collection Workflow scales with 1 extra Workflow per ECD.
A Processing Workflow scales with 3 extra Workflows per ECD.
Or combine the above into the same ECD.
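A hedged back-of-the-envelope sketch of how this packaging interacts with the Max Scale Factor (all numbers are illustrative): each activated scale-out ECD adds a fixed number of workflows, and the total number of parallel workflows can never usefully exceed the Max Scale Factor of the Partition Profile they share.

```python
MAX_SCALE_FACTOR = 12          # assumed Partition Profile setting
BASE_PROCESSING_WORKFLOWS = 3  # assumed workflows running before any scale-out
WORKFLOWS_PER_ECD = 3          # matches the example packaging above

# Each scale-out ECD adds WORKFLOWS_PER_ECD processing workflows, but adding
# more workflows than there are partitions gives no further parallelism.
max_extra_ecds = (MAX_SCALE_FACTOR - BASE_PROCESSING_WORKFLOWS) // WORKFLOWS_PER_ECD
print(f"At most {max_extra_ecds} scale-out ECDs are useful before reaching "
      f"the {MAX_SCALE_FACTOR}-workflow ceiling.")
```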
Scheduling a scale-out configuration:
You can schedule the ECD and workflow to start or stop at specific times; alternatively, these can also be started manually. This is configured in… If no schedule for scaling is created, the system will scale automatically based on metrics.
Add an image of the setting for the manual schedule.
Use this table to explain the settings in the image.
Automatic Scaling | Manual Scaling
---|---
Based on a metric. (Should the metric also have a duration to avoid oscillating behavior?) | You can start up ECDs manually. ECDs can also be scheduled.
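The "duration" idea raised in the table is essentially a sustained-threshold check: only trigger a scale decision when a metric has stayed past its threshold for a minimum time, so that short spikes do not cause a scale-out followed immediately by a scale-in. Below is a minimal sketch of that logic, with hypothetical metric names and thresholds (not Usage Engine configuration).

```python
import time

SCALE_OUT_THRESHOLD = 80.0   # hypothetical metric threshold, e.g. backlog per workflow
REQUIRED_DURATION_S = 300    # the metric must stay above the threshold this long

_breach_started_at = None

def should_scale_out(metric_value: float, now: float | None = None) -> bool:
    """Return True only when the metric has been above the threshold continuously
    for REQUIRED_DURATION_S seconds, which avoids oscillating scale decisions."""
    global _breach_started_at
    now = time.monotonic() if now is None else now
    if metric_value < SCALE_OUT_THRESHOLD:
        _breach_started_at = None          # breach over; reset the timer
        return False
    if _breach_started_at is None:
        _breach_started_at = now           # breach just started
    return now - _breach_started_at >= REQUIRED_DURATION_S
```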
Note!
When creating a scalable workflow, you need to add the Kafka profile in the Execution tab of the workflow properties.
Questions: Presumably a scaling solution will contain multiple workflows, and based on the above, these workflows can be scheduled to be started at specific times or based on metrics? How does scaling based on metrics work, and does this need to be configured somewhere? The key difference for customers is that scaling workflows are quicker/more efficient at processing data than regular workflows? It is not more automated in general, but it is automatic at being efficient. (That sounds weird, but I think I understand what I mean.) This is why the metric question is interesting, because sometimes a customer won't know at what times they'll need to scale workflows.
See the tabs on https://infozone.atlassian.net/wiki/x/VgQkEg for more information.