...
Always start with a Batch Scaling Collection Workflow that collects from the original file source and forwards UDRs to Kafka.
The Batch Scaling Processing Workflows can be one or a series of workflows. Batch Duplication Check and Aggregation can be part of the same workflow. There can only be one Aggregation agent and one Deduplication agent per workflow.
Decide the maximum number of workflows that should execute in parallel. Think about how you can evenly distribute your data into different groups.
Info |
---|
Example - Evenly distributing data into different groups |
Finally, select an identifier that the workflow will use to distribute the UDRs. Typically, this is a field based on the record group, such as a customer ID or an account number. You can also create and populate a field using APL; see https://infozone.atlassian.net/wiki/x/xeckEg.
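
To make the idea of distributing UDRs by an identifier concrete, the following is a minimal sketch, not the product's actual partitioning logic. It assumes a hypothetical customer ID field, a hypothetical partition count of 4, and uses CRC32 purely as a stand-in for whatever stable hash is used internally. It shows how the same ID always maps to the same partition and how you could check whether a sample of IDs spreads evenly.

```python
import zlib
from collections import Counter

# Hypothetical number of partitions, chosen only for this illustration.
MAX_SCALE_FACTOR = 4


def partition_for(id_value: str, partitions: int = MAX_SCALE_FACTOR) -> int:
    """Map an ID field value to a partition using a stable hash.

    CRC32 is an assumption made for this sketch; the real system may
    use a different hash function.
    """
    return zlib.crc32(id_value.encode("utf-8")) % partitions


def distribution(id_values, partitions: int = MAX_SCALE_FACTOR) -> Counter:
    """Count how many UDRs would land in each partition."""
    counts = Counter({p: 0 for p in range(partitions)})
    for value in id_values:
        counts[partition_for(value, partitions)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample: 1,000 UDRs spread over 200 customer IDs.
    sample = [f"customer-{i % 200}" for i in range(1000)]
    for partition, count in sorted(distribution(sample).items()):
        print(f"partition {partition}: {count} UDRs")
```

If one partition receives far more UDRs than the others in a check like this, the chosen identifier may be too coarse (for example, a field with only a few distinct values), and a different field may distribute the load better.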
UI Parameters
Parameter | Comment |
---|---|
ID Field | Defines how to match a UDR to a partition. |
Max Scale Factor (located in the Partition profile configuration) | Number of partitions, which is the same as the maximum number of workflows that can execute in parallel. There can be fewer workflows, but not more. Note! If any of the parameters need to be changed, it is considered a new configuration, and you must start with empty topics. You can use the existing data, but you must use the standard Kafka agents and migrate the data. |
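
The relationship between the Max Scale Factor and the number of running workflows can be sketched as follows. This is an illustration only: the function name `assign_partitions` and the round-robin strategy are assumptions, and the product's actual assignment strategy is not described here. The sketch shows that fewer workflows than partitions is valid (each workflow then handles more than one partition), while more workflows than partitions is rejected.

```python
from typing import Dict, List


def assign_partitions(max_scale_factor: int, running_workflows: int) -> Dict[int, List[int]]:
    """Spread partitions over running workflows in round-robin order.

    Hypothetical helper: the number of workflows may be lower than the
    number of partitions, but never higher.
    """
    if not 1 <= running_workflows <= max_scale_factor:
        raise ValueError(
            "running_workflows must be between 1 and max_scale_factor"
        )
    assignment: Dict[int, List[int]] = {w: [] for w in range(running_workflows)}
    for partition in range(max_scale_factor):
        assignment[partition % running_workflows].append(partition)
    return assignment


if __name__ == "__main__":
    # Six partitions shared by four workflows: two workflows get two partitions.
    print(assign_partitions(6, 4))
```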
...
Add an image of the setting for the manual schedule.
Use this table to explain the settings in the image.
Automatic Scaling | Manual Scaling |
---|---|
|
|
Warning |
---|
Questions: Presumably a scaling solution will contain multiple workflows and, based on the above, these workflows can be scheduled to be started at specific times or based on metrics? How does it being based on metrics work? Does this need to be configured somewhere? The key difference for customers is that scaling workflows are quicker/more efficient at processing data than regular workflows? It’s not more automated in general but it is automatic at being efficient. (That sounds weird, but I think I understand what I mean.) This is why the metric question is interesting, because sometimes a customer won’t know what times they’ll need to scale workflows. |
Info |
---|
ECD from ChatGPT: (internal notes) In simple terms, an ECD is like setting up a workspace that’s fully equipped for a job, so the tasks can start and run without interruptions or missing tools. |
...