...

Usage Engine Private Edition now supports batch scaling, making it possible to increase or decrease processing capacity as needed without manual intervention. As a general concept, batch scaling is a way to speed up processes by splitting the workload between multiple “workers” or resources, enabling them to complete tasks in parallel rather than sequentially. Usage Engine’s solution consists of two new agents, the Scalable InterWF Forwarder and Collector, and a new profile, the Partition Profile. It also uses the existing agents, Data Aggregator and Deduplication, which have been updated to include a Kafka storage profile. Add something here about recommended use cases as per the note above.

Prerequisites for Kafka?

...

You collect a large number of files and want to process the data in them more efficiently. This can be achieved by creating…

...

maybe number these…

  1. Firstly, the File collection workflow(s) will use an ID Field (e.g. customer ID) to determine which shard/partition a UDR belongs to. These workflows manage the InterWF partitions.

Add image example of the ‘File collection workflow’

  2. Next, the Max Scale Factor parameter will determine how many partitions will be created. This parameter needs to be configured in ….

Note!

The number of partitions will be the same across all storage buckets/caches/topics. Storage is used, for example:

  • When UDRs are passed between workflows.

  • When duplicate UDR keys are detected.

  • For aggregated sessions.

  3. Then, the Duplication Check workflow(s) will check for duplicates across all partitions. Checked UDRs are placed in an additional topic with the same partitions as the corresponding Collection workflow topic. (The duplication keys are saved in a separate topic with the same number of partitions and the same ID fields.)

  4. Finally, the Aggregation workflow(s) will collect from an inter-workflow topic, and use a separate aggregation session storage topic.
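The partitioning described in these steps can be sketched as follows. This is only an illustration under stated assumptions, not Usage Engine's actual implementation: the hash function (CRC32), the helper name `partition_for`, the value chosen for the scale factor, and the sample UDR fields are all hypothetical. It shows how a stable hash of an ID field, taken modulo the Max Scale Factor, sends every UDR with the same ID to the same partition number.

```python
import zlib

MAX_SCALE_FACTOR = 4  # assumed value; determines how many partitions exist


def partition_for(id_value: str, max_scale_factor: int = MAX_SCALE_FACTOR) -> int:
    """Map an ID field value to a partition number in [0, max_scale_factor)."""
    # CRC32 gives the same result in every process, unlike Python's
    # built-in hash() for strings, which is randomised per interpreter run.
    return zlib.crc32(id_value.encode("utf-8")) % max_scale_factor


# Hypothetical UDRs keyed by a customer ID field.
udrs = [
    {"customer_id": "A100", "bytes": 512},
    {"customer_id": "B200", "bytes": 128},
    {"customer_id": "A100", "bytes": 256},
]

# One list per partition stands in for the inter-workflow storage.
partitions = {p: [] for p in range(MAX_SCALE_FACTOR)}
for udr in udrs:
    partitions[partition_for(udr["customer_id"])].append(udr)
```

Because the partition number depends only on the ID value and the partition count, using the same count for every storage keeps a given ID on the same partition number throughout the workflow chain.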

...

you use the new agent, InterWF Collector, to pick up the files from the external system/IF storage (InterWF partition). You also need duplication checks, after which you use the InterWF Forwarder to take the non-duplicated files and feed them to the Aggregation partitions (duplication checks and aggregation are pretty common processes in any workflow group). You use the existing agents, Deduplicate and Data Aggregator; however, they have a new storage profile option for Kafka, which you need to configure. Finally you would use the other new agent

Info

From Chat GPT re: Topics - For draft purposes only:

In a software context, especially in messaging and streaming platforms like Kafka, a topic isn’t a type of storage in the traditional sense, like a cache or database. Instead, it refers to a "channel" or "feed" where messages (like UDRs) are grouped and published for consumers to read from. While a topic involves data persistence (messages are stored temporarily or longer-term, depending on configuration), it's more about organizing and transmitting data rather than being a storage unit itself.

In comparison, a cache is a direct storage solution intended for fast access to data. Topics, on the other hand, are about managing and distributing data streams efficiently across systems.

Assume that you have a batch use case where you collect files, and have to do duplication checks and aggregation. You want to be able to scale. You need 2 or 3 WFs. In the picture below we use 3 WFs.

[Image: batchScaling.png]

...

The File collection workflow(s) will use the ID Fields (e.g. customer id?) to determine which shard/partition a UDR belongs to.

...

The number of partitions is determined by the Max Scale Factor parameter and is the same for each of the storages needed:

  1. Passing of UDRs between workflows.

  2. Duplicate UDR keys.

  3. Aggregation Sessions

...

The Duplication Check workflow(s) will check for duplicates across all partitions. Checked UDRs are placed in another topic with the same corresponding partitions as the topic the workflow collected from. (The duplication keys are saved in a separate topic with the same number of partitions and the same ID fields.)
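A minimal sketch of this duplicate check, assuming in-memory dictionaries as stand-ins for the Kafka topics, is shown below. The function names, the `customer_id` and `seq` fields, the composition of the duplicate key, and the partition count are all hypothetical and chosen only for illustration. The point is that the key storage and the checked-UDR storage share one partition number, derived from the same ID field.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 4  # assumed; same count for checked UDRs and duplicate keys


def partition_for(id_value: str) -> int:
    """Stable mapping from an ID field value to a partition number."""
    return zlib.crc32(id_value.encode("utf-8")) % NUM_PARTITIONS


def check_duplicates(udrs, id_field="customer_id", seq_field="seq"):
    """Drop UDRs whose duplicate key has already been seen in their partition."""
    seen_keys = defaultdict(set)   # partition -> duplicate keys already stored
    checked = defaultdict(list)    # partition -> UDRs that passed the check
    for udr in udrs:
        p = partition_for(udr[id_field])
        dup_key = (udr[id_field], udr[seq_field])  # assumed key definition
        if dup_key in seen_keys[p]:
            continue  # duplicate: not forwarded
        seen_keys[p].add(dup_key)
        checked[p].append(udr)
    return checked, seen_keys


sample = [
    {"customer_id": "A", "seq": 1},
    {"customer_id": "A", "seq": 1},  # duplicate of the first UDR
    {"customer_id": "A", "seq": 2},
    {"customer_id": "B", "seq": 1},
]
checked, seen_keys = check_duplicates(sample)
```

Since a given ID always maps to the same partition, each partition's key set is sufficient to detect that ID's duplicates, which is what allows several workflow instances to share the work.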

...


Subsections

This section contains the following subsections:

...