Automatic Scale Out and Rebalancing

Multiple workflows in the same Workflow configuration can be executed in parallel, and collect messages from the same topics, provided that there are several partitions configured for a topic, and the same consumer group is specified. If the number of running workflows changes, the Kafka cluster will automatically trigger rebalancing. You can only have one workflow per partition. If you start more workflows than partitions, the workflows not assigned any partition will run as active stand-by workflows.

kafkaRebalance.png
Multiple identical workflows can collect messages from the same topic.

Example - Collect Messages from 2 Topics with 3 Configured Partitions

You want to collect messages from 2 Kafka topics called “example1” and “example2” which have 3 configured partitions each.

In the Kafka collection agent configuration, you only need to state the names of the topics and consumer group, and the agent will automatically manage the assignment of partitions within the consumer group.

kafkaBatchColl_example.png
Kafka collection agent configuration

You will get the following behavior and debug in the executing workflows:

  • If only one workflow is started, it will collect messages from all three partitions. The debug output from the collector will look like this:

*** Assignment *** Topic(s): example1, example2 Partition(s): 0-2
  • If a second workflow is started, an automatic rebalance is triggered, and the first workflow will collect messages from two partitions, and the second from the third partition. The debug output from the collectors of the two workflows will look like this:

*** Rebalance *** Topic(s): example1, example2 Partition(s): 0-1
*** Assignment *** Topic(s): example1, example2 Partition(s): 2
  • If a third workflow is started, an automatic rebalance is triggered, and each workflow will collect messages from one partition. The debug output from the collectors of the three workflows:

  • If a fourth workflow is started, it will not be assigned any partitions since there are no partitions left and there can only be one workflow collecting from one partition. The fourth workflow will act as an active stand-by workflow. If a workflow aborts, a new automatic rebalance is triggered and the active stand-by workflow can be used.