Kafka Real-Time Collection Agent Configuration (4.2)

You open the Kafka collection agent configuration dialog from a workflow configuration. Click Build → New Configuration. Select Workflow from the Configurations dialog. When prompted to Select workflow type, select Realtime. Click Add agent and select Kafka from the Collection tab in the Agent Selection dialog.

Kafka real-time collection agent configuration.

Setting

Description

Profile

Select the Kafka profile you want the agent to use in this drop-down list. The Kafka profile defines from which Kafka broker the agent collects data.

Consumer Group

The consumer group to collect data from.

Maximum poll size

The maximum number of messages to request in a single poll. A low value affects performance negatively.

Offset management

At Least Once

When you select this option, a message is guaranteed to be collected at least once.

This setting may sometimes result in duplicates.

At Most Once

When you select this option, a message is guaranteed to be collected at most once.

This setting may sometimes result in data being lost.
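
The difference between the At Least Once and At Most Once options can be illustrated with a small simulation. This is a minimal sketch of general consumer delivery semantics, not the agent's actual implementation: it assumes that At Least Once corresponds to committing an offset after a message is processed, and At Most Once to committing before processing. The function names `run` and `collect_with_restart` are hypothetical.

```python
def run(messages, start, crash_at, commit_first):
    """Process messages from offset `start`, optionally crashing at `crash_at`.

    The simulated crash happens between the two steps (commit and process),
    whichever order they are performed in.
    Returns the processed messages and the last committed offset.
    """
    processed = []
    committed = start
    for offset in range(start, len(messages)):
        if commit_first:
            committed = offset + 1              # assumed "at most once" order
        else:
            processed.append(messages[offset])  # assumed "at least once" order
        if offset == crash_at:
            break                               # crash before the second step
        if commit_first:
            processed.append(messages[offset])
        else:
            committed = offset + 1
    return processed, committed


def collect_with_restart(messages, crash_at, commit_first):
    """Run once with a crash, then restart from the committed offset."""
    first, committed = run(messages, 0, crash_at, commit_first)
    second, _ = run(messages, committed, None, commit_first)
    return first + second


messages = ["m0", "m1", "m2"]
at_least_once = collect_with_restart(messages, crash_at=1, commit_first=False)
at_most_once = collect_with_restart(messages, crash_at=1, commit_first=True)
print(at_least_once)  # m1 appears twice: a duplicate after the restart
print(at_most_once)   # m1 is missing: data lost after the restart
```

In this model the same crash produces a duplicate in one mode and a lost message in the other, which is the trade-off described for the two options.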

Start At Requested

When you select this option, you must determine from which offset you want to start collecting data and add an incoming route for the UDRs. When a UDR arrives on that route, data collection starts from the given offset. You set the offset using the KafkaOffset UDR in an Analysis agent; see KafkaOffset (4.2). This setting reduces the risk of data loss and prevents messages from being processed multiple times after a restart. See the example in Legacy KafkaOffsetUDR (4.2).

Start At Beginning

When you select this option, messages are collected from the first offset. With this setting, there is a risk that messages will be processed multiple times after a restart.

Start At End

When you select this option, messages are collected from the last offset at the time the workflow was started. With this setting, there is a risk that data can be lost after a restart.

Assignment

When you select this option, messages can be collected from one or several topics. The topics can be identified in the following ways:

Topic Pattern
Enter a regular expression that the names of the topics you want to collect from must match. For the regular expression syntax, see https://docs.oracle.com/en/java/javase/15/docs/api/java.base/java/util/regex/Pattern.html.

Topic Names
This option displays a list and an Add button. Add one or several topic names to collect from. The exact names must be entered. Regular expressions cannot be used.

Topic Partitions
This option displays a list and an Add button. Add one or several topic names and partitions to collect from. The exact names must be entered. Regular expressions cannot be used.

Below are a few examples of valid partition declarations:

Example of collection from partition 0:

Partitions: 0

Example of collection from the three partitions 0, 8 and 12:

Partitions: 0,8,12

Example of collection from the six partitions 0, 3, 4, 5, 6, and 7:

Partitions: 0,3-7
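
To make the declaration format concrete, the following minimal sketch expands a partition declaration into the individual partition numbers. It is based only on the three examples above (single numbers, comma-separated lists, and inclusive ranges written with a hyphen). The function name `parse_partitions` is hypothetical, and how the agent itself parses or validates the field is not documented here.

```python
def parse_partitions(declaration: str) -> list[int]:
    """Expand a declaration such as "0,3-7" into a sorted list of partitions."""
    partitions: set[int] = set()
    for part in declaration.split(","):
        part = part.strip()
        if not part:
            raise ValueError(f"empty entry in {declaration!r}")
        if "-" in part:
            # A range such as "3-7" includes both end points.
            first_text, last_text = part.split("-", 1)
            first, last = int(first_text), int(last_text)
            if first > last:
                raise ValueError(f"invalid range {part!r}")
            partitions.update(range(first, last + 1))
        else:
            partitions.add(int(part))
    return sorted(partitions)


print(parse_partitions("0"))       # [0]
print(parse_partitions("0,8,12"))  # [0, 8, 12]
print(parse_partitions("0,3-7"))   # [0, 3, 4, 5, 6, 7]
```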

Note!

If you select Topic Partitions, automatic rebalancing does not take place, and you must handle any rebalancing manually.
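
For the Topic Pattern option described above, the following sketch shows how a regular expression selects topics by name. It uses Python's `re` module purely for illustration; the agent uses the Java `Pattern` syntax linked above, which differs from Python's in some details. The sketch also assumes that the whole topic name must match the expression, and the topic names and the helper `matching_topics` are hypothetical.

```python
import re


def matching_topics(pattern: str, topic_names: list[str]) -> list[str]:
    """Return the topic names that match the pattern in full (an assumption)."""
    compiled = re.compile(pattern)
    return [name for name in topic_names if compiled.fullmatch(name)]


# Hypothetical topic names, used only for this example.
topics = ["billing.eu", "billing.us", "billing.us.retry", "audit"]
selected = matching_topics(r"billing\.[a-z]+", topics)
print(selected)  # ['billing.eu', 'billing.us']
```

Note that the dot is escaped in the pattern. An unescaped dot matches any character, which can select more topics than intended.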
