Data aggregator - Configuration tab

The Data Aggregator is a processor function, meaning it operates on data as it passes through a stream, transforming it before forwarding it to the next function. The Data Aggregator configuration contains the following sections:

Group Fields: Define the fields the Data Aggregator will use to group records during the aggregation process.
Aggregation Fields: Specify the fields on which the aggregation operations (for example, SUM, COUNT, MIN, MAX) will be performed.
Flush by: Configure how and when the aggregated data should be flushed (forwarded) to the next function in the stream.

Note!
A note on TTL (Time to Live) for aggregated data sessions: Aggregated sessions are stored for a maximum of 180 days. This means that if a session is not updated for 180 days, all the stored data from that session will be permanently deleted.
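The TTL rule can be sketched as a simple check on the time since a session was last updated. This is only an illustration of the rule stated in the note: the names `SESSION_TTL` and `is_expired` are hypothetical and are not part of the product.

```python
from datetime import datetime, timedelta, timezone

SESSION_TTL = timedelta(days=180)  # retention period stated in the note


def is_expired(last_updated: datetime, now: datetime) -> bool:
    """Return True once a session has gone a full TTL without an update."""
    return now - last_updated >= SESSION_TTL


last_update = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(is_expired(last_update, datetime(2024, 6, 1, tzinfo=timezone.utc)))  # 152 days later
print(is_expired(last_update, datetime(2024, 7, 1, tzinfo=timezone.utc)))  # 182 days later
```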

Data Aggregator configuration example

Page navigation:

1 Group fields
2 Aggregation fields
3 Flush by

Configuration field	Description
Group fields
Note! Changing how you group data (updating the Group fields) starts new aggregation sessions, and the old sessions remain stored. If you revert to the original grouping, the system reuses the old stored sessions, but only if they have not timed out yet.
Fields	Specify which fields are used to group records for aggregation. Records that have the same values in all the selected fields are grouped into the same session for processing. You must configure at least one field for the Data Aggregator to work properly; if the list is left empty, you will receive an error message. You can either type the field names manually or select them from the drop-down menu. Example - Grouping user records by the user field: Suppose you have three records and you group them by the user field. Record 1 has the user field set to A, Record 2 has the user field set to B, and Record 3 has the user field set to A. Record 1 and Record 3 are grouped into one session because they share the user value A, while Record 2 goes into a separate session for the user value B. With a COUNT operation, for example, the total for the group with user A is 2 and the total for the group with user B is 1.
Group based on date/time	Select the checkbox to group records by a defined time period. If you need to convert a date format to support the ISO standard, see Script. Click + Add field to add more period fields.
Field	If you group by date/time, specify each date/time field and then select a Period for it.
Period	Select a time duration from the drop-down menu: Hour, Day, Month, or Year.
Aggregation fields
Field	Specify the name of the field(s) on which the aggregation operation will be performed.
Operation	Select the aggregation operation to apply to the chosen field from the drop-down menu. The available operations are grouped into three categories: Numeric, General, and Date. Numeric: SUM (adds up all the numeric values), MAX (returns the highest numeric value), MIN (returns the lowest numeric value), AVERAGE (calculates the mean of the numeric values). General: COUNT (counts the number of records), CARRY_FIRST (uses the first value encountered), CARRY_LAST (uses the last value encountered). Date: MAX (returns the latest date), MIN (returns the earliest date). Click + Add field to add more fields.
Flush by
Flush by	Select how and when to flush the aggregated data to the next function in the stream. The options are End of transaction, End of stream, and Timeout. See the Flush by section below for more information on these options.
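The grouping and aggregation settings in the table above can be illustrated with a small sketch. It is not the product's implementation: the function names, the `OPERATION.field` output naming (modelled on the 'SUM.sheets' label that appears later on this page), and the idea that a Period truncates an ISO 8601 timestamp are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime

# Operations named in the Operation row, applied to a group's list of values.
OPERATIONS = {
    "SUM": sum,
    "MAX": max,
    "MIN": min,
    "AVERAGE": lambda values: sum(values) / len(values),
    "COUNT": len,
    "CARRY_FIRST": lambda values: values[0],
    "CARRY_LAST": lambda values: values[-1],
}

# Assumption: a Period buckets an ISO 8601 timestamp by truncating it.
PERIOD_FORMATS = {"Hour": "%Y-%m-%dT%H", "Day": "%Y-%m-%d", "Month": "%Y-%m", "Year": "%Y"}


def session_key(record, group_fields, period_fields=None):
    """Build the grouping key: field values plus any truncated timestamps."""
    key = [record[name] for name in group_fields]
    for name, period in (period_fields or {}).items():
        timestamp = datetime.fromisoformat(record[name])
        key.append(timestamp.strftime(PERIOD_FORMATS[period]))
    return tuple(key)


def aggregate(records, group_fields, aggregations, period_fields=None):
    """Return {session_key: {"OPERATION.field": value}} for the given records."""
    if not group_fields:
        raise ValueError("At least one group field is required")
    sessions = defaultdict(list)
    for record in records:
        sessions[session_key(record, group_fields, period_fields)].append(record)
    return {
        key: {
            f"{op}.{field}": OPERATIONS[op]([m[field] for m in members])
            for field, op in aggregations
        }
        for key, members in sessions.items()
    }


records = [
    {"user": "A", "sheets": 3, "ts": "2024-05-01T10:15:00"},
    {"user": "B", "sheets": 5, "ts": "2024-05-01T11:00:00"},
    {"user": "A", "sheets": 4, "ts": "2024-05-02T09:30:00"},
]

# Group by user only: users A and B each get one session.
by_user = aggregate(records, ["user"], [("user", "COUNT"), ("sheets", "SUM")])
print(by_user)

# Group by user and by day: user A is split into one session per day.
by_user_day = aggregate(records, ["user"], [("sheets", "SUM")], {"ts": "Day"})
print(by_user_day)
```

In the first call, user A ends up with a count of 2 and user B with a count of 1, matching the example in the Fields row. Adding a Day period on the hypothetical `ts` field splits user A's records into two sessions because they fall on different days.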

Flush by

The Data Aggregator collects and processes data internally until it is "flushed". Flushing means the data is finalized and either saved or sent for further processing. For details, see https://infozone.atlassian.net/wiki/x/y4JLDg. The Data Aggregator configuration offers three options for how and when the data is flushed.

Flush by ‘End of transaction’

This option flushes the aggregated data as soon as a transaction is completed, even if the overall stream is still running, which allows for more frequent, smaller data outputs. If the data comes from multiple files, each file's data is handled as an individual transaction within the stream. After each file's data has been processed and the aggregation logic applied, the results for that file are flushed immediately.

Flush by ‘End of stream’

In this case, no data is flushed until the entire stream has finished running. Data continues to be aggregated for the entire duration of the stream, which ensures that the output represents all of the stream's data.
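The difference between the two options described so far can be sketched with a COUNT per user over two input files. This is a simplified model, not the product's logic: `run_stream` is a hypothetical helper, each inner list stands for one file (one transaction), and the sketch assumes that sessions are closed once they have been flushed.

```python
from collections import Counter


def run_stream(files, flush_by):
    """Count records per user and return the list of flushed outputs.

    `files` is a list of transactions; each transaction is a list of user values.
    """
    flushed = []
    sessions = Counter()
    for file_records in files:  # each file is one transaction
        sessions.update(file_records)
        if flush_by == "End of transaction":
            flushed.append(dict(sessions))
            sessions.clear()  # assumption: flushed sessions are closed
    if flush_by == "End of stream" and sessions:
        flushed.append(dict(sessions))
    return flushed


files = [["A", "B", "A"], ["A", "C"]]

per_transaction = run_stream(files, "End of transaction")
print(per_transaction)  # one output per file

per_stream = run_stream(files, "End of stream")
print(per_stream)  # a single output covering both files
```

With End of transaction, user A appears in two separate outputs (a count of 2 for the first file and 1 for the second). With End of stream, there is one output in which user A has a count of 3.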

Flush by ‘Timeout’

In this case, the system flushes the aggregated data after a specific timeout period has elapsed or a condition is met. This can be useful when the data should be output at regular intervals.

In batch streams, the timeout is passive: timed-out data is not flushed until the next stream execution.

In real-time streams, the timeout is actively monitored: the system checks every 60 seconds and automatically flushes the data if the timeout has been reached.
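The passive and active behaviours can be contrasted with a short simulation. It is a sketch under assumptions: the session layout, `flush_due`, and `simulate_real_time` are hypothetical, and the 60-second check is modelled by stepping a clock rather than by a real timer.

```python
from datetime import datetime, timedelta

CHECK_INTERVAL = timedelta(seconds=60)  # real-time check cadence stated above


def flush_due(sessions, now):
    """Remove and return every session whose timeout is at or before `now`."""
    due = {key: s for key, s in sessions.items() if s["timeout"] <= now}
    for key in due:
        del sessions[key]
    return due


def simulate_real_time(sessions, start, end):
    """Check for due sessions every CHECK_INTERVAL; return (check_time, keys) events."""
    events = []
    now = start
    while now <= end:
        due = flush_due(sessions, now)
        if due:
            events.append((now, sorted(due)))
        now += CHECK_INTERVAL
    return events


def make_sessions():
    return {
        "A": {"timeout": datetime(2024, 5, 1, 12, 0, 30), "SUM.sheets": 7},
        "B": {"timeout": datetime(2024, 5, 1, 12, 2, 0), "SUM.sheets": 5},
    }


# Real-time stream: a session is flushed at the first check at or after its timeout.
real_time_events = simulate_real_time(
    make_sessions(), datetime(2024, 5, 1, 12, 0), datetime(2024, 5, 1, 12, 3)
)
print(real_time_events)

# Batch stream: nothing is flushed until the next execution, here at 12:30.
batch_flushed = flush_due(make_sessions(), datetime(2024, 5, 1, 12, 30))
print(sorted(batch_flushed))
```

In the real-time simulation, session A (timed out at 12:00:30) is flushed at the 12:01 check and session B at the 12:02 check. In the batch case, both sessions wait until the 12:30 execution and are flushed together.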

Select the Timeout type

This can be one of the following options:

Hour: Select the hour interval after which the data will time out.

Day: Data is timed out daily at a specific time you set. You must specify both the time of day and the timezone.

Month: Data is timed out monthly at a specific time you set. You specify the day of the month (e.g., the 1st, the 15th, or ‘last day of the month’) and the time of day when the timeout should occur, along with the timezone.

Based on timestamp field: This setting uses the event timestamp from the input data to determine when to flush the aggregated results.
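To make the Hour, Day, and Month options concrete, the sketch below computes when the next timeout would fall for each type. It is an illustration only: the function names are hypothetical, the Hour interval is assumed to be counted from a session's start, and a day number beyond the end of a short month is assumed to fall back to that month's last day.

```python
import calendar
from datetime import date, datetime, time, timedelta, timezone


def next_hourly(session_start, hours):
    """Hour: the session times out a fixed number of hours after it starts (assumed)."""
    return session_start + timedelta(hours=hours)


def next_daily(now, at, tz):
    """Day: next occurrence of wall-clock time `at` in timezone `tz` after `now`."""
    local = now.astimezone(tz)
    candidate = datetime.combine(local.date(), at, tzinfo=tz)
    if candidate <= local:
        candidate += timedelta(days=1)
    return candidate


def next_monthly(now, day, at, tz):
    """Month: next occurrence of `day` (1-31 or "last") at time `at` in `tz`."""
    local = now.astimezone(tz)
    year, month = local.year, local.month
    while True:
        last_day = calendar.monthrange(year, month)[1]
        day_of_month = last_day if day == "last" else min(day, last_day)
        candidate = datetime.combine(date(year, month, day_of_month), at, tzinfo=tz)
        if candidate > local:
            return candidate
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)


tz = timezone(timedelta(hours=2))  # example timezone: fixed UTC+02:00 offset
now = datetime(2024, 2, 10, 9, 0, tzinfo=timezone.utc)  # 11:00 local time

print(next_hourly(now, 6))
print(next_daily(now, time(2, 0), tz))
print(next_monthly(now, "last", time(0, 0), tz))
print(next_monthly(now, 1, time(0, 0), tz))
```

The functions accept any `tzinfo` object, so a named zone from the standard `zoneinfo` module could be passed instead of the fixed offset used here.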

Adding custom conditions

You can add custom conditions when Flush by is set to Timeout by clicking + Add condition.

OR condition configuration	Description
Based on	Select the input field or aggregated field to apply the condition to. Input Fields lists all the input fields configured in the stream, and Aggregated Fields lists the fields that you have selected for aggregation. In this example, the only aggregated field is ‘SUM.sheets’.
Type of field	Select the type of field. Your selection determines the configuration options that follow. The options are: Numerical values (choose an Operator from the list provided, then add the Value as a number), Text values (string), and Boolean.
Operator	Choose how to compare the selected field with a specific value. The options available depend on the Type of field selected.
Value	The options for this field depend on the Type of field selected. Boolean: choose either True or False from the drop-down list. Numerical or Text values: type the value in.
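The way OR conditions combine with the timeout can be sketched as follows. This is a hedged illustration, not the product's evaluation logic: the page does not list the available operators, so the operator names below, the dictionary layout of a condition, and the helper functions are all hypothetical.

```python
import operator

# Hypothetical operator names; the real lists depend on the Type of field.
NUMERIC_OPERATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
}
TEXT_OPERATORS = {
    "equals": operator.eq,
    "contains": lambda actual, expected: expected in actual,
}


def condition_met(condition, session):
    """Evaluate one condition against a session's input and aggregated fields."""
    actual = session[condition["based_on"]]
    if condition["type"] == "boolean":
        return actual == condition["value"]
    operators = NUMERIC_OPERATORS if condition["type"] == "number" else TEXT_OPERATORS
    return operators[condition["operator"]](actual, condition["value"])


def should_flush(timeout_reached, conditions, session):
    """Flush when the timeout has been reached OR any custom condition is met."""
    return timeout_reached or any(condition_met(c, session) for c in conditions)


session = {"user": "A", "SUM.sheets": 120}
conditions = [
    {"based_on": "SUM.sheets", "type": "number", "operator": ">=", "value": 100},
]

print(should_flush(False, conditions, session))  # condition met before the timeout
print(should_flush(False, [], session))          # no timeout, no conditions
print(should_flush(True, [], session))           # timeout alone is enough
```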