...
Configuration field | Description
---|---
Group fields | Note! Changing how you group data (updating the Group fields) will start new aggregation sessions, and the old sessions will be stored. If you want to revert to the original way of grouping, the system will reuse the old stored sessions, but only if they haven’t timed out yet. See the note on TTL above.
Fields | Specify which fields will be used to group records together for aggregation. If two or more records have the same values in the fields you select, they will be grouped into the same session for processing. You must configure at least one field for the Data Aggregator to work properly; if this is left empty, you will receive an error message advising of this. You can either type the field names manually or select them from the drop-down menu. See the grouping and aggregation sketch after this table.
Group based on date/time | Select the checkbox to specify a defined period for aggregation. Note! The supported format is ISO 8601 (extended format). If you need to convert a date format to support the ISO standard, see Script and the conversion sketch after this table. Click + Add period field to add additional period fields.
Field | If you choose to group fields by date/time, specify each field and then select a time duration for each.
Period | Select a time duration from the drop-down menu.
Aggregate fields |
Field | Specify the name of the field(s) on which the aggregation operation will be performed.
Operation | Select the aggregation operation from the drop-down menu; this operation will apply to the chosen field. The available operations are grouped into two categories: Numeric and General. Click + Add Aggregate field to add more fields.
Flush by |
Flush by | Select how and when to flush the aggregated data to the next function in the stream. The options are End of transaction, End of stream, and Timeout; see the Flush by section below.
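To illustrate how grouping and aggregation interact, the following Python sketch groups a few records by two hypothetical group fields (`subscriber_id` and `service`) and computes a per-session sum and record count. The field names and the specific operations are illustrative assumptions; this is not the Data Aggregator's implementation or API, only a picture of the behaviour described in the table above.

```python
from collections import defaultdict

# Hypothetical input records; the field names are illustrative only.
records = [
    {"subscriber_id": "A1", "service": "print", "bytes": 100},
    {"subscriber_id": "A1", "service": "print", "bytes": 250},
    {"subscriber_id": "B2", "service": "scan",  "bytes": 40},
]

GROUP_FIELDS = ("subscriber_id", "service")   # corresponds to "Group fields > Fields"
AGGREGATE_FIELD = "bytes"                      # corresponds to "Aggregate fields > Field"

sessions = defaultdict(lambda: {"sum": 0, "count": 0})

for record in records:
    # Records with the same values in the group fields share one session.
    session_key = tuple(record[field] for field in GROUP_FIELDS)
    sessions[session_key]["sum"] += record[AGGREGATE_FIELD]   # numeric-style operation
    sessions[session_key]["count"] += 1                       # general-style operation

for key, result in sessions.items():
    print(key, result)
# ('A1', 'print') {'sum': 350, 'count': 2}
# ('B2', 'scan') {'sum': 40, 'count': 1}
```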
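If an incoming date field is not already in ISO 8601 extended format, it has to be converted before it can be used for date/time grouping. In the product this would typically be done with the Script function; the snippet below is only a Python sketch of the conversion logic, assuming a hypothetical source format of `DD/MM/YYYY HH:MM:SS`.

```python
from datetime import datetime, timezone

raw_value = "05/03/2024 14:30:00"  # hypothetical non-ISO input value

# Parse the assumed source format, then emit ISO 8601 extended format.
parsed = datetime.strptime(raw_value, "%d/%m/%Y %H:%M:%S").replace(tzinfo=timezone.utc)
iso_value = parsed.isoformat()

print(iso_value)  # 2024-03-05T14:30:00+00:00
```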
Flush by
The Data Aggregator collects and processes data internally until it is "flushed." Flushing means the data is finalized and either saved or sent for further processing. For details, see https://infozone.atlassian.net/wiki/x/y4JLDg. The Data Aggregator configuration provides three options for how and when the data can be flushed.
...
Flush by ‘End of transaction’
This option flushes the aggregated data as soon as a transaction is completed, even if the overall stream is still running, which allows for more frequent, smaller data outputs. If the data comes from multiple files, each file's data is handled as an individual transaction within the stream. After each file's data has been processed and the aggregation logic applied, the results for that particular file are flushed immediately.
Flush by ‘End of stream’
In this case, no data is flushed until the entire stream has finished running. Data continues to be aggregated for the entire duration of the stream, which ensures that the output represents the entire stream's data. The sketch below contrasts this option with End of transaction.
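The following Python sketch treats each input file as one transaction and shows where the flush happens for each option. The file contents and helper function are made up for illustration; this is not the Data Aggregator's actual processing code.

```python
def flush(sessions, label):
    # Hypothetical output step: here we simply print the finalized sessions.
    print(f"flush ({label}):", sessions)

def run_stream(files, flush_by):
    """Sketch of where flushing occurs for each 'Flush by' option (illustrative only)."""
    sessions = {}
    for name, records in files.items():           # each file = one transaction
        for key, value in records:
            sessions[key] = sessions.get(key, 0) + value
        if flush_by == "End of transaction":
            flush(sessions, name)                 # flush after every file/transaction
            sessions = {}
    if flush_by == "End of stream":
        flush(sessions, "whole stream")           # single flush when the stream ends

files = {
    "file1": [("A1", 100), ("A1", 250)],
    "file2": [("A1", 40), ("B2", 75)],
}
run_stream(files, "End of transaction")   # two smaller outputs, one per file
run_stream(files, "End of stream")        # one combined output for the whole stream
```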
...
Note!
The Flush by options, End of transaction and End of stream, do not apply to real-time streams.
Flush by 'Timeout'
In this case, the system flushes the aggregated data after a specific timeout period has elapsed or a condition is met. This can be useful when the data should be output at regular intervals.
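A simple way to picture a timeout-based flush is a periodic check of how long ago the last flush happened. The snippet below is a minimal Python sketch under that assumption, using a hypothetical fixed interval in seconds; the real timeout types (hour, month, timestamp field, and so on) are configured in the function as described below.

```python
import time

TIMEOUT_SECONDS = 60          # hypothetical interval; the real options are set in the configuration
sessions = {"A1": 350}        # aggregated data built up so far
last_flush = time.monotonic()

def maybe_flush():
    """Flush the aggregated sessions if the timeout period has elapsed (illustrative only)."""
    global sessions, last_flush
    if time.monotonic() - last_flush >= TIMEOUT_SECONDS:
        print("flush:", sessions)   # hand the finalized data to the next function
        sessions = {}
        last_flush = time.monotonic()

# A stream engine would call maybe_flush() periodically, for example on every record or tick.
maybe_flush()
```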
...
Info |
---|
Example of a timeout: “Record the sum of sheets of paper used by a subscriber to a printing service on the 10th of every month.” |
Select the Timeout type
This can be one of the following options:
Hour: Select the hour interval after which the data will time out.
...
Info |
---|
Example - Timeout type set to ‘month’ If you set the timeout to happen on the 1st of every month at 23:59 in the "UTC" timezone, the system will flush all aggregated data at exactly that time each month. |
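The arithmetic behind that example can be sketched as follows. The snippet computes the next flush instant for a "day 1, 23:59, UTC" monthly timeout; it illustrates the scheduling rule in the example above and is not the product's scheduler.

```python
from datetime import datetime, timezone

def next_monthly_flush(now):
    """Next occurrence of the 1st of the month at 23:59 UTC (illustrative only)."""
    candidate = now.replace(day=1, hour=23, minute=59, second=0, microsecond=0)
    if candidate <= now:
        # Already past this month's flush time: move to the 1st of the next month.
        year, month = (now.year + 1, 1) if now.month == 12 else (now.year, now.month + 1)
        candidate = candidate.replace(year=year, month=month)
    return candidate

now = datetime(2024, 3, 15, 10, 0, tzinfo=timezone.utc)
print(next_monthly_flush(now))   # 2024-04-01 23:59:00+00:00
```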
...
Info |
---|
Example - Timeout type set to ‘Based on timestamp field’ If your data has a field containing a timestamp (e.g., |
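The example above is truncated, but the general idea of this timeout type is that the flush decision is driven by a timestamp carried in the data rather than by the wall clock. The snippet below is a hedged Python sketch of that idea, assuming a hypothetical `event_time` field and one-hour periods; it is not the product's implementation.

```python
from datetime import datetime

records = [
    {"key": "A1", "event_time": datetime(2024, 3, 5, 9, 10), "bytes": 100},
    {"key": "A1", "event_time": datetime(2024, 3, 5, 9, 40), "bytes": 250},
    {"key": "A1", "event_time": datetime(2024, 3, 5, 10, 5), "bytes": 40},
]

sessions = {}   # key -> {"period": hour bucket taken from the data, "sum": running total}

for record in records:
    period = record["event_time"].replace(minute=0, second=0, microsecond=0)
    session = sessions.get(record["key"])
    if session and session["period"] != period:
        # The timestamp field has moved into a new period: flush the old session.
        print("flush:", record["key"], session)
        session = None
    if session is None:
        session = {"period": period, "sum": 0}
        sessions[record["key"]] = session
    session["sum"] += record["bytes"]

print("remaining:", sessions)
```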
Adding custom conditions
You can add custom conditions when Flush by is set to Timeout by clicking + Add condition.
...