Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

Configuration field

Description

Group fields

Note!

Changing how you group data (updating the Group fields) will start new aggregation sessions and the the old sessions will be stored. If you want to revert to the original way of grouping, the system will reuse the old stored sessions, but only if they haven’t timed out yet. See the note on TTL above. Changing the Group Fields configuration can create a new aggregation session, while previous sessions will remain stored. If you revert to the original configuration, any existing sessions in storage will only be used for aggregation if the TTL (Time to Live) has not expired.

Fields

Specify which fields will be used to group records together for aggregation. If two or more records have the same values in the fields you select, they will be grouped into the same session for processing. You must configure at least one field for the data aggregator to work properly. If left empty, you will receive an error message advising of this.

Info

Example - Grouping user records by the user field

Let’s say you have three records, and you decide to group them by the user field:

  • Record 1 has the user field set to A

  • Record 2 has the user field set to B

  • Record 3 has the user field set to A

In this case, Record 1 and Record 3 will be grouped in a session because they have the same user value (A), while Record 2 will be in a separate group session with its own user value (B). If you’re using a COUNT operation, for example, the total for the group with user A will be 2 and the total for the group with user B will be 1

You can either type the field names manually or select them from the drop-down menu.

Group based on date/time

Select the checkbox to specify a defined period for aggregation.

Note!

The supported format is ISO 8601 (extended format), following the pattern YYYY-MM-DDTHH:mm:ss.sssZ.

  • YYYY-MM-DD represents the date (year, month, day).

  • THH:mm:ss indicates the time (hours, minutes, seconds).

  • .sss denotes milliseconds.

  • Z signifies the time is in UTC (Coordinated Universal Time).

For example: "2019-01-13T00:00:00.000Z".

If you need to convert a date format to support the ISO standard, see Script.

Click + Add period field to add additional period fields

Field

If you select to group the fields by date/ time you will specify each field and then select a time duration for each.

Period

Select a time duration from the dropdown:

  • Hour

  • Day

  • Month

  • Year

Aggregate fields

Field

Specify the name of the field(s) on which the aggregation operation will be performed:

Operation

Select the aggregation operation from the drop-down menu, this operation will apply to the chosen field. The available operations are grouped into two categories: Numeric, and General. Click + Add Aggregate field to add more fields.

Numeric:

  • SUM: Adds up all the numeric values.

  • MAX: Returns the highest numeric value.

  • MIN: Returns the lowest numeric value.

  • AVERAGE: Calculates the mean of the numeric values.

General:

  • COUNT: Counts the number of records.

  • CARRY_FIRST: Uses the first value encountered.

  • CARRY_LAST: Uses the last value encountered.

Flush by

Flush by

Select how and when to flush the aggregated data to the next function in the stream. The options are:

  • End of transaction

  • End of stream

  • Timeout


See below for more information on these options and how Flush by works.


Flush by

The data Aggregator collects and processes data internally until it is "flushed." Flushing means the data is finalized and either saved or sent for further processing. For details see, https://infozone.atlassian.net/wiki/x/y4JLDg. In the data aggregator configuration are three options for how and when the data can be flushed. 

...

Flush by ‘End of transaction’

This option flushes the aggregated data once a transaction is completed, even if the overall stream is still running, and it allows for more frequent, smaller data outputs. If the data is coming from multiple files each file's data is handled individually as an individual transaction within the stream. After processing each file's data and applying the aggregation logic, the results of that particular file are "flushed" immediately. the aggregated results records from a single file are aggregated during a stream execution.

Flush by ‘End of stream’

In this case, no data is flushed until the entire stream has finished running. It continues to be aggregated throughout the entire duration of a stream. This will ensure that the output represents the entire stream's data. This option may be used when there are data sets from multiple files within a stream.

Info

Example - telecom billing system.

You have four CSV files with data from different regions (North, South, East and West), each file contains 10,000 call records from the last hour. The data must be aggregated before it’s sent to the billing system. If you set the ‘flush by’ to…

  • Flush by End of transaction - When the system finishes processing the first file (e.g., North region), the aggregated results (sessions with total calls, minutes, etc.) etc… are flushed immediately downstream for that file. The same behavior would occur for the other regions' files.

  • Flush by end of stream - The system would process all 40,000 records in the sessions before flushing the aggregated results downstream.

...

Note!
The Flush by options, End of transaction and End of stream, do not apply to real-time streams.

Flush by 'Timeout'

In this case, the system flushes the aggregated data after a specific timeout period has elapsed or a condition is met. This can be useful when the data should be output at regular intervals.

In batch streams, the timeout is passive and waits for the next stream execution to flush data.

In real-time streams, the timeout is actively monitored, and the system automatically flushes the data every 60 seconds if the timeout condition is methas arrived

Info

Example of a timeout vs a condition timeout

Timeout: “Record the mobile data usage of sum of sheets of paper used by a subscriber to a printing service on the 10th of every month.”
In this case, the data is flushed strictly based on time—on the 10th of each month.Condition Timeout: “Record the mobile data usage of a subscriber on the 10th of every month or if the 100 GB limit is reached.”
Here, the data will be flushed either when the 100 GB usage is reached or on the 10th of the month, whichever happens first.

Select the Timeout type

This can be one of the following options:

Hour: Select the hour interval after which the data will timeout.

Info

Example- Timeout type with 1-hour intervals

If you aggregate price per account and set timeout to 1 hour, then you will have the following behavior: 

  • At 13:00, Account 'X' might be charged $10 for a specific transaction or service usage.

  • At 13:33, Account 'X' uses another service or makes another transaction, leading to a new charge of $12. By the end of the aggregation window timeout (set to 14:00), the total sum for Account 'X' would be $22 (10 + 12).

For Account 'Y':

  • At 13:07, Account 'Y' is charged $20 for a service. Since Account 'Y' has its own aggregation window starting at this time, its timeout will be set to 14:07.

  • No further records come in for Account 'Y' during this window, so at 14:07 when the timeout occurs, the total for Account 'Y' remains $20.

...

Info

Example - Timeout type set to ‘day’

if If you set the timeout to happen everyday at 23:59 midnight in the "UTC" timezone, all data will be flushed at that time, every day, according to UTC.

This means the timeout will always occur at the same time each day, regardless of when data was first aggregated.

...

Info

Example- Timeout type set to ‘month’

For example, if If you set the timeout to happen on the 1st of every month at 23:59 in the "UTC" timezone, the system will flush all aggregated data at that exact time, on the 1st of every month.

...

Info

Example - Timeout type set to ‘Based on timeout’timestamp field’

If your data has a field containing a timestamp (e.g., "event_time": "2024-09-18T15:00:00"), You can select to flush the data "Based on timestamp field". The system will flush the data according to this timestamp.

W

Adding custom conditions

You can add custom conditions when the flush by is set to timeout by clicking + Add condition.

Info

Example - Custom condition on Flush by Timeout

Add a custom condition for a timeout to occur when the value of the SUM field for paper sheets is more than 70Record the sum of sheets of paper used by a subscriber to a printing service on the 10th of every month or if the 70 page limit is reached.”

Here, the data will be flushed either when the 70 page limit is reached or on the 10th of the month, whichever happens first.

This is what the custom condition would look like in the Data aggregator configuration.

flush_by_add_condition.png

OR condition configuration

Description

Condition name

Assign a name to the condition.

Based on

Select which input field or aggregated field you want to apply the condition on. The Input Fields show all the input fields configured in the stream and Aggregated Fields show the fields that you have selected to perform aggregation on. In this example, the only aggregated field is the total SUM of ‘sheets’.

custom_condition_fileds.png

Type of field

Select the type of field. Your selection will determine the configuration options that follow this.

  • Numerical values - choose an Operator from the list provided then add the Value as a number

  • Text values (string)

  • Boolean

Note

Type of field is only available if an Input field was selected in Based on.

Operator

Use this field to choose how you want to compare the selected Field with a specific value.

Note

Operator is not available if you have selected Boolean in Type of field

Value

The options for this field will change depending on the type of field selected:

  • Boolean - choose either True or False from the drop-down list.

  • Numerical or Text values - type the value in

...