...

The Data Aggregator Function helps you streamline data by combining related records from one or more sources into a single, more useful format. This process allows for easier analysis and better decision-making.

It groups records into sessions based on their field values and configurable conditions, making the data usable for different use cases.

For instance, if you're working with a system that limits the number of usage events it can handle, the Data Aggregator Function can help by performing operations like SUM, COUNT, MAX, MIN, and AVERAGE on ...

For example, you can accumulate usage records over time, per account, subscriber, and service, to calculate the total usage of a service over a month, and then deliver the aggregated data to a billing service or use it as input for a cloud-based storage service.

Configuration

The Data Aggregator Function's configuration contains the following sections:

...

Here you specify the names of all the fields to group records by when performing the aggregation.

...

selected fields

...

Example:

...

Record2 has the user field set to 2.

Record3 has the user field set to 1.

So Record1 and Record3 are grouped in the same aggregated data session, and the aggregated value is updated to 2 (in the case of the SUM operation), while Record2 is placed in a separate session.
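The grouping rule in this example can be sketched in Python. This is a minimal illustration, not the Function's implementation: it assumes records are plain dictionaries, that Record1 has user set to 1 (which its grouping with Record3 implies), and a hypothetical `usage` field of 1 per record that the SUM operation adds up.

```python
from collections import defaultdict


def aggregate_sum(records, group_fields, sum_field):
    """Group records by their group-field values and SUM one field per session."""
    sessions = defaultdict(int)
    for record in records:
        # Records with identical values in every group field share a session key.
        key = tuple(record[field] for field in group_fields)
        sessions[key] += record[sum_field]
    return dict(sessions)


# Hypothetical records; "usage" is an illustrative field, not a product field.
records = [
    {"id": "Record1", "user": 1, "usage": 1},
    {"id": "Record2", "user": 2, "usage": 1},
    {"id": "Record3", "user": 1, "usage": 1},
]

sessions = aggregate_sum(records, ["user"], "usage")
print(sessions)  # {(1,): 2, (2,): 1}
```

Adding another name to `group_fields` makes the session key finer, so fewer records share a session.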

...

Select the checkbox to enable aggregation based on a time period, and specify the fields containing timestamp data to use for grouping the aggregated data.

Select the Period to group the information by:

  • Hour
  • Day
  • Month
  • Year
Info: Time Period Format

The supported time period format is ISO 8601 (extended format) in accordance with the YYYY-MM-DDTHH:mm:ss.sssZ pattern.

If you need to convert a date to the ISO 8601 format, see Script.

An example value is "2019-01-13T00:00:00.000Z".

Click the add icon to add more grouping criteria based on the time period.
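Grouping by a time period amounts to truncating each timestamp to the start of its period and using the result as part of the session key. A minimal Python sketch, assuming timestamps follow the YYYY-MM-DDTHH:mm:ss.sssZ pattern described above; `period_key` and `PERIOD_RESETS` are hypothetical helpers, not part of the product.

```python
from datetime import datetime

# ISO 8601 extended format: YYYY-MM-DDTHH:mm:ss.sssZ
ISO_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"

# Fields reset for each period: everything finer than the period is zeroed.
PERIOD_RESETS = {
    "Hour": {"minute": 0, "second": 0, "microsecond": 0},
    "Day": {"hour": 0, "minute": 0, "second": 0, "microsecond": 0},
    "Month": {"day": 1, "hour": 0, "minute": 0, "second": 0, "microsecond": 0},
    "Year": {"month": 1, "day": 1, "hour": 0, "minute": 0, "second": 0, "microsecond": 0},
}


def period_key(timestamp, period):
    """Truncate an ISO 8601 UTC timestamp to the start of its Hour/Day/Month/Year."""
    parsed = datetime.strptime(timestamp, ISO_FORMAT)
    return parsed.replace(**PERIOD_RESETS[period])


print(period_key("2019-01-13T15:42:07.123Z", "Day"))    # 2019-01-13 00:00:00
print(period_key("2019-01-13T15:42:07.123Z", "Month"))  # 2019-01-01 00:00:00
```

Two records whose timestamps fall in the same period produce the same key and therefore land in the same session.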

Note: Configuration

Any change in the Group Fields configuration may result in a new aggregation session; however, the old sessions remain in storage. So if you change the configuration back, the existing session in storage (if any) is used during the aggregation. Refer to the TTL note for more information on how long an aggregation session is stored.

...

Here you specify the name of the field(s) on which the aggregation operation will be performed:

...

The Operation drop-down menu is divided into three types: 

  • Numeric:
    • SUM
    • MAX
    • MIN
    • AVERAGE
  • General:
    • COUNT
    • CARRY_FIRST
    • CARRY_LAST
  • Date:
    • MAX
    • MIN
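The operations listed above can be sketched as a single Python function. This is an illustration under stated assumptions, not the product's implementation: it assumes the values for a session are held in arrival order, that CARRY_FIRST and CARRY_LAST keep the first and last value received, and that Date values are ISO 8601 UTC strings. `apply_operation` is a hypothetical name.

```python
from datetime import datetime

ISO_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"


def _parse(value):
    """Parse an ISO 8601 UTC timestamp string."""
    return datetime.strptime(value, ISO_FORMAT)


def apply_operation(operation, values, kind="numeric"):
    """Aggregate `values` (in arrival order) with one of the listed operations."""
    # General operations work on any field type.
    if operation == "COUNT":
        return len(values)
    if operation == "CARRY_FIRST":
        return values[0]
    if operation == "CARRY_LAST":
        return values[-1]
    # Date operations compare parsed timestamps but return the original strings.
    if kind == "date":
        if operation == "MAX":
            return max(values, key=_parse)
        if operation == "MIN":
            return min(values, key=_parse)
        raise ValueError(f"{operation} is not a Date operation")
    # Numeric operations.
    if operation == "SUM":
        return sum(values)
    if operation == "MAX":
        return max(values)
    if operation == "MIN":
        return min(values)
    if operation == "AVERAGE":
        return sum(values) / len(values)
    raise ValueError(f"unknown operation: {operation}")


prices = [10, 20, 12]
print(apply_operation("SUM", prices))         # 42
print(apply_operation("AVERAGE", prices))     # 14.0
print(apply_operation("CARRY_LAST", prices))  # 12
```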

Click the add icon to add more fields for aggregation.

...

Here you select how and when you want to flush the aggregated data to the next Function in the stream.

...

Aggregated data is flushed at the end of each transaction.

Info

Use case: aggregating records from a single file or data set during the stream execution.

This option is not applicable to real-time streams.

...

Aggregated data is flushed at the end of the stream.

Info: Example

Use case: aggregating data sets from multiple files during a stream execution.

This option is not applicable to real-time streams.
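The two batch flush options differ only in when sessions are emitted. The Python sketch below assumes that the first option corresponds to the EACH_FILE flush type and the second to ALL_FILES (the values listed in the Metadata section), that records are counted per user, and that sessions start fresh after each flush. `run_batch` is a hypothetical name.

```python
from collections import defaultdict


def run_batch(files, flush_type):
    """Count records per user and return the flushed sessions in order.

    files: a list of files, each a list of record dicts with a "user" field.
    flush_type: "EACH_FILE" or "ALL_FILES" (assumed mapping, see lead-in).
    """
    flushes = []
    sessions = defaultdict(int)
    for records in files:
        for record in records:
            sessions[record["user"]] += 1  # COUNT per user
        if flush_type == "EACH_FILE":
            # Flush at the end of every file, then start fresh sessions.
            flushes.append(dict(sessions))
            sessions.clear()
    if flush_type == "ALL_FILES" and sessions:
        # Flush once, after the last file of the stream execution.
        flushes.append(dict(sessions))
    return flushes


files = [[{"user": 1}, {"user": 2}], [{"user": 1}]]
print(run_batch(files, "EACH_FILE"))  # [{1: 1, 2: 1}, {1: 1}]
print(run_batch(files, "ALL_FILES"))  # [{1: 2, 2: 1}]
```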

...

For batch streams, timed-out aggregated data is flushed only when the stream is executed. For real-time streams, the aggregated data is checked every 60 seconds and flushed if it has timed out.

Info: Example

Example of a timeout: record a subscriber's mobile data consumption up to the 10th of every month.

Example of a condition timeout: record mobile data consumption on the 10th of every month, or when the 100 GB limit is reached. In this case, the timeout happens when either of these two conditions is met.
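The 60-second check for real-time streams can be pictured as a periodic sweep over stored sessions. A minimal Python sketch with a simulated clock (no real timers), assuming each session already stores its computed timeout; `flush_timed_out` and the sample sessions and date are hypothetical.

```python
from datetime import datetime, timedelta

CHECK_INTERVAL = timedelta(seconds=60)  # real-time check interval from the text


def flush_timed_out(sessions, now):
    """Remove and return every session whose timeout is at or before `now`."""
    expired = {key: s for key, s in sessions.items() if s["timeout"] <= now}
    for key in expired:
        del sessions[key]
    return expired


sessions = {
    "X": {"sum": 22, "timeout": datetime(2022, 8, 3, 14, 0)},
    "Y": {"sum": 20, "timeout": datetime(2022, 8, 3, 14, 7)},
}

# Simulate checks every 60 seconds, starting at 14:00:30.
flush_times = {}
check = datetime(2022, 8, 3, 14, 0, 30)
while sessions:
    for key in flush_timed_out(sessions, check):
        flush_times[key] = check
    check += CHECK_INTERVAL

print(flush_times["X"].time(), flush_times["Y"].time())  # 14:00:30 14:07:30
```

Because the sweep runs on an interval, a session can be flushed up to about 60 seconds after its timeout.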

Select the type of timeout based on:

...

Hours: Aggregated data is timed out at the specified interval after creation.

...

Example:

...

  • At 13.00, Account 'X' with price 10 is aggregated; the timeout is set to 14.00.
  • At 13.07, Account 'Y' with price 20 is aggregated; the timeout is set to 14.07.
  • At 13.33, Account 'X' with price 12 is aggregated; the sum is updated to 22, and the timeout remains 14.00.
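The Hours example can be reproduced with a short Python sketch. It assumes a 1-hour interval (implied by 13.00 leading to 14.00) and a hypothetical date; `aggregate` is an illustrative helper, not a product API. The key point is that only session creation sets the timeout.

```python
from datetime import datetime, timedelta

INTERVAL = timedelta(hours=1)  # assumed: the example implies a 1-hour interval


def aggregate(sessions, account, price, at):
    """Add a record to its account session; only a new session gets a timeout."""
    session = sessions.get(account)
    if session is None:
        # The timeout is fixed relative to session creation.
        sessions[account] = {"sum": price, "timeout": at + INTERVAL}
    else:
        # Later records update the sum but do not move the timeout.
        session["sum"] += price
    return sessions[account]


day = datetime(2022, 8, 3)  # hypothetical date
sessions = {}
aggregate(sessions, "X", 10, day.replace(hour=13, minute=0))
aggregate(sessions, "Y", 20, day.replace(hour=13, minute=7))
aggregate(sessions, "X", 12, day.replace(hour=13, minute=33))
for account, s in sessions.items():
    print(account, s["sum"], s["timeout"].strftime("%H.%M"))
# X 22 14.00
# Y 20 14.07
```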

...

Day: Aggregated data is timed out on a daily basis. When you specify the frequency based on Day, you must set both the Time and the Timezone. The timeout is an absolute value: you set the time of day at which the timeout happens.

Info: Example

For example, if you set the timeout to Day and the time to 12.00 PM, all aggregation sessions are timed out at 12.00 PM, regardless of when they were created.
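A Day timeout can be sketched as "the next occurrence of the configured time of day in the configured timezone". This Python sketch uses a fixed UTC offset instead of a named timezone, to stay standard-library-only and avoid daylight-saving rules, and it assumes that a session created at or after the configured time waits until the next day. `next_daily_timeout` is a hypothetical helper.

```python
from datetime import datetime, time, timedelta, timezone


def next_daily_timeout(created, at, tz):
    """Return the first occurrence of time-of-day `at` in `tz` after `created`."""
    local = created.astimezone(tz)
    candidate = datetime.combine(local.date(), at, tzinfo=tz)
    if candidate <= local:
        # Assumption: a session created at or after the daily time waits a day.
        candidate += timedelta(days=1)
    return candidate


tz = timezone(timedelta(hours=2))  # fixed UTC+02:00 offset for illustration
noon = time(12, 0)
early = datetime(2022, 8, 3, 6, 15, tzinfo=timezone.utc)  # 08:15 local
late = datetime(2022, 8, 3, 9, 50, tzinfo=timezone.utc)   # 11:50 local
print(next_daily_timeout(early, noon, tz))  # 2022-08-03 12:00:00+02:00
print(next_daily_timeout(late, noon, tz))   # 2022-08-03 12:00:00+02:00
```

Both sessions time out at the same absolute moment even though they were created hours apart.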

...

  • Based on timestamp field: The timeout is defined by the input timestamp. Only UTC is supported.

    Info: Date Format

    The supported date format is ISO 8601 (extended format) in accordance with the YYYY-MM-DDTHH:mm:ss.sss pattern.

    An example value is "2019-01-13T00:00:00.000".

    For example, a record that matches the aggregation criteria with the timestamp field set to 2022-08-03T13:00:00.000 will have its timeout set to that date and time. The timeout is updated when new matching records arrive with a different timestamp.
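A timestamp-based timeout can be sketched as parsing the record's timestamp field and storing it on the session. This Python sketch assumes the most recently received matching record sets the timeout (the text only says the timeout is updated); `update_timeout` and the `eventTime` field are hypothetical.

```python
from datetime import datetime

# Pattern from the Date Format note: YYYY-MM-DDTHH:mm:ss.sss (UTC, no suffix).
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S.%f"


def update_timeout(session, record, timestamp_field):
    """Set the session timeout from the record's timestamp field."""
    # Assumption: the most recently received matching record wins.
    session["timeout"] = datetime.strptime(record[timestamp_field], TIMESTAMP_FORMAT)
    return session


session = {}
update_timeout(session, {"eventTime": "2022-08-03T13:00:00.000"}, "eventTime")
print(session["timeout"])  # 2022-08-03 13:00:00
update_timeout(session, {"eventTime": "2022-08-05T09:30:00.000"}, "eventTime")
print(session["timeout"])  # 2022-08-05 09:30:00
```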

...

Add Condition

...

You can also set a condition for timing out the aggregated data, for example, a timeout that happens when the value of the SUM field is greater than 70.

Note: This option is only available with the Timeout flush setting.

...

Based on: Select the input fields or aggregated fields that you want to apply the condition to. Input Fields lists all the input fields configured in the stream, and Aggregated Fields lists the fields you selected for aggregation (in the Aggregation tab).

Note: Conditions on fields aggregated with the CARRY_FIRST or CARRY_LAST operations are not supported. If you aggregate a field using either of these General operations, it does not appear under Aggregated Fields.

...

The following table explains which options are available when you select a specific criterion for a Flush condition:

...

Select from the following values:

  • Numerical
  • Text Values (string)
  • Boolean

...

Options vary according to the Type of field.

  • Numerical: Mathematical operation. You can choose from the following:
    Less than, Greater than, Less or equal to, Greater or equal to, Equal, Is different from
  • Text: Matches/Not Matches. Mathematical operations cannot be performed on strings.
  • Boolean: Yes (True) and No (False)

...

If you select:

  • Numeric Values: Type the value, or use the up and down arrow keys to increment or decrement it.
  • Text Values (string): Enter the string value.
  • Boolean: Select Yes/No 
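The condition options described above can be sketched as a small evaluator. The type and operator names come from the text; everything else is an assumption: "Matches" is treated as exact string equality (the product may use pattern matching), the text does not describe how several conditions combine, so the sketch evaluates a single condition, and the condition is OR-ed with the timeout, following the condition timeout example. `condition_met` and `should_flush` are hypothetical names.

```python
import operator

# Numerical operators offered in the condition editor.
NUMERIC_OPERATORS = {
    "Less than": operator.lt,
    "Greater than": operator.gt,
    "Less or equal to": operator.le,
    "Greater or equal to": operator.ge,
    "Equal": operator.eq,
    "Is different from": operator.ne,
}


def condition_met(field_type, op, actual, expected):
    """Evaluate one flush condition against a field value."""
    if field_type == "Numerical":
        return NUMERIC_OPERATORS[op](actual, expected)
    if field_type == "Text":
        matched = actual == expected  # assumption: "Matches" means exact equality
        if op == "Matches":
            return matched
        if op == "Not Matches":
            return not matched
        raise ValueError(f"unknown text operator: {op}")
    if field_type == "Boolean":
        return actual == expected  # compare against the selected Yes/No value
    raise ValueError(f"unknown field type: {field_type}")


def should_flush(timed_out, condition):
    """Flush when the timeout is reached or the condition holds, whichever is first."""
    return timed_out or condition


# SUM greater than 70 while the timeout has not been reached yet.
print(should_flush(False, condition_met("Numerical", "Greater than", 75, 70)))  # True
```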

...

Click +Add Condition to add more conditions.

...

Note: TTL

An aggregation session is stored for up to 180 days after its last update: if a session is not updated for 180 days, all of its stored data is permanently deleted.
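The TTL rule can be sketched as a purge keyed on each session's last update time. A minimal Python sketch, assuming sessions record a `last_updated` timestamp and that a session exactly 180 days old is already expired (the boundary is not specified); `purge_stale_sessions` is a hypothetical helper.

```python
from datetime import datetime, timedelta

TTL = timedelta(days=180)


def purge_stale_sessions(sessions, now):
    """Delete sessions not updated for TTL or longer; return the deleted keys."""
    stale = [key for key, s in sessions.items() if now - s["last_updated"] >= TTL]
    for key in stale:
        del sessions[key]
    return stale


now = datetime(2022, 8, 3)  # hypothetical "current" time
sessions = {
    "old": {"last_updated": now - timedelta(days=181)},
    "recent": {"last_updated": now - timedelta(days=10)},
}
print(purge_stale_sessions(sessions, now))  # ['old']
print(list(sessions))                       # ['recent']
```

Any update resets `last_updated`, so an active session is never purged regardless of when it was created.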


Metadata

You can view and access the following metadata properties of an aggregated session. To view the metadata, use the meta object as described in Script Function. Here is an example:

...

Example:
{"origin":"Data_Aggregator","count":7,"flushType":"TIMEOUT","firstEvent":"2022-04-08T17:35:53.239Z","lastEvent":"2022-04-08T17:38:17.315Z","lastCall":false}
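The example metadata object can be consumed like any JSON document. This Python sketch is language-neutral illustration only, not Script Function code (the Script Function's own language and API are not shown here); it parses the example and derives how long the session spanned from its first to its last event.

```python
import json
from datetime import datetime

ISO_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"

# The example metadata object, as a JSON string.
raw = ('{"origin":"Data_Aggregator","count":7,"flushType":"TIMEOUT",'
       '"firstEvent":"2022-04-08T17:35:53.239Z",'
       '"lastEvent":"2022-04-08T17:38:17.315Z","lastCall":false}')
meta = json.loads(raw)

first = datetime.strptime(meta["firstEvent"], ISO_FORMAT)
last = datetime.strptime(meta["lastEvent"], ISO_FORMAT)
span = last - first

print(meta["count"], meta["flushType"])  # 7 TIMEOUT
print(span.total_seconds())              # 144.076
```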

...

count

...

Number of aggregated records

...

The reason the session was flushed. Shows one of the following values: ALL_FILES, EACH_FILE, TIMEOUT, or CONDITION. During preview, the value is empty.

...

Date and time of the last aggregated record in the session

...

Note:

Licensing Information

...

After setting a timeout, you can add custom conditions to it that specify when the data should be output.

A typical use case would be accumulating usage records per account, subscriber, or service over time. You can then calculate total monthly usage and send that aggregated data to a billing system or cloud-based storage, making your data management much more efficient.

Subsections

This section contains the following subsections: