The Data Aggregator Function consolidates related records from one or more sources into a single record, transforming them into a format that is more useful for analysis. Related records are grouped into sessions according to the values of their respective fields and a set of configurable conditions.
The Data Aggregator Function can be used in a stream when a billing system has limitations on the number of usage events it can process. It performs aggregation operations such as SUM, COUNT, MAX, MIN, and AVERAGE on any configured field(s). It can also set conditions on those fields when flushing data.
The Data Aggregator Function accumulates usage records over time, for example per account, subscriber, and product, to calculate the total usage of a product over a month. It then delivers the aggregated data to an external system, such as a billing or storage system.
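The grouping and aggregation described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the Function's implementation; the field names (`account`, `subscriber`, `product`, `usage`) and the `aggregate` helper are assumptions made for the example.

```python
from collections import defaultdict


def aggregate(records, group_fields, value_field):
    """Group records by the values of group_fields and aggregate value_field."""
    sessions = defaultdict(list)
    for record in records:
        # Records with identical values in the group fields share a session.
        key = tuple(record[field] for field in group_fields)
        sessions[key].append(record[value_field])
    return {
        key: {
            "SUM": sum(values),
            "COUNT": len(values),
            "MAX": max(values),
            "MIN": min(values),
            "AVERAGE": sum(values) / len(values),
        }
        for key, values in sessions.items()
    }


# Hypothetical usage records.
records = [
    {"account": "A1", "subscriber": "S1", "product": "data", "usage": 10},
    {"account": "A1", "subscriber": "S1", "product": "data", "usage": 30},
    {"account": "A2", "subscriber": "S2", "product": "data", "usage": 5},
]
result = aggregate(records, ["account", "subscriber", "product"], "usage")
```

Here the first two records fall into the same session because they share the same account, subscriber, and product, while the third record starts a session of its own.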
Configuration
The Configuration for the Data Aggregator Function contains the following sections:
Group Fields
Aggregate Fields
Flush by
Group Fields
In this section, specify the names of all the fields you want to group when performing the aggregation.
Configuration | Description |
---|---|
Fields | Specify the fields (keys) to be used for grouping aggregated data. If a record has the same values as another record in the selected fields, the two records are grouped in the same session. Example: Three records with the fields key and user are read in a stream, and you decide to group by user. You can either type the name of the field(s) or select the field(s) from the drop-down menu that appears when you click on the field. In the Fields drop-down menu, you also have the option to Select all or Deselect all fields. |
Group based on date/time | Select the checkbox to enable aggregation based on a time period. Specify applicable fields, containing timestamp data, to be used for grouping of aggregation data. Select the Period to group the information by:
Time Period Format: The supported time period format is ISO 8601 (extended format), following the YYYY-MM-DDTHH:mm:ss.sssZ pattern, for example "2019-01-13T00:00:00.000Z". If you need to convert a date to the ISO format, see Script. Click to add additional grouping criteria based on the time period. |
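As a rough sketch of how a timestamp field could contribute to a session key, the Python example below truncates an ISO 8601 timestamp to a period and appends it to the group field values. The period names (`hour`, `day`, `month`), the field names, and both helper functions are assumptions for illustration; the Period options offered in the product may differ.

```python
from datetime import datetime, timezone

# Hypothetical period names mapped to the precision kept in the session key.
PERIOD_FORMATS = {"hour": "%Y-%m-%dT%H", "day": "%Y-%m-%d", "month": "%Y-%m"}


def period_bucket(timestamp: str, period: str) -> str:
    """Truncate an ISO 8601 timestamp such as 2019-01-13T00:00:00.000Z."""
    # Replace the trailing "Z" so that older Python versions can parse it.
    parsed = datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
    return parsed.astimezone(timezone.utc).strftime(PERIOD_FORMATS[period])


def session_key(record, group_fields, time_field, period):
    """Combine the group field values with the time period bucket."""
    values = tuple(record[field] for field in group_fields)
    return values + (period_bucket(record[time_field], period),)


# Hypothetical record with a user field and a timestamp field.
record = {"user": "alice", "timestamp": "2019-01-13T00:00:00.000Z"}
key = session_key(record, ["user"], "timestamp", "month")
```

With monthly grouping, every record for the same user in January 2019 would map to the same key and therefore to the same session.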
Configuration
Any change in the Group Fields configuration may result in a new aggregation session. However, the old sessions remain in storage, so if you change the configuration back to the previous one, any existing session in storage is used during the aggregation. Refer to the TTL note for more information on the TTL of an aggregation session.
Aggregate Fields
Here you specify the name of the field(s) on which the aggregation operation will be performed:
Field | Operation |
---|---|
Name of the input field key | The Operation drop-down menu is divided into three types:
|
Click to add more fields for aggregation.
Flush by
Here you select how and when you want to flush the aggregated data.
In Usage Engine, you can flush using any of the following options:
Flush by | Description |
---|---|
End of transaction | Aggregated data is flushed at the end of each transaction. Example: A use case where records from a single file or data set are being aggregated during the stream execution. This option is not applicable to real-time streams. |
End of stream | Aggregated data is flushed at the end of the stream. Example: A use case where data sets from multiple files are being aggregated during a stream execution. This option is not applicable to real-time streams. |
Timeout | Aggregated data is timed out when a predefined interval has passed or a condition is met. For batch streams, timed-out aggregated data is flushed only when a stream is executed. For real-time streams, the aggregated data is checked every 60 seconds and flushed if it has timed out. Example of a timeout: Record a subscriber's mobile data consumption by the 10th of every month. Example of a condition timeout: Record the mobile data consumption on the 10th of every month or when the 100 GB limit is reached. In this case, the timeout happens if either of the two conditions is met. Select the type of timeout based on:
Click if you wish to add custom conditions to timeout the aggregated data. This option is available only when the Timeout flush by option is selected. |
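The condition timeout example above (flush on the 10th of the month or when 100 GB is reached) can be sketched as a simple check. The `should_flush` function, its parameters, and the order in which the two checks run are assumptions for illustration; only the TIMEOUT and CONDITION labels come from the flushType values listed under Metadata.

```python
from datetime import datetime, timezone

LIMIT_GB = 100  # Condition threshold taken from the example in the text.


def should_flush(now, deadline, total_gb, limit_gb=LIMIT_GB):
    """Return the flush reason, or None if the session should stay open."""
    # Assumption: the custom condition is evaluated before the time-based check.
    if total_gb >= limit_gb:
        return "CONDITION"
    if now >= deadline:
        return "TIMEOUT"
    return None


# Hypothetical session that times out on the 10th of the month.
deadline = datetime(2022, 4, 10, tzinfo=timezone.utc)
before = datetime(2022, 4, 8, tzinfo=timezone.utc)

early_under_limit = should_flush(before, deadline, total_gb=40)
early_over_limit = should_flush(before, deadline, total_gb=100)
on_deadline = should_flush(deadline, deadline, total_gb=40)
```

In a real-time stream, a check of this kind would run on the 60-second interval described above; in a batch stream it would run only when the stream is executed.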
TTL
The aggregated session is stored for a maximum of 180 days. That means if a session is not updated for 180 days, all the stored data pertaining to that session will be deleted permanently.
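A minimal sketch of the TTL rule, assuming that the 180 days are counted from the last update of the session and that a session expires once exactly 180 days have passed. The `is_expired` helper and the boundary behavior are assumptions, not documented behavior.

```python
from datetime import datetime, timedelta, timezone

SESSION_TTL = timedelta(days=180)


def is_expired(last_updated, now):
    """Return True once a session has gone 180 days without an update."""
    # Assumption: the boundary is inclusive, so exactly 180 days counts as expired.
    return now - last_updated >= SESSION_TTL


# Hypothetical session last updated on 1 January 2022.
last_updated = datetime(2022, 1, 1, tzinfo=timezone.utc)
still_stored = is_expired(last_updated, last_updated + timedelta(days=179))
deleted = is_expired(last_updated, last_updated + timedelta(days=180))
```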
Metadata
You can view and access the following metadata properties of an aggregated session. To view the metadata, use the meta object as described in the Script Function. Here is an example:
Example
{"origin":"Data_Aggregator","count":7,"flushType":"TIMEOUT","firstEvent":"2022-04-08T17:35:53.239Z","lastEvent":"2022-04-08T17:38:17.315Z","lastCall":false}
Property name | Description |
---|---|
count | Number of aggregated records |
flushType | The reason the session was flushed. It takes one of the following values: ALL_FILES, EACH_FILE, TIMEOUT, or CONDITION. During preview, the value is empty. |
firstEvent | Date and time of the first aggregated record in the session |
lastEvent | Date and time of the last aggregated record in the session |
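To illustrate how these properties might be consumed downstream, the Python sketch below parses the sample metadata shown above and derives the time span covered by the session. It works on the JSON text only and does not represent the meta object API of the Script Function; the `parse_event` helper and the variable names are assumptions.

```python
import json
from datetime import datetime

# Sample metadata copied from the example above.
raw = (
    '{"origin":"Data_Aggregator","count":7,"flushType":"TIMEOUT",'
    '"firstEvent":"2022-04-08T17:35:53.239Z",'
    '"lastEvent":"2022-04-08T17:38:17.315Z","lastCall":false}'
)
meta = json.loads(raw)


def parse_event(timestamp):
    """Parse an ISO 8601 timestamp that ends with the Z (UTC) designator."""
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


# Time between the first and the last aggregated record in the session.
session_span = parse_event(meta["lastEvent"]) - parse_event(meta["firstEvent"])
span_seconds = session_span.total_seconds()
```

For the sample above, the session holds 7 records and was flushed because of a timeout.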