9.3.4 Aggregation Profile
You can apply an Aggregation profile to any number of workflow configurations. Aggregation sessions created in the storage that is specified by the profile can be accessed by multiple active workflows simultaneously.
When you have selected file storage and are using an Aggregation profile across several workflow configurations, you must consider the read-and-write lock mechanisms that are applied to the stored sessions. For further information about read-and-write locks, see 9.3.5.1 Aggregation Agent Configuration - Batch and 9.3.5.3 Aggregation Agent Configuration - Real-Time.
The Aggregation profile is loaded when you start a workflow that depends on it. Changes to the profile become effective when you restart the workflow.
Configuration
To open the configuration, click the New Configuration button in the upper left part of the Desktop, and then select Aggregation Profile from the menu.
The contents of the menus in the menu bar may change depending on which configuration type has been opened in the currently active tab. The Aggregation profile uses the standard menu items and buttons that are visible for all configurations, and these are described in 2.1 Menus and Buttons.
The Edit menu is specific to Aggregation profile configurations.
The Edit Menu
Item | Description |
---|---|
External References | Select this menu item to Enable External References in an agent profile field. Refer to Enabling External References in an Agent Profile Field in 8.10 External Reference Profile for further information. |
Session Tab
In the Session tab you can browse and select a Session UDR Type and configure the Storage selection settings.
The Aggregation profile configuration dialog - Session tab
Setting | Description |
---|---|
Session UDR Type | Click on the Browse... button and select the Session UDR Type, defined in Ultra, that you want to use. For further information, see 9.3.2 Session UDR Type. |
Storage | Select the type of storage for aggregation sessions. The available settings are File Storage, Couchbase, Redis and Elasticsearch. File Storage can be used in batch and real-time workflows. Elasticsearch can only be used in batch workflows. Couchbase and Redis can only be used in real-time workflows. Couchbase and Redis allow highly available systems with geographic redundancy. The session data that is replicated within the storage is available across workflows, EC/ECSAs, and systems. This serves to minimize data loss in failover scenarios. Note! Data stored in Couchbase or Redis is not available in the Aggregation Session Inspector. |
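The storage restrictions described in the Storage setting can be summarized as a small lookup. The following is only an illustrative Python sketch of those rules; the names SUPPORTED_WORKFLOWS, storage_allowed and storages_for are hypothetical and are not part of the product.

```python
# Storage type -> workflow types it can be used in, as stated in the
# Storage setting description. All names here are illustrative only.
SUPPORTED_WORKFLOWS = {
    "File Storage": {"batch", "real-time"},
    "Couchbase": {"real-time"},
    "Redis": {"real-time"},
    "Elasticsearch": {"batch"},
}


def storage_allowed(storage, workflow_type):
    """Return True if the storage type can be used in the workflow type."""
    return workflow_type in SUPPORTED_WORKFLOWS.get(storage, set())


def storages_for(workflow_type):
    """Return the storage types usable in the workflow type, sorted by name."""
    return sorted(
        name for name, types in SUPPORTED_WORKFLOWS.items()
        if workflow_type in types
    )
```

For example, storage_allowed("Redis", "batch") returns False, because Redis can only be used in real-time workflows.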
Association Tab
You use the Association tab to configure rules that are used to match an incoming UDR with a session. Every UDR type requires a set of rules that are processed in a certain order. In most cases, only one rule per incoming UDR type is defined.
You can use a primary expression to select the UDRs that are candidates for a specific rule. If the UDR is selected by the primary expression, it is matched with the existing sessions by using one or several ID Fields as a key.
For UDRs with ID Fields matching an existing session, an additional expression may be used to specify additional matching criteria. For example, if dynamic IP addresses are provided to customers based on time intervals, the field that contains the IP address could be used in ID Fields while the actual time could be compared in Additional Expression.
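The matching steps described above can be sketched roughly as follows. This is a conceptual Python model, not the agent's implementation and not APL: the Rule and Session classes, the find_session function and the field names used in the usage note are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Session:
    key: tuple                                     # values of the ID Fields
    variables: dict = field(default_factory=dict)  # session variables


@dataclass
class Rule:
    id_fields: list                                # UDR field names used as key
    primary: Optional[Callable[[dict], bool]] = None
    additional: Optional[Callable[[dict, Session], bool]] = None


def find_session(udr, rule, sessions):
    """Return the session that the UDR belongs to under this rule, or None."""
    # Step 1: the optional primary expression decides if the UDR is a candidate.
    if rule.primary is not None and not rule.primary(udr):
        return None
    # Step 2: the ID Fields form the key that is compared with each session.
    key = tuple(udr[name] for name in rule.id_fields)
    for session in sessions:
        if session.key != key:
            continue
        # Step 3: the optional additional expression refines the match.
        if rule.additional is None or rule.additional(udr, session):
            return session
    return None
```

In the dynamic IP address case, id_fields could be ["ip"], and the additional expression could check that the UDR time falls within a time interval saved as session variables, so that two sessions with the same IP address are told apart by time.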
The Aggregation profile configuration dialog - Association tab
Setting | Description |
---|---|
UDR Types | Click on the Add button to select a UDR Type in the UDR Internal Format dialog. The selected UDR type will then appear in this field. Each UDR type may have a list of rules attached to it. Selecting the UDR type will display its rules as separate tabs to the right in the Aggregation profile configuration. |
Primary Expression | The Primary Expression is optional. Enter an APL code expression that is going to be evaluated before the ID Fields are evaluated. If the evaluation result is Use the |
ID Fields | Click on the Add button to select additional ID Fields in the ID Fields dialog. These fields, along with the Additional Expression settings help determine whether a UDR belongs to an existing session or not. If the contents of the selected fields match the contents of a session and an Additional Expression evaluation results in Note! Make sure that the selected fields are of the same type and appear in the same order for all the rules that are defined for the agent. |
Additional Expression | The Additional Expression is optional. Enter an APL code expression that is going to be evaluated along with the ID Fields. Use the The Additional Expression is useful when you have several UDR types with a varying number of ID Fields, that are about to be consolidated. Having several UDR types requires the ID fields to be equal in number and type. If one of the types requires additional fields that do not have any counterpart in the other type or types, these must be evaluated in the Additional Expression field. Save the field contents as a session variable, and compare the new UDRs with it. For an example, see Association - Radius UDRs in 9.3.9 Aggregation Example - Association of IP Data. Note! When using Additional Expressions for Aggregation the caching mechanism only takes into account the primary and secondary rules when creating the session CRC. This means that if the number of sessions that cannot be told apart without the use of an Additional Expression is high, the performance of the Aggregation Agent decreases due to cache read/write operations. This is especially true if the Max Cached Sessions property is low compared to the number of sessions. For this reason, it is recommended that Max Cached Sessions is set to a high value when using Additional Expressions. |
Create Session on Failure | Select this check box to create a new session if no matching session is found. If the check box is not selected, a new session will not be created when no matching session is found. Note! If you provide a primary expression, and it evaluates to If the order of the input UDRs is not important, all the rules should have this check box checked. This means that the session object is going to be created regardless of the order in which the UDRs arrive. However, if the UDRs are expected to arrive in a particular sequence, Create Session on Failure must only be selected for the UDR type/field that is considered to be the master UDR, i e the UDR that marks the beginning of the sequence. In this case, all the slave UDR types/fields are targeted for error handling if they arrive before their master UDR. Note! At least one of all defined rules must have this check box selected. Otherwise, no session will ever be created. For further information about all available system properties, see 2.6 System Properties. |
Add Rule | Click this button to add a new rule for the selected UDR Type. The rule will appear as a new folder to the right of the UDR Types in the Aggregation profile configuration. Usually, only one rule is required. However, in a situation where a session is based on IP number, stored in either a target or source IP field, two rules are required. The source IP field can be listed in the ID Fields of the first rule and the target IP field listed in the ID Fields of the second rule. |
Remove Rule | Click this button to remove the currently displayed rule. |
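The effect of Create Session on Failure on UDR arrival order can be illustrated with a simplified Python model. It assumes one rule per UDR type and a single key field named "id"; the UDR type names "start" and "interim" and the functions route and validate are hypothetical and only serve the illustration.

```python
def validate(create_on_failure):
    """Check that at least one rule may create sessions."""
    if not any(create_on_failure.values()):
        raise ValueError("no rule has Create Session on Failure selected")


def route(udr, sessions, create_on_failure):
    """Return 'matched', 'created' or 'error' for an incoming UDR.

    sessions maps a key to the list of UDR types aggregated so far.
    create_on_failure maps a UDR type to its check box value.
    """
    key = udr["id"]
    if key in sessions:
        # A matching session exists: the UDR is aggregated into it.
        sessions[key].append(udr["type"])
        return "matched"
    if create_on_failure[udr["type"]]:
        # No match, but this rule is allowed to create a new session.
        sessions[key] = [udr["type"]]
        return "created"
    # No match and no permission to create: targeted for error handling.
    return "error"
```

With {"start": True, "interim": False}, an "interim" UDR that arrives before its "start" UDR yields "error", while the same UDR arriving afterwards yields "matched". With both values set to True, the arrival order does not matter.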
Storage Tab
The Storage tab contains settings that are specific to the selected storage, i.e. File Storage, Couchbase, Redis or Elasticsearch.
File Storage
The Aggregation profile configuration dialog - File Storage settings
Setting | Description |
---|---|
Storage Host | Select a Storage Host from the drop-down list. For storage of aggregation sessions select either a specific Execution Context or Automatic. If you select Automatic, the same EC that has been used by the running workflow will be applied. Alternatively, if the Aggregation Session Inspector is used, a storage host is selected automatically. Refer to 9.3.8 Aggregation Session Inspector for further information on the Aggregation Session Inspector. Note! It is recommended that you configure the aggregation workflow to run on the same EC that you have selected as Storage Host. |
Directory | Enter the directory on the Storage Host where you want the aggregation data to be stored. Note! If the Storage Host above is configured to be Automatic, the corresponding Directory has to be a file system that is shared between all the ECs. If this field is greyed out with a stated directory, the directory path has been hard coded using the mz.preset.aggregation.storage.path property. Example - Using the mz.preset.aggregation.storage.path property. To enable the property and state the directory to be used: mzsh topo set val:common.mz.preset.aggregation.storage.path '/mydirectory/agg' To disable the property: mzsh topo unset val:common.mz.preset.aggregation.storage.path |
Partial File Count | In this field, you can enter the maximum number of partial files that you want to store. Consider the following: Startup: All the files are read at startup. It takes longer if there are many partial files. Transaction commitment: When the transactions are committed, many small files (large Partial File Count) increase performance. In a batch workflow, use this variable to tune performance. Note! In a real-time workflow, updates to sessions are saved on disk only if the Storage tab is configured with Storage Commit Conditions. |
Max Cached Sessions | Enter the maximum number of sessions to keep in the memory cache. This is a performance-tuning parameter that determines the memory usage of the Aggregation agent. Set this value to be low enough so that there is still enough space for the cache in memory, but not too low, as this will cause performance to deteriorate. For further information see the section below, Performance Tuning with File Storage. |
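To illustrate why a low Max Cached Sessions value can degrade performance, the following Python sketch counts how often a session has to be read from storage for a given cache size. It assumes a least-recently-used eviction policy, which is an assumption made for this example and is not stated in this documentation; the SessionCache class and count_reads function are hypothetical.

```python
from collections import OrderedDict


class SessionCache:
    """Toy in-memory session cache with LRU eviction (assumed policy)."""

    def __init__(self, max_cached_sessions):
        self.max_cached_sessions = max_cached_sessions
        self.cache = OrderedDict()
        self.storage_reads = 0

    def get(self, key, load):
        if key in self.cache:
            # Cache hit: mark the session as most recently used.
            self.cache.move_to_end(key)
            return self.cache[key]
        # Cache miss: the session must be read from storage.
        self.storage_reads += 1
        value = load(key)
        self.cache[key] = value
        if len(self.cache) > self.max_cached_sessions:
            # Evict the least recently used session.
            self.cache.popitem(last=False)
        return value


def count_reads(max_cached_sessions, access_pattern):
    """Return the number of storage reads needed for the access pattern."""
    cache = SessionCache(max_cached_sessions)
    for key in access_pattern:
        cache.get(key, load=lambda k: {"id": k})
    return cache.storage_reads
```

In this model, cycling through four sessions five times costs 4 storage reads with a cache of four sessions, but 20 storage reads with a cache of two sessions, because every access misses the cache.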
Note!
File storage may take up more disk space than the number of sessions and the size of a single session object suggest. This is by design.
When session data is stored, it is appended to the session file. This means that old session data from the session file is still present in the storage and the current version is added to the file. Removal of old data is done only under certain conditions because otherwise, aggregation handling would be too slow. This is why file storage takes up more space than calculated with session number and single session object size.
The session files on the disk grow up to a certain threshold (50 MB by default) and then a new file is created and used. The old session file is deleted when no more active sessions are stored in it. The size threshold of a session file can be adjusted by using the aggregation.min_session_file_size parameter. For instance, aggregation.min_session_file_size=20000000 sets it to 20 MB. This parameter is set with the mzsh topo command on EC, cell or container level.
Old files are removed during the storage commit.
Also, since there is a possibility that there will be old session files present because of some long-lived sessions stored there, a defragmentation algorithm is implemented. It runs occasionally and moves those long-lived sessions to new session files so that old session files can be deleted.
In short, aggregation file storage is designed to provide higher performance at the expense of higher disk space consumption.
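The behavior described in this note can be approximated with a small Python model. It is a simplified sketch, not the actual storage format: the AppendOnlyStore class and its methods are hypothetical, sizes are arbitrary byte counts, and the defragmentation step is not modeled.

```python
class AppendOnlyStore:
    """Toy model of append-only session files with a size threshold."""

    def __init__(self, max_file_size):
        self.max_file_size = max_file_size
        self.file_sizes = {0: 0}   # file id -> bytes written so far
        self.current = 0           # id of the file currently appended to
        self.latest = {}           # session id -> (file id, record size)

    def save(self, session_id, size):
        # Start a new file once the current one has reached the threshold.
        if self.file_sizes[self.current] >= self.max_file_size:
            self.current += 1
            self.file_sizes[self.current] = 0
        # Append the new version; older versions stay in their files.
        self.file_sizes[self.current] += size
        self.latest[session_id] = (self.current, size)

    def commit(self):
        # Remove old files that no longer hold the latest version of a session.
        live_files = {file_id for file_id, _ in self.latest.values()}
        for file_id in list(self.file_sizes):
            if file_id != self.current and file_id not in live_files:
                del self.file_sizes[file_id]

    def disk_usage(self):
        return sum(self.file_sizes.values())

    def live_size(self):
        return sum(size for _, size in self.latest.values())
```

In this model, updating one 10-byte session ten times with a 100-byte threshold leaves 100 bytes on disk for 10 bytes of live data. A single long-lived session that is never updated keeps its old file alive after commit, which is the situation the defragmentation algorithm addresses.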
Couchbase
The Aggregation profile configuration dialog - Couchbase Storage
Setting | Description |
---|---|
Profile | Select a Couchbase profile. This profile is used to access the primary storage for aggregation sessions. |
Mirror Profile | Selecting this Couchbase profile is optional. It is used to access secondary storage, providing read-only access for aggregation sessions. Typically, the Mirror Profile is identically configured to a (primary) Profile, that is used by workflows on a different EC/ECSA or other system. This is useful to minimize data loss in various failover scenarios. The read-only sessions can be retrieved with APL commands. For more information and examples, see 3. Aggregation Functions in the APL Reference Guide. |
Mirror profile concept
Redis
Aggregation profile configuration dialog - Redis Storage
Setting | Description |
---|---|
Profile | Select a Redis profile. This profile is used to access the storage for aggregation sessions. |
Elasticsearch
Aggregation profile configuration dialog - Elasticsearch Storage
Setting | Description |
---|---|
Elasticsearch | Select an Elasticsearch profile. This profile is used to access the storage for aggregation sessions. |
Advanced Tab
The Advanced tab is available when you have selected Couchbase Storage, Redis Storage or Elasticsearch Storage in the Session tab. It contains properties that can be used for performance tuning. For information about performance tuning, see 9.3.6 Aggregation Performance Tuning.
Couchbase
The Aggregation profile configuration dialog - Advanced tab for Couchbase
You can also set the properties listed in the Advanced tab as Execution properties in the STR. This will override the values that are set in the profile, including default values.
Example - Overriding the Advanced properties
$ mzsh topo set topo://container:container1/pico:ec1/val:config.properties.mz.cb.agg.json_serializer.format MZ-BIN
Redis
The Aggregation profile configuration dialog - Advanced tab for Redis
For Redis Storage, you can only modify the properties in the Advanced tab.
Elasticsearch
The Aggregation profile configuration dialog - Advanced tab for Elasticsearch
For Elasticsearch Storage, you can modify the properties that are listed in the Advanced tab, as shown above.