
Take into account the following behaviors when using the Aggregation profile:

  • You can apply an Aggregation profile to any number of workflow configurations.

  • Aggregation sessions created in the storage that is specified by the profile can be accessed by multiple active workflows simultaneously.

  • When you have selected file storage and are using an Aggregation profile across several workflow configurations, you must consider the read-and-write lock mechanisms that are applied to the stored sessions. For further information about read-and-write locks, see Aggregation Agent - Batch(4.2) and Aggregation Agent - Real-Time(4.2).

  • The Aggregation profile is loaded when you start a workflow that depends on it. Changes to the profile become effective when you restart the workflow.

Session UDR

Each Aggregation profile stores sessions of a specific Session UDR type that you define in Ultra. This means that your Aggregation profile configuration must include a Session UDR type. See the example below:

Info

Example - Defining a Session UDR type in an Ultra Configuration

Code Block
session SessionUDRType {
   int intField;
   string strField;
   list<drudr> udrList;
};

It is recommended that you keep the session UDR as small as possible. A larger UDR decreases performance compared to a small one.


Note!

Take particular care when updating the Ultra formats. It is not possible to collect data from the Aggregation session storage if the corresponding UDR has been renamed. However, if you change the format definition, you can still collect the data.

Changes to the formats are handled as follows:

  • Default values are assigned to fields that are added or renamed.

  • Fields that have been removed are ignored.

  • Default values are assigned to fields with data types that have been changed.
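The rules above can be illustrated with a small model. This is a Python sketch, not Usage Engine code: the function name `load_stored_session`, the dictionary-based representation of a session, and the use of each type's empty value as its default are assumptions made for illustration only.

```python
# Hypothetical model of reading a stored session with an updated format.
# Nothing here is a Usage Engine API; the defaults per type are assumptions.


def load_stored_session(stored, current_format):
    """Map stored field values onto the current format definition.

    stored: dict of field name -> value as written with the old format.
    current_format: dict of field name -> Python type in the new format.
    """
    session = {}
    for name, field_type in current_format.items():
        if name in stored and isinstance(stored[name], field_type):
            session[name] = stored[name]   # unchanged field: value is kept
        else:
            # added, renamed, or retyped field: falls back to a default
            session[name] = field_type()
    # fields present in `stored` but not in `current_format` are ignored
    return session


old_data = {"intField": 7, "strField": "abc", "obsolete": 1}
new_format = {"intField": str, "strField": str, "counter": int}
result = load_stored_session(old_data, new_format)
```

In this example `intField` changed type and `counter` was added, so both receive default values, `strField` keeps its stored value, and `obsolete` is dropped.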

For further information on Ultra formats, see Ultra Format(4.2).

Profile Configuration

The contents of the buttons in the menu bar may change depending on which configuration type has been opened in the currently active tab. The Aggregation profile uses the standard buttons that are visible for all configurations, and these are described in Build View (4.2).

The profile consists of four tabs:

  • Session Tab

  • Association Tab

  • Storage Tab

  • Advanced Tab

Session Tab

In the Session tab you can browse to and select a Session UDR Type and configure the Storage selection settings.


Setting

Description

Session UDR Type

Click Browse… to search for the Session UDR type that you want to use. See the Session UDR section above for information on creating a Session UDR type.

Storage

Select the type of storage for aggregation sessions. The available settings are:

  • Couchbase Storage

  • Elasticsearch Storage

  • File Storage

  • Memory Only

  • Redis Storage

  • SQL Storage

File Storage and Memory Only can be used in batch and real-time workflows.

Elasticsearch Storage and SQL Storage can only be used in batch workflows.

Couchbase Storage and Redis Storage can only be used in real-time workflows. These storage types allow highly available systems with geographic redundancy. The session data that is replicated within the storage is available across workflows, EC Groups, and systems. This serves to minimize data loss in failover scenarios.


Note!

Data stored in Couchbase or Redis is not available in the Aggregation Session Inspector(4.2).


Association Tab

You use the Association tab to configure rules to match an incoming UDR with a session. Every UDR type requires a set of rules that are processed in a certain order. In most cases, only one rule per incoming UDR type is defined.

You can use a primary expression to select the UDRs that are candidates for a specific rule. If the UDR passes the primary expression, it is matched against the existing sessions by using one or several ID fields as a key.

For UDRs with ID Fields matching an existing session, an additional expression may be used to specify additional matching criteria. For example, if dynamic IP addresses are provided to customers based on time intervals, the field that contains the IP address could be used in ID Fields while the actual time could be compared in Additional Expression.
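The matching steps described above can be sketched as follows. This is a simplified Python model under stated assumptions, not the agent's implementation: the `Rule` and `Session` classes, the `match` function, and the field names `ip` and `time` are hypothetical, and the order in which a rule falls through to the next rule is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Rule:
    id_fields: list                        # UDR fields used as the session key
    primary: Optional[Callable] = None     # optional filter on the incoming UDR
    additional: Optional[Callable] = None  # optional check of UDR vs. session
    create_on_failure: bool = False


@dataclass
class Session:
    key: tuple
    data: dict = field(default_factory=dict)


def match(udr, rules, sessions):
    """Return the session the UDR belongs to, or None if it is unmatched."""
    for rule in rules:
        if rule.primary and not rule.primary(udr):
            continue  # rule is ignored, evaluation continues with the next rule
        key = tuple(udr[name] for name in rule.id_fields)
        for s in sessions:
            if s.key == key and (rule.additional is None or rule.additional(udr, s)):
                return s
        if rule.create_on_failure:
            s = Session(key, {"start": udr["time"]})
            sessions.append(s)
            return s
    return None


# Dynamic IP example: same IP address, but only within a one-hour interval.
rule = Rule(
    id_fields=["ip"],
    additional=lambda u, s: u["time"] - s.data["start"] < 3600,
    create_on_failure=True,
)
sessions = []
first = match({"ip": "10.0.0.1", "time": 0}, [rule], sessions)
same = match({"ip": "10.0.0.1", "time": 1800}, [rule], sessions)
later = match({"ip": "10.0.0.1", "time": 7200}, [rule], sessions)
```

The second UDR joins the first session because both the ID field and the time comparison match. The third UDR has the same IP address but fails the additional expression, so a new session is created for it.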


Setting

Description

UDR Types

Click the Add button to select a UDR type in the UDR Internal Format dialog. The UDR type that you select then appears in this field. A UDR type may have a list of rules attached to it. When you select the UDR type, its rules appear as separate tabs to the right in the Aggregation profile configuration.

Primary Expression

The Primary Expression is optional. You can enter an APL code expression to be evaluated before the ID Fields are evaluated. If the evaluation result is false, the rule is ignored and the evaluation continues with the next rule.

Use the input variable to write this filtering expression.

ID Fields

Click the Add button to select additional ID Fields in the ID Fields dialog. These fields, along with the Additional Expression settings, enable Usage Engine to determine whether a UDR belongs to an existing session or not. If the contents of the selected fields match the contents of a session and an Additional Expression evaluation results in true, the UDR belongs to the session.


Note!

Ensure that the selected fields are of the same type and appear in the same order for all the rules that are defined for the agent.

Additional Expression

The Additional Expression is optional. Enter an APL code expression to be evaluated with the ID Fields.

Use the input variable to write this filtering expression.

This setting is useful when you have several UDR types with a varying number of ID Fields to be consolidated. Having several UDR types requires the ID fields to be equal in number and type. If one of the types requires additional fields that do not have any counterpart in the other type or types, these must be evaluated in the Additional Expression field. Save the field contents as a session variable, and compare the new UDRs with it.


Note!

When using Additional Expressions for Aggregation, the caching mechanism only takes into account the primary and secondary rules when creating the session CRC. This means that if the number of sessions that cannot be told apart without the use of an Additional Expression is high, the performance of the Aggregation agent decreases due to cache read/write operations. This is especially true if the Max Cached Sessions property is low compared to the number of sessions. For this reason, it is recommended that you set Max Cached Sessions to a high value when using Additional Expressions.

Create Session on Failure

Select this check box to create a new session if no matching session is found. If the check box is not selected, a new session is not created when no matching session is found.


Note!

If you provide a primary expression, and it evaluates to false, the rule is ignored and no new session is created.

If the order of the input UDRs is unimportant, select this check box for all the rules. This means that the session object is created regardless of the order in which the UDRs arrive.

However, if the UDRs are expected to arrive in a particular sequence, only select Create Session on Failure for the UDR type/field that is considered to be the master UDR, i.e. the UDR that marks the beginning of the sequence. In this case, all the slave UDR types/fields are targeted for error handling if they arrive before their master UDR.


Note!

At least one of all defined rules must have this check box selected. Otherwise, no session is created.

For further information about all available system properties, see System Properties (4.2).

Add Rule

Click this button to add a new rule for the selected UDR type. The rule appears as a new folder to the right of the UDR types in the Aggregation profile configuration.

Usually, only one rule is required. However, in a situation where a session is based on an IP number, stored in either a target or source IP field, two rules are required. The source IP field can be listed in the ID Fields of the first rule and the target IP field listed in the ID Fields of the second rule.
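As a rough illustration of the two-rule case, the Python sketch below looks up a session first by the source IP field and then by the target IP field. The field names `sourceIp` and `targetIp`, the helper functions, and the dictionary used as session storage are hypothetical and only model the idea.

```python
def session_key_candidates(udr, rules):
    """Yield one lookup key per rule, in rule order."""
    for id_fields in rules:
        yield tuple(udr[name] for name in id_fields)


# Rule 1 uses the source IP as ID field, rule 2 uses the target IP.
rules = [["sourceIp"], ["targetIp"]]

# A dict standing in for the session storage, keyed on the ID field values.
sessions = {("10.0.0.5",): "session-A"}


def find_session(udr):
    for key in session_key_candidates(udr, rules):
        if key in sessions:
            return sessions[key]
    return None


outbound = {"sourceIp": "10.0.0.5", "targetIp": "192.0.2.1"}
inbound = {"sourceIp": "192.0.2.1", "targetIp": "10.0.0.5"}
```

Both `find_session(outbound)` and `find_session(inbound)` resolve to the same session: the outbound UDR matches through the first rule and the inbound UDR through the second.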

Remove Rule

Click this button to remove the selected rule.

Storage Tab

The Storage tab contains settings that are specific for File Storage, Couchbase Storage, Redis Storage, Elasticsearch Storage, and SQL Storage.

Couchbase Storage


Setting

Description

Profile

Select a Couchbase (4.2) profile. This profile is used to access the primary storage for aggregation sessions.

Mirror Profile

Selecting this Couchbase profile is optional. It is used to access a secondary storage, providing read-only access for aggregation sessions. Typically, the Mirror Profile is configured identically to a (primary) Profile that is used by workflows on a different EC or on another Usage Engine system. This is useful to minimize data loss in various failover scenarios. The read-only sessions can be retrieved with APL commands. For more information and examples, see Aggregation Functions(4.2).


Elasticsearch Storage


Setting

Description

Profile

Select an Elasticsearch (4.2) profile. This profile is used to access the storage for aggregation sessions.

File Storage


Setting

Description

Storage Host

You can only select Automatic.

When you select Automatic, the EC used by the running workflow is automatically applied. Alternatively, if the Aggregation Session Inspector is used, a storage host is selected automatically. For further information, see Aggregation Session Inspector(4.2).


Note!

It is recommended that you configure the aggregation workflow to run on the same EC Group that you have selected as the Storage Host.

Directory

Enter the directory on the Storage Host where you want the aggregation data to be stored.


Note!

If the Storage Host above is configured to be Automatic, the corresponding Directory must be on a file system that is shared between all the ECs.


Info

Example - Using the mz.preset.aggregation.storage.path property

To enable the property and state the directory to be used:

Code Block
mzsh topo set val:common.mz.preset.aggregation.storage.path '/mydirectory/agg'

To disable the property:

Code Block
mzsh topo unset val:common.mz.preset.aggregation.storage.path


Partial File Count

In this field, you can enter the maximum number of partial files that you want to store. Consider the following:

  • Startup: All the files are read at startup. Startup takes longer if there are many partial files.

  • Transaction commitment: Many small files (a large Partial File Count) increase performance when the transactions are committed.

In a batch workflow, use this setting to tune performance.


Note!

In a real-time workflow, updates to sessions are saved on disk only if the Storage tab is configured with Storage Commit Conditions.

Max Cached Sessions

Enter the maximum number of sessions to keep in the memory cache.

This is a performance-tuning parameter that determines the memory usage of the Aggregation agent. Set this value low enough that the cache still fits in memory, but not so low that performance deteriorates. See Performance Tuning with File Storage(4.2) for more information.
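The trade-off can be illustrated with a toy cache. The Python sketch below assumes a least-recently-used eviction policy, which is an assumption for illustration; the eviction policy of the Aggregation agent is not stated here. The class `SessionCache` and the function `reads_for` are hypothetical names.

```python
from collections import OrderedDict


class SessionCache:
    """Least-recently-used cache in front of a slower session storage."""

    def __init__(self, max_cached_sessions, store):
        self.max = max_cached_sessions
        self.store = store            # dict standing in for the session storage
        self.cache = OrderedDict()
        self.storage_reads = 0

    def get(self, key):
        if key in self.cache:
            self.cache.move_to_end(key)     # mark as most recently used
            return self.cache[key]
        self.storage_reads += 1             # cache miss: read from storage
        value = self.store[key]
        self.cache[key] = value
        if len(self.cache) > self.max:
            self.cache.popitem(last=False)  # evict the least recently used
        return value


def reads_for(max_cached_sessions):
    """Count storage reads when four sessions are accessed in three rounds."""
    store = {i: {"id": i} for i in range(4)}
    cache = SessionCache(max_cached_sessions, store)
    for key in [0, 1, 2, 3] * 3:
        cache.get(key)
    return cache.storage_reads
```

With room for all four sessions, `reads_for(4)` needs only the four initial storage reads. With room for two, `reads_for(2)` reads from storage on every one of the twelve accesses, which is the kind of deterioration a too-low value can cause.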

Enable Separate Storage Per Workflow

This option enables each workflow to have a separate session storage. Multiple workflows are allowed to run simultaneously using the same Aggregation profile.

If this check box is selected, a workflow never sees a session from another workflow.


Note!

Sometimes, you may notice that file storage takes up more space than expected. This is expected behavior. Read through this note for an overall understanding of the way file storage in Aggregation works. 

When session data is stored, it is appended to the session file. This means that old session data from the session file is still present in the storage and the current version is added to the file. Removal of old data is done only under certain conditions because otherwise, aggregation handling would be too slow. This is why file storage takes up more space than calculated with session number and single session object size.

The session files on the disk grow up to a certain threshold (50 MB by default) and then a new file is created and used. The old session file is deleted when no more active sessions are stored in it. The accepted size of a session file can be adjusted by using the system property aggregation.min_session_file_size.

For example:

aggregation.min_session_file_size=20000000

will set it to 20 MB.

This system property can be configured in the ECD, see Creating an EC Deployment (4.2).

Old files are removed during the storage commit. Also, since there is a possibility that there will be old session files present because of some long-lived sessions stored there, a defragmentation algorithm is implemented. It runs occasionally and moves those long-lived sessions to new session files so that old session files can be deleted.

This is why aggregation storage takes up a lot of disk space. It is designed to provide higher performance at the expense of higher disk space consumption.
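The effect described in this note can be reproduced with a toy model. The Python sketch below is not the actual file format or algorithm: the class `AppendOnlyStore`, its methods, and the sizes used are hypothetical, and defragmentation is left out.

```python
class AppendOnlyStore:
    """Toy model of append-only session files with rollover and cleanup."""

    def __init__(self, max_file_size):
        self.max_file_size = max_file_size
        self.files = [[]]   # each file is a list of (session_id, size)
        self.latest = {}    # session_id -> index of the file with the newest copy

    def save(self, session_id, size):
        current = self.files[-1]
        if sum(s for _, s in current) + size > self.max_file_size:
            self.files.append([])            # threshold reached: use a new file
            current = self.files[-1]
        current.append((session_id, size))   # older copies are left in place
        self.latest[session_id] = len(self.files) - 1

    def commit(self):
        """Delete files that no longer hold the newest copy of any session."""
        live = set(self.latest.values())
        last = len(self.files) - 1
        for index in range(len(self.files)):
            if index not in live and index != last:
                self.files[index] = None     # file removed from disk

    def disk_usage(self):
        return sum(s for f in self.files if f for _, s in f)


store = AppendOnlyStore(max_file_size=100)
store.save("b", 40)              # long-lived session, written once
for _ in range(5):
    store.save("a", 40)          # frequently updated session
usage_before = store.disk_usage()
store.commit()
usage_after = store.disk_usage()
```

Two sessions of 40 bytes each amount to 80 bytes of live data, but the model uses 240 bytes before the commit and 160 bytes after it. The first file cannot be deleted because it still holds the only copy of session `b`, which is the situation the defragmentation step is meant to resolve.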

Memory Only


When you have selected Memory Only as storage, there are no additional settings in the Storage tab.

Redis Storage


Setting

Description

Profile

Select a Redis (4.2) profile. This profile is used to access the storage for aggregation sessions.

SQL Storage


Setting

Description

Profile

Select a Database Profile(4.2) configured with the SQL database type. This profile is used to access the storage for aggregation sessions.


Note!

Currently, the SQL storage only supports PostgreSQL and SAP HANA databases.

Storage-sharing functionality is currently not supported.

Index Fields

Click the Add button to select the UDR type.

Table SQL Script

This text box shows the generated SQL statements for the table schema of the selected UDRs and the indexes for Id and TxId. The schema is generated based on the number of UDRs in the UDR Type Mapping table.

Info

Info!

You must copy the SQL script generated in the text box and use it to create the PostgreSQL or SAP HANA tables yourself in the database listed in the Database profile. The Aggregation profile does not create the tables automatically.


Note!

The following table columns are mandatory when creating the database:

Column Name    Data Type

Id             VARCHAR(24)

TxId           BIGINT

Deleted        BOOLEAN

Timeout        BIGINT

Session        BYTEA

Advanced Tab

The Advanced tab is available when you have selected Couchbase Storage, Elasticsearch Storage, Redis Storage or SQL Storage in the Session tab. It contains properties that can be used for performance tuning. For information about performance tuning, see Aggregation Performance Tuning(4.2).

These fields support parameterization using ${} syntax. See Profiles(4.2) for more information on how parameterization works.

Couchbase Storage

You can also set the properties listed in the Advanced tab as system properties in the ECD, see Creating an EC Deployment (4.2). System properties override the values that are set in the profile, including default values.

Note!

See the Note at the end of this page for more information when using pessimistic or optimistic locking mechanisms for Couchbase aggregation storage.

Elasticsearch Storage


For Elasticsearch Storage, you can modify the properties listed in the Advanced tab.

Redis Storage


For Redis Storage, you can only modify the properties in the Advanced tab.

Note!

See the Note at the end of this page for more information when using optimistic locking for Redis aggregation storage.

SQL Storage


For SQL Storage, you can modify the properties listed in the Advanced tab.

Note!

When using Couchbase or Redis aggregation storage, it is important to take note of the concept of locking mechanisms when configuring workflows. Locking mechanisms are of two types: Pessimistic and Optimistic.

Redis aggregation storage only has an Optimistic lock whereas Couchbase aggregation storage has both Optimistic and Pessimistic locks.

  • Pessimistic Lock
    When a workflow thread is working on a session, the session is considered fully locked. No other thread can work on that particular session. Once the first thread is finished, the lock is released and another thread can take the lock and work on the session.

  • Optimistic Lock
    Instead of acquiring a traditional lock for a session, a workflow thread obtains a CAS (Compare And Swap) for that session. The CAS serves as a type of hash or fingerprint of the session data. When the consume block is done and the session is ready to be updated, an error occurs if the CAS no longer matches. In scenarios where multiple threads have made updates to the same session, only the changes from the first thread to complete its work are accepted. Any other thread that attempts to update the session fails and must restart its work from the beginning. This process ensures that only changes from one workflow at a time can be committed, akin to the principles of pessimistic locking.
    There is one key distinction to understand: the consume and sessionInit blocks may be invoked multiple times due to the retry mechanism described above. As a result, it is advisable to avoid using global variables within the aggregation APL. However, the udrRoute function can be used safely within these blocks since it is executed only when the Optimistic lock succeeds. If global variables are necessary, they can be moved to an Analysis agent and updated through the udrRoute function.
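The retry behavior of optimistic locking, and why global variables are unreliable with it, can be shown with a minimal Python sketch. It is a model under stated assumptions, not Couchbase, Redis, or APL code: the `Store` class uses a plain integer as the CAS value, and `consume`, `process`, and `other_thread` are hypothetical names.

```python
class CasMismatch(Exception):
    """Raised when the session changed since it was read."""


class Store:
    """In-memory stand-in for a storage that supports compare-and-swap."""

    def __init__(self):
        self.value, self.cas = {"count": 0}, 0

    def read(self):
        return dict(self.value), self.cas

    def write(self, value, cas):
        if cas != self.cas:
            raise CasMismatch()   # another thread updated the session first
        self.value, self.cas = value, self.cas + 1


consume_calls = 0   # global counter: shows why globals are unreliable here


def consume(session):
    global consume_calls
    consume_calls += 1
    session["count"] += 1


def process(store, interfere=None):
    """Handle one UDR, retrying from the beginning on a CAS mismatch."""
    while True:
        session, cas = store.read()
        consume(session)
        if interfere:             # simulate a competing thread, once
            interfere(store)
            interfere = None
        try:
            store.write(session, cas)
            return
        except CasMismatch:
            continue              # discard the work and start over


def other_thread(store):
    session, cas = store.read()
    session["count"] += 10
    store.write(session, cas)


store = Store()
process(store, interfere=other_thread)
```

After this run the stored session holds a count of 11, so both updates were applied exactly once. The global `consume_calls` is 2 although only one UDR was processed, because the first attempt was discarded and repeated.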

Note!

Threads may live in multiple processes on multiple machines.