A Duplicate UDR (4.3) agent is configured in two steps. First, a profile has to be defined, then the regular configurations of the agent are made. The Duplicate UDR profile is loaded when you start a workflow that depends on it. Changes to the profile become effective when you restart the workflow.

Configuration

To create a new Duplicate UDR profile configuration, click the New Configuration button from the Configuration dialog available from Build View (4.3), and then select Duplicate UDR Profile from the menu. 

To open an existing Duplicate UDR profile configuration, click on the configuration in the Configuration Navigator, or right-click on the configuration and then select View Configuration.

The contents of the menus in the menu bar may change depending on which configuration type has been opened. The Duplicate UDR profile uses the standard menu items and buttons that are visible for all configurations, and these are described in Common Configuration Buttons (4.3).

Button

Description

External References

Select this menu item to enable External References in an agent profile field. Refer to Enabling External References in an Agent Profile Field in External Reference (4.3) for further information.

General Tab

The General tab is displayed by default. 

In the General tab, the File Storage is displayed as the default Storage for a New Configuration. 

dup_udr_default_general_tab.jpg

The General tab is split into two sections, the Storage settings and the UDR settings.

To begin, select File Storage, SQL Storage, or Kafka Storage with the Storage selector. Only settings relevant to the selected Storage are displayed.

Storage Settings 

The Storage settings are in the top section of the General tab. They set up the Duplicate UDR cache storage and manage the cache size and data expiration.

File Storage

Note!
If you change the storage type in an active profile, it’s recommended to restart the ECs running workflows that use the profile to prevent caching issues.

dup_udr_file_storage_settings.jpg

The Duplicate UDR profile configuration contains the following Storage settings specific to File Storage:

Setting

Description

Storage Host

Select the preferred storage host for the duplicate UDRs from the drop-down menu. Duplicate repositories can be stored either on a specific EC Group or with the Automatic option.

If Automatic is selected, the same EC Group used by the running workflow will be selected. When the Duplicate UDR Inspector is used, the EC Group is automatically selected.

Note!
The workflow must run on the same EC Group where its storage resides. Otherwise, the Duplicate UDR Agent will refuse to run. If the storage is configured to be Automatic, its corresponding directory must be a file system shared between all the EC Groups.

Directory

An absolute path to the directory on the selected storage host, in which to store the duplicate cache.

If this field is greyed out with a stated directory, it means that the directory path has been hard-coded using the mz.present.dupUDR.storage.path property. This property is set to false by default.

External References specific to File Storage can be used with the following field: 

  • Directory 

SQL Storage

dup_udr_sql_storage_settings.jpg

The Duplicate UDR profile configuration contains the following Storage settings specific to SQL Storage:

Setting

Description

Database Profile 

This is the database in which to store the Duplicate UDR cache. 

Click the Browse... button to get a list of all the database profiles that are available. For further information, see Database (4.3).

Duplicate UDR SQL Storage is supported for use with the following databases:

  • SAP HANA 2.0 SP 5 and above 

  • PostgreSQL 12 and above

Note!
If no changes are made to the Duplicate UDR profile, changes to the settings of selected Database Profile will only be detected during the Duplicate UDR Agent workflow run. 

Generate SQL 

Click this button to bring up a dialog that will contain the SQL statements for the table schema generated for the Duplicate UDR profile. 

Note!
The Duplicate UDR profile Configuration Key is used for generating the names of the Duplicate UDR database tables. You will need to save the profile at least once for the profile to have a Configuration Key, so that proper database table names can be generated. 

Warning!
Users will have to copy the SQL script generated in the dialog and execute the SQL script separately to create the Duplicate UDR tables in the database selected with the Database Profile selector. The Duplicate UDR profile will not automatically create the tables in the database for you. 

Generate SQL Dialog Box 

When a user clicks on the SQL Storage Generate SQL button, the associated dialog box will open. The Copy button is a convenient way to copy the whole Create Tables SQL Script. 

dup_udr_sql_storage_generate_sql_dialog.jpg

Kafka Storage

You will need to set the storage to Kafka when creating a scalable solution.

Note!
For Kafka Storage, a separate cache is created for every partition. If you select Kafka Storage in the Duplicate UDR profile, the mz.present.dupUDR.storage.path property cannot be configured on the EC where the workflow is running.

Setting

Description

Partition Profile 

Configure a Partition Profile (4.3) to be used when creating a scalable solution.

More Storage Settings

The Duplicate UDR profile configuration contains the following Storage settings common to multiple storage types.

Setting

Description

Max Cache Age (days) 

The maximum number of days to keep UDRs in the cache. The age of a UDR stored in the cache is either calculated from the Indexing Field (timestamp) of a UDR in the latest processed batch file, or from the system time, depending on whether Based on System Arrival Time or Based on Latest Time Stamp in Cache is selected. 

If the Date Field option in the UDR settings section below is not enabled for the Indexing Field, the Max Cache Age setting is disabled and ignored, and the cache size can only be configured using the Max Cache Size setting.

If enabled, the default value is 30 days.

Note!
If enabled and the Duplicate UDR Agent receives UDRs that are too old (exceeded Max Cache Age), the UDRs will not be processed and will simply be routed to the usual route. Duplicate checking is not performed for these UDRs and a warning will be logged in the System Log.

Note!
The age calculation cannot be performed if the cache is empty.

Based On System Arrival Time 

When Max Cache Age is enabled, this radio button is selected by default. The age of a cached UDR is then calculated from the time when a new batch is processed.

In case of a longer system idle time, this setting may have a major impact on which UDRs are removed from the cache. For more information about the difference between Based on System Arrival Time and Based on Latest Time Stamp in Cache when calculating the UDR age, see the section, Duplicate UDR Using Indexing Field Instead of System Time(4.3).

Based on Latest Time Stamp in Cache 

When Max Cache Age is enabled and this radio button is selected, the UDR cache age calculation will be made toward the latest Indexing Field (timestamp) of a UDR that was included in the previously processed batch files. 

For more information about the difference between Based on System Arrival Time and Based on Latest Time Stamp in Cache when calculating the UDR age, see the section on 'Indexing fields instead of System Time' linked above.
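The difference between the two options can be illustrated with a minimal sketch. It is not the agent's implementation: the function names, the example dates, and the exact boundary comparison are assumptions made for illustration only.

```python
from datetime import datetime, timedelta


def expiry_cutoff(max_cache_age_days, system_time, latest_cached_timestamp,
                  based_on_system_arrival_time=True):
    """Return the oldest Indexing Field timestamp still inside Max Cache Age.

    The reference point is either the system time when the batch is processed
    or the latest timestamp seen in previously processed batches.
    """
    if based_on_system_arrival_time:
        reference = system_time
    else:
        reference = latest_cached_timestamp
    return reference - timedelta(days=max_cache_age_days)


def is_too_old(udr_timestamp, cutoff):
    # Assumption: a UDR exactly on the cutoff is still accepted.
    return udr_timestamp < cutoff


# Hypothetical scenario: the system was idle for two months.
system_time = datetime(2024, 3, 1)     # time the new batch is processed
latest_cached = datetime(2024, 1, 1)   # latest timestamp already in the cache
udr_timestamp = datetime(2023, 12, 20)

cutoff_system = expiry_cutoff(30, system_time, latest_cached, True)
cutoff_cache = expiry_cutoff(30, system_time, latest_cached, False)

print(is_too_old(udr_timestamp, cutoff_system))  # True
print(is_too_old(udr_timestamp, cutoff_cache))   # False
```

In this scenario the same UDR falls outside the cache age when the system time is the reference, but is still inside it when the latest cached timestamp is the reference, which is why a long idle period matters.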

Max Cache Size (thousands) 

The maximum number of UDRs to store in the duplicate cache. The value must be in the range of 100-9999999 (thousands), the default is 5000 (thousands). The cache will be made up of containers partitioned by the key from the Indexing Field below. For every incoming UDR, it will be determined in which cache container the UDR will be stored. 

During the initialization phase of each batch, the agent checks if the cache is full. If the check indicates that there will be less than 10% of the cache available, cache containers will start to be cleared until at least 10% free cache is reached, starting with the oldest container.

Note!
Depending on how many UDRs are stored in each container, this means that different amounts of UDRs may be cleared depending on the setup. If the Indexing Field of all the UDRs happens to have the same value, then all the UDRs in the cache will be cleared.

Note!
If you have a very large cache size, it may be a good idea to split the workflows to preserve performance. 
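The clean-up rule described for Max Cache Size can be sketched as follows. This is a simplified model, not the agent's code: the dictionary layout, the function name, and the assumption that a smaller container key means an older container are all hypothetical.

```python
def clear_oldest_containers(containers, max_cache_size, min_free_ratio=0.10):
    """Clear the oldest containers until at least 10% of the cache is free.

    containers maps a container key to the number of UDRs stored in it.
    Assumption: a smaller key means an older container.
    Returns the keys of the containers that were cleared.
    """
    cleared = []
    used = sum(containers.values())
    for key in sorted(containers):
        free = max_cache_size - used
        if free >= min_free_ratio * max_cache_size:
            break
        used -= containers.pop(key)
        cleared.append(key)
    return cleared


# Three containers, 950 of 1000 slots used: only the oldest one is cleared.
spread = {0: 400, 1: 300, 2: 250}
print(clear_oldest_containers(spread, 1000))  # [0]

# All UDRs share one Indexing Field value: the whole cache is cleared.
single = {5: 950}
print(clear_oldest_containers(single, 1000))  # [5]
```

The second call shows the case from the note above: when every UDR lands in the same container, clearing that container empties the entire cache.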

Enable Separate Storage Per Workflow 

This option enables each workflow to have a separate storage that is checked for duplicates. This allows multiple workflows to run simultaneously using the same Duplicate UDR profile. However, if this checkbox is selected, a UDR in a workflow will not be checked against UDRs in a different workflow.  

Note!
Duplicate UDR Inspector currently does not support Duplicate UDR profiles with Enable Separate Storage Per Workflow enabled.
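The effect of Enable Separate Storage Per Workflow on duplicate detection can be pictured with a small sketch. The class, its methods, and the workflow names are hypothetical and only model the scoping behaviour described above, not how the storage is actually organized.

```python
from collections import defaultdict


class DuplicateStore:
    """Toy model of one Duplicate UDR profile shared by several workflows."""

    def __init__(self, separate_storage_per_workflow):
        self.separate = separate_storage_per_workflow
        self._caches = defaultdict(set)

    def is_duplicate(self, workflow_name, udr_key):
        # With separate storage each workflow gets its own cache;
        # otherwise all workflows share a single cache (scope None).
        scope = workflow_name if self.separate else None
        cache = self._caches[scope]
        if udr_key in cache:
            return True
        cache.add(udr_key)
        return False


shared = DuplicateStore(separate_storage_per_workflow=False)
shared.is_duplicate("wf_a", "udr-1")
print(shared.is_duplicate("wf_b", "udr-1"))    # True

separate = DuplicateStore(separate_storage_per_workflow=True)
separate.is_duplicate("wf_a", "udr-1")
print(separate.is_duplicate("wf_b", "udr-1"))  # False
```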

External References Usage

  • File Storage and SQL Storage:

    • Max Cache Age

    • Max Cache Size

  • Kafka Storage:

    • Max Cache Size

UDR Settings

The UDR settings are in the bottom section of the General tab. They determine which UDR type the duplicate checks apply to, which UDR field is used to segment the Duplicate UDR cache into containers, and the scope and contents of the Duplicate UDR containers.

dup_udr_udr_settings.jpg

Setting

Description

Type

The UDR type the agent will process.

Indexing Field

The UDR field used as an index in the duplicate comparison. Fields of type long (in milliseconds) and date are valid for selection.

The cache will be made up of containers partitioned by the key from this Indexing Field. If Date Field below is disabled, each container will cover 50 seconds. If the Date Field is enabled, each container will cover 10 minutes. For every incoming UDR, it will be determined in which cache container the UDR will be stored. 

For performance reasons, this field should preferably be either an increasing sequence number or a timestamp with a good locality. This field will always be implicitly evaluated. 

For further information, see the section Duplicate UDR Using Indexing Field Instead of System Time(4.3). The Duplicate UDR profile configuration contains the following UDR settings common to both File Storage and SQL Storage. 
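As a rough illustration of how an Indexing Field value could map to a cache container, the sketch below uses the container spans stated above (50 seconds without Date Field, 10 minutes with it). The function name and the integer-division bucketing are assumptions; the product may align container boundaries differently.

```python
SPAN_WITHOUT_DATE_FIELD_MS = 50 * 1000    # 50 seconds
SPAN_WITH_DATE_FIELD_MS = 10 * 60 * 1000  # 10 minutes


def container_key(indexing_value_ms, date_field_enabled):
    """Return a hypothetical container number for an Indexing Field value."""
    if date_field_enabled:
        span = SPAN_WITH_DATE_FIELD_MS
    else:
        span = SPAN_WITHOUT_DATE_FIELD_MS
    return indexing_value_ms // span


# Values inside the same 10-minute window share a container.
print(container_key(0, True), container_key(599_999, True))  # 0 0
print(container_key(600_000, True))                          # 1
# The same value falls into a much later container with 50-second spans.
print(container_key(600_000, False))                         # 12
```

Values that are close together end up in the same container, which is why an increasing sequence number or a timestamp with good locality is preferred.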

Date Field

If selected, the Indexing Field will be treated as a timestamp instead of a sequence number, and this must be selected to enable the Max Cache Age (days) field above to be configured.

Note!
If the UDR Indexing Field value is a timestamp that is configured to be 24 hours or more ahead of the system time, the workflow will abort.

Checked Fields

In addition to the Indexing Field, the Checked Fields will be used for the duplication evaluation when deciding if a UDR is a duplicate.

Note!
If the Indexing or Checked Fields are modified after an agent is executed, the already stored information will be considered useless the next time the workflow is activated. Hence, duplicates will never be found amongst the old information since another type of metadata has replaced them.
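A minimal sketch of the duplicate evaluation: the Indexing Field is always part of the comparison, and the Checked Fields are added to it. The field names (timestamp, a_number, b_number, duration) and the in-memory set are hypothetical; the real agent stores its metadata in the configured storage.

```python
def duplicate_key(udr, indexing_field, checked_fields):
    """Build the comparison key: Indexing Field first, then the Checked Fields."""
    return (udr[indexing_field],) + tuple(udr[name] for name in checked_fields)


def split_duplicates(udrs, indexing_field, checked_fields):
    """Return (unique, duplicates) for a sequence of UDRs given as dicts."""
    seen = set()
    unique, duplicates = [], []
    for udr in udrs:
        key = duplicate_key(udr, indexing_field, checked_fields)
        if key in seen:
            duplicates.append(udr)
        else:
            seen.add(key)
            unique.append(udr)
    return unique, duplicates


udrs = [
    {"timestamp": 1000, "a_number": "111", "b_number": "222", "duration": 30},
    {"timestamp": 1000, "a_number": "111", "b_number": "222", "duration": 45},
    {"timestamp": 2000, "a_number": "111", "b_number": "222", "duration": 30},
]

# duration is not a Checked Field, so the second UDR counts as a duplicate.
unique, duplicates = split_duplicates(udrs, "timestamp", ["a_number", "b_number"])
print(len(unique), len(duplicates))  # 2 1
```

Changing the Indexing Field or the Checked Fields changes the key, so keys stored under the old configuration can no longer match new UDRs. This is the situation described in the note above.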

Advanced Tab

The Advanced tab is available when you have selected either SQL Storage or Kafka Storage for your Duplicate UDR Storage. It contains properties that can be used for performance tuning. For information about setting up SQL Storage for better performance, see Duplicate UDR SQL Storage Setup Guide (4.3). For information about setting up Kafka Storage for better performance, see /wiki/spaces/UEPE4D/pages/407928875.

...