Duplicate UDR Profile

A Duplicate UDR agent is configured in two steps: first, define a profile, and then make the regular agent configurations. The Duplicate UDR profile is loaded when you start a workflow that depends on it. Changes to the profile take effect when you restart the workflow.

Configuration

To create a new Duplicate UDR profile configuration, click the New Configuration button from the Configuration dialog available from Build View, and then select Duplicate UDR Profile from the menu. 

To open an existing Duplicate UDR profile configuration, click on the configuration in the Configuration Navigator, or right-click on the configuration and then select View Configuration. 

The contents of the menus in the menu bar may change depending on which configuration type has been opened. The Duplicate UDR profile uses the standard menu items and buttons that are visible for all configurations, and these are described in Common Configuration Buttons.

Item

Description

External References

Select this menu item to enable External References in an agent profile field. For further information, refer to Enabling External References in an Agent Profile Field in External Reference Profile.

General Tab

The General tab is displayed by default. 

For a new configuration, File Storage is selected as the default storage in the General tab.

dup_udr_default_general_tab.jpg
Duplicate UDR New Configuration

The General tab is split into two sections: the Storage settings and the UDR settings.

First, select either File Storage or SQL Storage with the Storage selector. Once a storage is selected, only the settings relevant to that storage are displayed.

Storage Settings 

The Storage settings are in the top section of the General tab. They contain settings to set up the Duplicate UDR cache storage and settings for managing the cache size and data expiration.

File Storage

dup_udr_file_storage_settings.jpg
General Tab Storage settings for File Storage

The Duplicate UDR profile configuration contains the following Storage settings specific to File Storage:

Setting

Description

Storage Host

Select the preferred storage host, where the duplicate UDRs are to be stored, from the drop-down menu. The duplicate repositories can be stored either on a specific EC Group or with the Automatic option.

If Automatic is selected, the same EC Group used by the running workflow will be selected. When the Duplicate UDR Inspector is used, the EC Group is automatically selected.

Note!

The workflow must run on the same EC Group where its storage resides; otherwise, the Duplicate UDR agent will refuse to run. If the storage is configured as Automatic, its directory must be on a file system shared between all the EC Groups.

Directory



An absolute path to the directory on the selected storage host, in which to store the duplicate cache.

If this field is greyed out with a stated directory, it means that the directory path has been hard-coded using the mz.preset.dupUDR.storage.path property. This property is set to false by default.

Example - Using the mz.preset.dupUDR.storage.path property

To enable the property and state the directory to be used:

mzsh topo set val:common.mz.preset.dupUDR.storage.path '/mydirectory/dupudr'


To disable the property:

mzsh topo unset val:common.mz.preset.dupUDR.storage.path



For further information about all available system properties, see System Properties.

External References specific to File Storage can be used with the following field: 

  • Directory 

SQL Storage

The Duplicate UDR profile configuration contains the following Storage settings specific to SQL Storage:

Setting

Description

Database Profile 

This is the database in which to store the Duplicate UDR cache. 

Click the  Browse...  button to get a list of all the database profiles that are available. For further information see Database Profile. 

Duplicate UDR SQL Storage is supported for use with the following databases:

  • SAP HANA 2.0 SP 5 and above

  • PostgreSQL 12 and above

Note!

If no changes are made to the Duplicate UDR profile, changes to the settings of the selected Database Profile are only detected when a workflow using the Duplicate UDR agent is run.

Generate SQL 

Click this button to bring up a dialog that will contain the SQL statements for the table schema generated for the Duplicate UDR profile. 

Generate SQL Dialog Box 

Clicking the Generate SQL button under SQL Storage opens the associated dialog box. Use the Copy button to copy the whole Create Tables SQL script.

More Storage Settings

The Duplicate UDR profile configuration contains the following Storage settings common to both File Storage and SQL Storage. 

Setting

Description

Max Cache Age (days) 

The maximum number of days to keep UDRs in the cache. The age of a UDR stored in the cache is calculated either from the Indexing Field (timestamp) of a UDR in the latest processed batch file, or from the system time, depending on whether Based on System Arrival Time or Based on Latest Time Stamp in Cache is selected.

If the Date Field option in the UDR settings section below is not enabled for the Indexing Field, the Max Cache Age setting is disabled and ignored, and the cache size can only be limited using the Max Cache Size setting.

If enabled, the default value is 30 days.

Based On System Arrival Time 

When Max Cache Age is enabled, this radio button is selected by default. The age of a cached UDR is then calculated from the time when a new batch is processed.

After a long system idle time, this setting may have a major impact on which UDRs are removed from the cache. For more information about the difference between Based on System Arrival Time and Based on Latest Time Stamp in Cache when calculating the UDR age, see the section Duplicate UDR Using Indexing Field Instead of System Time.

Based on Latest Time Stamp in Cache 

When Max Cache Age is enabled and this radio button is selected, the age of a cached UDR is calculated relative to the latest Indexing Field (timestamp) of a UDR that was included in the previously processed batch files.

For more information about the difference between Based on System Arrival Time and Based on Latest Time Stamp in Cache when calculating the UDR age, see the section Duplicate UDR Using Indexing Field Instead of System Time.
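The difference between the two age bases can be illustrated with a minimal Python sketch. The function name expiry_cutoff and its parameters are hypothetical and not part of the product. The sketch only assumes that a cached UDR expires once its Indexing Field timestamp is more than Max Cache Age days older than the chosen reference time.

```python
from datetime import datetime, timedelta

def expiry_cutoff(max_cache_age_days, system_time, latest_cached_timestamp,
                  based_on_system_time=True):
    """Return the timestamp before which cached UDRs count as expired.

    Assumed model: the reference point is either the time when the new
    batch is processed or the latest Indexing Field value already cached.
    """
    reference = system_time if based_on_system_time else latest_cached_timestamp
    return reference - timedelta(days=max_cache_age_days)

# After a long idle period the two reference points drift apart.
now = datetime(2024, 3, 1)       # time when the new batch is processed
latest = datetime(2024, 1, 15)   # newest Indexing Field value in the cache

cutoff_system = expiry_cutoff(30, now, latest, based_on_system_time=True)
cutoff_cache = expiry_cutoff(30, now, latest, based_on_system_time=False)
```

In this example the system-time cutoff is later than the newest cached timestamp, so under the assumed model every cached UDR would expire, while the cache-based cutoff would keep the most recent 30 days of data.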

Max Cache Size (thousands) 

The maximum number of UDRs to store in the duplicate cache. The value must be in the range 100-9999999 (thousands); the default is 5000 (thousands). The cache is made up of containers partitioned by the key from the Indexing Field below. For every incoming UDR, the agent determines in which cache container the UDR is stored.

During the initialization phase of each batch, the agent checks if the cache is full. If the check indicates that there will be less than 10% of the cache available, cache containers will start to be cleared until at least 10% free cache is reached, starting with the oldest container.
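The clean-up rule described above can be sketched as follows. This is an illustration under stated assumptions, not the agent's actual implementation: the function free_up_cache, the per-container UDR counts, and the oldest-first ordering of the dictionary are all hypothetical.

```python
from collections import OrderedDict

def free_up_cache(containers, max_cache_size, min_free_ratio=0.10):
    """Drop the oldest containers until enough of the cache is free.

    containers: OrderedDict mapping container key -> number of cached UDRs,
    ordered from oldest to newest (an assumption of this sketch).
    Returns the keys of the removed containers.
    """
    removed = []
    used = sum(containers.values())
    # Keep clearing while less than min_free_ratio of the cache is available.
    while containers and (max_cache_size - used) < min_free_ratio * max_cache_size:
        key, count = containers.popitem(last=False)  # oldest container first
        used -= count
        removed.append(key)
    return removed

# 950 of 1000 slots are used, so less than 10% is free before the batch.
cache = OrderedDict([(1, 400), (2, 300), (3, 250)])
removed = free_up_cache(cache, max_cache_size=1000)
```

Here only the oldest container is cleared, because removing it already leaves more than 10% of the cache free.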

Enable Separate Storage Per Workflow 

This option enables each workflow to have a separate storage that is checked for duplicates. This allows multiple workflows to run simultaneously using the same Duplicate UDR profile. However, if this checkbox is selected, a UDR in a workflow will not be checked against UDRs in a different workflow.  
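The effect of this option on the duplicate check can be sketched in Python. The class DuplicateStore and its method are hypothetical names used only for illustration; the sketch assumes that the option simply scopes the set of already-seen UDR keys to the workflow.

```python
class DuplicateStore:
    """Toy model of a duplicate cache, shared or separated per workflow."""

    def __init__(self, separate_per_workflow):
        self.separate_per_workflow = separate_per_workflow
        self._seen = {}  # scope name -> set of UDR keys already seen

    def is_duplicate(self, workflow, key):
        # With separate storage, each workflow only sees its own keys.
        scope = workflow if self.separate_per_workflow else "shared"
        seen = self._seen.setdefault(scope, set())
        if key in seen:
            return True
        seen.add(key)
        return False

shared = DuplicateStore(separate_per_workflow=False)
separate = DuplicateStore(separate_per_workflow=True)
```

With the shared store, the same key arriving in two workflows is flagged the second time. With separate storage it is only flagged when it reappears in the same workflow.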

For both File Storage and SQL Storage, External References can be used with the fields:

  • Max Cache Age

  • Max Cache Size

UDR Settings

The UDR settings are in the bottom section of the General tab. They determine which UDR type the duplicate checks apply to, which UDR field is used to segment the Duplicate UDR cache into containers, and how the scope and contents of the Duplicate UDR containers are managed.

The Duplicate UDR profile configuration contains the following UDR settings common to both File Storage and SQL Storage. 

Setting

Description

Type

The UDR type the agent will process.

Indexing Field

The UDR field used as an index in the duplicate comparison. Fields of type long (in milliseconds) and date are valid for selection.

The cache will be made up of containers partitioned by the key from this Indexing Field. If Date Field below is disabled, each container will cover an interval of 50 seconds. If Date Field is enabled, each container will cover an interval of 10 minutes. For every incoming UDR, it will be determined in which cache container the UDR will be stored. 

For performance reasons, this field should preferably be either an increasing sequence number or a timestamp with good locality. This field will always be implicitly evaluated. 

For further information, see the section Duplicate UDR Using Indexing Field Instead of System Time.
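As an illustration of how an incoming UDR could be mapped to a cache container, the following Python sketch divides the Indexing Field value by the container interval. The integer-division scheme and the name container_key are assumptions made for this example; only the interval lengths are taken from the description above.

```python
SEQUENCE_INTERVAL_MS = 50 * 1000    # Date Field disabled: 50 seconds
DATE_INTERVAL_MS = 10 * 60 * 1000   # Date Field enabled: 10 minutes

def container_key(indexing_value_ms, date_field_enabled):
    """Return the container that covers the given Indexing Field value.

    Assumed scheme: containers are consecutive, equally sized intervals,
    so the key is the value divided by the interval length.
    """
    interval = DATE_INTERVAL_MS if date_field_enabled else SEQUENCE_INTERVAL_MS
    return indexing_value_ms // interval
```

Under this scheme, UDRs whose Indexing Field values are close together land in the same container, which is why a field with good locality is preferable.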

Date Field

If selected, the Indexing Field is treated as a timestamp instead of a sequence number. This option must be selected to enable configuration of the Max Cache Age (days) field above.

Checked Fields

In addition to the Indexing Field, the Checked Fields will be used for the duplication evaluation when deciding if a UDR is a duplicate.
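The combined use of the Indexing Field and the Checked Fields can be sketched as a composite key. The function names and the UDR field names (timestamp, anumber, bnumber, duration) are hypothetical and chosen only for this example; UDRs are modelled as plain dictionaries.

```python
def duplicate_key(udr, indexing_field, checked_fields):
    """Build the comparison key: Indexing Field value plus Checked Fields."""
    return (udr[indexing_field],) + tuple(udr[name] for name in checked_fields)

def find_duplicates(udrs, indexing_field, checked_fields):
    """Return the UDRs whose key has already been seen earlier in the list."""
    seen = set()
    duplicates = []
    for udr in udrs:
        key = duplicate_key(udr, indexing_field, checked_fields)
        if key in seen:
            duplicates.append(udr)
        else:
            seen.add(key)
    return duplicates

udrs = [
    {"timestamp": 1000, "anumber": "111", "bnumber": "222", "duration": 10},
    {"timestamp": 1000, "anumber": "111", "bnumber": "222", "duration": 99},
    {"timestamp": 2000, "anumber": "111", "bnumber": "222", "duration": 10},
]
duplicates = find_duplicates(udrs, "timestamp", ["anumber", "bnumber"])
```

The second UDR is reported as a duplicate even though its duration differs, because duration is not among the checked fields. The third UDR is not a duplicate because its Indexing Field value differs.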

Advanced Tab

The Advanced tab is available when you have selected SQL Storage for your Duplicate UDR Storage. It contains properties that can be used for performance tuning. For information about setting up SQL Storage for better performance, see Duplicate UDR SQL Storage Setup Guide. 
