A Duplicate UDR agent is configured in two steps: first a profile is defined, and then the regular configuration of the agent is made. The Duplicate UDR profile is loaded when you start a workflow that depends on it, and changes to the profile become effective when you restart the workflow.

Configuration

To create a new Duplicate UDR profile configuration, click the New Configuration button in the Configuration dialog, available from Build View (4.1), and then select Duplicate UDR Profile from the menu.

...

Item

Description

External References

Select this menu item to Enable External References in an agent profile field. Refer to Enabling External References in an Agent Profile Field in External Reference (4.1) for further information.

General Tab

The General tab is displayed by default. 

...

To begin, select either File Storage or SQL Storage with the Storage selector. Only the settings relevant to the selected storage type are then displayed.

Storage Settings 

The Storage settings section is the top section of the General tab. It contains settings for setting up the Duplicate UDR cache storage and for managing the cache size and data expiration.

File Storage

...

The Duplicate UDR profile configuration contains the following Storage settings specific to File Storage:

Setting

Description

Storage Host

Select the preferred storage host, where the duplicate UDRs are to be stored, from the drop-down menu. You can store the duplicate repositories either on a specific EC Group or select Automatic.

If Automatic is selected, the same EC Group used by the running workflow will be selected. When the Duplicate UDR Inspector is used, the EC Group is automatically selected.

Note!

The workflow must run on the same EC Group where its storage resides; otherwise, the Duplicate UDR agent will refuse to run. If the storage is configured as Automatic, its corresponding directory must be on a file system shared between all the EC Groups.

Directory

The absolute path to the directory on the selected storage host in which to store the duplicate cache.

If this field is greyed out and shows a directory, the directory path has been hard-coded using the mz.preset.dupUDR.storage.path property. This property is set to false by default.

Example - Using the mz.preset.dupUDR.storage.path property

To enable the property and state the directory to be used:

Code Block
mzsh topo set val:common.mz.preset.dupUDR.storage.path '/mydirectory/dupudr'


To disable the property:

Code Block
mzsh topo unset val:common.mz.preset.dupUDR.storage.path


For further information about all available system properties, see System Properties (4.1).

External References specific to File Storage can be used with the following field: 

  • Directory 

SQL Storage

...

The Duplicate UDR profile configuration contains the following Storage settings specific to SQL Storage:

Setting

Description

Database Profile 

The database profile for the database in which to store the Duplicate UDR cache.

Click the Browse... button to get a list of all the available database profiles. For further information, see Database Profile (4.1).

Duplicate UDR SQL Storage is supported for use with the following database: 

  • SAP HANA 2.0 SP 7 

Note!

If no changes are made to the Duplicate UDR profile, changes to the settings of the selected Database Profile are only detected when a workflow containing the Duplicate UDR agent runs.

Generate SQL 

Click this button to open a dialog containing the SQL statements for the table schema generated for the Duplicate UDR profile.

Note!

The Duplicate UDR profile Configuration Key is used for generating the names of the Duplicate UDR database tables. You must save the profile at least once so that it has a Configuration Key and proper database table names can be generated.

Warning!

You must copy the SQL script generated in the dialog and run it yourself to create the Duplicate UDR tables in the database selected with the Database Profile selector. The Duplicate UDR profile does not create the tables automatically.

Generate SQL Dialog Box 

Clicking the Generate SQL button for SQL Storage opens the Generate SQL dialog box. Use the Copy button to copy the whole Create Tables SQL script.

...

More Storage Settings

The Duplicate UDR profile configuration contains the following Storage settings common to both File Storage and SQL Storage. 

Setting

Description

Max Cache Age (days) 

The maximum number of days to keep UDRs in the cache. The age of a UDR stored in the cache is calculated either from the system time or from the Indexing Field (timestamp) of a UDR in the latest processed batch file, depending on whether Based on System Arrival Time or Based on Latest Time Stamp in Cache is selected.

If the Date Field option in the UDR Settings section below is not enabled for the Indexing Field, the Max Cache Age setting is disabled and ignored, and the cache size can only be configured using the Max Cache Size setting.

If Max Cache Age is enabled, the default value is 30 days.

Note!

If Max Cache Age is enabled and the Duplicate UDR agent receives UDRs that are too old (older than Max Cache Age), these UDRs are not processed and are simply routed on the usual route. Duplicate checking is not performed for these UDRs, and a warning is logged in the System Log.

Note!

The age calculation cannot be performed if the cache is empty.
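The check that decides whether an incoming UDR bypasses duplicate checking can be pictured as a simple age comparison. The sketch below is a conceptual model, not the agent's implementation. The reference time is passed in explicitly, because it depends on the radio button selected below, and treating "exceeded Max Cache Age" as strictly older than the limit is an assumption. The function names and the "bypass"/"check" return values are hypothetical.

```python
from datetime import datetime, timedelta


def is_too_old(udr_time: datetime, reference_time: datetime,
               max_cache_age_days: int = 30) -> bool:
    """True if the UDR has exceeded Max Cache Age relative to reference_time.

    Assumption: "exceeded" is modelled as strictly older than the limit.
    """
    return reference_time - udr_time > timedelta(days=max_cache_age_days)


def handle_incoming(udr_time: datetime, reference_time: datetime,
                    max_cache_age_days: int = 30) -> str:
    """Hypothetical routing decision for one incoming UDR."""
    if is_too_old(udr_time, reference_time, max_cache_age_days):
        # Too old: no duplicate check, routed on the usual route;
        # the agent would also log a warning in the System Log.
        return "bypass"
    return "check"


ref = datetime(2024, 3, 31)
print(handle_incoming(datetime(2024, 3, 1), ref))   # check  (exactly 30 days)
print(handle_incoming(datetime(2024, 2, 29), ref))  # bypass (31 days)
```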

Based On System Arrival Time 

When Max Cache Age is enabled, this radio button is selected by default. The age of cached UDRs is then calculated based on the time when a new batch is processed.

After a long period of system idle time, this setting may have a major impact on which UDRs are removed from the cache. For more information about the difference between Based on System Arrival Time and Based on Latest Time Stamp in Cache when calculating the UDR age, see Duplicate UDR Using Indexing Field Instead of System Time (4.1).

Based on Latest Time Stamp in Cache 

When Max Cache Age is enabled and this radio button is selected, the age of cached UDRs is calculated relative to the latest Indexing Field value (timestamp) of a UDR included in the previously processed batch files.

For more information about the difference between Based on System Arrival Time and Based on Latest Time Stamp in Cache when calculating the UDR age, see Duplicate UDR Using Indexing Field Instead of System Time (4.1).
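The impact of an idle period can be illustrated by measuring the same cached UDR against both reference points. The sketch below is a simplified model under stated assumptions. The mode strings "system_arrival" and "latest_timestamp" and the helper names are hypothetical. The cutoff reuses a strict "older than Max Cache Age" comparison.

```python
from datetime import datetime, timedelta


def reference_time(mode: str, system_time: datetime,
                   latest_cached_timestamp: datetime) -> datetime:
    """Pick the point in time that cached UDR ages are measured against."""
    if mode == "system_arrival":      # Based on System Arrival Time
        return system_time
    if mode == "latest_timestamp":    # Based on Latest Time Stamp in Cache
        return latest_cached_timestamp
    raise ValueError(f"unknown mode: {mode!r}")


def is_expired(cached_timestamp: datetime, mode: str, system_time: datetime,
               latest_cached_timestamp: datetime,
               max_cache_age_days: int = 30) -> bool:
    """True if the cached UDR is older than Max Cache Age for the given mode."""
    ref = reference_time(mode, system_time, latest_cached_timestamp)
    return ref - cached_timestamp > timedelta(days=max_cache_age_days)


# Scenario: no batches for about two months, then a new batch arrives.
latest = datetime(2024, 1, 31)   # newest Indexing Field value in the cache
now = datetime(2024, 4, 1)       # system time when the new batch is processed
cached = datetime(2024, 1, 15)   # a UDR stored in the cache

print(is_expired(cached, "system_arrival", now, latest))    # True  (77 days)
print(is_expired(cached, "latest_timestamp", now, latest))  # False (16 days)
```

With system arrival time, the idle period alone ages every cached UDR, so the whole cache may expire. With the latest cached timestamp, ages only advance when newer UDRs arrive.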

Max Cache Size (thousands) 

The maximum number of UDRs to store in the duplicate cache, in thousands. The value must be in the range 100 to 9999999 (thousands); the default is 5000 (thousands). The cache is made up of containers partitioned by the key from the Indexing Field below, and each incoming UDR is assigned to one of these containers.

During the initialization phase of each batch, the agent checks whether the cache is full. If the check indicates that less than 10% of the cache will be available, cache containers are cleared, starting with the oldest container, until at least 10% of the cache is free.

Note!

Because whole containers are cleared, the number of UDRs removed depends on how many UDRs each container holds. If the Indexing Field of all the UDRs happens to have the same value, all the UDRs in the cache are cleared.

Note!

If you have a very large cache, consider splitting the workflows to preserve performance.
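The clearing rule can be modelled as removing whole containers, oldest first, until at least 10% of the cache is free. The following is a minimal sketch under several assumptions. Containers are represented as a dictionary of container key to UDR count. A lower key is taken to mean an older container. Sizes are plain UDR counts rather than thousands. The function name is hypothetical. The second call shows why a single container can empty the entire cache.

```python
def clear_oldest_containers(containers: dict[int, int], max_cache_size: int,
                            min_free_ratio: float = 0.10) -> list[int]:
    """Clear whole containers, oldest first, until at least
    min_free_ratio of max_cache_size is free.

    containers maps a container key to the number of UDRs it holds and is
    modified in place. Assumption: a lower key means an older container.
    Returns the keys of the cleared containers.
    """
    used = sum(containers.values())
    cleared = []
    for key in sorted(containers):
        if max_cache_size - used >= min_free_ratio * max_cache_size:
            break  # enough free space, stop clearing
        used -= containers.pop(key)
        cleared.append(key)
    return cleared


cache = {1: 300, 2: 300, 3: 350}                # 950 of 1000 used, 5% free
print(clear_oldest_containers(cache, 1000))     # [1]
print(cache)                                    # {2: 300, 3: 350}

same_index = {7: 950}                           # every UDR in one container
print(clear_oldest_containers(same_index, 1000))  # [7]
print(same_index)                               # {}
```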

Enable Separate Storage Per Workflow 

This option enables each workflow to have a separate storage that is checked for duplicates. This allows multiple workflows to run simultaneously using the same Duplicate UDR profile. However, if this checkbox is selected, a UDR in a workflow will not be checked against UDRs in a different workflow.  

Note!

The Duplicate UDR Inspector currently does not support Duplicate UDR profiles with Enable Separate Storage Per Workflow enabled.
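The effect of Enable Separate Storage Per Workflow can be pictured as choosing which storage a duplicate lookup goes to. The toy model below is only an illustration. The class name, the workflow names and the use of None as the key of the single shared storage are all hypothetical and do not reflect how the agent actually stores its cache.

```python
from collections import defaultdict
from collections.abc import Hashable
from typing import Optional


class DuplicateStore:
    """Toy model of the duplicate cache, keyed per workflow when
    Enable Separate Storage Per Workflow is selected."""

    def __init__(self, separate_per_workflow: bool) -> None:
        self.separate_per_workflow = separate_per_workflow
        # None is used as the key of the single shared storage.
        self._seen: dict[Optional[str], set] = defaultdict(set)

    def is_duplicate(self, workflow: str, key: Hashable) -> bool:
        """Return True if key was already seen in the applicable storage."""
        storage = workflow if self.separate_per_workflow else None
        if key in self._seen[storage]:
            return True
        self._seen[storage].add(key)
        return False


udr_key = (1_700_000_000_000, "4670000001")

shared = DuplicateStore(separate_per_workflow=False)
print(shared.is_duplicate("wf_a", udr_key))  # False
print(shared.is_duplicate("wf_b", udr_key))  # True: one storage for all

separate = DuplicateStore(separate_per_workflow=True)
print(separate.is_duplicate("wf_a", udr_key))  # False
print(separate.is_duplicate("wf_b", udr_key))  # False: not checked across workflows
```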

...

  • Max Cache Age

  • Max Cache Size

UDR Settings

The UDR Settings section is the bottom section of the General tab. It contains settings for selecting the UDR type to apply duplicate checks to, the UDR field used to segment the Duplicate UDR cache into containers, and settings to manage the scope and contents of the Duplicate UDR containers.

...

Setting

Description

Type

The UDR type the agent will process.

Indexing Field

The UDR field used as an index in the duplicate comparison. Fields of type long (in milliseconds) and date are valid for selection.

The cache is made up of containers partitioned by the key from this Indexing Field. If Date Field below is disabled, each container covers an interval of 50 seconds. If Date Field is enabled, each container covers an interval of 10 minutes. For every incoming UDR, the agent determines which cache container the UDR is stored in.

For performance reasons, this field should preferably be either an increasing sequence number or a timestamp with good locality. This field will always be implicitly evaluated. 

For further information, see Duplicate UDR Using Indexing Field Instead of System Time (4.1).
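The mapping from Indexing Field value to container can be sketched as fixed-width bucketing. This is an illustration only, with three assumptions. Containers are aligned to the Unix epoch and computed by integer division. When Date Field is disabled, the 50-second interval is applied to the long value as if it were milliseconds. Date values are converted to epoch milliseconds. The constant and function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Interval widths stated for the Indexing Field, in milliseconds.
# Assumption: with Date Field disabled, the long value is read as milliseconds.
INTERVAL_DATE_FIELD_DISABLED_MS = 50 * 1000      # 50 seconds
INTERVAL_DATE_FIELD_ENABLED_MS = 10 * 60 * 1000  # 10 minutes

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_millis(ts: datetime) -> int:
    """Exact epoch milliseconds for a timezone-aware datetime."""
    return (ts - EPOCH) // timedelta(milliseconds=1)


def container_key(index_value_ms: int, date_field: bool) -> int:
    """Bucket number of the container an Indexing Field value falls into
    (assumed epoch-aligned, fixed-width containers)."""
    interval = (INTERVAL_DATE_FIELD_ENABLED_MS if date_field
                else INTERVAL_DATE_FIELD_DISABLED_MS)
    return index_value_ms // interval


t0 = datetime(2024, 1, 1, 0, 0, 0, tzinfo=timezone.utc)
t1 = datetime(2024, 1, 1, 0, 9, 59, tzinfo=timezone.utc)
t2 = datetime(2024, 1, 1, 0, 10, 0, tzinfo=timezone.utc)

print(container_key(to_millis(t0), True))  # 2840112
print(container_key(to_millis(t1), True))  # 2840112 (same 10-minute container)
print(container_key(to_millis(t2), True))  # 2840113 (next container)
```

A timestamp or sequence number with good locality keeps each batch within few containers, which is consistent with the performance advice for this field.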

Date Field

If selected, the Indexing Field is treated as a timestamp instead of a sequence number. This option must be selected for the Max Cache Age (days) setting above to be configurable.

Note!

If the Indexing Field value of a UDR is a timestamp that is 24 hours or more ahead of the system time, the workflow aborts.
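The abort condition is a comparison against the system time with a 24-hour limit that includes the boundary. In the sketch below, raising an exception stands in for the workflow abort. The exception class and function name are hypothetical.

```python
from datetime import datetime, timedelta


class IndexingFieldTooFarAhead(Exception):
    """Hypothetical error standing in for the workflow abort."""


MAX_AHEAD = timedelta(hours=24)


def check_indexing_timestamp(udr_ts: datetime, system_time: datetime) -> None:
    """Raise if the UDR timestamp is 24 hours or more ahead of system time."""
    ahead = udr_ts - system_time
    if ahead >= MAX_AHEAD:
        raise IndexingFieldTooFarAhead(
            f"{udr_ts.isoformat()} is {ahead} ahead of {system_time.isoformat()}")


now = datetime(2024, 6, 1, 12, 0)
check_indexing_timestamp(now + timedelta(hours=23, minutes=59), now)  # accepted
try:
    check_indexing_timestamp(now + timedelta(hours=24), now)
except IndexingFieldTooFarAhead as err:
    print("abort:", err)
```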

Checked Fields

In addition to the Indexing Field, the Checked Fields will be used for the duplication evaluation when deciding if a UDR is a duplicate.

Note!

If the Checked Fields or Indexing Field are modified after the agent has been executed, the information already stored becomes unusable the next time the workflow is activated. Duplicates will therefore never be found among the old information, since a different type of metadata has replaced it.
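One way to picture the comparison is as a key built from the Indexing Field followed by the Checked Fields. It also shows why changing the field selection makes earlier entries unusable. This is a conceptual sketch, not the stored metadata format. The UDR is modelled as a plain dictionary, and the field names startTime, aNumber, bNumber and duration are hypothetical.

```python
from typing import Any, Mapping, Sequence


def duplicate_key(udr: Mapping[str, Any], indexing_field: str,
                  checked_fields: Sequence[str]) -> tuple:
    """Build the comparison key: Indexing Field first, then Checked Fields."""
    return (udr[indexing_field],) + tuple(udr[name] for name in checked_fields)


udr = {"startTime": 1_700_000_000_000, "aNumber": "4670000001",
       "bNumber": "4670000002", "duration": 60}

# Cache populated with the original field selection.
seen = {duplicate_key(udr, "startTime", ["aNumber", "bNumber"])}

# Same UDR, same configuration: found as a duplicate.
print(duplicate_key(udr, "startTime", ["aNumber", "bNumber"]) in seen)  # True

# Same UDR after adding "duration" to Checked Fields: the key has a
# different shape, so nothing stored earlier can match it.
print(duplicate_key(udr, "startTime",
                    ["aNumber", "bNumber", "duration"]) in seen)  # False
```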

Advanced Tab

The Advanced tab is available when you have selected SQL Storage for your Duplicate UDR storage. It contains properties that can be used for performance tuning. For information about setting up SQL Storage for better performance, see Duplicate UDR SQL Storage Setup Guide (4.1).

...