A Duplicate UDR agent is configured in two steps. First, a profile is defined; then the regular configurations of the agent are made. The Duplicate UDR profile is loaded when you start a workflow that depends on it. Changes to the profile become effective when you restart the workflow.

...

Setting | Description

Storage Host

In the drop-down menu, select the preferred storage host on which the duplicate UDRs are to be stored. The duplicate repositories can be stored either on a specific EC Group or set to Automatic. If Automatic is selected, the EC Group used by the running workflow is used, or, when the Duplicate UDR Inspector is used, an EC Group is selected automatically.

Note
titleNote!

The workflow must run on the same EC Group where its storage resides; otherwise, the Duplicate UDR agent will refuse to run. If the storage is configured to be Automatic, its corresponding directory must be on a file system shared between all the EC Groups.


Directory


An absolute path to the directory on the selected storage host, in which to store the duplicate cache.

If this field is greyed out with a stated directory, the directory path has been hard-coded using the mz.preset.dupUDR.storage.path property. This property is disabled by default.

Info
titleExample - Using the mz.preset.dupUDR.storage.path property

To enable the property and state the directory to be used:

Code Block
mzsh topo set val:common.mz.preset.dupUDR.storage.path '/mydirectory/dupudr'


To disable the property:

Code Block
mzsh topo unset val:common.mz.preset.dupUDR.storage.path


For further information about all available system properties, see /wiki/spaces/MD82/pages/3778732.

Max Cache Age (days)

The maximum number of days to keep UDRs in the cache. The age of a UDR stored in the cache is calculated either from the system time or from the latest Indexing Field (timestamp) of a UDR in the most recently processed batch file, depending on whether Based on System Arrival Time or Based on Latest Time Stamp in Cache is selected.

If the Date Field option below is not selected as an indexing field, this field is deactivated and ignored, and the cache size can only be configured using the Max Cache Size setting. The default value is 30 days.

Note
titleNote!
Duplicate checking is not performed if the processed UDRs are too old; this is logged in the System Log. However, the age calculation cannot be performed if the cache is empty.


Based On System Arrival Time

When this radio button is selected (default), the age of cached UDRs is calculated from the system time at which a new batch is processed.

In case of a longer system idle time, this setting may have a major impact on which UDRs are removed from the cache. For more information about the difference between Based on System Arrival Time and Based on Latest Time Stamp in Cache when calculating the UDR age, see the section below, Using Indexing Field Instead of System Time.

Based on Latest Time Stamp in Cache

When this radio button is selected, the age of cached UDRs is calculated against the latest Indexing Field (timestamp) of a UDR included in the previously processed batch files.

For more information about the difference between Based on System Arrival Time and Based on Latest Time Stamp in Cache when calculating the UDR age, see the section below, Using Indexing Field Instead of System Time.

Max Cache Size (thousands)

The maximum number of UDRs to store in the duplicate cache. The value must be in the range of 100-9999999 (thousands); the default is 5000 (thousands). The cache is made up of containers covering 50 seconds each, and for every incoming UDR, it is determined in which cache container the UDR will be stored.

During the initialization phase, the agent checks whether the cache is full. If the check indicates that less than 10% of the cache will be available, cache containers are cleared, starting with the oldest, until 10% of the cache is free. Depending on how many UDRs are stored in each container, this means that different numbers of UDRs may be cleared depending on the setup. If the indexing field happens to have the same value in all the UDRs, they all end up in the same container, and all of the UDRs in the cache will be cleared.
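The container layout and oldest-first eviction described above can be sketched roughly as follows. This is a simplified model of the described behavior; the constants and function names are illustrative assumptions:

```python
CONTAINER_SPAN = 50  # seconds covered by each cache container

def container_key(index_value_seconds: int) -> int:
    """Map an incoming UDR's index value to its 50-second container."""
    return index_value_seconds // CONTAINER_SPAN

def evict_until_free(containers: dict[int, list], max_size: int) -> None:
    """Clear whole containers, oldest first, until at least 10% of
    the configured cache size is free."""
    def used() -> int:
        return sum(len(v) for v in containers.values())
    while containers and used() > max_size * 0.9:
        oldest = min(containers)   # lowest key = oldest 50-second window
        del containers[oldest]

# Example: three containers holding 5, 4, and 3 UDRs; cache limit 10 UDRs.
containers = {0: list(range(5)), 1: list(range(4)), 2: list(range(3))}
evict_until_free(containers, max_size=10)  # 12 used > 9 -> oldest container cleared
```

Because whole containers are cleared, a single eviction step can remove anywhere from one UDR to the entire cache, depending on how the indexing-field values are distributed.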

Note
titleNote!

If you have a very large cache size, it may be a good idea to split the workflows in order to preserve performance. 


Enable Separate Storage Per Workflow

This option enables each workflow to have a separate storage that is checked for duplicates. This allows multiple workflows to run simultaneously using the same Duplicate UDR profile. However, if this checkbox is selected, a UDR in a workflow will not be checked against UDRs in a different workflow.

Note
titleNote!

Duplicate UDR Inspector currently does not support Duplicate UDR profiles with Enable Separate Storage Per Workflow enabled.


Type

The UDR type the agent will process.

Indexing Field

The UDR field used as an index in the duplicate comparison. Fields of type long and date are valid for selection.

For performance reasons, this field should preferably be either an increasing sequence number or a timestamp with good locality. This field will always be implicitly evaluated.

For further information, see the section below, Using Indexing Field instead of System Time.

Date Field

If selected (default), the indexing field is treated as a timestamp instead of a sequence number. This option must be selected in order to set the maximum age of UDRs to keep in the cache in the Max Cache Age (days) field above.

Note
titleNote!

If the selected indexing field is a timestamp that is configured to be 24 hours or more ahead of the system time, the workflow will abort. 
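The abort condition in the note above can be sketched as follows. The check itself is internal to the agent; the function below is only an illustration of the rule:

```python
from datetime import datetime, timedelta

MAX_CLOCK_SKEW = timedelta(hours=24)

def validate_timestamp(index_ts: datetime, system_now: datetime) -> None:
    """Abort the workflow if the indexing-field timestamp is 24 hours
    or more ahead of the system time."""
    if index_ts - system_now >= MAX_CLOCK_SKEW:
        raise RuntimeError("Indexing field timestamp is >= 24 hours "
                           "ahead of system time; aborting workflow")

# 12 hours ahead: accepted.
validate_timestamp(datetime(2024, 1, 2, 0, 0), datetime(2024, 1, 1, 12, 0))
```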


Checked Fields

The fields to use for the duplication evaluation, when deciding whether or not a UDR is a duplicate.
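Conceptually, the duplicate decision combines the indexing field with every configured checked field. A minimal sketch, where the field names and the key scheme are assumptions rather than the agent's actual algorithm:

```python
CHECKED_FIELDS = ("caller", "callee", "duration")  # assumed example fields

def duplicate_key(udr: dict) -> tuple:
    """Build the value compared against cached entries: the indexing
    field plus every configured checked field."""
    return (udr["timestamp"], *(udr[f] for f in CHECKED_FIELDS))

seen: set[tuple] = set()

def is_duplicate(udr: dict) -> bool:
    key = duplicate_key(udr)
    if key in seen:
        return True
    seen.add(key)
    return False

udr = {"timestamp": 1700000000, "caller": "A", "callee": "B", "duration": 42}
assert not is_duplicate(udr)    # first occurrence: stored in the cache
assert is_duplicate(dict(udr))  # identical UDR: flagged as a duplicate
```

This also shows why changing the Checked Fields or Indexing Field invalidates the stored information, as described in the note below: keys built from the old field set can never match keys built from the new one.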

Note
titleNote!

If the Checked Fields or Indexing Field are modified after an agent has been executed, the already stored information is considered invalid the next time the workflow is activated. Hence, duplicates will never be found among the old information, since its metadata has been replaced.


...