A Duplicate UDR agent is configured in two steps: first you define a profile, then you make the regular agent configuration. The Duplicate UDR profile is loaded when you start a workflow that depends on it. Changes to the profile become effective when you restart the workflow.
...
Setting | Description | |||||||||
---|---|---|---|---|---|---|---|---|---|---|
Storage Host | In the drop-down menu, select the preferred storage host on which the duplicate UDRs are to be stored. The duplicate repositories can be stored either on a specific EC Group or on an automatically selected one. If the automatic option is selected, the same EC Group used by the running workflow will be selected, or, when the Duplicate UDR Inspector is used, the EC Group is automatically selected.
| |||||||||
Directory | An absolute path to the directory on the selected storage host, in which to store the duplicate cache. If this field is greyed out with a stated directory, it means that the directory path has been hard-coded using a system property. For further information about all available system properties, see /wiki/spaces/MD82/pages/3778732. | |||||||||
Max Cache Age (days) | The maximum number of days to keep UDRs in the cache. The age of a UDR stored in the cache is calculated either from the Indexing Field (timestamp) of a UDR in the latest processed batch file, or from the system time, depending on whether Based on System Arrival Time or Based on Latest Time Stamp in Cache is selected. If the Date Field option, below, is not selected for the indexing field, this field is deactivated and ignored, and the cache size can only be configured using the Max Cache Size setting. The default value is 30 days.
| |||||||||
Based on System Arrival Time | When this radio button is selected (default), the age of a cached UDR is calculated from the time when a new batch is being processed. After a longer system idle time, this setting may have a major impact on which UDRs are removed from the cache. For more information about the difference between Based on System Arrival Time and Based on Latest Time Stamp in Cache when calculating the UDR age, see the section below, Using Indexing Field Instead of System Time. | |||||||||
Based on Latest Time Stamp in Cache | When this radio button is selected, the age of a cached UDR is calculated relative to the latest Indexing Field (timestamp) of a UDR included in the previously processed batch files. For more information about the difference between Based on System Arrival Time and Based on Latest Time Stamp in Cache when calculating the UDR age, see the section below, Using Indexing Field Instead of System Time. | |||||||||
Max Cache Size (thousands) | The maximum number of UDRs to store in the duplicate cache. The value must be in the range 100-9999999 (thousands); the default is 5000 (thousands). The cache is made up of containers covering 50 seconds each, and for every incoming UDR, it is determined in which cache container the UDR will be stored. During the initialization phase, the agent checks whether the cache is full or not. If the check indicates that less than 10% of the cache will be available, cache containers are cleared, starting with the oldest container, until 10% of the cache is free. Because containers are cleared whole, the number of UDRs removed depends on how many UDRs are stored in each container, and therefore on the setup. If the index field happens to have the same value in all the UDRs, all of the UDRs in the cache will be cleared.
| |||||||||
Enable Separate Storage Per Workflow | This option enables each workflow to have a separate storage that is checked for duplicates. This allows multiple workflows to run simultaneously using the same Duplicate UDR profile. However, if this checkbox is selected, a UDR in a workflow will not be checked against UDRs in a different workflow.
| |||||||||
Type | The UDR type the agent will process. | |||||||||
Indexing Field | The UDR field used as an index in the duplicate comparison. Fields of type long and date are valid for selection. For performance reasons, this field should preferably be either an increasing sequence number or a timestamp with good locality. This field will always be implicitly evaluated. For further information, see the section below, Using Indexing Field Instead of System Time. | |||||||||
Date Field | If selected (default), the indexing field is treated as a timestamp instead of a sequence number. This option must be selected in order to set the maximum age of UDRs to keep in the cache, in the Max Cache Age (days) field above.
| |||||||||
Checked Fields | The fields to compare when deciding whether or not a UDR is a duplicate.
|
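
The container behaviour described under Max Cache Size can be illustrated with a small model. The following Python sketch is not the agent's implementation; the class `DuplicateCache` and its methods are hypothetical names introduced only for illustration. It assumes that the indexing field is a timestamp in seconds, that a UDR's container is found by integer division of that timestamp by 50, and that age-based expiry also removes whole containers. Only the 50-second container width, the 10% threshold, and the oldest-first clearing order come from the table above.

```python
CONTAINER_SECONDS = 50      # each cache container covers 50 seconds (from the table)
SECONDS_PER_DAY = 86400


class DuplicateCache:
    """Hypothetical model of a container-based duplicate cache."""

    def __init__(self, max_size):
        self.max_size = max_size    # maximum number of UDRs in the cache
        self.containers = {}        # container id -> set of UDR keys

    def size(self):
        return sum(len(keys) for keys in self.containers.values())

    def is_duplicate(self, index_ts, checked_fields):
        """Return True if the UDR is already cached, otherwise store it."""
        container_id = index_ts // CONTAINER_SECONDS   # assumed bucketing rule
        # The indexing field is always part of the comparison.
        key = (index_ts, checked_fields)
        bucket = self.containers.setdefault(container_id, set())
        if key in bucket:
            return True
        bucket.add(key)
        return False

    def make_room(self):
        """Clear whole containers, oldest first, until 10% of the cache is free.

        Returns the number of UDRs that were removed.
        """
        cleared = 0
        while self.containers and (self.max_size - self.size()) * 10 < self.max_size:
            oldest = min(self.containers)
            cleared += len(self.containers.pop(oldest))
        return cleared

    def expire(self, reference_ts, max_age_days):
        """Drop containers older than the maximum age (assumed granularity).

        reference_ts is either the system time (Based on System Arrival Time)
        or the latest cached timestamp (Based on Latest Time Stamp in Cache).
        """
        cutoff_id = (reference_ts - max_age_days * SECONDS_PER_DAY) // CONTAINER_SECONDS
        for container_id in [c for c in self.containers if c < cutoff_id]:
            del self.containers[container_id]


if __name__ == "__main__":
    cache = DuplicateCache(max_size=10)
    for ts in range(0, 100, 10):            # ten UDRs spread over two containers
        cache.is_duplicate(ts, ("caller", "callee"))
    print(cache.make_room())                # 5: the whole oldest container is cleared
```

In this model only one UDR would have to go to reach 10% free space, but five are removed because containers are cleared as a unit. If every UDR carried the same index value, they would all share one container and the whole cache would be emptied, which matches the caveat in the table.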
...