Configuration
To create a new Duplicate UDR profile configuration, click the New Configuration button from the Configuration dialog available from Build View (4.3), and then select Duplicate UDR Profile from the menu.
To open an existing Duplicate UDR profile configuration, click on the configuration in the Configuration Navigator, or right-click on the configuration and then select View Configuration.
The contents of the menus in the menu bar may change depending on which configuration type has been opened. The Duplicate UDR profile uses the standard menu items and buttons that are visible for all configurations, and these are described in Common Configuration Buttons (4.3).
Button | Description |
---|---|
External References | Select this menu item to Enable External References in an agent profile field. Refer to Enabling External References in an Agent Profile Field in External Reference (4.3) for further information. |
General Tab
The General tab is displayed by default.
In the General tab, File Storage is selected as the default Storage for a new configuration.
The General tab is split into two sections, the Storage settings and the UDR settings.
To begin, select either File Storage or SQL Storage with the Storage selector. When a Storage is selected, only the settings relevant to that Storage are displayed.
Storage Settings
The Storage settings are in the top section of the General tab. They contain settings to set up the Duplicate UDR cache storage and to manage the cache size and data expiration.
File Storage
The Duplicate UDR profile configuration contains the following Storage settings specific to File Storage:
Setting | Description |
---|---|
Storage Host | Select the preferred storage host, where the duplicate UDRs are to be stored, from the drop-down menu. The duplicate repositories can be stored either on a specific EC Group or on Automatic. If Automatic is selected, the same EC Group used by the running workflow is selected. When the Duplicate UDR Inspector is used, the EC Group is automatically selected. |
Directory | An absolute path to the directory on the selected storage host, in which to store the duplicate cache. If this field is greyed out with a stated directory, it means that the directory path has been hard-coded using the |
External References specific to File Storage can be used with the following field:
Directory
SQL Storage
The Duplicate UDR profile configuration contains the following Storage settings specific to SQL Storage:
Setting | Description |
---|---|
Database Profile | This is the database in which to store the Duplicate UDR cache. Click the Browse... button to get a list of all the database profiles that are available. For further information see Database (4.3). Duplicate UDR SQL Storage is supported for use with the following database: |
Generate SQL | Click this button to open a dialog containing the SQL statements for the table schema generated for the Duplicate UDR profile. |
Generate SQL Dialog Box
Clicking the Generate SQL button in the SQL Storage settings opens the Generate SQL dialog box. Use the Copy button to copy the entire Create Tables SQL script.
Kafka Storage
You will need to set the storage to Kafka when creating a scalable solution.
Note!
For Kafka Storage, a separate cache is created for every partition.
Setting | Description |
---|---|
Partition Profile | Configure a Partition Profile (4.3) to be used when creating a scalable solution. |
Note!
If you select Kafka Storage in the Duplicate UDR profile, the mz.present.dupUDR.storage.path property cannot be configured on the EC where the workflow is running.
More Storage Settings
The Duplicate UDR profile configuration contains the following Storage settings common to multiple storage types.
Setting | Description |
---|---|
Max Cache Age (days) | The maximum number of days to keep UDRs in the cache. The age of a UDR stored in the cache is calculated either from the system time or from the Indexing Field (timestamp) of a UDR in the latest processed batch file, depending on whether Based on System Arrival Time or Based on Latest Time Stamp in Cache is selected. If the Date Field option in the UDR settings section below is not enabled for the Indexing Field, the Max Cache Age setting is disabled and ignored, and the cache size can only be configured using the Max Cache Size setting. If enabled, the default value is 30 days. |
Based on System Arrival Time | When Max Cache Age is enabled, this radio button is selected by default, and the age of a cached UDR is calculated from the time when a new batch is being processed. After a longer period of system idle time, this setting may have a major impact on which UDRs are removed from the cache. For more information about the difference between Based on System Arrival Time and Based on Latest Time Stamp in Cache when calculating the UDR age, see the section Duplicate UDR Using Indexing Field Instead of System Time (4.3). |
Based on Latest Time Stamp in Cache | When Max Cache Age is enabled and this radio button is selected, the age of a cached UDR is calculated relative to the latest Indexing Field (timestamp) of a UDR that was included in the previously processed batch files. For more information about the difference between Based on System Arrival Time and Based on Latest Time Stamp in Cache when calculating the UDR age, see the section Duplicate UDR Using Indexing Field Instead of System Time (4.3). |
Max Cache Size (thousands) | The maximum number of UDRs to store in the duplicate cache. The value must be in the range of 100-9999999 (thousands). The default is 5000 (thousands). The cache is made up of containers partitioned by the key from the Indexing Field below. For every incoming UDR, the agent determines in which cache container the UDR is stored. During the initialization phase of each batch, the agent checks if the cache is full. If the check indicates that less than 10% of the cache will be available, cache containers are cleared, starting with the oldest container, until at least 10% of the cache is free. |
Enable Separate Storage Per Workflow | This option gives each workflow a separate storage that is checked for duplicates, which allows multiple workflows to run simultaneously using the same Duplicate UDR profile. However, if this checkbox is selected, a UDR in one workflow is not checked against UDRs in a different workflow. |
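The interaction between Max Cache Age, the two age-calculation options, and the 10% rule for Max Cache Size can be illustrated with a minimal Python sketch. This is not the agent's implementation. It assumes, for illustration only, that the cache is a dictionary mapping each container's start timestamp (in milliseconds) to the set of UDRs in that container. The function names reference_time_ms, expire_by_age, and free_space are hypothetical.

```python
MS_PER_DAY = 24 * 60 * 60 * 1000


def reference_time_ms(based_on_system_arrival, system_now_ms, latest_cached_ms):
    """Return the point in time that cached UDR age is measured against."""
    # Based on System Arrival Time -> system time when the batch is processed.
    # Based on Latest Time Stamp in Cache -> latest Indexing Field value seen.
    return system_now_ms if based_on_system_arrival else latest_cached_ms


def expire_by_age(containers, max_age_days, reference_ms):
    """Remove containers older than Max Cache Age and return their keys."""
    cutoff = reference_ms - max_age_days * MS_PER_DAY
    expired = [start for start in sorted(containers) if start < cutoff]
    for start in expired:
        del containers[start]
    return expired


def free_space(containers, max_size, min_free_ratio=0.10):
    """Clear the oldest containers until at least 10% of the cache is free."""
    used = sum(len(udrs) for udrs in containers.values())
    removed = []
    for start in sorted(containers):  # oldest container first
        if max_size - used >= min_free_ratio * max_size:
            break
        used -= len(containers[start])
        del containers[start]
        removed.append(start)
    return removed
```

In this sketch, a cache holding containers from day 0, day 10, and day 40 loses two containers when the age is measured against a system time of day 45, but only one when it is measured against the latest cached timestamp of day 40. This is the kind of difference that a long idle period can cause.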
External References Usage
File Storage and SQL Storage:
Max Cache Age
Max Cache Size
Kafka Storage:
Max Cache Size
UDR Settings
The UDR settings are in the bottom section of the General tab. They contain settings to select which UDR type to check for duplicates, the UDR field used to segment the Duplicate UDR cache into containers, and information to manage the scope and contents of the Duplicate UDR containers.
The Duplicate UDR profile configuration contains the following UDR settings common to both File Storage and SQL Storage.
Setting | Description |
---|---|
Type | The UDR type the agent will process. |
Indexing Field | The UDR field used as an index in the duplicate comparison. Fields of type long (in milliseconds) and date are valid for selection. The cache is made up of containers partitioned by the key from this Indexing Field. If Date Field below is disabled, each container covers 50 seconds. If Date Field is enabled, each container covers 10 minutes. For every incoming UDR, the agent determines in which cache container the UDR is stored. For performance reasons, this field should preferably be either an increasing sequence number or a timestamp with good locality. This field is always implicitly evaluated. For further information, see the section Duplicate UDR Using Indexing Field Instead of System Time (4.3). |
Date Field | If selected, the Indexing Field is treated as a timestamp instead of a sequence number. This option must be selected before the Max Cache Age (days) field above can be configured. |
Checked Fields | In addition to the Indexing Field, the Checked Fields are evaluated when deciding whether a UDR is a duplicate. |
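As a rough illustration of how the Indexing Field, Date Field, and Checked Fields settings work together, the following Python sketch assigns each UDR to a container and compares it against the UDRs already stored there. It is a simplified model, not the product's implementation. UDRs are represented as plain dictionaries, the class DuplicateCache and the function container_key are hypothetical names, and the 50-second span is assumed to be expressed in milliseconds like the 10-minute span.

```python
DATE_CONTAINER_SPAN = 10 * 60 * 1000  # 10 minutes in ms (Date Field enabled)
PLAIN_CONTAINER_SPAN = 50 * 1000      # "50 seconds" from the text, assumed to be in ms


def container_key(index_value, date_field):
    """Map an Indexing Field value to the container that covers it."""
    span = DATE_CONTAINER_SPAN if date_field else PLAIN_CONTAINER_SPAN
    return index_value // span


class DuplicateCache:
    """Toy in-memory cache partitioned into containers by the Indexing Field."""

    def __init__(self, indexing_field, checked_fields, date_field=True):
        self.indexing_field = indexing_field
        self.checked_fields = tuple(checked_fields)
        self.date_field = date_field
        self.containers = {}  # container key -> set of UDR signatures

    def check(self, udr):
        """Return True if the UDR is a duplicate, otherwise store it."""
        index_value = udr[self.indexing_field]
        # The Indexing Field is always evaluated, in addition to Checked Fields.
        signature = (index_value,) + tuple(udr[f] for f in self.checked_fields)
        key = container_key(index_value, self.date_field)
        bucket = self.containers.setdefault(key, set())
        if signature in bucket:
            return True
        bucket.add(signature)
        return False
```

Because only one container has to be searched per incoming UDR, an Indexing Field with good locality keeps the lookups confined to a small number of recently used containers, which is consistent with the performance recommendation above.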
Advanced Tab
The Advanced tab is available when you have selected either SQL Storage or Kafka Storage for your Duplicate UDR Storage. It contains properties that can be used for performance tuning. For information about setting up SQL Storage for better performance, see Duplicate UDR SQL Storage Setup Guide (4.3). For information about setting up Kafka Storage for better performance, see /wiki/spaces/UEPE4D/pages/407928875.