A Duplicate UDR (4.3) agent is configured in two steps: first a profile is defined, and then the regular configuration of the agent is made. The Duplicate UDR profile is loaded when you start a workflow that depends on it. Changes to the profile become effective when you restart the workflow.
Configuration
To create a new Duplicate UDR profile configuration, click the New Configuration button in the Configuration dialog available from Build View (4.3), and then select Duplicate UDR Profile from the menu.
To open an existing Duplicate UDR profile configuration, click the configuration in the Configuration Navigator, or right-click the configuration and select View Configuration.
The contents of the menus in the menu bar may change depending on which configuration type is open. The Duplicate UDR profile uses the standard menu items and buttons that are visible for all configurations; these are described in Common Configuration Buttons (4.3).
General Tab
The General tab is displayed by default. For a new configuration, File Storage is selected as the default storage. The General tab is split into two sections: the Storage settings and the UDR settings. To begin, select File Storage, SQL Storage, or Kafka Storage with the Storage selector. Only the settings relevant to the selected storage are displayed.
Storage Settings
The Storage settings are in the top section of the General tab. They contain settings for setting up the Duplicate UDR cache storage and for managing the cache size and data expiration.
File Storage
Note!
The Duplicate UDR profile configuration contains the following Storage settings specific to File Storage:
External References specific to File Storage can be used with the following field:
SQL Storage
The Duplicate UDR profile configuration contains the following Storage settings specific to SQL Storage:
Generate SQL Dialog Box
When you click the Generate SQL button in the SQL Storage settings, the Generate SQL dialog box opens. Click Copy to copy the entire Create Tables SQL script.
Kafka Storage
Set the storage to Kafka when creating a scalable solution.
If you select Kafka storage in the Duplicate UDR profile, the
Note!
Duplicate UDR Inspector does not support Duplicate UDR profiles with the Kafka storage type selected.
Setting | Description |
---|---|
Partition Profile | Configure a Partition Profile (4.3) to be used when creating a scalable solution. |
More Storage Settings
The Duplicate UDR profile configuration contains the following Storage settings common to multiple storage types.
Setting | Description |
---|---|
Max Cache Age (days) | The maximum number of days to keep UDRs in the cache. The age of a UDR stored in the cache is calculated either from the system time or from the Indexing Field (timestamp) of a UDR in the latest processed batch file, depending on whether Based on System Arrival Time or Based on Latest Time Stamp in Cache is selected. If the Date Field option in the UDR settings section below is not enabled for the Indexing Field, the Max Cache Age setting is disabled and ignored, and the cache size can only be configured using the Max Cache Size setting. If enabled, the default value is 30 days. Note! Note! |
Based On System Arrival Time | When Max Cache Age is enabled, this radio button is selected by default. The age of cached UDRs is then calculated relative to the time when a new batch is processed. After a long period of system idle time, this setting can have a major impact on which UDRs are removed from the cache. For more information about the difference between Based on System Arrival Time and Based on Latest Time Stamp in Cache when calculating the UDR age, see Duplicate UDR Using Indexing Field Instead of System Time (4.3). |
Based on Latest Time Stamp in Cache | When Max Cache Age is enabled and this radio button is selected, the age of cached UDRs is calculated relative to the latest Indexing Field value (timestamp) of a UDR included in the previously processed batch files. For more information about the difference between Based on System Arrival Time and Based on Latest Time Stamp in Cache when calculating the UDR age, see Duplicate UDR Using Indexing Field Instead of System Time (4.3). |
Max Cache Size (thousands) | The maximum number of UDRs to store in the duplicate cache. The value must be in the range 100-9999999 (thousands); the default is 5000 (thousands). The cache is made up of containers partitioned by the key from the Indexing Field below, and each incoming UDR is assigned to one cache container. During the initialization phase of each batch, the agent checks whether the cache is full. If the check indicates that less than 10% of the cache will be available, cache containers are cleared, starting with the oldest container, until at least 10% of the cache is free. Note! Note! |
Enable Separate Storage Per Workflow | This option enables each workflow to have a separate storage that is checked for duplicates. This allows multiple workflows to run simultaneously using the same Duplicate UDR profile. However, if this checkbox is selected, a UDR in a workflow will not be checked against UDRs in a different workflow. Note! |
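The difference between the two Max Cache Age options can be illustrated with a minimal in-memory sketch. It is not the agent's implementation and rests on stated assumptions: the Indexing Field is a timestamp in epoch milliseconds (Date Field enabled), a UDR's age is the reference time minus its own Indexing Field value, and the newest UDR in the cache stands in for "the latest timestamp from previously processed batches". The names `AgeBasis`, `CachedUdr`, and `expire` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum

MS_PER_DAY = 24 * 60 * 60 * 1000


class AgeBasis(Enum):
    SYSTEM_ARRIVAL_TIME = "Based on System Arrival Time"
    LATEST_TIME_STAMP_IN_CACHE = "Based on Latest Time Stamp in Cache"


@dataclass(frozen=True)
class CachedUdr:
    # Hypothetical: Indexing Field value as epoch milliseconds (Date Field enabled).
    indexing_ts_ms: int


def expire(cache: list[CachedUdr], max_age_days: int, basis: AgeBasis,
           batch_start_ms: int) -> list[CachedUdr]:
    """Return the UDRs that are still within Max Cache Age for the new batch."""
    if not cache:
        return []
    if basis is AgeBasis.SYSTEM_ARRIVAL_TIME:
        # Reference point: system time when the new batch is processed.
        reference_ms = batch_start_ms
    else:
        # Reference point: latest Indexing Field value seen so far
        # (assumption: approximated by the newest UDR in the cache).
        reference_ms = max(u.indexing_ts_ms for u in cache)
    limit_ms = max_age_days * MS_PER_DAY
    return [u for u in cache if reference_ms - u.indexing_ts_ms <= limit_ms]
```

In this model, with Max Cache Age set to 30 days and a cache holding UDRs stamped day 0 and day 25, a batch arriving on day 65 after a long idle period expires both UDRs under Based on System Arrival Time, but neither under Based on Latest Time Stamp in Cache, because that reference point (day 25) does not move while the system is idle.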
External References Usage
File Storage and SQL Storage: Max Cache Age, Max Cache Size
Kafka Storage: Max Cache Size
UDR Settings
The UDR settings are in the bottom section of the General tab. They contain settings for selecting which UDR type the duplicate check applies to, the UDR field used to segment the Duplicate UDR cache into containers, and information for managing the scope and contents of the Duplicate UDR containers. The Duplicate UDR profile configuration contains the following UDR settings common to both File Storage and SQL Storage.
Setting | Description |
---|---|
Type | The UDR type the agent will process. |
Indexing Field | The UDR field used as an index in the duplicate comparison. Fields of type long (in milliseconds) and date are valid for selection. The cache is made up of containers partitioned by the key from this Indexing Field. If Date Field below is disabled, each container covers 50 seconds; if Date Field is enabled, each container covers 10 minutes. Each incoming UDR is assigned to one cache container. For performance reasons, this field should preferably be either an increasing sequence number or a timestamp with good locality. This field is always implicitly evaluated. For further information, see Duplicate UDR Using Indexing Field Instead of System Time (4.3). |
Date Field | If selected, the Indexing Field is treated as a timestamp instead of a sequence number. This option must be selected to configure Max Cache Age (days) above. Note! |
Checked Fields | In addition to the Indexing Field, the Checked Fields are used in the duplicate evaluation when deciding whether a UDR is a duplicate. Note! |
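How the Indexing Field, the Checked Fields, the containers, and the 10% free-space rule from Max Cache Size fit together can be shown with a minimal in-memory sketch. It is not the agent's implementation; it assumes the Indexing Field is a long value in milliseconds, that a container key is the Indexing Field value divided by the container span, that the container with the lowest key is the oldest, and that Max Cache Size is given as a plain UDR count rather than in thousands. `DuplicateCache`, `check_and_store`, and `prepare_batch` are hypothetical names.

```python
from dataclasses import dataclass, field

# Container spans from the Indexing Field description, in milliseconds.
DATE_FIELD_SPAN_MS = 10 * 60 * 1000  # Date Field enabled: 10 minutes
DEFAULT_SPAN_MS = 50 * 1000          # Date Field disabled: 50 seconds


@dataclass
class DuplicateCache:
    max_size: int            # assumption: plain UDR count, not thousands
    date_field: bool = True  # mirrors the Date Field option
    # container key -> set of (indexing value, checked field values)
    containers: dict[int, set[tuple]] = field(default_factory=dict)

    def size(self) -> int:
        return sum(len(c) for c in self.containers.values())

    def check_and_store(self, indexing_value: int, checked_fields: tuple) -> bool:
        """Return True if the UDR is a duplicate; otherwise cache it and return False."""
        span = DATE_FIELD_SPAN_MS if self.date_field else DEFAULT_SPAN_MS
        container = self.containers.setdefault(indexing_value // span, set())
        # The Indexing Field is always part of the comparison, plus the Checked Fields.
        key = (indexing_value, checked_fields)
        if key in container:
            return True
        container.add(key)
        return False

    def prepare_batch(self) -> None:
        """Before a batch, clear the oldest containers until at least 10% is free."""
        # Integer form of "used > 90% of max_size"; lowest key assumed oldest.
        while self.containers and self.size() * 10 > self.max_size * 9:
            del self.containers[min(self.containers)]
```

Because only UDRs in the same container can share a key, an Indexing Field with good locality keeps lookups confined to a few recent containers. In this model, selecting Enable Separate Storage Per Workflow would correspond to giving each workflow its own `DuplicateCache` instance, so UDRs are never compared across workflows.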
Advanced Tab
The Advanced tab is available when you have selected either SQL Storage or Kafka Storage for your Duplicate UDR storage. It contains properties that can be used for performance tuning. For information about setting up SQL Storage for better performance, see Duplicate UDR SQL Storage Setup Guide (4.3). For information about setting up Kafka storage for better performance, see /wiki/spaces/UEPE4D/pages/407928875.
...