9.20.1 Duplicate UDR Detection Agent Overview
The Duplicate UDR Detection agent provides duplication control on incoming UDRs. Each new UDR is compared with the UDRs that are already stored to determine whether it is a duplicate.
If a duplicate is found, a message is automatically logged in the System Log, and the UDR is marked as erroneous and routed on a user-defined route, for instance to ECS. If the UDR is routed to ECS, an automatically generated ECS Error Code, DUPLICATE_UDR, is assigned to the UDR, which enables searching for duplicate UDRs in ECS.
Duplication comparison is not based on the content of a complete UDR but on the content of the fields selected by the user. When a UDR arrives, two values are calculated by the agent:
- A key derived from the indexing field
- A checksum based on the indexing field and all the fields selected for checking
The key from the indexing field is used to find the right "container" in the cache. If an entry with the same checksum is found in that container, the UDR is classified as a duplicate.
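The lookup described above can be illustrated with a short Python sketch. This is not the agent's implementation: the class name DuplicateDetector, the field names, the use of SHA-256 as the checksum, and the use of the raw indexing field value as the container key are all assumptions made for illustration only.

```python
import hashlib
from collections import defaultdict


class DuplicateDetector:
    """Illustrative model of key + checksum duplicate detection (hypothetical)."""

    def __init__(self, index_field, check_fields):
        self.index_field = index_field          # field used to pick a container
        self.check_fields = list(check_fields)  # user-selected fields to compare
        self.containers = defaultdict(set)      # key -> set of stored checksums

    def _checksum(self, udr):
        # The checksum covers the indexing field and the fields to check,
        # never the complete UDR. SHA-256 is an assumption of this sketch.
        names = [self.index_field, *self.check_fields]
        payload = "\x1f".join(f"{name}={udr[name]}" for name in names)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def is_duplicate(self, udr):
        # Assumption: the container key is the indexing field value itself.
        key = udr[self.index_field]
        checksum = self._checksum(udr)
        container = self.containers[key]
        if checksum in container:
            return True           # same checksum already stored: duplicate
        container.add(checksum)   # first occurrence: remember it
        return False


detector = DuplicateDetector("start_time", ["a_number", "b_number"])
udr = {"start_time": "2024-01-01T10:00:00", "a_number": "100",
       "b_number": "200", "duration": 42}

first = detector.is_duplicate(udr)                            # first arrival
second = detector.is_duplicate(dict(udr, duration=99))        # only an unchecked field differs
third = detector.is_duplicate(dict(udr, b_number="300"))      # a checked field differs
```

In this sketch, the second UDR is reported as a duplicate even though its duration differs, because duration is not among the selected fields. The third UDR is not, because one of the checked fields has changed.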
Note!
If the same file is reprocessed, all of its UDRs are considered duplicates, unless the cache is full, in which case a part of the cache is cleared and the corresponding number of UDRs are considered non-duplicates. If the file contains a considerable number of UDRs, inserting all of them into ECS may be time-consuming.
Placing a Duplicate Batch agent before the Duplicate UDR Detection agent only makes the problem worse. The Duplicate Batch agent does not detect that the batch is a duplicate until the end of the batch. At that point, all UDRs have already passed the Duplicate UDR Detection agent and have been inserted, as duplicates, into ECS. Since the Duplicate Batch agent then flags the batch as a duplicate, the batch is removed from the stream, forcing the Duplicate UDR Detection agent to also remove all of the UDRs from ECS.