Duplicate UDR Agent Overview (3.1)
The Duplicate UDR Detection agent provides duplication control on incoming UDRs. Each new UDR is compared with the UDRs that are already stored to determine whether it is a duplicate.
If a duplicate is found, a message is automatically logged in the System Log, and the UDR is marked as erroneous and routed on a user-defined route, for instance to Data Veracity. If the UDR is routed to Data Veracity, an automatically generated Error Code, DUPLICATE_UDR, is assigned to the UDR, which enables searching for duplicate UDRs in Data Veracity.
The duplicate comparison is based not on the complete contents of a UDR but on the contents of the fields selected by the user. When a UDR arrives, the agent calculates two values:
- Key from indexing field
- Checksum based on all the fields to check and the indexing field
The key from the indexing field is used to find the right "container" in the cache. If an entry with the same checksum is found in that container, the UDR is classified as a duplicate.
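The lookup described above can be sketched as follows. This is only an illustration and not the agent's actual implementation: the field names, the way the key is derived from the indexing field (here, a timestamp truncated to the day) and the use of CRC32 as the checksum are all assumptions made for the example.

```python
import zlib
from collections import defaultdict

# Hypothetical field names; in practice these are the fields selected by the user.
INDEXING_FIELD = "timestamp"
FIELDS_TO_CHECK = ["a_number", "b_number", "duration"]


def container_key(udr):
    """Key from the indexing field (assumption: epoch seconds truncated to the day)."""
    return udr[INDEXING_FIELD] // 86400


def checksum(udr):
    """Checksum over all the fields to check plus the indexing field (assumption: CRC32)."""
    parts = [str(udr[name]) for name in FIELDS_TO_CHECK + [INDEXING_FIELD]]
    return zlib.crc32("|".join(parts).encode("utf-8"))


class DuplicateCache:
    """Cache organised as containers of checksums, one container per key."""

    def __init__(self):
        self.containers = defaultdict(set)

    def is_duplicate(self, udr):
        # The key selects the container; the checksum is looked up inside it.
        container = self.containers[container_key(udr)]
        value = checksum(udr)
        if value in container:
            return True
        container.add(value)
        return False
```

In this sketch, two UDRs that differ only in a field that is not selected for checking produce the same checksum and are treated as duplicates, while a change in any selected field produces a new cache entry.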
Note!
If the same file is reprocessed, all of its UDRs are considered duplicates unless the cache is full. In that case, part of the cache is cleared and a corresponding number of UDRs are considered non-duplicates. If the file contains a large number of UDRs, inserting all of them into Data Veracity may be time-consuming.
Placing a Duplicate Batch agent before the Duplicate UDR agent only makes the problem worse. The Duplicate Batch agent does not detect that a batch is a duplicate until the end of the batch. By that point, all UDRs have already passed through the Duplicate UDR agent and been inserted into Data Veracity as duplicates. Because the Duplicate Batch agent then flags the batch as a duplicate, the batch is removed from the stream, which forces the Duplicate UDR agent to remove all of those UDRs from Data Veracity as well.
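The effect of a full cache on a reprocessed file, as described in the note above, can be illustrated with a small sketch. The class name, the `max_size` and `clear_fraction` parameters, and the choice to clear the oldest entries first are assumptions made for the example; the documentation does not state which part of the cache the agent clears.

```python
from collections import OrderedDict


class BoundedChecksumCache:
    """Fixed-size checksum cache that clears part of itself when full."""

    def __init__(self, max_size, clear_fraction=0.5):
        self.max_size = max_size
        # Number of entries dropped each time the cache is full (assumption).
        self.clear_count = max(1, int(max_size * clear_fraction))
        self.entries = OrderedDict()

    def is_duplicate(self, checksum):
        if checksum in self.entries:
            return True
        if len(self.entries) >= self.max_size:
            # Cache is full: clear the oldest entries to make room (assumption).
            for _ in range(self.clear_count):
                self.entries.popitem(last=False)
        self.entries[checksum] = None
        return False


def count_duplicates(cache, checksums):
    """Run one file (a list of checksums) through the cache and count duplicates."""
    return sum(1 for value in checksums if cache.is_duplicate(value))


file_checksums = list(range(10))

# Cache large enough for the whole file: the second pass is all duplicates.
large = BoundedChecksumCache(max_size=100)
first_pass_large = count_duplicates(large, file_checksums)
second_pass_large = count_duplicates(large, file_checksums)

# Cache smaller than the file: entries are cleared, so some UDRs slip through.
small = BoundedChecksumCache(max_size=8)
first_pass_small = count_duplicates(small, file_checksums)
second_pass_small = count_duplicates(small, file_checksums)
```

With the large cache, every UDR in the second pass is reported as a duplicate. With the small cache, entries from the first pass have been cleared before the file is reprocessed, so fewer UDRs are reported as duplicates on the second pass.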