...

The Duplicate UDR cache is partitioned into containers by the key from the indexing field: the key is used to locate the right container in the cache. If an entry with the same checksum is already present in that container, the UDR is classified as a duplicate.
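The sketch below illustrates this lookup in Java. It is a minimal, assumed model only; the class and method names are hypothetical and do not reflect the actual implementation of the Duplicate UDR agent.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Illustrative sketch of the container lookup described above.
public class DuplicateUdrCache {

    // Each indexing-field key maps to its own "container" of checksums.
    private final Map<String, Set<Long>> containers = new HashMap<>();

    /**
     * Returns true if a UDR with the same checksum already exists in the
     * container selected by the indexing-field key, i.e. the UDR is a
     * duplicate. Otherwise the checksum is recorded and false is returned.
     */
    public boolean isDuplicate(String indexingKey, long checksum) {
        Set<Long> container =
            containers.computeIfAbsent(indexingKey, k -> new HashSet<>());
        // add() returns false when the checksum is already present.
        return !container.add(checksum);
    }
}
```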

Note!

If a previously processed file is encountered again, all of its UDRs will be treated as duplicates. However, if the cache has reached its configured capacity when the file is reprocessed, a pruning process is started, beginning with the UDRs in the oldest container. This frees up space in the cache so that new UDRs can be added, but the UDRs whose entries were pruned are no longer recognized and are treated as non-duplicates. If the file contains a large number of UDRs, inserting all of them into ECS may be time-consuming.
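The following sketch extends the earlier example with capacity-based pruning of the oldest container. The capacity unit (a per-UDR entry count) and the eviction policy shown here are assumptions for illustration, not the product's actual behaviour.

```java
import java.util.HashSet;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Illustrative sketch of pruning the oldest container when the configured
// capacity is reached. Names and policy details are assumptions.
public class PruningUdrCache {

    private final int maxEntries;   // configured capacity, assumed as a UDR count
    private int currentEntries = 0;
    // LinkedHashMap keeps containers in insertion order: the first entry is the oldest.
    private final LinkedHashMap<String, Set<Long>> containers = new LinkedHashMap<>();

    public PruningUdrCache(int maxEntries) {
        this.maxEntries = maxEntries;
    }

    public boolean isDuplicate(String indexingKey, long checksum) {
        // Free space by dropping the oldest containers first.
        while (currentEntries >= maxEntries && !containers.isEmpty()) {
            pruneOldestContainer();
        }
        Set<Long> container =
            containers.computeIfAbsent(indexingKey, k -> new HashSet<>());
        boolean added = container.add(checksum);
        if (added) {
            currentEntries++;
        }
        return !added; // already present -> duplicate
    }

    // Drops the oldest container entirely; UDRs whose checksums were pruned
    // will be treated as non-duplicates if they are seen again.
    private void pruneOldestContainer() {
        Iterator<Map.Entry<String, Set<Long>>> it = containers.entrySet().iterator();
        if (it.hasNext()) {
            currentEntries -= it.next().getValue().size();
            it.remove();
        }
    }
}
```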

Placing a Duplicate Batch agent before the Duplicate UDR agent only makes the problem worse. The Duplicate Batch agent does not detect that a batch is a duplicate until the end of the batch, and by that point all UDRs have already passed the Duplicate UDR agent and have been inserted into ECS as duplicates. When the Duplicate Batch agent then flags the batch as a duplicate, the batch is removed from the stream, forcing the Duplicate UDR agent to also remove all of its UDRs from ECS.

Prerequisites

The reader of this information should be familiar with:

...