Deduplicate
The Deduplicate function is a processor that you can use to find duplicate records among the collected records. In the https://infozone.atlassian.net/wiki/spaces/DAZ/pages/7856512, it must be connected to a collector or processor. You can configure the function to check for duplicate records based on all columns or based on specific columns.
For each record, the values in the columns you select are compared with those of every record in the cache. If all the selected columns match a record previously processed within the same cache period, the record is identified as a duplicate. A duplicate record can then be either discarded from the stream or handled in a separate output channel.
Unique records are stored in the cache and are removed from it after the number of days that you specify in the configuration of the function.
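The cache behaviour described above can be sketched in a few lines of Python. This is only an illustration of the general technique, not the function's actual implementation: the class name `Deduplicator`, the parameters `key_columns` and `retention_days`, and the returned labels are hypothetical, and the sketch assumes that retention is measured from the time a record was first seen and that column values are hashable.

```python
from datetime import datetime, timedelta


class Deduplicator:
    """Illustrative cache-based duplicate check (hypothetical, not the product's code)."""

    def __init__(self, key_columns=None, retention_days=1):
        # key_columns=None means "compare all columns" (assumed naming).
        self.key_columns = key_columns
        self.retention = timedelta(days=retention_days)
        self.cache = {}  # comparison key -> time the unique record was first seen

    def _key(self, record):
        # Build a hashable key from the selected columns, or from every column.
        columns = self.key_columns if self.key_columns is not None else sorted(record)
        return tuple((column, record.get(column)) for column in columns)

    def _evict(self, now):
        # Drop cached entries whose retention period has passed.
        expired = [k for k, seen in self.cache.items() if now - seen >= self.retention]
        for k in expired:
            del self.cache[k]

    def process(self, record, now):
        """Return "duplicate" if the key is already cached, otherwise "unique"."""
        self._evict(now)
        key = self._key(record)
        if key in self.cache:
            return "duplicate"
        self.cache[key] = now  # only unique records are stored
        return "unique"


dedup = Deduplicator(key_columns=["id", "type"], retention_days=2)
start = datetime(2024, 1, 1)

first = dedup.process({"id": 1, "type": "a", "value": 10}, start)
# Same id and type, different value: still a duplicate on the selected columns.
second = dedup.process({"id": 1, "type": "a", "value": 99}, start + timedelta(hours=5))
# After the retention period the cached entry is gone, so the record is unique again.
third = dedup.process({"id": 1, "type": "a", "value": 99}, start + timedelta(days=3))
```

In this sketch the caller decides what to do with a record labelled as a duplicate, which mirrors the choice between discarding it and sending it to a separate output.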
This section has the following subsections: