Duplicate Filter Collection Strategy

This section describes the Duplicate Filter Collection Strategy, which is available with the Disk Advanced, FTP, SFTP, and SCP agents. The strategy lets you configure a collection agent to collect files from a directory without collecting the same files again.

Configuration

You configure the Duplicate Filter Collection Strategy from the Source tab in the agent configuration dialog.


The Duplicate Filter configuration dialog


Setting / Description

Collection Strategy

From the drop-down list select Duplicate Filter.

Directory

Absolute pathname of the source directory on the remote host where the source files reside. The pathname can also be given relative to the home directory of the User Name account.

Filename

Name of the source files on the remote host.

Regular expressions according to Java syntax apply. For further information, see http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html.


Example

To match all file names beginning with TTFILE, type: TTFILE.*
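The effect of such a pattern can be tried out with a short script. The sketch below uses Python's re module purely for illustration; the agent itself applies Java regular expressions, and a pattern as simple as TTFILE.* behaves the same in both. The helper matches_filename is hypothetical, and it assumes that the pattern must match the whole file name, which this section does not state.

```python
import re


def matches_filename(pattern: str, filename: str) -> bool:
    """Hypothetical helper: True if the pattern matches the entire file name.

    Assumption: whole-name matching. The documentation does not say whether
    the agent requires a full or a partial match.
    """
    return re.fullmatch(pattern, filename) is not None


candidates = ["TTFILE", "TTFILE_001.dat", "XTTFILE.dat", "ttfile.dat"]

# Keep only the names that the pattern from the example above would select.
selected = [name for name in candidates if matches_filename("TTFILE.*", name)]
print(selected)
```

Under the whole-name assumption, XTTFILE.dat is rejected because the name does not begin with TTFILE, and ttfile.dat is rejected because matching is case-sensitive by default.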

Compression

Compression type of the source files. Determines if the agent will decompress the files before passing them on in the workflow.

  • No Compression - agent does not decompress the files.

  • Gzip - agent decompresses the files using gzip.
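As a rough illustration of what the two Compression options mean for the data passed on in the workflow, the following sketch uses Python's standard gzip module. The function read_source and the way the setting is passed as a string are assumptions made for this example and do not reflect the agent's internal implementation.

```python
import gzip


def read_source(data: bytes, compression: str) -> bytes:
    """Return the bytes to pass on, decompressing only when Gzip is selected.

    Hypothetical helper; the setting names mirror the options listed above.
    """
    if compression == "Gzip":
        return gzip.decompress(data)
    if compression == "No Compression":
        return data
    raise ValueError(f"unsupported compression setting: {compression}")


raw = b"record-1\nrecord-2\n"
packed = gzip.compress(raw)

# With Gzip selected the original payload is recovered.
print(read_source(packed, "Gzip") == raw)
# With No Compression selected the bytes are passed through untouched.
print(read_source(raw, "No Compression") == raw)
```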

Duplicate Criteria - Filename

Select this option to have only the filename compared for the duplicate check. If the filename is in the list of files which have already been collected once, the file is ignored by the agent.

Duplicate Criteria - Filename and Timestamp

Select this option to have both the filename and the timestamp of the last modification compared when checking for duplicates. If the file has already been collected once, it is collected again only if the duplicate check reveals that the file has been updated since the previous collection.

Note!

Files that have the same name as a previously collected file, and are older than the last collected file by that name, are ignored. Only files whose timestamp is more recent are collected.
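The difference between the two duplicate criteria can be sketched as a single decision function. This is a minimal model under stated assumptions, not the agent's actual logic: should_collect and the collected mapping (filename to the modification time of the last collected file by that name) are hypothetical, and timestamps are plain numbers.

```python
from typing import Dict


def should_collect(name: str, mtime: float,
                   collected: Dict[str, float],
                   use_timestamp: bool) -> bool:
    """Decide whether a candidate file is collected.

    use_timestamp=False models "Duplicate Criteria - Filename".
    use_timestamp=True models "Duplicate Criteria - Filename and Timestamp".
    """
    if name not in collected:
        return True  # never seen before, so not a duplicate
    if not use_timestamp:
        return False  # a known name is always a duplicate
    # A known name is collected again only if it was modified more recently.
    return mtime > collected[name]


collected = {"TTFILE_001.dat": 1000.0}

print(should_collect("TTFILE_001.dat", 2000.0, collected, use_timestamp=False))
print(should_collect("TTFILE_001.dat", 2000.0, collected, use_timestamp=True))
print(should_collect("TTFILE_001.dat", 900.0, collected, use_timestamp=True))
```

In this model a file with an unchanged timestamp is treated as a duplicate, in line with the note that only files with a more recent timestamp are collected.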

File List Size

Enter a value to specify the maximum size of the list of already collected files. This list of files is compared to the input files in order to detect duplicates and prevent them from being collected by the agent.

When this collection strategy is used with the multiple server connection strategy, each host has its own duplicate list. If a server is removed from the multiple server configuration, the collection strategy automatically drops the list of duplicates for that host in the next successful collection.

Note!

If the number of files to be collected is greater than the file list size, files older than the oldest file in the list are not collected.
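One way to picture why older files are skipped once the list is full is the bounded list sketched below. It is an interpretation of the note above and not the agent's implementation: the class DuplicateList, the eviction of the oldest entry, and the use of the Filename criterion only are all assumptions made for the example.

```python
from typing import Dict


class DuplicateList:
    """Hypothetical bounded list of already collected files."""

    def __init__(self, max_size: int) -> None:
        if max_size < 1:
            raise ValueError("max_size must be at least 1")
        self.max_size = max_size
        self.entries: Dict[str, float] = {}  # filename -> modification time

    def offer(self, name: str, mtime: float) -> bool:
        """Return True and record the file if it should be collected."""
        if name in self.entries:
            return False  # known duplicate
        full = len(self.entries) >= self.max_size
        if full and mtime < min(self.entries.values()):
            # Older than everything still remembered: it may be a file that
            # was already collected and later evicted, so it is skipped.
            return False
        self.entries[name] = mtime
        if len(self.entries) > self.max_size:
            # Assumption: the entry with the oldest timestamp is evicted.
            oldest = min(self.entries, key=self.entries.get)
            del self.entries[oldest]
        return True


files = DuplicateList(max_size=2)
print(files.offer("a.dat", 10.0))  # collected
print(files.offer("b.dat", 20.0))  # collected
print(files.offer("c.dat", 30.0))  # collected, a.dat is evicted from the list
print(files.offer("a.dat", 10.0))  # skipped: older than the oldest entry
```

In this sketch, a file list size that is smaller than the number of files in the source directory causes older files to be silently skipped, which is why the size should be chosen generously.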

Route FileReferenceUDR

Select this checkbox to route File Reference UDRs instead of raw data.