Multi Directory Collection Strategy (4.2)
This section describes the Multi Directory Collection Strategy, which can be applied with the Disk, FTP, FTPS, HDFS, SCP, and SFTP collection agents.
Overview
The Multi Directory Collection Strategy enables you to configure a collection agent to collect data from a series of directories listed in a control file. The collection agent reads the control file and collects files from each of the specified directories.
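The behavior described above can be sketched as follows. This is a minimal illustration, not the product's implementation; the helper names and the assumption of one directory per line (blank lines ignored) are the author's own:

```python
import os

def read_control_file(path):
    """Return the directories listed in a control file.

    Hypothetical helper: assumes one directory per line and
    ignores blank lines.
    """
    with open(path, encoding="utf-8") as f:
        dirs = [line.strip() for line in f if line.strip()]
    if not dirs:
        # The agent aborts the workflow if the control file is empty,
        # missing, or unreadable.
        raise RuntimeError("control file is empty")
    return dirs

def collect(control_file):
    """Yield the files found in each directory listed in the control file."""
    for directory in read_control_file(control_file):
        if not os.path.isdir(directory):
            continue  # or abort, depending on configuration (see below)
        for name in sorted(os.listdir(directory)):
            full = os.path.join(directory, name)
            if os.path.isfile(full):
                yield full
```

A missing directory is skipped here; the agent can instead be configured to abort the workflow in that case.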
Configuration
You configure the Multi Directory Collection Strategy in the agent configuration view: on the Disk tab for the Disk collection agent, on the HDFS tab for the HDFS collection agent, and on the Source tab for the FTP, FTPS, SCP, and SFTP collection agents.
To Configure the Multi Directory Collection Strategy:
The collection agent configuration dialog
Setting | Description |
---|---|
Collection Strategy | From the drop-down list, select **Multi Directory**. |
Control File | Enter the path and the name of the control TXT file. **Note!** If the control file is missing, empty, or not readable, the workflow aborts. Example control file, `controlfile.txt`:<br>`directory1`<br>`directory1/subdir1`<br>`directory1/subdir2`<br>`directory2`<br>`/home/user/directory3`<br>`...`<br>Example control file for VMS, `controlfile_vms.txt`:<br>`DISK$USERS:[USERS.USER1.TESTDIR1]`<br>`DISK$USERS:[USERS.USER1.TESTDIR2]`<br>`DISK$USERS:[USERS.USER1.TESTDIR2.SUBDIR1]`<br>`DISK$USERS:[USERS.USER1.TESTDIR3]`<br>`DISK$USERS:[USERS.USER1.TESTDIR4]`<br>`...` |
Filename | A regular expression that the names of the source files on the local file system must match. Java regular expression syntax applies. For further information, see https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/regex/Pattern.html. For example, to match all filenames beginning with `data`, enter `data.*`. **Note!** If you leave Filename empty, or if you specify `.*`, all files in the listed directories are matched. |
Abort if Directory is Missing | Select this check box to abort the workflow if a directory listed in the control file is missing on the server. If the check box is cleared (default), the workflow continues to execute. |
Duplicate Filter | Select this check box to prevent the same file from being collected more than once. Files are considered duplicates if their absolute filenames are identical. The workflow keeps an internal data structure recording which files the collector has collected in previous executions. The collection strategy purges this data structure based on the contents of the collection directories: files collected in the past that are no longer found in a collection directory are removed from it. **Note!** The internal data structure is stored in the workflow state. Since the workflow state is only updated when files are collected, the purged data structure is saved the next time a file is successfully collected. To manually purge the duplicate data structure, disable the duplicate filter and run the workflow; the next time the duplicate filter is enabled, the data structure will be empty. |
Enable Debug | Select this check box to enable generation of error or debug messages. **Note!** If you enable messaging, make sure to enable debug in the Workflow Monitor as well. For further information, see Workflow Monitor (3.0). Since debugging has a negative impact on performance, the debug option should never be enabled in a production environment. |
Use File Reference/Route File Reference UDR | Select this check box if you want to forward the data to an SQL Loader agent. See the description of SQL Loader (3.0) for more information. |
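The Filename setting matches the full filename against a Java regular expression. The following illustration uses Python's `re` module, whose syntax agrees with Java's for simple patterns like this one; the pattern `data.*` is a hypothetical example, not a product default:

```python
import re

# Hypothetical pattern: match all filenames beginning with "data".
pattern = re.compile(r"data.*")

filenames = ["data_2024.csv", "report.txt", "datafile.log"]

# Like Java's Matcher.matches(), fullmatch() requires the whole
# filename to match the pattern, not just a prefix or substring.
matched = [f for f in filenames if pattern.fullmatch(f)]
```

Here `matched` contains `data_2024.csv` and `datafile.log` but not `report.txt`.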
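The duplicate-filter bookkeeping described in the table can be sketched as a set of absolute filenames that is purged against the current directory contents. This is an assumed structure for illustration only, not the product's actual workflow-state format:

```python
import os

class DuplicateFilter:
    """Sketch of the duplicate filter: remembers absolute filenames
    collected so far, and purges entries for files that are no longer
    present in the collection directories."""

    def __init__(self):
        self.seen = set()

    def purge(self, current_files):
        # Keep only entries for files still found in the directories;
        # this mirrors the purge step performed by the strategy.
        self.seen &= {os.path.abspath(f) for f in current_files}

    def should_collect(self, path):
        # A file is a duplicate if its absolute filename was seen before.
        path = os.path.abspath(path)
        if path in self.seen:
            return False
        self.seen.add(path)
        return True
```

Note that, as in the product, the purged state only persists once another file is successfully collected, since the set lives in the workflow state.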