Control File Collection Strategy (4.2)

This section describes the complementary Control File Collection Strategy, which can be applied to the Disk, FTP, FTPS, HDFS, SCP and SFTP Collection agents.

Overview

The collection strategy makes it possible to collect only those files for which a corresponding control file exists. If the control file does not exist, the file is ignored.

Configuration

The Control File Collection Strategy determines which further configuration options are available in the Disk tab for the Disk Collection agent, the HDFS tab for the HDFS Collection agent, and the Source tab for the FTP, FTPS, SCP and SFTP Collection agents. If no strategy is selected, the default strategy is used.

To Configure the Control File Collection Strategy:

Collection Strategy - Control File Collection Strategy

Note!

The Collection Strategy drop-down list is only visible if collection strategies other than the default collection strategy are available in the system.

Setting and Description

Collection Strategy

Select the Control File option in this list.

Directory

Enter the absolute path name of the source directory on the remote host, where the source files reside. The path name may also be entered relative to the home directory of the User Name account.

Include Subfolders

Select this check box if you have subfolders in the source directory from which you want files to be collected.

Note!

Subfolders that are in the form of a link are not supported.

If you select Enable Sort Order in the Sort Order tab, the sort order selected will also apply to subfolders.

Filename

Enter the name of the source files on the remote host.

Regular expressions according to Java syntax can be used.

For further information, see https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/regex/Pattern.html.


Example

To match all file names beginning with TTFILE, enter: TTFILE.*
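As a rough illustration of how such an expression selects files, the sketch below uses Python's re module as a stand-in for Java's java.util.regex; for a simple pattern like TTFILE.* the two behave the same. It assumes that the whole file name has to match the expression (which the trailing .* in the example suggests), and the file names in the listing are invented for the illustration.

```python
import re

# Pattern as it would be entered in the Filename field.
FILENAME_PATTERN = re.compile(r"TTFILE.*")


def matches_filename(name: str) -> bool:
    # Assumption: the entire file name must match, not just a part of it.
    return FILENAME_PATTERN.fullmatch(name) is not None


# Hypothetical directory listing, invented for this example.
candidates = ["TTFILE", "TTFILE_001.dat", "XTTFILE", "ttfile_002.dat"]
selected = [name for name in candidates if matches_filename(name)]
print(selected)
```

Under these assumptions only TTFILE and TTFILE_001.dat are selected; XTTFILE does not start with TTFILE, and the match is case sensitive.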

Compression

Select the compression type of the source files. This selection determines whether the agent decompresses the files before passing them on in the workflow.

  • No Compression - the agent does not decompress the files.

  • Gzip - the agent decompresses the files using gzip.
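The effect of the two choices can be pictured with a small Python sketch. This is not the agent's implementation; the function name prepare_payload and the use of the option labels as plain strings are assumptions made for the illustration.

```python
import gzip


def prepare_payload(raw: bytes, compression: str) -> bytes:
    """Return the bytes that would be passed on in the workflow (illustrative only)."""
    if compression == "Gzip":
        # The file content is decompressed before it is forwarded.
        return gzip.decompress(raw)
    if compression == "No Compression":
        # The file content is forwarded exactly as it was collected.
        return raw
    raise ValueError(f"unknown compression setting: {compression}")


original = b"record-1\nrecord-2\n"
compressed = gzip.compress(original)

print(prepare_payload(compressed, "Gzip") == original)
print(prepare_payload(compressed, "No Compression") == compressed)
```

Selecting No Compression for a gzipped source file therefore means that the compressed bytes themselves are forwarded.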

Position

The control filename consists of an extension added either before or after the shared filename part. Select one of the choices: Prefix or Suffix.

With Prefix, the text entered in the Control File Extension field is expected before the shared filename part. With Suffix, the text is expected after the shared filename part.

Control File Extension

The text entered in this field is the expected extension of the shared filename. The Control File Extension will be attached to the beginning or the end of the shared filename, depending on the selection made in Position.

Note!

The Control File Extension determines when the data files are collected. For example, if Position is set to Suffix, the extension is set to .ok and a data file is named FILE, then FILE is only collected if the corresponding control file, FILE.ok, exists.

Data File Extension

The Data File Extension is only applicable if Position is set to Suffix.

There can be cases where a stricter definition of which files should be collected is needed. This is defined in the Data File Extension field.

Consider a data file called FILE.dat. If .dat is entered in the Data File Extension field and .ok is entered in the Control File Extension field, the corresponding control file is called FILE.ok.

Note!

Consider a directory containing 5 files:

  • FILE1.dat

  • FILE2.dat

  • FILE1.ok

  • ok.FILE1

  • FILE1

  1. The Position field is set to Prefix and the Control File Extension field is set to "ok." (including the trailing dot).

    The control file is ok.FILE1 and FILE1 will be the file collected.
     

  2. The Position field is set to Suffix and the Control File Extension field is set to .ok.

    The control file is FILE1.ok and FILE1 will be the file collected.
     

  3. The Position field is set to Suffix, the Control File Extension field is set to .ok, and the Data File Extension field is set to .dat.

    The control file is FILE1.ok and FILE1.dat will be the file collected.
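The three cases can be reproduced with a short Python sketch. It is a reading of the rules described in this section, not the agents' actual code: the function names are hypothetical, the control text is assumed to be attached literally to the shared filename part, and for the Prefix case the text is therefore written as ok. so that the resulting control file name is ok.FILE1.

```python
def control_file_name(data_file, position, control_ext, data_ext=""):
    """Return the control file name expected for data_file, or None if it is not eligible."""
    if position == "Prefix":
        # The Data File Extension only applies to Suffix, so it is ignored here.
        return control_ext + data_file
    if position != "Suffix":
        raise ValueError(f"unknown position: {position}")
    if data_ext:
        if not data_file.endswith(data_ext):
            return None  # Not a data file under the stricter definition.
        shared = data_file[: -len(data_ext)]
        return shared + control_ext
    return data_file + control_ext


def collectable(listing, position, control_ext, data_ext=""):
    """Return the files in listing whose control file is also present in listing."""
    present = set(listing)
    result = []
    for name in listing:
        control = control_file_name(name, position, control_ext, data_ext)
        if control is not None and control != name and control in present:
            result.append(name)
    return result


listing = ["FILE1.dat", "FILE2.dat", "FILE1.ok", "ok.FILE1", "FILE1"]

print(collectable(listing, "Prefix", "ok."))          # case 1
print(collectable(listing, "Suffix", ".ok"))          # case 2
print(collectable(listing, "Suffix", ".ok", ".dat"))  # case 3
```

Under these assumptions the sketch selects FILE1 in cases 1 and 2 and FILE1.dat in case 3. FILE2.dat is never selected, because no control file exists for it.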

After collection, the control file is handled in the same way as the collected file is configured to be handled; that is, the system removes, renames, moves or ignores it.

Move to Temporary Directory

If this option is selected, the source files will be moved to the automatically created subdirectory DR_TMP_DIR in the source directory, before collection. This option supports safe collection when source files repeatedly use the same name.
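Why this helps when file names repeat can be sketched on a local file system. The example below is only an illustration under stated assumptions: it uses a local temporary directory instead of a remote host, and the helper name move_to_temporary_directory and the file name FILE1 are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path

TMP_DIR_NAME = "DR_TMP_DIR"


def move_to_temporary_directory(source_file: Path) -> Path:
    """Move source_file into the DR_TMP_DIR subdirectory of its own directory."""
    tmp_dir = source_file.parent / TMP_DIR_NAME
    tmp_dir.mkdir(exist_ok=True)  # Created automatically if it does not exist.
    target = tmp_dir / source_file.name
    shutil.move(str(source_file), str(target))
    return target


with tempfile.TemporaryDirectory() as root:
    source_dir = Path(root)
    data_file = source_dir / "FILE1"
    data_file.write_text("first delivery")

    # The file is staged before collection starts.
    staged = move_to_temporary_directory(data_file)

    # A new file with the same name can arrive without touching the staged copy.
    data_file.write_text("second delivery")

    staged_text = staged.read_text()
    new_text = data_file.read_text()

print(staged_text, "|", new_text)
```

The staged copy keeps the content of the first delivery even though a second file with the same name has been written to the source directory.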

Inactive Source Warning (h)

If this option is selected, a warning message (event) appears in the System Log and Event Area when the configured number of hours has passed without any file being available for collection:

The source has been idle for more than <n> hours,
       the last inserted file is <file>.
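The condition behind this warning can be sketched as follows. The function name and parameters are hypothetical, the comparison assumes that "more than" means strictly more than the configured number of hours, and the message is written on a single line for simplicity.

```python
from datetime import datetime, timedelta
from typing import Optional


def inactive_source_warning(
    last_file: str, last_seen: datetime, now: datetime, threshold_hours: int
) -> Optional[str]:
    """Return the warning text if the source has been idle too long, else None."""
    if now - last_seen > timedelta(hours=threshold_hours):
        return (
            f"The source has been idle for more than {threshold_hours} hours, "
            f"the last inserted file is {last_file}."
        )
    return None


last_seen = datetime(2024, 1, 1, 6, 0)
now = datetime(2024, 1, 1, 12, 0)

print(inactive_source_warning("FILE1", last_seen, now, threshold_hours=5))
print(inactive_source_warning("FILE1", last_seen, now, threshold_hours=8))
```

With six idle hours, a threshold of 5 produces the warning text, while a threshold of 8 produces nothing.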

Move to

If this option is selected, the source files will be moved from the source directory (or from the directory DR_TMP_DIR if using Move to Temporary Directory) to the directory specified in the Destination field, after collection.

Note!

The Destination must be located in the same file system as the collected files at the remote host. Additionally, absolute path names must be defined (relative path names cannot be used).

Rename

If this option is selected, the source files will be renamed after the collection and remain in the source directory from which they were collected (or be moved back there from the directory DR_TMP_DIR, if using Move to Temporary Directory).

Remove

If this option is selected, the source files will be removed from the source directory (or from the directory DR_TMP_DIR, if using Move to Temporary Directory), after the collection.

Ignore

If this option is selected, the source files will remain in the source directory after the collection. This option is not available if Move to Temporary Directory is enabled.

Destination

If the Move to option has been selected, use this field to enter the full path name of the directory on the remote host into which the source files will be moved after the collection. If any of the other After Collection options has been selected, this field is not available.

Prefix and Suffix

If either the Move to or the Rename option has been selected, use these fields to enter the prefix or suffix that will be added to the beginning or end of the source file names, respectively, after the collection. If any of the other After Collection options has been selected, these fields are not available.

Note!

If Rename is enabled, the source files will be renamed in the current (source or DR_TMP_DIR) directory. Ensure that you do not assign a Prefix or Suffix that gives the files new names that still match the Filename regular expression, since that will cause the files to be collected over and over again.
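A quick way to reason about this risk is to test the renamed file name against the Filename expression. The sketch below is an illustration only: it uses Python's re module as a stand-in for Java regular expressions, assumes the whole file name has to match, ignores the date tag added by Keep (days), and the function name and file names are hypothetical.

```python
import re


def rename_causes_recollection(
    filename_pattern: str, original_name: str, prefix: str = "", suffix: str = ""
) -> bool:
    """Return True if the renamed file would still match the Filename expression."""
    new_name = f"{prefix}{original_name}{suffix}"
    return re.fullmatch(filename_pattern, new_name) is not None


# A suffix leaves the beginning of the name unchanged, so TTFILE.* still matches.
print(rename_causes_recollection(r"TTFILE.*", "TTFILE_001", suffix=".done"))

# A prefix changes the beginning of the name, so TTFILE.* no longer matches.
print(rename_causes_recollection(r"TTFILE.*", "TTFILE_001", prefix="done_"))
```

With the Filename expression TTFILE.*, the suffix .done would lead to repeated collection, whereas the prefix done_ would not.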

Keep (days)

If either the Move to or the Rename option has been selected, use this field to enter the number of days to keep moved or renamed source files on the remote host after the collection. The source files are only deleted when the workflow is executed again (scheduled or manually) after the configured number of days has passed. If any of the other After Collection options has been selected, this field is not available.

Note!

A date tag is added to the filename, determining when the file may be removed.

Use File Reference/Route FileReferenceUDR

Select this check box if you want to forward the data to an SQL Loader agent. See the description of SQL Loader(3.0) for more information.