Disk Collection Agent (4.3)

The Disk collection agent collects files from a local file system and inserts them into a workflow. Initially, the source directory is scanned for all files matching the current filter. All files found will be fed one after the other into the workflow.

Configuration

The Disk agent configuration consists of three tabs: Disk, Filename Sequence and Sort Order. For information on Filename Sequence and Sort Order, see Workflow Template (3.0).

Disk Tab

The Disk tab contains settings related to the placement and handling of the source files to be collected by the agent.

Setting / Description

Collection Strategy

From the drop-down list, choose between the Default Collection Strategy and Duplicate Filter. Your selection determines which settings are available on the tab.

Directory

Enter the absolute pathname of the directory on the local file system where the source files reside. You can also enter the pathname relative to the $MZ_HOME environment variable.

Include Subfolders

Select this check box if you have subfolders in the source directory from which you want files to be collected.

If you select Enable Sort Order in the Sort Order tab, the sort order selected also applies to subfolders.

Filename

Enter the name of the source files on the local file system. Regular expressions according to Java syntax apply. For further information, see https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/regex/Pattern.html.

Example - Filenames

To match all filenames beginning with TTFILE, enter: TTFILE.*
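As a minimal sketch of how such a filter behaves, the following Python snippet applies the expression to a list of candidate names. It assumes that the expression has to match the whole filename, and it uses Python's re module as a stand-in for the Java regular expression engine (the two agree for simple patterns like this one). The function name and the sample filenames are hypothetical.

```python
import re

def matching_files(pattern, filenames):
    """Return the names that the filter expression matches in full."""
    compiled = re.compile(pattern)
    return [name for name in filenames if compiled.fullmatch(name)]

# Hypothetical directory listing, for illustration only.
candidates = ["TTFILE001", "TTFILE_2024.dat", "OTHER.dat", "XTTFILE"]
selected = matching_files(r"TTFILE.*", candidates)
print(selected)
```

Names that merely contain TTFILE, such as XTTFILE, are not selected under the whole-name assumption.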

Compression

Select the compression type of the source files. This setting determines if the agent decompresses the files before passing them on in the workflow.

  • No Compression - The agent does not decompress the files. This is the default setting.

  • Gzip - The agent decompresses the files using gzip.

When you select Default Collection Strategy:

Setting / Description

Move to Temporary Directory

If you select this option, the source files are moved to the automatically created subdirectory DR_TMP_DIR in the source directory, prior to collection. This option supports safe collection of a source file reusing the same name.

Append Suffix to Filename

Enter the suffix that you want to add to the file name prior to collecting it.

Important!

Before you execute your workflow, make sure that none of the file names in the collection directory include this suffix.

Inactive Source Warning (hours)

If the value that you enter is greater than zero, and if no file has been collected during the specified number of hours, the following message is logged:

The source has been idle for more than <n> hours, the last inserted file is <file>.
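The condition above can be sketched as a small Python check. This is only an illustration of the documented rule (a value greater than zero, and no collection within that many hours); the function name, its parameters, and the sample timestamps are hypothetical and do not describe the agent's internals.

```python
from datetime import datetime, timedelta

def idle_warning(last_collected, now, threshold_hours, last_file):
    """Return the warning text if the source has been idle too long, else None."""
    if threshold_hours <= 0:
        return None  # the check only applies to values greater than zero
    if now - last_collected > timedelta(hours=threshold_hours):
        return (f"The source has been idle for more than {threshold_hours} hours, "
                f"the last inserted file is {last_file}.")
    return None

# Hypothetical times: the last file arrived six hours before "now".
now = datetime(2024, 1, 2, 12, 0)
message = idle_warning(datetime(2024, 1, 2, 6, 0), now, 4, "TTFILE001")
print(message)
```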

Move to

If you select this option, the source files are moved after collection to the directory specified in the Destination field. The files are moved from the source directory (or its subfolders, if Include Subfolders is selected), or from the directory DR_TMP_DIR if you are using Move to Temporary Directory.

If the Prefix or Suffix fields are set, the file is also renamed.

Note!

If you want to move files between file systems, it is recommended that you route the Disk collection agent directly to a Disk forwarding agent, configuring the output agent to store the files in the required directory. For further information, see Disk Forwarding Agent (4.3).

This is because:

  • It is not always possible to move collected files from one file system to another.

  • Moving files between different file systems usually causes worse performance than having them on the same file system.

  • The workflow will not be transaction safe, because a move between file systems is carried out as a copy followed by a delete.

Destination

Enter the absolute pathname of the directory on the local file system of the EC into which the source files are to be moved after collection. You can also provide the pathname relative to the $MZ_HOME environment variable.

This field is only enabled if Move to is selected.

Rename

If you select this option, the source files are renamed after collection, remaining in the source directory or subfolders from which they were collected (or moved back from the directory DR_TMP_DIR, if you are using Move to Temporary Directory).

Prefix/Suffix

Enter the prefix and/or suffix to be added to the name of the source files after collection.

These fields are only enabled if Move to or Rename is selected.

Note!

If Rename is selected, the renamed files remain in the directory that the agent scans. Make sure that the prefix or suffix you assign does not produce file names that still match the Filename regular expression. Otherwise, the files are collected over and over again.
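One way to check a prefix or suffix before running the workflow is to test the resulting name against the filter. The following is a minimal sketch, assuming that the filter has to match the whole filename and using Python's re module in place of the Java engine. The function name and sample values are hypothetical.

```python
import re

def rename_is_safe(filter_pattern, filename, prefix="", suffix=""):
    """Return True if the renamed file no longer matches the collection filter."""
    renamed = f"{prefix}{filename}{suffix}"
    return re.fullmatch(filter_pattern, renamed) is None

# A suffix leaves the start of the name unchanged, so TTFILE.* still matches
# and the file would be picked up again.
suffix_ok = rename_is_safe(r"TTFILE.*", "TTFILE001", suffix=".done")
# A prefix changes the start of the name, so the filter no longer matches.
prefix_ok = rename_is_safe(r"TTFILE.*", "TTFILE001", prefix="DONE_")
print(suffix_ok, prefix_ok)
```

With a filter such as TTFILE.*, a suffix alone is not enough, whereas a prefix takes the file out of the filter's reach.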

Search and Replace

To apply Search and Replace, select either Move to or Rename.

  • Search: Enter the part of the filename that you want to replace.

  • Replace: Enter the replacement text.

Search and Replace operate on your entries in a way that is similar to the Unix sed utility. The identified filenames are modified and forwarded to the next agent in the workflow.

This functionality also allows you to make advanced filename modifications:

  • Use a regular expression in the Search entry to specify the part of the filename that you want to extract.

    Note!

    A regular expression that fails to match the original file name aborts the workflow.

  • In the Replace entry, enter characters and metacharacters that define the pattern and content of the replacement text.


Search and Replace Examples

To rename the file file1.new to file1.old, use:

  • Search: .new
  • Replace: .old

To rename the file JAN2011_file to file_DONE, use:

  • Search: ([A-Z]*[0-9]*)_([a-z]*)
  • Replace: $2_DONE

Note!

The Search value uses parentheses to divide the file name into two groups. The Replace value inserts the second group by means of the placeholder $2.
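The two examples above can be reproduced with a short Python sketch. Several things here are assumptions rather than documented behavior: Python's re module stands in for the Java engine, Java-style $n references are translated to Python's \g<n> form, only the first match is replaced (as sed does by default), and an exception stands in for the workflow abort on a non-matching expression. The function name is hypothetical.

```python
import re

def search_and_replace(filename, search, replace):
    """Apply a sed-like substitution; raise if the Search expression does not match."""
    if re.search(search, filename) is None:
        # Stand-in for the workflow abort described above.
        raise ValueError(f"{search!r} does not match {filename!r}")
    # Translate Java-style $n group references into Python's \g<n> form.
    py_replace = re.sub(r"\$(\d+)", r"\\g<\1>", replace)
    return re.sub(search, py_replace, filename, count=1)

first = search_and_replace("file1.new", ".new", ".old")
second = search_and_replace("JAN2011_file", "([A-Z]*[0-9]*)_([a-z]*)", "$2_DONE")
print(first, second)
```

Note that the leading dot in .new is a regular expression metacharacter that matches any character, so the expression also matches names such as file1_new.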

Keep (days)

Enter the number of days to keep source files after collection. To delete the source files, the workflow has to be executed again (scheduled or manually), after the number of days that you enter.

Note!

A date tag is added to the filename, determining when the file can be removed. This field is only enabled if you have selected Move to or Rename.

Remove

If you select this option, the source files are removed from the source directory or subfolders (or from the directory DR_TMP_DIR, if you are using Move to Temporary Directory) after collection.

Ignore

If you select this option, the source files remain in the source directory or subfolders after collection.

When you select Duplicate Filter:

Setting / Description

Filename

Select this option to have only the filename checked for duplicates. If the filename is in the list of files that have already been collected once, the file is ignored by the agent.

Filename and Timestamp

Select this option to have both the filename and the time stamp of the last modification checked for duplicates. If the file has already been collected once, it is collected again only if the duplicate check reveals that the file has been updated since the previous collection.

File List Size

Enter a value to determine the maximum size of the list of files already collected. This list of files is compared to the input files to detect duplicates and prevent them from being collected by the agent.

When this collection strategy is used with the multiple server connection strategy, each host has its own duplicate list. If a server is removed from the multiple server configuration, the collection strategy automatically drops the list of duplicates for that host in the next successful collection.

Note!

If the number of files to be collected is greater than the file list size, files older than the oldest file in the list are not collected.
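The difference between the two duplicate checks can be illustrated with a small Python model. This is a hypothetical sketch only: the class, its method, and the eviction of the oldest entry when the list is full are assumptions, not a description of the agent's implementation, and the sketch does not model the rule in the note above about files older than the oldest list entry.

```python
from collections import OrderedDict

class DuplicateFilter:
    """Bounded list of already collected files (hypothetical model)."""

    def __init__(self, max_size, check_timestamp=False):
        self.max_size = max_size                # corresponds to File List Size
        self.check_timestamp = check_timestamp  # Filename and Timestamp mode
        self.seen = OrderedDict()               # filename -> last modification time

    def should_collect(self, filename, modified):
        previous = self.seen.get(filename)
        if previous is not None:
            if not self.check_timestamp:
                return False  # Filename mode: the name was already collected
            if modified <= previous:
                return False  # not updated since the previous collection
        self.seen[filename] = modified
        self.seen.move_to_end(filename)
        while len(self.seen) > self.max_size:
            self.seen.popitem(last=False)  # assumed policy: drop the oldest entry
        return True

by_name = DuplicateFilter(max_size=2)
name_results = [by_name.should_collect("a.dat", 100),
                by_name.should_collect("a.dat", 200)]

by_time = DuplicateFilter(max_size=2, check_timestamp=True)
time_results = [by_time.should_collect("a.dat", 100),
                by_time.should_collect("a.dat", 100),
                by_time.should_collect("a.dat", 200)]
print(name_results, time_results)
```

In Filename mode the updated a.dat is ignored, whereas in Filename and Timestamp mode it is collected again because its modification time has changed.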

Batch Operations

If a beginBatch process is started, any files that are moved will be inaccessible to the Disk collection agent. fileMove operations can be issued in beginBatch processes if the files are not read by any other agent. The fileMove function can be used in endBatch for zero-byte files.

