HDFS Collection Agent Configuration
You open the HDFS collection agent configuration dialog from a workflow configuration. To open it, click Build → New Configuration. Select Workflow from the Configurations dialog. When prompted to Select workflow type, select Batch. Click Add agent and select HDFS from the Collection tab of the Agent Selection dialog.
Part of the configuration can also be done in the Filename Sequence or Sort Order service tab, described in Workflow Template.
HDFS Tab
The HDFS tab contains configurations related to the placement and handling of the source files to be collected by the agent.
HDFS collection agent configuration - HDFS tab, General
Item | Description |
---|---|
Profile | Select the File System profile that you want the agent to use. For further information about this profile, see File System Profile. |
Collection Strategy | If more than one collection strategy is available in the system, a Collection Strategy drop-down list is also displayed. For more information about collection strategies, see Appendix 4 - Collection Strategies. |
File Information Settings | |
Directory | Enter the absolute pathname of the directory on the remote file system where the source files reside. |
Filename | Enter the name of the source files in the directory specified in Directory. Regular expressions according to Java syntax apply. For further information, see http://docs.oracle.com/javase/8/docs/api/java/util/regex/Pattern.html. Example: To match all filenames beginning with |
Compression | Select the compression type of the source files. This determines whether the agent decompresses the files before passing them on in the workflow. |
Before Collection Settings | |
Move to Temporary Directory | If enabled, the source files will be moved to the automatically created subdirectory |
Append Suffix to Filename | Enter the suffix that you want added to the filename before the file is collected. Important: Before you execute your workflow, make sure that none of the filenames in the collection directory include this suffix. |
Inactive Source Warning (hours) | If the specified value is greater than zero, and if no file has been collected during the specified number of hours, the following message is logged: The source has been idle for more than <n> hours, the last inserted file is <file>. |
After Collection Settings | |
Move to | If enabled, the source files will be moved from the source directory (or from the directory If the Prefix or Suffix fields are set, the file will be renamed as well. Note: It is possible to move collected files from one file system to another; however, this has a negative impact on performance. Also, the workflow will not be transaction safe, because of the nature of the If you want to move files between file systems, it is strongly recommended that you route the HDFS collection agent directly to an HDFS forwarding agent, configured to store the files in the desired directory; see HDFS Forwarding Agent. This is recommended for the following reasons: |
Rename | If enabled, the source files will be renamed after the collection, remaining in the source directory from which they were collected (or moved back from the directory |
Remove | If enabled, the source files will be removed from the source directory (or from the directory DR_TMP_DIR, if using Move to Temporary Directory), after the collection. |
Ignore | If enabled, the source files will remain in the source directory after collection. |
Destination | Enter the absolute pathname of the directory on the local file system of the EC into which the source files will be moved after collection. The pathname can also be given relative to the $MZ_HOME environment variable. This field is only enabled if Move to is selected. |
Prefix/Suffix | Enter the prefix and/or suffix to add to the beginning and/or end, respectively, of the names of the source files after collection. These fields are only enabled if Move to or Rename is selected. |
Search and Replace | The Search and Replace fields operate on your entries in a way similar to the Unix sed utility. The identified filenames are modified and forwarded to the next agent in the workflow. This functionality also enables you to perform advanced filename modifications: |
Keep (days) | Specify the number of days to keep the source files after collection. To delete the source files, the workflow must be executed again (scheduled or manually) after the configured number of days. Note: A date tag is added to the filename, which determines when the file can be removed. This field is only enabled if Move to or Rename is selected. |
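The Filename field in the table above takes a Java regular expression. As a minimal sketch of how such a pattern selects files from a directory listing, the Python function below filters a list of names. It assumes that the agent matches the whole filename (as Java's `String.matches` does), which `re.fullmatch` reproduces. Python's `re` syntax differs from Java's `Pattern` in some advanced constructs, so treat this as illustrative only; the function name and the `cdr_` filenames are hypothetical.

```python
import re
from typing import Iterable, List


def select_source_files(names: Iterable[str], filename_regex: str) -> List[str]:
    """Return the names that the Filename pattern matches in full.

    Assumption: the whole filename must match, as with Java's String.matches();
    re.fullmatch gives the same semantics for simple patterns like this one.
    """
    pattern = re.compile(filename_regex)
    return [name for name in names if pattern.fullmatch(name)]


if __name__ == "__main__":
    # Hypothetical directory listing; only names starting with "cdr_" match.
    listing = ["cdr_001.dat", "cdr_002.gz", "archive_cdr_003.dat", "readme.txt"]
    print(select_source_files(listing, r"cdr_.*"))
```

Because the match is anchored to the whole name, `archive_cdr_003.dat` is not selected even though it contains `cdr_`.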
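For the Append Suffix to Filename setting, a pre-flight check can flag files in the collection directory that already end with the configured suffix, which the Important note says must not exist before the workflow runs. The sketch below assumes the directory listing is already available as a list of names; the function name and the `.tmp` suffix are hypothetical.

```python
from typing import Iterable, List


def files_already_suffixed(names: Iterable[str], suffix: str) -> List[str]:
    """Return the filenames that already end with the configured suffix.

    Per the Important note for Append Suffix to Filename, this list should be
    empty before the workflow is executed. An empty suffix means the option is
    not in use, so nothing is flagged.
    """
    if not suffix:
        return []
    return [name for name in names if name.endswith(suffix)]


if __name__ == "__main__":
    # Hypothetical listing and suffix.
    conflicts = files_already_suffixed(["cdr_001.dat", "cdr_002.dat.tmp"], ".tmp")
    if conflicts:
        print("Rename these files before running the workflow:", conflicts)
```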
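The Inactive Source Warning (hours) rule can be sketched as a small function. It assumes the time of the last collected file is known and returns the documented message once more than the configured number of hours have passed without a collection; the function and parameter names are hypothetical, and how the agent actually tracks the last collection is not described here.

```python
from datetime import datetime, timedelta
from typing import Optional


def idle_warning(hours: int, last_collected_at: datetime, last_file: str,
                 now: datetime) -> Optional[str]:
    """Return the idle-source message, or None when no warning is due.

    A value of zero or less disables the check. The warning fires only when
    more than `hours` hours have passed since the last collection.
    """
    if hours <= 0:
        return None
    if now - last_collected_at <= timedelta(hours=hours):
        return None
    return (f"The source has been idle for more than {hours} hours, "
            f"the last inserted file is {last_file}.")


if __name__ == "__main__":
    last = datetime(2024, 3, 1, 8, 0)  # hypothetical time of the last collection
    # Seven hours later with a six-hour threshold, so the message is produced.
    print(idle_warning(6, last, "cdr_001.dat", datetime(2024, 3, 1, 15, 0)))
```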
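The Destination field accepts either an absolute pathname or one relative to `$MZ_HOME`. A minimal sketch of that resolution follows, assuming POSIX-style paths and that a relative entry is simply joined onto `$MZ_HOME`; the function name is hypothetical and the agent's own resolution logic may differ.

```python
import os
import posixpath
from typing import Mapping, Optional


def resolve_destination(destination: str,
                        env: Optional[Mapping[str, str]] = None) -> str:
    """Resolve the Destination field to an absolute POSIX path.

    Assumption: a relative entry is joined onto $MZ_HOME; an absolute entry
    is used as-is (normalized).
    """
    env = os.environ if env is None else env
    if posixpath.isabs(destination):
        return posixpath.normpath(destination)
    mz_home = env.get("MZ_HOME")
    if not mz_home:
        raise ValueError("a relative Destination requires MZ_HOME to be set")
    return posixpath.normpath(posixpath.join(mz_home, destination))


if __name__ == "__main__":
    # Hypothetical values; prints /opt/mz/archive/collected
    print(resolve_destination("archive/collected", {"MZ_HOME": "/opt/mz"}))
```

Passing the environment as a parameter keeps the function testable; by default it reads the real process environment.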
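To illustrate how Prefix/Suffix and Search and Replace might combine into the new name of a collected file, the sketch below assumes that Search and Replace behaves like sed's default `s/search/replace/` (regular expression in the search field, first match only, `\1`-style back-references in the replacement) and that it is applied before the prefix and suffix are added. Both the ordering and the regular-expression dialect are assumptions, not documented agent behavior; the function name is hypothetical.

```python
import re


def collected_file_name(name: str, prefix: str = "", suffix: str = "",
                        search: str = "", replace: str = "") -> str:
    """Build the new name of a collected file (Move to or Rename).

    Assumptions: Search and Replace acts like sed's default s/search/replace/
    (regular expression, first match only, \\1-style back-references), and it
    is applied before the prefix and suffix are added.
    """
    if search:
        name = re.sub(search, replace, name, count=1)
    return f"{prefix}{name}{suffix}"


if __name__ == "__main__":
    # Hypothetical settings: "cdr_20240301.dat" becomes "done_20240301_cdr.dat.old".
    print(collected_file_name("cdr_20240301.dat", prefix="done_", suffix=".old",
                              search=r"^cdr_(\d+)\.dat$", replace=r"\1_cdr.dat"))
```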
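The Keep (days) rule can be sketched as follows. The format of the date tag is not documented here, so the sketch assumes a hypothetical `.YYYYMMDD` tag holding the collection date. As the table states, a file is only removed when the workflow is executed on or after the expiry date; the check below is therefore evaluated per run, and nothing happens between runs. The function names are hypothetical.

```python
from datetime import date, datetime, timedelta

# Hypothetical tag format; the agent's real date-tag format is not documented here.
TAG_FORMAT = "%Y%m%d"


def add_date_tag(name: str, collected_on: date) -> str:
    """Append a hypothetical ".YYYYMMDD" tag recording the collection date."""
    return f"{name}.{collected_on.strftime(TAG_FORMAT)}"


def may_remove(tagged_name: str, keep_days: int, run_date: date) -> bool:
    """Return True if a workflow run on run_date may delete the tagged file.

    The check only happens when the workflow runs; a file whose Keep period
    has expired stays in place until the next execution.
    """
    tag = tagged_name.rsplit(".", 1)[-1]
    collected_on = datetime.strptime(tag, TAG_FORMAT).date()
    return run_date >= collected_on + timedelta(days=keep_days)


if __name__ == "__main__":
    tagged = add_date_tag("cdr_001.dat", date(2024, 3, 1))
    # Keep (days) = 7: a run on 2024-03-08 or later may remove the file.
    print(tagged, may_remove(tagged, 7, date(2024, 3, 8)))
```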