
The HDFS collection agent collects files from an HDFS, the primary distributed storage used by Hadoop applications, and inserts them into a workflow. An HDFS cluster primarily consists of a NameNode, which manages the file system metadata, and DataNodes, which store the actual data. When the agent is activated, the source directory is scanned for all files matching the current filter. The Filename Sequence and Sort Order services may be used to further control which files are matched and in what order, but they cannot be used at the same time; doing so will cause the workflow to abort. All matching files are then fed, one after the other, into the workflow.
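The scan step can be pictured with the standard Hadoop client API. The following is a minimal sketch, not the agent's actual implementation; the NameNode URI, source directory, and filename pattern are hypothetical values chosen for illustration.

    import java.net.URI;
    import java.util.regex.Pattern;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsScanSketch {
        public static void main(String[] args) throws Exception {
            // The client first contacts the NameNode, which holds the
            // file system metadata (assumed URI, for illustration only).
            FileSystem fs = FileSystem.get(
                    URI.create("hdfs://namenode:8020"), new Configuration());

            // Hypothetical source directory and filename filter.
            Path sourceDir = new Path("/data/incoming");
            Pattern filter = Pattern.compile(".*\\.csv");

            // Scan the source directory and keep only matching files.
            for (FileStatus status : fs.listStatus(sourceDir)) {
                if (status.isFile()
                        && filter.matcher(status.getPath().getName()).matches()) {
                    // Each matching file would be fed into the workflow here.
                    System.out.println("Collecting " + status.getPath());
                }
            }
            fs.close();
        }
    }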

When a file has been successfully processed by the workflow, the agent can move, rename, remove, or ignore the original file. The agent can also be configured to keep files for a set number of days. In addition, the agent can decompress compressed (gzip) files after they have been collected. When all files have been successfully processed, the agent stops and awaits the next activation, whether scheduled or manually initiated.
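The post-processing options can likewise be sketched with the Hadoop client API. This outline is illustrative only; in the actual agent these behaviors are configuration options, and the paths and method names below are assumptions.

    import java.io.InputStream;
    import java.util.zip.GZIPInputStream;

    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsAfterCollection {
        // Move the original file into an archive directory after collection.
        static void moveToArchive(FileSystem fs, Path file, Path archiveDir)
                throws Exception {
            fs.rename(file, new Path(archiveDir, file.getName()));
        }

        // Remove the original file after collection.
        static void remove(FileSystem fs, Path file) throws Exception {
            fs.delete(file, false);  // false: do not delete recursively
        }

        // Open a collected gzip file as a decompressed stream.
        static InputStream openDecompressed(FileSystem fs, Path gzFile)
                throws Exception {
            return new GZIPInputStream(fs.open(gzFile));
        }
    }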


This section contains the following subsections:

