8.12 File System Profile

The File System profile is used for making file system specific configurations, and is currently used by the Amazon S3 and HDFS collection and forwarding agents.

Configuration

To create a new File System profile, click the New Configuration button in the upper left part of the Desktop window, and then select File System Profile from the menu. The configuration options vary depending on the selected file system, and each file system is described separately below.

Menus

The contents of the menus in the menu bar may change depending on which configuration type has been opened in the currently displayed tab. The File System profile uses the standard menu items and buttons that are visible for all configurations, and these are described in 2.1 Menus and Buttons.

The Edit menu is specific to the File System profile configurations.

External References

Select this menu item to enable the use of External References in the File System profile configuration. This can be used to configure the following fields:

Amazon S3 file systems

  • Access Key
  • Secret Key
  • Bucket
  • Region
  • Advanced Properties

HDFS file systems

  • Host
  • Port
  • Advanced Properties

For further information, see 8.11.4 Using External Reference in Agent Profile Fields and 8.11 External Reference Profile.

Amazon S3

When selecting Amazon S3 as the file system, you will see two tabs: General and Advanced.

File System profile - Amazon S3 - General tab

General Tab

The following settings are available in the General tab in the File System profile (see screenshot above):

File System Type

Select which file system type this profile should be applied for. Currently only Amazon S3 is available.
Credentials from Environment

Select this check box in order to pick up the credentials from the environment instead of entering them in this profile. If this check box is selected, the Access Key and Secret Key fields will be disabled.

Access Key

Enter the access key for the user who owns the Amazon S3 account in this field.

Secret Key

Enter the secret key for the stated access key in this field.

Region from Environment

Select this check box in order to pick up the region from the environment instead of entering the region in this profile. If this check box is selected, the Region field will be disabled.

Region

Enter the name of the Amazon S3 region in this field.

Bucket

Enter the name of the Amazon S3 bucket in this field.

Use Amazon Profile

Select this check box if you already have an Amazon Profile set up. This disables the fields above and allows you to use the credentials that you have defined in your chosen Amazon Profile.

Advanced Tab

In the Advanced tab, you can configure properties for the Amazon S3 File System client. 

File System profile - Amazon S3 - Advanced tab

For information on how to configure the properties for the Amazon S3 File System client, refer to https://docs.aws.amazon.com/AmazonS3/latest/dev/acl-overview.html#canned-acl.
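The linked page describes Amazon S3 canned ACLs. As an illustration only: if your installation's S3 client is based on Hadoop's S3A connector (an assumption, verify against your product version), a canned ACL could be applied through an advanced property such as:

```properties
# Hypothetical example - the property key assumes the Hadoop S3A client.
# Canned ACL values defined by Amazon S3 include Private, PublicRead,
# AuthenticatedRead, BucketOwnerRead and BucketOwnerFullControl.
fs.s3a.acl.default=BucketOwnerFullControl
```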

HDFS

When selecting HDFS as the file system, you will see two tabs: General and Advanced.

File System profile - HDFS General tab

General Tab

In the General tab you can find the following settings:

File System Type

Select which file system type this profile should be applied for. For an HDFS configuration, select Distributed File System.

Version

Select a version of Hadoop from the drop-down box:

  • Non HA - This version of Hadoop does not support high availability as it has only one NameNode.
  • HA - This version of Hadoop supports high availability.

This setting only applies when you have selected Distributed File System as the File System Type.

Host

Enter the IP address or hostname of the NameNode in this field. See the Apache Hadoop Project documentation for further information about the NameNode.

Port

Enter the port number of the NameNode in this field.

Advanced Tab

The Advanced tab contains Advanced Properties for the configuration of Kerberos authentication.

File System profile - HDFS Advanced tab 

Kerberos is an authentication technology that uses a trusted third party to authenticate one service or user to another. Within Kerberos, this trusted third party is commonly referred to as the Key Distribution Center, or KDC. For HDFS, this means that the HDFS agent authenticates with the KDC using a user principal which must be pre-defined in the KDC. The HDFS cluster must be set up to use Kerberos, and the KDC must contain service principals for the HDFS NameNodes. For information on how to set up an HDFS cluster with Kerberos, see the Hadoop Users Guide at http://www.hadoop.apache.org.

In order to perform authentication towards the KDC without a password, the HDFS agent requires a keytab file.
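The keytab is created with standard Kerberos tools outside of this product. On an MIT Kerberos KDC, this typically looks like the following, where the principal and file path are example values:

```shell
# Run on the KDC host (or via kadmin with admin credentials).
kadmin.local -q "addprinc -randkey mzadmin@EXAMPLE.COM"
kadmin.local -q "ktadd -k /home/mzadmin/keytabs/ex.keytab mzadmin@EXAMPLE.COM"

# Verify the keytab contents:
klist -k /home/mzadmin/keytabs/ex.keytab
```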

You can set the advanced properties in the Advanced Properties dialog to activate and configure Kerberos authentication.

The following advanced properties are related to Kerberos authentication. Refer to the Advanced Properties dialog for examples.

hadoop.security.authentication

Set the value to kerberos to activate Kerberos authentication.

Note!

Due to limitations in the Apache Hadoop client libraries, if you change this property, you may be required to restart the ECs where workflows containing the HDFS agent are going to run.

dfs.namenode.kerberos.principal

This sets the service principal to use for the HDFS NameNode. This must be predefined in the KDC. The service principal is expected to be in the form nn/<host>@<REALM>, where <host> is the host where the service is running and <REALM> is the name (in uppercase) of the Kerberos realm.

java.security.krb5.kdc

This specifies the hostname of the Key Distribution Center.

java.security.krb5.realm

This sets the name of the Kerberos realm. Uppercase only.

dr.kerberos.client.keytabfile

This sets the keytab file to use for authentication. A keytab must be predefined using Kerberos tools. The keytab must be generated for the user principal in dr.kerberos.client.principal. The file path must be on a file system that can be reached from the EC process. The user that launches the EC must also have read permissions for this file.

dr.kerberos.client.principal

This sets the user principal that the HDFS agent authenticates as. This must be predefined in the KDC. User principals are expected to be in the form <user>@<REALM>, where <user> is typically a username and <REALM> is the name (in uppercase) of the Kerberos realm.

sun.security.krb5.debug

Set this value to true to activate debug output for Kerberos.
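The two principal formats described above (nn/<host>@<REALM> for services and <user>@<REALM> for users, with an uppercase realm) can be sanity-checked with a short script. This is an illustrative sketch, not part of the product:

```python
import re

# Service principals: nn/<host>@<REALM>; user principals: <user>@<REALM>.
# The realm must be uppercase, per the properties described above.
SERVICE_RE = re.compile(r"^nn/[\w.\-]+@[A-Z][A-Z0-9.\-]*$")
USER_RE = re.compile(r"^[\w.\-]+@[A-Z][A-Z0-9.\-]*$")

def is_valid_service_principal(principal: str) -> bool:
    return bool(SERVICE_RE.match(principal))

def is_valid_user_principal(principal: str) -> bool:
    return bool(USER_RE.match(principal))
```

A lowercase realm such as mzadmin@example.com fails the check, which matches the "uppercase only" requirement stated for java.security.krb5.realm.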

The following properties are also included in the Advanced tab, but only apply if you have selected the HA version of Hadoop in the General tab:

fs.defaultFS

This sets the HDFS file system path prefix.

dfs.nameservices

This sets the logical name for the name service.

dfs.ha.namenodes.<nameservice ID>

This sets the unique identifiers for each NameNode in the name service.

dfs.namenode.rpc-address.<nameservice ID>.<name node ID>

This sets the fully-qualified RPC address for each NameNode to listen on.

dfs.namenode.http-address.<nameservice ID>.<name node ID>

This sets the fully-qualified HTTP address for each NameNode to listen on.

dfs.client.failover.proxy.provider.<nameservice ID>

This sets the Java class that HDFS clients use to contact the Active NameNode.
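Putting the HA properties together, a minimal example set of values could look as follows in the Advanced Properties. The name service mycluster, the NameNode IDs nn1 and nn2, the hostnames, and the ports (8020 for RPC and 9870 for HTTP are the Hadoop 3 defaults) are all example values; the failover proxy provider class is the standard Hadoop ConfiguredFailoverProxyProvider:

```properties
fs.defaultFS=hdfs://mycluster
dfs.nameservices=mycluster
dfs.ha.namenodes.mycluster=nn1,nn2
dfs.namenode.rpc-address.mycluster.nn1=namenode1.example.com:8020
dfs.namenode.rpc-address.mycluster.nn2=namenode2.example.com:8020
dfs.namenode.http-address.mycluster.nn1=namenode1.example.com:9870
dfs.namenode.http-address.mycluster.nn2=namenode2.example.com:9870
dfs.client.failover.proxy.provider.mycluster=org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
```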

Note!

If you are using Kerberos authentication, it is recommended that you only run the HDFS agents towards one HDFS cluster per EC. This is because the Kerberos client library of HDFS relies on static properties and configurations that are global for the whole JVM. This means that one workflow running the HDFS agents could impact another workflow running the HDFS agents within the same EC process. Due to this limitation, you must also restart the EC for some configuration changes to the Advanced Properties to take effect.


The Advanced Properties can also be configured using External References by following these steps:
 

  1. Create a properties file containing the advanced configurations.

    Example - Properties file with advanced configurations

    ADV_PROP=hadoop.security.authentication\=kerberos\n\ 
     java.security.krb5.kdc\=kdc.example.com\n\ 
     dr.kerberos.client.principal\=mzadmin@EXAMPLE.COM\n\ 
     dr.kerberos.client.keytabfile\=/home/mzadmin/keytabs/ex.keytab

    Note!

    All "=" characters need to be escaped.

  2. Create an External Reference profile pointing out the properties file, and containing a key pair, e.g. "ADV_PROP" and "ADV_PROP".
     
  3. In the workflow containing the agent, open the Workflow Properties and select the Enable External Reference check box.
     
  4. Click the Browse button and select your External Reference profile, and for the HDFS - Advanced Properties field, select either Default or Per Workflow.
     
  5. In the workflow instance table, right-click and select the Enable External Reference option, and enter the key for the properties file, e.g. ADV_PROP, if that is what you used in step 2 above.
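The escaping performed in step 1 can be sketched in code. The following illustrative sketch (not part of the product) builds the contents of the properties file from a set of advanced properties, escaping each "=" and joining the lines with continuation characters as in the example above:

```python
def escape_advanced_properties(key: str, props: dict) -> str:
    # Build the contents of the external-reference properties file:
    # each "=" inside the advanced properties is escaped with a backslash,
    # and lines are joined with backslash-n line continuations, matching
    # the properties-file example in step 1.
    lines = [f"{k}\\={v}" for k, v in props.items()]
    return f"{key}=" + "\\n\\\n ".join(lines)
```

Calling it with the properties from the step 1 example reproduces the same escaped ADV_PROP entry.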