8.12 File System Profile
The File System profile is used to make file system-specific configurations. It is currently used by the Amazon S3 collection and forwarding agents and by the HDFS agents.
Configuration
To create a new File System profile, click the New Configuration button in the upper left part of the Desktop window, and then select File System Profile from the menu. The configurations will vary depending on the selected file system, and each file system will be described separately below.
Menus
The contents of the menus in the menu bar may change depending on which configuration type has been opened in the currently displayed tab. The File System profile uses the standard menu items and buttons that are visible for all configurations; these are described in 2.1 Menus and Buttons.
The Edit menu is specific to File System profile configurations.
Item | Description |
---|---|
External References | Select this menu item to enable the use of External References in the File System profile configuration. This can be used to configure the fields for Amazon S3 file systems and HDFS file systems. For further information, see 8.11.4 Using External Reference in Agent Profile Fields and 8.11 External Reference Profile. |
Amazon S3
When you select Amazon S3 as the file system, two tabs are displayed: General and Advanced.
File System profile - Amazon S3 - General tab
General Tab
The following settings are available in the General tab in the File System profile (see screenshot above):
Setting | Description |
---|---|
File System Type | Select the file system type that this profile applies to. For the settings described in this section, select Amazon S3. |
Credentials from Environment | Select this check box in order to pick up the credentials from the environment instead of entering them in this profile. If this check box is selected, the Access Key and Secret Key fields will be disabled. |
Access Key | Enter the access key for the user who owns the Amazon S3 account in this field. |
Secret Key | Enter the secret key for the stated access key in this field. |
Region from Environment | Select this check box in order to pick up the region from the environment instead of entering the region in this profile. If this check box is selected, the Region field will be disabled. |
Region | Enter the name of the Amazon S3 region in this field. |
Bucket | Enter the name of the Amazon S3 bucket in this field. |
Use Amazon Profile | Select this check box if you already have an Amazon Profile set up. This disables the fields above and lets you use the credentials defined in the selected Amazon Profile. |
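The check boxes above determine where the credentials come from, and their precedence can be modeled in a few lines. The Java sketch below is a hypothetical illustration, not the product's implementation: the class and method names are invented, and it assumes that "the environment" means the standard AWS environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY.

```java
import java.util.Map;

/**
 * Hypothetical model of how the Amazon S3 General tab settings select a
 * credentials source. Not the product's implementation.
 */
public class S3CredentialSource {

    enum Source { AMAZON_PROFILE, ENVIRONMENT, PROFILE_FIELDS }

    // Assumption: the standard AWS environment variable names.
    static final String ENV_ACCESS_KEY = "AWS_ACCESS_KEY_ID";
    static final String ENV_SECRET_KEY = "AWS_SECRET_ACCESS_KEY";

    /** Use Amazon Profile disables the other fields, so it is checked first. */
    static Source resolve(boolean useAmazonProfile, boolean credentialsFromEnvironment) {
        if (useAmazonProfile) {
            return Source.AMAZON_PROFILE;
        }
        // Credentials from Environment disables Access Key and Secret Key.
        return credentialsFromEnvironment ? Source.ENVIRONMENT : Source.PROFILE_FIELDS;
    }

    /** True if both assumed environment variables are present and non-blank. */
    static boolean environmentHasCredentials(Map<String, String> env) {
        return notBlank(env.get(ENV_ACCESS_KEY)) && notBlank(env.get(ENV_SECRET_KEY));
    }

    private static boolean notBlank(String s) {
        return s != null && !s.isBlank();
    }

    public static void main(String[] args) {
        Source source = resolve(false, true);
        System.out.println("Credentials source: " + source);
        if (source == Source.ENVIRONMENT && !environmentHasCredentials(System.getenv())) {
            System.out.println("Warning: " + ENV_ACCESS_KEY + " or " + ENV_SECRET_KEY + " is not set");
        }
    }
}
```

Passing the environment as a Map keeps the check testable without changing the real process environment.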
Advanced Tab
In the Advanced tab, you can configure properties for the Amazon S3 File System client.
File System profile - Amazon S3 - Advanced tab
For information on how to configure the properties for the Amazon S3 File System client, please refer to https://docs.aws.amazon.com/AmazonS3/latest/dev/acl-overview.html#canned-acl.
HDFS
When you select HDFS as the file system, two tabs are displayed: General and Advanced.
File System profile - HDFS General tab
The General Tab
In the General tab you can find the following settings:
Field | Description |
---|---|
File System Type | Select the file system type that this profile applies to. For the settings described in this section, select HDFS. |
Version | Select the version of Hadoop from the drop-down list. This setting only applies when you have selected Distributed File System as the File System Type. |
Host | Enter the IP address or hostname of the NameNode in this field. See the Apache Hadoop Project documentation for further information about the NameNode. |
Port | Enter the port number of the NameNode in this field. |
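Together, the Host and Port fields identify the NameNode. As a minimal sketch, assuming the NameNode is addressed with a standard hdfs://<host>:<port> URI, the two fields can be validated and combined as shown below; the class name and the example port 8020 are illustrative assumptions.

```java
import java.net.URI;
import java.net.URISyntaxException;

/**
 * Sketch: combine the Host and Port fields of the HDFS General tab into an
 * hdfs:// NameNode URI. Class and method names are hypothetical.
 */
public class NameNodeUri {

    static URI fromHostAndPort(String host, int port) {
        if (host == null || host.isBlank()) {
            throw new IllegalArgumentException("Host must not be empty");
        }
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("Port out of range: " + port);
        }
        try {
            // Arguments: scheme, userInfo, host, port, path, query, fragment.
            return new URI("hdfs", null, host.trim(), port, null, null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("Invalid host: " + host, e);
        }
    }

    public static void main(String[] args) {
        // Example values only; 8020 is a commonly used NameNode RPC port.
        System.out.println(fromHostAndPort("namenode.example.com", 8020));
    }
}
```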
The Advanced Tab
The Advanced tab contains the Advanced Properties, including those used to configure Kerberos authentication.
File System profile - HDFS Advanced tab
Kerberos is an authentication technology that uses a trusted third party to authenticate one service or user to another. Within Kerberos, this trusted third party is commonly referred to as the Key Distribution Center, or KDC. For HDFS, this means that the HDFS agent authenticates with the KDC using a user principal that must be predefined in the KDC. The HDFS cluster must be set up to use Kerberos, and the KDC must contain service principals for the HDFS NameNodes. For information on how to set up an HDFS cluster with Kerberos, see the Hadoop Users Guide at http://www.hadoop.apache.org.
To authenticate with the KDC without a password, the HDFS agent requires a keytab file.
You can set the advanced properties in the Advanced Properties dialog to activate and configure Kerberos authentication.
The following advanced properties are related to Kerberos authentication. Refer to the Advanced Properties dialog for examples.
Property | Description |
---|---|
hadoop.security.authentication | Set the value to kerberos to activate Kerberos authentication. Note! Due to limitations in the Apache Hadoop client libraries, if you change this property, you may need to restart the ECs where workflows containing the HDFS agent are going to run. |
dfs.namenode.kerberos.principal | This sets the service principal to use for the HDFS NameNode. This must be predefined in the KDC. The service principal is expected to be in the form of nn/<host>@<REALM> where <host> is the host where the service is running and <REALM> is the name (in uppercase) of the Kerberos realm. |
java.security.krb5.kdc | This specifies the hostname of the Key Distribution Center. |
java.security.krb5.realm | This sets the name of the Kerberos realm. Uppercase only. |
dr.kerberos.client.keytabfile | This sets the keytab file to use for authentication. The keytab must be predefined using Kerberos tools and must be generated for the user principal in dr.kerberos.client.principal. The file path must be on a file system that can be reached from the EC process, and the user that launches the EC must have read permission for this file. |
dr.kerberos.client.principal | This sets the user principal that the HDFS agent authenticates as. This must be predefined in the KDC. User principals are expected to be in the form of <user>@<REALM> where <user> is typically a username and <REALM> is the name (in uppercase) of the Kerberos realm. |
sun.security.krb5.debug | Set this value to true to activate debug output for Kerberos. |
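Several rules in the table above can be checked before a workflow is started: the value of hadoop.security.authentication, the uppercase realm name, the two principal forms, and a readable keytab. The Java class below is a hypothetical pre-flight check written from those rules only. It is not part of the product and does not contact the KDC.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical pre-flight check of the Kerberos advanced properties (not product code). */
public class KerberosPropsCheck {

    // nn/<host>@<REALM>; group 1 captures the realm.
    static final Pattern SERVICE_PRINCIPAL = Pattern.compile("nn/[^/@\\s]+@([^@\\s]+)");
    // <user>@<REALM>; group 1 captures the realm.
    static final Pattern USER_PRINCIPAL = Pattern.compile("[^/@\\s]+@([^@\\s]+)");

    static List<String> check(Properties p) {
        List<String> problems = new ArrayList<>();
        if (!"kerberos".equals(p.getProperty("hadoop.security.authentication"))) {
            problems.add("hadoop.security.authentication is not kerberos");
        }
        if (p.getProperty("java.security.krb5.kdc", "").isEmpty()) {
            problems.add("java.security.krb5.kdc is not set");
        }
        if (!isUppercase(p.getProperty("java.security.krb5.realm", ""))) {
            problems.add("java.security.krb5.realm must be a non-empty uppercase name");
        }
        if (!matchesWithUppercaseRealm(SERVICE_PRINCIPAL,
                p.getProperty("dfs.namenode.kerberos.principal", ""))) {
            problems.add("dfs.namenode.kerberos.principal is not of the form nn/<host>@<REALM>");
        }
        if (!matchesWithUppercaseRealm(USER_PRINCIPAL,
                p.getProperty("dr.kerberos.client.principal", ""))) {
            problems.add("dr.kerberos.client.principal is not of the form <user>@<REALM>");
        }
        // The keytab must exist and be readable by the user running this JVM (the EC).
        String keytab = p.getProperty("dr.kerberos.client.keytabfile", "");
        if (keytab.isEmpty() || !Files.isReadable(Paths.get(keytab))) {
            problems.add("dr.kerberos.client.keytabfile is missing or not readable");
        }
        return problems;
    }

    private static boolean matchesWithUppercaseRealm(Pattern pattern, String value) {
        Matcher m = pattern.matcher(value);
        return m.matches() && isUppercase(m.group(1));
    }

    private static boolean isUppercase(String s) {
        return !s.isEmpty() && s.equals(s.toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        Properties p = new Properties();
        p.setProperty("hadoop.security.authentication", "kerberos");
        p.setProperty("java.security.krb5.kdc", "kdc.example.com");
        p.setProperty("java.security.krb5.realm", "EXAMPLE.COM");
        p.setProperty("dfs.namenode.kerberos.principal", "nn/namenode.example.com@EXAMPLE.COM");
        p.setProperty("dr.kerberos.client.principal", "mzadmin@EXAMPLE.COM");
        p.setProperty("dr.kerberos.client.keytabfile", "/home/mzadmin/keytabs/ex.keytab");
        check(p).forEach(System.out::println);
    }
}
```

An empty result only means that the values are well formed; whether the principals actually exist in the KDC can only be confirmed by Kerberos itself.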
The following properties are also included in the Advanced tab, but only apply if you have selected the HA version of Hadoop in the General tab:
Property | Description |
---|---|
fs.defaultFS | This sets the HDFS file system path prefix. |
dfs.nameservices | This sets the logical name for the name services. |
dfs.ha.namenodes.<nameservice ID> | This sets the unique identifiers for each NameNode in the name service. |
dfs.namenode.rpc-address.<nameservice ID>.<name node ID> | This sets the fully-qualified RPC address for each NameNode to listen on. |
dfs.namenode.http-address.<nameservice ID>.<name node ID> | This sets the fully-qualified HTTP address for each NameNode to listen on. |
dfs.client.failover.proxy.provider.<nameservice ID> | This sets the Java class that HDFS clients use to contact the active NameNode. |
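The HA properties refer to each other through the nameservice ID and the NameNode IDs, so they are easiest to keep consistent when generated together. The sketch below assumes a nameservice called mycluster with two NameNodes, nn1 and nn2, and uses ConfiguredFailoverProxyProvider, the failover proxy provider described in the Apache Hadoop HDFS High Availability documentation. The helper names, host names, and ports are example values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

/**
 * Sketch: generate the HA-related advanced properties for one logical
 * nameservice. Helper names and all example values are hypothetical.
 */
public class HdfsHaProperties {

    /** One NameNode in the nameservice (hypothetical helper type). */
    static final class NameNode {
        final String id;
        final String rpcAddress;  // host:port
        final String httpAddress; // host:port

        NameNode(String id, String rpcAddress, String httpAddress) {
            this.id = id;
            this.rpcAddress = rpcAddress;
            this.httpAddress = httpAddress;
        }
    }

    static final String PROXY_PROVIDER =
            "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider";

    static Properties build(String nameservice, List<NameNode> nameNodes) {
        Properties p = new Properties();
        p.setProperty("fs.defaultFS", "hdfs://" + nameservice);
        p.setProperty("dfs.nameservices", nameservice);

        List<String> ids = new ArrayList<>();
        for (NameNode nn : nameNodes) {
            ids.add(nn.id);
            p.setProperty("dfs.namenode.rpc-address." + nameservice + "." + nn.id, nn.rpcAddress);
            p.setProperty("dfs.namenode.http-address." + nameservice + "." + nn.id, nn.httpAddress);
        }
        // Every ID listed here has matching rpc-address and http-address entries above.
        p.setProperty("dfs.ha.namenodes." + nameservice, String.join(",", ids));
        p.setProperty("dfs.client.failover.proxy.provider." + nameservice, PROXY_PROVIDER);
        return p;
    }

    public static void main(String[] args) {
        Properties p = build("mycluster", List.of(
                new NameNode("nn1", "nn1.example.com:8020", "nn1.example.com:9870"),
                new NameNode("nn2", "nn2.example.com:8020", "nn2.example.com:9870")));
        p.stringPropertyNames().stream().sorted()
                .forEach(k -> System.out.println(k + "=" + p.getProperty(k)));
    }
}
```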
Note!
If you are using Kerberos authentication, it is recommended that you only run the HDFS agents toward one HDFS cluster per EC. This is because the Kerberos client library of HDFS relies on static properties and configurations that are global for the whole JVM. This means that one workflow running the HDFS agents could impact another workflow running the HDFS agents within the same EC process. Due to this limitation, you must also restart the EC for some configuration changes to the Advanced Properties.
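The JVM-wide scope described in the note can be illustrated with plain system properties. In this hypothetical sketch, two threads stand in for two workflows running in the same EC. It makes no Hadoop or Kerberos calls; it only shows that a value such as java.security.krb5.realm set for one workflow is visible to, and can be overwritten by, another.

```java
/**
 * Illustration only (no Hadoop or Kerberos calls): JVM system properties are
 * global to the process, so every thread, and therefore every workflow in the
 * same EC, shares them.
 */
public class JvmGlobalKerberosConfig {

    static final String REALM_KEY = "java.security.krb5.realm";

    /**
     * Runs simulated workflow A, then workflow B, in the same JVM. Returns the
     * realm that workflow B observes before it sets its own.
     */
    static String realmSeenBySecondWorkflow(String realmA, String realmB) {
        String[] seenByB = new String[1];

        Thread workflowA = new Thread(() -> System.setProperty(REALM_KEY, realmA));
        Thread workflowB = new Thread(() -> {
            seenByB[0] = System.getProperty(REALM_KEY); // A's value is visible to B
            System.setProperty(REALM_KEY, realmB);      // B overwrites it for everyone
        });

        try {
            workflowA.start();
            workflowA.join();
            workflowB.start();
            workflowB.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while simulating workflows", e);
        }
        return seenByB[0];
    }

    public static void main(String[] args) {
        String seen = realmSeenBySecondWorkflow("CLUSTER-A.EXAMPLE.COM", "CLUSTER-B.EXAMPLE.COM");
        System.out.println("Workflow B started with realm " + seen);
        System.out.println("Realm now visible to all workflows: " + System.getProperty(REALM_KEY));
    }
}
```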
The Advanced Properties can also be configured using External References by following these steps:
1. Create a properties file containing the advanced configurations.
Example - Properties file with advanced configurations
ADV_PROP=hadoop.security.authentication\=kerberos\n\
 java.security.krb5.kdc\=kdc.example.com\n\
 dr.kerberos.client.principal\=mzadmin@EXAMPLE.COM\n\
 dr.kerberos.client.keytabfile\=/home/mzadmin/keytabs/ex.keytab
Note!
All "=" characters in the property value must be escaped with a backslash (\=).
2. Create an External Reference profile that points to the properties file and contains a key pair, e.g. "ADV_PROP" and "ADV_PROP".
3. In the workflow containing the agent, open the Workflow Properties and select the Enable External Reference check box.
4. Click the Browse button and select your External Reference profile. For the HDFS - Advanced Properties field, select either Default or Per Workflow.
5. In the workflow instance table, right-click and select the Enable External Reference option, then enter the key for the properties file, e.g. ADV_PROP if that is what you used in step 2.
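The properties file in the first step can also be generated and checked programmatically. The sketch below, with hypothetical helper names, writes an entry in the same format as the example (escaped "=", a \n between entries, and a trailing backslash for line continuation) and reads it back with java.util.Properties to confirm that the value expands into the individual advanced properties. It assumes that keys and values contain no backslashes or line breaks.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

/**
 * Sketch: write and read an External Reference entry that carries several
 * HDFS advanced properties in one value. Helper names are hypothetical.
 * Assumes keys and values contain no backslashes or line breaks.
 */
public class AdvPropEntry {

    /** Builds "KEY=..." with every "=" inside the value escaped as "\=". */
    static String encode(String key, Map<String, String> advanced) {
        StringBuilder sb = new StringBuilder(key).append('=');
        boolean first = true;
        for (Map.Entry<String, String> e : advanced.entrySet()) {
            if (!first) {
                // Literal \n (entry separator), then \ + line break (continuation line).
                sb.append("\\n\\\n ");
            }
            first = false;
            sb.append((e.getKey() + "=" + e.getValue()).replace("=", "\\="));
        }
        return sb.append('\n').toString();
    }

    /** Loads the file text, then parses the value of key as its own properties. */
    static Properties decode(String fileText, String key) {
        try {
            Properties outer = new Properties();
            outer.load(new StringReader(fileText));
            Properties inner = new Properties();
            inner.load(new StringReader(outer.getProperty(key, "")));
            return inner;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> adv = new LinkedHashMap<>();
        adv.put("hadoop.security.authentication", "kerberos");
        adv.put("java.security.krb5.kdc", "kdc.example.com");
        adv.put("dr.kerberos.client.principal", "mzadmin@EXAMPLE.COM");
        adv.put("dr.kerberos.client.keytabfile", "/home/mzadmin/keytabs/ex.keytab");

        String text = encode("ADV_PROP", adv);
        System.out.print(text);
        System.out.println(decode(text, "ADV_PROP").getProperty("java.security.krb5.kdc"));
    }
}
```

With the four example properties, encode produces exactly the ADV_PROP entry shown in step 1.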