The File System profile is used for making file system-specific configurations. It is currently used by the Amazon S3 collection and forwarding agents.
Configuration
To create a new File System profile, click the New Configuration button in the Configuration dialog available from Build View, and then select File System Profile from the menu. The configurations vary depending on the selected file system, and each file system is described separately below.
...
Buttons
The contents of the menus in the menu button bar may change depending on which configuration type has been opened in the currently displayed tab. The File System profile uses the standard menu items and buttons that are visible for all configurations, and these are described in 2.1 Menus and in Common Configuration Buttons.
The Edit menu is specific for the File System profile configurations.
Item | Description |
---|---|
External References | Select this menu item to enable the use of External References in the File System profile configuration. This can be used to configure the fields for Amazon S3, HDFS, and GCP file systems. For further information, see 8.10.4 Using External Reference in Agent Profile Fields and 8.10 External Reference Profile. |
Amazon S3
When selecting Amazon S3 as a file system, you will see two tabs: General and Advanced.
File System profile - Amazon S3 - General tab
General Tab
The following settings are available in the General tab in the File System profile (see screenshot above):
Setting | Description |
---|---|
File System Type | Select which file system type this profile should be applied for. Currently, only Amazon S3 is available. |
Credentials Settings | |
Credentials from Environment | Select this check box in order to pick up the credentials from the environment instead of entering them in this profile. If this check box is selected, the Access Key and Secret Key fields will be disabled. |
Access Key | Enter the access key for the user who owns the Amazon S3 account in this field. |
Secret Key | Enter the secret key for the stated access key in this field. |
Location Settings | |
Region from Environment | Select this check box in order to pick up the region from the environment instead of entering the region in this profile. If this check box is selected, the Region field will be disabled. |
Region | Enter the name of the Amazon S3 region in this field. |
Bucket | Enter the name of the Amazon S3 bucket in this field. |
Use Amazon Profile | Select this check box if you already have an Amazon Profile set up. This disables the fields above and allows you to use the credentials that you have defined in your chosen Amazon Profile. |
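The interaction between the Credentials from Environment and Region from Environment check boxes and the manually entered fields can be sketched as follows. This is only an illustration: the `profile` dictionary and its keys are hypothetical, and the environment variable names `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, and `AWS_REGION` are the ones commonly read by AWS SDKs. Whether the profile reads exactly these variables is an assumption that is not confirmed by this page.

```python
import os


def resolve_s3_settings(profile, env=os.environ):
    """Return (access_key, secret_key, region) for a hypothetical profile dict.

    When a "from environment" flag is set, the matching profile fields are
    ignored, mirroring how the check boxes disable the fields in the UI.
    """
    if profile.get("credentials_from_environment"):
        # Assumed variable names; these are the ones AWS SDKs commonly use.
        access_key = env.get("AWS_ACCESS_KEY_ID")
        secret_key = env.get("AWS_SECRET_ACCESS_KEY")
    else:
        access_key = profile.get("access_key")
        secret_key = profile.get("secret_key")

    if profile.get("region_from_environment"):
        region = env.get("AWS_REGION")
    else:
        region = profile.get("region")

    return access_key, secret_key, region
```

Passing the environment as a parameter keeps the sketch easy to try with a plain dictionary instead of the real process environment.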
Advanced Tab
In the Advanced tab, you can configure advanced properties for the Amazon S3 File System client.
File System profile - Amazon S3 - Advanced tab
For information on how to configure the properties for Amazon S3 File System client, please refer to https://docs.aws.amazon.com/AmazonS3/latest/dev/acl-overview.html#canned-acl.
...
GCP Storage
When selecting GCP as a file system, you will see one tab called General.
File System profile - GCP - General tab
...
In the General tab you can find the following settings:
...
File System Type
...
Select a version of Hadoop from the drop-down box:
- Non HA - This version of Hadoop does not support high availability as it has only one NameNode.
- HA - This version of Hadoop supports high availability.
This setting only applies when you have selected Distributed File System as the File System Type.
...
Host
...
Enter the IP address or hostname of the NameNode in this field. See the Apache Hadoop Project documentation for further information about the NameNode.
...
Port
...
Enter the port number of the NameNode in this field.
The Advanced Tab
The Advanced tab contains Advanced Properties for the configuration of Kerberos authentication.
File System profile - HDFS Advanced tab
Kerberos is an authentication technology that uses a trusted third party to authenticate one service or user to another. Within Kerberos, this trusted third party is commonly referred to as the Key Distribution Center, or KDC. For HDFS, this means that the HDFS agent authenticates with the KDC using a user principal which must be pre-defined in the KDC. The HDFS cluster must be set up to use Kerberos, and the KDC must contain service principals for the HDFS NameNodes. For information on how to set up an HDFS cluster with Kerberos, see the Hadoop Users Guide at http://www.hadoop.apache.org.
In order to perform authentication towards the KDC without a password, the HDFS agent requires a keytab file.
You can set the advanced properties in the Advanced Properties dialog to activate and configure Kerberos authentication.
The following advanced properties are related to Kerberos authentication. Refer to the Advanced Properties dialog for examples.
...
Set the value to kerberos to activate Kerberos authentication.
Note |
---|
Due to limitations in the Apache Hadoop client libraries, if you change this property, you may be required to restart the ECs where workflows containing the HDFS agent are going to run. |
...
The following properties are also included in the Advanced tab, but only apply if you have selected the HA version of Hadoop in the General tab:
...
Note |
---|
If you are using Kerberos authentication, it is recommended that you only run the HDFS agents toward one HDFS cluster per EC. This is because the Kerberos client library of HDFS relies on static properties and configurations that are global for the whole JVM. This means that one workflow running the HDFS agents could impact another workflow running the HDFS agents within the same EC process. Due to this limitation, you must also restart the EC for some configuration changes to the Advanced Properties. |
...
Create a properties file containing the advanced configurations.
...
Example - Properties file with advanced configurations

ADV_PROP=hadoop.security.authentication\=kerberos\n\
java.security.krb5.kdc\=kdc.example.com\n\
dr.kerberos.client.principal\=mzadmin@EXAMPLE.COM\n\
dr.kerberos.client.keytabfile\=/home/mzadmin/keytabs/ex.keytab
Note |
---|
All "=" characters in the ADV_PROP value need to be escaped with a backslash (\=). |
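The escaping rule can be illustrated with a small helper that builds an ADV_PROP entry from a set of key/value pairs. This is a minimal sketch, not a tool shipped with the product: the function name `build_adv_prop` is hypothetical, and the output format (escaped "=" characters, with each property followed by `\n\` and a line break) is inferred only from the example properties file above.

```python
def build_adv_prop(properties):
    """Build an ADV_PROP entry from a dict of advanced properties.

    Every "=" is escaped as "\\=", and the properties are joined with a
    literal "\\n", a trailing backslash, and a line break, matching the
    layout of the example properties file.
    """
    escaped = []
    for key, value in properties.items():
        # Escape "=" inside the value as well as the key/value separator.
        safe_value = str(value).replace("=", "\\=")
        escaped.append(f"{key}\\={safe_value}")
    return "ADV_PROP=" + "\\n\\\n".join(escaped)


if __name__ == "__main__":
    print(build_adv_prop({
        "hadoop.security.authentication": "kerberos",
        "java.security.krb5.kdc": "kdc.example.com",
    }))
```

Generating the entry this way avoids the most likely manual mistake, which is forgetting to escape one of the "=" characters.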
...
The following settings are available in the General tab in the File System profile:
Setting | Description |
---|---|
Environment-Provided Service Account | Select this option if you are configuring an environment-provided service account with this profile. This will disable the Input Option and Credentials File fields. |
Input Option | Use this option to select how the GCP connection credentials are acquired. The available options are to select a JSON File or to fill in the information in a Form. |
Credentials File | Enter the path to the delegated credentials file. This option is visible only when the JSON File option is selected as the input option. |
Import Credentials from File | This button allows for credentials to be imported from a locally stored file. This option is visible only when the Form input option is selected. |
Project ID | Enter the project ID. This option is visible only when the Form input option is selected. |
Private Key ID | Enter the private key ID. This option is visible only when the Form input option is selected. |
Client Email | Enter the client email address. This option is visible only when the Form input option is selected. |
Client ID | Enter the client ID. This option is visible only when the Form input option is selected. |
Other Information | Enter other information that might be used with this profile. This option is visible only when the Form input option is selected. |
Location Settings | |
Bucket | Enter the target bucket name. |
Use GCP Profile | Select this check box if you already have a GCP Profile set up. This disables the fields above and allows you to use the credentials that you have defined in your chosen GCP Profile. |
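The Form fields correspond closely to entries found in a standard GCP service account key file in JSON format. The sketch below reads such a file's content and picks out the values for the Form fields. The function name and the mapping from field labels to JSON keys (`project_id`, `private_key_id`, `client_email`, `client_id`) are assumptions based on the usual key file layout; how the Import Credentials from File button actually parses the file is not described on this page.

```python
import json

# Assumed mapping from Form field labels to service account key file entries.
FORM_FIELDS = {
    "Project ID": "project_id",
    "Private Key ID": "private_key_id",
    "Client Email": "client_email",
    "Client ID": "client_id",
}


def form_values_from_key_file(text):
    """Extract the Form field values from the JSON text of a key file."""
    key = json.loads(text)
    missing = [name for name in FORM_FIELDS.values() if name not in key]
    if missing:
        raise ValueError("Key file is missing: " + ", ".join(missing))
    return {label: key[name] for label, name in FORM_FIELDS.items()}
```

A check like this can be useful before importing a file, since a key file that lacks one of these entries would leave a Form field empty.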
HDFS
When selecting HDFS as a file system, you will see two tabs – General and Advanced.
File System profile - HDFS - General tab
The following settings are available in the General tab in the File System profile:
Setting | Description |
---|---|
General Settings | |
Hadoop Mode | Select the Hadoop mode that you want to use. Both NON-HA and HA are available. |
Name Node Settings | |
Host | Enter the Hadoop name node host. This option is visible only when the NON-HA Hadoop Mode is selected. |
Port | Enter the Hadoop port number. This option is visible only when the NON-HA Hadoop Mode is selected. |
Replication | Enter the desired number of replicas. |
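In NON-HA mode, the Host and Port values together identify a single NameNode. As a rough illustration, the sketch below validates the two values and combines them into an `hdfs://host:port` URI, which is the usual form of a Hadoop file system address. The function name is hypothetical, and whether the profile builds the address in exactly this way is an assumption.

```python
def namenode_uri(host, port):
    """Combine a NameNode host and port into an hdfs:// URI."""
    host = host.strip()
    port = int(port)
    if not host:
        raise ValueError("Host must not be empty")
    if not 1 <= port <= 65535:
        raise ValueError("Port must be between 1 and 65535")
    return f"hdfs://{host}:{port}"
```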
The Advanced tab allows for advanced properties to be configured in the profile.