...

The

...

GCP Profile is used for

...

Amazon S3 collection agent
Amazon S3 forwarding agent
GCP Storage collection agent
GCP Storage forwarding agent
HDFS collection agent
HDFS forwarding agent
System Importer
System Exporter

The configuration options vary depending on the selected file system, and each file system is described separately below.

Menus

The External Reference button is specific for the File System profile configurations.

...

setting up the access credentials and properties to be used to connect to a Google Cloud Platform service. Currently, the profile can be used with the following profiles and agents:

Menus

The contents of the menus in the menu bar may change depending on which configuration type has been opened in the currently displayed tab. The GCP Profile uses the standard menu items and buttons that are visible for all configurations, and these are described in Build View (4.2).

The Edit menu is specific for the GCP Profile configurations.

Item	Description
External References	Select this menu item to enable the use of External References in the

...

GCP Profile configuration. This can be used to configure the following fields:

Amazon S3 file systems

Access Key
Secret Key
Bucket
Region
Advanced Properties

GCP Storage file systems

Use JSON File

Credentials File

...

Bucket

Form

Project Id
Private Key Id
Private Key
Client Email
Client Id
Other Information

...

Bucket

HDFS file systems

Host
Port
Advanced Properties
Replication

Git

When selecting Git as a file system, you will see the General tab.

...

General Tab

The following settings are available in the General tab in the Git File System profile:

Setting

Description

Repository URL

The URL to the repository.

Token

Token to access the repository. This field is optional.

Use Secrets Profile

Select the checkbox to use a Secrets Profile to get the Token.

Get Branches

Click this button to fetch the branches from the repository. If the connection is working the Branch combo box will be populated. If the connection fails, an error dialog will be shown.

Branch

Select the branch to use.

For further information, see External Reference (4.2).

Note!

It is not possible to create a new branch using Usage Engine. The branch must already exist in the repository specified in the Repository URL.

Preview Repository

Click this button to browse the folders in the repository. This is only possible once the configuration has been saved.

Note!

When you do a Save As operation, the remote repository is cloned to the Platform, which may take some time. The clone directory is $MZHOME/gitrepos by default. It can be changed by setting the property mz.git.basePath to another path that is accessible from the Platform.

It is not possible to change the Repository URL or branch once the configuration is saved.
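The default and override described in the note above can be sketched as follows. This is only an illustration of the lookup order, assuming the properties are available as a plain dictionary; the function name and the dictionary are hypothetical and not part of the product.

```python
from pathlib import Path


def git_base_path(properties: dict, mz_home: str) -> Path:
    """Return the directory that repositories are cloned into.

    `properties` is a hypothetical stand-in for the Platform properties.
    """
    override = properties.get("mz.git.basePath")
    if override:
        # An explicit mz.git.basePath takes precedence over the default.
        return Path(override)
    # Default location: $MZHOME/gitrepos
    return Path(mz_home) / "gitrepos"
```

For example, with no override and MZHOME set to /opt/mz, the sketch resolves to /opt/mz/gitrepos.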

Import of Git File System Profile

A newly imported Git File System Profile configuration is always invalid because the repository has not yet been cloned. To clone the repository, click the Clone Repository button in the profile.

When the cloning is done the text on the button will change to Preview Repository, and the configuration should now be valid, which you can verify by clicking the Validate button.

Amazon S3

When selecting Amazon S3 as a file system, you will see two tabs: General and Advanced.

General Tab

The following settings are available in the General tab in the Amazon S3 File System profile:

Setting

Description

File System Type

Select which file system type this profile should be applied for. You can choose either Amazon S3 or HDFS.

Credentials from Environment

Select this check box to pick up the credentials from the environment instead of entering them in this profile. If this checkbox is selected, the Access Key and Secret Key fields will be disabled.

Access Key

Enter the access key for the user who owns the Amazon S3 account in this field.

If you want to set a parameter, select the Parameterized checkbox and enter the parameter name using the ${} syntax. In this mode, the regular access key field is disabled. See Profile (4.1) for more information on how parameterization works.

Secret Key

Enter the secret key for the stated access key in this field.

If you want to set a parameter, select the Parameterized checkbox and enter the parameter name using the ${} syntax. In this mode, the regular secret key field is disabled. See Profile (4.1) for more information on how parameterization works.

Region from Environment

Select this check box to pick up the region from the environment instead of entering the region in this profile. If this check box is selected, the Region field will be disabled.

Region

Enter the name of the Amazon S3 region in this field.

Bucket

Enter the name of the Amazon S3 bucket in this field.
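The interplay between the "from Environment" checkboxes and the manually entered fields can be sketched as below. This is a minimal illustration, not the product's implementation: the profile dictionary keys are hypothetical, and the environment variable names AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_REGION are the ones conventionally read by AWS SDKs. This page does not state which environment sources the profile actually consults.

```python
import os
from typing import Mapping


def resolve_s3_settings(profile: Mapping[str, object],
                        env: Mapping[str, str] = os.environ) -> dict:
    """Pick each S3 setting from the environment or from the profile fields."""
    if profile.get("credentials_from_environment"):
        # Access Key and Secret Key fields are disabled in this mode.
        access_key = env.get("AWS_ACCESS_KEY_ID")
        secret_key = env.get("AWS_SECRET_ACCESS_KEY")
    else:
        access_key = profile.get("access_key")
        secret_key = profile.get("secret_key")

    if profile.get("region_from_environment"):
        # Region field is disabled in this mode.
        region = env.get("AWS_REGION")
    else:
        region = profile.get("region")

    return {
        "access_key": access_key,
        "secret_key": secret_key,
        "region": region,
        "bucket": profile.get("bucket"),  # the bucket is always set in the profile
    }
```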

Advanced Tab

In the Advanced tab, you can configure properties for the Amazon S3 File System client.

For information on how to configure the properties for Amazon S3 File System client, see https://docs.aws.amazon.com/AmazonS3/latest/dev/acl-overview.html#canned-acl.

GCP Storage

When selecting GCP Storage as a file system, you will see the General tab.

Location

Setting

Description

Bucket

Enter the name of the GCP Storage bucket in this field.

Use GCP Profile

Select the checkbox and then choose an existing GCP Profile if the Authentication Details should be derived from a GCP Profile instead of adding them directly to this profile.

HDFS

When selecting HDFS as a file system, you will see two tabs: General and Advanced.

General Tab

The following settings are available in the General tab in the HDFS File System profile:

Setting

Description

File System Type

Select which file system type this profile should be applied for. You can choose either Amazon S3 or HDFS.

Hadoop Mode

Select the type of Hadoop from the drop-down box:

Non HA - This version of Hadoop does not support high availability as it has only one NameNode.
HA - This version of Hadoop supports high availability.

Host

Enter the IP address or hostname of the NameNode in this field. See the Apache Hadoop Project documentation for further information about the NameNode.

Port

Enter the port number of the NameNode in this field.

Replication

Enter the replication factor that HDFS should use. Replication is used for fault tolerance, and more information regarding replication can be found at: https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/HdfsDesign.html#Data_Replication

Advanced Tab

The Advanced tab contains Advanced Properties for the configuration of Kerberos authentication.

Kerberos is an authentication technology that uses a trusted third party to authenticate one service or user to another. Within Kerberos, this trusted third party is commonly referred to as the Key Distribution Center, or KDC. For HDFS, this means that the HDFS agent authenticates with the KDC using a user principal which must be pre-defined in the KDC. The HDFS cluster must be set up to use Kerberos, and the KDC must contain service principals for the HDFS NameNodes. For information on how to set up an HDFS cluster with Kerberos, see the Hadoop Users Guide at http://www.hadoop.apache.org.

To perform authentication towards the KDC without a password, the HDFS agent requires a keytab file.

You can set the advanced properties in the Advanced Properties dialog to activate and configure Kerberos authentication.

The following advanced properties are related to Kerberos authentication. Refer to the Advanced Properties dialog for examples.

Property

Description

hadoop.security.authentication

Set the value to kerberos to activate Kerberos authentication.

Note!

Due to limitations in the Apache Hadoop client libraries, if you change this property, you may be required to restart the ECs where workflows containing the HDFS agent will run.

dfs.namenode.kerberos.principal

This sets the service principal to use for the HDFS NameNode. This must be predefined in the KDC. The service principal is expected to be in the form of nn/<host>@<REALM> where <host> is the host where the service is running and <REALM> is the name (in uppercase) of the Kerberos realm.

java.security.krb5.kdc

This specifies the hostname of the Key Distribution Center.

java.security.krb5.realm

This sets the name of the Kerberos realm. Uppercase only.

dr.kerberos.client.keytabfile

This sets the keytab file to use for authentication. A keytab must be predefined using Kerberos tools. The keytab must be generated for the user principal in dr.kerberos.client.principal. This filepath must be on a file system that can be reached from the EC process. The user that launches the EC must also have read permissions for this file.

dr.kerberos.client.principal

This sets the user principal that the HDFS agent authenticates as. This must be predefined in the KDC. User principals are expected to be in the form of <user>@<REALM> where <user> is typically a username and <REALM> is the name (in uppercase) of the Kerberos realm.

sun.security.krb5.debug

Set this value to true to activate debug output for Kerberos.
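The format rules in the table above can be checked before the properties are entered. The following is a rough sketch only: the function name and the regular expressions are illustrative assumptions, and the check treats the user principal and keytab file as required whenever Kerberos is activated, which is inferred from the description above rather than stated as a validation rule of the product.

```python
import re

# <user>@<REALM> and nn/<host>@<REALM>, with the realm in uppercase.
USER_PRINCIPAL = re.compile(r"[^/@\s]+@[A-Z0-9.-]+")
SERVICE_PRINCIPAL = re.compile(r"nn/[^/@\s]+@[A-Z0-9.-]+")


def check_kerberos_properties(props: dict) -> list:
    """Return a list of problems found in the Kerberos-related properties."""
    problems = []
    if props.get("hadoop.security.authentication") != "kerberos":
        return problems  # Kerberos is not activated, nothing to check.

    realm = props.get("java.security.krb5.realm", "")
    if realm != realm.upper():
        problems.append("java.security.krb5.realm must be uppercase")

    user = props.get("dr.kerberos.client.principal", "")
    if not USER_PRINCIPAL.fullmatch(user):
        problems.append("dr.kerberos.client.principal must look like <user>@<REALM>")

    service = props.get("dfs.namenode.kerberos.principal")
    if service is not None and not SERVICE_PRINCIPAL.fullmatch(service):
        problems.append("dfs.namenode.kerberos.principal must look like nn/<host>@<REALM>")

    if not props.get("dr.kerberos.client.keytabfile"):
        problems.append("dr.kerberos.client.keytabfile is required for password-less authentication")
    return problems
```

The check does not verify that the principals exist in the KDC or that the keytab file is readable by the user that launches the EC. Both still have to be confirmed in the target environment.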

The following properties are also included in the Advanced tab, but only apply if you have selected the HA version of Hadoop in the General tab:

Property

Description

fs.defaultFS

This sets the HDFS filesystem path prefix.

dfs.nameservices

This sets the logical name for the name services.

dfs.ha.namenodes.<nameservice ID>

This sets the unique identifiers for each NameNode in the name service.

dfs.namenode.rpc-address.<nameservice ID>.<name node ID>

This sets the fully-qualified RPC address for each NameNode to listen on.

dfs.namenode.http-address.<nameservice ID>.<name node ID>

This sets the fully-qualified HTTP address for each NameNode to listen on.

dfs.client.failover.proxy.provider.<nameservice ID>

This sets the Java class that HDFS clients use to contact the Active NameNode.
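To show how the HA properties in the table above relate to each other, the sketch below derives the full set from one nameservice ID and its NameNodes. The function, the nameservice ID, the host names and the ports are hypothetical examples. The failover proxy provider class is the one commonly used in Apache Hadoop HA setups and is an assumption here, not a value taken from this page.

```python
def ha_properties(nameservice: str, namenodes: dict) -> dict:
    """Build the HA-related advanced properties.

    `namenodes` maps each NameNode ID to a (rpc_address, http_address) pair,
    each given as "host:port".
    """
    props = {
        "fs.defaultFS": f"hdfs://{nameservice}",
        "dfs.nameservices": nameservice,
        # Comma-separated list of the NameNode IDs in the name service.
        f"dfs.ha.namenodes.{nameservice}": ",".join(namenodes),
        f"dfs.client.failover.proxy.provider.{nameservice}":
            "org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider",
    }
    for nn_id, (rpc_address, http_address) in namenodes.items():
        props[f"dfs.namenode.rpc-address.{nameservice}.{nn_id}"] = rpc_address
        props[f"dfs.namenode.http-address.{nameservice}.{nn_id}"] = http_address
    return props


if __name__ == "__main__":
    example = ha_properties("mycluster", {
        "nn1": ("nn1.example.com:8020", "nn1.example.com:9870"),
        "nn2": ("nn2.example.com:8020", "nn2.example.com:9870"),
    })
    for name, value in example.items():
        print(f"{name}={value}")
```

The nameservice ID and NameNode IDs appear both as values and as parts of other property names, so deriving them from one place helps keep the set consistent.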

Note!

If you are using Kerberos authentication, it is recommended that you only run the HDFS agents toward one HDFS cluster per EC. This is because the Kerberos client library of HDFS relies on static properties and configurations that are global for the whole JVM. This means that one workflow running the HDFS agents could impact another workflow running the HDFS agents within the same EC process. Due to this limitation, you must also restart the EC for some configuration changes to the Advanced Properties.

The Advanced Properties can also be configured using External References by following these steps:

Create a properties file containing the advanced configurations.

Example - Properties file with advanced configurations

ADV_PROP=hadoop.security.authentication\=kerberos\n\
 java.security.krb5.kdc\=kdc.example.com\n\
 dr.kerberos.client.principal\=mzadmin@EXAMPLE.COM\n\
 dr.kerberos.client.keytabfile\=/home/mzadmin/keytabs/ex.keytab

Note!

All "=" characters need to be escaped.

Create an External Reference profile that points to the properties file and contains a key pair, e.g. "ADV_PROP" and "ADV_PROP".

In the workflow containing the agent, open the Workflow Properties and select the Enable External Reference check box.

Click the Browse button and select your External Reference profile, and for the HDFS - Advanced Properties field, select either Default or Per Workflow.

In the workflow table, right-click and select the Enable External Reference option, and enter the key for the properties file, e.g. ADV_PROP, if that is what you used in step 2 above.
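The properties file created in the first step can also be generated, which avoids mistakes in the escaping. The sketch below is an illustration under assumptions: the function name is hypothetical, and it only handles the "=" escaping and line continuation shown in the example above, not other special characters such as backslashes in values.

```python
def build_adv_prop(key: str, props: dict) -> str:
    """Render advanced properties as a single external reference entry.

    Each "=" inside the value is escaped as "\\=", entries are separated by a
    literal "\\n", and a trailing backslash continues the value on the next line.
    """
    entries = [f"{name}={value}".replace("=", "\\=") for name, value in props.items()]
    # "\n" (two characters), then a continuation backslash, a real newline
    # and one space of indentation. No whitespace may follow the backslash.
    separator = "\\n\\\n "
    return f"{key}={separator.join(entries)}\n"


if __name__ == "__main__":
    text = build_adv_prop("ADV_PROP", {
        "hadoop.security.authentication": "kerberos",
        "java.security.krb5.kdc": "kdc.example.com",
        "dr.kerberos.client.principal": "mzadmin@EXAMPLE.COM",
        "dr.kerberos.client.keytabfile": "/home/mzadmin/keytabs/ex.keytab",
    })
    print(text, end="")
```

A space after a continuation backslash would stop the line from being continued when the file is read as a Java-style properties file, so the sketch never emits one.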

If there is a proxy in your network environment, the GCP agents will work with a proxy that does not require authentication. Currently, the GCP agents do not work with a proxy that requires authentication. Refer to HTTP Proxy Setup (4.2) for more details.

Configuration

JSON File

The following settings are available when you have selected Use JSON File as the Input Option in the GCP Profile.

...

Setting	Description
Environment-Provided Service Account	When Usage Engine is deployed in the GCP environment, such as in Compute Engine, enable this option to retrieve the Service Account credentials provided by the environment.
Input Option	Allows you to select the method for connecting to the GCP service. For the Use JSON File option, you need to create the GCP Service Account Key as a JSON file and download it into the Platform and EC servers.
Credentials File	The location of the GCP Service Account JSON file containing the credential keys.

Note!

The JSON file option is not recommended for production deployments. It is meant to make it easier for the workflow designer to test the GCP Profile during development.

Form

The following settings are available when you have selected Form as the Input Option in the GCP Profile.

...

Setting	Description
Environment-Provided Service Account	When Usage Engine is deployed in the GCP environment, such as in Compute Engine, enable this option to retrieve the Service Account credentials provided by the environment.
Input Option	Allows you to select the method for connecting to the GCP service. For Form, the GCP Profile will take the role of the Service Account Key file. It will parse all the credentials in order to connect to the GCP service.
Project Id	The GCP Project Id that hosts the GCP service that Usage Engine should access.
Private Key Id	The Private Key Id to be used for the service account.
Private Key	The full content of the private key. Alternatively, use a Secrets Profile to provide it.
Client Email	The email address given to the service account.
Client Id	The Id for the service account client.
Other Information	The Auth URI, Token URI and info about the certs are to be added into this field.
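The Form fields correspond to entries in a GCP Service Account Key file. The sketch below shows one plausible mapping, assuming the standard field names of a service account key JSON file. How the profile parses the Other Information field is not described here, so passing it as a dictionary is an assumption, and the function name is hypothetical.

```python
import json


def service_account_info(form: dict, other_information: dict) -> dict:
    """Assemble service account key content from the Form fields."""
    info = {
        "type": "service_account",
        "project_id": form["Project Id"],
        "private_key_id": form["Private Key Id"],
        "private_key": form["Private Key"],
        "client_email": form["Client Email"],
        "client_id": form["Client Id"],
    }
    # Auth URI, Token URI and certificate information.
    info.update(other_information)
    return info


if __name__ == "__main__":
    example = service_account_info(
        {
            "Project Id": "my-project",  # placeholder values throughout
            "Private Key Id": "0123456789abcdef",
            "Private Key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
            "Client Email": "usage-engine@my-project.iam.gserviceaccount.com",
            "Client Id": "123456789012345678901",
        },
        {
            "auth_uri": "https://accounts.google.com/o/oauth2/auth",
            "token_uri": "https://oauth2.googleapis.com/token",
            "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
        },
    )
    print(json.dumps(example, indent=2))
```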
