Data Hub Profile

The Data Hub profile is used by the Data Hub Forwarding Agent and the Data Hub Task Agent to connect to an Impala database via Cloudera JDBC and to HDFS. The profile is also used to map the UDRs received by the Data Hub Forwarding Agent to an Impala database table.

Data Hub Query also uses the profile to access the data stored in the Impala database.

The Data Hub profile is loaded when you start a workflow that relies on it. Changes to the profile become effective when you restart the workflow.

Configuration

To create a new Data Hub profile configuration, click New Configuration in the Build view and select Data Hub Profile from the Configurations browser.

The contents of the menus in the menu bar may change depending on which configuration type has been opened in the currently active tab. The Data Hub profile uses the standard menu items and buttons that are visible for all configurations, and these are described in Build View.

Impala Tab

The Impala tab contains connection settings for the Impala database.

Data Hub profile configuration - Impala tab

Setting

Description


Host

Enter the hostname or the IP address of the Impala database.

Port

Enter the port number configured for connections to the Impala database.

Enable TLS

Select this checkbox to enable TLS for the connection to the Impala database.

Allow Self Signed Cert

Select this checkbox to allow self-signed certificates. When this checkbox is selected, the Trust Store File Path and Trust Store Password fields are disabled.

Trust Store File Path

Enter the location of the trust store file. The trust store holds certificates from Certificate Authorities (CAs) and is required on the client side to establish a successful connection.

Note!

This field is enabled when the Enable TLS checkbox is selected.

However, when the Allow Self Signed Cert checkbox is selected, this field is disabled.



Trust Store Password

Enter the passphrase of the trust store file. This password is used to access the certificates stored in the trust store.

Note!

This field is enabled when the Enable TLS checkbox is selected.

However, when the Allow Self Signed Cert checkbox is selected, this field is disabled.



Key Store File Path

Enter the location of the key store file. The key store holds your credentials and is required when SSL is set up on the server side.

Note!

This field is enabled when the Enable TLS checkbox is selected.



Key Store Password

Enter the passphrase of the key store file. This password is used to access the credentials stored in the key store.



Database Name

Click the Refresh button next to Database Name to retrieve a list of available databases and then select a database from the drop-down menu. The Tables Mapping tab will appear.

Refresh

Click this to retrieve a list of available databases.

Test Connection

Click to test the JDBC connection to the Impala database.
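To make the dependencies between the TLS settings concrete, the following Python sketch shows how the Impala tab values might be combined into a Cloudera Impala JDBC connection URL. This is an illustration under assumptions, not how the profile builds its URL internally (that is not documented here): the class and function names are hypothetical, and the URL properties (SSL, SSLTrustStore, SSLTrustStorePwd, SSLKeyStore, SSLKeyStorePwd, AllowSelfSignedCerts) should be verified against the Cloudera JDBC driver guide for your driver version. Port 21050 in the usage example is the usual Impala port for JDBC clients, and the host name is a placeholder.

```python
# Hypothetical sketch: combining Impala tab settings into a Cloudera Impala
# JDBC URL. Property names are assumed from the Cloudera JDBC driver docs.
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ImpalaTabSettings:
    host: str
    port: int
    database: str
    enable_tls: bool = False
    allow_self_signed_cert: bool = False
    trust_store_path: Optional[str] = None
    trust_store_password: Optional[str] = None
    key_store_path: Optional[str] = None
    key_store_password: Optional[str] = None


def build_jdbc_url(s: ImpalaTabSettings) -> str:
    url = f"jdbc:impala://{s.host}:{s.port}/{s.database}"
    props: List[str] = []
    if s.enable_tls:
        props.append("SSL=1")
        if s.allow_self_signed_cert:
            # Mirrors the profile rule: trust store fields are disabled
            # when self-signed certificates are allowed.
            props.append("AllowSelfSignedCerts=1")
        else:
            if s.trust_store_path:
                props.append(f"SSLTrustStore={s.trust_store_path}")
            if s.trust_store_password:
                props.append(f"SSLTrustStorePwd={s.trust_store_password}")
        # Key store fields only take effect when TLS is enabled.
        if s.key_store_path:
            props.append(f"SSLKeyStore={s.key_store_path}")
        if s.key_store_password:
            props.append(f"SSLKeyStorePwd={s.key_store_password}")
    # Without TLS, all trust store and key store settings are ignored.
    return url + "".join(f";{p}" for p in props)


if __name__ == "__main__":
    settings = ImpalaTabSettings(
        host="impala.example.com", port=21050, database="default",
        enable_tls=True, allow_self_signed_cert=True,
    )
    print(build_jdbc_url(settings))
```

With these settings the sketch prints jdbc:impala://impala.example.com:21050/default;SSL=1;AllowSelfSignedCerts=1. Any trust store values would be dropped, matching the rule that Allow Self Signed Cert disables those fields.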

HDFS Tab

The HDFS tab contains the properties required for the connection to HDFS as well as properties for staging paths.

Data Hub profile configuration - HDFS tab

Setting

Description


HDFS URI

Enter the URI of the HDFS NameNode.

Staging Path

Enter the absolute path to an existing directory on the HDFS. This directory will be used as a staging directory for the data.

MZ Temp Path

Enter the local path where files are temporarily stored before they are inserted into the HDFS staging directory.

Create Directory

Select this to create the MZ Temp Path directory if it does not exist. 
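The HDFS tab settings describe a two-step flow: files are first written to the local MZ Temp Path and then inserted into the HDFS staging directory. The Python sketch below covers only the checks that can be made without an HDFS client: that HDFS URI has the usual hdfs://namenode[:port] form (an assumption; other URI schemes are not considered), that Staging Path is absolute, and how the Create Directory option could behave for the local temp path. Whether Staging Path actually exists in HDFS is not checked. Both function names are hypothetical.

```python
# Hypothetical sanity checks for the HDFS tab values.
import os
from urllib.parse import urlparse


def check_hdfs_settings(hdfs_uri: str, staging_path: str) -> None:
    """Reject an HDFS URI without hdfs:// scheme and host, or a relative Staging Path."""
    parsed = urlparse(hdfs_uri)
    if parsed.scheme != "hdfs" or not parsed.hostname:
        raise ValueError(f"expected hdfs://<namenode>[:port], got {hdfs_uri!r}")
    if not staging_path.startswith("/"):
        raise ValueError(f"Staging Path must be absolute, got {staging_path!r}")


def prepare_mz_temp_path(path: str, create_directory: bool) -> str:
    """Return the local temp directory, creating it only if Create Directory is set."""
    if os.path.isdir(path):
        return path
    if not create_directory:
        raise FileNotFoundError(f"MZ Temp Path does not exist: {path}")
    os.makedirs(path, exist_ok=True)
    return path
```

Failing fast on a missing temp directory when Create Directory is cleared keeps a mistyped path from silently creating an unexpected directory.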

Advanced Tab

The Advanced tab contains the properties that the Data Hub agents need to connect to Cloudera configurations that have LDAP and Kerberos enabled.

Data Hub profile configuration - Advanced tab

See the text in the Properties field for more information about the properties that you can set.



Tables Mapping Tab

Data Hub profile configuration - Tables Mapping tab

Setting

Description


Table

Select a database table from the drop-down list. The names and data types of the table columns are then displayed.





UDR Type

Click the Browse button and then select a UDR type that will be routed to a Data Hub agent. For information about Ultra types that can be mapped to Impala types, see Compatible Types below.

Auto Map

Click this button to automatically map UDR fields and database columns with identical names. The automatic mapping is not case-sensitive. If a field cannot be mapped, the current value in the UDR Field column remains unchanged.

Column

The names of the columns in the selected table.

Type

Displays a list of valid Impala types for the column in the selected table. Each column must be mapped to a type. For information about Ultra types that can be mapped to Impala types, see Compatible Types below.

UDR Field

Select a UDR field from the drop-down list. The list contains the fields available in the UDR type selected in UDR Type.

Date Hint

Select the format in which the date is stored in the table. A Date Hint is required for partition columns. A date in Ultra format can be stored in INT, BIGINT, and SMALLINT columns.

When the column is a partition, you must select the corresponding date format from this drop-down list:

  • yyyyMMddHH - e.g. 2018123013

  • yyyyMMdd - e.g. 20181230

  • yyyyMM - e.g. 201812

  • yyyy - e.g. 2018

The selected format determines the granularity of date pickers in the Web UI.

When you change the Date Hint value of a partition column for an existing profile, make sure to review the settings of Data Hub task workflows that depend on the updated profile and table.
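The Date Hint examples above are plain integers, so the chosen hint decides which integer column types can hold the value. The Python sketch below formats a date with each hint and checks it against the signed ranges of SMALLINT, INT, and BIGINT listed under Compatible Types. Translating the Ultra/Java-style patterns to strftime codes is our own assumption for illustration, and the helper names are hypothetical.

```python
# Hypothetical sketch: integer partition values produced by each Date Hint.
from datetime import datetime

# Date Hint patterns mapped to assumed Python strftime equivalents.
DATE_HINTS = {
    "yyyyMMddHH": "%Y%m%d%H",
    "yyyyMMdd": "%Y%m%d",
    "yyyyMM": "%Y%m",
    "yyyy": "%Y",
}

# Signed ranges of the Impala integer types that can store a date.
INT_RANGES = {
    "SMALLINT": (-32768, 32767),
    "INT": (-2147483648, 2147483647),
    "BIGINT": (-9223372036854775808, 9223372036854775807),
}


def partition_value(ts: datetime, hint: str) -> int:
    """Render ts with the given Date Hint and read it back as an integer."""
    return int(ts.strftime(DATE_HINTS[hint]))


def fits(value: int, impala_type: str) -> bool:
    """True if value lies within the range of the Impala integer type."""
    low, high = INT_RANGES[impala_type]
    return low <= value <= high


if __name__ == "__main__":
    ts = datetime(2018, 12, 30, 13)
    for hint in DATE_HINTS:
        value = partition_value(ts, hint)
        types = [t for t in INT_RANGES if fits(value, t)]
        print(hint, value, types)
```

For the example date 2018-12-30 13:00, all four hints produce values that fit in INT and BIGINT, but only yyyy (2018) fits in SMALLINT, whose maximum is 32,767.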



When the UDR fields do not have names that match the database columns, you can map these manually by clicking the corresponding cell and selecting the field from a drop-down list. 
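The Auto Map behavior described in the Tables Mapping tab (case-insensitive name matching, with unmatched columns keeping their current UDR Field value) can be sketched in Python as follows. The function name and the dictionary representation of the mapping are hypothetical; how the real tool resolves two UDR fields whose names differ only in case is not documented, and in this sketch the last such field wins.

```python
# Hypothetical sketch of Auto Map: case-insensitive column-to-field matching.
from typing import Dict, List, Optional


def auto_map(columns: List[str], udr_fields: List[str],
             current: Dict[str, Optional[str]]) -> Dict[str, Optional[str]]:
    """Map each column to the UDR field with the same name, ignoring case.

    Columns without a matching field keep their current UDR Field value,
    or None if they were not mapped before.
    """
    by_lower = {f.lower(): f for f in udr_fields}  # last duplicate wins
    result = dict(current)
    for col in columns:
        match = by_lower.get(col.lower())
        if match is not None:
            result[col] = match
        else:
            result.setdefault(col, None)
    return result


def unmapped_columns(mapping: Dict[str, Optional[str]]) -> List[str]:
    """Columns that still need a manual mapping."""
    return [col for col, field in mapping.items() if field is None]
```

For example, auto_map(["CALL_ID", "duration", "region"], ["call_id", "Duration"], {"region": "area"}) maps CALL_ID to call_id and duration to Duration, and leaves the manual mapping of region to area unchanged.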

Compatible Types

The following list shows the allowed mappings of Impala types to Ultra types. Each Impala type is given with its range, followed by the Ultra types that can be mapped to it and their ranges.

STRING (Impala range: Maximum of 32,767 bytes)

  • string: Unlimited

INT (Impala range: -2147483648 to +2147483647)

  • int: -2,147,483,648 to +2,147,483,647

  • short: -32,768 to +32,767

  • byte: -128 to +127

FLOAT (Impala range: 1.40129846432481707e-45 to 3.40282346638528860e+38, positive or negative)

  • float: 1.40129846432481707e-45 to 3.40282346638528860e+38, positive or negative

DOUBLE (Impala range: 4.94065645841246544e-324d to 1.79769313486231570e+308, positive or negative)

  • double: 4.94065645841246544e-324d to 1.79769313486231570e+308d, positive or negative

  • float: 1.40129846432481707e-45 to 3.40282346638528860e+38, positive or negative

BOOLEAN (Impala range: TRUE or FALSE)

  • boolean: true or false

BIGINT (Impala range: -9223372036854775808 to +9223372036854775807)

  • long: -9223372036854775808 to +9223372036854775807

  • int: -2,147,483,648 to +2,147,483,647

  • short: -32,768 to +32,767

  • byte: -128 to +127

REAL (Impala range: 4.94065645841246544e-324d to +1.79769313486231570e+308, positive or negative)

  • double: 4.94065645841246544e-324d to 1.79769313486231570e+308d, positive or negative

SMALLINT (Impala range: -32768 to +32767)

  • short: -32,768 to +32,767

  • byte: -128 to +127

TINYINT (Impala range: -128 to +127)

  • byte: -128 to +127

TIMESTAMP (Impala range: YYYYMMDDHH)

  • date: yyyyMMddHH
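The pattern in these mappings is that an Ultra type may be mapped to an Impala type only when its range fits within the Impala range, so a narrower type can go into a wider column (for example int into BIGINT) but not the reverse. The Python sketch below transcribes the Compatible Types mappings above into a lookup table and checks a column mapping against it. The helper names and the (Impala type, Ultra type) tuple representation are hypothetical.

```python
# Hypothetical sketch: validating Tables Mapping entries against the
# Compatible Types mappings (transcribed from this section).
from typing import Dict, List, Set, Tuple

COMPATIBLE: Dict[str, Set[str]] = {
    "STRING": {"string"},
    "INT": {"int", "short", "byte"},
    "FLOAT": {"float"},
    "DOUBLE": {"double", "float"},
    "BOOLEAN": {"boolean"},
    "BIGINT": {"long", "int", "short", "byte"},
    "REAL": {"double"},
    "SMALLINT": {"short", "byte"},
    "TINYINT": {"byte"},
    "TIMESTAMP": {"date"},
}


def check_mapping(impala_type: str, ultra_type: str) -> bool:
    """True if the Ultra type may be mapped to the Impala type."""
    return ultra_type in COMPATIBLE.get(impala_type.upper(), set())


def incompatible(mapping: Dict[str, Tuple[str, str]]) -> List[Tuple[str, str, str]]:
    """Return (column, Impala type, Ultra type) entries that are not allowed."""
    return [(col, it, ut) for col, (it, ut) in mapping.items()
            if not check_mapping(it, ut)]
```

For example, incompatible({"a": ("INT", "long"), "b": ("STRING", "string")}) reports only column a, because a long value may exceed the INT range.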