The Data Hub profile is used by the Data Hub forwarding agent and the Data Hub task agent to connect to an Impala database via Cloudera JDBC, and to HDFS. The profile is also used to map the UDRs that are input to the Data Hub forwarding agent to an Impala database table.
The Data Hub profile is loaded when you start a workflow that depends on it. Changes to the profile become effective when you restart the workflow.
Configuration
To create a new Data Hub profile configuration, click the New Configuration button in the upper left part of the Desktop window, and then select Data Hub Profile from the menu.
The contents of the menus in the menu bar may change depending on which configuration type is open in the currently active tab. The Data Hub profile uses the standard menu items and buttons that are visible for all configurations, and these are described in /wiki/spaces/MD82/pages/3769348 in the /wiki/spaces/MZD73/pages/5656992.
Impala Tab
The Impala tab contains connection settings for the Impala database.
Data Hub profile configuration - Impala tab
Setting | Description |
---|---|
Host | Enter the hostname or the IP address of the Impala database. |
Port | Enter the port number that is configured for the connection into the Impala database. |
Enable TLS | Select this to enable TLS for the connection to the Impala database. |
Allow Self Signed Cert | Select this checkbox to allow the use of self-signed certificates. When it is selected, both Trust Store File Path and Trust Store Password are disabled. |
Trust Store File Path | This field is available when you have enabled TLS. Enter the location of the trust store file. |
Trust Store Password | This field is available when you have enabled TLS. Enter the passphrase of the trust store file. |
Key Store File Path | This field is available when you have enabled TLS. Enter the location of the key store file. |
Key Store Password | This field is available when you have enabled TLS. Enter the passphrase of the key store file. |
Database Name | Click the Refresh button next to Database Name to retrieve a list of available databases, and then select a database from the drop-down menu. The Tables Mapping tab will appear. |
Refresh | Click the Refresh button to retrieve a list of available databases. |
Test Connection | Click to test the JDBC connection to the Impala database. |
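To illustrate how the Impala tab settings relate to a JDBC connection, the following minimal sketch assembles a connection URL from them. It is not the profile's actual implementation. The `ImpalaSettings` class and `build_jdbc_url` function are hypothetical, and the property names (`SSL`, `AllowSelfSignedCerts`, `SSLTrustStore`, `SSLTrustStorePwd`) are assumed from the Cloudera JDBC driver for Impala, so verify them against the documentation of the driver version you use.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ImpalaSettings:
    """Hypothetical container mirroring some of the Impala tab fields."""
    host: str
    port: int
    database: str = "default"
    enable_tls: bool = False
    allow_self_signed: bool = False
    trust_store_path: Optional[str] = None
    trust_store_password: Optional[str] = None


def build_jdbc_url(s: ImpalaSettings) -> str:
    """Assemble a Cloudera-style Impala JDBC URL (property names are assumed)."""
    if not s.host:
        raise ValueError("Host must not be empty")
    if not 1 <= s.port <= 65535:
        raise ValueError("Port must be between 1 and 65535")

    props = []
    if s.enable_tls:
        props.append("SSL=1")
        if s.allow_self_signed:
            # The trust store fields are disabled in the profile in this case,
            # so any trust store values are ignored here as well.
            props.append("AllowSelfSignedCerts=1")
        else:
            if s.trust_store_path:
                props.append(f"SSLTrustStore={s.trust_store_path}")
            if s.trust_store_password:
                props.append(f"SSLTrustStorePwd={s.trust_store_password}")

    url = f"jdbc:impala://{s.host}:{s.port}/{s.database}"
    return ";".join([url] + props)
```

For example, `build_jdbc_url(ImpalaSettings(host="impala.example.com", port=21050))` yields a plain URL without TLS properties. The host name and port here are placeholders.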
HDFS
The HDFS tab contains the properties required for the connection to HDFS, as well as properties for the staging paths.
Data Hub profile configuration - HDFS tab
Setting | Description |
---|---|
HDFS URI | Enter the URI of the HDFS NameNode. |
Staging Path | Enter the absolute path to an existing directory on the HDFS. This directory will be used as a staging directory for the data. |
MZ Temp Path | Enter the path for temporarily storing the files locally before they are inserted into the HDFS staging directory. |
Create Directory | Select this to create the MZ Temp Path directory if it does not exist. |
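The HDFS tab settings can be sanity-checked before they are entered. The sketch below is a hypothetical helper, not part of the product. It assumes that the HDFS URI uses the `hdfs://` scheme, that the Staging Path is a POSIX-style absolute path, and that the MZ Temp Path is a directory on the local file system. It does not contact HDFS, so it cannot confirm that the staging directory exists.

```python
import os
import posixpath
from urllib.parse import urlparse


def check_hdfs_settings(hdfs_uri, staging_path, temp_path, create_directory=False):
    """Return a list of problems found in hypothetical HDFS tab values."""
    problems = []

    # Assumption: the NameNode is addressed as hdfs://<namenode>[:<port>].
    parsed = urlparse(hdfs_uri)
    if parsed.scheme != "hdfs" or not parsed.hostname:
        problems.append("HDFS URI should look like hdfs://<namenode>[:<port>]")

    # HDFS paths use forward slashes regardless of the local platform.
    if not posixpath.isabs(staging_path):
        problems.append("Staging Path must be an absolute HDFS path")

    # Mirrors the Create Directory option for the local temporary directory.
    if not os.path.isdir(temp_path):
        if create_directory:
            os.makedirs(temp_path, exist_ok=True)
        else:
            problems.append("MZ Temp Path does not exist")

    return problems
```

An empty list means that none of these basic checks failed.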
Advanced
The Advanced tab contains the properties needed for the Data Hub agent to connect to a Cloudera configuration that has LDAP and Kerberos enabled.
Data Hub profile configuration - Advanced tab
See the text in the Properties field for further information about the other properties that you can set.
Note: Due to the behavior of the Kerberos JVM, Data Hub profiles and agents that interface with a Kerberos-enabled Cloudera configuration must be configured to run on the same EC.
Info: Ensure that the keytab file is located on the same host as the EC that runs the workflow with the Data Hub agent.
Tables Mapping
The Tables Mapping tab contains the mapping between one or more database tables and the UDR type that you select in this tab.
Data Hub profile configuration - Tables Mapping tab
Setting | Description |
---|---|
Table | Select a database table from the drop-down list. The names of the table columns and their data types will appear. |
UDR Type | Click the Browse button and then select a UDR type that will be routed to a Data Hub agent. For information about Ultra types that can be mapped to Impala types, see Compatible Types below. |
Auto Map | Click this button to automatically map UDR fields and database columns with identical names. The automatic mapping is not case-sensitive. If a field cannot be mapped, the current value in the UDR Field column remains unchanged. |
Column | The names of the columns in the selected table. |
Type | Click the cell to display a list of valid Impala types. Each column must be mapped to a type. For information about Ultra types that can be mapped to Impala types, see Compatible Types below. |
UDR Field | Select a UDR field from the drop-down list. The list contains the fields that are available in the UDR type selected above. |
Date Hint | Select the date format to be stored in the table. A date in the Ultra format can be stored as date, string, or integer types. When the type is string or integer, you must select the corresponding date format from the drop-down list in this column. The selected format determines the granularity of date pickers in the Web UI. When you change the Date Hint value of a partition column for an existing profile, make sure to review the settings of Data Hub task workflows that depend on the updated profile and table. |
When the UDR fields do not have names that match the database columns, you can map them manually by clicking the corresponding cell and selecting the field from a drop-down list.
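The Auto Map behavior described above can be sketched as a case-insensitive name match that leaves unmatched columns as they were. This is an illustration under stated assumptions, not the product's code. The function `auto_map` and its arguments are hypothetical, and the mapping is modeled as a plain dictionary from column name to UDR field name.

```python
def auto_map(columns, udr_fields, current=None):
    """Map table columns to UDR fields with identical names, ignoring case.

    `current` holds any existing column-to-field mapping. Columns without a
    matching UDR field keep whatever value they had before.
    """
    mapping = dict(current or {})
    # Index the UDR fields by lowercase name for case-insensitive lookup.
    by_lower = {name.lower(): name for name in udr_fields}
    for column in columns:
        match = by_lower.get(column.lower())
        if match is not None:
            mapping[column] = match
        # No match: the existing entry, if any, is left untouched.
    return mapping
```

In this model, a column such as `cell` that has no identically named UDR field keeps its earlier manual mapping, which corresponds to mapping it by hand in the UDR Field column.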
Compatible Types
The following table shows allowed mappings of Impala types to Ultra types.
Impala Type | Impala Range | Ultra Type | Ultra Range |
---|---|---|---|
STRING | Maximum of 32,767 bytes | string | Unlimited |
INT | -2,147,483,648 to +2,147,483,647 | int | -2,147,483,648 to +2,147,483,647 |
INT | -2,147,483,648 to +2,147,483,647 | short | -32,768 to +32,767 |
INT | -2,147,483,648 to +2,147,483,647 | byte | -128 to +127 |
FLOAT | 1.40129846432481707e-45 to 3.40282346638528860e+38 (positive or negative) | float | 1.40129846432481707e-45 to 3.40282346638528860e+38 (positive or negative) |
DOUBLE | 4.94065645841246544e-324 to 1.79769313486231570e+308 (positive or negative) | double | 4.94065645841246544e-324 to 1.79769313486231570e+308 (positive or negative) |
DOUBLE | 4.94065645841246544e-324 to 1.79769313486231570e+308 (positive or negative) | float | 1.40129846432481707e-45 to 3.40282346638528860e+38 (positive or negative) |
BOOLEAN | TRUE or FALSE | boolean | true or false |
BIGINT | -9,223,372,036,854,775,808 to +9,223,372,036,854,775,807 | long | -9,223,372,036,854,775,808 to +9,223,372,036,854,775,807 |
BIGINT | -9,223,372,036,854,775,808 to +9,223,372,036,854,775,807 | int | -2,147,483,648 to +2,147,483,647 |
BIGINT | -9,223,372,036,854,775,808 to +9,223,372,036,854,775,807 | short | -32,768 to +32,767 |
BIGINT | -9,223,372,036,854,775,808 to +9,223,372,036,854,775,807 | byte | -128 to +127 |
REAL | 4.94065645841246544e-324 to 1.79769313486231570e+308 (positive or negative) | double | 4.94065645841246544e-324 to 1.79769313486231570e+308 (positive or negative) |
SMALLINT | -32,768 to +32,767 | short | -32,768 to +32,767 |
SMALLINT | -32,768 to +32,767 | byte | -128 to +127 |
TINYINT | -128 to +127 | byte | -128 to +127 |
TIMESTAMP | YYYYMMDDHH | date | yyyyMMddHH |
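The allowed mappings in the table can be expressed as a simple lookup, which may be useful when checking a planned mapping outside the Desktop. The sketch below only transcribes the table. The names `COMPATIBLE_TYPES` and `is_compatible` are hypothetical, and the case where an Ultra date is stored as a string or integer column by means of a Date Hint is deliberately not covered.

```python
# Impala type -> Ultra types that may be mapped to it, as listed in the table.
COMPATIBLE_TYPES = {
    "STRING": {"string"},
    "INT": {"int", "short", "byte"},
    "FLOAT": {"float"},
    "DOUBLE": {"double", "float"},
    "BOOLEAN": {"boolean"},
    "BIGINT": {"long", "int", "short", "byte"},
    "REAL": {"double"},
    "SMALLINT": {"short", "byte"},
    "TINYINT": {"byte"},
    "TIMESTAMP": {"date"},
}


def is_compatible(impala_type, ultra_type):
    """Return True if the table lists ultra_type as valid for impala_type."""
    allowed = COMPATIBLE_TYPES.get(impala_type.upper(), set())
    return ultra_type in allowed
```

The pattern in the table is that an integer or floating-point column accepts Ultra types of the same or a narrower range, so `is_compatible("BIGINT", "short")` is true while `is_compatible("TINYINT", "int")` is false.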