Data Hub allows MediationZone to integrate with Big Data solutions by letting the agent forward large amounts of data for storage in data lakes or other data stores. Data Hub uses Cloudera Data Platform (CDP) for storage access, together with HDFS and Apache Impala.

Data manipulation, such as enrichment, formatting, normalization and correlation, can be performed before the data is sent to the big data storage. The stored data can then be accessed and searched using the Data Hub web UI.

...

Currently, Data Hub supports connection to Cloudera with LDAP, Kerberos or both. The table below shows which authentication methods are supported for Impala and HDFS in the Cloudera framework.


          LDAP        Kerberos    No Authentication
Impala    Supported   Supported   Supported
HDFS                  Supported   Supported

UDRs are mapped to the designated table in Impala once the table name is selected from the Database section of the Impala tab. The mapping can be done automatically if the field names in the Ultra decoder match the column names in the Impala table.
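The name-based matching described above can be illustrated with a minimal sketch. This is not the Data Hub implementation: the function name auto_map_fields is hypothetical, and the case-insensitive comparison is an assumption made for the example, since the text does not say how names are compared.

```python
from typing import Dict, Iterable


def auto_map_fields(udr_fields: Iterable[str],
                    table_columns: Iterable[str]) -> Dict[str, str]:
    """Pair each UDR field with the table column that has the same name.

    Assumption: names are compared case-insensitively. Fields without a
    matching column are left out and would need to be mapped by hand.
    """
    # Index the table columns by their lower-cased name for lookup.
    columns_by_key = {column.lower(): column for column in table_columns}

    mapping: Dict[str, str] = {}
    for field in udr_fields:
        column = columns_by_key.get(field.lower())
        if column is not None:
            mapping[field] = column
    return mapping
```

Calling auto_map_fields(["msisdn", "Duration", "extra"], ["MSISDN", "duration", "start_time"]) pairs the first two fields with their columns and leaves "extra" unmapped.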

...

The Data Hub forwarding agent performs the following series of tasks when initiated by the workflow. A temporary CSV file is created locally in MediationZone and, once the file is complete, it is uploaded to the HDFS staging area. The JDBC driver then calls the Impala database to load the CSV file into the designated Parquet table. Only after the contents of the file are fully committed to the table does the agent remove the temporary file. If the workflow aborts, the temporary file remains locally in MediationZone, as is the standard behavior of most forwarding agents.

Data Hub Forwarding Agent process
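
The ordering of these steps, in particular that the local file is removed only after the load has succeeded, can be sketched as follows. This is an illustration under stated assumptions, not the agent's actual code: forward_batch, upload_to_staging and load_into_table are hypothetical names, and the two callables stand in for the HDFS upload and the Impala load over JDBC, whose details are not given here.

```python
import csv
import os
import tempfile
from typing import Callable, Iterable, Optional, Sequence


def forward_batch(rows: Iterable[Sequence[object]],
                  upload_to_staging: Callable[[str], str],
                  load_into_table: Callable[[str], None],
                  temp_dir: Optional[str] = None) -> str:
    """Stage rows in a local CSV file, upload it, load it, then clean up.

    upload_to_staging(local_path) is a hypothetical hook that copies the
    file to the staging area and returns the staged path.
    load_into_table(staged_path) is a hypothetical hook that loads the
    staged file into the target table.
    """
    # 1. Write the complete batch to a temporary local CSV file.
    fd, local_path = tempfile.mkstemp(suffix=".csv", dir=temp_dir)
    with os.fdopen(fd, "w", newline="") as handle:
        csv.writer(handle).writerows(rows)

    # 2. Upload the finished file to the staging area.
    staged_path = upload_to_staging(local_path)

    # 3. Load the staged file into the table. If this raises, the
    #    function exits here and the local file is left in place.
    load_into_table(staged_path)

    # 4. Remove the local file only after the load has succeeded.
    os.remove(local_path)
    return local_path
```

Because the delete is the last step, an exception raised by either hook leaves the temporary file on disk, which mirrors the abort behavior described above.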

...