Data Hub allows MediationZone to integrate with Big Data solutions by letting the agent forward large amounts of data for storage in data lakes or other data storage. Data Hub leverages Cloudera Data Platform (CDP) for storage access, along with HDFS and Apache Impala.
It also allows data manipulation, such as enrichment, formatting, normalization, and correlation, to be performed before the data is sent to the big data storage. The stored data can then be accessed and searched using the Data Hub web UI.
...
The UDRs are mapped to the designated Impala table once the table name is selected in the Database section of the Impala tab. The mapping can be done automatically if the field names in the Ultra decoder match the column names in the Impala table.
Example of Data Hub Profile - Tables Mapping Tab
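The automatic mapping described above can be pictured as a simple name comparison. The following Python sketch is illustrative only and is not the Data Hub implementation: the function `auto_map` and the field and column names are hypothetical, and it assumes that names must match exactly, which the product may or may not require.

```python
def auto_map(udr_fields, table_columns):
    """Pair each UDR field with the table column that has the same name.

    Hypothetical helper: exact, case-sensitive matching is an assumption.
    Returns the mapping plus the fields that still need manual mapping.
    """
    columns = set(table_columns)
    mapping = {name: name for name in udr_fields if name in columns}
    unmapped = [name for name in udr_fields if name not in columns]
    return mapping, unmapped


# Hypothetical UDR fields and Impala columns, for illustration only.
mapping, unmapped = auto_map(
    ["imsi", "duration", "cellId"],
    ["imsi", "duration", "cell_id"],
)
```

In this sketch, `imsi` and `duration` are mapped automatically, while `cellId` has no identically named column and would have to be mapped by hand.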
...
When initiated by the workflow, the Data Hub forwarding agent performs the following series of tasks. It creates a temporary CSV file locally in MediationZone and, once the file is complete, uploads it to the HDFS staging area. The JDBC driver then calls the Impala database to load the CSV file into the designated Parquet table. Only after the contents of the file are fully committed to the table does the agent remove the temporary file. If the workflow aborts, the temporary file remains locally in MediationZone, which is the standard behavior of most forwarding agents.
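The ordering of these tasks, and in particular why the temporary file survives an abort, can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not the agent's actual code: `forward_batch`, `upload_to_staging`, and `load_into_table` are hypothetical names, and the two callables merely stand in for the HDFS upload and the Impala load issued over JDBC.

```python
import csv
import os
import tempfile


def forward_batch(rows, upload_to_staging, load_into_table):
    """Write rows to a local temporary CSV, stage it, load it, then clean up.

    upload_to_staging and load_into_table are hypothetical callables that
    stand in for the HDFS upload and the Impala load; they are assumptions
    made for this sketch.
    """
    # Step 1: create the temporary CSV file locally and complete it.
    fd, local_path = tempfile.mkstemp(suffix=".csv")
    with os.fdopen(fd, "w", newline="") as handle:
        csv.writer(handle).writerows(rows)

    # Steps 2 and 3: upload to staging, then load into the table.
    # If either call raises, execution stops here and the local file
    # is left in place, mirroring what happens on a workflow abort.
    staged_path = upload_to_staging(local_path)
    load_into_table(staged_path)

    # Step 4: reached only after the load has completed successfully.
    os.remove(local_path)
    return local_path
```

Placing the removal last, with no cleanup on the error path, is what keeps the local file available for inspection or reprocessing if a step fails.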
...
You can use the Data Hub profile to select the tables that should be available, and then query or export data from any of the Impala tables specified in the profile, without any knowledge of SQL.
Example of Data Hub UI