Data Hub allows MediationZone to integrate with Big Data solutions by letting the agent forward large amounts of data for storage in data lakes or other data storage. Data Hub leverages Cloudera Data Platform (CDP) for storage access, along with HDFS and Apache Impala.

It also allows data manipulation, such as enrichment, formatting, normalization, and correlation, to be done before the data is sent to the big data storage. The stored data can then be accessed and searched using the Data Hub web UI.

...

The Data Hub forwarding agent performs the following series of tasks when initiated by the workflow. A temporary CSV file is created locally in MediationZone; once the file is complete, it is uploaded to the HDFS staging area. The JDBC driver then calls the Impala database to load the CSV file into the designated Parquet table. Only after the contents of the file are fully committed to the table does the agent remove the temporary file. If a workflow aborts, the temporary file remains locally in MediationZone, much like the standard behavior of most forwarding agents.
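
The ordering of these tasks can be illustrated with a minimal Python sketch. This is not the agent's actual implementation: the function name `forward_batch` and the two callables `upload_to_staging` and `load_into_table` are hypothetical stand-ins for the HDFS upload and the JDBC/Impala load, used here only to show that the temporary file is removed last and is left in place if an earlier step fails.

```python
import csv
import os
import tempfile
from typing import Callable, Iterable, Sequence


def forward_batch(
    rows: Iterable[Sequence[str]],
    upload_to_staging: Callable[[str], str],
    load_into_table: Callable[[str], None],
    work_dir: str,
) -> str:
    """Write rows to a local temporary CSV, stage it, load it, then clean up.

    upload_to_staging and load_into_table are hypothetical callables that
    stand in for the HDFS upload and the Impala load. If either raises,
    the temporary file is deliberately left in work_dir.
    Returns the path of the temporary file that was used.
    """
    # Step 1: create the temporary CSV file locally and write it completely.
    fd, tmp_path = tempfile.mkstemp(suffix=".csv", dir=work_dir)
    with os.fdopen(fd, "w", newline="") as handle:
        csv.writer(handle).writerows(rows)

    # Step 2: upload the finished file to the staging area.
    staged_path = upload_to_staging(tmp_path)

    # Step 3: load the staged file into the target table.
    load_into_table(staged_path)

    # Step 4: reached only when the load returned without error,
    # so the local file is removed only after a successful commit.
    os.remove(tmp_path)
    return tmp_path
```

Because no cleanup is placed in a `finally` block, an exception in step 2 or step 3 leaves the CSV file in `work_dir`, which mirrors the abort behavior described above.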

...