...

Data Hub requires access to Cloudera Impala, which provides high-performance, low-latency SQL queries on data stored in the Hadoop Distributed File System (HDFS).

The Data Hub Forwarding Agent bulk loads data from CSV files into HDFS and then inserts it into a Parquet table in the Impala database specified by a Data Hub Profile. The table data is then available for querying via Data Hub Query.
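
The load path described above can be pictured as a short sequence of Impala SQL statements. The following is a minimal sketch only, not the Forwarding Agent's actual implementation: the table names, the HDFS path, and the helper `build_load_statements` are hypothetical, and it assumes a comma-delimited text staging table whose columns match those of the Parquet table.

```python
def build_load_statements(staging_table, parquet_table, hdfs_csv_path):
    """Return Impala SQL statements for a staged CSV-to-Parquet load.

    All names passed in are hypothetical. The statements assume the staging
    table is a comma-delimited text table with the same columns as the
    Parquet table.
    """
    return [
        # Move the CSV files already uploaded to HDFS into the staging table.
        f"LOAD DATA INPATH '{hdfs_csv_path}' INTO TABLE {staging_table}",
        # Rewrite the staged rows into the Parquet table.
        f"INSERT INTO {parquet_table} SELECT * FROM {staging_table}",
        # Empty the staging table so it is ready for the next batch.
        f"TRUNCATE TABLE {staging_table}",
    ]


statements = build_load_statements(
    staging_table="datahub_staging",          # hypothetical name
    parquet_table="datahub_events",           # hypothetical name
    hdfs_csv_path="/tmp/datahub/batch_0001",  # hypothetical path
)
for statement in statements:
    print(statement + ";")
```

Staging the CSV data in a text table first and converting it with a single `INSERT ... SELECT` keeps the number of Parquet files written per batch small, which matters for the file-size guidance that follows.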

In a production environment, it is recommended that collected files be between 1 and 100 MB in size. Although it is possible to collect and process small batches, the overhead of handling a large number of files has a significant impact on performance.
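
One way to stay within the recommended range is to group small files into larger batches before they are forwarded. The sketch below illustrates the idea under stated assumptions: the function names are hypothetical, 1 MB is taken to be 1024 * 1024 bytes, and the greedy packing strategy is an illustration rather than how Data Hub itself collects files.

```python
MB = 1024 * 1024             # assumption: 1 MB = 1024 * 1024 bytes
MIN_BATCH_BYTES = 1 * MB     # lower bound recommended in the text
MAX_BATCH_BYTES = 100 * MB   # upper bound recommended in the text


def group_into_batches(file_sizes, max_bytes=MAX_BATCH_BYTES):
    """Greedily pack file sizes (in bytes) into batches of at most max_bytes.

    A single file larger than max_bytes ends up in a batch of its own.
    """
    batches, current, current_size = [], [], 0
    for size in file_sizes:
        # Close the current batch if adding this file would exceed the limit.
        if current and current_size + size > max_bytes:
            batches.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        batches.append(current)
    return batches


def is_recommended_size(batch):
    """Check whether a batch's total size lies in the 1 to 100 MB range."""
    return MIN_BATCH_BYTES <= sum(batch) <= MAX_BATCH_BYTES


sizes = [40 * MB, 50 * MB, 30 * MB, 200 * 1024]
batches = group_into_batches(sizes)
for batch in batches:
    print(sum(batch) / MB, is_recommended_size(batch))
```

Here the 200 KB file, which would be below the recommended minimum on its own, is folded into a batch with the 30 MB file instead of being handled separately.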

...