Parquet Examples

This example is simple, but it will help you understand the basic functionality of Apache Parquet support.

Apache Parquet support is provided by a pair of agents: the Parquet Decoder and the Parquet Encoder.

The Parquet Decoder processes data from incoming Parquet documents, and the Parquet Encoder creates outgoing Parquet documents. The Parquet Encoder, and optionally the Decoder, uses a Parquet Profile that encapsulates the schema as well as the encoding options.

The Parquet Decoder agent receives Parquet data from file collectors in bytearray format, converts the data into ParquetDecoderUDRs (one UDR per record), and routes those UDRs forward into the workflow.

The Parquet Encoder agent receives ParquetEncoderUDRs, converts the data into Parquet, and forwards the resulting bytearray data to a forwarder, which eventually writes it to a Parquet document.

Example workflow with Parquet Decoder and Encoder

Note!

These Parquet agents are batch agents. Parquet is a file-oriented encoding scheme that includes metadata about the entire document, so batch agents, which natively support the processing of entire files, provide a natural lifecycle for Parquet.
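To illustrate why whole-file processing suits Parquet, the following is a minimal Python sketch (not part of the agents themselves) that inspects the framing of a Parquet document. It relies on the Parquet file layout, in which the document-wide metadata sits in a footer at the end of the file, followed by a 4-byte little-endian footer length and the magic bytes PAR1. The function names footer_length and looks_complete are hypothetical and used only for illustration, and the sketch assumes a file with an unencrypted footer.

```python
import struct

MAGIC = b"PAR1"  # 4-byte marker at both the start and the end of a Parquet file


def footer_length(data: bytes) -> int:
    """Return the size in bytes of the file metadata (footer).

    Raises ValueError if the bytes are not framed like a Parquet file.
    Hypothetical helper for illustration only.
    """
    # Smallest possible framing: leading magic + footer length + trailing magic.
    if len(data) < 12:
        raise ValueError("too short to be a Parquet file")
    if data[:4] != MAGIC or data[-4:] != MAGIC:
        raise ValueError("missing PAR1 magic bytes")
    # The 4 bytes before the trailing magic hold the footer length (little-endian).
    (length,) = struct.unpack("<I", data[-8:-4])
    if length > len(data) - 12:
        raise ValueError("footer length exceeds file size")
    return length


def looks_complete(data: bytes) -> bool:
    """Return True if the bytes carry intact Parquet framing at both ends."""
    try:
        footer_length(data)
    except ValueError:
        return False
    return True
```

Because the footer is the last part of the file to be written, a reader cannot interpret any record until the entire document is available, which is the reason a batch lifecycle fits this format better than a streaming one.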

This section provides examples of how to use the Parquet agents in batch workflows. The examples are simple and intended to be used as a base for further development.

The section contains the following subsections: