Overview
This example illustrates typical use of the Parquet Decoder agent in a batch workflow. In this example, complete records are processed using the embedded document schema. The following configurations will be created:
- An Ultra Format
- A Batch Workflow that uses a Parquet Decoder agent to parse Parquet documents.
Define an Ultra Format
A simple Ultra Format needs to be created for the incoming UDRs. For more information about the Ultra Format Editor and the UFDL syntax, refer to the Ultra Format Management User's Guide.
Create an Ultra Format as defined below:
Create a Batch Workflow
In this workflow, Parquet files are collected from disk, decoded into UDRs, and then written to a CSV file. The workflow is illustrated here:
Example workflow with Parquet Decoder
Walking through the example workflow from left to right, we have:
- A Disk agent named Disk_Source that reads in the source file (which contains a Parquet document) as a byte array.
- A Parquet Decoder agent that parses the bytes from the file as Parquet, passing ParquetDecoderUDRs to the Analysis agent.
- An Analysis agent named Analysis that transforms these incoming ParquetDecoderUDRs into BookRecord UDRs.
- An Encoder agent named CSV_Encoder that encodes the BookRecord UDRs as CSV bytes.
- A Disk forwarding agent named Disk_Destination that receives the bytearray data and writes it out as a CSV document.
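The data flow in the steps above can be sketched outside the product in a few lines of Python. This is only an illustration of the Analysis and CSV_Encoder stages, not the agents' actual implementation. The Parquet decoding step is assumed to have already produced plain rows, and the `BookRecord` fields (`title`, `author`, `year`), the function names, and the sample rows are all hypothetical, because the real record layout comes from the Ultra Format.

```python
import csv
import io
from dataclasses import astuple, dataclass


@dataclass
class BookRecord:
    # Hypothetical fields; the real layout is defined by the Ultra Format.
    title: str
    author: str
    year: int


def analysis(decoded_rows):
    """Stand-in for the Analysis agent: map decoded rows to BookRecord objects."""
    for row in decoded_rows:
        yield BookRecord(title=row["title"], author=row["author"], year=int(row["year"]))


def csv_encode(records):
    """Stand-in for CSV_Encoder: serialize the records as CSV bytes."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for record in records:
        writer.writerow(astuple(record))
    return buffer.getvalue().encode("utf-8")


# Sample rows as they might look after Parquet decoding (illustrative data only).
decoded_rows = [
    {"title": "Dune", "author": "Frank Herbert", "year": 1965},
    {"title": "Emma", "author": "Jane Austen", "year": 1815},
]

csv_bytes = csv_encode(analysis(decoded_rows))
```

Here `csv_bytes` plays the role of the bytearray data handed to Disk_Destination, which would write it to the output file unchanged.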
This section walks through the steps of creating such a batch workflow.
Disk
Disk_Source is a Disk Collection agent that collects data from an input file and forwards it to the Parquet Decoder agent.
Double-click on the Disk_Source agent to display the configuration dialog for the agent:
Example of a Disk agent configuration
...