This example illustrates typical use of the Parquet Decoder agent in a batch workflow. In this example, complete records are processed using the embedded document schema. The following configurations will be created:
- Ultra Format
- Batch Workflow that makes use of a Parquet Decoder agent that parses Parquet documents.
Define an Ultra Format
A simple Ultra Format needs to be created for the incoming UDRs. For more information about the Ultra Format Editor and the UFDL syntax, refer to the Ultra Format documentation.
Create an Ultra Format as defined below:
Create a Batch Workflow
In this workflow, Parquet files are retrieved from disk, decoded into UDRs, and written to a CSV file. The workflow is illustrated here:
Example workflow with Parquet Decoder
Walking through the example workflow from left to right, we have:
- A Disk agent that reads in the source file (which contains a Parquet document) as a byte array.
- A Parquet Decoder agent that parses the bytes from the file as Parquet, passing ParquetDecoderUDRs to the Analysis agent.
- An Analysis agent that transforms these incoming ParquetDecoderUDRs into BookRecord UDRs.
- An Encoder agent that encodes the BookRecord UDRs as CSV bytes.
- A Disk forwarding agent that receives the bytearray data and writes out a CSV document.
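The last three stages of this chain (Analysis, Encoder, Disk forwarding) can be sketched outside the product to make the data flow concrete. The following Python is only an illustration, not the workflow's actual implementation: the decoded rows are modelled as plain dictionaries standing in for the payload of ParquetDecoderUDRs, the `BookRecord` fields (`title`, `author`, `year`) are hypothetical because the real fields come from the Ultra Format, and the function names are invented for this sketch. Parquet decoding itself is left out.

```python
import csv
import io
from dataclasses import astuple, dataclass


@dataclass
class BookRecord:
    # Hypothetical fields; the real BookRecord is defined by the Ultra Format.
    title: str
    author: str
    year: int


def to_book_record(row: dict) -> BookRecord:
    """Analysis step: map one decoded row (a dict stand-in) to a BookRecord."""
    return BookRecord(
        title=str(row["title"]),
        author=str(row["author"]),
        year=int(row["year"]),
    )


def encode_csv(records) -> bytes:
    """Encoder step: serialise the records as CSV bytes, one line per record."""
    buf = io.StringIO(newline="")
    writer = csv.writer(buf, lineterminator="\n")
    for rec in records:
        writer.writerow(astuple(rec))
    return buf.getvalue().encode("utf-8")


def run_pipeline(decoded_rows) -> bytes:
    """Chain the Analysis and Encoder steps; the caller writes the bytes to disk."""
    return encode_csv(to_book_record(row) for row in decoded_rows)
```

Calling `run_pipeline` with a list of row dictionaries returns the byte array that the forwarding stage would write to the output file. Fields containing a comma are quoted by the `csv` module's default dialect.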
This section walks through the steps of creating such a batch workflow.
Disk
Disk_Input is a Disk Collection agent that collects data from an input file and forwards it to the Parquet Decoder agent.
Double-click the agent to display its configuration dialog:
Example of a Disk agent configuration
...