Overview
This example illustrates typical use of the Parquet Encoder agent in a batch workflow. The following configurations will be created:
- An Ultra Format
- A Parquet Profile
- A Batch Workflow that makes use of the Parquet Encoder agent to create Parquet documents
Define an Ultra Format
A simple Ultra Format needs to be created for the incoming UDRs. For more information about the Ultra Format Editor and the UFDL syntax, refer to the Ultra Format Management User's Guide.
Info: Create an Ultra Format as defined below:
Define a Parquet Profile
The Parquet Profile defines the schema and the advanced properties used for encoding. See Parquet Profile Configuration for information on how to open the Parquet Profile editor.
Profile - Schema Tab
Profile Configuration Example - Schema Tab
...
Info: The structured text block shows an example Parquet schema for a book asset. Copy and paste this text to your schema.
Profile - Advanced Tab
Profile Configuration Example - Advanced Tab
The Advanced tab includes a number of settings. In this example, the default values are retained.
Create a Batch Workflow
In this workflow, CSV records are collected from a file on disk and then encoded into a Parquet document. The workflow is illustrated here:
...
This section walks through the steps of creating such a batch workflow.
Disk
Disk_Source is a Disk Collection agent that collects data from an input file and forwards it to the Decoder agent.
...
Example of a Disk Collection agent configuration
Decoder
The Decoder agent receives the input data from the Disk agent, translates it into UDRs and forwards them to the Analysis agent. Double-click on the Decoder agent to display the configuration dialog.
...
In this dialog, choose the Decoder that you defined in your Ultra Format.
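The decoding step can be pictured outside the product as turning each CSV line into a typed record. The following is only a minimal Python sketch of that idea, not the Ultra decoder itself. The `BookRecord` class and its fields (`title`, `author`, `pages`) are assumptions made for illustration, since the actual fields come from your Ultra Format.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class BookRecord:
    """Hypothetical stand-in for a decoded UDR; the field names are assumptions."""
    title: str
    author: str
    pages: int


def decode_csv(text: str) -> list[BookRecord]:
    """Translate comma-separated lines into BookRecord objects, one per line."""
    records = []
    for row in csv.reader(io.StringIO(text)):
        if not row:  # csv.reader yields an empty list for a blank line
            continue
        title, author, pages = row
        records.append(BookRecord(title=title, author=author, pages=int(pages)))
    return records
```

Each returned record corresponds to one UDR that would be forwarded to the next agent in the workflow.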
Analysis
The Analysis agent transforms the data from each BookDecoder UDR into a ParquetEncoderUDR. In particular, the ParquetEncoderUDR includes a map whose contents mirror the Parquet schema defined in the profile.
...
Note in the code that the data in the payload map in the ParquetEncoderUDR mirrors the schema configured in the profile. Non-matching structures will result in errors at runtime.
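To illustrate what "mirrors the schema" means, the sketch below compares a nested map against a schema description and lists every difference. It is plain Python written for this explanation, not part of the product or its APL. The `BOOK_SCHEMA` shape and the `find_mismatches` helper are hypothetical, and the real schema is the one configured in your profile.

```python
def find_mismatches(schema, payload, path="payload"):
    """Return a list of readable differences between a schema and a payload map.

    `schema` maps field names to either a Python type (leaf field) or a
    nested dict (group). `payload` is the map that would be sent downstream.
    """
    problems = []
    for name, expected in schema.items():
        where = f"{path}.{name}"
        if name not in payload:
            problems.append(f"{where}: missing")
        elif isinstance(expected, dict):
            if isinstance(payload[name], dict):
                # Recurse into the group so nested fields are checked too.
                problems.extend(find_mismatches(expected, payload[name], where))
            else:
                problems.append(f"{where}: expected a nested map")
        elif not isinstance(payload[name], expected):
            problems.append(f"{where}: expected {expected.__name__}")
    # Keys present in the payload but absent from the schema are also mismatches.
    for name in payload.keys() - schema.keys():
        problems.append(f"{path}.{name}: not in schema")
    return problems


# Hypothetical schema shape for a book asset (an assumption, not the profile's schema).
BOOK_SCHEMA = {"title": str, "author": {"name": str}, "pages": int}
```

An empty result means the map has the same structure as the schema. Any entry in the list points at a field that would likely cause the kind of runtime error described above.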
Parquet Encoder
The Parquet Encoder agent creates a Parquet document based on the ParquetEncoderUDRs it receives from upstream agents.
...
In this dialog, choose the Parquet Profile that you defined earlier.
Disk Forwarder
Disk_Destination is a Disk Forwarding agent that writes bytes to an output file on disk.
...
Example of a Disk Forwarding agent configuration
Running the Workflow
When you run the workflow, it processes the CSV file from the input directory and writes a corresponding Parquet file to the configured output directory.
...
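As a quick sanity check after a run, you can confirm that the output file has the Parquet envelope. The Parquet file format places the 4-byte marker `PAR1` at both the start and the end of a file with an unencrypted footer. The Python sketch below checks only those markers. It does not validate the schema or the data, and the `looks_like_parquet` helper name is an assumption made for this example.

```python
from pathlib import Path

PARQUET_MAGIC = b"PAR1"  # 4-byte marker defined by the Parquet file format


def looks_like_parquet(path) -> bool:
    """Return True if the file begins and ends with the Parquet magic bytes."""
    data_path = Path(path)
    # Header magic (4) + footer length (4) + footer magic (4) is the smallest envelope.
    if data_path.stat().st_size < 12:
        return False
    with data_path.open("rb") as f:
        head = f.read(4)
        f.seek(-4, 2)  # whence=2 seeks relative to the end of the file
        tail = f.read(4)
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC
```

Call the function with the path of a file in your configured output directory. A `False` result suggests that the file is incomplete or is not a Parquet document.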