...

Parquet Decoder

The Parquet Decoder agent collects the bytes from the Disk Collector agent into a complete Parquet document (with an embedded schema). It then creates one ParquetDecoderUDR for each row and forwards the UDRs to the next agent.
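
One likely reason the decoder works on the complete document is that Parquet cannot be decoded from a partial byte stream. A Parquet file starts and ends with the 4-byte magic number PAR1, and the schema and row-group metadata are stored in a footer at the end of the file. The footer's length is a 4-byte little-endian integer placed immediately before the trailing magic. The Python sketch below is an illustration only, not the agent's implementation. It checks that a byte buffer is a complete Parquet file and returns the footer length. The name footer_length is hypothetical.

```python
import struct

MAGIC = b"PAR1"  # 4-byte marker at both the start and the end of a Parquet file


def footer_length(data: bytes) -> int:
    """Return the length of the footer metadata in a complete Parquet file.

    Hypothetical helper for illustration. Raises ValueError if the buffer is
    truncated, for example when only part of the file has been collected.
    """
    # Leading magic + 4-byte length field + trailing magic = at least 12 bytes.
    if len(data) < 12 or not data.startswith(MAGIC) or not data.endswith(MAGIC):
        raise ValueError("not a complete Parquet file")
    # The footer length is a little-endian 4-byte integer just before the trailing magic.
    (length,) = struct.unpack("<I", data[-8:-4])
    # The metadata must fit between the leading magic and the length field.
    if length == 0 or length > len(data) - 12:
        raise ValueError("invalid footer length")
    return length
```

A truncated buffer has no trailing magic or footer, so it fails the check. The schema and row locations cannot be read until the whole file is available.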

Double-click on the Parquet Decoder agent to display the configuration dialog.

The Parquet Decoder agent with no Parquet Profile specified.

In this dialog, note that no Parquet Profile is specified. In this case, the ParquetDecoderUDRs include all columns in the file. To improve performance, you can instead specify a Parquet Profile whose schema selects a subset of the columns.

Analysis

The Analysis agent transforms the data from each ParquetDecoderUDR into a BookRecord UDR, as defined above in the Ultra format. In particular, each ParquetDecoderUDR includes a payload map whose contents mirror the Parquet schema defined in the profile. This data is available when constructing well-typed UDRs such as BookRecord.

Double-click on the Analysis agent to display the configuration dialog.

The Analysis agent dialog with the APL code defined.

In this dialog, you write the APL code that handles the input data. In the example, each ParquetDecoderUDR is transformed into a BookRecord UDR. Adapt the code to your requirements.

The UDR Types field shows the UDR type used. In this example, it is ParquetDecoderUDR.

Example: Parquet APL

The APL code below shows an example of processing ParquetDecoderUDRs:

import ultra.Sandbox_Parquet_Autotest.Autotest_Ultra;

consume
{
  switch (input)
  {
    case (ParquetDecoderUDR decoderUDR)
    {
      // The payload map mirrors the Parquet schema; "author" is a nested map.
      map<string,any> payload = decoderUDR.payload;
      map<string,any> author = (map<string,any>) mapGet(payload, "author");

      // Copy the payload fields into a new BookRecord UDR.
      BookRecord record = udrCreate(BookRecord);
      record.title = (string) mapGet(payload, "title");
      record.authorName = (string) mapGet(author, "name");
      record.organization = (string) mapGet(author, "organization");
      record.copyrightYear = (date) mapGet(payload, "copyrightYear");
      record.numberOfPages = (int) mapGet(payload, "numberOfPages");

      // Format the copyright year as a four-digit string.
      dateToString(record.copyrightYearString, record.copyrightYear, "yyyy");

      // Send the record on to the next agent.
      udrRoute(record);
    }
  }
}


The data in the payload map in the ParquetDecoderUDR conforms to the embedded schema.
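
The following Python sketch illustrates what conformance means for the fields used in the APL example. It models the payload as a dictionary in which the nested author group is itself a dictionary, as the (map<string,any>) cast in the APL implies. The field names and types are inferred from the APL example, and the real schema is the one embedded in the Parquet file. BOOK_SCHEMA and conforms are hypothetical names, and treating every field as optional is an assumption.

```python
from datetime import date
from typing import Any

# Hypothetical schema inferred from the APL example: field name -> expected type.
# A nested dict stands for a Parquet group, such as "author".
BOOK_SCHEMA: dict[str, Any] = {
    "title": str,
    "author": {"name": str, "organization": str},
    "copyrightYear": date,
    "numberOfPages": int,
}


def conforms(payload: dict[str, Any], schema: dict[str, Any]) -> bool:
    """Return True if each schema field is missing/None or has the expected type."""
    for field, expected in schema.items():
        value = payload.get(field)
        if value is None:
            # Assumption: fields are optional, so a missing or null value is accepted.
            continue
        if isinstance(expected, dict):
            # A nested group must arrive as a nested map that conforms in turn.
            if not isinstance(value, dict) or not conforms(value, expected):
                return False
        elif not isinstance(value, expected):
            return False
    return True


payload = {
    "title": "Example Title",
    "author": {"name": "A. Author", "organization": "Example Org"},
    "copyrightYear": date(2020, 1, 1),
    "numberOfPages": 300,
}
print(conforms(payload, BOOK_SCHEMA))  # True
```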

Encoder

The Encoder agent receives the BookRecord UDRs from the Analysis agent and generates one byte array in CSV format for each UDR. Double-click on the Encoder agent to display the configuration dialog.


Example of an Encoder agent configuration

In this dialog, choose the Encoder that you defined in your Ultra Format.
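
The following Python sketch illustrates the Encoder's output shape: one CSV line, returned as a byte array, for each BookRecord. It is not the Encoder agent's implementation. The actual layout comes from the Encoder defined in your Ultra Format. The selected fields and their order, the comma delimiter, the newline terminator and the UTF-8 encoding are all assumptions. The BookRecord class here is a hypothetical Python stand-in for the UDR, and encode_csv is a hypothetical name.

```python
import csv
import io
from dataclasses import astuple, dataclass


@dataclass
class BookRecord:
    """Hypothetical Python stand-in for the BookRecord UDR; fields from the APL example."""
    title: str
    authorName: str
    organization: str
    copyrightYearString: str
    numberOfPages: int


def encode_csv(record: BookRecord) -> bytes:
    """Encode one record as one CSV line and return it as a UTF-8 byte array."""
    buffer = io.StringIO()
    # QUOTE_MINIMAL (the default) quotes only fields containing commas, quotes, or newlines.
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(astuple(record))
    return buffer.getvalue().encode("utf-8")


# One byte array per record, mirroring the one-array-per-UDR behavior described above.
records = [
    BookRecord("Example Title", "A. Author", "Example Org", "2020", 300),
    BookRecord("Title, With Comma", "B. Author", "Other Org", "2021", 120),
]
encoded = [encode_csv(r) for r in records]
```

Each element of encoded is a separate byte array. The second title contains a comma, so that field is quoted in its output line.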

...