To create a new Parquet profile configuration, click the New Configuration button in the upper left part of the Desktop window, and then select Parquet Profile from the menu.

The contents of the menus in the menu bar may change depending on which configuration type has been opened. The Parquet profile uses the standard menu items and buttons that are visible for all configurations. These are described in 2.1 Menus and Buttons.

The Schema tab is the primary configuration in the Parquet profile. This tab allows you to specify a Parquet schema, which the Parquet Encoder and Parquet Decoder agents use for different purposes:

  • Parquet Encoder Agent - The Parquet Encoder agent generates a Parquet document that conforms to the specified schema. The schema itself is also included in the generated Parquet document.

  • Parquet Decoder Agent - When the Parquet Decoder processes a Parquet document, only the columns named in the specified schema are included in the output. For example:
    • Consider a document with columns A, B, C, and D.
    • Assume that the schema in the Parquet Profile only specifies columns A and D.
    • The generated ParquetDecoderUDRs will include only fields A and D in the payload map.
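The column selection performed by the Parquet Decoder can be pictured with this short Python sketch. It is an illustration under assumptions, not the agent's implementation: rows are modeled as plain dictionaries, `project_row` and `decode` are hypothetical names, and schema columns that are absent from a row are simply skipped.

```python
# Hypothetical sketch of schema-based column selection.
# project_row and decode are illustrative names, not the Parquet Decoder's API.
from typing import Any, Dict, Iterable, List


def project_row(row: Dict[str, Any], schema_columns: Iterable[str]) -> Dict[str, Any]:
    """Keep only the columns named in the profile schema.

    Assumption: a schema column that the row lacks is skipped rather than
    added with an empty value.
    """
    wanted = set(schema_columns)
    return {name: value for name, value in row.items() if name in wanted}


def decode(rows: List[Dict[str, Any]], schema_columns: Iterable[str]) -> List[Dict[str, Any]]:
    """Build one payload map per row, one per generated UDR."""
    cols = list(schema_columns)
    return [project_row(row, cols) for row in rows]


if __name__ == "__main__":
    # A document with columns A, B, C and D; the schema names only A and D.
    document = [
        {"A": 1, "B": "x", "C": True, "D": 2.5},
        {"A": 2, "B": "y", "C": False, "D": 3.5},
    ]
    # prints {'A': 1, 'D': 2.5} then {'A': 2, 'D': 3.5}
    for payload in decode(document, ["A", "D"]):
        print(payload)
```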

Note that the Parquet profile (and hence the schema) is required for Parquet Encoder agents and optional for Parquet Decoder agents.


The Parquet profile's Schema tab, showing an example of a defined schema. You must write a schema that matches your own data.


Setting     Description

Schema      The Parquet schema used by the profile. See Defining the Parquet Schema below.

Validate    Click the Validate button to check that the schema is correctly formatted.

Defining the Parquet Schema

Before defining a schema, it helps to understand primitives, nested groups, repetition levels, and logical types, which are described below:

Primitives in Apache Parquet are the fundamental data types. They include integers (for example, int32 and int64), floating-point numbers (float and double), Boolean values (boolean), and byte arrays (binary).

Nested groups in Apache Parquet are how structured objects (consisting of primitives, or lists of groups or primitives) are put together. In the example below, id is a nested group that includes name (which is itself a nested group) and employeeNumber (an int32 primitive).

Repetition levels are modifiers that specify whether a field is required, optional, or repeated (that is, it can occur any number of times).

Logical types extend the small set of primitive types. For example, the byte array type can be used to hold strings and structured JSON as well as raw binary data.
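Primitives, nested groups, and repetition can be illustrated with a small Python model that checks a record against a schema. This is a hypothetical sketch: `Field`, `check`, and `PRIMITIVE_CHECKS` are names invented for this example, not part of Parquet or of the profile, and the sketch ignores logical types and Parquet's on-disk encoding. It treats required, optional, and repeated only as rules for whether a value must be present, may be absent, or is a list.

```python
# Hypothetical, simplified model of a Parquet schema, for illustration only.
# Field and check() are not part of any Parquet library; logical types are ignored.
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional

# Primitive type name -> test that a Python value fits that primitive.
PRIMITIVE_CHECKS: Dict[str, Callable[[Any], bool]] = {
    "boolean": lambda v: isinstance(v, bool),
    "int32": lambda v: type(v) is int and -2**31 <= v < 2**31,
    "int64": lambda v: type(v) is int and -2**63 <= v < 2**63,
    "float": lambda v: isinstance(v, float),
    "double": lambda v: isinstance(v, float),
    "binary": lambda v: isinstance(v, bytes),
}


@dataclass
class Field:
    name: str
    repetition: str                  # "required", "optional" or "repeated"
    primitive: Optional[str] = None  # primitive type name; None for a group
    children: List["Field"] = field(default_factory=list)  # members of a group


def check(f: Field, value: Any, path: str = "") -> List[str]:
    """Return the problems found when matching value against field f."""
    where = f"{path}.{f.name}" if path else f.name
    if value is None:
        # A missing value is only an error for a required field.
        return [f"{where}: required field is missing"] if f.repetition == "required" else []
    if f.repetition == "repeated":
        if not isinstance(value, list):
            return [f"{where}: repeated field must be a list"]
        values = value
    else:
        values = [value]
    errors: List[str] = []
    for v in values:
        if f.primitive is not None:
            if not PRIMITIVE_CHECKS[f.primitive](v):
                errors.append(f"{where}: expected {f.primitive}, got {v!r}")
        elif not isinstance(v, dict):
            errors.append(f"{where}: group value must be a dict")
        else:
            # Recurse into the members of the nested group.
            for child in f.children:
                errors.extend(check(child, v.get(child.name), where))
    return errors


# A cut-down version of the id group from the employee example schema.
id_group = Field("id", "required", children=[
    Field("name", "required", children=[
        Field("surname", "required", "binary"),
        Field("preferredFirstName", "optional", "binary"),
    ]),
    Field("employeeNumber", "required", "int32"),
])

print(check(id_group, {"name": {"surname": b"Smith"}, "employeeNumber": 42}))  # []
# Reports the missing surname and the wrongly typed employeeNumber.
print(check(id_group, {"name": {}, "employeeNumber": "42"}))
```

In this model a repeated field is a Python list, so an empty list and an absent value are both accepted.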


Example Parquet Schema


Apache Parquet supports a small set of primitives (integer, floating point, boolean, and byte array). These primitives can be extended using logical type annotations, which are modifiers on primitives. For example, the UTF8 annotation is a modifier to byte arrays that denotes string data. Parquet also supports structured data through groups and repetitions (that is, optional, required, repeated).
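To illustrate how an annotation changes the meaning of a primitive value, this Python sketch interprets raw values for two annotations that appear in the example schema: UTF8 (a byte array holding UTF-8 text) and DATE (an int32 counting days since 1970-01-01). The function name `apply_annotation` is hypothetical; in practice, a Parquet library typically performs this conversion for you.

```python
# Illustrative sketch: interpret a raw Parquet primitive value according to
# its logical type annotation. apply_annotation is a hypothetical name, and
# only the UTF8 and DATE annotations are handled.
from datetime import date, timedelta
from typing import Optional, Union

UNIX_EPOCH = date(1970, 1, 1)


def apply_annotation(raw: Union[bytes, int], annotation: Optional[str]):
    """Return the value that raw represents under the given annotation."""
    if annotation == "UTF8":
        # UTF8 marks a byte array as UTF-8 encoded text.
        return raw.decode("utf-8")
    if annotation == "DATE":
        # DATE marks an int32 as the number of days since 1970-01-01.
        return UNIX_EPOCH + timedelta(days=raw)
    # Without an annotation, the primitive value is used as-is.
    return raw


print(apply_annotation(b"Smith", "UTF8"))   # Smith
print(apply_annotation(10957, "DATE"))      # 2000-01-01
print(apply_annotation(b"\x00\x01", None))  # b'\x00\x01'
```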

This structured text block shows an example Parquet schema for company employees:

message employee {
  required group id {
    required group name {
      required binary surname (UTF8);
      required binary firstName (UTF8);
      optional binary preferredFirstName (UTF8);
    }
    required int32 employeeNumber;
  }
  optional group phones (LIST) {
    repeated group list {
      required group element {
        required binary type (ENUM);
        required binary phoneNumber (UTF8);
      }
    }
  }
  required binary email (UTF8);
  optional binary manager (UTF8);
  required binary jobTitle (UTF8);
  required group team {
    required binary country (UTF8);
    required binary businessUnit (UTF8);
    required binary function (UTF8);
    optional binary team (UTF8);
    optional binary department (UTF8);
    required binary legalEntity (UTF8);
  }
  optional int32 birthdate (DATE);
}
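To see how nested groups turn into individual columns, this sketch reads a schema written in the same text syntax as the example above and lists each leaf field with its dotted path, its own repetition, and its primitive type. It is a minimal, hypothetical helper (`leaf_columns` is an invented name): it only understands the one-declaration-per-line layout used in the example, is not a complete Parquet schema parser, and is not what the profile's Validate button runs.

```python
# Minimal sketch, assuming one declaration per line, as in the example schema:
#   "<repetition> group <name> [(ANNOTATION)] {"
#   "<repetition> <type> <name> [(ANNOTATION)];"
#   "}"
# Not a full Parquet schema parser.
import re
from typing import List, Tuple

MESSAGE_RE = re.compile(r"^message\s+\w+\s*\{$")
GROUP_RE = re.compile(r"^(required|optional|repeated)\s+group\s+(\w+)(?:\s*\((\w+)\))?\s*\{$")
LEAF_RE = re.compile(r"^(required|optional|repeated)\s+(\w+)\s+(\w+)(?:\s*\((\w+)\))?\s*;$")


def leaf_columns(schema_text: str) -> List[Tuple[str, str, str]]:
    """Return (dotted path, repetition, primitive type) for every leaf field."""
    lines = [ln.strip() for ln in schema_text.splitlines() if ln.strip()]
    if not lines or not MESSAGE_RE.match(lines[0]) or lines[-1] != "}":
        raise ValueError("schema must be a single 'message <name> { ... }' block")
    path: List[str] = []  # names of the groups currently open
    leaves: List[Tuple[str, str, str]] = []
    for line in lines[1:-1]:
        if line == "}":
            if not path:
                raise ValueError("unbalanced '}'")
            path.pop()
        elif m := GROUP_RE.match(line):
            path.append(m.group(2))
        elif m := LEAF_RE.match(line):
            leaves.append((".".join(path + [m.group(3)]), m.group(1), m.group(2)))
        else:
            raise ValueError(f"cannot parse {line!r}")
    if path:
        raise ValueError("unclosed group: " + ".".join(path))
    return leaves


EXAMPLE = """
message employee {
  required group id {
    required group name {
      required binary surname (UTF8);
      optional binary preferredFirstName (UTF8);
    }
    required int32 employeeNumber;
  }
  optional int32 birthdate (DATE);
}
"""

for column in leaf_columns(EXAMPLE):
    print(column)
```

For the cut-down schema, this prints one tuple per leaf, starting with ('id.name.surname', 'required', 'binary'). The repetition shown is the leaf's own, so a leaf inside an optional group can still be listed as required.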



