
Setting     Description

Schema      See Defining the Parquet Schema below.

Validate    Press the Validate button to check that the Schema is correctly formatted.

Defining the Parquet Schema

To define a Schema, it helps to understand primitives, nested groups, repetition levels, and logical types, as described in 9.57.1 Overview and Concepts and summarized below.

Example Parquet Schema


Apache Parquet supports a small set of primitives (integer, floating point, boolean, and byte array). These primitives can be extended using logical type annotations, which are modifiers on primitives. For example, the UTF8 annotation is a modifier to byte arrays that denotes string data. Parquet also supports structured data through groups and repetitions (that is, optional, required, repeated).
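To make the idea of a logical type annotation concrete, the following minimal Python sketch shows how the same primitive value is read differently depending on its annotation. The function name `interpret` is hypothetical and not part of any Parquet library. The sketch covers only the UTF8 and DATE annotations, and relies on the Parquet convention that DATE is stored as an int32 count of days since 1970-01-01.

```python
from datetime import date, timedelta


def interpret(primitive_value, annotation=None):
    """Illustrative only: apply a logical type annotation to a primitive value."""
    if annotation == "UTF8":
        # binary (byte array) + UTF8 -> string data
        return primitive_value.decode("utf-8")
    if annotation == "DATE":
        # int32 + DATE -> calendar date, counted in days since the Unix epoch
        return date(1970, 1, 1) + timedelta(days=primitive_value)
    # No annotation: the primitive is used as-is
    return primitive_value


print(interpret(b"Smith"))           # raw byte array
print(interpret(b"Smith", "UTF8"))   # string
print(interpret(365, "DATE"))        # date
```

Without an annotation the value stays a plain primitive, which is why a field such as `birthdate` needs `(DATE)` to be treated as a date rather than an integer.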

Example - Parquet Schema

This structured text block shows an example Parquet schema for company employees:

message employee {
  required group id {
    required group name {
      required binary surname (UTF8);
      required binary firstName (UTF8);
      optional binary preferredFirstName (UTF8);
    }
    required int32 employeeNumber;
  }
  optional group phones (LIST) {
    repeated group list {
      required group element {
        required binary type (ENUM);
        required binary phoneNumber (UTF8);
      }
    }
  }
  required binary email (UTF8);
  optional binary manager (UTF8);
  required binary jobTitle (UTF8);
  required group team {
    required binary country (UTF8);
    required binary businessUnit (UTF8);
    required binary function (UTF8);
    optional binary team (UTF8);
    optional binary department (UTF8);
    required binary legalEntity (UTF8);
  }
  optional int32 birthdate (DATE);
}
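
One way to see how groups, repetitions, and annotations combine is to walk the schema text and list every primitive (leaf) field with its full path. The Python sketch below does this for the subset of the schema syntax used in the example above. It is an illustration only, not the check performed by the Validate button, and the names `leaf_columns` and `SCHEMA` are hypothetical. It does not handle parameterized types or annotations such as `DECIMAL(9,2)`.

```python
import re

REP = r"(required|optional|repeated)"
MESSAGE_RE = re.compile(r"^message\s+(\w+)\s*\{$")
GROUP_RE = re.compile(rf"^{REP}\s+group\s+(\w+)(?:\s+\((\w+)\))?\s*\{{$")
FIELD_RE = re.compile(rf"^{REP}\s+(\w+)\s+(\w+)(?:\s+\((\w+)\))?;$")


def leaf_columns(schema_text):
    """Return (dotted_path, repetition, primitive, annotation) per primitive field."""
    path = []      # names of the groups currently open (None for the message root)
    columns = []
    for raw in schema_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if MESSAGE_RE.match(line):
            path.append(None)
        elif (m := GROUP_RE.match(line)):
            path.append(m.group(2))
        elif (m := FIELD_RE.match(line)):
            repetition, primitive, name, annotation = m.groups()
            dotted = ".".join([p for p in path if p] + [name])
            columns.append((dotted, repetition, primitive, annotation))
        elif line == "}":
            if not path:
                raise ValueError("unbalanced closing brace")
            path.pop()
        else:
            raise ValueError(f"unrecognized schema line: {line!r}")
    if path:
        raise ValueError("missing closing brace")
    return columns


# Shortened version of the employee schema shown above
SCHEMA = """
message employee {
  required group id {
    required int32 employeeNumber;
  }
  optional group phones (LIST) {
    repeated group list {
      required group element {
        required binary phoneNumber (UTF8);
      }
    }
  }
  optional int32 birthdate (DATE);
}
"""

for column in leaf_columns(SCHEMA):
    print(column)
```

Each tuple pairs a leaf field's dotted path with its own repetition, primitive type, and annotation, so `phoneNumber` appears under `phones.list.element` even though the repetition that makes it a list sits on the enclosing `list` group.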