...
Setting | Description |
---|
Schema | |
Validate | Press the Validate button to check that the Schema is correctly formatted. |
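As a rough illustration of what a format check can involve, the following Python sketch runs two simple structural checks on schema text: braces must balance, and every field declaration must start with a repetition keyword. The function name `precheck_schema` and the rules it applies are assumptions made for this example. It is not the logic behind the Validate button, which may check considerably more.

```python
import re

REPETITIONS = {"required", "optional", "repeated"}

def precheck_schema(text):
    """Return a list of problems found in Parquet schema text (empty if none).

    Hypothetical helper: it only checks brace balance and repetition keywords.
    """
    problems = []
    depth = 0
    # Split the text into statements, each ending with '{', '}' or ';'.
    for stmt in re.findall(r"[^{};]*[{};]", text):
        token = stmt.strip()
        if token.endswith("}"):
            if token != "}":
                problems.append(f"missing ';' before '}}': {token}")
            depth -= 1
            if depth < 0:
                problems.append("unmatched closing brace")
                depth = 0
            continue
        words = token[:-1].split()
        if depth == 0:
            # The outermost statement must open the message.
            if not (token.endswith("{") and len(words) == 2 and words[0] == "message"):
                problems.append(f"expected 'message <name> {{', got: {token}")
        elif not words or words[0] not in REPETITIONS:
            problems.append(f"missing repetition keyword: {token}")
        if token.endswith("{"):
            depth += 1
    if depth != 0:
        problems.append("unclosed group")
    return problems
```

An empty list means the text passed these two checks only; it does not guarantee that the types or annotations are valid.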
Defining the Parquet Schema
To define a Schema, it helps to understand primitives, nested groups, repetition levels, and logical types, as described below:
Excerpt |
---|
Example Parquet Schema |
Apache Parquet supports a small set of primitives (integer, floating point, boolean, and byte array). These primitives can be extended using logical type annotations, which are modifiers on primitives. For example, the UTF8 annotation is a modifier to byte arrays that denotes string data. Parquet also supports structured data through groups and repetitions (that is, optional, required, and repeated).
Info |
---|
title | Example - Parquet Schema |
---|
| This structured text block shows an example Parquet schema for company employees: Code Block |
---|
| message employee {
  required group id {
    required group name {
      required binary surname (UTF8);
      required binary firstName (UTF8);
      optional binary preferredFirstName (UTF8);
    }
    required int32 employeeNumber;
  }
  optional group phones (LIST) {
    repeated group list {
      required group element {
        required binary type (ENUM);
        required binary phoneNumber (UTF8);
      }
    }
  }
  required binary email (UTF8);
  optional binary manager (UTF8);
  required binary jobTitle (UTF8);
  required group team {
    required binary country (UTF8);
    required binary businessUnit (UTF8);
    required binary function (UTF8);
    optional binary team (UTF8);
    optional binary department (UTF8);
    required binary legalEntity (UTF8);
  }
  optional int32 birthdate (DATE);
} |
|
|
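To make the schema more concrete, the following Python sketch builds one record shaped like the `employee` schema above, using plain dictionaries and lists. All values are invented for illustration, and `to_parquet_date` and `from_parquet_date` are hypothetical helpers, not part of any library. The sketch relies on the Parquet convention that the DATE annotation stores the number of days since 1970-01-01 in an int32. Optional fields without a value are shown as `None`, and the three-level `phones` list is written out in full so that the `list` and `element` levels are visible.

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)

def to_parquet_date(d):
    """Encode a date as days since the Unix epoch, as the DATE annotation expects."""
    return (d - EPOCH).days

def from_parquet_date(days):
    """Decode a DATE value (days since the Unix epoch) back to a date."""
    return EPOCH + timedelta(days=days)

# Illustrative record; every value here is made up.
employee = {
    "id": {
        "name": {
            "surname": "Doe",
            "firstName": "Jane",
            "preferredFirstName": None,   # optional: no value present
        },
        "employeeNumber": 1001,           # required int32
    },
    "phones": {                           # optional group annotated LIST
        "list": [                         # repeated group: zero or more entries
            {"element": {"type": "MOBILE", "phoneNumber": "+1 555 0100"}},
            {"element": {"type": "WORK", "phoneNumber": "+1 555 0101"}},
        ]
    },
    "email": "jane.doe@example.com",
    "manager": None,                      # optional
    "jobTitle": "Engineer",
    "team": {
        "country": "SE",
        "businessUnit": "R&D",
        "function": "Engineering",
        "team": None,                     # optional
        "department": None,               # optional
        "legalEntity": "Example AB",
    },
    "birthdate": to_parquet_date(date(1990, 5, 17)),  # optional int32 (DATE)
}
```

Parquet libraries typically present a LIST group as a simple list of elements, so the explicit `list` and `element` keys are shown here only to mirror the schema text.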