Press the Validate button to check that the Schema is correctly formatted.
Defining the Parquet Schema
To define a Schema, it helps to understand primitives, nested groups, field repetition (required, optional, repeated), and logical types, as described below:
Example Parquet Schema
Apache Parquet supports a small set of primitive types (integer, floating point, boolean, and byte array). These primitives can be extended with logical type annotations, which are modifiers on primitives. For example, the UTF8 annotation modifies a byte array to denote string data. Parquet also supports structured data through groups and field repetition (required, optional, or repeated).
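The relationship between annotations and the types they modify can be sketched in Python. This is a minimal illustration that covers only the annotations UTF8, ENUM, DATE, and LIST; the names ANNOTATION_BASE_TYPE, REPETITIONS, and is_valid_annotation are hypothetical helpers and not part of any Parquet library.

```python
# Illustrative only: each logical annotation and the type it modifies.
# This table is deliberately incomplete; it lists four common annotations.
ANNOTATION_BASE_TYPE = {
    "UTF8": "binary",  # string data stored as a byte array
    "ENUM": "binary",  # enumerated value stored as a byte array
    "DATE": "int32",   # number of days since the Unix epoch (1970-01-01)
    "LIST": "group",   # annotates a group rather than a primitive
}

# The three repetition keywords a field can carry.
REPETITIONS = {"required", "optional", "repeated"}


def is_valid_annotation(base_type: str, annotation: str) -> bool:
    """Return True if the annotation may modify the given base type."""
    return ANNOTATION_BASE_TYPE.get(annotation) == base_type


print(is_valid_annotation("binary", "UTF8"))  # a string field
print(is_valid_annotation("int32", "UTF8"))   # UTF8 cannot modify int32
```

A lookup table keeps the sketch short; a real schema validator would need the full annotation list from the Parquet format specification.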
Example - Parquet Schema
This structured text block shows an example Parquet schema for company employees:
message employee {
  required group id {
    required group name {
      required binary surname (UTF8);
      required binary firstName (UTF8);
      optional binary preferredFirstName (UTF8);
    }
    required int32 employeeNumber;
  }
  optional group phones (LIST) {
    repeated group list {
      required group element {
        required binary type (ENUM);
        required binary phoneNumber (UTF8);
      }
    }
  }
  required binary email (UTF8);
  optional binary manager (UTF8);
  required binary jobTitle (UTF8);
  required group team {
    required binary country (UTF8);
    required binary businessUnit (UTF8);
    required binary function (UTF8);
    optional binary team (UTF8);
    optional binary department (UTF8);
    required binary legalEntity (UTF8);
  }
  optional int32 birthdate (DATE);
}
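To make the schema more concrete, the following Python sketch shows what a single record matching it might look like as plain data. All values are invented for illustration, and date_to_parquet_days is a hypothetical helper. It relies on the Parquet DATE annotation storing the number of days since the Unix epoch as an int32.

```python
from datetime import date

EPOCH = date(1970, 1, 1)


def date_to_parquet_days(d: date) -> int:
    """Convert a calendar date to days since 1970-01-01 (Parquet DATE)."""
    return (d - EPOCH).days


# One hypothetical employee record shaped like the example schema.
# Optional fields that are not set are represented here as None.
employee = {
    "id": {
        "name": {
            "surname": "Doe",
            "firstName": "Jane",
            "preferredFirstName": None,
        },
        "employeeNumber": 1001,
    },
    # The optional LIST group becomes a list of elements.
    "phones": [
        {"type": "MOBILE", "phoneNumber": "+1 555 0100"},
    ],
    "email": "jane.doe@example.com",
    "manager": None,
    "jobTitle": "Engineer",
    "team": {
        "country": "SE",
        "businessUnit": "R&D",
        "function": "Engineering",
        "team": None,
        "department": None,
        "legalEntity": "Example AB",
    },
    "birthdate": date_to_parquet_days(date(1990, 1, 1)),
}

print(employee["birthdate"])
```

Required fields always carry a value, optional fields may be absent, and the phones list may hold zero or more elements.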