...

The schema in the previous section illustrates several Apache Parquet concepts. To read it, it helps to understand primitives, nested groups, repetition levels, and logical types. Briefly:


Primitives in Apache Parquet are the fundamental data types. They include integers (for example, int32, int64), floating-point numbers (for example, float, double), Booleans (boolean), and byte arrays (binary).

Nested groups in Apache Parquet are how structured objects (made up of primitives, other groups, or lists of either) are assembled. In the example above, id is a nested group that contains name (which is itself a nested group) and employeeNumber (an integer primitive).
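The nesting described above can be sketched in a few lines of Python. This is only an illustration, not the Parquet API: a group is modeled as a dict and a primitive as a type-name string. The fields inside name (first, last) and the choice of int32 for employeeNumber are assumptions made for the example. The sketch walks the groups and lists each primitive leaf with its dotted path, which is how a nested field is addressed as a column.

```python
# Hypothetical model: a group is a dict of field name -> child,
# and a primitive is a string naming its type.
schema = {
    "id": {
        "name": {"first": "binary", "last": "binary"},  # assumed fields
        "employeeNumber": "int32",                       # assumed width
    }
}


def leaf_columns(node, prefix=()):
    """Yield (dotted_path, primitive_type) for every primitive under node."""
    for field_name, child in node.items():
        path = prefix + (field_name,)
        if isinstance(child, dict):
            # A nested group: descend into its fields.
            yield from leaf_columns(child, path)
        else:
            # A primitive: this is a leaf column.
            yield ".".join(path), child


columns = list(leaf_columns(schema))
for path, primitive_type in columns:
    print(f"{path}: {primitive_type}")
```

Only the primitives appear in the result. The groups themselves hold no values and contribute only the path prefixes.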

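Repetition levels (called repetition types in the Parquet format specification) are modifiers that specify whether a field is required (appears exactly once), optional (appears zero or one time), or repeated (appears zero or more times).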
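As a rough sketch of what the three modifiers allow, the following Python checks how many times a field occurs in a record. The function allowed_count and the field names nickname and phone are hypothetical and exist only for this example. A real Parquet writer enforces these rules itself.

```python
def allowed_count(repetition: str, count: int) -> bool:
    """Return True if a field with this modifier may occur `count` times."""
    if repetition == "required":
        return count == 1          # exactly once
    if repetition == "optional":
        return count in (0, 1)     # absent or present once
    if repetition == "repeated":
        return count >= 0          # any number of times, including none
    raise ValueError(f"unknown repetition: {repetition!r}")


# Hypothetical field rules and one record, with each field's values as a list.
field_rules = {
    "employeeNumber": "required",
    "nickname": "optional",
    "phone": "repeated",
}
record = {
    "employeeNumber": [1001],
    "nickname": [],
    "phone": ["555-0100", "555-0101"],
}

results = {
    name: allowed_count(rule, len(record.get(name, [])))
    for name, rule in field_rules.items()
}
print(results)
```

Leaving employeeNumber out of the record would make its check fail, while leaving out nickname or phone would not.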

Logical types extend the small set of primitive types by annotating how a primitive value should be interpreted. For example, a byte array (binary) can be annotated as a string or as structured JSON, or left unannotated to hold raw binary data.
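The idea can be sketched in Python as one stored byte array read three ways, depending on its annotation. The decode function below is hypothetical and is not part of any Parquet library. STRING and JSON are the annotation names used in the Parquet logical types definitions.

```python
import json


def decode(raw: bytes, logical_type=None):
    """Interpret a stored byte array according to its logical type annotation."""
    if logical_type is None:
        return raw                              # unannotated: raw binary data
    if logical_type == "STRING":
        return raw.decode("utf-8")              # UTF-8 encoded text
    if logical_type == "JSON":
        return json.loads(raw.decode("utf-8"))  # UTF-8 encoded JSON document
    raise ValueError(f"unsupported logical type: {logical_type!r}")


raw = b'{"team": "data"}'

as_bytes = decode(raw)
as_string = decode(raw, "STRING")
as_json = decode(raw, "JSON")

print(type(as_bytes).__name__, type(as_string).__name__, type(as_json).__name__)
```

The stored bytes are identical in all three cases. Only the annotation changes what a reader hands back.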

For further reading on Parquet, see these documents: Apache Parquet Documentation, Parquet Logical Types Definitions, and Maven Repository Apache Parquet.

Working with the Parquet

...

agents


...