Object storage file formats
Note
Below is the original Trino documentation. We will soon translate it into Russian and supplement it with useful examples.
Object storage connectors support one or more file formats specified by the underlying data source.
ORC format configuration properties
The following properties configure how supported object storage connectors read and write ORC files:
| Property Name | Description | Default |
|---|---|---|
|  | Sets the default time zone for legacy ORC files that did not declare a time zone. | JVM default |
|  | Enable bloom filters for predicate pushdown. |  |
|  | Allow reads on ORC files with a short zone ID in the stripe footer. |  |
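Bloom filters support predicate pushdown by letting a reader skip data that definitely does not contain a value tested for equality. The following Python sketch illustrates the general idea with one filter per ORC stripe. The `BloomFilter` class, the `build_stripe_filters` and `stripes_to_read` helpers, and the bit and hash counts are all hypothetical; the sketch does not reproduce the ORC bloom filter encoding or Trino's reader.

```python
import hashlib
from typing import Hashable, Iterable, Iterator, List


class BloomFilter:
    """Minimal Bloom filter: k hash positions per value in a fixed-size bit array."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # Python int used as an arbitrary-length bit array

    def _positions(self, value: Hashable) -> Iterator[int]:
        data = repr(value).encode("utf-8")
        for seed in range(self.num_hashes):
            # Derive k different hashes by salting the input with the seed.
            digest = hashlib.sha256(bytes([seed]) + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value: Hashable) -> None:
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value: Hashable) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


def build_stripe_filters(stripes: Iterable[Iterable[Hashable]]) -> List[BloomFilter]:
    """Build one hypothetical filter per stripe from that stripe's column values."""
    filters = []
    for stripe in stripes:
        bf = BloomFilter()
        for value in stripe:
            bf.add(value)
        filters.append(bf)
    return filters


def stripes_to_read(filters: List[BloomFilter], value: Hashable) -> List[int]:
    """Indices of stripes that may match the predicate `column = value`."""
    return [i for i, bf in enumerate(filters) if bf.might_contain(value)]


stripes = [[1, 2, 3], [10, 20, 30], [100, 200]]
filters = build_stripe_filters(stripes)
print(stripes_to_read(filters, 20))  # always includes stripe 1
```

A Bloom filter can report false positives but never false negatives, so skipping a stripe whose filter answers "absent" never drops matching rows. The filter only helps equality-style predicates; range predicates still rely on min/max statistics.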
File compression and decompression are performed automatically, and some details can be configured.
Parquet format configuration properties
The following properties configure how supported object storage connectors read and write Parquet files:
| Property Name | Description | Default |
|---|---|---|
|  | Adjusts timestamp values to a specific time zone. For Hive 3.1+, set this to UTC. | JVM default |
|  | Percentage of Parquet files to validate after write by re-reading the whole file. The equivalent catalog session property is |  |
|  | Maximum size of pages written by the Parquet writer. |  |
|  | Maximum number of values in pages written by the Parquet writer. |  |
|  | Maximum size of row groups written by the Parquet writer. |  |
|  | Maximum number of rows processed by the Parquet writer in a batch. |  |
|  | Whether bloom filters are used for predicate pushdown when reading Parquet files. Set this property to |  |
|  | Skip reading Parquet pages by using Parquet column indices. The equivalent catalog session property is |  |
|  | Ignore statistics from Parquet to allow querying files with corrupted or incorrect statistics. The equivalent catalog session property is |  |
|  | Sets the maximum number of rows read in a batch. The equivalent catalog session property is named |  |
|  | Data size below which a Parquet file is read entirely. The equivalent catalog session property is named |  |
|  | Enable using the Java Vector API (SIMD) for faster decoding of Parquet files. The equivalent catalog session property is |  |
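The time zone property in the first row adjusts stored timestamp values to a chosen zone. As a simplified illustration, the Python sketch below treats a stored value as a UTC instant and renders it as wall-clock time in a configured zone. The `adjust_timestamp` function and the fixed UTC-5 offset are hypothetical, and the sketch does not reproduce how Trino or Hive actually encode timestamps in Parquet.

```python
from datetime import datetime, timedelta, timezone


def adjust_timestamp(stored_utc: datetime, zone: timezone) -> datetime:
    """Render a naive timestamp stored as UTC as wall-clock time in `zone`."""
    aware = stored_utc.replace(tzinfo=timezone.utc)  # treat the stored value as UTC
    return aware.astimezone(zone).replace(tzinfo=None)  # shift, then drop the zone


stored = datetime(2024, 1, 1, 12, 0)
print(adjust_timestamp(stored, timezone.utc))                   # 2024-01-01 12:00:00
print(adjust_timestamp(stored, timezone(timedelta(hours=-5))))  # 2024-01-01 07:00:00
```

Under this model, choosing UTC leaves the value unchanged, while any other zone shifts every timestamp by that zone's offset, so a mismatch between writer and reader settings shows up as a constant offset in query results.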
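The page size, page value count, and row group size limits in the table bound how a writer cuts data into pages and row groups. The following Python sketch shows that idea for a single column, using the encoded size of each value. The functions `split_into_pages` and `group_into_row_groups` and all limit values are hypothetical; a real Parquet writer tracks sizes across all columns of a row group and applies compression and encoding first.

```python
from typing import List


def split_into_pages(value_sizes: List[int], max_page_bytes: int,
                     max_page_values: int) -> List[List[int]]:
    """Cut one column's encoded values into pages bounded by size and value count."""
    pages: List[List[int]] = []
    current: List[int] = []
    current_bytes = 0
    for size in value_sizes:
        # Start a new page when adding this value would break either limit.
        if current and (current_bytes + size > max_page_bytes
                        or len(current) >= max_page_values):
            pages.append(current)
            current, current_bytes = [], 0
        current.append(size)
        current_bytes += size
    if current:
        pages.append(current)
    return pages


def group_into_row_groups(pages: List[List[int]],
                          max_row_group_bytes: int) -> List[List[List[int]]]:
    """Pack consecutive pages into row groups bounded by total size."""
    groups: List[List[List[int]]] = []
    current: List[List[int]] = []
    current_bytes = 0
    for page in pages:
        page_bytes = sum(page)
        if current and current_bytes + page_bytes > max_row_group_bytes:
            groups.append(current)
            current, current_bytes = [], 0
        current.append(page)
        current_bytes += page_bytes
    if current:
        groups.append(current)
    return groups


pages = split_into_pages([10] * 10, max_page_bytes=35, max_page_values=100)
print([len(p) for p in pages])   # [3, 3, 3, 1]
groups = group_into_row_groups(pages, max_row_group_bytes=60)
print([len(g) for g in groups])  # [2, 2]
```

Whichever limit is reached first closes the page. Smaller pages give finer-grained skipping at the cost of more per-page overhead, and a single value larger than the page limit still gets a page of its own.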
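Column indices let a reader skip individual pages using per-page statistics. The Python sketch below keeps only the pages whose minimum/maximum range can overlap a range predicate. The `PageIndexEntry` class and `pages_to_read` function are hypothetical and model only min/max values for an integer column, not the full Parquet column index structure.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PageIndexEntry:
    """Per-page min/max statistics, similar in spirit to a Parquet column index."""
    min_value: int
    max_value: int


def pages_to_read(index: List[PageIndexEntry], low: int, high: int) -> List[int]:
    """Indices of pages that may hold values in [low, high]; the rest are skipped."""
    return [i for i, e in enumerate(index)
            if e.max_value >= low and e.min_value <= high]


index = [PageIndexEntry(0, 9), PageIndexEntry(10, 19), PageIndexEntry(20, 29)]
print(pages_to_read(index, 12, 15))  # [1]
print(pages_to_read(index, 5, 25))   # [0, 1, 2]
```

This pruning is only as correct as the statistics it reads: an incorrect min or max could cause a page that holds matching rows to be skipped, which is one reason a reader may offer an option to ignore statistics.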
File compression and decompression are performed automatically, and some details can be configured.