Object storage file formats
Object storage connectors support one or more file formats specified by the underlying data source.
ORC format configuration properties
The following properties are used to configure the read and write operations with ORC files performed by supported object storage connectors:
| Property Name | Description | Default |
|---|---|---|
| `hive.orc.time-zone` | Sets the default time zone for legacy ORC files that did not declare a time zone. | JVM default |
| `hive.orc.bloom-filters.enabled` | Enable bloom filters for predicate pushdown. | `false` |
| `hive.orc.read-legacy-short-zone-id` | Allow reads on ORC files with short zone ID in the stripe footer. | `false` |
File compression and decompression are performed automatically, and some details can be configured.
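As a minimal sketch, these ORC properties are set in a catalog properties file of a connector that uses them, such as the Hive connector. The catalog name, metastore URI, and chosen values below are illustrative assumptions, not recommendations:

```properties
# etc/catalog/example.properties -- hypothetical catalog, values for illustration only
connector.name=hive
hive.metastore.uri=thrift://metastore.example.net:9083

# Interpret legacy ORC files that declare no time zone as America/New_York
hive.orc.time-zone=America/New_York

# Enable ORC bloom filters for predicate pushdown
hive.orc.bloom-filters.enabled=true
```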
Parquet format configuration properties
The following properties are used to configure the read and write operations with Parquet files performed by supported object storage connectors:
| Property Name | Description | Default |
|---|---|---|
| `hive.parquet.time-zone` | Adjusts timestamp values to a specific time zone. For Hive 3.1+, set this to UTC. | JVM default |
| `parquet.writer.validation-percentage` | Percentage of Parquet files to validate after write by re-reading the whole file. The equivalent catalog session property is `parquet_optimized_writer_validation_percentage`. | `5` |
| `parquet.writer.page-size` | Maximum size of pages written by the Parquet writer. | `1MB` |
| `parquet.writer.page-value-count` | Maximum values count of pages written by the Parquet writer. | `60000` |
| `parquet.writer.block-size` | Maximum size of row groups written by the Parquet writer. | `128MB` |
| `parquet.writer.batch-size` | Maximum number of rows processed by the Parquet writer in a batch. | `10000` |
| `parquet.use-bloom-filter` | Whether bloom filters are used for predicate pushdown when reading Parquet files. Set this property to `false` to disable the usage of bloom filters by default. The equivalent catalog session property is `parquet_use_bloom_filter`. | `true` |
| `parquet.use-column-index` | Skip reading Parquet pages by using Parquet column indices. The equivalent catalog session property is `parquet_use_column_index`. | `true` |
| `parquet.ignore-statistics` | Ignore statistics from Parquet to allow querying files with corrupted or incorrect statistics. The equivalent catalog session property is `parquet_ignore_statistics`. | `false` |
| `parquet.max-read-block-row-count` | Sets the maximum number of rows read in a batch. The equivalent catalog session property is named `parquet_max_read_block_row_count`. | `8192` |
| `parquet.small-file-threshold` | Data size below which a Parquet file is read entirely. The equivalent catalog session property is named `parquet_small_file_threshold`. | `3MB` |
| `parquet.experimental.vectorized-decoding.enabled` | Enable using Java Vector API (SIMD) for faster decoding of Parquet files. The equivalent catalog session property is `parquet_vectorized_decoding_enabled`. | `true` |
File compression and decompression are performed automatically, and some details can be configured.
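Similarly, a minimal sketch of Parquet tuning in the same kind of catalog properties file; the catalog name, metastore URI, and values shown are illustrative assumptions rather than recommendations:

```properties
# etc/catalog/example.properties -- hypothetical catalog, values for illustration only
connector.name=hive
hive.metastore.uri=thrift://metastore.example.net:9083

# Write larger row groups and validate a larger sample of written files
parquet.writer.block-size=256MB
parquet.writer.validation-percentage=10

# Ignore Parquet statistics when files are known to carry corrupted metadata
parquet.ignore-statistics=true
```

Properties that list an equivalent catalog session property can also be overridden per session, for example with `SET SESSION example.parquet_ignore_statistics = true` for a catalog named `example`.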