Hudi connector#
The Hudi connector enables querying Hudi tables.
Requirements#
To use the Hudi connector, you need:
- Network access from the Trino coordinator and workers to the Hudi storage.
- Access to the Hive metastore service (HMS).
- Network access from the Trino coordinator to the HMS.
Configuration#
The connector requires a Hive metastore for table metadata and supports the same
metastore configuration properties as the Hive connector. At a minimum, hive.metastore.uri
must be configured.
The connector recognizes Hudi tables synced to the metastore by the
Hudi sync tool.
To create a catalog that uses the Hudi connector, create a catalog properties file, for example etc/catalog/example.properties, that references the hudi connector. Update the hive.metastore.uri property with the URI of your Hive metastore Thrift service:
connector.name=hudi
hive.metastore.uri=thrift://example.net:9083
Additionally, the following configuration properties can be set depending on the use case.
| Property name | Description | Default |
|---|---|---|
| hudi.metadata-enabled | Fetch the list of file names and sizes from metadata rather than storage. | false |
| hudi.columns-to-hide | List of column names that are hidden from the query output. It can be used to hide Hudi meta fields. By default, no fields are hidden. | |
| hudi.parquet.use-column-names | Access Parquet columns using names from the file. If disabled, then columns are accessed using the index. Only applicable to Parquet file format. | true |
| hudi.min-partition-batch-size | Minimum number of partitions returned in a single batch. | 10 |
| hudi.max-partition-batch-size | Maximum number of partitions returned in a single batch. | 100 |
| hudi.size-based-split-weights-enabled | Unlike uniform splitting, size-based splitting ensures that each batch of splits has enough data to process. By default, it is enabled to improve performance. | true |
| hudi.standard-split-weight-size | The split size corresponding to the standard weight (1.0) when size-based split weights are enabled. | 128MB |
| hudi.minimum-assigned-split-weight | Minimum weight that a split can be assigned when size-based split weights are enabled. | 0.05 |
| hudi.max-splits-per-second | Rate at which splits are queued for processing. The queue is throttled if this rate limit is breached. | Integer.MAX_VALUE |
| hudi.max-outstanding-splits | Maximum outstanding splits in a batch enqueued for processing. | 1000 |
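As an illustration, a catalog file could combine several of these options. The sketch below hides the standard Hudi meta fields and throttles split generation; the specific values are arbitrary examples, not recommendations:

connector.name=hudi
hive.metastore.uri=thrift://example.net:9083
# Hide the standard Hudi meta fields from query output
hudi.columns-to-hide=_hoodie_commit_time,_hoodie_commit_seqno,_hoodie_record_key,_hoodie_partition_path,_hoodie_file_name
# Throttle split generation (example values only)
hudi.max-splits-per-second=1000
hudi.max-outstanding-splits=500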
Supported file types#
The connector supports the Parquet file type.
SQL support#
The connector provides read access to data in Hudi tables that have been synced to the Hive metastore. The globally available and read operation statements are supported.
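For example, assuming a catalog named example, as in the properties file above, and a schema myschema containing the stock_ticks_cow table used in the examples below, the synced Hudi tables can be explored with standard metadata statements:

SHOW SCHEMAS FROM example;
SHOW TABLES FROM example.myschema;
DESCRIBE example.myschema.stock_ticks_cow;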
Supported query types#
Hudi supports two types of tables depending on how the data is indexed and laid out on the file system. The following table displays a support matrix of table types and query types for the connector.
| Table type | Supported query type |
|---|---|
| Copy on write | Snapshot queries |
| Merge on read | Read optimized queries |
Example queries#
In the queries below, stock_ticks_cow is a Hudi copy-on-write table that we refer to in the Hudi quickstart documentation.
Here are some sample queries:
USE "a-catalog".myschema;
SELECT symbol, max(ts)
FROM stock_ticks_cow
GROUP BY symbol
HAVING symbol = 'GOOG';
 symbol |        _col1        |
--------+---------------------+
 GOOG   | 2018-08-31 10:59:00 |
(1 row)
SELECT dt, symbol
FROM stock_ticks_cow
WHERE symbol = 'GOOG';
     dt     | symbol |
------------+--------+
 2018-08-31 | GOOG   |
(1 row)
SELECT dt, count(*)
FROM stock_ticks_cow
GROUP BY dt;
     dt     | _col1 |
------------+-------+
 2018-08-31 | 99    |
(1 row)
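Hudi also stores meta fields such as _hoodie_commit_time and _hoodie_record_key with every record. Unless they are hidden with the hudi.columns-to-hide catalog property, they can be queried like regular columns. A sketch against the same table:

SELECT "_hoodie_commit_time", "_hoodie_record_key", symbol
FROM stock_ticks_cow
LIMIT 5;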