Pinot коннектор#
Примечание
Ниже приведена оригинальная документация Trino. Скоро мы ее переведем на русский язык и дополним полезными примерами.
The Pinot connector allows Trino to query data stored in Apache Pinot™.
Requirements#
To connect to Pinot, you need:
Pinot 1.1.0 or higher.
Network access from the Trino coordinator and workers to the Pinot controller nodes. Port 8098 is the default port.
Configuration#
To configure the Pinot connector, create a catalog properties file
e.g. etc/catalog/example.properties
with at least the following contents:
connector.name=pinot
pinot.controller-urls=host1:8098,host2:8098
Replace host1:8098,host2:8098
with a comma-separated list of Pinot controller nodes.
This can be the ip or the FDQN, the url scheme (http://
) is optional.
Configuration properties#
General configuration properties#
Property name |
Required |
Description |
---|---|---|
|
Yes |
A comma separated list of controller hosts. If Pinot is deployed via Kubernetes this needs to point to the controller service endpoint. The Pinot broker and server must be accessible via DNS as Pinot returns hostnames and not IP addresses. |
|
No |
A host and port of broker. If broker URL exposed by Pinot controller API is not accessible, this property can be used to specify the broker endpoint. Enabling this property will disable broker discovery. |
|
No |
Pinot connection timeout, default is |
|
No |
Pinot metadata expiration time, default is |
|
No |
Pinot authentication method for controller requests. Allowed values are |
|
No |
Controller username for basic authentication method. |
|
No |
Controller password for basic authentication method. |
|
No |
Pinot authentication method for broker requests. Allowed values are |
|
No |
Broker username for basic authentication method. |
|
No |
Broker password for basic authentication method. |
|
No |
Fail query if Pinot server split returns more rows than configured, default to |
|
No |
Pinot query plan prefers to query Pinot broker, default is |
|
No |
Forbid parallel querying and force all querying to happen via the broker, default is |
|
No |
The number of segments processed in a split. Setting this higher reduces the number of requests made to Pinot. This is useful for smaller Pinot clusters, default is |
|
No |
Retry count for retriable Pinot data fetch calls, default is |
|
No |
Max limit for non aggregate queries to the Pinot broker, default is |
|
No |
Max rows for a broker query can return, default is |
|
No |
Push down aggregation queries, default is |
|
No |
Push down count distinct queries to Pinot, default is |
|
No |
Max allowed page size for segment query, default is |
|
No |
Use Pinot Proxy for controller and broker requests, default is |
If pinot.controller.authentication.type
is set to PASSWORD
then both pinot.controller.authentication.user
and
pinot.controller.authentication.password
are required.
If pinot.broker.authentication.type
is set to PASSWORD
then both pinot.broker.authentication.user
and
pinot.broker.authentication.password
are required.
If pinot.controller-urls
uses https
scheme then TLS is enabled for all connections including brokers.
gRPC configuration properties#
Property name |
Required |
Description |
---|---|---|
|
No |
Pinot gRPC port, default to |
|
No |
Max inbound message bytes when init gRPC client, default is |
|
No |
Use plain text for gRPC communication, default to |
|
No |
TLS keystore type for gRPC connection, default is |
|
No |
TLS keystore file location for gRPC connection, default is empty. |
|
No |
TLS keystore password, default is empty. |
|
No |
TLS truststore type for gRPC connection, default is |
|
No |
TLS truststore file location for gRPC connection, default is empty. |
|
No |
TLS truststore password, default is empty. |
|
No |
SSL provider, default is |
|
No |
Pinot Rest Proxy gRPC endpoint URI, default is null. |
For more Apache Pinot TLS configurations, please also refer to Configuring TLS/SSL.
You can use secrets to avoid actual values in the catalog properties files.
Querying Pinot tables#
The Pinot connector automatically exposes all tables in the default schema of the catalog. You can list all tables in the pinot catalog with the following query:
SHOW TABLES FROM example.default;
You can list columns in the flight_status table:
DESCRIBE example.default.flight_status;
SHOW COLUMNS FROM example.default.flight_status;
Queries written with SQL are fully supported and can include filters and limits:
SELECT foo
FROM pinot_table
WHERE bar = 3 AND baz IN ('ONE', 'TWO', 'THREE')
LIMIT 25000;
Dynamic tables#
To leverage Pinot’s fast aggregation, a Pinot query written in PQL can be used as the table name. Filters and limits in the outer query are pushed down to Pinot. Let’s look at an example query:
SELECT *
FROM example.default."SELECT MAX(col1), COUNT(col2) FROM pinot_table GROUP BY col3, col4"
WHERE col3 IN ('FOO', 'BAR') AND col4 > 50
LIMIT 30000
Filtering and limit processing is pushed down to Pinot.
The queries are routed to the broker and are more suitable to aggregate queries.
For SELECT
queries without aggregates it is more performant to issue a regular SQL query.
Processing is routed directly to the servers that store the data.
The above query is translated to the following Pinot PQL query:
SELECT MAX(col1), COUNT(col2)
FROM pinot_table
WHERE col3 IN('FOO', 'BAR') and col4 > 50
TOP 30000
Type mapping#
Because Trino and Pinot each support types that the other does not, this connector maps some types when reading data.
Pinot type to Trino type mapping#
The connector maps Pinot types to the corresponding Trino types according to the following table:
Pinot type |
Trino type |
---|---|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
No other types are supported.
Date Type#
For Pinot DateTimeFields
, if the FormatSpec
is in days,
then it is converted to a Trino DATE
type.
Pinot allows for LONG
fields to have a FormatSpec
of days as well, if the
value is larger than Integer.MAX_VALUE
then the conversion to Trino DATE
fails.
Null Handling#
If a Pinot TableSpec has nullHandlingEnabled
set to true, then for numeric
types the null value is encoded as MIN_VALUE
for that type.
For Pinot STRING
type, the value null
is interpreted as a NULL
value.
SQL support#
The connector provides globally available and read operation statements to access data and metadata in Pinot.
Pushdown#
The connector supports pushdown for a number of operations:
Aggregate pushdown for the following functions:
Aggregate function pushdown is enabled by default, but can be disabled with the
catalog property pinot.aggregation-pushdown.enabled
or the catalog session
property aggregation_pushdown_enabled
.
A count(distint)
pushdown may cause Pinot to run a full table scan with
significant performance impact. If you encounter this problem, you can disable
it with the catalog property pinot.count-distinct-pushdown.enabled
or the
catalog session property count_distinct_pushdown_enabled
.
Примечание
Коннектор осуществляет pushdown для улучшения производительности запросов. Коннектор не будет делать pushdown конкретной операции, если это может привести к некорректным результатам. Таким образом коннектор предпочитает корректность производительности. В некоторых случаях коннекторы могут предоставлять дополнительные параметры конфигурации, которые разрешают pushdown небезопасных операций, но только при явном указании соответствующего параметра пользователем.