Optimizer properties#

Note

Below is the original Trino documentation. It will be translated into Russian and supplemented with useful examples soon.

cedrusdata.optimizer.join-pruning-enabled#

  • Type: boolean

  • Default value: true

  • Session property: cedrusdata_join_pruning_enabled

Enables automatic removal of unused Join operators from a query. The optimizer replaces an unused Join with one of its sides, to which additional transformations (a projection or a filter) may be applied to keep the query results equivalent. The replacement is possible when all of the following conditions are met:

  • The query does not use attributes of the removed side of the Join, or those attributes can be replaced with attributes of the remaining side.

  • The removed side has no additional predicates, or an equivalent predicate can be constructed over the remaining side.

  • The removed side is not opposite to the null-producing side of the Join. This means that in the query a LEFT OUTER JOIN b only side b can be removed, in the query a RIGHT OUTER JOIN b only side a can be removed, and in the query a FULL OUTER JOIN b neither side can be removed.

  • The query is guaranteed not to increase the number of rows of the remaining side. To ensure this condition, define a PRIMARY KEY or UNIQUE constraint on the removed side and a FOREIGN KEY constraint on the opposite side.

For this optimization to work, the constraints applied to the tables must be declared with the cedrusdata.optimizer.constraint.file configuration property or the cedrusdata_constraints session property. See cedrusdata-constraints.

To simplify queries as much as possible, define as many PRIMARY KEY, UNIQUE, and FOREIGN KEY constraints as you can, as well as NOT NULL constraints on the columns used on the left side of FOREIGN KEY constraints.

For example, the following query can be automatically simplified by the optimizer:

SET SESSION cedrusdata_constraints = ARRAY[
  'PRIMARY KEY example.tpch.part(partkey)',
  'PRIMARY KEY example.tpch.supplier(suppkey)', 
  'FOREIGN KEY example.tpch.partsupp(partkey) REFERENCES example.tpch.part(partkey)',
  'FOREIGN KEY example.tpch.partsupp(suppkey) REFERENCES example.tpch.supplier(suppkey)'
];

SELECT p_name, COUNT(*)  
FROM example.tpch.partsupp  
  INNER JOIN example.tpch.part ON ps_partkey = p_partkey 
  INNER JOIN example.tpch.supplier ON ps_suppkey = s_suppkey
WHERE p_retailprice > 100     
GROUP BY p_name

The simplified query after optimization:

SELECT p_name, COUNT(*)  
FROM example.tpch.partsupp  
  INNER JOIN example.tpch.part ON ps_partkey = p_partkey 
WHERE p_retailprice > 100 AND ps_suppkey IS NOT NULL      
GROUP BY p_name

cedrusdata.optimizer.constraint.file#

  • Type: string

  • Default value: null

Specifies the path to a file that contains the definitions of constraints applied to tables. For example:

cedrusdata.optimizer.constraint.file=/path/to/file

Each line of the file defines a single constraint in one of the following formats:

PRIMARY KEY table_name(column_list)
UNIQUE table_name(column_list)
FOREIGN KEY table_name(column_list) REFERENCES table_name(column_list)
table_name(column_list) NOT NULL

The table name must be fully qualified in the catalog.schema.table format. Column names must be separated by commas. For example:

PRIMARY KEY example.tpch.partsupp(partkey, suppkey)
FOREIGN KEY example.tpch.lineitem(partkey, suppkey) REFERENCES example.tpch.partsupp(partkey, suppkey)
example.tpch.nation(regionkey) NOT NULL

The contents of the file can be updated without restarting the node by setting the cedrusdata.optimizer.constraint.file.refresh-period configuration property.

Constraints can also be overridden with the cedrusdata_constraints session property. For example:

SET SESSION cedrusdata_constraints = ARRAY[
  'PRIMARY KEY example.tpch.part(partkey)',
  'PRIMARY KEY example.tpch.supplier(suppkey)', 
  'FOREIGN KEY example.tpch.partsupp(partkey) REFERENCES example.tpch.part(partkey)',
  'FOREIGN KEY example.tpch.partsupp(suppkey) REFERENCES example.tpch.supplier(suppkey)'
];

cedrusdata.optimizer.constraint.file.refresh-period#

  • Type: duration

  • Default value: null (re-reading of the file is disabled)

Specifies how often to re-read the constraint file defined by the cedrusdata.optimizer.constraint.file configuration property. For example:

cedrusdata.optimizer.constraint.file=/path/to/file
cedrusdata.optimizer.constraint.file.refresh-period=1m

cedrusdata.optimizer.cascades-enabled#

  • Type: boolean

  • Default value: false

  • Session property: cedrusdata_cascades_enabled

Enables cost-based optimization of Exchange operators based on the Cascades algorithm. In some cases this mode reduces the number of stages required to execute a query, thereby improving performance. The mode can also be enabled with the cedrusdata_cascades_enabled session property. For details, see Cascades optimization mode.
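For example, to enable the mode for the current session:

SET SESSION cedrusdata_cascades_enabled = true;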

optimizer.dictionary-aggregation#

  • Type: boolean

  • Default value: false

  • Session property: dictionary_aggregation

Enables optimization for aggregations on dictionaries.

optimizer.optimize-hash-generation#

  • Type: boolean

  • Default value: false

  • Session property: optimize_hash_generation

Compute hash codes for distribution, joins, and aggregations early during execution, allowing the results to be shared between operations later in the query. This can reduce CPU usage by avoiding computing the same hash multiple times, but at the cost of additional network transfer for the hashes. In most cases it decreases overall query processing time.

It is often helpful to disable this property when using EXPLAIN, in order to make the query plan easier to read.
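For example, to get a more readable plan for the current session (reusing the example.tpch.part table from the examples above):

SET SESSION optimize_hash_generation = false;
EXPLAIN SELECT p_name, COUNT(*) FROM example.tpch.part GROUP BY p_name;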

optimizer.optimize-metadata-queries#

  • Type: boolean

  • Default value: false

Enable optimization of some aggregations by using values that are stored as metadata. This allows Trino to execute some simple queries in constant time. Currently, this optimization applies to max, min and approx_distinct of partition keys and other aggregations insensitive to the cardinality of the input, including DISTINCT aggregates. Using this may speed up some queries significantly.

The main drawback is that it can produce incorrect results if the connector returns partition keys for partitions that have no rows. In particular, the Hive connector can return empty partitions if they were created by other systems. Trino cannot create them.
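As an illustration, assuming a Hive table partitioned by a ds column (the catalog, table, and column names here are hypothetical), a query such as the following can be answered from partition metadata alone:

SELECT min(ds), max(ds), approx_distinct(ds)
FROM hive.web.page_views;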

optimizer.mark-distinct-strategy#

  • Type: string

  • Allowed values: AUTOMATIC, ALWAYS, NONE

  • Default value: AUTOMATIC

  • Session property: mark_distinct_strategy

The MarkDistinct strategy to use for distinct aggregations. NONE does not use the MarkDistinct operator. ALWAYS uses MarkDistinct for multiple distinct aggregations or for a mix of distinct and non-distinct aggregations. AUTOMATIC limits the use of MarkDistinct to cases with limited concurrency (global or small-cardinality aggregations), where a direct distinct aggregation implementation cannot utilize the CPU efficiently.
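For example, the following query mixes distinct and non-distinct aggregations; the session property overrides the strategy for the current session (the table is taken from the examples above):

SET SESSION mark_distinct_strategy = 'ALWAYS';

SELECT COUNT(DISTINCT ps_partkey), COUNT(DISTINCT ps_suppkey), COUNT(*)
FROM example.tpch.partsupp;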

optimizer.push-aggregation-through-outer-join#

  • Type: boolean

  • Default value: true

  • Session property: push_aggregation_through_outer_join

When an aggregation is above an outer join and all columns from the outer side of the join are in the grouping clause, the aggregation is pushed below the outer join. This optimization is particularly useful for correlated scalar subqueries, which get rewritten to an aggregation over an outer join. For example:

SELECT * FROM item i
    WHERE i.i_current_price > (
        SELECT AVG(j.i_current_price) FROM item j
            WHERE i.i_category = j.i_category);

Enabling this optimization can substantially speed up queries by reducing the amount of data that needs to be processed by the join. However, it may slow down some queries that have very selective joins.

optimizer.push-table-write-through-union#

  • Type: boolean

  • Default value: true

  • Session property: push_table_write_through_union

Parallelize writes when using UNION ALL in queries that write data. This improves the speed of writing output tables in UNION ALL queries, because these writes do not require additional synchronization when collecting results. Enabling this optimization can improve UNION ALL speed, when write speed is not yet saturated. However, it may slow down queries in an already heavily loaded system.
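For example, a write query of the following shape can benefit from parallel writes (the target table name is hypothetical; the source tables are taken from the examples above):

CREATE TABLE example.tpch.part_keys AS
SELECT ps_partkey AS partkey FROM example.tpch.partsupp
UNION ALL
SELECT p_partkey AS partkey FROM example.tpch.part;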

optimizer.join-reordering-strategy#

  • Type: string

  • Allowed values: AUTOMATIC, ELIMINATE_CROSS_JOINS, NONE

  • Default value: AUTOMATIC

  • Session property: join_reordering_strategy

The join reordering strategy to use. NONE maintains the order the tables are listed in the query. ELIMINATE_CROSS_JOINS reorders joins to eliminate cross joins, where possible, and otherwise maintains the original query order. When reordering joins, it also strives to maintain the original table order as much as possible. AUTOMATIC enumerates possible orders, and uses statistics-based cost estimation to determine the least cost order. If stats are not available, or if for any reason a cost could not be computed, the ELIMINATE_CROSS_JOINS strategy is used.
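For example, to keep the join order exactly as written in the query for the current session:

SET SESSION join_reordering_strategy = 'NONE';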

optimizer.max-reordered-joins#

When optimizer.join-reordering-strategy is set to the cost-based AUTOMATIC strategy, this property determines the maximum number of joins that can be reordered at once.

Warning

The number of possible join orders scales factorially with the number of relations, so increasing this value can cause serious performance issues.

optimizer.optimize-duplicate-insensitive-joins#

  • Type: boolean

  • Default value: true

Reduces the number of rows produced by joins when the optimizer detects that duplicated join output rows can be skipped.

optimizer.use-exact-partitioning#

  • Type: boolean

  • Default value: false

Re-partition data unless the partitioning of the upstream stage exactly matches what the downstream stage expects. This can also be specified using the use_exact_partitioning session property.
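For example, to enable exact partitioning for the current session only:

SET SESSION use_exact_partitioning = true;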

optimizer.use-table-scan-node-partitioning#

  • Type: boolean

  • Default value: true

Use connector provided table node partitioning when reading tables. For example, table node partitioning corresponds to Hive table buckets. When set to true and minimal partition to task ratio is matched or exceeded, each table partition is read by a separate worker. The minimal ratio is defined in optimizer.table-scan-node-partitioning-min-bucket-to-task-ratio.

Partition reader assignments are distributed across workers for parallel processing. Use of table scan node partitioning can improve query performance by reducing query complexity. For example, cluster wide data reshuffling might not be needed when processing an aggregation query. However, query parallelism might be reduced when partition count is low compared to number of workers.

optimizer.table-scan-node-partitioning-min-bucket-to-task-ratio#

  • Type: double

  • Default value: 0.5

Specifies the minimal bucket-to-task ratio that has to be matched or exceeded in order to use table scan node partitioning. When the table bucket count is small compared to the number of workers, the table scan is instead distributed across all workers for improved parallelism.

optimizer.colocated-joins-enabled#

  • Type: boolean

  • Default value: true

  • Session property: colocated_join

Use co-located joins when both sides of a join have the same table partitioning on the join keys and the conditions for optimizer.use-table-scan-node-partitioning are met. For example, a join on bucketed Hive tables with matching bucketing schemes can avoid exchanging data between workers using a co-located join to improve query performance.
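As a sketch, assuming the Hive connector and its bucketed_by and bucket_count table properties (the catalog, schema, and table names are hypothetical), two tables bucketed the same way on the join key could be created as follows:

CREATE TABLE hive.sales.orders_bucketed (
    orderkey BIGINT,
    custkey  BIGINT,
    price    DOUBLE
)
WITH (bucketed_by = ARRAY['custkey'], bucket_count = 32);

CREATE TABLE hive.sales.customer_bucketed (
    custkey BIGINT,
    name    VARCHAR
)
WITH (bucketed_by = ARRAY['custkey'], bucket_count = 32);

A join of these two tables on custkey can then be executed as a co-located join, provided the conditions for optimizer.use-table-scan-node-partitioning are met.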

optimizer.filter-conjunction-independence-factor#

  • Type: double

  • Default value: 0.75

  • Min allowed value: 0

  • Max allowed value: 1

Scales the strength of independence assumption for estimating the selectivity of the conjunction of multiple predicates. Lower values for this property will produce more conservative estimates by assuming a greater degree of correlation between the columns of the predicates in a conjunction. A value of 0 results in the optimizer assuming that the columns of the predicates are fully correlated and only the most selective predicate drives the selectivity of a conjunction of predicates.
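As a rough numeric illustration (the exact interpolation formula is an implementation detail and is not specified here), consider two ANDed predicates with estimated selectivities 0.1 and 0.2:

s_{f=1} = 0.1 \cdot 0.2 = 0.02 \quad \text{(full independence: plain product of selectivities)}
s_{f=0} = 0.1 \quad \text{(full correlation: only the most selective predicate counts)}
0.02 \le s_{f=0.75} \le 0.1 \quad \text{(the default produces an estimate between the two extremes)}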

optimizer.join-multi-clause-independence-factor#

  • Type: double

  • Default value: 0.25

  • Min allowed value: 0

  • Max allowed value: 1

Scales the strength of independence assumption for estimating the output of a multi-clause join. Lower values for this property will produce more conservative estimates by assuming a greater degree of correlation between the columns of the clauses in a join. A value of 0 results in the optimizer assuming that the columns of the join clauses are fully correlated and only the most selective clause drives the selectivity of the join.

optimizer.non-estimatable-predicate-approximation.enabled#

  • Type: boolean

  • Default value: true

Enables approximation of the output row count of filters whose costs cannot be accurately estimated even with complete statistics. This allows the optimizer to produce more efficient plans in the presence of filters which were previously not estimated.

optimizer.join-partitioned-build-min-row-count#

  • Type: integer

  • Default value: 1000000

  • Min allowed value: 0

The minimum number of join build side rows required to use partitioned join lookup. If the build side of a join is estimated to be smaller than the configured threshold, single threaded join lookup is used to improve join performance. A value of 0 disables this optimization.

optimizer.min-input-size-per-task#

  • Type: data size

  • Default value: 5GB

  • Min allowed value: 0MB

  • Session property: min_input_size_per_task

The minimum input size required per task. This helps the optimizer determine the hash partition count for joins and aggregations. Limiting the hash partition count for small queries increases concurrency on large clusters where multiple small queries run concurrently. The estimated value is always between the values of the min_hash_partition_count and max_hash_partition_count session properties. A value of 0MB disables this optimization.
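For example, to lower the threshold for the current session (the value shown is arbitrary):

SET SESSION min_input_size_per_task = '2GB';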

optimizer.min-input-rows-per-task#

  • Type: integer

  • Default value: 10000000

  • Min allowed value: 0

  • Session property: min_input_rows_per_task

The minimum number of input rows required per task. This helps the optimizer determine the hash partition count for joins and aggregations. Limiting the hash partition count for small queries increases concurrency on large clusters where multiple small queries run concurrently. The estimated value is always between the values of the min_hash_partition_count and max_hash_partition_count session properties. A value of 0 disables this optimization.

optimizer.use-cost-based-partitioning#

  • Type: boolean

  • Default value: true

  • Session property: use_cost_based_partitioning

When enabled, the cost-based optimizer is used to determine whether repartitioning the output of an already partitioned stage is necessary.