Monitoring with JMX

Trino exposes a large number of metrics via the Java Management Extensions (JMX).

To enable JMX, set the ports used by the RMI registry and the RMI server in the config.properties file:

jmx.rmiregistry.port=9080
jmx.rmiserver.port=9081

  • jmx.rmiregistry.port: Specifies the port for the JMX RMI registry. JMX clients should connect to this port.

  • jmx.rmiserver.port: Specifies the port for the JMX RMI server. Trino exports many metrics that are useful for monitoring via JMX.

Additionally, configure a Java system property in the jvm.config file, set to the RMI server port:

-Dcom.sun.management.jmxremote.rmi.port=9081

JConsole (supplied with the JDK), VisualVM, and many other client tools can be used to access the metrics. Many monitoring solutions support JMX. You can also use the JMX connector and query the metrics using SQL.
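As a rough illustration of what such a client does with the registry port, the Java sketch below builds a JMX-over-RMI service URL and reads a single attribute. The class name TrinoJmxClient, its helper methods, and the localhost/9080 values are assumptions for illustration. The sketch also assumes that the registry publishes the connector under the conventional /jmxrmi name and that no JMX authentication or TLS is configured.

```java
import java.io.IOException;
import java.net.MalformedURLException;
import javax.management.JMException;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

final class TrinoJmxClient {
    // Builds the conventional JMX-over-RMI URL; the port is the value of jmx.rmiregistry.port.
    static JMXServiceURL serviceUrl(String host, int registryPort) {
        try {
            return new JMXServiceURL(
                    "service:jmx:rmi:///jndi/rmi://" + host + ":" + registryPort + "/jmxrmi");
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException("Invalid JMX host or port", e);
        }
    }

    // Opens a connection, reads one attribute, and closes the connection again.
    static Object readAttribute(JMXServiceURL url, String objectName, String attribute)
            throws IOException, JMException {
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection connection = connector.getMBeanServerConnection();
            return connection.getAttribute(new ObjectName(objectName), attribute);
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            // Without arguments, only show the URL for the hypothetical default host and port.
            System.out.println("Usage: java TrinoJmxClient <host> <jmx.rmiregistry.port>");
            System.out.println("Example URL: " + serviceUrl("localhost", 9080));
            return;
        }
        JMXServiceURL url = serviceUrl(args[0], Integer.parseInt(args[1]));
        System.out.println("Thread count: "
                + readAttribute(url, "java.lang:type=Threading", "ThreadCount"));
    }
}
```

The same URL can typically be pasted into the remote-connection dialog of JConsole or VisualVM.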

Many of these JMX metrics are complex metric objects, such as a CounterStat, that group a collection of related metrics. For example, InputPositions has InputPositions.TotalCount, InputPositions.OneMinute.Count, and so on.
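To make the naming pattern concrete, the following Java sketch groups flattened attribute names by the text before the first dot. It assumes that the sub-metrics of a complex metric appear as dotted attribute names, as in the InputPositions example. The class StatGrouping and the sample list are hypothetical; a real client could obtain the names from MBeanServerConnection.getMBeanInfo(...).getAttributes() instead.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class StatGrouping {
    // Groups attribute names by the part before the first dot.
    // Plain attributes (no dot) map to an empty list of sub-metrics.
    static Map<String, List<String>> groupByStat(Collection<String> attributeNames) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String name : attributeNames) {
            int dot = name.indexOf('.');
            String stat = dot < 0 ? name : name.substring(0, dot);
            List<String> subMetrics = groups.computeIfAbsent(stat, key -> new ArrayList<>());
            if (dot >= 0) {
                subMetrics.add(name.substring(dot + 1));
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        // Hand-written sample, not a real MBean listing.
        List<String> names = List.of(
                "InputPositions.TotalCount",
                "InputPositions.OneMinute.Count",
                "RunningQueries");
        // Prints {InputPositions=[TotalCount, OneMinute.Count], RunningQueries=[]}
        System.out.println(groupByStat(names));
    }
}
```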

A small subset of the available metrics is described below.

JVM

  • Heap size: java.lang:type=Memory:HeapMemoryUsage.used

  • Thread count: java.lang:type=Threading:ThreadCount
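The two JVM metrics above can be read as sketched below. In this notation, the part before the last colon is the MBean object name and the part after it is the attribute; HeapMemoryUsage is a composite value whose used field holds the byte count. As an assumption for illustration, the sketch reads from the MBean server of the JVM it runs in rather than from a remote Trino process, and the class name JvmMetrics is hypothetical.

```java
import java.lang.management.ManagementFactory;
import javax.management.JMException;
import javax.management.MBeanServer;
import javax.management.ObjectName;
import javax.management.openmbean.CompositeData;

final class JvmMetrics {
    // Stand-in for a remote connection: the MBean server of the current JVM.
    private static final MBeanServer SERVER = ManagementFactory.getPlatformMBeanServer();

    // java.lang:type=Memory:HeapMemoryUsage.used
    static long heapUsedBytes() {
        try {
            CompositeData usage = (CompositeData) SERVER.getAttribute(
                    new ObjectName("java.lang:type=Memory"), "HeapMemoryUsage");
            return (Long) usage.get("used");
        } catch (JMException e) {
            throw new IllegalStateException("Cannot read heap usage", e);
        }
    }

    // java.lang:type=Threading:ThreadCount
    static int threadCount() {
        try {
            return (Integer) SERVER.getAttribute(
                    new ObjectName("java.lang:type=Threading"), "ThreadCount");
        } catch (JMException e) {
            throw new IllegalStateException("Cannot read thread count", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("Heap used (bytes): " + heapUsedBytes());
        System.out.println("Thread count: " + threadCount());
    }
}
```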

Trino cluster and nodes

  • Active nodes: trino.failuredetector:name=HeartbeatFailureDetector:ActiveCount

  • Free memory (general pool): trino.memory:type=ClusterMemoryPool:name=general:FreeDistributedBytes

  • Cumulative count (since Trino started) of queries that ran out of memory and were killed: trino.memory:name=ClusterMemoryManager:QueriesKilledDueToOutOfMemory

Trino queries

  • Active queries currently executing or queued: trino.execution:name=QueryManager:RunningQueries

  • Queries started in the last 5 min: trino.execution:name=QueryManager:StartedQueries.FiveMinute.Count

  • Failed queries from last 5 min (all): trino.execution:name=QueryManager:FailedQueries.FiveMinute.Count

  • Failed queries from last 5 min (internal): trino.execution:name=QueryManager:InternalFailures.FiveMinute.Count

  • Failed queries from last 5 min (external): trino.execution:name=QueryManager:ExternalFailures.FiveMinute.Count

  • Failed queries from last 5 min (user): trino.execution:name=QueryManager:UserErrorFailures.FiveMinute.Count

  • Execution latency (P50): trino.execution:name=QueryManager:ExecutionTime.FiveMinutes.P50

  • Input data rate (P90): trino.execution:name=QueryManager:WallInputBytesRate.FiveMinutes.P90
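Because all of the query metrics above live on the same MBean, a client can fetch several of them in one call. The Java sketch below shows one possible way to do that with MBeanServerConnection.getAttributes. The class BatchAttributeReader and its constants are hypothetical names, and the main method runs against the local JVM's Threading MBean as a stand-in, since no Trino coordinator is assumed to be available.

```java
import java.io.IOException;
import java.lang.management.ManagementFactory;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.management.Attribute;
import javax.management.JMException;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;

final class BatchAttributeReader {
    // Object name and attribute names taken from the list above.
    static final String QUERY_MANAGER = "trino.execution:name=QueryManager";
    static final String[] QUERY_ATTRIBUTES = {
        "RunningQueries",
        "StartedQueries.FiveMinute.Count",
        "FailedQueries.FiveMinute.Count",
        "ExecutionTime.FiveMinutes.P50",
    };

    // Reads several attributes of one MBean in a single round trip.
    // Attributes that cannot be read are omitted from the result rather than reported.
    static Map<String, Object> readAll(
            MBeanServerConnection connection, String objectName, String... attributes) {
        try {
            Map<String, Object> values = new LinkedHashMap<>();
            for (Attribute attribute
                    : connection.getAttributes(new ObjectName(objectName), attributes).asList()) {
                values.put(attribute.getName(), attribute.getValue());
            }
            return values;
        } catch (JMException | IOException e) {
            throw new IllegalStateException("Cannot read attributes of " + objectName, e);
        }
    }

    public static void main(String[] args) {
        // Local stand-in; with a remote connection to Trino, pass QUERY_MANAGER and QUERY_ATTRIBUTES.
        MBeanServerConnection local = ManagementFactory.getPlatformMBeanServer();
        System.out.println(readAll(local, "java.lang:type=Threading", "ThreadCount", "PeakThreadCount"));
    }
}
```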

Trino tasks

  • Input data bytes in the last 5 min: trino.execution:name=SqlTaskManager:InputDataSize.FiveMinute.Count

  • Input rows in the last 5 min: trino.execution:name=SqlTaskManager:InputPositions.FiveMinute.Count

Connectors

Many connectors provide their own metrics. The metric names typically start with trino.plugin.
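One way to discover such metrics is to search the MBean server for object names whose domain starts with a given prefix. The Java sketch below does this with an ObjectName pattern. The class DomainPrefixSearch is a hypothetical name, and the main method searches the local JVM for the java.lang domain as a stand-in; against a Trino process, the prefix would be trino.plugin.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;
import java.util.Set;
import javax.management.MBeanServerConnection;
import javax.management.MalformedObjectNameException;
import javax.management.ObjectName;

final class DomainPrefixSearch {
    // Returns the names of all MBeans whose domain starts with the given prefix.
    static Set<ObjectName> findByDomainPrefix(MBeanServerConnection connection, String domainPrefix) {
        try {
            // "*" in the domain matches any characters; ":*" matches any key properties.
            ObjectName pattern = new ObjectName(domainPrefix + "*:*");
            return connection.queryNames(pattern, null);
        } catch (MalformedObjectNameException e) {
            throw new IllegalArgumentException("Invalid domain prefix: " + domainPrefix, e);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Local stand-in; with a remote connection to Trino, use the prefix "trino.plugin".
        MBeanServerConnection local = ManagementFactory.getPlatformMBeanServer();
        for (ObjectName name : findByDomainPrefix(local, "java.lang")) {
            System.out.println(name);
        }
    }
}
```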