db

db_yugabyte_http

YugabyteDB by HTTP

Overview

This template is designed for the deployment of YugabyteDB monitoring by Zabbix via HTTP and doesn't require any external scripts.

Requirements

Zabbix version: 6.0 and higher.

Tested versions

This template has been tested on:

YugabyteDB, version 2.19.2.0

Configuration

Setup

Set your account ID as a value of the {$YUGABYTEDB.ACCOUNT.ID} macro. The account ID is the unique identifier for your customer account in YugabyteDB Managed. You can access the account ID from your profile in the YugabyteDB Managed user interface. To get your account ID, log in to YugabyteDB Managed and click the user profile icon. See YugabyteDB documentation for instructions.
Set your project ID as a value of the {$YUGABYTEDB.PROJECT.ID} macro. The project ID is the unique identifier for a YugabyteDB Managed project. You can access the project ID from your profile in the YugabyteDB Managed user interface (along with the account ID). See YugabyteDB documentation for instructions.
Generate the API access token and specify it as a value of the {$YUGABYTEDB.ACCESS.TOKEN} macro. See YugabyteDB documentation for instructions.

NOTE If needed, you can specify a HTTP proxy for the template to use by changing the value of the {$YUGABYTEDB.PROXY} user macro.

IMPORTANT

The value of the {$YUGABYTEDB.ACCESS.TOKEN} macro is stored as plain (not secret) text by default.

Macros used

Name	Description	Default
{$YUGABYTEDB.ACCOUNT.ID}	YugabyteDB account ID.	`<Put your account ID here>`
{$YUGABYTEDB.PROJECT.ID}	YugabyteDB project ID.	`<Put your project ID here>`
{$YUGABYTEDB.ACCESS.TOKEN}	Access token for the YugabyteDB API.	`<Put your access token here>`
{$YUGABYTEDB.PROXY}	Sets the HTTP proxy value. If this macro is empty, then no proxy is used.

Items

Name Description Type Key and additional info

YugabyteDB: Get cluster

Name	Description	Type	Key and additional info
YugabyteDB: Get cluster	Get raw data about clusters.	Script	yugabytedb.clusters.get
YugabyteDB: Get clusters item error	Item for gathering all the cluster item errors.	Dependent item	yugabytedb.clusters.get.errors Preprocessing JSON Path: `$.error` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `12h`

Get raw data about clusters.

Script

yugabytedb.clusters.get

YugabyteDB: Get clusters item error

Item for gathering all the cluster item errors.

Dependent item

yugabytedb.clusters.get.errors

Preprocessing

JSON Path: $.error
⛔️Custom on fail: Discard value
Discard unchanged with heartbeat: 12h

Triggers

Name	Description	Expression	Severity	Dependencies and additional info
YugabyteDB: Failed to fetch data	Failed to fetch data about cluster.	`length(last(/YugabyteDB by HTTP/yugabytedb.clusters.get.errors)) > 0`\|Warning

LLD rule Cluster discovery

Name Description Type Key and additional info

Cluster discovery

Name	Description	Type	Key and additional info
Cluster discovery	Discovery of the available clusters.	Dependent item	yugabytedb.cluster.discovery Preprocessing JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `1d`

Discovery of the available clusters.

Dependent item

yugabytedb.cluster.discovery

Preprocessing

JavaScript: The text is too long. Please see the template.
Discard unchanged with heartbeat: 1d

YugabyteDB Cluster by HTTP

Macros used

Name	Description	Default
{$YUGABYTEDB.CLUSTER.NAME}	Name of cluster.	`<Put your cluster name here>`
{$YUGABYTEDB.CLUSTER.ID}	ID of cluster.	`<Put your cluster ID here>`
{$YUGABYTEDB.MEMORY.CLUSTER.UTILIZATION.WARN}	The percentage of memory use on the cluster - for the Warning trigger expression.	`70`
{$YUGABYTEDB.MEMORY.CLUSTER.UTILIZATION.CRIT}	The percentage of memory use on the cluster - for the High trigger expression.	`90`
{$YUGABYTEDB.DISK.UTILIZATION.WARN}	The percentage of disk use in the cluster - for the Warning trigger expression.	`75`
{$YUGABYTEDB.DISK.UTILIZATION.CRIT}	The percentage of disk use in the cluster - for the High trigger expression.	`90`
{$YUGABYTEDB.CONNECTION.UTILIZATION.WARN}	The percentage of connections in the cluster - for the Warning trigger expression.	`75`
{$YUGABYTEDB.CONNECTION.UTILIZATION.CRIT}	The percentage of connections in the cluster - for the High trigger expression.	`90`
{$YUGABYTEDB.CPU.UTILIZATION.CRIT}	The threshold of CPU utilization for the High trigger expression, expressed in percent.	`90`
{$YUGABYTEDB.CPU.UTILIZATION.WARN}	The threshold of CPU utilization for the Warning trigger expression, expressed in percent.	`75`
{$YUGABYTEDB.IOPS.UTILIZATION.WARN}	The percentage of IOPS use on the node - for the Warning trigger expression.	`75`
{$YUGABYTEDB.IOPS.UTILIZATION.CRIT}	The percentage of IOPS use on the node - for the High trigger expression.	`90`
{$YUGABYTEDB.PROXY}	Sets the HTTP proxy value. If this macro is empty, then no proxy is used.

Items

Name	Description	Type	Key and additional info
YugabyteDB Cluster: Get cluster	Get raw data about clusters.	Script	yugabytedb.cluster.get
YugabyteDB Cluster: Get cluster item error	Item for gathering all the cluster item errors.	Dependent item	yugabytedb.cluster.get.errors Preprocessing JSON Path: `$.error` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `12h`
YugabyteDB Cluster: Get keyspace	Get raw data about keyspaces.	Script	yugabytedb.keyspace.get
YugabyteDB Cluster: Get keyspace item error	Item for gathering all the keyspace item errors.	Dependent item	yugabytedb.keyspace.get.errors Preprocessing JSON Path: `$.error` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `12h`
YugabyteDB Cluster: Get node	Get raw data about nodes.	Script	yugabytedb.node.get
YugabyteDB Cluster: Get node item error	Item for gathering all the node item errors.	Dependent item	yugabytedb.node.get.errors Preprocessing JSON Path: `$.error` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `12h`
YugabyteDB Cluster: Get cluster metrics	Getting metrics for the cluster.	Script	yugabytedb.cluster.metric.get
YugabyteDB Cluster: Get cluster metrics item error	Item for gathering all the cluster item errors.	Dependent item	yugabytedb.cluster.metric.get.errors Preprocessing JSON Path: `$.error` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `12h`
YugabyteDB Cluster: Get cluster query statistic	Getting SQL statistics for the cluster.	Script	yugabytedb.cluster.query.statistic.get
YugabyteDB Cluster: Get cluster query statistic item error	Item for gathering all the cluster query statistics item errors.	Dependent item	yugabytedb.cluster.query.statistic.get.errors Preprocessing JSON Path: `$.error` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `12h`
YugabyteDB Cluster: State	The current state of the cluster. One of the following: - INVALID - QUEUED - INIT - BOOTSTRAPPING - VPCPEERING - NETWORKCREATING - PROVISIONING - CONFIGURING - CREATINGLB - UPDATINGLB - ACTIVE - PAUSING - PAUSED - RESUMING - UPDATING - MAINTENANCE - RESTORE - FAILED - CREATEFAILED - DELETING - STARTINGNODE - STOPPINGNODE - REBOOTINGNODE - CREATEREADREPLICAFAILED - DELETEREADREPLICAFAILED - DELETECLUSTERFAILED - EDITCLUSTERFAILED - EDITREADREPLICAFAILED - PAUSECLUSTERFAILED - RESUMECLUSTERFAILED - RESTOREBACKUPFAILED - CERTIFICATEROTATIONFAILED - UPGRADECLUSTERFAILED - UPGRADECLUSTERGFLAGSFAILED - UPGRADECLUSTEROSFAILED - UPGRADECLUSTERSOFTWAREFAILED - STARTNODEFAILED - STOPNODEFAILED - REBOOTNODEFAILED - CONFIGURECMK - ENABLINGCMK - DISABLINGCMK - UPDATINGCMK - ROTATINGCMK - STOPPINGMETRICSEXPORTER - STARTINGMETRICSEXPORTER - CONFIGURINGMETRICSEXPORTER - STOPMETRICSEXPORTERFAILED - STARTMETRICSEXPORTERFAILED - CONFIGUREMETRICSEXPORTERFAILED - REMOVINGMETRICSEXPORTER - REMOVEMETRICSEXPORTER_FAILED	Dependent item	yugabytedb.cluster.state Preprocessing JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `12h`
YugabyteDB Cluster: Type	The kind of cluster deployment: SYNCHRONOUS or GEO_PARTITIONED.	Dependent item	yugabytedb.cluster.type Preprocessing JSON Path: `$.data.spec.cluster_info.cluster_type` Replace: `SYNCHRONOUS -> 0` Replace: `GEO_PARTITIONED -> 1` Discard unchanged with heartbeat: `1d`
YugabyteDB Cluster: Number of nodes	How many nodes are in the cluster.	Dependent item	yugabytedb.cluster.node.number Preprocessing JSON Path: `$.data.spec.cluster_info.num_nodes` Discard unchanged with heartbeat: `12h`
YugabyteDB Cluster: Software version	The current version of YugabyteDB installed on the cluster.	Dependent item	yugabytedb.cluster.software.version Preprocessing JSON Path: `$.data.info.software_version` Discard unchanged with heartbeat: `12h`
YugabyteDB Cluster: YB controller version	The current version of the YB controller installed on the cluster.	Dependent item	yugabytedb.cluster.ybc.version Preprocessing JSON Path: `$.data.info.ybc_version` Discard unchanged with heartbeat: `12h`
YugabyteDB Cluster: Health state	Current state regarding the health of the cluster: - HEALTHY - NEEDS_ATTENTION - UNHEALTHY - UNKNOWN	Dependent item	yugabytedb.cluster.health.state Preprocessing JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `12h`
YugabyteDB Cluster: CPU utilization	The percentage of CPU use being consumed by the tablet or master server Yugabyte processes, as well as other processes, if any.	Dependent item	yugabytedb.cluster.cpu.utilization Preprocessing JSON Path: `$.metrics.CPU_USAGE`
YugabyteDB Cluster: Disk space usage	Shows the amount of disk space used by the cluster.	Dependent item	yugabytedb.cluster.disk.usage Preprocessing JSON Path: `$.metrics.DISK_USAGE_GB` Custom multiplier: `1073741824`
YugabyteDB Cluster: Disk space provisioned	Shows the amount of disk space provisioned for the cluster.	Dependent item	yugabytedb.cluster.disk.provisioned Preprocessing JSON Path: `$.metrics.PROVISIONED_DISK_SPACE_GB` Custom multiplier: `1073741824` Discard unchanged with heartbeat: `1h`
YugabyteDB Cluster: Disk space utilization	Shows the percentage of disk space used by the cluster.	Calculated	yugabytedb.cluster.disk.utilization
YugabyteDB Cluster: Disk read, Bps	The number of bytes being read from disk per second, averaged over each node.	Dependent item	yugabytedb.cluster.disk.read.bps Preprocessing JSON Path: `$.metrics.DISK_BYTES_READ_MB_PER_SEC` Custom multiplier: `1048576`
YugabyteDB Cluster: Disk write, Bps	The number of bytes being written to disk per second, averaged over each node.	Dependent item	yugabytedb.cluster.disk.write.bps Preprocessing JSON Path: `$.metrics.DISK_BYTES_WRITTEN_MB_PER_SEC` Custom multiplier: `1048576`
YugabyteDB Cluster: Disk read OPS	The number of read operations per second.	Dependent item	yugabytedb.cluster.disk.read.ops Preprocessing JSON Path: `$.metrics.READ_OPS_PER_SEC`
YugabyteDB Cluster: Disk write OPS	The number of write operations per second.	Dependent item	yugabytedb.cluster.disk.write.ops Preprocessing JSON Path: `$.metrics.WRITE_OPS_PER_SEC`
YugabyteDB Cluster: Average read latency	The average latency of read operations at the tablet level.	Dependent item	yugabytedb.cluster.read.latency Preprocessing JSON Path: `$.metrics.AVERAGE_READ_LATENCY_MS` Custom multiplier: `0.001` ⛔️Custom on fail: Set value to: `0`
YugabyteDB Cluster: Average write latency	The average latency of write operations at the tablet level.	Dependent item	yugabytedb.cluster.write.latency Preprocessing JSON Path: `$.metrics.AVERAGE_WRITE_LATENCY_MS` Custom multiplier: `0.001` ⛔️Custom on fail: Set value to: `0`
YugabyteDB Cluster: YSQL connections limit	The limit of the number of connections to the YSQL backend for all nodes.	Dependent item	yugabytedb.cluster.connection.limit Preprocessing JSON Path: `$.metrics.YSQL_CONNECTION_LIMIT` Discard unchanged with heartbeat: `12h`
YugabyteDB Cluster: YSQL connections average used	Cumulative number of connections to the YSQL backend for all nodes.	Dependent item	yugabytedb.cluster.connection.count Preprocessing JSON Path: `$.metrics.AVERAGE_YSQL_CONNECTION_COUNT`
YugabyteDB Cluster: YSQL connections utilization	Cumulative number of connections to the YSQL backend for all nodes, expressed in percent.	Calculated	yugabytedb.cluster.connection.utilization
YugabyteDB Cluster: YSQL connections maximum used	Maximum of used connections to the YSQL backend for all nodes.	Dependent item	yugabytedb.cluster.connection.max Preprocessing JSON Path: `$.metrics.YSQL_MAX_CONNECTION_COUNT`
YugabyteDB Cluster: Clock skew	The clock drift and skew across different nodes.	Dependent item	yugabytedb.cluster.node.skew Preprocessing JSON Path: `$.metrics.NODE_CLOCK_SKEW` Custom multiplier: `0.001` ⛔️Custom on fail: Discard value
YugabyteDB Cluster: Memory total	Shows the amount of RAM provisioned to the cluster.	Dependent item	yugabytedb.cluster.memory.total Preprocessing JSON Path: `$.metrics.MEMORY_TOTAL_GB` Custom multiplier: `1073741824` Discard unchanged with heartbeat: `12h`
YugabyteDB Cluster: Memory usage	Shows the amount of RAM used on the cluster.	Dependent item	yugabytedb.cluster.memory.usage Preprocessing JSON Path: `$.metrics.MEMORY_USAGE_GB` Custom multiplier: `1073741824`
YugabyteDB Cluster: Memory utilization	Shows the amount of RAM used on the cluster, expressed in percent.	Calculated	yugabytedb.cluster.memory.utilization
YugabyteDB Cluster: Network receive, Bps	The size of network packets received per second, averaged over nodes.	Dependent item	yugabytedb.cluster.network.receive.bps Preprocessing JSON Path: `$.metrics.NETWORK_RECEIVE_BYTES_MB_PER_SEC` Custom multiplier: `1048576`
YugabyteDB Cluster: Network transmit, Bps	The size of network packets transmitted per second, averaged over nodes.	Dependent item	yugabytedb.cluster.network.transmit.bps Preprocessing JSON Path: `$.metrics.NETWORK_TRANSMIT_BYTES_MB_PER_SEC` Custom multiplier: `1048576`
YugabyteDB Cluster: Network receive error, rate	The number of errors related to network packets received per second, averaged over nodes.	Dependent item	yugabytedb.cluster.network.receive.error.rate Preprocessing JSON Path: `$.metrics.NETWORK_RECEIVE_ERRORS_PER_SEC`
YugabyteDB Cluster: Network transmit error, rate	The number of errors related to network packets transmitted per second, averaged over nodes.	Dependent item	yugabytedb.cluster.network.transmit.error.rate Preprocessing JSON Path: `$.metrics.NETWORK_TRANSMIT_ERRORS_PER_SEC`
YugabyteDB Cluster: YSQL SELECT OPS	The count of SELECT statements executed through the YSQL API per second. This does not include index writes.	Dependent item	yugabytedb.cluster.ysql.select.ops Preprocessing JSON Path: `$.metrics.YSQL_SELECT_OPS_PER_SEC`
YugabyteDB Cluster: YSQL DELETE OPS	The count of DELETE statements executed through the YSQL API per second. This does not include index writes.	Dependent item	yugabytedb.cluster.ysql.delete.ops Preprocessing JSON Path: `$.metrics.YSQL_DELETE_OPS_PER_SEC`
YugabyteDB Cluster: YSQL UPDATE OPS	The count of UPDATE statements executed through the YSQL API per second. This does not include index writes.	Dependent item	yugabytedb.cluster.ysql.update.ops Preprocessing JSON Path: `$.metrics.YSQL_UPDATE_OPS_PER_SEC`
YugabyteDB Cluster: YSQL INSERT OPS	The count of INSERT statements executed through the YSQL API per second. This does not include index writes.	Dependent item	yugabytedb.cluster.ysql.insert.ops Preprocessing JSON Path: `$.metrics.YSQL_INSERT_OPS_PER_SEC`
YugabyteDB Cluster: YSQL OTHER OPS	The count of OTHER statements executed through the YSQL API per second.	Dependent item	yugabytedb.cluster.ysql.other.ops Preprocessing JSON Path: `$.metrics.YSQL_OTHER_OPS_PER_SEC`
YugabyteDB Cluster: YSQL transaction OPS	The count of transactions executed through the YSQL API per second.	Dependent item	yugabytedb.cluster.ysql.transaction.ops Preprocessing JSON Path: `$.metrics.YSQL_TRANSACTION_OPS_PER_SEC`
YugabyteDB Cluster: YSQL SELECT average latency	Average time of executing SELECT statements through the YSQL API.	Dependent item	yugabytedb.cluster.ysql.select.latency Preprocessing JSON Path: `$.metrics.YSQL_SELECT_LATENCY_MS` Custom multiplier: `0.001` ⛔️Custom on fail: Set value to: `0`
YugabyteDB Cluster: YSQL DELETE average latency	Average time of executing DELETE statements through the YSQL API.	Dependent item	yugabytedb.cluster.ysql.delete.latency Preprocessing JSON Path: `$.metrics.YSQL_DELETE_LATENCY_MS` Custom multiplier: `0.001` ⛔️Custom on fail: Set value to: `0`
YugabyteDB Cluster: YSQL UPDATE average latency	Average time of executing UPDATE statements through the YSQL API.	Dependent item	yugabytedb.cluster.ysql.update.latency Preprocessing JSON Path: `$.metrics.YSQL_UPDATE_LATENCY_MS` Custom multiplier: `0.001` ⛔️Custom on fail: Set value to: `0`
YugabyteDB Cluster: YSQL INSERT average latency	Average time of executing INSERT statements through the YSQL API.	Dependent item	yugabytedb.cluster.ysql.insert.latency Preprocessing JSON Path: `$.metrics.YSQL_INSERT_LATENCY_MS` Custom multiplier: `0.001` ⛔️Custom on fail: Set value to: `0`
YugabyteDB Cluster: YSQL OTHER average latency	Average time of executing OTHER statements through the YSQL API.	Dependent item	yugabytedb.cluster.ysql.other.latency Preprocessing JSON Path: `$.metrics.YSQL_OTHER_LATENCY_MS` Custom multiplier: `0.001` ⛔️Custom on fail: Set value to: `0`
YugabyteDB Cluster: YSQL transaction average latency	Average time of executing transactions through the YSQL API.	Dependent item	yugabytedb.cluster.ysql.transaction.latency Preprocessing JSON Path: `$.metrics.YSQL_TRANSACTION_LATENCY_MS` Custom multiplier: `0.001` ⛔️Custom on fail: Set value to: `0`
YugabyteDB Cluster: YCQL SELECT OPS	The count of SELECT statements executed through the YCQL API per second. This does not include index writes.	Dependent item	yugabytedb.cluster.ycql.select.ops Preprocessing JSON Path: `$.metrics.YCQL_SELECT_OPS_PER_SEC`
YugabyteDB Cluster: YCQL DELETE OPS	The count of DELETE statements executed through the YCQL API per second. This does not include index writes.	Dependent item	yugabytedb.cluster.ycql.delete.ops Preprocessing JSON Path: `$.metrics.YCQL_DELETE_OPS_PER_SEC`
YugabyteDB Cluster: YCQL INSERT OPS	The count of INSERT statements executed through the YCQL API per second. This does not include index writes.	Dependent item	yugabytedb.cluster.ycql.insert.ops Preprocessing JSON Path: `$.metrics.YCQL_INSERT_OPS_PER_SEC`
YugabyteDB Cluster: YCQL OTHER OPS	The count of OTHER statements executed through the YCQL API per second.	Dependent item	yugabytedb.cluster.ycql.other.ops Preprocessing JSON Path: `$.metrics.YCQL_OTHER_OPS_PER_SEC`
YugabyteDB Cluster: YCQL UPDATE OPS	The count of UPDATE statements executed through the YCQL API per second. This does not include index writes.	Dependent item	yugabytedb.cluster.ycql.update.ops Preprocessing JSON Path: `$.metrics.YCQL_UPDATE_OPS_PER_SEC`
YugabyteDB Cluster: YCQL transaction OPS	The count of transactions executed through the YCQL API per second.	Dependent item	yugabytedb.cluster.ycql.transaction.ops Preprocessing JSON Path: `$.metrics.YCQL_TRANSACTION_OPS_PER_SEC`
YugabyteDB Cluster: YCQL SELECT average latency	Average time of executing SELECT statements through the YCQL API.	Dependent item	yugabytedb.cluster.ycql.select.latency Preprocessing JSON Path: `$.metrics.YCQL_SELECT_LATENCY_MS` Custom multiplier: `0.001` ⛔️Custom on fail: Set value to: `0`
YugabyteDB Cluster: YCQL DELETE average latency	Average time of executing DELETE statements through the YCQL API.	Dependent item	yugabytedb.cluster.ycql.delete.latency Preprocessing JSON Path: `$.metrics.YCQL_DELETE_LATENCY_MS` Custom multiplier: `0.001` ⛔️Custom on fail: Set value to: `0`
YugabyteDB Cluster: YCQL INSERT average latency	Average time of executing INSERT statements through the YCQL API.	Dependent item	yugabytedb.cluster.ycql.insert.latency Preprocessing JSON Path: `$.metrics.YCQL_INSERT_LATENCY_MS` Custom multiplier: `0.001` ⛔️Custom on fail: Set value to: `0`
YugabyteDB Cluster: YCQL OTHER average latency	Average time of executing OTHER statements through the YCQL API.	Dependent item	yugabytedb.cluster.ycql.other.latency Preprocessing JSON Path: `$.metrics.YCQL_OTHER_LATENCY_MS` Custom multiplier: `0.001` ⛔️Custom on fail: Set value to: `0`
YugabyteDB Cluster: YCQL UPDATE average latency	Average time of executing UPDATE statements through the YCQL API.	Dependent item	yugabytedb.cluster.ycql.update.latency Preprocessing JSON Path: `$.metrics.YCQL_UPDATE_LATENCY_MS` Custom multiplier: `0.001` ⛔️Custom on fail: Set value to: `0`
YugabyteDB Cluster: YCQL transaction average latency	Average time of executing transactions through the YCQL API.	Dependent item	yugabytedb.cluster.ycql.transaction.latency Preprocessing JSON Path: `$.metrics.YCQL_TRANSACTION_LATENCY_MS` Custom multiplier: `0.001` ⛔️Custom on fail: Set value to: `0`

Triggers

Name	Description	Expression	Severity
YugabyteDB Cluster: Failed to fetch data	Failed to fetch data from the YugabyteDB API.	`length(last(/YugabyteDB Cluster by HTTP/yugabytedb.node.get.errors)) > 0 or length(last(/YugabyteDB Cluster by HTTP/yugabytedb.keyspace.get.errors)) > 0 or length(last(/YugabyteDB Cluster by HTTP/yugabytedb.cluster.get.errors)) > 0`\|Warning
YugabyteDB Cluster: Failed to fetch metric data	Failed to fetch cluster metrics or cluster statistics.	`length(last(/YugabyteDB Cluster by HTTP/yugabytedb.cluster.query.statistic.get.errors)) > 0 or length(last(/YugabyteDB Cluster by HTTP/yugabytedb.cluster.metric.get.errors)) > 0`\|Warning
YugabyteDB Cluster: Cluster software version has changed	YugabyteDB Cluster software version has changed. Acknowledge to close the problem manually.	`last(/YugabyteDB Cluster by HTTP/yugabytedb.cluster.software.version,#1) <> last(/YugabyteDB Cluster by HTTP/yugabytedb.cluster.software.version,#2) and length(last(/YugabyteDB Cluster by HTTP/yugabytedb.cluster.software.version)) > 0`\|Info	Manual close: Yes
YugabyteDB Cluster: YB controller version has changed	YugabyteDB Cluster YB controller version has changed. Acknowledge to close the problem manually.	`last(/YugabyteDB Cluster by HTTP/yugabytedb.cluster.ybc.version,#1) <> last(/YugabyteDB Cluster by HTTP/yugabytedb.cluster.ybc.version,#2) and length(last(/YugabyteDB Cluster by HTTP/yugabytedb.cluster.ybc.version)) > 0`\|Info	Manual close: Yes
YugabyteDB Cluster: Cluster is not healthy	YugabyteDB Cluster is not healthy.	`last(/YugabyteDB Cluster by HTTP/yugabytedb.cluster.health.state,#1) <> 0`\|Average
YugabyteDB Cluster: CPU utilization is too high	YugabyteDB Cluster CPU utilization is more than {$YUGABYTEDB.CPU.UTILIZATION.CRIT}%. The system might be slow to respond.	`min(/YugabyteDB Cluster by HTTP/yugabytedb.cluster.cpu.utilization,5m) > {$YUGABYTEDB.CPU.UTILIZATION.CRIT}`\|High
YugabyteDB Cluster: CPU utilization is high	YugabyteDB Cluster CPU utilization is more than {$YUGABYTEDB.CPU.UTILIZATION.WARN}%. The system might be slow to respond.	`min(/YugabyteDB Cluster by HTTP/yugabytedb.cluster.cpu.utilization,5m) > {$YUGABYTEDB.CPU.UTILIZATION.WARN}`\|Warning	Depends on: YugabyteDB Cluster: CPU utilization is too high
YugabyteDB Cluster: Storage space is low	YugabyteDB Cluster uses more than {$YUGABYTEDB.DISK.UTILIZATION.WARN}% of disk space.	`min(/YugabyteDB Cluster by HTTP/yugabytedb.cluster.disk.utilization,5m) > {$YUGABYTEDB.DISK.UTILIZATION.WARN}`\|Warning	Depends on: YugabyteDB Cluster: Storage space is critically low
YugabyteDB Cluster: Storage space is critically low	YugabyteDB Cluster uses more than {$YUGABYTEDB.DISK.UTILIZATION.CRIT}% of disk space.	`min(/YugabyteDB Cluster by HTTP/yugabytedb.cluster.disk.utilization,5m) > {$YUGABYTEDB.DISK.UTILIZATION.CRIT}`\|High
YugabyteDB Cluster: Average utilization of connections is high	YugabyteDB Cluster uses more than {$YUGABYTEDB.CONNECTION.UTILIZATION.WARN}% of the connection limit.	`min(/YugabyteDB Cluster by HTTP/yugabytedb.cluster.connection.utilization,5m) > {$YUGABYTEDB.CONNECTION.UTILIZATION.WARN}`\|Warning	Depends on: YugabyteDB Cluster: Average utilization of connections is too high
YugabyteDB Cluster: Average utilization of connections is too high	YugabyteDB Cluster uses more than {$YUGABYTEDB.CONNECTION.UTILIZATION.CRIT}% of the connection limit.	`min(/YugabyteDB Cluster by HTTP/yugabytedb.cluster.connection.utilization,5m) > {$YUGABYTEDB.CONNECTION.UTILIZATION.CRIT}`\|High
YugabyteDB Cluster: Memory utilization is high	YugabyteDB Cluster uses more than {$YUGABYTEDB.MEMORY.CLUSTER.UTILIZATION.WARN}% of memory.	`min(/YugabyteDB Cluster by HTTP/yugabytedb.cluster.memory.utilization,5m) > {$YUGABYTEDB.MEMORY.CLUSTER.UTILIZATION.WARN}`\|Warning	Depends on: YugabyteDB Cluster: Memory utilization is too high
YugabyteDB Cluster: Memory utilization is too high	YugabyteDB Cluster uses more than {$YUGABYTEDB.MEMORY.CLUSTER.UTILIZATION.CRIT}% of memory.	`min(/YugabyteDB Cluster by HTTP/yugabytedb.cluster.memory.utilization,5m) > {$YUGABYTEDB.MEMORY.CLUSTER.UTILIZATION.CRIT}`\|High

LLD rule Keyspace discovery

Name Description Type Key and additional info

Keyspace discovery

Name	Description	Type	Key and additional info
Keyspace discovery	Discovery of the available keyspaces.	Dependent item	yugabytedb.keyspace.discovery Preprocessing JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `1d`

Discovery of the available keyspaces.

Dependent item

yugabytedb.keyspace.discovery

Preprocessing

JavaScript: The text is too long. Please see the template.
Discard unchanged with heartbeat: 1d

Item prototypes for Keyspace discovery

Name	Description	Type	Key and additional info
YugabyteDB Keyspace [{#KEYSPACE.NAME}]: Get keyspace info	Get raw data about the keyspace [{#KEYSPACE.NAME}].	Dependent item	yugabytedb.keyspace.get[{#KEYSPACE.NAME}] Preprocessing JSON Path: `$.keyspaces.[?(@.keyspace_name=='{#KEYSPACE.NAME}')].first()` ⛔️Custom on fail: Discard value
YugabyteDB Keyspace [{#KEYSPACE.NAME}]: SST size	The size of the table's SST.	Dependent item	yugabytedb.keyspace.sst.size[{#KEYSPACE.NAME}] Preprocessing JSON Path: `$.size_bytes`
YugabyteDB Keyspace [{#KEYSPACE.NAME}]: Wal size	The size of the table's WAL.	Dependent item	yugabytedb.keyspace.wal.size[{#KEYSPACE.NAME}] Preprocessing JSON Path: `$.wal_size_bytes`
YugabyteDB Keyspace [{#KEYSPACE.NAME}]: Type	The type of keyspace: YSQL or YCQL.	Dependent item	yugabytedb.keyspace.type[{#KEYSPACE.NAME}] Preprocessing JSON Path: `$.type` Replace: `YSQL -> 0` Replace: `YCQL -> 1` Discard unchanged with heartbeat: `1d`

LLD rule Node discovery

Name Description Type Key and additional info

Node discovery

Name	Description	Type	Key and additional info
Node discovery	Discovery of the nodes for all clusters.	Dependent item	yugabytedb.node.discovery Preprocessing JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `1d`

Discovery of the nodes for all clusters.

Dependent item

yugabytedb.node.discovery

Preprocessing

JavaScript: The text is too long. Please see the template.
Discard unchanged with heartbeat: 1d

Item prototypes for Node discovery

Name	Description	Type	Key and additional info
YugabyteDB Node [{#NODE.NAME}]: Get node info	Get raw data about the node [{#NODE.NAME}].	Dependent item	yugabytedb.node.get[{#NODE.NAME}] Preprocessing JSON Path: `$.nodes.[?(@.name=='{#NODE.NAME}')].first()` ⛔️Custom on fail: Discard value
YugabyteDB Node [{#NODE.NAME}]: Disk IOPS limit	The IOPS to provision for the node [{#NODE.NAME}] for each disk.	Dependent item	yugabytedb.node.iops.limit[{#NODE.NAME}] Preprocessing JSON Path: `$.data.spec.cluster_info.node_info.disk_iops` Discard unchanged with heartbeat: `12h`
YugabyteDB Node [{#NODE.NAME}]: Total disk size	The disk size (GB) for the node [{#NODE.NAME}].	Dependent item	yugabytedb.node.disk.size.total[{#NODE.NAME}] Preprocessing JSON Path: `$.data.spec.cluster_info.node_info.disk_size_gb` Discard unchanged with heartbeat: `12h`
YugabyteDB Node [{#NODE.NAME}]: Total memory, bytes	The amount of RAM for the node [{#NODE.NAME}].	Dependent item	yugabytedb.node.memory.total[{#NODE.NAME}] Preprocessing JSON Path: `$.data.spec.cluster_info.node_info.memory_mb` Custom multiplier: `1048576` Discard unchanged with heartbeat: `12h`
YugabyteDB Node [{#NODE.NAME}]: Total CPU cores	The number of cores for the node [{#NODE.NAME}].	Dependent item	yugabytedb.node.cpu.num.cores[{#NODE.NAME}] Preprocessing JSON Path: `$.data.spec.cluster_info.node_info.num_cores` Discard unchanged with heartbeat: `12h`
YugabyteDB Node [{#NODE.NAME}]: Region	The cloud information for the node [{#NODE.NAME}] about the region.	Dependent item	yugabytedb.node.region[{#NODE.NAME}] Preprocessing JSON Path: `$.cloud_info.region` Discard unchanged with heartbeat: `1d`
YugabyteDB Node [{#NODE.NAME}]: Zone	The cloud information for the node [{#NODE.NAME}] about the zone.	Dependent item	yugabytedb.node.zone[{#NODE.NAME}] Preprocessing JSON Path: `$.cloud_info.zone` Discard unchanged with heartbeat: `1d`
YugabyteDB Node [{#NODE.NAME}]: Total SST file size	The size of all SST files.	Dependent item	yugabytedb.node.sst.file.size.total[{#NODE.NAME}] Preprocessing JSON Path: `$.metrics.total_sst_file_size_bytes`
YugabyteDB Node [{#NODE.NAME}]: Uncompressed SST file size	The size of uncompressed SST files.	Dependent item	yugabytedb.node.sst.file.size.uncompressed[{#NODE.NAME}] Preprocessing JSON Path: `$.metrics.uncompressed_sst_file_size_bytes`
YugabyteDB Node [{#NODE.NAME}]: Read OPS	The amount of read operations per second for the node [{#NODE.NAME}].	Dependent item	yugabytedb.node.read.ops[{#NODE.NAME}] Preprocessing JSON Path: `$.metrics.read_ops_per_sec`
YugabyteDB Node [{#NODE.NAME}]: Write OPS	The amount of write operations per second for the node [{#NODE.NAME}].	Dependent item	yugabytedb.node.write.ops[{#NODE.NAME}] Preprocessing JSON Path: `$.metrics.write_ops_per_sec`
YugabyteDB Node [{#NODE.NAME}]: Disk IOPS utilization	Shows the utilization of provisioned IOPS.	Calculated	yugabytedb.node.iops.utilization[{#NODE.NAME}]
YugabyteDB Node [{#NODE.NAME}]: Node status	The current status of the node [{#NODE.NAME}]: 0 = Down 1 = Up	Dependent item	yugabytedb.node.status[{#NODE.NAME}] Preprocessing JSON Path: `$.is_node_up` Replace: `true -> 1` Replace: `false -> 0` Discard unchanged with heartbeat: `1h`
YugabyteDB Node [{#NODE.NAME}]: Node is master	The current role of the node [{#NODE.NAME}]: 0 = False 1 = True	Dependent item	yugabytedb.node.master[{#NODE.NAME}] Preprocessing JSON Path: `$.is_master` Replace: `true -> 1` Replace: `false -> 0` Discard unchanged with heartbeat: `1h`
YugabyteDB Node [{#NODE.NAME}]: Node is TServer	This item indicates if the node [{#NODE.NAME}] is a TServer node: 0 = False 1 = True	Dependent item	yugabytedb.node.tserver[{#NODE.NAME}] Preprocessing JSON Path: `$.is_tserver` Replace: `true -> 1` Replace: `false -> 0` Discard unchanged with heartbeat: `1h`
YugabyteDB Node [{#NODE.NAME}]: Node is read replica	This item indicates if the node [{#NODE.NAME}] is a read replica: 0 = False 1 = True	Dependent item	yugabytedb.node.read.replica[{#NODE.NAME}] Preprocessing JSON Path: `$.is_read_replica` Replace: `true -> 1` Replace: `false -> 0` Discard unchanged with heartbeat: `1h`

Trigger prototypes for Node discovery

Name	Description	Expression	Severity
YugabyteDB Node [{#NODE.NAME}]: Node disk IOPS utilization is high	IOPS utilization on the node [{#NODE.NAME}] is more than {$YUGABYTEDB.IOPS.UTILIZATION.WARN}% of the provisioned IOPS.	`min(/YugabyteDB Cluster by HTTP/yugabytedb.node.iops.utilization[{#NODE.NAME}],5m) > {$YUGABYTEDB.IOPS.UTILIZATION.WARN}`\|Warning	Depends on: YugabyteDB Node [{#NODE.NAME}]: Node disk IOPS utilization is too high
YugabyteDB Node [{#NODE.NAME}]: Node disk IOPS utilization is too high	IOPS utilization on the node [{#NODE.NAME}] is more than {$YUGABYTEDB.IOPS.UTILIZATION.CRIT}% of the provisioned IOPS.	`min(/YugabyteDB Cluster by HTTP/yugabytedb.node.iops.utilization[{#NODE.NAME}],5m) > {$YUGABYTEDB.IOPS.UTILIZATION.CRIT}`\|High
YugabyteDB Node [{#NODE.NAME}]: Node is down	The node [{#NODE.NAME}] is down.	`max(/YugabyteDB Cluster by HTTP/yugabytedb.node.status[{#NODE.NAME}],3m) = 0`\|Average

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template, or ask for help at ZABBIX forums

db_tidb_tikv_http

View README Download JSON

TiDB TiKV by HTTP

Overview

The template to monitor TiKV server of TiDB cluster by Zabbix that works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.

Template TiDB TiKV by HTTP — collects metrics by HTTP agent from TiKV /metrics endpoint.

Requirements

Zabbix version: 6.0 and higher.

Tested versions

This template has been tested on:

TiDB cluster 4.0.10, 6.5.1

Configuration

Setup

This template works with TiKV server of TiDB cluster. Internal service metrics are collected from TiKV /metrics endpoint. Don't forget to change the macros {$TIKV.URL}, {$TIKV.PORT}. Also, see the Macros section for a list of macros used to set trigger values.

Macros used

Name	Description	Default
{$TIKV.PORT}	The port of TiKV server metrics web endpoint	`20180`
{$TIKV.URL}	TiKV server URL	`localhost`
{$TIKV.COPROCESSOR.ERRORS.MAX.WARN}	Maximum number of coprocessor request errors	`1`
{$TIKV.STORE.ERRORS.MAX.WARN}	Maximum number of failure messages	`1`
{$TIKV.PENDING_COMMANDS.MAX.WARN}	Maximum number of pending commands	`1`
{$TIKV.PENDING_TASKS.MAX.WARN}	Maximum number of tasks currently running by the worker or pending	`1`

Items

Name	Description	Type	Key and additional info
TiKV: Get instance metrics	Get TiKV instance metrics.	HTTP agent	tikv.get_metrics Preprocessing Check for not supported value ⛔️Custom on fail: Discard value Prometheus to JSON
TiKV: Store size	The storage size of TiKV instance.	Dependent item	tikv.engine_size Preprocessing JSON Path: `$[?(@.name == "tikv_engine_size_bytes")].value.sum()`
TiKV: Get store size metrics	Get capacity metrics of TiKV instance.	Dependent item	tikv.store_size.metrics Preprocessing JSON Path: `$[?(@.name == "tikv_store_size_bytes")]` ⛔️Custom on fail: Discard value
TiKV: Available size	The available capacity of TiKV instance.	Dependent item	tikv.store_size.available Preprocessing JSON Path: `$[?(@.labels.type == "available")].value.first()`
TiKV: Capacity size	The capacity size of TiKV instance.	Dependent item	tikv.store_size.capacity Preprocessing JSON Path: `$[?(@.labels.type == "capacity")].value.first()`
TiKV: Bytes read	The total bytes of read in TiKV instance.	Dependent item	tikv.engineflowbytes.read Preprocessing JSON Path: `The text is too long. Please see the template.`
TiKV: Bytes write	The total bytes of write in TiKV instance.	Dependent item	tikv.engineflowbytes.write Preprocessing JSON Path: `The text is too long. Please see the template.`
TiKV: Storage: commands total, rate	Total number of commands received per second.	Dependent item	tikv.storage_command.rate Preprocessing JSON Path: `$[?(@.name == "tikv_storage_command_total")].value.sum()` Change per second
TiKV: CPU util	The CPU usage ratio on TiKV instance.	Dependent item	tikv.cpu.util Preprocessing JSON Path: `$[?(@.name == "tikv_thread_cpu_seconds_total")].value.sum()` Change per second Custom multiplier: `100`
TiKV: RSS memory usage	Resident memory size in bytes.	Dependent item	tikv.rss_bytes Preprocessing JSON Path: `The text is too long. Please see the template.`
TiKV: Regions, count	The number of regions collected in TiKV instance.	Dependent item	tikv.region_count Preprocessing JSON Path: `The text is too long. Please see the template.`
TiKV: Regions, leader	The number of leaders in TiKV instance.	Dependent item	tikv.region_leader Preprocessing JSON Path: `The text is too long. Please see the template.`
TiKV: Get QPS metrics	Get QPS metrics in TiKV instance.	Dependent item	tikv.grpc_msgs.metrics Preprocessing JSON Path: `$[?(@.name == "tikv_grpc_msg_duration_seconds_count")]` ⛔️Custom on fail: Discard value
TiKV: Total query, rate	The total QPS in TiKV instance.	Dependent item	tikv.grpc_msg.rate Preprocessing JSON Path: `$..value.sum()` Change per second
TiKV: Total query errors, rate	The total number of gRPC message handling failure per second.	Dependent item	tikv.grpcmsgfail.rate Preprocessing JSON Path: `$[?(@.name == "tikv_grpc_msg_fail_total")].value.sum()` ⛔️Custom on fail: Discard value Change per second
TiKV: Coprocessor: Errors, rate	Total number of push down request error per second.	Dependent item	tikv.coprocessorrequesterror.rate Preprocessing JSON Path: `$[?(@.name == "tikv_coprocessor_request_error")].value.sum()` ⛔️Custom on fail: Discard value Change per second
TiKV: Get coprocessor requests metrics	Get metrics of coprocessor requests.	Dependent item	tikv.coprocessor_requests.metrics Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
TiKV: Coprocessor: Requests, rate	Total number of coprocessor requests per second.	Dependent item	tikv.coprocessor_request.rate Preprocessing JSON Path: `$..value.sum()` Change per second
TiKV: Coprocessor: Scan keys, rate	Total number of scan keys observed per request per second.	Dependent item	tikv.coprocessorscankeys_sum.rate Preprocessing JSON Path: `$[?(@.name == "tikv_coprocessor_scan_keys")].value.sum()` ⛔️Custom on fail: Discard value Change per second
TiKV: Coprocessor: RocksDB ops, rate	Total number of RocksDB internal operations from PerfContext per second.	Dependent item	tikv.coprocessorrocksdbperf.rate Preprocessing JSON Path: `$[?(@.name == "tikv_coprocessor_rocksdb_perf")].value.sum()` ⛔️Custom on fail: Discard value Change per second
TiKV: Coprocessor: Response size, rate	The total size of coprocessor response per second.	Dependent item	tikv.coprocessorresponsebytes.rate Preprocessing JSON Path: `The text is too long. Please see the template.` Change per second
TiKV: Scheduler: Pending commands	The total number of pending commands. The scheduler receives commands from clients, executes them against the MVCC layer storage engine.	Dependent item	tikv.scheduler_contex Preprocessing JSON Path: `$[?(@.name == "tikv_scheduler_contex_total")].value.first()`
TiKV: Scheduler: Busy, rate	The total count of too busy schedulers per second.	Dependent item	tikv.schedulertoobusy.rate Preprocessing JSON Path: `$[?(@.name == "tikv_scheduler_too_busy_total")].value.sum()` ⛔️Custom on fail: Discard value Change per second
TiKV: Get scheduler metrics	Get metrics of scheduler commands.	Dependent item	tikv.scheduler.metrics Preprocessing JSON Path: `$[?(@.name == "tikv_scheduler_stage_total")]` ⛔️Custom on fail: Discard value
TiKV: Scheduler: Commands total, rate	Total number of commands per second.	Dependent item	tikv.scheduler_commands.rate Preprocessing JSON Path: `$..value.sum()` ⛔️Custom on fail: Set value to: `0` Change per second
TiKV: Scheduler: Low priority commands total, rate	Total count of low priority commands per second.	Dependent item	tikv.commands_pri.low.rate Preprocessing JSON Path: `The text is too long. Please see the template.` Change per second
TiKV: Scheduler: Normal priority commands total, rate	Total count of normal priority commands per second.	Dependent item	tikv.commands_pri.normal.rate Preprocessing JSON Path: `The text is too long. Please see the template.` Change per second
TiKV: Scheduler: High priority commands total, rate	Total count of high priority commands per second.	Dependent item	tikv.commands_pri.high.rate Preprocessing JSON Path: `The text is too long. Please see the template.` Change per second
TiKV: Snapshot: Pending tasks	The number of tasks currently running by the worker or pending.	Dependent item	tikv.workerpendingtask Preprocessing JSON Path: `The text is too long. Please see the template.`
TiKV: Snapshot: Sending	The total amount of raftstore snapshot traffic.	Dependent item	tikv.snapshot.sending Preprocessing JSON Path: `The text is too long. Please see the template.`
TiKV: Snapshot: Receiving	The total amount of raftstore snapshot traffic.	Dependent item	tikv.snapshot.receiving Preprocessing JSON Path: `The text is too long. Please see the template.`
TiKV: Snapshot: Applying	The total amount of raftstore snapshot traffic.	Dependent item	tikv.snapshot.applying Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
TiKV: Uptime	The runtime of each TiKV instance.	Dependent item	tikv.uptime Preprocessing JSON Path: `$[?(@.name=="process_start_time_seconds")].value.first()` JavaScript: `The text is too long. Please see the template.`
TiKV: Get failure msg metrics	Get metrics of reporting failure messages.	Dependent item	tikv.messages.failure.metrics Preprocessing JSON Path: `$[?(@.name == "tikv_server_report_failure_msg_total")]` ⛔️Custom on fail: Discard value
TiKV: Server: failure messages total, rate	Total number of reporting failure messages per second.	Dependent item	tikv.messages.failure.rate Preprocessing JSON Path: `$..value.sum()` ⛔️Custom on fail: Discard value Change per second

Triggers

Name	Description	Expression	Severity
TiKV: Too many coprocessor request error		`min(/TiDB TiKV by HTTP/tikv.coprocessor_request_error.rate,5m)>{$TIKV.COPROCESSOR.ERRORS.MAX.WARN}`\|Warning
TiKV: Too many pending commands		`min(/TiDB TiKV by HTTP/tikv.scheduler_contex,5m)>{$TIKV.PENDING_COMMANDS.MAX.WARN}`\|Average
TiKV: Too many pending tasks		`min(/TiDB TiKV by HTTP/tikv.worker_pending_task,5m)>{$TIKV.PENDING_TASKS.MAX.WARN}`\|Average
TiKV: has been restarted	Uptime is less than 10 minutes.	`last(/TiDB TiKV by HTTP/tikv.uptime)<10m`\|Info	Manual close: Yes

LLD rule QPS metrics discovery

Name Description Type Key and additional info

QPS metrics discovery

Name	Description	Type	Key and additional info
QPS metrics discovery	Discovery QPS metrics.	Dependent item	tikv.qps.discovery Preprocessing JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `1h`

Discovery QPS metrics.

Dependent item

tikv.qps.discovery

Preprocessing

JavaScript: The text is too long. Please see the template.
Discard unchanged with heartbeat: 1h

Item prototypes for QPS metrics discovery

Name Description Type Key and additional info

TiKV: Query: {#TYPE}, rate

Name	Description	Type	Key and additional info
TiKV: Query: {#TYPE}, rate	The QPS per command in TiKV instance.	Dependent item	tikv.grpc_msg.rate[{#TYPE}] Preprocessing JSON Path: `$[?(@.labels.type == "{#TYPE}")].value.first()` ⛔️Custom on fail: Set value to

The QPS per command in TiKV instance.

Dependent item

tikv.grpc_msg.rate[{#TYPE}]

Preprocessing

JSON Path: $[?(@.labels.type == "{#TYPE}")].value.first()
⛔️Custom on fail: Set value to

LLD rule Coprocessor metrics discovery

Name Description Type Key and additional info

Coprocessor metrics discovery

Name	Description	Type	Key and additional info
Coprocessor metrics discovery	Discovery coprocessor metrics.	Dependent item	tikv.coprocessor.discovery Preprocessing JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `1h`

Discovery coprocessor metrics.

Dependent item

tikv.coprocessor.discovery

Preprocessing

JavaScript: The text is too long. Please see the template.
Discard unchanged with heartbeat: 1h

Item prototypes for Coprocessor metrics discovery

Name	Description	Type	Key and additional info
TiKV: Coprocessor: {#REQ_TYPE} metrics	Get metrics of {#REQ_TYPE} requests.	Dependent item	tikv.coprocessorrequest.metrics[{#REQTYPE}] Preprocessing JSON Path: `$[?(@.labels.req == "{#REQ_TYPE}")]` ⛔️Custom on fail: Discard value
TiKV: Coprocessor: {#REQ_TYPE} errors, rate	Total number of push down request error per second.	Dependent item	tikv.coprocessorrequesterror.rate[{#REQ_TYPE}] Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value Change per second
TiKV: Coprocessor: {#REQ_TYPE} requests, rate	Total number of coprocessor requests per second.	Dependent item	tikv.coprocessorrequest.rate[{#REQTYPE}] Preprocessing JSON Path: `The text is too long. Please see the template.` Change per second
TiKV: Coprocessor: {#REQ_TYPE} scan keys, rate	Total number of scan keys observed per request per second.	Dependent item	tikv.coprocessorscankeys.rate[{#REQ_TYPE}] Preprocessing JSON Path: `The text is too long. Please see the template.` Change per second
TiKV: Coprocessor: {#REQ_TYPE} RocksDB ops, rate	Total number of RocksDB internal operations from PerfContext per second.	Dependent item	tikv.coprocessorrocksdbperf.rate[{#REQ_TYPE}] Preprocessing JSON Path: `$[?(@.name == "tikv_coprocessor_rocksdb_perf")].value.sum()` ⛔️Custom on fail: Discard value Change per second

LLD rule Scheduler metrics discovery

Name Description Type Key and additional info

Scheduler metrics discovery

Name	Description	Type	Key and additional info
Scheduler metrics discovery	Discovery scheduler metrics.	Dependent item	tikv.scheduler.discovery Preprocessing JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `1h`

Discovery scheduler metrics.

Dependent item

tikv.scheduler.discovery

Preprocessing

JavaScript: The text is too long. Please see the template.
Discard unchanged with heartbeat: 1h

Item prototypes for Scheduler metrics discovery

Name Description Type Key and additional info

TiKV: Scheduler: commands {#STAGE}, rate

Name	Description	Type	Key and additional info
TiKV: Scheduler: commands {#STAGE}, rate	Total number of commands on each stage per second.	Dependent item	tikv.scheduler_stage.rate[{#STAGE}] Preprocessing JSON Path: `$[?(@.labels.stage == "{#STAGE}")].value.sum()` ⛔️Custom on fail: Set value to: `0` Change per second

Total number of commands on each stage per second.

Dependent item

tikv.scheduler_stage.rate[{#STAGE}]

Preprocessing

JSON Path: $[?(@.labels.stage == "{#STAGE}")].value.sum()
⛔️Custom on fail: Set value to: 0
Change per second

LLD rule Server errors discovery

Name Description Type Key and additional info

Server errors discovery

Name	Description	Type	Key and additional info
Server errors discovery	Discovery server errors metrics.	Dependent item	tikv.serverreportfailure.discovery Preprocessing JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `1h`

Discovery server errors metrics.

Dependent item

tikv.serverreportfailure.discovery

Preprocessing

JavaScript: The text is too long. Please see the template.
Discard unchanged with heartbeat: 1h

Item prototypes for Server errors discovery

Name Description Type Key and additional info

Name	Description	Type	Key and additional info
TiKV: Storeid {#STOREID}: failure messages "{#TYPE}", rate	Total number of reporting failure messages. The metric has two labels: type and storeid. type represents the failure type, and storeid represents the destination peer store id.	Dependent item	tikv.messages.failure.rate[{#STORE_ID},{#TYPE}] Preprocessing JSON Path: `The text is too long. Please see the template.` Change per second

TiKV: Storeid {#STOREID}: failure messages "{#TYPE}", rate

Total number of reporting failure messages. The metric has two labels: type and storeid. type represents the failure type, and storeid represents the destination peer store id.

Dependent item

tikv.messages.failure.rate[{#STORE_ID},{#TYPE}]

Preprocessing

JSON Path: The text is too long. Please see the template.
Change per second

Trigger prototypes for Server errors discovery

Name	Description	Expression	Severity	Dependencies and additional info
TiKV: Storeid {#STOREID}: Too many failure messages "{#TYPE}"	Indicates that the remote TiKV cannot be connected.	`min(/TiDB TiKV by HTTP/tikv.messages.failure.rate[{#STORE_ID},{#TYPE}],5m)>{$TIKV.STORE.ERRORS.MAX.WARN}`\|Warning

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template, or ask for help at ZABBIX forums

db_tidb_tidb_http

View README Download JSON

TiDB by HTTP

Overview

The template to monitor TiDB server of TiDB cluster by Zabbix that works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.

Template TiDB by HTTP — collects metrics by HTTP agent from PD /metrics endpoint and from monitoring API. See https://docs.pingcap.com/tidb/stable/tidb-monitoring-api.

Requirements

Zabbix version: 6.0 and higher.

Tested versions

This template has been tested on:

TiDB cluster 4.0.10, 6.5.1

Configuration

Setup

This template works with TiDB server of TiDB cluster. Internal service metrics are collected from TiDB /metrics endpoint and from monitoring API. See https://docs.pingcap.com/tidb/stable/tidb-monitoring-api. Don't forget to change the macros {$TIDB.URL}, {$TIDB.PORT}. Also, see the Macros section for a list of macros used to set trigger values.

Macros used

Name	Description	Default
{$TIDB.PORT}	The port of TiDB server metrics web endpoint	`10080`
{$TIDB.URL}	TiDB server URL	`localhost`
{$TIDB.OPEN.FDS.MAX.WARN}	Maximum percentage of used file descriptors	`90`
{$TIDB.HEAP.USAGE.MAX.WARN}	Maximum heap memory used	`10G`
{$TIDB.DDL.WAITING.MAX.WARN}	Maximum number of DDL tasks that are waiting	`5`
{$TIDB.TIMEJUMPBACK.MAX.WARN}	Maximum number of times that the operating system rewinds every second	`1`
{$TIDB.SCHEMALEASEERRORS.MAX.WARN}	Maximum number of schema lease errors	`0`
{$TIDB.SCHEMALOADERRORS.MAX.WARN}	Maximum number of load schema errors	`1`
{$TIDB.GC_ACTIONS.ERRORS.MAX.WARN}	Maximum number of GC-related operations failures	`1`
{$TIDB.REGION_ERROR.MAX.WARN}	Maximum number of region related errors	`50`
{$TIDB.MONITORKEEPALIVE.MAX.WARN}	Minimum number of keep alive operations	`10`

Items

Name	Description	Type	Key and additional info
TiDB: Get instance metrics	Get TiDB instance metrics.	HTTP agent	tidb.get_metrics Preprocessing Check for not supported value ⛔️Custom on fail: Discard value Prometheus to JSON
TiDB: Get instance status	Get TiDB instance status info.	HTTP agent	tidb.get_status Preprocessing Check for not supported value ⛔️Custom on fail: Set value to: `{"status": "0"}`
TiDB: Status	Status of PD instance.	Dependent item	tidb.status Preprocessing JSON Path: `$.status` ⛔️Custom on fail: Set value to: `1` Discard unchanged with heartbeat: `1h`
TiDB: Get total server query metrics	Get information about server queries.	Dependent item	tidb.serverquery.getmetrics Preprocessing JSON Path: `$[?(@.name == "tidb_server_query_total")]` ⛔️Custom on fail: Discard value
TiDB: Total "error" server query, rate	The number of queries on TiDB instance per second with failure of command execution results.	Dependent item	tidb.server_query.error.rate Preprocessing JSON Path: `$[?(@.labels.result == "Error")].value.sum()` Change per second
TiDB: Total "ok" server query, rate	The number of queries on TiDB instance per second with success of command execution results.	Dependent item	tidb.server_query.ok.rate Preprocessing JSON Path: `$[?(@.labels.result == "OK")].value.sum()` Change per second
TiDB: Total server query, rate	The number of queries per second on TiDB instance.	Dependent item	tidb.server_query.rate Preprocessing JSON Path: `$..value.sum()` Change per second
TiDB: Get SQL statements metrics	Get SQL statements metrics.	Dependent item	tidb.statementtotal.getmetrics Preprocessing JSON Path: `$[?(@.name=="tidb_executor_statement_total")]` ⛔️Custom on fail: Discard value
TiDB: SQL statements, rate	The total number of SQL statements executed per second.	Dependent item	tidb.statement_total.rate Preprocessing JSON Path: `$..value.sum()` Change per second
TiDB: Failed Query, rate	The number of error occurred when executing SQL statements per second (such as syntax errors and primary key conflicts).	Dependent item	tidb.execute_error.rate Preprocessing JSON Path: `$[?(@.name=="tidb_server_execute_error_total")].value.sum()` ⛔️Custom on fail: Discard value Change per second
TiDB: Get TiKV client metrics	Get TiKV client metrics.	Dependent item	tidb.tikvclient.get_metrics Preprocessing JSON Path: `$[?(@.name=~"tidb_tikvclient_*")]` ⛔️Custom on fail: Discard value
TiDB: KV commands, rate	The number of executed KV commands per second.	Dependent item	tidb.tikvclient_txn.rate Preprocessing JSON Path: `The text is too long. Please see the template.` Change per second
TiDB: PD TSO commands, rate	The number of TSO commands that TiDB obtains from PD per second.	Dependent item	tidb.pdtsocmd.rate Preprocessing JSON Path: `The text is too long. Please see the template.` Change per second
TiDB: PD TSO requests, rate	The number of TSO requests that TiDB obtains from PD per second.	Dependent item	tidb.pdtsorequest.rate Preprocessing JSON Path: `The text is too long. Please see the template.` Change per second
TiDB: TiClient region errors, rate	The number of region related errors returned by TiKV per second.	Dependent item	tidb.tikvclientregionerr.rate Preprocessing JSON Path: `$[?(@.name=="tidb_tikvclient_region_err_total")].value.sum()` Change per second
TiDB: Lock resolves, rate	The number of DDL tasks that are waiting.	Dependent item	tidb.tikvclientlockresolver_action.rate Preprocessing JSON Path: `The text is too long. Please see the template.` Change per second
TiDB: DDL waiting jobs	The number of TiDB operations that resolve locks per second. When TiDB's read or write request encounters a lock, it tries to resolve the lock.	Dependent item	tidb.ddlwaitingjobs Preprocessing JSON Path: `$[?(@.name=="tidb_ddl_waiting_jobs")].value.sum()` ⛔️Custom on fail: Set value to: `0`
TiDB: Load schema total, rate	The statistics of the schemas that TiDB obtains from TiKV per second.	Dependent item	tidb.domainloadschema.rate Preprocessing JSON Path: `$[?(@.name=="tidb_domain_load_schema_total")].value.sum()` Change per second
TiDB: Load schema failed, rate	The total number of failures to reload the latest schema information in TiDB per second.	Dependent item	tidb.domainloadschema.failed.rate Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value Change per second
TiDB: Schema lease "outdate" errors , rate	The number of schema lease errors per second. "outdate" errors means that the schema cannot be updated, which is a more serious error and triggers an alert.	Dependent item	tidb.sessionschemalease_error.outdate.rate Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value Change per second
TiDB: Schema lease "change" errors, rate	The number of schema lease errors per second. "change" means that the schema has changed	Dependent item	tidb.sessionschemalease_error.change.rate Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value Change per second
TiDB: KV backoff, rate	The number of errors returned by TiKV.	Dependent item	tidb.tikvclient_backoff.rate Preprocessing JSON Path: `$[?(@.name=="tidb_tikvclient_backoff_total")].value.sum()` ⛔️Custom on fail: Discard value Change per second
TiDB: Keep alive, rate	The number of times that the metrics are refreshed on TiDB instance per minute.	Dependent item	tidb.monitorkeepalive.rate Preprocessing JSON Path: `$[?(@.name=="tidb_monitor_keep_alive_total")].value.first()` ⛔️Custom on fail: Discard value Simple change
TiDB: Server connections	The connection number of current TiDB instance.	Dependent item	tidb.tidbserverconnections Preprocessing JSON Path: `$[?(@.name=="tidb_server_connections")].value.first()`
TiDB: Heap memory usage	Number of heap bytes that are in use.	Dependent item	tidb.heap_bytes Preprocessing JSON Path: `$[?(@.name=="go_memstats_heap_inuse_bytes")].value.first()`
TiDB: RSS memory usage	Resident memory size in bytes.	Dependent item	tidb.rss_bytes Preprocessing JSON Path: `$[?(@.name=="process_resident_memory_bytes")].value.first()`
TiDB: Goroutine count	The number of Goroutines on TiDB instance.	Dependent item	tidb.goroutines Preprocessing JSON Path: `$[?(@.name=="go_goroutines")].value.first()`
TiDB: Open file descriptors	Number of open file descriptors.	Dependent item	tidb.processopenfds Preprocessing JSON Path: `$[?(@.name=="process_open_fds")].value.first()`
TiDB: Open file descriptors, max	Maximum number of open file descriptors.	Dependent item	tidb.processmaxfds Preprocessing JSON Path: `$[?(@.name=="process_max_fds")].value.first()`
TiDB: CPU	Total user and system CPU usage ratio.	Dependent item	tidb.cpu.util Preprocessing JSON Path: `$[?(@.name=="process_cpu_seconds_total")].value.first()` Change per second Custom multiplier: `100`
TiDB: Uptime	The runtime of each TiDB instance.	Dependent item	tidb.uptime Preprocessing JSON Path: `$[?(@.name=="process_start_time_seconds")].value.first()` JavaScript: `The text is too long. Please see the template.`
TiDB: Version	Version of the TiDB instance.	Dependent item	tidb.version Preprocessing JSON Path: `$.version` Discard unchanged with heartbeat: `3h`
TiDB: Time jump back, rate	The number of times that the operating system rewinds every second.	Dependent item	tidb.monitortimejump_back.rate Preprocessing JSON Path: `The text is too long. Please see the template.` Change per second
TiDB: Server critical error, rate	The number of critical errors occurred in TiDB per second.	Dependent item	tidb.tidbservercriticalerrortotal.rate Preprocessing JSON Path: `The text is too long. Please see the template.` Change per second
TiDB: Server panic, rate	The number of panics occurred in TiDB per second.	Dependent item	tidb.tidbserverpanic_total.rate Preprocessing JSON Path: `$[?(@.name=="tidb_server_panic_total")].value.first()` ⛔️Custom on fail: Discard value Change per second

Triggers

Name	Description	Expression	Severity
TiDB: Instance is not responding		`last(/TiDB by HTTP/tidb.status)=0`\|Average
TiDB: Too many region related errors		`min(/TiDB by HTTP/tidb.tikvclient_region_err.rate,5m)>{$TIDB.REGION_ERROR.MAX.WARN}`\|Average
TiDB: Too many DDL waiting jobs		`min(/TiDB by HTTP/tidb.ddl_waiting_jobs,5m)>{$TIDB.DDL.WAITING.MAX.WARN}`\|Warning
TiDB: Too many schema lease errors		`min(/TiDB by HTTP/tidb.domain_load_schema.failed.rate,5m)>{$TIDB.SCHEMA_LOAD_ERRORS.MAX.WARN}`\|Average
TiDB: Too many schema lease errors	The latest schema information is not reloaded in TiDB within one lease.	`min(/TiDB by HTTP/tidb.session_schema_lease_error.outdate.rate,5m)>{$TIDB.SCHEMA_LEASE_ERRORS.MAX.WARN}`\|Average
TiDB: Too few keep alive operations	Indicates whether the TiDB process still exists. If the number of times for tidbmonitorkeepalivetotal increases less than 10 per minute, the TiDB process might already exit and an alert is triggered.	`max(/TiDB by HTTP/tidb.monitor_keep_alive.rate,5m)<{$TIDB.MONITOR_KEEP_ALIVE.MAX.WARN}`\|Average
TiDB: Heap memory usage is too high		`min(/TiDB by HTTP/tidb.heap_bytes,5m)>{$TIDB.HEAP.USAGE.MAX.WARN}`\|Warning
TiDB: Current number of open files is too high	Heavy file descriptor usage (i.e., near the process's file descriptor limit) indicates a potential file descriptor exhaustion issue.	`min(/TiDB by HTTP/tidb.process_open_fds,5m)/last(/TiDB by HTTP/tidb.process_max_fds)*100>{$TIDB.OPEN.FDS.MAX.WARN}`\|Warning
TiDB: has been restarted	Uptime is less than 10 minutes.	`last(/TiDB by HTTP/tidb.uptime)<10m`\|Info	Manual close: Yes
TiDB: Version has changed	TiDB version has changed. Acknowledge to close the problem manually.	`last(/TiDB by HTTP/tidb.version,#1)<>last(/TiDB by HTTP/tidb.version,#2) and length(last(/TiDB by HTTP/tidb.version))>0`\|Info	Manual close: Yes
TiDB: Too many time jump backs		`min(/TiDB by HTTP/tidb.monitor_time_jump_back.rate,5m)>{$TIDB.TIME_JUMP_BACK.MAX.WARN}`\|Warning
TiDB: There are panicked TiDB threads	When a panic occurs, an alert is triggered. The thread is often recovered, otherwise, TiDB will frequently restart.	`last(/TiDB by HTTP/tidb.tidb_server_panic_total.rate)>0`\|Average

LLD rule QPS metrics discovery

Name Description Type Key and additional info

QPS metrics discovery

Name	Description	Type	Key and additional info
QPS metrics discovery	Discovery QPS specific metrics.	Dependent item	tidb.qps.discovery Preprocessing JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `1h`

Discovery QPS specific metrics.

Dependent item

tidb.qps.discovery

Preprocessing

JavaScript: The text is too long. Please see the template.
Discard unchanged with heartbeat: 1h

Item prototypes for QPS metrics discovery

Name Description Type Key and additional info

TiDB: Get QPS metrics: {#TYPE}

Name	Description	Type	Key and additional info
TiDB: Get QPS metrics: {#TYPE}	Get QPS metrics of {#TYPE}.	Dependent item	tidb.qps.get_metrics[{#TYPE}] Preprocessing JSON Path: `$[?(@.labels.type == "{#TYPE}")]` ⛔️Custom on fail: Discard value
TiDB: Server query "OK": {#TYPE}, rate	The number of queries on TiDB instance per second with success of command execution results.	Dependent item	tidb.server_query.ok.rate[{#TYPE}] Preprocessing JSON Path: `$[?(@.labels.result == "OK")].value.first()` Change per second
TiDB: Server query "Error": {#TYPE}, rate	The number of queries on TiDB instance per second with failure of command execution results.	Dependent item	tidb.server_query.error.rate[{#TYPE}] Preprocessing JSON Path: `$[?(@.labels.result == "Error")].value.first()` Change per second

Get QPS metrics of {#TYPE}.

Dependent item

tidb.qps.get_metrics[{#TYPE}]

Preprocessing

JSON Path: $[?(@.labels.type == "{#TYPE}")]
⛔️Custom on fail: Discard value

TiDB: Server query "OK": {#TYPE}, rate

The number of queries on TiDB instance per second with success of command execution results.

Dependent item

tidb.server_query.ok.rate[{#TYPE}]

Preprocessing

JSON Path: $[?(@.labels.result == "OK")].value.first()
Change per second

TiDB: Server query "Error": {#TYPE}, rate

The number of queries on TiDB instance per second with failure of command execution results.

Dependent item

tidb.server_query.error.rate[{#TYPE}]

Preprocessing

JSON Path: $[?(@.labels.result == "Error")].value.first()
Change per second

LLD rule Statement metrics discovery

Name Description Type Key and additional info

Statement metrics discovery

Name	Description	Type	Key and additional info
Statement metrics discovery	Discovery statement specific metrics.	Dependent item	tidb.statement.discover Preprocessing JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `1h`

Discovery statement specific metrics.

Dependent item

tidb.statement.discover

Preprocessing

JavaScript: The text is too long. Please see the template.
Discard unchanged with heartbeat: 1h

Item prototypes for Statement metrics discovery

Name Description Type Key and additional info

TiDB: SQL statements: {#TYPE}, rate

Name	Description	Type	Key and additional info
TiDB: SQL statements: {#TYPE}, rate	The number of SQL statements executed per second.	Dependent item	tidb.statement.rate[{#TYPE}] Preprocessing JSON Path: `$[?(@.labels.type == "{#TYPE}")].value.first()` Change per second

The number of SQL statements executed per second.

Dependent item

tidb.statement.rate[{#TYPE}]

Preprocessing

JSON Path: $[?(@.labels.type == "{#TYPE}")].value.first()
Change per second

LLD rule KV metrics discovery

Name Description Type Key and additional info

KV metrics discovery

Name	Description	Type	Key and additional info
KV metrics discovery	Discovery KV specific metrics.	Dependent item	tidb.kv_ops.discovery Preprocessing JSON Path: `The text is too long. Please see the template.` JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `1h`

Discovery KV specific metrics.

Dependent item

tidb.kv_ops.discovery

Preprocessing

JSON Path: The text is too long. Please see the template.
JavaScript: The text is too long. Please see the template.
Discard unchanged with heartbeat: 1h

Item prototypes for KV metrics discovery

Name Description Type Key and additional info

TiDB: KV Commands: {#TYPE}, rate

Name	Description	Type	Key and additional info
TiDB: KV Commands: {#TYPE}, rate	The number of executed KV commands per second.	Dependent item	tidb.tikvclient_txn.rate[{#TYPE}] Preprocessing JSON Path: `The text is too long. Please see the template.` Change per second

The number of executed KV commands per second.

Dependent item

tidb.tikvclient_txn.rate[{#TYPE}]

Preprocessing

JSON Path: The text is too long. Please see the template.
Change per second

LLD rule Lock resolves discovery

Name Description Type Key and additional info

Lock resolves discovery

Name	Description	Type	Key and additional info
Lock resolves discovery	Discovery lock resolves specific metrics.	Dependent item	tidb.tikvclientlockresolver_action.discovery Preprocessing JSON Path: `$[?(@.name=="tidb_tikvclient_lock_resolver_actions_total")]` JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `1h`

Discovery lock resolves specific metrics.

Dependent item

tidb.tikvclientlockresolver_action.discovery

Preprocessing

JSON Path: $[?(@.name=="tidb_tikvclient_lock_resolver_actions_total")]
JavaScript: The text is too long. Please see the template.
Discard unchanged with heartbeat: 1h

Item prototypes for Lock resolves discovery

Name Description Type Key and additional info

TiDB: Lock resolves: {#TYPE}, rate

Name	Description	Type	Key and additional info
TiDB: Lock resolves: {#TYPE}, rate	The number of TiDB operations that resolve locks per second. When TiDB's read or write request encounters a lock, it tries to resolve the lock.	Dependent item	tidb.tikvclientlockresolver_action.rate[{#TYPE}] Preprocessing JSON Path: `The text is too long. Please see the template.` Change per second

The number of TiDB operations that resolve locks per second. When TiDB's read or write request encounters a lock, it tries to resolve the lock.

Dependent item

tidb.tikvclientlockresolver_action.rate[{#TYPE}]

Preprocessing

JSON Path: The text is too long. Please see the template.
Change per second

LLD rule KV backoff discovery

Name Description Type Key and additional info

KV backoff discovery

Name	Description	Type	Key and additional info
KV backoff discovery	Discovery KV backoff specific metrics.	Dependent item	tidb.tikvclient_backoff.discovery Preprocessing JSON Path: `$[?(@.name=="tidb_tikvclient_backoff_total")]` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `1h`

Discovery KV backoff specific metrics.

Dependent item

tidb.tikvclient_backoff.discovery

Preprocessing

JSON Path: $[?(@.name=="tidb_tikvclient_backoff_total")]
⛔️Custom on fail: Discard value
JavaScript: The text is too long. Please see the template.
Discard unchanged with heartbeat: 1h

Item prototypes for KV backoff discovery

Name Description Type Key and additional info

TiDB: KV backoff: {#TYPE}, rate

Name	Description	Type	Key and additional info
TiDB: KV backoff: {#TYPE}, rate	The number of TiDB operations that resolve locks per second. When TiDB's read or write request encounters a lock, it tries to resolve the lock.	Dependent item	tidb.tikvclient_backoff.rate[{#TYPE}] Preprocessing JSON Path: `The text is too long. Please see the template.` Change per second

The number of TiDB operations that resolve locks per second. When TiDB's read or write request encounters a lock, it tries to resolve the lock.

Dependent item

tidb.tikvclient_backoff.rate[{#TYPE}]

Preprocessing

JSON Path: The text is too long. Please see the template.
Change per second

LLD rule GC action results discovery

Name Description Type Key and additional info

GC action results discovery

Name	Description	Type	Key and additional info
GC action results discovery	Discovery GC action results metrics.	Dependent item	tidb.tikvclientgcaction.discovery Preprocessing JSON Path: `$[?(@.name=="tidb_tikvclient_gc_action_result")]` ⛔️Custom on fail: Discard value JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `1h`

Discovery GC action results metrics.

Dependent item

tidb.tikvclientgcaction.discovery

Preprocessing

JSON Path: $[?(@.name=="tidb_tikvclient_gc_action_result")]
⛔️Custom on fail: Discard value
JavaScript: The text is too long. Please see the template.
Discard unchanged with heartbeat: 1h

Item prototypes for GC action results discovery

Name Description Type Key and additional info

TiDB: GC action result: {#TYPE}, rate

Name	Description	Type	Key and additional info
TiDB: GC action result: {#TYPE}, rate	The number of results of GC-related operations per second.	Dependent item	tidb.tikvclientgcaction.rate[{#TYPE}] Preprocessing JSON Path: `The text is too long. Please see the template.` Change per second

The number of results of GC-related operations per second.

Dependent item

tidb.tikvclientgcaction.rate[{#TYPE}]

Preprocessing

JSON Path: The text is too long. Please see the template.
Change per second

Trigger prototypes for GC action results discovery

Name	Description	Expression	Severity	Dependencies and additional info
TiDB: Too many failed GC-related operations		`min(/TiDB by HTTP/tidb.tikvclient_gc_action.rate[{#TYPE}],5m)>{$TIDB.GC_ACTIONS.ERRORS.MAX.WARN}`\|Warning

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template, or ask for help at ZABBIX forums

db_tidb_pd_http

View README Download JSON

TiDB PD by HTTP

Overview

The template to monitor PD server of TiDB cluster by Zabbix that works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.

Template TiDB PD by HTTP — collects metrics by HTTP agent from PD /metrics endpoint and from monitoring API. See https://docs.pingcap.com/tidb/stable/tidb-monitoring-api.

Requirements

Zabbix version: 6.0 and higher.

Tested versions

This template has been tested on:

TiDB cluster 4.0.10, 6.5.1

Configuration

Setup

This template works with PD server of TiDB cluster. Internal service metrics are collected from PD /metrics endpoint and from monitoring API. See https://docs.pingcap.com/tidb/stable/tidb-monitoring-api. Don't forget to change the macros {$PD.URL}, {$PD.PORT}. Also, see the Macros section for a list of macros used to set trigger values.

Macros used

Name	Description	Default
{$PD.PORT}	The port of PD server metrics web endpoint	`2379`
{$PD.URL}	PD server URL	`localhost`
{$PD.MISS_REGION.MAX.WARN}	Maximum number of missed regions	`100`
{$PD.STORAGE_USAGE.MAX.WARN}	Maximum percentage of cluster space used	`80`

Items

Name	Description	Type	Key and additional info
PD: Get instance metrics	Get TiDB PD instance metrics.	HTTP agent	pd.get_metrics Preprocessing Check for not supported value ⛔️Custom on fail: Discard value Prometheus to JSON
PD: Get instance status	Get TiDB PD instance status info.	HTTP agent	pd.get_status Preprocessing Check for not supported value ⛔️Custom on fail: Set value to: `{"status": "0"}`
PD: Status	Status of PD instance.	Dependent item	pd.status Preprocessing JSON Path: `$.status` ⛔️Custom on fail: Set value to: `1` Discard unchanged with heartbeat: `1h`
PD: gRPC Commands total, rate	The rate at which gRPC commands are completed.	Dependent item	pd.grpc_command.rate Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value Change per second
PD: Version	Version of the PD instance.	Dependent item	pd.version Preprocessing JSON Path: `$.version` Discard unchanged with heartbeat: `3h`
PD: Uptime	The runtime of each PD instance.	Dependent item	pd.uptime Preprocessing JSON Path: `$.start_timestamp` JavaScript: `The text is too long. Please see the template.`
PD: Get cluster metrics	Get cluster metrics.	Dependent item	pd.clusterstatus.getmetrics Preprocessing JSON Path: `$[?(@.name == "pd_cluster_status")]` ⛔️Custom on fail: Discard value
PD: Get region metrics	Get region metrics.	Dependent item	pd.regions.get_metrics Preprocessing JSON Path: `$[?(@.name == "pd_scheduler_region_heartbeat")]` ⛔️Custom on fail: Discard value
PD: Get region label metrics	Get region label metrics.	Dependent item	pd.regionlabels.getmetrics Preprocessing JSON Path: `$[?(@.name == "pd_regions_label_level")]` ⛔️Custom on fail: Discard value
PD: Get region status metrics	Get region status metrics.	Dependent item	pd.regionstatus.getmetrics Preprocessing JSON Path: `$[?(@.name == "pd_regions_status")]` ⛔️Custom on fail: Discard value
PD: Get gRPC command metrics	Get gRPC command metrics.	Dependent item	pd.grpccommands.getmetrics Preprocessing JSON Path: `$[?(@.name == "grpc_server_handling_seconds_count")]` ⛔️Custom on fail: Discard value
PD: Get scheduler metrics	Get scheduler metrics.	Dependent item	pd.scheduler.get_metrics Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value

Triggers

Name	Description	Expression	Severity
PD: Instance is not responding		`last(/TiDB PD by HTTP/pd.status)=0`\|Average
PD: Version has changed	PD version has changed. Acknowledge to close the problem manually.	`last(/TiDB PD by HTTP/pd.version,#1)<>last(/TiDB PD by HTTP/pd.version,#2) and length(last(/TiDB PD by HTTP/pd.version))>0`\|Info	Manual close: Yes
PD: has been restarted	Uptime is less than 10 minutes.	`last(/TiDB PD by HTTP/pd.uptime)<10m`\|Info	Manual close: Yes

LLD rule Cluster metrics discovery

Name Description Type Key and additional info

Cluster metrics discovery

Name	Description	Type	Key and additional info
Cluster metrics discovery	Discovery cluster specific metrics.	Dependent item	pd.cluster.discovery Preprocessing JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `1h`

Discovery cluster specific metrics.

Dependent item

pd.cluster.discovery

Preprocessing

JavaScript: The text is too long. Please see the template.
Discard unchanged with heartbeat: 1h

Item prototypes for Cluster metrics discovery

Name	Description	Type	Key and additional info
TiDB cluster: Offline stores		Dependent item	pd.clusterstatus.storeoffline[{#SINGLETON}] Preprocessing JSON Path: `$[?(@.labels.type == "store_offline_count")].value.first()` Discard unchanged with heartbeat: `1h`
TiDB cluster: Tombstone stores	The count of tombstone stores.	Dependent item	pd.clusterstatus.storetombstone[{#SINGLETON}] Preprocessing JSON Path: `$[?(@.labels.type == "store_tombstone_count")].value.first()` Discard unchanged with heartbeat: `1h`
TiDB cluster: Down stores	The count of down stores.	Dependent item	pd.clusterstatus.storedown[{#SINGLETON}] Preprocessing JSON Path: `$[?(@.labels.type == "store_down_count")].value.first()` Discard unchanged with heartbeat: `1h`
TiDB cluster: Lowspace stores	The count of low space stores.	Dependent item	pd.clusterstatus.storelow_space[{#SINGLETON}] Preprocessing JSON Path: `$[?(@.labels.type == "store_low_space_count")].value.first()` Discard unchanged with heartbeat: `1h`
TiDB cluster: Unhealth stores	The count of unhealthy stores.	Dependent item	pd.clusterstatus.storeunhealth[{#SINGLETON}] Preprocessing JSON Path: `$[?(@.labels.type == "store_unhealth_count")].value.first()` Discard unchanged with heartbeat: `1h`
TiDB cluster: Disconnect stores	The count of disconnected stores.	Dependent item	pd.clusterstatus.storedisconnected[{#SINGLETON}] Preprocessing JSON Path: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `1h`
TiDB cluster: Normal stores	The count of healthy storage instances.	Dependent item	pd.clusterstatus.storeup[{#SINGLETON}] Preprocessing JSON Path: `$[?(@.labels.type == "store_up_count")].value.first()` Discard unchanged with heartbeat: `1h`
TiDB cluster: Storage capacity	The total storage capacity for this TiDB cluster.	Dependent item	pd.clusterstatus.storagecapacity[{#SINGLETON}] Preprocessing JSON Path: `$[?(@.labels.type == "storage_capacity")].value.first()` Discard unchanged with heartbeat: `1h`
TiDB cluster: Storage size	The storage size that is currently used by the TiDB cluster.	Dependent item	pd.clusterstatus.storagesize[{#SINGLETON}] Preprocessing JSON Path: `$[?(@.labels.type == "storage_size")].value.first()`
TiDB cluster: Number of regions	The total count of cluster Regions.	Dependent item	pd.clusterstatus.leadercount[{#SINGLETON}] Preprocessing JSON Path: `$[?(@.labels.type == "leader_count")].value.first()`
TiDB cluster: Current peer count	The current count of all cluster peers.	Dependent item	pd.clusterstatus.regioncount[{#SINGLETON}] Preprocessing JSON Path: `$[?(@.labels.type == "region_count")].value.first()`

Trigger prototypes for Cluster metrics discovery

Name	Description	Expression
TiDB cluster: There are offline TiKV nodes	PD has not received a TiKV heartbeat for a long time.	`last(/TiDB PD by HTTP/pd.cluster_status.store_down[{#SINGLETON}])>0`\|Average
TiDB cluster: There are low space TiKV nodes	Indicates that there is no sufficient space on the TiKV node.	`last(/TiDB PD by HTTP/pd.cluster_status.store_low_space[{#SINGLETON}])>0`\|Average
TiDB cluster: There are disconnected TiKV nodes	PD does not receive a TiKV heartbeat within 20 seconds. Normally a TiKV heartbeat comes in every 10 seconds.	`last(/TiDB PD by HTTP/pd.cluster_status.store_disconnected[{#SINGLETON}])>0`\|Warning
TiDB cluster: Current storage usage is too high	Over {$PD.STORAGE_USAGE.MAX.WARN}% of the cluster space is occupied.	`min(/TiDB PD by HTTP/pd.cluster_status.storage_size[{#SINGLETON}],5m)/last(/TiDB PD by HTTP/pd.cluster_status.storage_capacity[{#SINGLETON}])*100>{$PD.STORAGE_USAGE.MAX.WARN}`\|Warning

LLD rule Region labels discovery

Name Description Type Key and additional info

Region labels discovery

Name	Description	Type	Key and additional info
Region labels discovery	Discovery region labels specific metrics.	Dependent item	pd.region_labels.discovery Preprocessing JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `1h`

Discovery region labels specific metrics.

Dependent item

pd.region_labels.discovery

Preprocessing

JavaScript: The text is too long. Please see the template.
Discard unchanged with heartbeat: 1h

Item prototypes for Region labels discovery

Name Description Type Key and additional info

TiDB cluster: Regions label: {#TYPE}

Name	Description	Type	Key and additional info
TiDB cluster: Regions label: {#TYPE}	The number of Regions in different label levels.	Dependent item	pd.region_labels[{#TYPE}] Preprocessing JSON Path: `$[?(@.labels.type == "{#TYPE}")].value.first()`

The number of Regions in different label levels.

Dependent item

pd.region_labels[{#TYPE}]

Preprocessing

JSON Path: $[?(@.labels.type == "{#TYPE}")].value.first()

LLD rule Region status discovery

Name Description Type Key and additional info

Region status discovery

Name	Description	Type	Key and additional info
Region status discovery	Discovery region status specific metrics.	Dependent item	pd.region_status.discovery Preprocessing JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `1h`

Discovery region status specific metrics.

Dependent item

pd.region_status.discovery

Preprocessing

JavaScript: The text is too long. Please see the template.
Discard unchanged with heartbeat: 1h

Item prototypes for Region status discovery

Name Description Type Key and additional info

TiDB cluster: Regions status: {#TYPE}

Name	Description	Type	Key and additional info
TiDB cluster: Regions status: {#TYPE}	The health status of Regions indicated via the count of unusual Regions including pending peers, down peers, extra peers, offline peers, missing peers, learner peers and incorrect namespaces.	Dependent item	pd.region_status[{#TYPE}] Preprocessing JSON Path: `$[?(@.labels.type == "{#TYPE}")].value.first()`

The health status of Regions indicated via the count of unusual Regions including pending peers, down peers, extra peers, offline peers, missing peers, learner peers and incorrect namespaces.

Dependent item

pd.region_status[{#TYPE}]

Preprocessing

JSON Path: $[?(@.labels.type == "{#TYPE}")].value.first()

Trigger prototypes for Region status discovery

Name	Description	Expression	Severity	Dependencies and additional info
TiDB cluster: Too many missed regions	The number of Region replicas is smaller than the value of max-replicas. When a TiKV machine is down and its downtime exceeds max-down-time, it usually leads to missing replicas for some Regions during a period of time. When a TiKV node is made offline, it might result in a small number of Regions with missing replicas.	`min(/TiDB PD by HTTP/pd.region_status[{#TYPE}],5m)>{$PD.MISS_REGION.MAX.WARN}`\|Warning
TiDB cluster: There are unresponsive peers	The number of Regions with an unresponsive peer reported by the Raft leader.	`min(/TiDB PD by HTTP/pd.region_status[{#TYPE}],5m)>0`\|Warning

LLD rule Running scheduler discovery

Name Description Type Key and additional info

Running scheduler discovery

Discovery scheduler specific metrics.

Dependent item

pd.scheduler.discovery

Preprocessing

JavaScript: The text is too long. Please see the template.
Discard unchanged with heartbeat: 1h

Item prototypes for Running scheduler discovery

Name Description Type Key and additional info

TiDB cluster: Scheduler status: {#KIND}

The current running schedulers.

Dependent item

pd.scheduler[{#KIND}]

Preprocessing

JSON Path: $[?(@.labels.kind == "{#KIND}")].value.first()
⛔️Custom on fail: Set value to: 0

LLD rule gRPC commands discovery

Name Description Type Key and additional info

gRPC commands discovery

Discovery grpc commands specific metrics.

Dependent item

pd.grpc_command.discovery

Preprocessing

JavaScript: The text is too long. Please see the template.
Discard unchanged with heartbeat: 1h

Item prototypes for gRPC commands discovery

Name Description Type Key and additional info

PD: gRPC Commands: {#GRPC_METHOD}, rate

The rate per command type at which gRPC commands are completed.

Dependent item

pd.grpccommand.rate[{#GRPCMETHOD}]

Preprocessing

JSON Path: $[?(@.labels.grpc_method == "{#GRPC_METHOD}")].value.first()
Change per second

LLD rule Region discovery

Name Description Type Key and additional info

Region discovery

Discovery region specific metrics.

Dependent item

pd.region.discovery

Preprocessing

JavaScript: The text is too long. Please see the template.
Discard unchanged with heartbeat: 1h

Item prototypes for Region discovery

Name	Description	Type	Key and additional info
PD: Get metrics: {#STORE_ADDRESS}	Get region metrics for {#STORE_ADDRESS}.	Dependent item	pd.regionheartbeat.getmetrics[{#STORE_ADDRESS}] Preprocessing JSON Path: `$[?(@.labels.address == "{#STORE_ADDRESS}")]` ⛔️Custom on fail: Discard value
PD: Region heartbeat: active, rate	The count of heartbeats with the ok status per second.	Dependent item	pd.regionheartbeat.ok.rate[{#STOREADDRESS}] Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Set value to: `0` Change per second
PD: Region heartbeat: error, rate	The count of heartbeats with the error status per second.	Dependent item	pd.regionheartbeat.error.rate[{#STOREADDRESS}] Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Set value to: `0` Change per second
PD: Region heartbeat: total, rate	The count of heartbeats reported to PD per instance per second.	Dependent item	pd.regionheartbeat.rate[{#STOREADDRESS}] Preprocessing JSON Path: `$[?(@.labels.type == "report")].value.sum()` ⛔️Custom on fail: Set value to: `0` Change per second
PD: Region schedule push: total, rate		Dependent item	pd.regionheartbeat.push.err.rate[{#STOREADDRESS}] Preprocessing JSON Path: `$[?(@.labels.type == "push")].value.sum()` ⛔️Custom on fail: Set value to: `0` Change per second

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template, or ask for help at ZABBIX forums

db_redis

View README Download JSON

Redis by Zabbix agent 2

Overview

This template is designed for the effortless deployment of Redis monitoring by Zabbix via Zabbix agent 2 and doesn't require any external scripts.

Requirements

Zabbix version: 6.0 and higher.

Tested versions

This template has been tested on:

Redis, version 3.0.6, 4.0.14, 5.0.6, 7.2.4

Configuration

Setup

Setup and configure zabbix-agent2 compiled with the Redis monitoring plugin.

Redis' default user should have permissions to run CONFIG, INFO, PING, CLIENT and SLOWLOG commands.

Or default user ACL should have @admin, @slow, @dangerous, @fast and @connection categories.

Test availability: zabbix_get -s 127.0.0.1 -k redis.ping[tcp://127.0.0.1:6379]

Macros used

Name	Description	Default
{$REDIS.CONN.URI}	Connection string in the URI format (password is not used). This param overwrites a value configured in the "Server" option of the configuration file (if it's set), otherwise, the plugin's default value is used: "tcp://localhost:6379"	`tcp://localhost:6379`
{$REDIS.PROCESS_NAME}	Redis server process name	`redis-server`
{$REDIS.LLD.PROCESS_NAME}	Redis server process name for LLD	`redis-server`
{$REDIS.LLD.FILTER.DB.MATCHES}	Filter of discoverable databases	`.*`
{$REDIS.LLD.FILTER.DB.NOT_MATCHES}	Filter to exclude discovered databases	`CHANGE_IF_NEEDED`
{$REDIS.REPL.LAG.MAX.WARN}	Maximum replication lag in seconds	`30s`
{$REDIS.SLOWLOG.COUNT.MAX.WARN}	Maximum number of slowlog entries per second	`1`
{$REDIS.CLIENTS.PRC.MAX.WARN}	Maximum percentage of connected clients	`80`
{$REDIS.MEM.PUSED.MAX.WARN}	Maximum percentage of memory used	`90`
{$REDIS.MEM.FRAG_RATIO.MAX.WARN}	Maximum memory fragmentation ratio	`1.5`

Items

Name	Description	Type	Key and additional info
Redis: Get info		Zabbix agent	redis.info["{$REDIS.CONN.URI}"]
Redis: Get config		Zabbix agent	redis.config["{$REDIS.CONN.URI}"] Preprocessing Discard unchanged with heartbeat: `1h`
Redis: Ping		Zabbix agent	redis.ping["{$REDIS.CONN.URI}"] Preprocessing Discard unchanged with heartbeat: `10m`
Redis: Slowlog entries per second		Zabbix agent	redis.slowlog.count["{$REDIS.CONN.URI}"] Preprocessing Change per second
Redis: Get Clients info		Dependent item	redis.clients.info_raw Preprocessing JSON Path: `$.Clients` ⛔️Custom on fail: Discard value
Redis: Get CPU info		Dependent item	redis.cpu.info_raw Preprocessing JSON Path: `$.CPU` ⛔️Custom on fail: Discard value
Redis: Get Keyspace info		Dependent item	redis.keyspace.info_raw Preprocessing JSON Path: `$.Keyspace` ⛔️Custom on fail: Discard value
Redis: Get Memory info		Dependent item	redis.memory.info_raw Preprocessing JSON Path: `$.Memory` ⛔️Custom on fail: Discard value
Redis: Get Persistence info		Dependent item	redis.persistence.info_raw Preprocessing JSON Path: `$.Persistence` ⛔️Custom on fail: Discard value
Redis: Get Replication info		Dependent item	redis.replication.info_raw Preprocessing JSON Path: `$.Replication` ⛔️Custom on fail: Discard value
Redis: Get Server info		Dependent item	redis.server.info_raw Preprocessing JSON Path: `$.Server` ⛔️Custom on fail: Discard value
Redis: Get Stats info		Dependent item	redis.stats.info_raw Preprocessing JSON Path: `$.Stats` ⛔️Custom on fail: Discard value
Redis: CPU sys	System CPU consumed by the Redis server	Dependent item	redis.cpu.sys Preprocessing JSON Path: `$.used_cpu_sys`
Redis: CPU sys children	System CPU consumed by the background processes	Dependent item	redis.cpu.sys_children Preprocessing JSON Path: `$.used_cpu_sys_children`
Redis: CPU user	User CPU consumed by the Redis server	Dependent item	redis.cpu.user Preprocessing JSON Path: `$.used_cpu_user`
Redis: CPU user children	User CPU consumed by the background processes	Dependent item	redis.cpu.user_children Preprocessing JSON Path: `$.used_cpu_user_children`
Redis: Blocked clients	The number of connections waiting on a blocking call	Dependent item	redis.clients.blocked Preprocessing JSON Path: `$.blocked_clients`
Redis: Max input buffer	The biggest input buffer among current client connections	Dependent item	redis.clients.maxinputbuffer Preprocessing JavaScript: `The text is too long. Please see the template.`
Redis: Max output buffer	The biggest output buffer among current client connections	Dependent item	redis.clients.maxoutputbuffer Preprocessing JavaScript: `The text is too long. Please see the template.`
Redis: Connected clients	The number of connected clients	Dependent item	redis.clients.connected Preprocessing JSON Path: `$.connected_clients`
Redis: Cluster enabled	Indicate Redis cluster is enabled	Dependent item	redis.cluster.enabled Preprocessing JSON Path: `$.Cluster.cluster_enabled`
Redis: Memory used	Total number of bytes allocated by Redis using its allocator	Dependent item	redis.memory.used_memory Preprocessing JSON Path: `$.used_memory`
Redis: Memory used Lua	Amount of memory used by the Lua engine	Dependent item	redis.memory.usedmemorylua Preprocessing JSON Path: `$.used_memory_lua`
Redis: Memory used peak	Peak memory consumed by Redis (in bytes)	Dependent item	redis.memory.usedmemorypeak Preprocessing JSON Path: `$.used_memory_peak`
Redis: Memory used RSS	Number of bytes that Redis allocated as seen by the operating system	Dependent item	redis.memory.usedmemoryrss Preprocessing JSON Path: `$.used_memory_rss`
Redis: Memory fragmentation ratio	This ratio is an indication of memory mapping efficiency: - Value over 1.0 indicate that memory fragmentation is very likely. Consider restarting the Redis server so the operating system can recover fragmented memory, especially with a ratio over 1.5. - Value under 1.0 indicate that Redis likely has insufficient memory available. Consider optimizing memory usage or adding more RAM. Note: If your peak memory usage is much higher than your current memory usage, the memory fragmentation ratio may be unreliable. https://redis.io/topics/memory-optimization	Dependent item	redis.memory.fragmentation_ratio Preprocessing JSON Path: `$.mem_fragmentation_ratio`
Redis: AOF current rewrite time sec	Duration of the on-going AOF rewrite operation if any	Dependent item	redis.persistence.aofcurrentrewritetimesec Preprocessing JSON Path: `$.aof_current_rewrite_time_sec`
Redis: AOF enabled	Flag indicating AOF logging is activated	Dependent item	redis.persistence.aof_enabled Preprocessing JSON Path: `$.aof_enabled`
Redis: AOF last bgrewrite status	Status of the last AOF rewrite operation	Dependent item	redis.persistence.aoflastbgrewrite_status Preprocessing JSON Path: `$.aof_last_bgrewrite_status` Boolean to decimal
Redis: AOF last rewrite time sec	Duration of the last AOF rewrite	Dependent item	redis.persistence.aoflastrewritetimesec Preprocessing JSON Path: `$.aof_last_rewrite_time_sec`
Redis: AOF last write status	Status of the last write operation to the AOF	Dependent item	redis.persistence.aoflastwrite_status Preprocessing JSON Path: `$.aof_last_write_status` Boolean to decimal
Redis: AOF rewrite in progress	Flag indicating an AOF rewrite operation is on-going	Dependent item	redis.persistence.aofrewritein_progress Preprocessing JSON Path: `$.aof_rewrite_in_progress`
Redis: AOF rewrite scheduled	Flag indicating an AOF rewrite operation will be scheduled once the on-going RDB save is complete	Dependent item	redis.persistence.aofrewritescheduled Preprocessing JSON Path: `$.aof_rewrite_scheduled`
Redis: Dump loading	Flag indicating if the load of a dump file is on-going	Dependent item	redis.persistence.loading Preprocessing JSON Path: `$.loading`
Redis: RDB bgsave in progress	"1" if bgsave is in progress and "0" otherwise	Dependent item	redis.persistence.rdbbgsavein_progress Preprocessing JSON Path: `$.rdb_bgsave_in_progress`
Redis: RDB changes since last save	Number of changes since the last background save	Dependent item	redis.persistence.rdbchangessincelastsave Preprocessing JSON Path: `$.rdb_changes_since_last_save`
Redis: RDB current bgsave time sec	Duration of the on-going RDB save operation if any	Dependent item	redis.persistence.rdbcurrentbgsavetimesec Preprocessing JSON Path: `$.rdb_current_bgsave_time_sec`
Redis: RDB last bgsave status	Status of the last RDB save operation	Dependent item	redis.persistence.rdblastbgsave_status Preprocessing JSON Path: `$.rdb_last_bgsave_status` Boolean to decimal
Redis: RDB last bgsave time sec	Duration of the last bg_save operation	Dependent item	redis.persistence.rdblastbgsavetimesec Preprocessing JSON Path: `$.rdb_last_bgsave_time_sec`
Redis: RDB last save time	Epoch-based timestamp of last successful RDB save	Dependent item	redis.persistence.rdblastsave_time Preprocessing JSON Path: `$.rdb_last_save_time`
Redis: Connected slaves	Number of connected slaves	Dependent item	redis.replication.connected_slaves Preprocessing JSON Path: `$.connected_slaves`
Redis: Replication backlog active	Flag indicating replication backlog is active	Dependent item	redis.replication.replbacklogactive Preprocessing JSON Path: `$.repl_backlog_active`
Redis: Replication backlog first byte offset	The master offset of the replication backlog buffer	Dependent item	redis.replication.replbacklogfirstbyteoffset Preprocessing JSON Path: `$.repl_backlog_first_byte_offset`
Redis: Replication backlog history length	Amount of data in the backlog sync buffer	Dependent item	redis.replication.replbackloghistlen Preprocessing JSON Path: `$.repl_backlog_histlen`
Redis: Replication backlog size	Total size in bytes of the replication backlog buffer	Dependent item	redis.replication.replbacklogsize Preprocessing JSON Path: `$.repl_backlog_size`
Redis: Replication role	Value is "master" if the instance is replica of no one, or "slave" if the instance is a replica of some master instance. Note that a replica can be master of another replica (chained replication).	Dependent item	redis.replication.role Preprocessing JSON Path: `$.role` Discard unchanged with heartbeat: `1d`
Redis: Master replication offset	Replication offset reported by the master	Dependent item	redis.replication.masterreploffset Preprocessing JSON Path: `$.master_repl_offset`
Redis: Process id	PID of the server process	Dependent item	redis.server.process_id Preprocessing JSON Path: `$.process_id` Discard unchanged with heartbeat: `1d`
Redis: Redis mode	The server's mode ("standalone", "sentinel" or "cluster")	Dependent item	redis.server.redis_mode Preprocessing JSON Path: `$.redis_mode` Discard unchanged with heartbeat: `1d`
Redis: Redis version	Version of the Redis server	Dependent item	redis.server.redis_version Preprocessing JSON Path: `$.redis_version` Discard unchanged with heartbeat: `1d`
Redis: TCP port	TCP/IP listen port	Dependent item	redis.server.tcp_port Preprocessing JSON Path: `$.tcp_port` Discard unchanged with heartbeat: `1d`
Redis: Uptime	Number of seconds since Redis server start	Dependent item	redis.server.uptime Preprocessing JSON Path: `$.uptime_in_seconds`
Redis: Evicted keys	Number of evicted keys due to maxmemory limit	Dependent item	redis.stats.evicted_keys Preprocessing JSON Path: `$.evicted_keys`
Redis: Expired keys	Total number of key expiration events	Dependent item	redis.stats.expired_keys Preprocessing JSON Path: `$.expired_keys`
Redis: Instantaneous input bytes per second	The network's read rate per second in KB/sec	Dependent item	redis.stats.instantaneous_input.rate Preprocessing JSON Path: `$.instantaneous_input_kbps` Custom multiplier: `1024`
Redis: Instantaneous operations per sec	Number of commands processed per second	Dependent item	redis.stats.instantaneous_ops.rate Preprocessing JSON Path: `$.instantaneous_ops_per_sec`
Redis: Instantaneous output bytes per second	The network's write rate per second in KB/sec	Dependent item	redis.stats.instantaneous_output.rate Preprocessing JSON Path: `$.instantaneous_output_kbps` Custom multiplier: `1024`
Redis: Keyspace hits	Number of successful lookup of keys in the main dictionary	Dependent item	redis.stats.keyspace_hits Preprocessing JSON Path: `$.keyspace_hits`
Redis: Keyspace misses	Number of failed lookup of keys in the main dictionary	Dependent item	redis.stats.keyspace_misses Preprocessing JSON Path: `$.keyspace_misses`
Redis: Latest fork usec	Duration of the latest fork operation in microseconds	Dependent item	redis.stats.latestforkusec Preprocessing JSON Path: `$.latest_fork_usec` Custom multiplier: `1e-05`
Redis: Migrate cached sockets	The number of sockets open for MIGRATE purposes	Dependent item	redis.stats.migratecachedsockets Preprocessing JSON Path: `$.migrate_cached_sockets`
Redis: Pubsub channels	Global number of pub/sub channels with client subscriptions	Dependent item	redis.stats.pubsub_channels Preprocessing JSON Path: `$.pubsub_channels`
Redis: Pubsub patterns	Global number of pub/sub pattern with client subscriptions	Dependent item	redis.stats.pubsub_patterns Preprocessing JSON Path: `$.pubsub_patterns`
Redis: Rejected connections	Number of connections rejected because of maxclients limit	Dependent item	redis.stats.rejected_connections Preprocessing JSON Path: `$.rejected_connections`
Redis: Sync full	The number of full resyncs with replicas	Dependent item	redis.stats.sync_full Preprocessing JSON Path: `$.sync_full`
Redis: Sync partial err	The number of denied partial resync requests	Dependent item	redis.stats.syncpartialerr Preprocessing JSON Path: `$.sync_partial_err`
Redis: Sync partial ok	The number of accepted partial resync requests	Dependent item	redis.stats.syncpartialok Preprocessing JSON Path: `$.sync_partial_ok`
Redis: Total commands processed	Total number of commands processed by the server	Dependent item	redis.stats.totalcommandsprocessed Preprocessing JSON Path: `$.total_commands_processed`
Redis: Total connections received	Total number of connections accepted by the server	Dependent item	redis.stats.totalconnectionsreceived Preprocessing JSON Path: `$.total_connections_received`
Redis: Total net input bytes	The total number of bytes read from the network	Dependent item	redis.stats.totalnetinput_bytes Preprocessing JSON Path: `$.total_net_input_bytes`
Redis: Total net output bytes	The total number of bytes written to the network	Dependent item	redis.stats.totalnetoutput_bytes Preprocessing JSON Path: `$.total_net_output_bytes`
Redis: Max clients	Max number of connected clients at the same time. Once the limit is reached Redis will close all the new connections sending an error "max number of clients reached".	Dependent item	redis.config.maxclients Preprocessing JSON Path: `$.maxclients` Discard unchanged with heartbeat: `30m`

Triggers

Name	Description	Expression	Severity
Redis: Failed to fetch info data	Zabbix has not received any data for items for the last 30 minutes.	`nodata(/Redis by Zabbix agent 2/redis.info["{$REDIS.CONN.URI}"],30m)=1`\|Warning	Manual close: Yes Depends on: Redis: Service is down
Redis: Configuration has changed	Redis configuration has changed. Acknowledge to close the problem manually.	`last(/Redis by Zabbix agent 2/redis.config["{$REDIS.CONN.URI}"],#1)<>last(/Redis by Zabbix agent 2/redis.config["{$REDIS.CONN.URI}"],#2) and length(last(/Redis by Zabbix agent 2/redis.config["{$REDIS.CONN.URI}"]))>0`\|Info	Manual close: Yes
Redis: Service is down		`last(/Redis by Zabbix agent 2/redis.ping["{$REDIS.CONN.URI}"])=0`\|Average	Manual close: Yes
Redis: Too many entries in the slowlog		`min(/Redis by Zabbix agent 2/redis.slowlog.count["{$REDIS.CONN.URI}"],5m)>{$REDIS.SLOWLOG.COUNT.MAX.WARN}`\|Info
Redis: Total number of connected clients is too high	When the number of clients reaches the value of the "maxclients" parameter, new connections will be rejected. https://redis.io/topics/clients#maximum-number-of-clients	`min(/Redis by Zabbix agent 2/redis.clients.connected,5m)/last(/Redis by Zabbix agent 2/redis.config.maxclients)*100>{$REDIS.CLIENTS.PRC.MAX.WARN}`\|Warning
Redis: Memory fragmentation ratio is too high	This ratio is an indication of memory mapping efficiency: - Value over 1.0 indicate that memory fragmentation is very likely. Consider restarting the Redis server so the operating system can recover fragmented memory, especially with a ratio over 1.5. - Value under 1.0 indicate that Redis likely has insufficient memory available. Consider optimizing memory usage or adding more RAM. Note: If your peak memory usage is much higher than your current memory usage, the memory fragmentation ratio may be unreliable. https://redis.io/topics/memory-optimization	`min(/Redis by Zabbix agent 2/redis.memory.fragmentation_ratio,15m)>{$REDIS.MEM.FRAG_RATIO.MAX.WARN}`\|Warning
Redis: Last AOF write operation failed	Detailed information about persistence: https://redis.io/topics/persistence	`last(/Redis by Zabbix agent 2/redis.persistence.aof_last_write_status)=0`\|Warning
Redis: Last RDB save operation failed	Detailed information about persistence: https://redis.io/topics/persistence	`last(/Redis by Zabbix agent 2/redis.persistence.rdb_last_bgsave_status)=0`\|Warning
Redis: Number of slaves has changed	Redis number of slaves has changed. Acknowledge to close the problem manually.	`last(/Redis by Zabbix agent 2/redis.replication.connected_slaves,#1)<>last(/Redis by Zabbix agent 2/redis.replication.connected_slaves,#2)`\|Info	Manual close: Yes
Redis: Replication role has changed	Redis replication role has changed. Acknowledge to close the problem manually.	`last(/Redis by Zabbix agent 2/redis.replication.role,#1)<>last(/Redis by Zabbix agent 2/redis.replication.role,#2) and length(last(/Redis by Zabbix agent 2/redis.replication.role))>0`\|Warning	Manual close: Yes
Redis: Version has changed	The Redis version has changed. Acknowledge to close the problem manually.	`last(/Redis by Zabbix agent 2/redis.server.redis_version,#1)<>last(/Redis by Zabbix agent 2/redis.server.redis_version,#2) and length(last(/Redis by Zabbix agent 2/redis.server.redis_version))>0`\|Info	Manual close: Yes
Redis: Host has been restarted	The host uptime is less than 10 minutes.	`last(/Redis by Zabbix agent 2/redis.server.uptime)<10m`\|Info	Manual close: Yes
Redis: Connections are rejected	The number of connections has reached the value of "maxclients". https://redis.io/topics/clients	`last(/Redis by Zabbix agent 2/redis.stats.rejected_connections)>0`\|High

LLD rule Keyspace discovery

Name Description Type Key and additional info

Keyspace discovery

Individual keyspace metrics

Dependent item

redis.keyspace.discovery

Preprocessing

JavaScript: The text is too long. Please see the template.

Item prototypes for Keyspace discovery

Name	Description	Type	Key and additional info
DB {#DB}: Get Keyspace info	The item gets information about keyspace of {#DB} database.	Dependent item	redis.db.info_raw["{#DB}"] Preprocessing JSON Path: `$['{#DB}']` ⛔️Custom on fail: Discard value
DB {#DB}: Average TTL	Average TTL	Dependent item	redis.db.avg_ttl["{#DB}"] Preprocessing JSON Path: `$.avg_ttl` Custom multiplier: `0.001`
DB {#DB}: Expires	Number of keys with an expiration	Dependent item	redis.db.expires["{#DB}"] Preprocessing JSON Path: `$.expires`
DB {#DB}: Keys	Total number of keys	Dependent item	redis.db.keys["{#DB}"] Preprocessing JSON Path: `$.keys`

LLD rule AOF metrics discovery

Name Description Type Key and additional info

AOF metrics discovery

If AOF is activated, additional metrics will be added

Dependent item

redis.persistence.aof.discovery

Preprocessing

JavaScript: The text is too long. Please see the template.

Item prototypes for AOF metrics discovery

Name	Description	Type	Key and additional info
Redis: AOF current size{#SINGLETON}	AOF current file size	Dependent item	redis.persistence.aofcurrentsize[{#SINGLETON}] Preprocessing JSON Path: `$.aof_current_size`
Redis: AOF base size{#SINGLETON}	AOF file size on latest startup or rewrite	Dependent item	redis.persistence.aofbasesize[{#SINGLETON}] Preprocessing JSON Path: `$.aof_base_size`
Redis: AOF pending rewrite{#SINGLETON}	Flag indicating an AOF rewrite operation will	Dependent item	redis.persistence.aofpendingrewrite[{#SINGLETON}] Preprocessing JSON Path: `$.aof_pending_rewrite`
Redis: AOF buffer length{#SINGLETON}	Size of the AOF buffer	Dependent item	redis.persistence.aofbufferlength[{#SINGLETON}] Preprocessing JSON Path: `$.aof_buffer_length`
Redis: AOF rewrite buffer length{#SINGLETON}	Size of the AOF rewrite buffer	Dependent item	redis.persistence.aofrewritebuffer_length[{#SINGLETON}] Preprocessing JSON Path: `$.aof_rewrite_buffer_length` ⛔️Custom on fail: Discard value
Redis: AOF pending background I/O fsync{#SINGLETON}	Number of fsync pending jobs in background I/O queue	Dependent item	redis.persistence.aofpendingbio_fsync[{#SINGLETON}] Preprocessing JSON Path: `$.aof_pending_bio_fsync`
Redis: AOF delayed fsync{#SINGLETON}	Delayed fsync counter	Dependent item	redis.persistence.aofdelayedfsync[{#SINGLETON}] Preprocessing JSON Path: `$.aof_delayed_fsync`

LLD rule Slave metrics discovery

Name Description Type Key and additional info

Slave metrics discovery

If the instance is a replica, additional metrics are provided

Dependent item

redis.replication.slave.discovery

Preprocessing

JavaScript: The text is too long. Please see the template.

Item prototypes for Slave metrics discovery

Name	Description	Type	Key and additional info
Redis: Master host{#SINGLETON}	Host or IP address of the master	Dependent item	redis.replication.master_host[{#SINGLETON}] Preprocessing JSON Path: `$.master_host` Discard unchanged with heartbeat: `1d`
Redis: Master port{#SINGLETON}	Master listening TCP port	Dependent item	redis.replication.master_port[{#SINGLETON}] Preprocessing JSON Path: `$.master_port` Discard unchanged with heartbeat: `1d`
Redis: Master link status{#SINGLETON}	Status of the link (up/down)	Dependent item	redis.replication.masterlinkstatus[{#SINGLETON}] Preprocessing JSON Path: `$.master_link_status` Boolean to decimal
Redis: Master last I/O seconds ago{#SINGLETON}	Number of seconds since the last interaction with master	Dependent item	redis.replication.masterlastiosecondsago[{#SINGLETON}] Preprocessing JSON Path: `$.master_last_io_seconds_ago`
Redis: Master sync in progress{#SINGLETON}	Indicate the master is syncing to the replica	Dependent item	redis.replication.mastersyncin_progress[{#SINGLETON}] Preprocessing JSON Path: `$.master_sync_in_progress`
Redis: Slave replication offset{#SINGLETON}	The replication offset of the replica instance	Dependent item	redis.replication.slavereploffset[{#SINGLETON}] Preprocessing JSON Path: `$.slave_repl_offset`
Redis: Slave priority{#SINGLETON}	The priority of the instance as a candidate for failover	Dependent item	redis.replication.slave_priority[{#SINGLETON}] Preprocessing JSON Path: `$.slave_priority`
Redis: Slave priority{#SINGLETON}	Flag indicating if the replica is read-only	Dependent item	redis.replication.slavereadonly[{#SINGLETON}] Preprocessing JSON Path: `$.slave_read_only` Discard unchanged with heartbeat: `1d`

Trigger prototypes for Slave metrics discovery

Name	Description	Expression	Severity	Dependencies and additional info
Redis: Replication lag with master is too high		`min(/Redis by Zabbix agent 2/redis.replication.master_last_io_seconds_ago[{#SINGLETON}],5m)>{$REDIS.REPL.LAG.MAX.WARN}`\|Warning

LLD rule Replication metrics discovery

Name Description Type Key and additional info

Replication metrics discovery

If the instance is the master and the slaves are connected, additional metrics are provided

Dependent item

redis.replication.master.discovery

Preprocessing

JavaScript: The text is too long. Please see the template.

Item prototypes for Replication metrics discovery

Name Description Type Key and additional info

Redis slave {#SLAVEIP}:{#SLAVEPORT}: Replication lag in bytes

Replication lag in bytes

Dependent item

redis.replication.lagbytes["{#SLAVEIP}:{#SLAVE_PORT}"]

Preprocessing

JavaScript: The text is too long. Please see the template.

LLD rule Process metrics discovery

Name Description Type Key and additional info

Process metrics discovery

Collect metrics by Zabbix agent if it exists

Zabbix agent

proc.num["{$REDIS.LLD.PROCESS_NAME}"]

Preprocessing

JavaScript: The text is too long. Please see the template.

Item prototypes for Process metrics discovery

Name	Description	Type	Key and additional info
Redis: Number of running processes		Zabbix agent	proc.num["{$REDIS.PROCESS_NAME}{#SINGLETON}"]
Redis: Memory usage (rss)	Resident set size memory used by process in bytes.	Zabbix agent	proc.mem["{$REDIS.PROCESS_NAME}{#SINGLETON}",,,,rss]
Redis: Memory usage (vsize)	Virtual memory size used by process in bytes.	Zabbix agent	proc.mem["{$REDIS.PROCESS_NAME}{#SINGLETON}",,,,vsize]
Redis: CPU utilization	Process CPU utilization percentage.	Zabbix agent	proc.cpu.util["{$REDIS.PROCESS_NAME}{#SINGLETON}"]

Trigger prototypes for Process metrics discovery

Name	Description	Expression	Severity	Dependencies and additional info
Redis: Process is not running		`last(/Redis by Zabbix agent 2/proc.num["{$REDIS.PROCESS_NAME}{#SINGLETON}"])=0`\|High

LLD rule Version 4+ metrics discovery

Name Description Type Key and additional info

Version 4+ metrics discovery

Additional metrics for versions 4+

Dependent item

redis.metrics.v4.discovery

Preprocessing

JSON Path: $.redis_version
JavaScript: The text is too long. Please see the template.

Item prototypes for Version 4+ metrics discovery

Name	Description	Type	Key and additional info
Redis: Executable path{#SINGLETON}	The path to the server's executable	Dependent item	redis.server.executable[{#SINGLETON}] Preprocessing JSON Path: `$.executable` Discard unchanged with heartbeat: `1d`
Redis: Memory used peak %{#SINGLETON}	The percentage of usedmemorypeak out of used_memory	Dependent item	redis.memory.usedmemorypeak_perc[{#SINGLETON}] Preprocessing JSON Path: `$.used_memory_peak_perc` Regular expression: `(.+)% \1`
Redis: Memory used overhead{#SINGLETON}	The sum in bytes of all overheads that the server allocated for managing its internal data structures	Dependent item	redis.memory.usedmemoryoverhead[{#SINGLETON}] Preprocessing JSON Path: `$.used_memory_overhead`
Redis: Memory used startup{#SINGLETON}	Initial amount of memory consumed by Redis at startup in bytes	Dependent item	redis.memory.usedmemorystartup[{#SINGLETON}] Preprocessing JSON Path: `$.used_memory_startup`
Redis: Memory used dataset{#SINGLETON}	The size in bytes of the dataset	Dependent item	redis.memory.usedmemorydataset[{#SINGLETON}] Preprocessing JSON Path: `$.used_memory_dataset`
Redis: Memory used dataset %{#SINGLETON}	The percentage of usedmemorydataset out of the net memory usage (usedmemory minus usedmemory_startup)	Dependent item	redis.memory.usedmemorydataset_perc[{#SINGLETON}] Preprocessing JSON Path: `$.used_memory_dataset_perc` Regular expression: `(.+)% \1`
Redis: Total system memory{#SINGLETON}	The total amount of memory that the Redis host has	Dependent item	redis.memory.totalsystemmemory[{#SINGLETON}] Preprocessing JSON Path: `$.total_system_memory`
Redis: Max memory{#SINGLETON}	Maximum amount of memory allocated to the Redisdb system	Dependent item	redis.memory.maxmemory[{#SINGLETON}] Preprocessing JSON Path: `$.maxmemory`
Redis: Max memory policy{#SINGLETON}	The value of the maxmemory-policy configuration directive	Dependent item	redis.memory.maxmemory_policy[{#SINGLETON}] Preprocessing JSON Path: `$.maxmemory_policy` Discard unchanged with heartbeat: `1d`
Redis: Active defrag running{#SINGLETON}	Flag indicating if active defragmentation is active	Dependent item	redis.memory.activedefragrunning[{#SINGLETON}] Preprocessing JSON Path: `$.active_defrag_running`
Redis: Lazyfree pending objects{#SINGLETON}	The number of objects waiting to be freed (as a result of calling UNLINK, or FLUSHDB and FLUSHALL with the ASYNC option)	Dependent item	redis.memory.lazyfreependingobjects[{#SINGLETON}] Preprocessing JSON Path: `$.lazyfree_pending_objects`
Redis: RDB last CoW size{#SINGLETON}	The size in bytes of copy-on-write allocations during the last RDB save operation	Dependent item	redis.persistence.rdblastcow_size[{#SINGLETON}] Preprocessing JSON Path: `$.rdb_last_cow_size`
Redis: AOF last CoW size{#SINGLETON}	The size in bytes of copy-on-write allocations during the last AOF rewrite operation	Dependent item	redis.persistence.aoflastcow_size[{#SINGLETON}] Preprocessing JSON Path: `$.aof_last_cow_size`
Redis: Expired stale %{#SINGLETON}		Dependent item	redis.stats.expiredstaleperc[{#SINGLETON}] Preprocessing JSON Path: `$.expired_stale_perc`
Redis: Expired time cap reached count{#SINGLETON}		Dependent item	redis.stats.expiredtimecapreachedcount[{#SINGLETON}] Preprocessing JSON Path: `$.expired_time_cap_reached_count`
Redis: Slave expires tracked keys{#SINGLETON}	The number of keys tracked for expiry purposes (applicable only to writable replicas)	Dependent item	redis.stats.slaveexpirestracked_keys[{#SINGLETON}] Preprocessing JSON Path: `$.slave_expires_tracked_keys`
Redis: Active defrag hits{#SINGLETON}	Number of value reallocations performed by active the defragmentation process	Dependent item	redis.stats.activedefraghits[{#SINGLETON}] Preprocessing JSON Path: `$.active_defrag_hits`
Redis: Active defrag misses{#SINGLETON}	Number of aborted value reallocations started by the active defragmentation process	Dependent item	redis.stats.activedefragmisses[{#SINGLETON}] Preprocessing JSON Path: `$.active_defrag_misses`
Redis: Active defrag key hits{#SINGLETON}	Number of keys that were actively defragmented	Dependent item	redis.stats.activedefragkey_hits[{#SINGLETON}] Preprocessing JSON Path: `$.active_defrag_key_hits`
Redis: Active defrag key misses{#SINGLETON}	Number of keys that were skipped by the active defragmentation process	Dependent item	redis.stats.activedefragkey_misses[{#SINGLETON}] Preprocessing JSON Path: `$.active_defrag_key_misses`
Redis: Replication second offset{#SINGLETON}	Offset up to which replication IDs are accepted	Dependent item	redis.replication.secondreploffset[{#SINGLETON}] Preprocessing JSON Path: `$.second_repl_offset`

Trigger prototypes for Version 4+ metrics discovery

Name	Description	Expression	Severity	Dependencies and additional info
Redis: Memory usage is too high		`last(/Redis by Zabbix agent 2/redis.memory.used_memory)/min(/Redis by Zabbix agent 2/redis.memory.maxmemory[{#SINGLETON}],5m)*100>{$REDIS.MEM.PUSED.MAX.WARN}`\|Warning

LLD rule Version 5+ metrics discovery

Name Description Type Key and additional info

Version 5+ metrics discovery

Additional metrics for versions 5+

Dependent item

redis.metrics.v5.discovery

Preprocessing

JSON Path: $.redis_version
JavaScript: The text is too long. Please see the template.

Item prototypes for Version 5+ metrics discovery

Name	Description	Type	Key and additional info
Redis: Allocator active{#SINGLETON}		Dependent item	redis.memory.allocator_active[{#SINGLETON}] Preprocessing JSON Path: `$.allocator_active`
Redis: Allocator allocated{#SINGLETON}		Dependent item	redis.memory.allocator_allocated[{#SINGLETON}] Preprocessing JSON Path: `$.allocator_allocated`
Redis: Allocator resident{#SINGLETON}		Dependent item	redis.memory.allocator_resident[{#SINGLETON}] Preprocessing JSON Path: `$.allocator_resident`
Redis: Memory used scripts{#SINGLETON}		Dependent item	redis.memory.usedmemoryscripts[{#SINGLETON}] Preprocessing JSON Path: `$.used_memory_scripts`
Redis: Memory number of cached scripts{#SINGLETON}		Dependent item	redis.memory.numberofcached_scripts[{#SINGLETON}] Preprocessing JSON Path: `$.number_of_cached_scripts`
Redis: Allocator fragmentation bytes{#SINGLETON}		Dependent item	redis.memory.allocatorfragbytes[{#SINGLETON}] Preprocessing JSON Path: `$.allocator_frag_bytes`
Redis: Allocator fragmentation ratio{#SINGLETON}		Dependent item	redis.memory.allocatorfragratio[{#SINGLETON}] Preprocessing JSON Path: `$.allocator_frag_ratio`
Redis: Allocator RSS bytes{#SINGLETON}		Dependent item	redis.memory.allocatorrssbytes[{#SINGLETON}] Preprocessing JSON Path: `$.allocator_rss_bytes`
Redis: Allocator RSS ratio{#SINGLETON}		Dependent item	redis.memory.allocatorrssratio[{#SINGLETON}] Preprocessing JSON Path: `$.allocator_rss_ratio`
Redis: Memory RSS overhead bytes{#SINGLETON}		Dependent item	redis.memory.rssoverheadbytes[{#SINGLETON}] Preprocessing JSON Path: `$.rss_overhead_bytes`
Redis: Memory RSS overhead ratio{#SINGLETON}		Dependent item	redis.memory.rssoverheadratio[{#SINGLETON}] Preprocessing JSON Path: `$.rss_overhead_ratio`
Redis: Memory fragmentation bytes{#SINGLETON}		Dependent item	redis.memory.fragmentation_bytes[{#SINGLETON}] Preprocessing JSON Path: `$.mem_fragmentation_bytes`
Redis: Memory not counted for evict{#SINGLETON}		Dependent item	redis.memory.notcountedfor_evict[{#SINGLETON}] Preprocessing JSON Path: `$.mem_not_counted_for_evict`
Redis: Memory replication backlog{#SINGLETON}		Dependent item	redis.memory.replication_backlog[{#SINGLETON}] Preprocessing JSON Path: `$.mem_replication_backlog`
Redis: Memory clients normal{#SINGLETON}		Dependent item	redis.memory.memclientsnormal[{#SINGLETON}] Preprocessing JSON Path: `$.mem_clients_normal`
Redis: Memory clients slaves{#SINGLETON}		Dependent item	redis.memory.memclientsslaves[{#SINGLETON}] Preprocessing JSON Path: `$.mem_clients_slaves`
Redis: Memory AOF buffer{#SINGLETON}	Size of the AOF buffer	Dependent item	redis.memory.memaofbuffer[{#SINGLETON}] Preprocessing JSON Path: `$.mem_aof_buffer`

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template, or ask for help at ZABBIX forums

db_postgresql_odbc

View README Download JSON

PostgreSQL by ODBC

Overview

This template is designed for the effortless deployment of PostgreSQL monitoring by Zabbix via ODBC and doesn't require any external scripts.

Requirements

Zabbix version: 6.0 and higher.

Tested versions

This template has been tested on:

PostgreSQL 11-15

Configuration

Setup

Create the PostgreSQL user for monitoring (<password> at your discretion) and inherit permissions from the default role pg_monitor:

CREATE USER zbx_monitor WITH PASSWORD '<PASSWORD>' INHERIT;
GRANT pg_monitor TO zbx_monitor;

Edit the pg_hba.conf configuration file to allow TCP connections for the user zbx_monitor. For example, you could add one of the following rows to allow local connections from the same host:

# TYPE  DATABASE        USER            ADDRESS                 METHOD
  host       all        zbx_monitor     localhost               trust
  host       all        zbx_monitor     127.0.0.1/32            md5
  host       all        zbx_monitor     ::1/128                 scram-sha-256

For more information please read the PostgreSQL documentation https://www.postgresql.org/docs/current/auth-pg-hba-conf.html.

Install the PostgreSQL ODBC driver. Check the Zabbix documentation for details about ODBC checks and recommended parameters page.
Set up the connection string with the {$PG.CONNSTRING.ODBC} macro. The minimum required parameters are:

Driver= - set the name of the driver which will be used for monitoring (from the odbcinst.ini file) or specify the path to the driver file (for example /usr/lib64/psqlodbcw.so);
Servername= - set the host name or IP address of the PostgreSQL instance;
Port= - adjust the port number if needed.

Note: if you want to use SSL/TLS encryption to protect communications with the remote PostgreSQL instance, you can also specify encryption parameters here.

It is assumed that you set up the PostgreSQL instance to work in the desired encryption mode. Check the PostgreSQL documentation for details.

For example, to enable required encryption in transport mode without identity checks, the connection string could look like this (replace <instanceip> with the address of the PostgreSQL instance):

Servername=<instanceip>;Port=5432;Driver=/usr/lib64/psqlodbcw.so;SSLmode=require

Set the password that you specified in step 1 in the macro {$PG.PASSWORD}.

Macros used

Name	Description	Default
{$PG.PASSWORD}	PostgreSQL user password.	`<Put the password here>`
{$PG.USER}	PostgreSQL username.	`zbx_monitor`
{$PG.CONNSTRING.ODBC}	Connection string for the PostgreSQL instance.	`Macro too long. Please see the template.`
{$PG.LLD.FILTER.DBNAME}	Filter of discoverable databases.	`.+`
{$PG.CONNTOTALPCT.MAX.WARN}	Maximum percentage of current connections for trigger expression.	`90`
{$PG.DATABASE}	Default PostgreSQL database for the connection.	`postgres`
{$PG.DEADLOCKS.MAX.WARN}	Maximum number of detected deadlocks for trigger expression.	`0`
{$PG.LLD.FILTER.APPLICATION}	Filter of discoverable applications.	`.+`
{$PG.CONFLICTS.MAX.WARN}	Maximum number of recovery conflicts for trigger expression.	`0`
{$PG.QUERY_ETIME.MAX.WARN}	Execution time limit for count of slow queries.	`30`
{$PG.SLOW_QUERIES.MAX.WARN}	Slow queries count threshold for a trigger.	`5`

Items

Name	Description	Type	Key and additional info
PostgreSQL: Get bgwriter	Collect all metrics from pgstatbgwriter: https://www.postgresql.org/docs/current/monitoring-stats.html#PG-STAT-BGWRITER-VIEW	Database monitor	db.odbc.select[pgsql.bgwriter,,"Database={$PG.DATABASE};{$PG.CONNSTRING.ODBC}"]
PostgreSQL: Get archive	Collect archive status metrics.	Database monitor	db.odbc.select[pgsql.archive,,"Database={$PG.DATABASE};{$PG.CONNSTRING.ODBC}"]
PostgreSQL: Get dbstat	Collect all metrics from pgstatdatabase per database: https://www.postgresql.org/docs/current/monitoring-stats.html#PG-STAT-DATABASE-VIEW	Database monitor	db.odbc.select[pgsql.dbstat,,"Database={$PG.DATABASE};{$PG.CONNSTRING.ODBC}"]
PostgreSQL: Get dbstat sum	Collect all metrics from pgstatdatabase as sums for all databases: https://www.postgresql.org/docs/current/monitoring-stats.html#PG-STAT-DATABASE-VIEW	Database monitor	db.odbc.select[pgsql.dbstat.sum,,"Database={$PG.DATABASE};{$PG.CONNSTRING.ODBC}"]
PostgreSQL: Get connections sum	Collect all metrics from pgstatactivity: https://www.postgresql.org/docs/current/monitoring-stats.html#PG-STAT-ACTIVITY-VIEW	Database monitor	db.odbc.select[pgsql.connections.sum,,"Database={$PG.DATABASE};{$PG.CONNSTRING.ODBC}"]
PostgreSQL: Get WAL	Collect write-ahead log (WAL) metrics.	Database monitor	db.odbc.select[pgsql.wal.stat,,"Database={$PG.DATABASE};{$PG.CONNSTRING.ODBC}"]
PostgreSQL: Get locks	Collect all metrics from pg_locks per database: https://www.postgresql.org/docs/current/explicit-locking.html#LOCKING-TABLES	Database monitor	db.odbc.select[pgsql.locks,,"Database={$PG.DATABASE};{$PG.CONNSTRING.ODBC}"]
PostgreSQL: Get replication	Collect metrics from the pgstatreplication, which contains information about the WAL sender process, showing statistics about replication to that sender's connected standby server.	Database monitor	db.odbc.select[pgsql.replication.process,,"Database={$PG.DATABASE};{$PG.CONNSTRING.ODBC}"] Preprocessing Check for not supported value ⛔️Custom on fail: Discard value
PostgreSQL: Get queries	Collect all metrics by query execution time.	Database monitor	db.odbc.select[pgsql.queries,,"Database={$PG.DATABASE};{$PG.CONNSTRING.ODBC}"]
PostgreSQL: Version	PostgreSQL version.	Database monitor	db.odbc.select[pgsql.version,,"Database={$PG.DATABASE};{$PG.CONNSTRING.ODBC}"] Preprocessing Discard unchanged with heartbeat: `1d`
WAL: Bytes written	WAL write, in bytes.	Dependent item	pgsql.wal.write Preprocessing JSON Path: `$.write` Change per second
WAL: Bytes received	WAL receive, in bytes.	Dependent item	pgsql.wal.receive Preprocessing JSON Path: `$.receive` Change per second
WAL: Segments count	Number of WAL segments.	Dependent item	pgsql.wal.count Preprocessing JSON Path: `$.count`
Bgwriter: Buffers allocated per second	Number of buffers allocated per second.	Dependent item	pgsql.bgwriter.buffers_alloc.rate Preprocessing JSON Path: `$.buffers_alloc` Change per second
Bgwriter: Buffers written directly by a backend per second	Number of buffers written directly by a backend per second.	Dependent item	pgsql.bgwriter.buffers_backend.rate Preprocessing JSON Path: `$.buffers_backend` Change per second
Bgwriter: Number of bgwriter cleaning scan stopped per second	Number of times the background writer stopped a cleaning scan because it had written too many buffers per second.	Dependent item	pgsql.bgwriter.maxwritten_clean.rate Preprocessing JSON Path: `$.maxwritten_clean` Change per second
Bgwriter: Times a backend executed its own fsync per second	Number of times a backend had to execute its own fsync call per second (normally the background writer handles those even when the backend does its own write).	Dependent item	pgsql.bgwriter.buffersbackendfsync.rate Preprocessing JSON Path: `$.buffers_backend_fsync` Change per second
Checkpoint: Buffers written by the background writer per second	Number of buffers written by the background writer per second.	Dependent item	pgsql.bgwriter.buffers_clean.rate Preprocessing JSON Path: `$.buffers_clean` Change per second
Checkpoint: Buffers written during checkpoints per second	Number of buffers written during checkpoints per second.	Dependent item	pgsql.bgwriter.buffers_checkpoint.rate Preprocessing JSON Path: `$.buffers_checkpoint` Change per second
Checkpoint: Scheduled per second	Number of scheduled checkpoints that have been performed per second.	Dependent item	pgsql.bgwriter.checkpoints_timed.rate Preprocessing JSON Path: `$.checkpoints_timed` Change per second
Checkpoint: Requested per second	Number of requested checkpoints that have been performed per second.	Dependent item	pgsql.bgwriter.checkpoints_req.rate Preprocessing JSON Path: `$.checkpoints_req` Change per second
Checkpoint: Checkpoint write time per second	Total amount of time per second that has been spent in the portion of checkpoint processing where files are written to disk.	Dependent item	pgsql.bgwriter.checkpointwritetime.rate Preprocessing JSON Path: `$.checkpoint_write_time` Custom multiplier: `0.001` Change per second
Checkpoint: Checkpoint sync time per second	Total amount of time per second that has been spent in the portion of checkpoint processing where files are synchronized to disk.	Dependent item	pgsql.bgwriter.checkpointsynctime.rate Preprocessing JSON Path: `$.checkpoint_sync_time` Custom multiplier: `0.001` Change per second
Archive: Count of archived files	Count of archived files.	Dependent item	pgsql.archive.countarchivedfiles Preprocessing JSON Path: `$.archived_count`
Archive: Count of failed attempts to archive files	Count of failed attempts to archive files.	Dependent item	pgsql.archive.failedtryingto_archive Preprocessing JSON Path: `$.failed_count`
Archive: Count of files in archive_status need to archive	Count of files to archive.	Dependent item	pgsql.archive.countfilesto_archive Preprocessing JSON Path: `$.count_files`
Archive: Size of files need to archive	Size of files to archive.	Dependent item	pgsql.archive.sizefilesto_archive Preprocessing JSON Path: `$.size_files`
Dbstat: Blocks read time	Time spent reading data file blocks by backends.	Dependent item	pgsql.dbstat.sum.blkreadtime Preprocessing JSON Path: `$.blk_read_time` Custom multiplier: `0.001`
Dbstat: Blocks write time	Time spent writing data file blocks by backends.	Dependent item	pgsql.dbstat.sum.blkwritetime Preprocessing JSON Path: `$.blk_write_time` Custom multiplier: `0.001`
Dbstat: Committed transactions per second	Number of transactions that have been committed per second.	Dependent item	pgsql.dbstat.sum.xact_commit.rate Preprocessing JSON Path: `$.xact_commit` Change per second
Dbstat: Conflicts per second	Number of queries canceled per second due to conflicts with recovery (conflicts occur only on standby servers; see pgstatdatabase_conflicts for details).	Dependent item	pgsql.dbstat.sum.conflicts.rate Preprocessing JSON Path: `$.conflicts` Change per second
Dbstat: Deadlocks per second	Number of deadlocks detected per second.	Dependent item	pgsql.dbstat.sum.deadlocks.rate Preprocessing JSON Path: `$.deadlocks` Change per second
Dbstat: Disk blocks read per second	Number of disk blocks read per second.	Dependent item	pgsql.dbstat.sum.blks_read.rate Preprocessing JSON Path: `$.blks_read` Change per second
Dbstat: Hit blocks read per second	Number of times per second disk blocks were found already in the buffer cache.	Dependent item	pgsql.dbstat.sum.blks_hit.rate Preprocessing JSON Path: `$.blks_hit` Change per second
Dbstat: Number temp bytes per second	Total amount of data written per second to temporary files by queries.	Dependent item	pgsql.dbstat.sum.temp_bytes.rate Preprocessing JSON Path: `$.temp_bytes` Change per second
Dbstat: Number temp files per second	Number of temporary files created by queries per second.	Dependent item	pgsql.dbstat.sum.temp_files.rate Preprocessing JSON Path: `$.temp_files` Change per second
Dbstat: Roll backed transactions per second	Number of transactions that have been rolled back per second.	Dependent item	pgsql.dbstat.sum.xact_rollback.rate Preprocessing JSON Path: `$.xact_rollback` Change per second
Dbstat: Rows deleted per second	Number of rows deleted by queries per second.	Dependent item	pgsql.dbstat.sum.tup_deleted.rate Preprocessing JSON Path: `$.tup_deleted` Change per second
Dbstat: Rows fetched per second	Number of rows fetched by queries per second.	Dependent item	pgsql.dbstat.sum.tup_fetched.rate Preprocessing JSON Path: `$.tup_fetched` Change per second
Dbstat: Rows inserted per second	Number of rows inserted by queries per second.	Dependent item	pgsql.dbstat.sum.tup_inserted.rate Preprocessing JSON Path: `$.tup_inserted` Change per second
Dbstat: Rows returned per second	Number of rows returned by queries per second.	Dependent item	pgsql.dbstat.sum.tup_returned.rate Preprocessing JSON Path: `$.tup_returned` Change per second
Dbstat: Rows updated per second	Number of rows updated by queries per second.	Dependent item	pgsql.dbstat.sum.tup_updated.rate Preprocessing JSON Path: `$.tup_updated` Change per second
Dbstat: Backends connected	Number of connected backends.	Dependent item	pgsql.dbstat.sum.numbackends Preprocessing JSON Path: `$.numbackends`
Connections sum: Active	Total number of connections executing a query.	Dependent item	pgsql.connections.sum.active Preprocessing JSON Path: `$.active`
Connections sum: Fastpath function call	Total number of connections executing a fast-path function.	Dependent item	pgsql.connections.sum.fastpathfunctioncall Preprocessing JSON Path: `$.fastpath_function_call`
Connections sum: Idle	Total number of connections waiting for a new client command.	Dependent item	pgsql.connections.sum.idle Preprocessing JSON Path: `$.idle`
Connections sum: Idle in transaction	Total number of connections in a transaction state but not executing a query.	Dependent item	pgsql.connections.sum.idleintransaction Preprocessing JSON Path: `$.idle_in_transaction`
Connections sum: Prepared	Total number of prepared transactions: https://www.postgresql.org/docs/current/sql-prepare-transaction.html	Dependent item	pgsql.connections.sum.prepared Preprocessing JSON Path: `$.prepared`
Connections sum: Total	Total number of connections.	Dependent item	pgsql.connections.sum.total Preprocessing JSON Path: `$.total`
Connections sum: Total, %	Total number of connections, in percentage.	Dependent item	pgsql.connections.sum.total_pct Preprocessing JSON Path: `$.total_pct`
Connections sum: Waiting	Total number of waiting connections: https://www.postgresql.org/docs/current/monitoring-stats.html#WAIT-EVENT-TABLE	Dependent item	pgsql.connections.sum.waiting Preprocessing JSON Path: `$.waiting`
Connections sum: Idle in transaction (aborted)	Total number of connections in a transaction state but not executing a query, and where one of the statements in the transaction caused an error.	Dependent item	pgsql.connections.sum.idleintransaction_aborted Preprocessing JSON Path: `$.idle_in_transaction_aborted`
Connections sum: Disabled	Total number of disabled connections.	Dependent item	pgsql.connections.sum.disabled Preprocessing JSON Path: `$.disabled`
PostgreSQL: Age of oldest xid	Age of oldest xid.	Database monitor	db.odbc.select[pgsql.oldest.xid,,"Database={$PG.DATABASE};{$PG.CONNSTRING.ODBC}"]
PostgreSQL: Count of autovacuum workers	Number of autovacuum workers.	Database monitor	db.odbc.select[pgsql.autovacuum.count,,"Database={$PG.DATABASE};{$PG.CONNSTRING.ODBC}"]
PostgreSQL: Cache hit ratio, %	Cache hit ratio.	Calculated	pgsql.cache.hit
PostgreSQL: Uptime	Time since the server started.	Database monitor	db.odbc.select[pgsql.uptime,,"Database={$PG.DATABASE};{$PG.CONNSTRING.ODBC}"]
PostgreSQL: Replication: Lag in bytes	Replication lag with master, in bytes.	Database monitor	db.odbc.select[pgsql.replication.lag.b,,"Database={$PG.DATABASE};{$PG.CONNSTRING.ODBC}"]
PostgreSQL: Replication: Lag in seconds	Replication lag with master, in seconds.	Database monitor	db.odbc.select[pgsql.replication.lag.sec,,"Database={$PG.DATABASE};{$PG.CONNSTRING.ODBC}"]
PostgreSQL: Replication: Recovery role	Replication role: 1 — recovery is still in progress (standby mode), 0 — master mode.	Database monitor	db.odbc.select[pgsql.replication.recovery_role,,"Database={$PG.DATABASE};{$PG.CONNSTRING.ODBC}"]
PostgreSQL: Replication: Standby count	Number of standby servers.	Database monitor	db.odbc.select[pgsql.replication.count,,"Database={$PG.DATABASE};{$PG.CONNSTRING.ODBC}"]
PostgreSQL: Replication: Status	Replication status: 0 — streaming is down, 1 — streaming is up, 2 — master mode.	Database monitor	db.odbc.select[pgsql.replication.status,,"Database={$PG.DATABASE};{$PG.CONNSTRING.ODBC}"]
PostgreSQL: Ping	Used to test a connection to see if it is alive. It is set to 0 if the query is unsuccessful.	Database monitor	db.odbc.select[pgsql.ping,,"Database={$PG.DATABASE};{$PG.CONNSTRING.ODBC}"] Preprocessing Check for not supported value ⛔️Custom on fail: Set value to: `0` Discard unchanged with heartbeat: `1h`

Triggers

Name	Description	Expression
PostgreSQL: Version has changed		`last(/PostgreSQL by ODBC/db.odbc.select[pgsql.version,,"Database={$PG.DATABASE};{$PG.CONNSTRING.ODBC}"],#1)<>last(/PostgreSQL by ODBC/db.odbc.select[pgsql.version,,"Database={$PG.DATABASE};{$PG.CONNSTRING.ODBC}"],#2) and length(last(/PostgreSQL by ODBC/db.odbc.select[pgsql.version,,"Database={$PG.DATABASE};{$PG.CONNSTRING.ODBC}"]))>0`\|Info
PostgreSQL: Total number of connections is too high	Total number of current connections exceeds the limit of {$PG.CONNTOTALPCT.MAX.WARN}% out of the maximum number of concurrent connections to the database server (the "max_connections" setting).	`min(/PostgreSQL by ODBC/pgsql.connections.sum.total_pct,5m) > {$PG.CONN_TOTAL_PCT.MAX.WARN}`\|Average
PostgreSQL: Oldest xid is too big		`last(/PostgreSQL by ODBC/db.odbc.select[pgsql.oldest.xid,,"Database={$PG.DATABASE};{$PG.CONNSTRING.ODBC}"]) > 18000000`\|Average
PostgreSQL: Service has been restarted	PostgreSQL uptime is less than 10 minutes.	`last(/PostgreSQL by ODBC/db.odbc.select[pgsql.uptime,,"Database={$PG.DATABASE};{$PG.CONNSTRING.ODBC}"]) < 10m`\|Average
PostgreSQL: Service is down	Last test of a connection was unsuccessful.	`last(/PostgreSQL by ODBC/db.odbc.select[pgsql.ping,,"Database={$PG.DATABASE};{$PG.CONNSTRING.ODBC}"])=0`\|High

LLD rule Replication discovery

Name Description Type Key and additional info

Replication discovery

Discovers replication lag metrics.

Database monitor

db.odbc.select[pgsql.replication.process.discovery,,"Database={$PG.DATABASE};{$PG.CONNSTRING.ODBC}"]

Preprocessing

Discard unchanged with heartbeat: 3h

Item prototypes for Replication discovery

Name	Description	Type	Key and additional info
Application [{#APPLICATION_NAME}]: Get replication	Collect metrics from the "pgstatreplication" about the application "{#APPLICATION_NAME}" that is connected to this WAL sender, which contains information about the WAL sender process, showing statistics about replication to that sender's connected standby server.	Dependent item	pgsql.replication.getmetrics["{#APPLICATIONNAME}"] Preprocessing JSON Path: `$['{#APPLICATION_NAME}']` ⛔️Custom on fail: Discard value
Application [{#APPLICATION_NAME}]: Replication flush lag	Time elapsed between flushing recent WAL locally and receiving notification that this standby server has written and flushed it (but not yet applied it). This can be used to gauge the delay that synchronous_commit level on incurred while committing if this server was configured as a synchronous standby.	Dependent item	pgsql.replication.process.flushlag["{#APPLICATIONNAME}"] Preprocessing JSON Path: `$.flush_lag`
Application [{#APPLICATION_NAME}]: Replication replay lag	Time elapsed between flushing recent WAL locally and receiving notification that this standby server has written, flushed and applied it. This can be used to gauge the delay that synchronouscommit level remoteapply incurred while committing if this server was configured as a synchronous standby.	Dependent item	pgsql.replication.process.replaylag["{#APPLICATIONNAME}"] Preprocessing JSON Path: `$.replay_lag`
Application [{#APPLICATION_NAME}]: Replication write lag	Time elapsed between flushing recent WAL locally and receiving notification that this standby server has written it (but not yet flushed it or applied it). This can be used to gauge the delay that synchronouscommit level remotewrite incurred while committing if this server was configured as a synchronous standby.	Dependent item	pgsql.replication.process.writelag["{#APPLICATIONNAME}"] Preprocessing JSON Path: `$.write_lag`

LLD rule Database discovery

Name

Description

Type

Key and additional info

Database discovery

Discovers databases (DB) in the database management system (DBMS), except:

- templates;

- default "postgres" DB;

- DBs that do not allow connections.

Database monitor

db.odbc.select[pgsql.db.discovery,,"Database={$PG.DATABASE};{$PG.CONNSTRING.ODBC}"]

Item prototypes for Database discovery

Name	Description	Type	Key and additional info
DB [{#DBNAME}]: Get dbstat	Get dbstat metrics for database "{#DBNAME}".	Dependent item	pgsql.dbstat.get_metrics["{#DBNAME}"] Preprocessing JSON Path: `$['{#DBNAME}']` ⛔️Custom on fail: Discard value
DB [{#DBNAME}]: Get locks	Get locks metrics for database "{#DBNAME}".	Dependent item	pgsql.locks.get_metrics["{#DBNAME}"] Preprocessing JSON Path: `$['{#DBNAME}']` ⛔️Custom on fail: Discard value
DB [{#DBNAME}]: Get queries	Get queries metrics for database "{#DBNAME}".	Dependent item	pgsql.queries.get_metrics["{#DBNAME}"] Preprocessing JSON Path: `$['{#DBNAME}']` ⛔️Custom on fail: Discard value
DB [{#DBNAME}]: Database age	Database age.	Database monitor	db.odbc.select[pgsql.db.age,,"Database={#DBNAME};{$PG.CONNSTRING.ODBC}"]
DB [{#DBNAME}]: Bloating tables	Number of bloating tables.	Database monitor	db.odbc.select[pgsql.db.bloating_tables,,"Database={#DBNAME};{$PG.CONNSTRING.ODBC}"]
DB [{#DBNAME}]: Database size	Database size.	Database monitor	db.odbc.select[pgsql.db.size,,"Database={#DBNAME};{$PG.CONNSTRING.ODBC}"]
DB [{#DBNAME}]: Blocks hit per second	Total number of times per second disk blocks were found already in the buffer cache, so that a read was not necessary.	Dependent item	pgsql.dbstat.blks_hit.rate["{#DBNAME}"] Preprocessing JSON Path: `$.blks_hit` Change per second
DB [{#DBNAME}]: Disk blocks read per second	Total number of disk blocks read per second in this database.	Dependent item	pgsql.dbstat.blks_read.rate["{#DBNAME}"] Preprocessing JSON Path: `$.blks_read` Change per second
DB [{#DBNAME}]: Detected conflicts per second	Total number of queries canceled due to conflicts with recovery in this database per second.	Dependent item	pgsql.dbstat.conflicts.rate["{#DBNAME}"] Preprocessing JSON Path: `$.conflicts` Change per second
DB [{#DBNAME}]: Detected deadlocks per second	Total number of detected deadlocks in this database per second.	Dependent item	pgsql.dbstat.deadlocks.rate["{#DBNAME}"] Preprocessing JSON Path: `$.deadlocks` Change per second
DB [{#DBNAME}]: Temp_bytes written per second	Total amount of data written to temporary files by queries in this database.	Dependent item	pgsql.dbstat.temp_bytes.rate["{#DBNAME}"] Preprocessing JSON Path: `$.temp_bytes` Change per second
DB [{#DBNAME}]: Temp_files created per second	Total number of temporary files created by queries in this database.	Dependent item	pgsql.dbstat.temp_files.rate["{#DBNAME}"] Preprocessing JSON Path: `$.temp_files` Change per second
DB [{#DBNAME}]: Tuples deleted per second	Total number of rows deleted by queries in this database per second.	Dependent item	pgsql.dbstat.tup_deleted.rate["{#DBNAME}"] Preprocessing JSON Path: `$.tup_deleted` Change per second
DB [{#DBNAME}]: Tuples fetched per second	Total number of rows fetched by queries in this database per second.	Dependent item	pgsql.dbstat.tup_fetched.rate["{#DBNAME}"] Preprocessing JSON Path: `$.tup_fetched` Change per second
DB [{#DBNAME}]: Tuples inserted per second	Total number of rows inserted by queries in this database per second.	Dependent item	pgsql.dbstat.tup_inserted.rate["{#DBNAME}"] Preprocessing JSON Path: `$.tup_inserted` Change per second
DB [{#DBNAME}]: Tuples returned per second	Number of rows returned by queries in this database per second.	Dependent item	pgsql.dbstat.tup_returned.rate["{#DBNAME}"] Preprocessing JSON Path: `$.tup_returned` Change per second
DB [{#DBNAME}]: Tuples updated per second	Total number of rows updated by queries in this database per second.	Dependent item	pgsql.dbstat.tup_updated.rate["{#DBNAME}"] Preprocessing JSON Path: `$.tup_updated` Change per second
DB [{#DBNAME}]: Commits per second	Number of transactions in this database that have been committed per second.	Dependent item	pgsql.dbstat.xact_commit.rate["{#DBNAME}"] Preprocessing JSON Path: `$.xact_commit` Change per second
DB [{#DBNAME}]: Rollbacks per second	Total number of transactions in this database that have been rolled back.	Dependent item	pgsql.dbstat.xact_rollback.rate["{#DBNAME}"] Preprocessing JSON Path: `$.xact_rollback` Change per second
DB [{#DBNAME}]: Backends connected	Number of backends currently connected to this database.	Dependent item	pgsql.dbstat.numbackends["{#DBNAME}"] Preprocessing JSON Path: `$.numbackends`
DB [{#DBNAME}]: Disk blocks read time per second	Time spent reading data file blocks by backends per second.	Dependent item	pgsql.dbstat.blkreadtime.rate["{#DBNAME}"] Preprocessing JSON Path: `$.blk_read_time` Custom multiplier: `0.001` Change per second
DB [{#DBNAME}]: Disk blocks write time per second	Time spent writing data file blocks by backends per second.	Dependent item	pgsql.dbstat.blkwritetime.rate["{#DBNAME}"] Preprocessing JSON Path: `$.blk_write_time` Custom multiplier: `0.001` Change per second
DB [{#DBNAME}]: Num of accessexclusive locks	Number of accessexclusive locks for this database.	Dependent item	pgsql.locks.accessexclusive["{#DBNAME}"] Preprocessing JSON Path: `$.accessexclusive`
DB [{#DBNAME}]: Num of accessshare locks	Number of accessshare locks for this database.	Dependent item	pgsql.locks.accessshare["{#DBNAME}"] Preprocessing JSON Path: `$.accessshare`
DB [{#DBNAME}]: Num of exclusive locks	Number of exclusive locks for this database.	Dependent item	pgsql.locks.exclusive["{#DBNAME}"] Preprocessing JSON Path: `$.exclusive`
DB [{#DBNAME}]: Num of rowexclusive locks	Number of rowexclusive locks for this database.	Dependent item	pgsql.locks.rowexclusive["{#DBNAME}"] Preprocessing JSON Path: `$.rowexclusive`
DB [{#DBNAME}]: Num of rowshare locks	Number of rowshare locks for this database.	Dependent item	pgsql.locks.rowshare["{#DBNAME}"] Preprocessing JSON Path: `$['{#DBNAME}'].rowshare` ⛔️Custom on fail: Discard value
DB [{#DBNAME}]: Num of sharerowexclusive locks	Number of total sharerowexclusive for this database.	Dependent item	pgsql.locks.sharerowexclusive["{#DBNAME}"] Preprocessing JSON Path: `$.sharerowexclusive`
DB [{#DBNAME}]: Num of shareupdateexclusive locks	Number of shareupdateexclusive locks for this database.	Dependent item	pgsql.locks.shareupdateexclusive["{#DBNAME}"] Preprocessing JSON Path: `$.shareupdateexclusive`
DB [{#DBNAME}]: Num of share locks	Number of share locks for this database.	Dependent item	pgsql.locks.share["{#DBNAME}"] Preprocessing JSON Path: `$.share`
DB [{#DBNAME}]: Num of locks total	Total number of locks in this database.	Dependent item	pgsql.locks.total["{#DBNAME}"] Preprocessing JSON Path: `$.total`
DB [{#DBNAME}]: Queries max maintenance time	Max maintenance query time for this database.	Dependent item	pgsql.queries.mro.time_max["{#DBNAME}"] Preprocessing JSON Path: `$.mro_time_max`
DB [{#DBNAME}]: Queries max query time	Max query time for this database.	Dependent item	pgsql.queries.query.time_max["{#DBNAME}"] Preprocessing JSON Path: `$.query_time_max`
DB [{#DBNAME}]: Queries max transaction time	Max transaction query time for this database.	Dependent item	pgsql.queries.tx.time_max["{#DBNAME}"] Preprocessing JSON Path: `$.tx_time_max`
DB [{#DBNAME}]: Queries slow maintenance count	Slow maintenance query count for this database.	Dependent item	pgsql.queries.mro.slow_count["{#DBNAME}"] Preprocessing JSON Path: `$.mro_slow_count`
DB [{#DBNAME}]: Queries slow query count	Slow query count for this database.	Dependent item	pgsql.queries.query.slow_count["{#DBNAME}"] Preprocessing JSON Path: `$.query_slow_count`
DB [{#DBNAME}]: Queries slow transaction count	Slow transaction query count for this database.	Dependent item	pgsql.queries.tx.slow_count["{#DBNAME}"] Preprocessing JSON Path: `$.tx_slow_count`
DB [{#DBNAME}]: Queries sum maintenance time	Sum maintenance query time for this database.	Dependent item	pgsql.queries.mro.time_sum["{#DBNAME}"] Preprocessing JSON Path: `$.mro_time_sum`
DB [{#DBNAME}]: Queries sum query time	Sum query time for this database.	Dependent item	pgsql.queries.query.time_sum["{#DBNAME}"] Preprocessing JSON Path: `$.query_time_sum`
DB [{#DBNAME}]: Queries sum transaction time	Sum transaction query time for this database.	Dependent item	pgsql.queries.tx.time_sum["{#DBNAME}"] Preprocessing JSON Path: `$.tx_time_sum`

Trigger prototypes for Database discovery

Name	Description	Expression
DB [{#DBNAME}]: Too many recovery conflicts	The primary and standby servers are in many ways loosely connected. Actions on the primary will have an effect on the standby. As a result, there is potential for negative interactions or conflicts between them. https://www.postgresql.org/docs/current/hot-standby.html#HOT-STANDBY-CONFLICT	`min(/PostgreSQL by ODBC/pgsql.dbstat.conflicts.rate["{#DBNAME}"],5m) > {$PG.CONFLICTS.MAX.WARN:"{#DBNAME}"}`\|Average
DB [{#DBNAME}]: Deadlock occurred	Number of deadlocks detected per second exceeds {$PG.DEADLOCKS.MAX.WARN:"{#DBNAME}"} for 5m.	`min(/PostgreSQL by ODBC/pgsql.dbstat.deadlocks.rate["{#DBNAME}"],5m) > {$PG.DEADLOCKS.MAX.WARN:"{#DBNAME}"}`\|High
DB [{#DBNAME}]: Too many slow queries	The number of detected slow queries exceeds the limit of {$PG.SLOW_QUERIES.MAX.WARN:"{#DBNAME}"}.	`min(/PostgreSQL by ODBC/pgsql.queries.query.slow_count["{#DBNAME}"],5m)>{$PG.SLOW_QUERIES.MAX.WARN:"{#DBNAME}"}`\|Warning

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template, or ask for help at ZABBIX forums

db_postgresql_agent2

View README Download JSON

PostgreSQL by Zabbix agent 2

Overview

This template is designed for the deployment of PostgreSQL monitoring by Zabbix via Zabbix agent 2 and uses a loadable plugin to run SQL queries.

Requirements

Zabbix version: 6.0 and higher.

Tested versions

This template has been tested on:

PostgreSQL 10-15

Configuration

Setup

Deploy Zabbix agent 2 with the PostgreSQL plugin. Starting with Zabbix versions 6.0.10 / 6.2.4 / 6.4 PostgreSQL metrics are moved to a loadable plugin and require installation of a separate package or compilation of the plugin from sources.
Create the PostgreSQL user for monitoring (<password> at your discretion) and inherit permissions from the default role pg_monitor:

CREATE USER zbx_monitor WITH PASSWORD '<PASSWORD>' INHERIT;
GRANT pg_monitor TO zbx_monitor;

Edit the pg_hba.conf configuration file to allow connections for the user zbx_monitor. For example, you could add one of the following rows to allow local TCP connections from the same host:

# TYPE  DATABASE        USER            ADDRESS                 METHOD
  host       all        zbx_monitor     localhost               trust
  host       all        zbx_monitor     127.0.0.1/32            md5
  host       all        zbx_monitor     ::1/128                 scram-sha-256

For more information please read the PostgreSQL documentation https://www.postgresql.org/docs/current/auth-pg-hba-conf.html.

Set the connection string for the PostgreSQL instance in the {$PG.CONNSTRING.AGENT2} macro as URI, such as <protocol(host:port)>, or specify the named session - <sessionname>.

Note: if you want to use SSL/TLS encryption to protect communications with the remote PostgreSQL instance, a named session must be used. In that case, the instance URI should be specified in the Plugins.PostgreSQL.Sessions.*.Uri parameter in the PostgreSQL plugin configuration files alongside all the encryption parameters (type, cerfiticate/key filepaths if needed etc.).

You can check the PostgreSQL plugin documentation for details about agent plugin parameters and named sessions.

Also, it is assumed that you set up the PostgreSQL instance to work in the desired encryption mode. Check the PostgreSQL documentation for details.

Note: plugin TLS certificate validation relies on checking the Subject Alternative Names (SAN) instead of the Common Name (CN), check the cryptography package documentation for details.

For example, to enable required encryption in transport mode without identity checks you could create the file /etc/zabbix/zabbix_agent2.d/postgresql_myconn.conf with the following configuration for the named session myconn (replace <instanceip> with the address of the PostgreSQL instance):

Plugins.PostgreSQL.Sessions.myconn.Uri=tcp://<instanceip>:5432
Plugins.PostgreSQL.Sessions.myconn.TLSConnect=required

Then set the {$PG.CONNSTRING.AGENT2} macro to myconn to use this named session.

Set the password that you specified in step 2 in the macro {$PG.PASSWORD}.

Macros used

Name	Description	Default
{$PG.PASSWORD}	PostgreSQL user password.	`<Put the password here>`
{$PG.CONNSTRING.AGENT2}	URI or named session of the PostgreSQL instance.	`tcp://localhost:5432`
{$PG.USER}	PostgreSQL username.	`zbx_monitor`
{$PG.LLD.FILTER.DBNAME}	Filter of discoverable databases.	`.+`
{$PG.CONNTOTALPCT.MAX.WARN}	Maximum percentage of current connections for trigger expression.	`90`
{$PG.DATABASE}	Default PostgreSQL database for the connection.	`postgres`
{$PG.DEADLOCKS.MAX.WARN}	Maximum number of detected deadlocks for trigger expression.	`0`
{$PG.LLD.FILTER.APPLICATION}	Filter of discoverable applications.	`.+`
{$PG.CONFLICTS.MAX.WARN}	Maximum number of recovery conflicts for trigger expression.	`0`
{$PG.QUERY_ETIME.MAX.WARN}	Execution time limit for count of slow queries.	`30`
{$PG.SLOW_QUERIES.MAX.WARN}	Slow queries count threshold for a trigger.	`5`

Items

Name	Description	Type	Key and additional info
PostgreSQL: Get bgwriter	Collect all metrics from pgstatbgwriter: https://www.postgresql.org/docs/current/monitoring-stats.html#PG-STAT-BGWRITER-VIEW	Zabbix agent	pgsql.bgwriter["{$PG.CONNSTRING.AGENT2}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"]
PostgreSQL: Get archive	Collect archive status metrics.	Zabbix agent	pgsql.archive["{$PG.CONNSTRING.AGENT2}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"]
PostgreSQL: Get dbstat	Collect all metrics from pgstatdatabase per database: https://www.postgresql.org/docs/current/monitoring-stats.html#PG-STAT-DATABASE-VIEW	Zabbix agent	pgsql.dbstat["{$PG.CONNSTRING.AGENT2}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"]
PostgreSQL: Get dbstat sum	Collect all metrics from pgstatdatabase as sums for all databases: https://www.postgresql.org/docs/current/monitoring-stats.html#PG-STAT-DATABASE-VIEW	Zabbix agent	pgsql.dbstat.sum["{$PG.CONNSTRING.AGENT2}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"]
PostgreSQL: Get connections sum	Collect all metrics from pgstatactivity: https://www.postgresql.org/docs/current/monitoring-stats.html#PG-STAT-ACTIVITY-VIEW	Zabbix agent	pgsql.connections["{$PG.CONNSTRING.AGENT2}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"]
PostgreSQL: Get WAL	Collect write-ahead log (WAL) metrics.	Zabbix agent	pgsql.wal.stat["{$PG.CONNSTRING.AGENT2}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"]
PostgreSQL: Get locks	Collect all metrics from pg_locks per database: https://www.postgresql.org/docs/current/explicit-locking.html#LOCKING-TABLES	Zabbix agent	pgsql.locks["{$PG.CONNSTRING.AGENT2}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"]
PostgreSQL: Custom queries	Execute custom queries from file *.sql (check for option Plugins.Postgres.CustomQueriesPath at agent configuration).	Zabbix agent	pgsql.custom.query["{$PG.CONNSTRING.AGENT2}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}",""]
PostgreSQL: Get replication	Collect metrics from the pgstatreplication, which contains information about the WAL sender process, showing statistics about replication to that sender's connected standby server.	Zabbix agent	pgsql.replication.process["{$PG.CONNSTRING.AGENT2}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"]
PostgreSQL: Get queries	Collect all metrics by query execution time.	Zabbix agent	pgsql.queries["{$PG.CONNSTRING.AGENT2}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}","{$PG.QUERY_ETIME.MAX.WARN}"]
PostgreSQL: Version	PostgreSQL version.	Zabbix agent	pgsql.version["{$PG.CONNSTRING.AGENT2}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"] Preprocessing Discard unchanged with heartbeat: `1d`
WAL: Bytes written	WAL write, in bytes.	Dependent item	pgsql.wal.write Preprocessing JSON Path: `$.write` Change per second
WAL: Bytes received	WAL receive, in bytes.	Dependent item	pgsql.wal.receive Preprocessing JSON Path: `$.receive` Change per second
WAL: Segments count	Number of WAL segments.	Dependent item	pgsql.wal.count Preprocessing JSON Path: `$.count`
Bgwriter: Buffers allocated per second	Number of buffers allocated per second.	Dependent item	pgsql.bgwriter.buffers_alloc.rate Preprocessing JSON Path: `$.buffers_alloc` Change per second
Bgwriter: Buffers written directly by a backend per second	Number of buffers written directly by a backend per second.	Dependent item	pgsql.bgwriter.buffers_backend.rate Preprocessing JSON Path: `$.buffers_backend` Change per second
Bgwriter: Number of bgwriter cleaning scan stopped per second	Number of times the background writer stopped a cleaning scan because it had written too many buffers per second.	Dependent item	pgsql.bgwriter.maxwritten_clean.rate Preprocessing JSON Path: `$.maxwritten_clean` Change per second
Bgwriter: Times a backend executed its own fsync per second	Number of times a backend had to execute its own fsync call per second (normally the background writer handles those even when the backend does its own write).	Dependent item	pgsql.bgwriter.buffersbackendfsync.rate Preprocessing JSON Path: `$.buffers_backend_fsync` Change per second
Checkpoint: Buffers written by the background writer per second	Number of buffers written by the background writer per second.	Dependent item	pgsql.bgwriter.buffers_clean.rate Preprocessing JSON Path: `$.buffers_clean` Change per second
Checkpoint: Buffers written during checkpoints per second	Number of buffers written during checkpoints per second.	Dependent item	pgsql.bgwriter.buffers_checkpoint.rate Preprocessing JSON Path: `$.buffers_checkpoint` Change per second
Checkpoint: Scheduled per second	Number of scheduled checkpoints that have been performed per second.	Dependent item	pgsql.bgwriter.checkpoints_timed.rate Preprocessing JSON Path: `$.checkpoints_timed` Change per second
Checkpoint: Requested per second	Number of requested checkpoints that have been performed per second.	Dependent item	pgsql.bgwriter.checkpoints_req.rate Preprocessing JSON Path: `$.checkpoints_req` Change per second
Checkpoint: Checkpoint write time per second	Total amount of time per second that has been spent in the portion of checkpoint processing where files are written to disk.	Dependent item	pgsql.bgwriter.checkpointwritetime.rate Preprocessing JSON Path: `$.checkpoint_write_time` Custom multiplier: `0.001` Change per second
Checkpoint: Checkpoint sync time per second	Total amount of time per second that has been spent in the portion of checkpoint processing where files are synchronized to disk.	Dependent item	pgsql.bgwriter.checkpointsynctime.rate Preprocessing JSON Path: `$.checkpoint_sync_time` Custom multiplier: `0.001` Change per second
Archive: Count of archived files	Count of archived files.	Dependent item	pgsql.archive.countarchivedfiles Preprocessing JSON Path: `$.archived_count`
Archive: Count of failed attempts to archive files	Count of failed attempts to archive files.	Dependent item	pgsql.archive.failedtryingto_archive Preprocessing JSON Path: `$.failed_count`
Archive: Count of files in archive_status need to archive	Count of files to archive.	Dependent item	pgsql.archive.countfilesto_archive Preprocessing JSON Path: `$.count_files`
Archive: Size of files need to archive	Size of files to archive.	Dependent item	pgsql.archive.sizefilesto_archive Preprocessing JSON Path: `$.size_files`
Dbstat: Blocks read time	Time spent reading data file blocks by backends.	Dependent item	pgsql.dbstat.sum.blkreadtime Preprocessing JSON Path: `$.blk_read_time` Custom multiplier: `0.001`
Dbstat: Blocks write time	Time spent writing data file blocks by backends.	Dependent item	pgsql.dbstat.sum.blkwritetime Preprocessing JSON Path: `$.blk_write_time` Custom multiplier: `0.001`
Dbstat: Checksum failures per second	Number of data page checksum failures per second detected (or on a shared object), or NULL if data checksums are not enabled. This metric is available since PostgreSQL 12.	Dependent item	pgsql.dbstat.sum.checksum_failures.rate Preprocessing JSON Path: `$.checksum_failures` Matches regular expression: `^\d*$` ⛔️Custom on fail: Set value to: `-2` Change per second ⛔️Custom on fail: Set value to: `-1`
Dbstat: Committed transactions per second	Number of transactions that have been committed per second.	Dependent item	pgsql.dbstat.sum.xact_commit.rate Preprocessing JSON Path: `$.xact_commit` Change per second
Dbstat: Conflicts per second	Number of queries canceled per second due to conflicts with recovery (conflicts occur only on standby servers; see pgstatdatabase_conflicts for details).	Dependent item	pgsql.dbstat.sum.conflicts.rate Preprocessing JSON Path: `$.conflicts` Change per second
Dbstat: Deadlocks per second	Number of deadlocks detected per second.	Dependent item	pgsql.dbstat.sum.deadlocks.rate Preprocessing JSON Path: `$.deadlocks` Change per second
Dbstat: Disk blocks read per second	Number of disk blocks read per second.	Dependent item	pgsql.dbstat.sum.blks_read.rate Preprocessing JSON Path: `$.blks_read` Change per second
Dbstat: Hit blocks read per second	Number of times per second disk blocks were found already in the buffer cache	Dependent item	pgsql.dbstat.sum.blks_hit.rate Preprocessing JSON Path: `$.blks_hit` Change per second
Dbstat: Number temp bytes per second	Total amount of data written per second to temporary files by queries.	Dependent item	pgsql.dbstat.sum.temp_bytes.rate Preprocessing JSON Path: `$.temp_bytes` Change per second
Dbstat: Number temp files per second	Number of temporary files created by queries per second.	Dependent item	pgsql.dbstat.sum.temp_files.rate Preprocessing JSON Path: `$.temp_files` Change per second
Dbstat: Roll backed transactions per second	Number of transactions that have been rolled back per second.	Dependent item	pgsql.dbstat.sum.xact_rollback.rate Preprocessing JSON Path: `$.xact_rollback` Change per second
Dbstat: Rows deleted per second	Number of rows deleted by queries per second.	Dependent item	pgsql.dbstat.sum.tup_deleted.rate Preprocessing JSON Path: `$.tup_deleted` Change per second
Dbstat: Rows fetched per second	Number of rows fetched by queries per second.	Dependent item	pgsql.dbstat.sum.tup_fetched.rate Preprocessing JSON Path: `$.tup_fetched` Change per second
Dbstat: Rows inserted per second	Number of rows inserted by queries per second.	Dependent item	pgsql.dbstat.sum.tup_inserted.rate Preprocessing JSON Path: `$.tup_inserted` Change per second
Dbstat: Rows returned per second	Number of rows returned by queries per second.	Dependent item	pgsql.dbstat.sum.tup_returned.rate Preprocessing JSON Path: `$.tup_returned` Change per second
Dbstat: Rows updated per second	Number of rows updated by queries per second.	Dependent item	pgsql.dbstat.sum.tup_updated.rate Preprocessing JSON Path: `$.tup_updated` Change per second
Dbstat: Backends connected	Number of connected backends.	Dependent item	pgsql.dbstat.sum.numbackends Preprocessing JSON Path: `$.numbackends`
Connections sum: Active	Total number of connections executing a query.	Dependent item	pgsql.connections.sum.active Preprocessing JSON Path: `$.active`
Connections sum: Fastpath function call	Total number of connections executing a fast-path function.	Dependent item	pgsql.connections.sum.fastpathfunctioncall Preprocessing JSON Path: `$.fastpath_function_call`
Connections sum: Idle	Total number of connections waiting for a new client command.	Dependent item	pgsql.connections.sum.idle Preprocessing JSON Path: `$.idle`
Connections sum: Idle in transaction	Total number of connections in a transaction state but not executing a query.	Dependent item	pgsql.connections.sum.idleintransaction Preprocessing JSON Path: `$.idle_in_transaction`
Connections sum: Prepared	Total number of prepared transactions: https://www.postgresql.org/docs/current/sql-prepare-transaction.html	Dependent item	pgsql.connections.sum.prepared Preprocessing JSON Path: `$.prepared`
Connections sum: Total	Total number of connections.	Dependent item	pgsql.connections.sum.total Preprocessing JSON Path: `$.total`
Connections sum: Total, %	Total number of connections, in percentage.	Dependent item	pgsql.connections.sum.total_pct Preprocessing JSON Path: `$.total_pct`
Connections sum: Waiting	Total number of waiting connections: https://www.postgresql.org/docs/current/monitoring-stats.html#WAIT-EVENT-TABLE	Dependent item	pgsql.connections.sum.waiting Preprocessing JSON Path: `$.waiting`
Connections sum: Idle in transaction (aborted)	Total number of connections in a transaction state but not executing a query, and where one of the statements in the transaction caused an error.	Dependent item	pgsql.connections.sum.idleintransaction_aborted Preprocessing JSON Path: `$.idle_in_transaction_aborted`
Connections sum: Disabled	Total number of disabled connections.	Dependent item	pgsql.connections.sum.disabled Preprocessing JSON Path: `$.disabled`
PostgreSQL: Age of oldest xid	Age of oldest xid.	Zabbix agent	pgsql.oldest.xid["{$PG.CONNSTRING.AGENT2}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"]
PostgreSQL: Count of autovacuum workers	Number of autovacuum workers.	Zabbix agent	pgsql.autovacuum.count["{$PG.CONNSTRING.AGENT2}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"]
PostgreSQL: Cache hit ratio, %	Cache hit ratio.	Calculated	pgsql.cache.hit
PostgreSQL: Uptime	Time since the server started.	Zabbix agent	pgsql.uptime["{$PG.CONNSTRING.AGENT2}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"]
PostgreSQL: Replication: Lag in bytes	Replication lag with master, in bytes.	Zabbix agent	pgsql.replication.lag.b["{$PG.CONNSTRING.AGENT2}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"]
PostgreSQL: Replication: Lag in seconds	Replication lag with master, in seconds.	Zabbix agent	pgsql.replication.lag.sec["{$PG.CONNSTRING.AGENT2}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"]
PostgreSQL: Replication: Recovery role	Replication role: 1 — recovery is still in progress (standby mode), 0 — master mode.	Zabbix agent	pgsql.replication.recovery_role["{$PG.CONNSTRING.AGENT2}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"]
PostgreSQL: Replication: Standby count	Number of standby servers.	Zabbix agent	pgsql.replication.count["{$PG.CONNSTRING.AGENT2}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"]
PostgreSQL: Replication: Status	Replication status: 0 — streaming is down, 1 — streaming is up, 2 — master mode.	Zabbix agent	pgsql.replication.status["{$PG.CONNSTRING.AGENT2}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"]
PostgreSQL: Ping	Used to test a connection to see if it is alive. It is set to 0 if the query is unsuccessful.	Zabbix agent	pgsql.ping["{$PG.CONNSTRING.AGENT2}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"] Preprocessing Discard unchanged with heartbeat: `1h`

Triggers

Name	Description	Expression
PostgreSQL: Version has changed		`last(/PostgreSQL by Zabbix agent 2/pgsql.version["{$PG.CONNSTRING.AGENT2}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"],#1)<>last(/PostgreSQL by Zabbix agent 2/pgsql.version["{$PG.CONNSTRING.AGENT2}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"],#2) and length(last(/PostgreSQL by Zabbix agent 2/pgsql.version["{$PG.CONNSTRING.AGENT2}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"]))>0`\|Info
Dbstat: Checksum failures detected	Data page checksum failures were detected on that DB instance: https://www.postgresql.org/docs/current/checksums.html	`last(/PostgreSQL by Zabbix agent 2/pgsql.dbstat.sum.checksum_failures.rate)>0`\|Average
PostgreSQL: Total number of connections is too high	Total number of current connections exceeds the limit of {$PG.CONNTOTALPCT.MAX.WARN}% out of the maximum number of concurrent connections to the database server (the "max_connections" setting).	`min(/PostgreSQL by Zabbix agent 2/pgsql.connections.sum.total_pct,5m) > {$PG.CONN_TOTAL_PCT.MAX.WARN}`\|Average
PostgreSQL: Oldest xid is too big		`last(/PostgreSQL by Zabbix agent 2/pgsql.oldest.xid["{$PG.CONNSTRING.AGENT2}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"]) > 18000000`\|Average
PostgreSQL: Service has been restarted	PostgreSQL uptime is less than 10 minutes.	`last(/PostgreSQL by Zabbix agent 2/pgsql.uptime["{$PG.CONNSTRING.AGENT2}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"]) < 10m`\|Average
PostgreSQL: Service is down	Last test of a connection was unsuccessful.	`last(/PostgreSQL by Zabbix agent 2/pgsql.ping["{$PG.CONNSTRING.AGENT2}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"])=0`\|High

LLD rule Replication discovery

Name	Description	Type	Key and additional info
Replication discovery	Discovers replication lag metrics.	Zabbix agent	pgsql.replication.process.discovery["{$PG.CONNSTRING.AGENT2}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"]

Item prototypes for Replication discovery

Name	Description	Type	Key and additional info
Application [{#APPLICATION_NAME}]: Get replication	Collect metrics from the "pgstatreplication" about the application "{#APPLICATION_NAME}" that is connected to this WAL sender, which contains information about the WAL sender process, showing statistics about replication to that sender's connected standby server.	Dependent item	pgsql.replication.getmetrics["{#APPLICATIONNAME}"] Preprocessing JSON Path: `$['{#APPLICATION_NAME}']` ⛔️Custom on fail: Discard value
Application [{#APPLICATION_NAME}]: Replication flush lag		Dependent item	pgsql.replication.process.flushlag["{#APPLICATIONNAME}"] Preprocessing JSON Path: `$.flush_lag`
Application [{#APPLICATION_NAME}]: Replication replay lag		Dependent item	pgsql.replication.process.replaylag["{#APPLICATIONNAME}"] Preprocessing JSON Path: `$.replay_lag`
Application [{#APPLICATION_NAME}]: Replication write lag		Dependent item	pgsql.replication.process.writelag["{#APPLICATIONNAME}"] Preprocessing JSON Path: `$.write_lag`

LLD rule Database discovery

Name

Description

Type

Key and additional info

Database discovery

Discovers databases (DB) in the database management system (DBMS), except:

- templates;

- DBs that do not allow connections.

Zabbix agent

pgsql.db.discovery["{$PG.CONNSTRING.AGENT2}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"]

Item prototypes for Database discovery

Name	Description	Type	Key and additional info
DB [{#DBNAME}]: Get dbstat	Get dbstat metrics for database "{#DBNAME}".	Dependent item	pgsql.dbstat.get_metrics["{#DBNAME}"] Preprocessing JSON Path: `$['{#DBNAME}']` ⛔️Custom on fail: Discard value
DB [{#DBNAME}]: Get locks	Get locks metrics for database "{#DBNAME}".	Dependent item	pgsql.locks.get_metrics["{#DBNAME}"] Preprocessing JSON Path: `$['{#DBNAME}']` ⛔️Custom on fail: Discard value
DB [{#DBNAME}]: Get queries	Get queries metrics for database "{#DBNAME}".	Dependent item	pgsql.queries.get_metrics["{#DBNAME}"] Preprocessing JSON Path: `$['{#DBNAME}']` ⛔️Custom on fail: Discard value
DB [{#DBNAME}]: Database age	Database age.	Zabbix agent	pgsql.db.age["{$PG.CONNSTRING.AGENT2}","{$PG.USER}","{$PG.PASSWORD}","{#DBNAME}"]
DB [{#DBNAME}]: Bloating tables	Number of bloating tables.	Zabbix agent	pgsql.db.bloating_tables["{$PG.CONNSTRING.AGENT2}","{$PG.USER}","{$PG.PASSWORD}","{#DBNAME}"]
DB [{#DBNAME}]: Database size	Database size.	Zabbix agent	pgsql.db.size["{$PG.CONNSTRING.AGENT2}","{$PG.USER}","{$PG.PASSWORD}","{#DBNAME}"]
DB [{#DBNAME}]: Blocks hit per second	Total number of times per second disk blocks were found already in the buffer cache, so that a read was not necessary.	Dependent item	pgsql.dbstat.blks_hit.rate["{#DBNAME}"] Preprocessing JSON Path: `$.blks_hit` Change per second
DB [{#DBNAME}]: Disk blocks read per second	Total number of disk blocks read per second in this database.	Dependent item	pgsql.dbstat.blks_read.rate["{#DBNAME}"] Preprocessing JSON Path: `$.blks_read` Change per second
DB [{#DBNAME}]: Detected conflicts per second	Total number of queries canceled due to conflicts with recovery in this database per second.	Dependent item	pgsql.dbstat.conflicts.rate["{#DBNAME}"] Preprocessing JSON Path: `$.conflicts` Change per second
DB [{#DBNAME}]: Detected deadlocks per second	Total number of detected deadlocks in this database per second.	Dependent item	pgsql.dbstat.deadlocks.rate["{#DBNAME}"] Preprocessing JSON Path: `$.deadlocks` Change per second
DB [{#DBNAME}]: Temp_bytes written per second	Total amount of data written to temporary files by queries in this database.	Dependent item	pgsql.dbstat.temp_bytes.rate["{#DBNAME}"] Preprocessing JSON Path: `$.temp_bytes` Change per second
DB [{#DBNAME}]: Temp_files created per second	Total number of temporary files created by queries in this database.	Dependent item	pgsql.dbstat.temp_files.rate["{#DBNAME}"] Preprocessing JSON Path: `$.temp_files` Change per second
DB [{#DBNAME}]: Tuples deleted per second	Total number of rows deleted by queries in this database per second.	Dependent item	pgsql.dbstat.tup_deleted.rate["{#DBNAME}"] Preprocessing JSON Path: `$.tup_deleted` Change per second
DB [{#DBNAME}]: Tuples fetched per second	Total number of rows fetched by queries in this database per second.	Dependent item	pgsql.dbstat.tup_fetched.rate["{#DBNAME}"] Preprocessing JSON Path: `$.tup_fetched` Change per second
DB [{#DBNAME}]: Tuples inserted per second	Total number of rows inserted by queries in this database per second.	Dependent item	pgsql.dbstat.tup_inserted.rate["{#DBNAME}"] Preprocessing JSON Path: `$.tup_inserted` Change per second
DB [{#DBNAME}]: Tuples returned per second	Number of rows returned by queries in this database per second.	Dependent item	pgsql.dbstat.tup_returned.rate["{#DBNAME}"] Preprocessing JSON Path: `$.tup_returned` Change per second
DB [{#DBNAME}]: Tuples updated per second	Total number of rows updated by queries in this database per second.	Dependent item	pgsql.dbstat.tup_updated.rate["{#DBNAME}"] Preprocessing JSON Path: `$.tup_updated` Change per second
DB [{#DBNAME}]: Commits per second	Number of transactions in this database that have been committed per second.	Dependent item	pgsql.dbstat.xact_commit.rate["{#DBNAME}"] Preprocessing JSON Path: `$.xact_commit` Change per second
DB [{#DBNAME}]: Rollbacks per second	Total number of transactions in this database that have been rolled back.	Dependent item	pgsql.dbstat.xact_rollback.rate["{#DBNAME}"] Preprocessing JSON Path: `$.xact_rollback` Change per second
DB [{#DBNAME}]: Backends connected	Number of backends currently connected to this database.	Dependent item	pgsql.dbstat.numbackends["{#DBNAME}"] Preprocessing JSON Path: `$.numbackends`
DB [{#DBNAME}]: Checksum failures	Number of data page checksum failures detected in this database.	Dependent item	pgsql.dbstat.checksum_failures.rate["{#DBNAME}"] Preprocessing JSON Path: `$.checksum_failures` Matches regular expression: `^\d*$` ⛔️Custom on fail: Set value to: `-2` Change per second ⛔️Custom on fail: Set value to: `-1`
DB [{#DBNAME}]: Disk blocks read time per second	Time spent reading data file blocks by backends per second.	Dependent item	pgsql.dbstat.blkreadtime.rate["{#DBNAME}"] Preprocessing JSON Path: `$.blk_read_time` Custom multiplier: `0.001` Change per second
DB [{#DBNAME}]: Disk blocks write time	Time spent writing data file blocks by backends per second.	Dependent item	pgsql.dbstat.blkwritetime.rate["{#DBNAME}"] Preprocessing JSON Path: `$.blk_write_time` Custom multiplier: `0.001` Change per second
DB [{#DBNAME}]: Num of accessexclusive locks	Number of accessexclusive locks for this database.	Dependent item	pgsql.locks.accessexclusive["{#DBNAME}"] Preprocessing JSON Path: `$.accessexclusive`
DB [{#DBNAME}]: Num of accessshare locks	Number of accessshare locks for this database.	Dependent item	pgsql.locks.accessshare["{#DBNAME}"] Preprocessing JSON Path: `$.accessshare`
DB [{#DBNAME}]: Num of exclusive locks	Number of exclusive locks for this database.	Dependent item	pgsql.locks.exclusive["{#DBNAME}"] Preprocessing JSON Path: `$.exclusive`
DB [{#DBNAME}]: Num of rowexclusive locks	Number of rowexclusive locks for this database.	Dependent item	pgsql.locks.rowexclusive["{#DBNAME}"] Preprocessing JSON Path: `$.rowexclusive`
DB [{#DBNAME}]: Num of rowshare locks	Number of rowshare locks for this database.	Dependent item	pgsql.locks.rowshare["{#DBNAME}"] Preprocessing JSON Path: `$.rowshare` ⛔️Custom on fail: Discard value
DB [{#DBNAME}]: Num of sharerowexclusive locks	Number of total sharerowexclusive for this database.	Dependent item	pgsql.locks.sharerowexclusive["{#DBNAME}"] Preprocessing JSON Path: `$.sharerowexclusive`
DB [{#DBNAME}]: Num of shareupdateexclusive locks	Number of shareupdateexclusive locks for this database.	Dependent item	pgsql.locks.shareupdateexclusive["{#DBNAME}"] Preprocessing JSON Path: `$.shareupdateexclusive`
DB [{#DBNAME}]: Num of share locks	Number of share locks for this database.	Dependent item	pgsql.locks.share["{#DBNAME}"] Preprocessing JSON Path: `$.share`
DB [{#DBNAME}]: Num of locks total	Total number of locks in this database.	Dependent item	pgsql.locks.total["{#DBNAME}"] Preprocessing JSON Path: `$.total`
DB [{#DBNAME}]: Queries max maintenance time	Max maintenance query time for this database.	Dependent item	pgsql.queries.mro.time_max["{#DBNAME}"] Preprocessing JSON Path: `$.mro_time_max`
DB [{#DBNAME}]: Queries max query time	Max query time for this database.	Dependent item	pgsql.queries.query.time_max["{#DBNAME}"] Preprocessing JSON Path: `$.query_time_max`
DB [{#DBNAME}]: Queries max transaction time	Max transaction query time for this database.	Dependent item	pgsql.queries.tx.time_max["{#DBNAME}"] Preprocessing JSON Path: `$.tx_time_max`
DB [{#DBNAME}]: Queries slow maintenance count	Slow maintenance query count for this database.	Dependent item	pgsql.queries.mro.slow_count["{#DBNAME}"] Preprocessing JSON Path: `$.mro_slow_count`
DB [{#DBNAME}]: Queries slow query count	Slow query count for this database.	Dependent item	pgsql.queries.query.slow_count["{#DBNAME}"] Preprocessing JSON Path: `$.query_slow_count`
DB [{#DBNAME}]: Queries slow transaction count	Slow transaction query count for this database.	Dependent item	pgsql.queries.tx.slow_count["{#DBNAME}"] Preprocessing JSON Path: `$.tx_slow_count`
DB [{#DBNAME}]: Queries sum maintenance time	Sum maintenance query time for this database.	Dependent item	pgsql.queries.mro.time_sum["{#DBNAME}"] Preprocessing JSON Path: `$.mro_time_sum`
DB [{#DBNAME}]: Queries sum query time	Sum query time for this database.	Dependent item	pgsql.queries.query.time_sum["{#DBNAME}"] Preprocessing JSON Path: `$.query_time_sum`
DB [{#DBNAME}]: Queries sum transaction time	Sum transaction query time for this database.	Dependent item	pgsql.queries.tx.time_sum["{#DBNAME}"] Preprocessing JSON Path: `$.tx_time_sum`

Trigger prototypes for Database discovery

Name	Description	Expression
DB [{#DBNAME}]: Too many recovery conflicts	The primary and standby servers are in many ways loosely connected. Actions on the primary will have an effect on the standby. As a result, there is potential for negative interactions or conflicts between them. https://www.postgresql.org/docs/current/hot-standby.html#HOT-STANDBY-CONFLICT	`min(/PostgreSQL by Zabbix agent 2/pgsql.dbstat.conflicts.rate["{#DBNAME}"],5m) > {$PG.CONFLICTS.MAX.WARN:"{#DBNAME}"}`\|Average
DB [{#DBNAME}]: Deadlock occurred	Number of deadlocks detected per second exceeds {$PG.DEADLOCKS.MAX.WARN:"{#DBNAME}"} for 5m.	`min(/PostgreSQL by Zabbix agent 2/pgsql.dbstat.deadlocks.rate["{#DBNAME}"],5m) > {$PG.DEADLOCKS.MAX.WARN:"{#DBNAME}"}`\|High
DB [{#DBNAME}]: Checksum failures detected	Data page checksum failures were detected on that database: https://www.postgresql.org/docs/current/checksums.html	`last(/PostgreSQL by Zabbix agent 2/pgsql.dbstat.checksum_failures.rate["{#DBNAME}"])>0`\|Average
DB [{#DBNAME}]: Too many slow queries	The number of detected slow queries exceeds the limit of {$PG.SLOW_QUERIES.MAX.WARN:"{#DBNAME}"}.	`min(/PostgreSQL by Zabbix agent 2/pgsql.queries.query.slow_count["{#DBNAME}"],5m)>{$PG.SLOW_QUERIES.MAX.WARN:"{#DBNAME}"}`\|Warning

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template, or ask for help at ZABBIX forums

db_postgresql

View README Download JSON

PostgreSQL by Zabbix agent

Overview

This template is designed for the deployment of PostgreSQL monitoring by Zabbix via Zabbix agent and uses user parameters to run SQL queries with the psql command-line tool.

Requirements

Zabbix version: 6.0 and higher.

Tested versions

This template has been tested on:

PostgreSQL 10-15

Configuration

Setup

Note:

The template requires pg_isready and psql utilities to be installed on the same host with Zabbix agent.

Deploy Zabbix agent and create the PostgreSQL user for monitoring (<password> at your discretion) with proper access rights to your PostgreSQL instance.

For PostgreSQL version 10 and above:

CREATE USER zbx_monitor WITH PASSWORD '<PASSWORD>' INHERIT;
GRANT pg_monitor TO zbx_monitor;

For PostgreSQL version 9.6 and below:

CREATE USER zbx_monitor WITH PASSWORD '<PASSWORD>';
GRANT SELECT ON pg_stat_database TO zbx_monitor;

-- To collect WAL metrics, the user must have a `superuser` role.
ALTER USER zbx_monitor WITH SUPERUSER;

Copy the postgresql/ directory to the zabbix user home directory - /var/lib/zabbix/. The postgresql/ directory contains the files with SQL queries needed to obtain metrics from PostgreSQL instance.

If the home directory of the zabbix user doesn't exist, create it first:

mkdir -m u=rwx,g=rwx,o= -p /var/lib/zabbix
chown zabbix:zabbix /var/lib/zabbix

Copy the template_db_postgresql.conf file, containing user parameters, to the Zabbix agent configuration directory /etc/zabbix/zabbix_agentd.d/ and restart Zabbix agent service.

Note: if you want to use SSL/TLS encryption to protect communications with the remote PostgreSQL instance, you can modify the connection string in user parameters. For example, to enable required encryption in transport mode without identity checks you could append ?sslmode=required to the end of the connection string for all keys that use psql:

UserParameter=pgsql.bgwriter[*], psql -qtAX postgresql://"$3":"$4"@"$1":"$2"/"$5"?sslmode=required -f "/var/lib/zabbix/postgresql/pgsql.bgwriter.sql"

Consult the PostgreSQL documentation about protection modes and client connection parameters.

Also, it is assumed that you set up the PostgreSQL instance to work in the desired encryption mode. Check the PostgreSQL documentation for details.

Edit the pg_hba.conf configuration file to allow connections for the user zbx_monitor. For example, you could add one of the following rows to allow local TCP connections from the same host:

# TYPE  DATABASE        USER            ADDRESS                 METHOD
  host       all        zbx_monitor     localhost               trust
  host       all        zbx_monitor     127.0.0.1/32            md5
  host       all        zbx_monitor     ::1/128                 scram-sha-256

For more information please read the PostgreSQL documentation https://www.postgresql.org/docs/current/auth-pg-hba-conf.html.

Specify the host name or IP address in the {$PG.HOST} macro. Adjust the port number with {$PG.PORT} macro if needed.
Set the password that you specified in step 1 in the macro {$PG.PASSWORD}.

Macros used

Name	Description	Default
{$PG.CACHE_HITRATIO.MIN.WARN}	Minimum cache hit ratio percentage for trigger expression.	`90`
{$PG.CHECKPOINTS_REQ.MAX.WARN}	Maximum required checkpoint occurrences for trigger expression.	`5`
{$PG.CONFLICTS.MAX.WARN}	Maximum number of recovery conflicts for trigger expression.	`0`
{$PG.CONNTOTALPCT.MAX.WARN}	Maximum percentage of current connections for trigger expression.	`90`
{$PG.DATABASE}	Default PostgreSQL database for the connection.	`postgres`
{$PG.DEADLOCKS.MAX.WARN}	Maximum number of detected deadlocks for trigger expression.	`0`
{$PG.FROZENXIDPCTSTOP.MIN.HIGH}	Minimum frozen XID before stop percentage for trigger expression.	`75`
{$PG.HOST}	Hostname or IP of PostgreSQL host.	`localhost`
{$PG.LLD.FILTER.DBNAME}	Filter of discoverable databases.	`.+`
{$PG.LOCKS.MAX.WARN}	Maximum number of locks for trigger expression.	`100`
{$PG.PING_TIME.MAX.WARN}	Maximum time of connection response for trigger expression.	`1s`
{$PG.PORT}	PostgreSQL service port.	`5432`
{$PG.QUERY_ETIME.MAX.WARN}	Execution time limit for count of slow queries.	`30`
{$PG.REPL_LAG.MAX.WARN}	Maximum replication lag time for trigger expression.	`10m`
{$PG.SLOW_QUERIES.MAX.WARN}	Slow queries count threshold for a trigger.	`5`
{$PG.USER}	PostgreSQL username.	`zbx_monitor`
{$PG.PASSWORD}	PostgreSQL user password.	`<Put the password here>`

Items

Name	Description	Type	Key and additional info
Bgwriter: Buffers allocated per second	Number of buffers allocated per second.	Dependent item	pgsql.bgwriter.buffers_alloc.rate Preprocessing JSON Path: `$.buffers_alloc` Change per second
Bgwriter: Buffers written directly by a backend per second	Number of buffers written directly by a backend per second.	Dependent item	pgsql.bgwriter.buffers_backend.rate Preprocessing JSON Path: `$.buffers_backend` Change per second
Bgwriter: Times a backend executed its own fsync per second	Number of times a backend had to execute its own fsync call per second (normally the background writer handles those even when the backend does its own write).	Dependent item	pgsql.bgwriter.buffersbackendfsync.rate Preprocessing JSON Path: `$.buffers_backend_fsync` Change per second
Checkpoint: Buffers written during checkpoints per second	Number of buffers written during checkpoints per second.	Dependent item	pgsql.bgwriter.buffers_checkpoint.rate Preprocessing JSON Path: `$.buffers_checkpoint` Change per second
Checkpoint: Buffers written by the background writer per second	Number of buffers written by the background writer per second.	Dependent item	pgsql.bgwriter.buffers_clean.rate Preprocessing JSON Path: `$.buffers_clean` Change per second
Checkpoint: Requested per second	Number of requested checkpoints that have been performed per second.	Dependent item	pgsql.bgwriter.checkpoints_req.rate Preprocessing JSON Path: `$.checkpoints_req` Change per second
Checkpoint: Scheduled per second	Number of scheduled checkpoints that have been performed per second.	Dependent item	pgsql.bgwriter.checkpoints_timed.rate Preprocessing JSON Path: `$.checkpoints_timed` Change per second
Checkpoint: Checkpoint sync time per second	Total amount of time per second that has been spent in the portion of checkpoint processing where files are synchronized to disk.	Dependent item	pgsql.bgwriter.checkpointsynctime.rate Preprocessing JSON Path: `$.checkpoint_sync_time` Custom multiplier: `0.001` Change per second
Checkpoint: Checkpoint write time per second	Total amount of time per second that has been spent in the portion of checkpoint processing where files are written to disk.	Dependent item	pgsql.bgwriter.checkpointwritetime.rate Preprocessing JSON Path: `$.checkpoint_write_time` Custom multiplier: `0.001` Change per second
Bgwriter: Number of bgwriter cleaning scan stopped per second	Number of times the background writer stopped a cleaning scan because it had written too many buffers per second.	Dependent item	pgsql.bgwriter.maxwritten_clean.rate Preprocessing JSON Path: `$.maxwritten_clean` Change per second
PostgreSQL: Get bgwriter	Collect all metrics from pgstatbgwriter: https://www.postgresql.org/docs/current/monitoring-stats.html#PG-STAT-BGWRITER-VIEW	Zabbix agent	pgsql.bgwriter["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"]
PostgreSQL: Cache hit ratio, %	Cache hit ratio.	Zabbix agent	pgsql.cache.hit["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"]
PostgreSQL: Config hash	PostgreSQL configuration hash.	Zabbix agent	pgsql.config.hash["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"] Preprocessing Discard unchanged with heartbeat: `1h`
Connections sum: Active	Total number of connections executing a query.	Dependent item	pgsql.connections.sum.active Preprocessing JSON Path: `$.active`
Connections sum: Idle	Total number of connections waiting for a new client command.	Dependent item	pgsql.connections.sum.idle Preprocessing JSON Path: `$.idle`
Connections sum: Idle in transaction	Total number of connections in a transaction state but not executing a query.	Dependent item	pgsql.connections.sum.idleintransaction Preprocessing JSON Path: `$.idle_in_transaction`
Connections sum: Prepared	Total number of prepared transactions: https://www.postgresql.org/docs/current/sql-prepare-transaction.html	Dependent item	pgsql.connections.sum.prepared Preprocessing JSON Path: `$.prepared`
Connections sum: Total	Total number of connections.	Dependent item	pgsql.connections.sum.total Preprocessing JSON Path: `$.total`
Connections sum: Total, %	Total number of connections, in percentage.	Dependent item	pgsql.connections.sum.total_pct Preprocessing JSON Path: `$.total_pct`
Connections sum: Waiting	Total number of waiting connections: https://www.postgresql.org/docs/current/monitoring-stats.html#WAIT-EVENT-TABLE	Dependent item	pgsql.connections.sum.waiting Preprocessing JSON Path: `$.waiting`
PostgreSQL: Get connections sum	Collect all metrics from pgstatactivity: https://www.postgresql.org/docs/current/monitoring-stats.html#PG-STAT-ACTIVITY-VIEW	Zabbix agent	pgsql.connections.sum["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"]
PostgreSQL: Get dbstat	Collect all metrics from pgstatdatabase per database: https://www.postgresql.org/docs/current/monitoring-stats.html#PG-STAT-DATABASE-VIEW	Zabbix agent	pgsql.dbstat["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"]
PostgreSQL: Get locks	Collect all metrics from pg_locks per database: https://www.postgresql.org/docs/current/explicit-locking.html#LOCKING-TABLES	Zabbix agent	pgsql.locks["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"]
PostgreSQL: Ping time	Used to get the `SELECT 1` query execution time.	Zabbix agent	pgsql.ping.time["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"] Preprocessing Regular expression: `Time:\s+(\d+\.\d+)\s+ms \1` Custom multiplier: `0.001`
PostgreSQL: Ping	Used to test a connection to see if it is alive. It is set to 0 if the instance doesn't accept the connections.	Zabbix agent	pgsql.ping["{$PG.HOST}","{$PG.PORT}"] Preprocessing JavaScript: `return value.search(/accepting connections/)>0 ? 1 : 0` Discard unchanged with heartbeat: `1h`
PostgreSQL: Get queries	Collect all metrics by query execution time.	Zabbix agent	pgsql.queries["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}","{$PG.QUERY_ETIME.MAX.WARN}"]
PostgreSQL: Replication: Standby count	Number of standby servers.	Zabbix agent	pgsql.replication.count["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"]
PostgreSQL: Replication: Lag in seconds	Replication lag with master, in seconds.	Zabbix agent	pgsql.replication.lag.sec["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"]
PostgreSQL: Replication: Recovery role	Replication role: 1 — recovery is still in progress (standby mode), 0 — master mode.	Zabbix agent	pgsql.replication.recovery_role["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"]
PostgreSQL: Replication: Status	Replication status: 0 — streaming is down, 1 — streaming is up, 2 — master mode.	Zabbix agent	pgsql.replication.status["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"]
Transactions: Max active transaction time	Current max active transaction time.	Dependent item	pgsql.transactions.active Preprocessing JSON Path: `$.active`
Transactions: Max idle transaction time	Current max idle transaction time.	Dependent item	pgsql.transactions.idle Preprocessing JSON Path: `$.idle`
Transactions: Max prepared transaction time	Current max prepared transaction time.	Dependent item	pgsql.transactions.prepared Preprocessing JSON Path: `$.prepared`
Transactions: Max waiting transaction time	Current max waiting transaction time.	Dependent item	pgsql.transactions.waiting Preprocessing JSON Path: `$.waiting`
PostgreSQL: Get transactions	Collect metrics by transaction execution time.	Zabbix agent	pgsql.transactions["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"]
PostgreSQL: Uptime	Time since the server started.	Zabbix agent	pgsql.uptime["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"]
PostgreSQL: Version	PostgreSQL version.	Zabbix agent	pgsql.version["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"] Preprocessing Discard unchanged with heartbeat: `1d`
WAL: Segments count	Number of WAL segments.	Dependent item	pgsql.wal.count Preprocessing JSON Path: `$.count`
PostgreSQL: Get WAL	Collect write-ahead log (WAL) metrics.	Zabbix agent	pgsql.wal.stat["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"]
WAL: Bytes written	WAL write, in bytes.	Dependent item	pgsql.wal.write Preprocessing JSON Path: `$.write` Change per second

Triggers

Name	Description	Expression	Severity
PostgreSQL: Required checkpoints occur too frequently	Checkpoints are points in the sequence of transactions at which it is guaranteed that the heap and index data files have been updated with all information written before that checkpoint. At checkpoint time, all dirty data pages are flushed to disk and a special checkpoint record is written to the log file. https://www.postgresql.org/docs/current/wal-configuration.html	`last(/PostgreSQL by Zabbix agent/pgsql.bgwriter.checkpoints_req.rate) > {$PG.CHECKPOINTS_REQ.MAX.WARN}`\|Average
PostgreSQL: Failed to get items	Zabbix has not received any data for items for the last 30 minutes.	`nodata(/PostgreSQL by Zabbix agent/pgsql.bgwriter["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"],30m) = 1`\|Warning	Depends on: PostgreSQL: Service is down
PostgreSQL: Cache hit ratio too low	Cache hit ratio is lower than {$PG.CACHE_HITRATIO.MIN.WARN} for 5m.	`max(/PostgreSQL by Zabbix agent/pgsql.cache.hit["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"],5m) < {$PG.CACHE_HITRATIO.MIN.WARN}`\|Warning
PostgreSQL: Configuration has changed	PostgreSQL configuration has changed.	`last(/PostgreSQL by Zabbix agent/pgsql.config.hash["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"],#1)<>last(/PostgreSQL by Zabbix agent/pgsql.config.hash["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"],#2) and length(last(/PostgreSQL by Zabbix agent/pgsql.config.hash["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"]))>0`\|Info
PostgreSQL: Total number of connections is too high	Total number of current connections exceeds the limit of {$PG.CONNTOTALPCT.MAX.WARN}% out of the maximum number of concurrent connections to the database server (the "max_connections" setting).	`min(/PostgreSQL by Zabbix agent/pgsql.connections.sum.total_pct,5m) > {$PG.CONN_TOTAL_PCT.MAX.WARN}`\|Average
PostgreSQL: Response too long	Response is taking too long (over {$PG.PING_TIME.MAX.WARN} for 5m).	`min(/PostgreSQL by Zabbix agent/pgsql.ping.time["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"],5m) > {$PG.PING_TIME.MAX.WARN}`\|Average	Depends on: PostgreSQL: Service is down
PostgreSQL: Service is down	Last test of a connection was unsuccessful.	`last(/PostgreSQL by Zabbix agent/pgsql.ping["{$PG.HOST}","{$PG.PORT}"]) = 0`\|High
PostgreSQL: Streaming lag with master is too high	Replication lag with master is higher than {$PG.REPL_LAG.MAX.WARN} for 5m.	`min(/PostgreSQL by Zabbix agent/pgsql.replication.lag.sec["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"],5m) > {$PG.REPL_LAG.MAX.WARN}`\|Average
PostgreSQL: Replication is down	Replication is enabled and data streaming was down for 5m.	`max(/PostgreSQL by Zabbix agent/pgsql.replication.status["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"],5m)=0`\|Average
PostgreSQL: Service has been restarted	PostgreSQL uptime is less than 10 minutes.	`last(/PostgreSQL by Zabbix agent/pgsql.uptime["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"]) < 10m`\|Average
PostgreSQL: Version has changed		`last(/PostgreSQL by Zabbix agent/pgsql.version["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"],#1)<>last(/PostgreSQL by Zabbix agent/pgsql.version["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"],#2) and length(last(/PostgreSQL by Zabbix agent/pgsql.version["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"]))>0`\|Info

LLD rule Database discovery

Name

Description

Type

Key and additional info

Database discovery

Discovers databases (DB) in the database management system (DBMS), except:

- templates;

- default "postgres" DB;

- DBs that do not allow connections.

Zabbix agent

pgsql.discovery.db["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"]

Item prototypes for Database discovery

Name	Description	Type	Key and additional info
DB [{#DBNAME}]: Get dbstat	Get dbstat metrics for database "{#DBNAME}".	Dependent item	pgsql.dbstat.get_metrics["{#DBNAME}"] Preprocessing JSON Path: `$['{#DBNAME}']` ⛔️Custom on fail: Discard value
DB [{#DBNAME}]: Get queries	Get queries metrics for database "{#DBNAME}".	Dependent item	pgsql.queries.get_metrics["{#DBNAME}"] Preprocessing JSON Path: `$['{#DBNAME}']` ⛔️Custom on fail: Discard value
DB [{#DBNAME}]: Database size	Database size.	Zabbix agent	pgsql.db.size["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}","{#DBNAME}"]
DB [{#DBNAME}]: Blocks hit per second	Total number of times per second disk blocks were found already in the buffer cache, so that a read was not necessary.	Dependent item	pgsql.dbstat.blks_hit.rate["{#DBNAME}"] Preprocessing JSON Path: `$.blks_hit` Change per second
DB [{#DBNAME}]: Disk blocks read per second	Total number of disk blocks read per second in this database.	Dependent item	pgsql.dbstat.blks_read.rate["{#DBNAME}"] Preprocessing JSON Path: `$.blks_read` Change per second
DB [{#DBNAME}]: Detected conflicts per second	Total number of queries canceled due to conflicts with recovery in this database per second.	Dependent item	pgsql.dbstat.conflicts.rate["{#DBNAME}"] Preprocessing JSON Path: `$.conflicts` Change per second
DB [{#DBNAME}]: Detected deadlocks per second	Total number of detected deadlocks in this database per second.	Dependent item	pgsql.dbstat.deadlocks.rate["{#DBNAME}"] Preprocessing JSON Path: `$.deadlocks` Change per second
DB [{#DBNAME}]: Temp_bytes written per second	Total amount of data written to temporary files by queries in this database.	Dependent item	pgsql.dbstat.temp_bytes.rate["{#DBNAME}"] Preprocessing JSON Path: `$.temp_bytes` Change per second
DB [{#DBNAME}]: Temp_files created per second	Total number of temporary files created by queries in this database.	Dependent item	pgsql.dbstat.temp_files.rate["{#DBNAME}"] Preprocessing JSON Path: `$.temp_files` Change per second
DB [{#DBNAME}]: Tuples deleted per second	Total number of rows deleted by queries in this database per second.	Dependent item	pgsql.dbstat.tup_deleted.rate["{#DBNAME}"] Preprocessing JSON Path: `$.tup_deleted` Change per second
DB [{#DBNAME}]: Tuples fetched per second	Total number of rows fetched by queries in this database per second.	Dependent item	pgsql.dbstat.tup_fetched.rate["{#DBNAME}"] Preprocessing JSON Path: `$.tup_fetched` Change per second
DB [{#DBNAME}]: Tuples inserted per second	Total number of rows inserted by queries in this database per second.	Dependent item	pgsql.dbstat.tup_inserted.rate["{#DBNAME}"] Preprocessing JSON Path: `$.tup_inserted` Change per second
DB [{#DBNAME}]: Tuples returned per second	Number of rows returned by queries in this database per second.	Dependent item	pgsql.dbstat.tup_returned.rate["{#DBNAME}"] Preprocessing JSON Path: `$.tup_returned` Change per second
DB [{#DBNAME}]: Tuples updated per second	Total number of rows updated by queries in this database per second.	Dependent item	pgsql.dbstat.tup_updated.rate["{#DBNAME}"] Preprocessing JSON Path: `$.tup_updated` Change per second
DB [{#DBNAME}]: Commits per second	Number of transactions in this database that have been committed per second.	Dependent item	pgsql.dbstat.xact_commit.rate["{#DBNAME}"] Preprocessing JSON Path: `$.xact_commit` Change per second
DB [{#DBNAME}]: Rollbacks per second	Total number of transactions in this database that have been rolled back.	Dependent item	pgsql.dbstat.xact_rollback.rate["{#DBNAME}"] Preprocessing JSON Path: `$.xact_rollback` Change per second
DB [{#DBNAME}]: Frozen XID before autovacuum, %	Preventing Transaction ID Wraparound Failures: https://www.postgresql.org/docs/current/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND	Dependent item	pgsql.frozenxid.prcbeforeav["{#DBNAME}"] Preprocessing JSON Path: `$.prc_before_av`
DB [{#DBNAME}]: Frozen XID before stop, %	Preventing Transaction ID Wraparound Failures: https://www.postgresql.org/docs/current/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND	Dependent item	pgsql.frozenxid.prcbeforestop["{#DBNAME}"] Preprocessing JSON Path: `$.prc_before_stop`
DB [{#DBNAME}]: Get frozen XID		Zabbix agent	pgsql.frozenxid["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{#DBNAME}"]
DB [{#DBNAME}]: Num of locks total	Total number of locks in this database.	Dependent item	pgsql.locks.total["{#DBNAME}"] Preprocessing JSON Path: `$['{#DBNAME}'].total`
DB [{#DBNAME}]: Queries slow maintenance count	Slow maintenance query count for this database.	Dependent item	pgsql.queries.mro.slow_count["{#DBNAME}"] Preprocessing JSON Path: `$.mro_slow_count`
DB [{#DBNAME}]: Queries max maintenance time	Max maintenance query time for this database.	Dependent item	pgsql.queries.mro.time_max["{#DBNAME}"] Preprocessing JSON Path: `$.mro_time_max`
DB [{#DBNAME}]: Queries sum maintenance time	Sum maintenance query time for this database.	Dependent item	pgsql.queries.mro.time_sum["{#DBNAME}"] Preprocessing JSON Path: `$.mro_time_sum`
DB [{#DBNAME}]: Queries slow query count	Slow query count for this database.	Dependent item	pgsql.queries.query.slow_count["{#DBNAME}"] Preprocessing JSON Path: `$.query_slow_count`
DB [{#DBNAME}]: Queries max query time	Max query time for this database.	Dependent item	pgsql.queries.query.time_max["{#DBNAME}"] Preprocessing JSON Path: `$.query_time_max`
DB [{#DBNAME}]: Queries sum query time	Sum query time for this database.	Dependent item	pgsql.queries.query.time_sum["{#DBNAME}"] Preprocessing JSON Path: `$.query_time_sum`
DB [{#DBNAME}]: Queries slow transaction count	Slow transaction query count for this database.	Dependent item	pgsql.queries.tx.slow_count["{#DBNAME}"] Preprocessing JSON Path: `$.tx_slow_count`
DB [{#DBNAME}]: Queries max transaction time	Max transaction query time for this database.	Dependent item	pgsql.queries.tx.time_max["{#DBNAME}"] Preprocessing JSON Path: `$.tx_time_max`
DB [{#DBNAME}]: Queries sum transaction time	Sum transaction query time for this database.	Dependent item	pgsql.queries.tx.time_sum["{#DBNAME}"] Preprocessing JSON Path: `$.tx_time_sum`
DB [{#DBNAME}]: Index scans per second	Number of index scans in the database per second.	Dependent item	pgsql.scans.idx.rate["{#DBNAME}"] Preprocessing JSON Path: `$.idx` Change per second
DB [{#DBNAME}]: Sequential scans per second	Number of sequential scans in this database per second.	Dependent item	pgsql.scans.seq.rate["{#DBNAME}"] Preprocessing JSON Path: `$.seq` Change per second
DB [{#DBNAME}]: Get scans	Number of scans done for table/index in this database.	Zabbix agent	pgsql.scans["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{#DBNAME}"]

Trigger prototypes for Database discovery

Name	Description	Expression
DB [{#DBNAME}]: Too many recovery conflicts	The primary and standby servers are in many ways loosely connected. Actions on the primary will have an effect on the standby. As a result, there is potential for negative interactions or conflicts between them. https://www.postgresql.org/docs/current/hot-standby.html#HOT-STANDBY-CONFLICT	`min(/PostgreSQL by Zabbix agent/pgsql.dbstat.conflicts.rate["{#DBNAME}"],5m) > {$PG.CONFLICTS.MAX.WARN:"{#DBNAME}"}`\|Average
DB [{#DBNAME}]: Deadlock occurred	Number of deadlocks detected per second exceeds {$PG.DEADLOCKS.MAX.WARN:"{#DBNAME}"} for 5m.	`min(/PostgreSQL by Zabbix agent/pgsql.dbstat.deadlocks.rate["{#DBNAME}"],5m) > {$PG.DEADLOCKS.MAX.WARN:"{#DBNAME}"}`\|High
DB [{#DBNAME}]: VACUUM FREEZE is required to prevent wraparound	Preventing Transaction ID Wraparound Failures: https://www.postgresql.org/docs/current/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND	`last(/PostgreSQL by Zabbix agent/pgsql.frozenxid.prc_before_stop["{#DBNAME}"])<{$PG.FROZENXID_PCT_STOP.MIN.HIGH:"{#DBNAME}"}`\|Average
DB [{#DBNAME}]: Number of locks is too high		`min(/PostgreSQL by Zabbix agent/pgsql.locks.total["{#DBNAME}"],5m)>{$PG.LOCKS.MAX.WARN:"{#DBNAME}"}`\|Warning
DB [{#DBNAME}]: Too many slow queries	The number of detected slow queries exceeds the limit of {$PG.SLOW_QUERIES.MAX.WARN:"{#DBNAME}"}.	`min(/PostgreSQL by Zabbix agent/pgsql.queries.query.slow_count["{#DBNAME}"],5m)>{$PG.SLOW_QUERIES.MAX.WARN:"{#DBNAME}"}`\|Warning

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template, or ask for help at ZABBIX forums

db_oracle_odbc

View README Download JSON

Oracle by ODBC

Overview

The template is developed to monitor a single DBMS Oracle Database instance with ODBC and can monitor CDB or non-CDB installations.

Supported versions

Oracle Database 12c2 and newer.

Requirements

Zabbix version: 6.0 and higher.

Tested versions

This template has been tested on:

Oracle Database 12c2, 18c, 19c, 21c, 23c

Configuration

Setup

Create an Oracle Database user for monitoring:

In CDB installations, it is possible to monitor tablespaces from the CDB (container database) and all PDBs (pluggable databases). To do so, a common user is needed with the correct rights:

CREATE USER c##zabbix_mon IDENTIFIED BY <PASSWORD>;
-- Grant access to the c##zabbix_mon user.
ALTER USER c##zabbix_mon SET CONTAINER_DATA=ALL CONTAINER=CURRENT;
GRANT CONNECT, CREATE SESSION TO c##zabbix_mon;
GRANT SELECT_CATALOG_ROLE to c##zabbix_mon;
GRANT SELECT ON v_$instance TO c##zabbix_mon;
GRANT SELECT ON v_$database TO c##zabbix_mon;
GRANT SELECT ON v_$sysmetric TO c##zabbix_mon;
GRANT SELECT ON v_$system_parameter TO c##zabbix_mon;
GRANT SELECT ON v_$session TO c##zabbix_mon;
GRANT SELECT ON v_$recovery_file_dest TO c##zabbix_mon;
GRANT SELECT ON v_$active_session_history TO c##zabbix_mon;
GRANT SELECT ON v_$osstat TO c##zabbix_mon;
GRANT SELECT ON v_$process TO c##zabbix_mon;
GRANT SELECT ON v_$datafile TO c##zabbix_mon;
GRANT SELECT ON v_$pgastat TO c##zabbix_mon;
GRANT SELECT ON v_$sgastat TO c##zabbix_mon;
GRANT SELECT ON v_$log TO c##zabbix_mon;
GRANT SELECT ON v_$archive_dest TO c##zabbix_mon;
GRANT SELECT ON v_$asm_diskgroup TO c##zabbix_mon;
GRANT SELECT ON v_$asm_diskgroup_stat TO c##zabbix_mon;
GRANT SELECT ON DBA_USERS TO c##zabbix_mon;

This is needed because the template uses CDB_* views to monitor tablespaces from the CDB and different PDBs - the monitoring user therefore needs access to the container data objects on all PDBs.

However, if you wish to monitor only a single PDB or a non-CDB instance, a local user is sufficient:

CREATE USER zabbix_mon IDENTIFIED BY <PASSWORD>;
-- Grant access to the zabbix_mon user.
GRANT CONNECT, CREATE SESSION TO zabbix_mon;
GRANT SELECT_CATALOG_ROLE to zabbix_mon;
GRANT SELECT ON v_$instance TO zabbix_mon;
GRANT SELECT ON v_$database TO zabbix_mon;
GRANT SELECT ON v_$sysmetric TO zabbix_mon;
GRANT SELECT ON v_$system_parameter TO zabbix_mon;
GRANT SELECT ON v_$session TO zabbix_mon;
GRANT SELECT ON v_$recovery_file_dest TO zabbix_mon;
GRANT SELECT ON v_$active_session_history TO zabbix_mon;
GRANT SELECT ON v_$osstat TO zabbix_mon;
GRANT SELECT ON v_$process TO zabbix_mon;
GRANT SELECT ON v_$datafile TO zabbix_mon;
GRANT SELECT ON v_$pgastat TO zabbix_mon;
GRANT SELECT ON v_$sgastat TO zabbix_mon;
GRANT SELECT ON v_$log TO zabbix_mon;
GRANT SELECT ON v_$archive_dest TO zabbix_mon;
GRANT SELECT ON v_$asm_diskgroup TO zabbix_mon;
GRANT SELECT ON v_$asm_diskgroup_stat TO zabbix_mon;
GRANT SELECT ON DBA_USERS TO zabbix_mon;

Important! Ensure that the ODBC connection to Oracle includes the session parameter NLS_NUMERIC_CHARACTERS= '.,'. It is important for displaying the float numbers in Zabbix correctly.

Important! These privileges grant the monitoring user SELECT_CATALOG_ROLE, which, in turn, gives access to thousands of tables in the database. This role is required to access the V$RESTORE_POINT dynamic performance view. However, there are ways to go around this, if the SELECT_CATALOG_ROLE assigned to a monitoring user raises any security issues. One way to do this is using pipelined table functions:

Log into your database as the SYS user or make sure that your administration user has the required privileges to execute the steps below;

Create types for the table function:

CREATE OR REPLACE TYPE zbx_mon_restore_point_row AS OBJECT (
  SCN                           NUMBER,
  DATABASE_INCARNATION#         NUMBER,
  GUARANTEE_FLASHBACK_DATABASE  VARCHAR2(3),
  STORAGE_SIZE                  NUMBER,
  TIME                          TIMESTAMP(9),
  RESTORE_POINT_TIME            TIMESTAMP(9),
  PRESERVED                     VARCHAR2(3),
  NAME                          VARCHAR2(128),
  PDB_RESTORE_POINT             VARCHAR2(3),
  CLEAN_PDB_RESTORE_POINT       VARCHAR2(3),
  PDB_INCARNATION#              NUMBER,
  REPLICATED                    VARCHAR2(3),
  CON_ID                        NUMBER
);
CREATE OR REPLACE TYPE zbx_mon_restore_point_tab IS TABLE OF zbx_mon_restore_point_row;

Create the pipelined table function:

CREATE OR REPLACE FUNCTION zbx_mon_restore_point RETURN zbx_mon_restore_point_tab PIPELINED AS
BEGIN
  FOR i IN (SELECT * FROM V$RESTORE_POINT) LOOP
    PIPE ROW (zbx_mon_restore_point_row(i.SCN, i.DATABASE_INCARNATION#, i.GUARANTEE_FLASHBACK_DATABASE, i.STORAGE_SIZE, i.TIME, i.RESTORE_POINT_TIME, i.PRESERVED, i.NAME, i.PDB_RESTORE_POINT, i.CLEAN_PDB_RESTORE_POINT, i.PDB_INCARNATION#, i.REPLICATED, i.CON_ID));
  END LOOP;
  RETURN;
END;

Grant the Zabbix monitoring user the Execute privilege on the created pipelined table function and replace the monitoring user V$RESTORE_POINT view with the SYS user function (in this example, the SYS user is used to create DB types and function):
```
GRANT EXECUTE ON zbx_mon_restore_point TO c##zabbix_mon;
CREATE OR REPLACE VIEW c##zabbix_mon.V$RESTORE_POINT AS SELECT * FROM TABLE(SYS.zbx_mon_restore_point);
  
```

Finally, revoke the SELECT_CATALOG_ROLE and grant additional permissions that were previously covered by the SELECT_CATALOG_ROLE.

REVOKE SELECT_CATALOG_ROLE FROM c##zabbix_mon;
GRANT SELECT ON v_$pdbs TO c##zabbix_mon;
GRANT SELECT ON v_$sort_segment TO c##zabbix_mon;
GRANT SELECT ON v_$parameter TO c##zabbix_mon;
GRANT SELECT ON CDB_TABLESPACES TO c##zabbix_mon;
GRANT SELECT ON CDB_DATA_FILES TO c##zabbix_mon;
GRANT SELECT ON CDB_FREE_SPACE TO c##zabbix_mon;
GRANT SELECT ON CDB_TEMP_FILES TO c##zabbix_mon;

If this workaround does not work for you, there are more options available, such as materialized views, but look out for data refresh as V$RESTORE_POINT is a dynamic performance view.

Install the ODBC driver on Zabbix server or Zabbix proxy. See the Oracle documentation for instructions.
Configure Zabbix server or Zabbix proxy for using the Oracle environment:

This step is required only when:
- installing Oracle Instant Client with .rpm packages with a version < 19.3 (if Instant Client is the only Oracle software installed on Zabbix server or Zabbix proxy);
- installing Oracle Instant Client manually with .zip files.
There are multiple configuration options:
1. Using the LDCONFIG utility (recommended option):
  
  To update the runtime link path, it is recommended to use the LDCONFIG utility, for example:
```
# sh -c "echo /opt/oracle/instantclient_19_18 > /etc/ld.so.conf.d/oracle-instantclient.conf"
# ldconfig
  
```
2. Using the application configuration file:
  
  An alternative solution is to export the required variables by editing or adding a new application configuration file:
  - /etc/sysconfig/zabbix-server # for server
  - /etc/sysconfig/zabbix-proxy # for proxy
    
    And then, adding:
```
# Oracle Instant Client library
LD_LIBRARY_PATH=/opt/oracle/instantclient_19_18:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH
```
Keep in mind that the library paths will vary depending on your installation.

This is a minimal configuration example. Depending on the Oracle Instant Client version, required functionality and host operating system, a different set of additional packages might need to be installed. For more detailed configuration instructions, see the official Oracle Instant Client installation instructions for Linux.
Restart Zabbix server or Zabbix proxy.
Set the username and password in the host macros {$ORACLE.USER} and {$ORACLE.PASSWORD}.
Set the {$ORACLE.DRIVER} and {$ORACLE.SERVICE} in the host macros.
- {$ORACLE.DRIVER} is a path to the driver location in the OS. The ODBC driver file should be found in the Instant Client directory and named libsqora.so.XX.Y.
- {$ORACLE.SERVICE} is a service name to which the host will connect to. The value in this macro is important as it determines if the connection is established to a non-CDB, CDB, or PDB. If you wish to monitor tablespaces of all PDBs, you will need to set a service name that points to the CDB. Active service names can be seen from the instance running Oracle Database with lsnrctl status.
Important! Make sure that the user created in step #1 is present on the specified service.

The "Service's TCP port state" item uses {HOST.CONN} and {$ORACLE.PORT} macros to check the availability of the listener.

Macros used

Name	Description	Default
{$ORACLE.DRIVER}	Oracle driver path. For example: `/usr/lib/oracle/21/client64/lib/libsqora.so.21.1`	`<Put path to oracle driver here>`
{$ORACLE.SERVICE}	Oracle Service Name.	`<Put oracle service name here>`
{$ORACLE.USER}	Oracle username.	`<Put your username here>`
{$ORACLE.PASSWORD}	Oracle user's password.	`<Put your password here>`
{$ORACLE.PORT}	Oracle Database TCP port.	`1521`
{$ORACLE.DBNAME.MATCHES}	Used in database discovery. It can be overridden on the host or linked template level.	`.*`
{$ORACLE.DBNAME.NOT_MATCHES}	Used in database discovery. It can be overridden on the host or linked template level.	`PDB\$SEED`
{$ORACLE.TABLESPACE.CONTAINER.MATCHES}	Used in tablespace discovery. It can be overridden on the host or linked template level.	`.*`
{$ORACLE.TABLESPACE.CONTAINER.NOT_MATCHES}	Used in tablespace discovery. It can be overridden on the host or linked template level.	`CHANGE_IF_NEEDED`
{$ORACLE.TABLESPACE.NAME.MATCHES}	Used in tablespace discovery. It can be overridden on the host or linked template level.	`.*`
{$ORACLE.TABLESPACE.NAME.NOT_MATCHES}	Used in tablespace discovery. It can be overridden on the host or linked template level.	`CHANGE_IF_NEEDED`
{$ORACLE.TBS.USED.PCT.FROM.MAX.WARN}	Warning severity alert threshold for the maximum percentage of tablespace usage from maximum tablespace size (used bytes/max bytes) for the Warning trigger expression.	`90`
{$ORACLE.TBS.USED.PCT.FROM.MAX.HIGH}	High severity alert threshold for the maximum percentage of tablespace usage (used bytes/max bytes) for the High trigger expression.	`95`
{$ORACLE.TBS.USED.PCT.MAX.WARN}	Warning severity alert threshold for the maximum percentage of tablespace usage (used bytes/allocated bytes) for the Warning trigger expression.	`90`
{$ORACLE.TBS.USED.PCT.MAX.HIGH}	High severity alert threshold for the maximum percentage of tablespace usage (used bytes/allocated bytes) for the High trigger expression.	`95`
{$ORACLE.TBS.UTIL.PCT.MAX.WARN}	Warning severity alert threshold for the maximum percentage of tablespace utilization (allocated bytes/max bytes) for the High trigger expression.	`80`
{$ORACLE.TBS.UTIL.PCT.MAX.HIGH}	High severity alert threshold for the maximum percentage of tablespace utilization (allocated bytes/max bytes) for the High trigger expression.	`90`
{$ORACLE.PROCESSES.MAX.WARN}	Alert threshold for the maximum percentage of active processes for the Warning trigger expression.	`80`
{$ORACLE.SESSIONS.MAX.WARN}	Alert threshold for the maximum percentage of active sessions for the Warning trigger expression.	`80`
{$ORACLE.DB.FILE.MAX.WARN}	The maximum percentage of used database files for the Warning trigger expression.	`80`
{$ORACLE.PGA.USE.MAX.WARN}	Alert threshold for the maximum percentage of the Program Global Area (PGA) usage for the Warning trigger expression.	`90`
{$ORACLE.SESSIONS.LOCK.MAX.WARN}	Alert threshold for the maximum percentage of locked sessions for the Warning trigger expression.	`20`
{$ORACLE.SESSION.LOCK.MAX.TIME}	The maximum duration of the session lock in seconds to count the session as a prolongedly locked query.	`600`
{$ORACLE.SESSION.LONG.LOCK.MAX.WARN}	Alert threshold for the maximum number of the prolongedly locked sessions for the Warning trigger expression.	`3`
{$ORACLE.CONCURRENCY.MAX.WARN}	The maximum percentage of session concurrency for the Warning trigger expression.	`80`
{$ORACLE.REDO.MIN.WARN}	Alert threshold for the minimum number of redo logs for the Warning trigger expression.	`3`
{$ORACLE.SHARED.FREE.MIN.WARN}	Alert threshold for the minimum percentage of free shared pool for the Warning trigger expression.	`5`
{$ORACLE.EXPIRE.PASSWORD.MIN.WARN}	The number of days before the password expires for the Warning trigger expression.	`7`
{$ORACLE.ASM.USED.PCT.MAX.WARN}	The maximum percentage of used space in the Automatic Storage Management (ASM) disk group for the Warning trigger expression.	`90`
{$ORACLE.ASM.USED.PCT.MAX.HIGH}	The maximum percentage of used space in the Automatic Storage Management (ASM) disk group for the High trigger expression.	`95`

Items

Name	Description	Type	Key and additional info
Oracle: Service's TCP port state	Checks the availability of Oracle on the TCP port.	Zabbix agent	net.tcp.service[tcp,{HOST.CONN},{$ORACLE.PORT}] Preprocessing Discard unchanged with heartbeat: `10m`
Oracle: Number of LISTENER processes	The number of running listener processes.	Zabbix agent	proc.num[,,,"tnslsnr LISTENER"] Preprocessing Discard unchanged with heartbeat: `10m`
Oracle: Get instance state	Gets the state of the current instance.	Database monitor	db.odbc.get[getinstancestate,,"Driver={$ORACLE.DRIVER};DBQ=//{HOST.CONN}:{$ORACLE.PORT}/{$ORACLE.SERVICE};"]
Oracle: Get archive log	Gets the destinations of the log archive.	Database monitor	db.odbc.get[get_archivelog,,"Driver={$ORACLE.DRIVER};DBQ=//{HOST.CONN}:{$ORACLE.PORT}/{$ORACLE.SERVICE};"]
Oracle: Get ASM disk groups	Gets the ASM disk groups.	Database monitor	db.odbc.get[get_asm,,"Driver={$ORACLE.DRIVER};DBQ=//{HOST.CONN}:{$ORACLE.PORT}/{$ORACLE.SERVICE};"]
Oracle: Get database	Gets the databases in the database management system (DBMS).	Database monitor	db.odbc.get[get_db,,"Driver={$ORACLE.DRIVER};DBQ=//{HOST.CONN}:{$ORACLE.PORT}/{$ORACLE.SERVICE};"]
Oracle: Get PDB	Gets the pluggable database (PDB) in DBMS.	Database monitor	db.odbc.get[get_pdb,,"Driver={$ORACLE.DRIVER};DBQ=//{HOST.CONN}:{$ORACLE.PORT}/{$ORACLE.SERVICE};"]
Oracle: Get tablespace	Gets tablespaces in DBMS.	Database monitor	db.odbc.get[get_tablespace,,"Driver={$ORACLE.DRIVER};DBQ=//{HOST.CONN}:{$ORACLE.PORT}/{$ORACLE.SERVICE};"]
Oracle: Version	The Oracle Server version.	Dependent item	oracle.version Preprocessing JSON Path: `$..VERSION.first()` Discard unchanged with heartbeat: `1d`
Oracle: Uptime	The Oracle instance uptime expressed in seconds.	Dependent item	oracle.uptime Preprocessing JSON Path: `$..UPTIME.first()`
Oracle: Instance status	The status of the instance.	Dependent item	oracle.instance_status Preprocessing JSON Path: `$..STATUS.first()`
Oracle: Archiver state	The status of automatic archiving.	Dependent item	oracle.archiver_state Preprocessing JSON Path: `$..ARCHIVER.first()`
Oracle: Instance name	The name of the instance.	Dependent item	oracle.instance_name Preprocessing JSON Path: `$..INSTANCE_NAME.first()`
Oracle: Instance hostname	The name of the host machine.	Dependent item	oracle.instance_hostname Preprocessing JSON Path: `$..HOST_NAME.first()`
Oracle: Instance role	Indicates whether the instance is an active instance or an inactive secondary instance.	Dependent item	oracle.instance.role Preprocessing JSON Path: `$..INSTANCE_ROLE.first()`
Oracle: Get system metrics	Gets the values of the system metrics.	Database monitor	db.odbc.get[getsystemmetrics,,"Driver={$ORACLE.DRIVER};DBQ=//{HOST.CONN}:{$ORACLE.PORT}/{$ORACLE.SERVICE};"]
Oracle: Sessions limit	The user and system sessions.	Dependent item	oracle.session_limit Preprocessing JSON Path: `$[?(@.METRIC=='SYSPARAM::Sessions')].VALUE.first()`
Oracle: Datafiles limit	The maximum allowable number of datafiles.	Dependent item	oracle.dbfileslimit Preprocessing JSON Path: `$[?(@.METRIC=='SYSPARAM::Db_Files')].VALUE.first()`
Oracle: Processes limit	The maximum number of user processes.	Dependent item	oracle.processes_limit Preprocessing JSON Path: `$[?(@.METRIC=='SYSPARAM::Processes')].VALUE.first()`
Oracle: Number of processes	The current number of user processes.	Dependent item	oracle.processes_count Preprocessing JSON Path: `$[?(@.METRIC=='PROC::Procnum')].VALUE.first()`
Oracle: Datafiles count	The current number of datafiles.	Dependent item	oracle.dbfilescount Preprocessing JSON Path: `$[?(@.METRIC=='DATAFILE::Count')].VALUE.first()`
Oracle: Buffer cache hit ratio	The ratio of buffer cache hits ((LogRead - PhyRead)/LogRead).	Dependent item	oracle.buffercachehit_ratio Preprocessing JSON Path: `$[?(@.METRIC=='SYS::Buffer Cache Hit Ratio')].VALUE.first()`
Oracle: Cursor cache hit ratio	The ratio of cursor cache hits (CursorCacheHit/SoftParse).	Dependent item	oracle.cursorcachehit_ratio Preprocessing JSON Path: `$[?(@.METRIC=='SYS::Cursor Cache Hit Ratio')].VALUE.first()`
Oracle: Library cache hit ratio	The ratio of library cache hits (Hits/Pins).	Dependent item	oracle.librarycachehit_ratio Preprocessing JSON Path: `$[?(@.METRIC=='SYS::Library Cache Hit Ratio')].VALUE.first()`
Oracle: Shared pool free %	Free memory of a shared pool expressed in %.	Dependent item	oracle.sharedpoolfree Preprocessing JSON Path: `$[?(@.METRIC=='SYS::Shared Pool Free %')].VALUE.first()`
Oracle: Physical reads per second	Reads per second.	Dependent item	oracle.physicalreadsrate Preprocessing JSON Path: `$[?(@.METRIC=='SYS::Physical Reads Per Sec')].VALUE.first()`
Oracle: Physical writes per second	Writes per second.	Dependent item	oracle.physicalwritesrate Preprocessing JSON Path: `$[?(@.METRIC=='SYS::Physical Writes Per Sec')].VALUE.first()`
Oracle: Physical reads bytes per second	Read bytes per second.	Dependent item	oracle.physicalreadbytes_rate Preprocessing JSON Path: `The text is too long. Please see the template.`
Oracle: Physical writes bytes per second	Write bytes per second.	Dependent item	oracle.physicalwritebytes_rate Preprocessing JSON Path: `The text is too long. Please see the template.`
Oracle: Enqueue timeouts per second	Enqueue timeouts per second.	Dependent item	oracle.enqueuetimeoutsrate Preprocessing JSON Path: `The text is too long. Please see the template.`
Oracle: GC CR block received per second	The global cache (GC) and the consistent read (CR) block received per second.	Dependent item	oracle.gccrblockreceivedrate Preprocessing JSON Path: `The text is too long. Please see the template.`
Oracle: Global cache blocks corrupted	The number of blocks that encountered corruption or checksum failure during the interconnect.	Dependent item	oracle.cacheblockscorrupt Preprocessing JSON Path: `The text is too long. Please see the template.`
Oracle: Global cache blocks lost	The number of lost global cache blocks.	Dependent item	oracle.cacheblockslost Preprocessing JSON Path: `The text is too long. Please see the template.`
Oracle: Logons per second	The number of logon attempts.	Dependent item	oracle.logons_rate Preprocessing JSON Path: `$[?(@.METRIC=='SYS::Logons Per Sec')].VALUE.first()`
Oracle: Average active sessions	The average number of active sessions at a point in time that are either working or waiting.	Dependent item	oracle.active_sessions Preprocessing JSON Path: `$[?(@.METRIC=='SYS::Average Active Sessions')].VALUE.first()`
Oracle: Session count	The session count.	Dependent item	oracle.session_count Preprocessing JSON Path: `$[?(@.METRIC=='SESSION::Total')].VALUE.first()`
Oracle: Active user sessions	The number of active user sessions.	Dependent item	oracle.sessionactiveuser Preprocessing JSON Path: `$[?(@.METRIC=='SESSION::Active User')].VALUE.first()` ⛔️Custom on fail: Set value to: `0`
Oracle: Active background sessions	The number of active background sessions.	Dependent item	oracle.sessionactivebackground Preprocessing JSON Path: `$[?(@.METRIC=='SESSION::Active Background')].VALUE.first()` ⛔️Custom on fail: Set value to: `0`
Oracle: Inactive user sessions	The number of inactive user sessions.	Dependent item	oracle.sessioninactiveuser Preprocessing JSON Path: `$[?(@.METRIC=='SESSION::Inactive User')].VALUE.first()` ⛔️Custom on fail: Set value to: `0`
Oracle: Sessions lock rate	The percentage of locked sessions. Locks are mechanisms that prevent destructive interaction between transactions accessing the same resource - either user objects, such as tables and rows or system objects not visible to users, such as shared data structures in memory and data dictionary rows.	Dependent item	oracle.sessionlockrate Preprocessing JSON Path: `$[?(@.METRIC=='SESSION::Lock rate')].VALUE.first()`
Oracle: Sessions locked over {$ORACLE.SESSION.LOCK.MAX.TIME}s	The count of the prolongedly locked sessions. (You can change the duration of the maximum session lock in seconds for a query using the `{$ORACLE.SESSION.LOCK.MAX.TIME}` macro. Default = 600 s).	Dependent item	oracle.sessionlongtime_locked Preprocessing JSON Path: `$[?(@.METRIC=='SESSION::Long time locked')].VALUE.first()`
Oracle: Sessions concurrency	The percentage of concurrency. Concurrency is a database behavior when different transactions request to change the same resource. In the case of modifying data transactions, it sequentially temporarily blocks the right to change the data, and the rest of the transactions wait for access. When the access to a resource is locked for a long time, the concurrency grows (like the transaction queue), often leaving an extremely negative impact on performance. A high contention value does not indicate the root cause of the problem, but is a signal to search for it.	Dependent item	oracle.sessionconcurrencyrate Preprocessing JSON Path: `$[?(@.METRIC=='SESSION::Concurrency rate')].VALUE.first()`
Oracle: User '{$ORACLE.USER}' expire password	The number of days before the Zabbix account password expires.	Dependent item	oracle.userexpirepassword Preprocessing JSON Path: `$[?(@.METRIC=='USER::Expire password')].VALUE.first()`
Oracle: Active serial sessions	The number of active serial sessions.	Dependent item	oracle.activeserialsessions Preprocessing JSON Path: `$[?(@.METRIC=='SYS::Active Serial Sessions')].VALUE.first()`
Oracle: Active parallel sessions	The number of active parallel sessions.	Dependent item	oracle.activeparallelsessions Preprocessing JSON Path: `The text is too long. Please see the template.`
Oracle: Long table scans per second	The number of long table scans per second. A table is considered long if it is not cached and if its high water mark is greater than five blocks.	Dependent item	oracle.longtablescans_rate Preprocessing JSON Path: `The text is too long. Please see the template.`
Oracle: SQL service response time	The Structured Query Language (SQL) service response time expressed in seconds.	Dependent item	oracle.serviceresponsetime Preprocessing JSON Path: `The text is too long. Please see the template.` Custom multiplier: `0.01`
Oracle: User rollbacks per second	The number of times that users manually issued the `ROLLBACK` statement or an error occurred during the users' transactions.	Dependent item	oracle.userrollbacksrate Preprocessing JSON Path: `$[?(@.METRIC=='SYS::User Rollbacks Per Sec')].VALUE.first()`
Oracle: Total sorts per user call	The total sorts per user call.	Dependent item	oracle.sortsperuser_call Preprocessing JSON Path: `The text is too long. Please see the template.`
Oracle: Rows per sort	The average number of rows per sort for all types of sorts performed.	Dependent item	oracle.rowspersort Preprocessing JSON Path: `$[?(@.METRIC=='SYS::Rows Per Sort')].VALUE.first()`
Oracle: Disk sort per second	The number of sorts going to disk per second.	Dependent item	oracle.disk_sorts Preprocessing JSON Path: `$[?(@.METRIC=='SYS::Disk Sort Per Sec')].VALUE.first()`
Oracle: Memory sorts ratio	The percentage of sorts (from `ORDER BY` clauses or index building) that are done to disk vs. in-memory.	Dependent item	oracle.memorysortsratio Preprocessing JSON Path: `$[?(@.METRIC=='SYS::Memory Sorts Ratio')].VALUE.first()`
Oracle: Database wait time ratio	Wait time - the time that the server process spends waiting for available shared resources to be released by other server processes such as latches, locks, data buffers, etc.	Dependent item	oracle.databasewaittime_ratio Preprocessing JSON Path: `The text is too long. Please see the template.`
Oracle: Database CPU time ratio	The ratio calculated by dividing the total CPU (used by the database) by the Oracle time model statistic DB time.	Dependent item	oracle.databasecputime_ratio Preprocessing JSON Path: `$[?(@.METRIC=='SYS::Database CPU Time Ratio')].VALUE.first()`
Oracle: Temp space used	Used temporary space.	Dependent item	oracle.tempspaceused Preprocessing JSON Path: `$[?(@.METRIC=='SYS::Temp Space Used')].VALUE.first()`
Oracle: PGA, Total inuse	The amount of Program Global Area (PGA) memory currently consumed by work areas. This number can be used to determine how much memory is consumed by other consumers of the PGA memory (for example, PL/SQL or Java).	Dependent item	oracle.totalpgaused Preprocessing JSON Path: `$[?(@.METRIC=='PGA::Total Pga Inuse')].VALUE.first()`
Oracle: PGA, Aggregate target parameter	The current value of the `PGA_AGGREGATE_TARGET` initialization parameter. If this parameter is not set, then its value is "0" and automatic management of the PGA memory is disabled.	Dependent item	oracle.pga_target Preprocessing JSON Path: `The text is too long. Please see the template.`
Oracle: PGA, Total allocated	The current amount of the PGA memory allocated by the instance. The Oracle Database attempts to keep this number below the value of the `PGA_AGGREGATE_TARGET` initialization parameter. However, it is possible for the PGA allocated to exceed that value by a small percentage and for a short period of time when the work area workload is increasing very rapidly or when `PGA_AGGREGATE_TARGET` is set to a small value.	Dependent item	oracle.totalpgaallocated Preprocessing JSON Path: `$[?(@.METRIC=='PGA::Total Pga Allocated')].VALUE.first()`
Oracle: PGA, Total freeable	The number of bytes of the PGA memory in all processes that could be freed back to the OS.	Dependent item	oracle.totalpgafreeable Preprocessing JSON Path: `The text is too long. Please see the template.`
Oracle: PGA, Global memory bound	The maximum size of a work area executed in automatic mode.	Dependent item	oracle.pgaglobalbound Preprocessing JSON Path: `$[?(@.METRIC=='PGA::Global Memory Bound')].VALUE.first()`
Oracle: FRA, Space limit	The maximum amount of disk space (in bytes) that the database can use for the Fast Recovery Area (FRA).	Dependent item	oracle.fraspacelimit Preprocessing JSON Path: `$[?(@.METRIC=='FRA::Space Limit')].VALUE.first()` ⛔️Custom on fail: Set value to: `0`
Oracle: FRA, Used space	The amount of disk space (in bytes) used by FRA files created in the current and all the previous FRAs.	Dependent item	oracle.fraspaceused Preprocessing JSON Path: `$[?(@.METRIC=='FRA::Space Used')].VALUE.first()` ⛔️Custom on fail: Set value to: `0`
Oracle: FRA, Space reclaimable	The total amount of disk space (in bytes) that can be created by deleting obsolete, redundant, and other low-priority files from the FRA.	Dependent item	oracle.fraspacereclaimable Preprocessing JSON Path: `$[?(@.METRIC=='FRA::Space Reclaimable')].VALUE.first()` ⛔️Custom on fail: Set value to: `0`
Oracle: FRA, Number of files	The number of files in the FRA.	Dependent item	oracle.franumberof_files Preprocessing JSON Path: `$[?(@.METRIC=='FRA::Number Of Files')].VALUE.first()` ⛔️Custom on fail: Set value to: `0`
Oracle: FRA, Usable space in %	Percentage of space usable in the FRA.	Dependent item	oracle.frausablepct Preprocessing JSON Path: `$[?(@.METRIC=='FRA::Usable Pct')].VALUE.first()` ⛔️Custom on fail: Set value to: `0`
Oracle: FRA, Number of restore points	Number of restore points in the FRA.	Dependent item	oracle.frarestorepoint Preprocessing JSON Path: `$[?(@.METRIC=='FRA::Restore Point')].VALUE.first()`
Oracle: SGA, java pool	The memory is allocated from the Java pool.	Dependent item	oracle.sgajavapool Preprocessing JSON Path: `$[?(@.METRIC=='SGA::Java Pool')].VALUE.first()` ⛔️Custom on fail: Set value to: `0`
Oracle: SGA, large pool	The memory is allocated from a large pool.	Dependent item	oracle.sgalargepool Preprocessing JSON Path: `$[?(@.METRIC=='SGA::Large Pool')].VALUE.first()` ⛔️Custom on fail: Set value to: `0`
Oracle: SGA, shared pool	The memory is allocated from a shared pool.	Dependent item	oracle.sgasharedpool Preprocessing JSON Path: `$[?(@.METRIC=='SGA::Shared Pool')].VALUE.first()` ⛔️Custom on fail: Set value to: `0`
Oracle: SGA, log buffer	The number of bytes allocated for the redo log buffer.	Dependent item	oracle.sgalogbuffer Preprocessing JSON Path: `$[?(@.METRIC=='SGA::Log_Buffer')].VALUE.first()` ⛔️Custom on fail: Set value to: `0`
Oracle: SGA, fixed	The fixed System Global Area (SGA) is an internal housekeeping area.	Dependent item	oracle.sga_fixed Preprocessing JSON Path: `$[?(@.METRIC=='SGA::Fixed_Sga')].VALUE.first()` ⛔️Custom on fail: Set value to: `0`
Oracle: SGA, buffer cache	The size of standard block cache.	Dependent item	oracle.sgabuffercache Preprocessing JSON Path: `$[?(@.METRIC=='SGA::Buffer_Cache')].VALUE.first()` ⛔️Custom on fail: Set value to: `0`
Oracle: Redo logs available to switch	The number of inactive/unused redo logs available for log switching.	Dependent item	oracle.redologsavailable Preprocessing JSON Path: `$[?(@.METRIC=='REDO::Available')].VALUE.first()`

Triggers

Name	Description	Expression	Severity
Oracle: Port {$ORACLE.PORT} is unavailable	The TCP port of the Oracle Server service is currently unavailable.	`max(/Oracle by ODBC/net.tcp.service[tcp,{HOST.CONN},{$ORACLE.PORT}],#3)=0 and max(/Oracle by ODBC/proc.num[,,,"tnslsnr LISTENER"],#3)>0`\|Disaster
Oracle: LISTENER process is not running	The Oracle listener process is not running.	`max(/Oracle by ODBC/proc.num[,,,"tnslsnr LISTENER"],#3)=0`\|Disaster
Oracle: Version has changed	The Oracle Database version has changed. Acknowledge to close the problem manually.	`last(/Oracle by ODBC/oracle.version,#1)<>last(/Oracle by ODBC/oracle.version,#2) and length(last(/Oracle by ODBC/oracle.version))>0`\|Info	Manual close: Yes
Oracle: Host has been restarted	Uptime is less than 10 minutes.	`last(/Oracle by ODBC/oracle.uptime)<10m`\|Info	Manual close: Yes
Oracle: Failed to fetch info data	Zabbix has not received any data for the items for the last 5 minutes. The database might be unavailable for connecting.	`nodata(/Oracle by ODBC/oracle.uptime,5m)=1`\|Warning	Depends on: Oracle: Port {$ORACLE.PORT} is unavailable
Oracle: Instance name has changed	An Oracle Database instance name has changed. Acknowledge to close the problem manually.	`last(/Oracle by ODBC/oracle.instance_name,#1)<>last(/Oracle by ODBC/oracle.instance_name,#2) and length(last(/Oracle by ODBC/oracle.instance_name))>0`\|Info	Manual close: Yes
Oracle: Instance hostname has changed	An Oracle Database instance hostname has changed. Acknowledge to close the problem manually.	`last(/Oracle by ODBC/oracle.instance_hostname,#1)<>last(/Oracle by ODBC/oracle.instance_hostname,#2) and length(last(/Oracle by ODBC/oracle.instance_hostname))>0`\|Info	Manual close: Yes
Oracle: Too many active processes	Active processes are using more than `{$ORACLE.PROCESSES.MAX.WARN}`% of the available number of processes.	`min(/Oracle by ODBC/oracle.processes_count,5m) * 100 / last(/Oracle by ODBC/oracle.processes_limit) > {$ORACLE.PROCESSES.MAX.WARN}`\|Warning
Oracle: Too many database files	The number of datafiles is higher than `{$ORACLE.DB.FILE.MAX.WARN}`% of the available datafile limit.	`min(/Oracle by ODBC/oracle.db_files_count,5m) * 100 / last(/Oracle by ODBC/oracle.db_files_limit) > {$ORACLE.DB.FILE.MAX.WARN}`\|Warning
Oracle: Shared pool free is too low	The free memory percent of the shared pool has been less than `{$ORACLE.SHARED.FREE.MIN.WARN}`% for the last 5 minutes.	`max(/Oracle by ODBC/oracle.shared_pool_free,5m)<{$ORACLE.SHARED.FREE.MIN.WARN}`\|Warning
Oracle: Too many active sessions	Active sessions are using more than `{$ORACLE.SESSIONS.MAX.WARN}`% of the available sessions.	`min(/Oracle by ODBC/oracle.session_count,5m) * 100 / last(/Oracle by ODBC/oracle.session_limit) > {$ORACLE.SESSIONS.MAX.WARN}`\|Warning
Oracle: Too many locked sessions	The number of locked sessions exceeds `{$ORACLE.SESSIONS.LOCK.MAX.WARN}`% of the running sessions.	`min(/Oracle by ODBC/oracle.session_lock_rate,5m) > {$ORACLE.SESSIONS.LOCK.MAX.WARN}`\|Warning
Oracle: Too many sessions locked	The number of locked sessions exceeding `{$ORACLE.SESSION.LOCK.MAX.TIME}` seconds is too high. Long-term locks can negatively affect the database performance. Therefore, if they are detected, you should first find the most difficult queries from the database point of view and then analyze possible resource leaks.	`min(/Oracle by ODBC/oracle.session_long_time_locked,5m) > {$ORACLE.SESSION.LONG.LOCK.MAX.WARN}`\|Warning
Oracle: Too high database concurrency	The concurrency rate exceeds `{$ORACLE.CONCURRENCY.MAX.WARN}`%. A high contention value does not indicate the root cause of the problem, but is a signal to review resource consumption (determine the "heaviest" queries in the database, trace sessions, etc.) This will help find the root cause and possible optimization points both in database configuration and the logic of building queries.	`min(/Oracle by ODBC/oracle.session_concurrency_rate,5m) > {$ORACLE.CONCURRENCY.MAX.WARN}`\|Warning
Oracle: Zabbix account will expire soon	The password for the Zabbix user in the database expires soon.	`last(/Oracle by ODBC/oracle.user_expire_password) < {$ORACLE.EXPIRE.PASSWORD.MIN.WARN}`\|Warning
Oracle: Total PGA inuse is too high	The total PGA currently consumed by work areas is more than `{$ORACLE.PGA.USE.MAX.WARN}`% of `PGA_AGGREGATE_TARGET`.	`min(/Oracle by ODBC/oracle.total_pga_used,5m) * 100 / last(/Oracle by ODBC/oracle.pga_target) > {$ORACLE.PGA.USE.MAX.WARN}`\|Warning
Oracle: Number of REDO logs available for switching is too low	The number of inactive/unused redos available for log switching is low (risk of database downtime).	`max(/Oracle by ODBC/oracle.redo_logs_available,5m) < {$ORACLE.REDO.MIN.WARN}`\|Warning

LLD rule Database discovery

Name	Description	Type	Key and additional info
Database discovery	Used for database discovery.	Dependent item	oracle.db.discovery

Item prototypes for Database discovery

Name	Description	Type	Key and additional info
Oracle Database '{#DBNAME}': Get CDB and No-CDB info	Gets the information about the CDB and non-CDB database on an instance.	Database monitor	db.odbc.get[getcdb{#DBNAME}_info,,"Driver={$ORACLE.DRIVER};DBQ=//{HOST.CONN}:{$ORACLE.PORT}/{$ORACLE.SERVICE};"] Preprocessing JSON Path: `$.first()` ⛔️Custom on fail: Discard value
Oracle Database '{#DBNAME}': Open status	1 - MOUNTED; 2 - READ WRITE; 3 - READ ONLY; 4 - READ ONLY WITH APPLY (a physical standby database is open in real-time query mode).	Dependent item	oracle.dbopenmode["{#DBNAME}"] Preprocessing JSON Path: `$.OPEN_MODE` Discard unchanged with heartbeat: `15m`
Oracle Database '{#DBNAME}': Role	The current role of the database where: 1 - SNAPSHOT STANDBY; 2 - LOGICAL STANDBY; 3 - PHYSICAL STANDBY; 4 - PRIMARY; 5 - FAR SYNC.	Dependent item	oracle.db_role["{#DBNAME}"] Preprocessing JSON Path: `$.ROLE` Discard unchanged with heartbeat: `15m`
Oracle Database '{#DBNAME}': Log mode	The archive log mode where: 0 - NOARCHIVELOG; 1 - ARCHIVELOG; 2 - MANUAL.	Dependent item	oracle.dblogmode["{#DBNAME}"] Preprocessing JSON Path: `$.LOG_MODE` Discard unchanged with heartbeat: `15m`
Oracle Database '{#DBNAME}': Force logging	Indicates whether the database is under force logging mode (`YES`/`NO`).	Dependent item	oracle.dbforcelogging["{#DBNAME}"] Preprocessing JSON Path: `$.FORCE_LOGGING` Discard unchanged with heartbeat: `15m`

Trigger prototypes for Database discovery

Name	Description	Expression	Severity
Oracle Database '{#DBNAME}': Open status in mount mode	The Oracle Database is in a mounted state.	`last(/Oracle by ODBC/oracle.db_open_mode["{#DBNAME}"])=1`\|Warning
Oracle Database '{#DBNAME}': Open status has changed	The Oracle Database open status has changed. Acknowledge to close the problem manually.	`last(/Oracle by ODBC/oracle.db_open_mode["{#DBNAME}"],#1)<>last(/Oracle by ODBC/oracle.db_open_mode["{#DBNAME}"],#2)`\|Info	Manual close: Yes Depends on: Oracle Database '{#DBNAME}': Open status in mount mode
Oracle Database '{#DBNAME}': Role has changed	The Oracle Database role has changed. Acknowledge to close the problem manually.	`last(/Oracle by ODBC/oracle.db_role["{#DBNAME}"],#1)<>last(/Oracle by ODBC/oracle.db_role["{#DBNAME}"],#2)`\|Info	Manual close: Yes
Oracle Database '{#DBNAME}': Force logging is deactivated for DB with active Archivelog	Force logging mode is a very important metric for databases in `ARCHIVELOG`. This feature allows to forcibly write all the transactions to the redo log.	`last(/Oracle by ODBC/oracle.db_force_logging["{#DBNAME}"]) = 0 and last(/Oracle by ODBC/oracle.db_log_mode["{#DBNAME}"]) = 1`\|Warning

LLD rule PDB discovery

Name	Description	Type	Key and additional info
PDB discovery	Used for the discovery of the pluggable database (PDB).	Dependent item	oracle.pdb.discovery

Item prototypes for PDB discovery

Name Description Type Key and additional info

Oracle Database '{#DBNAME}': Get PDB info

Gets the information about the PDB database on an instance.

Database monitor

db.odbc.get[getpdb{#DBNAME}_info,,"Driver={$ORACLE.DRIVER};DBQ=//{HOST.CONN}:{$ORACLE.PORT}/{$ORACLE.SERVICE};"]

Preprocessing

JSON Path: $.first()
⛔️Custom on fail: Discard value

Oracle Database '{#DBNAME}': Open status

1 - MOUNTED;

2 - READ WRITE;

3 - READ ONLY;

4 - READ ONLY WITH APPLY (a physical standby database is open in real-time query mode).

Dependent item

oracle.pdbopenmode["{#DBNAME}"]

Preprocessing

JSON Path: $.OPEN_MODE
Discard unchanged with heartbeat: 15m

Trigger prototypes for PDB discovery

Name	Description	Expression	Severity	Dependencies and additional info
Oracle Database '{#DBNAME}': Open status in mount mode	The Oracle Database is in a mounted state.	`last(/Oracle by ODBC/oracle.pdb_open_mode["{#DBNAME}"])=1`\|Warning
Oracle Database '{#DBNAME}': Open status has changed	The Oracle Database open status has changed. Acknowledge to close the problem manually.	`last(/Oracle by ODBC/oracle.pdb_open_mode["{#DBNAME}"],#1)<>last(/Oracle by ODBC/oracle.pdb_open_mode["{#DBNAME}"],#2)`\|Info	Manual close: Yes

LLD rule Tablespace discovery

Name	Description	Type	Key and additional info
Tablespace discovery	Used for the discovery of tablespaces in DBMS.	Dependent item	oracle.tablespace.discovery

Item prototypes for Tablespace discovery

Name	Description	Type	Key and additional info
Oracle '{#CON_NAME}' TBS '{#TABLESPACE}': Get tablespaces stats	Gets the statistics of the tablespace.	Database monitor	db.odbc.get[get{#CONNAME}tablespace{#TABLESPACE}_stats,,"Driver={$ORACLE.DRIVER};DBQ=//{HOST.CONN}:{$ORACLE.PORT}/{$ORACLE.SERVICE};"] Preprocessing JSON Path: `$.first()` ⛔️Custom on fail: Discard value
Oracle '{#CON_NAME}' TBS '{#TABLESPACE}': Tablespace allocated, bytes	Currently allocated bytes for the tablespace (sum of the current size of datafiles).	Dependent item	oracle.tbsallocbytes["{#CON_NAME}","{#TABLESPACE}"] Preprocessing JSON Path: `$.FILE_BYTES`
Oracle '{#CON_NAME}' TBS '{#TABLESPACE}': Tablespace MAX size, bytes	The maximum size of the tablespace.	Dependent item	oracle.tbsmaxbytes["{#CON_NAME}","{#TABLESPACE}"] Preprocessing JSON Path: `$.MAX_BYTES`
Oracle '{#CON_NAME}' TBS '{#TABLESPACE}': Tablespace used, bytes	Currently used bytes for the tablespace (current size of datafiles minus the free space).	Dependent item	oracle.tbsusedbytes["{#CON_NAME}","{#TABLESPACE}"] Preprocessing JSON Path: `$.USED_BYTES`
Oracle '{#CON_NAME}' TBS '{#TABLESPACE}': Tablespace free, bytes	Free bytes of the allocated space.	Dependent item	oracle.tbsfreebytes["{#CON_NAME}","{#TABLESPACE}"] Preprocessing JSON Path: `$.FREE_BYTES`
Oracle '{#CON_NAME}' TBS '{#TABLESPACE}': Tablespace allocated, percent	Allocated bytes/max bytes*100.	Dependent item	oracle.tbsusedpct["{#CON_NAME}","{#TABLESPACE}"] Preprocessing JSON Path: `$.USED_PCT_MAX`
Oracle '{#CON_NAME}' TBS '{#TABLESPACE}': Tablespace usage, percent	Used bytes/allocated bytes*100.	Dependent item	oracle.tbsusedfilepct["{#CONNAME}","{#TABLESPACE}"] Preprocessing JSON Path: `$.USED_FILE_PCT`
Oracle '{#CON_NAME}' TBS '{#TABLESPACE}': Tablespace usage from MAX, percent	Used bytes/max bytes*100.	Dependent item	oracle.tbsusedfrommaxpct["{#CON_NAME}","{#TABLESPACE}"] Preprocessing JSON Path: `$.USED_FROM_MAX_PCT`
Oracle '{#CON_NAME}' TBS '{#TABLESPACE}': Open status	The tablespace status where: 1 - ONLINE; 2 - OFFLINE; 3 - READ ONLY.	Dependent item	oracle.tbsstatus["{#CONNAME}","{#TABLESPACE}"] Preprocessing JSON Path: `$.STATUS`

Trigger prototypes for Tablespace discovery

Name	Description	Expression	Severity
Oracle '{#CON_NAME}' TBS '{#TABLESPACE}': Tablespace utilization is too high	The utilization of the tablespace `{#TABLESPACE}` exceeds `{$ORACLE.TBS.UTIL.PCT.MAX.WARN}`%	`min(/Oracle by ODBC/oracle.tbs_used_pct["{#CON_NAME}","{#TABLESPACE}"],5m)>{$ORACLE.TBS.UTIL.PCT.MAX.WARN}`\|Warning	Depends on: Oracle '{#CON_NAME}' TBS '{#TABLESPACE}': Tablespace utilization is too high
Oracle '{#CON_NAME}' TBS '{#TABLESPACE}': Tablespace utilization is too high	The utilization of the tablespace `{#TABLESPACE}` exceeds `{$ORACLE.TBS.UTIL.PCT.MAX.HIGH}`%	`min(/Oracle by ODBC/oracle.tbs_used_pct["{#CON_NAME}","{#TABLESPACE}"],5m)>{$ORACLE.TBS.UTIL.PCT.MAX.HIGH}`\|High
Oracle '{#CON_NAME}' TBS '{#TABLESPACE}': Tablespace usage is too high	The usage of the tablespace `{#TABLESPACE}` exceeds `{$ORACLE.TBS.USED.PCT.MAX.WARN}`%	`min(/Oracle by ODBC/oracle.tbs_used_file_pct["{#CON_NAME}","{#TABLESPACE}"],5m)>{$ORACLE.TBS.USED.PCT.MAX.WARN}`\|Warning	Depends on: Oracle '{#CON_NAME}' TBS '{#TABLESPACE}': Tablespace usage is too high
Oracle '{#CON_NAME}' TBS '{#TABLESPACE}': Tablespace usage is too high	The usage of the tablespace `{#TABLESPACE}` exceeds `{$ORACLE.TBS.USED.PCT.MAX.HIGH}`%	`min(/Oracle by ODBC/oracle.tbs_used_file_pct["{#CON_NAME}","{#TABLESPACE}"],5m)>{$ORACLE.TBS.USED.PCT.MAX.HIGH}`\|High
Oracle '{#CON_NAME}' TBS '{#TABLESPACE}': Tablespace usage from MAX is too high	The usage of the tablespace `{#TABLESPACE}` from MAX exceeds `{$ORACLE.TBS.USED.PCT.FROM.MAX.WARN}`%	`min(/Oracle by ODBC/oracle.tbs_used_from_max_pct["{#CON_NAME}","{#TABLESPACE}"],5m)>{$ORACLE.TBS.USED.PCT.FROM.MAX.WARN}`\|Warning	Depends on: Oracle '{#CON_NAME}' TBS '{#TABLESPACE}': Tablespace usage from MAX is too high
Oracle '{#CON_NAME}' TBS '{#TABLESPACE}': Tablespace usage from MAX is too high	The usage of the tablespace `{#TABLESPACE}` from MAX exceeds `{$ORACLE.TBS.USED.PCT.FROM.MAX.HIGH}`%	`min(/Oracle by ODBC/oracle.tbs_used_from_max_pct["{#CON_NAME}","{#TABLESPACE}"],5m)>{$ORACLE.TBS.USED.PCT.FROM.MAX.HIGH}`\|High
Oracle '{#CON_NAME}' TBS '{#TABLESPACE}': Tablespace is OFFLINE	The tablespace is in the offline state.	`last(/Oracle by ODBC/oracle.tbs_status["{#CON_NAME}","{#TABLESPACE}"])=2`\|Warning
Oracle '{#CON_NAME}' TBS '{#TABLESPACE}': Tablespace status has changed	Oracle tablespace status has changed. Acknowledge to close the problem manually.	`last(/Oracle by ODBC/oracle.tbs_status["{#CON_NAME}","{#TABLESPACE}"],#1)<>last(/Oracle by ODBC/oracle.tbs_status["{#CON_NAME}","{#TABLESPACE}"],#2)`\|Info	Manual close: Yes Depends on: Oracle '{#CON_NAME}' TBS '{#TABLESPACE}': Tablespace is OFFLINE

LLD rule Archive log discovery

Name	Description	Type	Key and additional info
Archive log discovery	Used for the discovery of the log archive.	Dependent item	oracle.archivelog.discovery

Item prototypes for Archive log discovery

Name	Description	Type	Key and additional info
Archivelog '{#DEST_NAME}': Get archive log info	Gets the archive log statistics.	Database monitor	db.odbc.get[getarchivelog{#DESTNAME}stat,,"Driver={$ORACLE.DRIVER};DBQ=//{HOST.CONN}:{$ORACLE.PORT}/{$ORACLE.SERVICE};"] Preprocessing JSON Path: `$.first()` ⛔️Custom on fail: Discard value
Archivelog '{#DEST_NAME}': Error	Displays the error message.	Dependent item	oracle.archivelogerror["{#DESTNAME}"] Preprocessing JSON Path: `$.ERROR` Discard unchanged with heartbeat: `1h`
Archivelog '{#DEST_NAME}': Last sequence	Identifies the sequence number of the last archived redo log to be archived.	Dependent item	oracle.archiveloglogsequence["{#DEST_NAME}"] Preprocessing JSON Path: `$.LOG_SEQUENCE`
Archivelog '{#DEST_NAME}': Status	Identifies the current status of the destination where: 1 - VALID; 2 - DEFERRED; 3 - ERROR; 0 - UNKNOWN.	Dependent item	oracle.archiveloglogstatus["{#DEST_NAME}"] Preprocessing JSON Path: `$.STATUS` Discard unchanged with heartbeat: `1h`

Trigger prototypes for Archive log discovery

Name	Description	Expression	Severity	Dependencies and additional info
Archivelog '{#DEST_NAME}': Log Archive is not valid	The trigger will launch if the archive log destination is not in one of these states: 2 - DEFERRED; 3 - VALID.	`last(/Oracle by ODBC/oracle.archivelog_log_status["{#DEST_NAME}"])<2`\|High

LLD rule ASM disk groups discovery

Name	Description	Type	Key and additional info
ASM disk groups discovery	Used for discovering the ASM disk groups.	Dependent item	oracle.asm.discovery

Item prototypes for ASM disk groups discovery

Name	Description	Type	Key and additional info
ASM '{#DGNAME}': Get ASM stats	Gets the ASM disk group statistics.	Database monitor	db.odbc.get[getasm{#DGNAME}_stat,,"Driver={$ORACLE.DRIVER};DBQ=//{HOST.CONN}:{$ORACLE.PORT}/{$ORACLE.SERVICE};"] Preprocessing JSON Path: `$.first()` ⛔️Custom on fail: Discard value
ASM '{#DGNAME}': Total size	The total size of the ASM disk group.	Dependent item	oracle.asmtotalsize["{#DGNAME}"] Preprocessing JSON Path: `$.SIZE_BYTE`
ASM '{#DGNAME}': Free size	The free size of the ASM disk group.	Dependent item	oracle.asmfreesize["{#DGNAME}"] Preprocessing JSON Path: `$.FREE_SIZE_BYTE`
ASM '{#DGNAME}': Used size, percent	Usage of the ASM disk group expressed in %.	Dependent item	oracle.asmusedpct["{#DGNAME}"] Preprocessing JSON Path: `$.USED_PERCENT`

Trigger prototypes for ASM disk groups discovery

Name	Description	Expression	Severity	Dependencies and additional info
ASM '{#DGNAME}': Disk group usage is too high	The usage of the ASM disk group expressed in % exceeds `{$ORACLE.ASM.USED.PCT.MAX.WARN}`.	`min(/Oracle by ODBC/oracle.asm_used_pct["{#DGNAME}"],5m)>{$ORACLE.ASM.USED.PCT.MAX.WARN}`\|Warning	Depends on: ASM '{#DGNAME}': Disk group usage is too high
ASM '{#DGNAME}': Disk group usage is too high	The usage of the ASM disk group expressed in % exceeds `{$ORACLE.ASM.USED.PCT.MAX.WARN}`.	`min(/Oracle by ODBC/oracle.asm_used_pct["{#DGNAME}"],5m)>{$ORACLE.ASM.USED.PCT.MAX.HIGH}`\|High

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template, or ask for help at ZABBIX forums

db_oracle_agent2

View README Download JSON

Oracle by Zabbix agent 2

Overview

The template is developed to monitor a single DBMS Oracle Database instance with Zabbix agent 2.

Supported versions

Oracle Database 12c2 and newer.

Requirements

Zabbix version: 6.0 and higher.

Tested versions

This template has been tested on:

Oracle Database 12c2, 18c, 19c, 21c, 23c

Configuration

Setup

Setup and configure Zabbix agent 2 compiled with the Oracle monitoring plugin. See the setup instructions for Oracle Database plugin.
Set the {$ORACLE.CONNSTRING} macro value using either
If you want to override parameters from Zabbix agent configuration file, set the user name, password and service name in host macros ({$ORACLE.USER}, {$ORACLE.PASSWORD}, and {$ORACLE.SERVICE}).

User can contain sysdba, sysoper, sysasm privileges. It must be used with as as a separator e.g user as sysdba, privilege can be upper or lowercase, and must be at the end of username string.

Test availability: zabbix_get -s oracle-host -k oracle.ping["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}"]

Macros used

Name	Description	Default
{$ORACLE.USER}	Oracle username.	`zabbix`
{$ORACLE.PASSWORD}	Oracle user's password.	`zabbix_password`
{$ORACLE.CONNSTRING}	Oracle URI or a session name.	`tcp://localhost:1521`
{$ORACLE.SERVICE}	Oracle Service Name.	`ORA`
{$ORACLE.DBNAME.MATCHES}	Used in database discovery. It can be overridden on the host or linked template level.	`.*`
{$ORACLE.DBNAME.NOT_MATCHES}	Used in database discovery. It can be overridden on the host or linked template level.	`PDB\$SEED`
{$ORACLE.TABLESPACE.NAME.MATCHES}	Used in tablespace discovery. It can be overridden on the host or linked template level.	`.*`
{$ORACLE.TABLESPACE.NAME.NOT_MATCHES}	Used in tablespace discovery. It can be overridden on the host or linked template level.	`CHANGE_IF_NEEDED`
{$ORACLE.TBS.USED.PCT.FROM.MAX.WARN}	Warning severity alert threshold for the maximum percentage of tablespace usage from maximum tablespace size (used bytes/max bytes) for the Warning trigger expression.	`90`
{$ORACLE.TBS.USED.PCT.FROM.MAX.HIGH}	High severity alert threshold for the maximum percentage of tablespace usage (used bytes/max bytes) for the High trigger expression.	`95`
{$ORACLE.TBS.USED.PCT.MAX.WARN}	Warning severity alert threshold for the maximum percentage of tablespace usage (used bytes/allocated bytes) for the Warning trigger expression.	`90`
{$ORACLE.TBS.USED.PCT.MAX.HIGH}	High severity alert threshold for the maximum percentage of tablespace usage (used bytes/allocated bytes) for the High trigger expression.	`95`
{$ORACLE.TBS.UTIL.PCT.MAX.WARN}	Warning severity alert threshold for the maximum percentage of tablespace utilization (allocated bytes/max bytes) for the High trigger expression.	`80`
{$ORACLE.TBS.UTIL.PCT.MAX.HIGH}	High severity alert threshold for the maximum percentage of tablespace utilization (allocated bytes/max bytes) for the High trigger expression.	`90`
{$ORACLE.PROCESSES.MAX.WARN}	Alert threshold for the maximum percentage of active processes for the Warning trigger expression.	`80`
{$ORACLE.SESSIONS.MAX.WARN}	Alert threshold for the maximum percentage of active sessions for the Warning trigger expression.	`80`
{$ORACLE.DB.FILE.MAX.WARN}	The maximum percentage of used database files for the Warning trigger expression.	`80`
{$ORACLE.PGA.USE.MAX.WARN}	Alert threshold for the maximum percentage of the Program Global Area (PGA) usage for the Warning trigger expression.	`90`
{$ORACLE.SESSIONS.LOCK.MAX.WARN}	Alert threshold for the maximum percentage of locked sessions for the Warning trigger expression.	`20`
{$ORACLE.SESSION.LOCK.MAX.TIME}	The maximum duration of the session lock in seconds to count the session as a prolongedly locked query.	`600`
{$ORACLE.SESSION.LONG.LOCK.MAX.WARN}	Alert threshold for the maximum number of the prolongedly locked sessions for the Warning trigger expression.	`3`
{$ORACLE.CONCURRENCY.MAX.WARN}	The maximum percentage of session concurrency for the Warning trigger expression.	`80`
{$ORACLE.REDO.MIN.WARN}	Alert threshold for the minimum number of redo logs for the Warning trigger expression.	`3`
{$ORACLE.SHARED.FREE.MIN.WARN}	Alert threshold for the minimum percentage of free shared pool for the Warning trigger expression.	`5`
{$ORACLE.EXPIRE.PASSWORD.MIN.WARN}	The number of days before the password expires for the Warning trigger expression.	`7`
{$ORACLE.ASM.USED.PCT.MAX.WARN}	The maximum percentage of used space in the Automatic Storage Management (ASM) disk group for the Warning trigger expression.	`90`
{$ORACLE.ASM.USED.PCT.MAX.HIGH}	The maximum percentage of used space in the Automatic Storage Management (ASM) disk group for the High trigger expression.	`95`

Items

Name	Description	Type	Key and additional info
Oracle: Ping	Test the connection to Oracle Database state.	Zabbix agent	oracle.ping["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}"] Preprocessing Discard unchanged with heartbeat: `10m`
Oracle: Get instance state	Gets the state of the current instance.	Zabbix agent	oracle.instance.info["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}"]
Oracle: Version	The Oracle Server version.	Dependent item	oracle.version Preprocessing JSON Path: `$.version` Discard unchanged with heartbeat: `1d`
Oracle: Uptime	The Oracle instance uptime expressed in seconds.	Dependent item	oracle.uptime Preprocessing JSON Path: `$.uptime`
Oracle: Instance status	The status of the instance.	Dependent item	oracle.instance_status Preprocessing JSON Path: `$.status`
Oracle: Archiver state	The status of automatic archiving.	Dependent item	oracle.archiver_state Preprocessing JSON Path: `$..archiver.first()`
Oracle: Instance name	The name of the instance.	Dependent item	oracle.instance_name Preprocessing JSON Path: `$.instance`
Oracle: Instance hostname	The name of the host machine.	Dependent item	oracle.instance_hostname Preprocessing JSON Path: `$..hostname.first()`
Oracle: Instance role	Indicates whether the instance is an active instance or an inactive secondary instance.	Dependent item	oracle.instance.role Preprocessing JSON Path: `$.role`
Oracle: Get system metrics	Gets the values of the system metrics.	Zabbix agent	oracle.sys.metrics["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}"]
Oracle: Buffer cache hit ratio	The ratio of buffer cache hits ((LogRead - PhyRead)/LogRead).	Dependent item	oracle.buffercachehit_ratio Preprocessing JSON Path: `$.['Buffer Cache Hit Ratio']`
Oracle: Cursor cache hit ratio	The ratio of cursor cache hits (CursorCacheHit/SoftParse).	Dependent item	oracle.cursorcachehit_ratio Preprocessing JSON Path: `$.['Cursor Cache Hit Ratio']`
Oracle: Library cache hit ratio	The ratio of library cache hits (Hits/Pins).	Dependent item	oracle.librarycachehit_ratio Preprocessing JSON Path: `$.['Library Cache Hit Ratio']`
Oracle: Shared pool free %	Free memory of a shared pool expressed in %.	Dependent item	oracle.sharedpoolfree Preprocessing JSON Path: `$.['Shared Pool Free %']`
Oracle: Physical reads per second	Reads per second.	Dependent item	oracle.physicalreadsrate Preprocessing JSON Path: `$.['Physical Reads Per Sec']`
Oracle: Physical writes per second	Writes per second.	Dependent item	oracle.physicalwritesrate Preprocessing JSON Path: `$.['Physical Writes Per Sec']`
Oracle: Physical reads bytes per second	Read bytes per second.	Dependent item	oracle.physicalreadbytes_rate Preprocessing JSON Path: `$.['Physical Read Bytes Per Sec']`
Oracle: Physical writes bytes per second	Write bytes per second.	Dependent item	oracle.physicalwritebytes_rate Preprocessing JSON Path: `$.['Physical Write Bytes Per Sec']`
Oracle: Enqueue timeouts per second	Enqueue timeouts per second.	Dependent item	oracle.enqueuetimeoutsrate Preprocessing JSON Path: `$.['Enqueue Timeouts Per Sec']`
Oracle: GC CR block received per second	The global cache (GC) and the consistent read (CR) block received per second.	Dependent item	oracle.gccrblockreceivedrate Preprocessing JSON Path: `$.['GC CR Block Received Per Second']`
Oracle: Global cache blocks corrupted	The number of blocks that encountered corruption or checksum failure during the interconnect.	Dependent item	oracle.cacheblockscorrupt Preprocessing JSON Path: `$.['Global Cache Blocks Corrupted']`
Oracle: Global cache blocks lost	The number of lost global cache blocks.	Dependent item	oracle.cacheblockslost Preprocessing JSON Path: `$.['Global Cache Blocks Lost']`
Oracle: Logons per second	The number of logon attempts.	Dependent item	oracle.logons_rate Preprocessing JSON Path: `$.['Logons Per Sec']`
Oracle: Average active sessions	The average number of active sessions at a point in time that are either working or waiting.	Dependent item	oracle.active_sessions Preprocessing JSON Path: `$.['Average Active Sessions']`
Oracle: Active serial sessions	The number of active serial sessions.	Dependent item	oracle.activeserialsessions Preprocessing JSON Path: `$.['Active Serial Sessions']`
Oracle: Active parallel sessions	The number of active parallel sessions.	Dependent item	oracle.activeparallelsessions Preprocessing JSON Path: `$.['Active Parallel Sessions']`
Oracle: Long table scans per second	The number of long table scans per second. A table is considered long if it is not cached and if its high water mark is greater than five blocks.	Dependent item	oracle.longtablescans_rate Preprocessing JSON Path: `$.['Long Table Scans Per Sec']`
Oracle: SQL service response time	The Structured Query Language (SQL) service response time expressed in seconds.	Dependent item	oracle.serviceresponsetime Preprocessing JSON Path: `$.['SQL Service Response Time']` Custom multiplier: `0.01`
Oracle: User rollbacks per second	The number of times that users manually issued the `ROLLBACK` statement or an error occurred during the users' transactions.	Dependent item	oracle.userrollbacksrate Preprocessing JSON Path: `$.['User Rollbacks Per Sec']`
Oracle: Total sorts per user call	The total sorts per user call.	Dependent item	oracle.sortsperuser_call Preprocessing JSON Path: `$.['Total Sorts Per User Call']`
Oracle: Rows per sort	The average number of rows per sort for all types of sorts performed.	Dependent item	oracle.rowspersort Preprocessing JSON Path: `$.['Rows Per Sort']`
Oracle: Disk sort per second	The number of sorts going to disk per second.	Dependent item	oracle.disk_sorts Preprocessing JSON Path: `$.['Disk Sort Per Sec']`
Oracle: Memory sorts ratio	The percentage of sorts (from `ORDER BY` clauses or index building) that are done to disk vs. in-memory.	Dependent item	oracle.memorysortsratio Preprocessing JSON Path: `$.['Memory Sorts Ratio']`
Oracle: Database wait time ratio	Wait time - the time that the server process spends waiting for available shared resources to be released by other server processes such as latches, locks, data buffers, etc.	Dependent item	oracle.databasewaittime_ratio Preprocessing JSON Path: `$.['Database Wait Time Ratio']`
Oracle: Database CPU time ratio	The ratio calculated by dividing the total CPU (used by the database) by the Oracle time model statistic DB time.	Dependent item	oracle.databasecputime_ratio Preprocessing JSON Path: `$.['Database CPU Time Ratio']`
Oracle: Temp space used	Used temporary space.	Dependent item	oracle.tempspaceused Preprocessing JSON Path: `$.['Temp Space Used']`
Oracle: Get system parameters	Get a set of system parameter values.	Zabbix agent	oracle.sys.params["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}"]
Oracle: Sessions limit	The user and system sessions.	Dependent item	oracle.session_limit Preprocessing JSON Path: `$.sessions`
Oracle: Datafiles limit	The maximum allowable number of datafiles.	Dependent item	oracle.dbfileslimit Preprocessing JSON Path: `$.db_files`
Oracle: Processes limit	The maximum number of user processes.	Dependent item	oracle.processes_limit Preprocessing JSON Path: `$.processes`
Oracle: Get sessions stats	Get sessions statistics. {$ORACLE.SESSION.LOCK.MAX.TIME} -- maximum seconds in the current wait condition for counting long time locked sessions. Default: 600 seconds.	Zabbix agent	oracle.sessions.stats["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}","{$ORACLE.SESSION.LOCK.MAX.TIME}"]
Oracle: Session count	The session count.	Dependent item	oracle.session_count Preprocessing JSON Path: `$.total`
Oracle: Active user sessions	The number of active user sessions.	Dependent item	oracle.sessionactiveuser Preprocessing JSON Path: `$.active_user` ⛔️Custom on fail: Set value to: `0`
Oracle: Active background sessions	The number of active background sessions.	Dependent item	oracle.sessionactivebackground Preprocessing JSON Path: `$.active_background` ⛔️Custom on fail: Set value to: `0`
Oracle: Inactive user sessions	The number of inactive user sessions.	Dependent item	oracle.sessioninactiveuser Preprocessing JSON Path: `$.inactive_user` ⛔️Custom on fail: Set value to: `0`
Oracle: Sessions lock rate	The percentage of locked sessions. Locks are mechanisms that prevent destructive interaction between transactions accessing the same resource - either user objects, such as tables and rows or system objects not visible to users, such as shared data structures in memory and data dictionary rows.	Dependent item	oracle.sessionlockrate Preprocessing JSON Path: `$.lock_rate`
Oracle: Sessions locked over {$ORACLE.SESSION.LOCK.MAX.TIME}s	The count of the prolongedly locked sessions. (You can change the duration of the maximum session lock in seconds for a query using the `{$ORACLE.SESSION.LOCK.MAX.TIME}` macro. Default = 600 s).	Dependent item	oracle.sessionlongtime_locked Preprocessing JSON Path: `$.long_time_locked`
Oracle: Sessions concurrency	The percentage of concurrency. Concurrency is a database behavior when different transactions request to change the same resource. In the case of modifying data transactions, it sequentially temporarily blocks the right to change the data, and the rest of the transactions wait for access. When the access to a resource is locked for a long time, the concurrency grows (like the transaction queue), often leaving an extremely negative impact on performance. A high contention value does not indicate the root cause of the problem, but is a signal to search for it.	Dependent item	oracle.sessionconcurrencyrate Preprocessing JSON Path: `$.concurrency_rate`
Oracle: Get PGA stats	Get PGA statistics.	Zabbix agent	oracle.pga.stats["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}"]
Oracle: PGA, Total inuse	The amount of Program Global Area (PGA) memory currently consumed by work areas. This number can be used to determine how much memory is consumed by other consumers of the PGA memory (for example, PL/SQL or Java).	Dependent item	oracle.totalpgaused Preprocessing JSON Path: `$.['total PGA inuse']`
Oracle: PGA, Aggregate target parameter	The current value of the `PGA_AGGREGATE_TARGET` initialization parameter. If this parameter is not set, then its value is "0" and automatic management of the PGA memory is disabled.	Dependent item	oracle.pga_target Preprocessing JSON Path: `$.['aggregate PGA target parameter']`
Oracle: PGA, Total allocated	The current amount of the PGA memory allocated by the instance. The Oracle Database attempts to keep this number below the value of the `PGA_AGGREGATE_TARGET` initialization parameter. However, it is possible for the PGA allocated to exceed that value by a small percentage and for a short period of time when the work area workload is increasing very rapidly or when `PGA_AGGREGATE_TARGET` is set to a small value.	Dependent item	oracle.totalpgaallocated Preprocessing JSON Path: `$.['total PGA allocated']`
Oracle: PGA, Total freeable	The number of bytes of the PGA memory in all processes that could be freed back to the OS.	Dependent item	oracle.totalpgafreeable Preprocessing JSON Path: `$.['total freeable PGA memory']`
Oracle: PGA, Global memory bound	The maximum size of a work area executed in automatic mode.	Dependent item	oracle.pgaglobalbound Preprocessing JSON Path: `$.['global memory bound']`
Oracle: Get FRA stats	Get FRA statistics.	Zabbix agent	oracle.fra.stats["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}"]
Oracle: FRA, Space limit	The maximum amount of disk space (in bytes) that the database can use for the Fast Recovery Area (FRA).	Dependent item	oracle.fraspacelimit Preprocessing JSON Path: `$.space_limit`
Oracle: FRA, Used space	The amount of disk space (in bytes) used by FRA files created in the current and all the previous FRAs.	Dependent item	oracle.fraspaceused Preprocessing JSON Path: `$.space_used`
Oracle: FRA, Space reclaimable	The total amount of disk space (in bytes) that can be created by deleting obsolete, redundant, and other low-priority files from the FRA.	Dependent item	oracle.fraspacereclaimable Preprocessing JSON Path: `$.space_reclaimable`
Oracle: FRA, Number of files	The number of files in the FRA.	Dependent item	oracle.franumberof_files Preprocessing JSON Path: `$.number_of_files`
Oracle: FRA, Usable space in %	Percentage of space usable in the FRA.	Dependent item	oracle.frausablepct Preprocessing JSON Path: `$.usable_pct`
Oracle: FRA, Number of restore points	Number of restore points in the FRA.	Dependent item	oracle.frarestorepoint Preprocessing JSON Path: `$.restore_point`
Oracle: Get SGA stats	Get SGA statistics.	Zabbix agent	oracle.sga.stats["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}"]
Oracle: SGA, java pool	The memory is allocated from the Java pool.	Dependent item	oracle.sgajavapool Preprocessing JSON Path: `$.java_pool`
Oracle: SGA, large pool	The memory is allocated from a large pool.	Dependent item	oracle.sgalargepool Preprocessing JSON Path: `$.large_pool`
Oracle: SGA, shared pool	The memory is allocated from a shared pool.	Dependent item	oracle.sgasharedpool Preprocessing JSON Path: `$.shared_pool`
Oracle: SGA, log buffer	The number of bytes allocated for the redo log buffer.	Dependent item	oracle.sgalogbuffer Preprocessing JSON Path: `$.log_buffer`
Oracle: SGA, fixed	The fixed System Global Area (SGA) is an internal housekeeping area.	Dependent item	oracle.sga_fixed Preprocessing JSON Path: `$.fixed_sga`
Oracle: SGA, buffer cache	The size of standard block cache.	Dependent item	oracle.sgabuffercache Preprocessing JSON Path: `$.buffer_cache`
Oracle: User's expire password	The number of days before the Zabbix account password expires.	Zabbix agent	oracle.user.info["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}"] Preprocessing JSON Path: `$.exp_passwd_days_before`
Oracle: Redo logs available to switch	The number of inactive/unused redo logs available for log switching.	Zabbix agent	oracle.redolog.info["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}"] Preprocessing JSON Path: `$.available`
Oracle: Number of processes	The current number of user processes.	Zabbix agent	oracle.proc.stats["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}"] Preprocessing JSON Path: `$.proc_num`
Oracle: Datafiles count	The current number of datafiles.	Zabbix agent	oracle.datafiles.stats["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}"] Preprocessing JSON Path: `$.datafile_num`

Triggers

Name	Description	Expression	Severity
Oracle: Connection to database is unavailable	Connection to Oracle Database is currently unavailable.	`last(/Oracle by Zabbix agent 2/oracle.ping["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}"])=0`\|Disaster
Oracle: Version has changed	The Oracle Database version has changed. Acknowledge to close the problem manually.	`last(/Oracle by Zabbix agent 2/oracle.version,#1)<>last(/Oracle by Zabbix agent 2/oracle.version,#2) and length(last(/Oracle by Zabbix agent 2/oracle.version))>0`\|Info	Manual close: Yes
Oracle: Failed to fetch info data	Zabbix has not received any data for the items for the last 5 minutes. The database might be unavailable for connecting.	`nodata(/Oracle by Zabbix agent 2/oracle.uptime,30m)=1`\|Info
Oracle: Host has been restarted	Uptime is less than 10 minutes.	`last(/Oracle by Zabbix agent 2/oracle.uptime)<10m`\|Info	Manual close: Yes
Oracle: Instance name has changed	An Oracle Database instance name has changed. Acknowledge to close the problem manually.	`last(/Oracle by Zabbix agent 2/oracle.instance_name,#1)<>last(/Oracle by Zabbix agent 2/oracle.instance_name,#2) and length(last(/Oracle by Zabbix agent 2/oracle.instance_name))>0`\|Info	Manual close: Yes
Oracle: Instance hostname has changed	An Oracle Database instance hostname has changed. Acknowledge to close the problem manually.	`last(/Oracle by Zabbix agent 2/oracle.instance_hostname,#1)<>last(/Oracle by Zabbix agent 2/oracle.instance_hostname,#2) and length(last(/Oracle by Zabbix agent 2/oracle.instance_hostname))>0`\|Info	Manual close: Yes
Oracle: Shared pool free is too low	The free memory percent of the shared pool has been less than `{$ORACLE.SHARED.FREE.MIN.WARN}`% for the last 5 minutes.	`max(/Oracle by Zabbix agent 2/oracle.shared_pool_free,5m)<{$ORACLE.SHARED.FREE.MIN.WARN}`\|Warning
Oracle: Too many active sessions	Active sessions are using more than `{$ORACLE.SESSIONS.MAX.WARN}`% of the available sessions.	`min(/Oracle by Zabbix agent 2/oracle.session_count,5m) * 100 / last(/Oracle by Zabbix agent 2/oracle.session_limit) > {$ORACLE.SESSIONS.MAX.WARN}`\|Warning
Oracle: Too many locked sessions	The number of locked sessions exceeds `{$ORACLE.SESSIONS.LOCK.MAX.WARN}`% of the running sessions.	`min(/Oracle by Zabbix agent 2/oracle.session_lock_rate,5m) > {$ORACLE.SESSIONS.LOCK.MAX.WARN}`\|Warning
Oracle: Too many sessions locked	The number of locked sessions exceeding `{$ORACLE.SESSION.LOCK.MAX.TIME}` seconds is too high. Long-term locks can negatively affect the database performance. Therefore, if they are detected, you should first find the most difficult queries from the database point of view and then analyze possible resource leaks.	`min(/Oracle by Zabbix agent 2/oracle.session_long_time_locked,5m) > {$ORACLE.SESSION.LONG.LOCK.MAX.WARN}`\|Warning
Oracle: Too high database concurrency	The concurrency rate exceeds `{$ORACLE.CONCURRENCY.MAX.WARN}`%. A high contention value does not indicate the root cause of the problem, but is a signal to review resource consumption (determine the "heaviest" queries in the database, trace sessions, etc.) This will help find the root cause and possible optimization points both in database configuration and the logic of building queries.	`min(/Oracle by Zabbix agent 2/oracle.session_concurrency_rate,5m) > {$ORACLE.CONCURRENCY.MAX.WARN}`\|Warning
Oracle: Total PGA inuse is too high	The total PGA in use is more than `{$ORACLE.PGA.USE.MAX.WARN}`% of `PGA_AGGREGATE_TARGET`.	`min(/Oracle by Zabbix agent 2/oracle.total_pga_used,5m) * 100 / last(/Oracle by Zabbix agent 2/oracle.pga_target) > {$ORACLE.PGA.USE.MAX.WARN}`\|Warning
Oracle: Zabbix account will expire soon	The password for the Zabbix user in the database expires soon.	`last(/Oracle by Zabbix agent 2/oracle.user.info["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}"]) < {$ORACLE.EXPIRE.PASSWORD.MIN.WARN}`\|Warning
Oracle: Number of REDO logs available for switching is too low	The number of inactive/unused redos available for log switching is low (risk of database downtime).	`max(/Oracle by Zabbix agent 2/oracle.redolog.info["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}"],5m) < {$ORACLE.REDO.MIN.WARN}`\|Warning
Oracle: Too many active processes	Active processes are using more than `{$ORACLE.PROCESSES.MAX.WARN}`% of the available number of processes.	`min(/Oracle by Zabbix agent 2/oracle.proc.stats["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}"],5m) * 100 / last(/Oracle by Zabbix agent 2/oracle.processes_limit) > {$ORACLE.PROCESSES.MAX.WARN}`\|Warning
Oracle: Too many database files	The number of datafiles is higher than `{$ORACLE.DB.FILE.MAX.WARN}`% of the available datafile limit.	`min(/Oracle by Zabbix agent 2/oracle.datafiles.stats["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}"],5m) * 100 / last(/Oracle by Zabbix agent 2/oracle.db_files_limit) > {$ORACLE.DB.FILE.MAX.WARN}`\|Warning

LLD rule Database discovery

Name	Description	Type	Key and additional info
Database discovery	Scanning databases in the database management system (DBMS).	Zabbix agent	oracle.db.discovery["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}"]

Item prototypes for Database discovery

Name	Description	Type	Key and additional info
Oracle Database '{#DBNAME}': Get CDB and No-CDB info	Gets the information about the CDB and non-CDB database on an instance.	Zabbix agent	oracle.cdb.info["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}","{#DBNAME}"]
Oracle Database '{#DBNAME}': Open status	1 - MOUNTED; 2 - READ WRITE; 3 - READ ONLY; 4 - READ ONLY WITH APPLY (a physical standby database is open in real-time query mode).	Dependent item	oracle.dbopenmode["{#DBNAME}"] Preprocessing JSON Path: `$..{#DBNAME}.open_mode.first()` Discard unchanged with heartbeat: `15m`
Oracle Database '{#DBNAME}': Role	The current role of the database where: 1 - SNAPSHOT STANDBY; 2 - LOGICAL STANDBY; 3 - PHYSICAL STANDBY; 4 - PRIMARY; 5 - FAR SYNC.	Dependent item	oracle.db_role["{#DBNAME}"] Preprocessing JSON Path: `$..{#DBNAME}.role.first()` Discard unchanged with heartbeat: `15m`
Oracle Database '{#DBNAME}': Log mode	The archive log mode where: 0 - NOARCHIVELOG; 1 - ARCHIVELOG; 2 - MANUAL.	Dependent item	oracle.dblogmode["{#DBNAME}"] Preprocessing JSON Path: `$..{#DBNAME}.log_mode.first()` Discard unchanged with heartbeat: `15m`
Oracle Database '{#DBNAME}': Force logging	Indicates whether the database is under force logging mode (`YES`/`NO`).	Dependent item	oracle.dbforcelogging["{#DBNAME}"] Preprocessing JSON Path: `$..{#DBNAME}.force_logging.first()` Discard unchanged with heartbeat: `15m`

Trigger prototypes for Database discovery

Name	Description	Expression	Severity
Oracle Database '{#DBNAME}': Open status in mount mode	The Oracle Database is in a mounted state.	`last(/Oracle by Zabbix agent 2/oracle.db_open_mode["{#DBNAME}"])=1`\|Warning
Oracle Database '{#DBNAME}': Open status has changed	The Oracle Database open status has changed. Acknowledge to close the problem manually.	`last(/Oracle by Zabbix agent 2/oracle.db_open_mode["{#DBNAME}"],#1)<>last(/Oracle by Zabbix agent 2/oracle.db_open_mode["{#DBNAME}"],#2)`\|Info	Manual close: Yes Depends on: Oracle Database '{#DBNAME}': Open status in mount mode
Oracle Database '{#DBNAME}': Role has changed	The Oracle Database role has changed. Acknowledge to close the problem manually.	`last(/Oracle by Zabbix agent 2/oracle.db_role["{#DBNAME}"],#1)<>last(/Oracle by Zabbix agent 2/oracle.db_role["{#DBNAME}"],#2)`\|Info	Manual close: Yes
Oracle Database '{#DBNAME}': Force logging is deactivated for DB with active Archivelog	Force logging mode is a very important metric for databases in `ARCHIVELOG`. This feature allows to forcibly write all the transactions to the redo log.	`last(/Oracle by Zabbix agent 2/oracle.db_force_logging["{#DBNAME}"]) = 0 and last(/Oracle by Zabbix agent 2/oracle.db_log_mode["{#DBNAME}"]) = 1`\|Warning

LLD rule PDB discovery

Name	Description	Type	Key and additional info
PDB discovery	Scanning a pluggable database (PDB) in DBMS.	Zabbix agent	oracle.pdb.discovery["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}"]

Item prototypes for PDB discovery

Name Description Type Key and additional info

Oracle Database '{#DBNAME}': Get PDB info

Gets the information about the PDB database on an instance.

Zabbix agent

oracle.pdb.info["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}","{#DBNAME}"]

Oracle Database '{#DBNAME}': Open status

1 - MOUNTED;

2 - READ WRITE;

3 - READ ONLY;

4 - READ ONLY WITH APPLY (a physical standby database is open in real-time query mode).

Dependent item

oracle.pdbopenmode["{#DBNAME}"]

Preprocessing

JSON Path: $..{#DBNAME}.open_mode.first()
Discard unchanged with heartbeat: 15m

Trigger prototypes for PDB discovery

Name	Description	Expression	Severity	Dependencies and additional info
Oracle Database '{#DBNAME}': Open status in mount mode	The Oracle Database is in a mounted state.	`last(/Oracle by Zabbix agent 2/oracle.pdb_open_mode["{#DBNAME}"])=1`\|Warning
Oracle Database '{#DBNAME}': Open status has changed	The Oracle Database open status has changed. Acknowledge to close the problem manually.	`last(/Oracle by Zabbix agent 2/oracle.pdb_open_mode["{#DBNAME}"],#1)<>last(/Oracle by Zabbix agent 2/oracle.pdb_open_mode["{#DBNAME}"],#2)`\|Info	Manual close: Yes

LLD rule Tablespace discovery

Name	Description	Type	Key and additional info
Tablespace discovery	Scanning tablespaces in DBMS.	Zabbix agent	oracle.ts.discovery["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}"]

Item prototypes for Tablespace discovery

Name	Description	Type	Key and additional info
Oracle TBS '{#TABLESPACE}': Get tablespaces stats	Gets the statistics of the tablespace.	Zabbix agent	oracle.ts.stats["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}","{#TABLESPACE}","{#CONTENTS}"]
Oracle TBS '{#TABLESPACE}': Tablespace allocated, bytes	Currently allocated bytes for the tablespace (sum of the current size of datafiles).	Dependent item	oracle.tbsallocbytes["{#TABLESPACE}"] Preprocessing JSON Path: `$..['{#TABLESPACE}'].file_bytes.first()`
Oracle TBS '{#TABLESPACE}': Tablespace MAX size, bytes	The maximum size of the tablespace.	Dependent item	oracle.tbsmaxbytes["{#TABLESPACE}"] Preprocessing JSON Path: `$..['{#TABLESPACE}'].max_bytes.first()`
Oracle TBS '{#TABLESPACE}': Tablespace used, bytes	Currently used bytes for the tablespace (current size of datafiles minus the free space).	Dependent item	oracle.tbsusedbytes["{#TABLESPACE}"] Preprocessing JSON Path: `$..['{#TABLESPACE}'].used_bytes.first()`
Oracle TBS '{#TABLESPACE}': Tablespace free, bytes	Free bytes of the allocated space.	Dependent item	oracle.tbsfreebytes["{#TABLESPACE}"] Preprocessing JSON Path: `$..['{#TABLESPACE}'].free_bytes.first()`
Oracle TBS '{#TABLESPACE}': Tablespace usage, percent	Used bytes/allocated bytes*100.	Dependent item	oracle.tbsusedfile_pct["{#TABLESPACE}"] Preprocessing JSON Path: `$..['{#TABLESPACE}'].used_file_pct.first()`
Oracle TBS '{#TABLESPACE}': Tablespace allocated, percent	Allocated bytes/max bytes*100.	Dependent item	oracle.tbsusedpct["{#TABLESPACE}"] Preprocessing JSON Path: `$..['{#TABLESPACE}'].used_pct_max.first()`
Oracle TBS '{#TABLESPACE}': Tablespace usage from MAX, percent	Used bytes/max bytes*100.	Dependent item	oracle.tbsusedfrommaxpct["{#TABLESPACE}"] Preprocessing JSON Path: `$..['{#TABLESPACE}'].used_from_max_pct.first()`
Oracle TBS '{#TABLESPACE}': Open status	The tablespace status where: 1 - ONLINE; 2 - OFFLINE; 3 - READ ONLY.	Dependent item	oracle.tbs_status["{#TABLESPACE}"] Preprocessing JSON Path: `$..['{#TABLESPACE}'].status.first()`

Trigger prototypes for Tablespace discovery

Name	Description	Expression	Severity
Oracle TBS '{#TABLESPACE}': Tablespace usage is too high	The usage of the tablespace `{#TABLESPACE}` exceeds `{$ORACLE.TBS.USED.PCT.MAX.WARN}`%	`min(/Oracle by Zabbix agent 2/oracle.tbs_used_file_pct["{#TABLESPACE}"],5m)>{$ORACLE.TBS.USED.PCT.MAX.WARN}`\|Warning	Depends on: Oracle TBS '{#TABLESPACE}': Tablespace usage is too high
Oracle TBS '{#TABLESPACE}': Tablespace usage is too high	The usage of the tablespace `{#TABLESPACE}` exceeds `{$ORACLE.TBS.USED.PCT.MAX.HIGH}`%	`min(/Oracle by Zabbix agent 2/oracle.tbs_used_file_pct["{#TABLESPACE}"],5m)>{$ORACLE.TBS.USED.PCT.MAX.HIGH}`\|High
Oracle TBS '{#TABLESPACE}': Tablespace utilization is too high	The utilization of the tablespace `{#TABLESPACE}` exceeds `{$ORACLE.TBS.UTIL.PCT.MAX.WARN}`%	`min(/Oracle by Zabbix agent 2/oracle.tbs_used_pct["{#TABLESPACE}"],5m)>{$ORACLE.TBS.UTIL.PCT.MAX.WARN}`\|Warning	Depends on: Oracle TBS '{#TABLESPACE}': Tablespace utilization is too high
Oracle TBS '{#TABLESPACE}': Tablespace utilization is too high	The utilization of the tablespace `{#TABLESPACE}` exceeds `{$ORACLE.TBS.UTIL.PCT.MAX.HIGH}`%	`min(/Oracle by Zabbix agent 2/oracle.tbs_used_pct["{#TABLESPACE}"],5m)>{$ORACLE.TBS.UTIL.PCT.MAX.HIGH}`\|High
Oracle TBS '{#TABLESPACE}': Tablespace usage from MAX is too high	The usage of the tablespace `{#TABLESPACE}` from MAX exceeds `{$ORACLE.TBS.USED.PCT.FROM.MAX.WARN}`%	`min(/Oracle by Zabbix agent 2/oracle.tbs_used_from_max_pct["{#TABLESPACE}"],5m)>{$ORACLE.TBS.USED.PCT.FROM.MAX.WARN}`\|Warning	Depends on: Oracle TBS '{#TABLESPACE}': Tablespace utilization from MAX is too high
Oracle TBS '{#TABLESPACE}': Tablespace utilization from MAX is too high	The usage of the tablespace `{#TABLESPACE}` from MAX exceeds `{$ORACLE.TBS.USED.PCT.FROM.MAX.HIGH}`%	`min(/Oracle by Zabbix agent 2/oracle.tbs_used_from_max_pct["{#TABLESPACE}"],5m)>{$ORACLE.TBS.USED.PCT.FROM.MAX.HIGH}`\|High
Oracle TBS '{#TABLESPACE}': Tablespace is OFFLINE	The tablespace is in the offline state.	`last(/Oracle by Zabbix agent 2/oracle.tbs_status["{#TABLESPACE}"])=2`\|Warning
Oracle TBS '{#TABLESPACE}': Tablespace status has changed	Oracle tablespace status has changed. Acknowledge to close the problem manually.	`last(/Oracle by Zabbix agent 2/oracle.tbs_status["{#TABLESPACE}"],#1)<>last(/Oracle by Zabbix agent 2/oracle.tbs_status["{#TABLESPACE}"],#2)`\|Info	Manual close: Yes Depends on: Oracle TBS '{#TABLESPACE}': Tablespace is OFFLINE

LLD rule Archive log discovery

Name	Description	Type	Key and additional info
Archive log discovery	Destinations of the log archive.	Zabbix agent	oracle.archive.discovery["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}"]

Item prototypes for Archive log discovery

Name	Description	Type	Key and additional info
Archivelog '{#DEST_NAME}': Get archive log info	Gets the archive log statistics.	Zabbix agent	oracle.archive.info["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}","{#DEST_NAME}"]
Archivelog '{#DEST_NAME}': Error	Displays the error message.	Dependent item	oracle.archivelogerror["{#DESTNAME}"] Preprocessing JSON Path: `$..['{#DEST_NAME}'].error.first()` Discard unchanged with heartbeat: `1h`
Archivelog '{#DEST_NAME}': Last sequence	Identifies the sequence number of the last archived redo log to be archived.	Dependent item	oracle.archiveloglogsequence["{#DEST_NAME}"] Preprocessing JSON Path: `$..['{#DEST_NAME}'].log_sequence.first()`
Archivelog '{#DEST_NAME}': Status	Identifies the current status of the destination where: 1 - VALID; 2 - DEFERRED; 3 - ERROR; 0 - UNKNOWN.	Dependent item	oracle.archiveloglogstatus["{#DEST_NAME}"] Preprocessing JSON Path: `$..['{#DEST_NAME}'].status.first()` Discard unchanged with heartbeat: `1h`

Trigger prototypes for Archive log discovery

Name	Description	Expression	Severity	Dependencies and additional info
Archivelog '{#DEST_NAME}': Log Archive is not valid	The trigger will launch if the archive log destination is not in one of these states: 2 - DEFERRED; 3 - VALID.	`last(/Oracle by Zabbix agent 2/oracle.archivelog_log_status["{#DEST_NAME}"])<2`\|High

LLD rule ASM disk groups discovery

Name	Description	Type	Key and additional info
ASM disk groups discovery	The ASM disk groups.	Zabbix agent	oracle.diskgroups.discovery["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}"]

Item prototypes for ASM disk groups discovery

Name	Description	Type	Key and additional info
ASM '{#DGNAME}': Get ASM stats	Gets the ASM disk group statistics.	Zabbix agent	oracle.diskgroups.stats["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}","{#DGNAME}"]
ASM '{#DGNAME}': Total size	The total size of the ASM disk group.	Dependent item	oracle.asmtotalsize["{#DGNAME}"] Preprocessing JSON Path: `$..['{#DGNAME}'].total_bytes.first()`
ASM '{#DGNAME}': Free size	The free size of the ASM disk group.	Dependent item	oracle.asmfreesize["{#DGNAME}"] Preprocessing JSON Path: `$..['{#DGNAME}'].free_bytes.first()`
ASM '{#DGNAME}': Used size, percent	Usage of the ASM disk group expressed in %.	Dependent item	oracle.asmusedpct["{#DGNAME}"] Preprocessing JSON Path: `$..['{#DGNAME}'].used_pct.first()`

Trigger prototypes for ASM disk groups discovery

Name	Description	Expression	Severity	Dependencies and additional info
ASM '{#DGNAME}': Disk group usage is too high	The usage of the ASM disk group expressed in % exceeds `{$ORACLE.ASM.USED.PCT.MAX.WARN}`.	`min(/Oracle by Zabbix agent 2/oracle.asm_used_pct["{#DGNAME}"],5m)>{$ORACLE.ASM.USED.PCT.MAX.WARN}`\|Warning	Depends on: ASM '{#DGNAME}': Disk group usage is too high
ASM '{#DGNAME}': Disk group usage is too high	The usage of the ASM disk group expressed in % exceeds `{$ORACLE.ASM.USED.PCT.MAX.HIGH}`.	`min(/Oracle by Zabbix agent 2/oracle.asm_used_pct["{#DGNAME}"],5m)>{$ORACLE.ASM.USED.PCT.MAX.HIGH}`\|High

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template, or ask for help at ZABBIX forums

db_mysql_odbc

View README Download JSON

MySQL by ODBC

Overview

This template is designed for the effortless deployment of MySQL monitoring by Zabbix via ODBC and doesn't require any external scripts.

Requirements

Zabbix version: 6.0 and higher.

Tested versions

This template has been tested on:

MySQL 5.7, 8.0
Percona 8.0
MariaDB 10.4

Configuration

Setup

Create a MySQL user for monitoring (<password> at your discretion):

CREATE USER 'zbx_monitor'@'%' IDENTIFIED BY '<password>';
GRANT REPLICATION CLIENT,PROCESS,SHOW DATABASES,SHOW VIEW ON *.* TO 'zbx_monitor'@'%';

For more information, please see MySQL documentation https://dev.mysql.com/doc/refman/8.0/en/grant.html

Set the username and password in the host macros {$MYSQL.USER} and {$MYSQL.PASSWORD}.

Macros used

Name	Description	Default
{$MYSQL.ABORTED_CONN.MAX.WARN}	Number of failed attempts to connect to the MySQL server for trigger expressions.	`3`
{$MYSQL.REPL_LAG.MAX.WARN}	Amount of time the slave is behind the master for trigger expressions.	`30m`
{$MYSQL.SLOW_QUERIES.MAX.WARN}	Number of slow queries for trigger expressions.	`3`
{$MYSQL.BUFF_UTIL.MIN.WARN}	The minimum buffer pool utilization in percentage for trigger expressions.	`50`
{$MYSQL.DSN}	System data source name.	`<Put your DSN here>`
{$MYSQL.USER}	MySQL username.	`<Put your username here>`
{$MYSQL.PASSWORD}	MySQL user password.	`<Put your password here>`
{$MYSQL.CREATEDTMPTABLES.MAX.WARN}	The maximum number of temporary tables created in memory per second for trigger expressions.	`30`
{$MYSQL.CREATEDTMPDISK_TABLES.MAX.WARN}	The maximum number of temporary tables created on a disk per second for trigger expressions.	`10`
{$MYSQL.CREATEDTMPFILES.MAX.WARN}	The maximum number of temporary files created on a disk per second for trigger expressions.	`10`
{$MYSQL.INNODBLOGFILES}	Number of physical files in the InnoDB redo log for calculating `innodb_log_file_size`.	`2`
{$MYSQL.DBNAME.MATCHES}	Filter of discoverable databases.	`.+`
{$MYSQL.DBNAME.NOT_MATCHES}	Filter to exclude discovered databases.	`information_schema`

Items

Name	Description	Type	Key and additional info
MySQL: Get status variables	Gets server global status information.	Database monitor	db.odbc.get[getstatusvariables,"{$MYSQL.DSN}"]
MySQL: Get database	Used for scanning databases in DBMS.	Database monitor	db.odbc.get[get_database,"{$MYSQL.DSN}"]
MySQL: Get replication	Gets replication status information.	Database monitor	db.odbc.get[get_replication,"{$MYSQL.DSN}"]
MySQL: Status	MySQL server status.	Database monitor	db.odbc.select[ping,"{$MYSQL.DSN}"] Preprocessing Discard unchanged with heartbeat: `10m`
MySQL: Version	MySQL server version.	Database monitor	db.odbc.select[version,"{$MYSQL.DSN}"] Preprocessing Discard unchanged with heartbeat: `6h`
MySQL: Uptime	Number of seconds that the server has been up.	Dependent item	mysql.uptime Preprocessing JSON Path: `$[?(@.Variable_name=='Uptime')].Value.first()`
MySQL: Aborted clients per second	Number of connections that were aborted because the client died without closing the connection properly.	Dependent item	mysql.aborted_clients.rate Preprocessing JSON Path: `$[?(@.Variable_name=='Aborted_clients')].Value.first()` Change per second
MySQL: Aborted connections per second	Number of failed attempts to connect to the MySQL server.	Dependent item	mysql.aborted_connects.rate Preprocessing JSON Path: `$[?(@.Variable_name=='Aborted_connects')].Value.first()` Change per second
MySQL: Connection errors accept per second	Number of errors that occurred during calls to `accept()` on the listening port.	Dependent item	mysql.connectionerrorsaccept.rate Preprocessing JSON Path: `The text is too long. Please see the template.` Change per second
MySQL: Connection errors internal per second	Number of refused connections due to internal server errors, for example, out of memory errors, or failed thread starts.	Dependent item	mysql.connectionerrorsinternal.rate Preprocessing JSON Path: `The text is too long. Please see the template.` Change per second
MySQL: Connection errors max connections per second	Number of refused connections due to the `max_connections` limit being reached.	Dependent item	mysql.connectionerrorsmax_connections.rate Preprocessing JSON Path: `The text is too long. Please see the template.` Change per second
MySQL: Connection errors peer address per second	Number of errors while searching for the connecting client's IP address.	Dependent item	mysql.connectionerrorspeer_address.rate Preprocessing JSON Path: `The text is too long. Please see the template.` Change per second
MySQL: Connection errors select per second	Number of errors during calls to `select()` or `poll()` on the listening port. The client would not necessarily have been rejected in these cases.	Dependent item	mysql.connectionerrorsselect.rate Preprocessing JSON Path: `The text is too long. Please see the template.` Change per second
MySQL: Connection errors tcpwrap per second	Number of connections the libwrap library has refused.	Dependent item	mysql.connectionerrorstcpwrap.rate Preprocessing JSON Path: `The text is too long. Please see the template.` Change per second
MySQL: Connections per second	Number of connection attempts (successful or not) to the MySQL server.	Dependent item	mysql.connections.rate Preprocessing JSON Path: `$[?(@.Variable_name=='Connections')].Value.first()` Change per second
MySQL: Max used connections	The maximum number of connections that have been in use simultaneously since the server start.	Dependent item	mysql.maxusedconnections Preprocessing JSON Path: `$[?(@.Variable_name=='Max_used_connections')].Value.first()` Discard unchanged with heartbeat: `1h`
MySQL: Threads cached	Number of threads in the thread cache.	Dependent item	mysql.threads_cached Preprocessing JSON Path: `$[?(@.Variable_name=='Threads_cached')].Value.first()`
MySQL: Threads connected	Number of currently open connections.	Dependent item	mysql.threads_connected Preprocessing JSON Path: `$[?(@.Variable_name=='Threads_connected')].Value.first()`
MySQL: Threads created per second	Number of threads created to handle connections. If the value of `Threads_created` is large, you may want to increase the `thread_cache_size` value. The cache miss rate can be calculated as `Threads_created`/`Connections`.	Dependent item	mysql.threads_created.rate Preprocessing JSON Path: `$[?(@.Variable_name=='Threads_created')].Value.first()` Change per second
MySQL: Threads running	Number of threads that are not sleeping.	Dependent item	mysql.threads_running Preprocessing JSON Path: `$[?(@.Variable_name=='Threads_running')].Value.first()`
MySQL: Buffer pool efficiency	The item shows how effectively the buffer pool is serving reads.	Calculated	mysql.bufferpoolefficiency
MySQL: Buffer pool utilization	Ratio of used to total pages in the buffer pool.	Calculated	mysql.bufferpoolutilization
MySQL: Created tmp files on disk per second	How many temporary files `mysqld` has created.	Dependent item	mysql.createdtmpfiles.rate Preprocessing JSON Path: `$[?(@.Variable_name=='Created_tmp_files')].Value.first()` Change per second
MySQL: Created tmp tables on disk per second	Number of internal on-disk temporary tables created by the server while executing statements.	Dependent item	mysql.createdtmpdisk_tables.rate Preprocessing JSON Path: `The text is too long. Please see the template.` Change per second
MySQL: Created tmp tables on memory per second	Number of internal temporary tables created by the server while executing statements.	Dependent item	mysql.createdtmptables.rate Preprocessing JSON Path: `$[?(@.Variable_name=='Created_tmp_tables')].Value.first()` Change per second
MySQL: InnoDB buffer pool pages free	The total size of the InnoDB buffer pool, in pages.	Dependent item	mysql.innodbbufferpoolpagesfree Preprocessing JSON Path: `The text is too long. Please see the template.`
MySQL: InnoDB buffer pool pages total	The total size of the InnoDB buffer pool, in pages.	Dependent item	mysql.innodbbufferpoolpagestotal Preprocessing JSON Path: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `1h`
MySQL: InnoDB buffer pool read requests	Number of logical read requests.	Dependent item	mysql.innodbbufferpoolreadrequests Preprocessing JSON Path: `The text is too long. Please see the template.`
MySQL: InnoDB buffer pool read requests per second	Number of logical read requests per second.	Dependent item	mysql.innodbbufferpoolreadrequests.rate Preprocessing JSON Path: `The text is too long. Please see the template.` Change per second
MySQL: InnoDB buffer pool reads	Number of logical reads that InnoDB could not satisfy from the buffer pool and had to read directly from the disk.	Dependent item	mysql.innodbbufferpool_reads Preprocessing JSON Path: `The text is too long. Please see the template.`
MySQL: InnoDB buffer pool reads per second	Number of logical reads per second that InnoDB could not satisfy from the buffer pool and had to read directly from the disk.	Dependent item	mysql.innodbbufferpool_reads.rate Preprocessing JSON Path: `The text is too long. Please see the template.` Change per second
MySQL: InnoDB row lock time	The total time spent in acquiring row locks for InnoDB tables, in milliseconds.	Dependent item	mysql.innodbrowlock_time Preprocessing JSON Path: `$[?(@.Variable_name=='Innodb_row_lock_time')].Value.first()` Custom multiplier: `0.001` Discard unchanged with heartbeat: `1h`
MySQL: InnoDB row lock time max	The maximum time to acquire a row lock for InnoDB tables, in milliseconds.	Dependent item	mysql.innodbrowlocktimemax Preprocessing JSON Path: `The text is too long. Please see the template.` Custom multiplier: `0.001` Discard unchanged with heartbeat: `1h`
MySQL: InnoDB row lock waits	Number of times operations on InnoDB tables had to wait for a row lock.	Dependent item	mysql.innodbrowlock_waits Preprocessing JSON Path: `$[?(@.Variable_name=='Innodb_row_lock_waits')].Value.first()`
MySQL: Slow queries per second	Number of queries that have taken more than `long_query_time` seconds.	Dependent item	mysql.slow_queries.rate Preprocessing JSON Path: `$[?(@.Variable_name=='Slow_queries')].Value.first()` Change per second
MySQL: Bytes received	Number of bytes received from all clients.	Dependent item	mysql.bytes_received.rate Preprocessing JSON Path: `$[?(@.Variable_name=='Bytes_received')].Value.first()` Change per second
MySQL: Bytes sent	Number of bytes sent to all clients.	Dependent item	mysql.bytes_sent.rate Preprocessing JSON Path: `$[?(@.Variable_name=='Bytes_sent')].Value.first()` Change per second
MySQL: Command Delete per second	The `Com_delete` counter variable indicates the number of times the `DELETE` statement has been executed.	Dependent item	mysql.com_delete.rate Preprocessing JSON Path: `$[?(@.Variable_name=='Com_delete')].Value.first()` Change per second
MySQL: Command Insert per second	The `Com_insert` counter variable indicates the number of times the `INSERT` statement has been executed.	Dependent item	mysql.com_insert.rate Preprocessing JSON Path: `$[?(@.Variable_name=='Com_insert')].Value.first()` Change per second
MySQL: Command Select per second	The `Com_select` counter variable indicates the number of times the `SELECT` statement has been executed.	Dependent item	mysql.com_select.rate Preprocessing JSON Path: `$[?(@.Variable_name=='Com_select')].Value.first()` Change per second
MySQL: Command Update per second	The `Com_update` counter variable indicates the number of times the `UPDATE` statement has been executed.	Dependent item	mysql.com_update.rate Preprocessing JSON Path: `$[?(@.Variable_name=='Com_update')].Value.first()` Change per second
MySQL: Queries per second	Number of statements executed by the server. This variable includes statements executed within stored programs, unlike the `Questions` variable.	Dependent item	mysql.queries.rate Preprocessing JSON Path: `$[?(@.Variable_name=='Queries')].Value.first()` Change per second
MySQL: Questions per second	Number of statements executed by the server. This includes only statements sent to the server by clients and not statements executed within stored programs, unlike the `Queries` variable.	Dependent item	mysql.questions.rate Preprocessing JSON Path: `$[?(@.Variable_name=='Questions')].Value.first()` Change per second
MySQL: Binlog cache disk use	Number of transactions that used a temporary disk cache because they could not fit in the regular binary log cache, being larger than `binlog_cache_size`.	Dependent item	mysql.binlogcachedisk_use Preprocessing JSON Path: `$[?(@.Variable_name=='Binlog_cache_disk_use')].Value.first()` Discard unchanged with heartbeat: `6h`
MySQL: Innodb buffer pool wait free	Number of times InnoDB waited for a free page before reading or creating a page. Normally, writes to the InnoDB buffer pool happen in the background. When no clean pages are available, dirty pages are flushed first in order to free some up. This counts the numbers of wait for this operation to finish. If this value is not small, look at the increasing `innodb_buffer_pool_size`.	Dependent item	mysql.innodbbufferpoolwaitfree Preprocessing JSON Path: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `6h`
MySQL: Innodb number open files	Number of open files held by InnoDB. InnoDB only.	Dependent item	mysql.innodbnumopen_files Preprocessing JSON Path: `$[?(@.Variable_name=='Innodb_num_open_files')].Value.first()` Discard unchanged with heartbeat: `6h`
MySQL: Open table definitions	Number of cached table definitions.	Dependent item	mysql.opentabledefinitions Preprocessing JSON Path: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `6h`
MySQL: Open tables	Number of tables that are open.	Dependent item	mysql.open_tables Preprocessing JSON Path: `$[?(@.Variable_name=='Open_tables')].Value.first()` Discard unchanged with heartbeat: `6h`
MySQL: Innodb log written	Number of bytes written to the InnoDB log.	Dependent item	mysql.innodboslog_written Preprocessing JSON Path: `$[?(@.Variable_name=='Innodb_os_log_written')].Value.first()` Discard unchanged with heartbeat: `6h`
MySQL: Calculated value of innodblogfile_size	`Innodb_log_file_size` is calculated as: (`innodb_os_log_written`-`innodb_os_log_written`(time shift -1h))/`{$MYSQL.INNODB_LOG_FILES}`. `Innodb_log_file_size` is the size in bytes of the each InnoDB redo log file in the log group. The combined size can be no more than 512 GB. Larger values mean less disk I/O due to less flushing checkpoint activity, but also slower recovery from a crash.	Calculated	mysql.innodblogfile_size Preprocessing Discard unchanged with heartbeat: `6h`

Triggers

Name	Description	Expression	Severity
MySQL: Service is down	MySQL is down.	`last(/MySQL by ODBC/db.odbc.select[ping,"{$MYSQL.DSN}"])=0`\|High
MySQL: Version has changed	The MySQL version has changed. Acknowledge to close the problem manually.	`last(/MySQL by ODBC/db.odbc.select[version,"{$MYSQL.DSN}"],#1)<>last(/MySQL by ODBC/db.odbc.select[version,"{$MYSQL.DSN}"],#2) and length(last(/MySQL by ODBC/db.odbc.select[version,"{$MYSQL.DSN}"]))>0`\|Info	Manual close: Yes
MySQL: Service has been restarted	MySQL uptime is less than 10 minutes.	`last(/MySQL by ODBC/mysql.uptime)<10m`\|Info
MySQL: Failed to fetch info data	Zabbix has not received any data for items for the last 30 minutes.	`nodata(/MySQL by ODBC/mysql.uptime,30m)=1`\|Info	Depends on: MySQL: Service is down
MySQL: Server has aborted connections	The number of failed attempts to connect to the MySQL server is more than `{$MYSQL.ABORTED_CONN.MAX.WARN}` in the last 5 minutes.	`min(/MySQL by ODBC/mysql.aborted_connects.rate,5m)>{$MYSQL.ABORTED_CONN.MAX.WARN}`\|Average	Depends on: MySQL: Refused connections
MySQL: Refused connections	Number of refused connections due to the `max_connections` limit being reached.	`last(/MySQL by ODBC/mysql.connection_errors_max_connections.rate)>0`\|Average
MySQL: Buffer pool utilization is too low	The buffer pool utilization is less than `{$MYSQL.BUFF_UTIL.MIN.WARN}`% in the last 5 minutes. This means that there is a lot of unused RAM allocated for the buffer pool, which you can easily reallocate at the moment.	`max(/MySQL by ODBC/mysql.buffer_pool_utilization,5m)<{$MYSQL.BUFF_UTIL.MIN.WARN}`\|Warning
MySQL: Number of temporary files created per second is high	The application using the database may be in need of query optimization.	`min(/MySQL by ODBC/mysql.created_tmp_files.rate,5m)>{$MYSQL.CREATED_TMP_FILES.MAX.WARN}`\|Warning
MySQL: Number of on-disk temporary tables created per second is high	The application using the database may be in need of query optimization.	`min(/MySQL by ODBC/mysql.created_tmp_disk_tables.rate,5m)>{$MYSQL.CREATED_TMP_DISK_TABLES.MAX.WARN}`\|Warning
MySQL: Number of internal temporary tables created per second is high	The application using the database may be in need of query optimization.	`min(/MySQL by ODBC/mysql.created_tmp_tables.rate,5m)>{$MYSQL.CREATED_TMP_TABLES.MAX.WARN}`\|Warning
MySQL: Server has slow queries	The number of slow queries is more than `{$MYSQL.SLOW_QUERIES.MAX.WARN}` in the last 5 minutes.	`min(/MySQL by ODBC/mysql.slow_queries.rate,5m)>{$MYSQL.SLOW_QUERIES.MAX.WARN}`\|Warning

LLD rule Database discovery

Name Description Type Key and additional info

Database discovery

Used for the discovery of the databases.

Dependent item

mysql.database.discovery

Preprocessing

Discard unchanged with heartbeat: 1d

Item prototypes for Database discovery

Name Description Type Key and additional info

MySQL: Size of database {#DATABASE}

Database size.

Database monitor

db.odbc.select[{#DATABASE}_size,"{$MYSQL.DSN}"]

Preprocessing

Discard unchanged with heartbeat: 1h

LLD rule Replication discovery

Name Description Type Key and additional info

Replication discovery

Discovery of the replication.

Dependent item

mysql.replication.discovery

Preprocessing

Discard unchanged with heartbeat: 1d

Item prototypes for Replication discovery

Name	Description	Type	Key and additional info
MySQL: Replication Slave status {#MASTER_HOST}	Gets status information on the essential parameters of the slave threads.	Dependent item	mysql.slavestatus["{#MASTERHOST}"] Preprocessing JSON Path: `$.[?(@.Master_Host=='{#MASTER_HOST}')].first()`
MySQL: Replication Slave SQL Running State {#MASTER_HOST}	Shows the state of the SQL driver threads.	Dependent item	mysql.slavesqlrunningstate["{#MASTERHOST}"] Preprocessing JSON Path: `$.Slave_SQL_Running_State` Discard unchanged with heartbeat: `6h`
MySQL: Replication Seconds Behind Master {#MASTER_HOST}	The number of seconds the slave SQL thread has been behind processing the master binary log. A high number (or an increasing one) can indicate that the slave is unable to handle events from the master in a timely fashion.	Dependent item	mysql.secondsbehindmaster["{#MASTER_HOST}"] Preprocessing JSON Path: `$.Seconds_Behind_Master` Matches regular expression: `\d+` ⛔️Custom on fail: Set error to: `Replication is not performed.` Discard unchanged with heartbeat: `1h`
MySQL: Replication Slave IO Running {#MASTER_HOST}	Whether the I/O thread for reading the master's binary log is running. Normally, you want this to be `Yes` unless you have not yet started a replication or have explicitly stopped it with `STOP SLAVE`.	Dependent item	mysql.slaveiorunning["{#MASTER_HOST}"] Preprocessing JSON Path: `$.Slave_IO_Running` Discard unchanged with heartbeat: `1h`
MySQL: Replication Slave SQL Running {#MASTER_HOST}	Whether the SQL thread for executing events in the relay log is running. As with the I/O thread, this should normally be `Yes`.	Dependent item	mysql.slavesqlrunning["{#MASTER_HOST}"] Preprocessing JSON Path: `$.Slave_SQL_Running` Discard unchanged with heartbeat: `1h`

Trigger prototypes for Replication discovery

Name	Description	Expression	Severity
MySQL: Replication lag is too high	Replication delay is too long.	`min(/MySQL by ODBC/mysql.seconds_behind_master["{#MASTER_HOST}"],5m)>{$MYSQL.REPL_LAG.MAX.WARN}`\|Warning
MySQL: The slave I/O thread is not running	Whether the I/O thread for reading the master's binary log is running.	`count(/MySQL by ODBC/mysql.slave_io_running["{#MASTER_HOST}"],#1,"eq","No")=1`\|Average
MySQL: The slave I/O thread is not connected to a replication master	Whether the slave I/O thread is connected to the master.	`count(/MySQL by ODBC/mysql.slave_io_running["{#MASTER_HOST}"],#1,"ne","Yes")=1`\|Warning	Depends on: MySQL: The slave I/O thread is not running
MySQL: The SQL thread is not running	Whether the SQL thread for executing events in the relay log is running.	`count(/MySQL by ODBC/mysql.slave_sql_running["{#MASTER_HOST}"],#1,"eq","No")=1`\|Warning	Depends on: MySQL: The slave I/O thread is not running

LLD rule MariaDB discovery

Name Description Type Key and additional info

MariaDB discovery

Used for additional metrics if MariaDB is used.

Dependent item

mysql.extra_metric.discovery

Preprocessing

JavaScript: The text is too long. Please see the template.

Item prototypes for MariaDB discovery

Name	Description	Type	Key and additional info
MySQL: Binlog commits	Total number of transactions committed to the binary log.	Dependent item	mysql.binlog_commits[{#SINGLETON}] Preprocessing JSON Path: `$[?(@.Variable_name=='Binlog_commits')].Value.first()`
MySQL: Binlog group commits	Total number of group commits done to the binary log.	Dependent item	mysql.binloggroupcommits[{#SINGLETON}] Preprocessing JSON Path: `$[?(@.Variable_name=='Binlog_group_commits')].Value.first()`
MySQL: Master GTID wait count	The number of times `MASTER_GTID_WAIT` called.	Dependent item	mysql.mastergtidwait_count[{#SINGLETON}] Preprocessing JSON Path: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `6h`
MySQL: Master GTID wait time	Total number of time spent in `MASTER_GTID_WAIT`.	Dependent item	mysql.mastergtidwait_time[{#SINGLETON}] Preprocessing JSON Path: `$[?(@.Variable_name=='Master_gtid_wait_time')].Value.first()` Discard unchanged with heartbeat: `6h`
MySQL: Master GTID wait timeouts	Number of timeouts occurring in `MASTER_GTID_WAIT`.	Dependent item	mysql.mastergtidwait_timeouts[{#SINGLETON}] Preprocessing JSON Path: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `6h`

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template, or ask for help at ZABBIX forums

db_mysql_agent2

View README Download JSON

MySQL by Zabbix agent 2

Overview

This template is designed for the effortless deployment of MySQL monitoring by Zabbix via Zabbix agent 2 and doesn't require any external scripts.

Requirements

Zabbix version: 6.0 and higher.

Tested versions

This template has been tested on:

MySQL 5.7, 8.0
Percona 8.0
MariaDB 10.4

Configuration

Setup

Create a MySQL user for monitoring (<password> at your discretion):

CREATE USER 'zbx_monitor'@'%' IDENTIFIED BY '<password>';
GRANT REPLICATION CLIENT,PROCESS,SHOW DATABASES,SHOW VIEW ON *.* TO 'zbx_monitor'@'%';

For more information, please see MySQL documentation https://dev.mysql.com/doc/refman/8.0/en/grant.html

Set in the {$MYSQL.DSN} macro the data source name of the MySQL instance either session name from Zabbix agent 2 configuration file or URI. Examples: MySQL1, tcp://localhost:3306, tcp://172.16.0.10, unix:/var/run/mysql.sock For more information about MySQL Unix socket file, see the MySQL documentation https://dev.mysql.com/doc/refman/8.0/en/problems-with-mysql-sock.html.
If you had set URI in the {$MYSQL.DSN}, define the user name and password in host macros ({$MYSQL.USER} and {$MYSQL.PASSWORD}). Leave macros {$MYSQL.USER} and {$MYSQL.PASSWORD} empty if you use a session name. Set the user name and password in the Plugins.Mysql.<...> section of your Zabbix agent 2 configuration file. For more information about configuring the Zabbix MySQL plugin, see the documentation https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/src/go/plugins/mysql/README.md.

Macros used

Name	Description	Default
{$MYSQL.USER}	MySQL user name.
{$MYSQL.PASSWORD}	MySQL user password.
{$MYSQL.ABORTED_CONN.MAX.WARN}	Number of failed attempts to connect to the MySQL server for trigger expressions.	`3`
{$MYSQL.REPL_LAG.MAX.WARN}	Amount of time the slave is behind the master for trigger expressions.	`30m`
{$MYSQL.SLOW_QUERIES.MAX.WARN}	Number of slow queries for trigger expressions.	`3`
{$MYSQL.BUFF_UTIL.MIN.WARN}	The minimum buffer pool utilization in percentage for trigger expressions.	`50`
{$MYSQL.DSN}	System data source name such as .	`<Put your DSN>`
{$MYSQL.CREATEDTMPTABLES.MAX.WARN}	The maximum number of temporary tables created in memory per second for trigger expressions.	`30`
{$MYSQL.CREATEDTMPDISK_TABLES.MAX.WARN}	The maximum number of temporary tables created on a disk per second for trigger expressions.	`10`
{$MYSQL.CREATEDTMPFILES.MAX.WARN}	The maximum number of temporary files created on a disk per second for trigger expressions.	`10`
{$MYSQL.INNODBLOGFILES}	Number of physical files in the InnoDB redo log for calculating `innodb_log_file_size`.	`2`
{$MYSQL.DBNAME.MATCHES}	Filter of discoverable databases.	`.+`
{$MYSQL.DBNAME.NOT_MATCHES}	Filter to exclude discovered databases.	`information_schema`

Items

Name	Description	Type	Key and additional info
MySQL: Get status variables	Gets server global status information.	Zabbix agent	mysql.getstatusvariables["{$MYSQL.DSN}","{$MYSQL.USER}","{$MYSQL.PASSWORD}"]
MySQL: Status	MySQL server status.	Zabbix agent	mysql.ping["{$MYSQL.DSN}","{$MYSQL.USER}","{$MYSQL.PASSWORD}"] Preprocessing Discard unchanged with heartbeat: `10m`
MySQL: Version	MySQL server version.	Zabbix agent	mysql.version["{$MYSQL.DSN}","{$MYSQL.USER}","{$MYSQL.PASSWORD}"] Preprocessing Discard unchanged with heartbeat: `6h`
MySQL: Uptime	Number of seconds that the server has been up.	Dependent item	mysql.uptime Preprocessing JSON Path: `$.Uptime`
MySQL: Aborted clients per second	Number of connections that were aborted because the client died without closing the connection properly.	Dependent item	mysql.aborted_clients.rate Preprocessing JSON Path: `$.Aborted_clients` Change per second
MySQL: Aborted connections per second	Number of failed attempts to connect to the MySQL server.	Dependent item	mysql.aborted_connects.rate Preprocessing JSON Path: `$.Aborted_connects` Change per second
MySQL: Connection errors accept per second	Number of errors that occurred during calls to `accept()` on the listening port.	Dependent item	mysql.connectionerrorsaccept.rate Preprocessing JSON Path: `$.Connection_errors_accept` Change per second
MySQL: Connection errors internal per second	Number of refused connections due to internal server errors, for example, out of memory errors, or failed thread starts.	Dependent item	mysql.connectionerrorsinternal.rate Preprocessing JSON Path: `$.Connection_errors_internal` Change per second
MySQL: Connection errors max connections per second	Number of refused connections due to the `max_connections` limit being reached.	Dependent item	mysql.connectionerrorsmax_connections.rate Preprocessing JSON Path: `$.Connection_errors_max_connections` Change per second
MySQL: Connection errors peer address per second	Number of errors while searching for the connecting client's IP address.	Dependent item	mysql.connectionerrorspeer_address.rate Preprocessing JSON Path: `$.Connection_errors_peer_address` Change per second
MySQL: Connection errors select per second	Number of errors during calls to `select()` or `poll()` on the listening port. The client would not necessarily have been rejected in these cases.	Dependent item	mysql.connectionerrorsselect.rate Preprocessing JSON Path: `$.Connection_errors_select` Change per second
MySQL: Connection errors tcpwrap per second	Number of connections the libwrap library has refused.	Dependent item	mysql.connectionerrorstcpwrap.rate Preprocessing JSON Path: `$.Connection_errors_tcpwrap` Change per second
MySQL: Connections per second	Number of connection attempts (successful or not) to the MySQL server.	Dependent item	mysql.connections.rate Preprocessing JSON Path: `$.Connections` Change per second
MySQL: Max used connections	The maximum number of connections that have been in use simultaneously since the server start.	Dependent item	mysql.maxusedconnections Preprocessing JSON Path: `$.Max_used_connections` Discard unchanged with heartbeat: `1h`
MySQL: Threads cached	Number of threads in the thread cache.	Dependent item	mysql.threads_cached Preprocessing JSON Path: `$.Threads_cached`
MySQL: Threads connected	Number of currently open connections.	Dependent item	mysql.threads_connected Preprocessing JSON Path: `$.Threads_connected`
MySQL: Threads created per second	Number of threads created to handle connections. If the value of `Threads_created` is large, you may want to increase the `thread_cache_size` value. The cache miss rate can be calculated as `Threads_created`/`Connections`.	Dependent item	mysql.threads_created.rate Preprocessing JSON Path: `$.Threads_created` Change per second
MySQL: Threads running	Number of threads that are not sleeping.	Dependent item	mysql.threads_running Preprocessing JSON Path: `$.Threads_running`
MySQL: Buffer pool efficiency	The item shows how effectively the buffer pool is serving reads.	Calculated	mysql.bufferpoolefficiency
MySQL: Buffer pool utilization	Ratio of used to total pages in the buffer pool.	Calculated	mysql.bufferpoolutilization
MySQL: Created tmp files on disk per second	How many temporary files `mysqld` has created.	Dependent item	mysql.createdtmpfiles.rate Preprocessing JSON Path: `$.Created_tmp_files` Change per second
MySQL: Created tmp tables on disk per second	Number of internal on-disk temporary tables created by the server while executing statements.	Dependent item	mysql.createdtmpdisk_tables.rate Preprocessing JSON Path: `$.Created_tmp_disk_tables` Change per second
MySQL: Created tmp tables on memory per second	Number of internal temporary tables created by the server while executing statements.	Dependent item	mysql.createdtmptables.rate Preprocessing JSON Path: `$.Created_tmp_tables` Change per second
MySQL: InnoDB buffer pool pages free	The total size of the InnoDB buffer pool, in pages.	Dependent item	mysql.innodbbufferpoolpagesfree Preprocessing JSON Path: `$.Innodb_buffer_pool_pages_free`
MySQL: InnoDB buffer pool pages total	The total size of the InnoDB buffer pool, in pages.	Dependent item	mysql.innodbbufferpoolpagestotal Preprocessing JSON Path: `$.Innodb_buffer_pool_pages_total` Discard unchanged with heartbeat: `1h`
MySQL: InnoDB buffer pool read requests	Number of logical read requests.	Dependent item	mysql.innodbbufferpoolreadrequests Preprocessing JSON Path: `$.Innodb_buffer_pool_read_requests`
MySQL: InnoDB buffer pool read requests per second	Number of logical read requests per second.	Dependent item	mysql.innodbbufferpoolreadrequests.rate Preprocessing JSON Path: `$.Innodb_buffer_pool_read_requests` Change per second
MySQL: InnoDB buffer pool reads	Number of logical reads that InnoDB could not satisfy from the buffer pool and had to read directly from the disk.	Dependent item	mysql.innodbbufferpool_reads Preprocessing JSON Path: `$.Innodb_buffer_pool_reads`
MySQL: InnoDB buffer pool reads per second	Number of logical reads per second that InnoDB could not satisfy from the buffer pool and had to read directly from the disk.	Dependent item	mysql.innodbbufferpool_reads.rate Preprocessing JSON Path: `$.Innodb_buffer_pool_reads` Change per second
MySQL: InnoDB row lock time	The total time spent in acquiring row locks for InnoDB tables, in milliseconds.	Dependent item	mysql.innodbrowlock_time Preprocessing JSON Path: `$.Innodb_row_lock_time` Custom multiplier: `0.001` Discard unchanged with heartbeat: `1h`
MySQL: InnoDB row lock time max	The maximum time to acquire a row lock for InnoDB tables, in milliseconds.	Dependent item	mysql.innodbrowlocktimemax Preprocessing JSON Path: `$.Innodb_row_lock_time_max` Custom multiplier: `0.001` Discard unchanged with heartbeat: `1h`
MySQL: InnoDB row lock waits	Number of times operations on InnoDB tables had to wait for a row lock.	Dependent item	mysql.innodbrowlock_waits Preprocessing JSON Path: `$.Innodb_row_lock_waits`
MySQL: Slow queries per second	Number of queries that have taken more than `long_query_time` seconds.	Dependent item	mysql.slow_queries.rate Preprocessing JSON Path: `$.Slow_queries` Change per second
MySQL: Bytes received	Number of bytes received from all clients.	Dependent item	mysql.bytes_received.rate Preprocessing JSON Path: `$.Bytes_received` Change per second
MySQL: Bytes sent	Number of bytes sent to all clients.	Dependent item	mysql.bytes_sent.rate Preprocessing JSON Path: `$.Bytes_sent` Change per second
MySQL: Command Delete per second	The `Com_delete` counter variable indicates the number of times the `DELETE` statement has been executed.	Dependent item	mysql.com_delete.rate Preprocessing JSON Path: `$.Com_delete` Change per second
MySQL: Command Insert per second	The `Com_insert` counter variable indicates the number of times the `INSERT` statement has been executed.	Dependent item	mysql.com_insert.rate Preprocessing JSON Path: `$.Com_insert` Change per second
MySQL: Command Select per second	The `Com_select` counter variable indicates the number of times the `SELECT` statement has been executed.	Dependent item	mysql.com_select.rate Preprocessing JSON Path: `$.Com_select` Change per second
MySQL: Command Update per second	The `Com_update` counter variable indicates the number of times the `UPDATE` statement has been executed.	Dependent item	mysql.com_update.rate Preprocessing JSON Path: `$.Com_update` Change per second
MySQL: Queries per second	Number of statements executed by the server. This variable includes statements executed within stored programs, unlike the `Questions` variable.	Dependent item	mysql.queries.rate Preprocessing JSON Path: `$.Queries` Change per second
MySQL: Questions per second	Number of statements executed by the server. This includes only statements sent to the server by clients and not statements executed within stored programs, unlike the `Queries` variable.	Dependent item	mysql.questions.rate Preprocessing JSON Path: `$.Questions` Change per second
MySQL: Binlog cache disk use	Number of transactions that used a temporary disk cache because they could not fit in the regular binary log cache, being larger than `binlog_cache_size`.	Dependent item	mysql.binlogcachedisk_use Preprocessing JSON Path: `$.Binlog_cache_disk_use` Discard unchanged with heartbeat: `6h`
MySQL: Innodb buffer pool wait free	Number of times InnoDB waited for a free page before reading or creating a page. Normally, writes to the InnoDB buffer pool happen in the background. When no clean pages are available, dirty pages are flushed first in order to free some up. This counts the numbers of wait for this operation to finish. If this value is not small, look at the increasing `innodb_buffer_pool_size`.	Dependent item	mysql.innodbbufferpoolwaitfree Preprocessing JSON Path: `$.Innodb_buffer_pool_wait_free` Discard unchanged with heartbeat: `6h`
MySQL: Innodb number open files	Number of open files held by InnoDB. InnoDB only.	Dependent item	mysql.innodbnumopen_files Preprocessing JSON Path: `$.Innodb_num_open_files` Discard unchanged with heartbeat: `6h`
MySQL: Open table definitions	Number of cached table definitions.	Dependent item	mysql.opentabledefinitions Preprocessing JSON Path: `$.Open_table_definitions` Discard unchanged with heartbeat: `6h`
MySQL: Open tables	Number of tables that are open.	Dependent item	mysql.open_tables Preprocessing JSON Path: `$.Open_tables` Discard unchanged with heartbeat: `6h`
MySQL: Innodb log written	Number of bytes written to the InnoDB log.	Dependent item	mysql.innodboslog_written Preprocessing JSON Path: `$.Innodb_os_log_written` Discard unchanged with heartbeat: `6h`
MySQL: Calculated value of innodblogfile_size	`Innodb_log_file_size` is calculated as: (`innodb_os_log_written`-`innodb_os_log_written`(time shift -1h))/`{$MYSQL.INNODB_LOG_FILES}`. `Innodb_log_file_size` is the size in bytes of the each InnoDB redo log file in the log group. The combined size can be no more than 512 GB. Larger values mean less disk I/O due to less flushing checkpoint activity, but also slower recovery from a crash.	Calculated	mysql.innodblogfile_size Preprocessing Discard unchanged with heartbeat: `6h`

Triggers

Name	Description	Expression	Severity
MySQL: Service is down	MySQL is down.	`last(/MySQL by Zabbix agent 2/mysql.ping["{$MYSQL.DSN}","{$MYSQL.USER}","{$MYSQL.PASSWORD}"])=0`\|High
MySQL: Version has changed	The MySQL version has changed. Acknowledge to close the problem manually.	`last(/MySQL by Zabbix agent 2/mysql.version["{$MYSQL.DSN}","{$MYSQL.USER}","{$MYSQL.PASSWORD}"],#1)<>last(/MySQL by Zabbix agent 2/mysql.version["{$MYSQL.DSN}","{$MYSQL.USER}","{$MYSQL.PASSWORD}"],#2) and length(last(/MySQL by Zabbix agent 2/mysql.version["{$MYSQL.DSN}","{$MYSQL.USER}","{$MYSQL.PASSWORD}"]))>0`\|Info	Manual close: Yes
MySQL: Service has been restarted	MySQL uptime is less than 10 minutes.	`last(/MySQL by Zabbix agent 2/mysql.uptime)<10m`\|Info
MySQL: Failed to fetch info data	Zabbix has not received any data for items for the last 30 minutes.	`nodata(/MySQL by Zabbix agent 2/mysql.uptime,30m)=1`\|Info	Depends on: MySQL: Service is down
MySQL: Server has aborted connections	The number of failed attempts to connect to the MySQL server is more than `{$MYSQL.ABORTED_CONN.MAX.WARN}` in the last 5 minutes.	`min(/MySQL by Zabbix agent 2/mysql.aborted_connects.rate,5m)>{$MYSQL.ABORTED_CONN.MAX.WARN}`\|Average	Depends on: MySQL: Refused connections
MySQL: Refused connections	Number of refused connections due to the `max_connections` limit being reached.	`last(/MySQL by Zabbix agent 2/mysql.connection_errors_max_connections.rate)>0`\|Average
MySQL: Buffer pool utilization is too low	The buffer pool utilization is less than `{$MYSQL.BUFF_UTIL.MIN.WARN}`% in the last 5 minutes. This means that there is a lot of unused RAM allocated for the buffer pool, which you can easily reallocate at the moment.	`max(/MySQL by Zabbix agent 2/mysql.buffer_pool_utilization,5m)<{$MYSQL.BUFF_UTIL.MIN.WARN}`\|Warning
MySQL: Number of temporary files created per second is high	The application using the database may be in need of query optimization.	`min(/MySQL by Zabbix agent 2/mysql.created_tmp_files.rate,5m)>{$MYSQL.CREATED_TMP_FILES.MAX.WARN}`\|Warning
MySQL: Number of on-disk temporary tables created per second is high	The application using the database may be in need of query optimization.	`min(/MySQL by Zabbix agent 2/mysql.created_tmp_disk_tables.rate,5m)>{$MYSQL.CREATED_TMP_DISK_TABLES.MAX.WARN}`\|Warning
MySQL: Number of internal temporary tables created per second is high	The application using the database may be in need of query optimization.	`min(/MySQL by Zabbix agent 2/mysql.created_tmp_tables.rate,5m)>{$MYSQL.CREATED_TMP_TABLES.MAX.WARN}`\|Warning
MySQL: Server has slow queries	The number of slow queries is more than `{$MYSQL.SLOW_QUERIES.MAX.WARN}` in the last 5 minutes.	`min(/MySQL by Zabbix agent 2/mysql.slow_queries.rate,5m)>{$MYSQL.SLOW_QUERIES.MAX.WARN}`\|Warning

LLD rule Database discovery

Name Description Type Key and additional info

Database discovery

Scanning databases in DBMS.

Zabbix agent

mysql.db.discovery["{$MYSQL.DSN}","{$MYSQL.USER}","{$MYSQL.PASSWORD}"]

Preprocessing

Discard unchanged with heartbeat: 1d

Item prototypes for Database discovery

Name Description Type Key and additional info

MySQL: Size of database {#DATABASE}

Database size.

Zabbix agent

mysql.db.size["{$MYSQL.DSN}","{$MYSQL.USER}","{$MYSQL.PASSWORD}","{#DATABASE}"]

Preprocessing

Discard unchanged with heartbeat: 1h

LLD rule Replication discovery

Name Description Type Key and additional info

Replication discovery

If "show slave status" returns Master_Host, "Replication: *" items are created.

Zabbix agent

mysql.replication.discovery["{$MYSQL.DSN}","{$MYSQL.USER}","{$MYSQL.PASSWORD}"]

Preprocessing

Discard unchanged with heartbeat: 1d

Item prototypes for Replication discovery

Name	Description	Type	Key and additional info
MySQL: Replication Slave status {#MASTER_HOST}	The item gets status information on the essential parameters of the slave threads.	Zabbix agent	mysql.replication.getslavestatus["{$MYSQL.DSN}","{$MYSQL.USER}","{$MYSQL.PASSWORD}","{#MASTER_HOST}"]
MySQL: Replication Slave SQL Running State {#MASTER_HOST}	This shows the state of the SQL driver threads.	Dependent item	mysql.replication.slavesqlrunningstate["{#MASTERHOST}"] Preprocessing JSON Path: `$.Slave_SQL_Running_State` Discard unchanged with heartbeat: `6h`
MySQL: Replication Seconds Behind Master {#MASTER_HOST}	Number of seconds that the slave SQL thread is behind processing the master binary log. A high number (or an increasing one) can indicate that the slave is unable to handle events from the master in a timely fashion.	Dependent item	mysql.replication.secondsbehindmaster["{#MASTER_HOST}"] Preprocessing JSON Path: `$.Seconds_Behind_Master` Matches regular expression: `\d+` ⛔️Custom on fail: Set error to: `Replication is not performed.` Discard unchanged with heartbeat: `1h`
MySQL: Replication Slave IO Running {#MASTER_HOST}	Whether the I/O thread for reading the master's binary log is running. Normally, you want this to be Yes unless you have not yet started a replication or have explicitly stopped it with STOP SLAVE.	Dependent item	mysql.replication.slaveiorunning["{#MASTER_HOST}"] Preprocessing JSON Path: `$.Slave_IO_Running` Discard unchanged with heartbeat: `1h`
MySQL: Replication Slave SQL Running {#MASTER_HOST}	Whether the SQL thread for executing events in the relay log is running. As with the I/O thread, this should normally be Yes.	Dependent item	mysql.replication.slavesqlrunning["{#MASTER_HOST}"] Preprocessing JSON Path: `$.Slave_SQL_Running` Discard unchanged with heartbeat: `1h`

Trigger prototypes for Replication discovery

Name	Description	Expression	Severity
MySQL: Replication lag is too high	Replication delay is too long.	`min(/MySQL by Zabbix agent 2/mysql.replication.seconds_behind_master["{#MASTER_HOST}"],5m)>{$MYSQL.REPL_LAG.MAX.WARN}`\|Warning
MySQL: The slave I/O thread is not running	Whether the I/O thread for reading the master's binary log is running.	`count(/MySQL by Zabbix agent 2/mysql.replication.slave_io_running["{#MASTER_HOST}"],#1,"eq","No")=1`\|Average
MySQL: The slave I/O thread is not connected to a replication master	Whether the slave I/O thread is connected to the master.	`count(/MySQL by Zabbix agent 2/mysql.replication.slave_io_running["{#MASTER_HOST}"],#1,"ne","Yes")=1`\|Warning	Depends on: MySQL: The slave I/O thread is not running
MySQL: The SQL thread is not running	Whether the SQL thread for executing events in the relay log is running.	`count(/MySQL by Zabbix agent 2/mysql.replication.slave_sql_running["{#MASTER_HOST}"],#1,"eq","No")=1`\|Warning	Depends on: MySQL: The slave I/O thread is not running

LLD rule MariaDB discovery

Name Description Type Key and additional info

MariaDB discovery

Used for additional metrics if MariaDB is used.

Dependent item

mysql.extra_metric.discovery

Preprocessing

JavaScript: The text is too long. Please see the template.

Item prototypes for MariaDB discovery

Name	Description	Type	Key and additional info
MySQL: Binlog commits	Total number of transactions committed to the binary log.	Dependent item	mysql.binlog_commits[{#SINGLETON}] Preprocessing JSON Path: `$.Binlog_commits`
MySQL: Binlog group commits	Total number of group commits done to the binary log.	Dependent item	mysql.binloggroupcommits[{#SINGLETON}] Preprocessing JSON Path: `$.Binlog_group_commits`
MySQL: Master GTID wait count	The number of times `MASTER_GTID_WAIT` called.	Dependent item	mysql.mastergtidwait_count[{#SINGLETON}] Preprocessing JSON Path: `$.Master_gtid_wait_count` Discard unchanged with heartbeat: `6h`
MySQL: Master GTID wait time	Total number of time spent in `MASTER_GTID_WAIT`.	Dependent item	mysql.mastergtidwait_time[{#SINGLETON}] Preprocessing JSON Path: `$.Master_gtid_wait_time` Discard unchanged with heartbeat: `6h`
MySQL: Master GTID wait timeouts	Number of timeouts occurring in `MASTER_GTID_WAIT`.	Dependent item	mysql.mastergtidwait_timeouts[{#SINGLETON}] Preprocessing JSON Path: `$.Master_gtid_wait_timeouts` Discard unchanged with heartbeat: `6h`

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template, or ask for help at ZABBIX forums

db_mysql_agent

View README Download JSON

MySQL by Zabbix agent

Overview

This template is designed for the effortless deployment of MySQL monitoring by Zabbix via Zabbix agent and doesn't require any external scripts.

Requirements

Zabbix version: 6.0 and higher.

Tested versions

This template has been tested on:

MySQL 5.7, 8.0
Percona 8.0
MariaDB 10.4, 10.6.8

Configuration

Setup

Install Zabbix agent and MySQL client. If necessary, add the path to the mysql and mysqladmin utilities to the global environment variable PATH.
Copy the template_db_mysql.conf file with user parameters into folder with Zabbix agent configuration (/etc/zabbix/zabbix_agentd.d/ by default). Don't forget to restart Zabbix agent.
Create the MySQL user that will be used for monitoring (<password> at your discretion). For example:

CREATE USER 'zbx_monitor'@'%' IDENTIFIED BY '<password>';
GRANT REPLICATION CLIENT,PROCESS,SHOW DATABASES,SHOW VIEW ON *.* TO 'zbx_monitor'@'%';

For more information, please see MySQL documentation.

Create .my.cnf configuration file in the home directory of Zabbix agent for Linux distributions (/var/lib/zabbix by default) or my.cnf in c:\ for Windows. For example:

[client]
protocol=tcp
user='zbx_monitor'
password='<password>'

For more information, please see MySQL documentation.

NOTE: Linux distributions that use SELinux may require additional steps for access configuration.

For example, the following rule could be added to the SELinux policy:

# cat <<EOF > zabbix_home.te
module zabbix_home 1.0;

require {
        type zabbix_agent_t;
        type zabbix_var_lib_t;
        type mysqld_etc_t;
        type mysqld_port_t;
        type mysqld_var_run_t;
        class file { open read };
        class tcp_socket name_connect;
        class sock_file write;
}

#============= zabbix_agent_t ==============

allow zabbix_agent_t zabbix_var_lib_t:file read;
allow zabbix_agent_t zabbix_var_lib_t:file open;
allow zabbix_agent_t mysqld_etc_t:file read;
allow zabbix_agent_t mysqld_port_t:tcp_socket name_connect;
allow zabbix_agent_t mysqld_var_run_t:sock_file write;
EOF
# checkmodule -M -m -o zabbix_home.mod zabbix_home.te
# semodule_package -o zabbix_home.pp -m zabbix_home.mod
# semodule -i zabbix_home.pp
# restorecon -R /var/lib/zabbix

Macros used

Name	Description	Default
{$MYSQL.ABORTED_CONN.MAX.WARN}	Number of failed attempts to connect to the MySQL server for trigger expressions.	`3`
{$MYSQL.HOST}	Hostname or IP of MySQL host or container.	`127.0.0.1`
{$MYSQL.PORT}	MySQL service port.	`3306`
{$MYSQL.REPL_LAG.MAX.WARN}	Amount of time the slave is behind the master for trigger expressions.	`30m`
{$MYSQL.SLOW_QUERIES.MAX.WARN}	Number of slow queries for trigger expressions.	`3`
{$MYSQL.BUFF_UTIL.MIN.WARN}	The minimum buffer pool utilization in percentage for trigger expressions.	`50`
{$MYSQL.CREATEDTMPTABLES.MAX.WARN}	The maximum number of temporary tables created in memory per second for trigger expressions.	`30`
{$MYSQL.CREATEDTMPDISK_TABLES.MAX.WARN}	The maximum number of temporary tables created on a disk per second for trigger expressions.	`10`
{$MYSQL.CREATEDTMPFILES.MAX.WARN}	The maximum number of temporary files created on a disk per second for trigger expressions.	`10`
{$MYSQL.INNODBLOGFILES}	Number of physical files in the InnoDB redo log for calculating `innodb_log_file_size`.	`2`
{$MYSQL.DBNAME.MATCHES}	Filter of discoverable databases.	`.+`
{$MYSQL.DBNAME.NOT_MATCHES}	Filter to exclude discovered databases.	`information_schema`

Items

Name	Description	Type	Key and additional info
MySQL: Get status variables	Gets server global status information.	Zabbix agent	mysql.getstatusvariables["{$MYSQL.HOST}","{$MYSQL.PORT}"]
MySQL: Status	MySQL server status.	Zabbix agent	mysql.ping["{$MYSQL.HOST}","{$MYSQL.PORT}"] Preprocessing JavaScript: `return value.indexOf('is alive') !== -1 ? 1 : 0;` Discard unchanged with heartbeat: `10m`
MySQL: Version	MySQL server version.	Zabbix agent	mysql.version["{$MYSQL.HOST}","{$MYSQL.PORT}"] Preprocessing Regular expression: `(Server version)\s+(.+) \2` Discard unchanged with heartbeat: `1d`
MySQL: Uptime	Number of seconds that the server has been up.	Dependent item	mysql.uptime Preprocessing XML XPath: `The text is too long. Please see the template.`
MySQL: Aborted clients per second	Number of connections that were aborted because the client died without closing the connection properly.	Dependent item	mysql.aborted_clients.rate Preprocessing XML XPath: `The text is too long. Please see the template.` Change per second
MySQL: Aborted connections per second	Number of failed attempts to connect to the MySQL server.	Dependent item	mysql.aborted_connects.rate Preprocessing XML XPath: `The text is too long. Please see the template.` Change per second
MySQL: Connection errors accept per second	Number of errors that occurred during calls to `accept()` on the listening port.	Dependent item	mysql.connectionerrorsaccept.rate Preprocessing XML XPath: `The text is too long. Please see the template.` Change per second
MySQL: Connection errors internal per second	Number of refused connections due to internal server errors, for example, out of memory errors, or failed thread starts.	Dependent item	mysql.connectionerrorsinternal.rate Preprocessing XML XPath: `The text is too long. Please see the template.` Change per second
MySQL: Connection errors max connections per second	Number of refused connections due to the `max_connections` limit being reached.	Dependent item	mysql.connectionerrorsmax_connections.rate Preprocessing XML XPath: `The text is too long. Please see the template.` Change per second
MySQL: Connection errors peer address per second	Number of errors while searching for the connecting client's IP address.	Dependent item	mysql.connectionerrorspeer_address.rate Preprocessing XML XPath: `The text is too long. Please see the template.` Change per second
MySQL: Connection errors select per second	Number of errors during calls to `select()` or `poll()` on the listening port. The client would not necessarily have been rejected in these cases.	Dependent item	mysql.connectionerrorsselect.rate Preprocessing XML XPath: `The text is too long. Please see the template.` Change per second
MySQL: Connection errors tcpwrap per second	Number of connections the libwrap library has refused.	Dependent item	mysql.connectionerrorstcpwrap.rate Preprocessing XML XPath: `The text is too long. Please see the template.` Change per second
MySQL: Connections per second	Number of connection attempts (successful or not) to the MySQL server.	Dependent item	mysql.connections.rate Preprocessing XML XPath: `The text is too long. Please see the template.` Change per second
MySQL: Max used connections	The maximum number of connections that have been in use simultaneously since the server start.	Dependent item	mysql.maxusedconnections Preprocessing XML XPath: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `1h`
MySQL: Threads cached	Number of threads in the thread cache.	Dependent item	mysql.threads_cached Preprocessing XML XPath: `The text is too long. Please see the template.`
MySQL: Threads connected	Number of currently open connections.	Dependent item	mysql.threads_connected Preprocessing XML XPath: `The text is too long. Please see the template.`
MySQL: Threads created per second	Number of threads created to handle connections. If the value of `Threads_created` is large, you may want to increase the `thread_cache_size` value. The cache miss rate can be calculated as `Threads_created`/`Connections`.	Dependent item	mysql.threads_created.rate Preprocessing XML XPath: `The text is too long. Please see the template.` Change per second
MySQL: Threads running	Number of threads that are not sleeping.	Dependent item	mysql.threads_running Preprocessing XML XPath: `The text is too long. Please see the template.`
MySQL: Buffer pool efficiency	The item shows how effectively the buffer pool is serving reads.	Calculated	mysql.bufferpoolefficiency
MySQL: Buffer pool utilization	Ratio of used to total pages in the buffer pool.	Calculated	mysql.bufferpoolutilization
MySQL: Created tmp files on disk per second	How many temporary files `mysqld` has created.	Dependent item	mysql.createdtmpfiles.rate Preprocessing XML XPath: `The text is too long. Please see the template.` Change per second
MySQL: Created tmp tables on disk per second	Number of internal on-disk temporary tables created by the server while executing statements.	Dependent item	mysql.createdtmpdisk_tables.rate Preprocessing XML XPath: `The text is too long. Please see the template.` Change per second
MySQL: Created tmp tables on memory per second	Number of internal temporary tables created by the server while executing statements.	Dependent item	mysql.createdtmptables.rate Preprocessing XML XPath: `The text is too long. Please see the template.` Change per second
MySQL: InnoDB buffer pool pages free	The total size of the InnoDB buffer pool, in pages.	Dependent item	mysql.innodbbufferpoolpagesfree Preprocessing XML XPath: `The text is too long. Please see the template.`
MySQL: InnoDB buffer pool pages total	The total size of the InnoDB buffer pool, in pages.	Dependent item	mysql.innodbbufferpoolpagestotal Preprocessing XML XPath: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `1h`
MySQL: InnoDB buffer pool read requests	Number of logical read requests.	Dependent item	mysql.innodbbufferpoolreadrequests Preprocessing XML XPath: `The text is too long. Please see the template.`
MySQL: InnoDB buffer pool read requests per second	Number of logical read requests per second.	Dependent item	mysql.innodbbufferpoolreadrequests.rate Preprocessing XML XPath: `The text is too long. Please see the template.` Change per second
MySQL: InnoDB buffer pool reads	Number of logical reads that InnoDB could not satisfy from the buffer pool and had to read directly from the disk.	Dependent item	mysql.innodbbufferpool_reads Preprocessing XML XPath: `The text is too long. Please see the template.`
MySQL: InnoDB buffer pool reads per second	Number of logical reads per second that InnoDB could not satisfy from the buffer pool and had to read directly from the disk.	Dependent item	mysql.innodbbufferpool_reads.rate Preprocessing XML XPath: `The text is too long. Please see the template.` Change per second
MySQL: InnoDB row lock time	The total time spent in acquiring row locks for InnoDB tables, in milliseconds.	Dependent item	mysql.innodbrowlock_time Preprocessing XML XPath: `The text is too long. Please see the template.` Custom multiplier: `0.001` Discard unchanged with heartbeat: `1h`
MySQL: InnoDB row lock time max	The maximum time to acquire a row lock for InnoDB tables, in milliseconds.	Dependent item	mysql.innodbrowlocktimemax Preprocessing XML XPath: `The text is too long. Please see the template.` Custom multiplier: `0.001` Discard unchanged with heartbeat: `1h`
MySQL: InnoDB row lock waits	Number of times operations on InnoDB tables had to wait for a row lock.	Dependent item	mysql.innodbrowlock_waits Preprocessing XML XPath: `The text is too long. Please see the template.`
MySQL: Slow queries per second	Number of queries that have taken more than `long_query_time` seconds.	Dependent item	mysql.slow_queries.rate Preprocessing XML XPath: `The text is too long. Please see the template.` Change per second
MySQL: Bytes received	Number of bytes received from all clients.	Dependent item	mysql.bytes_received.rate Preprocessing XML XPath: `The text is too long. Please see the template.` Change per second
MySQL: Bytes sent	Number of bytes sent to all clients.	Dependent item	mysql.bytes_sent.rate Preprocessing XML XPath: `The text is too long. Please see the template.` Change per second
MySQL: Command Delete per second	The `Com_delete` counter variable indicates the number of times the `DELETE` statement has been executed.	Dependent item	mysql.com_delete.rate Preprocessing XML XPath: `The text is too long. Please see the template.` Change per second
MySQL: Command Insert per second	The `Com_insert` counter variable indicates the number of times the `INSERT` statement has been executed.	Dependent item	mysql.com_insert.rate Preprocessing XML XPath: `The text is too long. Please see the template.` Change per second
MySQL: Command Select per second	The `Com_select` counter variable indicates the number of times the `SELECT` statement has been executed.	Dependent item	mysql.com_select.rate Preprocessing XML XPath: `The text is too long. Please see the template.` Change per second
MySQL: Command Update per second	The `Com_update` counter variable indicates the number of times the `UPDATE` statement has been executed.	Dependent item	mysql.com_update.rate Preprocessing XML XPath: `The text is too long. Please see the template.` Change per second
MySQL: Queries per second	Number of statements executed by the server. This variable includes statements executed within stored programs, unlike the `Questions` variable.	Dependent item	mysql.queries.rate Preprocessing XML XPath: `The text is too long. Please see the template.` Change per second
MySQL: Questions per second	Number of statements executed by the server. This includes only statements sent to the server by clients and not statements executed within stored programs, unlike the `Queries` variable.	Dependent item	mysql.questions.rate Preprocessing XML XPath: `The text is too long. Please see the template.` Change per second
MySQL: Binlog cache disk use	Number of transactions that used a temporary disk cache because they could not fit in the regular binary log cache, being larger than `binlog_cache_size`.	Dependent item	mysql.binlogcachedisk_use Preprocessing XML XPath: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `6h`
MySQL: Innodb buffer pool wait free	Number of times InnoDB waited for a free page before reading or creating a page. Normally, writes to the InnoDB buffer pool happen in the background. When no clean pages are available, dirty pages are flushed first in order to free some up. This counts the numbers of wait for this operation to finish. If this value is not small, look at the increasing `innodb_buffer_pool_size`.	Dependent item	mysql.innodbbufferpoolwaitfree Preprocessing XML XPath: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `6h`
MySQL: Innodb number open files	Number of open files held by InnoDB. InnoDB only.	Dependent item	mysql.innodbnumopen_files Preprocessing XML XPath: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `6h`
MySQL: Open table definitions	Number of cached table definitions.	Dependent item	mysql.opentabledefinitions Preprocessing XML XPath: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `6h`
MySQL: Open tables	Number of tables that are open.	Dependent item	mysql.open_tables Preprocessing XML XPath: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `6h`
MySQL: Innodb log written	Number of bytes written to the InnoDB log.	Dependent item	mysql.innodboslog_written Preprocessing XML XPath: `The text is too long. Please see the template.`
MySQL: Calculated value of innodblogfile_size	`Innodb_log_file_size` is calculated as: (`innodb_os_log_written`-`innodb_os_log_written`(time shift -1h))/`{$MYSQL.INNODB_LOG_FILES}`. `Innodb_log_file_size` is the size in bytes of the each InnoDB redo log file in the log group. The combined size can be no more than 512 GB. Larger values mean less disk I/O due to less flushing checkpoint activity, but also slower recovery from a crash.	Calculated	mysql.innodblogfile_size Preprocessing Discard unchanged with heartbeat: `6h`

Triggers

Name	Description	Expression	Severity
MySQL: Service is down	MySQL is down.	`last(/MySQL by Zabbix agent/mysql.ping["{$MYSQL.HOST}","{$MYSQL.PORT}"])=0`\|High
MySQL: Version has changed	The MySQL version has changed. Acknowledge to close the problem manually.	`last(/MySQL by Zabbix agent/mysql.version["{$MYSQL.HOST}","{$MYSQL.PORT}"],#1)<>last(/MySQL by Zabbix agent/mysql.version["{$MYSQL.HOST}","{$MYSQL.PORT}"],#2) and length(last(/MySQL by Zabbix agent/mysql.version["{$MYSQL.HOST}","{$MYSQL.PORT}"]))>0`\|Info	Manual close: Yes
MySQL: Service has been restarted	MySQL uptime is less than 10 minutes.	`last(/MySQL by Zabbix agent/mysql.uptime)<10m`\|Info
MySQL: Failed to fetch info data	Zabbix has not received any data for items for the last 30 minutes.	`nodata(/MySQL by Zabbix agent/mysql.uptime,30m)=1`\|Info	Depends on: MySQL: Service is down
MySQL: Server has aborted connections	The number of failed attempts to connect to the MySQL server is more than `{$MYSQL.ABORTED_CONN.MAX.WARN}` in the last 5 minutes.	`min(/MySQL by Zabbix agent/mysql.aborted_connects.rate,5m)>{$MYSQL.ABORTED_CONN.MAX.WARN}`\|Average	Depends on: MySQL: Refused connections
MySQL: Refused connections	Number of refused connections due to the `max_connections` limit being reached.	`last(/MySQL by Zabbix agent/mysql.connection_errors_max_connections.rate)>0`\|Average
MySQL: Buffer pool utilization is too low	The buffer pool utilization is less than `{$MYSQL.BUFF_UTIL.MIN.WARN}`% in the last 5 minutes. This means that there is a lot of unused RAM allocated for the buffer pool, which you can easily reallocate at the moment.	`max(/MySQL by Zabbix agent/mysql.buffer_pool_utilization,5m)<{$MYSQL.BUFF_UTIL.MIN.WARN}`\|Warning
MySQL: Number of temporary files created per second is high	The application using the database may be in need of query optimization.	`min(/MySQL by Zabbix agent/mysql.created_tmp_files.rate,5m)>{$MYSQL.CREATED_TMP_FILES.MAX.WARN}`\|Warning
MySQL: Number of on-disk temporary tables created per second is high	The application using the database may be in need of query optimization.	`min(/MySQL by Zabbix agent/mysql.created_tmp_disk_tables.rate,5m)>{$MYSQL.CREATED_TMP_DISK_TABLES.MAX.WARN}`\|Warning
MySQL: Number of internal temporary tables created per second is high	The application using the database may be in need of query optimization.	`min(/MySQL by Zabbix agent/mysql.created_tmp_tables.rate,5m)>{$MYSQL.CREATED_TMP_TABLES.MAX.WARN}`\|Warning
MySQL: Server has slow queries	The number of slow queries is more than `{$MYSQL.SLOW_QUERIES.MAX.WARN}` in the last 5 minutes.	`min(/MySQL by Zabbix agent/mysql.slow_queries.rate,5m)>{$MYSQL.SLOW_QUERIES.MAX.WARN}`\|Warning

LLD rule Database discovery

Name Description Type Key and additional info

Database discovery

Scanning databases in DBMS.

Zabbix agent

mysql.db.discovery["{$MYSQL.HOST}","{$MYSQL.PORT}"]

Preprocessing

JavaScript: The text is too long. Please see the template.
Discard unchanged with heartbeat: 1d

Item prototypes for Database discovery

Name Description Type Key and additional info

MySQL: Size of database {#DBNAME}

Database size.

Zabbix agent

mysql.dbsize["{$MYSQL.HOST}","{$MYSQL.PORT}","{#DBNAME}"]

Preprocessing

Discard unchanged with heartbeat: 1h

LLD rule Replication discovery

Name Description Type Key and additional info

Replication discovery

If "show slave status" returns Master_Host, "Replication: *" items are created.

Zabbix agent

mysql.replication.discovery["{$MYSQL.HOST}","{$MYSQL.PORT}"]

Preprocessing

JavaScript: The text is too long. Please see the template.
Discard unchanged with heartbeat: 1d

Item prototypes for Replication discovery

Name	Description	Type	Key and additional info
MySQL: Replication Slave status {#MASTER_HOST}	The item gets status information on the essential parameters of the slave threads.	Zabbix agent	mysql.slavestatus["{$MYSQL.HOST}","{$MYSQL.PORT}","{#MASTERHOST}"]
MySQL: Replication Slave SQL Running State {#MASTER_HOST}	This shows the state of the SQL driver threads.	Dependent item	mysql.slavesqlrunningstate["{#MASTERHOST}"] Preprocessing XML XPath: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `6h`
MySQL: Replication Seconds Behind Master {#MASTER_HOST}	The number of seconds that the slave SQL thread is behind processing the master binary log. A high number (or an increasing one) can indicate that the slave is unable to handle events from the master in a timely fashion.	Dependent item	mysql.secondsbehindmaster["{#MASTER_HOST}"] Preprocessing XML XPath: `/resultset/row/field[@name='Seconds_Behind_Master']/text()` Discard unchanged with heartbeat: `1h` Does not match regular expression: `null` ⛔️Custom on fail: Set error to: `Replication is not performed.`
MySQL: Replication Slave IO Running {#MASTER_HOST}	Whether the I/O thread for reading the master's binary log is running. Normally, you want this to be Yes unless you have not yet started replication or have explicitly stopped it with STOP SLAVE.	Dependent item	mysql.slaveiorunning["{#MASTER_HOST}"] Preprocessing XML XPath: `/resultset/row/field[@name='Slave_IO_Running']/text()` Discard unchanged with heartbeat: `1h`
MySQL: Replication Slave SQL Running {#MASTER_HOST}	Whether the SQL thread for executing events in the relay log is running. As with the I/O thread, this should normally be Yes.	Dependent item	mysql.slavesqlrunning["{#MASTER_HOST}"] Preprocessing XML XPath: `/resultset/row/field[@name='Slave_SQL_Running']/text()` Discard unchanged with heartbeat: `1h`

Trigger prototypes for Replication discovery

Name	Description	Expression	Severity
MySQL: Replication lag is too high	Replication delay is too long.	`min(/MySQL by Zabbix agent/mysql.seconds_behind_master["{#MASTER_HOST}"],5m)>{$MYSQL.REPL_LAG.MAX.WARN}`\|Warning
MySQL: The slave I/O thread is not running	Whether the I/O thread for reading the master's binary log is running.	`count(/MySQL by Zabbix agent/mysql.slave_io_running["{#MASTER_HOST}"],#1,"eq","No")=1`\|Average
MySQL: The slave I/O thread is not connected to a replication master	Whether the slave I/O thread is connected to the master.	`count(/MySQL by Zabbix agent/mysql.slave_io_running["{#MASTER_HOST}"],#1,"ne","Yes")=1`\|Warning	Depends on: MySQL: The slave I/O thread is not running
MySQL: The SQL thread is not running	Whether the SQL thread for executing events in the relay log is running.	`count(/MySQL by Zabbix agent/mysql.slave_sql_running["{#MASTER_HOST}"],#1,"eq","No")=1`\|Warning	Depends on: MySQL: The slave I/O thread is not running

LLD rule MariaDB discovery

Name Description Type Key and additional info

MariaDB discovery

Used for additional metrics if MariaDB is used.

Dependent item

mysql.extra_metric.discovery

Preprocessing

JavaScript: The text is too long. Please see the template.

Item prototypes for MariaDB discovery

Name	Description	Type	Key and additional info
MySQL: Binlog commits	Total number of transactions committed to the binary log.	Dependent item	mysql.binlog_commits[{#SINGLETON}] Preprocessing XML XPath: `The text is too long. Please see the template.`
MySQL: Binlog group commits	Total number of group commits done to the binary log.	Dependent item	mysql.binloggroupcommits[{#SINGLETON}] Preprocessing XML XPath: `The text is too long. Please see the template.`
MySQL: Master GTID wait count	The number of times `MASTER_GTID_WAIT` called.	Dependent item	mysql.mastergtidwait_count[{#SINGLETON}] Preprocessing XML XPath: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `6h`
MySQL: Master GTID wait time	Total number of time spent in `MASTER_GTID_WAIT`.	Dependent item	mysql.mastergtidwait_time[{#SINGLETON}] Preprocessing XML XPath: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `6h`
MySQL: Master GTID wait timeouts	Number of timeouts occurring in `MASTER_GTID_WAIT`.	Dependent item	mysql.mastergtidwait_timeouts[{#SINGLETON}] Preprocessing XML XPath: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `6h`

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template, or ask for help at ZABBIX forums

db_mssql_odbc

View README Download JSON

MSSQL by ODBC

Overview

This template is designed for the effortless deployment of MSSQL monitoring by Zabbix via ODBC and doesn't require any external scripts.

Requirements

Zabbix version: 6.0 and higher.

Tested versions

This template has been tested on:

Microsoft SQL, version 2019, 2022

Configuration

Setup

Create an MSSQL user for monitoring. For example, "zbx_monitor".

View Server State and View Any Definition permissions should be granted to the user. Grant this user read permissions to the sysjobschedules, sysjobhistory, and sysjobs tables.

For example, using T-SQL commands:

GRANT SELECT ON OBJECT::msdb.dbo.sysjobs TO zbx_monitor;
GRANT SELECT ON OBJECT::msdb.dbo.sysjobservers TO zbx_monitor;
GRANT SELECT ON OBJECT::msdb.dbo.sysjobactivity TO zbx_monitor;
GRANT EXECUTE ON OBJECT::msdb.dbo.agent_datetime TO zbx_monitor;

For more information, see MSSQL documentation:

Create a database user

GRANT Server Permissions

Configure a User to Create and Manage SQL Server Agent Jobs

Set the username and password in the host macros {$MSSQL.USER} and {$MSSQL.PASSWORD}.
Do not forget to install Microsoft ODBC driver on Zabbix server or Zabbix proxy and specify data source name in macro {$MSSQL.DSN}.

See Microsoft documentation for instructions: https://docs.microsoft.com/en-us/sql/connect/odbc/linux-mac/installing-the-microsoft-odbc-driver-for-sql-server?view=sql-server-ver16.

Note! Credentials in the odbc.ini do not work for MSSQL.

The Service's TCP port state item uses the {HOST.CONN} and {$MSSQL.PORT} macros to check the availability of the MSSQL instance. Keep in mind that if dynamic ports are used on the MSSQL server side, this check will not work correctly.

If your instance uses a non-default TCP port, set the port in your section of odbc.ini in the line Server = IP or FQDN name, port.

Note: You can use the context macros {$MSSQL.BACKUP_FULL.USED}, {$MSSQL.BACKUP_LOG.USED}, and {$MSSQL.BACKUP_DIFF.USED} to disable backup age triggers for a certain database. If set to a value other than "1", the trigger expression for the backup age will not fire.

Macros used

Name	Description	Default
{$MSSQL.DSN}	System data source name.	`<Put your DSN here>`
{$MSSQL.USER}	MSSQL username.	`<Put your username here>`
{$MSSQL.PASSWORD}	MSSQL user password.	`<Put your password here>`
{$MSSQL.PORT}	MSSQL TCP port.	`1433`
{$MSSQL.DBNAME.MATCHES}	This macro is used in database discovery. It can be overridden on the host or linked template level.	`.*`
{$MSSQL.DBNAME.NOT_MATCHES}	This macro is used in database discovery. It can be overridden on the host or linked template level.	`master\|tempdb\|model\|msdb`
{$MSSQL.WORK_FILES.MAX}	The maximum number of work files created per second - for the trigger expression.	`20`
{$MSSQL.WORK_TABLES.MAX}	The maximum number of work tables created per second - for the trigger expression.	`20`
{$MSSQL.WORKTABLESFROMCACHE_RATIO.MIN.CRIT}	The minimum percentage of work tables from the cache ratio - for the High trigger expression.	`90`
{$MSSQL.BUFFERCACHERATIO.MIN.CRIT}	The minimum buffer cache hit ratio, in percent - for the High trigger expression.	`30`
{$MSSQL.BUFFERCACHERATIO.MIN.WARN}	The minimum buffer cache hit ratio, in percent - for the Warning trigger expression.	`50`
{$MSSQL.FREELISTSTALLS.MAX}	The maximum free list stalls per second - for the trigger expression.	`2`
{$MSSQL.LAZY_WRITES.MAX}	The maximum lazy writes per second - for the trigger expression.	`20`
{$MSSQL.PAGELIFEEXPECTANCY.MIN}	The minimum page life expectancy - for the trigger expression.	`300`
{$MSSQL.PAGE_READS.MAX}	The maximum page reads per second - for the trigger expression.	`90`
{$MSSQL.PAGE_WRITES.MAX}	The maximum page writes per second - for the trigger expression.	`90`
{$MSSQL.AVERAGEWAITTIME.MAX}	The maximum average wait time, in milliseconds - for the trigger expression.	`500`
{$MSSQL.LOCK_REQUESTS.MAX}	The maximum lock requests per second - for the trigger expression.	`1000`
{$MSSQL.LOCK_TIMEOUTS.MAX}	The maximum lock timeouts per second - for the trigger expression.	`1`
{$MSSQL.DEADLOCKS.MAX}	The maximum deadlocks per second - for the trigger expression.	`1`
{$MSSQL.LOGFLUSHWAITS.MAX}	The maximum log flush waits per second - for the trigger expression.	`1`
{$MSSQL.LOGFLUSHWAIT_TIME.MAX}	The maximum log flush wait time, in milliseconds - for the trigger expression.	`1`
{$MSSQL.PERCENTLOGUSED.MAX}	The maximum percentage of log used - for the trigger expression.	`80`
{$MSSQL.PERCENT_COMPILATIONS.MAX}	The maximum percentage of Transact-SQL compilations - for the trigger expression.	`10`
{$MSSQL.PERCENT_RECOMPILATIONS.MAX}	The maximum percentage of Transact-SQL recompilations - for the trigger expression.	`10`
{$MSSQL.PERCENT_READAHEAD.MAX}	The maximum percentage of pages read per second in anticipation of use - for the trigger expression.	`20`
{$MSSQL.BACKUP_DIFF.CRIT}	The maximum of days without a differential backup - for the High trigger expression.	`6d`
{$MSSQL.BACKUP_DIFF.WARN}	The maximum of days without a differential backup - for the Warning trigger expression.	`3d`
{$MSSQL.BACKUP_FULL.CRIT}	The maximum of days without a full backup - for the High trigger expression.	`10d`
{$MSSQL.BACKUP_FULL.WARN}	The maximum of days without a full backup - for the Warning trigger expression.	`9d`
{$MSSQL.BACKUP_LOG.CRIT}	The maximum of days without a log backup - for the High trigger expression.	`8h`
{$MSSQL.BACKUP_LOG.WARN}	The maximum of days without a log backup - for the Warning trigger expression.	`4h`
{$MSSQL.JOB.MATCHES}	This macro is used in job discovery. It can be overridden on the host or linked template level.	`.*`
{$MSSQL.JOB.NOT_MATCHES}	This macro is used in job discovery. It can be overridden on the host or linked template level.	`CHANGE_IF_NEEDED`
{$MSSQL.BACKUP_DURATION.WARN}	The maximum job duration - for the Warning trigger expression.	`1h`
{$MSSQL.BACKUP_FULL.USED}	The flag for checking the age of a full backup. If set to a value other than "1", the trigger expression for the full backup age will not fire. Can be used with context for database name.	`1`
{$MSSQL.BACKUP_LOG.USED}	The flag for checking the age of a log backup. If set to a value other than "1", the trigger expression for the log backup age will not fire. Can be used with context for database name.	`1`
{$MSSQL.BACKUP_DIFF.USED}	The flag for checking the age of a differential backup. If set to a value other than "1", the trigger expression for the differential backup age will not fire. Can be used with context for database name.	`1`
{$MSSQL.QUORUM.MEMBER.DISCOVERY.NAME.MATCHES}	Filter to include discovered quorum member by name.	`.*`
{$MSSQL.QUORUM.MEMBER.DISCOVERY.NAME.NOT_MATCHES}	Filter to exclude discovered quorum member by name.	`CHANGE_IF_NEEDED`

Items

Name	Description	Type	Key and additional info
MSSQL: Service's TCP port state	Test the availability of MSSQL Server on a TCP port.	Simple check	net.tcp.service[tcp,{HOST.CONN},{$MSSQL.PORT}] Preprocessing Discard unchanged with heartbeat: `10m`
MSSQL: Get last backup	The item gets information about backup processes.	Database monitor	db.odbc.get[getlastbackup,"{$MSSQL.DSN}"]
MSSQL: Get job status	The item gets the SQL agent job status.	Database monitor	db.odbc.get[getjobstatus,"{$MSSQL.DSN}"]
MSSQL: Get performance counters	The item gets server global status information.	Database monitor	db.odbc.get[getstatusvariables,"{$MSSQL.DSN}"]
MSSQL: Get availability groups	The item gets availability group states - name, primary and secondary health, synchronization health.	Database monitor	db.odbc.get[getavailabilitygroup,"{$MSSQL.DSN}"]
MSSQL: Get local DB	Getting the states of the local availability database.	Database monitor	db.odbc.get[getlocaldb,"{$MSSQL.DSN}"]
MSSQL: Get DB mirroring	Getting DB mirroring.	Database monitor	db.odbc.get[getdbmirroring,"{$MSSQL.DSN}"]
MSSQL: Get non-local DB	Getting the non-local availability database.	Database monitor	db.odbc.get[getnonlocal_db,"{$MSSQL.DSN}"]
MSSQL: Get replica	Getting the database replica.	Database monitor	db.odbc.get[get_replica,"{$MSSQL.DSN}"]
MSSQL: Get quorum	Getting quorum - cluster name, type, and state.	Database monitor	db.odbc.get[get_quorum,"{$MSSQL.DSN}"]
MSSQL: Get quorum member	Getting quorum members - member name, type, state, and number of quorum votes.	Database monitor	db.odbc.get[getquorummember,"{$MSSQL.DSN}"]
MSSQL: Get database	Getting databases - database name and recovery model.	Database monitor	db.odbc.get[get_database,"{$MSSQL.DSN}"]
MSSQL: Version	MSSQL Server version.	Dependent item	mssql.version Preprocessing JSON Path: `$[?(@.counter_name=='Version')].instance_name.first()` Discard unchanged with heartbeat: `1d`
MSSQL: Uptime	MSSQL Server uptime in the format "N days, hh:mm:ss".	Dependent item	mssql.uptime Preprocessing JSON Path: `$[?(@.counter_name=='Uptime')].cntr_value.first()`
MSSQL: Get Access Methods counters	The item gets server information about access methods.	Dependent item	mssql.access_methods.raw Preprocessing JSON Path: `$[?(@.object_name=~'.*Access Methods')]` ⛔️Custom on fail: Discard value
MSSQL: Forwarded records per second	Number of records per second fetched through forwarded record pointers.	Dependent item	mssql.forwardedrecordssec.rate Preprocessing JSON Path: `The text is too long. Please see the template.` Change per second
MSSQL: Full scans per second	Number of unrestricted full scans per second. These can be either base-table or full-index scans. Values greater than 1 or 2 indicate that there are table / index page scans. If that is combined with high CPU, this counter requires further investigation, otherwise, if the full scans are on small tables, it can be ignored.	Dependent item	mssql.fullscanssec.rate Preprocessing JSON Path: `$[?(@.counter_name=='Full Scans/sec')].cntr_value.first()` Change per second
MSSQL: Index searches per second	Number of index searches per second. These are used to start a range scan, reposition a range scan, revalidate a scan point, fetch a single index record, and search down the index to locate where to insert a new row.	Dependent item	mssql.indexsearchessec.rate Preprocessing JSON Path: `The text is too long. Please see the template.` Change per second
MSSQL: Page splits per second	Number of page splits per second that occur as a result of overflowing index pages.	Dependent item	mssql.pagesplitssec.rate Preprocessing JSON Path: `$[?(@.counter_name=='Page Splits/sec')].cntr_value.first()` Change per second
MSSQL: Work files created per second	Number of work files created per second. For example, work files can be used to store temporary results for hash joins and hash aggregates.	Dependent item	mssql.workfilescreatedsec.rate Preprocessing JSON Path: `The text is too long. Please see the template.` Change per second
MSSQL: Work tables created per second	Number of work tables created per second. For example, work tables can be used to store temporary results for query spool, LOB variables, XML variables, and cursors.	Dependent item	mssql.worktablescreatedsec.rate Preprocessing JSON Path: `The text is too long. Please see the template.` Change per second
MSSQL: Table lock escalations per second	Number of times locks on a table were escalated to the TABLE or HoBT granularity.	Dependent item	mssql.tablelockescalations.rate Preprocessing JSON Path: `The text is too long. Please see the template.` Change per second
MSSQL: Worktables from cache ratio	Percentage of work tables created where the initial two pages of the work table were not allocated but were immediately available from the work table cache.	Dependent item	mssql.worktablesfromcache_ratio Preprocessing JSON Path: `The text is too long. Please see the template.`
MSSQL: Get Buffer Manager counters	The item gets server information about the buffer pool.	Dependent item	mssql.buffer_manager.raw Preprocessing JSON Path: `$[?(@.object_name=~'.*Buffer Manager')]` ⛔️Custom on fail: Discard value
MSSQL: Buffer cache hit ratio	Indicates the percentage of pages found in the buffer cache without having to read from the disk. The ratio is the total number of cache hits divided by the total number of cache lookups over the last few thousand page accesses. After a long period of time, the ratio changes very little. Since reading from the cache is much less expensive than reading from the disk, a higher value is preferred for this item. To increase the buffer cache hit ratio, consider increasing the amount of memory available to MSSQL Server or using the buffer pool extension feature.	Dependent item	mssql.buffercachehit_ratio Preprocessing JSON Path: `The text is too long. Please see the template.`
MSSQL: Checkpoint pages per second	Indicates the number of pages flushed to the disk per second by a checkpoint or other operation which required all dirty pages to be flushed.	Dependent item	mssql.checkpointpagessec.rate Preprocessing JSON Path: `The text is too long. Please see the template.` Change per second
MSSQL: Database pages	Indicates the number of pages in the buffer pool with database content.	Dependent item	mssql.database_pages Preprocessing JSON Path: `$[?(@.counter_name=='Database pages')].cntr_value.first()`
MSSQL: Free list stalls per second	Indicates the number of requests per second that had to wait for a free page.	Dependent item	mssql.freeliststalls_sec.rate Preprocessing JSON Path: `The text is too long. Please see the template.` Change per second
MSSQL: Lazy writes per second	Indicates the number of buffers written per second by the buffer manager's lazy writer. The lazy writer is a system process that flushes out batches of dirty, aged buffers (buffers that contain changes that must be written back to the disk before the buffer can be reused for a different page) and makes them available to user processes. The lazy writer eliminates the need to perform frequent checkpoints in order to create available buffers.	Dependent item	mssql.lazywritessec.rate Preprocessing JSON Path: `$[?(@.counter_name=='Lazy writes/sec')].cntr_value.first()` Change per second
MSSQL: Page life expectancy	Indicates the number of seconds a page will stay in the buffer pool without references.	Dependent item	mssql.pagelifeexpectancy Preprocessing JSON Path: `The text is too long. Please see the template.`
MSSQL: Page lookups per second	Indicates the number of requests per second to find a page in the buffer pool.	Dependent item	mssql.pagelookupssec.rate Preprocessing JSON Path: `$[?(@.counter_name=='Page lookups/sec')].cntr_value.first()` Change per second
MSSQL: Page reads per second	Indicates the number of physical database page reads that are issued per second. This statistic displays the total number of physical page reads across all databases. As physical I/O is expensive, you may be able to minimize the cost either by using a larger data cache, intelligent indexes, and more efficient queries, or by changing the database design.	Dependent item	mssql.pagereadssec.rate Preprocessing JSON Path: `$[?(@.counter_name=='Page reads/sec')].cntr_value.first()` Change per second
MSSQL: Page writes per second	Indicates the number of physical database page writes that are issued per second.	Dependent item	mssql.pagewritessec.rate Preprocessing JSON Path: `$[?(@.counter_name=='Page writes/sec')].cntr_value.first()` Change per second
MSSQL: Read-ahead pages per second	Indicates the number of pages read per second in anticipation of use.	Dependent item	mssql.readaheadpagessec.rate Preprocessing JSON Path: `The text is too long. Please see the template.` Change per second
MSSQL: Target pages	The optimal number of pages in the buffer pool.	Dependent item	mssql.target_pages Preprocessing JSON Path: `$[?(@.counter_name=='Target pages')].cntr_value.first()` Discard unchanged with heartbeat: `1h`
MSSQL: Get DB counters	The item gets summary information about databases.	Dependent item	mssql.db_info.raw Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
MSSQL: Total data file size	Total size of all data files.	Dependent item	mssql.datafilessize Preprocessing JSON Path: `The text is too long. Please see the template.` Custom multiplier: `1024` Discard unchanged with heartbeat: `1h`
MSSQL: Total log file size	Total size of all the transaction log files.	Dependent item	mssql.logfilessize Preprocessing JSON Path: `The text is too long. Please see the template.` Custom multiplier: `1024` Discard unchanged with heartbeat: `1h`
MSSQL: Total log file used size	The cumulative size of all the log files in the database.	Dependent item	mssql.logfilesused_size Preprocessing JSON Path: `The text is too long. Please see the template.` Custom multiplier: `1024`
MSSQL: Total transactions per second	Total number of transactions started for all databases per second.	Dependent item	mssql.transactions_sec.rate Preprocessing JSON Path: `$[?(@.counter_name=='Transactions/sec')].cntr_value.first()` Change per second
MSSQL: Get General Statistics counters	The item gets general statistics information.	Dependent item	mssql.general_statistics.raw Preprocessing JSON Path: `$[?(@.object_name=~'.*General Statistics')]` ⛔️Custom on fail: Discard value
MSSQL: Logins per second	Total number of logins started per second. This does not include pooled connections. Any value over 2 may indicate insufficient connection pooling.	Dependent item	mssql.logins_sec.rate Preprocessing JSON Path: `$[?(@.counter_name=='Logins/sec')].cntr_value.first()` Change per second
MSSQL: Logouts per second	Total number of logout operations started per second. Any value over 2 may indicate insufficient connection pooling.	Dependent item	mssql.logouts_sec.rate Preprocessing JSON Path: `$[?(@.counter_name=='Logouts/sec')].cntr_value.first()` Change per second
MSSQL: Number of blocked processes	Number of currently blocked processes.	Dependent item	mssql.processes_blocked Preprocessing JSON Path: `$[?(@.counter_name=='Processes blocked')].cntr_value.first()` Discard unchanged with heartbeat: `1h`
MSSQL: Number of users connected	Number of users connected to MSSQL Server.	Dependent item	mssql.user_connections Preprocessing JSON Path: `$[?(@.counter_name=='User Connections')].cntr_value.first()`
MSSQL: Average latch wait time	Average latch wait time (in milliseconds) for latch requests that had to wait.	Calculated	mssql.averagelatchwait_time
MSSQL: Get Latches counters	The item gets server information about latches.	Dependent item	mssql.latches_info.raw Preprocessing JSON Path: `$[?(@.object_name=~'.*Latches')]` ⛔️Custom on fail: Discard value
MSSQL: Average latch wait time raw	Average latch wait time (in milliseconds) for latch requests that had to wait.	Dependent item	mssql.averagelatchwaittimeraw Preprocessing JSON Path: `The text is too long. Please see the template.`
MSSQL: Average latch wait time base	For internal use only.	Dependent item	mssql.averagelatchwaittimebase Preprocessing JSON Path: `The text is too long. Please see the template.`
MSSQL: Latch waits per second	The number of latch requests that could not be granted immediately. Latches are lightweight means of holding a very transient server resource, such as an address in memory.	Dependent item	mssql.latchwaitssec.rate Preprocessing JSON Path: `$[?(@.counter_name=='Latch Waits/sec')].cntr_value.first()` Change per second
MSSQL: Total latch wait time	Total latch wait time (in milliseconds) for latch requests in the last second. This value should stay stable compared to the number of latch waits per second.	Dependent item	mssql.totallatchwait_time Preprocessing JSON Path: `The text is too long. Please see the template.` Change per second
MSSQL: Total average wait time	The average wait time, in milliseconds, for each lock request that had to wait.	Calculated	mssql.averagewaittime
MSSQL: Get Locks counters	The item gets server information about locks.	Dependent item	mssql.locks_info.raw Preprocessing JSON Path: `$[?(@.object_name=~'.*Locks' && @.instance_name=='_Total')]` ⛔️Custom on fail: Discard value
MSSQL: Total average wait time raw	Average amount of wait time (in milliseconds) for each lock request that resulted in a wait. Information for all locks.	Dependent item	mssql.averagewaittime_raw Preprocessing JSON Path: `The text is too long. Please see the template.`
MSSQL: Total average wait time base	For internal use only.	Dependent item	mssql.averagewaittime_base Preprocessing JSON Path: `The text is too long. Please see the template.`
MSSQL: Total lock requests per second	Number of new locks and lock conversions per second requested from the lock manager.	Dependent item	mssql.lockrequestssec.rate Preprocessing JSON Path: `$[?(@.counter_name=='Lock Requests/sec')].cntr_value.first()` Change per second
MSSQL: Total lock requests per second that timed out	Number of timed out lock requests per second, including requests for NOWAIT locks.	Dependent item	mssql.locktimeoutssec.rate Preprocessing JSON Path: `$[?(@.counter_name=='Lock Timeouts/sec')].cntr_value.first()` Change per second
MSSQL: Total lock requests per second that required waiting	Number of lock requests per second that required the caller to wait.	Dependent item	mssql.lockwaitssec.rate Preprocessing JSON Path: `$[?(@.counter_name=='Lock Waits/sec')].cntr_value.first()` Change per second
MSSQL: Lock wait time	Average of total wait time (in milliseconds) for locks in the last second.	Dependent item	mssql.lockwaittime Preprocessing JSON Path: `The text is too long. Please see the template.` Change per second
MSSQL: Total lock requests per second that have deadlocks	Number of lock requests per second that resulted in a deadlock.	Dependent item	mssql.numberdeadlockssec.rate Preprocessing JSON Path: `The text is too long. Please see the template.` Change per second
MSSQL: Get Memory counters	The item gets memory information.	Dependent item	mssql.mem_manager.raw Preprocessing JSON Path: `$[?(@.object_name=~'.*Memory Manager')]` ⛔️Custom on fail: Discard value
MSSQL: Granted Workspace Memory	Specifies the total amount of memory currently granted to executing processes, such as hash, sort, bulk copy, and index creation operations.	Dependent item	mssql.grantedworkspacememory Preprocessing JSON Path: `The text is too long. Please see the template.` Custom multiplier: `1024`
MSSQL: Maximum workspace memory	Indicates the maximum amount of memory available for executing processes, such as hash, sort, bulk copy, and index creation operations.	Dependent item	mssql.maximumworkspacememory Preprocessing JSON Path: `The text is too long. Please see the template.` Custom multiplier: `1024`
MSSQL: Memory grants outstanding	Specifies the total number of processes that have successfully acquired a workspace memory grant.	Dependent item	mssql.memorygrantsoutstanding Preprocessing JSON Path: `The text is too long. Please see the template.`
MSSQL: Memory grants pending	Specifies the total number of processes waiting for a workspace memory grant.	Dependent item	mssql.memorygrantspending Preprocessing JSON Path: `The text is too long. Please see the template.`
MSSQL: Target server memory	Indicates the ideal amount of memory the server can consume.	Dependent item	mssql.targetservermemory Preprocessing JSON Path: `The text is too long. Please see the template.` Custom multiplier: `1024`
MSSQL: Total server memory	Specifies the amount of memory the server has committed using the memory manager.	Dependent item	mssql.totalservermemory Preprocessing JSON Path: `The text is too long. Please see the template.` Custom multiplier: `1024`
MSSQL: Get Cache counters	The item gets server information about cache.	Dependent item	mssql.cache_info.raw Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
MSSQL: Cache hit ratio	Ratio between cache hits and lookups.	Dependent item	mssql.cachehitratio Preprocessing JSON Path: `$[?(@.counter_name=='CacheHitRatio')].cntr_value.first()`
MSSQL: Cache object counts	Number of cache objects in the cache.	Dependent item	mssql.cacheobjectcounts Preprocessing JSON Path: `The text is too long. Please see the template.`
MSSQL: Cache objects in use	Number of cache objects in use.	Dependent item	mssql.cacheobjectsin_use Preprocessing JSON Path: `The text is too long. Please see the template.`
MSSQL: Cache pages	Number of 8-kilobyte (KB) pages used by cache objects.	Dependent item	mssql.cache_pages Preprocessing JSON Path: `$[?(@.counter_name=='Cache Pages')].cntr_value.first()`
MSSQL: Get SQL Errors counters	The item gets SQL error information.	Dependent item	mssql.sql_errors.raw Preprocessing JSON Path: `$[?(@.object_name=~'.*SQL Errors')]` ⛔️Custom on fail: Discard value
MSSQL: Errors per second (DB offline errors)	Number of errors per second.	Dependent item	mssql.offlineerrorssec.rate Preprocessing JSON Path: `The text is too long. Please see the template.` Change per second
MSSQL: Errors per second (Info errors)	Number of errors per second.	Dependent item	mssql.infoerrorssec.rate Preprocessing JSON Path: `The text is too long. Please see the template.` Change per second
MSSQL: Errors per second (Kill connection errors)	Number of errors per second.	Dependent item	mssql.killconnectionerrors_sec.rate Preprocessing JSON Path: `The text is too long. Please see the template.` Change per second
MSSQL: Errors per second (User errors)	Number of errors per second.	Dependent item	mssql.usererrorssec.rate Preprocessing JSON Path: `The text is too long. Please see the template.` Change per second
MSSQL: Total errors per second	Number of errors per second.	Dependent item	mssql.errors_sec.rate Preprocessing JSON Path: `The text is too long. Please see the template.` Change per second
MSSQL: Get SQL Statistics counters	The item gets SQL statistics information.	Dependent item	mssql.sql_statistics.raw Preprocessing JSON Path: `$[?(@.object_name=~'.*SQL Statistics')]` ⛔️Custom on fail: Discard value
MSSQL: Auto-param attempts per second	Number of auto-parameterization attempts per second. The total should be the sum of the failed, safe, and unsafe auto-parameterizations. Auto-parameterization occurs when an instance of SQL Server tries to parameterize a Transact-SQL request by replacing some literals with parameters so that reuse of the resulting cached execution plan across multiple similar-looking requests is possible. Note that auto-parameterizations are also known as simple parameterizations in the newer versions of SQL Server. This counter does not include forced parameterizations.	Dependent item	mssql.autoparamattemptssec.rate Preprocessing JSON Path: `The text is too long. Please see the template.` Change per second
MSSQL: Batch requests per second	Number of Transact-SQL command batches received per second. This statistic is affected by all constraints (such as I/O, number of users, cache size, complexity of requests, and so on). High batch requests mean good throughput.	Dependent item	mssql.batchrequestssec.rate Preprocessing JSON Path: `The text is too long. Please see the template.` Change per second
MSSQL: Percent of ad hoc queries running	The ratio of SQL compilations per second to batch requests per second, in percent.	Calculated	mssql.percentofadhoc_queries
MSSQL: Percent of Recompiled Transact-SQL Objects	The ratio of SQL re-compilations per second to SQL compilations per second, in percent.	Calculated	mssql.percentrecompilationsto_compilations
MSSQL: Full scans to Index searches ratio	The ratio of full scans per second to index searches per second. The threshold recommendation is strictly for OLTP workloads.	Calculated	mssql.scantosearch
MSSQL: Failed auto-params per second	Number of failed auto-parameterization attempts per second. This number should be small. Note that auto-parameterizations are also known as simple parameterizations in the newer versions of SQL Server.	Dependent item	mssql.failedautoparamssec.rate Preprocessing JSON Path: `The text is too long. Please see the template.` Change per second
MSSQL: Safe auto-params per second	Number of safe auto-parameterization attempts per second. Safe refers to a determination that a cached execution plan can be shared between different similar-looking Transact-SQL statements. SQL Server makes many auto-parameterization attempts, some of which turn out to be safe and others fail. Note that auto-parameterizations are also known as simple parameterizations in the newer versions of SQL Server. This does not include forced parameterizations.	Dependent item	mssql.safeautoparamssec.rate Preprocessing JSON Path: `The text is too long. Please see the template.` Change per second
MSSQL: SQL compilations per second	Number of SQL compilations per second. Indicates the number of times the compile code path is entered. Includes runs caused by statement-level recompilations in SQL Server. After SQL Server user activity is stable, this value reaches a steady state.	Dependent item	mssql.sqlcompilationssec.rate Preprocessing JSON Path: `The text is too long. Please see the template.` Change per second
MSSQL: SQL re-compilations per second	Number of statement recompiles per second. Counts the number of times statement recompiles are triggered. Generally, you want the recompiles to be low.	Dependent item	mssql.sqlrecompilationssec.rate Preprocessing JSON Path: `The text is too long. Please see the template.` Change per second
MSSQL: Unsafe auto-params per second	Number of unsafe auto-parameterization attempts per second. For example, the query has some characteristics that prevent the cached plan from being shared. These are designated as unsafe. This does not count the number of forced parameterizations.	Dependent item	mssql.unsafeautoparamssec.rate Preprocessing JSON Path: `The text is too long. Please see the template.` Change per second
MSSQL: Total transactions number	The number of currently active transactions of all types.	Dependent item	mssql.transactions Preprocessing JSON Path: `The text is too long. Please see the template.`

Triggers

Name	Description	Expression	Severity
MSSQL: Service is unavailable	The TCP port of the MSSQL Server service is currently unavailable.	`last(/MSSQL by ODBC/net.tcp.service[tcp,{HOST.CONN},{$MSSQL.PORT}])=0`\|Disaster
MSSQL: Version has changed	MSSQL version has changed. Acknowledge to close the problem manually.	`last(/MSSQL by ODBC/mssql.version,#1)<>last(/MSSQL by ODBC/mssql.version,#2) and length(last(/MSSQL by ODBC/mssql.version))>0`\|Info	Manual close: Yes
MSSQL: Service has been restarted	Uptime is less than 10 minutes.	`last(/MSSQL by ODBC/mssql.uptime)<10m`\|Info	Manual close: Yes
MSSQL: Failed to fetch info data	Zabbix has not received any data for items for the last 30 minutes.	`nodata(/MSSQL by ODBC/mssql.uptime,30m)=1`\|Info	Depends on: MSSQL: Service is unavailable
MSSQL: Too frequently using pointers	Rows with VARCHAR columns can experience expansion when VARCHAR values are updated with a longer string. In the case where the row cannot fit in the existing page, the row migrates, and access to the row will traverse a pointer. This only happens on heaps (tables without clustered indexes). In cases where clustered indexes cannot be used, drop non-clustered indexes, build a clustered index to reorg pages and rows, drop the clustered index, then recreate non-clustered indexes.	`last(/MSSQL by ODBC/mssql.forwarded_records_sec.rate) * 100 > 10 * last(/MSSQL by ODBC/mssql.batch_requests_sec.rate)`\|Warning
MSSQL: Number of work files created per second is high	Too many work files created per second to store temporary results for hash joins and hash aggregates.	`min(/MSSQL by ODBC/mssql.workfiles_created_sec.rate,5m)>{$MSSQL.WORK_FILES.MAX}`\|Average
MSSQL: Number of work tables created per second is high	Too many work tables created per second to store temporary results for query spool, LOB variables, XML variables, and cursors.	`min(/MSSQL by ODBC/mssql.worktables_created_sec.rate,5m)>{$MSSQL.WORK_TABLES.MAX}`\|Average
MSSQL: Percentage of work tables available from the work table cache is low	A value less than 90% may indicate insufficient memory, since execution plans are being dropped, or, on 32-bit systems, may indicate the need for an upgrade to a 64-bit system.	`max(/MSSQL by ODBC/mssql.worktables_from_cache_ratio,5m)<{$MSSQL.WORKTABLES_FROM_CACHE_RATIO.MIN.CRIT}`\|High
MSSQL: Percentage of the buffer cache efficiency is low	Too low buffer cache hit ratio.	`max(/MSSQL by ODBC/mssql.buffer_cache_hit_ratio,5m)<{$MSSQL.BUFFER_CACHE_RATIO.MIN.CRIT}`\|High
MSSQL: Percentage of the buffer cache efficiency is low	Low buffer cache hit ratio.	`max(/MSSQL by ODBC/mssql.buffer_cache_hit_ratio,5m)<{$MSSQL.BUFFER_CACHE_RATIO.MIN.WARN}`\|Warning	Depends on: MSSQL: Percentage of the buffer cache efficiency is low
MSSQL: Number of rps waiting for a free page is high	Some requests have to wait for a free page.	`min(/MSSQL by ODBC/mssql.free_list_stalls_sec.rate,5m)>{$MSSQL.FREE_LIST_STALLS.MAX}`\|Warning
MSSQL: Number of buffers written per second by the lazy writer is high	The number of buffers written per second by the buffer manager's lazy writer exceeds the threshold.	`min(/MSSQL by ODBC/mssql.lazy_writes_sec.rate,5m)>{$MSSQL.LAZY_WRITES.MAX}`\|Warning
MSSQL: Page life expectancy is low	The page stays in the buffer pool without references for less time than the threshold value.	`max(/MSSQL by ODBC/mssql.page_life_expectancy,15m)<{$MSSQL.PAGE_LIFE_EXPECTANCY.MIN}`\|High
MSSQL: Number of physical database page reads per second is high	The physical database page reads are issued too frequently.	`min(/MSSQL by ODBC/mssql.page_reads_sec.rate,5m)>{$MSSQL.PAGE_READS.MAX}`\|Warning
MSSQL: Number of physical database page writes per second is high	The physical database page writes are issued too frequently.	`min(/MSSQL by ODBC/mssql.page_writes_sec.rate,5m)>{$MSSQL.PAGE_WRITES.MAX}`\|Warning
MSSQL: Too many physical reads occurring	If this value makes up even a sizeable minority of the total "Page Reads/sec" (say, greater than 20% of the total page reads), you may have too many physical reads occurring.	`last(/MSSQL by ODBC/mssql.readahead_pages_sec.rate) > {$MSSQL.PERCENT_READAHEAD.MAX} / 100 * last(/MSSQL by ODBC/mssql.page_reads_sec.rate)`\|Warning
MSSQL: Total average wait time for locks is high	An average wait time longer than 500 ms may indicate excessive blocking. This value should generally correlate to "Lock Waits/sec" and move up or down with it accordingly.	`min(/MSSQL by ODBC/mssql.average_wait_time,5m)>{$MSSQL.AVERAGE_WAIT_TIME.MAX}`\|Warning
MSSQL: Total number of locks per second is high	Number of new locks and lock conversions per second requested from the lock manager is high.	`min(/MSSQL by ODBC/mssql.lock_requests_sec.rate,5m)>{$MSSQL.LOCK_REQUESTS.MAX}`\|Warning
MSSQL: Total lock requests per second that timed out is high	The total number of timed out lock requests per second, including requests for NOWAIT locks, is high.	`min(/MSSQL by ODBC/mssql.lock_timeouts_sec.rate,5m)>{$MSSQL.LOCK_TIMEOUTS.MAX}`\|Warning
MSSQL: Some blocking is occurring for 5m	Values greater than zero indicate at least some blocking is occurring, while a value of zero can quickly eliminate blocking as a potential root-cause problem.	`min(/MSSQL by ODBC/mssql.lock_waits_sec.rate,5m)>0`\|Average
MSSQL: Number of deadlocks is high	Too many deadlocks are occurring currently.	`min(/MSSQL by ODBC/mssql.number_deadlocks_sec.rate,5m)>{$MSSQL.DEADLOCKS.MAX}`\|Average
MSSQL: Percent of ad hoc queries running is high	The lower this value is, the better. High values often indicate excessive ad hoc querying and should be as low as possible. If excessive ad hoc querying is happening, try rewriting the queries as procedures or invoke the queries using `sp_executeSQL`. When rewriting isn't possible, consider using a plan guide or setting the database to parameterization forced mode.	`min(/MSSQL by ODBC/mssql.percent_of_adhoc_queries,15m) > {$MSSQL.PERCENT_COMPILATIONS.MAX}`\|Warning
MSSQL: Percent of times statement recompiles is high	This number should be at or near zero, since recompiles can cause deadlocks and exclusive compile locks. This counter's value should follow in proportion to "Batch Requests/sec" and "SQL Compilations/sec".	`min(/MSSQL by ODBC/mssql.percent_recompilations_to_compilations,15m) > {$MSSQL.PERCENT_RECOMPILATIONS.MAX}`\|Warning
MSSQL: Number of index and table scans exceeds index searches in the last 15m	Index searches are preferable to index and table scans. For OLTP applications, optimize for more index searches and less scans (preferably, 1 full scan for every 1000 index searches). Index and table scans are expensive I/O operations.	`min(/MSSQL by ODBC/mssql.scan_to_search,15m) > 0.001`\|Warning

LLD rule Database discovery

Name Description Type Key and additional info

Database discovery

Scanning databases in DBMS.

Dependent item

mssql.database.discovery

Preprocessing

Discard unchanged with heartbeat: 1d

Item prototypes for Database discovery

Name	Description	Type	Key and additional info
MSSQL DB '{#DBNAME}': Get performance counters	The item gets server status information for {#DBNAME}.	Dependent item	mssql.db.perf_raw["{#DBNAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
MSSQL DB '{#DBNAME}': Get last backup	The item gets information about backup processes for {#DBNAME}.	Dependent item	mssql.backup.raw["{#DBNAME}"] Preprocessing JSON Path: `$[?(@.dbname=='{#DBNAME}')]` ⛔️Custom on fail: Discard value
MSSQL DB '{#DBNAME}': State	0 = Online 1 = Restoring 2 = Recovering \| SQL Server 2008 and later 3 = Recoverypending \| SQL Server 2008 and later 4 = Suspect 5 = Emergency \| SQL Server 2008 and later 6 = Offline \| SQL Server 2008 and later 7 = Copying \| Azure SQL Database Active Geo-Replication 10 = Offlinesecondary \| Azure SQL Database Active Geo-Replication	Dependent item	mssql.db.state["{#DBNAME}"] Preprocessing JSON Path: `$[?(@.counter_name=='State')].cntr_value.first()` Discard unchanged with heartbeat: `15m`
MSSQL DB '{#DBNAME}': Active transactions	Number of active transactions for the database.	Dependent item	mssql.db.active_transactions["{#DBNAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.`
MSSQL DB '{#DBNAME}': Data file size	Cumulative size of all the data files in the database including any automatic growth. Monitoring this counter is useful, for example, for determining the correct size of `tempdb`.	Dependent item	mssql.db.datafilessize["{#DBNAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` Custom multiplier: `1024`
MSSQL DB '{#DBNAME}': Log bytes flushed per second	Total number of log bytes flushed per second. Useful for determining trends and utilization of the transaction log.	Dependent item	mssql.db.logbytesflushed_sec.rate["{#DBNAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` Change per second
MSSQL DB '{#DBNAME}': Log file size	Cumulative size of all the transaction log files in the database.	Dependent item	mssql.db.logfilessize["{#DBNAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` Custom multiplier: `1024`
MSSQL DB '{#DBNAME}': Log file used size	Cumulative size of all the log files in the database.	Dependent item	mssql.db.logfilesused_size["{#DBNAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` Custom multiplier: `1024`
MSSQL DB '{#DBNAME}': Log flushes per second	Number of log flushes per second.	Dependent item	mssql.db.logflushessec.rate["{#DBNAME}"] Preprocessing JSON Path: `$[?(@.counter_name=='Log Flushes/sec')].cntr_value.first()` Change per second
MSSQL DB '{#DBNAME}': Log flush waits per second	Number of commits per second waiting for the log flush.	Dependent item	mssql.db.logflushwaits_sec.rate["{#DBNAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` Change per second
MSSQL DB '{#DBNAME}': Log flush wait time	Total wait time (in milliseconds) to flush the log. On an Always On secondary database, this value indicates the wait time for log records to be hardened to disk.	Dependent item	mssql.db.logflushwait_time["{#DBNAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` Change per second
MSSQL DB '{#DBNAME}': Log growths	Total number of times the transaction log for the database has been expanded.	Dependent item	mssql.db.log_growths["{#DBNAME}"] Preprocessing JSON Path: `$[?(@.counter_name=='Log Growths')].cntr_value.first()`
MSSQL DB '{#DBNAME}': Log shrinks	Total number of times the transaction log for the database has been shrunk.	Dependent item	mssql.db.log_shrinks["{#DBNAME}"] Preprocessing JSON Path: `$[?(@.counter_name=='Log Shrinks')].cntr_value.first()`
MSSQL DB '{#DBNAME}': Log truncations	Number of times the transaction log has been shrunk.	Dependent item	mssql.db.log_truncations["{#DBNAME}"] Preprocessing JSON Path: `$[?(@.counter_name=='Log Truncations')].cntr_value.first()`
MSSQL DB '{#DBNAME}': Percent log used	Percentage of log space in use.	Dependent item	mssql.db.percentlogused["{#DBNAME}"] Preprocessing JSON Path: `$[?(@.counter_name=='Percent Log Used')].cntr_value.first()`
MSSQL DB '{#DBNAME}': Transactions per second	Number of transactions started for the database per second.	Dependent item	mssql.db.transactions_sec.rate["{#DBNAME}"] Preprocessing JSON Path: `$[?(@.counter_name=='Transactions/sec')].cntr_value.first()` Change per second
MSSQL DB '{#DBNAME}': Last diff backup duration	Duration of the last differential backup.	Dependent item	mssql.backup.diff.duration["{#DBNAME}"] Preprocessing JSON Path: `$[?(@.type=='I')].duration.first()` ⛔️Custom on fail: Set value to: `0` Discard unchanged with heartbeat: `12h`
MSSQL DB '{#DBNAME}': Last diff backup (time ago)	The amount of time since the last differential backup.	Dependent item	mssql.backup.diff["{#DBNAME}"] Preprocessing JSON Path: `$[?(@.type=='I')].time_since_last_backup.first()` ⛔️Custom on fail: Set value to: `0`
MSSQL DB '{#DBNAME}': Last full backup duration	Duration of the last full backup.	Dependent item	mssql.backup.full.duration["{#DBNAME}"] Preprocessing JSON Path: `$[?(@.type=='D')].duration.first()` ⛔️Custom on fail: Set value to: `0` Discard unchanged with heartbeat: `12h`
MSSQL DB '{#DBNAME}': Last full backup (time ago)	The amount of time since the last full backup.	Dependent item	mssql.backup.full["{#DBNAME}"] Preprocessing JSON Path: `$[?(@.type=='D')].time_since_last_backup.first()` ⛔️Custom on fail: Set value to: `0`
MSSQL DB '{#DBNAME}': Last log backup duration	Duration of the last log backup.	Dependent item	mssql.backup.log.duration["{#DBNAME}"] Preprocessing JSON Path: `$[?(@.type=='L')].duration.first()` ⛔️Custom on fail: Set value to: `0` Discard unchanged with heartbeat: `12h`
MSSQL DB '{#DBNAME}': Last log backup (time ago)	The amount of time since the last log backup.	Dependent item	mssql.backup.log["{#DBNAME}"] Preprocessing JSON Path: `$[?(@.type=='L')].time_since_last_backup.first()` ⛔️Custom on fail: Set value to: `0`
MSSQL DB '{#DBNAME}': Recovery model	Recovery model selected: 1 = Full 2 = Bulk_logged 3 = Simple	Dependent item	mssql.backup.recovery_model["{#DBNAME}"] Preprocessing JSON Path: `$[0].db_recovery_model` ⛔️Custom on fail: Set value to: `1` Discard unchanged with heartbeat: `1d`

Trigger prototypes for Database discovery

Name	Description	Expression	Severity
MSSQL DB '{#DBNAME}': State is {ITEM.VALUE}	The DB has a non-working state.	`last(/MSSQL by ODBC/mssql.db.state["{#DBNAME}"])>1`\|High
MSSQL DB '{#DBNAME}': Number of commits waiting for the log flush is high	Too many commits are waiting for the log flush.	`min(/MSSQL by ODBC/mssql.db.log_flush_waits_sec.rate["{#DBNAME}"],5m)>{$MSSQL.LOG_FLUSH_WAITS.MAX:"{#DBNAME}"}`\|Warning
MSSQL DB '{#DBNAME}': Total wait time to flush the log is high	The wait time to flush the log is too long.	`min(/MSSQL by ODBC/mssql.db.log_flush_wait_time["{#DBNAME}"],5m)>{$MSSQL.LOG_FLUSH_WAIT_TIME.MAX:"{#DBNAME}"}`\|Warning
MSSQL DB '{#DBNAME}': Percent of log usage is high	There's not enough space left in the log.	`min(/MSSQL by ODBC/mssql.db.percent_log_used["{#DBNAME}"],5m)>{$MSSQL.PERCENT_LOG_USED.MAX:"{#DBNAME}"}`\|Warning
MSSQL DB '{#DBNAME}': Diff backup is old	The differential backup has not been executed for a long time.	`last(/MSSQL by ODBC/mssql.backup.diff["{#DBNAME}"])>{$MSSQL.BACKUP_DIFF.CRIT:"{#DBNAME}"} and {$MSSQL.BACKUP_DIFF.USED:"{#DBNAME}"}=1`\|High	Manual close: Yes
MSSQL DB '{#DBNAME}': Diff backup is old	The differential backup has not been executed for a long time.	`last(/MSSQL by ODBC/mssql.backup.diff["{#DBNAME}"])>{$MSSQL.BACKUP_DIFF.WARN:"{#DBNAME}"} and {$MSSQL.BACKUP_DIFF.USED:"{#DBNAME}"}=1`\|Warning	Manual close: Yes Depends on: MSSQL DB '{#DBNAME}': Diff backup is old
MSSQL DB '{#DBNAME}': Full backup is old	The full backup has not been executed for a long time.	`last(/MSSQL by ODBC/mssql.backup.full["{#DBNAME}"])>{$MSSQL.BACKUP_FULL.CRIT:"{#DBNAME}"} and {$MSSQL.BACKUP_FULL.USED:"{#DBNAME}"}=1`\|High	Manual close: Yes
MSSQL DB '{#DBNAME}': Full backup is old	The full backup has not been executed for a long time.	`last(/MSSQL by ODBC/mssql.backup.full["{#DBNAME}"])>{$MSSQL.BACKUP_FULL.WARN:"{#DBNAME}"} and {$MSSQL.BACKUP_FULL.USED:"{#DBNAME}"}=1`\|Warning	Manual close: Yes Depends on: MSSQL DB '{#DBNAME}': Full backup is old
MSSQL DB '{#DBNAME}': Log backup is old	The log backup has not been executed for a long time.	`last(/MSSQL by ODBC/mssql.backup.log["{#DBNAME}"])>{$MSSQL.BACKUP_LOG.CRIT:"{#DBNAME}"} and {$MSSQL.BACKUP_LOG.USED:"{#DBNAME}"}=1 and last(/MSSQL by ODBC/mssql.backup.recovery_model["{#DBNAME}"])<>3`\|High	Manual close: Yes
MSSQL DB '{#DBNAME}': Log backup is old	The log backup has not been executed for a long time.	`last(/MSSQL by ODBC/mssql.backup.log["{#DBNAME}"])>{$MSSQL.BACKUP_LOG.WARN:"{#DBNAME}"} and {$MSSQL.BACKUP_LOG.USED:"{#DBNAME}"}=1 and last(/MSSQL by ODBC/mssql.backup.recovery_model["{#DBNAME}"])<>3`\|Warning	Manual close: Yes Depends on: MSSQL DB '{#DBNAME}': Log backup is old

LLD rule Availability group discovery

Name Description Type Key and additional info

Availability group discovery

Discovery of the existing availability groups.

Dependent item

mssql.availability.group.discovery

Preprocessing

Discard unchanged with heartbeat: 1d

Item prototypes for Availability group discovery

Name	Description	Type	Key and additional info
MSSQL AG '{#GROUP_NAME}': Primary replica recovery health	Indicates the recovery health of the primary replica: 0 = In progress 1 = Online 2 = Unavailable	Dependent item	mssql.primaryrecoveryhealth["{#GROUP_NAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `1h`
MSSQL AG '{#GROUP_NAME}': Primary replica name	Name of the server instance that is hosting the current primary replica.	Dependent item	mssql.primaryreplica["{#GROUPNAME}"] Preprocessing JSON Path: `$[?(@.group_name=='{#GROUP_NAME}')].primary_replica.first()` Discard unchanged with heartbeat: `3h`
MSSQL AG '{#GROUP_NAME}': Secondary replica recovery health	Indicates the recovery health of a secondary replica: 0 = In progress 1 = Online 2 = Unavailable	Dependent item	mssql.secondaryrecoveryhealth["{#GROUP_NAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `1h`
MSSQL AG '{#GROUP_NAME}': Synchronization health	Reflects a rollup of the `synchronization_health` of all availability replicas in the availability group: 0 = Not healthy. None of the availability replicas have a healthy synchronization. 1 = Partially healthy. The synchronization of some, but not all, availability replicas is healthy. 2 = Healthy. The synchronization of every availability replica is healthy.	Dependent item	mssql.synchronizationhealth["{#GROUPNAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `1h`

Trigger prototypes for Availability group discovery

Name	Description	Expression
MSSQL AG '{#GROUP_NAME}': Primary replica recovery health in progress	The primary replica is in the synchronization process.	`last(/MSSQL by ODBC/mssql.primary_recovery_health["{#GROUP_NAME}"])=0`\|Warning
MSSQL AG '{#GROUP_NAME}': Secondary replica recovery health in progress	The secondary replica is in the synchronization process.	`last(/MSSQL by ODBC/mssql.secondary_recovery_health["{#GROUP_NAME}"])=0`\|Warning
MSSQL AG '{#GROUP_NAME}': All replicas unhealthy	None of the availability replicas have a healthy synchronization.	`last(/MSSQL by ODBC/mssql.synchronization_health["{#GROUP_NAME}"])=0`\|Disaster
MSSQL AG '{#GROUP_NAME}': Some replicas unhealthy	The synchronization health of some, but not all, availability replicas is healthy.	`last(/MSSQL by ODBC/mssql.synchronization_health["{#GROUP_NAME}"])=1`\|High

LLD rule Local database discovery

Name Description Type Key and additional info

Local database discovery

Discovery of the local availability databases.

Dependent item

mssql.local.db.discovery

Preprocessing

Discard unchanged with heartbeat: 1d

Item prototypes for Local database discovery

Name Description Type Key and additional info

MSSQL AG '{#GROUP_NAME}' Local DB '{#DBNAME}': State

0 = Online

1 = Restoring

2 = Recovering

3 = Recovery pending

4 = Suspect

5 = Emergency

6 = Offline

Dependent item

mssql.local_db.state["{#DBNAME}"]

Preprocessing

JSON Path: The text is too long. Please see the template.
Discard unchanged with heartbeat: 1h

MSSQL AG '{#GROUP_NAME}' Local DB '{#DBNAME}': Suspended

Database state:

0 = Resumed

1 = Suspended

Dependent item

mssql.localdb.issuspended["{#DBNAME}"]

Preprocessing

JSON Path: The text is too long. Please see the template.
Discard unchanged with heartbeat: 1h

MSSQL AG '{#GROUP_NAME}' Local DB '{#DBNAME}': Synchronization health

Reflects the intersection of the synchronization state of a database that is joined to the availability group on the availability replica and the availability mode of the availability replica (synchronous-commit or asynchronous-commit mode):

0 = Not healthy. The synchronizationstate of the database is 0 ("Not synchronizing").

1 = Partially healthy. A database on a synchronous-commit availability replica is considered partially healthy if synchronizationstate is 1 ("Synchronizing").

2 = Healthy. A database on an synchronous-commit availability replica is considered healthy if synchronizationstate is 2 ("Synchronized"), and a database on an asynchronous-commit availability replica is considered healthy if synchronizationstate is 1 ("Synchronizing").

Dependent item

mssql.localdb.synchronizationhealth["{#DBNAME}"]

Preprocessing

JSON Path: The text is too long. Please see the template.
Discard unchanged with heartbeat: 1h

Trigger prototypes for Local database discovery

Name	Description	Expression
MSSQL AG '{#GROUP_NAME}' Local DB '{#DBNAME}': "{#DBNAME}" is {ITEM.VALUE}	The local availability database has a non-working state.	`last(/MSSQL by ODBC/mssql.local_db.state["{#DBNAME}"])>0`\|Warning
MSSQL AG '{#GROUP_NAME}' Local DB '{#DBNAME}': "{#DBNAME}" is Not healthy	The synchronization state of the local availability database is "Not synchronizing".	`last(/MSSQL by ODBC/mssql.local_db.synchronization_health["{#DBNAME}"])=0`\|High
MSSQL AG '{#GROUP_NAME}' Local DB '{#DBNAME}': "{#DBNAME}" is Partially healthy	A database on a synchronous-commit availability replica is considered partially healthy if synchronization state is "Synchronizing".	`last(/MSSQL by ODBC/mssql.local_db.synchronization_health["{#DBNAME}"])=1`\|Average

LLD rule Non-local database discovery

Name Description Type Key and additional info

Non-local database discovery

Discovery of the non-local (not local to SQL Server instance) availability databases.

Dependent item

mssql.non.local.db.discovery

Preprocessing

Discard unchanged with heartbeat: 1d

Item prototypes for Non-local database discovery

Name Description Type Key and additional info

MSSQL AG '{#GROUPNAME}' Non-Local DB '*{#REPLICANAME}*{#DBNAME}': Log queue size

Amount of the log records of the primary database that has not been sent to the secondary databases.

Dependent item

mssql.non-localdb.logsendqueuesize["{#GROUPNAME}*{#REPLICANAME}*{#DBNAME}"]

Preprocessing

JSON Path: The text is too long. Please see the template.
Custom multiplier: 1024
Discard unchanged with heartbeat: 1h

MSSQL AG '{#GROUPNAME}' Non-Local DB '*{#REPLICANAME}*{#DBNAME}': Redo log queue size

Amount of log records in the log files of the secondary replica that has not yet been redone.

Dependent item

mssql.non-localdb.redoqueuesize["{#GROUPNAME}{#REPLICA_NAME}{#DBNAME}"]

Preprocessing

JSON Path: The text is too long. Please see the template.
Custom multiplier: 1024
Discard unchanged with heartbeat: 1h

Trigger prototypes for Non-local database discovery

Name	Description	Expression	Severity	Dependencies and additional info
MSSQL AG '{#GROUPNAME}' Non-Local DB '{#REPLICANAME}{#DBNAME}': Log queue size is growing	The log records of the primary database are not sent to the secondary databases.	`last(/MSSQL by ODBC/mssql.non-local_db.log_send_queue_size["{#GROUP_NAME}{#REPLICA_NAME}{#DBNAME}"],#1)>last(/MSSQL by ODBC/mssql.non-local_db.log_send_queue_size["{#GROUP_NAME}{#REPLICA_NAME}{#DBNAME}"],#2) and last(/MSSQL by ODBC/mssql.non-local_db.log_send_queue_size["{#GROUP_NAME}{#REPLICA_NAME}{#DBNAME}"],#2)>last(/MSSQL by ODBC/mssql.non-local_db.log_send_queue_size["{#GROUP_NAME}{#REPLICA_NAME}{#DBNAME}"],#3)`\|High
MSSQL AG '{#GROUPNAME}' Non-Local DB '{#REPLICANAME}{#DBNAME}': Redo log queue size is growing	The log records in the log files of the secondary replica have not yet been redone.	`last(/MSSQL by ODBC/mssql.non-local_db.redo_queue_size["{#GROUP_NAME}{#REPLICA_NAME}{#DBNAME}"],#1)>last(/MSSQL by ODBC/mssql.non-local_db.redo_queue_size["{#GROUP_NAME}{#REPLICA_NAME}{#DBNAME}"],#2) and last(/MSSQL by ODBC/mssql.non-local_db.redo_queue_size["{#GROUP_NAME}{#REPLICA_NAME}{#DBNAME}"],#2)>last(/MSSQL by ODBC/mssql.non-local_db.redo_queue_size["{#GROUP_NAME}{#REPLICA_NAME}{#DBNAME}"],#3)`\|High

LLD rule Quorum discovery

Name Description Type Key and additional info

Quorum discovery

Discovery of the quorum of the WSFC cluster.

Dependent item

mssql.quorum.discovery

Preprocessing

Discard unchanged with heartbeat: 1d

Item prototypes for Quorum discovery

Name Description Type Key and additional info

MSSQL Cluster '{#CLUSTER_NAME}': Quorum type

Type of quorum used by this WSFC cluster, one of:

0 = Node Majority. This quorum configuration can sustain failures of half the nodes (rounding up) minus one.

1 = Node and Disk Majority. If the disk witness remains on line, this quorum configuration can sustain failures of half the nodes (rounding up).

2 = Node and File Share Majority. This quorum configuration works in a similar way to Node and Disk Majority, but uses a file-share witness instead of a disk witness.

3 = No Majority: Disk Only. If the quorum disk is online, this quorum configuration can sustain failures of all nodes except one.

4 = Unknown Quorum. Unknown quorum for the cluster.

5 = Cloud Witness. Cluster utilizes Microsoft Azure for quorum arbitration. If the cloud witness is available, the cluster can sustain the failure of half the nodes (rounding up).

Dependent item

mssql.quorum.type.[{#CLUSTER_NAME}]

Preprocessing

JSON Path: $[?(@.cluster_name=='{#CLUSTER_NAME}')].quorum_type.first()
Discard unchanged with heartbeat: 1d

MSSQL Cluster '{#CLUSTER_NAME}': Quorum state

State of the WSFC quorum, one of:

0 = Unknown quorum state

1 = Normal quorum

2 = Forced quorum

Dependent item

mssql.quorum.state.[{#CLUSTER_NAME}]

Preprocessing

JSON Path: $[?(@.cluster_name=='{#CLUSTER_NAME}')].quorum_state.first()
Discard unchanged with heartbeat: 1h

LLD rule Quorum members discovery

Name Description Type Key and additional info

Quorum members discovery

Discovery of the quorum members of the WSFC cluster.

Dependent item

mssql.quorum.member.discovery

Preprocessing

Discard unchanged with heartbeat: 1d

Item prototypes for Quorum members discovery

Name Description Type Key and additional info

MSSQL Cluster member '{#MEMBER_NAME}': Number of quorum votes

Number of quorum votes possessed by this quorum member.

Dependent item

mssql.quorummembers.numberofquorumvotes.[{#MEMBER_NAME}]

Preprocessing

JSON Path: The text is too long. Please see the template.
Discard unchanged with heartbeat: 1d

MSSQL Cluster member '{#MEMBER_NAME}': Member type

The type of member, one of:

0 = WSFC node

1 = Disk witness

2 = File share witness

3 = Cloud Witness

Dependent item

mssql.quorummembers.membertype.[{#MEMBER_NAME}]

Preprocessing

JSON Path: $[?(@.member_name=='{#MEMBER_NAME}')].member_type.first()
Discard unchanged with heartbeat: 1d

MSSQL Cluster member '{#MEMBER_NAME}': Member state

The member state, one of:

0 = Offline

1 = Online

Dependent item

mssql.quorummembers.memberstate.[{#MEMBER_NAME}]

Preprocessing

JSON Path: $[?(@.member_name=='{#MEMBER_NAME}')].member_state.first()
Discard unchanged with heartbeat: 1h

LLD rule Replication discovery

Name Description Type Key and additional info

Replication discovery

Discovery of the database replicas.

Dependent item

mssql.replica.discovery

Preprocessing

Discard unchanged with heartbeat: 1d

Item prototypes for Replication discovery

Name	Description	Type	Key and additional info
MSSQL AG '{#GROUPNAME}' Replica '{#REPLICANAME}': Connected state	Whether a secondary replica is currently connected to the primary replica: 0 = Disconnected. The response of an availability replica to the "Disconnected" state depends on its role: On the primary replica, if a secondary replica is disconnected, its secondary databases are marked as "Not synchronized" on the primary replica, which waits for the secondary to reconnect; On a secondary replica, upon detecting that it is disconnected, the secondary replica attempts to reconnect to the primary replica. 1 = Connected. Each primary replica tracks the connection state for every secondary replica in the same availability group. Secondary replicas track the connection state of only the primary replica.	Dependent item	mssql.replica.connectedstate["{#GROUPNAME}{#REPLICANAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `1h`
MSSQL AG '{#GROUPNAME}' Replica '{#REPLICANAME}': Is local	Whether the replica is local: 0 = Indicates a remote secondary replica in an availability group whose primary replica is hosted by the local server instance. This value occurs only on the primary replica location. 1 = Indicates a local replica. On secondary replicas, this is the only available value for the availability group to which the replica belongs.	Dependent item	mssql.replica.islocal["{#GROUPNAME}{#REPLICANAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `1h`
MSSQL AG '{#GROUPNAME}' Replica '{#REPLICANAME}': Join state	0 = Not joined 1 = Joined, standalone instance 2 = Joined, failover cluster instance	Dependent item	mssql.replica.joinstate["{#GROUPNAME}{#REPLICANAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `1h`
MSSQL AG '{#GROUPNAME}' Replica '{#REPLICANAME}': Operational state	Current operational state of the replica: 0 = Pending failover 1 = Pending 2 = Online 3 = Offline 4 = Failed 5 = Failed, no quorum 6 = Not local	Dependent item	mssql.replica.operationalstate["{#GROUPNAME}{#REPLICANAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `1h`
MSSQL AG '{#GROUPNAME}' Replica '{#REPLICANAME}': Recovery health	Rollup of the "databasestate" column of the `sys.dm_hadr_database_replica_states` dynamic management view: 0 = In progress. At least one joined database has a database state other than "Online" (databasestate is not "0"). 1 = Online. All the joined databases have a database state of "Online" (database_state is "0").	Dependent item	mssql.replica.recoveryhealth["{#GROUPNAME}{#REPLICANAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `1h`
MSSQL AG '{#GROUPNAME}' Replica '{#REPLICANAME}': Role	Current Always On availability group role of a local replica or a connected remote replica: 0 = Resolving 1 = Primary 2 = Secondary	Dependent item	mssql.replica.role["{#GROUPNAME}{#REPLICA_NAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `1h`
MSSQL AG '{#GROUPNAME}' Replica '{#REPLICANAME}': Sync health	Reflects a rollup of the database synchronization state (synchronization_state) of all joined availability databases (also known as replicas) and the availability mode of the replica (synchronous-commit or asynchronous-commit mode). The rollup will reflect the least healthy accumulated state of the databases on the replica: 0 = Not healthy. At least one joined database is in the "Not synchronizing" state. 1 = Partially healthy. Some replicas are not in the target synchronization state: synchronous-commit replicas should be synchronized, and asynchronous-commit replicas should be synchronizing. 2 = Healthy. All replicas are in the target synchronization state: synchronous-commit replicas are synchronized, and asynchronous-commit replicas are synchronizing.	Dependent item	mssql.replica.synchronizationhealth["{#GROUPNAME}{#REPLICANAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `1h`

Trigger prototypes for Replication discovery

Name	Description	Expression
MSSQL AG '{#GROUPNAME}' Replica '{#REPLICANAME}': {#REPLICA_NAME} is disconnected	The response of an availability replica to the "Disconnected" state depends on its role: On the primary replica, if a secondary replica is disconnected, its secondary databases are marked as "Not synchronized" on the primary replica, which waits for the secondary to reconnect; On a secondary replica, upon detecting that it is disconnected, the secondary replica attempts to reconnect to the primary replica.	`last(/MSSQL by ODBC/mssql.replica.connected_state["{#GROUP_NAME}_{#REPLICA_NAME}"])=0 and last(/MSSQL by ODBC/mssql.replica.role["{#GROUP_NAME}_{#REPLICA_NAME}"])=2`\|Warning
MSSQL AG '{#GROUPNAME}' Replica '{#REPLICANAME}': {#REPLICA_NAME} is {ITEM.VALUE}	The operational state of the replica in a given availability group is "Pending" or "Offline".	`last(/MSSQL by ODBC/mssql.replica.operational_state["{#GROUP_NAME}_{#REPLICA_NAME}"])=0 or last(/MSSQL by ODBC/mssql.replica.operational_state["{#GROUP_NAME}_{#REPLICA_NAME}"])=1 or last(/MSSQL by ODBC/mssql.replica.operational_state["{#GROUP_NAME}_{#REPLICA_NAME}"])=3`\|Warning
MSSQL AG '{#GROUPNAME}' Replica '{#REPLICANAME}': {#REPLICA_NAME} is {ITEM.VALUE}	The operational state of the replica in a given availability group is "Failed".	`last(/MSSQL by ODBC/mssql.replica.operational_state["{#GROUP_NAME}_{#REPLICA_NAME}"])=4`\|Average
MSSQL AG '{#GROUPNAME}' Replica '{#REPLICANAME}': {#REPLICA_NAME} is {ITEM.VALUE}	The operational state of the replica in a given availability group is "Failed, no quorum".	`last(/MSSQL by ODBC/mssql.replica.operational_state["{#GROUP_NAME}_{#REPLICA_NAME}"])=5`\|High
MSSQL AG '{#GROUPNAME}' Replica '{#REPLICANAME}': {#REPLICA_NAME} Recovery in progress	At least one joined database has a database state other than "Online".	`last(/MSSQL by ODBC/mssql.replica.recovery_health["{#GROUP_NAME}_{#REPLICA_NAME}"])=0`\|Info
MSSQL AG '{#GROUPNAME}' Replica '{#REPLICANAME}': {#REPLICA_NAME} is Not healthy	At least one joined database is in the "Not synchronizing" state.	`last(/MSSQL by ODBC/mssql.replica.synchronization_health["{#GROUP_NAME}_{#REPLICA_NAME}"])=0`\|Average
MSSQL AG '{#GROUPNAME}' Replica '{#REPLICANAME}': {#REPLICA_NAME} is Partially healthy	Some replicas are not in the target synchronization state: synchronous-commit replicas should be synchronized, and asynchronous-commit replicas should be synchronizing.	`last(/MSSQL by ODBC/mssql.replica.synchronization_health["{#GROUP_NAME}_{#REPLICA_NAME}"])=1`\|Warning

LLD rule Mirroring discovery

Name Description Type Key and additional info

Mirroring discovery

To see the row for a database other than master or tempdb, you must either be the database owner or have at least ALTER ANY DATABASE or VIEW ANY DATABASE server-level permission or CREATE DATABASE permission in the master database. To see non-NULL values on a mirror database, you must be a member of the sysadmin fixed server role.

Dependent item

mssql.mirroring.discovery

Preprocessing

Discard unchanged with heartbeat: 1d

Item prototypes for Mirroring discovery

Name	Description	Type	Key and additional info
MSSQL Mirroring '{#DBNAME}': Role	Current role of the local database plays in the database mirroring session. 1 = Principal 2 = Mirror	Dependent item	mssql.mirroring.role["{#DBNAME}"] Preprocessing JSON Path: `$[?(@.dbname=='{#DBNAME}')].mirroring_role.first()` Discard unchanged with heartbeat: `1h`
MSSQL Mirroring '{#DBNAME}': Role sequence	The number of times that mirroring partners have switched the principal and mirror roles due to a failover or forced service.	Dependent item	mssql.mirroring.role_sequence["{#DBNAME}"] Preprocessing JSON Path: `$[?(@.dbname=='{#DBNAME}')].mirroring_role_sequence.first()` Simple change Discard unchanged with heartbeat: `1h`
MSSQL Mirroring '{#DBNAME}': State	State of the mirror database and of the database mirroring session. 0 = Suspended 1 = Disconnected from the other partner 2 = Synchronizing 3 = Pending failover 4 = Synchronized 5 = The partners are not synchronized. Failover is not possible now. 6 = The partners are synchronized. Failover is potentially possible. For information about the requirements for the failover, see Database Mirroring Operating Modes: https://learn.microsoft.com/en-us/sql/database-engine/database-mirroring/database-mirroring-operating-modes?view=sql-server-ver16.	Dependent item	mssql.mirroring.state["{#DBNAME}"] Preprocessing JSON Path: `$[?(@.dbname=='{#DBNAME}')].mirroring_state.first()` Discard unchanged with heartbeat: `1h`
MSSQL Mirroring '{#DBNAME}': Witness state	State of the witness in the database mirroring session of the database: 0 = Unknown 1 = Connected 2 = Disconnected	Dependent item	mssql.mirroring.witness_state["{#DBNAME}"] Preprocessing JSON Path: `$[?(@.dbname=='{#DBNAME}')].mirroring_witness_state.first()` Discard unchanged with heartbeat: `1h`
MSSQL Mirroring '{#DBNAME}': Safety level	Safety setting for updates on the mirror database: 0 = Unknown state 1 = Off [asynchronous] 2 = Full [synchronous]	Dependent item	mssql.mirroring.safety_level["{#DBNAME}"] Preprocessing JSON Path: `$[?(@.dbname=='{#DBNAME}')].mirroring_safety_level.first()` Discard unchanged with heartbeat: `1h`

Trigger prototypes for Mirroring discovery

Name	Description	Expression
MSSQL Mirroring '{#DBNAME}': "{#DBNAME}" is {ITEM.VALUE}	The state of the mirror database and of the database mirroring session is "Suspended", "Disconnected from the other partner", or "Synchronizing".	`last(/MSSQL by ODBC/mssql.mirroring.state["{#DBNAME}"])>=0 and last(/MSSQL by ODBC/mssql.mirroring.state["{#DBNAME}"])<=2`\|Info
MSSQL Mirroring '{#DBNAME}': "{#DBNAME}" is {ITEM.VALUE}	The state of the mirror database and of the database mirroring session is "Pending failover".	`last(/MSSQL by ODBC/mssql.mirroring.state["{#DBNAME}"])=3`\|Warning
MSSQL Mirroring '{#DBNAME}': "{#DBNAME}" is {ITEM.VALUE}	The state of the mirror database and of the database mirroring session is "Not synchronized". The partners are not synchronized. A failover is not possible now.	`last(/MSSQL by ODBC/mssql.mirroring.state["{#DBNAME}"])=5`\|High
MSSQL Mirroring '{#DBNAME}': "{#DBNAME}" Witness is disconnected	The state of the witness in the database mirroring session of the database is "Disconnected".	`last(/MSSQL by ODBC/mssql.mirroring.witness_state["{#DBNAME}"])=2`\|Warning

LLD rule Job discovery

Name Description Type Key and additional info

Job discovery

Scanning jobs in DBMS.

Dependent item

mssql.job.discovery

Preprocessing

Discard unchanged with heartbeat: 1d

Item prototypes for Job discovery

Name	Description	Type	Key and additional info
MSSQL Job '{#JOBNAME}': Get job status	The item gets the status of SQL agent job {#JOBNAME}.	Dependent item	mssql.job.status_raw["{#JOBNAME}"] Preprocessing JSON Path: `$[?(@.job_name=='{#JOBNAME}')].first()` ⛔️Custom on fail: Discard value
MSSQL Job '{#JOBNAME}': Enabled	The possible values of the job status: 0 = Disabled 1 = Enabled	Dependent item	mssql.job.enabled["{#JOBNAME}"] Preprocessing JSON Path: `$.enabled` Discard unchanged with heartbeat: `12h`
MSSQL Job '{#JOBNAME}': Last run date-time	The last date-time of the job run.	Dependent item	mssql.job.lastrundatetime["{#JOBNAME}"] Preprocessing JSON Path: `$.last_run_date_time` JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `12h`
MSSQL Job '{#JOBNAME}': Next run date-time	The next date-time of the job run.	Dependent item	mssql.job.nextrundatetime["{#JOBNAME}"] Preprocessing JSON Path: `$.next_run_date_time` JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `12h`
MSSQL Job '{#JOBNAME}': Last run status message	An informational message about the last run of the job.	Dependent item	mssql.job.lastrunstatusmessage["{#JOBNAME}"] Preprocessing JSON Path: `$.last_run_status_message` Discard unchanged with heartbeat: `12h`
MSSQL Job '{#JOBNAME}': Run status	The possible values of the job status: 0 ⇒ Failed 1 ⇒ Succeeded 2 ⇒ Retry 3 ⇒ Canceled 4 ⇒ Running	Dependent item	mssql.job.runstatus["{#JOBNAME}"] Preprocessing JSON Path: `$.run_status` Discard unchanged with heartbeat: `15m`
MSSQL Job '{#JOBNAME}': Run duration	Duration of the last-run job.	Dependent item	mssql.job.run_duration["{#JOBNAME}"] Preprocessing JSON Path: `$.run_duration` Discard unchanged with heartbeat: `15m`

Trigger prototypes for Job discovery

Name	Description	Expression	Severity	Dependencies and additional info
MSSQL Job '{#JOBNAME}': Failed to run	The last run of the job has failed.	`last(/MSSQL by ODBC/mssql.job.runstatus["{#JOBNAME}"])=0`\|Warning	Manual close: Yes
MSSQL Job '{#JOBNAME}': Job duration is high	The job is taking too long.	`last(/MSSQL by ODBC/mssql.job.run_duration["{#JOBNAME}"])>{$MSSQL.BACKUP_DURATION.WARN:"{#JOBNAME}"}`\|Warning	Manual close: Yes

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template, or ask for help at ZABBIX forums

db_mssql_agent2

View README Download JSON

MSSQL by Zabbix agent 2

Overview

This template is designed for the effortless deployment of MSSQL monitoring by Zabbix via Zabbix agent 2 and doesn't require any external scripts.

Requirements

Zabbix version: 6.0 and higher.

Tested versions

This template has been tested on:

Microsoft SQL, version 2019, 2022

Configuration

Setup

Deploy Zabbix agent 2 with the MSSQL plugin. You can use this template starting with version 6.0.27 of Zabbix and 6.0.39 of the MSSQL plugin. For more information, see MSSQL plugin documentation.

Important! Starting with Zabbix 6.0.39, the MSSQL plugin must be updated to a version equal to or above 6.0.39.

Loadable plugin requires installation of a separate package or binary file or compilation from sources.

Create an MSSQL user for monitoring. For example, "zbx_monitor".

View Server State and View Any Definition permissions should be granted to the user. Grant this user read permissions to the sysjobschedules, sysjobhistory, and sysjobs tables.

For example, using T-SQL commands:

GRANT SELECT ON OBJECT::msdb.dbo.sysjobs TO zbx_monitor;
GRANT SELECT ON OBJECT::msdb.dbo.sysjobservers TO zbx_monitor;
GRANT SELECT ON OBJECT::msdb.dbo.sysjobactivity TO zbx_monitor;
GRANT EXECUTE ON OBJECT::msdb.dbo.agent_datetime TO zbx_monitor;

For more information, see MSSQL documentation:

Create a database user

GRANT Server Permissions

Configure a User to Create and Manage SQL Server Agent Jobs

Set the username and password in the host macros {$MSSQL.USER} and {$MSSQL.PASSWORD}.
Set the connection string for the MSSQL instance in the {$MSSQL.URI} macro as a URI, such as <protocol://host:port>, or specify the named session - <sessionname>.

Note: Since version 6.0.36, you can also connect to the MSSQL instance using its name. To do this, set the connection string in the {$MSSQL.URI} macro as <protocol://host/instance_name>.

Macros used

Name	Description	Default
{$MSSQL.URI}	Connection string.	`<Put your URI here>`
{$MSSQL.USER}	MSSQL username.	`<Put your username here>`
{$MSSQL.PASSWORD}	MSSQL user password.	`<Put your password here>`
{$MSSQL.PORT}	MSSQL TCP port.	`1433`
{$MSSQL.DBNAME.MATCHES}	This macro is used in database discovery. It can be overridden on the host or linked template level.	`.*`
{$MSSQL.DBNAME.NOT_MATCHES}	This macro is used in database discovery. It can be overridden on the host or linked template level.	`master\|tempdb\|model\|msdb`
{$MSSQL.WORK_FILES.MAX}	The maximum number of work files created per second - for the trigger expression.	`20`
{$MSSQL.WORK_TABLES.MAX}	The maximum number of work tables created per second - for the trigger expression.	`20`
{$MSSQL.WORKTABLESFROMCACHE_RATIO.MIN.CRIT}	The minimum percentage of work tables from the cache ratio - for the High trigger expression.	`90`
{$MSSQL.BUFFERCACHERATIO.MIN.CRIT}	The minimum buffer cache hit ratio, in percent - for the High trigger expression.	`30`
{$MSSQL.BUFFERCACHERATIO.MIN.WARN}	The minimum buffer cache hit ratio, in percent - for the Warning trigger expression.	`50`
{$MSSQL.FREELISTSTALLS.MAX}	The maximum free list stalls per second - for the trigger expression.	`2`
{$MSSQL.LAZY_WRITES.MAX}	The maximum lazy writes per second - for the trigger expression.	`20`
{$MSSQL.PAGELIFEEXPECTANCY.MIN}	The minimum page life expectancy - for the trigger expression.	`300`
{$MSSQL.PAGE_READS.MAX}	The maximum page reads per second - for the trigger expression.	`90`
{$MSSQL.PAGE_WRITES.MAX}	The maximum page writes per second - for the trigger expression.	`90`
{$MSSQL.AVERAGEWAITTIME.MAX}	The maximum average wait time, in milliseconds - for the trigger expression.	`500`
{$MSSQL.LOCK_REQUESTS.MAX}	The maximum lock requests per second - for the trigger expression.	`1000`
{$MSSQL.LOCK_TIMEOUTS.MAX}	The maximum lock timeouts per second - for the trigger expression.	`1`
{$MSSQL.DEADLOCKS.MAX}	The maximum deadlocks per second - for the trigger expression.	`1`
{$MSSQL.LOGFLUSHWAITS.MAX}	The maximum log flush waits per second - for the trigger expression.	`1`
{$MSSQL.LOGFLUSHWAIT_TIME.MAX}	The maximum log flush wait time, in milliseconds - for the trigger expression.	`1`
{$MSSQL.PERCENTLOGUSED.MAX}	The maximum percentage of log used - for the trigger expression.	`80`
{$MSSQL.PERCENT_COMPILATIONS.MAX}	The maximum percentage of Transact-SQL compilations - for the trigger expression.	`10`
{$MSSQL.PERCENT_RECOMPILATIONS.MAX}	The maximum percentage of Transact-SQL recompilations - for the trigger expression.	`10`
{$MSSQL.PERCENT_READAHEAD.MAX}	The maximum percentage of pages read per second in anticipation of use - for the trigger expression.	`20`
{$MSSQL.BACKUP_DIFF.CRIT}	The maximum of days without a differential backup - for the High trigger expression.	`6d`
{$MSSQL.BACKUP_DIFF.WARN}	The maximum of days without a differential backup - for the Warning trigger expression.	`3d`
{$MSSQL.BACKUP_FULL.CRIT}	The maximum of days without a full backup - for the High trigger expression.	`10d`
{$MSSQL.BACKUP_FULL.WARN}	The maximum of days without a full backup - for the Warning trigger expression.	`9d`
{$MSSQL.BACKUP_LOG.CRIT}	The maximum of days without a log backup - for the High trigger expression.	`8h`
{$MSSQL.BACKUP_LOG.WARN}	The maximum of days without a log backup - for the Warning trigger expression.	`4h`
{$MSSQL.JOB.MATCHES}	This macro is used in job discovery. It can be overridden on the host or linked template level.	`.*`
{$MSSQL.JOB.NOT_MATCHES}	This macro is used in job discovery. It can be overridden on the host or linked template level.	`CHANGE_IF_NEEDED`
{$MSSQL.BACKUP_DURATION.WARN}	The maximum job duration - for the Warning trigger expression.	`1h`
{$MSSQL.BACKUP_FULL.USED}	The flag for checking the age of a full backup. If set to a value other than "1", the trigger expression for the full backup age will not fire. Can be used with context for database name.	`1`
{$MSSQL.BACKUP_LOG.USED}	The flag for checking the age of a log backup. If set to a value other than "1", the trigger expression for the log backup age will not fire. Can be used with context for database name.	`1`
{$MSSQL.BACKUP_DIFF.USED}	The flag for checking the age of a differential backup. If set to a value other than "1", the trigger expression for the differential backup age will not fire. Can be used with context for database name.	`1`
{$MSSQL.QUORUM.MEMBER.DISCOVERY.NAME.MATCHES}	Filter to include discovered quorum member by name.	`.*`
{$MSSQL.QUORUM.MEMBER.DISCOVERY.NAME.NOT_MATCHES}	Filter to exclude discovered quorum member by name.	`CHANGE_IF_NEEDED`

Items

Name	Description	Type	Key and additional info
MSSQL: Service's TCP port state	Test the availability of MSSQL Server on a TCP port.	Simple check	net.tcp.service[tcp,{HOST.CONN},{$MSSQL.PORT}] Preprocessing Discard unchanged with heartbeat: `10m`
MSSQL: Get last backup	The item gets information about backup processes.	Zabbix agent	mssql.last.backup.get["{$MSSQL.URI}","{$MSSQL.USER}","{$MSSQL.PASSWORD}"]
MSSQL: Get job status	The item gets the SQL agent job status.	Zabbix agent	mssql.job.status.get["{$MSSQL.URI}","{$MSSQL.USER}","{$MSSQL.PASSWORD}"]
MSSQL: Get performance counters	The item gets server global status information.	Zabbix agent	mssql.perfcounter.get["{$MSSQL.URI}","{$MSSQL.USER}","{$MSSQL.PASSWORD}"]
MSSQL: Get availability groups	The item gets availability group states - name, primary and secondary health, synchronization health.	Zabbix agent	mssql.availability.group.get["{$MSSQL.URI}","{$MSSQL.USER}","{$MSSQL.PASSWORD}"]
MSSQL: Get local DB	Getting the states of the local availability database.	Zabbix agent	mssql.local.db.get["{$MSSQL.URI}","{$MSSQL.USER}","{$MSSQL.PASSWORD}"]
MSSQL: Get DB mirroring	Getting DB mirroring.	Zabbix agent	mssql.mirroring.get["{$MSSQL.URI}","{$MSSQL.USER}","{$MSSQL.PASSWORD}"]
MSSQL: Get non-local DB	Getting the non-local availability database.	Zabbix agent	mssql.nonlocal.db.get["{$MSSQL.URI}","{$MSSQL.USER}","{$MSSQL.PASSWORD}"]
MSSQL: Get replica	Getting the database replica.	Zabbix agent	mssql.replica.get["{$MSSQL.URI}","{$MSSQL.USER}","{$MSSQL.PASSWORD}"]
MSSQL: Get quorum	Getting quorum - cluster name, type, and state.	Zabbix agent	mssql.quorum.get["{$MSSQL.URI}","{$MSSQL.USER}","{$MSSQL.PASSWORD}"]
MSSQL: Get quorum member	Getting quorum members - member name, type, state, and number of quorum votes.	Zabbix agent	mssql.quorum.member.get["{$MSSQL.URI}","{$MSSQL.USER}","{$MSSQL.PASSWORD}"]
MSSQL: Get database	Getting databases - database name and recovery model.	Zabbix agent	mssql.db.get["{$MSSQL.URI}","{$MSSQL.USER}","{$MSSQL.PASSWORD}"]
MSSQL: Version	MSSQL Server version.	Zabbix agent	mssql.version["{$MSSQL.URI}","{$MSSQL.USER}","{$MSSQL.PASSWORD}"] Preprocessing Discard unchanged with heartbeat: `1d`
MSSQL: Uptime	MSSQL Server uptime in the format "N days, hh:mm:ss".	Dependent item	mssql.uptime Preprocessing JSON Path: `$[?(@.counter_name=='Uptime')].cntr_value.first()`
MSSQL: Get Access Methods counters	The item gets server information about access methods.	Dependent item	mssql.access_methods.raw Preprocessing JSON Path: `$[?(@.object_name=~'.*Access Methods')]` ⛔️Custom on fail: Discard value
MSSQL: Forwarded records per second	Number of records per second fetched through forwarded record pointers.	Dependent item	mssql.forwardedrecordssec.rate Preprocessing JSON Path: `The text is too long. Please see the template.` Change per second
MSSQL: Full scans per second	Number of unrestricted full scans per second. These can be either base-table or full-index scans. Values greater than 1 or 2 indicate that there are table / index page scans. If that is combined with high CPU, this counter requires further investigation, otherwise, if the full scans are on small tables, it can be ignored.	Dependent item	mssql.fullscanssec.rate Preprocessing JSON Path: `$[?(@.counter_name=='Full Scans/sec')].cntr_value.first()` Change per second
MSSQL: Index searches per second	Number of index searches per second. These are used to start a range scan, reposition a range scan, revalidate a scan point, fetch a single index record, and search down the index to locate where to insert a new row.	Dependent item	mssql.indexsearchessec.rate Preprocessing JSON Path: `The text is too long. Please see the template.` Change per second
MSSQL: Page splits per second	Number of page splits per second that occur as a result of overflowing index pages.	Dependent item	mssql.pagesplitssec.rate Preprocessing JSON Path: `$[?(@.counter_name=='Page Splits/sec')].cntr_value.first()` Change per second
MSSQL: Work files created per second	Number of work files created per second. For example, work files can be used to store temporary results for hash joins and hash aggregates.	Dependent item	mssql.workfilescreatedsec.rate Preprocessing JSON Path: `The text is too long. Please see the template.` Change per second
MSSQL: Work tables created per second	Number of work tables created per second. For example, work tables can be used to store temporary results for query spool, LOB variables, XML variables, and cursors.	Dependent item	mssql.worktablescreatedsec.rate Preprocessing JSON Path: `The text is too long. Please see the template.` Change per second
MSSQL: Table lock escalations per second	Number of times locks on a table were escalated to the TABLE or HoBT granularity.	Dependent item	mssql.tablelockescalations.rate Preprocessing JSON Path: `The text is too long. Please see the template.` Change per second
MSSQL: Worktables from cache ratio	Percentage of work tables created where the initial two pages of the work table were not allocated but were immediately available from the work table cache.	Dependent item	mssql.worktablesfromcache_ratio Preprocessing JSON Path: `The text is too long. Please see the template.`
MSSQL: Get Buffer Manager counters	The item gets server information about the buffer pool.	Dependent item	mssql.buffer_manager.raw Preprocessing JSON Path: `$[?(@.object_name=~'.*Buffer Manager')]` ⛔️Custom on fail: Discard value
MSSQL: Buffer cache hit ratio	Indicates the percentage of pages found in the buffer cache without having to read from the disk. The ratio is the total number of cache hits divided by the total number of cache lookups over the last few thousand page accesses. After a long period of time, the ratio changes very little. Since reading from the cache is much less expensive than reading from the disk, a higher value is preferred for this item. To increase the buffer cache hit ratio, consider increasing the amount of memory available to MSSQL Server or using the buffer pool extension feature.	Dependent item	mssql.buffercachehit_ratio Preprocessing JSON Path: `The text is too long. Please see the template.`
MSSQL: Checkpoint pages per second	Indicates the number of pages flushed to the disk per second by a checkpoint or other operation which required all dirty pages to be flushed.	Dependent item	mssql.checkpointpagessec.rate Preprocessing JSON Path: `The text is too long. Please see the template.` Change per second
MSSQL: Database pages	Indicates the number of pages in the buffer pool with database content.	Dependent item	mssql.database_pages Preprocessing JSON Path: `$[?(@.counter_name=='Database pages')].cntr_value.first()`
MSSQL: Free list stalls per second	Indicates the number of requests per second that had to wait for a free page.	Dependent item	mssql.freeliststalls_sec.rate Preprocessing JSON Path: `The text is too long. Please see the template.` Change per second
MSSQL: Lazy writes per second	Indicates the number of buffers written per second by the buffer manager's lazy writer. The lazy writer is a system process that flushes out batches of dirty, aged buffers (buffers that contain changes that must be written back to the disk before the buffer can be reused for a different page) and makes them available to user processes. The lazy writer eliminates the need to perform frequent checkpoints in order to create available buffers.	Dependent item	mssql.lazywritessec.rate Preprocessing JSON Path: `$[?(@.counter_name=='Lazy writes/sec')].cntr_value.first()` Change per second
MSSQL: Page life expectancy	Indicates the number of seconds a page will stay in the buffer pool without references.	Dependent item	mssql.pagelifeexpectancy Preprocessing JSON Path: `The text is too long. Please see the template.`
MSSQL: Page lookups per second	Indicates the number of requests per second to find a page in the buffer pool.	Dependent item	mssql.pagelookupssec.rate Preprocessing JSON Path: `$[?(@.counter_name=='Page lookups/sec')].cntr_value.first()` Change per second
MSSQL: Page reads per second	Indicates the number of physical database page reads that are issued per second. This statistic displays the total number of physical page reads across all databases. As physical I/O is expensive, you may be able to minimize the cost either by using a larger data cache, intelligent indexes, and more efficient queries, or by changing the database design.	Dependent item	mssql.pagereadssec.rate Preprocessing JSON Path: `$[?(@.counter_name=='Page reads/sec')].cntr_value.first()` Change per second
MSSQL: Page writes per second	Indicates the number of physical database page writes that are issued per second.	Dependent item	mssql.pagewritessec.rate Preprocessing JSON Path: `$[?(@.counter_name=='Page writes/sec')].cntr_value.first()` Change per second
MSSQL: Read-ahead pages per second	Indicates the number of pages read per second in anticipation of use.	Dependent item	mssql.readaheadpagessec.rate Preprocessing JSON Path: `The text is too long. Please see the template.` Change per second
MSSQL: Target pages	The optimal number of pages in the buffer pool.	Dependent item	mssql.target_pages Preprocessing JSON Path: `$[?(@.counter_name=='Target pages')].cntr_value.first()` Discard unchanged with heartbeat: `1h`
MSSQL: Get DB counters	The item gets summary information about databases.	Dependent item	mssql.db_info.raw Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
MSSQL: Total data file size	Total size of all data files.	Dependent item	mssql.datafilessize Preprocessing JSON Path: `The text is too long. Please see the template.` Custom multiplier: `1024` Discard unchanged with heartbeat: `1h`
MSSQL: Total log file size	Total size of all the transaction log files.	Dependent item	mssql.logfilessize Preprocessing JSON Path: `The text is too long. Please see the template.` Custom multiplier: `1024` Discard unchanged with heartbeat: `1h`
MSSQL: Total log file used size	The cumulative size of all the log files in the database.	Dependent item	mssql.logfilesused_size Preprocessing JSON Path: `The text is too long. Please see the template.` Custom multiplier: `1024`
MSSQL: Total transactions per second	Total number of transactions started for all databases per second.	Dependent item	mssql.transactions_sec.rate Preprocessing JSON Path: `$[?(@.counter_name=='Transactions/sec')].cntr_value.first()` Change per second
MSSQL: Get General Statistics counters	The item gets general statistics information.	Dependent item	mssql.general_statistics.raw Preprocessing JSON Path: `$[?(@.object_name=~'.*General Statistics')]` ⛔️Custom on fail: Discard value
MSSQL: Logins per second	Total number of logins started per second. This does not include pooled connections. Any value over 2 may indicate insufficient connection pooling.	Dependent item	mssql.logins_sec.rate Preprocessing JSON Path: `$[?(@.counter_name=='Logins/sec')].cntr_value.first()` Change per second
MSSQL: Logouts per second	Total number of logout operations started per second. Any value over 2 may indicate insufficient connection pooling.	Dependent item	mssql.logouts_sec.rate Preprocessing JSON Path: `$[?(@.counter_name=='Logouts/sec')].cntr_value.first()` Change per second
MSSQL: Number of blocked processes	Number of currently blocked processes.	Dependent item	mssql.processes_blocked Preprocessing JSON Path: `$[?(@.counter_name=='Processes blocked')].cntr_value.first()` Discard unchanged with heartbeat: `1h`
MSSQL: Number of users connected	Number of users connected to MSSQL Server.	Dependent item	mssql.user_connections Preprocessing JSON Path: `$[?(@.counter_name=='User Connections')].cntr_value.first()`
MSSQL: Average latch wait time	Average latch wait time (in milliseconds) for latch requests that had to wait.	Calculated	mssql.averagelatchwait_time
MSSQL: Get Latches counters	The item gets server information about latches.	Dependent item	mssql.latches_info.raw Preprocessing JSON Path: `$[?(@.object_name=~'.*Latches')]` ⛔️Custom on fail: Discard value
MSSQL: Average latch wait time raw	Average latch wait time (in milliseconds) for latch requests that had to wait.	Dependent item	mssql.averagelatchwaittimeraw Preprocessing JSON Path: `The text is too long. Please see the template.`
MSSQL: Average latch wait time base	For internal use only.	Dependent item	mssql.averagelatchwaittimebase Preprocessing JSON Path: `The text is too long. Please see the template.`
MSSQL: Latch waits per second	The number of latch requests that could not be granted immediately. Latches are lightweight means of holding a very transient server resource, such as an address in memory.	Dependent item	mssql.latchwaitssec.rate Preprocessing JSON Path: `$[?(@.counter_name=='Latch Waits/sec')].cntr_value.first()` Change per second
MSSQL: Total latch wait time	Total latch wait time (in milliseconds) for latch requests in the last second. This value should stay stable compared to the number of latch waits per second.	Dependent item	mssql.totallatchwait_time Preprocessing JSON Path: `The text is too long. Please see the template.` Change per second
MSSQL: Total average wait time	The average wait time, in milliseconds, for each lock request that had to wait.	Calculated	mssql.averagewaittime
MSSQL: Get Locks counters	The item gets server information about locks.	Dependent item	mssql.locks_info.raw Preprocessing JSON Path: `$[?(@.object_name=~'.*Locks' && @.instance_name=='_Total')]` ⛔️Custom on fail: Discard value
MSSQL: Total average wait time raw	Average amount of wait time (in milliseconds) for each lock request that resulted in a wait. Information for all locks.	Dependent item	mssql.averagewaittime_raw Preprocessing JSON Path: `The text is too long. Please see the template.`
MSSQL: Total average wait time base	For internal use only.	Dependent item	mssql.averagewaittime_base Preprocessing JSON Path: `The text is too long. Please see the template.`
MSSQL: Total lock requests per second	Number of new locks and lock conversions per second requested from the lock manager.	Dependent item	mssql.lockrequestssec.rate Preprocessing JSON Path: `$[?(@.counter_name=='Lock Requests/sec')].cntr_value.first()` Change per second
MSSQL: Total lock requests per second that timed out	Number of timed out lock requests per second, including requests for NOWAIT locks.	Dependent item	mssql.locktimeoutssec.rate Preprocessing JSON Path: `$[?(@.counter_name=='Lock Timeouts/sec')].cntr_value.first()` Change per second
MSSQL: Total lock requests per second that required waiting	Number of lock requests per second that required the caller to wait.	Dependent item	mssql.lockwaitssec.rate Preprocessing JSON Path: `$[?(@.counter_name=='Lock Waits/sec')].cntr_value.first()` Change per second
MSSQL: Lock wait time	Average of total wait time (in milliseconds) for locks in the last second.	Dependent item	mssql.lockwaittime Preprocessing JSON Path: `The text is too long. Please see the template.` Change per second
MSSQL: Total lock requests per second that have deadlocks	Number of lock requests per second that resulted in a deadlock.	Dependent item	mssql.numberdeadlockssec.rate Preprocessing JSON Path: `The text is too long. Please see the template.` Change per second
MSSQL: Get Memory counters	The item gets memory information.	Dependent item	mssql.mem_manager.raw Preprocessing JSON Path: `$[?(@.object_name=~'.*Memory Manager')]` ⛔️Custom on fail: Discard value
MSSQL: Granted Workspace Memory	Specifies the total amount of memory currently granted to executing processes, such as hash, sort, bulk copy, and index creation operations.	Dependent item	mssql.grantedworkspacememory Preprocessing JSON Path: `The text is too long. Please see the template.` Custom multiplier: `1024`
MSSQL: Maximum workspace memory	Indicates the maximum amount of memory available for executing processes, such as hash, sort, bulk copy, and index creation operations.	Dependent item	mssql.maximumworkspacememory Preprocessing JSON Path: `The text is too long. Please see the template.` Custom multiplier: `1024`
MSSQL: Memory grants outstanding	Specifies the total number of processes that have successfully acquired a workspace memory grant.	Dependent item	mssql.memorygrantsoutstanding Preprocessing JSON Path: `The text is too long. Please see the template.`
MSSQL: Memory grants pending	Specifies the total number of processes waiting for a workspace memory grant.	Dependent item	mssql.memorygrantspending Preprocessing JSON Path: `The text is too long. Please see the template.`
MSSQL: Target server memory	Indicates the ideal amount of memory the server can consume.	Dependent item	mssql.targetservermemory Preprocessing JSON Path: `The text is too long. Please see the template.` Custom multiplier: `1024`
MSSQL: Total server memory	Specifies the amount of memory the server has committed using the memory manager.	Dependent item	mssql.totalservermemory Preprocessing JSON Path: `The text is too long. Please see the template.` Custom multiplier: `1024`
MSSQL: Get Cache counters	The item gets server information about cache.	Dependent item	mssql.cache_info.raw Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
MSSQL: Cache hit ratio	Ratio between cache hits and lookups.	Dependent item	mssql.cachehitratio Preprocessing JSON Path: `$[?(@.counter_name=='CacheHitRatio')].cntr_value.first()`
MSSQL: Cache object counts	Number of cache objects in the cache.	Dependent item	mssql.cacheobjectcounts Preprocessing JSON Path: `The text is too long. Please see the template.`
MSSQL: Cache objects in use	Number of cache objects in use.	Dependent item	mssql.cacheobjectsin_use Preprocessing JSON Path: `The text is too long. Please see the template.`
MSSQL: Cache pages	Number of 8-kilobyte (KB) pages used by cache objects.	Dependent item	mssql.cache_pages Preprocessing JSON Path: `$[?(@.counter_name=='Cache Pages')].cntr_value.first()`
MSSQL: Get SQL Errors counters	The item gets SQL error information.	Dependent item	mssql.sql_errors.raw Preprocessing JSON Path: `$[?(@.object_name=~'.*SQL Errors')]` ⛔️Custom on fail: Discard value
MSSQL: Errors per second (DB offline errors)	Number of errors per second.	Dependent item	mssql.offlineerrorssec.rate Preprocessing JSON Path: `The text is too long. Please see the template.` Change per second
MSSQL: Errors per second (Info errors)	Number of errors per second.	Dependent item	mssql.infoerrorssec.rate Preprocessing JSON Path: `The text is too long. Please see the template.` Change per second
MSSQL: Errors per second (Kill connection errors)	Number of errors per second.	Dependent item	mssql.killconnectionerrors_sec.rate Preprocessing JSON Path: `The text is too long. Please see the template.` Change per second
MSSQL: Errors per second (User errors)	Number of errors per second.	Dependent item	mssql.usererrorssec.rate Preprocessing JSON Path: `The text is too long. Please see the template.` Change per second
MSSQL: Total errors per second	Number of errors per second.	Dependent item	mssql.errors_sec.rate Preprocessing JSON Path: `The text is too long. Please see the template.` Change per second
MSSQL: Get SQL Statistics counters	The item gets SQL statistics information.	Dependent item	mssql.sql_statistics.raw Preprocessing JSON Path: `$[?(@.object_name=~'.*SQL Statistics')]` ⛔️Custom on fail: Discard value
MSSQL: Auto-param attempts per second	Number of auto-parameterization attempts per second. The total should be the sum of the failed, safe, and unsafe auto-parameterizations. Auto-parameterization occurs when an instance of SQL Server tries to parameterize a Transact-SQL request by replacing some literals with parameters so that reuse of the resulting cached execution plan across multiple similar-looking requests is possible. Note that auto-parameterizations are also known as simple parameterizations in the newer versions of SQL Server. This counter does not include forced parameterizations.	Dependent item	mssql.autoparamattemptssec.rate Preprocessing JSON Path: `The text is too long. Please see the template.` Change per second
MSSQL: Batch requests per second	Number of Transact-SQL command batches received per second. This statistic is affected by all constraints (such as I/O, number of users, cache size, complexity of requests, and so on). High batch requests mean good throughput.	Dependent item	mssql.batchrequestssec.rate Preprocessing JSON Path: `The text is too long. Please see the template.` Change per second
MSSQL: Percent of ad hoc queries running	The ratio of SQL compilations per second to batch requests per second, in percent.	Calculated	mssql.percentofadhoc_queries
MSSQL: Percent of Recompiled Transact-SQL Objects	The ratio of SQL re-compilations per second to SQL compilations per second, in percent.	Calculated	mssql.percentrecompilationsto_compilations
MSSQL: Full scans to Index searches ratio	The ratio of full scans per second to index searches per second. The threshold recommendation is strictly for OLTP workloads.	Calculated	mssql.scantosearch
MSSQL: Failed auto-params per second	Number of failed auto-parameterization attempts per second. This number should be small. Note that auto-parameterizations are also known as simple parameterizations in the newer versions of SQL Server.	Dependent item	mssql.failedautoparamssec.rate Preprocessing JSON Path: `The text is too long. Please see the template.` Change per second
MSSQL: Safe auto-params per second	Number of safe auto-parameterization attempts per second. Safe refers to a determination that a cached execution plan can be shared between different similar-looking Transact-SQL statements. SQL Server makes many auto-parameterization attempts, some of which turn out to be safe and others fail. Note that auto-parameterizations are also known as simple parameterizations in the newer versions of SQL Server. This does not include forced parameterizations.	Dependent item	mssql.safeautoparamssec.rate Preprocessing JSON Path: `The text is too long. Please see the template.` Change per second
MSSQL: SQL compilations per second	Number of SQL compilations per second. Indicates the number of times the compile code path is entered. Includes runs caused by statement-level recompilations in SQL Server. After SQL Server user activity is stable, this value reaches a steady state.	Dependent item	mssql.sqlcompilationssec.rate Preprocessing JSON Path: `The text is too long. Please see the template.` Change per second
MSSQL: SQL re-compilations per second	Number of statement recompiles per second. Counts the number of times statement recompiles are triggered. Generally, you want the recompiles to be low.	Dependent item	mssql.sqlrecompilationssec.rate Preprocessing JSON Path: `The text is too long. Please see the template.` Change per second
MSSQL: Unsafe auto-params per second	Number of unsafe auto-parameterization attempts per second. For example, the query has some characteristics that prevent the cached plan from being shared. These are designated as unsafe. This does not count the number of forced parameterizations.	Dependent item	mssql.unsafeautoparamssec.rate Preprocessing JSON Path: `The text is too long. Please see the template.` Change per second
MSSQL: Total transactions number	The number of currently active transactions of all types.	Dependent item	mssql.transactions Preprocessing JSON Path: `The text is too long. Please see the template.`

Triggers

Name	Description	Expression	Severity
MSSQL: Service is unavailable	The TCP port of the MSSQL Server service is currently unavailable.	`last(/MSSQL by Zabbix agent 2/net.tcp.service[tcp,{HOST.CONN},{$MSSQL.PORT}])=0`\|Disaster
MSSQL: Version has changed	MSSQL version has changed. Acknowledge to close the problem manually.	`last(/MSSQL by Zabbix agent 2/mssql.version["{$MSSQL.URI}","{$MSSQL.USER}","{$MSSQL.PASSWORD}"],#1)<>last(/MSSQL by Zabbix agent 2/mssql.version["{$MSSQL.URI}","{$MSSQL.USER}","{$MSSQL.PASSWORD}"],#2) and length(last(/MSSQL by Zabbix agent 2/mssql.version["{$MSSQL.URI}","{$MSSQL.USER}","{$MSSQL.PASSWORD}"]))>0`\|Info	Manual close: Yes
MSSQL: Service has been restarted	Uptime is less than 10 minutes.	`last(/MSSQL by Zabbix agent 2/mssql.uptime)<10m`\|Info	Manual close: Yes
MSSQL: Failed to fetch info data	Zabbix has not received any data for items for the last 30 minutes.	`nodata(/MSSQL by Zabbix agent 2/mssql.uptime,30m)=1`\|Info	Depends on: MSSQL: Service is unavailable
MSSQL: Too frequently using pointers	Rows with VARCHAR columns can experience expansion when VARCHAR values are updated with a longer string. In the case where the row cannot fit in the existing page, the row migrates, and access to the row will traverse a pointer. This only happens on heaps (tables without clustered indexes). In cases where clustered indexes cannot be used, drop non-clustered indexes, build a clustered index to reorg pages and rows, drop the clustered index, then recreate non-clustered indexes.	`last(/MSSQL by Zabbix agent 2/mssql.forwarded_records_sec.rate) * 100 > 10 * last(/MSSQL by Zabbix agent 2/mssql.batch_requests_sec.rate)`\|Warning
MSSQL: Number of work files created per second is high	Too many work files created per second to store temporary results for hash joins and hash aggregates.	`min(/MSSQL by Zabbix agent 2/mssql.workfiles_created_sec.rate,5m)>{$MSSQL.WORK_FILES.MAX}`\|Average
MSSQL: Number of work tables created per second is high	Too many work tables created per second to store temporary results for query spool, LOB variables, XML variables, and cursors.	`min(/MSSQL by Zabbix agent 2/mssql.worktables_created_sec.rate,5m)>{$MSSQL.WORK_TABLES.MAX}`\|Average
MSSQL: Percentage of work tables available from the work table cache is low	A value less than 90% may indicate insufficient memory, since execution plans are being dropped, or, on 32-bit systems, may indicate the need for an upgrade to a 64-bit system.	`max(/MSSQL by Zabbix agent 2/mssql.worktables_from_cache_ratio,5m)<{$MSSQL.WORKTABLES_FROM_CACHE_RATIO.MIN.CRIT}`\|High
MSSQL: Percentage of the buffer cache efficiency is low	Too low buffer cache hit ratio.	`max(/MSSQL by Zabbix agent 2/mssql.buffer_cache_hit_ratio,5m)<{$MSSQL.BUFFER_CACHE_RATIO.MIN.CRIT}`\|High
MSSQL: Percentage of the buffer cache efficiency is low	Low buffer cache hit ratio.	`max(/MSSQL by Zabbix agent 2/mssql.buffer_cache_hit_ratio,5m)<{$MSSQL.BUFFER_CACHE_RATIO.MIN.WARN}`\|Warning	Depends on: MSSQL: Percentage of the buffer cache efficiency is low
MSSQL: Number of rps waiting for a free page is high	Some requests have to wait for a free page.	`min(/MSSQL by Zabbix agent 2/mssql.free_list_stalls_sec.rate,5m)>{$MSSQL.FREE_LIST_STALLS.MAX}`\|Warning
MSSQL: Number of buffers written per second by the lazy writer is high	The number of buffers written per second by the buffer manager's lazy writer exceeds the threshold.	`min(/MSSQL by Zabbix agent 2/mssql.lazy_writes_sec.rate,5m)>{$MSSQL.LAZY_WRITES.MAX}`\|Warning
MSSQL: Page life expectancy is low	The page stays in the buffer pool without references for less time than the threshold value.	`max(/MSSQL by Zabbix agent 2/mssql.page_life_expectancy,15m)<{$MSSQL.PAGE_LIFE_EXPECTANCY.MIN}`\|High
MSSQL: Number of physical database page reads per second is high	The physical database page reads are issued too frequently.	`min(/MSSQL by Zabbix agent 2/mssql.page_reads_sec.rate,5m)>{$MSSQL.PAGE_READS.MAX}`\|Warning
MSSQL: Number of physical database page writes per second is high	The physical database page writes are issued too frequently.	`min(/MSSQL by Zabbix agent 2/mssql.page_writes_sec.rate,5m)>{$MSSQL.PAGE_WRITES.MAX}`\|Warning
MSSQL: Too many physical reads occurring	If this value makes up even a sizeable minority of the total "Page Reads/sec" (say, greater than 20% of the total page reads), you may have too many physical reads occurring.	`last(/MSSQL by Zabbix agent 2/mssql.readahead_pages_sec.rate) > {$MSSQL.PERCENT_READAHEAD.MAX} / 100 * last(/MSSQL by Zabbix agent 2/mssql.page_reads_sec.rate)`\|Warning
MSSQL: Total average wait time for locks is high	An average wait time longer than 500 ms may indicate excessive blocking. This value should generally correlate to "Lock Waits/sec" and move up or down with it accordingly.	`min(/MSSQL by Zabbix agent 2/mssql.average_wait_time,5m)>{$MSSQL.AVERAGE_WAIT_TIME.MAX}`\|Warning
MSSQL: Total number of locks per second is high	Number of new locks and lock conversions per second requested from the lock manager is high.	`min(/MSSQL by Zabbix agent 2/mssql.lock_requests_sec.rate,5m)>{$MSSQL.LOCK_REQUESTS.MAX}`\|Warning
MSSQL: Total lock requests per second that timed out is high	The total number of timed out lock requests per second, including requests for NOWAIT locks, is high.	`min(/MSSQL by Zabbix agent 2/mssql.lock_timeouts_sec.rate,5m)>{$MSSQL.LOCK_TIMEOUTS.MAX}`\|Warning
MSSQL: Some blocking is occurring for 5m	Values greater than zero indicate at least some blocking is occurring, while a value of zero can quickly eliminate blocking as a potential root-cause problem.	`min(/MSSQL by Zabbix agent 2/mssql.lock_waits_sec.rate,5m)>0`\|Average
MSSQL: Number of deadlocks is high	Too many deadlocks are occurring currently.	`min(/MSSQL by Zabbix agent 2/mssql.number_deadlocks_sec.rate,5m)>{$MSSQL.DEADLOCKS.MAX}`\|Average
MSSQL: Percent of ad hoc queries running is high	The lower this value is, the better. High values often indicate excessive ad hoc querying and should be as low as possible. If excessive ad hoc querying is happening, try rewriting the queries as procedures or invoke the queries using `sp_executeSQL`. When rewriting isn't possible, consider using a plan guide or setting the database to parameterization forced mode.	`min(/MSSQL by Zabbix agent 2/mssql.percent_of_adhoc_queries,15m) > {$MSSQL.PERCENT_COMPILATIONS.MAX}`\|Warning
MSSQL: Percent of times statement recompiles is high	This number should be at or near zero, since recompiles can cause deadlocks and exclusive compile locks. This counter's value should follow in proportion to "Batch Requests/sec" and "SQL Compilations/sec".	`min(/MSSQL by Zabbix agent 2/mssql.percent_recompilations_to_compilations,15m) > {$MSSQL.PERCENT_RECOMPILATIONS.MAX}`\|Warning
MSSQL: Number of index and table scans exceeds index searches in the last 15m	Index searches are preferable to index and table scans. For OLTP applications, optimize for more index searches and less scans (preferably, 1 full scan for every 1000 index searches). Index and table scans are expensive I/O operations.	`min(/MSSQL by Zabbix agent 2/mssql.scan_to_search,15m) > 0.001`\|Warning

LLD rule Database discovery

Name Description Type Key and additional info

Database discovery

Scanning databases in DBMS.

Dependent item

mssql.database.discovery

Preprocessing

Discard unchanged with heartbeat: 1d

Item prototypes for Database discovery

Name	Description	Type	Key and additional info
MSSQL DB '{#DBNAME}': Get performance counters	The item gets server status information for {#DBNAME}.	Dependent item	mssql.db.perf_raw["{#DBNAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
MSSQL DB '{#DBNAME}': Get last backup	The item gets information about backup processes for {#DBNAME}.	Dependent item	mssql.backup.raw["{#DBNAME}"] Preprocessing JSON Path: `$[?(@.dbname=='{#DBNAME}')]` ⛔️Custom on fail: Discard value
MSSQL DB '{#DBNAME}': State	0 = Online 1 = Restoring 2 = Recovering \| SQL Server 2008 and later 3 = Recoverypending \| SQL Server 2008 and later 4 = Suspect 5 = Emergency \| SQL Server 2008 and later 6 = Offline \| SQL Server 2008 and later 7 = Copying \| Azure SQL Database Active Geo-Replication 10 = Offlinesecondary \| Azure SQL Database Active Geo-Replication	Dependent item	mssql.db.state["{#DBNAME}"] Preprocessing JSON Path: `$[?(@.counter_name=='State')].cntr_value.first()` Discard unchanged with heartbeat: `15m`
MSSQL DB '{#DBNAME}': Active transactions	Number of active transactions for the database.	Dependent item	mssql.db.active_transactions["{#DBNAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.`
MSSQL DB '{#DBNAME}': Data file size	Cumulative size of all the data files in the database including any automatic growth. Monitoring this counter is useful, for example, for determining the correct size of `tempdb`.	Dependent item	mssql.db.datafilessize["{#DBNAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` Custom multiplier: `1024`
MSSQL DB '{#DBNAME}': Log bytes flushed per second	Total number of log bytes flushed per second. Useful for determining trends and utilization of the transaction log.	Dependent item	mssql.db.logbytesflushed_sec.rate["{#DBNAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` Change per second
MSSQL DB '{#DBNAME}': Log file size	Cumulative size of all the transaction log files in the database.	Dependent item	mssql.db.logfilessize["{#DBNAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` Custom multiplier: `1024`
MSSQL DB '{#DBNAME}': Log file used size	Cumulative size of all the log files in the database.	Dependent item	mssql.db.logfilesused_size["{#DBNAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` Custom multiplier: `1024`
MSSQL DB '{#DBNAME}': Log flushes per second	Number of log flushes per second.	Dependent item	mssql.db.logflushessec.rate["{#DBNAME}"] Preprocessing JSON Path: `$[?(@.counter_name=='Log Flushes/sec')].cntr_value.first()` Change per second
MSSQL DB '{#DBNAME}': Log flush waits per second	Number of commits per second waiting for the log flush.	Dependent item	mssql.db.logflushwaits_sec.rate["{#DBNAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` Change per second
MSSQL DB '{#DBNAME}': Log flush wait time	Total wait time (in milliseconds) to flush the log. On an Always On secondary database, this value indicates the wait time for log records to be hardened to disk.	Dependent item	mssql.db.logflushwait_time["{#DBNAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` Change per second
MSSQL DB '{#DBNAME}': Log growths	Total number of times the transaction log for the database has been expanded.	Dependent item	mssql.db.log_growths["{#DBNAME}"] Preprocessing JSON Path: `$[?(@.counter_name=='Log Growths')].cntr_value.first()`
MSSQL DB '{#DBNAME}': Log shrinks	Total number of times the transaction log for the database has been shrunk.	Dependent item	mssql.db.log_shrinks["{#DBNAME}"] Preprocessing JSON Path: `$[?(@.counter_name=='Log Shrinks')].cntr_value.first()`
MSSQL DB '{#DBNAME}': Log truncations	Number of times the transaction log has been shrunk.	Dependent item	mssql.db.log_truncations["{#DBNAME}"] Preprocessing JSON Path: `$[?(@.counter_name=='Log Truncations')].cntr_value.first()`
MSSQL DB '{#DBNAME}': Percent log used	Percentage of log space in use.	Dependent item	mssql.db.percentlogused["{#DBNAME}"] Preprocessing JSON Path: `$[?(@.counter_name=='Percent Log Used')].cntr_value.first()`
MSSQL DB '{#DBNAME}': Transactions per second	Number of transactions started for the database per second.	Dependent item	mssql.db.transactions_sec.rate["{#DBNAME}"] Preprocessing JSON Path: `$[?(@.counter_name=='Transactions/sec')].cntr_value.first()` Change per second
MSSQL DB '{#DBNAME}': Last diff backup duration	Duration of the last differential backup.	Dependent item	mssql.backup.diff.duration["{#DBNAME}"] Preprocessing JSON Path: `$[?(@.type=='I')].duration.first()` ⛔️Custom on fail: Set value to: `0` Discard unchanged with heartbeat: `12h`
MSSQL DB '{#DBNAME}': Last diff backup (time ago)	The amount of time since the last differential backup.	Dependent item	mssql.backup.diff["{#DBNAME}"] Preprocessing JSON Path: `$[?(@.type=='I')].time_since_last_backup.first()` ⛔️Custom on fail: Set value to: `0`
MSSQL DB '{#DBNAME}': Last full backup duration	Duration of the last full backup.	Dependent item	mssql.backup.full.duration["{#DBNAME}"] Preprocessing JSON Path: `$[?(@.type=='D')].duration.first()` ⛔️Custom on fail: Set value to: `0` Discard unchanged with heartbeat: `12h`
MSSQL DB '{#DBNAME}': Last full backup (time ago)	The amount of time since the last full backup.	Dependent item	mssql.backup.full["{#DBNAME}"] Preprocessing JSON Path: `$[?(@.type=='D')].time_since_last_backup.first()` ⛔️Custom on fail: Set value to: `0`
MSSQL DB '{#DBNAME}': Last log backup duration	Duration of the last log backup.	Dependent item	mssql.backup.log.duration["{#DBNAME}"] Preprocessing JSON Path: `$[?(@.type=='L')].duration.first()` ⛔️Custom on fail: Set value to: `0` Discard unchanged with heartbeat: `12h`
MSSQL DB '{#DBNAME}': Last log backup (time ago)	The amount of time since the last log backup.	Dependent item	mssql.backup.log["{#DBNAME}"] Preprocessing JSON Path: `$[?(@.type=='L')].time_since_last_backup.first()` ⛔️Custom on fail: Set value to: `0`
MSSQL DB '{#DBNAME}': Recovery model	Recovery model selected: 1 = Full 2 = Bulk_logged 3 = Simple	Dependent item	mssql.backup.recovery_model["{#DBNAME}"] Preprocessing JSON Path: `$[0].db_recovery_model` ⛔️Custom on fail: Set value to: `1` Discard unchanged with heartbeat: `1d`

Trigger prototypes for Database discovery

Name	Description	Expression	Severity
MSSQL DB '{#DBNAME}': State is {ITEM.VALUE}	The DB has a non-working state.	`last(/MSSQL by Zabbix agent 2/mssql.db.state["{#DBNAME}"])>1`\|High
MSSQL DB '{#DBNAME}': Number of commits waiting for the log flush is high	Too many commits are waiting for the log flush.	`min(/MSSQL by Zabbix agent 2/mssql.db.log_flush_waits_sec.rate["{#DBNAME}"],5m)>{$MSSQL.LOG_FLUSH_WAITS.MAX:"{#DBNAME}"}`\|Warning
MSSQL DB '{#DBNAME}': Total wait time to flush the log is high	The wait time to flush the log is too long.	`min(/MSSQL by Zabbix agent 2/mssql.db.log_flush_wait_time["{#DBNAME}"],5m)>{$MSSQL.LOG_FLUSH_WAIT_TIME.MAX:"{#DBNAME}"}`\|Warning
MSSQL DB '{#DBNAME}': Percent of log usage is high	There's not enough space left in the log.	`min(/MSSQL by Zabbix agent 2/mssql.db.percent_log_used["{#DBNAME}"],5m)>{$MSSQL.PERCENT_LOG_USED.MAX:"{#DBNAME}"}`\|Warning
MSSQL DB '{#DBNAME}': Diff backup is old	The differential backup has not been executed for a long time.	`last(/MSSQL by Zabbix agent 2/mssql.backup.diff["{#DBNAME}"])>{$MSSQL.BACKUP_DIFF.CRIT:"{#DBNAME}"} and {$MSSQL.BACKUP_DIFF.USED:"{#DBNAME}"}=1`\|High	Manual close: Yes
MSSQL DB '{#DBNAME}': Diff backup is old	The differential backup has not been executed for a long time.	`last(/MSSQL by Zabbix agent 2/mssql.backup.diff["{#DBNAME}"])>{$MSSQL.BACKUP_DIFF.WARN:"{#DBNAME}"} and {$MSSQL.BACKUP_DIFF.USED:"{#DBNAME}"}=1`\|Warning	Manual close: Yes Depends on: MSSQL DB '{#DBNAME}': Diff backup is old
MSSQL DB '{#DBNAME}': Full backup is old	The full backup has not been executed for a long time.	`last(/MSSQL by Zabbix agent 2/mssql.backup.full["{#DBNAME}"])>{$MSSQL.BACKUP_FULL.CRIT:"{#DBNAME}"} and {$MSSQL.BACKUP_FULL.USED:"{#DBNAME}"}=1`\|High	Manual close: Yes
MSSQL DB '{#DBNAME}': Full backup is old	The full backup has not been executed for a long time.	`last(/MSSQL by Zabbix agent 2/mssql.backup.full["{#DBNAME}"])>{$MSSQL.BACKUP_FULL.WARN:"{#DBNAME}"} and {$MSSQL.BACKUP_FULL.USED:"{#DBNAME}"}=1`\|Warning	Manual close: Yes Depends on: MSSQL DB '{#DBNAME}': Full backup is old
MSSQL DB '{#DBNAME}': Log backup is old	The log backup has not been executed for a long time.	`last(/MSSQL by Zabbix agent 2/mssql.backup.log["{#DBNAME}"])>{$MSSQL.BACKUP_LOG.CRIT:"{#DBNAME}"} and {$MSSQL.BACKUP_LOG.USED:"{#DBNAME}"}=1 and last(/MSSQL by Zabbix agent 2/mssql.backup.recovery_model["{#DBNAME}"])<>3`\|High	Manual close: Yes
MSSQL DB '{#DBNAME}': Log backup is old	The log backup has not been executed for a long time.	`last(/MSSQL by Zabbix agent 2/mssql.backup.log["{#DBNAME}"])>{$MSSQL.BACKUP_LOG.WARN:"{#DBNAME}"} and {$MSSQL.BACKUP_LOG.USED:"{#DBNAME}"}=1 and last(/MSSQL by Zabbix agent 2/mssql.backup.recovery_model["{#DBNAME}"])<>3`\|Warning	Manual close: Yes Depends on: MSSQL DB '{#DBNAME}': Log backup is old

LLD rule Availability group discovery

Name Description Type Key and additional info

Availability group discovery

Discovery of the existing availability groups.

Dependent item

mssql.availability.group.discovery

Preprocessing

Discard unchanged with heartbeat: 1d

Item prototypes for Availability group discovery

Name	Description	Type	Key and additional info
MSSQL AG '{#GROUP_NAME}': Primary replica recovery health	Indicates the recovery health of the primary replica: 0 = In progress 1 = Online 2 = Unavailable	Dependent item	mssql.primaryrecoveryhealth["{#GROUP_NAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `1h`
MSSQL AG '{#GROUP_NAME}': Primary replica name	Name of the server instance that is hosting the current primary replica.	Dependent item	mssql.primaryreplica["{#GROUPNAME}"] Preprocessing JSON Path: `$[?(@.group_name=='{#GROUP_NAME}')].primary_replica.first()` Discard unchanged with heartbeat: `3h`
MSSQL AG '{#GROUP_NAME}': Secondary replica recovery health	Indicates the recovery health of a secondary replica: 0 = In progress 1 = Online 2 = Unavailable	Dependent item	mssql.secondaryrecoveryhealth["{#GROUP_NAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `1h`
MSSQL AG '{#GROUP_NAME}': Synchronization health	Reflects a rollup of the `synchronization_health` of all availability replicas in the availability group: 0 = Not healthy. None of the availability replicas have a healthy synchronization. 1 = Partially healthy. The synchronization of some, but not all, availability replicas is healthy. 2 = Healthy. The synchronization of every availability replica is healthy.	Dependent item	mssql.synchronizationhealth["{#GROUPNAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `1h`

Trigger prototypes for Availability group discovery

Name	Description	Expression
MSSQL AG '{#GROUP_NAME}': Primary replica recovery health in progress	The primary replica is in the synchronization process.	`last(/MSSQL by Zabbix agent 2/mssql.primary_recovery_health["{#GROUP_NAME}"])=0`\|Warning
MSSQL AG '{#GROUP_NAME}': Secondary replica recovery health in progress	The secondary replica is in the synchronization process.	`last(/MSSQL by Zabbix agent 2/mssql.secondary_recovery_health["{#GROUP_NAME}"])=0`\|Warning
MSSQL AG '{#GROUP_NAME}': All replicas unhealthy	None of the availability replicas have a healthy synchronization.	`last(/MSSQL by Zabbix agent 2/mssql.synchronization_health["{#GROUP_NAME}"])=0`\|Disaster
MSSQL AG '{#GROUP_NAME}': Some replicas unhealthy	The synchronization health of some, but not all, availability replicas is healthy.	`last(/MSSQL by Zabbix agent 2/mssql.synchronization_health["{#GROUP_NAME}"])=1`\|High

LLD rule Local database discovery

Name Description Type Key and additional info

Local database discovery

Discovery of the local availability databases.

Dependent item

mssql.local.db.discovery

Preprocessing

Discard unchanged with heartbeat: 1d

Item prototypes for Local database discovery

Name Description Type Key and additional info

MSSQL AG '{#GROUP_NAME}' Local DB '{#DBNAME}': State

0 = Online

1 = Restoring

2 = Recovering

3 = Recovery pending

4 = Suspect

5 = Emergency

6 = Offline

Dependent item

mssql.local_db.state["{#DBNAME}"]

Preprocessing

JSON Path: The text is too long. Please see the template.
Discard unchanged with heartbeat: 1h

MSSQL AG '{#GROUP_NAME}' Local DB '{#DBNAME}': Suspended

Database state:

0 = Resumed

1 = Suspended

Dependent item

mssql.localdb.issuspended["{#DBNAME}"]

Preprocessing

JSON Path: The text is too long. Please see the template.
Discard unchanged with heartbeat: 1h

MSSQL AG '{#GROUP_NAME}' Local DB '{#DBNAME}': Synchronization health

0 = Not healthy. The synchronizationstate of the database is 0 ("Not synchronizing").

1 = Partially healthy. A database on a synchronous-commit availability replica is considered partially healthy if synchronizationstate is 1 ("Synchronizing").

Dependent item

mssql.localdb.synchronizationhealth["{#DBNAME}"]

Preprocessing

JSON Path: The text is too long. Please see the template.
Discard unchanged with heartbeat: 1h

Trigger prototypes for Local database discovery

Name	Description	Expression
MSSQL AG '{#GROUP_NAME}' Local DB '{#DBNAME}': "{#DBNAME}" is {ITEM.VALUE}	The local availability database has a non-working state.	`last(/MSSQL by Zabbix agent 2/mssql.local_db.state["{#DBNAME}"])>0`\|Warning
MSSQL AG '{#GROUP_NAME}' Local DB '{#DBNAME}': "{#DBNAME}" is Not healthy	The synchronization state of the local availability database is "Not synchronizing".	`last(/MSSQL by Zabbix agent 2/mssql.local_db.synchronization_health["{#DBNAME}"])=0`\|High
MSSQL AG '{#GROUP_NAME}' Local DB '{#DBNAME}': "{#DBNAME}" is Partially healthy	A database on a synchronous-commit availability replica is considered partially healthy if synchronization state is "Synchronizing".	`last(/MSSQL by Zabbix agent 2/mssql.local_db.synchronization_health["{#DBNAME}"])=1`\|Average

LLD rule Non-local database discovery

Name Description Type Key and additional info

Non-local database discovery

Discovery of the non-local (not local to SQL Server instance) availability databases.

Dependent item

mssql.non.local.db.discovery

Preprocessing

Discard unchanged with heartbeat: 1d

Item prototypes for Non-local database discovery

Name Description Type Key and additional info

MSSQL AG '{#GROUPNAME}' Non-Local DB '*{#REPLICANAME}*{#DBNAME}': Log queue size

Amount of the log records of the primary database that has not been sent to the secondary databases.

Dependent item

mssql.non-localdb.logsendqueuesize["{#GROUPNAME}*{#REPLICANAME}*{#DBNAME}"]

Preprocessing

JSON Path: The text is too long. Please see the template.
Custom multiplier: 1024
Discard unchanged with heartbeat: 1h

MSSQL AG '{#GROUPNAME}' Non-Local DB '*{#REPLICANAME}*{#DBNAME}': Redo log queue size

Amount of log records in the log files of the secondary replica that has not yet been redone.

Dependent item

mssql.non-localdb.redoqueuesize["{#GROUPNAME}{#REPLICA_NAME}{#DBNAME}"]

Preprocessing

JSON Path: The text is too long. Please see the template.
Custom multiplier: 1024
Discard unchanged with heartbeat: 1h

Trigger prototypes for Non-local database discovery

Name	Description	Expression	Severity	Dependencies and additional info
MSSQL AG '{#GROUPNAME}' Non-Local DB '{#REPLICANAME}{#DBNAME}': Log queue size is growing	The log records of the primary database are not sent to the secondary databases.	`last(/MSSQL by Zabbix agent 2/mssql.non-local_db.log_send_queue_size["{#GROUP_NAME}{#REPLICA_NAME}{#DBNAME}"],#1)>last(/MSSQL by Zabbix agent 2/mssql.non-local_db.log_send_queue_size["{#GROUP_NAME}{#REPLICA_NAME}{#DBNAME}"],#2) and last(/MSSQL by Zabbix agent 2/mssql.non-local_db.log_send_queue_size["{#GROUP_NAME}{#REPLICA_NAME}{#DBNAME}"],#2)>last(/MSSQL by Zabbix agent 2/mssql.non-local_db.log_send_queue_size["{#GROUP_NAME}{#REPLICA_NAME}{#DBNAME}"],#3)`\|High
MSSQL AG '{#GROUPNAME}' Non-Local DB '{#REPLICANAME}{#DBNAME}': Redo log queue size is growing	The log records in the log files of the secondary replica have not yet been redone.	`last(/MSSQL by Zabbix agent 2/mssql.non-local_db.redo_queue_size["{#GROUP_NAME}{#REPLICA_NAME}{#DBNAME}"],#1)>last(/MSSQL by Zabbix agent 2/mssql.non-local_db.redo_queue_size["{#GROUP_NAME}{#REPLICA_NAME}{#DBNAME}"],#2) and last(/MSSQL by Zabbix agent 2/mssql.non-local_db.redo_queue_size["{#GROUP_NAME}{#REPLICA_NAME}{#DBNAME}"],#2)>last(/MSSQL by Zabbix agent 2/mssql.non-local_db.redo_queue_size["{#GROUP_NAME}{#REPLICA_NAME}{#DBNAME}"],#3)`\|High

LLD rule Quorum discovery

Name Description Type Key and additional info

Quorum discovery

Discovery of the quorum of the WSFC cluster.

Dependent item

mssql.quorum.discovery

Preprocessing

Discard unchanged with heartbeat: 1d

Item prototypes for Quorum discovery

Name Description Type Key and additional info

MSSQL Cluster '{#CLUSTER_NAME}': Quorum type

Type of quorum used by this WSFC cluster, one of:

0 = Node Majority. This quorum configuration can sustain failures of half the nodes (rounding up) minus one.

1 = Node and Disk Majority. If the disk witness remains on line, this quorum configuration can sustain failures of half the nodes (rounding up).

2 = Node and File Share Majority. This quorum configuration works in a similar way to Node and Disk Majority, but uses a file-share witness instead of a disk witness.

3 = No Majority: Disk Only. If the quorum disk is online, this quorum configuration can sustain failures of all nodes except one.

4 = Unknown Quorum. Unknown quorum for the cluster.

5 = Cloud Witness. Cluster utilizes Microsoft Azure for quorum arbitration. If the cloud witness is available, the cluster can sustain the failure of half the nodes (rounding up).

Dependent item

mssql.quorum.type.[{#CLUSTER_NAME}]

Preprocessing

JSON Path: $[?(@.cluster_name=='{#CLUSTER_NAME}')].quorum_type.first()
Discard unchanged with heartbeat: 1d

MSSQL Cluster '{#CLUSTER_NAME}': Quorum state

State of the WSFC quorum, one of:

0 = Unknown quorum state

1 = Normal quorum

2 = Forced quorum

Dependent item

mssql.quorum.state.[{#CLUSTER_NAME}]

Preprocessing

JSON Path: $[?(@.cluster_name=='{#CLUSTER_NAME}')].quorum_state.first()
Discard unchanged with heartbeat: 1h

LLD rule Quorum members discovery

Name Description Type Key and additional info

Quorum members discovery

Discovery of the quorum members of the WSFC cluster.

Dependent item

mssql.quorum.member.discovery

Preprocessing

Discard unchanged with heartbeat: 1d

Item prototypes for Quorum members discovery

Name Description Type Key and additional info

MSSQL Cluster member '{#MEMBER_NAME}': Number of quorum votes

Number of quorum votes possessed by this quorum member.

Dependent item

mssql.quorummembers.numberofquorumvotes.[{#MEMBER_NAME}]

Preprocessing

JSON Path: The text is too long. Please see the template.
Discard unchanged with heartbeat: 1d

MSSQL Cluster member '{#MEMBER_NAME}': Member type

The type of member, one of:

0 = WSFC node

1 = Disk witness

2 = File share witness

3 = Cloud Witness

Dependent item

mssql.quorummembers.membertype.[{#MEMBER_NAME}]

Preprocessing

JSON Path: $[?(@.member_name=='{#MEMBER_NAME}')].member_type.first()
Discard unchanged with heartbeat: 1d

MSSQL Cluster member '{#MEMBER_NAME}': Member state

The member state, one of:

0 = Offline

1 = Online

Dependent item

mssql.quorummembers.memberstate.[{#MEMBER_NAME}]

Preprocessing

JSON Path: $[?(@.member_name=='{#MEMBER_NAME}')].member_state.first()
Discard unchanged with heartbeat: 1h

LLD rule Replication discovery

Name Description Type Key and additional info

Replication discovery

Discovery of the database replicas.

Dependent item

mssql.replica.discovery

Preprocessing

Discard unchanged with heartbeat: 1d

Item prototypes for Replication discovery

Name	Description	Type	Key and additional info
MSSQL AG '{#GROUPNAME}' Replica '{#REPLICANAME}': Connected state	Whether a secondary replica is currently connected to the primary replica: 0 = Disconnected. The response of an availability replica to the "Disconnected" state depends on its role: On the primary replica, if a secondary replica is disconnected, its secondary databases are marked as "Not synchronized" on the primary replica, which waits for the secondary to reconnect; On a secondary replica, upon detecting that it is disconnected, the secondary replica attempts to reconnect to the primary replica. 1 = Connected. Each primary replica tracks the connection state for every secondary replica in the same availability group. Secondary replicas track the connection state of only the primary replica.	Dependent item	mssql.replica.connectedstate["{#GROUPNAME}{#REPLICANAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `1h`
MSSQL AG '{#GROUPNAME}' Replica '{#REPLICANAME}': Is local	Whether the replica is local: 0 = Indicates a remote secondary replica in an availability group whose primary replica is hosted by the local server instance. This value occurs only on the primary replica location. 1 = Indicates a local replica. On secondary replicas, this is the only available value for the availability group to which the replica belongs.	Dependent item	mssql.replica.islocal["{#GROUPNAME}{#REPLICANAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `1h`
MSSQL AG '{#GROUPNAME}' Replica '{#REPLICANAME}': Join state	0 = Not joined 1 = Joined, standalone instance 2 = Joined, failover cluster instance	Dependent item	mssql.replica.joinstate["{#GROUPNAME}{#REPLICANAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `1h`
MSSQL AG '{#GROUPNAME}' Replica '{#REPLICANAME}': Operational state	Current operational state of the replica: 0 = Pending failover 1 = Pending 2 = Online 3 = Offline 4 = Failed 5 = Failed, no quorum 6 = Not local	Dependent item	mssql.replica.operationalstate["{#GROUPNAME}{#REPLICANAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `1h`
MSSQL AG '{#GROUPNAME}' Replica '{#REPLICANAME}': Recovery health	Rollup of the "databasestate" column of the `sys.dm_hadr_database_replica_states` dynamic management view: 0 = In progress. At least one joined database has a database state other than "Online" (databasestate is not "0"). 1 = Online. All the joined databases have a database state of "Online" (database_state is "0").	Dependent item	mssql.replica.recoveryhealth["{#GROUPNAME}{#REPLICANAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `1h`
MSSQL AG '{#GROUPNAME}' Replica '{#REPLICANAME}': Role	Current Always On availability group role of a local replica or a connected remote replica: 0 = Resolving 1 = Primary 2 = Secondary	Dependent item	mssql.replica.role["{#GROUPNAME}{#REPLICA_NAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `1h`
MSSQL AG '{#GROUPNAME}' Replica '{#REPLICANAME}': Sync health	Reflects a rollup of the database synchronization state (synchronization_state) of all joined availability databases (also known as replicas) and the availability mode of the replica (synchronous-commit or asynchronous-commit mode). The rollup will reflect the least healthy accumulated state of the databases on the replica: 0 = Not healthy. At least one joined database is in the "Not synchronizing" state. 1 = Partially healthy. Some replicas are not in the target synchronization state: synchronous-commit replicas should be synchronized, and asynchronous-commit replicas should be synchronizing. 2 = Healthy. All replicas are in the target synchronization state: synchronous-commit replicas are synchronized, and asynchronous-commit replicas are synchronizing.	Dependent item	mssql.replica.synchronizationhealth["{#GROUPNAME}{#REPLICANAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `1h`

Trigger prototypes for Replication discovery

Name	Description	Expression
MSSQL AG '{#GROUPNAME}' Replica '{#REPLICANAME}': {#REPLICA_NAME} is disconnected	The response of an availability replica to the "Disconnected" state depends on its role: On the primary replica, if a secondary replica is disconnected, its secondary databases are marked as "Not synchronized" on the primary replica, which waits for the secondary to reconnect; On a secondary replica, upon detecting that it is disconnected, the secondary replica attempts to reconnect to the primary replica.	`last(/MSSQL by Zabbix agent 2/mssql.replica.connected_state["{#GROUP_NAME}_{#REPLICA_NAME}"])=0 and last(/MSSQL by Zabbix agent 2/mssql.replica.role["{#GROUP_NAME}_{#REPLICA_NAME}"])=2`\|Warning
MSSQL AG '{#GROUPNAME}' Replica '{#REPLICANAME}': {#REPLICA_NAME} is {ITEM.VALUE}	The operational state of the replica in a given availability group is "Pending" or "Offline".	`last(/MSSQL by Zabbix agent 2/mssql.replica.operational_state["{#GROUP_NAME}_{#REPLICA_NAME}"])=0 or last(/MSSQL by Zabbix agent 2/mssql.replica.operational_state["{#GROUP_NAME}_{#REPLICA_NAME}"])=1 or last(/MSSQL by Zabbix agent 2/mssql.replica.operational_state["{#GROUP_NAME}_{#REPLICA_NAME}"])=3`\|Warning
MSSQL AG '{#GROUPNAME}' Replica '{#REPLICANAME}': {#REPLICA_NAME} is {ITEM.VALUE}	The operational state of the replica in a given availability group is "Failed".	`last(/MSSQL by Zabbix agent 2/mssql.replica.operational_state["{#GROUP_NAME}_{#REPLICA_NAME}"])=4`\|Average
MSSQL AG '{#GROUPNAME}' Replica '{#REPLICANAME}': {#REPLICA_NAME} is {ITEM.VALUE}	The operational state of the replica in a given availability group is "Failed, no quorum".	`last(/MSSQL by Zabbix agent 2/mssql.replica.operational_state["{#GROUP_NAME}_{#REPLICA_NAME}"])=5`\|High
MSSQL AG '{#GROUPNAME}' Replica '{#REPLICANAME}': {#REPLICA_NAME} Recovery in progress	At least one joined database has a database state other than "Online".	`last(/MSSQL by Zabbix agent 2/mssql.replica.recovery_health["{#GROUP_NAME}_{#REPLICA_NAME}"])=0`\|Info
MSSQL AG '{#GROUPNAME}' Replica '{#REPLICANAME}': {#REPLICA_NAME} is Not healthy	At least one joined database is in the "Not synchronizing" state.	`last(/MSSQL by Zabbix agent 2/mssql.replica.synchronization_health["{#GROUP_NAME}_{#REPLICA_NAME}"])=0`\|Average
MSSQL AG '{#GROUPNAME}' Replica '{#REPLICANAME}': {#REPLICA_NAME} is Partially healthy	Some replicas are not in the target synchronization state: synchronous-commit replicas should be synchronized, and asynchronous-commit replicas should be synchronizing.	`last(/MSSQL by Zabbix agent 2/mssql.replica.synchronization_health["{#GROUP_NAME}_{#REPLICA_NAME}"])=1`\|Warning

LLD rule Mirroring discovery

Name Description Type Key and additional info

Mirroring discovery

Dependent item

mssql.mirroring.discovery

Preprocessing

Discard unchanged with heartbeat: 1d

Item prototypes for Mirroring discovery

Name	Description	Type	Key and additional info
MSSQL Mirroring '{#DBNAME}': Role	Current role of the local database plays in the database mirroring session. 1 = Principal 2 = Mirror	Dependent item	mssql.mirroring.role["{#DBNAME}"] Preprocessing JSON Path: `$[?(@.dbname=='{#DBNAME}')].mirroring_role.first()` Discard unchanged with heartbeat: `1h`
MSSQL Mirroring '{#DBNAME}': Role sequence	The number of times that mirroring partners have switched the principal and mirror roles due to a failover or forced service.	Dependent item	mssql.mirroring.role_sequence["{#DBNAME}"] Preprocessing JSON Path: `$[?(@.dbname=='{#DBNAME}')].mirroring_role_sequence.first()` Simple change Discard unchanged with heartbeat: `1h`
MSSQL Mirroring '{#DBNAME}': State	State of the mirror database and of the database mirroring session. 0 = Suspended 1 = Disconnected from the other partner 2 = Synchronizing 3 = Pending failover 4 = Synchronized 5 = The partners are not synchronized. Failover is not possible now. 6 = The partners are synchronized. Failover is potentially possible. For information about the requirements for the failover, see Database Mirroring Operating Modes: https://learn.microsoft.com/en-us/sql/database-engine/database-mirroring/database-mirroring-operating-modes?view=sql-server-ver16.	Dependent item	mssql.mirroring.state["{#DBNAME}"] Preprocessing JSON Path: `$[?(@.dbname=='{#DBNAME}')].mirroring_state.first()` Discard unchanged with heartbeat: `1h`
MSSQL Mirroring '{#DBNAME}': Witness state	State of the witness in the database mirroring session of the database: 0 = Unknown 1 = Connected 2 = Disconnected	Dependent item	mssql.mirroring.witness_state["{#DBNAME}"] Preprocessing JSON Path: `$[?(@.dbname=='{#DBNAME}')].mirroring_witness_state.first()` Discard unchanged with heartbeat: `1h`
MSSQL Mirroring '{#DBNAME}': Safety level	Safety setting for updates on the mirror database: 0 = Unknown state 1 = Off [asynchronous] 2 = Full [synchronous]	Dependent item	mssql.mirroring.safety_level["{#DBNAME}"] Preprocessing JSON Path: `$[?(@.dbname=='{#DBNAME}')].mirroring_safety_level.first()` Discard unchanged with heartbeat: `1h`

Trigger prototypes for Mirroring discovery

Name	Description	Expression
MSSQL Mirroring '{#DBNAME}': "{#DBNAME}" is {ITEM.VALUE}	The state of the mirror database and of the database mirroring session is "Suspended", "Disconnected from the other partner", or "Synchronizing".	`last(/MSSQL by Zabbix agent 2/mssql.mirroring.state["{#DBNAME}"])>=0 and last(/MSSQL by Zabbix agent 2/mssql.mirroring.state["{#DBNAME}"])<=2`\|Info
MSSQL Mirroring '{#DBNAME}': "{#DBNAME}" is {ITEM.VALUE}	The state of the mirror database and of the database mirroring session is "Pending failover".	`last(/MSSQL by Zabbix agent 2/mssql.mirroring.state["{#DBNAME}"])=3`\|Warning
MSSQL Mirroring '{#DBNAME}': "{#DBNAME}" is {ITEM.VALUE}	The state of the mirror database and of the database mirroring session is "Not synchronized". The partners are not synchronized. A failover is not possible now.	`last(/MSSQL by Zabbix agent 2/mssql.mirroring.state["{#DBNAME}"])=5`\|High
MSSQL Mirroring '{#DBNAME}': "{#DBNAME}" Witness is disconnected	The state of the witness in the database mirroring session of the database is "Disconnected".	`last(/MSSQL by Zabbix agent 2/mssql.mirroring.witness_state["{#DBNAME}"])=2`\|Warning

LLD rule Job discovery

Name Description Type Key and additional info

Job discovery

Scanning jobs in DBMS.

Dependent item

mssql.job.discovery

Preprocessing

Discard unchanged with heartbeat: 1d

Item prototypes for Job discovery

Name	Description	Type	Key and additional info
MSSQL Job '{#JOBNAME}': Get job status	The item gets the status of SQL agent job {#JOBNAME}.	Dependent item	mssql.job.status_raw["{#JOBNAME}"] Preprocessing JSON Path: `$[?(@.job_name=='{#JOBNAME}')].first()` ⛔️Custom on fail: Discard value
MSSQL Job '{#JOBNAME}': Enabled	The possible values of the job status: 0 = Disabled 1 = Enabled	Dependent item	mssql.job.enabled["{#JOBNAME}"] Preprocessing JSON Path: `$.enabled` Discard unchanged with heartbeat: `12h`
MSSQL Job '{#JOBNAME}': Last run date-time	The last date-time of the job run.	Dependent item	mssql.job.lastrundatetime["{#JOBNAME}"] Preprocessing JSON Path: `$.last_run_date_time` JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `12h`
MSSQL Job '{#JOBNAME}': Next run date-time	The next date-time of the job run.	Dependent item	mssql.job.nextrundatetime["{#JOBNAME}"] Preprocessing JSON Path: `$.next_run_date_time` JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `12h`
MSSQL Job '{#JOBNAME}': Last run status message	An informational message about the last run of the job.	Dependent item	mssql.job.lastrunstatusmessage["{#JOBNAME}"] Preprocessing JSON Path: `$.last_run_status_message` Discard unchanged with heartbeat: `12h`
MSSQL Job '{#JOBNAME}': Run status	The possible values of the job status: 0 ⇒ Failed 1 ⇒ Succeeded 2 ⇒ Retry 3 ⇒ Canceled 4 ⇒ Running	Dependent item	mssql.job.runstatus["{#JOBNAME}"] Preprocessing JSON Path: `$.run_status` Discard unchanged with heartbeat: `15m`
MSSQL Job '{#JOBNAME}': Run duration	Duration of the last-run job.	Dependent item	mssql.job.run_duration["{#JOBNAME}"] Preprocessing JSON Path: `$.run_duration` Discard unchanged with heartbeat: `15m`

Trigger prototypes for Job discovery

Name	Description	Expression	Severity	Dependencies and additional info
MSSQL Job '{#JOBNAME}': Failed to run	The last run of the job has failed.	`last(/MSSQL by Zabbix agent 2/mssql.job.runstatus["{#JOBNAME}"])=0`\|Warning	Manual close: Yes
MSSQL Job '{#JOBNAME}': Job duration is high	The job is taking too long.	`last(/MSSQL by Zabbix agent 2/mssql.job.run_duration["{#JOBNAME}"])>{$MSSQL.BACKUP_DURATION.WARN:"{#JOBNAME}"}`\|Warning	Manual close: Yes

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template, or ask for help at ZABBIX forums

db_mongodb_cluster

View README Download JSON

MongoDB cluster by Zabbix agent 2

Overview

The template to monitor MongoDB sharded cluster by Zabbix that work without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.

MongoDB cluster by Zabbix agent 2 — collects metrics from mongos proxy(router) by polling zabbix-agent2.

Requirements

Zabbix version: 6.0 and higher.

Tested versions

This template has been tested on:

MongoDB 4.0.21, 4.4.3

Configuration

Setup

Setup and configure zabbix-agent2 compiled with the MongoDB monitoring plugin.
Set the {$MONGODB.CONNSTRING} such as
Set the user name and password in host macros ({$MONGODB.USER}, {$MONGODB.PASSWORD}) if you want to override parameters from the Zabbix agent configuration file.

Note, depending on the number of DBs and collections discovery operation may be expensive. Use filters with macros {$MONGODB.LLD.FILTER.DB.MATCHES}, {$MONGODB.LLD.FILTER.DB.NOTMATCHES}, {$MONGODB.LLD.FILTER.COLLECTION.MATCHES}, {$MONGODB.LLD.FILTER.COLLECTION.NOTMATCHES}.

All sharded Mongodb nodes (mongod) will be discovered with attached template "MongoDB node by Zabbix agent 2".

Test availability: zabbix_get -s mongos.node -k 'mongodb.ping["{$MONGODB.CONNSTRING}","{$MONGODB.USER}","{$MONGODB.PASSWORD}"]"

Macros used

Name	Description	Default
{$MONGODB.CONNSTRING}	Connection string in the URI format (password is not used). This param overwrites a value configured in the "Server" option of the configuration file (if it's set), otherwise, the plugin's default value is used: "tcp://localhost:27017"	`tcp://localhost:27017`
{$MONGODB.USER}	MongoDB username
{$MONGODB.PASSWORD}	MongoDB user password
{$MONGODB.CONNS.AVAILABLE.MIN.WARN}	Minimum number of available connections	`1000`
{$MONGODB.LLD.FILTER.COLLECTION.MATCHES}	Filter of discoverable collections	`.*`
{$MONGODB.LLD.FILTER.COLLECTION.NOT_MATCHES}	Filter to exclude discovered collections	`CHANGE_IF_NEEDED`
{$MONGODB.LLD.FILTER.DB.MATCHES}	Filter of discoverable databases	`.*`
{$MONGODB.LLD.FILTER.DB.NOT_MATCHES}	Filter to exclude discovered databases	`(admin\|config\|local)`
{$MONGODB.CURSOR.TIMEOUT.MAX.WARN}	Maximum number of cursors timing out per second	`1`
{$MONGODB.CURSOR.OPEN.MAX.WARN}	Maximum number of open cursors	`10000`

Items

Name	Description	Type	Key and additional info
MongoDB cluster: Get server status	The mongos statistic	Zabbix agent	mongodb.server.status["{$MONGODB.CONNSTRING}","{$MONGODB.USER}","{$MONGODB.PASSWORD}"]
MongoDB cluster: Get mongodb.connpool.stats	Returns current info about connpool.stats.	Zabbix agent	mongodb.connpool.stats["{$MONGODB.CONNSTRING}","{$MONGODB.USER}","{$MONGODB.PASSWORD}"]
MongoDB cluster: Ping	Test if a connection is alive or not.	Zabbix agent	mongodb.ping["{$MONGODB.CONNSTRING}","{$MONGODB.USER}","{$MONGODB.PASSWORD}"] Preprocessing Discard unchanged with heartbeat: `30m`
MongoDB cluster: Jumbo chunks	Total number of 'jumbo' chunks in the mongo cluster.	Zabbix agent	mongodb.jumbo_chunks.count["{$MONGODB.CONNSTRING}","{$MONGODB.USER}","{$MONGODB.PASSWORD}"]
MongoDB cluster: Mongos version	Version of the Mongos server	Dependent item	mongodb.version Preprocessing JSON Path: `$.version` Discard unchanged with heartbeat: `3h`
MongoDB cluster: Uptime	Number of seconds since the Mongos server start.	Dependent item	mongodb.uptime Preprocessing JSON Path: `$.uptime`
MongoDB cluster: Operations: command	The number of commands issued to the database per second. Counts all commands except the write commands: insert, update, and delete.	Dependent item	mongodb.opcounters.command.rate Preprocessing JSON Path: `$.opcounters.command` Change per second
MongoDB cluster: Operations: delete	The number of delete operations the mongos instance per second.	Dependent item	mongodb.opcounters.delete.rate Preprocessing JSON Path: `$.opcounters.delete` Change per second
MongoDB cluster: Operations: update, rate	The number of update operations the mongos instance per second.	Dependent item	mongodb.opcounters.update.rate Preprocessing JSON Path: `$.opcounters.update` Change per second
MongoDB cluster: Operations: query, rate	The number of queries received the mongos instance per second.	Dependent item	mongodb.opcounters.query.rate Preprocessing JSON Path: `$.opcounters.query` Change per second
MongoDB cluster: Operations: insert, rate	The number of insert operations received the mongos instance per second.	Dependent item	mongodb.opcounters.insert.rate Preprocessing JSON Path: `$.opcounters.insert` Change per second
MongoDB cluster: Operations: getmore, rate	The number of "getmore" operations the mongos per second. This counter can be high even if the query count is low. Secondary nodes send getMore operations as part of the replication process.	Dependent item	mongodb.opcounters.getmore.rate Preprocessing JSON Path: `$.opcounters.getmore` Change per second
MongoDB cluster: Last seen configserver	The latest optime of the CSRS primary that the mongos has seen.	Dependent item	mongodb.lastseenconfig_server Preprocessing JSON Path: `$.sharding.lastSeenConfigServerOpTime.ts.T`
MongoDB cluster: Configserver heartbeat	Difference between the latest optime of the CSRS primary that the mongos has seen and cluster time.	Dependent item	mongodb.configserverheartbeat Preprocessing JavaScript: `The text is too long. Please see the template.`
MongoDB cluster: Bytes in, rate	The total number of bytes that the server has received over network connections initiated by clients or other mongod/mongos instances per second.	Dependent item	mongodb.network.bytes_in.rate Preprocessing JSON Path: `$.network.bytesIn` Change per second
MongoDB cluster: Bytes out, rate	The total number of bytes that the server has sent over network connections initiated by clients or other mongod/mongos instances per second.	Dependent item	mongodb.network.bytes_out.rate Preprocessing JSON Path: `$.network.bytesOut` Change per second
MongoDB cluster: Requests, rate	Number of distinct requests that the server has received per second.	Dependent item	mongodb.network.numRequests.rate Preprocessing JSON Path: `$.network.numRequests` Change per second
MongoDB cluster: Connections, current	The number of incoming connections from clients to the database server. This number includes the current shell session.	Dependent item	mongodb.connections.current Preprocessing JSON Path: `$.connections.current`
MongoDB cluster: New connections, rate	"Rate of all incoming connections created to the server."	Dependent item	mongodb.connections.rate Preprocessing JSON Path: `$.connections.totalCreated` Change per second
MongoDB cluster: Connections, active	The number of active client connections to the server. Active client connections refers to client connections that currently have operations in progress. Available starting in 4.0.7, 0 for older versions.	Dependent item	mongodb.connections.active Preprocessing JSON Path: `$.connections.active` ⛔️Custom on fail: Set value to: `0`
MongoDB cluster: Connections, available	"The number of unused incoming connections available."	Dependent item	mongodb.connections.available Preprocessing JSON Path: `$.connections.available`
MongoDB cluster: Connection pool: client connections	The number of active and stored outgoing synchronous connections from the current mongos instance to other members of the sharded cluster.	Dependent item	mongodb.connection_pool.client Preprocessing JSON Path: `$.numClientConnections`
MongoDB cluster: Connection pool: scoped	Number of active and stored outgoing scoped synchronous connections from the current mongos instance to other members of the sharded cluster.	Dependent item	mongodb.connection_pool.scoped Preprocessing JSON Path: `$.numAScopedConnections`
MongoDB cluster: Connection pool: created, rate	The total number of outgoing connections created per second by the current mongos instance to other members of the sharded cluster.	Dependent item	mongodb.connection_pool.created.rate Preprocessing JSON Path: `$.totalCreated` Change per second
MongoDB cluster: Connection pool: available	The total number of available outgoing connections from the current mongos instance to other members of the sharded cluster.	Dependent item	mongodb.connection_pool.available Preprocessing JSON Path: `$.totalAvailable`
MongoDB cluster: Connection pool: in use	Reports the total number of outgoing connections from the current mongos instance to other members of the sharded cluster set that are currently in use.	Dependent item	mongodb.connectionpool.inuse Preprocessing JSON Path: `$.totalInUse`
MongoDB cluster: Connection pool: refreshing	Reports the total number of outgoing connections from the current mongos instance to other members of the sharded cluster that are currently being refreshed.	Dependent item	mongodb.connection_pool.refreshing Preprocessing JSON Path: `$.totalRefreshing`
MongoDB cluster: Cursor: open no timeout	Number of open cursors with the option DBQuery.Option.noTimeout set to prevent timeout after a period of inactivity.	Dependent item	mongodb.metrics.cursor.open.no_timeout Preprocessing JSON Path: `$.metrics.cursor.open.noTimeout` ⛔️Custom on fail: Discard value
MongoDB cluster: Cursor: open pinned	Number of pinned open cursors.	Dependent item	mongodb.cursor.open.pinned Preprocessing JSON Path: `$.metrics.cursor.open.pinned`
MongoDB cluster: Cursor: open total	Number of cursors that MongoDB is maintaining for clients.	Dependent item	mongodb.cursor.open.total Preprocessing JSON Path: `$.metrics.cursor.open.total`
MongoDB cluster: Cursor: timed out, rate	Number of cursors that time out, per second.	Dependent item	mongodb.cursor.timed_out.rate Preprocessing JSON Path: `$.metrics.cursor.timedOut` Change per second
MongoDB cluster: Architecture	A number, either 64 or 32, that indicates whether the MongoDB instance is compiled for 64-bit or 32-bit architecture.	Dependent item	mongodb.mem.bits Preprocessing JSON Path: `$.mem.bits` Discard unchanged with heartbeat: `3h`
MongoDB cluster: Memory: resident	Amount of memory currently used by the database process.	Dependent item	mongodb.mem.resident Preprocessing JSON Path: `$.mem.resident` Custom multiplier: `1048576`
MongoDB cluster: Memory: virtual	Amount of virtual memory used by the mongos process.	Dependent item	mongodb.mem.virtual Preprocessing JSON Path: `$.mem.virtual` Custom multiplier: `1048576`

Triggers

Name	Description	Expression	Severity
MongoDB cluster: Connection to mongos proxy is unavailable	Connection to mongos proxy instance is currently unavailable.	`last(/MongoDB cluster by Zabbix agent 2/mongodb.ping["{$MONGODB.CONNSTRING}","{$MONGODB.USER}","{$MONGODB.PASSWORD}"])=0`\|High
MongoDB cluster: Version has changed	MongoDB cluster version has changed. Acknowledge to close the problem manually.	`last(/MongoDB cluster by Zabbix agent 2/mongodb.version,#1)<>last(/MongoDB cluster by Zabbix agent 2/mongodb.version,#2) and length(last(/MongoDB cluster by Zabbix agent 2/mongodb.version))>0`\|Info	Manual close: Yes
MongoDB cluster: Mongos server has been restarted	Uptime is less than 10 minutes.	`last(/MongoDB cluster by Zabbix agent 2/mongodb.uptime)<10m`\|Info	Manual close: Yes
MongoDB cluster: Failed to fetch info data	Zabbix has not received data for items for the last 10 minutes	`nodata(/MongoDB cluster by Zabbix agent 2/mongodb.uptime,10m)=1`\|Warning	Manual close: Yes Depends on: MongoDB cluster: Connection to mongos proxy is unavailable
MongoDB cluster: Available connections is low	Too few available connections. Consider this value in combination with the value of connections current to understand the connection load on the database.	`max(/MongoDB cluster by Zabbix agent 2/mongodb.connections.available,5m)<{$MONGODB.CONNS.AVAILABLE.MIN.WARN}`\|Warning
MongoDB cluster: Too many cursors opened by MongoDB for clients		`min(/MongoDB cluster by Zabbix agent 2/mongodb.cursor.open.total,5m)>{$MONGODB.CURSOR.OPEN.MAX.WARN}`\|Warning
MongoDB cluster: Too many cursors are timing out		`min(/MongoDB cluster by Zabbix agent 2/mongodb.cursor.timed_out.rate,5m)>{$MONGODB.CURSOR.TIMEOUT.MAX.WARN}`\|Warning

LLD rule Database discovery

Name

Description

Type

Key and additional info

Database discovery

Collect database metrics.

Note, depending on the number of DBs this discovery operation may be expensive. Use filters with macros {$MONGODB.LLD.FILTER.DB.MATCHES}, {$MONGODB.LLD.FILTER.DB.NOT_MATCHES}.

Zabbix agent

mongodb.db.discovery["{$MONGODB.CONNSTRING}","{$MONGODB.USER}","{$MONGODB.PASSWORD}"]

Item prototypes for Database discovery

Name	Description	Type	Key and additional info
MongoDB {#DBNAME}: Get db stats {#DBNAME}	Returns statistics reflecting the database system's state.	Zabbix agent	mongodb.db.stats["{$MONGODB.CONNSTRING}","{$MONGODB.USER}","{$MONGODB.PASSWORD}","{#DBNAME}"]
MongoDB {#DBNAME}: Objects, avg size	The average size of each document in bytes.	Dependent item	mongodb.db.size["{#DBNAME}"] Preprocessing JSON Path: `$.avgObjSize`
MongoDB {#DBNAME}: Size, data	Total size of the data held in this database including the padding factor.	Dependent item	mongodb.db.data_size["{#DBNAME}"] Preprocessing JSON Path: `$.dataSize`
MongoDB {#DBNAME}: Size, file	Total size of the data held in this database including the padding factor (only available with the mmapv1 storage engine).	Dependent item	mongodb.db.file_size["{#DBNAME}"] Preprocessing JSON Path: `$.fileSize` ⛔️Custom on fail: Discard value
MongoDB {#DBNAME}: Size, index	Total size of all indexes created on this database.	Dependent item	mongodb.db.index_size["{#DBNAME}"] Preprocessing JSON Path: `$.indexSize`
MongoDB {#DBNAME}: Size, storage	Total amount of space allocated to collections in this database for document storage.	Dependent item	mongodb.db.storage_size["{#DBNAME}"] Preprocessing JSON Path: `$.storageSize`
MongoDB {#DBNAME}: Objects, count	Number of objects (documents) in the database across all collections.	Dependent item	mongodb.db.objects["{#DBNAME}"] Preprocessing JSON Path: `$.objects`
MongoDB {#DBNAME}: Extents	Contains a count of the number of extents in the database across all collections.	Dependent item	mongodb.db.extents["{#DBNAME}"] Preprocessing JSON Path: `$.numExtents` ⛔️Custom on fail: Discard value

LLD rule Collection discovery

Name

Description

Type

Key and additional info

Collection discovery

Collect collections metrics.

Note, depending on the number of DBs and collections this discovery operation may be expensive. Use filters with macros {$MONGODB.LLD.FILTER.DB.MATCHES}, {$MONGODB.LLD.FILTER.DB.NOTMATCHES}, {$MONGODB.LLD.FILTER.COLLECTION.MATCHES}, {$MONGODB.LLD.FILTER.COLLECTION.NOTMATCHES}.

Zabbix agent

mongodb.collections.discovery["{$MONGODB.CONNSTRING}","{$MONGODB.USER}","{$MONGODB.PASSWORD}"]

Item prototypes for Collection discovery

Name	Description	Type	Key and additional info
MongoDB {#DBNAME}.{#COLLECTION}: Get collection stats {#DBNAME}.{#COLLECTION}	Returns a variety of storage statistics for a given collection.	Zabbix agent	mongodb.collection.stats["{$MONGODB.CONNSTRING}","{$MONGODB.USER}","{$MONGODB.PASSWORD}","{#DBNAME}","{#COLLECTION}"]
MongoDB {#DBNAME}.{#COLLECTION}: Size	The total size in bytes of the data in the collection plus the size of every indexes on the mongodb.collection.	Dependent item	mongodb.collection.size["{#DBNAME}","{#COLLECTION}"] Preprocessing JSON Path: `$.size`
MongoDB {#DBNAME}.{#COLLECTION}: Objects, avg size	The size of the average object in the collection in bytes.	Dependent item	mongodb.collection.avgobjsize["{#DBNAME}","{#COLLECTION}"] Preprocessing JSON Path: `$.avgObjSize` ⛔️Custom on fail: Discard value
MongoDB {#DBNAME}.{#COLLECTION}: Objects, count	Total number of objects in the collection.	Dependent item	mongodb.collection.count["{#DBNAME}","{#COLLECTION}"] Preprocessing JSON Path: `$.count`
MongoDB {#DBNAME}.{#COLLECTION}: Capped, max number	Maximum number of documents in a capped collection.	Dependent item	mongodb.collection.max["{#DBNAME}","{#COLLECTION}"] Preprocessing JSON Path: `$.max` ⛔️Custom on fail: Discard value
MongoDB {#DBNAME}.{#COLLECTION}: Capped, max size	Maximum size of a capped collection in bytes.	Dependent item	mongodb.collection.max_size["{#DBNAME}","{#COLLECTION}"] Preprocessing JSON Path: `$.maxSize` ⛔️Custom on fail: Discard value
MongoDB {#DBNAME}.{#COLLECTION}: Storage size	Total storage space allocated to this collection for document storage.	Dependent item	mongodb.collection.storage_size["{#DBNAME}","{#COLLECTION}"] Preprocessing JSON Path: `$.storageSize`
MongoDB {#DBNAME}.{#COLLECTION}: Indexes	Total number of indices on the collection.	Dependent item	mongodb.collection.nindexes["{#DBNAME}","{#COLLECTION}"] Preprocessing JSON Path: `$.nindexes`
MongoDB {#DBNAME}.{#COLLECTION}: Capped	Whether or not the collection is capped.	Dependent item	mongodb.collection.capped["{#DBNAME}","{#COLLECTION}"] Preprocessing JSON Path: `$.capped` Boolean to decimal Discard unchanged with heartbeat: `3h`

LLD rule Shards discovery

Name	Description	Type	Key and additional info
Shards discovery	Discovery shared cluster hosts.	Zabbix agent	mongodb.sh.discovery["{$MONGODB.CONNSTRING}","{$MONGODB.USER}","{$MONGODB.PASSWORD}"]

LLD rule Config servers discovery

Name	Description	Type	Key and additional info
Config servers discovery	Discovery shared cluster config servers.	Zabbix agent	mongodb.cfg.discovery["{$MONGODB.CONNSTRING}","{$MONGODB.USER}","{$MONGODB.PASSWORD}"]

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template, or ask for help at ZABBIX forums

db_mongodb

View README Download JSON

MongoDB node by Zabbix agent 2

Overview

The template to monitor single MongoDB server by Zabbix that work without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.

MongoDB node by Zabbix Agent 2 — collects metrics by polling zabbix-agent2.

Requirements

Zabbix version: 6.0 and higher.

Tested versions

This template has been tested on:

MongoDB 4.0.21, 4.4.3

Configuration

Setup

Setup and configure zabbix-agent2 compiled with the MongoDB monitoring plugin.
Set the {$MONGODB.CONNSTRING} such as
Set the user name and password in host macros ({$MONGODB.USER}, {$MONGODB.PASSWORD}) if you want to override parameters from the Zabbix agent configuration file.

Test availability: zabbix_get -s mongodb.node -k 'mongodb.ping["{$MONGODB.CONNSTRING}","{$MONGODB.USER}","{$MONGODB.PASSWORD}"]"

Macros used

Name	Description	Default
{$MONGODB.CONNSTRING}	Connection string in the URI format (password is not used). This param overwrites a value configured in the "Server" option of the configuration file (if it's set), otherwise, the plugin's default value is used: "tcp://localhost:27017"	`tcp://localhost:27017`
{$MONGODB.USER}	MongoDB username
{$MONGODB.PASSWORD}	MongoDB user password
{$MONGODB.CONNS.PCT.USED.MAX.WARN}	Maximum percentage of used connections	`80`
{$MONGODB.CURSOR.TIMEOUT.MAX.WARN}	Maximum number of cursors timing out per second	`1`
{$MONGODB.CURSOR.OPEN.MAX.WARN}	Maximum number of open cursors	`10000`
{$MONGODB.REPL.LAG.MAX.WARN}	Maximum replication lag in seconds	`10s`
{$MONGODB.LLD.FILTER.COLLECTION.MATCHES}	Filter of discoverable collections	`.*`
{$MONGODB.LLD.FILTER.COLLECTION.NOT_MATCHES}	Filter to exclude discovered collections	`CHANGE_IF_NEEDED`
{$MONGODB.LLD.FILTER.DB.MATCHES}	Filter of discoverable databases	`.*`
{$MONGODB.LLD.FILTER.DB.NOT_MATCHES}	Filter to exclude discovered databases	`(admin\|config\|local)`
{$MONGODB.WIRED_TIGER.TICKETS.AVAILABLE.MIN.WARN}	Minimum number of available WiredTiger read or write tickets remaining	`5`

Items

Name	Description	Type	Key and additional info
MongoDB: Get server status	Returns a database's state.	Zabbix agent	mongodb.server.status["{$MONGODB.CONNSTRING}","{$MONGODB.USER}","{$MONGODB.PASSWORD}"]
MongoDB: Get Replica Set status	Returns the replica set status from the point of view of the member where the method is run.	Zabbix agent	mongodb.rs.status["{$MONGODB.CONNSTRING}","{$MONGODB.USER}","{$MONGODB.PASSWORD}"]
MongoDB: Get oplog stats	Returns status of the replica set, using data polled from the oplog.	Zabbix agent	mongodb.oplog.stats["{$MONGODB.CONNSTRING}","{$MONGODB.USER}","{$MONGODB.PASSWORD}"]
MongoDB: Ping	Test if a connection is alive or not.	Zabbix agent	mongodb.ping["{$MONGODB.CONNSTRING}","{$MONGODB.USER}","{$MONGODB.PASSWORD}"] Preprocessing Discard unchanged with heartbeat: `30m`
MongoDB: Get collections usage stats	Returns usage statistics for each collection.	Zabbix agent	mongodb.collections.usage["{$MONGODB.CONNSTRING}","{$MONGODB.USER}","{$MONGODB.PASSWORD}"]
MongoDB: MongoDB version	Version of the MongoDB server.	Dependent item	mongodb.version Preprocessing JSON Path: `$.version` Discard unchanged with heartbeat: `3h`
MongoDB: Uptime	Number of seconds that the mongod process has been active.	Dependent item	mongodb.uptime Preprocessing JSON Path: `$.uptime`
MongoDB: Asserts: message, rate	The number of message assertions raised per second. Check the log file for more information about these messages.	Dependent item	mongodb.asserts.msg.rate Preprocessing JSON Path: `$.asserts.msg` Change per second
MongoDB: Asserts: user, rate	The number of "user asserts" that have occurred per second. These are errors that user may generate, such as out of disk space or duplicate key.	Dependent item	mongodb.asserts.user.rate Preprocessing JSON Path: `$.asserts.user` Change per second
MongoDB: Asserts: warning, rate	The number of warnings raised per second.	Dependent item	mongodb.asserts.warning.rate Preprocessing JSON Path: `$.asserts.warning` Change per second
MongoDB: Asserts: regular, rate	The number of regular assertions raised per second. Check the log file for more information about these messages.	Dependent item	mongodb.asserts.regular.rate Preprocessing JSON Path: `$.asserts.regular` Change per second
MongoDB: Asserts: rollovers, rate	Number of times that the rollover counters roll over per second. The counters rollover to zero every 2^30 assertions.	Dependent item	mongodb.asserts.rollovers.rate Preprocessing JSON Path: `$.asserts.rollovers` Change per second
MongoDB: Active clients: writers	The number of active client connections performing write operations.	Dependent item	mongodb.active_clients.writers Preprocessing JSON Path: `$.globalLock.activeClients.writers`
MongoDB: Active clients: readers	The number of the active client connections performing read operations.	Dependent item	mongodb.active_clients.readers Preprocessing JSON Path: `$.globalLock.activeClients.readers`
MongoDB: Active clients: total	The total number of internal client connections to the database including system threads as well as queued readers and writers.	Dependent item	mongodb.active_clients.total Preprocessing JSON Path: `$.globalLock.activeClients.total`
MongoDB: Current queue: writers	The number of operations that are currently queued and waiting for the write lock. A consistently small write-queue, particularly of shorter operations, is no cause for concern.	Dependent item	mongodb.current_queue.writers Preprocessing JSON Path: `$.globalLock.currentQueue.writers`
MongoDB: Current queue: readers	The number of operations that are currently queued and waiting for the read lock. A consistently small read-queue, particularly of shorter operations, should cause no concern.	Dependent item	mongodb.current_queue.readers Preprocessing JSON Path: `$.globalLock.currentQueue.readers`
MongoDB: Current queue: total	The total number of operations queued waiting for the lock.	Dependent item	mongodb.current_queue.total Preprocessing JSON Path: `$.globalLock.currentQueue.total`
MongoDB: Operations: command, rate	The number of commands issued to the database the mongod instance per second. Counts all commands except the write commands: insert, update, and delete.	Dependent item	mongodb.opcounters.command.rate Preprocessing JSON Path: `$.opcounters.command` Change per second
MongoDB: Operations: delete, rate	The number of delete operations the mongod instance per second.	Dependent item	mongodb.opcounters.delete.rate Preprocessing JSON Path: `$.opcounters.delete` Change per second
MongoDB: Operations: update, rate	The number of update operations the mongod instance per second.	Dependent item	mongodb.opcounters.update.rate Preprocessing JSON Path: `$.opcounters.update` Change per second
MongoDB: Operations: query, rate	The number of queries received the mongod instance per second.	Dependent item	mongodb.opcounters.query.rate Preprocessing JSON Path: `$.opcounters.query` Change per second
MongoDB: Operations: insert, rate	The number of insert operations received since the mongod instance per second.	Dependent item	mongodb.opcounters.insert.rate Preprocessing JSON Path: `$.opcounters.insert` Change per second
MongoDB: Operations: getmore, rate	The number of "getmore" operations since the mongod instance per second. This counter can be high even if the query count is low. Secondary nodes send getMore operations as part of the replication process.	Dependent item	mongodb.opcounters.getmore.rate Preprocessing JSON Path: `$.opcounters.getmore` Change per second
MongoDB: Connections, current	The number of incoming connections from clients to the database server. This number includes the current shell session.	Dependent item	mongodb.connections.current Preprocessing JSON Path: `$.connections.current`
MongoDB: New connections, rate	Rate of all incoming connections created to the server.	Dependent item	mongodb.connections.rate Preprocessing JSON Path: `$.connections.totalCreated` Change per second
MongoDB: Connections, available	The number of unused incoming connections available.	Dependent item	mongodb.connections.available Preprocessing JSON Path: `$.connections.available`
MongoDB: Connections, active	The number of active client connections to the server. Active client connections refers to client connections that currently have operations in progress. Available starting in 4.0.7, 0 for older versions.	Dependent item	mongodb.connections.active Preprocessing JSON Path: `$.connections.active` ⛔️Custom on fail: Discard value
MongoDB: Bytes in, rate	The total number of bytes that the server has received over network connections initiated by clients or other mongod/mongos instances per second.	Dependent item	mongodb.network.bytes_in.rate Preprocessing JSON Path: `$.network.bytesIn` Change per second
MongoDB: Bytes out, rate	The total number of bytes that the server has sent over network connections initiated by clients or other mongod/mongos instances per second.	Dependent item	mongodb.network.bytes_out.rate Preprocessing JSON Path: `$.network.bytesOut` Change per second
MongoDB: Requests, rate	Number of distinct requests that the server has received per second	Dependent item	mongodb.network.numRequests.rate Preprocessing JSON Path: `$.network.numRequests` Change per second
MongoDB: Document: deleted, rate	Number of documents deleted per second.	Dependent item	mongod.document.deleted.rate Preprocessing JSON Path: `$.metrics.document.deleted` Change per second
MongoDB: Document: inserted, rate	Number of documents inserted per second.	Dependent item	mongod.document.inserted.rate Preprocessing JSON Path: `$.metrics.document.inserted` Change per second
MongoDB: Document: returned, rate	Number of documents returned by queries per second.	Dependent item	mongod.document.returned.rate Preprocessing JSON Path: `$.metrics.document.returned` Change per second
MongoDB: Document: updated, rate	Number of documents updated per second.	Dependent item	mongod.document.updated.rate Preprocessing JSON Path: `$.metrics.document.updated` Change per second
MongoDB: Cursor: open no timeout	Number of open cursors with the option DBQuery.Option.noTimeout set to prevent timeout after a period of inactivity.	Dependent item	mongodb.metrics.cursor.open.no_timeout Preprocessing JSON Path: `$.metrics.cursor.open.noTimeout`
MongoDB: Cursor: open pinned	Number of pinned open cursors.	Dependent item	mongodb.cursor.open.pinned Preprocessing JSON Path: `$.metrics.cursor.open.pinned`
MongoDB: Cursor: open total	Number of cursors that MongoDB is maintaining for clients.	Dependent item	mongodb.cursor.open.total Preprocessing JSON Path: `$.metrics.cursor.open.total`
MongoDB: Cursor: timed out, rate	Number of cursors that time out, per second.	Dependent item	mongodb.cursor.timed_out.rate Preprocessing JSON Path: `$.metrics.cursor.timedOut` Change per second
MongoDB: Architecture	A number, either 64 or 32, that indicates whether the MongoDB instance is compiled for 64-bit or 32-bit architecture.	Dependent item	mongodb.mem.bits Preprocessing JSON Path: `$.mem.bits` Discard unchanged with heartbeat: `3h`
MongoDB: Memory: mapped	Amount of mapped memory by the database.	Dependent item	mongodb.mem.mapped Preprocessing JSON Path: `$.mem.mapped` ⛔️Custom on fail: Discard value Custom multiplier: `1048576`
MongoDB: Memory: mapped with journal	The amount of mapped memory, including the memory used for journaling.	Dependent item	mongodb.mem.mappedwithjournal Preprocessing JSON Path: `$.mem.mappedWithJournal` ⛔️Custom on fail: Discard value Custom multiplier: `1048576`
MongoDB: Memory: resident	Amount of memory currently used by the database process.	Dependent item	mongodb.mem.resident Preprocessing JSON Path: `$.mem.resident` Custom multiplier: `1048576`
MongoDB: Memory: virtual	Amount of virtual memory used by the mongod process.	Dependent item	mongodb.mem.virtual Preprocessing JSON Path: `$.mem.virtual` Custom multiplier: `1048576`

Triggers

Name	Description	Expression	Severity
MongoDB: Connection to MongoDB is unavailable	Connection to MongoDB instance is currently unavailable.	`last(/MongoDB node by Zabbix agent 2/mongodb.ping["{$MONGODB.CONNSTRING}","{$MONGODB.USER}","{$MONGODB.PASSWORD}"])=0`\|High
MongoDB: Version has changed	MongoDB version has changed. Acknowledge to close the problem manually.	`last(/MongoDB node by Zabbix agent 2/mongodb.version,#1)<>last(/MongoDB node by Zabbix agent 2/mongodb.version,#2) and length(last(/MongoDB node by Zabbix agent 2/mongodb.version))>0`\|Info	Manual close: Yes
MongoDB: mongod process has been restarted	Uptime is less than 10 minutes.	`last(/MongoDB node by Zabbix agent 2/mongodb.uptime)<10m`\|Info	Manual close: Yes
MongoDB: Failed to fetch info data	Zabbix has not received data for items for the last 10 minutes	`nodata(/MongoDB node by Zabbix agent 2/mongodb.uptime,10m)=1`\|Warning	Manual close: Yes Depends on: MongoDB: Connection to MongoDB is unavailable
MongoDB: Total number of open connections is too high	Too few available connections. If MongoDB runs low on connections, in may not be able to handle incoming requests in a timely manner.	`min(/MongoDB node by Zabbix agent 2/mongodb.connections.current,5m)/(last(/MongoDB node by Zabbix agent 2/mongodb.connections.available)+last(/MongoDB node by Zabbix agent 2/mongodb.connections.current))*100>{$MONGODB.CONNS.PCT.USED.MAX.WARN}`\|Warning
MongoDB: Too many cursors opened by MongoDB for clients		`min(/MongoDB node by Zabbix agent 2/mongodb.cursor.open.total,5m)>{$MONGODB.CURSOR.OPEN.MAX.WARN}`\|Warning
MongoDB: Too many cursors are timing out		`min(/MongoDB node by Zabbix agent 2/mongodb.cursor.timed_out.rate,5m)>{$MONGODB.CURSOR.TIMEOUT.MAX.WARN}`\|Warning

LLD rule Database discovery

Name

Description

Type

Key and additional info

Database discovery

Collect database metrics.

Note, depending on the number of DBs this discovery operation may be expensive. Use filters with macros {$MONGODB.LLD.FILTER.DB.MATCHES}, {$MONGODB.LLD.FILTER.DB.NOT_MATCHES}.

Zabbix agent

mongodb.db.discovery["{$MONGODB.CONNSTRING}","{$MONGODB.USER}","{$MONGODB.PASSWORD}"]

Item prototypes for Database discovery

Name	Description	Type	Key and additional info
MongoDB {#DBNAME}: Get db stats {#DBNAME}	Returns statistics reflecting the database system's state.	Zabbix agent	mongodb.db.stats["{$MONGODB.CONNSTRING}","{$MONGODB.USER}","{$MONGODB.PASSWORD}","{#DBNAME}"]
MongoDB {#DBNAME}: Objects, avg size	The average size of each document in bytes.	Dependent item	mongodb.db.size["{#DBNAME}"] Preprocessing JSON Path: `$.avgObjSize`
MongoDB {#DBNAME}: Size, data	Total size of the data held in this database including the padding factor.	Dependent item	mongodb.db.data_size["{#DBNAME}"] Preprocessing JSON Path: `$.dataSize`
MongoDB {#DBNAME}: Size, file	Total size of the data held in this database including the padding factor (only available with the mmapv1 storage engine).	Dependent item	mongodb.db.file_size["{#DBNAME}"] Preprocessing JSON Path: `$.fileSize` ⛔️Custom on fail: Discard value
MongoDB {#DBNAME}: Size, index	Total size of all indexes created on this database.	Dependent item	mongodb.db.index_size["{#DBNAME}"] Preprocessing JSON Path: `$.indexSize`
MongoDB {#DBNAME}: Size, storage	Total amount of space allocated to collections in this database for document storage.	Dependent item	mongodb.db.storage_size["{#DBNAME}"] Preprocessing JSON Path: `$.storageSize`
MongoDB {#DBNAME}: Collections	Contains a count of the number of collections in that database.	Dependent item	mongodb.db.collections["{#DBNAME}"] Preprocessing JSON Path: `$.collections`
MongoDB {#DBNAME}: Objects, count	Number of objects (documents) in the database across all collections.	Dependent item	mongodb.db.objects["{#DBNAME}"] Preprocessing JSON Path: `$.objects`
MongoDB {#DBNAME}: Extents	Contains a count of the number of extents in the database across all collections.	Dependent item	mongodb.db.extents["{#DBNAME}"] Preprocessing JSON Path: `$.numExtents` ⛔️Custom on fail: Discard value

LLD rule Collection discovery

Name

Description

Type

Key and additional info

Collection discovery

Collect collections metrics.

Zabbix agent

mongodb.collections.discovery["{$MONGODB.CONNSTRING}","{$MONGODB.USER}","{$MONGODB.PASSWORD}"]

Item prototypes for Collection discovery

Name	Description	Type	Key and additional info
MongoDB {#DBNAME}.{#COLLECTION}: Get collection stats {#DBNAME}.{#COLLECTION}	Returns a variety of storage statistics for a given collection.	Zabbix agent	mongodb.collection.stats["{$MONGODB.CONNSTRING}","{$MONGODB.USER}","{$MONGODB.PASSWORD}","{#DBNAME}","{#COLLECTION}"]
MongoDB {#DBNAME}.{#COLLECTION}: Size	The total size in bytes of the data in the collection plus the size of every indexes on the mongodb.collection.	Dependent item	mongodb.collection.size["{#DBNAME}","{#COLLECTION}"] Preprocessing JSON Path: `$.size`
MongoDB {#DBNAME}.{#COLLECTION}: Objects, avg size	The size of the average object in the collection in bytes.	Dependent item	mongodb.collection.avgobjsize["{#DBNAME}","{#COLLECTION}"] Preprocessing JSON Path: `$.avgObjSize` ⛔️Custom on fail: Discard value
MongoDB {#DBNAME}.{#COLLECTION}: Objects, count	Total number of objects in the collection.	Dependent item	mongodb.collection.count["{#DBNAME}","{#COLLECTION}"] Preprocessing JSON Path: `$.count`
MongoDB {#DBNAME}.{#COLLECTION}: Capped: max number	Maximum number of documents that may be present in a capped collection.	Dependent item	mongodb.collection.max_number["{#DBNAME}","{#COLLECTION}"] Preprocessing JSON Path: `$.max` ⛔️Custom on fail: Discard value
MongoDB {#DBNAME}.{#COLLECTION}: Capped: max size	Maximum size of a capped collection in bytes.	Dependent item	mongodb.collection.max_size["{#DBNAME}","{#COLLECTION}"] Preprocessing JSON Path: `$.maxSize` ⛔️Custom on fail: Discard value
MongoDB {#DBNAME}.{#COLLECTION}: Storage size	Total storage space allocated to this collection for document storage.	Dependent item	mongodb.collection.storage_size["{#DBNAME}","{#COLLECTION}"] Preprocessing JSON Path: `$.storageSize`
MongoDB {#DBNAME}.{#COLLECTION}: Indexes	Total number of indices on the collection.	Dependent item	mongodb.collection.nindexes["{#DBNAME}","{#COLLECTION}"] Preprocessing JSON Path: `$.nindexes`
MongoDB {#DBNAME}.{#COLLECTION}: Capped	Whether or not the collection is capped.	Dependent item	mongodb.collection.capped["{#DBNAME}","{#COLLECTION}"] Preprocessing JSON Path: `$.capped` Boolean to decimal Discard unchanged with heartbeat: `3h`
MongoDB {#DBNAME}.{#COLLECTION}: Operations: total, rate	The number of operations per second.	Dependent item	mongodb.collection.ops.total.rate["{#DBNAME}","{#COLLECTION}"] Preprocessing JSON Path: `$.totals["{#DBNAME}.{#COLLECTION}"].total.count` Change per second
MongoDB {#DBNAME}.{#COLLECTION}: Read lock, rate	The number of operations per second.	Dependent item	mongodb.collection.read_lock.rate["{#DBNAME}","{#COLLECTION}"] Preprocessing JSON Path: `$.totals["{#DBNAME}.{#COLLECTION}"].readLock.count` Change per second
MongoDB {#DBNAME}.{#COLLECTION}: Write lock, rate	The number of operations per second.	Dependent item	mongodb.collection.write_lock.rate["{#DBNAME}","{#COLLECTION}"] Preprocessing JSON Path: `$.totals["{#DBNAME}.{#COLLECTION}"].writeLock.count` Change per second
MongoDB {#DBNAME}.{#COLLECTION}: Operations: queries, rate	The number of operations per second.	Dependent item	mongodb.collection.ops.queries.rate["{#DBNAME}","{#COLLECTION}"] Preprocessing JSON Path: `$.totals["{#DBNAME}.{#COLLECTION}"].queries.count` Change per second
MongoDB {#DBNAME}.{#COLLECTION}: Operations: getmore, rate	The number of operations per second.	Dependent item	mongodb.collection.ops.getmore.rate["{#DBNAME}","{#COLLECTION}"] Preprocessing JSON Path: `$.totals["{#DBNAME}.{#COLLECTION}"].getmore.count` Change per second
MongoDB {#DBNAME}.{#COLLECTION}: Operations: insert, rate	The number of operations per second.	Dependent item	mongodb.collection.ops.insert.rate["{#DBNAME}","{#COLLECTION}"] Preprocessing JSON Path: `$.totals["{#DBNAME}.{#COLLECTION}"].insert.count` Change per second
MongoDB {#DBNAME}.{#COLLECTION}: Operations: update, rate	The number of operations per second.	Dependent item	mongodb.collection.ops.update.rate["{#DBNAME}","{#COLLECTION}"] Preprocessing JSON Path: `$.totals["{#DBNAME}.{#COLLECTION}"].update.count` Change per second
MongoDB {#DBNAME}.{#COLLECTION}: Operations: remove, rate	The number of operations per second.	Dependent item	mongodb.collection.ops.remove.rate["{#DBNAME}","{#COLLECTION}"] Preprocessing JSON Path: `$.totals["{#DBNAME}.{#COLLECTION}"].remove.count` Change per second
MongoDB {#DBNAME}.{#COLLECTION}: Operations: commands, rate	The number of operations per second.	Dependent item	mongodb.collection.ops.commands.rate["{#DBNAME}","{#COLLECTION}"] Preprocessing JSON Path: `$.totals["{#DBNAME}.{#COLLECTION}"].commands.count` Change per second
MongoDB {#DBNAME}.{#COLLECTION}: Operations: total, ms/s	Fraction of time (ms/s) the mongod has spent to operations.	Dependent item	mongodb.collection.ops.total.ms["{#DBNAME}","{#COLLECTION}"] Preprocessing JSON Path: `$.totals["{#DBNAME}.{#COLLECTION}"].total.time` Change per second
MongoDB {#DBNAME}.{#COLLECTION}: Read lock, ms/s	Fraction of time (ms/s) the mongod has spent to operations.	Dependent item	mongodb.collection.read_lock.ms["{#DBNAME}","{#COLLECTION}"] Preprocessing JSON Path: `$.totals["{#DBNAME}.{#COLLECTION}"].readLock.time` Change per second
MongoDB {#DBNAME}.{#COLLECTION}: Write lock, ms/s	Fraction of time (ms/s) the mongod has spent to operations.	Dependent item	mongodb.collection.write_lock.ms["{#DBNAME}","{#COLLECTION}"] Preprocessing JSON Path: `$.totals["{#DBNAME}.{#COLLECTION}"].writeLock.time` Change per second
MongoDB {#DBNAME}.{#COLLECTION}: Operations: queries, ms/s	Fraction of time (ms/s) the mongod has spent to operations.	Dependent item	mongodb.collection.ops.queries.ms["{#DBNAME}","{#COLLECTION}"] Preprocessing JSON Path: `$.totals["{#DBNAME}.{#COLLECTION}"].queries.time` Change per second
MongoDB {#DBNAME}.{#COLLECTION}: Operations: getmore, ms/s	Fraction of time (ms/s) the mongod has spent to operations.	Dependent item	mongodb.collection.ops.getmore.ms["{#DBNAME}","{#COLLECTION}"] Preprocessing JSON Path: `$.totals["{#DBNAME}.{#COLLECTION}"].getmore.time` Change per second
MongoDB {#DBNAME}.{#COLLECTION}: Operations: insert, ms/s	Fraction of time (ms/s) the mongod has spent to operations.	Dependent item	mongodb.collection.ops.insert.ms["{#DBNAME}","{#COLLECTION}"] Preprocessing JSON Path: `$.totals["{#DBNAME}.{#COLLECTION}"].insert.time` Change per second
MongoDB {#DBNAME}.{#COLLECTION}: Operations: update, ms/s	Fraction of time (ms/s) the mongod has spent to operations.	Dependent item	mongodb.collection.ops.update.ms["{#DBNAME}","{#COLLECTION}"] Preprocessing JSON Path: `$.totals["{#DBNAME}.{#COLLECTION}"].update.time` Change per second
MongoDB {#DBNAME}.{#COLLECTION}: Operations: remove, ms/s	Fraction of time (ms/s) the mongod has spent to operations.	Dependent item	mongodb.collection.ops.remove.ms["{#DBNAME}","{#COLLECTION}"] Preprocessing JSON Path: `$.totals["{#DBNAME}.{#COLLECTION}"].remove.time` Change per second
MongoDB {#DBNAME}.{#COLLECTION}: Operations: commands, ms/s	Fraction of time (ms/s) the mongod has spent to operations.	Dependent item	mongodb.collection.ops.commands.ms["{#DBNAME}","{#COLLECTION}"] Preprocessing JSON Path: `$.totals["{#DBNAME}.{#COLLECTION}"].commands.time` Change per second

LLD rule Replication discovery

Name Description Type Key and additional info

Replication discovery

Collect metrics by Zabbix agent if it exists.

Dependent item

mongodb.rs.discovery

Preprocessing

JavaScript: The text is too long. Please see the template.
Discard unchanged with heartbeat: 1h

Item prototypes for Replication discovery

Name	Description	Type	Key and additional info
MongoDB: Node state	An integer between 0 and 10 that represents the replica state of the current member.	Dependent item	mongodb.rs.state[{#RS_NAME}] Preprocessing JSON Path: `$.myState` Discard unchanged with heartbeat: `1h`
MongoDB: Replication lag	Delay between a write operation on the primary and its copy to a secondary.	Dependent item	mongodb.rs.lag[{#RS_NAME}] Preprocessing JSON Path: `$.members[?(@.self == "true")].lag.first()`
MongoDB: Number of replicas	The number of replicated nodes in current ReplicaSet.	Dependent item	mongodb.rs.totalnodes[{#RSNAME}] Preprocessing JSON Path: `$.members[?(@.self == "true")].totalNodes.first()` Discard unchanged with heartbeat: `1h`
MongoDB: Number of unhealthy replicas	The number of replicated nodes with member health value = 0.	Dependent item	mongodb.rs.unhealthycount[{#RSNAME}] Preprocessing JSON Path: `$.members[?(@.self == "true")].unhealthyCount.first()` Discard unchanged with heartbeat: `1h`
MongoDB: Unhealthy replicas	The replicated nodes in current ReplicaSet with member health value = 0.	Dependent item	mongodb.rs.unhealthy[{#RS_NAME}] Preprocessing JSON Path: `$.members[?(@.self == "true")].unhealthyNodes.first()` JavaScript: `The text is too long. Please see the template.` Discard unchanged with heartbeat: `1h`
MongoDB: Apply batches, rate	Number of batches applied across all databases per second.	Dependent item	mongodb.rs.apply.batches.rate[{#RS_NAME}] Preprocessing JSON Path: `$.metrics.repl.apply.batches.num` Change per second
MongoDB: Apply batches, ms/s	Fraction of time (ms/s) the mongod has spent applying operations from the oplog.	Dependent item	mongodb.rs.apply.batches.ms.rate[{#RS_NAME}] Preprocessing JSON Path: `$.metrics.repl.apply.batches.totalMillis` Change per second
MongoDB: Apply ops, rate	Number of oplog operations applied per second.	Dependent item	mongodb.rs.apply.rate[{#RS_NAME}] Preprocessing JSON Path: `$.metrics.repl.apply.ops` Change per second
MongoDB: Buffer	Number of operations in the oplog buffer.	Dependent item	mongodb.rs.buffer.count[{#RS_NAME}] Preprocessing JSON Path: `$.metrics.repl.buffer.count`
MongoDB: Buffer, max size	Maximum size of the buffer.	Dependent item	mongodb.rs.buffer.maxsize[{#RSNAME}] Preprocessing JSON Path: `$.metrics.repl.buffer.maxSizeBytes`
MongoDB: Buffer, size	Current size of the contents of the oplog buffer.	Dependent item	mongodb.rs.buffer.size[{#RS_NAME}] Preprocessing JSON Path: `$.metrics.repl.buffer.sizeBytes`
MongoDB: Network bytes, rate	Amount of data read from the replication sync source per second.	Dependent item	mongodb.rs.network.bytes.rate[{#RS_NAME}] Preprocessing JSON Path: `$.metrics.repl.network.bytes` Change per second
MongoDB: Network getmores, rate	Number of getmore operations per second.	Dependent item	mongodb.rs.network.getmores.rate[{#RS_NAME}] Preprocessing JSON Path: `$.metrics.repl.network.getmores.num` Change per second
MongoDB: Network getmores, ms/s	Fraction of time (ms/s) required to collect data from getmore operations.	Dependent item	mongodb.rs.network.getmores.ms.rate[{#RS_NAME}] Preprocessing JSON Path: `$.metrics.repl.network.getmores.totalMillis` Change per second
MongoDB: Network ops, rate	Number of operations read from the replication source per second.	Dependent item	mongodb.rs.network.ops.rate[{#RS_NAME}] Preprocessing JSON Path: `$.metrics.repl.network.ops` Change per second
MongoDB: Network readers created, rate	Number of oplog query processes created per second.	Dependent item	mongodb.rs.network.readers.rate[{#RS_NAME}] Preprocessing JSON Path: `$.metrics.repl.network.readersCreated` Change per second
MongoDB {#RS_NAME}: Oplog time diff	Oplog window: difference between the first and last operation in the oplog. Only present if there are entries in the oplog.	Dependent item	mongodb.rs.oplog.timediff[{#RS_NAME}] Preprocessing JSON Path: `$.timediff`
MongoDB: Preload docs, rate	Number of documents loaded per second during the pre-fetch stage of replication.	Dependent item	mongodb.rs.preload.docs.rate[{#RS_NAME}] Preprocessing JSON Path: `$.metrics.repl.preload.docs.num` ⛔️Custom on fail: Discard value Change per second
MongoDB: Preload docs, ms/s	Fraction of time (ms/s) spent loading documents as part of the pre-fetch stage of replication.	Dependent item	mongodb.rs.preload.docs.ms.rate[{#RS_NAME}] Preprocessing JSON Path: `$.metrics.repl.preload.docs.totalMillis` ⛔️Custom on fail: Discard value Change per second
MongoDB: Preload indexes, rate	Number of index entries loaded by members before updating documents as part of the pre-fetch stage of replication.	Dependent item	mongodb.rs.preload.indexes.rate[{#RS_NAME}] Preprocessing JSON Path: `$.metrics.repl.preload.indexes.num` ⛔️Custom on fail: Discard value Change per second
MongoDB: Preload indexes, ms/s	Fraction of time (ms/s) spent loading documents as part of the pre-fetch stage of replication.	Dependent item	mongodb.rs.preload.indexes.ms.rate[{#RS_NAME}] Preprocessing JSON Path: `$.metrics.repl.preload.indexes.totalMillis` ⛔️Custom on fail: Discard value Change per second

Trigger prototypes for Replication discovery

Name	Description	Expression	Severity
MongoDB: Node in ReplicaSet changed the state	Node in ReplicaSet changed the state. Acknowledge to close the problem manually.	`last(/MongoDB node by Zabbix agent 2/mongodb.rs.state[{#RS_NAME}],#1)<>last(/MongoDB node by Zabbix agent 2/mongodb.rs.state[{#RS_NAME}],#2)`\|Warning	Manual close: Yes
MongoDB: Replication lag with primary is too high		`min(/MongoDB node by Zabbix agent 2/mongodb.rs.lag[{#RS_NAME}],5m)>{$MONGODB.REPL.LAG.MAX.WARN}`\|Warning
MongoDB: There are unhealthy replicas in ReplicaSet		`last(/MongoDB node by Zabbix agent 2/mongodb.rs.unhealthy_count[{#RS_NAME}])>0 and length(last(/MongoDB node by Zabbix agent 2/mongodb.rs.unhealthy[{#RS_NAME}]))>0`\|Average

LLD rule WiredTiger metrics

Name Description Type Key and additional info

WiredTiger metrics

Collect metrics of WiredTiger Storage Engine if it exists.

Dependent item

mongodb.wired_tiger.discovery

Preprocessing

JavaScript: The text is too long. Please see the template.
Discard unchanged with heartbeat: 6h

Item prototypes for WiredTiger metrics

Name	Description	Type	Key and additional info
MongoDB: WiredTiger cache: bytes	Size of the data currently in cache.	Dependent item	mongodb.wiredtiger.cache.bytesin_cache[{#SINGLETON}] Preprocessing JSON Path: `$.wiredTiger.cache['bytes currently in the cache']`
MongoDB: WiredTiger cache: in-memory page splits	In-memory page splits.	Dependent item	mongodb.wired_tiger.cache.splits[{#SINGLETON}] Preprocessing JSON Path: `$.wiredTiger.cache['in-memory page splits']`
MongoDB: WiredTiger cache: bytes, max	Maximum cache size.	Dependent item	mongodb.wiredtiger.cache.maximumbytes_configured[{#SINGLETON}] Preprocessing JSON Path: `$.wiredTiger.cache['maximum bytes configured']`
MongoDB: WiredTiger cache: max page size at eviction	Maximum page size at eviction.	Dependent item	mongodb.wiredtiger.cache.maxpagesizeeviction[{#SINGLETON}] Preprocessing JSON Path: `$.wiredTiger.cache['maximum page size at eviction']`
MongoDB: WiredTiger cache: modified pages evicted	Number of pages, that have been modified, evicted from the cache.	Dependent item	mongodb.wiredtiger.cache.modifiedpages_evicted[{#SINGLETON}] Preprocessing JSON Path: `$.wiredTiger.cache['modified pages evicted']`
MongoDB: WiredTiger cache: pages read into cache	Number of pages read into the cache.	Dependent item	mongodb.wiredtiger.cache.pagesread[{#SINGLETON}] Preprocessing JSON Path: `$.wiredTiger.cache['pages read into cache']`
MongoDB: WiredTiger cache: pages written from cache	Number of pages written from the cache.	Dependent item	mongodb.wiredtiger.cache.pageswritten[{#SINGLETON}] Preprocessing JSON Path: `$.wiredTiger.cache['pages written from cache']`
MongoDB: WiredTiger cache: pages held in cache	Number of pages currently held in the cache.	Dependent item	mongodb.wiredtiger.cache.pagesin_cache[{#SINGLETON}] Preprocessing JSON Path: `$.wiredTiger.cache['pages currently held in the cache']`
MongoDB: WiredTiger cache: pages evicted by application threads, rate	Number of page evicted by application threads per second.	Dependent item	mongodb.wiredtiger.cache.pagesevicted_threads.rate[{#SINGLETON}] Preprocessing JSON Path: `$.wiredTiger.cache.['pages evicted by application threads']`
MongoDB: WiredTiger cache: tracked dirty bytes in the cache	Size of the dirty data in the cache.	Dependent item	mongodb.wiredtiger.cache.trackeddirty_bytes[{#SINGLETON}] Preprocessing JSON Path: `$.wiredTiger.cache.['tracked dirty bytes in the cache']`
MongoDB: WiredTiger cache: unmodified pages evicted	Number of pages, that were not modified, evicted from the cache.	Dependent item	mongodb.wiredtiger.cache.unmodifiedpages_evicted[{#SINGLETON}] Preprocessing JSON Path: `$.wiredTiger.cache.['unmodified pages evicted']`
MongoDB: WiredTiger concurrent transactions: read, available	Number of available read tickets (concurrent transactions) remaining.	Dependent item	mongodb.wiredtiger.concurrenttransactions.read.available[{#SINGLETON}] Preprocessing JSON Path: `$.wiredTiger.concurrentTransactions.read.available`
MongoDB: WiredTiger concurrent transactions: read, out	Number of read tickets (concurrent transactions) in use.	Dependent item	mongodb.wiredtiger.concurrenttransactions.read.out[{#SINGLETON}] Preprocessing JSON Path: `$.wiredTiger.concurrentTransactions.read.out`
MongoDB: WiredTiger concurrent transactions: read, total tickets	Total number of read tickets (concurrent transactions) available.	Dependent item	mongodb.wiredtiger.concurrenttransactions.read.totalTickets[{#SINGLETON}] Preprocessing JSON Path: `$.wiredTiger.concurrentTransactions.read.totalTickets`
MongoDB: WiredTiger concurrent transactions: write, available	Number of available write tickets (concurrent transactions) remaining.	Dependent item	mongodb.wiredtiger.concurrenttransactions.write.available[{#SINGLETON}] Preprocessing JSON Path: `$.wiredTiger.concurrentTransactions.write.available`
MongoDB: WiredTiger concurrent transactions: write, out	Number of write tickets (concurrent transactions) in use.	Dependent item	mongodb.wiredtiger.concurrenttransactions.write.out[{#SINGLETON}] Preprocessing JSON Path: `$.wiredTiger.concurrentTransactions.write.out`
MongoDB: WiredTiger concurrent transactions: write, total tickets	Total number of write tickets (concurrent transactions) available.	Dependent item	mongodb.wiredtiger.concurrenttransactions.write.totalTickets[{#SINGLETON}] Preprocessing JSON Path: `$.wiredTiger.concurrentTransactions.write.totalTickets`

Trigger prototypes for WiredTiger metrics

Name	Description	Expression	Severity	Dependencies and additional info
MongoDB: Available WiredTiger read tickets is low	Too few available read tickets. When the number of available read tickets remaining reaches zero, new read requests will be queued until a new read ticket is available.	`max(/MongoDB node by Zabbix agent 2/mongodb.wired_tiger.concurrent_transactions.read.available[{#SINGLETON}],5m)<{$MONGODB.WIRED_TIGER.TICKETS.AVAILABLE.MIN.WARN}`\|Warning
MongoDB: Available WiredTiger write tickets is low	Too few available write tickets. When the number of available write tickets remaining reaches zero, new write requests will be queued until a new write ticket is available.	`max(/MongoDB node by Zabbix agent 2/mongodb.wired_tiger.concurrent_transactions.write.available[{#SINGLETON}],5m)<{$MONGODB.WIRED_TIGER.TICKETS.AVAILABLE.MIN.WARN}`\|Warning

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template, or ask for help at ZABBIX forums

db_influxdb_http

View README Download JSON

InfluxDB by HTTP

Overview

This template is designed for the effortless deployment of InfluxDB monitoring by Zabbix via HTTP and doesn't require any external scripts.

Requirements

Zabbix version: 6.0 and higher.

Tested versions

This template has been tested on:

InfluxDB 2.0

Configuration

Setup

This template works with self-hosted InfluxDB instances. Internal service metrics are collected from InfluxDB /metrics endpoint. For organization discovery template need to use Authorization via API token. See docs: https://docs.influxdata.com/influxdb/v2.0/security/tokens/

Don't forget to change the macros {$INFLUXDB.URL}, {$INFLUXDB.API.TOKEN}. Also, see the Macros section for a list of macros used to set trigger values. NOTE. Some metrics may not be collected depending on your InfluxDB instance version and configuration.

Macros used

Name	Description	Default
{$INFLUXDB.URL}	InfluxDB instance URL	`http://localhost:8086`
{$INFLUXDB.API.TOKEN}	InfluxDB API Authorization Token
{$INFLUXDB.ORG_NAME.MATCHES}	Filter of discoverable organizations	`.*`
{$INFLUXDB.ORGNAME.NOTMATCHES}	Filter to exclude discovered organizations	`CHANGE_IF_NEEDED`
{$INFLUXDB.TASK.RUN.FAIL.MAX.WARN}	Maximum number of tasks runs failures for trigger expression.	`2`
{$INFLUXDB.REQ.FAIL.MAX.WARN}	Maximum number of query requests failures for trigger expression.	`2`

Items

Name	Description	Type	Key and additional info
InfluxDB: Get instance metrics		HTTP agent	influx.get_metrics Preprocessing Check for not supported value ⛔️Custom on fail: Discard value Prometheus to JSON
InfluxDB: Instance status	Get the health of an instance.	HTTP agent	influx.healthcheck Preprocessing Check for not supported value ⛔️Custom on fail: Set value to: `{"status":"fail"}]}` JavaScript: `return JSON.parse(value).status == 'pass' ? 1: 0` Discard unchanged with heartbeat: `30m`
InfluxDB: Boltdb reads, rate	Total number of boltdb reads per second.	Dependent item	influxdb.boltdb_reads.rate Preprocessing JSON Path: `$[?(@.name=="boltdb_reads_total")].value.first()` ⛔️Custom on fail: Discard value Change per second
InfluxDB: Boltdb writes, rate	Total number of boltdb writes per second.	Dependent item	influxdb.boltdb_writes.rate Preprocessing JSON Path: `$[?(@.name=="boltdb_writes_total")].value.first()` ⛔️Custom on fail: Discard value Change per second
InfluxDB: Buckets, total	Number of total buckets on the server.	Dependent item	influxdb.buckets.total Preprocessing JSON Path: `$[?(@.name=="influxdb_buckets_total")].value.first()` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `30m`
InfluxDB: Dashboards, total	Number of total dashboards on the server.	Dependent item	influxdb.dashboards.total Preprocessing JSON Path: `$[?(@.name=="influxdb_dashboards_total")].value.first()` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `30m`
InfluxDB: Organizations, total	Number of total organizations on the server.	Dependent item	influxdb.organizations.total Preprocessing JSON Path: `$[?(@.name=="influxdb_organizations_total")].value.first()` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `30m`
InfluxDB: Scrapers, total	Number of total scrapers on the server.	Dependent item	influxdb.scrapers.total Preprocessing JSON Path: `$[?(@.name=="influxdb_scrapers_total")].value.first()` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `30m`
InfluxDB: Telegraf plugins, total	Number of individual telegraf plugins configured.	Dependent item	influxdb.telegraf_plugins.total Preprocessing JSON Path: `$[?(@.name=="influxdb_telegraf_plugins_count")].value.sum()` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `30m`
InfluxDB: Telegrafs, total	Number of total telegraf configurations on the server.	Dependent item	influxdb.telegrafs.total Preprocessing JSON Path: `$[?(@.name=="influxdb_telegrafs_total")].value.first()` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `30m`
InfluxDB: Tokens, total	Number of total tokens on the server.	Dependent item	influxdb.tokens.total Preprocessing JSON Path: `$[?(@.name=="influxdb_tokens_total")].value.first()` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `30m`
InfluxDB: Users, total	Number of total users on the server.	Dependent item	influxdb.users.total Preprocessing JSON Path: `$[?(@.name=="influxdb_users_total")].value.first()` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `30m`
InfluxDB: Version	Version of the InfluxDB instance.	Dependent item	influxdb.version Preprocessing JSON Path: `$[?(@.name=="influxdb_info")].labels.version.first()` Discard unchanged with heartbeat: `3h`
InfluxDB: Uptime	InfluxDB process uptime in seconds.	Dependent item	influxdb.uptime Preprocessing JSON Path: `$[?(@.name=="influxdb_uptime_seconds")].value.first()`
InfluxDB: Workers currently running	Total number of workers currently running tasks.	Dependent item	influxdb.taskexecutorruns_active.total Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value
InfluxDB: Workers busy, pct	Percent of total available workers that are currently busy.	Dependent item	influxdb.taskexecutorworkers_busy.pct Preprocessing JSON Path: `$[?(@.name=="task_executor_workers_busy")].value.first()` ⛔️Custom on fail: Discard value
InfluxDB: Task runs failed, rate	Total number of failure runs across all tasks.	Dependent item	influxdb.taskexecutorcomplete.failed.rate Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value Change per second
InfluxDB: Task runs successful, rate	Total number of runs successful completed across all tasks.	Dependent item	influxdb.taskexecutorcomplete.successful.rate Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value Change per second

Triggers

Name	Description	Expression	Severity
InfluxDB: Health check was failed	The InfluxDB instance is not available or unhealthy.	`last(/InfluxDB by HTTP/influx.healthcheck)=0`\|High
InfluxDB: Version has changed	InfluxDB version has changed. Acknowledge to close the problem manually.	`last(/InfluxDB by HTTP/influxdb.version,#1)<>last(/InfluxDB by HTTP/influxdb.version,#2) and length(last(/InfluxDB by HTTP/influxdb.version))>0`\|Info	Manual close: Yes
InfluxDB: has been restarted	Uptime is less than 10 minutes.	`last(/InfluxDB by HTTP/influxdb.uptime)<10m`\|Info	Manual close: Yes
InfluxDB: Too many tasks failure runs	"Number of failure runs completed across all tasks is too high."	`min(/InfluxDB by HTTP/influxdb.task_executor_complete.failed.rate,5m)>{$INFLUXDB.TASK.RUN.FAIL.MAX.WARN}`\|Warning

LLD rule Organizations discovery

Name Description Type Key and additional info

Organizations discovery

Discovery of organizations metrics.

HTTP agent

influxdb.orgs.discovery

Preprocessing

JavaScript: The text is too long. Please see the template.
Discard unchanged with heartbeat: 1h

Item prototypes for Organizations discovery

Name	Description	Type	Key and additional info
InfluxDB: [{#ORG_NAME}] Query requests bytes, success	Count of bytes received with status 200 per second.	Dependent item	influxdb.org.queryrequestbytes.success.rate["{#ORG_NAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value Change per second
InfluxDB: [{#ORG_NAME}] Query requests bytes, failed	Count of bytes received with status not 200 per second.	Dependent item	influxdb.org.queryrequestbytes.failed.rate["{#ORG_NAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value Change per second
InfluxDB: [{#ORG_NAME}] Query requests, failed	Total number of query requests with status not 200 per second.	Dependent item	influxdb.org.queryrequest.failed.rate["{#ORGNAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value Change per second
InfluxDB: [{#ORG_NAME}] Query requests, success	Total number of query requests with status 200 per second.	Dependent item	influxdb.org.queryrequest.success.rate["{#ORGNAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value Change per second
InfluxDB: [{#ORG_NAME}] Query response bytes, success	Count of bytes returned with status 200 per second.	Dependent item	influxdb.org.httpqueryresponsebytes.success.rate["{#ORGNAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value Change per second
InfluxDB: [{#ORG_NAME}] Query response bytes, failed	Count of bytes returned with status not 200 per second.	Dependent item	influxdb.org.httpqueryresponsebytes.failed.rate["{#ORGNAME}"] Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Discard value Change per second

Trigger prototypes for Organizations discovery

Name	Description	Expression	Severity	Dependencies and additional info
InfluxDB: [{#ORG_NAME}]: Too many requests failures	Too many query requests failed.	`min(/InfluxDB by HTTP/influxdb.org.query_request.failed.rate["{#ORG_NAME}"],5m)>{$INFLUXDB.REQ.FAIL.MAX.WARN}`\|Warning

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template, or ask for help at ZABBIX forums

db_ignite_jmx

View README Download JSON

Ignite by JMX

Overview

Official JMX Template for Apache Ignite computing platform. This template is based on the original template developed by Igor Akkuratov, Senior Engineer at GridGain Systems and Apache Ignite Contributor.

Requirements

Zabbix version: 6.0 and higher.

Tested versions

This template has been tested on:

Ignite 2.9.0

Configuration

Setup

This template works with standalone and cluster instances. Metrics are collected by JMX. All metrics are discoverable.

Enable and configure JMX access to Apache Ignite. See documentation for instructions. Current JMX tree hierarchy contains classloader by default. Add the following jvm option -DIGNITE_MBEAN_APPEND_CLASS_LOADER_ID=falseto will exclude one level with Classloader name. You can configure Cache and Data Region metrics which you want using official guide.
Set the user name and password in host macros {$IGNITE.USER} and {$IGNITE.PASSWORD}.

Macros used

Name	Description	Default
{$IGNITE.PASSWORD}		`<secret>`
{$IGNITE.USER}		`zabbix`
{$IGNITE.LLD.FILTER.THREAD.POOL.MATCHES}	Filter of discoverable thread pools.	`.*`
{$IGNITE.LLD.FILTER.THREAD.POOL.NOT_MATCHES}	Filter to exclude discovered thread pools.	`Macro too long. Please see the template.`
{$IGNITE.LLD.FILTER.DATA.REGION.MATCHES}	Filter of discoverable data regions.	`.*`
{$IGNITE.LLD.FILTER.DATA.REGION.NOT_MATCHES}	Filter to exclude discovered data regions.	`^(sysMemPlc\|TxLog)$`
{$IGNITE.LLD.FILTER.CACHE.MATCHES}	Filter of discoverable cache groups.	`.*`
{$IGNITE.LLD.FILTER.CACHE.NOT_MATCHES}	Filter to exclude discovered cache groups.	`CHANGE_IF_NEEDED`
{$IGNITE.THREAD.QUEUE.MAX.WARN}	Threshold for thread pool queue size. Can be used with thread pool name as context.	`1000`
{$IGNITE.PME.DURATION.MAX.WARN}	The maximum PME duration in ms for warning trigger expression.	`10000`
{$IGNITE.PME.DURATION.MAX.HIGH}	The maximum PME duration in ms for high trigger expression.	`60000`
{$IGNITE.THREADS.COUNT.MAX.WARN}	The maximum number of running threads for trigger expression.	`1000`
{$IGNITE.JOBS.QUEUE.MAX.WARN}	The maximum number of queued jobs for trigger expression.	`10`
{$IGNITE.CHECKPOINT.PUSED.MAX.HIGH}	The maximum percent of checkpoint buffer utilization for high trigger expression.	`80`
{$IGNITE.CHECKPOINT.PUSED.MAX.WARN}	The maximum percent of checkpoint buffer utilization for warning trigger expression.	`66`
{$IGNITE.DATA.REGION.PUSED.MAX.HIGH}	The maximum percent of data region utilization for high trigger expression.	`90`
{$IGNITE.DATA.REGION.PUSED.MAX.WARN}	The maximum percent of data region utilization for warning trigger expression.	`80`

LLD rule Ignite kernal metrics

Name Description Type Key and additional info

Ignite kernal metrics

JMX agent

jmx.discovery[beans,"org.apache:group=Kernal,name=IgniteKernal,*"]

Preprocessing

JavaScript: The text is too long. Please see the template.

Item prototypes for Ignite kernal metrics

Name Description Type Key and additional info

Ignite [{#JMXIGNITEINSTANCENAME}]: Uptime

Uptime of Ignite instance.

JMX agent

jmx["{#JMXOBJ}",UpTime]

Preprocessing

Custom multiplier: 0.001

Ignite [{#JMXIGNITEINSTANCENAME}]: Version

Version of Ignite instance.

JMX agent

jmx["{#JMXOBJ}",FullVersion]

Preprocessing

Regular expression: (.*)-\d+ \1
Discard unchanged with heartbeat: 3h

Ignite [{#JMXIGNITEINSTANCENAME}]: Local node ID

Unique identifier for this node within grid.

JMX agent

jmx["{#JMXOBJ}",LocalNodeId]

Preprocessing

Discard unchanged with heartbeat: 3h

Trigger prototypes for Ignite kernal metrics

Name	Description	Expression	Severity
Ignite [{#JMXIGNITEINSTANCENAME}]: has been restarted	Uptime is less than 10 minutes.	`last(/Ignite by JMX/jmx["{#JMXOBJ}",UpTime])<10m`\|Info	Manual close: Yes
Ignite [{#JMXIGNITEINSTANCENAME}]: Failed to fetch info data	Zabbix has not received data for items for the last 10 minutes.	`nodata(/Ignite by JMX/jmx["{#JMXOBJ}",UpTime],10m)=1`\|Warning	Manual close: Yes
Ignite [{#JMXIGNITEINSTANCENAME}]: Version has changed	Ignite [{#JMXIGNITEINSTANCENAME}] version has changed. Acknowledge to close the problem manually.	`last(/Ignite by JMX/jmx["{#JMXOBJ}",FullVersion],#1)<>last(/Ignite by JMX/jmx["{#JMXOBJ}",FullVersion],#2) and length(last(/Ignite by JMX/jmx["{#JMXOBJ}",FullVersion]))>0`\|Info	Manual close: Yes

LLD rule Cluster metrics

Name Description Type Key and additional info

Cluster metrics

JMX agent

jmx.discovery[beans,"org.apache:group=Kernal,name=ClusterMetricsMXBeanImpl,*"]

Preprocessing

JavaScript: The text is too long. Please see the template.

Item prototypes for Cluster metrics

Name	Description	Type	Key and additional info
Ignite [{#JMXIGNITEINSTANCENAME}]: Nodes, Baseline	Total baseline nodes that are registered in the baseline topology.	JMX agent	jmx["{#JMXOBJ}",TotalBaselineNodes] Preprocessing Discard unchanged with heartbeat: `3h`
Ignite [{#JMXIGNITEINSTANCENAME}]: Nodes, Active baseline	The number of nodes that are currently active in the baseline topology.	JMX agent	jmx["{#JMXOBJ}",ActiveBaselineNodes] Preprocessing Discard unchanged with heartbeat: `3h`
Ignite [{#JMXIGNITEINSTANCENAME}]: Nodes, Client	The number of client nodes in the cluster.	JMX agent	jmx["{#JMXOBJ}",TotalClientNodes] Preprocessing Discard unchanged with heartbeat: `3h`
Ignite [{#JMXIGNITEINSTANCENAME}]: Nodes, total	Total number of nodes.	JMX agent	jmx["{#JMXOBJ}",TotalNodes] Preprocessing Discard unchanged with heartbeat: `3h`
Ignite [{#JMXIGNITEINSTANCENAME}]: Nodes, Server	The number of server nodes in the cluster.	JMX agent	jmx["{#JMXOBJ}",TotalServerNodes] Preprocessing Discard unchanged with heartbeat: `3h`

Trigger prototypes for Cluster metrics

Name	Description	Expression	Severity
Ignite [{#JMXIGNITEINSTANCENAME}]: Server node left the topology	One or more server node left the topology. Acknowledge to close the problem manually.	`change(/Ignite by JMX/jmx["{#JMXOBJ}",TotalServerNodes])<0`\|Warning	Manual close: Yes
Ignite [{#JMXIGNITEINSTANCENAME}]: Server node added to the topology	One or more server node added to the topology. Acknowledge to close the problem manually.	`change(/Ignite by JMX/jmx["{#JMXOBJ}",TotalServerNodes])>0`\|Info	Manual close: Yes
Ignite [{#JMXIGNITEINSTANCENAME}]: There are nodes is not in topology	One or more server node left the topology. Acknowledge to close the problem manually.	`last(/Ignite by JMX/jmx["{#JMXOBJ}",TotalServerNodes])>last(/Ignite by JMX/jmx["{#JMXOBJ}",TotalBaselineNodes])`\|Info	Manual close: Yes

LLD rule Local node metrics

Name Description Type Key and additional info

Local node metrics

JMX agent

jmx.discovery[beans,"org.apache:group=Kernal,name=ClusterLocalNodeMetricsMXBeanImpl,*"]

Preprocessing

JavaScript: The text is too long. Please see the template.

Item prototypes for Local node metrics

Name	Description	Type	Key and additional info
Ignite [{#JMXIGNITEINSTANCENAME}]: Jobs cancelled, current	Number of cancelled jobs that are still running.	JMX agent	jmx["{#JMXOBJ}",CurrentCancelledJobs]
Ignite [{#JMXIGNITEINSTANCENAME}]: Jobs rejected, current	Number of jobs rejected after more recent collision resolution operation.	JMX agent	jmx["{#JMXOBJ}",CurrentRejectedJobs]
Ignite [{#JMXIGNITEINSTANCENAME}]: Jobs waiting, current	Number of queued jobs currently waiting to be executed.	JMX agent	jmx["{#JMXOBJ}",CurrentWaitingJobs]
Ignite [{#JMXIGNITEINSTANCENAME}]: Jobs active, current	Number of currently active jobs concurrently executing on the node.	JMX agent	jmx["{#JMXOBJ}",CurrentActiveJobs]
Ignite [{#JMXIGNITEINSTANCENAME}]: Jobs executed, rate	Total number of jobs handled by the node per second.	JMX agent	jmx["{#JMXOBJ}",TotalExecutedJobs] Preprocessing Change per second
Ignite [{#JMXIGNITEINSTANCENAME}]: Jobs cancelled, rate	Total number of jobs cancelled by the node per second.	JMX agent	jmx["{#JMXOBJ}",TotalCancelledJobs] Preprocessing Change per second
Ignite [{#JMXIGNITEINSTANCENAME}]: Jobs rejects, rate	Total number of jobs this node rejects during collision resolution operations since node startup per second.	JMX agent	jmx["{#JMXOBJ}",TotalRejectedJobs] Preprocessing Change per second
Ignite [{#JMXIGNITEINSTANCENAME}]: PME duration, current	Current PME duration in milliseconds.	JMX agent	jmx["{#JMXOBJ}",CurrentPmeDuration]
Ignite [{#JMXIGNITEINSTANCENAME}]: Threads count, current	Current number of live threads.	JMX agent	jmx["{#JMXOBJ}",CurrentThreadCount]
Ignite [{#JMXIGNITEINSTANCENAME}]: Heap memory used	Current heap size that is used for object allocation.	JMX agent	jmx["{#JMXOBJ}",HeapMemoryUsed]

Trigger prototypes for Local node metrics

Name	Description	Expression	Severity
Ignite [{#JMXIGNITEINSTANCENAME}]: Number of queued jobs is too high	Number of queued jobs is over {$IGNITE.JOBS.QUEUE.MAX.WARN}.	`min(/Ignite by JMX/jmx["{#JMXOBJ}",CurrentWaitingJobs],15m) > {$IGNITE.JOBS.QUEUE.MAX.WARN}`\|Warning
Ignite [{#JMXIGNITEINSTANCENAME}]: PME duration is too long	PME duration is over {$IGNITE.PME.DURATION.MAX.WARN}ms.	`min(/Ignite by JMX/jmx["{#JMXOBJ}",CurrentPmeDuration],5m) > {$IGNITE.PME.DURATION.MAX.WARN}`\|Warning	Depends on: Ignite [{#JMXIGNITEINSTANCENAME}]: PME duration is too long
Ignite [{#JMXIGNITEINSTANCENAME}]: PME duration is too long	PME duration is over {$IGNITE.PME.DURATION.MAX.HIGH}ms. Looks like PME is hung.	`min(/Ignite by JMX/jmx["{#JMXOBJ}",CurrentPmeDuration],5m) > {$IGNITE.PME.DURATION.MAX.HIGH}`\|High
Ignite [{#JMXIGNITEINSTANCENAME}]: Number of running threads is too high	Number of running threads is over {$IGNITE.THREADS.COUNT.MAX.WARN}.	`min(/Ignite by JMX/jmx["{#JMXOBJ}",CurrentThreadCount],15m) > {$IGNITE.THREADS.COUNT.MAX.WARN}`\|Warning	Depends on: Ignite [{#JMXIGNITEINSTANCENAME}]: PME duration is too long

LLD rule TCP discovery SPI

Name Description Type Key and additional info

TCP discovery SPI

JMX agent

jmx.discovery[beans,"org.apache:group=SPIs,name=TcpDiscoverySpi,*"]

Preprocessing

JavaScript: The text is too long. Please see the template.

Item prototypes for TCP discovery SPI

Name	Description	Type	Key and additional info
Ignite [{#JMXIGNITEINSTANCENAME}]: Coordinator	Current coordinator UUID.	JMX agent	jmx["{#JMXOBJ}",Coordinator] Preprocessing Discard unchanged with heartbeat: `3h`
Ignite [{#JMXIGNITEINSTANCENAME}]: Nodes left	Nodes left count.	JMX agent	jmx["{#JMXOBJ}",NodesLeft]
Ignite [{#JMXIGNITEINSTANCENAME}]: Nodes joined	Nodes join count.	JMX agent	jmx["{#JMXOBJ}",NodesJoined]
Ignite [{#JMXIGNITEINSTANCENAME}]: Nodes failed	Nodes failed count.	JMX agent	jmx["{#JMXOBJ}",NodesFailed]
Ignite [{#JMXIGNITEINSTANCENAME}]: Discovery message worker queue	Message worker queue current size.	JMX agent	jmx["{#JMXOBJ}",MessageWorkerQueueSize]
Ignite [{#JMXIGNITEINSTANCENAME}]: Discovery reconnect, rate	Number of times node tries to (re)establish connection to another node per second.	JMX agent	jmx["{#JMXOBJ}",ReconnectCount] Preprocessing Change per second
Ignite [{#JMXIGNITEINSTANCENAME}]: TotalProcessedMessages	The number of messages received per second.	JMX agent	jmx["{#JMXOBJ}",TotalProcessedMessages] Preprocessing Change per second
Ignite [{#JMXIGNITEINSTANCENAME}]: Discovery messages received, rate	The number of messages processed per second.	JMX agent	jmx["{#JMXOBJ}",TotalReceivedMessages] Preprocessing Change per second

Trigger prototypes for TCP discovery SPI

Name	Description	Expression	Severity	Dependencies and additional info
Ignite [{#JMXIGNITEINSTANCENAME}]: Coordinator has changed	Ignite [{#JMXIGNITEINSTANCENAME}] version has changed. Acknowledge to close the problem manually.	`last(/Ignite by JMX/jmx["{#JMXOBJ}",Coordinator],#1)<>last(/Ignite by JMX/jmx["{#JMXOBJ}",Coordinator],#2) and length(last(/Ignite by JMX/jmx["{#JMXOBJ}",Coordinator]))>0`\|Warning	Manual close: Yes

LLD rule TCP Communication SPI metrics

Name Description Type Key and additional info

TCP Communication SPI metrics

JMX agent

jmx.discovery[beans,"org.apache:group=SPIs,name=TcpCommunicationSpi,*"]

Preprocessing

JavaScript: The text is too long. Please see the template.

Item prototypes for TCP Communication SPI metrics

Name

Description

Type

Key and additional info

Ignite [{#JMXIGNITEINSTANCENAME}]: Communication outbound messages queue

Outbound messages queue size.

JMX agent

jmx["{#JMXOBJ}",OutboundMessagesQueueSize]

Ignite [{#JMXIGNITEINSTANCENAME}]: Communication messages received, rate

The number of messages received per second.

JMX agent

jmx["{#JMXOBJ}",ReceivedMessagesCount]

Preprocessing

Change per second

Ignite [{#JMXIGNITEINSTANCENAME}]: Communication messages sent, rate

The number of messages sent per second.

JMX agent

jmx["{#JMXOBJ}",SentMessagesCount]

Preprocessing

Change per second

LLD rule Transaction metrics

Name Description Type Key and additional info

Transaction metrics

JMX agent

jmx.discovery[beans,"org.apache:group=TransactionMetrics,name=TransactionMetricsMxBeanImpl,*"]

Preprocessing

JavaScript: The text is too long. Please see the template.

Item prototypes for Transaction metrics

Name	Description	Type	Key and additional info
Ignite [{#JMXIGNITEINSTANCENAME}]: Locked keys	The number of keys locked on the node.	JMX agent	jmx["{#JMXOBJ}",LockedKeysNumber]
Ignite [{#JMXIGNITEINSTANCENAME}]: Transactions owner, current	The number of active transactions for which this node is the initiator.	JMX agent	jmx["{#JMXOBJ}",OwnerTransactionsNumber]
Ignite [{#JMXIGNITEINSTANCENAME}]: Transactions holding lock, current	The number of active transactions holding at least one key lock.	JMX agent	jmx["{#JMXOBJ}",TransactionsHoldingLockNumber]
Ignite [{#JMXIGNITEINSTANCENAME}]: Transactions rolledback, rate	The number of transactions which were rollback per second.	JMX agent	jmx["{#JMXOBJ}",TransactionsRolledBackNumber]
Ignite [{#JMXIGNITEINSTANCENAME}]: Transactions committed, rate	The number of transactions which were committed per second.	JMX agent	jmx["{#JMXOBJ}",TransactionsCommittedNumber]

LLD rule Cache metrics

Name Description Type Key and additional info

Cache metrics

JMX agent

jmx.discovery[beans,"org.apache:name=\"org.apache.ignite.internal.processors.cache.CacheLocalMetricsMXBeanImpl\",*"]

Preprocessing

JavaScript: The text is too long. Please see the template.
Discard unchanged with heartbeat: 3h

Item prototypes for Cache metrics

Name	Description	Type	Key and additional info
Cache group [{#JMXGROUP}]: Cache gets, rate	The number of gets to the cache per second.	JMX agent	jmx["{#JMXOBJ}",CacheGets] Preprocessing Change per second
Cache group [{#JMXGROUP}]: Cache puts, rate	The number of puts to the cache per second.	JMX agent	jmx["{#JMXOBJ}",CachePuts] Preprocessing Change per second
Cache group [{#JMXGROUP}]: Cache removals, rate	The number of removals from the cache per second.	JMX agent	jmx["{#JMXOBJ}",CacheRemovals] Preprocessing Change per second
Cache group [{#JMXGROUP}]: Cache hits, pct	Percentage of successful hits.	JMX agent	jmx["{#JMXOBJ}",CacheHitPercentage]
Cache group [{#JMXGROUP}]: Cache misses, pct	Percentage of accesses that failed to find anything.	JMX agent	jmx["{#JMXOBJ}",CacheMissPercentage]
Cache group [{#JMXGROUP}]: Cache transaction commits, rate	The number of transaction commits per second.	JMX agent	jmx["{#JMXOBJ}",CacheTxCommits] Preprocessing Change per second
Cache group [{#JMXGROUP}]: Cache transaction rollbacks, rate	The number of transaction rollback per second.	JMX agent	jmx["{#JMXOBJ}",CacheTxRollbacks] Preprocessing Change per second
Cache group [{#JMXGROUP}]: Cache size	The number of non-null values in the cache as a long value.	JMX agent	jmx["{#JMXOBJ}",CacheSize]
Cache group [{#JMXGROUP}]: Cache heap entries	The number of entries in heap memory.	JMX agent	jmx["{#JMXOBJ}",HeapEntriesCount] Preprocessing Change per second

Trigger prototypes for Cache metrics

Name	Description	Expression	Severity
Cache group [{#JMXGROUP}]: There are no success transactions for cache for 5m		`min(/Ignite by JMX/jmx["{#JMXOBJ}",CacheTxRollbacks],5m)>0 and max(/Ignite by JMX/jmx["{#JMXOBJ}",CacheTxCommits],5m)=0`\|Average
Cache group [{#JMXGROUP}]: Success transactions less than rollbacks for 5m		`min(/Ignite by JMX/jmx["{#JMXOBJ}",CacheTxRollbacks],5m) > max(/Ignite by JMX/jmx["{#JMXOBJ}",CacheTxCommits],5m)`\|Warning	Depends on: Cache group [{#JMXGROUP}]: There are no success transactions for cache for 5m
Cache group [{#JMXGROUP}]: All entries are in heap	All entries are in heap. Possibly you use eager queries it may cause out of memory exceptions for big caches. Acknowledge to close the problem manually.	`last(/Ignite by JMX/jmx["{#JMXOBJ}",CacheSize])=last(/Ignite by JMX/jmx["{#JMXOBJ}",HeapEntriesCount])`\|Info	Manual close: Yes

LLD rule Data region metrics

Name Description Type Key and additional info

Data region metrics

JMX agent

jmx.discovery[beans,"org.apache:group=DataRegionMetrics,*"]

Preprocessing

JavaScript: The text is too long. Please see the template.
Discard unchanged with heartbeat: 3h

Item prototypes for Data region metrics

Name	Description	Type	Key and additional info
Data region {#JMXNAME}: Allocation, rate	Allocation rate (pages per second) averaged across rateTimeInternal.	JMX agent	jmx["{#JMXOBJ}",AllocationRate]
Data region {#JMXNAME}: Allocated, bytes	Total size of memory allocated in bytes.	JMX agent	jmx["{#JMXOBJ}",TotalAllocatedSize]
Data region {#JMXNAME}: Dirty pages	Number of pages in memory not yet synchronized with persistent storage.	JMX agent	jmx["{#JMXOBJ}",DirtyPages]
Data region {#JMXNAME}: Eviction, rate	Eviction rate (pages per second).	JMX agent	jmx["{#JMXOBJ}",EvictionRate]
Data region {#JMXNAME}: Size, max	Maximum memory region size defined by its data region.	JMX agent	jmx["{#JMXOBJ}",MaxSize]
Data region {#JMXNAME}: Offheap size	Offheap size in bytes.	JMX agent	jmx["{#JMXOBJ}",OffHeapSize]
Data region {#JMXNAME}: Offheap used size	Total used offheap size in bytes.	JMX agent	jmx["{#JMXOBJ}",OffheapUsedSize]
Data region {#JMXNAME}: Pages fill factor	The percentage of the used space.	JMX agent	jmx["{#JMXOBJ}",PagesFillFactor]
Data region {#JMXNAME}: Pages replace, rate	Rate at which pages in memory are replaced with pages from persistent storage (pages per second).	JMX agent	jmx["{#JMXOBJ}",PagesReplaceRate]
Data region {#JMXNAME}: Used checkpoint buffer size	Used checkpoint buffer size in bytes.	JMX agent	jmx["{#JMXOBJ}",UsedCheckpointBufferSize]
Data region {#JMXNAME}: Checkpoint buffer size	Total size in bytes for checkpoint buffer.	JMX agent	jmx["{#JMXOBJ}",CheckpointBufferSize]

Trigger prototypes for Data region metrics

Name	Description	Expression	Severity
Data region {#JMXNAME}: Node started to evict pages	You store more data than region can accommodate. Data started to move to disk it can make requests work slower. Acknowledge to close the problem manually.	`min(/Ignite by JMX/jmx["{#JMXOBJ}",EvictionRate],5m)>0`\|Info	Manual close: Yes
Data region {#JMXNAME}: Data region utilization is too high	Data region utilization is high. Increase data region size or delete any data.	`min(/Ignite by JMX/jmx["{#JMXOBJ}",OffheapUsedSize],5m)/last(/Ignite by JMX/jmx["{#JMXOBJ}",OffHeapSize])*100>{$IGNITE.DATA.REGION.PUSED.MAX.WARN}`\|Warning	Depends on: Data region {#JMXNAME}: Data region utilization is too high
Data region {#JMXNAME}: Data region utilization is too high	Data region utilization is high. Increase data region size or delete any data.	`min(/Ignite by JMX/jmx["{#JMXOBJ}",OffheapUsedSize],5m)/last(/Ignite by JMX/jmx["{#JMXOBJ}",OffHeapSize])*100>{$IGNITE.DATA.REGION.PUSED.MAX.HIGH}`\|High
Data region {#JMXNAME}: Pages replace rate more than 0	There is more data than DataRegionMaxSize. Cluster started to replace pages in memory. Page replacement can slow down operations.	`min(/Ignite by JMX/jmx["{#JMXOBJ}",PagesReplaceRate],5m)>0`\|Warning
Data region {#JMXNAME}: Checkpoint buffer utilization is too high	Checkpoint buffer utilization is high. Threads will be throttled to avoid buffer overflow. It can be caused by high disk utilization.	`min(/Ignite by JMX/jmx["{#JMXOBJ}",UsedCheckpointBufferSize],5m)/last(/Ignite by JMX/jmx["{#JMXOBJ}",CheckpointBufferSize])*100>{$IGNITE.CHECKPOINT.PUSED.MAX.WARN}`\|Warning	Depends on: Data region {#JMXNAME}: Checkpoint buffer utilization is too high
Data region {#JMXNAME}: Checkpoint buffer utilization is too high	Checkpoint buffer utilization is high. Threads will be throttled to avoid buffer overflow. It can be caused by high disk utilization.	`min(/Ignite by JMX/jmx["{#JMXOBJ}",UsedCheckpointBufferSize],5m)/last(/Ignite by JMX/jmx["{#JMXOBJ}",CheckpointBufferSize])*100>{$IGNITE.CHECKPOINT.PUSED.MAX.HIGH}`\|High

LLD rule Cache groups

Name Description Type Key and additional info

Cache groups

JMX agent

jmx.discovery[beans,"org.apache:group=\"Cache groups\",*"]

Preprocessing

JavaScript: The text is too long. Please see the template.
Discard unchanged with heartbeat: 3h

Item prototypes for Cache groups

Name	Description	Type	Key and additional info
Cache group [{#JMXNAME}]: Backups	Count of backups configured for cache group.	JMX agent	jmx["{#JMXOBJ}",Backups]
Cache group [{#JMXNAME}]: Partitions	Count of partitions for cache group.	JMX agent	jmx["{#JMXOBJ}",Partitions]
Cache group [{#JMXNAME}]: Caches	List of caches.	JMX agent	jmx["{#JMXOBJ}",Caches] Preprocessing Discard unchanged with heartbeat: `3h`
Cache group [{#JMXNAME}]: Local node partitions, moving	Count of partitions with state MOVING for this cache group located on this node.	JMX agent	jmx["{#JMXOBJ}",LocalNodeMovingPartitionsCount]
Cache group [{#JMXNAME}]: Local node partitions, renting	Count of partitions with state RENTING for this cache group located on this node.	JMX agent	jmx["{#JMXOBJ}",LocalNodeRentingPartitionsCount]
Cache group [{#JMXNAME}]: Local node entries, renting	Count of entries remains to evict in RENTING partitions located on this node for this cache group.	JMX agent	jmx["{#JMXOBJ}",LocalNodeRentingEntriesCount]
Cache group [{#JMXNAME}]: Local node partitions, owning	Count of partitions with state OWNING for this cache group located on this node.	JMX agent	jmx["{#JMXOBJ}",LocalNodeOwningPartitionsCount]
Cache group [{#JMXNAME}]: Partition copies, min	Minimum number of partition copies for all partitions of this cache group.	JMX agent	jmx["{#JMXOBJ}",MinimumNumberOfPartitionCopies]
Cache group [{#JMXNAME}]: Partition copies, max	Maximum number of partition copies for all partitions of this cache group.	JMX agent	jmx["{#JMXOBJ}",MaximumNumberOfPartitionCopies]

Trigger prototypes for Cache groups

Name	Description	Expression	Severity
Cache group [{#JMXNAME}]: One or more backups are unavailable		`min(/Ignite by JMX/jmx["{#JMXOBJ}",Backups],5m)>=max(/Ignite by JMX/jmx["{#JMXOBJ}",MinimumNumberOfPartitionCopies],5m)`\|Warning
Cache group [{#JMXNAME}]: List of caches has changed	List of caches has changed. Significant changes have occurred in the cluster. Acknowledge to close the problem manually.	`last(/Ignite by JMX/jmx["{#JMXOBJ}",Caches],#1)<>last(/Ignite by JMX/jmx["{#JMXOBJ}",Caches],#2) and length(last(/Ignite by JMX/jmx["{#JMXOBJ}",Caches]))>0`\|Info	Manual close: Yes
Cache group [{#JMXNAME}]: Rebalance in progress	Acknowledge to close the problem manually.	`max(/Ignite by JMX/jmx["{#JMXOBJ}",LocalNodeMovingPartitionsCount],30m)>0`\|Info	Manual close: Yes
Cache group [{#JMXNAME}]: There is no copy for partitions		`max(/Ignite by JMX/jmx["{#JMXOBJ}",MinimumNumberOfPartitionCopies],30m)=0`\|Warning

LLD rule Thread pool metrics

Name Description Type Key and additional info

Thread pool metrics

JMX agent

jmx.discovery[beans,"org.apache:group=\"Thread Pools\",*"]

Preprocessing

JavaScript: The text is too long. Please see the template.
Discard unchanged with heartbeat: 3h

Item prototypes for Thread pool metrics

Name	Description	Type	Key and additional info
Thread pool [{#JMXNAME}]: Queue size	Current size of the execution queue.	JMX agent	jmx["{#JMXOBJ}",QueueSize]
Thread pool [{#JMXNAME}]: Pool size	Current number of threads in the pool.	JMX agent	jmx["{#JMXOBJ}",PoolSize]
Thread pool [{#JMXNAME}]: Pool size, max	The maximum allowed number of threads.	JMX agent	jmx["{#JMXOBJ}",MaximumPoolSize]
Thread pool [{#JMXNAME}]: Pool size, core	The core number of threads.	JMX agent	jmx["{#JMXOBJ}",CorePoolSize]

Trigger prototypes for Thread pool metrics

Name	Description	Expression	Severity	Dependencies and additional info
Thread pool [{#JMXNAME}]: Too many messages in queue	Number of messages in queue more than {$IGNITE.THREAD.QUEUE.MAX.WARN:"{#JMXNAME}"}.	`min(/Ignite by JMX/jmx["{#JMXOBJ}",QueueSize],5m) > {$IGNITE.THREAD.QUEUE.MAX.WARN:"{#JMXNAME}"}`\|Average

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template, or ask for help at ZABBIX forums

db_gridgain_jmx

View README Download JSON

GridGain by JMX

Overview

Official JMX Template for GridGain In-Memory Computing Platform. This template is based on the original template developed by Igor Akkuratov, Senior Engineer at GridGain Systems and GridGain In-Memory Computing Platform Contributor.

Requirements

Zabbix version: 6.0 and higher.

Tested versions

This template has been tested on:

GridGain 8.8.5

Configuration

Setup

This template works with standalone and cluster instances. Metrics are collected by JMX. All metrics are discoverable.

Enable and configure JMX access to GridGain In-Memory Computing Platform. See documentation for instructions. Current JMX tree hierarchy contains classloader by default. Add the following jvm option -DIGNITE_MBEAN_APPEND_CLASS_LOADER_ID=falseto will exclude one level with Classloader name. You can configure Cache and Data Region metrics which you want using official guide.
Set the user name and password in host macros {$GRIDGAIN.USER} and {$GRIDGAIN.PASSWORD}.

Macros used

Name	Description	Default
{$GRIDGAIN.PASSWORD}		`<secret>`
{$GRIDGAIN.USER}		`zabbix`
{$GRIDGAIN.LLD.FILTER.THREAD.POOL.MATCHES}	Filter of discoverable thread pools.	`.*`
{$GRIDGAIN.LLD.FILTER.THREAD.POOL.NOT_MATCHES}	Filter to exclude discovered thread pools.	`Macro too long. Please see the template.`
{$GRIDGAIN.LLD.FILTER.DATA.REGION.MATCHES}	Filter of discoverable data regions.	`.*`
{$GRIDGAIN.LLD.FILTER.DATA.REGION.NOT_MATCHES}	Filter to exclude discovered data regions.	`^(sysMemPlc\|TxLog)$`
{$GRIDGAIN.LLD.FILTER.CACHE.MATCHES}	Filter of discoverable cache groups.	`.*`
{$GRIDGAIN.LLD.FILTER.CACHE.NOT_MATCHES}	Filter to exclude discovered cache groups.	`CHANGE_IF_NEEDED`
{$GRIDGAIN.THREAD.QUEUE.MAX.WARN}	Threshold for thread pool queue size. Can be used with thread pool name as context.	`1000`
{$GRIDGAIN.PME.DURATION.MAX.WARN}	The maximum PME duration in ms for warning trigger expression.	`10000`
{$GRIDGAIN.PME.DURATION.MAX.HIGH}	The maximum PME duration in ms for high trigger expression.	`60000`
{$GRIDGAIN.THREADS.COUNT.MAX.WARN}	The maximum number of running threads for trigger expression.	`1000`
{$GRIDGAIN.JOBS.QUEUE.MAX.WARN}	The maximum number of queued jobs for trigger expression.	`10`
{$GRIDGAIN.CHECKPOINT.PUSED.MAX.HIGH}	The maximum percent of checkpoint buffer utilization for high trigger expression.	`80`
{$GRIDGAIN.CHECKPOINT.PUSED.MAX.WARN}	The maximum percent of checkpoint buffer utilization for warning trigger expression.	`66`
{$GRIDGAIN.DATA.REGION.PUSED.MAX.HIGH}	The maximum percent of data region utilization for high trigger expression.	`90`
{$GRIDGAIN.DATA.REGION.PUSED.MAX.WARN}	The maximum percent of data region utilization for warning trigger expression.	`80`

LLD rule GridGain kernal metrics

Name Description Type Key and additional info

GridGain kernal metrics

JMX agent

jmx.discovery[beans,"org.apache:group=Kernal,name=IgniteKernal,*"]

Preprocessing

JavaScript: The text is too long. Please see the template.

Item prototypes for GridGain kernal metrics

Name Description Type Key and additional info

GridGain [{#JMXIGNITEINSTANCENAME}]: Uptime

Uptime of GridGain instance.

JMX agent

jmx["{#JMXOBJ}",UpTime]

Preprocessing

Custom multiplier: 0.001

GridGain [{#JMXIGNITEINSTANCENAME}]: Version

Version of GridGain instance.

JMX agent

jmx["{#JMXOBJ}",FullVersion]

Preprocessing

Regular expression: (.*)-\d+ \1
Discard unchanged with heartbeat: 3h

GridGain [{#JMXIGNITEINSTANCENAME}]: Local node ID

Unique identifier for this node within grid.

JMX agent

jmx["{#JMXOBJ}",LocalNodeId]

Preprocessing

Discard unchanged with heartbeat: 3h

Trigger prototypes for GridGain kernal metrics

Name	Description	Expression	Severity
GridGain [{#JMXIGNITEINSTANCENAME}]: has been restarted	Uptime is less than 10 minutes.	`last(/GridGain by JMX/jmx["{#JMXOBJ}",UpTime])<10m`\|Info	Manual close: Yes
GridGain [{#JMXIGNITEINSTANCENAME}]: Failed to fetch info data	Zabbix has not received data for items for the last 10 minutes.	`nodata(/GridGain by JMX/jmx["{#JMXOBJ}",UpTime],10m)=1`\|Warning	Manual close: Yes
GridGain [{#JMXIGNITEINSTANCENAME}]: Version has changed	The GridGain [{#JMXIGNITEINSTANCENAME}] version has changed. Acknowledge to close the problem manually.	`last(/GridGain by JMX/jmx["{#JMXOBJ}",FullVersion],#1)<>last(/GridGain by JMX/jmx["{#JMXOBJ}",FullVersion],#2) and length(last(/GridGain by JMX/jmx["{#JMXOBJ}",FullVersion]))>0`\|Info	Manual close: Yes

LLD rule Cluster metrics

Name Description Type Key and additional info

Cluster metrics

JMX agent

jmx.discovery[beans,"org.apache:group=Kernal,name=ClusterMetricsMXBeanImpl,*"]

Preprocessing

JavaScript: The text is too long. Please see the template.

Item prototypes for Cluster metrics

Name	Description	Type	Key and additional info
GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes, Baseline	Total baseline nodes that are registered in the baseline topology.	JMX agent	jmx["{#JMXOBJ}",TotalBaselineNodes] Preprocessing Discard unchanged with heartbeat: `3h`
GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes, Active baseline	The number of nodes that are currently active in the baseline topology.	JMX agent	jmx["{#JMXOBJ}",ActiveBaselineNodes] Preprocessing Discard unchanged with heartbeat: `3h`
GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes, Client	The number of client nodes in the cluster.	JMX agent	jmx["{#JMXOBJ}",TotalClientNodes] Preprocessing Discard unchanged with heartbeat: `3h`
GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes, total	Total number of nodes.	JMX agent	jmx["{#JMXOBJ}",TotalNodes] Preprocessing Discard unchanged with heartbeat: `3h`
GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes, Server	The number of server nodes in the cluster.	JMX agent	jmx["{#JMXOBJ}",TotalServerNodes] Preprocessing Discard unchanged with heartbeat: `3h`

Trigger prototypes for Cluster metrics

Name	Description	Expression	Severity
GridGain [{#JMXIGNITEINSTANCENAME}]: Server node left the topology	One or more server node left the topology. Acknowledge to close the problem manually.	`change(/GridGain by JMX/jmx["{#JMXOBJ}",TotalServerNodes])<0`\|Warning	Manual close: Yes
GridGain [{#JMXIGNITEINSTANCENAME}]: Server node added to the topology	One or more server node added to the topology. Acknowledge to close the problem manually.	`change(/GridGain by JMX/jmx["{#JMXOBJ}",TotalServerNodes])>0`\|Info	Manual close: Yes
GridGain [{#JMXIGNITEINSTANCENAME}]: There are nodes is not in topology	One or more server node left the topology. Acknowledge to close the problem manually.	`last(/GridGain by JMX/jmx["{#JMXOBJ}",TotalServerNodes])>last(/GridGain by JMX/jmx["{#JMXOBJ}",TotalBaselineNodes])`\|Info	Manual close: Yes

LLD rule Local node metrics

Name Description Type Key and additional info

Local node metrics

JMX agent

jmx.discovery[beans,"org.apache:group=Kernal,name=ClusterLocalNodeMetricsMXBeanImpl,*"]

Preprocessing

JavaScript: The text is too long. Please see the template.

Item prototypes for Local node metrics

Name	Description	Type	Key and additional info
GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs cancelled, current	Number of cancelled jobs that are still running.	JMX agent	jmx["{#JMXOBJ}",CurrentCancelledJobs]
GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs rejected, current	Number of jobs rejected after more recent collision resolution operation.	JMX agent	jmx["{#JMXOBJ}",CurrentRejectedJobs]
GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs waiting, current	Number of queued jobs currently waiting to be executed.	JMX agent	jmx["{#JMXOBJ}",CurrentWaitingJobs]
GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs active, current	Number of currently active jobs concurrently executing on the node.	JMX agent	jmx["{#JMXOBJ}",CurrentActiveJobs]
GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs executed, rate	Total number of jobs handled by the node per second.	JMX agent	jmx["{#JMXOBJ}",TotalExecutedJobs] Preprocessing Change per second
GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs cancelled, rate	Total number of jobs cancelled by the node per second.	JMX agent	jmx["{#JMXOBJ}",TotalCancelledJobs] Preprocessing Change per second
GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs rejects, rate	Total number of jobs this node rejects during collision resolution operations since node startup per second.	JMX agent	jmx["{#JMXOBJ}",TotalRejectedJobs] Preprocessing Change per second
GridGain [{#JMXIGNITEINSTANCENAME}]: PME duration, current	Current PME duration in milliseconds.	JMX agent	jmx["{#JMXOBJ}",CurrentPmeDuration]
GridGain [{#JMXIGNITEINSTANCENAME}]: Threads count, current	Current number of live threads.	JMX agent	jmx["{#JMXOBJ}",CurrentThreadCount]
GridGain [{#JMXIGNITEINSTANCENAME}]: Heap memory used	Current heap size that is used for object allocation.	JMX agent	jmx["{#JMXOBJ}",HeapMemoryUsed]

Trigger prototypes for Local node metrics

Name	Description	Expression	Severity
GridGain [{#JMXIGNITEINSTANCENAME}]: Number of queued jobs is too high	Number of queued jobs is over {$GRIDGAIN.JOBS.QUEUE.MAX.WARN}.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",CurrentWaitingJobs],15m) > {$GRIDGAIN.JOBS.QUEUE.MAX.WARN}`\|Warning
GridGain [{#JMXIGNITEINSTANCENAME}]: PME duration is too long	PME duration is over {$GRIDGAIN.PME.DURATION.MAX.WARN}ms.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",CurrentPmeDuration],5m) > {$GRIDGAIN.PME.DURATION.MAX.WARN}`\|Warning	Depends on: GridGain [{#JMXIGNITEINSTANCENAME}]: PME duration is too long
GridGain [{#JMXIGNITEINSTANCENAME}]: PME duration is too long	PME duration is over {$GRIDGAIN.PME.DURATION.MAX.HIGH}ms. Looks like PME is hung.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",CurrentPmeDuration],5m) > {$GRIDGAIN.PME.DURATION.MAX.HIGH}`\|High
GridGain [{#JMXIGNITEINSTANCENAME}]: Number of running threads is too high	Number of running threads is over {$GRIDGAIN.THREADS.COUNT.MAX.WARN}.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",CurrentThreadCount],15m) > {$GRIDGAIN.THREADS.COUNT.MAX.WARN}`\|Warning	Depends on: GridGain [{#JMXIGNITEINSTANCENAME}]: PME duration is too long

LLD rule TCP discovery SPI

Name Description Type Key and additional info

TCP discovery SPI

JMX agent

jmx.discovery[beans,"org.apache:group=SPIs,name=TcpDiscoverySpi,*"]

Preprocessing

JavaScript: The text is too long. Please see the template.

Item prototypes for TCP discovery SPI

Name	Description	Type	Key and additional info
GridGain [{#JMXIGNITEINSTANCENAME}]: Coordinator	Current coordinator UUID.	JMX agent	jmx["{#JMXOBJ}",Coordinator] Preprocessing Discard unchanged with heartbeat: `3h`
GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes left	Nodes left count.	JMX agent	jmx["{#JMXOBJ}",NodesLeft]
GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes joined	Nodes join count.	JMX agent	jmx["{#JMXOBJ}",NodesJoined]
GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes failed	Nodes failed count.	JMX agent	jmx["{#JMXOBJ}",NodesFailed]
GridGain [{#JMXIGNITEINSTANCENAME}]: Discovery message worker queue	Message worker queue current size.	JMX agent	jmx["{#JMXOBJ}",MessageWorkerQueueSize]
GridGain [{#JMXIGNITEINSTANCENAME}]: Discovery reconnect, rate	Number of times node tries to (re)establish connection to another node per second.	JMX agent	jmx["{#JMXOBJ}",ReconnectCount] Preprocessing Change per second
GridGain [{#JMXIGNITEINSTANCENAME}]: TotalProcessedMessages	The number of messages received per second.	JMX agent	jmx["{#JMXOBJ}",TotalProcessedMessages] Preprocessing Change per second
GridGain [{#JMXIGNITEINSTANCENAME}]: Discovery messages received, rate	The number of messages processed per second.	JMX agent	jmx["{#JMXOBJ}",TotalReceivedMessages] Preprocessing Change per second

Trigger prototypes for TCP discovery SPI

Name	Description	Expression	Severity	Dependencies and additional info
GridGain [{#JMXIGNITEINSTANCENAME}]: Coordinator has changed	The GridGain [{#JMXIGNITEINSTANCENAME}] version has changed. Acknowledge to close the problem manually.	`last(/GridGain by JMX/jmx["{#JMXOBJ}",Coordinator],#1)<>last(/GridGain by JMX/jmx["{#JMXOBJ}",Coordinator],#2) and length(last(/GridGain by JMX/jmx["{#JMXOBJ}",Coordinator]))>0`\|Warning	Manual close: Yes

LLD rule TCP Communication SPI metrics

Name Description Type Key and additional info

TCP Communication SPI metrics

JMX agent

jmx.discovery[beans,"org.apache:group=SPIs,name=TcpCommunicationSpi,*"]

Preprocessing

JavaScript: The text is too long. Please see the template.

Item prototypes for TCP Communication SPI metrics

Name	Description	Type	Key and additional info
GridGain [{#JMXIGNITEINSTANCENAME}]: Communication outbound messages queue	Outbound messages queue size.	JMX agent	jmx["{#JMXOBJ}",OutboundMessagesQueueSize]
GridGain [{#JMXIGNITEINSTANCENAME}]: Communication messages received, rate	The number of messages received per second.	JMX agent	jmx["{#JMXOBJ}",ReceivedMessagesCount] Preprocessing Change per second
GridGain [{#JMXIGNITEINSTANCENAME}]: Communication messages sent, rate	The number of messages sent per second.	JMX agent	jmx["{#JMXOBJ}",SentMessagesCount] Preprocessing Change per second
GridGain [{#JMXIGNITEINSTANCENAME}]: Communication reconnect rate	Gets maximum number of reconnect attempts used when establishing connection with remote nodes per second.	JMX agent	jmx["{#JMXOBJ}",ReconnectCount,maxNumbers] Preprocessing Change per second

LLD rule Transaction metrics

Name Description Type Key and additional info

Transaction metrics

JMX agent

jmx.discovery[beans,"org.apache:group=TransactionMetrics,name=TransactionMetricsMxBeanImpl,*"]

Preprocessing

JavaScript: The text is too long. Please see the template.

Item prototypes for Transaction metrics

Name	Description	Type	Key and additional info
GridGain [{#JMXIGNITEINSTANCENAME}]: Locked keys	The number of keys locked on the node.	JMX agent	jmx["{#JMXOBJ}",LockedKeysNumber]
GridGain [{#JMXIGNITEINSTANCENAME}]: Transactions owner, current	The number of active transactions for which this node is the initiator.	JMX agent	jmx["{#JMXOBJ}",OwnerTransactionsNumber]
GridGain [{#JMXIGNITEINSTANCENAME}]: Transactions holding lock, current	The number of active transactions holding at least one key lock.	JMX agent	jmx["{#JMXOBJ}",TransactionsHoldingLockNumber]
GridGain [{#JMXIGNITEINSTANCENAME}]: Transactions rolledback, rate	The number of transactions which were rollback per second.	JMX agent	jmx["{#JMXOBJ}",TransactionsRolledBackNumber]
GridGain [{#JMXIGNITEINSTANCENAME}]: Transactions committed, rate	The number of transactions which were committed per second.	JMX agent	jmx["{#JMXOBJ}",TransactionsCommittedNumber]

LLD rule Cache metrics

Name Description Type Key and additional info

Cache metrics

JMX agent

jmx.discovery[beans,"org.apache:name=\"org.apache.gridgain.internal.processors.cache.CacheLocalMetricsMXBeanImpl\",*"]

Preprocessing

JavaScript: The text is too long. Please see the template.
Discard unchanged with heartbeat: 3h

Item prototypes for Cache metrics

Name	Description	Type	Key and additional info
Cache group [{#JMXGROUP}]: Cache gets, rate	The number of gets to the cache per second.	JMX agent	jmx["{#JMXOBJ}",CacheGets] Preprocessing Change per second
Cache group [{#JMXGROUP}]: Cache puts, rate	The number of puts to the cache per second.	JMX agent	jmx["{#JMXOBJ}",CachePuts] Preprocessing Change per second
Cache group [{#JMXGROUP}]: Cache removals, rate	The number of removals from the cache per second.	JMX agent	jmx["{#JMXOBJ}",CacheRemovals] Preprocessing Change per second
Cache group [{#JMXGROUP}]: Cache hits, pct	Percentage of successful hits.	JMX agent	jmx["{#JMXOBJ}",CacheHitPercentage]
Cache group [{#JMXGROUP}]: Cache misses, pct	Percentage of accesses that failed to find anything.	JMX agent	jmx["{#JMXOBJ}",CacheMissPercentage]
Cache group [{#JMXGROUP}]: Cache transaction commits, rate	The number of transaction commits per second.	JMX agent	jmx["{#JMXOBJ}",CacheTxCommits] Preprocessing Change per second
Cache group [{#JMXGROUP}]: Cache transaction rollbacks, rate	The number of transaction rollback per second.	JMX agent	jmx["{#JMXOBJ}",CacheTxRollbacks] Preprocessing Change per second
Cache group [{#JMXGROUP}]: Cache size	The number of non-null values in the cache as a long value.	JMX agent	jmx["{#JMXOBJ}",CacheSize]
Cache group [{#JMXGROUP}]: Cache heap entries	The number of entries in heap memory.	JMX agent	jmx["{#JMXOBJ}",HeapEntriesCount] Preprocessing Change per second

Trigger prototypes for Cache metrics

Name	Description	Expression	Severity
Cache group [{#JMXGROUP}]: There are no success transactions for cache for 5m		`min(/GridGain by JMX/jmx["{#JMXOBJ}",CacheTxRollbacks],5m)>0 and max(/GridGain by JMX/jmx["{#JMXOBJ}",CacheTxCommits],5m)=0`\|Average
Cache group [{#JMXGROUP}]: Success transactions less than rollbacks for 5m		`min(/GridGain by JMX/jmx["{#JMXOBJ}",CacheTxRollbacks],5m) > max(/GridGain by JMX/jmx["{#JMXOBJ}",CacheTxCommits],5m)`\|Warning	Depends on: Cache group [{#JMXGROUP}]: There are no success transactions for cache for 5m
Cache group [{#JMXGROUP}]: All entries are in heap	All entries are in heap. Possibly you use eager queries it may cause out of memory exceptions for big caches. Acknowledge to close the problem manually.	`last(/GridGain by JMX/jmx["{#JMXOBJ}",CacheSize])=last(/GridGain by JMX/jmx["{#JMXOBJ}",HeapEntriesCount])`\|Info	Manual close: Yes

LLD rule Data region metrics

Name Description Type Key and additional info

Data region metrics

JMX agent

jmx.discovery[beans,"org.apache:group=DataRegionMetrics,*"]

Preprocessing

JavaScript: The text is too long. Please see the template.
Discard unchanged with heartbeat: 3h

Item prototypes for Data region metrics

Name	Description	Type	Key and additional info
Data region {#JMXNAME}: Allocation, rate	Allocation rate (pages per second) averaged across rateTimeInternal.	JMX agent	jmx["{#JMXOBJ}",AllocationRate]
Data region {#JMXNAME}: Allocated, bytes	Total size of memory allocated in bytes.	JMX agent	jmx["{#JMXOBJ}",TotalAllocatedSize]
Data region {#JMXNAME}: Dirty pages	Number of pages in memory not yet synchronized with persistent storage.	JMX agent	jmx["{#JMXOBJ}",DirtyPages]
Data region {#JMXNAME}: Eviction, rate	Eviction rate (pages per second).	JMX agent	jmx["{#JMXOBJ}",EvictionRate]
Data region {#JMXNAME}: Size, max	Maximum memory region size defined by its data region.	JMX agent	jmx["{#JMXOBJ}",MaxSize]
Data region {#JMXNAME}: Offheap size	Offheap size in bytes.	JMX agent	jmx["{#JMXOBJ}",OffHeapSize]
Data region {#JMXNAME}: Offheap used size	Total used offheap size in bytes.	JMX agent	jmx["{#JMXOBJ}",OffheapUsedSize]
Data region {#JMXNAME}: Pages fill factor	The percentage of the used space.	JMX agent	jmx["{#JMXOBJ}",PagesFillFactor]
Data region {#JMXNAME}: Pages replace, rate	Rate at which pages in memory are replaced with pages from persistent storage (pages per second).	JMX agent	jmx["{#JMXOBJ}",PagesReplaceRate]
Data region {#JMXNAME}: Used checkpoint buffer size	Used checkpoint buffer size in bytes.	JMX agent	jmx["{#JMXOBJ}",UsedCheckpointBufferSize]
Data region {#JMXNAME}: Checkpoint buffer size	Total size in bytes for checkpoint buffer.	JMX agent	jmx["{#JMXOBJ}",CheckpointBufferSize]

Trigger prototypes for Data region metrics

Name	Description	Expression	Severity
Data region {#JMXNAME}: Node started to evict pages	You store more data than region can accommodate. Data started to move to disk it can make requests work slower. Acknowledge to close the problem manually.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",EvictionRate],5m)>0`\|Info	Manual close: Yes
Data region {#JMXNAME}: Data region utilization is too high	Data region utilization is high. Increase data region size or delete any data.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",OffheapUsedSize],5m)/last(/GridGain by JMX/jmx["{#JMXOBJ}",OffHeapSize])*100>{$GRIDGAIN.DATA.REGION.PUSED.MAX.WARN}`\|Warning	Depends on: Data region {#JMXNAME}: Data region utilization is too high
Data region {#JMXNAME}: Data region utilization is too high	Data region utilization is high. Increase data region size or delete any data.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",OffheapUsedSize],5m)/last(/GridGain by JMX/jmx["{#JMXOBJ}",OffHeapSize])*100>{$GRIDGAIN.DATA.REGION.PUSED.MAX.HIGH}`\|High
Data region {#JMXNAME}: Pages replace rate more than 0	There is more data than DataRegionMaxSize. Cluster started to replace pages in memory. Page replacement can slow down operations.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",PagesReplaceRate],5m)>0`\|Warning
Data region {#JMXNAME}: Checkpoint buffer utilization is too high	Checkpoint buffer utilization is high. Threads will be throttled to avoid buffer overflow. It can be caused by high disk utilization.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",UsedCheckpointBufferSize],5m)/last(/GridGain by JMX/jmx["{#JMXOBJ}",CheckpointBufferSize])*100>{$GRIDGAIN.CHECKPOINT.PUSED.MAX.WARN}`\|Warning	Depends on: Data region {#JMXNAME}: Checkpoint buffer utilization is too high
Data region {#JMXNAME}: Checkpoint buffer utilization is too high	Checkpoint buffer utilization is high. Threads will be throttled to avoid buffer overflow. It can be caused by high disk utilization.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",UsedCheckpointBufferSize],5m)/last(/GridGain by JMX/jmx["{#JMXOBJ}",CheckpointBufferSize])*100>{$GRIDGAIN.CHECKPOINT.PUSED.MAX.HIGH}`\|High

LLD rule Cache groups

Name Description Type Key and additional info

Cache groups

JMX agent

jmx.discovery[beans,"org.apache:group=\"Cache groups\",*"]

Preprocessing

JavaScript: The text is too long. Please see the template.
Discard unchanged with heartbeat: 3h

Item prototypes for Cache groups

Name	Description	Type	Key and additional info
Cache group [{#JMXNAME}]: Backups	Count of backups configured for cache group.	JMX agent	jmx["{#JMXOBJ}",Backups]
Cache group [{#JMXNAME}]: Partitions	Count of partitions for cache group.	JMX agent	jmx["{#JMXOBJ}",Partitions]
Cache group [{#JMXNAME}]: Caches	List of caches.	JMX agent	jmx["{#JMXOBJ}",Caches] Preprocessing Discard unchanged with heartbeat: `3h`
Cache group [{#JMXNAME}]: Local node partitions, moving	Count of partitions with state MOVING for this cache group located on this node.	JMX agent	jmx["{#JMXOBJ}",LocalNodeMovingPartitionsCount]
Cache group [{#JMXNAME}]: Local node partitions, renting	Count of partitions with state RENTING for this cache group located on this node.	JMX agent	jmx["{#JMXOBJ}",LocalNodeRentingPartitionsCount]
Cache group [{#JMXNAME}]: Local node entries, renting	Count of entries remains to evict in RENTING partitions located on this node for this cache group.	JMX agent	jmx["{#JMXOBJ}",LocalNodeRentingEntriesCount]
Cache group [{#JMXNAME}]: Local node partitions, owning	Count of partitions with state OWNING for this cache group located on this node.	JMX agent	jmx["{#JMXOBJ}",LocalNodeOwningPartitionsCount]
Cache group [{#JMXNAME}]: Partition copies, min	Minimum number of partition copies for all partitions of this cache group.	JMX agent	jmx["{#JMXOBJ}",MinimumNumberOfPartitionCopies]
Cache group [{#JMXNAME}]: Partition copies, max	Maximum number of partition copies for all partitions of this cache group.	JMX agent	jmx["{#JMXOBJ}",MaximumNumberOfPartitionCopies]

Trigger prototypes for Cache groups

Name	Description	Expression	Severity
Cache group [{#JMXNAME}]: One or more backups are unavailable		`min(/GridGain by JMX/jmx["{#JMXOBJ}",Backups],5m)>=max(/GridGain by JMX/jmx["{#JMXOBJ}",MinimumNumberOfPartitionCopies],5m)`\|Warning
Cache group [{#JMXNAME}]: List of caches has changed	List of caches has changed. Significant changes have occurred in the cluster. Acknowledge to close the problem manually.	`last(/GridGain by JMX/jmx["{#JMXOBJ}",Caches],#1)<>last(/GridGain by JMX/jmx["{#JMXOBJ}",Caches],#2) and length(last(/GridGain by JMX/jmx["{#JMXOBJ}",Caches]))>0`\|Info	Manual close: Yes
Cache group [{#JMXNAME}]: Rebalance in progress	Acknowledge to close the problem manually.	`max(/GridGain by JMX/jmx["{#JMXOBJ}",LocalNodeMovingPartitionsCount],30m)>0`\|Info	Manual close: Yes
Cache group [{#JMXNAME}]: There is no copy for partitions		`max(/GridGain by JMX/jmx["{#JMXOBJ}",MinimumNumberOfPartitionCopies],30m)=0`\|Warning

LLD rule Thread pool metrics

Name Description Type Key and additional info

Thread pool metrics

JMX agent

jmx.discovery[beans,"org.apache:group=\"Thread Pools\",*"]

Preprocessing

JavaScript: The text is too long. Please see the template.
Discard unchanged with heartbeat: 3h

Item prototypes for Thread pool metrics

Name	Description	Type	Key and additional info
Thread pool [{#JMXNAME}]: Queue size	Current size of the execution queue.	JMX agent	jmx["{#JMXOBJ}",QueueSize]
Thread pool [{#JMXNAME}]: Pool size	Current number of threads in the pool.	JMX agent	jmx["{#JMXOBJ}",PoolSize]
Thread pool [{#JMXNAME}]: Pool size, max	The maximum allowed number of threads.	JMX agent	jmx["{#JMXOBJ}",MaximumPoolSize]
Thread pool [{#JMXNAME}]: Pool size, core	The core number of threads.	JMX agent	jmx["{#JMXOBJ}",CorePoolSize]

Trigger prototypes for Thread pool metrics

Name	Description	Expression	Severity	Dependencies and additional info
Thread pool [{#JMXNAME}]: Too many messages in queue	Number of messages in queue more than {$GRIDGAIN.THREAD.QUEUE.MAX.WARN:"{#JMXNAME}"}.	`min(/GridGain by JMX/jmx["{#JMXOBJ}",QueueSize],5m) > {$GRIDGAIN.THREAD.QUEUE.MAX.WARN:"{#JMXNAME}"}`\|Average

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template, or ask for help at ZABBIX forums

db_cockroachdb_http

View README Download JSON

CockroachDB by HTTP

Overview

The template to monitor CockroachDB nodes by Zabbix that works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.

Template CockroachDB node by HTTP — collects metrics by HTTP agent from Prometheus endpoint and health endpoints.

Requirements

Zabbix version: 6.0 and higher.

Tested versions

This template has been tested on:

CockroachDB 21.2.8

Configuration

Setup

Internal node metrics are collected from Prometheus /_status/vars endpoint. Node health metrics are collected from /health and /health?ready=1 endpoints. Template doesn't require usage of session token.

Don't forget change macros {$COCKROACHDB.API.SCHEME} according to your situation (secure/insecure node). Also, see the Macros section for a list of macros used to set trigger values.

NOTE. Some metrics may not be collected depending on your CockroachDB version and configuration.

Macros used

Name	Description	Default
{$COCKROACHDB.API.PORT}	The port of CockroachDB API and Prometheus endpoint.	`8080`
{$COCKROACHDB.API.SCHEME}	Request scheme which may be http or https.	`http`
{$COCKROACHDB.STORE.USED.MIN.WARN}	The warning threshold of the available disk space in percent.	`20`
{$COCKROACHDB.STORE.USED.MIN.CRIT}	The critical threshold of the available disk space in percent.	`10`
{$COCKROACHDB.OPEN.FDS.MAX.WARN}	Maximum percentage of used file descriptors.	`80`
{$COCKROACHDB.CERT.NODE.EXPIRY.WARN}	Number of days until the node certificate expires.	`30`
{$COCKROACHDB.CERT.CA.EXPIRY.WARN}	Number of days until the CA certificate expires.	`90`
{$COCKROACHDB.CLOCK.OFFSET.MAX.WARN}	Maximum clock offset of the node against the rest of the cluster in milliseconds for trigger expression.	`300`
{$COCKROACHDB.STATEMENTS.ERRORS.MAX.WARN}	Maximum number of SQL statements errors for trigger expression.	`2`

Items

Name	Description	Type	Key and additional info
CockroachDB: Get metrics	Get raw metrics from the Prometheus endpoint.	HTTP agent	cockroachdb.get_metrics Preprocessing Check for not supported value ⛔️Custom on fail: Discard value
CockroachDB: Get health	Get node /health endpoint	HTTP agent	cockroachdb.get_health Preprocessing Check for not supported value ⛔️Custom on fail: Discard value Regular expression: `HTTP.*\s(\d+) \1` Discard unchanged with heartbeat: `3h`
CockroachDB: Get readiness	Get node /health?ready=1 endpoint	HTTP agent	cockroachdb.get_readiness Preprocessing Check for not supported value ⛔️Custom on fail: Discard value Regular expression: `HTTP.*\s(\d+) \1` Discard unchanged with heartbeat: `3h`
CockroachDB: Service ping	Check if HTTP/HTTPS service accepts TCP connections.	Simple check	net.tcp.service["{$COCKROACHDB.API.SCHEME}","{HOST.CONN}","{$COCKROACHDB.API.PORT}"] Preprocessing Discard unchanged with heartbeat: `10m`
CockroachDB: Clock offset	Mean clock offset of the node against the rest of the cluster.	Dependent item	cockroachdb.clock.offset Preprocessing Prometheus pattern: `VALUE(clock_offset_meannanos)` Custom multiplier: `0.000000001`
CockroachDB: Version	Build information.	Dependent item	cockroachdb.version Preprocessing Prometheus pattern: `build_timestamp` label `tag` Discard unchanged with heartbeat: `3h`
CockroachDB: CPU: System time	System CPU time.	Dependent item	cockroachdb.cpu.system_time Preprocessing Prometheus pattern: `VALUE(sys_cpu_sys_ns)` Change per second Custom multiplier: `0.000000001`
CockroachDB: CPU: User time	User CPU time.	Dependent item	cockroachdb.cpu.user_time Preprocessing Prometheus pattern: `VALUE(sys_cpu_user_ns)` Change per second Custom multiplier: `0.000000001`
CockroachDB: CPU: Utilization	The CPU utilization expressed in %.	Dependent item	cockroachdb.cpu.util Preprocessing Prometheus pattern: `VALUE(sys_cpu_combined_percent_normalized)` Custom multiplier: `100`
CockroachDB: Disk: IOPS in progress, rate	Number of disk IO operations currently in progress on this host.	Dependent item	cockroachdb.disk.iops.in_progress.rate Preprocessing Prometheus pattern: `VALUE(sys_host_disk_iopsinprogress)` Change per second
CockroachDB: Disk: Reads, rate	Bytes read from all disks per second since this process started	Dependent item	cockroachdb.disk.read.rate Preprocessing Prometheus pattern: `VALUE(sys_host_disk_read_bytes)` Change per second
CockroachDB: Disk: Read IOPS, rate	Number of disk read operations per second across all disks since this process started.	Dependent item	cockroachdb.disk.iops.read.rate Preprocessing Prometheus pattern: `VALUE(sys_host_disk_read_count)` Change per second
CockroachDB: Disk: Writes, rate	Bytes written to all disks per second since this process started.	Dependent item	cockroachdb.disk.write.rate Preprocessing Prometheus pattern: `VALUE(sys_host_disk_write_bytes)` Change per second
CockroachDB: Disk: Write IOPS, rate	Disk write operations per second across all disks since this process started.	Dependent item	cockroachdb.disk.iops.write.rate Preprocessing Prometheus pattern: `VALUE(sys_host_disk_write_count)` Change per second
CockroachDB: File descriptors: Limit	Open file descriptors soft limit of the process.	Dependent item	cockroachdb.descriptors.limit Preprocessing Prometheus pattern: `VALUE(sys_fd_softlimit)` Discard unchanged with heartbeat: `3h`
CockroachDB: File descriptors: Open	The number of open file descriptors.	Dependent item	cockroachdb.descriptors.open Preprocessing Prometheus pattern: `VALUE(sys_fd_open)`
CockroachDB: GC: Pause time	The amount of processor time used by Go's garbage collector across all nodes. During garbage collection, application code execution is paused.	Dependent item	cockroachdb.gc.pause_time Preprocessing Prometheus pattern: `VALUE(sys_gc_pause_ns)` Change per second Custom multiplier: `0.000000001`
CockroachDB: GC: Runs, rate	The number of times that Go's garbage collector was invoked per second across all nodes.	Dependent item	cockroachdb.gc.runs.rate Preprocessing Prometheus pattern: `VALUE(sys_gc_count)` Change per second
CockroachDB: Go: Goroutines count	Current number of Goroutines. This count should rise and fall based on load.	Dependent item	cockroachdb.go.goroutines.count Preprocessing Prometheus pattern: `VALUE(sys_goroutines)`
CockroachDB: KV transactions: Aborted, rate	Number of aborted KV transactions per second.	Dependent item	cockroachdb.kv.transactions.aborted.rate Preprocessing Prometheus pattern: `VALUE(txn_aborts)` Change per second
CockroachDB: KV transactions: Committed, rate	Number of KV transactions (including 1PC) committed per second.	Dependent item	cockroachdb.kv.transactions.committed.rate Preprocessing Prometheus pattern: `VALUE(txn_commits)` Change per second
CockroachDB: Live nodes count	The number of live nodes in the cluster (will be 0 if this node is not itself live).	Dependent item	cockroachdb.live_count Preprocessing Prometheus pattern: `VALUE(liveness_livenodes)` Discard unchanged with heartbeat: `3h`
CockroachDB: Liveness heartbeats, rate	Number of successful node liveness heartbeats per second from this node.	Dependent item	cockroachdb.heartbeaths.success.rate Preprocessing Prometheus pattern: `VALUE(liveness_heartbeatsuccesses)` Change per second
CockroachDB: Memory: Allocated by Cgo	Current bytes of memory allocated by the C layer.	Dependent item	cockroachdb.memory.cgo.allocated Preprocessing Prometheus pattern: `VALUE(sys_cgo_allocbytes)`
CockroachDB: Memory: Allocated by Go	Current bytes of memory allocated by the Go layer.	Dependent item	cockroachdb.memory.go.allocated Preprocessing Prometheus pattern: `VALUE(sys_go_allocbytes)`
CockroachDB: Memory: Managed by Cgo	Total bytes of memory managed by the C layer.	Dependent item	cockroachdb.memory.cgo.managed Preprocessing Prometheus pattern: `VALUE(sys_cgo_totalbytes)`
CockroachDB: Memory: Managed by Go	Total bytes of memory managed by the Go layer.	Dependent item	cockroachdb.memory.go.managed Preprocessing Prometheus pattern: `VALUE(sys_go_totalbytes)`
CockroachDB: Memory: Total usage	Resident set size (RSS) of memory in use by the node.	Dependent item	cockroachdb.memory.total Preprocessing Prometheus pattern: `VALUE(sys_rss)`
CockroachDB: Network: Bytes received, rate	Bytes received per second on all network interfaces since this process started.	Dependent item	cockroachdb.network.bytes.received.rate Preprocessing Prometheus pattern: `VALUE(sys_host_net_recv_bytes)` Change per second
CockroachDB: Network: Bytes sent, rate	Bytes sent per second on all network interfaces since this process started.	Dependent item	cockroachdb.network.bytes.sent.rate Preprocessing Prometheus pattern: `VALUE(sys_host_net_send_bytes)` Change per second
CockroachDB: Time series: Sample errors, rate	The number of errors encountered while attempting to write metrics to disk, per second.	Dependent item	cockroachdb.ts.samples.errors.rate Preprocessing Prometheus pattern: `VALUE(timeseries_write_errors)` Change per second
CockroachDB: Time series: Samples written, rate	The number of successfully written metric samples per second.	Dependent item	cockroachdb.ts.samples.written.rate Preprocessing Prometheus pattern: `VALUE(timeseries_write_samples)` Change per second
CockroachDB: Slow requests: DistSender RPCs	Number of RPCs stuck or retrying for a long time.	Dependent item	cockroachdb.slow_requests.rpc Preprocessing Prometheus pattern: `VALUE(requests_slow_distsender)`
CockroachDB: SQL: Bytes received, rate	Total amount of incoming SQL client network traffic in bytes per second.	Dependent item	cockroachdb.sql.bytes.received.rate Preprocessing Prometheus pattern: `VALUE(sql_bytesin)` Change per second
CockroachDB: SQL: Bytes sent, rate	Total amount of outgoing SQL client network traffic in bytes per second.	Dependent item	cockroachdb.sql.bytes.sent.rate Preprocessing Prometheus pattern: `VALUE(sql_bytesout)` Change per second
CockroachDB: Memory: Allocated by SQL	Current SQL statement memory usage for root.	Dependent item	cockroachdb.memory.sql Preprocessing Prometheus pattern: `VALUE(sql_mem_root_current)`
CockroachDB: SQL: Schema changes, rate	Total number of SQL DDL statements successfully executed per second.	Dependent item	cockroachdb.sql.schema_changes.rate Preprocessing Prometheus pattern: `VALUE(sql_ddl_count)` Change per second
CockroachDB: SQL sessions: Open	Total number of open SQL sessions.	Dependent item	cockroachdb.sql.sessions Preprocessing Prometheus pattern: `VALUE(sql_conns)`
CockroachDB: SQL statements: Active	Total number of SQL statements currently active.	Dependent item	cockroachdb.sql.statements.active Preprocessing Prometheus pattern: `VALUE(sql_distsql_queries_active)`
CockroachDB: SQL statements: DELETE, rate	A moving average of the number of DELETE statements successfully executed per second.	Dependent item	cockroachdb.sql.statements.delete.rate Preprocessing Prometheus pattern: `VALUE(sql_delete_count)` Change per second
CockroachDB: SQL statements: Executed, rate	Number of SQL queries executed per second.	Dependent item	cockroachdb.sql.statements.executed.rate Preprocessing Prometheus pattern: `VALUE(sql_query_count)` Change per second
CockroachDB: SQL statements: Denials, rate	The number of statements denied per second by a feature flag.	Dependent item	cockroachdb.sql.statements.denials.rate Preprocessing Prometheus pattern: `VALUE(sql_feature_flag_denial)` Change per second
CockroachDB: SQL statements: Active flows distributed, rate	The number of distributed SQL flows currently active per second.	Dependent item	cockroachdb.sql.statements.flows.active.rate Preprocessing Prometheus pattern: `VALUE(sql_distsql_flows_active)` Change per second
CockroachDB: SQL statements: INSERT, rate	A moving average of the number of INSERT statements successfully executed per second.	Dependent item	cockroachdb.sql.statements.insert.rate Preprocessing Prometheus pattern: `VALUE(sql_insert_count)` Change per second
CockroachDB: SQL statements: SELECT, rate	A moving average of the number of SELECT statements successfully executed per second.	Dependent item	cockroachdb.sql.statements.select.rate Preprocessing Prometheus pattern: `VALUE(sql_select_count)` Change per second
CockroachDB: SQL statements: UPDATE, rate	A moving average of the number of UPDATE statements successfully executed per second.	Dependent item	cockroachdb.sql.statements.update.rate Preprocessing Prometheus pattern: `VALUE(sql_update_count)` Change per second
CockroachDB: SQL statements: Contention, rate	Total number of SQL statements that experienced contention per second.	Dependent item	cockroachdb.sql.statements.contention.rate Preprocessing Prometheus pattern: `VALUE(sql_distsql_contended_queries_count)` Change per second
CockroachDB: SQL statements: Errors, rate	Total number of statements which returned a planning or runtime error per second.	Dependent item	cockroachdb.sql.statements.errors.rate Preprocessing Prometheus pattern: `VALUE(sql_failure_count)` Change per second
CockroachDB: SQL transactions: Open	Total number of currently open SQL transactions.	Dependent item	cockroachdb.sql.transactions.open Preprocessing Prometheus pattern: `VALUE(sql_txns_open)`
CockroachDB: SQL transactions: Aborted, rate	Total number of SQL transaction abort errors per second.	Dependent item	cockroachdb.sql.transactions.aborted.rate Preprocessing Prometheus pattern: `VALUE(sql_txn_abort_count)` Change per second
CockroachDB: SQL transactions: Committed, rate	Total number of SQL transaction COMMIT statements successfully executed per second.	Dependent item	cockroachdb.sql.transactions.committed.rate Preprocessing Prometheus pattern: `VALUE(sql_txn_commit_count)` Change per second
CockroachDB: SQL transactions: Initiated, rate	Total number of SQL transaction BEGIN statements successfully executed per second.	Dependent item	cockroachdb.sql.transactions.initiated.rate Preprocessing Prometheus pattern: `VALUE(sql_txn_begin_count)` Change per second
CockroachDB: SQL transactions: Rolled back, rate	Total number of SQL transaction ROLLBACK statements successfully executed per second.	Dependent item	cockroachdb.sql.transactions.rollbacks.rate Preprocessing Prometheus pattern: `VALUE(sql_txn_rollback_count)` Change per second
CockroachDB: Uptime	Process uptime.	Dependent item	cockroachdb.uptime Preprocessing Prometheus pattern: `VALUE(sys_uptime)`
CockroachDB: Node certificate expiration date	Node certificate expires at that date.	Dependent item	cockroachdb.cert.expire_date.node Preprocessing Prometheus pattern: `VALUE(security_certificate_expiration_node)` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `6h`
CockroachDB: CA certificate expiration date	CA certificate expires at that date.	Dependent item	cockroachdb.cert.expire_date.ca Preprocessing Prometheus pattern: `VALUE(security_certificate_expiration_ca)` ⛔️Custom on fail: Discard value Discard unchanged with heartbeat: `6h`

Triggers

Name	Description	Expression	Severity
CockroachDB: Node is unhealthy	Node's /health endpoint has returned HTTP 500 Internal Server Error which indicates unhealthy mode.	`last(/CockroachDB by HTTP/cockroachdb.get_health) = 500`\|Average	Depends on: CockroachDB: Service is down
CockroachDB: Node is not ready	Node's /health?ready=1 endpoint has returned HTTP 503 Service Unavailable. Possible reasons: - node is in the wait phase of the node shutdown sequence; - node is unable to communicate with a majority of the other nodes in the cluster, likely because the cluster is unavailable due to too many nodes being down.	`last(/CockroachDB by HTTP/cockroachdb.get_readiness) = 503 and last(/CockroachDB by HTTP/cockroachdb.uptime) > 5m`\|Average	Depends on: CockroachDB: Service is down
CockroachDB: Service is down		`last(/CockroachDB by HTTP/net.tcp.service["{$COCKROACHDB.API.SCHEME}","{HOST.CONN}","{$COCKROACHDB.API.PORT}"]) = 0`\|Average
CockroachDB: Clock offset is too high	Cockroach-measured clock offset is nearing limit (by default, servers kill themselves at 400ms from the mean).	`min(/CockroachDB by HTTP/cockroachdb.clock.offset,5m) > {$COCKROACHDB.CLOCK.OFFSET.MAX.WARN} * 0.001`\|Warning
CockroachDB: Version has changed		`last(/CockroachDB by HTTP/cockroachdb.version) <> last(/CockroachDB by HTTP/cockroachdb.version,#2) and length(last(/CockroachDB by HTTP/cockroachdb.version)) > 0`\|Info
CockroachDB: Current number of open files is too high	Getting close to open file descriptor limit.	`min(/CockroachDB by HTTP/cockroachdb.descriptors.open,10m) / last(/CockroachDB by HTTP/cockroachdb.descriptors.limit) * 100 > {$COCKROACHDB.OPEN.FDS.MAX.WARN}`\|Warning
CockroachDB: Node is not executing SQL	Node is not executing SQL despite having connections.	`last(/CockroachDB by HTTP/cockroachdb.sql.sessions) > 0 and last(/CockroachDB by HTTP/cockroachdb.sql.statements.executed.rate) = 0`\|Warning
CockroachDB: SQL statements errors rate is too high		`min(/CockroachDB by HTTP/cockroachdb.sql.statements.errors.rate,5m) > {$COCKROACHDB.STATEMENTS.ERRORS.MAX.WARN}`\|Warning
CockroachDB: Node has been restarted	Uptime is less than 10 minutes.	`last(/CockroachDB by HTTP/cockroachdb.uptime) < 10m`\|Info
CockroachDB: Failed to fetch node data	Zabbix has not received data for items for the last 5 minutes.	`nodata(/CockroachDB by HTTP/cockroachdb.uptime,5m) = 1`\|Warning	Depends on: CockroachDB: Service is down
CockroachDB: Node certificate expires soon	Node certificate expires soon.	`(last(/CockroachDB by HTTP/cockroachdb.cert.expire_date.node) - now()) / 86400 < {$COCKROACHDB.CERT.NODE.EXPIRY.WARN}`\|Warning
CockroachDB: CA certificate expires soon	CA certificate expires soon.	`(last(/CockroachDB by HTTP/cockroachdb.cert.expire_date.ca) - now()) / 86400 < {$COCKROACHDB.CERT.CA.EXPIRY.WARN}`\|Warning

LLD rule Storage metrics discovery

Name Description Type Key and additional info

Storage metrics discovery

Discover per store metrics.

Dependent item

cockroachdb.store.discovery

Preprocessing

Prometheus to JSON: capacity
Discard unchanged with heartbeat: 3h

Item prototypes for Storage metrics discovery

Name	Description	Type	Key and additional info
CockroachDB: Storage [{#STORE}]: Bytes: Live	Number of logical bytes stored in live key-value pairs on this node. Live data excludes historical and deleted data.	Dependent item	cockroachdb.storage.bytes.[{#STORE},live] Preprocessing Prometheus pattern: `VALUE(livebytes{store="{#STORE}"})`
CockroachDB: Storage [{#STORE}]: Bytes: System	Number of physical bytes stored in system key-value pairs.	Dependent item	cockroachdb.storage.bytes.[{#STORE},system] Preprocessing Prometheus pattern: `VALUE(sysbytes{store="{#STORE}"})`
CockroachDB: Storage [{#STORE}]: Capacity available	Available storage capacity.	Dependent item	cockroachdb.storage.capacity.[{#STORE},available] Preprocessing Prometheus pattern: `VALUE(capacity_available{store="{#STORE}"})`
CockroachDB: Storage [{#STORE}]: Capacity total	Total storage capacity. This value may be explicitly set using --store. If a store size has not been set, this metric displays the actual disk capacity.	Dependent item	cockroachdb.storage.capacity.[{#STORE},total] Preprocessing Prometheus pattern: `VALUE(capacity{store="{#STORE}"})` Discard unchanged with heartbeat: `3h`
CockroachDB: Storage [{#STORE}]: Capacity used	Disk space in use by CockroachDB data on this node. This excludes the Cockroach binary, operating system, and other system files.	Dependent item	cockroachdb.storage.capacity.[{#STORE},used] Preprocessing Prometheus pattern: `VALUE(capacity_used{store="{#STORE}"})`
CockroachDB: Storage [{#STORE}]: Capacity available in %	Available storage capacity in %.	Calculated	cockroachdb.storage.capacity.[{#STORE},available_percent]
CockroachDB: Storage [{#STORE}]: Replication: Lease holders	Number of lease holders.	Dependent item	cockroachdb.replication.[{#STORE},lease_holders] Preprocessing Prometheus pattern: `VALUE(replicas_leaseholders{store="{#STORE}"})`
CockroachDB: Storage [{#STORE}]: Bytes: Logical	Number of logical bytes stored in key-value pairs on this node. This includes historical and deleted data.	Dependent item	cockroachdb.storage.bytes.[{#STORE},logical] Preprocessing Prometheus pattern: `VALUE(totalbytes{store="{#STORE}"})`
CockroachDB: Storage [{#STORE}]: Rebalancing: Average queries, rate	Number of kv-level requests received per second by the store, averaged over a large time period as used in rebalancing decisions.	Dependent item	cockroachdb.rebalancing.queries.average.[{#STORE},rate] Preprocessing Prometheus pattern: `VALUE(rebalancing_queriespersecond{store="{#STORE}"})`
CockroachDB: Storage [{#STORE}]: Rebalancing: Average writes, rate	Number of keys written (i.e. applied by raft) per second to the store, averaged over a large time period as used in rebalancing decisions.	Dependent item	cockroachdb.rebalancing.writes.average.[{#STORE},rate] Preprocessing Prometheus pattern: `VALUE(rebalancing_writespersecond{store="{#STORE}"})`
CockroachDB: Storage [{#STORE}]: Queue processing failures: Consistency, rate	Number of replicas which failed processing in the consistency checker queue per second.	Dependent item	cockroachdb.queue.processing_failures.consistency.[{#STORE},rate] Preprocessing Prometheus pattern: `VALUE(queue_consistency_process_failure{store="{#STORE}"})` Change per second
CockroachDB: Storage [{#STORE}]: Queue processing failures: GC, rate	Number of replicas which failed processing in the GC queue per second.	Dependent item	cockroachdb.queue.processing_failures.gc.[{#STORE},rate] Preprocessing Prometheus pattern: `VALUE(queue_gc_process_failure{store="{#STORE}"})` Change per second
CockroachDB: Storage [{#STORE}]: Queue processing failures: Raft log, rate	Number of replicas which failed processing in the Raft log queue per second.	Dependent item	cockroachdb.queue.processing_failures.raftlog.[{#STORE},rate] Preprocessing Prometheus pattern: `VALUE(queue_raftlog_process_failure{store="{#STORE}"})` Change per second
CockroachDB: Storage [{#STORE}]: Queue processing failures: Raft snapshot, rate	Number of replicas which failed processing in the Raft repair queue per second.	Dependent item	cockroachdb.queue.processing_failures.raftsnapshot.[{#STORE},rate] Preprocessing Prometheus pattern: `VALUE(queue_raftsnapshot_process_failure{store="{#STORE}"})` Change per second
CockroachDB: Storage [{#STORE}]: Queue processing failures: Replica GC, rate	Number of replicas which failed processing in the replica GC queue per second.	Dependent item	cockroachdb.queue.processingfailures.gcreplica.[{#STORE},rate] Preprocessing Prometheus pattern: `VALUE(queue_replicagc_process_failure{store="{#STORE}"})` Change per second
CockroachDB: Storage [{#STORE}]: Queue processing failures: Replicate, rate	Number of replicas which failed processing in the replicate queue per second.	Dependent item	cockroachdb.queue.processing_failures.replicate.[{#STORE},rate] Preprocessing Prometheus pattern: `VALUE(queue_replicate_process_failure{store="{#STORE}"})` Change per second
CockroachDB: Storage [{#STORE}]: Queue processing failures: Split, rate	Number of replicas which failed processing in the split queue per second.	Dependent item	cockroachdb.queue.processing_failures.split.[{#STORE},rate] Preprocessing Prometheus pattern: `VALUE(queue_split_process_failure{store="{#STORE}"})` Change per second
CockroachDB: Storage [{#STORE}]: Queue processing failures: Time series maintenance, rate	Number of replicas which failed processing in the time series maintenance queue per second.	Dependent item	cockroachdb.queue.processing_failures.tsmaintenance.[{#STORE},rate] Preprocessing Prometheus pattern: `VALUE(queue_tsmaintenance_process_failure{store="{#STORE}"})` Change per second
CockroachDB: Storage [{#STORE}]: Ranges count	Number of ranges.	Dependent item	cockroachdb.ranges.[{#STORE},count] Preprocessing Prometheus pattern: `VALUE(ranges{store="{#STORE}"})`
CockroachDB: Storage [{#STORE}]: Ranges unavailable	Number of ranges with fewer live replicas than needed for quorum.	Dependent item	cockroachdb.ranges.[{#STORE},unavailable] Preprocessing Prometheus pattern: `VALUE(ranges_unavailable{store="{#STORE}"})`
CockroachDB: Storage [{#STORE}]: Ranges underreplicated	Number of ranges with fewer live replicas than the replication target.	Dependent item	cockroachdb.ranges.[{#STORE},underreplicated] Preprocessing Prometheus pattern: `VALUE(ranges_underreplicated{store="{#STORE}"})`
CockroachDB: Storage [{#STORE}]: RocksDB read amplification	The average number of real read operations executed per logical read operation.	Dependent item	cockroachdb.rocksdb.[{#STORE},read_amp] Preprocessing Prometheus pattern: `VALUE(rocksdb_read_amplification{store="{#STORE}"})`
CockroachDB: Storage [{#STORE}]: RocksDB cache hits, rate	Count of block cache hits per second.	Dependent item	cockroachdb.rocksdb.cache.hits.[{#STORE},rate] Preprocessing Prometheus pattern: `VALUE(rocksdb_block_cache_hits{store="{#STORE}"})` Change per second
CockroachDB: Storage [{#STORE}]: RocksDB cache misses, rate	Count of block cache misses per second.	Dependent item	cockroachdb.rocksdb.cache.misses.[{#STORE},rate] Preprocessing Prometheus pattern: `VALUE(rocksdb_block_cache_misses{store="{#STORE}"})` Change per second
CockroachDB: Storage [{#STORE}]: RocksDB cache hit ratio	Block cache hit ratio in %.	Calculated	cockroachdb.rocksdb.cache.[{#STORE},hit_ratio]
CockroachDB: Storage [{#STORE}]: Replication: Replicas	Number of replicas.	Dependent item	cockroachdb.replication.replicas.[{#STORE},count] Preprocessing Prometheus pattern: `VALUE(replicas{store="{#STORE}"})`
CockroachDB: Storage [{#STORE}]: Replication: Replicas quiesced	Number of quiesced replicas.	Dependent item	cockroachdb.replication.replicas.[{#STORE},quiesced] Preprocessing Prometheus pattern: `VALUE(replicas_quiescent{store="{#STORE}"})`
CockroachDB: Storage [{#STORE}]: Slow requests: Latch acquisitions	Number of requests that have been stuck for a long time acquiring latches.	Dependent item	cockroachdb.slowrequests.[{#STORE},latchacquisitions] Preprocessing Prometheus pattern: `VALUE(requests_slow_latch{store="{#STORE}"})`
CockroachDB: Storage [{#STORE}]: Slow requests: Lease acquisitions	Number of requests that have been stuck for a long time acquiring a lease.	Dependent item	cockroachdb.slowrequests.[{#STORE},leaseacquisitions] Preprocessing Prometheus pattern: `VALUE(requests_slow_lease{store="{#STORE}"})`
CockroachDB: Storage [{#STORE}]: Slow requests: Raft proposals	Number of requests that have been stuck for a long time in raft.	Dependent item	cockroachdb.slowrequests.[{#STORE},raftproposals] Preprocessing Prometheus pattern: `VALUE(requests_slow_raft{store="{#STORE}"})`
CockroachDB: Storage [{#STORE}]: RocksDB SSTables	The number of SSTables in use.	Dependent item	cockroachdb.rocksdb.[{#STORE},sstables] Preprocessing Prometheus pattern: `VALUE(rocksdb_num_sstables{store="{#STORE}"})`

Trigger prototypes for Storage metrics discovery

Name	Description	Expression	Severity	Dependencies and additional info
CockroachDB: Storage [{#STORE}]: Available storage capacity is low	Storage is running low on free space (less than {$COCKROACHDB.STORE.USED.MIN.WARN}% available).	`max(/CockroachDB by HTTP/cockroachdb.storage.capacity.[{#STORE},available_percent],5m) < {$COCKROACHDB.STORE.USED.MIN.WARN}`\|Warning	Depends on: CockroachDB: Storage [{#STORE}]: Available storage capacity is critically low
CockroachDB: Storage [{#STORE}]: Available storage capacity is critically low	Storage is running critically low on free space (less than {$COCKROACHDB.STORE.USED.MIN.CRIT}% available).	`max(/CockroachDB by HTTP/cockroachdb.storage.capacity.[{#STORE},available_percent],5m) < {$COCKROACHDB.STORE.USED.MIN.CRIT}`\|Average

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template, or ask for help at ZABBIX forums

db_clickhouse_http

View README Download JSON

ClickHouse by HTTP

Overview

This template is designed for the effortless deployment of ClickHouse monitoring by Zabbix via HTTP and doesn't require any external scripts.

Requirements

Zabbix version: 6.0 and higher.

Tested versions

This template has been tested on:

ClickHouse 20.3+, 21.3+, 22.12+

Configuration

Setup

Create a user to monitor the service:

create file /etc/clickhouse-server/users.d/zabbix.xml
<yandex>
    <users>
      <zabbix>
        <password>zabbix_pass</password>
        <networks incl="networks" />
        <profile>web</profile>
        <quota>default</quota>
        <allow_databases>
          <database>test</database>
        </allow_databases>
      </zabbix>
    </users>
  </yandex>

{$CLICKHOUSE.USER}
{$CLICKHOUSE.PASSWORD} If you don't need authentication - remove headers from HTTP-Agent type items

Macros used

Name	Description	Default
{$CLICKHOUSE.USER}		`zabbix`
{$CLICKHOUSE.PASSWORD}		`zabbix_pass`
{$CLICKHOUSE.NETWORK.ERRORS.MAX.WARN}	Maximum number of network errors for trigger expression	`5`
{$CLICKHOUSE.PORT}	The port of ClickHouse HTTP endpoint	`8123`
{$CLICKHOUSE.SCHEME}	Request scheme which may be http or https	`http`
{$CLICKHOUSE.LLD.FILTER.DB.MATCHES}	Filter of discoverable databases	`.*`
{$CLICKHOUSE.LLD.FILTER.DB.NOT_MATCHES}	Filter to exclude discovered databases	`CHANGE_IF_NEEDED`
{$CLICKHOUSE.LLD.FILTER.DICT.MATCHES}	Filter of discoverable dictionaries	`.*`
{$CLICKHOUSE.LLD.FILTER.DICT.NOT_MATCHES}	Filter to exclude discovered dictionaries	`CHANGE_IF_NEEDED`
{$CLICKHOUSE.LLD.FILTER.TABLE.MATCHES}	Filter of discoverable tables	`.*`
{$CLICKHOUSE.LLD.FILTER.TABLE.NOT_MATCHES}	Filter to exclude discovered tables	`CHANGE_IF_NEEDED`
{$CLICKHOUSE.QUERY_TIME.MAX.WARN}	Maximum ClickHouse query time in seconds for trigger expression	`600`
{$CLICKHOUSE.QUEUE.SIZE.MAX.WARN}	Maximum size of the queue for operations waiting to be performed for trigger expression.	`20`
{$CLICKHOUSE.LOG_POSITION.DIFF.MAX.WARN}	Maximum diff between logpointer and logmax_index.	`30`
{$CLICKHOUSE.REPLICA.MAX.WARN}	Replication lag across all tables for trigger expression.	`600`
{$CLICKHOUSE.DELAYED.FILES.DISTRIBUTED.COUNT.MAX.WARN}	Maximum size of distributed files queue to insert for trigger expression.	`600`
{$CLICKHOUSE.PARTS.PER.PARTITION.WARN}	Maximum number of parts per partition for trigger expression.	`300`
{$CLICKHOUSE.DELAYED.INSERTS.MAX.WARN}	Maximum number of delayed inserts for trigger expression.	`0`

Items

Name	Description	Type	Key and additional info
ClickHouse: Get system.events	Get information about the number of events that have occurred in the system.	HTTP agent	clickhouse.system.events Preprocessing JSON Path: `$.data`
ClickHouse: Get system.metrics	Get metrics which can be calculated instantly, or have a current value format JSONEachRow	HTTP agent	clickhouse.system.metrics Preprocessing JSON Path: `$.data`
ClickHouse: Get system.asynchronous_metrics	Get metrics that are calculated periodically in the background	HTTP agent	clickhouse.system.asynchronous_metrics Preprocessing JSON Path: `$.data`
ClickHouse: Get system.settings	Get information about settings that are currently in use.	HTTP agent	clickhouse.system.settings Preprocessing JSON Path: `$.data` Discard unchanged with heartbeat: `1h`
ClickHouse: Longest currently running query time	Get longest running query.	HTTP agent	clickhouse.process.elapsed
ClickHouse: Check port availability		Simple check	net.tcp.service[{$CLICKHOUSE.SCHEME},"{HOST.CONN}","{$CLICKHOUSE.PORT}"] Preprocessing Discard unchanged with heartbeat: `10m`
ClickHouse: Ping		HTTP agent	clickhouse.ping Preprocessing Regular expression: `Ok\. 1` ⛔️Custom on fail: Set value to: `0` Discard unchanged with heartbeat: `10m`
ClickHouse: Version	Version of the server	HTTP agent	clickhouse.version Preprocessing Discard unchanged with heartbeat: `1d`
ClickHouse: Revision	Revision of the server.	Dependent item	clickhouse.revision Preprocessing JSON Path: `$[?(@.metric == "Revision")].value.first()`
ClickHouse: Uptime	Number of seconds since ClickHouse server start	Dependent item	clickhouse.uptime Preprocessing JSON Path: `$[?(@.metric == "Uptime")].value.first()`
ClickHouse: New queries per second	Number of queries to be interpreted and potentially executed. Does not include queries that failed to parse or were rejected due to AST size limits, quota limits or limits on the number of simultaneously running queries. May include internal queries initiated by ClickHouse itself. Does not count subqueries.	Dependent item	clickhouse.query.rate Preprocessing JSON Path: `$[?(@.data.event == "Query")].value.first()` ⛔️Custom on fail: Set value to: `0` Change per second
ClickHouse: New SELECT queries per second	Number of SELECT queries to be interpreted and potentially executed. Does not include queries that failed to parse or were rejected due to AST size limits, quota limits or limits on the number of simultaneously running queries. May include internal queries initiated by ClickHouse itself. Does not count subqueries.	Dependent item	clickhouse.select_query.rate Preprocessing JSON Path: `$[?(@.event == "SelectQuery")].value.first()` ⛔️Custom on fail: Set value to: `0` Change per second
ClickHouse: New INSERT queries per second	Number of INSERT queries to be interpreted and potentially executed. Does not include queries that failed to parse or were rejected due to AST size limits, quota limits or limits on the number of simultaneously running queries. May include internal queries initiated by ClickHouse itself. Does not count subqueries.	Dependent item	clickhouse.insert_query.rate Preprocessing JSON Path: `$[?(@.event == "InsertQuery")].value.first()` ⛔️Custom on fail: Set value to: `0` Change per second
ClickHouse: Delayed insert queries	Number of INSERT queries that are throttled due to high number of active data parts for partition in a MergeTree table.	Dependent item	clickhouse.insert.delay Preprocessing JSON Path: `$[?(@.metric == "DelayedInserts")].value.first()`
ClickHouse: Current running queries	Number of executing queries	Dependent item	clickhouse.query.current Preprocessing JSON Path: `$[?(@.metric == "Query")].value.first()`
ClickHouse: Current running merges	Number of executing background merges	Dependent item	clickhouse.merge.current Preprocessing JSON Path: `$[?(@.metric == "Merge")].value.first()`
ClickHouse: Inserted bytes per second	The number of uncompressed bytes inserted in all tables.	Dependent item	clickhouse.inserted_bytes.rate Preprocessing JSON Path: `$[?(@.event == "InsertedBytes")].value.first()` ⛔️Custom on fail: Set value to: `0` Change per second
ClickHouse: Read bytes per second	Number of bytes (the number of bytes before decompression) read from compressed sources (files, network).	Dependent item	clickhouse.read_bytes.rate Preprocessing JSON Path: `$[?(@.event == "ReadCompressedBytes")].value.first()` ⛔️Custom on fail: Set value to: `0` Change per second
ClickHouse: Inserted rows per second	The number of rows inserted in all tables.	Dependent item	clickhouse.inserted_rows.rate Preprocessing JSON Path: `$[?(@.event == "InsertedRows")].value.first()` ⛔️Custom on fail: Set value to: `0` Change per second
ClickHouse: Merged rows per second	Rows read for background merges.	Dependent item	clickhouse.merge_rows.rate Preprocessing JSON Path: `$[?(@.event == "MergedRows")].value.first()` ⛔️Custom on fail: Set value to: `0` Change per second
ClickHouse: Uncompressed bytes merged per second	Uncompressed bytes that were read for background merges	Dependent item	clickhouse.merge_bytes.rate Preprocessing JSON Path: `$[?(@.event == "MergedUncompressedBytes")].value.first()` ⛔️Custom on fail: Set value to: `0` Change per second
ClickHouse: Max count of parts per partition across all tables	Clickhouse MergeTree table engine split each INSERT query to partitions (PARTITION BY expression) and add one or more PARTS per INSERT inside each partition, after that background merge process run.	Dependent item	clickhouse.max.part.count.for.partition Preprocessing JSON Path: `$[?(@.metric == "MaxPartCountForPartition")].value.first()`
ClickHouse: Current TCP connections	Number of connections to TCP server (clients with native interface).	Dependent item	clickhouse.connections.tcp Preprocessing JSON Path: `$[?(@.metric == "TCPConnection")].value.first()`
ClickHouse: Current HTTP connections	Number of connections to HTTP server.	Dependent item	clickhouse.connections.http Preprocessing JSON Path: `$[?(@.metric == "HTTPConnection")].value.first()`
ClickHouse: Current distribute connections	Number of connections to remote servers sending data that was INSERTed into Distributed tables.	Dependent item	clickhouse.connections.distribute Preprocessing JSON Path: `$[?(@.metric == "DistributedSend")].value.first()`
ClickHouse: Current MySQL connections	Number of connections to MySQL server.	Dependent item	clickhouse.connections.mysql Preprocessing JSON Path: `$[?(@.metric == "MySQLConnection")].value.first()` ⛔️Custom on fail: Set value to: `0`
ClickHouse: Current Interserver connections	Number of connections from other replicas to fetch parts.	Dependent item	clickhouse.connections.interserver Preprocessing JSON Path: `$[?(@.metric == "InterserverConnection")].value.first()`
ClickHouse: Network errors per second	Network errors (timeouts and connection failures) during query execution, background pool tasks and DNS cache update.	Dependent item	clickhouse.network.error.rate Preprocessing JSON Path: `$[?(@.event == "NetworkErrors")].value.first()` ⛔️Custom on fail: Set value to: `0` Change per second
ClickHouse: ZooKeeper sessions	Number of sessions (connections) to ZooKeeper. Should be no more than one.	Dependent item	clickhouse.zookeeper.session Preprocessing JSON Path: `$[?(@.metric == "ZooKeeperSession")].value.first()`
ClickHouse: ZooKeeper watches	Number of watches (e.g., event subscriptions) in ZooKeeper.	Dependent item	clickhouse.zookeeper.watch Preprocessing JSON Path: `$[?(@.metric == "ZooKeeperWatch")].value.first()`
ClickHouse: ZooKeeper requests	Number of requests to ZooKeeper in progress.	Dependent item	clickhouse.zookeeper.request Preprocessing JSON Path: `$[?(@.metric == "ZooKeeperRequest")].value.first()`
ClickHouse: ZooKeeper wait time	Time spent in waiting for ZooKeeper operations.	Dependent item	clickhouse.zookeeper.wait.time Preprocessing JSON Path: `$[?(@.event == "ZooKeeperWaitMicroseconds")].value.first()` ⛔️Custom on fail: Set value to: `0` Custom multiplier: `1.0E-6` Change per second
ClickHouse: ZooKeeper exceptions per second	Count of ZooKeeper exceptions that does not belong to user/hardware exceptions.	Dependent item	clickhouse.zookeeper.exceptions.rate Preprocessing JSON Path: `$[?(@.event == "ZooKeeperOtherExceptions")].value.first()` ⛔️Custom on fail: Set value to: `0` Change per second
ClickHouse: ZooKeeper hardware exceptions per second	Count of ZooKeeper exceptions caused by session moved/expired, connection loss, marshalling error, operation timed out and invalid zhandle state.	Dependent item	clickhouse.zookeeper.hw_exceptions.rate Preprocessing JSON Path: `$[?(@.event == "ZooKeeperHardwareExceptions")].value.first()` ⛔️Custom on fail: Set value to: `0` Change per second
ClickHouse: ZooKeeper user exceptions per second	Count of ZooKeeper exceptions caused by no znodes, bad version, node exists, node empty and no children for ephemeral.	Dependent item	clickhouse.zookeeper.user_exceptions.rate Preprocessing JSON Path: `$[?(@.event == "ZooKeeperUserExceptions")].value.first()` ⛔️Custom on fail: Set value to: `0` Change per second
ClickHouse: Read syscalls in fly	Number of read (read, pread, io_getevents, etc.) syscalls in fly	Dependent item	clickhouse.read Preprocessing JSON Path: `$[?(@.metric == "Read")].value.first()`
ClickHouse: Write syscalls in fly	Number of write (write, pwrite, io_getevents, etc.) syscalls in fly	Dependent item	clickhouse.write Preprocessing JSON Path: `$[?(@.metric == "Write")].value.first()`
ClickHouse: Allocated bytes	Total number of bytes allocated by the application.	Dependent item	clickhouse.jemalloc.allocated Preprocessing JSON Path: `$[?(@.metric == "jemalloc.allocated")].value.first()`
ClickHouse: Resident memory	Maximum number of bytes in physically resident data pages mapped by the allocator, comprising all pages dedicated to allocator metadata, pages backing active allocations, and unused dirty pages.	Dependent item	clickhouse.jemalloc.resident Preprocessing JSON Path: `$[?(@.metric == "jemalloc.resident")].value.first()`
ClickHouse: Mapped memory	Total number of bytes in active extents mapped by the allocator.	Dependent item	clickhouse.jemalloc.mapped Preprocessing JSON Path: `$[?(@.metric == "jemalloc.mapped")].value.first()`
ClickHouse: Memory used for queries	Total amount of memory (bytes) allocated in currently executing queries.	Dependent item	clickhouse.memory.tracking Preprocessing JSON Path: `$[?(@.metric == "MemoryTracking")].value.first()`
ClickHouse: Memory used for background merges	Total amount of memory (bytes) allocated in background processing pool (that is dedicated for background merges, mutations and fetches). Note that this value may include a drift when the memory was allocated in a context of background processing pool and freed in other context or vice-versa. This happens naturally due to caches for tables indexes and doesn't indicate memory leaks.	Dependent item	clickhouse.memory.tracking.background Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Set value to: `0`
ClickHouse: Memory used for background moves	Total amount of memory (bytes) allocated in background processing pool (that is dedicated for background moves). Note that this value may include a drift when the memory was allocated in a context of background processing pool and freed in other context or vice-versa. This happens naturally due to caches for tables indexes and doesn't indicate memory leaks.	Dependent item	clickhouse.memory.tracking.background.moves Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Set value to: `0`
ClickHouse: Memory used for background schedule pool	Total amount of memory (bytes) allocated in background schedule pool (that is dedicated for bookkeeping tasks of Replicated tables).	Dependent item	clickhouse.memory.tracking.schedule.pool Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Set value to: `0`
ClickHouse: Memory used for merges	Total amount of memory (bytes) allocated for background merges. Included in MemoryTrackingInBackgroundProcessingPool. Note that this value may include a drift when the memory was allocated in a context of background processing pool and freed in other context or vice-versa. This happens naturally due to caches for tables indexes and doesn't indicate memory leaks.	Dependent item	clickhouse.memory.tracking.merges Preprocessing JSON Path: `$[?(@.metric == "MemoryTrackingForMerges")].value.first()` ⛔️Custom on fail: Set value to: `0`
ClickHouse: Current distributed files to insert	Number of pending files to process for asynchronous insertion into Distributed tables. Number of files for every shard is summed.	Dependent item	clickhouse.distributed.files Preprocessing JSON Path: `$[?(@.metric == "DistributedFilesToInsert")].value.first()`
ClickHouse: Distributed connection fail with retry per second	Connection retries in replicated DB connection pool	Dependent item	clickhouse.distributed.files.retry.rate Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Set value to: `0` Change per second
ClickHouse: Distributed connection fail with retry per second	Connection failures after all retries in replicated DB connection pool	Dependent item	clickhouse.distributed.files.fail.rate Preprocessing JSON Path: `The text is too long. Please see the template.` ⛔️Custom on fail: Set value to: `0` Change per second
ClickHouse: Replication lag across all tables	Maximum replica queue delay relative to current time	Dependent item	clickhouse.replicas.max.absolute.delay Preprocessing JSON Path: `$[?(@.metric == "ReplicasMaxAbsoluteDelay")].value.first()`
ClickHouse: Total replication tasks in queue	Number of replication tasks in queue	Dependent item	clickhouse.replicas.sum.queue.size Preprocessing JSON Path: `$[?(@.metric == "ReplicasSumQueueSize")].value.first()`
ClickHouse: Total number read-only Replicas	Number of Replicated tables that are currently in readonly state due to re-initialization after ZooKeeper session loss or due to startup without ZooKeeper configured.	Dependent item	clickhouse.replicas.readonly.total Preprocessing JSON Path: `$[?(@.metric == "ReadonlyReplica")].value.first()`
ClickHouse: Get replicas info	Get information about replicas.	HTTP agent	clickhouse.replicas Preprocessing JSON Path: `$.data`
ClickHouse: Get databases info	Get information about databases.	HTTP agent	clickhouse.databases Preprocessing JSON Path: `$.data`
ClickHouse: Get tables info	Get information about tables.	HTTP agent	clickhouse.tables Preprocessing JSON Path: `$.data`
ClickHouse: Get dictionaries info	Get information about dictionaries.	HTTP agent	clickhouse.dictionaries Preprocessing JSON Path: `$.data`

Triggers

Name	Description	Expression	Severity
ClickHouse: Configuration has been changed	ClickHouse configuration has been changed. Acknowledge to close the problem manually.	`last(/ClickHouse by HTTP/clickhouse.system.settings,#1)<>last(/ClickHouse by HTTP/clickhouse.system.settings,#2) and length(last(/ClickHouse by HTTP/clickhouse.system.settings))>0`\|Info	Manual close: Yes
ClickHouse: There are queries running is long		`last(/ClickHouse by HTTP/clickhouse.process.elapsed)>{$CLICKHOUSE.QUERY_TIME.MAX.WARN}`\|Average	Manual close: Yes
ClickHouse: Port {$CLICKHOUSE.PORT} is unavailable		`last(/ClickHouse by HTTP/net.tcp.service[{$CLICKHOUSE.SCHEME},"{HOST.CONN}","{$CLICKHOUSE.PORT}"])=0`\|Average	Manual close: Yes
ClickHouse: Service is down		`last(/ClickHouse by HTTP/clickhouse.ping)=0 or last(/ClickHouse by HTTP/net.tcp.service[{$CLICKHOUSE.SCHEME},"{HOST.CONN}","{$CLICKHOUSE.PORT}"]) = 0`\|Average	Manual close: Yes Depends on: ClickHouse: Port {$CLICKHOUSE.PORT} is unavailable
ClickHouse: Version has changed	The ClickHouse version has changed. Acknowledge to close the problem manually.	`last(/ClickHouse by HTTP/clickhouse.version,#1)<>last(/ClickHouse by HTTP/clickhouse.version,#2) and length(last(/ClickHouse by HTTP/clickhouse.version))>0`\|Info	Manual close: Yes
ClickHouse: Host has been restarted	The host uptime is less than 10 minutes.	`last(/ClickHouse by HTTP/clickhouse.uptime)<10m`\|Info	Manual close: Yes
ClickHouse: Failed to fetch info data	Zabbix has not received any data for items for the last 30 minutes.	`nodata(/ClickHouse by HTTP/clickhouse.uptime,30m)=1`\|Warning	Manual close: Yes Depends on: ClickHouse: Service is down
ClickHouse: Too many throttled insert queries	Clickhouse have INSERT queries that are throttled due to high number of active data parts for partition in a MergeTree, please decrease INSERT frequency	`min(/ClickHouse by HTTP/clickhouse.insert.delay,5m)>{$CLICKHOUSE.DELAYED.INSERTS.MAX.WARN}`\|Warning	Manual close: Yes
ClickHouse: Too many MergeTree parts	Descease INSERT queries frequency. Clickhouse MergeTree table engine split each INSERT query to partitions (PARTITION BY expression) and add one or more PARTS per INSERT inside each partition, after that background merge process run, and when you have too much unmerged parts inside partition, SELECT queries performance can significate degrade, so clickhouse try delay insert, or abort it.	`min(/ClickHouse by HTTP/clickhouse.max.part.count.for.partition,5m)>{$CLICKHOUSE.PARTS.PER.PARTITION.WARN} * 0.9`\|Warning	Manual close: Yes
ClickHouse: Too many network errors	Number of errors (timeouts and connection failures) during query execution, background pool tasks and DNS cache update is too high.	`min(/ClickHouse by HTTP/clickhouse.network.error.rate,5m)>{$CLICKHOUSE.NETWORK.ERRORS.MAX.WARN}`\|Warning
ClickHouse: Too many ZooKeeper sessions opened	Number of sessions (connections) to ZooKeeper. Should be no more than one, because using more than one connection to ZooKeeper may lead to bugs due to lack of linearizability (stale reads) that ZooKeeper consistency model allows.	`min(/ClickHouse by HTTP/clickhouse.zookeeper.session,5m)>1`\|Warning
ClickHouse: Too many distributed files to insert	Clickhouse servers and in config.xml (https://clickhouse.tech/docs/en/operations/table_engines/distributed/)	`min(/ClickHouse by HTTP/clickhouse.distributed.files,5m)>{$CLICKHOUSE.DELAYED.FILES.DISTRIBUTED.COUNT.MAX.WARN}`\|Warning	Manual close: Yes
ClickHouse: Replication lag is too high	When replica have too much lag, it can be skipped from Distributed SELECT Queries without errors and you will have wrong query results.	`min(/ClickHouse by HTTP/clickhouse.replicas.max.absolute.delay,5m)>{$CLICKHOUSE.REPLICA.MAX.WARN}`\|Warning	Manual close: Yes

LLD rule Tables

Name	Description	Type	Key and additional info
Tables	Info about tables	Dependent item	clickhouse.tables.discovery

Item prototypes for Tables

Name	Description	Type	Key and additional info
ClickHouse: {#DB}.{#TABLE}: Get table info	The item gets information about {#TABLE} table of {#DB} database.	Dependent item	clickhouse.table.info_raw["{#DB}.{#TABLE}"] Preprocessing JSON Path: `$[?(@.database == "{#DB}" && @.table == "{#TABLE}")].first()` ⛔️Custom on fail: Discard value
ClickHouse: {#DB}.{#TABLE}: Bytes	Table size in bytes. Database: {#DB}, table: {#TABLE}	Dependent item	clickhouse.table.bytes["{#DB}.{#TABLE}"] Preprocessing JSON Path: `$.bytes`
ClickHouse: {#DB}.{#TABLE}: Parts	Number of parts of the table. Database: {#DB}, table: {#TABLE}	Dependent item	clickhouse.table.parts["{#DB}.{#TABLE}"] Preprocessing JSON Path: `$.parts`
ClickHouse: {#DB}.{#TABLE}: Rows	Number of rows in the table. Database: {#DB}, table: {#TABLE}	Dependent item	clickhouse.table.rows["{#DB}.{#TABLE}"] Preprocessing JSON Path: `$.rows`

LLD rule Replicas

Name	Description	Type	Key and additional info
Replicas	Info about replicas	Dependent item	clickhouse.replicas.discovery

Item prototypes for Replicas

Name	Description	Type	Key and additional info
ClickHouse: {#DB}.{#TABLE}: Get replicas info	The item gets information about replicas of {#TABLE} table of {#DB} database.	Dependent item	clickhouse.replica.info_raw["{#DB}.{#TABLE}"] Preprocessing JSON Path: `$[?(@.database == "{#DB}" && @.table == "{#TABLE}")].first()` ⛔️Custom on fail: Discard value
ClickHouse: {#DB}.{#TABLE}: Replica readonly	Whether the replica is in read-only mode. This mode is turned on if the config doesn't have sections with ZooKeeper, if an unknown error occurred when re-initializing sessions in ZooKeeper, and during session re-initialization in ZooKeeper.	Dependent item	clickhouse.replica.is_readonly["{#DB}.{#TABLE}"] Preprocessing JSON Path: `$.is_readonly`
ClickHouse: {#DB}.{#TABLE}: Replica session expired	True if the ZooKeeper session expired	Dependent item	clickhouse.replica.issessionexpired["{#DB}.{#TABLE}"] Preprocessing JSON Path: `$.is_session_expired`
ClickHouse: {#DB}.{#TABLE}: Replica future parts	Number of data parts that will appear as the result of INSERTs or merges that haven't been done yet.	Dependent item	clickhouse.replica.future_parts["{#DB}.{#TABLE}"] Preprocessing JSON Path: `$.future_parts`
ClickHouse: {#DB}.{#TABLE}: Replica parts to check	Number of data parts in the queue for verification. A part is put in the verification queue if there is suspicion that it might be damaged.	Dependent item	clickhouse.replica.partstocheck["{#DB}.{#TABLE}"] Preprocessing JSON Path: `$.parts_to_check`
ClickHouse: {#DB}.{#TABLE}: Replica queue size	Size of the queue for operations waiting to be performed.	Dependent item	clickhouse.replica.queue_size["{#DB}.{#TABLE}"] Preprocessing JSON Path: `$.queue_size`
ClickHouse: {#DB}.{#TABLE}: Replica queue inserts size	Number of inserts of blocks of data that need to be made.	Dependent item	clickhouse.replica.insertsinqueue["{#DB}.{#TABLE}"] Preprocessing JSON Path: `$.inserts_in_queue`
ClickHouse: {#DB}.{#TABLE}: Replica queue merges size	Number of merges waiting to be made.	Dependent item	clickhouse.replica.mergesinqueue["{#DB}.{#TABLE}"] Preprocessing JSON Path: `$.merges_in_queue`
ClickHouse: {#DB}.{#TABLE}: Replica log max index	Maximum entry number in the log of general activity. (Have a non-zero value only where there is an active session with ZooKeeper).	Dependent item	clickhouse.replica.logmaxindex["{#DB}.{#TABLE}"] Preprocessing JSON Path: `$.log_max_index`
ClickHouse: {#DB}.{#TABLE}: Replica log pointer	Maximum entry number in the log of general activity that the replica copied to its execution queue, plus one. (Have a non-zero value only where there is an active session with ZooKeeper).	Dependent item	clickhouse.replica.log_pointer["{#DB}.{#TABLE}"] Preprocessing JSON Path: `$.log_pointer`
ClickHouse: {#DB}.{#TABLE}: Total replicas	Total number of known replicas of this table. (Have a non-zero value only where there is an active session with ZooKeeper).	Dependent item	clickhouse.replica.total_replicas["{#DB}.{#TABLE}"] Preprocessing JSON Path: `$.total_replicas`
ClickHouse: {#DB}.{#TABLE}: Active replicas	Number of replicas of this table that have a session in ZooKeeper (i.e., the number of functioning replicas). (Have a non-zero value only where there is an active session with ZooKeeper).	Dependent item	clickhouse.replica.active_replicas["{#DB}.{#TABLE}"] Preprocessing JSON Path: `$.active_replicas`
ClickHouse: {#DB}.{#TABLE}: Replica lag	Difference between logmaxindex and log_pointer	Dependent item	clickhouse.replica.lag["{#DB}.{#TABLE}"] Preprocessing JSON Path: `$.replica_lag`

Trigger prototypes for Replicas

Name	Description	Expression
ClickHouse: {#DB}.{#TABLE} Replica is readonly	This mode is turned on if the config doesn't have sections with ZooKeeper, if an unknown error occurred when re-initializing sessions in ZooKeeper, and during session re-initialization in ZooKeeper.	`min(/ClickHouse by HTTP/clickhouse.replica.is_readonly["{#DB}.{#TABLE}"],5m)=1`\|Warning
ClickHouse: {#DB}.{#TABLE} Replica session is expired	This mode is turned on if the config doesn't have sections with ZooKeeper, if an unknown error occurred when re-initializing sessions in ZooKeeper, and during session re-initialization in ZooKeeper.	`min(/ClickHouse by HTTP/clickhouse.replica.is_session_expired["{#DB}.{#TABLE}"],5m)=1`\|Warning
ClickHouse: {#DB}.{#TABLE}: Too many operations in queue		`min(/ClickHouse by HTTP/clickhouse.replica.queue_size["{#DB}.{#TABLE}"],5m)>{$CLICKHOUSE.QUEUE.SIZE.MAX.WARN:"{#TABLE}"}`\|Warning
ClickHouse: {#DB}.{#TABLE}: Number of active replicas less than number of total replicas		`max(/ClickHouse by HTTP/clickhouse.replica.active_replicas["{#DB}.{#TABLE}"],5m) < last(/ClickHouse by HTTP/clickhouse.replica.total_replicas["{#DB}.{#TABLE}"])`\|Warning
ClickHouse: {#DB}.{#TABLE}: Difference between logmaxindex and log_pointer is too high		`min(/ClickHouse by HTTP/clickhouse.replica.lag["{#DB}.{#TABLE}"],5m) > {$CLICKHOUSE.LOG_POSITION.DIFF.MAX.WARN}`\|Warning

LLD rule Dictionaries

Name	Description	Type	Key and additional info
Dictionaries	Info about dictionaries	Dependent item	clickhouse.dictionaries.discovery

Item prototypes for Dictionaries

Name	Description	Type	Key and additional info
ClickHouse: Dictionary {#NAME}: Get dictionary info	The item gets information about {#NAME} dictionary.	Dependent item	clickhouse.dictionary.info_raw["{#NAME}"] Preprocessing JSON Path: `$[?(@.name == "{#NAME}")].first()` ⛔️Custom on fail: Discard value
ClickHouse: Dictionary {#NAME}: Bytes allocated	The amount of RAM the dictionary uses.	Dependent item	clickhouse.dictionary.bytes_allocated["{#NAME}"] Preprocessing JSON Path: `$.bytes_allocated`
ClickHouse: Dictionary {#NAME}: Element count	Number of items stored in the dictionary.	Dependent item	clickhouse.dictionary.element_count["{#NAME}"] Preprocessing JSON Path: `$.element_count`
ClickHouse: Dictionary {#NAME}: Load factor	The percentage filled in the dictionary (for a hashed dictionary, the percentage filled in the hash table).	Dependent item	clickhouse.dictionary.load_factor["{#NAME}"] Preprocessing JSON Path: `$.bytes_allocated` Custom multiplier: `100`

LLD rule Databases

Name	Description	Type	Key and additional info
Databases	Info about databases	Dependent item	clickhouse.db.discovery

Item prototypes for Databases

Name Description Type Key and additional info

ClickHouse: {#DB}: Get DB info

The item gets information about {#DB} database.

Dependent item

clickhouse.db.info_raw["{#DB}"]

Preprocessing

JSON Path: $[?(@.database == "{#DB}")].first()
⛔️Custom on fail: Discard value

ClickHouse: {#DB}: Bytes

Database size in bytes.

Dependent item

clickhouse.db.bytes["{#DB}"]

Preprocessing

JSON Path: $.bytes

ClickHouse: {#DB}: Tables

Number of tables in {#DB} database.

Dependent item

clickhouse.db.tables["{#DB}"]

Preprocessing

JSON Path: $.tables

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template, or ask for help at ZABBIX forums

db_cassandra_jmx

View README Download JSON

Apache Cassandra by JMX

Overview

This template is designed for the effortless deployment of Apache Cassandra monitoring by Zabbix via JMX and doesn't require any external scripts.

Requirements

Zabbix version: 6.0 and higher.

Tested versions

This template has been tested on:

Apache Cassandra 3.11.8

Configuration

Setup

This template works with standalone and cluster instances. Metrics are collected by JMX.

Enable and configure JMX access to Apache cassandra. See documentation for instructions.
Set the user name and password in host macros {$CASSANDRA.USER} and {$CASSANDRA.PASSWORD}.

Macros used

Name	Description	Default
{$CASSANDRA.USER}		`zabbix`
{$CASSANDRA.PASSWORD}		`zabbix`
{$CASSANDRA.KEY_SPACE.MATCHES}	Filter of discoverable key spaces	`.*`
{$CASSANDRA.KEYSPACE.NOTMATCHES}	Filter to exclude discovered key spaces	`(system\|system_auth\|system_distributed\|system_schema)`
{$CASSANDRA.PENDING_TASKS.MAX.HIGH}		`500`
{$CASSANDRA.PENDING_TASKS.MAX.WARN}		`350`

Items

Name	Description	Type	Key and additional info
Apache Cassandra: Cluster - Nodes down		JMX agent	jmx["org.apache.cassandra.net:type=FailureDetector","DownEndpointCount"] Preprocessing Discard unchanged with heartbeat: `1h`
Apache Cassandra: Cluster - Nodes up		JMX agent	jmx["org.apache.cassandra.net:type=FailureDetector","UpEndpointCount"] Preprocessing Discard unchanged with heartbeat: `1h`
Apache Cassandra: Cluster - Name		JMX agent	jmx["org.apache.cassandra.db:type=StorageService","ClusterName"] Preprocessing Discard unchanged with heartbeat: `1h`
Apache Cassandra: Version		JMX agent	jmx["org.apache.cassandra.db:type=StorageService","ReleaseVersion"] Preprocessing Discard unchanged with heartbeat: `1h`
Apache Cassandra: Dropped messages - Write (Mutation)	Number of dropped regular writes messages.	JMX agent	jmx["org.apache.cassandra.metrics:type=DroppedMessage,scope=MUTATION,name=Dropped","Count"]
Apache Cassandra: Dropped messages - Read	Number of dropped regular reads messages.	JMX agent	jmx["org.apache.cassandra.metrics:type=DroppedMessage,scope=READ,name=Dropped","Count"]
Apache Cassandra: Storage - Used (bytes)	Size, in bytes, of the on disk data size this node manages.	JMX agent	jmx["org.apache.cassandra.metrics:type=Storage,name=Load","Count"]
Apache Cassandra: Storage - Errors	Number of internal exceptions caught. Under normal exceptions this should be zero.	JMX agent	jmx["org.apache.cassandra.metrics:type=Storage,name=Exceptions","Count"]
Apache Cassandra: Storage - Hints	Number of hint messages written to this node since [re]start. Includes one entry for each host to be hinted per hint.	JMX agent	jmx["org.apache.cassandra.metrics:type=Storage,name=TotalHints","Count"]
Apache Cassandra: Compaction - Number of completed tasks	Number of completed compactions since server [re]start.	JMX agent	jmx["org.apache.cassandra.metrics:name=CompletedTasks,type=Compaction","Value"]
Apache Cassandra: Compaction - Total compactions completed	Throughput of completed compactions since server [re]start.	JMX agent	jmx["org.apache.cassandra.metrics:name=TotalCompactionsCompleted,type=Compaction","Count"]
Apache Cassandra: Compaction - Pending tasks	Estimated number of compactions remaining to perform.	JMX agent	jmx["org.apache.cassandra.metrics:type=Compaction,name=PendingTasks","Value"]
Apache Cassandra: Commitlog - Pending tasks	Number of commit log messages written but yet to be fsync'd.	JMX agent	jmx["org.apache.cassandra.metrics:name=PendingTasks,type=CommitLog","Value"]
Apache Cassandra: Commitlog - Total size	Current size, in bytes, used by all the commit log segments.	JMX agent	jmx["org.apache.cassandra.metrics:name=TotalCommitLogSize,type=CommitLog","Value"]
Apache Cassandra: Latency - Read median	Latency read from disk in milliseconds - median.	JMX agent	jmx["org.apache.cassandra.metrics:name=ReadLatency,type=Table","50thPercentile"] Preprocessing Custom multiplier: `0.001`
Apache Cassandra: Latency - Read 75 percentile	Latency read from disk in milliseconds - p75.	JMX agent	jmx["org.apache.cassandra.metrics:name=ReadLatency,type=Table","75thPercentile"] Preprocessing Custom multiplier: `0.001`
Apache Cassandra: Latency - Read 95 percentile	Latency read from disk in milliseconds - p95.	JMX agent	jmx["org.apache.cassandra.metrics:name=ReadLatency,type=Table","95thPercentile"] Preprocessing Custom multiplier: `0.001`
Apache Cassandra: Latency - Write median	Latency write to disk in milliseconds - median.	JMX agent	jmx["org.apache.cassandra.metrics:name=WriteLatency,type=Table","50thPercentile"] Preprocessing Custom multiplier: `0.001`
Apache Cassandra: Latency - Write 75 percentile	Latency write to disk in milliseconds - p75.	JMX agent	jmx["org.apache.cassandra.metrics:name=WriteLatency,type=Table","75thPercentile"] Preprocessing Custom multiplier: `0.001`
Apache Cassandra: Latency - Write 95 percentile	Latency write to disk in milliseconds - p95.	JMX agent	jmx["org.apache.cassandra.metrics:name=WriteLatency,type=Table","95thPercentile"] Preprocessing Custom multiplier: `0.001`
Apache Cassandra: Latency - Client request read median	Total latency serving data to clients in milliseconds - median.	JMX agent	jmx["org.apache.cassandra.metrics:type=ClientRequest,scope=Read,name=Latency","50thPercentile"] Preprocessing Custom multiplier: `0.001`
Apache Cassandra: Latency - Client request read 75 percentile	Total latency serving data to clients in milliseconds - p75.	JMX agent	jmx["org.apache.cassandra.metrics:type=ClientRequest,scope=Read,name=Latency","75thPercentile"] Preprocessing Custom multiplier: `0.001`
Apache Cassandra: Latency - Client request read 95 percentile	Total latency serving data to clients in milliseconds - p95.	JMX agent	jmx["org.apache.cassandra.metrics:type=ClientRequest,scope=Read,name=Latency","95thPercentile"] Preprocessing Custom multiplier: `0.001`
Apache Cassandra: Latency - Client request write median	Total latency serving write requests from clients in milliseconds - median.	JMX agent	jmx["org.apache.cassandra.metrics:type=ClientRequest,scope=Write,name=Latency","50thPercentile"] Preprocessing Custom multiplier: `0.001`
Apache Cassandra: Latency - Client request write 75 percentile	Total latency serving write requests from clients in milliseconds - p75.	JMX agent	jmx["org.apache.cassandra.metrics:type=ClientRequest,scope=Write,name=Latency","75thPercentile"] Preprocessing Custom multiplier: `0.001`
Apache Cassandra: Latency - Client request write 95 percentile	Total latency serving write requests from clients in milliseconds - p95.	JMX agent	jmx["org.apache.cassandra.metrics:type=ClientRequest,scope=Write,name=Latency","95thPercentile"] Preprocessing Custom multiplier: `0.001`
Apache Cassandra: KeyCache - Capacity	Cache capacity in bytes.	JMX agent	jmx["org.apache.cassandra.metrics:type=Cache,scope=KeyCache,name=Capacity","Value"] Preprocessing Discard unchanged with heartbeat: `1h`
Apache Cassandra: KeyCache - Entries	Total number of cache entries.	JMX agent	jmx["org.apache.cassandra.metrics:type=Cache,scope=KeyCache,name=Entries","Value"]
Apache Cassandra: KeyCache - HitRate	All time cache hit rate.	JMX agent	jmx["org.apache.cassandra.metrics:type=Cache,scope=KeyCache,name=HitRate","Value"] Preprocessing Custom multiplier: `100`
Apache Cassandra: KeyCache - Hits per second	Rate of cache hits.	JMX agent	jmx["org.apache.cassandra.metrics:type=Cache,scope=KeyCache,name=Hits","Count"] Preprocessing Change per second
Apache Cassandra: KeyCache - requests per second	Rate of cache requests.	JMX agent	jmx["org.apache.cassandra.metrics:type=Cache,scope=KeyCache,name=Requests","Count"] Preprocessing Change per second
Apache Cassandra: KeyCache - Size	Total size of occupied cache, in bytes.	JMX agent	jmx["org.apache.cassandra.metrics:type=Cache,scope=KeyCache,name=Size","Value"]
Apache Cassandra: Client connections - Native	Number of clients connected to this nodes native protocol server.	JMX agent	jmx["org.apache.cassandra.metrics:type=Client,name=connectedNativeClients","Value"]
Apache Cassandra: Client connections - Trifts	Number of connected to this nodes thrift clients.	JMX agent	jmx["org.apache.cassandra.metrics:type=Client,name=connectedThriftClients","Value"]
Apache Cassandra: Client request - Read per second	The number of client requests per second.	JMX agent	jmx["org.apache.cassandra.metrics:type=ClientRequest,scope=Read,name=Latency","Count"] Preprocessing Change per second
Apache Cassandra: Client request - Write per second	The number of local write requests per second.	JMX agent	jmx["org.apache.cassandra.metrics:type=ClientRequest,scope=Write,name=Latency","Count"] Preprocessing Change per second
Apache Cassandra: Client request - Write Timeouts	Number of write requests timeouts encountered.	JMX agent	jmx["org.apache.cassandra.metrics:type=ClientRequest,scope=Write,name=Timeouts","Count"]
Apache Cassandra: Thread pool.MutationStage - Pending tasks	Number of queued tasks queued up on this pool. MutationStage: Responsible for writes (exclude materialized and counter writes).	JMX agent	jmx["org.apache.cassandra.metrics:type=ThreadPools,path=request,scope=MutationStage,name=PendingTasks","Value"]
Apache Cassandra: Thread pool MutationStage - Currently blocked task	Number of tasks that are currently blocked due to queue saturation but on retry will become unblocked. MutationStage: Responsible for writes (exclude materialized and counter writes).	JMX agent	jmx["org.apache.cassandra.metrics:type=ThreadPools,path=request,scope=MutationStage,name=CurrentlyBlockedTasks","Count"]
Apache Cassandra: Thread pool MutationStage - Total blocked tasks	Number of tasks that were blocked due to queue saturation. MutationStage: Responsible for writes (exclude materialized and counter writes).	JMX agent	jmx["org.apache.cassandra.metrics:type=ThreadPools,path=request,scope=MutationStage,name=TotalBlockedTasks","Count"]
Apache Cassandra: Thread pool CounterMutationStage - Pending tasks	Number of queued tasks queued up on this pool. CounterMutationStage: Responsible for counter writes.	JMX agent	jmx["org.apache.cassandra.metrics:type=ThreadPools,path=request,scope=CounterMutationStage,name=PendingTasks","Value"]
Apache Cassandra: Thread pool CounterMutationStage - Currently blocked task	Number of tasks that are currently blocked due to queue saturation but on retry will become unblocked. CounterMutationStage: Responsible for counter writes.	JMX agent	jmx["org.apache.cassandra.metrics:type=ThreadPools,path=request,scope=CounterMutationStage,name=CurrentlyBlockedTasks","Count"]
Apache Cassandra: Thread pool CounterMutationStage - Total blocked tasks	Number of tasks that were blocked due to queue saturation. CounterMutationStage: Responsible for counter writes.	JMX agent	jmx["org.apache.cassandra.metrics:type=ThreadPools,path=request,scope=CounterMutationStage,name=TotalBlockedTasks","Count"]
Apache Cassandra: Thread pool ReadStage - Pending tasks	Number of queued tasks queued up on this pool. ReadStage: Local reads run on this thread pool.	JMX agent	jmx["org.apache.cassandra.metrics:type=ThreadPools,path=request,scope=ReadStage,name=PendingTasks","Value"]
Apache Cassandra: Thread pool ReadStage - Currently blocked task	Number of tasks that are currently blocked due to queue saturation but on retry will become unblocked. ReadStage: Local reads run on this thread pool.	JMX agent	jmx["org.apache.cassandra.metrics:type=ThreadPools,path=request,scope=ReadStage,name=CurrentlyBlockedTasks","Count"]
Apache Cassandra: Thread pool ReadStage - Total blocked tasks	Number of tasks that were blocked due to queue saturation. ReadStage: Local reads run on this thread pool.	JMX agent	jmx["org.apache.cassandra.metrics:type=ThreadPools,path=request,scope=ReadStage,name=TotalBlockedTasks","Count"]
Apache Cassandra: Thread pool ViewMutationStage - Pending tasks	Number of queued tasks queued up on this pool. ViewMutationStage: Responsible for materialized view writes.	JMX agent	jmx["org.apache.cassandra.metrics:type=ThreadPools,path=request,scope=ViewMutationStage,name=PendingTasks","Value"]
Apache Cassandra: Thread pool ViewMutationStage - Currently blocked task	Number of tasks that are currently blocked due to queue saturation but on retry will become unblocked. ViewMutationStage: Responsible for materialized view writes.	JMX agent	jmx["org.apache.cassandra.metrics:type=ThreadPools,path=request,scope=ViewMutationStage,name=CurrentlyBlockedTasks","Count"]
Apache Cassandra: Thread pool ViewMutationStage - Total blocked tasks	Number of tasks that were blocked due to queue saturation. ViewMutationStage: Responsible for materialized view writes.	JMX agent	jmx["org.apache.cassandra.metrics:type=ThreadPools,path=request,scope=ViewMutationStage,name=TotalBlockedTasks","Count"]
Apache Cassandra: Thread pool MemtableFlushWriter - Pending tasks	Number of queued tasks queued up on this pool. MemtableFlushWriter: Writes memtables to disk.	JMX agent	jmx["org.apache.cassandra.metrics:type=ThreadPools,path=internal,scope=MemtableFlushWriter,name=PendingTasks","Value"]
Apache Cassandra: Thread pool MemtableFlushWriter - Currently blocked task	Number of tasks that are currently blocked due to queue saturation but on retry will become unblocked. MemtableFlushWriter: Writes memtables to disk.	JMX agent	jmx["org.apache.cassandra.metrics:type=ThreadPools,path=internal,scope=MemtableFlushWriter,name=CurrentlyBlockedTasks","Count"]
Apache Cassandra: Thread pool MemtableFlushWriter - Total blocked tasks	Number of tasks that were blocked due to queue saturation. MemtableFlushWriter: Writes memtables to disk.	JMX agent	jmx["org.apache.cassandra.metrics:type=ThreadPools,path=internal,scope=MemtableFlushWriter,name=TotalBlockedTasks","Count"]
Apache Cassandra: Thread pool HintsDispatcher - Pending tasks	Number of queued tasks queued up on this pool. HintsDispatcher: Performs hinted handoff.	JMX agent	jmx["org.apache.cassandra.metrics:type=ThreadPools,path=internal,scope=HintsDispatcher,name=PendingTasks","Value"]
Apache Cassandra: Thread pool HintsDispatcher - Currently blocked task	Number of tasks that are currently blocked due to queue saturation but on retry will become unblocked. HintsDispatcher: Performs hinted handoff.	JMX agent	jmx["org.apache.cassandra.metrics:type=ThreadPools,path=internal,scope=HintsDispatcher,name=CurrentlyBlockedTasks","Count"]
Apache Cassandra: Thread pool HintsDispatcher - Total blocked tasks	Number of tasks that were blocked due to queue saturation. HintsDispatcher: Performs hinted handoff.	JMX agent	jmx["org.apache.cassandra.metrics:type=ThreadPools,path=internal,scope=HintsDispatcher,name=TotalBlockedTasks","Count"]
Apache Cassandra: Thread pool MemtablePostFlush - Pending tasks	Number of queued tasks queued up on this pool. MemtablePostFlush: Cleans up commit log after memtable is written to disk.	JMX agent	jmx["org.apache.cassandra.metrics:type=ThreadPools,path=internal,scope=MemtablePostFlush,name=PendingTasks","Value"]
Apache Cassandra: Thread pool MemtablePostFlush - Currently blocked task	Number of tasks that are currently blocked due to queue saturation but on retry will become unblocked. MemtablePostFlush: Cleans up commit log after memtable is written to disk.	JMX agent	jmx["org.apache.cassandra.metrics:type=ThreadPools,path=internal,scope=MemtablePostFlush,name=CurrentlyBlockedTasks","Count"]
Apache Cassandra: Thread pool MemtablePostFlush - Total blocked tasks	Number of tasks that were blocked due to queue saturation. MemtablePostFlush: Cleans up commit log after memtable is written to disk.	JMX agent	jmx["org.apache.cassandra.metrics:type=ThreadPools,path=internal,scope=MemtablePostFlush,name=TotalBlockedTasks","Count"]
Apache Cassandra: Thread pool MigrationStage - Pending tasks	Number of queued tasks queued up on this pool. MigrationStage: Runs schema migrations.	JMX agent	jmx["org.apache.cassandra.metrics:type=ThreadPools,path=internal,scope=MigrationStage,name=PendingTasks","Value"]
Apache Cassandra: Thread pool MigrationStage - Currently blocked task	Number of tasks that are currently blocked due to queue saturation but on retry will become unblocked. MigrationStage: Runs schema migrations.	JMX agent	jmx["org.apache.cassandra.metrics:type=ThreadPools,path=internal,scope=MigrationStage,name=CurrentlyBlockedTasks","Count"]
Apache Cassandra: Thread pool MigrationStage - Total blocked tasks	Number of tasks that were blocked due to queue saturation. MigrationStage: Runs schema migrations.	JMX agent	jmx["org.apache.cassandra.metrics:type=ThreadPools,path=internal,scope=MigrationStage,name=TotalBlockedTasks","Count"]
Apache Cassandra: Thread pool MiscStage - Pending tasks	Number of queued tasks queued up on this pool. MiscStage: Miscellaneous tasks run here.	JMX agent	jmx["org.apache.cassandra.metrics:type=ThreadPools,path=internal,scope=MiscStage,name=PendingTasks","Value"]
Apache Cassandra: Thread pool MiscStage - Currently blocked task	Number of tasks that are currently blocked due to queue saturation but on retry will become unblocked. MiscStage: Miscellaneous tasks run here.	JMX agent	jmx["org.apache.cassandra.metrics:type=ThreadPools,path=internal,scope=MiscStage,name=CurrentlyBlockedTasks","Count"]
Apache Cassandra: Thread pool MiscStage - Total blocked tasks	Number of tasks that were blocked due to queue saturation. MiscStage: Miscellaneous tasks run here.	JMX agent	jmx["org.apache.cassandra.metrics:type=ThreadPools,path=internal,scope=MiscStage,name=TotalBlockedTasks","Count"]
Apache Cassandra: Thread pool SecondaryIndexManagement - Pending tasks	Number of queued tasks queued up on this pool. SecondaryIndexManagement: Performs updates to secondary indexes.	JMX agent	jmx["org.apache.cassandra.metrics:type=ThreadPools,path=internal,scope=SecondaryIndexManagement,name=PendingTasks","Value"]
Apache Cassandra: Thread pool SecondaryIndexManagement - Currently blocked task	Number of tasks that are currently blocked due to queue saturation but on retry will become unblocked. SecondaryIndexManagement: Performs updates to secondary indexes.	JMX agent	jmx["org.apache.cassandra.metrics:type=ThreadPools,path=internal,scope=SecondaryIndexManagement,name=CurrentlyBlockedTasks","Count"]
Apache Cassandra: Thread pool SecondaryIndexManagement - Total blocked tasks	Number of tasks that were blocked due to queue saturation. SecondaryIndexManagement: Performs updates to secondary indexes.	JMX agent	jmx["org.apache.cassandra.metrics:type=ThreadPools,path=internal,scope=SecondaryIndexManagement,name=TotalBlockedTasks","Count"]

Triggers

Name	Description	Expression	Severity
Apache Cassandra: There are down nodes in cluster		`last(/Apache Cassandra by JMX/jmx["org.apache.cassandra.net:type=FailureDetector","DownEndpointCount"])>0`\|Average
Apache Cassandra: Version has changed	Cassandra version has changed. Acknowledge to close the problem manually.	`last(/Apache Cassandra by JMX/jmx["org.apache.cassandra.db:type=StorageService","ReleaseVersion"],#1)<>last(/Apache Cassandra by JMX/jmx["org.apache.cassandra.db:type=StorageService","ReleaseVersion"],#2) and length(last(/Apache Cassandra by JMX/jmx["org.apache.cassandra.db:type=StorageService","ReleaseVersion"]))>0`\|Info	Manual close: Yes
Apache Cassandra: Failed to fetch info data	Zabbix has not received data for items for the last 15 minutes	`nodata(/Apache Cassandra by JMX/jmx["org.apache.cassandra.metrics:type=Storage,name=Load","Count"],15m)=1`\|Warning
Apache Cassandra: Too many storage exceptions		`min(/Apache Cassandra by JMX/jmx["org.apache.cassandra.metrics:type=Storage,name=Exceptions","Count"],5m)>0`\|Warning
Apache Cassandra: Many pending tasks		`min(/Apache Cassandra by JMX/jmx["org.apache.cassandra.metrics:type=Compaction,name=PendingTasks","Value"],15m)>{$CASSANDRA.PENDING_TASKS.MAX.WARN}`\|Warning	Depends on: Apache Cassandra: Too many pending tasks
Apache Cassandra: Too many pending tasks		`min(/Apache Cassandra by JMX/jmx["org.apache.cassandra.metrics:type=Compaction,name=PendingTasks","Value"],15m)>{$CASSANDRA.PENDING_TASKS.MAX.HIGH}`\|Average

LLD rule Tables

Name	Description	Type	Key and additional info
Tables	Info about keyspaces and tables	JMX agent	jmx.discovery[beans,"org.apache.cassandra.metrics:type=Table,keyspace=,scope=,name=ReadLatency"]

Item prototypes for Tables

Name	Description	Type	Key and additional info
{#JMXKEYSPACE}.{#JMXSCOPE}: SS Tables per read 75 percentile	The number of SSTable data files accessed per read - p75.	JMX agent	jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=SSTablesPerReadHistogram","75thPercentile"]
{#JMXKEYSPACE}.{#JMXSCOPE}: SS Tables per read 95 percentile	The number of SSTable data files accessed per read - p95.	JMX agent	jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=SSTablesPerReadHistogram","95thPercentile"]
{#JMXKEYSPACE}.{#JMXSCOPE}: Tombstone scanned 75 percentile	Number of tombstones scanned per read - p75.	JMX agent	jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=TombstoneScannedHistogram","75thPercentile"]
{#JMXKEYSPACE}.{#JMXSCOPE}: Tombstone scanned 95 percentile	Number of tombstones scanned per read - p95.	JMX agent	jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=TombstoneScannedHistogram","95thPercentile"]
{#JMXKEYSPACE}.{#JMXSCOPE}: Waiting on free memtable space 75 percentile	The time spent waiting for free memtable space either on- or off-heap - p75.	JMX agent	jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=WaitingOnFreeMemtableSpace","75thPercentile"] Preprocessing Custom multiplier: `0.001`
{#JMXKEYSPACE}.{#JMXSCOPE}: Waiting on free memtable space95 percentile	The time spent waiting for free memtable space either on- or off-heap - p95.	JMX agent	jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=WaitingOnFreeMemtableSpace","95thPercentile"] Preprocessing Custom multiplier: `0.001`
{#JMXKEYSPACE}.{#JMXSCOPE}: Col update time delta75 percentile	The column update time delta - p75.	JMX agent	jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=ColUpdateTimeDeltaHistogram","75thPercentile"] Preprocessing Custom multiplier: `0.001`
{#JMXKEYSPACE}.{#JMXSCOPE}: Col update time delta 95 percentile	The column update time delta - p95.	JMX agent	jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=ColUpdateTimeDeltaHistogram","95thPercentile"] Preprocessing Custom multiplier: `0.001`
{#JMXKEYSPACE}.{#JMXSCOPE}: Bloom filter false ratio	The ratio of Bloom filter false positives to total checks.	JMX agent	jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=BloomFilterFalseRatio","Value"]
{#JMXKEYSPACE}.{#JMXSCOPE}: Compression ratio	The compression ratio for all SSTables.	JMX agent	jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=CompressionRatio","Value"]
{#JMXKEYSPACE}.{#JMXSCOPE}: KeyCache hit rate	The key cache hit rate.	JMX agent	jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=KeyCacheHitRate","Value"]
{#JMXKEYSPACE}.{#JMXSCOPE}: Live SS Table	Number of "live" (in use) SSTables.	JMX agent	jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=LiveSSTableCount","Value"]
{#JMXKEYSPACE}.{#JMXSCOPE}: Max partition size	The size of the largest compacted partition.	JMX agent	jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=MaxPartitionSize","Value"]
{#JMXKEYSPACE}.{#JMXSCOPE}: Mean partition size	The average size of compacted partition.	JMX agent	jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=MeanPartitionSize","Value"]
{#JMXKEYSPACE}.{#JMXSCOPE}: Pending compactions	The number of pending compactions.	JMX agent	jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=PendingCompactions","Value"]
{#JMXKEYSPACE}.{#JMXSCOPE}: Snapshots size	The disk space truly used by snapshots.	JMX agent	jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=SnapshotsSize","Value"]
{#JMXKEYSPACE}.{#JMXSCOPE}: Compaction bytes written	The amount of data that was compacted since (re)start.	JMX agent	jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=CompactionBytesWritten","Count"]
{#JMXKEYSPACE}.{#JMXSCOPE}: Bytes flushed	The amount of data that was flushed since (re)start.	JMX agent	jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=BytesFlushed","Count"]
{#JMXKEYSPACE}.{#JMXSCOPE}: Pending flushes	The number of pending flushes.	JMX agent	jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=PendingFlushes","Count"]
{#JMXKEYSPACE}.{#JMXSCOPE}: Live disk space used	The disk space used by "live" SSTables (only counts in use files).	JMX agent	jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=LiveDiskSpaceUsed","Count"]
{#JMXKEYSPACE}.{#JMXSCOPE}: Disk space used	Disk space used.	JMX agent	jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=TotalDiskSpaceUsed","Count"]
{#JMXKEYSPACE}.{#JMXSCOPE}: Out of row cache hits	The number of row cache hits that do not satisfy the query filter and went to disk.	JMX agent	jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=RowCacheHitOutOfRange","Count"]
{#JMXKEYSPACE}.{#JMXSCOPE}: Row cache hits	The number of row cache hits.	JMX agent	jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=RowCacheHit","Count"]
{#JMXKEYSPACE}.{#JMXSCOPE}: Row cache misses	The number of table row cache misses.	JMX agent	jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=RowCacheMiss","Count"]
{#JMXKEYSPACE}.{#JMXSCOPE}: Read latency 75 percentile	Latency read from disk in milliseconds.	JMX agent	jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=ReadLatency","75thPercentile"] Preprocessing Custom multiplier: `0.001`
{#JMXKEYSPACE}.{#JMXSCOPE}: Read latency 95 percentile	Latency read from disk in milliseconds.	JMX agent	jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=ReadLatency","95thPercentile"] Preprocessing Custom multiplier: `0.001`
{#JMXKEYSPACE}.{#JMXSCOPE}: Read per second	The number of client requests per second.	JMX agent	jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=ReadLatency","Count"] Preprocessing Change per second
{#JMXKEYSPACE}.{#JMXSCOPE}: Write latency 75 percentile	Latency write to disk in milliseconds.	JMX agent	jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=WriteLatency","75thPercentile"] Preprocessing Custom multiplier: `0.001`
{#JMXKEYSPACE}.{#JMXSCOPE}: Write latency 95 percentile	Latency write to disk in milliseconds.	JMX agent	jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=WriteLatency","95thPercentile"] Preprocessing Custom multiplier: `0.001`
{#JMXKEYSPACE}.{#JMXSCOPE}: Write per second	The number of local write requests per second.	JMX agent	jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=WriteLatency","Count"] Preprocessing Change per second

Feedback

Please report any issues with the template at https://support.zabbix.com

You can also provide feedback, discuss the template, or ask for help at ZABBIX forums