This template is designed to monitor YugabyteDB by Zabbix via HTTP and doesn't require any external scripts.
Zabbix version: 6.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Set your account ID as the value of the {$YUGABYTEDB.ACCOUNT.ID} macro. The account ID is the unique identifier for your customer account in YugabyteDB Managed. To get it, log in to YugabyteDB Managed and click the user profile icon. See the YugabyteDB documentation for instructions.
Set your project ID as the value of the {$YUGABYTEDB.PROJECT.ID} macro. The project ID is the unique identifier for a YugabyteDB Managed project and is shown in your profile in the YugabyteDB Managed user interface, along with the account ID. See the YugabyteDB documentation for instructions.
Generate an API access token and specify it as the value of the {$YUGABYTEDB.ACCESS.TOKEN} macro. See the YugabyteDB documentation for instructions.
NOTE: If needed, you can specify an HTTP proxy for the template to use by changing the value of the {$YUGABYTEDB.PROXY} user macro.
IMPORTANT
The value of the {$YUGABYTEDB.ACCESS.TOKEN} macro is stored as plain (not secret) text by default.
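Before importing the template, you can sanity-check the three macro values by calling the YugabyteDB Managed API directly, which is roughly what the template's Script items do. A minimal sketch — the base URL and path here follow the public YugabyteDB Managed API convention and are an assumption, and all IDs are placeholders:

```python
# Sketch: build the request the template's Script items effectively perform.
# ASSUMPTION: base URL/path modeled on the public YugabyteDB Managed API;
# account_id, project_id, and token are placeholders for your own values.
import urllib.request

def build_clusters_request(account_id: str, project_id: str, token: str) -> urllib.request.Request:
    url = (
        "https://cloud.yugabyte.com/api/public/v1"
        f"/accounts/{account_id}/projects/{project_id}/clusters"
    )
    # The access token is sent as a Bearer token, matching {$YUGABYTEDB.ACCESS.TOKEN}.
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})

req = build_clusters_request("my-account-id", "my-project-id", "my-token")
print(req.full_url)  # verify the IDs land in the right path segments
```

Sending the request with `urllib.request.urlopen(req)` should return cluster JSON if the macros are correct; a 401/403 points at the token, a 404 at the account or project ID.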
Name | Description | Default |
---|---|---|
{$YUGABYTEDB.ACCOUNT.ID} | YugabyteDB account ID. | <Put your account ID here> |
{$YUGABYTEDB.PROJECT.ID} | YugabyteDB project ID. | <Put your project ID here> |
{$YUGABYTEDB.ACCESS.TOKEN} | Access token for the YugabyteDB API. | <Put your access token here> |
{$YUGABYTEDB.PROXY} | Sets the HTTP proxy value. If this macro is empty, then no proxy is used. | |
Name | Description | Type | Key and additional info |
---|---|---|---|
YugabyteDB: Get cluster | Get raw data about clusters. | Script | yugabytedb.clusters.get |
YugabyteDB: Get clusters item error | Item for gathering all the cluster item errors. | Dependent item | yugabytedb.clusters.get.errors (Preprocessing) |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
YugabyteDB: Failed to fetch data | Failed to fetch data about clusters. | length(last(/YugabyteDB by HTTP/yugabytedb.clusters.get.errors)) > 0 | Warning | |
Name | Description | Type | Key and additional info |
---|---|---|---|
Cluster discovery | Discovery of the available clusters. | Dependent item | yugabytedb.cluster.discovery (Preprocessing) |
Name | Description | Default |
---|---|---|
{$YUGABYTEDB.CLUSTER.NAME} | Name of the cluster. | <Put your cluster name here> |
{$YUGABYTEDB.CLUSTER.ID} | ID of the cluster. | <Put your cluster ID here> |
{$YUGABYTEDB.MEMORY.CLUSTER.UTILIZATION.WARN} | The percentage of memory use on the cluster - for the Warning trigger expression. | 70 |
{$YUGABYTEDB.MEMORY.CLUSTER.UTILIZATION.CRIT} | The percentage of memory use on the cluster - for the High trigger expression. | 90 |
{$YUGABYTEDB.DISK.UTILIZATION.WARN} | The percentage of disk use in the cluster - for the Warning trigger expression. | 75 |
{$YUGABYTEDB.DISK.UTILIZATION.CRIT} | The percentage of disk use in the cluster - for the High trigger expression. | 90 |
{$YUGABYTEDB.CONNECTION.UTILIZATION.WARN} | The percentage of connections in the cluster - for the Warning trigger expression. | 75 |
{$YUGABYTEDB.CONNECTION.UTILIZATION.CRIT} | The percentage of connections in the cluster - for the High trigger expression. | 90 |
{$YUGABYTEDB.CPU.UTILIZATION.CRIT} | The threshold of CPU utilization for the High trigger expression, expressed in percent. | 90 |
{$YUGABYTEDB.CPU.UTILIZATION.WARN} | The threshold of CPU utilization for the Warning trigger expression, expressed in percent. | 75 |
{$YUGABYTEDB.IOPS.UTILIZATION.WARN} | The percentage of IOPS use on the node - for the Warning trigger expression. | 75 |
{$YUGABYTEDB.IOPS.UTILIZATION.CRIT} | The percentage of IOPS use on the node - for the High trigger expression. | 90 |
{$YUGABYTEDB.PROXY} | Sets the HTTP proxy value. If this macro is empty, then no proxy is used. | |
Name | Description | Type | Key and additional info |
---|---|---|---|
YugabyteDB Cluster: Get cluster | Get raw data about clusters. | Script | yugabytedb.cluster.get |
YugabyteDB Cluster: Get cluster item error | Item for gathering all the cluster item errors. | Dependent item | yugabytedb.cluster.get.errors (Preprocessing) |
YugabyteDB Cluster: Get keyspace | Get raw data about keyspaces. | Script | yugabytedb.keyspace.get |
YugabyteDB Cluster: Get keyspace item error | Item for gathering all the keyspace item errors. | Dependent item | yugabytedb.keyspace.get.errors (Preprocessing) |
YugabyteDB Cluster: Get node | Get raw data about nodes. | Script | yugabytedb.node.get |
YugabyteDB Cluster: Get node item error | Item for gathering all the node item errors. | Dependent item | yugabytedb.node.get.errors (Preprocessing) |
YugabyteDB Cluster: Get cluster metrics | Getting metrics for the cluster. | Script | yugabytedb.cluster.metric.get |
YugabyteDB Cluster: Get cluster metrics item error | Item for gathering all the cluster metrics item errors. | Dependent item | yugabytedb.cluster.metric.get.errors (Preprocessing) |
YugabyteDB Cluster: Get cluster query statistic | Getting SQL statistics for the cluster. | Script | yugabytedb.cluster.query.statistic.get |
YugabyteDB Cluster: Get cluster query statistic item error | Item for gathering all the cluster query statistics item errors. | Dependent item | yugabytedb.cluster.query.statistic.get.errors (Preprocessing) |
YugabyteDB Cluster: State | The current state of the cluster. One of the following: - INVALID - QUEUED - INIT - BOOTSTRAPPING - VPCPEERING - NETWORKCREATING - PROVISIONING - CONFIGURING - CREATINGLB - UPDATINGLB - ACTIVE - PAUSING - PAUSED - RESUMING - UPDATING - MAINTENANCE - RESTORE - FAILED - CREATEFAILED - DELETING - STARTINGNODE - STOPPINGNODE - REBOOTINGNODE - CREATEREADREPLICAFAILED - DELETEREADREPLICAFAILED - DELETECLUSTERFAILED - EDITCLUSTERFAILED - EDITREADREPLICAFAILED - PAUSECLUSTERFAILED - RESUMECLUSTERFAILED - RESTOREBACKUPFAILED - CERTIFICATEROTATIONFAILED - UPGRADECLUSTERFAILED - UPGRADECLUSTERGFLAGSFAILED - UPGRADECLUSTEROSFAILED - UPGRADECLUSTERSOFTWAREFAILED - STARTNODEFAILED - STOPNODEFAILED - REBOOTNODEFAILED - CONFIGURECMK - ENABLINGCMK - DISABLINGCMK - UPDATINGCMK - ROTATINGCMK - STOPPINGMETRICSEXPORTER - STARTINGMETRICSEXPORTER - CONFIGURINGMETRICSEXPORTER - STOPMETRICSEXPORTERFAILED - STARTMETRICSEXPORTERFAILED - CONFIGUREMETRICSEXPORTERFAILED - REMOVINGMETRICSEXPORTER - REMOVEMETRICSEXPORTER_FAILED | Dependent item | yugabytedb.cluster.state (Preprocessing) |
YugabyteDB Cluster: Type | The kind of cluster deployment: SYNCHRONOUS or GEO_PARTITIONED. | Dependent item | yugabytedb.cluster.type (Preprocessing) |
YugabyteDB Cluster: Number of nodes | The number of nodes in the cluster. | Dependent item | yugabytedb.cluster.node.number (Preprocessing) |
YugabyteDB Cluster: Software version | The current version of YugabyteDB installed on the cluster. | Dependent item | yugabytedb.cluster.software.version (Preprocessing) |
YugabyteDB Cluster: YB controller version | The current version of the YB controller installed on the cluster. | Dependent item | yugabytedb.cluster.ybc.version (Preprocessing) |
YugabyteDB Cluster: Health state | Current state regarding the health of the cluster: - HEALTHY - NEEDS_ATTENTION - UNHEALTHY - UNKNOWN | Dependent item | yugabytedb.cluster.health.state (Preprocessing) |
YugabyteDB Cluster: CPU utilization | The percentage of CPU consumed by the tablet or master server Yugabyte processes, as well as other processes, if any. | Dependent item | yugabytedb.cluster.cpu.utilization (Preprocessing) |
YugabyteDB Cluster: Disk space usage | Shows the amount of disk space used by the cluster. | Dependent item | yugabytedb.cluster.disk.usage (Preprocessing) |
YugabyteDB Cluster: Disk space provisioned | Shows the amount of disk space provisioned for the cluster. | Dependent item | yugabytedb.cluster.disk.provisioned (Preprocessing) |
YugabyteDB Cluster: Disk space utilization | Shows the percentage of disk space used by the cluster. | Calculated | yugabytedb.cluster.disk.utilization |
YugabyteDB Cluster: Disk read, Bps | The number of bytes being read from disk per second, averaged over each node. | Dependent item | yugabytedb.cluster.disk.read.bps (Preprocessing) |
YugabyteDB Cluster: Disk write, Bps | The number of bytes being written to disk per second, averaged over each node. | Dependent item | yugabytedb.cluster.disk.write.bps (Preprocessing) |
YugabyteDB Cluster: Disk read OPS | The number of read operations per second. | Dependent item | yugabytedb.cluster.disk.read.ops (Preprocessing) |
YugabyteDB Cluster: Disk write OPS | The number of write operations per second. | Dependent item | yugabytedb.cluster.disk.write.ops (Preprocessing) |
YugabyteDB Cluster: Average read latency | The average latency of read operations at the tablet level. | Dependent item | yugabytedb.cluster.read.latency (Preprocessing) |
YugabyteDB Cluster: Average write latency | The average latency of write operations at the tablet level. | Dependent item | yugabytedb.cluster.write.latency (Preprocessing) |
YugabyteDB Cluster: YSQL connections limit | The limit of the number of connections to the YSQL backend for all nodes. | Dependent item | yugabytedb.cluster.connection.limit (Preprocessing) |
YugabyteDB Cluster: YSQL connections average used | Cumulative number of connections to the YSQL backend for all nodes. | Dependent item | yugabytedb.cluster.connection.count (Preprocessing) |
YugabyteDB Cluster: YSQL connections utilization | Cumulative number of connections to the YSQL backend for all nodes, expressed in percent. | Calculated | yugabytedb.cluster.connection.utilization |
YugabyteDB Cluster: YSQL connections maximum used | Maximum number of used connections to the YSQL backend for all nodes. | Dependent item | yugabytedb.cluster.connection.max (Preprocessing) |
YugabyteDB Cluster: Clock skew | The clock drift and skew across different nodes. | Dependent item | yugabytedb.cluster.node.skew (Preprocessing) |
YugabyteDB Cluster: Memory total | Shows the amount of RAM provisioned to the cluster. | Dependent item | yugabytedb.cluster.memory.total (Preprocessing) |
YugabyteDB Cluster: Memory usage | Shows the amount of RAM used on the cluster. | Dependent item | yugabytedb.cluster.memory.usage (Preprocessing) |
YugabyteDB Cluster: Memory utilization | Shows the amount of RAM used on the cluster, expressed in percent. | Calculated | yugabytedb.cluster.memory.utilization |
YugabyteDB Cluster: Network receive, Bps | The size of network packets received per second, averaged over nodes. | Dependent item | yugabytedb.cluster.network.receive.bps (Preprocessing) |
YugabyteDB Cluster: Network transmit, Bps | The size of network packets transmitted per second, averaged over nodes. | Dependent item | yugabytedb.cluster.network.transmit.bps (Preprocessing) |
YugabyteDB Cluster: Network receive error, rate | The number of errors related to network packets received per second, averaged over nodes. | Dependent item | yugabytedb.cluster.network.receive.error.rate (Preprocessing) |
YugabyteDB Cluster: Network transmit error, rate | The number of errors related to network packets transmitted per second, averaged over nodes. | Dependent item | yugabytedb.cluster.network.transmit.error.rate (Preprocessing) |
YugabyteDB Cluster: YSQL SELECT OPS | The count of SELECT statements executed through the YSQL API per second. This does not include index writes. | Dependent item | yugabytedb.cluster.ysql.select.ops (Preprocessing) |
YugabyteDB Cluster: YSQL DELETE OPS | The count of DELETE statements executed through the YSQL API per second. This does not include index writes. | Dependent item | yugabytedb.cluster.ysql.delete.ops (Preprocessing) |
YugabyteDB Cluster: YSQL UPDATE OPS | The count of UPDATE statements executed through the YSQL API per second. This does not include index writes. | Dependent item | yugabytedb.cluster.ysql.update.ops (Preprocessing) |
YugabyteDB Cluster: YSQL INSERT OPS | The count of INSERT statements executed through the YSQL API per second. This does not include index writes. | Dependent item | yugabytedb.cluster.ysql.insert.ops (Preprocessing) |
YugabyteDB Cluster: YSQL OTHER OPS | The count of OTHER statements executed through the YSQL API per second. | Dependent item | yugabytedb.cluster.ysql.other.ops (Preprocessing) |
YugabyteDB Cluster: YSQL transaction OPS | The count of transactions executed through the YSQL API per second. | Dependent item | yugabytedb.cluster.ysql.transaction.ops (Preprocessing) |
YugabyteDB Cluster: YSQL SELECT average latency | Average time of executing SELECT statements through the YSQL API. | Dependent item | yugabytedb.cluster.ysql.select.latency (Preprocessing) |
YugabyteDB Cluster: YSQL DELETE average latency | Average time of executing DELETE statements through the YSQL API. | Dependent item | yugabytedb.cluster.ysql.delete.latency (Preprocessing) |
YugabyteDB Cluster: YSQL UPDATE average latency | Average time of executing UPDATE statements through the YSQL API. | Dependent item | yugabytedb.cluster.ysql.update.latency (Preprocessing) |
YugabyteDB Cluster: YSQL INSERT average latency | Average time of executing INSERT statements through the YSQL API. | Dependent item | yugabytedb.cluster.ysql.insert.latency (Preprocessing) |
YugabyteDB Cluster: YSQL OTHER average latency | Average time of executing OTHER statements through the YSQL API. | Dependent item | yugabytedb.cluster.ysql.other.latency (Preprocessing) |
YugabyteDB Cluster: YSQL transaction average latency | Average time of executing transactions through the YSQL API. | Dependent item | yugabytedb.cluster.ysql.transaction.latency (Preprocessing) |
YugabyteDB Cluster: YCQL SELECT OPS | The count of SELECT statements executed through the YCQL API per second. This does not include index writes. | Dependent item | yugabytedb.cluster.ycql.select.ops (Preprocessing) |
YugabyteDB Cluster: YCQL DELETE OPS | The count of DELETE statements executed through the YCQL API per second. This does not include index writes. | Dependent item | yugabytedb.cluster.ycql.delete.ops (Preprocessing) |
YugabyteDB Cluster: YCQL INSERT OPS | The count of INSERT statements executed through the YCQL API per second. This does not include index writes. | Dependent item | yugabytedb.cluster.ycql.insert.ops (Preprocessing) |
YugabyteDB Cluster: YCQL OTHER OPS | The count of OTHER statements executed through the YCQL API per second. | Dependent item | yugabytedb.cluster.ycql.other.ops (Preprocessing) |
YugabyteDB Cluster: YCQL UPDATE OPS | The count of UPDATE statements executed through the YCQL API per second. This does not include index writes. | Dependent item | yugabytedb.cluster.ycql.update.ops (Preprocessing) |
YugabyteDB Cluster: YCQL transaction OPS | The count of transactions executed through the YCQL API per second. | Dependent item | yugabytedb.cluster.ycql.transaction.ops (Preprocessing) |
YugabyteDB Cluster: YCQL SELECT average latency | Average time of executing SELECT statements through the YCQL API. | Dependent item | yugabytedb.cluster.ycql.select.latency (Preprocessing) |
YugabyteDB Cluster: YCQL DELETE average latency | Average time of executing DELETE statements through the YCQL API. | Dependent item | yugabytedb.cluster.ycql.delete.latency (Preprocessing) |
YugabyteDB Cluster: YCQL INSERT average latency | Average time of executing INSERT statements through the YCQL API. | Dependent item | yugabytedb.cluster.ycql.insert.latency (Preprocessing) |
YugabyteDB Cluster: YCQL OTHER average latency | Average time of executing OTHER statements through the YCQL API. | Dependent item | yugabytedb.cluster.ycql.other.latency (Preprocessing) |
YugabyteDB Cluster: YCQL UPDATE average latency | Average time of executing UPDATE statements through the YCQL API. | Dependent item | yugabytedb.cluster.ycql.update.latency (Preprocessing) |
YugabyteDB Cluster: YCQL transaction average latency | Average time of executing transactions through the YCQL API. | Dependent item | yugabytedb.cluster.ycql.transaction.latency (Preprocessing) |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
YugabyteDB Cluster: Failed to fetch data | Failed to fetch data from the YugabyteDB API. | length(last(/YugabyteDB Cluster by HTTP/yugabytedb.node.get.errors)) > 0 or length(last(/YugabyteDB Cluster by HTTP/yugabytedb.keyspace.get.errors)) > 0 or length(last(/YugabyteDB Cluster by HTTP/yugabytedb.cluster.get.errors)) > 0 | Warning | |
YugabyteDB Cluster: Failed to fetch metric data | Failed to fetch cluster metrics or cluster statistics. | length(last(/YugabyteDB Cluster by HTTP/yugabytedb.cluster.query.statistic.get.errors)) > 0 or length(last(/YugabyteDB Cluster by HTTP/yugabytedb.cluster.metric.get.errors)) > 0 | Warning | |
YugabyteDB Cluster: Cluster software version has changed | The YugabyteDB Cluster software version has changed. Acknowledge to close the problem manually. | last(/YugabyteDB Cluster by HTTP/yugabytedb.cluster.software.version,#1) <> last(/YugabyteDB Cluster by HTTP/yugabytedb.cluster.software.version,#2) and length(last(/YugabyteDB Cluster by HTTP/yugabytedb.cluster.software.version)) > 0 | Info | Manual close: Yes |
YugabyteDB Cluster: YB controller version has changed | The YugabyteDB Cluster YB controller version has changed. Acknowledge to close the problem manually. | last(/YugabyteDB Cluster by HTTP/yugabytedb.cluster.ybc.version,#1) <> last(/YugabyteDB Cluster by HTTP/yugabytedb.cluster.ybc.version,#2) and length(last(/YugabyteDB Cluster by HTTP/yugabytedb.cluster.ybc.version)) > 0 | Info | Manual close: Yes |
YugabyteDB Cluster: Cluster is not healthy | The YugabyteDB Cluster is not healthy. | last(/YugabyteDB Cluster by HTTP/yugabytedb.cluster.health.state,#1) <> 0 | Average | |
YugabyteDB Cluster: CPU utilization is too high | YugabyteDB Cluster CPU utilization is more than {$YUGABYTEDB.CPU.UTILIZATION.CRIT}%. The system might be slow to respond. | min(/YugabyteDB Cluster by HTTP/yugabytedb.cluster.cpu.utilization,5m) > {$YUGABYTEDB.CPU.UTILIZATION.CRIT} | High | |
YugabyteDB Cluster: CPU utilization is high | YugabyteDB Cluster CPU utilization is more than {$YUGABYTEDB.CPU.UTILIZATION.WARN}%. The system might be slow to respond. | min(/YugabyteDB Cluster by HTTP/yugabytedb.cluster.cpu.utilization,5m) > {$YUGABYTEDB.CPU.UTILIZATION.WARN} | Warning | Depends on: "YugabyteDB Cluster: CPU utilization is too high" |
YugabyteDB Cluster: Storage space is low | YugabyteDB Cluster uses more than {$YUGABYTEDB.DISK.UTILIZATION.WARN}% of disk space. | min(/YugabyteDB Cluster by HTTP/yugabytedb.cluster.disk.utilization,5m) > {$YUGABYTEDB.DISK.UTILIZATION.WARN} | Warning | Depends on: "YugabyteDB Cluster: Storage space is critically low" |
YugabyteDB Cluster: Storage space is critically low | YugabyteDB Cluster uses more than {$YUGABYTEDB.DISK.UTILIZATION.CRIT}% of disk space. | min(/YugabyteDB Cluster by HTTP/yugabytedb.cluster.disk.utilization,5m) > {$YUGABYTEDB.DISK.UTILIZATION.CRIT} | High | |
YugabyteDB Cluster: Average utilization of connections is high | YugabyteDB Cluster uses more than {$YUGABYTEDB.CONNECTION.UTILIZATION.WARN}% of the connection limit. | min(/YugabyteDB Cluster by HTTP/yugabytedb.cluster.connection.utilization,5m) > {$YUGABYTEDB.CONNECTION.UTILIZATION.WARN} | Warning | Depends on: "YugabyteDB Cluster: Average utilization of connections is too high" |
YugabyteDB Cluster: Average utilization of connections is too high | YugabyteDB Cluster uses more than {$YUGABYTEDB.CONNECTION.UTILIZATION.CRIT}% of the connection limit. | min(/YugabyteDB Cluster by HTTP/yugabytedb.cluster.connection.utilization,5m) > {$YUGABYTEDB.CONNECTION.UTILIZATION.CRIT} | High | |
YugabyteDB Cluster: Memory utilization is high | YugabyteDB Cluster uses more than {$YUGABYTEDB.MEMORY.CLUSTER.UTILIZATION.WARN}% of memory. | min(/YugabyteDB Cluster by HTTP/yugabytedb.cluster.memory.utilization,5m) > {$YUGABYTEDB.MEMORY.CLUSTER.UTILIZATION.WARN} | Warning | Depends on: "YugabyteDB Cluster: Memory utilization is too high" |
YugabyteDB Cluster: Memory utilization is too high | YugabyteDB Cluster uses more than {$YUGABYTEDB.MEMORY.CLUSTER.UTILIZATION.CRIT}% of memory. | min(/YugabyteDB Cluster by HTTP/yugabytedb.cluster.memory.utilization,5m) > {$YUGABYTEDB.MEMORY.CLUSTER.UTILIZATION.CRIT} | High | |
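The utilization triggers above all share the shape `min(/host/item,5m) > {$MACRO}`: the problem is raised only when every sample in the five-minute window exceeds the macro threshold, so a single short spike does not fire. A rough sketch of one evaluation (sample windows and the 90% threshold are illustrative values, not data from a real cluster):

```python
# Sketch of how the min(...,5m) > {$...UTILIZATION...} expressions behave.
# window_samples stands in for the item values collected over the last 5 minutes.

def utilization_trigger_fires(window_samples, threshold_pct):
    """Mimics one evaluation of min(/host/item,5m) > threshold."""
    return min(window_samples) > threshold_pct

# A brief spike above 90% does not fire the High trigger...
print(utilization_trigger_fires([55, 95, 60], 90))   # False
# ...but sustained load above the macro value does.
print(utilization_trigger_fires([92, 95, 93], 90))   # True
```

This is why lowering a `.WARN` or `.CRIT` macro makes the triggers more sensitive but still requires the condition to hold for the whole window.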
Name | Description | Type | Key and additional info |
---|---|---|---|
Keyspace discovery | Discovery of the available keyspaces. | Dependent item | yugabytedb.keyspace.discovery (Preprocessing) |
Name | Description | Type | Key and additional info |
---|---|---|---|
YugabyteDB Keyspace [{#KEYSPACE.NAME}]: Get keyspace info | Get raw data about the keyspace [{#KEYSPACE.NAME}]. | Dependent item | yugabytedb.keyspace.get[{#KEYSPACE.NAME}] (Preprocessing) |
YugabyteDB Keyspace [{#KEYSPACE.NAME}]: SST size | The size of the table's SST. | Dependent item | yugabytedb.keyspace.sst.size[{#KEYSPACE.NAME}] (Preprocessing) |
YugabyteDB Keyspace [{#KEYSPACE.NAME}]: Wal size | The size of the table's WAL. | Dependent item | yugabytedb.keyspace.wal.size[{#KEYSPACE.NAME}] (Preprocessing) |
YugabyteDB Keyspace [{#KEYSPACE.NAME}]: Type | The type of keyspace: YSQL or YCQL. | Dependent item | yugabytedb.keyspace.type[{#KEYSPACE.NAME}] (Preprocessing) |
Name | Description | Type | Key and additional info |
---|---|---|---|
Node discovery | Discovery of the nodes for all clusters. | Dependent item | yugabytedb.node.discovery (Preprocessing) |
Name | Description | Type | Key and additional info |
---|---|---|---|
YugabyteDB Node [{#NODE.NAME}]: Get node info | Get raw data about the node [{#NODE.NAME}]. | Dependent item | yugabytedb.node.get[{#NODE.NAME}] (Preprocessing) |
YugabyteDB Node [{#NODE.NAME}]: Disk IOPS limit | The IOPS to provision for the node [{#NODE.NAME}] for each disk. | Dependent item | yugabytedb.node.iops.limit[{#NODE.NAME}] (Preprocessing) |
YugabyteDB Node [{#NODE.NAME}]: Total disk size | The disk size (GB) for the node [{#NODE.NAME}]. | Dependent item | yugabytedb.node.disk.size.total[{#NODE.NAME}] (Preprocessing) |
YugabyteDB Node [{#NODE.NAME}]: Total memory, bytes | The amount of RAM for the node [{#NODE.NAME}]. | Dependent item | yugabytedb.node.memory.total[{#NODE.NAME}] (Preprocessing) |
YugabyteDB Node [{#NODE.NAME}]: Total CPU cores | The number of cores for the node [{#NODE.NAME}]. | Dependent item | yugabytedb.node.cpu.num.cores[{#NODE.NAME}] (Preprocessing) |
YugabyteDB Node [{#NODE.NAME}]: Region | The cloud information for the node [{#NODE.NAME}] about the region. | Dependent item | yugabytedb.node.region[{#NODE.NAME}] (Preprocessing) |
YugabyteDB Node [{#NODE.NAME}]: Zone | The cloud information for the node [{#NODE.NAME}] about the zone. | Dependent item | yugabytedb.node.zone[{#NODE.NAME}] (Preprocessing) |
YugabyteDB Node [{#NODE.NAME}]: Total SST file size | The size of all SST files. | Dependent item | yugabytedb.node.sst.file.size.total[{#NODE.NAME}] (Preprocessing) |
YugabyteDB Node [{#NODE.NAME}]: Uncompressed SST file size | The size of uncompressed SST files. | Dependent item | yugabytedb.node.sst.file.size.uncompressed[{#NODE.NAME}] (Preprocessing) |
YugabyteDB Node [{#NODE.NAME}]: Read OPS | The number of read operations per second for the node [{#NODE.NAME}]. | Dependent item | yugabytedb.node.read.ops[{#NODE.NAME}] (Preprocessing) |
YugabyteDB Node [{#NODE.NAME}]: Write OPS | The number of write operations per second for the node [{#NODE.NAME}]. | Dependent item | yugabytedb.node.write.ops[{#NODE.NAME}] (Preprocessing) |
YugabyteDB Node [{#NODE.NAME}]: Disk IOPS utilization | Shows the utilization of provisioned IOPS. | Calculated | yugabytedb.node.iops.utilization[{#NODE.NAME}] |
YugabyteDB Node [{#NODE.NAME}]: Node status | The current status of the node [{#NODE.NAME}]: 0 = Down 1 = Up | Dependent item | yugabytedb.node.status[{#NODE.NAME}] (Preprocessing) |
YugabyteDB Node [{#NODE.NAME}]: Node is master | The current role of the node [{#NODE.NAME}]: 0 = False 1 = True | Dependent item | yugabytedb.node.master[{#NODE.NAME}] (Preprocessing) |
YugabyteDB Node [{#NODE.NAME}]: Node is TServer | This item indicates if the node [{#NODE.NAME}] is a TServer node: 0 = False 1 = True | Dependent item | yugabytedb.node.tserver[{#NODE.NAME}] (Preprocessing) |
YugabyteDB Node [{#NODE.NAME}]: Node is read replica | This item indicates if the node [{#NODE.NAME}] is a read replica: 0 = False 1 = True | Dependent item | yugabytedb.node.read.replica[{#NODE.NAME}] (Preprocessing) |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
YugabyteDB Node [{#NODE.NAME}]: Node disk IOPS utilization is high | IOPS utilization on the node [{#NODE.NAME}] is more than {$YUGABYTEDB.IOPS.UTILIZATION.WARN}% of the provisioned IOPS. | min(/YugabyteDB Cluster by HTTP/yugabytedb.node.iops.utilization[{#NODE.NAME}],5m) > {$YUGABYTEDB.IOPS.UTILIZATION.WARN} | Warning | Depends on: "YugabyteDB Node [{#NODE.NAME}]: Node disk IOPS utilization is too high" |
YugabyteDB Node [{#NODE.NAME}]: Node disk IOPS utilization is too high | IOPS utilization on the node [{#NODE.NAME}] is more than {$YUGABYTEDB.IOPS.UTILIZATION.CRIT}% of the provisioned IOPS. | min(/YugabyteDB Cluster by HTTP/yugabytedb.node.iops.utilization[{#NODE.NAME}],5m) > {$YUGABYTEDB.IOPS.UTILIZATION.CRIT} | High | |
YugabyteDB Node [{#NODE.NAME}]: Node is down | The node [{#NODE.NAME}] is down. | max(/YugabyteDB Cluster by HTTP/yugabytedb.node.status[{#NODE.NAME}],3m) = 0 | Average | |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template monitors the TiKV server of a TiDB cluster by Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
Template TiDB TiKV by HTTP — collects metrics by HTTP agent from the TiKV /metrics endpoint.
Zabbix version: 6.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
This template works with the TiKV server of a TiDB cluster. Internal service metrics are collected from the TiKV /metrics endpoint. Don't forget to change the macros {$TIKV.URL} and {$TIKV.PORT}. Also, see the Macros section for a list of macros used to set trigger values.
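The HTTP agent master item pulls the plain-text Prometheus exposition from `http://{$TIKV.URL}:{$TIKV.PORT}/metrics`, and the dependent items cut individual series out of that one payload. A rough sketch of that extraction step — the sample payload below is illustrative, not a captured TiKV response, although `tikv_engine_size_bytes` is a real TiKV metric family:

```python
# Sketch: what a Prometheus-pattern preprocessing step extracts from the
# bulk /metrics payload. SAMPLE_METRICS is a made-up two-series example.
import re

SAMPLE_METRICS = """\
# HELP tikv_engine_size_bytes Sizes of each column family
tikv_engine_size_bytes{db="kv",type="default"} 1.2e+07
tikv_engine_size_bytes{db="kv",type="write"} 3.4e+06
"""

def metric_values(payload: str, name: str):
    """Return float samples for one metric family, optionally labeled."""
    pattern = re.compile(rf"^{re.escape(name)}(?:\{{[^}}]*\}})? (\S+)$", re.M)
    return [float(v) for v in pattern.findall(payload)]

# Summing the per-column-family series mirrors aggregating to a store size.
print(sum(metric_values(SAMPLE_METRICS, "tikv_engine_size_bytes")))  # 15400000.0
```

You can fetch the real payload with any HTTP client against the status port (20180 by default) to confirm the endpoint is reachable before pointing the template at it.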
Name | Description | Default |
---|---|---|
{$TIKV.PORT} | The port of the TiKV server metrics web endpoint. | 20180 |
{$TIKV.URL} | TiKV server URL. | localhost |
{$TIKV.COPROCESSOR.ERRORS.MAX.WARN} | Maximum number of coprocessor request errors. | 1 |
{$TIKV.STORE.ERRORS.MAX.WARN} | Maximum number of failure messages. | 1 |
{$TIKV.PENDING_COMMANDS.MAX.WARN} | Maximum number of pending commands. | 1 |
{$TIKV.PENDING_TASKS.MAX.WARN} | Maximum number of tasks currently running by the worker or pending. | 1 |
Name | Description | Type | Key and additional info |
---|---|---|---|
TiKV: Get instance metrics | Get TiKV instance metrics. | HTTP agent | tikv.get_metrics (Preprocessing) |
TiKV: Store size | The storage size of the TiKV instance. | Dependent item | tikv.engine_size (Preprocessing) |
TiKV: Get store size metrics | Get capacity metrics of the TiKV instance. | Dependent item | tikv.store_size.metrics (Preprocessing) |
TiKV: Available size | The available capacity of the TiKV instance. | Dependent item | tikv.store_size.available (Preprocessing) |
TiKV: Capacity size | The capacity size of the TiKV instance. | Dependent item | tikv.store_size.capacity (Preprocessing) |
TiKV: Bytes read | The total bytes read in the TiKV instance. | Dependent item | tikv.engine_flow_bytes.read (Preprocessing) |
TiKV: Bytes write | The total bytes written in the TiKV instance. | Dependent item | tikv.engine_flow_bytes.write (Preprocessing) |
TiKV: Storage: commands total, rate | Total number of commands received per second. | Dependent item | tikv.storage_command.rate (Preprocessing) |
TiKV: CPU util | The CPU usage ratio on the TiKV instance. | Dependent item | tikv.cpu.util (Preprocessing) |
TiKV: RSS memory usage | Resident memory size in bytes. | Dependent item | tikv.rss_bytes (Preprocessing) |
TiKV: Regions, count | The number of regions collected in the TiKV instance. | Dependent item | tikv.region_count (Preprocessing) |
TiKV: Regions, leader | The number of leaders in the TiKV instance. | Dependent item | tikv.region_leader (Preprocessing) |
TiKV: Get QPS metrics | Get QPS metrics in the TiKV instance. | Dependent item | tikv.grpc_msgs.metrics (Preprocessing) |
TiKV: Total query, rate | The total QPS in the TiKV instance. | Dependent item | tikv.grpc_msg.rate (Preprocessing) |
TiKV: Total query errors, rate | The total number of gRPC message handling failures per second. | Dependent item | tikv.grpc_msg_fail.rate (Preprocessing) |
TiKV: Coprocessor: Errors, rate | Total number of push down request errors per second. | Dependent item | tikv.coprocessor_request_error.rate (Preprocessing) |
TiKV: Get coprocessor requests metrics | Get metrics of coprocessor requests. | Dependent item | tikv.coprocessor_requests.metrics (Preprocessing) |
TiKV: Coprocessor: Requests, rate | Total number of coprocessor requests per second. | Dependent item | tikv.coprocessor_request.rate (Preprocessing) |
TiKV: Coprocessor: Scan keys, rate | Total number of scan keys observed per request per second. | Dependent item | tikv.coprocessor_scan_keys_sum.rate (Preprocessing) |
TiKV: Coprocessor: RocksDB ops, rate | Total number of RocksDB internal operations from PerfContext per second. | Dependent item | tikv.coprocessor_rocksdb_perf.rate (Preprocessing) |
TiKV: Coprocessor: Response size, rate | The total size of coprocessor responses per second. | Dependent item | tikv.coprocessor_response_bytes.rate (Preprocessing) |
TiKV: Scheduler: Pending commands | The total number of pending commands. The scheduler receives commands from clients and executes them against the MVCC layer storage engine. | Dependent item | tikv.scheduler_contex (Preprocessing) |
TiKV: Scheduler: Busy, rate | The total count of too-busy schedulers per second. | Dependent item | tikv.scheduler_too_busy.rate (Preprocessing) |
TiKV: Get scheduler metrics | Get metrics of scheduler commands. | Dependent item | tikv.scheduler.metrics (Preprocessing) |
TiKV: Scheduler: Commands total, rate | Total number of commands per second. | Dependent item | tikv.scheduler_commands.rate (Preprocessing) |
TiKV: Scheduler: Low priority commands total, rate | Total count of low priority commands per second. | Dependent item | tikv.commands_pri.low.rate (Preprocessing) |
TiKV: Scheduler: Normal priority commands total, rate | Total count of normal priority commands per second. | Dependent item | tikv.commands_pri.normal.rate (Preprocessing) |
TiKV: Scheduler: High priority commands total, rate | Total count of high priority commands per second. | Dependent item | tikv.commands_pri.high.rate (Preprocessing) |
TiKV: Snapshot: Pending tasks | The number of tasks currently running by the worker or pending. | Dependent item | tikv.worker_pending_task (Preprocessing) |
TiKV: Snapshot: Sending | The total amount of raftstore snapshot traffic. | Dependent item | tikv.snapshot.sending (Preprocessing) |
TiKV: Snapshot: Receiving | The total amount of raftstore snapshot traffic. | Dependent item | tikv.snapshot.receiving (Preprocessing) |
TiKV: Snapshot: Applying | The total amount of raftstore snapshot traffic. | Dependent item | tikv.snapshot.applying (Preprocessing) |
TiKV: Uptime | The runtime of each TiKV instance. | Dependent item | tikv.uptime (Preprocessing) |
TiKV: Get failure msg metrics | Get metrics of reporting failure messages. | Dependent item | tikv.messages.failure.metrics (Preprocessing) |
TiKV: Server: failure messages total, rate | Total number of reporting failure messages per second. | Dependent item | tikv.messages.failure.rate (Preprocessing) |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
TiKV: Too many coprocessor request errors | | min(/TiDB TiKV by HTTP/tikv.coprocessor_request_error.rate,5m)>{$TIKV.COPROCESSOR.ERRORS.MAX.WARN} | Warning | |
TiKV: Too many pending commands | | min(/TiDB TiKV by HTTP/tikv.scheduler_contex,5m)>{$TIKV.PENDING_COMMANDS.MAX.WARN} | Average | |
TiKV: Too many pending tasks | | min(/TiDB TiKV by HTTP/tikv.worker_pending_task,5m)>{$TIKV.PENDING_TASKS.MAX.WARN} | Average | |
TiKV: has been restarted | Uptime is less than 10 minutes. | last(/TiDB TiKV by HTTP/tikv.uptime)<10m | Info | Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
QPS metrics discovery | Discovery QPS metrics. |
Dependent item | tikv.qps.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
TiKV: Query: {#TYPE}, rate | The QPS per command in TiKV instance. |
Dependent item | tikv.grpc_msg.rate[{#TYPE}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Coprocessor metrics discovery | Discovery coprocessor metrics. |
Dependent item | tikv.coprocessor.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
TiKV: Coprocessor: {#REQ_TYPE} metrics | Get metrics of {#REQ_TYPE} requests. |
Dependent item | tikv.coprocessor_request.metrics[{#REQ_TYPE}] Preprocessing
|
TiKV: Coprocessor: {#REQ_TYPE} errors, rate | Total number of push down request errors per second. |
Dependent item | tikv.coprocessor_request_error.rate[{#REQ_TYPE}] Preprocessing
|
TiKV: Coprocessor: {#REQ_TYPE} requests, rate | Total number of coprocessor requests per second. |
Dependent item | tikv.coprocessor_request.rate[{#REQ_TYPE}] Preprocessing
|
TiKV: Coprocessor: {#REQ_TYPE} scan keys, rate | Total number of scan keys observed per request per second. |
Dependent item | tikv.coprocessor_scan_keys.rate[{#REQ_TYPE}] Preprocessing
|
TiKV: Coprocessor: {#REQ_TYPE} RocksDB ops, rate | Total number of RocksDB internal operations from PerfContext per second. |
Dependent item | tikv.coprocessor_rocksdb_perf.rate[{#REQ_TYPE}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Scheduler metrics discovery | Discovery scheduler metrics. |
Dependent item | tikv.scheduler.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
TiKV: Scheduler: commands {#STAGE}, rate | Total number of commands on each stage per second. |
Dependent item | tikv.scheduler_stage.rate[{#STAGE}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Server errors discovery | Discovery server errors metrics. |
Dependent item | tikv.server_report_failure.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
TiKV: Storeid {#STORE_ID}: failure messages "{#TYPE}", rate | Total number of reporting failure messages. The metric has two labels: type and store_id. type represents the failure type, and store_id represents the destination peer store id. |
Dependent item | tikv.messages.failure.rate[{#STORE_ID},{#TYPE}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
TiKV: Storeid {#STORE_ID}: Too many failure messages "{#TYPE}" | Indicates that the remote TiKV cannot be connected. |
min(/TiDB TiKV by HTTP/tikv.messages.failure.rate[{#STORE_ID},{#TYPE}],5m)>{$TIKV.STORE.ERRORS.MAX.WARN} |Warning |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template monitors the TiDB server of a TiDB cluster via Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
Template TiDB by HTTP
— collects metrics by HTTP agent from TiDB /metrics endpoint and from monitoring API.
See https://docs.pingcap.com/tidb/stable/tidb-monitoring-api.
Zabbix version: 6.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
This template works with TiDB server of TiDB cluster. Internal service metrics are collected from TiDB /metrics endpoint and from monitoring API. See https://docs.pingcap.com/tidb/stable/tidb-monitoring-api. Don't forget to change the macros {$TIDB.URL}, {$TIDB.PORT}. Also, see the Macros section for a list of macros used to set trigger values.
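Most of the items above are dependent items that extract individual series from the Prometheus text the TiDB /metrics endpoint returns. As a minimal sketch of that extraction step (the sample metric names, label sets, and values below are illustrative stand-ins, not real TiDB output):

```python
import re

# Hand-written stand-in for a fragment of Prometheus exposition text.
SAMPLE = """\
tidb_server_query_total{result="OK",type="Query"} 12345
tidb_server_query_total{result="Error",type="Query"} 67
tidb_server_connections 42
"""

def scrape(text):
    """Parse 'name{labels} value' lines into a {(name, labels): float} map."""
    series = {}
    pattern = re.compile(r'^(\w+)(?:\{([^}]*)\})?\s+([0-9.eE+-]+)$')
    for line in text.splitlines():
        m = pattern.match(line.strip())
        if m:
            name, labels, value = m.groups()
            series[(name, labels or "")] = float(value)
    return series

metrics = scrape(SAMPLE)
ok = metrics[("tidb_server_query_total", 'result="OK",type="Query"')]
err = metrics[("tidb_server_query_total", 'result="Error",type="Query"')]
print(ok, err)  # 12345.0 67.0
```

In the template itself this splitting is done by each item's Preprocessing steps rather than by external code; the sketch only shows the shape of the data being picked apart.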
Name | Description | Default |
---|---|---|
{$TIDB.PORT} | The port of TiDB server metrics web endpoint |
10080 |
{$TIDB.URL} | TiDB server URL |
localhost |
{$TIDB.OPEN.FDS.MAX.WARN} | Maximum percentage of used file descriptors |
90 |
{$TIDB.HEAP.USAGE.MAX.WARN} | Maximum heap memory used |
10G |
{$TIDB.DDL.WAITING.MAX.WARN} | Maximum number of DDL tasks that are waiting |
5 |
{$TIDB.TIME_JUMP_BACK.MAX.WARN} | Maximum number of times that the operating system rewinds every second |
1 |
{$TIDB.SCHEMA_LEASE_ERRORS.MAX.WARN} | Maximum number of schema lease errors |
0 |
{$TIDB.SCHEMA_LOAD_ERRORS.MAX.WARN} | Maximum number of load schema errors |
1 |
{$TIDB.GC_ACTIONS.ERRORS.MAX.WARN} | Maximum number of GC-related operations failures |
1 |
{$TIDB.REGION_ERROR.MAX.WARN} | Maximum number of region related errors |
50 |
{$TIDB.MONITOR_KEEP_ALIVE.MAX.WARN} | Minimum number of keep alive operations |
10 |
Name | Description | Type | Key and additional info |
---|---|---|---|
TiDB: Get instance metrics | Get TiDB instance metrics. |
HTTP agent | tidb.get_metrics Preprocessing
|
TiDB: Get instance status | Get TiDB instance status info. |
HTTP agent | tidb.get_status Preprocessing
|
TiDB: Status | Status of TiDB instance. |
Dependent item | tidb.status Preprocessing
|
TiDB: Get total server query metrics | Get information about server queries. |
Dependent item | tidb.server_query.get_metrics Preprocessing
|
TiDB: Total "error" server query, rate | The number of queries on TiDB instance per second with failure of command execution results. |
Dependent item | tidb.server_query.error.rate Preprocessing
|
TiDB: Total "ok" server query, rate | The number of queries on TiDB instance per second with success of command execution results. |
Dependent item | tidb.server_query.ok.rate Preprocessing
|
TiDB: Total server query, rate | The number of queries per second on TiDB instance. |
Dependent item | tidb.server_query.rate Preprocessing
|
TiDB: Get SQL statements metrics | Get SQL statements metrics. |
Dependent item | tidb.statement_total.get_metrics Preprocessing
|
TiDB: SQL statements, rate | The total number of SQL statements executed per second. |
Dependent item | tidb.statement_total.rate Preprocessing
|
TiDB: Failed Query, rate | The number of errors per second that occur when executing SQL statements (such as syntax errors and primary key conflicts). |
Dependent item | tidb.execute_error.rate Preprocessing
|
TiDB: Get TiKV client metrics | Get TiKV client metrics. |
Dependent item | tidb.tikvclient.get_metrics Preprocessing
|
TiDB: KV commands, rate | The number of executed KV commands per second. |
Dependent item | tidb.tikvclient_txn.rate Preprocessing
|
TiDB: PD TSO commands, rate | The number of TSO commands that TiDB obtains from PD per second. |
Dependent item | tidb.pd_tso_cmd.rate Preprocessing
|
TiDB: PD TSO requests, rate | The number of TSO requests that TiDB obtains from PD per second. |
Dependent item | tidb.pd_tso_request.rate Preprocessing
|
TiDB: TiClient region errors, rate | The number of region related errors returned by TiKV per second. |
Dependent item | tidb.tikvclient_region_err.rate Preprocessing
|
TiDB: Lock resolves, rate | The number of TiDB operations that resolve locks per second. When TiDB's read or write request encounters a lock, it tries to resolve the lock. |
Dependent item | tidb.tikvclient_lock_resolver_action.rate Preprocessing
|
TiDB: DDL waiting jobs | The number of DDL tasks that are waiting. |
Dependent item | tidb.ddl_waiting_jobs Preprocessing
|
TiDB: Load schema total, rate | The statistics of the schemas that TiDB obtains from TiKV per second. |
Dependent item | tidb.domain_load_schema.rate Preprocessing
|
TiDB: Load schema failed, rate | The total number of failures to reload the latest schema information in TiDB per second. |
Dependent item | tidb.domain_load_schema.failed.rate Preprocessing
|
TiDB: Schema lease "outdate" errors, rate | The number of schema lease errors per second. "outdate" errors mean that the schema cannot be updated, which is a more serious error and triggers an alert. |
Dependent item | tidb.session_schema_lease_error.outdate.rate Preprocessing
|
TiDB: Schema lease "change" errors, rate | The number of schema lease errors per second. "change" means that the schema has changed. |
Dependent item | tidb.session_schema_lease_error.change.rate Preprocessing
|
TiDB: KV backoff, rate | The number of errors returned by TiKV per second. |
Dependent item | tidb.tikvclient_backoff.rate Preprocessing
|
TiDB: Keep alive, rate | The number of times that the metrics are refreshed on TiDB instance per minute. |
Dependent item | tidb.monitor_keep_alive.rate Preprocessing
|
TiDB: Server connections | The connection number of current TiDB instance. |
Dependent item | tidb.tidb_server_connections Preprocessing
|
TiDB: Heap memory usage | Number of heap bytes that are in use. |
Dependent item | tidb.heap_bytes Preprocessing
|
TiDB: RSS memory usage | Resident memory size in bytes. |
Dependent item | tidb.rss_bytes Preprocessing
|
TiDB: Goroutine count | The number of Goroutines on TiDB instance. |
Dependent item | tidb.goroutines Preprocessing
|
TiDB: Open file descriptors | Number of open file descriptors. |
Dependent item | tidb.process_open_fds Preprocessing
|
TiDB: Open file descriptors, max | Maximum number of open file descriptors. |
Dependent item | tidb.process_max_fds Preprocessing
|
TiDB: CPU | Total user and system CPU usage ratio. |
Dependent item | tidb.cpu.util Preprocessing
|
TiDB: Uptime | The runtime of each TiDB instance. |
Dependent item | tidb.uptime Preprocessing
|
TiDB: Version | Version of the TiDB instance. |
Dependent item | tidb.version Preprocessing
|
TiDB: Time jump back, rate | The number of times that the operating system rewinds every second. |
Dependent item | tidb.monitor_time_jump_back.rate Preprocessing
|
TiDB: Server critical error, rate | The number of critical errors occurred in TiDB per second. |
Dependent item | tidb.tidb_server_critical_error_total.rate Preprocessing
|
TiDB: Server panic, rate | The number of panics occurred in TiDB per second. |
Dependent item | tidb.tidb_server_panic_total.rate Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
TiDB: Instance is not responding | last(/TiDB by HTTP/tidb.status)=0 |Average |
|||
TiDB: Too many region related errors | min(/TiDB by HTTP/tidb.tikvclient_region_err.rate,5m)>{$TIDB.REGION_ERROR.MAX.WARN} |Average |
|||
TiDB: Too many DDL waiting jobs | min(/TiDB by HTTP/tidb.ddl_waiting_jobs,5m)>{$TIDB.DDL.WAITING.MAX.WARN} |Warning |
|||
TiDB: Too many schema load errors | min(/TiDB by HTTP/tidb.domain_load_schema.failed.rate,5m)>{$TIDB.SCHEMA_LOAD_ERRORS.MAX.WARN} |Average |
|||
TiDB: Too many schema lease errors | The latest schema information is not reloaded in TiDB within one lease. |
min(/TiDB by HTTP/tidb.session_schema_lease_error.outdate.rate,5m)>{$TIDB.SCHEMA_LEASE_ERRORS.MAX.WARN} |Average |
||
TiDB: Too few keep alive operations | Indicates whether the TiDB process still exists. If the number of times that tidb_monitor_keep_alive_total increases is less than 10 per minute, the TiDB process might already have exited and an alert is triggered. |
max(/TiDB by HTTP/tidb.monitor_keep_alive.rate,5m)<{$TIDB.MONITOR_KEEP_ALIVE.MAX.WARN} |Average |
||
TiDB: Heap memory usage is too high | min(/TiDB by HTTP/tidb.heap_bytes,5m)>{$TIDB.HEAP.USAGE.MAX.WARN} |Warning |
|||
TiDB: Current number of open files is too high | Heavy file descriptor usage (i.e., near the process's file descriptor limit) indicates a potential file descriptor exhaustion issue. |
min(/TiDB by HTTP/tidb.process_open_fds,5m)/last(/TiDB by HTTP/tidb.process_max_fds)*100>{$TIDB.OPEN.FDS.MAX.WARN} |Warning |
||
TiDB: has been restarted | Uptime is less than 10 minutes. |
last(/TiDB by HTTP/tidb.uptime)<10m |Info |
Manual close: Yes | |
TiDB: Version has changed | TiDB version has changed. Acknowledge to close the problem manually. |
last(/TiDB by HTTP/tidb.version,#1)<>last(/TiDB by HTTP/tidb.version,#2) and length(last(/TiDB by HTTP/tidb.version))>0 |Info |
Manual close: Yes | |
TiDB: Too many time jump backs | min(/TiDB by HTTP/tidb.monitor_time_jump_back.rate,5m)>{$TIDB.TIME_JUMP_BACK.MAX.WARN} |Warning |
|||
TiDB: There are panicked TiDB threads | When a panic occurs, an alert is triggered. The thread is often recovered; otherwise, TiDB will restart frequently. |
last(/TiDB by HTTP/tidb.tidb_server_panic_total.rate)>0 |Average |
Name | Description | Type | Key and additional info |
---|---|---|---|
QPS metrics discovery | Discovery QPS specific metrics. |
Dependent item | tidb.qps.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
TiDB: Get QPS metrics: {#TYPE} | Get QPS metrics of {#TYPE}. |
Dependent item | tidb.qps.get_metrics[{#TYPE}] Preprocessing
|
TiDB: Server query "OK": {#TYPE}, rate | The number of queries on TiDB instance per second with success of command execution results. |
Dependent item | tidb.server_query.ok.rate[{#TYPE}] Preprocessing
|
TiDB: Server query "Error": {#TYPE}, rate | The number of queries on TiDB instance per second with failure of command execution results. |
Dependent item | tidb.server_query.error.rate[{#TYPE}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Statement metrics discovery | Discovery statement specific metrics. |
Dependent item | tidb.statement.discover Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
TiDB: SQL statements: {#TYPE}, rate | The number of SQL statements executed per second. |
Dependent item | tidb.statement.rate[{#TYPE}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
KV metrics discovery | Discovery KV specific metrics. |
Dependent item | tidb.kv_ops.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
TiDB: KV Commands: {#TYPE}, rate | The number of executed KV commands per second. |
Dependent item | tidb.tikvclient_txn.rate[{#TYPE}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Lock resolves discovery | Discovery lock resolves specific metrics. |
Dependent item | tidb.tikvclient_lock_resolver_action.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
TiDB: Lock resolves: {#TYPE}, rate | The number of TiDB operations that resolve locks per second. When TiDB's read or write request encounters a lock, it tries to resolve the lock. |
Dependent item | tidb.tikvclient_lock_resolver_action.rate[{#TYPE}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
KV backoff discovery | Discovery KV backoff specific metrics. |
Dependent item | tidb.tikvclient_backoff.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
TiDB: KV backoff: {#TYPE}, rate | The number of errors returned by TiKV per second. |
Dependent item | tidb.tikvclient_backoff.rate[{#TYPE}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
GC action results discovery | Discovery GC action results metrics. |
Dependent item | tidb.tikvclient_gc_action.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
TiDB: GC action result: {#TYPE}, rate | The number of results of GC-related operations per second. |
Dependent item | tidb.tikvclient_gc_action.rate[{#TYPE}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
TiDB: Too many failed GC-related operations | min(/TiDB by HTTP/tidb.tikvclient_gc_action.rate[{#TYPE}],5m)>{$TIDB.GC_ACTIONS.ERRORS.MAX.WARN} |Warning |
This template monitors the PD server of a TiDB cluster via Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
Template TiDB PD by HTTP
— collects metrics by HTTP agent from PD /metrics endpoint and from monitoring API.
See https://docs.pingcap.com/tidb/stable/tidb-monitoring-api.
Zabbix version: 6.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
This template works with PD server of TiDB cluster. Internal service metrics are collected from PD /metrics endpoint and from monitoring API. See https://docs.pingcap.com/tidb/stable/tidb-monitoring-api. Don't forget to change the macros {$PD.URL}, {$PD.PORT}. Also, see the Macros section for a list of macros used to set trigger values.
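The "Current storage usage is too high" trigger further down compares the smallest recent storage-size sample against the cluster capacity. As a worked illustration of that expression, min(storage_size,5m)/last(storage_capacity)*100 > {$PD.STORAGE_USAGE.MAX.WARN}, with made-up sample values and a helper name of our own choosing:

```python
# Mirrors {$PD.STORAGE_USAGE.MAX.WARN}, which defaults to 80 percent.
STORAGE_USAGE_MAX_WARN = 80

def storage_usage_alert(size_samples_5m, capacity_bytes,
                        threshold=STORAGE_USAGE_MAX_WARN):
    """Fire only when even the smallest usage sample over the window
    exceeds the threshold, like Zabbix's min(...,5m) does."""
    usage_pct = min(size_samples_5m) / capacity_bytes * 100
    return usage_pct > threshold

# 850 GB used (at minimum) out of 1 TB capacity -> 85% > 80% -> alert fires.
print(storage_usage_alert([850e9, 870e9, 860e9], 1e12))
```

Using min over the window rather than the last value keeps a single short spike from raising the alert.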
Name | Description | Default |
---|---|---|
{$PD.PORT} | The port of PD server metrics web endpoint |
2379 |
{$PD.URL} | PD server URL |
localhost |
{$PD.MISS_REGION.MAX.WARN} | Maximum number of missed regions |
100 |
{$PD.STORAGE_USAGE.MAX.WARN} | Maximum percentage of cluster space used |
80 |
Name | Description | Type | Key and additional info |
---|---|---|---|
PD: Get instance metrics | Get TiDB PD instance metrics. |
HTTP agent | pd.get_metrics Preprocessing
|
PD: Get instance status | Get TiDB PD instance status info. |
HTTP agent | pd.get_status Preprocessing
|
PD: Status | Status of PD instance. |
Dependent item | pd.status Preprocessing
|
PD: gRPC Commands total, rate | The rate at which gRPC commands are completed. |
Dependent item | pd.grpc_command.rate Preprocessing
|
PD: Version | Version of the PD instance. |
Dependent item | pd.version Preprocessing
|
PD: Uptime | The runtime of each PD instance. |
Dependent item | pd.uptime Preprocessing
|
PD: Get cluster metrics | Get cluster metrics. |
Dependent item | pd.cluster_status.get_metrics Preprocessing
|
PD: Get region metrics | Get region metrics. |
Dependent item | pd.regions.get_metrics Preprocessing
|
PD: Get region label metrics | Get region label metrics. |
Dependent item | pd.region_labels.get_metrics Preprocessing
|
PD: Get region status metrics | Get region status metrics. |
Dependent item | pd.region_status.get_metrics Preprocessing
|
PD: Get gRPC command metrics | Get gRPC command metrics. |
Dependent item | pd.grpc_commands.get_metrics Preprocessing
|
PD: Get scheduler metrics | Get scheduler metrics. |
Dependent item | pd.scheduler.get_metrics Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
PD: Instance is not responding | last(/TiDB PD by HTTP/pd.status)=0 |Average |
|||
PD: Version has changed | PD version has changed. Acknowledge to close the problem manually. |
last(/TiDB PD by HTTP/pd.version,#1)<>last(/TiDB PD by HTTP/pd.version,#2) and length(last(/TiDB PD by HTTP/pd.version))>0 |Info |
Manual close: Yes | |
PD: has been restarted | Uptime is less than 10 minutes. |
last(/TiDB PD by HTTP/pd.uptime)<10m |Info |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Cluster metrics discovery | Discovery cluster specific metrics. |
Dependent item | pd.cluster.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
TiDB cluster: Offline stores | The count of offline stores. |
Dependent item | pd.cluster_status.store_offline[{#SINGLETON}] Preprocessing
|
TiDB cluster: Tombstone stores | The count of tombstone stores. |
Dependent item | pd.cluster_status.store_tombstone[{#SINGLETON}] Preprocessing
|
TiDB cluster: Down stores | The count of down stores. |
Dependent item | pd.cluster_status.store_down[{#SINGLETON}] Preprocessing
|
TiDB cluster: Lowspace stores | The count of low space stores. |
Dependent item | pd.cluster_status.store_low_space[{#SINGLETON}] Preprocessing
|
TiDB cluster: Unhealth stores | The count of unhealthy stores. |
Dependent item | pd.cluster_status.store_unhealth[{#SINGLETON}] Preprocessing
|
TiDB cluster: Disconnect stores | The count of disconnected stores. |
Dependent item | pd.cluster_status.store_disconnected[{#SINGLETON}] Preprocessing
|
TiDB cluster: Normal stores | The count of healthy storage instances. |
Dependent item | pd.cluster_status.store_up[{#SINGLETON}] Preprocessing
|
TiDB cluster: Storage capacity | The total storage capacity for this TiDB cluster. |
Dependent item | pd.cluster_status.storage_capacity[{#SINGLETON}] Preprocessing
|
TiDB cluster: Storage size | The storage size that is currently used by the TiDB cluster. |
Dependent item | pd.cluster_status.storage_size[{#SINGLETON}] Preprocessing
|
TiDB cluster: Number of regions | The total count of cluster Regions. |
Dependent item | pd.cluster_status.leader_count[{#SINGLETON}] Preprocessing
|
TiDB cluster: Current peer count | The current count of all cluster peers. |
Dependent item | pd.cluster_status.region_count[{#SINGLETON}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
TiDB cluster: There are offline TiKV nodes | PD has not received a TiKV heartbeat for a long time. |
last(/TiDB PD by HTTP/pd.cluster_status.store_down[{#SINGLETON}])>0 |Average |
||
TiDB cluster: There are low space TiKV nodes | Indicates that there is no sufficient space on the TiKV node. |
last(/TiDB PD by HTTP/pd.cluster_status.store_low_space[{#SINGLETON}])>0 |Average |
||
TiDB cluster: There are disconnected TiKV nodes | PD does not receive a TiKV heartbeat within 20 seconds. Normally a TiKV heartbeat comes in every 10 seconds. |
last(/TiDB PD by HTTP/pd.cluster_status.store_disconnected[{#SINGLETON}])>0 |Warning |
||
TiDB cluster: Current storage usage is too high | Over {$PD.STORAGE_USAGE.MAX.WARN}% of the cluster space is occupied. |
min(/TiDB PD by HTTP/pd.cluster_status.storage_size[{#SINGLETON}],5m)/last(/TiDB PD by HTTP/pd.cluster_status.storage_capacity[{#SINGLETON}])*100>{$PD.STORAGE_USAGE.MAX.WARN} |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Region labels discovery | Discovery region labels specific metrics. |
Dependent item | pd.region_labels.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
TiDB cluster: Regions label: {#TYPE} | The number of Regions in different label levels. |
Dependent item | pd.region_labels[{#TYPE}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Region status discovery | Discovery region status specific metrics. |
Dependent item | pd.region_status.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
TiDB cluster: Regions status: {#TYPE} | The health status of Regions indicated via the count of unusual Regions including pending peers, down peers, extra peers, offline peers, missing peers, learner peers and incorrect namespaces. |
Dependent item | pd.region_status[{#TYPE}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
TiDB cluster: Too many missed regions | The number of Region replicas is smaller than the value of max-replicas. When a TiKV machine is down and its downtime exceeds max-down-time, it usually leads to missing replicas for some Regions during a period of time. When a TiKV node is made offline, it might result in a small number of Regions with missing replicas. |
min(/TiDB PD by HTTP/pd.region_status[{#TYPE}],5m)>{$PD.MISS_REGION.MAX.WARN} |Warning |
||
TiDB cluster: There are unresponsive peers | The number of Regions with an unresponsive peer reported by the Raft leader. |
min(/TiDB PD by HTTP/pd.region_status[{#TYPE}],5m)>0 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Running scheduler discovery | Discovery scheduler specific metrics. |
Dependent item | pd.scheduler.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
TiDB cluster: Scheduler status: {#KIND} | The current running schedulers. |
Dependent item | pd.scheduler[{#KIND}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
gRPC commands discovery | Discovery grpc commands specific metrics. |
Dependent item | pd.grpc_command.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
PD: gRPC Commands: {#GRPC_METHOD}, rate | The rate per command type at which gRPC commands are completed. |
Dependent item | pd.grpc_command.rate[{#GRPC_METHOD}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Region discovery | Discovery region specific metrics. |
Dependent item | pd.region.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
PD: Get metrics: {#STORE_ADDRESS} | Get region metrics for {#STORE_ADDRESS}. |
Dependent item | pd.region_heartbeat.get_metrics[{#STORE_ADDRESS}] Preprocessing
|
PD: Region heartbeat: active, rate | The count of heartbeats with the ok status per second. |
Dependent item | pd.region_heartbeat.ok.rate[{#STORE_ADDRESS}] Preprocessing
|
PD: Region heartbeat: error, rate | The count of heartbeats with the error status per second. |
Dependent item | pd.region_heartbeat.error.rate[{#STORE_ADDRESS}] Preprocessing
|
PD: Region heartbeat: total, rate | The count of heartbeats reported to PD per instance per second. |
Dependent item | pd.region_heartbeat.rate[{#STORE_ADDRESS}] Preprocessing
|
PD: Region schedule push: total, rate | Dependent item | pd.region_heartbeat.push.err.rate[{#STORE_ADDRESS}] Preprocessing
|
This template is designed for the effortless deployment of Redis monitoring by Zabbix via Zabbix agent 2 and doesn't require any external scripts.
Zabbix version: 6.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Setup and configure zabbix-agent2 compiled with the Redis monitoring plugin.
Redis' default user should have permissions to run CONFIG, INFO, PING, CLIENT and SLOWLOG commands.
Alternatively, the default user's ACL should include the @admin, @slow, @dangerous, @fast, and @connection categories.
Test availability: zabbix_get -s 127.0.0.1 -k redis.ping[tcp://127.0.0.1:6379]
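The "Get ... info" dependent items below split the raw INFO output returned by redis.info into per-key values. A minimal sketch of that split (the sample text is a shortened, hand-written stand-in for real INFO output):

```python
# Hand-written stand-in for a fragment of Redis INFO output.
SAMPLE_INFO = """\
# Memory
used_memory:1000000
used_memory_rss:1200000
mem_fragmentation_ratio:1.20
"""

def parse_info(text):
    """Turn 'key:value' INFO lines into a dict, skipping section headers."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and '# Section' headers carry no data
        key, _, value = line.partition(":")
        values[key] = value
    return values

info = parse_info(SAMPLE_INFO)
print(info["mem_fragmentation_ratio"])  # 1.20
```

In the template this parsing is done per item by Preprocessing steps; the sketch only shows the shape of the data they pick apart.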
Name | Description | Default |
---|---|---|
{$REDIS.CONN.URI} | Connection string in the URI format (password is not used). This param overwrites a value configured in the "Server" option of the configuration file (if it's set), otherwise, the plugin's default value is used: "tcp://localhost:6379" |
tcp://localhost:6379 |
{$REDIS.PROCESS_NAME} | Redis server process name |
redis-server |
{$REDIS.LLD.PROCESS_NAME} | Redis server process name for LLD |
redis-server |
{$REDIS.LLD.FILTER.DB.MATCHES} | Filter of discoverable databases |
.* |
{$REDIS.LLD.FILTER.DB.NOT_MATCHES} | Filter to exclude discovered databases |
CHANGE_IF_NEEDED |
{$REDIS.REPL.LAG.MAX.WARN} | Maximum replication lag in seconds |
30s |
{$REDIS.SLOWLOG.COUNT.MAX.WARN} | Maximum number of slowlog entries per second |
1 |
{$REDIS.CLIENTS.PRC.MAX.WARN} | Maximum percentage of connected clients |
80 |
{$REDIS.MEM.PUSED.MAX.WARN} | Maximum percentage of memory used |
90 |
{$REDIS.MEM.FRAG_RATIO.MAX.WARN} | Maximum memory fragmentation ratio |
1.5 |
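The {$REDIS.MEM.FRAG_RATIO.MAX.WARN} macro above feeds the fragmentation-ratio item and trigger. As a worked illustration of how that ratio is read (thresholds follow the item description; the helper name and sample byte counts are ours):

```python
def fragmentation_state(used_memory_rss, used_memory):
    """Classify mem_fragmentation_ratio = used_memory_rss / used_memory
    along the lines the item description gives."""
    ratio = used_memory_rss / used_memory
    if ratio > 1.5:
        return ratio, "high fragmentation: consider restarting Redis"
    if ratio >= 1.0:
        return ratio, "some fragmentation is likely"
    return ratio, "RSS below allocated: Redis may lack memory or be swapping"

# 1.6 GB resident for 1.0 GB allocated -> ratio 1.6, above the 1.5 default.
print(fragmentation_state(1_600_000_000, 1_000_000_000))
```

As the item description notes, the ratio can be misleading right after a usage peak, since RSS reflects the peak while used_memory reflects the present.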
Name | Description | Type | Key and additional info |
---|---|---|---|
Redis: Get info | Zabbix agent | redis.info["{$REDIS.CONN.URI}"] | |
Redis: Get config | Zabbix agent | redis.config["{$REDIS.CONN.URI}"] Preprocessing
|
|
Redis: Ping | Zabbix agent | redis.ping["{$REDIS.CONN.URI}"] Preprocessing
|
|
Redis: Slowlog entries per second | Zabbix agent | redis.slowlog.count["{$REDIS.CONN.URI}"] Preprocessing
|
|
Redis: Get Clients info | Dependent item | redis.clients.info_raw Preprocessing
|
|
Redis: Get CPU info | Dependent item | redis.cpu.info_raw Preprocessing
|
|
Redis: Get Keyspace info | Dependent item | redis.keyspace.info_raw Preprocessing
|
|
Redis: Get Memory info | Dependent item | redis.memory.info_raw Preprocessing
|
|
Redis: Get Persistence info | Dependent item | redis.persistence.info_raw Preprocessing
|
|
Redis: Get Replication info | Dependent item | redis.replication.info_raw Preprocessing
|
|
Redis: Get Server info | Dependent item | redis.server.info_raw Preprocessing
|
|
Redis: Get Stats info | Dependent item | redis.stats.info_raw Preprocessing
|
|
Redis: CPU sys | System CPU consumed by the Redis server |
Dependent item | redis.cpu.sys Preprocessing
|
Redis: CPU sys children | System CPU consumed by the background processes |
Dependent item | redis.cpu.sys_children Preprocessing
|
Redis: CPU user | User CPU consumed by the Redis server |
Dependent item | redis.cpu.user Preprocessing
|
Redis: CPU user children | User CPU consumed by the background processes |
Dependent item | redis.cpu.user_children Preprocessing
|
Redis: Blocked clients | The number of connections waiting on a blocking call |
Dependent item | redis.clients.blocked Preprocessing
|
Redis: Max input buffer | The biggest input buffer among current client connections |
Dependent item | redis.clients.max_input_buffer Preprocessing
|
Redis: Max output buffer | The biggest output buffer among current client connections |
Dependent item | redis.clients.max_output_buffer Preprocessing
|
Redis: Connected clients | The number of connected clients |
Dependent item | redis.clients.connected Preprocessing
|
Redis: Cluster enabled | Indicates whether Redis cluster mode is enabled |
Dependent item | redis.cluster.enabled Preprocessing
|
Redis: Memory used | Total number of bytes allocated by Redis using its allocator |
Dependent item | redis.memory.used_memory Preprocessing
|
Redis: Memory used Lua | Amount of memory used by the Lua engine |
Dependent item | redis.memory.used_memory_lua Preprocessing
|
Redis: Memory used peak | Peak memory consumed by Redis (in bytes) |
Dependent item | redis.memory.used_memory_peak Preprocessing
|
Redis: Memory used RSS | Number of bytes that Redis allocated as seen by the operating system |
Dependent item | redis.memory.used_memory_rss Preprocessing
|
Redis: Memory fragmentation ratio | This ratio is an indication of memory mapping efficiency: - A value over 1.0 indicates that memory fragmentation is very likely. Consider restarting the Redis server so the operating system can recover fragmented memory, especially with a ratio over 1.5. - A value under 1.0 indicates that Redis likely has insufficient memory available. Consider optimizing memory usage or adding more RAM. Note: If your peak memory usage is much higher than your current memory usage, the memory fragmentation ratio may be unreliable. https://redis.io/topics/memory-optimization |
Dependent item | redis.memory.fragmentation_ratio Preprocessing
|
Redis: AOF current rewrite time sec | Duration of the on-going AOF rewrite operation if any |
Dependent item | redis.persistence.aof_current_rewrite_time_sec Preprocessing
|
Redis: AOF enabled | Flag indicating AOF logging is activated |
Dependent item | redis.persistence.aof_enabled Preprocessing
|
Redis: AOF last bgrewrite status | Status of the last AOF rewrite operation |
Dependent item | redis.persistence.aof_last_bgrewrite_status Preprocessing
|
Redis: AOF last rewrite time sec | Duration of the last AOF rewrite |
Dependent item | redis.persistence.aof_last_rewrite_time_sec Preprocessing
|
Redis: AOF last write status | Status of the last write operation to the AOF |
Dependent item | redis.persistence.aof_last_write_status Preprocessing
|
Redis: AOF rewrite in progress | Flag indicating an AOF rewrite operation is on-going |
Dependent item | redis.persistence.aof_rewrite_in_progress Preprocessing
|
Redis: AOF rewrite scheduled | Flag indicating an AOF rewrite operation will be scheduled once the on-going RDB save is complete |
Dependent item | redis.persistence.aofrewritescheduled Preprocessing
|
Redis: Dump loading | Flag indicating if the load of a dump file is on-going |
Dependent item | redis.persistence.loading Preprocessing
|
Redis: RDB bgsave in progress | "1" if bgsave is in progress and "0" otherwise |
Dependent item | redis.persistence.rdb_bgsave_in_progress Preprocessing
|
Redis: RDB changes since last save | Number of changes since the last background save |
Dependent item | redis.persistence.rdb_changes_since_last_save Preprocessing
|
Redis: RDB current bgsave time sec | Duration of the on-going RDB save operation if any |
Dependent item | redis.persistence.rdb_current_bgsave_time_sec Preprocessing
|
Redis: RDB last bgsave status | Status of the last RDB save operation |
Dependent item | redis.persistence.rdb_last_bgsave_status Preprocessing
|
Redis: RDB last bgsave time sec | Duration of the last bgsave operation |
Dependent item | redis.persistence.rdb_last_bgsave_time_sec Preprocessing
|
Redis: RDB last save time | Epoch-based timestamp of the last successful RDB save |
Dependent item | redis.persistence.rdb_last_save_time Preprocessing
|
Redis: Connected slaves | Number of connected slaves |
Dependent item | redis.replication.connected_slaves Preprocessing
|
Redis: Replication backlog active | Flag indicating replication backlog is active |
Dependent item | redis.replication.repl_backlog_active Preprocessing
|
Redis: Replication backlog first byte offset | The master offset of the replication backlog buffer |
Dependent item | redis.replication.repl_backlog_first_byte_offset Preprocessing
|
Redis: Replication backlog history length | Amount of data in the backlog sync buffer |
Dependent item | redis.replication.repl_backlog_histlen Preprocessing
|
Redis: Replication backlog size | Total size in bytes of the replication backlog buffer |
Dependent item | redis.replication.repl_backlog_size Preprocessing
|
Redis: Replication role | Value is "master" if the instance is a replica of no one, or "slave" if the instance is a replica of some master instance. Note that a replica can be the master of another replica (chained replication). |
Dependent item | redis.replication.role Preprocessing
|
Redis: Master replication offset | Replication offset reported by the master |
Dependent item | redis.replication.master_repl_offset Preprocessing
|
Redis: Process id | PID of the server process |
Dependent item | redis.server.process_id Preprocessing
|
Redis: Redis mode | The server's mode ("standalone", "sentinel" or "cluster") |
Dependent item | redis.server.redis_mode Preprocessing
|
Redis: Redis version | Version of the Redis server |
Dependent item | redis.server.redis_version Preprocessing
|
Redis: TCP port | TCP/IP listen port |
Dependent item | redis.server.tcp_port Preprocessing
|
Redis: Uptime | Number of seconds since Redis server start |
Dependent item | redis.server.uptime Preprocessing
|
Redis: Evicted keys | Number of evicted keys due to maxmemory limit |
Dependent item | redis.stats.evicted_keys Preprocessing
|
Redis: Expired keys | Total number of key expiration events |
Dependent item | redis.stats.expired_keys Preprocessing
|
Redis: Instantaneous input bytes per second | The network's read rate in KB/sec |
Dependent item | redis.stats.instantaneous_input.rate Preprocessing
|
Redis: Instantaneous operations per sec | Number of commands processed per second |
Dependent item | redis.stats.instantaneous_ops.rate Preprocessing
|
Redis: Instantaneous output bytes per second | The network's write rate in KB/sec |
Dependent item | redis.stats.instantaneous_output.rate Preprocessing
|
Redis: Keyspace hits | Number of successful lookups of keys in the main dictionary |
Dependent item | redis.stats.keyspace_hits Preprocessing
|
Redis: Keyspace misses | Number of failed lookups of keys in the main dictionary |
Dependent item | redis.stats.keyspace_misses Preprocessing
|
Redis: Latest fork usec | Duration of the latest fork operation in microseconds |
Dependent item | redis.stats.latest_fork_usec Preprocessing
|
Redis: Migrate cached sockets | The number of sockets open for MIGRATE purposes |
Dependent item | redis.stats.migrate_cached_sockets Preprocessing
|
Redis: Pubsub channels | Global number of pub/sub channels with client subscriptions |
Dependent item | redis.stats.pubsub_channels Preprocessing
|
Redis: Pubsub patterns | Global number of pub/sub patterns with client subscriptions |
Dependent item | redis.stats.pubsub_patterns Preprocessing
|
Redis: Rejected connections | Number of connections rejected because of the maxclients limit |
Dependent item | redis.stats.rejected_connections Preprocessing
|
Redis: Sync full | The number of full resyncs with replicas |
Dependent item | redis.stats.sync_full Preprocessing
|
Redis: Sync partial err | The number of denied partial resync requests |
Dependent item | redis.stats.sync_partial_err Preprocessing
|
Redis: Sync partial ok | The number of accepted partial resync requests |
Dependent item | redis.stats.sync_partial_ok Preprocessing
|
Redis: Total commands processed | Total number of commands processed by the server |
Dependent item | redis.stats.total_commands_processed Preprocessing
|
Redis: Total connections received | Total number of connections accepted by the server |
Dependent item | redis.stats.total_connections_received Preprocessing
|
Redis: Total net input bytes | The total number of bytes read from the network |
Dependent item | redis.stats.total_net_input_bytes Preprocessing
|
Redis: Total net output bytes | The total number of bytes written to the network |
Dependent item | redis.stats.total_net_output_bytes Preprocessing
|
Redis: Max clients | Max number of connected clients at the same time. Once the limit is reached, Redis will close all new connections and send the error "max number of clients reached". |
Dependent item | redis.config.maxclients Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Redis: Failed to fetch info data | Zabbix has not received any data for items for the last 30 minutes. |
nodata(/Redis by Zabbix agent 2/redis.info["{$REDIS.CONN.URI}"],30m)=1 |Warning |
Manual close: Yes Depends on:
|
|
Redis: Configuration has changed | Redis configuration has changed. Acknowledge to close the problem manually. |
last(/Redis by Zabbix agent 2/redis.config["{$REDIS.CONN.URI}"],#1)<>last(/Redis by Zabbix agent 2/redis.config["{$REDIS.CONN.URI}"],#2) and length(last(/Redis by Zabbix agent 2/redis.config["{$REDIS.CONN.URI}"]))>0 |Info |
Manual close: Yes | |
Redis: Service is down | last(/Redis by Zabbix agent 2/redis.ping["{$REDIS.CONN.URI}"])=0 |Average |
Manual close: Yes | ||
Redis: Too many entries in the slowlog | min(/Redis by Zabbix agent 2/redis.slowlog.count["{$REDIS.CONN.URI}"],5m)>{$REDIS.SLOWLOG.COUNT.MAX.WARN} |Info |
|||
Redis: Total number of connected clients is too high | When the number of clients reaches the value of the "maxclients" parameter, new connections will be rejected. |
min(/Redis by Zabbix agent 2/redis.clients.connected,5m)/last(/Redis by Zabbix agent 2/redis.config.maxclients)*100>{$REDIS.CLIENTS.PRC.MAX.WARN} |Warning |
||
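The trigger above divides the current client count by the maxclients setting and compares the resulting percentage to the {$REDIS.CLIENTS.PRC.MAX.WARN} macro. A minimal sketch of that arithmetic (the threshold of 80 and the sample counts are assumed values, not template defaults):

```python
# Sketch of the logic behind the "connected clients is too high" trigger.
# CLIENTS_PRC_MAX_WARN mirrors the {$REDIS.CLIENTS.PRC.MAX.WARN} macro;
# 80 is an assumed value here, adjust to your configuration.
CLIENTS_PRC_MAX_WARN = 80

def clients_usage_pct(connected: int, maxclients: int) -> float:
    """Connected clients as a percentage of the maxclients limit."""
    return connected / maxclients * 100

pct = clients_usage_pct(connected=8500, maxclients=10000)
print(f"{pct:.1f}% of maxclients in use")
print("trigger fires" if pct > CLIENTS_PRC_MAX_WARN else "ok")
```

In the real trigger, Zabbix evaluates min() over 5 minutes of history rather than a single sample, so short spikes do not fire the alert.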
Redis: Memory fragmentation ratio is too high | This ratio is an indication of memory mapping efficiency. |
min(/Redis by Zabbix agent 2/redis.memory.fragmentation_ratio,15m)>{$REDIS.MEM.FRAG_RATIO.MAX.WARN} |Warning |
||
Redis: Last AOF write operation failed | Detailed information about persistence: https://redis.io/topics/persistence |
last(/Redis by Zabbix agent 2/redis.persistence.aof_last_write_status)=0 |Warning |
||
Redis: Last RDB save operation failed | Detailed information about persistence: https://redis.io/topics/persistence |
last(/Redis by Zabbix agent 2/redis.persistence.rdb_last_bgsave_status)=0 |Warning |
||
Redis: Number of slaves has changed | Redis number of slaves has changed. Acknowledge to close the problem manually. |
last(/Redis by Zabbix agent 2/redis.replication.connected_slaves,#1)<>last(/Redis by Zabbix agent 2/redis.replication.connected_slaves,#2) |Info |
Manual close: Yes | |
Redis: Replication role has changed | Redis replication role has changed. Acknowledge to close the problem manually. |
last(/Redis by Zabbix agent 2/redis.replication.role,#1)<>last(/Redis by Zabbix agent 2/redis.replication.role,#2) and length(last(/Redis by Zabbix agent 2/redis.replication.role))>0 |Warning |
Manual close: Yes | |
Redis: Version has changed | The Redis version has changed. Acknowledge to close the problem manually. |
last(/Redis by Zabbix agent 2/redis.server.redis_version,#1)<>last(/Redis by Zabbix agent 2/redis.server.redis_version,#2) and length(last(/Redis by Zabbix agent 2/redis.server.redis_version))>0 |Info |
Manual close: Yes | |
Redis: Host has been restarted | The host uptime is less than 10 minutes. |
last(/Redis by Zabbix agent 2/redis.server.uptime)<10m |Info |
Manual close: Yes | |
Redis: Connections are rejected | The number of connections has reached the value of "maxclients". |
last(/Redis by Zabbix agent 2/redis.stats.rejected_connections)>0 |High |
Name | Description | Type | Key and additional info |
---|---|---|---|
Keyspace discovery | Individual keyspace metrics |
Dependent item | redis.keyspace.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
DB {#DB}: Get Keyspace info | The item gets information about the keyspace of the {#DB} database. |
Dependent item | redis.db.info_raw["{#DB}"] Preprocessing
|
DB {#DB}: Average TTL | Average TTL |
Dependent item | redis.db.avg_ttl["{#DB}"] Preprocessing
|
DB {#DB}: Expires | Number of keys with an expiration |
Dependent item | redis.db.expires["{#DB}"] Preprocessing
|
DB {#DB}: Keys | Total number of keys |
Dependent item | redis.db.keys["{#DB}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
AOF metrics discovery | If AOF is activated, additional metrics will be added |
Dependent item | redis.persistence.aof.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Redis: AOF current size{#SINGLETON} | AOF current file size |
Dependent item | redis.persistence.aof_current_size[{#SINGLETON}] Preprocessing
|
Redis: AOF base size{#SINGLETON} | AOF file size on latest startup or rewrite |
Dependent item | redis.persistence.aof_base_size[{#SINGLETON}] Preprocessing
|
Redis: AOF pending rewrite{#SINGLETON} | Flag indicating an AOF rewrite operation will be scheduled once the on-going RDB save is complete |
Dependent item | redis.persistence.aof_pending_rewrite[{#SINGLETON}] Preprocessing
|
Redis: AOF buffer length{#SINGLETON} | Size of the AOF buffer |
Dependent item | redis.persistence.aof_buffer_length[{#SINGLETON}] Preprocessing
|
Redis: AOF rewrite buffer length{#SINGLETON} | Size of the AOF rewrite buffer |
Dependent item | redis.persistence.aof_rewrite_buffer_length[{#SINGLETON}] Preprocessing
|
Redis: AOF pending background I/O fsync{#SINGLETON} | Number of fsync pending jobs in the background I/O queue |
Dependent item | redis.persistence.aof_pending_bio_fsync[{#SINGLETON}] Preprocessing
|
Redis: AOF delayed fsync{#SINGLETON} | Delayed fsync counter |
Dependent item | redis.persistence.aof_delayed_fsync[{#SINGLETON}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Slave metrics discovery | If the instance is a replica, additional metrics are provided |
Dependent item | redis.replication.slave.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Redis: Master host{#SINGLETON} | Host or IP address of the master |
Dependent item | redis.replication.master_host[{#SINGLETON}] Preprocessing
|
Redis: Master port{#SINGLETON} | Master listening TCP port |
Dependent item | redis.replication.master_port[{#SINGLETON}] Preprocessing
|
Redis: Master link status{#SINGLETON} | Status of the link (up/down) |
Dependent item | redis.replication.master_link_status[{#SINGLETON}] Preprocessing
|
Redis: Master last I/O seconds ago{#SINGLETON} | Number of seconds since the last interaction with the master |
Dependent item | redis.replication.master_last_io_seconds_ago[{#SINGLETON}] Preprocessing
|
Redis: Master sync in progress{#SINGLETON} | Flag indicating the master is syncing to the replica |
Dependent item | redis.replication.master_sync_in_progress[{#SINGLETON}] Preprocessing
|
Redis: Slave replication offset{#SINGLETON} | The replication offset of the replica instance |
Dependent item | redis.replication.slave_repl_offset[{#SINGLETON}] Preprocessing
|
Redis: Slave priority{#SINGLETON} | The priority of the instance as a candidate for failover |
Dependent item | redis.replication.slave_priority[{#SINGLETON}] Preprocessing
|
Redis: Slave read only{#SINGLETON} | Flag indicating if the replica is read-only |
Dependent item | redis.replication.slave_read_only[{#SINGLETON}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Redis: Replication lag with master is too high | min(/Redis by Zabbix agent 2/redis.replication.master_last_io_seconds_ago[{#SINGLETON}],5m)>{$REDIS.REPL.LAG.MAX.WARN} |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Replication metrics discovery | If the instance is the master and the slaves are connected, additional metrics are provided |
Dependent item | redis.replication.master.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Redis slave {#SLAVE_IP}:{#SLAVE_PORT}: Replication lag in bytes | Replication lag in bytes |
Dependent item | redis.replication.lag_bytes["{#SLAVE_IP}:{#SLAVE_PORT}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Process metrics discovery | Collect metrics by Zabbix agent if it exists |
Zabbix agent | proc.num["{$REDIS.LLD.PROCESS_NAME}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Redis: Number of running processes | Zabbix agent | proc.num["{$REDIS.PROCESS_NAME}{#SINGLETON}"] | |
Redis: Memory usage (rss) | Resident set size memory used by process in bytes. |
Zabbix agent | proc.mem["{$REDIS.PROCESS_NAME}{#SINGLETON}",,,,rss] |
Redis: Memory usage (vsize) | Virtual memory size used by process in bytes. |
Zabbix agent | proc.mem["{$REDIS.PROCESS_NAME}{#SINGLETON}",,,,vsize] |
Redis: CPU utilization | Process CPU utilization percentage. |
Zabbix agent | proc.cpu.util["{$REDIS.PROCESS_NAME}{#SINGLETON}"] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Redis: Process is not running | last(/Redis by Zabbix agent 2/proc.num["{$REDIS.PROCESS_NAME}{#SINGLETON}"])=0 |High |
Name | Description | Type | Key and additional info |
---|---|---|---|
Version 4+ metrics discovery | Additional metrics for versions 4+ |
Dependent item | redis.metrics.v4.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Redis: Executable path{#SINGLETON} | The path to the server's executable |
Dependent item | redis.server.executable[{#SINGLETON}] Preprocessing
|
Redis: Memory used peak %{#SINGLETON} | The percentage of used_memory_peak out of used_memory |
Dependent item | redis.memory.used_memory_peak_perc[{#SINGLETON}] Preprocessing
|
Redis: Memory used overhead{#SINGLETON} | The sum in bytes of all overheads that the server allocated for managing its internal data structures |
Dependent item | redis.memory.used_memory_overhead[{#SINGLETON}] Preprocessing
|
Redis: Memory used startup{#SINGLETON} | Initial amount of memory consumed by Redis at startup in bytes |
Dependent item | redis.memory.used_memory_startup[{#SINGLETON}] Preprocessing
|
Redis: Memory used dataset{#SINGLETON} | The size in bytes of the dataset |
Dependent item | redis.memory.used_memory_dataset[{#SINGLETON}] Preprocessing
|
Redis: Memory used dataset %{#SINGLETON} | The percentage of used_memory_dataset out of the net memory usage (used_memory minus used_memory_startup) |
Dependent item | redis.memory.used_memory_dataset_perc[{#SINGLETON}] Preprocessing
|
Redis: Total system memory{#SINGLETON} | The total amount of memory that the Redis host has |
Dependent item | redis.memory.total_system_memory[{#SINGLETON}] Preprocessing
|
Redis: Max memory{#SINGLETON} | Maximum amount of memory allocated to Redis |
Dependent item | redis.memory.maxmemory[{#SINGLETON}] Preprocessing
|
Redis: Max memory policy{#SINGLETON} | The value of the maxmemory-policy configuration directive |
Dependent item | redis.memory.maxmemory_policy[{#SINGLETON}] Preprocessing
|
Redis: Active defrag running{#SINGLETON} | Flag indicating if active defragmentation is active |
Dependent item | redis.memory.active_defrag_running[{#SINGLETON}] Preprocessing
|
Redis: Lazyfree pending objects{#SINGLETON} | The number of objects waiting to be freed (as a result of calling UNLINK, or FLUSHDB and FLUSHALL with the ASYNC option) |
Dependent item | redis.memory.lazyfree_pending_objects[{#SINGLETON}] Preprocessing
|
Redis: RDB last CoW size{#SINGLETON} | The size in bytes of copy-on-write allocations during the last RDB save operation |
Dependent item | redis.persistence.rdb_last_cow_size[{#SINGLETON}] Preprocessing
|
Redis: AOF last CoW size{#SINGLETON} | The size in bytes of copy-on-write allocations during the last AOF rewrite operation |
Dependent item | redis.persistence.aof_last_cow_size[{#SINGLETON}] Preprocessing
|
Redis: Expired stale %{#SINGLETON} | Dependent item | redis.stats.expired_stale_perc[{#SINGLETON}] Preprocessing
|
|
Redis: Expired time cap reached count{#SINGLETON} | Dependent item | redis.stats.expired_time_cap_reached_count[{#SINGLETON}] Preprocessing
|
|
Redis: Slave expires tracked keys{#SINGLETON} | The number of keys tracked for expiry purposes (applicable only to writable replicas) |
Dependent item | redis.stats.slave_expires_tracked_keys[{#SINGLETON}] Preprocessing
|
Redis: Active defrag hits{#SINGLETON} | Number of value reallocations performed by the active defragmentation process |
Dependent item | redis.stats.active_defrag_hits[{#SINGLETON}] Preprocessing
|
Redis: Active defrag misses{#SINGLETON} | Number of aborted value reallocations started by the active defragmentation process |
Dependent item | redis.stats.active_defrag_misses[{#SINGLETON}] Preprocessing
|
Redis: Active defrag key hits{#SINGLETON} | Number of keys that were actively defragmented |
Dependent item | redis.stats.active_defrag_key_hits[{#SINGLETON}] Preprocessing
|
Redis: Active defrag key misses{#SINGLETON} | Number of keys that were skipped by the active defragmentation process |
Dependent item | redis.stats.active_defrag_key_misses[{#SINGLETON}] Preprocessing
|
Redis: Replication second offset{#SINGLETON} | Offset up to which replication IDs are accepted |
Dependent item | redis.replication.second_repl_offset[{#SINGLETON}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Redis: Memory usage is too high | last(/Redis by Zabbix agent 2/redis.memory.used_memory)/min(/Redis by Zabbix agent 2/redis.memory.maxmemory[{#SINGLETON}],5m)*100>{$REDIS.MEM.PUSED.MAX.WARN} |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Version 5+ metrics discovery | Additional metrics for versions 5+ |
Dependent item | redis.metrics.v5.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Redis: Allocator active{#SINGLETON} | Dependent item | redis.memory.allocator_active[{#SINGLETON}] Preprocessing
|
|
Redis: Allocator allocated{#SINGLETON} | Dependent item | redis.memory.allocator_allocated[{#SINGLETON}] Preprocessing
|
|
Redis: Allocator resident{#SINGLETON} | Dependent item | redis.memory.allocator_resident[{#SINGLETON}] Preprocessing
|
|
Redis: Memory used scripts{#SINGLETON} | Dependent item | redis.memory.used_memory_scripts[{#SINGLETON}] Preprocessing
|
|
Redis: Memory number of cached scripts{#SINGLETON} | Dependent item | redis.memory.number_of_cached_scripts[{#SINGLETON}] Preprocessing
|
|
Redis: Allocator fragmentation bytes{#SINGLETON} | Dependent item | redis.memory.allocator_frag_bytes[{#SINGLETON}] Preprocessing
|
|
Redis: Allocator fragmentation ratio{#SINGLETON} | Dependent item | redis.memory.allocator_frag_ratio[{#SINGLETON}] Preprocessing
|
|
Redis: Allocator RSS bytes{#SINGLETON} | Dependent item | redis.memory.allocator_rss_bytes[{#SINGLETON}] Preprocessing
|
|
Redis: Allocator RSS ratio{#SINGLETON} | Dependent item | redis.memory.allocator_rss_ratio[{#SINGLETON}] Preprocessing
|
|
Redis: Memory RSS overhead bytes{#SINGLETON} | Dependent item | redis.memory.rss_overhead_bytes[{#SINGLETON}] Preprocessing
|
|
Redis: Memory RSS overhead ratio{#SINGLETON} | Dependent item | redis.memory.rss_overhead_ratio[{#SINGLETON}] Preprocessing
|
|
Redis: Memory fragmentation bytes{#SINGLETON} | Dependent item | redis.memory.fragmentation_bytes[{#SINGLETON}] Preprocessing
|
|
Redis: Memory not counted for evict{#SINGLETON} | Dependent item | redis.memory.not_counted_for_evict[{#SINGLETON}] Preprocessing
|
|
Redis: Memory replication backlog{#SINGLETON} | Dependent item | redis.memory.replication_backlog[{#SINGLETON}] Preprocessing
|
|
Redis: Memory clients normal{#SINGLETON} | Dependent item | redis.memory.mem_clients_normal[{#SINGLETON}] Preprocessing
|
|
Redis: Memory clients slaves{#SINGLETON} | Dependent item | redis.memory.mem_clients_slaves[{#SINGLETON}] Preprocessing
|
|
Redis: Memory AOF buffer{#SINGLETON} | Size of the AOF buffer |
Dependent item | redis.memory.mem_aof_buffer[{#SINGLETON}] Preprocessing
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for the effortless deployment of PostgreSQL monitoring by Zabbix via ODBC and doesn't require any external scripts.
Zabbix version: 6.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Create a PostgreSQL user for monitoring (<password> at your discretion) and inherit permissions from the default role pg_monitor:
CREATE USER zbx_monitor WITH PASSWORD '<PASSWORD>' INHERIT;
GRANT pg_monitor TO zbx_monitor;
Edit the pg_hba.conf configuration file to allow TCP connections for the user zbx_monitor. For example, you could add one of the following rows to allow local connections from the same host:
# TYPE DATABASE USER ADDRESS METHOD
host all zbx_monitor localhost trust
host all zbx_monitor 127.0.0.1/32 md5
host all zbx_monitor ::1/128 scram-sha-256
For more information, please read the PostgreSQL documentation: https://www.postgresql.org/docs/current/auth-pg-hba-conf.html
Install the PostgreSQL ODBC driver. Check the Zabbix documentation for details about ODBC checks and the recommended parameters page.
Set up the connection string with the {$PG.CONNSTRING.ODBC} macro. The minimum required parameters are:
Driver= - set the name of the driver which will be used for monitoring (from the odbcinst.ini file) or specify the path to the driver file (for example, /usr/lib64/psqlodbcw.so);
Servername= - set the host name or IP address of the PostgreSQL instance;
Port= - adjust the port number if needed.
Note: if you want to use SSL/TLS encryption to protect communications with the remote PostgreSQL instance, you can also specify encryption parameters here. It is assumed that you have set up the PostgreSQL instance to work in the desired encryption mode. Check the PostgreSQL documentation for details.
For example, to enable required encryption in transport mode without identity checks, the connection string could look like this (replace <instanceip> with the address of the PostgreSQL instance):
Servername=<instanceip>;Port=5432;Driver=/usr/lib64/psqlodbcw.so;SSLmode=require
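The connection string is just semicolon-separated key=value pairs, so it can be assembled mechanically. A minimal sketch (the host address and driver path below are placeholder values, not defaults from the template):

```python
# Sketch: assemble an ODBC connection string in the "Key=Value;Key=Value"
# form shown above. Values here are placeholders; substitute your own.
def build_conn_string(params: dict) -> str:
    """Join parameters into semicolon-separated ODBC form."""
    return ";".join(f"{key}={value}" for key, value in params.items())

conn = build_conn_string({
    "Servername": "192.0.2.10",           # placeholder instance address
    "Port": 5432,
    "Driver": "/usr/lib64/psqlodbcw.so",  # path from your ODBC installation
    "SSLmode": "require",                 # optional TLS setting
})
print(conn)
```

Note that the template itself prepends Database={$PG.DATABASE}; to this macro in each item key, so the database name should not be duplicated inside {$PG.CONNSTRING.ODBC}.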
Set the password of the monitoring user as the value of the {$PG.PASSWORD} macro.
Name | Description | Default |
---|---|---|
{$PG.PASSWORD} | PostgreSQL user password. |
<Put the password here> |
{$PG.USER} | PostgreSQL username. |
zbx_monitor |
{$PG.CONNSTRING.ODBC} | Connection string for the PostgreSQL instance. |
Macro too long. Please see the template. |
{$PG.LLD.FILTER.DBNAME} | Filter of discoverable databases. |
.+ |
{$PG.CONNTOTALPCT.MAX.WARN} | Maximum percentage of current connections for trigger expression. |
90 |
{$PG.DATABASE} | Default PostgreSQL database for the connection. |
postgres |
{$PG.DEADLOCKS.MAX.WARN} | Maximum number of detected deadlocks for trigger expression. |
0 |
{$PG.LLD.FILTER.APPLICATION} | Filter of discoverable applications. |
.+ |
{$PG.CONFLICTS.MAX.WARN} | Maximum number of recovery conflicts for trigger expression. |
0 |
{$PG.QUERY_ETIME.MAX.WARN} | Execution time limit for count of slow queries. |
30 |
{$PG.SLOW_QUERIES.MAX.WARN} | Slow queries count threshold for a trigger. |
5 |
Name | Description | Type | Key and additional info |
---|---|---|---|
PostgreSQL: Get bgwriter | Collect all metrics from pg_stat_bgwriter: https://www.postgresql.org/docs/current/monitoring-stats.html#PG-STAT-BGWRITER-VIEW |
Database monitor | db.odbc.select[pgsql.bgwriter,,"Database={$PG.DATABASE};{$PG.CONNSTRING.ODBC}"] |
PostgreSQL: Get archive | Collect archive status metrics. |
Database monitor | db.odbc.select[pgsql.archive,,"Database={$PG.DATABASE};{$PG.CONNSTRING.ODBC}"] |
PostgreSQL: Get dbstat | Collect all metrics from pg_stat_database per database: https://www.postgresql.org/docs/current/monitoring-stats.html#PG-STAT-DATABASE-VIEW |
Database monitor | db.odbc.select[pgsql.dbstat,,"Database={$PG.DATABASE};{$PG.CONNSTRING.ODBC}"] |
PostgreSQL: Get dbstat sum | Collect all metrics from pg_stat_database as sums for all databases: https://www.postgresql.org/docs/current/monitoring-stats.html#PG-STAT-DATABASE-VIEW |
Database monitor | db.odbc.select[pgsql.dbstat.sum,,"Database={$PG.DATABASE};{$PG.CONNSTRING.ODBC}"] |
PostgreSQL: Get connections sum | Collect all metrics from pg_stat_activity: https://www.postgresql.org/docs/current/monitoring-stats.html#PG-STAT-ACTIVITY-VIEW |
Database monitor | db.odbc.select[pgsql.connections.sum,,"Database={$PG.DATABASE};{$PG.CONNSTRING.ODBC}"] |
PostgreSQL: Get WAL | Collect write-ahead log (WAL) metrics. |
Database monitor | db.odbc.select[pgsql.wal.stat,,"Database={$PG.DATABASE};{$PG.CONNSTRING.ODBC}"] |
PostgreSQL: Get locks | Collect all metrics from pg_locks per database: https://www.postgresql.org/docs/current/explicit-locking.html#LOCKING-TABLES |
Database monitor | db.odbc.select[pgsql.locks,,"Database={$PG.DATABASE};{$PG.CONNSTRING.ODBC}"] |
PostgreSQL: Get replication | Collect metrics from the pg_stat_replication view, which contains information about the WAL sender process, showing statistics about replication to that sender's connected standby server. |
Database monitor | db.odbc.select[pgsql.replication.process,,"Database={$PG.DATABASE};{$PG.CONNSTRING.ODBC}"] Preprocessing
|
PostgreSQL: Get queries | Collect all metrics by query execution time. |
Database monitor | db.odbc.select[pgsql.queries,,"Database={$PG.DATABASE};{$PG.CONNSTRING.ODBC}"] |
PostgreSQL: Version | PostgreSQL version. |
Database monitor | db.odbc.select[pgsql.version,,"Database={$PG.DATABASE};{$PG.CONNSTRING.ODBC}"] Preprocessing
|
WAL: Bytes written | WAL write, in bytes. |
Dependent item | pgsql.wal.write Preprocessing
|
WAL: Bytes received | WAL receive, in bytes. |
Dependent item | pgsql.wal.receive Preprocessing
|
WAL: Segments count | Number of WAL segments. |
Dependent item | pgsql.wal.count Preprocessing
|
Bgwriter: Buffers allocated per second | Number of buffers allocated per second. |
Dependent item | pgsql.bgwriter.buffers_alloc.rate Preprocessing
|
Bgwriter: Buffers written directly by a backend per second | Number of buffers written directly by a backend per second. |
Dependent item | pgsql.bgwriter.buffers_backend.rate Preprocessing
|
Bgwriter: Number of bgwriter cleaning scans stopped per second | Number of times per second the background writer stopped a cleaning scan because it had written too many buffers. |
Dependent item | pgsql.bgwriter.maxwritten_clean.rate Preprocessing
|
Bgwriter: Times a backend executed its own fsync per second | Number of times per second a backend had to execute its own fsync call (normally the background writer handles those even when the backend does its own write). |
Dependent item | pgsql.bgwriter.buffers_backend_fsync.rate Preprocessing
|
Checkpoint: Buffers written by the background writer per second | Number of buffers written by the background writer per second. |
Dependent item | pgsql.bgwriter.buffers_clean.rate Preprocessing
|
Checkpoint: Buffers written during checkpoints per second | Number of buffers written during checkpoints per second. |
Dependent item | pgsql.bgwriter.buffers_checkpoint.rate Preprocessing
|
Checkpoint: Scheduled per second | Number of scheduled checkpoints that have been performed per second. |
Dependent item | pgsql.bgwriter.checkpoints_timed.rate Preprocessing
|
Checkpoint: Requested per second | Number of requested checkpoints that have been performed per second. |
Dependent item | pgsql.bgwriter.checkpoints_req.rate Preprocessing
|
Checkpoint: Checkpoint write time per second | Total amount of time per second that has been spent in the portion of checkpoint processing where files are written to disk. |
Dependent item | pgsql.bgwriter.checkpoint_write_time.rate Preprocessing
|
Checkpoint: Checkpoint sync time per second | Total amount of time per second that has been spent in the portion of checkpoint processing where files are synchronized to disk. |
Dependent item | pgsql.bgwriter.checkpoint_sync_time.rate Preprocessing
|
Archive: Count of archived files | Count of archived files. |
Dependent item | pgsql.archive.count_archived_files Preprocessing
|
Archive: Count of failed attempts to archive files | Count of failed attempts to archive files. |
Dependent item | pgsql.archive.failed_trying_to_archive Preprocessing
|
Archive: Count of files in archive_status that need archiving | Count of files to archive. |
Dependent item | pgsql.archive.count_files_to_archive Preprocessing
|
Archive: Size of files that need archiving | Size of files to archive. |
Dependent item | pgsql.archive.size_files_to_archive Preprocessing
|
Dbstat: Blocks read time | Time spent reading data file blocks by backends. |
Dependent item | pgsql.dbstat.sum.blk_read_time Preprocessing
|
Dbstat: Blocks write time | Time spent writing data file blocks by backends. |
Dependent item | pgsql.dbstat.sum.blk_write_time Preprocessing
|
Dbstat: Committed transactions per second | Number of transactions that have been committed per second. |
Dependent item | pgsql.dbstat.sum.xact_commit.rate Preprocessing
|
Dbstat: Conflicts per second | Number of queries canceled per second due to conflicts with recovery (conflicts occur only on standby servers; see pg_stat_database_conflicts for details). |
Dependent item | pgsql.dbstat.sum.conflicts.rate Preprocessing
|
Dbstat: Deadlocks per second | Number of deadlocks detected per second. |
Dependent item | pgsql.dbstat.sum.deadlocks.rate Preprocessing
|
Dbstat: Disk blocks read per second | Number of disk blocks read per second. |
Dependent item | pgsql.dbstat.sum.blks_read.rate Preprocessing
|
Dbstat: Hit blocks read per second | Number of times per second disk blocks were found already in the buffer cache. |
Dependent item | pgsql.dbstat.sum.blks_hit.rate Preprocessing
|
Dbstat: Number temp bytes per second | Total amount of data written per second to temporary files by queries. |
Dependent item | pgsql.dbstat.sum.temp_bytes.rate Preprocessing
|
Dbstat: Number temp files per second | Number of temporary files created by queries per second. |
Dependent item | pgsql.dbstat.sum.temp_files.rate Preprocessing
|
Dbstat: Rolled back transactions per second | Number of transactions that have been rolled back per second. |
Dependent item | pgsql.dbstat.sum.xact_rollback.rate Preprocessing
|
Dbstat: Rows deleted per second | Number of rows deleted by queries per second. |
Dependent item | pgsql.dbstat.sum.tup_deleted.rate Preprocessing
|
Dbstat: Rows fetched per second | Number of rows fetched by queries per second. |
Dependent item | pgsql.dbstat.sum.tup_fetched.rate Preprocessing
|
Dbstat: Rows inserted per second | Number of rows inserted by queries per second. |
Dependent item | pgsql.dbstat.sum.tup_inserted.rate Preprocessing
|
Dbstat: Rows returned per second | Number of rows returned by queries per second. |
Dependent item | pgsql.dbstat.sum.tup_returned.rate Preprocessing
|
Dbstat: Rows updated per second | Number of rows updated by queries per second. |
Dependent item | pgsql.dbstat.sum.tup_updated.rate Preprocessing
|
Dbstat: Backends connected | Number of connected backends. |
Dependent item | pgsql.dbstat.sum.numbackends Preprocessing
|
Connections sum: Active | Total number of connections executing a query. |
Dependent item | pgsql.connections.sum.active Preprocessing
|
Connections sum: Fastpath function call | Total number of connections executing a fast-path function. |
Dependent item | pgsql.connections.sum.fastpath_function_call Preprocessing
|
Connections sum: Idle | Total number of connections waiting for a new client command. |
Dependent item | pgsql.connections.sum.idle Preprocessing
|
Connections sum: Idle in transaction | Total number of connections in a transaction state but not executing a query. |
Dependent item | pgsql.connections.sum.idle_in_transaction Preprocessing
|
Connections sum: Prepared | Total number of prepared transactions: https://www.postgresql.org/docs/current/sql-prepare-transaction.html |
Dependent item | pgsql.connections.sum.prepared Preprocessing
|
Connections sum: Total | Total number of connections. |
Dependent item | pgsql.connections.sum.total Preprocessing
|
Connections sum: Total, % | Total number of connections, in percentage. |
Dependent item | pgsql.connections.sum.total_pct Preprocessing
|
Connections sum: Waiting | Total number of waiting connections: https://www.postgresql.org/docs/current/monitoring-stats.html#WAIT-EVENT-TABLE |
Dependent item | pgsql.connections.sum.waiting Preprocessing
|
Connections sum: Idle in transaction (aborted) | Total number of connections in a transaction state but not executing a query, and where one of the statements in the transaction caused an error. |
Dependent item | pgsql.connections.sum.idle_in_transaction_aborted Preprocessing
|
Connections sum: Disabled | Total number of disabled connections. |
Dependent item | pgsql.connections.sum.disabled Preprocessing
|
PostgreSQL: Age of oldest xid | Age of oldest xid. |
Database monitor | db.odbc.select[pgsql.oldest.xid,,"Database={$PG.DATABASE};{$PG.CONNSTRING.ODBC}"] |
PostgreSQL: Count of autovacuum workers | Number of autovacuum workers. |
Database monitor | db.odbc.select[pgsql.autovacuum.count,,"Database={$PG.DATABASE};{$PG.CONNSTRING.ODBC}"] |
PostgreSQL: Cache hit ratio, % | Cache hit ratio. |
Calculated | pgsql.cache.hit |
PostgreSQL: Uptime | Time since the server started. |
Database monitor | db.odbc.select[pgsql.uptime,,"Database={$PG.DATABASE};{$PG.CONNSTRING.ODBC}"] |
PostgreSQL: Replication: Lag in bytes | Replication lag with master, in bytes. |
Database monitor | db.odbc.select[pgsql.replication.lag.b,,"Database={$PG.DATABASE};{$PG.CONNSTRING.ODBC}"] |
PostgreSQL: Replication: Lag in seconds | Replication lag with master, in seconds. |
Database monitor | db.odbc.select[pgsql.replication.lag.sec,,"Database={$PG.DATABASE};{$PG.CONNSTRING.ODBC}"] |
PostgreSQL: Replication: Recovery role | Replication role: 1 — recovery is still in progress (standby mode), 0 — master mode. |
Database monitor | db.odbc.select[pgsql.replication.recovery_role,,"Database={$PG.DATABASE};{$PG.CONNSTRING.ODBC}"] |
PostgreSQL: Replication: Standby count | Number of standby servers. |
Database monitor | db.odbc.select[pgsql.replication.count,,"Database={$PG.DATABASE};{$PG.CONNSTRING.ODBC}"] |
PostgreSQL: Replication: Status | Replication status: 0 — streaming is down, 1 — streaming is up, 2 — master mode. |
Database monitor | db.odbc.select[pgsql.replication.status,,"Database={$PG.DATABASE};{$PG.CONNSTRING.ODBC}"] |
PostgreSQL: Ping | Used to test a connection to see if it is alive. It is set to 0 if the query is unsuccessful. |
Database monitor | db.odbc.select[pgsql.ping,,"Database={$PG.DATABASE};{$PG.CONNSTRING.ODBC}"] Preprocessing
|
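The "PostgreSQL: Cache hit ratio, %" row above is a calculated item (pgsql.cache.hit). As a minimal sketch, assuming the conventional formula based on the pg_stat_database counters blks_hit and blks_read (an illustration only, not necessarily the template's exact calculated-item expression):

```python
# Hypothetical reconstruction of a cache hit ratio calculation from
# pg_stat_database counters; function name and formula are assumptions.

def cache_hit_pct(blks_hit: int, blks_read: int) -> float:
    """Percentage of block requests served from the buffer cache."""
    total = blks_hit + blks_read
    if total == 0:
        return 0.0  # no traffic yet; avoid division by zero
    return blks_hit / total * 100.0

print(round(cache_hit_pct(9_900, 100), 1))  # 99.0
```

A healthy OLTP workload usually keeps this ratio well above 90%; a sustained drop often points to an undersized shared_buffers or a changed access pattern.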
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
PostgreSQL: Version has changed | last(/PostgreSQL by ODBC/db.odbc.select[pgsql.version,,"Database={$PG.DATABASE};{$PG.CONNSTRING.ODBC}"],#1)<>last(/PostgreSQL by ODBC/db.odbc.select[pgsql.version,,"Database={$PG.DATABASE};{$PG.CONNSTRING.ODBC}"],#2) and length(last(/PostgreSQL by ODBC/db.odbc.select[pgsql.version,,"Database={$PG.DATABASE};{$PG.CONNSTRING.ODBC}"]))>0 |Info |
|||
PostgreSQL: Total number of connections is too high | Total number of current connections exceeds the limit of {$PG.CONN_TOTAL_PCT.MAX.WARN}% out of the maximum number of concurrent connections to the database server (the "max_connections" setting). |
min(/PostgreSQL by ODBC/pgsql.connections.sum.total_pct,5m) > {$PG.CONN_TOTAL_PCT.MAX.WARN} |Average |
||
PostgreSQL: Oldest xid is too big | last(/PostgreSQL by ODBC/db.odbc.select[pgsql.oldest.xid,,"Database={$PG.DATABASE};{$PG.CONNSTRING.ODBC}"]) > 18000000 |Average |
|||
PostgreSQL: Service has been restarted | PostgreSQL uptime is less than 10 minutes. |
last(/PostgreSQL by ODBC/db.odbc.select[pgsql.uptime,,"Database={$PG.DATABASE};{$PG.CONNSTRING.ODBC}"]) < 10m |Average |
||
PostgreSQL: Service is down | Last test of a connection was unsuccessful. |
last(/PostgreSQL by ODBC/db.odbc.select[pgsql.ping,,"Database={$PG.DATABASE};{$PG.CONNSTRING.ODBC}"])=0 |High |
Name | Description | Type | Key and additional info |
---|---|---|---|
Replication discovery | Discovers replication lag metrics. |
Database monitor | db.odbc.select[pgsql.replication.process.discovery,,"Database={$PG.DATABASE};{$PG.CONNSTRING.ODBC}"] Preprocessing
|
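The replication discovery rule returns one row per WAL sender, each carrying an {#APPLICATION_NAME} LLD macro, and item prototypes are created only for rows that pass the LLD filter regex. A rough sketch under those assumptions (the macro and default pattern mirror {$PG.LLD.FILTER.APPLICATION} with default ".+"; the filtering code itself is illustrative, not Zabbix internals):

```python
import re

# Illustrative model of low-level discovery filtering: keep only rows
# whose {#APPLICATION_NAME} matches the configured regex.

def filter_discovery(rows: list[dict], pattern: str = ".+") -> list[dict]:
    rx = re.compile(pattern)
    return [r for r in rows if rx.search(r["{#APPLICATION_NAME}"])]

rows = [{"{#APPLICATION_NAME}": "walreceiver"},
        {"{#APPLICATION_NAME}": "standby_eu"}]
print(len(filter_discovery(rows)))               # 2 (default ".+" keeps all)
print(len(filter_discovery(rows, "standby")))    # 1
```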
Name | Description | Type | Key and additional info |
---|---|---|---|
Application [{#APPLICATION_NAME}]: Get replication | Collect metrics from "pg_stat_replication" about the application "{#APPLICATION_NAME}" that is connected to this WAL sender. The view contains information about the WAL sender process, with statistics about replication to that sender's connected standby server. |
Dependent item | pgsql.replication.get_metrics["{#APPLICATION_NAME}"] Preprocessing
|
Application [{#APPLICATION_NAME}]: Replication flush lag | Time elapsed between flushing recent WAL locally and receiving notification that this standby server has written and flushed it (but not yet applied it). This can be used to gauge the delay that synchronous_commit level on incurred while committing if this server was configured as a synchronous standby. |
Dependent item | pgsql.replication.process.flush_lag["{#APPLICATION_NAME}"] Preprocessing
|
Application [{#APPLICATION_NAME}]: Replication replay lag | Time elapsed between flushing recent WAL locally and receiving notification that this standby server has written, flushed and applied it. This can be used to gauge the delay that synchronous_commit level remote_apply incurred while committing if this server was configured as a synchronous standby. |
Dependent item | pgsql.replication.process.replay_lag["{#APPLICATION_NAME}"] Preprocessing
|
Application [{#APPLICATION_NAME}]: Replication write lag | Time elapsed between flushing recent WAL locally and receiving notification that this standby server has written it (but not yet flushed it or applied it). This can be used to gauge the delay that synchronous_commit level remote_write incurred while committing if this server was configured as a synchronous standby. |
Dependent item | pgsql.replication.process.write_lag["{#APPLICATION_NAME}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Database discovery | Discovers databases (DB) in the database management system (DBMS), except: - templates; - default "postgres" DB; - DBs that do not allow connections. |
Database monitor | db.odbc.select[pgsql.db.discovery,,"Database={$PG.DATABASE};{$PG.CONNSTRING.ODBC}"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
DB [{#DBNAME}]: Get dbstat | Get dbstat metrics for database "{#DBNAME}". |
Dependent item | pgsql.dbstat.get_metrics["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Get locks | Get locks metrics for database "{#DBNAME}". |
Dependent item | pgsql.locks.get_metrics["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Get queries | Get queries metrics for database "{#DBNAME}". |
Dependent item | pgsql.queries.get_metrics["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Database age | Database age. |
Database monitor | db.odbc.select[pgsql.db.age,,"Database={#DBNAME};{$PG.CONNSTRING.ODBC}"] |
DB [{#DBNAME}]: Bloating tables | Number of bloating tables. |
Database monitor | db.odbc.select[pgsql.db.bloating_tables,,"Database={#DBNAME};{$PG.CONNSTRING.ODBC}"] |
DB [{#DBNAME}]: Database size | Database size. |
Database monitor | db.odbc.select[pgsql.db.size,,"Database={#DBNAME};{$PG.CONNSTRING.ODBC}"] |
DB [{#DBNAME}]: Blocks hit per second | Total number of times per second disk blocks were found already in the buffer cache, so that a read was not necessary. |
Dependent item | pgsql.dbstat.blks_hit.rate["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Disk blocks read per second | Total number of disk blocks read per second in this database. |
Dependent item | pgsql.dbstat.blks_read.rate["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Detected conflicts per second | Total number of queries canceled due to conflicts with recovery in this database per second. |
Dependent item | pgsql.dbstat.conflicts.rate["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Detected deadlocks per second | Total number of detected deadlocks in this database per second. |
Dependent item | pgsql.dbstat.deadlocks.rate["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Temp_bytes written per second | Total amount of data written to temporary files by queries in this database. |
Dependent item | pgsql.dbstat.temp_bytes.rate["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Temp_files created per second | Total number of temporary files created by queries in this database. |
Dependent item | pgsql.dbstat.temp_files.rate["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Tuples deleted per second | Total number of rows deleted by queries in this database per second. |
Dependent item | pgsql.dbstat.tup_deleted.rate["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Tuples fetched per second | Total number of rows fetched by queries in this database per second. |
Dependent item | pgsql.dbstat.tup_fetched.rate["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Tuples inserted per second | Total number of rows inserted by queries in this database per second. |
Dependent item | pgsql.dbstat.tup_inserted.rate["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Tuples returned per second | Number of rows returned by queries in this database per second. |
Dependent item | pgsql.dbstat.tup_returned.rate["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Tuples updated per second | Total number of rows updated by queries in this database per second. |
Dependent item | pgsql.dbstat.tup_updated.rate["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Commits per second | Number of transactions in this database that have been committed per second. |
Dependent item | pgsql.dbstat.xact_commit.rate["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Rollbacks per second | Total number of transactions in this database that have been rolled back. |
Dependent item | pgsql.dbstat.xact_rollback.rate["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Backends connected | Number of backends currently connected to this database. |
Dependent item | pgsql.dbstat.numbackends["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Disk blocks read time per second | Time spent reading data file blocks by backends per second. |
Dependent item | pgsql.dbstat.blk_read_time.rate["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Disk blocks write time per second | Time spent writing data file blocks by backends per second. |
Dependent item | pgsql.dbstat.blk_write_time.rate["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Num of accessexclusive locks | Number of accessexclusive locks for this database. |
Dependent item | pgsql.locks.accessexclusive["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Num of accessshare locks | Number of accessshare locks for this database. |
Dependent item | pgsql.locks.accessshare["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Num of exclusive locks | Number of exclusive locks for this database. |
Dependent item | pgsql.locks.exclusive["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Num of rowexclusive locks | Number of rowexclusive locks for this database. |
Dependent item | pgsql.locks.rowexclusive["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Num of rowshare locks | Number of rowshare locks for this database. |
Dependent item | pgsql.locks.rowshare["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Num of sharerowexclusive locks | Total number of sharerowexclusive locks for this database. |
Dependent item | pgsql.locks.sharerowexclusive["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Num of shareupdateexclusive locks | Number of shareupdateexclusive locks for this database. |
Dependent item | pgsql.locks.shareupdateexclusive["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Num of share locks | Number of share locks for this database. |
Dependent item | pgsql.locks.share["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Num of locks total | Total number of locks in this database. |
Dependent item | pgsql.locks.total["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Queries max maintenance time | Max maintenance query time for this database. |
Dependent item | pgsql.queries.mro.time_max["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Queries max query time | Max query time for this database. |
Dependent item | pgsql.queries.query.time_max["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Queries max transaction time | Max transaction query time for this database. |
Dependent item | pgsql.queries.tx.time_max["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Queries slow maintenance count | Slow maintenance query count for this database. |
Dependent item | pgsql.queries.mro.slow_count["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Queries slow query count | Slow query count for this database. |
Dependent item | pgsql.queries.query.slow_count["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Queries slow transaction count | Slow transaction query count for this database. |
Dependent item | pgsql.queries.tx.slow_count["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Queries sum maintenance time | Sum maintenance query time for this database. |
Dependent item | pgsql.queries.mro.time_sum["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Queries sum query time | Sum query time for this database. |
Dependent item | pgsql.queries.query.time_sum["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Queries sum transaction time | Sum transaction query time for this database. |
Dependent item | pgsql.queries.tx.time_sum["{#DBNAME}"] Preprocessing
|
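The many ".rate" dependent items in this table turn cumulative pg_stat_database counters into per-second values. Assuming the standard "Change per second" preprocessing step, the conversion looks like this (a sketch of the arithmetic, not the template's preprocessing pipeline):

```python
# Change-per-second from two (timestamp, counter) samples; names are
# illustrative. Counters in pg_stat_* views only grow, so the delta is
# non-negative between stats resets.

def change_per_second(prev: tuple[int, int], cur: tuple[int, int]) -> float:
    (t0, v0), (t1, v1) = prev, cur
    if t1 <= t0:
        raise ValueError("samples must be time-ordered")
    return (v1 - v0) / (t1 - t0)

# 600 commits observed over a 60-second polling interval -> 10 tx/s
print(change_per_second((1000, 5_000), (1060, 5_600)))  # 10.0
```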
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
DB [{#DBNAME}]: Too many recovery conflicts | The primary and standby servers are in many ways loosely connected. Actions on the primary will have an effect on the standby. As a result, there is potential for negative interactions or conflicts between them. |
min(/PostgreSQL by ODBC/pgsql.dbstat.conflicts.rate["{#DBNAME}"],5m) > {$PG.CONFLICTS.MAX.WARN:"{#DBNAME}"} |Average |
||
DB [{#DBNAME}]: Deadlock occurred | Number of deadlocks detected per second exceeds {$PG.DEADLOCKS.MAX.WARN:"{#DBNAME}"} for 5m. |
min(/PostgreSQL by ODBC/pgsql.dbstat.deadlocks.rate["{#DBNAME}"],5m) > {$PG.DEADLOCKS.MAX.WARN:"{#DBNAME}"} |High |
||
DB [{#DBNAME}]: Too many slow queries | The number of detected slow queries exceeds the limit of {$PG.SLOW_QUERIES.MAX.WARN:"{#DBNAME}"}. |
min(/PostgreSQL by ODBC/pgsql.queries.query.slow_count["{#DBNAME}"],5m)>{$PG.SLOW_QUERIES.MAX.WARN:"{#DBNAME}"} |Warning |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for the deployment of PostgreSQL monitoring by Zabbix via Zabbix agent 2 and uses a loadable plugin to run SQL queries.
Zabbix version: 6.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Deploy Zabbix agent 2 with the PostgreSQL plugin. Starting with Zabbix versions 6.0.10 / 6.2.4 / 6.4, PostgreSQL metrics are moved to a loadable plugin and require the installation of a separate package or compilation of the plugin from sources.
Create the PostgreSQL user for monitoring (<password> at your discretion) and inherit permissions from the default role pg_monitor:
CREATE USER zbx_monitor WITH PASSWORD '<PASSWORD>' INHERIT;
GRANT pg_monitor TO zbx_monitor;
Edit the pg_hba.conf configuration file to allow connections for the user zbx_monitor. For example, you could add one of the following rows to allow local TCP connections from the same host:
# TYPE DATABASE USER ADDRESS METHOD
host all zbx_monitor localhost trust
host all zbx_monitor 127.0.0.1/32 md5
host all zbx_monitor ::1/128 scram-sha-256
For more information, please read the PostgreSQL documentation: https://www.postgresql.org/docs/current/auth-pg-hba-conf.html
Set the {$PG.CONNSTRING.AGENT2} macro as a URI, such as <protocol(host:port)>, or specify a named session - <sessionname>.
Note: if you want to use SSL/TLS encryption to protect communications with the remote PostgreSQL instance, a named session must be used. In that case, the instance URI should be specified in the Plugins.PostgreSQL.Sessions.*.Uri parameter in the PostgreSQL plugin configuration files alongside all the encryption parameters (type, certificate/key file paths if needed, etc.).
You can check the PostgreSQL plugin documentation
for details about agent plugin parameters and named sessions.
Also, it is assumed that you set up the PostgreSQL instance to work in the desired encryption mode. Check the PostgreSQL documentation
for details.
Note: plugin TLS certificate validation relies on checking the Subject Alternative Names (SAN) instead of the Common Name (CN), check the cryptography package documentation
for details.
For example, to enable required encryption in transport mode without identity checks you could create the file /etc/zabbix/zabbix_agent2.d/postgresql_myconn.conf
with the following configuration for the named session myconn
(replace <instanceip>
with the address of the PostgreSQL instance):
Plugins.PostgreSQL.Sessions.myconn.Uri=tcp://<instanceip>:5432
Plugins.PostgreSQL.Sessions.myconn.TLSConnect=required
Then set the {$PG.CONNSTRING.AGENT2}
macro to myconn
to use this named session.
Set the password for the monitoring user in the {$PG.PASSWORD} macro.
Name | Description | Default |
---|---|---|
{$PG.PASSWORD} | PostgreSQL user password. |
<Put the password here> |
{$PG.CONNSTRING.AGENT2} | URI or named session of the PostgreSQL instance. |
tcp://localhost:5432 |
{$PG.USER} | PostgreSQL username. |
zbx_monitor |
{$PG.LLD.FILTER.DBNAME} | Filter of discoverable databases. |
.+ |
{$PG.CONN_TOTAL_PCT.MAX.WARN} | Maximum percentage of current connections for trigger expression. |
90 |
{$PG.DATABASE} | Default PostgreSQL database for the connection. |
postgres |
{$PG.DEADLOCKS.MAX.WARN} | Maximum number of detected deadlocks for trigger expression. |
0 |
{$PG.LLD.FILTER.APPLICATION} | Filter of discoverable applications. |
.+ |
{$PG.CONFLICTS.MAX.WARN} | Maximum number of recovery conflicts for trigger expression. |
0 |
{$PG.QUERY_ETIME.MAX.WARN} | Execution time limit for count of slow queries. |
30 |
{$PG.SLOW_QUERIES.MAX.WARN} | Slow queries count threshold for a trigger. |
5 |
Name | Description | Type | Key and additional info |
---|---|---|---|
PostgreSQL: Get bgwriter | Collect all metrics from pg_stat_bgwriter: https://www.postgresql.org/docs/current/monitoring-stats.html#PG-STAT-BGWRITER-VIEW |
Zabbix agent | pgsql.bgwriter["{$PG.CONNSTRING.AGENT2}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"] |
PostgreSQL: Get archive | Collect archive status metrics. |
Zabbix agent | pgsql.archive["{$PG.CONNSTRING.AGENT2}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"] |
PostgreSQL: Get dbstat | Collect all metrics from pg_stat_database per database: https://www.postgresql.org/docs/current/monitoring-stats.html#PG-STAT-DATABASE-VIEW |
Zabbix agent | pgsql.dbstat["{$PG.CONNSTRING.AGENT2}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"] |
PostgreSQL: Get dbstat sum | Collect all metrics from pg_stat_database as sums for all databases: https://www.postgresql.org/docs/current/monitoring-stats.html#PG-STAT-DATABASE-VIEW |
Zabbix agent | pgsql.dbstat.sum["{$PG.CONNSTRING.AGENT2}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"] |
PostgreSQL: Get connections sum | Collect all metrics from pg_stat_activity: https://www.postgresql.org/docs/current/monitoring-stats.html#PG-STAT-ACTIVITY-VIEW |
Zabbix agent | pgsql.connections["{$PG.CONNSTRING.AGENT2}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"] |
PostgreSQL: Get WAL | Collect write-ahead log (WAL) metrics. |
Zabbix agent | pgsql.wal.stat["{$PG.CONNSTRING.AGENT2}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"] |
PostgreSQL: Get locks | Collect all metrics from pg_locks per database: https://www.postgresql.org/docs/current/explicit-locking.html#LOCKING-TABLES |
Zabbix agent | pgsql.locks["{$PG.CONNSTRING.AGENT2}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"] |
PostgreSQL: Custom queries | Execute custom queries from *.sql files (see the Plugins.Postgres.CustomQueriesPath option in the agent configuration). |
Zabbix agent | pgsql.custom.query["{$PG.CONNSTRING.AGENT2}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}",""] |
PostgreSQL: Get replication | Collect metrics from pg_stat_replication, which contains information about the WAL sender process, with statistics about replication to that sender's connected standby server. |
Zabbix agent | pgsql.replication.process["{$PG.CONNSTRING.AGENT2}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"] |
PostgreSQL: Get queries | Collect all metrics by query execution time. |
Zabbix agent | pgsql.queries["{$PG.CONNSTRING.AGENT2}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}","{$PG.QUERY_ETIME.MAX.WARN}"] |
PostgreSQL: Version | PostgreSQL version. |
Zabbix agent | pgsql.version["{$PG.CONNSTRING.AGENT2}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"] Preprocessing
|
WAL: Bytes written | WAL write, in bytes. |
Dependent item | pgsql.wal.write Preprocessing
|
WAL: Bytes received | WAL receive, in bytes. |
Dependent item | pgsql.wal.receive Preprocessing
|
WAL: Segments count | Number of WAL segments. |
Dependent item | pgsql.wal.count Preprocessing
|
Bgwriter: Buffers allocated per second | Number of buffers allocated per second. |
Dependent item | pgsql.bgwriter.buffers_alloc.rate Preprocessing
|
Bgwriter: Buffers written directly by a backend per second | Number of buffers written directly by a backend per second. |
Dependent item | pgsql.bgwriter.buffers_backend.rate Preprocessing
|
Bgwriter: Number of bgwriter cleaning scan stopped per second | Number of times the background writer stopped a cleaning scan because it had written too many buffers per second. |
Dependent item | pgsql.bgwriter.maxwritten_clean.rate Preprocessing
|
Bgwriter: Times a backend executed its own fsync per second | Number of times a backend had to execute its own fsync call per second (normally the background writer handles those even when the backend does its own write). |
Dependent item | pgsql.bgwriter.buffers_backend_fsync.rate Preprocessing
|
Checkpoint: Buffers written by the background writer per second | Number of buffers written by the background writer per second. |
Dependent item | pgsql.bgwriter.buffers_clean.rate Preprocessing
|
Checkpoint: Buffers written during checkpoints per second | Number of buffers written during checkpoints per second. |
Dependent item | pgsql.bgwriter.buffers_checkpoint.rate Preprocessing
|
Checkpoint: Scheduled per second | Number of scheduled checkpoints that have been performed per second. |
Dependent item | pgsql.bgwriter.checkpoints_timed.rate Preprocessing
|
Checkpoint: Requested per second | Number of requested checkpoints that have been performed per second. |
Dependent item | pgsql.bgwriter.checkpoints_req.rate Preprocessing
|
Checkpoint: Checkpoint write time per second | Total amount of time per second that has been spent in the portion of checkpoint processing where files are written to disk. |
Dependent item | pgsql.bgwriter.checkpoint_write_time.rate Preprocessing
|
Checkpoint: Checkpoint sync time per second | Total amount of time per second that has been spent in the portion of checkpoint processing where files are synchronized to disk. |
Dependent item | pgsql.bgwriter.checkpoint_sync_time.rate Preprocessing
|
Archive: Count of archived files | Count of archived files. |
Dependent item | pgsql.archive.count_archived_files Preprocessing
|
Archive: Count of failed attempts to archive files | Count of failed attempts to archive files. |
Dependent item | pgsql.archive.failed_trying_to_archive Preprocessing
|
Archive: Count of files in archive_status need to archive | Count of files to archive. |
Dependent item | pgsql.archive.count_files_to_archive Preprocessing
|
Archive: Size of files need to archive | Size of files to archive. |
Dependent item | pgsql.archive.size_files_to_archive Preprocessing
|
Dbstat: Blocks read time | Time spent reading data file blocks by backends. |
Dependent item | pgsql.dbstat.sum.blk_read_time Preprocessing
|
Dbstat: Blocks write time | Time spent writing data file blocks by backends. |
Dependent item | pgsql.dbstat.sum.blk_write_time Preprocessing
|
Dbstat: Checksum failures per second | Number of data page checksum failures detected per second (including failures on shared objects), or NULL if data checksums are not enabled. This metric is available since PostgreSQL 12. |
Dependent item | pgsql.dbstat.sum.checksum_failures.rate Preprocessing
|
Dbstat: Committed transactions per second | Number of transactions that have been committed per second. |
Dependent item | pgsql.dbstat.sum.xact_commit.rate Preprocessing
|
Dbstat: Conflicts per second | Number of queries canceled per second due to conflicts with recovery (conflicts occur only on standby servers; see pg_stat_database_conflicts for details). |
Dependent item | pgsql.dbstat.sum.conflicts.rate Preprocessing
|
Dbstat: Deadlocks per second | Number of deadlocks detected per second. |
Dependent item | pgsql.dbstat.sum.deadlocks.rate Preprocessing
|
Dbstat: Disk blocks read per second | Number of disk blocks read per second. |
Dependent item | pgsql.dbstat.sum.blks_read.rate Preprocessing
|
Dbstat: Hit blocks read per second | Number of times per second disk blocks were found already in the buffer cache. |
Dependent item | pgsql.dbstat.sum.blks_hit.rate Preprocessing
|
Dbstat: Number temp bytes per second | Total amount of data written per second to temporary files by queries. |
Dependent item | pgsql.dbstat.sum.temp_bytes.rate Preprocessing
|
Dbstat: Number temp files per second | Number of temporary files created by queries per second. |
Dependent item | pgsql.dbstat.sum.temp_files.rate Preprocessing
|
Dbstat: Rolled back transactions per second | Number of transactions that have been rolled back per second. |
Dependent item | pgsql.dbstat.sum.xact_rollback.rate Preprocessing
|
Dbstat: Rows deleted per second | Number of rows deleted by queries per second. |
Dependent item | pgsql.dbstat.sum.tup_deleted.rate Preprocessing
|
Dbstat: Rows fetched per second | Number of rows fetched by queries per second. |
Dependent item | pgsql.dbstat.sum.tup_fetched.rate Preprocessing
|
Dbstat: Rows inserted per second | Number of rows inserted by queries per second. |
Dependent item | pgsql.dbstat.sum.tup_inserted.rate Preprocessing
|
Dbstat: Rows returned per second | Number of rows returned by queries per second. |
Dependent item | pgsql.dbstat.sum.tup_returned.rate Preprocessing
|
Dbstat: Rows updated per second | Number of rows updated by queries per second. |
Dependent item | pgsql.dbstat.sum.tup_updated.rate Preprocessing
|
Dbstat: Backends connected | Number of connected backends. |
Dependent item | pgsql.dbstat.sum.numbackends Preprocessing
|
Connections sum: Active | Total number of connections executing a query. |
Dependent item | pgsql.connections.sum.active Preprocessing
|
Connections sum: Fastpath function call | Total number of connections executing a fast-path function. |
Dependent item | pgsql.connections.sum.fastpath_function_call Preprocessing
|
Connections sum: Idle | Total number of connections waiting for a new client command. |
Dependent item | pgsql.connections.sum.idle Preprocessing
|
Connections sum: Idle in transaction | Total number of connections in a transaction state but not executing a query. |
Dependent item | pgsql.connections.sum.idle_in_transaction Preprocessing
|
Connections sum: Prepared | Total number of prepared transactions: https://www.postgresql.org/docs/current/sql-prepare-transaction.html |
Dependent item | pgsql.connections.sum.prepared Preprocessing
|
Connections sum: Total | Total number of connections. |
Dependent item | pgsql.connections.sum.total Preprocessing
|
Connections sum: Total, % | Total number of connections, in percentage. |
Dependent item | pgsql.connections.sum.total_pct Preprocessing
|
Connections sum: Waiting | Total number of waiting connections: https://www.postgresql.org/docs/current/monitoring-stats.html#WAIT-EVENT-TABLE |
Dependent item | pgsql.connections.sum.waiting Preprocessing
|
Connections sum: Idle in transaction (aborted) | Total number of connections in a transaction state but not executing a query, and where one of the statements in the transaction caused an error. |
Dependent item | pgsql.connections.sum.idle_in_transaction_aborted Preprocessing
|
Connections sum: Disabled | Total number of disabled connections. |
Dependent item | pgsql.connections.sum.disabled Preprocessing
|
PostgreSQL: Age of oldest xid | Age of oldest xid. |
Zabbix agent | pgsql.oldest.xid["{$PG.CONNSTRING.AGENT2}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"] |
PostgreSQL: Count of autovacuum workers | Number of autovacuum workers. |
Zabbix agent | pgsql.autovacuum.count["{$PG.CONNSTRING.AGENT2}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"] |
PostgreSQL: Cache hit ratio, % | Cache hit ratio. |
Calculated | pgsql.cache.hit |
PostgreSQL: Uptime | Time since the server started. |
Zabbix agent | pgsql.uptime["{$PG.CONNSTRING.AGENT2}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"] |
PostgreSQL: Replication: Lag in bytes | Replication lag with master, in bytes. |
Zabbix agent | pgsql.replication.lag.b["{$PG.CONNSTRING.AGENT2}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"] |
PostgreSQL: Replication: Lag in seconds | Replication lag with master, in seconds. |
Zabbix agent | pgsql.replication.lag.sec["{$PG.CONNSTRING.AGENT2}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"] |
PostgreSQL: Replication: Recovery role | Replication role: 1 — recovery is still in progress (standby mode), 0 — master mode. |
Zabbix agent | pgsql.replication.recovery_role["{$PG.CONNSTRING.AGENT2}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"] |
PostgreSQL: Replication: Standby count | Number of standby servers. |
Zabbix agent | pgsql.replication.count["{$PG.CONNSTRING.AGENT2}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"] |
PostgreSQL: Replication: Status | Replication status: 0 — streaming is down, 1 — streaming is up, 2 — master mode. |
Zabbix agent | pgsql.replication.status["{$PG.CONNSTRING.AGENT2}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"] |
PostgreSQL: Ping | Used to test a connection to see if it is alive. It is set to 0 if the query is unsuccessful. |
Zabbix agent | pgsql.ping["{$PG.CONNSTRING.AGENT2}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
PostgreSQL: Version has changed | PostgreSQL version has changed. | last(/PostgreSQL by Zabbix agent 2/pgsql.version["{$PG.CONNSTRING.AGENT2}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"],#1)<>last(/PostgreSQL by Zabbix agent 2/pgsql.version["{$PG.CONNSTRING.AGENT2}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"],#2) and length(last(/PostgreSQL by Zabbix agent 2/pgsql.version["{$PG.CONNSTRING.AGENT2}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"]))>0 |Info |
|||
Dbstat: Checksum failures detected | Data page checksum failures were detected on that DB instance. |
last(/PostgreSQL by Zabbix agent 2/pgsql.dbstat.sum.checksum_failures.rate)>0 |Average |
||
PostgreSQL: Total number of connections is too high | Total number of current connections exceeds the limit of {$PG.CONN_TOTAL_PCT.MAX.WARN}% out of the maximum number of concurrent connections to the database server (the "max_connections" setting). |
min(/PostgreSQL by Zabbix agent 2/pgsql.connections.sum.total_pct,5m) > {$PG.CONN_TOTAL_PCT.MAX.WARN} |Average |
||
PostgreSQL: Oldest xid is too big | last(/PostgreSQL by Zabbix agent 2/pgsql.oldest.xid["{$PG.CONNSTRING.AGENT2}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"]) > 18000000 |Average |
|||
PostgreSQL: Service has been restarted | PostgreSQL uptime is less than 10 minutes. |
last(/PostgreSQL by Zabbix agent 2/pgsql.uptime["{$PG.CONNSTRING.AGENT2}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"]) < 10m |Average |
||
PostgreSQL: Service is down | Last test of a connection was unsuccessful. |
last(/PostgreSQL by Zabbix agent 2/pgsql.ping["{$PG.CONNSTRING.AGENT2}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"])=0 |High |
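The "Oldest xid is too big" trigger above compares the age of the oldest transaction ID against a fixed threshold of 18,000,000. A minimal sketch of that comparison (the function name is hypothetical, for illustration only):

```python
# Sketch of the "Oldest xid is too big" trigger logic: the value of
# pgsql.oldest.xid is compared against the fixed threshold from the
# trigger expression above.
XID_AGE_THRESHOLD = 18_000_000

def oldest_xid_alert(xid_age: int, threshold: int = XID_AGE_THRESHOLD) -> bool:
    # Fires when autovacuum freezing is falling behind and the oldest
    # transaction ID age keeps growing toward wraparound.
    return xid_age > threshold

print(oldest_xid_alert(20_000_000))  # True
```

A growing age that never resets usually means autovacuum cannot freeze tuples in some table; the trigger gives early warning long before the hard wraparound limit.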
Name | Description | Type | Key and additional info |
---|---|---|---|
Replication discovery | Discovers replication lag metrics. |
Zabbix agent | pgsql.replication.process.discovery["{$PG.CONNSTRING.AGENT2}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Application [{#APPLICATION_NAME}]: Get replication | Collect metrics from pg_stat_replication for the application "{#APPLICATION_NAME}" connected to this WAL sender. The view contains one row per WAL sender process, showing statistics about replication to that sender's connected standby server. |
Dependent item | pgsql.replication.get_metrics["{#APPLICATION_NAME}"] Preprocessing
|
Application [{#APPLICATION_NAME}]: Replication flush lag | Dependent item | pgsql.replication.process.flush_lag["{#APPLICATION_NAME}"] Preprocessing
|
|
Application [{#APPLICATION_NAME}]: Replication replay lag | Dependent item | pgsql.replication.process.replay_lag["{#APPLICATION_NAME}"] Preprocessing
|
|
Application [{#APPLICATION_NAME}]: Replication write lag | Dependent item | pgsql.replication.process.write_lag["{#APPLICATION_NAME}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Database discovery | Discovers databases (DB) in the database management system (DBMS), except: - templates; - DBs that do not allow connections. |
Zabbix agent | pgsql.db.discovery["{$PG.CONNSTRING.AGENT2}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
DB [{#DBNAME}]: Get dbstat | Get dbstat metrics for database "{#DBNAME}". |
Dependent item | pgsql.dbstat.get_metrics["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Get locks | Get locks metrics for database "{#DBNAME}". |
Dependent item | pgsql.locks.get_metrics["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Get queries | Get queries metrics for database "{#DBNAME}". |
Dependent item | pgsql.queries.get_metrics["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Database age | Database age. |
Zabbix agent | pgsql.db.age["{$PG.CONNSTRING.AGENT2}","{$PG.USER}","{$PG.PASSWORD}","{#DBNAME}"] |
DB [{#DBNAME}]: Bloating tables | Number of bloating tables. |
Zabbix agent | pgsql.db.bloating_tables["{$PG.CONNSTRING.AGENT2}","{$PG.USER}","{$PG.PASSWORD}","{#DBNAME}"] |
DB [{#DBNAME}]: Database size | Database size. |
Zabbix agent | pgsql.db.size["{$PG.CONNSTRING.AGENT2}","{$PG.USER}","{$PG.PASSWORD}","{#DBNAME}"] |
DB [{#DBNAME}]: Blocks hit per second | Total number of times per second disk blocks were found already in the buffer cache, so that a read was not necessary. |
Dependent item | pgsql.dbstat.blks_hit.rate["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Disk blocks read per second | Total number of disk blocks read per second in this database. |
Dependent item | pgsql.dbstat.blks_read.rate["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Detected conflicts per second | Total number of queries canceled due to conflicts with recovery in this database per second. |
Dependent item | pgsql.dbstat.conflicts.rate["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Detected deadlocks per second | Total number of detected deadlocks in this database per second. |
Dependent item | pgsql.dbstat.deadlocks.rate["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Temp_bytes written per second | Total amount of data written to temporary files by queries in this database. |
Dependent item | pgsql.dbstat.temp_bytes.rate["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Temp_files created per second | Total number of temporary files created by queries in this database. |
Dependent item | pgsql.dbstat.temp_files.rate["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Tuples deleted per second | Total number of rows deleted by queries in this database per second. |
Dependent item | pgsql.dbstat.tup_deleted.rate["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Tuples fetched per second | Total number of rows fetched by queries in this database per second. |
Dependent item | pgsql.dbstat.tup_fetched.rate["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Tuples inserted per second | Total number of rows inserted by queries in this database per second. |
Dependent item | pgsql.dbstat.tup_inserted.rate["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Tuples returned per second | Number of rows returned by queries in this database per second. |
Dependent item | pgsql.dbstat.tup_returned.rate["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Tuples updated per second | Total number of rows updated by queries in this database per second. |
Dependent item | pgsql.dbstat.tup_updated.rate["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Commits per second | Number of transactions in this database that have been committed per second. |
Dependent item | pgsql.dbstat.xact_commit.rate["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Rollbacks per second | Total number of transactions in this database that have been rolled back. |
Dependent item | pgsql.dbstat.xact_rollback.rate["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Backends connected | Number of backends currently connected to this database. |
Dependent item | pgsql.dbstat.numbackends["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Checksum failures | Number of data page checksum failures detected in this database. |
Dependent item | pgsql.dbstat.checksum_failures.rate["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Disk blocks read time per second | Time spent reading data file blocks by backends per second. |
Dependent item | pgsql.dbstat.blk_read_time.rate["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Disk blocks write time per second | Time spent writing data file blocks by backends per second. |
Dependent item | pgsql.dbstat.blk_write_time.rate["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Num of accessexclusive locks | Number of accessexclusive locks for this database. |
Dependent item | pgsql.locks.accessexclusive["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Num of accessshare locks | Number of accessshare locks for this database. |
Dependent item | pgsql.locks.accessshare["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Num of exclusive locks | Number of exclusive locks for this database. |
Dependent item | pgsql.locks.exclusive["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Num of rowexclusive locks | Number of rowexclusive locks for this database. |
Dependent item | pgsql.locks.rowexclusive["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Num of rowshare locks | Number of rowshare locks for this database. |
Dependent item | pgsql.locks.rowshare["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Num of sharerowexclusive locks | Total number of sharerowexclusive locks for this database. |
Dependent item | pgsql.locks.sharerowexclusive["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Num of shareupdateexclusive locks | Number of shareupdateexclusive locks for this database. |
Dependent item | pgsql.locks.shareupdateexclusive["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Num of share locks | Number of share locks for this database. |
Dependent item | pgsql.locks.share["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Num of locks total | Total number of locks in this database. |
Dependent item | pgsql.locks.total["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Queries max maintenance time | Max maintenance query time for this database. |
Dependent item | pgsql.queries.mro.time_max["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Queries max query time | Max query time for this database. |
Dependent item | pgsql.queries.query.time_max["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Queries max transaction time | Max transaction query time for this database. |
Dependent item | pgsql.queries.tx.time_max["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Queries slow maintenance count | Slow maintenance query count for this database. |
Dependent item | pgsql.queries.mro.slow_count["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Queries slow query count | Slow query count for this database. |
Dependent item | pgsql.queries.query.slow_count["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Queries slow transaction count | Slow transaction query count for this database. |
Dependent item | pgsql.queries.tx.slow_count["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Queries sum maintenance time | Sum maintenance query time for this database. |
Dependent item | pgsql.queries.mro.time_sum["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Queries sum query time | Sum query time for this database. |
Dependent item | pgsql.queries.query.time_sum["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Queries sum transaction time | Sum transaction query time for this database. |
Dependent item | pgsql.queries.tx.time_sum["{#DBNAME}"] Preprocessing
|
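Most of the per-database counters above (tuples, commits, deadlocks, temp files, and so on) come from cumulative pg_stat_database columns and are converted to rates by the "Change per second" preprocessing step shown in the item rows. A minimal sketch of that conversion (the helper name is hypothetical):

```python
# Minimal sketch of Zabbix "Change per second" preprocessing, which turns
# a cumulative counter (e.g. xact_commit from pg_stat_database) into a rate.
def change_per_second(prev_value, prev_ts, value, ts):
    # Delta of the counter divided by the elapsed time between two samples;
    # a decreasing value (counter reset) or non-increasing clock yields no sample.
    if ts <= prev_ts or value < prev_value:
        return None
    return (value - prev_value) / (ts - prev_ts)

print(change_per_second(1000, 0.0, 1600, 60.0))  # 10.0
```

This is why the dependent items show per-second values even though the underlying PostgreSQL statistics are monotonically increasing totals.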
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
DB [{#DBNAME}]: Too many recovery conflicts | The primary and standby servers are in many ways loosely connected. Actions on the primary will have an effect on the standby. As a result, there is potential for negative interactions or conflicts between them. |
min(/PostgreSQL by Zabbix agent 2/pgsql.dbstat.conflicts.rate["{#DBNAME}"],5m) > {$PG.CONFLICTS.MAX.WARN:"{#DBNAME}"} |Average |
||
DB [{#DBNAME}]: Deadlock occurred | Number of deadlocks detected per second exceeds {$PG.DEADLOCKS.MAX.WARN:"{#DBNAME}"} for 5m. |
min(/PostgreSQL by Zabbix agent 2/pgsql.dbstat.deadlocks.rate["{#DBNAME}"],5m) > {$PG.DEADLOCKS.MAX.WARN:"{#DBNAME}"} |High |
||
DB [{#DBNAME}]: Checksum failures detected | Data page checksum failures were detected on that database. |
last(/PostgreSQL by Zabbix agent 2/pgsql.dbstat.checksum_failures.rate["{#DBNAME}"])>0 |Average |
||
DB [{#DBNAME}]: Too many slow queries | The number of detected slow queries exceeds the limit of {$PG.SLOW_QUERIES.MAX.WARN:"{#DBNAME}"}. |
min(/PostgreSQL by Zabbix agent 2/pgsql.queries.query.slow_count["{#DBNAME}"],5m)>{$PG.SLOW_QUERIES.MAX.WARN:"{#DBNAME}"} |Warning |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for the deployment of PostgreSQL monitoring by Zabbix via Zabbix agent and uses user parameters to run SQL queries with the psql command-line tool.
Zabbix version: 6.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Note: This template requires the pg_isready and psql utilities to be installed on the same host as the Zabbix agent.
1. Create the PostgreSQL user for monitoring (<password> at your discretion) with proper access rights to your PostgreSQL instance.
For PostgreSQL version 10 and above:
CREATE USER zbx_monitor WITH PASSWORD '<PASSWORD>' INHERIT;
GRANT pg_monitor TO zbx_monitor;
For PostgreSQL version 9.6 and below:
CREATE USER zbx_monitor WITH PASSWORD '<PASSWORD>';
GRANT SELECT ON pg_stat_database TO zbx_monitor;
-- To collect WAL metrics, the user must have a `superuser` role.
ALTER USER zbx_monitor WITH SUPERUSER;
2. Copy the postgresql/ directory to the zabbix user home directory - /var/lib/zabbix/. The postgresql/ directory contains the files with SQL queries needed to obtain metrics from the PostgreSQL instance.
If the home directory of the zabbix user doesn't exist, create it first:
mkdir -m u=rwx,g=rwx,o= -p /var/lib/zabbix
chown zabbix:zabbix /var/lib/zabbix
3. Copy the template_db_postgresql.conf file, containing user parameters, to the Zabbix agent configuration directory /etc/zabbix/zabbix_agentd.d/ and restart the Zabbix agent service.
Note: if you want to use SSL/TLS encryption to protect communications with the remote PostgreSQL instance, you can modify the connection string in the user parameters. For example, to require encryption in transport mode without identity checks, append ?sslmode=require to the end of the connection string for all keys that use psql:
UserParameter=pgsql.bgwriter[*], psql -qtAX postgresql://"$3":"$4"@"$1":"$2"/"$5"?sslmode=require -f "/var/lib/zabbix/postgresql/pgsql.bgwriter.sql"
Consult the PostgreSQL documentation about protection modes and client connection parameters.
Also, it is assumed that you have set up the PostgreSQL instance to work in the desired encryption mode. Check the PostgreSQL documentation for details.
4. Edit the pg_hba.conf configuration file to allow connections for the user zbx_monitor. For example, you could add one of the following rows to allow local TCP connections from the same host:
# TYPE  DATABASE  USER         ADDRESS        METHOD
host    all       zbx_monitor  localhost      trust
host    all       zbx_monitor  127.0.0.1/32   md5
host    all       zbx_monitor  ::1/128        scram-sha-256
For more information, please read the PostgreSQL documentation: https://www.postgresql.org/docs/current/auth-pg-hba-conf.html
5. Specify the host name or IP address in the {$PG.HOST} macro. Adjust the port number with the {$PG.PORT} macro if needed.
6. Set the password that you specified in step 1 in the {$PG.PASSWORD} macro.
Name | Description | Default |
---|---|---|
{$PG.CACHE_HITRATIO.MIN.WARN} | Minimum cache hit ratio percentage for trigger expression. |
90 |
{$PG.CHECKPOINTS_REQ.MAX.WARN} | Maximum required checkpoint occurrences for trigger expression. |
5 |
{$PG.CONFLICTS.MAX.WARN} | Maximum number of recovery conflicts for trigger expression. |
0 |
{$PG.CONN_TOTAL_PCT.MAX.WARN} | Maximum percentage of current connections for trigger expression. |
90 |
{$PG.DATABASE} | Default PostgreSQL database for the connection. |
postgres |
{$PG.DEADLOCKS.MAX.WARN} | Maximum number of detected deadlocks for trigger expression. |
0 |
{$PG.FROZENXID_PCT_STOP.MIN.HIGH} | Minimum frozen XID before stop percentage for trigger expression. |
75 |
{$PG.HOST} | Hostname or IP of PostgreSQL host. |
localhost |
{$PG.LLD.FILTER.DBNAME} | Filter of discoverable databases. |
.+ |
{$PG.LOCKS.MAX.WARN} | Maximum number of locks for trigger expression. |
100 |
{$PG.PING_TIME.MAX.WARN} | Maximum time of connection response for trigger expression. |
1s |
{$PG.PORT} | PostgreSQL service port. |
5432 |
{$PG.QUERY_ETIME.MAX.WARN} | Execution time limit for count of slow queries. |
30 |
{$PG.REPL_LAG.MAX.WARN} | Maximum replication lag time for trigger expression. |
10m |
{$PG.SLOW_QUERIES.MAX.WARN} | Slow queries count threshold for a trigger. |
5 |
{$PG.USER} | PostgreSQL username. |
zbx_monitor |
{$PG.PASSWORD} | PostgreSQL user password. |
<Put the password here> |
Name | Description | Type | Key and additional info |
---|---|---|---|
Bgwriter: Buffers allocated per second | Number of buffers allocated per second. |
Dependent item | pgsql.bgwriter.buffers_alloc.rate Preprocessing
|
Bgwriter: Buffers written directly by a backend per second | Number of buffers written directly by a backend per second. |
Dependent item | pgsql.bgwriter.buffers_backend.rate Preprocessing
|
Bgwriter: Times a backend executed its own fsync per second | Number of times a backend had to execute its own fsync call per second (normally the background writer handles those even when the backend does its own write). |
Dependent item | pgsql.bgwriter.buffers_backend_fsync.rate Preprocessing
|
Checkpoint: Buffers written during checkpoints per second | Number of buffers written during checkpoints per second. |
Dependent item | pgsql.bgwriter.buffers_checkpoint.rate Preprocessing
|
Checkpoint: Buffers written by the background writer per second | Number of buffers written by the background writer per second. |
Dependent item | pgsql.bgwriter.buffers_clean.rate Preprocessing
|
Checkpoint: Requested per second | Number of requested checkpoints that have been performed per second. |
Dependent item | pgsql.bgwriter.checkpoints_req.rate Preprocessing
|
Checkpoint: Scheduled per second | Number of scheduled checkpoints that have been performed per second. |
Dependent item | pgsql.bgwriter.checkpoints_timed.rate Preprocessing
|
Checkpoint: Checkpoint sync time per second | Total amount of time per second that has been spent in the portion of checkpoint processing where files are synchronized to disk. |
Dependent item | pgsql.bgwriter.checkpoint_sync_time.rate Preprocessing
|
Checkpoint: Checkpoint write time per second | Total amount of time per second that has been spent in the portion of checkpoint processing where files are written to disk. |
Dependent item | pgsql.bgwriter.checkpoint_write_time.rate Preprocessing
|
Bgwriter: Number of bgwriter cleaning scan stopped per second | Number of times the background writer stopped a cleaning scan because it had written too many buffers per second. |
Dependent item | pgsql.bgwriter.maxwritten_clean.rate Preprocessing
|
PostgreSQL: Get bgwriter | Collect all metrics from pg_stat_bgwriter: https://www.postgresql.org/docs/current/monitoring-stats.html#PG-STAT-BGWRITER-VIEW |
Zabbix agent | pgsql.bgwriter["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"] |
PostgreSQL: Cache hit ratio, % | Cache hit ratio. |
Zabbix agent | pgsql.cache.hit["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"] |
PostgreSQL: Config hash | PostgreSQL configuration hash. |
Zabbix agent | pgsql.config.hash["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"] Preprocessing
|
Connections sum: Active | Total number of connections executing a query. |
Dependent item | pgsql.connections.sum.active Preprocessing
|
Connections sum: Idle | Total number of connections waiting for a new client command. |
Dependent item | pgsql.connections.sum.idle Preprocessing
|
Connections sum: Idle in transaction | Total number of connections in a transaction state but not executing a query. |
Dependent item | pgsql.connections.sum.idle_in_transaction Preprocessing
|
Connections sum: Prepared | Total number of prepared transactions: https://www.postgresql.org/docs/current/sql-prepare-transaction.html |
Dependent item | pgsql.connections.sum.prepared Preprocessing
|
Connections sum: Total | Total number of connections. |
Dependent item | pgsql.connections.sum.total Preprocessing
|
Connections sum: Total, % | Total number of connections, in percentage. |
Dependent item | pgsql.connections.sum.total_pct Preprocessing
|
Connections sum: Waiting | Total number of waiting connections: https://www.postgresql.org/docs/current/monitoring-stats.html#WAIT-EVENT-TABLE |
Dependent item | pgsql.connections.sum.waiting Preprocessing
|
PostgreSQL: Get connections sum | Collect all metrics from pg_stat_activity: https://www.postgresql.org/docs/current/monitoring-stats.html#PG-STAT-ACTIVITY-VIEW |
Zabbix agent | pgsql.connections.sum["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"] |
PostgreSQL: Get dbstat | Collect all metrics from pg_stat_database per database: https://www.postgresql.org/docs/current/monitoring-stats.html#PG-STAT-DATABASE-VIEW |
Zabbix agent | pgsql.dbstat["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"] |
PostgreSQL: Get locks | Collect all metrics from pg_locks per database: https://www.postgresql.org/docs/current/explicit-locking.html#LOCKING-TABLES |
Zabbix agent | pgsql.locks["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"] |
PostgreSQL: Ping time | Used to get the |
Zabbix agent | pgsql.ping.time["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"] Preprocessing
|
PostgreSQL: Ping | Used to test a connection to see if it is alive. It is set to 0 if the instance doesn't accept connections. |
Zabbix agent | pgsql.ping["{$PG.HOST}","{$PG.PORT}"] Preprocessing
|
PostgreSQL: Get queries | Collect all metrics by query execution time. |
Zabbix agent | pgsql.queries["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}","{$PG.QUERY_ETIME.MAX.WARN}"] |
PostgreSQL: Replication: Standby count | Number of standby servers. |
Zabbix agent | pgsql.replication.count["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"] |
PostgreSQL: Replication: Lag in seconds | Replication lag with master, in seconds. |
Zabbix agent | pgsql.replication.lag.sec["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"] |
PostgreSQL: Replication: Recovery role | Replication role: 1 — recovery is still in progress (standby mode), 0 — master mode. |
Zabbix agent | pgsql.replication.recovery_role["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"] |
PostgreSQL: Replication: Status | Replication status: 0 — streaming is down, 1 — streaming is up, 2 — master mode. |
Zabbix agent | pgsql.replication.status["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"] |
Transactions: Max active transaction time | Current max active transaction time. |
Dependent item | pgsql.transactions.active Preprocessing
|
Transactions: Max idle transaction time | Current max idle transaction time. |
Dependent item | pgsql.transactions.idle Preprocessing
|
Transactions: Max prepared transaction time | Current max prepared transaction time. |
Dependent item | pgsql.transactions.prepared Preprocessing
|
Transactions: Max waiting transaction time | Current max waiting transaction time. |
Dependent item | pgsql.transactions.waiting Preprocessing
|
PostgreSQL: Get transactions | Collect metrics by transaction execution time. |
Zabbix agent | pgsql.transactions["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"] |
PostgreSQL: Uptime | Time since the server started. |
Zabbix agent | pgsql.uptime["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"] |
PostgreSQL: Version | PostgreSQL version. |
Zabbix agent | pgsql.version["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"] Preprocessing
|
WAL: Segments count | Number of WAL segments. |
Dependent item | pgsql.wal.count Preprocessing
|
PostgreSQL: Get WAL | Collect write-ahead log (WAL) metrics. |
Zabbix agent | pgsql.wal.stat["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"] |
WAL: Bytes written | WAL write, in bytes. |
Dependent item | pgsql.wal.write Preprocessing
|
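The pgsql.cache.hit item above reports the buffer cache hit ratio, which is commonly computed from the pg_stat_database counters blks_hit and blks_read (the template's SQL file defines the exact query; this sketch only illustrates the arithmetic):

```python
# Sketch of the buffer cache hit ratio arithmetic, assuming the standard
# pg_stat_database counters: hits as a percentage of all block requests.
def cache_hit_ratio(blks_hit: int, blks_read: int) -> float:
    total = blks_hit + blks_read
    return 0.0 if total == 0 else 100.0 * blks_hit / total

print(cache_hit_ratio(9900, 100))  # 99.0
```

A value below the {$PG.CACHE_HITRATIO.MIN.WARN} macro (default 90) for 5 minutes fires the "Cache hit ratio too low" trigger listed below.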
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
PostgreSQL: Required checkpoints occur too frequently | Checkpoints are points in the sequence of transactions at which it is guaranteed that the heap and index data files have been updated with all information written before that checkpoint. At checkpoint time, all dirty data pages are flushed to disk and a special checkpoint record is written to the log file. |
last(/PostgreSQL by Zabbix agent/pgsql.bgwriter.checkpoints_req.rate) > {$PG.CHECKPOINTS_REQ.MAX.WARN} |Average |
||
PostgreSQL: Failed to get items | Zabbix has not received any data for items for the last 30 minutes. |
nodata(/PostgreSQL by Zabbix agent/pgsql.bgwriter["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"],30m) = 1 |Warning |
Depends on:
|
|
PostgreSQL: Cache hit ratio too low | Cache hit ratio is lower than {$PG.CACHE_HITRATIO.MIN.WARN} for 5m. |
max(/PostgreSQL by Zabbix agent/pgsql.cache.hit["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"],5m) < {$PG.CACHE_HITRATIO.MIN.WARN} |Warning |
||
PostgreSQL: Configuration has changed | PostgreSQL configuration has changed. |
last(/PostgreSQL by Zabbix agent/pgsql.config.hash["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"],#1)<>last(/PostgreSQL by Zabbix agent/pgsql.config.hash["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"],#2) and length(last(/PostgreSQL by Zabbix agent/pgsql.config.hash["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"]))>0 |Info |
||
PostgreSQL: Total number of connections is too high | Total number of current connections exceeds the limit of {$PG.CONN_TOTAL_PCT.MAX.WARN}% out of the maximum number of concurrent connections to the database server (the "max_connections" setting). |
min(/PostgreSQL by Zabbix agent/pgsql.connections.sum.total_pct,5m) > {$PG.CONN_TOTAL_PCT.MAX.WARN} |Average |
||
PostgreSQL: Response too long | Response is taking too long (over {$PG.PING_TIME.MAX.WARN} for 5m). |
min(/PostgreSQL by Zabbix agent/pgsql.ping.time["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"],5m) > {$PG.PING_TIME.MAX.WARN} |Average |
Depends on:
|
|
PostgreSQL: Service is down | Last test of a connection was unsuccessful. |
last(/PostgreSQL by Zabbix agent/pgsql.ping["{$PG.HOST}","{$PG.PORT}"]) = 0 |High |
||
PostgreSQL: Streaming lag with master is too high | Replication lag with master is higher than {$PG.REPL_LAG.MAX.WARN} for 5m. |
min(/PostgreSQL by Zabbix agent/pgsql.replication.lag.sec["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"],5m) > {$PG.REPL_LAG.MAX.WARN} |Average |
||
PostgreSQL: Replication is down | Replication is enabled and data streaming was down for 5m. |
max(/PostgreSQL by Zabbix agent/pgsql.replication.status["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"],5m)=0 |Average |
||
PostgreSQL: Service has been restarted | PostgreSQL uptime is less than 10 minutes. |
last(/PostgreSQL by Zabbix agent/pgsql.uptime["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"]) < 10m |Average |
||
PostgreSQL: Version has changed | PostgreSQL version has changed. | last(/PostgreSQL by Zabbix agent/pgsql.version["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"],#1)<>last(/PostgreSQL by Zabbix agent/pgsql.version["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"],#2) and length(last(/PostgreSQL by Zabbix agent/pgsql.version["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"]))>0 |Info |
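The "Total number of connections is too high" trigger above compares the connection count, expressed as a percentage of max_connections, against the {$PG.CONN_TOTAL_PCT.MAX.WARN} macro (default 90). A sketch of that check (the helper name is hypothetical):

```python
# Sketch of the "Total number of connections is too high" trigger logic:
# pgsql.connections.sum.total_pct is the current connection count as a
# percentage of the server's max_connections setting.
def connections_too_high(current: int, max_connections: int,
                         warn_pct: float = 90.0) -> bool:
    total_pct = 100.0 * current / max_connections
    return total_pct > warn_pct

print(connections_too_high(95, 100))  # True
print(connections_too_high(50, 100))  # False
```

In the real trigger the min() over 5 minutes must exceed the threshold, so a short connection spike does not fire the alert.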
Name | Description | Type | Key and additional info |
---|---|---|---|
Database discovery | Discovers databases (DB) in the database management system (DBMS), except: - templates; - default "postgres" DB; - DBs that do not allow connections. |
Zabbix agent | pgsql.discovery.db["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
DB [{#DBNAME}]: Get dbstat | Get dbstat metrics for database "{#DBNAME}". |
Dependent item | pgsql.dbstat.get_metrics["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Get queries | Get queries metrics for database "{#DBNAME}". |
Dependent item | pgsql.queries.get_metrics["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Database size | Database size. |
Zabbix agent | pgsql.db.size["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}","{#DBNAME}"] |
DB [{#DBNAME}]: Blocks hit per second | Total number of times per second disk blocks were found already in the buffer cache, so that a read was not necessary. |
Dependent item | pgsql.dbstat.blks_hit.rate["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Disk blocks read per second | Total number of disk blocks read per second in this database. |
Dependent item | pgsql.dbstat.blks_read.rate["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Detected conflicts per second | Total number of queries canceled due to conflicts with recovery in this database per second. |
Dependent item | pgsql.dbstat.conflicts.rate["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Detected deadlocks per second | Total number of detected deadlocks in this database per second. |
Dependent item | pgsql.dbstat.deadlocks.rate["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Temp_bytes written per second | Total amount of data written to temporary files by queries in this database. |
Dependent item | pgsql.dbstat.temp_bytes.rate["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Temp_files created per second | Total number of temporary files created by queries in this database. |
Dependent item | pgsql.dbstat.temp_files.rate["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Tuples deleted per second | Total number of rows deleted by queries in this database per second. |
Dependent item | pgsql.dbstat.tup_deleted.rate["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Tuples fetched per second | Total number of rows fetched by queries in this database per second. |
Dependent item | pgsql.dbstat.tup_fetched.rate["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Tuples inserted per second | Total number of rows inserted by queries in this database per second. |
Dependent item | pgsql.dbstat.tup_inserted.rate["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Tuples returned per second | Number of rows returned by queries in this database per second. |
Dependent item | pgsql.dbstat.tup_returned.rate["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Tuples updated per second | Total number of rows updated by queries in this database per second. |
Dependent item | pgsql.dbstat.tup_updated.rate["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Commits per second | Number of transactions in this database that have been committed per second. |
Dependent item | pgsql.dbstat.xact_commit.rate["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Rollbacks per second | Total number of transactions in this database that have been rolled back. |
Dependent item | pgsql.dbstat.xact_rollback.rate["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Frozen XID before autovacuum, % | Preventing Transaction ID Wraparound Failures: https://www.postgresql.org/docs/current/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND |
Dependent item | pgsql.frozenxid.prc_before_av["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Frozen XID before stop, % | Preventing Transaction ID Wraparound Failures: https://www.postgresql.org/docs/current/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND |
Dependent item | pgsql.frozenxid.prc_before_stop["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Get frozen XID | Zabbix agent | pgsql.frozenxid["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{#DBNAME}"] | |
DB [{#DBNAME}]: Num of locks total | Total number of locks in this database. |
Dependent item | pgsql.locks.total["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Queries slow maintenance count | Slow maintenance query count for this database. |
Dependent item | pgsql.queries.mro.slow_count["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Queries max maintenance time | Max maintenance query time for this database. |
Dependent item | pgsql.queries.mro.time_max["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Queries sum maintenance time | Sum maintenance query time for this database. |
Dependent item | pgsql.queries.mro.time_sum["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Queries slow query count | Slow query count for this database. |
Dependent item | pgsql.queries.query.slow_count["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Queries max query time | Max query time for this database. |
Dependent item | pgsql.queries.query.time_max["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Queries sum query time | Sum query time for this database. |
Dependent item | pgsql.queries.query.time_sum["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Queries slow transaction count | Slow transaction query count for this database. |
Dependent item | pgsql.queries.tx.slow_count["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Queries max transaction time | Max transaction query time for this database. |
Dependent item | pgsql.queries.tx.time_max["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Queries sum transaction time | Sum transaction query time for this database. |
Dependent item | pgsql.queries.tx.time_sum["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Index scans per second | Number of index scans in the database per second. |
Dependent item | pgsql.scans.idx.rate["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Sequential scans per second | Number of sequential scans in this database per second. |
Dependent item | pgsql.scans.seq.rate["{#DBNAME}"] Preprocessing
|
DB [{#DBNAME}]: Get scans | Number of scans done for table/index in this database. |
Zabbix agent | pgsql.scans["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{#DBNAME}"] |
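The "per second" items above are dependent items that take a raw cumulative counter from the master item and apply "Change per second" preprocessing. The underlying arithmetic can be sketched as follows (a minimal illustration of the preprocessing step, not Zabbix internals; the function name is hypothetical):

```python
def change_per_second(prev_value, prev_clock, value, clock):
    """Zabbix 'Change per second' preprocessing: the delta of a
    monotonically increasing counter divided by the elapsed seconds."""
    if clock <= prev_clock:
        raise ValueError("timestamps must be increasing")
    return (value - prev_value) / (clock - prev_clock)

# A tup_inserted counter that grew from 1000 to 1600 over a
# 30-second polling interval yields 20 inserted rows per second.
rate = change_per_second(1000, 100, 1600, 130)
```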
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
DB [{#DBNAME}]: Too many recovery conflicts | The primary and standby servers are in many ways loosely connected. Actions on the primary will have an effect on the standby. As a result, there is potential for negative interactions or conflicts between them. |
min(/PostgreSQL by Zabbix agent/pgsql.dbstat.conflicts.rate["{#DBNAME}"],5m) > {$PG.CONFLICTS.MAX.WARN:"{#DBNAME}"} |Average |
||
DB [{#DBNAME}]: Deadlock occurred | Number of deadlocks detected per second exceeds {$PG.DEADLOCKS.MAX.WARN:"{#DBNAME}"} for 5m. |
min(/PostgreSQL by Zabbix agent/pgsql.dbstat.deadlocks.rate["{#DBNAME}"],5m) > {$PG.DEADLOCKS.MAX.WARN:"{#DBNAME}"} |High |
||
DB [{#DBNAME}]: VACUUM FREEZE is required to prevent wraparound | Preventing Transaction ID Wraparound Failures: https://www.postgresql.org/docs/current/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND |
last(/PostgreSQL by Zabbix agent/pgsql.frozenxid.prc_before_stop["{#DBNAME}"])<{$PG.FROZENXID_PCT_STOP.MIN.HIGH:"{#DBNAME}"} |Average |
||
DB [{#DBNAME}]: Number of locks is too high | The total number of locks in the database exceeds {$PG.LOCKS.MAX.WARN:"{#DBNAME}"}. |
min(/PostgreSQL by Zabbix agent/pgsql.locks.total["{#DBNAME}"],5m)>{$PG.LOCKS.MAX.WARN:"{#DBNAME}"} |Warning |
||
DB [{#DBNAME}]: Too many slow queries | The number of detected slow queries exceeds the limit of {$PG.SLOW_QUERIES.MAX.WARN:"{#DBNAME}"}. |
min(/PostgreSQL by Zabbix agent/pgsql.queries.query.slow_count["{#DBNAME}"],5m)>{$PG.SLOW_QUERIES.MAX.WARN:"{#DBNAME}"} |Warning |
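Note that the thresholds in these triggers are user macros with context, e.g. {$PG.SLOW_QUERIES.MAX.WARN:"{#DBNAME}"}: Zabbix first looks for a value defined for that specific database name and otherwise falls back to the plain macro. A rough sketch of this resolution order (illustrative only; resolve_macro is a hypothetical helper, and regular-expression contexts are ignored here):

```python
def resolve_macro(macros, name, context=None):
    """Resolve a Zabbix user macro: a context-specific value
    ({$NAME:"context"}) wins over the plain {$NAME} default."""
    if context is not None and (name, context) in macros:
        return macros[(name, context)]
    return macros[(name, None)]

macros = {
    ("{$PG.SLOW_QUERIES.MAX.WARN}", None): 5,      # default threshold
    ("{$PG.SLOW_QUERIES.MAX.WARN}", "bigdb"): 50,  # per-database override
}
# "bigdb" gets its own, more relaxed threshold...
assert resolve_macro(macros, "{$PG.SLOW_QUERIES.MAX.WARN}", "bigdb") == 50
# ...while any other discovered database falls back to the default.
assert resolve_macro(macros, "{$PG.SLOW_QUERIES.MAX.WARN}", "smalldb") == 5
```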
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template is developed to monitor a single Oracle Database DBMS instance with ODBC and can monitor both CDB and non-CDB installations.
Oracle Database 12c Release 2 (12.2) and newer.
Zabbix version: 6.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Create an Oracle Database user for monitoring:
In CDB installations, it is possible to monitor tablespaces from the CDB (container database) and all PDBs (pluggable databases). To do so, a common user is needed with the correct rights:
CREATE USER c##zabbix_mon IDENTIFIED BY <PASSWORD>;
-- Grant access to the c##zabbix_mon user.
ALTER USER c##zabbix_mon SET CONTAINER_DATA=ALL CONTAINER=CURRENT;
GRANT CONNECT, CREATE SESSION TO c##zabbix_mon;
GRANT SELECT_CATALOG_ROLE TO c##zabbix_mon;
GRANT SELECT ON v_$instance TO c##zabbix_mon;
GRANT SELECT ON v_$database TO c##zabbix_mon;
GRANT SELECT ON v_$sysmetric TO c##zabbix_mon;
GRANT SELECT ON v_$system_parameter TO c##zabbix_mon;
GRANT SELECT ON v_$session TO c##zabbix_mon;
GRANT SELECT ON v_$recovery_file_dest TO c##zabbix_mon;
GRANT SELECT ON v_$active_session_history TO c##zabbix_mon;
GRANT SELECT ON v_$osstat TO c##zabbix_mon;
GRANT SELECT ON v_$process TO c##zabbix_mon;
GRANT SELECT ON v_$datafile TO c##zabbix_mon;
GRANT SELECT ON v_$pgastat TO c##zabbix_mon;
GRANT SELECT ON v_$sgastat TO c##zabbix_mon;
GRANT SELECT ON v_$log TO c##zabbix_mon;
GRANT SELECT ON v_$archive_dest TO c##zabbix_mon;
GRANT SELECT ON v_$asm_diskgroup TO c##zabbix_mon;
GRANT SELECT ON v_$asm_diskgroup_stat TO c##zabbix_mon;
GRANT SELECT ON DBA_USERS TO c##zabbix_mon;
This is needed because the template uses CDB_*
views to monitor tablespaces from the CDB and different PDBs - the monitoring user therefore needs access to the container data objects on all PDBs.
However, if you wish to monitor only a single PDB or a non-CDB instance, a local user is sufficient:
CREATE USER zabbix_mon IDENTIFIED BY <PASSWORD>;
-- Grant access to the zabbix_mon user.
GRANT CONNECT, CREATE SESSION TO zabbix_mon;
GRANT SELECT_CATALOG_ROLE TO zabbix_mon;
GRANT SELECT ON v_$instance TO zabbix_mon;
GRANT SELECT ON v_$database TO zabbix_mon;
GRANT SELECT ON v_$sysmetric TO zabbix_mon;
GRANT SELECT ON v_$system_parameter TO zabbix_mon;
GRANT SELECT ON v_$session TO zabbix_mon;
GRANT SELECT ON v_$recovery_file_dest TO zabbix_mon;
GRANT SELECT ON v_$active_session_history TO zabbix_mon;
GRANT SELECT ON v_$osstat TO zabbix_mon;
GRANT SELECT ON v_$process TO zabbix_mon;
GRANT SELECT ON v_$datafile TO zabbix_mon;
GRANT SELECT ON v_$pgastat TO zabbix_mon;
GRANT SELECT ON v_$sgastat TO zabbix_mon;
GRANT SELECT ON v_$log TO zabbix_mon;
GRANT SELECT ON v_$archive_dest TO zabbix_mon;
GRANT SELECT ON v_$asm_diskgroup TO zabbix_mon;
GRANT SELECT ON v_$asm_diskgroup_stat TO zabbix_mon;
GRANT SELECT ON DBA_USERS TO zabbix_mon;
Important! Ensure that the ODBC connection to Oracle includes the session parameter NLS_NUMERIC_CHARACTERS='.,'. This is important for correctly displaying floating-point numbers in Zabbix.
Important! These privileges grant the monitoring user SELECT_CATALOG_ROLE, which, in turn, gives access to thousands of tables in the database.
This role is required to access the V$RESTORE_POINT dynamic performance view.
However, if assigning SELECT_CATALOG_ROLE to a monitoring user raises any security concerns, there are ways to work around it.
One way to do this is using pipelined table functions:
Log into your database as the SYS
user or make sure that your administration user has the required privileges to execute the steps below;
Create types for the table function:
CREATE OR REPLACE TYPE zbx_mon_restore_point_row AS OBJECT (
SCN NUMBER,
DATABASE_INCARNATION# NUMBER,
GUARANTEE_FLASHBACK_DATABASE VARCHAR2(3),
STORAGE_SIZE NUMBER,
TIME TIMESTAMP(9),
RESTORE_POINT_TIME TIMESTAMP(9),
PRESERVED VARCHAR2(3),
NAME VARCHAR2(128),
PDB_RESTORE_POINT VARCHAR2(3),
CLEAN_PDB_RESTORE_POINT VARCHAR2(3),
PDB_INCARNATION# NUMBER,
REPLICATED VARCHAR2(3),
CON_ID NUMBER
);
CREATE OR REPLACE TYPE zbx_mon_restore_point_tab IS TABLE OF zbx_mon_restore_point_row;
Create the pipelined table function:
CREATE OR REPLACE FUNCTION zbx_mon_restore_point RETURN zbx_mon_restore_point_tab PIPELINED AS
BEGIN
FOR i IN (SELECT * FROM V$RESTORE_POINT) LOOP
PIPE ROW (zbx_mon_restore_point_row(i.SCN, i.DATABASE_INCARNATION#, i.GUARANTEE_FLASHBACK_DATABASE, i.STORAGE_SIZE, i.TIME, i.RESTORE_POINT_TIME, i.PRESERVED, i.NAME, i.PDB_RESTORE_POINT, i.CLEAN_PDB_RESTORE_POINT, i.PDB_INCARNATION#, i.REPLICATED, i.CON_ID));
END LOOP;
RETURN;
END;
Grant the Zabbix monitoring user the EXECUTE privilege on the created pipelined table function, and replace the monitoring user's V$RESTORE_POINT view with one that selects from the SYS user's function (in this example, the SYS user is used to create the DB types and function):
GRANT EXECUTE ON zbx_mon_restore_point TO c##zabbix_mon;
CREATE OR REPLACE VIEW c##zabbix_mon.V$RESTORE_POINT AS SELECT * FROM TABLE(SYS.zbx_mon_restore_point);
Finally, revoke SELECT_CATALOG_ROLE and grant the individual permissions that it previously covered:
REVOKE SELECT_CATALOG_ROLE FROM c##zabbix_mon;
GRANT SELECT ON v_$pdbs TO c##zabbix_mon;
GRANT SELECT ON v_$sort_segment TO c##zabbix_mon;
GRANT SELECT ON v_$parameter TO c##zabbix_mon;
GRANT SELECT ON CDB_TABLESPACES TO c##zabbix_mon;
GRANT SELECT ON CDB_DATA_FILES TO c##zabbix_mon;
GRANT SELECT ON CDB_FREE_SPACE TO c##zabbix_mon;
GRANT SELECT ON CDB_TEMP_FILES TO c##zabbix_mon;
Note that in these examples, the monitoring user is named c##zabbix_mon and the system user is SYS. Change these example usernames to ones that are appropriate for your environment.
If this workaround does not work for you, there are more options available, such as materialized views; keep in mind, however, that V$RESTORE_POINT is a dynamic performance view, so the materialized data must be refreshed regularly.
Install the ODBC driver on Zabbix server or Zabbix proxy. See the Oracle documentation for instructions.
Configure Zabbix server or Zabbix proxy for using the Oracle environment:
This step is required only when:
installing Oracle Instant Client with .rpm packages with a version < 19.3 (if Instant Client is the only Oracle software installed on Zabbix server or Zabbix proxy);
installing Oracle Instant Client manually with .zip files.
There are multiple configuration options:
Using the ldconfig utility (recommended option):
To update the runtime link path, use the ldconfig utility, for example:
# sh -c "echo /opt/oracle/instantclient_19_18 > /etc/ld.so.conf.d/oracle-instantclient.conf"
# ldconfig
Using the application configuration file:
An alternative solution is to export the required variables by editing or adding a new application configuration file:
/etc/sysconfig/zabbix-server # for server
/etc/sysconfig/zabbix-proxy # for proxy
Then add the following:
# Oracle Instant Client library
LD_LIBRARY_PATH=/opt/oracle/instantclient_19_18:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH
Keep in mind that the library paths will vary depending on your installation.
This is a minimal configuration example. Depending on the Oracle Instant Client version, required functionality and host operating system, a different set of additional packages might need to be installed. For more detailed configuration instructions, see the official Oracle Instant Client installation instructions for Linux.
Restart Zabbix server or Zabbix proxy.
Set the username and password in the host macros {$ORACLE.USER} and {$ORACLE.PASSWORD}.
Set the {$ORACLE.DRIVER} and {$ORACLE.SERVICE} host macros.
{$ORACLE.DRIVER} is the path to the driver location in the OS. The ODBC driver file can be found in the Instant Client directory and is named libsqora.so.XX.Y.
{$ORACLE.SERVICE} is the service name to which the host will connect. The value of this macro is important, as it determines whether the connection is established to a non-CDB, a CDB, or a PDB. If you wish to monitor the tablespaces of all PDBs, set a service name that points to the CDB.
Active service names can be listed on the instance running Oracle Database with lsnrctl status.
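The database monitor items of this template assemble their ODBC connection string from these macros. As a rough illustration of how {$ORACLE.DRIVER}, {HOST.CONN}, {$ORACLE.PORT}, and {$ORACLE.SERVICE} are combined (the example values below are placeholders, not recommendations):

```python
def build_dsn(driver, host, port, service):
    """Assemble the connection string used by the db.odbc.get items:
    Driver=<driver path>;DBQ=//<host>:<port>/<service>;"""
    return f"Driver={driver};DBQ=//{host}:{port}/{service};"

dsn = build_dsn(
    "/opt/oracle/instantclient_19_18/libsqora.so.19.1",  # {$ORACLE.DRIVER}
    "db1.example.com",                                   # {HOST.CONN}
    1521,                                                # {$ORACLE.PORT}
    "ORCLCDB",                                           # {$ORACLE.SERVICE}
)
```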
Important! Make sure that the user created in step #1 is present on the specified service.
The "Service's TCP port state" item uses the {HOST.CONN} and {$ORACLE.PORT} macros to check the availability of the listener.
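That item is functionally a plain TCP connect test against {HOST.CONN}:{$ORACLE.PORT}. Outside of Zabbix, the same check can be sketched in a few lines (an approximation of net.tcp.service[tcp,...], not the agent's actual implementation):

```python
import socket

def tcp_port_open(host, port, timeout=3.0):
    """Return 1 if a TCP connection to host:port succeeds
    (listener reachable), 0 otherwise - mirroring the 0/1
    value of net.tcp.service."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return 1
    except OSError:
        return 0
```

For example, `tcp_port_open("db1.example.com", 1521)` returns 1 while the listener is reachable (the hostname here is a placeholder).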
Name | Description | Default |
---|---|---|
{$ORACLE.DRIVER} | Oracle driver path. For example: |
<Put path to oracle driver here> |
{$ORACLE.SERVICE} | Oracle Service Name. |
<Put oracle service name here> |
{$ORACLE.USER} | Oracle username. |
<Put your username here> |
{$ORACLE.PASSWORD} | Oracle user's password. |
<Put your password here> |
{$ORACLE.PORT} | Oracle Database TCP port. |
1521 |
{$ORACLE.DBNAME.MATCHES} | Used in database discovery. It can be overridden on the host or linked template level. |
.* |
{$ORACLE.DBNAME.NOT_MATCHES} | Used in database discovery. It can be overridden on the host or linked template level. |
PDB\$SEED |
{$ORACLE.TABLESPACE.CONTAINER.MATCHES} | Used in tablespace discovery. It can be overridden on the host or linked template level. |
.* |
{$ORACLE.TABLESPACE.CONTAINER.NOT_MATCHES} | Used in tablespace discovery. It can be overridden on the host or linked template level. |
CHANGE_IF_NEEDED |
{$ORACLE.TABLESPACE.NAME.MATCHES} | Used in tablespace discovery. It can be overridden on the host or linked template level. |
.* |
{$ORACLE.TABLESPACE.NAME.NOT_MATCHES} | Used in tablespace discovery. It can be overridden on the host or linked template level. |
CHANGE_IF_NEEDED |
{$ORACLE.TBS.USED.PCT.FROM.MAX.WARN} | Warning severity alert threshold for the maximum percentage of tablespace usage from maximum tablespace size (used bytes/max bytes) for the Warning trigger expression. |
90 |
{$ORACLE.TBS.USED.PCT.FROM.MAX.HIGH} | High severity alert threshold for the maximum percentage of tablespace usage (used bytes/max bytes) for the High trigger expression. |
95 |
{$ORACLE.TBS.USED.PCT.MAX.WARN} | Warning severity alert threshold for the maximum percentage of tablespace usage (used bytes/allocated bytes) for the Warning trigger expression. |
90 |
{$ORACLE.TBS.USED.PCT.MAX.HIGH} | High severity alert threshold for the maximum percentage of tablespace usage (used bytes/allocated bytes) for the High trigger expression. |
95 |
{$ORACLE.TBS.UTIL.PCT.MAX.WARN} | Warning severity alert threshold for the maximum percentage of tablespace utilization (allocated bytes/max bytes) for the Warning trigger expression. |
80 |
{$ORACLE.TBS.UTIL.PCT.MAX.HIGH} | High severity alert threshold for the maximum percentage of tablespace utilization (allocated bytes/max bytes) for the High trigger expression. |
90 |
{$ORACLE.PROCESSES.MAX.WARN} | Alert threshold for the maximum percentage of active processes for the Warning trigger expression. |
80 |
{$ORACLE.SESSIONS.MAX.WARN} | Alert threshold for the maximum percentage of active sessions for the Warning trigger expression. |
80 |
{$ORACLE.DB.FILE.MAX.WARN} | The maximum percentage of used database files for the Warning trigger expression. |
80 |
{$ORACLE.PGA.USE.MAX.WARN} | Alert threshold for the maximum percentage of the Program Global Area (PGA) usage for the Warning trigger expression. |
90 |
{$ORACLE.SESSIONS.LOCK.MAX.WARN} | Alert threshold for the maximum percentage of locked sessions for the Warning trigger expression. |
20 |
{$ORACLE.SESSION.LOCK.MAX.TIME} | The maximum session lock duration, in seconds, beyond which a session is counted as prolongedly locked. |
600 |
{$ORACLE.SESSION.LONG.LOCK.MAX.WARN} | Alert threshold for the maximum number of the prolongedly locked sessions for the Warning trigger expression. |
3 |
{$ORACLE.CONCURRENCY.MAX.WARN} | The maximum percentage of session concurrency for the Warning trigger expression. |
80 |
{$ORACLE.REDO.MIN.WARN} | Alert threshold for the minimum number of redo logs for the Warning trigger expression. |
3 |
{$ORACLE.SHARED.FREE.MIN.WARN} | Alert threshold for the minimum percentage of free shared pool for the Warning trigger expression. |
5 |
{$ORACLE.EXPIRE.PASSWORD.MIN.WARN} | The number of days before the password expires for the Warning trigger expression. |
7 |
{$ORACLE.ASM.USED.PCT.MAX.WARN} | The maximum percentage of used space in the Automatic Storage Management (ASM) disk group for the Warning trigger expression. |
90 |
{$ORACLE.ASM.USED.PCT.MAX.HIGH} | The maximum percentage of used space in the Automatic Storage Management (ASM) disk group for the High trigger expression. |
95 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Oracle: Service's TCP port state | Checks the availability of Oracle on the TCP port. |
Zabbix agent | net.tcp.service[tcp,{HOST.CONN},{$ORACLE.PORT}] Preprocessing
|
Oracle: Number of LISTENER processes | The number of running listener processes. |
Zabbix agent | proc.num[,,,"tnslsnr LISTENER"] Preprocessing
|
Oracle: Get instance state | Gets the state of the current instance. |
Database monitor | db.odbc.get[get_instance_state,,"Driver={$ORACLE.DRIVER};DBQ=//{HOST.CONN}:{$ORACLE.PORT}/{$ORACLE.SERVICE};"] |
Oracle: Get archive log | Gets the destinations of the log archive. |
Database monitor | db.odbc.get[get_archivelog,,"Driver={$ORACLE.DRIVER};DBQ=//{HOST.CONN}:{$ORACLE.PORT}/{$ORACLE.SERVICE};"] |
Oracle: Get ASM disk groups | Gets the ASM disk groups. |
Database monitor | db.odbc.get[get_asm,,"Driver={$ORACLE.DRIVER};DBQ=//{HOST.CONN}:{$ORACLE.PORT}/{$ORACLE.SERVICE};"] |
Oracle: Get database | Gets the databases in the database management system (DBMS). |
Database monitor | db.odbc.get[get_db,,"Driver={$ORACLE.DRIVER};DBQ=//{HOST.CONN}:{$ORACLE.PORT}/{$ORACLE.SERVICE};"] |
Oracle: Get PDB | Gets the pluggable database (PDB) in DBMS. |
Database monitor | db.odbc.get[get_pdb,,"Driver={$ORACLE.DRIVER};DBQ=//{HOST.CONN}:{$ORACLE.PORT}/{$ORACLE.SERVICE};"] |
Oracle: Get tablespace | Gets tablespaces in DBMS. |
Database monitor | db.odbc.get[get_tablespace,,"Driver={$ORACLE.DRIVER};DBQ=//{HOST.CONN}:{$ORACLE.PORT}/{$ORACLE.SERVICE};"] |
Oracle: Version | The Oracle Server version. |
Dependent item | oracle.version Preprocessing
|
Oracle: Uptime | The Oracle instance uptime expressed in seconds. |
Dependent item | oracle.uptime Preprocessing
|
Oracle: Instance status | The status of the instance. |
Dependent item | oracle.instance_status Preprocessing
|
Oracle: Archiver state | The status of automatic archiving. |
Dependent item | oracle.archiver_state Preprocessing
|
Oracle: Instance name | The name of the instance. |
Dependent item | oracle.instance_name Preprocessing
|
Oracle: Instance hostname | The name of the host machine. |
Dependent item | oracle.instance_hostname Preprocessing
|
Oracle: Instance role | Indicates whether the instance is an active instance or an inactive secondary instance. |
Dependent item | oracle.instance.role Preprocessing
|
Oracle: Get system metrics | Gets the values of the system metrics. |
Database monitor | db.odbc.get[get_system_metrics,,"Driver={$ORACLE.DRIVER};DBQ=//{HOST.CONN}:{$ORACLE.PORT}/{$ORACLE.SERVICE};"] |
Oracle: Sessions limit | The user and system sessions. |
Dependent item | oracle.session_limit Preprocessing
|
Oracle: Datafiles limit | The maximum allowable number of datafiles. |
Dependent item | oracle.db_files_limit Preprocessing
|
Oracle: Processes limit | The maximum number of user processes. |
Dependent item | oracle.processes_limit Preprocessing
|
Oracle: Number of processes | The current number of user processes. |
Dependent item | oracle.processes_count Preprocessing
|
Oracle: Datafiles count | The current number of datafiles. |
Dependent item | oracle.db_files_count Preprocessing
|
Oracle: Buffer cache hit ratio | The ratio of buffer cache hits ((LogRead - PhyRead)/LogRead). |
Dependent item | oracle.buffer_cache_hit_ratio Preprocessing
|
Oracle: Cursor cache hit ratio | The ratio of cursor cache hits (CursorCacheHit/SoftParse). |
Dependent item | oracle.cursor_cache_hit_ratio Preprocessing
|
Oracle: Library cache hit ratio | The ratio of library cache hits (Hits/Pins). |
Dependent item | oracle.library_cache_hit_ratio Preprocessing
|
Oracle: Shared pool free % | Free memory of a shared pool expressed in %. |
Dependent item | oracle.shared_pool_free Preprocessing
|
Oracle: Physical reads per second | Reads per second. |
Dependent item | oracle.physical_reads_rate Preprocessing
|
Oracle: Physical writes per second | Writes per second. |
Dependent item | oracle.physical_writes_rate Preprocessing
|
Oracle: Physical reads bytes per second | Read bytes per second. |
Dependent item | oracle.physical_read_bytes_rate Preprocessing
|
Oracle: Physical writes bytes per second | Write bytes per second. |
Dependent item | oracle.physical_write_bytes_rate Preprocessing
|
Oracle: Enqueue timeouts per second | Enqueue timeouts per second. |
Dependent item | oracle.enqueue_timeouts_rate Preprocessing
|
Oracle: GC CR block received per second | The global cache (GC) and the consistent read (CR) block received per second. |
Dependent item | oracle.gc_cr_block_received_rate Preprocessing
|
Oracle: Global cache blocks corrupted | The number of blocks that encountered corruption or checksum failure during the interconnect. |
Dependent item | oracle.cache_blocks_corrupt Preprocessing
|
Oracle: Global cache blocks lost | The number of lost global cache blocks. |
Dependent item | oracle.cache_blocks_lost Preprocessing
|
Oracle: Logons per second | The number of logon attempts. |
Dependent item | oracle.logons_rate Preprocessing
|
Oracle: Average active sessions | The average number of active sessions at a point in time that are either working or waiting. |
Dependent item | oracle.active_sessions Preprocessing
|
Oracle: Session count | The session count. |
Dependent item | oracle.session_count Preprocessing
|
Oracle: Active user sessions | The number of active user sessions. |
Dependent item | oracle.session_active_user Preprocessing
|
Oracle: Active background sessions | The number of active background sessions. |
Dependent item | oracle.session_active_background Preprocessing
|
Oracle: Inactive user sessions | The number of inactive user sessions. |
Dependent item | oracle.session_inactive_user Preprocessing
|
Oracle: Sessions lock rate | The percentage of locked sessions. Locks are mechanisms that prevent destructive interaction between transactions accessing the same resource - either user objects, such as tables and rows or system objects not visible to users, such as shared data structures in memory and data dictionary rows. |
Dependent item | oracle.session_lock_rate Preprocessing
|
Oracle: Sessions locked over {$ORACLE.SESSION.LOCK.MAX.TIME}s | The count of the prolongedly locked sessions. (You can change the maximum session lock duration, in seconds, using the {$ORACLE.SESSION.LOCK.MAX.TIME} macro.) |
Dependent item | oracle.session_long_time_locked Preprocessing
|
Oracle: Sessions concurrency | The percentage of concurrency. Concurrency is a database behavior when different transactions request to change the same resource. In the case of modifying data transactions, it sequentially temporarily blocks the right to change the data, and the rest of the transactions wait for access. When the access to a resource is locked for a long time, the concurrency grows (like the transaction queue), often leaving an extremely negative impact on performance. A high contention value does not indicate the root cause of the problem, but is a signal to search for it. |
Dependent item | oracle.session_concurrency_rate Preprocessing
|
Oracle: User '{$ORACLE.USER}' expire password | The number of days before the Zabbix account password expires. |
Dependent item | oracle.user_expire_password Preprocessing
|
Oracle: Active serial sessions | The number of active serial sessions. |
Dependent item | oracle.active_serial_sessions Preprocessing
|
Oracle: Active parallel sessions | The number of active parallel sessions. |
Dependent item | oracle.active_parallel_sessions Preprocessing
|
Oracle: Long table scans per second | The number of long table scans per second. A table is considered long if it is not cached and if its high water mark is greater than five blocks. |
Dependent item | oracle.long_table_scans_rate Preprocessing
|
Oracle: SQL service response time | The Structured Query Language (SQL) service response time expressed in seconds. |
Dependent item | oracle.service_response_time Preprocessing
|
Oracle: User rollbacks per second | The number of times per second that users manually issued the ROLLBACK statement. |
Dependent item | oracle.user_rollbacks_rate Preprocessing
|
Oracle: Total sorts per user call | The total sorts per user call. |
Dependent item | oracle.sorts_per_user_call Preprocessing
|
Oracle: Rows per sort | The average number of rows per sort for all types of sorts performed. |
Dependent item | oracle.rows_per_sort Preprocessing
|
Oracle: Disk sort per second | The number of sorts going to disk per second. |
Dependent item | oracle.disk_sorts Preprocessing
|
Oracle: Memory sorts ratio | The percentage of sorts (from ORDER BY clauses or index building) that are done in memory rather than on disk. |
Dependent item | oracle.memory_sorts_ratio Preprocessing
|
Oracle: Database wait time ratio | Wait time - the time that the server process spends waiting for available shared resources to be released by other server processes such as latches, locks, data buffers, etc. |
Dependent item | oracle.database_wait_time_ratio Preprocessing
|
Oracle: Database CPU time ratio | The ratio calculated by dividing the total CPU (used by the database) by the Oracle time model statistic DB time. |
Dependent item | oracle.database_cpu_time_ratio Preprocessing
|
Oracle: Temp space used | Used temporary space. |
Dependent item | oracle.temp_space_used Preprocessing
|
Oracle: PGA, Total inuse | The amount of Program Global Area (PGA) memory currently consumed by work areas. This number can be used to determine how much memory is consumed by other consumers of the PGA memory (for example, PL/SQL or Java). |
Dependent item | oracle.total_pga_used Preprocessing
|
Oracle: PGA, Aggregate target parameter | The current value of the PGA_AGGREGATE_TARGET initialization parameter. |
Dependent item | oracle.pga_target Preprocessing
|
Oracle: PGA, Total allocated | The current amount of the PGA memory allocated by the instance. The Oracle Database attempts to keep this number below the value of the PGA_AGGREGATE_TARGET initialization parameter. |
Dependent item | oracle.total_pga_allocated Preprocessing
|
Oracle: PGA, Total freeable | The number of bytes of the PGA memory in all processes that could be freed back to the OS. |
Dependent item | oracle.total_pga_freeable Preprocessing
|
Oracle: PGA, Global memory bound | The maximum size of a work area executed in automatic mode. |
Dependent item | oracle.pga_global_bound Preprocessing
|
Oracle: FRA, Space limit | The maximum amount of disk space (in bytes) that the database can use for the Fast Recovery Area (FRA). |
Dependent item | oracle.fra_space_limit Preprocessing
|
Oracle: FRA, Used space | The amount of disk space (in bytes) used by FRA files created in the current and all the previous FRAs. |
Dependent item | oracle.fra_space_used Preprocessing
|
Oracle: FRA, Space reclaimable | The total amount of disk space (in bytes) that can be created by deleting obsolete, redundant, and other low-priority files from the FRA. |
Dependent item | oracle.fra_space_reclaimable Preprocessing
|
Oracle: FRA, Number of files | The number of files in the FRA. |
Dependent item | oracle.fra_number_of_files Preprocessing
|
Oracle: FRA, Usable space in % | Percentage of space usable in the FRA. |
Dependent item | oracle.fra_usable_pct Preprocessing
|
Oracle: FRA, Number of restore points | Number of restore points in the FRA. |
Dependent item | oracle.fra_restore_point Preprocessing
|
Oracle: SGA, java pool | The memory is allocated from the Java pool. |
Dependent item | oracle.sga_java_pool Preprocessing
|
Oracle: SGA, large pool | The memory is allocated from a large pool. |
Dependent item | oracle.sga_large_pool Preprocessing
|
Oracle: SGA, shared pool | The memory is allocated from a shared pool. |
Dependent item | oracle.sga_shared_pool Preprocessing
|
Oracle: SGA, log buffer | The number of bytes allocated for the redo log buffer. |
Dependent item | oracle.sga_log_buffer Preprocessing
|
Oracle: SGA, fixed | The fixed System Global Area (SGA) is an internal housekeeping area. |
Dependent item | oracle.sga_fixed Preprocessing
|
Oracle: SGA, buffer cache | The size of standard block cache. |
Dependent item | oracle.sga_buffer_cache Preprocessing
|
Oracle: Redo logs available to switch | The number of inactive/unused redo logs available for log switching. |
Dependent item | oracle.redo_logs_available Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Oracle: Port {$ORACLE.PORT} is unavailable | The TCP port of the Oracle Server service is currently unavailable. |
max(/Oracle by ODBC/net.tcp.service[tcp,{HOST.CONN},{$ORACLE.PORT}],#3)=0 and max(/Oracle by ODBC/proc.num[,,,"tnslsnr LISTENER"],#3)>0 |Disaster |
||
Oracle: LISTENER process is not running | The Oracle listener process is not running. |
max(/Oracle by ODBC/proc.num[,,,"tnslsnr LISTENER"],#3)=0 |Disaster |
||
Oracle: Version has changed | The Oracle Database version has changed. Acknowledge to close the problem manually. |
last(/Oracle by ODBC/oracle.version,#1)<>last(/Oracle by ODBC/oracle.version,#2) and length(last(/Oracle by ODBC/oracle.version))>0 |Info |
Manual close: Yes | |
Oracle: Host has been restarted | Uptime is less than 10 minutes. |
last(/Oracle by ODBC/oracle.uptime)<10m |Info |
Manual close: Yes | |
Oracle: Failed to fetch info data | Zabbix has not received any data for the items for the last 5 minutes. The database might be unavailable for connecting. |
nodata(/Oracle by ODBC/oracle.uptime,5m)=1 |Warning |
Depends on:
|
|
Oracle: Instance name has changed | An Oracle Database instance name has changed. Acknowledge to close the problem manually. |
last(/Oracle by ODBC/oracle.instance_name,#1)<>last(/Oracle by ODBC/oracle.instance_name,#2) and length(last(/Oracle by ODBC/oracle.instance_name))>0 |Info |
Manual close: Yes | |
Oracle: Instance hostname has changed | An Oracle Database instance hostname has changed. Acknowledge to close the problem manually. |
last(/Oracle by ODBC/oracle.instance_hostname,#1)<>last(/Oracle by ODBC/oracle.instance_hostname,#2) and length(last(/Oracle by ODBC/oracle.instance_hostname))>0 |Info |
Manual close: Yes | |
Oracle: Too many active processes | Active processes are using more than {$ORACLE.PROCESSES.MAX.WARN}% of the available process limit. |
min(/Oracle by ODBC/oracle.processes_count,5m) * 100 / last(/Oracle by ODBC/oracle.processes_limit) > {$ORACLE.PROCESSES.MAX.WARN} |Warning |
||
Oracle: Too many database files | The number of datafiles is higher than {$ORACLE.DB.FILE.MAX.WARN}% of the allowed maximum. |
min(/Oracle by ODBC/oracle.db_files_count,5m) * 100 / last(/Oracle by ODBC/oracle.db_files_limit) > {$ORACLE.DB.FILE.MAX.WARN} |Warning |
||
Oracle: Shared pool free is too low | The free memory percent of the shared pool has been less than {$ORACLE.SHARED.FREE.MIN.WARN}% for the last 5 minutes. |
max(/Oracle by ODBC/oracle.shared_pool_free,5m)<{$ORACLE.SHARED.FREE.MIN.WARN} |Warning |
||
Oracle: Too many active sessions | Active sessions are using more than {$ORACLE.SESSIONS.MAX.WARN}% of the available session limit. |
min(/Oracle by ODBC/oracle.session_count,5m) * 100 / last(/Oracle by ODBC/oracle.session_limit) > {$ORACLE.SESSIONS.MAX.WARN} |Warning |
||
Oracle: Too many locked sessions | The number of locked sessions exceeds {$ORACLE.SESSIONS.LOCK.MAX.WARN}% of the running sessions. |
min(/Oracle by ODBC/oracle.session_lock_rate,5m) > {$ORACLE.SESSIONS.LOCK.MAX.WARN} |Warning |
||
Oracle: Too many sessions locked | The number of sessions locked for longer than {$ORACLE.SESSION.LOCK.MAX.TIME} seconds exceeds {$ORACLE.SESSION.LONG.LOCK.MAX.WARN}. |
min(/Oracle by ODBC/oracle.session_long_time_locked,5m) > {$ORACLE.SESSION.LONG.LOCK.MAX.WARN} |Warning |
||
Oracle: Too high database concurrency | The concurrency rate exceeds {$ORACLE.CONCURRENCY.MAX.WARN}%. |
min(/Oracle by ODBC/oracle.session_concurrency_rate,5m) > {$ORACLE.CONCURRENCY.MAX.WARN} |Warning |
||
Oracle: Zabbix account will expire soon | The password for the Zabbix user in the database expires soon. |
last(/Oracle by ODBC/oracle.user_expire_password) < {$ORACLE.EXPIRE.PASSWORD.MIN.WARN} |Warning |
||
Oracle: Total PGA inuse is too high | The total PGA currently consumed by work areas is more than {$ORACLE.PGA.USE.MAX.WARN}% of PGA_AGGREGATE_TARGET. |
min(/Oracle by ODBC/oracle.total_pga_used,5m) * 100 / last(/Oracle by ODBC/oracle.pga_target) > {$ORACLE.PGA.USE.MAX.WARN} |Warning |
||
Oracle: Number of REDO logs available for switching is too low | The number of inactive/unused redos available for log switching is low (risk of database downtime). |
max(/Oracle by ODBC/oracle.redo_logs_available,5m) < {$ORACLE.REDO.MIN.WARN} |Warning |
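Several of the triggers above compare the 5-minute minimum of a gauge against a percentage of a limit, for example min(oracle.processes_count,5m) * 100 / last(oracle.processes_limit) > {$ORACLE.PROCESSES.MAX.WARN}. The arithmetic can be sketched as follows (an illustration of the expression, not the Zabbix evaluator):

```python
def usage_pct_exceeded(samples_5m, limit, threshold_pct):
    """The trigger fires only if even the *lowest* value observed
    in the window is above threshold_pct percent of the limit."""
    return min(samples_5m) * 100 / limit > threshold_pct

# 165+ processes against a limit of 200 stays above 80% for the
# whole window, so the Warning trigger would fire.
fires = usage_pct_exceeded([165, 170, 168], 200, 80)
```

Using the minimum over the window (rather than the latest value) avoids alerting on a single short spike.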
Name | Description | Type | Key and additional info |
---|---|---|---|
Database discovery | Used for database discovery. |
Dependent item | oracle.db.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Oracle Database '{#DBNAME}': Get CDB and No-CDB info | Gets the information about the CDB and non-CDB database on an instance. |
Database monitor | db.odbc.get[get_cdb_{#DBNAME}_info,,"Driver={$ORACLE.DRIVER};DBQ=//{HOST.CONN}:{$ORACLE.PORT}/{$ORACLE.SERVICE};"] Preprocessing
|
Oracle Database '{#DBNAME}': Open status | 1 - MOUNTED; 2 - READ WRITE; 3 - READ ONLY; 4 - READ ONLY WITH APPLY (a physical standby database is open in real-time query mode). |
Dependent item | oracle.db_open_mode["{#DBNAME}"] Preprocessing
|
Oracle Database '{#DBNAME}': Role | The current role of the database where: 1 - SNAPSHOT STANDBY; 2 - LOGICAL STANDBY; 3 - PHYSICAL STANDBY; 4 - PRIMARY; 5 - FAR SYNC. |
Dependent item | oracle.db_role["{#DBNAME}"] Preprocessing
|
Oracle Database '{#DBNAME}': Log mode | The archive log mode where: 0 - NOARCHIVELOG; 1 - ARCHIVELOG; 2 - MANUAL. |
Dependent item | oracle.db_log_mode["{#DBNAME}"] Preprocessing
|
Oracle Database '{#DBNAME}': Force logging | Indicates whether the database is in force logging mode (YES or NO). |
Dependent item | oracle.db_force_logging["{#DBNAME}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Oracle Database '{#DBNAME}': Open status in mount mode | The Oracle Database is in a mounted state. |
last(/Oracle by ODBC/oracle.db_open_mode["{#DBNAME}"])=1 |Warning |
||
Oracle Database '{#DBNAME}': Open status has changed | The Oracle Database open status has changed. Acknowledge to close the problem manually. |
last(/Oracle by ODBC/oracle.db_open_mode["{#DBNAME}"],#1)<>last(/Oracle by ODBC/oracle.db_open_mode["{#DBNAME}"],#2) |Info |
Manual close: Yes Depends on:
|
|
Oracle Database '{#DBNAME}': Role has changed | The Oracle Database role has changed. Acknowledge to close the problem manually. |
last(/Oracle by ODBC/oracle.db_role["{#DBNAME}"],#1)<>last(/Oracle by ODBC/oracle.db_role["{#DBNAME}"],#2) |Info |
Manual close: Yes | |
Oracle Database '{#DBNAME}': Force logging is deactivated for DB with active Archivelog | Force logging is an important setting for databases in ARCHIVELOG mode: it ensures that all changes in the database are captured in the redo logs. |
last(/Oracle by ODBC/oracle.db_force_logging["{#DBNAME}"]) = 0 and last(/Oracle by ODBC/oracle.db_log_mode["{#DBNAME}"]) = 1 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
PDB discovery | Used for the discovery of the pluggable database (PDB). |
Dependent item | oracle.pdb.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Oracle Database '{#DBNAME}': Get PDB info | Gets the information about the PDB database on an instance. |
Database monitor | db.odbc.get[get_pdb_{#DBNAME}_info,,"Driver={$ORACLE.DRIVER};DBQ=//{HOST.CONN}:{$ORACLE.PORT}/{$ORACLE.SERVICE};"] Preprocessing
|
Oracle Database '{#DBNAME}': Open status | 1 - MOUNTED; 2 - READ WRITE; 3 - READ ONLY; 4 - READ ONLY WITH APPLY (a physical standby database is open in real-time query mode). |
Dependent item | oracle.pdb_open_mode["{#DBNAME}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Oracle Database '{#DBNAME}': Open status in mount mode | The Oracle Database is in a mounted state. |
last(/Oracle by ODBC/oracle.pdb_open_mode["{#DBNAME}"])=1 |Warning |
||
Oracle Database '{#DBNAME}': Open status has changed | The Oracle Database open status has changed. Acknowledge to close the problem manually. |
last(/Oracle by ODBC/oracle.pdb_open_mode["{#DBNAME}"],#1)<>last(/Oracle by ODBC/oracle.pdb_open_mode["{#DBNAME}"],#2) |Info |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Tablespace discovery | Used for the discovery of tablespaces in DBMS. |
Dependent item | oracle.tablespace.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Oracle '{#CON_NAME}' TBS '{#TABLESPACE}': Get tablespaces stats | Gets the statistics of the tablespace. |
Database monitor | db.odbc.get[get_{#CON_NAME}_tablespace_{#TABLESPACE}_stats,,"Driver={$ORACLE.DRIVER};DBQ=//{HOST.CONN}:{$ORACLE.PORT}/{$ORACLE.SERVICE};"] Preprocessing
|
Oracle '{#CON_NAME}' TBS '{#TABLESPACE}': Tablespace allocated, bytes | Currently allocated bytes for the tablespace (sum of the current size of datafiles). |
Dependent item | oracle.tbs_alloc_bytes["{#CON_NAME}","{#TABLESPACE}"] Preprocessing
|
Oracle '{#CON_NAME}' TBS '{#TABLESPACE}': Tablespace MAX size, bytes | The maximum size of the tablespace. |
Dependent item | oracle.tbs_max_bytes["{#CON_NAME}","{#TABLESPACE}"] Preprocessing
|
Oracle '{#CON_NAME}' TBS '{#TABLESPACE}': Tablespace used, bytes | Currently used bytes for the tablespace (current size of datafiles minus the free space). |
Dependent item | oracle.tbs_used_bytes["{#CON_NAME}","{#TABLESPACE}"] Preprocessing
|
Oracle '{#CON_NAME}' TBS '{#TABLESPACE}': Tablespace free, bytes | Free bytes of the allocated space. |
Dependent item | oracle.tbs_free_bytes["{#CON_NAME}","{#TABLESPACE}"] Preprocessing
|
Oracle '{#CON_NAME}' TBS '{#TABLESPACE}': Tablespace allocated, percent | Allocated bytes/max bytes*100. |
Dependent item | oracle.tbs_used_pct["{#CON_NAME}","{#TABLESPACE}"] Preprocessing
|
Oracle '{#CON_NAME}' TBS '{#TABLESPACE}': Tablespace usage, percent | Used bytes/allocated bytes*100. |
Dependent item | oracle.tbs_used_file_pct["{#CON_NAME}","{#TABLESPACE}"] Preprocessing
|
Oracle '{#CON_NAME}' TBS '{#TABLESPACE}': Tablespace usage from MAX, percent | Used bytes/max bytes*100. |
Dependent item | oracle.tbs_used_from_max_pct["{#CON_NAME}","{#TABLESPACE}"] Preprocessing
|
Oracle '{#CON_NAME}' TBS '{#TABLESPACE}': Open status | The tablespace status where: 1 - ONLINE; 2 - OFFLINE; 3 - READ ONLY. |
Dependent item | oracle.tbs_status["{#CON_NAME}","{#TABLESPACE}"] Preprocessing
|
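The three percentage items above are simple ratios of the raw byte counters; a sketch of the arithmetic (the byte values below are made up):

```python
def tablespace_pcts(used_bytes, allocated_bytes, max_bytes):
    """The three derived tablespace metrics tracked above."""
    return {
        # "Tablespace allocated, percent": allocated bytes / max bytes * 100
        "allocated_pct": allocated_bytes * 100 / max_bytes,
        # "Tablespace usage, percent": used bytes / allocated bytes * 100
        "usage_pct": used_bytes * 100 / allocated_bytes,
        # "Tablespace usage from MAX, percent": used bytes / max bytes * 100
        "usage_from_max_pct": used_bytes * 100 / max_bytes,
    }

GiB = 2**30
pcts = tablespace_pcts(used_bytes=6 * GiB, allocated_bytes=8 * GiB, max_bytes=32 * GiB)
```

Tracking all three separately matters: a tablespace can look nearly full relative to its current datafiles (usage) while still having plenty of headroom up to its maximum size (usage from MAX).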
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Oracle '{#CON_NAME}' TBS '{#TABLESPACE}': Tablespace utilization is too high | The utilization of the tablespace exceeds {$ORACLE.TBS.UTIL.PCT.MAX.WARN}%. |
min(/Oracle by ODBC/oracle.tbs_used_pct["{#CON_NAME}","{#TABLESPACE}"],5m)>{$ORACLE.TBS.UTIL.PCT.MAX.WARN} |Warning |
Depends on:
|
|
Oracle '{#CON_NAME}' TBS '{#TABLESPACE}': Tablespace utilization is too high | The utilization of the tablespace exceeds {$ORACLE.TBS.UTIL.PCT.MAX.HIGH}%. |
min(/Oracle by ODBC/oracle.tbs_used_pct["{#CON_NAME}","{#TABLESPACE}"],5m)>{$ORACLE.TBS.UTIL.PCT.MAX.HIGH} |High |
||
Oracle '{#CON_NAME}' TBS '{#TABLESPACE}': Tablespace usage is too high | The usage of the tablespace exceeds {$ORACLE.TBS.USED.PCT.MAX.WARN}%. |
min(/Oracle by ODBC/oracle.tbs_used_file_pct["{#CON_NAME}","{#TABLESPACE}"],5m)>{$ORACLE.TBS.USED.PCT.MAX.WARN} |Warning |
Depends on:
|
|
Oracle '{#CON_NAME}' TBS '{#TABLESPACE}': Tablespace usage is too high | The usage of the tablespace exceeds {$ORACLE.TBS.USED.PCT.MAX.HIGH}%. |
min(/Oracle by ODBC/oracle.tbs_used_file_pct["{#CON_NAME}","{#TABLESPACE}"],5m)>{$ORACLE.TBS.USED.PCT.MAX.HIGH} |High |
||
Oracle '{#CON_NAME}' TBS '{#TABLESPACE}': Tablespace usage from MAX is too high | The usage of the tablespace relative to its maximum size exceeds {$ORACLE.TBS.USED.PCT.FROM.MAX.WARN}%. |
min(/Oracle by ODBC/oracle.tbs_used_from_max_pct["{#CON_NAME}","{#TABLESPACE}"],5m)>{$ORACLE.TBS.USED.PCT.FROM.MAX.WARN} |Warning |
Depends on:
|
|
Oracle '{#CON_NAME}' TBS '{#TABLESPACE}': Tablespace usage from MAX is too high | The usage of the tablespace relative to its maximum size exceeds {$ORACLE.TBS.USED.PCT.FROM.MAX.HIGH}%. |
min(/Oracle by ODBC/oracle.tbs_used_from_max_pct["{#CON_NAME}","{#TABLESPACE}"],5m)>{$ORACLE.TBS.USED.PCT.FROM.MAX.HIGH} |High |
||
Oracle '{#CON_NAME}' TBS '{#TABLESPACE}': Tablespace is OFFLINE | The tablespace is in the offline state. |
last(/Oracle by ODBC/oracle.tbs_status["{#CON_NAME}","{#TABLESPACE}"])=2 |Warning |
||
Oracle '{#CON_NAME}' TBS '{#TABLESPACE}': Tablespace status has changed | Oracle tablespace status has changed. Acknowledge to close the problem manually. |
last(/Oracle by ODBC/oracle.tbs_status["{#CON_NAME}","{#TABLESPACE}"],#1)<>last(/Oracle by ODBC/oracle.tbs_status["{#CON_NAME}","{#TABLESPACE}"],#2) |Info |
Manual close: Yes Depends on:
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Archive log discovery | Used for the discovery of the log archive. |
Dependent item | oracle.archivelog.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Archivelog '{#DEST_NAME}': Get archive log info | Gets the archive log statistics. |
Database monitor | db.odbc.get[get_archivelog_{#DEST_NAME}_stat,,"Driver={$ORACLE.DRIVER};DBQ=//{HOST.CONN}:{$ORACLE.PORT}/{$ORACLE.SERVICE};"] Preprocessing
|
Archivelog '{#DEST_NAME}': Error | Displays the error message. |
Dependent item | oracle.archivelog_error["{#DEST_NAME}"] Preprocessing
|
Archivelog '{#DEST_NAME}': Last sequence | Identifies the sequence number of the last archived redo log to be archived. |
Dependent item | oracle.archivelog_log_sequence["{#DEST_NAME}"] Preprocessing
|
Archivelog '{#DEST_NAME}': Status | Identifies the current status of the destination where: 1 - VALID; 2 - DEFERRED; 3 - ERROR; 0 - UNKNOWN. |
Dependent item | oracle.archivelog_log_status["{#DEST_NAME}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Archivelog '{#DEST_NAME}': Log Archive is not valid | The trigger will launch if the archive log destination is not in one of these states: VALID; DEFERRED. |
last(/Oracle by ODBC/oracle.archivelog_log_status["{#DEST_NAME}"])<2 |High |
Name | Description | Type | Key and additional info |
---|---|---|---|
ASM disk groups discovery | Used for discovering the ASM disk groups. |
Dependent item | oracle.asm.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
ASM '{#DGNAME}': Get ASM stats | Gets the ASM disk group statistics. |
Database monitor | db.odbc.get[get_asm_{#DGNAME}_stat,,"Driver={$ORACLE.DRIVER};DBQ=//{HOST.CONN}:{$ORACLE.PORT}/{$ORACLE.SERVICE};"] Preprocessing
|
ASM '{#DGNAME}': Total size | The total size of the ASM disk group. |
Dependent item | oracle.asm_total_size["{#DGNAME}"] Preprocessing
|
ASM '{#DGNAME}': Free size | The free size of the ASM disk group. |
Dependent item | oracle.asm_free_size["{#DGNAME}"] Preprocessing
|
ASM '{#DGNAME}': Used size, percent | Usage of the ASM disk group expressed in %. |
Dependent item | oracle.asm_used_pct["{#DGNAME}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
ASM '{#DGNAME}': Disk group usage is too high | The usage of the ASM disk group expressed in % exceeds {$ORACLE.ASM.USED.PCT.MAX.WARN}. |
min(/Oracle by ODBC/oracle.asm_used_pct["{#DGNAME}"],5m)>{$ORACLE.ASM.USED.PCT.MAX.WARN} |Warning |
Depends on:
|
|
ASM '{#DGNAME}': Disk group usage is too high | The usage of the ASM disk group expressed in % exceeds {$ORACLE.ASM.USED.PCT.MAX.HIGH}. |
min(/Oracle by ODBC/oracle.asm_used_pct["{#DGNAME}"],5m)>{$ORACLE.ASM.USED.PCT.MAX.HIGH} |High |
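The two ASM triggers implement a common warn/high tiering on a single usage metric. A sketch of that tiering (the used-percent formula is an assumption about how `oracle.asm_used_pct` relates to the total/free size items above):

```python
def asm_used_pct(total_bytes, free_bytes):
    # Assumed derivation: used % = (total - free) / total * 100
    return (total_bytes - free_bytes) * 100 / total_bytes

def asm_severity(used_pct, warn_pct=90, high_pct=95):
    """Two-tier alerting; the defaults mirror {$ORACLE.ASM.USED.PCT.MAX.WARN}
    and {$ORACLE.ASM.USED.PCT.MAX.HIGH}. The higher tier wins, which is why
    the Warning trigger depends on the High one in the template."""
    if used_pct > high_pct:
        return "High"
    if used_pct > warn_pct:
        return "Warning"
    return "OK"
```

Making the Warning trigger depend on the High trigger suppresses the duplicate Warning problem once the High threshold is crossed.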
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed to monitor a single Oracle Database instance with Zabbix agent 2.
Oracle Database 12c R2 and newer.
Zabbix version: 6.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
If you want to override parameters from the Zabbix agent configuration file, set the username, password, and service name in the host macros ({$ORACLE.USER}, {$ORACLE.PASSWORD}, and {$ORACLE.SERVICE}).
The user can have the sysdba, sysoper, or sysasm privilege. It must be specified with " as " as a separator, e.g. "user as sysdba"; the privilege can be upper- or lowercase and must be at the end of the username string.
Test availability:
zabbix_get -s oracle-host -k oracle.ping["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}"]
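The " as " privilege suffix described above can be illustrated with a small parser (a sketch of the naming convention only; `split_user` is not part of the template or the agent):

```python
PRIVILEGES = {"sysdba", "sysoper", "sysasm"}

def split_user(username):
    """Split 'user as sysdba' into (user, privilege). The privilege is
    case-insensitive and must sit at the end of the username string;
    plain usernames come back with no privilege."""
    head, sep, tail = username.rpartition(" as ")
    if sep and tail.lower() in PRIVILEGES:
        return head, tail.lower()
    return username, None
```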
Name | Description | Default |
---|---|---|
{$ORACLE.USER} | Oracle username. |
zabbix |
{$ORACLE.PASSWORD} | Oracle user's password. |
zabbix_password |
{$ORACLE.CONNSTRING} | Oracle URI or a session name. |
tcp://localhost:1521 |
{$ORACLE.SERVICE} | Oracle Service Name. |
ORA |
{$ORACLE.DBNAME.MATCHES} | Used in database discovery. It can be overridden on the host or linked template level. |
.* |
{$ORACLE.DBNAME.NOT_MATCHES} | Used in database discovery. It can be overridden on the host or linked template level. |
PDB\$SEED |
{$ORACLE.TABLESPACE.NAME.MATCHES} | Used in tablespace discovery. It can be overridden on the host or linked template level. |
.* |
{$ORACLE.TABLESPACE.NAME.NOT_MATCHES} | Used in tablespace discovery. It can be overridden on the host or linked template level. |
CHANGE_IF_NEEDED |
{$ORACLE.TBS.USED.PCT.FROM.MAX.WARN} | Warning severity alert threshold for the maximum percentage of tablespace usage from maximum tablespace size (used bytes/max bytes) for the Warning trigger expression. |
90 |
{$ORACLE.TBS.USED.PCT.FROM.MAX.HIGH} | High severity alert threshold for the maximum percentage of tablespace usage (used bytes/max bytes) for the High trigger expression. |
95 |
{$ORACLE.TBS.USED.PCT.MAX.WARN} | Warning severity alert threshold for the maximum percentage of tablespace usage (used bytes/allocated bytes) for the Warning trigger expression. |
90 |
{$ORACLE.TBS.USED.PCT.MAX.HIGH} | High severity alert threshold for the maximum percentage of tablespace usage (used bytes/allocated bytes) for the High trigger expression. |
95 |
{$ORACLE.TBS.UTIL.PCT.MAX.WARN} | Warning severity alert threshold for the maximum percentage of tablespace utilization (allocated bytes/max bytes) for the Warning trigger expression. |
80 |
{$ORACLE.TBS.UTIL.PCT.MAX.HIGH} | High severity alert threshold for the maximum percentage of tablespace utilization (allocated bytes/max bytes) for the High trigger expression. |
90 |
{$ORACLE.PROCESSES.MAX.WARN} | Alert threshold for the maximum percentage of active processes for the Warning trigger expression. |
80 |
{$ORACLE.SESSIONS.MAX.WARN} | Alert threshold for the maximum percentage of active sessions for the Warning trigger expression. |
80 |
{$ORACLE.DB.FILE.MAX.WARN} | The maximum percentage of used database files for the Warning trigger expression. |
80 |
{$ORACLE.PGA.USE.MAX.WARN} | Alert threshold for the maximum percentage of the Program Global Area (PGA) usage for the Warning trigger expression. |
90 |
{$ORACLE.SESSIONS.LOCK.MAX.WARN} | Alert threshold for the maximum percentage of locked sessions for the Warning trigger expression. |
20 |
{$ORACLE.SESSION.LOCK.MAX.TIME} | The maximum duration of the session lock in seconds to count the session as a prolongedly locked query. |
600 |
{$ORACLE.SESSION.LONG.LOCK.MAX.WARN} | Alert threshold for the maximum number of the prolongedly locked sessions for the Warning trigger expression. |
3 |
{$ORACLE.CONCURRENCY.MAX.WARN} | The maximum percentage of session concurrency for the Warning trigger expression. |
80 |
{$ORACLE.REDO.MIN.WARN} | Alert threshold for the minimum number of redo logs for the Warning trigger expression. |
3 |
{$ORACLE.SHARED.FREE.MIN.WARN} | Alert threshold for the minimum percentage of free shared pool for the Warning trigger expression. |
5 |
{$ORACLE.EXPIRE.PASSWORD.MIN.WARN} | The number of days before the password expires for the Warning trigger expression. |
7 |
{$ORACLE.ASM.USED.PCT.MAX.WARN} | The maximum percentage of used space in the Automatic Storage Management (ASM) disk group for the Warning trigger expression. |
90 |
{$ORACLE.ASM.USED.PCT.MAX.HIGH} | The maximum percentage of used space in the Automatic Storage Management (ASM) disk group for the High trigger expression. |
95 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Oracle: Ping | Tests the connection to the Oracle Database instance. |
Zabbix agent | oracle.ping["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}"] Preprocessing
|
Oracle: Get instance state | Gets the state of the current instance. |
Zabbix agent | oracle.instance.info["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}"] |
Oracle: Version | The Oracle Server version. |
Dependent item | oracle.version Preprocessing
|
Oracle: Uptime | The Oracle instance uptime expressed in seconds. |
Dependent item | oracle.uptime Preprocessing
|
Oracle: Instance status | The status of the instance. |
Dependent item | oracle.instance_status Preprocessing
|
Oracle: Archiver state | The status of automatic archiving. |
Dependent item | oracle.archiver_state Preprocessing
|
Oracle: Instance name | The name of the instance. |
Dependent item | oracle.instance_name Preprocessing
|
Oracle: Instance hostname | The name of the host machine. |
Dependent item | oracle.instance_hostname Preprocessing
|
Oracle: Instance role | Indicates whether the instance is an active instance or an inactive secondary instance. |
Dependent item | oracle.instance.role Preprocessing
|
Oracle: Get system metrics | Gets the values of the system metrics. |
Zabbix agent | oracle.sys.metrics["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}"] |
Oracle: Buffer cache hit ratio | The ratio of buffer cache hits ((LogRead - PhyRead)/LogRead). |
Dependent item | oracle.buffer_cache_hit_ratio Preprocessing
|
Oracle: Cursor cache hit ratio | The ratio of cursor cache hits (CursorCacheHit/SoftParse). |
Dependent item | oracle.cursor_cache_hit_ratio Preprocessing
|
Oracle: Library cache hit ratio | The ratio of library cache hits (Hits/Pins). |
Dependent item | oracle.library_cache_hit_ratio Preprocessing
|
Oracle: Shared pool free % | Free memory of a shared pool expressed in %. |
Dependent item | oracle.shared_pool_free Preprocessing
|
Oracle: Physical reads per second | Reads per second. |
Dependent item | oracle.physical_reads_rate Preprocessing
|
Oracle: Physical writes per second | Writes per second. |
Dependent item | oracle.physical_writes_rate Preprocessing
|
Oracle: Physical reads bytes per second | Read bytes per second. |
Dependent item | oracle.physical_read_bytes_rate Preprocessing
|
Oracle: Physical writes bytes per second | Write bytes per second. |
Dependent item | oracle.physical_write_bytes_rate Preprocessing
|
Oracle: Enqueue timeouts per second | Enqueue timeouts per second. |
Dependent item | oracle.enqueue_timeouts_rate Preprocessing
|
Oracle: GC CR block received per second | The global cache (GC) and the consistent read (CR) block received per second. |
Dependent item | oracle.gc_cr_block_received_rate Preprocessing
|
Oracle: Global cache blocks corrupted | The number of blocks that encountered corruption or checksum failure during the interconnect. |
Dependent item | oracle.cache_blocks_corrupt Preprocessing
|
Oracle: Global cache blocks lost | The number of lost global cache blocks. |
Dependent item | oracle.cache_blocks_lost Preprocessing
|
Oracle: Logons per second | The number of logon attempts per second. |
Dependent item | oracle.logons_rate Preprocessing
|
Oracle: Average active sessions | The average number of active sessions at a point in time that are either working or waiting. |
Dependent item | oracle.active_sessions Preprocessing
|
Oracle: Active serial sessions | The number of active serial sessions. |
Dependent item | oracle.active_serial_sessions Preprocessing
|
Oracle: Active parallel sessions | The number of active parallel sessions. |
Dependent item | oracle.active_parallel_sessions Preprocessing
|
Oracle: Long table scans per second | The number of long table scans per second. A table is considered long if it is not cached and if its high water mark is greater than five blocks. |
Dependent item | oracle.long_table_scans_rate Preprocessing
|
Oracle: SQL service response time | The Structured Query Language (SQL) service response time expressed in seconds. |
Dependent item | oracle.service_response_time Preprocessing
|
Oracle: User rollbacks per second | The number of times per second that users manually issued the ROLLBACK statement or an error occurred during their transactions. |
Dependent item | oracle.user_rollbacks_rate Preprocessing
|
Oracle: Total sorts per user call | The total sorts per user call. |
Dependent item | oracle.sorts_per_user_call Preprocessing
|
Oracle: Rows per sort | The average number of rows per sort for all types of sorts performed. |
Dependent item | oracle.rows_per_sort Preprocessing
|
Oracle: Disk sort per second | The number of sorts going to disk per second. |
Dependent item | oracle.disk_sorts Preprocessing
|
Oracle: Memory sorts ratio | The percentage of sorts (from ORDER BY clauses or index building) that are done in memory rather than on disk. |
Dependent item | oracle.memory_sorts_ratio Preprocessing
|
Oracle: Database wait time ratio | Wait time - the time that the server process spends waiting for available shared resources to be released by other server processes such as latches, locks, data buffers, etc. |
Dependent item | oracle.database_wait_time_ratio Preprocessing
|
Oracle: Database CPU time ratio | The ratio calculated by dividing the total CPU (used by the database) by the Oracle time model statistic DB time. |
Dependent item | oracle.database_cpu_time_ratio Preprocessing
|
Oracle: Temp space used | Used temporary space. |
Dependent item | oracle.temp_space_used Preprocessing
|
Oracle: Get system parameters | Get a set of system parameter values. |
Zabbix agent | oracle.sys.params["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}"] |
Oracle: Sessions limit | The user and system sessions. |
Dependent item | oracle.session_limit Preprocessing
|
Oracle: Datafiles limit | The maximum allowable number of datafiles. |
Dependent item | oracle.db_files_limit Preprocessing
|
Oracle: Processes limit | The maximum number of user processes. |
Dependent item | oracle.processes_limit Preprocessing
|
Oracle: Get sessions stats | Get sessions statistics. {$ORACLE.SESSION.LOCK.MAX.TIME} -- maximum seconds in the current wait condition for counting long time locked sessions. Default: 600 seconds. |
Zabbix agent | oracle.sessions.stats["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}","{$ORACLE.SESSION.LOCK.MAX.TIME}"] |
Oracle: Session count | The session count. |
Dependent item | oracle.session_count Preprocessing
|
Oracle: Active user sessions | The number of active user sessions. |
Dependent item | oracle.session_active_user Preprocessing
|
Oracle: Active background sessions | The number of active background sessions. |
Dependent item | oracle.session_active_background Preprocessing
|
Oracle: Inactive user sessions | The number of inactive user sessions. |
Dependent item | oracle.session_inactive_user Preprocessing
|
Oracle: Sessions lock rate | The percentage of locked sessions. Locks are mechanisms that prevent destructive interaction between transactions accessing the same resource - either user objects, such as tables and rows or system objects not visible to users, such as shared data structures in memory and data dictionary rows. |
Dependent item | oracle.session_lock_rate Preprocessing
|
Oracle: Sessions locked over {$ORACLE.SESSION.LOCK.MAX.TIME}s | The count of the prolongedly locked sessions. (You can change the duration of the maximum session lock in seconds for a query using the {$ORACLE.SESSION.LOCK.MAX.TIME} macro.) |
Dependent item | oracle.session_long_time_locked Preprocessing
|
Oracle: Sessions concurrency | The percentage of concurrency. Concurrency is a database behavior when different transactions request to change the same resource. In the case of modifying data transactions, it sequentially temporarily blocks the right to change the data, and the rest of the transactions wait for access. When the access to a resource is locked for a long time, the concurrency grows (like the transaction queue), often leaving an extremely negative impact on performance. A high contention value does not indicate the root cause of the problem, but is a signal to search for it. |
Dependent item | oracle.session_concurrency_rate Preprocessing
|
Oracle: Get PGA stats | Get PGA statistics. |
Zabbix agent | oracle.pga.stats["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}"] |
Oracle: PGA, Total inuse | The amount of Program Global Area (PGA) memory currently consumed by work areas. This number can be used to determine how much memory is consumed by other consumers of the PGA memory (for example, PL/SQL or Java). |
Dependent item | oracle.total_pga_used Preprocessing
|
Oracle: PGA, Aggregate target parameter | The current value of the PGA_AGGREGATE_TARGET initialization parameter. |
Dependent item | oracle.pga_target Preprocessing
|
Oracle: PGA, Total allocated | The current amount of the PGA memory allocated by the instance. The Oracle Database attempts to keep this number below the value of the PGA_AGGREGATE_TARGET parameter. |
Dependent item | oracle.total_pga_allocated Preprocessing
|
Oracle: PGA, Total freeable | The number of bytes of the PGA memory in all processes that could be freed back to the OS. |
Dependent item | oracle.total_pga_freeable Preprocessing
|
Oracle: PGA, Global memory bound | The maximum size of a work area executed in automatic mode. |
Dependent item | oracle.pga_global_bound Preprocessing
|
Oracle: Get FRA stats | Get FRA statistics. |
Zabbix agent | oracle.fra.stats["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}"] |
Oracle: FRA, Space limit | The maximum amount of disk space (in bytes) that the database can use for the Fast Recovery Area (FRA). |
Dependent item | oracle.fra_space_limit Preprocessing
|
Oracle: FRA, Used space | The amount of disk space (in bytes) used by FRA files created in the current and all the previous FRAs. |
Dependent item | oracle.fra_space_used Preprocessing
|
Oracle: FRA, Space reclaimable | The total amount of disk space (in bytes) that can be freed by deleting obsolete, redundant, and other low-priority files from the FRA. |
Dependent item | oracle.fra_space_reclaimable Preprocessing
|
Oracle: FRA, Number of files | The number of files in the FRA. |
Dependent item | oracle.fra_number_of_files Preprocessing
|
Oracle: FRA, Usable space in % | Percentage of space usable in the FRA. |
Dependent item | oracle.fra_usable_pct Preprocessing
|
Oracle: FRA, Number of restore points | Number of restore points in the FRA. |
Dependent item | oracle.fra_restore_point Preprocessing
|
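"Usable space in %" ties the three FRA byte counters above together. A sketch of the usual relation (an assumption for illustration — the template reads the percentage directly from the database rather than computing it):

```python
def fra_usable_pct(space_limit, space_used, space_reclaimable):
    """Assumed relation: usable space is the free headroom plus whatever
    could be reclaimed from obsolete/redundant files, as a % of the limit."""
    usable = space_limit - space_used + space_reclaimable
    return usable * 100 / space_limit

# e.g. a 100 GiB limit with 80 GiB used, 30 GiB of it reclaimable -> 50% usable
```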
Oracle: Get SGA stats | Get SGA statistics. |
Zabbix agent | oracle.sga.stats["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}"] |
Oracle: SGA, java pool | The memory allocated from the Java pool. |
Dependent item | oracle.sga_java_pool Preprocessing
|
Oracle: SGA, large pool | The memory allocated from the large pool. |
Dependent item | oracle.sga_large_pool Preprocessing
|
Oracle: SGA, shared pool | The memory allocated from the shared pool. |
Dependent item | oracle.sga_shared_pool Preprocessing
|
Oracle: SGA, log buffer | The number of bytes allocated for the redo log buffer. |
Dependent item | oracle.sga_log_buffer Preprocessing
|
Oracle: SGA, fixed | The fixed System Global Area (SGA) is an internal housekeeping area. |
Dependent item | oracle.sga_fixed Preprocessing
|
Oracle: SGA, buffer cache | The size of standard block cache. |
Dependent item | oracle.sga_buffer_cache Preprocessing
|
Oracle: User's expire password | The number of days before the Zabbix account password expires. |
Zabbix agent | oracle.user.info["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}"] Preprocessing
|
Oracle: Redo logs available to switch | The number of inactive/unused redo logs available for log switching. |
Zabbix agent | oracle.redolog.info["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}"] Preprocessing
|
Oracle: Number of processes | The current number of user processes. |
Zabbix agent | oracle.proc.stats["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}"] Preprocessing
|
Oracle: Datafiles count | The current number of datafiles. |
Zabbix agent | oracle.datafiles.stats["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Oracle: Connection to database is unavailable | Connection to Oracle Database is currently unavailable. |
last(/Oracle by Zabbix agent 2/oracle.ping["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}"])=0 |Disaster |
||
Oracle: Version has changed | The Oracle Database version has changed. Acknowledge to close the problem manually. |
last(/Oracle by Zabbix agent 2/oracle.version,#1)<>last(/Oracle by Zabbix agent 2/oracle.version,#2) and length(last(/Oracle by Zabbix agent 2/oracle.version))>0 |Info |
Manual close: Yes | |
Oracle: Failed to fetch info data | Zabbix has not received any data for the items for the last 30 minutes. The database might be unavailable for connecting. |
nodata(/Oracle by Zabbix agent 2/oracle.uptime,30m)=1 |Info |
||
Oracle: Host has been restarted | Uptime is less than 10 minutes. |
last(/Oracle by Zabbix agent 2/oracle.uptime)<10m |Info |
Manual close: Yes | |
Oracle: Instance name has changed | An Oracle Database instance name has changed. Acknowledge to close the problem manually. |
last(/Oracle by Zabbix agent 2/oracle.instance_name,#1)<>last(/Oracle by Zabbix agent 2/oracle.instance_name,#2) and length(last(/Oracle by Zabbix agent 2/oracle.instance_name))>0 |Info |
Manual close: Yes | |
Oracle: Instance hostname has changed | An Oracle Database instance hostname has changed. Acknowledge to close the problem manually. |
last(/Oracle by Zabbix agent 2/oracle.instance_hostname,#1)<>last(/Oracle by Zabbix agent 2/oracle.instance_hostname,#2) and length(last(/Oracle by Zabbix agent 2/oracle.instance_hostname))>0 |Info |
Manual close: Yes | |
Oracle: Shared pool free is too low | The free memory percent of the shared pool has been less than {$ORACLE.SHARED.FREE.MIN.WARN}% for the last 5 minutes. |
max(/Oracle by Zabbix agent 2/oracle.shared_pool_free,5m)<{$ORACLE.SHARED.FREE.MIN.WARN} |Warning |
||
Oracle: Too many active sessions | Active sessions are using more than {$ORACLE.SESSIONS.MAX.WARN}% of the available sessions. |
min(/Oracle by Zabbix agent 2/oracle.session_count,5m) * 100 / last(/Oracle by Zabbix agent 2/oracle.session_limit) > {$ORACLE.SESSIONS.MAX.WARN} |Warning |
||
Oracle: Too many locked sessions | The number of locked sessions exceeds {$ORACLE.SESSIONS.LOCK.MAX.WARN}% of the running sessions. |
min(/Oracle by Zabbix agent 2/oracle.session_lock_rate,5m) > {$ORACLE.SESSIONS.LOCK.MAX.WARN} |Warning |
||
Oracle: Too many sessions locked | The number of sessions locked for longer than {$ORACLE.SESSION.LOCK.MAX.TIME} seconds exceeds {$ORACLE.SESSION.LONG.LOCK.MAX.WARN}. |
min(/Oracle by Zabbix agent 2/oracle.session_long_time_locked,5m) > {$ORACLE.SESSION.LONG.LOCK.MAX.WARN} |Warning |
||
Oracle: Too high database concurrency | The concurrency rate exceeds {$ORACLE.CONCURRENCY.MAX.WARN}%. |
min(/Oracle by Zabbix agent 2/oracle.session_concurrency_rate,5m) > {$ORACLE.CONCURRENCY.MAX.WARN} |Warning |
||
Oracle: Total PGA inuse is too high | The total PGA in use is more than {$ORACLE.PGA.USE.MAX.WARN}% of PGA_AGGREGATE_TARGET. |
min(/Oracle by Zabbix agent 2/oracle.total_pga_used,5m) * 100 / last(/Oracle by Zabbix agent 2/oracle.pga_target) > {$ORACLE.PGA.USE.MAX.WARN} |Warning |
||
Oracle: Zabbix account will expire soon | The password for the Zabbix user in the database expires soon. |
last(/Oracle by Zabbix agent 2/oracle.user.info["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}"]) < {$ORACLE.EXPIRE.PASSWORD.MIN.WARN} |Warning |
||
Oracle: Number of REDO logs available for switching is too low | The number of inactive/unused redos available for log switching is low (risk of database downtime). |
max(/Oracle by Zabbix agent 2/oracle.redolog.info["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}"],5m) < {$ORACLE.REDO.MIN.WARN} |Warning |
||
Oracle: Too many active processes | Active processes are using more than {$ORACLE.PROCESSES.MAX.WARN}% of the available number of processes. |
min(/Oracle by Zabbix agent 2/oracle.proc.stats["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}"],5m) * 100 / last(/Oracle by Zabbix agent 2/oracle.processes_limit) > {$ORACLE.PROCESSES.MAX.WARN} |Warning |
||
Oracle: Too many database files | The number of datafiles exceeds {$ORACLE.DB.FILE.MAX.WARN}% of the available datafile limit. |
min(/Oracle by Zabbix agent 2/oracle.datafiles.stats["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}"],5m) * 100 / last(/Oracle by Zabbix agent 2/oracle.db_files_limit) > {$ORACLE.DB.FILE.MAX.WARN} |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Database discovery | Scanning databases in the database management system (DBMS). |
Zabbix agent | oracle.db.discovery["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Oracle Database '{#DBNAME}': Get CDB and No-CDB info | Gets the information about the CDB and non-CDB database on an instance. |
Zabbix agent | oracle.cdb.info["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}","{#DBNAME}"] |
Oracle Database '{#DBNAME}': Open status | 1 - MOUNTED; 2 - READ WRITE; 3 - READ ONLY; 4 - READ ONLY WITH APPLY (a physical standby database is open in real-time query mode). |
Dependent item | oracle.db_open_mode["{#DBNAME}"] Preprocessing
|
Oracle Database '{#DBNAME}': Role | The current role of the database where: 1 - SNAPSHOT STANDBY; 2 - LOGICAL STANDBY; 3 - PHYSICAL STANDBY; 4 - PRIMARY; 5 - FAR SYNC. |
Dependent item | oracle.db_role["{#DBNAME}"] Preprocessing
|
Oracle Database '{#DBNAME}': Log mode | The archive log mode where: 0 - NOARCHIVELOG; 1 - ARCHIVELOG; 2 - MANUAL. |
Dependent item | oracle.db_log_mode["{#DBNAME}"] Preprocessing
|
Oracle Database '{#DBNAME}': Force logging | Indicates whether the database is under force logging mode (YES or NO). |
Dependent item | oracle.db_force_logging["{#DBNAME}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Oracle Database '{#DBNAME}': Open status in mount mode | The Oracle Database is in a mounted state. |
last(/Oracle by Zabbix agent 2/oracle.db_open_mode["{#DBNAME}"])=1 |Warning |
||
Oracle Database '{#DBNAME}': Open status has changed | The Oracle Database open status has changed. Acknowledge to close the problem manually. |
last(/Oracle by Zabbix agent 2/oracle.db_open_mode["{#DBNAME}"],#1)<>last(/Oracle by Zabbix agent 2/oracle.db_open_mode["{#DBNAME}"],#2) |Info |
Manual close: Yes Depends on:
|
|
Oracle Database '{#DBNAME}': Role has changed | The Oracle Database role has changed. Acknowledge to close the problem manually. |
last(/Oracle by Zabbix agent 2/oracle.db_role["{#DBNAME}"],#1)<>last(/Oracle by Zabbix agent 2/oracle.db_role["{#DBNAME}"],#2) |Info |
Manual close: Yes | |
Oracle Database '{#DBNAME}': Force logging is deactivated for DB with active Archivelog | Force logging mode is a very important setting for databases in ARCHIVELOG mode, as it ensures that all changes in the database are written to the redo logs. |
last(/Oracle by Zabbix agent 2/oracle.db_force_logging["{#DBNAME}"]) = 0 and last(/Oracle by Zabbix agent 2/oracle.db_log_mode["{#DBNAME}"]) = 1 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
PDB discovery | Scanning a pluggable database (PDB) in DBMS. |
Zabbix agent | oracle.pdb.discovery["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Oracle Database '{#DBNAME}': Get PDB info | Gets the information about the PDB database on an instance. |
Zabbix agent | oracle.pdb.info["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}","{#DBNAME}"] |
Oracle Database '{#DBNAME}': Open status | 1 - MOUNTED; 2 - READ WRITE; 3 - READ ONLY; 4 - READ ONLY WITH APPLY (a physical standby database is open in real-time query mode). |
Dependent item | oracle.pdb_open_mode["{#DBNAME}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Oracle Database '{#DBNAME}': Open status in mount mode | The Oracle Database is in a mounted state. |
last(/Oracle by Zabbix agent 2/oracle.pdb_open_mode["{#DBNAME}"])=1 |Warning |
||
Oracle Database '{#DBNAME}': Open status has changed | The Oracle Database open status has changed. Acknowledge to close the problem manually. |
last(/Oracle by Zabbix agent 2/oracle.pdb_open_mode["{#DBNAME}"],#1)<>last(/Oracle by Zabbix agent 2/oracle.pdb_open_mode["{#DBNAME}"],#2) |Info |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Tablespace discovery | Scanning tablespaces in DBMS. |
Zabbix agent | oracle.ts.discovery["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Oracle TBS '{#TABLESPACE}': Get tablespaces stats | Gets the statistics of the tablespace. |
Zabbix agent | oracle.ts.stats["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}","{#TABLESPACE}","{#CONTENTS}"] |
Oracle TBS '{#TABLESPACE}': Tablespace allocated, bytes | Currently allocated bytes for the tablespace (sum of the current size of datafiles). |
Dependent item | oracle.tbs_alloc_bytes["{#TABLESPACE}"] Preprocessing
|
Oracle TBS '{#TABLESPACE}': Tablespace MAX size, bytes | The maximum size of the tablespace. |
Dependent item | oracle.tbs_max_bytes["{#TABLESPACE}"] Preprocessing
|
Oracle TBS '{#TABLESPACE}': Tablespace used, bytes | Currently used bytes for the tablespace (current size of datafiles minus the free space). |
Dependent item | oracle.tbs_used_bytes["{#TABLESPACE}"] Preprocessing
|
Oracle TBS '{#TABLESPACE}': Tablespace free, bytes | Free bytes of the allocated space. |
Dependent item | oracle.tbs_free_bytes["{#TABLESPACE}"] Preprocessing
|
Oracle TBS '{#TABLESPACE}': Tablespace usage, percent | Used bytes/allocated bytes*100. |
Dependent item | oracle.tbs_used_file_pct["{#TABLESPACE}"] Preprocessing
|
Oracle TBS '{#TABLESPACE}': Tablespace allocated, percent | Allocated bytes/max bytes*100. |
Dependent item | oracle.tbs_used_pct["{#TABLESPACE}"] Preprocessing
|
Oracle TBS '{#TABLESPACE}': Tablespace usage from MAX, percent | Used bytes/max bytes*100. |
Dependent item | oracle.tbs_used_from_max_pct["{#TABLESPACE}"] Preprocessing
|
Oracle TBS '{#TABLESPACE}': Open status | The tablespace status where: 1 - ONLINE; 2 - OFFLINE; 3 - READ ONLY. |
Dependent item | oracle.tbs_status["{#TABLESPACE}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Oracle TBS '{#TABLESPACE}': Tablespace usage is too high | The usage of the tablespace exceeds {$ORACLE.TBS.USED.PCT.MAX.WARN}%. |
min(/Oracle by Zabbix agent 2/oracle.tbs_used_file_pct["{#TABLESPACE}"],5m)>{$ORACLE.TBS.USED.PCT.MAX.WARN} |Warning |
Depends on:
|
|
Oracle TBS '{#TABLESPACE}': Tablespace usage is too high | The usage of the tablespace exceeds {$ORACLE.TBS.USED.PCT.MAX.HIGH}%. |
min(/Oracle by Zabbix agent 2/oracle.tbs_used_file_pct["{#TABLESPACE}"],5m)>{$ORACLE.TBS.USED.PCT.MAX.HIGH} |High |
||
Oracle TBS '{#TABLESPACE}': Tablespace utilization is too high | The utilization of the tablespace exceeds {$ORACLE.TBS.UTIL.PCT.MAX.WARN}%. |
min(/Oracle by Zabbix agent 2/oracle.tbs_used_pct["{#TABLESPACE}"],5m)>{$ORACLE.TBS.UTIL.PCT.MAX.WARN} |Warning |
Depends on:
|
|
Oracle TBS '{#TABLESPACE}': Tablespace utilization is too high | The utilization of the tablespace exceeds {$ORACLE.TBS.UTIL.PCT.MAX.HIGH}%. |
min(/Oracle by Zabbix agent 2/oracle.tbs_used_pct["{#TABLESPACE}"],5m)>{$ORACLE.TBS.UTIL.PCT.MAX.HIGH} |High |
||
Oracle TBS '{#TABLESPACE}': Tablespace usage from MAX is too high | The usage of the tablespace relative to its maximum size exceeds {$ORACLE.TBS.USED.PCT.FROM.MAX.WARN}%. |
min(/Oracle by Zabbix agent 2/oracle.tbs_used_from_max_pct["{#TABLESPACE}"],5m)>{$ORACLE.TBS.USED.PCT.FROM.MAX.WARN} |Warning |
Depends on:
|
|
Oracle TBS '{#TABLESPACE}': Tablespace utilization from MAX is too high | The usage of the tablespace relative to its maximum size exceeds {$ORACLE.TBS.USED.PCT.FROM.MAX.HIGH}%. |
min(/Oracle by Zabbix agent 2/oracle.tbs_used_from_max_pct["{#TABLESPACE}"],5m)>{$ORACLE.TBS.USED.PCT.FROM.MAX.HIGH} |High |
||
Oracle TBS '{#TABLESPACE}': Tablespace is OFFLINE | The tablespace is in the offline state. |
last(/Oracle by Zabbix agent 2/oracle.tbs_status["{#TABLESPACE}"])=2 |Warning |
||
Oracle TBS '{#TABLESPACE}': Tablespace status has changed | Oracle tablespace status has changed. Acknowledge to close the problem manually. |
last(/Oracle by Zabbix agent 2/oracle.tbs_status["{#TABLESPACE}"],#1)<>last(/Oracle by Zabbix agent 2/oracle.tbs_status["{#TABLESPACE}"],#2) |Info |
Manual close: Yes Depends on:
|
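The three percentage items above are simple ratios of the byte items. A minimal sketch of how they relate, with illustrative values (not values produced by the template):

```python
# Relationships between the tablespace items above (illustrative values).
allocated = 16 * 1024**3  # bytes currently allocated (sum of datafile sizes)
max_size = 32 * 1024**3   # tablespace MAX size, bytes
used = 8 * 1024**3        # allocated size minus free space, bytes

usage_pct = used / allocated * 100           # oracle.tbs_used_file_pct
allocated_pct = allocated / max_size * 100   # oracle.tbs_used_pct
usage_from_max_pct = used / max_size * 100   # oracle.tbs_used_from_max_pct

print(usage_pct, allocated_pct, usage_from_max_pct)  # 50.0 50.0 25.0
```

This is why "usage from MAX" is always at or below both other percentages: it uses the largest denominator.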
Name | Description | Type | Key and additional info |
---|---|---|---|
Archive log discovery | Destinations of the log archive. |
Zabbix agent | oracle.archive.discovery["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Archivelog '{#DEST_NAME}': Get archive log info | Gets the archive log statistics. |
Zabbix agent | oracle.archive.info["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}","{#DEST_NAME}"] |
Archivelog '{#DEST_NAME}': Error | Displays the error message. |
Dependent item | oracle.archivelog_error["{#DEST_NAME}"] Preprocessing
|
Archivelog '{#DEST_NAME}': Last sequence | Identifies the sequence number of the last redo log to be archived. |
Dependent item | oracle.archivelog_log_sequence["{#DEST_NAME}"] Preprocessing
|
Archivelog '{#DEST_NAME}': Status | Identifies the current status of the destination where: 1 - VALID; 2 - DEFERRED; 3 - ERROR; 0 - UNKNOWN. |
Dependent item | oracle.archivelog_log_status["{#DEST_NAME}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Archivelog '{#DEST_NAME}': Log Archive is not valid | The trigger fires if the archive log destination is not in a valid state. |
last(/Oracle by Zabbix agent 2/oracle.archivelog_log_status["{#DEST_NAME}"])<2 |High |
Name | Description | Type | Key and additional info |
---|---|---|---|
ASM disk groups discovery | The ASM disk groups. |
Zabbix agent | oracle.diskgroups.discovery["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
ASM '{#DGNAME}': Get ASM stats | Gets the ASM disk group statistics. |
Zabbix agent | oracle.diskgroups.stats["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}","{#DGNAME}"] |
ASM '{#DGNAME}': Total size | The total size of the ASM disk group. |
Dependent item | oracle.asm_total_size["{#DGNAME}"] Preprocessing
|
ASM '{#DGNAME}': Free size | The free size of the ASM disk group. |
Dependent item | oracle.asm_free_size["{#DGNAME}"] Preprocessing
|
ASM '{#DGNAME}': Used size, percent | Usage of the ASM disk group expressed in %. |
Dependent item | oracle.asm_used_pct["{#DGNAME}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
ASM '{#DGNAME}': Disk group usage is too high | The usage of the ASM disk group expressed in % exceeds {$ORACLE.ASM.USED.PCT.MAX.WARN}. |
min(/Oracle by Zabbix agent 2/oracle.asm_used_pct["{#DGNAME}"],5m)>{$ORACLE.ASM.USED.PCT.MAX.WARN} |Warning |
Depends on:
|
|
ASM '{#DGNAME}': Disk group usage is too high | The usage of the ASM disk group expressed in % exceeds {$ORACLE.ASM.USED.PCT.MAX.HIGH}. |
min(/Oracle by Zabbix agent 2/oracle.asm_used_pct["{#DGNAME}"],5m)>{$ORACLE.ASM.USED.PCT.MAX.HIGH} |High |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for the effortless deployment of MySQL monitoring by Zabbix via ODBC and doesn't require any external scripts.
Zabbix version: 6.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Create a MySQL user for monitoring (<password> at your discretion):

CREATE USER 'zbx_monitor'@'%' IDENTIFIED BY '<password>';
GRANT REPLICATION CLIENT,PROCESS,SHOW DATABASES,SHOW VIEW ON *.* TO 'zbx_monitor'@'%';
For more information, please see MySQL documentation https://dev.mysql.com/doc/refman/8.0/en/grant.html
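If needed, you can verify that the monitoring user ended up with the expected privileges. This is a routine check, not part of the template itself:

```sql
-- Verification step: list the privileges granted to the
-- monitoring user created above.
SHOW GRANTS FOR 'zbx_monitor'@'%';
```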
Set the user name and password in the host macros {$MYSQL.USER} and {$MYSQL.PASSWORD}.

Name | Description | Default |
---|---|---|
{$MYSQL.ABORTED_CONN.MAX.WARN} | Number of failed attempts to connect to the MySQL server for trigger expressions. |
3 |
{$MYSQL.REPL_LAG.MAX.WARN} | Amount of time the slave is behind the master for trigger expressions. |
30m |
{$MYSQL.SLOW_QUERIES.MAX.WARN} | Number of slow queries for trigger expressions. |
3 |
{$MYSQL.BUFF_UTIL.MIN.WARN} | The minimum buffer pool utilization in percentage for trigger expressions. |
50 |
{$MYSQL.DSN} | System data source name. |
<Put your DSN here> |
{$MYSQL.USER} | MySQL username. |
<Put your username here> |
{$MYSQL.PASSWORD} | MySQL user password. |
<Put your password here> |
{$MYSQL.CREATED_TMP_TABLES.MAX.WARN} | The maximum number of temporary tables created in memory per second for trigger expressions. |
30 |
{$MYSQL.CREATED_TMP_DISK_TABLES.MAX.WARN} | The maximum number of temporary tables created on a disk per second for trigger expressions. |
10 |
{$MYSQL.CREATED_TMP_FILES.MAX.WARN} | The maximum number of temporary files created on a disk per second for trigger expressions. |
10 |
{$MYSQL.INNODB_LOG_FILES} | Number of physical files in the InnoDB redo log, used for calculating innodb_log_file_size. |
2 |
{$MYSQL.DBNAME.MATCHES} | Filter of discoverable databases. |
.+ |
{$MYSQL.DBNAME.NOT_MATCHES} | Filter to exclude discovered databases. |
information_schema |
Name | Description | Type | Key and additional info |
---|---|---|---|
MySQL: Get status variables | Gets server global status information. |
Database monitor | db.odbc.get[get_status_variables,"{$MYSQL.DSN}"] |
MySQL: Get database | Used for scanning databases in DBMS. |
Database monitor | db.odbc.get[get_database,"{$MYSQL.DSN}"] |
MySQL: Get replication | Gets replication status information. |
Database monitor | db.odbc.get[get_replication,"{$MYSQL.DSN}"] |
MySQL: Status | MySQL server status. |
Database monitor | db.odbc.select[ping,"{$MYSQL.DSN}"] Preprocessing
|
MySQL: Version | MySQL server version. |
Database monitor | db.odbc.select[version,"{$MYSQL.DSN}"] Preprocessing
|
MySQL: Uptime | Number of seconds that the server has been up. |
Dependent item | mysql.uptime Preprocessing
|
MySQL: Aborted clients per second | Number of connections that were aborted because the client died without closing the connection properly. |
Dependent item | mysql.aborted_clients.rate Preprocessing
|
MySQL: Aborted connections per second | Number of failed attempts to connect to the MySQL server. |
Dependent item | mysql.aborted_connects.rate Preprocessing
|
MySQL: Connection errors accept per second | Number of errors that occurred during calls to accept() on the listening port. |
Dependent item | mysql.connection_errors_accept.rate Preprocessing
|
MySQL: Connection errors internal per second | Number of refused connections due to internal server errors, for example, out of memory errors, or failed thread starts. |
Dependent item | mysql.connection_errors_internal.rate Preprocessing
|
MySQL: Connection errors max connections per second | Number of refused connections due to the max_connections limit being reached. |
Dependent item | mysql.connection_errors_max_connections.rate Preprocessing
|
MySQL: Connection errors peer address per second | Number of errors while searching for the connecting client's IP address. |
Dependent item | mysql.connection_errors_peer_address.rate Preprocessing
|
MySQL: Connection errors select per second | Number of errors during calls to select() or poll() on the listening port. |
Dependent item | mysql.connection_errors_select.rate Preprocessing
|
MySQL: Connection errors tcpwrap per second | Number of connections the libwrap library has refused. |
Dependent item | mysql.connection_errors_tcpwrap.rate Preprocessing
|
MySQL: Connections per second | Number of connection attempts (successful or not) to the MySQL server. |
Dependent item | mysql.connections.rate Preprocessing
|
MySQL: Max used connections | The maximum number of connections that have been in use simultaneously since the server start. |
Dependent item | mysql.max_used_connections Preprocessing
|
MySQL: Threads cached | Number of threads in the thread cache. |
Dependent item | mysql.threads_cached Preprocessing
|
MySQL: Threads connected | Number of currently open connections. |
Dependent item | mysql.threads_connected Preprocessing
|
MySQL: Threads created per second | Number of threads created to handle connections. If the value of Threads_created is large, you may want to increase the thread_cache_size value. |
Dependent item | mysql.threads_created.rate Preprocessing
|
MySQL: Threads running | Number of threads that are not sleeping. |
Dependent item | mysql.threads_running Preprocessing
|
MySQL: Buffer pool efficiency | The item shows how effectively the buffer pool is serving reads. |
Calculated | mysql.buffer_pool_efficiency |
MySQL: Buffer pool utilization | Ratio of used to total pages in the buffer pool. |
Calculated | mysql.buffer_pool_utilization |
MySQL: Created tmp files on disk per second | How many temporary files mysqld has created. |
Dependent item | mysql.created_tmp_files.rate Preprocessing
|
MySQL: Created tmp tables on disk per second | Number of internal on-disk temporary tables created by the server while executing statements. |
Dependent item | mysql.created_tmp_disk_tables.rate Preprocessing
|
MySQL: Created tmp tables on memory per second | Number of internal temporary tables created by the server while executing statements. |
Dependent item | mysql.created_tmp_tables.rate Preprocessing
|
MySQL: InnoDB buffer pool pages free | Number of free pages in the InnoDB buffer pool. |
Dependent item | mysql.innodb_buffer_pool_pages_free Preprocessing
|
MySQL: InnoDB buffer pool pages total | The total size of the InnoDB buffer pool, in pages. |
Dependent item | mysql.innodb_buffer_pool_pages_total Preprocessing
|
MySQL: InnoDB buffer pool read requests | Number of logical read requests. |
Dependent item | mysql.innodb_buffer_pool_read_requests Preprocessing
|
MySQL: InnoDB buffer pool read requests per second | Number of logical read requests per second. |
Dependent item | mysql.innodb_buffer_pool_read_requests.rate Preprocessing
|
MySQL: InnoDB buffer pool reads | Number of logical reads that InnoDB could not satisfy from the buffer pool and had to read directly from the disk. |
Dependent item | mysql.innodb_buffer_pool_reads Preprocessing
|
MySQL: InnoDB buffer pool reads per second | Number of logical reads per second that InnoDB could not satisfy from the buffer pool and had to read directly from the disk. |
Dependent item | mysql.innodb_buffer_pool_reads.rate Preprocessing
|
MySQL: InnoDB row lock time | The total time spent in acquiring row locks for InnoDB tables, in milliseconds. |
Dependent item | mysql.innodb_row_lock_time Preprocessing
|
MySQL: InnoDB row lock time max | The maximum time to acquire a row lock for InnoDB tables, in milliseconds. |
Dependent item | mysql.innodb_row_lock_time_max Preprocessing
|
MySQL: InnoDB row lock waits | Number of times operations on InnoDB tables had to wait for a row lock. |
Dependent item | mysql.innodb_row_lock_waits Preprocessing
|
MySQL: Slow queries per second | Number of queries that have taken more than long_query_time seconds. |
Dependent item | mysql.slow_queries.rate Preprocessing
|
MySQL: Bytes received | Number of bytes received from all clients. |
Dependent item | mysql.bytes_received.rate Preprocessing
|
MySQL: Bytes sent | Number of bytes sent to all clients. |
Dependent item | mysql.bytes_sent.rate Preprocessing
|
MySQL: Command Delete per second | The Com_delete counter indicates the number of times the DELETE statement has been executed. |
Dependent item | mysql.com_delete.rate Preprocessing
|
MySQL: Command Insert per second | The Com_insert counter indicates the number of times the INSERT statement has been executed. |
Dependent item | mysql.com_insert.rate Preprocessing
|
MySQL: Command Select per second | The Com_select counter indicates the number of times the SELECT statement has been executed. |
Dependent item | mysql.com_select.rate Preprocessing
|
MySQL: Command Update per second | The Com_update counter indicates the number of times the UPDATE statement has been executed. |
Dependent item | mysql.com_update.rate Preprocessing
|
MySQL: Queries per second | Number of statements executed by the server. This variable includes statements executed within stored programs, unlike the Questions variable. |
Dependent item | mysql.queries.rate Preprocessing
|
MySQL: Questions per second | Number of statements executed by the server. This includes only statements sent to the server by clients and not statements executed within stored programs, unlike the Queries variable. |
Dependent item | mysql.questions.rate Preprocessing
|
MySQL: Binlog cache disk use | Number of transactions that used a temporary disk cache because they could not fit in the regular binary log cache, being larger than binlog_cache_size. |
Dependent item | mysql.binlog_cache_disk_use Preprocessing
|
MySQL: Innodb buffer pool wait free | Number of times InnoDB waited for a free page before reading or creating a page. Normally, writes to the InnoDB buffer pool happen in the background. When no clean pages are available, dirty pages are flushed first in order to free some up. This counts the number of waits for this operation to finish. If this value is not small, consider increasing innodb_buffer_pool_size. |
Dependent item | mysql.innodb_buffer_pool_wait_free Preprocessing
|
MySQL: Innodb number open files | Number of open files held by InnoDB. InnoDB only. |
Dependent item | mysql.innodb_num_open_files Preprocessing
|
MySQL: Open table definitions | Number of cached table definitions. |
Dependent item | mysql.open_table_definitions Preprocessing
|
MySQL: Open tables | Number of tables that are open. |
Dependent item | mysql.open_tables Preprocessing
|
MySQL: Innodb log written | Number of bytes written to the InnoDB log. |
Dependent item | mysql.innodb_os_log_written Preprocessing
|
MySQL: Calculated value of innodb_log_file_size |
|
Calculated | mysql.innodb_log_file_size Preprocessing
|
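The two calculated items above (buffer pool efficiency and buffer pool utilization) are derived from InnoDB status counters. The following is a minimal sketch of the conventional formulas, assuming the standard SHOW GLOBAL STATUS counters; the template's exact calculated-item expressions may differ:

```python
# Hedged sketch of the conventional buffer-pool formulas; the sample
# counter values are illustrative, not from a real server.
status = {
    "Innodb_buffer_pool_read_requests": 1_000_000,  # logical read requests
    "Innodb_buffer_pool_reads": 12_000,             # reads that went to disk
    "Innodb_buffer_pool_pages_total": 8192,
    "Innodb_buffer_pool_pages_free": 1024,
}

def buffer_pool_efficiency(s):
    """Percentage of logical reads served from the buffer pool."""
    req = s["Innodb_buffer_pool_read_requests"]
    return (req - s["Innodb_buffer_pool_reads"]) / req * 100

def buffer_pool_utilization(s):
    """Percentage of buffer pool pages currently in use."""
    total = s["Innodb_buffer_pool_pages_total"]
    return (total - s["Innodb_buffer_pool_pages_free"]) / total * 100

print(round(buffer_pool_efficiency(status), 1))   # 98.8
print(round(buffer_pool_utilization(status), 1))  # 87.5
```

A low efficiency value means reads frequently miss the pool and hit disk, which is what the "Buffer pool utilization is too low" trigger is designed to surface indirectly.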
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
MySQL: Service is down | MySQL is down. |
last(/MySQL by ODBC/db.odbc.select[ping,"{$MYSQL.DSN}"])=0 |High |
||
MySQL: Version has changed | The MySQL version has changed. Acknowledge to close the problem manually. |
last(/MySQL by ODBC/db.odbc.select[version,"{$MYSQL.DSN}"],#1)<>last(/MySQL by ODBC/db.odbc.select[version,"{$MYSQL.DSN}"],#2) and length(last(/MySQL by ODBC/db.odbc.select[version,"{$MYSQL.DSN}"]))>0 |Info |
Manual close: Yes | |
MySQL: Service has been restarted | MySQL uptime is less than 10 minutes. |
last(/MySQL by ODBC/mysql.uptime)<10m |Info |
||
MySQL: Failed to fetch info data | Zabbix has not received any data for items for the last 30 minutes. |
nodata(/MySQL by ODBC/mysql.uptime,30m)=1 |Info |
Depends on:
|
|
MySQL: Server has aborted connections | The number of failed attempts to connect to the MySQL server is more than {$MYSQL.ABORTED_CONN.MAX.WARN} in the last 5 minutes. |
min(/MySQL by ODBC/mysql.aborted_connects.rate,5m)>{$MYSQL.ABORTED_CONN.MAX.WARN} |Average |
Depends on:
|
|
MySQL: Refused connections | Number of refused connections due to the max_connections limit being reached. |
last(/MySQL by ODBC/mysql.connection_errors_max_connections.rate)>0 |Average |
||
MySQL: Buffer pool utilization is too low | The buffer pool utilization is less than {$MYSQL.BUFF_UTIL.MIN.WARN}% in the last 5 minutes. |
max(/MySQL by ODBC/mysql.buffer_pool_utilization,5m)<{$MYSQL.BUFF_UTIL.MIN.WARN} |Warning |
||
MySQL: Number of temporary files created per second is high | The application using the database may be in need of query optimization. |
min(/MySQL by ODBC/mysql.created_tmp_files.rate,5m)>{$MYSQL.CREATED_TMP_FILES.MAX.WARN} |Warning |
||
MySQL: Number of on-disk temporary tables created per second is high | The application using the database may be in need of query optimization. |
min(/MySQL by ODBC/mysql.created_tmp_disk_tables.rate,5m)>{$MYSQL.CREATED_TMP_DISK_TABLES.MAX.WARN} |Warning |
||
MySQL: Number of internal temporary tables created per second is high | The application using the database may be in need of query optimization. |
min(/MySQL by ODBC/mysql.created_tmp_tables.rate,5m)>{$MYSQL.CREATED_TMP_TABLES.MAX.WARN} |Warning |
||
MySQL: Server has slow queries | The number of slow queries is more than {$MYSQL.SLOW_QUERIES.MAX.WARN} in the last 5 minutes. |
min(/MySQL by ODBC/mysql.slow_queries.rate,5m)>{$MYSQL.SLOW_QUERIES.MAX.WARN} |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Database discovery | Used for the discovery of the databases. |
Dependent item | mysql.database.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
MySQL: Size of database {#DATABASE} | Database size. |
Database monitor | db.odbc.select[{#DATABASE}_size,"{$MYSQL.DSN}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Replication discovery | Discovery of the replication. |
Dependent item | mysql.replication.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
MySQL: Replication Slave status {#MASTER_HOST} | Gets status information on the essential parameters of the slave threads. |
Dependent item | mysql.slave_status["{#MASTER_HOST}"] Preprocessing
|
MySQL: Replication Slave SQL Running State {#MASTER_HOST} | Shows the state of the SQL driver threads. |
Dependent item | mysql.slave_sql_running_state["{#MASTER_HOST}"] Preprocessing
|
MySQL: Replication Seconds Behind Master {#MASTER_HOST} | The number of seconds the slave SQL thread has been behind processing the master binary log. A high number (or an increasing one) can indicate that the slave is unable to handle events from the master in a timely fashion. |
Dependent item | mysql.seconds_behind_master["{#MASTER_HOST}"] Preprocessing
|
MySQL: Replication Slave IO Running {#MASTER_HOST} | Whether the I/O thread for reading the master's binary log is running. Normally, you want this to be Yes. |
Dependent item | mysql.slave_io_running["{#MASTER_HOST}"] Preprocessing
|
MySQL: Replication Slave SQL Running {#MASTER_HOST} | Whether the SQL thread for executing events in the relay log is running. As with the I/O thread, this should normally be Yes. |
Dependent item | mysql.slave_sql_running["{#MASTER_HOST}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
MySQL: Replication lag is too high | Replication delay is too long. |
min(/MySQL by ODBC/mysql.seconds_behind_master["{#MASTER_HOST}"],5m)>{$MYSQL.REPL_LAG.MAX.WARN} |Warning |
||
MySQL: The slave I/O thread is not running | Whether the I/O thread for reading the master's binary log is running. |
count(/MySQL by ODBC/mysql.slave_io_running["{#MASTER_HOST}"],#1,"eq","No")=1 |Average |
||
MySQL: The slave I/O thread is not connected to a replication master | Whether the slave I/O thread is connected to the master. |
count(/MySQL by ODBC/mysql.slave_io_running["{#MASTER_HOST}"],#1,"ne","Yes")=1 |Warning |
Depends on:
|
|
MySQL: The SQL thread is not running | Whether the SQL thread for executing events in the relay log is running. |
count(/MySQL by ODBC/mysql.slave_sql_running["{#MASTER_HOST}"],#1,"eq","No")=1 |Warning |
Depends on:
|
Name | Description | Type | Key and additional info |
---|---|---|---|
MariaDB discovery | Used for additional metrics if MariaDB is used. |
Dependent item | mysql.extra_metric.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
MySQL: Binlog commits | Total number of transactions committed to the binary log. |
Dependent item | mysql.binlog_commits[{#SINGLETON}] Preprocessing
|
MySQL: Binlog group commits | Total number of group commits done to the binary log. |
Dependent item | mysql.binlog_group_commits[{#SINGLETON}] Preprocessing
|
MySQL: Master GTID wait count | The number of times MASTER_GTID_WAIT was called. |
Dependent item | mysql.master_gtid_wait_count[{#SINGLETON}] Preprocessing
|
MySQL: Master GTID wait time | Total time spent in MASTER_GTID_WAIT. |
Dependent item | mysql.master_gtid_wait_time[{#SINGLETON}] Preprocessing
|
MySQL: Master GTID wait timeouts | Number of timeouts occurring in MASTER_GTID_WAIT. |
Dependent item | mysql.master_gtid_wait_timeouts[{#SINGLETON}] Preprocessing
|
This template is designed for the effortless deployment of MySQL monitoring by Zabbix via Zabbix agent 2 and doesn't require any external scripts.
Zabbix version: 6.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Create a MySQL user for monitoring (<password> at your discretion):

CREATE USER 'zbx_monitor'@'%' IDENTIFIED BY '<password>';
GRANT REPLICATION CLIENT,PROCESS,SHOW DATABASES,SHOW VIEW ON *.* TO 'zbx_monitor'@'%';
For more information, please see MySQL documentation https://dev.mysql.com/doc/refman/8.0/en/grant.html
Set the data source name of the MySQL instance in the {$MYSQL.DSN} macro: either a session name from the Zabbix agent 2 configuration file or a URI. Examples: MySQL1, tcp://localhost:3306, tcp://172.16.0.10, unix:/var/run/mysql.sock. For more information about the MySQL Unix socket file, see the MySQL documentation https://dev.mysql.com/doc/refman/8.0/en/problems-with-mysql-sock.html.
If you set a URI in {$MYSQL.DSN}, define the user name and password in the host macros ({$MYSQL.USER} and {$MYSQL.PASSWORD}). If you use a session name, leave the {$MYSQL.USER} and {$MYSQL.PASSWORD} macros empty and set the user name and password in the Plugins.Mysql.<...> section of your Zabbix agent 2 configuration file. For more information about configuring the Zabbix MySQL plugin, see the documentation https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/src/go/plugins/mysql/README.md.
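For the session-name approach, the named session lives in the Zabbix agent 2 configuration file. A minimal sketch of such a fragment (the session name MySQL1 and the credentials are examples, not values the template requires):

```
# /etc/zabbix/zabbix_agent2.conf (fragment)
# Named session "MySQL1" for the Zabbix agent 2 MySQL plugin.
Plugins.Mysql.Sessions.MySQL1.Uri=tcp://localhost:3306
Plugins.Mysql.Sessions.MySQL1.User=zbx_monitor
Plugins.Mysql.Sessions.MySQL1.Password=<password>
```

With this in place, set {$MYSQL.DSN} to MySQL1 and leave {$MYSQL.USER} and {$MYSQL.PASSWORD} empty.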
Name | Description | Default |
---|---|---|
{$MYSQL.USER} | MySQL user name. |
|
{$MYSQL.PASSWORD} | MySQL user password. |
|
{$MYSQL.ABORTED_CONN.MAX.WARN} | Number of failed attempts to connect to the MySQL server for trigger expressions. |
3 |
{$MYSQL.REPL_LAG.MAX.WARN} | Amount of time the slave is behind the master for trigger expressions. |
30m |
{$MYSQL.SLOW_QUERIES.MAX.WARN} | Number of slow queries for trigger expressions. |
3 |
{$MYSQL.BUFF_UTIL.MIN.WARN} | The minimum buffer pool utilization in percentage for trigger expressions. |
50 |
{$MYSQL.DSN} | System data source name: either a named session from the Zabbix agent 2 configuration file or a URI. |
<Put your DSN> |
{$MYSQL.CREATED_TMP_TABLES.MAX.WARN} | The maximum number of temporary tables created in memory per second for trigger expressions. |
30 |
{$MYSQL.CREATED_TMP_DISK_TABLES.MAX.WARN} | The maximum number of temporary tables created on a disk per second for trigger expressions. |
10 |
{$MYSQL.CREATED_TMP_FILES.MAX.WARN} | The maximum number of temporary files created on a disk per second for trigger expressions. |
10 |
{$MYSQL.INNODB_LOG_FILES} | Number of physical files in the InnoDB redo log, used for calculating innodb_log_file_size. |
2 |
{$MYSQL.DBNAME.MATCHES} | Filter of discoverable databases. |
.+ |
{$MYSQL.DBNAME.NOT_MATCHES} | Filter to exclude discovered databases. |
information_schema |
Name | Description | Type | Key and additional info |
---|---|---|---|
MySQL: Get status variables | Gets server global status information. |
Zabbix agent | mysql.get_status_variables["{$MYSQL.DSN}","{$MYSQL.USER}","{$MYSQL.PASSWORD}"] |
MySQL: Status | MySQL server status. |
Zabbix agent | mysql.ping["{$MYSQL.DSN}","{$MYSQL.USER}","{$MYSQL.PASSWORD}"] Preprocessing
|
MySQL: Version | MySQL server version. |
Zabbix agent | mysql.version["{$MYSQL.DSN}","{$MYSQL.USER}","{$MYSQL.PASSWORD}"] Preprocessing
|
MySQL: Uptime | Number of seconds that the server has been up. |
Dependent item | mysql.uptime Preprocessing
|
MySQL: Aborted clients per second | Number of connections that were aborted because the client died without closing the connection properly. |
Dependent item | mysql.aborted_clients.rate Preprocessing
|
MySQL: Aborted connections per second | Number of failed attempts to connect to the MySQL server. |
Dependent item | mysql.aborted_connects.rate Preprocessing
|
MySQL: Connection errors accept per second | Number of errors that occurred during calls to accept() on the listening port. |
Dependent item | mysql.connection_errors_accept.rate Preprocessing
|
MySQL: Connection errors internal per second | Number of refused connections due to internal server errors, for example, out of memory errors, or failed thread starts. |
Dependent item | mysql.connection_errors_internal.rate Preprocessing
|
MySQL: Connection errors max connections per second | Number of refused connections due to the max_connections limit being reached. |
Dependent item | mysql.connection_errors_max_connections.rate Preprocessing
|
MySQL: Connection errors peer address per second | Number of errors while searching for the connecting client's IP address. |
Dependent item | mysql.connection_errors_peer_address.rate Preprocessing
|
MySQL: Connection errors select per second | Number of errors during calls to select() or poll() on the listening port. |
Dependent item | mysql.connection_errors_select.rate Preprocessing
|
MySQL: Connection errors tcpwrap per second | Number of connections the libwrap library has refused. |
Dependent item | mysql.connection_errors_tcpwrap.rate Preprocessing
|
MySQL: Connections per second | Number of connection attempts (successful or not) to the MySQL server. |
Dependent item | mysql.connections.rate Preprocessing
|
MySQL: Max used connections | The maximum number of connections that have been in use simultaneously since the server start. |
Dependent item | mysql.max_used_connections Preprocessing
|
MySQL: Threads cached | Number of threads in the thread cache. |
Dependent item | mysql.threads_cached Preprocessing
|
MySQL: Threads connected | Number of currently open connections. |
Dependent item | mysql.threads_connected Preprocessing
|
MySQL: Threads created per second | Number of threads created to handle connections. If the value of Threads_created is large, you may want to increase the thread_cache_size value. |
Dependent item | mysql.threads_created.rate Preprocessing
|
MySQL: Threads running | Number of threads that are not sleeping. |
Dependent item | mysql.threads_running Preprocessing
|
MySQL: Buffer pool efficiency | The item shows how effectively the buffer pool is serving reads. |
Calculated | mysql.buffer_pool_efficiency |
MySQL: Buffer pool utilization | Ratio of used to total pages in the buffer pool. |
Calculated | mysql.buffer_pool_utilization |
MySQL: Created tmp files on disk per second | How many temporary files mysqld has created. |
Dependent item | mysql.created_tmp_files.rate Preprocessing
|
MySQL: Created tmp tables on disk per second | Number of internal on-disk temporary tables created by the server while executing statements. |
Dependent item | mysql.created_tmp_disk_tables.rate Preprocessing
|
MySQL: Created tmp tables on memory per second | Number of internal temporary tables created by the server while executing statements. |
Dependent item | mysql.created_tmp_tables.rate Preprocessing
|
MySQL: InnoDB buffer pool pages free | The number of free pages in the InnoDB buffer pool. |
Dependent item | mysql.innodb_buffer_pool_pages_free Preprocessing
|
MySQL: InnoDB buffer pool pages total | The total size of the InnoDB buffer pool, in pages. |
Dependent item | mysql.innodb_buffer_pool_pages_total Preprocessing
|
MySQL: InnoDB buffer pool read requests | Number of logical read requests. |
Dependent item | mysql.innodb_buffer_pool_read_requests Preprocessing
|
MySQL: InnoDB buffer pool read requests per second | Number of logical read requests per second. |
Dependent item | mysql.innodb_buffer_pool_read_requests.rate Preprocessing
|
MySQL: InnoDB buffer pool reads | Number of logical reads that InnoDB could not satisfy from the buffer pool and had to read directly from the disk. |
Dependent item | mysql.innodb_buffer_pool_reads Preprocessing
|
MySQL: InnoDB buffer pool reads per second | Number of logical reads per second that InnoDB could not satisfy from the buffer pool and had to read directly from the disk. |
Dependent item | mysql.innodb_buffer_pool_reads.rate Preprocessing
|
MySQL: InnoDB row lock time | The total time spent in acquiring row locks for InnoDB tables, in milliseconds. |
Dependent item | mysql.innodb_row_lock_time Preprocessing
|
MySQL: InnoDB row lock time max | The maximum time to acquire a row lock for InnoDB tables, in milliseconds. |
Dependent item | mysql.innodb_row_lock_time_max Preprocessing
|
MySQL: InnoDB row lock waits | Number of times operations on InnoDB tables had to wait for a row lock. |
Dependent item | mysql.innodb_row_lock_waits Preprocessing
|
MySQL: Slow queries per second | Number of queries that have taken more than long_query_time seconds. |
Dependent item | mysql.slow_queries.rate Preprocessing
|
MySQL: Bytes received | Number of bytes received from all clients. |
Dependent item | mysql.bytes_received.rate Preprocessing
|
MySQL: Bytes sent | Number of bytes sent to all clients. |
Dependent item | mysql.bytes_sent.rate Preprocessing
|
MySQL: Command Delete per second | The Com_delete counter indicates the number of times the DELETE statement has been executed. |
Dependent item | mysql.com_delete.rate Preprocessing
|
MySQL: Command Insert per second | The Com_insert counter indicates the number of times the INSERT statement has been executed. |
Dependent item | mysql.com_insert.rate Preprocessing
|
MySQL: Command Select per second | The Com_select counter indicates the number of times the SELECT statement has been executed. |
Dependent item | mysql.com_select.rate Preprocessing
|
MySQL: Command Update per second | The Com_update counter indicates the number of times the UPDATE statement has been executed. |
Dependent item | mysql.com_update.rate Preprocessing
|
MySQL: Queries per second | Number of statements executed by the server. This variable includes statements executed within stored programs, unlike the Questions variable. |
Dependent item | mysql.queries.rate Preprocessing
|
MySQL: Questions per second | Number of statements executed by the server. This includes only statements sent to the server by clients and not statements executed within stored programs, unlike the Queries variable. |
Dependent item | mysql.questions.rate Preprocessing
|
MySQL: Binlog cache disk use | Number of transactions that used a temporary disk cache because they could not fit in the regular binary log cache, being larger than binlog_cache_size. |
Dependent item | mysql.binlog_cache_disk_use Preprocessing
|
MySQL: Innodb buffer pool wait free | Number of times InnoDB waited for a free page before reading or creating a page. Normally, writes to the InnoDB buffer pool happen in the background. When no clean pages are available, dirty pages are flushed first in order to free some up. This counts the number of waits for this operation to finish. If this value is not small, consider increasing innodb_buffer_pool_size. |
Dependent item | mysql.innodb_buffer_pool_wait_free Preprocessing
|
MySQL: Innodb number open files | Number of open files held by InnoDB. InnoDB only. |
Dependent item | mysql.innodb_num_open_files Preprocessing
|
MySQL: Open table definitions | Number of cached table definitions. |
Dependent item | mysql.open_table_definitions Preprocessing
|
MySQL: Open tables | Number of tables that are open. |
Dependent item | mysql.open_tables Preprocessing
|
MySQL: Innodb log written | Number of bytes written to the InnoDB log. |
Dependent item | mysql.innodb_os_log_written Preprocessing
|
MySQL: Calculated value of innodb_log_file_size |
|
Calculated | mysql.innodb_log_file_size Preprocessing
|
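The two calculated items above combine raw status variables into ratios. As a rough illustration only (not the template's literal calculated-item expressions), the arithmetic can be sketched in Python; the sample numbers are hypothetical values from SHOW GLOBAL STATUS:

```python
# Sketch of the buffer-pool calculated items, assuming status values
# collected from SHOW GLOBAL STATUS (sample numbers are made up).
status = {
    "Innodb_buffer_pool_read_requests": 1_000_000,  # logical reads
    "Innodb_buffer_pool_reads": 25_000,             # reads that went to disk
    "Innodb_buffer_pool_pages_total": 8192,
    "Innodb_buffer_pool_pages_free": 1024,
}

# Efficiency: share of logical reads served from memory, in percent.
efficiency = (
    1 - status["Innodb_buffer_pool_reads"] / status["Innodb_buffer_pool_read_requests"]
) * 100

# Utilization: share of buffer pool pages currently in use, in percent.
utilization = (
    1 - status["Innodb_buffer_pool_pages_free"] / status["Innodb_buffer_pool_pages_total"]
) * 100

print(efficiency, utilization)
```

With these sample values the buffer pool serves 97.5% of reads from memory while 87.5% of its pages are in use.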
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
MySQL: Service is down | MySQL is down. |
last(/MySQL by Zabbix agent 2/mysql.ping["{$MYSQL.DSN}","{$MYSQL.USER}","{$MYSQL.PASSWORD}"])=0 |High |
||
MySQL: Version has changed | The MySQL version has changed. Acknowledge to close the problem manually. |
last(/MySQL by Zabbix agent 2/mysql.version["{$MYSQL.DSN}","{$MYSQL.USER}","{$MYSQL.PASSWORD}"],#1)<>last(/MySQL by Zabbix agent 2/mysql.version["{$MYSQL.DSN}","{$MYSQL.USER}","{$MYSQL.PASSWORD}"],#2) and length(last(/MySQL by Zabbix agent 2/mysql.version["{$MYSQL.DSN}","{$MYSQL.USER}","{$MYSQL.PASSWORD}"]))>0 |Info |
Manual close: Yes | |
MySQL: Service has been restarted | MySQL uptime is less than 10 minutes. |
last(/MySQL by Zabbix agent 2/mysql.uptime)<10m |Info |
||
MySQL: Failed to fetch info data | Zabbix has not received any data for items for the last 30 minutes. |
nodata(/MySQL by Zabbix agent 2/mysql.uptime,30m)=1 |Info |
Depends on:
|
|
MySQL: Server has aborted connections | The number of failed attempts to connect to the MySQL server is more than {$MYSQL.ABORTED_CONN.MAX.WARN} in the last 5 minutes. |
min(/MySQL by Zabbix agent 2/mysql.aborted_connects.rate,5m)>{$MYSQL.ABORTED_CONN.MAX.WARN} |Average |
Depends on:
|
|
MySQL: Refused connections | Number of refused connections due to the max_connections limit being reached. |
last(/MySQL by Zabbix agent 2/mysql.connection_errors_max_connections.rate)>0 |Average |
||
MySQL: Buffer pool utilization is too low | The buffer pool utilization is less than {$MYSQL.BUFF_UTIL.MIN.WARN}% in the last 5 minutes. |
max(/MySQL by Zabbix agent 2/mysql.buffer_pool_utilization,5m)<{$MYSQL.BUFF_UTIL.MIN.WARN} |Warning |
||
MySQL: Number of temporary files created per second is high | The application using the database may be in need of query optimization. |
min(/MySQL by Zabbix agent 2/mysql.created_tmp_files.rate,5m)>{$MYSQL.CREATED_TMP_FILES.MAX.WARN} |Warning |
||
MySQL: Number of on-disk temporary tables created per second is high | The application using the database may be in need of query optimization. |
min(/MySQL by Zabbix agent 2/mysql.created_tmp_disk_tables.rate,5m)>{$MYSQL.CREATED_TMP_DISK_TABLES.MAX.WARN} |Warning |
||
MySQL: Number of internal temporary tables created per second is high | The application using the database may be in need of query optimization. |
min(/MySQL by Zabbix agent 2/mysql.created_tmp_tables.rate,5m)>{$MYSQL.CREATED_TMP_TABLES.MAX.WARN} |Warning |
||
MySQL: Server has slow queries | The number of slow queries is more than {$MYSQL.SLOW_QUERIES.MAX.WARN} per second in the last 5 minutes. |
min(/MySQL by Zabbix agent 2/mysql.slow_queries.rate,5m)>{$MYSQL.SLOW_QUERIES.MAX.WARN} |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Database discovery | Scanning databases in DBMS. |
Zabbix agent | mysql.db.discovery["{$MYSQL.DSN}","{$MYSQL.USER}","{$MYSQL.PASSWORD}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
MySQL: Size of database {#DATABASE} | Database size. |
Zabbix agent | mysql.db.size["{$MYSQL.DSN}","{$MYSQL.USER}","{$MYSQL.PASSWORD}","{#DATABASE}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Replication discovery | If "show slave status" returns Master_Host, "Replication: *" items are created. |
Zabbix agent | mysql.replication.discovery["{$MYSQL.DSN}","{$MYSQL.USER}","{$MYSQL.PASSWORD}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
MySQL: Replication Slave status {#MASTER_HOST} | The item gets status information on the essential parameters of the slave threads. |
Zabbix agent | mysql.replication.get_slave_status["{$MYSQL.DSN}","{$MYSQL.USER}","{$MYSQL.PASSWORD}","{#MASTER_HOST}"] |
MySQL: Replication Slave SQL Running State {#MASTER_HOST} | This shows the state of the SQL driver threads. |
Dependent item | mysql.replication.slave_sql_running_state["{#MASTER_HOST}"] Preprocessing
|
MySQL: Replication Seconds Behind Master {#MASTER_HOST} | Number of seconds that the slave SQL thread is behind processing the master binary log. A high number (or an increasing one) can indicate that the slave is unable to handle events from the master in a timely fashion. |
Dependent item | mysql.replication.seconds_behind_master["{#MASTER_HOST}"] Preprocessing
|
MySQL: Replication Slave IO Running {#MASTER_HOST} | Whether the I/O thread for reading the master's binary log is running. Normally, you want this to be Yes unless you have not yet started replication or have explicitly stopped it with STOP SLAVE. |
Dependent item | mysql.replication.slave_io_running["{#MASTER_HOST}"] Preprocessing
|
MySQL: Replication Slave SQL Running {#MASTER_HOST} | Whether the SQL thread for executing events in the relay log is running. As with the I/O thread, this should normally be Yes. |
Dependent item | mysql.replication.slave_sql_running["{#MASTER_HOST}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
MySQL: Replication lag is too high | Replication delay is too long. |
min(/MySQL by Zabbix agent 2/mysql.replication.seconds_behind_master["{#MASTER_HOST}"],5m)>{$MYSQL.REPL_LAG.MAX.WARN} |Warning |
||
MySQL: The slave I/O thread is not running | Whether the I/O thread for reading the master's binary log is running. |
count(/MySQL by Zabbix agent 2/mysql.replication.slave_io_running["{#MASTER_HOST}"],#1,"eq","No")=1 |Average |
||
MySQL: The slave I/O thread is not connected to a replication master | Whether the slave I/O thread is connected to the master. |
count(/MySQL by Zabbix agent 2/mysql.replication.slave_io_running["{#MASTER_HOST}"],#1,"ne","Yes")=1 |Warning |
Depends on:
|
|
MySQL: The SQL thread is not running | Whether the SQL thread for executing events in the relay log is running. |
count(/MySQL by Zabbix agent 2/mysql.replication.slave_sql_running["{#MASTER_HOST}"],#1,"eq","No")=1 |Warning |
Depends on:
|
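The replication triggers above aggregate recent history before firing. A hedged Python sketch of how the `min(...,5m)>threshold` and `count(...,#1,"eq","No")=1` styles of expression behave (function names here are illustrative, not Zabbix API calls):

```python
# Illustrative model of the trigger logic above. The lag trigger fires only
# when EVERY sample in the 5-minute window exceeds the threshold, which
# suppresses alerts caused by short spikes.
REPL_LAG_MAX_WARN = 30 * 60  # seconds, mirroring the 30m macro default

def lag_trigger_fires(lag_samples_5m):
    # min(/host/mysql.seconds_behind_master,5m) > {$MYSQL.REPL_LAG.MAX.WARN}
    return min(lag_samples_5m) > REPL_LAG_MAX_WARN

def io_thread_trigger_fires(last_value):
    # count(...,#1,"eq","No")=1 inspects only the most recent value.
    return last_value == "No"

print(lag_trigger_fires([1700, 1900, 2100]))  # one sample below threshold
```

Because one sample (1700 s) is below the 1800 s threshold, the lag trigger does not fire in this example.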
Name | Description | Type | Key and additional info |
---|---|---|---|
MariaDB discovery | Used for additional metrics if MariaDB is used. |
Dependent item | mysql.extra_metric.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
MySQL: Binlog commits | Total number of transactions committed to the binary log. |
Dependent item | mysql.binlog_commits[{#SINGLETON}] Preprocessing
|
MySQL: Binlog group commits | Total number of group commits done to the binary log. |
Dependent item | mysql.binlog_group_commits[{#SINGLETON}] Preprocessing
|
MySQL: Master GTID wait count | The number of times MASTER_GTID_WAIT has been called. |
Dependent item | mysql.master_gtid_wait_count[{#SINGLETON}] Preprocessing
|
MySQL: Master GTID wait time | Total time, in microseconds, spent in MASTER_GTID_WAIT. |
Dependent item | mysql.master_gtid_wait_time[{#SINGLETON}] Preprocessing
|
MySQL: Master GTID wait timeouts | Number of timeouts occurring in MASTER_GTID_WAIT. |
Dependent item | mysql.master_gtid_wait_timeouts[{#SINGLETON}] Preprocessing
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for the effortless deployment of MySQL monitoring by Zabbix via Zabbix agent and doesn't require any external scripts.
Zabbix version: 6.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
If necessary, add the path to the mysql and mysqladmin utilities to the global environment variable PATH.
Copy the template_db_mysql.conf file with user parameters into the folder with the Zabbix agent configuration (/etc/zabbix/zabbix_agentd.d/ by default). Don't forget to restart Zabbix agent.
Create a MySQL user for monitoring (<password> at your discretion). For example:
CREATE USER 'zbx_monitor'@'%' IDENTIFIED BY '<password>';
GRANT REPLICATION CLIENT,PROCESS,SHOW DATABASES,SHOW VIEW ON *.* TO 'zbx_monitor'@'%';
For more information, please see MySQL documentation.
Create a .my.cnf configuration file in the home directory of Zabbix agent for Linux distributions (/var/lib/zabbix by default) or my.cnf in c:\ for Windows. For example:
[client]
protocol=tcp
user='zbx_monitor'
password='<password>'
For more information, please see MySQL documentation.
NOTE: Linux distributions that use SELinux may require additional steps for access configuration.
For example, the following rule could be added to the SELinux policy:
# cat <<EOF > zabbix_home.te
module zabbix_home 1.0;
require {
type zabbix_agent_t;
type zabbix_var_lib_t;
type mysqld_etc_t;
type mysqld_port_t;
type mysqld_var_run_t;
class file { open read };
class tcp_socket name_connect;
class sock_file write;
}
#============= zabbix_agent_t ==============
allow zabbix_agent_t zabbix_var_lib_t:file read;
allow zabbix_agent_t zabbix_var_lib_t:file open;
allow zabbix_agent_t mysqld_etc_t:file read;
allow zabbix_agent_t mysqld_port_t:tcp_socket name_connect;
allow zabbix_agent_t mysqld_var_run_t:sock_file write;
EOF
# checkmodule -M -m -o zabbix_home.mod zabbix_home.te
# semodule_package -o zabbix_home.pp -m zabbix_home.mod
# semodule -i zabbix_home.pp
# restorecon -R /var/lib/zabbix
Name | Description | Default |
---|---|---|
{$MYSQL.ABORTED_CONN.MAX.WARN} | Number of failed attempts to connect to the MySQL server for trigger expressions. |
3 |
{$MYSQL.HOST} | Hostname or IP of MySQL host or container. |
127.0.0.1 |
{$MYSQL.PORT} | MySQL service port. |
3306 |
{$MYSQL.REPL_LAG.MAX.WARN} | Amount of time the slave is behind the master for trigger expressions. |
30m |
{$MYSQL.SLOW_QUERIES.MAX.WARN} | Number of slow queries for trigger expressions. |
3 |
{$MYSQL.BUFF_UTIL.MIN.WARN} | The minimum buffer pool utilization in percentage for trigger expressions. |
50 |
{$MYSQL.CREATED_TMP_TABLES.MAX.WARN} | The maximum number of temporary tables created in memory per second for trigger expressions. |
30 |
{$MYSQL.CREATED_TMP_DISK_TABLES.MAX.WARN} | The maximum number of temporary tables created on a disk per second for trigger expressions. |
10 |
{$MYSQL.CREATED_TMP_FILES.MAX.WARN} | The maximum number of temporary files created on a disk per second for trigger expressions. |
10 |
{$MYSQL.INNODB_LOG_FILES} | Number of physical files in the InnoDB redo log, used for calculating innodb_log_file_size. |
2 |
{$MYSQL.DBNAME.MATCHES} | Filter of discoverable databases. |
.+ |
{$MYSQL.DBNAME.NOT_MATCHES} | Filter to exclude discovered databases. |
information_schema |
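The two discovery filter macros above act as an include/exclude pair on discovered database names. A minimal Python sketch of that filtering, using the default regexes shown above (Zabbix's own matching semantics may anchor differently; full-match is used here for clarity):

```python
import re

# Defaults from the macro table above.
DBNAME_MATCHES = r".+"
DBNAME_NOT_MATCHES = r"information_schema"

def discoverable(name: str) -> bool:
    # A database is kept when it matches the include filter
    # and does not match the exclude filter.
    return (re.fullmatch(DBNAME_MATCHES, name) is not None
            and re.fullmatch(DBNAME_NOT_MATCHES, name) is None)

print([db for db in ["shop", "information_schema", "mysql"] if discoverable(db)])
```

Overriding {$MYSQL.DBNAME.NOT_MATCHES} on the host level lets you exclude additional schemas without touching the template.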
Name | Description | Type | Key and additional info |
---|---|---|---|
MySQL: Get status variables | Gets server global status information. |
Zabbix agent | mysql.get_status_variables["{$MYSQL.HOST}","{$MYSQL.PORT}"] |
MySQL: Status | MySQL server status. |
Zabbix agent | mysql.ping["{$MYSQL.HOST}","{$MYSQL.PORT}"] Preprocessing
|
MySQL: Version | MySQL server version. |
Zabbix agent | mysql.version["{$MYSQL.HOST}","{$MYSQL.PORT}"] Preprocessing
|
MySQL: Uptime | Number of seconds that the server has been up. |
Dependent item | mysql.uptime Preprocessing
|
MySQL: Aborted clients per second | Number of connections that were aborted because the client died without closing the connection properly. |
Dependent item | mysql.aborted_clients.rate Preprocessing
|
MySQL: Aborted connections per second | Number of failed attempts to connect to the MySQL server. |
Dependent item | mysql.aborted_connects.rate Preprocessing
|
MySQL: Connection errors accept per second | Number of errors that occurred during calls to accept() on the listening port. |
Dependent item | mysql.connection_errors_accept.rate Preprocessing
|
MySQL: Connection errors internal per second | Number of refused connections due to internal server errors, for example, out of memory errors, or failed thread starts. |
Dependent item | mysql.connection_errors_internal.rate Preprocessing
|
MySQL: Connection errors max connections per second | Number of refused connections due to the max_connections limit being reached. |
Dependent item | mysql.connection_errors_max_connections.rate Preprocessing
|
MySQL: Connection errors peer address per second | Number of errors while searching for the connecting client's IP address. |
Dependent item | mysql.connection_errors_peer_address.rate Preprocessing
|
MySQL: Connection errors select per second | Number of errors during calls to select() or poll() on the listening port. |
Dependent item | mysql.connection_errors_select.rate Preprocessing
|
MySQL: Connection errors tcpwrap per second | Number of connections the libwrap library has refused. |
Dependent item | mysql.connection_errors_tcpwrap.rate Preprocessing
|
MySQL: Connections per second | Number of connection attempts (successful or not) to the MySQL server. |
Dependent item | mysql.connections.rate Preprocessing
|
MySQL: Max used connections | The maximum number of connections that have been in use simultaneously since the server start. |
Dependent item | mysql.max_used_connections Preprocessing
|
MySQL: Threads cached | Number of threads in the thread cache. |
Dependent item | mysql.threads_cached Preprocessing
|
MySQL: Threads connected | Number of currently open connections. |
Dependent item | mysql.threads_connected Preprocessing
|
MySQL: Threads created per second | Number of threads created to handle connections. If the value of Threads_created is large, you may want to increase the thread_cache_size value. |
Dependent item | mysql.threads_created.rate Preprocessing
|
MySQL: Threads running | Number of threads that are not sleeping. |
Dependent item | mysql.threads_running Preprocessing
|
MySQL: Buffer pool efficiency | The item shows how effectively the buffer pool is serving reads. |
Calculated | mysql.buffer_pool_efficiency |
MySQL: Buffer pool utilization | Ratio of used to total pages in the buffer pool. |
Calculated | mysql.buffer_pool_utilization |
MySQL: Created tmp files on disk per second | How many temporary files mysqld has created. |
Dependent item | mysql.created_tmp_files.rate Preprocessing
|
MySQL: Created tmp tables on disk per second | Number of internal on-disk temporary tables created by the server while executing statements. |
Dependent item | mysql.created_tmp_disk_tables.rate Preprocessing
|
MySQL: Created tmp tables on memory per second | Number of internal temporary tables created by the server while executing statements. |
Dependent item | mysql.created_tmp_tables.rate Preprocessing
|
MySQL: InnoDB buffer pool pages free | The number of free pages in the InnoDB buffer pool. |
Dependent item | mysql.innodb_buffer_pool_pages_free Preprocessing
|
MySQL: InnoDB buffer pool pages total | The total size of the InnoDB buffer pool, in pages. |
Dependent item | mysql.innodb_buffer_pool_pages_total Preprocessing
|
MySQL: InnoDB buffer pool read requests | Number of logical read requests. |
Dependent item | mysql.innodb_buffer_pool_read_requests Preprocessing
|
MySQL: InnoDB buffer pool read requests per second | Number of logical read requests per second. |
Dependent item | mysql.innodb_buffer_pool_read_requests.rate Preprocessing
|
MySQL: InnoDB buffer pool reads | Number of logical reads that InnoDB could not satisfy from the buffer pool and had to read directly from the disk. |
Dependent item | mysql.innodb_buffer_pool_reads Preprocessing
|
MySQL: InnoDB buffer pool reads per second | Number of logical reads per second that InnoDB could not satisfy from the buffer pool and had to read directly from the disk. |
Dependent item | mysql.innodb_buffer_pool_reads.rate Preprocessing
|
MySQL: InnoDB row lock time | The total time spent in acquiring row locks for InnoDB tables, in milliseconds. |
Dependent item | mysql.innodb_row_lock_time Preprocessing
|
MySQL: InnoDB row lock time max | The maximum time to acquire a row lock for InnoDB tables, in milliseconds. |
Dependent item | mysql.innodb_row_lock_time_max Preprocessing
|
MySQL: InnoDB row lock waits | Number of times operations on InnoDB tables had to wait for a row lock. |
Dependent item | mysql.innodb_row_lock_waits Preprocessing
|
MySQL: Slow queries per second | Number of queries that have taken more than long_query_time seconds. |
Dependent item | mysql.slow_queries.rate Preprocessing
|
MySQL: Bytes received | Number of bytes received from all clients. |
Dependent item | mysql.bytes_received.rate Preprocessing
|
MySQL: Bytes sent | Number of bytes sent to all clients. |
Dependent item | mysql.bytes_sent.rate Preprocessing
|
MySQL: Command Delete per second | The Com_delete counter indicates the number of times the DELETE statement has been executed. |
Dependent item | mysql.com_delete.rate Preprocessing
|
MySQL: Command Insert per second | The Com_insert counter indicates the number of times the INSERT statement has been executed. |
Dependent item | mysql.com_insert.rate Preprocessing
|
MySQL: Command Select per second | The Com_select counter indicates the number of times the SELECT statement has been executed. |
Dependent item | mysql.com_select.rate Preprocessing
|
MySQL: Command Update per second | The Com_update counter indicates the number of times the UPDATE statement has been executed. |
Dependent item | mysql.com_update.rate Preprocessing
|
MySQL: Queries per second | Number of statements executed by the server. This variable includes statements executed within stored programs, unlike the Questions variable. |
Dependent item | mysql.queries.rate Preprocessing
|
MySQL: Questions per second | Number of statements executed by the server. This includes only statements sent to the server by clients and not statements executed within stored programs, unlike the Queries variable. |
Dependent item | mysql.questions.rate Preprocessing
|
MySQL: Binlog cache disk use | Number of transactions that used a temporary disk cache because they could not fit in the regular binary log cache, being larger than binlog_cache_size. |
Dependent item | mysql.binlog_cache_disk_use Preprocessing
|
MySQL: Innodb buffer pool wait free | Number of times InnoDB waited for a free page before reading or creating a page. Normally, writes to the InnoDB buffer pool happen in the background. When no clean pages are available, dirty pages are flushed first in order to free some up. This counts the number of waits for this operation to finish. If this value is not small, consider increasing innodb_buffer_pool_size. |
Dependent item | mysql.innodb_buffer_pool_wait_free Preprocessing
|
MySQL: Innodb number open files | Number of open files held by InnoDB. InnoDB only. |
Dependent item | mysql.innodb_num_open_files Preprocessing
|
MySQL: Open table definitions | Number of cached table definitions. |
Dependent item | mysql.open_table_definitions Preprocessing
|
MySQL: Open tables | Number of tables that are open. |
Dependent item | mysql.open_tables Preprocessing
|
MySQL: Innodb log written | Number of bytes written to the InnoDB log. |
Dependent item | mysql.innodb_os_log_written Preprocessing
|
MySQL: Calculated value of innodb_log_file_size |
|
Calculated | mysql.innodb_log_file_size Preprocessing
|
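Most dependent items above apply "Change per second" preprocessing to raw, ever-growing status counters. A small Python sketch of that derivation from two successive samples (the counter values and timestamps are hypothetical):

```python
def change_per_second(prev_value, prev_ts, cur_value, cur_ts):
    # Per-second rate between two counter samples, matching the
    # "Change per second" preprocessing step: delta of values
    # divided by delta of collection timestamps.
    return (cur_value - prev_value) / (cur_ts - prev_ts)

# Two samples of the Questions counter taken 60 seconds apart.
rate = change_per_second(120_000, 1000, 126_000, 1060)
print(rate)  # 100.0 questions per second
```

Because the rate is derived from consecutive samples, the first collected value produces no data point, and a server restart (counter reset) yields one discarded or negative interval depending on preprocessing settings.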
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
MySQL: Service is down | MySQL is down. |
last(/MySQL by Zabbix agent/mysql.ping["{$MYSQL.HOST}","{$MYSQL.PORT}"])=0 |High |
||
MySQL: Version has changed | The MySQL version has changed. Acknowledge to close the problem manually. |
last(/MySQL by Zabbix agent/mysql.version["{$MYSQL.HOST}","{$MYSQL.PORT}"],#1)<>last(/MySQL by Zabbix agent/mysql.version["{$MYSQL.HOST}","{$MYSQL.PORT}"],#2) and length(last(/MySQL by Zabbix agent/mysql.version["{$MYSQL.HOST}","{$MYSQL.PORT}"]))>0 |Info |
Manual close: Yes | |
MySQL: Service has been restarted | MySQL uptime is less than 10 minutes. |
last(/MySQL by Zabbix agent/mysql.uptime)<10m |Info |
||
MySQL: Failed to fetch info data | Zabbix has not received any data for items for the last 30 minutes. |
nodata(/MySQL by Zabbix agent/mysql.uptime,30m)=1 |Info |
Depends on:
|
|
MySQL: Server has aborted connections | The number of failed attempts to connect to the MySQL server is more than {$MYSQL.ABORTED_CONN.MAX.WARN} in the last 5 minutes. |
min(/MySQL by Zabbix agent/mysql.aborted_connects.rate,5m)>{$MYSQL.ABORTED_CONN.MAX.WARN} |Average |
Depends on:
|
|
MySQL: Refused connections | Number of refused connections due to the max_connections limit being reached. |
last(/MySQL by Zabbix agent/mysql.connection_errors_max_connections.rate)>0 |Average |
||
MySQL: Buffer pool utilization is too low | The buffer pool utilization is less than {$MYSQL.BUFF_UTIL.MIN.WARN}% in the last 5 minutes. |
max(/MySQL by Zabbix agent/mysql.buffer_pool_utilization,5m)<{$MYSQL.BUFF_UTIL.MIN.WARN} |Warning |
||
MySQL: Number of temporary files created per second is high | The application using the database may be in need of query optimization. |
min(/MySQL by Zabbix agent/mysql.created_tmp_files.rate,5m)>{$MYSQL.CREATED_TMP_FILES.MAX.WARN} |Warning |
||
MySQL: Number of on-disk temporary tables created per second is high | The application using the database may be in need of query optimization. |
min(/MySQL by Zabbix agent/mysql.created_tmp_disk_tables.rate,5m)>{$MYSQL.CREATED_TMP_DISK_TABLES.MAX.WARN} |Warning |
||
MySQL: Number of internal temporary tables created per second is high | The application using the database may be in need of query optimization. |
min(/MySQL by Zabbix agent/mysql.created_tmp_tables.rate,5m)>{$MYSQL.CREATED_TMP_TABLES.MAX.WARN} |Warning |
||
MySQL: Server has slow queries | The number of slow queries is more than {$MYSQL.SLOW_QUERIES.MAX.WARN} per second in the last 5 minutes. |
min(/MySQL by Zabbix agent/mysql.slow_queries.rate,5m)>{$MYSQL.SLOW_QUERIES.MAX.WARN} |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Database discovery | Scanning databases in DBMS. |
Zabbix agent | mysql.db.discovery["{$MYSQL.HOST}","{$MYSQL.PORT}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
MySQL: Size of database {#DBNAME} | Database size. |
Zabbix agent | mysql.dbsize["{$MYSQL.HOST}","{$MYSQL.PORT}","{#DBNAME}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Replication discovery | If "show slave status" returns Master_Host, "Replication: *" items are created. |
Zabbix agent | mysql.replication.discovery["{$MYSQL.HOST}","{$MYSQL.PORT}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
MySQL: Replication Slave status {#MASTER_HOST} | The item gets status information on the essential parameters of the slave threads. |
Zabbix agent | mysql.slave_status["{$MYSQL.HOST}","{$MYSQL.PORT}","{#MASTER_HOST}"] |
MySQL: Replication Slave SQL Running State {#MASTER_HOST} | This shows the state of the SQL driver threads. |
Dependent item | mysql.slave_sql_running_state["{#MASTER_HOST}"] Preprocessing
|
MySQL: Replication Seconds Behind Master {#MASTER_HOST} | The number of seconds that the slave SQL thread is behind processing the master binary log. A high number (or an increasing one) can indicate that the slave is unable to handle events from the master in a timely fashion. |
Dependent item | mysql.seconds_behind_master["{#MASTER_HOST}"] Preprocessing
|
MySQL: Replication Slave IO Running {#MASTER_HOST} | Whether the I/O thread for reading the master's binary log is running. Normally, you want this to be Yes unless you have not yet started replication or have explicitly stopped it with STOP SLAVE. |
Dependent item | mysql.slave_io_running["{#MASTER_HOST}"] Preprocessing
|
MySQL: Replication Slave SQL Running {#MASTER_HOST} | Whether the SQL thread for executing events in the relay log is running. As with the I/O thread, this should normally be Yes. |
Dependent item | mysql.slave_sql_running["{#MASTER_HOST}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
MySQL: Replication lag is too high | Replication delay is too long. |
min(/MySQL by Zabbix agent/mysql.seconds_behind_master["{#MASTER_HOST}"],5m)>{$MYSQL.REPL_LAG.MAX.WARN} |Warning |
||
MySQL: The slave I/O thread is not running | Whether the I/O thread for reading the master's binary log is running. |
count(/MySQL by Zabbix agent/mysql.slave_io_running["{#MASTER_HOST}"],#1,"eq","No")=1 |Average |
||
MySQL: The slave I/O thread is not connected to a replication master | Whether the slave I/O thread is connected to the master. |
count(/MySQL by Zabbix agent/mysql.slave_io_running["{#MASTER_HOST}"],#1,"ne","Yes")=1 |Warning |
Depends on:
|
|
MySQL: The SQL thread is not running | Whether the SQL thread for executing events in the relay log is running. |
count(/MySQL by Zabbix agent/mysql.slave_sql_running["{#MASTER_HOST}"],#1,"eq","No")=1 |Warning |
Depends on:
|
Name | Description | Type | Key and additional info |
---|---|---|---|
MariaDB discovery | Used for additional metrics if MariaDB is used. |
Dependent item | mysql.extra_metric.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
MySQL: Binlog commits | Total number of transactions committed to the binary log. |
Dependent item | mysql.binlog_commits[{#SINGLETON}] Preprocessing
|
MySQL: Binlog group commits | Total number of group commits done to the binary log. |
Dependent item | mysql.binlog_group_commits[{#SINGLETON}] Preprocessing
|
MySQL: Master GTID wait count | The number of times MASTER_GTID_WAIT has been called. |
Dependent item | mysql.master_gtid_wait_count[{#SINGLETON}] Preprocessing
|
MySQL: Master GTID wait time | Total time, in microseconds, spent in MASTER_GTID_WAIT. |
Dependent item | mysql.master_gtid_wait_time[{#SINGLETON}] Preprocessing
|
MySQL: Master GTID wait timeouts | Number of timeouts occurring in MASTER_GTID_WAIT. |
Dependent item | mysql.master_gtid_wait_timeouts[{#SINGLETON}] Preprocessing
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for the effortless deployment of MSSQL monitoring by Zabbix via ODBC and doesn't require any external scripts.
Zabbix version: 6.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
View Server State and View Any Definition permissions should be granted to the user.
Grant this user read permissions to the sysjobschedules, sysjobhistory, and sysjobs tables.
For example, using T-SQL commands:
GRANT SELECT ON OBJECT::msdb.dbo.sysjobs TO zbx_monitor;
GRANT SELECT ON OBJECT::msdb.dbo.sysjobservers TO zbx_monitor;
GRANT SELECT ON OBJECT::msdb.dbo.sysjobactivity TO zbx_monitor;
GRANT EXECUTE ON OBJECT::msdb.dbo.agent_datetime TO zbx_monitor;
For more information, see MSSQL documentation:
Configure a User to Create and Manage SQL Server Agent Jobs
Set the username and password in the host macros {$MSSQL.USER} and {$MSSQL.PASSWORD}.
Do not forget to install the Microsoft ODBC driver on Zabbix server or Zabbix proxy and specify the data source name in the macro {$MSSQL.DSN}.
See Microsoft documentation for instructions: https://docs.microsoft.com/en-us/sql/connect/odbc/linux-mac/installing-the-microsoft-odbc-driver-for-sql-server?view=sql-server-ver16.
Note! Credentials in the odbc.ini do not work for MSSQL.
The Service's TCP port state
item uses the {HOST.CONN}
and {$MSSQL.PORT}
macros to check the availability of the MSSQL instance. Keep in mind that if dynamic ports are used on the MSSQL server side, this check will not work correctly.
If your instance uses a non-default TCP port, set the port in your DSN section of odbc.ini, in the line Server = &lt;IP or FQDN&gt;,&lt;port&gt;.
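For reference, a DSN section in odbc.ini with an explicit port might look like the following sketch (the section name, driver name, and address are placeholders - match them to your environment and installed driver version):

```ini
[mssql_monitoring]
Driver = ODBC Driver 18 for SQL Server
Server = 192.0.2.10,1433
```

The value of {$MSSQL.DSN} must match the section name; credentials still come from the {$MSSQL.USER} and {$MSSQL.PASSWORD} macros, not from odbc.ini.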
Note: You can use the context macros {$MSSQL.BACKUP_FULL.USED}, {$MSSQL.BACKUP_LOG.USED}, and {$MSSQL.BACKUP_DIFF.USED} to disable backup age triggers for a certain database. If set to a value other than "1", the trigger expression for the backup age will not fire.
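For example, to stop the differential backup age triggers from firing for a single database, override the corresponding flag with a database-name context on the host (mydb is a hypothetical database name; the discovered {#DBNAME} is substituted as the macro context):

```
{$MSSQL.BACKUP_DIFF.USED:"mydb"} = 0
```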
Name | Description | Default |
---|---|---|
{$MSSQL.DSN} | System data source name. | <Put your DSN here> |
{$MSSQL.USER} | MSSQL username. | <Put your username here> |
{$MSSQL.PASSWORD} | MSSQL user password. | <Put your password here> |
{$MSSQL.PORT} | MSSQL TCP port. | 1433 |
{$MSSQL.DBNAME.MATCHES} | This macro is used in database discovery. It can be overridden on the host or linked template level. | .* |
{$MSSQL.DBNAME.NOT_MATCHES} | This macro is used in database discovery. It can be overridden on the host or linked template level. | master|tempdb|model|msdb |
{$MSSQL.WORK_FILES.MAX} | The maximum number of work files created per second - for the trigger expression. | 20 |
{$MSSQL.WORK_TABLES.MAX} | The maximum number of work tables created per second - for the trigger expression. | 20 |
{$MSSQL.WORKTABLES_FROM_CACHE_RATIO.MIN.CRIT} | The minimum percentage of work tables from the cache ratio - for the High trigger expression. | 90 |
{$MSSQL.BUFFER_CACHE_RATIO.MIN.CRIT} | The minimum buffer cache hit ratio, in percent - for the High trigger expression. | 30 |
{$MSSQL.BUFFER_CACHE_RATIO.MIN.WARN} | The minimum buffer cache hit ratio, in percent - for the Warning trigger expression. | 50 |
{$MSSQL.FREE_LIST_STALLS.MAX} | The maximum number of free list stalls per second - for the trigger expression. | 2 |
{$MSSQL.LAZY_WRITES.MAX} | The maximum number of lazy writes per second - for the trigger expression. | 20 |
{$MSSQL.PAGE_LIFE_EXPECTANCY.MIN} | The minimum page life expectancy - for the trigger expression. | 300 |
{$MSSQL.PAGE_READS.MAX} | The maximum number of page reads per second - for the trigger expression. | 90 |
{$MSSQL.PAGE_WRITES.MAX} | The maximum number of page writes per second - for the trigger expression. | 90 |
{$MSSQL.AVERAGE_WAIT_TIME.MAX} | The maximum average wait time, in milliseconds - for the trigger expression. | 500 |
{$MSSQL.LOCK_REQUESTS.MAX} | The maximum number of lock requests per second - for the trigger expression. | 1000 |
{$MSSQL.LOCK_TIMEOUTS.MAX} | The maximum number of lock timeouts per second - for the trigger expression. | 1 |
{$MSSQL.DEADLOCKS.MAX} | The maximum number of deadlocks per second - for the trigger expression. | 1 |
{$MSSQL.LOG_FLUSH_WAITS.MAX} | The maximum number of log flush waits per second - for the trigger expression. | 1 |
{$MSSQL.LOG_FLUSH_WAIT_TIME.MAX} | The maximum log flush wait time, in milliseconds - for the trigger expression. | 1 |
{$MSSQL.PERCENT_LOG_USED.MAX} | The maximum percentage of log used - for the trigger expression. | 80 |
{$MSSQL.PERCENT_COMPILATIONS.MAX} | The maximum percentage of Transact-SQL compilations - for the trigger expression. | 10 |
{$MSSQL.PERCENT_RECOMPILATIONS.MAX} | The maximum percentage of Transact-SQL recompilations - for the trigger expression. | 10 |
{$MSSQL.PERCENT_READAHEAD.MAX} | The maximum percentage of pages read per second in anticipation of use - for the trigger expression. | 20 |
{$MSSQL.BACKUP_DIFF.CRIT} | The maximum number of days without a differential backup - for the High trigger expression. | 6d |
{$MSSQL.BACKUP_DIFF.WARN} | The maximum number of days without a differential backup - for the Warning trigger expression. | 3d |
{$MSSQL.BACKUP_FULL.CRIT} | The maximum number of days without a full backup - for the High trigger expression. | 10d |
{$MSSQL.BACKUP_FULL.WARN} | The maximum number of days without a full backup - for the Warning trigger expression. | 9d |
{$MSSQL.BACKUP_LOG.CRIT} | The maximum time without a log backup - for the High trigger expression. | 8h |
{$MSSQL.BACKUP_LOG.WARN} | The maximum time without a log backup - for the Warning trigger expression. | 4h |
{$MSSQL.JOB.MATCHES} | This macro is used in job discovery. It can be overridden on the host or linked template level. | .* |
{$MSSQL.JOB.NOT_MATCHES} | This macro is used in job discovery. It can be overridden on the host or linked template level. | CHANGE_IF_NEEDED |
{$MSSQL.BACKUP_DURATION.WARN} | The maximum job duration - for the Warning trigger expression. | 1h |
{$MSSQL.BACKUP_FULL.USED} | The flag for checking the age of a full backup. If set to a value other than "1", the trigger expression for the full backup age will not fire. Can be used with context for the database name. | 1 |
{$MSSQL.BACKUP_LOG.USED} | The flag for checking the age of a log backup. If set to a value other than "1", the trigger expression for the log backup age will not fire. Can be used with context for the database name. | 1 |
{$MSSQL.BACKUP_DIFF.USED} | The flag for checking the age of a differential backup. If set to a value other than "1", the trigger expression for the differential backup age will not fire. Can be used with context for the database name. | 1 |
{$MSSQL.QUORUM.MEMBER.DISCOVERY.NAME.MATCHES} | Filter to include discovered quorum members by name. | .* |
{$MSSQL.QUORUM.MEMBER.DISCOVERY.NAME.NOT_MATCHES} | Filter to exclude discovered quorum members by name. | CHANGE_IF_NEEDED |
Name | Description | Type | Key and additional info |
---|---|---|---|
MSSQL: Service's TCP port state | Test the availability of MSSQL Server on a TCP port. |
Simple check | net.tcp.service[tcp,{HOST.CONN},{$MSSQL.PORT}] Preprocessing
|
MSSQL: Get last backup | The item gets information about backup processes. |
Database monitor | db.odbc.get[getlastbackup,"{$MSSQL.DSN}"] |
MSSQL: Get job status | The item gets the SQL agent job status. |
Database monitor | db.odbc.get[getjobstatus,"{$MSSQL.DSN}"] |
MSSQL: Get performance counters | The item gets server global status information. |
Database monitor | db.odbc.get[getstatusvariables,"{$MSSQL.DSN}"] |
MSSQL: Get availability groups | The item gets availability group states - name, primary and secondary health, synchronization health. |
Database monitor | db.odbc.get[getavailabilitygroup,"{$MSSQL.DSN}"] |
MSSQL: Get local DB | Getting the states of the local availability database. |
Database monitor | db.odbc.get[getlocaldb,"{$MSSQL.DSN}"] |
MSSQL: Get DB mirroring | Getting DB mirroring. |
Database monitor | db.odbc.get[getdbmirroring,"{$MSSQL.DSN}"] |
MSSQL: Get non-local DB | Getting the non-local availability database. |
Database monitor | db.odbc.get[getnonlocal_db,"{$MSSQL.DSN}"] |
MSSQL: Get replica | Getting the database replica. |
Database monitor | db.odbc.get[get_replica,"{$MSSQL.DSN}"] |
MSSQL: Get quorum | Getting quorum - cluster name, type, and state. |
Database monitor | db.odbc.get[get_quorum,"{$MSSQL.DSN}"] |
MSSQL: Get quorum member | Getting quorum members - member name, type, state, and number of quorum votes. |
Database monitor | db.odbc.get[getquorummember,"{$MSSQL.DSN}"] |
MSSQL: Get database | Getting databases - database name and recovery model. |
Database monitor | db.odbc.get[get_database,"{$MSSQL.DSN}"] |
MSSQL: Version | MSSQL Server version. |
Dependent item | mssql.version Preprocessing
|
MSSQL: Uptime | MSSQL Server uptime in the format "N days, hh:mm:ss". |
Dependent item | mssql.uptime Preprocessing
|
MSSQL: Get Access Methods counters | The item gets server information about access methods. |
Dependent item | mssql.access_methods.raw Preprocessing
|
MSSQL: Forwarded records per second | Number of records per second fetched through forwarded record pointers. |
Dependent item | mssql.forwardedrecordssec.rate Preprocessing
|
MSSQL: Full scans per second | Number of unrestricted full scans per second. These can be either base-table or full-index scans. Values greater than 1 or 2 indicate table/index page scans. If this is combined with high CPU, the counter requires further investigation; otherwise, if the full scans are on small tables, it can be ignored. |
Dependent item | mssql.fullscanssec.rate Preprocessing
|
MSSQL: Index searches per second | Number of index searches per second. These are used to start a range scan, reposition a range scan, revalidate a scan point, fetch a single index record, and search down the index to locate where to insert a new row. |
Dependent item | mssql.indexsearchessec.rate Preprocessing
|
MSSQL: Page splits per second | Number of page splits per second that occur as a result of overflowing index pages. |
Dependent item | mssql.pagesplitssec.rate Preprocessing
|
MSSQL: Work files created per second | Number of work files created per second. For example, work files can be used to store temporary results for hash joins and hash aggregates. |
Dependent item | mssql.workfilescreatedsec.rate Preprocessing
|
MSSQL: Work tables created per second | Number of work tables created per second. For example, work tables can be used to store temporary results for query spool, LOB variables, XML variables, and cursors. |
Dependent item | mssql.worktablescreatedsec.rate Preprocessing
|
MSSQL: Table lock escalations per second | Number of times locks on a table were escalated to the TABLE or HoBT granularity. |
Dependent item | mssql.tablelockescalations.rate Preprocessing
|
MSSQL: Worktables from cache ratio | Percentage of work tables created where the initial two pages of the work table were not allocated but were immediately available from the work table cache. |
Dependent item | mssql.worktablesfromcache_ratio Preprocessing
|
MSSQL: Get Buffer Manager counters | The item gets server information about the buffer pool. |
Dependent item | mssql.buffer_manager.raw Preprocessing
|
MSSQL: Buffer cache hit ratio | Indicates the percentage of pages found in the buffer cache without having to read from the disk. The ratio is the total number of cache hits divided by the total number of cache lookups over the last few thousand page accesses. After a long period of time, the ratio changes very little. Since reading from the cache is much less expensive than reading from the disk, a higher value is preferred for this item. To increase the buffer cache hit ratio, consider increasing the amount of memory available to MSSQL Server or using the buffer pool extension feature. |
Dependent item | mssql.buffercachehit_ratio Preprocessing
|
MSSQL: Checkpoint pages per second | Indicates the number of pages flushed to the disk per second by a checkpoint or other operation which required all dirty pages to be flushed. |
Dependent item | mssql.checkpointpagessec.rate Preprocessing
|
MSSQL: Database pages | Indicates the number of pages in the buffer pool with database content. |
Dependent item | mssql.database_pages Preprocessing
|
MSSQL: Free list stalls per second | Indicates the number of requests per second that had to wait for a free page. |
Dependent item | mssql.freeliststalls_sec.rate Preprocessing
|
MSSQL: Lazy writes per second | Indicates the number of buffers written per second by the buffer manager's lazy writer. The lazy writer is a system process that flushes out batches of dirty, aged buffers (buffers that contain changes that must be written back to the disk before the buffer can be reused for a different page) and makes them available to user processes. The lazy writer eliminates the need to perform frequent checkpoints in order to create available buffers. |
Dependent item | mssql.lazywritessec.rate Preprocessing
|
MSSQL: Page life expectancy | Indicates the number of seconds a page will stay in the buffer pool without references. |
Dependent item | mssql.pagelifeexpectancy Preprocessing
|
MSSQL: Page lookups per second | Indicates the number of requests per second to find a page in the buffer pool. |
Dependent item | mssql.pagelookupssec.rate Preprocessing
|
MSSQL: Page reads per second | Indicates the number of physical database page reads that are issued per second. This statistic displays the total number of physical page reads across all databases. As physical I/O is expensive, you may be able to minimize the cost either by using a larger data cache, intelligent indexes, and more efficient queries, or by changing the database design. |
Dependent item | mssql.pagereadssec.rate Preprocessing
|
MSSQL: Page writes per second | Indicates the number of physical database page writes that are issued per second. |
Dependent item | mssql.pagewritessec.rate Preprocessing
|
MSSQL: Read-ahead pages per second | Indicates the number of pages read per second in anticipation of use. |
Dependent item | mssql.readaheadpagessec.rate Preprocessing
|
MSSQL: Target pages | The optimal number of pages in the buffer pool. |
Dependent item | mssql.target_pages Preprocessing
|
MSSQL: Get DB counters | The item gets summary information about databases. |
Dependent item | mssql.db_info.raw Preprocessing
|
MSSQL: Total data file size | Total size of all data files. |
Dependent item | mssql.datafilessize Preprocessing
|
MSSQL: Total log file size | Total size of all the transaction log files. |
Dependent item | mssql.logfilessize Preprocessing
|
MSSQL: Total log file used size | The cumulative size of all the log files in the database. |
Dependent item | mssql.logfilesused_size Preprocessing
|
MSSQL: Total transactions per second | Total number of transactions started for all databases per second. |
Dependent item | mssql.transactions_sec.rate Preprocessing
|
MSSQL: Get General Statistics counters | The item gets general statistics information. |
Dependent item | mssql.general_statistics.raw Preprocessing
|
MSSQL: Logins per second | Total number of logins started per second. This does not include pooled connections. Any value over 2 may indicate insufficient connection pooling. |
Dependent item | mssql.logins_sec.rate Preprocessing
|
MSSQL: Logouts per second | Total number of logout operations started per second. Any value over 2 may indicate insufficient connection pooling. |
Dependent item | mssql.logouts_sec.rate Preprocessing
|
MSSQL: Number of blocked processes | Number of currently blocked processes. |
Dependent item | mssql.processes_blocked Preprocessing
|
MSSQL: Number of users connected | Number of users connected to MSSQL Server. |
Dependent item | mssql.user_connections Preprocessing
|
MSSQL: Average latch wait time | Average latch wait time (in milliseconds) for latch requests that had to wait. |
Calculated | mssql.averagelatchwait_time |
MSSQL: Get Latches counters | The item gets server information about latches. |
Dependent item | mssql.latches_info.raw Preprocessing
|
MSSQL: Average latch wait time raw | Average latch wait time (in milliseconds) for latch requests that had to wait. |
Dependent item | mssql.averagelatchwaittimeraw Preprocessing
|
MSSQL: Average latch wait time base | For internal use only. |
Dependent item | mssql.averagelatchwaittimebase Preprocessing
|
MSSQL: Latch waits per second | The number of latch requests that could not be granted immediately. Latches are lightweight means of holding a very transient server resource, such as an address in memory. |
Dependent item | mssql.latchwaitssec.rate Preprocessing
|
MSSQL: Total latch wait time | Total latch wait time (in milliseconds) for latch requests in the last second. This value should stay stable compared to the number of latch waits per second. |
Dependent item | mssql.totallatchwait_time Preprocessing
|
MSSQL: Total average wait time | The average wait time, in milliseconds, for each lock request that had to wait. |
Calculated | mssql.averagewaittime |
MSSQL: Get Locks counters | The item gets server information about locks. |
Dependent item | mssql.locks_info.raw Preprocessing
|
MSSQL: Total average wait time raw | Average amount of wait time (in milliseconds) for each lock request that resulted in a wait. Information for all locks. |
Dependent item | mssql.averagewaittime_raw Preprocessing
|
MSSQL: Total average wait time base | For internal use only. |
Dependent item | mssql.averagewaittime_base Preprocessing
|
MSSQL: Total lock requests per second | Number of new locks and lock conversions per second requested from the lock manager. |
Dependent item | mssql.lockrequestssec.rate Preprocessing
|
MSSQL: Total lock requests per second that timed out | Number of timed out lock requests per second, including requests for NOWAIT locks. |
Dependent item | mssql.locktimeoutssec.rate Preprocessing
|
MSSQL: Total lock requests per second that required waiting | Number of lock requests per second that required the caller to wait. |
Dependent item | mssql.lockwaitssec.rate Preprocessing
|
MSSQL: Lock wait time | Average of total wait time (in milliseconds) for locks in the last second. |
Dependent item | mssql.lockwaittime Preprocessing
|
MSSQL: Total lock requests per second that have deadlocks | Number of lock requests per second that resulted in a deadlock. |
Dependent item | mssql.numberdeadlockssec.rate Preprocessing
|
MSSQL: Get Memory counters | The item gets memory information. |
Dependent item | mssql.mem_manager.raw Preprocessing
|
MSSQL: Granted Workspace Memory | Specifies the total amount of memory currently granted to executing processes, such as hash, sort, bulk copy, and index creation operations. |
Dependent item | mssql.grantedworkspacememory Preprocessing
|
MSSQL: Maximum workspace memory | Indicates the maximum amount of memory available for executing processes, such as hash, sort, bulk copy, and index creation operations. |
Dependent item | mssql.maximumworkspacememory Preprocessing
|
MSSQL: Memory grants outstanding | Specifies the total number of processes that have successfully acquired a workspace memory grant. |
Dependent item | mssql.memorygrantsoutstanding Preprocessing
|
MSSQL: Memory grants pending | Specifies the total number of processes waiting for a workspace memory grant. |
Dependent item | mssql.memorygrantspending Preprocessing
|
MSSQL: Target server memory | Indicates the ideal amount of memory the server can consume. |
Dependent item | mssql.targetservermemory Preprocessing
|
MSSQL: Total server memory | Specifies the amount of memory the server has committed using the memory manager. |
Dependent item | mssql.totalservermemory Preprocessing
|
MSSQL: Get Cache counters | The item gets server information about cache. |
Dependent item | mssql.cache_info.raw Preprocessing
|
MSSQL: Cache hit ratio | Ratio between cache hits and lookups. |
Dependent item | mssql.cachehitratio Preprocessing
|
MSSQL: Cache object counts | Number of cache objects in the cache. |
Dependent item | mssql.cacheobjectcounts Preprocessing
|
MSSQL: Cache objects in use | Number of cache objects in use. |
Dependent item | mssql.cacheobjectsin_use Preprocessing
|
MSSQL: Cache pages | Number of 8-kilobyte (KB) pages used by cache objects. |
Dependent item | mssql.cache_pages Preprocessing
|
MSSQL: Get SQL Errors counters | The item gets SQL error information. |
Dependent item | mssql.sql_errors.raw Preprocessing
|
MSSQL: Errors per second (DB offline errors) | Number of errors per second. |
Dependent item | mssql.offlineerrorssec.rate Preprocessing
|
MSSQL: Errors per second (Info errors) | Number of errors per second. |
Dependent item | mssql.infoerrorssec.rate Preprocessing
|
MSSQL: Errors per second (Kill connection errors) | Number of errors per second. |
Dependent item | mssql.killconnectionerrors_sec.rate Preprocessing
|
MSSQL: Errors per second (User errors) | Number of errors per second. |
Dependent item | mssql.usererrorssec.rate Preprocessing
|
MSSQL: Total errors per second | Number of errors per second. |
Dependent item | mssql.errors_sec.rate Preprocessing
|
MSSQL: Get SQL Statistics counters | The item gets SQL statistics information. |
Dependent item | mssql.sql_statistics.raw Preprocessing
|
MSSQL: Auto-param attempts per second | Number of auto-parameterization attempts per second. The total should be the sum of the failed, safe, and unsafe auto-parameterizations. Auto-parameterization occurs when an instance of SQL Server tries to parameterize a Transact-SQL request by replacing some literals with parameters so that reuse of the resulting cached execution plan across multiple similar-looking requests is possible. Note that auto-parameterizations are also known as simple parameterizations in the newer versions of SQL Server. This counter does not include forced parameterizations. |
Dependent item | mssql.autoparamattemptssec.rate Preprocessing
|
MSSQL: Batch requests per second | Number of Transact-SQL command batches received per second. This statistic is affected by all constraints (such as I/O, number of users, cache size, complexity of requests, and so on). High batch requests mean good throughput. |
Dependent item | mssql.batchrequestssec.rate Preprocessing
|
MSSQL: Percent of ad hoc queries running | The ratio of SQL compilations per second to batch requests per second, in percent. |
Calculated | mssql.percentofadhoc_queries |
MSSQL: Percent of Recompiled Transact-SQL Objects | The ratio of SQL re-compilations per second to SQL compilations per second, in percent. |
Calculated | mssql.percentrecompilationsto_compilations |
MSSQL: Full scans to Index searches ratio | The ratio of full scans per second to index searches per second. The threshold recommendation is strictly for OLTP workloads. |
Calculated | mssql.scantosearch |
MSSQL: Failed auto-params per second | Number of failed auto-parameterization attempts per second. This number should be small. Note that auto-parameterizations are also known as simple parameterizations in the newer versions of SQL Server. |
Dependent item | mssql.failedautoparamssec.rate Preprocessing
|
MSSQL: Safe auto-params per second | Number of safe auto-parameterization attempts per second. Safe refers to a determination that a cached execution plan can be shared between different similar-looking Transact-SQL statements. SQL Server makes many auto-parameterization attempts, some of which turn out to be safe and others fail. Note that auto-parameterizations are also known as simple parameterizations in the newer versions of SQL Server. This does not include forced parameterizations. |
Dependent item | mssql.safeautoparamssec.rate Preprocessing
|
MSSQL: SQL compilations per second | Number of SQL compilations per second. Indicates the number of times the compile code path is entered. Includes runs caused by statement-level recompilations in SQL Server. After SQL Server user activity is stable, this value reaches a steady state. |
Dependent item | mssql.sqlcompilationssec.rate Preprocessing
|
MSSQL: SQL re-compilations per second | Number of statement recompiles per second. Counts the number of times statement recompiles are triggered. Generally, you want the recompiles to be low. |
Dependent item | mssql.sqlrecompilationssec.rate Preprocessing
|
MSSQL: Unsafe auto-params per second | Number of unsafe auto-parameterization attempts per second. For example, the query has some characteristics that prevent the cached plan from being shared. These are designated as unsafe. This does not count the number of forced parameterizations. |
Dependent item | mssql.unsafeautoparamssec.rate Preprocessing
|
MSSQL: Total transactions number | The number of currently active transactions of all types. |
Dependent item | mssql.transactions Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
MSSQL: Service is unavailable | The TCP port of the MSSQL Server service is currently unavailable. |
last(/MSSQL by ODBC/net.tcp.service[tcp,{HOST.CONN},{$MSSQL.PORT}])=0 |Disaster |
||
MSSQL: Version has changed | MSSQL version has changed. Acknowledge to close the problem manually. |
last(/MSSQL by ODBC/mssql.version,#1)<>last(/MSSQL by ODBC/mssql.version,#2) and length(last(/MSSQL by ODBC/mssql.version))>0 |Info |
Manual close: Yes | |
MSSQL: Service has been restarted | Uptime is less than 10 minutes. |
last(/MSSQL by ODBC/mssql.uptime)<10m |Info |
Manual close: Yes | |
MSSQL: Failed to fetch info data | Zabbix has not received any data for items for the last 30 minutes. |
nodata(/MSSQL by ODBC/mssql.uptime,30m)=1 |Info |
Depends on:
|
|
MSSQL: Too frequently using pointers | Rows with VARCHAR columns can experience expansion when VARCHAR values are updated with a longer string. In the case where the row cannot fit in the existing page, the row migrates, and access to the row will traverse a pointer. This only happens on heaps (tables without clustered indexes). In cases where clustered indexes cannot be used, drop non-clustered indexes, build a clustered index to reorg pages and rows, drop the clustered index, then recreate non-clustered indexes. |
last(/MSSQL by ODBC/mssql.forwarded_records_sec.rate) * 100 > 10 * last(/MSSQL by ODBC/mssql.batch_requests_sec.rate) |Warning |
||
MSSQL: Number of work files created per second is high | Too many work files created per second to store temporary results for hash joins and hash aggregates. |
min(/MSSQL by ODBC/mssql.workfiles_created_sec.rate,5m)>{$MSSQL.WORK_FILES.MAX} |Average |
||
MSSQL: Number of work tables created per second is high | Too many work tables created per second to store temporary results for query spool, LOB variables, XML variables, and cursors. |
min(/MSSQL by ODBC/mssql.worktables_created_sec.rate,5m)>{$MSSQL.WORK_TABLES.MAX} |Average |
||
MSSQL: Percentage of work tables available from the work table cache is low | A value less than 90% may indicate insufficient memory, since execution plans are being dropped, or, on 32-bit systems, may indicate the need for an upgrade to a 64-bit system. |
max(/MSSQL by ODBC/mssql.worktables_from_cache_ratio,5m)<{$MSSQL.WORKTABLES_FROM_CACHE_RATIO.MIN.CRIT} |High |
||
MSSQL: Percentage of the buffer cache efficiency is low | The buffer cache hit ratio is too low. |
max(/MSSQL by ODBC/mssql.buffer_cache_hit_ratio,5m)<{$MSSQL.BUFFER_CACHE_RATIO.MIN.CRIT} |High |
||
MSSQL: Percentage of the buffer cache efficiency is low | Low buffer cache hit ratio. |
max(/MSSQL by ODBC/mssql.buffer_cache_hit_ratio,5m)<{$MSSQL.BUFFER_CACHE_RATIO.MIN.WARN} |Warning |
Depends on:
|
|
MSSQL: Number of rps waiting for a free page is high | Some requests have to wait for a free page. |
min(/MSSQL by ODBC/mssql.free_list_stalls_sec.rate,5m)>{$MSSQL.FREE_LIST_STALLS.MAX} |Warning |
||
MSSQL: Number of buffers written per second by the lazy writer is high | The number of buffers written per second by the buffer manager's lazy writer exceeds the threshold. |
min(/MSSQL by ODBC/mssql.lazy_writes_sec.rate,5m)>{$MSSQL.LAZY_WRITES.MAX} |Warning |
||
MSSQL: Page life expectancy is low | The page stays in the buffer pool without references for less time than the threshold value. |
max(/MSSQL by ODBC/mssql.page_life_expectancy,15m)<{$MSSQL.PAGE_LIFE_EXPECTANCY.MIN} |High |
||
MSSQL: Number of physical database page reads per second is high | The physical database page reads are issued too frequently. |
min(/MSSQL by ODBC/mssql.page_reads_sec.rate,5m)>{$MSSQL.PAGE_READS.MAX} |Warning |
||
MSSQL: Number of physical database page writes per second is high | The physical database page writes are issued too frequently. |
min(/MSSQL by ODBC/mssql.page_writes_sec.rate,5m)>{$MSSQL.PAGE_WRITES.MAX} |Warning |
||
MSSQL: Too many physical reads occurring | If this value makes up even a sizeable minority of the total "Page Reads/sec" (say, greater than 20% of the total page reads), you may have too many physical reads occurring. |
last(/MSSQL by ODBC/mssql.readahead_pages_sec.rate) > {$MSSQL.PERCENT_READAHEAD.MAX} / 100 * last(/MSSQL by ODBC/mssql.page_reads_sec.rate) |Warning |
||
MSSQL: Total average wait time for locks is high | An average wait time longer than 500 ms may indicate excessive blocking. This value should generally correlate to "Lock Waits/sec" and move up or down with it accordingly. |
min(/MSSQL by ODBC/mssql.average_wait_time,5m)>{$MSSQL.AVERAGE_WAIT_TIME.MAX} |Warning |
||
MSSQL: Total number of locks per second is high | Number of new locks and lock conversions per second requested from the lock manager is high. |
min(/MSSQL by ODBC/mssql.lock_requests_sec.rate,5m)>{$MSSQL.LOCK_REQUESTS.MAX} |Warning |
||
MSSQL: Total lock requests per second that timed out is high | The total number of timed out lock requests per second, including requests for NOWAIT locks, is high. |
min(/MSSQL by ODBC/mssql.lock_timeouts_sec.rate,5m)>{$MSSQL.LOCK_TIMEOUTS.MAX} |Warning |
||
MSSQL: Some blocking is occurring for 5m | Values greater than zero indicate at least some blocking is occurring, while a value of zero can quickly eliminate blocking as a potential root-cause problem. |
min(/MSSQL by ODBC/mssql.lock_waits_sec.rate,5m)>0 |Average |
||
MSSQL: Number of deadlocks is high | Too many deadlocks are occurring currently. |
min(/MSSQL by ODBC/mssql.number_deadlocks_sec.rate,5m)>{$MSSQL.DEADLOCKS.MAX} |Average |
||
MSSQL: Percent of ad hoc queries running is high | The lower this value is, the better. High values often indicate excessive ad hoc querying and should be as low as possible. If excessive ad hoc querying is happening, try rewriting the queries as procedures or invoke the queries using |
min(/MSSQL by ODBC/mssql.percent_of_adhoc_queries,15m) > {$MSSQL.PERCENT_COMPILATIONS.MAX} |Warning |
||
MSSQL: Percent of times statement recompiles is high | This number should be at or near zero, since recompiles can cause deadlocks and exclusive compile locks. This counter's value should follow in proportion to "Batch Requests/sec" and "SQL Compilations/sec". |
min(/MSSQL by ODBC/mssql.percent_recompilations_to_compilations,15m) > {$MSSQL.PERCENT_RECOMPILATIONS.MAX} |Warning |
||
MSSQL: Number of index and table scans exceeds index searches in the last 15m | Index searches are preferable to index and table scans. For OLTP applications, optimize for more index searches and fewer scans (preferably, 1 full scan for every 1000 index searches). Index and table scans are expensive I/O operations. |
min(/MSSQL by ODBC/mssql.scan_to_search,15m) > 0.001 |Warning |
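Most of the trigger expressions above are plain threshold comparisons on rates; the "Too many physical reads occurring" trigger, for instance, fires when the read-ahead rate exceeds {$MSSQL.PERCENT_READAHEAD.MAX} percent of the total page read rate. A small Python sketch of that comparison (the function name and the sample rates are hypothetical):

```python
def readahead_exceeds_threshold(readahead_pages_sec, page_reads_sec,
                                percent_max=20.0):
    """Mirrors the trigger expression
    last(readahead.rate) > PCT/100 * last(page_reads.rate):
    fire when read-ahead pages/sec exceed percent_max % of page reads/sec."""
    return readahead_pages_sec > percent_max / 100.0 * page_reads_sec

print(readahead_exceeds_threshold(30.0, 100.0))  # True: 30 > 20% of 100
print(readahead_exceeds_threshold(10.0, 100.0))  # False: 10 <= 20% of 100
```

Raising or lowering {$MSSQL.PERCENT_READAHEAD.MAX} on the host shifts this boundary without touching the trigger itself.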
Name | Description | Type | Key and additional info |
---|---|---|---|
Database discovery | Scanning databases in DBMS. |
Dependent item | mssql.database.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
MSSQL DB '{#DBNAME}': Get performance counters | The item gets server status information for {#DBNAME}. |
Dependent item | mssql.db.perf_raw["{#DBNAME}"] Preprocessing
|
MSSQL DB '{#DBNAME}': Get last backup | The item gets information about backup processes for {#DBNAME}. |
Dependent item | mssql.backup.raw["{#DBNAME}"] Preprocessing
|
MSSQL DB '{#DBNAME}': State | The current state of the database: 0 = Online; 1 = Restoring; 2 = Recovering (SQL Server 2008 and later); 3 = Recovery pending (SQL Server 2008 and later); 4 = Suspect; 5 = Emergency (SQL Server 2008 and later); 6 = Offline (SQL Server 2008 and later); 7 = Copying (Azure SQL Database Active Geo-Replication); 10 = Offline secondary (Azure SQL Database Active Geo-Replication). |
Dependent item | mssql.db.state["{#DBNAME}"] Preprocessing
|
MSSQL DB '{#DBNAME}': Active transactions | Number of active transactions for the database. |
Dependent item | mssql.db.active_transactions["{#DBNAME}"] Preprocessing
|
MSSQL DB '{#DBNAME}': Data file size | Cumulative size of all the data files in the database including any automatic growth. Monitoring this counter is useful, for example, for determining the correct size of |
Dependent item | mssql.db.datafilessize["{#DBNAME}"] Preprocessing
|
MSSQL DB '{#DBNAME}': Log bytes flushed per second | Total number of log bytes flushed per second. Useful for determining trends and utilization of the transaction log. |
Dependent item | mssql.db.logbytesflushed_sec.rate["{#DBNAME}"] Preprocessing
|
MSSQL DB '{#DBNAME}': Log file size | Cumulative size of all the transaction log files in the database. |
Dependent item | mssql.db.logfilessize["{#DBNAME}"] Preprocessing
|
MSSQL DB '{#DBNAME}': Log file used size | Cumulative size of all the log files in the database. |
Dependent item | mssql.db.logfilesused_size["{#DBNAME}"] Preprocessing
|
MSSQL DB '{#DBNAME}': Log flushes per second | Number of log flushes per second. |
Dependent item | mssql.db.log_flushes_sec.rate["{#DBNAME}"] Preprocessing
|
MSSQL DB '{#DBNAME}': Log flush waits per second | Number of commits per second waiting for the log flush. |
Dependent item | mssql.db.log_flush_waits_sec.rate["{#DBNAME}"] Preprocessing
|
MSSQL DB '{#DBNAME}': Log flush wait time | Total wait time (in milliseconds) to flush the log. On an Always On secondary database, this value indicates the wait time for log records to be hardened to disk. |
Dependent item | mssql.db.log_flush_wait_time["{#DBNAME}"] Preprocessing
|
MSSQL DB '{#DBNAME}': Log growths | Total number of times the transaction log for the database has been expanded. |
Dependent item | mssql.db.log_growths["{#DBNAME}"] Preprocessing
|
MSSQL DB '{#DBNAME}': Log shrinks | Total number of times the transaction log for the database has been shrunk. |
Dependent item | mssql.db.log_shrinks["{#DBNAME}"] Preprocessing
|
MSSQL DB '{#DBNAME}': Log truncations | Number of times the transaction log has been truncated. |
Dependent item | mssql.db.log_truncations["{#DBNAME}"] Preprocessing
|
MSSQL DB '{#DBNAME}': Percent log used | Percentage of log space in use. |
Dependent item | mssql.db.percent_log_used["{#DBNAME}"] Preprocessing
|
MSSQL DB '{#DBNAME}': Transactions per second | Number of transactions started for the database per second. |
Dependent item | mssql.db.transactions_sec.rate["{#DBNAME}"] Preprocessing
|
MSSQL DB '{#DBNAME}': Last diff backup duration | Duration of the last differential backup. |
Dependent item | mssql.backup.diff.duration["{#DBNAME}"] Preprocessing
|
MSSQL DB '{#DBNAME}': Last diff backup (time ago) | The amount of time since the last differential backup. |
Dependent item | mssql.backup.diff["{#DBNAME}"] Preprocessing
|
MSSQL DB '{#DBNAME}': Last full backup duration | Duration of the last full backup. |
Dependent item | mssql.backup.full.duration["{#DBNAME}"] Preprocessing
|
MSSQL DB '{#DBNAME}': Last full backup (time ago) | The amount of time since the last full backup. |
Dependent item | mssql.backup.full["{#DBNAME}"] Preprocessing
|
MSSQL DB '{#DBNAME}': Last log backup duration | Duration of the last log backup. |
Dependent item | mssql.backup.log.duration["{#DBNAME}"] Preprocessing
|
MSSQL DB '{#DBNAME}': Last log backup (time ago) | The amount of time since the last log backup. |
Dependent item | mssql.backup.log["{#DBNAME}"] Preprocessing
|
MSSQL DB '{#DBNAME}': Recovery model | Recovery model selected: 1 = Full 2 = Bulk_logged 3 = Simple |
Dependent item | mssql.backup.recovery_model["{#DBNAME}"] Preprocessing
|
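The per-database state and recovery model values above ultimately come from the `sys.databases` catalog view. As a sanity check, a query like the following (a sketch run in SSMS; not necessarily the exact query the template issues over ODBC) returns the raw codes the items report:

```sql
-- Numeric codes match the item value mappings above:
-- state: 0 = Online ... 6 = Offline; recovery model: 1 = Full, 2 = Bulk_logged, 3 = Simple
SELECT name,
       state,                -- mssql.db.state["{#DBNAME}"]
       state_desc,
       recovery_model,       -- mssql.backup.recovery_model["{#DBNAME}"]
       recovery_model_desc
FROM sys.databases;
```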
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
MSSQL DB '{#DBNAME}': State is {ITEM.VALUE} | The DB has a non-working state. |
last(/MSSQL by ODBC/mssql.db.state["{#DBNAME}"])>1 |High |
||
MSSQL DB '{#DBNAME}': Number of commits waiting for the log flush is high | Too many commits are waiting for the log flush. |
min(/MSSQL by ODBC/mssql.db.log_flush_waits_sec.rate["{#DBNAME}"],5m)>{$MSSQL.LOG_FLUSH_WAITS.MAX:"{#DBNAME}"} |Warning |
||
MSSQL DB '{#DBNAME}': Total wait time to flush the log is high | The wait time to flush the log is too long. |
min(/MSSQL by ODBC/mssql.db.log_flush_wait_time["{#DBNAME}"],5m)>{$MSSQL.LOG_FLUSH_WAIT_TIME.MAX:"{#DBNAME}"} |Warning |
||
MSSQL DB '{#DBNAME}': Percent of log usage is high | There's not enough space left in the log. |
min(/MSSQL by ODBC/mssql.db.percent_log_used["{#DBNAME}"],5m)>{$MSSQL.PERCENT_LOG_USED.MAX:"{#DBNAME}"} |Warning |
||
MSSQL DB '{#DBNAME}': Diff backup is old | The differential backup has not been executed for a long time. |
last(/MSSQL by ODBC/mssql.backup.diff["{#DBNAME}"])>{$MSSQL.BACKUP_DIFF.CRIT:"{#DBNAME}"} and {$MSSQL.BACKUP_DIFF.USED:"{#DBNAME}"}=1 |High |
Manual close: Yes | |
MSSQL DB '{#DBNAME}': Diff backup is old | The differential backup has not been executed for a long time. |
last(/MSSQL by ODBC/mssql.backup.diff["{#DBNAME}"])>{$MSSQL.BACKUP_DIFF.WARN:"{#DBNAME}"} and {$MSSQL.BACKUP_DIFF.USED:"{#DBNAME}"}=1 |Warning |
Manual close: Yes Depends on:
|
|
MSSQL DB '{#DBNAME}': Full backup is old | The full backup has not been executed for a long time. |
last(/MSSQL by ODBC/mssql.backup.full["{#DBNAME}"])>{$MSSQL.BACKUP_FULL.CRIT:"{#DBNAME}"} and {$MSSQL.BACKUP_FULL.USED:"{#DBNAME}"}=1 |High |
Manual close: Yes | |
MSSQL DB '{#DBNAME}': Full backup is old | The full backup has not been executed for a long time. |
last(/MSSQL by ODBC/mssql.backup.full["{#DBNAME}"])>{$MSSQL.BACKUP_FULL.WARN:"{#DBNAME}"} and {$MSSQL.BACKUP_FULL.USED:"{#DBNAME}"}=1 |Warning |
Manual close: Yes Depends on:
|
|
MSSQL DB '{#DBNAME}': Log backup is old | The log backup has not been executed for a long time. |
last(/MSSQL by ODBC/mssql.backup.log["{#DBNAME}"])>{$MSSQL.BACKUP_LOG.CRIT:"{#DBNAME}"} and {$MSSQL.BACKUP_LOG.USED:"{#DBNAME}"}=1 and last(/MSSQL by ODBC/mssql.backup.recovery_model["{#DBNAME}"])<>3 |High |
Manual close: Yes | |
MSSQL DB '{#DBNAME}': Log backup is old | The log backup has not been executed for a long time. |
last(/MSSQL by ODBC/mssql.backup.log["{#DBNAME}"])>{$MSSQL.BACKUP_LOG.WARN:"{#DBNAME}"} and {$MSSQL.BACKUP_LOG.USED:"{#DBNAME}"}=1 and last(/MSSQL by ODBC/mssql.backup.recovery_model["{#DBNAME}"])<>3 |Warning |
Manual close: Yes Depends on:
|
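The backup-age triggers above compare the time since the last backup of each type against the `{$MSSQL.BACKUP_*.CRIT}`/`{$MSSQL.BACKUP_*.WARN}` macros. Backup history lives in msdb; a query like this (illustrative only, not necessarily the template's own query) shows the values behind those triggers:

```sql
-- type: 'D' = full, 'I' = differential, 'L' = log
SELECT database_name,
       type,
       MAX(backup_finish_date) AS last_backup,
       DATEDIFF(SECOND, MAX(backup_finish_date), GETDATE()) AS seconds_ago
FROM msdb.dbo.backupset
GROUP BY database_name, type
ORDER BY database_name, type;
```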
Name | Description | Type | Key and additional info |
---|---|---|---|
Availability group discovery | Discovery of the existing availability groups. |
Dependent item | mssql.availability.group.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
MSSQL AG '{#GROUP_NAME}': Primary replica recovery health | Indicates the recovery health of the primary replica: 0 = In progress 1 = Online 2 = Unavailable |
Dependent item | mssql.primary_recovery_health["{#GROUP_NAME}"] Preprocessing
|
MSSQL AG '{#GROUP_NAME}': Primary replica name | Name of the server instance that is hosting the current primary replica. |
Dependent item | mssql.primary_replica["{#GROUP_NAME}"] Preprocessing
|
MSSQL AG '{#GROUP_NAME}': Secondary replica recovery health | Indicates the recovery health of a secondary replica: 0 = In progress 1 = Online 2 = Unavailable |
Dependent item | mssql.secondary_recovery_health["{#GROUP_NAME}"] Preprocessing
|
MSSQL AG '{#GROUP_NAME}': Synchronization health | Reflects a rollup of the synchronization health of all availability replicas in the availability group: 0 = Not healthy. None of the availability replicas have a healthy synchronization. 1 = Partially healthy. The synchronization of some, but not all, availability replicas is healthy. 2 = Healthy. The synchronization of every availability replica is healthy. |
Dependent item | mssql.synchronization_health["{#GROUP_NAME}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
MSSQL AG '{#GROUP_NAME}': Primary replica recovery health in progress | The primary replica is in the synchronization process. |
last(/MSSQL by ODBC/mssql.primary_recovery_health["{#GROUP_NAME}"])=0 |Warning |
||
MSSQL AG '{#GROUP_NAME}': Secondary replica recovery health in progress | The secondary replica is in the synchronization process. |
last(/MSSQL by ODBC/mssql.secondary_recovery_health["{#GROUP_NAME}"])=0 |Warning |
||
MSSQL AG '{#GROUP_NAME}': All replicas unhealthy | None of the availability replicas have a healthy synchronization. |
last(/MSSQL by ODBC/mssql.synchronization_health["{#GROUP_NAME}"])=0 |Disaster |
||
MSSQL AG '{#GROUP_NAME}': Some replicas unhealthy | The synchronization health of some, but not all, availability replicas is healthy. |
last(/MSSQL by ODBC/mssql.synchronization_health["{#GROUP_NAME}"])=1 |High |
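The availability group health values above correspond to columns of the `sys.dm_hadr_availability_group_states` DMV. A query like this (illustrative only) shows the same data with human-readable descriptions:

```sql
SELECT ag.name AS group_name,
       ags.primary_replica,                 -- mssql.primary_replica
       ags.primary_recovery_health_desc,    -- mssql.primary_recovery_health
       ags.secondary_recovery_health_desc,  -- mssql.secondary_recovery_health
       ags.synchronization_health_desc      -- mssql.synchronization_health
FROM sys.availability_groups ag
JOIN sys.dm_hadr_availability_group_states ags
  ON ag.group_id = ags.group_id;
```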
Name | Description | Type | Key and additional info |
---|---|---|---|
Local database discovery | Discovery of the local availability databases. |
Dependent item | mssql.local.db.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
MSSQL AG '{#GROUP_NAME}' Local DB '{#DBNAME}': State | 0 = Online 1 = Restoring 2 = Recovering 3 = Recovery pending 4 = Suspect 5 = Emergency 6 = Offline |
Dependent item | mssql.local_db.state["{#DBNAME}"] Preprocessing
|
MSSQL AG '{#GROUP_NAME}' Local DB '{#DBNAME}': Suspended | Database state: 0 = Resumed 1 = Suspended |
Dependent item | mssql.local_db.is_suspended["{#DBNAME}"] Preprocessing
|
MSSQL AG '{#GROUP_NAME}' Local DB '{#DBNAME}': Synchronization health | Reflects the intersection of the synchronization state of a database that is joined to the availability group on the availability replica and the availability mode of the availability replica (synchronous-commit or asynchronous-commit mode): 0 = Not healthy. The synchronization_state of the database is 0 ("Not synchronizing"). 1 = Partially healthy. A database on a synchronous-commit availability replica is considered partially healthy if synchronization_state is 1 ("Synchronizing"). 2 = Healthy. A database on a synchronous-commit availability replica is considered healthy if synchronization_state is 2 ("Synchronized"), and a database on an asynchronous-commit availability replica is considered healthy if synchronization_state is 1 ("Synchronizing"). |
Dependent item | mssql.local_db.synchronization_health["{#DBNAME}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
MSSQL AG '{#GROUP_NAME}' Local DB '{#DBNAME}': "{#DBNAME}" is {ITEM.VALUE} | The local availability database has a non-working state. |
last(/MSSQL by ODBC/mssql.local_db.state["{#DBNAME}"])>0 |Warning |
||
MSSQL AG '{#GROUP_NAME}' Local DB '{#DBNAME}': "{#DBNAME}" is Not healthy | The synchronization state of the local availability database is "Not synchronizing". |
last(/MSSQL by ODBC/mssql.local_db.synchronization_health["{#DBNAME}"])=0 |High |
||
MSSQL AG '{#GROUP_NAME}' Local DB '{#DBNAME}': "{#DBNAME}" is Partially healthy | A database on a synchronous-commit availability replica is considered partially healthy if synchronization state is "Synchronizing". |
last(/MSSQL by ODBC/mssql.local_db.synchronization_health["{#DBNAME}"])=1 |Average |
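Local availability database state, suspension flag, and synchronization health map to `sys.dm_hadr_database_replica_states` rows where `is_local = 1`. An illustrative query (not necessarily the template's exact one):

```sql
SELECT DB_NAME(drs.database_id)        AS dbname,
       drs.database_state_desc,          -- local DB state
       drs.is_suspended,                 -- 0 = Resumed, 1 = Suspended
       drs.synchronization_health_desc   -- 0/1/2 health rollup
FROM sys.dm_hadr_database_replica_states drs
WHERE drs.is_local = 1;
```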
Name | Description | Type | Key and additional info |
---|---|---|---|
Non-local database discovery | Discovery of the non-local (not local to SQL Server instance) availability databases. |
Dependent item | mssql.non.local.db.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
MSSQL AG '{#GROUP_NAME}' Non-Local DB '{#REPLICA_NAME}*{#DBNAME}': Log queue size | Amount of the log records of the primary database that has not been sent to the secondary databases. |
Dependent item | mssql.non-local_db.log_send_queue_size["{#GROUP_NAME}*{#REPLICA_NAME}*{#DBNAME}"] Preprocessing
|
MSSQL AG '{#GROUP_NAME}' Non-Local DB '{#REPLICA_NAME}*{#DBNAME}': Redo log queue size | Amount of log records in the log files of the secondary replica that has not yet been redone. |
Dependent item | mssql.non-local_db.redo_queue_size["{#GROUP_NAME}*{#REPLICA_NAME}*{#DBNAME}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
MSSQL AG '{#GROUP_NAME}' Non-Local DB '{#REPLICA_NAME}*{#DBNAME}': Log queue size is growing | The log records of the primary database are not sent to the secondary databases. |
last(/MSSQL by ODBC/mssql.non-local_db.log_send_queue_size["{#GROUP_NAME}*{#REPLICA_NAME}*{#DBNAME}"],#1)>last(/MSSQL by ODBC/mssql.non-local_db.log_send_queue_size["{#GROUP_NAME}*{#REPLICA_NAME}*{#DBNAME}"],#2) and last(/MSSQL by ODBC/mssql.non-local_db.log_send_queue_size["{#GROUP_NAME}*{#REPLICA_NAME}*{#DBNAME}"],#2)>last(/MSSQL by ODBC/mssql.non-local_db.log_send_queue_size["{#GROUP_NAME}*{#REPLICA_NAME}*{#DBNAME}"],#3) |High |
||
MSSQL AG '{#GROUP_NAME}' Non-Local DB '{#REPLICA_NAME}*{#DBNAME}': Redo log queue size is growing | The log records in the log files of the secondary replica have not yet been redone. |
last(/MSSQL by ODBC/mssql.non-local_db.redo_queue_size["{#GROUP_NAME}*{#REPLICA_NAME}*{#DBNAME}"],#1)>last(/MSSQL by ODBC/mssql.non-local_db.redo_queue_size["{#GROUP_NAME}*{#REPLICA_NAME}*{#DBNAME}"],#2) and last(/MSSQL by ODBC/mssql.non-local_db.redo_queue_size["{#GROUP_NAME}*{#REPLICA_NAME}*{#DBNAME}"],#2)>last(/MSSQL by ODBC/mssql.non-local_db.redo_queue_size["{#GROUP_NAME}*{#REPLICA_NAME}*{#DBNAME}"],#3) |High |
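The log send queue and redo queue sizes for non-local databases come from the same DMV, filtered to remote rows. A sketch (SQL Server reports both sizes in KB):

```sql
SELECT ar.replica_server_name,
       DB_NAME(drs.database_id) AS dbname,
       drs.log_send_queue_size,   -- KB not yet sent to the secondary databases
       drs.redo_queue_size        -- KB not yet redone on the secondary replica
FROM sys.dm_hadr_database_replica_states drs
JOIN sys.availability_replicas ar
  ON drs.replica_id = ar.replica_id
WHERE drs.is_local = 0;
```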
Name | Description | Type | Key and additional info |
---|---|---|---|
Quorum discovery | Discovery of the quorum of the WSFC cluster. |
Dependent item | mssql.quorum.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
MSSQL Cluster '{#CLUSTER_NAME}': Quorum type | Type of quorum used by this WSFC cluster, one of: 0 = Node Majority. This quorum configuration can sustain failures of half the nodes (rounding up) minus one. 1 = Node and Disk Majority. If the disk witness remains on line, this quorum configuration can sustain failures of half the nodes (rounding up). 2 = Node and File Share Majority. This quorum configuration works in a similar way to Node and Disk Majority, but uses a file-share witness instead of a disk witness. 3 = No Majority: Disk Only. If the quorum disk is online, this quorum configuration can sustain failures of all nodes except one. 4 = Unknown Quorum. Unknown quorum for the cluster. 5 = Cloud Witness. Cluster utilizes Microsoft Azure for quorum arbitration. If the cloud witness is available, the cluster can sustain the failure of half the nodes (rounding up). |
Dependent item | mssql.quorum.type.[{#CLUSTER_NAME}] Preprocessing
|
MSSQL Cluster '{#CLUSTER_NAME}': Quorum state | State of the WSFC quorum, one of: 0 = Unknown quorum state 1 = Normal quorum 2 = Forced quorum |
Dependent item | mssql.quorum.state.[{#CLUSTER_NAME}] Preprocessing
|
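Quorum type and state are exposed by the `sys.dm_hadr_cluster` DMV, which returns one row describing the WSFC cluster. An illustrative query:

```sql
SELECT cluster_name,
       quorum_type,       -- mssql.quorum.type.[{#CLUSTER_NAME}]
       quorum_type_desc,
       quorum_state,      -- mssql.quorum.state.[{#CLUSTER_NAME}]
       quorum_state_desc
FROM sys.dm_hadr_cluster;
```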
Name | Description | Type | Key and additional info |
---|---|---|---|
Quorum members discovery | Discovery of the quorum members of the WSFC cluster. |
Dependent item | mssql.quorum.member.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
MSSQL Cluster member '{#MEMBER_NAME}': Number of quorum votes | Number of quorum votes possessed by this quorum member. |
Dependent item | mssql.quorum_members.number_of_quorum_votes.[{#MEMBER_NAME}] Preprocessing
|
MSSQL Cluster member '{#MEMBER_NAME}': Member type | The type of member, one of: 0 = WSFC node 1 = Disk witness 2 = File share witness 3 = Cloud Witness |
Dependent item | mssql.quorum_members.member_type.[{#MEMBER_NAME}] Preprocessing
|
MSSQL Cluster member '{#MEMBER_NAME}': Member state | The member state, one of: 0 = Offline 1 = Online |
Dependent item | mssql.quorum_members.member_state.[{#MEMBER_NAME}] Preprocessing
|
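The quorum member items above correspond to columns of `sys.dm_hadr_cluster_members`. An illustrative query:

```sql
SELECT member_name,
       member_type_desc,        -- WSFC node / disk witness / file share witness
       member_state_desc,       -- 0 = Offline, 1 = Online
       number_of_quorum_votes
FROM sys.dm_hadr_cluster_members;
```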
Name | Description | Type | Key and additional info |
---|---|---|---|
Replication discovery | Discovery of the database replicas. |
Dependent item | mssql.replica.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
MSSQL AG '{#GROUP_NAME}' Replica '{#REPLICA_NAME}': Connected state | Whether a secondary replica is currently connected to the primary replica: 0 = Disconnected. The response of an availability replica to the "Disconnected" state depends on its role: On the primary replica, if a secondary replica is disconnected, its secondary databases are marked as "Not synchronized" on the primary replica, which waits for the secondary to reconnect; On a secondary replica, upon detecting that it is disconnected, the secondary replica attempts to reconnect to the primary replica. 1 = Connected. Each primary replica tracks the connection state for every secondary replica in the same availability group. Secondary replicas track the connection state of only the primary replica. |
Dependent item | mssql.replica.connected_state["{#GROUP_NAME}_{#REPLICA_NAME}"] Preprocessing
|
MSSQL AG '{#GROUP_NAME}' Replica '{#REPLICA_NAME}': Is local | Whether the replica is local: 0 = Indicates a remote secondary replica in an availability group whose primary replica is hosted by the local server instance. This value occurs only on the primary replica location. 1 = Indicates a local replica. On secondary replicas, this is the only available value for the availability group to which the replica belongs. |
Dependent item | mssql.replica.is_local["{#GROUP_NAME}_{#REPLICA_NAME}"] Preprocessing
|
MSSQL AG '{#GROUP_NAME}' Replica '{#REPLICA_NAME}': Join state | 0 = Not joined 1 = Joined, standalone instance 2 = Joined, failover cluster instance |
Dependent item | mssql.replica.join_state["{#GROUP_NAME}_{#REPLICA_NAME}"] Preprocessing
|
MSSQL AG '{#GROUP_NAME}' Replica '{#REPLICA_NAME}': Operational state | Current operational state of the replica: 0 = Pending failover 1 = Pending 2 = Online 3 = Offline 4 = Failed 5 = Failed, no quorum 6 = Not local |
Dependent item | mssql.replica.operational_state["{#GROUP_NAME}_{#REPLICA_NAME}"] Preprocessing
|
MSSQL AG '{#GROUP_NAME}' Replica '{#REPLICA_NAME}': Recovery health | Rollup of the "database_state" column of the joined databases: 0 = In progress. At least one joined database has a database state other than "Online" (database_state is not "0"). 1 = Online. All the joined databases have a database state of "Online" (database_state is "0"). |
Dependent item | mssql.replica.recovery_health["{#GROUP_NAME}_{#REPLICA_NAME}"] Preprocessing
|
MSSQL AG '{#GROUP_NAME}' Replica '{#REPLICA_NAME}': Role | Current Always On availability group role of a local replica or a connected remote replica: 0 = Resolving 1 = Primary 2 = Secondary |
Dependent item | mssql.replica.role["{#GROUP_NAME}_{#REPLICA_NAME}"] Preprocessing
|
MSSQL AG '{#GROUP_NAME}' Replica '{#REPLICA_NAME}': Sync health | Reflects a rollup of the database synchronization state (synchronization_state) of all joined availability databases (also known as replicas) and the availability mode of the replica (synchronous-commit or asynchronous-commit mode). The rollup will reflect the least healthy accumulated state of the databases on the replica: 0 = Not healthy. At least one joined database is in the "Not synchronizing" state. 1 = Partially healthy. Some replicas are not in the target synchronization state: synchronous-commit replicas should be synchronized, and asynchronous-commit replicas should be synchronizing. 2 = Healthy. All replicas are in the target synchronization state: synchronous-commit replicas are synchronized, and asynchronous-commit replicas are synchronizing. |
Dependent item | mssql.replica.synchronization_health["{#GROUP_NAME}_{#REPLICA_NAME}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
MSSQL AG '{#GROUP_NAME}' Replica '{#REPLICA_NAME}': {#REPLICA_NAME} is disconnected | The response of an availability replica to the "Disconnected" state depends on its role: |
last(/MSSQL by ODBC/mssql.replica.connected_state["{#GROUP_NAME}_{#REPLICA_NAME}"])=0 and last(/MSSQL by ODBC/mssql.replica.role["{#GROUP_NAME}_{#REPLICA_NAME}"])=2 |Warning |
||
MSSQL AG '{#GROUPNAME}' Replica '{#REPLICANAME}': {#REPLICA_NAME} is {ITEM.VALUE} | The operational state of the replica in a given availability group is "Pending" or "Offline". |
last(/MSSQL by ODBC/mssql.replica.operational_state["{#GROUP_NAME}_{#REPLICA_NAME}"])=0 or last(/MSSQL by ODBC/mssql.replica.operational_state["{#GROUP_NAME}_{#REPLICA_NAME}"])=1 or last(/MSSQL by ODBC/mssql.replica.operational_state["{#GROUP_NAME}_{#REPLICA_NAME}"])=3 |Warning |
||
MSSQL AG '{#GROUPNAME}' Replica '{#REPLICANAME}': {#REPLICA_NAME} is {ITEM.VALUE} | The operational state of the replica in a given availability group is "Failed". |
last(/MSSQL by ODBC/mssql.replica.operational_state["{#GROUP_NAME}_{#REPLICA_NAME}"])=4 |Average |
||
MSSQL AG '{#GROUPNAME}' Replica '{#REPLICANAME}': {#REPLICA_NAME} is {ITEM.VALUE} | The operational state of the replica in a given availability group is "Failed, no quorum". |
last(/MSSQL by ODBC/mssql.replica.operational_state["{#GROUP_NAME}_{#REPLICA_NAME}"])=5 |High |
||
MSSQL AG '{#GROUP_NAME}' Replica '{#REPLICA_NAME}': {#REPLICA_NAME} Recovery in progress | At least one joined database has a database state other than "Online". |
last(/MSSQL by ODBC/mssql.replica.recovery_health["{#GROUP_NAME}_{#REPLICA_NAME}"])=0 |Info |
||
MSSQL AG '{#GROUP_NAME}' Replica '{#REPLICA_NAME}': {#REPLICA_NAME} is Not healthy | At least one joined database is in the "Not synchronizing" state. |
last(/MSSQL by ODBC/mssql.replica.synchronization_health["{#GROUP_NAME}_{#REPLICA_NAME}"])=0 |Average |
||
MSSQL AG '{#GROUP_NAME}' Replica '{#REPLICA_NAME}': {#REPLICA_NAME} is Partially healthy | Some replicas are not in the target synchronization state: synchronous-commit replicas should be synchronized, and asynchronous-commit replicas should be synchronizing. |
last(/MSSQL by ODBC/mssql.replica.synchronization_health["{#GROUP_NAME}_{#REPLICA_NAME}"])=1 |Warning |
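The replica-level values above map to `sys.dm_hadr_availability_replica_states`, joined with `sys.availability_replicas` for the replica name. A sketch (not necessarily the template's exact query):

```sql
SELECT ar.replica_server_name,
       ars.role_desc,                    -- Resolving / Primary / Secondary
       ars.operational_state_desc,
       ars.connected_state_desc,
       ars.recovery_health_desc,
       ars.synchronization_health_desc
FROM sys.dm_hadr_availability_replica_states ars
JOIN sys.availability_replicas ar
  ON ars.replica_id = ar.replica_id;
```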
Name | Description | Type | Key and additional info |
---|---|---|---|
Mirroring discovery | To see the row for a database other than master or tempdb, you must either be the database owner or have at least ALTER ANY DATABASE or VIEW ANY DATABASE server-level permission or CREATE DATABASE permission in the master database. To see non-NULL values on a mirror database, you must be a member of the sysadmin fixed server role. |
Dependent item | mssql.mirroring.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
MSSQL Mirroring '{#DBNAME}': Role | Current role that the local database plays in the database mirroring session. 1 = Principal 2 = Mirror |
Dependent item | mssql.mirroring.role["{#DBNAME}"] Preprocessing
|
MSSQL Mirroring '{#DBNAME}': Role sequence | The number of times that mirroring partners have switched the principal and mirror roles due to a failover or forced service. |
Dependent item | mssql.mirroring.role_sequence["{#DBNAME}"] Preprocessing
|
MSSQL Mirroring '{#DBNAME}': State | State of the mirror database and of the database mirroring session. 0 = Suspended 1 = Disconnected from the other partner 2 = Synchronizing 3 = Pending failover 4 = Synchronized 5 = The partners are not synchronized. Failover is not possible now. 6 = The partners are synchronized. Failover is potentially possible. For information about the requirements for the failover, see Database Mirroring Operating Modes: https://learn.microsoft.com/en-us/sql/database-engine/database-mirroring/database-mirroring-operating-modes?view=sql-server-ver16. |
Dependent item | mssql.mirroring.state["{#DBNAME}"] Preprocessing
|
MSSQL Mirroring '{#DBNAME}': Witness state | State of the witness in the database mirroring session of the database: 0 = Unknown 1 = Connected 2 = Disconnected |
Dependent item | mssql.mirroring.witness_state["{#DBNAME}"] Preprocessing
|
MSSQL Mirroring '{#DBNAME}': Safety level | Safety setting for updates on the mirror database: 0 = Unknown state 1 = Off [asynchronous] 2 = Full [synchronous] |
Dependent item | mssql.mirroring.safety_level["{#DBNAME}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
MSSQL Mirroring '{#DBNAME}': "{#DBNAME}" is {ITEM.VALUE} | The state of the mirror database and of the database mirroring session is "Suspended", "Disconnected from the other partner", or "Synchronizing". |
last(/MSSQL by ODBC/mssql.mirroring.state["{#DBNAME}"])>=0 and last(/MSSQL by ODBC/mssql.mirroring.state["{#DBNAME}"])<=2 |Info |
||
MSSQL Mirroring '{#DBNAME}': "{#DBNAME}" is {ITEM.VALUE} | The state of the mirror database and of the database mirroring session is "Pending failover". |
last(/MSSQL by ODBC/mssql.mirroring.state["{#DBNAME}"])=3 |Warning |
||
MSSQL Mirroring '{#DBNAME}': "{#DBNAME}" is {ITEM.VALUE} | The state of the mirror database and of the database mirroring session is "Not synchronized". The partners are not synchronized. A failover is not possible now. |
last(/MSSQL by ODBC/mssql.mirroring.state["{#DBNAME}"])=5 |High |
||
MSSQL Mirroring '{#DBNAME}': "{#DBNAME}" Witness is disconnected | The state of the witness in the database mirroring session of the database is "Disconnected". |
last(/MSSQL by ODBC/mssql.mirroring.witness_state["{#DBNAME}"])=2 |Warning |
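Mirroring role, state, witness state, and safety level are columns of the `sys.database_mirroring` catalog view, which returns NULLs for non-mirrored databases. An illustrative query:

```sql
SELECT DB_NAME(database_id)        AS dbname,
       mirroring_role_desc,          -- PRINCIPAL / MIRROR
       mirroring_state_desc,
       mirroring_witness_state_desc,
       mirroring_safety_level_desc
FROM sys.database_mirroring
WHERE mirroring_guid IS NOT NULL;   -- only mirrored databases
```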
Name | Description | Type | Key and additional info |
---|---|---|---|
Job discovery | Scanning jobs in DBMS. |
Dependent item | mssql.job.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
MSSQL Job '{#JOBNAME}': Get job status | The item gets the status of SQL agent job {#JOBNAME}. |
Dependent item | mssql.job.status_raw["{#JOBNAME}"] Preprocessing
|
MSSQL Job '{#JOBNAME}': Enabled | The possible values of the job status: 0 = Disabled 1 = Enabled |
Dependent item | mssql.job.enabled["{#JOBNAME}"] Preprocessing
|
MSSQL Job '{#JOBNAME}': Last run date-time | The last date-time of the job run. |
Dependent item | mssql.job.lastrundatetime["{#JOBNAME}"] Preprocessing
|
MSSQL Job '{#JOBNAME}': Next run date-time | The next date-time of the job run. |
Dependent item | mssql.job.nextrundatetime["{#JOBNAME}"] Preprocessing
|
MSSQL Job '{#JOBNAME}': Last run status message | An informational message about the last run of the job. |
Dependent item | mssql.job.lastrunstatusmessage["{#JOBNAME}"] Preprocessing
|
MSSQL Job '{#JOBNAME}': Run status | The possible values of the job status: 0 = Failed 1 = Succeeded 2 = Retry 3 = Canceled 4 = Running |
Dependent item | mssql.job.runstatus["{#JOBNAME}"] Preprocessing
|
MSSQL Job '{#JOBNAME}': Run duration | Duration of the last-run job. |
Dependent item | mssql.job.run_duration["{#JOBNAME}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
MSSQL Job '{#JOBNAME}': Failed to run | The last run of the job has failed. |
last(/MSSQL by ODBC/mssql.job.runstatus["{#JOBNAME}"])=0 |Warning |
Manual close: Yes | |
MSSQL Job '{#JOBNAME}': Job duration is high | The job is taking too long. |
last(/MSSQL by ODBC/mssql.job.run_duration["{#JOBNAME}"])>{$MSSQL.BACKUP_DURATION.WARN:"{#JOBNAME}"} |Warning |
Manual close: Yes |
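Job status values come from the msdb job tables. An illustrative query over sysjobs and sysjobservers (not necessarily the template's exact one):

```sql
SELECT j.name,
       j.enabled,                -- 0 = Disabled, 1 = Enabled
       js.last_run_outcome,      -- 0 = Failed, 1 = Succeeded, 3 = Canceled
       js.last_run_date,         -- int, YYYYMMDD
       js.last_run_time,         -- int, HHMMSS
       js.last_run_duration      -- int, HHMMSS
FROM msdb.dbo.sysjobs j
JOIN msdb.dbo.sysjobservers js
  ON j.job_id = js.job_id;
```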
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for the effortless deployment of MSSQL monitoring by Zabbix via Zabbix agent 2 and doesn't require any external scripts.
Zabbix version: 6.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Important! Starting with Zabbix 6.0.39, the MSSQL plugin must be updated to a version equal to or above 6.0.39.
Loadable plugin requires installation of a separate package or binary file or compilation from sources.
View Server State and View Any Definition permissions should be granted to the user.
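For example, a monitoring login (named zbx_monitor here to match the grants below; adjust the name and password to your environment) can be created and granted these permissions like this:

```sql
USE [master];
CREATE LOGIN zbx_monitor WITH PASSWORD = N'<strong_password>';
GRANT VIEW SERVER STATE TO zbx_monitor;
GRANT VIEW ANY DEFINITION TO zbx_monitor;
-- A user mapped to this login is also needed in msdb for the job-related grants:
USE [msdb];
CREATE USER zbx_monitor FOR LOGIN zbx_monitor;
```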
Grant this user read permissions to the sysjobschedules, sysjobhistory, and sysjobs tables.
For example, using T-SQL commands:
GRANT SELECT ON OBJECT::msdb.dbo.sysjobs TO zbx_monitor;
GRANT SELECT ON OBJECT::msdb.dbo.sysjobservers TO zbx_monitor;
GRANT SELECT ON OBJECT::msdb.dbo.sysjobactivity TO zbx_monitor;
GRANT EXECUTE ON OBJECT::msdb.dbo.agent_datetime TO zbx_monitor;
For more information, see MSSQL documentation:
Configure a User to Create and Manage SQL Server Agent Jobs
Set the username and password in the host macros {$MSSQL.USER} and {$MSSQL.PASSWORD}.
Set the connection string for the MSSQL instance in the {$MSSQL.URI} macro as a URI, such as <protocol://host:port>, or specify the named session - <sessionname>.
The Service's TCP port state item uses the {HOST.CONN} and {$MSSQL.PORT} macros to check the availability of the MSSQL instance. Keep in mind that if dynamic ports are used on the MSSQL server side, this check will not work correctly.
Note: You can use the context macros {$MSSQL.BACKUP_FULL.USED}, {$MSSQL.BACKUP_LOG.USED}, and {$MSSQL.BACKUP_DIFF.USED} to disable backup age triggers for a certain database. If set to a value other than "1", the trigger expression for the backup age will not fire.
Note: Since version 6.0.36, you can also connect to the MSSQL instance using its name. To do this, set the connection string in the {$MSSQL.URI} macro as <protocol://host/instance_name>.
Name | Description | Default |
---|---|---|
{$MSSQL.URI} | Connection string. |
<Put your URI here> |
{$MSSQL.USER} | MSSQL username. |
<Put your username here> |
{$MSSQL.PASSWORD} | MSSQL user password. |
<Put your password here> |
{$MSSQL.PORT} | MSSQL TCP port. |
1433 |
{$MSSQL.DBNAME.MATCHES} | This macro is used in database discovery. It can be overridden on the host or linked template level. |
.* |
{$MSSQL.DBNAME.NOT_MATCHES} | This macro is used in database discovery. It can be overridden on the host or linked template level. |
master|tempdb|model|msdb |
{$MSSQL.WORK_FILES.MAX} | The maximum number of work files created per second - for the trigger expression. |
20 |
{$MSSQL.WORK_TABLES.MAX} | The maximum number of work tables created per second - for the trigger expression. |
20 |
{$MSSQL.WORK_TABLES_FROM_CACHE_RATIO.MIN.CRIT} | The minimum percentage of work tables from the cache ratio - for the High trigger expression. |
90 |
{$MSSQL.BUFFER_CACHE_RATIO.MIN.CRIT} | The minimum buffer cache hit ratio, in percent - for the High trigger expression. |
30 |
{$MSSQL.BUFFER_CACHE_RATIO.MIN.WARN} | The minimum buffer cache hit ratio, in percent - for the Warning trigger expression. |
50 |
{$MSSQL.FREE_LIST_STALLS.MAX} | The maximum free list stalls per second - for the trigger expression. |
2 |
{$MSSQL.LAZY_WRITES.MAX} | The maximum lazy writes per second - for the trigger expression. |
20 |
{$MSSQL.PAGE_LIFE_EXPECTANCY.MIN} | The minimum page life expectancy - for the trigger expression. |
300 |
{$MSSQL.PAGE_READS.MAX} | The maximum page reads per second - for the trigger expression. |
90 |
{$MSSQL.PAGE_WRITES.MAX} | The maximum page writes per second - for the trigger expression. |
90 |
{$MSSQL.AVERAGE_WAIT_TIME.MAX} | The maximum average wait time, in milliseconds - for the trigger expression. |
500 |
{$MSSQL.LOCK_REQUESTS.MAX} | The maximum lock requests per second - for the trigger expression. |
1000 |
{$MSSQL.LOCK_TIMEOUTS.MAX} | The maximum lock timeouts per second - for the trigger expression. |
1 |
{$MSSQL.DEADLOCKS.MAX} | The maximum deadlocks per second - for the trigger expression. |
1 |
{$MSSQL.LOG_FLUSH_WAITS.MAX} | The maximum log flush waits per second - for the trigger expression. |
1 |
{$MSSQL.LOG_FLUSH_WAIT_TIME.MAX} | The maximum log flush wait time, in milliseconds - for the trigger expression. |
1 |
{$MSSQL.PERCENT_LOG_USED.MAX} | The maximum percentage of log used - for the trigger expression. |
80 |
{$MSSQL.PERCENT_COMPILATIONS.MAX} | The maximum percentage of Transact-SQL compilations - for the trigger expression. |
10 |
{$MSSQL.PERCENT_RECOMPILATIONS.MAX} | The maximum percentage of Transact-SQL recompilations - for the trigger expression. |
10 |
{$MSSQL.PERCENT_READAHEAD.MAX} | The maximum percentage of pages read per second in anticipation of use - for the trigger expression. |
20 |
{$MSSQL.BACKUP_DIFF.CRIT} | The maximum time without a differential backup - for the High trigger expression. |
6d |
{$MSSQL.BACKUP_DIFF.WARN} | The maximum time without a differential backup - for the Warning trigger expression. |
3d |
{$MSSQL.BACKUP_FULL.CRIT} | The maximum time without a full backup - for the High trigger expression. |
10d |
{$MSSQL.BACKUP_FULL.WARN} | The maximum time without a full backup - for the Warning trigger expression. |
9d |
{$MSSQL.BACKUP_LOG.CRIT} | The maximum time without a log backup - for the High trigger expression. |
8h |
{$MSSQL.BACKUP_LOG.WARN} | The maximum time without a log backup - for the Warning trigger expression. |
4h |
{$MSSQL.JOB.MATCHES} | This macro is used in job discovery. It can be overridden on the host or linked template level. |
.* |
{$MSSQL.JOB.NOT_MATCHES} | This macro is used in job discovery. It can be overridden on the host or linked template level. |
CHANGE_IF_NEEDED |
{$MSSQL.BACKUP_DURATION.WARN} | The maximum backup duration - for the Warning trigger expression. |
1h |
{$MSSQL.BACKUP_FULL.USED} | The flag for checking the age of a full backup. If set to a value other than "1", the trigger expression for the full backup age will not fire. Can be used with context for database name. |
1 |
{$MSSQL.BACKUP_LOG.USED} | The flag for checking the age of a log backup. If set to a value other than "1", the trigger expression for the log backup age will not fire. Can be used with context for database name. |
1 |
{$MSSQL.BACKUP_DIFF.USED} | The flag for checking the age of a differential backup. If set to a value other than "1", the trigger expression for the differential backup age will not fire. Can be used with context for database name. |
1 |
{$MSSQL.QUORUM.MEMBER.DISCOVERY.NAME.MATCHES} | Filter to include discovered quorum members by name. |
.* |
{$MSSQL.QUORUM.MEMBER.DISCOVERY.NAME.NOT_MATCHES} | Filter to exclude discovered quorum members by name. |
CHANGE_IF_NEEDED |
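The `MATCHES`/`NOT_MATCHES` macro pairs above drive low-level discovery filtering: an entity is kept only when its name matches the include pattern and does not match the exclude pattern. A minimal Python sketch of that semantics, assuming Zabbix-style partial-match regexes (the function name and sample job names are illustrative, not part of the template):

```python
import re

def discovered(name, matches=r".*", not_matches=r"CHANGE_IF_NEEDED"):
    """Keep an LLD entity when its name matches the include regex and
    does not match the exclude regex (partial match, as in Zabbix)."""
    return bool(re.search(matches, name)) and not re.search(not_matches, name)

# With the defaults, real job names are discovered:
print(discovered("DatabaseBackup - FULL"))  # True
# Overriding NOT_MATCHES excludes, e.g., system maintenance jobs:
print(discovered("syspolicy_purge_history", not_matches=r"^syspolicy"))  # False
```

Override the macros on the host or linked template level to narrow discovery the same way.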
Name | Description | Type | Key and additional info |
---|---|---|---|
MSSQL: Service's TCP port state | Test the availability of MSSQL Server on a TCP port. |
Simple check | net.tcp.service[tcp,{HOST.CONN},{$MSSQL.PORT}] Preprocessing
|
MSSQL: Get last backup | The item gets information about backup processes. |
Zabbix agent | mssql.last.backup.get["{$MSSQL.URI}","{$MSSQL.USER}","{$MSSQL.PASSWORD}"] |
MSSQL: Get job status | The item gets the SQL agent job status. |
Zabbix agent | mssql.job.status.get["{$MSSQL.URI}","{$MSSQL.USER}","{$MSSQL.PASSWORD}"] |
MSSQL: Get performance counters | The item gets server global status information. |
Zabbix agent | mssql.perfcounter.get["{$MSSQL.URI}","{$MSSQL.USER}","{$MSSQL.PASSWORD}"] |
MSSQL: Get availability groups | The item gets availability group states - name, primary and secondary health, synchronization health. |
Zabbix agent | mssql.availability.group.get["{$MSSQL.URI}","{$MSSQL.USER}","{$MSSQL.PASSWORD}"] |
MSSQL: Get local DB | Getting the states of the local availability database. |
Zabbix agent | mssql.local.db.get["{$MSSQL.URI}","{$MSSQL.USER}","{$MSSQL.PASSWORD}"] |
MSSQL: Get DB mirroring | Getting DB mirroring. |
Zabbix agent | mssql.mirroring.get["{$MSSQL.URI}","{$MSSQL.USER}","{$MSSQL.PASSWORD}"] |
MSSQL: Get non-local DB | Getting the non-local availability database. |
Zabbix agent | mssql.nonlocal.db.get["{$MSSQL.URI}","{$MSSQL.USER}","{$MSSQL.PASSWORD}"] |
MSSQL: Get replica | Getting the database replica. |
Zabbix agent | mssql.replica.get["{$MSSQL.URI}","{$MSSQL.USER}","{$MSSQL.PASSWORD}"] |
MSSQL: Get quorum | Getting quorum - cluster name, type, and state. |
Zabbix agent | mssql.quorum.get["{$MSSQL.URI}","{$MSSQL.USER}","{$MSSQL.PASSWORD}"] |
MSSQL: Get quorum member | Getting quorum members - member name, type, state, and number of quorum votes. |
Zabbix agent | mssql.quorum.member.get["{$MSSQL.URI}","{$MSSQL.USER}","{$MSSQL.PASSWORD}"] |
MSSQL: Get database | Getting databases - database name and recovery model. |
Zabbix agent | mssql.db.get["{$MSSQL.URI}","{$MSSQL.USER}","{$MSSQL.PASSWORD}"] |
MSSQL: Version | MSSQL Server version. |
Zabbix agent | mssql.version["{$MSSQL.URI}","{$MSSQL.USER}","{$MSSQL.PASSWORD}"] Preprocessing
|
MSSQL: Uptime | MSSQL Server uptime in the format "N days, hh:mm:ss". |
Dependent item | mssql.uptime Preprocessing
|
MSSQL: Get Access Methods counters | The item gets server information about access methods. |
Dependent item | mssql.access_methods.raw Preprocessing
|
MSSQL: Forwarded records per second | Number of records per second fetched through forwarded record pointers. |
Dependent item | mssql.forwardedrecordssec.rate Preprocessing
|
MSSQL: Full scans per second | Number of unrestricted full scans per second. These can be either base-table or full-index scans. Values greater than 1 or 2 indicate that there are table / index page scans. If that is combined with high CPU, this counter requires further investigation, otherwise, if the full scans are on small tables, it can be ignored. |
Dependent item | mssql.fullscanssec.rate Preprocessing
|
MSSQL: Index searches per second | Number of index searches per second. These are used to start a range scan, reposition a range scan, revalidate a scan point, fetch a single index record, and search down the index to locate where to insert a new row. |
Dependent item | mssql.indexsearchessec.rate Preprocessing
|
MSSQL: Page splits per second | Number of page splits per second that occur as a result of overflowing index pages. |
Dependent item | mssql.pagesplitssec.rate Preprocessing
|
MSSQL: Work files created per second | Number of work files created per second. For example, work files can be used to store temporary results for hash joins and hash aggregates. |
Dependent item | mssql.workfilescreatedsec.rate Preprocessing
|
MSSQL: Work tables created per second | Number of work tables created per second. For example, work tables can be used to store temporary results for query spool, LOB variables, XML variables, and cursors. |
Dependent item | mssql.worktablescreatedsec.rate Preprocessing
|
MSSQL: Table lock escalations per second | Number of times locks on a table were escalated to the TABLE or HoBT granularity. |
Dependent item | mssql.tablelockescalations.rate Preprocessing
|
MSSQL: Worktables from cache ratio | Percentage of work tables created where the initial two pages of the work table were not allocated but were immediately available from the work table cache. |
Dependent item | mssql.worktablesfromcache_ratio Preprocessing
|
MSSQL: Get Buffer Manager counters | The item gets server information about the buffer pool. |
Dependent item | mssql.buffer_manager.raw Preprocessing
|
MSSQL: Buffer cache hit ratio | Indicates the percentage of pages found in the buffer cache without having to read from the disk. The ratio is the total number of cache hits divided by the total number of cache lookups over the last few thousand page accesses. After a long period of time, the ratio changes very little. Since reading from the cache is much less expensive than reading from the disk, a higher value is preferred for this item. To increase the buffer cache hit ratio, consider increasing the amount of memory available to MSSQL Server or using the buffer pool extension feature. |
Dependent item | mssql.buffercachehit_ratio Preprocessing
|
MSSQL: Checkpoint pages per second | Indicates the number of pages flushed to the disk per second by a checkpoint or other operation which required all dirty pages to be flushed. |
Dependent item | mssql.checkpointpagessec.rate Preprocessing
|
MSSQL: Database pages | Indicates the number of pages in the buffer pool with database content. |
Dependent item | mssql.database_pages Preprocessing
|
MSSQL: Free list stalls per second | Indicates the number of requests per second that had to wait for a free page. |
Dependent item | mssql.freeliststalls_sec.rate Preprocessing
|
MSSQL: Lazy writes per second | Indicates the number of buffers written per second by the buffer manager's lazy writer. The lazy writer is a system process that flushes out batches of dirty, aged buffers (buffers that contain changes that must be written back to the disk before the buffer can be reused for a different page) and makes them available to user processes. The lazy writer eliminates the need to perform frequent checkpoints in order to create available buffers. |
Dependent item | mssql.lazywritessec.rate Preprocessing
|
MSSQL: Page life expectancy | Indicates the number of seconds a page will stay in the buffer pool without references. |
Dependent item | mssql.pagelifeexpectancy Preprocessing
|
MSSQL: Page lookups per second | Indicates the number of requests per second to find a page in the buffer pool. |
Dependent item | mssql.pagelookupssec.rate Preprocessing
|
MSSQL: Page reads per second | Indicates the number of physical database page reads that are issued per second. This statistic displays the total number of physical page reads across all databases. As physical I/O is expensive, you may be able to minimize the cost either by using a larger data cache, intelligent indexes, and more efficient queries, or by changing the database design. |
Dependent item | mssql.pagereadssec.rate Preprocessing
|
MSSQL: Page writes per second | Indicates the number of physical database page writes that are issued per second. |
Dependent item | mssql.pagewritessec.rate Preprocessing
|
MSSQL: Read-ahead pages per second | Indicates the number of pages read per second in anticipation of use. |
Dependent item | mssql.readaheadpagessec.rate Preprocessing
|
MSSQL: Target pages | The optimal number of pages in the buffer pool. |
Dependent item | mssql.target_pages Preprocessing
|
MSSQL: Get DB counters | The item gets summary information about databases. |
Dependent item | mssql.db_info.raw Preprocessing
|
MSSQL: Total data file size | Total size of all data files. |
Dependent item | mssql.datafilessize Preprocessing
|
MSSQL: Total log file size | Total size of all the transaction log files. |
Dependent item | mssql.logfilessize Preprocessing
|
MSSQL: Total log file used size | The cumulative used size of all the log files in the database. |
Dependent item | mssql.logfilesused_size Preprocessing
|
MSSQL: Total transactions per second | Total number of transactions started for all databases per second. |
Dependent item | mssql.transactions_sec.rate Preprocessing
|
MSSQL: Get General Statistics counters | The item gets general statistics information. |
Dependent item | mssql.general_statistics.raw Preprocessing
|
MSSQL: Logins per second | Total number of logins started per second. This does not include pooled connections. Any value over 2 may indicate insufficient connection pooling. |
Dependent item | mssql.logins_sec.rate Preprocessing
|
MSSQL: Logouts per second | Total number of logout operations started per second. Any value over 2 may indicate insufficient connection pooling. |
Dependent item | mssql.logouts_sec.rate Preprocessing
|
MSSQL: Number of blocked processes | Number of currently blocked processes. |
Dependent item | mssql.processes_blocked Preprocessing
|
MSSQL: Number of users connected | Number of users connected to MSSQL Server. |
Dependent item | mssql.user_connections Preprocessing
|
MSSQL: Average latch wait time | Average latch wait time (in milliseconds) for latch requests that had to wait. |
Calculated | mssql.averagelatchwait_time |
MSSQL: Get Latches counters | The item gets server information about latches. |
Dependent item | mssql.latches_info.raw Preprocessing
|
MSSQL: Average latch wait time raw | Average latch wait time (in milliseconds) for latch requests that had to wait. |
Dependent item | mssql.averagelatchwaittimeraw Preprocessing
|
MSSQL: Average latch wait time base | For internal use only. |
Dependent item | mssql.averagelatchwaittimebase Preprocessing
|
MSSQL: Latch waits per second | The number of latch requests that could not be granted immediately. Latches are lightweight means of holding a very transient server resource, such as an address in memory. |
Dependent item | mssql.latchwaitssec.rate Preprocessing
|
MSSQL: Total latch wait time | Total latch wait time (in milliseconds) for latch requests in the last second. This value should stay stable compared to the number of latch waits per second. |
Dependent item | mssql.totallatchwait_time Preprocessing
|
MSSQL: Total average wait time | The average wait time, in milliseconds, for each lock request that had to wait. |
Calculated | mssql.averagewaittime |
MSSQL: Get Locks counters | The item gets server information about locks. |
Dependent item | mssql.locks_info.raw Preprocessing
|
MSSQL: Total average wait time raw | Average amount of wait time (in milliseconds) for each lock request that resulted in a wait. Information for all locks. |
Dependent item | mssql.averagewaittime_raw Preprocessing
|
MSSQL: Total average wait time base | For internal use only. |
Dependent item | mssql.averagewaittime_base Preprocessing
|
MSSQL: Total lock requests per second | Number of new locks and lock conversions per second requested from the lock manager. |
Dependent item | mssql.lockrequestssec.rate Preprocessing
|
MSSQL: Total lock requests per second that timed out | Number of timed out lock requests per second, including requests for NOWAIT locks. |
Dependent item | mssql.locktimeoutssec.rate Preprocessing
|
MSSQL: Total lock requests per second that required waiting | Number of lock requests per second that required the caller to wait. |
Dependent item | mssql.lockwaitssec.rate Preprocessing
|
MSSQL: Lock wait time | Total wait time (in milliseconds) for locks in the last second. |
Dependent item | mssql.lockwaittime Preprocessing
|
MSSQL: Total lock requests per second that have deadlocks | Number of lock requests per second that resulted in a deadlock. |
Dependent item | mssql.numberdeadlockssec.rate Preprocessing
|
MSSQL: Get Memory counters | The item gets memory information. |
Dependent item | mssql.mem_manager.raw Preprocessing
|
MSSQL: Granted Workspace Memory | Specifies the total amount of memory currently granted to executing processes, such as hash, sort, bulk copy, and index creation operations. |
Dependent item | mssql.grantedworkspacememory Preprocessing
|
MSSQL: Maximum workspace memory | Indicates the maximum amount of memory available for executing processes, such as hash, sort, bulk copy, and index creation operations. |
Dependent item | mssql.maximumworkspacememory Preprocessing
|
MSSQL: Memory grants outstanding | Specifies the total number of processes that have successfully acquired a workspace memory grant. |
Dependent item | mssql.memorygrantsoutstanding Preprocessing
|
MSSQL: Memory grants pending | Specifies the total number of processes waiting for a workspace memory grant. |
Dependent item | mssql.memorygrantspending Preprocessing
|
MSSQL: Target server memory | Indicates the ideal amount of memory the server can consume. |
Dependent item | mssql.targetservermemory Preprocessing
|
MSSQL: Total server memory | Specifies the amount of memory the server has committed using the memory manager. |
Dependent item | mssql.totalservermemory Preprocessing
|
MSSQL: Get Cache counters | The item gets server information about cache. |
Dependent item | mssql.cache_info.raw Preprocessing
|
MSSQL: Cache hit ratio | Ratio between cache hits and lookups. |
Dependent item | mssql.cachehitratio Preprocessing
|
MSSQL: Cache object counts | Number of cache objects in the cache. |
Dependent item | mssql.cacheobjectcounts Preprocessing
|
MSSQL: Cache objects in use | Number of cache objects in use. |
Dependent item | mssql.cacheobjectsin_use Preprocessing
|
MSSQL: Cache pages | Number of 8-kilobyte (KB) pages used by cache objects. |
Dependent item | mssql.cache_pages Preprocessing
|
MSSQL: Get SQL Errors counters | The item gets SQL error information. |
Dependent item | mssql.sql_errors.raw Preprocessing
|
MSSQL: Errors per second (DB offline errors) | Number of "database offline" errors per second. |
Dependent item | mssql.offlineerrorssec.rate Preprocessing
|
MSSQL: Errors per second (Info errors) | Number of informational errors per second. |
Dependent item | mssql.infoerrorssec.rate Preprocessing
|
MSSQL: Errors per second (Kill connection errors) | Number of "kill connection" errors per second. |
Dependent item | mssql.killconnectionerrors_sec.rate Preprocessing
|
MSSQL: Errors per second (User errors) | Number of user errors per second. |
Dependent item | mssql.usererrorssec.rate Preprocessing
|
MSSQL: Total errors per second | Total number of errors per second, across all error types. |
Dependent item | mssql.errors_sec.rate Preprocessing
|
MSSQL: Get SQL Statistics counters | The item gets SQL statistics information. |
Dependent item | mssql.sql_statistics.raw Preprocessing
|
MSSQL: Auto-param attempts per second | Number of auto-parameterization attempts per second. The total should be the sum of the failed, safe, and unsafe auto-parameterizations. Auto-parameterization occurs when an instance of SQL Server tries to parameterize a Transact-SQL request by replacing some literals with parameters so that reuse of the resulting cached execution plan across multiple similar-looking requests is possible. Note that auto-parameterizations are also known as simple parameterizations in the newer versions of SQL Server. This counter does not include forced parameterizations. |
Dependent item | mssql.autoparamattemptssec.rate Preprocessing
|
MSSQL: Batch requests per second | Number of Transact-SQL command batches received per second. This statistic is affected by all constraints (such as I/O, number of users, cache size, complexity of requests, and so on). High batch requests mean good throughput. |
Dependent item | mssql.batchrequestssec.rate Preprocessing
|
MSSQL: Percent of ad hoc queries running | The ratio of SQL compilations per second to batch requests per second, in percent. |
Calculated | mssql.percentofadhoc_queries |
MSSQL: Percent of Recompiled Transact-SQL Objects | The ratio of SQL re-compilations per second to SQL compilations per second, in percent. |
Calculated | mssql.percentrecompilationsto_compilations |
MSSQL: Full scans to Index searches ratio | The ratio of full scans per second to index searches per second. The threshold recommendation is strictly for OLTP workloads. |
Calculated | mssql.scantosearch |
MSSQL: Failed auto-params per second | Number of failed auto-parameterization attempts per second. This number should be small. Note that auto-parameterizations are also known as simple parameterizations in the newer versions of SQL Server. |
Dependent item | mssql.failedautoparamssec.rate Preprocessing
|
MSSQL: Safe auto-params per second | Number of safe auto-parameterization attempts per second. Safe refers to a determination that a cached execution plan can be shared between different similar-looking Transact-SQL statements. SQL Server makes many auto-parameterization attempts, some of which turn out to be safe and others fail. Note that auto-parameterizations are also known as simple parameterizations in the newer versions of SQL Server. This does not include forced parameterizations. |
Dependent item | mssql.safeautoparamssec.rate Preprocessing
|
MSSQL: SQL compilations per second | Number of SQL compilations per second. Indicates the number of times the compile code path is entered. Includes runs caused by statement-level recompilations in SQL Server. After SQL Server user activity is stable, this value reaches a steady state. |
Dependent item | mssql.sqlcompilationssec.rate Preprocessing
|
MSSQL: SQL re-compilations per second | Number of statement recompiles per second. Counts the number of times statement recompiles are triggered. Generally, you want the recompiles to be low. |
Dependent item | mssql.sqlrecompilationssec.rate Preprocessing
|
MSSQL: Unsafe auto-params per second | Number of unsafe auto-parameterization attempts per second. For example, the query has some characteristics that prevent the cached plan from being shared. These are designated as unsafe. This does not count the number of forced parameterizations. |
Dependent item | mssql.unsafeautoparamssec.rate Preprocessing
|
MSSQL: Total transactions number | The number of currently active transactions of all types. |
Dependent item | mssql.transactions Preprocessing
|
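Several of the items above are calculated rather than collected directly: the average latch and lock wait times divide a raw counter delta by its `*_BASE` companion over the same interval, while the ad hoc query percentage and the scan-to-search ratio divide one rate by another. A sketch of the arithmetic, with illustrative function names (these are not template keys):

```python
def average_wait_ms(raw_delta, base_delta):
    # Fraction-type perf counters: value = raw / base over one interval;
    # guard against a zero base when nothing had to wait.
    return raw_delta / base_delta if base_delta else 0.0

def percent_of_adhoc_queries(compilations_rate, batch_rate):
    # "Percent of ad hoc queries running" =
    # SQL compilations per second / batch requests per second * 100
    return 100.0 * compilations_rate / batch_rate if batch_rate else 0.0

def scan_to_search(full_scans_rate, index_searches_rate):
    # "Full scans to Index searches ratio"; the related trigger fires
    # above 0.001 (more than 1 full scan per 1000 index searches).
    return full_scans_rate / index_searches_rate if index_searches_rate else 0.0

print(average_wait_ms(1500.0, 300.0))         # 5.0 (ms per waiting request)
print(percent_of_adhoc_queries(12.0, 240.0))  # 5.0 (%)
print(scan_to_search(0.5, 1000.0) > 0.001)    # False -> trigger stays OK
```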
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
MSSQL: Service is unavailable | The TCP port of the MSSQL Server service is currently unavailable. |
last(/MSSQL by Zabbix agent 2/net.tcp.service[tcp,{HOST.CONN},{$MSSQL.PORT}])=0 |Disaster |
||
MSSQL: Version has changed | MSSQL version has changed. Acknowledge to close the problem manually. |
last(/MSSQL by Zabbix agent 2/mssql.version["{$MSSQL.URI}","{$MSSQL.USER}","{$MSSQL.PASSWORD}"],#1)<>last(/MSSQL by Zabbix agent 2/mssql.version["{$MSSQL.URI}","{$MSSQL.USER}","{$MSSQL.PASSWORD}"],#2) and length(last(/MSSQL by Zabbix agent 2/mssql.version["{$MSSQL.URI}","{$MSSQL.USER}","{$MSSQL.PASSWORD}"]))>0 |Info |
Manual close: Yes | |
MSSQL: Service has been restarted | Uptime is less than 10 minutes. |
last(/MSSQL by Zabbix agent 2/mssql.uptime)<10m |Info |
Manual close: Yes | |
MSSQL: Failed to fetch info data | Zabbix has not received any data for items for the last 30 minutes. |
nodata(/MSSQL by Zabbix agent 2/mssql.uptime,30m)=1 |Info |
Depends on:
|
|
MSSQL: Too frequently using pointers | Rows with VARCHAR columns can experience expansion when VARCHAR values are updated with a longer string. In the case where the row cannot fit in the existing page, the row migrates, and access to the row will traverse a pointer. This only happens on heaps (tables without clustered indexes). In cases where clustered indexes cannot be used, drop non-clustered indexes, build a clustered index to reorg pages and rows, drop the clustered index, then recreate non-clustered indexes. |
last(/MSSQL by Zabbix agent 2/mssql.forwarded_records_sec.rate) * 100 > 10 * last(/MSSQL by Zabbix agent 2/mssql.batch_requests_sec.rate) |Warning |
||
MSSQL: Number of work files created per second is high | Too many work files created per second to store temporary results for hash joins and hash aggregates. |
min(/MSSQL by Zabbix agent 2/mssql.workfiles_created_sec.rate,5m)>{$MSSQL.WORK_FILES.MAX} |Average |
||
MSSQL: Number of work tables created per second is high | Too many work tables created per second to store temporary results for query spool, LOB variables, XML variables, and cursors. |
min(/MSSQL by Zabbix agent 2/mssql.worktables_created_sec.rate,5m)>{$MSSQL.WORK_TABLES.MAX} |Average |
||
MSSQL: Percentage of work tables available from the work table cache is low | A value less than 90% may indicate insufficient memory, since execution plans are being dropped, or, on 32-bit systems, may indicate the need for an upgrade to a 64-bit system. |
max(/MSSQL by Zabbix agent 2/mssql.worktables_from_cache_ratio,5m)<{$MSSQL.WORKTABLES_FROM_CACHE_RATIO.MIN.CRIT} |High |
||
MSSQL: Percentage of the buffer cache efficiency is low | Too low buffer cache hit ratio. |
max(/MSSQL by Zabbix agent 2/mssql.buffer_cache_hit_ratio,5m)<{$MSSQL.BUFFER_CACHE_RATIO.MIN.CRIT} |High |
||
MSSQL: Percentage of the buffer cache efficiency is low | Low buffer cache hit ratio. |
max(/MSSQL by Zabbix agent 2/mssql.buffer_cache_hit_ratio,5m)<{$MSSQL.BUFFER_CACHE_RATIO.MIN.WARN} |Warning |
Depends on:
|
|
MSSQL: Number of rps waiting for a free page is high | Some requests have to wait for a free page. |
min(/MSSQL by Zabbix agent 2/mssql.free_list_stalls_sec.rate,5m)>{$MSSQL.FREE_LIST_STALLS.MAX} |Warning |
||
MSSQL: Number of buffers written per second by the lazy writer is high | The number of buffers written per second by the buffer manager's lazy writer exceeds the threshold. |
min(/MSSQL by Zabbix agent 2/mssql.lazy_writes_sec.rate,5m)>{$MSSQL.LAZY_WRITES.MAX} |Warning |
||
MSSQL: Page life expectancy is low | The page stays in the buffer pool without references for less time than the threshold value. |
max(/MSSQL by Zabbix agent 2/mssql.page_life_expectancy,15m)<{$MSSQL.PAGE_LIFE_EXPECTANCY.MIN} |High |
||
MSSQL: Number of physical database page reads per second is high | The physical database page reads are issued too frequently. |
min(/MSSQL by Zabbix agent 2/mssql.page_reads_sec.rate,5m)>{$MSSQL.PAGE_READS.MAX} |Warning |
||
MSSQL: Number of physical database page writes per second is high | The physical database page writes are issued too frequently. |
min(/MSSQL by Zabbix agent 2/mssql.page_writes_sec.rate,5m)>{$MSSQL.PAGE_WRITES.MAX} |Warning |
||
MSSQL: Too many physical reads occurring | If this value makes up even a sizeable minority of the total "Page Reads/sec" (say, greater than 20% of the total page reads), you may have too many physical reads occurring. |
last(/MSSQL by Zabbix agent 2/mssql.readahead_pages_sec.rate) > {$MSSQL.PERCENT_READAHEAD.MAX} / 100 * last(/MSSQL by Zabbix agent 2/mssql.page_reads_sec.rate) |Warning |
||
MSSQL: Total average wait time for locks is high | An average wait time longer than 500 ms may indicate excessive blocking. This value should generally correlate to "Lock Waits/sec" and move up or down with it accordingly. |
min(/MSSQL by Zabbix agent 2/mssql.average_wait_time,5m)>{$MSSQL.AVERAGE_WAIT_TIME.MAX} |Warning |
||
MSSQL: Total number of locks per second is high | Number of new locks and lock conversions per second requested from the lock manager is high. |
min(/MSSQL by Zabbix agent 2/mssql.lock_requests_sec.rate,5m)>{$MSSQL.LOCK_REQUESTS.MAX} |Warning |
||
MSSQL: Total lock requests per second that timed out is high | The total number of timed out lock requests per second, including requests for NOWAIT locks, is high. |
min(/MSSQL by Zabbix agent 2/mssql.lock_timeouts_sec.rate,5m)>{$MSSQL.LOCK_TIMEOUTS.MAX} |Warning |
||
MSSQL: Some blocking is occurring for 5m | Values greater than zero indicate at least some blocking is occurring, while a value of zero can quickly eliminate blocking as a potential root-cause problem. |
min(/MSSQL by Zabbix agent 2/mssql.lock_waits_sec.rate,5m)>0 |Average |
||
MSSQL: Number of deadlocks is high | Too many deadlocks are occurring currently. |
min(/MSSQL by Zabbix agent 2/mssql.number_deadlocks_sec.rate,5m)>{$MSSQL.DEADLOCKS.MAX} |Average |
||
MSSQL: Percent of ad hoc queries running is high | High values often indicate excessive ad hoc querying; this value should be as low as possible. If excessive ad hoc querying is happening, try rewriting the queries as procedures or invoking them using sp_executesql. |
min(/MSSQL by Zabbix agent 2/mssql.percent_of_adhoc_queries,15m) > {$MSSQL.PERCENT_COMPILATIONS.MAX} |Warning |
||
MSSQL: Percent of times statement recompiles is high | This number should be at or near zero, since recompiles can cause deadlocks and exclusive compile locks. This counter's value should follow in proportion to "Batch Requests/sec" and "SQL Compilations/sec". |
min(/MSSQL by Zabbix agent 2/mssql.percent_recompilations_to_compilations,15m) > {$MSSQL.PERCENT_RECOMPILATIONS.MAX} |Warning |
||
MSSQL: Number of index and table scans exceeds index searches in the last 15m | Index searches are preferable to index and table scans. For OLTP applications, optimize for more index searches and less scans (preferably, 1 full scan for every 1000 index searches). Index and table scans are expensive I/O operations. |
min(/MSSQL by Zabbix agent 2/mssql.scan_to_search,15m) > 0.001 |Warning |
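Two of the trigger expressions above compare one rate against a percentage of another rather than against a fixed threshold. A hedged Python restatement of those two conditions (parameter names are illustrative):

```python
def readahead_excessive(readahead_rate, page_reads_rate, percent_max=20.0):
    # "Too many physical reads occurring":
    # read-ahead pages/sec > {$MSSQL.PERCENT_READAHEAD.MAX}/100 * page reads/sec
    return readahead_rate > percent_max / 100.0 * page_reads_rate

def forwarded_records_excessive(forwarded_rate, batch_requests_rate):
    # "Too frequently using pointers":
    # forwarded records/sec * 100 > 10 * batch requests/sec, i.e. more than
    # one forwarded-record fetch per ten batch requests.
    return forwarded_rate * 100 > 10 * batch_requests_rate

print(readahead_excessive(30.0, 100.0))         # True: 30 > 20% of 100
print(forwarded_records_excessive(5.0, 100.0))  # False: 500 > 1000 is false
```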
Name | Description | Type | Key and additional info |
---|---|---|---|
Database discovery | Scanning for databases in the DBMS. |
Dependent item | mssql.database.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
MSSQL DB '{#DBNAME}': Get performance counters | The item gets server status information for {#DBNAME}. |
Dependent item | mssql.db.perf_raw["{#DBNAME}"] Preprocessing
|
MSSQL DB '{#DBNAME}': Get last backup | The item gets information about backup processes for {#DBNAME}. |
Dependent item | mssql.backup.raw["{#DBNAME}"] Preprocessing
|
MSSQL DB '{#DBNAME}': State | Database state: 0 = Online, 1 = Restoring, 2 = Recovering (SQL Server 2008 and later), 3 = Recovery pending (SQL Server 2008 and later), 4 = Suspect, 5 = Emergency (SQL Server 2008 and later), 6 = Offline (SQL Server 2008 and later), 7 = Copying (Azure SQL Database Active Geo-Replication), 10 = Offline secondary (Azure SQL Database Active Geo-Replication). |
Dependent item | mssql.db.state["{#DBNAME}"] Preprocessing
|
MSSQL DB '{#DBNAME}': Active transactions | Number of active transactions for the database. |
Dependent item | mssql.db.active_transactions["{#DBNAME}"] Preprocessing
|
MSSQL DB '{#DBNAME}': Data file size | Cumulative size of all the data files in the database, including any automatic growth. Monitoring this counter is useful, for example, for determining the correct size of tempdb. |
Dependent item | mssql.db.datafilessize["{#DBNAME}"] Preprocessing
|
MSSQL DB '{#DBNAME}': Log bytes flushed per second | Total number of log bytes flushed per second. Useful for determining trends and utilization of the transaction log. |
Dependent item | mssql.db.logbytesflushed_sec.rate["{#DBNAME}"] Preprocessing
|
MSSQL DB '{#DBNAME}': Log file size | Cumulative size of all the transaction log files in the database. |
Dependent item | mssql.db.logfilessize["{#DBNAME}"] Preprocessing
|
MSSQL DB '{#DBNAME}': Log file used size | Cumulative used size of all the log files in the database. |
Dependent item | mssql.db.logfilesused_size["{#DBNAME}"] Preprocessing
|
MSSQL DB '{#DBNAME}': Log flushes per second | Number of log flushes per second. |
Dependent item | mssql.db.logflushessec.rate["{#DBNAME}"] Preprocessing
|
MSSQL DB '{#DBNAME}': Log flush waits per second | Number of commits per second waiting for the log flush. |
Dependent item | mssql.db.logflushwaits_sec.rate["{#DBNAME}"] Preprocessing
|
MSSQL DB '{#DBNAME}': Log flush wait time | Total wait time (in milliseconds) to flush the log. On an Always On secondary database, this value indicates the wait time for log records to be hardened to disk. |
Dependent item | mssql.db.logflushwait_time["{#DBNAME}"] Preprocessing
|
MSSQL DB '{#DBNAME}': Log growths | Total number of times the transaction log for the database has been expanded. |
Dependent item | mssql.db.log_growths["{#DBNAME}"] Preprocessing
|
MSSQL DB '{#DBNAME}': Log shrinks | Total number of times the transaction log for the database has been shrunk. |
Dependent item | mssql.db.log_shrinks["{#DBNAME}"] Preprocessing
|
MSSQL DB '{#DBNAME}': Log truncations | Number of times the transaction log has been truncated. |
Dependent item | mssql.db.log_truncations["{#DBNAME}"] Preprocessing
|
MSSQL DB '{#DBNAME}': Percent log used | Percentage of log space in use. |
Dependent item | mssql.db.percentlogused["{#DBNAME}"] Preprocessing
|
MSSQL DB '{#DBNAME}': Transactions per second | Number of transactions started for the database per second. |
Dependent item | mssql.db.transactions_sec.rate["{#DBNAME}"] Preprocessing
|
MSSQL DB '{#DBNAME}': Last diff backup duration | Duration of the last differential backup. |
Dependent item | mssql.backup.diff.duration["{#DBNAME}"] Preprocessing
|
MSSQL DB '{#DBNAME}': Last diff backup (time ago) | The amount of time since the last differential backup. |
Dependent item | mssql.backup.diff["{#DBNAME}"] Preprocessing
|
MSSQL DB '{#DBNAME}': Last full backup duration | Duration of the last full backup. |
Dependent item | mssql.backup.full.duration["{#DBNAME}"] Preprocessing
|
MSSQL DB '{#DBNAME}': Last full backup (time ago) | The amount of time since the last full backup. |
Dependent item | mssql.backup.full["{#DBNAME}"] Preprocessing
|
MSSQL DB '{#DBNAME}': Last log backup duration | Duration of the last log backup. |
Dependent item | mssql.backup.log.duration["{#DBNAME}"] Preprocessing
|
MSSQL DB '{#DBNAME}': Last log backup (time ago) | The amount of time since the last log backup. |
Dependent item | mssql.backup.log["{#DBNAME}"] Preprocessing
|
MSSQL DB '{#DBNAME}': Recovery model | Recovery model selected: 1 = Full, 2 = Bulk_logged, 3 = Simple. |
Dependent item | mssql.backup.recovery_model["{#DBNAME}"] Preprocessing
|
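The "time ago" backup items above hold the number of seconds elapsed since the corresponding backup finished, which the staleness triggers then compare against the `3d`/`6d`/`4h`/`8h` macros. A minimal sketch of that derivation (the function name is an assumption, not the template's actual query):

```python
from datetime import datetime, timedelta, timezone

def backup_age_seconds(finish, now):
    # "Last ... backup (time ago)" = seconds elapsed since the backup
    # finished; triggers compare this value against the macro thresholds.
    return (now - finish).total_seconds()

now = datetime(2024, 1, 8, tzinfo=timezone.utc)
age = backup_age_seconds(now - timedelta(days=4), now)
print(age > 3 * 86400)  # True: older than the 3d differential Warning default
```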
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
MSSQL DB '{#DBNAME}': State is {ITEM.VALUE} | The DB has a non-working state. |
last(/MSSQL by Zabbix agent 2/mssql.db.state["{#DBNAME}"])>1 |High |
||
MSSQL DB '{#DBNAME}': Number of commits waiting for the log flush is high | Too many commits are waiting for the log flush. |
min(/MSSQL by Zabbix agent 2/mssql.db.log_flush_waits_sec.rate["{#DBNAME}"],5m)>{$MSSQL.LOG_FLUSH_WAITS.MAX:"{#DBNAME}"} |Warning |
||
MSSQL DB '{#DBNAME}': Total wait time to flush the log is high | The wait time to flush the log is too long. |
min(/MSSQL by Zabbix agent 2/mssql.db.log_flush_wait_time["{#DBNAME}"],5m)>{$MSSQL.LOG_FLUSH_WAIT_TIME.MAX:"{#DBNAME}"} |Warning |
||
MSSQL DB '{#DBNAME}': Percent of log usage is high | There's not enough space left in the log. |
min(/MSSQL by Zabbix agent 2/mssql.db.percent_log_used["{#DBNAME}"],5m)>{$MSSQL.PERCENT_LOG_USED.MAX:"{#DBNAME}"} |Warning |
||
MSSQL DB '{#DBNAME}': Diff backup is old | The differential backup has not been executed for a long time. |
last(/MSSQL by Zabbix agent 2/mssql.backup.diff["{#DBNAME}"])>{$MSSQL.BACKUP_DIFF.CRIT:"{#DBNAME}"} and {$MSSQL.BACKUP_DIFF.USED:"{#DBNAME}"}=1 |High |
Manual close: Yes | |
MSSQL DB '{#DBNAME}': Diff backup is old | The differential backup has not been executed for a long time. |
last(/MSSQL by Zabbix agent 2/mssql.backup.diff["{#DBNAME}"])>{$MSSQL.BACKUP_DIFF.WARN:"{#DBNAME}"} and {$MSSQL.BACKUP_DIFF.USED:"{#DBNAME}"}=1 |Warning |
Manual close: Yes Depends on:
|
|
MSSQL DB '{#DBNAME}': Full backup is old | The full backup has not been executed for a long time. |
last(/MSSQL by Zabbix agent 2/mssql.backup.full["{#DBNAME}"])>{$MSSQL.BACKUP_FULL.CRIT:"{#DBNAME}"} and {$MSSQL.BACKUP_FULL.USED:"{#DBNAME}"}=1 |High |
Manual close: Yes | |
MSSQL DB '{#DBNAME}': Full backup is old | The full backup has not been executed for a long time. |
last(/MSSQL by Zabbix agent 2/mssql.backup.full["{#DBNAME}"])>{$MSSQL.BACKUP_FULL.WARN:"{#DBNAME}"} and {$MSSQL.BACKUP_FULL.USED:"{#DBNAME}"}=1 |Warning |
Manual close: Yes Depends on:
|
|
MSSQL DB '{#DBNAME}': Log backup is old | The log backup has not been executed for a long time. |
last(/MSSQL by Zabbix agent 2/mssql.backup.log["{#DBNAME}"])>{$MSSQL.BACKUP_LOG.CRIT:"{#DBNAME}"} and {$MSSQL.BACKUP_LOG.USED:"{#DBNAME}"}=1 and last(/MSSQL by Zabbix agent 2/mssql.backup.recovery_model["{#DBNAME}"])<>3 |High |
Manual close: Yes | |
MSSQL DB '{#DBNAME}': Log backup is old | The log backup has not been executed for a long time. |
last(/MSSQL by Zabbix agent 2/mssql.backup.log["{#DBNAME}"])>{$MSSQL.BACKUP_LOG.WARN:"{#DBNAME}"} and {$MSSQL.BACKUP_LOG.USED:"{#DBNAME}"}=1 and last(/MSSQL by Zabbix agent 2/mssql.backup.recovery_model["{#DBNAME}"])<>3 |Warning |
Manual close: Yes Depends on:
|
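The backup-age triggers above all follow one pattern: fire when the time since the last backup exceeds a macro threshold, but only for databases where that backup type is in use, and, for log backups, only for databases not in the Simple recovery model. A minimal Python sketch of that logic (the function and variable names are illustrative; the template itself evaluates this with Zabbix trigger expressions, not Python):

```python
# Sketch of the log-backup-age trigger condition from the table above.
# All names are illustrative, not part of the template.

SIMPLE = 3  # recovery model codes: 1 = Full, 2 = Bulk_logged, 3 = Simple

def log_backup_is_old(seconds_since_backup, threshold, backup_used, recovery_model):
    """Mirrors: last(mssql.backup.log) > {$MSSQL.BACKUP_LOG.*} and
    {$MSSQL.BACKUP_LOG.USED} = 1 and recovery model <> 3
    (Simple-recovery databases take no log backups)."""
    return (seconds_since_backup > threshold
            and backup_used == 1
            and recovery_model != SIMPLE)

# A Simple-recovery database never alerts, however stale the log backup:
print(log_backup_is_old(10**6, 8 * 3600, 1, SIMPLE))  # False
# A Full-recovery database with a stale log backup does:
print(log_backup_is_old(10**6, 8 * 3600, 1, 1))       # True
```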
Name | Description | Type | Key and additional info |
---|---|---|---|
Availability group discovery | Discovery of the existing availability groups. |
Dependent item | mssql.availability.group.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
MSSQL AG '{#GROUP_NAME}': Primary replica recovery health | Indicates the recovery health of the primary replica: 0 = In progress 1 = Online 2 = Unavailable |
Dependent item | mssql.primary_recovery_health["{#GROUP_NAME}"] Preprocessing
|
MSSQL AG '{#GROUP_NAME}': Primary replica name | Name of the server instance that is hosting the current primary replica. |
Dependent item | mssql.primary_replica["{#GROUP_NAME}"] Preprocessing
|
MSSQL AG '{#GROUP_NAME}': Secondary replica recovery health | Indicates the recovery health of a secondary replica: 0 = In progress 1 = Online 2 = Unavailable |
Dependent item | mssql.secondary_recovery_health["{#GROUP_NAME}"] Preprocessing
|
MSSQL AG '{#GROUP_NAME}': Synchronization health | Reflects a rollup of the synchronization health of all availability replicas in the availability group: 0 = Not healthy. None of the availability replicas have a healthy synchronization. 1 = Partially healthy. The synchronization of some, but not all, availability replicas is healthy. 2 = Healthy. The synchronization of every availability replica is healthy. |
Dependent item | mssql.synchronization_health["{#GROUP_NAME}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
MSSQL AG '{#GROUP_NAME}': Primary replica recovery health in progress | The primary replica is in the synchronization process. |
last(/MSSQL by Zabbix agent 2/mssql.primary_recovery_health["{#GROUP_NAME}"])=0 |Warning |
||
MSSQL AG '{#GROUP_NAME}': Secondary replica recovery health in progress | The secondary replica is in the synchronization process. |
last(/MSSQL by Zabbix agent 2/mssql.secondary_recovery_health["{#GROUP_NAME}"])=0 |Warning |
||
MSSQL AG '{#GROUP_NAME}': All replicas unhealthy | None of the availability replicas have a healthy synchronization. |
last(/MSSQL by Zabbix agent 2/mssql.synchronization_health["{#GROUP_NAME}"])=0 |Disaster |
||
MSSQL AG '{#GROUP_NAME}': Some replicas unhealthy | The synchronization health of some, but not all, availability replicas is healthy. |
last(/MSSQL by Zabbix agent 2/mssql.synchronization_health["{#GROUP_NAME}"])=1 |High |
Name | Description | Type | Key and additional info |
---|---|---|---|
Local database discovery | Discovery of the local availability databases. |
Dependent item | mssql.local.db.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
MSSQL AG '{#GROUP_NAME}' Local DB '{#DBNAME}': State | 0 = Online 1 = Restoring 2 = Recovering 3 = Recovery pending 4 = Suspect 5 = Emergency 6 = Offline |
Dependent item | mssql.local_db.state["{#DBNAME}"] Preprocessing
|
MSSQL AG '{#GROUP_NAME}' Local DB '{#DBNAME}': Suspended | Database state: 0 = Resumed 1 = Suspended |
Dependent item | mssql.local_db.is_suspended["{#DBNAME}"] Preprocessing
|
MSSQL AG '{#GROUP_NAME}' Local DB '{#DBNAME}': Synchronization health | Reflects the intersection of the synchronization state of a database that is joined to the availability group on the availability replica and the availability mode of the availability replica (synchronous-commit or asynchronous-commit mode): 0 = Not healthy. The synchronization_state of the database is 0 ("Not synchronizing"). 1 = Partially healthy. A database on a synchronous-commit availability replica is considered partially healthy if synchronization_state is 1 ("Synchronizing"). 2 = Healthy. A database on a synchronous-commit availability replica is considered healthy if synchronization_state is 2 ("Synchronized"), and a database on an asynchronous-commit availability replica is considered healthy if synchronization_state is 1 ("Synchronizing"). |
Dependent item | mssql.local_db.synchronization_health["{#DBNAME}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
MSSQL AG '{#GROUP_NAME}' Local DB '{#DBNAME}': "{#DBNAME}" is {ITEM.VALUE} | The local availability database has a non-working state. |
last(/MSSQL by Zabbix agent 2/mssql.local_db.state["{#DBNAME}"])>0 |Warning |
||
MSSQL AG '{#GROUP_NAME}' Local DB '{#DBNAME}': "{#DBNAME}" is Not healthy | The synchronization state of the local availability database is "Not synchronizing". |
last(/MSSQL by Zabbix agent 2/mssql.local_db.synchronization_health["{#DBNAME}"])=0 |High |
||
MSSQL AG '{#GROUP_NAME}' Local DB '{#DBNAME}': "{#DBNAME}" is Partially healthy | A database on a synchronous-commit availability replica is considered partially healthy if synchronization state is "Synchronizing". |
last(/MSSQL by Zabbix agent 2/mssql.local_db.synchronization_health["{#DBNAME}"])=1 |Average |
Name | Description | Type | Key and additional info |
---|---|---|---|
Non-local database discovery | Discovery of the non-local (not local to SQL Server instance) availability databases. |
Dependent item | mssql.non.local.db.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
MSSQL AG '{#GROUP_NAME}' Non-Local DB '{#REPLICA_NAME}*{#DBNAME}': Log queue size | Amount of log records of the primary database that have not been sent to the secondary databases. |
Dependent item | mssql.non-local_db.log_send_queue_size["{#GROUP_NAME}*{#REPLICA_NAME}*{#DBNAME}"] Preprocessing
|
MSSQL AG '{#GROUP_NAME}' Non-Local DB '{#REPLICA_NAME}*{#DBNAME}': Redo log queue size | Amount of log records in the log files of the secondary replica that have not yet been redone. |
Dependent item | mssql.non-local_db.redo_queue_size["{#GROUP_NAME}*{#REPLICA_NAME}*{#DBNAME}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
MSSQL AG '{#GROUP_NAME}' Non-Local DB '{#REPLICA_NAME}*{#DBNAME}': Log queue size is growing | The log records of the primary database are not being sent to the secondary databases. |
last(/MSSQL by Zabbix agent 2/mssql.non-local_db.log_send_queue_size["{#GROUP_NAME}*{#REPLICA_NAME}*{#DBNAME}"],#1)>last(/MSSQL by Zabbix agent 2/mssql.non-local_db.log_send_queue_size["{#GROUP_NAME}*{#REPLICA_NAME}*{#DBNAME}"],#2) and last(/MSSQL by Zabbix agent 2/mssql.non-local_db.log_send_queue_size["{#GROUP_NAME}*{#REPLICA_NAME}*{#DBNAME}"],#2)>last(/MSSQL by Zabbix agent 2/mssql.non-local_db.log_send_queue_size["{#GROUP_NAME}*{#REPLICA_NAME}*{#DBNAME}"],#3) |High |
||
MSSQL AG '{#GROUP_NAME}' Non-Local DB '{#REPLICA_NAME}*{#DBNAME}': Redo log queue size is growing | The log records in the log files of the secondary replica have not yet been redone. |
last(/MSSQL by Zabbix agent 2/mssql.non-local_db.redo_queue_size["{#GROUP_NAME}*{#REPLICA_NAME}*{#DBNAME}"],#1)>last(/MSSQL by Zabbix agent 2/mssql.non-local_db.redo_queue_size["{#GROUP_NAME}*{#REPLICA_NAME}*{#DBNAME}"],#2) and last(/MSSQL by Zabbix agent 2/mssql.non-local_db.redo_queue_size["{#GROUP_NAME}*{#REPLICA_NAME}*{#DBNAME}"],#2)>last(/MSSQL by Zabbix agent 2/mssql.non-local_db.redo_queue_size["{#GROUP_NAME}*{#REPLICA_NAME}*{#DBNAME}"],#3) |High |
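The two "queue size is growing" triggers compare the three most recent samples (in Zabbix, `#1` is the newest) and fire only on strictly monotonic growth, which filters out a single spike. A sketch of that check, with illustrative names:

```python
# Sketch of the "queue size is growing" trigger condition: fires only
# when the newest sample (#1) exceeds the previous one (#2), which in
# turn exceeds the one before it (#3).

def queue_is_growing(history):
    """history[0] is the newest sample (Zabbix #1)."""
    if len(history) < 3:
        return False
    v1, v2, v3 = history[0], history[1], history[2]
    return v1 > v2 and v2 > v3

print(queue_is_growing([300, 200, 100]))  # True  - steadily growing backlog
print(queue_is_growing([300, 100, 200]))  # False - a dip breaks the pattern
```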
Name | Description | Type | Key and additional info |
---|---|---|---|
Quorum discovery | Discovery of the quorum of the WSFC cluster. |
Dependent item | mssql.quorum.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
MSSQL Cluster '{#CLUSTER_NAME}': Quorum type | Type of quorum used by this WSFC cluster, one of: 0 = Node Majority. This quorum configuration can sustain failures of half the nodes (rounding up) minus one. 1 = Node and Disk Majority. If the disk witness remains on line, this quorum configuration can sustain failures of half the nodes (rounding up). 2 = Node and File Share Majority. This quorum configuration works in a similar way to Node and Disk Majority, but uses a file-share witness instead of a disk witness. 3 = No Majority: Disk Only. If the quorum disk is online, this quorum configuration can sustain failures of all nodes except one. 4 = Unknown Quorum. Unknown quorum for the cluster. 5 = Cloud Witness. Cluster utilizes Microsoft Azure for quorum arbitration. If the cloud witness is available, the cluster can sustain the failure of half the nodes (rounding up). |
Dependent item | mssql.quorum.type.[{#CLUSTER_NAME}] Preprocessing
|
MSSQL Cluster '{#CLUSTER_NAME}': Quorum state | State of the WSFC quorum, one of: 0 = Unknown quorum state 1 = Normal quorum 2 = Forced quorum |
Dependent item | mssql.quorum.state.[{#CLUSTER_NAME}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Quorum members discovery | Discovery of the quorum members of the WSFC cluster. |
Dependent item | mssql.quorum.member.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
MSSQL Cluster member '{#MEMBER_NAME}': Number of quorum votes | Number of quorum votes possessed by this quorum member. |
Dependent item | mssql.quorum_members.number_of_quorum_votes.[{#MEMBER_NAME}] Preprocessing
|
MSSQL Cluster member '{#MEMBER_NAME}': Member type | The type of member, one of: 0 = WSFC node 1 = Disk witness 2 = File share witness 3 = Cloud Witness |
Dependent item | mssql.quorum_members.member_type.[{#MEMBER_NAME}] Preprocessing
|
MSSQL Cluster member '{#MEMBER_NAME}': Member state | The member state, one of: 0 = Offline 1 = Online |
Dependent item | mssql.quorum_members.member_state.[{#MEMBER_NAME}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Replication discovery | Discovery of the database replicas. |
Dependent item | mssql.replica.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
MSSQL AG '{#GROUP_NAME}' Replica '{#REPLICA_NAME}': Connected state | Whether a secondary replica is currently connected to the primary replica: 0 = Disconnected. The response of an availability replica to the "Disconnected" state depends on its role: On the primary replica, if a secondary replica is disconnected, its secondary databases are marked as "Not synchronized" on the primary replica, which waits for the secondary to reconnect; On a secondary replica, upon detecting that it is disconnected, the secondary replica attempts to reconnect to the primary replica. 1 = Connected. Each primary replica tracks the connection state for every secondary replica in the same availability group. Secondary replicas track the connection state of only the primary replica. |
Dependent item | mssql.replica.connected_state["{#GROUP_NAME}_{#REPLICA_NAME}"] Preprocessing
|
MSSQL AG '{#GROUP_NAME}' Replica '{#REPLICA_NAME}': Is local | Whether the replica is local: 0 = Indicates a remote secondary replica in an availability group whose primary replica is hosted by the local server instance. This value occurs only on the primary replica location. 1 = Indicates a local replica. On secondary replicas, this is the only available value for the availability group to which the replica belongs. |
Dependent item | mssql.replica.is_local["{#GROUP_NAME}_{#REPLICA_NAME}"] Preprocessing
|
MSSQL AG '{#GROUP_NAME}' Replica '{#REPLICA_NAME}': Join state | 0 = Not joined 1 = Joined, standalone instance 2 = Joined, failover cluster instance |
Dependent item | mssql.replica.join_state["{#GROUP_NAME}_{#REPLICA_NAME}"] Preprocessing
|
MSSQL AG '{#GROUP_NAME}' Replica '{#REPLICA_NAME}': Operational state | Current operational state of the replica: 0 = Pending failover 1 = Pending 2 = Online 3 = Offline 4 = Failed 5 = Failed, no quorum 6 = Not local |
Dependent item | mssql.replica.operational_state["{#GROUP_NAME}_{#REPLICA_NAME}"] Preprocessing
|
MSSQL AG '{#GROUP_NAME}' Replica '{#REPLICA_NAME}': Recovery health | Rollup of the "database_state" column of the sys.dm_hadr_database_replica_states dynamic management view: 0 = In progress. At least one joined database has a database state other than "Online" ("database_state" is not "0"). 1 = Online. All the joined databases have a database state of "Online" ("database_state" is "0"). |
Dependent item | mssql.replica.recovery_health["{#GROUP_NAME}_{#REPLICA_NAME}"] Preprocessing
|
MSSQL AG '{#GROUP_NAME}' Replica '{#REPLICA_NAME}': Role | Current Always On availability group role of a local replica or a connected remote replica: 0 = Resolving 1 = Primary 2 = Secondary |
Dependent item | mssql.replica.role["{#GROUP_NAME}_{#REPLICA_NAME}"] Preprocessing
|
MSSQL AG '{#GROUP_NAME}' Replica '{#REPLICA_NAME}': Sync health | Reflects a rollup of the database synchronization state (synchronization_state) of all joined availability databases (also known as database replicas) and the availability mode of the replica (synchronous-commit or asynchronous-commit mode). The rollup reflects the least healthy accumulated state of the databases on the replica: 0 = Not healthy. At least one joined database is in the "Not synchronizing" state. 1 = Partially healthy. Some replicas are not in the target synchronization state: synchronous-commit replicas should be synchronized, and asynchronous-commit replicas should be synchronizing. 2 = Healthy. All replicas are in the target synchronization state: synchronous-commit replicas are synchronized, and asynchronous-commit replicas are synchronizing. |
Dependent item | mssql.replica.synchronization_health["{#GROUP_NAME}_{#REPLICA_NAME}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
MSSQL AG '{#GROUP_NAME}' Replica '{#REPLICA_NAME}': {#REPLICA_NAME} is disconnected | The response of an availability replica to the "Disconnected" state depends on its role: on the primary replica, the secondary databases of the disconnected secondary are marked as "Not synchronized", and the primary waits for the secondary to reconnect; on a secondary replica, upon detecting that it is disconnected, the secondary replica attempts to reconnect to the primary replica. |
last(/MSSQL by Zabbix agent 2/mssql.replica.connected_state["{#GROUP_NAME}_{#REPLICA_NAME}"])=0 and last(/MSSQL by Zabbix agent 2/mssql.replica.role["{#GROUP_NAME}_{#REPLICA_NAME}"])=2 |Warning |
||
MSSQL AG '{#GROUP_NAME}' Replica '{#REPLICA_NAME}': {#REPLICA_NAME} is {ITEM.VALUE} | The operational state of the replica in a given availability group is "Pending" or "Offline". |
last(/MSSQL by Zabbix agent 2/mssql.replica.operational_state["{#GROUP_NAME}_{#REPLICA_NAME}"])=0 or last(/MSSQL by Zabbix agent 2/mssql.replica.operational_state["{#GROUP_NAME}_{#REPLICA_NAME}"])=1 or last(/MSSQL by Zabbix agent 2/mssql.replica.operational_state["{#GROUP_NAME}_{#REPLICA_NAME}"])=3 |Warning |
||
MSSQL AG '{#GROUP_NAME}' Replica '{#REPLICA_NAME}': {#REPLICA_NAME} is {ITEM.VALUE} | The operational state of the replica in a given availability group is "Failed". |
last(/MSSQL by Zabbix agent 2/mssql.replica.operational_state["{#GROUP_NAME}_{#REPLICA_NAME}"])=4 |Average |
||
MSSQL AG '{#GROUP_NAME}' Replica '{#REPLICA_NAME}': {#REPLICA_NAME} is {ITEM.VALUE} | The operational state of the replica in a given availability group is "Failed, no quorum". |
last(/MSSQL by Zabbix agent 2/mssql.replica.operational_state["{#GROUP_NAME}_{#REPLICA_NAME}"])=5 |High |
||
MSSQL AG '{#GROUP_NAME}' Replica '{#REPLICA_NAME}': {#REPLICA_NAME} Recovery in progress | At least one joined database has a database state other than "Online". |
last(/MSSQL by Zabbix agent 2/mssql.replica.recovery_health["{#GROUP_NAME}_{#REPLICA_NAME}"])=0 |Info |
||
MSSQL AG '{#GROUP_NAME}' Replica '{#REPLICA_NAME}': {#REPLICA_NAME} is Not healthy | At least one joined database is in the "Not synchronizing" state. |
last(/MSSQL by Zabbix agent 2/mssql.replica.synchronization_health["{#GROUP_NAME}_{#REPLICA_NAME}"])=0 |Average |
||
MSSQL AG '{#GROUP_NAME}' Replica '{#REPLICA_NAME}': {#REPLICA_NAME} is Partially healthy | Some replicas are not in the target synchronization state: synchronous-commit replicas should be synchronized, and asynchronous-commit replicas should be synchronizing. |
last(/MSSQL by Zabbix agent 2/mssql.replica.synchronization_health["{#GROUP_NAME}_{#REPLICA_NAME}"])=1 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Mirroring discovery | To see the row for a database other than master or tempdb, you must either be the database owner or have at least ALTER ANY DATABASE or VIEW ANY DATABASE server-level permission or CREATE DATABASE permission in the master database. To see non-NULL values on a mirror database, you must be a member of the sysadmin fixed server role. |
Dependent item | mssql.mirroring.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
MSSQL Mirroring '{#DBNAME}': Role | Current role that the local database plays in the database mirroring session. 1 = Principal 2 = Mirror |
Dependent item | mssql.mirroring.role["{#DBNAME}"] Preprocessing
|
MSSQL Mirroring '{#DBNAME}': Role sequence | The number of times that mirroring partners have switched the principal and mirror roles due to a failover or forced service. |
Dependent item | mssql.mirroring.role_sequence["{#DBNAME}"] Preprocessing
|
MSSQL Mirroring '{#DBNAME}': State | State of the mirror database and of the database mirroring session. 0 = Suspended 1 = Disconnected from the other partner 2 = Synchronizing 3 = Pending failover 4 = Synchronized 5 = The partners are not synchronized. Failover is not possible now. 6 = The partners are synchronized. Failover is potentially possible. For information about the requirements for the failover, see Database Mirroring Operating Modes: https://learn.microsoft.com/en-us/sql/database-engine/database-mirroring/database-mirroring-operating-modes?view=sql-server-ver16. |
Dependent item | mssql.mirroring.state["{#DBNAME}"] Preprocessing
|
MSSQL Mirroring '{#DBNAME}': Witness state | State of the witness in the database mirroring session of the database: 0 = Unknown 1 = Connected 2 = Disconnected |
Dependent item | mssql.mirroring.witness_state["{#DBNAME}"] Preprocessing
|
MSSQL Mirroring '{#DBNAME}': Safety level | Safety setting for updates on the mirror database: 0 = Unknown state 1 = Off [asynchronous] 2 = Full [synchronous] |
Dependent item | mssql.mirroring.safety_level["{#DBNAME}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
MSSQL Mirroring '{#DBNAME}': "{#DBNAME}" is {ITEM.VALUE} | The state of the mirror database and of the database mirroring session is "Suspended", "Disconnected from the other partner", or "Synchronizing". |
last(/MSSQL by Zabbix agent 2/mssql.mirroring.state["{#DBNAME}"])>=0 and last(/MSSQL by Zabbix agent 2/mssql.mirroring.state["{#DBNAME}"])<=2 |Info |
||
MSSQL Mirroring '{#DBNAME}': "{#DBNAME}" is {ITEM.VALUE} | The state of the mirror database and of the database mirroring session is "Pending failover". |
last(/MSSQL by Zabbix agent 2/mssql.mirroring.state["{#DBNAME}"])=3 |Warning |
||
MSSQL Mirroring '{#DBNAME}': "{#DBNAME}" is {ITEM.VALUE} | The state of the mirror database and of the database mirroring session is "Not synchronized". The partners are not synchronized. A failover is not possible now. |
last(/MSSQL by Zabbix agent 2/mssql.mirroring.state["{#DBNAME}"])=5 |High |
||
MSSQL Mirroring '{#DBNAME}': "{#DBNAME}" Witness is disconnected | The state of the witness in the database mirroring session of the database is "Disconnected". |
last(/MSSQL by Zabbix agent 2/mssql.mirroring.witness_state["{#DBNAME}"])=2 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Job discovery | Scanning jobs in DBMS. |
Dependent item | mssql.job.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
MSSQL Job '{#JOBNAME}': Get job status | The item gets the status of SQL agent job {#JOBNAME}. |
Dependent item | mssql.job.status_raw["{#JOBNAME}"] Preprocessing
|
MSSQL Job '{#JOBNAME}': Enabled | The possible values of the job status: 0 = Disabled 1 = Enabled |
Dependent item | mssql.job.enabled["{#JOBNAME}"] Preprocessing
|
MSSQL Job '{#JOBNAME}': Last run date-time | The last date-time of the job run. |
Dependent item | mssql.job.lastrundatetime["{#JOBNAME}"] Preprocessing
|
MSSQL Job '{#JOBNAME}': Next run date-time | The next date-time of the job run. |
Dependent item | mssql.job.nextrundatetime["{#JOBNAME}"] Preprocessing
|
MSSQL Job '{#JOBNAME}': Last run status message | An informational message about the last run of the job. |
Dependent item | mssql.job.lastrunstatusmessage["{#JOBNAME}"] Preprocessing
|
MSSQL Job '{#JOBNAME}': Run status | The possible values of the job status: 0 ⇒ Failed 1 ⇒ Succeeded 2 ⇒ Retry 3 ⇒ Canceled 4 ⇒ Running |
Dependent item | mssql.job.runstatus["{#JOBNAME}"] Preprocessing
|
MSSQL Job '{#JOBNAME}': Run duration | Duration of the last-run job. |
Dependent item | mssql.job.run_duration["{#JOBNAME}"] Preprocessing
|
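SQL Agent job history stores run durations as integers in HHMMSS form (for example, 13005 means 1 hour, 30 minutes, 5 seconds), so a preprocessing step has to convert them into plain seconds before thresholds like {$MSSQL.BACKUP_DURATION.WARN} can be compared. A hedged Python sketch of that conversion (the function name is illustrative; the template performs this in item preprocessing):

```python
# sysjobhistory-style durations are integers in HHMMSS form.
# Sketch of converting such a value to seconds; illustrative only.

def hhmmss_to_seconds(run_duration):
    hours, rest = divmod(run_duration, 10000)
    minutes, seconds = divmod(rest, 100)
    return hours * 3600 + minutes * 60 + seconds

print(hhmmss_to_seconds(13005))  # 5405 (1 h 30 min 5 s)
print(hhmmss_to_seconds(245))    # 165  (2 min 45 s)
```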
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
MSSQL Job '{#JOBNAME}': Failed to run | The last run of the job has failed. |
last(/MSSQL by Zabbix agent 2/mssql.job.runstatus["{#JOBNAME}"])=0 |Warning |
Manual close: Yes | |
MSSQL Job '{#JOBNAME}': Job duration is high | The job is taking too long. |
last(/MSSQL by Zabbix agent 2/mssql.job.run_duration["{#JOBNAME}"])>{$MSSQL.BACKUP_DURATION.WARN:"{#JOBNAME}"} |Warning |
Manual close: Yes |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for monitoring a MongoDB sharded cluster by Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
MongoDB cluster by Zabbix agent 2
— collects metrics from the mongos proxy (router) by polling Zabbix agent 2.
Zabbix version: 6.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Note that, depending on the number of DBs and collections, the discovery operation may be expensive. Use filters with the macros {$MONGODB.LLD.FILTER.DB.MATCHES}, {$MONGODB.LLD.FILTER.DB.NOT_MATCHES}, {$MONGODB.LLD.FILTER.COLLECTION.MATCHES}, {$MONGODB.LLD.FILTER.COLLECTION.NOT_MATCHES}.
All sharded MongoDB nodes (mongod) will be discovered with the attached template "MongoDB node by Zabbix agent 2".
Test availability: zabbix_get -s mongos.node -k 'mongodb.ping["{$MONGODB.CONNSTRING}","{$MONGODB.USER}","{$MONGODB.PASSWORD}"]'
Name | Description | Default |
---|---|---|
{$MONGODB.CONNSTRING} | Connection string in the URI format (password is not used). This param overwrites a value configured in the "Server" option of the configuration file (if it's set), otherwise, the plugin's default value is used: "tcp://localhost:27017" |
tcp://localhost:27017 |
{$MONGODB.USER} | MongoDB username |
|
{$MONGODB.PASSWORD} | MongoDB user password |
|
{$MONGODB.CONNS.AVAILABLE.MIN.WARN} | Minimum number of available connections |
1000 |
{$MONGODB.LLD.FILTER.COLLECTION.MATCHES} | Filter of discoverable collections |
.* |
{$MONGODB.LLD.FILTER.COLLECTION.NOT_MATCHES} | Filter to exclude discovered collections |
CHANGE_IF_NEEDED |
{$MONGODB.LLD.FILTER.DB.MATCHES} | Filter of discoverable databases |
.* |
{$MONGODB.LLD.FILTER.DB.NOT_MATCHES} | Filter to exclude discovered databases |
(admin|config|local) |
{$MONGODB.CURSOR.TIMEOUT.MAX.WARN} | Maximum number of cursors timing out per second |
1 |
{$MONGODB.CURSOR.OPEN.MAX.WARN} | Maximum number of open cursors |
10000 |
Name | Description | Type | Key and additional info |
---|---|---|---|
MongoDB cluster: Get server status | The mongos statistics. |
Zabbix agent | mongodb.server.status["{$MONGODB.CONNSTRING}","{$MONGODB.USER}","{$MONGODB.PASSWORD}"] |
MongoDB cluster: Get mongodb.connpool.stats | Returns current info about connpool.stats. |
Zabbix agent | mongodb.connpool.stats["{$MONGODB.CONNSTRING}","{$MONGODB.USER}","{$MONGODB.PASSWORD}"] |
MongoDB cluster: Ping | Test if a connection is alive or not. |
Zabbix agent | mongodb.ping["{$MONGODB.CONNSTRING}","{$MONGODB.USER}","{$MONGODB.PASSWORD}"] Preprocessing
|
MongoDB cluster: Jumbo chunks | Total number of 'jumbo' chunks in the mongo cluster. |
Zabbix agent | mongodb.jumbo_chunks.count["{$MONGODB.CONNSTRING}","{$MONGODB.USER}","{$MONGODB.PASSWORD}"] |
MongoDB cluster: Mongos version | Version of the Mongos server |
Dependent item | mongodb.version Preprocessing
|
MongoDB cluster: Uptime | Number of seconds since the Mongos server start. |
Dependent item | mongodb.uptime Preprocessing
|
MongoDB cluster: Operations: command | The number of commands issued to the database per second. Counts all commands except the write commands: insert, update, and delete. |
Dependent item | mongodb.opcounters.command.rate Preprocessing
|
MongoDB cluster: Operations: delete | The number of delete operations received by the mongos instance per second. |
Dependent item | mongodb.opcounters.delete.rate Preprocessing
|
MongoDB cluster: Operations: update, rate | The number of update operations received by the mongos instance per second. |
Dependent item | mongodb.opcounters.update.rate Preprocessing
|
MongoDB cluster: Operations: query, rate | The number of queries received by the mongos instance per second. |
Dependent item | mongodb.opcounters.query.rate Preprocessing
|
MongoDB cluster: Operations: insert, rate | The number of insert operations received by the mongos instance per second. |
Dependent item | mongodb.opcounters.insert.rate Preprocessing
|
MongoDB cluster: Operations: getmore, rate | The number of "getmore" operations received by the mongos instance per second. This counter can be high even if the query count is low. Secondary nodes send getMore operations as part of the replication process. |
Dependent item | mongodb.opcounters.getmore.rate Preprocessing
|
MongoDB cluster: Last seen configserver | The latest optime of the CSRS primary that the mongos has seen. |
Dependent item | mongodb.last_seen_config_server Preprocessing
|
MongoDB cluster: Configserver heartbeat | Difference between the latest optime of the CSRS primary that the mongos has seen and cluster time. |
Dependent item | mongodb.config_server_heartbeat Preprocessing
|
MongoDB cluster: Bytes in, rate | The total number of bytes that the server has received over network connections initiated by clients or other mongod/mongos instances per second. |
Dependent item | mongodb.network.bytes_in.rate Preprocessing
|
MongoDB cluster: Bytes out, rate | The total number of bytes that the server has sent over network connections initiated by clients or other mongod/mongos instances per second. |
Dependent item | mongodb.network.bytes_out.rate Preprocessing
|
MongoDB cluster: Requests, rate | Number of distinct requests that the server has received per second. |
Dependent item | mongodb.network.numRequests.rate Preprocessing
|
MongoDB cluster: Connections, current | The number of incoming connections from clients to the database server. This number includes the current shell session. |
Dependent item | mongodb.connections.current Preprocessing
|
MongoDB cluster: New connections, rate | Rate of all incoming connections created to the server. |
Dependent item | mongodb.connections.rate Preprocessing
|
MongoDB cluster: Connections, active | The number of active client connections to the server. Active client connections refers to client connections that currently have operations in progress. Available starting in 4.0.7, 0 for older versions. |
Dependent item | mongodb.connections.active Preprocessing
|
MongoDB cluster: Connections, available | The number of unused incoming connections available. |
Dependent item | mongodb.connections.available Preprocessing
|
MongoDB cluster: Connection pool: client connections | The number of active and stored outgoing synchronous connections from the current mongos instance to other members of the sharded cluster. |
Dependent item | mongodb.connection_pool.client Preprocessing
|
MongoDB cluster: Connection pool: scoped | Number of active and stored outgoing scoped synchronous connections from the current mongos instance to other members of the sharded cluster. |
Dependent item | mongodb.connection_pool.scoped Preprocessing
|
MongoDB cluster: Connection pool: created, rate | The total number of outgoing connections created per second by the current mongos instance to other members of the sharded cluster. |
Dependent item | mongodb.connection_pool.created.rate Preprocessing
|
MongoDB cluster: Connection pool: available | The total number of available outgoing connections from the current mongos instance to other members of the sharded cluster. |
Dependent item | mongodb.connection_pool.available Preprocessing
|
MongoDB cluster: Connection pool: in use | Reports the total number of outgoing connections from the current mongos instance to other members of the sharded cluster set that are currently in use. |
Dependent item | mongodb.connection_pool.in_use Preprocessing
|
MongoDB cluster: Connection pool: refreshing | Reports the total number of outgoing connections from the current mongos instance to other members of the sharded cluster that are currently being refreshed. |
Dependent item | mongodb.connection_pool.refreshing Preprocessing
|
MongoDB cluster: Cursor: open no timeout | Number of open cursors with the option DBQuery.Option.noTimeout set to prevent timeout after a period of inactivity. |
Dependent item | mongodb.metrics.cursor.open.no_timeout Preprocessing
|
MongoDB cluster: Cursor: open pinned | Number of pinned open cursors. |
Dependent item | mongodb.cursor.open.pinned Preprocessing
|
MongoDB cluster: Cursor: open total | Number of cursors that MongoDB is maintaining for clients. |
Dependent item | mongodb.cursor.open.total Preprocessing
|
MongoDB cluster: Cursor: timed out, rate | Number of cursors that time out, per second. |
Dependent item | mongodb.cursor.timed_out.rate Preprocessing
|
MongoDB cluster: Architecture | A number, either 64 or 32, that indicates whether the MongoDB instance is compiled for 64-bit or 32-bit architecture. |
Dependent item | mongodb.mem.bits Preprocessing
|
MongoDB cluster: Memory: resident | Amount of memory currently used by the database process. |
Dependent item | mongodb.mem.resident Preprocessing
|
MongoDB cluster: Memory: virtual | Amount of virtual memory used by the mongos process. |
Dependent item | mongodb.mem.virtual Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
MongoDB cluster: Connection to mongos proxy is unavailable | Connection to mongos proxy instance is currently unavailable. |
last(/MongoDB cluster by Zabbix agent 2/mongodb.ping["{$MONGODB.CONNSTRING}","{$MONGODB.USER}","{$MONGODB.PASSWORD}"])=0 |High |
||
MongoDB cluster: Version has changed | MongoDB cluster version has changed. Acknowledge to close the problem manually. |
last(/MongoDB cluster by Zabbix agent 2/mongodb.version,#1)<>last(/MongoDB cluster by Zabbix agent 2/mongodb.version,#2) and length(last(/MongoDB cluster by Zabbix agent 2/mongodb.version))>0 |Info |
Manual close: Yes | |
MongoDB cluster: Mongos server has been restarted | Uptime is less than 10 minutes. |
last(/MongoDB cluster by Zabbix agent 2/mongodb.uptime)<10m |Info |
Manual close: Yes | |
MongoDB cluster: Failed to fetch info data | Zabbix has not received data for items for the last 10 minutes. |
nodata(/MongoDB cluster by Zabbix agent 2/mongodb.uptime,10m)=1 |Warning |
Manual close: Yes Depends on:
|
|
MongoDB cluster: Available connections is low | Too few available connections. |
max(/MongoDB cluster by Zabbix agent 2/mongodb.connections.available,5m)<{$MONGODB.CONNS.AVAILABLE.MIN.WARN} |Warning |
||
MongoDB cluster: Too many cursors opened by MongoDB for clients | min(/MongoDB cluster by Zabbix agent 2/mongodb.cursor.open.total,5m)>{$MONGODB.CURSOR.OPEN.MAX.WARN} |Warning |
|||
MongoDB cluster: Too many cursors are timing out | min(/MongoDB cluster by Zabbix agent 2/mongodb.cursor.timed_out.rate,5m)>{$MONGODB.CURSOR.TIMEOUT.MAX.WARN} |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Database discovery | Collect database metrics. Note, depending on the number of DBs this discovery operation may be expensive. Use filters with macros {$MONGODB.LLD.FILTER.DB.MATCHES}, {$MONGODB.LLD.FILTER.DB.NOT_MATCHES}. |
Zabbix agent | mongodb.db.discovery["{$MONGODB.CONNSTRING}","{$MONGODB.USER}","{$MONGODB.PASSWORD}"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
MongoDB {#DBNAME}: Get db stats {#DBNAME} | Returns statistics reflecting the database system's state. |
Zabbix agent | mongodb.db.stats["{$MONGODB.CONNSTRING}","{$MONGODB.USER}","{$MONGODB.PASSWORD}","{#DBNAME}"] |
MongoDB {#DBNAME}: Objects, avg size | The average size of each document in bytes. |
Dependent item | mongodb.db.size["{#DBNAME}"] Preprocessing
|
MongoDB {#DBNAME}: Size, data | Total size of the data held in this database including the padding factor. |
Dependent item | mongodb.db.data_size["{#DBNAME}"] Preprocessing
|
MongoDB {#DBNAME}: Size, file | Total size of the data held in this database including the padding factor (only available with the mmapv1 storage engine). |
Dependent item | mongodb.db.file_size["{#DBNAME}"] Preprocessing
|
MongoDB {#DBNAME}: Size, index | Total size of all indexes created on this database. |
Dependent item | mongodb.db.index_size["{#DBNAME}"] Preprocessing
|
MongoDB {#DBNAME}: Size, storage | Total amount of space allocated to collections in this database for document storage. |
Dependent item | mongodb.db.storage_size["{#DBNAME}"] Preprocessing
|
MongoDB {#DBNAME}: Objects, count | Number of objects (documents) in the database across all collections. |
Dependent item | mongodb.db.objects["{#DBNAME}"] Preprocessing
|
MongoDB {#DBNAME}: Extents | Contains a count of the number of extents in the database across all collections. |
Dependent item | mongodb.db.extents["{#DBNAME}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Collection discovery | Collect collection metrics. Note, depending on the number of DBs and collections, this discovery operation may be expensive. Use filters with macros {$MONGODB.LLD.FILTER.DB.MATCHES}, {$MONGODB.LLD.FILTER.DB.NOT_MATCHES}, {$MONGODB.LLD.FILTER.COLLECTION.MATCHES}, {$MONGODB.LLD.FILTER.COLLECTION.NOT_MATCHES}. |
Zabbix agent | mongodb.collections.discovery["{$MONGODB.CONNSTRING}","{$MONGODB.USER}","{$MONGODB.PASSWORD}"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
MongoDB {#DBNAME}.{#COLLECTION}: Get collection stats {#DBNAME}.{#COLLECTION} | Returns a variety of storage statistics for a given collection. |
Zabbix agent | mongodb.collection.stats["{$MONGODB.CONNSTRING}","{$MONGODB.USER}","{$MONGODB.PASSWORD}","{#DBNAME}","{#COLLECTION}"] |
MongoDB {#DBNAME}.{#COLLECTION}: Size | The total size in bytes of the data in the collection plus the size of every index on the collection. |
Dependent item | mongodb.collection.size["{#DBNAME}","{#COLLECTION}"] Preprocessing
|
MongoDB {#DBNAME}.{#COLLECTION}: Objects, avg size | The size of the average object in the collection in bytes. |
Dependent item | mongodb.collection.avgobjsize["{#DBNAME}","{#COLLECTION}"] Preprocessing
|
MongoDB {#DBNAME}.{#COLLECTION}: Objects, count | Total number of objects in the collection. |
Dependent item | mongodb.collection.count["{#DBNAME}","{#COLLECTION}"] Preprocessing
|
MongoDB {#DBNAME}.{#COLLECTION}: Capped, max number | Maximum number of documents in a capped collection. |
Dependent item | mongodb.collection.max["{#DBNAME}","{#COLLECTION}"] Preprocessing
|
MongoDB {#DBNAME}.{#COLLECTION}: Capped, max size | Maximum size of a capped collection in bytes. |
Dependent item | mongodb.collection.max_size["{#DBNAME}","{#COLLECTION}"] Preprocessing
|
MongoDB {#DBNAME}.{#COLLECTION}: Storage size | Total storage space allocated to this collection for document storage. |
Dependent item | mongodb.collection.storage_size["{#DBNAME}","{#COLLECTION}"] Preprocessing
|
MongoDB {#DBNAME}.{#COLLECTION}: Indexes | Total number of indices on the collection. |
Dependent item | mongodb.collection.nindexes["{#DBNAME}","{#COLLECTION}"] Preprocessing
|
MongoDB {#DBNAME}.{#COLLECTION}: Capped | Whether or not the collection is capped. |
Dependent item | mongodb.collection.capped["{#DBNAME}","{#COLLECTION}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Shards discovery | Discovers sharded cluster hosts. |
Zabbix agent | mongodb.sh.discovery["{$MONGODB.CONNSTRING}","{$MONGODB.USER}","{$MONGODB.PASSWORD}"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Config servers discovery | Discovers sharded cluster config servers. |
Zabbix agent | mongodb.cfg.discovery["{$MONGODB.CONNSTRING}","{$MONGODB.USER}","{$MONGODB.PASSWORD}"] |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template monitors a single MongoDB server via Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
MongoDB node by Zabbix agent 2
— collects metrics by polling zabbix-agent2.
Zabbix version: 6.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Note, depending on the number of DBs and collections, the discovery operation may be expensive. Use filters with macros {$MONGODB.LLD.FILTER.DB.MATCHES}, {$MONGODB.LLD.FILTER.DB.NOT_MATCHES}, {$MONGODB.LLD.FILTER.COLLECTION.MATCHES}, {$MONGODB.LLD.FILTER.COLLECTION.NOT_MATCHES}.
Test availability: zabbix_get -s mongodb.node -k 'mongodb.ping["{$MONGODB.CONNSTRING}","{$MONGODB.USER}","{$MONGODB.PASSWORD}"]'
Name | Description | Default |
---|---|---|
{$MONGODB.CONNSTRING} | Connection string in the URI format (password is not used). This parameter overrides the value configured in the "Server" option of the configuration file (if set); otherwise, the plugin's default value is used: "tcp://localhost:27017" |
tcp://localhost:27017 |
{$MONGODB.USER} | MongoDB username |
|
{$MONGODB.PASSWORD} | MongoDB user password |
|
{$MONGODB.CONNS.PCT.USED.MAX.WARN} | Maximum percentage of used connections |
80 |
{$MONGODB.CURSOR.TIMEOUT.MAX.WARN} | Maximum number of cursors timing out per second |
1 |
{$MONGODB.CURSOR.OPEN.MAX.WARN} | Maximum number of open cursors |
10000 |
{$MONGODB.REPL.LAG.MAX.WARN} | Maximum replication lag in seconds |
10s |
{$MONGODB.LLD.FILTER.COLLECTION.MATCHES} | Filter of discoverable collections |
.* |
{$MONGODB.LLD.FILTER.COLLECTION.NOT_MATCHES} | Filter to exclude discovered collections |
CHANGE_IF_NEEDED |
{$MONGODB.LLD.FILTER.DB.MATCHES} | Filter of discoverable databases |
.* |
{$MONGODB.LLD.FILTER.DB.NOT_MATCHES} | Filter to exclude discovered databases |
(admin|config|local) |
{$MONGODB.WIRED_TIGER.TICKETS.AVAILABLE.MIN.WARN} | Minimum number of available WiredTiger read or write tickets remaining |
5 |
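The LLD filter macros above are regular expressions: an entity is discovered when its name matches the MATCHES pattern and does not match the NOT_MATCHES pattern. A rough Python illustration of the default database filters (Zabbix's own regex engine and matching semantics may differ in detail; this is a sketch, not the actual discovery code):

```python
import re

# Defaults from the macro table above.
DB_MATCHES = r".*"                         # {$MONGODB.LLD.FILTER.DB.MATCHES}
DB_NOT_MATCHES = r"(admin|config|local)"   # {$MONGODB.LLD.FILTER.DB.NOT_MATCHES}

def is_discovered(db_name: str) -> bool:
    # Zabbix LLD filters perform substring regex matching,
    # approximated here with re.search.
    return (re.search(DB_MATCHES, db_name) is not None
            and re.search(DB_NOT_MATCHES, db_name) is None)

print([db for db in ["admin", "config", "local", "appdb"] if is_discovered(db)])
# → ['appdb']  (the system databases are filtered out by the default)
```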
Name | Description | Type | Key and additional info |
---|---|---|---|
MongoDB: Get server status | Returns a database's state. |
Zabbix agent | mongodb.server.status["{$MONGODB.CONNSTRING}","{$MONGODB.USER}","{$MONGODB.PASSWORD}"] |
MongoDB: Get Replica Set status | Returns the replica set status from the point of view of the member where the method is run. |
Zabbix agent | mongodb.rs.status["{$MONGODB.CONNSTRING}","{$MONGODB.USER}","{$MONGODB.PASSWORD}"] |
MongoDB: Get oplog stats | Returns status of the replica set, using data polled from the oplog. |
Zabbix agent | mongodb.oplog.stats["{$MONGODB.CONNSTRING}","{$MONGODB.USER}","{$MONGODB.PASSWORD}"] |
MongoDB: Ping | Test if a connection is alive or not. |
Zabbix agent | mongodb.ping["{$MONGODB.CONNSTRING}","{$MONGODB.USER}","{$MONGODB.PASSWORD}"] Preprocessing
|
MongoDB: Get collections usage stats | Returns usage statistics for each collection. |
Zabbix agent | mongodb.collections.usage["{$MONGODB.CONNSTRING}","{$MONGODB.USER}","{$MONGODB.PASSWORD}"] |
MongoDB: MongoDB version | Version of the MongoDB server. |
Dependent item | mongodb.version Preprocessing
|
MongoDB: Uptime | Number of seconds that the mongod process has been active. |
Dependent item | mongodb.uptime Preprocessing
|
MongoDB: Asserts: message, rate | The number of message assertions raised per second. Check the log file for more information about these messages. |
Dependent item | mongodb.asserts.msg.rate Preprocessing
|
MongoDB: Asserts: user, rate | The number of "user asserts" that have occurred per second. These are errors that a user may generate, such as running out of disk space or a duplicate key error. |
Dependent item | mongodb.asserts.user.rate Preprocessing
|
MongoDB: Asserts: warning, rate | The number of warnings raised per second. |
Dependent item | mongodb.asserts.warning.rate Preprocessing
|
MongoDB: Asserts: regular, rate | The number of regular assertions raised per second. Check the log file for more information about these messages. |
Dependent item | mongodb.asserts.regular.rate Preprocessing
|
MongoDB: Asserts: rollovers, rate | Number of times that the rollover counters roll over per second. The counters roll over to zero every 2^30 assertions. |
Dependent item | mongodb.asserts.rollovers.rate Preprocessing
|
MongoDB: Active clients: writers | The number of active client connections performing write operations. |
Dependent item | mongodb.active_clients.writers Preprocessing
|
MongoDB: Active clients: readers | The number of the active client connections performing read operations. |
Dependent item | mongodb.active_clients.readers Preprocessing
|
MongoDB: Active clients: total | The total number of internal client connections to the database including system threads as well as queued readers and writers. |
Dependent item | mongodb.active_clients.total Preprocessing
|
MongoDB: Current queue: writers | The number of operations that are currently queued and waiting for the write lock. A consistently small write-queue, particularly of shorter operations, is no cause for concern. |
Dependent item | mongodb.current_queue.writers Preprocessing
|
MongoDB: Current queue: readers | The number of operations that are currently queued and waiting for the read lock. A consistently small read-queue, particularly of shorter operations, should cause no concern. |
Dependent item | mongodb.current_queue.readers Preprocessing
|
MongoDB: Current queue: total | The total number of operations queued waiting for the lock. |
Dependent item | mongodb.current_queue.total Preprocessing
|
MongoDB: Operations: command, rate | The number of commands issued to the database per second. Counts all commands except the write commands: insert, update, and delete. |
Dependent item | mongodb.opcounters.command.rate Preprocessing
|
MongoDB: Operations: delete, rate | The number of delete operations received by the mongod instance per second. |
Dependent item | mongodb.opcounters.delete.rate Preprocessing
|
MongoDB: Operations: update, rate | The number of update operations received by the mongod instance per second. |
Dependent item | mongodb.opcounters.update.rate Preprocessing
|
MongoDB: Operations: query, rate | The number of queries received by the mongod instance per second. |
Dependent item | mongodb.opcounters.query.rate Preprocessing
|
MongoDB: Operations: insert, rate | The number of insert operations received by the mongod instance per second. |
Dependent item | mongodb.opcounters.insert.rate Preprocessing
|
MongoDB: Operations: getmore, rate | The number of "getmore" operations received by the mongod instance per second. This counter can be high even if the query count is low. Secondary nodes send getMore operations as part of the replication process. |
Dependent item | mongodb.opcounters.getmore.rate Preprocessing
|
MongoDB: Connections, current | The number of incoming connections from clients to the database server. This number includes the current shell session. |
Dependent item | mongodb.connections.current Preprocessing
|
MongoDB: New connections, rate | Rate of all incoming connections created to the server. |
Dependent item | mongodb.connections.rate Preprocessing
|
MongoDB: Connections, available | The number of unused incoming connections available. |
Dependent item | mongodb.connections.available Preprocessing
|
MongoDB: Connections, active | The number of active client connections to the server. Active client connections refers to client connections that currently have operations in progress. Available starting in 4.0.7, 0 for older versions. |
Dependent item | mongodb.connections.active Preprocessing
|
MongoDB: Bytes in, rate | The total number of bytes that the server has received over network connections initiated by clients or other mongod/mongos instances per second. |
Dependent item | mongodb.network.bytes_in.rate Preprocessing
|
MongoDB: Bytes out, rate | The total number of bytes that the server has sent over network connections initiated by clients or other mongod/mongos instances per second. |
Dependent item | mongodb.network.bytes_out.rate Preprocessing
|
MongoDB: Requests, rate | Number of distinct requests that the server has received per second. |
Dependent item | mongodb.network.numRequests.rate Preprocessing
|
MongoDB: Document: deleted, rate | Number of documents deleted per second. |
Dependent item | mongod.document.deleted.rate Preprocessing
|
MongoDB: Document: inserted, rate | Number of documents inserted per second. |
Dependent item | mongod.document.inserted.rate Preprocessing
|
MongoDB: Document: returned, rate | Number of documents returned by queries per second. |
Dependent item | mongod.document.returned.rate Preprocessing
|
MongoDB: Document: updated, rate | Number of documents updated per second. |
Dependent item | mongod.document.updated.rate Preprocessing
|
MongoDB: Cursor: open no timeout | Number of open cursors with the option DBQuery.Option.noTimeout set to prevent timeout after a period of inactivity. |
Dependent item | mongodb.metrics.cursor.open.no_timeout Preprocessing
|
MongoDB: Cursor: open pinned | Number of pinned open cursors. |
Dependent item | mongodb.cursor.open.pinned Preprocessing
|
MongoDB: Cursor: open total | Number of cursors that MongoDB is maintaining for clients. |
Dependent item | mongodb.cursor.open.total Preprocessing
|
MongoDB: Cursor: timed out, rate | Number of cursors that time out, per second. |
Dependent item | mongodb.cursor.timed_out.rate Preprocessing
|
MongoDB: Architecture | A number, either 64 or 32, that indicates whether the MongoDB instance is compiled for 64-bit or 32-bit architecture. |
Dependent item | mongodb.mem.bits Preprocessing
|
MongoDB: Memory: mapped | Amount of memory mapped by the database. |
Dependent item | mongodb.mem.mapped Preprocessing
|
MongoDB: Memory: mapped with journal | The amount of mapped memory, including the memory used for journaling. |
Dependent item | mongodb.mem.mapped_with_journal Preprocessing
|
MongoDB: Memory: resident | Amount of memory currently used by the database process. |
Dependent item | mongodb.mem.resident Preprocessing
|
MongoDB: Memory: virtual | Amount of virtual memory used by the mongod process. |
Dependent item | mongodb.mem.virtual Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
MongoDB: Connection to MongoDB is unavailable | Connection to MongoDB instance is currently unavailable. |
last(/MongoDB node by Zabbix agent 2/mongodb.ping["{$MONGODB.CONNSTRING}","{$MONGODB.USER}","{$MONGODB.PASSWORD}"])=0 |High |
||
MongoDB: Version has changed | MongoDB version has changed. Acknowledge to close the problem manually. |
last(/MongoDB node by Zabbix agent 2/mongodb.version,#1)<>last(/MongoDB node by Zabbix agent 2/mongodb.version,#2) and length(last(/MongoDB node by Zabbix agent 2/mongodb.version))>0 |Info |
Manual close: Yes | |
MongoDB: mongod process has been restarted | Uptime is less than 10 minutes. |
last(/MongoDB node by Zabbix agent 2/mongodb.uptime)<10m |Info |
Manual close: Yes | |
MongoDB: Failed to fetch info data | Zabbix has not received data for items for the last 10 minutes. |
nodata(/MongoDB node by Zabbix agent 2/mongodb.uptime,10m)=1 |Warning |
Manual close: Yes Depends on:
|
|
MongoDB: Total number of open connections is too high | Too few available connections. |
min(/MongoDB node by Zabbix agent 2/mongodb.connections.current,5m)/(last(/MongoDB node by Zabbix agent 2/mongodb.connections.available)+last(/MongoDB node by Zabbix agent 2/mongodb.connections.current))*100>{$MONGODB.CONNS.PCT.USED.MAX.WARN} |Warning |
||
MongoDB: Too many cursors opened by MongoDB for clients | min(/MongoDB node by Zabbix agent 2/mongodb.cursor.open.total,5m)>{$MONGODB.CURSOR.OPEN.MAX.WARN} |Warning |
|||
MongoDB: Too many cursors are timing out | min(/MongoDB node by Zabbix agent 2/mongodb.cursor.timed_out.rate,5m)>{$MONGODB.CURSOR.TIMEOUT.MAX.WARN} |Warning |
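The connection trigger above derives a used-connections percentage from two items of the `serverStatus` output; a minimal Python sketch of that expression (sample values are hypothetical):

```python
# Sketch of the "Total number of open connections is too high" trigger logic:
# used % = current / (available + current) * 100, compared against
# {$MONGODB.CONNS.PCT.USED.MAX.WARN} (default 80).
MAX_WARN_PCT = 80

def connections_used_pct(current: int, available: int) -> float:
    # serverStatus reports current and available incoming connections;
    # their sum approximates the configured connection limit.
    return current / (available + current) * 100

print(connections_used_pct(900, 100) > MAX_WARN_PCT)  # True: 90% of connections used
print(connections_used_pct(100, 900) > MAX_WARN_PCT)  # False: only 10% used
```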
Name | Description | Type | Key and additional info |
---|---|---|---|
Database discovery | Collect database metrics. Note, depending on the number of DBs this discovery operation may be expensive. Use filters with macros {$MONGODB.LLD.FILTER.DB.MATCHES}, {$MONGODB.LLD.FILTER.DB.NOT_MATCHES}. |
Zabbix agent | mongodb.db.discovery["{$MONGODB.CONNSTRING}","{$MONGODB.USER}","{$MONGODB.PASSWORD}"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
MongoDB {#DBNAME}: Get db stats {#DBNAME} | Returns statistics reflecting the database system's state. |
Zabbix agent | mongodb.db.stats["{$MONGODB.CONNSTRING}","{$MONGODB.USER}","{$MONGODB.PASSWORD}","{#DBNAME}"] |
MongoDB {#DBNAME}: Objects, avg size | The average size of each document in bytes. |
Dependent item | mongodb.db.size["{#DBNAME}"] Preprocessing
|
MongoDB {#DBNAME}: Size, data | Total size of the data held in this database including the padding factor. |
Dependent item | mongodb.db.data_size["{#DBNAME}"] Preprocessing
|
MongoDB {#DBNAME}: Size, file | Total size of the data held in this database including the padding factor (only available with the mmapv1 storage engine). |
Dependent item | mongodb.db.file_size["{#DBNAME}"] Preprocessing
|
MongoDB {#DBNAME}: Size, index | Total size of all indexes created on this database. |
Dependent item | mongodb.db.index_size["{#DBNAME}"] Preprocessing
|
MongoDB {#DBNAME}: Size, storage | Total amount of space allocated to collections in this database for document storage. |
Dependent item | mongodb.db.storage_size["{#DBNAME}"] Preprocessing
|
MongoDB {#DBNAME}: Collections | Contains a count of the number of collections in that database. |
Dependent item | mongodb.db.collections["{#DBNAME}"] Preprocessing
|
MongoDB {#DBNAME}: Objects, count | Number of objects (documents) in the database across all collections. |
Dependent item | mongodb.db.objects["{#DBNAME}"] Preprocessing
|
MongoDB {#DBNAME}: Extents | Contains a count of the number of extents in the database across all collections. |
Dependent item | mongodb.db.extents["{#DBNAME}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Collection discovery | Collect collection metrics. Note, depending on the number of DBs and collections, this discovery operation may be expensive. Use filters with macros {$MONGODB.LLD.FILTER.DB.MATCHES}, {$MONGODB.LLD.FILTER.DB.NOT_MATCHES}, {$MONGODB.LLD.FILTER.COLLECTION.MATCHES}, {$MONGODB.LLD.FILTER.COLLECTION.NOT_MATCHES}. |
Zabbix agent | mongodb.collections.discovery["{$MONGODB.CONNSTRING}","{$MONGODB.USER}","{$MONGODB.PASSWORD}"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
MongoDB {#DBNAME}.{#COLLECTION}: Get collection stats {#DBNAME}.{#COLLECTION} | Returns a variety of storage statistics for a given collection. |
Zabbix agent | mongodb.collection.stats["{$MONGODB.CONNSTRING}","{$MONGODB.USER}","{$MONGODB.PASSWORD}","{#DBNAME}","{#COLLECTION}"] |
MongoDB {#DBNAME}.{#COLLECTION}: Size | The total size in bytes of the data in the collection plus the size of every index on the collection. |
Dependent item | mongodb.collection.size["{#DBNAME}","{#COLLECTION}"] Preprocessing
|
MongoDB {#DBNAME}.{#COLLECTION}: Objects, avg size | The size of the average object in the collection in bytes. |
Dependent item | mongodb.collection.avgobjsize["{#DBNAME}","{#COLLECTION}"] Preprocessing
|
MongoDB {#DBNAME}.{#COLLECTION}: Objects, count | Total number of objects in the collection. |
Dependent item | mongodb.collection.count["{#DBNAME}","{#COLLECTION}"] Preprocessing
|
MongoDB {#DBNAME}.{#COLLECTION}: Capped: max number | Maximum number of documents that may be present in a capped collection. |
Dependent item | mongodb.collection.max_number["{#DBNAME}","{#COLLECTION}"] Preprocessing
|
MongoDB {#DBNAME}.{#COLLECTION}: Capped: max size | Maximum size of a capped collection in bytes. |
Dependent item | mongodb.collection.max_size["{#DBNAME}","{#COLLECTION}"] Preprocessing
|
MongoDB {#DBNAME}.{#COLLECTION}: Storage size | Total storage space allocated to this collection for document storage. |
Dependent item | mongodb.collection.storage_size["{#DBNAME}","{#COLLECTION}"] Preprocessing
|
MongoDB {#DBNAME}.{#COLLECTION}: Indexes | Total number of indices on the collection. |
Dependent item | mongodb.collection.nindexes["{#DBNAME}","{#COLLECTION}"] Preprocessing
|
MongoDB {#DBNAME}.{#COLLECTION}: Capped | Whether or not the collection is capped. |
Dependent item | mongodb.collection.capped["{#DBNAME}","{#COLLECTION}"] Preprocessing
|
MongoDB {#DBNAME}.{#COLLECTION}: Operations: total, rate | The number of operations per second. |
Dependent item | mongodb.collection.ops.total.rate["{#DBNAME}","{#COLLECTION}"] Preprocessing
|
MongoDB {#DBNAME}.{#COLLECTION}: Read lock, rate | The number of operations per second. |
Dependent item | mongodb.collection.read_lock.rate["{#DBNAME}","{#COLLECTION}"] Preprocessing
|
MongoDB {#DBNAME}.{#COLLECTION}: Write lock, rate | The number of operations per second. |
Dependent item | mongodb.collection.write_lock.rate["{#DBNAME}","{#COLLECTION}"] Preprocessing
|
MongoDB {#DBNAME}.{#COLLECTION}: Operations: queries, rate | The number of operations per second. |
Dependent item | mongodb.collection.ops.queries.rate["{#DBNAME}","{#COLLECTION}"] Preprocessing
|
MongoDB {#DBNAME}.{#COLLECTION}: Operations: getmore, rate | The number of operations per second. |
Dependent item | mongodb.collection.ops.getmore.rate["{#DBNAME}","{#COLLECTION}"] Preprocessing
|
MongoDB {#DBNAME}.{#COLLECTION}: Operations: insert, rate | The number of operations per second. |
Dependent item | mongodb.collection.ops.insert.rate["{#DBNAME}","{#COLLECTION}"] Preprocessing
|
MongoDB {#DBNAME}.{#COLLECTION}: Operations: update, rate | The number of operations per second. |
Dependent item | mongodb.collection.ops.update.rate["{#DBNAME}","{#COLLECTION}"] Preprocessing
|
MongoDB {#DBNAME}.{#COLLECTION}: Operations: remove, rate | The number of operations per second. |
Dependent item | mongodb.collection.ops.remove.rate["{#DBNAME}","{#COLLECTION}"] Preprocessing
|
MongoDB {#DBNAME}.{#COLLECTION}: Operations: commands, rate | The number of operations per second. |
Dependent item | mongodb.collection.ops.commands.rate["{#DBNAME}","{#COLLECTION}"] Preprocessing
|
MongoDB {#DBNAME}.{#COLLECTION}: Operations: total, ms/s | Fraction of time (ms/s) the mongod has spent on operations. |
Dependent item | mongodb.collection.ops.total.ms["{#DBNAME}","{#COLLECTION}"] Preprocessing
|
MongoDB {#DBNAME}.{#COLLECTION}: Read lock, ms/s | Fraction of time (ms/s) the mongod has spent on operations. |
Dependent item | mongodb.collection.read_lock.ms["{#DBNAME}","{#COLLECTION}"] Preprocessing
|
MongoDB {#DBNAME}.{#COLLECTION}: Write lock, ms/s | Fraction of time (ms/s) the mongod has spent on operations. |
Dependent item | mongodb.collection.write_lock.ms["{#DBNAME}","{#COLLECTION}"] Preprocessing
|
MongoDB {#DBNAME}.{#COLLECTION}: Operations: queries, ms/s | Fraction of time (ms/s) the mongod has spent on operations. |
Dependent item | mongodb.collection.ops.queries.ms["{#DBNAME}","{#COLLECTION}"] Preprocessing
|
MongoDB {#DBNAME}.{#COLLECTION}: Operations: getmore, ms/s | Fraction of time (ms/s) the mongod has spent on operations. |
Dependent item | mongodb.collection.ops.getmore.ms["{#DBNAME}","{#COLLECTION}"] Preprocessing
|
MongoDB {#DBNAME}.{#COLLECTION}: Operations: insert, ms/s | Fraction of time (ms/s) the mongod has spent on operations. |
Dependent item | mongodb.collection.ops.insert.ms["{#DBNAME}","{#COLLECTION}"] Preprocessing
|
MongoDB {#DBNAME}.{#COLLECTION}: Operations: update, ms/s | Fraction of time (ms/s) the mongod has spent on operations. |
Dependent item | mongodb.collection.ops.update.ms["{#DBNAME}","{#COLLECTION}"] Preprocessing
|
MongoDB {#DBNAME}.{#COLLECTION}: Operations: remove, ms/s | Fraction of time (ms/s) the mongod has spent on operations. |
Dependent item | mongodb.collection.ops.remove.ms["{#DBNAME}","{#COLLECTION}"] Preprocessing
|
MongoDB {#DBNAME}.{#COLLECTION}: Operations: commands, ms/s | Fraction of time (ms/s) the mongod has spent on operations. |
Dependent item | mongodb.collection.ops.commands.ms["{#DBNAME}","{#COLLECTION}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Replication discovery | Collect metrics by Zabbix agent if it exists. |
Dependent item | mongodb.rs.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
MongoDB: Node state | An integer between 0 and 10 that represents the replica state of the current member. |
Dependent item | mongodb.rs.state[{#RS_NAME}] Preprocessing
|
MongoDB: Replication lag | Delay between a write operation on the primary and its copy to a secondary. |
Dependent item | mongodb.rs.lag[{#RS_NAME}] Preprocessing
|
MongoDB: Number of replicas | The number of replicated nodes in the current ReplicaSet. |
Dependent item | mongodb.rs.total_nodes[{#RS_NAME}] Preprocessing
|
MongoDB: Number of unhealthy replicas | The number of replicated nodes with member health value = 0. |
Dependent item | mongodb.rs.unhealthy_count[{#RS_NAME}] Preprocessing
|
MongoDB: Unhealthy replicas | The replicated nodes in the current ReplicaSet with member health value = 0. |
Dependent item | mongodb.rs.unhealthy[{#RS_NAME}] Preprocessing
|
MongoDB: Apply batches, rate | Number of batches applied across all databases per second. |
Dependent item | mongodb.rs.apply.batches.rate[{#RS_NAME}] Preprocessing
|
MongoDB: Apply batches, ms/s | Fraction of time (ms/s) the mongod has spent applying operations from the oplog. |
Dependent item | mongodb.rs.apply.batches.ms.rate[{#RS_NAME}] Preprocessing
|
MongoDB: Apply ops, rate | Number of oplog operations applied per second. |
Dependent item | mongodb.rs.apply.rate[{#RS_NAME}] Preprocessing
|
MongoDB: Buffer | Number of operations in the oplog buffer. |
Dependent item | mongodb.rs.buffer.count[{#RS_NAME}] Preprocessing
|
MongoDB: Buffer, max size | Maximum size of the buffer. |
Dependent item | mongodb.rs.buffer.max_size[{#RS_NAME}] Preprocessing
|
MongoDB: Buffer, size | Current size of the contents of the oplog buffer. |
Dependent item | mongodb.rs.buffer.size[{#RS_NAME}] Preprocessing
|
MongoDB: Network bytes, rate | Amount of data read from the replication sync source per second. |
Dependent item | mongodb.rs.network.bytes.rate[{#RS_NAME}] Preprocessing
|
MongoDB: Network getmores, rate | Number of getmore operations per second. |
Dependent item | mongodb.rs.network.getmores.rate[{#RS_NAME}] Preprocessing
|
MongoDB: Network getmores, ms/s | Fraction of time (ms/s) required to collect data from getmore operations. |
Dependent item | mongodb.rs.network.getmores.ms.rate[{#RS_NAME}] Preprocessing
|
MongoDB: Network ops, rate | Number of operations read from the replication source per second. |
Dependent item | mongodb.rs.network.ops.rate[{#RS_NAME}] Preprocessing
|
MongoDB: Network readers created, rate | Number of oplog query processes created per second. |
Dependent item | mongodb.rs.network.readers.rate[{#RS_NAME}] Preprocessing
|
MongoDB {#RS_NAME}: Oplog time diff | Oplog window: difference between the first and last operation in the oplog. Only present if there are entries in the oplog. |
Dependent item | mongodb.rs.oplog.timediff[{#RS_NAME}] Preprocessing
|
MongoDB: Preload docs, rate | Number of documents loaded per second during the pre-fetch stage of replication. |
Dependent item | mongodb.rs.preload.docs.rate[{#RS_NAME}] Preprocessing
|
MongoDB: Preload docs, ms/s | Fraction of time (ms/s) spent loading documents as part of the pre-fetch stage of replication. |
Dependent item | mongodb.rs.preload.docs.ms.rate[{#RS_NAME}] Preprocessing
|
MongoDB: Preload indexes, rate | Number of index entries loaded by members before updating documents as part of the pre-fetch stage of replication. |
Dependent item | mongodb.rs.preload.indexes.rate[{#RS_NAME}] Preprocessing
|
MongoDB: Preload indexes, ms/s | Fraction of time (ms/s) spent loading documents as part of the pre-fetch stage of replication. |
Dependent item | mongodb.rs.preload.indexes.ms.rate[{#RS_NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
MongoDB: Node in ReplicaSet changed the state | Node in ReplicaSet changed the state. Acknowledge to close the problem manually. |
last(/MongoDB node by Zabbix agent 2/mongodb.rs.state[{#RS_NAME}],#1)<>last(/MongoDB node by Zabbix agent 2/mongodb.rs.state[{#RS_NAME}],#2) |Warning |
Manual close: Yes | |
MongoDB: Replication lag with primary is too high | min(/MongoDB node by Zabbix agent 2/mongodb.rs.lag[{#RS_NAME}],5m)>{$MONGODB.REPL.LAG.MAX.WARN} |Warning |
|||
MongoDB: There are unhealthy replicas in ReplicaSet | last(/MongoDB node by Zabbix agent 2/mongodb.rs.unhealthy_count[{#RS_NAME}])>0 and length(last(/MongoDB node by Zabbix agent 2/mongodb.rs.unhealthy[{#RS_NAME}]))>0 |Average |
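The replication-lag trigger compares the 5-minute minimum of the lag item against {$MONGODB.REPL.LAG.MAX.WARN} (default 10s); a rough sketch of that logic (sample values are hypothetical):

```python
def lag_trigger_fires(lag_samples_s, max_warn_s=10):
    # min(...,5m) > threshold: fires only when every sample in the
    # evaluation window exceeds the threshold, so a single low sample
    # keeps the trigger quiet.
    return min(lag_samples_s) > max_warn_s

print(lag_trigger_fires([12, 15, 11]))  # True: all samples above 10 s
print(lag_trigger_fires([12, 3, 11]))   # False: one sample below the threshold
```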
Name | Description | Type | Key and additional info |
---|---|---|---|
WiredTiger metrics | Collect metrics of WiredTiger Storage Engine if it exists. |
Dependent item | mongodb.wired_tiger.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
MongoDB: WiredTiger cache: bytes | Size of the data currently in cache. |
Dependent item | mongodb.wired_tiger.cache.bytes_in_cache[{#SINGLETON}] Preprocessing
|
MongoDB: WiredTiger cache: in-memory page splits | In-memory page splits. |
Dependent item | mongodb.wired_tiger.cache.splits[{#SINGLETON}] Preprocessing
|
MongoDB: WiredTiger cache: bytes, max | Maximum cache size. |
Dependent item | mongodb.wired_tiger.cache.maximum_bytes_configured[{#SINGLETON}] Preprocessing
|
MongoDB: WiredTiger cache: max page size at eviction | Maximum page size at eviction. |
Dependent item | mongodb.wired_tiger.cache.max_page_size_eviction[{#SINGLETON}] Preprocessing
|
MongoDB: WiredTiger cache: modified pages evicted | Number of modified pages evicted from the cache. |
Dependent item | mongodb.wired_tiger.cache.modified_pages_evicted[{#SINGLETON}] Preprocessing
|
MongoDB: WiredTiger cache: pages read into cache | Number of pages read into the cache. |
Dependent item | mongodb.wired_tiger.cache.pages_read[{#SINGLETON}] Preprocessing
|
MongoDB: WiredTiger cache: pages written from cache | Number of pages written from the cache. |
Dependent item | mongodb.wired_tiger.cache.pages_written[{#SINGLETON}] Preprocessing
|
MongoDB: WiredTiger cache: pages held in cache | Number of pages currently held in the cache. |
Dependent item | mongodb.wired_tiger.cache.pages_in_cache[{#SINGLETON}] Preprocessing
|
MongoDB: WiredTiger cache: pages evicted by application threads, rate | Number of pages evicted by application threads per second. |
Dependent item | mongodb.wired_tiger.cache.pages_evicted_threads.rate[{#SINGLETON}] Preprocessing
|
MongoDB: WiredTiger cache: tracked dirty bytes in the cache | Size of the dirty data in the cache. |
Dependent item | mongodb.wired_tiger.cache.tracked_dirty_bytes[{#SINGLETON}] Preprocessing
|
MongoDB: WiredTiger cache: unmodified pages evicted | Number of unmodified pages evicted from the cache. |
Dependent item | mongodb.wired_tiger.cache.unmodified_pages_evicted[{#SINGLETON}] Preprocessing
|
MongoDB: WiredTiger concurrent transactions: read, available | Number of available read tickets (concurrent transactions) remaining. |
Dependent item | mongodb.wired_tiger.concurrent_transactions.read.available[{#SINGLETON}] Preprocessing
|
MongoDB: WiredTiger concurrent transactions: read, out | Number of read tickets (concurrent transactions) in use. |
Dependent item | mongodb.wired_tiger.concurrent_transactions.read.out[{#SINGLETON}] Preprocessing
|
MongoDB: WiredTiger concurrent transactions: read, total tickets | Total number of read tickets (concurrent transactions) available. |
Dependent item | mongodb.wired_tiger.concurrent_transactions.read.totalTickets[{#SINGLETON}] Preprocessing
|
MongoDB: WiredTiger concurrent transactions: write, available | Number of available write tickets (concurrent transactions) remaining. |
Dependent item | mongodb.wired_tiger.concurrent_transactions.write.available[{#SINGLETON}] Preprocessing
|
MongoDB: WiredTiger concurrent transactions: write, out | Number of write tickets (concurrent transactions) in use. |
Dependent item | mongodb.wired_tiger.concurrent_transactions.write.out[{#SINGLETON}] Preprocessing
|
MongoDB: WiredTiger concurrent transactions: write, total tickets | Total number of write tickets (concurrent transactions) available. |
Dependent item | mongodb.wired_tiger.concurrent_transactions.write.totalTickets[{#SINGLETON}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
MongoDB: Available WiredTiger read tickets is low | Too few available read tickets. |
max(/MongoDB node by Zabbix agent 2/mongodb.wired_tiger.concurrent_transactions.read.available[{#SINGLETON}],5m)<{$MONGODB.WIRED_TIGER.TICKETS.AVAILABLE.MIN.WARN} |Warning |
||
MongoDB: Available WiredTiger write tickets is low | Too few available write tickets. |
max(/MongoDB node by Zabbix agent 2/mongodb.wired_tiger.concurrent_transactions.write.available[{#SINGLETON}],5m)<{$MONGODB.WIRED_TIGER.TICKETS.AVAILABLE.MIN.WARN} |Warning |
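The WiredTiger ticket items above are dependent items cut from the cached `db.serverStatus()` document, and the two triggers compare the `available` values against {$MONGODB.WIRED_TIGER.TICKETS.AVAILABLE.MIN.WARN}. A rough sketch of that extraction and check (the document excerpt and the threshold value are illustrative, not taken from a real server):

```python
# Hypothetical excerpt of the db.serverStatus() document that the
# dependent items are preprocessed from.
server_status = {
    "wiredTiger": {
        "concurrentTransactions": {
            "read":  {"available": 120, "out": 8, "totalTickets": 128},
            "write": {"available": 126, "out": 2, "totalTickets": 128},
        }
    }
}

def available_tickets(status, mode):
    """Mirror of the dependent-item preprocessing: pick one metric
    out of the cached serverStatus document."""
    return status["wiredTiger"]["concurrentTransactions"][mode]["available"]

# Trigger-style check; WARN_MIN plays the role of
# {$MONGODB.WIRED_TIGER.TICKETS.AVAILABLE.MIN.WARN}.
WARN_MIN = 5
for mode in ("read", "write"):
    avail = available_tickets(server_status, mode)
    print(mode, avail, "LOW" if avail < WARN_MIN else "ok")
```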
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for the effortless deployment of InfluxDB monitoring by Zabbix via HTTP and doesn't require any external scripts.
Zabbix version: 6.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
This template works with self-hosted InfluxDB instances. Internal service metrics are collected from the InfluxDB /metrics endpoint. For organization discovery, the template needs to use authorization via an API token. See the docs: https://docs.influxdata.com/influxdb/v2.0/security/tokens/
Don't forget to change the macros {$INFLUXDB.URL} and {$INFLUXDB.API.TOKEN}. Also, see the Macros section for a list of macros used to set trigger values. NOTE: Some metrics may not be collected depending on your InfluxDB instance version and configuration.
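The dependent items below are preprocessed out of the Prometheus exposition text returned by the /metrics endpoint. As an illustration of how one exposition line maps to a metric name, labels, and value (the sample lines and values are made up, and real exposition lines can be more complex, e.g. escaped label values):

```python
def parse_prom_line(line):
    """Parse one simple Prometheus exposition line: name{labels} value.
    Minimal sketch; does not handle escaped quotes or timestamps."""
    name_part, value = line.rsplit(" ", 1)
    labels = {}
    if "{" in name_part:
        name, raw = name_part.split("{", 1)
        for pair in raw.rstrip("}").split(","):
            k, v = pair.split("=", 1)
            labels[k] = v.strip('"')
    else:
        name = name_part
    return name, labels, float(value)

# Sample lines in the shape /metrics returns (values are illustrative).
print(parse_prom_line('influxdb_buckets_total 12'))
print(parse_prom_line('http_query_request_bytes{orgID="abc123",status="200"} 4096'))
```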
Name | Description | Default |
---|---|---|
{$INFLUXDB.URL} | InfluxDB instance URL |
http://localhost:8086 |
{$INFLUXDB.API.TOKEN} | InfluxDB API Authorization Token |
|
{$INFLUXDB.ORG_NAME.MATCHES} | Filter of discoverable organizations |
.* |
{$INFLUXDB.ORG_NAME.NOT_MATCHES} | Filter to exclude discovered organizations |
CHANGE_IF_NEEDED |
{$INFLUXDB.TASK.RUN.FAIL.MAX.WARN} | Maximum number of tasks runs failures for trigger expression. |
2 |
{$INFLUXDB.REQ.FAIL.MAX.WARN} | Maximum number of query requests failures for trigger expression. |
2 |
Name | Description | Type | Key and additional info |
---|---|---|---|
InfluxDB: Get instance metrics | HTTP agent | influx.get_metrics Preprocessing
|
|
InfluxDB: Instance status | Get the health of an instance. |
HTTP agent | influx.healthcheck Preprocessing
|
InfluxDB: Boltdb reads, rate | Total number of boltdb reads per second. |
Dependent item | influxdb.boltdb_reads.rate Preprocessing
|
InfluxDB: Boltdb writes, rate | Total number of boltdb writes per second. |
Dependent item | influxdb.boltdb_writes.rate Preprocessing
|
InfluxDB: Buckets, total | Number of total buckets on the server. |
Dependent item | influxdb.buckets.total Preprocessing
|
InfluxDB: Dashboards, total | Number of total dashboards on the server. |
Dependent item | influxdb.dashboards.total Preprocessing
|
InfluxDB: Organizations, total | Number of total organizations on the server. |
Dependent item | influxdb.organizations.total Preprocessing
|
InfluxDB: Scrapers, total | Number of total scrapers on the server. |
Dependent item | influxdb.scrapers.total Preprocessing
|
InfluxDB: Telegraf plugins, total | Number of individual telegraf plugins configured. |
Dependent item | influxdb.telegraf_plugins.total Preprocessing
|
InfluxDB: Telegrafs, total | Number of total telegraf configurations on the server. |
Dependent item | influxdb.telegrafs.total Preprocessing
|
InfluxDB: Tokens, total | Number of total tokens on the server. |
Dependent item | influxdb.tokens.total Preprocessing
|
InfluxDB: Users, total | Number of total users on the server. |
Dependent item | influxdb.users.total Preprocessing
|
InfluxDB: Version | Version of the InfluxDB instance. |
Dependent item | influxdb.version Preprocessing
|
InfluxDB: Uptime | InfluxDB process uptime in seconds. |
Dependent item | influxdb.uptime Preprocessing
|
InfluxDB: Workers currently running | Total number of workers currently running tasks. |
Dependent item | influxdb.task_executor_runs_active.total Preprocessing
|
InfluxDB: Workers busy, pct | Percent of total available workers that are currently busy. |
Dependent item | influxdb.task_executor_workers_busy.pct Preprocessing
|
InfluxDB: Task runs failed, rate | Total number of failed task runs across all tasks per second. |
Dependent item | influxdb.task_executor_complete.failed.rate Preprocessing
|
InfluxDB: Task runs successful, rate | Total number of successfully completed task runs across all tasks per second. |
Dependent item | influxdb.task_executor_complete.successful.rate Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
InfluxDB: Health check failed | The InfluxDB instance is not available or is unhealthy. |
last(/InfluxDB by HTTP/influx.healthcheck)=0 |High |
||
InfluxDB: Version has changed | InfluxDB version has changed. Acknowledge to close the problem manually. |
last(/InfluxDB by HTTP/influxdb.version,#1)<>last(/InfluxDB by HTTP/influxdb.version,#2) and length(last(/InfluxDB by HTTP/influxdb.version))>0 |Info |
Manual close: Yes | |
InfluxDB: has been restarted | Uptime is less than 10 minutes. |
last(/InfluxDB by HTTP/influxdb.uptime)<10m |Info |
Manual close: Yes | |
InfluxDB: Too many tasks failure runs | The number of failed runs completed across all tasks is too high. |
min(/InfluxDB by HTTP/influxdb.task_executor_complete.failed.rate,5m)>{$INFLUXDB.TASK.RUN.FAIL.MAX.WARN} |Warning |
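The trigger above uses min(...,5m), so it fires only when the failure rate stays above {$INFLUXDB.TASK.RUN.FAIL.MAX.WARN} for the whole evaluation window. A small sketch of that semantics over hypothetical samples (the threshold and sample values are made up):

```python
def trigger_fires(samples, threshold):
    """Zabbix min(/.../rate,5m) > threshold: the trigger fires only if
    every sample in the window is above the threshold."""
    return min(samples) > threshold

FAIL_MAX_WARN = 2  # plays the role of {$INFLUXDB.TASK.RUN.FAIL.MAX.WARN}
print(trigger_fires([3.0, 4.5, 3.2], FAIL_MAX_WARN))  # True
print(trigger_fires([3.0, 0.0, 3.2], FAIL_MAX_WARN))  # False: one quiet sample
```

Using min() rather than last() avoids flapping on a single spike.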
Name | Description | Type | Key and additional info |
---|---|---|---|
Organizations discovery | Discovery of organizations metrics. |
HTTP agent | influxdb.orgs.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
InfluxDB: [{#ORG_NAME}] Query requests bytes, success | Count of bytes received with status 200 per second. |
Dependent item | influxdb.org.query_request_bytes.success.rate["{#ORG_NAME}"] Preprocessing
|
InfluxDB: [{#ORG_NAME}] Query requests bytes, failed | Count of bytes received with status not 200 per second. |
Dependent item | influxdb.org.query_request_bytes.failed.rate["{#ORG_NAME}"] Preprocessing
|
InfluxDB: [{#ORG_NAME}] Query requests, failed | Total number of query requests with status not 200 per second. |
Dependent item | influxdb.org.query_request.failed.rate["{#ORG_NAME}"] Preprocessing
|
InfluxDB: [{#ORG_NAME}] Query requests, success | Total number of query requests with status 200 per second. |
Dependent item | influxdb.org.query_request.success.rate["{#ORG_NAME}"] Preprocessing
|
InfluxDB: [{#ORG_NAME}] Query response bytes, success | Count of bytes returned with status 200 per second. |
Dependent item | influxdb.org.http_query_response_bytes.success.rate["{#ORG_NAME}"] Preprocessing
|
InfluxDB: [{#ORG_NAME}] Query response bytes, failed | Count of bytes returned with status not 200 per second. |
Dependent item | influxdb.org.http_query_response_bytes.failed.rate["{#ORG_NAME}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
InfluxDB: [{#ORG_NAME}]: Too many requests failures | Too many query requests failed. |
min(/InfluxDB by HTTP/influxdb.org.query_request.failed.rate["{#ORG_NAME}"],5m)>{$INFLUXDB.REQ.FAIL.MAX.WARN} |Warning |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
Official JMX Template for Apache Ignite computing platform. This template is based on the original template developed by Igor Akkuratov, Senior Engineer at GridGain Systems and Apache Ignite Contributor.
Zabbix version: 6.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
This template works with standalone and cluster instances. Metrics are collected by JMX. All metrics are discoverable.
-DIGNITE_MBEAN_APPEND_CLASS_LOADER_ID=false
This JVM option excludes one level with the Classloader name from MBean names. You can configure which Cache and Data Region metrics you want to collect using the official guide.
Name | Description | Default |
---|---|---|
{$IGNITE.PASSWORD} | <secret> |
|
{$IGNITE.USER} | zabbix |
|
{$IGNITE.LLD.FILTER.THREAD.POOL.MATCHES} | Filter of discoverable thread pools. |
.* |
{$IGNITE.LLD.FILTER.THREAD.POOL.NOT_MATCHES} | Filter to exclude discovered thread pools. |
Macro too long. Please see the template. |
{$IGNITE.LLD.FILTER.DATA.REGION.MATCHES} | Filter of discoverable data regions. |
.* |
{$IGNITE.LLD.FILTER.DATA.REGION.NOT_MATCHES} | Filter to exclude discovered data regions. |
^(sysMemPlc|TxLog)$ |
{$IGNITE.LLD.FILTER.CACHE.MATCHES} | Filter of discoverable cache groups. |
.* |
{$IGNITE.LLD.FILTER.CACHE.NOT_MATCHES} | Filter to exclude discovered cache groups. |
CHANGE_IF_NEEDED |
{$IGNITE.THREAD.QUEUE.MAX.WARN} | Threshold for thread pool queue size. Can be used with thread pool name as context. |
1000 |
{$IGNITE.PME.DURATION.MAX.WARN} | The maximum PME duration in ms for warning trigger expression. |
10000 |
{$IGNITE.PME.DURATION.MAX.HIGH} | The maximum PME duration in ms for high trigger expression. |
60000 |
{$IGNITE.THREADS.COUNT.MAX.WARN} | The maximum number of running threads for trigger expression. |
1000 |
{$IGNITE.JOBS.QUEUE.MAX.WARN} | The maximum number of queued jobs for trigger expression. |
10 |
{$IGNITE.CHECKPOINT.PUSED.MAX.HIGH} | The maximum percent of checkpoint buffer utilization for high trigger expression. |
80 |
{$IGNITE.CHECKPOINT.PUSED.MAX.WARN} | The maximum percent of checkpoint buffer utilization for warning trigger expression. |
66 |
{$IGNITE.DATA.REGION.PUSED.MAX.HIGH} | The maximum percent of data region utilization for high trigger expression. |
90 |
{$IGNITE.DATA.REGION.PUSED.MAX.WARN} | The maximum percent of data region utilization for warning trigger expression. |
80 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Ignite kernal metrics | JMX agent | jmx.discovery[beans,"org.apache:group=Kernal,name=IgniteKernal,*"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Ignite [{#JMXIGNITEINSTANCENAME}]: Uptime | Uptime of Ignite instance. |
JMX agent | jmx["{#JMXOBJ}",UpTime] Preprocessing
|
Ignite [{#JMXIGNITEINSTANCENAME}]: Version | Version of Ignite instance. |
JMX agent | jmx["{#JMXOBJ}",FullVersion] Preprocessing
|
Ignite [{#JMXIGNITEINSTANCENAME}]: Local node ID | Unique identifier for this node within the grid. |
JMX agent | jmx["{#JMXOBJ}",LocalNodeId] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Ignite [{#JMXIGNITEINSTANCENAME}]: has been restarted | Uptime is less than 10 minutes. |
last(/Ignite by JMX/jmx["{#JMXOBJ}",UpTime])<10m |Info |
Manual close: Yes | |
Ignite [{#JMXIGNITEINSTANCENAME}]: Failed to fetch info data | Zabbix has not received data for items for the last 10 minutes. |
nodata(/Ignite by JMX/jmx["{#JMXOBJ}",UpTime],10m)=1 |Warning |
Manual close: Yes | |
Ignite [{#JMXIGNITEINSTANCENAME}]: Version has changed | Ignite [{#JMXIGNITEINSTANCENAME}] version has changed. Acknowledge to close the problem manually. |
last(/Ignite by JMX/jmx["{#JMXOBJ}",FullVersion],#1)<>last(/Ignite by JMX/jmx["{#JMXOBJ}",FullVersion],#2) and length(last(/Ignite by JMX/jmx["{#JMXOBJ}",FullVersion]))>0 |Info |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Cluster metrics | JMX agent | jmx.discovery[beans,"org.apache:group=Kernal,name=ClusterMetricsMXBeanImpl,*"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Ignite [{#JMXIGNITEINSTANCENAME}]: Nodes, Baseline | Total baseline nodes that are registered in the baseline topology. |
JMX agent | jmx["{#JMXOBJ}",TotalBaselineNodes] Preprocessing
|
Ignite [{#JMXIGNITEINSTANCENAME}]: Nodes, Active baseline | The number of nodes that are currently active in the baseline topology. |
JMX agent | jmx["{#JMXOBJ}",ActiveBaselineNodes] Preprocessing
|
Ignite [{#JMXIGNITEINSTANCENAME}]: Nodes, Client | The number of client nodes in the cluster. |
JMX agent | jmx["{#JMXOBJ}",TotalClientNodes] Preprocessing
|
Ignite [{#JMXIGNITEINSTANCENAME}]: Nodes, total | Total number of nodes. |
JMX agent | jmx["{#JMXOBJ}",TotalNodes] Preprocessing
|
Ignite [{#JMXIGNITEINSTANCENAME}]: Nodes, Server | The number of server nodes in the cluster. |
JMX agent | jmx["{#JMXOBJ}",TotalServerNodes] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Ignite [{#JMXIGNITEINSTANCENAME}]: Server node left the topology | One or more server nodes have left the topology. Acknowledge to close the problem manually. |
change(/Ignite by JMX/jmx["{#JMXOBJ}",TotalServerNodes])<0 |Warning |
Manual close: Yes | |
Ignite [{#JMXIGNITEINSTANCENAME}]: Server node added to the topology | One or more server nodes have been added to the topology. Acknowledge to close the problem manually. |
change(/Ignite by JMX/jmx["{#JMXOBJ}",TotalServerNodes])>0 |Info |
Manual close: Yes | |
Ignite [{#JMXIGNITEINSTANCENAME}]: There are nodes not in the topology | The number of server nodes exceeds the number of baseline nodes; one or more nodes are not in the topology. Acknowledge to close the problem manually. |
last(/Ignite by JMX/jmx["{#JMXOBJ}",TotalServerNodes])>last(/Ignite by JMX/jmx["{#JMXOBJ}",TotalBaselineNodes]) |Info |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Local node metrics | JMX agent | jmx.discovery[beans,"org.apache:group=Kernal,name=ClusterLocalNodeMetricsMXBeanImpl,*"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Ignite [{#JMXIGNITEINSTANCENAME}]: Jobs cancelled, current | Number of cancelled jobs that are still running. |
JMX agent | jmx["{#JMXOBJ}",CurrentCancelledJobs] |
Ignite [{#JMXIGNITEINSTANCENAME}]: Jobs rejected, current | Number of jobs rejected during the most recent collision resolution operation. |
JMX agent | jmx["{#JMXOBJ}",CurrentRejectedJobs] |
Ignite [{#JMXIGNITEINSTANCENAME}]: Jobs waiting, current | Number of queued jobs currently waiting to be executed. |
JMX agent | jmx["{#JMXOBJ}",CurrentWaitingJobs] |
Ignite [{#JMXIGNITEINSTANCENAME}]: Jobs active, current | Number of currently active jobs concurrently executing on the node. |
JMX agent | jmx["{#JMXOBJ}",CurrentActiveJobs] |
Ignite [{#JMXIGNITEINSTANCENAME}]: Jobs executed, rate | Total number of jobs handled by the node per second. |
JMX agent | jmx["{#JMXOBJ}",TotalExecutedJobs] Preprocessing
|
Ignite [{#JMXIGNITEINSTANCENAME}]: Jobs cancelled, rate | Total number of jobs cancelled by the node per second. |
JMX agent | jmx["{#JMXOBJ}",TotalCancelledJobs] Preprocessing
|
Ignite [{#JMXIGNITEINSTANCENAME}]: Jobs rejects, rate | Total number of jobs rejected by this node during collision resolution operations per second. |
JMX agent | jmx["{#JMXOBJ}",TotalRejectedJobs] Preprocessing
|
Ignite [{#JMXIGNITEINSTANCENAME}]: PME duration, current | Current PME duration in milliseconds. |
JMX agent | jmx["{#JMXOBJ}",CurrentPmeDuration] |
Ignite [{#JMXIGNITEINSTANCENAME}]: Threads count, current | Current number of live threads. |
JMX agent | jmx["{#JMXOBJ}",CurrentThreadCount] |
Ignite [{#JMXIGNITEINSTANCENAME}]: Heap memory used | Current heap size that is used for object allocation. |
JMX agent | jmx["{#JMXOBJ}",HeapMemoryUsed] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Ignite [{#JMXIGNITEINSTANCENAME}]: Number of queued jobs is too high | Number of queued jobs is over {$IGNITE.JOBS.QUEUE.MAX.WARN}. |
min(/Ignite by JMX/jmx["{#JMXOBJ}",CurrentWaitingJobs],15m) > {$IGNITE.JOBS.QUEUE.MAX.WARN} |Warning |
||
Ignite [{#JMXIGNITEINSTANCENAME}]: PME duration is too long | PME duration is over {$IGNITE.PME.DURATION.MAX.WARN}ms. |
min(/Ignite by JMX/jmx["{#JMXOBJ}",CurrentPmeDuration],5m) > {$IGNITE.PME.DURATION.MAX.WARN} |Warning |
Depends on:
|
|
Ignite [{#JMXIGNITEINSTANCENAME}]: PME duration is too long | PME duration is over {$IGNITE.PME.DURATION.MAX.HIGH}ms. Looks like PME is hung. |
min(/Ignite by JMX/jmx["{#JMXOBJ}",CurrentPmeDuration],5m) > {$IGNITE.PME.DURATION.MAX.HIGH} |High |
||
Ignite [{#JMXIGNITEINSTANCENAME}]: Number of running threads is too high | Number of running threads is over {$IGNITE.THREADS.COUNT.MAX.WARN}. |
min(/Ignite by JMX/jmx["{#JMXOBJ}",CurrentThreadCount],15m) > {$IGNITE.THREADS.COUNT.MAX.WARN} |Warning |
Depends on:
|
Name | Description | Type | Key and additional info |
---|---|---|---|
TCP discovery SPI | JMX agent | jmx.discovery[beans,"org.apache:group=SPIs,name=TcpDiscoverySpi,*"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Ignite [{#JMXIGNITEINSTANCENAME}]: Coordinator | Current coordinator UUID. |
JMX agent | jmx["{#JMXOBJ}",Coordinator] Preprocessing
|
Ignite [{#JMXIGNITEINSTANCENAME}]: Nodes left | Nodes left count. |
JMX agent | jmx["{#JMXOBJ}",NodesLeft] |
Ignite [{#JMXIGNITEINSTANCENAME}]: Nodes joined | Nodes join count. |
JMX agent | jmx["{#JMXOBJ}",NodesJoined] |
Ignite [{#JMXIGNITEINSTANCENAME}]: Nodes failed | Nodes failed count. |
JMX agent | jmx["{#JMXOBJ}",NodesFailed] |
Ignite [{#JMXIGNITEINSTANCENAME}]: Discovery message worker queue | Message worker queue current size. |
JMX agent | jmx["{#JMXOBJ}",MessageWorkerQueueSize] |
Ignite [{#JMXIGNITEINSTANCENAME}]: Discovery reconnect, rate | Number of times the node tries to (re)establish a connection to another node per second. |
JMX agent | jmx["{#JMXOBJ}",ReconnectCount] Preprocessing
|
Ignite [{#JMXIGNITEINSTANCENAME}]: Discovery messages processed, rate | The number of messages processed per second. |
JMX agent | jmx["{#JMXOBJ}",TotalProcessedMessages] Preprocessing
|
Ignite [{#JMXIGNITEINSTANCENAME}]: Discovery messages received, rate | The number of messages received per second. |
JMX agent | jmx["{#JMXOBJ}",TotalReceivedMessages] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Ignite [{#JMXIGNITEINSTANCENAME}]: Coordinator has changed | Ignite [{#JMXIGNITEINSTANCENAME}] coordinator has changed. Acknowledge to close the problem manually. |
last(/Ignite by JMX/jmx["{#JMXOBJ}",Coordinator],#1)<>last(/Ignite by JMX/jmx["{#JMXOBJ}",Coordinator],#2) and length(last(/Ignite by JMX/jmx["{#JMXOBJ}",Coordinator]))>0 |Warning |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
TCP Communication SPI metrics | JMX agent | jmx.discovery[beans,"org.apache:group=SPIs,name=TcpCommunicationSpi,*"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Ignite [{#JMXIGNITEINSTANCENAME}]: Communication outbound messages queue | Outbound messages queue size. |
JMX agent | jmx["{#JMXOBJ}",OutboundMessagesQueueSize] |
Ignite [{#JMXIGNITEINSTANCENAME}]: Communication messages received, rate | The number of messages received per second. |
JMX agent | jmx["{#JMXOBJ}",ReceivedMessagesCount] Preprocessing
|
Ignite [{#JMXIGNITEINSTANCENAME}]: Communication messages sent, rate | The number of messages sent per second. |
JMX agent | jmx["{#JMXOBJ}",SentMessagesCount] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Transaction metrics | JMX agent | jmx.discovery[beans,"org.apache:group=TransactionMetrics,name=TransactionMetricsMxBeanImpl,*"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Ignite [{#JMXIGNITEINSTANCENAME}]: Locked keys | The number of keys locked on the node. |
JMX agent | jmx["{#JMXOBJ}",LockedKeysNumber] |
Ignite [{#JMXIGNITEINSTANCENAME}]: Transactions owner, current | The number of active transactions for which this node is the initiator. |
JMX agent | jmx["{#JMXOBJ}",OwnerTransactionsNumber] |
Ignite [{#JMXIGNITEINSTANCENAME}]: Transactions holding lock, current | The number of active transactions holding at least one key lock. |
JMX agent | jmx["{#JMXOBJ}",TransactionsHoldingLockNumber] |
Ignite [{#JMXIGNITEINSTANCENAME}]: Transactions rolledback, rate | The number of transactions rolled back per second. |
JMX agent | jmx["{#JMXOBJ}",TransactionsRolledBackNumber] |
Ignite [{#JMXIGNITEINSTANCENAME}]: Transactions committed, rate | The number of transactions committed per second. |
JMX agent | jmx["{#JMXOBJ}",TransactionsCommittedNumber] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Cache metrics | JMX agent | jmx.discovery[beans,"org.apache:name=\"org.apache.ignite.internal.processors.cache.CacheLocalMetricsMXBeanImpl\",*"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Cache group [{#JMXGROUP}]: Cache gets, rate | The number of gets to the cache per second. |
JMX agent | jmx["{#JMXOBJ}",CacheGets] Preprocessing
|
Cache group [{#JMXGROUP}]: Cache puts, rate | The number of puts to the cache per second. |
JMX agent | jmx["{#JMXOBJ}",CachePuts] Preprocessing
|
Cache group [{#JMXGROUP}]: Cache removals, rate | The number of removals from the cache per second. |
JMX agent | jmx["{#JMXOBJ}",CacheRemovals] Preprocessing
|
Cache group [{#JMXGROUP}]: Cache hits, pct | Percentage of successful hits. |
JMX agent | jmx["{#JMXOBJ}",CacheHitPercentage] |
Cache group [{#JMXGROUP}]: Cache misses, pct | Percentage of accesses that failed to find anything. |
JMX agent | jmx["{#JMXOBJ}",CacheMissPercentage] |
Cache group [{#JMXGROUP}]: Cache transaction commits, rate | The number of transaction commits per second. |
JMX agent | jmx["{#JMXOBJ}",CacheTxCommits] Preprocessing
|
Cache group [{#JMXGROUP}]: Cache transaction rollbacks, rate | The number of transaction rollbacks per second. |
JMX agent | jmx["{#JMXOBJ}",CacheTxRollbacks] Preprocessing
|
Cache group [{#JMXGROUP}]: Cache size | The number of non-null values in the cache as a long value. |
JMX agent | jmx["{#JMXOBJ}",CacheSize] |
Cache group [{#JMXGROUP}]: Cache heap entries | The number of entries in heap memory. |
JMX agent | jmx["{#JMXOBJ}",HeapEntriesCount] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Cache group [{#JMXGROUP}]: There are no successful transactions for the cache for 5m | min(/Ignite by JMX/jmx["{#JMXOBJ}",CacheTxRollbacks],5m)>0 and max(/Ignite by JMX/jmx["{#JMXOBJ}",CacheTxCommits],5m)=0 |Average |
|||
Cache group [{#JMXGROUP}]: Successful transactions less than rollbacks for 5m | min(/Ignite by JMX/jmx["{#JMXOBJ}",CacheTxRollbacks],5m) > max(/Ignite by JMX/jmx["{#JMXOBJ}",CacheTxCommits],5m) |Warning |
Depends on:
|
||
Cache group [{#JMXGROUP}]: All entries are in heap | All entries are in heap. You may be using eager queries, which can cause out-of-memory exceptions for big caches. Acknowledge to close the problem manually. |
last(/Ignite by JMX/jmx["{#JMXOBJ}",CacheSize])=last(/Ignite by JMX/jmx["{#JMXOBJ}",HeapEntriesCount]) |Info |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Data region metrics | JMX agent | jmx.discovery[beans,"org.apache:group=DataRegionMetrics,*"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Data region {#JMXNAME}: Allocation, rate | Allocation rate (pages per second) averaged across rateTimeInterval. |
JMX agent | jmx["{#JMXOBJ}",AllocationRate] |
Data region {#JMXNAME}: Allocated, bytes | Total size of memory allocated in bytes. |
JMX agent | jmx["{#JMXOBJ}",TotalAllocatedSize] |
Data region {#JMXNAME}: Dirty pages | Number of pages in memory not yet synchronized with persistent storage. |
JMX agent | jmx["{#JMXOBJ}",DirtyPages] |
Data region {#JMXNAME}: Eviction, rate | Eviction rate (pages per second). |
JMX agent | jmx["{#JMXOBJ}",EvictionRate] |
Data region {#JMXNAME}: Size, max | Maximum memory region size defined by its data region. |
JMX agent | jmx["{#JMXOBJ}",MaxSize] |
Data region {#JMXNAME}: Offheap size | Offheap size in bytes. |
JMX agent | jmx["{#JMXOBJ}",OffHeapSize] |
Data region {#JMXNAME}: Offheap used size | Total used offheap size in bytes. |
JMX agent | jmx["{#JMXOBJ}",OffheapUsedSize] |
Data region {#JMXNAME}: Pages fill factor | The percentage of the used space. |
JMX agent | jmx["{#JMXOBJ}",PagesFillFactor] |
Data region {#JMXNAME}: Pages replace, rate | Rate at which pages in memory are replaced with pages from persistent storage (pages per second). |
JMX agent | jmx["{#JMXOBJ}",PagesReplaceRate] |
Data region {#JMXNAME}: Used checkpoint buffer size | Used checkpoint buffer size in bytes. |
JMX agent | jmx["{#JMXOBJ}",UsedCheckpointBufferSize] |
Data region {#JMXNAME}: Checkpoint buffer size | Total size in bytes for checkpoint buffer. |
JMX agent | jmx["{#JMXOBJ}",CheckpointBufferSize] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Data region {#JMXNAME}: Node started to evict pages | You are storing more data than the region can accommodate. Data has started moving to disk, which can slow down requests. Acknowledge to close the problem manually. |
min(/Ignite by JMX/jmx["{#JMXOBJ}",EvictionRate],5m)>0 |Info |
Manual close: Yes | |
Data region {#JMXNAME}: Data region utilization is too high | Data region utilization is high. Increase the data region size or delete some data. |
min(/Ignite by JMX/jmx["{#JMXOBJ}",OffheapUsedSize],5m)/last(/Ignite by JMX/jmx["{#JMXOBJ}",OffHeapSize])*100>{$IGNITE.DATA.REGION.PUSED.MAX.WARN} |Warning |
Depends on:
|
|
Data region {#JMXNAME}: Data region utilization is too high | Data region utilization is high. Increase the data region size or delete some data. |
min(/Ignite by JMX/jmx["{#JMXOBJ}",OffheapUsedSize],5m)/last(/Ignite by JMX/jmx["{#JMXOBJ}",OffHeapSize])*100>{$IGNITE.DATA.REGION.PUSED.MAX.HIGH} |High |
||
Data region {#JMXNAME}: Pages replace rate more than 0 | There is more data than DataRegionMaxSize. The cluster has started to replace pages in memory. Page replacement can slow down operations. |
min(/Ignite by JMX/jmx["{#JMXOBJ}",PagesReplaceRate],5m)>0 |Warning |
||
Data region {#JMXNAME}: Checkpoint buffer utilization is too high | Checkpoint buffer utilization is high. Threads will be throttled to avoid buffer overflow. It can be caused by high disk utilization. |
min(/Ignite by JMX/jmx["{#JMXOBJ}",UsedCheckpointBufferSize],5m)/last(/Ignite by JMX/jmx["{#JMXOBJ}",CheckpointBufferSize])*100>{$IGNITE.CHECKPOINT.PUSED.MAX.WARN} |Warning |
Depends on:
|
|
Data region {#JMXNAME}: Checkpoint buffer utilization is too high | Checkpoint buffer utilization is high. Threads will be throttled to avoid buffer overflow. It can be caused by high disk utilization. |
min(/Ignite by JMX/jmx["{#JMXOBJ}",UsedCheckpointBufferSize],5m)/last(/Ignite by JMX/jmx["{#JMXOBJ}",CheckpointBufferSize])*100>{$IGNITE.CHECKPOINT.PUSED.MAX.HIGH} |High |
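Both the data region and checkpoint buffer triggers above follow the same shape: min(used, 5m) / last(total) * 100 compared against the PUSED macros. A sketch of that calculation with illustrative numbers (the samples and totals are made up):

```python
def utilization_pct(used_samples, total):
    """min(used, window) / last(total) * 100, mirroring the trigger expression."""
    return min(used_samples) / total * 100

WARN, HIGH = 80, 90  # {$IGNITE.DATA.REGION.PUSED.MAX.WARN} / {...MAX.HIGH} defaults
pct = utilization_pct([850, 870, 860], 1000)
print(round(pct, 1))           # 85.0
print(pct > WARN, pct > HIGH)  # True False -> Warning fires, High does not
```

Because the warning trigger depends on the high trigger, only the more severe problem is shown when both thresholds are crossed.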
Name | Description | Type | Key and additional info |
---|---|---|---|
Cache groups | JMX agent | jmx.discovery[beans,"org.apache:group=\"Cache groups\",*"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Cache group [{#JMXNAME}]: Backups | Count of backups configured for cache group. |
JMX agent | jmx["{#JMXOBJ}",Backups] |
Cache group [{#JMXNAME}]: Partitions | Count of partitions for cache group. |
JMX agent | jmx["{#JMXOBJ}",Partitions] |
Cache group [{#JMXNAME}]: Caches | List of caches. |
JMX agent | jmx["{#JMXOBJ}",Caches] Preprocessing
|
Cache group [{#JMXNAME}]: Local node partitions, moving | Count of partitions with state MOVING for this cache group located on this node. |
JMX agent | jmx["{#JMXOBJ}",LocalNodeMovingPartitionsCount] |
Cache group [{#JMXNAME}]: Local node partitions, renting | Count of partitions with state RENTING for this cache group located on this node. |
JMX agent | jmx["{#JMXOBJ}",LocalNodeRentingPartitionsCount] |
Cache group [{#JMXNAME}]: Local node entries, renting | Count of entries remaining to be evicted in RENTING partitions located on this node for this cache group. |
JMX agent | jmx["{#JMXOBJ}",LocalNodeRentingEntriesCount] |
Cache group [{#JMXNAME}]: Local node partitions, owning | Count of partitions with state OWNING for this cache group located on this node. |
JMX agent | jmx["{#JMXOBJ}",LocalNodeOwningPartitionsCount] |
Cache group [{#JMXNAME}]: Partition copies, min | Minimum number of partition copies for all partitions of this cache group. |
JMX agent | jmx["{#JMXOBJ}",MinimumNumberOfPartitionCopies] |
Cache group [{#JMXNAME}]: Partition copies, max | Maximum number of partition copies for all partitions of this cache group. |
JMX agent | jmx["{#JMXOBJ}",MaximumNumberOfPartitionCopies] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Cache group [{#JMXNAME}]: One or more backups are unavailable | min(/Ignite by JMX/jmx["{#JMXOBJ}",Backups],5m)>=max(/Ignite by JMX/jmx["{#JMXOBJ}",MinimumNumberOfPartitionCopies],5m) |Warning |
|||
Cache group [{#JMXNAME}]: List of caches has changed | List of caches has changed. Significant changes have occurred in the cluster. Acknowledge to close the problem manually. |
last(/Ignite by JMX/jmx["{#JMXOBJ}",Caches],#1)<>last(/Ignite by JMX/jmx["{#JMXOBJ}",Caches],#2) and length(last(/Ignite by JMX/jmx["{#JMXOBJ}",Caches]))>0 |Info |
Manual close: Yes | |
Cache group [{#JMXNAME}]: Rebalance in progress | Acknowledge to close the problem manually. |
max(/Ignite by JMX/jmx["{#JMXOBJ}",LocalNodeMovingPartitionsCount],30m)>0 |Info |
Manual close: Yes | |
Cache group [{#JMXNAME}]: There is no copy for partitions | max(/Ignite by JMX/jmx["{#JMXOBJ}",MinimumNumberOfPartitionCopies],30m)=0 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Thread pool metrics | JMX agent | jmx.discovery[beans,"org.apache:group=\"Thread Pools\",*"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Thread pool [{#JMXNAME}]: Queue size | Current size of the execution queue. |
JMX agent | jmx["{#JMXOBJ}",QueueSize] |
Thread pool [{#JMXNAME}]: Pool size | Current number of threads in the pool. |
JMX agent | jmx["{#JMXOBJ}",PoolSize] |
Thread pool [{#JMXNAME}]: Pool size, max | The maximum allowed number of threads. |
JMX agent | jmx["{#JMXOBJ}",MaximumPoolSize] |
Thread pool [{#JMXNAME}]: Pool size, core | The core number of threads. |
JMX agent | jmx["{#JMXOBJ}",CorePoolSize] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Thread pool [{#JMXNAME}]: Too many messages in queue | The number of messages in the queue is greater than {$IGNITE.THREAD.QUEUE.MAX.WARN:"{#JMXNAME}"}. |
min(/Ignite by JMX/jmx["{#JMXOBJ}",QueueSize],5m) > {$IGNITE.THREAD.QUEUE.MAX.WARN:"{#JMXNAME}"} |Average |
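Several counter items in this template (for example, TotalExecutedJobs or TotalProcessedMessages) are turned into per-second rates by Zabbix "Change per second" preprocessing. A minimal sketch of that calculation, with illustrative sample values:

```python
def change_per_second(prev_value, prev_clock, value, clock):
    """Reproduce Zabbix 'Change per second' preprocessing:
    (value - prev_value) / (clock - prev_clock)."""
    if clock <= prev_clock:
        raise ValueError("timestamps must be increasing")
    return (value - prev_value) / (clock - prev_clock)

# Two polls of a monotonically growing job counter (illustrative numbers):
rate = change_per_second(prev_value=1200, prev_clock=100, value=1800, clock=160)
print(rate)  # 10.0 jobs per second
```

Note that this only holds while the counter grows monotonically; Zabbix discards the sample if the counter resets (e.g. after a node restart).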
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
Official JMX Template for GridGain In-Memory Computing Platform. This template is based on the original template developed by Igor Akkuratov, Senior Engineer at GridGain Systems and GridGain In-Memory Computing Platform Contributor.
Zabbix version: 6.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
This template works with standalone and cluster instances. Metrics are collected by JMX. All metrics are discoverable.
-DIGNITE_MBEAN_APPEND_CLASS_LOADER_ID=false
Setting this option will exclude one level with the Classloader name. You can configure the Cache and Data Region metrics you want by following the official guide.
Name | Description | Default |
---|---|---|
{$GRIDGAIN.PASSWORD} | <secret> |
|
{$GRIDGAIN.USER} | zabbix |
|
{$GRIDGAIN.LLD.FILTER.THREAD.POOL.MATCHES} | Filter of discoverable thread pools. |
.* |
{$GRIDGAIN.LLD.FILTER.THREAD.POOL.NOT_MATCHES} | Filter to exclude discovered thread pools. |
Macro too long. Please see the template. |
{$GRIDGAIN.LLD.FILTER.DATA.REGION.MATCHES} | Filter of discoverable data regions. |
.* |
{$GRIDGAIN.LLD.FILTER.DATA.REGION.NOT_MATCHES} | Filter to exclude discovered data regions. |
^(sysMemPlc|TxLog)$ |
{$GRIDGAIN.LLD.FILTER.CACHE.MATCHES} | Filter of discoverable cache groups. |
.* |
{$GRIDGAIN.LLD.FILTER.CACHE.NOT_MATCHES} | Filter to exclude discovered cache groups. |
CHANGE_IF_NEEDED |
{$GRIDGAIN.THREAD.QUEUE.MAX.WARN} | Threshold for thread pool queue size. Can be used with thread pool name as context. |
1000 |
{$GRIDGAIN.PME.DURATION.MAX.WARN} | The maximum PME duration in ms for warning trigger expression. |
10000 |
{$GRIDGAIN.PME.DURATION.MAX.HIGH} | The maximum PME duration in ms for high trigger expression. |
60000 |
{$GRIDGAIN.THREADS.COUNT.MAX.WARN} | The maximum number of running threads for trigger expression. |
1000 |
{$GRIDGAIN.JOBS.QUEUE.MAX.WARN} | The maximum number of queued jobs for trigger expression. |
10 |
{$GRIDGAIN.CHECKPOINT.PUSED.MAX.HIGH} | The maximum percent of checkpoint buffer utilization for high trigger expression. |
80 |
{$GRIDGAIN.CHECKPOINT.PUSED.MAX.WARN} | The maximum percent of checkpoint buffer utilization for warning trigger expression. |
66 |
{$GRIDGAIN.DATA.REGION.PUSED.MAX.HIGH} | The maximum percent of data region utilization for high trigger expression. |
90 |
{$GRIDGAIN.DATA.REGION.PUSED.MAX.WARN} | The maximum percent of data region utilization for warning trigger expression. |
80 |
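The -DIGNITE_MBEAN_APPEND_CLASS_LOADER_ID=false option mentioned above is a JVM argument for the node process. A minimal shell sketch of wiring it in (the start script and config path in the trailing comment are illustrative, not part of any official launcher):

```shell
# Append the option to the JVM arguments used to start the GridGain node.
JVM_OPTS="${JVM_OPTS:-} -DIGNITE_MBEAN_APPEND_CLASS_LOADER_ID=false"
export JVM_OPTS
echo "$JVM_OPTS"
# ./bin/ignite.sh config/default-config.xml  # start script path varies by installation
```
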
Name | Description | Type | Key and additional info |
---|---|---|---|
GridGain kernal metrics | JMX agent | jmx.discovery[beans,"org.apache:group=Kernal,name=IgniteKernal,*"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
GridGain [{#JMXIGNITEINSTANCENAME}]: Uptime | Uptime of GridGain instance. |
JMX agent | jmx["{#JMXOBJ}",UpTime] Preprocessing
|
GridGain [{#JMXIGNITEINSTANCENAME}]: Version | Version of GridGain instance. |
JMX agent | jmx["{#JMXOBJ}",FullVersion] Preprocessing
|
GridGain [{#JMXIGNITEINSTANCENAME}]: Local node ID | Unique identifier for this node within grid. |
JMX agent | jmx["{#JMXOBJ}",LocalNodeId] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
GridGain [{#JMXIGNITEINSTANCENAME}]: has been restarted | Uptime is less than 10 minutes. |
last(/GridGain by JMX/jmx["{#JMXOBJ}",UpTime])<10m |Info |
Manual close: Yes | |
GridGain [{#JMXIGNITEINSTANCENAME}]: Failed to fetch info data | Zabbix has not received data for items for the last 10 minutes. |
nodata(/GridGain by JMX/jmx["{#JMXOBJ}",UpTime],10m)=1 |Warning |
Manual close: Yes | |
GridGain [{#JMXIGNITEINSTANCENAME}]: Version has changed | The GridGain [{#JMXIGNITEINSTANCENAME}] version has changed. Acknowledge to close the problem manually. |
last(/GridGain by JMX/jmx["{#JMXOBJ}",FullVersion],#1)<>last(/GridGain by JMX/jmx["{#JMXOBJ}",FullVersion],#2) and length(last(/GridGain by JMX/jmx["{#JMXOBJ}",FullVersion]))>0 |Info |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Cluster metrics | JMX agent | jmx.discovery[beans,"org.apache:group=Kernal,name=ClusterMetricsMXBeanImpl,*"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes, Baseline | Total baseline nodes that are registered in the baseline topology. |
JMX agent | jmx["{#JMXOBJ}",TotalBaselineNodes] Preprocessing
|
GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes, Active baseline | The number of nodes that are currently active in the baseline topology. |
JMX agent | jmx["{#JMXOBJ}",ActiveBaselineNodes] Preprocessing
|
GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes, Client | The number of client nodes in the cluster. |
JMX agent | jmx["{#JMXOBJ}",TotalClientNodes] Preprocessing
|
GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes, total | Total number of nodes. |
JMX agent | jmx["{#JMXOBJ}",TotalNodes] Preprocessing
|
GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes, Server | The number of server nodes in the cluster. |
JMX agent | jmx["{#JMXOBJ}",TotalServerNodes] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
GridGain [{#JMXIGNITEINSTANCENAME}]: Server node left the topology | One or more server nodes have left the topology. Acknowledge to close the problem manually. |
change(/GridGain by JMX/jmx["{#JMXOBJ}",TotalServerNodes])<0 |Warning |
Manual close: Yes | |
GridGain [{#JMXIGNITEINSTANCENAME}]: Server node added to the topology | One or more server nodes have been added to the topology. Acknowledge to close the problem manually. |
change(/GridGain by JMX/jmx["{#JMXOBJ}",TotalServerNodes])>0 |Info |
Manual close: Yes | |
GridGain [{#JMXIGNITEINSTANCENAME}]: There are nodes not in the topology | One or more server nodes are not in the baseline topology. Acknowledge to close the problem manually. |
last(/GridGain by JMX/jmx["{#JMXOBJ}",TotalServerNodes])>last(/GridGain by JMX/jmx["{#JMXOBJ}",TotalBaselineNodes]) |Info |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Local node metrics | JMX agent | jmx.discovery[beans,"org.apache:group=Kernal,name=ClusterLocalNodeMetricsMXBeanImpl,*"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs cancelled, current | Number of cancelled jobs that are still running. |
JMX agent | jmx["{#JMXOBJ}",CurrentCancelledJobs] |
GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs rejected, current | Number of jobs rejected during the most recent collision resolution operation. |
JMX agent | jmx["{#JMXOBJ}",CurrentRejectedJobs] |
GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs waiting, current | Number of queued jobs currently waiting to be executed. |
JMX agent | jmx["{#JMXOBJ}",CurrentWaitingJobs] |
GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs active, current | Number of currently active jobs concurrently executing on the node. |
JMX agent | jmx["{#JMXOBJ}",CurrentActiveJobs] |
GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs executed, rate | Total number of jobs handled by the node per second. |
JMX agent | jmx["{#JMXOBJ}",TotalExecutedJobs] Preprocessing
|
GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs cancelled, rate | Total number of jobs cancelled by the node per second. |
JMX agent | jmx["{#JMXOBJ}",TotalCancelledJobs] Preprocessing
|
GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs rejected, rate | Number of jobs rejected by this node per second during collision resolution operations since node startup. |
JMX agent | jmx["{#JMXOBJ}",TotalRejectedJobs] Preprocessing
|
GridGain [{#JMXIGNITEINSTANCENAME}]: PME duration, current | Current PME duration in milliseconds. |
JMX agent | jmx["{#JMXOBJ}",CurrentPmeDuration] |
GridGain [{#JMXIGNITEINSTANCENAME}]: Threads count, current | Current number of live threads. |
JMX agent | jmx["{#JMXOBJ}",CurrentThreadCount] |
GridGain [{#JMXIGNITEINSTANCENAME}]: Heap memory used | Current heap size that is used for object allocation. |
JMX agent | jmx["{#JMXOBJ}",HeapMemoryUsed] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
GridGain [{#JMXIGNITEINSTANCENAME}]: Number of queued jobs is too high | Number of queued jobs is over {$GRIDGAIN.JOBS.QUEUE.MAX.WARN}. |
min(/GridGain by JMX/jmx["{#JMXOBJ}",CurrentWaitingJobs],15m) > {$GRIDGAIN.JOBS.QUEUE.MAX.WARN} |Warning |
||
GridGain [{#JMXIGNITEINSTANCENAME}]: PME duration is too long | PME duration is over {$GRIDGAIN.PME.DURATION.MAX.WARN}ms. |
min(/GridGain by JMX/jmx["{#JMXOBJ}",CurrentPmeDuration],5m) > {$GRIDGAIN.PME.DURATION.MAX.WARN} |Warning |
Depends on:
|
|
GridGain [{#JMXIGNITEINSTANCENAME}]: PME duration is too long | PME duration is over {$GRIDGAIN.PME.DURATION.MAX.HIGH}ms. Looks like PME is hung. |
min(/GridGain by JMX/jmx["{#JMXOBJ}",CurrentPmeDuration],5m) > {$GRIDGAIN.PME.DURATION.MAX.HIGH} |High |
||
GridGain [{#JMXIGNITEINSTANCENAME}]: Number of running threads is too high | Number of running threads is over {$GRIDGAIN.THREADS.COUNT.MAX.WARN}. |
min(/GridGain by JMX/jmx["{#JMXOBJ}",CurrentThreadCount],15m) > {$GRIDGAIN.THREADS.COUNT.MAX.WARN} |Warning |
Depends on:
|
Name | Description | Type | Key and additional info |
---|---|---|---|
TCP discovery SPI | JMX agent | jmx.discovery[beans,"org.apache:group=SPIs,name=TcpDiscoverySpi,*"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
GridGain [{#JMXIGNITEINSTANCENAME}]: Coordinator | Current coordinator UUID. |
JMX agent | jmx["{#JMXOBJ}",Coordinator] Preprocessing
|
GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes left | Nodes left count. |
JMX agent | jmx["{#JMXOBJ}",NodesLeft] |
GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes joined | Nodes join count. |
JMX agent | jmx["{#JMXOBJ}",NodesJoined] |
GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes failed | Nodes failed count. |
JMX agent | jmx["{#JMXOBJ}",NodesFailed] |
GridGain [{#JMXIGNITEINSTANCENAME}]: Discovery message worker queue | Message worker queue current size. |
JMX agent | jmx["{#JMXOBJ}",MessageWorkerQueueSize] |
GridGain [{#JMXIGNITEINSTANCENAME}]: Discovery reconnect, rate | The number of times per second the node tries to (re)establish a connection to another node. |
JMX agent | jmx["{#JMXOBJ}",ReconnectCount] Preprocessing
|
GridGain [{#JMXIGNITEINSTANCENAME}]: Discovery messages processed, rate | The number of messages processed per second. |
JMX agent | jmx["{#JMXOBJ}",TotalProcessedMessages] Preprocessing
|
GridGain [{#JMXIGNITEINSTANCENAME}]: Discovery messages received, rate | The number of messages received per second. |
JMX agent | jmx["{#JMXOBJ}",TotalReceivedMessages] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
GridGain [{#JMXIGNITEINSTANCENAME}]: Coordinator has changed | The GridGain [{#JMXIGNITEINSTANCENAME}] coordinator has changed. Acknowledge to close the problem manually. |
last(/GridGain by JMX/jmx["{#JMXOBJ}",Coordinator],#1)<>last(/GridGain by JMX/jmx["{#JMXOBJ}",Coordinator],#2) and length(last(/GridGain by JMX/jmx["{#JMXOBJ}",Coordinator]))>0 |Warning |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
TCP Communication SPI metrics | JMX agent | jmx.discovery[beans,"org.apache:group=SPIs,name=TcpCommunicationSpi,*"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
GridGain [{#JMXIGNITEINSTANCENAME}]: Communication outbound messages queue | Outbound messages queue size. |
JMX agent | jmx["{#JMXOBJ}",OutboundMessagesQueueSize] |
GridGain [{#JMXIGNITEINSTANCENAME}]: Communication messages received, rate | The number of messages received per second. |
JMX agent | jmx["{#JMXOBJ}",ReceivedMessagesCount] Preprocessing
|
GridGain [{#JMXIGNITEINSTANCENAME}]: Communication messages sent, rate | The number of messages sent per second. |
JMX agent | jmx["{#JMXOBJ}",SentMessagesCount] Preprocessing
|
GridGain [{#JMXIGNITEINSTANCENAME}]: Communication reconnect rate | The number of reconnect attempts used when establishing connections with remote nodes, per second. |
JMX agent | jmx["{#JMXOBJ}",ReconnectCount,maxNumbers] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Transaction metrics | JMX agent | jmx.discovery[beans,"org.apache:group=TransactionMetrics,name=TransactionMetricsMxBeanImpl,*"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
GridGain [{#JMXIGNITEINSTANCENAME}]: Locked keys | The number of keys locked on the node. |
JMX agent | jmx["{#JMXOBJ}",LockedKeysNumber] |
GridGain [{#JMXIGNITEINSTANCENAME}]: Transactions owner, current | The number of active transactions for which this node is the initiator. |
JMX agent | jmx["{#JMXOBJ}",OwnerTransactionsNumber] |
GridGain [{#JMXIGNITEINSTANCENAME}]: Transactions holding lock, current | The number of active transactions holding at least one key lock. |
JMX agent | jmx["{#JMXOBJ}",TransactionsHoldingLockNumber] |
GridGain [{#JMXIGNITEINSTANCENAME}]: Transactions rolledback, rate | The number of transactions rolled back per second. |
JMX agent | jmx["{#JMXOBJ}",TransactionsRolledBackNumber] |
GridGain [{#JMXIGNITEINSTANCENAME}]: Transactions committed, rate | The number of transactions which were committed per second. |
JMX agent | jmx["{#JMXOBJ}",TransactionsCommittedNumber] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Cache metrics | JMX agent | jmx.discovery[beans,"org.apache:name=\"org.apache.gridgain.internal.processors.cache.CacheLocalMetricsMXBeanImpl\",*"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Cache group [{#JMXGROUP}]: Cache gets, rate | The number of gets to the cache per second. |
JMX agent | jmx["{#JMXOBJ}",CacheGets] Preprocessing
|
Cache group [{#JMXGROUP}]: Cache puts, rate | The number of puts to the cache per second. |
JMX agent | jmx["{#JMXOBJ}",CachePuts] Preprocessing
|
Cache group [{#JMXGROUP}]: Cache removals, rate | The number of removals from the cache per second. |
JMX agent | jmx["{#JMXOBJ}",CacheRemovals] Preprocessing
|
Cache group [{#JMXGROUP}]: Cache hits, pct | Percentage of successful hits. |
JMX agent | jmx["{#JMXOBJ}",CacheHitPercentage] |
Cache group [{#JMXGROUP}]: Cache misses, pct | Percentage of accesses that failed to find anything. |
JMX agent | jmx["{#JMXOBJ}",CacheMissPercentage] |
Cache group [{#JMXGROUP}]: Cache transaction commits, rate | The number of transaction commits per second. |
JMX agent | jmx["{#JMXOBJ}",CacheTxCommits] Preprocessing
|
Cache group [{#JMXGROUP}]: Cache transaction rollbacks, rate | The number of transaction rollbacks per second. |
JMX agent | jmx["{#JMXOBJ}",CacheTxRollbacks] Preprocessing
|
Cache group [{#JMXGROUP}]: Cache size | The number of non-null values in the cache as a long value. |
JMX agent | jmx["{#JMXOBJ}",CacheSize] |
Cache group [{#JMXGROUP}]: Cache heap entries | The number of entries in heap memory. |
JMX agent | jmx["{#JMXOBJ}",HeapEntriesCount] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Cache group [{#JMXGROUP}]: There are no success transactions for cache for 5m | min(/GridGain by JMX/jmx["{#JMXOBJ}",CacheTxRollbacks],5m)>0 and max(/GridGain by JMX/jmx["{#JMXOBJ}",CacheTxCommits],5m)=0 |Average |
|||
Cache group [{#JMXGROUP}]: Success transactions less than rollbacks for 5m | min(/GridGain by JMX/jmx["{#JMXOBJ}",CacheTxRollbacks],5m) > max(/GridGain by JMX/jmx["{#JMXOBJ}",CacheTxCommits],5m) |Warning |
Depends on:
|
||
Cache group [{#JMXGROUP}]: All entries are in heap | All entries are in heap. Possibly you are using eager queries; this may cause out-of-memory exceptions for big caches. Acknowledge to close the problem manually. |
last(/GridGain by JMX/jmx["{#JMXOBJ}",CacheSize])=last(/GridGain by JMX/jmx["{#JMXOBJ}",HeapEntriesCount]) |Info |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Data region metrics | JMX agent | jmx.discovery[beans,"org.apache:group=DataRegionMetrics,*"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Data region {#JMXNAME}: Allocation, rate | Allocation rate (pages per second) averaged over rateTimeInterval. |
JMX agent | jmx["{#JMXOBJ}",AllocationRate] |
Data region {#JMXNAME}: Allocated, bytes | Total size of memory allocated in bytes. |
JMX agent | jmx["{#JMXOBJ}",TotalAllocatedSize] |
Data region {#JMXNAME}: Dirty pages | Number of pages in memory not yet synchronized with persistent storage. |
JMX agent | jmx["{#JMXOBJ}",DirtyPages] |
Data region {#JMXNAME}: Eviction, rate | Eviction rate (pages per second). |
JMX agent | jmx["{#JMXOBJ}",EvictionRate] |
Data region {#JMXNAME}: Size, max | Maximum memory region size defined by its data region. |
JMX agent | jmx["{#JMXOBJ}",MaxSize] |
Data region {#JMXNAME}: Offheap size | Offheap size in bytes. |
JMX agent | jmx["{#JMXOBJ}",OffHeapSize] |
Data region {#JMXNAME}: Offheap used size | Total used offheap size in bytes. |
JMX agent | jmx["{#JMXOBJ}",OffheapUsedSize] |
Data region {#JMXNAME}: Pages fill factor | The percentage of the used space. |
JMX agent | jmx["{#JMXOBJ}",PagesFillFactor] |
Data region {#JMXNAME}: Pages replace, rate | Rate at which pages in memory are replaced with pages from persistent storage (pages per second). |
JMX agent | jmx["{#JMXOBJ}",PagesReplaceRate] |
Data region {#JMXNAME}: Used checkpoint buffer size | Used checkpoint buffer size in bytes. |
JMX agent | jmx["{#JMXOBJ}",UsedCheckpointBufferSize] |
Data region {#JMXNAME}: Checkpoint buffer size | Total size in bytes for checkpoint buffer. |
JMX agent | jmx["{#JMXOBJ}",CheckpointBufferSize] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Data region {#JMXNAME}: Node started to evict pages | You are storing more data than the region can accommodate. Data has started to move to disk, which can slow down requests. Acknowledge to close the problem manually. |
min(/GridGain by JMX/jmx["{#JMXOBJ}",EvictionRate],5m)>0 |Info |
Manual close: Yes | |
Data region {#JMXNAME}: Data region utilization is too high | Data region utilization is high. Increase the data region size or delete unnecessary data. |
min(/GridGain by JMX/jmx["{#JMXOBJ}",OffheapUsedSize],5m)/last(/GridGain by JMX/jmx["{#JMXOBJ}",OffHeapSize])*100>{$GRIDGAIN.DATA.REGION.PUSED.MAX.WARN} |Warning |
Depends on:
|
|
Data region {#JMXNAME}: Data region utilization is too high | Data region utilization is high. Increase the data region size or delete unnecessary data. |
min(/GridGain by JMX/jmx["{#JMXOBJ}",OffheapUsedSize],5m)/last(/GridGain by JMX/jmx["{#JMXOBJ}",OffHeapSize])*100>{$GRIDGAIN.DATA.REGION.PUSED.MAX.HIGH} |High |
||
Data region {#JMXNAME}: Pages replace rate more than 0 | There is more data than DataRegionMaxSize. The cluster has started to replace pages in memory. Page replacement can slow down operations. |
min(/GridGain by JMX/jmx["{#JMXOBJ}",PagesReplaceRate],5m)>0 |Warning |
||
Data region {#JMXNAME}: Checkpoint buffer utilization is too high | Checkpoint buffer utilization is high. Threads will be throttled to avoid buffer overflow. It can be caused by high disk utilization. |
min(/GridGain by JMX/jmx["{#JMXOBJ}",UsedCheckpointBufferSize],5m)/last(/GridGain by JMX/jmx["{#JMXOBJ}",CheckpointBufferSize])*100>{$GRIDGAIN.CHECKPOINT.PUSED.MAX.WARN} |Warning |
Depends on:
|
|
Data region {#JMXNAME}: Checkpoint buffer utilization is too high | Checkpoint buffer utilization is high. Threads will be throttled to avoid buffer overflow. It can be caused by high disk utilization. |
min(/GridGain by JMX/jmx["{#JMXOBJ}",UsedCheckpointBufferSize],5m)/last(/GridGain by JMX/jmx["{#JMXOBJ}",CheckpointBufferSize])*100>{$GRIDGAIN.CHECKPOINT.PUSED.MAX.HIGH} |High |
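The utilization triggers above share one shape: the minimum used size over a 5-minute window, divided by the last total size, multiplied by 100, and compared with a macro threshold. A minimal sketch of that evaluation (the byte figures are illustrative; the thresholds are the template defaults for {$GRIDGAIN.CHECKPOINT.PUSED.MAX.WARN} and {$GRIDGAIN.CHECKPOINT.PUSED.MAX.HIGH}):

```python
def utilization_pct(used_samples, total):
    """Mirror the trigger expression: min(used over window) / last(total) * 100."""
    return min(used_samples) / total * 100

# Checkpoint buffer samples over a 5m window (illustrative byte values):
used = [750_000_000, 800_000_000, 820_000_000]
total = 1_000_000_000
pct = utilization_pct(used, total)

WARN, HIGH = 66, 80  # template default thresholds
print(pct, pct > WARN, pct > HIGH)  # 75.0 True False -> Warning fires, High does not
```

Using min() over the window rather than last() means a single spike does not fire the trigger; utilization must stay above the threshold for the whole 5 minutes.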
Name | Description | Type | Key and additional info |
---|---|---|---|
Cache groups | JMX agent | jmx.discovery[beans,"org.apache:group=\"Cache groups\",*"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Cache group [{#JMXNAME}]: Backups | Count of backups configured for cache group. |
JMX agent | jmx["{#JMXOBJ}",Backups] |
Cache group [{#JMXNAME}]: Partitions | Count of partitions for cache group. |
JMX agent | jmx["{#JMXOBJ}",Partitions] |
Cache group [{#JMXNAME}]: Caches | List of caches. |
JMX agent | jmx["{#JMXOBJ}",Caches] Preprocessing
|
Cache group [{#JMXNAME}]: Local node partitions, moving | Count of partitions with state MOVING for this cache group located on this node. |
JMX agent | jmx["{#JMXOBJ}",LocalNodeMovingPartitionsCount] |
Cache group [{#JMXNAME}]: Local node partitions, renting | Count of partitions with state RENTING for this cache group located on this node. |
JMX agent | jmx["{#JMXOBJ}",LocalNodeRentingPartitionsCount] |
Cache group [{#JMXNAME}]: Local node entries, renting | Count of entries remaining to be evicted from RENTING partitions located on this node for this cache group. |
JMX agent | jmx["{#JMXOBJ}",LocalNodeRentingEntriesCount] |
Cache group [{#JMXNAME}]: Local node partitions, owning | Count of partitions with state OWNING for this cache group located on this node. |
JMX agent | jmx["{#JMXOBJ}",LocalNodeOwningPartitionsCount] |
Cache group [{#JMXNAME}]: Partition copies, min | Minimum number of partition copies for all partitions of this cache group. |
JMX agent | jmx["{#JMXOBJ}",MinimumNumberOfPartitionCopies] |
Cache group [{#JMXNAME}]: Partition copies, max | Maximum number of partition copies for all partitions of this cache group. |
JMX agent | jmx["{#JMXOBJ}",MaximumNumberOfPartitionCopies] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Cache group [{#JMXNAME}]: One or more backups are unavailable | min(/GridGain by JMX/jmx["{#JMXOBJ}",Backups],5m)>=max(/GridGain by JMX/jmx["{#JMXOBJ}",MinimumNumberOfPartitionCopies],5m) |Warning |
|||
Cache group [{#JMXNAME}]: List of caches has changed | List of caches has changed. Significant changes have occurred in the cluster. Acknowledge to close the problem manually. |
last(/GridGain by JMX/jmx["{#JMXOBJ}",Caches],#1)<>last(/GridGain by JMX/jmx["{#JMXOBJ}",Caches],#2) and length(last(/GridGain by JMX/jmx["{#JMXOBJ}",Caches]))>0 |Info |
Manual close: Yes | |
Cache group [{#JMXNAME}]: Rebalance in progress | Acknowledge to close the problem manually. |
max(/GridGain by JMX/jmx["{#JMXOBJ}",LocalNodeMovingPartitionsCount],30m)>0 |Info |
Manual close: Yes | |
Cache group [{#JMXNAME}]: There is no copy for partitions | max(/GridGain by JMX/jmx["{#JMXOBJ}",MinimumNumberOfPartitionCopies],30m)=0 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Thread pool metrics | JMX agent | jmx.discovery[beans,"org.apache:group=\"Thread Pools\",*"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Thread pool [{#JMXNAME}]: Queue size | Current size of the execution queue. |
JMX agent | jmx["{#JMXOBJ}",QueueSize] |
Thread pool [{#JMXNAME}]: Pool size | Current number of threads in the pool. |
JMX agent | jmx["{#JMXOBJ}",PoolSize] |
Thread pool [{#JMXNAME}]: Pool size, max | The maximum allowed number of threads. |
JMX agent | jmx["{#JMXOBJ}",MaximumPoolSize] |
Thread pool [{#JMXNAME}]: Pool size, core | The core number of threads. |
JMX agent | jmx["{#JMXOBJ}",CorePoolSize] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Thread pool [{#JMXNAME}]: Too many messages in queue | The number of messages in the queue is greater than {$GRIDGAIN.THREAD.QUEUE.MAX.WARN:"{#JMXNAME}"}. |
min(/GridGain by JMX/jmx["{#JMXOBJ}",QueueSize],5m) > {$GRIDGAIN.THREAD.QUEUE.MAX.WARN:"{#JMXNAME}"} |Average |
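The {$GRIDGAIN.THREAD.QUEUE.MAX.WARN:"{#JMXNAME}"} macro resolves with context: if a value is defined for the specific thread pool name, it wins; otherwise the default applies. A sketch of that resolution (the pool names and the override value below are illustrative, not part of the template):

```python
# User macro with context: a pool-specific value takes precedence,
# otherwise the default macro value is used.
DEFAULT = 1000                            # {$GRIDGAIN.THREAD.QUEUE.MAX.WARN} default
OVERRIDES = {"GridSystemExecutor": 500}   # hypothetical per-pool override

def queue_threshold(pool_name):
    """Resolve the queue-size threshold for a given thread pool."""
    return OVERRIDES.get(pool_name, DEFAULT)

print(queue_threshold("GridSystemExecutor"))      # 500 (context match)
print(queue_threshold("GridDataStreamExecutor"))  # 1000 (falls back to default)
```

This is why the trigger can be tuned per discovered pool without cloning the trigger prototype.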
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
A template to monitor CockroachDB nodes with Zabbix that works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
Template CockroachDB node by HTTP
— collects metrics by HTTP agent from Prometheus endpoint and health endpoints.
Zabbix version: 6.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Internal node metrics are collected from the Prometheus /_status/vars endpoint. Node health metrics are collected from the /health and /health?ready=1 endpoints. The template doesn't require the use of a session token.
Don't forget to change the {$COCKROACHDB.API.SCHEME} macro according to your setup (secure/insecure node). Also, see the Macros section for a list of macros used to set trigger values.
NOTE. Some metrics may not be collected depending on your CockroachDB version and configuration.
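Internal metrics arrive as Prometheus text exposition from /_status/vars, and the dependent items below extract single series from that payload. A minimal sketch of such an extraction (the sample payload is illustrative, not real CockroachDB output):

```python
import re

# Illustrative fragment of a Prometheus-format /_status/vars response:
payload = """\
# HELP sys_fd_open Process open file descriptors
sys_fd_open 142
sys_fd_softlimit 1048576
clock_offset_meannanos 1.25e+06
"""

def metric(name, text):
    """Return the first sample value for a metric name (labels not handled)."""
    m = re.search(rf"^{re.escape(name)}\s+(\S+)$", text, re.MULTILINE)
    return float(m.group(1)) if m else None

print(metric("sys_fd_open", payload))             # 142.0
print(metric("clock_offset_meannanos", payload))  # 1250000.0
```

Zabbix does the same job with Prometheus pattern preprocessing on each dependent item, so the endpoint is polled only once per interval.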
Name | Description | Default |
---|---|---|
{$COCKROACHDB.API.PORT} | The port of CockroachDB API and Prometheus endpoint. |
8080 |
{$COCKROACHDB.API.SCHEME} | Request scheme which may be http or https. |
http |
{$COCKROACHDB.STORE.USED.MIN.WARN} | The warning threshold of the available disk space in percent. |
20 |
{$COCKROACHDB.STORE.USED.MIN.CRIT} | The critical threshold of the available disk space in percent. |
10 |
{$COCKROACHDB.OPEN.FDS.MAX.WARN} | Maximum percentage of used file descriptors. |
80 |
{$COCKROACHDB.CERT.NODE.EXPIRY.WARN} | Number of days until the node certificate expires. |
30 |
{$COCKROACHDB.CERT.CA.EXPIRY.WARN} | Number of days until the CA certificate expires. |
90 |
{$COCKROACHDB.CLOCK.OFFSET.MAX.WARN} | Maximum clock offset of the node against the rest of the cluster in milliseconds for trigger expression. |
300 |
{$COCKROACHDB.STATEMENTS.ERRORS.MAX.WARN} | Maximum number of SQL statements errors for trigger expression. |
2 |
Name | Description | Type | Key and additional info |
---|---|---|---|
CockroachDB: Get metrics | Get raw metrics from the Prometheus endpoint. |
HTTP agent | cockroachdb.get_metrics Preprocessing
|
CockroachDB: Get health | Get the node /health endpoint. |
HTTP agent | cockroachdb.get_health Preprocessing
|
CockroachDB: Get readiness | Get the node /health?ready=1 endpoint. |
HTTP agent | cockroachdb.get_readiness Preprocessing
|
CockroachDB: Service ping | Check if HTTP/HTTPS service accepts TCP connections. |
Simple check | net.tcp.service["{$COCKROACHDB.API.SCHEME}","{HOST.CONN}","{$COCKROACHDB.API.PORT}"] Preprocessing
|
CockroachDB: Clock offset | Mean clock offset of the node against the rest of the cluster. |
Dependent item | cockroachdb.clock.offset Preprocessing
|
CockroachDB: Version | Build information. |
Dependent item | cockroachdb.version Preprocessing
|
CockroachDB: CPU: System time | System CPU time. |
Dependent item | cockroachdb.cpu.system_time Preprocessing
|
CockroachDB: CPU: User time | User CPU time. |
Dependent item | cockroachdb.cpu.user_time Preprocessing
|
CockroachDB: CPU: Utilization | The CPU utilization expressed in %. |
Dependent item | cockroachdb.cpu.util Preprocessing
|
CockroachDB: Disk: IOPS in progress, rate | Number of disk IO operations currently in progress on this host. |
Dependent item | cockroachdb.disk.iops.in_progress.rate Preprocessing
|
CockroachDB: Disk: Reads, rate | Bytes read from all disks per second since this process started. |
Dependent item | cockroachdb.disk.read.rate Preprocessing
|
CockroachDB: Disk: Read IOPS, rate | Number of disk read operations per second across all disks since this process started. |
Dependent item | cockroachdb.disk.iops.read.rate Preprocessing
|
CockroachDB: Disk: Writes, rate | Bytes written to all disks per second since this process started. |
Dependent item | cockroachdb.disk.write.rate Preprocessing
|
CockroachDB: Disk: Write IOPS, rate | Disk write operations per second across all disks since this process started. |
Dependent item | cockroachdb.disk.iops.write.rate Preprocessing
|
CockroachDB: File descriptors: Limit | Open file descriptors soft limit of the process. |
Dependent item | cockroachdb.descriptors.limit Preprocessing
|
CockroachDB: File descriptors: Open | The number of open file descriptors. |
Dependent item | cockroachdb.descriptors.open Preprocessing
|
CockroachDB: GC: Pause time | The amount of processor time used by Go's garbage collector across all nodes. During garbage collection, application code execution is paused. |
Dependent item | cockroachdb.gc.pause_time Preprocessing
|
CockroachDB: GC: Runs, rate | The number of times that Go's garbage collector was invoked per second across all nodes. |
Dependent item | cockroachdb.gc.runs.rate Preprocessing
|
CockroachDB: Go: Goroutines count | Current number of Goroutines. This count should rise and fall based on load. |
Dependent item | cockroachdb.go.goroutines.count Preprocessing
|
CockroachDB: KV transactions: Aborted, rate | Number of aborted KV transactions per second. |
Dependent item | cockroachdb.kv.transactions.aborted.rate Preprocessing
|
CockroachDB: KV transactions: Committed, rate | Number of KV transactions (including 1PC) committed per second. |
Dependent item | cockroachdb.kv.transactions.committed.rate Preprocessing
|
CockroachDB: Live nodes count | The number of live nodes in the cluster (will be 0 if this node is not itself live). |
Dependent item | cockroachdb.live_count Preprocessing
|
CockroachDB: Liveness heartbeats, rate | Number of successful node liveness heartbeats per second from this node. |
Dependent item | cockroachdb.heartbeaths.success.rate Preprocessing
|
CockroachDB: Memory: Allocated by Cgo | Current bytes of memory allocated by the C layer. |
Dependent item | cockroachdb.memory.cgo.allocated Preprocessing
|
CockroachDB: Memory: Allocated by Go | Current bytes of memory allocated by the Go layer. |
Dependent item | cockroachdb.memory.go.allocated Preprocessing
|
CockroachDB: Memory: Managed by Cgo | Total bytes of memory managed by the C layer. |
Dependent item | cockroachdb.memory.cgo.managed Preprocessing
|
CockroachDB: Memory: Managed by Go | Total bytes of memory managed by the Go layer. |
Dependent item | cockroachdb.memory.go.managed Preprocessing
|
CockroachDB: Memory: Total usage | Resident set size (RSS) of memory in use by the node. |
Dependent item | cockroachdb.memory.total Preprocessing
|
CockroachDB: Network: Bytes received, rate | Bytes received per second on all network interfaces since this process started. |
Dependent item | cockroachdb.network.bytes.received.rate Preprocessing
|
CockroachDB: Network: Bytes sent, rate | Bytes sent per second on all network interfaces since this process started. |
Dependent item | cockroachdb.network.bytes.sent.rate Preprocessing
|
CockroachDB: Time series: Sample errors, rate | The number of errors encountered while attempting to write metrics to disk, per second. |
Dependent item | cockroachdb.ts.samples.errors.rate Preprocessing
|
CockroachDB: Time series: Samples written, rate | The number of successfully written metric samples per second. |
Dependent item | cockroachdb.ts.samples.written.rate Preprocessing
|
CockroachDB: Slow requests: DistSender RPCs | Number of RPCs stuck or retrying for a long time. |
Dependent item | cockroachdb.slow_requests.rpc Preprocessing
|
CockroachDB: SQL: Bytes received, rate | Total amount of incoming SQL client network traffic in bytes per second. |
Dependent item | cockroachdb.sql.bytes.received.rate Preprocessing
|
CockroachDB: SQL: Bytes sent, rate | Total amount of outgoing SQL client network traffic in bytes per second. |
Dependent item | cockroachdb.sql.bytes.sent.rate Preprocessing
|
CockroachDB: Memory: Allocated by SQL | Current SQL statement memory usage for root. |
Dependent item | cockroachdb.memory.sql Preprocessing
|
CockroachDB: SQL: Schema changes, rate | Total number of SQL DDL statements successfully executed per second. |
Dependent item | cockroachdb.sql.schema_changes.rate Preprocessing
|
CockroachDB: SQL sessions: Open | Total number of open SQL sessions. |
Dependent item | cockroachdb.sql.sessions Preprocessing
|
CockroachDB: SQL statements: Active | Total number of SQL statements currently active. |
Dependent item | cockroachdb.sql.statements.active Preprocessing
|
CockroachDB: SQL statements: DELETE, rate | A moving average of the number of DELETE statements successfully executed per second. |
Dependent item | cockroachdb.sql.statements.delete.rate Preprocessing
|
CockroachDB: SQL statements: Executed, rate | Number of SQL queries executed per second. |
Dependent item | cockroachdb.sql.statements.executed.rate Preprocessing
|
CockroachDB: SQL statements: Denials, rate | The number of statements denied per second by a feature flag. |
Dependent item | cockroachdb.sql.statements.denials.rate Preprocessing
|
CockroachDB: SQL statements: Active flows distributed, rate | The number of distributed SQL flows currently active per second. |
Dependent item | cockroachdb.sql.statements.flows.active.rate Preprocessing
|
CockroachDB: SQL statements: INSERT, rate | A moving average of the number of INSERT statements successfully executed per second. |
Dependent item | cockroachdb.sql.statements.insert.rate Preprocessing
|
CockroachDB: SQL statements: SELECT, rate | A moving average of the number of SELECT statements successfully executed per second. |
Dependent item | cockroachdb.sql.statements.select.rate Preprocessing
|
CockroachDB: SQL statements: UPDATE, rate | A moving average of the number of UPDATE statements successfully executed per second. |
Dependent item | cockroachdb.sql.statements.update.rate Preprocessing
|
CockroachDB: SQL statements: Contention, rate | Total number of SQL statements that experienced contention per second. |
Dependent item | cockroachdb.sql.statements.contention.rate Preprocessing
|
CockroachDB: SQL statements: Errors, rate | Total number of statements which returned a planning or runtime error per second. |
Dependent item | cockroachdb.sql.statements.errors.rate Preprocessing
|
CockroachDB: SQL transactions: Open | Total number of currently open SQL transactions. |
Dependent item | cockroachdb.sql.transactions.open Preprocessing
|
CockroachDB: SQL transactions: Aborted, rate | Total number of SQL transaction abort errors per second. |
Dependent item | cockroachdb.sql.transactions.aborted.rate Preprocessing
|
CockroachDB: SQL transactions: Committed, rate | Total number of SQL transaction COMMIT statements successfully executed per second. |
Dependent item | cockroachdb.sql.transactions.committed.rate Preprocessing
|
CockroachDB: SQL transactions: Initiated, rate | Total number of SQL transaction BEGIN statements successfully executed per second. |
Dependent item | cockroachdb.sql.transactions.initiated.rate Preprocessing
|
CockroachDB: SQL transactions: Rolled back, rate | Total number of SQL transaction ROLLBACK statements successfully executed per second. |
Dependent item | cockroachdb.sql.transactions.rollbacks.rate Preprocessing
|
CockroachDB: Uptime | Process uptime. |
Dependent item | cockroachdb.uptime Preprocessing
|
CockroachDB: Node certificate expiration date | The date at which the node certificate expires. |
Dependent item | cockroachdb.cert.expire_date.node Preprocessing
|
CockroachDB: CA certificate expiration date | The date at which the CA certificate expires. |
Dependent item | cockroachdb.cert.expire_date.ca Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
CockroachDB: Node is unhealthy | Node's /health endpoint has returned HTTP 500 Internal Server Error, which indicates that the node is unhealthy. |
last(/CockroachDB by HTTP/cockroachdb.get_health) = 500 |Average |
Depends on:
|
|
CockroachDB: Node is not ready | Node's /health?ready=1 endpoint has returned HTTP 503 Service Unavailable. Possible reasons: |
last(/CockroachDB by HTTP/cockroachdb.get_readiness) = 503 and last(/CockroachDB by HTTP/cockroachdb.uptime) > 5m |Average |
Depends on:
|
|
CockroachDB: Service is down | last(/CockroachDB by HTTP/net.tcp.service["{$COCKROACHDB.API.SCHEME}","{HOST.CONN}","{$COCKROACHDB.API.PORT}"]) = 0 |Average |
|||
CockroachDB: Clock offset is too high | Cockroach-measured clock offset is nearing limit (by default, servers kill themselves at 400ms from the mean). |
min(/CockroachDB by HTTP/cockroachdb.clock.offset,5m) > {$COCKROACHDB.CLOCK.OFFSET.MAX.WARN} * 0.001 |Warning |
||
CockroachDB: Version has changed | last(/CockroachDB by HTTP/cockroachdb.version) <> last(/CockroachDB by HTTP/cockroachdb.version,#2) and length(last(/CockroachDB by HTTP/cockroachdb.version)) > 0 |Info |
|||
CockroachDB: Current number of open files is too high | Getting close to open file descriptor limit. |
min(/CockroachDB by HTTP/cockroachdb.descriptors.open,10m) / last(/CockroachDB by HTTP/cockroachdb.descriptors.limit) * 100 > {$COCKROACHDB.OPEN.FDS.MAX.WARN} |Warning |
||
CockroachDB: Node is not executing SQL | Node is not executing SQL despite having connections. |
last(/CockroachDB by HTTP/cockroachdb.sql.sessions) > 0 and last(/CockroachDB by HTTP/cockroachdb.sql.statements.executed.rate) = 0 |Warning |
||
CockroachDB: SQL statements errors rate is too high | min(/CockroachDB by HTTP/cockroachdb.sql.statements.errors.rate,5m) > {$COCKROACHDB.STATEMENTS.ERRORS.MAX.WARN} |Warning |
|||
CockroachDB: Node has been restarted | Uptime is less than 10 minutes. |
last(/CockroachDB by HTTP/cockroachdb.uptime) < 10m |Info |
||
CockroachDB: Failed to fetch node data | Zabbix has not received data for items for the last 5 minutes. |
nodata(/CockroachDB by HTTP/cockroachdb.uptime,5m) = 1 |Warning |
Depends on:
|
|
CockroachDB: Node certificate expires soon | Node certificate expires soon. |
(last(/CockroachDB by HTTP/cockroachdb.cert.expire_date.node) - now()) / 86400 < {$COCKROACHDB.CERT.NODE.EXPIRY.WARN} |Warning |
||
CockroachDB: CA certificate expires soon | CA certificate expires soon. |
(last(/CockroachDB by HTTP/cockroachdb.cert.expire_date.ca) - now()) / 86400 < {$COCKROACHDB.CERT.CA.EXPIRY.WARN} |Warning |
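The two certificate triggers above divide the seconds remaining until expiry by 86400 to get days, then compare against the `{$COCKROACHDB.CERT.NODE.EXPIRY.WARN}` / `{$COCKROACHDB.CERT.CA.EXPIRY.WARN}` macros. A minimal sketch of the same arithmetic (function names and the 90-day threshold are illustrative, not part of the template):

```python
import time

def days_until_expiry(expire_ts, now=None):
    # Mirrors the trigger math: (last(expire_date) - now()) / 86400
    if now is None:
        now = time.time()
    return (expire_ts - now) / 86400

def cert_trigger_fires(expire_ts, warn_days, now=None):
    # True when the certificate expires in fewer than warn_days days
    return days_until_expiry(expire_ts, now) < warn_days

# A certificate expiring 30 days from "now" against a 90-day threshold fires;
# one expiring in 180 days does not.
now = 1_700_000_000
assert cert_trigger_fires(now + 30 * 86400, warn_days=90, now=now)
assert not cert_trigger_fires(now + 180 * 86400, warn_days=90, now=now)
```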
Name | Description | Type | Key and additional info |
---|---|---|---|
Storage metrics discovery | Discover per store metrics. |
Dependent item | cockroachdb.store.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
CockroachDB: Storage [{#STORE}]: Bytes: Live | Number of logical bytes stored in live key-value pairs on this node. Live data excludes historical and deleted data. |
Dependent item | cockroachdb.storage.bytes.[{#STORE},live] Preprocessing
|
CockroachDB: Storage [{#STORE}]: Bytes: System | Number of physical bytes stored in system key-value pairs. |
Dependent item | cockroachdb.storage.bytes.[{#STORE},system] Preprocessing
|
CockroachDB: Storage [{#STORE}]: Capacity available | Available storage capacity. |
Dependent item | cockroachdb.storage.capacity.[{#STORE},available] Preprocessing
|
CockroachDB: Storage [{#STORE}]: Capacity total | Total storage capacity. This value may be explicitly set using --store. If a store size has not been set, this metric displays the actual disk capacity. |
Dependent item | cockroachdb.storage.capacity.[{#STORE},total] Preprocessing
|
CockroachDB: Storage [{#STORE}]: Capacity used | Disk space in use by CockroachDB data on this node. This excludes the Cockroach binary, operating system, and other system files. |
Dependent item | cockroachdb.storage.capacity.[{#STORE},used] Preprocessing
|
CockroachDB: Storage [{#STORE}]: Capacity available in % | Available storage capacity in %. |
Calculated | cockroachdb.storage.capacity.[{#STORE},available_percent] |
CockroachDB: Storage [{#STORE}]: Replication: Lease holders | Number of lease holders. |
Dependent item | cockroachdb.replication.[{#STORE},lease_holders] Preprocessing
|
CockroachDB: Storage [{#STORE}]: Bytes: Logical | Number of logical bytes stored in key-value pairs on this node. This includes historical and deleted data. |
Dependent item | cockroachdb.storage.bytes.[{#STORE},logical] Preprocessing
|
CockroachDB: Storage [{#STORE}]: Rebalancing: Average queries, rate | Number of kv-level requests received per second by the store, averaged over a large time period as used in rebalancing decisions. |
Dependent item | cockroachdb.rebalancing.queries.average.[{#STORE},rate] Preprocessing
|
CockroachDB: Storage [{#STORE}]: Rebalancing: Average writes, rate | Number of keys written (i.e. applied by raft) per second to the store, averaged over a large time period as used in rebalancing decisions. |
Dependent item | cockroachdb.rebalancing.writes.average.[{#STORE},rate] Preprocessing
|
CockroachDB: Storage [{#STORE}]: Queue processing failures: Consistency, rate | Number of replicas which failed processing in the consistency checker queue per second. |
Dependent item | cockroachdb.queue.processing_failures.consistency.[{#STORE},rate] Preprocessing
|
CockroachDB: Storage [{#STORE}]: Queue processing failures: GC, rate | Number of replicas which failed processing in the GC queue per second. |
Dependent item | cockroachdb.queue.processing_failures.gc.[{#STORE},rate] Preprocessing
|
CockroachDB: Storage [{#STORE}]: Queue processing failures: Raft log, rate | Number of replicas which failed processing in the Raft log queue per second. |
Dependent item | cockroachdb.queue.processing_failures.raftlog.[{#STORE},rate] Preprocessing
|
CockroachDB: Storage [{#STORE}]: Queue processing failures: Raft snapshot, rate | Number of replicas which failed processing in the Raft repair queue per second. |
Dependent item | cockroachdb.queue.processing_failures.raftsnapshot.[{#STORE},rate] Preprocessing
|
CockroachDB: Storage [{#STORE}]: Queue processing failures: Replica GC, rate | Number of replicas which failed processing in the replica GC queue per second. |
Dependent item | cockroachdb.queue.processing_failures.gc_replica.[{#STORE},rate] Preprocessing
|
CockroachDB: Storage [{#STORE}]: Queue processing failures: Replicate, rate | Number of replicas which failed processing in the replicate queue per second. |
Dependent item | cockroachdb.queue.processing_failures.replicate.[{#STORE},rate] Preprocessing
|
CockroachDB: Storage [{#STORE}]: Queue processing failures: Split, rate | Number of replicas which failed processing in the split queue per second. |
Dependent item | cockroachdb.queue.processing_failures.split.[{#STORE},rate] Preprocessing
|
CockroachDB: Storage [{#STORE}]: Queue processing failures: Time series maintenance, rate | Number of replicas which failed processing in the time series maintenance queue per second. |
Dependent item | cockroachdb.queue.processing_failures.tsmaintenance.[{#STORE},rate] Preprocessing
|
CockroachDB: Storage [{#STORE}]: Ranges count | Number of ranges. |
Dependent item | cockroachdb.ranges.[{#STORE},count] Preprocessing
|
CockroachDB: Storage [{#STORE}]: Ranges unavailable | Number of ranges with fewer live replicas than needed for quorum. |
Dependent item | cockroachdb.ranges.[{#STORE},unavailable] Preprocessing
|
CockroachDB: Storage [{#STORE}]: Ranges underreplicated | Number of ranges with fewer live replicas than the replication target. |
Dependent item | cockroachdb.ranges.[{#STORE},underreplicated] Preprocessing
|
CockroachDB: Storage [{#STORE}]: RocksDB read amplification | The average number of real read operations executed per logical read operation. |
Dependent item | cockroachdb.rocksdb.[{#STORE},read_amp] Preprocessing
|
CockroachDB: Storage [{#STORE}]: RocksDB cache hits, rate | Count of block cache hits per second. |
Dependent item | cockroachdb.rocksdb.cache.hits.[{#STORE},rate] Preprocessing
|
CockroachDB: Storage [{#STORE}]: RocksDB cache misses, rate | Count of block cache misses per second. |
Dependent item | cockroachdb.rocksdb.cache.misses.[{#STORE},rate] Preprocessing
|
CockroachDB: Storage [{#STORE}]: RocksDB cache hit ratio | Block cache hit ratio in %. |
Calculated | cockroachdb.rocksdb.cache.[{#STORE},hit_ratio] |
CockroachDB: Storage [{#STORE}]: Replication: Replicas | Number of replicas. |
Dependent item | cockroachdb.replication.replicas.[{#STORE},count] Preprocessing
|
CockroachDB: Storage [{#STORE}]: Replication: Replicas quiesced | Number of quiesced replicas. |
Dependent item | cockroachdb.replication.replicas.[{#STORE},quiesced] Preprocessing
|
CockroachDB: Storage [{#STORE}]: Slow requests: Latch acquisitions | Number of requests that have been stuck for a long time acquiring latches. |
Dependent item | cockroachdb.slow_requests.[{#STORE},latch_acquisitions] Preprocessing
|
CockroachDB: Storage [{#STORE}]: Slow requests: Lease acquisitions | Number of requests that have been stuck for a long time acquiring a lease. |
Dependent item | cockroachdb.slow_requests.[{#STORE},lease_acquisitions] Preprocessing
|
CockroachDB: Storage [{#STORE}]: Slow requests: Raft proposals | Number of requests that have been stuck for a long time in raft. |
Dependent item | cockroachdb.slow_requests.[{#STORE},raft_proposals] Preprocessing
|
CockroachDB: Storage [{#STORE}]: RocksDB SSTables | The number of SSTables in use. |
Dependent item | cockroachdb.rocksdb.[{#STORE},sstables] Preprocessing
|
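The RocksDB cache hit ratio in the table above is a calculated item: hits divided by the sum of hits and misses, scaled to a percentage. A sketch of the same formula, with the zero-traffic guard such a calculation needs (function name is illustrative):

```python
def cache_hit_ratio(hits, misses):
    # hits / (hits + misses) * 100, returning 0 when there is no traffic
    # so an idle store does not produce a division-by-zero error
    total = hits + misses
    if total == 0:
        return 0.0
    return hits / total * 100

assert cache_hit_ratio(90, 10) == 90.0
assert cache_hit_ratio(0, 0) == 0.0
```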
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
CockroachDB: Storage [{#STORE}]: Available storage capacity is low | Storage is running low on free space (less than {$COCKROACHDB.STORE.USED.MIN.WARN}% available). |
max(/CockroachDB by HTTP/cockroachdb.storage.capacity.[{#STORE},available_percent],5m) < {$COCKROACHDB.STORE.USED.MIN.WARN} |Warning |
Depends on:
|
|
CockroachDB: Storage [{#STORE}]: Available storage capacity is critically low | Storage is running critically low on free space (less than {$COCKROACHDB.STORE.USED.MIN.CRIT}% available). |
max(/CockroachDB by HTTP/cockroachdb.storage.capacity.[{#STORE},available_percent],5m) < {$COCKROACHDB.STORE.USED.MIN.CRIT} |Average |
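The "Capacity available in %" calculated item feeds the two storage triggers above: available bytes over total bytes, scaled to a percentage, then compared against the warn and crit macros. A sketch of that logic, with hypothetical threshold values standing in for `{$COCKROACHDB.STORE.USED.MIN.WARN}` and `{$COCKROACHDB.STORE.USED.MIN.CRIT}`:

```python
def capacity_available_percent(available_bytes, total_bytes):
    # available / total * 100, guarding against a zero total
    if total_bytes <= 0:
        return 0.0
    return available_bytes / total_bytes * 100

def storage_severity(percent_available, warn=20.0, crit=10.0):
    # Classify like the two storage triggers; warn/crit defaults here
    # are illustrative stand-ins for the template macros
    if percent_available < crit:
        return "average"   # critically low free space
    if percent_available < warn:
        return "warning"   # low free space
    return "ok"

assert capacity_available_percent(25, 100) == 25.0
assert storage_severity(5.0) == "average"
assert storage_severity(15.0) == "warning"
assert storage_severity(50.0) == "ok"
```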
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for the effortless deployment of ClickHouse monitoring by Zabbix via HTTP and doesn't require any external scripts.
Zabbix version: 6.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Create a user to monitor the service:
Create the file /etc/clickhouse-server/users.d/zabbix.xml:
<yandex>
<users>
<zabbix>
<password>zabbix_pass</password>
<networks incl="networks" />
<profile>web</profile>
<quota>default</quota>
<allow_databases>
<database>test</database>
</allow_databases>
</zabbix>
</users>
</yandex>
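Once the user exists, you can verify it against the same HTTP endpoint the template's items use. A minimal sketch, assuming the default host, port, and the credentials shown above (ClickHouse accepts `user` and `password` as URL query parameters):

```python
from urllib.parse import urlencode
from urllib.request import urlopen

def build_query_url(query, host="localhost", port=8123, scheme="http",
                    user="zabbix", password="zabbix_pass"):
    # Build the same kind of URL the template's HTTP agent items request
    params = urlencode({"query": query, "user": user, "password": password})
    return f"{scheme}://{host}:{port}/?{params}"

url = build_query_url("SELECT 1")
# Uncomment against a live server to verify the user works:
# print(urlopen(url).read().decode())
```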
Login and password are also set in macros:
Name | Description | Default |
---|---|---|
{$CLICKHOUSE.USER} | zabbix |
|
{$CLICKHOUSE.PASSWORD} | zabbix_pass |
|
{$CLICKHOUSE.NETWORK.ERRORS.MAX.WARN} | Maximum number of network errors for trigger expression |
5 |
{$CLICKHOUSE.PORT} | The port of ClickHouse HTTP endpoint |
8123 |
{$CLICKHOUSE.SCHEME} | Request scheme which may be http or https |
http |
{$CLICKHOUSE.LLD.FILTER.DB.MATCHES} | Filter of discoverable databases |
.* |
{$CLICKHOUSE.LLD.FILTER.DB.NOT_MATCHES} | Filter to exclude discovered databases |
CHANGE_IF_NEEDED |
{$CLICKHOUSE.LLD.FILTER.DICT.MATCHES} | Filter of discoverable dictionaries |
.* |
{$CLICKHOUSE.LLD.FILTER.DICT.NOT_MATCHES} | Filter to exclude discovered dictionaries |
CHANGE_IF_NEEDED |
{$CLICKHOUSE.LLD.FILTER.TABLE.MATCHES} | Filter of discoverable tables |
.* |
{$CLICKHOUSE.LLD.FILTER.TABLE.NOT_MATCHES} | Filter to exclude discovered tables |
CHANGE_IF_NEEDED |
{$CLICKHOUSE.QUERY_TIME.MAX.WARN} | Maximum ClickHouse query time in seconds for trigger expression |
600 |
{$CLICKHOUSE.QUEUE.SIZE.MAX.WARN} | Maximum size of the queue for operations waiting to be performed for trigger expression. |
20 |
{$CLICKHOUSE.LOG_POSITION.DIFF.MAX.WARN} | Maximum diff between log_pointer and log_max_index. |
30 |
{$CLICKHOUSE.REPLICA.MAX.WARN} | Replication lag across all tables for trigger expression. |
600 |
{$CLICKHOUSE.DELAYED.FILES.DISTRIBUTED.COUNT.MAX.WARN} | Maximum size of distributed files queue to insert for trigger expression. |
600 |
{$CLICKHOUSE.PARTS.PER.PARTITION.WARN} | Maximum number of parts per partition for trigger expression. |
300 |
{$CLICKHOUSE.DELAYED.INSERTS.MAX.WARN} | Maximum number of delayed inserts for trigger expression. |
0 |
Name | Description | Type | Key and additional info |
---|---|---|---|
ClickHouse: Get system.events | Get information about the number of events that have occurred in the system. |
HTTP agent | clickhouse.system.events Preprocessing
|
ClickHouse: Get system.metrics | Get metrics which can be calculated instantly or have a current value. Format: JSONEachRow. |
HTTP agent | clickhouse.system.metrics Preprocessing
|
ClickHouse: Get system.asynchronous_metrics | Get metrics that are calculated periodically in the background |
HTTP agent | clickhouse.system.asynchronous_metrics Preprocessing
|
ClickHouse: Get system.settings | Get information about settings that are currently in use. |
HTTP agent | clickhouse.system.settings Preprocessing
|
ClickHouse: Longest currently running query time | Get longest running query. |
HTTP agent | clickhouse.process.elapsed |
ClickHouse: Check port availability | Simple check | net.tcp.service[{$CLICKHOUSE.SCHEME},"{HOST.CONN}","{$CLICKHOUSE.PORT}"] Preprocessing
|
|
ClickHouse: Ping | HTTP agent | clickhouse.ping Preprocessing
|
|
ClickHouse: Version | Version of the server |
HTTP agent | clickhouse.version Preprocessing
|
ClickHouse: Revision | Revision of the server. |
Dependent item | clickhouse.revision Preprocessing
|
ClickHouse: Uptime | Number of seconds since ClickHouse server start |
Dependent item | clickhouse.uptime Preprocessing
|
ClickHouse: New queries per second | Number of queries to be interpreted and potentially executed. Does not include queries that failed to parse or were rejected due to AST size limits, quota limits or limits on the number of simultaneously running queries. May include internal queries initiated by ClickHouse itself. Does not count subqueries. |
Dependent item | clickhouse.query.rate Preprocessing
|
ClickHouse: New SELECT queries per second | Number of SELECT queries to be interpreted and potentially executed. Does not include queries that failed to parse or were rejected due to AST size limits, quota limits or limits on the number of simultaneously running queries. May include internal queries initiated by ClickHouse itself. Does not count subqueries. |
Dependent item | clickhouse.select_query.rate Preprocessing
|
ClickHouse: New INSERT queries per second | Number of INSERT queries to be interpreted and potentially executed. Does not include queries that failed to parse or were rejected due to AST size limits, quota limits or limits on the number of simultaneously running queries. May include internal queries initiated by ClickHouse itself. Does not count subqueries. |
Dependent item | clickhouse.insert_query.rate Preprocessing
|
ClickHouse: Delayed insert queries | Number of INSERT queries that are throttled due to high number of active data parts for partition in a MergeTree table. |
Dependent item | clickhouse.insert.delay Preprocessing
|
ClickHouse: Current running queries | Number of executing queries |
Dependent item | clickhouse.query.current Preprocessing
|
ClickHouse: Current running merges | Number of executing background merges |
Dependent item | clickhouse.merge.current Preprocessing
|
ClickHouse: Inserted bytes per second | The number of uncompressed bytes inserted in all tables. |
Dependent item | clickhouse.inserted_bytes.rate Preprocessing
|
ClickHouse: Read bytes per second | Number of bytes (the number of bytes before decompression) read from compressed sources (files, network). |
Dependent item | clickhouse.read_bytes.rate Preprocessing
|
ClickHouse: Inserted rows per second | The number of rows inserted in all tables. |
Dependent item | clickhouse.inserted_rows.rate Preprocessing
|
ClickHouse: Merged rows per second | Rows read for background merges. |
Dependent item | clickhouse.merge_rows.rate Preprocessing
|
ClickHouse: Uncompressed bytes merged per second | Uncompressed bytes that were read for background merges |
Dependent item | clickhouse.merge_bytes.rate Preprocessing
|
ClickHouse: Max count of parts per partition across all tables | The ClickHouse MergeTree table engine splits each INSERT query into partitions (by the PARTITION BY expression) and adds one or more parts per INSERT inside each partition, after which the background merge process runs. |
Dependent item | clickhouse.max.part.count.for.partition Preprocessing
|
ClickHouse: Current TCP connections | Number of connections to TCP server (clients with native interface). |
Dependent item | clickhouse.connections.tcp Preprocessing
|
ClickHouse: Current HTTP connections | Number of connections to HTTP server. |
Dependent item | clickhouse.connections.http Preprocessing
|
ClickHouse: Current distributed connections | Number of connections to remote servers sending data that was INSERTed into Distributed tables. |
Dependent item | clickhouse.connections.distribute Preprocessing
|
ClickHouse: Current MySQL connections | Number of connections to MySQL server. |
Dependent item | clickhouse.connections.mysql Preprocessing
|
ClickHouse: Current Interserver connections | Number of connections from other replicas to fetch parts. |
Dependent item | clickhouse.connections.interserver Preprocessing
|
ClickHouse: Network errors per second | Network errors (timeouts and connection failures) during query execution, background pool tasks and DNS cache update. |
Dependent item | clickhouse.network.error.rate Preprocessing
|
ClickHouse: ZooKeeper sessions | Number of sessions (connections) to ZooKeeper. Should be no more than one. |
Dependent item | clickhouse.zookeeper.session Preprocessing
|
ClickHouse: ZooKeeper watches | Number of watches (e.g., event subscriptions) in ZooKeeper. |
Dependent item | clickhouse.zookeeper.watch Preprocessing
|
ClickHouse: ZooKeeper requests | Number of requests to ZooKeeper in progress. |
Dependent item | clickhouse.zookeeper.request Preprocessing
|
ClickHouse: ZooKeeper wait time | Time spent in waiting for ZooKeeper operations. |
Dependent item | clickhouse.zookeeper.wait.time Preprocessing
|
ClickHouse: ZooKeeper exceptions per second | Count of ZooKeeper exceptions that do not belong to user/hardware exceptions. |
Dependent item | clickhouse.zookeeper.exceptions.rate Preprocessing
|
ClickHouse: ZooKeeper hardware exceptions per second | Count of ZooKeeper exceptions caused by session moved/expired, connection loss, marshalling error, operation timed out and invalid zhandle state. |
Dependent item | clickhouse.zookeeper.hw_exceptions.rate Preprocessing
|
ClickHouse: ZooKeeper user exceptions per second | Count of ZooKeeper exceptions caused by no znodes, bad version, node exists, node empty and no children for ephemeral. |
Dependent item | clickhouse.zookeeper.user_exceptions.rate Preprocessing
|
ClickHouse: Read syscalls in flight | Number of read (read, pread, io_getevents, etc.) syscalls currently in flight. |
Dependent item | clickhouse.read Preprocessing
|
ClickHouse: Write syscalls in flight | Number of write (write, pwrite, io_getevents, etc.) syscalls currently in flight. |
Dependent item | clickhouse.write Preprocessing
|
ClickHouse: Allocated bytes | Total number of bytes allocated by the application. |
Dependent item | clickhouse.jemalloc.allocated Preprocessing
|
ClickHouse: Resident memory | Maximum number of bytes in physically resident data pages mapped by the allocator, comprising all pages dedicated to allocator metadata, pages backing active allocations, and unused dirty pages. |
Dependent item | clickhouse.jemalloc.resident Preprocessing
|
ClickHouse: Mapped memory | Total number of bytes in active extents mapped by the allocator. |
Dependent item | clickhouse.jemalloc.mapped Preprocessing
|
ClickHouse: Memory used for queries | Total amount of memory (bytes) allocated in currently executing queries. |
Dependent item | clickhouse.memory.tracking Preprocessing
|
ClickHouse: Memory used for background merges | Total amount of memory (bytes) allocated in background processing pool (that is dedicated for background merges, mutations and fetches). Note that this value may include a drift when the memory was allocated in a context of background processing pool and freed in other context or vice-versa. This happens naturally due to caches for tables indexes and doesn't indicate memory leaks. |
Dependent item | clickhouse.memory.tracking.background Preprocessing
|
ClickHouse: Memory used for background moves | Total amount of memory (bytes) allocated in background processing pool (that is dedicated for background moves). Note that this value may include a drift when the memory was allocated in a context of background processing pool and freed in other context or vice-versa. This happens naturally due to caches for tables indexes and doesn't indicate memory leaks. |
Dependent item | clickhouse.memory.tracking.background.moves Preprocessing
|
ClickHouse: Memory used for background schedule pool | Total amount of memory (bytes) allocated in background schedule pool (that is dedicated for bookkeeping tasks of Replicated tables). |
Dependent item | clickhouse.memory.tracking.schedule.pool Preprocessing
|
ClickHouse: Memory used for merges | Total amount of memory (bytes) allocated for background merges. Included in MemoryTrackingInBackgroundProcessingPool. Note that this value may include a drift when the memory was allocated in a context of background processing pool and freed in other context or vice-versa. This happens naturally due to caches for tables indexes and doesn't indicate memory leaks. |
Dependent item | clickhouse.memory.tracking.merges Preprocessing
|
ClickHouse: Current distributed files to insert | Number of pending files to process for asynchronous insertion into Distributed tables. Number of files for every shard is summed. |
Dependent item | clickhouse.distributed.files Preprocessing
|
ClickHouse: Distributed connection fail with retry per second | Connection retries in replicated DB connection pool |
Dependent item | clickhouse.distributed.files.retry.rate Preprocessing
|
ClickHouse: Distributed connection fail with all retries per second | Connection failures after all retries in replicated DB connection pool |
Dependent item | clickhouse.distributed.files.fail.rate Preprocessing
|
ClickHouse: Replication lag across all tables | Maximum replica queue delay relative to current time |
Dependent item | clickhouse.replicas.max.absolute.delay Preprocessing
|
ClickHouse: Total replication tasks in queue | Number of replication tasks in queue |
Dependent item | clickhouse.replicas.sum.queue.size Preprocessing
|
ClickHouse: Total number of read-only replicas | Number of Replicated tables that are currently in readonly state due to re-initialization after ZooKeeper session loss or due to startup without ZooKeeper configured. |
Dependent item | clickhouse.replicas.readonly.total Preprocessing
|
ClickHouse: Get replicas info | Get information about replicas. |
HTTP agent | clickhouse.replicas Preprocessing
|
ClickHouse: Get databases info | Get information about databases. |
HTTP agent | clickhouse.databases Preprocessing
|
ClickHouse: Get tables info | Get information about tables. |
HTTP agent | clickhouse.tables Preprocessing
|
ClickHouse: Get dictionaries info | Get information about dictionaries. |
HTTP agent | clickhouse.dictionaries Preprocessing
|
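Several of the master items in the table above pull system tables with FORMAT JSONEachRow, i.e. one JSON object per line, which the template's preprocessing then picks apart with JSONPath. A minimal sketch of parsing such a response into a metric map, using a hypothetical two-row payload:

```python
import json

def parse_json_each_row(payload):
    # Turn a `SELECT metric, value ... FORMAT JSONEachRow` response
    # into a {metric: value} mapping (values arrive as strings)
    metrics = {}
    for line in payload.splitlines():
        if not line.strip():
            continue
        row = json.loads(line)
        metrics[row["metric"]] = float(row["value"])
    return metrics

sample = '{"metric":"Query","value":"3"}\n{"metric":"TCPConnection","value":"12"}\n'
assert parse_json_each_row(sample) == {"Query": 3.0, "TCPConnection": 12.0}
```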
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
ClickHouse: Configuration has been changed | ClickHouse configuration has been changed. Acknowledge to close the problem manually. |
last(/ClickHouse by HTTP/clickhouse.system.settings,#1)<>last(/ClickHouse by HTTP/clickhouse.system.settings,#2) and length(last(/ClickHouse by HTTP/clickhouse.system.settings))>0 |Info |
Manual close: Yes | |
ClickHouse: There are long-running queries | last(/ClickHouse by HTTP/clickhouse.process.elapsed)>{$CLICKHOUSE.QUERY_TIME.MAX.WARN} |Average |
Manual close: Yes | ||
ClickHouse: Port {$CLICKHOUSE.PORT} is unavailable | last(/ClickHouse by HTTP/net.tcp.service[{$CLICKHOUSE.SCHEME},"{HOST.CONN}","{$CLICKHOUSE.PORT}"])=0 |Average |
Manual close: Yes | ||
ClickHouse: Service is down | last(/ClickHouse by HTTP/clickhouse.ping)=0 or last(/ClickHouse by HTTP/net.tcp.service[{$CLICKHOUSE.SCHEME},"{HOST.CONN}","{$CLICKHOUSE.PORT}"]) = 0 |Average |
Manual close: Yes Depends on:
|
||
ClickHouse: Version has changed | The ClickHouse version has changed. Acknowledge to close the problem manually. |
last(/ClickHouse by HTTP/clickhouse.version,#1)<>last(/ClickHouse by HTTP/clickhouse.version,#2) and length(last(/ClickHouse by HTTP/clickhouse.version))>0 |Info |
Manual close: Yes | |
ClickHouse: Host has been restarted | The host uptime is less than 10 minutes. |
last(/ClickHouse by HTTP/clickhouse.uptime)<10m |Info |
Manual close: Yes | |
ClickHouse: Failed to fetch info data | Zabbix has not received any data for items for the last 30 minutes. |
nodata(/ClickHouse by HTTP/clickhouse.uptime,30m)=1 |Warning |
Manual close: Yes Depends on:
|
|
ClickHouse: Too many throttled insert queries | ClickHouse has INSERT queries that are throttled due to a high number of active data parts for a partition in a MergeTree table; decrease the INSERT frequency. |
min(/ClickHouse by HTTP/clickhouse.insert.delay,5m)>{$CLICKHOUSE.DELAYED.INSERTS.MAX.WARN} |Warning |
Manual close: Yes | |
ClickHouse: Too many MergeTree parts | Decrease the frequency of INSERT queries. |
min(/ClickHouse by HTTP/clickhouse.max.part.count.for.partition,5m)>{$CLICKHOUSE.PARTS.PER.PARTITION.WARN} * 0.9 |Warning |
Manual close: Yes | |
ClickHouse: Too many network errors | Number of errors (timeouts and connection failures) during query execution, background pool tasks and DNS cache update is too high. |
min(/ClickHouse by HTTP/clickhouse.network.error.rate,5m)>{$CLICKHOUSE.NETWORK.ERRORS.MAX.WARN} |Warning |
||
ClickHouse: Too many ZooKeeper sessions opened | Number of sessions (connections) to ZooKeeper. |
min(/ClickHouse by HTTP/clickhouse.zookeeper.session,5m)>1 |Warning |
||
ClickHouse: Too many distributed files to insert | Number of pending files to process for asynchronous insertion into Distributed tables. |
min(/ClickHouse by HTTP/clickhouse.distributed.files,5m)>{$CLICKHOUSE.DELAYED.FILES.DISTRIBUTED.COUNT.MAX.WARN} |Warning |
Manual close: Yes | |
ClickHouse: Replication lag is too high | When a replica has too much lag, it can be skipped from Distributed SELECT queries without errors. |
min(/ClickHouse by HTTP/clickhouse.replicas.max.absolute.delay,5m)>{$CLICKHOUSE.REPLICA.MAX.WARN} |Warning |
Manual close: Yes |
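As a side note on how the trigger expressions above behave: a condition such as `min(/ClickHouse by HTTP/clickhouse.replicas.max.absolute.delay,5m)>{$CLICKHOUSE.REPLICA.MAX.WARN}` fires only when every collected value in the time window exceeds the threshold. A minimal sketch of that evaluation logic (not part of the template, sample values invented):

```python
# Replicates the semantics of Zabbix's min(<item>,<window>) > <threshold>:
# the trigger fires only if the SMALLEST value in the window is above the
# threshold, i.e. the problem was sustained for the whole window.

def trigger_fires(values, threshold):
    """True when all collected values exceed the threshold."""
    return bool(values) and min(values) > threshold

# A single low sample resets min() and keeps the trigger quiet;
# lag sustained above 60s for the whole window fires it.
print(trigger_fires([120, 3, 150], 60))   # False
print(trigger_fires([120, 90, 150], 60))  # True
```

This is why a brief replication-lag spike does not raise a problem, while lag that stays high for the full 5-minute window does.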
Name | Description | Type | Key and additional info |
---|---|---|---|
Tables | Info about tables |
Dependent item | clickhouse.tables.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
ClickHouse: {#DB}.{#TABLE}: Get table info | The item gets information about {#TABLE} table of {#DB} database. |
Dependent item | clickhouse.table.info_raw["{#DB}.{#TABLE}"] Preprocessing
|
ClickHouse: {#DB}.{#TABLE}: Bytes | Table size in bytes. Database: {#DB}, table: {#TABLE} |
Dependent item | clickhouse.table.bytes["{#DB}.{#TABLE}"] Preprocessing
|
ClickHouse: {#DB}.{#TABLE}: Parts | Number of parts of the table. Database: {#DB}, table: {#TABLE} |
Dependent item | clickhouse.table.parts["{#DB}.{#TABLE}"] Preprocessing
|
ClickHouse: {#DB}.{#TABLE}: Rows | Number of rows in the table. Database: {#DB}, table: {#TABLE} |
Dependent item | clickhouse.table.rows["{#DB}.{#TABLE}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Replicas | Info about replicas |
Dependent item | clickhouse.replicas.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
ClickHouse: {#DB}.{#TABLE}: Get replicas info | The item gets information about replicas of {#TABLE} table of {#DB} database. |
Dependent item | clickhouse.replica.info_raw["{#DB}.{#TABLE}"] Preprocessing
|
ClickHouse: {#DB}.{#TABLE}: Replica readonly | Whether the replica is in read-only mode. This mode is turned on if the config doesn't have sections with ZooKeeper, if an unknown error occurred when re-initializing sessions in ZooKeeper, and during session re-initialization in ZooKeeper. |
Dependent item | clickhouse.replica.is_readonly["{#DB}.{#TABLE}"] Preprocessing
|
ClickHouse: {#DB}.{#TABLE}: Replica session expired | True if the ZooKeeper session has expired. |
Dependent item | clickhouse.replica.issessionexpired["{#DB}.{#TABLE}"] Preprocessing
|
ClickHouse: {#DB}.{#TABLE}: Replica future parts | Number of data parts that will appear as the result of INSERTs or merges that haven't been done yet. |
Dependent item | clickhouse.replica.future_parts["{#DB}.{#TABLE}"] Preprocessing
|
ClickHouse: {#DB}.{#TABLE}: Replica parts to check | Number of data parts in the queue for verification. A part is put in the verification queue if there is suspicion that it might be damaged. |
Dependent item | clickhouse.replica.partstocheck["{#DB}.{#TABLE}"] Preprocessing
|
ClickHouse: {#DB}.{#TABLE}: Replica queue size | Size of the queue for operations waiting to be performed. |
Dependent item | clickhouse.replica.queue_size["{#DB}.{#TABLE}"] Preprocessing
|
ClickHouse: {#DB}.{#TABLE}: Replica queue inserts size | Number of inserts of blocks of data that need to be made. |
Dependent item | clickhouse.replica.insertsinqueue["{#DB}.{#TABLE}"] Preprocessing
|
ClickHouse: {#DB}.{#TABLE}: Replica queue merges size | Number of merges waiting to be made. |
Dependent item | clickhouse.replica.mergesinqueue["{#DB}.{#TABLE}"] Preprocessing
|
ClickHouse: {#DB}.{#TABLE}: Replica log max index | Maximum entry number in the log of general activity. (Has a non-zero value only when there is an active session with ZooKeeper.) |
Dependent item | clickhouse.replica.logmaxindex["{#DB}.{#TABLE}"] Preprocessing
|
ClickHouse: {#DB}.{#TABLE}: Replica log pointer | Maximum entry number in the log of general activity that the replica copied to its execution queue, plus one. (Has a non-zero value only when there is an active session with ZooKeeper.) |
Dependent item | clickhouse.replica.log_pointer["{#DB}.{#TABLE}"] Preprocessing
|
ClickHouse: {#DB}.{#TABLE}: Total replicas | Total number of known replicas of this table. (Has a non-zero value only when there is an active session with ZooKeeper.) |
Dependent item | clickhouse.replica.total_replicas["{#DB}.{#TABLE}"] Preprocessing
|
ClickHouse: {#DB}.{#TABLE}: Active replicas | Number of replicas of this table that have a session in ZooKeeper (i.e., the number of functioning replicas). (Has a non-zero value only when there is an active session with ZooKeeper.) |
Dependent item | clickhouse.replica.active_replicas["{#DB}.{#TABLE}"] Preprocessing
|
ClickHouse: {#DB}.{#TABLE}: Replica lag | Difference between log_max_index and log_pointer. |
Dependent item | clickhouse.replica.lag["{#DB}.{#TABLE}"] Preprocessing
|
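The "Replica lag" dependent item above is derived from a `system.replicas` row as `log_max_index - log_pointer`. A hypothetical sketch of that preprocessing step (field values below are invented for illustration):

```python
# Sketch of the "Replica lag" derivation from a system.replicas row.
# The JSON payload here is a made-up example, not real server output.
import json

raw = json.dumps({"database": "default", "table": "events",
                  "log_max_index": 1057, "log_pointer": 1051})

row = json.loads(raw)
# log_pointer is "max entry copied to the execution queue, plus one",
# so the difference counts log entries the replica has not yet pulled.
lag = row["log_max_index"] - row["log_pointer"]
print(lag)  # 6
```

A persistently growing value here is what eventually fires the "Difference between log_max_index and log_pointer is too high" trigger below.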
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
ClickHouse: {#DB}.{#TABLE} Replica is readonly | This mode is turned on if the config doesn't have sections with ZooKeeper, if an unknown error occurred when re-initializing sessions in ZooKeeper, and during session re-initialization in ZooKeeper. |
min(/ClickHouse by HTTP/clickhouse.replica.is_readonly["{#DB}.{#TABLE}"],5m)=1 |Warning |
||
ClickHouse: {#DB}.{#TABLE} Replica session is expired | This mode is turned on if the config doesn't have sections with ZooKeeper, if an unknown error occurred when re-initializing sessions in ZooKeeper, and during session re-initialization in ZooKeeper. |
min(/ClickHouse by HTTP/clickhouse.replica.is_session_expired["{#DB}.{#TABLE}"],5m)=1 |Warning |
||
ClickHouse: {#DB}.{#TABLE}: Too many operations in queue | min(/ClickHouse by HTTP/clickhouse.replica.queue_size["{#DB}.{#TABLE}"],5m)>{$CLICKHOUSE.QUEUE.SIZE.MAX.WARN:"{#TABLE}"} |Warning |
|||
ClickHouse: {#DB}.{#TABLE}: Number of active replicas less than number of total replicas | max(/ClickHouse by HTTP/clickhouse.replica.active_replicas["{#DB}.{#TABLE}"],5m) < last(/ClickHouse by HTTP/clickhouse.replica.total_replicas["{#DB}.{#TABLE}"]) |Warning |
|||
ClickHouse: {#DB}.{#TABLE}: Difference between log_max_index and log_pointer is too high | min(/ClickHouse by HTTP/clickhouse.replica.lag["{#DB}.{#TABLE}"],5m) > {$CLICKHOUSE.LOG_POSITION.DIFF.MAX.WARN} |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Dictionaries | Info about dictionaries |
Dependent item | clickhouse.dictionaries.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
ClickHouse: Dictionary {#NAME}: Get dictionary info | The item gets information about {#NAME} dictionary. |
Dependent item | clickhouse.dictionary.info_raw["{#NAME}"] Preprocessing
|
ClickHouse: Dictionary {#NAME}: Bytes allocated | The amount of RAM the dictionary uses. |
Dependent item | clickhouse.dictionary.bytes_allocated["{#NAME}"] Preprocessing
|
ClickHouse: Dictionary {#NAME}: Element count | Number of items stored in the dictionary. |
Dependent item | clickhouse.dictionary.element_count["{#NAME}"] Preprocessing
|
ClickHouse: Dictionary {#NAME}: Load factor | The percentage filled in the dictionary (for a hashed dictionary, the percentage filled in the hash table). |
Dependent item | clickhouse.dictionary.load_factor["{#NAME}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Databases | Info about databases |
Dependent item | clickhouse.db.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
ClickHouse: {#DB}: Get DB info | The item gets information about {#DB} database. |
Dependent item | clickhouse.db.info_raw["{#DB}"] Preprocessing
|
ClickHouse: {#DB}: Bytes | Database size in bytes. |
Dependent item | clickhouse.db.bytes["{#DB}"] Preprocessing
|
ClickHouse: {#DB}: Tables | Number of tables in {#DB} database. |
Dependent item | clickhouse.db.tables["{#DB}"] Preprocessing
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for the effortless deployment of Apache Cassandra monitoring by Zabbix via JMX and doesn't require any external scripts.
Zabbix version: 6.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
This template works with standalone and cluster instances. Metrics are collected by JMX.
Name | Description | Default |
---|---|---|
{$CASSANDRA.USER} | zabbix |
|
{$CASSANDRA.PASSWORD} | zabbix |
|
{$CASSANDRA.KEY_SPACE.MATCHES} | Filter of discoverable keyspaces. |
.* |
{$CASSANDRA.KEY_SPACE.NOT_MATCHES} | Filter to exclude discovered keyspaces. |
(system|system_auth|system_distributed|system_schema) |
{$CASSANDRA.PENDING_TASKS.MAX.HIGH} | 500 |
|
{$CASSANDRA.PENDING_TASKS.MAX.WARN} | 350 |
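The two keyspace-filter macros above are applied by the Tables discovery rule: a keyspace is kept only when it matches the include pattern and does not match the exclude pattern. A simplified sketch of that filtering (Zabbix applies its own regex matching internally; the patterns below are the template defaults):

```python
# Simplified model of the keyspace LLD filter using the template's
# default macro values. Zabbix's actual matching is done server-side;
# this only illustrates the include/exclude logic.
import re

MATCHES = r".*"
NOT_MATCHES = r"(system|system_auth|system_distributed|system_schema)"

def discoverable(keyspaces):
    return [ks for ks in keyspaces
            if re.fullmatch(MATCHES, ks)
            and not re.fullmatch(NOT_MATCHES, ks)]

print(discoverable(["system", "system_auth", "sales", "inventory"]))
# ['sales', 'inventory']
```

With the defaults, all internal `system*` keyspaces are excluded and everything else is discovered.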
Name | Description | Type | Key and additional info |
---|---|---|---|
Apache Cassandra: Cluster - Nodes down | JMX agent | jmx["org.apache.cassandra.net:type=FailureDetector","DownEndpointCount"] Preprocessing
|
|
Apache Cassandra: Cluster - Nodes up | JMX agent | jmx["org.apache.cassandra.net:type=FailureDetector","UpEndpointCount"] Preprocessing
|
|
Apache Cassandra: Cluster - Name | JMX agent | jmx["org.apache.cassandra.db:type=StorageService","ClusterName"] Preprocessing
|
|
Apache Cassandra: Version | JMX agent | jmx["org.apache.cassandra.db:type=StorageService","ReleaseVersion"] Preprocessing
|
|
Apache Cassandra: Dropped messages - Write (Mutation) | Number of dropped regular write messages. |
JMX agent | jmx["org.apache.cassandra.metrics:type=DroppedMessage,scope=MUTATION,name=Dropped","Count"] |
Apache Cassandra: Dropped messages - Read | Number of dropped regular read messages. |
JMX agent | jmx["org.apache.cassandra.metrics:type=DroppedMessage,scope=READ,name=Dropped","Count"] |
Apache Cassandra: Storage - Used (bytes) | Size, in bytes, of the on-disk data this node manages. |
JMX agent | jmx["org.apache.cassandra.metrics:type=Storage,name=Load","Count"] |
Apache Cassandra: Storage - Errors | Number of internal exceptions caught. Under normal conditions this should be zero. |
JMX agent | jmx["org.apache.cassandra.metrics:type=Storage,name=Exceptions","Count"] |
Apache Cassandra: Storage - Hints | Number of hint messages written to this node since [re]start. Includes one entry for each host to be hinted per hint. |
JMX agent | jmx["org.apache.cassandra.metrics:type=Storage,name=TotalHints","Count"] |
Apache Cassandra: Compaction - Number of completed tasks | Number of completed compactions since server [re]start. |
JMX agent | jmx["org.apache.cassandra.metrics:name=CompletedTasks,type=Compaction","Value"] |
Apache Cassandra: Compaction - Total compactions completed | Throughput of completed compactions since server [re]start. |
JMX agent | jmx["org.apache.cassandra.metrics:name=TotalCompactionsCompleted,type=Compaction","Count"] |
Apache Cassandra: Compaction - Pending tasks | Estimated number of compactions remaining to perform. |
JMX agent | jmx["org.apache.cassandra.metrics:type=Compaction,name=PendingTasks","Value"] |
Apache Cassandra: Commitlog - Pending tasks | Number of commit log messages written but yet to be fsync'd. |
JMX agent | jmx["org.apache.cassandra.metrics:name=PendingTasks,type=CommitLog","Value"] |
Apache Cassandra: Commitlog - Total size | Current size, in bytes, used by all the commit log segments. |
JMX agent | jmx["org.apache.cassandra.metrics:name=TotalCommitLogSize,type=CommitLog","Value"] |
Apache Cassandra: Latency - Read median | Latency read from disk in milliseconds - median. |
JMX agent | jmx["org.apache.cassandra.metrics:name=ReadLatency,type=Table","50thPercentile"] Preprocessing
|
Apache Cassandra: Latency - Read 75 percentile | Latency read from disk in milliseconds - p75. |
JMX agent | jmx["org.apache.cassandra.metrics:name=ReadLatency,type=Table","75thPercentile"] Preprocessing
|
Apache Cassandra: Latency - Read 95 percentile | Latency read from disk in milliseconds - p95. |
JMX agent | jmx["org.apache.cassandra.metrics:name=ReadLatency,type=Table","95thPercentile"] Preprocessing
|
Apache Cassandra: Latency - Write median | Latency write to disk in milliseconds - median. |
JMX agent | jmx["org.apache.cassandra.metrics:name=WriteLatency,type=Table","50thPercentile"] Preprocessing
|
Apache Cassandra: Latency - Write 75 percentile | Latency write to disk in milliseconds - p75. |
JMX agent | jmx["org.apache.cassandra.metrics:name=WriteLatency,type=Table","75thPercentile"] Preprocessing
|
Apache Cassandra: Latency - Write 95 percentile | Latency write to disk in milliseconds - p95. |
JMX agent | jmx["org.apache.cassandra.metrics:name=WriteLatency,type=Table","95thPercentile"] Preprocessing
|
Apache Cassandra: Latency - Client request read median | Total latency serving data to clients in milliseconds - median. |
JMX agent | jmx["org.apache.cassandra.metrics:type=ClientRequest,scope=Read,name=Latency","50thPercentile"] Preprocessing
|
Apache Cassandra: Latency - Client request read 75 percentile | Total latency serving data to clients in milliseconds - p75. |
JMX agent | jmx["org.apache.cassandra.metrics:type=ClientRequest,scope=Read,name=Latency","75thPercentile"] Preprocessing
|
Apache Cassandra: Latency - Client request read 95 percentile | Total latency serving data to clients in milliseconds - p95. |
JMX agent | jmx["org.apache.cassandra.metrics:type=ClientRequest,scope=Read,name=Latency","95thPercentile"] Preprocessing
|
Apache Cassandra: Latency - Client request write median | Total latency serving write requests from clients in milliseconds - median. |
JMX agent | jmx["org.apache.cassandra.metrics:type=ClientRequest,scope=Write,name=Latency","50thPercentile"] Preprocessing
|
Apache Cassandra: Latency - Client request write 75 percentile | Total latency serving write requests from clients in milliseconds - p75. |
JMX agent | jmx["org.apache.cassandra.metrics:type=ClientRequest,scope=Write,name=Latency","75thPercentile"] Preprocessing
|
Apache Cassandra: Latency - Client request write 95 percentile | Total latency serving write requests from clients in milliseconds - p95. |
JMX agent | jmx["org.apache.cassandra.metrics:type=ClientRequest,scope=Write,name=Latency","95thPercentile"] Preprocessing
|
Apache Cassandra: KeyCache - Capacity | Cache capacity in bytes. |
JMX agent | jmx["org.apache.cassandra.metrics:type=Cache,scope=KeyCache,name=Capacity","Value"] Preprocessing
|
Apache Cassandra: KeyCache - Entries | Total number of cache entries. |
JMX agent | jmx["org.apache.cassandra.metrics:type=Cache,scope=KeyCache,name=Entries","Value"] |
Apache Cassandra: KeyCache - HitRate | All-time cache hit rate. |
JMX agent | jmx["org.apache.cassandra.metrics:type=Cache,scope=KeyCache,name=HitRate","Value"] Preprocessing
|
Apache Cassandra: KeyCache - Hits per second | Rate of cache hits. |
JMX agent | jmx["org.apache.cassandra.metrics:type=Cache,scope=KeyCache,name=Hits","Count"] Preprocessing
|
Apache Cassandra: KeyCache - Requests per second | Rate of cache requests. |
JMX agent | jmx["org.apache.cassandra.metrics:type=Cache,scope=KeyCache,name=Requests","Count"] Preprocessing
|
Apache Cassandra: KeyCache - Size | Total size of occupied cache, in bytes. |
JMX agent | jmx["org.apache.cassandra.metrics:type=Cache,scope=KeyCache,name=Size","Value"] |
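The "Hits per second" and "Requests per second" items above apply change-per-second preprocessing to the raw JMX counters, which lets you derive an instantaneous hit ratio alongside the all-time `HitRate` gauge. A sketch of that preprocessing (counter readings below are invented):

```python
# Model of Zabbix's "Change per second" preprocessing on monotonically
# growing JMX counters. Sample readings are hypothetical.

def per_second(prev, curr, interval_s):
    """Delta of a counter divided by the polling interval, in units/s."""
    return (curr - prev) / interval_s

hits_rate = per_second(10_000, 10_600, 60)      # 600 hits over 60s
requests_rate = per_second(12_000, 12_750, 60)  # 750 requests over 60s
print(hits_rate, requests_rate)                 # 10.0 12.5
print(hits_rate / requests_rate)                # instantaneous hit ratio: 0.8
```

Unlike the all-time `HitRate` attribute, this ratio reflects only the most recent polling interval.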
Apache Cassandra: Client connections - Native | Number of clients connected to this node's native protocol server. |
JMX agent | jmx["org.apache.cassandra.metrics:type=Client,name=connectedNativeClients","Value"] |
Apache Cassandra: Client connections - Thrift | Number of Thrift clients connected to this node. |
JMX agent | jmx["org.apache.cassandra.metrics:type=Client,name=connectedThriftClients","Value"] |
Apache Cassandra: Client request - Read per second | The number of client requests per second. |
JMX agent | jmx["org.apache.cassandra.metrics:type=ClientRequest,scope=Read,name=Latency","Count"] Preprocessing
|
Apache Cassandra: Client request - Write per second | The number of local write requests per second. |
JMX agent | jmx["org.apache.cassandra.metrics:type=ClientRequest,scope=Write,name=Latency","Count"] Preprocessing
|
Apache Cassandra: Client request - Write Timeouts | Number of write requests timeouts encountered. |
JMX agent | jmx["org.apache.cassandra.metrics:type=ClientRequest,scope=Write,name=Timeouts","Count"] |
Apache Cassandra: Thread pool MutationStage - Pending tasks | Number of tasks queued up in this pool. MutationStage: Responsible for writes (excluding materialized view and counter writes). |
JMX agent | jmx["org.apache.cassandra.metrics:type=ThreadPools,path=request,scope=MutationStage,name=PendingTasks","Value"] |
Apache Cassandra: Thread pool MutationStage - Currently blocked tasks | Number of tasks that are currently blocked due to queue saturation but will become unblocked on retry. MutationStage: Responsible for writes (excluding materialized view and counter writes). |
JMX agent | jmx["org.apache.cassandra.metrics:type=ThreadPools,path=request,scope=MutationStage,name=CurrentlyBlockedTasks","Count"] |
Apache Cassandra: Thread pool MutationStage - Total blocked tasks | Number of tasks that were blocked due to queue saturation. MutationStage: Responsible for writes (excluding materialized view and counter writes). |
JMX agent | jmx["org.apache.cassandra.metrics:type=ThreadPools,path=request,scope=MutationStage,name=TotalBlockedTasks","Count"] |
Apache Cassandra: Thread pool CounterMutationStage - Pending tasks | Number of tasks queued up in this pool. CounterMutationStage: Responsible for counter writes. |
JMX agent | jmx["org.apache.cassandra.metrics:type=ThreadPools,path=request,scope=CounterMutationStage,name=PendingTasks","Value"] |
Apache Cassandra: Thread pool CounterMutationStage - Currently blocked tasks | Number of tasks that are currently blocked due to queue saturation but will become unblocked on retry. CounterMutationStage: Responsible for counter writes. |
JMX agent | jmx["org.apache.cassandra.metrics:type=ThreadPools,path=request,scope=CounterMutationStage,name=CurrentlyBlockedTasks","Count"] |
Apache Cassandra: Thread pool CounterMutationStage - Total blocked tasks | Number of tasks that were blocked due to queue saturation. CounterMutationStage: Responsible for counter writes. |
JMX agent | jmx["org.apache.cassandra.metrics:type=ThreadPools,path=request,scope=CounterMutationStage,name=TotalBlockedTasks","Count"] |
Apache Cassandra: Thread pool ReadStage - Pending tasks | Number of tasks queued up in this pool. ReadStage: Local reads run on this thread pool. |
JMX agent | jmx["org.apache.cassandra.metrics:type=ThreadPools,path=request,scope=ReadStage,name=PendingTasks","Value"] |
Apache Cassandra: Thread pool ReadStage - Currently blocked tasks | Number of tasks that are currently blocked due to queue saturation but will become unblocked on retry. ReadStage: Local reads run on this thread pool. |
JMX agent | jmx["org.apache.cassandra.metrics:type=ThreadPools,path=request,scope=ReadStage,name=CurrentlyBlockedTasks","Count"] |
Apache Cassandra: Thread pool ReadStage - Total blocked tasks | Number of tasks that were blocked due to queue saturation. ReadStage: Local reads run on this thread pool. |
JMX agent | jmx["org.apache.cassandra.metrics:type=ThreadPools,path=request,scope=ReadStage,name=TotalBlockedTasks","Count"] |
Apache Cassandra: Thread pool ViewMutationStage - Pending tasks | Number of tasks queued up in this pool. ViewMutationStage: Responsible for materialized view writes. |
JMX agent | jmx["org.apache.cassandra.metrics:type=ThreadPools,path=request,scope=ViewMutationStage,name=PendingTasks","Value"] |
Apache Cassandra: Thread pool ViewMutationStage - Currently blocked tasks | Number of tasks that are currently blocked due to queue saturation but will become unblocked on retry. ViewMutationStage: Responsible for materialized view writes. |
JMX agent | jmx["org.apache.cassandra.metrics:type=ThreadPools,path=request,scope=ViewMutationStage,name=CurrentlyBlockedTasks","Count"] |
Apache Cassandra: Thread pool ViewMutationStage - Total blocked tasks | Number of tasks that were blocked due to queue saturation. ViewMutationStage: Responsible for materialized view writes. |
JMX agent | jmx["org.apache.cassandra.metrics:type=ThreadPools,path=request,scope=ViewMutationStage,name=TotalBlockedTasks","Count"] |
Apache Cassandra: Thread pool MemtableFlushWriter - Pending tasks | Number of tasks queued up in this pool. MemtableFlushWriter: Writes memtables to disk. |
JMX agent | jmx["org.apache.cassandra.metrics:type=ThreadPools,path=internal,scope=MemtableFlushWriter,name=PendingTasks","Value"] |
Apache Cassandra: Thread pool MemtableFlushWriter - Currently blocked tasks | Number of tasks that are currently blocked due to queue saturation but will become unblocked on retry. MemtableFlushWriter: Writes memtables to disk. |
JMX agent | jmx["org.apache.cassandra.metrics:type=ThreadPools,path=internal,scope=MemtableFlushWriter,name=CurrentlyBlockedTasks","Count"] |
Apache Cassandra: Thread pool MemtableFlushWriter - Total blocked tasks | Number of tasks that were blocked due to queue saturation. MemtableFlushWriter: Writes memtables to disk. |
JMX agent | jmx["org.apache.cassandra.metrics:type=ThreadPools,path=internal,scope=MemtableFlushWriter,name=TotalBlockedTasks","Count"] |
Apache Cassandra: Thread pool HintsDispatcher - Pending tasks | Number of tasks queued up in this pool. HintsDispatcher: Performs hinted handoff. |
JMX agent | jmx["org.apache.cassandra.metrics:type=ThreadPools,path=internal,scope=HintsDispatcher,name=PendingTasks","Value"] |
Apache Cassandra: Thread pool HintsDispatcher - Currently blocked tasks | Number of tasks that are currently blocked due to queue saturation but will become unblocked on retry. HintsDispatcher: Performs hinted handoff. |
JMX agent | jmx["org.apache.cassandra.metrics:type=ThreadPools,path=internal,scope=HintsDispatcher,name=CurrentlyBlockedTasks","Count"] |
Apache Cassandra: Thread pool HintsDispatcher - Total blocked tasks | Number of tasks that were blocked due to queue saturation. HintsDispatcher: Performs hinted handoff. |
JMX agent | jmx["org.apache.cassandra.metrics:type=ThreadPools,path=internal,scope=HintsDispatcher,name=TotalBlockedTasks","Count"] |
Apache Cassandra: Thread pool MemtablePostFlush - Pending tasks | Number of tasks queued up in this pool. MemtablePostFlush: Cleans up the commit log after a memtable is written to disk. |
JMX agent | jmx["org.apache.cassandra.metrics:type=ThreadPools,path=internal,scope=MemtablePostFlush,name=PendingTasks","Value"] |
Apache Cassandra: Thread pool MemtablePostFlush - Currently blocked tasks | Number of tasks that are currently blocked due to queue saturation but will become unblocked on retry. MemtablePostFlush: Cleans up the commit log after a memtable is written to disk. |
JMX agent | jmx["org.apache.cassandra.metrics:type=ThreadPools,path=internal,scope=MemtablePostFlush,name=CurrentlyBlockedTasks","Count"] |
Apache Cassandra: Thread pool MemtablePostFlush - Total blocked tasks | Number of tasks that were blocked due to queue saturation. MemtablePostFlush: Cleans up commit log after memtable is written to disk. |
JMX agent | jmx["org.apache.cassandra.metrics:type=ThreadPools,path=internal,scope=MemtablePostFlush,name=TotalBlockedTasks","Count"] |
Apache Cassandra: Thread pool MigrationStage - Pending tasks | Number of tasks queued up in this pool. MigrationStage: Runs schema migrations. |
JMX agent | jmx["org.apache.cassandra.metrics:type=ThreadPools,path=internal,scope=MigrationStage,name=PendingTasks","Value"] |
Apache Cassandra: Thread pool MigrationStage - Currently blocked tasks | Number of tasks that are currently blocked due to queue saturation but will become unblocked on retry. MigrationStage: Runs schema migrations. |
JMX agent | jmx["org.apache.cassandra.metrics:type=ThreadPools,path=internal,scope=MigrationStage,name=CurrentlyBlockedTasks","Count"] |
Apache Cassandra: Thread pool MigrationStage - Total blocked tasks | Number of tasks that were blocked due to queue saturation. MigrationStage: Runs schema migrations. |
JMX agent | jmx["org.apache.cassandra.metrics:type=ThreadPools,path=internal,scope=MigrationStage,name=TotalBlockedTasks","Count"] |
Apache Cassandra: Thread pool MiscStage - Pending tasks | Number of tasks queued up in this pool. MiscStage: Miscellaneous tasks run here. |
JMX agent | jmx["org.apache.cassandra.metrics:type=ThreadPools,path=internal,scope=MiscStage,name=PendingTasks","Value"] |
Apache Cassandra: Thread pool MiscStage - Currently blocked tasks | Number of tasks that are currently blocked due to queue saturation but will become unblocked on retry. MiscStage: Miscellaneous tasks run here. |
JMX agent | jmx["org.apache.cassandra.metrics:type=ThreadPools,path=internal,scope=MiscStage,name=CurrentlyBlockedTasks","Count"] |
Apache Cassandra: Thread pool MiscStage - Total blocked tasks | Number of tasks that were blocked due to queue saturation. MiscStage: Miscellaneous tasks run here. |
JMX agent | jmx["org.apache.cassandra.metrics:type=ThreadPools,path=internal,scope=MiscStage,name=TotalBlockedTasks","Count"] |
Apache Cassandra: Thread pool SecondaryIndexManagement - Pending tasks | Number of tasks queued up in this pool. SecondaryIndexManagement: Performs updates to secondary indexes. |
JMX agent | jmx["org.apache.cassandra.metrics:type=ThreadPools,path=internal,scope=SecondaryIndexManagement,name=PendingTasks","Value"] |
Apache Cassandra: Thread pool SecondaryIndexManagement - Currently blocked tasks | Number of tasks that are currently blocked due to queue saturation but will become unblocked on retry. SecondaryIndexManagement: Performs updates to secondary indexes. |
JMX agent | jmx["org.apache.cassandra.metrics:type=ThreadPools,path=internal,scope=SecondaryIndexManagement,name=CurrentlyBlockedTasks","Count"] |
Apache Cassandra: Thread pool SecondaryIndexManagement - Total blocked tasks | Number of tasks that were blocked due to queue saturation. SecondaryIndexManagement: Performs updates to secondary indexes. |
JMX agent | jmx["org.apache.cassandra.metrics:type=ThreadPools,path=internal,scope=SecondaryIndexManagement,name=TotalBlockedTasks","Count"] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Apache Cassandra: There are down nodes in cluster | last(/Apache Cassandra by JMX/jmx["org.apache.cassandra.net:type=FailureDetector","DownEndpointCount"])>0 |Average |
|||
Apache Cassandra: Version has changed | Cassandra version has changed. Acknowledge to close the problem manually. |
last(/Apache Cassandra by JMX/jmx["org.apache.cassandra.db:type=StorageService","ReleaseVersion"],#1)<>last(/Apache Cassandra by JMX/jmx["org.apache.cassandra.db:type=StorageService","ReleaseVersion"],#2) and length(last(/Apache Cassandra by JMX/jmx["org.apache.cassandra.db:type=StorageService","ReleaseVersion"]))>0 |Info |
Manual close: Yes | |
Apache Cassandra: Failed to fetch info data | Zabbix has not received any data for items for the last 15 minutes. |
nodata(/Apache Cassandra by JMX/jmx["org.apache.cassandra.metrics:type=Storage,name=Load","Count"],15m)=1 |Warning |
||
Apache Cassandra: Too many storage exceptions | min(/Apache Cassandra by JMX/jmx["org.apache.cassandra.metrics:type=Storage,name=Exceptions","Count"],5m)>0 |Warning |
|||
Apache Cassandra: Many pending tasks | min(/Apache Cassandra by JMX/jmx["org.apache.cassandra.metrics:type=Compaction,name=PendingTasks","Value"],15m)>{$CASSANDRA.PENDING_TASKS.MAX.WARN} |Warning |
Depends on:
|
||
Apache Cassandra: Too many pending tasks | min(/Apache Cassandra by JMX/jmx["org.apache.cassandra.metrics:type=Compaction,name=PendingTasks","Value"],15m)>{$CASSANDRA.PENDING_TASKS.MAX.HIGH} |Average |
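The two compaction pending-tasks triggers above form a tiered pair: the Warning trigger depends on the Average one, so only the most severe problem is raised. A sketch of the resulting behavior, using the template's default macro values:

```python
# Models the tiered {$CASSANDRA.PENDING_TASKS.MAX.WARN}/.MAX.HIGH pair.
# The trigger dependency means the Warning problem is suppressed while
# the Average (higher-tier) trigger is in a problem state.

WARN, HIGH = 350, 500  # template defaults

def severity(pending_min_15m):
    """Effective problem severity for min(pending tasks, 15m)."""
    if pending_min_15m > HIGH:
        return "Average"   # Warning suppressed by the dependency
    if pending_min_15m > WARN:
        return "Warning"
    return None            # no problem

print(severity(400))  # Warning
print(severity(600))  # Average
```

Tune the two macros together: keeping WARN meaningfully below HIGH preserves an early-warning band before the higher-severity alert fires.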
Name | Description | Type | Key and additional info |
---|---|---|---|
Tables | Info about keyspaces and tables |
JMX agent | jmx.discovery[beans,"org.apache.cassandra.metrics:type=Table,keyspace=*,scope=*,name=ReadLatency"] |
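For each keyspace/table pair found by the discovery rule, Zabbix expands the `{#JMXKEYSPACE}` and `{#JMXSCOPE}` LLD macros into concrete item keys like the ones in the table below. A sketch of that expansion (the keyspace and table names used here are hypothetical):

```python
# Illustrates how the LLD macros expand into the per-table JMX item keys.
# "sales" and "orders" are made-up names for demonstration.

def item_key(keyspace, scope, name, attr):
    return (f'jmx["org.apache.cassandra.metrics:type=Table,'
            f'keyspace={keyspace},scope={scope},name={name}","{attr}"]')

print(item_key("sales", "orders", "SSTablesPerReadHistogram", "75thPercentile"))
```

Every discovered table thus gets its own full set of items, one per metric/attribute combination listed below.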
Name | Description | Type | Key and additional info |
---|---|---|---|
{#JMXKEYSPACE}.{#JMXSCOPE}: SSTables per read 75 percentile | The number of SSTable data files accessed per read - p75. |
JMX agent | jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=SSTablesPerReadHistogram","75thPercentile"] |
{#JMXKEYSPACE}.{#JMXSCOPE}: SSTables per read 95 percentile | The number of SSTable data files accessed per read - p95. |
JMX agent | jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=SSTablesPerReadHistogram","95thPercentile"] |
{#JMXKEYSPACE}.{#JMXSCOPE}: Tombstone scanned 75 percentile | Number of tombstones scanned per read - p75. |
JMX agent | jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=TombstoneScannedHistogram","75thPercentile"] |
{#JMXKEYSPACE}.{#JMXSCOPE}: Tombstone scanned 95 percentile | Number of tombstones scanned per read - p95. |
JMX agent | jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=TombstoneScannedHistogram","95thPercentile"] |
{#JMXKEYSPACE}.{#JMXSCOPE}: Waiting on free memtable space 75 percentile | The time spent waiting for free memtable space either on- or off-heap - p75. |
JMX agent | jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=WaitingOnFreeMemtableSpace","75thPercentile"] Preprocessing
|
{#JMXKEYSPACE}.{#JMXSCOPE}: Waiting on free memtable space 95 percentile | The time spent waiting for free memtable space either on- or off-heap - p95. |
JMX agent | jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=WaitingOnFreeMemtableSpace","95thPercentile"] Preprocessing
|
{#JMXKEYSPACE}.{#JMXSCOPE}: Col update time delta 75 percentile | The column update time delta - p75. |
JMX agent | jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=ColUpdateTimeDeltaHistogram","75thPercentile"] Preprocessing
|
{#JMXKEYSPACE}.{#JMXSCOPE}: Col update time delta 95 percentile | The column update time delta - p95. |
JMX agent | jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=ColUpdateTimeDeltaHistogram","95thPercentile"] Preprocessing
|
{#JMXKEYSPACE}.{#JMXSCOPE}: Bloom filter false ratio | The ratio of Bloom filter false positives to total checks. |
JMX agent | jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=BloomFilterFalseRatio","Value"] |
{#JMXKEYSPACE}.{#JMXSCOPE}: Compression ratio | The compression ratio for all SSTables. |
JMX agent | jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=CompressionRatio","Value"] |
{#JMXKEYSPACE}.{#JMXSCOPE}: KeyCache hit rate | The key cache hit rate. |
JMX agent | jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=KeyCacheHitRate","Value"] |
{#JMXKEYSPACE}.{#JMXSCOPE}: Live SSTables | Number of "live" (in use) SSTables. |
JMX agent | jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=LiveSSTableCount","Value"] |
{#JMXKEYSPACE}.{#JMXSCOPE}: Max partition size | The size of the largest compacted partition. |
JMX agent | jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=MaxPartitionSize","Value"] |
{#JMXKEYSPACE}.{#JMXSCOPE}: Mean partition size | The average size of compacted partition. |
JMX agent | jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=MeanPartitionSize","Value"] |
{#JMXKEYSPACE}.{#JMXSCOPE}: Pending compactions | The number of pending compactions. |
JMX agent | jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=PendingCompactions","Value"] |
{#JMXKEYSPACE}.{#JMXSCOPE}: Snapshots size | The disk space truly used by snapshots. |
JMX agent | jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=SnapshotsSize","Value"] |
{#JMXKEYSPACE}.{#JMXSCOPE}: Compaction bytes written | The amount of data that was compacted since (re)start. |
JMX agent | jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=CompactionBytesWritten","Count"] |
{#JMXKEYSPACE}.{#JMXSCOPE}: Bytes flushed | The amount of data that was flushed since (re)start. |
JMX agent | jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=BytesFlushed","Count"] |
{#JMXKEYSPACE}.{#JMXSCOPE}: Pending flushes | The number of pending flushes. |
JMX agent | jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=PendingFlushes","Count"] |
{#JMXKEYSPACE}.{#JMXSCOPE}: Live disk space used | The disk space used by "live" SSTables (only counts in use files). |
JMX agent | jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=LiveDiskSpaceUsed","Count"] |
{#JMXKEYSPACE}.{#JMXSCOPE}: Disk space used | Disk space used. |
JMX agent | jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=TotalDiskSpaceUsed","Count"] |
{#JMXKEYSPACE}.{#JMXSCOPE}: Out of row cache hits | The number of row cache hits that do not satisfy the query filter and went to disk. |
JMX agent | jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=RowCacheHitOutOfRange","Count"] |
{#JMXKEYSPACE}.{#JMXSCOPE}: Row cache hits | The number of row cache hits. |
JMX agent | jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=RowCacheHit","Count"] |
{#JMXKEYSPACE}.{#JMXSCOPE}: Row cache misses | The number of table row cache misses. |
JMX agent | jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=RowCacheMiss","Count"] |
{#JMXKEYSPACE}.{#JMXSCOPE}: Read latency 75 percentile | Latency read from disk in milliseconds. |
JMX agent | jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=ReadLatency","75thPercentile"] Preprocessing
|
{#JMXKEYSPACE}.{#JMXSCOPE}: Read latency 95 percentile | Latency read from disk in milliseconds. |
JMX agent | jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=ReadLatency","95thPercentile"] Preprocessing
|
{#JMXKEYSPACE}.{#JMXSCOPE}: Read per second | The number of client requests per second. |
JMX agent | jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=ReadLatency","Count"] Preprocessing
|
{#JMXKEYSPACE}.{#JMXSCOPE}: Write latency 75 percentile | Latency write to disk in milliseconds. |
JMX agent | jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=WriteLatency","75thPercentile"] Preprocessing
|
{#JMXKEYSPACE}.{#JMXSCOPE}: Write latency 95 percentile | Latency write to disk in milliseconds. |
JMX agent | jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=WriteLatency","95thPercentile"] Preprocessing
|
{#JMXKEYSPACE}.{#JMXSCOPE}: Write per second | The number of local write requests per second. |
JMX agent | jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=WriteLatency","Count"] Preprocessing
|
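The item keys above are produced by low-level discovery: Zabbix substitutes the discovered keyspace and table name for the {#JMXKEYSPACE} and {#JMXSCOPE} macros, and the "... per second" items store the raw MBean Count and convert it with "Change per second" preprocessing. As an illustration (not part of the template; the function names here are hypothetical), a minimal sketch of both steps:

```python
def jmx_item_key(keyspace: str, scope: str, name: str, attribute: str) -> str:
    """Build the item key the template polls for a Cassandra Table MBean,
    mirroring the LLD macro substitution of {#JMXKEYSPACE}/{#JMXSCOPE}."""
    return (
        f'jmx["org.apache.cassandra.metrics:type=Table,'
        f'keyspace={keyspace},scope={scope},name={name}","{attribute}"]'
    )


def change_per_second(prev_count: float, curr_count: float, interval_s: float) -> float:
    """Mirror the 'Change per second' preprocessing step: the delta of a
    monotonically increasing counter divided by the polling interval."""
    return (curr_count - prev_count) / interval_s


# Key polled for the p95 read latency of table 'users' in keyspace 'app':
print(jmx_item_key("app", "users", "ReadLatency", "95thPercentile"))

# A ReadLatency Count that grew from 1000 to 1600 over a 60 s interval:
print(change_per_second(1000, 1600, 60))  # -> 10.0 requests per second
```

Note that counter resets (for example after a node restart) produce a negative delta; Zabbix's built-in preprocessing handles this case, while the sketch above does not.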
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at the ZABBIX forums