For Zabbix version: 6.2 and higher
This template monitors the TiKV server of a TiDB cluster via Zabbix, and works without any external scripts.
Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
Template TiDB TiKV by HTTP
— collects metrics by HTTP agent from TiKV /metrics endpoint.
This template was tested on:
See Zabbix template operation for basic instructions.
This template works with TiKV server of TiDB cluster. Internal service metrics are collected from TiKV /metrics endpoint. Don't forget to change the macros {$TIKV.URL}, {$TIKV.PORT}. Also, see the Macros section for a list of macros used to set trigger values.
No specific Zabbix configuration is required.
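The raw HTTP agent item in this template fetches the Prometheus exposition text from the TiKV /metrics endpoint, converts it with the PROMETHEUS_TO_JSON preprocessing step, and lets dependent items filter the resulting JSON with JSONPath. The sketch below is a simplified Python stand-in for that pipeline, not Zabbix's actual implementation; the metric name and label filter mirror the "Bytes read" item's JSONPath:

```python
import re

# One Prometheus sample line: name{label="v",...} value
SAMPLE_RE = re.compile(r'^(?P<name>[a-zA-Z_:][\w:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)$')

def prometheus_to_json(text):
    """Rough equivalent of Zabbix's PROMETHEUS_TO_JSON preprocessing step
    (simplified: no escaped quotes or commas inside label values)."""
    out = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):  # skip HELP/TYPE/comment lines
            continue
        m = SAMPLE_RE.match(line)
        if not m:
            continue
        labels = {}
        if m.group('labels'):
            for part in m.group('labels').split(','):
                k, _, v = part.partition('=')
                labels[k.strip()] = v.strip().strip('"')
        out.append({'name': m.group('name'), 'labels': labels,
                    'value': float(m.group('value'))})
    return out

def engine_bytes_read(metrics):
    """Emulates the "Bytes read" JSONPath: sum tikv_engine_flow_bytes samples
    where db == "kv" and type is bytes_read or iter_bytes_read."""
    return sum(m['value'] for m in metrics
               if m['name'] == 'tikv_engine_flow_bytes'
               and m['labels'].get('db') == 'kv'
               and m['labels'].get('type') in ('bytes_read', 'iter_bytes_read'))
```

In the real template this filtering happens inside Zabbix item preprocessing; the sketch only illustrates what the dependent items extract from the bulk metrics item.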
Name | Description | Default |
---|---|---|
{$TIKV.COPOCESSOR.ERRORS.MAX.WARN} | Maximum number of coprocessor request errors | 1 |
{$TIKV.PENDING_COMMANDS.MAX.WARN} | Maximum number of pending commands | 1 |
{$TIKV.PENDING_TASKS.MAX.WARN} | Maximum number of tasks currently running by the worker or pending | 1 |
{$TIKV.PORT} | The port of the TiKV server metrics web endpoint | 20180 |
{$TIKV.STORE.ERRORS.MAX.WARN} | Maximum number of failure messages | 1 |
{$TIKV.URL} | TiKV server URL | localhost |
There are no template links in this template.
Name | Description | Type | Key and additional info |
---|---|---|---|
Coprocessor metrics discovery | Discovery of coprocessor metrics. | DEPENDENT | tikv.coprocessor.discovery Preprocessing: - JSONPATH: - JAVASCRIPT: - DISCARD_UNCHANGED_HEARTBEAT: |
QPS metrics discovery | Discovery of QPS metrics. | DEPENDENT | tikv.qps.discovery Preprocessing: - JSONPATH: - JAVASCRIPT: - DISCARD_UNCHANGED_HEARTBEAT: |
Scheduler metrics discovery | Discovery of scheduler metrics. | DEPENDENT | tikv.scheduler.discovery Preprocessing: - JSONPATH: - JAVASCRIPT: - DISCARD_UNCHANGED_HEARTBEAT: |
Server errors discovery | Discovery of server error metrics. | DEPENDENT | tikv.server_report_failure.discovery Preprocessing: - JSONPATH: - JAVASCRIPT: - DISCARD_UNCHANGED_HEARTBEAT: Overrides: Too many unreachable messages trigger |
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
TiKV node | TiKV: Store size | The storage size of TiKV instance. | DEPENDENT | tikv.engine_size Preprocessing: - JSONPATH: |
TiKV node | TiKV: Available size | The available capacity of TiKV instance. | DEPENDENT | tikv.store_size.available Preprocessing: - JSONPATH: |
TiKV node | TiKV: Capacity size | The capacity size of TiKV instance. | DEPENDENT | tikv.store_size.capacity Preprocessing: - JSONPATH: |
TiKV node | TiKV: Bytes read | The total bytes of read in TiKV instance. | DEPENDENT | tikv.engine_flow_bytes.read Preprocessing: - JSONPATH: `$[?(@.name == "tikv_engine_flow_bytes" && @.labels.db == "kv" && @.labels.type =~ "bytes_read\|iter_bytes_read")].value.sum()` |
TiKV node | TiKV: Bytes write | The total bytes of write in TiKV instance. | DEPENDENT | tikv.engine_flow_bytes.write Preprocessing: - JSONPATH: |
TiKV node | TiKV: Storage: commands total, rate | Total number of commands received per second. | DEPENDENT | tikv.storage_command.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
TiKV node | TiKV: CPU util | The CPU usage ratio on TiKV instance. | DEPENDENT | tikv.cpu.util Preprocessing: - JSONPATH: - CHANGE_PER_SECOND - MULTIPLIER: |
TiKV node | TiKV: RSS memory usage | Resident memory size in bytes. | DEPENDENT | tikv.rss_bytes Preprocessing: - JSONPATH: |
TiKV node | TiKV: Regions, count | The number of regions collected in TiKV instance. | DEPENDENT | tikv.region_count Preprocessing: - JSONPATH: |
TiKV node | TiKV: Regions, leader | The number of leaders in TiKV instance. | DEPENDENT | tikv.region_leader Preprocessing: - JSONPATH: |
TiKV node | TiKV: Total query, rate | The total QPS in TiKV instance. | DEPENDENT | tikv.grpc_msg.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
TiKV node | TiKV: Total query errors, rate | The total number of gRPC message handling failures per second. | DEPENDENT | tikv.grpc_msg_fail.rate Preprocessing: - JSONPATH: ⛔️ ON_FAIL: - CHANGE_PER_SECOND |
TiKV node | TiKV: Coprocessor: Errors, rate | Total number of push down request errors per second. | DEPENDENT | tikv.coprocessor_request_error.rate Preprocessing: - JSONPATH: ⛔️ ON_FAIL: - CHANGE_PER_SECOND |
TiKV node | TiKV: Coprocessor: Requests, rate | Total number of coprocessor requests per second. | DEPENDENT | tikv.coprocessor_request.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
TiKV node | TiKV: Coprocessor: Scan keys, rate | Total number of scan keys observed per request per second. | DEPENDENT | tikv.coprocessor_scan_keys_sum.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
TiKV node | TiKV: Coprocessor: RocksDB ops, rate | Total number of RocksDB internal operations from PerfContext per second. | DEPENDENT | tikv.coprocessor_rocksdb_perf.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
TiKV node | TiKV: Coprocessor: Response size, rate | The total size of coprocessor response per second. | DEPENDENT | tikv.coprocessor_response_bytes.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
TiKV node | TiKV: Scheduler: Pending commands | The total number of pending commands. The scheduler receives commands from clients and executes them against the MVCC layer storage engine. | DEPENDENT | tikv.scheduler_contex Preprocessing: - JSONPATH: |
TiKV node | TiKV: Scheduler: Busy, rate | The total count of too busy schedulers per second. | DEPENDENT | tikv.scheduler_too_busy.rate Preprocessing: - JSONPATH: ⛔️ ON_FAIL: - CHANGE_PER_SECOND |
TiKV node | TiKV: Scheduler: Commands total, rate | Total number of commands per second. | DEPENDENT | tikv.scheduler_commands.rate Preprocessing: - JSONPATH: ⛔️ ON_FAIL: CUSTOM_VALUE -> 0 - CHANGE_PER_SECOND |
TiKV node | TiKV: Scheduler: Low priority commands total, rate | Total count of low priority commands per second. | DEPENDENT | tikv.commands_pri.low.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
TiKV node | TiKV: Scheduler: Normal priority commands total, rate | Total count of normal priority commands per second. | DEPENDENT | tikv.commands_pri.normal.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
TiKV node | TiKV: Scheduler: High priority commands total, rate | Total count of high priority commands per second. | DEPENDENT | tikv.commands_pri.high.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
TiKV node | TiKV: Snapshot: Pending tasks | The number of tasks currently running by the worker or pending. | DEPENDENT | tikv.worker_pending_task Preprocessing: - JSONPATH: |
TiKV node | TiKV: Snapshot: Sending | The total amount of raftstore snapshot traffic. | DEPENDENT | tikv.snapshot.sending Preprocessing: - JSONPATH: |
TiKV node | TiKV: Snapshot: Receiving | The total amount of raftstore snapshot traffic. | DEPENDENT | tikv.snapshot.receiving Preprocessing: - JSONPATH: |
TiKV node | TiKV: Snapshot: Applying | The total amount of raftstore snapshot traffic. | DEPENDENT | tikv.snapshot.applying Preprocessing: - JSONPATH: |
TiKV node | TiKV: Uptime | The runtime of each TiKV instance. | DEPENDENT | tikv.uptime Preprocessing: - JSONPATH: - JAVASCRIPT: |
TiKV node | TiKV: Server: failure messages total, rate | Total number of reporting failure messages per second. | DEPENDENT | tikv.messages.failure.rate Preprocessing: - JSONPATH: ⛔️ ON_FAIL: - CHANGE_PER_SECOND |
TiKV node | TiKV: Query: {#TYPE}, rate | The QPS per command in TiKV instance. | DEPENDENT | tikv.grpc_msg.rate[{#TYPE}] Preprocessing: - JSONPATH: ⛔️ ON_FAIL: CUSTOM_VALUE -> |
TiKV node | TiKV: Coprocessor: {#REQ_TYPE} errors, rate | Total number of push down request errors per second. | DEPENDENT | tikv.coprocessor_request_error.rate[{#REQ_TYPE}] Preprocessing: - JSONPATH: ⛔️ ON_FAIL: DISCARD_VALUE -> - CHANGE_PER_SECOND |
TiKV node | TiKV: Coprocessor: {#REQ_TYPE} requests, rate | Total number of coprocessor requests per second. | DEPENDENT | tikv.coprocessor_request.rate[{#REQ_TYPE}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
TiKV node | TiKV: Coprocessor: {#REQ_TYPE} scan keys, rate | Total number of scan keys observed per request per second. | DEPENDENT | tikv.coprocessor_scan_keys.rate[{#REQ_TYPE}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
TiKV node | TiKV: Coprocessor: {#REQ_TYPE} RocksDB ops, rate | Total number of RocksDB internal operations from PerfContext per second. | DEPENDENT | tikv.coprocessor_rocksdb_perf.rate[{#REQ_TYPE}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
TiKV node | TiKV: Scheduler: commands {#STAGE}, rate | Total number of commands on each stage per second. | DEPENDENT | tikv.scheduler_stage.rate[{#STAGE}] Preprocessing: - JSONPATH: ⛔️ ON_FAIL: CUSTOM_VALUE -> 0 - CHANGE_PER_SECOND |
TiKV node | TiKV: Store_id {#STORE_ID}: failure messages "{#TYPE}", rate | Total number of reporting failure messages. The metric has two labels: type and store_id. type represents the failure type, and store_id represents the destination peer store id. | DEPENDENT | tikv.messages.failure.rate[{#STORE_ID},{#TYPE}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
Zabbix raw items | TiKV: Get instance metrics | Get TiKV instance metrics. | HTTP_AGENT | tikv.get_metrics Preprocessing: - CHECK_NOT_SUPPORTED ⛔️ ON_FAIL: DISCARD_VALUE -> - PROMETHEUS_TO_JSON |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
TiKV: Too many coprocessor request errors | - | min(/TiDB TiKV by HTTP/tikv.coprocessor_request_error.rate,5m)>{$TIKV.COPOCESSOR.ERRORS.MAX.WARN} | WARNING | |
TiKV: Too many pending commands | - | min(/TiDB TiKV by HTTP/tikv.scheduler_contex,5m)>{$TIKV.PENDING_COMMANDS.MAX.WARN} | AVERAGE | |
TiKV: Too many pending tasks | - | min(/TiDB TiKV by HTTP/tikv.worker_pending_task,5m)>{$TIKV.PENDING_TASKS.MAX.WARN} | AVERAGE | |
TiKV: has been restarted | Uptime is less than 10 minutes. | last(/TiDB TiKV by HTTP/tikv.uptime)<10m | INFO | Manual close: YES |
TiKV: Store_id {#STORE_ID}: Too many failure messages "{#TYPE}" | Indicates that the remote TiKV cannot be connected. | min(/TiDB TiKV by HTTP/tikv.messages.failure.rate[{#STORE_ID},{#TYPE}],5m)>{$TIKV.STORE.ERRORS.MAX.WARN} | WARNING | |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template or ask for help with it at ZABBIX forums.
For Zabbix version: 6.2 and higher
This template monitors the TiDB server of a TiDB cluster via Zabbix, and works without any external scripts.
Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
Template TiDB by HTTP
— collects metrics by HTTP agent from TiDB /metrics endpoint and from monitoring API.
See https://docs.pingcap.com/tidb/stable/tidb-monitoring-api.
This template was tested on:
See Zabbix template operation for basic instructions.
This template works with TiDB server of TiDB cluster. Internal service metrics are collected from TiDB /metrics endpoint and from monitoring API. See https://docs.pingcap.com/tidb/stable/tidb-monitoring-api. Don't forget to change the macros {$TIDB.URL}, {$TIDB.PORT}. Also, see the Macros section for a list of macros used to set trigger values.
No specific Zabbix configuration is required.
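Many triggers in this template compare a 5-minute aggregate against a macro, e.g. `min(/TiDB by HTTP/tidb.ddl_waiting_jobs,5m)>{$TIDB.DDL.WAITING.MAX.WARN}`. Using `min()` means the trigger fires only if every sample in the window exceeded the threshold, so brief spikes are ignored. A minimal Python sketch of that semantics (illustrative only; Zabbix evaluates trigger expressions server-side):

```python
def min_over_window(samples, threshold):
    """Mimic the Zabbix trigger condition min(/host/key,5m) > threshold:
    fire only if the *smallest* value in the window is above the threshold,
    i.e. every sample in the window breached it."""
    return bool(samples) and min(samples) > threshold

# A single spike does not fire the trigger; a sustained breach does.
spike = [0, 0, 120]       # one bad sample in the window
sustained = [60, 70, 80]  # every sample above the threshold
```

The inverse pattern, `max(...,5m)<threshold`, is used by the "Too few keep alive operations" trigger: it fires only when no sample in the window reached the minimum expected value.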
Name | Description | Default |
---|---|---|
{$TIDB.DDL.WAITING.MAX.WARN} | Maximum number of DDL tasks that are waiting | 5 |
{$TIDB.GC_ACTIONS.ERRORS.MAX.WARN} | Maximum number of GC-related operation failures | 1 |
{$TIDB.HEAP.USAGE.MAX.WARN} | Maximum heap memory used | 10G |
{$TIDB.MONITOR_KEEP_ALIVE.MAX.WARN} | Minimum number of keep alive operations | 10 |
{$TIDB.OPEN.FDS.MAX.WARN} | Maximum percentage of used file descriptors | 90 |
{$TIDB.PORT} | The port of the TiDB server metrics web endpoint | 10080 |
{$TIDB.REGION_ERROR.MAX.WARN} | Maximum number of region related errors | 50 |
{$TIDB.SCHEMA_LEASE_ERRORS.MAX.WARN} | Maximum number of schema lease errors | 0 |
{$TIDB.SCHEMA_LOAD_ERRORS.MAX.WARN} | Maximum number of load schema errors | 1 |
{$TIDB.TIME_JUMP_BACK.MAX.WARN} | Maximum number of times that the operating system rewinds every second | 1 |
{$TIDB.URL} | TiDB server URL | localhost |
There are no template links in this template.
Name | Description | Type | Key and additional info |
---|---|---|---|
GC action results discovery | Discovery of GC action result metrics. | DEPENDENT | tidb.tikvclient_gc_action.discovery Preprocessing: - JSONPATH: - JAVASCRIPT: - DISCARD_UNCHANGED_HEARTBEAT: Overrides: Failed GC-related operations trigger |
KV backoff discovery | Discovery of KV backoff specific metrics. | DEPENDENT | tidb.tikvclient_backoff.discovery Preprocessing: - JSONPATH: - JAVASCRIPT: - DISCARD_UNCHANGED_HEARTBEAT: 1h |
KV metrics discovery | Discovery of KV specific metrics. | DEPENDENT | tidb.kv_ops.discovery Preprocessing: - JSONPATH: - JAVASCRIPT: - DISCARD_UNCHANGED_HEARTBEAT: 1h |
Lock resolves discovery | Discovery of lock resolve specific metrics. | DEPENDENT | tidb.tikvclient_lock_resolver_action.discovery Preprocessing: - JSONPATH: - JAVASCRIPT: - DISCARD_UNCHANGED_HEARTBEAT: 1h |
QPS metrics discovery | Discovery of QPS specific metrics. | DEPENDENT | tidb.qps.discovery Preprocessing: - JSONPATH: - JAVASCRIPT: - DISCARD_UNCHANGED_HEARTBEAT: |
Statement metrics discovery | Discovery of statement specific metrics. | DEPENDENT | tidb.statement.discover Preprocessing: - JSONPATH: - JAVASCRIPT: - DISCARD_UNCHANGED_HEARTBEAT: |
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
TiDB node | TiDB: Status | Status of TiDB instance. | DEPENDENT | tidb.status Preprocessing: - JSONPATH: ⛔️ ON_FAIL: - DISCARD_UNCHANGED_HEARTBEAT: |
TiDB node | TiDB: Total "error" server query, rate | The number of queries per second on TiDB instance whose command execution failed. | DEPENDENT | tidb.server_query.error.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
TiDB node | TiDB: Total "ok" server query, rate | The number of queries per second on TiDB instance whose command execution succeeded. | DEPENDENT | tidb.server_query.ok.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
TiDB node | TiDB: Total server query, rate | The number of queries per second on TiDB instance. | DEPENDENT | tidb.server_query.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
TiDB node | TiDB: SQL statements, rate | The total number of SQL statements executed per second. | DEPENDENT | tidb.statement_total.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
TiDB node | TiDB: Failed Query, rate | The number of errors that occur per second when executing SQL statements (such as syntax errors and primary key conflicts). | DEPENDENT | tidb.execute_error.rate Preprocessing: - JSONPATH: ⛔️ ON_FAIL: DISCARD_VALUE -> - CHANGE_PER_SECOND |
TiDB node | TiDB: KV commands, rate | The number of executed KV commands per second. | DEPENDENT | tidb.tikvclient_txn.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
TiDB node | TiDB: PD TSO commands, rate | The number of TSO commands that TiDB obtains from PD per second. | DEPENDENT | tidb.pd_tso_cmd.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
TiDB node | TiDB: PD TSO requests, rate | The number of TSO requests that TiDB obtains from PD per second. | DEPENDENT | tidb.pd_tso_request.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
TiDB node | TiDB: TiClient region errors, rate | The number of region related errors returned by TiKV per second. | DEPENDENT | tidb.tikvclient_region_err.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
TiDB node | TiDB: Lock resolves, rate | The number of TiDB operations that resolve locks per second. When TiDB's read or write request encounters a lock, it tries to resolve the lock. | DEPENDENT | tidb.tikvclient_lock_resolver_action.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
TiDB node | TiDB: DDL waiting jobs | The number of DDL tasks that are waiting. | DEPENDENT | tidb.ddl_waiting_jobs Preprocessing: - JSONPATH: |
TiDB node | TiDB: Load schema total, rate | The statistics of the schemas that TiDB obtains from TiKV per second. | DEPENDENT | tidb.domain_load_schema.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
TiDB node | TiDB: Load schema failed, rate | The total number of failures to reload the latest schema information in TiDB per second. | DEPENDENT | tidb.domain_load_schema.failed.rate Preprocessing: - JSONPATH: ⛔️ ON_FAIL: - CHANGE_PER_SECOND |
TiDB node | TiDB: Schema lease "outdate" errors, rate | The number of schema lease errors per second. "outdate" errors mean that the schema cannot be updated, which is a more serious error and triggers an alert. | DEPENDENT | tidb.session_schema_lease_error.outdate.rate Preprocessing: - JSONPATH: ⛔️ ON_FAIL: DISCARD_VALUE -> - CHANGE_PER_SECOND |
TiDB node | TiDB: Schema lease "change" errors, rate | The number of schema lease errors per second. "change" means that the schema has changed. | DEPENDENT | tidb.session_schema_lease_error.change.rate Preprocessing: - JSONPATH: ⛔️ ON_FAIL: DISCARD_VALUE -> - CHANGE_PER_SECOND |
TiDB node | TiDB: KV backoff, rate | The number of errors returned by TiKV per second. | DEPENDENT | tidb.tikvclient_backoff.rate Preprocessing: - JSONPATH: ⛔️ ON_FAIL: DISCARD_VALUE -> - CHANGE_PER_SECOND |
TiDB node | TiDB: Keep alive, rate | The number of times that the metrics are refreshed on TiDB instance per minute. | DEPENDENT | tidb.monitor_keep_alive.rate Preprocessing: - JSONPATH: ⛔️ ON_FAIL: - SIMPLE_CHANGE |
TiDB node | TiDB: Server connections | The connection number of current TiDB instance. | DEPENDENT | tidb.tidb_server_connections Preprocessing: - JSONPATH: |
TiDB node | TiDB: Heap memory usage | Number of heap bytes that are in use. | DEPENDENT | tidb.heap_bytes Preprocessing: - JSONPATH: |
TiDB node | TiDB: RSS memory usage | Resident memory size in bytes. | DEPENDENT | tidb.rss_bytes Preprocessing: - JSONPATH: |
TiDB node | TiDB: Goroutine count | The number of Goroutines on TiDB instance. | DEPENDENT | tidb.goroutines Preprocessing: - JSONPATH: |
TiDB node | TiDB: Open file descriptors | Number of open file descriptors. | DEPENDENT | tidb.process_open_fds Preprocessing: - JSONPATH: |
TiDB node | TiDB: Open file descriptors, max | Maximum number of open file descriptors. | DEPENDENT | tidb.process_max_fds Preprocessing: - JSONPATH: |
TiDB node | TiDB: CPU | Total user and system CPU usage ratio. | DEPENDENT | tidb.cpu.util Preprocessing: - JSONPATH: - CHANGE_PER_SECOND - MULTIPLIER: |
TiDB node | TiDB: Uptime | The runtime of each TiDB instance. | DEPENDENT | tidb.uptime Preprocessing: - JSONPATH: - JAVASCRIPT: |
TiDB node | TiDB: Version | Version of the TiDB instance. | DEPENDENT | tidb.version Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
TiDB node | TiDB: Time jump back, rate | The number of times that the operating system rewinds every second. | DEPENDENT | tidb.monitor_time_jump_back.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
TiDB node | TiDB: Server critical error, rate | The number of critical errors that occurred in TiDB per second. | DEPENDENT | tidb.tidb_server_critical_error_total.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
TiDB node | TiDB: Server panic, rate | The number of panics that occurred in TiDB per second. | DEPENDENT | tidb.tidb_server_panic_total.rate Preprocessing: - JSONPATH: ⛔️ ON_FAIL: DISCARD_VALUE -> - CHANGE_PER_SECOND |
TiDB node | TiDB: Server query "OK": {#TYPE}, rate | The number of queries per second on TiDB instance whose command execution succeeded. | DEPENDENT | tidb.server_query.ok.rate[{#TYPE}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
TiDB node | TiDB: Server query "Error": {#TYPE}, rate | The number of queries per second on TiDB instance whose command execution failed. | DEPENDENT | tidb.server_query.error.rate[{#TYPE}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
TiDB node | TiDB: SQL statements: {#TYPE}, rate | The number of SQL statements executed per second. | DEPENDENT | tidb.statement.rate[{#TYPE}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
TiDB node | TiDB: KV Commands: {#TYPE}, rate | The number of executed KV commands per second. | DEPENDENT | tidb.tikvclient_txn.rate[{#TYPE}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
TiDB node | TiDB: Lock resolves: {#TYPE}, rate | The number of TiDB operations that resolve locks per second. When TiDB's read or write request encounters a lock, it tries to resolve the lock. | DEPENDENT | tidb.tikvclient_lock_resolver_action.rate[{#TYPE}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
TiDB node | TiDB: KV backoff: {#TYPE}, rate | The number of errors returned by TiKV per second. | DEPENDENT | tidb.tikvclient_backoff.rate[{#TYPE}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
TiDB node | TiDB: GC action result: {#TYPE}, rate | The number of results of GC-related operations per second. | DEPENDENT | tidb.tikvclient_gc_action.rate[{#TYPE}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
Zabbix raw items | TiDB: Get instance metrics | Get TiDB instance metrics. | HTTP_AGENT | tidb.get_metrics Preprocessing: - CHECK_NOT_SUPPORTED ⛔️ ON_FAIL: DISCARD_VALUE -> - PROMETHEUS_TO_JSON |
Zabbix raw items | TiDB: Get instance status | Get TiDB instance status info. | HTTP_AGENT | tidb.get_status Preprocessing: - CHECK_NOT_SUPPORTED ⛔️ ON_FAIL: CUSTOM_VALUE -> {"status": "0"} |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
TiDB: Instance is not responding | - | last(/TiDB by HTTP/tidb.status)=0 | AVERAGE | |
TiDB: Too many region related errors | - | min(/TiDB by HTTP/tidb.tikvclient_region_err.rate,5m)>{$TIDB.REGION_ERROR.MAX.WARN} | AVERAGE | |
TiDB: Too many DDL waiting jobs | - | min(/TiDB by HTTP/tidb.ddl_waiting_jobs,5m)>{$TIDB.DDL.WAITING.MAX.WARN} | WARNING | |
TiDB: Too many schema load errors | - | min(/TiDB by HTTP/tidb.domain_load_schema.failed.rate,5m)>{$TIDB.SCHEMA_LOAD_ERRORS.MAX.WARN} | AVERAGE | |
TiDB: Too many schema lease errors | The latest schema information is not reloaded in TiDB within one lease. | min(/TiDB by HTTP/tidb.session_schema_lease_error.outdate.rate,5m)>{$TIDB.SCHEMA_LEASE_ERRORS.MAX.WARN} | AVERAGE | |
TiDB: Too few keep alive operations | Indicates whether the TiDB process still exists. If the number of times that tidb_monitor_keep_alive_total increases is less than 10 per minute, the TiDB process might already have exited, and an alert is triggered. | max(/TiDB by HTTP/tidb.monitor_keep_alive.rate,5m)<{$TIDB.MONITOR_KEEP_ALIVE.MAX.WARN} | AVERAGE | |
TiDB: Heap memory usage is too high | - | min(/TiDB by HTTP/tidb.heap_bytes,5m)>{$TIDB.HEAP.USAGE.MAX.WARN} | WARNING | |
TiDB: Current number of open files is too high | Heavy file descriptor usage (i.e., near the process's file descriptor limit) indicates a potential file descriptor exhaustion issue. | min(/TiDB by HTTP/tidb.process_open_fds,5m)/last(/TiDB by HTTP/tidb.process_max_fds)*100>{$TIDB.OPEN.FDS.MAX.WARN} | WARNING | |
TiDB: has been restarted | Uptime is less than 10 minutes. | last(/TiDB by HTTP/tidb.uptime)<10m | INFO | Manual close: YES |
TiDB: Version has changed | TiDB version has changed. Ack to close. | last(/TiDB by HTTP/tidb.version,#1)<>last(/TiDB by HTTP/tidb.version,#2) and length(last(/TiDB by HTTP/tidb.version))>0 | INFO | Manual close: YES |
TiDB: Too many time jump backs | - | min(/TiDB by HTTP/tidb.monitor_time_jump_back.rate,5m)>{$TIDB.TIME_JUMP_BACK.MAX.WARN} | WARNING | |
TiDB: There are panicked TiDB threads | When a panic occurs, an alert is triggered. The thread is often recovered; otherwise, TiDB will frequently restart. | last(/TiDB by HTTP/tidb.tidb_server_panic_total.rate)>0 | AVERAGE | |
TiDB: Too many failed GC-related operations | - | min(/TiDB by HTTP/tidb.tikvclient_gc_action.rate[{#TYPE}],5m)>{$TIDB.GC_ACTIONS.ERRORS.MAX.WARN} | WARNING | |
For Zabbix version: 6.2 and higher
This template monitors the PD server of a TiDB cluster via Zabbix, and works without any external scripts.
Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
Template TiDB PD by HTTP
— collects metrics by HTTP agent from PD /metrics endpoint and from monitoring API.
See https://docs.pingcap.com/tidb/stable/tidb-monitoring-api.
This template was tested on:
See Zabbix template operation for basic instructions.
This template works with PD server of TiDB cluster. Internal service metrics are collected from PD /metrics endpoint and from monitoring API. See https://docs.pingcap.com/tidb/stable/tidb-monitoring-api. Don't forget to change the macros {$PD.URL}, {$PD.PORT}. Also, see the Macros section for a list of macros used to set trigger values.
No specific Zabbix configuration is required.
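The "Current storage usage is too high" trigger in this template divides `storage_size` by `storage_capacity` and compares the resulting percentage with {$PD.STORAGE_USAGE.MAX.WARN} (80 by default). A small Python sketch of that calculation (illustrative only; the values below are made-up sample numbers, not real cluster data):

```python
def storage_usage_pct(storage_size_bytes, storage_capacity_bytes):
    """Percentage of cluster space used, as computed by the
    'TiDB cluster: Current storage usage is too high' trigger expression:
    min(storage_size,5m) / last(storage_capacity) * 100."""
    return storage_size_bytes / storage_capacity_bytes * 100

# Hypothetical example: 850 GiB used out of a 1024 GiB capacity is ~83%,
# which is above the default {$PD.STORAGE_USAGE.MAX.WARN} of 80.
usage = storage_usage_pct(850, 1024)
```

In production the actual expression uses `min()` over 5 minutes of `storage_size` samples, so a momentary dip below the threshold keeps the trigger from firing.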
Name | Description | Default |
---|---|---|
{$PD.MISS_REGION.MAX.WARN} | Maximum number of missed regions | 100 |
{$PD.PORT} | The port of the PD server metrics web endpoint | 2379 |
{$PD.STORAGE_USAGE.MAX.WARN} | Maximum percentage of cluster space used | 80 |
{$PD.URL} | PD server URL | localhost |
There are no template links in this template.
Name | Description | Type | Key and additional info |
---|---|---|---|
Cluster metrics discovery | Discovery of cluster specific metrics. | DEPENDENT | pd.cluster.discovery Preprocessing: - JSONPATH: - JAVASCRIPT: - DISCARD_UNCHANGED_HEARTBEAT: |
gRPC commands discovery | Discovery of gRPC command specific metrics. | DEPENDENT | pd.grpc_command.discovery Preprocessing: - JSONPATH: - JAVASCRIPT: - DISCARD_UNCHANGED_HEARTBEAT: 1h |
Region discovery | Discovery of region specific metrics. | DEPENDENT | pd.region.discovery Preprocessing: - JSONPATH: - JAVASCRIPT: - DISCARD_UNCHANGED_HEARTBEAT: |
Region labels discovery | Discovery of region label specific metrics. | DEPENDENT | pd.region_labels.discovery Preprocessing: - JSONPATH: - JAVASCRIPT: - DISCARD_UNCHANGED_HEARTBEAT: 1h |
Region status discovery | Discovery of region status specific metrics. | DEPENDENT | pd.region_status.discovery Preprocessing: - JSONPATH: - JAVASCRIPT: - DISCARD_UNCHANGED_HEARTBEAT: 1h Overrides: Too many missed regions trigger: miss_peer_region_count - TRIGGER_PROTOTYPE LIKE "Too many missed regions" - DISCOVER; Unresponsive peers trigger: down_peer_region_count - TRIGGER_PROTOTYPE LIKE "There are unresponsive peers" - DISCOVER |
Running scheduler discovery | Discovery of scheduler specific metrics. | DEPENDENT | pd.scheduler.discovery Preprocessing: - JSONPATH: - JAVASCRIPT: - DISCARD_UNCHANGED_HEARTBEAT: |
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
PD instance | PD: Status | Status of PD instance. | DEPENDENT | pd.status Preprocessing: - JSONPATH: ⛔️ ON_FAIL: - DISCARD_UNCHANGED_HEARTBEAT: |
PD instance | PD: GRPC Commands total, rate | The rate at which gRPC commands are completed. | DEPENDENT | pd.grpc_command.rate Preprocessing: - JSONPATH: ⛔️ ON_FAIL: DISCARD_VALUE -> - CHANGE_PER_SECOND |
PD instance | PD: Version | Version of the PD instance. | DEPENDENT | pd.version Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
PD instance | PD: Uptime | The runtime of each PD instance. | DEPENDENT | pd.uptime Preprocessing: - JSONPATH: - JAVASCRIPT: |
PD instance | PD: GRPC Commands: {#GRPC_METHOD}, rate | The rate per command type at which gRPC commands are completed. | DEPENDENT | pd.grpc_command.rate[{#GRPC_METHOD}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
TiDB cluster | TiDB cluster: Offline stores | - | DEPENDENT | pd.cluster_status.store_offline[{#SINGLETON}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
TiDB cluster | TiDB cluster: Tombstone stores | The count of tombstone stores. | DEPENDENT | pd.cluster_status.store_tombstone[{#SINGLETON}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
TiDB cluster | TiDB cluster: Down stores | The count of down stores. | DEPENDENT | pd.cluster_status.store_down[{#SINGLETON}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
TiDB cluster | TiDB cluster: Lowspace stores | The count of low space stores. | DEPENDENT | pd.cluster_status.store_low_space[{#SINGLETON}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 1h |
TiDB cluster | TiDB cluster: Unhealth stores | The count of unhealthy stores. | DEPENDENT | pd.cluster_status.store_unhealth[{#SINGLETON}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
TiDB cluster | TiDB cluster: Disconnect stores | The count of disconnected stores. | DEPENDENT | pd.cluster_status.store_disconnected[{#SINGLETON}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
TiDB cluster | TiDB cluster: Normal stores | The count of healthy storage instances. | DEPENDENT | pd.cluster_status.store_up[{#SINGLETON}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
TiDB cluster | TiDB cluster: Storage capacity | The total storage capacity for this TiDB cluster. | DEPENDENT | pd.cluster_status.storage_capacity[{#SINGLETON}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
TiDB cluster | TiDB cluster: Storage size | The storage size that is currently used by the TiDB cluster. | DEPENDENT | pd.cluster_status.storage_size[{#SINGLETON}] Preprocessing: - JSONPATH: |
TiDB cluster | TiDB cluster: Number of regions | The total count of cluster Regions. | DEPENDENT | pd.cluster_status.leader_count[{#SINGLETON}] Preprocessing: - JSONPATH: |
TiDB cluster | TiDB cluster: Current peer count | The current count of all cluster peers. | DEPENDENT | pd.cluster_status.region_count[{#SINGLETON}] Preprocessing: - JSONPATH: |
TiDB cluster | TiDB cluster: Regions label: {#TYPE} | The number of Regions in different label levels. | DEPENDENT | pd.region_labels[{#TYPE}] Preprocessing: - JSONPATH: |
TiDB cluster | TiDB cluster: Regions status: {#TYPE} | The health status of Regions, indicated via the count of unusual Regions including pending peers, down peers, extra peers, offline peers, missing peers, learner peers and incorrect namespaces. | DEPENDENT | pd.region_status[{#TYPE}] Preprocessing: - JSONPATH: |
TiDB cluster | TiDB cluster: Scheduler status: {#KIND} | The current running schedulers. | DEPENDENT | pd.scheduler[{#KIND}] Preprocessing: - JSONPATH: ⛔️ ON_FAIL: |
TiDB cluster | PD: Region heartbeat: active, rate | The count of heartbeats with the ok status per second. | DEPENDENT | pd.region_heartbeat.ok.rate[{#STORE_ADDRESS}] Preprocessing: - JSONPATH: ⛔️ ON_FAIL: - CHANGE_PER_SECOND |
TiDB cluster | PD: Region heartbeat: error, rate | The count of heartbeats with the error status per second. | DEPENDENT | pd.region_heartbeat.error.rate[{#STORE_ADDRESS}] Preprocessing: - JSONPATH: ⛔️ ON_FAIL: - CHANGE_PER_SECOND |
TiDB cluster | PD: Region heartbeat: total, rate | The count of heartbeats reported to PD per instance per second. | DEPENDENT | pd.region_heartbeat.rate[{#STORE_ADDRESS}] Preprocessing: - JSONPATH: ⛔️ ON_FAIL: - CHANGE_PER_SECOND |
TiDB cluster | PD: Region schedule push: total, rate | - | DEPENDENT | pd.region_heartbeat.push.err.rate[{#STORE_ADDRESS}] Preprocessing: - JSONPATH: ⛔️ ON_FAIL: - CHANGE_PER_SECOND |
Zabbix raw items | PD: Get instance metrics | Get TiDB PD instance metrics. | HTTP_AGENT | pd.get_metrics Preprocessing: - CHECK_NOT_SUPPORTED ⛔️ ON_FAIL: DISCARD_VALUE -> - PROMETHEUS_TO_JSON |
Zabbix raw items | PD: Get instance status | Get TiDB PD instance status info. | HTTP_AGENT | pd.get_status Preprocessing: - CHECK_NOT_SUPPORTED ⛔️ ON_FAIL: CUSTOM_VALUE -> {"status": "0"} |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
PD: Instance is not responding | - |
last(/TiDB PD by HTTP/pd.status)=0 |
AVERAGE | |
PD: Version has changed | PD version has changed. Ack to close. |
last(/TiDB PD by HTTP/pd.version,#1)<>last(/TiDB PD by HTTP/pd.version,#2) and length(last(/TiDB PD by HTTP/pd.version))>0 |
INFO | Manual close: YES |
PD: has been restarted | Uptime is less than 10 minutes. |
last(/TiDB PD by HTTP/pd.uptime)<10m |
INFO | Manual close: YES |
TiDB cluster: There are offline TiKV nodes | PD has not received a TiKV heartbeat for a long time. |
last(/TiDB PD by HTTP/pd.cluster_status.store_down[{#SINGLETON}])>0 |
AVERAGE | |
TiDB cluster: There are low space TiKV nodes | Indicates that there is no sufficient space on the TiKV node. |
last(/TiDB PD by HTTP/pd.cluster_status.store_low_space[{#SINGLETON}])>0 |
AVERAGE | |
TiDB cluster: There are disconnected TiKV nodes | PD does not receive a TiKV heartbeat within 20 seconds. Normally a TiKV heartbeat comes in every 10 seconds. |
last(/TiDB PD by HTTP/pd.cluster_status.store_disconnected[{#SINGLETON}])>0 |
WARNING | |
TiDB cluster: Current storage usage is too high | Over {$PD.STORAGE_USAGE.MAX.WARN}% of the cluster space is occupied. |
min(/TiDB PD by HTTP/pd.cluster_status.storage_size[{#SINGLETON}],5m)/last(/TiDB PD by HTTP/pd.cluster_status.storage_capacity[{#SINGLETON}])*100>{$PD.STORAGE_USAGE.MAX.WARN} |
WARNING | |
TiDB cluster: Too many missed regions | The number of Region replicas is smaller than the value of max-replicas. When a TiKV machine is down and its downtime exceeds max-down-time, it usually leads to missing replicas for some Regions during a period of time. When a TiKV node is made offline, it might result in a small number of Regions with missing replicas. |
min(/TiDB PD by HTTP/pd.region_status[{#TYPE}],5m)>{$PD.MISS_REGION.MAX.WARN} |
WARNING | |
TiDB cluster: There are unresponsive peers | The number of Regions with an unresponsive peer reported by the Raft leader. |
min(/TiDB PD by HTTP/pd.region_status[{#TYPE}],5m)>0 |
WARNING |
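The storage-usage trigger above is a plain ratio check: used size divided by total capacity, compared to {$PD.STORAGE_USAGE.MAX.WARN}. A minimal sketch of the same arithmetic in shell, with made-up byte counts and an assumed threshold of 80:

```shell
# Hypothetical sample values; in the template these come from the
# pd.cluster_status.storage_size / storage_capacity items.
storage_size=1369020825600      # bytes in use (example)
storage_capacity=1610612736000  # total bytes (example)
threshold=80                    # {$PD.STORAGE_USAGE.MAX.WARN}, assumed here

# usage% = size / capacity * 100, as in the trigger expression
usage=$(awk -v s="$storage_size" -v c="$storage_capacity" \
  'BEGIN { printf "%.1f", s / c * 100 }')
echo "storage usage: ${usage}%"

# The trigger fires when usage exceeds the threshold
awk -v u="$usage" -v t="$threshold" 'BEGIN { exit !(u > t) }' \
  && echo "trigger would fire" || echo "within limits"
```

With these sample numbers the usage is 85.0%, so the comparison succeeds, matching the WARNING trigger condition.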
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template or ask for help with it at ZABBIX forums.
For Zabbix version: 6.2 and higher
The template to monitor Redis server by Zabbix that works without any external scripts.
Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
Template Redis by Zabbix agent 2
— collects metrics by polling zabbix-agent2.
This template was tested on:
See Zabbix template operation for basic instructions.
Setup and configure zabbix-agent2 compiled with the Redis monitoring plugin (ZBXNEXT-5428-4.3).
Test availability: zabbix_get -s redis-master -k redis.ping
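Beyond redis.ping, the other raw keys used by the template can be smoke-tested the same way. A small sketch that prints the zabbix_get invocations (the host name and connection URI are placeholders; adjust to your environment):

```shell
# Placeholder host and connection URI; adjust to your environment.
host="redis-master"
uri="tcp://localhost:6379"

# Print a zabbix_get smoke test for each plugin key used by the template.
for key in redis.ping redis.info redis.config; do
  printf 'zabbix_get -s %s -k %s["%s"]\n' "$host" "$key" "$uri"
done
```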
No specific Zabbix configuration is required.
Name | Description | Default |
---|---|---|
{$REDIS.CLIENTS.PRC.MAX.WARN} | Maximum percentage of connected clients |
80 |
{$REDIS.CONN.URI} | Connection string in the URI format (password is not used). This parameter overrides the value configured in the "Server" option of the configuration file (if it is set); otherwise, the plugin's default value is used: "tcp://localhost:6379" |
tcp://localhost:6379 |
{$REDIS.LLD.FILTER.DB.MATCHES} | Filter of discoverable databases |
.* |
{$REDIS.LLD.FILTER.DB.NOT_MATCHES} | Filter to exclude discovered databases |
CHANGE_IF_NEEDED |
{$REDIS.LLD.PROCESS_NAME} | Redis server process name for LLD |
redis-server |
{$REDIS.MEM.FRAG_RATIO.MAX.WARN} | Maximum memory fragmentation ratio |
1.5 |
{$REDIS.MEM.PUSED.MAX.WARN} | Maximum percentage of memory used |
90 |
{$REDIS.PROCESS_NAME} | Redis server process name |
redis-server |
{$REDIS.REPL.LAG.MAX.WARN} | Maximum replication lag in seconds |
30s |
{$REDIS.SLOWLOG.COUNT.MAX.WARN} | Maximum number of slowlog entries per second |
1 |
There are no template links in this template.
Name | Description | Type | Key and additional info |
---|---|---|---|
AOF metrics discovery | If AOF is activated, additional metrics will be added |
DEPENDENT | redis.persistence.aof.discovery Preprocessing: - JAVASCRIPT: |
Keyspace discovery | Individual keyspace metrics |
DEPENDENT | redis.keyspace.discovery Preprocessing: - JAVASCRIPT: Filter: AND - {#DB} MATCHES_REGEX - {#DB} NOT_MATCHES_REGEX |
Process metrics discovery | Collect metrics by Zabbix agent if it exists |
ZABBIX_PASSIVE | proc.num["{$REDIS.LLD.PROCESS_NAME}"] Preprocessing: - JAVASCRIPT: |
Replication metrics discovery | If the instance is the master and the slaves are connected, additional metrics are provided |
DEPENDENT | redis.replication.master.discovery Preprocessing: - JAVASCRIPT: |
Slave metrics discovery | If the instance is a replica, additional metrics are provided |
DEPENDENT | redis.replication.slave.discovery Preprocessing: - JAVASCRIPT: |
Version 4+ metrics discovery | Additional metrics for versions 4+ |
DEPENDENT | redis.metrics.v4.discovery Preprocessing: - JSONPATH: - JAVASCRIPT: |
Version 5+ metrics discovery | Additional metrics for versions 5+ |
DEPENDENT | redis.metrics.v5.discovery Preprocessing: - JSONPATH: - JAVASCRIPT: |
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
Redis | Redis: Ping | ZABBIX_PASSIVE | redis.ping["{$REDIS.CONN.URI}"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
|
Redis | Redis: Slowlog entries per second | ZABBIX_PASSIVE | redis.slowlog.count["{$REDIS.CONN.URI}"] Preprocessing: - CHANGE_PER_SECOND |
|
Redis | Redis: CPU sys | System CPU consumed by the Redis server |
DEPENDENT | redis.cpu.sys Preprocessing: - JSONPATH: |
Redis | Redis: CPU sys children | System CPU consumed by the background processes |
DEPENDENT | redis.cpu.sys_children Preprocessing: - JSONPATH: |
Redis | Redis: CPU user | User CPU consumed by the Redis server |
DEPENDENT | redis.cpu.user Preprocessing: - JSONPATH: |
Redis | Redis: CPU user children | User CPU consumed by the background processes |
DEPENDENT | redis.cpu.user_children Preprocessing: - JSONPATH: |
Redis | Redis: Blocked clients | The number of connections waiting on a blocking call |
DEPENDENT | redis.clients.blocked Preprocessing: - JSONPATH: |
Redis | Redis: Max input buffer | The biggest input buffer among current client connections |
DEPENDENT | redis.clients.max_input_buffer Preprocessing: - JAVASCRIPT: |
Redis | Redis: Max output buffer | The biggest output buffer among current client connections |
DEPENDENT | redis.clients.max_output_buffer Preprocessing: - JAVASCRIPT: |
Redis | Redis: Connected clients | The number of connected clients |
DEPENDENT | redis.clients.connected Preprocessing: - JSONPATH: |
Redis | Redis: Cluster enabled | Indicates whether Redis cluster is enabled |
DEPENDENT | redis.cluster.enabled Preprocessing: - JSONPATH: |
Redis | Redis: Memory used | Total number of bytes allocated by Redis using its allocator |
DEPENDENT | redis.memory.used_memory Preprocessing: - JSONPATH: |
Redis | Redis: Memory used Lua | Amount of memory used by the Lua engine |
DEPENDENT | redis.memory.used_memory_lua Preprocessing: - JSONPATH: |
Redis | Redis: Memory used peak | Peak memory consumed by Redis (in bytes) |
DEPENDENT | redis.memory.used_memory_peak Preprocessing: - JSONPATH: |
Redis | Redis: Memory used RSS | Number of bytes that Redis allocated as seen by the operating system |
DEPENDENT | redis.memory.used_memory_rss Preprocessing: - JSONPATH: |
Redis | Redis: Memory fragmentation ratio | This ratio is an indication of memory mapping efficiency: — A value over 1.0 indicates that memory fragmentation is very likely. Consider restarting the Redis server so the operating system can recover fragmented memory, especially with a ratio over 1.5. — A value under 1.0 indicates that Redis likely has insufficient memory available. Consider optimizing memory usage or adding more RAM. Note: If your peak memory usage is much higher than your current memory usage, the memory fragmentation ratio may be unreliable. https://redis.io/topics/memory-optimization |
DEPENDENT | redis.memory.fragmentation_ratio Preprocessing: - JSONPATH: |
Redis | Redis: AOF current rewrite time sec | Duration of the on-going AOF rewrite operation if any |
DEPENDENT | redis.persistence.aof_current_rewrite_time_sec Preprocessing: - JSONPATH: |
Redis | Redis: AOF enabled | Flag indicating AOF logging is activated |
DEPENDENT | redis.persistence.aof_enabled Preprocessing: - JSONPATH: |
Redis | Redis: AOF last bgrewrite status | Status of the last AOF rewrite operation |
DEPENDENT | redis.persistence.aof_last_bgrewrite_status Preprocessing: - JSONPATH: - BOOL_TO_DECIMAL |
Redis | Redis: AOF last rewrite time sec | Duration of the last AOF rewrite |
DEPENDENT | redis.persistence.aof_last_rewrite_time_sec Preprocessing: - JSONPATH: |
Redis | Redis: AOF last write status | Status of the last write operation to the AOF |
DEPENDENT | redis.persistence.aof_last_write_status Preprocessing: - JSONPATH: - BOOL_TO_DECIMAL |
Redis | Redis: AOF rewrite in progress | Flag indicating an AOF rewrite operation is on-going |
DEPENDENT | redis.persistence.aof_rewrite_in_progress Preprocessing: - JSONPATH: |
Redis | Redis: AOF rewrite scheduled | Flag indicating an AOF rewrite operation will be scheduled once the on-going RDB save is complete |
DEPENDENT | redis.persistence.aof_rewrite_scheduled Preprocessing: - JSONPATH: |
Redis | Redis: Dump loading | Flag indicating if the load of a dump file is on-going |
DEPENDENT | redis.persistence.loading Preprocessing: - JSONPATH: |
Redis | Redis: RDB bgsave in progress | "1" if bgsave is in progress and "0" otherwise |
DEPENDENT | redis.persistence.rdb_bgsave_in_progress Preprocessing: - JSONPATH: |
Redis | Redis: RDB changes since last save | Number of changes since the last background save |
DEPENDENT | redis.persistence.rdb_changes_since_last_save Preprocessing: - JSONPATH: |
Redis | Redis: RDB current bgsave time sec | Duration of the on-going RDB save operation if any |
DEPENDENT | redis.persistence.rdb_current_bgsave_time_sec Preprocessing: - JSONPATH: |
Redis | Redis: RDB last bgsave status | Status of the last RDB save operation |
DEPENDENT | redis.persistence.rdb_last_bgsave_status Preprocessing: - JSONPATH: - BOOL_TO_DECIMAL |
Redis | Redis: RDB last bgsave time sec | Duration of the last bg_save operation |
DEPENDENT | redis.persistence.rdb_last_bgsave_time_sec Preprocessing: - JSONPATH: |
Redis | Redis: RDB last save time | Epoch-based timestamp of last successful RDB save |
DEPENDENT | redis.persistence.rdb_last_save_time Preprocessing: - JSONPATH: |
Redis | Redis: Connected slaves | Number of connected slaves |
DEPENDENT | redis.replication.connected_slaves Preprocessing: - JSONPATH: |
Redis | Redis: Replication backlog active | Flag indicating replication backlog is active |
DEPENDENT | redis.replication.repl_backlog_active Preprocessing: - JSONPATH: |
Redis | Redis: Replication backlog first byte offset | The master offset of the replication backlog buffer |
DEPENDENT | redis.replication.repl_backlog_first_byte_offset Preprocessing: - JSONPATH: |
Redis | Redis: Replication backlog history length | Amount of data in the backlog sync buffer |
DEPENDENT | redis.replication.repl_backlog_histlen Preprocessing: - JSONPATH: |
Redis | Redis: Replication backlog size | Total size in bytes of the replication backlog buffer |
DEPENDENT | redis.replication.repl_backlog_size Preprocessing: - JSONPATH: |
Redis | Redis: Replication role | Value is "master" if the instance is replica of no one, or "slave" if the instance is a replica of some master instance. Note that a replica can be master of another replica (chained replication). |
DEPENDENT | redis.replication.role Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
Redis | Redis: Master replication offset | Replication offset reported by the master |
DEPENDENT | redis.replication.master_repl_offset Preprocessing: - JSONPATH: |
Redis | Redis: Process id | PID of the server process |
DEPENDENT | redis.server.process_id Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 1d |
Redis | Redis: Redis mode | The server's mode ("standalone", "sentinel" or "cluster") |
DEPENDENT | redis.server.redis_mode Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 1d |
Redis | Redis: Redis version | Version of the Redis server |
DEPENDENT | redis.server.redis_version Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 1d |
Redis | Redis: TCP port | TCP/IP listen port |
DEPENDENT | redis.server.tcp_port Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 1d |
Redis | Redis: Uptime | Number of seconds since Redis server start |
DEPENDENT | redis.server.uptime Preprocessing: - JSONPATH: |
Redis | Redis: Evicted keys | Number of evicted keys due to maxmemory limit |
DEPENDENT | redis.stats.evicted_keys Preprocessing: - JSONPATH: |
Redis | Redis: Expired keys | Total number of key expiration events |
DEPENDENT | redis.stats.expired_keys Preprocessing: - JSONPATH: |
Redis | Redis: Instantaneous input bytes per second | The network's read rate per second in KB/sec |
DEPENDENT | redis.stats.instantaneous_input.rate Preprocessing: - JSONPATH: - MULTIPLIER: |
Redis | Redis: Instantaneous operations per sec | Number of commands processed per second |
DEPENDENT | redis.stats.instantaneous_ops.rate Preprocessing: - JSONPATH: |
Redis | Redis: Instantaneous output bytes per second | The network's write rate per second in KB/sec |
DEPENDENT | redis.stats.instantaneous_output.rate Preprocessing: - JSONPATH: - MULTIPLIER: |
Redis | Redis: Keyspace hits | Number of successful lookups of keys in the main dictionary |
DEPENDENT | redis.stats.keyspace_hits Preprocessing: - JSONPATH: |
Redis | Redis: Keyspace misses | Number of failed lookups of keys in the main dictionary |
DEPENDENT | redis.stats.keyspace_misses Preprocessing: - JSONPATH: |
Redis | Redis: Latest fork usec | Duration of the latest fork operation in microseconds |
DEPENDENT | redis.stats.latest_fork_usec Preprocessing: - JSONPATH: - MULTIPLIER: |
Redis | Redis: Migrate cached sockets | The number of sockets open for MIGRATE purposes |
DEPENDENT | redis.stats.migrate_cached_sockets Preprocessing: - JSONPATH: |
Redis | Redis: Pubsub channels | Global number of pub/sub channels with client subscriptions |
DEPENDENT | redis.stats.pubsub_channels Preprocessing: - JSONPATH: |
Redis | Redis: Pubsub patterns | Global number of pub/sub patterns with client subscriptions |
DEPENDENT | redis.stats.pubsub_patterns Preprocessing: - JSONPATH: |
Redis | Redis: Rejected connections | Number of connections rejected because of maxclients limit |
DEPENDENT | redis.stats.rejected_connections Preprocessing: - JSONPATH: |
Redis | Redis: Sync full | The number of full resyncs with replicas |
DEPENDENT | redis.stats.sync_full Preprocessing: - JSONPATH: |
Redis | Redis: Sync partial err | The number of denied partial resync requests |
DEPENDENT | redis.stats.sync_partial_err Preprocessing: - JSONPATH: |
Redis | Redis: Sync partial ok | The number of accepted partial resync requests |
DEPENDENT | redis.stats.sync_partial_ok Preprocessing: - JSONPATH: |
Redis | Redis: Total commands processed | Total number of commands processed by the server |
DEPENDENT | redis.stats.total_commands_processed Preprocessing: - JSONPATH: |
Redis | Redis: Total connections received | Total number of connections accepted by the server |
DEPENDENT | redis.stats.total_connections_received Preprocessing: - JSONPATH: |
Redis | Redis: Total net input bytes | The total number of bytes read from the network |
DEPENDENT | redis.stats.total_net_input_bytes Preprocessing: - JSONPATH: |
Redis | Redis: Total net output bytes | The total number of bytes written to the network |
DEPENDENT | redis.stats.total_net_output_bytes Preprocessing: - JSONPATH: |
Redis | Redis: Max clients | Max number of connected clients at the same time. Once the limit is reached Redis will close all the new connections sending an error "max number of clients reached". |
DEPENDENT | redis.config.maxclients Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
Redis | DB {#DB}: Average TTL | Average TTL |
DEPENDENT | redis.db.avg_ttl["{#DB}"] Preprocessing: - JSONPATH: - MULTIPLIER: |
Redis | DB {#DB}: Expires | Number of keys with an expiration |
DEPENDENT | redis.db.expires["{#DB}"] Preprocessing: - JSONPATH: |
Redis | DB {#DB}: Keys | Total number of keys |
DEPENDENT | redis.db.keys["{#DB}"] Preprocessing: - JSONPATH: |
Redis | Redis: AOF current size{#SINGLETON} | AOF current file size |
DEPENDENT | redis.persistence.aof_current_size[{#SINGLETON}] Preprocessing: - JSONPATH: |
Redis | Redis: AOF base size{#SINGLETON} | AOF file size on latest startup or rewrite |
DEPENDENT | redis.persistence.aof_base_size[{#SINGLETON}] Preprocessing: - JSONPATH: |
Redis | Redis: AOF pending rewrite{#SINGLETON} | Flag indicating an AOF rewrite operation will be scheduled once the on-going RDB save is complete |
DEPENDENT | redis.persistence.aof_pending_rewrite[{#SINGLETON}] Preprocessing: - JSONPATH: |
Redis | Redis: AOF buffer length{#SINGLETON} | Size of the AOF buffer |
DEPENDENT | redis.persistence.aof_buffer_length[{#SINGLETON}] Preprocessing: - JSONPATH: |
Redis | Redis: AOF rewrite buffer length{#SINGLETON} | Size of the AOF rewrite buffer |
DEPENDENT | redis.persistence.aof_rewrite_buffer_length[{#SINGLETON}] Preprocessing: - JSONPATH: |
Redis | Redis: AOF pending background I/O fsync{#SINGLETON} | Number of fsync pending jobs in background I/O queue |
DEPENDENT | redis.persistence.aof_pending_bio_fsync[{#SINGLETON}] Preprocessing: - JSONPATH: |
Redis | Redis: AOF delayed fsync{#SINGLETON} | Delayed fsync counter |
DEPENDENT | redis.persistence.aof_delayed_fsync[{#SINGLETON}] Preprocessing: - JSONPATH: |
Redis | Redis: Master host{#SINGLETON} | Host or IP address of the master |
DEPENDENT | redis.replication.master_host[{#SINGLETON}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 1d |
Redis | Redis: Master port{#SINGLETON} | Master listening TCP port |
DEPENDENT | redis.replication.master_port[{#SINGLETON}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 1d |
Redis | Redis: Master link status{#SINGLETON} | Status of the link (up/down) |
DEPENDENT | redis.replication.master_link_status[{#SINGLETON}] Preprocessing: - JSONPATH: - BOOL_TO_DECIMAL |
Redis | Redis: Master last I/O seconds ago{#SINGLETON} | Number of seconds since the last interaction with master |
DEPENDENT | redis.replication.master_last_io_seconds_ago[{#SINGLETON}] Preprocessing: - JSONPATH: |
Redis | Redis: Master sync in progress{#SINGLETON} | Indicates that the master is syncing to the replica |
DEPENDENT | redis.replication.master_sync_in_progress[{#SINGLETON}] Preprocessing: - JSONPATH: |
Redis | Redis: Slave replication offset{#SINGLETON} | The replication offset of the replica instance |
DEPENDENT | redis.replication.slave_repl_offset[{#SINGLETON}] Preprocessing: - JSONPATH: |
Redis | Redis: Slave priority{#SINGLETON} | The priority of the instance as a candidate for failover |
DEPENDENT | redis.replication.slave_priority[{#SINGLETON}] Preprocessing: - JSONPATH: |
Redis | Redis: Slave read-only{#SINGLETON} | Flag indicating if the replica is read-only |
DEPENDENT | redis.replication.slave_read_only[{#SINGLETON}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
Redis | Redis slave {#SLAVE_IP}:{#SLAVE_PORT}: Replication lag in bytes | Replication lag in bytes |
DEPENDENT | redis.replication.lag_bytes["{#SLAVE_IP}:{#SLAVE_PORT}"] Preprocessing: - JAVASCRIPT: |
Redis | Redis: Number of processes running | - |
ZABBIX_PASSIVE | proc.num["{$REDIS.PROCESS_NAME}{#SINGLETON}"] |
Redis | Redis: Memory usage (rss) | Resident set size memory used by process in bytes. |
ZABBIX_PASSIVE | proc.mem["{$REDIS.PROCESS_NAME}{#SINGLETON}",,,,rss] |
Redis | Redis: Memory usage (vsize) | Virtual memory size used by process in bytes. |
ZABBIX_PASSIVE | proc.mem["{$REDIS.PROCESS_NAME}{#SINGLETON}",,,,vsize] |
Redis | Redis: CPU utilization | Process CPU utilization percentage. |
ZABBIX_PASSIVE | proc.cpu.util["{$REDIS.PROCESS_NAME}{#SINGLETON}"] |
Redis | Redis: Executable path{#SINGLETON} | The path to the server's executable |
DEPENDENT | redis.server.executable[{#SINGLETON}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
Redis | Redis: Memory used peak %{#SINGLETON} | The percentage of used_memory_peak out of used_memory |
DEPENDENT | redis.memory.used_memory_peak_perc[{#SINGLETON}] Preprocessing: - JSONPATH: - REGEX: |
Redis | Redis: Memory used overhead{#SINGLETON} | The sum in bytes of all overheads that the server allocated for managing its internal data structures |
DEPENDENT | redis.memory.used_memory_overhead[{#SINGLETON}] Preprocessing: - JSONPATH: |
Redis | Redis: Memory used startup{#SINGLETON} | Initial amount of memory consumed by Redis at startup in bytes |
DEPENDENT | redis.memory.used_memory_startup[{#SINGLETON}] Preprocessing: - JSONPATH: |
Redis | Redis: Memory used dataset{#SINGLETON} | The size in bytes of the dataset |
DEPENDENT | redis.memory.used_memory_dataset[{#SINGLETON}] Preprocessing: - JSONPATH: |
Redis | Redis: Memory used dataset %{#SINGLETON} | The percentage of used_memory_dataset out of the net memory usage (used_memory minus used_memory_startup) |
DEPENDENT | redis.memory.used_memory_dataset_perc[{#SINGLETON}] Preprocessing: - JSONPATH: - REGEX: |
Redis | Redis: Total system memory{#SINGLETON} | The total amount of memory that the Redis host has |
DEPENDENT | redis.memory.total_system_memory[{#SINGLETON}] Preprocessing: - JSONPATH: |
Redis | Redis: Max memory{#SINGLETON} | Maximum amount of memory allocated to the Redisdb system |
DEPENDENT | redis.memory.maxmemory[{#SINGLETON}] Preprocessing: - JSONPATH: |
Redis | Redis: Max memory policy{#SINGLETON} | The value of the maxmemory-policy configuration directive |
DEPENDENT | redis.memory.maxmemory_policy[{#SINGLETON}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 1d |
Redis | Redis: Active defrag running{#SINGLETON} | Flag indicating if active defragmentation is active |
DEPENDENT | redis.memory.active_defrag_running[{#SINGLETON}] Preprocessing: - JSONPATH: |
Redis | Redis: Lazyfree pending objects{#SINGLETON} | The number of objects waiting to be freed (as a result of calling UNLINK, or FLUSHDB and FLUSHALL with the ASYNC option) |
DEPENDENT | redis.memory.lazyfree_pending_objects[{#SINGLETON}] Preprocessing: - JSONPATH: |
Redis | Redis: RDB last CoW size{#SINGLETON} | The size in bytes of copy-on-write allocations during the last RDB save operation |
DEPENDENT | redis.persistence.rdb_last_cow_size[{#SINGLETON}] Preprocessing: - JSONPATH: |
Redis | Redis: AOF last CoW size{#SINGLETON} | The size in bytes of copy-on-write allocations during the last AOF rewrite operation |
DEPENDENT | redis.persistence.aof_last_cow_size[{#SINGLETON}] Preprocessing: - JSONPATH: |
Redis | Redis: Expired stale %{#SINGLETON} | - |
DEPENDENT | redis.stats.expired_stale_perc[{#SINGLETON}] Preprocessing: - JSONPATH: |
Redis | Redis: Expired time cap reached count{#SINGLETON} | - |
DEPENDENT | redis.stats.expired_time_cap_reached_count[{#SINGLETON}] Preprocessing: - JSONPATH: |
Redis | Redis: Slave expires tracked keys{#SINGLETON} | The number of keys tracked for expiry purposes (applicable only to writable replicas) |
DEPENDENT | redis.stats.slave_expires_tracked_keys[{#SINGLETON}] Preprocessing: - JSONPATH: |
Redis | Redis: Active defrag hits{#SINGLETON} | Number of value reallocations performed by the active defragmentation process |
DEPENDENT | redis.stats.active_defrag_hits[{#SINGLETON}] Preprocessing: - JSONPATH: |
Redis | Redis: Active defrag misses{#SINGLETON} | Number of aborted value reallocations started by the active defragmentation process |
DEPENDENT | redis.stats.active_defrag_misses[{#SINGLETON}] Preprocessing: - JSONPATH: |
Redis | Redis: Active defrag key hits{#SINGLETON} | Number of keys that were actively defragmented |
DEPENDENT | redis.stats.active_defrag_key_hits[{#SINGLETON}] Preprocessing: - JSONPATH: |
Redis | Redis: Active defrag key misses{#SINGLETON} | Number of keys that were skipped by the active defragmentation process |
DEPENDENT | redis.stats.active_defrag_key_misses[{#SINGLETON}] Preprocessing: - JSONPATH: |
Redis | Redis: Replication second offset{#SINGLETON} | Offset up to which replication IDs are accepted |
DEPENDENT | redis.replication.second_repl_offset[{#SINGLETON}] Preprocessing: - JSONPATH: |
Redis | Redis: Allocator active{#SINGLETON} | - |
DEPENDENT | redis.memory.allocator_active[{#SINGLETON}] Preprocessing: - JSONPATH: |
Redis | Redis: Allocator allocated{#SINGLETON} | - |
DEPENDENT | redis.memory.allocator_allocated[{#SINGLETON}] Preprocessing: - JSONPATH: |
Redis | Redis: Allocator resident{#SINGLETON} | - |
DEPENDENT | redis.memory.allocator_resident[{#SINGLETON}] Preprocessing: - JSONPATH: |
Redis | Redis: Memory used scripts{#SINGLETON} | - |
DEPENDENT | redis.memory.used_memory_scripts[{#SINGLETON}] Preprocessing: - JSONPATH: |
Redis | Redis: Memory number of cached scripts{#SINGLETON} | - |
DEPENDENT | redis.memory.number_of_cached_scripts[{#SINGLETON}] Preprocessing: - JSONPATH: |
Redis | Redis: Allocator fragmentation bytes{#SINGLETON} | - |
DEPENDENT | redis.memory.allocator_frag_bytes[{#SINGLETON}] Preprocessing: - JSONPATH: |
Redis | Redis: Allocator fragmentation ratio{#SINGLETON} | - |
DEPENDENT | redis.memory.allocator_frag_ratio[{#SINGLETON}] Preprocessing: - JSONPATH: |
Redis | Redis: Allocator RSS bytes{#SINGLETON} | - |
DEPENDENT | redis.memory.allocator_rss_bytes[{#SINGLETON}] Preprocessing: - JSONPATH: |
Redis | Redis: Allocator RSS ratio{#SINGLETON} | - |
DEPENDENT | redis.memory.allocator_rss_ratio[{#SINGLETON}] Preprocessing: - JSONPATH: |
Redis | Redis: Memory RSS overhead bytes{#SINGLETON} | - |
DEPENDENT | redis.memory.rss_overhead_bytes[{#SINGLETON}] Preprocessing: - JSONPATH: |
Redis | Redis: Memory RSS overhead ratio{#SINGLETON} | - |
DEPENDENT | redis.memory.rss_overhead_ratio[{#SINGLETON}] Preprocessing: - JSONPATH: |
Redis | Redis: Memory fragmentation bytes{#SINGLETON} | - |
DEPENDENT | redis.memory.fragmentation_bytes[{#SINGLETON}] Preprocessing: - JSONPATH: |
Redis | Redis: Memory not counted for evict{#SINGLETON} | - |
DEPENDENT | redis.memory.not_counted_for_evict[{#SINGLETON}] Preprocessing: - JSONPATH: |
Redis | Redis: Memory replication backlog{#SINGLETON} | - |
DEPENDENT | redis.memory.replication_backlog[{#SINGLETON}] Preprocessing: - JSONPATH: |
Redis | Redis: Memory clients normal{#SINGLETON} | - |
DEPENDENT | redis.memory.mem_clients_normal[{#SINGLETON}] Preprocessing: - JSONPATH: |
Redis | Redis: Memory clients slaves{#SINGLETON} | - |
DEPENDENT | redis.memory.mem_clients_slaves[{#SINGLETON}] Preprocessing: - JSONPATH: |
Redis | Redis: Memory AOF buffer{#SINGLETON} | Size of the AOF buffer |
DEPENDENT | redis.memory.mem_aof_buffer[{#SINGLETON}] Preprocessing: - JSONPATH: |
Zabbix raw items | Redis: Get info | ZABBIX_PASSIVE | redis.info["{$REDIS.CONN.URI}"] | |
Zabbix raw items | Redis: Get config | ZABBIX_PASSIVE | redis.config["{$REDIS.CONN.URI}"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Redis: Service is down | - |
last(/Redis by Zabbix agent 2/redis.ping["{$REDIS.CONN.URI}"])=0 |
AVERAGE | Manual close: YES |
Redis: Too many entries in the slowlog | - |
min(/Redis by Zabbix agent 2/redis.slowlog.count["{$REDIS.CONN.URI}"],5m)>{$REDIS.SLOWLOG.COUNT.MAX.WARN} |
INFO | |
Redis: Total number of connected clients is too high | When the number of clients reaches the value of the "maxclients" parameter, new connections will be rejected. https://redis.io/topics/clients#maximum-number-of-clients |
min(/Redis by Zabbix agent 2/redis.clients.connected,5m)/last(/Redis by Zabbix agent 2/redis.config.maxclients)*100>{$REDIS.CLIENTS.PRC.MAX.WARN} |
WARNING | |
Redis: Memory fragmentation ratio is too high | This ratio is an indication of memory mapping efficiency: — A value over 1.0 indicates that memory fragmentation is very likely. Consider restarting the Redis server so the operating system can recover fragmented memory, especially with a ratio over 1.5. — A value under 1.0 indicates that Redis likely has insufficient memory available. Consider optimizing memory usage or adding more RAM. Note: If your peak memory usage is much higher than your current memory usage, the memory fragmentation ratio may be unreliable. https://redis.io/topics/memory-optimization |
min(/Redis by Zabbix agent 2/redis.memory.fragmentation_ratio,15m)>{$REDIS.MEM.FRAG_RATIO.MAX.WARN} |
WARNING | |
Redis: Last AOF write operation failed | Detailed information about persistence: https://redis.io/topics/persistence |
last(/Redis by Zabbix agent 2/redis.persistence.aof_last_write_status)=0 |
WARNING | |
Redis: Last RDB save operation failed | Detailed information about persistence: https://redis.io/topics/persistence |
last(/Redis by Zabbix agent 2/redis.persistence.rdb_last_bgsave_status)=0 |
WARNING | |
Redis: Number of slaves has changed | Redis number of slaves has changed. Ack to close. |
last(/Redis by Zabbix agent 2/redis.replication.connected_slaves,#1)<>last(/Redis by Zabbix agent 2/redis.replication.connected_slaves,#2) |
INFO | Manual close: YES |
Redis: Replication role has changed | Redis replication role has changed. Ack to close. |
last(/Redis by Zabbix agent 2/redis.replication.role,#1)<>last(/Redis by Zabbix agent 2/redis.replication.role,#2) and length(last(/Redis by Zabbix agent 2/redis.replication.role))>0 |
WARNING | Manual close: YES |
Redis: Version has changed | Redis version has changed. Ack to close. |
last(/Redis by Zabbix agent 2/redis.server.redis_version,#1)<>last(/Redis by Zabbix agent 2/redis.server.redis_version,#2) and length(last(/Redis by Zabbix agent 2/redis.server.redis_version))>0 |
INFO | Manual close: YES |
Redis: has been restarted | Uptime is less than 10 minutes. |
last(/Redis by Zabbix agent 2/redis.server.uptime)<10m |
INFO | Manual close: YES |
Redis: Connections are rejected | The number of connections has reached the value of "maxclients". https://redis.io/topics/clients |
last(/Redis by Zabbix agent 2/redis.stats.rejected_connections)>0 |
HIGH | |
Redis: Replication lag with master is too high | - |
min(/Redis by Zabbix agent 2/redis.replication.master_last_io_seconds_ago[{#SINGLETON}],5m)>{$REDIS.REPL.LAG.MAX.WARN} |
WARNING | |
Redis: Process is not running | - |
last(/Redis by Zabbix agent 2/proc.num["{$REDIS.PROCESS_NAME}{#SINGLETON}"])=0 |
HIGH | |
Redis: Memory usage is too high | - |
last(/Redis by Zabbix agent 2/redis.memory.used_memory)/min(/Redis by Zabbix agent 2/redis.memory.maxmemory[{#SINGLETON}],5m)*100>{$REDIS.MEM.PUSED.MAX.WARN} |
WARNING | |
Redis: Failed to fetch info data | Zabbix has not received data for items for the last 30 minutes |
nodata(/Redis by Zabbix agent 2/redis.info["{$REDIS.CONN.URI}"],30m)=1 |
WARNING | Manual close: YES Depends on: - Redis: Service is down |
Redis: Configuration has changed | Redis configuration has changed. Ack to close. |
last(/Redis by Zabbix agent 2/redis.config["{$REDIS.CONN.URI}"],#1)<>last(/Redis by Zabbix agent 2/redis.config["{$REDIS.CONN.URI}"],#2) and length(last(/Redis by Zabbix agent 2/redis.config["{$REDIS.CONN.URI}"]))>0 |
INFO | Manual close: YES |
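The fragmentation trigger above compares used_memory_rss against used_memory. A minimal sketch of that ratio in shell, fed with made-up INFO-style values:

```shell
# Made-up values mimicking the memory section of Redis INFO output.
info='used_memory:1048576
used_memory_rss:1887436'

# mem_fragmentation_ratio = used_memory_rss / used_memory
ratio=$(printf '%s\n' "$info" | awk -F: '
  /^used_memory:/     { um  = $2 }
  /^used_memory_rss:/ { rss = $2 }
  END { printf "%.2f", rss / um }')
echo "mem_fragmentation_ratio: $ratio"
# The trigger compares this value against {$REDIS.MEM.FRAG_RATIO.MAX.WARN}
# (default 1.5) over a 15-minute window.
```

With these sample numbers the ratio comes out at 1.80, above the 1.5 default, so the WARNING trigger would fire once the condition holds for 15 minutes.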
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template or ask for help with it at ZABBIX forums.
For Zabbix version: 6.2 and higher. The template is developed to monitor DBMS PostgreSQL and its forks.
This template has been tested on:
See Zabbix template operation for basic instructions.
Deploy Zabbix agent 2 with the PostgreSQL plugin. Starting with Zabbix versions 6.0.10 / 6.2.4 / 6.4, PostgreSQL metrics were moved to a loadable plugin, which requires separate package installation or compilation of the plugin from sources.
Create a PostgreSQL user to monitor (<password> at your discretion) and inherit permissions from the default role pg_monitor:
CREATE USER zbx_monitor WITH PASSWORD '<PASSWORD>' INHERIT;
GRANT pg_monitor TO zbx_monitor;
Edit pg_hba.conf to allow connections from Zabbix agent:
# TYPE DATABASE USER ADDRESS METHOD
host all zbx_monitor localhost md5
For more information please read the PostgreSQL documentation https://www.postgresql.org/docs/current/auth-pg-hba-conf.html.
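Once the role and the pg_hba.conf entry are in place, membership in pg_monitor can be verified. The sketch below only prints a possible psql invocation; host and database names are placeholders:

```shell
# Placeholder connection parameters; adjust to your environment.
user="zbx_monitor"
db="postgres"

# Print a check that the monitoring role inherits pg_monitor.
echo "psql -h localhost -U ${user} -d ${db} -c \"SELECT pg_has_role('${user}', 'pg_monitor', 'member');\""
```

Running the printed command should return `t` if the grant from the setup step succeeded.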
Set the {$PG.URI} macro to the system data source name of the PostgreSQL instance, such as <protocol(host:port)>.
Set the user name and password in host macros ({$PG.USER} and {$PG.PASSWORD}) if you want to override parameters from the Zabbix agent configuration file.
No specific Zabbix configuration is required.
Name | Description | Default |
---|---|---|
{$PG.CONFLICTS.MAX.WARN} | - |
0 |
{$PG.CONNTOTALPCT.MAX.WARN} | - |
90 |
{$PG.DATABASE} | - |
postgres |
{$PG.DEADLOCKS.MAX.WARN} | - |
0 |
{$PG.LLD.FILTER.APPLICATION} | - |
(.+) |
{$PG.LLD.FILTER.DBNAME} | - |
(.+) |
{$PG.PASSWORD} | - |
postgres |
{$PG.QUERY_ETIME.MAX.WARN} | Execution time limit for count of slow queries. |
30 |
{$PG.SLOW_QUERIES.MAX.WARN} | Slow queries count threshold for a trigger. |
5 |
{$PG.URI} | - |
tcp://localhost:5432 |
{$PG.USER} | - |
postgres |
There are no template links in this template.
Name | Description | Type | Key and additional info |
---|---|---|---|
Database discovery | - |
ZABBIX_PASSIVE | pgsql.db.discovery["{$PG.URI}","{$PG.USER}","{$PG.PASSWORD}"] Filter: AND - {#DBNAME} MATCHES_REGEX |
Replication Discovery | - |
ZABBIX_PASSIVE | pgsql.replication.process.discovery["{$PG.URI}","{$PG.USER}","{$PG.PASSWORD}"] Filter: AND - {#APPLICATION_NAME} MATCHES_REGEX |
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
PostgreSQL | PostgreSQL: Custom queries | Execute custom queries from *.sql files (check the Plugins.Postgres.CustomQueriesPath option in the agent configuration) |
ZABBIX_PASSIVE | pgsql.custom.query["{$PG.URI}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}",""] |
PostgreSQL | WAL: Bytes written | WAL write in bytes |
DEPENDENT | pgsql.wal.write Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
PostgreSQL | WAL: Bytes received | WAL receive in bytes |
DEPENDENT | pgsql.wal.receive Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
PostgreSQL | WAL: Segments count | Number of WAL segments |
DEPENDENT | pgsql.wal.count Preprocessing: - JSONPATH: |
PostgreSQL | Bgwriter: Buffers allocated | Number of buffers allocated |
DEPENDENT | pgsql.bgwriter.buffers_alloc.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
PostgreSQL | Bgwriter: Buffers written directly by a backend | Number of buffers written directly by a backend |
DEPENDENT | pgsql.bgwriter.buffers_backend.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
PostgreSQL | Bgwriter: Number of bgwriter stopped | Number of times the background writer stopped a cleaning scan because it had written too many buffers |
DEPENDENT | pgsql.bgwriter.maxwritten_clean.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
PostgreSQL | Bgwriter: Times a backend executed its own fsync | Number of times a backend had to execute its own fsync call (normally the background writer handles those even when the backend does its own write) |
DEPENDENT | pgsql.bgwriter.buffers_backend_fsync.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
PostgreSQL | Checkpoint: Buffers background written | Number of buffers written by the background writer |
DEPENDENT | pgsql.bgwriter.buffers_clean.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
PostgreSQL | Checkpoint: Buffers checkpoints written | Number of buffers written during checkpoints |
DEPENDENT | pgsql.bgwriter.buffers_checkpoint.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
PostgreSQL | Checkpoint: By timeout | Number of scheduled checkpoints that have been performed |
DEPENDENT | pgsql.bgwriter.checkpoints_timed.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
PostgreSQL | Checkpoint: Requested | Number of requested checkpoints that have been performed |
DEPENDENT | pgsql.bgwriter.checkpoints_req.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
PostgreSQL | Checkpoint: Checkpoint write time | Total amount of time that has been spent in the portion of checkpoint processing where files are written to disk, in milliseconds |
DEPENDENT | pgsql.bgwriter.checkpoint_write_time.rate Preprocessing: - JSONPATH: - MULTIPLIER: - CHANGE_PER_SECOND |
PostgreSQL | Checkpoint: Checkpoint sync time | Total amount of time that has been spent in the portion of checkpoint processing where files are synchronized to disk, in milliseconds |
DEPENDENT | pgsql.bgwriter.checkpoint_sync_time.rate Preprocessing: - JSONPATH: - MULTIPLIER: - CHANGE_PER_SECOND |
PostgreSQL | Archive: Count of archived files | Collect metrics from pg_stat_archiver https://www.postgresql.org/docs/current/monitoring-stats.html#PG-STAT-ARCHIVER-VIEW |
DEPENDENT | pgsql.archive.count_archived_files Preprocessing: - JSONPATH: |
PostgreSQL | Archive: Count of attempts to archive files | Collect metrics from pg_stat_archiver https://www.postgresql.org/docs/current/monitoring-stats.html#PG-STAT-ARCHIVER-VIEW |
DEPENDENT | pgsql.archive.failed_trying_to_archive Preprocessing: - JSONPATH: |
PostgreSQL | Archive: Count of files in archive_status need to archive | - |
DEPENDENT | pgsql.archive.count_files_to_archive Preprocessing: - JSONPATH: |
PostgreSQL | Archive: Size of files need to archive | Size of files to archive |
DEPENDENT | pgsql.archive.size_files_to_archive Preprocessing: - JSONPATH: |
PostgreSQL | Dbstat: Blocks read time | Time spent reading data file blocks by backends, in milliseconds |
DEPENDENT | pgsql.dbstat.sum.blk_read_time Preprocessing: - JSONPATH: - MULTIPLIER: |
PostgreSQL | Dbstat: Blocks write time | Time spent writing data file blocks by backends, in milliseconds |
DEPENDENT | pgsql.dbstat.sum.blk_write_time Preprocessing: - JSONPATH: - MULTIPLIER: |
PostgreSQL | Dbstat: Checksum failures | Number of data page checksum failures detected (or on a shared object), or NULL if data checksums are not enabled. This metric is available in PostgreSQL 12 and later |
DEPENDENT | pgsql.dbstat.sum.checksum_failures.rate Preprocessing: - JSONPATH: - MATCHES_REGEX: ^\d*$ - CHANGE_PER_SECOND ⛔️ON_FAIL: |
PostgreSQL | Dbstat: Committed transactions | Number of transactions that have been committed |
DEPENDENT | pgsql.dbstat.sum.xact_commit.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
PostgreSQL | Dbstat: Conflicts | Number of queries canceled due to conflicts with recovery. (Conflicts occur only on standby servers; see pg_stat_database_conflicts for details.) |
DEPENDENT | pgsql.dbstat.sum.conflicts.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
PostgreSQL | Dbstat: Deadlocks | Number of deadlocks detected |
DEPENDENT | pgsql.dbstat.sum.deadlocks.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
PostgreSQL | Dbstat: Disk blocks read | Number of disk blocks read |
DEPENDENT | pgsql.dbstat.sum.blks_read.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
PostgreSQL | Dbstat: Hit blocks read | Number of times disk blocks were found already in the buffer cache |
DEPENDENT | pgsql.dbstat.sum.blks_hit.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
PostgreSQL | Dbstat: Number temp bytes | Total amount of data written to temporary files by queries |
DEPENDENT | pgsql.dbstat.sum.temp_bytes.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
PostgreSQL | Dbstat: Number temp files | Number of temporary files created by queries |
DEPENDENT | pgsql.dbstat.sum.temp_files.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
PostgreSQL | Dbstat: Rolled back transactions | Number of transactions that have been rolled back |
DEPENDENT | pgsql.dbstat.sum.xact_rollback.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
PostgreSQL | Dbstat: Rows deleted | Number of rows deleted by queries |
DEPENDENT | pgsql.dbstat.sum.tup_deleted.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
PostgreSQL | Dbstat: Rows fetched | Number of rows fetched by queries |
DEPENDENT | pgsql.dbstat.sum.tup_fetched.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
PostgreSQL | Dbstat: Rows inserted | Number of rows inserted by queries |
DEPENDENT | pgsql.dbstat.sum.tup_inserted.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
PostgreSQL | Dbstat: Rows returned | Number of rows returned by queries |
DEPENDENT | pgsql.dbstat.sum.tup_returned.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
PostgreSQL | Dbstat: Rows updated | Number of rows updated by queries |
DEPENDENT | pgsql.dbstat.sum.tup_updated.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
PostgreSQL | Dbstat: Backends connected | Number of connected backends |
DEPENDENT | pgsql.dbstat.sum.numbackends Preprocessing: - JSONPATH: |
PostgreSQL | Connections sum: Active | Total number of connections executing a query |
DEPENDENT | pgsql.connections.active Preprocessing: - JSONPATH: |
PostgreSQL | Connections sum: Fastpath function call | Total number of connections executing a fast-path function |
DEPENDENT | pgsql.connections.fastpath_function_call Preprocessing: - JSONPATH: |
PostgreSQL | Connections sum: Idle | Total number of connections waiting for a new client command |
DEPENDENT | pgsql.connections.idle Preprocessing: - JSONPATH: |
PostgreSQL | Connections sum: Idle in transaction | Total number of connections in a transaction state, but not executing a query |
DEPENDENT | pgsql.connections.idle_in_transaction Preprocessing: - JSONPATH: |
PostgreSQL | Connections sum: Prepared | Total number of prepared transactions https://www.postgresql.org/docs/current/sql-prepare-transaction.html |
DEPENDENT | pgsql.connections.prepared Preprocessing: - JSONPATH: |
PostgreSQL | Connections sum: Total | Total number of connections |
DEPENDENT | pgsql.connections.total Preprocessing: - JSONPATH: |
PostgreSQL | Connections sum: Total % | Total number of connections in percentage |
DEPENDENT | pgsql.connections.total_pct Preprocessing: - JSONPATH: |
PostgreSQL | Connections sum: Waiting | Total number of waiting connections https://www.postgresql.org/docs/current/monitoring-stats.html#WAIT-EVENT-TABLE |
DEPENDENT | pgsql.connections.waiting Preprocessing: - JSONPATH: |
PostgreSQL | Connections sum: Idle in transaction (aborted) | Total number of connections in a transaction state, but not executing a query and one of the statements in the transaction caused an error. |
DEPENDENT | pgsql.connections.idle_in_transaction_aborted Preprocessing: - JSONPATH: |
PostgreSQL | Connections sum: Disabled | Total number of disabled connections |
DEPENDENT | pgsql.connections.disabled Preprocessing: - JSONPATH: |
PostgreSQL | PostgreSQL: Age of oldest xid | Age of oldest xid. |
ZABBIX_PASSIVE | pgsql.oldest.xid["{$PG.URI}","{$PG.USER}","{$PG.PASSWORD}"] |
PostgreSQL | Autovacuum: Count of autovacuum workers | Number of autovacuum workers. |
ZABBIX_PASSIVE | pgsql.autovacuum.count["{$PG.URI}","{$PG.USER}","{$PG.PASSWORD}"] |
PostgreSQL | PostgreSQL: Cache hit | - |
CALCULATED | pgsql.cache.hit["{$PG.URI}","{$PG.USER}","{$PG.PASSWORD}"] Expression: last(//pgsql.dbstat.sum.blks_hit.rate) * 100 / (last(//pgsql.dbstat.sum.blks_hit.rate) + last(//pgsql.dbstat.sum.blks_read.rate)) |
PostgreSQL | PostgreSQL: Uptime | - |
ZABBIX_PASSIVE | pgsql.uptime["{$PG.URI}","{$PG.USER}","{$PG.PASSWORD}"] |
PostgreSQL | Replication: Lag in bytes | Replication lag with Master in bytes. |
ZABBIX_PASSIVE | pgsql.replication.lag.b["{$PG.URI}","{$PG.USER}","{$PG.PASSWORD}"] |
PostgreSQL | Replication: Lag in seconds | Replication lag with Master in seconds. |
ZABBIX_PASSIVE | pgsql.replication.lag.sec["{$PG.URI}","{$PG.USER}","{$PG.PASSWORD}"] |
PostgreSQL | Replication: Recovery role | Replication role: 1 — recovery is still in progress (standby mode), 0 — master mode. |
ZABBIX_PASSIVE | pgsql.replication.recovery_role["{$PG.URI}","{$PG.USER}","{$PG.PASSWORD}"] |
PostgreSQL | Replication: Standby count | Number of standby servers |
ZABBIX_PASSIVE | pgsql.replication.count["{$PG.URI}","{$PG.USER}","{$PG.PASSWORD}"] |
PostgreSQL | Replication: Status | Replication status: 0 — streaming is down, 1 — streaming is up, 2 — master mode |
ZABBIX_PASSIVE | pgsql.replication.status["{$PG.URI}","{$PG.USER}","{$PG.PASSWORD}"] |
PostgreSQL | PostgreSQL: Ping | - |
ZABBIX_PASSIVE | pgsql.ping["{$PG.URI}","{$PG.USER}","{$PG.PASSWORD}"] Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
PostgreSQL | Application {#APPLICATION_NAME}: Replication flush lag | - |
DEPENDENT | pgsql.replication.process.flush_lag["{#APPLICATION_NAME}"] Preprocessing: - JSONPATH: |
PostgreSQL | Application {#APPLICATION_NAME}: Replication replay lag | - |
DEPENDENT | pgsql.replication.process.replay_lag["{#APPLICATION_NAME}"] Preprocessing: - JSONPATH: |
PostgreSQL | Application {#APPLICATION_NAME}: Replication write lag | - |
DEPENDENT | pgsql.replication.process.write_lag["{#APPLICATION_NAME}"] Preprocessing: - JSONPATH: |
PostgreSQL | DB {#DBNAME}: Database age | Database age |
ZABBIX_PASSIVE | pgsql.db.age["{$PG.URI}","{$PG.USER}","{$PG.PASSWORD}","{#DBNAME}"] |
PostgreSQL | DB {#DBNAME}: Get bloating tables | Number of bloating tables |
ZABBIX_PASSIVE | pgsql.db.bloating_tables["{$PG.URI}","{$PG.USER}","{$PG.PASSWORD}","{#DBNAME}"] |
PostgreSQL | DB {#DBNAME}: Database size | Database size |
ZABBIX_PASSIVE | pgsql.db.size["{$PG.URI}","{$PG.USER}","{$PG.PASSWORD}","{#DBNAME}"] |
PostgreSQL | DB {#DBNAME}: Blocks hit per second | Total number of times disk blocks were found already in the buffer cache, so that a read was not necessary |
DEPENDENT | pgsql.dbstat.blks_hit.rate["{#DBNAME}"] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
PostgreSQL | DB {#DBNAME}: Disk blocks read per second | Total number of disk blocks read in this database |
DEPENDENT | pgsql.dbstat.blks_read.rate["{#DBNAME}"] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
PostgreSQL | DB {#DBNAME}: Detected conflicts per second | Total number of queries canceled due to conflicts with recovery in this database |
DEPENDENT | pgsql.dbstat.conflicts.rate["{#DBNAME}"] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
PostgreSQL | DB {#DBNAME}: Detected deadlocks per second | Total number of detected deadlocks in this database |
DEPENDENT | pgsql.dbstat.deadlocks.rate["{#DBNAME}"] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
PostgreSQL | DB {#DBNAME}: Temp_bytes written per second | Total amount of data written to temporary files by queries in this database |
DEPENDENT | pgsql.dbstat.temp_bytes.rate["{#DBNAME}"] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
PostgreSQL | DB {#DBNAME}: Temp_files created per second | Total number of temporary files created by queries in this database |
DEPENDENT | pgsql.dbstat.temp_files.rate["{#DBNAME}"] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
PostgreSQL | DB {#DBNAME}: Tuples deleted per second | Total number of rows deleted by queries in this database |
DEPENDENT | pgsql.dbstat.tup_deleted.rate["{#DBNAME}"] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
PostgreSQL | DB {#DBNAME}: Tuples fetched per second | Total number of rows fetched by queries in this database |
DEPENDENT | pgsql.dbstat.tup_fetched.rate["{#DBNAME}"] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
PostgreSQL | DB {#DBNAME}: Tuples inserted per second | Total number of rows inserted by queries in this database |
DEPENDENT | pgsql.dbstat.tup_inserted.rate["{#DBNAME}"] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
PostgreSQL | DB {#DBNAME}: Tuples returned per second | Number of rows returned by queries in this database |
DEPENDENT | pgsql.dbstat.tup_returned.rate["{#DBNAME}"] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
PostgreSQL | DB {#DBNAME}: Tuples updated per second | Total number of rows updated by queries in this database |
DEPENDENT | pgsql.dbstat.tup_updated.rate["{#DBNAME}"] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
PostgreSQL | DB {#DBNAME}: Commits per second | Number of transactions in this database that have been committed |
DEPENDENT | pgsql.dbstat.xact_commit.rate["{#DBNAME}"] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
PostgreSQL | DB {#DBNAME}: Rollbacks per second | Total number of transactions in this database that have been rolled back |
DEPENDENT | pgsql.dbstat.xact_rollback.rate["{#DBNAME}"] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
PostgreSQL | DB {#DBNAME}: Backends connected | Number of backends currently connected to this database |
DEPENDENT | pgsql.dbstat.numbackends["{#DBNAME}"] Preprocessing: - JSONPATH: |
PostgreSQL | DB {#DBNAME}: Checksum failures | Number of data page checksum failures detected in this database |
DEPENDENT | pgsql.dbstat.checksum_failures.rate["{#DBNAME}"] Preprocessing: - JSONPATH: - MATCHES_REGEX: ^\d*$ - CHANGE_PER_SECOND ⛔️ON_FAIL: |
PostgreSQL | DB {#DBNAME}: Disk blocks read time | Time spent reading data file blocks by backends, in milliseconds |
DEPENDENT | pgsql.dbstat.blk_read_time.rate["{#DBNAME}"] Preprocessing: - JSONPATH: - MULTIPLIER: - CHANGE_PER_SECOND |
PostgreSQL | DB {#DBNAME}: Disk blocks write time | Time spent writing data file blocks by backends, in milliseconds |
DEPENDENT | pgsql.dbstat.blk_write_time.rate["{#DBNAME}"] Preprocessing: - JSONPATH: - MULTIPLIER: - CHANGE_PER_SECOND |
PostgreSQL | DB {#DBNAME}: Num of access exclusive locks | Number of access exclusive locks for each database |
DEPENDENT | pgsql.locks.accessexclusive["{#DBNAME}"] Preprocessing: - JSONPATH: |
PostgreSQL | DB {#DBNAME}: Num of access share locks | Number of access share locks for each database |
DEPENDENT | pgsql.locks.accessshare["{#DBNAME}"] Preprocessing: - JSONPATH: |
PostgreSQL | DB {#DBNAME}: Num of exclusive locks | Number of exclusive locks for each database |
DEPENDENT | pgsql.locks.exclusive["{#DBNAME}"] Preprocessing: - JSONPATH: |
PostgreSQL | DB {#DBNAME}: Num of row exclusive locks | Number of row exclusive locks for each database |
DEPENDENT | pgsql.locks.rowexclusive["{#DBNAME}"] Preprocessing: - JSONPATH: |
PostgreSQL | DB {#DBNAME}: Num of row share locks | Number of row share locks for each database |
DEPENDENT | pgsql.locks.rowshare["{#DBNAME}"] Preprocessing: - JSONPATH: |
PostgreSQL | DB {#DBNAME}: Num of share row exclusive locks | Number of share row exclusive locks for each database |
DEPENDENT | pgsql.locks.sharerowexclusive["{#DBNAME}"] Preprocessing: - JSONPATH: |
PostgreSQL | DB {#DBNAME}: Num of share update exclusive locks | Number of share update exclusive locks for each database |
DEPENDENT | pgsql.locks.shareupdateexclusive["{#DBNAME}"] Preprocessing: - JSONPATH: |
PostgreSQL | DB {#DBNAME}: Num of share locks | Number of share locks for each database |
DEPENDENT | pgsql.locks.share["{#DBNAME}"] Preprocessing: - JSONPATH: |
PostgreSQL | DB {#DBNAME}: Num of total locks | Number of total locks for each database |
DEPENDENT | pgsql.locks.total["{#DBNAME}"] Preprocessing: - JSONPATH: |
PostgreSQL | DB {#DBNAME}: Queries max maintenance time | Max maintenance query time |
DEPENDENT | pgsql.queries.mro.time_max["{#DBNAME}"] Preprocessing: - JSONPATH: |
PostgreSQL | DB {#DBNAME}: Queries max query time | Max query time |
DEPENDENT | pgsql.queries.query.time_max["{#DBNAME}"] Preprocessing: - JSONPATH: |
PostgreSQL | DB {#DBNAME}: Queries max transaction time | Max transaction query time |
DEPENDENT | pgsql.queries.tx.time_max["{#DBNAME}"] Preprocessing: - JSONPATH: |
PostgreSQL | DB {#DBNAME}: Queries slow maintenance count | Slow maintenance query count |
DEPENDENT | pgsql.queries.mro.slow_count["{#DBNAME}"] Preprocessing: - JSONPATH: |
PostgreSQL | DB {#DBNAME}: Queries slow query count | Slow query count |
DEPENDENT | pgsql.queries.query.slow_count["{#DBNAME}"] Preprocessing: - JSONPATH: |
PostgreSQL | DB {#DBNAME}: Queries slow transaction count | Slow transaction query count |
DEPENDENT | pgsql.queries.tx.slow_count["{#DBNAME}"] Preprocessing: - JSONPATH: |
PostgreSQL | DB {#DBNAME}: Queries sum maintenance time | Sum maintenance query time |
DEPENDENT | pgsql.queries.mro.time_sum["{#DBNAME}"] Preprocessing: - JSONPATH: |
PostgreSQL | DB {#DBNAME}: Queries sum query time | Sum query time |
DEPENDENT | pgsql.queries.query.time_sum["{#DBNAME}"] Preprocessing: - JSONPATH: |
PostgreSQL | DB {#DBNAME}: Queries sum transaction time | Sum transaction query time |
DEPENDENT | pgsql.queries.tx.time_sum["{#DBNAME}"] Preprocessing: - JSONPATH: |
Zabbix raw items | PostgreSQL: Get bgwriter | Collect all metrics from pg_stat_bgwriter https://www.postgresql.org/docs/12/monitoring-stats.html#PG-STAT-BGWRITER-VIEW |
ZABBIX_PASSIVE | pgsql.bgwriter["{$PG.URI}","{$PG.USER}","{$PG.PASSWORD}"] |
Zabbix raw items | PostgreSQL: Get archive | Collect archive status metrics |
ZABBIX_PASSIVE | pgsql.archive["{$PG.URI}","{$PG.USER}","{$PG.PASSWORD}"] |
Zabbix raw items | PostgreSQL: Get dbstat | Collect all metrics from pg_stat_database per database https://www.postgresql.org/docs/current/monitoring-stats.html#PG-STAT-DATABASE-VIEW |
ZABBIX_PASSIVE | pgsql.dbstat["{$PG.URI}","{$PG.USER}","{$PG.PASSWORD}"] |
Zabbix raw items | PostgreSQL: Get dbstat sum | Collect all metrics from pg_stat_database per database https://www.postgresql.org/docs/current/monitoring-stats.html#PG-STAT-DATABASE-VIEW |
ZABBIX_PASSIVE | pgsql.dbstat.sum["{$PG.URI}","{$PG.USER}","{$PG.PASSWORD}"] |
Zabbix raw items | PostgreSQL: Get connections | Collect all metrics from pg_stat_activity https://www.postgresql.org/docs/current/monitoring-stats.html#PG-STAT-ACTIVITY-VIEW |
ZABBIX_PASSIVE | pgsql.connections["{$PG.URI}","{$PG.USER}","{$PG.PASSWORD}"] |
Zabbix raw items | PostgreSQL: Get WAL | Collect WAL metrics |
ZABBIX_PASSIVE | pgsql.wal.stat["{$PG.URI}","{$PG.USER}","{$PG.PASSWORD}"] |
Zabbix raw items | PostgreSQL: Get locks | Collect all metrics from pg_locks per database https://www.postgresql.org/docs/current/explicit-locking.html#LOCKING-TABLES |
ZABBIX_PASSIVE | pgsql.locks["{$PG.URI}","{$PG.USER}","{$PG.PASSWORD}"] |
Zabbix raw items | PostgreSQL: Get replication | Collect metrics from the pg_stat_replication view, which contains information about the WAL sender process, showing statistics about replication to that sender's connected standby server. |
ZABBIX_PASSIVE | pgsql.replication.process["{$PG.URI}","{$PG.USER}","{$PG.PASSWORD}"] |
Zabbix raw items | PostgreSQL: Get queries | Collect all metrics by query execution time |
ZABBIX_PASSIVE | pgsql.queries["{$PG.URI}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DATABASE}","{$PG.QUERY_ETIME.MAX.WARN}"] |
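The "PostgreSQL: Cache hit" calculated item above divides the hit-block rate by the total block access rate. A minimal sketch of that expression (a hypothetical helper, guarded against division by zero, which the Zabbix expression itself does not need when traffic is nonzero):

```python
def cache_hit_pct(blks_hit_rate: float, blks_read_rate: float) -> float:
    # Mirrors: last(//pgsql.dbstat.sum.blks_hit.rate) * 100
    #   / (last(//pgsql.dbstat.sum.blks_hit.rate) + last(//pgsql.dbstat.sum.blks_read.rate))
    total = blks_hit_rate + blks_read_rate
    return blks_hit_rate * 100 / total if total else 0.0

print(cache_hit_pct(990, 10))  # 99.0
```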
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Dbstat: Checksum failures detected | Data page checksum failures were detected on that DB instance. https://www.postgresql.org/docs/current/checksums.html |
last(/PostgreSQL by Zabbix agent 2/pgsql.dbstat.sum.checksum_failures.rate)>0 |
AVERAGE | |
Connections sum: Total number of connections is too high | - |
min(/PostgreSQL by Zabbix agent 2/pgsql.connections.total_pct,5m) > {$PG.CONN_TOTAL_PCT.MAX.WARN} |
AVERAGE | |
PostgreSQL: Oldest xid is too big | - |
last(/PostgreSQL by Zabbix agent 2/pgsql.oldest.xid["{$PG.URI}","{$PG.USER}","{$PG.PASSWORD}"]) > 18000000 |
AVERAGE | |
PostgreSQL: Service has been restarted | - |
last(/PostgreSQL by Zabbix agent 2/pgsql.uptime["{$PG.URI}","{$PG.USER}","{$PG.PASSWORD}"]) < 600 |
AVERAGE | |
PostgreSQL: Service is down | - |
last(/PostgreSQL by Zabbix agent 2/pgsql.ping["{$PG.URI}","{$PG.USER}","{$PG.PASSWORD}"])=0 |
HIGH | |
DB {#DBNAME}: Too many recovery conflicts | The primary and standby servers are in many ways loosely connected. Actions on the primary will have an effect on the standby. As a result, there is potential for negative interactions or conflicts between them. https://www.postgresql.org/docs/current/hot-standby.html#HOT-STANDBY-CONFLICT |
min(/PostgreSQL by Zabbix agent 2/pgsql.dbstat.conflicts.rate["{#DBNAME}"],5m) > {$PG.CONFLICTS.MAX.WARN:"{#DBNAME}"} |
AVERAGE | |
DB {#DBNAME}: Deadlock occurred | - |
min(/PostgreSQL by Zabbix agent 2/pgsql.dbstat.deadlocks.rate["{#DBNAME}"],5m) > {$PG.DEADLOCKS.MAX.WARN:"{#DBNAME}"} |
HIGH | |
DB {#DBNAME}: Checksum failures detected | Data page checksum failures were detected on that database. https://www.postgresql.org/docs/current/checksums.html |
last(/PostgreSQL by Zabbix agent 2/pgsql.dbstat.checksum_failures.rate["{#DBNAME}"])>0 |
AVERAGE | |
DB {#DBNAME}: Too many slow queries | - |
min(/PostgreSQL by Zabbix agent 2/pgsql.queries.query.slow_count["{#DBNAME}"],5m)>{$PG.SLOW_QUERIES.MAX.WARN:"{#DBNAME}"} |
WARNING |
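Several triggers above use the pattern min(/host/item,5m) > threshold. The semantics are worth noting: the trigger fires only when every sample in the 5-minute window exceeds the threshold, so short spikes are ignored. A hypothetical illustration:

```python
def trigger_fires(samples_5m: list, threshold: float) -> bool:
    """Sketch of min(...,5m) > threshold: fires only if the SMALLEST
    value in the window exceeds the threshold, i.e. the condition
    held for the whole window."""
    return bool(samples_5m) and min(samples_5m) > threshold

# e.g. {$PG.SLOW_QUERIES.MAX.WARN} = 5
print(trigger_fires([6, 7, 9], 5))   # True  (sustained slow queries)
print(trigger_fires([6, 2, 9], 5))   # False (one sample below threshold)
```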
Please report any issues with the template at https://support.zabbix.com.
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums.
For Zabbix version: 6.2 and higher
Templates to monitor PostgreSQL by Zabbix.
This template was tested on PostgreSQL versions 9.6, 10 and 11 on Linux and Windows.
See Zabbix template operation for basic instructions.
Install Zabbix agent and create a read-only zbx_monitor user with proper access to your PostgreSQL server.
For PostgreSQL version 10 and above:
CREATE USER zbx_monitor WITH PASSWORD '<PASSWORD>' INHERIT;
GRANT pg_monitor TO zbx_monitor;
For PostgreSQL version 9.6 and below:
CREATE USER zbx_monitor WITH PASSWORD '<PASSWORD>';
GRANT SELECT ON pg_stat_database TO zbx_monitor;
-- To collect WAL metrics, the user must have a `superuser` role.
ALTER USER zbx_monitor WITH SUPERUSER;
Copy postgresql/ to the Zabbix agent home directory /var/lib/zabbix/. The postgresql/ directory contains the files needed to obtain metrics from PostgreSQL.
Copy template_db_postgresql.conf to the Zabbix agent configuration directory /etc/zabbix/zabbix_agentd.d/ and restart the Zabbix agent service.
Edit pg_hba.conf to allow connections from Zabbix agent (see https://www.postgresql.org/docs/current/auth-pg-hba-conf.html).
Add rows (for example):
host all zbx_monitor 127.0.0.1/32 trust
host all zbx_monitor 0.0.0.0/0 md5
host all zbx_monitor ::0/0 md5
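PostgreSQL matches pg_hba.conf rules top to bottom and applies the first match, so the local trust rule above wins for 127.0.0.1 while remote hosts fall through to md5. A hypothetical sketch of that first-match behavior (PostgreSQL does this internally; the names here are invented):

```python
import ipaddress
from typing import Optional

# The three example rows above, as (user, network, method) tuples.
RULES = [
    ("zbx_monitor", ipaddress.ip_network("127.0.0.1/32"), "trust"),
    ("zbx_monitor", ipaddress.ip_network("0.0.0.0/0"), "md5"),
    ("zbx_monitor", ipaddress.ip_network("::0/0"), "md5"),
]

def auth_method(user: str, client_ip: str) -> Optional[str]:
    """Return the method of the first matching rule, as pg_hba does."""
    addr = ipaddress.ip_address(client_ip)
    for rule_user, net, method in RULES:
        if user == rule_user and addr.version == net.version and addr in net:
            return method
    return None  # no rule matched: connection rejected

print(auth_method("zbx_monitor", "127.0.0.1"))  # trust
print(auth_method("zbx_monitor", "10.0.0.5"))   # md5
```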
Import the template file to Zabbix and link it to the target host.
Set the {$PG.HOST}, {$PG.PORT}, {$PG.USER}, {$PG.PASSWORD} and {$PG.DB} macro values.
If PostgreSQL is installed from the PGDG repository, add the path to pg_isready to the PATH environment variable for the zabbix user.
Name | Description | Default |
---|---|---|
{$PG.CACHE_HITRATIO.MIN.WARN} | - |
90 |
{$PG.CHECKPOINTS_REQ.MAX.WARN} | - |
5 |
{$PG.CONFLICTS.MAX.WARN} | - |
0 |
{$PG.CONNIDLEIN_TRANS.MAX.WARN} | - |
5 |
{$PG.CONNTOTALPCT.MAX.WARN} | - |
90 |
{$PG.CONN_WAIT.MAX.WARN} | - |
0 |
{$PG.DB} | - |
postgres |
{$PG.DEADLOCKS.MAX.WARN} | - |
0 |
{$PG.FROZENXIDPCTSTOP.MIN.HIGH} | - |
75 |
{$PG.HOST} | - |
127.0.0.1 |
{$PG.LLD.FILTER.DBNAME} | - |
(.*) |
{$PG.LOCKS.MAX.WARN} | - |
100 |
{$PG.PASSWORD} | Please set user's password in this macro. |
`` |
{$PG.PING_TIME.MAX.WARN} | - |
1s |
{$PG.PORT} | - |
5432 |
{$PG.QUERY_ETIME.MAX.WARN} | - |
30 |
{$PG.REPL_LAG.MAX.WARN} | - |
10m |
{$PG.SLOW_QUERIES.MAX.WARN} | - |
5 |
{$PG.TRANS_ACTIVE.MAX.WARN} | - |
30s |
{$PG.TRANS_IDLE.MAX.WARN} | - |
30s |
{$PG.TRANS_WAIT.MAX.WARN} | - |
30s |
{$PG.USER} | - |
zbx_monitor |
There are no template links in this template.
Name | Description | Type | Key and additional info |
---|---|---|---|
Database discovery | - |
ZABBIX_PASSIVE | pgsql.discovery.db["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DB}"] Filter: - {#DBNAME} MATCHES_REGEX |
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
PostgreSQL | Bgwriter: Buffers allocated per second | Number of buffers allocated |
DEPENDENT | pgsql.bgwriter.buffers_alloc.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
PostgreSQL | Bgwriter: Buffers written directly by a backend per second | Number of buffers written directly by a backend |
DEPENDENT | pgsql.bgwriter.buffers_backend.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
PostgreSQL | Bgwriter: Buffers backend fsync per second | Number of times a backend had to execute its own fsync call (normally the background writer handles those even when the backend does its own write) |
DEPENDENT | pgsql.bgwriter.buffers_backend_fsync.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
PostgreSQL | Bgwriter: Buffers written during checkpoints per second | Number of buffers written during checkpoints |
DEPENDENT | pgsql.bgwriter.buffers_checkpoint.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
PostgreSQL | Bgwriter: Buffers written by the background writer per second | Number of buffers written by the background writer |
DEPENDENT | pgsql.bgwriter.buffers_clean.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
PostgreSQL | Bgwriter: Requested checkpoints per second | Number of requested checkpoints that have been performed |
DEPENDENT | pgsql.bgwriter.checkpoints_req.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
PostgreSQL | Bgwriter: Scheduled checkpoints per second | Number of scheduled checkpoints that have been performed |
DEPENDENT | pgsql.bgwriter.checkpoints_timed.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
PostgreSQL | Bgwriter: Checkpoint sync time | Total amount of time that has been spent in the portion of checkpoint processing where files are synchronized to disk |
DEPENDENT | pgsql.bgwriter.checkpoint_sync_time Preprocessing: - JSONPATH: - MULTIPLIER: - CHANGE_PER_SECOND |
PostgreSQL | Bgwriter: Checkpoint write time | Total amount of time that has been spent in the portion of checkpoint processing where files are written to disk, in milliseconds |
DEPENDENT | pgsql.bgwriter.checkpoint_write_time Preprocessing: - JSONPATH: - MULTIPLIER: - CHANGE_PER_SECOND |
PostgreSQL | Bgwriter: Max written per second | Number of times the background writer stopped a cleaning scan because it had written too many buffers |
DEPENDENT | pgsql.bgwriter.maxwritten_clean.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
PostgreSQL | Status: Cache hit ratio % | Cache hit ratio |
ZABBIX_PASSIVE | pgsql.cache.hit["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DB}"] |
PostgreSQL | Status: Config hash | PostgreSQL configuration hash |
ZABBIX_PASSIVE | pgsql.config.hash["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DB}"] Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
PostgreSQL | Connections sum: Active | Total number of connections executing a query |
DEPENDENT | pgsql.connections.sum.active Preprocessing: - JSONPATH: |
PostgreSQL | Connections sum: Idle | Total number of connections waiting for a new client command |
DEPENDENT | pgsql.connections.sum.idle Preprocessing: - JSONPATH: |
PostgreSQL | Connections sum: Idle in transaction | Total number of connections in a transaction state, but not executing a query |
DEPENDENT | pgsql.connections.sum.idle_in_transaction Preprocessing: - JSONPATH: |
PostgreSQL | Connections sum: Prepared | Total number of prepared transactions https://www.postgresql.org/docs/current/sql-prepare-transaction.html |
DEPENDENT | pgsql.connections.sum.prepared Preprocessing: - JSONPATH: |
PostgreSQL | Connections sum: Total | Total number of connections |
DEPENDENT | pgsql.connections.sum.total Preprocessing: - JSONPATH: |
PostgreSQL | Connections sum: Total % | Total number of connections in percentage |
DEPENDENT | pgsql.connections.sum.total_pct Preprocessing: - JSONPATH: |
PostgreSQL | Connections sum: Waiting | Total number of waiting connections https://www.postgresql.org/docs/current/monitoring-stats.html#WAIT-EVENT-TABLE |
DEPENDENT | pgsql.connections.sum.waiting Preprocessing: - JSONPATH: |
PostgreSQL | Status: Ping time | - |
ZABBIX_PASSIVE | pgsql.ping.time["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DB}"] Preprocessing: - REGEX: - MULTIPLIER: |
PostgreSQL | Status: Ping | - |
ZABBIX_PASSIVE | pgsql.ping["{$PG.HOST}","{$PG.PORT}"] Preprocessing: - JAVASCRIPT: - DISCARDUNCHANGEDHEARTBEAT: |
PostgreSQL | Replication: standby count | Number of standby servers |
ZABBIX_PASSIVE | pgsql.replication.count["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DB}"] |
PostgreSQL | Replication: lag in seconds | Replication lag with Master in seconds |
ZABBIX_PASSIVE | pgsql.replication.lag.sec["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DB}"] |
PostgreSQL | Replication: recovery role | Replication role: 1 — recovery is still in progress (standby mode), 0 — master mode. |
ZABBIX_PASSIVE | pgsql.replication.recovery_role["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DB}"] |
PostgreSQL | Replication: status | Replication status: 0 — streaming is down, 1 — streaming is up, 2 — master mode |
ZABBIX_PASSIVE | pgsql.replication.status["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DB}"] |
PostgreSQL | Transactions: Max active transaction time | Current max active transaction time |
DEPENDENT | pgsql.transactions.active Preprocessing: - JSONPATH: |
PostgreSQL | Transactions: Max idle transaction time | Current max idle transaction time |
DEPENDENT | pgsql.transactions.idle Preprocessing: - JSONPATH: |
PostgreSQL | Transactions: Max prepared transaction time | Current max prepared transaction time |
DEPENDENT | pgsql.transactions.prepared Preprocessing: - JSONPATH: |
PostgreSQL | Transactions: Max waiting transaction time | Current max waiting transaction time |
DEPENDENT | pgsql.transactions.waiting Preprocessing: - JSONPATH: |
PostgreSQL | Status: Uptime | - |
ZABBIX_PASSIVE | pgsql.uptime["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DB}"] |
PostgreSQL | Status: Version | PostgreSQL version |
ZABBIX_PASSIVE | pgsql.version["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DB}"] Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
PostgreSQL | WAL: Segments count | Number of WAL segments |
DEPENDENT | pgsql.wal.count Preprocessing: - JSONPATH: |
PostgreSQL | WAL: Bytes written | WAL write in bytes |
DEPENDENT | pgsql.wal.write Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
PostgreSQL | DB {#DBNAME}: Database size | Database size |
ZABBIX_PASSIVE | pgsql.db.size["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DB}","{#DBNAME}"] |
PostgreSQL | DB {#DBNAME}: Blocks hit per second | Total number of times disk blocks were found already in the buffer cache, so that a read was not necessary |
DEPENDENT | pgsql.dbstat.blks_hit.rate["{#DBNAME}"] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
PostgreSQL | DB {#DBNAME}: Disk blocks read per second | Total number of disk blocks read in this database |
DEPENDENT | pgsql.dbstat.blks_read.rate["{#DBNAME}"] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
PostgreSQL | DB {#DBNAME}: Detected conflicts per second | Total number of queries canceled due to conflicts with recovery in this database |
DEPENDENT | pgsql.dbstat.conflicts.rate["{#DBNAME}"] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
PostgreSQL | DB {#DBNAME}: Detected deadlocks per second | Total number of detected deadlocks in this database |
DEPENDENT | pgsql.dbstat.deadlocks.rate["{#DBNAME}"] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
PostgreSQL | DB {#DBNAME}: Temp_bytes written per second | Total amount of data written to temporary files by queries in this database |
DEPENDENT | pgsql.dbstat.temp_bytes.rate["{#DBNAME}"] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
PostgreSQL | DB {#DBNAME}: Temp_files created per second | Total number of temporary files created by queries in this database |
DEPENDENT | pgsql.dbstat.temp_files.rate["{#DBNAME}"] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
PostgreSQL | DB {#DBNAME}: Tuples deleted per second | Total number of rows deleted by queries in this database |
DEPENDENT | pgsql.dbstat.tup_deleted.rate["{#DBNAME}"] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
PostgreSQL | DB {#DBNAME}: Tuples fetched per second | Total number of rows fetched by queries in this database |
DEPENDENT | pgsql.dbstat.tup_fetched.rate["{#DBNAME}"] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
PostgreSQL | DB {#DBNAME}: Tuples inserted per second | Total number of rows inserted by queries in this database |
DEPENDENT | pgsql.dbstat.tup_inserted.rate["{#DBNAME}"] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
PostgreSQL | DB {#DBNAME}: Tuples returned per second | Total number of rows returned by queries in this database |
DEPENDENT | pgsql.dbstat.tup_returned.rate["{#DBNAME}"] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
PostgreSQL | DB {#DBNAME}: Tuples updated per second | Total number of rows updated by queries in this database |
DEPENDENT | pgsql.dbstat.tup_updated.rate["{#DBNAME}"] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
PostgreSQL | DB {#DBNAME}: Commits per second | Number of transactions in this database that have been committed |
DEPENDENT | pgsql.dbstat.xact_commit.rate["{#DBNAME}"] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
PostgreSQL | DB {#DBNAME}: Rollbacks per second | Total number of transactions in this database that have been rolled back |
DEPENDENT | pgsql.dbstat.xact_rollback.rate["{#DBNAME}"] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
PostgreSQL | DB {#DBNAME}: Frozen XID before autovacuum % | Preventing Transaction ID Wraparound Failures https://www.postgresql.org/docs/current/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND |
DEPENDENT | pgsql.frozenxid.prc_before_av["{#DBNAME}"] Preprocessing: - JSONPATH: |
PostgreSQL | DB {#DBNAME}: Frozen XID before stop % | Preventing Transaction ID Wraparound Failures https://www.postgresql.org/docs/current/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND |
DEPENDENT | pgsql.frozenxid.prc_before_stop["{#DBNAME}"] Preprocessing: - JSONPATH: |
PostgreSQL | DB {#DBNAME}: Locks total | Total number of locks in the database |
DEPENDENT | pgsql.locks.total["{#DBNAME}"] Preprocessing: - JSONPATH: |
PostgreSQL | DB {#DBNAME}: Queries slow maintenance count | Slow maintenance query count |
DEPENDENT | pgsql.queries.mro.slow_count["{#DBNAME}"] Preprocessing: - JSONPATH: |
PostgreSQL | DB {#DBNAME}: Queries max maintenance time | Max maintenance query time |
DEPENDENT | pgsql.queries.mro.time_max["{#DBNAME}"] Preprocessing: - JSONPATH: |
PostgreSQL | DB {#DBNAME}: Queries sum maintenance time | Sum maintenance query time |
DEPENDENT | pgsql.queries.mro.time_sum["{#DBNAME}"] Preprocessing: - JSONPATH: |
PostgreSQL | DB {#DBNAME}: Queries slow query count | Slow query count |
DEPENDENT | pgsql.queries.query.slow_count["{#DBNAME}"] Preprocessing: - JSONPATH: |
PostgreSQL | DB {#DBNAME}: Queries max query time | Max query time |
DEPENDENT | pgsql.queries.query.time_max["{#DBNAME}"] Preprocessing: - JSONPATH: |
PostgreSQL | DB {#DBNAME}: Queries sum query time | Sum query time |
DEPENDENT | pgsql.queries.query.time_sum["{#DBNAME}"] Preprocessing: - JSONPATH: |
PostgreSQL | DB {#DBNAME}: Queries slow transaction count | Slow transaction query count |
DEPENDENT | pgsql.queries.tx.slow_count["{#DBNAME}"] Preprocessing: - JSONPATH: |
PostgreSQL | DB {#DBNAME}: Queries max transaction time | Max transaction query time |
DEPENDENT | pgsql.queries.tx.time_max["{#DBNAME}"] Preprocessing: - JSONPATH: |
PostgreSQL | DB {#DBNAME}: Queries sum transaction time | Sum transaction query time |
DEPENDENT | pgsql.queries.tx.time_sum["{#DBNAME}"] Preprocessing: - JSONPATH: |
PostgreSQL | DB {#DBNAME}: Index scans per second | Number of index scans in the database |
DEPENDENT | pgsql.scans.idx.rate["{#DBNAME}"] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
PostgreSQL | DB {#DBNAME}: Sequential scans per second | Number of sequential scans in the database |
DEPENDENT | pgsql.scans.seq.rate["{#DBNAME}"] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
Zabbix raw items | PostgreSQL: Get bgwriter | Statistics about the background writer process's activity |
ZABBIX_PASSIVE | pgsql.bgwriter["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DB}"] |
Zabbix raw items | PostgreSQL: Get connections sum | Collect all metrics from pg_stat_activity https://www.postgresql.org/docs/current/monitoring-stats.html#PG-STAT-ACTIVITY-VIEW |
ZABBIX_PASSIVE | pgsql.connections.sum["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DB}"] |
Zabbix raw items | PostgreSQL: Get dbstat | Collect all metrics from pg_stat_database per database https://www.postgresql.org/docs/current/monitoring-stats.html#PG-STAT-DATABASE-VIEW |
ZABBIX_PASSIVE | pgsql.dbstat["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DB}"] |
Zabbix raw items | PostgreSQL: Get locks | Collect all metrics from pg_locks per database https://www.postgresql.org/docs/current/explicit-locking.html#LOCKING-TABLES |
ZABBIX_PASSIVE | pgsql.locks["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DB}"] |
Zabbix raw items | PostgreSQL: Get queries | Collect all metrics by query execution time |
ZABBIX_PASSIVE | pgsql.queries["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DB}","{$PG.QUERY_ETIME.MAX.WARN}"] |
Zabbix raw items | PostgreSQL: Get transactions | Collect metrics by transaction execution time |
ZABBIX_PASSIVE | pgsql.transactions["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DB}"] |
Zabbix raw items | PostgreSQL: Get WAL | Master item to collect WAL metrics |
ZABBIX_PASSIVE | pgsql.wal.stat["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DB}"] |
Zabbix raw items | DB {#DBNAME}: Get frozen XID | - |
ZABBIX_PASSIVE | pgsql.frozenxid["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{#DBNAME}"] |
Zabbix raw items | DB {#DBNAME}: Get scans | Number of scans done for table/index in the database |
ZABBIX_PASSIVE | pgsql.scans["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{#DBNAME}"] |
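The connection and transaction items above are derived from the pg_stat_activity view by the "Get connections sum" and "Get transactions" master items. A query of roughly this shape illustrates where the waiting-connections count comes from (a sketch for orientation only; the template's bundled SQL differs in detail):

```sql
-- Sketch: count backends that are currently waiting on some event.
-- wait_event is NULL when a backend is not waiting.
SELECT count(*) AS waiting
FROM pg_stat_activity
WHERE wait_event IS NOT NULL;
```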
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
PostgreSQL: Required checkpoints occur too frequently | Checkpoints are points in the sequence of transactions at which it is guaranteed that the heap and index data files have been updated with all information written before that checkpoint. At checkpoint time, all dirty data pages are flushed to disk and a special checkpoint record is written to the log file. https://www.postgresql.org/docs/current/wal-configuration.html |
last(/PostgreSQL by Zabbix agent/pgsql.bgwriter.checkpoints_req.rate) > {$PG.CHECKPOINTS_REQ.MAX.WARN} |
AVERAGE | |
PostgreSQL: Cache hit ratio too low | - |
max(/PostgreSQL by Zabbix agent/pgsql.cache.hit["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DB}"],5m) < {$PG.CACHE_HITRATIO.MIN.WARN} |
WARNING | |
PostgreSQL: Configuration has changed | - |
last(/PostgreSQL by Zabbix agent/pgsql.config.hash["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DB}"],#1)<>last(/PostgreSQL by Zabbix agent/pgsql.config.hash["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DB}"],#2) and length(last(/PostgreSQL by Zabbix agent/pgsql.config.hash["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DB}"]))>0 |
INFO | |
PostgreSQL: Total number of connections is too high | - |
min(/PostgreSQL by Zabbix agent/pgsql.connections.sum.total_pct,5m) > {$PG.CONN_TOTAL_PCT.MAX.WARN} |
AVERAGE | |
PostgreSQL: Response too long | - |
min(/PostgreSQL by Zabbix agent/pgsql.ping.time["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DB}"],5m) > {$PG.PING_TIME.MAX.WARN} |
AVERAGE | Depends on: - PostgreSQL: Service is down |
PostgreSQL: Service is down | - |
last(/PostgreSQL by Zabbix agent/pgsql.ping["{$PG.HOST}","{$PG.PORT}"]) = 0 |
HIGH | |
PostgreSQL: Streaming lag with {#MASTER} is too high | - |
min(/PostgreSQL by Zabbix agent/pgsql.replication.lag.sec["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DB}"],5m) > {$PG.REPL_LAG.MAX.WARN} |
AVERAGE | |
PostgreSQL: Replication is down | - |
max(/PostgreSQL by Zabbix agent/pgsql.replication.status["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DB}"],5m)=0 |
AVERAGE | |
PostgreSQL: Service has been restarted | PostgreSQL uptime is less than 10 minutes |
last(/PostgreSQL by Zabbix agent/pgsql.uptime["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DB}"]) < 10m |
INFO | |
PostgreSQL: Version has changed | - |
last(/PostgreSQL by Zabbix agent/pgsql.version["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DB}"],#1)<>last(/PostgreSQL by Zabbix agent/pgsql.version["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DB}"],#2) and length(last(/PostgreSQL by Zabbix agent/pgsql.version["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DB}"]))>0 |
INFO | |
DB {#DBNAME}: Too many recovery conflicts | The primary and standby servers are in many ways loosely connected. Actions on the primary will have an effect on the standby. As a result, there is potential for negative interactions or conflicts between them. https://www.postgresql.org/docs/current/hot-standby.html#HOT-STANDBY-CONFLICT |
min(/PostgreSQL by Zabbix agent/pgsql.dbstat.conflicts.rate["{#DBNAME}"],5m) > {$PG.CONFLICTS.MAX.WARN:"{#DBNAME}"} |
AVERAGE | |
DB {#DBNAME}: Deadlock occurred | - |
min(/PostgreSQL by Zabbix agent/pgsql.dbstat.deadlocks.rate["{#DBNAME}"],5m) > {$PG.DEADLOCKS.MAX.WARN:"{#DBNAME}"} |
HIGH | |
DB {#DBNAME}: VACUUM FREEZE is required to prevent wraparound | Preventing Transaction ID Wraparound Failures https://www.postgresql.org/docs/current/routine-vacuuming.html#VACUUM-FOR-WRAPAROUND |
last(/PostgreSQL by Zabbix agent/pgsql.frozenxid.prc_before_stop["{#DBNAME}"])<{$PG.FROZENXID_PCT_STOP.MIN.HIGH:"{#DBNAME}"} |
AVERAGE | |
DB {#DBNAME}: Number of locks is too high | - |
min(/PostgreSQL by Zabbix agent/pgsql.locks.total["{#DBNAME}"],5m)>{$PG.LOCKS.MAX.WARN:"{#DBNAME}"} |
WARNING | |
DB {#DBNAME}: Too many slow queries | - |
min(/PostgreSQL by Zabbix agent/pgsql.queries.query.slow_count["{#DBNAME}"],5m)>{$PG.SLOW_QUERIES.MAX.WARN:"{#DBNAME}"} |
WARNING | |
PostgreSQL: Failed to get items | Zabbix has not received data for items for the last 30 minutes |
nodata(/PostgreSQL by Zabbix agent/pgsql.bgwriter["{$PG.HOST}","{$PG.PORT}","{$PG.USER}","{$PG.PASSWORD}","{$PG.DB}"],30m) = 1 |
WARNING | Depends on: - PostgreSQL: Service is down |
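The "VACUUM FREEZE is required to prevent wraparound" trigger above fires when the remaining transaction ID headroom drops below a threshold. A query of roughly this shape shows how far each database has progressed toward a forced autovacuum (an illustrative sketch, not the template's exact SQL):

```sql
-- Sketch: per-database XID age relative to autovacuum_freeze_max_age.
SELECT datname,
       age(datfrozenxid) AS xid_age,
       round(100.0 * age(datfrozenxid)
             / current_setting('autovacuum_freeze_max_age')::int, 2)
         AS pct_of_forced_autovacuum_limit
FROM pg_database
ORDER BY xid_age DESC;
```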
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help with it on the Zabbix forums.
For Zabbix version: 6.2 and higher. The template is developed to monitor a single Oracle Database instance via ODBC.
This template was tested on:
See Zabbix template operation for basic instructions.
Create an Oracle DB user for monitoring:
CREATE USER zabbix_mon IDENTIFIED BY <PASSWORD>;
-- Grant access to the zabbix_mon user.
GRANT CONNECT, CREATE SESSION TO zabbix_mon;
GRANT SELECT_CATALOG_ROLE TO zabbix_mon;
GRANT SELECT ON v_$instance TO zabbix_mon;
GRANT SELECT ON v_$database TO zabbix_mon;
GRANT SELECT ON v_$sysmetric TO zabbix_mon;
GRANT SELECT ON v_$system_parameter TO zabbix_mon;
GRANT SELECT ON v_$session TO zabbix_mon;
GRANT SELECT ON v_$recovery_file_dest TO zabbix_mon;
GRANT SELECT ON v_$active_session_history TO zabbix_mon;
GRANT SELECT ON v_$osstat TO zabbix_mon;
GRANT SELECT ON v_$restore_point TO zabbix_mon;
GRANT SELECT ON v_$process TO zabbix_mon;
GRANT SELECT ON v_$datafile TO zabbix_mon;
GRANT SELECT ON v_$pgastat TO zabbix_mon;
GRANT SELECT ON v_$sgastat TO zabbix_mon;
GRANT SELECT ON v_$log TO zabbix_mon;
GRANT SELECT ON v_$archive_dest TO zabbix_mon;
GRANT SELECT ON v_$asm_diskgroup TO zabbix_mon;
GRANT SELECT ON sys.dba_data_files TO zabbix_mon;
GRANT SELECT ON DBA_TABLESPACES TO zabbix_mon;
GRANT SELECT ON DBA_TABLESPACE_USAGE_METRICS TO zabbix_mon;
GRANT SELECT ON DBA_USERS TO zabbix_mon;
Note! Ensure that ODBC connects to Oracle with the session parameter NLS_NUMERIC_CHARACTERS='.,'. This is important for displaying float numbers in Zabbix correctly.
Install the ODBC driver on Zabbix server or Zabbix proxy. See the Oracle documentation for instructions.
Configure Zabbix server or Zabbix proxy to use the Oracle environment:
Edit or add a new file:
/etc/sysconfig/zabbix-server # for server
/etc/sysconfig/zabbix-proxy # for proxy
Then, add:
export ORACLE_HOME=/usr/lib/oracle/19.6/client64
export PATH=$PATH:$ORACLE_HOME/bin
export LD_LIBRARY_PATH=$ORACLE_HOME/lib:/usr/lib64:/usr/lib:$ORACLE_HOME/bin
export TNS_ADMIN=$ORACLE_HOME/network/admin
Restart Zabbix server or Zabbix proxy.
Set the username and password in the host macros ({$ORACLE.USER} and {$ORACLE.PASSWORD}).
Set {$ORACLE.DRIVER} and {$ORACLE.SERVICE} in the host macros. {$ORACLE.DRIVER} is the path to the driver location in the OS. The "Service's TCP port state" item uses the {HOST.CONN} and {$ORACLE.PORT} macros to check the availability of the listener.
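With concrete values substituted for the macros, the connection string that the db.odbc items assemble looks roughly like this (the driver path, host, and service name below are illustrative examples only, not values shipped with the template):

```
Driver=/usr/lib/oracle/19.6/client64/lib/libsqora.so.19.1;DBQ=//db.example.com:1521/ORCL;
```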
No specific Zabbix configuration is required.
Name | Description | Default |
---|---|---|
{$ORACLE.ASM.USED.PCT.MAX.HIGH} | The maximum percentage of used Automatic Storage Management (ASM) disk group for a high trigger expression. |
95 |
{$ORACLE.ASM.USED.PCT.MAX.WARN} | The maximum percentage of used ASM disk group for a warning trigger expression. |
90 |
{$ORACLE.CONCURRENCY.MAX.WARN} | The maximum percentage of sessions concurrency usage for a trigger expression. |
80 |
{$ORACLE.DB.FILE.MAX.WARN} | The maximum percentage of used database files for a trigger expression. |
80 |
{$ORACLE.DBNAME.MATCHES} | This macro is used in database discovery. It can be overridden on host level or its linked template level. |
.* |
{$ORACLE.DBNAME.NOT_MATCHES} | This macro is used in database discovery. It can be overridden on host level or its linked template level. |
PDB\$SEED |
{$ORACLE.DRIVER} | The Oracle driver path. For example: |
<Put path to oracle driver here> |
{$ORACLE.EXPIRE.PASSWORD.MIN.WARN} | The number of warning days before the password expires for a trigger expression. |
7 |
{$ORACLE.PASSWORD} | The Oracle user's password. |
<Put your password here> |
{$ORACLE.PGA.USE.MAX.WARN} | Alert threshold for the maximum percentage of the Program Global Area (PGA) usage for a trigger expression. |
90 |
{$ORACLE.PORT} | Oracle DB TCP port. |
1521 |
{$ORACLE.PROCESSES.MAX.WARN} | Alert threshold for the maximum percentage of active processes for a trigger expression. |
80 |
{$ORACLE.REDO.MIN.WARN} | Alert threshold for the minimum number of REDO logs for a trigger expression. |
3 |
{$ORACLE.SERVICE} | Oracle Service Name. |
<Put oracle service name here> |
{$ORACLE.SESSION.LOCK.MAX.TIME} | The maximum duration of the session lock in seconds to count the session as a prolongedly locked query. |
600 |
{$ORACLE.SESSION.LONG.LOCK.MAX.WARN} | Alert threshold for the maximum number of the prolongedly locked sessions for a trigger expression. |
3 |
{$ORACLE.SESSIONS.LOCK.MAX.WARN} | Alert threshold for the maximum percentage of locked sessions for a trigger expression. |
20 |
{$ORACLE.SESSIONS.MAX.WARN} | Alert threshold for the maximum percentage of active sessions for a trigger expression. |
80 |
{$ORACLE.SHARED.FREE.MIN.WARN} | Alert threshold for the minimum percentage of free shared pool for a trigger expression. |
5 |
{$ORACLE.TABLESPACE.NAME.MATCHES} | This macro is used in tablespace discovery. It can be overridden on host level or its linked template level. |
.* |
{$ORACLE.TABLESPACE.NAME.NOT_MATCHES} | This macro is used in tablespace discovery. It can be overridden on host level or its linked template level. |
CHANGE_IF_NEEDED |
{$ORACLE.TBS.USED.PCT.MAX.HIGH} | High severity alert threshold for the maximum percentage of tablespace usage (used bytes/allocated bytes) for a trigger expression. |
95 |
{$ORACLE.TBS.USED.PCT.MAX.WARN} | Warning severity alert threshold for the maximum percentage of tablespace usage (used bytes/allocated bytes) for a trigger expression. |
90 |
{$ORACLE.TBS.UTIL.PCT.MAX.HIGH} | High severity alert threshold for the maximum percentage of tablespace utilization (allocated bytes/max bytes) for a trigger expression. |
90 |
{$ORACLE.TBS.UTIL.PCT.MAX.WARN} | Warning severity alert threshold for the maximum percentage of tablespace utilization (allocated bytes/max bytes) for a trigger expression. |
80 |
{$ORACLE.USER} | Oracle username. |
<Put your username here> |
There are no template links in this template.
Name | Description | Type | Key and additional info |
---|---|---|---|
Archive log discovery | Destinations of the log archive. |
ODBC | db.odbc.discovery[archivelog,,"Driver={$ORACLE.DRIVER};DBQ=//{HOST.CONN}:{$ORACLE.PORT}/{$ORACLE.SERVICE};"] |
ASM disk groups discovery | The ASM disk groups. |
ODBC | db.odbc.discovery[asm,,"Driver={$ORACLE.DRIVER};DBQ=//{HOST.CONN}:{$ORACLE.PORT}/{$ORACLE.SERVICE};"] |
Database discovery | Scanning databases in the database management system (DBMS). |
ODBC | db.odbc.discovery[dblist,,"Driver={$ORACLE.DRIVER};DBQ=//{HOST.CONN}:{$ORACLE.PORT}/{$ORACLE.SERVICE};"] Filter: AND - {#DBNAME} MATCHES_REGEX {$ORACLE.DBNAME.MATCHES} - {#DBNAME} NOT_MATCHES_REGEX {$ORACLE.DBNAME.NOT_MATCHES} |
PDB discovery | Scanning a pluggable database (PDB) in DBMS. |
ODBC | db.odbc.discovery[pdblist,,"Driver={$ORACLE.DRIVER};DBQ=//{HOST.CONN}:{$ORACLE.PORT}/{$ORACLE.SERVICE};"] Filter: AND - {#DBNAME} MATCHES_REGEX {$ORACLE.DBNAME.MATCHES} - {#DBNAME} NOT_MATCHES_REGEX {$ORACLE.DBNAME.NOT_MATCHES} |
Tablespace discovery | Scanning tablespaces in DBMS. |
ODBC | db.odbc.discovery[tbsname,,"Driver={$ORACLE.DRIVER};DBQ=//{HOST.CONN}:{$ORACLE.PORT}/{$ORACLE.SERVICE};"] Filter: AND - {#TABLESPACE} MATCHES_REGEX {$ORACLE.TABLESPACE.NAME.MATCHES} - {#TABLESPACE} NOT_MATCHES_REGEX {$ORACLE.TABLESPACE.NAME.NOT_MATCHES} |
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
Oracle | Oracle: Service's TCP port state | It checks the availability of Oracle on the TCP port. |
ZABBIX_PASSIVE | net.tcp.service[tcp,{HOST.CONN},{$ORACLE.PORT}] Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
Oracle | Oracle: Number of LISTENER processes | The number of running LISTENER processes. |
ZABBIX_PASSIVE | proc.num[,,,"tnslsnr LISTENER"] Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
Oracle | Oracle: Version | The Oracle Server version. |
DEPENDENT | oracle.version Preprocessing: - JSONPATH: - DISCARDUNCHANGEDHEARTBEAT: |
Oracle | Oracle: Uptime | The Oracle instance uptime expressed in seconds. |
DEPENDENT | oracle.uptime Preprocessing: - JSONPATH: |
Oracle | Oracle: Instance status | The status of the instance. |
DEPENDENT | oracle.instance_status Preprocessing: - JSONPATH: |
Oracle | Oracle: Archiver state | The status of automatic archiving. |
DEPENDENT | oracle.archiver_state Preprocessing: - JSONPATH: |
Oracle | Oracle: Instance name | The name of an instance. |
DEPENDENT | oracle.instance_name Preprocessing: - JSONPATH: |
Oracle | Oracle: Instance hostname | The name of the host machine. |
DEPENDENT | oracle.instance_hostname Preprocessing: - JSONPATH: |
Oracle | Oracle: Instance role | It indicates whether the instance is an active instance or an inactive secondary instance. |
DEPENDENT | oracle.instance.role Preprocessing: - JSONPATH: |
Oracle | Oracle: Sessions limit | The user and system sessions. |
DEPENDENT | oracle.session_limit Preprocessing: - JSONPATH: |
Oracle | Oracle: Datafiles limit | The maximum allowable number of datafiles. |
DEPENDENT | oracle.db_files_limit Preprocessing: - JSONPATH: |
Oracle | Oracle: Processes limit | The maximum number of user processes. |
DEPENDENT | oracle.processes_limit Preprocessing: - JSONPATH: |
Oracle | Oracle: Number of processes | - |
DEPENDENT | oracle.processes_count Preprocessing: - JSONPATH: |
Oracle | Oracle: Datafiles count | The current number of datafiles. |
DEPENDENT | oracle.db_files_count Preprocessing: - JSONPATH: |
Oracle | Oracle: Buffer cache hit ratio | The ratio of buffer cache hits ((LogRead - PhyRead)/LogRead). |
DEPENDENT | oracle.buffer_cache_hit_ratio Preprocessing: - JSONPATH: |
Oracle | Oracle: Cursor cache hit ratio | The ratio of cursor cache hits (CursorCacheHit/SoftParse). |
DEPENDENT | oracle.cursor_cache_hit_ratio Preprocessing: - JSONPATH: |
Oracle | Oracle: Library cache hit ratio | The ratio of library cache hits (Hits/Pins). |
DEPENDENT | oracle.library_cache_hit_ratio Preprocessing: - JSONPATH: |
Oracle | Oracle: Shared pool free % | Free memory of a shared pool expressed in %. |
DEPENDENT | oracle.shared_pool_free Preprocessing: - JSONPATH: |
Oracle | Oracle: Physical reads per second | Reads per second. |
DEPENDENT | oracle.physical_reads_rate Preprocessing: - JSONPATH: |
Oracle | Oracle: Physical writes per second | Writes per second. |
DEPENDENT | oracle.physical_writes_rate Preprocessing: - JSONPATH: |
Oracle | Oracle: Physical reads bytes per second | Read bytes per second. |
DEPENDENT | oracle.physical_read_bytes_rate Preprocessing: - JSONPATH: |
Oracle | Oracle: Physical writes bytes per second | Write bytes per second. |
DEPENDENT | oracle.physical_write_bytes_rate Preprocessing: - JSONPATH: |
Oracle | Oracle: Enqueue timeouts per second | Enqueue timeouts per second. |
DEPENDENT | oracle.enqueue_timeouts_rate Preprocessing: - JSONPATH: |
Oracle | Oracle: GC CR block received per second | The global cache (GC) and the consistent read (CR) block received per second. |
DEPENDENT | oracle.gc_cr_block_received_rate Preprocessing: - JSONPATH: |
Oracle | Oracle: Global cache blocks corrupted | The number of blocks that encountered corruption or checksum failure during the interconnect. |
DEPENDENT | oracle.cache_blocks_corrupt Preprocessing: - JSONPATH: |
Oracle | Oracle: Global cache blocks lost | The number of lost global cache blocks. |
DEPENDENT | oracle.cache_blocks_lost Preprocessing: - JSONPATH: |
Oracle | Oracle: Logons per second | The number of logon attempts. |
DEPENDENT | oracle.logons_rate Preprocessing: - JSONPATH: |
Oracle | Oracle: Average active sessions | The average active sessions at a point in time. The number of sessions that are either working or waiting. |
DEPENDENT | oracle.active_sessions Preprocessing: - JSONPATH: |
Oracle | Oracle: Session count | The count of sessions. |
DEPENDENT | oracle.session_count Preprocessing: - JSONPATH: |
Oracle | Oracle: Active user sessions | The number of active user sessions. |
DEPENDENT | oracle.session_active_user Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
Oracle | Oracle: Active background sessions | The number of active background sessions. |
DEPENDENT | oracle.session_active_background Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
Oracle | Oracle: Inactive user sessions | The number of inactive user sessions. |
DEPENDENT | oracle.session_inactive_user Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
Oracle | Oracle: Sessions lock rate | The percentage of locked sessions. Locks are mechanisms that prevent destructive interaction between transactions accessing the same resource — either user objects, such as tables and rows, or system objects not visible to users, such as shared data structures in memory and data dictionary rows. |
DEPENDENT | oracle.session_lock_rate Preprocessing: - JSONPATH: |
Oracle | Oracle: Sessions locked over {$ORACLE.SESSION.LOCK.MAX.TIME}s | The count of prolongedly locked sessions. (The maximum session lock duration in seconds can be changed with the {$ORACLE.SESSION.LOCK.MAX.TIME} macro; the default is 600 seconds.) |
DEPENDENT | oracle.session_long_time_locked Preprocessing: - JSONPATH: |
Oracle | Oracle: Sessions concurrency | The percentage of concurrency. Concurrency arises when different transactions request changes to the same resource. A data-modifying transaction temporarily holds an exclusive right to change the data while the remaining transactions wait for access. When access to a resource stays locked for a long time, concurrency grows (like a transaction queue), which often has an extremely negative impact on performance. A high contention value does not indicate the root cause of the problem but is a signal to search for one. |
DEPENDENT | oracle.session_concurrency_rate Preprocessing: - JSONPATH: |
Oracle | Oracle: User '{$ORACLE.USER}' expire password | The number of days before the password of Zabbix account expires. |
DEPENDENT | oracle.user_expire_password Preprocessing: - JSONPATH: |
Oracle | Oracle: Active serial sessions | The number of active serial sessions. |
DEPENDENT | oracle.active_serial_sessions Preprocessing: - JSONPATH: |
Oracle | Oracle: Active parallel sessions | The number of active parallel sessions. |
DEPENDENT | oracle.active_parallel_sessions Preprocessing: - JSONPATH: |
Oracle | Oracle: Long table scans per second | The number of long table scans per second. A table is considered 'long' if the table is not cached and if its high-water mark is greater than five blocks. |
DEPENDENT | oracle.long_table_scans_rate Preprocessing: - JSONPATH: |
Oracle | Oracle: SQL service response time | The Structured Query Language (SQL) service response time expressed in seconds. |
DEPENDENT | oracle.service_response_time Preprocessing: - JSONPATH: - MULTIPLIER: |
Oracle | Oracle: User rollbacks per second | The number of times that users manually issue the ROLLBACK statement or an error occurred during the users' transactions. |
DEPENDENT | oracle.user_rollbacks_rate Preprocessing: - JSONPATH: |
Oracle | Oracle: Total sorts per user call | The total sorts per user call. |
DEPENDENT | oracle.sorts_per_user_call Preprocessing: - JSONPATH: |
Oracle | Oracle: Rows per sort | The average number of rows per sort for all types of sorts performed. |
DEPENDENT | oracle.rows_per_sort Preprocessing: - JSONPATH: |
Oracle | Oracle: Disk sort per second | The number of sorts going to disk per second. |
DEPENDENT | oracle.disk_sorts Preprocessing: - JSONPATH: |
Oracle | Oracle: Memory sorts ratio | The percentage of sorts (from ORDER BY clauses or index building) that are done to disk vs in-memory. |
DEPENDENT | oracle.memory_sorts_ratio Preprocessing: - JSONPATH: |
Oracle | Oracle: Database wait time ratio | Wait time - the time that the server process spends waiting for available shared resources to be released by other server processes, such as latches, locks, data buffers, etc. |
DEPENDENT | oracle.database_wait_time_ratio Preprocessing: - JSONPATH: |
Oracle | Oracle: Database CPU time ratio | It is calculated by dividing the total CPU (used by the database) by the Oracle time model statistic DB time. |
DEPENDENT | oracle.database_cpu_time_ratio Preprocessing: - JSONPATH: |
Oracle | Oracle: Temp space used | Used temporary space. |
DEPENDENT | oracle.temp_space_used Preprocessing: - JSONPATH: |
Oracle | Oracle: PGA, Total inuse | It indicates how much of the Program Global Area (PGA) memory is currently consumed by work areas. This number can be used to determine how much memory is consumed by other consumers of the PGA memory (for example, PL/SQL or Java). |
DEPENDENT | oracle.total_pga_used Preprocessing: - JSONPATH: |
Oracle | Oracle: PGA, Aggregate target parameter | The current value of the PGA_AGGREGATE_TARGET initialization parameter. If this parameter is not set, then its value is 0 and automatic management of the PGA memory is disabled. |
DEPENDENT | oracle.pga_target Preprocessing: - JSONPATH: |
Oracle | Oracle: PGA, Total allocated | The current amount of the PGA memory allocated by the instance. The Oracle Database attempts to keep this number below the value of the PGA_AGGREGATE_TARGET initialization parameter. However, it is possible for the PGA allocated to exceed that value by a small percentage and for a short period of time when the work area workload is increasing very rapidly or when PGA_AGGREGATE_TARGET is set to a small value. |
DEPENDENT | oracle.total_pga_allocated Preprocessing: - JSONPATH: |
Oracle | Oracle: PGA, Total freeable | The number of bytes of the PGA memory in all processes that could be freed back to the operating system. |
DEPENDENT | oracle.total_pga_freeable Preprocessing: - JSONPATH: |
Oracle | Oracle: PGA, Global memory bound | The maximum size of work area executed in automatic mode. |
DEPENDENT | oracle.pga_global_bound Preprocessing: - JSONPATH: |
Oracle | Oracle: FRA, Space limit | The maximum amount of disk space (in bytes) that the database can use for the Fast Recovery Area (FRA). |
DEPENDENT | oracle.fra_space_limit Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
Oracle | Oracle: FRA, Used space | The amount of disk space (in bytes) used by FRA files created in the current and all the previous FRAs. |
DEPENDENT | oracle.fra_space_used Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
Oracle | Oracle: FRA, Space reclaimable | The total amount of disk space (in bytes) that can be created by deleting obsolete, redundant, and other low priority files from the FRA. |
DEPENDENT | oracle.fra_space_reclaimable Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
Oracle | Oracle: FRA, Number of files | The number of files in the FRA. |
DEPENDENT | oracle.fra_number_of_files Preprocessing: - JSONPATH: ⛔️ON_FAIL: CUSTOM_VALUE -> 0 |
Oracle | Oracle: FRA, Usable space in % | - |
DEPENDENT | oracle.fra_usable_pct Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
Oracle | Oracle: FRA, Number of restore points | - |
DEPENDENT | oracle.fra_restore_point Preprocessing: - JSONPATH: |
Oracle | Oracle: SGA, java pool | The memory is allocated from the Java pool. |
DEPENDENT | oracle.sga_java_pool Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
Oracle | Oracle: SGA, large pool | The memory is allocated from a large pool. |
DEPENDENT | oracle.sga_large_pool Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
Oracle | Oracle: SGA, shared pool | The memory is allocated from a shared pool. |
DEPENDENT | oracle.sga_shared_pool Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
Oracle | Oracle: SGA, log buffer | The number of bytes allocated for the redo log buffer. |
DEPENDENT | oracle.sga_log_buffer Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
Oracle | Oracle: SGA, fixed | The fixed System Global Area (SGA) is an internal housekeeping area. |
DEPENDENT | oracle.sga_fixed Preprocessing: - JSONPATH: ⛔️ON_FAIL: CUSTOM_VALUE -> 0 |
Oracle | Oracle: SGA, buffer cache | The size of standard block cache. |
DEPENDENT | oracle.sga_buffer_cache Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
Oracle | Oracle: Redo logs available to switch | The number of inactive/unused redo logs available for log switching. |
DEPENDENT | oracle.redo_logs_available Preprocessing: - JSONPATH: |
Oracle | Oracle Database '{#DBNAME}': Open status | 1 - 'MOUNTED'; 2 - 'READ WRITE'; 3 - 'READ ONLY'; 4 - 'READ ONLY WITH APPLY' (a physical standby database is open in real-time query mode). |
DEPENDENT | oracle.db_open_mode["{#DBNAME}"] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
Oracle | Oracle Database '{#DBNAME}': Role | The current role of the database where: 1 - 'SNAPSHOT STANDBY'; 2 - 'LOGICAL STANDBY'; 3 - 'PHYSICAL STANDBY'; 4 - 'PRIMARY'; 5 - 'FAR SYNC'. |
DEPENDENT | oracle.db_role["{#DBNAME}"] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 15m |
Oracle | Oracle Database '{#DBNAME}': Log mode | The archive log mode where: 0 - 'NOARCHIVELOG'; 1 - 'ARCHIVELOG'; 2 - 'MANUAL'. |
DEPENDENT | oracle.db_log_mode["{#DBNAME}"] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
Oracle | Oracle Database '{#DBNAME}': Force logging | It indicates whether the database is in force logging mode: 'YES' or 'NO'. |
DEPENDENT | oracle.db_force_logging["{#DBNAME}"] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
Oracle | Oracle Database '{#DBNAME}': Open status | 1 - 'MOUNTED'; 2 - 'READ WRITE'; 3 - 'READ ONLY'; 4 - 'READ ONLY WITH APPLY' (a physical standby database is open in real-time query mode). |
DEPENDENT | oracle.pdb_open_mode["{#DBNAME}"] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
Oracle | Oracle TBS '{#TABLESPACE}': Tablespace allocated, bytes | Currently allocated bytes for the tablespace (sum of the current size of datafiles). |
DEPENDENT | oracle.tbs_alloc_bytes["{#TABLESPACE}"] Preprocessing: - JSONPATH: |
Oracle | Oracle TBS '{#TABLESPACE}': Tablespace MAX size, bytes | The maximum size of the tablespace. |
DEPENDENT | oracle.tbs_max_bytes["{#TABLESPACE}"] Preprocessing: - JSONPATH: |
Oracle | Oracle TBS '{#TABLESPACE}': Tablespace used, bytes | Currently used bytes for the tablespace (current size of datafiles - the free space). |
DEPENDENT | oracle.tbs_used_bytes["{#TABLESPACE}"] Preprocessing: - JSONPATH: |
Oracle | Oracle TBS '{#TABLESPACE}': Tablespace free, bytes | Free bytes of the allocated space. |
DEPENDENT | oracle.tbs_free_bytes["{#TABLESPACE}"] Preprocessing: - JSONPATH: |
Oracle | Oracle TBS '{#TABLESPACE}': Tablespace allocated, percent | Allocated bytes/max bytes*100. |
DEPENDENT | oracle.tbs_used_pct["{#TABLESPACE}"] Preprocessing: - JSONPATH: |
Oracle | Oracle TBS '{#TABLESPACE}': Tablespace usage, percent | Used bytes/allocated bytes*100. |
DEPENDENT | oracle.tbs_used_file_pct["{#TABLESPACE}"] Preprocessing: - JSONPATH: |
Oracle | Oracle TBS '{#TABLESPACE}': Open status | The tablespace status where: 1 - 'ONLINE'; 2 - 'OFFLINE'; 3 - 'READ ONLY'. |
DEPENDENT | oracle.tbs_status["{#TABLESPACE}"] Preprocessing: - JSONPATH: |
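The two percentage items above are simple derivations from the byte counters; a minimal sketch of the arithmetic (function names and sample values are illustrative, not part of the template):

```python
def tbs_allocated_pct(alloc_bytes: int, max_bytes: int) -> float:
    """Tablespace allocated, percent: allocated bytes / max bytes * 100."""
    return alloc_bytes / max_bytes * 100

def tbs_usage_pct(used_bytes: int, alloc_bytes: int) -> float:
    """Tablespace usage, percent: used bytes / allocated bytes * 100."""
    return used_bytes / alloc_bytes * 100

GIB = 1024 ** 3
# Example: datafiles currently total 8 GiB of a 32 GiB autoextend limit, 6 GiB used
print(tbs_allocated_pct(8 * GIB, 32 * GIB))  # 25.0
print(tbs_usage_pct(6 * GIB, 8 * GIB))       # 75.0
```

Note that "allocated percent" is measured against the autoextend maximum, while "usage percent" is measured against the space already allocated; the two triggers below use separate macros for each.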
Oracle | Archivelog '{#DEST_NAME}': Error | It displays the error message. |
DEPENDENT | oracle.archivelog_error["{#DEST_NAME}"] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
Oracle | Archivelog '{#DEST_NAME}': Last sequence | It identifies the sequence number of the last archived redo log to be archived. |
DEPENDENT | oracle.archivelog_log_sequence["{#DEST_NAME}"] Preprocessing: - JSONPATH: |
Oracle | Archivelog '{#DEST_NAME}': Status | It identifies the current status of the destination where: 1 - 'VALID'; 2 - 'DEFERRED'; 3 - 'ERROR'; 0 - 'UNKNOWN'. |
DEPENDENT | oracle.archivelog_log_status["{#DEST_NAME}"] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 1h |
Oracle | ASM '{#DGNAME}': Total size | The total size of the ASM disk group. |
DEPENDENT | oracle.asm_total_size["{#DGNAME}"] Preprocessing: - JSONPATH: |
Oracle | ASM '{#DGNAME}': Free size | The free size of the ASM disk group. |
DEPENDENT | oracle.asm_free_size["{#DGNAME}"] Preprocessing: - JSONPATH: |
Oracle | ASM '{#DGNAME}': Used percent | Usage of the ASM disk group expressed in %. |
DEPENDENT | oracle.asm_used_pct["{#DGNAME}"] Preprocessing: - JSONPATH: |
Zabbix raw items | Oracle: Get instance state | The item gets the state of the current instance. |
ODBC | db.odbc.get[get_instance_state,,"Driver={$ORACLE.DRIVER};DBQ=//{HOST.CONN}:{$ORACLE.PORT}/{$ORACLE.SERVICE};"] Expression: The text is too long. Please see the template. |
Zabbix raw items | Oracle: Get system metrics | The item gets the values of the system metrics. |
ODBC | db.odbc.get[get_system_metrics,,"Driver={$ORACLE.DRIVER};DBQ=//{HOST.CONN}:{$ORACLE.PORT}/{$ORACLE.SERVICE};"] Expression: The text is too long. Please see the template. |
Zabbix raw items | Oracle Database '{#DBNAME}': Get CDB and No-CDB info | It gets the information about the container database (CDB) and non-CDB database on an instance. |
ODBC | db.odbc.get[get_cdb_{#DBNAME}_info,,"Driver={$ORACLE.DRIVER};DBQ=//{HOST.CONN}:{$ORACLE.PORT}/{$ORACLE.SERVICE};"] Preprocessing: - JSONPATH: ⛔️ON_FAIL: DISCARD_VALUE Expression: The text is too long. Please see the template. |
Zabbix raw items | Oracle Database '{#DBNAME}': Get PDB info | It gets the information about the PDB database on an instance. |
ODBC | db.odbc.get[get_pdb_{#DBNAME}_info,,"Driver={$ORACLE.DRIVER};DBQ=//{HOST.CONN}:{$ORACLE.PORT}/{$ORACLE.SERVICE};"] Preprocessing: - JSONPATH: ⛔️ON_FAIL: DISCARD_VALUE Expression: The text is too long. Please see the template. |
Zabbix raw items | Oracle TBS '{#TABLESPACE}': Get tablespaces stats | It gets the statistics of the tablespace. |
ODBC | db.odbc.get[get_tablespace_{#TABLESPACE}_stats,,"Driver={$ORACLE.DRIVER};DBQ=//{HOST.CONN}:{$ORACLE.PORT}/{$ORACLE.SERVICE};"] Preprocessing: - JSONPATH: ⛔️ON_FAIL: DISCARD_VALUE Expression: The text is too long. Please see the template. |
Zabbix raw items | Archivelog '{#DEST_NAME}': Get archive log info | It gets the archivelog statistics. |
ODBC | db.odbc.get[get_archivelog_{#DEST_NAME}_stat,,"Driver={$ORACLE.DRIVER};DBQ=//{HOST.CONN}:{$ORACLE.PORT}/{$ORACLE.SERVICE};"] Preprocessing: - JSONPATH: ⛔️ON_FAIL: Expression: The text is too long. Please see the template. |
Zabbix raw items | ASM '{#DGNAME}': Get ASM stats | It gets the ASM disk group statistics. |
ODBC | db.odbc.get[get_asm_{#DGNAME}_stat,,"Driver={$ORACLE.DRIVER};DBQ=//{HOST.CONN}:{$ORACLE.PORT}/{$ORACLE.SERVICE};"] Preprocessing: - JSONPATH: ⛔️ON_FAIL: DISCARD_VALUE Expression: The text is too long. Please see the template. |
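The DEPENDENT items in the table above parse the JSON collected by these master `db.odbc.get` items with JSONPath preprocessing; a rough sketch of that extraction step in Python (the payload shape and metric names here are assumptions for illustration, not the template's actual output):

```python
import json

# Hypothetical excerpt of a db.odbc.get master item value
payload = json.loads("""
[
  {"METRIC": "Buffer Cache Hit Ratio", "VALUE": "99.2"},
  {"METRIC": "Total PGA Allocated",    "VALUE": "268435456"}
]
""")

def extract(rows, metric):
    # Mimics a JSONPath filter such as $[?(@.METRIC=='Buffer Cache Hit Ratio')].VALUE
    for row in rows:
        if row["METRIC"] == metric:
            return float(row["VALUE"])
    raise LookupError(metric)

print(extract(payload, "Buffer Cache Hit Ratio"))  # 99.2
```

Because many dependent items share one master item, the whole metric set is fetched from Oracle in a single query and split server-side, which keeps the load on the database low.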
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Oracle: Port {$ORACLE.PORT} is unavailable | The TCP port of the Oracle Server service is currently unavailable. |
max(/Oracle by ODBC/net.tcp.service[tcp,{HOST.CONN},{$ORACLE.PORT}],#3)=0 and max(/Oracle by ODBC/proc.num[,,,"tnslsnr LISTENER"],#3)>0 |
DISASTER | |
Oracle: LISTENER process is not running | - |
max(/Oracle by ODBC/proc.num[,,,"tnslsnr LISTENER"],#3)=0 |
DISASTER | |
Oracle: Version has changed | The Oracle DB version has changed. Acknowledge to close manually. |
last(/Oracle by ODBC/oracle.version,#1)<>last(/Oracle by ODBC/oracle.version,#2) and length(last(/Oracle by ODBC/oracle.version))>0 |
INFO | Manual close: YES |
Oracle: Host has been restarted | The host uptime is less than 10 minutes. |
last(/Oracle by ODBC/oracle.uptime)<10m |
INFO | Manual close: YES |
Oracle: Failed to fetch info data | Zabbix has not received any data for the items for the last 5 minutes. The database might be unavailable for connecting. |
nodata(/Oracle by ODBC/oracle.uptime,5m)=1 |
WARNING | Depends on: - Oracle: Port {$ORACLE.PORT} is unavailable |
Oracle: Instance name has changed | The Oracle DB instance has changed. Ack to close manually. |
last(/Oracle by ODBC/oracle.instance_name,#1)<>last(/Oracle by ODBC/oracle.instance_name,#2) and length(last(/Oracle by ODBC/oracle.instance_name))>0 |
INFO | Manual close: YES |
Oracle: Instance hostname has changed | Oracle DB Instance hostname has changed. Ack to close. |
last(/Oracle by ODBC/oracle.instance_hostname,#1)<>last(/Oracle by ODBC/oracle.instance_hostname,#2) and length(last(/Oracle by ODBC/oracle.instance_hostname))>0 |
INFO | Manual close: YES |
Oracle: Too many active processes | Active processes are using more than {$ORACLE.PROCESSES.MAX.WARN}% of the available number of processes. |
min(/Oracle by ODBC/oracle.processes_count,5m) * 100 / last(/Oracle by ODBC/oracle.processes_limit) > {$ORACLE.PROCESSES.MAX.WARN} |
WARNING | |
Oracle: Too many database files | The number of datafiles is higher than {$ORACLE.DB.FILE.MAX.WARN}% of the available datafiles limit. |
min(/Oracle by ODBC/oracle.db_files_count,5m) * 100 / last(/Oracle by ODBC/oracle.db_files_limit) > {$ORACLE.DB.FILE.MAX.WARN} |
WARNING | |
Oracle: Shared pool free is too low | The free memory percent of the shared pool has been less than {$ORACLE.SHARED.FREE.MIN.WARN}% for the last 5 minutes. |
max(/Oracle by ODBC/oracle.shared_pool_free,5m)<{$ORACLE.SHARED.FREE.MIN.WARN} |
WARNING | |
Oracle: Too many active sessions | Active sessions are using more than {$ORACLE.SESSIONS.MAX.WARN}% of the available sessions. |
min(/Oracle by ODBC/oracle.session_count,5m) * 100 / last(/Oracle by ODBC/oracle.session_limit) > {$ORACLE.SESSIONS.MAX.WARN} |
WARNING | |
Oracle: Too many locked sessions | The number of locked sessions exceeds {$ORACLE.SESSIONS.LOCK.MAX.WARN}% of the running sessions. |
min(/Oracle by ODBC/oracle.session_lock_rate,5m) > {$ORACLE.SESSIONS.LOCK.MAX.WARN} |
WARNING | |
Oracle: Too many sessions locked | The number of sessions locked for longer than {$ORACLE.SESSION.LOCK.MAX.TIME} seconds is too high. Long-term locks can negatively affect database performance. If they are detected, first find the queries that are heaviest from the database's point of view and then analyze possible resource leaks. |
min(/Oracle by ODBC/oracle.session_long_time_locked,5m) > {$ORACLE.SESSION.LONG.LOCK.MAX.WARN} |
WARNING | |
Oracle: Too high database concurrency | The concurrency rate exceeds {$ORACLE.CONCURRENCY.MAX.WARN}%. A high contention value does not indicate the root cause of the problem, but it is a signal to search for it. In the case of high contention, analyze resource consumption: find the "heaviest" queries made in the database and, possibly, trace sessions. This will help to determine the root cause and possible optimization points, both in the database configuration and in the query logic of the application itself. |
min(/Oracle by ODBC/oracle.session_concurrency_rate,5m) > {$ORACLE.CONCURRENCY.MAX.WARN} |
WARNING | |
Oracle: Zabbix account will expire soon | The password for the Zabbix user in the database will expire soon. |
last(/Oracle by ODBC/oracle.user_expire_password) < {$ORACLE.EXPIRE.PASSWORD.MIN.WARN} |
WARNING | |
Oracle: Total PGA inuse is too high | The total PGA in use is more than {$ORACLE.PGA.USE.MAX.WARN}% of PGA_AGGREGATE_TARGET. |
min(/Oracle by ODBC/oracle.total_pga_used,5m) * 100 / last(/Oracle by ODBC/oracle.pga_target) > {$ORACLE.PGA.USE.MAX.WARN} |
WARNING | |
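The trigger above compares PGA consumption against the aggregate target; its arithmetic can be sketched as follows (the helper name and sample values are illustrative only):

```python
def pga_use_pct(total_pga_used: int, pga_target: int) -> float:
    # Mirrors the trigger: min(oracle.total_pga_used,5m) * 100 / last(oracle.pga_target)
    return total_pga_used * 100 / pga_target

MIB = 1024 ** 2
# 1900 MiB in use against a 2048 MiB PGA_AGGREGATE_TARGET
usage = pga_use_pct(1900 * MIB, 2048 * MIB)
print(round(usage, 1))  # 92.8
# The trigger fires when usage exceeds {$ORACLE.PGA.USE.MAX.WARN} (default 90)
print(usage > 90)  # True
```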
Oracle: Number of REDO logs available for switching is too low | The number of inactive/unused REDOs available for log switching is low (database down risk). |
max(/Oracle by ODBC/oracle.redo_logs_available,5m) < {$ORACLE.REDO.MIN.WARN} |
WARNING | |
Oracle Database '{#DBNAME}': Open status in mount mode | The Oracle DB is in a mounted state. |
last(/Oracle by ODBC/oracle.db_open_mode["{#DBNAME}"])=1 |
WARNING | |
Oracle Database '{#DBNAME}': Open status has changed | The Oracle DB open status has changed. Ack to close manually. |
last(/Oracle by ODBC/oracle.db_open_mode["{#DBNAME}"],#1)<>last(/Oracle by ODBC/oracle.db_open_mode["{#DBNAME}"],#2) |
INFO | Manual close: YES Depends on: - Oracle Database '{#DBNAME}': Open status in mount mode |
Oracle Database '{#DBNAME}': Role has changed | The Oracle DB role has changed. Ack to close manually. |
last(/Oracle by ODBC/oracle.db_role["{#DBNAME}"],#1)<>last(/Oracle by ODBC/oracle.db_role["{#DBNAME}"],#2) |
INFO | Manual close: YES |
Oracle Database '{#DBNAME}': Force logging is deactivated for DB with active Archivelog | Force logging mode is a very important setting for databases in 'ARCHIVELOG' mode. It forces the writing of all transactions to the redo log. |
last(/Oracle by ODBC/oracle.db_force_logging["{#DBNAME}"]) = 0 and last(/Oracle by ODBC/oracle.db_log_mode["{#DBNAME}"]) = 1 |
WARNING | |
Oracle Database '{#DBNAME}': Open status in mount mode | The Oracle DB is in a mounted state. |
last(/Oracle by ODBC/oracle.pdb_open_mode["{#DBNAME}"])=1 |
WARNING | |
Oracle Database '{#DBNAME}': Open status has changed | The Oracle DB open status has changed. Ack to close manually. |
last(/Oracle by ODBC/oracle.pdb_open_mode["{#DBNAME}"],#1)<>last(/Oracle by ODBC/oracle.pdb_open_mode["{#DBNAME}"],#2) |
INFO | Manual close: YES |
Oracle TBS '{#TABLESPACE}': Tablespace utilization is too high | - |
min(/Oracle by ODBC/oracle.tbs_used_pct["{#TABLESPACE}"],5m)>{$ORACLE.TBS.USED.PCT.MAX.WARN} |
WARNING | Depends on: - Oracle TBS '{#TABLESPACE}': Tablespace utilization is too high |
Oracle TBS '{#TABLESPACE}': Tablespace utilization is too high | - |
min(/Oracle by ODBC/oracle.tbs_used_pct["{#TABLESPACE}"],5m)>{$ORACLE.TBS.UTIL.PCT.MAX.HIGH} |
HIGH | |
Oracle TBS '{#TABLESPACE}': Tablespace usage is too high | - |
min(/Oracle by ODBC/oracle.tbs_used_file_pct["{#TABLESPACE}"],5m)>{$ORACLE.TBS.USED.PCT.MAX.WARN} |
WARNING | Depends on: - Oracle TBS '{#TABLESPACE}': Tablespace usage is too high |
Oracle TBS '{#TABLESPACE}': Tablespace usage is too high | - |
min(/Oracle by ODBC/oracle.tbs_used_file_pct["{#TABLESPACE}"],5m)>{$ORACLE.TBS.USED.PCT.MAX.HIGH} |
HIGH | |
Oracle TBS '{#TABLESPACE}': Tablespace is OFFLINE | The tablespace is in the offline state. |
last(/Oracle by ODBC/oracle.tbs_status["{#TABLESPACE}"])=2 |
WARNING | |
Oracle TBS '{#TABLESPACE}': Tablespace status has changed | Oracle tablespace status has changed. Ack to close. |
last(/Oracle by ODBC/oracle.tbs_status["{#TABLESPACE}"],#1)<>last(/Oracle by ODBC/oracle.tbs_status["{#TABLESPACE}"],#2) |
INFO | Manual close: YES Depends on: - Oracle TBS '{#TABLESPACE}': Tablespace is OFFLINE |
Archivelog '{#DEST_NAME}': Log Archive is not valid | The trigger will launch if the archive log destination is not in one of these states: 2 - 'DEFERRED'; 3 - 'VALID'. |
last(/Oracle by ODBC/oracle.archivelog_log_status["{#DEST_NAME}"])<2 |
HIGH | |
ASM '{#DGNAME}': Disk group usage is too high | The usage of the ASM disk group expressed in % exceeds {$ORACLE.ASM.USED.PCT.MAX.WARN}. |
min(/Oracle by ODBC/oracle.asm_used_pct["{#DGNAME}"],5m)>{$ORACLE.ASM.USED.PCT.MAX.WARN} |
WARNING | Depends on: - ASM '{#DGNAME}': Disk group usage is too high |
ASM '{#DGNAME}': Disk group usage is too high | The usage of the ASM disk group expressed in % exceeds {$ORACLE.ASM.USED.PCT.MAX.WARN}. |
min(/Oracle by ODBC/oracle.asm_used_pct["{#DGNAME}"],5m)>{$ORACLE.ASM.USED.PCT.MAX.HIGH} |
HIGH |
Please report any issues with the template at https://support.zabbix.com.
For Zabbix version: 6.2 and higher. The template is developed to monitor a single DBMS Oracle Database instance with Zabbix agent 2.
This template was tested on:
See Zabbix template operation for basic instructions.
Test availability:
zabbix_get -s oracle-host -k oracle.ping["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}"]
No specific Zabbix configuration is required.
Name | Description | Default |
---|---|---|
{$ORACLE.ASM.USED.PCT.MAX.HIGH} | The maximum percentage of used Automatic Storage Management (ASM) disk group for a high trigger expression. |
95 |
{$ORACLE.ASM.USED.PCT.MAX.WARN} | The maximum percentage of used ASM disk group for a warning trigger expression. |
90 |
{$ORACLE.CONCURRENCY.MAX.WARN} | The maximum percentage of sessions concurrency usage for a trigger expression. |
80 |
{$ORACLE.CONNSTRING} | - |
tcp://localhost:1521 |
{$ORACLE.DB.FILE.MAX.WARN} | The maximum percentage of used database files for a trigger expression. |
80 |
{$ORACLE.DBNAME.MATCHES} | This macro is used in discovery of the database. It can be overridden on host level or its linked template level. |
.* |
{$ORACLE.DBNAME.NOT_MATCHES} | This macro is used in discovery of the database. It can be overridden on host level or its linked template level. |
PDB\$SEED |
{$ORACLE.EXPIRE.PASSWORD.MIN.WARN} | The number of warning days before the password expires for a trigger expression. |
7 |
{$ORACLE.PASSWORD} | The Oracle user's password. |
zabbix_password |
{$ORACLE.PGA.USE.MAX.WARN} | The maximum percentage of the Program Global Area (PGA) usage that alerts the threshold for a trigger expression. |
90 |
{$ORACLE.PROCESSES.MAX.WARN} | Alert threshold for the maximum percentage of active processes for a trigger expression. |
80 |
{$ORACLE.REDO.MIN.WARN} | Alert threshold for the minimum number of REDO logs for a trigger expression. |
3 |
{$ORACLE.SERVICE} | Oracle Service Name. |
ORA |
{$ORACLE.SESSION.LOCK.MAX.TIME} | The maximum duration of a session lock, in seconds, to count the session as prolongedly locked. |
600 |
{$ORACLE.SESSION.LONG.LOCK.MAX.WARN} | Alert threshold for the maximum number of the prolongedly locked sessions for a trigger expression. |
3 |
{$ORACLE.SESSIONS.LOCK.MAX.WARN} | Alert threshold for the maximum percentage of locked sessions for a trigger expression. |
20 |
{$ORACLE.SESSIONS.MAX.WARN} | Alert threshold for the maximum percentage of active sessions for a trigger expression. |
80 |
{$ORACLE.SHARED.FREE.MIN.WARN} | Alert threshold for the minimum percentage of free shared pool for a trigger expression. |
5 |
{$ORACLE.TABLESPACE.NAME.MATCHES} | This macro is used in tablespace discovery. It can be overridden on host level or its linked template level. |
.* |
{$ORACLE.TABLESPACE.NAME.NOT_MATCHES} | This macro is used in tablespace discovery. It can be overridden on host level or its linked template level. |
CHANGE_IF_NEEDED |
{$ORACLE.TBS.USED.PCT.MAX.HIGH} | High severity alert threshold for the maximum percentage of tablespace usage (used bytes/allocated bytes) for a trigger expression. |
95 |
{$ORACLE.TBS.USED.PCT.MAX.WARN} | Warning severity alert threshold for the maximum percentage of tablespace usage (used bytes/allocated bytes) for a trigger expression. |
90 |
{$ORACLE.TBS.UTIL.PCT.MAX.HIGH} | High severity alert threshold for the maximum percentage of tablespace utilization (allocated bytes/max bytes) for a trigger expression. |
90 |
{$ORACLE.TBS.UTIL.PCT.MAX.WARN} | Warning severity alert threshold for the maximum percentage of tablespace utilization (allocated bytes/max bytes) for a trigger expression. |
80 |
{$ORACLE.USER} | Oracle username. |
zabbix |
There are no template links in this template.
Name | Description | Type | Key and additional info |
---|---|---|---|
Archive log discovery | Destinations of the log archive. |
ZABBIX_PASSIVE | oracle.archive.discovery["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}"] |
ASM disk groups discovery | The ASM disk groups. |
ZABBIX_PASSIVE | oracle.diskgroups.discovery["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}"] |
Database discovery | Scanning databases in the database management system (DBMS). |
ZABBIX_PASSIVE | oracle.db.discovery["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}"] Filter: AND - {#DBNAME} MATCHES_REGEX - {#DBNAME} NOT_MATCHES_REGEX |
PDB discovery | Scanning a pluggable database (PDB) in DBMS. |
ZABBIX_PASSIVE | oracle.pdb.discovery["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}"] Filter: AND - {#DBNAME} MATCHES_REGEX - {#DBNAME} NOT_MATCHES_REGEX |
Tablespace discovery | Scanning tablespaces in DBMS. |
ZABBIX_PASSIVE | oracle.ts.discovery["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}"] Filter: AND - {#TABLESPACE} MATCHES_REGEX - {#TABLESPACE} NOT_MATCHES_REGEX |
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
Oracle | Oracle: Ping | Tests the connection to the Oracle Database. |
ZABBIX_PASSIVE | oracle.ping["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
Oracle | Oracle: Version | The Oracle Server version. |
DEPENDENT | oracle.version Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
Oracle | Oracle: Uptime | The Oracle instance uptime expressed in seconds. |
DEPENDENT | oracle.uptime Preprocessing: - JSONPATH: |
Oracle | Oracle: Instance status | The status of the instance. |
DEPENDENT | oracle.instance_status Preprocessing: - JSONPATH: |
Oracle | Oracle: Archiver state | The status of automatic archiving. |
DEPENDENT | oracle.archiver_state Preprocessing: - JSONPATH: |
Oracle | Oracle: Instance name | The name of an instance. |
DEPENDENT | oracle.instance_name Preprocessing: - JSONPATH: |
Oracle | Oracle: Instance hostname | The name of the host machine. |
DEPENDENT | oracle.instance_hostname Preprocessing: - JSONPATH: |
Oracle | Oracle: Instance role | It indicates whether the instance is an active instance or an inactive secondary instance. |
DEPENDENT | oracle.instance.role Preprocessing: - JSONPATH: |
Oracle | Oracle: Buffer cache hit ratio | The ratio of buffer cache hits ((LogRead - PhyRead)/LogRead). |
DEPENDENT | oracle.buffer_cache_hit_ratio Preprocessing: - JSONPATH: |
Oracle | Oracle: Cursor cache hit ratio | The ratio of cursor cache hits (CursorCacheHit/SoftParse). |
DEPENDENT | oracle.cursor_cache_hit_ratio Preprocessing: - JSONPATH: |
Oracle | Oracle: Library cache hit ratio | The ratio of library cache hits (Hits/Pins). |
DEPENDENT | oracle.library_cache_hit_ratio Preprocessing: - JSONPATH: |
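The formulas quoted in the three ratio items above can be sketched directly (the counter values are illustrative, not real Oracle statistics):

```python
def buffer_cache_hit_ratio(log_reads: int, phy_reads: int) -> float:
    # (LogRead - PhyRead) / LogRead, expressed as a percentage
    return (log_reads - phy_reads) / log_reads * 100

def cursor_cache_hit_ratio(cursor_cache_hits: int, soft_parses: int) -> float:
    # CursorCacheHit / SoftParse, expressed as a percentage
    return cursor_cache_hits / soft_parses * 100

def library_cache_hit_ratio(hits: int, pins: int) -> float:
    # Hits / Pins, expressed as a percentage
    return hits / pins * 100

print(round(buffer_cache_hit_ratio(100_000, 1_500), 1))  # 98.5
print(round(cursor_cache_hit_ratio(900, 1_000), 1))      # 90.0
print(round(library_cache_hit_ratio(990, 1_000), 1))     # 99.0
```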
Oracle | Oracle: Shared pool free % | Free memory of a shared pool expressed in %. |
DEPENDENT | oracle.shared_pool_free Preprocessing: - JSONPATH: |
Oracle | Oracle: Physical reads per second | Reads per second. |
DEPENDENT | oracle.physical_reads_rate Preprocessing: - JSONPATH: |
Oracle | Oracle: Physical writes per second | Writes per second. |
DEPENDENT | oracle.physical_writes_rate Preprocessing: - JSONPATH: |
Oracle | Oracle: Physical reads bytes per second | Read bytes per second. |
DEPENDENT | oracle.physical_read_bytes_rate Preprocessing: - JSONPATH: |
Oracle | Oracle: Physical writes bytes per second | Write bytes per second. |
DEPENDENT | oracle.physical_write_bytes_rate Preprocessing: - JSONPATH: |
Oracle | Oracle: Enqueue timeouts per second | Enqueue timeouts per second. |
DEPENDENT | oracle.enqueue_timeouts_rate Preprocessing: - JSONPATH: |
Oracle | Oracle: GC CR block received per second | The number of global cache (GC) consistent read (CR) blocks received per second. |
DEPENDENT | oracle.gc_cr_block_received_rate Preprocessing: - JSONPATH: |
Oracle | Oracle: Global cache blocks corrupted | The number of blocks that encountered corruption or checksum failure during the interconnect. |
DEPENDENT | oracle.cache_blocks_corrupt Preprocessing: - JSONPATH: |
Oracle | Oracle: Global cache blocks lost | The number of lost global cache blocks. |
DEPENDENT | oracle.cache_blocks_lost Preprocessing: - JSONPATH: |
Oracle | Oracle: Logons per second | The number of logon attempts. |
DEPENDENT | oracle.logons_rate Preprocessing: - JSONPATH: |
Oracle | Oracle: Average active sessions | The average active sessions at a point in time. The number of sessions that are either working or waiting. |
DEPENDENT | oracle.active_sessions Preprocessing: - JSONPATH: |
Oracle | Oracle: Active serial sessions | The number of active serial sessions. |
DEPENDENT | oracle.active_serial_sessions Preprocessing: - JSONPATH: |
Oracle | Oracle: Active parallel sessions | The number of active parallel sessions. |
DEPENDENT | oracle.active_parallel_sessions Preprocessing: - JSONPATH: |
Oracle | Oracle: Long table scans per second | The number of long table scans per second. A table is considered 'long' if the table is not cached and if its high-water mark is greater than five blocks. |
DEPENDENT | oracle.long_table_scans_rate Preprocessing: - JSONPATH: |
Oracle | Oracle: SQL service response time | The Structured Query Language (SQL) service response time expressed in seconds. |
DEPENDENT | oracle.service_response_time Preprocessing: - JSONPATH: - MULTIPLIER: |
Oracle | Oracle: User rollbacks per second | The number of times that users manually issue the ROLLBACK statement or an error occurs during a user's transaction. |
DEPENDENT | oracle.user_rollbacks_rate Preprocessing: - JSONPATH: |
Oracle | Oracle: Total sorts per user call | The total sorts per user call. |
DEPENDENT | oracle.sorts_per_user_call Preprocessing: - JSONPATH: |
Oracle | Oracle: Rows per sort | The average number of rows per sort for all types of sorts performed. |
DEPENDENT | oracle.rows_per_sort Preprocessing: - JSONPATH: |
Oracle | Oracle: Disk sort per second | The number of sorts going to disk per second. |
DEPENDENT | oracle.disk_sorts Preprocessing: - JSONPATH: |
Oracle | Oracle: Memory sorts ratio | The percentage of sorts (from ORDER BY clauses or index building) that are done to disk vs in-memory. |
DEPENDENT | oracle.memory_sorts_ratio Preprocessing: - JSONPATH: |
Oracle | Oracle: Database wait time ratio | Wait time - the time that the server process spends waiting for available shared resources to be released by other server processes, such as latches, locks, data buffers, etc. |
DEPENDENT | oracle.database_wait_time_ratio Preprocessing: - JSONPATH: |
Oracle | Oracle: Database CPU time ratio | It is calculated by dividing the total CPU (used by the database) by the Oracle time model statistic DB time. |
DEPENDENT | oracle.database_cpu_time_ratio Preprocessing: - JSONPATH: |
Oracle | Oracle: Temp space used | Used temporary space. |
DEPENDENT | oracle.temp_space_used Preprocessing: - JSONPATH: |
Oracle | Oracle: Sessions limit | The user and system sessions. |
DEPENDENT | oracle.session_limit Preprocessing: - JSONPATH: |
Oracle | Oracle: Datafiles limit | The maximum allowable number of datafiles. |
DEPENDENT | oracle.db_files_limit Preprocessing: - JSONPATH: |
Oracle | Oracle: Processes limit | The maximum number of user processes. |
DEPENDENT | oracle.processes_limit Preprocessing: - JSONPATH: |
Oracle | Oracle: Session count | The count of sessions. |
DEPENDENT | oracle.session_count Preprocessing: - JSONPATH: |
Oracle | Oracle: Active user sessions | The number of active user sessions. |
DEPENDENT | oracle.session_active_user Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
Oracle | Oracle: Active background sessions | The number of active background sessions. |
DEPENDENT | oracle.session_active_background Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
Oracle | Oracle: Inactive user sessions | The number of inactive user sessions. |
DEPENDENT | oracle.session_inactive_user Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
Oracle | Oracle: Sessions lock rate | The percentage of locked sessions. Locks are mechanisms that prevent destructive interaction between transactions accessing the same resource: either user objects, such as tables and rows, or system objects not visible to users, such as shared data structures in memory and data dictionary rows. |
DEPENDENT | oracle.session_lock_rate Preprocessing: - JSONPATH: |
Oracle | Oracle: Sessions locked over {$ORACLE.SESSION.LOCK.MAX.TIME}s | The count of prolongedly locked sessions. (You can change the maximum session lock duration, in seconds, with the {$ORACLE.SESSION.LOCK.MAX.TIME} macro; the default is 600 seconds.) |
DEPENDENT | oracle.session_long_time_locked Preprocessing: - JSONPATH: |
Oracle | Oracle: Sessions concurrency | The percentage of concurrency. Concurrency is a DB behavior where different transactions request changes to the same resource. A modifying transaction temporarily blocks the right to change the data, and the remaining transactions wait for access. When access to the resource stays locked for a long time, concurrency grows (like a transaction queue), which often has an extremely negative impact on performance. A high contention value does not indicate the root cause of the problem but is a signal to search for it. |
DEPENDENT | oracle.session_concurrency_rate Preprocessing: - JSONPATH: |
Oracle | Oracle: PGA, Total inuse | It indicates how much of the Program Global Area (PGA) memory is currently consumed by work areas. This number can be used to determine how much memory is consumed by other consumers of the PGA memory (for example, PL/SQL or Java). |
DEPENDENT | oracle.total_pga_used Preprocessing: - JSONPATH: |
Oracle | Oracle: PGA, Aggregate target parameter | The current value of the PGA_AGGREGATE_TARGET initialization parameter. If this parameter is not set, then its value is 0 and automatic management of the PGA memory is disabled. |
DEPENDENT | oracle.pga_target Preprocessing: - JSONPATH: |
Oracle | Oracle: PGA, Total allocated | The current amount of the PGA memory allocated by the instance. The Oracle Database attempts to keep this number below the value of the PGA_AGGREGATE_TARGET initialization parameter. However, it is possible for the PGA allocated to exceed that value by a small percentage and for a short period of time when the work area workload is increasing very rapidly or when PGA_AGGREGATE_TARGET is set to a small value. |
DEPENDENT | oracle.total_pga_allocated Preprocessing: - JSONPATH: |
Oracle | Oracle: PGA, Total freeable | The number of bytes of the PGA memory in all processes that could be freed back to the operating system. |
DEPENDENT | oracle.total_pga_freeable Preprocessing: - JSONPATH: |
Oracle | Oracle: PGA, Global memory bound | The maximum size of work area executed in automatic mode. |
DEPENDENT | oracle.pga_global_bound Preprocessing: - JSONPATH: |
Oracle | Oracle: FRA, Space limit | The maximum amount of disk space (in bytes) that the database can use for the Fast Recovery Area (FRA). |
DEPENDENT | oracle.fra_space_limit Preprocessing: - JSONPATH: |
Oracle | Oracle: FRA, Used space | The amount of disk space (in bytes) used by FRA files created in the current and all the previous FRAs. |
DEPENDENT | oracle.fra_space_used Preprocessing: - JSONPATH: |
Oracle | Oracle: FRA, Space reclaimable | The total amount of disk space (in bytes) that can be created by deleting obsolete, redundant, and other low priority files from the FRA. |
DEPENDENT | oracle.fra_space_reclaimable Preprocessing: - JSONPATH: |
Oracle | Oracle: FRA, Number of files | The number of files in the FRA. |
DEPENDENT | oracle.fra_number_of_files Preprocessing: - JSONPATH: |
Oracle | Oracle: FRA, Usable space in % | DEPENDENT | oracle.fra_usable_pct Preprocessing: - JSONPATH: |
|
Oracle | Oracle: FRA, Number of restore points | DEPENDENT | oracle.fra_restore_point Preprocessing: - JSONPATH: |
|
Oracle | Oracle: SGA, java pool | The memory is allocated from the Java pool. |
DEPENDENT | oracle.sga_java_pool Preprocessing: - JSONPATH: |
Oracle | Oracle: SGA, large pool | The memory is allocated from a large pool. |
DEPENDENT | oracle.sga_large_pool Preprocessing: - JSONPATH: |
Oracle | Oracle: SGA, shared pool | The memory is allocated from a shared pool. |
DEPENDENT | oracle.sga_shared_pool Preprocessing: - JSONPATH: |
Oracle | Oracle: SGA, log buffer | The number of bytes allocated for the redo log buffer. |
DEPENDENT | oracle.sga_log_buffer Preprocessing: - JSONPATH: |
Oracle | Oracle: SGA, fixed | The fixed System Global Area (SGA) is an internal housekeeping area. |
DEPENDENT | oracle.sga_fixed Preprocessing: - JSONPATH: |
Oracle | Oracle: SGA, buffer cache | The size of standard block cache. |
DEPENDENT | oracle.sga_buffer_cache Preprocessing: - JSONPATH: |
Oracle | Oracle: User's expire password | The number of days before the password of Zabbix account expires. |
ZABBIX_PASSIVE | oracle.user.info["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}"] Preprocessing: - JSONPATH: |
Oracle | Oracle: Redo logs available to switch | The number of inactive/unused redo logs available for log switching. |
ZABBIX_PASSIVE | oracle.redolog.info["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}"] Preprocessing: - JSONPATH: |
Oracle | Oracle: Number of processes | - |
ZABBIX_PASSIVE | oracle.proc.stats["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}"] Preprocessing: - JSONPATH: |
Oracle | Oracle: Datafiles count | The current number of datafiles. |
ZABBIX_PASSIVE | oracle.datafiles.stats["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}"] Preprocessing: - JSONPATH: |
Oracle | Oracle Database '{#DBNAME}': Open status | 1 - 'MOUNTED'; 2 - 'READ WRITE'; 3 - 'READ ONLY'; 4 - 'READ ONLY WITH APPLY' (a physical standby database is open in real-time query mode). |
DEPENDENT | oracle.dbopenmode["{#DBNAME}"] Preprocessing: - JSONPATH: - DISCARDUNCHANGEDHEARTBEAT: |
Oracle | Oracle Database '{#DBNAME}': Role | The current role of the database where: 1 - 'SNAPSHOT STANDBY'; 2 - 'LOGICAL STANDBY'; 3 - 'PHYSICAL STANDBY'; 4 - 'PRIMARY '; 5 - 'FAR SYNC'. |
DEPENDENT | oracle.dbrole["{#DBNAME}"] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 15m
Oracle | Oracle Database '{#DBNAME}': Log mode | The archive log mode where: 0 - 'NOARCHIVELOG'; 1 - 'ARCHIVELOG'; 2 - 'MANUAL'. |
DEPENDENT | oracle.dblogmode["{#DBNAME}"] Preprocessing: - JSONPATH: - DISCARDUNCHANGEDHEARTBEAT: |
Oracle | Oracle Database '{#DBNAME}': Force logging | It indicates whether the database is under force logging mode 'YES' or 'NO'. |
DEPENDENT | oracle.dbforcelogging["{#DBNAME}"] Preprocessing: - JSONPATH: - DISCARDUNCHANGEDHEARTBEAT: |
Oracle | Oracle Database '{#DBNAME}': Open status | 1 - 'MOUNTED'; 2 - 'READ WRITE'; 3 - 'READ ONLY'; 4 - 'READ ONLY WITH APPLY' (a physical standby database is open in real-time query mode). |
DEPENDENT | oracle.pdbopenmode["{#DBNAME}"] Preprocessing: - JSONPATH: - DISCARDUNCHANGEDHEARTBEAT: |
Oracle | Oracle TBS '{#TABLESPACE}': Tablespace allocated, bytes | Currently allocated bytes for the tablespace (sum of the current size of datafiles). |
DEPENDENT | oracle.tbsallocbytes["{#TABLESPACE}"] Preprocessing: - JSONPATH: |
Oracle | Oracle TBS '{#TABLESPACE}': Tablespace MAX size, bytes | The maximum size of the tablespace. |
DEPENDENT | oracle.tbsmaxbytes["{#TABLESPACE}"] Preprocessing: - JSONPATH: |
Oracle | Oracle TBS '{#TABLESPACE}': Tablespace used, bytes | Currently used bytes for the tablespace (current size of datafiles - the free space). |
DEPENDENT | oracle.tbsusedbytes["{#TABLESPACE}"] Preprocessing: - JSONPATH: |
Oracle | Oracle TBS '{#TABLESPACE}': Tablespace free, bytes | Free bytes of the allocated space. |
DEPENDENT | oracle.tbsfreebytes["{#TABLESPACE}"] Preprocessing: - JSONPATH: |
Oracle | Oracle TBS '{#TABLESPACE}': Tablespace usage, percent | Used bytes/allocated bytes*100. |
DEPENDENT | oracle.tbsusedfile_pct["{#TABLESPACE}"] Preprocessing: - JSONPATH: |
Oracle | Oracle TBS '{#TABLESPACE}': Tablespace allocated, percent | Allocated bytes/max bytes*100. |
DEPENDENT | oracle.tbsusedpct["{#TABLESPACE}"] Preprocessing: - JSONPATH: |
Oracle | Oracle TBS '{#TABLESPACE}': Open status | The tablespace status where: 1 - 'ONLINE'; 2 - 'OFFLINE'; 3 - 'READ ONLY'. |
DEPENDENT | oracle.tbs_status["{#TABLESPACE}"] Preprocessing: - JSONPATH: |
Oracle | Archivelog '{#DEST_NAME}': Error | It displays the error message. |
DEPENDENT | oracle.archivelogerror["{#DEST_NAME}"] Preprocessing: - JSONPATH: - DISCARDUNCHANGEDHEARTBEAT: |
Oracle | Archivelog '{#DEST_NAME}': Last sequence | It identifies the sequence number of the last archived redo log. |
DEPENDENT | oracle.archiveloglogsequence["{#DEST_NAME}"] Preprocessing: - JSONPATH: |
Oracle | Archivelog '{#DEST_NAME}': Status | It identifies the current status of the destination where: 3 - 'VALID'; 2 - 'DEFERRED'; 1 - 'ERROR'; 0 - 'UNKNOWN'. |
DEPENDENT | oracle.archiveloglogstatus["{#DEST_NAME}"] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 1h
Oracle | ASM '{#DGNAME}': Total size | The total size of the ASM disk group. |
DEPENDENT | oracle.asmtotalsize["{#DGNAME}"] Preprocessing: - JSONPATH: |
Oracle | ASM '{#DGNAME}': Free size | The free size of the ASM disk group. |
DEPENDENT | oracle.asmfreesize["{#DGNAME}"] Preprocessing: - JSONPATH: |
Oracle | ASM '{#DGNAME}': Used space in % | Usage of the ASM disk group expressed in %. |
DEPENDENT | oracle.asmusedpct["{#DGNAME}"] Preprocessing: - JSONPATH: |
Zabbix raw items | Oracle: Get instance state | The item gets the state of the current instance. |
ZABBIX_PASSIVE | oracle.instance.info["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}"] |
Zabbix raw items | Oracle: Get system metrics | The item gets the values of the system metrics. |
ZABBIX_PASSIVE | oracle.sys.metrics["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}"] |
Zabbix raw items | Oracle: Get system parameters | Get a set of system parameter values. |
ZABBIX_PASSIVE | oracle.sys.params["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}"] |
Zabbix raw items | Oracle: Get sessions stats | Get sessions statistics. {$ORACLE.SESSION.LOCK.MAX.TIME} -- maximum seconds in the current wait condition for counting long time locked sessions. Default: 600 seconds. |
ZABBIX_PASSIVE | oracle.sessions.stats["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}","{$ORACLE.SESSION.LOCK.MAX.TIME}"] |
Zabbix raw items | Oracle: Get PGA stats | Get PGA statistics. |
ZABBIX_PASSIVE | oracle.pga.stats["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}"] |
Zabbix raw items | Oracle: Get FRA stats | Get FRA statistics. |
ZABBIX_PASSIVE | oracle.fra.stats["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}"] |
Zabbix raw items | Oracle: Get SGA stats | Get SGA statistics. |
ZABBIX_PASSIVE | oracle.sga.stats["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}"] |
Zabbix raw items | Oracle Database '{#DBNAME}': Get CDB and No-CDB info | It gets the information about the container database (CDB) and non-CDB database on an instance. |
ZABBIX_PASSIVE | oracle.cdb.info["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}","{#DBNAME}"] |
Zabbix raw items | Oracle Database '{#DBNAME}': Get PDB info | It gets the information about the PDB database on an instance. |
ZABBIX_PASSIVE | oracle.pdb.info["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}","{#DBNAME}"] |
Zabbix raw items | Oracle TBS '{#TABLESPACE}': Get tablespaces stats | It gets the statistics of the tablespace. |
ZABBIX_PASSIVE | oracle.ts.stats["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}","{#TABLESPACE}","{#CONTENTS}"] |
Zabbix raw items | Archivelog '{#DEST_NAME}': Get archive log info | It gets the archivelog statistics. |
ZABBIX_PASSIVE | oracle.archive.info["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}","{#DEST_NAME}"] |
Zabbix raw items | ASM '{#DGNAME}': Get ASM stats | It gets the ASM disk group statistics. |
ZABBIX_PASSIVE | oracle.diskgroups.stats["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}","{#DGNAME}"] |
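The FRA items above are built from the raw counters returned by "Oracle: Get FRA stats". The usable-space percentage follows the usual V$RECOVERY_FILE_DEST arithmetic: free space plus reclaimable space, as a share of the space limit. A minimal sketch of that calculation (the function and argument names are illustrative, not part of the template):

```python
def fra_usable_pct(space_limit: int, space_used: int, space_reclaimable: int) -> float:
    """Percent of the Fast Recovery Area still usable: unused space plus
    space that can be reclaimed from obsolete/redundant files."""
    if space_limit <= 0:
        return 0.0  # guard against division by zero when the limit is unknown
    return (space_limit - space_used + space_reclaimable) / space_limit * 100.0

# e.g. a 100 GB limit with 60 GB used, 20 GB of it reclaimable -> 60% usable
print(fra_usable_pct(100, 60, 20))
```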
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Oracle: Connection to database is unavailable | Connection to Oracle Database is currently unavailable. |
last(/Oracle by Zabbix agent 2/oracle.ping["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}"])=0 |
DISASTER | |
Oracle: Version has changed | The Oracle DB version has changed. Acknowledge (Ack) to close manually. |
last(/Oracle by Zabbix agent 2/oracle.version,#1)<>last(/Oracle by Zabbix agent 2/oracle.version,#2) and length(last(/Oracle by Zabbix agent 2/oracle.version))>0 |
INFO | Manual close: YES |
Oracle: Failed to fetch info data | Zabbix has not received any data for the items for the last 5 minutes. The database might be unavailable for connecting. |
nodata(/Oracle by Zabbix agent 2/oracle.uptime,30m)=1 |
INFO | |
Oracle: Host has been restarted | The host uptime is less than 10 minutes. |
last(/Oracle by Zabbix agent 2/oracle.uptime)<10m |
INFO | Manual close: YES |
Oracle: Instance name has changed | Oracle DB Instance name has changed. Ack to close manually. |
last(/Oracle by Zabbix agent 2/oracle.instance_name,#1)<>last(/Oracle by Zabbix agent 2/oracle.instance_name,#2) and length(last(/Oracle by Zabbix agent 2/oracle.instance_name))>0 |
INFO | Manual close: YES |
Oracle: Instance hostname has changed | Oracle DB Instance hostname has changed. Ack to close. |
last(/Oracle by Zabbix agent 2/oracle.instance_hostname,#1)<>last(/Oracle by Zabbix agent 2/oracle.instance_hostname,#2) and length(last(/Oracle by Zabbix agent 2/oracle.instance_hostname))>0 |
INFO | Manual close: YES |
Oracle: Shared pool free is too low | The free memory percent of the shared pool has been less than {$ORACLE.SHARED.FREE.MIN.WARN}% for the last 5 minutes. |
max(/Oracle by Zabbix agent 2/oracle.shared_pool_free,5m)<{$ORACLE.SHARED.FREE.MIN.WARN} |
WARNING | |
Oracle: Too many active sessions | Active sessions are using more than {$ORACLE.SESSIONS.MAX.WARN}% of the available sessions. |
min(/Oracle by Zabbix agent 2/oracle.session_count,5m) * 100 / last(/Oracle by Zabbix agent 2/oracle.session_limit) > {$ORACLE.SESSIONS.MAX.WARN} |
WARNING | |
Oracle: Too many locked sessions | The number of locked sessions exceeds {$ORACLE.SESSIONS.LOCK.MAX.WARN}% of the running sessions. |
min(/Oracle by Zabbix agent 2/oracle.session_lock_rate,5m) > {$ORACLE.SESSIONS.LOCK.MAX.WARN} |
WARNING | |
Oracle: Too many sessions locked | The number of sessions locked for longer than {$ORACLE.SESSION.LOCK.MAX.TIME} seconds is too high. Long-held locks can degrade database performance; if they are detected, identify the heaviest queries first and then analyze possible resource leaks. |
min(/Oracle by Zabbix agent 2/oracle.session_long_time_locked,5m) > {$ORACLE.SESSION.LONG.LOCK.MAX.WARN} |
WARNING | |
Oracle: Too high database concurrency | The concurrency rate exceeds {$ORACLE.CONCURRENCY.MAX.WARN}%. A high contention value does not identify the root cause of the problem, but it is a signal to look for one. When contention is high, analyze resource consumption and identify the heaviest queries in the database, possibly with session tracing. This helps to find the root cause and the optimization points, both in the database configuration and in the application's query logic. |
min(/Oracle by Zabbix agent 2/oracle.session_concurrency_rate,5m) > {$ORACLE.CONCURRENCY.MAX.WARN} |
WARNING | |
Oracle: Total PGA in use is too high | The total PGA in use is more than {$ORACLE.PGA.USE.MAX.WARN}% of PGA_AGGREGATE_TARGET. |
min(/Oracle by Zabbix agent 2/oracle.total_pga_used,5m) * 100 / last(/Oracle by Zabbix agent 2/oracle.pga_target) > {$ORACLE.PGA.USE.MAX.WARN} |
WARNING | |
Oracle: Zabbix account will expire soon | The password for Zabbix user in the database expires soon. |
last(/Oracle by Zabbix agent 2/oracle.user.info["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}"]) < {$ORACLE.EXPIRE.PASSWORD.MIN.WARN} |
WARNING | |
Oracle: Number of REDO logs available for switching is too low | The number of inactive/unused REDOs available for log switching is low (database down risk). |
max(/Oracle by Zabbix agent 2/oracle.redolog.info["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}"],5m) < {$ORACLE.REDO.MIN.WARN} |
WARNING | |
Oracle: Too many active processes | Active processes are using more than {$ORACLE.PROCESSES.MAX.WARN}% of the available number of processes. |
min(/Oracle by Zabbix agent 2/oracle.proc.stats["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}"],5m) * 100 / last(/Oracle by Zabbix agent 2/oracle.processes_limit) > {$ORACLE.PROCESSES.MAX.WARN} |
WARNING | |
Oracle: Too many database files | The number of datafiles is higher than {$ORACLE.DB.FILE.MAX.WARN}% of the available datafiles limit. |
min(/Oracle by Zabbix agent 2/oracle.datafiles.stats["{$ORACLE.CONNSTRING}","{$ORACLE.USER}","{$ORACLE.PASSWORD}","{$ORACLE.SERVICE}"],5m) * 100 / last(/Oracle by Zabbix agent 2/oracle.db_files_limit) > {$ORACLE.DB.FILE.MAX.WARN} |
WARNING | |
Oracle Database '{#DBNAME}': Open status in mount mode | The Oracle DB is in a mounted state. |
last(/Oracle by Zabbix agent 2/oracle.db_open_mode["{#DBNAME}"])=1 |
WARNING | |
Oracle Database '{#DBNAME}': Open status has changed | The Oracle DB open status has changed. Ack to close manually. |
last(/Oracle by Zabbix agent 2/oracle.db_open_mode["{#DBNAME}"],#1)<>last(/Oracle by Zabbix agent 2/oracle.db_open_mode["{#DBNAME}"],#2) |
INFO | Manual close: YES Depends on: - Oracle Database '{#DBNAME}': Open status in mount mode |
Oracle Database '{#DBNAME}': Role has changed | The Oracle DB role has changed. Ack to close manually. |
last(/Oracle by Zabbix agent 2/oracle.db_role["{#DBNAME}"],#1)<>last(/Oracle by Zabbix agent 2/oracle.db_role["{#DBNAME}"],#2) |
INFO | Manual close: YES |
Oracle Database '{#DBNAME}': Force logging is deactivated for DB with active Archivelog | Force logging mode is an important setting for databases in 'ARCHIVELOG' mode; it forces all transactions to be written to the redo log. |
last(/Oracle by Zabbix agent 2/oracle.db_force_logging["{#DBNAME}"]) = 0 and last(/Oracle by Zabbix agent 2/oracle.db_log_mode["{#DBNAME}"]) = 1 |
WARNING | |
Oracle Database '{#DBNAME}': Open status in mount mode | The Oracle DB is in a mounted state. |
last(/Oracle by Zabbix agent 2/oracle.pdb_open_mode["{#DBNAME}"])=1 |
WARNING | |
Oracle Database '{#DBNAME}': Open status has changed | The Oracle DB open status has changed. Ack to close manually. |
last(/Oracle by Zabbix agent 2/oracle.pdb_open_mode["{#DBNAME}"],#1)<>last(/Oracle by Zabbix agent 2/oracle.pdb_open_mode["{#DBNAME}"],#2) |
INFO | Manual close: YES |
Oracle TBS '{#TABLESPACE}': Tablespace usage is too high | - |
min(/Oracle by Zabbix agent 2/oracle.tbs_used_file_pct["{#TABLESPACE}"],5m)>{$ORACLE.TBS.USED.PCT.MAX.WARN} |
WARNING | Depends on: - Oracle TBS '{#TABLESPACE}': Tablespace usage is too high |
Oracle TBS '{#TABLESPACE}': Tablespace usage is too high | - |
min(/Oracle by Zabbix agent 2/oracle.tbs_used_file_pct["{#TABLESPACE}"],5m)>{$ORACLE.TBS.USED.PCT.MAX.HIGH} |
HIGH | |
Oracle TBS '{#TABLESPACE}': Tablespace utilization is too high | - |
min(/Oracle by Zabbix agent 2/oracle.tbs_used_pct["{#TABLESPACE}"],5m)>{$ORACLE.TBS.USED.PCT.MAX.WARN} |
WARNING | Depends on: - Oracle TBS '{#TABLESPACE}': Tablespace utilization is too high |
Oracle TBS '{#TABLESPACE}': Tablespace utilization is too high | - |
min(/Oracle by Zabbix agent 2/oracle.tbs_used_pct["{#TABLESPACE}"],5m)>{$ORACLE.TBS.UTIL.PCT.MAX.HIGH} |
HIGH | |
Oracle TBS '{#TABLESPACE}': Tablespace is OFFLINE | The tablespace is in the offline state. |
last(/Oracle by Zabbix agent 2/oracle.tbs_status["{#TABLESPACE}"])=2 |
WARNING | |
Oracle TBS '{#TABLESPACE}': Tablespace status has changed | Oracle tablespace status has changed. Ack to close. |
last(/Oracle by Zabbix agent 2/oracle.tbs_status["{#TABLESPACE}"],#1)<>last(/Oracle by Zabbix agent 2/oracle.tbs_status["{#TABLESPACE}"],#2) |
INFO | Manual close: YES Depends on: - Oracle TBS '{#TABLESPACE}': Tablespace is OFFLINE |
Archivelog '{#DEST_NAME}': Log Archive is not valid | The trigger fires if the archive log destination is not in one of these states: 2 - 'DEFERRED'; 3 - 'VALID'. |
last(/Oracle by Zabbix agent 2/oracle.archivelog_log_status["{#DEST_NAME}"])<2 |
HIGH | |
ASM '{#DGNAME}': Disk group usage is too high | The percentage of used space in the ASM disk group exceeds {$ORACLE.ASM.USED.PCT.MAX.WARN}%. |
min(/Oracle by Zabbix agent 2/oracle.asm_used_pct["{#DGNAME}"],5m)>{$ORACLE.ASM.USED.PCT.MAX.WARN} |
WARNING | Depends on: - ASM '{#DGNAME}': Disk group usage is too high |
ASM '{#DGNAME}': Disk group usage is too high | The percentage of used space in the ASM disk group exceeds {$ORACLE.ASM.USED.PCT.MAX.HIGH}%. |
min(/Oracle by Zabbix agent 2/oracle.asm_used_pct["{#DGNAME}"],5m)>{$ORACLE.ASM.USED.PCT.MAX.HIGH} |
HIGH |
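Several of the capacity triggers above share one pattern: the current value of a counter is expressed as a percentage of its configured limit and compared against a macro threshold (sessions vs. the session limit, processes vs. the process limit, datafiles vs. the datafile limit, PGA used vs. the PGA target). A sketch of that check, with illustrative names that are not part of the template:

```python
def limit_usage_pct(current: float, limit: float) -> float:
    """Current value as a percentage of its limit (0 if the limit is unknown)."""
    if limit <= 0:
        return 0.0
    return current * 100.0 / limit

def breaches(current: float, limit: float, max_warn_pct: float) -> bool:
    """True when usage exceeds the threshold, mirroring expressions like
    min(session_count,5m) * 100 / last(session_limit) > {$ORACLE.SESSIONS.MAX.WARN}."""
    return limit_usage_pct(current, limit) > max_warn_pct

# 460 of 500 sessions against a 90% threshold fires the trigger
print(breaches(460, 500, 90))
```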
Please report any issues with the template at https://support.zabbix.com.
For Zabbix version: 6.2 and higher
The template is developed for monitoring DBMS MySQL and its forks.
This template was tested on:
See Zabbix template operation for basic instructions.
Create a MySQL user for monitoring (set <password> at your discretion):
CREATE USER 'zbx_monitor'@'%' IDENTIFIED BY '<password>';
GRANT REPLICATION CLIENT,PROCESS,SHOW DATABASES,SHOW VIEW ON *.* TO 'zbx_monitor'@'%';
For more information, please see MySQL documentation https://dev.mysql.com/doc/refman/8.0/en/grant.html
No specific Zabbix configuration is required.
Name | Description | Default |
---|---|---|
{$MYSQL.ABORTED_CONN.MAX.WARN} | Number of failed attempts to connect to the MySQL server for trigger expression. |
3 |
{$MYSQL.BUFF_UTIL.MIN.WARN} | The minimum buffer pool utilization in percentage for trigger expression. |
50 |
{$MYSQL.CREATEDTMPDISK_TABLES.MAX.WARN} | The maximum number of created tmp tables on a disk per second for trigger expressions. |
10 |
{$MYSQL.CREATEDTMPFILES.MAX.WARN} | The maximum number of created tmp files on a disk per second for trigger expressions. |
10 |
{$MYSQL.CREATEDTMPTABLES.MAX.WARN} | The maximum number of created tmp tables in memory per second for trigger expressions. |
30 |
{$MYSQL.DSN} | System data source name. |
<Put your DSN here> |
{$MYSQL.INNODB_LOG_FILES} | Number of physical files in the InnoDB redo log, used for calculating innodb_log_file_size. |
2 |
{$MYSQL.PASSWORD} | MySQL user password. |
<Put your password here> |
{$MYSQL.REPL_LAG.MAX.WARN} | The lag of slave from master for trigger expression. |
30m |
{$MYSQL.SLOW_QUERIES.MAX.WARN} | Number of slow queries for trigger expression. |
3 |
{$MYSQL.USER} | MySQL username. |
<Put your username here> |
There are no template links in this template.
Name | Description | Type | Key and additional info |
---|---|---|---|
Database discovery | Scanning databases in DBMS. |
ODBC | db.odbc.discovery[databases,"{$MYSQL.DSN}"] Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: Filter: AND_OR - {#DATABASE} NOT MATCHES_REGEX information_schema |
MariaDB discovery | Additional metrics if MariaDB is used. |
DEPENDENT | mysql.extra_metric.discovery Preprocessing: - JAVASCRIPT: |
Replication discovery | If "show slave status" returns Master_Host, "Replication: *" items are created. |
ODBC | db.odbc.discovery[replication,"{$MYSQL.DSN}"] Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
MySQL | MySQL: Status | - |
ODBC | db.odbc.select[ping,"{$MYSQL.DSN}"] Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: Expression: select "1" |
MySQL | MySQL: Version | - |
ODBC | db.odbc.select[version,"{$MYSQL.DSN}"] Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: Expression: select version() |
MySQL | MySQL: Uptime | The number of seconds that the server has been up. |
DEPENDENT | mysql.uptime Preprocessing: - JSONPATH: |
MySQL | MySQL: Aborted clients per second | Number of connections that were aborted because the client died without closing the connection properly. |
DEPENDENT | mysql.abortedclients.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MySQL | MySQL: Aborted connections per second | Number of failed attempts to connect to the MySQL server. |
DEPENDENT | mysql.abortedconnects.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MySQL | MySQL: Connection errors accept per second | Number of errors that occurred during calls to accept() on the listening port. |
DEPENDENT | mysql.connectionerrorsaccept.rate Preprocessing: - JSONPATH: - CHANGEPERSECOND |
MySQL | MySQL: Connection errors internal per second | Number of refused connections due to internal server errors, for example, out of memory errors, or failed thread starts. |
DEPENDENT | mysql.connectionerrorsinternal.rate Preprocessing: - JSONPATH: - CHANGEPERSECOND |
MySQL | MySQL: Connection errors max connections per second | Number of refused connections due to the max_connections limit being reached. |
DEPENDENT | mysql.connectionerrorsmaxconnections.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MySQL | MySQL: Connection errors peer address per second | Number of errors while searching for the connecting client IP address. |
DEPENDENT | mysql.connectionerrorspeeraddress.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MySQL | MySQL: Connection errors select per second | Number of errors during calls to select() or poll() on the listening port. The client would not necessarily have been rejected in these cases. |
DEPENDENT | mysql.connectionerrorsselect.rate Preprocessing: - JSONPATH: - CHANGEPERSECOND |
MySQL | MySQL: Connection errors tcpwrap per second | Number of connections the libwrap library has refused. |
DEPENDENT | mysql.connectionerrorstcpwrap.rate Preprocessing: - JSONPATH: - CHANGEPERSECOND |
MySQL | MySQL: Connections per second | Number of connection attempts (successful or not) to the MySQL server. |
DEPENDENT | mysql.connections.rate Preprocessing: - JSONPATH: - CHANGEPERSECOND |
MySQL | MySQL: Max used connections | The maximum number of connections that have been in use simultaneously since the server start. |
DEPENDENT | mysql.maxusedconnections Preprocessing: - JSONPATH: - DISCARDUNCHANGEDHEARTBEAT: |
MySQL | MySQL: Threads cached | Number of threads in the thread cache. |
DEPENDENT | mysql.threads_cached Preprocessing: - JSONPATH: |
MySQL | MySQL: Threads connected | Number of currently open connections. |
DEPENDENT | mysql.threads_connected Preprocessing: - JSONPATH: |
MySQL | MySQL: Threads created per second | Number of threads created to handle connections. If Threads_created is big, you may want to increase the thread_cache_size value. The cache miss rate can be calculated as Threads_created/Connections. |
DEPENDENT | mysql.threadscreated.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MySQL | MySQL: Threads running | Number of threads which are not sleeping. |
DEPENDENT | mysql.threads_running Preprocessing: - JSONPATH: |
MySQL | MySQL: Buffer pool efficiency | The item shows how effectively the buffer pool is serving reads. |
CALCULATED | mysql.bufferpoolefficiency Expression: last(//mysql.innodb_buffer_pool_reads) / ( last(//mysql.innodb_buffer_pool_read_requests) + ( last(//mysql.innodb_buffer_pool_read_requests) = 0 ) ) * 100 * ( last(//mysql.innodb_buffer_pool_read_requests) > 0 ) |
MySQL | MySQL: Buffer pool utilization | Ratio of used to total pages in the buffer pool. |
CALCULATED | mysql.bufferpoolutilization Expression: ( last(//mysql.innodb_buffer_pool_pages_total) - last(//mysql.innodb_buffer_pool_pages_free) ) / ( last(//mysql.innodb_buffer_pool_pages_total) + ( last(//mysql.innodb_buffer_pool_pages_total) = 0 ) ) * 100 * ( last(//mysql.innodb_buffer_pool_pages_total) > 0 ) |
MySQL | MySQL: Created tmp files on disk per second | How many temporary files mysqld has created. |
DEPENDENT | mysql.createdtmpfiles.rate Preprocessing: - JSONPATH: - CHANGEPERSECOND |
MySQL | MySQL: Created tmp tables on disk per second | Number of internal on-disk temporary tables created by the server while executing statements. |
DEPENDENT | mysql.createdtmpdisktables.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MySQL | MySQL: Created tmp tables on memory per second | Number of internal temporary tables created by the server while executing statements. |
DEPENDENT | mysql.createdtmptables.rate Preprocessing: - JSONPATH: - CHANGEPERSECOND |
MySQL | MySQL: InnoDB buffer pool pages free | The number of free pages in the InnoDB buffer pool. |
DEPENDENT | mysql.innodbbufferpoolpagesfree Preprocessing: - JSONPATH: |
MySQL | MySQL: InnoDB buffer pool pages total | The total size of the InnoDB buffer pool, in pages. |
DEPENDENT | mysql.innodbbufferpoolpagestotal Preprocessing: - JSONPATH: - DISCARDUNCHANGEDHEARTBEAT: |
MySQL | MySQL: InnoDB buffer pool read requests per second | Number of logical read requests per second. |
DEPENDENT | mysql.innodbbufferpoolreadrequests.rate Preprocessing: - JSONPATH: - CHANGEPERSECOND |
MySQL | MySQL: InnoDB buffer pool reads per second | Number of logical reads per second that InnoDB could not satisfy from the buffer pool, and had to read directly from the disk. |
DEPENDENT | mysql.innodbbufferpoolreads.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MySQL | MySQL: InnoDB row lock time | The total time spent in acquiring row locks for InnoDB tables, in milliseconds. |
DEPENDENT | mysql.innodbrowlocktime Preprocessing: - JSONPATH: - MULTIPLIER: - DISCARD_UNCHANGED_HEARTBEAT: 1h
MySQL | MySQL: InnoDB row lock time max | The maximum time to acquire a row lock for InnoDB tables, in milliseconds. |
DEPENDENT | mysql.innodbrowlocktimemax Preprocessing: - JSONPATH: - MULTIPLIER: - DISCARDUNCHANGEDHEARTBEAT: |
MySQL | MySQL: InnoDB row lock waits | Number of times operations on InnoDB tables had to wait for a row lock. |
DEPENDENT | mysql.innodbrowlock_waits Preprocessing: - JSONPATH: |
MySQL | MySQL: Slow queries per second | Number of queries that have taken more than long_query_time seconds. |
DEPENDENT | mysql.slowqueries.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MySQL | MySQL: Bytes received | Number of bytes received from all clients. |
DEPENDENT | mysql.bytesreceived.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MySQL | MySQL: Bytes sent | Number of bytes sent to all clients. |
DEPENDENT | mysql.bytessent.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MySQL | MySQL: Command Delete per second | The Com_delete counter variable indicates the number of times the delete statement has been executed. |
DEPENDENT | mysql.comdelete.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MySQL | MySQL: Command Insert per second | The Com_insert counter variable indicates the number of times the insert statement has been executed. |
DEPENDENT | mysql.cominsert.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MySQL | MySQL: Command Select per second | The Com_select counter variable indicates the number of times the select statement has been executed. |
DEPENDENT | mysql.comselect.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MySQL | MySQL: Command Update per second | The Com_update counter variable indicates the number of times the update statement has been executed. |
DEPENDENT | mysql.comupdate.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MySQL | MySQL: Queries per second | Number of statements executed by the server. This variable includes statements executed within stored programs, unlike the Questions variable. |
DEPENDENT | mysql.queries.rate Preprocessing: - JSONPATH: - CHANGEPERSECOND |
MySQL | MySQL: Questions per second | Number of statements executed by the server. This includes only statements sent to the server by clients and not statements executed within stored programs, unlike the Queries variable. |
DEPENDENT | mysql.questions.rate Preprocessing: - JSONPATH: - CHANGEPERSECOND |
MySQL | MySQL: Binlog cache disk use | Number of transactions that used a temporary disk cache because they could not fit in the regular binary log cache, being larger than binlog_cache_size. |
DEPENDENT | mysql.binlogcachediskuse Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 6h
MySQL | MySQL: Innodb buffer pool wait free | Number of times InnoDB waited for a free page before reading or creating a page. Normally, writes to the InnoDB buffer pool happen in the background. When no clean pages are available, dirty pages are flushed first in order to free some up. This counter tracks waits for that operation to finish. If this value is not small, consider increasing innodb_buffer_pool_size. |
DEPENDENT | mysql.innodbbufferpoolwaitfree Preprocessing: - JSONPATH: - DISCARDUNCHANGEDHEARTBEAT: |
MySQL | MySQL: Innodb number open files | Number of open files held by InnoDB. InnoDB only. |
DEPENDENT | mysql.innodbnumopenfiles Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 6h
MySQL | MySQL: Open table definitions | Number of cached table definitions. |
DEPENDENT | mysql.opentabledefinitions Preprocessing: - JSONPATH: - DISCARDUNCHANGEDHEARTBEAT: |
MySQL | MySQL: Open tables | Number of tables that are open. |
DEPENDENT | mysql.opentables Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 6h
MySQL | MySQL: Innodb log written | Number of bytes written to the InnoDB log. |
DEPENDENT | mysql.innodboslogwritten Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 6h
MySQL | MySQL: Calculated value of innodb_log_file_size | The innodb_log_file_size value estimated as (innodb_os_log_written - innodb_os_log_written(time shift -1h)) / {$MYSQL.INNODB_LOG_FILES}. innodb_log_file_size is the size in bytes of each InnoDB redo log file in the log group. The combined size can be no more than 512GB. Larger values mean less disk I/O due to less checkpoint flushing activity, but also slower recovery from a crash. |
CALCULATED | mysql.innodblogfilesize Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: 6h Expression: (last(//mysql.innodb_os_log_written) - last(//mysql.innodb_os_log_written,#1:now-1h)) / {$MYSQL.INNODB_LOG_FILES}
MySQL | MySQL: Size of database {#DATABASE} | - |
ODBC | db.odbc.select[{#DATABASE}size,"{$MYSQL.DSN}"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: 1h Expression: The text is too long. Please see the template. |
MySQL | MySQL: Replication Slave SQL Running State {#MASTER_HOST} | This shows the state of the SQL driver threads. |
DEPENDENT | mysql.slavesqlrunningstate["{#MASTERHOST}"] Preprocessing: - JSONPATH: - DISCARDUNCHANGEDHEARTBEAT: |
MySQL | MySQL: Replication Seconds Behind Master {#MASTER_HOST} | The number of seconds the slave SQL thread is behind processing the master binary log. A high number (or an increasing one) can indicate that the slave is unable to handle events from the master in a timely fashion. |
DEPENDENT | mysql.secondsbehindmaster["{#MASTERHOST}"] Preprocessing: - JSONPATH: - MATCHES_REGEX: \d+ ⛔️ON_FAIL: - DISCARD_UNCHANGED_HEARTBEAT: |
MySQL | MySQL: Replication Slave IO Running {#MASTER_HOST} | Whether the I/O thread for reading the master's binary log is running. Normally, you want this to be Yes unless you have not yet started a replication or have explicitly stopped it with STOP SLAVE. |
DEPENDENT | mysql.slaveiorunning["{#MASTERHOST}"] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 1h
MySQL | MySQL: Replication Slave SQL Running {#MASTER_HOST} | Whether the SQL thread for executing events in the relay log is running. As with the I/O thread, this should normally be Yes. |
DEPENDENT | mysql.slave_sql_running["{#MASTER_HOST}"] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 1h |
MySQL | MySQL: Binlog commits | Total number of transactions committed to the binary log. |
DEPENDENT | mysql.binlog_commits[{#SINGLETON}] Preprocessing: - JSONPATH: |
MySQL | MySQL: Binlog group commits | Total number of group commits done to the binary log. |
DEPENDENT | mysql.binlog_group_commits[{#SINGLETON}] Preprocessing: - JSONPATH: |
MySQL | MySQL: Master GTID wait count | The number of times MASTER_GTID_WAIT was called. |
DEPENDENT | mysql.master_gtid_wait_count[{#SINGLETON}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 6h |
MySQL | MySQL: Master GTID wait time | Total time spent in MASTER_GTID_WAIT. |
DEPENDENT | mysql.master_gtid_wait_time[{#SINGLETON}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 6h |
MySQL | MySQL: Master GTID wait timeouts | Number of timeouts occurring in MASTER_GTID_WAIT. |
DEPENDENT | mysql.master_gtid_wait_timeouts[{#SINGLETON}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 6h |
Zabbix raw items | MySQL: Get status variables | The item gets server global status information. |
ODBC | db.odbc.get[get_status_variables,"{$MYSQL.DSN}"] Expression: show global status |
Zabbix raw items | MySQL: InnoDB buffer pool read requests | Number of logical read requests. |
DEPENDENT | mysql.innodb_buffer_pool_read_requests Preprocessing: - JSONPATH: |
Zabbix raw items | MySQL: InnoDB buffer pool reads | Number of logical reads that InnoDB could not satisfy from the buffer pool, and had to read directly from the disk. |
DEPENDENT | mysql.innodb_buffer_pool_reads Preprocessing: - JSONPATH: |
Zabbix raw items | MySQL: Replication Slave status {#MASTER_HOST} | The item gets status information on the essential parameters of the slave threads. |
ODBC | db.odbc.get["{#MASTER_HOST}","{$MYSQL.DSN}"] Expression: show slave status |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
MySQL: Service is down | - |
last(/MySQL by ODBC/db.odbc.select[ping,"{$MYSQL.DSN}"])=0 |
HIGH | |
MySQL: Version has changed | MySQL version has changed. Ack to close. |
last(/MySQL by ODBC/db.odbc.select[version,"{$MYSQL.DSN}"],#1)<>last(/MySQL by ODBC/db.odbc.select[version,"{$MYSQL.DSN}"],#2) and length(last(/MySQL by ODBC/db.odbc.select[version,"{$MYSQL.DSN}"]))>0 |
INFO | Manual close: YES |
MySQL: Service has been restarted | MySQL uptime is less than 10 minutes. |
last(/MySQL by ODBC/mysql.uptime)<10m |
INFO | |
MySQL: Failed to fetch info data | Zabbix has not received data for items for the last 30 minutes. |
nodata(/MySQL by ODBC/mysql.uptime,30m)=1 |
INFO | Depends on: - MySQL: Service is down |
MySQL: Server has aborted connections | The number of failed attempts to connect to the MySQL server is more than {$MYSQL.ABORTED_CONN.MAX.WARN} in the last 5 minutes. |
min(/MySQL by ODBC/mysql.aborted_connects.rate,5m)>{$MYSQL.ABORTED_CONN.MAX.WARN} |
AVERAGE | Depends on: - MySQL: Refused connections |
MySQL: Refused connections | Number of refused connections due to the max_connections limit being reached. |
last(/MySQL by ODBC/mysql.connection_errors_max_connections.rate)>0 |
AVERAGE | |
MySQL: Buffer pool utilization is too low | The buffer pool utilization is less than {$MYSQL.BUFF_UTIL.MIN.WARN}% in the last 5 minutes. This means that there is a lot of unused RAM allocated for the buffer pool, which you can easily reallocate at the moment. |
max(/MySQL by ODBC/mysql.buffer_pool_utilization,5m)<{$MYSQL.BUFF_UTIL.MIN.WARN} |
WARNING | |
MySQL: Number of temporary files created per second is high | Possibly the application using the database is in need of query optimization. |
min(/MySQL by ODBC/mysql.created_tmp_files.rate,5m)>{$MYSQL.CREATED_TMP_FILES.MAX.WARN} |
WARNING | |
MySQL: Number of on-disk temporary tables created per second is high | Possibly the application using the database is in need of query optimization. |
min(/MySQL by ODBC/mysql.created_tmp_disk_tables.rate,5m)>{$MYSQL.CREATED_TMP_DISK_TABLES.MAX.WARN} |
WARNING | |
MySQL: Number of internal temporary tables created per second is high | Possibly the application using the database is in need of query optimization. |
min(/MySQL by ODBC/mysql.created_tmp_tables.rate,5m)>{$MYSQL.CREATED_TMP_TABLES.MAX.WARN} |
WARNING | |
MySQL: Server has slow queries | The number of slow queries is more than {$MYSQL.SLOW_QUERIES.MAX.WARN} in the last 5 minutes. |
min(/MySQL by ODBC/mysql.slow_queries.rate,5m)>{$MYSQL.SLOW_QUERIES.MAX.WARN} |
WARNING | |
MySQL: Replication lag is too high | - |
min(/MySQL by ODBC/mysql.seconds_behind_master["{#MASTER_HOST}"],5m)>{$MYSQL.REPL_LAG.MAX.WARN} |
WARNING | |
MySQL: The slave I/O thread is not running | Whether the I/O thread for reading the master's binary log is running. |
count(/MySQL by ODBC/mysql.slave_io_running["{#MASTER_HOST}"],#1,"eq","No")=1 |
AVERAGE | |
MySQL: The slave I/O thread is not connected to a replication master | - |
count(/MySQL by ODBC/mysql.slave_io_running["{#MASTER_HOST}"],#1,"ne","Yes")=1 |
WARNING | Depends on: - MySQL: The slave I/O thread is not running |
MySQL: The SQL thread is not running | Whether the SQL thread for executing events in the relay log is running. |
count(/MySQL by ODBC/mysql.slave_sql_running["{#MASTER_HOST}"],#1,"eq","No")=1 |
WARNING | Depends on: - MySQL: The slave I/O thread is not running |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template or ask for help with it at ZABBIX forums.
For Zabbix version: 6.2 and higher
The template is developed for monitoring DBMS MySQL and its forks.
This template was tested on:
See Zabbix template operation for basic instructions.
Create a MySQL user for monitoring (set <password> at your discretion):
CREATE USER 'zbx_monitor'@'%' IDENTIFIED BY '<password>';
GRANT REPLICATION CLIENT,PROCESS,SHOW DATABASES,SHOW VIEW ON *.* TO 'zbx_monitor'@'%';
For more information, please see MySQL documentation https://dev.mysql.com/doc/refman/8.0/en/grant.html
Set the data source name of the MySQL instance in the {$MYSQL.DSN} macro: either a session name from the Zabbix agent 2 configuration file or a URI. Examples: MySQL1, tcp://localhost:3306, tcp://172.16.0.10, unix:/var/run/mysql.sock. For more information about the MySQL Unix socket file, see the MySQL documentation https://dev.mysql.com/doc/refman/8.0/en/problems-with-mysql-sock.html.
If you set a URI in {$MYSQL.DSN}, define the user name and password in the host macros {$MYSQL.USER} and {$MYSQL.PASSWORD}. If you use a session name, leave {$MYSQL.USER} and {$MYSQL.PASSWORD} empty and set the user name and password in the Plugins.Mysql.<...> section of your Zabbix agent 2 configuration file. For more information about configuring the Zabbix MySQL plugin, see the documentation https://git.zabbix.com/projects/ZBX/repos/zabbix/browse/src/go/plugins/mysql/README.md.
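For the session-name case, a sketch of what the Zabbix agent 2 configuration could look like. The session name MySQL1 and the URI below are illustrative, and the key names follow the Plugins.Mysql.<...> convention referenced above; verify the exact names against your agent version's MySQL plugin documentation:

```ini
# Hypothetical session named "MySQL1" in zabbix_agent2.conf.
# Key names are assumed from the Plugins.Mysql.<...> convention; check
# the plugin README linked above for the authoritative spelling.
Plugins.Mysql.Sessions.MySQL1.Uri=tcp://172.16.0.10:3306
Plugins.Mysql.Sessions.MySQL1.User=zbx_monitor
Plugins.Mysql.Sessions.MySQL1.Password=<password>
```

With {$MYSQL.DSN} set to MySQL1, the {$MYSQL.USER} and {$MYSQL.PASSWORD} macros stay empty.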
No specific Zabbix configuration is required.
Name | Description | Default |
---|---|---|
{$MYSQL.ABORTED_CONN.MAX.WARN} | Number of failed attempts to connect to the MySQL server for trigger expression. |
3 |
{$MYSQL.BUFF_UTIL.MIN.WARN} | The minimum buffer pool utilization percentage for trigger expression. |
50 |
{$MYSQL.CREATED_TMP_DISK_TABLES.MAX.WARN} | The maximum number of created tmp tables on a disk per second for trigger expressions. |
10 |
{$MYSQL.CREATED_TMP_FILES.MAX.WARN} | The maximum number of created tmp files on a disk per second for trigger expressions. |
10 |
{$MYSQL.CREATED_TMP_TABLES.MAX.WARN} | The maximum number of created tmp tables in memory per second for trigger expressions. |
30 |
{$MYSQL.DSN} | System data source name such as |
<Put your DSN> |
{$MYSQL.INNODB_LOG_FILES} | Number of physical files in the InnoDB redo log for calculating innodb_log_file_size. |
2 |
{$MYSQL.PASSWORD} | MySQL user password. |
`` |
{$MYSQL.REPL_LAG.MAX.WARN} | The lag of slave from master for trigger expression. |
30m |
{$MYSQL.SLOW_QUERIES.MAX.WARN} | The number of slow queries for trigger expression. |
3 |
{$MYSQL.USER} | MySQL user name. |
`` |
There are no template links in this template.
Name | Description | Type | Key and additional info |
---|---|---|---|
Database discovery | Scanning databases in DBMS. |
ZABBIX_PASSIVE | mysql.db.discovery["{$MYSQL.DSN}","{$MYSQL.USER}","{$MYSQL.PASSWORD}"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: Filter: AND_OR - {#DATABASE} NOT MATCHES_REGEX information_schema |
MariaDB discovery | Additional metrics if MariaDB is used. |
DEPENDENT | mysql.extra_metric.discovery Preprocessing: - JAVASCRIPT: |
Replication discovery | If "show slave status" returns Master_Host, "Replication: *" items are created. |
ZABBIX_PASSIVE | mysql.replication.discovery["{$MYSQL.DSN}","{$MYSQL.USER}","{$MYSQL.PASSWORD}"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
MySQL | MySQL: Status | ZABBIX_PASSIVE | mysql.ping["{$MYSQL.DSN}","{$MYSQL.USER}","{$MYSQL.PASSWORD}"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
|
MySQL | MySQL: Version | ZABBIX_PASSIVE | mysql.version["{$MYSQL.DSN}","{$MYSQL.USER}","{$MYSQL.PASSWORD}"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
|
MySQL | MySQL: Uptime | The number of seconds that the server has been up. |
DEPENDENT | mysql.uptime Preprocessing: - JSONPATH: |
MySQL | MySQL: Aborted clients per second | Number of connections that were aborted because the client died without closing the connection properly. |
DEPENDENT | mysql.aborted_clients.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
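All of the "per second" items here are monotonically growing status counters run through Zabbix's change-per-second preprocessing. A minimal sketch of that computation, with made-up sample values:

```python
def change_per_second(prev_value, prev_ts, cur_value, cur_ts):
    """Rate between two samples of a growing counter, as change-per-second
    preprocessing computes it: value delta divided by elapsed seconds."""
    seconds = cur_ts - prev_ts
    if seconds <= 0:
        raise ValueError("samples must be strictly ordered in time")
    return (cur_value - prev_value) / seconds

# Hypothetical: Aborted_clients went from 120 to 150 over a 60 s interval.
rate = change_per_second(120, 1000, 150, 1060)
print(rate)  # 0.5 aborted clients per second
```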
MySQL | MySQL: Aborted connections per second | Number of failed attempts to connect to the MySQL server. |
DEPENDENT | mysql.aborted_connects.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MySQL | MySQL: Connection errors accept per second | Number of errors that occurred during calls to accept() on the listening port. |
DEPENDENT | mysql.connection_errors_accept.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MySQL | MySQL: Connection errors internal per second | Number of refused connections due to internal server errors, for example, out of memory errors, or failed thread starts. |
DEPENDENT | mysql.connection_errors_internal.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MySQL | MySQL: Connection errors max connections per second | Number of refused connections due to the max_connections limit being reached. |
DEPENDENT | mysql.connection_errors_max_connections.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MySQL | MySQL: Connection errors peer address per second | Number of errors while searching for the connecting client IP address. |
DEPENDENT | mysql.connection_errors_peer_address.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MySQL | MySQL: Connection errors select per second | Number of errors during calls to select() or poll() on the listening port. The client would not necessarily have been rejected in these cases. |
DEPENDENT | mysql.connection_errors_select.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MySQL | MySQL: Connection errors tcpwrap per second | Number of connections the libwrap library has refused. |
DEPENDENT | mysql.connection_errors_tcpwrap.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MySQL | MySQL: Connections per second | Number of connection attempts (successful or not) to the MySQL server. |
DEPENDENT | mysql.connections.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MySQL | MySQL: Max used connections | The maximum number of connections that have been in use simultaneously since the server start. |
DEPENDENT | mysql.max_used_connections Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
MySQL | MySQL: Threads cached | Number of threads in the thread cache. |
DEPENDENT | mysql.threads_cached Preprocessing: - JSONPATH: |
MySQL | MySQL: Threads connected | Number of currently open connections. |
DEPENDENT | mysql.threads_connected Preprocessing: - JSONPATH: |
MySQL | MySQL: Threads created per second | Number of threads created to handle connections. If Threads_created is big, you may want to increase the thread_cache_size value. The cache miss rate can be calculated as Threads_created/Connections. |
DEPENDENT | mysql.threads_created.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
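As a sketch of the cache miss rate mentioned above (Threads_created divided by Connections), with hypothetical counter values:

```python
def thread_cache_miss_rate(threads_created, connections):
    """Fraction of connections that required creating a new thread
    (Threads_created / Connections); 0.0 means every connection was
    served from the thread cache."""
    if connections == 0:
        return 0.0  # no connections yet, nothing to report
    return threads_created / connections

# Hypothetical counters: 25 threads created over 1000 connections.
print(thread_cache_miss_rate(25, 1000))  # 0.025
```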
MySQL | MySQL: Threads running | Number of threads which are not sleeping. |
DEPENDENT | mysql.threads_running Preprocessing: - JSONPATH: |
MySQL | MySQL: Buffer pool efficiency | The item shows how effectively the buffer pool is serving reads. |
CALCULATED | mysql.buffer_pool_efficiency Expression: last(//mysql.innodb_buffer_pool_reads) / ( last(//mysql.innodb_buffer_pool_read_requests) + ( last(//mysql.innodb_buffer_pool_read_requests) = 0 ) ) * 100 * ( last(//mysql.innodb_buffer_pool_read_requests) > 0 ) |
MySQL | MySQL: Buffer pool utilization | Ratio of used to total pages in the buffer pool. |
CALCULATED | mysql.buffer_pool_utilization Expression: ( last(//mysql.innodb_buffer_pool_pages_total) - last(//mysql.innodb_buffer_pool_pages_free) ) / ( last(//mysql.innodb_buffer_pool_pages_total) + ( last(//mysql.innodb_buffer_pool_pages_total) = 0 ) ) * 100 * ( last(//mysql.innodb_buffer_pool_pages_total) > 0 ) |
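Both calculated expressions above use the same trick to avoid division by zero: adding (denominator = 0) to the denominator turns a zero into a one, and multiplying by (denominator > 0) forces the result to zero when there is no data yet. A Python sketch mirroring the utilization formula:

```python
def buffer_pool_utilization(pages_total, pages_free):
    """Mirror of the calculated expression:
    (total - free) / (total + (total = 0)) * 100 * (total > 0)."""
    denom = pages_total + (1 if pages_total == 0 else 0)  # (total = 0) guard
    gate = 1 if pages_total > 0 else 0                    # (total > 0) gate
    return (pages_total - pages_free) / denom * 100 * gate

# Illustrative values: 8192 pages total, 2048 free -> 75% utilized.
print(buffer_pool_utilization(8192, 2048))  # 75.0
print(buffer_pool_utilization(0, 0))        # 0.0 -> no data yet, not an error
```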
MySQL | MySQL: Created tmp files on disk per second | How many temporary files mysqld has created. |
DEPENDENT | mysql.created_tmp_files.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MySQL | MySQL: Created tmp tables on disk per second | Number of internal on-disk temporary tables created by the server while executing statements. |
DEPENDENT | mysql.created_tmp_disk_tables.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MySQL | MySQL: Created tmp tables on memory per second | Number of internal temporary tables created by the server while executing statements. |
DEPENDENT | mysql.created_tmp_tables.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MySQL | MySQL: InnoDB buffer pool pages free | The number of free pages in the InnoDB buffer pool. |
DEPENDENT | mysql.innodb_buffer_pool_pages_free Preprocessing: - JSONPATH: |
MySQL | MySQL: InnoDB buffer pool pages total | The total size of the InnoDB buffer pool, in pages. |
DEPENDENT | mysql.innodb_buffer_pool_pages_total Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
MySQL | MySQL: InnoDB buffer pool read requests per second | Number of logical read requests per second. |
DEPENDENT | mysql.innodb_buffer_pool_read_requests.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MySQL | MySQL: InnoDB buffer pool reads per second | Number of logical reads per second that InnoDB could not satisfy from the buffer pool, and had to read directly from the disk. |
DEPENDENT | mysql.innodb_buffer_pool_reads.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MySQL | MySQL: InnoDB row lock time | The total time spent in acquiring row locks for InnoDB tables, in milliseconds. |
DEPENDENT | mysql.innodb_row_lock_time Preprocessing: - JSONPATH: - MULTIPLIER: - DISCARD_UNCHANGED_HEARTBEAT: 1h |
MySQL | MySQL: InnoDB row lock time max | The maximum time to acquire a row lock for InnoDB tables, in milliseconds. |
DEPENDENT | mysql.innodb_row_lock_time_max Preprocessing: - JSONPATH: - MULTIPLIER: - DISCARD_UNCHANGED_HEARTBEAT: |
MySQL | MySQL: InnoDB row lock waits | Number of times operations on InnoDB tables had to wait for a row lock. |
DEPENDENT | mysql.innodb_row_lock_waits Preprocessing: - JSONPATH: |
MySQL | MySQL: Slow queries per second | Number of queries that have taken more than long_query_time seconds. |
DEPENDENT | mysql.slow_queries.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MySQL | MySQL: Bytes received | Number of bytes received from all clients. |
DEPENDENT | mysql.bytes_received.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MySQL | MySQL: Bytes sent | Number of bytes sent to all clients. |
DEPENDENT | mysql.bytes_sent.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MySQL | MySQL: Command Delete per second | The Com_delete counter variable indicates the number of times the delete statement has been executed. |
DEPENDENT | mysql.com_delete.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MySQL | MySQL: Command Insert per second | The Com_insert counter variable indicates the number of times the insert statement has been executed. |
DEPENDENT | mysql.com_insert.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MySQL | MySQL: Command Select per second | The Com_select counter variable indicates the number of times the select statement has been executed. |
DEPENDENT | mysql.com_select.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MySQL | MySQL: Command Update per second | The Com_update counter variable indicates the number of times the update statement has been executed. |
DEPENDENT | mysql.com_update.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MySQL | MySQL: Queries per second | Number of statements executed by the server. This variable includes statements executed within stored programs, unlike the Questions variable. |
DEPENDENT | mysql.queries.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MySQL | MySQL: Questions per second | Number of statements executed by the server. This includes only statements sent to the server by clients and not statements executed within stored programs, unlike the Queries variable. |
DEPENDENT | mysql.questions.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MySQL | MySQL: Binlog cache disk use | Number of transactions that used a temporary disk cache because they could not fit in the regular binary log cache, being larger than binlog_cache_size. |
DEPENDENT | mysql.binlog_cache_disk_use Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 6h |
MySQL | MySQL: Innodb buffer pool wait free | Number of times InnoDB waited for a free page before reading or creating a page. Normally, writes to the InnoDB buffer pool happen in the background. When no clean pages are available, dirty pages are flushed first in order to free some up. This counts the number of waits for this operation to finish. If this value is not small, look at increasing innodb_buffer_pool_size. |
DEPENDENT | mysql.innodb_buffer_pool_wait_free Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
MySQL | MySQL: Innodb number open files | Number of open files held by InnoDB. InnoDB only. |
DEPENDENT | mysql.innodb_num_open_files Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 6h |
MySQL | MySQL: Open table definitions | Number of cached table definitions. |
DEPENDENT | mysql.open_table_definitions Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
MySQL | MySQL: Open tables | Number of tables that are open. |
DEPENDENT | mysql.open_tables Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 6h |
MySQL | MySQL: Innodb log written | Number of bytes written to the InnoDB log. |
DEPENDENT | mysql.innodb_os_log_written Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 6h |
MySQL | MySQL: Calculated value of innodb_log_file_size | The innodb_log_file_size value calculated as (innodb_os_log_written - innodb_os_log_written(time shift -1h)) / {$MYSQL.INNODB_LOG_FILES}. Innodb_log_file_size is the size in bytes of each InnoDB redo log file in the log group. The combined size can be no more than 512GB. Larger values mean less disk I/O due to less flushing checkpoint activity, but also slower recovery from a crash. |
CALCULATED | mysql.innodb_log_file_size Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: 6h Expression: (last(//mysql.innodb_os_log_written) - last(//mysql.innodb_os_log_written,#1:now-1h)) / {$MYSQL.INNODB_LOG_FILES} |
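The calculated item above sizes the redo log from one hour of write activity: bytes written in the last hour divided by the number of log files. A Python sketch of the same arithmetic, with illustrative byte counts:

```python
def innodb_log_file_size_estimate(written_now, written_1h_ago, log_files=2):
    """Bytes of redo written in the last hour, spread over the physical
    log files (default 2, matching {$MYSQL.INNODB_LOG_FILES}), mirroring
    (last(written) - last(written, -1h)) / {$MYSQL.INNODB_LOG_FILES}."""
    return (written_now - written_1h_ago) / log_files

# Hypothetical: 3 GiB of redo written in the last hour across 2 log files.
size = innodb_log_file_size_estimate(10 * 2**30, 7 * 2**30)
print(size / 2**20)  # 1536.0 -> each log file sized for ~1.5 GiB
```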
MySQL | MySQL: Size of database {#DATABASE} | - |
ZABBIX_PASSIVE | mysql.db.size["{$MYSQL.DSN}","{$MYSQL.USER}","{$MYSQL.PASSWORD}","{#DATABASE}"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
MySQL | MySQL: Replication Slave SQL Running State {#MASTER_HOST} | This shows the state of the SQL driver threads. |
DEPENDENT | mysql.replication.slave_sql_running_state["{#MASTER_HOST}"] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
MySQL | MySQL: Replication Seconds Behind Master {#MASTER_HOST} | Number of seconds that the slave SQL thread is behind processing the master binary log. A high number (or an increasing one) can indicate that the slave is unable to handle events from the master in a timely fashion. |
DEPENDENT | mysql.replication.seconds_behind_master["{#MASTER_HOST}"] Preprocessing: - JSONPATH: - MATCHES_REGEX: \d+ ⛔️ON_FAIL: - DISCARD_UNCHANGED_HEARTBEAT: |
MySQL | MySQL: Replication Slave IO Running {#MASTER_HOST} | Whether the I/O thread for reading the master's binary log is running. Normally, you want this to be Yes unless you have not yet started a replication or have explicitly stopped it with STOP SLAVE. |
DEPENDENT | mysql.replication.slave_io_running["{#MASTER_HOST}"] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 1h |
MySQL | MySQL: Replication Slave SQL Running {#MASTER_HOST} | Whether the SQL thread for executing events in the relay log is running. As with the I/O thread, this should normally be Yes. |
DEPENDENT | mysql.replication.slave_sql_running["{#MASTER_HOST}"] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 1h |
MySQL | MySQL: Binlog commits | Total number of transactions committed to the binary log. |
DEPENDENT | mysql.binlog_commits[{#SINGLETON}] Preprocessing: - JSONPATH: |
MySQL | MySQL: Binlog group commits | Total number of group commits done to the binary log. |
DEPENDENT | mysql.binlog_group_commits[{#SINGLETON}] Preprocessing: - JSONPATH: |
MySQL | MySQL: Master GTID wait count | The number of times MASTER_GTID_WAIT was called. |
DEPENDENT | mysql.master_gtid_wait_count[{#SINGLETON}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 6h |
MySQL | MySQL: Master GTID wait time | Total time spent in MASTER_GTID_WAIT. |
DEPENDENT | mysql.master_gtid_wait_time[{#SINGLETON}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 6h |
MySQL | MySQL: Master GTID wait timeouts | Number of timeouts occurring in MASTER_GTID_WAIT. |
DEPENDENT | mysql.master_gtid_wait_timeouts[{#SINGLETON}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 6h |
Zabbix raw items | MySQL: Get status variables | The item gets server global status information. |
ZABBIX_PASSIVE | mysql.get_status_variables["{$MYSQL.DSN}","{$MYSQL.USER}","{$MYSQL.PASSWORD}"] |
Zabbix raw items | MySQL: InnoDB buffer pool read requests | Number of logical read requests. |
DEPENDENT | mysql.innodb_buffer_pool_read_requests Preprocessing: - JSONPATH: |
Zabbix raw items | MySQL: InnoDB buffer pool reads | Number of logical reads that InnoDB could not satisfy from the buffer pool, and had to read directly from the disk. |
DEPENDENT | mysql.innodb_buffer_pool_reads Preprocessing: - JSONPATH: |
Zabbix raw items | MySQL: Replication Slave status {#MASTER_HOST} | The item gets status information on the essential parameters of the slave threads. |
ZABBIX_PASSIVE | mysql.replication.get_slave_status["{$MYSQL.DSN}","{$MYSQL.USER}","{$MYSQL.PASSWORD}","{#MASTER_HOST}"] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
MySQL: Service is down | - |
last(/MySQL by Zabbix agent 2/mysql.ping["{$MYSQL.DSN}","{$MYSQL.USER}","{$MYSQL.PASSWORD}"])=0 |
HIGH | |
MySQL: Version has changed | MySQL version has changed. Ack to close. |
last(/MySQL by Zabbix agent 2/mysql.version["{$MYSQL.DSN}","{$MYSQL.USER}","{$MYSQL.PASSWORD}"],#1)<>last(/MySQL by Zabbix agent 2/mysql.version["{$MYSQL.DSN}","{$MYSQL.USER}","{$MYSQL.PASSWORD}"],#2) and length(last(/MySQL by Zabbix agent 2/mysql.version["{$MYSQL.DSN}","{$MYSQL.USER}","{$MYSQL.PASSWORD}"]))>0 |
INFO | Manual close: YES |
MySQL: Service has been restarted | MySQL uptime is less than 10 minutes. |
last(/MySQL by Zabbix agent 2/mysql.uptime)<10m |
INFO | |
MySQL: Failed to fetch info data | Zabbix has not received data for items for the last 30 minutes. |
nodata(/MySQL by Zabbix agent 2/mysql.uptime,30m)=1 |
INFO | Depends on: - MySQL: Service is down |
MySQL: Server has aborted connections | The number of failed attempts to connect to the MySQL server is more than {$MYSQL.ABORTED_CONN.MAX.WARN} in the last 5 minutes. |
min(/MySQL by Zabbix agent 2/mysql.aborted_connects.rate,5m)>{$MYSQL.ABORTED_CONN.MAX.WARN} |
AVERAGE | Depends on: - MySQL: Refused connections |
MySQL: Refused connections | Number of refused connections due to the max_connections limit being reached. |
last(/MySQL by Zabbix agent 2/mysql.connection_errors_max_connections.rate)>0 |
AVERAGE | |
MySQL: Buffer pool utilization is too low | The buffer pool utilization is less than {$MYSQL.BUFF_UTIL.MIN.WARN}% in the last 5 minutes. This means that there is a lot of unused RAM allocated for the buffer pool, which you can easily reallocate at the moment. |
max(/MySQL by Zabbix agent 2/mysql.buffer_pool_utilization,5m)<{$MYSQL.BUFF_UTIL.MIN.WARN} |
WARNING | |
MySQL: Number of temporary files created per second is high | Possibly the application using the database is in need of query optimization. |
min(/MySQL by Zabbix agent 2/mysql.created_tmp_files.rate,5m)>{$MYSQL.CREATED_TMP_FILES.MAX.WARN} |
WARNING | |
MySQL: Number of on-disk temporary tables created per second is high | Possibly the application using the database is in need of query optimization. |
min(/MySQL by Zabbix agent 2/mysql.created_tmp_disk_tables.rate,5m)>{$MYSQL.CREATED_TMP_DISK_TABLES.MAX.WARN} |
WARNING | |
MySQL: Number of internal temporary tables created per second is high | Possibly the application using the database is in need of query optimization. |
min(/MySQL by Zabbix agent 2/mysql.created_tmp_tables.rate,5m)>{$MYSQL.CREATED_TMP_TABLES.MAX.WARN} |
WARNING | |
MySQL: Server has slow queries | The number of slow queries is more than {$MYSQL.SLOW_QUERIES.MAX.WARN} in the last 5 minutes. |
min(/MySQL by Zabbix agent 2/mysql.slow_queries.rate,5m)>{$MYSQL.SLOW_QUERIES.MAX.WARN} |
WARNING | |
MySQL: Replication lag is too high | - |
min(/MySQL by Zabbix agent 2/mysql.replication.seconds_behind_master["{#MASTER_HOST}"],5m)>{$MYSQL.REPL_LAG.MAX.WARN} |
WARNING | |
MySQL: The slave I/O thread is not running | Whether the I/O thread for reading the master's binary log is running. |
count(/MySQL by Zabbix agent 2/mysql.replication.slave_io_running["{#MASTER_HOST}"],#1,"eq","No")=1 |
AVERAGE | |
MySQL: The slave I/O thread is not connected to a replication master | - |
count(/MySQL by Zabbix agent 2/mysql.replication.slave_io_running["{#MASTER_HOST}"],#1,"ne","Yes")=1 |
WARNING | Depends on: - MySQL: The slave I/O thread is not running |
MySQL: The SQL thread is not running | Whether the SQL thread for executing events in the relay log is running. |
count(/MySQL by Zabbix agent 2/mysql.replication.slave_sql_running["{#MASTER_HOST}"],#1,"eq","No")=1 |
WARNING | Depends on: - MySQL: The slave I/O thread is not running |
For Zabbix version: 6.2 and higher
The template is developed for monitoring DBMS MySQL and its forks.
This template was tested on:
See Zabbix template operation for basic instructions.
Add the mysql and mysqladmin utilities to the global environment variable PATH. Copy template_db_mysql.conf into the folder with the Zabbix agent configuration (/etc/zabbix/zabbix_agentd.d/ by default). Don't forget to restart the Zabbix agent.
Create a MySQL user for monitoring (set <password> at your discretion):
CREATE USER 'zbx_monitor'@'%' IDENTIFIED BY '<password>';
GRANT REPLICATION CLIENT,PROCESS,SHOW DATABASES,SHOW VIEW ON *.* TO 'zbx_monitor'@'%';
For more information, please see MySQL documentation https://dev.mysql.com/doc/refman/8.0/en/grant.html
Create the file .my.cnf in the home directory of the Zabbix agent for Linux (/var/lib/zabbix by default) or my.cnf in c:\ for Windows. The file must contain the following three lines:
[client]
user='zbx_monitor'
password='<password>'
NOTE: Use systemd to start the Zabbix agent on Linux. For example, on CentOS use "systemctl edit zabbix-agent.service" to set the required user for starting the Zabbix agent.
Add the rule to the SELinux policy (example for CentOS):
# cat <<EOF > zabbix_home.te
module zabbix_home 1.0;
require {
type zabbix_agent_t;
type zabbix_var_lib_t;
type mysqld_etc_t;
type mysqld_port_t;
type mysqld_var_run_t;
class file { open read };
class tcp_socket name_connect;
class sock_file write;
}
#============= zabbix_agent_t ==============
allow zabbix_agent_t zabbix_var_lib_t:file read;
allow zabbix_agent_t zabbix_var_lib_t:file open;
allow zabbix_agent_t mysqld_etc_t:file read;
allow zabbix_agent_t mysqld_port_t:tcp_socket name_connect;
allow zabbix_agent_t mysqld_var_run_t:sock_file write;
EOF
# checkmodule -M -m -o zabbix_home.mod zabbix_home.te
# semodule_package -o zabbix_home.pp -m zabbix_home.mod
# semodule -i zabbix_home.pp
# restorecon -R /var/lib/zabbix
No specific Zabbix configuration is required.
Name | Description | Default |
---|---|---|
{$MYSQL.ABORTED_CONN.MAX.WARN} | The number of failed attempts to connect to the MySQL server for trigger expression. |
3 |
{$MYSQL.BUFF_UTIL.MIN.WARN} | The minimum buffer pool utilization in percentage for trigger expression. |
50 |
{$MYSQL.CREATED_TMP_DISK_TABLES.MAX.WARN} | The maximum number of created tmp tables on a disk per second for trigger expressions. |
10 |
{$MYSQL.CREATED_TMP_FILES.MAX.WARN} | The maximum number of created tmp files on a disk per second for trigger expressions. |
10 |
{$MYSQL.CREATED_TMP_TABLES.MAX.WARN} | The maximum number of created tmp tables in memory per second for trigger expressions. |
30 |
{$MYSQL.HOST} | Hostname or IP of MySQL host or container. |
127.0.0.1 |
{$MYSQL.INNODB_LOG_FILES} | Number of physical files in the InnoDB redo log for calculating innodb_log_file_size. |
2 |
{$MYSQL.PORT} | MySQL service port. |
3306 |
{$MYSQL.REPL_LAG.MAX.WARN} | The lag of slave from master for trigger expression. |
30m |
{$MYSQL.SLOW_QUERIES.MAX.WARN} | The number of slow queries for trigger expression. |
3 |
There are no template links in this template.
Name | Description | Type | Key and additional info |
---|---|---|---|
Database discovery | Scanning databases in DBMS. |
ZABBIX_PASSIVE | mysql.db.discovery["{$MYSQL.HOST}","{$MYSQL.PORT}"] Preprocessing: - JAVASCRIPT: - DISCARD_UNCHANGED_HEARTBEAT: Filter: AND_OR - {#DBNAME} NOT MATCHES_REGEX information_schema |
MariaDB discovery | Additional metrics if MariaDB is used. |
DEPENDENT | mysql.extra_metric.discovery Preprocessing: - JAVASCRIPT: |
Replication discovery | If "show slave status" returns Master_Host, "Replication: *" items are created. |
ZABBIX_PASSIVE | mysql.replication.discovery["{$MYSQL.HOST}","{$MYSQL.PORT}"] Preprocessing: - JAVASCRIPT: - DISCARD_UNCHANGED_HEARTBEAT: |
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
MySQL | MySQL: Status | ZABBIX_PASSIVE | mysql.ping["{$MYSQL.HOST}","{$MYSQL.PORT}"] Preprocessing: - JAVASCRIPT: - DISCARD_UNCHANGED_HEARTBEAT: |
|
MySQL | MySQL: Version | ZABBIX_PASSIVE | mysql.version["{$MYSQL.HOST}","{$MYSQL.PORT}"] Preprocessing: - REGEX: - DISCARD_UNCHANGED_HEARTBEAT: |
|
MySQL | MySQL: Uptime | The number of seconds that the server has been up. |
DEPENDENT | mysql.uptime Preprocessing: - XMLPATH: |
MySQL | MySQL: Aborted clients per second | Number of connections that were aborted because the client died without closing the connection properly. |
DEPENDENT | mysql.aborted_clients.rate Preprocessing: - XMLPATH: - CHANGE_PER_SECOND |
MySQL | MySQL: Aborted connections per second | Number of failed attempts to connect to the MySQL server. |
DEPENDENT | mysql.aborted_connects.rate Preprocessing: - XMLPATH: - CHANGE_PER_SECOND |
MySQL | MySQL: Connection errors accept per second | Number of errors that occurred during calls to accept() on the listening port. |
DEPENDENT | mysql.connection_errors_accept.rate Preprocessing: - XMLPATH: - CHANGE_PER_SECOND |
MySQL | MySQL: Connection errors internal per second | Number of refused connections due to internal server errors, for example, out of memory errors, or failed thread starts. |
DEPENDENT | mysql.connection_errors_internal.rate Preprocessing: - XMLPATH: - CHANGE_PER_SECOND |
MySQL | MySQL: Connection errors max connections per second | Number of refused connections due to the max_connections limit being reached. |
DEPENDENT | mysql.connection_errors_max_connections.rate Preprocessing: - XMLPATH: - CHANGE_PER_SECOND |
MySQL | MySQL: Connection errors peer address per second | Number of errors while searching for the connecting client IP address. |
DEPENDENT | mysql.connection_errors_peer_address.rate Preprocessing: - XMLPATH: - CHANGE_PER_SECOND |
MySQL | MySQL: Connection errors select per second | Number of errors during calls to select() or poll() on the listening port. The client would not necessarily have been rejected in these cases. |
DEPENDENT | mysql.connection_errors_select.rate Preprocessing: - XMLPATH: - CHANGE_PER_SECOND |
MySQL | MySQL: Connection errors tcpwrap per second | Number of connections the libwrap library has refused. |
DEPENDENT | mysql.connection_errors_tcpwrap.rate Preprocessing: - XMLPATH: - CHANGE_PER_SECOND |
MySQL | MySQL: Connections per second | Number of connection attempts (successful or not) to the MySQL server. |
DEPENDENT | mysql.connections.rate Preprocessing: - XMLPATH: - CHANGE_PER_SECOND |
MySQL | MySQL: Max used connections | The maximum number of connections that have been in use simultaneously since the server start. |
DEPENDENT | mysql.max_used_connections Preprocessing: - XMLPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
MySQL | MySQL: Threads cached | Number of threads in the thread cache. |
DEPENDENT | mysql.threads_cached Preprocessing: - XMLPATH: |
MySQL | MySQL: Threads connected | Number of currently open connections. |
DEPENDENT | mysql.threads_connected Preprocessing: - XMLPATH: |
MySQL | MySQL: Threads created per second | Number of threads created to handle connections. If Threads_created is big, you may want to increase the thread_cache_size value. The cache miss rate can be calculated as Threads_created/Connections. |
DEPENDENT | mysql.threads_created.rate Preprocessing: - XMLPATH: - CHANGE_PER_SECOND |
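The thread cache miss rate mentioned above is a simple ratio of two status counters. A minimal sketch (the counter values below are hypothetical; in practice they come from `SHOW GLOBAL STATUS`):

```python
# Hypothetical samples of the Threads_created and Connections
# status counters from "SHOW GLOBAL STATUS".
threads_created = 120
connections = 4800

# Cache miss rate as described above: Threads_created / Connections.
miss_rate = threads_created / connections
print(f"thread cache miss rate: {miss_rate:.2%}")  # 2.50%
```

A persistently high ratio suggests raising thread_cache_size.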
MySQL | MySQL: Threads running | Number of threads which are not sleeping. |
DEPENDENT | mysql.threads_running Preprocessing: - XMLPATH: |
MySQL | MySQL: Buffer pool efficiency | The item shows how effectively the buffer pool is serving reads. |
CALCULATED | mysql.buffer_pool_efficiency Expression: last(//mysql.innodb_buffer_pool_reads) / ( last(//mysql.innodb_buffer_pool_read_requests) + ( last(//mysql.innodb_buffer_pool_read_requests) = 0 ) ) * 100 * ( last(//mysql.innodb_buffer_pool_read_requests) > 0 ) |
MySQL | MySQL: Buffer pool utilization | Ratio of used to total pages in the buffer pool. |
CALCULATED | mysql.buffer_pool_utilization Expression: ( last(//mysql.innodb_buffer_pool_pages_total) - last(//mysql.innodb_buffer_pool_pages_free) ) / ( last(//mysql.innodb_buffer_pool_pages_total) + ( last(//mysql.innodb_buffer_pool_pages_total) = 0 ) ) * 100 * ( last(//mysql.innodb_buffer_pool_pages_total) > 0 ) |
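Both calculated expressions above use the same division-by-zero guard: the boolean `(denominator = 0)` adds 1 to a zero denominator, and the `(denominator > 0)` multiplier forces the whole result to 0 in that case. A Python sketch of the pattern (the page counts are made-up examples):

```python
def guarded_ratio(numerator: float, denominator: float) -> float:
    """Mimic the Zabbix calculated-item guard: when the denominator is 0,
    the boolean (denominator == 0) bumps it to 1 so the division succeeds,
    and the (denominator > 0) multiplier zeroes out the result."""
    return numerator / (denominator + (denominator == 0)) * 100 * (denominator > 0)

# Buffer pool utilization: (total - free) / total * 100
pages_total, pages_free = 8192, 1024
print(guarded_ratio(pages_total - pages_free, pages_total))  # 87.5
print(guarded_ratio(5, 0))                                   # 0 (guarded)
```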
MySQL | MySQL: Created tmp files on disk per second | How many temporary files mysqld has created. |
DEPENDENT | mysql.created_tmp_files.rate Preprocessing: - XMLPATH: - CHANGE_PER_SECOND |
MySQL | MySQL: Created tmp tables on disk per second | Number of internal on-disk temporary tables created by the server while executing statements. |
DEPENDENT | mysql.created_tmp_disk_tables.rate Preprocessing: - XMLPATH: - CHANGE_PER_SECOND |
MySQL | MySQL: Created tmp tables on memory per second | Number of internal temporary tables created by the server while executing statements. |
DEPENDENT | mysql.created_tmp_tables.rate Preprocessing: - XMLPATH: - CHANGE_PER_SECOND |
MySQL | MySQL: InnoDB buffer pool pages free | The number of free pages in the InnoDB buffer pool. |
DEPENDENT | mysql.innodb_buffer_pool_pages_free Preprocessing: - XMLPATH: |
MySQL | MySQL: InnoDB buffer pool pages total | The total size of the InnoDB buffer pool, in pages. |
DEPENDENT | mysql.innodb_buffer_pool_pages_total Preprocessing: - XMLPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
MySQL | MySQL: InnoDB buffer pool read requests per second | Number of logical read requests per second. |
DEPENDENT | mysql.innodb_buffer_pool_read_requests.rate Preprocessing: - XMLPATH: - CHANGE_PER_SECOND |
MySQL | MySQL: InnoDB buffer pool reads per second | Number of logical reads per second that InnoDB could not satisfy from the buffer pool, and had to read directly from the disk. |
DEPENDENT | mysql.innodb_buffer_pool_reads.rate Preprocessing: - XMLPATH: - CHANGE_PER_SECOND |
MySQL | MySQL: InnoDB row lock time | The total time spent in acquiring row locks for InnoDB tables, in milliseconds. |
DEPENDENT | mysql.innodb_row_lock_time Preprocessing: - XMLPATH: - MULTIPLIER: - DISCARD_UNCHANGED_HEARTBEAT: 1h |
MySQL | MySQL: InnoDB row lock time max | The maximum time to acquire a row lock for InnoDB tables, in milliseconds. |
DEPENDENT | mysql.innodb_row_lock_time_max Preprocessing: - XMLPATH: - MULTIPLIER: - DISCARD_UNCHANGED_HEARTBEAT: |
MySQL | MySQL: InnoDB row lock waits | Number of times operations on InnoDB tables had to wait for a row lock. |
DEPENDENT | mysql.innodb_row_lock_waits Preprocessing: - XMLPATH: |
MySQL | MySQL: Slow queries per second | Number of queries that have taken more than long_query_time seconds. |
DEPENDENT | mysql.slow_queries.rate Preprocessing: - XMLPATH: - CHANGE_PER_SECOND |
MySQL | MySQL: Bytes received | Number of bytes received from all clients. |
DEPENDENT | mysql.bytes_received.rate Preprocessing: - XMLPATH: - CHANGE_PER_SECOND |
MySQL | MySQL: Bytes sent | Number of bytes sent to all clients. |
DEPENDENT | mysql.bytes_sent.rate Preprocessing: - XMLPATH: - CHANGE_PER_SECOND |
MySQL | MySQL: Command Delete per second | The Com_delete counter variable indicates the number of times the delete statement has been executed. |
DEPENDENT | mysql.com_delete.rate Preprocessing: - XMLPATH: - CHANGE_PER_SECOND |
MySQL | MySQL: Command Insert per second | The Com_insert counter variable indicates the number of times the insert statement has been executed. |
DEPENDENT | mysql.com_insert.rate Preprocessing: - XMLPATH: - CHANGE_PER_SECOND |
MySQL | MySQL: Command Select per second | The Com_select counter variable indicates the number of times the select statement has been executed. |
DEPENDENT | mysql.com_select.rate Preprocessing: - XMLPATH: - CHANGE_PER_SECOND |
MySQL | MySQL: Command Update per second | The Com_update counter variable indicates the number of times the update statement has been executed. |
DEPENDENT | mysql.com_update.rate Preprocessing: - XMLPATH: - CHANGE_PER_SECOND |
MySQL | MySQL: Queries per second | Number of statements executed by the server. This variable includes statements executed within stored programs, unlike the Questions variable. |
DEPENDENT | mysql.queries.rate Preprocessing: - XMLPATH: - CHANGE_PER_SECOND |
MySQL | MySQL: Questions per second | Number of statements executed by the server. This includes only statements sent to the server by clients and not statements executed within stored programs, unlike the Queries variable. |
DEPENDENT | mysql.questions.rate Preprocessing: - XMLPATH: - CHANGE_PER_SECOND |
MySQL | MySQL: Binlog cache disk use | Number of transactions that used a temporary disk cache because they could not fit in the regular binary log cache, being larger than binlog_cache_size. |
DEPENDENT | mysql.binlog_cache_disk_use Preprocessing: - XMLPATH: - DISCARD_UNCHANGED_HEARTBEAT: 6h |
MySQL | MySQL: Innodb buffer pool wait free | Number of times InnoDB waited for a free page before reading or creating a page. Normally, writes to the InnoDB buffer pool happen in the background. When no clean pages are available, dirty pages are flushed first in order to free some up. This counts the number of waits for this operation to finish. If this value is not small, consider increasing innodb_buffer_pool_size. |
DEPENDENT | mysql.innodb_buffer_pool_wait_free Preprocessing: - XMLPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
MySQL | MySQL: Innodb number open files | Number of open files held by InnoDB. InnoDB only. |
DEPENDENT | mysql.innodb_num_open_files Preprocessing: - XMLPATH: - DISCARD_UNCHANGED_HEARTBEAT: 6h |
MySQL | MySQL: Open table definitions | Number of cached table definitions. |
DEPENDENT | mysql.open_table_definitions Preprocessing: - XMLPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
MySQL | MySQL: Open tables | Number of tables that are open. |
DEPENDENT | mysql.open_tables Preprocessing: - XMLPATH: - DISCARD_UNCHANGED_HEARTBEAT: 6h |
MySQL | MySQL: Innodb log written | Number of bytes written to the InnoDB log. |
DEPENDENT | mysql.innodb_os_log_written Preprocessing: - XMLPATH: |
MySQL | MySQL: Calculated value of innodb_log_file_size | Calculated as (innodb_os_log_written - innodb_os_log_written(time shift -1h)) / {$MYSQL.INNODB_LOG_FILES}, the recommended value of innodb_log_file_size. Innodb_log_file_size is the size in bytes of each InnoDB redo log file in the log group. The combined size can be no more than 512GB. Larger values mean less disk I/O due to less flushing checkpoint activity, but also slower recovery from a crash. |
CALCULATED | mysql.innodb_log_file_size Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: 6h Expression: (last(//mysql.innodb_os_log_written) - last(//mysql.innodb_os_log_written,#1:now-1h)) / {$MYSQL.INNODB_LOG_FILES} |
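The calculated item above sizes the redo log from one hour of write volume divided across the log files. A sketch with hypothetical counter samples:

```python
# Hypothetical samples of innodb_os_log_written (bytes) taken now and
# one hour ago, plus the number of redo log files in the group
# ({$MYSQL.INNODB_LOG_FILES} in the template).
os_log_written_now = 6_442_450_944
os_log_written_1h_ago = 2_147_483_648
innodb_log_files = 2

# Same arithmetic as the calculated item: bytes written in the last
# hour, split across the files in the log group.
estimated_size = (os_log_written_now - os_log_written_1h_ago) / innodb_log_files
print(estimated_size)  # 2147483648.0 (i.e. 2 GiB per file)
```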
MySQL | MySQL: Size of database {#DBNAME} | - |
ZABBIX_PASSIVE | mysql.dbsize["{$MYSQL.HOST}","{$MYSQL.PORT}","{#DBNAME}"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
MySQL | MySQL: Replication Slave SQL Running State {#MASTER_HOST} | This shows the state of the SQL driver threads. |
DEPENDENT | mysql.slave_sql_running_state["{#MASTER_HOST}"] Preprocessing: - XMLPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
MySQL | MySQL: Replication Seconds Behind Master {#MASTER_HOST} | The number of seconds that the slave SQL thread is behind processing the master binary log. A high number (or an increasing one) can indicate that the slave is unable to handle events from the master in a timely fashion. |
DEPENDENT | mysql.seconds_behind_master["{#MASTER_HOST}"] Preprocessing: - XMLPATH: - DISCARD_UNCHANGED_HEARTBEAT: - NOT_MATCHES_REGEX: ⛔️ON_FAIL: |
MySQL | MySQL: Replication Slave IO Running {#MASTER_HOST} | Whether the I/O thread for reading the master's binary log is running. Normally, you want this to be Yes unless you have not yet started replication or have explicitly stopped it with STOP SLAVE. |
DEPENDENT | mysql.slave_io_running["{#MASTER_HOST}"] Preprocessing: - XMLPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
MySQL | MySQL: Replication Slave SQL Running {#MASTER_HOST} | Whether the SQL thread for executing events in the relay log is running. As with the I/O thread, this should normally be Yes. |
DEPENDENT | mysql.slave_sql_running["{#MASTER_HOST}"] Preprocessing: - XMLPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
MySQL | MySQL: Binlog commits | Total number of transactions committed to the binary log. |
DEPENDENT | mysql.binlog_commits[{#SINGLETON}] Preprocessing: - XMLPATH: |
MySQL | MySQL: Binlog group commits | Total number of group commits done to the binary log. |
DEPENDENT | mysql.binlog_group_commits[{#SINGLETON}] Preprocessing: - XMLPATH: |
MySQL | MySQL: Master GTID wait count | The number of times MASTER_GTID_WAIT was called. |
DEPENDENT | mysql.master_gtid_wait_count[{#SINGLETON}] Preprocessing: - XMLPATH: - DISCARD_UNCHANGED_HEARTBEAT: 6h |
MySQL | MySQL: Master GTID wait time | Total time spent in MASTER_GTID_WAIT. |
DEPENDENT | mysql.master_gtid_wait_time[{#SINGLETON}] Preprocessing: - XMLPATH: - DISCARD_UNCHANGED_HEARTBEAT: 6h |
MySQL | MySQL: Master GTID wait timeouts | Number of timeouts occurring in MASTER_GTID_WAIT. |
DEPENDENT | mysql.master_gtid_wait_timeouts[{#SINGLETON}] Preprocessing: - XMLPATH: - DISCARD_UNCHANGED_HEARTBEAT: 6h |
Zabbix raw items | MySQL: Get status variables | The item gets server global status information. |
ZABBIX_PASSIVE | mysql.get_status_variables["{$MYSQL.HOST}","{$MYSQL.PORT}"] |
Zabbix raw items | MySQL: InnoDB buffer pool read requests | Number of logical read requests. |
DEPENDENT | mysql.innodb_buffer_pool_read_requests Preprocessing: - XMLPATH: |
Zabbix raw items | MySQL: InnoDB buffer pool reads | Number of logical reads that InnoDB could not satisfy from the buffer pool, and had to read directly from the disk. |
DEPENDENT | mysql.innodb_buffer_pool_reads Preprocessing: - XMLPATH: |
Zabbix raw items | MySQL: Replication Slave status {#MASTER_HOST} | The item gets status information on the essential parameters of the slave threads. |
ZABBIX_PASSIVE | mysql.slave_status["{$MYSQL.HOST}","{$MYSQL.PORT}","{#MASTER_HOST}"] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
MySQL: Service is down | - |
last(/MySQL by Zabbix agent/mysql.ping["{$MYSQL.HOST}","{$MYSQL.PORT}"])=0 |
HIGH | |
MySQL: Version has changed | MySQL version has changed. Ack to close. |
last(/MySQL by Zabbix agent/mysql.version["{$MYSQL.HOST}","{$MYSQL.PORT}"],#1)<>last(/MySQL by Zabbix agent/mysql.version["{$MYSQL.HOST}","{$MYSQL.PORT}"],#2) and length(last(/MySQL by Zabbix agent/mysql.version["{$MYSQL.HOST}","{$MYSQL.PORT}"]))>0 |
INFO | Manual close: YES |
MySQL: Service has been restarted | MySQL uptime is less than 10 minutes. |
last(/MySQL by Zabbix agent/mysql.uptime)<10m |
INFO | |
MySQL: Failed to fetch info data | Zabbix has not received data for items for the last 30 minutes. |
nodata(/MySQL by Zabbix agent/mysql.uptime,30m)=1 |
INFO | Depends on: - MySQL: Service is down |
MySQL: Server has aborted connections | The number of failed attempts to connect to the MySQL server is more than {$MYSQL.ABORTED_CONN.MAX.WARN} in the last 5 minutes. |
min(/MySQL by Zabbix agent/mysql.aborted_connects.rate,5m)>{$MYSQL.ABORTED_CONN.MAX.WARN} |
AVERAGE | Depends on: - MySQL: Refused connections |
MySQL: Refused connections | Number of refused connections due to the max_connections limit being reached. |
last(/MySQL by Zabbix agent/mysql.connection_errors_max_connections.rate)>0 |
AVERAGE | |
MySQL: Buffer pool utilization is too low | The buffer pool utilization is less than {$MYSQL.BUFF_UTIL.MIN.WARN}% in the last 5 minutes. This means that there is a lot of unused RAM allocated for the buffer pool, which you can easily reallocate at the moment. |
max(/MySQL by Zabbix agent/mysql.buffer_pool_utilization,5m)<{$MYSQL.BUFF_UTIL.MIN.WARN} |
WARNING | |
MySQL: Number of temporary files created per second is high | Possibly the application using the database is in need of query optimization. |
min(/MySQL by Zabbix agent/mysql.created_tmp_files.rate,5m)>{$MYSQL.CREATED_TMP_FILES.MAX.WARN} |
WARNING | |
MySQL: Number of on-disk temporary tables created per second is high | Possibly the application using the database is in need of query optimization. |
min(/MySQL by Zabbix agent/mysql.created_tmp_disk_tables.rate,5m)>{$MYSQL.CREATED_TMP_DISK_TABLES.MAX.WARN} |
WARNING | |
MySQL: Number of internal temporary tables created per second is high | Possibly the application using the database is in need of query optimization. |
min(/MySQL by Zabbix agent/mysql.created_tmp_tables.rate,5m)>{$MYSQL.CREATED_TMP_TABLES.MAX.WARN} |
WARNING | |
MySQL: Server has slow queries | The number of slow queries is more than {$MYSQL.SLOW_QUERIES.MAX.WARN} in the last 5 minutes. |
min(/MySQL by Zabbix agent/mysql.slow_queries.rate,5m)>{$MYSQL.SLOW_QUERIES.MAX.WARN} |
WARNING | |
MySQL: Replication lag is too high | - |
min(/MySQL by Zabbix agent/mysql.seconds_behind_master["{#MASTER_HOST}"],5m)>{$MYSQL.REPL_LAG.MAX.WARN} |
WARNING | |
MySQL: The slave I/O thread is not running | Whether the I/O thread for reading the master's binary log is running. |
count(/MySQL by Zabbix agent/mysql.slave_io_running["{#MASTER_HOST}"],#1,"eq","No")=1 |
AVERAGE | |
MySQL: The slave I/O thread is not connected to a replication master | - |
count(/MySQL by Zabbix agent/mysql.slave_io_running["{#MASTER_HOST}"],#1,"ne","Yes")=1 |
WARNING | Depends on: - MySQL: The slave I/O thread is not running |
MySQL: The SQL thread is not running | Whether the SQL thread for executing events in the relay log is running. |
count(/MySQL by Zabbix agent/mysql.slave_sql_running["{#MASTER_HOST}"],#1,"eq","No")=1 |
WARNING | Depends on: - MySQL: The slave I/O thread is not running |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template or ask for help with it at ZABBIX forums.
For Zabbix version: 6.2 and higher
The template is developed for monitoring DBMS Microsoft SQL Server via ODBC.
This template was tested on:
See Zabbix template operation for basic instructions.
GRANT SELECT ON OBJECT::msdb.dbo.sysjobs TO zbx_monitor
GRANT SELECT ON OBJECT::msdb.dbo.sysjobservers TO zbx_monitor
GRANT SELECT ON OBJECT::msdb.dbo.sysjobactivity TO zbx_monitor
GRANT EXECUTE ON OBJECT::msdb.dbo.agent_datetime TO zbx_monitor
For more information, see MSSQL documentation:
Create a database user
GRANT Server Permissions
Configure a User to Create and Manage SQL Server Agent Jobs
For a named instance, set the value of the {$MSSQL.INSTANCE} macro as MSSQL$instance name. If MSSQL was installed using the default configuration, do not change the {$MSSQL.INSTANCE} macro value.
The "Service's TCP port state" item uses the {HOST.CONN} and {$MSSQL.PORT} macros to check the availability of the MSSQL instance. If your instance uses a non-default TCP port, set the port in the DSN section of odbc.ini in the line Server = IP or FQDN name, port.
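For illustration, a minimal odbc.ini entry following the Server = address, port convention described above (the DSN name, driver string, and address are placeholders — adjust them to your environment):

```ini
; Example odbc.ini DSN entry (placeholder names and address)
[mssql_dsn]
Driver = ODBC Driver 18 for SQL Server
; For a non-default TCP port, append it after the address:
Server = 192.0.2.10, 1433
```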
No specific Zabbix configuration is required.
Name | Description | Default | |||
---|---|---|---|---|---|
{$MSSQL.AVERAGE_WAIT_TIME.MAX} | The maximum average wait time in ms - for the trigger expression. |
500 |
|||
{$MSSQL.BACKUP_DIFF.CRIT} | The maximum days without a differential backup - for the High trigger expression. |
6d |
|||
{$MSSQL.BACKUP_DIFF.WARN} | The maximum days without a differential backup - for the Warning trigger expression. |
3d |
|||
{$MSSQL.BACKUP_DURATION.WARN} | The maximum job duration - for the Warning trigger expression. |
1h |
|||
{$MSSQL.BACKUP_FULL.CRIT} | The maximum days without a full backup - for the High trigger expression. |
10d |
|||
{$MSSQL.BACKUP_FULL.WARN} | The maximum days without a full backup - for the Warning trigger expression. |
9d |
|||
{$MSSQL.BACKUP_LOG.CRIT} | The maximum days without a log backup - for the High trigger expression. |
8h |
|||
{$MSSQL.BACKUP_LOG.WARN} | The maximum days without a log backup - for the Warning trigger expression. |
4h |
|||
{$MSSQL.BUFFER_CACHE_RATIO.MIN.CRIT} | The minimum % buffer cache hit ratio - for the High trigger expression. |
30 |
|||
{$MSSQL.BUFFER_CACHE_RATIO.MIN.WARN} | The minimum % buffer cache hit ratio - for the Warning trigger expression. |
50 |
|||
{$MSSQL.DBNAME.MATCHES} | This macro is used in database discovery. It can be overridden on a host or linked template level. |
.* |
|||
{$MSSQL.DBNAME.NOT_MATCHES} | This macro is used in database discovery. It can be overridden on a host or linked template level. |
`master|tempdb|model|msdb` |
{$MSSQL.DEADLOCKS.MAX} | The maximum deadlocks per second - for the trigger expression. |
1 |
|||
{$MSSQL.DSN} | System data source name. |
<Put your DSN here> |
|||
{$MSSQL.FREE_LIST_STALLS.MAX} | The maximum free list stalls per second - for the trigger expression. |
2 |
|||
{$MSSQL.INSTANCE} | The instance name for the default instance is SQLServer. For a named instance, set the macro value as MSSQL$instance name. |
SQLServer |
|||
{$MSSQL.JOB.MATCHES} | This macro is used in job discovery. It can be overridden on a host or linked template level. |
.* |
|||
{$MSSQL.JOB.NOT_MATCHES} | This macro is used in job discovery. It can be overridden on a host or linked template level. |
CHANGE_IF_NEEDED |
|||
{$MSSQL.LAZY_WRITES.MAX} | The maximum lazy writes per second - for the trigger expression. |
20 |
|||
{$MSSQL.LOCK_REQUESTS.MAX} | The maximum lock requests per second - for the trigger expression. |
1000 |
|||
{$MSSQL.LOCK_TIMEOUTS.MAX} | The maximum lock timeouts per second - for the trigger expression. |
1 |
|||
{$MSSQL.LOG_FLUSH_WAITS.MAX} | The maximum log flush waits per second - for the trigger expression. |
1 |
|||
{$MSSQL.LOG_FLUSH_WAIT_TIME.MAX} | The maximum log flush wait time in ms - for the trigger expression. |
1 |
|||
{$MSSQL.PAGE_LIFE_EXPECTANCY.MIN} | The minimum page life expectancy - for the trigger expression. |
300 |
|||
{$MSSQL.PAGE_READS.MAX} | The maximum page reads per second - for the trigger expression. |
90 |
|||
{$MSSQL.PAGE_WRITES.MAX} | The maximum page writes per second - for the trigger expression. |
90 |
|||
{$MSSQL.PASSWORD} | MSSQL user password. |
<Put your password here> |
|||
{$MSSQL.PERCENT_COMPILATIONS.MAX} | The maximum percentage of Transact-SQL compilations - for the trigger expression. |
10 |
|||
{$MSSQL.PERCENT_LOG_USED.MAX} | The maximum percentage of log used - for the trigger expression. |
80 |
|||
{$MSSQL.PERCENT_READAHEAD.MAX} | The maximum percentage of pages read/sec in anticipation of use - for the trigger expression. |
20 |
|||
{$MSSQL.PERCENT_RECOMPILATIONS.MAX} | The maximum percentage of Transact-SQL recompilations - for the trigger expression. |
10 |
|||
{$MSSQL.PORT} | MSSQL TCP port. |
1433 |
|||
{$MSSQL.USER} | MSSQL username. |
<Put your username here> |
|||
{$MSSQL.WORKTABLES_FROM_CACHE_RATIO.MIN.CRIT} | The minimum percentage of the worktables from cache ratio - for the High trigger expression. |
90 |
|||
{$MSSQL.WORK_FILES.MAX} | The maximum number of work files created per second - for the trigger expression. |
20 |
|||
{$MSSQL.WORK_TABLES.MAX} | The maximum number of work tables created per second - for the trigger expression. |
20 |
There are no template links in this template.
Name | Description | Type | Key and additional info |
---|---|---|---|
Availability groups discovery | Discovery of the existing availability groups. |
ODBC | db.odbc.discovery[availabilitygroups,"{$MSSQL.DSN}"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: 1d |
Database discovery | Scanning databases in DBMS. |
ODBC | db.odbc.discovery[dbname,"{$MSSQL.DSN}"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: Filter: AND - {#DBNAME} MATCHES_REGEX - {#DBNAME} NOT_MATCHES_REGEX |
Job discovery | Scanning jobs in DBMS. |
ODBC | db.odbc.discovery[jobname,"{$MSSQL.DSN}"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: Filter: AND - {#JOBNAME} MATCHES_REGEX - {#JOBNAME} NOT_MATCHES_REGEX |
Local database discovery | Discovery of the local availability databases. |
ODBC | db.odbc.discovery[localdb,"{$MSSQL.DSN}"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: 1d |
Mirroring discovery | To see the row for a database other than master or tempdb, you must either be the database owner or have at least ALTER ANY DATABASE or VIEW ANY DATABASE server-level permission or CREATE DATABASE permission in the master database. To see non-NULL values on a mirror database, you must be a member of the sysadmin fixed server role. |
ODBC | db.odbc.discovery[mirrors,"{$MSSQL.DSN}"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
Non-local database discovery | Discovery of the non-local (not local to the SQL Server instance) availability databases. |
ODBC | db.odbc.discovery[non-localdb,"{$MSSQL.DSN}"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: 1d |
Replication discovery | Discovery of the database replicas. |
ODBC | db.odbc.discovery[replicas,"{$MSSQL.DSN}"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
Group | Name | Description | Type | Key and additional info | ||||||
---|---|---|---|---|---|---|---|---|---|---|
MSSQL | MSSQL: Service's TCP port state | Test the availability of MS SQL Server on a TCP port. |
SIMPLE | net.tcp.service[tcp,{HOST.CONN},{$MSSQL.PORT}] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
||||||
MSSQL | MSSQL: Version | MS SQL Server version. |
DEPENDENT | mssql.version Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
||||||
MSSQL | MSSQL: Uptime | MS SQL Server uptime in 'N days, hh:mm:ss' format. |
DEPENDENT | mssql.uptime Preprocessing: - JSONPATH: |
||||||
MSSQL | MSSQL: Forwarded records per second | Number of records per second fetched through forwarded record pointers. |
DEPENDENT | mssql.forwarded_records_sec.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
||||||
MSSQL | MSSQL: Full scans per second | Number of unrestricted full scans per second. These can be either base-table or full-index scans. Values greater than 1 or 2 indicate table or index page scans. If that is combined with high CPU, this counter requires further investigation; otherwise, if the full scans are on small tables, it can be ignored. |
DEPENDENT | mssql.full_scans_sec.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
||||||
MSSQL | MSSQL: Index searches per second | Number of index searches per second. These are used to start a range scan, reposition a range scan, revalidate a scan point, fetch a single index record, and search down the index to locate where to insert a new row. |
DEPENDENT | mssql.index_searches_sec.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
||||||
MSSQL | MSSQL: Page splits per second | Number of page splits per second that occur as the result of overflowing index pages. |
DEPENDENT | mssql.page_splits_sec.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
||||||
MSSQL | MSSQL: Work files created per second | Number of work files created per second. For example, work files can be used to store temporary results for hash joins and hash aggregates. |
DEPENDENT | mssql.work_files_created_sec.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
||||||
MSSQL | MSSQL: Work tables created per second | Number of work tables created per second. For example, work tables can be used to store temporary results for query spool, lob variables, XML variables, and cursors. |
DEPENDENT | mssql.work_tables_created_sec.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
||||||
MSSQL | MSSQL: Table lock escalations per second | Number of times locks on a table were escalated to the TABLE or HoBT granularity. |
DEPENDENT | mssql.table_lock_escalations.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
||||||
MSSQL | MSSQL: Worktables from cache ratio | Percentage of work tables created where the initial two pages of the work table were not allocated but were immediately available from the work table cache. |
DEPENDENT | mssql.worktables_from_cache_ratio Preprocessing: - JSONPATH: |
||||||
MSSQL | MSSQL: Buffer cache hit ratio | Indicates the percentage of pages found in the buffer cache without having to read from disk. The ratio is the total number of cache hits divided by the total number of cache lookups over the last few thousand page accesses. After a long period of time, the ratio changes very little. Since reading from the cache is much less expensive than reading from the disk, a higher value is preferred for this item. To increase the buffer cache hit ratio, consider increasing the amount of memory available to SQL Server or using the buffer pool extension feature. |
DEPENDENT | mssql.buffer_cache_hit_ratio Preprocessing: - JSONPATH: |
||||||
MSSQL | MSSQL: Checkpoint pages per second | Indicates the number of pages flushed to disk per second by a checkpoint or other operation which required all dirty pages to be flushed. |
DEPENDENT | mssql.checkpoint_pages_sec.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
||||||
MSSQL | MSSQL: Database pages | Indicates the number of pages in the buffer pool with database content. |
DEPENDENT | mssql.database_pages Preprocessing: - JSONPATH: |
||||||
MSSQL | MSSQL: Free list stalls per second | Indicates the number of requests per second that had to wait for a free page. |
DEPENDENT | mssql.free_list_stalls_sec.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
||||||
MSSQL | MSSQL: Lazy writes per second | Indicates the number of buffers written per second by the buffer manager's lazy writer. The lazy writer is a system process that flushes out batches of dirty, aged buffers (buffers that contain changes that must be written back to disk before the buffer can be reused for a different page) and makes them available to user processes. The lazy writer eliminates the need to perform frequent checkpoints in order to create available buffers. |
DEPENDENT | mssql.lazy_writes_sec.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
||||||
MSSQL | MSSQL: Page life expectancy | Indicates the number of seconds a page will stay in the buffer pool without references. |
DEPENDENT | mssql.page_life_expectancy Preprocessing: - JSONPATH: |
||||||
MSSQL | MSSQL: Page lookups per second | Indicates the number of requests per second to find a page in the buffer pool. |
DEPENDENT | mssql.page_lookups_sec.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
||||||
MSSQL | MSSQL: Page reads per second | Indicates the number of physical database page reads that are issued per second. This statistic displays the total number of physical page reads across all databases. Because physical I/O is expensive, you may be able to minimize the cost, either by using a larger data cache, intelligent indexes, and more efficient queries, or by changing the database design. |
DEPENDENT | mssql.page_reads_sec.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
||||||
MSSQL | MSSQL: Page writes per second | Indicates the number of physical database page writes that are issued per second. |
DEPENDENT | mssql.page_writes_sec.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
||||||
MSSQL | MSSQL: Read-ahead pages per second | Indicates the number of pages read per second in anticipation of use. |
DEPENDENT | mssql.readahead_pages_sec.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
||||||
MSSQL | MSSQL: Target pages | The optimal number of pages in the buffer pool. |
DEPENDENT | mssql.target_pages Preprocessing: - JSONPATH: |
||||||
MSSQL | MSSQL: Total data file size | Total size of all data files. |
DEPENDENT | mssql.data_files_size Preprocessing: - JSONPATH: - MULTIPLIER: |
||||||
MSSQL | MSSQL: Total log file size | Total size of all the transaction log files. |
DEPENDENT | mssql.log_files_size Preprocessing: - JSONPATH: - MULTIPLIER: |
||||||
MSSQL | MSSQL: Total log file used size | The cumulative used size of all the log files in the database. |
DEPENDENT | mssql.log_files_used_size Preprocessing: - JSONPATH: - MULTIPLIER: |
||||||
MSSQL | MSSQL: Total transactions per second | Total number of transactions started for all databases per second. |
DEPENDENT | mssql.transactions_sec.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
||||||
MSSQL | MSSQL: Logins per second | Total number of logins started per second. This does not include pooled connections. Any value over 2 may indicate insufficient connection pooling. |
DEPENDENT | mssql.logins_sec.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
||||||
MSSQL | MSSQL: Logouts per second | Total number of logout operations started per second. Any value over 2 may indicate insufficient connection pooling. |
DEPENDENT | mssql.logouts_sec.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
||||||
MSSQL | MSSQL: Number of blocked processes | Number of currently blocked processes. |
DEPENDENT | mssql.processes_blocked Preprocessing: - JSONPATH: |
||||||
MSSQL | MSSQL: Number users connected | Number of users connected to MS SQL Server. |
DEPENDENT | mssql.user_connections Preprocessing: - JSONPATH: |
||||||
MSSQL | MSSQL: Average latch wait time | Average latch wait time (in milliseconds) for latch requests that had to wait. |
CALCULATED | mssql.average_latch_wait_time Expression: (last(//mssql.average_latch_wait_time_raw) - last(//mssql.average_latch_wait_time_raw,#2)) / (last(//mssql.average_latch_wait_time_base) - last(//mssql.average_latch_wait_time_base,#2) + (last(//mssql.average_latch_wait_time_base) - last(//mssql.average_latch_wait_time_base,#2)=0)) |
||||||
MSSQL | MSSQL: Latch waits per second | The number of latch requests that could not be granted immediately. Latches are lightweight means of holding a very transient server resource, such as an address in memory. |
DEPENDENT | mssql.latch_waits_sec.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
||||||
MSSQL | MSSQL: Total latch wait time | Total latch wait time (in milliseconds) for latch requests in the last second. This value should stay stable compared to the number of latch waits per second. |
DEPENDENT | mssql.total_latch_wait_time Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
||||||
MSSQL | MSSQL: Total average wait time | The average wait time, in milliseconds, for each lock request that had to wait. |
CALCULATED | mssql.average_wait_time Expression: (last(//mssql.average_wait_time_raw) - last(//mssql.average_wait_time_raw,#2)) / (last(//mssql.average_wait_time_base) - last(//mssql.average_wait_time_base,#2) + (last(//mssql.average_wait_time_base) - last(//mssql.average_wait_time_base,#2)=0)) |
||||||
MSSQL | MSSQL: Total lock requests per second | Number of new locks and lock conversions per second requested from the lock manager. |
DEPENDENT | mssql.lock_requests_sec.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
||||||
MSSQL | MSSQL: Total lock requests per second that timed out | Number of timed out lock requests per second, including requests for NOWAIT locks. |
DEPENDENT | mssql.lock_timeouts_sec.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
||||||
MSSQL | MSSQL: Total lock requests per second that required waiting | Number of lock requests per second that required the caller to wait. |
DEPENDENT | mssql.lock_waits_sec.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
||||||
MSSQL | MSSQL: Lock wait time | Average of total wait time (in milliseconds) for locks in the last second. |
DEPENDENT | mssql.lock_wait_time Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
||||||
MSSQL | MSSQL: Total lock requests per second that have deadlocks | Number of lock requests per second that resulted in a deadlock. |
DEPENDENT | mssql.number_deadlocks_sec.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
||||||
MSSQL | MSSQL: Granted Workspace Memory | Specifies the total amount of memory currently granted to executing processes, such as hash, sort, bulk copy, and index creation operations. |
DEPENDENT | mssql.granted_workspace_memory Preprocessing: - JSONPATH: - MULTIPLIER: |
||||||
MSSQL | MSSQL: Maximum workspace memory | Indicates the maximum amount of memory available for executing processes, such as hash, sort, bulk copy, and index creation operations. |
DEPENDENT | mssql.maximum_workspace_memory Preprocessing: - JSONPATH: - MULTIPLIER: |
||||||
MSSQL | MSSQL: Memory grants outstanding | Specifies the total number of processes that have successfully acquired a workspace memory grant. |
DEPENDENT | mssql.memory_grants_outstanding Preprocessing: - JSONPATH: |
||||||
MSSQL | MSSQL: Memory grants pending | Specifies the total number of processes waiting for a workspace memory grant. |
DEPENDENT | mssql.memory_grants_pending Preprocessing: - JSONPATH: |
||||||
MSSQL | MSSQL: Target server memory | Indicates the ideal amount of memory the server can consume. |
DEPENDENT | mssql.target_server_memory Preprocessing: - JSONPATH: - MULTIPLIER: |
||||||
MSSQL | MSSQL: Total server memory | Specifies the amount of memory the server has committed using the memory manager. |
DEPENDENT | mssql.total_server_memory Preprocessing: - JSONPATH: - MULTIPLIER: |
||||||
MSSQL | MSSQL: Cache hit ratio | Ratio between cache hits and lookups. |
DEPENDENT | mssql.cache_hit_ratio Preprocessing: - JSONPATH: |
||||||
MSSQL | MSSQL: Cache object counts | Number of cache objects in the cache. |
DEPENDENT | mssql.cache_object_counts Preprocessing: - JSONPATH: |
||||||
MSSQL | MSSQL: Cache objects in use | Number of cache objects in use. |
DEPENDENT | mssql.cache_objects_in_use Preprocessing: - JSONPATH: |
||||||
MSSQL | MSSQL: Cache pages | Number of 8-kilobyte (KB) pages used by cache objects. |
DEPENDENT | mssql.cache_pages Preprocessing: - JSONPATH: |
||||||
MSSQL | MSSQL: Errors per second (DB offline errors) | Number of errors per second. |
DEPENDENT | mssql.offline_errors_sec.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
||||||
MSSQL | MSSQL: Errors per second (Info errors) | Number of errors per second. |
DEPENDENT | mssql.info_errors_sec.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
||||||
MSSQL | MSSQL: Errors per second (Kill connection errors) | Number of errors per second. |
DEPENDENT | mssql.kill_connection_errors_sec.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
||||||
MSSQL | MSSQL: Errors per second (User errors) | Number of errors per second. |
DEPENDENT | mssql.user_errors_sec.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
||||||
MSSQL | MSSQL: Total errors per second | Number of errors per second. |
DEPENDENT | mssql.errors_sec.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
||||||
MSSQL | MSSQL: Auto-param attempts per second | Number of auto-parameterization attempts per second. The total should be the sum of the failed, safe, and unsafe auto-parameterizations. Auto-parameterization occurs when an instance of SQL Server tries to parameterize a Transact-SQL request by replacing some literals with parameters to make reuse of the resulting cached execution plan across multiple similar-looking requests possible. Note that auto-parameterizations are also known as simple parameterizations in the newer versions of SQL Server. This counter does not include forced parameterizations. |
DEPENDENT | mssql.auto_param_attempts_sec.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
||||||
MSSQL | MSSQL: Batch requests per second | Number of Transact-SQL command batches received per second. This statistic is affected by all constraints (such as I/O, number of users, cache size, complexity of requests, and so on). High batch requests mean good throughput. |
DEPENDENT | mssql.batch_requests_sec.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
||||||
MSSQL | MSSQL: Percent of Adhoc queries running | The ratio of SQL compilations per second to Batch requests per second in percentage. |
CALCULATED | mssql.percent_of_adhoc_queries Expression: last(//mssql.sql_compilations_sec.rate) * 100 / (last(//mssql.batch_requests_sec.rate) + (last(//mssql.batch_requests_sec.rate)=0)) |
||||||
MSSQL | MSSQL: Percent of Recompiled Transact-SQL Objects | The ratio of SQL re-compilations per second to SQL compilations per second in percentage. |
CALCULATED | mssql.percent_recompilations_to_compilations Expression: last(//mssql.sql_recompilations_sec.rate) * 100 / (last(//mssql.sql_compilations_sec.rate) + (last(//mssql.sql_compilations_sec.rate)=0)) |
||||||
MSSQL | MSSQL: Full scans to Index searches ratio | The ratio of Full scans per second to Index searches per second. The threshold recommendation is strictly for OLTP workloads. |
CALCULATED | mssql.scan_to_search Expression: last(//mssql.full_scans_sec.rate) / (last(//mssql.index_searches_sec.rate) + (last(//mssql.index_searches_sec.rate)=0)) |
||||||
MSSQL | MSSQL: Failed auto-params per second | Number of failed auto-parameterization attempts per second. This number should be small. Note that auto-parameterizations are also known as simple parameterizations in the newer versions of SQL Server. |
DEPENDENT | mssql.failed_auto_params_sec.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
||||||
MSSQL | MSSQL: Safe auto-params per second | Number of safe auto-parameterization attempts per second. Safe refers to a determination that a cached execution plan can be shared between different similar-looking Transact-SQL statements. SQL Server makes many auto-parameterization attempts, some of which turn out to be safe and others fail. Note that auto-parameterizations are also known as simple parameterizations in the newer versions of SQL Server. This does not include forced parameterizations. |
DEPENDENT | mssql.safe_auto_params_sec.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
||||||
MSSQL | MSSQL: SQL compilations per second | Number of SQL compilations per second. Indicates the number of times the compile code path is entered. Includes runs caused by statement-level recompilations in SQL Server. After SQL Server user activity is stable, this value reaches a steady state. |
DEPENDENT | mssql.sql_compilations_sec.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
||||||
MSSQL | MSSQL: SQL re-compilations per second | Number of statement recompiles per second. Counts the number of times statement recompiles are triggered. Generally, you want the recompiles to be low. |
DEPENDENT | mssql.sql_recompilations_sec.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
||||||
MSSQL | MSSQL: Unsafe auto-params per second | Number of unsafe auto-parameterization attempts per second. For example, the query has some characteristics that prevent the cached plan from being shared. These are designated as unsafe. This does not count the number of forced parameterizations. |
DEPENDENT | mssql.unsafe_auto_params_sec.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
||||||
MSSQL | MSSQL: Total transactions number | The number of currently active transactions of all types. |
DEPENDENT | mssql.transactions Preprocessing: - JSONPATH: |
||||||
MSSQL | MSSQL DB '{#DBNAME}': State | 0 = ONLINE; 1 = RESTORING; 2 = RECOVERING (SQL Server 2008 and later); 3 = RECOVERY_PENDING (SQL Server 2008 and later); 4 = SUSPECT; 5 = EMERGENCY (SQL Server 2008 and later); 6 = OFFLINE (SQL Server 2008 and later); 7 = COPYING (Azure SQL Database Active Geo-Replication); 10 = OFFLINE_SECONDARY (Azure SQL Database Active Geo-Replication). |
DEPENDENT | mssql.db.state["{#DBNAME}"] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
MSSQL | MSSQL DB '{#DBNAME}': Active transactions | Number of active transactions for the database. |
DEPENDENT | mssql.db.active_transactions["{#DBNAME}"] Preprocessing: - JSONPATH: |
||||||
MSSQL | MSSQL DB '{#DBNAME}': Data file size | Cumulative size of all the data files in the database including any automatic growth. Monitoring this counter is useful, for example, for determining the correct size of tempdb. |
DEPENDENT | mssql.db.data_files_size["{#DBNAME}"] Preprocessing: - JSONPATH: - MULTIPLIER: |
||||||
MSSQL | MSSQL DB '{#DBNAME}': Log bytes flushed per second | Total number of log bytes flushed per second. Useful for determining trends and utilization of the transaction log. |
DEPENDENT | mssql.db.log_bytes_flushed_sec.rate["{#DBNAME}"] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
||||||
MSSQL | MSSQL DB '{#DBNAME}': Log file size | Cumulative size of all the transaction log files in the database. |
DEPENDENT | mssql.db.log_files_size["{#DBNAME}"] Preprocessing: - JSONPATH: - MULTIPLIER: |
||||||
MSSQL | MSSQL DB '{#DBNAME}': Log file used size | Cumulative used size of all the log files in the database. |
DEPENDENT | mssql.db.log_files_used_size["{#DBNAME}"] Preprocessing: - JSONPATH: - MULTIPLIER: |
||||||
MSSQL | MSSQL DB '{#DBNAME}': Log flushes per second | Number of log flushes per second. |
DEPENDENT | mssql.db.log_flushes_sec.rate["{#DBNAME}"] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
||||||
MSSQL | MSSQL DB '{#DBNAME}': Log flush waits per second | Number of commits per second waiting for the log flush. |
DEPENDENT | mssql.db.log_flush_waits_sec.rate["{#DBNAME}"] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
||||||
MSSQL | MSSQL DB '{#DBNAME}': Log flush wait time | Total wait time (in milliseconds) to flush the log. On an AlwaysOn secondary database, this value indicates the wait time for log records to be hardened to disk. |
DEPENDENT | mssql.db.log_flush_wait_time["{#DBNAME}"] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
||||||
MSSQL | MSSQL DB '{#DBNAME}': Log growths | Total number of times the transaction log for the database has been expanded. |
DEPENDENT | mssql.db.log_growths["{#DBNAME}"] Preprocessing: - JSONPATH: |
||||||
MSSQL | MSSQL DB '{#DBNAME}': Log shrinks | Total number of times the transaction log for the database has been shrunk. |
DEPENDENT | mssql.db.log_shrinks["{#DBNAME}"] Preprocessing: - JSONPATH: |
||||||
MSSQL | MSSQL DB '{#DBNAME}': Log truncations | Number of times the transaction log has been truncated. |
DEPENDENT | mssql.db.log_truncations["{#DBNAME}"] Preprocessing: - JSONPATH: |
||||||
MSSQL | MSSQL DB '{#DBNAME}': Percent log used | Percentage of space in the log that is in use. |
DEPENDENT | mssql.db.percent_log_used["{#DBNAME}"] Preprocessing: - JSONPATH: |
||||||
MSSQL | MSSQL DB '{#DBNAME}': Transactions per second | Number of transactions started for the database per second. |
DEPENDENT | mssql.db.transactions_sec.rate["{#DBNAME}"] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
||||||
MSSQL | MSSQL DB '{#DBNAME}': Last diff backup duration | Duration of the last differential backup. |
DEPENDENT | mssql.backup.diff.duration["{#DBNAME}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
||||||
MSSQL | MSSQL DB '{#DBNAME}': Last diff backup (time ago) | The amount of time since the last differential backup. |
DEPENDENT | mssql.backup.diff["{#DBNAME}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
||||||
MSSQL | MSSQL DB '{#DBNAME}': Last full backup duration | Duration of the last full backup. |
DEPENDENT | mssql.backup.full.duration["{#DBNAME}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
||||||
MSSQL | MSSQL DB '{#DBNAME}': Last full backup (time ago) | The amount of time since the last full backup. |
DEPENDENT | mssql.backup.full["{#DBNAME}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
||||||
MSSQL | MSSQL DB '{#DBNAME}': Last log backup duration | Duration of the last log backup. |
DEPENDENT | mssql.backup.log.duration["{#DBNAME}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
||||||
MSSQL | MSSQL DB '{#DBNAME}': Last log backup (time ago) | The amount of time since the last log backup. |
DEPENDENT | mssql.backup.log["{#DBNAME}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
||||||
MSSQL | MSSQL AG '{#GROUP_NAME}': Primary replica recovery health | Indicates the recovery health of the primary replica: 0 = In progress 1 = Online 2 = Unavailable |
DEPENDENT | mssql.primary_recovery_health["{#GROUP_NAME}"] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 1h |
||||||
MSSQL | MSSQL AG '{#GROUP_NAME}': Primary replica name | Name of the server instance that is hosting the current primary replica. |
DEPENDENT | mssql.primary_replica["{#GROUP_NAME}"] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
||||||
MSSQL | MSSQL AG '{#GROUP_NAME}': Secondary replica recovery health | Indicates the recovery health of a secondary replica: 0 = In progress 1 = Online 2 = Unavailable |
DEPENDENT | mssql.secondary_recovery_health["{#GROUP_NAME}"] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 1h |
||||||
MSSQL | MSSQL AG '{#GROUP_NAME}': Synchronization health | Reflects a rollup of the synchronization_health of all availability replicas in the availability group: 0: Not healthy. None of the availability replicas have a healthy synchronization. 1: Partially healthy. The synchronization of some, but not all, availability replicas is healthy. 2: Healthy. The synchronization of every availability replica is healthy. |
DEPENDENT | mssql.synchronization_health["{#GROUP_NAME}"] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
||||||
MSSQL | MSSQL AG '{#GROUP_NAME}' Local DB '{#DBNAME}': State | 0 = Online 1 = Restoring 2 = Recovering 3 = Recovery pending 4 = Suspect 5 = Emergency 6 = Offline |
DEPENDENT | mssql.local_db.state["{#DBNAME}"] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 1h |
||||||
MSSQL | MSSQL AG '{#GROUP_NAME}' Local DB '{#DBNAME}': Suspended | Database state: 0 = Resumed 1 = Suspended |
DEPENDENT | mssql.local_db.is_suspended["{#DBNAME}"] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
||||||
MSSQL | MSSQL AG '{#GROUP_NAME}' Local DB '{#DBNAME}': Synchronization health | Reflects the intersection of the synchronization state of a database that is joined to the availability group on the availability replica and the availability mode of the availability replica (synchronous-commit or asynchronous-commit mode): 0 = Not healthy. The synchronization_state of the database is 0 (NOT SYNCHRONIZING). 1 = Partially healthy. A database on a synchronous-commit availability replica is considered partially healthy if synchronization_state is 1 (SYNCHRONIZING). 2 = Healthy. A database on a synchronous-commit availability replica is considered healthy if synchronization_state is 2 (SYNCHRONIZED), and a database on an asynchronous-commit availability replica is considered healthy if synchronization_state is 1 (SYNCHRONIZING). |
DEPENDENT | mssql.local_db.synchronization_health["{#DBNAME}"] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
||||||
MSSQL | MSSQL AG '{#GROUP_NAME}' Non-Local DB '{#REPLICA_NAME}*{#DBNAME}': Log queue size | Amount of log records of the primary database that have not been sent to the secondary databases. |
DEPENDENT | mssql.non-local_db.log_send_queue_size["{#GROUP_NAME}*{#REPLICA_NAME}*{#DBNAME}"] Preprocessing: - JSONPATH: - MULTIPLIER: - DISCARD_UNCHANGED_HEARTBEAT: |
||||||
MSSQL | MSSQL AG '{#GROUP_NAME}' Non-Local DB '{#REPLICA_NAME}*{#DBNAME}': Redo log queue size | Amount of log records in the log files of the secondary replica that have not yet been redone. |
DEPENDENT | mssql.non-local_db.redo_queue_size["{#GROUP_NAME}*{#REPLICA_NAME}*{#DBNAME}"] Preprocessing: - JSONPATH: - MULTIPLIER: - DISCARD_UNCHANGED_HEARTBEAT: |
||||||
MSSQL | MSSQL AG '{#GROUP_NAME}' Replica '{#REPLICA_NAME}': Connected state | Whether a secondary replica is currently connected to the primary replica: 0 : Disconnected. The response of an availability replica to the DISCONNECTED state depends on its role: On the primary replica, if a secondary replica is disconnected, its secondary databases are marked as NOT SYNCHRONIZED on the primary replica, which waits for the secondary to reconnect; On a secondary replica, upon detecting that it is disconnected, the secondary replica attempts to reconnect to the primary replica. 1 : Connected. Each primary replica tracks the connection state for every secondary replica in the same availability group. Secondary replicas track the connection state of only the primary replica. |
DEPENDENT | mssql.replica.connected_state["{#GROUP_NAME}*{#REPLICA_NAME}"] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
||||||
MSSQL | MSSQL AG '{#GROUP_NAME}' Replica '{#REPLICA_NAME}': Is local | Whether the replica is local: 0 = Indicates a remote secondary replica in an availability group whose primary replica is hosted by the local server instance. This value occurs only on the primary replica location. 1 = Indicates a local replica. On secondary replicas, this is the only available value for the availability group to which the replica belongs. |
DEPENDENT | mssql.replica.is_local["{#GROUP_NAME}*{#REPLICA_NAME}"] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
||||||
MSSQL | MSSQL AG '{#GROUP_NAME}' Replica '{#REPLICA_NAME}': Join state | 0 = Not joined 1 = Joined, standalone instance 2 = Joined, failover cluster instance |
DEPENDENT | mssql.replica.join_state["{#GROUP_NAME}*{#REPLICA_NAME}"] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
||||||
MSSQL | MSSQL AG '{#GROUP_NAME}' Replica '{#REPLICA_NAME}': Operational state | Current operational state of the replica: 0 = Pending failover 1 = Pending 2 = Online 3 = Offline 4 = Failed 5 = Failed, no quorum 6 = Not local |
DEPENDENT | mssql.replica.operational_state["{#GROUP_NAME}*{#REPLICA_NAME}"] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
||||||
MSSQL | MSSQL AG '{#GROUP_NAME}' Replica '{#REPLICA_NAME}': Recovery health | Rollup of the database_state column of the sys.dm_hadr_database_replica_states dynamic management view: 0 : In progress. At least one joined database has a database state other than ONLINE (database_state is not 0). 1 : Online. All the joined databases have a database state of ONLINE (database_state is 0). |
DEPENDENT | mssql.replica.recovery_health["{#GROUP_NAME}*{#REPLICA_NAME}"] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
||||||
MSSQL | MSSQL AG '{#GROUP_NAME}' Replica '{#REPLICA_NAME}': Role | Current Always On availability groups role of a local replica or a connected remote replica: 0 = Resolving 1 = Primary 2 = Secondary |
DEPENDENT | mssql.replica.role["{#GROUP_NAME}*{#REPLICA_NAME}"] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 1h |
||||||
MSSQL | MSSQL AG '{#GROUP_NAME}' Replica '{#REPLICA_NAME}': Sync health | Reflects a rollup of the database synchronization state (synchronization_state) of all joined availability databases (also known as replicas) and the availability mode of the replica (synchronous-commit or asynchronous-commit mode). The rollup will reflect the least healthy accumulated state of the databases on the replica: 0 : Not healthy. At least one joined database is in the NOT SYNCHRONIZING state. 1 : Partially healthy. Some replicas are not in the target synchronization state: synchronous-commit replicas should be synchronized, and asynchronous-commit replicas should be synchronizing. 2 : Healthy. All replicas are in the target synchronization state: synchronous-commit replicas are synchronized, and asynchronous-commit replicas are synchronizing. |
DEPENDENT | mssql.replica.synchronization_health["{#GROUP_NAME}*{#REPLICA_NAME}"] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
||||||
MSSQL | MSSQL Mirroring '{#DBNAME}': Role | Current role that the local database plays in the database mirroring session. 1 = Principal 2 = Mirror |
DEPENDENT | mssql.mirroring.role["{#DBNAME}"] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
||||||
MSSQL | MSSQL Mirroring '{#DBNAME}': Role sequence | The number of times that mirroring partners have switched the principal and mirror roles due to a failover or forced service. |
DEPENDENT | mssql.mirroring.role_sequence["{#DBNAME}"] Preprocessing: - JSONPATH: - SIMPLE_CHANGE |
||||||
MSSQL | MSSQL Mirroring '{#DBNAME}': State | State of the mirror database and of the database mirroring session. 0 = Suspended 1 = Disconnected from the other partner 2 = Synchronizing 3 = Pending Failover 4 = Synchronized 5 = The partners are not synchronized. Failover is not possible now. 6 = The partners are synchronized. Failover is potentially possible. For information about the requirements for the failover, see Database Mirroring Operating Modes. |
DEPENDENT | mssql.mirroring.state["{#DBNAME}"] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
||||||
MSSQL | MSSQL Mirroring '{#DBNAME}': Witness state | State of the witness in the database mirroring session of the database: 0 = Unknown 1 = Connected 2 = Disconnected |
DEPENDENT | mssql.mirroring.witness_state["{#DBNAME}"] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 1h |
||||||
MSSQL | MSSQL Mirroring '{#DBNAME}': Safety level | Safety setting for updates on the mirror database: 0 = Unknown state 1 = Off [asynchronous] 2 = Full [synchronous] |
DEPENDENT | mssql.mirroring.safety_level["{#DBNAME}"] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 1h |
||||||
MSSQL | MSSQL Job '{#JOBNAME}': Last run date-time | The last date-time of the job run. |
DEPENDENT | mssql.job.lastrun_datetime["{#JOBNAME}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL: - DISCARD_UNCHANGED_HEARTBEAT: |
||||||
MSSQL | MSSQL Job '{#JOBNAME}': Next run date-time | The next date-time of the job run. |
DEPENDENT | mssql.job.nextrun_datetime["{#JOBNAME}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL: - DISCARD_UNCHANGED_HEARTBEAT: |
||||||
MSSQL | MSSQL Job '{#JOBNAME}': Last run status message | The informational message about the last run of the job. |
DEPENDENT | mssql.job.lastrun_status_message["{#JOBNAME}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL: - DISCARD_UNCHANGED_HEARTBEAT: |
||||||
MSSQL | MSSQL Job '{#JOBNAME}': Run status | The job status possible values: 0 ⇒ Failed 1 ⇒ Succeeded 2 ⇒ Retry 3 ⇒ Canceled 4 ⇒ Running |
DEPENDENT | mssql.job.run_status["{#JOBNAME}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL: - DISCARD_UNCHANGED_HEARTBEAT: |
||||||
MSSQL | MSSQL Job '{#JOBNAME}': Run duration | Duration of the last run job. |
DEPENDENT | mssql.job.run_duration["{#JOBNAME}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL: DISCARD_VALUE - DISCARD_UNCHANGED_HEARTBEAT: |
||||||
Zabbix raw items | MSSQL: Get last backup | The item gets information about backup processes. |
ODBC | db.odbc.get[get_last_backup,"{$MSSQL.DSN}"] Expression: The text is too long. Please see the template. |
||||||
Zabbix raw items | MSSQL: Get job status | The item gets sql agent job status. |
ODBC | db.odbc.get[get_job_status,"{$MSSQL.DSN}"] Expression: The text is too long. Please see the template. |
||||||
Zabbix raw items | MSSQL: Get performance counters | The item gets server global status information. |
ODBC | db.odbc.get[get_status_variables,"{$MSSQL.DSN}"] Expression: The text is too long. Please see the template. |
||||||
Zabbix raw items | MSSQL: Average latch wait time raw | Average latch wait time (in milliseconds) for latch requests that had to wait. |
DEPENDENT | mssql.average_latch_wait_time_raw Preprocessing: - JSONPATH: |
||||||
Zabbix raw items | MSSQL: Average latch wait time base | For internal use only. |
DEPENDENT | mssql.average_latch_wait_time_base Preprocessing: - JSONPATH: |
||||||
Zabbix raw items | MSSQL: Total average wait time raw | Average amount of wait time (in milliseconds) for each lock request that resulted in a wait. Information for all locks. |
DEPENDENT | mssql.average_wait_time_raw Preprocessing: - JSONPATH: |
||||||
Zabbix raw items | MSSQL: Total average wait time base | For internal use only. |
DEPENDENT | mssql.average_wait_time_base Preprocessing: - JSONPATH: |
||||||
Zabbix raw items | MSSQL AG '{#GROUP_NAME}': Get replica states | Getting replica states - name, primary and secondary health, synchronization health. |
ODBC | db.odbc.get[{#GROUP_NAME}replica_states,"{$MSSQL.DSN}"] Expression: The text is too long. Please see the template. |
||||||
Zabbix raw items | MSSQL AG '{#GROUP_NAME}' Local DB '{#DBNAME}': Get local DB states | Getting the states of the local availability database. |
ODBC | db.odbc.get["{#GROUP_NAME}*{#DBNAME}local_db.states","{$MSSQL.DSN}"] Expression: The text is too long. Please see the template. |
||||||
Zabbix raw items | MSSQL AG '{#GROUP_NAME}' Non-Local DB '{#REPLICA_NAME}*{#DBNAME}': Get non-local DB states | Getting the states of the non-local availability database. |
ODBC | db.odbc.get["{#GROUP_NAME}*{#REPLICA_NAME}*{#DBNAME}non-local_db.states","{$MSSQL.DSN}"] Expression: The text is too long. Please see the template. |
||||||
Zabbix raw items | MSSQL AG '{#GROUP_NAME}' Replica '{#REPLICA_NAME}': Get the replica state | Getting the database replica states. |
ODBC | db.odbc.get["{#GROUP_NAME}*{#REPLICA_NAME}replica.state","{$MSSQL.DSN}"] Expression: The text is too long. Please see the template. |
||||||
Zabbix raw items | MSSQL Mirroring '{#DBNAME}': Get the mirror state | Getting the mirror state. |
ODBC | db.odbc.get["{#DBNAME}mirroring_state","{$MSSQL.DSN}"] Expression: The text is too long. Please see the template. |
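The calculated items in the table above (average latch wait time, total average wait time, percent of adhoc queries, recompilation ratio, scan-to-search ratio) all share one pattern: the numerator is divided by a delta or rate that can legitimately be zero on an idle server, so the expression adds the boolean `(divisor=0)` to the divisor, turning a would-be division by zero into a division by one that yields 0. A minimal sketch of that guard in Python (the function name and sample counter values are illustrative, not part of the template):

```python
def perf_average_bulk(raw_now, raw_prev, base_now, base_prev):
    """Average-per-operation value from a raw/base counter pair.

    Mirrors the template's calculated-item expressions: the raw counter
    accumulates total wait time, the base counter accumulates the number
    of waits, and adding (delta_base == 0) to the divisor makes an idle
    interval evaluate to 0 instead of raising a division-by-zero error.
    """
    delta_raw = raw_now - raw_prev
    delta_base = base_now - base_prev
    # bool is promoted to 0/1, exactly like the (expr=0) term in Zabbix
    return delta_raw / (delta_base + (delta_base == 0))


# 500 ms of extra wait time over 10 extra waits -> 50 ms per wait
print(perf_average_bulk(1500, 1000, 110, 100))  # 50.0
# idle interval: both deltas are 0, guard prevents ZeroDivisionError
print(perf_average_bulk(1000, 1000, 100, 100))  # 0.0
```

This is how SQL Server's raw/base performance counter pairs (such as "Average Latch Wait Time" and its "_Base" counter) are generally meant to be combined between two samples; without the guard, a Zabbix calculated item would go unsupported whenever the base counter does not move between two polls.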
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
MSSQL: Service is unavailable | The TCP port of the MS SQL Server service is currently unavailable. |
last(/MSSQL by ODBC/net.tcp.service[tcp,{HOST.CONN},{$MSSQL.PORT}])=0 |
DISASTER | |
MSSQL: Version has changed | MSSQL version has changed. Ack to close. |
last(/MSSQL by ODBC/mssql.version,#1)<>last(/MSSQL by ODBC/mssql.version,#2) and length(last(/MSSQL by ODBC/mssql.version))>0 |
INFO | Manual close: YES |
MSSQL: Service has been restarted | Uptime is less than 10 minutes. |
last(/MSSQL by ODBC/mssql.uptime)<10m |
INFO | Manual close: YES |
MSSQL: Failed to fetch info data | Zabbix has not received data for items for the last 30 minutes. |
nodata(/MSSQL by ODBC/mssql.uptime,30m)=1 |
INFO | Depends on: - MSSQL: Service is unavailable |
MSSQL: Too frequently using pointers | Rows with varchar columns can experience expansion when varchar values are updated with a longer string. In the case where the row cannot fit in the existing page, the row migrates and access to the row will traverse a pointer. This only happens on heaps (tables without clustered indexes). Evaluate creating a clustered index for heap tables. In cases where clustered indexes cannot be used, drop non-clustered indexes, build a clustered index to reorg pages and rows, drop the clustered index, then recreate non-clustered indexes. |
last(/MSSQL by ODBC/mssql.forwarded_records_sec.rate) * 100 > 10 * last(/MSSQL by ODBC/mssql.batch_requests_sec.rate) |
WARNING | |
MSSQL: Number of work files created per second is high | Too many work files created per second to store temporary results for hash joins and hash aggregates. |
min(/MSSQL by ODBC/mssql.workfiles_created_sec.rate,5m)>{$MSSQL.WORK_FILES.MAX} |
AVERAGE | |
MSSQL: Number of work tables created per second is high | Too many work tables created per second to store temporary results for query spool, lob variables, XML variables, and cursors. |
min(/MSSQL by ODBC/mssql.worktables_created_sec.rate,5m)>{$MSSQL.WORK_TABLES.MAX} |
AVERAGE | |
MSSQL: Percentage of work tables available from the work table cache is low | A value less than 90% may indicate insufficient memory, since execution plans are being dropped, or on 32-bit systems, may indicate the need for an upgrade to a 64-bit system. |
max(/MSSQL by ODBC/mssql.worktables_from_cache_ratio,5m)<{$MSSQL.WORKTABLES_FROM_CACHE_RATIO.MIN.CRIT} |
HIGH | |
MSSQL: Percentage of the buffer cache efficiency is low | Too low buffer cache hit ratio. |
max(/MSSQL by ODBC/mssql.buffer_cache_hit_ratio,5m)<{$MSSQL.BUFFER_CACHE_RATIO.MIN.CRIT} |
HIGH | |
MSSQL: Percentage of the buffer cache efficiency is low | Low buffer cache hit ratio. |
max(/MSSQL by ODBC/mssql.buffer_cache_hit_ratio,5m)<{$MSSQL.BUFFER_CACHE_RATIO.MIN.WARN} |
WARNING | Depends on: - MSSQL: Percentage of the buffer cache efficiency is low |
MSSQL: Number of rps waiting for a free page is high | Some requests have to wait for a free page. |
min(/MSSQL by ODBC/mssql.free_list_stalls_sec.rate,5m)>{$MSSQL.FREE_LIST_STALLS.MAX} |
WARNING | |
MSSQL: Number of buffers written per second by the lazy writer is high | The number of buffers written per second by the buffer manager's lazy writer exceeds the threshold. |
min(/MSSQL by ODBC/mssql.lazy_writes_sec.rate,5m)>{$MSSQL.LAZY_WRITES.MAX} |
WARNING | |
MSSQL: Page life expectancy is low | Pages are staying in the buffer pool without references for less time than the threshold value. |
max(/MSSQL by ODBC/mssql.page_life_expectancy,15m)<{$MSSQL.PAGE_LIFE_EXPECTANCY.MIN} |
HIGH | |
MSSQL: Number of physical database page reads per second is high | The physical database page reads are issued too frequently. |
min(/MSSQL by ODBC/mssql.page_reads_sec.rate,5m)>{$MSSQL.PAGE_READS.MAX} |
WARNING | |
MSSQL: Number of physical database page writes per second is high | The physical database page writes are issued too frequently. |
min(/MSSQL by ODBC/mssql.page_writes_sec.rate,5m)>{$MSSQL.PAGE_WRITES.MAX} |
WARNING | |
MSSQL: Too many physical reads occurring | If this value makes up even a sizeable minority of the total Page Reads/sec (say, greater than 20% of the total page reads), you may have too many physical reads occurring. |
last(/MSSQL by ODBC/mssql.readahead_pages_sec.rate) > {$MSSQL.PERCENT_READAHEAD.MAX} / 100 * last(/MSSQL by ODBC/mssql.page_reads_sec.rate) |
WARNING | |
MSSQL: Total average wait time for locks is high | An average wait time longer than 500ms may indicate excessive blocking. This value should generally correlate to 'Lock Waits/sec' and move up or down with it accordingly. |
min(/MSSQL by ODBC/mssql.average_wait_time,5m)>{$MSSQL.AVERAGE_WAIT_TIME.MAX} |
WARNING | |
MSSQL: Total number of locks per second is high | Number of new locks and lock conversions per second requested from the lock manager is high. |
min(/MSSQL by ODBC/mssql.lock_requests_sec.rate,5m)>{$MSSQL.LOCK_REQUESTS.MAX} |
WARNING | |
MSSQL: Total lock requests per second that timed out is high | The total number of timed out lock requests per second, including requests for NOWAIT locks, is high. |
min(/MSSQL by ODBC/mssql.lock_timeouts_sec.rate,5m)>{$MSSQL.LOCK_TIMEOUTS.MAX} |
WARNING | |
MSSQL: Some blocking is occurring for 5m | Values greater than zero indicate at least some blocking is occurring, while a value of zero can quickly eliminate blocking as a potential root-cause problem. |
min(/MSSQL by ODBC/mssql.lock_waits_sec.rate,5m)>0 |
AVERAGE | |
MSSQL: Number of deadlocks is high | Too many deadlocks are occurring currently. |
min(/MSSQL by ODBC/mssql.number_deadlocks_sec.rate,5m)>{$MSSQL.DEADLOCKS.MAX} |
AVERAGE | |
MSSQL: Percent of adhoc queries running is high | The lower this value is the better. High values often indicate excessive adhoc querying and should be as low as possible. If excessive adhoc querying is happening, try rewriting the queries as procedures or invoke the queries using sp_executeSQL. When rewriting isn't possible, consider using a plan guide or setting the database to parameterization forced mode. |
min(/MSSQL by ODBC/mssql.percent_of_adhoc_queries,15m) > {$MSSQL.PERCENT_COMPILATIONS.MAX} |
WARNING | |
MSSQL: Percent of times statement recompiles is high | This number should be at or near zero, since recompiles can cause deadlocks and exclusive compile locks. This counter's value should follow in proportion to “Batch Requests/sec” and “SQL Compilations/sec”. |
min(/MSSQL by ODBC/mssql.percent_recompilations_to_compilations,15m) > {$MSSQL.PERCENT_RECOMPILATIONS.MAX} |
WARNING | |
MSSQL: Number of index and table scans exceeds index searches in the last 15m | Index searches are preferable to index and table scans. For OLTP applications, optimize for more index searches and fewer scans (preferably, 1 full scan for every 1000 index searches). Index and table scans are expensive I/O operations. |
min(/MSSQL by ODBC/mssql.scan_to_search,15m) > 0.001 |
WARNING | |
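The scan-to-search trigger above fires when full scans exceed 1 per 1000 index searches over the whole 15-minute window (the `min()` of the ratio must stay above 0.001). A minimal sketch of that logic, with hypothetical counter values:

```python
# Hypothetical sketch of the scan-to-search trigger logic: the trigger
# fires only when EVERY sample in the window exceeds the 1/1000 threshold,
# which is what min(...) > 0.001 expresses in the Zabbix expression.

def scan_to_search_ratio(full_scans: float, index_searches: float) -> float:
    """Ratio of full scans to index searches (0 if no searches yet)."""
    if index_searches == 0:
        return 0.0
    return full_scans / index_searches


def trigger_fires(ratio_samples, threshold=0.001):
    """min(ratio, 15m) > threshold: all samples must exceed the threshold."""
    return min(ratio_samples) > threshold


ratios = [scan_to_search_ratio(s, q) for s, q in [(5, 1000), (8, 2000), (3, 900)]]
print(trigger_fires(ratios))  # every ratio exceeds 1/1000 -> True
```

A single quiet sample in the window (for example, one ratio of 0.0005) keeps the trigger silent, which is why `min()` rather than `last()` is used here.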
MSSQL DB '{#DBNAME}': State is {ITEM.VALUE} | The DB has a non-working state. |
last(/MSSQL by ODBC/mssql.db.state["{#DBNAME}"])>1 |
HIGH | |
MSSQL DB '{#DBNAME}': Number of commits waiting for the log flush is high | Too many commits are waiting for the log flush. |
min(/MSSQL by ODBC/mssql.db.log_flush_waits_sec.rate["{#DBNAME}"],5m)>{$MSSQL.LOG_FLUSH_WAITS.MAX:"{#DBNAME}"} |
WARNING | |
MSSQL DB '{#DBNAME}': Total wait time to flush the log is high | The wait time to flush the log is too long. |
min(/MSSQL by ODBC/mssql.db.log_flush_wait_time["{#DBNAME}"],5m)>{$MSSQL.LOG_FLUSH_WAIT_TIME.MAX:"{#DBNAME}"} |
WARNING | |
MSSQL DB '{#DBNAME}': Percent of log used is high | There's not enough space left in the log. |
min(/MSSQL by ODBC/mssql.db.percent_log_used["{#DBNAME}"],5m)>{$MSSQL.PERCENT_LOG_USED.MAX:"{#DBNAME}"} |
WARNING | |
MSSQL DB '{#DBNAME}': Diff backup is old | The differential backup has not been executed for a long time. |
last(/MSSQL by ODBC/mssql.backup.diff["{#DBNAME}"])>{$MSSQL.BACKUP_DIFF.CRIT:"{#DBNAME}"} |
HIGH | Manual close: YES |
MSSQL DB '{#DBNAME}': Diff backup is old | The differential backup has not been executed for a long time. |
last(/MSSQL by ODBC/mssql.backup.diff["{#DBNAME}"])>{$MSSQL.BACKUP_DIFF.WARN:"{#DBNAME}"} |
WARNING | Manual close: YES Depends on: - MSSQL DB '{#DBNAME}': Diff backup is old |
MSSQL DB '{#DBNAME}': Full backup is old | The full backup has not been executed for a long time. |
last(/MSSQL by ODBC/mssql.backup.full["{#DBNAME}"])>{$MSSQL.BACKUP_FULL.CRIT:"{#DBNAME}"} |
HIGH | Manual close: YES |
MSSQL DB '{#DBNAME}': Full backup is old | The full backup has not been executed for a long time. |
last(/MSSQL by ODBC/mssql.backup.full["{#DBNAME}"])>{$MSSQL.BACKUP_FULL.WARN:"{#DBNAME}"} |
WARNING | Manual close: YES Depends on: - MSSQL DB '{#DBNAME}': Full backup is old |
MSSQL DB '{#DBNAME}': Log backup is old | The log backup has not been executed for a long time. |
last(/MSSQL by ODBC/mssql.backup.log["{#DBNAME}"])>{$MSSQL.BACKUP_LOG.CRIT:"{#DBNAME}"} |
HIGH | Manual close: YES |
MSSQL DB '{#DBNAME}': Log backup is old | The log backup has not been executed for a long time. |
last(/MSSQL by ODBC/mssql.backup.log["{#DBNAME}"])>{$MSSQL.BACKUP_LOG.WARN:"{#DBNAME}"} |
WARNING | Manual close: YES Depends on: - MSSQL DB '{#DBNAME}': Log backup is old |
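Each backup-age check above is a pair of triggers on the same item: a HIGH trigger against the CRIT macro and a WARNING trigger against the WARN macro, with the WARNING one suppressed via "Depends on" when the HIGH one fires. A sketch of that tiered evaluation (threshold values here are hypothetical):

```python
# Sketch of the tiered backup-age logic: the item holds seconds since the
# last backup; the WARNING trigger is suppressed when the HIGH one fires
# (the "Depends on" relation in the table above).

def backup_severity(age_seconds: int, warn: int, crit: int) -> str:
    """Return the severity of the most specific trigger that would fire."""
    if age_seconds > crit:
        return "HIGH"      # CRIT trigger fires; dependent WARNING is suppressed
    if age_seconds > warn:
        return "WARNING"
    return "OK"


DAY = 86400  # hypothetical macro values: WARN = 1 day, CRIT = 2 days
print(backup_severity(3 * DAY, warn=1 * DAY, crit=2 * DAY))        # -> HIGH
print(backup_severity(int(1.5 * DAY), warn=1 * DAY, crit=2 * DAY))  # -> WARNING
```

The dependency matters operationally: without it, a two-day-old backup would raise both a WARNING and a HIGH problem for the same root cause.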
MSSQL AG '{#GROUP_NAME}': Primary replica recovery health in progress | The primary replica is in the synchronization process. |
last(/MSSQL by ODBC/mssql.primary_recovery_health["{#GROUP_NAME}"])=0 |
WARNING | |
MSSQL AG '{#GROUP_NAME}': Secondary replica recovery health in progress | The secondary replica is in the synchronization process. |
last(/MSSQL by ODBC/mssql.secondary_recovery_health["{#GROUP_NAME}"])=0 |
WARNING | |
MSSQL AG '{#GROUP_NAME}': All replicas unhealthy | None of the availability replicas have a healthy synchronization. |
last(/MSSQL by ODBC/mssql.synchronization_health["{#GROUP_NAME}"])=0 |
DISASTER | |
MSSQL AG '{#GROUP_NAME}': Some replicas unhealthy | The synchronization health of some, but not all, availability replicas is healthy. |
last(/MSSQL by ODBC/mssql.synchronization_health["{#GROUP_NAME}"])=1 |
HIGH | |
MSSQL AG '{#GROUP_NAME}' Local DB '{#DBNAME}': "{#DBNAME}" is {ITEM.VALUE} | The local availability database has a non-working state. |
last(/MSSQL by ODBC/mssql.local_db.state["{#DBNAME}"])>0 |
WARNING | |
MSSQL AG '{#GROUP_NAME}' Local DB '{#DBNAME}': "{#DBNAME}" is Not healthy | The synchronization state of the local availability database is NOT SYNCHRONIZING. |
last(/MSSQL by ODBC/mssql.local_db.synchronization_health["{#DBNAME}"])=0 |
HIGH | |
MSSQL AG '{#GROUP_NAME}' Local DB '{#DBNAME}': "{#DBNAME}" is Partially healthy | A database on a synchronous-commit availability replica is considered partially healthy if synchronization state is SYNCHRONIZING. |
last(/MSSQL by ODBC/mssql.local_db.synchronization_health["{#DBNAME}"])=1 |
AVERAGE | |
MSSQL AG '{#GROUP_NAME}' Non-Local DB '*{#REPLICA_NAME}*{#DBNAME}': Log queue size is growing | The log records of the primary database are not sent to the secondary databases. |
last(/MSSQL by ODBC/mssql.non-local_db.log_send_queue_size["{#GROUP_NAME}*{#REPLICA_NAME}*{#DBNAME}"],#1)>last(/MSSQL by ODBC/mssql.non-local_db.log_send_queue_size["{#GROUP_NAME}*{#REPLICA_NAME}*{#DBNAME}"],#2) and last(/MSSQL by ODBC/mssql.non-local_db.log_send_queue_size["{#GROUP_NAME}*{#REPLICA_NAME}*{#DBNAME}"],#2)>last(/MSSQL by ODBC/mssql.non-local_db.log_send_queue_size["{#GROUP_NAME}*{#REPLICA_NAME}*{#DBNAME}"],#3) |
HIGH | |
MSSQL AG '{#GROUP_NAME}' Non-Local DB '*{#REPLICA_NAME}*{#DBNAME}': Redo log queue size is growing | The log records in the log files of the secondary replica have not yet been redone. |
last(/MSSQL by ODBC/mssql.non-local_db.redo_queue_size["{#GROUP_NAME}*{#REPLICA_NAME}*{#DBNAME}"],#1)>last(/MSSQL by ODBC/mssql.non-local_db.redo_queue_size["{#GROUP_NAME}*{#REPLICA_NAME}*{#DBNAME}"],#2) and last(/MSSQL by ODBC/mssql.non-local_db.redo_queue_size["{#GROUP_NAME}*{#REPLICA_NAME}*{#DBNAME}"],#2)>last(/MSSQL by ODBC/mssql.non-local_db.redo_queue_size["{#GROUP_NAME}*{#REPLICA_NAME}*{#DBNAME}"],#3) |
HIGH | |
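Both queue triggers above encode "growing" as three strictly increasing history values: in Zabbix notation, `#1 > #2 > #3`, where `#1` is the newest sample. A minimal sketch of that check:

```python
# The log/redo queue triggers compare the last three history values
# (#1 is the newest): they fire only when the queue grew twice in a row,
# which filters out single-sample spikes.

def queue_is_growing(history):
    """history[0] is the most recent sample, matching Zabbix #1, #2, #3."""
    if len(history) < 3:
        return False
    newest, prev, oldest = history[0], history[1], history[2]
    return newest > prev > oldest


print(queue_is_growing([300, 200, 100]))  # strictly increasing -> True
print(queue_is_growing([300, 300, 100]))  # plateau breaks the chain -> False
```

Requiring strict growth across two consecutive deltas is a cheap way to avoid flapping on a queue that merely oscillates around a constant size.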
MSSQL AG '{#GROUP_NAME}' Replica '{#REPLICA_NAME}': {#REPLICA_NAME} is disconnected | The response of an availability replica to the DISCONNECTED state depends on its role: on the primary replica, if a secondary replica is disconnected, its secondary databases are marked as NOT SYNCHRONIZED on the primary replica, which waits for the secondary to reconnect; on a secondary replica, upon detecting that it is disconnected, the secondary replica attempts to reconnect to the primary replica. |
last(/MSSQL by ODBC/mssql.replica.connected_state["{#GROUP_NAME}_{#REPLICA_NAME}"])=0 and last(/MSSQL by ODBC/mssql.replica.role["{#GROUP_NAME}_{#REPLICA_NAME}"])=2 |
WARNING | |
MSSQL AG '{#GROUPNAME}' Replica '{#REPLICANAME}': {#REPLICA_NAME} is {ITEM.VALUE} | The operational state of the replica in a given availability group is "Pending" or "Offline". |
last(/MSSQL by ODBC/mssql.replica.operational_state["{#GROUP_NAME}_{#REPLICA_NAME}"])=0 or last(/MSSQL by ODBC/mssql.replica.operational_state["{#GROUP_NAME}_{#REPLICA_NAME}"])=1 or last(/MSSQL by ODBC/mssql.replica.operational_state["{#GROUP_NAME}_{#REPLICA_NAME}"])=3 |
WARNING | |
MSSQL AG '{#GROUPNAME}' Replica '{#REPLICANAME}': {#REPLICA_NAME} is {ITEM.VALUE} | The operational state of the replica in a given availability group is "Failed". |
last(/MSSQL by ODBC/mssql.replica.operational_state["{#GROUP_NAME}_{#REPLICA_NAME}"])=4 |
AVERAGE | |
MSSQL AG '{#GROUPNAME}' Replica '{#REPLICANAME}': {#REPLICA_NAME} is {ITEM.VALUE} | The operational state of the replica in a given availability group is "Failed, no quorum". |
last(/MSSQL by ODBC/mssql.replica.operational_state["{#GROUP_NAME}_{#REPLICA_NAME}"])=5 |
HIGH | |
MSSQL AG '{#GROUP_NAME}' Replica '{#REPLICA_NAME}': {#REPLICA_NAME} Recovery in progress | At least one joined database has a database state other than ONLINE. |
last(/MSSQL by ODBC/mssql.replica.recovery_health["{#GROUP_NAME}_{#REPLICA_NAME}"])=0 |
INFO | |
MSSQL AG '{#GROUP_NAME}' Replica '{#REPLICA_NAME}': {#REPLICA_NAME} is Not healthy | At least one joined database is in the NOT SYNCHRONIZING state. |
last(/MSSQL by ODBC/mssql.replica.synchronization_health["{#GROUP_NAME}_{#REPLICA_NAME}"])=0 |
AVERAGE | |
MSSQL AG '{#GROUP_NAME}' Replica '{#REPLICA_NAME}': {#REPLICA_NAME} is Partially healthy | Some replicas are not in the target synchronization state: synchronous-commit replicas should be synchronized, and asynchronous-commit replicas should be synchronizing. |
last(/MSSQL by ODBC/mssql.replica.synchronization_health["{#GROUP_NAME}_{#REPLICA_NAME}"])=1 |
WARNING | |
MSSQL Mirroring '{#DBNAME}': "{#DBNAME}" is {ITEM.VALUE} | The state of the mirror database and of the database mirroring session is "Suspended", "Disconnected from the other partner", or "Synchronizing". |
last(/MSSQL by ODBC/mssql.mirroring.state["{#DBNAME}"])>=0 and last(/MSSQL by ODBC/mssql.mirroring.state["{#DBNAME}"])<=2 |
INFO | |
MSSQL Mirroring '{#DBNAME}': "{#DBNAME}" is {ITEM.VALUE} | The state of the mirror database and of the database mirroring session is "Pending Failover". |
last(/MSSQL by ODBC/mssql.mirroring.state["{#DBNAME}"])=3 |
WARNING | |
MSSQL Mirroring '{#DBNAME}': "{#DBNAME}" is {ITEM.VALUE} | The state of the mirror database and of the database mirroring session is "Not synchronized". The partners are not synchronized. A failover is not possible now. |
last(/MSSQL by ODBC/mssql.mirroring.state["{#DBNAME}"])=5 |
HIGH | |
MSSQL Mirroring '{#DBNAME}': "{#DBNAME}" Witness is disconnected | The state of the witness in the database mirroring session of the database is "Disconnected". |
last(/MSSQL by ODBC/mssql.mirroring.witness_state["{#DBNAME}"])=2 |
WARNING | |
MSSQL Job '{#JOBNAME}': Failed to run | The last run of the job has failed. |
last(/MSSQL by ODBC/mssql.job.runstatus["{#JOBNAME}"])=0 |
WARNING | Manual close: YES |
MSSQL Job '{#JOBNAME}': Job duration is high | The job is taking too long. |
last(/MSSQL by ODBC/mssql.job.run_duration["{#JOBNAME}"])>{$MSSQL.BACKUP_DURATION.WARN:"{#JOBNAME}"} |
WARNING | Manual close: YES |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template or ask for help with it at ZABBIX forums.
http://www.grumpyolddba.co.uk/monitoring/Performance%20Counter%20Guidance%20-%20SQL%20Server.htm https://docs.microsoft.com/en-us/sql/relational-databases/performance-monitor/sql-server-access-methods-object?view=sql-server-ver15
For Zabbix version: 6.2 and higher
The template to monitor a MongoDB sharded cluster by Zabbix that works without any external scripts.
Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
MongoDB cluster by Zabbix agent 2
— collects metrics from the mongos proxy (router) by polling Zabbix agent 2.
This template was tested on:
See Zabbix template operation for basic instructions.
Note that, depending on the number of DBs and collections, the discovery operation may be expensive. Use filters with the macros {$MONGODB.LLD.FILTER.DB.MATCHES}, {$MONGODB.LLD.FILTER.DB.NOT_MATCHES}, {$MONGODB.LLD.FILTER.COLLECTION.MATCHES}, {$MONGODB.LLD.FILTER.COLLECTION.NOT_MATCHES}.
All sharded MongoDB nodes (mongod) will be discovered with the attached template "MongoDB node by Zabbix agent 2".
Test availability: zabbix_get -s mongos.node -k 'mongodb.ping["{$MONGODB.CONNSTRING}","{$MONGODB.USER}","{$MONGODB.PASSWORD}"]'
No specific Zabbix configuration is required.
Name | Description | Default | ||
---|---|---|---|---|
{$MONGODB.CONNS.AVAILABLE.MIN.WARN} | Minimum number of available connections |
1000 |
||
{$MONGODB.CONNSTRING} | Connection string in the URI format (password is not used). This parameter overrides the value of the "Server" option in the configuration file (if it is set); otherwise, the plugin's default value "tcp://localhost:27017" is used. |
tcp://localhost:27017 |
||
{$MONGODB.CURSOR.OPEN.MAX.WARN} | Maximum number of open cursors |
10000 |
||
{$MONGODB.CURSOR.TIMEOUT.MAX.WARN} | Maximum number of cursors timing out per second |
1 |
||
{$MONGODB.LLD.FILTER.COLLECTION.MATCHES} | Filter of discoverable collections |
.* |
||
{$MONGODB.LLD.FILTER.COLLECTION.NOT_MATCHES} | Filter to exclude discovered collections |
CHANGE_IF_NEEDED |
||
{$MONGODB.LLD.FILTER.DB.MATCHES} | Filter of discoverable databases |
.* |
||
{$MONGODB.LLD.FILTER.DB.NOT_MATCHES} | Filter to exclude discovered databases |
`(admin|config|local)` |
{$MONGODB.PASSWORD} | MongoDB user password |
`` | ||
{$MONGODB.USER} | MongoDB username |
`` |
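The `MATCHES`/`NOT_MATCHES` macros above are regular expressions that Zabbix applies to each discovered `{#DBNAME}` (and `{#COLLECTION}`) value. A rough approximation of that filtering with Python's `re`, using the defaults from the table:

```python
import re

# Approximation of the LLD filter behavior: an entity is discovered when it
# matches the MATCHES pattern and does not match the NOT_MATCHES pattern.
MATCHES = r".*"                        # {$MONGODB.LLD.FILTER.DB.MATCHES}
NOT_MATCHES = r"(admin|config|local)"  # {$MONGODB.LLD.FILTER.DB.NOT_MATCHES}


def discovered(dbname: str) -> bool:
    return (re.search(MATCHES, dbname) is not None
            and re.search(NOT_MATCHES, dbname) is None)


print([db for db in ["admin", "config", "local", "appdb"] if discovered(db)])
# -> ['appdb']
```

This is why the default `NOT_MATCHES` value excludes the three MongoDB system databases while `.*` lets every user database through; Zabbix's own regex semantics may differ in edge cases, so treat this as an illustration only.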
There are no template links in this template.
Name | Description | Type | Key and additional info |
---|---|---|---|
Collection discovery | Collect collections metrics. Note, depending on the number of DBs and collections this discovery operation may be expensive. Use filters with macros {$MONGODB.LLD.FILTER.DB.MATCHES}, {$MONGODB.LLD.FILTER.DB.NOT_MATCHES}, {$MONGODB.LLD.FILTER.COLLECTION.MATCHES}, {$MONGODB.LLD.FILTER.COLLECTION.NOT_MATCHES}. |
ZABBIX_PASSIVE | mongodb.collections.discovery["{$MONGODB.CONNSTRING}","{$MONGODB.USER}","{$MONGODB.PASSWORD}"] Filter: AND - {#DBNAME} MATCHES_REGEX - {#DBNAME} NOT_MATCHES_REGEX - {#COLLECTION} MATCHES_REGEX - {#COLLECTION} NOT_MATCHES_REGEX |
Config servers discovery | Discovery of sharded cluster config servers. |
ZABBIX_PASSIVE | mongodb.cfg.discovery["{$MONGODB.CONNSTRING}","{$MONGODB.USER}","{$MONGODB.PASSWORD}"] |
Database discovery | Collect database metrics. Note, depending on the number of DBs this discovery operation may be expensive. Use filters with macros {$MONGODB.LLD.FILTER.DB.MATCHES}, {$MONGODB.LLD.FILTER.DB.NOT_MATCHES}. |
ZABBIX_PASSIVE | mongodb.db.discovery["{$MONGODB.CONNSTRING}","{$MONGODB.USER}","{$MONGODB.PASSWORD}"] Filter: AND - {#DBNAME} MATCHES_REGEX - {#DBNAME} NOT_MATCHES_REGEX |
Shards discovery | Discovery of sharded cluster hosts. |
ZABBIX_PASSIVE | mongodb.sh.discovery["{$MONGODB.CONNSTRING}","{$MONGODB.USER}","{$MONGODB.PASSWORD}"] |
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
MongoDB sharded cluster | MongoDB cluster: Ping | Test if a connection is alive or not. |
ZABBIX_PASSIVE | mongodb.ping["{$MONGODB.CONNSTRING}","{$MONGODB.USER}","{$MONGODB.PASSWORD}"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
MongoDB sharded cluster | MongoDB cluster: Jumbo chunks | Total number of 'jumbo' chunks in the mongo cluster. |
ZABBIX_PASSIVE | mongodb.jumbo_chunks.count["{$MONGODB.CONNSTRING}","{$MONGODB.USER}","{$MONGODB.PASSWORD}"] |
MongoDB sharded cluster | MongoDB cluster: Mongos version | Version of the Mongos server |
DEPENDENT | mongodb.version Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
MongoDB sharded cluster | MongoDB cluster: Uptime | Number of seconds since Mongos server start |
DEPENDENT | mongodb.uptime Preprocessing: - JSONPATH: |
MongoDB sharded cluster | MongoDB cluster: Operations: command | "The number of commands issued to the database per second. Counts all commands except the write commands: insert, update, and delete." |
DEPENDENT | mongodb.opcounters.command.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MongoDB sharded cluster | MongoDB cluster: Operations: delete | The number of delete operations received by the mongos instance per second. |
DEPENDENT | mongodb.opcounters.delete.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MongoDB sharded cluster | MongoDB cluster: Operations: update, rate | The number of update operations received by the mongos instance per second. |
DEPENDENT | mongodb.opcounters.update.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MongoDB sharded cluster | MongoDB cluster: Operations: query, rate | The number of queries received by the mongos instance per second. |
DEPENDENT | mongodb.opcounters.query.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MongoDB sharded cluster | MongoDB cluster: Operations: insert, rate | The number of insert operations received by the mongos instance per second. |
DEPENDENT | mongodb.opcounters.insert.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MongoDB sharded cluster | MongoDB cluster: Operations: getmore, rate | "The number of “getmore” operations received by the mongos instance per second. This counter can be high even if the query count is low. Secondary nodes send getMore operations as part of the replication process." |
DEPENDENT | mongodb.opcounters.getmore.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
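The opcounter values MongoDB reports are cumulative totals since server start; the CHANGE_PER_SECOND preprocessing step on the items above converts them to rates. A sketch of that derivative, with hypothetical counter readings:

```python
# CHANGE_PER_SECOND preprocessing turns a monotonically increasing counter
# into a rate: (value - previous value) / (timestamp - previous timestamp).

def change_per_second(prev_value, prev_ts, value, ts):
    if ts <= prev_ts:
        raise ValueError("timestamps must increase")
    return (value - prev_value) / (ts - prev_ts)


# e.g. opcounters.query went from 10000 to 10600 over 60 seconds:
print(change_per_second(10000, 0, 10600, 60))  # -> 10.0 queries/s
```

Zabbix additionally discards the first value after a counter reset (the raw delta would be negative); this sketch omits that detail.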
MongoDB sharded cluster | MongoDB cluster: Last seen configserver | The latest optime of the CSRS primary that the mongos has seen. |
DEPENDENT | mongodb.last_seen_config_server Preprocessing: - JSONPATH: |
MongoDB sharded cluster | MongoDB cluster: Configserver heartbeat | Difference between the latest optime of the CSRS primary that the mongos has seen and cluster time. |
DEPENDENT | mongodb.config_server_heartbeat Preprocessing: - JAVASCRIPT: |
MongoDB sharded cluster | MongoDB cluster: Bytes in, rate | The total number of bytes that the server has received over network connections initiated by clients or other mongod/mongos instances per second. |
DEPENDENT | mongodb.network.bytes_in.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MongoDB sharded cluster | MongoDB cluster: Bytes out, rate | The total number of bytes that the server has sent over network connections initiated by clients or other mongod/mongos instances per second. |
DEPENDENT | mongodb.network.bytes_out.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MongoDB sharded cluster | MongoDB cluster: Requests, rate | Number of distinct requests that the server has received per second |
DEPENDENT | mongodb.network.numRequests.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MongoDB sharded cluster | MongoDB cluster: Connections, current | "The number of incoming connections from clients to the database server. This number includes the current shell session" |
DEPENDENT | mongodb.connections.current Preprocessing: - JSONPATH: |
MongoDB sharded cluster | MongoDB cluster: New connections, rate | "Rate of all incoming connections created to the server." |
DEPENDENT | mongodb.connections.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MongoDB sharded cluster | MongoDB cluster: Connections, active | "The number of active client connections to the server. Active client connections refers to client connections that currently have operations in progress. Available starting in 4.0.7, 0 for older versions." |
DEPENDENT | mongodb.connections.active Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
MongoDB sharded cluster | MongoDB cluster: Connections, available | "The number of unused incoming connections available." |
DEPENDENT | mongodb.connections.available Preprocessing: - JSONPATH: |
MongoDB sharded cluster | MongoDB cluster: Connection pool: client connections | The number of active and stored outgoing synchronous connections from the current mongos instance to other members of the sharded cluster. |
DEPENDENT | mongodb.connection_pool.client Preprocessing: - JSONPATH: |
MongoDB sharded cluster | MongoDB cluster: Connection pool: scoped | Number of active and stored outgoing scoped synchronous connections from the current mongos instance to other members of the sharded cluster. |
DEPENDENT | mongodb.connection_pool.scoped Preprocessing: - JSONPATH: |
MongoDB sharded cluster | MongoDB cluster: Connection pool: created, rate | The total number of outgoing connections created per second by the current mongos instance to other members of the sharded cluster. |
DEPENDENT | mongodb.connection_pool.created.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MongoDB sharded cluster | MongoDB cluster: Connection pool: available | The total number of available outgoing connections from the current mongos instance to other members of the sharded cluster. |
DEPENDENT | mongodb.connection_pool.available Preprocessing: - JSONPATH: |
MongoDB sharded cluster | MongoDB cluster: Connection pool: in use | Reports the total number of outgoing connections from the current mongos instance to other members of the sharded cluster set that are currently in use. |
DEPENDENT | mongodb.connection_pool.in_use Preprocessing: - JSONPATH: |
MongoDB sharded cluster | MongoDB cluster: Connection pool: refreshing | Reports the total number of outgoing connections from the current mongos instance to other members of the sharded cluster that are currently being refreshed. |
DEPENDENT | mongodb.connection_pool.refreshing Preprocessing: - JSONPATH: |
MongoDB sharded cluster | MongoDB cluster: Cursor: open no timeout | Number of open cursors with the option DBQuery.Option.noTimeout set to prevent timeout after a period of inactivity. |
DEPENDENT | mongodb.metrics.cursor.open.no_timeout Preprocessing: - JSONPATH: ⛔️ON_FAIL: DISCARD_VALUE -> |
MongoDB sharded cluster | MongoDB cluster: Cursor: open pinned | Number of pinned open cursors. |
DEPENDENT | mongodb.cursor.open.pinned Preprocessing: - JSONPATH: |
MongoDB sharded cluster | MongoDB cluster: Cursor: open total | Number of cursors that MongoDB is maintaining for clients. |
DEPENDENT | mongodb.cursor.open.total Preprocessing: - JSONPATH: |
MongoDB sharded cluster | MongoDB cluster: Cursor: timed out, rate | Number of cursors that time out, per second. |
DEPENDENT | mongodb.cursor.timed_out.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MongoDB sharded cluster | MongoDB cluster: Architecture | A number, either 64 or 32, that indicates whether the MongoDB instance is compiled for 64-bit or 32-bit architecture. |
DEPENDENT | mongodb.mem.bits Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
MongoDB sharded cluster | MongoDB cluster: Memory: resident | Amount of memory currently used by the database process. |
DEPENDENT | mongodb.mem.resident Preprocessing: - JSONPATH: - MULTIPLIER: |
MongoDB sharded cluster | MongoDB cluster: Memory: virtual | Amount of virtual memory used by the mongos process. |
DEPENDENT | mongodb.mem.virtual Preprocessing: - JSONPATH: - MULTIPLIER: |
MongoDB sharded cluster | MongoDB {#DBNAME}: Objects, avg size | The average size of each document in bytes. |
DEPENDENT | mongodb.db.size["{#DBNAME}"] Preprocessing: - JSONPATH: |
MongoDB sharded cluster | MongoDB {#DBNAME}: Size, data | Total size of the data held in this database including the padding factor. |
DEPENDENT | mongodb.db.data_size["{#DBNAME}"] Preprocessing: - JSONPATH: |
MongoDB sharded cluster | MongoDB {#DBNAME}: Size, file | Total size of the data held in this database including the padding factor (only available with the mmapv1 storage engine). |
DEPENDENT | mongodb.db.file_size["{#DBNAME}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL: DISCARD_VALUE -> |
MongoDB sharded cluster | MongoDB {#DBNAME}: Size, index | Total size of all indexes created on this database. |
DEPENDENT | mongodb.db.index_size["{#DBNAME}"] Preprocessing: - JSONPATH: |
MongoDB sharded cluster | MongoDB {#DBNAME}: Size, storage | Total amount of space allocated to collections in this database for document storage. |
DEPENDENT | mongodb.db.storage_size["{#DBNAME}"] Preprocessing: - JSONPATH: |
MongoDB sharded cluster | MongoDB {#DBNAME}: Objects, count | Number of objects (documents) in the database across all collections. |
DEPENDENT | mongodb.db.objects["{#DBNAME}"] Preprocessing: - JSONPATH: |
MongoDB sharded cluster | MongoDB {#DBNAME}: Extents | Contains a count of the number of extents in the database across all collections. |
DEPENDENT | mongodb.db.extents["{#DBNAME}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
MongoDB sharded cluster | MongoDB {#DBNAME}.{#COLLECTION}: Size | The total size in bytes of the data in the collection plus the size of every indexes on the mongodb.collection. |
DEPENDENT | mongodb.collection.size["{#DBNAME}","{#COLLECTION}"] Preprocessing: - JSONPATH: |
MongoDB sharded cluster | MongoDB {#DBNAME}.{#COLLECTION}: Objects, avg size | The size of the average object in the collection in bytes. |
DEPENDENT | mongodb.collection.avgobjsize["{#DBNAME}","{#COLLECTION}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
MongoDB sharded cluster | MongoDB {#DBNAME}.{#COLLECTION}: Objects, count | Total number of objects in the collection. |
DEPENDENT | mongodb.collection.count["{#DBNAME}","{#COLLECTION}"] Preprocessing: - JSONPATH: |
MongoDB sharded cluster | MongoDB {#DBNAME}.{#COLLECTION}: Capped, max number | Maximum number of documents in a capped collection. |
DEPENDENT | mongodb.collection.max["{#DBNAME}","{#COLLECTION}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
MongoDB sharded cluster | MongoDB {#DBNAME}.{#COLLECTION}: Capped, max size | Maximum size of a capped collection in bytes. |
DEPENDENT | mongodb.collection.max_size["{#DBNAME}","{#COLLECTION}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL: DISCARD_VALUE -> |
MongoDB sharded cluster | MongoDB {#DBNAME}.{#COLLECTION}: Storage size | Total storage space allocated to this collection for document storage. |
DEPENDENT | mongodb.collection.storage_size["{#DBNAME}","{#COLLECTION}"] Preprocessing: - JSONPATH: |
MongoDB sharded cluster | MongoDB {#DBNAME}.{#COLLECTION}: Indexes | Total number of indices on the collection. |
DEPENDENT | mongodb.collection.nindexes["{#DBNAME}","{#COLLECTION}"] Preprocessing: - JSONPATH: |
MongoDB sharded cluster | MongoDB {#DBNAME}.{#COLLECTION}: Capped | Whether or not the collection is capped. |
DEPENDENT | mongodb.collection.capped["{#DBNAME}","{#COLLECTION}"] Preprocessing: - JSONPATH: - BOOL_TO_DECIMAL - DISCARD_UNCHANGED_HEARTBEAT: |
Zabbix raw items | MongoDB cluster: Get server status | The mongos statistic |
ZABBIX_PASSIVE | mongodb.server.status["{$MONGODB.CONNSTRING}","{$MONGODB.USER}","{$MONGODB.PASSWORD}"] |
Zabbix raw items | MongoDB cluster: Get mongodb.connpool.stats | Returns current info about connpool.stats. |
ZABBIX_PASSIVE | mongodb.connpool.stats["{$MONGODB.CONNSTRING}","{$MONGODB.USER}","{$MONGODB.PASSWORD}"] |
Zabbix raw items | MongoDB {#DBNAME}: Get db stats {#DBNAME} | Returns statistics reflecting the database system's state. |
ZABBIX_PASSIVE | mongodb.db.stats["{$MONGODB.CONNSTRING}","{$MONGODB.USER}","{$MONGODB.PASSWORD}","{#DBNAME}"] |
Zabbix raw items | MongoDB {#DBNAME}.{#COLLECTION}: Get collection stats {#DBNAME}.{#COLLECTION} | Returns a variety of storage statistics for a given collection. |
ZABBIX_PASSIVE | mongodb.collection.stats["{$MONGODB.CONNSTRING}","{$MONGODB.USER}","{$MONGODB.PASSWORD}","{#DBNAME}","{#COLLECTION}"] |
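The "Zabbix raw items" above are master items: a single `serverStatus`/`dbStats` fetch feeds all the DEPENDENT items, each of which extracts one field with a JSONPath step. A rough equivalent of that fan-out pattern using plain dict access over a trimmed, hypothetical payload (field names follow MongoDB's `serverStatus` output; the values are invented):

```python
import json

# One master fetch, many dependent extractions -- the pattern behind the
# DEPENDENT items in this template. Payload values here are hypothetical.
raw = json.dumps({
    "version": "4.4.10",
    "uptime": 86400,
    "opcounters": {"query": 10600, "insert": 204},
})

doc = json.loads(raw)
dependent_items = {
    "mongodb.version": doc["version"],    # JSONPath $.version
    "mongodb.uptime": doc["uptime"],      # JSONPath $.uptime
    # raw counter; CHANGE_PER_SECOND would be applied on top of this:
    "mongodb.opcounters.query.rate": doc["opcounters"]["query"],
}
print(dependent_items["mongodb.version"])  # -> 4.4.10
```

The design choice matters for load: the cluster is queried once per interval, and every extraction happens inside Zabbix preprocessing rather than as a separate round-trip to mongos.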
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
MongoDB cluster: Connection to mongos proxy is unavailable | Connection to mongos proxy instance is currently unavailable. |
last(/MongoDB cluster by Zabbix agent 2/mongodb.ping["{$MONGODB.CONNSTRING}","{$MONGODB.USER}","{$MONGODB.PASSWORD}"])=0 |
HIGH | |
MongoDB cluster: Version has changed | MongoDB cluster version has changed. Ack to close. |
last(/MongoDB cluster by Zabbix agent 2/mongodb.version,#1)<>last(/MongoDB cluster by Zabbix agent 2/mongodb.version,#2) and length(last(/MongoDB cluster by Zabbix agent 2/mongodb.version))>0 |
INFO | Manual close: YES |
MongoDB cluster: has been restarted | Uptime is less than 10 minutes. |
last(/MongoDB cluster by Zabbix agent 2/mongodb.uptime)<10m |
INFO | Manual close: YES |
MongoDB cluster: Failed to fetch info data | Zabbix has not received data for items for the last 10 minutes |
nodata(/MongoDB cluster by Zabbix agent 2/mongodb.uptime,10m)=1 |
WARNING | Manual close: YES Depends on: - MongoDB cluster: Connection to mongos proxy is unavailable |
MongoDB cluster: Available connections is low | "Too few available connections. Consider this value in combination with the value of connections current to understand the connection load on the database" |
max(/MongoDB cluster by Zabbix agent 2/mongodb.connections.available,5m)<{$MONGODB.CONNS.AVAILABLE.MIN.WARN} |
WARNING | |
MongoDB cluster: Too many cursors opened by MongoDB for clients | - |
min(/MongoDB cluster by Zabbix agent 2/mongodb.cursor.open.total,5m)>{$MONGODB.CURSOR.OPEN.MAX.WARN} |
WARNING | |
MongoDB cluster: Too many cursors are timing out | - |
min(/MongoDB cluster by Zabbix agent 2/mongodb.cursor.timed_out.rate,5m)>{$MONGODB.CURSOR.TIMEOUT.MAX.WARN} |
WARNING |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template or ask for help with it at ZABBIX forums.
For Zabbix version: 6.2 and higher
The template to monitor a single MongoDB server by Zabbix that works without any external scripts.
Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
MongoDB node by Zabbix Agent 2
— collects metrics by polling zabbix-agent2.
This template was tested on:
See Zabbix template operation for basic instructions.
Note that, depending on the number of DBs and collections, the discovery operation may be expensive. Use filters with the macros {$MONGODB.LLD.FILTER.DB.MATCHES}, {$MONGODB.LLD.FILTER.DB.NOT_MATCHES}, {$MONGODB.LLD.FILTER.COLLECTION.MATCHES}, {$MONGODB.LLD.FILTER.COLLECTION.NOT_MATCHES}.
Test availability: zabbix_get -s mongodb.node -k 'mongodb.ping["{$MONGODB.CONNSTRING}","{$MONGODB.USER}","{$MONGODB.PASSWORD}"]'
No specific Zabbix configuration is required.
Name | Description | Default | ||
---|---|---|---|---|
{$MONGODB.CONNS.PCT.USED.MAX.WARN} | Maximum percentage of used connections |
80 |
||
{$MONGODB.CONNSTRING} | Connection string in the URI format (password is not used). This parameter overrides the value of the "Server" option in the configuration file (if it is set); otherwise, the plugin's default value "tcp://localhost:27017" is used. |
tcp://localhost:27017 |
||
{$MONGODB.CURSOR.OPEN.MAX.WARN} | Maximum number of open cursors |
10000 |
||
{$MONGODB.CURSOR.TIMEOUT.MAX.WARN} | Maximum number of cursors timing out per second |
1 |
||
{$MONGODB.LLD.FILTER.COLLECTION.MATCHES} | Filter of discoverable collections |
.* |
||
{$MONGODB.LLD.FILTER.COLLECTION.NOT_MATCHES} | Filter to exclude discovered collections |
CHANGE_IF_NEEDED |
||
{$MONGODB.LLD.FILTER.DB.MATCHES} | Filter of discoverable databases |
.* |
||
{$MONGODB.LLD.FILTER.DB.NOT_MATCHES} | Filter to exclude discovered databases |
`(admin|config|local)` |
{$MONGODB.PASSWORD} | MongoDB user password |
`` | ||
{$MONGODB.REPL.LAG.MAX.WARN} | Maximum replication lag in seconds |
10s |
||
{$MONGODB.USER} | MongoDB username |
`` | ||
{$MONGODB.WIRED_TIGER.TICKETS.AVAILABLE.MIN.WARN} | Minimum number of available WiredTiger read or write tickets remaining |
5 |
There are no template links in this template.
Name | Description | Type | Key and additional info |
---|---|---|---|
Collection discovery | Collect collections metrics. Note, depending on the number of DBs and collections this discovery operation may be expensive. Use filters with macros {$MONGODB.LLD.FILTER.DB.MATCHES}, {$MONGODB.LLD.FILTER.DB.NOT_MATCHES}, {$MONGODB.LLD.FILTER.COLLECTION.MATCHES}, {$MONGODB.LLD.FILTER.COLLECTION.NOT_MATCHES}. |
ZABBIX_PASSIVE | mongodb.collections.discovery["{$MONGODB.CONNSTRING}","{$MONGODB.USER}","{$MONGODB.PASSWORD}"] Filter: AND - {#DBNAME} MATCHES_REGEX - {#DBNAME} NOT_MATCHES_REGEX - {#COLLECTION} MATCHES_REGEX - {#COLLECTION} NOT_MATCHES_REGEX |
Database discovery | Collect database metrics. Note, depending on the number of DBs this discovery operation may be expensive. Use filters with macros {$MONGODB.LLD.FILTER.DB.MATCHES}, {$MONGODB.LLD.FILTER.DB.NOT_MATCHES}. |
ZABBIX_PASSIVE | mongodb.db.discovery["{$MONGODB.CONNSTRING}","{$MONGODB.USER}","{$MONGODB.PASSWORD}"] Filter: AND - {#DBNAME} MATCHES_REGEX - {#DBNAME} NOT_MATCHES_REGEX |
Replication discovery | Collect metrics by Zabbix agent if it exists |
DEPENDENT | mongodb.rs.discovery Preprocessing: - JAVASCRIPT: - DISCARDUNCHANGEDHEARTBEAT: Overrides: Primary metrics - ITEMPROTOTYPE LIKE Unhealthy replicas - DISCOVER- ITEMPROTOTYPE LIKE Number of unhealthy replicas - DISCOVER- ITEMPROTOTYPE LIKE Replication lag - NODISCOVERArbiter metrics 7 - ITEMPROTOTYPE LIKE Replication lag - NO_DISCOVER |
WiredTiger metrics | Collect metrics of WiredTiger Storage Engine if it exists |
DEPENDENT | mongodb.wiredtiger.discovery Preprocessing: - JAVASCRIPT: - DISCARD UNCHANGED_HEARTBEAT:6h |
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
MongoDB | MongoDB: Ping | Test if a connection is alive or not. |
ZABBIX_PASSIVE | mongodb.ping["{$MONGODB.CONNSTRING}","{$MONGODB.USER}","{$MONGODB.PASSWORD}"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
MongoDB | MongoDB: MongoDB version | Version of the MongoDB server. |
DEPENDENT | mongodb.version Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
MongoDB | MongoDB: Uptime | Number of seconds that the mongod process has been active. |
DEPENDENT | mongodb.uptime Preprocessing: - JSONPATH: |
MongoDB | MongoDB: Asserts: message, rate | The number of message assertions raised per second. Check the log file for more information about these messages. |
DEPENDENT | mongodb.asserts.msg.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MongoDB | MongoDB: Asserts: user, rate | The number of “user asserts” that have occurred per second. These are errors that a user may generate, such as out of disk space or a duplicate key. |
DEPENDENT | mongodb.asserts.user.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MongoDB | MongoDB: Asserts: warning, rate | The number of warnings raised per second. |
DEPENDENT | mongodb.asserts.warning.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MongoDB | MongoDB: Asserts: regular, rate | The number of regular assertions raised per second. Check the log file for more information about these messages. |
DEPENDENT | mongodb.asserts.regular.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MongoDB | MongoDB: Asserts: rollovers, rate | Number of times that the rollover counters roll over per second. The counters roll over to zero every 2^30 assertions. |
DEPENDENT | mongodb.asserts.rollovers.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MongoDB | MongoDB: Active clients: writers | The number of active client connections performing write operations. |
DEPENDENT | mongodb.active_clients.writers Preprocessing: - JSONPATH: |
MongoDB | MongoDB: Active clients: readers | The number of the active client connections performing read operations. |
DEPENDENT | mongodb.active_clients.readers Preprocessing: - JSONPATH: |
MongoDB | MongoDB: Active clients: total | The total number of internal client connections to the database including system threads as well as queued readers and writers. |
DEPENDENT | mongodb.active_clients.total Preprocessing: - JSONPATH: |
MongoDB | MongoDB: Current queue: writers | The number of operations that are currently queued and waiting for the write lock. A consistently small write-queue, particularly of shorter operations, is no cause for concern. |
DEPENDENT | mongodb.current_queue.writers Preprocessing: - JSONPATH: |
MongoDB | MongoDB: Current queue: readers | The number of operations that are currently queued and waiting for the read lock. A consistently small read-queue, particularly of shorter operations, should cause no concern. |
DEPENDENT | mongodb.current_queue.readers Preprocessing: - JSONPATH: |
MongoDB | MongoDB: Current queue: total | The total number of operations queued waiting for the lock. |
DEPENDENT | mongodb.current_queue.total Preprocessing: - JSONPATH: |
MongoDB | MongoDB: Operations: command, rate | The number of commands issued to the database on the mongod instance per second. Counts all commands except the write commands: insert, update, and delete. |
DEPENDENT | mongodb.opcounters.command.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MongoDB | MongoDB: Operations: delete, rate | The number of delete operations received by the mongod instance per second. |
DEPENDENT | mongodb.opcounters.delete.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MongoDB | MongoDB: Operations: update, rate | The number of update operations received by the mongod instance per second. |
DEPENDENT | mongodb.opcounters.update.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MongoDB | MongoDB: Operations: query, rate | The number of queries received by the mongod instance per second. |
DEPENDENT | mongodb.opcounters.query.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MongoDB | MongoDB: Operations: insert, rate | The number of insert operations received by the mongod instance per second. |
DEPENDENT | mongodb.opcounters.insert.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MongoDB | MongoDB: Operations: getmore, rate | The number of “getmore” operations received by the mongod instance per second. This counter can be high even if the query count is low. Secondary nodes send getMore operations as part of the replication process. |
DEPENDENT | mongodb.opcounters.getmore.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MongoDB | MongoDB: Connections, current | The number of incoming connections from clients to the database server. This number includes the current shell session. |
DEPENDENT | mongodb.connections.current Preprocessing: - JSONPATH: |
MongoDB | MongoDB: New connections, rate | Rate of all incoming connections created to the server. |
DEPENDENT | mongodb.connections.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MongoDB | MongoDB: Connections, available | The number of unused incoming connections available. |
DEPENDENT | mongodb.connections.available Preprocessing: - JSONPATH: |
MongoDB | MongoDB: Connections, active | The number of active client connections to the server. Active client connections refers to client connections that currently have operations in progress. Available starting in 4.0.7, 0 for older versions. |
DEPENDENT | mongodb.connections.active Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
MongoDB | MongoDB: Bytes in, rate | The total number of bytes that the server has received over network connections initiated by clients or other mongod/mongos instances per second. |
DEPENDENT | mongodb.network.bytes_in.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MongoDB | MongoDB: Bytes out, rate | The total number of bytes that the server has sent over network connections initiated by clients or other mongod/mongos instances per second. |
DEPENDENT | mongodb.network.bytes_out.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MongoDB | MongoDB: Requests, rate | Number of distinct requests that the server has received per second. |
DEPENDENT | mongodb.network.numRequests.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MongoDB | MongoDB: Document: deleted, rate | Number of documents deleted per second. |
DEPENDENT | mongod.document.deleted.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MongoDB | MongoDB: Document: inserted, rate | Number of documents inserted per second. |
DEPENDENT | mongod.document.inserted.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MongoDB | MongoDB: Document: returned, rate | Number of documents returned by queries per second. |
DEPENDENT | mongod.document.returned.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MongoDB | MongoDB: Document: updated, rate | Number of documents updated per second. |
DEPENDENT | mongod.document.updated.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MongoDB | MongoDB: Cursor: open no timeout | Number of open cursors with the option DBQuery.Option.noTimeout set to prevent timeout after a period of inactivity. |
DEPENDENT | mongodb.metrics.cursor.open.no_timeout Preprocessing: - JSONPATH: |
MongoDB | MongoDB: Cursor: open pinned | Number of pinned open cursors. |
DEPENDENT | mongodb.cursor.open.pinned Preprocessing: - JSONPATH: |
MongoDB | MongoDB: Cursor: open total | Number of cursors that MongoDB is maintaining for clients. |
DEPENDENT | mongodb.cursor.open.total Preprocessing: - JSONPATH: |
MongoDB | MongoDB: Cursor: timed out, rate | Number of cursors that time out, per second. |
DEPENDENT | mongodb.cursor.timed_out.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MongoDB | MongoDB: Architecture | A number, either 64 or 32, that indicates whether the MongoDB instance is compiled for 64-bit or 32-bit architecture. |
DEPENDENT | mongodb.mem.bits Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
MongoDB | MongoDB: Memory: mapped | Amount of mapped memory by the database. |
DEPENDENT | mongodb.mem.mapped Preprocessing: - JSONPATH: ⛔️ON_FAIL: - MULTIPLIER: |
MongoDB | MongoDB: Memory: mapped with journal | The amount of mapped memory, including the memory used for journaling. |
DEPENDENT | mongodb.mem.mapped_with_journal Preprocessing: - JSONPATH: ⛔️ON_FAIL: - MULTIPLIER: |
MongoDB | MongoDB: Memory: resident | Amount of memory currently used by the database process. |
DEPENDENT | mongodb.mem.resident Preprocessing: - JSONPATH: - MULTIPLIER: |
MongoDB | MongoDB: Memory: virtual | Amount of virtual memory used by the mongod process. |
DEPENDENT | mongodb.mem.virtual Preprocessing: - JSONPATH: - MULTIPLIER: |
MongoDB | MongoDB {#DBNAME}: Objects, avg size | The average size of each document in bytes. |
DEPENDENT | mongodb.db.size["{#DBNAME}"] Preprocessing: - JSONPATH: |
MongoDB | MongoDB {#DBNAME}: Size, data | Total size of the data held in this database including the padding factor. |
DEPENDENT | mongodb.db.data_size["{#DBNAME}"] Preprocessing: - JSONPATH: |
MongoDB | MongoDB {#DBNAME}: Size, file | Total size of the data held in this database including the padding factor (only available with the mmapv1 storage engine). |
DEPENDENT | mongodb.db.file_size["{#DBNAME}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL: DISCARD_VALUE -> |
MongoDB | MongoDB {#DBNAME}: Size, index | Total size of all indexes created on this database. |
DEPENDENT | mongodb.db.index_size["{#DBNAME}"] Preprocessing: - JSONPATH: |
MongoDB | MongoDB {#DBNAME}: Size, storage | Total amount of space allocated to collections in this database for document storage. |
DEPENDENT | mongodb.db.storage_size["{#DBNAME}"] Preprocessing: - JSONPATH: |
MongoDB | MongoDB {#DBNAME}: Collections | Contains a count of the number of collections in that database. |
DEPENDENT | mongodb.db.collections["{#DBNAME}"] Preprocessing: - JSONPATH: |
MongoDB | MongoDB {#DBNAME}: Objects, count | Number of objects (documents) in the database across all collections. |
DEPENDENT | mongodb.db.objects["{#DBNAME}"] Preprocessing: - JSONPATH: |
MongoDB | MongoDB {#DBNAME}: Extents | Contains a count of the number of extents in the database across all collections. |
DEPENDENT | mongodb.db.extents["{#DBNAME}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
MongoDB | MongoDB {#DBNAME}.{#COLLECTION}: Size | The total size in bytes of the data in the collection plus the size of every index on the collection. |
DEPENDENT | mongodb.collection.size["{#DBNAME}","{#COLLECTION}"] Preprocessing: - JSONPATH: |
MongoDB | MongoDB {#DBNAME}.{#COLLECTION}: Objects, avg size | The size of the average object in the collection in bytes. |
DEPENDENT | mongodb.collection.avgobjsize["{#DBNAME}","{#COLLECTION}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
MongoDB | MongoDB {#DBNAME}.{#COLLECTION}: Objects, count | Total number of objects in the collection. |
DEPENDENT | mongodb.collection.count["{#DBNAME}","{#COLLECTION}"] Preprocessing: - JSONPATH: |
MongoDB | MongoDB {#DBNAME}.{#COLLECTION}: Capped: max number | Maximum number of documents that may be present in a capped collection. |
DEPENDENT | mongodb.collection.max_number["{#DBNAME}","{#COLLECTION}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL: DISCARD_VALUE -> |
MongoDB | MongoDB {#DBNAME}.{#COLLECTION}: Capped: max size | Maximum size of a capped collection in bytes. |
DEPENDENT | mongodb.collection.max_size["{#DBNAME}","{#COLLECTION}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL: DISCARD_VALUE -> |
MongoDB | MongoDB {#DBNAME}.{#COLLECTION}: Storage size | Total storage space allocated to this collection for document storage. |
DEPENDENT | mongodb.collection.storage_size["{#DBNAME}","{#COLLECTION}"] Preprocessing: - JSONPATH: |
MongoDB | MongoDB {#DBNAME}.{#COLLECTION}: Indexes | Total number of indices on the collection. |
DEPENDENT | mongodb.collection.nindexes["{#DBNAME}","{#COLLECTION}"] Preprocessing: - JSONPATH: |
MongoDB | MongoDB {#DBNAME}.{#COLLECTION}: Capped | Whether or not the collection is capped. |
DEPENDENT | mongodb.collection.capped["{#DBNAME}","{#COLLECTION}"] Preprocessing: - JSONPATH: - BOOL_TO_DECIMAL - DISCARD_UNCHANGED_HEARTBEAT: |
MongoDB | MongoDB {#DBNAME}.{#COLLECTION}: Operations: total, rate | The number of operations per second. |
DEPENDENT | mongodb.collection.ops.total.rate["{#DBNAME}","{#COLLECTION}"] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MongoDB | MongoDB {#DBNAME}.{#COLLECTION}: Read lock, rate | The number of operations per second. |
DEPENDENT | mongodb.collection.readlock.rate["{#DBNAME}","{#COLLECTION}"] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MongoDB | MongoDB {#DBNAME}.{#COLLECTION}: Write lock, rate | The number of operations per second. |
DEPENDENT | mongodb.collection.writelock.rate["{#DBNAME}","{#COLLECTION}"] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MongoDB | MongoDB {#DBNAME}.{#COLLECTION}: Operations: queries, rate | The number of operations per second. |
DEPENDENT | mongodb.collection.ops.queries.rate["{#DBNAME}","{#COLLECTION}"] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MongoDB | MongoDB {#DBNAME}.{#COLLECTION}: Operations: getmore, rate | The number of operations per second. |
DEPENDENT | mongodb.collection.ops.getmore.rate["{#DBNAME}","{#COLLECTION}"] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MongoDB | MongoDB {#DBNAME}.{#COLLECTION}: Operations: insert, rate | The number of operations per second. |
DEPENDENT | mongodb.collection.ops.insert.rate["{#DBNAME}","{#COLLECTION}"] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MongoDB | MongoDB {#DBNAME}.{#COLLECTION}: Operations: update, rate | The number of operations per second. |
DEPENDENT | mongodb.collection.ops.update.rate["{#DBNAME}","{#COLLECTION}"] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MongoDB | MongoDB {#DBNAME}.{#COLLECTION}: Operations: remove, rate | The number of operations per second. |
DEPENDENT | mongodb.collection.ops.remove.rate["{#DBNAME}","{#COLLECTION}"] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MongoDB | MongoDB {#DBNAME}.{#COLLECTION}: Operations: commands, rate | The number of operations per second. |
DEPENDENT | mongodb.collection.ops.commands.rate["{#DBNAME}","{#COLLECTION}"] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MongoDB | MongoDB {#DBNAME}.{#COLLECTION}: Operations: total, ms/s | Fraction of time (ms/s) the mongod has spent on operations. |
DEPENDENT | mongodb.collection.ops.total.ms["{#DBNAME}","{#COLLECTION}"] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MongoDB | MongoDB {#DBNAME}.{#COLLECTION}: Read lock, ms/s | Fraction of time (ms/s) the mongod has spent on operations. |
DEPENDENT | mongodb.collection.readlock.ms["{#DBNAME}","{#COLLECTION}"] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MongoDB | MongoDB {#DBNAME}.{#COLLECTION}: Write lock, ms/s | Fraction of time (ms/s) the mongod has spent on operations. |
DEPENDENT | mongodb.collection.writelock.ms["{#DBNAME}","{#COLLECTION}"] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MongoDB | MongoDB {#DBNAME}.{#COLLECTION}: Operations: queries, ms/s | Fraction of time (ms/s) the mongod has spent on operations. |
DEPENDENT | mongodb.collection.ops.queries.ms["{#DBNAME}","{#COLLECTION}"] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MongoDB | MongoDB {#DBNAME}.{#COLLECTION}: Operations: getmore, ms/s | Fraction of time (ms/s) the mongod has spent on operations. |
DEPENDENT | mongodb.collection.ops.getmore.ms["{#DBNAME}","{#COLLECTION}"] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MongoDB | MongoDB {#DBNAME}.{#COLLECTION}: Operations: insert, ms/s | Fraction of time (ms/s) the mongod has spent on operations. |
DEPENDENT | mongodb.collection.ops.insert.ms["{#DBNAME}","{#COLLECTION}"] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MongoDB | MongoDB {#DBNAME}.{#COLLECTION}: Operations: update, ms/s | Fraction of time (ms/s) the mongod has spent on operations. |
DEPENDENT | mongodb.collection.ops.update.ms["{#DBNAME}","{#COLLECTION}"] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MongoDB | MongoDB {#DBNAME}.{#COLLECTION}: Operations: remove, ms/s | Fraction of time (ms/s) the mongod has spent on operations. |
DEPENDENT | mongodb.collection.ops.remove.ms["{#DBNAME}","{#COLLECTION}"] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MongoDB | MongoDB {#DBNAME}.{#COLLECTION}: Operations: commands, ms/s | Fraction of time (ms/s) the mongod has spent on operations. |
DEPENDENT | mongodb.collection.ops.commands.ms["{#DBNAME}","{#COLLECTION}"] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MongoDB | MongoDB: Node state | An integer between 0 and 10 that represents the replica state of the current member. |
DEPENDENT | mongodb.rs.state[{#RS_NAME}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 1h |
MongoDB | MongoDB: Replication lag | Delay between a write operation on the primary and its copy to a secondary. |
DEPENDENT | mongodb.rs.lag[{#RS_NAME}] Preprocessing: - JSONPATH: |
MongoDB | MongoDB: Number of replicas | The number of replicated nodes in current ReplicaSet. |
DEPENDENT | mongodb.rs.total_nodes[{#RS_NAME}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
MongoDB | MongoDB: Number of unhealthy replicas | The number of replicated nodes with member health value = 0. |
DEPENDENT | mongodb.rs.unhealthy_count[{#RS_NAME}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
MongoDB | MongoDB: Unhealthy replicas | The replicated nodes in current ReplicaSet with member health value = 0. |
DEPENDENT | mongodb.rs.unhealthy[{#RS_NAME}] Preprocessing: - JSONPATH: - JAVASCRIPT: - DISCARD_UNCHANGED_HEARTBEAT: 1h |
MongoDB | MongoDB: Apply batches, rate | Number of batches applied across all databases per second. |
DEPENDENT | mongodb.rs.apply.batches.rate[{#RS_NAME}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MongoDB | MongoDB: Apply batches, ms/s | Fraction of time (ms/s) the mongod has spent applying operations from the oplog. |
DEPENDENT | mongodb.rs.apply.batches.ms.rate[{#RS_NAME}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MongoDB | MongoDB: Apply ops, rate | Number of oplog operations applied per second. |
DEPENDENT | mongodb.rs.apply.rate[{#RS_NAME}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MongoDB | MongoDB: Buffer | Number of operations in the oplog buffer. |
DEPENDENT | mongodb.rs.buffer.count[{#RS_NAME}] Preprocessing: - JSONPATH: |
MongoDB | MongoDB: Buffer, max size | Maximum size of the buffer. |
DEPENDENT | mongodb.rs.buffer.max_size[{#RS_NAME}] Preprocessing: - JSONPATH: |
MongoDB | MongoDB: Buffer, size | Current size of the contents of the oplog buffer. |
DEPENDENT | mongodb.rs.buffer.size[{#RS_NAME}] Preprocessing: - JSONPATH: |
MongoDB | MongoDB: Network bytes, rate | Amount of data read from the replication sync source per second. |
DEPENDENT | mongodb.rs.network.bytes.rate[{#RS_NAME}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MongoDB | MongoDB: Network getmores, rate | Number of getmore operations per second. |
DEPENDENT | mongodb.rs.network.getmores.rate[{#RS_NAME}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MongoDB | MongoDB: Network getmores, ms/s | Fraction of time (ms/s) required to collect data from getmore operations. |
DEPENDENT | mongodb.rs.network.getmores.ms.rate[{#RS_NAME}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MongoDB | MongoDB: Network ops, rate | Number of operations read from the replication source per second. |
DEPENDENT | mongodb.rs.network.ops.rate[{#RS_NAME}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MongoDB | MongoDB: Network readers created, rate | Number of oplog query processes created per second. |
DEPENDENT | mongodb.rs.network.readers.rate[{#RS_NAME}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
MongoDB | MongoDB {#RS_NAME}: Oplog time diff | Oplog window: difference between the first and last operation in the oplog. Only present if there are entries in the oplog. |
DEPENDENT | mongodb.rs.oplog.timediff[{#RS_NAME}] Preprocessing: - JSONPATH: |
MongoDB | MongoDB: Preload docs, rate | Number of documents loaded per second during the pre-fetch stage of replication. |
DEPENDENT | mongodb.rs.preload.docs.rate[{#RS_NAME}] Preprocessing: - JSONPATH: ⛔️ON_FAIL: DISCARD_VALUE -> - CHANGE_PER_SECOND |
MongoDB | MongoDB: Preload docs, ms/s | Fraction of time (ms/s) spent loading documents as part of the pre-fetch stage of replication. |
DEPENDENT | mongodb.rs.preload.docs.ms.rate[{#RS_NAME}] Preprocessing: - JSONPATH: ⛔️ON_FAIL: DISCARD_VALUE -> - CHANGE_PER_SECOND |
MongoDB | MongoDB: Preload indexes, rate | Number of index entries loaded by members before updating documents as part of the pre-fetch stage of replication. |
DEPENDENT | mongodb.rs.preload.indexes.rate[{#RS_NAME}] Preprocessing: - JSONPATH: ⛔️ON_FAIL: DISCARD_VALUE -> - CHANGE_PER_SECOND |
MongoDB | MongoDB: Preload indexes, ms/s | Fraction of time (ms/s) spent loading index entries as part of the pre-fetch stage of replication. |
DEPENDENT | mongodb.rs.preload.indexes.ms.rate[{#RS_NAME}] Preprocessing: - JSONPATH: ⛔️ON_FAIL: DISCARD_VALUE -> - CHANGE_PER_SECOND |
MongoDB | MongoDB: WiredTiger cache: bytes | Size of the data currently in cache. |
DEPENDENT | mongodb.wired_tiger.cache.bytes_in_cache[{#SINGLETON}] Preprocessing: - JSONPATH: |
MongoDB | MongoDB: WiredTiger cache: in-memory page splits | In-memory page splits. |
DEPENDENT | mongodb.wired_tiger.cache.splits[{#SINGLETON}] Preprocessing: - JSONPATH: |
MongoDB | MongoDB: WiredTiger cache: bytes, max | Maximum cache size. |
DEPENDENT | mongodb.wired_tiger.cache.maximum_bytes_configured[{#SINGLETON}] Preprocessing: - JSONPATH: |
MongoDB | MongoDB: WiredTiger cache: max page size at eviction | Maximum page size at eviction. |
DEPENDENT | mongodb.wired_tiger.cache.max_page_size_eviction[{#SINGLETON}] Preprocessing: - JSONPATH: |
MongoDB | MongoDB: WiredTiger cache: modified pages evicted | Number of pages that have been modified and evicted from the cache. |
DEPENDENT | mongodb.wired_tiger.cache.modified_pages_evicted[{#SINGLETON}] Preprocessing: - JSONPATH: |
MongoDB | MongoDB: WiredTiger cache: pages read into cache | Number of pages read into the cache. |
DEPENDENT | mongodb.wired_tiger.cache.pages_read[{#SINGLETON}] Preprocessing: - JSONPATH: |
MongoDB | MongoDB: WiredTiger cache: pages written from cache | Number of pages written from the cache. |
DEPENDENT | mongodb.wired_tiger.cache.pages_written[{#SINGLETON}] Preprocessing: - JSONPATH: |
MongoDB | MongoDB: WiredTiger cache: pages held in cache | Number of pages currently held in the cache. |
DEPENDENT | mongodb.wired_tiger.cache.pages_in_cache[{#SINGLETON}] Preprocessing: - JSONPATH: |
MongoDB | MongoDB: WiredTiger cache: pages evicted by application threads, rate | Number of pages evicted by application threads per second. |
DEPENDENT | mongodb.wired_tiger.cache.pages_evicted_threads.rate[{#SINGLETON}] Preprocessing: - JSONPATH: |
MongoDB | MongoDB: WiredTiger cache: tracked dirty bytes in the cache | Size of the dirty data in the cache. |
DEPENDENT | mongodb.wired_tiger.cache.tracked_dirty_bytes[{#SINGLETON}] Preprocessing: - JSONPATH: |
MongoDB | MongoDB: WiredTiger cache: unmodified pages evicted | Number of pages that were not modified and were evicted from the cache. |
DEPENDENT | mongodb.wired_tiger.cache.unmodified_pages_evicted[{#SINGLETON}] Preprocessing: - JSONPATH: |
MongoDB | MongoDB: WiredTiger concurrent transactions: read, available | Number of available read tickets (concurrent transactions) remaining. |
DEPENDENT | mongodb.wired_tiger.concurrent_transactions.read.available[{#SINGLETON}] Preprocessing: - JSONPATH: |
MongoDB | MongoDB: WiredTiger concurrent transactions: read, out | Number of read tickets (concurrent transactions) in use. |
DEPENDENT | mongodb.wired_tiger.concurrent_transactions.read.out[{#SINGLETON}] Preprocessing: - JSONPATH: |
MongoDB | MongoDB: WiredTiger concurrent transactions: read, total tickets | Total number of read tickets (concurrent transactions) available. |
DEPENDENT | mongodb.wired_tiger.concurrent_transactions.read.totalTickets[{#SINGLETON}] Preprocessing: - JSONPATH: |
MongoDB | MongoDB: WiredTiger concurrent transactions: write, available | Number of available write tickets (concurrent transactions) remaining. |
DEPENDENT | mongodb.wired_tiger.concurrent_transactions.write.available[{#SINGLETON}] Preprocessing: - JSONPATH: |
MongoDB | MongoDB: WiredTiger concurrent transactions: write, out | Number of write tickets (concurrent transactions) in use. |
DEPENDENT | mongodb.wired_tiger.concurrent_transactions.write.out[{#SINGLETON}] Preprocessing: - JSONPATH: |
MongoDB | MongoDB: WiredTiger concurrent transactions: write, total tickets | Total number of write tickets (concurrent transactions) available. |
DEPENDENT | mongodb.wired_tiger.concurrent_transactions.write.totalTickets[{#SINGLETON}] Preprocessing: - JSONPATH: |
Zabbix raw items | MongoDB: Get server status | Returns a database's state. |
ZABBIX_PASSIVE | mongodb.server.status["{$MONGODB.CONNSTRING}","{$MONGODB.USER}","{$MONGODB.PASSWORD}"] |
Zabbix raw items | MongoDB: Get Replica Set status | Returns the replica set status from the point of view of the member where the method is run. |
ZABBIX_PASSIVE | mongodb.rs.status["{$MONGODB.CONNSTRING}","{$MONGODB.USER}","{$MONGODB.PASSWORD}"] |
Zabbix raw items | MongoDB: Get oplog stats | Returns status of the replica set, using data polled from the oplog. |
ZABBIX_PASSIVE | mongodb.oplog.stats["{$MONGODB.CONNSTRING}","{$MONGODB.USER}","{$MONGODB.PASSWORD}"] |
Zabbix raw items | MongoDB: Get collections usage stats | Returns usage statistics for each collection. |
ZABBIX_PASSIVE | mongodb.collections.usage["{$MONGODB.CONNSTRING}","{$MONGODB.USER}","{$MONGODB.PASSWORD}"] |
Zabbix raw items | MongoDB {#DBNAME}: Get db stats {#DBNAME} | Returns statistics reflecting the database system's state. |
ZABBIX_PASSIVE | mongodb.db.stats["{$MONGODB.CONNSTRING}","{$MONGODB.USER}","{$MONGODB.PASSWORD}","{#DBNAME}"] |
Zabbix raw items | MongoDB {#DBNAME}.{#COLLECTION}: Get collection stats {#DBNAME}.{#COLLECTION} | Returns a variety of storage statistics for a given collection. |
ZABBIX_PASSIVE | mongodb.collection.stats["{$MONGODB.CONNSTRING}","{$MONGODB.USER}","{$MONGODB.PASSWORD}","{#DBNAME}","{#COLLECTION}"] |
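Many of the rate items above rely on the CHANGE_PER_SECOND preprocessing step, which turns a monotonically growing counter into a per-second rate. A minimal sketch of that calculation (names are illustrative):

```python
def change_per_second(prev_value, prev_ts, value, ts):
    """Zabbix CHANGE_PER_SECOND: (value - prev_value) / (ts - prev_ts).
    Returns None when there is no elapsed time to divide by."""
    dt = ts - prev_ts
    if dt <= 0:
        return None
    return (value - prev_value) / dt

# A counter that grew from 100 to 160 over 30 seconds -> 2 ops/s
print(change_per_second(100, 1000, 160, 1030))  # 2.0
```

Note that Zabbix discards the first collected value (there is nothing to diff against), so rate items start reporting from the second poll.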
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
MongoDB: Connection to MongoDB is unavailable | Connection to MongoDB instance is currently unavailable. |
last(/MongoDB node by Zabbix agent 2/mongodb.ping["{$MONGODB.CONNSTRING}","{$MONGODB.USER}","{$MONGODB.PASSWORD}"])=0 |
HIGH | |
MongoDB: Version has changed | MongoDB version has changed. Ack to close. |
last(/MongoDB node by Zabbix agent 2/mongodb.version,#1)<>last(/MongoDB node by Zabbix agent 2/mongodb.version,#2) and length(last(/MongoDB node by Zabbix agent 2/mongodb.version))>0 |
INFO | Manual close: YES |
MongoDB: has been restarted | Uptime is less than 10 minutes. |
last(/MongoDB node by Zabbix agent 2/mongodb.uptime)<10m |
INFO | Manual close: YES |
MongoDB: Failed to fetch info data | Zabbix has not received data for items for the last 10 minutes. |
nodata(/MongoDB node by Zabbix agent 2/mongodb.uptime,10m)=1 |
WARNING | Manual close: YES Depends on: - MongoDB: Connection to MongoDB is unavailable |
MongoDB: Total number of open connections is too high | Too few available connections. If MongoDB runs low on connections, it may not be able to handle incoming requests in a timely manner. |
min(/MongoDB node by Zabbix agent 2/mongodb.connections.current,5m)/(last(/MongoDB node by Zabbix agent 2/mongodb.connections.available)+last(/MongoDB node by Zabbix agent 2/mongodb.connections.current))*100>{$MONGODB.CONNS.PCT.USED.MAX.WARN} |
WARNING | |
MongoDB: Too many cursors opened by MongoDB for clients | - |
min(/MongoDB node by Zabbix agent 2/mongodb.cursor.open.total,5m)>{$MONGODB.CURSOR.OPEN.MAX.WARN} |
WARNING | |
MongoDB: Too many cursors are timing out | - |
min(/MongoDB node by Zabbix agent 2/mongodb.cursor.timed_out.rate,5m)>{$MONGODB.CURSOR.TIMEOUT.MAX.WARN} |
WARNING | |
MongoDB: Node in ReplicaSet changed the state | Node in ReplicaSet changed the state. Ack to close. |
last(/MongoDB node by Zabbix agent 2/mongodb.rs.state[{#RS_NAME}],#1)<>last(/MongoDB node by Zabbix agent 2/mongodb.rs.state[{#RS_NAME}],#2) |
WARNING | Manual close: YES |
MongoDB: Replication lag with primary is too high | - |
min(/MongoDB node by Zabbix agent 2/mongodb.rs.lag[{#RS_NAME}],5m)>{$MONGODB.REPL.LAG.MAX.WARN} |
WARNING | |
MongoDB: There are unhealthy replicas in ReplicaSet | - |
last(/MongoDB node by Zabbix agent 2/mongodb.rs.unhealthy_count[{#RS_NAME}])>0 and length(last(/MongoDB node by Zabbix agent 2/mongodb.rs.unhealthy[{#RS_NAME}]))>0 |
AVERAGE | |
MongoDB: Available WiredTiger read tickets is low | "Too few available read tickets. When the number of available read tickets remaining reaches zero, new read requests will be queued until a new read ticket is available." |
max(/MongoDB node by Zabbix agent 2/mongodb.wired_tiger.concurrent_transactions.read.available[{#SINGLETON}],5m)<{$MONGODB.WIRED_TIGER.TICKETS.AVAILABLE.MIN.WARN} |
WARNING | |
MongoDB: Available WiredTiger write tickets is low | "Too few available write tickets. When the number of available write tickets remaining reaches zero, new write requests will be queued until a new write ticket is available." |
max(/MongoDB node by Zabbix agent 2/mongodb.wired_tiger.concurrent_transactions.write.available[{#SINGLETON}],5m)<{$MONGODB.WIRED_TIGER.TICKETS.AVAILABLE.MIN.WARN} |
WARNING |
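The "Total number of open connections is too high" trigger computes the share of the connection limit in use from the `mongodb.connections.current` and `mongodb.connections.available` items. The same arithmetic as a standalone sketch:

```python
def connections_used_pct(current, available):
    """Percentage of the connection limit in use, as in the trigger
    expression: current / (available + current) * 100."""
    return current / (available + current) * 100

# 80 connections in use out of a limit of 100 -> 80%
print(connections_used_pct(80, 20))  # 80.0
```

The trigger fires when this percentage exceeds {$MONGODB.CONNS.PCT.USED.MAX.WARN} for 5 minutes.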
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template or ask for help with it at ZABBIX forums.
For Zabbix version: 6.2 and higher
The template to monitor InfluxDB by Zabbix that works without any external scripts.
Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
Template InfluxDB by HTTP
— collects metrics by HTTP agent from InfluxDB /metrics endpoint.
See:
This template was tested on:
See Zabbix template operation for basic instructions.
This template works with self-hosted InfluxDB instances. Internal service metrics are collected from the InfluxDB /metrics endpoint. For organizations discovery, the template needs to use authorization via an API token. See docs: https://docs.influxdata.com/influxdb/v2.0/security/tokens/
Don't forget to change the macros {$INFLUXDB.URL}, {$INFLUXDB.API.TOKEN}. Also, see the Macros section for a list of macros used to set trigger values. NOTE: Some metrics may not be collected depending on your InfluxDB instance version and configuration.
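Organizations discovery authenticates against the InfluxDB v2 API with an `Authorization: Token <token>` header. A minimal sketch of building such a request with the Python standard library (the URL, token value, and the `/api/v2/orgs` endpoint here are illustrative assumptions, not taken from the template itself):

```python
import urllib.request

INFLUXDB_URL = "http://localhost:8086"   # {$INFLUXDB.URL}
API_TOKEN = "my-secret-token"            # {$INFLUXDB.API.TOKEN}, placeholder

# The standard InfluxDB v2 endpoint for listing organizations
req = urllib.request.Request(
    f"{INFLUXDB_URL}/api/v2/orgs",
    headers={"Authorization": f"Token {API_TOKEN}"},
)
print(req.get_header("Authorization"))  # Token my-secret-token
# urllib.request.urlopen(req) would then return the JSON list of organizations
```

If the token lacks read access to orgs, the API returns 401/403 and the discovery rule collects nothing.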
No specific Zabbix configuration is required.
Name | Description | Default |
---|---|---|
{$INFLUXDB.API.TOKEN} | InfluxDB API Authorization Token |
`` |
{$INFLUXDB.ORG_NAME.MATCHES} | Filter of discoverable organizations |
.* |
{$INFLUXDB.ORG_NAME.NOT_MATCHES} | Filter to exclude discovered organizations |
CHANGE_IF_NEEDED |
{$INFLUXDB.REQ.FAIL.MAX.WARN} | Maximum number of query requests failures for trigger expression. |
2 |
{$INFLUXDB.TASK.RUN.FAIL.MAX.WARN} | Maximum number of task run failures for trigger expression. |
2 |
{$INFLUXDB.URL} | InfluxDB instance URL |
http://localhost:8086 |
There are no template links in this template.
Name | Description | Type | Key and additional info |
---|---|---|---|
Organizations discovery | Discovery of organizations metrics. |
HTTP_AGENT | influxdb.orgs.discovery Preprocessing: - JAVASCRIPT: - DISCARD_UNCHANGED_HEARTBEAT: Filter: AND - {#ORG_NAME} NOT_MATCHES_REGEX - {#ORG_NAME} MATCHES_REGEX |
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
InfluxDB | InfluxDB: Instance status | Get the health of an instance. |
HTTP_AGENT | influx.healthcheck Preprocessing: - CHECK_NOT_SUPPORTED ⛔️ON_FAIL: - JAVASCRIPT: - DISCARD_UNCHANGED_HEARTBEAT: |
InfluxDB | InfluxDB: Boltdb reads, rate | Total number of boltdb reads per second. |
DEPENDENT | influxdb.boltdb_reads.rate Preprocessing: - JSONPATH: ⛔️ON_FAIL: DISCARD_VALUE -> - CHANGE_PER_SECOND |
InfluxDB | InfluxDB: Boltdb writes, rate | Total number of boltdb writes per second. |
DEPENDENT | influxdb.boltdb_writes.rate Preprocessing: - JSONPATH: ⛔️ON_FAIL: DISCARD_VALUE -> - CHANGE_PER_SECOND |
InfluxDB | InfluxDB: Buckets, total | Number of total buckets on the server. |
DEPENDENT | influxdb.buckets.total Preprocessing: - JSONPATH: ⛔️ON_FAIL: - DISCARD_UNCHANGED_HEARTBEAT: |
InfluxDB | InfluxDB: Dashboards, total | Number of total dashboards on the server. |
DEPENDENT | influxdb.dashboards.total Preprocessing: - JSONPATH: ⛔️ON_FAIL: - DISCARD_UNCHANGED_HEARTBEAT: |
InfluxDB | InfluxDB: Organizations, total | Number of total organizations on the server. |
DEPENDENT | influxdb.organizations.total Preprocessing: - JSONPATH: ⛔️ON_FAIL: - DISCARD_UNCHANGED_HEARTBEAT: |
InfluxDB | InfluxDB: Scrapers, total | Number of total scrapers on the server. |
DEPENDENT | influxdb.scrapers.total Preprocessing: - JSONPATH: ⛔️ON_FAIL: - DISCARD_UNCHANGED_HEARTBEAT: |
InfluxDB | InfluxDB: Telegraf plugins, total | Number of individual telegraf plugins configured. |
DEPENDENT | influxdb.telegraf_plugins.total Preprocessing: - JSONPATH: ⛔️ON_FAIL: DISCARD_VALUE -> - DISCARD_UNCHANGED_HEARTBEAT: |
InfluxDB | InfluxDB: Telegrafs, total | Number of total telegraf configurations on the server. |
DEPENDENT | influxdb.telegrafs.total Preprocessing: - JSONPATH: ⛔️ON_FAIL: - DISCARD_UNCHANGED_HEARTBEAT: |
InfluxDB | InfluxDB: Tokens, total | Number of total tokens on the server. |
DEPENDENT | influxdb.tokens.total Preprocessing: - JSONPATH: ⛔️ON_FAIL: - DISCARD_UNCHANGED_HEARTBEAT: |
InfluxDB | InfluxDB: Users, total | Number of total users on the server. |
DEPENDENT | influxdb.users.total Preprocessing: - JSONPATH: ⛔️ON_FAIL: - DISCARD_UNCHANGED_HEARTBEAT: |
InfluxDB | InfluxDB: Version | Version of the InfluxDB instance. |
DEPENDENT | influxdb.version Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
InfluxDB | InfluxDB: Uptime | InfluxDB process uptime in seconds. |
DEPENDENT | influxdb.uptime Preprocessing: - JSONPATH: |
InfluxDB | InfluxDB: Workers currently running | Total number of workers currently running tasks. |
DEPENDENT | influxdb.task_executor_runs_active.total Preprocessing: - JSONPATH: ⛔️ON_FAIL: DISCARD_VALUE -> |
InfluxDB | InfluxDB: Workers busy, pct | Percent of total available workers that are currently busy. |
DEPENDENT | influxdb.task_executor_workers_busy.pct Preprocessing: - JSONPATH: ⛔️ON_FAIL: DISCARD_VALUE -> |
InfluxDB | InfluxDB: Task runs failed, rate | Total number of failure runs across all tasks. |
DEPENDENT | influxdb.task_executor_complete.failed.rate Preprocessing: - JSONPATH: ⛔️ON_FAIL: - CHANGE_PER_SECOND |
InfluxDB | InfluxDB: Task runs successful, rate | Total number of successfully completed runs across all tasks. |
DEPENDENT | influxdb.task_executor_complete.successful.rate Preprocessing: - JSONPATH: ⛔️ON_FAIL: - CHANGE_PER_SECOND |
InfluxDB | InfluxDB: [{#ORG_NAME}] Query requests bytes, success | Count of bytes received with status 200 per second. |
DEPENDENT | influxdb.org.query_request_bytes.success.rate["{#ORG_NAME}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL: DISCARD_VALUE -> - CHANGE_PER_SECOND |
InfluxDB | InfluxDB: [{#ORG_NAME}] Query requests bytes, failed | Count of bytes received with status not 200 per second. |
DEPENDENT | influxdb.org.query_request_bytes.failed.rate["{#ORG_NAME}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL: DISCARD_VALUE -> - CHANGE_PER_SECOND |
InfluxDB | InfluxDB: [{#ORG_NAME}] Query requests, failed | Total number of query requests with status not 200 per second. |
DEPENDENT | influxdb.org.query_request.failed.rate["{#ORG_NAME}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL: - CHANGE_PER_SECOND |
InfluxDB | InfluxDB: [{#ORG_NAME}] Query requests, success | Total number of query requests with status 200 per second. |
DEPENDENT | influxdb.org.query_request.success.rate["{#ORG_NAME}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL: - CHANGE_PER_SECOND |
InfluxDB | InfluxDB: [{#ORG_NAME}] Query response bytes, success | Count of bytes returned with status 200 per second. |
DEPENDENT | influxdb.org.http_query_response_bytes.success.rate["{#ORG_NAME}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL: - CHANGE_PER_SECOND |
InfluxDB | InfluxDB: [{#ORG_NAME}] Query response bytes, failed | Count of bytes returned with status not 200 per second. |
DEPENDENT | influxdb.org.http_query_response_bytes.failed.rate["{#ORG_NAME}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL: - CHANGE_PER_SECOND |
Zabbix raw items | InfluxDB: Get instance metrics | - |
HTTP_AGENT | influx.get_metrics Preprocessing: - CHECK_NOT_SUPPORTED ⛔️ON_FAIL: DISCARD_VALUE -> - PROMETHEUS_TO_JSON |
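The master item above applies PROMETHEUS_TO_JSON, which converts the Prometheus exposition text fetched from /metrics into a JSON array that the dependent items then query with JSONPath. A simplified sketch of the idea; the real Zabbix step emits additional fields (such as the raw line and metric type) and handles comments and histograms more carefully, and the metric names below are made up for illustration:

```python
import json
import re

# One line of Prometheus exposition format:
#   metric_name{label="value",...} 42
LINE_RE = re.compile(r'^(\w+)(?:\{(.*)\})?\s+(\S+)$')

def prometheus_to_json(text: str) -> str:
    """Very rough sketch of Zabbix's PROMETHEUS_TO_JSON step:
    each sample becomes {"name": ..., "value": ..., "labels": {...}}."""
    out = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip HELP/TYPE comments
            continue
        m = LINE_RE.match(line)
        if not m:
            continue
        name, raw_labels, value = m.groups()
        labels = dict(re.findall(r'(\w+)="([^"]*)"', raw_labels or ""))
        out.append({"name": name, "value": value, "labels": labels})
    return json.dumps(out)

sample = '''# HELP boltdb_reads_total Total number of boltdb reads
boltdb_reads_total 123
task_executor_total_runs_complete{status="failed"} 4
'''
print(prometheus_to_json(sample))
```

The dependent items then select individual samples from this array with JSONPath expressions, which is why a single HTTP request can feed the whole item set.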
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
InfluxDB: Health check failed | The InfluxDB instance is not available or unhealthy. |
last(/InfluxDB by HTTP/influx.healthcheck)=0 |
HIGH | |
InfluxDB: Version has changed | InfluxDB version has changed. Ack to close. |
last(/InfluxDB by HTTP/influxdb.version,#1)<>last(/InfluxDB by HTTP/influxdb.version,#2) and length(last(/InfluxDB by HTTP/influxdb.version))>0 |
INFO | Manual close: YES |
InfluxDB: has been restarted | Uptime is less than 10 minutes. |
last(/InfluxDB by HTTP/influxdb.uptime)<10m |
INFO | Manual close: YES |
InfluxDB: Too many tasks failure runs | The number of failure runs completed across all tasks is too high. |
min(/InfluxDB by HTTP/influxdb.task_executor_complete.failed.rate,5m)>{$INFLUXDB.TASK.RUN.FAIL.MAX.WARN} |
WARNING | |
InfluxDB: [{#ORG_NAME}]: Too many request failures | Too many query requests failed. |
min(/InfluxDB by HTTP/influxdb.org.query_request.failed.rate["{#ORG_NAME}"],5m)>{$INFLUXDB.REQ.FAIL.MAX.WARN} |
WARNING |
For Zabbix version: 6.2 and higher
Official JMX Template for Apache Ignite computing platform.
This template is based on the original template developed by Igor Akkuratov, Senior Engineer at GridGain Systems and Apache Ignite Contributor.
This template was tested on:
See Zabbix template operation for basic instructions.
This template works with standalone and cluster instances. Metrics are collected by JMX. All metrics are discoverable.
Use the JVM option
-DIGNITE_MBEAN_APPEND_CLASS_LOADER_ID=false
to exclude one level with the Classloader name. You can configure which Cache and Data Region metrics to collect using the official guide. No specific Zabbix configuration is required.
Name | Description | Default | |||
---|---|---|---|---|---|
{$IGNITE.CHECKPOINT.PUSED.MAX.HIGH} | The maximum percent of checkpoint buffer utilization for high trigger expression. |
80 |
|||
{$IGNITE.CHECKPOINT.PUSED.MAX.WARN} | The maximum percent of checkpoint buffer utilization for warning trigger expression. |
66 |
|||
{$IGNITE.DATA.REGION.PUSED.MAX.HIGH} | The maximum percent of data region utilization for high trigger expression. |
90 |
|||
{$IGNITE.DATA.REGION.PUSED.MAX.WARN} | The maximum percent of data region utilization for warning trigger expression. |
80 |
|||
{$IGNITE.JOBS.QUEUE.MAX.WARN} | The maximum number of queued jobs for trigger expression. |
10 |
|||
{$IGNITE.LLD.FILTER.CACHE.MATCHES} | Filter of discoverable cache groups. |
.* |
|||
{$IGNITE.LLD.FILTER.CACHE.NOT_MATCHES} | Filter to exclude discovered cache groups. |
CHANGE_IF_NEEDED |
|||
{$IGNITE.LLD.FILTER.DATA.REGION.MATCHES} | Filter of discoverable data regions. |
.* |
|||
{$IGNITE.LLD.FILTER.DATA.REGION.NOT_MATCHES} | Filter to exclude discovered data regions. |
`^(sysMemPlc|TxLog)$` |
|||
{$IGNITE.LLD.FILTER.THREAD.POOL.MATCHES} | Filter of discoverable thread pools. |
.* |
|||
{$IGNITE.LLD.FILTER.THREAD.POOL.NOT_MATCHES} | Filter to exclude discovered thread pools. |
`^(GridCallbackExecutor|GridRebalanceStripedExecutor|GridDataStreamExecutor|StripedExecutor)$` |
|||
{$IGNITE.PASSWORD} | - |
<secret> |
|||
{$IGNITE.PME.DURATION.MAX.HIGH} | The maximum PME duration in ms for high trigger expression. |
60000 |
|||
{$IGNITE.PME.DURATION.MAX.WARN} | The maximum PME duration in ms for warning trigger expression. |
10000 |
|||
{$IGNITE.THREAD.QUEUE.MAX.WARN} | Threshold for thread pool queue size. Can be used with thread pool name as context. |
1000 |
|||
{$IGNITE.THREADS.COUNT.MAX.WARN} | The maximum number of running threads for trigger expression. |
1000 |
|||
{$IGNITE.USER} | - |
zabbix |
There are no template links in this template.
Name | Description | Type | Key and additional info |
---|---|---|---|
Cache groups | - |
JMX | jmx.discovery[beans,"org.apache:group=\"Cache groups\",*"] Preprocessing: - JAVASCRIPT: - DISCARD_UNCHANGED_HEARTBEAT: Filter: AND - {#JMXNAME} MATCHES_REGEX - {#JMXNAME} NOT_MATCHES_REGEX |
Cache metrics | - |
JMX | jmx.discovery[beans,"org.apache:name=\"org.apache.ignite.internal.processors.cache.CacheLocalMetricsMXBeanImpl\",*"] Preprocessing: - JAVASCRIPT: - DISCARD_UNCHANGED_HEARTBEAT: Filter: AND - {#JMXGROUP} MATCHES_REGEX - {#JMXGROUP} NOT_MATCHES_REGEX |
Cluster metrics | - |
JMX | jmx.discovery[beans,"org.apache:group=Kernal,name=ClusterMetricsMXBeanImpl,*"] Preprocessing: - JAVASCRIPT: |
Data region metrics | - |
JMX | jmx.discovery[beans,"org.apache:group=DataRegionMetrics,*"] Preprocessing: - JAVASCRIPT: - DISCARD_UNCHANGED_HEARTBEAT: Filter: AND - {#JMXNAME} MATCHES_REGEX - {#JMXNAME} NOT_MATCHES_REGEX |
Ignite kernal metrics | - |
JMX | jmx.discovery[beans,"org.apache:group=Kernal,name=IgniteKernal,*"] Preprocessing: - JAVASCRIPT: |
Local node metrics | - |
JMX | jmx.discovery[beans,"org.apache:group=Kernal,name=ClusterLocalNodeMetricsMXBeanImpl,*"] Preprocessing: - JAVASCRIPT: |
TCP Communication SPI metrics | - |
JMX | jmx.discovery[beans,"org.apache:group=SPIs,name=TcpCommunicationSpi,*"] Preprocessing: - JAVASCRIPT: |
TCP discovery SPI | - |
JMX | jmx.discovery[beans,"org.apache:group=SPIs,name=TcpDiscoverySpi,*"] Preprocessing: - JAVASCRIPT: |
Thread pool metrics | - |
JMX | jmx.discovery[beans,"org.apache:group=\"Thread Pools\",*"] Preprocessing: - JAVASCRIPT: - DISCARD_UNCHANGED_HEARTBEAT: Filter: AND - {#JMXNAME} MATCHES_REGEX - {#JMXNAME} NOT_MATCHES_REGEX |
Transaction metrics | - |
JMX | jmx.discovery[beans,"org.apache:group=TransactionMetrics,name=TransactionMetricsMxBeanImpl,*"] Preprocessing: - JAVASCRIPT: |
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
Ignite | Ignite [{#JMXIGNITEINSTANCENAME}]: Uptime | Uptime of Ignite instance. |
JMX | jmx["{#JMXOBJ}",UpTime] Preprocessing: - MULTIPLIER: |
Ignite | Ignite [{#JMXIGNITEINSTANCENAME}]: Version | Version of Ignite instance. |
JMX | jmx["{#JMXOBJ}",FullVersion] Preprocessing: - REGEX: - DISCARD_UNCHANGED_HEARTBEAT: |
Ignite | Ignite [{#JMXIGNITEINSTANCENAME}]: Local node ID | Unique identifier for this node within grid. |
JMX | jmx["{#JMXOBJ}",LocalNodeId] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
Ignite | Ignite [{#JMXIGNITEINSTANCENAME}]: Nodes, Baseline | Total baseline nodes that are registered in the baseline topology. |
JMX | jmx["{#JMXOBJ}",TotalBaselineNodes] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
Ignite | Ignite [{#JMXIGNITEINSTANCENAME}]: Nodes, Active baseline | The number of nodes that are currently active in the baseline topology. |
JMX | jmx["{#JMXOBJ}",ActiveBaselineNodes] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
Ignite | Ignite [{#JMXIGNITEINSTANCENAME}]: Nodes, Client | The number of client nodes in the cluster. |
JMX | jmx["{#JMXOBJ}",TotalClientNodes] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
Ignite | Ignite [{#JMXIGNITEINSTANCENAME}]: Nodes, total | Total number of nodes. |
JMX | jmx["{#JMXOBJ}",TotalNodes] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
Ignite | Ignite [{#JMXIGNITEINSTANCENAME}]: Nodes, Server | The number of server nodes in the cluster. |
JMX | jmx["{#JMXOBJ}",TotalServerNodes] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
Ignite | Ignite [{#JMXIGNITEINSTANCENAME}]: Jobs cancelled, current | Number of cancelled jobs that are still running. |
JMX | jmx["{#JMXOBJ}",CurrentCancelledJobs] |
Ignite | Ignite [{#JMXIGNITEINSTANCENAME}]: Jobs rejected, current | Number of jobs rejected after more recent collision resolution operation. |
JMX | jmx["{#JMXOBJ}",CurrentRejectedJobs] |
Ignite | Ignite [{#JMXIGNITEINSTANCENAME}]: Jobs waiting, current | Number of queued jobs currently waiting to be executed. |
JMX | jmx["{#JMXOBJ}",CurrentWaitingJobs] |
Ignite | Ignite [{#JMXIGNITEINSTANCENAME}]: Jobs active, current | Number of currently active jobs concurrently executing on the node. |
JMX | jmx["{#JMXOBJ}",CurrentActiveJobs] |
Ignite | Ignite [{#JMXIGNITEINSTANCENAME}]: Jobs executed, rate | Total number of jobs handled by the node per second. |
JMX | jmx["{#JMXOBJ}",TotalExecutedJobs] Preprocessing: - CHANGE_PER_SECOND |
Ignite | Ignite [{#JMXIGNITEINSTANCENAME}]: Jobs cancelled, rate | Total number of jobs cancelled by the node per second. |
JMX | jmx["{#JMXOBJ}",TotalCancelledJobs] Preprocessing: - CHANGE_PER_SECOND |
Ignite | Ignite [{#JMXIGNITEINSTANCENAME}]: Jobs rejects, rate | Total number of jobs this node rejects during collision resolution operations since node startup per second. |
JMX | jmx["{#JMXOBJ}",TotalRejectedJobs] Preprocessing: - CHANGE_PER_SECOND |
Ignite | Ignite [{#JMXIGNITEINSTANCENAME}]: PME duration, current | Current PME duration in milliseconds. |
JMX | jmx["{#JMXOBJ}",CurrentPmeDuration] |
Ignite | Ignite [{#JMXIGNITEINSTANCENAME}]: Threads count, current | Current number of live threads. |
JMX | jmx["{#JMXOBJ}",CurrentThreadCount] |
Ignite | Ignite [{#JMXIGNITEINSTANCENAME}]: Heap memory used | Current heap size that is used for object allocation. |
JMX | jmx["{#JMXOBJ}",HeapMemoryUsed] |
Ignite | Ignite [{#JMXIGNITEINSTANCENAME}]: Coordinator | Current coordinator UUID. |
JMX | jmx["{#JMXOBJ}",Coordinator] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
Ignite | Ignite [{#JMXIGNITEINSTANCENAME}]: Nodes left | Nodes left count. |
JMX | jmx["{#JMXOBJ}",NodesLeft] |
Ignite | Ignite [{#JMXIGNITEINSTANCENAME}]: Nodes joined | Nodes join count. |
JMX | jmx["{#JMXOBJ}",NodesJoined] |
Ignite | Ignite [{#JMXIGNITEINSTANCENAME}]: Nodes failed | Nodes failed count. |
JMX | jmx["{#JMXOBJ}",NodesFailed] |
Ignite | Ignite [{#JMXIGNITEINSTANCENAME}]: Discovery message worker queue | Message worker queue current size. |
JMX | jmx["{#JMXOBJ}",MessageWorkerQueueSize] |
Ignite | Ignite [{#JMXIGNITEINSTANCENAME}]: Discovery reconnect, rate | Number of times node tries to (re)establish connection to another node per second. |
JMX | jmx["{#JMXOBJ}",ReconnectCount] Preprocessing: - CHANGE_PER_SECOND |
Ignite | Ignite [{#JMXIGNITEINSTANCENAME}]: Discovery messages processed, rate | The number of messages processed per second. |
JMX | jmx["{#JMXOBJ}",TotalProcessedMessages] Preprocessing: - CHANGE_PER_SECOND |
Ignite | Ignite [{#JMXIGNITEINSTANCENAME}]: Discovery messages received, rate | The number of messages received per second. |
JMX | jmx["{#JMXOBJ}",TotalReceivedMessages] Preprocessing: - CHANGE_PER_SECOND |
Ignite | Ignite [{#JMXIGNITEINSTANCENAME}]: Communication outbound messages queue | Outbound messages queue size. |
JMX | jmx["{#JMXOBJ}",OutboundMessagesQueueSize] |
Ignite | Ignite [{#JMXIGNITEINSTANCENAME}]: Communication messages received, rate | The number of messages received per second. |
JMX | jmx["{#JMXOBJ}",ReceivedMessagesCount] Preprocessing: - CHANGE_PER_SECOND |
Ignite | Ignite [{#JMXIGNITEINSTANCENAME}]: Communication messages sent, rate | The number of messages sent per second. |
JMX | jmx["{#JMXOBJ}",SentMessagesCount] Preprocessing: - CHANGE_PER_SECOND |
Ignite | Ignite [{#JMXIGNITEINSTANCENAME}]: Locked keys | The number of keys locked on the node. |
JMX | jmx["{#JMXOBJ}",LockedKeysNumber] |
Ignite | Ignite [{#JMXIGNITEINSTANCENAME}]: Transactions owner, current | The number of active transactions for which this node is the initiator. |
JMX | jmx["{#JMXOBJ}",OwnerTransactionsNumber] |
Ignite | Ignite [{#JMXIGNITEINSTANCENAME}]: Transactions holding lock, current | The number of active transactions holding at least one key lock. |
JMX | jmx["{#JMXOBJ}",TransactionsHoldingLockNumber] |
Ignite | Ignite [{#JMXIGNITEINSTANCENAME}]: Transactions rolledback, rate | The number of transactions which were rolled back per second. |
JMX | jmx["{#JMXOBJ}",TransactionsRolledBackNumber] |
Ignite | Ignite [{#JMXIGNITEINSTANCENAME}]: Transactions committed, rate | The number of transactions which were committed per second. |
JMX | jmx["{#JMXOBJ}",TransactionsCommittedNumber] |
Ignite | Cache group [{#JMXGROUP}]: Cache gets, rate | The number of gets to the cache per second. |
JMX | jmx["{#JMXOBJ}",CacheGets] Preprocessing: - CHANGE_PER_SECOND |
Ignite | Cache group [{#JMXGROUP}]: Cache puts, rate | The number of puts to the cache per second. |
JMX | jmx["{#JMXOBJ}",CachePuts] Preprocessing: - CHANGE_PER_SECOND |
Ignite | Cache group [{#JMXGROUP}]: Cache removals, rate | The number of removals from the cache per second. |
JMX | jmx["{#JMXOBJ}",CacheRemovals] Preprocessing: - CHANGE_PER_SECOND |
Ignite | Cache group [{#JMXGROUP}]: Cache hits, pct | Percentage of successful hits. |
JMX | jmx["{#JMXOBJ}",CacheHitPercentage] |
Ignite | Cache group [{#JMXGROUP}]: Cache misses, pct | Percentage of accesses that failed to find anything. |
JMX | jmx["{#JMXOBJ}",CacheMissPercentage] |
Ignite | Cache group [{#JMXGROUP}]: Cache transaction commits, rate | The number of transaction commits per second. |
JMX | jmx["{#JMXOBJ}",CacheTxCommits] Preprocessing: - CHANGE_PER_SECOND |
Ignite | Cache group [{#JMXGROUP}]: Cache transaction rollbacks, rate | The number of transaction rollbacks per second. |
JMX | jmx["{#JMXOBJ}",CacheTxRollbacks] Preprocessing: - CHANGE_PER_SECOND |
Ignite | Cache group [{#JMXGROUP}]: Cache size | The number of non-null values in the cache as a long value. |
JMX | jmx["{#JMXOBJ}",CacheSize] |
Ignite | Cache group [{#JMXGROUP}]: Cache heap entries | The number of entries in heap memory. |
JMX | jmx["{#JMXOBJ}",HeapEntriesCount] Preprocessing: - CHANGE_PER_SECOND |
Ignite | Data region {#JMXNAME}: Allocation, rate | Allocation rate (pages per second) averaged across rateTimeInterval. |
JMX | jmx["{#JMXOBJ}",AllocationRate] |
Ignite | Data region {#JMXNAME}: Allocated, bytes | Total size of memory allocated in bytes. |
JMX | jmx["{#JMXOBJ}",TotalAllocatedSize] |
Ignite | Data region {#JMXNAME}: Dirty pages | Number of pages in memory not yet synchronized with persistent storage. |
JMX | jmx["{#JMXOBJ}",DirtyPages] |
Ignite | Data region {#JMXNAME}: Eviction, rate | Eviction rate (pages per second). |
JMX | jmx["{#JMXOBJ}",EvictionRate] |
Ignite | Data region {#JMXNAME}: Size, max | Maximum memory region size defined by its data region. |
JMX | jmx["{#JMXOBJ}",MaxSize] |
Ignite | Data region {#JMXNAME}: Offheap size | Offheap size in bytes. |
JMX | jmx["{#JMXOBJ}",OffHeapSize] |
Ignite | Data region {#JMXNAME}: Offheap used size | Total used offheap size in bytes. |
JMX | jmx["{#JMXOBJ}",OffheapUsedSize] |
Ignite | Data region {#JMXNAME}: Pages fill factor | The percentage of the used space. |
JMX | jmx["{#JMXOBJ}",PagesFillFactor] |
Ignite | Data region {#JMXNAME}: Pages replace, rate | Rate at which pages in memory are replaced with pages from persistent storage (pages per second). |
JMX | jmx["{#JMXOBJ}",PagesReplaceRate] |
Ignite | Data region {#JMXNAME}: Used checkpoint buffer size | Used checkpoint buffer size in bytes. |
JMX | jmx["{#JMXOBJ}",UsedCheckpointBufferSize] |
Ignite | Data region {#JMXNAME}: Checkpoint buffer size | Total size in bytes for checkpoint buffer. |
JMX | jmx["{#JMXOBJ}",CheckpointBufferSize] |
Ignite | Cache group [{#JMXNAME}]: Backups | Count of backups configured for cache group. |
JMX | jmx["{#JMXOBJ}",Backups] |
Ignite | Cache group [{#JMXNAME}]: Partitions | Count of partitions for cache group. |
JMX | jmx["{#JMXOBJ}",Partitions] |
Ignite | Cache group [{#JMXNAME}]: Caches | List of caches. |
JMX | jmx["{#JMXOBJ}",Caches] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
Ignite | Cache group [{#JMXNAME}]: Local node partitions, moving | Count of partitions with state MOVING for this cache group located on this node. |
JMX | jmx["{#JMXOBJ}",LocalNodeMovingPartitionsCount] |
Ignite | Cache group [{#JMXNAME}]: Local node partitions, renting | Count of partitions with state RENTING for this cache group located on this node. |
JMX | jmx["{#JMXOBJ}",LocalNodeRentingPartitionsCount] |
Ignite | Cache group [{#JMXNAME}]: Local node entries, renting | Count of entries remains to evict in RENTING partitions located on this node for this cache group. |
JMX | jmx["{#JMXOBJ}",LocalNodeRentingEntriesCount] |
Ignite | Cache group [{#JMXNAME}]: Local node partitions, owning | Count of partitions with state OWNING for this cache group located on this node. |
JMX | jmx["{#JMXOBJ}",LocalNodeOwningPartitionsCount] |
Ignite | Cache group [{#JMXNAME}]: Partition copies, min | Minimum number of partition copies for all partitions of this cache group. |
JMX | jmx["{#JMXOBJ}",MinimumNumberOfPartitionCopies] |
Ignite | Cache group [{#JMXNAME}]: Partition copies, max | Maximum number of partition copies for all partitions of this cache group. |
JMX | jmx["{#JMXOBJ}",MaximumNumberOfPartitionCopies] |
Ignite | Thread pool [{#JMXNAME}]: Queue size | Current size of the execution queue. |
JMX | jmx["{#JMXOBJ}",QueueSize] |
Ignite | Thread pool [{#JMXNAME}]: Pool size | Current number of threads in the pool. |
JMX | jmx["{#JMXOBJ}",PoolSize] |
Ignite | Thread pool [{#JMXNAME}]: Pool size, max | The maximum allowed number of threads. |
JMX | jmx["{#JMXOBJ}",MaximumPoolSize] |
Ignite | Thread pool [{#JMXNAME}]: Pool size, core | The core number of threads. |
JMX | jmx["{#JMXOBJ}",CorePoolSize] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Ignite [{#JMXIGNITEINSTANCENAME}]: has been restarted | Uptime is less than 10 minutes. |
last(/Ignite by JMX/jmx["{#JMXOBJ}",UpTime])<10m |
INFO | Manual close: YES |
Ignite [{#JMXIGNITEINSTANCENAME}]: Failed to fetch info data | Zabbix has not received data for items for the last 10 minutes. |
nodata(/Ignite by JMX/jmx["{#JMXOBJ}",UpTime],10m)=1 |
WARNING | Manual close: YES |
Ignite [{#JMXIGNITEINSTANCENAME}]: Version has changed | Ignite [{#JMXIGNITEINSTANCENAME}] version has changed. Ack to close. |
last(/Ignite by JMX/jmx["{#JMXOBJ}",FullVersion],#1)<>last(/Ignite by JMX/jmx["{#JMXOBJ}",FullVersion],#2) and length(last(/Ignite by JMX/jmx["{#JMXOBJ}",FullVersion]))>0 |
INFO | Manual close: YES |
Ignite [{#JMXIGNITEINSTANCENAME}]: Server node left the topology | One or more server nodes left the topology. Ack to close. |
change(/Ignite by JMX/jmx["{#JMXOBJ}",TotalServerNodes])<0 |
WARNING | Manual close: YES |
Ignite [{#JMXIGNITEINSTANCENAME}]: Server node added to the topology | One or more server nodes were added to the topology. Ack to close. |
change(/Ignite by JMX/jmx["{#JMXOBJ}",TotalServerNodes])>0 |
INFO | Manual close: YES |
Ignite [{#JMXIGNITEINSTANCENAME}]: There are nodes not in the topology | One or more server nodes are not in the topology. Ack to close. |
last(/Ignite by JMX/jmx["{#JMXOBJ}",TotalServerNodes])>last(/Ignite by JMX/jmx["{#JMXOBJ}",TotalBaselineNodes]) |
INFO | Manual close: YES |
Ignite [{#JMXIGNITEINSTANCENAME}]: Number of queued jobs is too high | Number of queued jobs is over {$IGNITE.JOBS.QUEUE.MAX.WARN}. |
min(/Ignite by JMX/jmx["{#JMXOBJ}",CurrentWaitingJobs],15m) > {$IGNITE.JOBS.QUEUE.MAX.WARN} |
WARNING | |
Ignite [{#JMXIGNITEINSTANCENAME}]: PME duration is too long | PME duration is over {$IGNITE.PME.DURATION.MAX.WARN}ms. |
min(/Ignite by JMX/jmx["{#JMXOBJ}",CurrentPmeDuration],5m) > {$IGNITE.PME.DURATION.MAX.WARN} |
WARNING | Depends on: - Ignite [{#JMXIGNITEINSTANCENAME}]: PME duration is too long |
Ignite [{#JMXIGNITEINSTANCENAME}]: PME duration is too long | PME duration is over {$IGNITE.PME.DURATION.MAX.HIGH}ms. Looks like PME is hung. |
min(/Ignite by JMX/jmx["{#JMXOBJ}",CurrentPmeDuration],5m) > {$IGNITE.PME.DURATION.MAX.HIGH} |
HIGH | |
Ignite [{#JMXIGNITEINSTANCENAME}]: Number of running threads is too high | Number of running threads is over {$IGNITE.THREADS.COUNT.MAX.WARN}. |
min(/Ignite by JMX/jmx["{#JMXOBJ}",CurrentThreadCount],15m) > {$IGNITE.THREADS.COUNT.MAX.WARN} |
WARNING | Depends on: - Ignite [{#JMXIGNITEINSTANCENAME}]: PME duration is too long |
Ignite [{#JMXIGNITEINSTANCENAME}]: Coordinator has changed | Ignite [{#JMXIGNITEINSTANCENAME}] coordinator has changed. Ack to close. |
last(/Ignite by JMX/jmx["{#JMXOBJ}",Coordinator],#1)<>last(/Ignite by JMX/jmx["{#JMXOBJ}",Coordinator],#2) and length(last(/Ignite by JMX/jmx["{#JMXOBJ}",Coordinator]))>0 |
WARNING | Manual close: YES |
Cache group [{#JMXGROUP}]: There are no success transactions for cache for 5m | - |
min(/Ignite by JMX/jmx["{#JMXOBJ}",CacheTxRollbacks],5m)>0 and max(/Ignite by JMX/jmx["{#JMXOBJ}",CacheTxCommits],5m)=0 |
AVERAGE | |
Cache group [{#JMXGROUP}]: Success transactions less than rollbacks for 5m | - |
min(/Ignite by JMX/jmx["{#JMXOBJ}",CacheTxRollbacks],5m) > max(/Ignite by JMX/jmx["{#JMXOBJ}",CacheTxCommits],5m) |
WARNING | Depends on: - Cache group [{#JMXGROUP}]: There are no success transactions for cache for 5m |
Cache group [{#JMXGROUP}]: All entries are in heap | All entries are in heap. Possibly you are using eager queries; this may cause out-of-memory exceptions for big caches. Ack to close. |
last(/Ignite by JMX/jmx["{#JMXOBJ}",CacheSize])=last(/Ignite by JMX/jmx["{#JMXOBJ}",HeapEntriesCount]) |
INFO | Manual close: YES |
Data region {#JMXNAME}: Node started to evict pages | You store more data than the region can accommodate. Data has started to move to disk, which can slow down requests. Ack to close. |
min(/Ignite by JMX/jmx["{#JMXOBJ}",EvictionRate],5m)>0 |
INFO | Manual close: YES |
Data region {#JMXNAME}: Data region utilization is too high | Data region utilization is high. Increase the data region size or delete unused data. |
min(/Ignite by JMX/jmx["{#JMXOBJ}",OffheapUsedSize],5m)/last(/Ignite by JMX/jmx["{#JMXOBJ}",OffHeapSize])*100>{$IGNITE.DATA.REGION.PUSED.MAX.WARN} |
WARNING | Depends on: - Data region {#JMXNAME}: Data region utilization is too high |
Data region {#JMXNAME}: Data region utilization is too high | Data region utilization is high. Increase the data region size or delete unused data. |
min(/Ignite by JMX/jmx["{#JMXOBJ}",OffheapUsedSize],5m)/last(/Ignite by JMX/jmx["{#JMXOBJ}",OffHeapSize])*100>{$IGNITE.DATA.REGION.PUSED.MAX.HIGH} |
HIGH | |
Data region {#JMXNAME}: Pages replace rate more than 0 | There is more data than DataRegionMaxSize. The cluster has started to replace pages in memory. Page replacement can slow down operations. |
min(/Ignite by JMX/jmx["{#JMXOBJ}",PagesReplaceRate],5m)>0 |
WARNING | |
Data region {#JMXNAME}: Checkpoint buffer utilization is too high | Checkpoint buffer utilization is high. Threads will be throttled to avoid buffer overflow. It can be caused by high disk utilization. |
min(/Ignite by JMX/jmx["{#JMXOBJ}",UsedCheckpointBufferSize],5m)/last(/Ignite by JMX/jmx["{#JMXOBJ}",CheckpointBufferSize])*100>{$IGNITE.CHECKPOINT.PUSED.MAX.WARN} |
WARNING | Depends on: - Data region {#JMXNAME}: Checkpoint buffer utilization is too high |
Data region {#JMXNAME}: Checkpoint buffer utilization is too high | Checkpoint buffer utilization is high. Threads will be throttled to avoid buffer overflow. It can be caused by high disk utilization. |
min(/Ignite by JMX/jmx["{#JMXOBJ}",UsedCheckpointBufferSize],5m)/last(/Ignite by JMX/jmx["{#JMXOBJ}",CheckpointBufferSize])*100>{$IGNITE.CHECKPOINT.PUSED.MAX.HIGH} |
HIGH | |
Cache group [{#JMXNAME}]: One or more backups are unavailable | - |
min(/Ignite by JMX/jmx["{#JMXOBJ}",Backups],5m)>=max(/Ignite by JMX/jmx["{#JMXOBJ}",MinimumNumberOfPartitionCopies],5m) |
WARNING | |
Cache group [{#JMXNAME}]: List of caches has changed | List of caches has changed. Significant changes have occurred in the cluster. Ack to close. |
last(/Ignite by JMX/jmx["{#JMXOBJ}",Caches],#1)<>last(/Ignite by JMX/jmx["{#JMXOBJ}",Caches],#2) and length(last(/Ignite by JMX/jmx["{#JMXOBJ}",Caches]))>0 |
INFO | Manual close: YES |
Cache group [{#JMXNAME}]: Rebalance in progress | Ack to close. |
max(/Ignite by JMX/jmx["{#JMXOBJ}",LocalNodeMovingPartitionsCount],30m)>0 |
INFO | Manual close: YES |
Cache group [{#JMXNAME}]: There is no copy for partitions | - |
max(/Ignite by JMX/jmx["{#JMXOBJ}",MinimumNumberOfPartitionCopies],30m)=0 |
WARNING | |
Thread pool [{#JMXNAME}]: Too many messages in queue | The number of messages in the queue is more than {$IGNITE.THREAD.QUEUE.MAX.WARN:"{#JMXNAME}"}. |
min(/Ignite by JMX/jmx["{#JMXOBJ}",QueueSize],5m) > {$IGNITE.THREAD.QUEUE.MAX.WARN:"{#JMXNAME}"} |
AVERAGE |
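Several triggers above come in WARNING/HIGH pairs computed over the same ratio, with the WARNING trigger depending on the HIGH one so that only the most severe alert fires; the checkpoint buffer utilization triggers are one example. A sketch of that evaluation logic, with the macro defaults from the table above hard-coded as assumed values:

```python
# Macro defaults from the table above (assumed values):
CHECKPOINT_PUSED_MAX_WARN = 66.0  # {$IGNITE.CHECKPOINT.PUSED.MAX.WARN}
CHECKPOINT_PUSED_MAX_HIGH = 80.0  # {$IGNITE.CHECKPOINT.PUSED.MAX.HIGH}

def checkpoint_severity(used_bytes: float, total_bytes: float) -> str:
    """Mirror the trigger pair: utilization over the HIGH threshold wins;
    the WARNING trigger is suppressed by its dependency on the HIGH one."""
    pct = used_bytes / total_bytes * 100.0
    if pct > CHECKPOINT_PUSED_MAX_HIGH:
        return "HIGH"
    if pct > CHECKPOINT_PUSED_MAX_WARN:
        return "WARNING"
    return "OK"

print(checkpoint_severity(70, 100))  # 70% -> WARNING
print(checkpoint_severity(90, 100))  # 90% -> HIGH
print(checkpoint_severity(10, 100))  # 10% -> OK
```

The same two-tier pattern applies to the data region utilization and PME duration triggers, each with its own pair of macros.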
For Zabbix version: 6.2 and higher
Official JMX Template for GridGain In-Memory Computing Platform.
This template is based on the original template developed by Igor Akkuratov, Senior Engineer at GridGain Systems and GridGain In-Memory Computing Platform Contributor.
This template was tested on:
See Zabbix template operation for basic instructions.
This template works with standalone and cluster instances. Metrics are collected by JMX. All metrics are discoverable.
Use the JVM option
-DIGNITE_MBEAN_APPEND_CLASS_LOADER_ID=false
to exclude one level with the Classloader name. You can configure which Cache and Data Region metrics to collect using the official guide. No specific Zabbix configuration is required.
Name | Description | Default | |||
---|---|---|---|---|---|
{$GRIDGAIN.CHECKPOINT.PUSED.MAX.HIGH} | The maximum percent of checkpoint buffer utilization for high trigger expression. |
80 |
|||
{$GRIDGAIN.CHECKPOINT.PUSED.MAX.WARN} | The maximum percent of checkpoint buffer utilization for warning trigger expression. |
66 |
|||
{$GRIDGAIN.DATA.REGION.PUSED.MAX.HIGH} | The maximum percent of data region utilization for high trigger expression. |
90 |
|||
{$GRIDGAIN.DATA.REGION.PUSED.MAX.WARN} | The maximum percent of data region utilization for warning trigger expression. |
80 |
|||
{$GRIDGAIN.JOBS.QUEUE.MAX.WARN} | The maximum number of queued jobs for trigger expression. |
10 |
|||
{$GRIDGAIN.LLD.FILTER.CACHE.MATCHES} | Filter of discoverable cache groups. |
.* |
|||
{$GRIDGAIN.LLD.FILTER.CACHE.NOT_MATCHES} | Filter to exclude discovered cache groups. |
CHANGE_IF_NEEDED |
|||
{$GRIDGAIN.LLD.FILTER.DATA.REGION.MATCHES} | Filter of discoverable data regions. |
.* |
|||
{$GRIDGAIN.LLD.FILTER.DATA.REGION.NOT_MATCHES} | Filter to exclude discovered data regions. |
`^(sysMemPlc|TxLog)$` |
|||
{$GRIDGAIN.LLD.FILTER.THREAD.POOL.MATCHES} | Filter of discoverable thread pools. |
.* |
|||
{$GRIDGAIN.LLD.FILTER.THREAD.POOL.NOT_MATCHES} | Filter to exclude discovered thread pools. |
`^(GridCallbackExecutor|GridRebalanceStripedExecutor|GridDataStreamExecutor|StripedExecutor)$` |
|||
{$GRIDGAIN.PASSWORD} | - |
<secret> |
|||
{$GRIDGAIN.PME.DURATION.MAX.HIGH} | The maximum PME duration in ms for high trigger expression. |
60000 |
|||
{$GRIDGAIN.PME.DURATION.MAX.WARN} | The maximum PME duration in ms for warning trigger expression. |
10000 |
|||
{$GRIDGAIN.THREAD.QUEUE.MAX.WARN} | Threshold for thread pool queue size. Can be used with thread pool name as context. |
1000 |
|||
{$GRIDGAIN.THREADS.COUNT.MAX.WARN} | The maximum number of running threads for trigger expression. |
1000 |
|||
{$GRIDGAIN.USER} | - |
zabbix |
There are no template links in this template.
Name | Description | Type | Key and additional info |
---|---|---|---|
Cache groups | - |
JMX | jmx.discovery[beans,"org.apache:group=\"Cache groups\",*"] Preprocessing: - JAVASCRIPT: - DISCARD_UNCHANGED_HEARTBEAT: Filter: AND - {#JMXNAME} MATCHES_REGEX - {#JMXNAME} NOT_MATCHES_REGEX |
Cache metrics | - |
JMX | jmx.discovery[beans,"org.apache:name=\"org.apache.gridgain.internal.processors.cache.CacheLocalMetricsMXBeanImpl\",*"] Preprocessing: - JAVASCRIPT: - DISCARD_UNCHANGED_HEARTBEAT: Filter: AND - {#JMXGROUP} MATCHES_REGEX - {#JMXGROUP} NOT_MATCHES_REGEX |
Cluster metrics | - |
JMX | jmx.discovery[beans,"org.apache:group=Kernal,name=ClusterMetricsMXBeanImpl,*"] Preprocessing: - JAVASCRIPT: |
Data region metrics | - |
JMX | jmx.discovery[beans,"org.apache:group=DataRegionMetrics,*"] Preprocessing: - JAVASCRIPT: - DISCARD_UNCHANGED_HEARTBEAT: Filter: AND - {#JMXNAME} MATCHES_REGEX - {#JMXNAME} NOT_MATCHES_REGEX |
GridGain kernal metrics | - |
JMX | jmx.discovery[beans,"org.apache:group=Kernal,name=IgniteKernal,*"] Preprocessing: - JAVASCRIPT: |
Local node metrics | - |
JMX | jmx.discovery[beans,"org.apache:group=Kernal,name=ClusterLocalNodeMetricsMXBeanImpl,*"] Preprocessing: - JAVASCRIPT: |
TCP Communication SPI metrics | - |
JMX | jmx.discovery[beans,"org.apache:group=SPIs,name=TcpCommunicationSpi,*"] Preprocessing: - JAVASCRIPT: |
TCP discovery SPI | - |
JMX | jmx.discovery[beans,"org.apache:group=SPIs,name=TcpDiscoverySpi,*"] Preprocessing: - JAVASCRIPT: |
Thread pool metrics | - |
JMX | jmx.discovery[beans,"org.apache:group=\"Thread Pools\",*"] Preprocessing: - JAVASCRIPT: - DISCARD_UNCHANGED_HEARTBEAT: Filter: AND - {#JMXNAME} MATCHES_REGEX - {#JMXNAME} NOT_MATCHES_REGEX |
Transaction metrics | - |
JMX | jmx.discovery[beans,"org.apache:group=TransactionMetrics,name=TransactionMetricsMxBeanImpl,*"] Preprocessing: - JAVASCRIPT: |
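Several discovery rules above combine a MATCHES_REGEX and a NOT_MATCHES_REGEX filter condition with AND. A minimal sketch of how such a filter pair selects thread pools, using the regex values from the macros above (the candidate pool names are invented for illustration):

```python
import re

# Filter macros from the thread pool discovery rule.
MATCHES = r".*"
NOT_MATCHES = r"^(GridCallbackExecutor|GridRebalanceStripedExecutor|GridDataStreamExecutor|StripedExecutor)$"

def keep(name: str) -> bool:
    """AND of the MATCHES_REGEX and NOT_MATCHES_REGEX conditions, as Zabbix LLD filters do."""
    return bool(re.search(MATCHES, name)) and not re.search(NOT_MATCHES, name)

# Hypothetical discovered {#JMXNAME} values, for illustration only.
pools = ["StripedExecutor", "GridExecutionExecutor", "GridDataStreamExecutor"]
print([p for p in pools if keep(p)])  # only GridExecutionExecutor survives
```

Adjusting the NOT_MATCHES macro on the host is the intended way to exclude more pools from monitoring.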
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
GridGain | GridGain [{#JMXIGNITEINSTANCENAME}]: Uptime | Uptime of GridGain instance. |
JMX | jmx["{#JMXOBJ}",UpTime] Preprocessing: - MULTIPLIER: |
GridGain | GridGain [{#JMXIGNITEINSTANCENAME}]: Version | Version of GridGain instance. |
JMX | jmx["{#JMXOBJ}",FullVersion] Preprocessing: - REGEX: - DISCARD_UNCHANGED_HEARTBEAT: |
GridGain | GridGain [{#JMXIGNITEINSTANCENAME}]: Local node ID | Unique identifier for this node within the grid. |
JMX | jmx["{#JMXOBJ}",LocalNodeId] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
GridGain | GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes, Baseline | Total baseline nodes that are registered in the baseline topology. |
JMX | jmx["{#JMXOBJ}",TotalBaselineNodes] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
GridGain | GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes, Active baseline | The number of nodes that are currently active in the baseline topology. |
JMX | jmx["{#JMXOBJ}",ActiveBaselineNodes] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
GridGain | GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes, Client | The number of client nodes in the cluster. |
JMX | jmx["{#JMXOBJ}",TotalClientNodes] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
GridGain | GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes, total | Total number of nodes. |
JMX | jmx["{#JMXOBJ}",TotalNodes] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
GridGain | GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes, Server | The number of server nodes in the cluster. |
JMX | jmx["{#JMXOBJ}",TotalServerNodes] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
GridGain | GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs cancelled, current | Number of cancelled jobs that are still running. |
JMX | jmx["{#JMXOBJ}",CurrentCancelledJobs] |
GridGain | GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs rejected, current | Number of jobs rejected after the most recent collision resolution operation. |
JMX | jmx["{#JMXOBJ}",CurrentRejectedJobs] |
GridGain | GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs waiting, current | Number of queued jobs currently waiting to be executed. |
JMX | jmx["{#JMXOBJ}",CurrentWaitingJobs] |
GridGain | GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs active, current | Number of currently active jobs concurrently executing on the node. |
JMX | jmx["{#JMXOBJ}",CurrentActiveJobs] |
GridGain | GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs executed, rate | Total number of jobs handled by the node per second. |
JMX | jmx["{#JMXOBJ}",TotalExecutedJobs] Preprocessing: - CHANGE_PER_SECOND |
GridGain | GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs cancelled, rate | Total number of jobs cancelled by the node per second. |
JMX | jmx["{#JMXOBJ}",TotalCancelledJobs] Preprocessing: - CHANGE_PER_SECOND |
GridGain | GridGain [{#JMXIGNITEINSTANCENAME}]: Jobs rejected, rate | Total number of jobs this node has rejected during collision resolution operations since node startup, per second. |
JMX | jmx["{#JMXOBJ}",TotalRejectedJobs] Preprocessing: - CHANGE_PER_SECOND |
GridGain | GridGain [{#JMXIGNITEINSTANCENAME}]: PME duration, current | Current PME duration in milliseconds. |
JMX | jmx["{#JMXOBJ}",CurrentPmeDuration] |
GridGain | GridGain [{#JMXIGNITEINSTANCENAME}]: Threads count, current | Current number of live threads. |
JMX | jmx["{#JMXOBJ}",CurrentThreadCount] |
GridGain | GridGain [{#JMXIGNITEINSTANCENAME}]: Heap memory used | Current heap size that is used for object allocation. |
JMX | jmx["{#JMXOBJ}",HeapMemoryUsed] |
GridGain | GridGain [{#JMXIGNITEINSTANCENAME}]: Coordinator | Current coordinator UUID. |
JMX | jmx["{#JMXOBJ}",Coordinator] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
GridGain | GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes left | Nodes left count. |
JMX | jmx["{#JMXOBJ}",NodesLeft] |
GridGain | GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes joined | Nodes join count. |
JMX | jmx["{#JMXOBJ}",NodesJoined] |
GridGain | GridGain [{#JMXIGNITEINSTANCENAME}]: Nodes failed | Nodes failed count. |
JMX | jmx["{#JMXOBJ}",NodesFailed] |
GridGain | GridGain [{#JMXIGNITEINSTANCENAME}]: Discovery message worker queue | Message worker queue current size. |
JMX | jmx["{#JMXOBJ}",MessageWorkerQueueSize] |
GridGain | GridGain [{#JMXIGNITEINSTANCENAME}]: Discovery reconnect, rate | Number of times node tries to (re)establish connection to another node per second. |
JMX | jmx["{#JMXOBJ}",ReconnectCount] Preprocessing: - CHANGE_PER_SECOND |
GridGain | GridGain [{#JMXIGNITEINSTANCENAME}]: Discovery messages processed, rate | The number of discovery messages processed per second. |
JMX | jmx["{#JMXOBJ}",TotalProcessedMessages] Preprocessing: - CHANGE_PER_SECOND |
GridGain | GridGain [{#JMXIGNITEINSTANCENAME}]: Discovery messages received, rate | The number of discovery messages received per second. |
JMX | jmx["{#JMXOBJ}",TotalReceivedMessages] Preprocessing: - CHANGE_PER_SECOND |
GridGain | GridGain [{#JMXIGNITEINSTANCENAME}]: Communication outbound messages queue | Outbound messages queue size. |
JMX | jmx["{#JMXOBJ}",OutboundMessagesQueueSize] |
GridGain | GridGain [{#JMXIGNITEINSTANCENAME}]: Communication messages received, rate | The number of messages received per second. |
JMX | jmx["{#JMXOBJ}",ReceivedMessagesCount] Preprocessing: - CHANGE_PER_SECOND |
GridGain | GridGain [{#JMXIGNITEINSTANCENAME}]: Communication messages sent, rate | The number of messages sent per second. |
JMX | jmx["{#JMXOBJ}",SentMessagesCount] Preprocessing: - CHANGE_PER_SECOND |
GridGain | GridGain [{#JMXIGNITEINSTANCENAME}]: Communication reconnect rate | The number of reconnect attempts made when establishing connections with remote nodes, per second. |
JMX | jmx["{#JMXOBJ}",ReconnectCount] Preprocessing: - CHANGE_PER_SECOND |
GridGain | GridGain [{#JMXIGNITEINSTANCENAME}]: Locked keys | The number of keys locked on the node. |
JMX | jmx["{#JMXOBJ}",LockedKeysNumber] |
GridGain | GridGain [{#JMXIGNITEINSTANCENAME}]: Transactions owner, current | The number of active transactions for which this node is the initiator. |
JMX | jmx["{#JMXOBJ}",OwnerTransactionsNumber] |
GridGain | GridGain [{#JMXIGNITEINSTANCENAME}]: Transactions holding lock, current | The number of active transactions holding at least one key lock. |
JMX | jmx["{#JMXOBJ}",TransactionsHoldingLockNumber] |
GridGain | GridGain [{#JMXIGNITEINSTANCENAME}]: Transactions rolledback, rate | The number of transactions rolled back per second. |
JMX | jmx["{#JMXOBJ}",TransactionsRolledBackNumber] |
GridGain | GridGain [{#JMXIGNITEINSTANCENAME}]: Transactions committed, rate | The number of transactions which were committed per second. |
JMX | jmx["{#JMXOBJ}",TransactionsCommittedNumber] |
GridGain | Cache group [{#JMXGROUP}]: Cache gets, rate | The number of gets to the cache per second. |
JMX | jmx["{#JMXOBJ}",CacheGets] Preprocessing: - CHANGE_PER_SECOND |
GridGain | Cache group [{#JMXGROUP}]: Cache puts, rate | The number of puts to the cache per second. |
JMX | jmx["{#JMXOBJ}",CachePuts] Preprocessing: - CHANGE_PER_SECOND |
GridGain | Cache group [{#JMXGROUP}]: Cache removals, rate | The number of removals from the cache per second. |
JMX | jmx["{#JMXOBJ}",CacheRemovals] Preprocessing: - CHANGE_PER_SECOND |
GridGain | Cache group [{#JMXGROUP}]: Cache hits, pct | Percentage of successful hits. |
JMX | jmx["{#JMXOBJ}",CacheHitPercentage] |
GridGain | Cache group [{#JMXGROUP}]: Cache misses, pct | Percentage of accesses that failed to find anything. |
JMX | jmx["{#JMXOBJ}",CacheMissPercentage] |
GridGain | Cache group [{#JMXGROUP}]: Cache transaction commits, rate | The number of transaction commits per second. |
JMX | jmx["{#JMXOBJ}",CacheTxCommits] Preprocessing: - CHANGE_PER_SECOND |
GridGain | Cache group [{#JMXGROUP}]: Cache transaction rollbacks, rate | The number of transaction rollbacks per second. |
JMX | jmx["{#JMXOBJ}",CacheTxRollbacks] Preprocessing: - CHANGE_PER_SECOND |
GridGain | Cache group [{#JMXGROUP}]: Cache size | The number of non-null values in the cache as a long value. |
JMX | jmx["{#JMXOBJ}",CacheSize] |
GridGain | Cache group [{#JMXGROUP}]: Cache heap entries | The number of entries in heap memory. |
JMX | jmx["{#JMXOBJ}",HeapEntriesCount] Preprocessing: - CHANGE_PER_SECOND |
GridGain | Data region {#JMXNAME}: Allocation, rate | Allocation rate (pages per second) averaged across rateTimeInterval. |
JMX | jmx["{#JMXOBJ}",AllocationRate] |
GridGain | Data region {#JMXNAME}: Allocated, bytes | Total size of memory allocated in bytes. |
JMX | jmx["{#JMXOBJ}",TotalAllocatedSize] |
GridGain | Data region {#JMXNAME}: Dirty pages | Number of pages in memory not yet synchronized with persistent storage. |
JMX | jmx["{#JMXOBJ}",DirtyPages] |
GridGain | Data region {#JMXNAME}: Eviction, rate | Eviction rate (pages per second). |
JMX | jmx["{#JMXOBJ}",EvictionRate] |
GridGain | Data region {#JMXNAME}: Size, max | Maximum memory region size defined by its data region. |
JMX | jmx["{#JMXOBJ}",MaxSize] |
GridGain | Data region {#JMXNAME}: Offheap size | Offheap size in bytes. |
JMX | jmx["{#JMXOBJ}",OffHeapSize] |
GridGain | Data region {#JMXNAME}: Offheap used size | Total used offheap size in bytes. |
JMX | jmx["{#JMXOBJ}",OffheapUsedSize] |
GridGain | Data region {#JMXNAME}: Pages fill factor | The percentage of the used space. |
JMX | jmx["{#JMXOBJ}",PagesFillFactor] |
GridGain | Data region {#JMXNAME}: Pages replace, rate | Rate at which pages in memory are replaced with pages from persistent storage (pages per second). |
JMX | jmx["{#JMXOBJ}",PagesReplaceRate] |
GridGain | Data region {#JMXNAME}: Used checkpoint buffer size | Used checkpoint buffer size in bytes. |
JMX | jmx["{#JMXOBJ}",UsedCheckpointBufferSize] |
GridGain | Data region {#JMXNAME}: Checkpoint buffer size | Total size in bytes for checkpoint buffer. |
JMX | jmx["{#JMXOBJ}",CheckpointBufferSize] |
GridGain | Cache group [{#JMXNAME}]: Backups | Count of backups configured for cache group. |
JMX | jmx["{#JMXOBJ}",Backups] |
GridGain | Cache group [{#JMXNAME}]: Partitions | Count of partitions for cache group. |
JMX | jmx["{#JMXOBJ}",Partitions] |
GridGain | Cache group [{#JMXNAME}]: Caches | List of caches. |
JMX | jmx["{#JMXOBJ}",Caches] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
GridGain | Cache group [{#JMXNAME}]: Local node partitions, moving | Count of partitions with state MOVING for this cache group located on this node. |
JMX | jmx["{#JMXOBJ}",LocalNodeMovingPartitionsCount] |
GridGain | Cache group [{#JMXNAME}]: Local node partitions, renting | Count of partitions with state RENTING for this cache group located on this node. |
JMX | jmx["{#JMXOBJ}",LocalNodeRentingPartitionsCount] |
GridGain | Cache group [{#JMXNAME}]: Local node entries, renting | Count of entries remains to evict in RENTING partitions located on this node for this cache group. |
JMX | jmx["{#JMXOBJ}",LocalNodeRentingEntriesCount] |
GridGain | Cache group [{#JMXNAME}]: Local node partitions, owning | Count of partitions with state OWNING for this cache group located on this node. |
JMX | jmx["{#JMXOBJ}",LocalNodeOwningPartitionsCount] |
GridGain | Cache group [{#JMXNAME}]: Partition copies, min | Minimum number of partition copies for all partitions of this cache group. |
JMX | jmx["{#JMXOBJ}",MinimumNumberOfPartitionCopies] |
GridGain | Cache group [{#JMXNAME}]: Partition copies, max | Maximum number of partition copies for all partitions of this cache group. |
JMX | jmx["{#JMXOBJ}",MaximumNumberOfPartitionCopies] |
GridGain | Thread pool [{#JMXNAME}]: Queue size | Current size of the execution queue. |
JMX | jmx["{#JMXOBJ}",QueueSize] |
GridGain | Thread pool [{#JMXNAME}]: Pool size | Current number of threads in the pool. |
JMX | jmx["{#JMXOBJ}",PoolSize] |
GridGain | Thread pool [{#JMXNAME}]: Pool size, max | The maximum allowed number of threads. |
JMX | jmx["{#JMXOBJ}",MaximumPoolSize] |
GridGain | Thread pool [{#JMXNAME}]: Pool size, core | The core number of threads. |
JMX | jmx["{#JMXOBJ}",CorePoolSize] |
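Many of the counters above (TotalExecutedJobs, CacheGets, SentMessagesCount, and so on) are monotonically increasing totals that the template turns into per-second rates with the CHANGE_PER_SECOND preprocessing step. The conversion is simply the value delta divided by the time delta between two polls; a minimal sketch with invented sample values:

```python
def change_per_second(prev, curr):
    """Zabbix CHANGE_PER_SECOND: (value - prev_value) / (clock - prev_clock)."""
    (t0, v0), (t1, v1) = prev, curr
    return (v1 - v0) / (t1 - t0)

# Hypothetical TotalExecutedJobs samples (unix_time, value) taken 60 s apart.
rate = change_per_second((1000, 150000), (1060, 153000))
print(rate)  # 50.0 jobs per second
```

Note that the first poll after startup produces no value, since there is no previous sample to diff against.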
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
GridGain [{#JMXIGNITEINSTANCENAME}]: has been restarted | Uptime is less than 10 minutes. |
last(/GridGain by JMX/jmx["{#JMXOBJ}",UpTime])<10m |
INFO | Manual close: YES |
GridGain [{#JMXIGNITEINSTANCENAME}]: Failed to fetch info data | Zabbix has not received data for items for the last 10 minutes. |
nodata(/GridGain by JMX/jmx["{#JMXOBJ}",UpTime],10m)=1 |
WARNING | Manual close: YES |
GridGain [{#JMXIGNITEINSTANCENAME}]: Version has changed | GridGain [{#JMXIGNITEINSTANCENAME}] version has changed. Ack to close. |
last(/GridGain by JMX/jmx["{#JMXOBJ}",FullVersion],#1)<>last(/GridGain by JMX/jmx["{#JMXOBJ}",FullVersion],#2) and length(last(/GridGain by JMX/jmx["{#JMXOBJ}",FullVersion]))>0 |
INFO | Manual close: YES |
GridGain [{#JMXIGNITEINSTANCENAME}]: Server node left the topology | One or more server nodes have left the topology. Ack to close. |
change(/GridGain by JMX/jmx["{#JMXOBJ}",TotalServerNodes])<0 |
WARNING | Manual close: YES |
GridGain [{#JMXIGNITEINSTANCENAME}]: Server node added to the topology | One or more server nodes have been added to the topology. Ack to close. |
change(/GridGain by JMX/jmx["{#JMXOBJ}",TotalServerNodes])>0 |
INFO | Manual close: YES |
GridGain [{#JMXIGNITEINSTANCENAME}]: There are nodes not in the baseline topology | One or more server nodes are not in the baseline topology. Ack to close. |
last(/GridGain by JMX/jmx["{#JMXOBJ}",TotalServerNodes])>last(/GridGain by JMX/jmx["{#JMXOBJ}",TotalBaselineNodes]) |
INFO | Manual close: YES |
GridGain [{#JMXIGNITEINSTANCENAME}]: Number of queued jobs is too high | Number of queued jobs is over {$GRIDGAIN.JOBS.QUEUE.MAX.WARN}. |
min(/GridGain by JMX/jmx["{#JMXOBJ}",CurrentWaitingJobs],15m) > {$GRIDGAIN.JOBS.QUEUE.MAX.WARN} |
WARNING | |
GridGain [{#JMXIGNITEINSTANCENAME}]: PME duration is too long | PME duration is over {$GRIDGAIN.PME.DURATION.MAX.WARN}ms. |
min(/GridGain by JMX/jmx["{#JMXOBJ}",CurrentPmeDuration],5m) > {$GRIDGAIN.PME.DURATION.MAX.WARN} |
WARNING | Depends on: - GridGain [{#JMXIGNITEINSTANCENAME}]: PME duration is too long |
GridGain [{#JMXIGNITEINSTANCENAME}]: PME duration is too long | PME duration is over {$GRIDGAIN.PME.DURATION.MAX.HIGH}ms. Looks like PME is hung. |
min(/GridGain by JMX/jmx["{#JMXOBJ}",CurrentPmeDuration],5m) > {$GRIDGAIN.PME.DURATION.MAX.HIGH} |
HIGH | |
GridGain [{#JMXIGNITEINSTANCENAME}]: Number of running threads is too high | Number of running threads is over {$GRIDGAIN.THREADS.COUNT.MAX.WARN}. |
min(/GridGain by JMX/jmx["{#JMXOBJ}",CurrentThreadCount],15m) > {$GRIDGAIN.THREADS.COUNT.MAX.WARN} |
WARNING | Depends on: - GridGain [{#JMXIGNITEINSTANCENAME}]: PME duration is too long |
GridGain [{#JMXIGNITEINSTANCENAME}]: Coordinator has changed | GridGain [{#JMXIGNITEINSTANCENAME}] coordinator has changed. Ack to close. |
last(/GridGain by JMX/jmx["{#JMXOBJ}",Coordinator],#1)<>last(/GridGain by JMX/jmx["{#JMXOBJ}",Coordinator],#2) and length(last(/GridGain by JMX/jmx["{#JMXOBJ}",Coordinator]))>0 |
WARNING | Manual close: YES |
Cache group [{#JMXGROUP}]: There are no success transactions for cache for 5m | - |
min(/GridGain by JMX/jmx["{#JMXOBJ}",CacheTxRollbacks],5m)>0 and max(/GridGain by JMX/jmx["{#JMXOBJ}",CacheTxCommits],5m)=0 |
AVERAGE | |
Cache group [{#JMXGROUP}]: Success transactions less than rollbacks for 5m | - |
min(/GridGain by JMX/jmx["{#JMXOBJ}",CacheTxRollbacks],5m) > max(/GridGain by JMX/jmx["{#JMXOBJ}",CacheTxCommits],5m) |
WARNING | Depends on: - Cache group [{#JMXGROUP}]: There are no success transactions for cache for 5m |
Cache group [{#JMXGROUP}]: All entries are in heap | All entries are in heap. Possibly you use eager queries; this may cause out-of-memory exceptions for big caches. Ack to close. |
last(/GridGain by JMX/jmx["{#JMXOBJ}",CacheSize])=last(/GridGain by JMX/jmx["{#JMXOBJ}",HeapEntriesCount]) |
INFO | Manual close: YES |
Data region {#JMXNAME}: Node started to evict pages | You store more data than the region can accommodate. Data has started to move to disk, which can slow down requests. Ack to close. |
min(/GridGain by JMX/jmx["{#JMXOBJ}",EvictionRate],5m)>0 |
INFO | Manual close: YES |
Data region {#JMXNAME}: Data region utilization is too high | Data region utilization is high. Increase data region size or delete any data. |
min(/GridGain by JMX/jmx["{#JMXOBJ}",OffheapUsedSize],5m)/last(/GridGain by JMX/jmx["{#JMXOBJ}",OffHeapSize])*100>{$GRIDGAIN.DATA.REGION.PUSED.MAX.WARN} |
WARNING | Depends on: - Data region {#JMXNAME}: Data region utilization is too high |
Data region {#JMXNAME}: Data region utilization is too high | Data region utilization is high. Increase data region size or delete any data. |
min(/GridGain by JMX/jmx["{#JMXOBJ}",OffheapUsedSize],5m)/last(/GridGain by JMX/jmx["{#JMXOBJ}",OffHeapSize])*100>{$GRIDGAIN.DATA.REGION.PUSED.MAX.HIGH} |
HIGH | |
Data region {#JMXNAME}: Pages replace rate more than 0 | There is more data than DataRegionMaxSize. Cluster started to replace pages in memory. Page replacement can slow down operations. |
min(/GridGain by JMX/jmx["{#JMXOBJ}",PagesReplaceRate],5m)>0 |
WARNING | |
Data region {#JMXNAME}: Checkpoint buffer utilization is too high | Checkpoint buffer utilization is high. Threads will be throttled to avoid buffer overflow. It can be caused by high disk utilization. |
min(/GridGain by JMX/jmx["{#JMXOBJ}",UsedCheckpointBufferSize],5m)/last(/GridGain by JMX/jmx["{#JMXOBJ}",CheckpointBufferSize])*100>{$GRIDGAIN.CHECKPOINT.PUSED.MAX.WARN} |
WARNING | Depends on: - Data region {#JMXNAME}: Checkpoint buffer utilization is too high |
Data region {#JMXNAME}: Checkpoint buffer utilization is too high | Checkpoint buffer utilization is high. Threads will be throttled to avoid buffer overflow. It can be caused by high disk utilization. |
min(/GridGain by JMX/jmx["{#JMXOBJ}",UsedCheckpointBufferSize],5m)/last(/GridGain by JMX/jmx["{#JMXOBJ}",CheckpointBufferSize])*100>{$GRIDGAIN.CHECKPOINT.PUSED.MAX.HIGH} |
HIGH | |
Cache group [{#JMXNAME}]: One or more backups are unavailable | - |
min(/GridGain by JMX/jmx["{#JMXOBJ}",Backups],5m)>=max(/GridGain by JMX/jmx["{#JMXOBJ}",MinimumNumberOfPartitionCopies],5m) |
WARNING | |
Cache group [{#JMXNAME}]: List of caches has changed | List of caches has changed. Significant changes have occurred in the cluster. Ack to close. |
last(/GridGain by JMX/jmx["{#JMXOBJ}",Caches],#1)<>last(/GridGain by JMX/jmx["{#JMXOBJ}",Caches],#2) and length(last(/GridGain by JMX/jmx["{#JMXOBJ}",Caches]))>0 |
INFO | Manual close: YES |
Cache group [{#JMXNAME}]: Rebalance in progress | Ack to close. |
max(/GridGain by JMX/jmx["{#JMXOBJ}",LocalNodeMovingPartitionsCount],30m)>0 |
INFO | Manual close: YES |
Cache group [{#JMXNAME}]: There is no copy for partitions | - |
max(/GridGain by JMX/jmx["{#JMXOBJ}",MinimumNumberOfPartitionCopies],30m)=0 |
WARNING | |
Thread pool [{#JMXNAME}]: Too many messages in queue | The number of messages in the queue is more than {$GRIDGAIN.THREAD.QUEUE.MAX.WARN:"{#JMXNAME}"}. |
min(/GridGain by JMX/jmx["{#JMXOBJ}",QueueSize],5m) > {$GRIDGAIN.THREAD.QUEUE.MAX.WARN:"{#JMXNAME}"} |
AVERAGE |
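The paired checkpoint-buffer triggers above fire on the percentage of the buffer in use, computed from the UsedCheckpointBufferSize and CheckpointBufferSize items against the {$GRIDGAIN.CHECKPOINT.PUSED.MAX.WARN} and {$GRIDGAIN.CHECKPOINT.PUSED.MAX.HIGH} macros. A sketch of the same arithmetic (the byte values are invented for illustration):

```python
WARN, HIGH = 66, 80  # {$GRIDGAIN.CHECKPOINT.PUSED.MAX.WARN} / {$GRIDGAIN.CHECKPOINT.PUSED.MAX.HIGH}

def checkpoint_severity(used_bytes: int, total_bytes: int) -> str:
    """Mirrors min(used,5m)/last(total)*100 > threshold from the trigger expressions."""
    pused = used_bytes / total_bytes * 100
    if pused > HIGH:
        return "HIGH"
    if pused > WARN:
        return "WARNING"
    return "OK"

print(checkpoint_severity(700 * 2**20, 1024 * 2**20))  # ~68.4% -> WARNING
```

The WARNING trigger depends on the HIGH one, so only the more severe problem is raised when both thresholds are crossed.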
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template or ask for help with it at ZABBIX forums.
For Zabbix version: 6.2 and higher
The template to monitor CockroachDB nodes by Zabbix that works without any external scripts.
Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
Template CockroachDB node by HTTP
— collects metrics by HTTP agent from Prometheus endpoint and health endpoints.
This template was tested on:
See Zabbix template operation for basic instructions.
Internal node metrics are collected from Prometheus /_status/vars endpoint. Node health metrics are collected from /health and /health?ready=1 endpoints. The template doesn't require the usage of a session token.
Don't forget to change the macro {$COCKROACHDB.API.SCHEME} according to your setup (secure/insecure node). Also, see the Macros section for a list of macros used to set trigger values.
NOTE. Some metrics may not be collected depending on your CockroachDB version and configuration.
No specific Zabbix configuration is required.
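The dependent items below are extracted from the Prometheus exposition text served at /_status/vars with PROMETHEUS_PATTERN preprocessing steps, sometimes followed by a MULTIPLIER (e.g. 0.000000001 to convert nanoseconds to seconds). A simplified sketch of that extraction against an invented two-line scrape:

```python
import re

# Invented sample of the /_status/vars Prometheus exposition format.
scrape = """\
sys_cpu_sys_ns 2.5e9
liveness_livenodes 3
"""

def prometheus_pattern(text: str, metric: str) -> float:
    """Roughly what a PROMETHEUS_PATTERN 'metric : value' step does: grab the sample value."""
    m = re.search(rf"^{metric}(?:\{{[^}}]*\}})?\s+(\S+)$", text, re.M)
    if m is None:
        raise ValueError(f"metric {metric} not found")
    return float(m.group(1))

seconds = prometheus_pattern(scrape, "sys_cpu_sys_ns") * 1e-9  # MULTIPLIER step
print(seconds)  # about 2.5 s of system CPU time
```

The real preprocessing also supports label filters and aggregation, which this sketch omits.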
Name | Description | Default |
---|---|---|
{$COCKROACHDB.API.PORT} | The port of CockroachDB API and Prometheus endpoint. |
8080 |
{$COCKROACHDB.API.SCHEME} | Request scheme which may be http or https. |
http |
{$COCKROACHDB.CERT.CA.EXPIRY.WARN} | Number of days until the CA certificate expires. |
90 |
{$COCKROACHDB.CERT.NODE.EXPIRY.WARN} | Number of days until the node certificate expires. |
30 |
{$COCKROACHDB.CLOCK.OFFSET.MAX.WARN} | Maximum clock offset of the node against the rest of the cluster in milliseconds for trigger expression. |
300 |
{$COCKROACHDB.OPEN.FDS.MAX.WARN} | Maximum percentage of used file descriptors. |
80 |
{$COCKROACHDB.STATEMENTS.ERRORS.MAX.WARN} | Maximum number of SQL statements errors for trigger expression. |
2 |
{$COCKROACHDB.STORE.USED.MIN.CRIT} | The critical threshold of the available disk space in percent. |
10 |
{$COCKROACHDB.STORE.USED.MIN.WARN} | The warning threshold of the available disk space in percent. |
20 |
There are no template links in this template.
Name | Description | Type | Key and additional info |
---|---|---|---|
Storage metrics discovery | Discover per store metrics. |
DEPENDENT | cockroachdb.store.discovery Preprocessing: - PROMETHEUS_TO_JSON: - DISCARD_UNCHANGED_HEARTBEAT: |
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
CockroachDB | CockroachDB: Service ping | Check if HTTP/HTTPS service accepts TCP connections. |
SIMPLE | net.tcp.service["{$COCKROACHDB.API.SCHEME}","{HOST.CONN}","{$COCKROACHDB.API.PORT}"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
CockroachDB | CockroachDB: Clock offset | Mean clock offset of the node against the rest of the cluster. |
DEPENDENT | cockroachdb.clock.offset Preprocessing: - PROMETHEUS_PATTERN: |
CockroachDB | CockroachDB: Version | Build information. |
DEPENDENT | cockroachdb.version Preprocessing: - PROMETHEUS_PATTERN: - DISCARD_UNCHANGED_HEARTBEAT: |
CockroachDB | CockroachDB: CPU: System time | System CPU time. |
DEPENDENT | cockroachdb.cpu.system_time Preprocessing: - PROMETHEUS_PATTERN: sys_cpu_sys_ns - CHANGE_PER_SECOND - MULTIPLIER: 0.000000001 |
CockroachDB | CockroachDB: CPU: User time | User CPU time. |
DEPENDENT | cockroachdb.cpu.user_time Preprocessing: - PROMETHEUS_PATTERN: sys_cpu_user_ns - CHANGE_PER_SECOND - MULTIPLIER: 0.000000001 |
CockroachDB | CockroachDB: CPU: Utilization | CPU utilization in %. |
DEPENDENT | cockroachdb.cpu.util Preprocessing: - PROMETHEUS_PATTERN: |
CockroachDB | CockroachDB: Disk: IOPS in progress, rate | Number of disk IO operations currently in progress on this host. |
DEPENDENT | cockroachdb.disk.iops.in_progress.rate Preprocessing: - PROMETHEUS_PATTERN: sys_host_disk_iopsinprogress - CHANGE_PER_SECOND |
CockroachDB | CockroachDB: Disk: Reads, rate | Bytes read from all disks per second since this process started. |
DEPENDENT | cockroachdb.disk.read.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
CockroachDB | CockroachDB: Disk: Read IOPS, rate | Number of disk read operations per second across all disks since this process started. |
DEPENDENT | cockroachdb.disk.iops.read.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
CockroachDB | CockroachDB: Disk: Writes, rate | Bytes written to all disks per second since this process started. |
DEPENDENT | cockroachdb.disk.write.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
CockroachDB | CockroachDB: Disk: Write IOPS, rate | Disk write operations per second across all disks since this process started. |
DEPENDENT | cockroachdb.disk.iops.write.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
CockroachDB | CockroachDB: File descriptors: Limit | Open file descriptors soft limit of the process. |
DEPENDENT | cockroachdb.descriptors.limit Preprocessing: - PROMETHEUS_PATTERN: |
CockroachDB | CockroachDB: File descriptors: Open | The number of open file descriptors. |
DEPENDENT | cockroachdb.descriptors.open Preprocessing: - PROMETHEUS_PATTERN: |
CockroachDB | CockroachDB: GC: Pause time | The amount of processor time used by Go's garbage collector across all nodes. During garbage collection, application code execution is paused. |
DEPENDENT | cockroachdb.gc.pause_time Preprocessing: - PROMETHEUS_PATTERN: sys_gc_pause_ns - CHANGE_PER_SECOND - MULTIPLIER: 0.000000001 |
CockroachDB | CockroachDB: GC: Runs, rate | The number of times that Go's garbage collector was invoked per second across all nodes. |
DEPENDENT | cockroachdb.gc.runs.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
CockroachDB | CockroachDB: Go: Goroutines count | Current number of Goroutines. This count should rise and fall based on load. |
DEPENDENT | cockroachdb.go.goroutines.count Preprocessing: - PROMETHEUS_PATTERN: |
CockroachDB | CockroachDB: KV transactions: Aborted, rate | Number of aborted KV transactions per second. |
DEPENDENT | cockroachdb.kv.transactions.aborted.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
CockroachDB | CockroachDB: KV transactions: Committed, rate | Number of KV transactions (including 1PC) committed per second. |
DEPENDENT | cockroachdb.kv.transactions.committed.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
CockroachDB | CockroachDB: Live nodes count | The number of live nodes in the cluster (will be 0 if this node is not itself live). |
DEPENDENT | cockroachdb.live_count Preprocessing: - PROMETHEUS_PATTERN: liveness_livenodes - DISCARD_UNCHANGED_HEARTBEAT: 3h |
CockroachDB | CockroachDB: Liveness heartbeats, rate | Number of successful node liveness heartbeats per second from this node. |
DEPENDENT | cockroachdb.heartbeaths.success.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
CockroachDB | CockroachDB: Memory: Allocated by Cgo | Current bytes of memory allocated by the C layer. |
DEPENDENT | cockroachdb.memory.cgo.allocated Preprocessing: - PROMETHEUS_PATTERN: |
CockroachDB | CockroachDB: Memory: Allocated by Go | Current bytes of memory allocated by the Go layer. |
DEPENDENT | cockroachdb.memory.go.allocated Preprocessing: - PROMETHEUS_PATTERN: |
CockroachDB | CockroachDB: Memory: Managed by Cgo | Total bytes of memory managed by the C layer. |
DEPENDENT | cockroachdb.memory.cgo.managed Preprocessing: - PROMETHEUS_PATTERN: |
CockroachDB | CockroachDB: Memory: Managed by Go | Total bytes of memory managed by the Go layer. |
DEPENDENT | cockroachdb.memory.go.managed Preprocessing: - PROMETHEUS_PATTERN: |
CockroachDB | CockroachDB: Memory: Total usage | Resident set size (RSS) of memory in use by the node. |
DEPENDENT | cockroachdb.memory.total Preprocessing: - PROMETHEUS_PATTERN: |
CockroachDB | CockroachDB: Network: Bytes received, rate | Bytes received per second on all network interfaces since this process started. |
DEPENDENT | cockroachdb.network.bytes.received.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
CockroachDB | CockroachDB: Network: Bytes sent, rate | Bytes sent per second on all network interfaces since this process started. |
DEPENDENT | cockroachdb.network.bytes.sent.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
CockroachDB | CockroachDB: Time series: Sample errors, rate | The number of errors encountered while attempting to write metrics to disk, per second. |
DEPENDENT | cockroachdb.ts.samples.errors.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
CockroachDB | CockroachDB: Time series: Samples written, rate | The number of successfully written metric samples per second. |
DEPENDENT | cockroachdb.ts.samples.written.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
CockroachDB | CockroachDB: Slow requests: DistSender RPCs | Number of RPCs stuck or retrying for a long time. |
DEPENDENT | cockroachdb.slow_requests.rpc Preprocessing: - PROMETHEUS_PATTERN: requests_slow_distsender |
CockroachDB | CockroachDB: SQL: Bytes received, rate | Total amount of incoming SQL client network traffic in bytes per second. |
DEPENDENT | cockroachdb.sql.bytes.received.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
CockroachDB | CockroachDB: SQL: Bytes sent, rate | Total amount of outgoing SQL client network traffic in bytes per second. |
DEPENDENT | cockroachdb.sql.bytes.sent.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
CockroachDB | CockroachDB: Memory: Allocated by SQL | Current SQL statement memory usage for root. |
DEPENDENT | cockroachdb.memory.sql Preprocessing: - PROMETHEUS_PATTERN: |
CockroachDB | CockroachDB: SQL: Schema changes, rate | Total number of SQL DDL statements successfully executed per second. |
DEPENDENT | cockroachdb.sql.schema_changes.rate Preprocessing: - PROMETHEUS_PATTERN: sql_ddl_count - CHANGE_PER_SECOND |
CockroachDB | CockroachDB: SQL sessions: Open | Total number of open SQL sessions. |
DEPENDENT | cockroachdb.sql.sessions Preprocessing: - PROMETHEUS_PATTERN: |
CockroachDB | CockroachDB: SQL statements: Active | Total number of SQL statements currently active. |
DEPENDENT | cockroachdb.sql.statements.active Preprocessing: - PROMETHEUS_PATTERN: |
CockroachDB | CockroachDB: SQL statements: DELETE, rate | A moving average of the number of DELETE statements successfully executed per second. |
DEPENDENT | cockroachdb.sql.statements.delete.rate Preprocessing: - PROMETHEUSPATTERN: - CHANGEPER_SECOND |
CockroachDB | CockroachDB: SQL statements: Executed, rate | Number of SQL queries executed per second. |
DEPENDENT | cockroachdb.sql.statements.executed.rate Preprocessing: - PROMETHEUSPATTERN: - CHANGEPER_SECOND |
CockroachDB | CockroachDB: SQL statements: Denials, rate | The number of statements denied per second by a feature flag. |
DEPENDENT | cockroachdb.sql.statements.denials.rate Preprocessing: - PROMETHEUSPATTERN: - CHANGEPER_SECOND |
CockroachDB | CockroachDB: SQL statements: Active flows distributed, rate | The number of distributed SQL flows currently active per second. |
DEPENDENT | cockroachdb.sql.statements.flows.active.rate Preprocessing: - PROMETHEUSPATTERN: - CHANGEPER_SECOND |
CockroachDB | CockroachDB: SQL statements: INSERT, rate | A moving average of the number of INSERT statements successfully executed per second. |
DEPENDENT | cockroachdb.sql.statements.insert.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
CockroachDB | CockroachDB: SQL statements: SELECT, rate | A moving average of the number of SELECT statements successfully executed per second. |
DEPENDENT | cockroachdb.sql.statements.select.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
CockroachDB | CockroachDB: SQL statements: UPDATE, rate | A moving average of the number of UPDATE statements successfully executed per second. |
DEPENDENT | cockroachdb.sql.statements.update.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
CockroachDB | CockroachDB: SQL statements: Contention, rate | Total number of SQL statements that experienced contention per second. |
DEPENDENT | cockroachdb.sql.statements.contention.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
CockroachDB | CockroachDB: SQL statements: Errors, rate | Total number of statements which returned a planning or runtime error per second. |
DEPENDENT | cockroachdb.sql.statements.errors.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
CockroachDB | CockroachDB: SQL transactions: Open | Total number of currently open SQL transactions. |
DEPENDENT | cockroachdb.sql.transactions.open Preprocessing: - PROMETHEUS_PATTERN: |
CockroachDB | CockroachDB: SQL transactions: Aborted, rate | Total number of SQL transaction abort errors per second. |
DEPENDENT | cockroachdb.sql.transactions.aborted.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
CockroachDB | CockroachDB: SQL transactions: Committed, rate | Total number of SQL transaction COMMIT statements successfully executed per second. |
DEPENDENT | cockroachdb.sql.transactions.committed.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
CockroachDB | CockroachDB: SQL transactions: Initiated, rate | Total number of SQL transaction BEGIN statements successfully executed per second. |
DEPENDENT | cockroachdb.sql.transactions.initiated.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
CockroachDB | CockroachDB: SQL transactions: Rolled back, rate | Total number of SQL transaction ROLLBACK statements successfully executed per second. |
DEPENDENT | cockroachdb.sql.transactions.rollbacks.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
CockroachDB | CockroachDB: Uptime | Process uptime. |
DEPENDENT | cockroachdb.uptime Preprocessing: - PROMETHEUS_PATTERN: |
CockroachDB | CockroachDB: Node certificate expiration date | Node certificate expires at that date. |
DEPENDENT | cockroachdb.cert.expire_date.node Preprocessing: - PROMETHEUS_PATTERN: `security_certificate_expiration_node`: `value` ⛔️ON_FAIL: DISCARD_VALUE - DISCARD_UNCHANGED_HEARTBEAT: 6h |
CockroachDB | CockroachDB: CA certificate expiration date | CA certificate expires at that date. |
DEPENDENT | cockroachdb.cert.expire_date.ca Preprocessing: - PROMETHEUS_PATTERN: `security_certificate_expiration_ca`: `value` ⛔️ON_FAIL: DISCARD_VALUE - DISCARD_UNCHANGED_HEARTBEAT: 6h |
CockroachDB | CockroachDB: Storage [{#STORE}]: Bytes: Live | Number of logical bytes stored in live key-value pairs on this node. Live data excludes historical and deleted data. |
DEPENDENT | cockroachdb.storage.bytes.[{#STORE},live] Preprocessing: - PROMETHEUS_PATTERN: |
CockroachDB | CockroachDB: Storage [{#STORE}]: Bytes: System | Number of physical bytes stored in system key-value pairs. |
DEPENDENT | cockroachdb.storage.bytes.[{#STORE},system] Preprocessing: - PROMETHEUS_PATTERN: |
CockroachDB | CockroachDB: Storage [{#STORE}]: Capacity available | Available storage capacity. |
DEPENDENT | cockroachdb.storage.capacity.[{#STORE},available] Preprocessing: - PROMETHEUS_PATTERN: |
CockroachDB | CockroachDB: Storage [{#STORE}]: Capacity total | Total storage capacity. This value may be explicitly set using --store. If a store size has not been set, this metric displays the actual disk capacity. |
DEPENDENT | cockroachdb.storage.capacity.[{#STORE},total] Preprocessing: - PROMETHEUS_PATTERN: |
CockroachDB | CockroachDB: Storage [{#STORE}]: Capacity used | Disk space in use by CockroachDB data on this node. This excludes the Cockroach binary, operating system, and other system files. |
DEPENDENT | cockroachdb.storage.capacity.[{#STORE},used] Preprocessing: - PROMETHEUS_PATTERN: |
CockroachDB | CockroachDB: Storage [{#STORE}]: Capacity available in % | Available storage capacity in %. |
CALCULATED | cockroachdb.storage.capacity.[{#STORE},available_percent] Expression: last(//cockroachdb.storage.capacity.[{#STORE},available]) / last(//cockroachdb.storage.capacity.[{#STORE},total]) * 100 |
CockroachDB | CockroachDB: Storage [{#STORE}]: Replication: Lease holders | Number of lease holders. |
DEPENDENT | cockroachdb.replication.[{#STORE},leaseholders] Preprocessing: - PROMETHEUS_PATTERN: `replicas_leaseholders{store="{#STORE}"}`: `value` |
CockroachDB | CockroachDB: Storage [{#STORE}]: Bytes: Logical | Number of logical bytes stored in key-value pairs on this node. This includes historical and deleted data. |
DEPENDENT | cockroachdb.storage.bytes.[{#STORE},logical] Preprocessing: - PROMETHEUS_PATTERN: |
CockroachDB | CockroachDB: Storage [{#STORE}]: Rebalancing: Average queries, rate | Number of kv-level requests received per second by the store, averaged over a large time period as used in rebalancing decisions. |
DEPENDENT | cockroachdb.rebalancing.queries.average.[{#STORE},rate] Preprocessing: - PROMETHEUS_PATTERN: |
CockroachDB | CockroachDB: Storage [{#STORE}]: Rebalancing: Average writes, rate | Number of keys written (i.e. applied by raft) per second to the store, averaged over a large time period as used in rebalancing decisions. |
DEPENDENT | cockroachdb.rebalancing.writes.average.[{#STORE},rate] Preprocessing: - PROMETHEUS_PATTERN: |
CockroachDB | CockroachDB: Storage [{#STORE}]: Queue processing failures: Consistency, rate | Number of replicas which failed processing in the consistency checker queue per second. |
DEPENDENT | cockroachdb.queue.processing_failures.consistency.[{#STORE},rate] Preprocessing: - PROMETHEUS_PATTERN: `queue_consistency_process_failure{store="{#STORE}"}`: `value` - CHANGE_PER_SECOND |
CockroachDB | CockroachDB: Storage [{#STORE}]: Queue processing failures: GC, rate | Number of replicas which failed processing in the GC queue per second. |
DEPENDENT | cockroachdb.queue.processing_failures.gc.[{#STORE},rate] Preprocessing: - PROMETHEUS_PATTERN: `queue_gc_process_failure{store="{#STORE}"}`: `value` - CHANGE_PER_SECOND |
CockroachDB | CockroachDB: Storage [{#STORE}]: Queue processing failures: Raft log, rate | Number of replicas which failed processing in the Raft log queue per second. |
DEPENDENT | cockroachdb.queue.processing_failures.raftlog.[{#STORE},rate] Preprocessing: - PROMETHEUS_PATTERN: `queue_raftlog_process_failure{store="{#STORE}"}`: `value` - CHANGE_PER_SECOND |
CockroachDB | CockroachDB: Storage [{#STORE}]: Queue processing failures: Raft snapshot, rate | Number of replicas which failed processing in the Raft repair queue per second. |
DEPENDENT | cockroachdb.queue.processing_failures.raftsnapshot.[{#STORE},rate] Preprocessing: - PROMETHEUS_PATTERN: `queue_raftsnapshot_process_failure{store="{#STORE}"}`: `value` - CHANGE_PER_SECOND |
CockroachDB | CockroachDB: Storage [{#STORE}]: Queue processing failures: Replica GC, rate | Number of replicas which failed processing in the replica GC queue per second. |
DEPENDENT | cockroachdb.queue.processing_failures.gcreplica.[{#STORE},rate] Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
CockroachDB | CockroachDB: Storage [{#STORE}]: Queue processing failures: Replicate, rate | Number of replicas which failed processing in the replicate queue per second. |
DEPENDENT | cockroachdb.queue.processing_failures.replicate.[{#STORE},rate] Preprocessing: - PROMETHEUS_PATTERN: `queue_replicate_process_failure{store="{#STORE}"}`: `value` - CHANGE_PER_SECOND |
CockroachDB | CockroachDB: Storage [{#STORE}]: Queue processing failures: Split, rate | Number of replicas which failed processing in the split queue per second. |
DEPENDENT | cockroachdb.queue.processing_failures.split.[{#STORE},rate] Preprocessing: - PROMETHEUS_PATTERN: `queue_split_process_failure{store="{#STORE}"}`: `value` - CHANGE_PER_SECOND |
CockroachDB | CockroachDB: Storage [{#STORE}]: Queue processing failures: Time series maintenance, rate | Number of replicas which failed processing in the time series maintenance queue per second. |
DEPENDENT | cockroachdb.queue.processing_failures.tsmaintenance.[{#STORE},rate] Preprocessing: - PROMETHEUS_PATTERN: `queue_tsmaintenance_process_failure{store="{#STORE}"}`: `value` - CHANGE_PER_SECOND |
CockroachDB | CockroachDB: Storage [{#STORE}]: Ranges count | Number of ranges. |
DEPENDENT | cockroachdb.ranges.[{#STORE},count] Preprocessing: - PROMETHEUS_PATTERN: |
CockroachDB | CockroachDB: Storage [{#STORE}]: Ranges unavailable | Number of ranges with fewer live replicas than needed for quorum. |
DEPENDENT | cockroachdb.ranges.[{#STORE},unavailable] Preprocessing: - PROMETHEUS_PATTERN: |
CockroachDB | CockroachDB: Storage [{#STORE}]: Ranges underreplicated | Number of ranges with fewer live replicas than the replication target. |
DEPENDENT | cockroachdb.ranges.[{#STORE},underreplicated] Preprocessing: - PROMETHEUS_PATTERN: |
CockroachDB | CockroachDB: Storage [{#STORE}]: RocksDB read amplification | The average number of real read operations executed per logical read operation. |
DEPENDENT | cockroachdb.rocksdb.[{#STORE},readamp] Preprocessing: - PROMETHEUS_PATTERN: `rocksdb_read_amplification{store="{#STORE}"}`: `value` |
CockroachDB | CockroachDB: Storage [{#STORE}]: RocksDB cache hits, rate | Count of block cache hits per second. |
DEPENDENT | cockroachdb.rocksdb.cache.hits.[{#STORE},rate] Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
CockroachDB | CockroachDB: Storage [{#STORE}]: RocksDB cache misses, rate | Count of block cache misses per second. |
DEPENDENT | cockroachdb.rocksdb.cache.misses.[{#STORE},rate] Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
CockroachDB | CockroachDB: Storage [{#STORE}]: RocksDB cache hit ratio | Block cache hit ratio in %. |
CALCULATED | cockroachdb.rocksdb.cache.[{#STORE},hit_ratio] Expression: last(//cockroachdb.rocksdb.cache.hits.[{#STORE},rate]) / (last(//cockroachdb.rocksdb.cache.hits.[{#STORE},rate]) + last(//cockroachdb.rocksdb.cache.misses.[{#STORE},rate])) * 100 |
CockroachDB | CockroachDB: Storage [{#STORE}]: Replication: Replicas | Number of replicas. |
DEPENDENT | cockroachdb.replication.replicas.[{#STORE},count] Preprocessing: - PROMETHEUS_PATTERN: |
CockroachDB | CockroachDB: Storage [{#STORE}]: Replication: Replicas quiesced | Number of quiesced replicas. |
DEPENDENT | cockroachdb.replication.replicas.[{#STORE},quiesced] Preprocessing: - PROMETHEUS_PATTERN: |
CockroachDB | CockroachDB: Storage [{#STORE}]: Slow requests: Latch acquisitions | Number of requests that have been stuck for a long time acquiring latches. |
DEPENDENT | cockroachdb.slowrequests.[{#STORE},latchacquisitions] Preprocessing: - PROMETHEUS_PATTERN: |
CockroachDB | CockroachDB: Storage [{#STORE}]: Slow requests: Lease acquisitions | Number of requests that have been stuck for a long time acquiring a lease. |
DEPENDENT | cockroachdb.slowrequests.[{#STORE},leaseacquisitions] Preprocessing: - PROMETHEUS_PATTERN: |
CockroachDB | CockroachDB: Storage [{#STORE}]: Slow requests: Raft proposals | Number of requests that have been stuck for a long time in raft. |
DEPENDENT | cockroachdb.slowrequests.[{#STORE},raftproposals] Preprocessing: - PROMETHEUS_PATTERN: |
CockroachDB | CockroachDB: Storage [{#STORE}]: RocksDB SSTables | The number of SSTables in use. |
DEPENDENT | cockroachdb.rocksdb.[{#STORE},sstables] Preprocessing: - PROMETHEUS_PATTERN: |
Zabbix raw items | CockroachDB: Get metrics | Get raw metrics from the Prometheus endpoint. |
HTTP_AGENT | cockroachdb.get_metrics Preprocessing: - CHECK_NOT_SUPPORTED ⛔️ON_FAIL: DISCARD_VALUE |
Zabbix raw items | CockroachDB: Get health | Get node /health endpoint. |
HTTP_AGENT | cockroachdb.get_health Preprocessing: - CHECK_NOT_SUPPORTED ⛔️ON_FAIL: DISCARD_VALUE - REGEX: - DISCARD_UNCHANGED_HEARTBEAT: |
Zabbix raw items | CockroachDB: Get readiness | Get node /health?ready=1 endpoint. |
HTTP_AGENT | cockroachdb.get_readiness Preprocessing: - CHECK_NOT_SUPPORTED ⛔️ON_FAIL: DISCARD_VALUE - REGEX: - DISCARD_UNCHANGED_HEARTBEAT: |
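Nearly all items above are dependent items that apply PROMETHEUS_PATTERN preprocessing to the raw text fetched by the master "Get metrics" HTTP agent item. As a rough illustration of what that step does, here is a minimal Python sketch that pulls one sample out of a Prometheus exposition payload; the metric names follow the patterns above, but the payload and its values are made up:

```python
# Shortened, made-up Prometheus exposition text of the kind CockroachDB
# serves on its metrics endpoint (a real payload has thousands of samples).
METRICS = """\
sql_bytes_sent 123456
replicas_leaseholders{store="1"} 7
rocksdb_read_amplification{store="1"} 3
"""

def prometheus_pattern(text, pattern):
    """Return the value of the first sample whose name{labels} part matches
    `pattern` exactly, loosely mimicking Zabbix PROMETHEUS_PATTERN with the
    `value` output option."""
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blanks and HELP/TYPE comment lines
        name_labels, _, value = line.rpartition(" ")
        if name_labels == pattern:
            return float(value)
    return None  # in Zabbix, this is where ⛔️ON_FAIL handling would apply

print(prometheus_pattern(METRICS, 'replicas_leaseholders{store="1"}'))  # 7.0
```

The real preprocessing also supports label-based filtering and aggregation; this sketch only covers the exact-match case used by the items above.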
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
CockroachDB: Service is down | - |
last(/CockroachDB by HTTP/net.tcp.service["{$COCKROACHDB.API.SCHEME}","{HOST.CONN}","{$COCKROACHDB.API.PORT}"]) = 0 |
AVERAGE | |
CockroachDB: Clock offset is too high | Cockroach-measured clock offset is nearing limit (by default, servers kill themselves at 400ms from the mean). |
min(/CockroachDB by HTTP/cockroachdb.clock.offset,5m) > {$COCKROACHDB.CLOCK.OFFSET.MAX.WARN} * 0.001 |
WARNING | |
CockroachDB: Version has changed | - |
last(/CockroachDB by HTTP/cockroachdb.version) <> last(/CockroachDB by HTTP/cockroachdb.version,#2) and length(last(/CockroachDB by HTTP/cockroachdb.version)) > 0 |
INFO | |
CockroachDB: Current number of open files is too high | Getting close to open file descriptor limit. |
min(/CockroachDB by HTTP/cockroachdb.descriptors.open,10m) / last(/CockroachDB by HTTP/cockroachdb.descriptors.limit) * 100 > {$COCKROACHDB.OPEN.FDS.MAX.WARN} |
WARNING | |
CockroachDB: Node is not executing SQL | Node is not executing SQL despite having connections. |
last(/CockroachDB by HTTP/cockroachdb.sql.sessions) > 0 and last(/CockroachDB by HTTP/cockroachdb.sql.statements.executed.rate) = 0 |
WARNING | |
CockroachDB: SQL statements errors rate is too high | - |
min(/CockroachDB by HTTP/cockroachdb.sql.statements.errors.rate,5m) > {$COCKROACHDB.STATEMENTS.ERRORS.MAX.WARN} |
WARNING | |
CockroachDB: Node has been restarted | Uptime is less than 10 minutes. |
last(/CockroachDB by HTTP/cockroachdb.uptime) < 10m |
INFO | |
CockroachDB: Failed to fetch node data | Zabbix has not received data for items for the last 5 minutes. |
nodata(/CockroachDB by HTTP/cockroachdb.uptime,5m) = 1 |
WARNING | Depends on: - CockroachDB: Service is down |
CockroachDB: Node certificate expires soon | Node certificate expires soon. |
(last(/CockroachDB by HTTP/cockroachdb.cert.expire_date.node) - now()) / 86400 < {$COCKROACHDB.CERT.NODE.EXPIRY.WARN} |
WARNING | |
CockroachDB: CA certificate expires soon | CA certificate expires soon. |
(last(/CockroachDB by HTTP/cockroachdb.cert.expire_date.ca) - now()) / 86400 < {$COCKROACHDB.CERT.CA.EXPIRY.WARN} |
WARNING | |
CockroachDB: Storage [{#STORE}]: Available storage capacity is low | Storage is running low on free space (less than {$COCKROACHDB.STORE.USED.MIN.WARN}% available). |
max(/CockroachDB by HTTP/cockroachdb.storage.capacity.[{#STORE},available_percent],5m) < {$COCKROACHDB.STORE.USED.MIN.WARN} Recovery expression: min(/CockroachDB by HTTP/cockroachdb.storage.capacity.[{#STORE},available_percent],5m) > {$COCKROACHDB.STORE.USED.MIN.WARN} |
WARNING | Depends on: - CockroachDB: Storage [{#STORE}]: Available storage capacity is critically low |
CockroachDB: Storage [{#STORE}]: Available storage capacity is critically low | Storage is running critically low on free space (less than {$COCKROACHDB.STORE.USED.MIN.CRIT}% available). |
max(/CockroachDB by HTTP/cockroachdb.storage.capacity.[{#STORE},available_percent],5m) < {$COCKROACHDB.STORE.USED.MIN.CRIT} Recovery expression: min(/CockroachDB by HTTP/cockroachdb.storage.capacity.[{#STORE},available_percent],5m) > {$COCKROACHDB.STORE.USED.MIN.CRIT} |
AVERAGE | |
CockroachDB: Node is unhealthy | Node's /health endpoint has returned HTTP 500 Internal Server Error, which indicates that the node is unhealthy. |
last(/CockroachDB by HTTP/cockroachdb.get_health) = 500 |
AVERAGE | Depends on: - CockroachDB: Service is down |
CockroachDB: Node is not ready | Node's /health?ready=1 endpoint has returned HTTP 503 Service Unavailable. Possible reasons: - node is in the wait phase of the node shutdown sequence; - node is unable to communicate with a majority of the other nodes in the cluster, likely because the cluster is unavailable due to too many nodes being down. |
last(/CockroachDB by HTTP/cockroachdb.get_readiness) = 503 and last(/CockroachDB by HTTP/cockroachdb.uptime) > 5m |
AVERAGE | Depends on: - CockroachDB: Service is down |
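The two certificate triggers above share one shape: take the expiration timestamp collected from the metrics, subtract the current time, convert to days, and compare against the macro. A minimal sketch of that arithmetic (the timestamps and threshold here are hypothetical):

```python
def cert_expires_soon(expire_ts, warn_days, now_ts):
    """Mirror the trigger expression
    (last(cert.expire_date) - now()) / 86400 < {$COCKROACHDB.CERT.*.EXPIRY.WARN}."""
    days_left = (expire_ts - now_ts) / 86400
    return days_left < warn_days

NOW = 1_700_000_000  # hypothetical "current" unix time
print(cert_expires_soon(NOW + 20 * 86400, warn_days=30, now_ts=NOW))  # True: 20 days left
print(cert_expires_soon(NOW + 90 * 86400, warn_days=30, now_ts=NOW))  # False: 90 days left
```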
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template or ask for help with it at ZABBIX forums.
For Zabbix version: 6.2 and higher
The template to monitor ClickHouse by Zabbix that works without any external scripts.
Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
This template was tested on:
See Zabbix template operation for basic instructions.
Create a user to monitor the service:
Create the file /etc/clickhouse-server/users.d/zabbix.xml with the following content:
<yandex>
    <users>
        <zabbix>
            <password>zabbix_pass</password>
            <networks incl="networks" />
            <profile>web</profile>
            <quota>default</quota>
            <allow_databases>
                <database>test</database>
            </allow_databases>
        </zabbix>
    </users>
</yandex>
Login and password are also set in macros:
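Before pointing the template at the server, the new user can be checked against ClickHouse's HTTP interface, which accepts `user` and `password` as URL parameters. A small Python sketch that assembles such a probe URL from the macro values; the host is an assumption for a local default installation:

```python
from urllib.parse import urlencode

scheme = "http"           # {$CLICKHOUSE.SCHEME}
host = "localhost"        # assumed local installation
port = 8123               # {$CLICKHOUSE.PORT}
user = "zabbix"           # {$CLICKHOUSE.USER}
password = "zabbix_pass"  # {$CLICKHOUSE.PASSWORD}

params = urlencode({"user": user, "password": password, "query": "SELECT 1"})
url = f"{scheme}://{host}:{port}/?{params}"
print(url)  # feed this to curl or a browser; a healthy server answers "1"
```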
No specific Zabbix configuration is required.
Name | Description | Default |
---|---|---|
{$CLICKHOUSE.DELAYED.FILES.DISTRIBUTED.COUNT.MAX.WARN} | Maximum size of distributed files queue to insert for trigger expression. |
600 |
{$CLICKHOUSE.DELAYED.INSERTS.MAX.WARN} | Maximum number of delayed inserts for trigger expression. |
0 |
{$CLICKHOUSE.LLD.FILTER.DB.MATCHES} | Filter of discoverable databases |
.* |
{$CLICKHOUSE.LLD.FILTER.DB.NOT_MATCHES} | Filter to exclude discovered databases |
CHANGE_IF_NEEDED |
{$CLICKHOUSE.LLD.FILTER.DICT.MATCHES} | Filter of discoverable dictionaries |
.* |
{$CLICKHOUSE.LLD.FILTER.DICT.NOT_MATCHES} | Filter to exclude discovered dictionaries |
CHANGE_IF_NEEDED |
{$CLICKHOUSE.LOG_POSITION.DIFF.MAX.WARN} | Maximum diff between log_pointer and log_max_index. |
30 |
{$CLICKHOUSE.NETWORK.ERRORS.MAX.WARN} | Maximum number of network errors for trigger expression. |
5 |
{$CLICKHOUSE.PARTS.PER.PARTITION.WARN} | Maximum number of parts per partition for trigger expression. |
300 |
{$CLICKHOUSE.PASSWORD} | - |
zabbix_pass |
{$CLICKHOUSE.PORT} | The port of ClickHouse HTTP endpoint |
8123 |
{$CLICKHOUSE.QUERY_TIME.MAX.WARN} | Maximum ClickHouse query time in seconds for trigger expression |
600 |
{$CLICKHOUSE.QUEUE.SIZE.MAX.WARN} | Maximum size of the queue for operations waiting to be performed for trigger expression. |
20 |
{$CLICKHOUSE.REPLICA.MAX.WARN} | Replication lag across all tables for trigger expression. |
600 |
{$CLICKHOUSE.SCHEME} | Request scheme which may be http or https |
http |
{$CLICKHOUSE.USER} | - |
zabbix |
There are no template links in this template.
Name | Description | Type | Key and additional info |
---|---|---|---|
Dictionaries | Info about dictionaries |
DEPENDENT | clickhouse.dictionaries.discovery Filter: AND - {#NAME} MATCHES_REGEX - {#NAME} NOT_MATCHES_REGEX |
Replicas | Info about replicas |
DEPENDENT | clickhouse.replicas.discovery Filter: AND - {#DB} MATCHES_REGEX - {#DB} NOT_MATCHES_REGEX |
Tables | Info about tables |
DEPENDENT | clickhouse.tables.discovery Filter: AND - {#DB} MATCHES_REGEX - {#DB} NOT_MATCHES_REGEX |
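The discovery rules, and most of the items below, are dependent items: one master HTTP agent item queries ClickHouse system tables (queries run with FORMAT JSON return a {"data": [...]} envelope), and each dependent item extracts its value with a JSONPATH preprocessing step. A toy Python illustration of that extraction; the table names and figures here are invented:

```python
import json

# Made-up excerpt of what a query like
# "SELECT ... FROM system.tables FORMAT JSON" returns.
RAW = json.loads("""
{"data": [
  {"database": "test", "table": "events", "bytes": 1048576, "parts": 4, "rows": 20000},
  {"database": "test", "table": "users",  "bytes": 65536,   "parts": 1, "rows": 300}
]}
""")

def pick(doc, table, field):
    """Toy stand-in for a Zabbix JSONPATH step such as
    $.data[?(@.table == 'events')].bytes.first()"""
    for row in doc["data"]:
        if row["table"] == table:
            return row[field]
    return None  # in Zabbix, this is where ⛔️ON_FAIL handling kicks in

print(pick(RAW, "events", "bytes"))  # 1048576
```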
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
ClickHouse | ClickHouse: Longest currently running query time | Get longest running query. |
HTTP_AGENT | clickhouse.process.elapsed |
ClickHouse | ClickHouse: Check port availability | - |
SIMPLE | net.tcp.service[{$CLICKHOUSE.SCHEME},"{HOST.CONN}","{$CLICKHOUSE.PORT}"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
ClickHouse | ClickHouse: Ping | - |
HTTP_AGENT | clickhouse.ping Preprocessing: - REGEX: ⛔️ON_FAIL: - DISCARD_UNCHANGED_HEARTBEAT: |
ClickHouse | ClickHouse: Version | Version of the server |
HTTP_AGENT | clickhouse.version Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
ClickHouse | ClickHouse: Revision | Revision of the server. |
DEPENDENT | clickhouse.revision Preprocessing: - JSONPATH: |
ClickHouse | ClickHouse: Uptime | Number of seconds since ClickHouse server start |
DEPENDENT | clickhouse.uptime Preprocessing: - JSONPATH: |
ClickHouse | ClickHouse: New queries per second | Number of queries to be interpreted and potentially executed. Does not include queries that failed to parse or were rejected due to AST size limits, quota limits or limits on the number of simultaneously running queries. May include internal queries initiated by ClickHouse itself. Does not count subqueries. |
DEPENDENT | clickhouse.query.rate Preprocessing: - JSONPATH: ⛔️ON_FAIL: - CHANGE_PER_SECOND |
ClickHouse | ClickHouse: New SELECT queries per second | Number of SELECT queries to be interpreted and potentially executed. Does not include queries that failed to parse or were rejected due to AST size limits, quota limits or limits on the number of simultaneously running queries. May include internal queries initiated by ClickHouse itself. Does not count subqueries. |
DEPENDENT | clickhouse.select_query.rate Preprocessing: - JSONPATH: ⛔️ON_FAIL: CUSTOM_VALUE -> 0 - CHANGE_PER_SECOND |
ClickHouse | ClickHouse: New INSERT queries per second | Number of INSERT queries to be interpreted and potentially executed. Does not include queries that failed to parse or were rejected due to AST size limits, quota limits or limits on the number of simultaneously running queries. May include internal queries initiated by ClickHouse itself. Does not count subqueries. |
DEPENDENT | clickhouse.insert_query.rate Preprocessing: - JSONPATH: ⛔️ON_FAIL: CUSTOM_VALUE -> 0 - CHANGE_PER_SECOND |
ClickHouse | ClickHouse: Delayed insert queries | "Number of INSERT queries that are throttled due to high number of active data parts for partition in a MergeTree table." |
DEPENDENT | clickhouse.insert.delay Preprocessing: - JSONPATH: |
ClickHouse | ClickHouse: Current running queries | Number of executing queries |
DEPENDENT | clickhouse.query.current Preprocessing: - JSONPATH: |
ClickHouse | ClickHouse: Current running merges | Number of executing background merges |
DEPENDENT | clickhouse.merge.current Preprocessing: - JSONPATH: |
ClickHouse | ClickHouse: Inserted bytes per second | The number of uncompressed bytes inserted in all tables. |
DEPENDENT | clickhouse.inserted_bytes.rate Preprocessing: - JSONPATH: ⛔️ON_FAIL: CUSTOM_VALUE -> 0 - CHANGE_PER_SECOND |
ClickHouse | ClickHouse: Read bytes per second | "Number of bytes (the number of bytes before decompression) read from compressed sources (files, network)." |
DEPENDENT | clickhouse.read_bytes.rate Preprocessing: - JSONPATH: ⛔️ON_FAIL: CUSTOM_VALUE -> 0 - CHANGE_PER_SECOND |
ClickHouse | ClickHouse: Inserted rows per second | The number of rows inserted in all tables. |
DEPENDENT | clickhouse.inserted_rows.rate Preprocessing: - JSONPATH: ⛔️ON_FAIL: CUSTOM_VALUE -> 0 - CHANGE_PER_SECOND |
ClickHouse | ClickHouse: Merged rows per second | Rows read for background merges. |
DEPENDENT | clickhouse.merge_rows.rate Preprocessing: - JSONPATH: ⛔️ON_FAIL: CUSTOM_VALUE -> 0 - CHANGE_PER_SECOND |
ClickHouse | ClickHouse: Uncompressed bytes merged per second | Uncompressed bytes that were read for background merges |
DEPENDENT | clickhouse.merge_bytes.rate Preprocessing: - JSONPATH: ⛔️ON_FAIL: CUSTOM_VALUE -> 0 - CHANGE_PER_SECOND |
ClickHouse | ClickHouse: Max count of parts per partition across all tables | The ClickHouse MergeTree table engine splits each INSERT query into partitions (PARTITION BY expression) and adds one or more PARTS per INSERT inside each partition, after which the background merge process runs. |
DEPENDENT | clickhouse.max.part.count.for.partition Preprocessing: - JSONPATH: |
ClickHouse | ClickHouse: Current TCP connections | Number of connections to TCP server (clients with native interface). |
DEPENDENT | clickhouse.connections.tcp Preprocessing: - JSONPATH: |
ClickHouse | ClickHouse: Current HTTP connections | Number of connections to HTTP server. |
DEPENDENT | clickhouse.connections.http Preprocessing: - JSONPATH: |
ClickHouse | ClickHouse: Current distribute connections | Number of connections to remote servers sending data that was INSERTed into Distributed tables. |
DEPENDENT | clickhouse.connections.distribute Preprocessing: - JSONPATH: |
ClickHouse | ClickHouse: Current MySQL connections | Number of connections to MySQL server. |
DEPENDENT | clickhouse.connections.mysql Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
ClickHouse | ClickHouse: Current Interserver connections | Number of connections from other replicas to fetch parts. |
DEPENDENT | clickhouse.connections.interserver Preprocessing: - JSONPATH: |
ClickHouse | ClickHouse: Network errors per second | Network errors (timeouts and connection failures) during query execution, background pool tasks and DNS cache update. |
DEPENDENT | clickhouse.network.error.rate Preprocessing: - JSONPATH: ⛔️ON_FAIL: - CHANGE_PER_SECOND |
ClickHouse | ClickHouse: Read syscalls in fly | Number of read (read, pread, io_getevents, etc.) syscalls in fly |
DEPENDENT | clickhouse.read Preprocessing: - JSONPATH: |
ClickHouse | ClickHouse: Write syscalls in fly | Number of write (write, pwrite, io_getevents, etc.) syscalls in fly |
DEPENDENT | clickhouse.write Preprocessing: - JSONPATH: |
ClickHouse | ClickHouse: Allocated bytes | "Total number of bytes allocated by the application." |
DEPENDENT | clickhouse.jemalloc.allocated Preprocessing: - JSONPATH: |
ClickHouse | ClickHouse: Resident memory | Maximum number of bytes in physically resident data pages mapped by the allocator, comprising all pages dedicated to allocator metadata, pages backing active allocations, and unused dirty pages. |
DEPENDENT | clickhouse.jemalloc.resident Preprocessing: - JSONPATH: |
ClickHouse | ClickHouse: Mapped memory | "Total number of bytes in active extents mapped by the allocator." |
DEPENDENT | clickhouse.jemalloc.mapped Preprocessing: - JSONPATH: |
ClickHouse | ClickHouse: Memory used for queries | "Total amount of memory (bytes) allocated in currently executing queries." |
DEPENDENT | clickhouse.memory.tracking Preprocessing: - JSONPATH: |
ClickHouse | ClickHouse: Memory used for background merges | "Total amount of memory (bytes) allocated in background processing pool (that is dedicated for background merges, mutations and fetches). Note that this value may include a drift when the memory was allocated in a context of background processing pool and freed in other context or vice-versa. This happens naturally due to caches for tables indexes and doesn't indicate memory leaks." |
DEPENDENT | clickhouse.memory.tracking.background Preprocessing: - JSONPATH: |
ClickHouse | ClickHouse: Memory used for background moves | "Total amount of memory (bytes) allocated in background processing pool (that is dedicated for background moves). Note that this value may include a drift when the memory was allocated in a context of background processing pool and freed in other context or vice-versa. This happens naturally due to caches for tables indexes and doesn't indicate memory leaks." |
DEPENDENT | clickhouse.memory.tracking.background.moves Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
ClickHouse | ClickHouse: Memory used for background schedule pool | "Total amount of memory (bytes) allocated in background schedule pool (that is dedicated for bookkeeping tasks of Replicated tables)." |
DEPENDENT | clickhouse.memory.tracking.schedule.pool Preprocessing: - JSONPATH: |
ClickHouse | ClickHouse: Memory used for merges | Total amount of memory (bytes) allocated for background merges. Included in MemoryTrackingInBackgroundProcessingPool. Note that this value may include a drift when the memory was allocated in a context of background processing pool and freed in other context or vice-versa. This happens naturally due to caches for tables indexes and doesn't indicate memory leaks. |
DEPENDENT | clickhouse.memory.tracking.merges Preprocessing: - JSONPATH: |
ClickHouse | ClickHouse: Current distributed files to insert | Number of pending files to process for asynchronous insertion into Distributed tables. Number of files for every shard is summed. |
DEPENDENT | clickhouse.distributed.files Preprocessing: - JSONPATH: |
ClickHouse | ClickHouse: Distributed connection fail with retry per second | Connection retries in replicated DB connection pool |
DEPENDENT | clickhouse.distributed.files.retry.rate Preprocessing: - JSONPATH: ⛔️ON_FAIL: - CHANGE_PER_SECOND |
ClickHouse | ClickHouse: Distributed connection fail with failover per second | "Connection failures after all retries in replicated DB connection pool" |
DEPENDENT | clickhouse.distributed.files.fail.rate Preprocessing: - JSONPATH: ⛔️ON_FAIL: - CHANGE_PER_SECOND |
ClickHouse | ClickHouse: Replication lag across all tables | Maximum replica queue delay relative to current time |
DEPENDENT | clickhouse.replicas.max.absolute.delay Preprocessing: - JSONPATH: |
ClickHouse | ClickHouse: Total replication tasks in queue | - |
DEPENDENT | clickhouse.replicas.sum.queue.size Preprocessing: - JSONPATH: |
ClickHouse | ClickHouse: Total number of read-only replicas | Number of Replicated tables that are currently in readonly state due to re-initialization after ZooKeeper session loss or due to startup without ZooKeeper configured. |
DEPENDENT | clickhouse.replicas.readonly.total Preprocessing: - JSONPATH: |
ClickHouse | ClickHouse: {#DB}.{#TABLE}: Bytes | Table size in bytes. Database: {#DB}, table: {#TABLE} |
DEPENDENT | clickhouse.table.bytes["{#DB}.{#TABLE}"] Preprocessing: - JSONPATH: |
ClickHouse | ClickHouse: {#DB}.{#TABLE}: Parts | Number of parts of the table. Database: {#DB}, table: {#TABLE} |
DEPENDENT | clickhouse.table.parts["{#DB}.{#TABLE}"] Preprocessing: - JSONPATH: |
ClickHouse | ClickHouse: {#DB}.{#TABLE}: Rows | Number of rows in the table. Database: {#DB}, table: {#TABLE} |
DEPENDENT | clickhouse.table.rows["{#DB}.{#TABLE}"] Preprocessing: - JSONPATH: |
ClickHouse | ClickHouse: {#DB}: Bytes | Database size in bytes. |
DEPENDENT | clickhouse.db.bytes["{#DB}"] Preprocessing: - JSONPATH: |
ClickHouse | ClickHouse: {#DB}.{#TABLE}: Replica readonly | Whether the replica is in read-only mode. This mode is turned on if the config doesn't have sections with ZooKeeper, if an unknown error occurred when re-initializing sessions in ZooKeeper, and during session re-initialization in ZooKeeper. |
DEPENDENT | clickhouse.replica.is_readonly["{#DB}.{#TABLE}"] Preprocessing: - JSONPATH: |
ClickHouse | ClickHouse: {#DB}.{#TABLE}: Replica session expired | True if the ZooKeeper session expired |
DEPENDENT | clickhouse.replica.is_session_expired["{#DB}.{#TABLE}"] Preprocessing: - JSONPATH: |
ClickHouse | ClickHouse: {#DB}.{#TABLE}: Replica future parts | Number of data parts that will appear as the result of INSERTs or merges that haven't been done yet. |
DEPENDENT | clickhouse.replica.future_parts["{#DB}.{#TABLE}"] Preprocessing: - JSONPATH: |
ClickHouse | ClickHouse: {#DB}.{#TABLE}: Replica parts to check | Number of data parts in the queue for verification. A part is put in the verification queue if there is suspicion that it might be damaged. |
DEPENDENT | clickhouse.replica.parts_to_check["{#DB}.{#TABLE}"] Preprocessing: - JSONPATH: |
ClickHouse | ClickHouse: {#DB}.{#TABLE}: Replica queue size | Size of the queue for operations waiting to be performed. |
DEPENDENT | clickhouse.replica.queue_size["{#DB}.{#TABLE}"] Preprocessing: - JSONPATH: |
ClickHouse | ClickHouse: {#DB}.{#TABLE}: Replica queue inserts size | Number of inserts of blocks of data that need to be made. |
DEPENDENT | clickhouse.replica.inserts_in_queue["{#DB}.{#TABLE}"] Preprocessing: - JSONPATH: |
ClickHouse | ClickHouse: {#DB}.{#TABLE}: Replica queue merges size | Number of merges waiting to be made. |
DEPENDENT | clickhouse.replica.merges_in_queue["{#DB}.{#TABLE}"] Preprocessing: - JSONPATH: |
ClickHouse | ClickHouse: {#DB}.{#TABLE}: Replica log max index | Maximum entry number in the log of general activity. (Has a non-zero value only when there is an active session with ZooKeeper.) |
DEPENDENT | clickhouse.replica.log_max_index["{#DB}.{#TABLE}"] Preprocessing: - JSONPATH: |
ClickHouse | ClickHouse: {#DB}.{#TABLE}: Replica log pointer | Maximum entry number in the log of general activity that the replica copied to its execution queue, plus one. (Has a non-zero value only when there is an active session with ZooKeeper.) |
DEPENDENT | clickhouse.replica.log_pointer["{#DB}.{#TABLE}"] Preprocessing: - JSONPATH: |
ClickHouse | ClickHouse: {#DB}.{#TABLE}: Total replicas | Total number of known replicas of this table. (Has a non-zero value only when there is an active session with ZooKeeper.) |
DEPENDENT | clickhouse.replica.total_replicas["{#DB}.{#TABLE}"] Preprocessing: - JSONPATH: |
ClickHouse | ClickHouse: {#DB}.{#TABLE}: Active replicas | Number of replicas of this table that have a session in ZooKeeper (i.e., the number of functioning replicas). (Has a non-zero value only when there is an active session with ZooKeeper.) |
DEPENDENT | clickhouse.replica.active_replicas["{#DB}.{#TABLE}"] Preprocessing: - JSONPATH: |
ClickHouse | ClickHouse: {#DB}.{#TABLE}: Replica lag | Difference between log_max_index and log_pointer. |
DEPENDENT | clickhouse.replica.lag["{#DB}.{#TABLE}"] Preprocessing: - JSONPATH: |
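The replica lag item is a derived value: log_max_index minus log_pointer for each replicated table. A minimal Python sketch of the same computation, assuming a JSONEachRow payload shaped like the output of the raw "Get replicas info" item (the exact column list here is an illustrative assumption):

```python
import json

# Sample JSONEachRow payload; ClickHouse quotes 64-bit integers as
# strings in JSON output by default, which the parser below handles.
payload = "\n".join([
    '{"database":"db1","table":"events","log_max_index":"1200","log_pointer":"1195"}',
    '{"database":"db1","table":"users","log_max_index":"300","log_pointer":"300"}',
])

def replica_lag(rows: str) -> dict:
    """Per-table replication lag, computed as log_max_index - log_pointer."""
    lag = {}
    for line in rows.splitlines():
        row = json.loads(line)
        lag[f'{row["database"]}.{row["table"]}'] = (
            int(row["log_max_index"]) - int(row["log_pointer"])
        )
    return lag

print(replica_lag(payload))  # {'db1.events': 5, 'db1.users': 0}
```

In the template itself this subtraction is done by the item's JavaScript preprocessing step rather than externally; the sketch only illustrates the arithmetic.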
ClickHouse | ClickHouse: Dictionary {#NAME}: Bytes allocated | The amount of RAM the dictionary uses. |
DEPENDENT | clickhouse.dictionary.bytes_allocated["{#NAME}"] Preprocessing: - JSONPATH: |
ClickHouse | ClickHouse: Dictionary {#NAME}: Element count | Number of items stored in the dictionary. |
DEPENDENT | clickhouse.dictionary.element_count["{#NAME}"] Preprocessing: - JSONPATH: |
ClickHouse | ClickHouse: Dictionary {#NAME}: Load factor | The percentage filled in the dictionary (for a hashed dictionary, the percentage filled in the hash table). |
DEPENDENT | clickhouse.dictionary.load_factor["{#NAME}"] Preprocessing: - JSONPATH: - MULTIPLIER: |
ClickHouse ZooKeeper | ClickHouse: ZooKeeper sessions | Number of sessions (connections) to ZooKeeper. Should be no more than one. |
DEPENDENT | clickhouse.zookeeper.session Preprocessing: - JSONPATH: |
ClickHouse ZooKeeper | ClickHouse: ZooKeeper watches | Number of watches (e.g., event subscriptions) in ZooKeeper. |
DEPENDENT | clickhouse.zookeeper.watch Preprocessing: - JSONPATH: |
ClickHouse ZooKeeper | ClickHouse: ZooKeeper requests | Number of requests to ZooKeeper in progress. |
DEPENDENT | clickhouse.zookeeper.request Preprocessing: - JSONPATH: |
ClickHouse ZooKeeper | ClickHouse: ZooKeeper wait time | Time spent in waiting for ZooKeeper operations. |
DEPENDENT | clickhouse.zookeeper.wait.time Preprocessing: - JSONPATH: ⛔️ON FAIL: - MULTIPLIER: - CHANGE_PER_SECOND |
ClickHouse ZooKeeper | ClickHouse: ZooKeeper exceptions per second | Count of ZooKeeper exceptions that do not belong to user/hardware exceptions. |
DEPENDENT | clickhouse.zookeeper.exceptions.rate Preprocessing: - JSONPATH: ⛔️ON FAIL: - CHANGE_PER_SECOND |
ClickHouse ZooKeeper | ClickHouse: ZooKeeper hardware exceptions per second | Count of ZooKeeper exceptions caused by session moved/expired, connection loss, marshalling error, operation timed out and invalid zhandle state. |
DEPENDENT | clickhouse.zookeeper.hw_exceptions.rate Preprocessing: - JSONPATH: ⛔️ON FAIL: CUSTOM_VALUE -> 0 - CHANGE_PER_SECOND |
ClickHouse ZooKeeper | ClickHouse: ZooKeeper user exceptions per second | Count of ZooKeeper exceptions caused by no znodes, bad version, node exists, node empty and no children for ephemeral. |
DEPENDENT | clickhouse.zookeeper.user_exceptions.rate Preprocessing: - JSONPATH: ⛔️ON FAIL: CUSTOM_VALUE -> 0 - CHANGE_PER_SECOND |
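The three exception items above turn monotonically increasing counters into per-second rates via the CHANGE_PER_SECOND preprocessing step. A minimal sketch of that step, assuming Zabbix's documented behavior of discarding a sample whose counter value decreased (e.g. after a restart reset the counter):

```python
def change_per_second(prev_value, prev_ts, value, ts):
    """Sketch of Zabbix's CHANGE_PER_SECOND step: (value - prev) / (ts - prev_ts).
    Zabbix stores no value when the counter decreases; None models that here."""
    if ts <= prev_ts or value < prev_value:
        return None
    return (value - prev_value) / (ts - prev_ts)

# A ZooKeeperExceptions counter that grew from 120 to 180 over 30 seconds:
print(change_per_second(120, 1000, 180, 1030))  # 2.0
```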
Zabbix raw items | ClickHouse: Get system.events | Get information about the number of events that have occurred in the system. |
HTTP_AGENT | clickhouse.system.events Preprocessing: - JSONPATH: |
Zabbix raw items | ClickHouse: Get system.metrics | Get metrics that can be calculated instantly or have a current value. Format: JSONEachRow. |
HTTP_AGENT | clickhouse.system.metrics Preprocessing: - JSONPATH: |
Zabbix raw items | ClickHouse: Get system.asynchronous_metrics | Get metrics that are calculated periodically in the background |
HTTP_AGENT | clickhouse.system.asynchronous_metrics Preprocessing: - JSONPATH: |
Zabbix raw items | ClickHouse: Get system.settings | Get information about settings that are currently in use. |
HTTP_AGENT | clickhouse.system.settings Preprocessing: - JSONPATH: - DISCARDUNCHANGEDHEARTBEAT: |
Zabbix raw items | ClickHouse: Get replicas info | - |
HTTP_AGENT | clickhouse.replicas Preprocessing: - JSONPATH: |
Zabbix raw items | ClickHouse: Get tables info | - |
HTTP_AGENT | clickhouse.tables Preprocessing: - JSONPATH: |
Zabbix raw items | ClickHouse: Get dictionaries info | - |
HTTP_AGENT | clickhouse.dictionaries Preprocessing: - JSONPATH: |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
ClickHouse: There are long-running queries | - |
last(/ClickHouse by HTTP/clickhouse.process.elapsed)>{$CLICKHOUSE.QUERY_TIME.MAX.WARN} |
AVERAGE | Manual close: YES |
ClickHouse: Port {$CLICKHOUSE.PORT} is unavailable | - |
last(/ClickHouse by HTTP/net.tcp.service[{$CLICKHOUSE.SCHEME},"{HOST.CONN}","{$CLICKHOUSE.PORT}"])=0 |
AVERAGE | Manual close: YES |
ClickHouse: Service is down | - |
last(/ClickHouse by HTTP/clickhouse.ping)=0 or last(/ClickHouse by HTTP/net.tcp.service[{$CLICKHOUSE.SCHEME},"{HOST.CONN}","{$CLICKHOUSE.PORT}"]) = 0 |
AVERAGE | Manual close: YES Depends on: - ClickHouse: Port {$CLICKHOUSE.PORT} is unavailable |
ClickHouse: Version has changed | ClickHouse version has changed. Ack to close. |
last(/ClickHouse by HTTP/clickhouse.version,#1)<>last(/ClickHouse by HTTP/clickhouse.version,#2) and length(last(/ClickHouse by HTTP/clickhouse.version))>0 |
INFO | Manual close: YES |
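The version-change expression above compares the two most recently collected values and also requires the newest value to be non-empty, so a failed fetch does not fire the trigger. A small Python model of that logic:

```python
def version_changed(history):
    """Model of: last(#1) <> last(#2) and length(last(#1)) > 0.
    `history` is ordered oldest-first; the trigger fires when the two
    most recent values differ and the newest one is non-empty."""
    newest, previous = history[-1], history[-2]
    return newest != previous and len(newest) > 0

print(version_changed(["22.8.5.29", "22.8.5.29"]))  # False
print(version_changed(["22.3.2.2", "22.8.5.29"]))   # True
print(version_changed(["22.8.5.29", ""]))           # False (empty fetch)
```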
ClickHouse: Server has been restarted | Uptime is less than 10 minutes. |
last(/ClickHouse by HTTP/clickhouse.uptime)<10m |
INFO | Manual close: YES |
ClickHouse: Failed to fetch info data | Zabbix has not received data for the items during the last 30 minutes. |
nodata(/ClickHouse by HTTP/clickhouse.uptime,30m)=1 |
WARNING | Manual close: YES Depends on: - ClickHouse: Service is down |
ClickHouse: Too many throttled insert queries | ClickHouse has INSERT queries that are throttled due to a high number of active data parts for a partition in a MergeTree table. Decrease the INSERT frequency. |
min(/ClickHouse by HTTP/clickhouse.insert.delay,5m)>{$CLICKHOUSE.DELAYED.INSERTS.MAX.WARN} |
WARNING | Manual close: YES |
ClickHouse: Too many MergeTree parts | Decrease the frequency of INSERT queries. The ClickHouse MergeTree table engine splits each INSERT query into partitions (by the PARTITION BY expression) and adds one or more parts per INSERT inside each partition. A background process then merges the parts; when a partition accumulates too many unmerged parts, SELECT performance can degrade significantly, so ClickHouse delays or aborts inserts. |
min(/ClickHouse by HTTP/clickhouse.max.part.count.for.partition,5m)>{$CLICKHOUSE.PARTS.PER.PARTITION.WARN} * 0.9 |
WARNING | Manual close: YES |
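The expression takes the minimum over the 5-minute window, so the trigger fires only if every collected sample exceeds 90% of the macro value. A sketch, using 300 as a hypothetical value for {$CLICKHOUSE.PARTS.PER.PARTITION.WARN}:

```python
def too_many_parts(samples, warn_limit):
    """Model of: min(/.../clickhouse.max.part.count.for.partition,5m)
    > {$CLICKHOUSE.PARTS.PER.PARTITION.WARN} * 0.9.
    `samples` are the values collected in the 5-minute window; a single
    low sample keeps the trigger from firing."""
    return min(samples) > warn_limit * 0.9

# With a hypothetical macro value of 300, the threshold is 270:
print(too_many_parts([280, 295, 310], 300))  # True
print(too_many_parts([150, 295, 310], 300))  # False
```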
ClickHouse: Too many network errors | Number of errors (timeouts and connection failures) during query execution, background pool tasks and DNS cache update is too high. |
min(/ClickHouse by HTTP/clickhouse.network.error.rate,5m)>{$CLICKHOUSE.NETWORK.ERRORS.MAX.WARN} |
WARNING | |
ClickHouse: Too many distributed files to insert | The number of pending files to process for asynchronous insertion into Distributed tables is too high. See https://clickhouse.tech/docs/en/operations/table_engines/distributed/ |
min(/ClickHouse by HTTP/clickhouse.distributed.files,5m)>{$CLICKHOUSE.DELAYED.FILES.DISTRIBUTED.COUNT.MAX.WARN} |
WARNING | Manual close: YES |
ClickHouse: Replication lag is too high | When a replica has too much lag, it can be skipped by distributed SELECT queries without an error, and you will get wrong query results. |
min(/ClickHouse by HTTP/clickhouse.replicas.max.absolute.delay,5m)>{$CLICKHOUSE.REPLICA.MAX.WARN} |
WARNING | Manual close: YES |
ClickHouse: {#DB}.{#TABLE} Replica is readonly | This mode is turned on if the config doesn't have sections with ZooKeeper, if an unknown error occurred when re-initializing sessions in ZooKeeper, and during session re-initialization in ZooKeeper. |
min(/ClickHouse by HTTP/clickhouse.replica.is_readonly["{#DB}.{#TABLE}"],5m)=1 |
WARNING | |
ClickHouse: {#DB}.{#TABLE} Replica session is expired | The ZooKeeper session of this replica has expired. The replica stays in read-only mode until the session is re-initialized. |
min(/ClickHouse by HTTP/clickhouse.replica.is_session_expired["{#DB}.{#TABLE}"],5m)=1 |
WARNING | |
ClickHouse: {#DB}.{#TABLE}: Too many operations in queue | - |
min(/ClickHouse by HTTP/clickhouse.replica.queue_size["{#DB}.{#TABLE}"],5m)>{$CLICKHOUSE.QUEUE.SIZE.MAX.WARN:"{#TABLE}"} |
WARNING | |
ClickHouse: {#DB}.{#TABLE}: Number of active replicas less than number of total replicas | - |
max(/ClickHouse by HTTP/clickhouse.replica.active_replicas["{#DB}.{#TABLE}"],5m) < last(/ClickHouse by HTTP/clickhouse.replica.total_replicas["{#DB}.{#TABLE}"]) |
WARNING | |
ClickHouse: {#DB}.{#TABLE}: Difference between log_max_index and log_pointer is too high | - |
min(/ClickHouse by HTTP/clickhouse.replica.lag["{#DB}.{#TABLE}"],5m) > {$CLICKHOUSE.LOG_POSITION.DIFF.MAX.WARN} |
WARNING | |
ClickHouse: Too many ZooKeeper sessions opened | Number of sessions (connections) to ZooKeeper. Should be no more than one, because using more than one connection to ZooKeeper may lead to bugs caused by the lack of linearizability (stale reads) that the ZooKeeper consistency model allows. |
min(/ClickHouse by HTTP/clickhouse.zookeeper.session,5m)>1 |
WARNING | |
ClickHouse: Configuration has been changed | ClickHouse configuration has been changed. Ack to close. |
last(/ClickHouse by HTTP/clickhouse.system.settings,#1)<>last(/ClickHouse by HTTP/clickhouse.system.settings,#2) and length(last(/ClickHouse by HTTP/clickhouse.system.settings))>0 |
INFO | Manual close: YES |
Please report any issues with the template at https://support.zabbix.com
For Zabbix version: 6.2 and higher
Official JMX template for the Apache Cassandra DBMS.
This template was tested on:
See Zabbix template operation for basic instructions.
This template works with standalone and cluster instances. Metrics are collected by JMX.
No specific Zabbix configuration is required.
Name | Description | Default |
---|---|---|
{$CASSANDRA.KEY_SPACE.MATCHES} | Filter of discoverable keyspaces |
.* |
{$CASSANDRA.KEY_SPACE.NOT_MATCHES} | Filter to exclude discovered keyspaces |
`(system|system_auth|system_distributed|system_schema)` |
{$CASSANDRA.PASSWORD} | - |
zabbix |
{$CASSANDRA.PENDING_TASKS.MAX.HIGH} | - |
500 |
{$CASSANDRA.PENDING_TASKS.MAX.WARN} | - |
350 |
{$CASSANDRA.USER} | - |
zabbix |
There are no template links in this template.
Name | Description | Type | Key and additional info |
---|---|---|---|
Tables | Info about keyspaces and tables |
JMX | jmx.discovery[beans,"org.apache.cassandra.metrics:type=Table,keyspace=*,scope=*,name=ReadLatency"] Filter: AND - {#JMXKEYSPACE} MATCHES_REGEX - {#JMXKEYSPACE} NOT_MATCHES_REGEX |
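The discovery filter keeps a keyspace only when it satisfies the MATCHES_REGEX condition and fails the NOT_MATCHES_REGEX condition. A Python sketch using the template's default macro values (Zabbix regex filters match anywhere in the value unless anchored):

```python
import re

# Default macro values from the Macros section above:
MATCHES = r".*"
NOT_MATCHES = r"(system|system_auth|system_distributed|system_schema)"

def keep_keyspace(name: str) -> bool:
    """Mimic the LLD filter: keep {#JMXKEYSPACE} when it satisfies the
    match filter and does not satisfy the exclude filter."""
    return bool(re.search(MATCHES, name)) and not re.search(NOT_MATCHES, name)

discovered = ["system", "system_auth", "sales", "analytics"]
print([ks for ks in discovered if keep_keyspace(ks)])  # ['sales', 'analytics']
```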
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
Cassandra | Cluster: Nodes down | - |
JMX | jmx["org.apache.cassandra.net:type=FailureDetector","DownEndpointCount"] Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
Cassandra | Cluster: Nodes up | - |
JMX | jmx["org.apache.cassandra.net:type=FailureDetector","UpEndpointCount"] Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
Cassandra | Cluster: Name | - |
JMX | jmx["org.apache.cassandra.db:type=StorageService","ClusterName"] Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
Cassandra | Version | - |
JMX | jmx["org.apache.cassandra.db:type=StorageService","ReleaseVersion"] Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
Cassandra | Dropped messages: Write (Mutation) | Number of dropped regular write messages. |
JMX | jmx["org.apache.cassandra.metrics:type=DroppedMessage,scope=MUTATION,name=Dropped","Count"] |
Cassandra | Dropped messages: Read | Number of dropped regular read messages. |
JMX | jmx["org.apache.cassandra.metrics:type=DroppedMessage,scope=READ,name=Dropped","Count"] |
Cassandra | Storage: Used (bytes) | Size, in bytes, of the on-disk data this node manages. |
JMX | jmx["org.apache.cassandra.metrics:type=Storage,name=Load","Count"] |
Cassandra | Storage: Errors | Number of internal exceptions caught. Under normal conditions this should be zero. |
JMX | jmx["org.apache.cassandra.metrics:type=Storage,name=Exceptions","Count"] |
Cassandra | Storage: Hints | Number of hint messages written to this node since [re]start. Includes one entry for each host to be hinted per hint. |
JMX | jmx["org.apache.cassandra.metrics:type=Storage,name=TotalHints","Count"] |
Cassandra | Compaction: Number of completed tasks | Number of completed compactions since server [re]start. |
JMX | jmx["org.apache.cassandra.metrics:name=CompletedTasks,type=Compaction","Value"] |
Cassandra | Compaction: Total compactions completed | Throughput of completed compactions since server [re]start. |
JMX | jmx["org.apache.cassandra.metrics:name=TotalCompactionsCompleted,type=Compaction","Count"] |
Cassandra | Compaction: Pending tasks | Estimated number of compactions remaining to perform. |
JMX | jmx["org.apache.cassandra.metrics:type=Compaction,name=PendingTasks","Value"] |
Cassandra | Commitlog: Pending tasks | Number of commit log messages written but yet to be fsync'd. |
JMX | jmx["org.apache.cassandra.metrics:name=PendingTasks,type=CommitLog","Value"] |
Cassandra | Commitlog: Total size | Current size, in bytes, used by all the commit log segments. |
JMX | jmx["org.apache.cassandra.metrics:name=TotalCommitLogSize,type=CommitLog","Value"] |
Cassandra | Latency: Read median | Read latency from disk in milliseconds - median. |
JMX | jmx["org.apache.cassandra.metrics:name=ReadLatency,type=Table","50thPercentile"] Preprocessing: - MULTIPLIER: |
Cassandra | Latency: Read 75 percentile | Read latency from disk in milliseconds - p75. |
JMX | jmx["org.apache.cassandra.metrics:name=ReadLatency,type=Table","75thPercentile"] Preprocessing: - MULTIPLIER: |
Cassandra | Latency: Read 95 percentile | Read latency from disk in milliseconds - p95. |
JMX | jmx["org.apache.cassandra.metrics:name=ReadLatency,type=Table","95thPercentile"] Preprocessing: - MULTIPLIER: |
Cassandra | Latency: Write median | Write latency to disk in milliseconds - median. |
JMX | jmx["org.apache.cassandra.metrics:name=WriteLatency,type=Table","50thPercentile"] Preprocessing: - MULTIPLIER: |
Cassandra | Latency: Write 75 percentile | Write latency to disk in milliseconds - p75. |
JMX | jmx["org.apache.cassandra.metrics:name=WriteLatency,type=Table","75thPercentile"] Preprocessing: - MULTIPLIER: |
Cassandra | Latency: Write 95 percentile | Write latency to disk in milliseconds - p95. |
JMX | jmx["org.apache.cassandra.metrics:name=WriteLatency,type=Table","95thPercentile"] Preprocessing: - MULTIPLIER: |
Cassandra | Latency: Client request read median | Total latency serving data to clients in milliseconds - median. |
JMX | jmx["org.apache.cassandra.metrics:type=ClientRequest,scope=Read,name=Latency","50thPercentile"] Preprocessing: - MULTIPLIER: |
Cassandra | Latency: Client request read 75 percentile | Total latency serving data to clients in milliseconds - p75. |
JMX | jmx["org.apache.cassandra.metrics:type=ClientRequest,scope=Read,name=Latency","75thPercentile"] Preprocessing: - MULTIPLIER: |
Cassandra | Latency: Client request read 95 percentile | Total latency serving data to clients in milliseconds - p95. |
JMX | jmx["org.apache.cassandra.metrics:type=ClientRequest,scope=Read,name=Latency","95thPercentile"] Preprocessing: - MULTIPLIER: |
Cassandra | Latency: Client request write median | Total latency serving write requests from clients in milliseconds - median. |
JMX | jmx["org.apache.cassandra.metrics:type=ClientRequest,scope=Write,name=Latency","50thPercentile"] Preprocessing: - MULTIPLIER: |
Cassandra | Latency: Client request write 75 percentile | Total latency serving write requests from clients in milliseconds - p75. |
JMX | jmx["org.apache.cassandra.metrics:type=ClientRequest,scope=Write,name=Latency","75thPercentile"] Preprocessing: - MULTIPLIER: |
Cassandra | Latency: Client request write 95 percentile | Total latency serving write requests from clients in milliseconds - p95. |
JMX | jmx["org.apache.cassandra.metrics:type=ClientRequest,scope=Write,name=Latency","95thPercentile"] Preprocessing: - MULTIPLIER: |
Cassandra | KeyCache: Capacity | Cache capacity in bytes. |
JMX | jmx["org.apache.cassandra.metrics:type=Cache,scope=KeyCache,name=Capacity","Value"] Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
Cassandra | KeyCache: Entries | Total number of cache entries. |
JMX | jmx["org.apache.cassandra.metrics:type=Cache,scope=KeyCache,name=Entries","Value"] |
Cassandra | KeyCache: HitRate | All time cache hit rate. |
JMX | jmx["org.apache.cassandra.metrics:type=Cache,scope=KeyCache,name=HitRate","Value"] Preprocessing: - MULTIPLIER: |
Cassandra | KeyCache: Hits per second | Rate of cache hits. |
JMX | jmx["org.apache.cassandra.metrics:type=Cache,scope=KeyCache,name=Hits","Count"] Preprocessing: - CHANGE_PER_SECOND |
Cassandra | KeyCache: Requests per second | Rate of cache requests. |
JMX | jmx["org.apache.cassandra.metrics:type=Cache,scope=KeyCache,name=Requests","Count"] Preprocessing: - CHANGE_PER_SECOND |
Cassandra | KeyCache: Size | Total size of occupied cache, in bytes. |
JMX | jmx["org.apache.cassandra.metrics:type=Cache,scope=KeyCache,name=Size","Value"] |
Cassandra | Client connections: Native | Number of clients connected to this node's native protocol server. |
JMX | jmx["org.apache.cassandra.metrics:type=Client,name=connectedNativeClients","Value"] |
Cassandra | Client connections: Thrift | Number of Thrift clients connected to this node. |
JMX | jmx["org.apache.cassandra.metrics:type=Client,name=connectedThriftClients","Value"] |
Cassandra | Client request: Read per second | The number of local read requests per second. |
JMX | jmx["org.apache.cassandra.metrics:type=ClientRequest,scope=Read,name=Latency","Count"] Preprocessing: - CHANGE_PER_SECOND |
Cassandra | Client request: Write per second | The number of local write requests per second. |
JMX | jmx["org.apache.cassandra.metrics:type=ClientRequest,scope=Write,name=Latency","Count"] Preprocessing: - CHANGE_PER_SECOND |
Cassandra | Client request: Write Timeouts | Number of write request timeouts encountered. |
JMX | jmx["org.apache.cassandra.metrics:type=ClientRequest,scope=Write,name=Timeouts","Count"] |
Cassandra | Thread pool MutationStage: Pending tasks | Number of tasks queued up on this pool. MutationStage: Responsible for writes (excluding materialized view and counter writes). |
JMX | jmx["org.apache.cassandra.metrics:type=ThreadPools,path=request,scope=MutationStage,name=PendingTasks","Value"] |
Cassandra | Thread pool MutationStage: Currently blocked tasks | Number of tasks that are currently blocked due to queue saturation but on retry will become unblocked. MutationStage: Responsible for writes (excluding materialized view and counter writes). |
JMX | jmx["org.apache.cassandra.metrics:type=ThreadPools,path=request,scope=MutationStage,name=CurrentlyBlockedTasks","Count"] |
Cassandra | Thread pool MutationStage: Total blocked tasks | Number of tasks that were blocked due to queue saturation. MutationStage: Responsible for writes (excluding materialized view and counter writes). |
JMX | jmx["org.apache.cassandra.metrics:type=ThreadPools,path=request,scope=MutationStage,name=TotalBlockedTasks","Count"] |
Cassandra | Thread pool CounterMutationStage: Pending tasks | Number of tasks queued up on this pool. CounterMutationStage: Responsible for counter writes. |
JMX | jmx["org.apache.cassandra.metrics:type=ThreadPools,path=request,scope=CounterMutationStage,name=PendingTasks","Value"] |
Cassandra | Thread pool CounterMutationStage: Currently blocked tasks | Number of tasks that are currently blocked due to queue saturation but on retry will become unblocked. CounterMutationStage: Responsible for counter writes. |
JMX | jmx["org.apache.cassandra.metrics:type=ThreadPools,path=request,scope=CounterMutationStage,name=CurrentlyBlockedTasks","Count"] |
Cassandra | Thread pool CounterMutationStage: Total blocked tasks | Number of tasks that were blocked due to queue saturation. CounterMutationStage: Responsible for counter writes. |
JMX | jmx["org.apache.cassandra.metrics:type=ThreadPools,path=request,scope=CounterMutationStage,name=TotalBlockedTasks","Count"] |
Cassandra | Thread pool ReadStage: Pending tasks | Number of tasks queued up on this pool. ReadStage: Local reads run on this thread pool. |
JMX | jmx["org.apache.cassandra.metrics:type=ThreadPools,path=request,scope=ReadStage,name=PendingTasks","Value"] |
Cassandra | Thread pool ReadStage: Currently blocked tasks | Number of tasks that are currently blocked due to queue saturation but on retry will become unblocked. ReadStage: Local reads run on this thread pool. |
JMX | jmx["org.apache.cassandra.metrics:type=ThreadPools,path=request,scope=ReadStage,name=CurrentlyBlockedTasks","Count"] |
Cassandra | Thread pool ReadStage: Total blocked tasks | Number of tasks that were blocked due to queue saturation. ReadStage: Local reads run on this thread pool. |
JMX | jmx["org.apache.cassandra.metrics:type=ThreadPools,path=request,scope=ReadStage,name=TotalBlockedTasks","Count"] |
Cassandra | Thread pool ViewMutationStage: Pending tasks | Number of tasks queued up on this pool. ViewMutationStage: Responsible for materialized view writes. |
JMX | jmx["org.apache.cassandra.metrics:type=ThreadPools,path=request,scope=ViewMutationStage,name=PendingTasks","Value"] |
Cassandra | Thread pool ViewMutationStage: Currently blocked tasks | Number of tasks that are currently blocked due to queue saturation but on retry will become unblocked. ViewMutationStage: Responsible for materialized view writes. |
JMX | jmx["org.apache.cassandra.metrics:type=ThreadPools,path=request,scope=ViewMutationStage,name=CurrentlyBlockedTasks","Count"] |
Cassandra | Thread pool ViewMutationStage: Total blocked tasks | Number of tasks that were blocked due to queue saturation. ViewMutationStage: Responsible for materialized view writes. |
JMX | jmx["org.apache.cassandra.metrics:type=ThreadPools,path=request,scope=ViewMutationStage,name=TotalBlockedTasks","Count"] |
Cassandra | Thread pool MemtableFlushWriter: Pending tasks | Number of tasks queued up on this pool. MemtableFlushWriter: Writes memtables to disk. |
JMX | jmx["org.apache.cassandra.metrics:type=ThreadPools,path=internal,scope=MemtableFlushWriter,name=PendingTasks","Value"] |
Cassandra | Thread pool MemtableFlushWriter: Currently blocked tasks | Number of tasks that are currently blocked due to queue saturation but on retry will become unblocked. MemtableFlushWriter: Writes memtables to disk. |
JMX | jmx["org.apache.cassandra.metrics:type=ThreadPools,path=internal,scope=MemtableFlushWriter,name=CurrentlyBlockedTasks","Count"] |
Cassandra | Thread pool MemtableFlushWriter: Total blocked tasks | Number of tasks that were blocked due to queue saturation. MemtableFlushWriter: Writes memtables to disk. |
JMX | jmx["org.apache.cassandra.metrics:type=ThreadPools,path=internal,scope=MemtableFlushWriter,name=TotalBlockedTasks","Count"] |
Cassandra | Thread pool HintsDispatcher: Pending tasks | Number of tasks queued up on this pool. HintsDispatcher: Performs hinted handoff. |
JMX | jmx["org.apache.cassandra.metrics:type=ThreadPools,path=internal,scope=HintsDispatcher,name=PendingTasks","Value"] |
Cassandra | Thread pool HintsDispatcher: Currently blocked tasks | Number of tasks that are currently blocked due to queue saturation but on retry will become unblocked. HintsDispatcher: Performs hinted handoff. |
JMX | jmx["org.apache.cassandra.metrics:type=ThreadPools,path=internal,scope=HintsDispatcher,name=CurrentlyBlockedTasks","Count"] |
Cassandra | Thread pool HintsDispatcher: Total blocked tasks | Number of tasks that were blocked due to queue saturation. HintsDispatcher: Performs hinted handoff. |
JMX | jmx["org.apache.cassandra.metrics:type=ThreadPools,path=internal,scope=HintsDispatcher,name=TotalBlockedTasks","Count"] |
Cassandra | Thread pool MemtablePostFlush: Pending tasks | Number of tasks queued up on this pool. MemtablePostFlush: Cleans up commit log after memtable is written to disk. |
JMX | jmx["org.apache.cassandra.metrics:type=ThreadPools,path=internal,scope=MemtablePostFlush,name=PendingTasks","Value"] |
Cassandra | Thread pool MemtablePostFlush: Currently blocked tasks | Number of tasks that are currently blocked due to queue saturation but on retry will become unblocked. MemtablePostFlush: Cleans up commit log after memtable is written to disk. |
JMX | jmx["org.apache.cassandra.metrics:type=ThreadPools,path=internal,scope=MemtablePostFlush,name=CurrentlyBlockedTasks","Count"] |
Cassandra | Thread pool MemtablePostFlush: Total blocked tasks | Number of tasks that were blocked due to queue saturation. MemtablePostFlush: Cleans up commit log after memtable is written to disk. |
JMX | jmx["org.apache.cassandra.metrics:type=ThreadPools,path=internal,scope=MemtablePostFlush,name=TotalBlockedTasks","Count"] |
Cassandra | Thread pool MigrationStage: Pending tasks | Number of tasks queued up on this pool. MigrationStage: Runs schema migrations. |
JMX | jmx["org.apache.cassandra.metrics:type=ThreadPools,path=internal,scope=MigrationStage,name=PendingTasks","Value"] |
Cassandra | Thread pool MigrationStage: Currently blocked tasks | Number of tasks that are currently blocked due to queue saturation but on retry will become unblocked. MigrationStage: Runs schema migrations. |
JMX | jmx["org.apache.cassandra.metrics:type=ThreadPools,path=internal,scope=MigrationStage,name=CurrentlyBlockedTasks","Count"] |
Cassandra | Thread pool MigrationStage: Total blocked tasks | Number of tasks that were blocked due to queue saturation. MigrationStage: Runs schema migrations. |
JMX | jmx["org.apache.cassandra.metrics:type=ThreadPools,path=internal,scope=MigrationStage,name=TotalBlockedTasks","Count"] |
Cassandra | Thread pool MiscStage: Pending tasks | Number of tasks queued up on this pool. MiscStage: Miscellaneous tasks run here. |
JMX | jmx["org.apache.cassandra.metrics:type=ThreadPools,path=internal,scope=MiscStage,name=PendingTasks","Value"] |
Cassandra | Thread pool MiscStage: Currently blocked tasks | Number of tasks that are currently blocked due to queue saturation but on retry will become unblocked. MiscStage: Miscellaneous tasks run here. |
JMX | jmx["org.apache.cassandra.metrics:type=ThreadPools,path=internal,scope=MiscStage,name=CurrentlyBlockedTasks","Count"] |
Cassandra | Thread pool MiscStage: Total blocked tasks | Number of tasks that were blocked due to queue saturation. MiscStage: Miscellaneous tasks run here. |
JMX | jmx["org.apache.cassandra.metrics:type=ThreadPools,path=internal,scope=MiscStage,name=TotalBlockedTasks","Count"] |
Cassandra | Thread pool SecondaryIndexManagement: Pending tasks | Number of tasks queued up on this pool. SecondaryIndexManagement: Performs updates to secondary indexes. |
JMX | jmx["org.apache.cassandra.metrics:type=ThreadPools,path=internal,scope=SecondaryIndexManagement,name=PendingTasks","Value"] |
Cassandra | Thread pool SecondaryIndexManagement: Currently blocked tasks | Number of tasks that are currently blocked due to queue saturation but on retry will become unblocked. SecondaryIndexManagement: Performs updates to secondary indexes. |
JMX | jmx["org.apache.cassandra.metrics:type=ThreadPools,path=internal,scope=SecondaryIndexManagement,name=CurrentlyBlockedTasks","Count"] |
Cassandra | Thread pool SecondaryIndexManagement: Total blocked tasks | Number of tasks that were blocked due to queue saturation. SecondaryIndexManagement: Performs updates to secondary indexes. |
JMX | jmx["org.apache.cassandra.metrics:type=ThreadPools,path=internal,scope=SecondaryIndexManagement,name=TotalBlockedTasks","Count"] |
Cassandra | {#JMXKEYSPACE}.{#JMXSCOPE}: SSTables per read 75 percentile | The number of SSTable data files accessed per read - p75. |
JMX | jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=SSTablesPerReadHistogram","75thPercentile"] |
Cassandra | {#JMXKEYSPACE}.{#JMXSCOPE}: SSTables per read 95 percentile | The number of SSTable data files accessed per read - p95. |
JMX | jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=SSTablesPerReadHistogram","95thPercentile"] |
Cassandra | {#JMXKEYSPACE}.{#JMXSCOPE}: Tombstone scanned 75 percentile | Number of tombstones scanned per read - p75. |
JMX | jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=TombstoneScannedHistogram","75thPercentile"] |
Cassandra | {#JMXKEYSPACE}.{#JMXSCOPE}: Tombstone scanned 95 percentile | Number of tombstones scanned per read - p95. |
JMX | jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=TombstoneScannedHistogram","95thPercentile"] |
Cassandra | {#JMXKEYSPACE}.{#JMXSCOPE}: Waiting on free memtable space 75 percentile | The time spent waiting for free memtable space either on- or off-heap - p75. |
JMX | jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=WaitingOnFreeMemtableSpace","75thPercentile"] Preprocessing: - MULTIPLIER: |
Cassandra | {#JMXKEYSPACE}.{#JMXSCOPE}: Waiting on free memtable space 95 percentile | The time spent waiting for free memtable space either on- or off-heap - p95. |
JMX | jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=WaitingOnFreeMemtableSpace","95thPercentile"] Preprocessing: - MULTIPLIER: |
Cassandra | {#JMXKEYSPACE}.{#JMXSCOPE}: Col update time delta 75 percentile | The column update time delta - p75. |
JMX | jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=ColUpdateTimeDeltaHistogram","75thPercentile"] Preprocessing: - MULTIPLIER: |
Cassandra | {#JMXKEYSPACE}.{#JMXSCOPE}: Col update time delta 95 percentile | The column update time delta - p95. |
JMX | jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=ColUpdateTimeDeltaHistogram","95thPercentile"] Preprocessing: - MULTIPLIER: |
Cassandra | {#JMXKEYSPACE}.{#JMXSCOPE}: Bloom filter false ratio | The ratio of Bloom filter false positives to total checks. |
JMX | jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=BloomFilterFalseRatio","Value"] |
Cassandra | {#JMXKEYSPACE}.{#JMXSCOPE}: Compression ratio | The compression ratio for all SSTables. |
JMX | jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=CompressionRatio","Value"] |
Cassandra | {#JMXKEYSPACE}.{#JMXSCOPE}: KeyCache hit rate | The key cache hit rate. |
JMX | jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=KeyCacheHitRate","Value"] |
Cassandra | {#JMXKEYSPACE}.{#JMXSCOPE}: Live SSTables | Number of "live" (in use) SSTables. |
JMX | jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=LiveSSTableCount","Value"] |
Cassandra | {#JMXKEYSPACE}.{#JMXSCOPE}: Max partition size | The size of the largest compacted partition. |
JMX | jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=MaxPartitionSize","Value"] |
Cassandra | {#JMXKEYSPACE}.{#JMXSCOPE}: Mean partition size | The average size of compacted partitions. |
JMX | jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=MeanPartitionSize","Value"] |
Cassandra | {#JMXKEYSPACE}.{#JMXSCOPE}: Pending compactions | The number of pending compactions. |
JMX | jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=PendingCompactions","Value"] |
Cassandra | {#JMXKEYSPACE}.{#JMXSCOPE}: Snapshots size | The disk space truly used by snapshots. |
JMX | jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=SnapshotsSize","Value"] |
Cassandra | {#JMXKEYSPACE}.{#JMXSCOPE}: Compaction bytes written | The amount of data that was compacted since (re)start. |
JMX | jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=CompactionBytesWritten","Count"] |
Cassandra | {#JMXKEYSPACE}.{#JMXSCOPE}: Bytes flushed | The amount of data that was flushed since (re)start. |
JMX | jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=BytesFlushed","Count"] |
Cassandra | {#JMXKEYSPACE}.{#JMXSCOPE}: Pending flushes | The number of pending flushes. |
JMX | jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=PendingFlushes","Count"] |
Cassandra | {#JMXKEYSPACE}.{#JMXSCOPE}: Live disk space used | The disk space used by "live" SSTables (only counts in use files). |
JMX | jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=LiveDiskSpaceUsed","Count"] |
Cassandra | {#JMXKEYSPACE}.{#JMXSCOPE}: Disk space used | Disk space used. |
JMX | jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=TotalDiskSpaceUsed","Count"] |
Cassandra | {#JMXKEYSPACE}.{#JMXSCOPE}: Out of row cache hits | The number of row cache hits that did not satisfy the query filter and went to disk. |
JMX | jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=RowCacheHitOutOfRange","Count"] |
Cassandra | {#JMXKEYSPACE}.{#JMXSCOPE}: Row cache hits | The number of row cache hits. |
JMX | jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=RowCacheHit","Count"] |
Cassandra | {#JMXKEYSPACE}.{#JMXSCOPE}: Row cache misses | The number of table row cache misses. |
JMX | jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=RowCacheMiss","Count"] |
Cassandra | {#JMXKEYSPACE}.{#JMXSCOPE}: Read latency 75 percentile | The latency of reads from disk - p75, in milliseconds. |
JMX | jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=ReadLatency","75thPercentile"] Preprocessing: - MULTIPLIER: |
Cassandra | {#JMXKEYSPACE}.{#JMXSCOPE}: Read latency 95 percentile | The latency of reads from disk - p95, in milliseconds. |
JMX | jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=ReadLatency","95thPercentile"] Preprocessing: - MULTIPLIER: |
Cassandra | {#JMXKEYSPACE}.{#JMXSCOPE}: Read per second | The number of local read requests per second. |
JMX | jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=ReadLatency","Count"] Preprocessing: - CHANGEPERSECOND |
Cassandra | {#JMXKEYSPACE}.{#JMXSCOPE}: Write latency 75 percentile | The latency of writes to disk - p75, in milliseconds. |
JMX | jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=WriteLatency","75thPercentile"] Preprocessing: - MULTIPLIER: |
Cassandra | {#JMXKEYSPACE}.{#JMXSCOPE}: Write latency 95 percentile | The latency of writes to disk - p95, in milliseconds. |
JMX | jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=WriteLatency","95thPercentile"] Preprocessing: - MULTIPLIER: |
Cassandra | {#JMXKEYSPACE}.{#JMXSCOPE}: Write per second | The number of local write requests per second. |
JMX | jmx["org.apache.cassandra.metrics:type=Table,keyspace={#JMXKEYSPACE},scope={#JMXSCOPE},name=WriteLatency","Count"] Preprocessing: - CHANGEPERSECOND |
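The "Read per second" and "Write per second" items above take the monotonically increasing `Count` attribute of the latency MBeans and apply Zabbix's "Change per second" preprocessing step, which divides the counter delta by the elapsed time between polls. A minimal sketch of that calculation (the function name and values are illustrative, not part of the template or the Zabbix API):

```python
def change_per_second(prev_value, prev_ts, value, ts):
    """Approximate Zabbix 'Change per second' preprocessing:
    rate = (value - prev_value) / (ts - prev_ts).
    Assumes a monotonically increasing counter and ts > prev_ts."""
    if ts <= prev_ts:
        raise ValueError("timestamps must strictly increase between polls")
    return (value - prev_value) / (ts - prev_ts)

# Example: WriteLatency Count grew from 12000 to 12600 over a 60 s poll interval
rate = change_per_second(12000, 0, 12600, 60)
print(rate)  # 10.0 local write requests per second
```

Note that if the counter resets (e.g. the Cassandra node restarts), the delta goes negative; Zabbix discards such samples rather than reporting a negative rate.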
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
There are down nodes in cluster | - |
last(/Apache Cassandra by JMX/jmx["org.apache.cassandra.net:type=FailureDetector","DownEndpointCount"])>0 |
AVERAGE | |
Version has changed | Cassandra version has changed. Ack to close. |
last(/Apache Cassandra by JMX/jmx["org.apache.cassandra.db:type=StorageService","ReleaseVersion"],#1)<>last(/Apache Cassandra by JMX/jmx["org.apache.cassandra.db:type=StorageService","ReleaseVersion"],#2) and length(last(/Apache Cassandra by JMX/jmx["org.apache.cassandra.db:type=StorageService","ReleaseVersion"]))>0 |
INFO | Manual close: YES |
Failed to fetch info data | Zabbix has not received any data for the items during the last 15 minutes. |
nodata(/Apache Cassandra by JMX/jmx["org.apache.cassandra.metrics:type=Storage,name=Load","Count"],15m)=1 |
WARNING | |
Too many storage exceptions | - |
min(/Apache Cassandra by JMX/jmx["org.apache.cassandra.metrics:type=Storage,name=Exceptions","Count"],5m)>0 |
WARNING | |
Many pending tasks | - |
min(/Apache Cassandra by JMX/jmx["org.apache.cassandra.metrics:type=Compaction,name=PendingTasks","Value"],15m)>{$CASSANDRA.PENDING_TASKS.MAX.WARN} |
WARNING | Depends on: - Too many pending tasks |
Too many pending tasks | - |
min(/Apache Cassandra by JMX/jmx["org.apache.cassandra.metrics:type=Compaction,name=PendingTasks","Value"],15m)>{$CASSANDRA.PENDING_TASKS.MAX.HIGH} |
AVERAGE |
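Several triggers above use expressions of the form `min(/host/item,Nm)>threshold`. Because the *minimum* over the window is compared against the threshold, the trigger fires only when every sample in the window exceeds it, which filters out short spikes. A small sketch of that semantics (function name and sample values are illustrative only):

```python
def min_over_window_exceeds(values, threshold):
    """Emulate a Zabbix trigger of the form min(/host/item,15m) > threshold:
    it fires only when EVERY sample collected within the evaluation window
    is above the threshold, so a brief spike does not raise a problem."""
    return min(values) > threshold

# A single spike to 30 does not fire the trigger with threshold 10...
print(min_over_window_exceeds([2, 30, 3], 10))    # False
# ...but a sustained elevation does.
print(min_over_window_exceeds([12, 30, 15], 10))  # True
```

The "Many pending tasks" / "Too many pending tasks" pair uses the same expression with two different thresholds ({$CASSANDRA.PENDING_TASKS.MAX.WARN} and {$CASSANDRA.PENDING_TASKS.MAX.HIGH}); the dependency suppresses the WARNING trigger while the AVERAGE one is active.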
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template or ask for help with it at ZABBIX forums.