For Zabbix version: 6.2 and higher
The template to monitor Apache Zookeeper by Zabbix that works without any external scripts.
Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
This template was tested on:
See Zabbix template operation for basic instructions.
This template works with standalone and cluster instances. Metrics are collected from each Zookeeper node by requests to AdminServer. By default, AdminServer is enabled and listens on port 8080. You can enable or configure the AdminServer parameters according to the official documentation. Don't forget to change the macros {$ZOOKEEPER.COMMAND_URL}, {$ZOOKEEPER.PORT}, and {$ZOOKEEPER.SCHEME} accordingly.
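If AdminServer is disabled on your nodes, a minimal sketch of the relevant `zoo.cfg` lines might look like the following (these are the documented AdminServer properties; the values shown are the defaults this template assumes, so adjust them to your environment):

```
# zoo.cfg - embedded Jetty AdminServer settings (defaults shown)
admin.enableServer=true
admin.serverPort=8080
admin.commandURL=/commands
```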
No specific Zabbix configuration is required.
Name | Description | Default |
---|---|---|
{$ZOOKEEPER.COMMAND_URL} | The URL for listing and issuing commands relative to the root URL (admin.commandURL). | `commands` |
{$ZOOKEEPER.FILE_DESCRIPTORS.MAX.WARN} | Maximum percentage of file descriptors usage alert threshold (for trigger expression). | `85` |
{$ZOOKEEPER.OUTSTANDING_REQ.MAX.WARN} | Maximum number of outstanding requests (for trigger expression). | `10` |
{$ZOOKEEPER.PENDING_SYNCS.MAX.WARN} | Maximum number of pending syncs from the followers (for trigger expression). | `10` |
{$ZOOKEEPER.PORT} | The port the embedded Jetty server listens on (admin.serverPort). | `8080` |
{$ZOOKEEPER.SCHEME} | Request scheme, which may be http or https. | `http` |
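To check that the macro values are correct before linking the template, you can query AdminServer by hand; a sketch, assuming the default macros above and a node reachable as `zk-node-1` (a placeholder hostname):

```
# List the commands AdminServer exposes
curl http://zk-node-1:8080/commands

# Fetch server metrics; on recent ZooKeeper versions the "monitor"
# command mirrors the mntr four-letter word
curl http://zk-node-1:8080/commands/monitor
```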
There are no template links in this template.
Name | Description | Type | Key and additional info |
---|---|---|---|
Clients discovery | Get the list of client connections. Note: depending on the number of client connections, this operation may be expensive (i.e., it may impact server performance). | HTTP_AGENT | zookeeper.clients Preprocessing: - JAVASCRIPT |
Leader metrics discovery | Additional metrics for the leader node. | DEPENDENT | zookeeper.metrics.leader Preprocessing: - JSONPATH - JAVASCRIPT |
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
Zabbix raw items | Zookeeper: Get server metrics | - | HTTP_AGENT | zookeeper.get_metrics |
Zabbix raw items | Zookeeper: Get connections stats | Get information on client connections to the server. Note: depending on the number of client connections, this operation may be expensive (i.e., it may impact server performance). | HTTP_AGENT | zookeeper.get_connections_stats |
Zookeeper | Zookeeper: Server mode | Mode of the server. In an ensemble, this may be either leader or follower; otherwise, it is standalone. | DEPENDENT | zookeeper.server_state Preprocessing: - JSONPATH - DISCARD_UNCHANGED_HEARTBEAT: 1h |
Zookeeper | Zookeeper: Uptime | Uptime that a peer has been in a table leading/following/observing state. | DEPENDENT | zookeeper.uptime Preprocessing: - JSONPATH - MULTIPLIER |
Zookeeper | Zookeeper: Version | Version of the Zookeeper server. | DEPENDENT | zookeeper.version Preprocessing: - JSONPATH - REGEX - DISCARD_UNCHANGED_HEARTBEAT |
Zookeeper | Zookeeper: Approximate data size | Data tree size in bytes. The size includes the znode path and its value. | DEPENDENT | zookeeper.approximate_data_size Preprocessing: - JSONPATH |
Zookeeper | Zookeeper: File descriptors, max | Maximum number of file descriptors that a zookeeper server can open. | DEPENDENT | zookeeper.max_file_descriptor_count Preprocessing: - JSONPATH - DISCARD_UNCHANGED_HEARTBEAT: 1h |
Zookeeper | Zookeeper: File descriptors, open | Number of file descriptors that a zookeeper server has open. | DEPENDENT | zookeeper.open_file_descriptor_count Preprocessing: - JSONPATH |
Zookeeper | Zookeeper: Outstanding requests | The number of queued requests when the server is under load and is receiving more sustained requests than it can process. | DEPENDENT | zookeeper.outstanding_requests Preprocessing: - JSONPATH |
Zookeeper | Zookeeper: Commit per sec | The number of commits performed per second. | DEPENDENT | zookeeper.commit_count.rate Preprocessing: - JSONPATH - CHANGE_PER_SECOND |
Zookeeper | Zookeeper: Diff syncs per sec | Number of diff syncs performed per second. | DEPENDENT | zookeeper.diff_count.rate Preprocessing: - JSONPATH - CHANGE_PER_SECOND |
Zookeeper | Zookeeper: Snap syncs per sec | Number of snap syncs performed per second. | DEPENDENT | zookeeper.snap_count.rate Preprocessing: - JSONPATH - CHANGE_PER_SECOND |
Zookeeper | Zookeeper: Looking per sec | Rate of transitions into the looking state. | DEPENDENT | zookeeper.looking_count.rate Preprocessing: - JSONPATH - CHANGE_PER_SECOND |
Zookeeper | Zookeeper: Alive connections | Number of active clients connected to a zookeeper server. | DEPENDENT | zookeeper.num_alive_connections Preprocessing: - JSONPATH |
Zookeeper | Zookeeper: Global sessions | Number of global sessions. | DEPENDENT | zookeeper.global_sessions Preprocessing: - JSONPATH |
Zookeeper | Zookeeper: Local sessions | Number of local sessions. | DEPENDENT | zookeeper.local_sessions Preprocessing: - JSONPATH |
Zookeeper | Zookeeper: Drop connections per sec | Rate of connection drops. | DEPENDENT | zookeeper.connection_drop_count.rate Preprocessing: - JSONPATH - CHANGE_PER_SECOND |
Zookeeper | Zookeeper: Rejected connections per sec | Rate of connections rejected. | DEPENDENT | zookeeper.connection_rejected.rate Preprocessing: - JSONPATH - CHANGE_PER_SECOND |
Zookeeper | Zookeeper: Revalidate connections per sec | Rate of connection revalidations. | DEPENDENT | zookeeper.connection_revalidate_count.rate Preprocessing: - JSONPATH - CHANGE_PER_SECOND |
Zookeeper | Zookeeper: Revalidate per sec | Rate of revalidations. | DEPENDENT | zookeeper.revalidate_count.rate Preprocessing: - JSONPATH - CHANGE_PER_SECOND |
Zookeeper | Zookeeper: Latency, max | The maximum amount of time it takes for the server to respond to a client request. | DEPENDENT | zookeeper.max_latency Preprocessing: - JSONPATH |
Zookeeper | Zookeeper: Latency, min | The minimum amount of time it takes for the server to respond to a client request. | DEPENDENT | zookeeper.min_latency Preprocessing: - JSONPATH |
Zookeeper | Zookeeper: Latency, avg | The average amount of time it takes for the server to respond to a client request. | DEPENDENT | zookeeper.avg_latency Preprocessing: - JSONPATH |
Zookeeper | Zookeeper: Znode count | The number of znodes in the ZooKeeper namespace (the data). | DEPENDENT | zookeeper.znode_count Preprocessing: - JSONPATH - DISCARD_UNCHANGED_HEARTBEAT: 1h |
Zookeeper | Zookeeper: Ephemeral nodes count | Number of ephemeral nodes that a zookeeper server has in its data tree. | DEPENDENT | zookeeper.ephemerals_count Preprocessing: - JSONPATH |
Zookeeper | Zookeeper: Watch count | Number of watches currently set on the local ZooKeeper process. | DEPENDENT | zookeeper.watch_count Preprocessing: - JSONPATH |
Zookeeper | Zookeeper: Packets sent per sec | The number of zookeeper packets sent from a server per second. | DEPENDENT | zookeeper.packets_sent Preprocessing: - JSONPATH - CHANGE_PER_SECOND |
Zookeeper | Zookeeper: Packets received per sec | The number of zookeeper packets received by a server per second. | DEPENDENT | zookeeper.packets_received.rate Preprocessing: - JSONPATH - CHANGE_PER_SECOND |
Zookeeper | Zookeeper: Bytes received per sec | Number of bytes received per second. | DEPENDENT | zookeeper.bytes_received_count.rate Preprocessing: - JSONPATH - CHANGE_PER_SECOND |
Zookeeper | Zookeeper: Election time, avg | Time between entering and leaving election. | DEPENDENT | zookeeper.avg_election_time Preprocessing: - JAVASCRIPT |
Zookeeper | Zookeeper: Elections | Number of elections that have happened. | DEPENDENT | zookeeper.cnt_election_time Preprocessing: - JAVASCRIPT |
Zookeeper | Zookeeper: Fsync time, avg | Time to fsync the transaction log. | DEPENDENT | zookeeper.avg_fsynctime Preprocessing: - JAVASCRIPT |
Zookeeper | Zookeeper: Fsync | Count of performed fsyncs. | DEPENDENT | zookeeper.cnt_fsynctime Preprocessing: - JAVASCRIPT: `var metrics = JSON.parse(value); return metrics.cnt_fsynctime || metrics.fsynctime_count` |
Zookeeper | Zookeeper: Snapshot write time, avg | Average time to write a snapshot. | DEPENDENT | zookeeper.avg_snapshottime Preprocessing: - JAVASCRIPT |
Zookeeper | Zookeeper: Snapshot writes | Count of performed snapshot writes. | DEPENDENT | zookeeper.cnt_snapshottime Preprocessing: - JAVASCRIPT: `var metrics = JSON.parse(value); return metrics.snapshottime_count || metrics.cnt_snapshottime` |
Zookeeper | Zookeeper: Pending syncs{#SINGLETON} | Number of pending syncs to carry out to ZooKeeper ensemble followers. | DEPENDENT | zookeeper.pending_syncs[{#SINGLETON}] Preprocessing: - JSONPATH |
Zookeeper | Zookeeper: Quorum size{#SINGLETON} | - | DEPENDENT | zookeeper.quorum_size[{#SINGLETON}] Preprocessing: - JSONPATH |
Zookeeper | Zookeeper: Synced followers{#SINGLETON} | Number of synced followers reported when a node server_state is leader. | DEPENDENT | zookeeper.synced_followers[{#SINGLETON}] Preprocessing: - JSONPATH |
Zookeeper | Zookeeper: Synced non-voting follower{#SINGLETON} | Number of synced non-voting followers reported when a node server_state is leader. | DEPENDENT | zookeeper.synced_non_voting_followers[{#SINGLETON}] Preprocessing: - JSONPATH |
Zookeeper | Zookeeper: Synced observers{#SINGLETON} | Number of synced observers. | DEPENDENT | zookeeper.synced_observers[{#SINGLETON}] Preprocessing: - JSONPATH |
Zookeeper | Zookeeper: Learners{#SINGLETON} | Number of learners. | DEPENDENT | zookeeper.learners[{#SINGLETON}] Preprocessing: - JSONPATH |
Zookeeper | Zookeeper client {#TYPE} [{#CLIENT}]: Latency, max | The maximum amount of time it takes for the server to respond to a client request. | DEPENDENT | zookeeper.max_latency[{#TYPE},{#CLIENT}] Preprocessing: - JSONPATH |
Zookeeper | Zookeeper client {#TYPE} [{#CLIENT}]: Latency, min | The minimum amount of time it takes for the server to respond to a client request. | DEPENDENT | zookeeper.min_latency[{#TYPE},{#CLIENT}] Preprocessing: - JSONPATH |
Zookeeper | Zookeeper client {#TYPE} [{#CLIENT}]: Latency, avg | The average amount of time it takes for the server to respond to a client request. | DEPENDENT | zookeeper.avg_latency[{#TYPE},{#CLIENT}] Preprocessing: - JSONPATH |
Zookeeper | Zookeeper client {#TYPE} [{#CLIENT}]: Packets sent per sec | The number of packets sent. | DEPENDENT | zookeeper.packets_sent[{#TYPE},{#CLIENT}] Preprocessing: - JSONPATH - CHANGE_PER_SECOND |
Zookeeper | Zookeeper client {#TYPE} [{#CLIENT}]: Packets received per sec | The number of packets received. | DEPENDENT | zookeeper.packets_received[{#TYPE},{#CLIENT}] Preprocessing: - JSONPATH - CHANGE_PER_SECOND |
Zookeeper | Zookeeper client {#TYPE} [{#CLIENT}]: Outstanding requests | The number of queued requests when the server is under load and is receiving more sustained requests than it can process. | DEPENDENT | zookeeper.outstanding_requests[{#TYPE},{#CLIENT}] Preprocessing: - JSONPATH |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Zookeeper: Server mode has changed | Zookeeper node state has changed. Ack to close. | last(/Zookeeper by HTTP/zookeeper.server_state,#1)<>last(/Zookeeper by HTTP/zookeeper.server_state,#2) and length(last(/Zookeeper by HTTP/zookeeper.server_state))>0 | INFO | Manual close: YES |
Zookeeper: Failed to fetch info data | Zabbix has not received data for the items for the last 10 minutes. | nodata(/Zookeeper by HTTP/zookeeper.uptime,10m)=1 | WARNING | Manual close: YES |
Zookeeper: Version has changed | Zookeeper version has changed. Ack to close. | last(/Zookeeper by HTTP/zookeeper.version,#1)<>last(/Zookeeper by HTTP/zookeeper.version,#2) and length(last(/Zookeeper by HTTP/zookeeper.version))>0 | INFO | Manual close: YES |
Zookeeper: Too many file descriptors used | The number of file descriptors used is more than {$ZOOKEEPER.FILE_DESCRIPTORS.MAX.WARN}% of the available number of file descriptors. | min(/Zookeeper by HTTP/zookeeper.open_file_descriptor_count,5m) * 100 / last(/Zookeeper by HTTP/zookeeper.max_file_descriptor_count) > {$ZOOKEEPER.FILE_DESCRIPTORS.MAX.WARN} | WARNING | |
Zookeeper: Too many queued requests | Number of queued requests in the server. This goes up when the server receives more requests than it can process. | min(/Zookeeper by HTTP/zookeeper.outstanding_requests,5m)>{$ZOOKEEPER.OUTSTANDING_REQ.MAX.WARN} | AVERAGE | Manual close: YES |
Zookeeper: Too many pending syncs | - | min(/Zookeeper by HTTP/zookeeper.pending_syncs[{#SINGLETON}],5m)>{$ZOOKEEPER.PENDING_SYNCS.MAX.WARN} | AVERAGE | Manual close: YES |
Zookeeper: Too few active followers | The number of followers should equal the total size of your ZooKeeper ensemble minus 1 (the leader is not included in the follower count). If the ensemble fails to maintain quorum, all automatic failover features are suspended. | last(/Zookeeper by HTTP/zookeeper.synced_followers[{#SINGLETON}]) < last(/Zookeeper by HTTP/zookeeper.quorum_size[{#SINGLETON}])-1 | AVERAGE | |
Please report any issues with the template at https://support.zabbix.com.
You can also provide feedback, discuss the template or ask for help with it at ZABBIX forums.
For Zabbix version: 6.2 and higher. This template is designed to monitor internal Zabbix metrics on the remote Zabbix server.
Specify the address of the remote Zabbix server by changing the {$ADDRESS} and {$PORT} macros. Don't forget to adjust the `StatsAllowedIP` parameter in the remote server's configuration file to allow the collection of statistics.
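For example, a minimal sketch of the relevant line in the remote server's `zabbix_server.conf`, assuming the monitoring Zabbix server is reachable at 192.0.2.10 (a placeholder address):

```
# zabbix_server.conf on the remote (monitored) Zabbix server:
# allow the monitoring server to request internal statistics
StatsAllowedIP=192.0.2.10
```

Restart the remote Zabbix server after changing the parameter for it to take effect.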
No specific Zabbix configuration is required.
Name | Description | Default |
---|---|---|
{$ADDRESS} | - | `` |
{$PORT} | - | `` |
There are no template links in this template.
Name | Description | Type | Key and additional info |
---|---|---|---|
High availability cluster node discovery | LLD rule with item and trigger prototypes for the node discovery. | DEPENDENT | zabbix.nodes.discovery Preprocessing: - JSONPATH |
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
Cluster | Cluster node [{#NODE.NAME}]: Stats | Provides the statistics of a node. | DEPENDENT | zabbix.nodes.stats[{#NODE.ID}] Preprocessing: - JSONPATH |
Cluster | Cluster node [{#NODE.NAME}]: Address | The IPv4 address of a node. | DEPENDENT | zabbix.nodes.address[{#NODE.ID}] Preprocessing: - JSONPATH - DISCARD_UNCHANGED_HEARTBEAT |
Cluster | Cluster node [{#NODE.NAME}]: Last access time | Last access time. | DEPENDENT | zabbix.nodes.lastaccess.time[{#NODE.ID}] Preprocessing: - JSONPATH |
Cluster | Cluster node [{#NODE.NAME}]: Last access age | The time between the database's `unix_timestamp()` and the last access time. | DEPENDENT | zabbix.nodes.lastaccess.age[{#NODE.ID}] Preprocessing: - JSONPATH |
Cluster | Cluster node [{#NODE.NAME}]: Status | The status of a node. | DEPENDENT | zabbix.nodes.status[{#NODE.ID}] Preprocessing: - JSONPATH - DISCARD_UNCHANGED_HEARTBEAT |
Zabbix raw items | Remote Zabbix server: Zabbix stats | The master item of Zabbix server statistics. | INTERNAL | zabbix[stats,{$ADDRESS},{$PORT}] |
Zabbix server | Remote Zabbix server: Zabbix stats queue over 10m | The number of monitored items in the queue, which are delayed at least by 10 minutes. | INTERNAL | zabbix[stats,{$ADDRESS},{$PORT},queue,10m] Preprocessing: - JSONPATH |
Zabbix server | Remote Zabbix server: Zabbix stats queue | The number of monitored items in the queue, which are delayed at least by 6 seconds. | INTERNAL | zabbix[stats,{$ADDRESS},{$PORT},queue] Preprocessing: - JSONPATH |
Zabbix server | Remote Zabbix server: Utilization of alert manager internal processes, in % | The average percentage of the time during which the alert manager processes have been busy for the last minute. | DEPENDENT | process.alert_manager.avg.busy Preprocessing: - JSONPATH ⛔️ON_FAIL: CUSTOM_ERROR -> No "alert manager" processes started. |
Zabbix server | Remote Zabbix server: Utilization of alert syncer internal processes, in % | The average percentage of the time during which the alert syncer processes have been busy for the last minute. | DEPENDENT | process.alert_syncer.avg.busy Preprocessing: - JSONPATH ⛔️ON_FAIL: CUSTOM_ERROR -> No "alert syncer" processes started. |
Zabbix server | Remote Zabbix server: Utilization of alerter internal processes, in % | The average percentage of the time during which the alerter processes have been busy for the last minute. | DEPENDENT | process.alerter.avg.busy Preprocessing: - JSONPATH ⛔️ON_FAIL: CUSTOM_ERROR -> No "alerter" processes started. |
Zabbix server | Remote Zabbix server: Utilization of availability manager internal processes, in % | The average percentage of the time during which the availability manager processes have been busy for the last minute. | DEPENDENT | process.availability_manager.avg.busy Preprocessing: - JSONPATH ⛔️ON_FAIL: CUSTOM_ERROR -> No "availability manager" processes started. |
Zabbix server | Remote Zabbix server: Utilization of configuration syncer internal processes, in % | The average percentage of the time during which the configuration syncer processes have been busy for the last minute. | DEPENDENT | process.configuration_syncer.avg.busy Preprocessing: - JSONPATH ⛔️ON_FAIL: CUSTOM_ERROR -> No "configuration syncer" processes started. |
Zabbix server | Remote Zabbix server: Utilization of discoverer data collector processes, in % | The average percentage of the time during which the discoverer processes have been busy for the last minute. | DEPENDENT | process.discoverer.avg.busy Preprocessing: - JSONPATH ⛔️ON_FAIL: CUSTOM_ERROR -> No "discoverer" processes started. |
Zabbix server | Remote Zabbix server: Utilization of escalator internal processes, in % | The average percentage of the time during which the escalator processes have been busy for the last minute. | DEPENDENT | process.escalator.avg.busy Preprocessing: - JSONPATH ⛔️ON_FAIL: CUSTOM_ERROR -> No "escalator" processes started. |
Zabbix server | Remote Zabbix server: Utilization of history poller data collector processes, in % | The average percentage of the time during which the history poller processes have been busy for the last minute. | DEPENDENT | process.history_poller.avg.busy Preprocessing: - JSONPATH ⛔️ON_FAIL: CUSTOM_ERROR -> No "history poller" processes started. |
Zabbix server | Remote Zabbix server: Utilization of ODBC poller data collector processes, in % | The average percentage of the time during which the ODBC poller processes have been busy for the last minute. | DEPENDENT | process.odbc_poller.avg.busy Preprocessing: - JSONPATH |
Zabbix server | Remote Zabbix server: Utilization of history syncer internal processes, in % | The average percentage of the time during which the history syncer processes have been busy for the last minute. | DEPENDENT | process.history_syncer.avg.busy Preprocessing: - JSONPATH ⛔️ON_FAIL: CUSTOM_ERROR -> No "history syncer" processes started. |
Zabbix server | Remote Zabbix server: Utilization of housekeeper internal processes, in % | The average percentage of the time during which the housekeeper processes have been busy for the last minute. | DEPENDENT | process.housekeeper.avg.busy Preprocessing: - JSONPATH ⛔️ON_FAIL: CUSTOM_ERROR -> No "housekeeper" processes started. |
Zabbix server | Remote Zabbix server: Utilization of http poller data collector processes, in % | The average percentage of the time during which the http poller processes have been busy for the last minute. | DEPENDENT | process.http_poller.avg.busy Preprocessing: - JSONPATH ⛔️ON_FAIL: CUSTOM_ERROR -> No "http poller" processes started. |
Zabbix server | Remote Zabbix server: Utilization of icmp pinger data collector processes, in % | The average percentage of the time during which the icmp pinger processes have been busy for the last minute. | DEPENDENT | process.icmp_pinger.avg.busy Preprocessing: - JSONPATH ⛔️ON_FAIL: CUSTOM_ERROR -> No "icmp pinger" processes started. |
Zabbix server | Remote Zabbix server: Utilization of ipmi manager internal processes, in % | The average percentage of the time during which the ipmi manager processes have been busy for the last minute. | DEPENDENT | process.ipmi_manager.avg.busy Preprocessing: - JSONPATH ⛔️ON_FAIL: CUSTOM_ERROR -> No "ipmi manager" processes started. |
Zabbix server | Remote Zabbix server: Utilization of ipmi poller data collector processes, in % | The average percentage of the time during which the ipmi poller processes have been busy for the last minute. | DEPENDENT | process.ipmi_poller.avg.busy Preprocessing: - JSONPATH ⛔️ON_FAIL: CUSTOM_ERROR -> No "ipmi poller" processes started. |
Zabbix server | Remote Zabbix server: Utilization of java poller data collector processes, in % | The average percentage of the time during which the java poller processes have been busy for the last minute. | DEPENDENT | process.java_poller.avg.busy Preprocessing: - JSONPATH ⛔️ON_FAIL: CUSTOM_ERROR -> No "java poller" processes started. |
Zabbix server | Remote Zabbix server: Utilization of LLD manager internal processes, in % | The average percentage of the time during which the lld manager processes have been busy for the last minute. | DEPENDENT | process.lld_manager.avg.busy Preprocessing: - JSONPATH ⛔️ON_FAIL: CUSTOM_ERROR -> No "LLD manager" processes started. |
Zabbix server | Remote Zabbix server: Utilization of LLD worker internal processes, in % | The average percentage of the time during which the lld worker processes have been busy for the last minute. | DEPENDENT | process.lld_worker.avg.busy Preprocessing: - JSONPATH ⛔️ON_FAIL: CUSTOM_ERROR -> No "LLD worker" processes started. |
Zabbix server | Remote Zabbix server: Utilization of poller data collector processes, in % | The average percentage of the time during which the poller processes have been busy for the last minute. | DEPENDENT | process.poller.avg.busy Preprocessing: - JSONPATH ⛔️ON_FAIL: CUSTOM_ERROR -> No "poller" processes started. |
Zabbix server | Remote Zabbix server: Utilization of preprocessing worker internal processes, in % | The average percentage of the time during which the preprocessing worker processes have been busy for the last minute. | DEPENDENT | process.preprocessing_worker.avg.busy Preprocessing: - JSONPATH ⛔️ON_FAIL: CUSTOM_ERROR -> No "preprocessing worker" processes started. |
Zabbix server | Remote Zabbix server: Utilization of preprocessing manager internal processes, in % | The average percentage of the time during which the preprocessing manager processes have been busy for the last minute. | DEPENDENT | process.preprocessing_manager.avg.busy Preprocessing: - JSONPATH ⛔️ON_FAIL: CUSTOM_ERROR -> No "preprocessing manager" processes started. |
Zabbix server | Remote Zabbix server: Utilization of proxy poller data collector processes, in % | The average percentage of the time during which the proxy poller processes have been busy for the last minute. | DEPENDENT | process.proxy_poller.avg.busy Preprocessing: - JSONPATH ⛔️ON_FAIL: CUSTOM_ERROR -> No "proxy poller" processes started. |
Zabbix server | Remote Zabbix server: Utilization of report manager internal processes, in % | The average percentage of the time during which the report manager processes have been busy for the last minute. | DEPENDENT | process.report_manager.avg.busy Preprocessing: - JSONPATH ⛔️ON_FAIL: CUSTOM_ERROR -> No "report manager" processes started. |
Zabbix server | Remote Zabbix server: Utilization of report writer internal processes, in % | The average percentage of the time during which the report writer processes have been busy for the last minute. | DEPENDENT | process.report_writer.avg.busy Preprocessing: - JSONPATH ⛔️ON_FAIL: CUSTOM_ERROR -> No "report writer" processes started. |
Zabbix server | Remote Zabbix server: Utilization of self-monitoring internal processes, in % | The average percentage of the time during which the self-monitoring processes have been busy for the last minute. | DEPENDENT | process.self-monitoring.avg.busy Preprocessing: - JSONPATH ⛔️ON_FAIL: CUSTOM_ERROR -> No "self-monitoring" processes started. |
Zabbix server | Remote Zabbix server: Utilization of snmp trapper data collector processes, in % | The average percentage of the time during which the snmp trapper processes have been busy for the last minute. | DEPENDENT | process.snmp_trapper.avg.busy Preprocessing: - JSONPATH ⛔️ON_FAIL: CUSTOM_ERROR -> No "snmp trapper" processes started. |
Zabbix server | Remote Zabbix server: Utilization of task manager internal processes, in % | The average percentage of the time during which the task manager processes have been busy for the last minute. | DEPENDENT | process.task_manager.avg.busy Preprocessing: - JSONPATH ⛔️ON_FAIL: CUSTOM_ERROR -> No "task manager" processes started. |
Zabbix server | Remote Zabbix server: Utilization of timer internal processes, in % | The average percentage of the time during which the timer processes have been busy for the last minute. | DEPENDENT | process.timer.avg.busy Preprocessing: - JSONPATH ⛔️ON_FAIL: CUSTOM_ERROR -> No "timer" processes started. |
Zabbix server | Remote Zabbix server: Utilization of service manager internal processes, in % | The average percentage of the time during which the service manager processes have been busy for the last minute. | DEPENDENT | process.service_manager.avg.busy Preprocessing: - JSONPATH ⛔️ON_FAIL: CUSTOM_ERROR -> No "service manager" processes started. |
Zabbix server | Remote Zabbix server: Utilization of trigger housekeeper internal processes, in % | The average percentage of the time during which the trigger housekeeper processes have been busy for the last minute. | DEPENDENT | process.trigger_housekeeper.avg.busy Preprocessing: - JSONPATH ⛔️ON_FAIL: CUSTOM_ERROR -> No "trigger housekeeper" processes started. |
Zabbix server | Remote Zabbix server: Utilization of trapper data collector processes, in % | The average percentage of the time during which the trapper processes have been busy for the last minute. | DEPENDENT | process.trapper.avg.busy Preprocessing: - JSONPATH ⛔️ON_FAIL: CUSTOM_ERROR -> No "trapper" processes started. |
Zabbix server | Remote Zabbix server: Utilization of unreachable poller data collector processes, in % | The average percentage of the time during which the unreachable poller processes have been busy for the last minute. | DEPENDENT | process.unreachable_poller.avg.busy Preprocessing: - JSONPATH ⛔️ON_FAIL: CUSTOM_ERROR -> No "unreachable poller" processes started. |
Zabbix server | Remote Zabbix server: Utilization of vmware data collector processes, in % | The average percentage of the time during which the vmware collector processes have been busy for the last minute. | DEPENDENT | process.vmware_collector.avg.busy Preprocessing: - JSONPATH ⛔️ON_FAIL: CUSTOM_ERROR -> No "vmware collector" processes started. |
Zabbix server | Remote Zabbix server: Configuration cache, % used | The availability statistics of Zabbix configuration cache. The percentage of used data buffer. | DEPENDENT | rcache.buffer.pused Preprocessing: - JSONPATH |
Zabbix server | Remote Zabbix server: Trend function cache, % of unique requests | The effectiveness statistics of Zabbix trend function cache. The percentage of cached items calculated from the sum of cached items plus requests. A low percentage most likely means that the cache size can be reduced. | DEPENDENT | tcache.pitems Preprocessing: - JSONPATH ⛔️ON_FAIL |
Zabbix server | Remote Zabbix server: Trend function cache, % of misses | The effectiveness statistics of Zabbix trend function cache. The percentage of cache misses. | DEPENDENT | tcache.pmisses Preprocessing: - JSONPATH ⛔️ON_FAIL |
Zabbix server | Remote Zabbix server: Value cache, % used | The availability statistics of Zabbix value cache. The percentage of used data buffer. | DEPENDENT | vcache.buffer.pused Preprocessing: - JSONPATH |
Zabbix server | Remote Zabbix server: Value cache hits | The effectiveness statistics of Zabbix value cache. The number of cache hits (history values taken from the cache). | DEPENDENT | vcache.cache.hits Preprocessing: - JSONPATH - CHANGE_PER_SECOND |
Zabbix server | Remote Zabbix server: Value cache misses | The effectiveness statistics of Zabbix value cache. The number of cache misses (history values taken from the database). | DEPENDENT | vcache.cache.misses Preprocessing: - JSONPATH - CHANGE_PER_SECOND |
Zabbix server | Remote Zabbix server: Value cache operating mode | The operating mode of the value cache. | DEPENDENT | vcache.cache.mode Preprocessing: - JSONPATH |
Zabbix server | Remote Zabbix server: Version | The version of the Zabbix server. | DEPENDENT | version Preprocessing: - JSONPATH - DISCARD_UNCHANGED_HEARTBEAT |
Zabbix server | Remote Zabbix server: VMware cache, % used | The availability statistics of Zabbix vmware cache. The percentage of used data buffer. | DEPENDENT | vmware.buffer.pused Preprocessing: - JSONPATH ⛔️ON_FAIL |
Zabbix server | Remote Zabbix server: History write cache, % used | The statistics and availability of Zabbix write cache. The percentage of used history buffer. The history cache is used to store item values. A high number indicates performance problems on the database side. | DEPENDENT | wcache.history.pused Preprocessing: - JSONPATH |
Zabbix server | Remote Zabbix server: History index cache, % used | The statistics and availability of Zabbix write cache. The percentage of used history index buffer. The history index cache is used to index values stored in the history cache. | DEPENDENT | wcache.index.pused Preprocessing: - JSONPATH |
Zabbix server | Remote Zabbix server: Trend write cache, % used | The statistics and availability of Zabbix write cache. The percentage of used trend buffer. The trend cache stores the aggregate of all items that have received data for the current hour. | DEPENDENT | wcache.trend.pused Preprocessing: - JSONPATH |
Zabbix server | Remote Zabbix server: Number of processed values per second | The statistics and availability of Zabbix write cache. The total number of values processed by Zabbix server or Zabbix proxy, except unsupported items. | DEPENDENT | wcache.values Preprocessing: - JSONPATH - CHANGE_PER_SECOND |
Zabbix server | Remote Zabbix server: Number of processed numeric (float) values per second | The statistics and availability of Zabbix write cache. The number of processed float values. | DEPENDENT | wcache.values.float Preprocessing: - JSONPATH - CHANGE_PER_SECOND |
Zabbix server | Remote Zabbix server: Number of processed log values per second | The statistics and availability of Zabbix write cache. The number of processed log values. | DEPENDENT | wcache.values.log Preprocessing: - JSONPATH - CHANGE_PER_SECOND |
Zabbix server | Remote Zabbix server: Number of processed not supported values per second | The statistics and availability of Zabbix write cache. The number of times the item processing resulted in an item becoming unsupported or keeping that state. | DEPENDENT | wcache.values.not_supported Preprocessing: - JSONPATH - CHANGE_PER_SECOND |
Zabbix server | Remote Zabbix server: Number of processed character values per second | The statistics and availability of Zabbix write cache. The number of processed character/string values. | DEPENDENT | wcache.values.str Preprocessing: - JSONPATH - CHANGE_PER_SECOND |
Zabbix server | Remote Zabbix server: Number of processed text values per second | The statistics and availability of Zabbix write cache. The number of processed text values. | DEPENDENT | wcache.values.text Preprocessing: - JSONPATH - CHANGE_PER_SECOND |
Zabbix server | Remote Zabbix server: LLD queue | The count of values enqueued in the low-level discovery processing queue. | DEPENDENT | lld_queue Preprocessing: - JSONPATH |
Zabbix server | Remote Zabbix server: Preprocessing queue | The count of values enqueued in the preprocessing queue. | DEPENDENT | preprocessing_queue Preprocessing: - JSONPATH |
Zabbix server | Remote Zabbix server: Number of processed numeric (unsigned) values per second | The statistics and availability of Zabbix write cache. The number of processed numeric (unsigned) values. | DEPENDENT | wcache.values.uint Preprocessing: - JSONPATH - CHANGE_PER_SECOND |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Cluster node [{#NODE.NAME}]: Status changed | The state of the node has changed. Confirm to close. | last(/Remote Zabbix server health/zabbix.nodes.status[{#NODE.ID}],#1)<>last(/Remote Zabbix server health/zabbix.nodes.status[{#NODE.ID}],#2) | INFO | Manual close: YES |
Remote Zabbix server: More than 100 items having missing data for more than 10 minutes | More than 100 monitored items are missing data for more than 10 minutes. | min(/Remote Zabbix server health/zabbix[stats,{$ADDRESS},{$PORT},queue,10m],10m)>100 | WARNING | |
Remote Zabbix server: Utilization of alert manager processes is high | - | avg(/Remote Zabbix server health/process.alert_manager.avg.busy,10m)>75 Recovery expression: avg(/Remote Zabbix server health/process.alert_manager.avg.busy,10m)<65 | AVERAGE | |
Remote Zabbix server: Utilization of alert syncer processes is high | - | avg(/Remote Zabbix server health/process.alert_syncer.avg.busy,10m)>75 Recovery expression: avg(/Remote Zabbix server health/process.alert_syncer.avg.busy,10m)<65 | AVERAGE | |
Remote Zabbix server: Utilization of alerter processes is high | - | avg(/Remote Zabbix server health/process.alerter.avg.busy,10m)>75 Recovery expression: avg(/Remote Zabbix server health/process.alerter.avg.busy,10m)<65 | AVERAGE | |
Remote Zabbix server: Utilization of availability manager processes is high | - | avg(/Remote Zabbix server health/process.availability_manager.avg.busy,10m)>75 Recovery expression: avg(/Remote Zabbix server health/process.availability_manager.avg.busy,10m)<65 | AVERAGE | |
Remote Zabbix server: Utilization of configuration syncer processes is high | - | avg(/Remote Zabbix server health/process.configuration_syncer.avg.busy,10m)>75 Recovery expression: avg(/Remote Zabbix server health/process.configuration_syncer.avg.busy,10m)<65 | AVERAGE | |
Remote Zabbix server: Utilization of discoverer processes is high | - | avg(/Remote Zabbix server health/process.discoverer.avg.busy,10m)>75 Recovery expression: avg(/Remote Zabbix server health/process.discoverer.avg.busy,10m)<65 | AVERAGE | |
Remote Zabbix server: Utilization of escalator processes is high | - | avg(/Remote Zabbix server health/process.escalator.avg.busy,10m)>75 Recovery expression: avg(/Remote Zabbix server health/process.escalator.avg.busy,10m)<65 | AVERAGE | |
Remote Zabbix server: Utilization of history poller processes is high | - | avg(/Remote Zabbix server health/process.history_poller.avg.busy,10m)>75 Recovery expression: avg(/Remote Zabbix server health/process.history_poller.avg.busy,10m)<65 | AVERAGE | |
Remote Zabbix server: Utilization of ODBC poller processes is high | - | avg(/Remote Zabbix server health/process.odbc_poller.avg.busy,10m)>75 Recovery expression: avg(/Remote Zabbix server health/process.odbc_poller.avg.busy,10m)<65 | AVERAGE | |
Remote Zabbix server: Utilization of history syncer processes is high | - | avg(/Remote Zabbix server health/process.history_syncer.avg.busy,10m)>75 Recovery expression: avg(/Remote Zabbix server health/process.history_syncer.avg.busy,10m)<65 | AVERAGE | |
Remote Zabbix server: Utilization of housekeeper processes is high | - | avg(/Remote Zabbix server health/process.housekeeper.avg.busy,10m)>75 Recovery expression: avg(/Remote Zabbix server health/process.housekeeper.avg.busy,10m)<65 | AVERAGE | |
Remote Zabbix server: Utilization of http poller processes is high | - | avg(/Remote Zabbix server health/process.http_poller.avg.busy,10m)>75 Recovery expression: avg(/Remote Zabbix server health/process.http_poller.avg.busy,10m)<65 | AVERAGE | |
Remote Zabbix server: Utilization of icmp pinger processes is high | - | avg(/Remote Zabbix server health/process.icmp_pinger.avg.busy,10m)>75 Recovery expression: avg(/Remote Zabbix server health/process.icmp_pinger.avg.busy,10m)<65 | AVERAGE | |
Remote Zabbix server: Utilization of ipmi manager processes is high | - | avg(/Remote Zabbix server health/process.ipmi_manager.avg.busy,10m)>75 Recovery expression: avg(/Remote Zabbix server health/process.ipmi_manager.avg.busy,10m)<65 | AVERAGE | |
Remote Zabbix server: Utilization of ipmi poller processes is high | - | avg(/Remote Zabbix server health/process.ipmi_poller.avg.busy,10m)>75 Recovery expression: avg(/Remote Zabbix server health/process.ipmi_poller.avg.busy,10m)<65 | AVERAGE | |
Remote Zabbix server: Utilization of java poller processes is high | - | avg(/Remote Zabbix server health/process.java_poller.avg.busy,10m)>75 Recovery expression: avg(/Remote Zabbix server health/process.java_poller.avg.busy,10m)<65 | AVERAGE | |
Remote Zabbix server: Utilization of lld manager processes is high | - | avg(/Remote Zabbix server health/process.lld_manager.avg.busy,10m)>75 Recovery expression: avg(/Remote Zabbix server health/process.lld_manager.avg.busy,10m)<65 | AVERAGE | |
Remote Zabbix server: Utilization of lld worker processes is high | - | avg(/Remote Zabbix server health/process.lld_worker.avg.busy,10m)>75 Recovery expression: avg(/Remote Zabbix server health/process.lld_worker.avg.busy,10m)<65 | AVERAGE | |
Remote Zabbix server: Utilization of poller processes is high | - | avg(/Remote Zabbix server health/process.poller.avg.busy,10m)>75 Recovery expression: avg(/Remote Zabbix server health/process.poller.avg.busy,10m)<65 | AVERAGE | |
Remote Zabbix server: Utilization of preprocessing worker processes is high | - | avg(/Remote Zabbix server health/process.preprocessing_worker.avg.busy,10m)>75 Recovery expression: avg(/Remote Zabbix server health/process.preprocessing_worker.avg.busy,10m)<65 | AVERAGE | |
Remote Zabbix server: Utilization of preprocessing manager processes is high | - | avg(/Remote Zabbix server health/process.preprocessing_manager.avg.busy,10m)>75 Recovery expression: avg(/Remote Zabbix server health/process.preprocessing_manager.avg.busy,10m)<65 | AVERAGE | |
Remote Zabbix server: Utilization of proxy poller processes is high | - | avg(/Remote Zabbix server health/process.proxy_poller.avg.busy,10m)>75 Recovery expression: avg(/Remote Zabbix server health/process.proxy_poller.avg.busy,10m)<65 | AVERAGE | |
Remote Zabbix server: Utilization of report manager processes is high | - | avg(/Remote Zabbix server health/process.report_manager.avg.busy,10m)>75 Recovery expression: avg(/Remote Zabbix server health/process.report_manager.avg.busy,10m)<65 | AVERAGE | |
Remote Zabbix server: Utilization of report writer processes is high | - | avg(/Remote Zabbix server health/process.report_writer.avg.busy,10m)>75 Recovery expression: avg(/Remote Zabbix server health/process.report_writer.avg.busy,10m)<65 | AVERAGE | |
Remote Zabbix server: Utilization of self-monitoring processes is high | - | avg(/Remote Zabbix server health/process.self-monitoring.avg.busy,10m)>75 Recovery expression: avg(/Remote Zabbix server health/process.self-monitoring.avg.busy,10m)<65 | AVERAGE | |
Remote Zabbix server: Utilization of snmp trapper processes is high | - | avg(/Remote Zabbix server health/process.snmp_trapper.avg.busy,10m)>75 Recovery expression: avg(/Remote Zabbix server health/process.snmp_trapper.avg.busy,10m)<65 | AVERAGE | |
Remote Zabbix server: Utilization of task manager processes is high | - | avg(/Remote Zabbix server health/process.task_manager.avg.busy,10m)>75 Recovery expression: avg(/Remote Zabbix server health/process.task_manager.avg.busy,10m)<65 | AVERAGE | |
Remote Zabbix server: Utilization of timer processes is high | - | avg(/Remote Zabbix server health/process.timer.avg.busy,10m)>75 Recovery expression: avg(/Remote Zabbix server health/process.timer.avg.busy,10m)<65 | AVERAGE | |
Remote Zabbix server: Utilization of service manager processes is high | - | avg(/Remote Zabbix server health/process.service_manager.avg.busy,10m)>75 Recovery expression: avg(/Remote Zabbix server health/process.service_manager.avg.busy,10m)<65 | AVERAGE | |
Remote Zabbix server: Utilization of trigger housekeeper processes is high | - | avg(/Remote Zabbix server health/process.trigger_housekeeper.avg.busy,10m)>75 Recovery expression: avg(/Remote Zabbix server health/process.trigger_housekeeper.avg.busy,10m)<65 | AVERAGE | |
Remote Zabbix server: Utilization of trapper processes is high | - | avg(/Remote Zabbix server health/process.trapper.avg.busy,10m)>75 Recovery expression: avg(/Remote Zabbix server health/process.trapper.avg.busy,10m)<65 | AVERAGE | |
Remote Zabbix server: Utilization of unreachable poller processes is high | - | avg(/Remote Zabbix server health/process.unreachable_poller.avg.busy,10m)>75 Recovery expression: avg(/Remote Zabbix server health/process.unreachable_poller.avg.busy,10m)<65 | AVERAGE | |
Remote Zabbix server: Utilization of vmware collector processes is high | - | avg(/Remote Zabbix server health/process.vmware_collector.avg.busy,10m)>75 Recovery expression: avg(/Remote Zabbix server health/process.vmware_collector.avg.busy,10m)<65 | AVERAGE | |
Remote Zabbix server: More than 75% used in the configuration cache | Consider increasing `CacheSize` in the zabbix_server.conf configuration file. | max(/Remote Zabbix server health/rcache.buffer.pused,10m)>75 | AVERAGE | |
Remote Zabbix server: More than 95% used in the value cache | Consider increasing `ValueCacheSize` in the zabbix_server.conf configuration file. | max(/Remote Zabbix server health/vcache.buffer.pused,10m)>95 | AVERAGE | |
Remote Zabbix server: Zabbix value cache working in low memory mode | Once the low memory mode has been switched on, the value cache will remain in this state for 24 hours, even if the problem that triggered this mode is resolved sooner. | last(/Remote Zabbix server health/vcache.cache.mode)=1 | HIGH | |
Remote Zabbix server: Version has changed | Zabbix server version has changed. Acknowledge to close manually. | last(/Remote Zabbix server health/version,#1)<>last(/Remote Zabbix server health/version,#2) and length(last(/Remote Zabbix server health/version))>0 | INFO | Manual close: YES |
Remote Zabbix server: More than 75% used in the vmware cache | Consider increasing `VMwareCacheSize` in the zabbix_server.conf configuration file. | max(/Remote Zabbix server health/vmware.buffer.pused,10m)>75 | AVERAGE | |
Remote Zabbix server: More than 75% used in the history cache | Consider increasing `HistoryCacheSize` in the zabbix_server.conf configuration file. | max(/Remote Zabbix server health/wcache.history.pused,10m)>75 | AVERAGE | |
Remote Zabbix server: More than 75% used in the history index cache | Consider increasing `HistoryIndexCacheSize` in the zabbix_server.conf configuration file. | max(/Remote Zabbix server health/wcache.index.pused,10m)>75 | AVERAGE | |
Remote Zabbix server: More than 75% used in the trends cache | Consider increasing `TrendCacheSize` in the zabbix_server.conf configuration file. | max(/Remote Zabbix server health/wcache.trend.pused,10m)>75 | AVERAGE | |
Please report any issues with the template at https://support.zabbix.com.
For Zabbix version: 6.2 and higher. This template is designed to monitor internal Zabbix metrics on the local Zabbix server.
Link this template to the local Zabbix server host.
No specific Zabbix configuration is required.
There are no template links in this template.
Name | Description | Type | Key and additional info |
---|---|---|---|
High availability cluster node discovery | LLD rule with item and trigger prototypes for the node discovery. | DEPENDENT | zabbix.nodes.discovery |
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
Cluster | Cluster node [{#NODE.NAME}]: Stats | Provides the statistics of a node. |
DEPENDENT | zabbix.nodes.stats[{#NODE.ID}] Preprocessing: - JSONPATH: |
Cluster | Cluster node [{#NODE.NAME}]: Address | The IPv4 address of a node. |
DEPENDENT | zabbix.nodes.address[{#NODE.ID}] Preprocessing: - JSONPATH: - DISCARDUNCHANGEDHEARTBEAT: |
Cluster | Cluster node [{#NODE.NAME}]: Last access time | Last access time. |
DEPENDENT | zabbix.nodes.lastaccess.time[{#NODE.ID}] Preprocessing: - JSONPATH: |
Cluster | Cluster node [{#NODE.NAME}]: Last access age | The time between the database's |
DEPENDENT | zabbix.nodes.lastaccess.age[{#NODE.ID}] Preprocessing: - JSONPATH: |
Cluster | Cluster node [{#NODE.NAME}]: Status | The status of a node. |
DEPENDENT | zabbix.nodes.status[{#NODE.ID}] Preprocessing: - JSONPATH: - DISCARDUNCHANGEDHEARTBEAT: |
Zabbix raw items | Zabbix stats cluster | The master item of Zabbix cluster statistics. |
INTERNAL | zabbix[cluster,discovery,nodes] |
Zabbix server | Zabbix server: Queue over 10 minutes | The number of monitored items in the queue, which are delayed at least by 10 minutes. |
INTERNAL | zabbix[queue,10m] |
Zabbix server | Zabbix server: Queue | The number of monitored items in the queue, which are delayed at least by 6 seconds. |
INTERNAL | zabbix[queue] |
Zabbix server | Zabbix server: Utilization of alert manager internal processes, in % | The average percentage of the time during which the alert manager processes have been busy for the last minute. |
INTERNAL | zabbix[process,alert manager,avg,busy] |
Zabbix server | Zabbix server: Utilization of alert syncer internal processes, in % | The average percentage of the time during which the alert syncer processes have been busy for the last minute. |
INTERNAL | zabbix[process,alert syncer,avg,busy] |
Zabbix server | Zabbix server: Utilization of alerter internal processes, in % | The average percentage of the time during which the alerter processes have been busy for the last minute. |
INTERNAL | zabbix[process,alerter,avg,busy] |
Zabbix server | Zabbix server: Utilization of availability manager internal processes, in % | The average percentage of the time during which the availability manager processes have been busy for the last minute. |
INTERNAL | zabbix[process,availability manager,avg,busy] |
Zabbix server | Zabbix server: Utilization of configuration syncer internal processes, in % | The average percentage of the time during which the configuration syncer processes have been busy for the last minute. |
INTERNAL | zabbix[process,configuration syncer,avg,busy] |
Zabbix server | Zabbix server: Utilization of discoverer data collector processes, in % | The average percentage of the time during which the discoverer processes have been busy for the last minute. |
INTERNAL | zabbix[process,discoverer,avg,busy] |
Zabbix server | Zabbix server: Utilization of escalator internal processes, in % | The average percentage of the time during which the escalator processes have been busy for the last minute. |
INTERNAL | zabbix[process,escalator,avg,busy] |
Zabbix server | Zabbix server: Utilization of history poller data collector processes, in % | The average percentage of the time during which the history poller processes have been busy for the last minute. |
INTERNAL | zabbix[process,history poller,avg,busy] |
Zabbix server | Zabbix server: Utilization of ODBC poller data collector processes, in % | The average percentage of the time during which the ODBC poller processes have been busy for the last minute. |
INTERNAL | zabbix[process,odbc poller,avg,busy] |
Zabbix server | Zabbix server: Utilization of history syncer internal processes, in % | The average percentage of the time during which the history syncer processes have been busy for the last minute. |
INTERNAL | zabbix[process,history syncer,avg,busy] |
Zabbix server | Zabbix server: Utilization of housekeeper internal processes, in % | The average percentage of the time during which the housekeeper processes have been busy for the last minute. |
INTERNAL | zabbix[process,housekeeper,avg,busy] |
Zabbix server | Zabbix server: Utilization of http poller data collector processes, in % | The average percentage of the time during which the http poller processes have been busy for the last minute. |
INTERNAL | zabbix[process,http poller,avg,busy] |
Zabbix server | Zabbix server: Utilization of icmp pinger data collector processes, in % | The average percentage of the time during which the icmp pinger processes have been busy for the last minute. |
INTERNAL | zabbix[process,icmp pinger,avg,busy] |
Zabbix server | Zabbix server: Utilization of ipmi manager internal processes, in % | The average percentage of the time during which the ipmi manager processes have been busy for the last minute. |
INTERNAL | zabbix[process,ipmi manager,avg,busy] |
Zabbix server | Zabbix server: Utilization of ipmi poller data collector processes, in % | The average percentage of the time during which the ipmi poller processes have been busy for the last minute. |
INTERNAL | zabbix[process,ipmi poller,avg,busy] |
Zabbix server | Zabbix server: Utilization of java poller data collector processes, in % | The average percentage of the time during which the java poller processes have been busy for the last minute. |
INTERNAL | zabbix[process,java poller,avg,busy] |
Zabbix server | Zabbix server: Utilization of LLD manager internal processes, in % | The average percentage of the time during which the lld manager processes have been busy for the last minute. |
INTERNAL | zabbix[process,lld manager,avg,busy] |
Zabbix server | Zabbix server: Utilization of LLD worker internal processes, in % | The average percentage of the time during which the lld worker processes have been busy for the last minute. |
INTERNAL | zabbix[process,lld worker,avg,busy] |
Zabbix server | Zabbix server: Utilization of poller data collector processes, in % | The average percentage of the time during which the poller processes have been busy for the last minute. |
INTERNAL | zabbix[process,poller,avg,busy] |
Zabbix server | Zabbix server: Utilization of preprocessing worker internal processes, in % | The average percentage of the time during which the preprocessing worker processes have been busy for the last minute. |
INTERNAL | zabbix[process,preprocessing worker,avg,busy] |
Zabbix server | Zabbix server: Utilization of preprocessing manager internal processes, in % | The average percentage of the time during which the preprocessing manager processes have been busy for the last minute. |
INTERNAL | zabbix[process,preprocessing manager,avg,busy] |
Zabbix server | Zabbix server: Utilization of proxy poller data collector processes, in % | The average percentage of the time during which the proxy poller processes have been busy for the last minute. |
INTERNAL | zabbix[process,proxy poller,avg,busy] |
Zabbix server | Zabbix server: Utilization of report manager internal processes, in % | The average percentage of the time during which the report manager processes have been busy for the last minute. |
INTERNAL | zabbix[process,report manager,avg,busy] |
Zabbix server | Zabbix server: Utilization of report writer internal processes, in % | The average percentage of the time during which the report writer processes have been busy for the last minute. |
INTERNAL | zabbix[process,report writer,avg,busy] |
Zabbix server | Zabbix server: Utilization of self-monitoring internal processes, in % | The average percentage of the time during which the self-monitoring processes have been busy for the last minute. |
INTERNAL | zabbix[process,self-monitoring,avg,busy] |
Zabbix server | Zabbix server: Utilization of snmp trapper data collector processes, in % | The average percentage of the time during which the snmp trapper processes have been busy for the last minute. |
INTERNAL | zabbix[process,snmp trapper,avg,busy] |
Zabbix server | Zabbix server: Utilization of task manager internal processes, in % | The average percentage of the time during which the task manager processes have been busy for the last minute. |
INTERNAL | zabbix[process,task manager,avg,busy] |
Zabbix server | Zabbix server: Utilization of timer internal processes, in % | The average percentage of the time during which the timer processes have been busy for the last minute. |
INTERNAL | zabbix[process,timer,avg,busy] |
Zabbix server | Zabbix server: Utilization of service manager internal processes, in % | The average percentage of the time during which the service manager processes have been busy for the last minute. |
INTERNAL | zabbix[process,service manager,avg,busy] |
Zabbix server | Zabbix server: Utilization of trigger housekeeper internal processes, in % | The average percentage of the time during which the trigger housekeeper processes have been busy for the last minute. |
INTERNAL | zabbix[process,trigger housekeeper,avg,busy] |
Zabbix server | Zabbix server: Utilization of trapper data collector processes, in % | The average percentage of the time during which the trapper processes have been busy for the last minute. |
INTERNAL | zabbix[process,trapper,avg,busy] |
Zabbix server | Zabbix server: Utilization of unreachable poller data collector processes, in % | The average percentage of the time during which the unreachable poller processes have been busy for the last minute. |
INTERNAL | zabbix[process,unreachable poller,avg,busy] |
Zabbix server | Zabbix server: Utilization of vmware data collector processes, in % | The average percentage of the time during which the vmware collector processes have been busy for the last minute. |
INTERNAL | zabbix[process,vmware collector,avg,busy] |
Zabbix server | Zabbix server: Configuration cache, % used | The availability statistics of Zabbix configuration cache. The percentage of used data buffer. |
INTERNAL | zabbix[rcache,buffer,pused] |
Zabbix server | Zabbix server: Trend function cache, % of unique requests | The effectiveness statistics of Zabbix trend function cache. The percentage of cached items calculated from the sum of cached items plus requests. Low percentage most likely means that the cache size can be reduced. |
INTERNAL | zabbix[tcache,cache,pitems] |
Zabbix server | Zabbix server: Trend function cache, % of misses | The effectiveness statistics of Zabbix trend function cache. The percentage of cache misses. |
INTERNAL | zabbix[tcache,cache,pmisses] |
Zabbix server | Zabbix server: Value cache, % used | The availability statistics of Zabbix value cache. The percentage of used data buffer. |
INTERNAL | zabbix[vcache,buffer,pused] |
Zabbix server | Zabbix server: Value cache hits | The effectiveness statistics of Zabbix value cache. The number of cache hits (history values taken from the cache). |
INTERNAL | zabbix[vcache,cache,hits] Preprocessing: - CHANGEPERSECOND |
Zabbix server | Zabbix server: Value cache misses | The effectiveness statistics of Zabbix value cache. The number of cache misses (history values taken from the database). |
INTERNAL | zabbix[vcache,cache,misses] Preprocessing: - CHANGEPERSECOND |
Zabbix server | Zabbix server: Value cache operating mode | The operating mode of the value cache. |
INTERNAL | zabbix[vcache,cache,mode] |
Zabbix server | Zabbix server: Version | A version of Zabbix server. |
INTERNAL | zabbix[version] Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
Zabbix server | Zabbix server: VMware cache, % used | The availability statistics of Zabbix vmware cache. The percentage of used data buffer. |
INTERNAL | zabbix[vmware,buffer,pused] |
Zabbix server | Zabbix server: History write cache, % used | The statistics and availability of Zabbix write cache. The percentage of used history buffer. The history cache is used to store item values. A high number indicates performance problems on the database side. |
INTERNAL | zabbix[wcache,history,pused] |
Zabbix server | Zabbix server: History index cache, % used | The statistics and availability of Zabbix write cache. The percentage of used history index buffer. The history index cache is used to index values stored in the history cache. |
INTERNAL | zabbix[wcache,index,pused] |
Zabbix server | Zabbix server: Trend write cache, % used | The statistics and availability of Zabbix write cache. The percentage of used trend buffer. The trend cache stores the aggregate of all items that have received data for the current hour. |
INTERNAL | zabbix[wcache,trend,pused] |
Zabbix server | Zabbix server: Number of processed values per second | The statistics and availability of Zabbix write cache. The total number of values processed by Zabbix server or Zabbix proxy, except unsupported items. |
INTERNAL | zabbix[wcache,values] Preprocessing: - CHANGEPERSECOND |
Zabbix server | Zabbix server: Number of processed numeric (float) values per second | The statistics and availability of Zabbix write cache. The number of processed float values. |
INTERNAL | zabbix[wcache,values,float] Preprocessing: - CHANGEPERSECOND |
Zabbix server | Zabbix server: Number of processed log values per second | The statistics and availability of Zabbix write cache. The number of processed log values. |
INTERNAL | zabbix[wcache,values,log] Preprocessing: - CHANGEPERSECOND |
Zabbix server | Zabbix server: Number of processed not supported values per second | The statistics and availability of Zabbix write cache. The number of times the item processing resulted in an item becoming unsupported or keeping that state. |
INTERNAL | zabbix[wcache,values,not supported] Preprocessing: - CHANGEPERSECOND |
Zabbix server | Zabbix server: Number of processed character values per second | The statistics and availability of Zabbix write cache. The number of processed character/string values. |
INTERNAL | zabbix[wcache,values,str] Preprocessing: - CHANGEPERSECOND |
Zabbix server | Zabbix server: Number of processed text values per second | The statistics and availability of Zabbix write cache. The number of processed text values. |
INTERNAL | zabbix[wcache,values,text] Preprocessing: - CHANGE_PER_SECOND |
Zabbix server | Zabbix server: LLD queue | The count of values enqueued in the low-level discovery processing queue. |
INTERNAL | zabbix[lld_queue] |
Zabbix server | Zabbix server: Preprocessing queue | The count of values enqueued in the preprocessing queue. |
INTERNAL | zabbix[preprocessing_queue] |
Zabbix server | Zabbix server: Number of processed numeric (unsigned) values per second | The statistics and availability of Zabbix write cache. The number of processed numeric (unsigned) values. |
INTERNAL | zabbix[wcache,values,uint] Preprocessing: - CHANGE_PER_SECOND |
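All of the "per second" items above are derived from monotonically increasing counters by the CHANGE_PER_SECOND preprocessing step. A minimal sketch of that computation, with illustrative sample values (not real data):

```python
# Sketch of the CHANGE_PER_SECOND preprocessing step:
# rate = (value - previous value) / (timestamp - previous timestamp).
def change_per_second(prev_value, prev_ts, value, ts):
    if ts <= prev_ts:
        return None  # no time elapsed; Zabbix also discards the very first sample
    if value < prev_value:
        return None  # counter went backwards (e.g. a restart); the step fails in Zabbix
    return (value - prev_value) / (ts - prev_ts)

# Two readings of zabbix[wcache,values] taken 60 seconds apart (illustrative numbers):
print(change_per_second(1_000_000, 0, 1_000_600, 60))  # -> 10.0 values per second
```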
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Cluster node [{#NODE.NAME}]: Status changed | The state of the node has changed. Confirm to close. |
last(/Zabbix server health/zabbix.nodes.status[{#NODE.ID}],#1)<>last(/Zabbix server health/zabbix.nodes.status[{#NODE.ID}],#2) |
INFO | Manual close: YES |
Zabbix server: More than 100 items having missing data for more than 10 minutes | The zabbix[queue,10m] item is collecting data about how many items are missing data for more than 10 minutes. |
min(/Zabbix server health/zabbix[queue,10m],10m)>100 |
WARNING | |
Zabbix server: Utilization of alert manager processes is high | - |
avg(/Zabbix server health/zabbix[process,alert manager,avg,busy],10m)>75 Recovery expression: avg(/Zabbix server health/zabbix[process,alert manager,avg,busy],10m)<65 |
AVERAGE | |
Zabbix server: Utilization of alert syncer processes is high | - |
avg(/Zabbix server health/zabbix[process,alert syncer,avg,busy],10m)>75 Recovery expression: avg(/Zabbix server health/zabbix[process,alert syncer,avg,busy],10m)<65 |
AVERAGE | |
Zabbix server: Utilization of alerter processes is high | - |
avg(/Zabbix server health/zabbix[process,alerter,avg,busy],10m)>75 Recovery expression: avg(/Zabbix server health/zabbix[process,alerter,avg,busy],10m)<65 |
AVERAGE | |
Zabbix server: Utilization of availability manager processes is high | - |
avg(/Zabbix server health/zabbix[process,availability manager,avg,busy],10m)>75 Recovery expression: avg(/Zabbix server health/zabbix[process,availability manager,avg,busy],10m)<65 |
AVERAGE | |
Zabbix server: Utilization of configuration syncer processes is high | - |
avg(/Zabbix server health/zabbix[process,configuration syncer,avg,busy],10m)>75 Recovery expression: avg(/Zabbix server health/zabbix[process,configuration syncer,avg,busy],10m)<65 |
AVERAGE | |
Zabbix server: Utilization of discoverer processes is high | - |
avg(/Zabbix server health/zabbix[process,discoverer,avg,busy],10m)>75 Recovery expression: avg(/Zabbix server health/zabbix[process,discoverer,avg,busy],10m)<65 |
AVERAGE | |
Zabbix server: Utilization of escalator processes is high | - |
avg(/Zabbix server health/zabbix[process,escalator,avg,busy],10m)>75 Recovery expression: avg(/Zabbix server health/zabbix[process,escalator,avg,busy],10m)<65 |
AVERAGE | |
Zabbix server: Utilization of history poller processes is high | - |
avg(/Zabbix server health/zabbix[process,history poller,avg,busy],10m)>75 Recovery expression: avg(/Zabbix server health/zabbix[process,history poller,avg,busy],10m)<65 |
AVERAGE | |
Zabbix server: Utilization of ODBC poller processes is high | - |
avg(/Zabbix server health/zabbix[process,odbc poller,avg,busy],10m)>75 Recovery expression: avg(/Zabbix server health/zabbix[process,odbc poller,avg,busy],10m)<65 |
AVERAGE | |
Zabbix server: Utilization of history syncer processes is high | - |
avg(/Zabbix server health/zabbix[process,history syncer,avg,busy],10m)>75 Recovery expression: avg(/Zabbix server health/zabbix[process,history syncer,avg,busy],10m)<65 |
AVERAGE | |
Zabbix server: Utilization of housekeeper processes is high | - |
avg(/Zabbix server health/zabbix[process,housekeeper,avg,busy],10m)>75 Recovery expression: avg(/Zabbix server health/zabbix[process,housekeeper,avg,busy],10m)<65 |
AVERAGE | |
Zabbix server: Utilization of http poller processes is high | - |
avg(/Zabbix server health/zabbix[process,http poller,avg,busy],10m)>75 Recovery expression: avg(/Zabbix server health/zabbix[process,http poller,avg,busy],10m)<65 |
AVERAGE | |
Zabbix server: Utilization of icmp pinger processes is high | - |
avg(/Zabbix server health/zabbix[process,icmp pinger,avg,busy],10m)>75 Recovery expression: avg(/Zabbix server health/zabbix[process,icmp pinger,avg,busy],10m)<65 |
AVERAGE | |
Zabbix server: Utilization of ipmi manager processes is high | - |
avg(/Zabbix server health/zabbix[process,ipmi manager,avg,busy],10m)>75 Recovery expression: avg(/Zabbix server health/zabbix[process,ipmi manager,avg,busy],10m)<65 |
AVERAGE | |
Zabbix server: Utilization of ipmi poller processes is high | - |
avg(/Zabbix server health/zabbix[process,ipmi poller,avg,busy],10m)>75 Recovery expression: avg(/Zabbix server health/zabbix[process,ipmi poller,avg,busy],10m)<65 |
AVERAGE | |
Zabbix server: Utilization of java poller processes is high | - |
avg(/Zabbix server health/zabbix[process,java poller,avg,busy],10m)>75 Recovery expression: avg(/Zabbix server health/zabbix[process,java poller,avg,busy],10m)<65 |
AVERAGE | |
Zabbix server: Utilization of lld manager processes is high | - |
avg(/Zabbix server health/zabbix[process,lld manager,avg,busy],10m)>75 Recovery expression: avg(/Zabbix server health/zabbix[process,lld manager,avg,busy],10m)<65 |
AVERAGE | |
Zabbix server: Utilization of lld worker processes is high | - |
avg(/Zabbix server health/zabbix[process,lld worker,avg,busy],10m)>75 Recovery expression: avg(/Zabbix server health/zabbix[process,lld worker,avg,busy],10m)<65 |
AVERAGE | |
Zabbix server: Utilization of poller processes is high | - |
avg(/Zabbix server health/zabbix[process,poller,avg,busy],10m)>75 Recovery expression: avg(/Zabbix server health/zabbix[process,poller,avg,busy],10m)<65 |
AVERAGE | |
Zabbix server: Utilization of preprocessing worker processes is high | - |
avg(/Zabbix server health/zabbix[process,preprocessing worker,avg,busy],10m)>75 Recovery expression: avg(/Zabbix server health/zabbix[process,preprocessing worker,avg,busy],10m)<65 |
AVERAGE | |
Zabbix server: Utilization of preprocessing manager processes is high | - |
avg(/Zabbix server health/zabbix[process,preprocessing manager,avg,busy],10m)>75 Recovery expression: avg(/Zabbix server health/zabbix[process,preprocessing manager,avg,busy],10m)<65 |
AVERAGE | |
Zabbix server: Utilization of proxy poller processes is high | - |
avg(/Zabbix server health/zabbix[process,proxy poller,avg,busy],10m)>75 Recovery expression: avg(/Zabbix server health/zabbix[process,proxy poller,avg,busy],10m)<65 |
AVERAGE | |
Zabbix server: Utilization of report manager processes is high | - |
avg(/Zabbix server health/zabbix[process,report manager,avg,busy],10m)>75 Recovery expression: avg(/Zabbix server health/zabbix[process,report manager,avg,busy],10m)<65 |
AVERAGE | |
Zabbix server: Utilization of report writer processes is high | - |
avg(/Zabbix server health/zabbix[process,report writer,avg,busy],10m)>75 Recovery expression: avg(/Zabbix server health/zabbix[process,report writer,avg,busy],10m)<65 |
AVERAGE | |
Zabbix server: Utilization of self-monitoring processes is high | - |
avg(/Zabbix server health/zabbix[process,self-monitoring,avg,busy],10m)>75 Recovery expression: avg(/Zabbix server health/zabbix[process,self-monitoring,avg,busy],10m)<65 |
AVERAGE | |
Zabbix server: Utilization of snmp trapper processes is high | - |
avg(/Zabbix server health/zabbix[process,snmp trapper,avg,busy],10m)>75 Recovery expression: avg(/Zabbix server health/zabbix[process,snmp trapper,avg,busy],10m)<65 |
AVERAGE | |
Zabbix server: Utilization of task manager processes is high | - |
avg(/Zabbix server health/zabbix[process,task manager,avg,busy],10m)>75 Recovery expression: avg(/Zabbix server health/zabbix[process,task manager,avg,busy],10m)<65 |
AVERAGE | |
Zabbix server: Utilization of timer processes is high | - |
avg(/Zabbix server health/zabbix[process,timer,avg,busy],10m)>75 Recovery expression: avg(/Zabbix server health/zabbix[process,timer,avg,busy],10m)<65 |
AVERAGE | |
Zabbix server: Utilization of service manager processes is high | - |
avg(/Zabbix server health/zabbix[process,service manager,avg,busy],10m)>75 Recovery expression: avg(/Zabbix server health/zabbix[process,service manager,avg,busy],10m)<65 |
AVERAGE | |
Zabbix server: Utilization of trigger housekeeper processes is high | - |
avg(/Zabbix server health/zabbix[process,trigger housekeeper,avg,busy],10m)>75 Recovery expression: avg(/Zabbix server health/zabbix[process,trigger housekeeper,avg,busy],10m)<65 |
AVERAGE | |
Zabbix server: Utilization of trapper processes is high | - |
avg(/Zabbix server health/zabbix[process,trapper,avg,busy],10m)>75 Recovery expression: avg(/Zabbix server health/zabbix[process,trapper,avg,busy],10m)<65 |
AVERAGE | |
Zabbix server: Utilization of unreachable poller processes is high | - |
avg(/Zabbix server health/zabbix[process,unreachable poller,avg,busy],10m)>75 Recovery expression: avg(/Zabbix server health/zabbix[process,unreachable poller,avg,busy],10m)<65 |
AVERAGE | |
Zabbix server: Utilization of vmware collector processes is high | - |
avg(/Zabbix server health/zabbix[process,vmware collector,avg,busy],10m)>75 Recovery expression: avg(/Zabbix server health/zabbix[process,vmware collector,avg,busy],10m)<65 |
AVERAGE | |
Zabbix server: More than 75% used in the configuration cache | Consider increasing CacheSize in the zabbix_server.conf configuration file. |
max(/Zabbix server health/zabbix[rcache,buffer,pused],10m)>75 |
AVERAGE | |
Zabbix server: More than 95% used in the value cache | Consider increasing ValueCacheSize in the zabbix_server.conf configuration file. |
max(/Zabbix server health/zabbix[vcache,buffer,pused],10m)>95 |
AVERAGE | |
Zabbix server: Zabbix value cache working in low memory mode | Once the low memory mode has been switched on, the value cache will remain in this state for 24 hours, even if the problem that triggered this mode is resolved sooner. |
last(/Zabbix server health/zabbix[vcache,cache,mode])=1 |
HIGH | |
Zabbix server: Version has changed | Zabbix server version has changed. Acknowledge to close manually. |
last(/Zabbix server health/zabbix[version],#1)<>last(/Zabbix server health/zabbix[version],#2) and length(last(/Zabbix server health/zabbix[version]))>0 |
INFO | Manual close: YES |
Zabbix server: More than 75% used in the vmware cache | Consider increasing VMwareCacheSize in the zabbix_server.conf configuration file. |
max(/Zabbix server health/zabbix[vmware,buffer,pused],10m)>75 |
AVERAGE | |
Zabbix server: More than 75% used in the history cache | Consider increasing HistoryCacheSize in the zabbix_server.conf configuration file. |
max(/Zabbix server health/zabbix[wcache,history,pused],10m)>75 |
AVERAGE | |
Zabbix server: More than 75% used in the history index cache | Consider increasing HistoryIndexCacheSize in the zabbix_server.conf configuration file. |
max(/Zabbix server health/zabbix[wcache,index,pused],10m)>75 |
AVERAGE | |
Zabbix server: More than 75% used in the trends cache | Consider increasing TrendCacheSize in the zabbix_server.conf configuration file. |
max(/Zabbix server health/zabbix[wcache,trend,pused],10m)>75 |
AVERAGE |
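Every utilization trigger above pairs a problem expression (10-minute average busyness above 75%) with a recovery expression (below 65%). The gap between the two thresholds is deliberate hysteresis: once in the problem state, the trigger stays there until utilization drops below the lower bound, so it does not flap around a single threshold. A small sketch of that state machine, with the 75/65 defaults hard-coded as assumptions:

```python
# Hysteresis as used by the utilization triggers: problem at >75, recover at <65.
PROBLEM_THRESHOLD = 75.0   # mirrors avg(...,10m)>75
RECOVERY_THRESHOLD = 65.0  # mirrors avg(...,10m)<65

def next_state(in_problem: bool, avg_busy_10m: float) -> bool:
    if not in_problem:
        return avg_busy_10m > PROBLEM_THRESHOLD
    # Already in PROBLEM: only a drop below the recovery threshold clears it,
    # so values in the 65..75 band keep the trigger active.
    return not (avg_busy_10m < RECOVERY_THRESHOLD)

state = False
for sample in (70.0, 80.0, 72.0, 66.0, 64.0):
    state = next_state(state, sample)
    print(f"{sample:5.1f} -> {'PROBLEM' if state else 'OK'}")
```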
Please report any issues with the template at https://support.zabbix.com.
For Zabbix version: 6.2 and higher
Refer to the vendor documentation.
No specific Zabbix configuration is required.
Name | Description | Default |
---|---|---|
{$ZABBIX.PROXY.ADDRESS} | IP/DNS/network mask list of proxies to be remotely queried (default is 127.0.0.1). |
127.0.0.1 |
{$ZABBIX.PROXY.PORT} | Port of proxy to be remotely queried (default is 10051). |
10051 |
{$ZABBIX.PROXY.UTIL.MAX} | Maximum average percentage of time processes busy in the last minute (default is 75). |
75 |
{$ZABBIX.PROXY.UTIL.MIN} | Minimum average percentage of time processes busy in the last minute (default is 65). |
65 |
There are no template links in this template.
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
Zabbix raw items | Remote Zabbix proxy: Zabbix stats | Zabbix proxy statistics master item. The underlying protocol exchange is sketched after this table. |
INTERNAL | zabbix[stats,{$ZABBIX.PROXY.ADDRESS},{$ZABBIX.PROXY.PORT}] |
Zabbix proxy | Remote Zabbix proxy: Zabbix stats queue over 10m | Number of monitored items in the queue which are delayed at least by 10 minutes. |
INTERNAL | zabbix[stats,{$ZABBIX.PROXY.ADDRESS},{$ZABBIX.PROXY.PORT},queue,10m] Preprocessing: - JSONPATH: |
Zabbix proxy | Remote Zabbix proxy: Zabbix stats queue | Number of monitored items in the queue which are delayed at least by 6 seconds. |
INTERNAL | zabbix[stats,{$ZABBIX.PROXY.ADDRESS},{$ZABBIX.PROXY.PORT},queue] Preprocessing: - JSONPATH: |
Zabbix proxy | Remote Zabbix proxy: Utilization of data sender internal processes, in % | Average percentage of time data sender processes have been busy in the last minute. |
DEPENDENT | process.data_sender.avg.busy Preprocessing: - JSONPATH: ⛔️ON_FAIL: CUSTOM_ERROR -> Processes data sender not started |
Zabbix proxy | Remote Zabbix proxy: Utilization of availability manager internal processes, in % | Average percentage of time availability manager processes have been busy in the last minute. |
DEPENDENT | process.availability_manager.avg.busy Preprocessing: - JSONPATH: ⛔️ON_FAIL: CUSTOM_ERROR -> Processes availability manager not started |
Zabbix proxy | Remote Zabbix proxy: Utilization of configuration syncer internal processes, in % | Average percentage of time configuration syncer processes have been busy in the last minute. |
DEPENDENT | process.configuration_syncer.avg.busy Preprocessing: - JSONPATH: ⛔️ON_FAIL: CUSTOM_ERROR -> Processes configuration syncer not started |
Zabbix proxy | Remote Zabbix proxy: Utilization of discoverer data collector processes, in % | Average percentage of time discoverer processes have been busy in the last minute. |
DEPENDENT | process.discoverer.avg.busy Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
Zabbix proxy | Remote Zabbix proxy: Utilization of heartbeat sender internal processes, in % | Average percentage of time heartbeat sender processes have been busy in the last minute. |
DEPENDENT | process.heartbeat_sender.avg.busy Preprocessing: - JSONPATH: ⛔️ON_FAIL: CUSTOM_ERROR -> Processes heartbeat sender not started |
Zabbix proxy | Remote Zabbix proxy: Utilization of ODBC poller data collector processes, in % | Average percentage of time ODBC poller processes have been busy in the last minute. |
DEPENDENT | process.odbc_poller.avg.busy Preprocessing: - JSONPATH: |
Zabbix proxy | Remote Zabbix proxy: Utilization of history poller data collector processes, in % | Average percentage of time history poller processes have been busy in the last minute. |
DEPENDENT | process.history_poller.avg.busy Preprocessing: - JSONPATH: ⛔️ON_FAIL: CUSTOM_ERROR -> Processes history poller not started |
Zabbix proxy | Remote Zabbix proxy: Utilization of history syncer internal processes, in % | Average percentage of time history syncer processes have been busy in the last minute. |
DEPENDENT | process.history_syncer.avg.busy Preprocessing: - JSONPATH: ⛔️ON_FAIL: CUSTOM_ERROR -> Processes history syncer not started |
Zabbix proxy | Remote Zabbix proxy: Utilization of housekeeper internal processes, in % | Average percentage of time housekeeper processes have been busy in the last minute. |
DEPENDENT | process.housekeeper.avg.busy Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
Zabbix proxy | Remote Zabbix proxy: Utilization of http poller data collector processes, in % | Average percentage of time http poller processes have been busy in the last minute. |
DEPENDENT | process.http_poller.avg.busy Preprocessing: - JSONPATH: ⛔️ON_FAIL: CUSTOM_ERROR -> Processes http poller not started |
Zabbix proxy | Remote Zabbix proxy: Utilization of icmp pinger data collector processes, in % | Average percentage of time icmp pinger processes have been busy in the last minute. |
DEPENDENT | process.icmp_pinger.avg.busy Preprocessing: - JSONPATH: ⛔️ON_FAIL: CUSTOM_ERROR -> Processes icmp pinger not started |
Zabbix proxy | Remote Zabbix proxy: Utilization of ipmi manager internal processes, in % | Average percentage of time ipmi manager processes have been busy in the last minute. |
DEPENDENT | process.ipmi_manager.avg.busy Preprocessing: - JSONPATH: ⛔️ON_FAIL: CUSTOM_ERROR -> Processes ipmi manager not started |
Zabbix proxy | Remote Zabbix proxy: Utilization of ipmi poller data collector processes, in % | Average percentage of time ipmi poller processes have been busy in the last minute. |
DEPENDENT | process.ipmi_poller.avg.busy Preprocessing: - JSONPATH: ⛔️ON_FAIL: CUSTOM_ERROR -> Processes ipmi poller not started |
Zabbix proxy | Remote Zabbix proxy: Utilization of java poller data collector processes, in % | Average percentage of time java poller processes have been busy in the last minute. |
DEPENDENT | process.java_poller.avg.busy Preprocessing: - JSONPATH: ⛔️ON_FAIL: CUSTOM_ERROR -> Processes java poller not started |
Zabbix proxy | Remote Zabbix proxy: Utilization of poller data collector processes, in % | Average percentage of time poller processes have been busy in the last minute. |
DEPENDENT | process.poller.avg.busy Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
Zabbix proxy | Remote Zabbix proxy: Utilization of preprocessing worker internal processes, in % | Average percentage of time preprocessing worker processes have been busy in the last minute. |
DEPENDENT | process.preprocessing_worker.avg.busy Preprocessing: - JSONPATH: ⛔️ON_FAIL: CUSTOM_ERROR -> Processes preprocessing worker not started |
Zabbix proxy | Remote Zabbix proxy: Utilization of preprocessing manager internal processes, in % | Average percentage of time preprocessing manager processes have been busy in the last minute. |
DEPENDENT | process.preprocessing_manager.avg.busy Preprocessing: - JSONPATH: ⛔️ON_FAIL: CUSTOM_ERROR -> Processes preprocessing manager not started |
Zabbix proxy | Remote Zabbix proxy: Utilization of self-monitoring internal processes, in % | Average percentage of time self-monitoring processes have been busy in the last minute. |
DEPENDENT | process.self-monitoring.avg.busy Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
Zabbix proxy | Remote Zabbix proxy: Utilization of snmp trapper data collector processes, in % | Average percentage of time snmp trapper processes have been busy in the last minute. |
DEPENDENT | process.snmp_trapper.avg.busy Preprocessing: - JSONPATH: ⛔️ON_FAIL: CUSTOM_ERROR -> Processes snmp trapper not started |
Zabbix proxy | Remote Zabbix proxy: Utilization of task manager internal processes, in % | Average percentage of time task manager processes have been busy in the last minute. |
DEPENDENT | process.task_manager.avg.busy Preprocessing: - JSONPATH: ⛔️ON_FAIL: CUSTOM_ERROR -> Processes task manager not started |
Zabbix proxy | Remote Zabbix proxy: Utilization of trapper data collector processes, in % | Average percentage of time trapper processes have been busy in the last minute. |
DEPENDENT | process.trapper.avg.busy Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
Zabbix proxy | Remote Zabbix proxy: Utilization of unreachable poller data collector processes, in % | Average percentage of time unreachable poller processes have been busy in the last minute. |
DEPENDENT | process.unreachable_poller.avg.busy Preprocessing: - JSONPATH: ⛔️ON_FAIL: CUSTOM_ERROR -> Processes unreachable poller not started |
Zabbix proxy | Remote Zabbix proxy: Utilization of vmware data collector processes, in % | Average percentage of time vmware collector processes have been busy in the last minute. |
DEPENDENT | process.vmware_collector.avg.busy Preprocessing: - JSONPATH: ⛔️ON_FAIL: CUSTOM_ERROR -> Processes vmware collector not started |
Zabbix proxy | Remote Zabbix proxy: Configuration cache, % used | Availability statistics of Zabbix configuration cache. Percentage of used buffer. |
DEPENDENT | rcache.buffer.pused Preprocessing: - JSONPATH: |
Zabbix proxy | Remote Zabbix proxy: Version | Version of Zabbix proxy. |
DEPENDENT | version Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
Zabbix proxy | Remote Zabbix proxy: VMware cache, % used | Availability statistics of Zabbix vmware cache. Percentage of used buffer. |
DEPENDENT | vmware.buffer.pused Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
Zabbix proxy | Remote Zabbix proxy: History write cache, % used | Statistics and availability of Zabbix write cache. Percentage of used history buffer. History cache is used to store item values. A high number indicates performance problems on the database side. |
DEPENDENT | wcache.history.pused Preprocessing: - JSONPATH: |
Zabbix proxy | Remote Zabbix proxy: History index cache, % used | Statistics and availability of Zabbix write cache. Percentage of used history index buffer. History index cache is used to index values stored in history cache. |
DEPENDENT | wcache.index.pused Preprocessing: - JSONPATH: |
Zabbix proxy | Remote Zabbix proxy: Number of processed values per second | Statistics and availability of Zabbix write cache. Total number of values processed by Zabbix server or Zabbix proxy, except unsupported items. |
DEPENDENT | wcache.values Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
Zabbix proxy | Remote Zabbix proxy: Number of processed numeric (float) values per second | Statistics and availability of Zabbix write cache. Number of processed float values. |
DEPENDENT | wcache.values.float Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
Zabbix proxy | Remote Zabbix proxy: Number of processed log values per second | Statistics and availability of Zabbix write cache. Number of processed log values. |
DEPENDENT | wcache.values.log Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
Zabbix proxy | Remote Zabbix proxy: Number of processed not supported values per second | Statistics and availability of Zabbix write cache. Number of times item processing resulted in item becoming unsupported or keeping that state. |
DEPENDENT | wcache.values.not_supported Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
Zabbix proxy | Remote Zabbix proxy: Number of processed character values per second | Statistics and availability of Zabbix write cache. Number of processed character/string values. |
DEPENDENT | wcache.values.str Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
Zabbix proxy | Remote Zabbix proxy: Number of processed text values per second | Statistics and availability of Zabbix write cache. Number of processed text values. |
DEPENDENT | wcache.values.text Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
Zabbix proxy | Remote Zabbix proxy: Preprocessing queue | Count of values enqueued in the preprocessing queue. |
DEPENDENT | preprocessing_queue Preprocessing: - JSONPATH: |
Zabbix proxy | Remote Zabbix proxy: Number of processed numeric (unsigned) values per second | Statistics and availability of Zabbix write cache. Number of processed numeric (unsigned) values. |
DEPENDENT | wcache.values.uint Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
Zabbix proxy | Remote Zabbix proxy: Required performance | Required performance of Zabbix proxy, in new values per second expected. |
DEPENDENT | required_performance Preprocessing: - JSONPATH: |
Zabbix proxy | Remote Zabbix proxy: Uptime | Uptime of Zabbix proxy process in seconds. |
DEPENDENT | uptime Preprocessing: - JSONPATH: |
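The dependent items above all hang off the zabbix[stats,{$ZABBIX.PROXY.ADDRESS},{$ZABBIX.PROXY.PORT}] master item, which queries the proxy's internal statistics over the Zabbix protocol; each dependent item then extracts one value with a JSONPath step. A rough sketch of that exchange, assuming the querying host is allowed by the proxy's StatsAllowedIP setting; the JSONPath shown at the end is illustrative, not copied verbatim from the template:

```python
import json
import socket
import struct

def zabbix_request(host: str, port: int, payload: dict) -> dict:
    """Send one request over the Zabbix protocol: b"ZBXD" + flags + length + JSON body."""
    body = json.dumps(payload).encode()
    packet = b"ZBXD\x01" + struct.pack("<Q", len(body)) + body
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(packet)
        header = sock.recv(13)                       # ZBXD(4) + flags(1) + length(8)
        (length,) = struct.unpack("<Q", header[5:13])
        data = b""
        while len(data) < length:
            chunk = sock.recv(length - len(data))
            if not chunk:
                break
            data += chunk
    return json.loads(data)

# Ask for internal statistics, as the master item does (macro defaults: 127.0.0.1:10051).
stats = zabbix_request("127.0.0.1", 10051, {"request": "zabbix.stats"})
# A dependent item would now apply a JSONPath, e.g. something like $.data.uptime.
print(stats.get("data", {}).get("uptime"))
```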
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Remote Zabbix proxy: More than 100 items having missing data for more than 10 minutes | zabbix[stats,{$ZABBIX.PROXY.ADDRESS},{$ZABBIX.PROXY.PORT},queue,10m] item is collecting data about how many items are missing data for more than 10 minutes. |
min(/Remote Zabbix proxy health/zabbix[stats,{$ZABBIX.PROXY.ADDRESS},{$ZABBIX.PROXY.PORT},queue,10m],10m)>100 |
WARNING | |
Remote Zabbix proxy: Utilization of data sender processes is high | - |
avg(/Remote Zabbix proxy health/process.data_sender.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"data sender"} Recovery expression: avg(/Remote Zabbix proxy health/process.data_sender.avg.busy,10m)<{$ZABBIX.PROXY.UTIL.MIN:"data sender"} |
AVERAGE | |
Remote Zabbix proxy: Utilization of availability manager processes is high | - |
avg(/Remote Zabbix proxy health/process.availability_manager.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"availability manager"} Recovery expression: avg(/Remote Zabbix proxy health/process.availability_manager.avg.busy,10m)<{$ZABBIX.PROXY.UTIL.MIN:"availability manager"} |
AVERAGE | |
Remote Zabbix proxy: Utilization of configuration syncer processes is high | - |
avg(/Remote Zabbix proxy health/process.configuration_syncer.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"configuration syncer"} Recovery expression: avg(/Remote Zabbix proxy health/process.configuration_syncer.avg.busy,10m)<{$ZABBIX.PROXY.UTIL.MIN:"configuration syncer"} |
AVERAGE | |
Remote Zabbix proxy: Utilization of discoverer processes is high | - |
avg(/Remote Zabbix proxy health/process.discoverer.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"discoverer"} Recovery expression: avg(/Remote Zabbix proxy health/process.discoverer.avg.busy,10m)<{$ZABBIX.PROXY.UTIL.MIN:"discoverer"} |
AVERAGE | |
Remote Zabbix proxy: Utilization of heartbeat sender processes is high | - |
avg(/Remote Zabbix proxy health/process.heartbeat_sender.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"heartbeat sender"} Recovery expression: avg(/Remote Zabbix proxy health/process.heartbeat_sender.avg.busy,10m)<{$ZABBIX.PROXY.UTIL.MIN:"heartbeat sender"} |
AVERAGE | |
Remote Zabbix proxy: Utilization of ODBC poller processes is high | - |
avg(/Remote Zabbix proxy health/process.odbc_poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"ODBC poller"} Recovery expression: avg(/Remote Zabbix proxy health/process.odbc_poller.avg.busy,10m)<{$ZABBIX.PROXY.UTIL.MIN:"ODBC poller"} |
AVERAGE | |
Remote Zabbix proxy: Utilization of history poller processes is high | - |
avg(/Remote Zabbix proxy health/process.history_poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"history poller"} Recovery expression: avg(/Remote Zabbix proxy health/process.history_poller.avg.busy,10m)<{$ZABBIX.PROXY.UTIL.MIN:"history poller"} |
AVERAGE | |
Remote Zabbix proxy: Utilization of history syncer processes is high | - |
avg(/Remote Zabbix proxy health/process.history_syncer.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"history syncer"} Recovery expression: avg(/Remote Zabbix proxy health/process.history_syncer.avg.busy,10m)<{$ZABBIX.PROXY.UTIL.MIN:"history syncer"} |
AVERAGE | |
Remote Zabbix proxy: Utilization of housekeeper processes is high | - |
avg(/Remote Zabbix proxy health/process.housekeeper.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"housekeeper"} Recovery expression: avg(/Remote Zabbix proxy health/process.housekeeper.avg.busy,10m)<{$ZABBIX.PROXY.UTIL.MIN:"housekeeper"} |
AVERAGE | |
Remote Zabbix proxy: Utilization of http poller processes is high | - |
avg(/Remote Zabbix proxy health/process.http_poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"http poller"} Recovery expression: avg(/Remote Zabbix proxy health/process.http_poller.avg.busy,10m)<{$ZABBIX.PROXY.UTIL.MIN:"http poller"} |
AVERAGE | |
Remote Zabbix proxy: Utilization of icmp pinger processes is high | - |
avg(/Remote Zabbix proxy health/process.icmp_pinger.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"icmp pinger"} Recovery expression: avg(/Remote Zabbix proxy health/process.icmp_pinger.avg.busy,10m)<{$ZABBIX.PROXY.UTIL.MIN:"icmp pinger"} |
AVERAGE | |
Remote Zabbix proxy: Utilization of ipmi manager processes is high | - |
avg(/Remote Zabbix proxy health/process.ipmi_manager.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"ipmi manager"} Recovery expression: avg(/Remote Zabbix proxy health/process.ipmi_manager.avg.busy,10m)<{$ZABBIX.PROXY.UTIL.MIN:"ipmi manager"} |
AVERAGE | |
Remote Zabbix proxy: Utilization of ipmi poller processes is high | - |
avg(/Remote Zabbix proxy health/process.ipmi_poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"ipmi poller"} Recovery expression: avg(/Remote Zabbix proxy health/process.ipmi_poller.avg.busy,10m)<{$ZABBIX.PROXY.UTIL.MIN:"ipmi poller"} |
AVERAGE | |
Remote Zabbix proxy: Utilization of java poller processes is high | - |
avg(/Remote Zabbix proxy health/process.java_poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"java poller"} Recovery expression: avg(/Remote Zabbix proxy health/process.java_poller.avg.busy,10m)<{$ZABBIX.PROXY.UTIL.MIN:"java poller"} |
AVERAGE | |
Remote Zabbix proxy: Utilization of poller processes is high | - |
avg(/Remote Zabbix proxy health/process.poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"poller"} Recovery expression: avg(/Remote Zabbix proxy health/process.poller.avg.busy,10m)<{$ZABBIX.PROXY.UTIL.MIN:"poller"} |
AVERAGE | |
Remote Zabbix proxy: Utilization of preprocessing worker processes is high | - |
avg(/Remote Zabbix proxy health/process.preprocessing_worker.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"preprocessing worker"} Recovery expression: avg(/Remote Zabbix proxy health/process.preprocessing_worker.avg.busy,10m)<{$ZABBIX.PROXY.UTIL.MIN:"preprocessing worker"} |
AVERAGE | |
Remote Zabbix proxy: Utilization of preprocessing manager processes is high | - |
avg(/Remote Zabbix proxy health/process.preprocessing_manager.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"preprocessing manager"} Recovery expression: avg(/Remote Zabbix proxy health/process.preprocessing_manager.avg.busy,10m)<{$ZABBIX.PROXY.UTIL.MIN:"preprocessing manager"} |
AVERAGE | |
Remote Zabbix proxy: Utilization of self-monitoring processes is high | - |
avg(/Remote Zabbix proxy health/process.self-monitoring.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"self-monitoring"} Recovery expression: avg(/Remote Zabbix proxy health/process.self-monitoring.avg.busy,10m)<{$ZABBIX.PROXY.UTIL.MIN:"self-monitoring"} |
AVERAGE | |
Remote Zabbix proxy: Utilization of snmp trapper processes is high | - |
avg(/Remote Zabbix proxy health/process.snmp_trapper.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"snmp trapper"} Recovery expression: avg(/Remote Zabbix proxy health/process.snmp_trapper.avg.busy,10m)<{$ZABBIX.PROXY.UTIL.MIN:"snmp trapper"} |
AVERAGE | |
Remote Zabbix proxy: Utilization of task manager processes is high | - |
avg(/Remote Zabbix proxy health/process.task_manager.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"task manager"} Recovery expression: avg(/Remote Zabbix proxy health/process.task_manager.avg.busy,10m)<{$ZABBIX.PROXY.UTIL.MIN:"task manager"} |
AVERAGE | |
Remote Zabbix proxy: Utilization of trapper processes is high | - |
avg(/Remote Zabbix proxy health/process.trapper.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"trapper"} Recovery expression: avg(/Remote Zabbix proxy health/process.trapper.avg.busy,10m)<{$ZABBIX.PROXY.UTIL.MIN:"trapper"} |
AVERAGE | |
Remote Zabbix proxy: Utilization of unreachable poller processes is high | - |
avg(/Remote Zabbix proxy health/process.unreachable_poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"unreachable poller"} Recovery expression: avg(/Remote Zabbix proxy health/process.unreachable_poller.avg.busy,10m)<{$ZABBIX.PROXY.UTIL.MIN:"unreachable poller"} |
AVERAGE | |
Remote Zabbix proxy: Utilization of vmware collector processes is high | - |
avg(/Remote Zabbix proxy health/process.vmware_collector.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"vmware collector"} Recovery expression: avg(/Remote Zabbix proxy health/process.vmware_collector.avg.busy,10m)<{$ZABBIX.PROXY.UTIL.MIN:"vmware collector"} |
AVERAGE | |
Remote Zabbix proxy: More than {$ZABBIX.PROXY.UTIL.MAX}% used in the configuration cache | Consider increasing CacheSize in the zabbix_proxy.conf configuration file. |
max(/Remote Zabbix proxy health/rcache.buffer.pused,10m)>{$ZABBIX.PROXY.UTIL.MAX} |
AVERAGE | |
Remote Zabbix proxy: Version has changed | Remote Zabbix proxy version has changed. Acknowledge to close manually. |
last(/Remote Zabbix proxy health/version,#1)<>last(/Remote Zabbix proxy health/version,#2) and length(last(/Remote Zabbix proxy health/version))>0 |
INFO | Manual close: YES |
Remote Zabbix proxy: More than {$ZABBIX.PROXY.UTIL.MAX}% used in the vmware cache | Consider increasing VMwareCacheSize in the zabbix_proxy.conf configuration file. |
max(/Remote Zabbix proxy health/vmware.buffer.pused,10m)>{$ZABBIX.PROXY.UTIL.MAX} |
AVERAGE | |
Remote Zabbix proxy: More than {$ZABBIX.PROXY.UTIL.MAX}% used in the history cache | Consider increasing HistoryCacheSize in the zabbix_proxy.conf configuration file. |
max(/Remote Zabbix proxy health/wcache.history.pused,10m)>{$ZABBIX.PROXY.UTIL.MAX} |
AVERAGE | |
Remote Zabbix proxy: More than {$ZABBIX.PROXY.UTIL.MAX}% used in the history index cache | Consider increasing HistoryIndexCacheSize in the zabbix_proxy.conf configuration file. |
max(/Remote Zabbix proxy health/wcache.index.pused,10m)>{$ZABBIX.PROXY.UTIL.MAX} |
AVERAGE | |
Remote Zabbix proxy: has been restarted | Uptime is less than 10 minutes. |
last(/Remote Zabbix proxy health/uptime)<10m |
INFO | Manual close: YES |
Please report any issues with the template at https://support.zabbix.com
For Zabbix version: 6.2 and higher
Refer to the vendor documentation.
No specific Zabbix configuration is required.
Name | Description | Default |
---|---|---|
{$ZABBIX.PROXY.UTIL.MAX} | Maximum average percentage of time processes busy in the last minute (default is 75). |
75 |
{$ZABBIX.PROXY.UTIL.MIN} | Minimum average percentage of time processes busy in the last minute (default is 65). |
65 |
There are no template links in this template.
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
Zabbix proxy | Zabbix proxy: Queue over 10 minutes | Number of monitored items in the queue which are delayed at least by 10 minutes. |
INTERNAL | zabbix[queue,10m] |
Zabbix proxy | Zabbix proxy: Queue | Number of monitored items in the queue which are delayed at least by 6 seconds. |
INTERNAL | zabbix[queue] |
Zabbix proxy | Zabbix proxy: Utilization of data sender internal processes, in % | Average percentage of time data sender processes have been busy in the last minute. |
INTERNAL | zabbix[process,data sender,avg,busy] |
Zabbix proxy | Zabbix proxy: Utilization of availability manager internal processes, in % | Average percentage of time availability manager processes have been busy in the last minute. |
INTERNAL | zabbix[process,availability manager,avg,busy] |
Zabbix proxy | Zabbix proxy: Utilization of configuration syncer internal processes, in % | Average percentage of time configuration syncer processes have been busy in the last minute. |
INTERNAL | zabbix[process,configuration syncer,avg,busy] |
Zabbix proxy | Zabbix proxy: Utilization of discoverer data collector processes, in % | Average percentage of time discoverer processes have been busy in the last minute. |
INTERNAL | zabbix[process,discoverer,avg,busy] |
Zabbix proxy | Zabbix proxy: Utilization of heartbeat sender internal processes, in % | Average percentage of time heartbeat sender processes have been busy in the last minute. |
INTERNAL | zabbix[process,heartbeat sender,avg,busy] |
Zabbix proxy | Zabbix proxy: Utilization of ODBC poller data collector processes, in % | Average percentage of time ODBC poller processes have been busy in the last minute. |
INTERNAL | zabbix[process,odbc poller,avg,busy] |
Zabbix proxy | Zabbix proxy: Utilization of history poller data collector processes, in % | Average percentage of time history poller processes have been busy in the last minute. |
INTERNAL | zabbix[process,history poller,avg,busy] |
Zabbix proxy | Zabbix proxy: Utilization of history syncer internal processes, in % | Average percentage of time history syncer processes have been busy in the last minute. |
INTERNAL | zabbix[process,history syncer,avg,busy] |
Zabbix proxy | Zabbix proxy: Utilization of housekeeper internal processes, in % | Average percentage of time housekeeper processes have been busy in the last minute. |
INTERNAL | zabbix[process,housekeeper,avg,busy] |
Zabbix proxy | Zabbix proxy: Utilization of http poller data collector processes, in % | Average percentage of time http poller processes have been busy in the last minute. |
INTERNAL | zabbix[process,http poller,avg,busy] |
Zabbix proxy | Zabbix proxy: Utilization of icmp pinger data collector processes, in % | Average percentage of time icmp pinger processes have been busy in the last minute. |
INTERNAL | zabbix[process,icmp pinger,avg,busy] |
Zabbix proxy | Zabbix proxy: Utilization of ipmi manager internal processes, in % | Average percentage of time ipmi manager processes have been busy in the last minute. |
INTERNAL | zabbix[process,ipmi manager,avg,busy] |
Zabbix proxy | Zabbix proxy: Utilization of ipmi poller data collector processes, in % | Average percentage of time ipmi poller processes have been busy in the last minute. |
INTERNAL | zabbix[process,ipmi poller,avg,busy] |
Zabbix proxy | Zabbix proxy: Utilization of java poller data collector processes, in % | Average percentage of time java poller processes have been busy in the last minute. |
INTERNAL | zabbix[process,java poller,avg,busy] |
Zabbix proxy | Zabbix proxy: Utilization of poller data collector processes, in % | Average percentage of time poller processes have been busy in the last minute. |
INTERNAL | zabbix[process,poller,avg,busy] |
Zabbix proxy | Zabbix proxy: Utilization of preprocessing worker internal processes, in % | Average percentage of time preprocessing worker processes have been busy in the last minute. |
INTERNAL | zabbix[process,preprocessing worker,avg,busy] |
Zabbix proxy | Zabbix proxy: Utilization of preprocessing manager internal processes, in % | Average percentage of time preprocessing manager processes have been busy in the last minute. |
INTERNAL | zabbix[process,preprocessing manager,avg,busy] |
Zabbix proxy | Zabbix proxy: Utilization of self-monitoring internal processes, in % | Average percentage of time self-monitoring processes have been busy in the last minute. |
INTERNAL | zabbix[process,self-monitoring,avg,busy] |
Zabbix proxy | Zabbix proxy: Utilization of snmp trapper data collector processes, in % | Average percentage of time snmp trapper processes have been busy in the last minute. |
INTERNAL | zabbix[process,snmp trapper,avg,busy] |
Zabbix proxy | Zabbix proxy: Utilization of task manager internal processes, in % | Average percentage of time task manager processes have been busy in the last minute. |
INTERNAL | zabbix[process,task manager,avg,busy] |
Zabbix proxy | Zabbix proxy: Utilization of trapper data collector processes, in % | Average percentage of time trapper processes have been busy in the last minute. |
INTERNAL | zabbix[process,trapper,avg,busy] |
Zabbix proxy | Zabbix proxy: Utilization of unreachable poller data collector processes, in % | Average percentage of time unreachable poller processes have been busy in the last minute. |
INTERNAL | zabbix[process,unreachable poller,avg,busy] |
Zabbix proxy | Zabbix proxy: Utilization of vmware data collector processes, in % | Average percentage of time vmware collector processes have been busy in the last minute. |
INTERNAL | zabbix[process,vmware collector,avg,busy] |
Zabbix proxy | Zabbix proxy: Configuration cache, % used | Availability statistics of Zabbix configuration cache. Percentage of used buffer. |
INTERNAL | zabbix[rcache,buffer,pused] |
Zabbix proxy | Zabbix proxy: Version | Version of Zabbix proxy. |
INTERNAL | zabbix[version] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
Zabbix proxy | Zabbix proxy: VMware cache, % used | Availability statistics of Zabbix vmware cache. Percentage of used buffer. |
INTERNAL | zabbix[vmware,buffer,pused] |
Zabbix proxy | Zabbix proxy: History write cache, % used | Statistics and availability of Zabbix write cache. Percentage of used history buffer. History cache is used to store item values. A high number indicates performance problems on the database side. |
INTERNAL | zabbix[wcache,history,pused] |
Zabbix proxy | Zabbix proxy: History index cache, % used | Statistics and availability of Zabbix write cache. Percentage of used history index buffer. History index cache is used to index values stored in history cache. |
INTERNAL | zabbix[wcache,index,pused] |
Zabbix proxy | Zabbix proxy: Number of processed values per second | Statistics and availability of Zabbix write cache. Total number of values processed by Zabbix server or Zabbix proxy, except unsupported items. |
INTERNAL | zabbix[wcache,values] Preprocessing: - CHANGE_PER_SECOND |
Zabbix proxy | Zabbix proxy: Number of processed numeric (float) values per second | Statistics and availability of Zabbix write cache. Number of processed float values. |
INTERNAL | zabbix[wcache,values,float] Preprocessing: - CHANGE_PER_SECOND |
Zabbix proxy | Zabbix proxy: Number of processed log values per second | Statistics and availability of Zabbix write cache. Number of processed log values. |
INTERNAL | zabbix[wcache,values,log] Preprocessing: - CHANGE_PER_SECOND |
Zabbix proxy | Zabbix proxy: Number of processed not supported values per second | Statistics and availability of Zabbix write cache. Number of times item processing resulted in item becoming unsupported or keeping that state. |
INTERNAL | zabbix[wcache,values,not supported] Preprocessing: - CHANGE_PER_SECOND |
Zabbix proxy | Zabbix proxy: Number of processed character values per second | Statistics and availability of Zabbix write cache. Number of processed character/string values. |
INTERNAL | zabbix[wcache,values,str] Preprocessing: - CHANGE_PER_SECOND |
Zabbix proxy | Zabbix proxy: Number of processed text values per second | Statistics and availability of Zabbix write cache. Number of processed text values. |
INTERNAL | zabbix[wcache,values,text] Preprocessing: - CHANGE_PER_SECOND |
Zabbix proxy | Zabbix proxy: Preprocessing queue | Count of values enqueued in the preprocessing queue. |
INTERNAL | zabbix[preprocessing_queue] |
Zabbix proxy | Zabbix proxy: Number of processed numeric (unsigned) values per second | Statistics and availability of Zabbix write cache. Number of processed numeric (unsigned) values. |
INTERNAL | zabbix[wcache,values,uint] Preprocessing: - CHANGE_PER_SECOND |
Zabbix proxy | Zabbix proxy: Values waiting to be sent | Number of values in the proxy history table waiting to be sent to the server. |
INTERNAL | zabbix[proxy_history] |
Zabbix proxy | Zabbix proxy: Required performance | Required performance of Zabbix proxy, in new values per second expected. |
INTERNAL | zabbix[requiredperformance] |
Zabbix proxy | Zabbix proxy: Uptime | Uptime of Zabbix proxy process in seconds. |
INTERNAL | zabbix[uptime] |
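Two items above combine into a useful derived view that the template itself does not ship: dividing the backlog (zabbix[proxy_history], values waiting to be sent) by the required performance (zabbix[requiredperformance], new values per second) estimates how many seconds the proxy is behind the server. A sketch with illustrative numbers:

```python
# Rough proxy lag estimate: backlog (values) / throughput (values per second) = seconds behind.
def proxy_lag_seconds(proxy_history: int, required_nvps: float) -> float:
    if required_nvps <= 0:
        return 0.0  # avoid division by zero when the proxy has nothing to do
    return proxy_history / required_nvps

print(proxy_lag_seconds(12_000, 100.0))  # -> 120.0 seconds of backlog (illustrative)
```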
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Zabbix proxy: More than 100 items having missing data for more than 10 minutes | The zabbix[queue,10m] item is collecting data about how many items are missing data for more than 10 minutes. |
min(/Zabbix proxy health/zabbix[queue,10m],10m)>100 |
WARNING | |
Zabbix proxy: Utilization of data sender processes is high | - |
avg(/Zabbix proxy health/zabbix[process,data sender,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"data sender"} Recovery expression: avg(/Zabbix proxy health/zabbix[process,data sender,avg,busy],10m)<{$ZABBIX.PROXY.UTIL.MIN:"data sender"} |
AVERAGE | |
Zabbix proxy: Utilization of availability manager processes is high | - |
avg(/Zabbix proxy health/zabbix[process,availability manager,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"availability manager"} Recovery expression: avg(/Zabbix proxy health/zabbix[process,availability manager,avg,busy],10m)<{$ZABBIX.PROXY.UTIL.MIN:"availability manager"} |
AVERAGE | |
Zabbix proxy: Utilization of configuration syncer processes is high | - |
avg(/Zabbix proxy health/zabbix[process,configuration syncer,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"configuration syncer"} Recovery expression: avg(/Zabbix proxy health/zabbix[process,configuration syncer,avg,busy],10m)<{$ZABBIX.PROXY.UTIL.MIN:"configuration syncer"} |
AVERAGE | |
Zabbix proxy: Utilization of discoverer processes is high | - |
avg(/Zabbix proxy health/zabbix[process,discoverer,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"discoverer"} Recovery expression: avg(/Zabbix proxy health/zabbix[process,discoverer,avg,busy],10m)<{$ZABBIX.PROXY.UTIL.MIN:"discoverer"} |
AVERAGE | |
Zabbix proxy: Utilization of heartbeat sender processes is high | - |
avg(/Zabbix proxy health/zabbix[process,heartbeat sender,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"heartbeat sender"} Recovery expression: avg(/Zabbix proxy health/zabbix[process,heartbeat sender,avg,busy],10m)<{$ZABBIX.PROXY.UTIL.MIN:"heartbeat sender"} |
AVERAGE | |
Zabbix proxy: Utilization of ODBC poller processes is high | - |
avg(/Zabbix proxy health/zabbix[process,odbc poller,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"ODBC poller"} Recovery expression: avg(/Zabbix proxy health/zabbix[process,odbc poller,avg,busy],10m)<{$ZABBIX.PROXY.UTIL.MIN:"ODBC poller"} |
AVERAGE | |
Zabbix proxy: Utilization of history poller processes is high | - |
avg(/Zabbix proxy health/zabbix[process,history poller,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"history poller"} Recovery expression: avg(/Zabbix proxy health/zabbix[process,history poller,avg,busy],10m)<{$ZABBIX.PROXY.UTIL.MIN:"history poller"} |
AVERAGE | |
Zabbix proxy: Utilization of history syncer processes is high | - |
avg(/Zabbix proxy health/zabbix[process,history syncer,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"history syncer"} Recovery expression: avg(/Zabbix proxy health/zabbix[process,history syncer,avg,busy],10m)<{$ZABBIX.PROXY.UTIL.MIN:"history syncer"} |
AVERAGE | |
Zabbix proxy: Utilization of housekeeper processes is high | - |
avg(/Zabbix proxy health/zabbix[process,housekeeper,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"housekeeper"} Recovery expression: avg(/Zabbix proxy health/zabbix[process,housekeeper,avg,busy],10m)<{$ZABBIX.PROXY.UTIL.MIN:"housekeeper"} |
AVERAGE | |
Zabbix proxy: Utilization of http poller processes is high | - |
avg(/Zabbix proxy health/zabbix[process,http poller,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"http poller"} Recovery expression: avg(/Zabbix proxy health/zabbix[process,http poller,avg,busy],10m)<{$ZABBIX.PROXY.UTIL.MIN:"http poller"} |
AVERAGE | |
Zabbix proxy: Utilization of icmp pinger processes is high | - |
avg(/Zabbix proxy health/zabbix[process,icmp pinger,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"icmp pinger"} Recovery expression: avg(/Zabbix proxy health/zabbix[process,icmp pinger,avg,busy],10m)<{$ZABBIX.PROXY.UTIL.MIN:"icmp pinger"} |
AVERAGE | |
Zabbix proxy: Utilization of ipmi manager processes is high | - |
avg(/Zabbix proxy health/zabbix[process,ipmi manager,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"ipmi manager"} Recovery expression: avg(/Zabbix proxy health/zabbix[process,ipmi manager,avg,busy],10m)<{$ZABBIX.PROXY.UTIL.MIN:"ipmi manager"} |
AVERAGE | |
Zabbix proxy: Utilization of ipmi poller processes is high | - |
avg(/Zabbix proxy health/zabbix[process,ipmi poller,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"ipmi poller"} Recovery expression: avg(/Zabbix proxy health/zabbix[process,ipmi poller,avg,busy],10m)<{$ZABBIX.PROXY.UTIL.MIN:"ipmi poller"} |
AVERAGE | |
Zabbix proxy: Utilization of java poller processes is high | - |
avg(/Zabbix proxy health/zabbix[process,java poller,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"java poller"} Recovery expression: avg(/Zabbix proxy health/zabbix[process,java poller,avg,busy],10m)<{$ZABBIX.PROXY.UTIL.MIN:"java poller"} |
AVERAGE | |
Zabbix proxy: Utilization of poller processes is high | - |
avg(/Zabbix proxy health/zabbix[process,poller,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"poller"} Recovery expression: avg(/Zabbix proxy health/zabbix[process,poller,avg,busy],10m)<{$ZABBIX.PROXY.UTIL.MIN:"poller"} |
AVERAGE | |
Zabbix proxy: Utilization of preprocessing worker processes is high | - |
avg(/Zabbix proxy health/zabbix[process,preprocessing worker,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"preprocessing worker"} Recovery expression: avg(/Zabbix proxy health/zabbix[process,preprocessing worker,avg,busy],10m)<{$ZABBIX.PROXY.UTIL.MIN:"preprocessing worker"} |
AVERAGE | |
Zabbix proxy: Utilization of preprocessing manager processes is high | - |
avg(/Zabbix proxy health/zabbix[process,preprocessing manager,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"preprocessing manager"} Recovery expression: avg(/Zabbix proxy health/zabbix[process,preprocessing manager,avg,busy],10m)<{$ZABBIX.PROXY.UTIL.MIN:"preprocessing manager"} |
AVERAGE | |
Zabbix proxy: Utilization of self-monitoring processes is high | - |
avg(/Zabbix proxy health/zabbix[process,self-monitoring,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"self-monitoring"} Recovery expression: avg(/Zabbix proxy health/zabbix[process,self-monitoring,avg,busy],10m)<{$ZABBIX.PROXY.UTIL.MIN:"self-monitoring"} |
AVERAGE | |
Zabbix proxy: Utilization of snmp trapper processes is high | - |
avg(/Zabbix proxy health/zabbix[process,snmp trapper,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"snmp trapper"} Recovery expression: avg(/Zabbix proxy health/zabbix[process,snmp trapper,avg,busy],10m)<{$ZABBIX.PROXY.UTIL.MIN:"snmp trapper"} |
AVERAGE | |
Zabbix proxy: Utilization of task manager processes is high | - |
avg(/Zabbix proxy health/zabbix[process,task manager,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"task manager"} Recovery expression: avg(/Zabbix proxy health/zabbix[process,task manager,avg,busy],10m)<{$ZABBIX.PROXY.UTIL.MIN:"task manager"} |
AVERAGE | |
Zabbix proxy: Utilization of trapper processes is high | - |
avg(/Zabbix proxy health/zabbix[process,trapper,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"trapper"} Recovery expression: avg(/Zabbix proxy health/zabbix[process,trapper,avg,busy],10m)<{$ZABBIX.PROXY.UTIL.MIN:"trapper"} |
AVERAGE | |
Zabbix proxy: Utilization of unreachable poller processes is high | - |
avg(/Zabbix proxy health/zabbix[process,unreachable poller,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"unreachable poller"} Recovery expression: avg(/Zabbix proxy health/zabbix[process,unreachable poller,avg,busy],10m)<{$ZABBIX.PROXY.UTIL.MIN:"unreachable poller"} |
AVERAGE | |
Zabbix proxy: Utilization of vmware collector processes is high | - |
avg(/Zabbix proxy health/zabbix[process,vmware collector,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"vmware collector"} Recovery expression: avg(/Zabbix proxy health/zabbix[process,vmware collector,avg,busy],10m)<{$ZABBIX.PROXY.UTIL.MIN:"vmware collector"} |
AVERAGE | |
Zabbix proxy: More than {$ZABBIX.PROXY.UTIL.MAX}% used in the configuration cache | Consider increasing CacheSize in the zabbix_proxy.conf configuration file. |
max(/Zabbix proxy health/zabbix[rcache,buffer,pused],10m)>{$ZABBIX.PROXY.UTIL.MAX} |
AVERAGE | |
Zabbix proxy: Version has changed | Zabbix proxy version has changed. Acknowledge to close manually. |
last(/Zabbix proxy health/zabbix[version],#1)<>last(/Zabbix proxy health/zabbix[version],#2) and length(last(/Zabbix proxy health/zabbix[version]))>0 |
INFO | Manual close: YES |
Zabbix proxy: More than {$ZABBIX.PROXY.UTIL.MAX}% used in the vmware cache | Consider increasing VMwareCacheSize in the zabbix_proxy.conf configuration file. |
max(/Zabbix proxy health/zabbix[vmware,buffer,pused],10m)>{$ZABBIX.PROXY.UTIL.MAX} |
AVERAGE | |
Zabbix proxy: More than {$ZABBIX.PROXY.UTIL.MAX}% used in the history cache | Consider increasing HistoryCacheSize in the zabbix_proxy.conf configuration file. |
max(/Zabbix proxy health/zabbix[wcache,history,pused],10m)>{$ZABBIX.PROXY.UTIL.MAX} |
AVERAGE | |
Zabbix proxy: More than {$ZABBIX.PROXY.UTIL.MAX}% used in the history index cache | Consider increasing HistoryIndexCacheSize in the zabbix_proxy.conf configuration file. |
max(/Zabbix proxy health/zabbix[wcache,index,pused],10m)>{$ZABBIX.PROXY.UTIL.MAX} |
AVERAGE | |
Zabbix proxy: has been restarted | Uptime is less than 10 minutes. |
last(/Zabbix proxy health/zabbix[uptime])<10m |
INFO | Manual close: YES |
Please report any issues with the template at https://support.zabbix.com
For Zabbix version: 6.2 and higher
Official JMX Template for WildFly server.
This template was tested on:
See Zabbix template operation for basic instructions.
Metrics are collected by JMX. This template works with standalone and domain instances.
Copy jboss-client.jar from /(wildfly,EAP,Jboss,AS)/bin/client into the directory /usr/share/zabbix-java-gateway/lib.
No specific Zabbix configuration is required.
Name | Description | Default |
---|---|---|
{$WILDFLY.CONN.USAGE.WARN.MAX} | The maximum connection usage percent for trigger expression. |
80 |
{$WILDFLY.CONN.WAIT.MAX.WARN} | The maximum number of waiting connections for trigger expression. |
300 |
{$WILDFLY.DEPLOYMENT.MATCHES} | Filter of discoverable deployments |
.* |
{$WILDFLY.DEPLOYMENT.NOT_MATCHES} | Filter to exclude discovered deployments |
CHANGE_IF_NEEDED |
{$WILDFLY.JMX.PROTOCOL} | - |
remote+http |
{$WILDFLY.PASSWORD} | - |
zabbix |
{$WILDFLY.USER} | - |
zabbix |
There are no template links in this template.
Name | Description | Type | Key and additional info |
---|---|---|---|
Deployments discovery | Discovery of deployment metrics. The filter logic is sketched after this table. |
JMX | jmx.get[beans,"jboss.as.expr:deployment=*"] Filter: AND- {#DEPLOYMENT} MATCHESREGEX - {#DEPLOYMENT} NOTMATCHES_REGEX |
JDBC metrics discovery | - |
JMX | jmx.get[beans,"jboss.as:subsystem=datasources,data-source=*,statistics=jdbc"] |
Pools metrics discovery | - |
JMX | jmx.get[beans,"jboss.as:subsystem=datasources,data-source=*,statistics=pool"] |
Undertow metrics discovery | - |
JMX | jmx.get[beans,"jboss.as:subsystem=undertow,server=,http-listener="] |
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
WildFly | WildFly: Launch type | The manner in which the server process was launched. Either "DOMAIN" for a domain mode server launched by a Host Controller, "STANDALONE" for a standalone server launched from the command line, or "EMBEDDED" for a standalone server launched as an embedded part of an application running in the same virtual machine. |
JMX | jmx["jboss.as:management-root=server","launchType"] Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
WildFly | WildFly: Name | For standalone mode: The name of this server. If not set, defaults to the runtime value of InetAddress.getLocalHost().getHostName(). For domain mode: The name given to this domain |
JMX | jmx["jboss.as:management-root=server","name"] Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
WildFly | WildFly: Process type | The type of process represented by this root resource. |
JMX | jmx["jboss.as:management-root=server","processType"] Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
WildFly | WildFly: Runtime configuration state | The current persistent configuration state, one of starting, ok, reload-required, restart-required, stopping or stopped. |
JMX | jmx["jboss.as:management-root=server","runtimeConfigurationState"] Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
WildFly | WildFly: Server controller state | The current state of the server controller; either STARTING, RUNNING, RESTART_REQUIRED, RELOAD_REQUIRED or STOPPING. |
JMX | jmx["jboss.as:management-root=server","serverState"] Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
WildFly | WildFly: Version | The version of the WildFly Core based product release |
JMX | jmx["jboss.as:management-root=server","productVersion"] Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
WildFly | WildFly: Uptime | WildFly server uptime. |
JMX | jmx["java.lang:type=Runtime","Uptime"] Preprocessing: - MULTIPLIER: |
WildFly | WildFly: Transactions: Total, rate | The total number of transactions (top-level and nested) created per second. |
JMX | jmx["jboss.as:subsystem=transactions","numberOfTransactions"] Preprocessing: - CHANGEPERSECOND |
WildFly | WildFly: Transactions: Aborted, rate | The number of aborted (i.e. rolled back) transactions per second. |
JMX | jmx["jboss.as:subsystem=transactions","numberOfAbortedTransactions"] Preprocessing: - CHANGEPERSECOND |
WildFly | WildFly: Transactions: Application rollbacks, rate | The number of transactions that have been rolled back by application request. This includes those that timeout, since the timeout behavior is considered an attribute of the application configuration. |
JMX | jmx["jboss.as:subsystem=transactions","numberOfApplicationRollbacks"] Preprocessing: - CHANGEPERSECOND |
WildFly | WildFly: Transactions: Committed, rate | The number of committed transactions |
JMX | jmx["jboss.as:subsystem=transactions","numberOfCommittedTransactions"] Preprocessing: - CHANGEPERSECOND |
WildFly | WildFly: Transactions: Heuristics, rate | The number of transactions which have terminated with heuristic outcomes. |
JMX | jmx["jboss.as:subsystem=transactions","numberOfHeuristics"] Preprocessing: - CHANGEPERSECOND |
WildFly | WildFly: Transactions: Current | The number of transactions that have begun but not yet terminated. |
JMX | jmx["jboss.as:subsystem=transactions","numberOfInflightTransactions"] |
WildFly | WildFly: Transactions: Nested, rate | The total number of nested (sub) transactions created per second. |
JMX | jmx["jboss.as:subsystem=transactions","numberOfNestedTransactions"] Preprocessing: - CHANGEPERSECOND |
WildFly | WildFly: Transactions: ResourceRollbacks, rate | The number of transactions that rolled back due to resource (participant) failure. |
JMX | jmx["jboss.as:subsystem=transactions","numberOfResourceRollbacks"] Preprocessing: - CHANGEPERSECOND |
WildFly | WildFly: Transactions: System rollbacks, rate | The number of transactions that have been rolled back due to internal system errors. |
JMX | jmx["jboss.as:subsystem=transactions","numberOfSystemRollbacks"] Preprocessing: - CHANGEPERSECOND |
WildFly | WildFly: Transactions: Timed out, rate | The number of transactions that have rolled back due to timeout. |
JMX | jmx["jboss.as:subsystem=transactions","numberOfTimedOutTransactions"] Preprocessing: - CHANGEPERSECOND |
WildFly | WildFly deployment [{#DEPLOYMENT}]: Status | The current runtime status of a deployment. Possible status modes are OK, FAILED, and STOPPED. FAILED indicates a dependency is missing or a service could not start. STOPPED indicates that the deployment was not enabled or was manually stopped. |
JMX | jmx["{#JMXOBJ}",status] Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
WildFly | WildFly deployment [{#DEPLOYMENT}]: Enabled | Boolean indicating whether the deployment content is currently deployed in the runtime (or should be deployed in the runtime the next time the server starts.) |
JMX | jmx["{#JMXOBJ}",enabled] Preprocessing: - BOOLTODECIMAL - DISCARDUNCHANGEDHEARTBEAT: |
WildFly | WildFly deployment [{#DEPLOYMENT}]: Managed | Indicates if the deployment is managed (aka uses the ContentRepository). |
JMX | jmx["{#JMXOBJ}",managed] Preprocessing: - BOOLTODECIMAL - DISCARDUNCHANGEDHEARTBEAT: |
WildFly | WildFly deployment [{#DEPLOYMENT}]: Persistent | Indicates if the deployment is persistent, i.e. whether its presence is recorded in the persistent server configuration. |
JMX | jmx["{#JMXOBJ}",persistent] Preprocessing: - BOOLTODECIMAL - DISCARDUNCHANGEDHEARTBEAT: |
WildFly | WildFly deployment [{#DEPLOYMENT}]: Enabled time | The time when the deployment was last enabled. |
JMX | jmx["{#JMXOBJ}",enabledTime] Preprocessing: - MULTIPLIER: - DISCARDUNCHANGEDHEARTBEAT: |
WildFly | WildFly {#JMXDATASOURCE}: Cache access, rate | The number of times that the statement cache was accessed per second. |
JMX | jmx["{#JMXOBJ}",PreparedStatementCacheAccessCount] Preprocessing: - CHANGEPERSECOND |
WildFly | WildFly {#JMXDATASOURCE}: Cache add, rate | The number of statements added to the statement cache per second. |
JMX | jmx["{#JMXOBJ}",PreparedStatementCacheAddCount] Preprocessing: - CHANGEPERSECOND |
WildFly | WildFly {#JMXDATASOURCE}: Cache current size | The number of prepared and callable statements currently cached in the statement cache. |
JMX | jmx["{#JMXOBJ}",PreparedStatementCacheCurrentSize] |
WildFly | WildFly {#JMXDATASOURCE}: Cache delete, rate | The number of statements discarded from the cache per second. |
JMX | jmx["{#JMXOBJ}",PreparedStatementCacheDeleteCount] Preprocessing: - CHANGEPERSECOND |
WildFly | WildFly {#JMXDATASOURCE}: Cache hit, rate | The number of times that statements from the cache were used per second. |
JMX | jmx["{#JMXOBJ}",PreparedStatementCacheHitCount] Preprocessing: - CHANGEPERSECOND |
WildFly | WildFly {#JMXDATASOURCE}: Cache miss, rate | The number of times that a statement request could not be satisfied with a statement from the cache per second. |
JMX | jmx["{#JMXOBJ}",PreparedStatementCacheMissCount] Preprocessing: - CHANGEPERSECOND |
WildFly | WildFly {#JMXDATASOURCE}: Statistics enabled | Define whether runtime statistics are enabled or not. |
JMX | jmx["{#JMXOBJ}",statisticsEnabled] Preprocessing: - BOOLTODECIMAL - DISCARDUNCHANGEDHEARTBEAT: |
WildFly | WildFly {#JMXDATASOURCE}: Connections: Active | The number of open connections. |
JMX | jmx["{#JMXOBJ}",ActiveCount] |
WildFly | WildFly {#JMXDATASOURCE}: Connections: Available | The number of available connections in the pool. |
JMX | jmx["{#JMXOBJ}",AvailableCount] |
WildFly | WildFly {#JMXDATASOURCE}: Blocking time, avg | The average time spent blocking while waiting for a connection from the pool. |
JMX | jmx["{#JMXOBJ}",AverageBlockingTime] |
WildFly | WildFly {#JMXDATASOURCE}: Connections: Creating time, avg | The average time spent creating a physical connection. |
JMX | jmx["{#JMXOBJ}",AverageCreationTime] |
WildFly | WildFly {#JMXDATASOURCE}: Connections: Get time, avg | The average time spent obtaining a physical connection. |
JMX | jmx["{#JMXOBJ}",AverageGetTime] |
WildFly | WildFly {#JMXDATASOURCE}: Connections: Pool time, avg | The average time for a physical connection spent in the pool. |
JMX | jmx["{#JMXOBJ}",AveragePoolTime] |
WildFly | WildFly {#JMXDATASOURCE}: Connections: Usage time, avg | The average time spent using a physical connection. |
JMX | jmx["{#JMXOBJ}",AverageUsageTime] |
WildFly | WildFly {#JMXDATASOURCE}: Connections: Blocking failure, rate | The number of failures trying to obtain a physical connection per second. |
JMX | jmx["{#JMXOBJ}",BlockingFailureCount] Preprocessing: - CHANGEPERSECOND |
WildFly | WildFly {#JMXDATASOURCE}: Connections: Created, rate | The number of physical connections created per second. |
JMX | jmx["{#JMXOBJ}",CreatedCount] Preprocessing: - CHANGEPERSECOND |
WildFly | WildFly {#JMXDATASOURCE}: Connections: Destroyed, rate | The number of physical connections destroyed per second. |
JMX | jmx["{#JMXOBJ}",DestroyedCount] Preprocessing: - CHANGEPERSECOND |
WildFly | WildFly {#JMXDATASOURCE}: Connections: Idle | The number of physical connections currently idle. |
JMX | jmx["{#JMXOBJ}",IdleCount] |
WildFly | WildFly {#JMXDATASOURCE}: Connections: In use | The number of physical connections currently in use. |
JMX | jmx["{#JMXOBJ}",InUseCount] |
WildFly | WildFly {#JMXDATASOURCE}: Connections: Used, max | The maximum number of connections used. |
JMX | jmx["{#JMXOBJ}",MaxUsedCount] |
WildFly | WildFly {#JMXDATASOURCE}: Statistics enabled | Define whether runtime statistics are enabled or not. |
JMX | jmx["{#JMXOBJ}",statisticsEnabled] Preprocessing: - BOOLTODECIMAL - DISCARDUNCHANGEDHEARTBEAT: |
WildFly | WildFly {#JMXDATASOURCE}: Connections: Timed out, rate | The number of connections that timed out per second. |
JMX | jmx["{#JMXOBJ}",TimedOut] Preprocessing: - CHANGEPERSECOND |
WildFly | WildFly {#JMXDATASOURCE}: Connections: Wait | The number of requests that had to wait to obtain a physical connection. |
JMX | jmx["{#JMXOBJ}",WaitCount] |
WildFly | WildFly {#JMXDATASOURCE}: XA: Commit time, avg | The average time for a XAResource commit invocation. |
JMX | jmx["{#JMXOBJ}",XACommitAverageTime] |
WildFly | WildFly {#JMXDATASOURCE}: XA: Commit, rate | The number of XAResource commit invocations per second. |
JMX | jmx["{#JMXOBJ}",XACommitCount] Preprocessing: - CHANGEPERSECOND |
WildFly | WildFly {#JMXDATASOURCE}: XA: End time, avg | The average time for a XAResource end invocation. |
JMX | jmx["{#JMXOBJ}",XAEndAverageTime] |
WildFly | WildFly {#JMXDATASOURCE}: XA: End, rate | The number of XAResource end invocations per second. |
JMX | jmx["{#JMXOBJ}",XAEndCount] Preprocessing: - CHANGEPERSECOND |
WildFly | WildFly {#JMXDATASOURCE}: XA: Forget time, avg | The average time for a XAResource forget invocation. |
JMX | jmx["{#JMXOBJ}",XAForgetAverageTime] |
WildFly | WildFly {#JMXDATASOURCE}: XA: Forget, rate | The number of XAResource forget invocations per second. |
JMX | jmx["{#JMXOBJ}",XAForgetCount] Preprocessing: - CHANGEPERSECOND |
WildFly | WildFly {#JMXDATASOURCE}: XA: Prepare time, avg | The average time for a XAResource prepare invocation. |
JMX | jmx["{#JMXOBJ}",XAPrepareAverageTime] |
WildFly | WildFly {#JMXDATASOURCE}: XA: Prepare, rate | The number of XAResource prepare invocations per second. |
JMX | jmx["{#JMXOBJ}",XAPrepareCount] Preprocessing: - CHANGEPERSECOND |
WildFly | WildFly {#JMXDATASOURCE}: XA: Recover time, avg | The average time for a XAResource recover invocation. |
JMX | jmx["{#JMXOBJ}",XARecoverAverageTime] |
WildFly | WildFly {#JMXDATASOURCE}: XA: Recover, rate | The number of XAResource recover invocations per second. |
JMX | jmx["{#JMXOBJ}",XARecoverCount] Preprocessing: - CHANGEPERSECOND |
WildFly | WildFly {#JMXDATASOURCE}: XA: Rollback time, avg | The average time for a XAResource rollback invocation. |
JMX | jmx["{#JMXOBJ}",XARollbackAverageTime] |
WildFly | WildFly {#JMXDATASOURCE}: XA: Rollback, rate | The number of XAResource rollback invocations per second. |
JMX | jmx["{#JMXOBJ}",XARollbackCount] Preprocessing: - CHANGEPERSECOND |
WildFly | WildFly {#JMXDATASOURCE}: XA: Start time, avg | The average time for a XAResource start invocation. |
JMX | jmx["{#JMXOBJ}",XAStartAverageTime] |
WildFly | WildFly {#JMXDATASOURCE}: XA: Start, rate | The number of XAResource start invocations per second. |
JMX | jmx["{#JMXOBJ}",XAStartCount] Preprocessing: - CHANGEPERSECOND |
WildFly | WildFly listener {#HTTP_LISTENER}: Errors, rate | The number of 500 responses that have been sent by this listener per second. |
JMX | jmx["{#JMXOBJ}",errorCount] Preprocessing: - CHANGEPERSECOND |
WildFly | WildFly listener {#HTTP_LISTENER}: Requests, rate | The number of requests this listener has served per second. |
JMX | jmx["{#JMXOBJ}",requestCount] Preprocessing: - CHANGEPERSECOND |
WildFly | WildFly listener {#HTTP_LISTENER}: Bytes sent, rate | The number of bytes that have been sent out on this listener per second. |
JMX | jmx["{#JMXOBJ}",bytesSent] Preprocessing: - CHANGEPERSECOND |
WildFly | WildFly listener {#HTTP_LISTENER}: Bytes received, rate | The number of bytes that have been received by this listener per second. |
JMX | jmx["{#JMXOBJ}",bytesReceived] Preprocessing: - CHANGEPERSECOND |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
WildFly: Server needs to restart for configuration change. | - |
find(/WildFly Server by JMX/jmx["jboss.as:management-root=server","runtimeConfigurationState"],,"like","ok")=0 |
WARNING | |
WildFly: Server controller is not in RUNNING state | - |
find(/WildFly Server by JMX/jmx["jboss.as:management-root=server","serverState"],,"like","running")=0 |
WARNING | Depends on: - WildFly: Server needs to restart for configuration change. |
WildFly: Version has changed | WildFly version has changed. Ack to close. |
last(/WildFly Server by JMX/jmx["jboss.as:management-root=server","productVersion"],#1)<>last(/WildFly Server by JMX/jmx["jboss.as:management-root=server","productVersion"],#2) and length(last(/WildFly Server by JMX/jmx["jboss.as:management-root=server","productVersion"]))>0 |
INFO | Manual close: YES |
WildFly: has been restarted | Uptime is less than 10 minutes. |
last(/WildFly Server by JMX/jmx["java.lang:type=Runtime","Uptime"])<10m |
INFO | Manual close: YES |
WildFly: Failed to fetch info data | Zabbix has not received data for items for the last 15 minutes |
nodata(/WildFly Server by JMX/jmx["java.lang:type=Runtime","Uptime"],15m)=1 |
WARNING | |
WildFly deployment [{#DEPLOYMENT}]: Deployment status has changed | Deployment status has changed. Ack to close. |
last(/WildFly Server by JMX/jmx["{#JMXOBJ}",status],#1)<>last(/WildFly Server by JMX/jmx["{#JMXOBJ}",status],#2) and length(last(/WildFly Server by JMX/jmx["{#JMXOBJ}",status]))>0 |
WARNING | Manual close: YES |
WildFly {#JMXDATASOURCE}: JDBC monitoring statistic is not enabled | - |
last(/WildFly Server by JMX/jmx["{#JMXOBJ}",statisticsEnabled])=0 |
INFO | |
WildFly {#JMXDATASOURCE}: There are no active connections for 5m | - |
max(/WildFly Server by JMX/jmx["{#JMXOBJ}",ActiveCount],5m)=0 |
WARNING | |
WildFly {#JMXDATASOURCE}: Connection usage is too high | - |
min(/WildFly Server by JMX/jmx["{#JMXOBJ}",InUseCount],5m)/last(/WildFly Server by JMX/jmx["{#JMXOBJ}",AvailableCount])*100>{$WILDFLY.CONN.USAGE.WARN.MAX} |
HIGH | |
WildFly {#JMXDATASOURCE}: Pools monitoring statistic is not enabled | - |
last(/WildFly Server by JMX/jmx["{#JMXOBJ}",statisticsEnabled])=0 |
INFO | |
WildFly {#JMXDATASOURCE}: There are timeout connections | - |
last(/WildFly Server by JMX/jmx["{#JMXOBJ}",TimedOut])>0 |
WARNING | |
WildFly {#JMXDATASOURCE}: Too many waiting connections | - |
min(/WildFly Server by JMX/jmx["{#JMXOBJ}",WaitCount],5m)>{$WILDFLY.CONN.WAIT.MAX.WARN} |
WARNING | |
WildFly listener {#HTTP_LISTENER}: There are 500 responses by this listener. | - |
last(/WildFly Server by JMX/jmx["{#JMXOBJ}",errorCount])>0 |
WARNING |
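The connection-usage trigger above is simple arithmetic over the pool items. A sketch of the same calculation in plain code (the 80% threshold is an assumption standing in for {$WILDFLY.CONN.USAGE.WARN.MAX}):

```java
public class ConnUsageCheck {
    // Mirrors: min(InUseCount,5m) / last(AvailableCount) * 100 > {$WILDFLY.CONN.USAGE.WARN.MAX}
    static boolean connectionUsageTooHigh(long minInUse5m, long lastAvailable, double warnMaxPercent) {
        if (lastAvailable == 0) {
            return false; // nothing to compare against when the pool reports no connections
        }
        double usagePercent = (double) minInUse5m / lastAvailable * 100.0;
        return usagePercent > warnMaxPercent;
    }

    public static void main(String[] args) {
        // e.g. at least 18 of 20 connections in use over the last 5 minutes, 80% threshold
        System.out.println(connectionUsageTooHigh(18, 20, 80.0)); // true
    }
}
```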
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template or ask for help with it at ZABBIX forums.
For Zabbix version: 6.2 and higher
Official JMX Template for WildFly Domain Controller.
This template was tested on:
See Zabbix template operation for basic instructions.
Metrics are collected by JMX. This template works with a Domain Controller.
In order to use this template, copy jboss-client.jar from /(wildfly,EAP,Jboss,AS)/bin/client
into the directory /usr/share/zabbix-java-gateway/lib and restart the Zabbix Java gateway.
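Before linking the template it can help to verify that the Java gateway host actually reaches the WildFly management endpoint over remote+http. A minimal sketch, assuming jboss-client.jar is on the classpath, a management user matching {$WILDFLY.USER}/{$WILDFLY.PASSWORD}, and the hypothetical host wildfly.example.com with the default management port 9990:

```java
import java.util.HashMap;
import java.util.Map;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class WildFlyJmxCheck {
    public static void main(String[] args) throws Exception {
        // The remote+http protocol handler is provided by jboss-client.jar
        JMXServiceURL url = new JMXServiceURL("service:jmx:remote+http://wildfly.example.com:9990");
        Map<String, Object> env = new HashMap<>();
        // Credentials correspond to {$WILDFLY.USER} / {$WILDFLY.PASSWORD}
        env.put(JMXConnector.CREDENTIALS, new String[] {"zabbix", "zabbix"});
        try (JMXConnector connector = JMXConnectorFactory.connect(url, env)) {
            MBeanServerConnection mbsc = connector.getMBeanServerConnection();
            ObjectName root = new ObjectName("jboss.as:management-root=server");
            System.out.println("launchType = " + mbsc.getAttribute(root, "launchType"));
        }
    }
}
```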
No specific Zabbix configuration is required.
Name | Description | Default |
---|---|---|
{$WILDFLY.DEPLOYMENT.MATCHES} | Filter of discoverable deployments |
.* |
{$WILDFLY.DEPLOYMENT.NOT_MATCHES} | Filter to exclude discovered deployments |
CHANGE_IF_NEEDED |
{$WILDFLY.JMX.PROTOCOL} | - |
remote+http |
{$WILDFLY.PASSWORD} | - |
zabbix |
{$WILDFLY.SERVER.MATCHES} | Filter of discoverable servers |
.* |
{$WILDFLY.SERVER.NOT_MATCHES} | Filter to exclude discovered servers |
CHANGE_IF_NEEDED |
{$WILDFLY.USER} | - |
zabbix |
There are no template links in this template.
Name | Description | Type | Key and additional info |
---|---|---|---|
Deployments discovery | Discovery of deployments. |
JMX | jmx.get[beans,"jboss.as.expr:deployment=*,server-group=*"] Filter: AND- {#DEPLOYMENT} MATCHES_REGEX - {#DEPLOYMENT} NOT_MATCHES_REGEX |
Servers discovery | Discovery of server instances in the domain. |
JMX | jmx.get[beans,"jboss.as:host=master,server-config=*"] Filter: AND- {#SERVER} MATCHES_REGEX - {#SERVER} NOT_MATCHES_REGEX |
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
WildFly | WildFly: Launch type | The manner in which the server process was launched. Either "DOMAIN" for a domain mode server launched by a Host Controller, "STANDALONE" for a standalone server launched from the command line, or "EMBEDDED" for a standalone server launched as an embedded part of an application running in the same virtual machine. |
JMX | jmx["jboss.as:management-root=server","launchType"] Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
WildFly | WildFly: Name | For standalone mode: The name of this server. If not set, defaults to the runtime value of InetAddress.getLocalHost().getHostName(). For domain mode: The name given to this domain |
JMX | jmx["jboss.as:management-root=server","name"] Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
WildFly | WildFly: Process type | The type of process represented by this root resource. |
JMX | jmx["jboss.as:management-root=server","processType"] Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
WildFly | WildFly: Version | The version of the WildFly Core based product release |
JMX | jmx["jboss.as:management-root=server","productVersion"] Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
WildFly | WildFly: Uptime | WildFly server uptime. |
JMX | jmx["java.lang:type=Runtime","Uptime"] Preprocessing: - MULTIPLIER: |
WildFly | WildFly deployment [{#DEPLOYMENT}]: Enabled | Boolean indicating whether the deployment content is currently deployed in the runtime (or should be deployed in the runtime the next time the server starts.) |
JMX | jmx["{#JMXOBJ}",enabled] Preprocessing: - BOOLTODECIMAL - DISCARDUNCHANGEDHEARTBEAT: |
WildFly | WildFly deployment [{#DEPLOYMENT}]: Managed | Indicates if the deployment is managed (aka uses the ContentRepository). |
JMX | jmx["{#JMXOBJ}",managed] Preprocessing: - BOOLTODECIMAL - DISCARDUNCHANGEDHEARTBEAT: |
WildFly | WildFly domain: Server {#SERVER}: Autostart | Whether or not this server should be started when the Host Controller starts. |
JMX | jmx["{#JMXOBJ}",autoStart] Preprocessing: - BOOLTODECIMAL - DISCARDUNCHANGEDHEARTBEAT: |
WildFly | WildFly domain: Server {#SERVER}: Status | The current status of the server. |
JMX | jmx["{#JMXOBJ}",status] Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
WildFly | WildFly domain: Server {#SERVER}: Server group | The name of a server group from the domain model. |
JMX | jmx["{#JMXOBJ}",group] Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
WildFly: Version has changed | WildFly version has changed. Ack to close. |
last(/WildFly Domain by JMX/jmx["jboss.as:management-root=server","productVersion"],#1)<>last(/WildFly Domain by JMX/jmx["jboss.as:management-root=server","productVersion"],#2) and length(last(/WildFly Domain by JMX/jmx["jboss.as:management-root=server","productVersion"]))>0 |
INFO | Manual close: YES |
WildFly: has been restarted | Uptime is less than 10 minutes. |
last(/WildFly Domain by JMX/jmx["java.lang:type=Runtime","Uptime"])<10m |
INFO | Manual close: YES |
WildFly domain: Server {#SERVER}: Server status has changed | Server status has changed. Ack to close. |
last(/WildFly Domain by JMX/jmx["{#JMXOBJ}",status],#1)<>last(/WildFly Domain by JMX/jmx["{#JMXOBJ}",status],#2) and length(last(/WildFly Domain by JMX/jmx["{#JMXOBJ}",status]))>0 |
WARNING | Manual close: YES |
WildFly domain: Server {#SERVER}: Server group has changed | Server group has changed. Ack to close. |
last(/WildFly Domain by JMX/jmx["{#JMXOBJ}",group],#1)<>last(/WildFly Domain by JMX/jmx["{#JMXOBJ}",group],#2) and length(last(/WildFly Domain by JMX/jmx["{#JMXOBJ}",group]))>0 |
INFO | Manual close: YES |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template or ask for help with it at ZABBIX forums.
For Zabbix version: 6.2 and higher
The template to monitor VMware vCenter and ESX hypervisor.
The "VMware Hypervisor" and "VMware Guest" templates are used by discovery and normally should not be manually linked to a host.
For additional information please check https://www.zabbix.com/documentation/6.2/manual/vm_monitoring
{$VMWARE.URL}
{$VMWARE.USERNAME}
{$VMWARE.PASSWORD}
No specific Zabbix configuration is required.
Name | Description | Default |
---|---|---|
{$VMWARE.PASSWORD} | VMware service {$VMWARE.USERNAME} user password |
`` |
{$VMWARE.URL} | VMware service (vCenter or ESX hypervisor) SDK URL (https://servername/sdk) |
`` |
{$VMWARE.USERNAME} | VMware service user name |
`` |
There are no template links in this template.
Name | Description | Type | Key and additional info |
---|---|---|---|
Discover VMware clusters | Discovery of clusters |
SIMPLE | vmware.cluster.discovery[{$VMWARE.URL}] |
Discover VMware datastores | - |
SIMPLE | vmware.datastore.discovery[{$VMWARE.URL}] |
Discover VMware hypervisors | Discovery of hypervisors. |
SIMPLE | vmware.hv.discovery[{$VMWARE.URL}] |
Discover VMware VMs FQDN | Discovery of guest virtual machines. |
SIMPLE | vmware.vm.discovery[{$VMWARE.URL}] Filter: AND- {#VM.DNS} NOT_MATCHES_REGEX |
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
VMware | VMware: Event log | Collect VMware event log. See also: https://www.zabbix.com/documentation/6.2/manual/config/items/preprocessing/examples#filtering_vmware_event_log_records |
SIMPLE | vmware.eventlog[{$VMWARE.URL},skip] |
VMware | VMware: Full name | VMware service full name. |
SIMPLE | vmware.fullname[{$VMWARE.URL}] Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
VMware | VMware: Version | VMware service version. |
SIMPLE | vmware.version[{$VMWARE.URL}] Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
VMware | VMware: Status of "{#CLUSTER.NAME}" cluster | VMware cluster status. |
SIMPLE | vmware.cluster.status[{$VMWARE.URL},{#CLUSTER.NAME}] |
VMware | VMware: Average read latency of the datastore {#DATASTORE} | Amount of time for a read operation from the datastore (milliseconds). |
SIMPLE | vmware.datastore.read[{$VMWARE.URL},{#DATASTORE},latency] |
VMware | VMware: Free space on datastore {#DATASTORE} (percentage) | VMware datastore free space as a percentage of the total. |
SIMPLE | vmware.datastore.size[{$VMWARE.URL},{#DATASTORE},pfree] |
VMware | VMware: Total size of datastore {#DATASTORE} | VMware datastore space in bytes. |
SIMPLE | vmware.datastore.size[{$VMWARE.URL},{#DATASTORE}] |
VMware | VMware: Average write latency of the datastore {#DATASTORE} | Amount of time for a write operation to the datastore (milliseconds). |
SIMPLE | vmware.datastore.write[{$VMWARE.URL},{#DATASTORE},latency] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template or ask for help with it at ZABBIX forums.
For Zabbix version: 6.2 and higher
Refer to the vendor documentation.
No specific Zabbix configuration is required.
Name | Description | Default |
---|---|---|
{$VMWARE.PASSWORD} | VMware service {$VMWARE.USERNAME} user password |
`` |
{$VMWARE.URL} | VMware service (vCenter or ESX hypervisor) SDK URL (https://servername/sdk) |
`` |
{$VMWARE.USERNAME} | VMware service user name |
`` |
There are no template links in this template.
Name | Description | Type | Key and additional info |
---|---|---|---|
Disk device discovery | Discovery of all disk devices. |
SIMPLE | vmware.vm.vfs.dev.discovery[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Mounted filesystem discovery | Discovery of all guest file systems. |
SIMPLE | vmware.vm.vfs.fs.discovery[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Network device discovery | Discovery of all network devices. |
SIMPLE | vmware.vm.net.if.discovery[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
VMware | VMware: Cluster name | Cluster name of the guest VM. |
SIMPLE | vmware.vm.cluster.name[{$VMWARE.URL},{$VMWARE.VM.UUID}] Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
VMware | VMware: Number of virtual CPUs | Number of virtual CPUs assigned to the guest. |
SIMPLE | vmware.vm.cpu.num[{$VMWARE.URL},{$VMWARE.VM.UUID}] Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
VMware | VMware: CPU ready | Time that the virtual machine was ready but could not get scheduled to run on the physical CPU during the last measurement interval (the VMware vCenter/ESXi Server performance counter sampling interval is 20 seconds); see the conversion sketch after this table. |
SIMPLE | vmware.vm.cpu.ready[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware | VMware: CPU usage | Current upper-bound on CPU usage. The upper-bound is based on the host the virtual machine is currently running on, as well as limits configured on the virtual machine itself or any parent resource pool. Valid while the virtual machine is running. |
SIMPLE | vmware.vm.cpu.usage[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware | VMware: Datacenter name | Datacenter name of the guest VM. |
SIMPLE | vmware.vm.datacenter.name[{$VMWARE.URL},{$VMWARE.VM.UUID}] Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
VMware | VMware: Hypervisor name | Hypervisor name of the guest VM. |
SIMPLE | vmware.vm.hv.name[{$VMWARE.URL},{$VMWARE.VM.UUID}] Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
VMware | VMware: Ballooned memory | The amount of guest physical memory that is currently reclaimed through the balloon driver. |
SIMPLE | vmware.vm.memory.size.ballooned[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware | VMware: Compressed memory | The amount of memory currently in the compression cache for this VM. |
SIMPLE | vmware.vm.memory.size.compressed[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware | VMware: Private memory | Amount of memory backed by host memory and not being shared. |
SIMPLE | vmware.vm.memory.size.private[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware | VMware: Shared memory | The amount of guest physical memory shared through transparent page sharing. |
SIMPLE | vmware.vm.memory.size.shared[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware | VMware: Swapped memory | The amount of guest physical memory swapped out to the VM's swap device by ESX. |
SIMPLE | vmware.vm.memory.size.swapped[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware | VMware: Guest memory usage | The amount of guest physical memory that is being used by the VM. |
SIMPLE | vmware.vm.memory.size.usage.guest[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware | VMware: Host memory usage | The amount of host physical memory allocated to the VM, accounting for saving from memory sharing with other VMs. |
SIMPLE | vmware.vm.memory.size.usage.host[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware | VMware: Memory size | Total size of configured memory. |
SIMPLE | vmware.vm.memory.size[{$VMWARE.URL},{$VMWARE.VM.UUID}] Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
VMware | VMware: Power state | The current power state of the virtual machine. |
SIMPLE | vmware.vm.powerstate[{$VMWARE.URL},{$VMWARE.VM.UUID}] Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
VMware | VMware: Committed storage space | Total storage space, in bytes, committed to this virtual machine across all datastores. |
SIMPLE | vmware.vm.storage.committed[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware | VMware: Uncommitted storage space | Additional storage space, in bytes, potentially used by this virtual machine on all datastores. |
SIMPLE | vmware.vm.storage.uncommitted[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware | VMware: Unshared storage space | Total storage space, in bytes, occupied by the virtual machine across all datastores, that is not shared with any other virtual machine. |
SIMPLE | vmware.vm.storage.unshared[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware | VMware: Uptime | System uptime. |
SIMPLE | vmware.vm.uptime[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware | VMware: Guest memory swapped | Amount of guest physical memory that is swapped out to the swap space. |
SIMPLE | vmware.vm.guest.memory.size.swapped[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware | VMware: Host memory consumed | Amount of host physical memory consumed for backing guest physical memory pages. |
SIMPLE | vmware.vm.memory.size.consumed[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware | VMware: Host memory usage in percents | Percentage of host physical memory that has been consumed. |
SIMPLE | vmware.vm.memory.usage[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware | VMware: CPU usage in percents | CPU usage as a percentage during the interval. |
SIMPLE | vmware.vm.cpu.usage.perf[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware | VMware: CPU latency in percents | Percentage of time the virtual machine is unable to run because it is contending for access to the physical CPU(s). |
SIMPLE | vmware.vm.cpu.latency[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware | VMware: CPU readiness latency in percents | Percentage of time that the virtual machine was ready, but could not get scheduled to run on the physical CPU. |
SIMPLE | vmware.vm.cpu.readiness[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware | VMware: CPU swap-in latency in percents | Percentage of CPU time spent waiting for swap-in. |
SIMPLE | vmware.vm.cpu.swapwait[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware | VMware: Uptime of guest OS | Total time elapsed since the last operating system boot-up (in seconds). |
SIMPLE | vmware.vm.guest.osuptime[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware | VMware: Number of bytes received on interface {#IFDESC} | VMware virtual machine network interface input statistics (bytes per second). |
SIMPLE | vmware.vm.net.if.in[{$VMWARE.URL},{$VMWARE.VM.UUID},{#IFNAME},bps] |
VMware | VMware: Number of packets received on interface {#IFDESC} | VMware virtual machine network interface input statistics (packets per second). |
SIMPLE | vmware.vm.net.if.in[{$VMWARE.URL},{$VMWARE.VM.UUID},{#IFNAME},pps] |
VMware | VMware: Number of bytes transmitted on interface {#IFDESC} | VMware virtual machine network interface output statistics (bytes per second). |
SIMPLE | vmware.vm.net.if.out[{$VMWARE.URL},{$VMWARE.VM.UUID},{#IFNAME},bps] |
VMware | VMware: Number of packets transmitted on interface {#IFDESC} | VMware virtual machine network interface output statistics (packets per second). |
SIMPLE | vmware.vm.net.if.out[{$VMWARE.URL},{$VMWARE.VM.UUID},{#IFNAME},pps] |
VMware | VMware: Network utilization on interface {#IFDESC} | VMware virtual machine network utilization (combined transmit-rates and receive-rates) during the interval. |
SIMPLE | vmware.vm.net.if.usage[{$VMWARE.URL},{$VMWARE.VM.UUID},{#IFNAME}] Preprocessing: - MULTIPLIER: |
VMware | VMware: Average number of bytes read from the disk {#DISKDESC} | VMware virtual machine disk device read statistics (bytes per second). |
SIMPLE | vmware.vm.vfs.dev.read[{$VMWARE.URL},{$VMWARE.VM.UUID},{#DISKNAME},bps] |
VMware | VMware: Average number of reads from the disk {#DISKDESC} | VMware virtual machine disk device read statistics (operations per second). |
SIMPLE | vmware.vm.vfs.dev.read[{$VMWARE.URL},{$VMWARE.VM.UUID},{#DISKNAME},ops] |
VMware | VMware: Average number of bytes written to the disk {#DISKDESC} | VMware virtual machine disk device write statistics (bytes per second). |
SIMPLE | vmware.vm.vfs.dev.write[{$VMWARE.URL},{$VMWARE.VM.UUID},{#DISKNAME},bps] |
VMware | VMware: Average number of writes to the disk {#DISKDESC} | VMware virtual machine disk device write statistics (operations per second). |
SIMPLE | vmware.vm.vfs.dev.write[{$VMWARE.URL},{$VMWARE.VM.UUID},{#DISKNAME},ops] |
VMware | VMware: Average number of outstanding read requests to the disk {#DISKDESC} | Average number of outstanding read requests to the virtual disk during the collection interval. |
SIMPLE | vmware.vm.storage.readoio[{$VMWARE.URL},{$VMWARE.VM.UUID},{#DISKNAME}] |
VMware | VMware: Average number of outstanding write requests to the disk {#DISKDESC} | Average number of outstanding write requests to the virtual disk during the collection interval. |
SIMPLE | vmware.vm.storage.writeoio[{$VMWARE.URL},{$VMWARE.VM.UUID},{#DISKNAME}] |
VMware | VMware: Average write latency to the disk {#DISKDESC} | The average time a write to the virtual disk takes. |
SIMPLE | vmware.vm.storage.totalwritelatency[{$VMWARE.URL},{$VMWARE.VM.UUID},{#DISKNAME}] |
VMware | VMware: Average read latency to the disk {#DISKDESC} | The average time a read from the virtual disk takes. |
SIMPLE | vmware.vm.storage.totalreadlatency[{$VMWARE.URL},{$VMWARE.VM.UUID},{#DISKNAME}] |
VMware | VMware: Free disk space on {#FSNAME} | VMware virtual machine file system statistics (bytes). |
SIMPLE | vmware.vm.vfs.fs.size[{$VMWARE.URL},{$VMWARE.VM.UUID},{#FSNAME},free] |
VMware | VMware: Free disk space on {#FSNAME} (percentage) | VMware virtual machine file system statistics (percentages). |
SIMPLE | vmware.vm.vfs.fs.size[{$VMWARE.URL},{$VMWARE.VM.UUID},{#FSNAME},pfree] |
VMware | VMware: Total disk space on {#FSNAME} | VMware virtual machine total disk space (bytes). |
SIMPLE | vmware.vm.vfs.fs.size[{$VMWARE.URL},{$VMWARE.VM.UUID},{#FSNAME},total] Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
VMware | VMware: Used disk space on {#FSNAME} | VMware virtual machine used disk space (bytes). |
SIMPLE | vmware.vm.vfs.fs.size[{$VMWARE.URL},{$VMWARE.VM.UUID},{#FSNAME},used] |
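As noted for the CPU ready item above, the raw value is the ready time in milliseconds accumulated over a 20-second sampling interval; converting it to a percentage is a one-line calculation. A sketch, assuming the standard 20 000 ms interval:

```java
public class CpuReady {
    // CPU ready %: ready time (ms) divided by the 20 s (20000 ms) sampling interval
    static double cpuReadyPercent(double readyMs) {
        return readyMs / 20000.0 * 100.0;
    }

    public static void main(String[] args) {
        System.out.println(cpuReadyPercent(200.0)); // 200 ms ready in a 20 s window = 1.0%
    }
}
```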
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
VMware: VM has been restarted | Uptime is less than 10 minutes. |
last(/VMware Guest/vmware.vm.guest.osuptime[{$VMWARE.URL},{$VMWARE.VM.UUID}])<10m |
WARNING | Manual close: YES |
Please report any issues with the template at https://support.zabbix.com
For Zabbix version: 6.2 and higher
Refer to the vendor documentation.
No specific Zabbix configuration is required.
Name | Description | Default |
---|---|---|
{$VMWARE.PASSWORD} | VMware service {$VMWARE.USERNAME} user password |
`` |
{$VMWARE.URL} | VMware service (vCenter or ESX hypervisor) SDK URL (https://servername/sdk) |
`` |
{$VMWARE.USERNAME} | VMware service user name |
`` |
There are no template links in this template.
Name | Description | Type | Key and additional info |
---|---|---|---|
Datastore discovery | - |
SIMPLE | vmware.hv.datastore.discovery[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
Healthcheck discovery | VMware Rollup Health State sensor discovery |
DEPENDENT | vmware.hv.healthcheck.discovery Preprocessing: - JSONPATH: - JAVASCRIPT: - DISCARDUNCHANGEDHEARTBEAT: |
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
VMware | VMware: Hypervisor ping | Checks if the hypervisor is running and accepting ICMP pings. |
SIMPLE | icmpping[] Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
VMware | VMware: Cluster name | Cluster name of the hypervisor. |
SIMPLE | vmware.hv.cluster.name[{$VMWARE.URL},{$VMWARE.HV.UUID}] Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
VMware | VMware: CPU usage | Aggregated CPU usage across all cores on the host in Hz. This is only available if the host is connected. |
SIMPLE | vmware.hv.cpu.usage[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware | VMware: CPU usage in percents | CPU usage as a percentage during the interval. |
SIMPLE | vmware.hv.cpu.usage.perf[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware | VMware: CPU utilization | CPU usage as a percentage during the interval; the value depends on power management and hyper-threading (HT). |
SIMPLE | vmware.hv.cpu.utilization[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware | VMware: Power usage | Current power usage. |
SIMPLE | vmware.hv.power[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware | VMware: Power usage maximum allowed | Maximum allowed power usage. |
SIMPLE | vmware.hv.power[{$VMWARE.URL},{$VMWARE.HV.UUID},max] Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
VMware | VMware: Datacenter name | Datacenter name of the hypervisor. |
SIMPLE | vmware.hv.datacenter.name[{$VMWARE.URL},{$VMWARE.HV.UUID}] Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
VMware | VMware: Full name | The complete product name, including the version information. |
SIMPLE | vmware.hv.fullname[{$VMWARE.URL},{$VMWARE.HV.UUID}] Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
VMware | VMware: CPU frequency | The speed of the CPU cores. This is an average value if there are multiple speeds. The product of CPU frequency and number of cores is approximately equal to the sum of the MHz for all the individual cores on the host. |
SIMPLE | vmware.hv.hw.cpu.freq[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware | VMware: CPU model | The CPU model. |
SIMPLE | vmware.hv.hw.cpu.model[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware | VMware: CPU cores | Number of physical CPU cores on the host. Physical CPU cores are the processors contained by a CPU package. |
SIMPLE | vmware.hv.hw.cpu.num[{$VMWARE.URL},{$VMWARE.HV.UUID}] Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
VMware | VMware: CPU threads | Number of physical CPU threads on the host. |
SIMPLE | vmware.hv.hw.cpu.threads[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware | VMware: Total memory | The physical memory size. |
SIMPLE | vmware.hv.hw.memory[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware | VMware: Model | The system model identification. |
SIMPLE | vmware.hv.hw.model[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware | VMware: Bios UUID | The hardware BIOS identification. |
SIMPLE | vmware.hv.hw.uuid[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware | VMware: Vendor | The hardware vendor identification. |
SIMPLE | vmware.hv.hw.vendor[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware | VMware: Ballooned memory | The amount of guest physical memory that is currently reclaimed through the balloon driver. Sum of all guest VMs. |
SIMPLE | vmware.hv.memory.size.ballooned[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware | VMware: Used memory | Physical memory usage on the host. |
SIMPLE | vmware.hv.memory.used[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware | VMware: Number of bytes received | VMware hypervisor network input statistics (bytes per second). |
SIMPLE | vmware.hv.network.in[{$VMWARE.URL},{$VMWARE.HV.UUID},bps] |
VMware | VMware: Number of bytes transmitted | VMware hypervisor network output statistics (bytes per second). |
SIMPLE | vmware.hv.network.out[{$VMWARE.URL},{$VMWARE.HV.UUID},bps] |
VMware | VMware: Overall status | The overall alarm status of the host: gray - unknown, green - ok, red - it has a problem, yellow - it might have a problem. |
SIMPLE | vmware.hv.status[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware | VMware: Uptime | System uptime. |
SIMPLE | vmware.hv.uptime[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware | VMware: Version | Dot-separated version string. |
SIMPLE | vmware.hv.version[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware | VMware: Number of guest VMs | Number of guest virtual machines. |
SIMPLE | vmware.hv.vm.num[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware | VMware: Average read latency of the datastore {#DATASTORE} | Average amount of time for a read operation from the datastore (milliseconds). |
SIMPLE | vmware.hv.datastore.read[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE},latency] |
VMware | VMware: Free space on datastore {#DATASTORE} (percentage) | VMware datastore free space as a percentage of the total. |
SIMPLE | vmware.hv.datastore.size[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE},pfree] |
VMware | VMware: Total size of datastore {#DATASTORE} | VMware datastore space in bytes. |
SIMPLE | vmware.hv.datastore.size[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE}] |
VMware | VMware: Average write latency of the datastore {#DATASTORE} | Average amount of time for a write operation to the datastore (milliseconds). |
SIMPLE | vmware.hv.datastore.write[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE},latency] |
VMware | VMware: Multipath count for datastore {#DATASTORE} | Number of available datastore paths. |
SIMPLE | vmware.hv.datastore.multipath[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE}] |
VMware | VMware: Health state rollup | The host health state rollup sensor value: gray - unknown, green - ok, red - it has a problem, yellow - it might have a problem. |
DEPENDENT | vmware.hv.sensor.health.state[{#SINGLETON}] Preprocessing: - JSONPATH: |
Zabbix raw items | VMware: Get sensors | Master item for sensors data. |
SIMPLE | vmware.hv.sensors.get[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
VMware: Hypervisor is down | The service is unavailable or does not accept ICMP ping. |
last(/VMware Hypervisor/icmpping[])=0 |
AVERAGE | Manual close: YES |
VMware: The {$VMWARE.HV.UUID} health is Red | One or more components in the appliance might be in an unusable status and the appliance might become unresponsive soon. Security patches might be available. |
last(/VMware Hypervisor/vmware.hv.status[{$VMWARE.URL},{$VMWARE.HV.UUID}])=3 |
HIGH | |
VMware: The {$VMWARE.HV.UUID} health is Yellow | One or more components in the appliance might become overloaded soon. |
last(/VMware Hypervisor/vmware.hv.status[{$VMWARE.URL},{$VMWARE.HV.UUID}])=2 |
AVERAGE | Depends on: - VMware: The {$VMWARE.HV.UUID} health is Red |
VMware: Hypervisor has been restarted | Uptime is less than 10 minutes. |
last(/VMware Hypervisor/vmware.hv.uptime[{$VMWARE.URL},{$VMWARE.HV.UUID}])<10m |
WARNING | Manual close: YES |
VMware: The multipath count has been changed | The number of available datastore paths is less than the registered number ({#MULTIPATH.COUNT}). |
last(/VMware Hypervisor/vmware.hv.datastore.multipath[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE}],#1)<>last(/VMware Hypervisor/vmware.hv.datastore.multipath[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE}],#2) and last(/VMware Hypervisor/vmware.hv.datastore.multipath[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE}])<{#MULTIPATH.COUNT} |
AVERAGE | Manual close: YES |
VMware: The {$VMWARE.HV.UUID} health is Red | One or more components in the appliance might be in an unusable status and the appliance might become unresponsive soon. Security patches might be available. |
last(/VMware Hypervisor/vmware.hv.sensor.health.state[{#SINGLETON}])="Red" |
HIGH | Depends on: - VMware: The {$VMWARE.HV.UUID} health is Red |
VMware: The {$VMWARE.HV.UUID} health is Yellow | One or more components in the appliance might become overloaded soon. |
last(/VMware Hypervisor/vmware.hv.sensor.health.state[{#SINGLETON}])="Yellow" |
AVERAGE | Depends on: - VMware: The {$VMWARE.HV.UUID} health is Red - VMware: The {$VMWARE.HV.UUID} health is Red - VMware: The {$VMWARE.HV.UUID} health is Yellow |
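The Red/Yellow triggers above compare against the numeric codes returned by vmware.hv.status (0 - gray, 1 - green, 2 - yellow, 3 - red). A small sketch of that mapping as the trigger expressions use it:

```java
public class HvStatus {
    // Numeric codes returned by vmware.hv.status, per the trigger expressions above
    static String statusName(int code) {
        switch (code) {
            case 0:  return "gray (unknown)";
            case 1:  return "green (ok)";
            case 2:  return "yellow (might have a problem)"; // AVERAGE trigger fires on =2
            case 3:  return "red (has a problem)";           // HIGH trigger fires on =3
            default: return "unexpected code " + code;
        }
    }

    public static void main(String[] args) {
        System.out.println(statusName(3)); // red (has a problem)
    }
}
```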
Please report any issues with the template at https://support.zabbix.com
For Zabbix version: 6.2 and higher
The template to monitor VMware vCenter and ESX hypervisor.
The "VMware Hypervisor" and "VMware Guest" templates are used by discovery and normally should not be manually linked to a host.
For additional information please check https://www.zabbix.com/documentation/6.2/manual/vm_monitoring
{$VMWARE.URL}
{$VMWARE.USERNAME}
{$VMWARE.PASSWORD}
No specific Zabbix configuration is required.
Name | Description | Default |
---|---|---|
{$VMWARE.PASSWORD} | VMware service {$VMWARE.USERNAME} user password |
`` |
{$VMWARE.URL} | VMware service (vCenter or ESX hypervisor) SDK URL (https://servername/sdk) |
`` |
{$VMWARE.USERNAME} | VMware service user name |
`` |
There are no template links in this template.
Name | Description | Type | Key and additional info |
---|---|---|---|
Discover VMware clusters | Discovery of clusters |
SIMPLE | vmware.cluster.discovery[{$VMWARE.URL}] |
Discover VMware datastores | - |
SIMPLE | vmware.datastore.discovery[{$VMWARE.URL}] |
Discover VMware hypervisors | Discovery of hypervisors. |
SIMPLE | vmware.hv.discovery[{$VMWARE.URL}] |
Discover VMware VMs | Discovery of guest virtual machines. |
SIMPLE | vmware.vm.discovery[{$VMWARE.URL}] |
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
VMware | VMware: Event log | Collect VMware event log. See also: https://www.zabbix.com/documentation/6.2/manual/config/items/preprocessing/examples#filtering_vmware_event_log_records |
SIMPLE | vmware.eventlog[{$VMWARE.URL},skip] |
VMware | VMware: Full name | VMware service full name. |
SIMPLE | vmware.fullname[{$VMWARE.URL}] Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
VMware | VMware: Version | VMware service version. |
SIMPLE | vmware.version[{$VMWARE.URL}] Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
VMware | VMware: Status of "{#CLUSTER.NAME}" cluster | VMware cluster status. |
SIMPLE | vmware.cluster.status[{$VMWARE.URL},{#CLUSTER.NAME}] |
VMware | VMware: Average read latency of the datastore {#DATASTORE} | Amount of time for a read operation from the datastore (milliseconds). |
SIMPLE | vmware.datastore.read[{$VMWARE.URL},{#DATASTORE},latency] |
VMware | VMware: Free space on datastore {#DATASTORE} (percentage) | VMware datastore free space as a percentage of the total. |
SIMPLE | vmware.datastore.size[{$VMWARE.URL},{#DATASTORE},pfree] |
VMware | VMware: Total size of datastore {#DATASTORE} | VMware datastore space in bytes. |
SIMPLE | vmware.datastore.size[{$VMWARE.URL},{#DATASTORE}] |
VMware | VMware: Average write latency of the datastore {#DATASTORE} | Amount of time for a write operation to the datastore (milliseconds). |
SIMPLE | vmware.datastore.write[{$VMWARE.URL},{#DATASTORE},latency] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template or ask for help with it at ZABBIX forums.
For Zabbix version: 6.2 and higher
Refer to the vendor documentation.
No specific Zabbix configuration is required.
Name | Description | Default |
---|---|---|
{$VMWARE.PASSWORD} | VMware service {$USERNAME} user password |
`` |
{$VMWARE.URL} | VMware service (vCenter or ESX hypervisor) SDK URL (https://servername/sdk) |
`` |
{$VMWARE.USERNAME} | VMware service user name |
`` |
There are no template links in this template.
Name | Description | Type | Key and additional info |
---|---|---|---|
Disk device discovery | Discovery of all disk devices. |
SIMPLE | vmware.vm.vfs.dev.discovery[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Mounted filesystem discovery | Discovery of all guest file systems. |
SIMPLE | vmware.vm.vfs.fs.discovery[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Network device discovery | Discovery of all network devices. |
SIMPLE | vmware.vm.net.if.discovery[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
VMware | VMware: Cluster name | Cluster name of the guest VM. |
SIMPLE | vmware.vm.cluster.name[{$VMWARE.URL},{$VMWARE.VM.UUID}] Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
VMware | VMware: Number of virtual CPUs | Number of virtual CPUs assigned to the guest. |
SIMPLE | vmware.vm.cpu.num[{$VMWARE.URL},{$VMWARE.VM.UUID}] Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
VMware | VMware: CPU ready | Time that the virtual machine was ready, but could not get scheduled to run on the physical CPU during last measurement interval (VMware vCenter/ESXi Server performance counter sampling interval - 20 seconds) |
SIMPLE | vmware.vm.cpu.ready[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware | VMware: CPU usage | Current upper-bound on CPU usage. The upper-bound is based on the host the virtual machine is current running on, as well as limits configured on the virtual machine itself or any parent resource pool. Valid while the virtual machine is running. |
SIMPLE | vmware.vm.cpu.usage[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware | VMware: Datacenter name | Datacenter name of the guest VM. |
SIMPLE | vmware.vm.datacenter.name[{$VMWARE.URL},{$VMWARE.VM.UUID}] Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
VMware | VMware: Hypervisor name | Hypervisor name of the guest VM. |
SIMPLE | vmware.vm.hv.name[{$VMWARE.URL},{$VMWARE.VM.UUID}] Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
VMware | VMware: Ballooned memory | The amount of guest physical memory that is currently reclaimed through the balloon driver. |
SIMPLE | vmware.vm.memory.size.ballooned[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware | VMware: Compressed memory | The amount of memory currently in the compression cache for this VM. |
SIMPLE | vmware.vm.memory.size.compressed[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware | VMware: Private memory | Amount of memory backed by host memory and not being shared. |
SIMPLE | vmware.vm.memory.size.private[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware | VMware: Shared memory | The amount of guest physical memory shared through transparent page sharing. |
SIMPLE | vmware.vm.memory.size.shared[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware | VMware: Swapped memory | The amount of guest physical memory swapped out to the VM's swap device by ESX. |
SIMPLE | vmware.vm.memory.size.swapped[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware | VMware: Guest memory usage | The amount of guest physical memory that is being used by the VM. |
SIMPLE | vmware.vm.memory.size.usage.guest[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware | VMware: Host memory usage | The amount of host physical memory allocated to the VM, accounting for saving from memory sharing with other VMs. |
SIMPLE | vmware.vm.memory.size.usage.host[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware | VMware: Memory size | Total size of configured memory. |
SIMPLE | vmware.vm.memory.size[{$VMWARE.URL},{$VMWARE.VM.UUID}] Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
VMware | VMware: Power state | The current power state of the virtual machine. |
SIMPLE | vmware.vm.powerstate[{$VMWARE.URL},{$VMWARE.VM.UUID}] Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
VMware | VMware: Committed storage space | Total storage space, in bytes, committed to this virtual machine across all datastores. |
SIMPLE | vmware.vm.storage.committed[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware | VMware: Uncommitted storage space | Additional storage space, in bytes, potentially used by this virtual machine on all datastores. |
SIMPLE | vmware.vm.storage.uncommitted[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware | VMware: Unshared storage space | Total storage space, in bytes, occupied by the virtual machine across all datastores, that is not shared with any other virtual machine. |
SIMPLE | vmware.vm.storage.unshared[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware | VMware: Uptime | System uptime. |
SIMPLE | vmware.vm.uptime[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware | VMware: Guest memory swapped | Amount of guest physical memory that is swapped out to the swap space. |
SIMPLE | vmware.vm.guest.memory.size.swapped[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware | VMware: Host memory consumed | Amount of host physical memory consumed for backing up guest physical memory pages. |
SIMPLE | vmware.vm.memory.size.consumed[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware | VMware: Host memory usage in percents | Percentage of host physical memory that has been consumed. |
SIMPLE | vmware.vm.memory.usage[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware | VMware: CPU usage in percents | CPU usage as a percentage during the interval. |
SIMPLE | vmware.vm.cpu.usage.perf[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware | VMware: CPU latency in percents | Percentage of time the virtual machine is unable to run because it is contending for access to the physical CPU(s). |
SIMPLE | vmware.vm.cpu.latency[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware | VMware: CPU readiness latency in percents | Percentage of time that the virtual machine was ready, but could not get scheduled to run on the physical CPU. |
SIMPLE | vmware.vm.cpu.readiness[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware | VMware: CPU swap-in latency in percents | Percentage of CPU time spent waiting for swap-in. |
SIMPLE | vmware.vm.cpu.swapwait[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware | VMware: Uptime of guest OS | Total time elapsed since the last operating system boot-up (in seconds). |
SIMPLE | vmware.vm.guest.osuptime[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware | VMware: Number of bytes received on interface {#IFDESC} | VMware virtual machine network interface input statistics (bytes per second). |
SIMPLE | vmware.vm.net.if.in[{$VMWARE.URL},{$VMWARE.VM.UUID},{#IFNAME},bps] |
VMware | VMware: Number of packets received on interface {#IFDESC} | VMware virtual machine network interface input statistics (packets per second). |
SIMPLE | vmware.vm.net.if.in[{$VMWARE.URL},{$VMWARE.VM.UUID},{#IFNAME},pps] |
VMware | VMware: Number of bytes transmitted on interface {#IFDESC} | VMware virtual machine network interface output statistics (bytes per second). |
SIMPLE | vmware.vm.net.if.out[{$VMWARE.URL},{$VMWARE.VM.UUID},{#IFNAME},bps] |
VMware | VMware: Number of packets transmitted on interface {#IFDESC} | VMware virtual machine network interface output statistics (packets per second). |
SIMPLE | vmware.vm.net.if.out[{$VMWARE.URL},{$VMWARE.VM.UUID},{#IFNAME},pps] |
VMware | VMware: Network utilization on interface {#IFDESC} | VMware virtual machine network utilization (combined transmit-rates and receive-rates) during the interval. |
SIMPLE | vmware.vm.net.if.usage[{$VMWARE.URL},{$VMWARE.VM.UUID},{#IFNAME}] Preprocessing: - MULTIPLIER: |
VMware | VMware: Average number of bytes read from the disk {#DISKDESC} | VMware virtual machine disk device read statistics (bytes per second). |
SIMPLE | vmware.vm.vfs.dev.read[{$VMWARE.URL},{$VMWARE.VM.UUID},{#DISKNAME},bps] |
VMware | VMware: Average number of reads from the disk {#DISKDESC} | VMware virtual machine disk device read statistics (operations per second). |
SIMPLE | vmware.vm.vfs.dev.read[{$VMWARE.URL},{$VMWARE.VM.UUID},{#DISKNAME},ops] |
VMware | VMware: Average number of bytes written to the disk {#DISKDESC} | VMware virtual machine disk device write statistics (bytes per second). |
SIMPLE | vmware.vm.vfs.dev.write[{$VMWARE.URL},{$VMWARE.VM.UUID},{#DISKNAME},bps] |
VMware | VMware: Average number of writes to the disk {#DISKDESC} | VMware virtual machine disk device write statistics (operations per second). |
SIMPLE | vmware.vm.vfs.dev.write[{$VMWARE.URL},{$VMWARE.VM.UUID},{#DISKNAME},ops] |
VMware | VMware: Average number of outstanding read requests to the disk {#DISKDESC} | Average number of outstanding read requests to the virtual disk during the collection interval. |
SIMPLE | vmware.vm.storage.readoio[{$VMWARE.URL},{$VMWARE.VM.UUID},{#DISKNAME}] |
VMware | VMware: Average number of outstanding write requests to the disk {#DISKDESC} | Average number of outstanding write requests to the virtual disk during the collection interval. |
SIMPLE | vmware.vm.storage.writeoio[{$VMWARE.URL},{$VMWARE.VM.UUID},{#DISKNAME}] |
VMware | VMware: Average write latency to the disk {#DISKDESC} | The average time a write to the virtual disk takes. |
SIMPLE | vmware.vm.storage.totalwritelatency[{$VMWARE.URL},{$VMWARE.VM.UUID},{#DISKNAME}] |
VMware | VMware: Average read latency to the disk {#DISKDESC} | The average time a read from the virtual disk takes. |
SIMPLE | vmware.vm.storage.totalreadlatency[{$VMWARE.URL},{$VMWARE.VM.UUID},{#DISKNAME}] |
VMware | VMware: Free disk space on {#FSNAME} | VMware virtual machine file system statistics (bytes). |
SIMPLE | vmware.vm.vfs.fs.size[{$VMWARE.URL},{$VMWARE.VM.UUID},{#FSNAME},free] |
VMware | VMware: Free disk space on {#FSNAME} (percentage) | VMware virtual machine file system statistics (percentages). |
SIMPLE | vmware.vm.vfs.fs.size[{$VMWARE.URL},{$VMWARE.VM.UUID},{#FSNAME},pfree] |
VMware | VMware: Total disk space on {#FSNAME} | VMware virtual machine total disk space (bytes). |
SIMPLE | vmware.vm.vfs.fs.size[{$VMWARE.URL},{$VMWARE.VM.UUID},{#FSNAME},total] Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
VMware | VMware: Used disk space on {#FSNAME} | VMware virtual machine used disk space (bytes). |
SIMPLE | vmware.vm.vfs.fs.size[{$VMWARE.URL},{$VMWARE.VM.UUID},{#FSNAME},used] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
VMware: VM has been restarted | Uptime is less than 10 minutes. |
last(/VMware Guest/vmware.vm.guest.osuptime[{$VMWARE.URL},{$VMWARE.VM.UUID}])<10m |
WARNING | Manual close: YES |
Please report any issues with the template at https://support.zabbix.com
For Zabbix version: 6.2 and higher
Refer to the vendor documentation.
No specific Zabbix configuration is required.
Name | Description | Default |
---|---|---|
{$VMWARE.PASSWORD} | VMware service {$USERNAME} user password |
`` |
{$VMWARE.URL} | VMware service (vCenter or ESX hypervisor) SDK URL (https://servername/sdk) |
`` |
{$VMWARE.USERNAME} | VMware service user name |
`` |
There are no template links in this template.
Name | Description | Type | Key and additional info |
---|---|---|---|
Datastore discovery | - |
SIMPLE | vmware.hv.datastore.discovery[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
Healthcheck discovery | VMware Rollup Health State sensor discovery |
DEPENDENT | vmware.hv.healthcheck.discovery Preprocessing: - JSONPATH: - JAVASCRIPT: - DISCARDUNCHANGEDHEARTBEAT: |
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
VMware | VMware: Hypervisor ping | Checks if the hypervisor is running and accepting ICMP pings. |
SIMPLE | icmpping[] Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
VMware | VMware: Cluster name | Cluster name of the guest VM. |
SIMPLE | vmware.hv.cluster.name[{$VMWARE.URL},{$VMWARE.HV.UUID}] Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
VMware | VMware: CPU usage | Aggregated CPU usage across all cores on the host in Hz. This is only available if the host is connected. |
SIMPLE | vmware.hv.cpu.usage[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware | VMware: CPU usage in percents | CPU usage as a percentage during the interval. |
SIMPLE | vmware.hv.cpu.usage.perf[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware | VMware: CPU utilization | CPU usage as a percentage during the interval depends on power management or HT. |
SIMPLE | vmware.hv.cpu.utilization[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware | VMware: Power usage | Current power usage. |
SIMPLE | vmware.hv.power[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware | VMware: Power usage maximum allowed | Maximum allowed power usage. |
SIMPLE | vmware.hv.power[{$VMWARE.URL},{$VMWARE.HV.UUID},max] Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
VMware | VMware: Datacenter name | Datacenter name of the hypervisor. |
SIMPLE | vmware.hv.datacenter.name[{$VMWARE.URL},{$VMWARE.HV.UUID}] Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
VMware | VMware: Full name | The complete product name, including the version information. |
SIMPLE | vmware.hv.fullname[{$VMWARE.URL},{$VMWARE.HV.UUID}] Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
VMware | VMware: CPU frequency | The speed of the CPU cores. This is an average value if there are multiple speeds. The product of CPU frequency and number of cores is approximately equal to the sum of the MHz for all the individual cores on the host. |
SIMPLE | vmware.hv.hw.cpu.freq[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware | VMware: CPU model | The CPU model. |
SIMPLE | vmware.hv.hw.cpu.model[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware | VMware: CPU cores | Number of physical CPU cores on the host. Physical CPU cores are the processors contained by a CPU package. |
SIMPLE | vmware.hv.hw.cpu.num[{$VMWARE.URL},{$VMWARE.HV.UUID}] Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
VMware | VMware: CPU threads | Number of physical CPU threads on the host. |
SIMPLE | vmware.hv.hw.cpu.threads[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware | VMware: Total memory | The physical memory size. |
SIMPLE | vmware.hv.hw.memory[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware | VMware: Model | The system model identification. |
SIMPLE | vmware.hv.hw.model[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware | VMware: Bios UUID | The hardware BIOS identification. |
SIMPLE | vmware.hv.hw.uuid[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware | VMware: Vendor | The hardware vendor identification. |
SIMPLE | vmware.hv.hw.vendor[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware | VMware: Ballooned memory | The amount of guest physical memory that is currently reclaimed through the balloon driver. Sum of all guest VMs. |
SIMPLE | vmware.hv.memory.size.ballooned[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware | VMware: Used memory | Physical memory usage on the host. |
SIMPLE | vmware.hv.memory.used[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware | VMware: Number of bytes received | VMware hypervisor network input statistics (bytes per second). |
SIMPLE | vmware.hv.network.in[{$VMWARE.URL},{$VMWARE.HV.UUID},bps] |
VMware | VMware: Number of bytes transmitted | VMware hypervisor network output statistics (bytes per second). |
SIMPLE | vmware.hv.network.out[{$VMWARE.URL},{$VMWARE.HV.UUID},bps] |
VMware | VMware: Overall status | The overall alarm status of the host: gray - unknown, green - ok, red - it has a problem, yellow - it might have a problem. |
SIMPLE | vmware.hv.status[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware | VMware: Uptime | System uptime. |
SIMPLE | vmware.hv.uptime[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware | VMware: Version | Dot-separated version string. |
SIMPLE | vmware.hv.version[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware | VMware: Number of guest VMs | Number of guest virtual machines. |
SIMPLE | vmware.hv.vm.num[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware | VMware: Average read latency of the datastore {#DATASTORE} | Average amount of time for a read operation from the datastore (milliseconds). |
SIMPLE | vmware.hv.datastore.read[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE},latency] |
VMware | VMware: Free space on datastore {#DATASTORE} (percentage) | VMware datastore free space as a percentage of the total. |
SIMPLE | vmware.hv.datastore.size[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE},pfree] |
VMware | VMware: Total size of datastore {#DATASTORE} | VMware datastore space in bytes. |
SIMPLE | vmware.hv.datastore.size[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE}] |
VMware | VMware: Average write latency of the datastore {#DATASTORE} | Average amount of time for a write operation to the datastore (milliseconds). |
SIMPLE | vmware.hv.datastore.write[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE},latency] |
VMware | VMware: Multipath count for datastore {#DATASTORE} | Number of available datastore paths. |
SIMPLE | vmware.hv.datastore.multipath[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE}] |
VMware | VMware: Health state rollup | The host health state rollup sensor value: gray - unknown, green - ok, red - it has a problem, yellow - it might have a problem. |
DEPENDENT | vmware.hv.sensor.health.state[{#SINGLETON}] Preprocessing: - JSONPATH: |
Zabbix raw items | VMware: Get sensors | Master item for sensors data. |
SIMPLE | vmware.hv.sensors.get[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
VMware: Hypervisor is down | The service is unavailable or does not accept ICMP ping. |
last(/VMware Hypervisor/icmpping[])=0 |
AVERAGE | Manual close: YES |
VMware: The {$VMWARE.HV.UUID} health is Red | One or more components in the appliance might be in an unusable status and the appliance might become unresponsive soon. Security patches might be available. |
last(/VMware Hypervisor/vmware.hv.status[{$VMWARE.URL},{$VMWARE.HV.UUID}])=3 |
HIGH | |
VMware: The {$VMWARE.HV.UUID} health is Yellow | One or more components in the appliance might become overloaded soon. |
last(/VMware Hypervisor/vmware.hv.status[{$VMWARE.URL},{$VMWARE.HV.UUID}])=2 |
AVERAGE | Depends on: - VMware: The {$VMWARE.HV.UUID} health is Red |
VMware: Hypervisor has been restarted | Uptime is less than 10 minutes. |
last(/VMware Hypervisor/vmware.hv.uptime[{$VMWARE.URL},{$VMWARE.HV.UUID}])<10m |
WARNING | Manual close: YES |
VMware: The multipath count has been changed | The number of available datastore paths is less than the registered count ({#MULTIPATH.COUNT}); the trigger logic is sketched after this table. |
last(/VMware Hypervisor/vmware.hv.datastore.multipath[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE}],#1)<>last(/VMware Hypervisor/vmware.hv.datastore.multipath[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE}],#2) and last(/VMware Hypervisor/vmware.hv.datastore.multipath[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE}])<{#MULTIPATH.COUNT} |
AVERAGE | Manual close: YES |
VMware: The {$VMWARE.HV.UUID} health is Red | One or more components in the appliance might be in an unusable status and the appliance might become unresponsive soon. Security patches might be available. |
last(/VMware Hypervisor/vmware.hv.sensor.health.state[{#SINGLETON}])="Red" |
HIGH | Depends on: - VMware: The {$VMWARE.HV.UUID} health is Red |
VMware: The {$VMWARE.HV.UUID} health is Yellow | One or more components in the appliance might become overloaded soon. |
last(/VMware Hypervisor/vmware.hv.sensor.health.state[{#SINGLETON}])="Yellow" |
AVERAGE | Depends on: - VMware: The {$VMWARE.HV.UUID} health is Red - VMware: The {$VMWARE.HV.UUID} health is Red - VMware: The {$VMWARE.HV.UUID} health is Yellow |
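The multipath trigger above fires only when the path count changed between the two most recent values and the latest value is below the registered count {#MULTIPATH.COUNT}. A minimal sketch of that logic with hypothetical values (Zabbix evaluates the real expression server-side):

```python
# Sketch of the multipath trigger semantics; 'history' stands in for the last
# two collected values of vmware.hv.datastore.multipath[...].
def multipath_problem(history: list[int], registered: int) -> bool:
    """True when the path count changed and is now below the registered count."""
    last, previous = history[-1], history[-2]
    return last != previous and last < registered

print(multipath_problem([4, 2], registered=4))  # True: a path was lost, trigger fires
print(multipath_problem([4, 4], registered=4))  # False: count unchanged
```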
Please report any issues with the template at https://support.zabbix.com
This template is designed to monitor Veeam Backup Enterprise Manager. The Veeam Backup Enterprise Manager REST API lets Zabbix query information about Veeam Backup Enterprise Manager objects. The template works without any external scripts and uses the script item.
For Zabbix version: 6.2 and higher.
See Zabbix template operation for basic instructions.
Create a user with the Portal Administrator role.
> See the Veeam Help Center for more details.

Set the macros {$VEEAM.MANAGER.API.URL}, {$VEEAM.MANAGER.USER}, {$VEEAM.MANAGER.PASSWORD}; a minimal login sketch follows below.
No specific Zabbix configuration is required.
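Below is a minimal sketch of the login flow the script item performs, assuming the Enterprise Manager REST session endpoint /api/sessionMngr/?v=latest and the X-RestSvcSessionId header from Veeam's REST documentation; the URL and credentials map to the macros above and are placeholders here:

```python
# Hypothetical login against the Veeam Backup Enterprise Manager REST API;
# assumes the third-party requests library.
import requests

base = "https://localhost:9398"  # {$VEEAM.MANAGER.API.URL}
resp = requests.post(f"{base}/api/sessionMngr/?v=latest",
                     auth=("admin", "secret"),  # {$VEEAM.MANAGER.USER} / {$VEEAM.MANAGER.PASSWORD}
                     verify=False, timeout=10)  # lab use only; validate certificates
resp.raise_for_status()
session_id = resp.headers["X-RestSvcSessionId"]  # reused on subsequent calls

jobs = requests.get(f"{base}/api/jobs",
                    headers={"X-RestSvcSessionId": session_id},
                    verify=False, timeout=10)
print(jobs.status_code)
```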
Name | Description | Default |
---|---|---|
{$BACKUP.NAME.MATCHES} | This macro is used in backup discovery rule. |
.* |
{$BACKUP.NAME.NOT_MATCHES} | This macro is used in backup discovery rule. |
CHANGE_IF_NEEDED |
{$BACKUP.TYPE.MATCHES} | This macro is used in backup discovery rule. |
.* |
{$BACKUP.TYPE.NOT_MATCHES} | This macro is used in backup discovery rule. |
CHANGE_IF_NEEDED |
{$VEEAM.MANAGER.API.URL} | Veeam Backup Enterprise Manager API endpoint is a URL in the format: |
https://localhost:9398 |
{$VEEAM.MANAGER.DATA.TIMEOUT} | A response timeout for API. |
10 |
{$VEEAM.MANAGER.HTTP.PROXY} | Sets the HTTP proxy to |
`` |
{$VEEAM.MANAGER.JOB.MAX.FAIL} | The maximum score of failed jobs (for a trigger expression). |
5 |
{$VEEAM.MANAGER.JOB.MAX.WARN} | The maximum score of warning jobs (for a trigger expression). |
10 |
{$VEEAM.MANAGER.PASSWORD} | The |
`` |
{$VEEAM.MANAGER.USER} | The |
`` |
There are no template links in this template.
Name | Description | Type | Key and additional info |
---|---|---|---|
Backup Files discovery | Discovery of all backup files created on, or imported to the backup servers that are connected to Veeam Backup Enterprise Manager. |
DEPENDENT | veeam.backup.files.discovery Preprocessing: - JSONPATH: - DISCARDUNCHANGEDHEARTBEAT: Filter: AND- {#TYPE} MATCHESREGEX - {#TYPE} NOTMATCHESREGEX - {#NAME} MATCHESREGEX - {#NAME} NOTMATCHESREGEX |
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
Veeam | Veeam Manager: Get metrics | The result of API requests is expressed in JSON. |
SCRIPT | veeam.manager.get.metrics Expression: The text is too long. Please see the template. |
Veeam | Veeam Manager: Get errors | The errors from API requests. |
DEPENDENT | veeam.manager.get.errors Preprocessing: - JSONPATH: ⛔️ONFAIL: - DISCARDUNCHANGED_HEARTBEAT: |
Veeam | Veeam Manager: Running Jobs | Informs about the running jobs. |
DEPENDENT | veeam.manager.running.jobs Preprocessing: - JSONPATH: ⛔️ONFAIL: - DISCARDUNCHANGED_HEARTBEAT: |
Veeam | Veeam Manager: Scheduled Jobs | Informs about the scheduled jobs. |
DEPENDENT | veeam.manager.scheduled.jobs Preprocessing: - JSONPATH: ⛔️ONFAIL: - DISCARDUNCHANGED_HEARTBEAT: |
Veeam | Veeam Manager: Scheduled Backup Jobs | Informs about the scheduled backup jobs. |
DEPENDENT | veeam.manager.scheduled.backup.jobs Preprocessing: - JSONPATH: ⛔️ONFAIL: - DISCARDUNCHANGED_HEARTBEAT: |
Veeam | Veeam Manager: Scheduled Replica Jobs | Informs about the scheduled replica jobs. |
DEPENDENT | veeam.manager.scheduled.replica.jobs Preprocessing: - JSONPATH: ⛔️ONFAIL: - DISCARDUNCHANGED_HEARTBEAT: |
Veeam | Veeam Manager: Total Job Runs | Informs about the total job runs. |
DEPENDENT | veeam.manager.scheduled.total.jobs Preprocessing: - JSONPATH: ⛔️ONFAIL: - DISCARDUNCHANGED_HEARTBEAT: |
Veeam | Veeam Manager: Warnings Job Runs | Informs about the warning job runs. |
DEPENDENT | veeam.manager.warning.jobs Preprocessing: - JSONPATH: ⛔️ONFAIL: - DISCARDUNCHANGED_HEARTBEAT: |
Veeam | Veeam Manager: Failed Job Runs | Informs about the failed job runs. |
DEPENDENT | veeam.manager.failed.jobs Preprocessing: - JSONPATH: ⛔️ONFAIL: - DISCARDUNCHANGED_HEARTBEAT: |
Veeam | Veeam Manager: Backup Size [{#NAME}] | Gets the backup size with the name |
DEPENDENT | veeam.backup.file.size[{#NAME}] Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
Veeam | Veeam Manager: Data Size [{#NAME}] | Gets the data size with the name |
DEPENDENT | veeam.backup.data.size[{#NAME}] Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
Veeam | Veeam Manager: Compression ratio [{#NAME}] | Gets the data compression ratio with the name |
DEPENDENT | veeam.backup.compress.ratio[{#NAME}] Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
Veeam | Veeam Manager: Deduplication Ratio [{#NAME}] | Gets the data deduplication ratio with the name |
DEPENDENT | veeam.backup.deduplication.ratio[{#NAME}] Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Veeam Manager: There are errors in requests to API | Zabbix has received errors in response to API requests. |
length(last(/Veeam Backup Enterprise Manager by HTTP/veeam.manager.get.errors))>0 |
AVERAGE | |
Veeam Manager: Warning job runs is too high | - |
last(/Veeam Backup Enterprise Manager by HTTP/veeam.manager.warning.jobs)>{$VEEAM.MANAGER.JOB.MAX.WARN} |
WARNING | Manual close: YES |
Veeam Manager: Failed job runs is too high | - |
last(/Veeam Backup Enterprise Manager by HTTP/veeam.manager.failed.jobs)>{$VEEAM.MANAGER.JOB.MAX.FAIL} |
AVERAGE | Manual close: YES |
Please report any issues with the template at https://support.zabbix.com.
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums.
This template is designed to monitor Veeam Backup and Replication version 11.0. It works without any external scripts and uses the script item.
For Zabbix version: 6.2 and higher.
See Zabbix template operation for basic instructions.
Set the macros {$VEEAM.API.URL}, {$VEEAM.USER}, and {$VEEAM.PASSWORD}; a minimal token-request sketch follows below.
No specific Zabbix configuration is required.
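Below is a minimal sketch of the token request the script item issues, assuming the v11 REST API's /api/oauth2/token endpoint and the 1.0-rev1 x-api-version header from Veeam's documentation; the URL and credentials map to the macros above and are placeholders here:

```python
# Hypothetical OAuth2 password-grant login against the Veeam Backup and
# Replication v11 REST API; assumes the third-party requests library.
import requests

base = "https://localhost:9419"  # {$VEEAM.API.URL}
resp = requests.post(
    f"{base}/api/oauth2/token",
    headers={"x-api-version": "1.0-rev1"},
    data={"grant_type": "password", "username": "admin", "password": "secret"},
    verify=False, timeout=10)  # lab use only; validate certificates
resp.raise_for_status()
token = resp.json()["access_token"]

states = requests.get(f"{base}/api/v1/jobs/states",
                      headers={"x-api-version": "1.0-rev1",
                               "Authorization": f"Bearer {token}"},
                      verify=False, timeout=10)
print(states.status_code)
```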
Name | Description | Default |
---|---|---|
{$CREATED.AFTER} | Returns sessions created within the last specified number of days. |
7 |
{$JOB.NAME.MATCHES} | This macro is used in discovery rule to evaluate the states of jobs. |
.* |
{$JOB.NAME.NOT_MATCHES} | This macro is used in discovery rule to evaluate the states of jobs. |
CHANGE_IF_NEEDED |
{$JOB.STATUS.MATCHES} | This macro is used in discovery rule to evaluate the states of jobs. |
.* |
{$JOB.STATUS.NOT_MATCHES} | This macro is used in discovery rule to evaluate the states of jobs. |
CHANGE_IF_NEEDED |
{$JOB.TYPE.MATCHES} | This macro is used in discovery rule to evaluate the states of jobs. |
.* |
{$JOB.TYPE.NOT_MATCHES} | This macro is used in discovery rule to evaluate the states of jobs. |
CHANGE_IF_NEEDED |
{$PROXIES.NAME.MATCHES} | This macro is used in proxies discovery rule. |
.* |
{$PROXIES.NAME.NOT_MATCHES} | This macro is used in proxies discovery rule. |
CHANGE_IF_NEEDED |
{$PROXIES.TYPE.MATCHES} | This macro is used in proxies discovery rule. |
.* |
{$PROXIES.TYPE.NOT_MATCHES} | This macro is used in proxies discovery rule. |
CHANGE_IF_NEEDED |
{$REPOSITORIES.NAME.MATCHES} | This macro is used in repositories discovery rule. |
.* |
{$REPOSITORIES.NAME.NOT_MATCHES} | This macro is used in repositories discovery rule. |
CHANGE_IF_NEEDED |
{$REPOSITORIES.TYPE.MATCHES} | This macro is used in repositories discovery rule. |
.* |
{$REPOSITORIES.TYPE.NOT_MATCHES} | This macro is used in repositories discovery rule. |
CHANGE_IF_NEEDED |
{$SESSION.NAME.MATCHES} | This macro is used in discovery rule to evaluate sessions. |
.* |
{$SESSION.NAME.NOT_MATCHES} | This macro is used in discovery rule to evaluate sessions. |
CHANGE_IF_NEEDED |
{$SESSION.TYPE.MATCHES} | This macro is used in discovery rule to evaluate sessions. |
.* |
{$SESSION.TYPE.NOT_MATCHES} | This macro is used in discovery rule to evaluate sessions. |
CHANGE_IF_NEEDED |
{$VEEAM.API.URL} | The Veeam API endpoint is a URL in the format |
https://localhost:9419 |
{$VEEAM.DATA.TIMEOUT} | A response timeout for the API. |
10 |
{$VEEAM.HTTP.PROXY} | Sets the HTTP proxy to |
`` |
{$VEEAM.PASSWORD} | The |
`` |
{$VEEAM.USER} | The |
`` |
There are no template links in this template.
Name | Description | Type | Key and additional info |
---|---|---|---|
Jobs states discovery | Discovery of job states (the filter logic is sketched after this table). |
DEPENDENT | veeam.job.state.discovery Preprocessing: - JSONPATH: - DISCARDUNCHANGEDHEARTBEAT: Filter: AND- {#TYPE} MATCHESREGEX - {#TYPE} NOTMATCHESREGEX - {#NAME} MATCHESREGEX - {#NAME} NOTMATCHESREGEX - {#JOB.STATUS} MATCHESREGEX - {#JOB.STATUS} NOTMATCHES_REGEX |
Proxies discovery | Discovery of proxies. |
DEPENDENT | veeam.proxies.discovery Preprocessing: - JSONPATH: - DISCARDUNCHANGEDHEARTBEAT: Filter: AND- {#TYPE} MATCHESREGEX - {#TYPE} NOTMATCHESREGEX - {#NAME} MATCHESREGEX - {#NAME} NOTMATCHESREGEX |
Repositories discovery | Discovery of repositories. |
DEPENDENT | veeam.repositories.discovery Preprocessing: - JSONPATH: - DISCARDUNCHANGEDHEARTBEAT: Filter: AND- {#TYPE} MATCHESREGEX - {#TYPE} NOTMATCHESREGEX - {#NAME} MATCHESREGEX - {#NAME} NOTMATCHESREGEX |
Sessions discovery | Discovery of sessions. |
DEPENDENT | veeam.sessions.discovery Preprocessing: - JSONPATH: - JAVASCRIPT - DISCARDUNCHANGEDHEARTBEAT: Filter: AND- {#TYPE} MATCHESREGEX - {#TYPE} NOTMATCHESREGEX - {#NAME} MATCHESREGEX - {#NAME} NOTMATCHESREGEX |
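All of the discovery rules above apply the same AND-combined filter: a discovered row is kept only if every LLD macro matches its MATCHES pattern and matches none of its NOT_MATCHES patterns. A small sketch of that logic with hypothetical values:

```python
# Sketch of the AND-combined LLD filter used by the discovery rules above.
import re

def keep(row: dict, filters: list[tuple[str, str, str]]) -> bool:
    """filters holds (lld_macro, matches_regex, not_matches_regex) triples."""
    return all(
        re.search(matches, row[macro]) and not re.search(not_matches, row[macro])
        for macro, matches, not_matches in filters
    )

row = {"{#NAME}": "Nightly backup", "{#TYPE}": "Backup"}  # hypothetical LLD row
filters = [("{#NAME}", r".*", r"CHANGE_IF_NEEDED"),  # {$JOB.NAME.MATCHES} / .NOT_MATCHES
           ("{#TYPE}", r".*", r"CHANGE_IF_NEEDED")]  # {$JOB.TYPE.MATCHES} / .NOT_MATCHES
print(keep(row, filters))  # True: the row survives the filter
```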
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
Veeam | Veeam: Get metrics | The result of API requests is expressed in JSON. |
SCRIPT | veeam.get.metrics Expression: The text is too long. Please see the template. |
Veeam | Veeam: Get errors | The errors from API requests. |
DEPENDENT | veeam.get.errors Preprocessing: - JSONPATH: ⛔️ONFAIL: - DISCARDUNCHANGED_HEARTBEAT: |
Veeam | Veeam: Server [{#NAME}]: Get data | Gets raw data collected by the proxy server. |
DEPENDENT | veeam.proxy.server.raw[{#NAME}] Preprocessing: - JSONPATH: |
Veeam | Veeam: Proxy [{#NAME}] [{#TYPE}]: Get data | Gets raw data collected by the proxy with the name |
DEPENDENT | veeam.proxy.raw[{#NAME}] Preprocessing: - JSONPATH: |
Veeam | Veeam: Proxy [{#NAME}] [{#TYPE}]: Max Task Count | The maximum number of concurrent tasks. |
DEPENDENT | veeam.proxy.maxtask[{#NAME}] Preprocessing: - JSONPATH: |
Veeam | Veeam: Proxy [{#NAME}] [{#TYPE}]: Host name | The name of the proxy server. |
DEPENDENT | veeam.proxy.server.name[{#NAME}] Preprocessing: - JSONPATH: |
Veeam | Veeam: Proxy [{#NAME}] [{#TYPE}]: Host type | The type of the proxy server. |
DEPENDENT | veeam.proxy.server.type[{#NAME}] Preprocessing: - JSONPATH: |
Veeam | Veeam: Repository [{#NAME}] [{#TYPE}]: Get data | Gets raw data from repository with the name: |
DEPENDENT | veeam.repositories.raw[{#NAME}] Preprocessing: - JSONPATH: |
Veeam | Veeam: Repository [{#NAME}] [{#TYPE}]: Used space [{#PATH}] | Space used by repositories, expressed in gigabytes (GB). |
DEPENDENT | veeam.repository.capacity[{#NAME}] Preprocessing: - JSONPATH: |
Veeam | Veeam: Repository [{#NAME}] [{#TYPE}]: Free space [{#PATH}] | Free space of repositories expressed in gigabytes (GB). |
DEPENDENT | veeam.repository.free.space[{#NAME}] Preprocessing: - JSONPATH: |
Veeam | Veeam: Session [{#NAME}] [{#TYPE}]: Get data | Gets raw data from session with the name: |
DEPENDENT | veeam.sessions.raw[{#ID}] Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
Veeam | Veeam: Session [{#NAME}] [{#TYPE}]: State | The state of the session. The enums used: |
DEPENDENT | veeam.sessions.state[{#ID}] Preprocessing: - JSONPATH: |
Veeam | Veeam: Session [{#NAME}] [{#TYPE}]: Result | The result of the session. The enums used: |
DEPENDENT | veeam.sessions.result[{#ID}] Preprocessing: - JSONPATH: |
Veeam | Veeam: Session [{#NAME}] [{#TYPE}]: Message | A message that explains the session result. |
DEPENDENT | veeam.sessions.message[{#ID}] Preprocessing: - JSONPATH: |
Veeam | Veeam: Session progress percent [{#NAME}] [{#TYPE}] | The progress of the session expressed as a percentage. |
DEPENDENT | veeam.sessions.progress.percent[{#ID}] Preprocessing: - JSONPATH: |
Veeam | Veeam: Job states [{#NAME}] [{#TYPE}]: Get data | Gets raw data from the job states with the name |
DEPENDENT | veeam.jobs.states.raw[{#ID}] Preprocessing: - JSONPATH: |
Veeam | Veeam: Job states [{#NAME}] [{#TYPE}]: Status | The current status of the job. The enums used: |
DEPENDENT | veeam.jobs.status[{#ID}] Preprocessing: - JSONPATH: |
Veeam | Veeam: Job states [{#NAME}] [{#TYPE}]: Last result | The result of the session. The enums used: |
DEPENDENT | veeam.jobs.last.result[{#ID}] Preprocessing: - JSONPATH: |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Veeam: There are errors in requests to API | Zabbix has received errors in response to API requests. |
length(last(/Veeam Backup and Replication by HTTP/veeam.get.errors))>0 |
AVERAGE | |
Veeam: Last result session failed | - |
find(/Veeam Backup and Replication by HTTP/veeam.sessions.result[{#ID}],,"like","Failed")=1 |
AVERAGE | Manual close: YES |
Veeam: Last result job failed | - |
find(/Veeam Backup and Replication by HTTP/veeam.jobs.last.result[{#ID}],,"like","Failed")=1 |
AVERAGE | Manual close: YES |
Please report any issues with the template at https://support.zabbix.com.
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums.
For Zabbix version: 6.2 and higher
The template to monitor HashiCorp Vault by Zabbix that works without any external scripts.
Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
Template Vault by HTTP collects metrics by the HTTP agent from the /sys/metrics API endpoint.
See https://www.vaultproject.io/api-docs/system/metrics.
This template was tested on:
See Zabbix template operation for basic instructions.
Configure the Vault API. See Vault Configuration.
Create a Vault service token and set it in the macro {$VAULT.TOKEN}.
No specific Zabbix configuration is required.
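A minimal sketch of the request the HTTP agent items perform against Vault's documented /sys/metrics endpoint; the scheme, host, port, and token map to {$VAULT.API.SCHEME}, {$VAULT.HOST}, {$VAULT.API.PORT}, and {$VAULT.TOKEN}, and the values below are placeholders:

```python
# Hypothetical pull of Vault telemetry in Prometheus text format;
# assumes the third-party requests library.
import requests

base = "http://vault.example.com:8200"  # {$VAULT.API.SCHEME}://{$VAULT.HOST}:{$VAULT.API.PORT}
resp = requests.get(f"{base}/v1/sys/metrics",
                    params={"format": "prometheus"},
                    headers={"X-Vault-Token": "s.xxxxxxxx"},  # {$VAULT.TOKEN}
                    timeout=10)
resp.raise_for_status()
print("\n".join(resp.text.splitlines()[:5]))  # first few exposition lines
```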
Name | Description | Default |
---|---|---|
{$VAULT.API.PORT} | Vault port. |
8200 |
{$VAULT.API.SCHEME} | Vault API scheme. |
http |
{$VAULT.HOST} | Vault host name. |
<PUT YOUR VAULT HOST> |
{$VAULT.LEADERSHIP.LOSSES.MAX.WARN} | Maximum number of Vault leadership losses. |
5 |
{$VAULT.LEADERSHIP.SETUP.FAILED.MAX.WARN} | Maximum number of Vault leadership setup failures. |
5 |
{$VAULT.LEADERSHIP.STEPDOWNS.MAX.WARN} | Maximum number of Vault leadership step downs. |
5 |
{$VAULT.LLD.FILTER.STORAGE.MATCHES} | Filter of discoverable storage backends. |
.+ |
{$VAULT.OPEN.FDS.MAX.WARN} | Maximum percentage of used file descriptors for trigger expression. |
90 |
{$VAULT.TOKEN.ACCESSORS} | Vault accessors separated by spaces for monitoring token expiration time. |
`` |
{$VAULT.TOKEN.TTL.MIN.CRIT} | Token TTL critical threshold. |
3d |
{$VAULT.TOKEN.TTL.MIN.WARN} | Token TTL warning threshold. |
7d |
{$VAULT.TOKEN} | Vault auth token. |
<PUT YOUR AUTH TOKEN> |
There are no template links in this template.
Name | Description | Type | Key and additional info |
---|---|---|---|
Mountpoint metrics discovery | Mountpoint metrics discovery. |
DEPENDENT | vault.mountpoint.discovery |
Replication metrics discovery | Discovery for replication metrics. |
DEPENDENT | vault.replication.discovery |
Storage metrics discovery | Storage backend metrics discovery. |
DEPENDENT | vault.storage.discovery Filter: AND- {#STORAGE} MATCHES_REGEX |
Token metrics discovery | Tokens metrics discovery. |
DEPENDENT | vault.tokens.discovery |
WAL metrics discovery | Discovery for WAL metrics. |
DEPENDENT | vault.wal.discovery |
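Conceptually, the raw "check discovery" items further below feed these rules: a PROMETHEUS_TO_JSON step selects a family of counters and a JavaScript step reshapes the metric names into LLD macros. A sketch of that reshaping for the storage discovery, with hypothetical metric names and casing:

```python
# Sketch of deriving {#STORAGE}/{#OPERATION} LLD pairs from counter names
# such as vault_consul_get_count; names and casing are hypothetical.
import json
import re

metric_names = ["vault_consul_get_count", "vault_consul_put_count",
                "vault_raft_list_count"]

pattern = re.compile(r"^vault_(?P<storage>.+)_(?P<op>get|put|list|delete)_count$")
rows = [{"{#STORAGE}": m["storage"], "{#OPERATION}": m["op"].upper()}
        for m in (pattern.match(n) for n in metric_names) if m]
print(json.dumps(rows, indent=2))
```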
Group | Name | Description | Type | Key and additional info | |||
---|---|---|---|---|---|---|---|
Vault | Vault: Initialized | Initialization status. |
DEPENDENT | vault.health.initialized Preprocessing: - JSONPATH: ⛔️ONFAIL: - BOOLTODECIMAL - DISCARDUNCHANGED_HEARTBEAT: |
|||
Vault | Vault: Sealed | Seal status. |
DEPENDENT | vault.health.sealed Preprocessing: - JSONPATH: ⛔️ONFAIL: - BOOLTODECIMAL - DISCARDUNCHANGED_HEARTBEAT: |
|||
Vault | Vault: Standby | Standby status. |
DEPENDENT | vault.health.standby Preprocessing: - JSONPATH: ⛔️ONFAIL: - BOOLTODECIMAL - DISCARDUNCHANGED_HEARTBEAT: |
|||
Vault | Vault: Performance standby | Performance standby status. |
DEPENDENT | vault.health.performancestandby Preprocessing: - JSONPATH: ⛔️ON FAIL:DISCARD_VALUE -> - BOOLTODECIMAL - DISCARDUNCHANGEDHEARTBEAT: |
|||
Vault | Vault: Performance replication | Performance replication mode https://www.vaultproject.io/docs/enterprise/replication |
DEPENDENT | vault.health.replicationperformancemode Preprocessing: - JSONPATH: ⛔️ONFAIL: - DISCARDUNCHANGED_HEARTBEAT: |
|||
Vault | Vault: Disaster Recovery replication | Disaster recovery replication mode https://www.vaultproject.io/docs/enterprise/replication |
DEPENDENT | vault.health.replicationdrmode Preprocessing: - JSONPATH: ⛔️ONFAIL: - DISCARDUNCHANGED_HEARTBEAT: |
|||
Vault | Vault: Version | Server version. |
DEPENDENT | vault.health.version Preprocessing: - JSONPATH: ⛔️ONFAIL: - DISCARDUNCHANGED_HEARTBEAT: |
|||
Vault | Vault: Healthcheck | Vault healthcheck. |
DEPENDENT | vault.health.check Preprocessing: - JSONPATH: ⛔️ONFAIL: - DISCARDUNCHANGED_HEARTBEAT: |
|||
Vault | Vault: HA enabled | HA enabled status. |
DEPENDENT | vault.leader.haenabled Preprocessing: - JSONPATH: - BOOL TODECIMAL- DISCARD UNCHANGED_HEARTBEAT:1h |
|||
Vault | Vault: Is leader | Leader status. |
DEPENDENT | vault.leader.isself Preprocessing: - JSONPATH: - BOOL TODECIMAL- DISCARD UNCHANGED_HEARTBEAT:1h |
|||
Vault | Vault: Get metrics error | Get metrics error. |
DEPENDENT | vault.getmetrics.error Preprocessing: - JSONPATH: ⛔️ON FAIL:CUSTOM_VALUE -> - DISCARDUNCHANGEDHEARTBEAT: |
|||
Vault | Vault: Process CPU seconds, total | Total user and system CPU time spent in seconds. |
DEPENDENT | vault.metrics.process.cpu.seconds.total Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: |
|||
Vault | Vault: Open file descriptors, max | Maximum number of open file descriptors. |
DEPENDENT | vault.metrics.process.max.fds Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: - DISCARDUNCHANGEDHEARTBEAT: |
|||
Vault | Vault: Open file descriptors, current | Number of open file descriptors. |
DEPENDENT | vault.metrics.process.open.fds Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: |
|||
Vault | Vault: Process resident memory | Resident memory size in bytes. |
DEPENDENT | vault.metrics.process.residentmemory.bytes Preprocessing: - PROMETHEUS PATTERN:process_resident_memory_bytes ⛔️ON_FAIL: |
|||
Vault | Vault: Uptime | Server uptime. |
DEPENDENT | vault.metrics.process.uptime Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: - JAVASCRIPT: |
|||
Vault | Vault: Process virtual memory, current | Virtual memory size in bytes. |
DEPENDENT | vault.metrics.process.virtualmemory.bytes Preprocessing: - PROMETHEUS PATTERN:process_virtual_memory_bytes ⛔️ON_FAIL: |
|||
Vault | Vault: Process virtual memory, max | Maximum amount of virtual memory available in bytes. |
DEPENDENT | vault.metrics.process.virtualmemory.max.bytes Preprocessing: - PROMETHEUS PATTERN:process_virtual_memory_max_bytes ⛔️ONFAIL: - DISCARDUNCHANGED_HEARTBEAT: |
|||
Vault | Vault: Audit log requests, rate | Number of all audit log requests across all audit log devices. |
DEPENDENT | vault.metrics.audit.log.request.rate Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: - CHANGEPERSECOND |
|||
Vault | Vault: Audit log request failures, rate | Number of audit log request failures. |
DEPENDENT | vault.metrics.audit.log.request.failure.rate Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: - CHANGEPERSECOND |
|||
Vault | Vault: Audit log response, rate | Number of audit log responses across all audit log devices. |
DEPENDENT | vault.metrics.audit.log.response.rate Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: - CHANGEPERSECOND |
|||
Vault | Vault: Audit log response failures, rate | Number of audit log response failures. |
DEPENDENT | vault.metrics.audit.log.response.failure.rate Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: - CHANGEPERSECOND |
|||
Vault | Vault: Barrier DELETE ops, rate | Number of DELETE operations at the barrier. |
DEPENDENT | vault.metrics.barrier.delete.rate Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: - CHANGEPERSECOND |
|||
Vault | Vault: Barrier GET ops, rate | Number of GET operations at the barrier. |
DEPENDENT | vault.metrics.vault.barrier.get.rate Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: - CHANGEPERSECOND |
|||
Vault | Vault: Barrier LIST ops, rate | Number of LIST operations at the barrier. |
DEPENDENT | vault.metrics.barrier.list.rate Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: - CHANGEPERSECOND |
|||
Vault | Vault: Barrier PUT ops, rate | Number of PUT operations at the barrier. |
DEPENDENT | vault.metrics.barrier.put.rate Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: - CHANGEPERSECOND |
|||
Vault | Vault: Cache hit, rate | Number of times a value was retrieved from the LRU cache. |
DEPENDENT | vault.metrics.cache.hit.rate Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: - CHANGEPERSECOND |
|||
Vault | Vault: Cache miss, rate | Number of times a value was not in the LRU cache. This results in a read from the configured storage. |
DEPENDENT | vault.metrics.cache.miss.rate Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: - CHANGEPERSECOND |
|||
Vault | Vault: Cache write, rate | Number of times a value was written to the LRU cache. |
DEPENDENT | vault.metrics.cache.write.rate Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: - CHANGEPERSECOND |
|||
Vault | Vault: Check token, rate | Number of token checks handled by Vault core. |
DEPENDENT | vault.metrics.core.check.token.rate Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: - CHANGEPERSECOND |
|||
Vault | Vault: Fetch ACL and token, rate | Number of ACL and corresponding token entry fetches handled by Vault core. |
DEPENDENT | vault.metrics.core.fetch.aclandtoken Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: - CHANGEPERSECOND |
|||
Vault | Vault: Requests, rate | Number of requests handled by Vault core. |
DEPENDENT | vault.metrics.core.handle.request Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: - CHANGEPERSECOND |
|||
Vault | Vault: Leadership setup failed, counter | Cluster leadership setup failures which have occurred in a highly available Vault cluster. |
DEPENDENT | vault.metrics.core.leadership.setupfailed Preprocessing: - PROMETHEUS TOJSON:vault_core_leadership_setup_failed - JSONPATH: ⛔️ON FAIL:CUSTOM_VALUE -> 0 |
|||
Vault | Vault: Leadership setup lost, counter | Cluster leadership losses which have occurred in a highly available Vault cluster. |
DEPENDENT | vault.metrics.core.leadershiplost Preprocessing: - PROMETHEUS TOJSON:vault_core_leadership_lost_count - JSONPATH: ⛔️ON FAIL:CUSTOM_VALUE -> 0 |
|||
Vault | Vault: Post-unseal ops, counter | Duration of time taken by post-unseal operations handled by Vault core. |
DEPENDENT | vault.metrics.core.postunseal Preprocessing: - PROMETHEUS PATTERN:vault_core_post_unseal_count ⛔️ON_FAIL: |
|||
Vault | Vault: Pre-seal ops, counter | Duration of time taken by pre-seal operations. |
DEPENDENT | vault.metrics.core.preseal Preprocessing: - PROMETHEUS PATTERN:vault_core_pre_seal_count ⛔️ON_FAIL: |
|||
Vault | Vault: Requested seal ops, counter | Duration of time taken by requested seal operations. |
DEPENDENT | vault.metrics.core.sealwithrequest Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: |
|||
Vault | Vault: Seal ops, counter | Duration of time taken by seal operations. |
DEPENDENT | vault.metrics.core.seal Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: |
|||
Vault | Vault: Internal seal ops, counter | Duration of time taken by internal seal operations. |
DEPENDENT | vault.metrics.core.sealinternal Preprocessing: - PROMETHEUS PATTERN:vault_core_seal_internal_count ⛔️ON_FAIL: |
|||
Vault | Vault: Leadership step downs, counter | Cluster leadership step down. |
DEPENDENT | vault.metrics.core.stepdown Preprocessing: - PROMETHEUS TOJSON:vault_core_step_down_count - JSONPATH: ⛔️ON FAIL:CUSTOM_VALUE -> 0 |
|||
Vault | Vault: Unseal ops, counter | Duration of time taken by unseal operations. |
DEPENDENT | vault.metrics.core.unseal Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: |
|||
Vault | Vault: Fetch lease times, counter | Time taken to fetch lease times. |
DEPENDENT | vault.metrics.expire.fetch.lease.times Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: |
|||
Vault | Vault: Fetch lease times by token, counter | Time taken to fetch lease times by token. |
DEPENDENT | vault.metrics.expire.fetch.lease.times.bytoken Preprocessing: - PROMETHEUS PATTERN:vault_expire_fetch_lease_times_by_token_count ⛔️ON_FAIL: |
|||
Vault | Vault: Number of expiring leases | Number of all leases which are eligible for eventual expiry. |
DEPENDENT | vault.metrics.expire.numleases Preprocessing: - PROMETHEUS PATTERN:vault_expire_num_leases ⛔️ON_FAIL: |
|||
Vault | Vault: Expire revoke, count | Time taken to revoke a token. |
DEPENDENT | vault.metrics.expire.revoke Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: |
|||
Vault | Vault: Expire revoke force, count | Time taken to forcibly revoke a token. |
DEPENDENT | vault.metrics.expire.revoke.force Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: |
|||
Vault | Vault: Expire revoke prefix, count | Time taken to revoke tokens on a prefix. |
DEPENDENT | vault.metrics.expire.revoke.prefix Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: |
|||
Vault | Vault: Revoke secrets by token, count | Time taken to revoke all secrets issued with a given token. |
DEPENDENT | vault.metrics.expire.revoke.bytoken Preprocessing: - PROMETHEUS PATTERN:vault_expire_revoke_by_token_count ⛔️ON_FAIL: |
|||
Vault | Vault: Expire renew, count | Time taken to renew a lease. |
DEPENDENT | vault.metrics.expire.renew Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: |
|||
Vault | Vault: Renew token, count | Time taken to renew a token which does not need to invoke a logical backend. |
DEPENDENT | vault.metrics.expire.renewtoken Preprocessing: - PROMETHEUS PATTERN:vault_expire_renew_token_count ⛔️ON_FAIL: |
|||
Vault | Vault: Register ops, count | Time taken for register operations. |
DEPENDENT | vault.metrics.expire.register Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: |
|||
Vault | Vault: Register auth ops, count | Time taken for register authentication operations which create lease entries without lease ID. |
DEPENDENT | vault.metrics.expire.register.auth Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: |
|||
Vault | Vault: Policy GET ops, rate | Number of operations to get a policy. |
DEPENDENT | vault.metrics.policy.getpolicy.rate Preprocessing: - PROMETHEUS PATTERN:vault_policy_get_policy_count ⛔️ONFAIL: - CHANGEPER_SECOND |
|||
Vault | Vault: Policy LIST ops, rate | Number of operations to list policies. |
DEPENDENT | vault.metrics.policy.listpolicies.rate Preprocessing: - PROMETHEUS PATTERN:vault_policy_list_policies_count ⛔️ONFAIL: - CHANGEPER_SECOND |
|||
Vault | Vault: Policy DELETE ops, rate | Number of operations to delete a policy. |
DEPENDENT | vault.metrics.policy.deletepolicy.rate Preprocessing: - PROMETHEUS PATTERN:vault_policy_delete_policy_count ⛔️ONFAIL: - CHANGEPER_SECOND |
|||
Vault | Vault: Policy SET ops, rate | Number of operations to set a policy. |
DEPENDENT | vault.metrics.policy.setpolicy.rate Preprocessing: - PROMETHEUS PATTERN:vault_policy_set_policy_count ⛔️ONFAIL: - CHANGEPER_SECOND |
|||
Vault | Vault: Token create, count | The time taken to create a token. |
DEPENDENT | vault.metrics.token.create Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: |
|||
Vault | Vault: Token createAccessor, count | The time taken to create a token accessor. |
DEPENDENT | vault.metrics.token.createAccessor Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: |
|||
Vault | Vault: Token lookup, rate | Number of token lookups. |
DEPENDENT | vault.metrics.token.lookup.rate Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: - CHANGEPERSECOND |
|||
Vault | Vault: Token revoke, count | Time taken to revoke a token. |
DEPENDENT | vault.metrics.token.revoke Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: |
|||
Vault | Vault: Token revoke tree, count | Time taken to revoke a token tree. |
DEPENDENT | vault.metrics.token.revoke.tree Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: |
|||
Vault | Vault: Token store, count | Time taken to store an updated token entry without writing to the secondary index. |
DEPENDENT | vault.metrics.token.store Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: |
|||
Vault | Vault: Runtime allocated bytes | Number of bytes allocated by the Vault process. This could burst from time to time, but should return to a steady state value. |
DEPENDENT | vault.metrics.runtime.alloc.bytes Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: |
|||
Vault | Vault: Runtime freed objects | Number of freed objects. |
DEPENDENT | vault.metrics.runtime.free.count Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: |
|||
Vault | Vault: Runtime heap objects | Number of objects on the heap. This is a good general memory pressure indicator worth establishing a baseline and thresholds for alerting. |
DEPENDENT | vault.metrics.runtime.heap.objects Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: |
|||
Vault | Vault: Runtime malloc count | Cumulative count of allocated heap objects. |
DEPENDENT | vault.metrics.runtime.malloc.count Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: |
|||
Vault | Vault: Runtime num goroutines | Number of goroutines. This serves as a general system load indicator worth establishing a baseline and thresholds for alerting. |
DEPENDENT | vault.metrics.runtime.numgoroutines Preprocessing: - PROMETHEUS PATTERN:vault_runtime_num_goroutines ⛔️ON_FAIL: |
|||
Vault | Vault: Runtime sys bytes | Number of bytes allocated to Vault. This includes what is being used by Vault's heap and what has been reclaimed but not given back to the operating system. |
DEPENDENT | vault.metrics.runtime.sys.bytes Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: |
|||
Vault | Vault: Runtime GC pause, total | The total garbage collector pause time since Vault was last started. |
DEPENDENT | vault.metrics.total.gc.pause Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: - MULTIPLIER: |
|||
Vault | Vault: Runtime GC runs, total | Total number of garbage collection runs since Vault was last started. |
DEPENDENT | vault.metrics.runtime.total.gc.runs Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: |
|||
Vault | Vault: Token count, total | Total number of service tokens available for use; counts all un-expired and un-revoked tokens in Vault's token store. This measurement is performed every 10 minutes. |
DEPENDENT | vault.metrics.token Preprocessing: - PROMETHEUSTOJSON: - JSONPATH: ⛔️ON_FAIL: |
|||
Vault | Vault: Token count by auth, total | Total number of service tokens that were created by an auth method. |
DEPENDENT | vault.metrics.token.byauth Preprocessing: - PROMETHEUS TOJSON:vault_token_count_by_auth - JSONPATH: ⛔️ON FAIL:CUSTOM_VALUE -> 0 |
|||
Vault | Vault: Token count by policy, total | Total number of service tokens that have a policy attached. |
DEPENDENT | vault.metrics.token.bypolicy Preprocessing: - PROMETHEUS TOJSON:vault_token_count_by_policy - JSONPATH: ⛔️ON FAIL:CUSTOM_VALUE -> 0 |
|||
Vault | Vault: Token count by ttl, total | Number of service tokens, grouped by the TTL range they were assigned at creation. |
DEPENDENT | vault.metrics.token.byttl Preprocessing: - PROMETHEUS TOJSON:vault_token_count_by_ttl - JSONPATH: ⛔️ON FAIL:CUSTOM_VALUE -> 0 |
|||
Vault | Vault: Token creation, rate | Number of service or batch tokens created. |
DEPENDENT | vault.metrics.token.creation.rate Preprocessing: - PROMETHEUSTOJSON: - JSONPATH: ⛔️ONFAIL: - CHANGEPER_SECOND |
|||
Vault | Vault: Secret kv entries | Number of entries in each key-value secret engine. |
DEPENDENT | vault.metrics.secret.kv.count Preprocessing: - PROMETHEUSTOJSON: - JSONPATH: ⛔️ON_FAIL: |
|||
Vault | Vault: Token secret lease creation, rate | Counts the number of leases created by secret engines. |
DEPENDENT | vault.metrics.secret.lease.creation.rate Preprocessing: - PROMETHEUSTOJSON: - JSONPATH: ⛔️ONFAIL: - CHANGEPER_SECOND |
|||
Vault | Vault: Storage [{#STORAGE}] {#OPERATION} ops, rate | Number of {#OPERATION} operations against the {#STORAGE} storage backend. |
DEPENDENT | vault.metrics.storage.rate[{#STORAGE}, {#OPERATION}] Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: - CHANGEPERSECOND |
|||
Vault | Vault: Rollback attempt [{#MOUNTPOINT}] ops, rate | Number of rollback operations performed on the given mount point. |
DEPENDENT | vault.metrics.rollback.attempt.rate[{#MOUNTPOINT}] Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: - CHANGEPERSECOND |
|||
Vault | Vault: Route rollback [{#MOUNTPOINT}] ops, rate | Number of operations to dispatch a rollback operation to a backend, and for that backend to process it. Rollback operations are automatically scheduled to clean up partial errors. |
DEPENDENT | vault.metrics.route.rollback.rate[{#MOUNTPOINT}] Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: - CHANGEPERSECOND |
|||
Vault | Vault: Delete WALs, count{#SINGLETON} | Time taken to delete a Write Ahead Log (WAL). |
DEPENDENT | vault.metrics.wal.deletewals[{#SINGLETON}] Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: |
|||
Vault | Vault: GC deleted WAL{#SINGLETON} | Number of Write Ahead Logs (WAL) deleted during each garbage collection run. |
DEPENDENT | vault.metrics.wal.gc.deleted[{#SINGLETON}] Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: |
|||
Vault | Vault: WALs on disk, total{#SINGLETON} | Total number of Write Ahead Logs (WAL) on disk. |
DEPENDENT | vault.metrics.wal.gc.total[{#SINGLETON}] Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: |
|||
Vault | Vault: Load WALs, count{#SINGLETON} | Time taken to load a Write Ahead Log (WAL). |
DEPENDENT | vault.metrics.wal.loadWAL[{#SINGLETON}] Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: |
|||
Vault | Vault: Persist WALs, count{#SINGLETON} | Time taken to persist a Write Ahead Log (WAL). |
DEPENDENT | vault.metrics.wal.persistwals[{#SINGLETON}] Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: |
|||
Vault | Vault: Flush ready WAL, count{#SINGLETON} | Time taken to flush a ready Write Ahead Log (WAL) to storage. |
DEPENDENT | vault.metrics.wal.flushready[{#SINGLETON}] Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: |
|||
Vault | Vault: Stream WAL missing guard, count{#SINGLETON} | Number of incidents where the starting Merkle Tree index used to begin streaming WAL entries is not matched/found. |
DEPENDENT | vault.metrics.logshipper.streamWALs.missingguard[{#SINGLETON}] Preprocessing: - PROMETHEUS PATTERN:logshipper_streamWALs_missing_guard ⛔️ON_FAIL: |
|||
Vault | Vault: Stream WAL guard found, count{#SINGLETON} | Number of incidents where the starting Merkle Tree index used to begin streaming WAL entries is matched/found. |
DEPENDENT | vault.metrics.logshipper.streamWALs.guardfound[{#SINGLETON}] Preprocessing: - PROMETHEUS PATTERN:logshipper_streamWALs_guard_found ⛔️ON_FAIL: |
|||
Vault | Vault: Merkle commit index{#SINGLETON} | The last committed index in the Merkle Tree. |
DEPENDENT | vault.metrics.replication.merkle.commitindex[{#SINGLETON}] Preprocessing: - PROMETHEUS PATTERN:replication_merkle_commit_index ⛔️ON_FAIL: |
|||
Vault | Vault: Last WAL{#SINGLETON} | The index of the last WAL. |
DEPENDENT | vault.metrics.replication.wal.lastwal[{#SINGLETON}] Preprocessing: - PROMETHEUS PATTERN:replication_wal_last_wal ⛔️ON_FAIL: |
|||
Vault | Vault: Last DR WAL{#SINGLETON} | The index of the last DR WAL. |
DEPENDENT | vault.metrics.replication.wal.lastdrwal[{#SINGLETON}] Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: |
|||
Vault | Vault: Last performance WAL{#SINGLETON} | The index of the last Performance WAL. |
DEPENDENT | vault.metrics.replication.wal.lastperformancewal[{#SINGLETON}] Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: |
|||
Vault | Vault: Last remote WAL{#SINGLETON} | The index of the last remote WAL. |
DEPENDENT | vault.metrics.replication.fsm.lastremotewal[{#SINGLETON}] Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: |
|||
Vault | Vault: Token [{#TOKEN_NAME}] error | Token lookup error text. |
DEPENDENT | vault.tokenviaaccessor.error["{#ACCESSOR}"] Preprocessing: - JSONPATH: - DISCARDUNCHANGEDHEARTBEAT: |
|||
Vault | Vault: Token [{#TOKEN_NAME}] has TTL | Whether the token has a TTL. |
DEPENDENT | vault.tokenviaaccessor.hasttl["{#ACCESSOR}"] Preprocessing: - JSONPATH: - BOOL TODECIMAL- DISCARD UNCHANGED_HEARTBEAT:1h |
|||
Vault | Vault: Token [{#TOKEN_NAME}] TTL | The TTL period of the token. |
DEPENDENT | vault.tokenviaaccessor.ttl["{#ACCESSOR}"] Preprocessing: - JSONPATH: |
|||
Zabbix raw items | Vault: Get health | - |
HTTP_AGENT | vault.gethealth Preprocessing: - CHECK NOTSUPPORTED⛔️ON FAIL:CUSTOM_VALUE -> {"healthcheck": 0} |
|||
Zabbix raw items | Vault: Get leader | - |
HTTP_AGENT | vault.getleader Preprocessing: - CHECK NOT_SUPPORTED |
|||
Zabbix raw items | Vault: Get metrics | - |
HTTP_AGENT | vault.getmetrics Preprocessing: - CHECK NOT_SUPPORTED |
|||
Zabbix raw items | Vault: Clear metrics | - |
DEPENDENT | vault.clearmetrics Preprocessing: - CHECK JSONERROR:$.errors ⛔️ON FAIL:DISCARD_VALUE -> |
|||
Zabbix raw items | Vault: Get tokens | Get information about tokens via their accessors. Accessors are defined in the macro "{$VAULT.TOKEN.ACCESSORS}". |
SCRIPT | vault.get_tokens Expression: The text is too long. Please see the template. |
|||
Zabbix raw items | Vault: Check WAL discovery | - |
DEPENDENT | vault.checkwaldiscovery Preprocessing: - PROMETHEUSTOJSON: ⛔️ONFAIL: - JAVASCRIPT: - DISCARDUNCHANGED_HEARTBEAT: |
|||
Zabbix raw items | Vault: Check replication discovery | - |
DEPENDENT | vault.checkreplicationdiscovery Preprocessing: - PROMETHEUSTOJSON: ⛔️ONFAIL: - JAVASCRIPT: - DISCARDUNCHANGED_HEARTBEAT: |
|||
Zabbix raw items | Vault: Check storage discovery | - |
DEPENDENT | vault.checkstoragediscovery Preprocessing: - PROMETHEUSTOJSON: `{name=~"^vault(?:.+)(?:get|put|list|delete)count$"}` ⛔️ONFAIL: DISCARD_VALUE -> - JAVASCRIPT: The text is too long. Please see the template. - DISCARDUNCHANGEDHEARTBEAT: 15m |
Zabbix raw items | Vault: Check mountpoint discovery | - |
DEPENDENT | vault.checkmountpointdiscovery Preprocessing: - PROMETHEUSTOJSON: ⛔️ONFAIL: - JAVASCRIPT: - DISCARDUNCHANGED_HEARTBEAT: |
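Many of the items above pair a PROMETHEUS_PATTERN step with CHANGE_PER_SECOND, i.e. they turn a monotonically increasing counter into a per-second rate. A sketch of that arithmetic on two hypothetical samples:

```python
# CHANGE_PER_SECOND semantics: (value_now - value_prev) / (time_now - time_prev).
# Samples are hypothetical (timestamp, value) readings of a counter such as
# an audit log request count.
def change_per_second(prev: tuple[float, float], now: tuple[float, float]) -> float:
    (t0, v0), (t1, v1) = prev, now
    return (v1 - v0) / (t1 - t0)

print(change_per_second((1000.0, 4200.0), (1060.0, 4500.0)))  # 5.0 per second
```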
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Vault: Vault server is sealed | https://www.vaultproject.io/docs/concepts/seal |
last(/HashiCorp Vault by HTTP/vault.health.sealed)=1 |
AVERAGE | |
Vault: Version has changed | Vault version has changed. Ack to close. |
last(/HashiCorp Vault by HTTP/vault.health.version,#1)<>last(/HashiCorp Vault by HTTP/vault.health.version,#2) and length(last(/HashiCorp Vault by HTTP/vault.health.version))>0 |
INFO | Manual close: YES |
Vault: Vault server is not responding | - |
last(/HashiCorp Vault by HTTP/vault.health.check)=0 |
HIGH | |
Vault: Failed to get metrics | - |
length(last(/HashiCorp Vault by HTTP/vault.get_metrics.error))>0 |
WARNING | Depends on: - Vault: Vault server is sealed |
Vault: Current number of open files is too high | - |
min(/HashiCorp Vault by HTTP/vault.metrics.process.open.fds,5m)/last(/HashiCorp Vault by HTTP/vault.metrics.process.max.fds)*100>{$VAULT.OPEN.FDS.MAX.WARN} |
WARNING | |
Vault: has been restarted | Uptime is less than 10 minutes. |
last(/HashiCorp Vault by HTTP/vault.metrics.process.uptime)<10m |
INFO | Manual close: YES |
Vault: High frequency of leadership setup failures | There have been more than {$VAULT.LEADERSHIP.SETUP.FAILED.MAX.WARN} Vault leadership setup failures in the past 1h. |
(max(/HashiCorp Vault by HTTP/vault.metrics.core.leadership.setup_failed,1h)-min(/HashiCorp Vault by HTTP/vault.metrics.core.leadership.setup_failed,1h))>{$VAULT.LEADERSHIP.SETUP.FAILED.MAX.WARN} |
AVERAGE | |
Vault: High frequency of leadership losses | There have been more than {$VAULT.LEADERSHIP.LOSSES.MAX.WARN} Vault leadership losses in the past 1h. |
(max(/HashiCorp Vault by HTTP/vault.metrics.core.leadership_lost,1h)-min(/HashiCorp Vault by HTTP/vault.metrics.core.leadership_lost,1h))>{$VAULT.LEADERSHIP.LOSSES.MAX.WARN} |
AVERAGE | |
Vault: High frequency of leadership step downs | There have been more than {$VAULT.LEADERSHIP.STEPDOWNS.MAX.WARN} Vault leadership step downs in the past 1h. |
(max(/HashiCorp Vault by HTTP/vault.metrics.core.step_down,1h)-min(/HashiCorp Vault by HTTP/vault.metrics.core.step_down,1h))>{$VAULT.LEADERSHIP.STEPDOWNS.MAX.WARN} |
AVERAGE | |
Vault: Token [{#TOKEN_NAME}] lookup error occurred | - |
length(last(/HashiCorp Vault by HTTP/vault.token_via_accessor.error["{#ACCESSOR}"]))>0 |
WARNING | Depends on: - Vault: Vault server is sealed |
Vault: Token [{#TOKEN_NAME}] will expire soon | - |
last(/HashiCorp Vault by HTTP/vault.token_via_accessor.has_ttl["{#ACCESSOR}"])=1 and last(/HashiCorp Vault by HTTP/vault.token_via_accessor.ttl["{#ACCESSOR}"])<{$VAULT.TOKEN.TTL.MIN.CRIT} |
AVERAGE | |
Vault: Token [{#TOKEN_NAME}] will expire soon | - |
last(/HashiCorp Vault by HTTP/vault.token_via_accessor.has_ttl["{#ACCESSOR}"])=1 and last(/HashiCorp Vault by HTTP/vault.token_via_accessor.ttl["{#ACCESSOR}"])<{$VAULT.TOKEN.TTL.MIN.WARN} |
WARNING | Depends on: - Vault: Token [{#TOKEN_NAME}] will expire soon |
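The token items and the two expiry triggers above rely on looking tokens up via the accessors listed in {$VAULT.TOKEN.ACCESSORS}. A minimal sketch using Vault's documented /v1/auth/token/lookup-accessor endpoint, with a hypothetical accessor and the default warning/critical thresholds:

```python
# Hypothetical accessor lookup mirroring the token TTL items and triggers;
# assumes the third-party requests library.
import requests

base = "http://vault.example.com:8200"
resp = requests.post(f"{base}/v1/auth/token/lookup-accessor",
                     headers={"X-Vault-Token": "s.xxxxxxxx"},
                     json={"accessor": "hbGdj..."},  # one entry from {$VAULT.TOKEN.ACCESSORS}
                     timeout=10)
resp.raise_for_status()
data = resp.json()["data"]

WARN, CRIT = 7 * 86400, 3 * 86400  # {$VAULT.TOKEN.TTL.MIN.WARN} / .CRIT in seconds
if data.get("expire_time"):        # the token actually has a TTL
    if data["ttl"] < CRIT:
        print("AVERAGE: token will expire soon")
    elif data["ttl"] < WARN:
        print("WARNING: token will expire soon")
```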
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template or ask for help with it at ZABBIX forums.
For Zabbix version: 6.2 and higher. Template for monitoring TrueNAS by SNMP.
This template was tested on:
See Zabbix template operation for basic instructions.
No specific Zabbix configuration is required.
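Everything in this template is read over SNMP, mostly from UCD-SNMP-MIB and FREENAS-MIB. As a quick out-of-band check that the agent answers, here is a minimal sketch assuming the third-party pysnmp library (4.x synchronous API) and an SNMPv2c community of public; the host is a placeholder:

```python
# Polls two UCD-SNMP-MIB values the template also uses:
# laLoad.1 (1-minute load average) and memTotalReal.0 (total real memory, kB).
from pysnmp.hlapi import (SnmpEngine, CommunityData, UdpTransportTarget,
                          ContextData, ObjectType, ObjectIdentity, getCmd)

error, status, index, var_binds = next(getCmd(
    SnmpEngine(),
    CommunityData("public"),
    UdpTransportTarget(("truenas.example.com", 161)),
    ContextData(),
    ObjectType(ObjectIdentity("1.3.6.1.4.1.2021.10.1.3.1")),  # laLoad.1
    ObjectType(ObjectIdentity("1.3.6.1.4.1.2021.4.5.0")),     # memTotalReal.0
))
if error:
    raise RuntimeError(error)
for oid, value in var_binds:
    print(oid.prettyPrint(), "=", value.prettyPrint())
```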
Name | Description | Default | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
{$CPU.UTIL.CRIT} | Threshold of CPU utilization for warning trigger in %. |
90 |
||||||||||||
{$DATASET.FREE.MIN.CRIT} | This macro is used for the trigger expression. It can be overridden on the host or linked template level. |
5G |
||||||||||||
{$DATASET.FREE.MIN.WARN} | This macro is used for the trigger expression. It can be overridden on the host or linked template level. |
5G |
||||||||||||
{$DATASET.NAME.MATCHES} | This macro is used in datasets discovery. Can be overridden on the host or linked template level |
.+ |
||||||||||||
{$DATASET.NAME.NOT_MATCHES} | This macro is used in datasets discovery. Can be overridden on the host or linked template level |
`^(boot|.+\.system(.+)?$)` |
||||||||||||
{$DATASET.PUSED.MAX.CRIT} | Threshold of used dataset space for average severity trigger in %. |
90 |
||||||||||||
{$DATASET.PUSED.MAX.WARN} | Threshold of used dataset space for warning trigger in %. |
80 |
||||||||||||
{$ICMPLOSSWARN} | Threshold of ICMP packets loss for warning trigger in %. |
20 |
||||||||||||
{$ICMPRESPONSETIME_WARN} | Threshold of average ICMP response time for warning trigger in seconds. |
0.15 |
||||||||||||
{$IF.ERRORS.WARN} | Threshold of error packets rate for warning trigger. Can be used with interface name as context. |
2 |
||||||||||||
{$IF.UTIL.MAX} | Threshold of interface bandwidth utilization for warning trigger in %. Can be used with interface name as context. |
90 |
||||||||||||
{$IFCONTROL} | Macro for operational state of the interface for link down trigger. Can be used with interface name as context. |
1 |
||||||||||||
{$LOADAVGPER_CPU.MAX.WARN} | Load per CPU considered sustainable. Tune if needed. |
1.5 |
||||||||||||
{$MEMORY.AVAILABLE.MIN} | Threshold of available memory for trigger in bytes. |
20M |
||||||||||||
{$MEMORY.UTIL.MAX} | Threshold of memory utilization for trigger in % |
90 |
||||||||||||
{$NET.IF.IFADMINSTATUS.MATCHES} | This macro is used in filters of network interfaces discovery rule. |
^.* |
||||||||||||
{$NET.IF.IFADMINSTATUS.NOT_MATCHES} | Ignore down(2) administrative status |
^2$ |
||||||||||||
{$NET.IF.IFALIAS.MATCHES} | This macro is used in filters of network interfaces discovery rule. |
.* |
||||||||||||
{$NET.IF.IFALIAS.NOT_MATCHES} | This macro is used in filters of network interfaces discovery rule. |
CHANGE_IF_NEEDED |
||||||||||||
{$NET.IF.IFDESCR.MATCHES} | This macro is used in filters of network interfaces discovery rule. |
.* |
||||||||||||
{$NET.IF.IFDESCR.NOT_MATCHES} | This macro is used in filters of network interfaces discovery rule. |
CHANGE_IF_NEEDED |
||||||||||||
{$NET.IF.IFNAME.NOT_MATCHES} | This macro is used in filters of network interfaces discovery rule. |
CHANGE_IF_NEEDED |
||||||||||||
{$NET.IF.IFOPERSTATUS.MATCHES} | This macro is used in filters of network interfaces discovery rule. |
^.*$ |
||||||||||||
{$NET.IF.IFOPERSTATUS.NOT_MATCHES} | Ignore notPresent(6) |
^6$ |
||||||||||||
{$NET.IF.IFTYPE.MATCHES} | This macro is used in filters of network interfaces discovery rule. |
.* |
||||||||||||
{$NET.IF.IFTYPE.NOT_MATCHES} | This macro is used in filters of network interfaces discovery rule. |
CHANGE_IF_NEEDED |
||||||||||||
{$SNMP.TIMEOUT} | The time interval for SNMP availability trigger. |
5m |
||||||||||||
{$SWAP.PFREE.MIN.WARN} | Threshold of free swap space for warning trigger in %. |
50 |
||||||||||||
{$TEMPERATURE.MAX.CRIT} | This macro is used for the trigger expression. It can be overridden on the host or linked template level. |
65 |
||||||||||||
{$TEMPERATURE.MAX.WARN} | This macro is used for the trigger expression. It can be overridden on the host or linked template level. |
50 |
||||||||||||
{$VFS.DEV.DEVNAME.MATCHES} | This macro is used in block devices discovery. Can be overridden on the host or linked template level |
.+ |
||||||||||||
{$VFS.DEV.DEVNAME.NOT_MATCHES} | This macro is used in block devices discovery. Can be overridden on the host or linked template level |
`^(loop[0-9]*|sd[a-z][0-9]+|nbd[0-9]+|sr[0-9]+|fd[0-9]+|dm-[0-9]+|ram[0-9]+|ploop[a-z0-9]+|md[0-9]*|hcp[0-9]*|cd[0-9]*|pass[0-9]*|zram[0-9]*)`
{$ZPOOL.FREE.MIN.CRIT} | This macro is used for the trigger expression. It can be overridden on the host or linked template level. |
5G |
||||||||||||
{$ZPOOL.FREE.MIN.WARN} | This macro is used for the trigger expression. It can be overridden on the host or linked template level. |
5G |
||||||||||||
{$ZPOOL.PUSED.MAX.CRIT} | Threshold of used pool space for average severity trigger in %. |
90 |
||||||||||||
{$ZPOOL.PUSED.MAX.WARN} | Threshold of used pool space for warning trigger in %. |
80 |
There are no template links in this template.
Name | Description | Type | Key and additional info |
---|---|---|---|
Block devices discovery | Block devices are discovered from UCD-DISKIO-MIB::diskIOTable (http://net-snmp.sourceforge.net/docs/mibs/ucdDiskIOMIB.html#diskIOTable). |
SNMP | vfs.dev.discovery Filter: AND- {#DEVNAME} MATCHESREGEX - {#DEVNAME} NOTMATCHES_REGEX |
CPU discovery | This discovery creates a set of per-core CPU metrics from UCD-SNMP-MIB, using {#CPU.COUNT} in preprocessing; that is the only reason LLD is used. |
DEPENDENT | cpu.discovery Preprocessing: - JAVASCRIPT: |
Disks temperature discovery | Disks temperature discovery from FREENAS-MIB. |
SNMP | truenas.disk.temp.discovery Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
Network interfaces discovery | Discovering interfaces from IF-MIB. |
SNMP | net.if.discovery Filter: AND- {#IFADMINSTATUS} MATCHESREGEX - {#IFADMINSTATUS} NOTMATCHESREGEX - {#IFOPERSTATUS} MATCHESREGEX - {#IFOPERSTATUS} NOTMATCHESREGEX - {#IFNAME} MATCHESREGEX - {#IFNAME} NOTMATCHESREGEX - {#IFDESCR} MATCHESREGEX - {#IFDESCR} NOTMATCHESREGEX - {#IFALIAS} MATCHESREGEX - {#IFALIAS} NOTMATCHESREGEX - {#IFTYPE} MATCHESREGEX - {#IFTYPE} NOTMATCHESREGEX |
ZFS datasets discovery | ZFS datasets discovery from FREENAS-MIB. |
SNMP | truenas.zfs.dataset.discovery Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: Filter: AND- {#DATASETNAME} MATCHESREGEX - {#DATASETNAME} NOTMATCHES_REGEX |
ZFS pools discovery | ZFS pools discovery from FREENAS-MIB. |
SNMP | truenas.zfs.pools.discovery Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
ZFS volumes discovery | ZFS volumes discovery from FREENAS-MIB. |
SNMP | truenas.zfs.zvols.discovery Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
CPU | TrueNAS: Interrupts per second | MIB: UCD-SNMP-MIB Number of interrupts processed. |
SNMP | system.cpu.intr Preprocessing: - CHANGEPERSECOND |
CPU | TrueNAS: Context switches per second | MIB: UCD-SNMP-MIB Number of context switches. |
SNMP | system.cpu.switches Preprocessing: - CHANGEPERSECOND |
CPU | TrueNAS: Load average (1m avg) | MIB: UCD-SNMP-MIB The 1-minute load average. |
SNMP | system.cpu.load.avg1 |
CPU | TrueNAS: Load average (5m avg) | MIB: UCD-SNMP-MIB The 5-minute load average. |
SNMP | system.cpu.load.avg5 |
CPU | TrueNAS: Load average (15m avg) | MIB: UCD-SNMP-MIB The 15-minute load average. |
SNMP | system.cpu.load.avg15 |
CPU | TrueNAS: Number of CPUs | MIB: HOST-RESOURCES-MIB The number of CPU cores, counted from the cores discovered in hrProcessorTable using LLD. |
SNMP | system.cpu.num Preprocessing: - JAVASCRIPT: |
CPU | TrueNAS: CPU idle time | MIB: UCD-SNMP-MIB The time the CPU has spent doing nothing. |
SNMP | system.cpu.idle[{#SNMPINDEX}] |
CPU | TrueNAS: CPU system time | MIB: UCD-SNMP-MIB The time the CPU has spent running the kernel and its processes. |
SNMP | system.cpu.system[{#SNMPINDEX}] Preprocessing: - CHANGEPERSECOND - JAVASCRIPT: |
CPU | TrueNAS: CPU user time | MIB: UCD-SNMP-MIB The time the CPU has spent running users' processes that are not niced. |
SNMP | system.cpu.user[{#SNMPINDEX}] Preprocessing: - CHANGEPERSECOND - JAVASCRIPT: |
CPU | TrueNAS: CPU nice time | MIB: UCD-SNMP-MIB The time the CPU has spent running users' processes that have been niced. |
SNMP | system.cpu.nice[{#SNMPINDEX}] Preprocessing: - CHANGEPERSECOND - JAVASCRIPT: |
CPU | TrueNAS: CPU iowait time | MIB: UCD-SNMP-MIB The amount of time the CPU has been waiting for I/O to complete. |
SNMP | system.cpu.iowait[{#SNMPINDEX}] Preprocessing: - CHANGEPERSECOND - JAVASCRIPT: |
CPU | TrueNAS: CPU interrupt time | MIB: UCD-SNMP-MIB The amount of time the CPU has been servicing hardware interrupts. |
SNMP | system.cpu.interrupt[{#SNMPINDEX}] Preprocessing: - CHANGEPERSECOND - JAVASCRIPT: |
CPU | TrueNAS: CPU utilization | The CPU utilization expressed in %. |
DEPENDENT | system.cpu.util[{#SNMPINDEX}] Preprocessing: - JAVASCRIPT: |
General | TrueNAS: System contact details | MIB: SNMPv2-MIB The textual identification of the contact person for this managed node, together with information on how to contact this person. If no contact information is known, the value is the zero-length string. |
SNMP | system.contact Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
General | TrueNAS: System description | MIB: SNMPv2-MIB System description of the host. |
SNMP | system.descr Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
General | TrueNAS: System location | MIB: SNMPv2-MIB The physical location of this node. If the location is unknown, the value is the zero-length string. |
SNMP | system.location Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
General | TrueNAS: System name | MIB: SNMPv2-MIB The host name of the system. |
SNMP | system.name Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
General | TrueNAS: System object ID | MIB: SNMPv2-MIB The vendor's authoritative identification of the network management subsystem contained in the entity. This value is allocated within the SMI enterprises subtree (1.3.6.1.4.1) and provides an easy and unambiguous means for determining what kind of box is being managed. |
SNMP | system.objectid Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
Memory | TrueNAS: Free memory | MIB: UCD-SNMP-MIB The amount of real/physical memory currently unused or available. |
SNMP | vm.memory.free Preprocessing: - MULTIPLIER: |
Memory | TrueNAS: Memory (buffers) | MIB: UCD-SNMP-MIB The total amount of real or virtual memory currently allocated for use as memory buffers. |
SNMP | vm.memory.buffers Preprocessing: - MULTIPLIER: |
Memory | TrueNAS: Memory (cached) | MIB: UCD-SNMP-MIB The total amount of real or virtual memory currently allocated for use as cached memory. |
SNMP | vm.memory.cached Preprocessing: - MULTIPLIER: |
Memory | TrueNAS: Total memory | MIB: UCD-SNMP-MIB The total memory expressed in Bytes. |
SNMP | vm.memory.total Preprocessing: - MULTIPLIER: |
Memory | TrueNAS: Available memory | Please note that memory utilization is a rough estimate, since memory available is calculated as free+buffers+cached, which is not 100% accurate, but the best we can get using SNMP. |
CALCULATED | vm.memory.available Expression: last(//vm.memory.free)+last(//vm.memory.buffers)+last(//vm.memory.cached) |
Memory | TrueNAS: Memory utilization | Please note that memory utilization is a rough estimate, since memory available is calculated as free+buffers+cached, which is not 100% accurate, but the best we can get using SNMP. |
CALCULATED | vm.memory.util Expression: (last(//vm.memory.total)-(last(//vm.memory.free)+last(//vm.memory.buffers)+last(//vm.memory.cached)))/last(//vm.memory.total)*100 |
Memory | TrueNAS: Total swap space | MIB: UCD-SNMP-MIB The total amount of swap space configured for this host. |
SNMP | system.swap.total Preprocessing: - MULTIPLIER: |
Memory | TrueNAS: Free swap space | MIB: UCD-SNMP-MIB The amount of swap space currently unused or available. |
SNMP | system.swap.free Preprocessing: - MULTIPLIER: |
Memory | TrueNAS: Free swap space in % | The free space of the swap volume/file expressed in %. |
CALCULATED | system.swap.pfree Expression: last(//system.swap.free)/last(//system.swap.total)*100 |
Network interfaces | TrueNAS: Interface [{#IFNAME}({#IFALIAS})]: Inbound packets discarded | MIB: IF-MIB The number of inbound packets which were chosen to be discarded even though no errors had been detected to prevent their being deliverable to a higher-layer protocol. One possible reason for discarding such a packet could be to free up buffer space. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP | net.if.in.discards[{#SNMPINDEX}] Preprocessing: - CHANGE_PER_SECOND |
Network interfaces | TrueNAS: Interface [{#IFNAME}({#IFALIAS})]: Inbound packets with errors | MIB: IF-MIB For packet-oriented interfaces, the number of inbound packets that contained errors preventing them from being deliverable to a higher-layer protocol. For character-oriented or fixed-length interfaces, the number of inbound transmission units that contained errors preventing them from being deliverable to a higher-layer protocol. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP | net.if.in.errors[{#SNMPINDEX}] Preprocessing: - CHANGE_PER_SECOND |
Network interfaces | TrueNAS: Interface [{#IFNAME}({#IFALIAS})]: Bits received | MIB: IF-MIB The total number of octets received on the interface, including framing characters. This object is a 64-bit version of ifInOctets. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP | net.if.in[{#SNMPINDEX}] Preprocessing: - CHANGE_PER_SECOND |
Network interfaces | TrueNAS: Interface [{#IFNAME}({#IFALIAS})]: Outbound packets discarded | MIB: IF-MIB The number of outbound packets which were chosen to be discarded even though no errors had been detected to prevent their being deliverable to a higher-layer protocol. One possible reason for discarding such a packet could be to free up buffer space. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP | net.if.out.discards[{#SNMPINDEX}] Preprocessing: - CHANGE_PER_SECOND |
Network interfaces | TrueNAS: Interface [{#IFNAME}({#IFALIAS})]: Outbound packets with errors | MIB: IF-MIB For packet-oriented interfaces, the number of outbound packets that contained errors preventing them from being deliverable to a higher-layer protocol. For character-oriented or fixed-length interfaces, the number of outbound transmission units that contained errors preventing them from being deliverable to a higher-layer protocol. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP | net.if.out.errors[{#SNMPINDEX}] Preprocessing: - CHANGE_PER_SECOND |
Network interfaces | TrueNAS: Interface [{#IFNAME}({#IFALIAS})]: Bits sent | MIB: IF-MIB The total number of octets transmitted out of the interface, including framing characters. This object is a 64-bit version of ifOutOctets. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP | net.if.out[{#SNMPINDEX}] Preprocessing: - CHANGE_PER_SECOND |
Network interfaces | TrueNAS: Interface [{#IFNAME}({#IFALIAS})]: Speed | MIB: IF-MIB An estimate of the interface's current bandwidth in units of 1,000,000 bits per second. If this object reports a value of n, then the speed of the interface is somewhere in the range of n-500,000 to n+499,999. For interfaces which do not vary in bandwidth or for those where no accurate estimation can be made, this object should contain the nominal bandwidth. |
SNMP | net.if.speed[{#SNMPINDEX}] Preprocessing: - MULTIPLIER: - DISCARDUNCHANGEDHEARTBEAT: |
Network interfaces | TrueNAS: Interface [{#IFNAME}({#IFALIAS})]: Operational status | MIB: IF-MIB The current operational state of the interface. - The testing(3) state indicates that no operational packets can be passed - If ifAdminStatus is down(2) then ifOperStatus should be down(2) - If ifAdminStatus is changed to up(1) then ifOperStatus should change to up(1) if the interface is ready to transmit and receive network traffic - It should change to dormant(5) if the interface is waiting for external actions (such as a serial line waiting for an incoming connection) - It should remain in the down(2) state if and only if there is a fault that prevents it from going to the up(1) state - It should remain in the notPresent(6) state if the interface has missing (typically, hardware) components. |
SNMP | net.if.status[{#SNMPINDEX}] Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
Network interfaces | TrueNAS: Interface [{#IFNAME}({#IFALIAS})]: Interface type | MIB: IF-MIB The type of interface. Additional values for ifType are assigned by the Internet Assigned Numbers Authority (IANA), through updating the syntax of the IANAifType textual convention. |
SNMP | net.if.type[{#SNMPINDEX}] Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
Status | TrueNAS: ICMP ping | Host accessibility by ICMP. 0 - ICMP ping fails. 1 - ICMP ping successful. |
SIMPLE | icmpping |
Status | TrueNAS: ICMP loss | Percentage of lost packets. |
SIMPLE | icmppingloss |
Status | TrueNAS: ICMP response time | ICMP ping response time (in seconds). |
SIMPLE | icmppingsec |
Status | TrueNAS: Uptime | MIB: SNMPv2-MIB The system uptime expressed in the following format: 'N days, hh:mm:ss'. |
SNMP | system.uptime Preprocessing: - MULTIPLIER: |
Status | TrueNAS: SNMP agent availability | Availability of SNMP checks on the host. The value of this item corresponds to availability icons in the host list. Possible values: 0 - not available, 1 - available, 2 - unknown. |
INTERNAL | zabbix[host,snmp,available] |
Storage | TrueNAS: [{#DEVNAME}]: Disk read rate | MIB: UCD-DISKIO-MIB The number of read accesses from this device since boot. |
SNMP | vfs.dev.read.rate[{#SNMPINDEX}] Preprocessing: - CHANGEPERSECOND |
Storage | TrueNAS: [{#DEVNAME}]: Disk write rate | MIB: UCD-DISKIO-MIB The number of write accesses from this device since boot. |
SNMP | vfs.dev.write.rate[{#SNMPINDEX}] Preprocessing: - CHANGEPERSECOND |
Storage | TrueNAS: [{#DEVNAME}]: Disk utilization | MIB: UCD-DISKIO-MIB The 1-minute average disk load (%). |
SNMP | vfs.dev.util[{#SNMPINDEX}] |
TrueNAS | TrueNAS: ARC size | MIB: FREENAS-MIB ARC size in bytes. |
SNMP | truenas.zfs.arc.size Preprocessing: - MULTIPLIER: - DISCARDUNCHANGEDHEARTBEAT: |
TrueNAS | TrueNAS: ARC metadata size | MIB: FREENAS-MIB ARC metadata size used in bytes. |
SNMP | truenas.zfs.arc.meta Preprocessing: - MULTIPLIER: |
TrueNAS | TrueNAS: ARC data size | MIB: FREENAS-MIB ARC data size used in bytes. |
SNMP | truenas.zfs.arc.data Preprocessing: - MULTIPLIER: |
TrueNAS | TrueNAS: ARC hits | MIB: FREENAS-MIB Total number of ARC cache hits per second. |
SNMP | truenas.zfs.arc.hits Preprocessing: - CHANGE_PER_SECOND |
TrueNAS | TrueNAS: ARC misses | MIB: FREENAS-MIB Total number of ARC cache misses per second. |
SNMP | truenas.zfs.arc.misses Preprocessing: - CHANGE_PER_SECOND |
TrueNAS | TrueNAS: ARC target size of cache | MIB: FREENAS-MIB ARC target size of cache in bytes. |
SNMP | truenas.zfs.arc.c Preprocessing: - MULTIPLIER: - DISCARDUNCHANGEDHEARTBEAT: |
TrueNAS | TrueNAS: ARC target size of MRU | MIB: FREENAS-MIB ARC target size of MRU in bytes. |
SNMP | truenas.zfs.arc.p Preprocessing: - MULTIPLIER: - DISCARDUNCHANGEDHEARTBEAT: |
TrueNAS | TrueNAS: ARC cache hit ratio | MIB: FREENAS-MIB ARC cache hit ratio percentage. |
SNMP | truenas.zfs.arc.hit.ratio |
TrueNAS | TrueNAS: ARC cache miss ratio | MIB: FREENAS-MIB ARC cache miss ratio percentage. |
SNMP | truenas.zfs.arc.miss.ratio |
TrueNAS | TrueNAS: L2ARC hits | MIB: FREENAS-MIB Hits to the L2 cache per second. |
SNMP | truenas.zfs.l2arc.hits Preprocessing: - CHANGEPERSECOND |
TrueNAS | TrueNAS: L2ARC misses | MIB: FREENAS-MIB Misses to the L2 cache per second. |
SNMP | truenas.zfs.l2arc.misses Preprocessing: - CHANGEPERSECOND |
TrueNAS | TrueNAS: L2ARC read rate | MIB: FREENAS-MIB Read rate from L2 cache in bytes per second. |
SNMP | truenas.zfs.l2arc.read Preprocessing: - CHANGEPERSECOND |
TrueNAS | TrueNAS: L2ARC write rate | MIB: FREENAS-MIB Write rate from L2 cache in bytes per second. |
SNMP | truenas.zfs.l2arc.write Preprocessing: - CHANGEPERSECOND |
TrueNAS | TrueNAS: L2ARC size | MIB: FREENAS-MIB L2ARC size in bytes. |
SNMP | truenas.zfs.l2arc.size Preprocessing: - MULTIPLIER: - DISCARDUNCHANGEDHEARTBEAT: |
TrueNAS | TrueNAS: ZIL operations 1 second | MIB: FREENAS-MIB The ops column parsed from the command zilstat 1 1. |
SNMP | truenas.zfs.zil.ops1 |
TrueNAS | TrueNAS: ZIL operations 5 seconds | MIB: FREENAS-MIB The ops column parsed from the command zilstat 5 1. |
SNMP | truenas.zfs.zil.ops5 |
TrueNAS | TrueNAS: ZIL operations 10 seconds | MIB: FREENAS-MIB The ops column parsed from the command zilstat 10 1. |
SNMP | truenas.zfs.zil.ops10 |
TrueNAS | TrueNAS: Pool [{#POOLNAME}]: Total space | MIB: FREENAS-MIB The size of the storage pool in bytes. |
SNMP | truenas.zpool.size.total[{#POOLNAME}] Preprocessing: - MULTIPLIER: - DISCARDUNCHANGEDHEARTBEAT: |
TrueNAS | TrueNAS: Pool [{#POOLNAME}]: Used space | MIB: FREENAS-MIB The used size of the storage pool in bytes. |
SNMP | truenas.zpool.used[{#POOLNAME}] Preprocessing: - MULTIPLIER: |
TrueNAS | TrueNAS: Pool [{#POOLNAME}]: Available space | MIB: FREENAS-MIB The available size of the storage pool in bytes. |
SNMP | truenas.zpool.avail[{#POOLNAME}] Preprocessing: - MULTIPLIER: |
TrueNAS | TrueNAS: Pool [{#POOLNAME}]: Usage in % | The used size of the storage pool in %. |
CALCULATED | truenas.zpool.pused[{#POOLNAME}] Expression: last(//truenas.zpool.used[{#POOLNAME}]) * 100 / last(//truenas.zpool.size.total[{#POOLNAME}]) |
TrueNAS | TrueNAS: Pool [{#POOLNAME}]: Health | MIB: FREENAS-MIB The current health of the containing pool, as reported by zpool status. |
SNMP | truenas.zpool.health[{#POOLNAME}] Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
TrueNAS | TrueNAS: Pool [{#POOLNAME}]: Read operations rate | MIB: FREENAS-MIB The number of read I/O operations sent to the pool or device, including metadata requests (averaged since system booted). |
SNMP | truenas.zpool.read.ops[{#POOLNAME}] Preprocessing: - CHANGEPERSECOND |
TrueNAS | TrueNAS: Pool [{#POOLNAME}]: Write operations rate | MIB: FREENAS-MIB The number of write I/O operations sent to the pool or device (averaged since system booted). |
SNMP | truenas.zpool.write.ops[{#POOLNAME}] Preprocessing: - CHANGEPERSECOND |
TrueNAS | TrueNAS: Pool [{#POOLNAME}]: Read rate | MIB: FREENAS-MIB The bandwidth of all read operations (including metadata), expressed as units per second (averaged since system booted). |
SNMP | truenas.zpool.read.bytes[{#POOLNAME}] Preprocessing: - MULTIPLIER: - CHANGEPERSECOND |
TrueNAS | TrueNAS: Pool [{#POOLNAME}]: Write rate | MIB: FREENAS-MIB The bandwidth of all write operations, expressed as units per second (averaged since system booted). |
SNMP | truenas.zpool.write.bytes[{#POOLNAME}] Preprocessing: - MULTIPLIER: - CHANGEPERSECOND |
TrueNAS | TrueNAS: Dataset [{#DATASET_NAME}]: Total space | MIB: FREENAS-MIB The size of the dataset in bytes. |
SNMP | truenas.dataset.size.total[{#DATASET_NAME}] Preprocessing: - MULTIPLIER: - DISCARD_UNCHANGED_HEARTBEAT: 1h |
TrueNAS | TrueNAS: Dataset [{#DATASET_NAME}]: Used space | MIB: FREENAS-MIB The used size of the dataset in bytes. |
SNMP | truenas.dataset.used[{#DATASET_NAME}] Preprocessing: - MULTIPLIER: |
TrueNAS | TrueNAS: Dataset [{#DATASET_NAME}]: Available space | MIB: FREENAS-MIB The available size of the dataset in bytes. |
SNMP | truenas.dataset.avail[{#DATASET_NAME}] Preprocessing: - MULTIPLIER: |
TrueNAS | TrueNAS: Dataset [{#DATASET_NAME}]: Usage in % | The used size of the dataset in %. |
CALCULATED | truenas.dataset.pused[{#DATASET_NAME}] Expression: last(//truenas.dataset.used[{#DATASET_NAME}]) * 100 / last(//truenas.dataset.size.total[{#DATASET_NAME}]) |
TrueNAS | TrueNAS: ZFS volume [{#ZVOL_NAME}]: Total space | MIB: FREENAS-MIB The size of the ZFS volume in bytes. |
SNMP | truenas.zvol.size.total[{#ZVOL_NAME}] Preprocessing: - MULTIPLIER: - DISCARD_UNCHANGED_HEARTBEAT: 1h |
TrueNAS | TrueNAS: ZFS volume [{#ZVOL_NAME}]: Used space | MIB: FREENAS-MIB The used size of the ZFS volume in bytes. |
SNMP | truenas.zvol.used[{#ZVOL_NAME}] Preprocessing: - MULTIPLIER: |
TrueNAS | TrueNAS: ZFS volume [{#ZVOL_NAME}]: Available space | MIB: FREENAS-MIB The available size of the ZFS volume in bytes. |
SNMP | truenas.zvol.avail[{#ZVOL_NAME}] Preprocessing: - MULTIPLIER: |
TrueNAS | TrueNAS: Disk [{#DISK_NAME}]: Temperature | MIB: FREENAS-MIB The temperature of this HDD in m°C (thousandths of a degree Celsius). |
SNMP | truenas.disk.temp[{#DISK_NAME}] Preprocessing: - MULTIPLIER: - DISCARD_UNCHANGED_HEARTBEAT: 1h |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
TrueNAS: Load average is too high | Per CPU load average is too high. Your system may be slow to respond. |
min(/TrueNAS by SNMP/system.cpu.load.avg1,5m)/last(/TrueNAS by SNMP/system.cpu.num)>{$LOAD_AVG_PER_CPU.MAX.WARN} and last(/TrueNAS by SNMP/system.cpu.load.avg5)>0 and last(/TrueNAS by SNMP/system.cpu.load.avg15)>0 |
AVERAGE | |
TrueNAS: High CPU utilization | The CPU utilization is too high. The system might be slow to respond. |
min(/TrueNAS by SNMP/system.cpu.util[{#SNMPINDEX}],5m)>{$CPU.UTIL.CRIT} |
WARNING | Depends on: - TrueNAS: Load average is too high |
TrueNAS: System name has changed | The name of the system has changed. Ack to close the problem manually. |
last(/TrueNAS by SNMP/system.name,#1)<>last(/TrueNAS by SNMP/system.name,#2) and length(last(/TrueNAS by SNMP/system.name))>0 |
INFO | Manual close: YES |
TrueNAS: Lack of available memory | The system is running out of memory. |
min(/TrueNAS by SNMP/vm.memory.available,5m)<{$MEMORY.AVAILABLE.MIN} and last(/TrueNAS by SNMP/vm.memory.total)>0 |
AVERAGE | |
TrueNAS: High memory utilization | The system is running out of free memory. |
min(/TrueNAS by SNMP/vm.memory.util,5m)>{$MEMORY.UTIL.MAX} |
AVERAGE | Depends on: - TrueNAS: Lack of available memory |
TrueNAS: High swap space usage | This trigger is ignored, if there is no swap configured. |
min(/TrueNAS by SNMP/system.swap.pfree,5m)<{$SWAP.PFREE.MIN.WARN} and last(/TrueNAS by SNMP/system.swap.total)>0 |
WARNING | Depends on: - TrueNAS: High memory utilization - TrueNAS: Lack of available memory |
TrueNAS: Interface [{#IFNAME}({#IFALIAS})]: High input error rate | Recovers when below 80% of {$IF.ERRORS.WARN:"{#IFNAME}"} threshold. |
min(/TrueNAS by SNMP/net.if.in.errors[{#SNMPINDEX}],5m)>{$IF.ERRORS.WARN:"{#IFNAME}"} Recovery expression: max(/TrueNAS by SNMP/net.if.in.errors[{#SNMPINDEX}],5m)<{$IF.ERRORS.WARN:"{#IFNAME}"}*0.8 |
WARNING | Depends on: - TrueNAS: Interface [{#IFNAME}({#IFALIAS})]: Link down |
TrueNAS: Interface [{#IFNAME}({#IFALIAS})]: High inbound bandwidth usage | The network interface utilization is close to its estimated maximum bandwidth. |
(avg(/TrueNAS by SNMP/net.if.in[{#SNMPINDEX}],15m)>({$IF.UTIL.MAX:"{#IFNAME}"}/100)*last(/TrueNAS by SNMP/net.if.speed[{#SNMPINDEX}])) and last(/TrueNAS by SNMP/net.if.speed[{#SNMPINDEX}])>0 Recovery expression: avg(/TrueNAS by SNMP/net.if.in[{#SNMPINDEX}],15m)<(({$IF.UTIL.MAX:"{#IFNAME}"}-3)/100)*last(/TrueNAS by SNMP/net.if.speed[{#SNMPINDEX}]) |
WARNING | Depends on: - TrueNAS: Interface [{#IFNAME}({#IFALIAS})]: Link down |
TrueNAS: Interface [{#IFNAME}({#IFALIAS})]: High output error rate | Recovers when below 80% of {$IF.ERRORS.WARN:"{#IFNAME}"} threshold. |
min(/TrueNAS by SNMP/net.if.out.errors[{#SNMPINDEX}],5m)>{$IF.ERRORS.WARN:"{#IFNAME}"} Recovery expression: max(/TrueNAS by SNMP/net.if.out.errors[{#SNMPINDEX}],5m)<{$IF.ERRORS.WARN:"{#IFNAME}"}*0.8 |
WARNING | Depends on: - TrueNAS: Interface [{#IFNAME}({#IFALIAS})]: Link down |
TrueNAS: Interface [{#IFNAME}({#IFALIAS})]: High outbound bandwidth usage | The network interface utilization is close to its estimated maximum bandwidth. |
(avg(/TrueNAS by SNMP/net.if.out[{#SNMPINDEX}],15m)>({$IF.UTIL.MAX:"{#IFNAME}"}/100)*last(/TrueNAS by SNMP/net.if.speed[{#SNMPINDEX}])) and last(/TrueNAS by SNMP/net.if.speed[{#SNMPINDEX}])>0 Recovery expression: avg(/TrueNAS by SNMP/net.if.out[{#SNMPINDEX}],15m)<(({$IF.UTIL.MAX:"{#IFNAME}"}-3)/100)*last(/TrueNAS by SNMP/net.if.speed[{#SNMPINDEX}]) |
WARNING | Depends on: - TrueNAS: Interface [{#IFNAME}({#IFALIAS})]: Link down |
TrueNAS: Interface [{#IFNAME}({#IFALIAS})]: Ethernet has changed to lower speed than it was before | This Ethernet connection has transitioned down from its known maximum speed. This might be a sign of autonegotiation issues. Ack to close. |
change(/TrueNAS by SNMP/net.if.speed[{#SNMPINDEX}])<0 and last(/TrueNAS by SNMP/net.if.speed[{#SNMPINDEX}])>0 and ( last(/TrueNAS by SNMP/net.if.type[{#SNMPINDEX}])=6 or last(/TrueNAS by SNMP/net.if.type[{#SNMPINDEX}])=7 or last(/TrueNAS by SNMP/net.if.type[{#SNMPINDEX}])=11 or last(/TrueNAS by SNMP/net.if.type[{#SNMPINDEX}])=62 or last(/TrueNAS by SNMP/net.if.type[{#SNMPINDEX}])=69 or last(/TrueNAS by SNMP/net.if.type[{#SNMPINDEX}])=117 ) and (last(/TrueNAS by SNMP/net.if.status[{#SNMPINDEX}])<>2) Recovery expression: (change(/TrueNAS by SNMP/net.if.speed[{#SNMPINDEX}])>0 and last(/TrueNAS by SNMP/net.if.speed[{#SNMPINDEX}],#2)>0) or (last(/TrueNAS by SNMP/net.if.status[{#SNMPINDEX}])=2) |
INFO | Depends on: - TrueNAS: Interface [{#IFNAME}({#IFALIAS})]: Link down |
TrueNAS: Interface [{#IFNAME}({#IFALIAS})]: Link down | This trigger expression works as follows: 1. It can be triggered if the operational status is down. 2. {$IFCONTROL:"{#IFNAME}"}=1 - the user can redefine this context macro to 0, which marks the interface as not important; no new trigger will be fired if this interface is down. |
{$IFCONTROL:"{#IFNAME}"}=1 and (last(/TrueNAS by SNMP/net.if.status[{#SNMPINDEX}])=2) |
AVERAGE | |
TrueNAS: Unavailable by ICMP ping | Last three attempts returned timeout. Please check device connectivity. |
max(/TrueNAS by SNMP/icmpping,#3)=0 |
HIGH | |
TrueNAS: High ICMP ping loss | ICMP packets loss detected. |
min(/TrueNAS by SNMP/icmppingloss,5m)>{$ICMP_LOSS_WARN} and min(/TrueNAS by SNMP/icmppingloss,5m)<100 |
WARNING | Depends on: - TrueNAS: Unavailable by ICMP ping |
TrueNAS: High ICMP ping response time | Average ICMP response time is too high. |
avg(/TrueNAS by SNMP/icmppingsec,5m)>{$ICMP_RESPONSE_TIME_WARN} |
WARNING | Depends on: - TrueNAS: Unavailable by ICMP ping |
TrueNAS: Host has been restarted | The host uptime is less than 10 minutes. |
last(/TrueNAS by SNMP/system.uptime)<10m |
INFO | Manual close: YES |
TrueNAS: No SNMP data collection | SNMP is not available for polling. Please check device connectivity and SNMP settings. |
max(/TrueNAS by SNMP/zabbix[host,snmp,available],{$SNMP.TIMEOUT})=0 |
WARNING | Depends on: - TrueNAS: Unavailable by ICMP ping |
TrueNAS: Pool [{#POOLNAME}]: Very high space usage | Two conditions must be met: space utilization is above {$ZPOOL.PUSED.MAX.CRIT:"{#POOLNAME}"}%, and the pool free space is less than {$ZPOOL.FREE.MIN.CRIT:"{#POOLNAME}"}. |
min(/TrueNAS by SNMP/truenas.zpool.pused[{#POOLNAME}],5m) > {$ZPOOL.PUSED.MAX.CRIT:"{#POOLNAME}"} and last(/TrueNAS by SNMP/truenas.zpool.avail[{#POOLNAME}]) < {$ZPOOL.FREE.MIN.CRIT:"{#POOLNAME}"} |
AVERAGE | |
TrueNAS: Pool [{#POOLNAME}]: High space usage | Two conditions must be met: space utilization is above {$ZPOOL.PUSED.MAX.WARN:"{#POOLNAME}"}%, and the pool free space is less than {$ZPOOL.FREE.MIN.WARN:"{#POOLNAME}"}. |
min(/TrueNAS by SNMP/truenas.zpool.pused[{#POOLNAME}],5m) > {$ZPOOL.PUSED.MAX.WARN:"{#POOLNAME}"} and last(/TrueNAS by SNMP/truenas.zpool.avail[{#POOLNAME}]) < {$ZPOOL.FREE.MIN.WARN:"{#POOLNAME}"} |
WARNING | Depends on: - TrueNAS: Pool [{#POOLNAME}]: Very high space usage |
TrueNAS: Pool [{#POOLNAME}]: Status is not online | Please check pool status. |
last(/TrueNAS by SNMP/truenas.zpool.health[{#POOLNAME}]) <> 0 |
AVERAGE | |
TrueNAS: Dataset [{#DATASET_NAME}]: Very high space usage | Two conditions must be met: space utilization is above {$DATASET.PUSED.MAX.CRIT:"{#DATASET_NAME}"}%, and the dataset free space is less than {$DATASET.FREE.MIN.CRIT:"{#POOLNAME}"}. |
min(/TrueNAS by SNMP/truenas.dataset.pused[{#DATASET_NAME}],5m) > {$DATASET.PUSED.MAX.CRIT:"{#DATASET_NAME}"} and last(/TrueNAS by SNMP/truenas.dataset.avail[{#DATASET_NAME}]) < {$DATASET.FREE.MIN.CRIT:"{#POOLNAME}"} |
AVERAGE | |
TrueNAS: Dataset [{#DATASET_NAME}]: High space usage | Two conditions must be met: space utilization is above {$DATASET.PUSED.MAX.WARN:"{#DATASET_NAME}"}%, and the dataset free space is less than {$DATASET.FREE.MIN.WARN:"{#POOLNAME}"}. |
min(/TrueNAS by SNMP/truenas.dataset.pused[{#DATASET_NAME}],5m) > {$DATASET.PUSED.MAX.WARN:"{#DATASET_NAME}"} and last(/TrueNAS by SNMP/truenas.dataset.avail[{#DATASET_NAME}]) < {$DATASET.FREE.MIN.WARN:"{#POOLNAME}"} |
WARNING | Depends on: - TrueNAS: Dataset [{#DATASET_NAME}]: Very high space usage |
TrueNAS: Disk [{#DISK_NAME}]: Average disk temperature is too high | Disk temperature is high. |
avg(/TrueNAS by SNMP/truenas.disk.temp[{#DISK_NAME}],5m) > {$TEMPERATURE.MAX.CRIT:"{#DISK_NAME}"} |
AVERAGE | |
TrueNAS: Disk [{#DISK_NAME}]: Average disk temperature is too high | Disk temperature is high. |
avg(/TrueNAS by SNMP/truenas.disk.temp[{#DISK_NAME}],5m) > {$TEMPERATURE.MAX.WARN:"{#DISK_NAME}"} |
WARNING |
Please report any issues with the template at https://support.zabbix.com.
For Zabbix version: 6.2 and higher
The template to monitor Travis CI by Zabbix that works without any external scripts.
Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
This template was tested on:
See Zabbix template operation for basic instructions.
You must set the {$TRAVIS.API.TOKEN} and {$TRAVIS.API.URL} macros. {$TRAVIS.API.TOKEN} is a Travis API authentication token, located in User -> Settings -> API authentication. {$TRAVIS.API.URL} can take one of two values: api.travis-ci.com (for repositories hosted on travis-ci.com) or api.travis-ci.org (for repositories on the legacy travis-ci.org).
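The token and URL can be verified before linking the template (a minimal sketch; <YOUR_TOKEN> is a placeholder, and the host should match your {$TRAVIS.API.URL} value):
curl -H "Travis-API-Version: 3" -H "Authorization: token <YOUR_TOKEN>" https://api.travis-ci.com/repos
A 200 response with a JSON list of your repositories confirms that the macros are set correctly.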
No specific Zabbix configuration is required.
Name | Description | Default |
---|---|---|
{$TRAVIS.API.TOKEN} | Travis API Token |
`` |
{$TRAVIS.API.URL} | Travis API URL |
api.travis-ci.com |
{$TRAVIS.BUILDS.SUCCESS.PERCENT} | Percent of successful builds in the repo (for trigger expression) |
80 |
There are no template links in this template.
Name | Description | Type | Key and additional info |
---|---|---|---|
Repos metrics discovery | Metrics for Repos statistics. |
DEPENDENT | travis.repos.discovery Preprocessing: - JAVASCRIPT: - DISCARDUNCHANGEDHEARTBEAT: |
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
Travis | Travis: Get health | Getting home JSON using Travis API. |
HTTP_AGENT | travis.get_health Preprocessing: - CHECK_NOT_SUPPORTED ⛔️ON_FAIL: CUSTOM_VALUE -> 0 - JAVASCRIPT: |
Travis | Travis: Jobs passed | Total count of passed jobs in all repos. |
DEPENDENT | travis.jobs.total Preprocessing: - JSONPATH: |
Travis | Travis: Jobs active | Active jobs in all repos. |
DEPENDENT | travis.jobs.active Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
Travis | Travis: Jobs in queue | Jobs in queue in all repos. |
DEPENDENT | travis.jobs.queue Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
Travis | Travis: Builds | Total count of builds in all repos. |
DEPENDENT | travis.builds.total Preprocessing: - JSONPATH: |
Travis | Travis: Builds duration | Sum of all builds durations in all repos. |
DEPENDENT | travis.builds.duration Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
Travis | Travis: Repo [{#SLUG}]: Cache files | Count of cache files in {#SLUG} repo. |
DEPENDENT | travis.repo.caches.files[{#SLUG}] Preprocessing: - JSONPATH: |
Travis | Travis: Repo [{#SLUG}]: Cache size | Total size of cache files in {#SLUG} repo. |
DEPENDENT | travis.repo.caches.size[{#SLUG}] Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
Travis | Travis: Repo [{#SLUG}]: Builds passed | Count of all passed builds in {#SLUG} repo. |
DEPENDENT | travis.repo.builds.passed[{#SLUG}] Preprocessing: - JAVASCRIPT: |
Travis | Travis: Repo [{#SLUG}]: Builds failed | Count of all failed builds in {#SLUG} repo. |
DEPENDENT | travis.repo.builds.failed[{#SLUG}] Preprocessing: - JAVASCRIPT: |
Travis | Travis: Repo [{#SLUG}]: Builds total | Count of total builds in {#SLUG} repo. |
DEPENDENT | travis.repo.builds.total[{#SLUG}] Preprocessing: - JSONPATH: |
Travis | Travis: Repo [{#SLUG}]: Builds passed, % | Percent of passed builds in {#SLUG} repo. |
CALCULATED | travis.repo.builds.passed.pct[{#SLUG}] Expression: last(//travis.repo.builds.passed[{#SLUG}])/last(//travis.repo.builds.total[{#SLUG}])*100 |
Travis | Travis: Repo [{#SLUG}]: Description | Description of Travis repo (git project description). |
DEPENDENT | travis.repo.description[{#SLUG}] Preprocessing: - JSONPATH: - DISCARDUNCHANGEDHEARTBEAT: |
Travis | Travis: Repo [{#SLUG}]: Last build duration | Last build duration in {#SLUG} repo. |
DEPENDENT | travis.repo.last_build.duration[{#SLUG}] Preprocessing: - JSONPATH: ⛔️ON_FAIL: DISCARD_VALUE - DISCARD_UNCHANGED_HEARTBEAT: |
Travis | Travis: Repo [{#SLUG}]: Last build state | Last build state in {#SLUG} repo. |
DEPENDENT | travis.repo.last_build.state[{#SLUG}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 1h |
Travis | Travis: Repo [{#SLUG}]: Last build number | Last build number in {#SLUG} repo. |
DEPENDENT | travis.repo.last_build.number[{#SLUG}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 1h |
Travis | Travis: Repo [{#SLUG}]: Last build id | Last build id in {#SLUG} repo. |
DEPENDENT | travis.repo.last_build.id[{#SLUG}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 1h |
Zabbix raw items | Travis: Get repos | Getting repos using Travis API. |
HTTP_AGENT | travis.get_repos |
Zabbix raw items | Travis: Get builds | Getting builds using Travis API. |
HTTP_AGENT | travis.get_builds |
Zabbix raw items | Travis: Get jobs | Getting jobs using Travis API. |
HTTP_AGENT | travis.get_jobs |
Zabbix raw items | Travis: Repo [{#SLUG}]: Get builds | Getting builds of {#SLUG} using Travis API. |
HTTP_AGENT | travis.repo.get_builds[{#SLUG}] |
Zabbix raw items | Travis: Repo [{#SLUG}]: Get caches | Getting caches of {#SLUG} using Travis API. |
HTTP_AGENT | travis.repo.get_caches[{#SLUG}] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Travis: Service is unavailable | Travis API is unavailable. Please check if the correct macros are set. |
last(/Travis CI by HTTP/travis.get_health)=0 |
HIGH | Manual close: YES |
Travis: Failed to fetch home page | Zabbix has not received data for items for the last 30 minutes. |
nodata(/Travis CI by HTTP/travis.get_health,30m)=1 |
WARNING | Manual close: YES |
Travis: Repo [{#SLUG}]: Percent of successful builds | Low successful builds rate. |
last(/Travis CI by HTTP/travis.repo.builds.passed.pct[{#SLUG}])<{$TRAVIS.BUILDS.SUCCESS.PERCENT} |
WARNING | Manual close: YES |
Travis: Repo [{#SLUG}]: Last build status is 'errored' | Last build status is errored. |
find(/Travis CI by HTTP/travis.repo.last_build.state[{#SLUG}],,"like","errored")=1 |
WARNING | Manual close: YES |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template or ask for help with it at ZABBIX forums.
For Zabbix version: 6.2 and higher
Official JMX Template for Apache Tomcat.
This template was tested on:
See Zabbix template operation for basic instructions.
Metrics are collected by JMX.
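Remote JMX access must be enabled on Tomcat and reachable by the Zabbix Java gateway. A minimal sketch of the standard JVM options (added, for example, to CATALINA_OPTS in bin/setenv.sh; the port is illustrative, and authentication/SSL are disabled here only for brevity):
-Dcom.sun.management.jmxremote
-Dcom.sun.management.jmxremote.port=12345
-Dcom.sun.management.jmxremote.authenticate=false
-Dcom.sun.management.jmxremote.ssl=false
If JMX authentication is enabled instead, set {$TOMCAT.USER} and {$TOMCAT.PASSWORD} to the matching credentials.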
No specific Zabbix configuration is required.
Name | Description | Default |
---|---|---|
{$TOMCAT.LLD.FILTER.MATCHES} | Filter for discoverable objects. Can be used with following contexts: "GlobalRequestProcessor", "ThreadPool", "Manager" |
.* |
{$TOMCAT.LLD.FILTER.NOT_MATCHES} | Filter to exclude discovered objects. Can be used with following contexts: "GlobalRequestProcessor", "ThreadPool", "Manager" |
CHANGE_IF_NEEDED |
{$TOMCAT.PASSWORD} | Password for JMX |
`` |
{$TOMCAT.THREADS.MAX.PCT} | Threshold for busy worker threads trigger. Can be used with {#JMXNAME} as context. |
75 |
{$TOMCAT.THREADS.MAX.TIME} | The time during which the number of busy threads can exceed the threshold. Can be used with {#JMXNAME} as context. |
5m |
{$TOMCAT.USER} | User for JMX |
`` |
There are no template links in this template.
Name | Description | Type | Key and additional info |
---|---|---|---|
Contexts discovery | Discovery for contexts |
JMX | jmx.discovery[beans,"Catalina:type=Manager,host=*,context=*"] Filter: AND- {#JMXHOST} MATCHES_REGEX - {#JMXHOST} NOT_MATCHES_REGEX |
Global request processors discovery | Discovery for GlobalRequestProcessor |
JMX | jmx.discovery[beans,"Catalina:type=GlobalRequestProcessor,name=*"] Filter: AND- {#JMXNAME} MATCHES_REGEX - {#JMXNAME} NOT_MATCHES_REGEX |
Protocol handlers discovery | Discovery for ProtocolHandler |
JMX | jmx.discovery[attributes,"Catalina:type=ProtocolHandler,port=*"] Filter: AND- {#JMXATTR} MATCHES_REGEX |
Thread pools discovery | Discovery for ThreadPool |
JMX | jmx.discovery[beans,"Catalina:type=ThreadPool,name=*"] Filter: AND- {#JMXNAME} MATCHES_REGEX - {#JMXNAME} NOT_MATCHES_REGEX |
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
Tomcat | Tomcat: Version | The version of Tomcat. |
JMX | jmx["Catalina:type=Server",serverInfo] Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
Tomcat | {#JMXNAME}: Bytes received per second | Bytes received rate by processor {#JMXNAME} |
JMX | jmx[{#JMXOBJ},bytesReceived] Preprocessing: - CHANGEPERSECOND |
Tomcat | {#JMXNAME}: Bytes sent per second | Bytes sent rate by processor {#JMXNAME} |
JMX | jmx[{#JMXOBJ},bytesSent] Preprocessing: - CHANGEPERSECOND |
Tomcat | {#JMXNAME}: Errors per second | Error rate of request processor {#JMXNAME} |
JMX | jmx[{#JMXOBJ},errorCount] Preprocessing: - CHANGEPERSECOND |
Tomcat | {#JMXNAME}: Requests per second | Rate of requests served by request processor {#JMXNAME} |
JMX | jmx[{#JMXOBJ},requestCount] Preprocessing: - CHANGEPERSECOND |
Tomcat | {#JMXNAME}: Requests processing time | The total time to process all incoming requests of request processor {#JMXNAME} |
JMX | jmx[{#JMXOBJ},processingTime] Preprocessing: - MULTIPLIER: |
Tomcat | {#JMXVALUE}: Gzip compression status | Gzip compression status on {#JMXNAME}. Enabling gzip compression may save server bandwidth. |
JMX | jmx[{#JMXOBJ},compression] Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
Tomcat | {#JMXNAME}: Threads count | The number of threads the thread pool has right now, both busy and free. |
JMX | jmx[{#JMXOBJ},currentThreadCount] Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
Tomcat | {#JMXNAME}: Threads limit | Limit of the threads count. When the currentThreadsBusy counter reaches the maxThreads limit, no more requests can be handled, and the application chokes. |
JMX | jmx[{#JMXOBJ},maxThreads] Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
Tomcat | {#JMXNAME}: Threads busy | Number of the requests that are being currently handled. |
JMX | jmx[{#JMXOBJ},currentThreadsBusy] |
Tomcat | {#JMXHOST}{#JMXCONTEXT}: Sessions active | Active sessions of the application. |
JMX | jmx[{#JMXOBJ},activeSessions] |
Tomcat | {#JMXHOST}{#JMXCONTEXT}: Sessions active maximum so far | Maximum number of active sessions so far. |
JMX | jmx[{#JMXOBJ},maxActive] |
Tomcat | {#JMXHOST}{#JMXCONTEXT}: Sessions created per second | Rate of sessions created by this application per second. |
JMX | jmx[{#JMXOBJ},sessionCounter] Preprocessing: - CHANGEPERSECOND |
Tomcat | {#JMXHOST}{#JMXCONTEXT}: Sessions rejected per second | Rate of sessions we rejected due to maxActive being reached. |
JMX | jmx[{#JMXOBJ},rejectedSessions] Preprocessing: - CHANGEPERSECOND |
Tomcat | {#JMXHOST}{#JMXCONTEXT}: Sessions allowed maximum | The maximum number of active Sessions allowed, or -1 for no limit. |
JMX | jmx[{#JMXOBJ},maxActiveSessions] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Tomcat: Version has been changed | Tomcat version has changed. Ack to close. |
last(/Apache Tomcat by JMX/jmx["Catalina:type=Server",serverInfo],#1)<>last(/Apache Tomcat by JMX/jmx["Catalina:type=Server",serverInfo],#2) and length(last(/Apache Tomcat by JMX/jmx["Catalina:type=Server",serverInfo]))>0 |
INFO | Manual close: YES |
{#JMXVALUE}: Gzip compression is disabled | gzip compression is disabled for connector {#JMXVALUE}. |
find(/Apache Tomcat by JMX/jmx[{#JMXOBJ},compression],,"like","off") = 1 |
INFO | Manual close: YES |
{#JMXNAME}: Busy worker threads count is high | When the busy threads counter reaches the limit, no more requests can be handled, and the application chokes. |
min(/Apache Tomcat by JMX/jmx[{#JMXOBJ},currentThreadsBusy],{$TOMCAT.THREADS.MAX.TIME:"{#JMXNAME}"})>last(/Apache Tomcat by JMX/jmx[{#JMXOBJ},maxThreads])*{$TOMCAT.THREADS.MAX.PCT:"{#JMXNAME}"}/100 |
HIGH |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template or ask for help with it at ZABBIX forums.
For Zabbix version: 6.2 and higher
Refer to the vendor documentation.
No specific Zabbix configuration is required.
There are no template links in this template.
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
Services | Telnet service is running | - |
SIMPLE | net.tcp.service[telnet] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Telnet service is down on {HOST.NAME} | - |
max(/Telnet Service/net.tcp.service[telnet],#3)=0 |
AVERAGE |
Please report any issues with the template at https://support.zabbix.com
For Zabbix version: 6.2 and higher
The template to monitor systemd units.
Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
Systemd by Zabbix agent 2
— collects metrics by polling zabbix-agent2.
This template was tested on:
See Zabbix template operation for basic instructions.
No specific Zabbix configuration is required.
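The item keys can be tested locally before linking the template (a minimal sketch; sshd.service is just an example unit name):
zabbix_agent2 -t 'systemd.unit.discovery[service]'
zabbix_agent2 -t 'systemd.unit.get["sshd.service"]'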
Name | Description | Default |
---|---|---|
{$SYSTEMD.ACTIVESTATE.SERVICE.MATCHES} | Filter of systemd service units by active state |
active |
{$SYSTEMD.ACTIVESTATE.SERVICE.NOT_MATCHES} | Filter of systemd service units by active state |
CHANGE_IF_NEEDED |
{$SYSTEMD.ACTIVESTATE.SOCKET.MATCHES} | Filter of systemd socket units by active state |
active |
{$SYSTEMD.ACTIVESTATE.SOCKET.NOT_MATCHES} | Filter of systemd socket units by active state |
CHANGE_IF_NEEDED |
{$SYSTEMD.NAME.SERVICE.MATCHES} | Filter of systemd service units by name |
.* |
{$SYSTEMD.NAME.SERVICE.NOT_MATCHES} | Filter of systemd service units by name |
CHANGE_IF_NEEDED |
{$SYSTEMD.NAME.SOCKET.MATCHES} | Filter of systemd socket units by name |
.* |
{$SYSTEMD.NAME.SOCKET.NOT_MATCHES} | Filter of systemd socket units by name |
CHANGE_IF_NEEDED |
{$SYSTEMD.UNITFILESTATE.SERVICE.MATCHES} | Filter of systemd service units by unit file state |
enabled |
{$SYSTEMD.UNITFILESTATE.SERVICE.NOT_MATCHES} | Filter of systemd service units by unit file state |
CHANGE_IF_NEEDED |
{$SYSTEMD.UNITFILESTATE.SOCKET.MATCHES} | Filter of systemd socket units by unit file state |
enabled |
{$SYSTEMD.UNITFILESTATE.SOCKET.NOT_MATCHES} | Filter of systemd socket units by unit file state |
CHANGE_IF_NEEDED |
There are no template links in this template.
Name | Description | Type | Key and additional info |
---|---|---|---|
Service units discovery | Discover systemd service units and their details. |
ZABBIX_PASSIVE | systemd.unit.discovery[service] Filter: AND- {#UNIT.ACTIVESTATE} MATCHES_REGEX - {#UNIT.ACTIVESTATE} NOT_MATCHES_REGEX - {#UNIT.UNITFILESTATE} MATCHES_REGEX - {#UNIT.UNITFILESTATE} NOT_MATCHES_REGEX - {#UNIT.NAME} NOT_MATCHES_REGEX - {#UNIT.NAME} MATCHES_REGEX |
Socket units discovery | Discover systemd socket units and their details. |
ZABBIX_PASSIVE | systemd.unit.discovery[socket] Filter: AND- {#UNIT.ACTIVESTATE} MATCHES_REGEX - {#UNIT.ACTIVESTATE} NOT_MATCHES_REGEX - {#UNIT.UNITFILESTATE} MATCHES_REGEX - {#UNIT.UNITFILESTATE} NOT_MATCHES_REGEX - {#UNIT.NAME} NOT_MATCHES_REGEX - {#UNIT.NAME} MATCHES_REGEX |
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
Systemd | {#UNIT.NAME}: Active state | State value that reflects whether the unit is currently active or not. The following states are currently defined: "active", "reloading", "inactive", "failed", "activating", and "deactivating". |
DEPENDENT | systemd.service.active_state["{#UNIT.NAME}"] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 30m |
Systemd | {#UNIT.NAME}: Load state | State value that reflects whether the configuration file of this unit has been loaded. The following states are currently defined: "loaded", "error", and "masked". |
DEPENDENT | systemd.service.load_state["{#UNIT.NAME}"] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 30m |
Systemd | {#UNIT.NAME}: Unit file state | Encodes the install state of the unit file of FragmentPath. It currently knows the following states: "enabled", "enabled-runtime", "linked", "linked-runtime", "masked", "masked-runtime", "static", "disabled", and "invalid". |
DEPENDENT | systemd.service.unitfile_state["{#UNIT.NAME}"] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 30m |
Systemd | {#UNIT.NAME}: Active time | Number of seconds since unit entered the active state. |
DEPENDENT | systemd.service.uptime["{#UNIT.NAME}"] Preprocessing: - JAVASCRIPT: |
Systemd | {#UNIT.NAME}: Connections accepted per sec | The number of accepted socket connections (NAccepted) per second. |
DEPENDENT | systemd.socket.conn_accepted.rate["{#UNIT.NAME}"] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
Systemd | {#UNIT.NAME}: Connections connected | The current number of socket connections (NConnections). |
DEPENDENT | systemd.socket.conn_count["{#UNIT.NAME}"] Preprocessing: - JSONPATH: |
Zabbix raw items | {#UNIT.NAME}: Get unit info | Returns all properties of a systemd service unit. Unit description: {#UNIT.DESCRIPTION}. |
ZABBIX_PASSIVE | systemd.unit.get["{#UNIT.NAME}"] |
Zabbix raw items | {#UNIT.NAME}: Get unit info | Returns all properties of a systemd socket unit. Unit description: {#UNIT.DESCRIPTION}. |
ZABBIX_PASSIVE | systemd.unit.get["{#UNIT.NAME}",Socket] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
{#UNIT.NAME}: Service is not running | - |
last(/Systemd by Zabbix agent 2/systemd.service.active_state["{#UNIT.NAME}"])<>1 |
WARNING | Manual close: YES |
{#UNIT.NAME}: has been restarted | Uptime is less than 10 minutes. |
last(/Systemd by Zabbix agent 2/systemd.service.uptime["{#UNIT.NAME}"])<10m |
INFO | Manual close: YES |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template or ask for help with it at ZABBIX forums.
For Zabbix version: 6.2 and higher
Refer to the vendor documentation.
No specific Zabbix configuration is required.
There are no template links in this template.
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
Services | SSH service is running | - |
SIMPLE | net.tcp.service[ssh] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
SSH service is down on {HOST.NAME} | - |
max(/SSH Service/net.tcp.service[ssh],#3)=0 |
AVERAGE |
Please report any issues with the template at https://support.zabbix.com
For Zabbix version: 6.2 and higher.
This template was tested on:
See Zabbix template operation for basic instructions.
Enable SNMP support following official documentation. Required parameters in squid.conf:
snmp_port <port_number>
acl <zbx_acl_name> snmp_community <community_name>
snmp_access allow <zbx_acl_name> <zabbix_server_ip>
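For example, a sketch that follows the placeholders above (the ACL names, community, and Zabbix server address are illustrative; a src ACL is used so the snmp_access line references ACL names only):
snmp_port 3401
acl zbx_acl snmp_community public
acl zbx_server src 192.0.2.10/32
snmp_access allow zbx_acl zbx_server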
1. Import the template template_app_squid_snmp.yaml into Zabbix.
2. Set values for {$SQUID.SNMP.COMMUNITY}, {$SQUID.SNMP.PORT} and {$SQUID.HTTP.PORT} as configured in squid.conf.
3. Link the imported template to a host with Squid.
4. Add SNMPv2 interface to Squid host. Set Port as {$SQUID.SNMP.PORT} and SNMP community as {$SQUID.SNMP.COMMUNITY}.
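SNMP connectivity can then be verified from the Zabbix server or proxy (a minimal sketch; the host, port, and community are examples, and .1.3.6.1.4.1.3495 is the Squid enterprise OID subtree):
snmpwalk -v2c -c public 192.0.2.20:3401 .1.3.6.1.4.1.3495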
No specific Zabbix configuration is required.
Name | Description | Default |
---|---|---|
{$SQUID.FILE.DESC.WARN.MIN} | The threshold for minimum number of available file descriptors |
100 |
{$SQUID.HTTP.PORT} | http_port configured in squid.conf (Default: 3128) |
3128 |
{$SQUID.PAGE.FAULT.WARN} | The threshold for sys page faults rate in percent of received HTTP requests |
90 |
{$SQUID.SNMP.COMMUNITY} | SNMP community allowed by ACL in squid.conf |
public |
{$SQUID.SNMP.PORT} | snmp_port configured in squid.conf (Default: 3401) |
3401 |
There are no template links in this template.
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
Squid | Squid: Service ping | - |
SIMPLE | net.tcp.service[tcp,,{$SQUID.HTTP.PORT}] Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
Squid | Squid: Uptime | The uptime of the cache in timeticks (hundredths of a second), converted by preprocessing |
SNMP | squid[cacheUptime] Preprocessing: - MULTIPLIER: |
Squid | Squid: Version | Cache Software Version |
SNMP | squid[cacheVersionId] Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
Squid | Squid: CPU usage | The percentage use of the CPU |
SNMP | squid[cacheCpuUsage] |
Squid | Squid: Memory maximum resident size | Maximum Resident Size |
SNMP | squid[cacheMaxResSize] Preprocessing: - MULTIPLIER: |
Squid | Squid: Memory maximum cache size | The value of the cache_mem parameter |
SNMP | squid[cacheMemMaxSize] Preprocessing: - MULTIPLIER: |
Squid | Squid: Memory cache usage | Total accounted memory |
SNMP | squid[cacheMemUsage] Preprocessing: - MULTIPLIER: |
Squid | Squid: Cache swap low water mark | Cache Swap Low Water Mark |
SNMP | squid[cacheSwapLowWM] |
Squid | Squid: Cache swap high water mark | Cache Swap High Water Mark |
SNMP | squid[cacheSwapHighWM] |
Squid | Squid: Cache swap directory size | The total of the cache_dir space allocated |
SNMP | squid[cacheSwapMaxSize] Preprocessing: - MULTIPLIER: |
Squid | Squid: Cache swap current size | Storage Swap Size |
SNMP | squid[cacheCurrentSwapSize] |
Squid | Squid: File descriptor count - current used | Number of file descriptors in use |
SNMP | squid[cacheCurrentFileDescrCnt] |
Squid | Squid: File descriptor count - current maximum | Highest number of file descriptors in use |
SNMP | squid[cacheCurrentFileDescrMax] |
Squid | Squid: File descriptor count - current reserved | Reserved number of file descriptors |
SNMP | squid[cacheCurrentResFileDescrCnt] |
Squid | Squid: File descriptor count - current available | Available number of file descriptors |
SNMP | squid[cacheCurrentUnusedFDescrCnt] |
Squid | Squid: Byte hit ratio per 1 minute | Byte Hit Ratios |
SNMP | squid[cacheRequestByteRatio.1] |
Squid | Squid: Byte hit ratio per 5 minutes | Byte Hit Ratios |
SNMP | squid[cacheRequestByteRatio.5] |
Squid | Squid: Byte hit ratio per 1 hour | Byte Hit Ratios |
SNMP | squid[cacheRequestByteRatio.60] |
Squid | Squid: Request hit ratio per 1 minute | Request Hit Ratios |
SNMP | squid[cacheRequestHitRatio.1] |
Squid | Squid: Request hit ratio per 5 minutes | Request Hit Ratios |
SNMP | squid[cacheRequestHitRatio.5] |
Squid | Squid: Request hit ratio per 1 hour | Request Hit Ratios |
SNMP | squid[cacheRequestHitRatio.60] |
Squid | Squid: Sys page faults per second | Page faults with physical I/O |
SNMP | squid[cacheSysPageFaults] Preprocessing: - CHANGEPERSECOND |
Squid | Squid: HTTP requests received per second | Number of HTTP requests received |
SNMP | squid[cacheProtoClientHttpRequests] Preprocessing: - CHANGEPERSECOND |
Squid | Squid: HTTP traffic received per second | Amount of HTTP traffic received from clients |
SNMP | squid[cacheHttpInKb] Preprocessing: - MULTIPLIER: - CHANGE_PER_SECOND |
Squid | Squid: HTTP traffic sent per second | Amount of HTTP traffic sent to clients |
SNMP | squid[cacheHttpOutKb] Preprocessing: - MULTIPLIER: - CHANGE_PER_SECOND |
Squid | Squid: HTTP Hits sent from cache per second | Number of HTTP Hits sent to clients from cache |
SNMP | squid[cacheHttpHits] Preprocessing: - CHANGEPERSECOND |
Squid | Squid: HTTP Errors sent per second | Number of HTTP Errors sent to clients |
SNMP | squid[cacheHttpErrors] Preprocessing: - CHANGEPERSECOND |
Squid | Squid: ICP messages sent per second | Number of ICP messages sent |
SNMP | squid[cacheIcpPktsSent] Preprocessing: - CHANGEPERSECOND |
Squid | Squid: ICP messages received per second | Number of ICP messages received |
SNMP | squid[cacheIcpPktsRecv] Preprocessing: - CHANGEPERSECOND |
Squid | Squid: ICP traffic transmitted per second | Amount of ICP traffic transmitted |
SNMP | squid[cacheIcpKbSent] Preprocessing: - MULTIPLIER: - CHANGE_PER_SECOND |
Squid | Squid: ICP traffic received per second | Amount of ICP traffic received |
SNMP | squid[cacheIcpKbRecv] Preprocessing: - MULTIPLIER: - CHANGE_PER_SECOND |
Squid | Squid: DNS server requests per second | Number of external DNS server requests |
SNMP | squid[cacheDnsRequests] Preprocessing: - CHANGE_PER_SECOND |
Squid | Squid: DNS server replies per second | Number of external DNS server replies |
SNMP | squid[cacheDnsReplies] Preprocessing: - CHANGE_PER_SECOND |
Squid | Squid: FQDN cache requests per second | Number of FQDN Cache requests |
SNMP | squid[cacheFqdnRequests] Preprocessing: - CHANGEPERSECOND |
Squid | Squid: FQDN cache hits per second | Number of FQDN Cache hits |
SNMP | squid[cacheFqdnHits] Preprocessing: - CHANGEPERSECOND |
Squid | Squid: FQDN cache misses per second | Number of FQDN Cache misses |
SNMP | squid[cacheFqdnMisses] Preprocessing: - CHANGEPERSECOND |
Squid | Squid: IP cache requests per second | Number of IP Cache requests |
SNMP | squid[cacheIpRequests] Preprocessing: - CHANGEPERSECOND |
Squid | Squid: IP cache hits per second | Number of IP Cache hits |
SNMP | squid[cacheIpHits] Preprocessing: - CHANGEPERSECOND |
Squid | Squid: IP cache misses per second | Number of IP Cache misses |
SNMP | squid[cacheIpMisses] Preprocessing: - CHANGEPERSECOND |
Squid | Squid: Objects count | Number of objects stored by the cache |
SNMP | squid[cacheNumObjCount] |
Squid | Squid: Objects LRU expiration age | Storage LRU Expiration Age |
SNMP | squid[cacheCurrentLRUExpiration] Preprocessing: - MULTIPLIER: |
Squid | Squid: Objects unlinkd requests | Requests given to unlinkd |
SNMP | squid[cacheCurrentUnlinkRequests] |
Squid | Squid: HTTP all service time per 5 minutes | HTTP all service time per 5 minutes |
SNMP | squid[cacheHttpAllSvcTime.5] Preprocessing: - MULTIPLIER: |
Squid | Squid: HTTP all service time per hour | HTTP all service time per hour |
SNMP | squid[cacheHttpAllSvcTime.60] Preprocessing: - MULTIPLIER: |
Squid | Squid: HTTP miss service time per 5 minutes | HTTP miss service time per 5 minutes |
SNMP | squid[cacheHttpMissSvcTime.5] Preprocessing: - MULTIPLIER: |
Squid | Squid: HTTP miss service time per hour | HTTP miss service time per hour |
SNMP | squid[cacheHttpMissSvcTime.60] Preprocessing: - MULTIPLIER: |
Squid | Squid: HTTP hit service time per 5 minutes | HTTP hit service time per 5 minutes |
SNMP | squid[cacheHttpHitSvcTime.5] Preprocessing: - MULTIPLIER: |
Squid | Squid: HTTP hit service time per hour | HTTP hit service time per hour |
SNMP | squid[cacheHttpHitSvcTime.60] Preprocessing: - MULTIPLIER: |
Squid | Squid: ICP query service time per 5 minutes | ICP query service time per 5 minutes |
SNMP | squid[cacheIcpQuerySvcTime.5] Preprocessing: - MULTIPLIER: |
Squid | Squid: ICP query service time per hour | ICP query service time per hour |
SNMP | squid[cacheIcpQuerySvcTime.60] Preprocessing: - MULTIPLIER: |
Squid | Squid: ICP reply service time per 5 minutes | ICP reply service time per 5 minutes |
SNMP | squid[cacheIcpReplySvcTime.5] Preprocessing: - MULTIPLIER: |
Squid | Squid: ICP reply service time per hour | ICP reply service time per hour |
SNMP | squid[cacheIcpReplySvcTime.60] Preprocessing: - MULTIPLIER: |
Squid | Squid: DNS service time per 5 minutes | DNS service time per 5 minutes |
SNMP | squid[cacheDnsSvcTime.5] Preprocessing: - MULTIPLIER: |
Squid | Squid: DNS service time per hour | DNS service time per hour |
SNMP | squid[cacheDnsSvcTime.60] Preprocessing: - MULTIPLIER: |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Squid: Port {$SQUID.HTTP.PORT} is down | - |
last(/Squid by SNMP/net.tcp.service[tcp,,{$SQUID.HTTP.PORT}])=0 |
AVERAGE | Manual close: YES |
Squid: Squid has been restarted | Uptime is less than 10 minutes. |
last(/Squid by SNMP/squid[cacheUptime])<10m |
INFO | Manual close: YES |
Squid: Squid version has been changed | Squid version has changed. Ack to close. |
last(/Squid by SNMP/squid[cacheVersionId],#1)<>last(/Squid by SNMP/squid[cacheVersionId],#2) and length(last(/Squid by SNMP/squid[cacheVersionId]))>0 |
INFO | Manual close: YES |
Squid: Swap usage is more than low watermark | - |
last(/Squid by SNMP/squid[cacheCurrentSwapSize])>last(/Squid by SNMP/squid[cacheSwapLowWM])*last(/Squid by SNMP/squid[cacheSwapMaxSize])/100 |
WARNING | |
Squid: Swap usage is more than high watermark | - |
last(/Squid by SNMP/squid[cacheCurrentSwapSize])>last(/Squid by SNMP/squid[cacheSwapHighWM])*last(/Squid by SNMP/squid[cacheSwapMaxSize])/100 |
HIGH | |
Squid: Squid is running out of file descriptors | - |
last(/Squid by SNMP/squid[cacheCurrentUnusedFDescrCnt])<{$SQUID.FILE.DESC.WARN.MIN} |
WARNING | |
Squid: High sys page faults rate | - |
avg(/Squid by SNMP/squid[cacheSysPageFaults],5m)>avg(/Squid by SNMP/squid[cacheProtoClientHttpRequests],5m)/100*{$SQUID.PAGE.FAULT.WARN} |
WARNING |
Please report any issues with the template at https://support.zabbix.com.
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums.
For Zabbix version: 6.2 and higher
Refer to the vendor documentation.
No specific Zabbix configuration is required.
There are no template links in this template.
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
Services | SMTP service is running | - |
SIMPLE | net.tcp.service[smtp] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
SMTP service is down on {HOST.NAME} | - |
max(/SMTP Service/net.tcp.service[smtp],#3)=0 |
AVERAGE |
Please report any issues with the template at https://support.zabbix.com
For Zabbix version: 6.2 and higher
SharePoint includes a Representational State Transfer (REST) service. Developers can perform read operations from their SharePoint Add-ins, solutions, and client applications, using REST web technologies and standard Open Data Protocol (OData) syntax. Details in
https://docs.microsoft.com/ru-ru/sharepoint/dev/sp-add-ins/get-to-know-the-sharepoint-rest-service?tabs=csom
This template was tested on:
See Zabbix template operation for basic instructions.
Create a new host. Define macros according to your Sharepoint web portal. It is recommended to fill in the values of the filter macros to avoid getting redundant data.
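The portal and account can be checked against the REST service beforehand (a minimal sketch; the URL mirrors the {$SHAREPOINT.URL} example, the credentials are placeholders, and NTLM authentication is assumed):
curl --ntlm -u 'DOMAIN\zbx_user:<PASSWORD>' -H "Accept: application/json;odata=verbose" "http://sharepoint.companyname.local/_api/web"
A JSON document describing the root web confirms that the REST service is reachable.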
No specific Zabbix configuration is required.
Name | Description | Default |
---|---|---|
{$SHAREPOINT.GET_INTERVAL} | - |
1m |
{$SHAREPOINT.LLD.FILTER.FULL_PATH.MATCHES} | Filter of discoverable dictionaries by full path. |
^/ |
{$SHAREPOINT.LLD.FILTER.FULL_PATH.NOT_MATCHES} | Filter to exclude discovered dictionaries by full path. |
CHANGE_IF_NEEDED |
{$SHAREPOINT.LLD.FILTER.NAME.MATCHES} | Filter of discoverable dictionaries by name. |
.* |
{$SHAREPOINT.LLD.FILTER.NAME.NOT_MATCHES} | Filter to exclude discovered dictionaries by name. |
CHANGE_IF_NEEDED |
{$SHAREPOINT.LLD.FILTER.TYPE.MATCHES} | Filter of discoverable types. |
FOLDER |
{$SHAREPOINT.LLD.FILTER.TYPE.NOT_MATCHES} | Filter to exclude discovered types. |
CHANGE_IF_NEEDED |
{$SHAREPOINT.LLD_INTERVAL} | - |
3h |
{$SHAREPOINT.MAX_HEALT_SCORE} | Must be in the range from 0 to 10. Details: https://docs.microsoft.com/en-us/openspecs/sharepoint_protocols/ms-wsshp/c60ddeb6-4113-4a73-9e97-26b5c3907d33 |
5 |
{$SHAREPOINT.PASSWORD} | - |
`` |
{$SHAREPOINT.ROOT} | - |
/Shared Documents |
{$SHAREPOINT.URL} | Portal page URL. For example http://sharepoint.companyname.local/ |
`` |
{$SHAREPOINT.USER} | - |
`` |
There are no template links in this template.
Name | Description | Type | Key and additional info |
---|---|---|---|
Directory discovery | - |
SCRIPT | sharepoint.directory.discovery Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: Filter: AND- {#SHAREPOINT.LLD.NAME} MATCHES_REGEX - {#SHAREPOINT.LLD.NAME} NOT_MATCHES_REGEX - {#SHAREPOINT.LLD.FULL_PATH} MATCHES_REGEX - {#SHAREPOINT.LLD.FULL_PATH} NOT_MATCHES_REGEX - {#SHAREPOINT.LLD.TYPE} MATCHES_REGEX - {#SHAREPOINT.LLD.TYPE} NOT_MATCHES_REGEX |
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
Sharepoint | Sharepoint: Get directory structure: Status | HTTP response (status) code. Indicates whether the HTTP request was successfully completed. Additional information is available in the server log file. |
DEPENDENT | sharepoint.get_dir.status Preprocessing: - JSONPATH: ⛔️ON_FAIL: CUSTOM_ERROR -> DISCARD_VALUE - DISCARD_UNCHANGED_HEARTBEAT: |
Sharepoint | Sharepoint: Get directory structure: Exec time | The time taken to execute the script for obtaining the data structure (in ms). Less is better. |
DEPENDENT | sharepoint.get_dir.time Preprocessing: - JSONPATH: ⛔️ON_FAIL: CUSTOM_ERROR -> DISCARD_VALUE - DISCARD_UNCHANGED_HEARTBEAT: |
Sharepoint | Sharepoint: Health score | This item specifies a value between 0 and 10, where 0 represents a low load and a high ability to process requests and 10 represents a high load and that the server is throttling requests to maintain adequate throughput. |
HTTP_AGENT | sharepoint.health_score Preprocessing: - REGEX: - IN_RANGE: 0 10 ⛔️ON_FAIL: - DISCARD_UNCHANGED_HEARTBEAT: |
Sharepoint | Sharepoint: Size ({#SHAREPOINT.LLD.FULL_PATH}) | Size of: {#SHAREPOINT.LLD.FULL_PATH} |
DEPENDENT | sharepoint.size["{#SHAREPOINT.LLD.FULL_PATH}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL: DISCARD_VALUE - DISCARD_UNCHANGED_HEARTBEAT: |
Sharepoint | Sharepoint: Modified ({#SHAREPOINT.LLD.FULL_PATH}) | Date of change: {#SHAREPOINT.LLD.FULL_PATH} |
DEPENDENT | sharepoint.modified["{#SHAREPOINT.LLD.FULL_PATH}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL: DISCARD_VALUE - DISCARD_UNCHANGED_HEARTBEAT: |
Sharepoint | Sharepoint: Created ({#SHAREPOINT.LLD.FULL_PATH}) | Date of creation: {#SHAREPOINT.LLD.FULL_PATH} |
DEPENDENT | sharepoint.created["{#SHAREPOINT.LLD.FULL_PATH}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL: DISCARD_VALUE - DISCARD_UNCHANGED_HEARTBEAT: |
Zabbix raw items | Sharepoint: Get directory structure | Used to get directory structure information |
SCRIPT | sharepoint.get_dir Preprocessing: - CHECK_NOT_SUPPORTED ⛔️ON_FAIL: CUSTOM_VALUE -> {"status":520,"data":{},"time":0} Expression: The text is too long. Please see the template. |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Sharepoint: Error getting directory structure. | Error getting directory structure. Check the Zabbix server log for more details. |
last(/Microsoft SharePoint by HTTP/sharepoint.get_dir.status)<>200 |
WARNING | |
Sharepoint: Server responds slowly to API request | - |
last(/Microsoft SharePoint by HTTP/sharepoint.get_dir.time)>2000 |
WARNING | |
Sharepoint: Bad health score | - |
last(/Microsoft SharePoint by HTTP/sharepoint.health_score)>"{$SHAREPOINT.MAX_HEALT_SCORE}" |
AVERAGE | |
Sharepoint: Sharepoint object is changed | The modification date of the folder/file has been updated. |
last(/Microsoft SharePoint by HTTP/sharepoint.modified["{#SHAREPOINT.LLD.FULL_PATH}"],#1)<>last(/Microsoft SharePoint by HTTP/sharepoint.modified["{#SHAREPOINT.LLD.FULL_PATH}"],#2) |
INFO | Manual close: YES |
Please report any issues with the template at https://support.zabbix.com.
For Zabbix version: 6.2 and higher. The template to monitor RabbitMQ by Zabbix that works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The template RabbitMQ Cluster collects metrics by polling the RabbitMQ management plugin with HTTP agent remotely.
This template was tested on:
See Zabbix template operation for basic instructions.
Enable the RabbitMQ management plugin. See RabbitMQ's documentation to enable it.
Create a user to monitor the service:
rabbitmqctl add_user zbx_monitor <PASSWORD>
rabbitmqctl set_permissions -p / zbx_monitor "" "" ".*"
rabbitmqctl set_user_tags zbx_monitor monitoring
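Before assigning the template, you can verify that the new user can read the management API. A minimal sketch, assuming the management plugin listens on the default port 15672 on the local host:
curl -s -u zbx_monitor:<PASSWORD> http://localhost:15672/api/overview
# A JSON document should be returned; its object_totals, queue_totals and
# message_stats objects are the sources of the dependent items below.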
Login and password are also set in macros:
No specific Zabbix configuration is required.
Name | Description | Default |
---|---|---|
{$RABBITMQ.API.PASSWORD} | - |
zabbix |
{$RABBITMQ.API.PORT} | The port of RabbitMQ API endpoint |
15672 |
{$RABBITMQ.API.SCHEME} | Request scheme which may be http or https |
http |
{$RABBITMQ.API.USER} | - |
zbx_monitor |
{$RABBITMQ.LLD.FILTER.EXCHANGE.MATCHES} | Filter of discoverable exchanges |
.* |
{$RABBITMQ.LLD.FILTER.EXCHANGE.NOT_MATCHES} | Filter to exclude discovered exchanges |
CHANGE_IF_NEEDED |
There are no template links in this template.
Name | Description | Type | Key and additional info |
---|---|---|---|
Exchanges discovery | Individual exchange metrics |
DEPENDENT | rabbitmq.exchanges.discovery Filter: AND- {#EXCHANGE} MATCHES_REGEX - {#EXCHANGE} NOT_MATCHES_REGEX |
Health Check 3.8.10+ discovery | Version 3.8.10+ specific metrics |
DEPENDENT | rabbitmq.healthcheck.v3810.discovery Preprocessing: - JSONPATH: - JAVASCRIPT: |
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
RabbitMQ | RabbitMQ: Connections total | The total number of connections. |
DEPENDENT | rabbitmq.overview.object_totals.connections Preprocessing: - JSONPATH: |
RabbitMQ | RabbitMQ: Channels total | The total number of channels. |
DEPENDENT | rabbitmq.overview.object_totals.channels Preprocessing: - JSONPATH: |
RabbitMQ | RabbitMQ: Queues total | The total number of queues. |
DEPENDENT | rabbitmq.overview.object_totals.queues Preprocessing: - JSONPATH: |
RabbitMQ | RabbitMQ: Consumers total | The total number of consumers. |
DEPENDENT | rabbitmq.overview.object_totals.consumers Preprocessing: - JSONPATH: |
RabbitMQ | RabbitMQ: Exchanges total | The total number of exchanges. |
DEPENDENT | rabbitmq.overview.object_totals.exchanges Preprocessing: - JSONPATH: |
RabbitMQ | RabbitMQ: Messages total | The total number of messages (ready, plus unacknowledged). |
DEPENDENT | rabbitmq.overview.queue_totals.messages Preprocessing: - JSONPATH: ⛔️ON_FAIL:CUSTOM_VALUE -> 0 |
RabbitMQ | RabbitMQ: Messages ready for delivery | The number of messages ready for delivery. |
DEPENDENT | rabbitmq.overview.queue_totals.messages.ready Preprocessing: - JSONPATH: ⛔️ON_FAIL:CUSTOM_VALUE -> 0 |
RabbitMQ | RabbitMQ: Messages unacknowledged | The number of unacknowledged messages. |
DEPENDENT | rabbitmq.overview.queue_totals.messages.unacknowledged Preprocessing: - JSONPATH: ⛔️ON_FAIL:CUSTOM_VALUE -> 0 |
RabbitMQ | RabbitMQ: Messages acknowledged | The number of messages delivered to clients and acknowledged. |
DEPENDENT | rabbitmq.overview.messages.ack Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
RabbitMQ | RabbitMQ: Messages acknowledged per second | The rate of messages (per second) delivered to clients and acknowledged. |
DEPENDENT | rabbitmq.overview.messages.ack.rate Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
RabbitMQ | RabbitMQ: Messages confirmed | The count of confirmed messages. |
DEPENDENT | rabbitmq.overview.messages.confirm Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
RabbitMQ | RabbitMQ: Messages confirmed per second | The rate of confirmed messages per second. |
DEPENDENT | rabbitmq.overview.messages.confirm.rate Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
RabbitMQ | RabbitMQ: Messages delivered | The sum of messages delivered to consumers: in acknowledgement mode and in no-acknowledgement mode; delivered to consumers in response to basic.get. |
DEPENDENT | rabbitmq.overview.messages.deliver_get Preprocessing: - JSONPATH: ⛔️ON_FAIL:CUSTOM_VALUE -> 0 |
RabbitMQ | RabbitMQ: Messages delivered per second | The rate of the sum of messages (per second) delivered to consumers: in acknowledgement mode and in no-acknowledgement mode; delivered to consumers in response to basic.get. |
DEPENDENT | rabbitmq.overview.messages.deliver_get.rate Preprocessing: - JSONPATH: ⛔️ON_FAIL:CUSTOM_VALUE -> 0 |
RabbitMQ | RabbitMQ: Messages published | The count of published messages. |
DEPENDENT | rabbitmq.overview.messages.publish Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
RabbitMQ | RabbitMQ: Messages published per second | The rate of published messages per second. |
DEPENDENT | rabbitmq.overview.messages.publish.rate Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
RabbitMQ | RabbitMQ: Messages publish_in | The count of messages published from the channels into this overview. |
DEPENDENT | rabbitmq.overview.messages.publish_in Preprocessing: - JSONPATH: ⛔️ON_FAIL:CUSTOM_VALUE -> 0 |
RabbitMQ | RabbitMQ: Messages publish_in per second | The rate of messages (per second) published from the channels into this overview. |
DEPENDENT | rabbitmq.overview.messages.publish_in.rate Preprocessing: - JSONPATH: ⛔️ON_FAIL:CUSTOM_VALUE -> 0 |
RabbitMQ | RabbitMQ: Messages publish_out | The count of messages published from this overview into queues. |
DEPENDENT | rabbitmq.overview.messages.publish_out Preprocessing: - JSONPATH: ⛔️ON_FAIL:CUSTOM_VALUE -> 0 |
RabbitMQ | RabbitMQ: Messages publish_out per second | The rate of messages (per second) published from this overview into queues. |
DEPENDENT | rabbitmq.overview.messages.publish_out.rate Preprocessing: - JSONPATH: ⛔️ON_FAIL:CUSTOM_VALUE -> 0 |
RabbitMQ | RabbitMQ: Messages returned unroutable | The count of messages returned to a publisher as unroutable. |
DEPENDENT | rabbitmq.overview.messages.return_unroutable Preprocessing: - JSONPATH: ⛔️ON_FAIL:CUSTOM_VALUE -> 0 |
RabbitMQ | RabbitMQ: Messages returned unroutable per second | The rate of messages (per second) returned to a publisher as unroutable. |
DEPENDENT | rabbitmq.overview.messages.return_unroutable.rate Preprocessing: - JSONPATH: ⛔️ON_FAIL:CUSTOM_VALUE -> 0 |
RabbitMQ | RabbitMQ: Messages returned redeliver | The count of the subset of messages in deliver_get which had the redelivered flag set. |
DEPENDENT | rabbitmq.overview.messages.redeliver Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
RabbitMQ | RabbitMQ: Messages returned redeliver per second | The rate of the subset of messages (per second) in deliver_get which had the redelivered flag set. |
DEPENDENT | rabbitmq.overview.messages.redeliver.rate Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
RabbitMQ | RabbitMQ: Healthcheck: alarms in effect in the cluster{#SINGLETON} | Responds with a 200 OK if there are no alarms in effect in the cluster, otherwise responds with a 503 Service Unavailable. |
HTTP_AGENT | rabbitmq.healthcheck.alarms[{#SINGLETON}] Preprocessing: - REGEX: - JAVASCRIPT: - DISCARD_UNCHANGED_HEARTBEAT: |
RabbitMQ | RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages acknowledged | The number of messages delivered to clients and acknowledged. |
DEPENDENT | rabbitmq.exchange.messages.ack["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
RabbitMQ | RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages acknowledged per second | The rate of messages (per second) delivered to clients and acknowledged. |
DEPENDENT | rabbitmq.exchange.messages.ack.rate["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
RabbitMQ | RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages confirmed | The count of confirmed messages. |
DEPENDENT | rabbitmq.exchange.messages.confirm["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
RabbitMQ | RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages confirmed per second | The rate of messages confirmed per second. |
DEPENDENT | rabbitmq.exchange.messages.confirm.rate["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
RabbitMQ | RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages delivered | The sum of messages delivered to consumers: in acknowledgement mode and in no-acknowledgement mode; delivered to consumers in response to the basic.get. |
DEPENDENT | rabbitmq.exchange.messages.deliver_get["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL:CUSTOM_VALUE -> 0 |
RabbitMQ | RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages delivered per second | The rate of the sum of messages (per second) delivered to consumers: in acknowledgement mode and in no-acknowledgement mode; delivered to consumers in response to the basic.get. |
DEPENDENT | rabbitmq.exchange.messages.deliver_get.rate["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL:CUSTOM_VALUE -> 0 |
RabbitMQ | RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages published | The count of published messages. |
DEPENDENT | rabbitmq.exchange.messages.publish["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
RabbitMQ | RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages published per second | The rate of messages published per second. |
DEPENDENT | rabbitmq.exchange.messages.publish.rate["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
RabbitMQ | RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages publish_in | The count of messages published from the channels into this overview. |
DEPENDENT | rabbitmq.exchange.messages.publish_in["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL:CUSTOM_VALUE -> 0 |
RabbitMQ | RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages publish_in per second | The rate of messages (per second) published from the channels into this overview. |
DEPENDENT | rabbitmq.exchange.messages.publish_in.rate["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL:CUSTOM_VALUE -> 0 |
RabbitMQ | RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages publish_out | The count of messages published from this overview into queues. |
DEPENDENT | rabbitmq.exchange.messages.publish_out["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL:CUSTOM_VALUE -> 0 |
RabbitMQ | RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages publish_out per second | The rate of messages (per second) published from this overview into queues. |
DEPENDENT | rabbitmq.exchange.messages.publish_out.rate["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL:CUSTOM_VALUE -> 0 |
RabbitMQ | RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages returned unroutable | The count of messages returned to a publisher as unroutable. |
DEPENDENT | rabbitmq.exchange.messages.return_unroutable["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL:CUSTOM_VALUE -> 0 |
RabbitMQ | RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages returned unroutable per second | The rate of messages (per second) returned to a publisher as unroutable. |
DEPENDENT | rabbitmq.exchange.messages.return_unroutable.rate["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL:CUSTOM_VALUE -> 0 |
RabbitMQ | RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages redelivered | The count of the subset of messages in deliver_get which had the redelivered flag set. |
DEPENDENT | rabbitmq.exchange.messages.redeliver["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
RabbitMQ | RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages redelivered per second | The rate of the subset of messages (per second) in deliver_get which had the redelivered flag set. |
DEPENDENT | rabbitmq.exchange.messages.redeliver.rate["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
Zabbix raw items | RabbitMQ: Get overview | The HTTP API endpoint that returns cluster-wide metrics. |
HTTP_AGENT | rabbitmq.get_overview |
Zabbix raw items | RabbitMQ: Get exchanges | The HTTP API endpoint that returns exchanges metrics. |
HTTP_AGENT | rabbitmq.get_exchanges |
Zabbix raw items | RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Get data | The HTTP API endpoint that returns [{#VHOST}][{#EXCHANGE}][{#TYPE}] exchanges metrics |
DEPENDENT | rabbitmq.get_exchanges["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing: - JSONPATH: |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
RabbitMQ: There are active alarms in the cluster | This is the default API endpoint path: http://{HOST.CONN}:{$RABBITMQ.API.PORT}/api/index.html. |
last(/RabbitMQ cluster by HTTP/rabbitmq.healthcheck.alarms[{#SINGLETON}])=0 |
AVERAGE | |
RabbitMQ: Failed to fetch overview data | Zabbix has not received any data for items for the last 30 minutes. |
nodata(/RabbitMQ cluster by HTTP/rabbitmq.get_overview,30m)=1 |
WARNING | Manual close: YES |
Please report any issues with the template at https://support.zabbix.com.
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums.
For Zabbix version: 6.2 and higher. The template to monitor RabbitMQ by Zabbix that works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The template RabbitMQ Node (Zabbix version >= 4.2) collects metrics by polling the RabbitMQ management plugin with HTTP agent remotely.
This template was tested on:
Enable the RabbitMQ management plugin. See RabbitMQ's documentation to enable it.
Create a user to monitor the service:
rabbitmqctl add_user zbx_monitor <PASSWORD>
rabbitmqctl set_permissions -p / zbx_monitor "" "" ".*"
rabbitmqctl set_user_tags zbx_monitor monitoring
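To confirm the credentials against the endpoint this template polls, you can request the nodes API directly. A minimal sketch, assuming the default port 15672 on the local host:
curl -s -u zbx_monitor:<PASSWORD> http://localhost:15672/api/nodes
# A JSON array with one object per node should be returned, including the fields
# used by the node items below (fd_used, disk_free, mem_used, partitions, running, ...).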
Login and password are also set in macros:
No specific Zabbix configuration is required.
Name | Description | Default |
---|---|---|
{$RABBITMQ.API.PASSWORD} | - |
zabbix |
{$RABBITMQ.API.PORT} | The port of RabbitMQ API endpoint |
15672 |
{$RABBITMQ.API.SCHEME} | Request scheme which may be http or https |
http |
{$RABBITMQ.API.USER} | - |
zbx_monitor |
{$RABBITMQ.CLUSTER.NAME} | The name of RabbitMQ cluster |
rabbit |
{$RABBITMQ.LLD.FILTER.QUEUE.MATCHES} | Filter of discoverable queues |
.* |
{$RABBITMQ.LLD.FILTER.QUEUE.NOT_MATCHES} | Filter to exclude discovered queues |
CHANGE_IF_NEEDED |
{$RABBITMQ.MESSAGES.MAX.WARN} | Maximum number of messages in the queue for trigger expression |
1000 |
{$RABBITMQ.RESPONSE_TIME.MAX.WARN} | Maximum RabbitMQ response time in seconds for trigger expression |
10 |
There are no template links in this template.
Name | Description | Type | Key and additional info |
---|---|---|---|
Health Check 3.8.10+ discovery | Version 3.8.10+ specific metrics |
DEPENDENT | rabbitmq.healthcheck.v3810.discovery Preprocessing: - JSONPATH: - JAVASCRIPT: |
Health Check 3.8.9- discovery | Specific metrics up to and including version 3.8.4 |
DEPENDENT | rabbitmq.healthcheck.v389.discovery Preprocessing: - JSONPATH: - JAVASCRIPT: |
Queues discovery | Individual queue metrics |
DEPENDENT | rabbitmq.queues.discovery Filter: AND- {#QUEUE} MATCHES_REGEX - {#QUEUE} NOT_MATCHES_REGEX - {#NODE} MATCHES_REGEX |
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
RabbitMQ | RabbitMQ: Management plugin version | The version of the management plugin in use. |
DEPENDENT | rabbitmq.node.overview.management_version Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT:1d |
RabbitMQ | RabbitMQ: RabbitMQ version | The version of the RabbitMQ on the node, which processed this request. |
DEPENDENT | rabbitmq.node.overview.rabbitmq_version Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT:1d |
RabbitMQ | RabbitMQ: Used file descriptors | The number of file descriptors currently in use. |
DEPENDENT | rabbitmq.node.fd_used Preprocessing: - JSONPATH: |
RabbitMQ | RabbitMQ: Free disk space | The current free disk space. |
DEPENDENT | rabbitmq.node.disk_free Preprocessing: - JSONPATH: |
RabbitMQ | RabbitMQ: Disk free limit | The free space limit of a disk expressed in bytes. |
DEPENDENT | rabbitmq.node.disk_free_limit Preprocessing: - JSONPATH: |
RabbitMQ | RabbitMQ: Memory used | The memory usage expressed in bytes. |
DEPENDENT | rabbitmq.node.mem_used Preprocessing: - JSONPATH: |
RabbitMQ | RabbitMQ: Memory limit | The memory usage with high watermark properties expressed in bytes. |
DEPENDENT | rabbitmq.node.mem_limit Preprocessing: - JSONPATH: |
RabbitMQ | RabbitMQ: Runtime run queue | The average number of Erlang processes waiting to run. |
DEPENDENT | rabbitmq.node.run_queue Preprocessing: - JSONPATH: |
RabbitMQ | RabbitMQ: Sockets used | The number of file descriptors used as sockets. |
DEPENDENT | rabbitmq.node.sockets_used Preprocessing: - JSONPATH: |
RabbitMQ | RabbitMQ: Sockets available | The file descriptors available for use as sockets. |
DEPENDENT | rabbitmq.node.sockets_total Preprocessing: - JSONPATH: |
RabbitMQ | RabbitMQ: Number of network partitions | The number of network partitions this node "sees". |
DEPENDENT | rabbitmq.node.partitions Preprocessing: - JSONPATH: - JAVASCRIPT: |
RabbitMQ | RabbitMQ: Is running | Whether the node is running. |
DEPENDENT | rabbitmq.node.running Preprocessing: - JSONPATH: - BOOL_TO_DECIMAL |
RabbitMQ | RabbitMQ: Memory alarm | Whether the node has a memory alarm in effect. |
DEPENDENT | rabbitmq.node.mem_alarm Preprocessing: - JSONPATH: - BOOL_TO_DECIMAL |
RabbitMQ | RabbitMQ: Disk free alarm | Whether the node has a disk alarm in effect. |
DEPENDENT | rabbitmq.node.disk_free_alarm Preprocessing: - JSONPATH: - BOOL_TO_DECIMAL |
RabbitMQ | RabbitMQ: Uptime | Uptime expressed in milliseconds. |
DEPENDENT | rabbitmq.node.uptime Preprocessing: - JSONPATH: - MULTIPLIER: |
RabbitMQ | RabbitMQ: Service ping | - |
SIMPLE | net.tcp.service["{$RABBITMQ.API.SCHEME}","{HOST.CONN}","{$RABBITMQ.API.PORT}"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
RabbitMQ | RabbitMQ: Service response time | - |
SIMPLE | net.tcp.service.perf["{$RABBITMQ.API.SCHEME}","{HOST.CONN}","{$RABBITMQ.API.PORT}"] |
RabbitMQ | RabbitMQ: Healthcheck: local alarms in effect on this node{#SINGLETON} | Responds with a 200 OK if there are no local alarms in effect on the target node, otherwise responds with a 503 Service Unavailable. |
HTTP_AGENT | rabbitmq.healthcheck.local_alarms[{#SINGLETON}] Preprocessing: - REGEX: - JAVASCRIPT: - DISCARD_UNCHANGED_HEARTBEAT:3h |
RabbitMQ | RabbitMQ: Healthcheck: expiration date on the certificates{#SINGLETON} | Checks the expiration date on the certificates for every listener configured to use TLS. Responds with a 200 OK if all certificates are valid (have not expired), otherwise responds with a 503 Service Unavailable. |
HTTP_AGENT | rabbitmq.healthcheck.certificate_expiration[{#SINGLETON}] Preprocessing: - REGEX: - JAVASCRIPT: - DISCARD_UNCHANGED_HEARTBEAT:3h |
RabbitMQ | RabbitMQ: Healthcheck: virtual hosts on this node{#SINGLETON} | Responds with a 200 OK if all virtual hosts are running on the target node, otherwise responds with a 503 Service Unavailable. |
HTTP_AGENT | rabbitmq.healthcheck.virtual_hosts[{#SINGLETON}] Preprocessing: - REGEX: - JAVASCRIPT: - DISCARD_UNCHANGED_HEARTBEAT:3h |
RabbitMQ | RabbitMQ: Healthcheck: classic mirrored queues without synchronized mirrors online{#SINGLETON} | Checks if there are classic mirrored queues without synchronized mirrors online (queues that would potentially lose data if the target node is shut down). Responds with a 200 OK if there are no such classic mirrored queues, otherwise responds with a 503 Service Unavailable. |
HTTP_AGENT | rabbitmq.healthcheck.mirror_sync[{#SINGLETON}] Preprocessing: - REGEX: - JAVASCRIPT: - DISCARD_UNCHANGED_HEARTBEAT:3h |
RabbitMQ | RabbitMQ: Healthcheck: queues with minimum online quorum{#SINGLETON} | Checks if there are quorum queues with minimum online quorum (queues that would lose their quorum and availability if the target node is shut down). Responds with a 200 OK if there are no such quorum queues, otherwise responds with a 503 Service Unavailable. |
HTTP_AGENT | rabbitmq.healthcheck.quorum[{#SINGLETON}] Preprocessing: - REGEX: - JAVASCRIPT: - DISCARD_UNCHANGED_HEARTBEAT: |
RabbitMQ | RabbitMQ: Healthcheck{#SINGLETON} | Checks whether the RabbitMQ application is running, whether channels and queues can be listed successfully, and whether no alarms are in effect. |
HTTP_AGENT | rabbitmq.healthcheck[{#SINGLETON}] Preprocessing: - JSONPATH: - BOOL_TO_DECIMAL ⛔️ON_FAIL: |
RabbitMQ | RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Get data | The HTTP API endpoint that returns [{#VHOST}][{#QUEUE}] queue metrics |
DEPENDENT | rabbitmq.get_exchanges["{#VHOST}/{#QUEUE}"] Preprocessing: - JSONPATH: |
RabbitMQ | RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages | The count of total messages in the queue. |
DEPENDENT | rabbitmq.queue.messages["{#VHOST}/{#QUEUE}"] Preprocessing: - JSONPATH: |
RabbitMQ | RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages per second | The count of total messages per second in the queue. |
DEPENDENT | rabbitmq.queue.messages.rate["{#VHOST}/{#QUEUE}"] Preprocessing: - JSONPATH: |
RabbitMQ | RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Consumers | The number of consumers. |
DEPENDENT | rabbitmq.queue.consumers["{#VHOST}/{#QUEUE}"] Preprocessing: - JSONPATH: |
RabbitMQ | RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Memory | The bytes of memory consumed by the Erlang process associated with the queue, including stack, heap and internal structures. |
DEPENDENT | rabbitmq.queue.memory["{#VHOST}/{#QUEUE}"] Preprocessing: - JSONPATH: |
RabbitMQ | RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages ready | The number of messages ready to be delivered to clients. |
DEPENDENT | rabbitmq.queue.messages_ready["{#VHOST}/{#QUEUE}"] Preprocessing: - JSONPATH: |
RabbitMQ | RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages ready per second | The number of messages per second ready to be delivered to clients. |
DEPENDENT | rabbitmq.queue.messages_ready.rate["{#VHOST}/{#QUEUE}"] Preprocessing: - JSONPATH: |
RabbitMQ | RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages unacknowledged | The number of messages delivered to clients but not yet acknowledged. |
DEPENDENT | rabbitmq.queue.messages_unacknowledged["{#VHOST}/{#QUEUE}"] Preprocessing: - JSONPATH: |
RabbitMQ | RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages unacknowledged per second | The number of messages per second delivered to clients but not yet acknowledged. |
DEPENDENT | rabbitmq.queue.messages_unacknowledged.rate["{#VHOST}/{#QUEUE}"] Preprocessing: - JSONPATH: |
RabbitMQ | RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages acknowledged | The number of messages delivered to clients and acknowledged. |
DEPENDENT | rabbitmq.queue.messages.ack["{#VHOST}/{#QUEUE}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
RabbitMQ | RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages acknowledged per second | The number of messages (per second) delivered to clients and acknowledged. |
DEPENDENT | rabbitmq.queue.messages.ack.rate["{#VHOST}/{#QUEUE}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
RabbitMQ | RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages delivered | The count of messages delivered to consumers in acknowledgement mode. |
DEPENDENT | rabbitmq.queue.messages.deliver["{#VHOST}/{#QUEUE}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
RabbitMQ | RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages delivered per second | The count of messages (per second) delivered to consumers in acknowledgement mode. |
DEPENDENT | rabbitmq.queue.messages.deliver.rate["{#VHOST}/{#QUEUE}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
RabbitMQ | RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Sum of messages delivered | The sum of messages delivered to consumers: in acknowledgement mode and in no-acknowledgement mode; delivered to consumers in response to basic.get. |
DEPENDENT | rabbitmq.queue.messages.deliver_get["{#VHOST}/{#QUEUE}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL:CUSTOM_VALUE -> 0 |
RabbitMQ | RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Sum of messages delivered per second | The rate of delivery per second. The sum of messages delivered (per second) to consumers: in acknowledgement mode and in no-acknowledgement mode; delivered to consumers in response to basic.get. |
DEPENDENT | rabbitmq.queue.messages.deliver_get.rate["{#VHOST}/{#QUEUE}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL:CUSTOM_VALUE -> 0 |
RabbitMQ | RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages published | The count of published messages. |
DEPENDENT | rabbitmq.queue.messages.publish["{#VHOST}/{#QUEUE}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
RabbitMQ | RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages published per second | The rate of published messages per second. |
DEPENDENT | rabbitmq.queue.messages.publish.rate["{#VHOST}/{#QUEUE}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
RabbitMQ | RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages redelivered | The count of the subset of messages in deliver_get which had the redelivered flag set. |
DEPENDENT | rabbitmq.queue.messages.redeliver["{#VHOST}/{#QUEUE}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
RabbitMQ | RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages redelivered per second | The rate of messages redelivered per second. |
DEPENDENT | rabbitmq.queue.messages.redeliver.rate["{#VHOST}/{#QUEUE}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
Zabbix raw items | RabbitMQ: Get node overview | The HTTP API endpoint that returns cluster-wide metrics. |
HTTP_AGENT | rabbitmq.get_node_overview |
Zabbix raw items | RabbitMQ: Get nodes | The HTTP API endpoint that returns metrics of the nodes. |
HTTP_AGENT | rabbitmq.get_nodes |
Zabbix raw items | RabbitMQ: Get queues | The HTTP API endpoint that returns metrics of the queues. |
HTTP_AGENT | rabbitmq.get_queues |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
RabbitMQ: Version has changed | The RabbitMQ version has changed. Acknowledge (Ack) to close manually. |
last(/RabbitMQ node by HTTP/rabbitmq.node.overview.rabbitmq_version,#1)<>last(/RabbitMQ node by HTTP/rabbitmq.node.overview.rabbitmq_version,#2) and length(last(/RabbitMQ node by HTTP/rabbitmq.node.overview.rabbitmq_version))>0 |
INFO | Manual close: YES |
RabbitMQ: Number of network partitions is too high | https://www.rabbitmq.com/partitions.html#detecting |
min(/RabbitMQ node by HTTP/rabbitmq.node.partitions,5m)>0 |
WARNING | |
RabbitMQ: Node is not running | RabbitMQ node is not running |
max(/RabbitMQ node by HTTP/rabbitmq.node.running,5m)=0 |
AVERAGE | Depends on: - RabbitMQ: Service is down |
RabbitMQ: Memory alarm | https://www.rabbitmq.com/memory.html |
last(/RabbitMQ node by HTTP/rabbitmq.node.mem_alarm)=1 |
AVERAGE | |
RabbitMQ: Free disk space alarm | https://www.rabbitmq.com/disk-alarms.html |
last(/RabbitMQ node by HTTP/rabbitmq.node.disk_free_alarm)=1 |
AVERAGE | |
RabbitMQ: Host has been restarted | The host uptime is less than 10 minutes. |
last(/RabbitMQ node by HTTP/rabbitmq.node.uptime)<10m |
INFO | Manual close: YES |
RabbitMQ: Service is down | - |
last(/RabbitMQ node by HTTP/net.tcp.service["{$RABBITMQ.API.SCHEME}","{HOST.CONN}","{$RABBITMQ.API.PORT}"])=0 |
AVERAGE | Manual close: YES |
RabbitMQ: Service response time is too high | - |
min(/RabbitMQ node by HTTP/net.tcp.service.perf["{$RABBITMQ.API.SCHEME}","{HOST.CONN}","{$RABBITMQ.API.PORT}"],5m)>{$RABBITMQ.RESPONSE_TIME.MAX.WARN} |
WARNING | Manual close: YES Depends on: - RabbitMQ: Service is down |
RabbitMQ: There are active alarms in the node | This is the default API endpoint path: http://{HOST.CONN}:{$RABBITMQ.API.PORT}/api/index.html. |
last(/RabbitMQ node by HTTP/rabbitmq.healthcheck.local_alarms[{#SINGLETON}])=0 |
AVERAGE | |
RabbitMQ: There are valid TLS certificates expiring in the next month | This is the default API endpoint path: http://{HOST.CONN}:{$RABBITMQ.API.PORT}/api/index.html. |
last(/RabbitMQ node by HTTP/rabbitmq.healthcheck.certificate_expiration[{#SINGLETON}])=0 |
AVERAGE | |
RabbitMQ: There are virtual hosts that are not running | This is the default API endpoint path: http://{HOST.CONN}:{$RABBITMQ.API.PORT}/api/index.html. |
last(/RabbitMQ node by HTTP/rabbitmq.healthcheck.virtual_hosts[{#SINGLETON}])=0 |
AVERAGE | |
RabbitMQ: There are queues that could potentially lose data if this node goes offline. | This is the default API endpoint path: http://{HOST.CONN}:{$RABBITMQ.API.PORT}/api/index.html. |
last(/RabbitMQ node by HTTP/rabbitmq.healthcheck.mirror_sync[{#SINGLETON}])=0 |
AVERAGE | |
RabbitMQ: There are queues that would lose their quorum and availability if this node is shut down. | This is the default API endpoint path: http://{HOST.CONN}:{$RABBITMQ.API.PORT}/api/index.html. |
last(/RabbitMQ node by HTTP/rabbitmq.healthcheck.quorum[{#SINGLETON}])=0 |
AVERAGE | |
RabbitMQ: Node healthcheck failed | https://www.rabbitmq.com/monitoring.html#health-checks |
last(/RabbitMQ node by HTTP/rabbitmq.healthcheck[{#SINGLETON}])=0 |
AVERAGE | |
RabbitMQ: Too many messages in queue [{#VHOST}][{#QUEUE}] | - |
min(/RabbitMQ node by HTTP/rabbitmq.queue.messages["{#VHOST}/{#QUEUE}"],5m)>{$RABBITMQ.MESSAGES.MAX.WARN:"{#QUEUE}"} |
WARNING | |
RabbitMQ: Failed to fetch nodes data | Zabbix has not received data for items for the last 30 minutes. |
nodata(/RabbitMQ node by HTTP/rabbitmq.get_nodes,30m)=1 |
WARNING | Manual close: YES Depends on: - RabbitMQ: Service is down |
Please report any issues with the template at https://support.zabbix.com.
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums.
For Zabbix version: 6.2 and higher. This template is developed to monitor the messaging broker RabbitMQ by Zabbix that works without any external scripts.
Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The template RabbitMQ Cluster collects metrics by polling the RabbitMQ management plugin with Zabbix agent.
This template was tested on:
See Zabbix template operation for basic instructions.
Enable the RabbitMQ management plugin. See RabbitMQ documentation for the instructions.
Create a user to monitor the service:
rabbitmqctl add_user zbx_monitor <PASSWORD>
rabbitmqctl set_permissions -p / zbx_monitor "" "" ".*"
rabbitmqctl set_user_tags zbx_monitor monitoring
A login name and password are also set in macros:
If your cluster consists of several nodes, it is recommended to assign the cluster template to a separate balancing host.
In the case of a single-node installation, you can assign the cluster template to the same host as the node template.
If you use another API endpoint, then don't forget to change the {$RABBITMQ.API.CLUSTER_HOST} macro.
Install and set up Zabbix agent.
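With the agent running, you can test the exact key this template uses with zabbix_get from the Zabbix server or proxy. A minimal sketch, assuming the default macro values and an agent listening on 127.0.0.1:
zabbix_get -s 127.0.0.1 -k 'web.page.get["http://zbx_monitor:<PASSWORD>@127.0.0.1:15672/api/overview"]'
# The HTTP response headers followed by the JSON body of /api/overview should be printed.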
No specific Zabbix configuration is required.
Name | Description | Default |
---|---|---|
{$RABBITMQ.API.CLUSTER_HOST} | The hostname or an IP of the API endpoint for the RabbitMQ cluster. |
127.0.0.1 |
{$RABBITMQ.API.PASSWORD} | - |
zabbix |
{$RABBITMQ.API.PORT} | The port of the RabbitMQ API endpoint. |
15672 |
{$RABBITMQ.API.SCHEME} | The request scheme, which may be HTTP or HTTPS. |
http |
{$RABBITMQ.API.USER} | - |
zbx_monitor |
{$RABBITMQ.LLD.FILTER.EXCHANGE.MATCHES} | This macro is used in the discovery of exchanges. It can be overridden at host level or its linked template level. |
.* |
{$RABBITMQ.LLD.FILTER.EXCHANGE.NOT_MATCHES} | This macro is used in the discovery of exchanges. It can be overridden at host level or its linked template level. |
CHANGE_IF_NEEDED |
There are no template links in this template.
Name | Description | Type | Key and additional info |
---|---|---|---|
Exchanges discovery | The metrics for an individual exchange. |
DEPENDENT | rabbitmq.exchanges.discovery Filter: AND- {#EXCHANGE} MATCHES_REGEX - {#EXCHANGE} NOT_MATCHES_REGEX |
Health Check 3.8.10+ discovery | Specific metrics for versions 3.8.10 and newer. |
DEPENDENT | rabbitmq.healthcheck.v3810.discovery Preprocessing: - JSONPATH: - JAVASCRIPT: |
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
RabbitMQ | RabbitMQ: Connections total | The total number of connections. |
DEPENDENT | rabbitmq.overview.object_totals.connections Preprocessing: - JSONPATH: |
RabbitMQ | RabbitMQ: Channels total | The total number of channels. |
DEPENDENT | rabbitmq.overview.object_totals.channels Preprocessing: - JSONPATH: |
RabbitMQ | RabbitMQ: Queues total | The total number of queues. |
DEPENDENT | rabbitmq.overview.object_totals.queues Preprocessing: - JSONPATH: |
RabbitMQ | RabbitMQ: Consumers total | The total number of consumers. |
DEPENDENT | rabbitmq.overview.object_totals.consumers Preprocessing: - JSONPATH: |
RabbitMQ | RabbitMQ: Exchanges total | The total number of exchanges. |
DEPENDENT | rabbitmq.overview.object_totals.exchanges Preprocessing: - JSONPATH: |
RabbitMQ | RabbitMQ: Messages total | The total number of messages (ready, plus unacknowledged). |
DEPENDENT | rabbitmq.overview.queue_totals.messages Preprocessing: - JSONPATH: ⛔️ON_FAIL:CUSTOM_VALUE -> 0 |
RabbitMQ | RabbitMQ: Messages ready for delivery | The number of messages ready for delivery. |
DEPENDENT | rabbitmq.overview.queue_totals.messages.ready Preprocessing: - JSONPATH: ⛔️ON_FAIL:CUSTOM_VALUE -> 0 |
RabbitMQ | RabbitMQ: Messages unacknowledged | The number of unacknowledged messages. |
DEPENDENT | rabbitmq.overview.queue_totals.messages.unacknowledged Preprocessing: - JSONPATH: ⛔️ON_FAIL:CUSTOM_VALUE -> 0 |
RabbitMQ | RabbitMQ: Messages acknowledged | The number of messages delivered to clients and acknowledged. |
DEPENDENT | rabbitmq.overview.messages.ack Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
RabbitMQ | RabbitMQ: Messages acknowledged per second | The rate of messages (per second) delivered to clients and acknowledged. |
DEPENDENT | rabbitmq.overview.messages.ack.rate Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
RabbitMQ | RabbitMQ: Messages confirmed | The count of confirmed messages. |
DEPENDENT | rabbitmq.overview.messages.confirm Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
RabbitMQ | RabbitMQ: Messages confirmed per second | The rate of confirmed messages per second. |
DEPENDENT | rabbitmq.overview.messages.confirm.rate Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
RabbitMQ | RabbitMQ: Messages delivered | The sum of messages delivered to consumers: in acknowledgement mode and in no-acknowledgement mode; delivered to consumers in response to basic.get. |
DEPENDENT | rabbitmq.overview.messages.deliver_get Preprocessing: - JSONPATH: ⛔️ON_FAIL:CUSTOM_VALUE -> 0 |
RabbitMQ | RabbitMQ: Messages delivered per second | The rate of the sum of messages (per second) delivered to consumers: in acknowledgement mode and in no-acknowledgement mode; delivered to consumers in response to basic.get. |
DEPENDENT | rabbitmq.overview.messages.deliver_get.rate Preprocessing: - JSONPATH: ⛔️ON_FAIL:CUSTOM_VALUE -> 0 |
RabbitMQ | RabbitMQ: Messages published | The count of published messages. |
DEPENDENT | rabbitmq.overview.messages.publish Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
RabbitMQ | RabbitMQ: Messages published per second | The rate of published messages per second. |
DEPENDENT | rabbitmq.overview.messages.publish.rate Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
RabbitMQ | RabbitMQ: Messages publish_in | The count of messages published from the channels into this overview. |
DEPENDENT | rabbitmq.overview.messages.publish_in Preprocessing: - JSONPATH: ⛔️ON_FAIL:CUSTOM_VALUE -> 0 |
RabbitMQ | RabbitMQ: Messages publish_in per second | The rate of messages (per second) published from the channels into this overview. |
DEPENDENT | rabbitmq.overview.messages.publish_in.rate Preprocessing: - JSONPATH: ⛔️ON_FAIL:CUSTOM_VALUE -> 0 |
RabbitMQ | RabbitMQ: Messages publish_out | The count of messages published from this overview into queues. |
DEPENDENT | rabbitmq.overview.messages.publish_out Preprocessing: - JSONPATH: ⛔️ON_FAIL:CUSTOM_VALUE -> 0 |
RabbitMQ | RabbitMQ: Messages publish_out per second | The rate of messages (per second) published from this overview into queues. |
DEPENDENT | rabbitmq.overview.messages.publish_out.rate Preprocessing: - JSONPATH: ⛔️ON_FAIL:CUSTOM_VALUE -> 0 |
RabbitMQ | RabbitMQ: Messages returned unroutable | The count of messages returned to a publisher as unroutable. |
DEPENDENT | rabbitmq.overview.messages.return_unroutable Preprocessing: - JSONPATH: ⛔️ON_FAIL:CUSTOM_VALUE -> 0 |
RabbitMQ | RabbitMQ: Messages returned unroutable per second | The rate of messages (per second) returned to a publisher as unroutable. |
DEPENDENT | rabbitmq.overview.messages.return_unroutable.rate Preprocessing: - JSONPATH: ⛔️ON_FAIL:CUSTOM_VALUE -> 0 |
RabbitMQ | RabbitMQ: Messages returned redeliver | The count of the subset of messages in deliver_get which had the redelivered flag set. |
DEPENDENT | rabbitmq.overview.messages.redeliver Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
RabbitMQ | RabbitMQ: Messages returned redeliver per second | The rate of the subset of messages (per second) in deliver_get which had the redelivered flag set. |
DEPENDENT | rabbitmq.overview.messages.redeliver.rate Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
RabbitMQ | RabbitMQ: Healthcheck: alarms in effect in the cluster{#SINGLETON} | It responds with a status code 200 OK if there are no alarms in effect in the cluster; otherwise, it responds with a status code 503 Service Unavailable. |
ZABBIX_PASSIVE | web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.CLUSTER_HOST}:{$RABBITMQ.API.PORT}/api/health/checks/alarms{#SINGLETON}"] Preprocessing: - REGEX: - JAVASCRIPT: - DISCARD_UNCHANGED_HEARTBEAT:3h |
RabbitMQ | RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages acknowledged | The number of messages delivered to clients and acknowledged. |
DEPENDENT | rabbitmq.exchange.messages.ack["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
RabbitMQ | RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages acknowledged per second | The rate of messages (per second) delivered to clients and acknowledged. |
DEPENDENT | rabbitmq.exchange.messages.ack.rate["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
RabbitMQ | RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages confirmed | The count of confirmed messages. |
DEPENDENT | rabbitmq.exchange.messages.confirm["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
RabbitMQ | RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages confirmed per second | The rate of messages confirmed per second. |
DEPENDENT | rabbitmq.exchange.messages.confirm.rate["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
RabbitMQ | RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages delivered | The sum of messages delivered to consumers: in acknowledgement mode and in no-acknowledgement mode; delivered to consumers in response to the basic.get. |
DEPENDENT | rabbitmq.exchange.messages.deliver_get["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL:CUSTOM_VALUE -> 0 |
RabbitMQ | RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages delivered per second | The rate of the sum of messages (per second) delivered to consumers: in acknowledgement mode and in no-acknowledgement mode; delivered to consumers in response to the basic.get. |
DEPENDENT | rabbitmq.exchange.messages.deliver_get.rate["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL:CUSTOM_VALUE -> 0 |
RabbitMQ | RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages published | The count of published messages. |
DEPENDENT | rabbitmq.exchange.messages.publish["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
RabbitMQ | RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages published per second | The rate of messages published per second. |
DEPENDENT | rabbitmq.exchange.messages.publish.rate["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
RabbitMQ | RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages publish_in | The count of messages published from the channels into this overview. |
DEPENDENT | rabbitmq.exchange.messages.publish_in["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL:CUSTOM_VALUE -> 0 |
RabbitMQ | RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages publish_in per second | The rate of messages (per second) published from the channels into this overview. |
DEPENDENT | rabbitmq.exchange.messages.publish_in.rate["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL:CUSTOM_VALUE -> 0 |
RabbitMQ | RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages publish_out | The count of messages published from this overview into queues. |
DEPENDENT | rabbitmq.exchange.messages.publish_out["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL:CUSTOM_VALUE -> 0 |
RabbitMQ | RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages publish_out per second | The rate of messages (per second) published from this overview into queues. |
DEPENDENT | rabbitmq.exchange.messages.publish_out.rate["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL:CUSTOM_VALUE -> 0 |
RabbitMQ | RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages returned unroutable | The count of messages returned to a publisher as unroutable. |
DEPENDENT | rabbitmq.exchange.messages.return_unroutable["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL:CUSTOM_VALUE -> 0 |
RabbitMQ | RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages returned unroutable per second | The rate of messages (per second) returned to a publisher as unroutable. |
DEPENDENT | rabbitmq.exchange.messages.return_unroutable.rate["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL:CUSTOM_VALUE -> 0 |
RabbitMQ | RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages redelivered | The count of the subset of messages in deliver_get which had the redelivered flag set. |
DEPENDENT | rabbitmq.exchange.messages.redeliver["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
RabbitMQ | RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages redelivered per second | The rate of the subset of messages (per second) in deliver_get which had the redelivered flag set. |
DEPENDENT | rabbitmq.exchange.messages.redeliver.rate["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
Zabbix raw items | RabbitMQ: Get overview | The HTTP API endpoint that returns cluster-wide metrics. |
ZABBIX_PASSIVE | web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.CLUSTER_HOST}:{$RABBITMQ.API.PORT}/api/overview"] Preprocessing: - REGEX: |
Zabbix raw items | RabbitMQ: Get exchanges | The HTTP API endpoint that returns exchanges metrics. |
ZABBIX_PASSIVE | web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.CLUSTER_HOST}:{$RABBITMQ.API.PORT}/api/exchanges"] Preprocessing: - REGEX: |
Zabbix raw items | RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Get data | The HTTP API endpoint that returns [{#VHOST}][{#EXCHANGE}][{#TYPE}] exchanges metrics |
DEPENDENT | rabbitmq.get_exchanges["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing: - JSONPATH: |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
RabbitMQ: There are active alarms in the cluster | This is the default API endpoint path: http://{HOST.CONN}:{$RABBITMQ.API.PORT}/api/index.html. |
last(/RabbitMQ cluster by Zabbix agent/web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.CLUSTER_HOST}:{$RABBITMQ.API.PORT}/api/health/checks/alarms{#SINGLETON}"])=0 |
AVERAGE | |
RabbitMQ: Failed to fetch overview data | Zabbix has not received any data for items for the last 30 minutes. |
nodata(/RabbitMQ cluster by Zabbix agent/web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.CLUSTER_HOST}:{$RABBITMQ.API.PORT}/api/overview"],30m)=1 |
WARNING | Manual close: YES |
Please report any issues with the template at https://support.zabbix.com.
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums.
For Zabbix version: 6.2 and higher.
This template is developed to monitor RabbitMQ by Zabbix that works without any external scripts.
Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The template RabbitMQ Node (Zabbix version >= 4.2) collects metrics by polling the RabbitMQ management plugin with Zabbix agent.
It also uses Zabbix agent to collect RabbitMQ Linux process statistics, such as CPU usage, memory usage, and whether the process is running or not.
This template was tested on:
Enable the RabbitMQ management plugin. See RabbitMQ documentation for the instructions.
Create a user to monitor the service:
rabbitmqctl add_user zbx_monitor <PASSWORD>
rabbitmqctl set_permissions -p / zbx_monitor "" "" ".*"
rabbitmqctl set_user_tags zbx_monitor monitoring
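You can probe the health check endpoints used by this template manually before assigning it. A minimal sketch, assuming RabbitMQ 3.8.10 or newer with the management plugin on the default port of the local host:
curl -s -u zbx_monitor:<PASSWORD> http://127.0.0.1:15672/api/health/checks/local-alarms
# {"status":"ok"} is returned when no local alarms are in effect; the item's REGEX and
# JAVASCRIPT preprocessing steps reduce the response to 1/0 for the trigger.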
A login name and password are also set in macros:
If you use another API endpoint, then don't forget to change the {$RABBITMQ.API.HOST} macro.
Install and set up Zabbix agent.
No specific Zabbix configuration is required.
Name | Description | Default |
---|---|---|
{$RABBITMQ.API.HOST} | The hostname or an IP of the API endpoint for the RabbitMQ. |
127.0.0.1 |
{$RABBITMQ.API.PASSWORD} | - |
zabbix |
{$RABBITMQ.API.PORT} | The port of the RabbitMQ API endpoint. |
15672 |
{$RABBITMQ.API.SCHEME} | The request scheme, which may be HTTP or HTTPS. |
http |
{$RABBITMQ.API.USER} | - |
zbx_monitor |
{$RABBITMQ.CLUSTER.NAME} | The name of the RabbitMQ cluster. |
rabbit |
{$RABBITMQ.LLD.FILTER.QUEUE.MATCHES} | This macro is used in the discovery of queues. It can be overridden at host level or its linked template level. |
.* |
{$RABBITMQ.LLD.FILTER.QUEUE.NOT_MATCHES} | This macro is used in the discovery of queues. It can be overridden at host level or its linked template level. |
CHANGE_IF_NEEDED |
{$RABBITMQ.MESSAGES.MAX.WARN} | The maximum number of messages in the queue for a trigger expression. |
1000 |
{$RABBITMQ.PROCESS_NAME} | The name of the RabbitMQ server process. |
beam.smp |
{$RABBITMQ.RESPONSE_TIME.MAX.WARN} | The maximum response time by the RabbitMQ expressed in seconds for a trigger expression. |
10 |
There are no template links in this template.
Name | Description | Type | Key and additional info |
---|---|---|---|
Health Check 3.8.10+ discovery | Specific metrics for versions 3.8.10 and newer. |
DEPENDENT | rabbitmq.healthcheck.v3810.discovery Preprocessing: - JSONPATH: - JAVASCRIPT: |
Health Check 3.8.9- discovery | Specific metrics for versions up to and including 3.8.4. |
DEPENDENT | rabbitmq.healthcheck.v389.discovery Preprocessing: - JSONPATH: - JAVASCRIPT: |
Queues discovery | The metrics for an individual queue. |
DEPENDENT | rabbitmq.queues.discovery Filter: AND- {#QUEUE} MATCHES_REGEX - {#QUEUE} NOT_MATCHES_REGEX - {#NODE} MATCHES_REGEX |
RabbitMQ process discovery | The discovery of the RabbitMQ summary processes. |
DEPENDENT | rabbitmq.proc.discovery Filter: AND- {#NAME} MATCHES_REGEX |
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
RabbitMQ | RabbitMQ: Get nodes | The HTTP API endpoint that returns metrics from the nodes. |
ZABBIX_PASSIVE | web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/nodes/{$RABBITMQ.CLUSTER.NAME}@{HOST.NAME}?memory=true"] Preprocessing: - REGEX: |
RabbitMQ | RabbitMQ: Management plugin version | The version of the management plugin in use. |
DEPENDENT | rabbitmq.node.overview.management_version Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT:1d |
RabbitMQ | RabbitMQ: RabbitMQ version | The version of the RabbitMQ on the node, which processed this request. |
DEPENDENT | rabbitmq.node.overview.rabbitmq_version Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT:1d |
RabbitMQ | RabbitMQ: Used file descriptors | The number of file descriptors currently in use. |
DEPENDENT | rabbitmq.node.fd_used Preprocessing: - JSONPATH: |
RabbitMQ | RabbitMQ: Free disk space | The current free disk space. |
DEPENDENT | rabbitmq.node.disk_free Preprocessing: - JSONPATH: |
RabbitMQ | RabbitMQ: Memory used | The memory usage expressed in bytes. |
DEPENDENT | rabbitmq.node.mem_used Preprocessing: - JSONPATH: |
RabbitMQ | RabbitMQ: Memory limit | The memory usage with high watermark properties expressed in bytes. |
DEPENDENT | rabbitmq.node.mem_limit Preprocessing: - JSONPATH: |
RabbitMQ | RabbitMQ: Disk free limit | The free space limit of a disk expressed in bytes. |
DEPENDENT | rabbitmq.node.disk_free_limit Preprocessing: - JSONPATH: |
RabbitMQ | RabbitMQ: Runtime run queue | The average number of Erlang processes waiting to run. |
DEPENDENT | rabbitmq.node.run_queue Preprocessing: - JSONPATH: |
RabbitMQ | RabbitMQ: Sockets used | The number of file descriptors used as sockets. |
DEPENDENT | rabbitmq.node.sockets_used Preprocessing: - JSONPATH: |
RabbitMQ | RabbitMQ: Sockets available | The file descriptors available for use as sockets. |
DEPENDENT | rabbitmq.node.sockets_total Preprocessing: - JSONPATH: |
RabbitMQ | RabbitMQ: Number of network partitions | The number of network partitions this node "sees". |
DEPENDENT | rabbitmq.node.partitions Preprocessing: - JSONPATH: - JAVASCRIPT: |
RabbitMQ | RabbitMQ: Is running | Whether the node is running. |
DEPENDENT | rabbitmq.node.running Preprocessing: - JSONPATH: - BOOL_TO_DECIMAL |
RabbitMQ | RabbitMQ: Memory alarm | Whether the node has a memory alarm in effect. |
DEPENDENT | rabbitmq.node.mem_alarm Preprocessing: - JSONPATH: - BOOL_TO_DECIMAL |
RabbitMQ | RabbitMQ: Disk free alarm | Whether the node has a disk alarm in effect. |
DEPENDENT | rabbitmq.node.disk_free_alarm Preprocessing: - JSONPATH: - BOOL_TO_DECIMAL |
RabbitMQ | RabbitMQ: Uptime | Uptime expressed in milliseconds. |
DEPENDENT | rabbitmq.node.uptime Preprocessing: - JSONPATH: - MULTIPLIER: |
RabbitMQ | RabbitMQ: Get processes summary | The aggregated data of summary metrics for all processes. |
ZABBIX_PASSIVE | proc.get[,,,summary] |
RabbitMQ | RabbitMQ: Service ping | - |
ZABBIX_PASSIVE | net.tcp.service["{$RABBITMQ.API.SCHEME}","{$RABBITMQ.API.HOST}","{$RABBITMQ.API.PORT}"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
RabbitMQ | RabbitMQ: Service response time | - |
ZABBIX_PASSIVE | net.tcp.service.perf["{$RABBITMQ.API.SCHEME}","{$RABBITMQ.API.HOST}","{$RABBITMQ.API.PORT}"] |
RabbitMQ | RabbitMQ: Get process data | The summary metrics aggregated by a process {#NAME}. |
DEPENDENT | rabbitmq.proc.get[{#NAME}] Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
RabbitMQ | RabbitMQ: Number of running processes | The number of running processes {#NAME}. |
DEPENDENT | rabbitmq.proc.num[{#NAME}] Preprocessing: - JSONPATH: ⛔️ON_FAIL: - DISCARD_UNCHANGED_HEARTBEAT: |
RabbitMQ | RabbitMQ: Memory usage (rss) | The summary of resident set size memory used by a process {#NAME} expressed in bytes. |
DEPENDENT | rabbitmq.proc.rss[{#NAME}] Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
RabbitMQ | RabbitMQ: Memory usage (vsize) | The summary of virtual memory used by a process {#NAME} expressed in bytes. |
DEPENDENT | rabbitmq.proc.vmem[{#NAME}] Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
RabbitMQ | RabbitMQ: Memory usage, % | The percentage of real memory used by a process {#NAME}. |
DEPENDENT | rabbitmq.proc.pmem[{#NAME}] Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
RabbitMQ | RabbitMQ: CPU utilization | The percentage of the CPU utilization by a process {#NAME}. |
ZABBIX_PASSIVE | proc.cpu.util[{#NAME}] |
RabbitMQ | RabbitMQ: Healthcheck: local alarms in effect on this node{#SINGLETON} | It responds with a status code 200 OK if there are no local alarms in effect on the target node; otherwise, it responds with a status code 503 Service Unavailable. |
ZABBIX_PASSIVE | web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/health/checks/local-alarms{#SINGLETON}"] Preprocessing: - REGEX: - JAVASCRIPT: - DISCARD_UNCHANGED_HEARTBEAT: |
RabbitMQ | RabbitMQ: Healthcheck: expiration date on the certificates{#SINGLETON} | It checks the expiration date on the certificates for every listener configured to use the Transport Layer Security (TLS). It responds with a status code 200 OK if all certificates are valid (have not expired); otherwise, it responds with a status code 503 Service Unavailable. |
ZABBIX_PASSIVE | web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/health/checks/certificate-expiration/1/months{#SINGLETON}"] Preprocessing: - REGEX: - JAVASCRIPT: - DISCARD_UNCHANGED_HEARTBEAT: |
RabbitMQ | RabbitMQ: Healthcheck: virtual hosts on this node{#SINGLETON} | It responds with a status code 200 OK if all virtual hosts are running on the target node; otherwise, it responds with a status code 503 Service Unavailable. |
ZABBIX_PASSIVE | web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/health/checks/virtual-hosts{#SINGLETON}"] Preprocessing: - REGEX: - JAVASCRIPT: - DISCARD_UNCHANGED_HEARTBEAT: |
RabbitMQ | RabbitMQ: Healthcheck: classic mirrored queues without synchronized mirrors online{#SINGLETON} | It checks if there are classic mirrored queues without synchronized mirrors online (queues that would potentially lose data if the target node is shut down). It responds with a status code 200 OK if there are no such classic mirrored queues; otherwise, it responds with a status code 503 Service Unavailable. |
ZABBIX_PASSIVE | web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/health/checks/node-is-mirror-sync-critical{#SINGLETON}"] Preprocessing: - REGEX: - JAVASCRIPT: - DISCARD_UNCHANGED_HEARTBEAT: |
RabbitMQ | RabbitMQ: Healthcheck: queues with minimum online quorum{#SINGLETON} | It checks if there are quorum queues with minimum online quorum (queues that would lose their quorum and availability if the target node is shut down). It responds with a status code 200 OK if there are no such quorum queues; otherwise, it responds with a status code 503 Service Unavailable. |
ZABBIX_PASSIVE | web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/health/checks/node-is-quorum-critical{#SINGLETON}"] Preprocessing: - REGEX: - JAVASCRIPT: - DISCARD_UNCHANGED_HEARTBEAT: |
RabbitMQ | RabbitMQ: Healthcheck{#SINGLETON} | It checks whether the RabbitMQ application is running, whether channels and queues can be listed successfully, and whether no alarms are in effect. |
ZABBIX_PASSIVE | web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/healthchecks/node{#SINGLETON}"] Preprocessing: - REGEX: - JSONPATH: - BOOL_TO_DECIMAL ⛔️ON_FAIL: |
RabbitMQ | RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Get data | The HTTP API endpoint that returns [{#VHOST}][{#QUEUE}] queue metrics |
DEPENDENT | rabbitmq.get_exchanges["{#VHOST}/{#QUEUE}"] Preprocessing: - JSONPATH: |
RabbitMQ | RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages | The count of total messages in the queue. |
DEPENDENT | rabbitmq.queue.messages["{#VHOST}/{#QUEUE}"] Preprocessing: - JSONPATH: |
RabbitMQ | RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages per second | The count of total messages per second in the queue. |
DEPENDENT | rabbitmq.queue.messages.rate["{#VHOST}/{#QUEUE}"] Preprocessing: - JSONPATH: |
RabbitMQ | RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Consumers | The number of consumers. |
DEPENDENT | rabbitmq.queue.consumers["{#VHOST}/{#QUEUE}"] Preprocessing: - JSONPATH: |
RabbitMQ | RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Memory | The bytes of memory consumed by the Erlang process associated with the queue, including stack, heap and internal structures. |
DEPENDENT | rabbitmq.queue.memory["{#VHOST}/{#QUEUE}"] Preprocessing: - JSONPATH: |
RabbitMQ | RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages ready | The number of messages ready to be delivered to clients. |
DEPENDENT | rabbitmq.queue.messages_ready["{#VHOST}/{#QUEUE}"] Preprocessing: - JSONPATH: |
RabbitMQ | RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages ready per second | The number of messages per second ready to be delivered to clients. |
DEPENDENT | rabbitmq.queue.messages_ready.rate["{#VHOST}/{#QUEUE}"] Preprocessing: - JSONPATH: |
RabbitMQ | RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages unacknowledged | The number of messages delivered to clients but not yet acknowledged. |
DEPENDENT | rabbitmq.queue.messages_unacknowledged["{#VHOST}/{#QUEUE}"] Preprocessing: - JSONPATH: |
RabbitMQ | RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages unacknowledged per second | The number of messages per second delivered to clients but not yet acknowledged. |
DEPENDENT | rabbitmq.queue.messages_unacknowledged.rate["{#VHOST}/{#QUEUE}"] Preprocessing: - JSONPATH: |
RabbitMQ | RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages acknowledged | The number of messages delivered to clients and acknowledged. |
DEPENDENT | rabbitmq.queue.messages.ack["{#VHOST}/{#QUEUE}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
RabbitMQ | RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages acknowledged per second | The number of messages (per second) delivered to clients and acknowledged. |
DEPENDENT | rabbitmq.queue.messages.ack.rate["{#VHOST}/{#QUEUE}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
RabbitMQ | RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages delivered | The count of messages delivered to consumers in acknowledgement mode. |
DEPENDENT | rabbitmq.queue.messages.deliver["{#VHOST}/{#QUEUE}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
RabbitMQ | RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages delivered per second | The count of messages (per second) delivered to consumers in acknowledgement mode. |
DEPENDENT | rabbitmq.queue.messages.deliver.rate["{#VHOST}/{#QUEUE}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
RabbitMQ | RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Sum of messages delivered | The sum of messages delivered to consumers: in acknowledgement mode and in no-acknowledgement mode; delivered to consumers in response to basic.get: in acknowledgement mode and in no-acknowledgement mode. |
DEPENDENT | rabbitmq.queue.messages.deliver_get["{#VHOST}/{#QUEUE}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL: CUSTOM_VALUE -> 0 |
RabbitMQ | RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Sum of messages delivered per second | The rate of delivery per second. The sum of messages delivered (per second) to consumers: in acknowledgement mode and in no-acknowledgement mode; delivered to consumers in response to basic.get: in acknowledgement mode and in no-acknowledgement mode. |
DEPENDENT | rabbitmq.queue.messages.deliver_get.rate["{#VHOST}/{#QUEUE}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL: CUSTOM_VALUE -> 0 |
RabbitMQ | RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages published | The count of published messages. |
DEPENDENT | rabbitmq.queue.messages.publish["{#VHOST}/{#QUEUE}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
RabbitMQ | RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages published per second | The rate of published messages per second. |
DEPENDENT | rabbitmq.queue.messages.publish.rate["{#VHOST}/{#QUEUE}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
RabbitMQ | RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages redelivered | The count of the subset of messages in deliver_get that had the redelivered flag set. |
DEPENDENT | rabbitmq.queue.messages.redeliver["{#VHOST}/{#QUEUE}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
RabbitMQ | RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages redelivered per second | The rate of messages redelivered per second. |
DEPENDENT | rabbitmq.queue.messages.redeliver.rate["{#VHOST}/{#QUEUE}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
Zabbix raw items | RabbitMQ: Get node overview | The HTTP API endpoint that returns cluster-wide metrics. |
ZABBIX_PASSIVE | web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/overview"] Preprocessing: - REGEX: |
Zabbix raw items | RabbitMQ: Get queues | The HTTP API endpoint that returns queue metrics. |
ZABBIX_PASSIVE | web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/queues"] Preprocessing: - REGEX: |
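Before relying on the template, you can check by hand that the management API endpoints it polls are reachable. A minimal check with curl, assuming the management plugin listens on localhost:15672 with the default guest account (replace the host, port, and credentials with the values of your {$RABBITMQ.API.*} macros):
$ curl -s -u guest:guest http://localhost:15672/api/overview
$ curl -s -u guest:guest http://localhost:15672/api/queues
# Health check endpoints answer HTTP 200 when healthy and 503 otherwise:
$ curl -s -o /dev/null -w '%{http_code}\n' -u guest:guest http://localhost:15672/api/health/checks/local-alarms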
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
RabbitMQ: Version has changed | The RabbitMQ version has changed. Acknowledge (Ack) to close manually. |
last(/RabbitMQ node by Zabbix agent/rabbitmq.node.overview.rabbitmq_version,#1)<>last(/RabbitMQ node by Zabbix agent/rabbitmq.node.overview.rabbitmq_version,#2) and length(last(/RabbitMQ node by Zabbix agent/rabbitmq.node.overview.rabbitmq_version))>0 |
INFO | Manual close: YES |
RabbitMQ: Number of network partitions is too high | For more details see Detecting Network Partitions. |
min(/RabbitMQ node by Zabbix agent/rabbitmq.node.partitions,5m)>0 |
WARNING | |
RabbitMQ: Memory alarm | For more details see Memory Alarms. |
last(/RabbitMQ node by Zabbix agent/rabbitmq.node.mem_alarm)=1 |
AVERAGE | |
RabbitMQ: Free disk space alarm | For more details see Free Disk Space Alarms. |
last(/RabbitMQ node by Zabbix agent/rabbitmq.node.disk_free_alarm)=1 |
AVERAGE | |
RabbitMQ: Host has been restarted | The host uptime is less than 10 minutes. |
last(/RabbitMQ node by Zabbix agent/rabbitmq.node.uptime)<10m |
INFO | Manual close: YES |
RabbitMQ: Process is not running | - |
last(/RabbitMQ node by Zabbix agent/rabbitmq.proc.num[{#NAME}])=0 |
HIGH | |
RabbitMQ: Service is down | - |
last(/RabbitMQ node by Zabbix agent/net.tcp.service["{$RABBITMQ.API.SCHEME}","{$RABBITMQ.API.HOST}","{$RABBITMQ.API.PORT}"])=0 and last(/RabbitMQ node by Zabbix agent/rabbitmq.proc.num[{#NAME}])>0 |
AVERAGE | Manual close: YES |
RabbitMQ: Failed to fetch nodes data | Zabbix has not received any data for items for the last 30 minutes. |
nodata(/RabbitMQ node by Zabbix agent/web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/nodes/{$RABBITMQ.CLUSTER.NAME}@{HOST.NAME}?memory=true"],30m)=1 and last(/RabbitMQ node by Zabbix agent/rabbitmq.proc.num[{#NAME}])>0 |
WARNING | Manual close: YES Depends on: - RabbitMQ: Process is not running |
RabbitMQ: Node is not running | The RabbitMQ node is not running. |
max(/RabbitMQ node by Zabbix agent/rabbitmq.node.running,5m)=0 and last(/RabbitMQ node by Zabbix agent/rabbitmq.proc.num[{#NAME}])>0 |
AVERAGE | Depends on: - RabbitMQ: Service is down |
RabbitMQ: Service response time is too high | - |
min(/RabbitMQ node by Zabbix agent/net.tcp.service.perf["{$RABBITMQ.API.SCHEME}","{$RABBITMQ.API.HOST}","{$RABBITMQ.API.PORT}"],5m)>{$RABBITMQ.RESPONSE_TIME.MAX.WARN} and last(/RabbitMQ node by Zabbix agent/rabbitmq.proc.num[{#NAME}])>0 |
WARNING | Manual close: YES Depends on: - RabbitMQ: Service is down |
RabbitMQ: There are active alarms in the node | It checks the active alarms in the nodes via API. This is the default API endpoint path: http://{HOST.CONN}:{$RABBITMQ.API.PORT}/api/index.html. |
last(/RabbitMQ node by Zabbix agent/web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/health/checks/local-alarms{#SINGLETON}"])=0 |
AVERAGE | |
RabbitMQ: There are valid TLS certificates expiring in the next month | It checks if there are valid TLS certificates expiring in the next month. This is the default API endpoint path: http://{HOST.CONN}:{$RABBITMQ.API.PORT}/api/index.html. |
last(/RabbitMQ node by Zabbix agent/web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/health/checks/certificate-expiration/1/months{#SINGLETON}"])=0 |
AVERAGE | |
RabbitMQ: There are not running virtual hosts | It checks via API whether any virtual hosts are not running. This is the default API endpoint path: http://{HOST.CONN}:{$RABBITMQ.API.PORT}/api/index.html. |
last(/RabbitMQ node by Zabbix agent/web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/health/checks/virtual-hosts{#SINGLETON}"])=0 |
AVERAGE | |
RabbitMQ: There are queues that could potentially lose data if this node goes offline. | It checks via API whether there are queues that could potentially lose data if this node goes offline. This is the default API endpoint path: http://{HOST.CONN}:{$RABBITMQ.API.PORT}/api/index.html. |
last(/RabbitMQ node by Zabbix agent/web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/health/checks/node-is-mirror-sync-critical{#SINGLETON}"])=0 |
AVERAGE | |
RabbitMQ: There are queues that would lose their quorum and availability if this node is shut down. | It checks via API whether there are queues that would lose their quorum and availability if this node is shut down. This is the default API endpoint path: http://{HOST.CONN}:{$RABBITMQ.API.PORT}/api/index.html. |
last(/RabbitMQ node by Zabbix agent/web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/health/checks/node-is-quorum-critical{#SINGLETON}"])=0 |
AVERAGE | |
RabbitMQ: Node healthcheck failed | For more details see Health Checks. |
last(/RabbitMQ node by Zabbix agent/web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/healthchecks/node{#SINGLETON}"])=0 |
AVERAGE | |
RabbitMQ: Too many messages in queue [{#VHOST}][{#QUEUE}] | - |
min(/RabbitMQ node by Zabbix agent/rabbitmq.queue.messages["{#VHOST}/{#QUEUE}"],5m)>{$RABBITMQ.MESSAGES.MAX.WARN:"{#QUEUE}"} |
WARNING |
Please report any issues with the template at https://support.zabbix.com.
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums.
For Zabbix version: 6.2 and higher
Proxmox VE uses a REST-like API. The concept is described in Resource Oriented Architecture (ROA).
JSON is used as the primary data format, and the whole API is formally defined using JSON Schema.
You can explore the API documentation at http://pve.proxmox.com/pve-docs/api-viewer/index.html
Create an API token for the monitoring user. Important note: for security reasons, it is recommended to create a separate user (Datacenter - Permissions).
For the created API token and user, provide the necessary access levels:
Check: ["perm","/",["Sys.Audit"]]
Check: ["perm","/nodes/{node}",["Sys.Audit"]]
Check: ["perm","/vms/{vmid}",["VM.Audit"]]
Copy the resulting Token ID and Secret into host macros.
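For example, a dedicated monitoring user and token can be created from the PVE shell; the user and token names below are placeholders, and the exact pveum option names may differ between PVE versions. The built-in PVEAuditor role covers the Sys.Audit and VM.Audit permissions listed above:
$ pveum user add monitor@pve --comment "Zabbix monitoring"
$ pveum acl modify / --users monitor@pve --roles PVEAuditor
$ pveum user token add monitor@pve zabbix --privsep 0
You can then verify the token before filling in the host macros (replace the host address and secret):
$ curl -k -H 'Authorization: PVEAPIToken=monitor@pve!zabbix=<secret>' https://192.0.2.10:8006/api2/json/cluster/resources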
No specific Zabbix configuration is required.
Name | Description | Default |
---|---|---|
{$PVE.CPU.PUSE.MAX.WARN} | Maximum used CPU in percentage. |
90 |
{$PVE.LXC.CPU.PUSE.MAX.WARN} | Maximum used CPU in percentage. |
90 |
{$PVE.LXC.MEMORY.PUSE.MAX.WARN} | Maximum used memory in percentage. |
90 |
{$PVE.MEMORY.PUSE.MAX.WARN} | Maximum used memory in percentage. |
90 |
{$PVE.ROOT.PUSE.MAX.WARN} | Maximum used root space in percentage. |
90 |
{$PVE.STORAGE.PUSE.MAX.WARN} | Maximum used storage space in percentage. |
90 |
{$PVE.SWAP.PUSE.MAX.WARN} | Maximum used swap space in percentage. |
90 |
{$PVE.TOKEN.ID} | API tokens allow stateless access to most parts of the REST API by another system, software or API client. |
USER@REALM!TOKENID |
{$PVE.TOKEN.SECRET} | Secret key. |
xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx |
{$PVE.URL.PORT} | The API uses the HTTPS protocol and the server listens to port 8006 by default. |
8006 |
{$PVE.VM.CPU.PUSE.MAX.WARN} | Maximum used CPU in percentage. |
90 |
{$PVE.VM.MEMORY.PUSE.MAX.WARN} | Maximum used memory in percentage. |
90 |
There are no template links in this template.
Name | Description | Type | Key and additional info |
---|---|---|---|
Cluster discovery | - |
DEPENDENT | proxmox.cluster.discovery Filter: AND- {#RESOURCE.TYPE} MATCHES_REGEX |
LXC discovery | - |
DEPENDENT | proxmox.lxc.discovery Filter: AND- {#RESOURCE.TYPE} MATCHES_REGEX |
Node discovery | - |
DEPENDENT | proxmox.node.discovery Filter: AND- {#RESOURCE.TYPE} MATCHES_REGEX |
QEMU discovery | - |
DEPENDENT | proxmox.qemu.discovery Filter: AND- {#RESOURCE.TYPE} MATCHES_REGEX |
Storage discovery | - |
DEPENDENT | proxmox.storage.discovery Filter: AND- {#RESOURCE.TYPE} MATCHES_REGEX |
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
CPU | Proxmox: Node [{#NODE.NAME}]: CPU, usage | CPU usage. |
DEPENDENT | proxmox.node.cpu[{#NODE.NAME}] Preprocessing: - JSONPATH: - MULTIPLIER: - DISCARDUNCHANGEDHEARTBEAT: |
CPU | Proxmox: Node [{#NODE.NAME}]: CPU, loadavg | CPU average load. |
DEPENDENT | proxmox.node.loadavg[{#NODE.NAME}] Preprocessing: - JSONPATH: - MULTIPLIER: - DISCARDUNCHANGEDHEARTBEAT: |
CPU | Proxmox: Node [{#NODE.NAME}]: CPU, iowait | CPU iowait time. |
DEPENDENT | proxmox.node.iowait[{#NODE.NAME}] Preprocessing: - JSONPATH: - MULTIPLIER: - DISCARDUNCHANGEDHEARTBEAT: |
CPU | Proxmox: VM [{#NODE.NAME}/{#QEMU.NAME} ({#QEMU.ID})]: CPU usage | CPU load. |
DEPENDENT | proxmox.qemu.cpu[{#QEMU.ID}] Preprocessing: - JSONPATH: - MULTIPLIER: - DISCARDUNCHANGEDHEARTBEAT: |
CPU | Proxmox: LXC [{#NODE.NAME}/{#LXC.NAME} ({#LXC.ID})]: CPU usage | CPU load. |
DEPENDENT | proxmox.lxc.cpu[{#LXC.ID}] Preprocessing: - JSONPATH: - MULTIPLIER: - DISCARDUNCHANGEDHEARTBEAT: |
General | Proxmox: Node [{#NODE.NAME}]: Time zone | Time zone. |
DEPENDENT | proxmox.node.timezone[{#NODE.NAME}] Preprocessing: - JSONPATH: - DISCARDUNCHANGEDHEARTBEAT: |
General | Proxmox: Node [{#NODE.NAME}]: Localtime | Seconds since 1970-01-01 00:00:00 (local time). |
DEPENDENT | proxmox.node.localtime[{#NODE.NAME}] Preprocessing: - JSONPATH: |
General | Proxmox: Node [{#NODE.NAME}]: Time | Seconds since 1970-01-01 00:00:00 UTC. |
DEPENDENT | proxmox.node.utctime[{#NODE.NAME}] Preprocessing: - JSONPATH: |
Inventory | Proxmox: Node [{#NODE.NAME}]: PVE version | PVE manager version. |
DEPENDENT | proxmox.node.pveversion[{#NODE.NAME}] Preprocessing: - JSONPATH: - DISCARDUNCHANGEDHEARTBEAT: |
Inventory | Proxmox: Node [{#NODE.NAME}]: Kernel version | Kernel version info. |
DEPENDENT | proxmox.node.kernelversion[{#NODE.NAME}] Preprocessing: - JSONPATH: - DISCARDUNCHANGEDHEARTBEAT: |
Memory | Proxmox: Node [{#NODE.NAME}]: Memory, used | Memory usage. |
DEPENDENT | proxmox.node.memused[{#NODE.NAME}] Preprocessing: - JSONPATH: - DISCARDUNCHANGEDHEARTBEAT: |
Memory | Proxmox: Node [{#NODE.NAME}]: Memory, total | Memory total. |
DEPENDENT | proxmox.node.memtotal[{#NODE.NAME}] Preprocessing: - JSONPATH: - DISCARDUNCHANGEDHEARTBEAT: |
Memory | Proxmox: VM [{#NODE.NAME}/{#QEMU.NAME} ({#QEMU.ID})]: Memory usage | Used memory in Bytes. |
DEPENDENT | proxmox.qemu.mem[{#QEMU.ID}] Preprocessing: - JSONPATH: - DISCARDUNCHANGEDHEARTBEAT: |
Memory | Proxmox: VM [{#NODE.NAME}/{#QEMU.NAME} ({#QEMU.ID})]: Memory total | Total memory in Bytes. |
DEPENDENT | proxmox.qemu.maxmem[{#QEMU.ID}] Preprocessing: - JSONPATH: - DISCARDUNCHANGEDHEARTBEAT: |
Memory | Proxmox: LXC [{#NODE.NAME}/{#LXC.NAME} ({#LXC.ID})]: Memory usage | Used memory in Bytes. |
DEPENDENT | proxmox.lxc.mem[{#LXC.ID}] Preprocessing: - JSONPATH: - DISCARDUNCHANGEDHEARTBEAT: |
Memory | Proxmox: LXC [{#NODE.NAME}/{#LXC.NAME} ({#LXC.ID})]: Memory total | Total memory in Bytes. |
DEPENDENT | proxmox.lxc.maxmem[{#LXC.ID}] Preprocessing: - JSONPATH: - DISCARDUNCHANGEDHEARTBEAT: |
Network interfaces | Proxmox: Node [{#NODE.NAME}]: Outgoing data, rate | Network usage. |
DEPENDENT | proxmox.node.netout[{#NODE.NAME}] Preprocessing: - JSONPATH: - MULTIPLIER: - DISCARDUNCHANGEDHEARTBEAT: |
Network interfaces | Proxmox: Node [{#NODE.NAME}]: Incoming data, rate | Network usage. |
DEPENDENT | proxmox.node.netin[{#NODE.NAME}] Preprocessing: - JSONPATH: - MULTIPLIER: - DISCARDUNCHANGEDHEARTBEAT: |
Network interfaces | Proxmox: VM [{#NODE.NAME}/{#QEMU.NAME} ({#QEMU.ID})]: Incoming data, rate | Incoming data rate. |
DEPENDENT | proxmox.qemu.netin[{#QEMU.ID}] Preprocessing: - JSONPATH: - CHANGEPERSECOND - MULTIPLIER: - DISCARDUNCHANGEDHEARTBEAT: |
Network interfaces | Proxmox: VM [{#NODE.NAME}/{#QEMU.NAME} ({#QEMU.ID})]: Outgoing data, rate | Outgoing data rate. |
DEPENDENT | proxmox.qemu.netout[{#QEMU.ID}] Preprocessing: - JSONPATH: - CHANGEPERSECOND - MULTIPLIER: - DISCARDUNCHANGEDHEARTBEAT: |
Network interfaces | Proxmox: LXC [{#NODE.NAME}/{#LXC.NAME} ({#LXC.ID})]: Incoming data, rate | Incoming data rate. |
DEPENDENT | proxmox.lxc.netin[{#LXC.ID}] Preprocessing: - JSONPATH: - CHANGEPERSECOND - MULTIPLIER: - DISCARDUNCHANGEDHEARTBEAT: |
Network interfaces | Proxmox: LXC [{#NODE.NAME}/{#LXC.NAME} ({#LXC.ID})]: Outgoing data, rate | Outgoing data rate. |
DEPENDENT | proxmox.lxc.netout[{#LXC.ID}] Preprocessing: - JSONPATH: - CHANGEPERSECOND - MULTIPLIER: - DISCARDUNCHANGEDHEARTBEAT: |
Status | Proxmox: API service status | Get API service status. |
SCRIPT | proxmox.api.available Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: Expression: The text is too long. Please see the template. |
Status | Proxmox: Cluster [{#RESOURCE.NAME}]: Quorate | Indicates if there is a majority of nodes online to make decisions. |
DEPENDENT | proxmox.cluster.quorate[{#RESOURCE.NAME}] Preprocessing: - JSONPATH: - DISCARDUNCHANGEDHEARTBEAT: |
Status | Proxmox: Node [{#NODE.NAME}]: Status | Indicates if the node is online or offline. |
DEPENDENT | proxmox.node.online[{#NODE.NAME}] Preprocessing: - JSONPATH: - DISCARDUNCHANGEDHEARTBEAT: |
Status | Proxmox: Node [{#NODE.NAME}]: Uptime | System uptime in 'N days, hh:mm:ss' format. |
DEPENDENT | proxmox.node.uptime[{#NODE.NAME}] Preprocessing: - JSONPATH: |
Status | Proxmox: VM [{#NODE.NAME}/{#QEMU.NAME} ({#QEMU.ID})]: Uptime | System uptime in 'N days, hh:mm:ss' format. |
DEPENDENT | proxmox.qemu.uptime[{#QEMU.ID}] Preprocessing: - JSONPATH: |
Status | Proxmox: VM [{#NODE.NAME}/{#QEMU.NAME} ({#QEMU.ID})]: Status | - |
DEPENDENT | proxmox.qemu.vmstatus[{#QEMU.ID}] Preprocessing: - JSONPATH: |
Status | Proxmox: LXC [{#NODE.NAME}/{#LXC.NAME} ({#LXC.ID})]: Uptime | System uptime in 'N days, hh:mm:ss' format. |
DEPENDENT | proxmox.lxc.uptime[{#LXC.ID}] Preprocessing: - JSONPATH: |
Status | Proxmox: LXC [{#NODE.NAME}/{#LXC.NAME} ({#LXC.ID})]: Status | - |
DEPENDENT | proxmox.lxc.vmstatus[{#LXC.ID}] Preprocessing: - JSONPATH: |
Storage | Proxmox: Node [{#NODE.NAME}]: Root filesystem, used | Root filesystem usage. |
DEPENDENT | proxmox.node.rootused[{#NODE.NAME}] Preprocessing: - JSONPATH: - DISCARDUNCHANGEDHEARTBEAT: |
Storage | Proxmox: Node [{#NODE.NAME}]: Root filesystem, total | Root filesystem total. |
DEPENDENT | proxmox.node.roottotal[{#NODE.NAME}] Preprocessing: - JSONPATH: - DISCARDUNCHANGEDHEARTBEAT: |
Storage | Proxmox: Node [{#NODE.NAME}]: Swap filesystem, total | Swap total. |
DEPENDENT | proxmox.node.swaptotal[{#NODE.NAME}] Preprocessing: - JSONPATH: - DISCARDUNCHANGEDHEARTBEAT: |
Storage | Proxmox: Node [{#NODE.NAME}]: Swap filesystem, used | Swap used. |
DEPENDENT | proxmox.node.swapused[{#NODE.NAME}] Preprocessing: - JSONPATH: - DISCARDUNCHANGEDHEARTBEAT: |
Storage | Proxmox: Storage [{#NODE.NAME}/{#STORAGE.NAME}]: Type | More specific type, if available. |
DEPENDENT | proxmox.node.plugintype[{#NODE.NAME},{#STORAGE.NAME}] Preprocessing: - JSONPATH: - DISCARDUNCHANGEDHEARTBEAT: |
Storage | Proxmox: Storage [{#NODE.NAME}/{#STORAGE.NAME}]: Size | Storage size in bytes. |
DEPENDENT | proxmox.node.maxdisk[{#NODE.NAME},{#STORAGE.NAME}] Preprocessing: - JSONPATH: - DISCARDUNCHANGEDHEARTBEAT: |
Storage | Proxmox: Storage [{#NODE.NAME}/{#STORAGE.NAME}]: Content | Allowed storage content types. |
DEPENDENT | proxmox.node.content[{#NODE.NAME},{#STORAGE.NAME}] Preprocessing: - JSONPATH: - DISCARDUNCHANGEDHEARTBEAT: |
Storage | Proxmox: Storage [{#NODE.NAME}/{#STORAGE.NAME}]: Used | Used disk space in bytes. |
DEPENDENT | proxmox.node.disk[{#NODE.NAME},{#STORAGE.NAME}] Preprocessing: - JSONPATH: - DISCARDUNCHANGEDHEARTBEAT: |
Storage | Proxmox: VM [{#NODE.NAME}/{#QEMU.NAME} ({#QEMU.ID})]: Disk write, rate | Disk write. |
DEPENDENT | proxmox.qemu.diskwrite[{#QEMU.ID}] Preprocessing: - JSONPATH: - CHANGEPERSECOND - DISCARDUNCHANGEDHEARTBEAT: |
Storage | Proxmox: VM [{#NODE.NAME}/{#QEMU.NAME} ({#QEMU.ID})]: Disk read, rate | Disk read. |
DEPENDENT | proxmox.qemu.diskread[{#QEMU.ID}] Preprocessing: - JSONPATH: - CHANGEPERSECOND - DISCARDUNCHANGEDHEARTBEAT: |
Storage | Proxmox: LXC [{#NODE.NAME}/{#LXC.NAME} ({#LXC.ID})]: Disk write, rate | Disk write. |
DEPENDENT | proxmox.lxc.diskwrite[{#LXC.ID}] Preprocessing: - JSONPATH: - CHANGEPERSECOND - DISCARDUNCHANGEDHEARTBEAT: |
Storage | Proxmox: LXC [{#NODE.NAME}/{#LXC.NAME} ({#LXC.ID})]: Disk read, rate | Disk read. |
DEPENDENT | proxmox.lxc.diskread[{#LXC.ID}] Preprocessing: - JSONPATH: - CHANGEPERSECOND - DISCARDUNCHANGEDHEARTBEAT: |
Zabbix raw items | Proxmox: Get cluster resources | Resources index. |
HTTP_AGENT | proxmox.cluster.resources Preprocessing: - CHECKNOTSUPPORTED ⛔️ON_FAIL: |
Zabbix raw items | Proxmox: Get cluster status | Get cluster status information. |
HTTP_AGENT | proxmox.cluster.status Preprocessing: - CHECKNOTSUPPORTED ⛔️ON_FAIL: |
Zabbix raw items | Proxmox: Node [{#NODE.NAME}]: Status | Read node status. |
HTTP_AGENT | proxmox.node.status[{#NODE.NAME}] |
Zabbix raw items | Proxmox: Node [{#NODE.NAME}]: RRD statistics | Read node RRD statistics. |
HTTP_AGENT | proxmox.node.rrd[{#NODE.NAME}] Preprocessing: - JAVASCRIPT: |
Zabbix raw items | Proxmox: Node [{#NODE.NAME}]: Time | Read server time and time zone settings. |
HTTP_AGENT | proxmox.node.time[{#NODE.NAME}] |
Zabbix raw items | Proxmox: VM [{#NODE.NAME}/{#QEMU.NAME}]: Status | Read VM status. |
HTTP_AGENT | proxmox.qemu.status[{#QEMU.ID}] |
Zabbix raw items | Proxmox: LXC [{#NODE.NAME}/{#LXC.NAME}]: Status | Read LXC status. |
HTTP_AGENT | proxmox.lxc.status[{#LXC.ID}] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Proxmox: Node [{#NODE.NAME}] high CPU usage | CPU usage. |
min(/Proxmox VE by HTTP/proxmox.node.cpu[{#NODE.NAME}],5m) > {$PVE.CPU.PUSE.MAX.WARN:"{#NODE.NAME}"} |
WARNING | |
Proxmox: VM [{#NODE.NAME}/{#QEMU.NAME} ({#QEMU.ID})] high CPU usage | CPU usage. |
min(/Proxmox VE by HTTP/proxmox.qemu.cpu[{#QEMU.ID}],5m) > {$PVE.VM.CPU.PUSE.MAX.WARN:"{#QEMU.ID}"} |
WARNING | |
Proxmox: LXC [{#NODE.NAME}/{#LXC.NAME} ({#LXC.ID})] high CPU usage | CPU usage. |
min(/Proxmox VE by HTTP/proxmox.lxc.cpu[{#LXC.ID}],5m) > {$PVE.LXC.CPU.PUSE.MAX.WARN:"{#LXC.ID}"} |
WARNING | |
Proxmox: Node [{#NODE.NAME}]: PVE manager has changed | The PVE manager version has changed. Ack to close. |
last(/Proxmox VE by HTTP/proxmox.node.pveversion[{#NODE.NAME}],#1)<>last(/Proxmox VE by HTTP/proxmox.node.pveversion[{#NODE.NAME}],#2) and length(last(/Proxmox VE by HTTP/proxmox.node.pveversion[{#NODE.NAME}]))>0 |
INFO | Manual close: YES |
Proxmox: Node [{#NODE.NAME}]: Kernel version has changed | The kernel version has changed. Ack to close. |
last(/Proxmox VE by HTTP/proxmox.node.kernelversion[{#NODE.NAME}],#1)<>last(/Proxmox VE by HTTP/proxmox.node.kernelversion[{#NODE.NAME}],#2) and length(last(/Proxmox VE by HTTP/proxmox.node.kernelversion[{#NODE.NAME}]))>0 |
INFO | Manual close: YES |
Proxmox: Node [{#NODE.NAME}] high memory usage | Memory usage. |
min(/Proxmox VE by HTTP/proxmox.node.memused[{#NODE.NAME}],5m) / last(/Proxmox VE by HTTP/proxmox.node.memtotal[{#NODE.NAME}]) * 100 >{$PVE.MEMORY.PUSE.MAX.WARN:"{#NODE.NAME}"} |
WARNING | |
Proxmox: VM [{#NODE.NAME}/{#QEMU.NAME} ({#QEMU.ID})] high memory usage | Memory usage. |
min(/Proxmox VE by HTTP/proxmox.qemu.mem[{#QEMU.ID}],5m) / last(/Proxmox VE by HTTP/proxmox.qemu.maxmem[{#QEMU.ID}]) * 100 >{$PVE.VM.MEMORY.PUSE.MAX.WARN:"{#QEMU.ID}"} |
WARNING | |
Proxmox: LXC [{#NODE.NAME}/{#LXC.NAME} ({#LXC.ID})] high memory usage | Memory usage. |
min(/Proxmox VE by HTTP/proxmox.lxc.mem[{#LXC.ID}],5m) / last(/Proxmox VE by HTTP/proxmox.lxc.maxmem[{#LXC.ID}]) * 100 >{$PVE.LXC.MEMORY.PUSE.MAX.WARN:"{#LXC.ID}"} |
WARNING | |
Proxmox: API service not available | The API service is not available. Check your network and authorization settings. |
last(/Proxmox VE by HTTP/proxmox.api.available) <> 200 |
HIGH | |
Proxmox: Cluster [{#RESOURCE.NAME}] not quorum | Proxmox VE uses a quorum-based technique to provide a consistent state among all cluster nodes. |
last(/Proxmox VE by HTTP/proxmox.cluster.quorate[{#RESOURCE.NAME}]) <> 1 |
HIGH | |
Proxmox: Node [{#NODE.NAME}] offline | Node offline. |
last(/Proxmox VE by HTTP/proxmox.node.online[{#NODE.NAME}]) <> 1 |
HIGH | |
Proxmox: Node [{#NODE.NAME}]: has been restarted | Uptime is less than 10 minutes. |
last(/Proxmox VE by HTTP/proxmox.node.uptime[{#NODE.NAME}])<10m |
INFO | Manual close: YES Depends on: - Proxmox: Node [{#NODE.NAME}] offline |
Proxmox: VM [{#NODE.NAME}/{#QEMU.NAME}]: has been restarted | Uptime is less than 10 minutes. |
last(/Proxmox VE by HTTP/proxmox.qemu.uptime[{#QEMU.ID}])<10m |
INFO | Manual close: YES Depends on: - Proxmox: VM [{#NODE.NAME}/{#QEMU.NAME} ({#QEMU.ID})]: Not running |
Proxmox: VM [{#NODE.NAME}/{#QEMU.NAME} ({#QEMU.ID})]: Not running | VM state is not "running". |
last(/Proxmox VE by HTTP/proxmox.qemu.vmstatus[{#QEMU.ID}])<>"running" |
AVERAGE | |
Proxmox: LXC [{#NODE.NAME}/{#LXC.NAME}]: has been restarted | Uptime is less than 10 minutes. |
last(/Proxmox VE by HTTP/proxmox.lxc.uptime[{#LXC.ID}])<10m |
INFO | Manual close: YES Depends on: - Proxmox: LXC [{#NODE.NAME}/{#LXC.NAME} ({#LXC.ID})]: Not running |
Proxmox: LXC [{#NODE.NAME}/{#LXC.NAME} ({#LXC.ID})]: Not running | LXC state is not "running". |
last(/Proxmox VE by HTTP/proxmox.lxc.vmstatus[{#LXC.ID}])<>"running" |
AVERAGE | |
Proxmox: Node [{#NODE.NAME}] high root filesystem space usage | Root filesystem space usage. |
min(/Proxmox VE by HTTP/proxmox.node.rootused[{#NODE.NAME}],5m) / last(/Proxmox VE by HTTP/proxmox.node.roottotal[{#NODE.NAME}]) * 100 >{$PVE.ROOT.PUSE.MAX.WARN:"{#NODE.NAME}"} |
WARNING | |
Proxmox: Node [{#NODE.NAME}] high swap space usage | This trigger is ignored if there is no swap configured. |
min(/Proxmox VE by HTTP/proxmox.node.swapused[{#NODE.NAME}],5m) / last(/Proxmox VE by HTTP/proxmox.node.swaptotal[{#NODE.NAME}]) * 100 > {$PVE.SWAP.PUSE.MAX.WARN:"{#NODE.NAME}"} and last(/Proxmox VE by HTTP/proxmox.node.swaptotal[{#NODE.NAME}]) > 0 |
WARNING | |
Proxmox: Storage [{#NODE.NAME}/{#STORAGE.NAME}] high filesystem space usage | Storage space usage. |
min(/Proxmox VE by HTTP/proxmox.node.disk[{#NODE.NAME},{#STORAGE.NAME}],5m) / last(/Proxmox VE by HTTP/proxmox.node.maxdisk[{#NODE.NAME},{#STORAGE.NAME}]) * 100 >{$PVE.STORAGE.PUSE.MAX.WARN:"{#NODE.NAME}/{#STORAGE.NAME}"} |
WARNING |
Please report any issues with the template at https://support.zabbix.com
For Zabbix version: 6.2 and higher
Refer to the vendor documentation.
No specific Zabbix configuration is required.
There are no template links in this template.
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
Services | POP service is running | - |
SIMPLE | net.tcp.service[pop] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
POP service is down on {HOST.NAME} | - |
max(/POP Service/net.tcp.service[pop],#3)=0 |
AVERAGE |
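The net.tcp.service[pop] simple check is executed by the Zabbix server or proxy: it opens a TCP connection to port 110 and expects a +OK POP3 greeting. You can reproduce the check by hand (the hostname is a placeholder, and the exact greeting text varies by server):
$ nc mail.example.com 110
+OK POP3 server ready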
Please report any issues with the template at https://support.zabbix.com
For Zabbix version: 6.2 and higher
The template to monitor PHP-FPM by Zabbix that works without any external scripts.
Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The template PHP-FPM by HTTP collects metrics by polling the PHP-FPM status page remotely with the HTTP agent.
Note that this solution supports HTTPS and redirects.
This template was tested on:
See Zabbix template operation for basic instructions.
Open the php-fpm configuration file and enable the status page as shown:
pm.status_path = /status
ping.path = /ping
Check the syntax:
$ php-fpm7 -t
Reload the php-fpm service to make the change active:
$ systemctl reload php-fpm
# Enable php-fpm status page
location ~ ^/(status|ping)$ {
## disable access logging for request if you prefer
access_log off;
## Only allow trusted IPs for security, deny everyone else
# allow 127.0.0.1;
# allow 1.2.3.4; # your IP here
# deny all;
fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
fastcgi_index index.php;
include fastcgi_params;
## Now the port or socket of the php-fpm pool we want the status of
fastcgi_pass 127.0.0.1:9000;
# fastcgi_pass unix:/run/php-fpm/your_socket.sock;
}
Check the syntax:
$ nginx -t
Reload Nginx:
$ systemctl reload nginx
Verify:
curl -L 127.0.0.1/status
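The HTTP template parses the status page as JSON, so it is worth checking the JSON variant as well. The output below is only an illustration; your pool name and counter values will differ:
$ curl -L '127.0.0.1/status?json'
{"pool":"www","process manager":"dynamic","idle processes":3,"active processes":1,"total processes":4,"listen queue":0,"listen queue len":128,"max children reached":0,"slow requests":0}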
If you use another location of the status/ping page, don't forget to change the {$PHP_FPM.STATUS.PAGE}/{$PHP_FPM.PING.PAGE} macros.
If you use an atypical location for the PHP-FPM status page, don't forget to change the macros {$PHP_FPM.SCHEME} and {$PHP_FPM.PORT}.
No specific Zabbix configuration is required.
Name | Description | Default |
---|---|---|
{$PHP_FPM.HOST} | Hostname or IP of PHP-FPM status host or container. |
localhost |
{$PHP_FPM.PING.PAGE} | The path of PHP-FPM ping page. |
ping |
{$PHP_FPM.PING.REPLY} | Expected reply to the ping. |
pong |
{$PHP_FPM.PORT} | The port of PHP-FPM status host or container. |
80 |
{$PHP_FPM.QUEUE.WARN.MAX} | The maximum PHP-FPM queue usage percent for trigger expression. |
80 |
{$PHP_FPM.SCHEME} | Request scheme which may be http or https |
http |
{$PHP_FPM.STATUS.PAGE} | The path of PHP-FPM status page. |
status |
There are no template links in this template.
Group | Name | Description | Type | Key and additional info | |
---|---|---|---|---|---|
PHP-FPM | PHP-FPM: Ping | - |
DEPENDENT | php-fpm.ping Preprocessing: - REGEX: `{$PHP_FPM.PING.REPLY}($|\r?\n)` -> 1 ⛔️ON_FAIL: CUSTOM_VALUE -> 0 |
PHP-FPM | PHP-FPM: Processes, active | The total number of active processes. |
DEPENDENT | php-fpm.processes_active Preprocessing: - JSONPATH: |
|
PHP-FPM | PHP-FPM: Version | The current version of PHP, taken from the HTTP header "X-Powered-By"; it may not work if you have changed the default HTTP headers. |
DEPENDENT | php-fpm.version Preprocessing: - REGEX: ⛔️ON_FAIL: - DISCARD_UNCHANGED_HEARTBEAT: |
|
PHP-FPM | PHP-FPM: Pool name | The name of current pool. |
DEPENDENT | php-fpm.name Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
|
PHP-FPM | PHP-FPM: Uptime | How long this pool has been running. |
DEPENDENT | php-fpm.uptime Preprocessing: - JSONPATH: |
|
PHP-FPM | PHP-FPM: Start time | The time when this pool was started. |
DEPENDENT | php-fpm.start_time Preprocessing: - JSONPATH: |
|
PHP-FPM | PHP-FPM: Processes, total | The total number of server processes currently running. |
DEPENDENT | php-fpm.processes_total Preprocessing: - JSONPATH: |
|
PHP-FPM | PHP-FPM: Processes, idle | The total number of idle processes. |
DEPENDENT | php-fpm.processes_idle Preprocessing: - JSONPATH: |
|
PHP-FPM | PHP-FPM: Process manager | The method used by the process manager to control the number of child processes for this pool. |
DEPENDENT | php-fpm.process_manager Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 3h |
|
PHP-FPM | PHP-FPM: Processes, max active | The highest value that 'active processes' has reached since the php-fpm server started. |
DEPENDENT | php-fpm.processes_max_active Preprocessing: - JSONPATH: |
|
PHP-FPM | PHP-FPM: Accepted connections per second | The number of accepted requests per second. |
DEPENDENT | php-fpm.conn_accepted.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
|
PHP-FPM | PHP-FPM: Slow requests | The number of requests that exceeded your request_slowlog_timeout value. |
DEPENDENT | php-fpm.slow_requests Preprocessing: - JSONPATH: - SIMPLE_CHANGE |
|
PHP-FPM | PHP-FPM: Listen queue | The current number of connections that have been initiated, but not yet accepted. |
DEPENDENT | php-fpm.listen_queue Preprocessing: - JSONPATH: |
|
PHP-FPM | PHP-FPM: Listen queue, max | The maximum number of requests in the queue of pending connections since this FPM pool has started. |
DEPENDENT | php-fpm.listen_queue_max Preprocessing: - JSONPATH: |
|
PHP-FPM | PHP-FPM: Listen queue, len | Size of the socket queue of pending connections. |
DEPENDENT | php-fpm.listen_queue_len Preprocessing: - JSONPATH: |
|
PHP-FPM | PHP-FPM: Queue usage | The utilization of the queue, in percent. The +(last(//php-fpm.listen_queue_len)=0) term in the expression guards against division by zero when the reported queue length is zero. |
CALCULATED | php-fpm.listen_queue_usage Expression: last(//php-fpm.listen_queue)/(last(//php-fpm.listen_queue_len)+(last(//php-fpm.listen_queue_len)=0))*100 |
|
PHP-FPM | PHP-FPM: Max children reached | The number of times that pm.max_children has been reached since the php-fpm pool started. |
DEPENDENT | php-fpm.max_children Preprocessing: - JSONPATH: - SIMPLE_CHANGE |
|
Zabbix raw items | PHP-FPM: Get ping page | - |
HTTP_AGENT | php-fpm.get_ping | |
Zabbix raw items | PHP-FPM: Get status page | - |
HTTP_AGENT | php-fpm.get_status |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
PHP-FPM: Service is down | - |
last(/PHP-FPM by HTTP/php-fpm.ping)=0 or nodata(/PHP-FPM by HTTP/php-fpm.ping,3m)=1 |
HIGH | Manual close: YES |
PHP-FPM: Version has changed | PHP-FPM version has changed. Ack to close. |
last(/PHP-FPM by HTTP/php-fpm.version,#1)<>last(/PHP-FPM by HTTP/php-fpm.version,#2) and length(last(/PHP-FPM by HTTP/php-fpm.version))>0 |
INFO | Manual close: YES |
PHP-FPM: Failed to fetch info data | Zabbix has not received data for items for the last 30 minutes |
nodata(/PHP-FPM by HTTP/php-fpm.uptime,30m)=1 |
INFO | Manual close: YES Depends on: - PHP-FPM: Service is down |
PHP-FPM: Pool has been restarted | Uptime is less than 10 minutes. |
last(/PHP-FPM by HTTP/php-fpm.uptime)<10m |
INFO | Manual close: YES |
PHP-FPM: Manager changed | PHP-FPM manager changed. Ack to close. |
last(/PHP-FPM by HTTP/php-fpm.process_manager,#1)<>last(/PHP-FPM by HTTP/php-fpm.process_manager,#2) |
INFO | Manual close: YES |
PHP-FPM: Detected slow requests | PHP-FPM has detected slow requests. A slow request is one that took more time to execute than expected (defined in the configuration of your pool). |
min(/PHP-FPM by HTTP/php-fpm.slow_requests,#3)>0 |
WARNING | |
PHP-FPM: Queue utilization is high | The queue for this pool reached {$PHP_FPM.QUEUE.WARN.MAX}% of its maximum capacity. Items in queue represent the current number of connections that have been initiated on this pool, but not yet accepted. |
min(/PHP-FPM by HTTP/php-fpm.listen_queue_usage,15m) > {$PHP_FPM.QUEUE.WARN.MAX} |
WARNING |
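The "Service is down" trigger relies on the ping page returning the expected reply ({$PHP_FPM.PING.REPLY}, pong by default), which you can confirm by hand:
$ curl -L 127.0.0.1/ping
pong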
Please report any issues with the template at https://support.zabbix.com
For Zabbix version: 6.2 and higher. This template is developed to monitor the FastCGI Process Manager (PHP-FPM) by Zabbix that works without any external scripts.
Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The template PHP-FPM by Zabbix agent collects metrics by polling the PHP-FPM status page locally with Zabbix agent.
Note that this template doesn't support HTTPS and redirects (limitations of web.page.get).
It also uses Zabbix agent to collect PHP-FPM Linux process statistics, such as CPU usage, memory usage, and whether the process is running or not.
This template was tested on:
See Zabbix template operation for basic instructions.
Open the php-fpm configuration file and enable the status page as shown.
pm.status_path = /status
ping.path = /ping
$ php-fpm7 -t
Reload the php-fpm service to make the change active.
$ systemctl reload php-fpm
# Enable php-fpm status page
location ~ ^/(status|ping)$ {
## disable access logging for request if you prefer
access_log off;
## Only allow trusted IPs for security, deny everyone else
# allow 127.0.0.1;
# allow 1.2.3.4; # your IP here
# deny all;
fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
fastcgi_index index.php;
include fastcgi_params;
## Now the port or socket of the php-fpm pool we want the status of
fastcgi_pass 127.0.0.1:9000;
# fastcgi_pass unix:/run/php-fpm/your_socket.sock;
}
Check the syntax again.
$ nginx -t
Reload Nginx server.
$ systemctl reload nginx
Verify it with this command line.
curl -L 127.0.0.1/status
If you use another location of the status/ping page, don't forget to change the {$PHP_FPM.STATUS.PAGE}/{$PHP_FPM.PING.PAGE} macros.
If you use an atypical location for the PHP-FPM status page, don't forget to change the {$PHP_FPM.PORT} macro.
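Because this template uses Zabbix agent items, both raw checks can be tried by hand with zabbix_get against the monitored host (127.0.0.1 below stands in for the agent address):
$ zabbix_get -s 127.0.0.1 -k 'web.page.get["localhost","ping","80"]'
$ zabbix_get -s 127.0.0.1 -k 'proc.get[,,,summary]'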
No specific Zabbix configuration is required.
Name | Description | Default |
---|---|---|
{$PHP_FPM.HOST} | The hostname or IP address of the PHP-FPM status host or container. |
localhost |
{$PHP_FPM.PING.PAGE} | The path of the PHP-FPM ping page. |
ping |
{$PHP_FPM.PING.REPLY} | The expected reply to the ping. |
pong |
{$PHP_FPM.PORT} | The port of the PHP-FPM status host or container. |
80 |
{$PHP_FPM.PROCESS_NAME} | The name of the PHP-FPM process. |
php-fpm |
{$PHP_FPM.QUEUE.WARN.MAX} | The maximum percent of the PHP-FPM queue usage for a trigger expression. |
80 |
{$PHP_FPM.STATUS.PAGE} | The path of the PHP-FPM status page. |
status |
There are no template links in this template.
Name | Description | Type | Key and additional info |
---|---|---|---|
PHP-FPM process discovery | The discovery of the PHP-FPM summary processes. |
DEPENDENT | php-fpm.proc.discovery Filter: AND- {#NAME} MATCHES_REGEX |
Group | Name | Description | Type | Key and additional info | |
---|---|---|---|---|---|
PHP-FPM | PHP-FPM: Get processes summary | The aggregated data of summary metrics for all processes. |
ZABBIX_PASSIVE | proc.get[,,,summary] | |
PHP-FPM | PHP-FPM: Ping | - |
DEPENDENT | php-fpm.ping Preprocessing: - REGEX: `{$PHP_FPM.PING.REPLY}($|\r?\n)` -> 1 ⛔️ON_FAIL: CUSTOM_VALUE -> 0 |
PHP-FPM | PHP-FPM: Processes, active | The total number of active processes. |
DEPENDENT | php-fpm.processes_active Preprocessing: - JSONPATH: |
|
PHP-FPM | PHP-FPM: Version | The current version of PHP. You can get it from the HTTP header "X-Powered-By"; it may not work if you have changed the default HTTP headers. |
DEPENDENT | php-fpm.version Preprocessing: - REGEX: ⛔️ON_FAIL: - DISCARD_UNCHANGED_HEARTBEAT: |
|
PHP-FPM | PHP-FPM: Pool name | The name of the current pool. |
DEPENDENT | php-fpm.name Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
|
PHP-FPM | PHP-FPM: Uptime | It indicates how long this pool has been running. |
DEPENDENT | php-fpm.uptime Preprocessing: - JSONPATH: |
|
PHP-FPM | PHP-FPM: Start time | The time when this pool was started. |
DEPENDENT | php-fpm.start_time Preprocessing: - JSONPATH: |
|
PHP-FPM | PHP-FPM: Processes, total | The total number of server processes running currently. |
DEPENDENT | php-fpm.processes_total Preprocessing: - JSONPATH: |
|
PHP-FPM | PHP-FPM: Processes, idle | The total number of idle processes. |
DEPENDENT | php-fpm.processes_idle Preprocessing: - JSONPATH: |
|
PHP-FPM | PHP-FPM: Queue usage | The utilization of the queue. |
CALCULATED | php-fpm.listen_queue_usage Expression: last(//php-fpm.listen_queue)/(last(//php-fpm.listen_queue_len)+(last(//php-fpm.listen_queue_len)=0))*100 |
|
PHP-FPM | PHP-FPM: Process manager | The method used by the process manager to control the number of child processes for this pool. |
DEPENDENT | php-fpm.process_manager Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 3h |
|
PHP-FPM | PHP-FPM: Processes, max active | The highest value of "active processes" since the PHP-FPM server was started. |
DEPENDENT | php-fpm.processes_max_active Preprocessing: - JSONPATH: |
|
PHP-FPM | PHP-FPM: Accepted connections per second | The number of accepted requests per second. |
DEPENDENT | php-fpm.conn_accepted.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
|
PHP-FPM | PHP-FPM: Slow requests | The number of requests that have exceeded your request_slowlog_timeout value. |
DEPENDENT | php-fpm.slow_requests Preprocessing: - JSONPATH: - SIMPLE_CHANGE |
|
PHP-FPM | PHP-FPM: Listen queue | The current number of connections that have been initiated but not yet accepted. |
DEPENDENT | php-fpm.listen_queue Preprocessing: - JSONPATH: |
|
PHP-FPM | PHP-FPM: Listen queue, max | The maximum number of requests in the queue of pending connections since this FPM pool was started. |
DEPENDENT | php-fpm.listen_queue_max Preprocessing: - JSONPATH: |
|
PHP-FPM | PHP-FPM: Listen queue, len | The size of the socket queue of pending connections. |
DEPENDENT | php-fpm.listen_queue_len Preprocessing: - JSONPATH: |
|
PHP-FPM | PHP-FPM: Max children reached | The number of times that pm.max_children has been reached since the PHP-FPM pool was started. |
DEPENDENT | php-fpm.max_children Preprocessing: - JSONPATH: - SIMPLE_CHANGE |
|
PHP-FPM | PHP-FPM: Get process data | The summary metrics aggregated by a process |
DEPENDENT | php-fpm.proc.get[{#NAME}] Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
|
PHP-FPM | PHP-FPM: Memory usage (rss) | The summary of resident set size memory used by a process |
DEPENDENT | php-fpm.proc.rss[{#NAME}] Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
|
PHP-FPM | PHP-FPM: Memory usage (vsize) | The summary of virtual memory used by a process |
DEPENDENT | php-fpm.proc.vmem[{#NAME}] Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
|
PHP-FPM | PHP-FPM: Memory usage, % | The percentage of real memory used by a process |
DEPENDENT | php-fpm.proc.pmem[{#NAME}] Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
|
PHP-FPM | PHP-FPM: Number of running processes | The number of running processes |
DEPENDENT | php-fpm.proc.num[{#NAME}] Preprocessing: - JSONPATH: ⛔️ON_FAIL: - DISCARD_UNCHANGED_HEARTBEAT: |
|
PHP-FPM | PHP-FPM: CPU utilization | The percentage of the CPU utilization by a process |
ZABBIX_PASSIVE | proc.cpu.util[{#NAME}] | |
Zabbix raw items | PHP-FPM: Get ping page | - |
ZABBIX_PASSIVE | web.page.get["{$PHPFPM.HOST}","{$PHPFPM.PING.PAGE}","{$PHP_FPM.PORT}"] | |
Zabbix raw items | PHP-FPM: Get status page | - |
ZABBIX_PASSIVE | web.page.get["{$PHPFPM.HOST}","{$PHPFPM.STATUS.PAGE}?json","{$PHP_FPM.PORT}"] Preprocessing: - REGEX: |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
PHP-FPM: Version has changed | The PHP-FPM version has changed. Acknowledge (Ack) to close manually. |
last(/PHP-FPM by Zabbix agent/php-fpm.version,#1)<>last(/PHP-FPM by Zabbix agent/php-fpm.version,#2) and length(last(/PHP-FPM by Zabbix agent/php-fpm.version))>0 |
INFO | Manual close: YES |
PHP-FPM: Pool has been restarted | Uptime is less than 10 minutes. |
last(/PHP-FPM by Zabbix agent/php-fpm.uptime)<10m |
INFO | Manual close: YES |
PHP-FPM: Queue utilization is high | The queue for this pool has reached {$PHP_FPM.QUEUE.WARN.MAX}% of its maximum capacity. Items in the queue represent the current number of connections that have been initiated on this pool but not yet accepted. |
min(/PHP-FPM by Zabbix agent/php-fpm.listen_queue_usage,15m) > {$PHP_FPM.QUEUE.WARN.MAX} |
WARNING | |
PHP-FPM: Manager changed | The PHP-FPM manager has changed. |
last(/PHP-FPM by Zabbix agent/php-fpm.process_manager,#1)<>last(/PHP-FPM by Zabbix agent/php-fpm.process_manager,#2) |
INFO | Manual close: YES |
PHP-FPM: Detected slow requests | The PHP-FPM has detected a slow request. The slow request means that it took more time to execute than expected (defined in the configuration of your pool). |
min(/PHP-FPM by Zabbix agent/php-fpm.slow_requests,#3)>0 |
WARNING | |
PHP-FPM: Process is not running | - |
last(/PHP-FPM by Zabbix agent/php-fpm.proc.num[{#NAME}])=0 |
HIGH | |
PHP-FPM: Failed to fetch info data | Zabbix has not received any data for items for the last 30 minutes. |
nodata(/PHP-FPM by Zabbix agent/php-fpm.uptime,30m)=1 and last(/PHP-FPM by Zabbix agent/php-fpm.proc.num[{#NAME}])>0 |
INFO | Manual close: YES |
PHP-FPM: Service is down | - |
(last(/PHP-FPM by Zabbix agent/php-fpm.ping)=0 or nodata(/PHP-FPM by Zabbix agent/php-fpm.ping,3m)=1) and last(/PHP-FPM by Zabbix agent/php-fpm.proc.num[{#NAME}])>0 |
HIGH | Manual close: YES |
Please report any issues with the template at https://support.zabbix.com.
For Zabbix version: 6.2 and higher. Template for monitoring pfSense by SNMP
This template was tested on:
See Zabbix template operation for basic instructions.
No specific Zabbix configuration is required.
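Make sure the SNMP service is enabled on pfSense (Services - SNMP) with the PF module active, since most items come from BEGEMOT-PF-MIB. Reachability can then be verified with snmpwalk; the community string and address below are placeholders, and 1.3.6.1.4.1.12325.1.200 should correspond to the BEGEMOT-PF-MIB pf subtree:
$ snmpwalk -v2c -c public 192.0.2.1 IF-MIB::ifDescr
$ snmpwalk -v2c -c public 192.0.2.1 1.3.6.1.4.1.12325.1.200.1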
Name | Description | Default |
---|---|---|
{$IF.ERRORS.WARN} | Threshold of the error packet rate for the warning trigger. Can be used with the interface name as context. |
2 |
{$IF.UTIL.MAX} | Threshold of interface bandwidth utilization for the warning trigger, in %. Can be used with the interface name as context. |
90 |
{$IFCONTROL} | Macro for the operational state of the interface, used by the link down trigger. Can be used with the interface name as context. |
1 |
{$NET.IF.IFADMINSTATUS.MATCHES} | This macro is used in filters of the network interfaces discovery rule. |
^.* |
{$NET.IF.IFADMINSTATUS.NOT_MATCHES} | Ignore down(2) administrative status. |
^2$ |
{$NET.IF.IFALIAS.MATCHES} | This macro is used in filters of the network interfaces discovery rule. |
.* |
{$NET.IF.IFALIAS.NOT_MATCHES} | This macro is used in filters of the network interfaces discovery rule. |
CHANGE_IF_NEEDED |
{$NET.IF.IFDESCR.MATCHES} | This macro is used in filters of the network interfaces discovery rule. |
.* |
{$NET.IF.IFDESCR.NOT_MATCHES} | This macro is used in filters of the network interfaces discovery rule. |
CHANGE_IF_NEEDED |
{$NET.IF.IFNAME.NOT_MATCHES} | This macro is used in filters of the network interfaces discovery rule. |
`(^pflog[0-9.]*$|^pfsync[0-9.]*$)` |
{$NET.IF.IFOPERSTATUS.MATCHES} | This macro is used in filters of the network interfaces discovery rule. |
^.*$ |
{$NET.IF.IFOPERSTATUS.NOT_MATCHES} | Ignore notPresent(6). |
^6$ |
{$NET.IF.IFTYPE.MATCHES} | This macro is used in filters of the network interfaces discovery rule. |
.* |
{$NET.IF.IFTYPE.NOT_MATCHES} | This macro is used in filters of the network interfaces discovery rule. |
CHANGE_IF_NEEDED |
{$SNMP.TIMEOUT} | The time interval for the SNMP availability trigger. |
5m |
{$SOURCE.TRACKING.TABLE.UTIL.MAX} | Threshold of the source tracking table utilization trigger, in %. |
90 |
{$STATE.TABLE.UTIL.MAX} | Threshold of the state table utilization trigger, in %. |
90 |
There are no template links in this template.
Name | Description | Type | Key and additional info |
---|---|---|---|
Network interfaces discovery | Discovering interfaces from IF-MIB. |
SNMP | pfsense.net.if.discovery Filter: AND- {#IFADMINSTATUS} MATCHES_REGEX - {#IFADMINSTATUS} NOT_MATCHES_REGEX - {#IFOPERSTATUS} MATCHES_REGEX - {#IFOPERSTATUS} NOT_MATCHES_REGEX - {#IFNAME} MATCHES_REGEX - {#IFNAME} NOT_MATCHES_REGEX - {#IFDESCR} MATCHES_REGEX - {#IFDESCR} NOT_MATCHES_REGEX - {#IFALIAS} MATCHES_REGEX - {#IFALIAS} NOT_MATCHES_REGEX - {#IFTYPE} MATCHES_REGEX - {#IFTYPE} NOT_MATCHES_REGEX |
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
Network interfaces | PFSense: Interface [{#IFNAME}({#IFALIAS})]: Inbound packets discarded | MIB: IF-MIB The number of inbound packets which were chosen to be discarded even though no errors had been detected to prevent their being deliverable to a higher-layer protocol. One possible reason for discarding such a packet could be to free up buffer space. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP | net.if.in.discards[{#SNMPINDEX}] Preprocessing: - CHANGEPERSECOND: `` |
Network interfaces | PFSense: Interface [{#IFNAME}({#IFALIAS})]: Inbound packets with errors | MIB: IF-MIB For packet-oriented interfaces, the number of inbound packets that contained errors preventing them from being deliverable to a higher-layer protocol. For character-oriented or fixed-length interfaces, the number of inbound transmission units that contained errors preventing them from being deliverable to a higher-layer protocol. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP | net.if.in.errors[{#SNMPINDEX}] Preprocessing: - CHANGEPERSECOND: `` |
Network interfaces | PFSense: Interface [{#IFNAME}({#IFALIAS})]: Bits received | MIB: IF-MIB The total number of octets received on the interface, including framing characters. This object is a 64-bit version of ifInOctets. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP | net.if.in[{#SNMPINDEX}] Preprocessing: - CHANGEPERSECOND: ` |
Network interfaces | PFSense: Interface [{#IFNAME}({#IFALIAS})]: Outbound packets discarded | MIB: IF-MIB The number of outbound packets which were chosen to be discarded even though no errors had been detected to prevent their being deliverable to a higher-layer protocol. One possible reason for discarding such a packet could be to free up buffer space. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP | net.if.out.discards[{#SNMPINDEX}] Preprocessing: - CHANGEPERSECOND: `` |
Network interfaces | PFSense: Interface [{#IFNAME}({#IFALIAS})]: Outbound packets with errors | MIB: IF-MIB For packet-oriented interfaces, the number of outbound packets that contained errors preventing them from being deliverable to a higher-layer protocol. For character-oriented or fixed-length interfaces, the number of outbound transmission units that contained errors preventing them from being deliverable to a higher-layer protocol. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP | net.if.out.errors[{#SNMPINDEX}] Preprocessing: - CHANGEPERSECOND: `` |
Network interfaces | PFSense: Interface [{#IFNAME}({#IFALIAS})]: Bits sent | MIB: IF-MIB The total number of octets transmitted out of the interface, including framing characters. This object is a 64-bit version of ifOutOctets. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP | net.if.out[{#SNMPINDEX}] Preprocessing: - CHANGEPERSECOND: ` |
Network interfaces | PFSense: Interface [{#IFNAME}({#IFALIAS})]: Speed | MIB: IF-MIB An estimate of the interface's current bandwidth in units of 1,000,000 bits per second. If this object reports a value of `n', then the speed of the interface is somewhere in the range of `n-500,000' to `n+499,999'. |
SNMP | net.if.speed[{#SNMPINDEX}] Preprocessing: - MULTIPLIER: - DISCARDUNCHANGEDHEARTBEAT: |
Network interfaces | PFSense: Interface [{#IFNAME}({#IFALIAS})]: Operational status | MIB: IF-MIB The current operational state of the interface. - The testing(3) state indicates that no operational packets can be passed - If ifAdminStatus is down(2) then ifOperStatus should be down(2) - If ifAdminStatus is changed to up(1) then ifOperStatus should change to up(1) if the interface is ready to transmit and receive network traffic - It should change to dormant(5) if the interface is waiting for external actions (such as a serial line waiting for an incoming connection) - It should remain in the down(2) state if and only if there is a fault that prevents it from going to the up(1) state - It should remain in the notPresent(6) state if the interface has missing (typically, hardware) components. |
SNMP | net.if.status[{#SNMPINDEX}] Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
Network interfaces | PFSense: Interface [{#IFNAME}({#IFALIAS})]: Interface type | MIB: IF-MIB The type of interface. Additional values for ifType are assigned by the Internet Assigned Numbers Authority (IANA), through updating the syntax of the IANAifType textual convention. |
SNMP | net.if.type[{#SNMPINDEX}] Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
Network interfaces | PFSense: Interface [{#IFNAME}({#IFALIAS})]: Rules references count | MIB: BEGEMOT-PF-MIB The number of rules referencing this interface. |
SNMP | net.if.rules.refs[{#SNMPINDEX}] |
Network interfaces | PFSense: Interface [{#IFNAME}({#IFALIAS})]: Inbound IPv4 traffic passed | MIB: BEGEMOT-PF-MIB IPv4 bits per second passed coming in on this interface. |
SNMP | net.if.in.pass.v4.bps[{#SNMPINDEX}] Preprocessing: - CHANGEPERSECOND - MULTIPLIER: |
Network interfaces | PFSense: Interface [{#IFNAME}({#IFALIAS})]: Inbound IPv4 traffic blocked | MIB: BEGEMOT-PF-MIB IPv4 bits per second blocked coming in on this interface. |
SNMP | net.if.in.block.v4.bps[{#SNMPINDEX}] Preprocessing: - CHANGEPERSECOND - MULTIPLIER: |
Network interfaces | PFSense: Interface [{#IFNAME}({#IFALIAS})]: Outbound IPv4 traffic passed | MIB: BEGEMOT-PF-MIB IPv4 bits per second passed going out on this interface. |
SNMP | net.if.out.pass.v4.bps[{#SNMPINDEX}] Preprocessing: - CHANGEPERSECOND - MULTIPLIER: |
Network interfaces | PFSense: Interface [{#IFNAME}({#IFALIAS})]: Outbound IPv4 traffic blocked | MIB: BEGEMOT-PF-MIB IPv4 bits per second blocked going out on this interface. |
SNMP | net.if.out.block.v4.bps[{#SNMPINDEX}] Preprocessing: - CHANGEPERSECOND - MULTIPLIER: |
Network interfaces | PFSense: Interface [{#IFNAME}({#IFALIAS})]: Inbound IPv4 packets passed | MIB: BEGEMOT-PF-MIB The number of IPv4 packets passed coming in on this interface. |
SNMP | net.if.in.pass.v4.pps[{#SNMPINDEX}] Preprocessing: - CHANGEPERSECOND |
Network interfaces | PFSense: Interface [{#IFNAME}({#IFALIAS})]: Inbound IPv4 packets blocked | MIB: BEGEMOT-PF-MIB The number of IPv4 packets blocked coming in on this interface. |
SNMP | net.if.in.block.v4.pps[{#SNMPINDEX}] Preprocessing: - CHANGEPERSECOND |
Network interfaces | PFSense: Interface [{#IFNAME}({#IFALIAS})]: Outbound IPv4 packets passed | MIB: BEGEMOT-PF-MIB The number of IPv4 packets passed going out on this interface. |
SNMP | net.if.out.pass.v4.pps[{#SNMPINDEX}] Preprocessing: - CHANGEPERSECOND |
Network interfaces | PFSense: Interface [{#IFNAME}({#IFALIAS})]: Outbound IPv4 packets blocked | MIB: BEGEMOT-PF-MIB The number of IPv4 packets blocked going out on this interface. |
SNMP | net.if.out.block.v4.pps[{#SNMPINDEX}] Preprocessing: - CHANGEPERSECOND |
Network interfaces | PFSense: Interface [{#IFNAME}({#IFALIAS})]: Inbound IPv6 traffic passed | MIB: BEGEMOT-PF-MIB IPv6 bits per second passed coming in on this interface. |
SNMP | net.if.in.pass.v6.bps[{#SNMPINDEX}] Preprocessing: - CHANGEPERSECOND - MULTIPLIER: |
Network interfaces | PFSense: Interface [{#IFNAME}({#IFALIAS})]: Inbound IPv6 traffic blocked | MIB: BEGEMOT-PF-MIB IPv6 bits per second blocked coming in on this interface. |
SNMP | net.if.in.block.v6.bps[{#SNMPINDEX}] Preprocessing: - CHANGEPERSECOND - MULTIPLIER: |
Network interfaces | PFSense: Interface [{#IFNAME}({#IFALIAS})]: Outbound IPv6 traffic passed | MIB: BEGEMOT-PF-MIB IPv6 bits per second passed going out on this interface. |
SNMP | net.if.out.pass.v6.bps[{#SNMPINDEX}] Preprocessing: - CHANGEPERSECOND - MULTIPLIER: |
Network interfaces | PFSense: Interface [{#IFNAME}({#IFALIAS})]: Outbound IPv6 traffic blocked | MIB: BEGEMOT-PF-MIB IPv6 bits per second blocked going out on this interface. |
SNMP | net.if.out.block.v6.bps[{#SNMPINDEX}] Preprocessing: - CHANGEPERSECOND - MULTIPLIER: |
Network interfaces | PFSense: Interface [{#IFNAME}({#IFALIAS})]: Inbound IPv6 packets passed | MIB: BEGEMOT-PF-MIB The number of IPv6 packets passed coming in on this interface. |
SNMP | net.if.in.pass.v6.pps[{#SNMPINDEX}] Preprocessing: - CHANGEPERSECOND |
Network interfaces | PFSense: Interface [{#IFNAME}({#IFALIAS})]: Inbound IPv6 packets blocked | MIB: BEGEMOT-PF-MIB The number of IPv6 packets blocked coming in on this interface. |
SNMP | net.if.in.block.v6.pps[{#SNMPINDEX}] Preprocessing: - CHANGEPERSECOND |
Network interfaces | PFSense: Interface [{#IFNAME}({#IFALIAS})]: Outbound IPv6 packets passed | MIB: BEGEMOT-PF-MIB The number of IPv6 packets passed going out on this interface. |
SNMP | net.if.out.pass.v6.pps[{#SNMPINDEX}] Preprocessing: - CHANGEPERSECOND |
Network interfaces | PFSense: Interface [{#IFNAME}({#IFALIAS})]: Outbound IPv6 packets blocked | MIB: BEGEMOT-PF-MIB The number of IPv6 packets blocked going out on this interface. |
SNMP | net.if.out.block.v6.pps[{#SNMPINDEX}] Preprocessing: - CHANGEPERSECOND |
pfSense | PFSense: Packet filter running status | MIB: BEGEMOT-PF-MIB True if packet filter is currently enabled. |
SNMP | pfsense.pf.status |
pfSense | PFSense: States table current | MIB: BEGEMOT-PF-MIB Number of entries in the state table. |
SNMP | pfsense.state.table.count |
pfSense | PFSense: States table limit | MIB: BEGEMOT-PF-MIB Maximum number of 'keep state' rules in the ruleset. |
SNMP | pfsense.state.table.limit |
pfSense | PFSense: States table utilization in % | Utilization of state table in %. |
CALCULATED | pfsense.state.table.pused Expression: last(//pfsense.state.table.count) * 100 / last(//pfsense.state.table.limit) |
pfSense | PFSense: Source tracking table current | MIB: BEGEMOT-PF-MIB Number of entries in the source tracking table. |
SNMP | pfsense.source.tracking.table.count |
pfSense | PFSense: Source tracking table limit | MIB: BEGEMOT-PF-MIB Maximum number of 'sticky-address' or 'source-track' rules in the ruleset. |
SNMP | pfsense.source.tracking.table.limit |
pfSense | PFSense: Source tracking table utilization in % | Utilization of source tracking table in %. |
CALCULATED | pfsense.source.tracking.table.pused Expression: last(//pfsense.source.tracking.table.count) * 100 / last(//pfsense.source.tracking.table.limit) |
pfSense | PFSense: DHCP server status | MIB: HOST-RESOURCES-MIB The status of DHCP server process. |
SNMP | pfsense.dhcpd.status Preprocessing: - CHECKNOTSUPPORTED: ` |
pfSense | PFSense: DNS server status | MIB: HOST-RESOURCES-MIB The status of DNS server process. |
SNMP | pfsense.dns.status Preprocessing: - CHECKNOTSUPPORTED: ` |
pfSense | PFSense: State of nginx process | MIB: HOST-RESOURCES-MIB The status of nginx process. |
SNMP | pfsense.nginx.status Preprocessing: - CHECKNOTSUPPORTED: ` |
pfSense | PFSense: Packets matched a filter rule | MIB: BEGEMOT-PF-MIB True if the packet was logged with the specified packet filter reason code. The known codes are: match, bad-offset, fragment, short, normalize, and memory. |
SNMP | pfsense.packets.match Preprocessing: - CHANGEPERSECOND |
pfSense | PFSense: Packets with bad offset | MIB: BEGEMOT-PF-MIB True if the packet was logged with the specified packet filter reason code. The known codes are: match, bad-offset, fragment, short, normalize, and memory. |
SNMP | pfsense.packets.bad.offset Preprocessing: - CHANGEPERSECOND |
pfSense | PFSense: Fragmented packets | MIB: BEGEMOT-PF-MIB True if the packet was logged with the specified packet filter reason code. The known codes are: match, bad-offset, fragment, short, normalize, and memory. |
SNMP | pfsense.packets.fragment Preprocessing: - CHANGEPERSECOND |
pfSense | PFSense: Short packets | MIB: BEGEMOT-PF-MIB True if the packet was logged with the specified packet filter reason code. The known codes are: match, bad-offset, fragment, short, normalize, and memory. |
SNMP | pfsense.packets.short Preprocessing: - CHANGEPERSECOND |
pfSense | PFSense: Normalized packets | MIB: BEGEMOT-PF-MIB True if the packet was logged with the specified packet filter reason code. The known codes are: match, bad-offset, fragment, short, normalize, and memory. |
SNMP | pfsense.packets.normalize Preprocessing: - CHANGEPERSECOND |
pfSense | PFSense: Packets dropped due to memory limitation | MIB: BEGEMOT-PF-MIB True if the packet was logged with the specified packet filter reason code. The known codes are: match, bad-offset, fragment, short, normalize, and memory. |
SNMP | pfsense.packets.mem.drop Preprocessing: - CHANGEPERSECOND |
pfSense | PFSense: Firewall rules count | MIB: BEGEMOT-PF-MIB The number of labeled filter rules on this system. |
SNMP | pfsense.rules.count |
Status | PFSense: SNMP agent availability | Availability of SNMP checks on the host. The value of this item corresponds to the availability icons in the host list. Possible values: 0 - not available, 1 - available, 2 - unknown. |
INTERNAL | zabbix[host,snmp,available] |
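Many of the SNMP counters above are normalized with the CHANGE_PER_SECOND preprocessing step, and the two utilization items are CALCULATED from their count/limit pairs. A minimal sketch of both computations in Python, with hypothetical sample values (Zabbix performs these server-side):

```python
# A sketch of the two derived values used above (hypothetical sample data).

def change_per_second(prev_value, prev_ts, cur_value, cur_ts):
    """CHANGE_PER_SECOND preprocessing: counter delta divided by elapsed seconds."""
    return (cur_value - prev_value) / (cur_ts - prev_ts)

def table_pused(count, limit):
    """CALCULATED item such as pfsense.state.table.pused: count * 100 / limit."""
    return count * 100 / limit

# e.g. pfsense.packets.match sampled 60 s apart (hypothetical raw counter values)
print(change_per_second(120_000, 0, 126_000, 60))  # -> 100.0 packets per second

# e.g. 40 000 entries in a state table limited to 100 000
print(table_pused(40_000, 100_000))                # -> 40.0 (%)
```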
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
PFSense: Interface [{#IFNAME}({#IFALIAS})]: High input error rate | Recovers when below 80% of {$IF.ERRORS.WARN:"{#IFNAME}"} threshold. |
min(/PFSense by SNMP/net.if.in.errors[{#SNMPINDEX}],5m)>{$IF.ERRORS.WARN:"{#IFNAME}"} Recovery expression: max(/PFSense by SNMP/net.if.in.errors[{#SNMPINDEX}],5m)<{$IF.ERRORS.WARN:"{#IFNAME}"}*0.8 |
WARNING | Depends on: - PFSense: Interface [{#IFNAME}({#IFALIAS})]: Link down |
PFSense: Interface [{#IFNAME}({#IFALIAS})]: High inbound bandwidth usage | The network interface utilization is close to its estimated maximum bandwidth. |
(avg(/PFSense by SNMP/net.if.in[{#SNMPINDEX}],15m)>({$IF.UTIL.MAX:"{#IFNAME}"}/100)*last(/PFSense by SNMP/net.if.speed[{#SNMPINDEX}])) and last(/PFSense by SNMP/net.if.speed[{#SNMPINDEX}])>0 Recovery expression: avg(/PFSense by SNMP/net.if.in[{#SNMPINDEX}],15m)<(({$IF.UTIL.MAX:"{#IFNAME}"}-3)/100)*last(/PFSense by SNMP/net.if.speed[{#SNMPINDEX}]) |
WARNING | Depends on: - PFSense: Interface [{#IFNAME}({#IFALIAS})]: Link down |
PFSense: Interface [{#IFNAME}({#IFALIAS})]: High output error rate | Recovers when below 80% of {$IF.ERRORS.WARN:"{#IFNAME}"} threshold. |
min(/PFSense by SNMP/net.if.out.errors[{#SNMPINDEX}],5m)>{$IF.ERRORS.WARN:"{#IFNAME}"} Recovery expression: max(/PFSense by SNMP/net.if.out.errors[{#SNMPINDEX}],5m)<{$IF.ERRORS.WARN:"{#IFNAME}"}*0.8 |
WARNING | Depends on: - PFSense: Interface [{#IFNAME}({#IFALIAS})]: Link down |
PFSense: Interface [{#IFNAME}({#IFALIAS})]: High outbound bandwidth usage | The network interface utilization is close to its estimated maximum bandwidth. |
(avg(/PFSense by SNMP/net.if.out[{#SNMPINDEX}],15m)>({$IF.UTIL.MAX:"{#IFNAME}"}/100)*last(/PFSense by SNMP/net.if.speed[{#SNMPINDEX}])) and last(/PFSense by SNMP/net.if.speed[{#SNMPINDEX}])>0 Recovery expression: avg(/PFSense by SNMP/net.if.out[{#SNMPINDEX}],15m)<(({$IF.UTIL.MAX:"{#IFNAME}"}-3)/100)*last(/PFSense by SNMP/net.if.speed[{#SNMPINDEX}]) |
WARNING | Depends on: - PFSense: Interface [{#IFNAME}({#IFALIAS})]: Link down |
PFSense: Interface [{#IFNAME}({#IFALIAS})]: Ethernet has changed to lower speed than it was before | This Ethernet connection has transitioned down from its known maximum speed. This might be a sign of autonegotiation issues. Ack to close. |
change(/PFSense by SNMP/net.if.speed[{#SNMPINDEX}])<0 and last(/PFSense by SNMP/net.if.speed[{#SNMPINDEX}])>0 and ( last(/PFSense by SNMP/net.if.type[{#SNMPINDEX}])=6 or last(/PFSense by SNMP/net.if.type[{#SNMPINDEX}])=7 or last(/PFSense by SNMP/net.if.type[{#SNMPINDEX}])=11 or last(/PFSense by SNMP/net.if.type[{#SNMPINDEX}])=62 or last(/PFSense by SNMP/net.if.type[{#SNMPINDEX}])=69 or last(/PFSense by SNMP/net.if.type[{#SNMPINDEX}])=117 ) and (last(/PFSense by SNMP/net.if.status[{#SNMPINDEX}])<>2) Recovery expression: (change(/PFSense by SNMP/net.if.speed[{#SNMPINDEX}])>0 and last(/PFSense by SNMP/net.if.speed[{#SNMPINDEX}],#2)>0) or (last(/PFSense by SNMP/net.if.status[{#SNMPINDEX}])=2) |
INFO | Depends on: - PFSense: Interface [{#IFNAME}({#IFALIAS})]: Link down |
PFSense: Interface [{#IFNAME}({#IFALIAS})]: Link down | This trigger expression works as follows: 1. It can be triggered if the operational status is down. 2. {$IFCONTROL:"{#IFNAME}"}=1 - the user can redefine the context macro to the value 0, which marks this interface as not important; no new trigger will be fired if this interface is down. |
{$IFCONTROL:"{#IFNAME}"}=1 and (last(/PFSense by SNMP/net.if.status[{#SNMPINDEX}])=2) |
AVERAGE | |
PFSense: Packet filter is not running | Please check PF status. |
last(/PFSense by SNMP/pfsense.pf.status)<>1 |
HIGH | |
PFSense: State table usage is high | Please check the number of connections; see https://docs.netgate.com/pfsense/en/latest/config/advanced-firewall-nat.html#config-advanced-firewall-maxstates |
min(/PFSense by SNMP/pfsense.state.table.pused,#3)>{$STATE.TABLE.UTIL.MAX} |
WARNING | |
PFSense: Source tracking table usage is high | Please check the number of sticky connections; see https://docs.netgate.com/pfsense/en/latest/monitoring/status/firewall-states-sources.html |
min(/PFSense by SNMP/pfsense.source.tracking.table.pused,#3)>{$SOURCE.TRACKING.TABLE.UTIL.MAX} |
WARNING | |
PFSense: DHCP server is not running | Please check DHCP server settings; see https://docs.netgate.com/pfsense/en/latest/services/dhcp/index.html |
last(/PFSense by SNMP/pfsense.dhcpd.status)=0 |
AVERAGE | |
PFSense: DNS server is not running | Please check DNS server settings; see https://docs.netgate.com/pfsense/en/latest/services/dns/index.html |
last(/PFSense by SNMP/pfsense.dns.status)=0 |
AVERAGE | |
PFSense: Web server is not running | Please check nginx service status. |
last(/PFSense by SNMP/pfsense.nginx.status)=0 |
AVERAGE | |
PFSense: No SNMP data collection | SNMP is not available for polling. Please check device connectivity and SNMP settings. |
max(/PFSense by SNMP/zabbix[host,snmp,available],{$SNMP.TIMEOUT})=0 |
WARNING |
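Note that the error-rate triggers above use hysteresis: the problem fires when the 5-minute minimum of the rate exceeds {$IF.ERRORS.WARN}, and it recovers only once the 5-minute maximum drops below 80% of that threshold. A sketch of this logic with hypothetical samples:

```python
# Hysteresis of the "High input/output error rate" triggers (hypothetical data).
# Problem:  min(rate over 5m) > threshold
# Recovery: max(rate over 5m) < threshold * 0.8

THRESHOLD = 2  # default {$IF.ERRORS.WARN}

def trigger_active(samples_5m, was_active):
    if not was_active:
        return min(samples_5m) > THRESHOLD           # fire the problem
    return not (max(samples_5m) < THRESHOLD * 0.8)   # stay active until recovery

print(trigger_active([3.0, 4.0, 5.0], was_active=False))  # True: fires
print(trigger_active([1.7, 1.5, 1.4], was_active=True))   # True: 1.7 >= 1.6, no recovery yet
print(trigger_active([1.5, 1.2, 1.1], was_active=True))   # False: recovered
```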
Please report any issues with the template at https://support.zabbix.com.
For Zabbix version: 6.2 and higher. Template for monitoring OPNsense by SNMP
This template was tested on:
See Zabbix template operation for basic instructions.
No specific Zabbix configuration is required.
Name | Description | Default | |
---|---|---|---|
{$IF.ERRORS.WARN} | Threshold of error packets rate for warning trigger. Can be used with interface name as context. |
2 |
|
{$IF.UTIL.MAX} | Threshold of interface bandwidth utilization for warning trigger in %. Can be used with interface name as context. |
90 |
|
{$IFCONTROL} | Macro for operational state of the interface for link down trigger. Can be used with interface name as context. |
1 |
|
{$NET.IF.IFADMINSTATUS.MATCHES} | This macro is used in filters of network interfaces discovery rule. |
^.* |
|
{$NET.IF.IFADMINSTATUS.NOT_MATCHES} | Ignore down(2) administrative status. |
^2$ |
|
{$NET.IF.IFALIAS.MATCHES} | This macro is used in filters of network interfaces discovery rule. |
.* |
|
{$NET.IF.IFALIAS.NOT_MATCHES} | This macro is used in filters of network interfaces discovery rule. |
CHANGE_IF_NEEDED |
|
{$NET.IF.IFDESCR.MATCHES} | This macro is used in filters of network interfaces discovery rule. |
.* |
|
{$NET.IF.IFDESCR.NOT_MATCHES} | This macro is used in filters of network interfaces discovery rule. |
CHANGE_IF_NEEDED |
|
{$NET.IF.IFNAME.NOT_MATCHES} | This macro is used in filters of network interfaces discovery rule. |
`(^pflog[0-9.]*$ | ^pfsync[0-9.]*$)` |
{$NET.IF.IFOPERSTATUS.MATCHES} | This macro is used in filters of network interfaces discovery rule. |
^.*$ |
|
{$NET.IF.IFOPERSTATUS.NOT_MATCHES} | Ignore notPresent(6). |
^6$ |
|
{$NET.IF.IFTYPE.MATCHES} | This macro is used in filters of network interfaces discovery rule. |
.* |
|
{$NET.IF.IFTYPE.NOT_MATCHES} | This macro is used in filters of network interfaces discovery rule. |
CHANGE_IF_NEEDED |
|
{$SNMP.TIMEOUT} | The time interval for SNMP availability trigger. |
5m |
|
{$SOURCE.TRACKING.TABLE.UTIL.MAX} | Threshold of source tracking table utilization trigger in %. |
90 |
|
{$STATE.TABLE.UTIL.MAX} | Threshold of state table utilization trigger in %. |
90 |
There are no template links in this template.
Name | Description | Type | Key and additional info |
---|---|---|---|
Network interfaces discovery | Discovering interfaces from IF-MIB. |
SNMP | opnsense.net.if.discovery Filter: AND - {#IFADMINSTATUS} MATCHES_REGEX - {#IFADMINSTATUS} NOT_MATCHES_REGEX - {#IFOPERSTATUS} MATCHES_REGEX - {#IFOPERSTATUS} NOT_MATCHES_REGEX - {#IFNAME} MATCHES_REGEX - {#IFNAME} NOT_MATCHES_REGEX - {#IFDESCR} MATCHES_REGEX - {#IFDESCR} NOT_MATCHES_REGEX - {#IFALIAS} MATCHES_REGEX - {#IFALIAS} NOT_MATCHES_REGEX - {#IFTYPE} MATCHES_REGEX - {#IFTYPE} NOT_MATCHES_REGEX |
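The discovery filter macros are plain regular expressions. For instance, the default {$NET.IF.IFNAME.NOT_MATCHES} excludes the internal pflog/pfsync pseudo-interfaces; a small sketch of how that filter is applied (the interface list is hypothetical):

```python
import re

# Default {$NET.IF.IFNAME.NOT_MATCHES}: drop pflog*/pfsync* pseudo-interfaces.
NOT_MATCHES = re.compile(r"(^pflog[0-9.]*$|^pfsync[0-9.]*$)")

interfaces = ["em0", "igb1", "pflog0", "pfsync0", "lo0"]  # hypothetical LLD output
discovered = [name for name in interfaces if not NOT_MATCHES.search(name)]
print(discovered)  # ['em0', 'igb1', 'lo0']
```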
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
Network interfaces | OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Inbound packets discarded | MIB: IF-MIB The number of inbound packets which were chosen to be discarded even though no errors had been detected to prevent their being deliverable to a higher-layer protocol. One possible reason for discarding such a packet could be to free up buffer space. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP | net.if.in.discards[{#SNMPINDEX}] Preprocessing: - CHANGE_PER_SECOND |
Network interfaces | OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Inbound packets with errors | MIB: IF-MIB For packet-oriented interfaces, the number of inbound packets that contained errors preventing them from being deliverable to a higher-layer protocol. For character-oriented or fixed-length interfaces, the number of inbound transmission units that contained errors preventing them from being deliverable to a higher-layer protocol. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP | net.if.in.errors[{#SNMPINDEX}] Preprocessing: - CHANGE_PER_SECOND |
Network interfaces | OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Bits received | MIB: IF-MIB The total number of octets received on the interface, including framing characters. This object is a 64-bit version of ifInOctets. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP | net.if.in[{#SNMPINDEX}] Preprocessing: - CHANGE_PER_SECOND - MULTIPLIER: |
Network interfaces | OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Outbound packets discarded | MIB: IF-MIB The number of outbound packets which were chosen to be discarded even though no errors had been detected to prevent their being deliverable to a higher-layer protocol. One possible reason for discarding such a packet could be to free up buffer space. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP | net.if.out.discards[{#SNMPINDEX}] Preprocessing: - CHANGE_PER_SECOND |
Network interfaces | OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Outbound packets with errors | MIB: IF-MIB For packet-oriented interfaces, the number of outbound packets that contained errors preventing them from being deliverable to a higher-layer protocol. For character-oriented or fixed-length interfaces, the number of outbound transmission units that contained errors preventing them from being deliverable to a higher-layer protocol. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP | net.if.out.errors[{#SNMPINDEX}] Preprocessing: - CHANGE_PER_SECOND |
Network interfaces | OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Bits sent | MIB: IF-MIB The total number of octets transmitted out of the interface, including framing characters. This object is a 64-bit version of ifOutOctets.Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP | net.if.out[{#SNMPINDEX}] Preprocessing: - CHANGE_PER_SECOND - MULTIPLIER: |
Network interfaces | OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Speed | MIB: IF-MIB An estimate of the interface's current bandwidth in units of 1,000,000 bits per second. If this object reports a value of n, then the speed of the interface is somewhere in the range of n-500,000 to n+499,999. For interfaces which do not vary in bandwidth or for those where no accurate estimation can be made, this object should contain the nominal bandwidth. For a sub-layer which has no concept of bandwidth, this object should be zero. |
SNMP | net.if.speed[{#SNMPINDEX}] Preprocessing: - MULTIPLIER: - DISCARDUNCHANGEDHEARTBEAT: |
Network interfaces | OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Operational status | MIB: IF-MIB The current operational state of the interface. - The testing(3) state indicates that no operational packets can be passed - If ifAdminStatus is down(2) then ifOperStatus should be down(2) - If ifAdminStatus is changed to up(1) then ifOperStatus should change to up(1) if the interface is ready to transmit and receive network traffic - It should change to dormant(5) if the interface is waiting for external actions (such as a serial line waiting for an incoming connection) - It should remain in the down(2) state if and only if there is a fault that prevents it from going to the up(1) state - It should remain in the notPresent(6) state if the interface has missing (typically, hardware) components. |
SNMP | net.if.status[{#SNMPINDEX}] Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
Network interfaces | OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Interface type | MIB: IF-MIB The type of interface. Additional values for ifType are assigned by the Internet Assigned Numbers Authority (IANA), through updating the syntax of the IANAifType textual convention. |
SNMP | net.if.type[{#SNMPINDEX}] Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
Network interfaces | OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Rules references count | MIB: BEGEMOT-PF-MIB The number of rules referencing this interface. |
SNMP | net.if.rules.refs[{#SNMPINDEX}] |
Network interfaces | OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Inbound IPv4 traffic passed | MIB: BEGEMOT-PF-MIB IPv4 bits per second passed coming in on this interface. |
SNMP | net.if.in.pass.v4.bps[{#SNMPINDEX}] Preprocessing: - CHANGEPERSECOND - MULTIPLIER: |
Network interfaces | OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Inbound IPv4 traffic blocked | MIB: BEGEMOT-PF-MIB IPv4 bits per second blocked coming in on this interface. |
SNMP | net.if.in.block.v4.bps[{#SNMPINDEX}] Preprocessing: - CHANGEPERSECOND - MULTIPLIER: |
Network interfaces | OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Outbound IPv4 traffic passed | MIB: BEGEMOT-PF-MIB IPv4 bits per second passed going out on this interface. |
SNMP | net.if.out.pass.v4.bps[{#SNMPINDEX}] Preprocessing: - CHANGEPERSECOND - MULTIPLIER: |
Network interfaces | OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Outbound IPv4 traffic blocked | MIB: BEGEMOT-PF-MIB IPv4 bits per second blocked going out on this interface. |
SNMP | net.if.out.block.v4.bps[{#SNMPINDEX}] Preprocessing: - CHANGEPERSECOND - MULTIPLIER: |
Network interfaces | OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Inbound IPv4 packets passed | MIB: BEGEMOT-PF-MIB The number of IPv4 packets passed coming in on this interface. |
SNMP | net.if.in.pass.v4.pps[{#SNMPINDEX}] Preprocessing: - CHANGEPERSECOND |
Network interfaces | OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Inbound IPv4 packets blocked | MIB: BEGEMOT-PF-MIB The number of IPv4 packets blocked coming in on this interface. |
SNMP | net.if.in.block.v4.pps[{#SNMPINDEX}] Preprocessing: - CHANGEPERSECOND |
Network interfaces | OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Outbound IPv4 packets passed | MIB: BEGEMOT-PF-MIB The number of IPv4 packets passed going out on this interface. |
SNMP | net.if.out.pass.v4.pps[{#SNMPINDEX}] Preprocessing: - CHANGEPERSECOND |
Network interfaces | OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Outbound IPv4 packets blocked | MIB: BEGEMOT-PF-MIB The number of IPv4 packets blocked going out on this interface. |
SNMP | net.if.out.block.v4.pps[{#SNMPINDEX}] Preprocessing: - CHANGEPERSECOND |
Network interfaces | OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Inbound IPv6 traffic passed | MIB: BEGEMOT-PF-MIB IPv6 bits per second passed coming in on this interface. |
SNMP | net.if.in.pass.v6.bps[{#SNMPINDEX}] Preprocessing: - CHANGEPERSECOND - MULTIPLIER: |
Network interfaces | OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Inbound IPv6 traffic blocked | MIB: BEGEMOT-PF-MIB IPv6 bits per second blocked coming in on this interface. |
SNMP | net.if.in.block.v6.bps[{#SNMPINDEX}] Preprocessing: - CHANGEPERSECOND - MULTIPLIER: |
Network interfaces | OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Outbound IPv6 traffic passed | MIB: BEGEMOT-PF-MIB IPv6 bits per second passed going out on this interface. |
SNMP | net.if.out.pass.v6.bps[{#SNMPINDEX}] Preprocessing: - CHANGEPERSECOND - MULTIPLIER: |
Network interfaces | OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Outbound IPv6 traffic blocked | MIB: BEGEMOT-PF-MIB IPv6 bits per second blocked going out on this interface. |
SNMP | net.if.out.block.v6.bps[{#SNMPINDEX}] Preprocessing: - CHANGEPERSECOND - MULTIPLIER: |
Network interfaces | OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Inbound IPv6 packets passed | MIB: BEGEMOT-PF-MIB The number of IPv6 packets passed coming in on this interface. |
SNMP | net.if.in.pass.v6.pps[{#SNMPINDEX}] Preprocessing: - CHANGEPERSECOND |
Network interfaces | OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Inbound IPv6 packets blocked | MIB: BEGEMOT-PF-MIB The number of IPv6 packets blocked coming in on this interface. |
SNMP | net.if.in.block.v6.pps[{#SNMPINDEX}] Preprocessing: - CHANGEPERSECOND |
Network interfaces | OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Outbound IPv6 packets passed | MIB: BEGEMOT-PF-MIB The number of IPv6 packets passed going out on this interface. |
SNMP | net.if.out.pass.v6.pps[{#SNMPINDEX}] Preprocessing: - CHANGEPERSECOND |
Network interfaces | OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Outbound IPv6 packets blocked | MIB: BEGEMOT-PF-MIB The number of IPv6 packets blocked going out on this interface. |
SNMP | net.if.out.block.v6.pps[{#SNMPINDEX}] Preprocessing: - CHANGEPERSECOND |
OPNsense | OPNsense: Packet filter running status | MIB: BEGEMOT-PF-MIB True if packet filter is currently enabled. |
SNMP | opnsense.pf.status |
OPNsense | OPNsense: States table current | MIB: BEGEMOT-PF-MIB Number of entries in the state table. |
SNMP | opnsense.state.table.count |
OPNsense | OPNsense: States table limit | MIB: BEGEMOT-PF-MIB Maximum number of 'keep state' rules in the ruleset. |
SNMP | opnsense.state.table.limit |
OPNsense | OPNsense: States table utilization in % | Utilization of state table in %. |
CALCULATED | opnsense.state.table.pused Expression: last(//opnsense.state.table.count) * 100 / last(//opnsense.state.table.limit) |
OPNsense | OPNsense: Source tracking table current | MIB: BEGEMOT-PF-MIB Number of entries in the source tracking table. |
SNMP | opnsense.source.tracking.table.count |
OPNsense | OPNsense: Source tracking table limit | MIB: BEGEMOT-PF-MIB Maximum number of 'sticky-address' or 'source-track' rules in the ruleset. |
SNMP | opnsense.source.tracking.table.limit |
OPNsense | OPNsense: Source tracking table utilization in % | Utilization of source tracking table in %. |
CALCULATED | opnsense.source.tracking.table.pused Expression: last(//opnsense.source.tracking.table.count) * 100 / last(//opnsense.source.tracking.table.limit) |
OPNsense | OPNsense: DHCP server status | MIB: HOST-RESOURCES-MIB The status of DHCP server process. |
SNMP | opnsense.dhcpd.status Preprocessing: - CHECK_NOT_SUPPORTED |
OPNsense | OPNsense: DNS server status | MIB: HOST-RESOURCES-MIB The status of DNS server process. |
SNMP | opnsense.dns.status Preprocessing: - CHECK_NOT_SUPPORTED |
OPNsense | OPNsense: Web server status | MIB: HOST-RESOURCES-MIB The status of lighttpd process. |
SNMP | opnsense.lighttpd.status Preprocessing: - CHECK_NOT_SUPPORTED |
OPNsense | OPNsense: Packets matched a filter rule | MIB: BEGEMOT-PF-MIB True if the packet was logged with the specified packet filter reason code. The known codes are: match, bad-offset, fragment, short, normalize, and memory. |
SNMP | opnsense.packets.match Preprocessing: - CHANGEPERSECOND |
OPNsense | OPNsense: Packets with bad offset | MIB: BEGEMOT-PF-MIB True if the packet was logged with the specified packet filter reason code. The known codes are: match, bad-offset, fragment, short, normalize, and memory. |
SNMP | opnsense.packets.bad.offset Preprocessing: - CHANGEPERSECOND |
OPNsense | OPNsense: Fragmented packets | MIB: BEGEMOT-PF-MIB True if the packet was logged with the specified packet filter reason code. The known codes are: match, bad-offset, fragment, short, normalize, and memory. |
SNMP | opnsense.packets.fragment Preprocessing: - CHANGEPERSECOND |
OPNsense | OPNsense: Short packets | MIB: BEGEMOT-PF-MIB True if the packet was logged with the specified packet filter reason code. The known codes are: match, bad-offset, fragment, short, normalize, and memory. |
SNMP | opnsense.packets.short Preprocessing: - CHANGEPERSECOND |
OPNsense | OPNsense: Normalized packets | MIB: BEGEMOT-PF-MIB True if the packet was logged with the specified packet filter reason code. The known codes are: match, bad-offset, fragment, short, normalize, and memory. |
SNMP | opnsense.packets.normalize Preprocessing: - CHANGEPERSECOND |
OPNsense | OPNsense: Packets dropped due to memory limitation | MIB: BEGEMOT-PF-MIB True if the packet was logged with the specified packet filter reason code. The known codes are: match, bad-offset, fragment, short, normalize, and memory. |
SNMP | opnsense.packets.mem.drop Preprocessing: - CHANGEPERSECOND |
OPNsense | OPNsense: Firewall rules count | MIB: BEGEMOT-PF-MIB The number of labeled filter rules on this system. |
SNMP | opnsense.rules.count |
Status | OPNsense: SNMP agent availability | Availability of SNMP checks on the host. The value of this item corresponds to the availability icons in the host list. Possible values: 0 - not available, 1 - available, 2 - unknown. |
INTERNAL | zabbix[host,snmp,available] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
OPNsense: Interface [{#IFNAME}({#IFALIAS})]: High input error rate | Recovers when below 80% of {$IF.ERRORS.WARN:"{#IFNAME}"} threshold. |
min(/OPNsense by SNMP/net.if.in.errors[{#SNMPINDEX}],5m)>{$IF.ERRORS.WARN:"{#IFNAME}"} Recovery expression: max(/OPNsense by SNMP/net.if.in.errors[{#SNMPINDEX}],5m)<{$IF.ERRORS.WARN:"{#IFNAME}"}*0.8 |
WARNING | Depends on: - OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Link down |
OPNsense: Interface [{#IFNAME}({#IFALIAS})]: High inbound bandwidth usage | The network interface utilization is close to its estimated maximum bandwidth. |
(avg(/OPNsense by SNMP/net.if.in[{#SNMPINDEX}],15m)>({$IF.UTIL.MAX:"{#IFNAME}"}/100)*last(/OPNsense by SNMP/net.if.speed[{#SNMPINDEX}])) and last(/OPNsense by SNMP/net.if.speed[{#SNMPINDEX}])>0 Recovery expression: avg(/OPNsense by SNMP/net.if.in[{#SNMPINDEX}],15m)<(({$IF.UTIL.MAX:"{#IFNAME}"}-3)/100)*last(/OPNsense by SNMP/net.if.speed[{#SNMPINDEX}]) |
WARNING | Depends on: - OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Link down |
OPNsense: Interface [{#IFNAME}({#IFALIAS})]: High output error rate | Recovers when below 80% of {$IF.ERRORS.WARN:"{#IFNAME}"} threshold. |
min(/OPNsense by SNMP/net.if.out.errors[{#SNMPINDEX}],5m)>{$IF.ERRORS.WARN:"{#IFNAME}"} Recovery expression: max(/OPNsense by SNMP/net.if.out.errors[{#SNMPINDEX}],5m)<{$IF.ERRORS.WARN:"{#IFNAME}"}*0.8 |
WARNING | Depends on: - OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Link down |
OPNsense: Interface [{#IFNAME}({#IFALIAS})]: High outbound bandwidth usage | The network interface utilization is close to its estimated maximum bandwidth. |
(avg(/OPNsense by SNMP/net.if.out[{#SNMPINDEX}],15m)>({$IF.UTIL.MAX:"{#IFNAME}"}/100)*last(/OPNsense by SNMP/net.if.speed[{#SNMPINDEX}])) and last(/OPNsense by SNMP/net.if.speed[{#SNMPINDEX}])>0 Recovery expression: avg(/OPNsense by SNMP/net.if.out[{#SNMPINDEX}],15m)<(({$IF.UTIL.MAX:"{#IFNAME}"}-3)/100)*last(/OPNsense by SNMP/net.if.speed[{#SNMPINDEX}]) |
WARNING | Depends on: - OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Link down |
OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Ethernet has changed to lower speed than it was before | This Ethernet connection has transitioned down from its known maximum speed. This might be a sign of autonegotiation issues. Ack to close. |
change(/OPNsense by SNMP/net.if.speed[{#SNMPINDEX}])<0 and last(/OPNsense by SNMP/net.if.speed[{#SNMPINDEX}])>0 and ( last(/OPNsense by SNMP/net.if.type[{#SNMPINDEX}])=6 or last(/OPNsense by SNMP/net.if.type[{#SNMPINDEX}])=7 or last(/OPNsense by SNMP/net.if.type[{#SNMPINDEX}])=11 or last(/OPNsense by SNMP/net.if.type[{#SNMPINDEX}])=62 or last(/OPNsense by SNMP/net.if.type[{#SNMPINDEX}])=69 or last(/OPNsense by SNMP/net.if.type[{#SNMPINDEX}])=117 ) and (last(/OPNsense by SNMP/net.if.status[{#SNMPINDEX}])<>2) Recovery expression: (change(/OPNsense by SNMP/net.if.speed[{#SNMPINDEX}])>0 and last(/OPNsense by SNMP/net.if.speed[{#SNMPINDEX}],#2)>0) or (last(/OPNsense by SNMP/net.if.status[{#SNMPINDEX}])=2) |
INFO | Depends on: - OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Link down |
OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Link down | This trigger expression works as follows: 1. It can be triggered if the operational status is down. 2. {$IFCONTROL:"{#IFNAME}"}=1 - the user can redefine the context macro to the value 0, which marks this interface as not important; no new trigger will be fired if this interface is down. |
{$IFCONTROL:"{#IFNAME}"}=1 and (last(/OPNsense by SNMP/net.if.status[{#SNMPINDEX}])=2) |
AVERAGE | |
OPNsense: Packet filter is not running | Please check PF status. |
last(/OPNsense by SNMP/opnsense.pf.status)<>1 |
HIGH | |
OPNsense: State table usage is high | Please check the number of connections. |
min(/OPNsense by SNMP/opnsense.state.table.pused,#3)>{$STATE.TABLE.UTIL.MAX} |
WARNING | |
OPNsense: Source tracking table usage is high | Please check the number of sticky connections. |
min(/OPNsense by SNMP/opnsense.source.tracking.table.pused,#3)>{$SOURCE.TRACKING.TABLE.UTIL.MAX} |
WARNING | |
OPNsense: DHCP server is not running | Please check DHCP server settings. |
last(/OPNsense by SNMP/opnsense.dhcpd.status)=0 |
AVERAGE | |
OPNsense: DNS server is not running | Please check DNS server settings. |
last(/OPNsense by SNMP/opnsense.dns.status)=0 |
AVERAGE | |
OPNsense: Web server is not running | Please check lighttpd service status. |
last(/OPNsense by SNMP/opnsense.lighttpd.status)=0 |
AVERAGE | |
OPNsense: No SNMP data collection | SNMP is not available for polling. Please check device connectivity and SNMP settings. |
max(/OPNsense by SNMP/zabbix[host,snmp,available],{$SNMP.TIMEOUT})=0 |
WARNING |
Please report any issues with the template at https://support.zabbix.com.
For Zabbix version: 6.2 and higher
Get weather metrics from OpenWeatherMap current weather API by HTTP.
It works without any external scripts and uses the Script item.
See Zabbix template operation for basic instructions.
Create a host.
Link the template to the host.
Customize the values of {$OPENWEATHERMAP.API.TOKEN} and {$LOCATION} macros.
OpenWeatherMap API Tokens are available in your OpenWeatherMap account https://home.openweathermap.org/api_keys.
Locations can be set in a few ways:
1. By geo coordinates (for example: 56.95,24.0833)
2. By location name (for example: Riga)
3. By location ID; the list of city IDs is available at http://bulk.openweathermap.org/sample/city.list.json.gz
4. By zip/post code with a country code (for example: 94040,us)
Several locations can be added to the macro at the same time, separated by the | delimiter. For example: 43.81821,7.76115|Riga|2643743|94040,us
Please note that API requests by city name, zip code and city ID will be deprecated soon. Language and units macros can be customized too if necessary. List of available languages: https://openweathermap.org/current#multi. Available units of measurement are: standard, metric and imperial https://openweathermap.org/current#data.
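For reference, a minimal sketch of the request that the Script item performs for each location, assuming a valid API key. It is simplified: every entry is sent via the q parameter (a city name), whereas the template's script also detects coordinates, city IDs and zip codes:

```python
import json
import urllib.parse
import urllib.request

API_ENDPOINT = "https://api.openweathermap.org/data/2.5/weather"
API_TOKEN = "YOUR_API_KEY"    # {$OPENWEATHERMAP.API.TOKEN}
LOCATIONS = "Riga|Berlin,de"  # {$LOCATION}, entries separated by the | delimiter

for location in LOCATIONS.split("|"):
    params = urllib.parse.urlencode(
        {"q": location, "appid": API_TOKEN, "units": "metric", "lang": "en"}
    )
    with urllib.request.urlopen(f"{API_ENDPOINT}?{params}", timeout=3) as resp:
        data = json.load(resp)
    print(location, data["main"]["temp"], data["weather"][0]["description"])
```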
No specific Zabbix configuration is required.
Name | Description | Default |
---|---|---|
{$LANG} | List of available languages https://openweathermap.org/current#multi. |
en |
{$LOCATION} | Locations can be set in a few ways: 1. by geo coordinates (for example: 56.95,24.0833) 2. by location name (for example: Riga) 3. by location ID; link to the list of city IDs: http://bulk.openweathermap.org/sample/city.list.json.gz 4. by zip/post code with a country code (for example: 94040,us). A few locations can be added to the macro at the same time, separated by the | delimiter. For example: 43.81821,7.76115|Riga|2643743|94040,us. Please note that API requests by city name, zip code and city ID will be deprecated soon. |
Riga |
{$OPENWEATHERMAP.API.ENDPOINT} | OpenWeatherMap API endpoint. |
api.openweathermap.org/data/2.5/weather? |
{$OPENWEATHERMAP.API.TOKEN} | Specify openweathermap API key. |
`` |
{$OPENWEATHERMAP.DATA.TIMEOUT} | Response timeout for OpenWeatherMap API. |
3s |
{$TEMP.CRIT.HIGH} | Threshold for high temperature trigger. |
30 |
{$TEMP.CRIT.LOW} | Threshold for low temperature trigger. |
-20 |
{$UNITS} | Available units of measurement are standard, metric and imperial https://openweathermap.org/current#data. |
metric |
There are no template links in this template.
Name | Description | Type | Key and additional info |
---|---|---|---|
Locations discovery | Weather metrics discovery by location. |
DEPENDENT | openweathermap.locations.discovery Preprocessing: - JSONPATH: - NOTMATCHESREGEX: - DISCARDUNCHANGEDHEARTBEAT: |
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
OpenWeatherMap | [{#LOCATION}, {#COUNTRY}]: Data | JSON with result of OpenWeatherMap API request by location. |
DEPENDENT | openweathermap.location.data[{#ID}] Preprocessing: - JSONPATH: |
OpenWeatherMap | [{#LOCATION}, {#COUNTRY}]: Atmospheric pressure | Atmospheric pressure in Pa. |
DEPENDENT | openweathermap.pressure[{#ID}] Preprocessing: - JSONPATH: - MULTIPLIER: - DISCARDUNCHANGEDHEARTBEAT: |
OpenWeatherMap | [{#LOCATION}, {#COUNTRY}]: Cloudiness | Cloudiness in %. |
DEPENDENT | openweathermap.clouds[{#ID}] Preprocessing: - JSONPATH: - DISCARDUNCHANGEDHEARTBEAT: |
OpenWeatherMap | [{#LOCATION}, {#COUNTRY}]: Humidity | Humidity in %. |
DEPENDENT | openweathermap.humidity[{#ID}] Preprocessing: - JSONPATH: - DISCARDUNCHANGEDHEARTBEAT: |
OpenWeatherMap | [{#LOCATION}, {#COUNTRY}]: Rain volume for the last one hour | Rain volume for the last one hour in m. |
DEPENDENT | openweathermap.rain[{#ID}] Preprocessing: - JSONPATH: ⛔️ONFAIL: - MULTIPLIER: - DISCARDUNCHANGED_HEARTBEAT: |
OpenWeatherMap | [{#LOCATION}, {#COUNTRY}]: Short weather status | Short weather status description. |
DEPENDENT | openweathermap.description[{#ID}] Preprocessing: - JSONPATH: - DISCARDUNCHANGEDHEARTBEAT: |
OpenWeatherMap | [{#LOCATION}, {#COUNTRY}]: Snow volume for the last one hour | Snow volume for the last one hour in m. |
DEPENDENT | openweathermap.snow[{#ID}] Preprocessing: - JSONPATH: ⛔️ONFAIL: - MULTIPLIER: - DISCARDUNCHANGED_HEARTBEAT: |
OpenWeatherMap | [{#LOCATION}, {#COUNTRY}]: Temperature | Atmospheric temperature value. |
DEPENDENT | openweathermap.temp[{#ID}] Preprocessing: - JSONPATH: - DISCARDUNCHANGEDHEARTBEAT: |
OpenWeatherMap | [{#LOCATION}, {#COUNTRY}]: Visibility | Visibility in m. |
DEPENDENT | openweathermap.visibility[{#ID}] Preprocessing: - JSONPATH: - DISCARDUNCHANGEDHEARTBEAT: |
OpenWeatherMap | [{#LOCATION}, {#COUNTRY}]: Wind direction | Wind direction in degrees. |
DEPENDENT | openweathermap.wind.direction[{#ID}] Preprocessing: - JSONPATH: - DISCARDUNCHANGEDHEARTBEAT: |
OpenWeatherMap | [{#LOCATION}, {#COUNTRY}]: Wind speed | Wind speed value. |
DEPENDENT | openweathermap.wind.speed[{#ID}] Preprocessing: - JSONPATH: - DISCARDUNCHANGEDHEARTBEAT: |
Zabbix raw items | Openweathermap: Get data | JSON array with result of OpenWeatherMap API requests. |
SCRIPT | openweathermap.get.data Expression: The text is too long. Please see the template. |
Zabbix raw items | Openweathermap: Get data collection errors | Errors from get data requests by script item. |
DEPENDENT | openweathermap.get.errors Preprocessing: - JSONPATH: - DISCARDUNCHANGEDHEARTBEAT: |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
[{#LOCATION}, {#COUNTRY}]: Temperature is too high | Temperature value is too high. |
min(/OpenWeatherMap by HTTP/openweathermap.temp[{#ID}],#3)>{$TEMP.CRIT.HIGH} |
AVERAGE | Manual close: YES |
[{#LOCATION}, {#COUNTRY}]: Temperature is too low | Temperature value is too low. |
max(/OpenWeatherMap by HTTP/openweathermap.temp[{#ID}],#3)<{$TEMP.CRIT.LOW} |
AVERAGE | Manual close: YES |
Openweathermap: There are errors in requests to OpenWeatherMap API | Zabbix has received errors in requests to OpenWeatherMap API. |
length(last(/OpenWeatherMap by HTTP/openweathermap.get.errors))>0 |
AVERAGE | Manual close: YES |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template or ask for help with it at ZABBIX forums.
For Zabbix version: 6.2 and higher
Refer to the vendor documentation.
No specific Zabbix configuration is required.
There are no template links in this template.
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
Services | NTP service is running | - |
SIMPLE | net.udp.service[ntp] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
NTP service is down on {HOST.NAME} | - |
max(/NTP Service/net.udp.service[ntp],#3)=0 |
AVERAGE |
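For reference, the net.udp.service[ntp] simple check amounts to sending a minimal SNTP client packet to UDP port 123 and accepting any well-formed reply; a rough sketch (an approximation, not Zabbix's exact implementation, and the target host is hypothetical):

```python
import socket

def ntp_service_up(host, timeout=3.0):
    """Rough equivalent of net.udp.service[ntp]: send an SNTP v3 client
    request and report whether any plausible reply arrives."""
    packet = b"\x1b" + 47 * b"\0"  # LI=0, VN=3, Mode=3 (client)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.settimeout(timeout)
        try:
            s.sendto(packet, (host, 123))
            reply, _ = s.recvfrom(512)
        except OSError:
            return False
    return len(reply) >= 48

print(ntp_service_up("pool.ntp.org"))  # hypothetical target host
```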
Please report any issues with the template at https://support.zabbix.com
For Zabbix version: 6.2 and higher
Refer to the vendor documentation.
No specific Zabbix configuration is required.
There are no template links in this template.
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
Services | NNTP service is running | - |
SIMPLE | net.tcp.service[nntp] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
NNTP service is down on {HOST.NAME} | - |
max(/NNTP Service/net.tcp.service[nntp],#3)=0 |
AVERAGE |
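Similarly, net.tcp.service[nntp] boils down to a TCP connection to port 119 followed by a check for a 200/201 greeting from the news server; a rough sketch (an approximation of the check; the host name is hypothetical):

```python
import socket

def nntp_service_up(host, port=119, timeout=3.0):
    """Rough equivalent of net.tcp.service[nntp]: connect and expect a
    '200'/'201' greeting line from the news server."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as s:
            s.settimeout(timeout)
            greeting = s.recv(256).decode("ascii", errors="replace")
    except OSError:
        return False
    return greeting.startswith(("200", "201"))

print(nntp_service_up("news.example.com"))  # hypothetical target host
```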
Please report any issues with the template at https://support.zabbix.com
For Zabbix version: 6.2 and higher. This template is designed to monitor NGINX Plus by Zabbix that works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The monitoring data of the live activity is generated by the NGINX Plus API.
This template has been tested on:
See Zabbix template operation for basic instructions.
Set the {$NGINX.API.ENDPOINT} macro to the NGINX Plus API URL in the format <scheme>://<host>:<port>/<location>/
Note that, depending on the number of zones and upstreams, the discovery operation may be expensive. Therefore, use the filters with the LLD macros listed in the table below.
No specific Zabbix configuration is required.
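Before linking the template, it can help to query the API manually and confirm that the endpoint answers; a sketch, assuming the api module is enabled and that the version prefix (here /api/8) and host match your installation:

```python
import json
import urllib.request

# {$NGINX.API.ENDPOINT}, e.g. http://nginx.example.com:8080/api/8 (hypothetical host)
API = "http://nginx.example.com:8080/api/8"

for path in ("nginx", "connections", "http/server_zones", "http/upstreams"):
    with urllib.request.urlopen(f"{API}/{path}", timeout=3) as resp:
        print(path, "->", json.load(resp))
```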
Name | Description | Default |
---|---|---|
{$NGINX.API.ENDPOINT} | NGINX Plus API URL in the format <scheme>://<host>:<port>/<location>/ |
`` |
{$NGINX.DROP_RATE.MAX.WARN} | The critical rate of the dropped connections for a trigger expression. |
1 |
{$NGINX.HTTP.UPSTREAM.4XX.MAX.WARN} | The maximum percentage of errors with the status code 4xx (for a trigger expression). |
5 |
{$NGINX.HTTP.UPSTREAM.5XX.MAX.WARN} | The maximum percentage of errors with the status code 5xx (for a trigger expression). |
5 |
{$NGINX.LLD.FILTER.HTTP.LOCATION.ZONE.MATCHES} | The filter to include the necessary discovered HTTP location zones. |
.* |
{$NGINX.LLD.FILTER.HTTP.LOCATION.ZONE.NOT_MATCHES} | The filter to exclude discovered HTTP location zones. |
CHANGE_IF_NEEDED |
{$NGINX.LLD.FILTER.HTTP.UPSTREAM.MATCHES} | The filter to include the necessary discovered HTTP upstreams. |
.* |
{$NGINX.LLD.FILTER.HTTP.UPSTREAM.NOT_MATCHES} | The filter to exclude discovered HTTP upstreams. |
CHANGE_IF_NEEDED |
{$NGINX.LLD.FILTER.HTTP.ZONE.MATCHES} | The filter to include the necessary discovered HTTP server zones. |
.* |
{$NGINX.LLD.FILTER.HTTP.ZONE.NOT_MATCHES} | The filter to exclude discovered HTTP server zones. |
CHANGE_IF_NEEDED |
{$NGINX.LLD.FILTER.RESOLVER.MATCHES} | The filter to include the necessary discovered resolvers. |
.* |
{$NGINX.LLD.FILTER.RESOLVER.NOT_MATCHES} | The filter to exclude discovered resolvers. |
CHANGE_IF_NEEDED |
{$NGINX.LLD.FILTER.STREAM.UPSTREAM.MATCHES} | The filter to include the necessary discovered upstreams of the "stream" directive. |
.* |
{$NGINX.LLD.FILTER.STREAM.UPSTREAM.NOT_MATCHES} | The filter to exclude discovered upstreams of the "stream" directive |
CHANGE_IF_NEEDED |
{$NGINX.LLD.FILTER.STREAM.ZONE.MATCHES} | The filter to include discovered server zones of the "stream" directive. |
.* |
{$NGINX.LLD.FILTER.STREAM.ZONE.NOT_MATCHES} | The filter to exclude discovered server zones of the "stream" directive. |
CHANGE_IF_NEEDED |
There are no template links in this template.
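The upstream 4xx/5xx warning macros above are compared against an error percentage derived from the per-second response rates of a peer; a worked sketch with hypothetical rates:

```python
# Share of 5xx responses for an upstream peer (hypothetical per-second rates).
responses_5xx_rate = 0.4    # nginx.http.upstream.peer.responses.5xx.rate
responses_total_rate = 6.0  # nginx.http.upstream.peer.responses.total.rate

pct_5xx = responses_5xx_rate * 100 / responses_total_rate
print(f"{pct_5xx:.1f}%")    # 6.7% -> above the default {$NGINX.HTTP.UPSTREAM.5XX.MAX.WARN} of 5
```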
Name | Description | Type | Key and additional info |
---|---|---|---|
HTTP location zones discovery | - |
DEPENDENT | nginx.http.location_zones.discovery Preprocessing: - JAVASCRIPT: - DISCARD_UNCHANGED_HEARTBEAT: 30m Filter: AND - {#NAME} MATCHES_REGEX {$NGINX.LLD.FILTER.HTTP.LOCATION.ZONE.MATCHES} - {#NAME} NOT_MATCHES_REGEX {$NGINX.LLD.FILTER.HTTP.LOCATION.ZONE.NOT_MATCHES} |
HTTP server zones discovery | - |
DEPENDENT | nginx.http.server_zones.discovery Preprocessing: - JAVASCRIPT: - DISCARD_UNCHANGED_HEARTBEAT: 30m Filter: AND - {#NAME} MATCHES_REGEX {$NGINX.LLD.FILTER.HTTP.ZONE.MATCHES} - {#NAME} NOT_MATCHES_REGEX {$NGINX.LLD.FILTER.HTTP.ZONE.NOT_MATCHES} |
HTTP upstream peers discovery | - |
DEPENDENT | nginx.http.upstream.peers.discovery Preprocessing: - JAVASCRIPT: - DISCARD_UNCHANGED_HEARTBEAT: Filter: AND - {#UPSTREAM} MATCHES_REGEX {$NGINX.LLD.FILTER.HTTP.UPSTREAM.MATCHES} - {#UPSTREAM} NOT_MATCHES_REGEX {$NGINX.LLD.FILTER.HTTP.UPSTREAM.NOT_MATCHES} |
HTTP upstreams discovery | - |
DEPENDENT | nginx.http.upstreams.discovery Preprocessing: - JAVASCRIPT: - DISCARD_UNCHANGED_HEARTBEAT: Filter: AND - {#NAME} MATCHES_REGEX {$NGINX.LLD.FILTER.HTTP.UPSTREAM.MATCHES} - {#NAME} NOT_MATCHES_REGEX {$NGINX.LLD.FILTER.HTTP.UPSTREAM.NOT_MATCHES} |
Resolvers discovery | - |
DEPENDENT | nginx.resolvers.discovery Preprocessing: - JAVASCRIPT: - DISCARD_UNCHANGED_HEARTBEAT: Filter: AND - {#NAME} MATCHES_REGEX {$NGINX.LLD.FILTER.RESOLVER.MATCHES} - {#NAME} NOT_MATCHES_REGEX {$NGINX.LLD.FILTER.RESOLVER.NOT_MATCHES} |
Stream server zones discovery | - |
DEPENDENT | nginx.stream.server_zones.discovery Preprocessing: - JAVASCRIPT: - DISCARD_UNCHANGED_HEARTBEAT: 30m Filter: AND - {#NAME} MATCHES_REGEX {$NGINX.LLD.FILTER.STREAM.ZONE.MATCHES} - {#NAME} NOT_MATCHES_REGEX {$NGINX.LLD.FILTER.STREAM.ZONE.NOT_MATCHES} |
Stream upstream peers discovery | - |
DEPENDENT | nginx.stream.upstream.peers.discovery Preprocessing: - JAVASCRIPT: - DISCARD_UNCHANGED_HEARTBEAT: Filter: AND - {#UPSTREAM} MATCHES_REGEX {$NGINX.LLD.FILTER.STREAM.UPSTREAM.MATCHES} - {#UPSTREAM} NOT_MATCHES_REGEX {$NGINX.LLD.FILTER.STREAM.UPSTREAM.NOT_MATCHES} |
Stream upstreams discovery | - |
DEPENDENT | nginx.stream.upstreams.discovery Preprocessing: - JAVASCRIPT: - DISCARD_UNCHANGED_HEARTBEAT: Filter: AND - {#NAME} MATCHES_REGEX {$NGINX.LLD.FILTER.STREAM.UPSTREAM.MATCHES} - {#NAME} NOT_MATCHES_REGEX {$NGINX.LLD.FILTER.STREAM.UPSTREAM.NOT_MATCHES} |
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
Nginx | Nginx: Get info error | The description of NGINX errors. |
DEPENDENT | nginx.info.error Preprocessing: - JSONPATH: ⛔️ONFAIL: - DISCARDUNCHANGED_HEARTBEAT: |
Nginx | Nginx: Version | A version number of NGINX. |
DEPENDENT | nginx.info.version Preprocessing: - JSONPATH: - DISCARDUNCHANGEDHEARTBEAT: |
Nginx | Nginx: Address | The address of the server that accepted status request. |
DEPENDENT | nginx.info.address Preprocessing: - JSONPATH: - DISCARDUNCHANGEDHEARTBEAT: |
Nginx | Nginx: Generation | The total number of configuration reloads. |
DEPENDENT | nginx.info.generation Preprocessing: - JSONPATH: - DISCARDUNCHANGEDHEARTBEAT: |
Nginx | Nginx: Uptime | The server uptime. |
DEPENDENT | nginx.info.uptime Preprocessing: - JSONPATH: ⛔️ON_FAIL: - JAVASCRIPT: |
Nginx | Nginx: Connections accepted, rate | The total number of accepted client connections per second. |
DEPENDENT | nginx.connections.accepted.rate Preprocessing: - JSONPATH: - CHANGEPERSECOND |
Nginx | Nginx: Connections dropped | The total number of dropped client connections. |
DEPENDENT | nginx.connections.dropped Preprocessing: - JSONPATH: |
Nginx | Nginx: Connections active | The current number of active client connections. |
DEPENDENT | nginx.connections.active Preprocessing: - JSONPATH: |
Nginx | Nginx: Connections idle | The current number of idle client connections. |
DEPENDENT | nginx.connections.idle Preprocessing: - JSONPATH: |
Nginx | Nginx: SSL handshakes, rate | The total number of successful SSL handshakes per second. |
DEPENDENT | nginx.ssl.handshakes.rate Preprocessing: - JSONPATH: - CHANGEPERSECOND |
Nginx | Nginx: SSL handshakes failed, rate | The total number of failed SSL handshakes per second. |
DEPENDENT | nginx.ssl.handshakesfailed.rate Preprocessing: - JSONPATH: - CHANGE PER_SECOND |
Nginx | Nginx: SSL session reuses, rate | The total number of session reuses during SSL handshake per second. |
DEPENDENT | nginx.ssl.sessionreuses.rate Preprocessing: - JSONPATH: - CHANGE PER_SECOND |
Nginx | Nginx: Requests total, rate | The total number of client requests per second. |
DEPENDENT | nginx.requests.total.rate Preprocessing: - JSONPATH: - CHANGEPERSECOND |
Nginx | Nginx: Requests current | The current number of client requests. |
DEPENDENT | nginx.requests.current Preprocessing: - JSONPATH: |
Nginx | Nginx: HTTP server zone [{#NAME}]: Raw data | The raw data of the HTTP server zone with the name {#NAME}. |
DEPENDENT | nginx.http.server_zones.raw[{#NAME}] Preprocessing: - JSONPATH: |
Nginx | Nginx: HTTP server zone [{#NAME}]: Processing | The number of client requests that are currently being processed. |
DEPENDENT | nginx.http.server_zones.processing[{#NAME}] Preprocessing: - JSONPATH: |
Nginx | Nginx: HTTP server zone [{#NAME}]: Requests, rate | The total number of client requests received from clients per second. |
DEPENDENT | nginx.http.serverzones.requests.rate[{#NAME}] Preprocessing: - JSONPATH: - CHANGE PER_SECOND |
Nginx | Nginx: HTTP server zone [{#NAME}]: Responses 1xx, rate | The number of responses with 1xx status codes per second. |
DEPENDENT | nginx.http.serverzones.responses.1xx.rate[{#NAME}] Preprocessing: - JSONPATH: - CHANGE PER_SECOND |
Nginx | Nginx: HTTP server zone [{#NAME}]: Responses 2xx, rate | The number of responses with 2xx status codes per second. |
DEPENDENT | nginx.http.serverzones.responses.2xx.rate[{#NAME}] Preprocessing: - JSONPATH: - CHANGE PER_SECOND |
Nginx | Nginx: HTTP server zone [{#NAME}]: Responses 3xx, rate | The number of responses with 3xx status codes per second. |
DEPENDENT | nginx.http.serverzones.responses.3xx.rate[{#NAME}] Preprocessing: - JSONPATH: - CHANGE PER_SECOND |
Nginx | Nginx: HTTP server zone [{#NAME}]: Responses 4xx, rate | The number of responses with 4xx status codes per second. |
DEPENDENT | nginx.http.serverzones.responses.4xx.rate[{#NAME}] Preprocessing: - JSONPATH: - CHANGE PER_SECOND |
Nginx | Nginx: HTTP server zone [{#NAME}]: Responses 5xx, rate | The number of responses with 5xx status codes per second. |
DEPENDENT | nginx.http.serverzones.responses.5xx.rate[{#NAME}] Preprocessing: - JSONPATH: - CHANGE PER_SECOND |
Nginx | Nginx: HTTP server zone [{#NAME}]: Responses total, rate | The total number of responses sent to clients per second. |
DEPENDENT | nginx.http.serverzones.responses.total.rate[{#NAME}] Preprocessing: - JSONPATH: - CHANGE PER_SECOND |
Nginx | Nginx: HTTP server zone [{#NAME}]: Discarded, rate | The total number of requests completed without sending a response per second. |
DEPENDENT | nginx.http.serverzones.discarded.rate[{#NAME}] Preprocessing: - JSONPATH: - CHANGE PER_SECOND |
Nginx | Nginx: HTTP server zone [{#NAME}]: Received, rate | The total number of bytes received from clients per second. |
DEPENDENT | nginx.http.serverzones.received.rate[{#NAME}] Preprocessing: - JSONPATH: - CHANGE PER_SECOND |
Nginx | Nginx: HTTP server zone [{#NAME}]: Sent, rate | The total number of bytes sent to clients per second. |
DEPENDENT | nginx.http.serverzones.sent.rate[{#NAME}] Preprocessing: - JSONPATH: - CHANGE PER_SECOND |
Nginx | Nginx: HTTP location zone [{#NAME}]: Raw data | The raw data of the location zone with the name {#NAME}. |
DEPENDENT | nginx.http.location_zones.raw[{#NAME}] Preprocessing: - JSONPATH: |
Nginx | Nginx: HTTP location zone [{#NAME}]: Requests, rate | The total number of client requests received from clients per second. |
DEPENDENT | nginx.http.locationzones.requests.rate[{#NAME}] Preprocessing: - JSONPATH: - CHANGE PER_SECOND |
Nginx | Nginx: HTTP location zone [{#NAME}]: Responses 1xx, rate | The number of responses with 1xx status codes per second. |
DEPENDENT | nginx.http.locationzones.responses.1xx.rate[{#NAME}] Preprocessing: - JSONPATH: - CHANGE PER_SECOND |
Nginx | Nginx: HTTP location zone [{#NAME}]: Responses 2xx, rate | The number of responses with 2xx status codes per second. |
DEPENDENT | nginx.http.locationzones.responses.2xx.rate[{#NAME}] Preprocessing: - JSONPATH: - CHANGE PER_SECOND |
Nginx | Nginx: HTTP location zone [{#NAME}]: Responses 3xx, rate | The number of responses with 3xx status codes per second. |
DEPENDENT | nginx.http.locationzones.responses.3xx.rate[{#NAME}] Preprocessing: - JSONPATH: - CHANGE PER_SECOND |
Nginx | Nginx: HTTP location zone [{#NAME}]: Responses 4xx, rate | The number of responses with 4xx status codes per second. |
DEPENDENT | nginx.http.locationzones.responses.4xx.rate[{#NAME}] Preprocessing: - JSONPATH: - CHANGE PER_SECOND |
Nginx | Nginx: HTTP location zone [{#NAME}]: Responses 5xx, rate | The number of responses with 5xx status codes per second. |
DEPENDENT | nginx.http.locationzones.responses.5xx.rate[{#NAME}] Preprocessing: - JSONPATH: - CHANGE PER_SECOND |
Nginx | Nginx: HTTP location zone [{#NAME}]: Responses total, rate | The total number of responses sent to clients per second. |
DEPENDENT | nginx.http.locationzones.responses.total.rate[{#NAME}] Preprocessing: - JSONPATH: - CHANGE PER_SECOND |
Nginx | Nginx: HTTP location zone [{#NAME}]: Discarded, rate | The total number of requests completed without sending a response per second. |
DEPENDENT | nginx.http.locationzones.discarded.rate[{#NAME}] Preprocessing: - JSONPATH: - CHANGE PER_SECOND |
Nginx | Nginx: HTTP location zone [{#NAME}]: Received, rate | The total number of bytes received from clients per second. |
DEPENDENT | nginx.http.locationzones.received.rate[{#NAME}] Preprocessing: - JSONPATH: - CHANGE PER_SECOND |
Nginx | Nginx: HTTP location zone [{#NAME}]: Sent, rate | The total number of bytes sent to clients per second. |
DEPENDENT | nginx.http.locationzones.sent.rate[{#NAME}] Preprocessing: - JSONPATH: - CHANGE PER_SECOND |
Nginx | Nginx: HTTP upstream [{#NAME}]: Raw data | The raw data of the HTTP upstream with the name {#NAME}. |
DEPENDENT | nginx.http.upstreams.raw[{#NAME}] Preprocessing: - JSONPATH: |
Nginx | Nginx: HTTP upstream [{#NAME}]: Keepalive | The current number of idle keepalive connections. |
DEPENDENT | nginx.http.upstreams.keepalive[{#NAME}] Preprocessing: - JSONPATH: |
Nginx | Nginx: HTTP upstream [{#NAME}]: Zombies | The current number of servers removed from the group but still processing active client requests. |
DEPENDENT | nginx.http.upstreams.zombies[{#NAME}] Preprocessing: - JSONPATH: |
Nginx | Nginx: HTTP upstream [{#NAME}]: Zone | The name of the shared memory zone that keeps the group's configuration and run-time state. |
DEPENDENT | nginx.http.upstreams.zone[{#NAME}] Preprocessing: - JSONPATH: - DISCARDUNCHANGEDHEARTBEAT: |
Nginx | Nginx: HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Raw data | The raw data of the HTTP upstream with the name {#UPSTREAM} and its peer {#PEER}. |
DEPENDENT | nginx.http.upstream.peer.raw[{#UPSTREAM},{#PEER}] Preprocessing: - JSONPATH: |
Nginx | Nginx: HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: State | The current state, which may be one of “up”, “draining”, “down”, “unavail”, “checking”, and “unhealthy”. |
DEPENDENT | nginx.http.upstream.peer.state[{#UPSTREAM},{#PEER}] Preprocessing: - JSONPATH: - DISCARDUNCHANGEDHEARTBEAT: |
Nginx | Nginx: HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Active | The current number of active connections. |
DEPENDENT | nginx.http.upstream.peer.active[{#UPSTREAM},{#PEER}] Preprocessing: - JSONPATH: |
Nginx | Nginx: HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Requests, rate | The total number of client requests forwarded to this server per second. |
DEPENDENT | nginx.http.upstream.peer.requests.rate[{#UPSTREAM},{#PEER}] Preprocessing: - JSONPATH: - CHANGEPERSECOND |
Nginx | Nginx: HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Responses 1xx, rate | The number of responses with 1xx status codes per second. |
DEPENDENT | nginx.http.upstream.peer.responses.1xx.rate[{#UPSTREAM},{#PEER}] Preprocessing: - JSONPATH: - CHANGEPERSECOND |
Nginx | Nginx: HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Responses 2xx, rate | The number of responses with 2xx status codes per second. |
DEPENDENT | nginx.http.upstream.peer.responses.2xx.rate[{#UPSTREAM},{#PEER}] Preprocessing: - JSONPATH: - CHANGEPERSECOND |
Nginx | Nginx: HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Responses 3xx, rate | The number of responses with 3xx status codes per second. |
DEPENDENT | nginx.http.upstream.peer.responses.3xx.rate[{#UPSTREAM},{#PEER}] Preprocessing: - JSONPATH: - CHANGEPERSECOND |
Nginx | Nginx: HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Responses 4xx, rate | The number of responses with 4xx status codes per second. |
DEPENDENT | nginx.http.upstream.peer.responses.4xx.rate[{#UPSTREAM},{#PEER}] Preprocessing: - JSONPATH: - CHANGEPERSECOND |
Nginx | Nginx: HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Responses 5xx, rate | The number of responses with 5xx status codes per second. |
DEPENDENT | nginx.http.upstream.peer.responses.5xx.rate[{#UPSTREAM},{#PEER}] Preprocessing: - JSONPATH: - CHANGEPERSECOND |
Nginx | Nginx: HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Responses total, rate | The total number of responses obtained from this server. |
DEPENDENT | nginx.http.upstream.peer.responses.total.rate[{#UPSTREAM},{#PEER}] Preprocessing: - JSONPATH: - CHANGEPERSECOND |
Nginx | Nginx: HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Sent, rate | The total number of bytes sent to this server per second. |
DEPENDENT | nginx.http.upstream.peer.sent.rate[{#UPSTREAM},{#PEER}] Preprocessing: - JSONPATH: - CHANGEPERSECOND |
Nginx | Nginx: HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Received, rate | The total number of bytes received from this server per second. |
DEPENDENT | nginx.http.upstream.peer.received.rate[{#UPSTREAM},{#PEER}] Preprocessing: - JSONPATH: - CHANGEPERSECOND |
Nginx | Nginx: HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Fails, rate | The total number of unsuccessful attempts to communicate with the server per second. |
DEPENDENT | nginx.http.upstream.peer.fails.rate[{#UPSTREAM},{#PEER}] Preprocessing: - JSONPATH: - CHANGEPERSECOND |
Nginx | Nginx: HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Unavail | Displays how many times the server has become unavailable for client requests (the state - “unavail”) due to the number of unsuccessful attempts reaching the max_fails threshold. |
DEPENDENT | nginx.http.upstream.peer.unavail.rate[{#UPSTREAM},{#PEER}] Preprocessing: - JSONPATH: |
Nginx | Nginx: HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Header time | The average time to get the response header from the server. |
DEPENDENT | nginx.http.upstream.peer.headertime.rate[{#UPSTREAM},{#PEER}] Preprocessing: - JSONPATH: ⛔️ON_FAIL: DISCARD_VALUE |
Nginx | Nginx: HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Response time | The average time to get the full response from the server. |
DEPENDENT | nginx.http.upstream.peer.responsetime.rate[{#UPSTREAM},{#PEER}] Preprocessing: - JSONPATH: ⛔️ON_FAIL: DISCARD_VALUE |
Nginx | Nginx: HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Health checks, check | The total number of health check requests made. |
DEPENDENT | nginx.http.upstream.peer.health_checks.checks[{#UPSTREAM},{#PEER}] Preprocessing: - JSONPATH: |
Nginx | Nginx: HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Health checks, fails | The number of failed health checks. |
DEPENDENT | nginx.http.upstream.peer.health_checks.fails[{#UPSTREAM},{#PEER}] Preprocessing: - JSONPATH: |
Nginx | Nginx: HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Health checks, unhealthy | Displays how many times the server has become unhealthy (the state - “unhealthy”). |
DEPENDENT | nginx.http.upstream.peer.health_checks.unhealthy[{#UPSTREAM},{#PEER}] Preprocessing: - JSONPATH: |
Nginx | Nginx: Stream server zone [{#NAME}]: Raw data | The raw data of the server zone with the name {#NAME}. |
DEPENDENT | nginx.stream.server_zones.raw[{#NAME}] Preprocessing: - JSONPATH: |
Nginx | Nginx: Stream server zone [{#NAME}]: Processing | The number of client connections that are currently being processed. |
DEPENDENT | nginx.stream.server_zones.processing[{#NAME}] Preprocessing: - JSONPATH: |
Nginx | Nginx: Stream server zone [{#NAME}]: Connections, rate | The total number of connections accepted from clients per second. |
DEPENDENT | nginx.stream.serverzones.connections.rate[{#NAME}] Preprocessing: - JSONPATH: - CHANGE PER_SECOND |
Nginx | Nginx: Stream server zone [{#NAME}]: Sessions 2xx, rate | The total number of sessions completed with status code 2xx per second. |
DEPENDENT | nginx.stream.serverzones.sessions.2xx.rate[{#NAME}] Preprocessing: - JSONPATH: - CHANGE PER_SECOND |
Nginx | Nginx: Stream server zone [{#NAME}]: Sessions 4xx, rate | The total number of sessions completed with status code 4xx per second. |
DEPENDENT | nginx.stream.serverzones.sessions.4xx.rate[{#NAME}] Preprocessing: - JSONPATH: - CHANGE PER_SECOND |
Nginx | Nginx: Stream server zone [{#NAME}]: Sessions 5xx, rate | The total number of sessions completed with status code 5xx per second. |
DEPENDENT | nginx.stream.serverzones.sessions.5xx.rate[{#NAME}] Preprocessing: - JSONPATH: - CHANGE PER_SECOND |
Nginx | Nginx: Stream server zone [{#NAME}]: Sessions total, rate | The total number of completed client sessions per second. |
DEPENDENT | nginx.stream.serverzones.sessions.total.rate[{#NAME}] Preprocessing: - JSONPATH: - CHANGE PER_SECOND |
Nginx | Nginx: Stream server zone [{#NAME}]: Discarded, rate | The total number of connections completed without creating a session per second. |
DEPENDENT | nginx.stream.serverzones.discarded.rate[{#NAME}] Preprocessing: - JSONPATH: - CHANGE PER_SECOND |
Nginx | Nginx: Stream server zone [{#NAME}]: Received, rate | The total number of bytes received from clients per second. |
DEPENDENT | nginx.stream.serverzones.received.rate[{#NAME}] Preprocessing: - JSONPATH: - CHANGE PER_SECOND |
Nginx | Nginx: Stream server zone [{#NAME}]: Sent, rate | The total number of bytes sent to clients per second. |
DEPENDENT | nginx.stream.serverzones.sent.rate[{#NAME}] Preprocessing: - JSONPATH: - CHANGE PER_SECOND |
Nginx | Nginx: Stream upstream [{#NAME}]: Raw data | The raw data of the upstream with the name {#NAME}. |
DEPENDENT | nginx.stream.upstreams.raw[{#NAME}] Preprocessing: - JSONPATH: |
Nginx | Nginx: Stream upstream [{#NAME}]: Zombies | - |
DEPENDENT | nginx.stream.upstreams.zombies[{#NAME}] Preprocessing: - JSONPATH: |
Nginx | Nginx: Stream upstream [{#NAME}]: Zone | - |
DEPENDENT | nginx.stream.upstreams.zone[{#NAME}] Preprocessing: - JSONPATH: - DISCARDUNCHANGEDHEARTBEAT: |
Nginx | Nginx: Stream upstream [{#UPSTREAM}] peer [{#PEER}]: Raw data | The raw data of the upstream with the name {#UPSTREAM} and its peer {#PEER}. |
DEPENDENT | nginx.stream.upstream.peer.raw[{#UPSTREAM},{#PEER}] Preprocessing: - JSONPATH: |
Nginx | Nginx: Stream upstream [{#UPSTREAM}] peer [{#PEER}]: State | The current state, which may be one of “up”, “draining”, “down”, “unavail”, “checking”, and “unhealthy”. |
DEPENDENT | nginx.stream.upstream.peer.state[{#UPSTREAM},{#PEER}] Preprocessing: - JSONPATH: - DISCARDUNCHANGEDHEARTBEAT: |
Nginx | Nginx: Stream upstream [{#UPSTREAM}] peer [{#PEER}]: Active | The current number of connections. |
DEPENDENT | nginx.stream.upstream.peer.active[{#UPSTREAM},{#PEER}] Preprocessing: - JSONPATH: |
Nginx | Nginx: Stream upstream [{#UPSTREAM}] peer [{#PEER}]: Sent, rate | The total number of bytes sent to this server per second. |
DEPENDENT | nginx.stream.upstream.peer.sent.rate[{#UPSTREAM},{#PEER}] Preprocessing: - JSONPATH: - CHANGEPERSECOND |
Nginx | Nginx: Stream upstream [{#UPSTREAM}] peer [{#PEER}]: Received, rate | The total number of bytes received from this server per second. |
DEPENDENT | nginx.stream.upstream.peer.received.rate[{#UPSTREAM},{#PEER}] Preprocessing: - JSONPATH: - CHANGEPERSECOND |
Nginx | Nginx: Stream upstream [{#UPSTREAM}] peer [{#PEER}]: Fails, rate | The total number of unsuccessful attempts to communicate with the server per second. |
DEPENDENT | nginx.stream.upstream.peer.fails.rate[{#UPSTREAM},{#PEER}] Preprocessing: - JSONPATH: - CHANGEPERSECOND |
Nginx | Nginx: Stream upstream [{#UPSTREAM}] peer [{#PEER}]: Unavail | Displays how many times the server has become unavailable for client requests (the state - “unavail”) due to the number of unsuccessful attempts reaching the max_fails threshold. |
DEPENDENT | nginx.stream.upstream.peer.unavail.rate[{#UPSTREAM},{#PEER}] Preprocessing: - JSONPATH: |
Nginx | Nginx: Stream upstream [{#UPSTREAM}] peer [{#PEER}]: Connections | The total number of client connections forwarded to this server. |
DEPENDENT | nginx.stream.upstream.peer.connections.rate[{#UPSTREAM},{#PEER}] Preprocessing: - JSONPATH: |
Nginx | Nginx: Stream upstream [{#UPSTREAM}] peer [{#PEER}]: Connect time | The average time to connect to the upstream server. |
DEPENDENT | nginx.stream.upstream.peer.connecttime.rate[{#UPSTREAM},{#PEER}] Preprocessing: - JSONPATH: ⛔️ON FAIL:DISCARD_VALUE -> |
Nginx | Nginx: Stream upstream [{#UPSTREAM}] peer [{#PEER}]: First byte time | The average time to receive the first byte of data. |
DEPENDENT | nginx.stream.upstream.peer.firstbytetime.rate[{#UPSTREAM},{#PEER}] Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
Nginx | Nginx: Stream upstream [{#UPSTREAM}] peer [{#PEER}]: Response time | The average time to receive the last byte of data. |
DEPENDENT | nginx.stream.upstream.peer.responsetime.rate[{#UPSTREAM},{#PEER}] Preprocessing: - JSONPATH: ⛔️ON FAIL:DISCARD_VALUE -> |
Nginx | Nginx: Stream upstream [{#UPSTREAM}] peer [{#PEER}]: Health checks, check | The total number of health check requests made. |
DEPENDENT | nginx.stream.upstream.peer.health_checks.checks[{#UPSTREAM},{#PEER}] Preprocessing: - JSONPATH: |
Nginx | Nginx: Stream upstream [{#UPSTREAM}] peer [{#PEER}]: Health checks, fails | The number of failed health checks. |
DEPENDENT | nginx.stream.upstream.peer.health_checks.fails[{#UPSTREAM},{#PEER}] Preprocessing: - JSONPATH: |
Nginx | Nginx: Stream upstream [{#UPSTREAM}] peer [{#PEER}]: Health checks, unhealthy | Displays how many times the server has become unhealthy (the state “unhealthy”). |
DEPENDENT | nginx.stream.upstream.peer.health_checks.unhealthy[{#UPSTREAM},{#PEER}] Preprocessing: - JSONPATH: |
Nginx | Nginx: Resolver [{#NAME}]: Raw data | The raw data of the resolver with the name {#NAME}. |
DEPENDENT | nginx.resolvers.raw[{#NAME}] Preprocessing: - JSONPATH: |
Nginx | Nginx: Resolver [{#NAME}]: Requests name, rate | The total number of requests to resolve names to addresses per second. |
DEPENDENT | nginx.resolvers.requests.name.rate[{#NAME}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
Nginx | Nginx: Resolver [{#NAME}]: Requests srv, rate | The total number of requests to resolve SRV records per second. |
DEPENDENT | nginx.resolvers.requests.srv.rate[{#NAME}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
Nginx | Nginx: Resolver [{#NAME}]: Requests addr, rate | The total number of requests to resolve addresses to names per second. |
DEPENDENT | nginx.resolvers.requests.addr.rate[{#NAME}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
Nginx | Nginx: Resolver [{#NAME}]: Responses noerror, rate | The total number of successful responses per second. |
DEPENDENT | nginx.resolvers.responses.noerror.rate[{#NAME}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
Nginx | Nginx: Resolver [{#NAME}]: Responses formerr, rate | The total number of FORMERR (format error) responses per second. |
DEPENDENT | nginx.resolvers.responses.formerr.rate[{#NAME}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
Nginx | Nginx: Resolver [{#NAME}]: Responses servfail, rate | The total number of SERVFAIL (server failure) responses per second. |
DEPENDENT | nginx.resolvers.responses.servfail.rate[{#NAME}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
Nginx | Nginx: Resolver [{#NAME}]: Responses nxdomain, rate | The total number of NXDOMAIN (host not found) responses per second. |
DEPENDENT | nginx.resolvers.responses.nxdomain.rate[{#NAME}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
Nginx | Nginx: Resolver [{#NAME}]: Responses notimp, rate | The total number of NOTIMP (unimplemented) responses per second. |
DEPENDENT | nginx.resolvers.responses.notimp.rate[{#NAME}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
Nginx | Nginx: Resolver [{#NAME}]: Responses refused, rate | The total number of REFUSED (operation refused) responses per second. |
DEPENDENT | nginx.resolvers.responses.refused.rate[{#NAME}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
Nginx | Nginx: Resolver [{#NAME}]: Responses timedout, rate | The total number of timed out requests per second. |
DEPENDENT | nginx.resolvers.responses.timedout.rate[{#NAME}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
Nginx | Nginx: Resolver [{#NAME}]: Responses unknown, rate | The total number of requests completed with an unknown error per second. |
DEPENDENT | nginx.resolvers.responses.unknown.rate[{#NAME}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
Zabbix raw items | Nginx: Get info | Return status of the NGINX running instance. |
HTTP_AGENT | nginx.info |
Zabbix raw items | Nginx: Get connections | Returns the statistics of client connections. |
HTTP_AGENT | nginx.connections |
Zabbix raw items | Nginx: Get SSL | Returns the SSL statistics. |
HTTP_AGENT | nginx.ssl |
Zabbix raw items | Nginx: Get requests | Returns the status of the client's HTTP requests. |
HTTP_AGENT | nginx.requests |
Zabbix raw items | Nginx: Get HTTP zones | Returns the status information for each HTTP server zone. |
HTTP_AGENT | nginx.http.server_zones |
Zabbix raw items | Nginx: Get HTTP location zones | Returns the status information for each HTTP location zone. |
HTTP_AGENT | nginx.http.location_zones |
Zabbix raw items | Nginx: Get HTTP upstreams | Returns the status of each HTTP upstream server group and its servers. |
HTTP_AGENT | nginx.http.upstreams |
Zabbix raw items | Nginx: Get Stream server zones | Returns the status information for each server zone configured in the "stream" directive. |
HTTP_AGENT | nginx.stream.server_zones |
Zabbix raw items | Nginx: Get Stream upstreams | Returns the status of each stream upstream server group and its servers. |
HTTP_AGENT | nginx.stream.upstreams |
Zabbix raw items | Nginx: Get resolvers | Returns the status information for each Resolver zone. |
HTTP_AGENT | nginx.resolvers |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Nginx: Server response error | - |
length(last(/NGINX Plus by HTTP/nginx.info.error))>0 |
HIGH | |
Nginx: Version has changed | Nginx version has changed. Acknowledge to close manually. |
last(/NGINX Plus by HTTP/nginx.info.version,#1)<>last(/NGINX Plus by HTTP/nginx.info.version,#2) and length(last(/NGINX Plus by HTTP/nginx.info.version))>0 |
INFO | Manual close: YES |
Nginx: Host has been restarted | The host uptime is less than 10 minutes. |
last(/NGINX Plus by HTTP/nginx.info.uptime)<10m |
INFO | Manual close: YES |
Nginx: Failed to fetch info data | Zabbix has not received any data for metrics for the last 30 minutes |
nodata(/NGINX Plus by HTTP/nginx.info.uptime,30m)=1 |
WARNING | Manual close: YES |
Nginx: High connections drop rate | The rate of dropped connections is greater than {$NGINX.DROP_RATE.MAX.WARN} for the last 5 minutes. |
min(/NGINX Plus by HTTP/nginx.connections.dropped,5m) > {$NGINX.DROP_RATE.MAX.WARN} |
WARNING | |
Nginx: HTTP upstream server is not in UP or DOWN state. | - |
find(/NGINX Plus by HTTP/nginx.http.upstream.peer.state[{#UPSTREAM},{#PEER}],,"like","up")=0 and find(/NGINX Plus by HTTP/nginx.http.upstream.peer.state[{#UPSTREAM},{#PEER}],,"like","down")=0 |
WARNING | |
Nginx: Too many HTTP requests with code 4xx | - |
sum(/NGINX Plus by HTTP/nginx.http.upstream.peer.responses.4xx.rate[{#UPSTREAM},{#PEER}],5m) > (sum(/NGINX Plus by HTTP/nginx.http.upstream.peer.responses.total.rate[{#UPSTREAM},{#PEER}],5m)*({$NGINX.HTTP.UPSTREAM.4XX.MAX.WARN}/100)) |
WARNING | |
Nginx: Too many HTTP requests with code 5xx | - |
sum(/NGINX Plus by HTTP/nginx.http.upstream.peer.responses.5xx.rate[{#UPSTREAM},{#PEER}],5m) > (sum(/NGINX Plus by HTTP/nginx.http.upstream.peer.responses.total.rate[{#UPSTREAM},{#PEER}],5m)*({$NGINX.HTTP.UPSTREAM.5XX.MAX.WARN}/100)) |
HIGH | |
Nginx: Stream upstream server is not in UP or DOWN state. | - |
find(/NGINX Plus by HTTP/nginx.stream.upstream.peer.state[{#UPSTREAM},{#PEER}],,"like","up")=0 and find(/NGINX Plus by HTTP/nginx.stream.upstream.peer.state[{#UPSTREAM},{#PEER}],,"like","down")=0 |
WARNING |
Please report any issues with the template at https://support.zabbix.com.
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums.
For Zabbix version: 6.2 and higher
The template to monitor Nginx by Zabbix that works without any external scripts.
Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
Template Nginx by HTTP collects metrics by polling ngx_http_stub_status_module with HTTP agent remotely:
Active connections: 291
server accepts handled requests
16630948 16630948 31070465
Reading: 6 Writing: 179 Waiting: 106
Note that this solution supports https and redirects.
This template was tested on:
See Zabbix template operation for basic instructions.
Set up ngx_http_stub_status_module.
Test availability of the http_stub_status_module with nginx -V 2>&1 | grep -o with-http_stub_status_module.
Example configuration of Nginx:
location = /basic_status {
stub_status;
allow <IP of your Zabbix server/proxy>;
deny all;
}
If you use another location, don't forget to change {$NGINX.STUB_STATUS.PATH} macro.
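To confirm the page is reachable before linking the template, you can request it from the Zabbix server or proxy; a minimal check assuming the default macro values and a placeholder hostname:
curl http://<nginx-host>:80/basic_status
A working endpoint returns the Active connections / accepts / handled / requests counters shown above.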
No specific Zabbix configuration is required.
Name | Description | Default |
---|---|---|
{$NGINX.DROP_RATE.MAX.WARN} | The critical rate of the dropped connections for trigger expression. |
1 |
{$NGINX.RESPONSE_TIME.MAX.WARN} | The Nginx maximum response time in seconds for trigger expression. |
10 |
{$NGINX.STUB_STATUS.PATH} | The path of Nginx stub_status page. |
basic_status |
{$NGINX.STUB_STATUS.PORT} | The port of Nginx stub_status host or container. |
80 |
{$NGINX.STUB_STATUS.SCHEME} | The protocol http or https of Nginx stub_status host or container. |
http |
There are no template links in this template.
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
Nginx | Nginx: Service status | - |
SIMPLE | net.tcp.service[http,"{HOST.CONN}","{$NGINX.STUB_STATUS.PORT}"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: 10m |
Nginx | Nginx: Service response time | - |
SIMPLE | net.tcp.service.perf[http,"{HOST.CONN}","{$NGINX.STUB_STATUS.PORT}"] |
Nginx | Nginx: Requests total | The total number of client requests. |
DEPENDENT | nginx.requests.total Preprocessing: - REGEX: |
Nginx | Nginx: Requests per second | The total number of client requests. |
DEPENDENT | nginx.requests.total.rate Preprocessing: - REGEX: - CHANGE_PER_SECOND |
Nginx | Nginx: Connections accepted per second | The total number of accepted client connections. |
DEPENDENT | nginx.connections.accepted.rate Preprocessing: - REGEX: - CHANGE_PER_SECOND |
Nginx | Nginx: Connections dropped per second | The total number of dropped client connections. |
DEPENDENT | nginx.connections.dropped.rate Preprocessing: - JAVASCRIPT: - CHANGE_PER_SECOND |
Nginx | Nginx: Connections handled per second | The total number of handled connections. Generally, the parameter value is the same as accepts unless some resource limits have been reached (for example, the worker_connections limit). |
DEPENDENT | nginx.connections.handled.rate Preprocessing: - REGEX: - CHANGE_PER_SECOND |
Nginx | Nginx: Connections active | The current number of active client connections including Waiting connections. |
DEPENDENT | nginx.connections.active Preprocessing: - REGEX: |
Nginx | Nginx: Connections reading | The current number of connections where nginx is reading the request header. |
DEPENDENT | nginx.connections.reading Preprocessing: - REGEX: |
Nginx | Nginx: Connections waiting | The current number of idle client connections waiting for a request. |
DEPENDENT | nginx.connections.waiting Preprocessing: - REGEX: |
Nginx | Nginx: Connections writing | The current number of connections where nginx is writing the response back to the client. |
DEPENDENT | nginx.connections.writing Preprocessing: - REGEX: |
Nginx | Nginx: Version | - |
DEPENDENT | nginx.version Preprocessing: - REGEX: - DISCARD_UNCHANGED_HEARTBEAT: |
Zabbix raw items | Nginx: Get stub status page | The following status information is provided: Active connections - the current number of active client connections including Waiting connections. Accepts - the total number of accepted client connections. Handled - the total number of handled connections. Generally, the parameter value is the same as accepts unless some resource limits have been reached (for example, the worker_connections limit). Requests - the total number of client requests. Reading - the current number of connections where nginx is reading the request header. Writing - the current number of connections where nginx is writing the response back to the client. Waiting - the current number of idle client connections waiting for a request. https://nginx.org/en/docs/http/ngx_http_stub_status_module.html |
HTTP_AGENT | nginx.get_stub_status |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Nginx: Service is down | - |
last(/Nginx by HTTP/net.tcp.service[http,"{HOST.CONN}","{$NGINX.STUB_STATUS.PORT}"])=0 |
AVERAGE | Manual close: YES |
Nginx: Service response time is too high | - |
min(/Nginx by HTTP/net.tcp.service.perf[http,"{HOST.CONN}","{$NGINX.STUB_STATUS.PORT}"],5m)>{$NGINX.RESPONSE_TIME.MAX.WARN} |
WARNING | Manual close: YES Depends on: - Nginx: Service is down |
Nginx: High connections drop rate | The rate of dropped connections is greater than {$NGINX.DROP_RATE.MAX.WARN} for the last 5 minutes. |
min(/Nginx by HTTP/nginx.connections.dropped.rate,5m) > {$NGINX.DROP_RATE.MAX.WARN} |
WARNING | Depends on: - Nginx: Service is down |
Nginx: Version has changed | Nginx version has changed. Acknowledge to close manually. |
last(/Nginx by HTTP/nginx.version,#1)<>last(/Nginx by HTTP/nginx.version,#2) and length(last(/Nginx by HTTP/nginx.version))>0 |
INFO | Manual close: YES |
Nginx: Failed to fetch stub status page | Zabbix has not received data for items for the last 30 minutes. |
find(/Nginx by HTTP/nginx.get_stub_status,,"like","HTTP/1.1 200")=0 or nodata(/Nginx by HTTP/nginx.get_stub_status,30m)=1 |
WARNING | Manual close: YES Depends on: - Nginx: Service is down |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template or ask for help with it at ZABBIX forums.
For Zabbix version: 6.2 and higher. This template is developed to monitor Nginx by Zabbix that works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The template Nginx by Zabbix agent collects metrics by polling the module ngx_http_stub_status_module locally with Zabbix agent:
Active connections: 291
server accepts handled requests
16630948 16630948 31070465
Reading: 6 Writing: 179 Waiting: 106
Note that this template doesn't support HTTPS and redirects (limitations of web.page.get).
It also uses Zabbix agent to collect Nginx
Linux process statistics, such as CPU usage, memory usage and whether the process is running or not.
This template was tested on:
See Zabbix template operation for basic instructions.
See the setup instructions for ngx_http_stub_status_module.
Test the availability of the http_stub_status_module with nginx -V 2>&1 | grep -o with-http_stub_status_module.
Example configuration of Nginx:
location = /basic_status {
stub_status;
allow 127.0.0.1;
allow ::1;
deny all;
}
If you use another location, then don't forget to change the {$NGINX.STUB_STATUS.PATH} macro. Install and setup Zabbix agent.
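Since metrics are collected locally, you can verify the page from the Nginx host itself; a minimal check assuming the default macro values:
curl http://localhost:80/basic_status
This is essentially the request that the web.page.get item key performs through the Zabbix agent (plain HTTP only, as noted above).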
No specific Zabbix configuration is required.
Name | Description | Default |
---|---|---|
{$NGINX.DROP_RATE.MAX.WARN} | The critical rate of the dropped connections for a trigger expression. |
1 |
{$NGINX.PROCESS_NAME} | The process name of the Nginx server. |
nginx |
{$NGINX.RESPONSE_TIME.MAX.WARN} | The maximum response time of Nginx expressed in seconds for a trigger expression. |
10 |
{$NGINX.STUB_STATUS.HOST} | The hostname or an IP address of the Nginx host or Nginx container of stub_status. |
localhost |
{$NGINX.STUB_STATUS.PATH} | The path of the stub_status page. |
basic_status |
{$NGINX.STUB_STATUS.PORT} | The port of the stub_status host or container. |
80 |
There are no template links in this template.
Name | Description | Type | Key and additional info |
---|---|---|---|
Nginx process discovery | The discovery of Nginx process summary. |
DEPENDENT | nginx.proc.discovery Filter: AND - {#NAME} MATCHES_REGEX |
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
Nginx | Nginx: Service status | - |
ZABBIX_PASSIVE | net.tcp.service[http,"{$NGINX.STUB_STATUS.HOST}","{$NGINX.STUB_STATUS.PORT}"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
Nginx | Nginx: Service response time | - |
ZABBIX_PASSIVE | net.tcp.service.perf[http,"{$NGINX.STUB_STATUS.HOST}","{$NGINX.STUB_STATUS.PORT}"] |
Nginx | Nginx: Requests total | The total number of client requests. |
DEPENDENT | nginx.requests.total Preprocessing: - REGEX: |
Nginx | Nginx: Requests per second | The total number of client requests. |
DEPENDENT | nginx.requests.total.rate Preprocessing: - REGEX: - CHANGE_PER_SECOND |
Nginx | Nginx: Connections accepted per second | The total number of accepted client connections. |
DEPENDENT | nginx.connections.accepted.rate Preprocessing: - REGEX: - CHANGE_PER_SECOND |
Nginx | Nginx: Connections dropped per second | The total number of dropped client connections. |
DEPENDENT | nginx.connections.dropped.rate Preprocessing: - JAVASCRIPT: - CHANGE_PER_SECOND |
Nginx | Nginx: Connections handled per second | The total number of handled connections. Generally, the parameter value is the same as for the accepted connections, unless some resource limits have been reached (for example, the worker_connections limit). |
DEPENDENT | nginx.connections.handled.rate Preprocessing: - REGEX: - CHANGE_PER_SECOND |
Nginx | Nginx: Connections active | The current number of active client connections including waiting connections. |
DEPENDENT | nginx.connections.active Preprocessing: - REGEX: |
Nginx | Nginx: Connections reading | The current number of connections where Nginx is reading the request header. |
DEPENDENT | nginx.connections.reading Preprocessing: - REGEX: |
Nginx | Nginx: Connections waiting | The current number of idle client connections waiting for a request. |
DEPENDENT | nginx.connections.waiting Preprocessing: - REGEX: |
Nginx | Nginx: Connections writing | The current number of connections where Nginx is writing a response back to the client. |
DEPENDENT | nginx.connections.writing Preprocessing: - REGEX: |
Nginx | Nginx: Get processes summary | The aggregated data of summary metrics for all processes. |
ZABBIX_PASSIVE | proc.get[,,,summary] |
Nginx | Nginx: Version | - |
DEPENDENT | nginx.version Preprocessing: - REGEX: - DISCARD_UNCHANGED_HEARTBEAT: |
Nginx | Nginx: CPU utilization | The percentage of the CPU utilization by a process {#NAME}. |
ZABBIX_PASSIVE | proc.cpu.util[{#NAME}] |
Nginx | Nginx: Get process data | The summary metrics aggregated by a process {#NAME}. |
DEPENDENT | nginx.proc.get[{#NAME}] Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
Nginx | Nginx: Memory usage (vsize) | The summary of virtual memory used by a process {#NAME} expressed in bytes. |
DEPENDENT | nginx.proc.vmem[{#NAME}] Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
Nginx | Nginx: Memory usage (rss) | The summary of resident set size memory used by a process {#NAME} expressed in bytes. |
DEPENDENT | nginx.proc.rss[{#NAME}] Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
Nginx | Nginx: Memory usage, % | The percentage of real memory used by a process {#NAME}. |
DEPENDENT | nginx.proc.pmem[{#NAME}] Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
Nginx | Nginx: Number of running processes | The number of running processes {#NAME}. |
DEPENDENT | nginx.proc.num[{#NAME}] Preprocessing: - JSONPATH: ⛔️ON_FAIL: - DISCARD_UNCHANGED_HEARTBEAT: |
Zabbix raw items | Nginx: Get stub status page | The following status information is provided: See also Module ngx_http_stub_status_module. |
ZABBIX_PASSIVE | web.page.get["{$NGINX.STUB_STATUS.HOST}","{$NGINX.STUB_STATUS.PATH}","{$NGINX.STUB_STATUS.PORT}"] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Nginx: Version has changed | The Nginx version has changed. Acknowledge (Ack) to close manually. |
last(/Nginx by Zabbix agent/nginx.version,#1)<>last(/Nginx by Zabbix agent/nginx.version,#2) and length(last(/Nginx by Zabbix agent/nginx.version))>0 |
INFO | Manual close: YES |
Nginx: Process is not running | - |
last(/Nginx by Zabbix agent/nginx.proc.num[{#NAME}])=0 |
HIGH | |
Nginx: Service is down | - |
last(/Nginx by Zabbix agent/net.tcp.service[http,"{$NGINX.STUB_STATUS.HOST}","{$NGINX.STUB_STATUS.PORT}"])=0 and last(/Nginx by Zabbix agent/nginx.proc.num[{#NAME}])>0 |
AVERAGE | Manual close: YES |
Nginx: High connections drop rate | The rate of dropping connections has been greater than {$NGINX.DROP_RATE.MAX.WARN} for the last 5 minutes. |
min(/Nginx by Zabbix agent/nginx.connections.dropped.rate,5m) > {$NGINX.DROP_RATE.MAX.WARN} and last(/Nginx by Zabbix agent/nginx.proc.num[{#NAME}])>0 |
WARNING | Depends on: - Nginx: Service is down |
Nginx: Service response time is too high | - |
min(/Nginx by Zabbix agent/net.tcp.service.perf[http,"{$NGINX.STUB_STATUS.HOST}","{$NGINX.STUB_STATUS.PORT}"],5m)>{$NGINX.RESPONSE_TIME.MAX.WARN} and last(/Nginx by Zabbix agent/nginx.proc.num[{#NAME}])>0 |
WARNING | Manual close: YES Depends on: - Nginx: Service is down |
Nginx: Failed to fetch stub status page | Zabbix has not received any data for items for the last 30 minutes. |
(find(/Nginx by Zabbix agent/web.page.get["{$NGINX.STUB_STATUS.HOST}","{$NGINX.STUB_STATUS.PATH}","{$NGINX.STUB_STATUS.PORT}"],,"like","HTTP/1.1 200")=0 or nodata(/Nginx by Zabbix agent/web.page.get["{$NGINX.STUB_STATUS.HOST}","{$NGINX.STUB_STATUS.PATH}","{$NGINX.STUB_STATUS.PORT}"],30m)) and last(/Nginx by Zabbix agent/nginx.proc.num[{#NAME}])>0 |
WARNING | Manual close: YES Depends on: - Nginx: Service is down |
Please report any issues with the template at https://support.zabbix.com.
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums.
For Zabbix version: 6.2 and higher
The template to monitor Memcached server by Zabbix that works without any external scripts.
Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
Template Memcached by Zabbix agent 2
— collects metrics by polling zabbix-agent2.
This template was tested on:
See Zabbix template operation for basic instructions.
Setup and configure zabbix-agent2 compiled with the Memcached monitoring plugin.
Test availability: zabbix_get -s memcached-host -k memcached.ping
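If your Memcached instance does not listen on the default socket, you can either point the plugin at it in the zabbix_agent2 configuration file or override the {$MEMCACHED.CONN.URI} macro; an illustrative value for the parameter described in the Macros section below:
Plugins.Memcached.Uri=tcp://memcached-host:11211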
No specific Zabbix configuration is required.
Name | Description | Default |
---|---|---|
{$MEMCACHED.CONN.PRC.MAX.WARN} | Maximum percentage of connected clients |
80 |
{$MEMCACHED.CONN.QUEUED.MAX.WARN} | Maximum number of queued connections per second |
1 |
{$MEMCACHED.CONN.THROTTLED.MAX.WARN} | Maximum number of throttled connections per second |
1 |
{$MEMCACHED.CONN.URI} | Connection string in the URI format (password is not used). This param overwrites a value configured in the "Plugins.Memcached.Uri" option of the configuration file (if it's set), otherwise, the plugin's default value is used: "tcp://localhost:11211" |
tcp://localhost:11211 |
{$MEMCACHED.MEM.PUSED.MAX.WARN} | Maximum percentage of memory used |
90 |
There are no template links in this template.
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
Memcached | Memcached: Ping | - | ZABBIX_PASSIVE | memcached.ping["{$MEMCACHED.CONN.URI}"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
Memcached | Memcached: Max connections | Max number of concurrent connections |
DEPENDENT | memcached.connections.max Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
Memcached | Memcached: Maximum number of bytes | Maximum number of bytes allowed in cache. You can adjust this setting via a config file or the command line while starting your Memcached server. |
DEPENDENT | memcached.config.limit_maxbytes Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 30m |
Memcached | Memcached: CPU sys | System CPU consumed by the Memcached server |
DEPENDENT | memcached.cpu.sys Preprocessing: - JSONPATH: |
Memcached | Memcached: CPU user | User CPU consumed by the Memcached server |
DEPENDENT | memcached.cpu.user Preprocessing: - JSONPATH: |
Memcached | Memcached: Queued connections per second | Number of times that memcached has hit its connections limit and disabled its listener |
DEPENDENT | memcached.connections.queued.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
Memcached | Memcached: New connections per second | Number of connections opened per second |
DEPENDENT | memcached.connections.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
Memcached | Memcached: Throttled connections | Number of times a client connection was throttled. When sending GETs in batch mode and the connection contains too many requests (limited by -R parameter) the connection might be throttled to prevent starvation. |
DEPENDENT | memcached.connections.throttled.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
Memcached | Memcached: Connection structures | Number of connection structures allocated by the server |
DEPENDENT | memcached.connections.structures Preprocessing: - JSONPATH: |
Memcached | Memcached: Open connections | The number of clients presently connected |
DEPENDENT | memcached.connections.current Preprocessing: - JSONPATH: |
Memcached | Memcached: Commands: FLUSH per second | The flush_all command invalidates all items in the database. This operation incurs a performance penalty and shouldn't take place in production, so check your debug scripts. |
DEPENDENT | memcached.commands.flush.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
Memcached | Memcached: Commands: GET per second | Number of GET requests received by server per second. |
DEPENDENT | memcached.commands.get.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
Memcached | Memcached: Commands: SET per second | Number of SET requests received by server per second. |
DEPENDENT | memcached.commands.set.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
Memcached | Memcached: Process id | PID of the server process |
DEPENDENT | memcached.process_id Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 1d |
Memcached | Memcached: Memcached version | Version of the Memcached server |
DEPENDENT | memcached.version Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
Memcached | Memcached: Uptime | Number of seconds since Memcached server start |
DEPENDENT | memcached.uptime Preprocessing: - JSONPATH: |
Memcached | Memcached: Bytes used | Current number of bytes used to store items. |
DEPENDENT | memcached.stats.bytes Preprocessing: - JSONPATH: |
Memcached | Memcached: Written bytes per second | The network's write rate per second in B/sec |
DEPENDENT | memcached.stats.bytes_written.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
Memcached | Memcached: Read bytes per second | The network's read rate per second in B/sec |
DEPENDENT | memcached.stats.bytes_read.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
Memcached | Memcached: Hits per second | Number of successful GET requests (items requested and found) per second. |
DEPENDENT | memcached.stats.hits.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
Memcached | Memcached: Misses per second | Number of missed GET requests (items requested but not found) per second. |
DEPENDENT | memcached.stats.misses.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
Memcached | Memcached: Evictions per second | "An eviction is when an item that still has time to live is removed from the cache because a brand new item needs to be allocated. The item is selected with a pseudo-LRU mechanism. A high number of evictions coupled with a low hit rate means your application is setting a large number of keys that are never used again." |
DEPENDENT | memcached.stats.evictions.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
Memcached | Memcached: New items per second | Number of new items stored per second. |
DEPENDENT | memcached.stats.total_items.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
Memcached | Memcached: Current number of items stored | Current number of items stored by this instance. |
DEPENDENT | memcached.stats.curr_items Preprocessing: - JSONPATH: |
Memcached | Memcached: Threads | Number of worker threads requested |
DEPENDENT | memcached.stats.threads Preprocessing: - JSONPATH: |
Zabbix raw items | Memcached: Get status | - | ZABBIX_PASSIVE | memcached.stats["{$MEMCACHED.CONN.URI}"] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Memcached: Service is down | - |
last(/Memcached by Zabbix agent 2/memcached.ping["{$MEMCACHED.CONN.URI}"])=0 |
AVERAGE | Manual close: YES |
Memcached: Failed to fetch info data | Zabbix has not received data for items for the last 30 minutes |
nodata(/Memcached by Zabbix agent 2/memcached.cpu.sys,30m)=1 |
WARNING | Manual close: YES Depends on: - Memcached: Service is down |
Memcached: Too many queued connections | The max number of connections is reached and a new connection had to wait in the queue as a result. |
min(/Memcached by Zabbix agent 2/memcached.connections.queued.rate,5m)>{$MEMCACHED.CONN.QUEUED.MAX.WARN} |
WARNING | |
Memcached: Too many throttled connections | Number of times a client connection was throttled is too high. When sending GETs in batch mode and the connection contains too many requests (limited by -R parameter) the connection might be throttled to prevent starvation. |
min(/Memcached by Zabbix agent 2/memcached.connections.throttled.rate,5m)>{$MEMCACHED.CONN.THROTTLED.MAX.WARN} |
WARNING | |
Memcached: Total number of connected clients is too high | When the number of connections reaches the value of the "max_connections" parameter, new connections will be rejected. |
min(/Memcached by Zabbix agent 2/memcached.connections.current,5m)/last(/Memcached by Zabbix agent 2/memcached.connections.max)*100>{$MEMCACHED.CONN.PRC.MAX.WARN} |
WARNING | |
Memcached: Version has changed | Memcached version has changed. Acknowledge to close manually. |
last(/Memcached by Zabbix agent 2/memcached.version,#1)<>last(/Memcached by Zabbix agent 2/memcached.version,#2) and length(last(/Memcached by Zabbix agent 2/memcached.version))>0 |
INFO | Manual close: YES |
Memcached: has been restarted | Uptime is less than 10 minutes. |
last(/Memcached by Zabbix agent 2/memcached.uptime)<10m |
INFO | Manual close: YES |
Memcached: Memory usage is too high | - |
min(/Memcached by Zabbix agent 2/memcached.stats.bytes,5m)/last(/Memcached by Zabbix agent 2/memcached.config.limit_maxbytes)*100>{$MEMCACHED.MEM.PUSED.MAX.WARN} |
WARNING |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template or ask for help with it at ZABBIX forums.
For Zabbix version: 6.2 and higher
Refer to the vendor documentation.
No specific Zabbix configuration is required.
There are no template links in this template.
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
Services | LDAP service is running | - |
SIMPLE | net.tcp.service[ldap] |
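If a Zabbix agent runs on the monitored host, the same key can be checked from the command line (hypothetical hostname):
zabbix_get -s ldap-host -k 'net.tcp.service[ldap]'
A returned value of 1 means the LDAP port accepted the connection; 0 means the service is down.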
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
LDAP service is down on {HOST.NAME} | - |
max(/LDAP Service/net.tcp.service[ldap],#3)=0 |
AVERAGE |
Please report any issues with the template at https://support.zabbix.com
For Zabbix version: 6.2 and higher. The template to monitor Kubernetes state by Zabbix. It works without external scripts and uses the script item to make HTTP requests to the Kubernetes API.
Template Kubernetes cluster state by HTTP
— collects metrics by HTTP agent from kube-state-metrics endpoint and Kubernetes API.
Don't forget to change the macros {$KUBE.API.URL} and {$KUBE.API.TOKEN}. Also, see the Macros section for a list of macros used to set trigger values. Note: some metrics may not be collected depending on your Kubernetes version and configuration.
This template was tested on:
See Zabbix template operation for basic instructions.
Install the Zabbix Helm Chart in your Kubernetes cluster. Internal service metrics are collected from kube-state-metrics endpoint.
The template needs to use authorization via an API token.
Set the {$KUBE.API.URL} such as <scheme>://<host>:<port>.
Get the generated service account token using the command
kubectl get secret zabbix-service-account -n monitoring -o jsonpath={.data.token} | base64 -d
Then set it to the macro {$KUBE.API.TOKEN}.
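Before saving the macros, you can sanity-check the endpoint and the token; a minimal probe with placeholder values, using the /readyz endpoint from the macros below:
curl -sk -H "Authorization: Bearer <token>" https://<host>:6443/readyz
An ok response confirms that the API server accepts the token.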
Set {$KUBE.STATE.ENDPOINT.NAME} with the Kube state metrics endpoint name. See kubectl -n monitoring get ep. Default: zabbix-kube-state-metrics.
Set up the macros to filter the metrics of discovered worker nodes: {$KUBE.LLD.FILTER.WORKER_NODE.MATCHES}, {$KUBE.LLD.FILTER.WORKER_NODE.NOT_MATCHES}.
Set up macros to filter metrics by namespace: {$KUBE.LLD.FILTER.NAMESPACE.MATCHES}, {$KUBE.LLD.FILTER.NAMESPACE.NOT_MATCHES}.
Set up macros to filter node metrics by nodename: {$KUBE.LLD.FILTER.NODE.MATCHES}, {$KUBE.LLD.FILTER.NODE.NOT_MATCHES}.
Note: if you have a large cluster, it is highly recommended to set a filter for discoverable namespaces; see the illustrative example below.
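For example, to keep system namespaces out of discovery, the filters could be set to values such as these (illustrative regular expressions, not defaults):
{$KUBE.LLD.FILTER.NAMESPACE.MATCHES} = .*
{$KUBE.LLD.FILTER.NAMESPACE.NOT_MATCHES} = ^(kube-system|kube-public)$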
No specific Zabbix configuration is required.
Name | Description | Default |
---|---|---|
{$KUBE.API.COMPONENTSTATUSES.ENDPOINT} | Kubernetes API componentstatuses endpoint /api/v1/componentstatuses |
/api/v1/componentstatuses |
{$KUBE.API.LIVEZ.ENDPOINT} | Kubernetes API livez endpoint /livez |
/livez |
{$KUBE.API.READYZ.ENDPOINT} | Kubernetes API readyz endpoint /readyz |
/readyz |
{$KUBE.API.TOKEN} | Service account bearer token |
`` |
{$KUBE.API.URL} | Kubernetes API endpoint URL in the format |
https://localhost:6443 |
{$KUBE.API_SERVER.PORT} | Kubernetes API servers metrics endpoint port. Used in ControlPlane LLD. |
6443 |
{$KUBE.API_SERVER.SCHEME} | Kubernetes API servers metrics endpoint scheme. Used in ControlPlane LLD. |
https |
{$KUBE.CONTROLLER_MANAGER.PORT} | Kubernetes Controller manager metrics endpoint port. Used in ControlPlane LLD. |
10252 |
{$KUBE.CONTROLLER_MANAGER.SCHEME} | Kubernetes Controller manager metrics endpoint scheme. Used in ControlPlane LLD. |
http |
{$KUBE.KUBELET.PORT} | Kubernetes Kubelet manager metrics endpoint port. Used in Kubelet LLD. |
10250 |
{$KUBE.KUBELET.SCHEME} | Kubernetes Kubelet manager metrics endpoint scheme. Used in Kubelet LLD. |
https |
{$KUBE.LLD.FILTER.NAMESPACE.MATCHES} | Filter of discoverable pods by namespace |
.* |
{$KUBE.LLD.FILTER.NAMESPACE.NOT_MATCHES} | Filter to exclude discovered pods by namespace |
CHANGE_IF_NEEDED |
{$KUBE.LLD.FILTER.NODE.MATCHES} | Filter of discoverable nodes by nodename |
.* |
{$KUBE.LLD.FILTER.NODE.NOT_MATCHES} | Filter to exclude discovered nodes by nodename |
CHANGE_IF_NEEDED |
{$KUBE.LLD.FILTER.WORKER_NODE.MATCHES} | Filter of discoverable worker nodes by nodename |
.* |
{$KUBE.LLD.FILTER.WORKER_NODE.NOT_MATCHES} | Filter to exclude discovered worker nodes by nodename |
CHANGE_IF_NEEDED |
{$KUBE.SCHEDULER.PORT} | Kubernetes Scheduler manager metrics endpoint port. Used in ControlPlane LLD. |
10251 |
{$KUBE.SCHEDULER.SCHEME} | Kubernetes Scheduler manager metrics endpoint scheme. Used in ControlPlane LLD. |
http |
{$KUBE.STATE.ENDPOINT.NAME} | Kubernetes state endpoint name |
zabbix-kube-state-metrics |
There are no template links in this template.
Name | Description | Type | Key and additional info |
---|---|---|---|
API servers discovery | - |
DEPENDENT | kube.api_servers.discovery |
Component statuses discovery | - |
DEPENDENT | kube.componentstatuses.discovery Preprocessing: - JAVASCRIPT - DISCARD_UNCHANGED_HEARTBEAT |
Controller manager nodes discovery | - |
DEPENDENT | kube.controller_manager.discovery |
CronJob discovery | - |
DEPENDENT | kube.cronjob.discovery Preprocessing: - PROMETHEUS_TO_JSON - JAVASCRIPT - DISCARD_UNCHANGED_HEARTBEAT Filter: AND - {#NAMESPACE} MATCHES_REGEX - {#NAMESPACE} NOT_MATCHES_REGEX |
Daemonset discovery | - |
DEPENDENT | kube.daemonset.discovery Preprocessing: - PROMETHEUS_TO_JSON - JAVASCRIPT - DISCARD_UNCHANGED_HEARTBEAT Filter: AND - {#NAMESPACE} MATCHES_REGEX - {#NAMESPACE} NOT_MATCHES_REGEX |
Deployment discovery | - |
DEPENDENT | kube.deployment.discovery Preprocessing: - PROMETHEUS_TO_JSON - JAVASCRIPT - DISCARD_UNCHANGED_HEARTBEAT Filter: AND - {#NAMESPACE} MATCHES_REGEX - {#NAMESPACE} NOT_MATCHES_REGEX |
Endpoint discovery | - |
DEPENDENT | kube.endpoint.discovery Preprocessing: - PROMETHEUS_TO_JSON - JAVASCRIPT - DISCARD_UNCHANGED_HEARTBEAT Filter: AND - {#NAMESPACE} MATCHES_REGEX - {#NAMESPACE} NOT_MATCHES_REGEX |
Job discovery | - |
DEPENDENT | kube.job.discovery Preprocessing: - PROMETHEUS_TO_JSON - JAVASCRIPT - DISCARD_UNCHANGED_HEARTBEAT Filter: AND - {#NAMESPACE} MATCHES_REGEX - {#NAMESPACE} NOT_MATCHES_REGEX |
Kubelet discovery | - |
DEPENDENT | kube.kubelet.discovery Filter: AND - {#NAME} MATCHES_REGEX - {#NAME} NOT_MATCHES_REGEX |
Livez discovery | - |
DEPENDENT | kube.livez.discovery Preprocessing: - JAVASCRIPT - DISCARD_UNCHANGED_HEARTBEAT |
Node discovery | - |
DEPENDENT | kube.node.discovery Preprocessing: - PROMETHEUS_TO_JSON - JAVASCRIPT - DISCARD_UNCHANGED_HEARTBEAT Filter: AND - {#NAME} MATCHES_REGEX - {#NAME} NOT_MATCHES_REGEX |
Pod discovery | - |
DEPENDENT | kube.pod.discovery Preprocessing: - PROMETHEUS_TO_JSON - JAVASCRIPT - DISCARD_UNCHANGED_HEARTBEAT Filter: AND - {#NAMESPACE} MATCHES_REGEX - {#NAMESPACE} NOT_MATCHES_REGEX |
PodDisruptionBudget discovery | - |
DEPENDENT | kube.pdb.discovery Preprocessing: - PROMETHEUS_TO_JSON - JAVASCRIPT - DISCARD_UNCHANGED_HEARTBEAT Filter: AND - {#NAMESPACE} MATCHES_REGEX - {#NAMESPACE} NOT_MATCHES_REGEX |
PVC discovery | - |
DEPENDENT | kube.pvc.discovery Preprocessing: - PROMETHEUS_TO_JSON - JAVASCRIPT - DISCARD_UNCHANGED_HEARTBEAT Filter: AND - {#NAMESPACE} MATCHES_REGEX - {#NAMESPACE} NOT_MATCHES_REGEX |
Readyz discovery | - |
DEPENDENT | kube.readyz.discovery Preprocessing: - JAVASCRIPT - DISCARD_UNCHANGED_HEARTBEAT |
Replicaset discovery | - |
DEPENDENT | kube.replicaset.discovery Preprocessing: - PROMETHEUS_TO_JSON - JAVASCRIPT - DISCARD_UNCHANGED_HEARTBEAT Filter: AND - {#NAMESPACE} MATCHES_REGEX - {#NAMESPACE} NOT_MATCHES_REGEX |
Scheduler servers nodes discovery | - |
DEPENDENT | kube.scheduler.discovery |
Statefulset discovery | - |
DEPENDENT | kube.statefulset.discovery Preprocessing: - PROMETHEUS_TO_JSON - JAVASCRIPT - DISCARD_UNCHANGED_HEARTBEAT Filter: AND - {#NAMESPACE} MATCHES_REGEX - {#NAMESPACE} NOT_MATCHES_REGEX |
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
Kubernetes | Kubernetes: Get state metrics | Collecting Kubernetes metrics from kube-state-metrics. |
SCRIPT | kube.state.metrics Expression: The text is too long. Please see the template. |
Kubernetes | Kubernetes: Control plane LLD | Generation of data for Control plane discovery rules. |
SCRIPT | kube.control_plane.lld Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: 3h Expression: The text is too long. Please see the template. |
Kubernetes | Kubernetes: Node LLD | Generation of data for Kubelet discovery rules. |
SCRIPT | kube.node.lld Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: Expression: The text is too long. Please see the template. |
Kubernetes | Kubernetes: Get component statuses | - |
HTTP_AGENT | kube.componentstatuses Preprocessing: - CHECK_NOT_SUPPORTED ⛔️ON_FAIL: |
Kubernetes | Kubernetes: Get readyz | - |
HTTP_AGENT | kube.readyz Preprocessing: - JAVASCRIPT: |
Kubernetes | Kubernetes: Get livez | - |
HTTP_AGENT | kube.livez Preprocessing: - JAVASCRIPT: |
Kubernetes | Kubernetes: Namespace count | The number of namespaces. |
DEPENDENT | kube.namespace.count Preprocessing: - PROMETHEUS_PATTERN: ⛔️ON_FAIL: |
Kubernetes | Kubernetes: CronJob count | Number of cronjobs. |
DEPENDENT | kube.cronjob.count Preprocessing: - PROMETHEUS_PATTERN: ⛔️ON_FAIL: |
Kubernetes | Kubernetes: Job count | Number of jobs (generated by cronjob + job). |
DEPENDENT | kube.job.count Preprocessing: - PROMETHEUS_PATTERN: ⛔️ON_FAIL: |
Kubernetes | Kubernetes: Endpoint count | Number of endpoints. |
DEPENDENT | kube.endpoint.count Preprocessing: - PROMETHEUS_PATTERN: ⛔️ON_FAIL: |
Kubernetes | Kubernetes: Deployment count | The number of deployments. |
DEPENDENT | kube.deployment.count Preprocessing: - PROMETHEUS_PATTERN: ⛔️ON_FAIL: |
Kubernetes | Kubernetes: Service count | The number of services. |
DEPENDENT | kube.service.count Preprocessing: - PROMETHEUS_PATTERN: ⛔️ON_FAIL: |
Kubernetes | Kubernetes: Statefulset count | The number of statefulsets. |
DEPENDENT | kube.statefulset.count Preprocessing: - PROMETHEUS_PATTERN: ⛔️ON_FAIL: |
Kubernetes | Kubernetes: Node count | The number of nodes. |
DEPENDENT | kube.node.count Preprocessing: - PROMETHEUS_PATTERN: ⛔️ON_FAIL: |
Kubernetes | Kubernetes: Namespace [{#NAMESPACE}] Daemonset [{#NAME}]: Ready | The number of nodes that should be running the daemon pod and have one or more running and ready. |
DEPENDENT | kube.daemonset.ready[{#NAMESPACE}/{#NAME}] Preprocessing: - PROMETHEUS_PATTERN: ⛔️ON_FAIL: |
Kubernetes | Kubernetes: Namespace [{#NAMESPACE}] Daemonset [{#NAME}]: Scheduled | The number of nodes running at least one daemon pod and are supposed to. |
DEPENDENT | kube.daemonset.scheduled[{#NAMESPACE}/{#NAME}] Preprocessing: - PROMETHEUS_PATTERN: ⛔️ON_FAIL: |
Kubernetes | Kubernetes: Namespace [{#NAMESPACE}] Daemonset [{#NAME}]: Desired | The number of nodes that should be running the daemon pod. |
DEPENDENT | kube.daemonset.desired[{#NAMESPACE}/{#NAME}] Preprocessing: - PROMETHEUS_PATTERN: ⛔️ON_FAIL: |
Kubernetes | Kubernetes: Namespace [{#NAMESPACE}] Daemonset [{#NAME}]: Misscheduled | The number of nodes running a daemon pod but are not supposed to. |
DEPENDENT | kube.daemonset.misscheduled[{#NAMESPACE}/{#NAME}] Preprocessing: - PROMETHEUS_PATTERN: ⛔️ON_FAIL: |
Kubernetes | Kubernetes: Namespace [{#NAMESPACE}] Daemonset [{#NAME}]: Updated number scheduled | The total number of nodes that are running updated daemon pod. |
DEPENDENT | kube.daemonset.updated[{#NAMESPACE}/{#NAME}] Preprocessing: - PROMETHEUS_PATTERN: ⛔️ON_FAIL: |
Kubernetes | Kubernetes: Namespace [{#NAMESPACE}] PVC [{#NAME}] Status phase: Available | Persistent volume claim is currently in Available phase. |
DEPENDENT | kube.pvc.status_phase.active[{#NAMESPACE}/{#NAME}] Preprocessing: - PROMETHEUS_PATTERN: kube_persistentvolumeclaim_status_phase{namespace="{#NAMESPACE}", name="{#NAME}", phase="Available"} ⛔️ON_FAIL: |
Kubernetes | Kubernetes: Namespace [{#NAMESPACE}] PVC [{#NAME}] Status phase: Lost | Persistent volume claim is currently in Lost phase. |
DEPENDENT | kube.pvc.status_phase.lost[{#NAMESPACE}/{#NAME}] Preprocessing: - PROMETHEUS_PATTERN: kube_persistentvolumeclaim_status_phase{namespace="{#NAMESPACE}", name="{#NAME}", phase="Lost"} ⛔️ON_FAIL: |
Kubernetes | Kubernetes: Namespace [{#NAMESPACE}] PVC [{#NAME}] Status phase: Bound | Persistent volume claim is currently in Bound phase. |
DEPENDENT | kube.pvc.status_phase.bound[{#NAMESPACE}/{#NAME}] Preprocessing: - PROMETHEUS_PATTERN: kube_persistentvolumeclaim_status_phase{namespace="{#NAMESPACE}", name="{#NAME}", phase="Bound"} ⛔️ON_FAIL: |
Kubernetes | Kubernetes: Namespace [{#NAMESPACE}] PVC [{#NAME}] Status phase: Pending | Persistent volume claim is currently in Pending phase. |
DEPENDENT | kube.pvc.status_phase.pending[{#NAMESPACE}/{#NAME}] Preprocessing: - PROMETHEUS_PATTERN: kube_persistentvolumeclaim_status_phase{namespace="{#NAMESPACE}", name="{#NAME}", phase="Pending"} ⛔️ON_FAIL: |
Kubernetes | Kubernetes: Namespace [{#NAMESPACE}] PVC [{#NAME}] Requested storage | The capacity of storage requested by the persistent volume claim. |
DEPENDENT | kube.pvc.requested.storage[{#NAMESPACE}/{#NAME}] Preprocessing: - PROMETHEUS_PATTERN: ⛔️ON_FAIL: |
Kubernetes | Kubernetes: Namespace [{#NAMESPACE}] Status phase: Pending, sum | Persistent volume claim is currently in Pending phase. |
DEPENDENT | kube.pvc.status_phase.pending.sum[{#NAMESPACE}/{#NAME}] Preprocessing: - PROMETHEUS_PATTERN: kube_persistentvolumeclaim_status_phase{namespace="{#NAMESPACE}", persistentvolumeclaim="{#NAME}", phase="Pending"} : function : sum ⛔️ON_FAIL: |
Kubernetes | Kubernetes: Namespace [{#NAMESPACE}] Status phase: Active, sum | Persistent volume claim is currently in Active phase. |
DEPENDENT | kube.pvc.status_phase.active.sum[{#NAMESPACE}/{#NAME}] Preprocessing: - PROMETHEUS_PATTERN: kube_persistentvolumeclaim_status_phase{namespace="{#NAMESPACE}", persistentvolumeclaim="{#NAME}", phase="Active"} : function : sum ⛔️ON_FAIL: |
Kubernetes | Kubernetes: Namespace [{#NAMESPACE}] Status phase: Bound, sum | Persistent volume claim is currently in Bound phase. |
DEPENDENT | kube.pvc.status_phase.bound.sum[{#NAMESPACE}/{#NAME}] Preprocessing: - PROMETHEUS_PATTERN: kube_persistentvolumeclaim_status_phase{namespace="{#NAMESPACE}", persistentvolumeclaim="{#NAME}", phase="Bound"} : function : sum ⛔️ON_FAIL: |
Kubernetes | Kubernetes: Namespace [{#NAMESPACE}] Status phase: Lost, sum | Persistent volume claim is currently in Lost phase. |
DEPENDENT | kube.pvc.status_phase.lost.sum[{#NAMESPACE}/{#NAME}] Preprocessing: - PROMETHEUS_PATTERN: kube_persistentvolumeclaim_status_phase{namespace="{#NAMESPACE}", persistentvolumeclaim="{#NAME}", phase="Lost"} : function : sum ⛔️ON_FAIL: |
Kubernetes | Kubernetes: Namespace [{#NAMESPACE}] Deployment [{#NAME}]: Paused | Whether the deployment is paused and will not be processed by the deployment controller. |
DEPENDENT | kube.deployment.spec_paused[{#NAMESPACE}/{#NAME}] Preprocessing: - PROMETHEUS_PATTERN: kube_deployment_spec_paused{namespace="{#NAMESPACE}", deployment="{#NAME}"} ⛔️ON_FAIL: |
Kubernetes | Kubernetes: Namespace [{#NAMESPACE}] Deployment [{#NAME}]: Replicas desired | Number of desired pods for a deployment. |
DEPENDENT | kube.deployment.replicas_desired[{#NAMESPACE}/{#NAME}] Preprocessing: - PROMETHEUS_PATTERN: kube_deployment_spec_replicas{namespace="{#NAMESPACE}", deployment="{#NAME}"} ⛔️ON_FAIL: |
Kubernetes | Kubernetes: Namespace [{#NAMESPACE}] Deployment [{#NAME}]: Rollingupdate max unavailable | Maximum number of unavailable replicas during a rolling update of a deployment. |
DEPENDENT | kube.deployment.rollingupdate.max_unavailable[{#NAMESPACE}/{#NAME}] Preprocessing: - PROMETHEUS_PATTERN: kube_deployment_spec_strategy_rollingupdate_max_unavailable{namespace="{#NAMESPACE}", deployment="{#NAME}"} ⛔️ON_FAIL: |
Kubernetes | Kubernetes: Namespace [{#NAMESPACE}] Deployment [{#NAME}]: Replicas | The number of replicas per deployment. |
DEPENDENT | kube.deployment.replicas[{#NAMESPACE}/{#NAME}] Preprocessing: - PROMETHEUS_PATTERN: ⛔️ON_FAIL: |
Kubernetes | Kubernetes: Namespace [{#NAMESPACE}] Deployment [{#NAME}]: Replicas available | The number of available replicas per deployment. |
DEPENDENT | kube.deployment.replicas_available[{#NAMESPACE}/{#NAME}] Preprocessing: - PROMETHEUS_PATTERN: kube_deployment_status_replicas_available{namespace="{#NAMESPACE}", deployment="{#NAME}"} ⛔️ON_FAIL: |
Kubernetes | Kubernetes: Namespace [{#NAMESPACE}] Deployment [{#NAME}]: Replicas unavailable | The number of unavailable replicas per deployment. |
DEPENDENT | kube.deployment.replicas_unavailable[{#NAMESPACE}/{#NAME}] Preprocessing: - PROMETHEUS_PATTERN: kube_deployment_status_replicas_unavailable{namespace="{#NAMESPACE}", deployment="{#NAME}"} ⛔️ON_FAIL: |
Kubernetes | Kubernetes: Namespace [{#NAMESPACE}] Deployment [{#NAME}]: Replicas updated | The number of updated replicas per deployment. |
DEPENDENT | kube.deployment.replicas_updated[{#NAMESPACE}/{#NAME}] Preprocessing: - PROMETHEUS_PATTERN: kube_deployment_status_replicas_updated{namespace="{#NAMESPACE}", deployment="{#NAME}"} ⛔️ON_FAIL: |
Kubernetes | Kubernetes: Namespace [{#NAMESPACE}] Endpoint [{#NAME}]: Address available | Number of addresses available in endpoint. |
DEPENDENT | kube.endpoint.address_available[{#NAMESPACE}/{#NAME}] Preprocessing: - PROMETHEUS_PATTERN: kube_endpoint_address_available{namespace="{#NAMESPACE}", endpoint="{#NAME}"} ⛔️ON_FAIL: |
Kubernetes | Kubernetes: Namespace [{#NAMESPACE}] Endpoint [{#NAME}]: Address not ready | Number of addresses not ready in endpoint. |
DEPENDENT | kube.endpoint.address_not_ready[{#NAMESPACE}/{#NAME}] Preprocessing: - PROMETHEUS_PATTERN: ⛔️ON_FAIL: |
Kubernetes | Kubernetes: Namespace [{#NAMESPACE}] Endpoint [{#NAME}]: Age | Endpoint age (number of seconds since creation). |
DEPENDENT | kube.endpoint.age[{#NAMESPACE}/{#NAME}] Preprocessing: - PROMETHEUS_PATTERN: ⛔️ON_FAIL: - JAVASCRIPT: |
Kubernetes | Kubernetes: Node [{#NAME}]: CPU allocatable | The CPU resources of a node that are available for scheduling. |
DEPENDENT | kube.node.cpu_allocatable[{#NAME}] Preprocessing: - PROMETHEUS_PATTERN: kube_node_status_allocatable{node="{#NAME}", resource="cpu"} ⛔️ON_FAIL: |
Kubernetes | Kubernetes: Node [{#NAME}]: Memory allocatable | The Memory resources of a node that are available for scheduling. |
DEPENDENT | kube.node.memory_allocatable[{#NAME}] Preprocessing: - PROMETHEUS_PATTERN: kube_node_status_allocatable{node="{#NAME}", resource="memory"} ⛔️ON_FAIL: |
Kubernetes | Kubernetes: Node [{#NAME}]: Pods allocatable | The Pods resources of a node that are available for scheduling. |
DEPENDENT | kube.node.pods_allocatable[{#NAME}] Preprocessing: - PROMETHEUS_PATTERN: kube_node_status_allocatable{node="{#NAME}", resource="pods"} ⛔️ON_FAIL: |
Kubernetes | Kubernetes: Node [{#NAME}]: Ephemeral storage allocatable | The allocatable ephemeral-storage of a node that is available for scheduling. |
DEPENDENT | kube.node.ephemeral_storage_allocatable[{#NAME}] Preprocessing: - PROMETHEUS_PATTERN: ⛔️ON_FAIL: |
Kubernetes | Kubernetes: Node [{#NAME}]: CPU capacity | The capacity for CPU resources of a node. |
DEPENDENT | kube.node.cpu_capacity[{#NAME}] Preprocessing: - PROMETHEUS_PATTERN: kube_node_status_capacity{node="{#NAME}", resource="cpu"} ⛔️ON_FAIL: |
Kubernetes | Kubernetes: Node [{#NAME}]: Memory capacity | The capacity for Memory resources of a node. |
DEPENDENT | kube.node.memory_capacity[{#NAME}] Preprocessing: - PROMETHEUS_PATTERN: kube_node_status_capacity{node="{#NAME}", resource="memory"} ⛔️ON_FAIL: |
Kubernetes | Kubernetes: Node [{#NAME}]: Ephemeral storage capacity | The ephemeral-storage capacity of a node. |
DEPENDENT | kube.node.ephemeral_storage_capacity[{#NAME}] Preprocessing: - PROMETHEUS_PATTERN: ⛔️ON_FAIL: |
Kubernetes | Kubernetes: Node [{#NAME}]: Pods capacity | The capacity for Pods resources of a node. |
DEPENDENT | kube.node.pods_capacity[{#NAME}] Preprocessing: - PROMETHEUS_PATTERN: kube_node_status_capacity{node="{#NAME}", resource="pods"} ⛔️ON_FAIL: |
Kubernetes | Kubernetes: Namespace [{#NAMESPACE}] Pod [{#NAME}] Phase: Pending | Pod is in pending state. |
DEPENDENT | kube.pod.phase.pending[{#NAMESPACE}/{#NAME}] Preprocessing: - PROMETHEUS_PATTERN: ⛔️ON_FAIL: |
Kubernetes | Kubernetes: Namespace [{#NAMESPACE}] Pod [{#NAME}] Phase: Succeeded | Pod is in succeeded state. |
DEPENDENT | kube.pod.phase.succeeded[{#NAMESPACE}/{#NAME}] Preprocessing: - PROMETHEUS_PATTERN: ⛔️ON_FAIL: |
Kubernetes | Kubernetes: Namespace [{#NAMESPACE}] Pod [{#NAME}] Phase: Failed | Pod is in failed state. |
DEPENDENT | kube.pod.phase.failed[{#NAMESPACE}/{#NAME}] Preprocessing: - PROMETHEUS_PATTERN: ⛔️ON_FAIL: |
Kubernetes | Kubernetes: Namespace [{#NAMESPACE}] Pod [{#NAME}] Phase: Unknown | Pod is in unknown state. |
DEPENDENT | kube.pod.phase.unknown[{#NAMESPACE}/{#NAME}] Preprocessing: - PROMETHEUS_PATTERN: ⛔️ON_FAIL: |
Kubernetes | Kubernetes: Namespace [{#NAMESPACE}] Pod [{#NAME}] Phase: Running | Pod is in running state. |
DEPENDENT | kube.pod.phase.running[{#NAMESPACE}/{#NAME}] Preprocessing: - PROMETHEUS_PATTERN: ⛔️ON_FAIL: |
Kubernetes | Kubernetes: Namespace [{#NAMESPACE}] Pod [{#NAME}]: Containers terminated | Describes whether the container is currently in terminated state. |
DEPENDENT | kube.pod.containers_terminated[{#NAMESPACE}/{#NAME}] Preprocessing: - PROMETHEUS_PATTERN: kube_pod_container_status_terminated{pod="{#NAME}"} : function : sum ⛔️ON_FAIL: |
Kubernetes | Kubernetes: Namespace [{#NAMESPACE}] Pod [{#NAME}]: Containers waiting | Describes whether the container is currently in waiting state. |
DEPENDENT | kube.pod.containers_waiting[{#NAMESPACE}/{#NAME}] Preprocessing: - PROMETHEUS_PATTERN: kube_pod_container_status_waiting{pod="{#NAME}"} : function : sum ⛔️ON_FAIL: |
Kubernetes | Kubernetes: Namespace [{#NAMESPACE}] Pod [{#NAME}]: Containers ready | Describes whether the container's readiness check succeeded. |
DEPENDENT | kube.pod.containers_ready[{#NAMESPACE}/{#NAME}] Preprocessing: - PROMETHEUS_PATTERN: kube_pod_container_status_ready{pod="{#NAME}"} : function : sum ⛔️ON_FAIL: |
Kubernetes | Kubernetes: Namespace [{#NAMESPACE}] Pod [{#NAME}]: Containers restarts | The number of container restarts. |
DEPENDENT | kube.pod.containers_restarts[{#NAMESPACE}/{#NAME}] Preprocessing: - PROMETHEUS_PATTERN: kube_pod_container_status_restarts_total{pod="{#NAME}"} : function : sum ⛔️ON_FAIL: |
Kubernetes | Kubernetes: Namespace [{#NAMESPACE}] Pod [{#NAME}]: Containers running | Describes whether the container is currently in running state. |
DEPENDENT | kube.pod.containers_running[{#NAMESPACE}/{#NAME}] Preprocessing: - PROMETHEUS_PATTERN: kube_pod_container_status_running{pod="{#NAME}"} : function : sum ⛔️ON_FAIL: |
Kubernetes | Kubernetes: Namespace [{#NAMESPACE}] Pod [{#NAME}]: Ready | Describes whether the pod is ready to serve requests. |
DEPENDENT | kube.pod.ready[{#NAMESPACE}/{#NAME}] Preprocessing: - PROMETHEUS_PATTERN: ⛔️ON_FAIL: |
Kubernetes | Kubernetes: Namespace [{#NAMESPACE}] Pod [{#NAME}]: Scheduled | Describes the status of the scheduling process for the pod. |
DEPENDENT | kube.pod.scheduled[{#NAMESPACE}/{#NAME}] Preprocessing: - PROMETHEUS_PATTERN: ⛔️ON_FAIL: |
Kubernetes | Kubernetes: Namespace [{#NAMESPACE}] Pod [{#NAME}]: Unschedulable | Describes the unschedulable status for the pod. |
DEPENDENT | kube.pod.unschedulable[{#NAMESPACE}/{#NAME}] Preprocessing: - PROMETHEUS_PATTERN: ⛔️ON_FAIL: |
Kubernetes | Kubernetes: Namespace [{#NAMESPACE}] Pod [{#NAME}]: Containers CPU limits | The limit on CPU cores to be used by a container. |
DEPENDENT | kube.pod.containers.limits.cpu[{#NAMESPACE}/{#NAME}] Preprocessing: - PROMETHEUS_PATTERN: ⛔️ON_FAIL: |
Kubernetes | Kubernetes: Namespace [{#NAMESPACE}] Pod [{#NAME}]: Containers memory limits | The limit on memory to be used by a container. |
DEPENDENT | kube.pod.containers.limits.memory[{#NAMESPACE}/{#NAME}] Preprocessing: - PROMETHEUS_PATTERN: ⛔️ON_FAIL: |
Kubernetes | Kubernetes: Namespace [{#NAMESPACE}] Pod [{#NAME}]: Containers CPU requests | The number of requested CPU cores by a container. |
DEPENDENT | kube.pod.containers.requests.cpu[{#NAMESPACE}/{#NAME}] Preprocessing: - PROMETHEUS_PATTERN: ⛔️ON_FAIL: |
Kubernetes | Kubernetes: Namespace [{#NAMESPACE}] Pod [{#NAME}]: Containers memory requests | The number of requested memory bytes by a container. |
DEPENDENT | kube.pod.containers.requests.memory[{#NAMESPACE}/{#NAME}] Preprocessing: - PROMETHEUS_PATTERN: ⛔️ON_FAIL: |
Kubernetes | Kubernetes: Namespace [{#NAMESPACE}] Replicaset [{#NAME}]: Replicas | The number of replicas per ReplicaSet. |
DEPENDENT | kube.replicaset.replicas[{#NAMESPACE}/{#NAME}] Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: |
Kubernetes | Kubernetes: Namespace [{#NAMESPACE}] Replicaset [{#NAME}]: Desired replicas | Number of desired pods for a ReplicaSet. |
DEPENDENT | kube.replicaset.replicasdesired[{#NAMESPACE}/{#NAME}] Preprocessing: - PROMETHEUS PATTERN:kube_replicaset_spec_replicas{namespace="{#NAMESPACE}", replicaset="{#NAME}"} ⛔️ON_FAIL: |
Kubernetes | Kubernetes: Namespace [{#NAMESPACE}] Replicaset [{#NAME}]: Fully labeled replicas | The number of fully labeled replicas per ReplicaSet. |
DEPENDENT | kube.replicaset.fullylabeledreplicas[{#NAMESPACE}/{#NAME}] Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: |
Kubernetes | Kubernetes: Namespace [{#NAMESPACE}] Replicaset [{#NAME}]: Ready | The number of ready replicas per ReplicaSet. |
DEPENDENT | kube.replicaset.ready[{#NAMESPACE}/{#NAME}] Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: |
Kubernetes | Kubernetes: Namespace [{#NAMESPACE}] Statefulset [{#NAME}]: Replicas | The number of replicas per StatefulSet. |
DEPENDENT | kube.statefulset.replicas[{#NAMESPACE}/{#NAME}] Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: |
Kubernetes | Kubernetes: Namespace [{#NAMESPACE}] Statefulset [{#NAME}]: Desired replicas | Number of desired pods for a StatefulSet. |
DEPENDENT | kube.statefulset.replicasdesired[{#NAMESPACE}/{#NAME}] Preprocessing: - PROMETHEUS PATTERN:kube_statefulset_replicas{namespace="{#NAMESPACE}", statefulset="{#NAME}"} ⛔️ON_FAIL: |
Kubernetes | Kubernetes: Namespace [{#NAMESPACE}] Statefulset [{#NAME}]: Current replicas | The number of current replicas per StatefulSet. |
DEPENDENT | kube.statefulset.replicascurrent[{#NAMESPACE}/{#NAME}] Preprocessing: - PROMETHEUS PATTERN:kube_statefulset_status_replicas_current{namespace="{#NAMESPACE}", statefulset="{#NAME}"} ⛔️ON_FAIL: |
Kubernetes | Kubernetes: Namespace [{#NAMESPACE}] Statefulset [{#NAME}]: Ready replicas | The number of ready replicas per StatefulSet. |
DEPENDENT | kube.statefulset.replicasready[{#NAMESPACE}/{#NAME}] Preprocessing: - PROMETHEUS PATTERN:kube_statefulset_status_replicas_ready{namespace="{#NAMESPACE}", statefulset="{#NAME}"} ⛔️ON_FAIL: |
Kubernetes | Kubernetes: Namespace [{#NAMESPACE}] Statefulset [{#NAME}]: Updated replicas | The number of updated replicas per StatefulSet. |
DEPENDENT | kube.statefulset.replicasupdated[{#NAMESPACE}/{#NAME}] Preprocessing: - PROMETHEUS PATTERN:kube_statefulset_status_replicas_updated{namespace="{#NAMESPACE}", statefulset="{#NAME}"} ⛔️ON_FAIL: |
Kubernetes | Kubernetes: Namespace [{#NAMESPACE}] PodDisruptionBudget [{#NAME}]: Pods healthy | Current number of healthy pods. |
DEPENDENT | kube.pdb.podshealthy[{#NAMESPACE}/{#NAME}] Preprocessing: - PROMETHEUS PATTERN:kube_poddisruptionbudget_status_current_healthy{namespace="{#NAMESPACE}", poddisruptionbudget="{#NAME}"} ⛔️ON_FAIL: |
Kubernetes | Kubernetes: Namespace [{#NAMESPACE}] PodDisruptionBudget [{#NAME}]: Pods desired | Minimum desired number of healthy pods. |
DEPENDENT | kube.pdb.podsdesired[{#NAMESPACE}/{#NAME}] Preprocessing: - PROMETHEUS PATTERN:kube_poddisruptionbudget_status_desired_healthy{namespace="{#NAMESPACE}", poddisruptionbudget="{#NAME}"} ⛔️ON_FAIL: |
Kubernetes | Kubernetes: Namespace [{#NAMESPACE}] PodDisruptionBudget [{#NAME}]: Disruptions allowed | Number of pod disruptions that are allowed. |
DEPENDENT | kube.pdb.disruptionsallowed[{#NAMESPACE}/{#NAME}] Preprocessing: - PROMETHEUS PATTERN:kube_poddisruptionbudget_status_pod_disruptions_allowed{namespace="{#NAMESPACE}", poddisruptionbudget="{#NAME}"} ⛔️ON_FAIL: |
Kubernetes | Kubernetes: Namespace [{#NAMESPACE}] PodDisruptionBudget [{#NAME}]: Pods total | Total number of pods counted by this disruption budget. |
DEPENDENT | kube.pdb.podstotal[{#NAMESPACE}/{#NAME}] Preprocessing: - PROMETHEUS PATTERN:kube_poddisruptionbudget_status_expected_pods{namespace="{#NAMESPACE}", poddisruptionbudget="{#NAME}"} ⛔️ON_FAIL: |
Kubernetes | Kubernetes: Namespace [{#NAMESPACE}] CronJob [{#NAME}]: Suspend | Suspend flag tells the controller to suspend subsequent executions. |
DEPENDENT | kube.cronjob.specsuspend[{#NAMESPACE}/{#NAME}] Preprocessing: - PROMETHEUS PATTERN:kube_cronjob_spec_suspend{namespace="{#NAMESPACE}", cronjob="{#NAME}"} ⛔️ONFAIL: - DISCARDUNCHANGED_HEARTBEAT: |
Kubernetes | Kubernetes: Namespace [{#NAMESPACE}] CronJob [{#NAME}]: Active | Active holds pointers to currently running jobs. |
DEPENDENT | kube.cronjob.statusactive[{#NAMESPACE}/{#NAME}] Preprocessing: - PROMETHEUS PATTERN:kube_cronjob_status_active{namespace="{#NAMESPACE}", cronjob="{#NAME}"} ⛔️ON_FAIL: |
Kubernetes | Kubernetes: Namespace [{#NAMESPACE}] CronJob [{#NAME}]: Last schedule | LastScheduleTime keeps information on when the job was last successfully scheduled. |
DEPENDENT | kube.cronjob.lastscheduletime[{#NAMESPACE}/{#NAME}] Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: - JAVASCRIPT: |
Kubernetes | Kubernetes: Namespace [{#NAMESPACE}] CronJob [{#NAME}]: Next schedule | Next time the cronjob should be scheduled. The time after lastScheduleTime, or after the cron job's creation time if it's never been scheduled. Use this to determine if the job is delayed. |
DEPENDENT | kube.cronjob.nextscheduletime[{#NAMESPACE}/{#NAME}] Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: - JAVASCRIPT: |
Kubernetes | Kubernetes: Namespace [{#NAMESPACE}] CronJob [{#NAME}]: Failed | The number of pods which reached Phase Failed and the reason for failure. |
DEPENDENT | kube.cronjob.statusfailed[{#NAMESPACE}/{#NAME}] Preprocessing: - PROMETHEUS PATTERN:kube_job_status_failed{namespace="{#NAMESPACE}", job_name=~"{#NAME}-*"} : function : sum ⛔️ON_FAIL: |
Kubernetes | Kubernetes: Namespace [{#NAMESPACE}] CronJob [{#NAME}]: Succeeded | The number of pods which reached Phase Succeeded. |
DEPENDENT | kube.cronjob.statussucceeded[{#NAMESPACE}/{#NAME}] Preprocessing: - PROMETHEUS PATTERN:kube_job_status_succeeded{namespace="{#NAMESPACE}", job_name=~"{#NAME}-*"} : function : sum ⛔️ON_FAIL: |
Kubernetes | Kubernetes: Namespace [{#NAMESPACE}] CronJob [{#NAME}]: Completion succeeded | The number of jobs that have successfully completed their execution. |
DEPENDENT | kube.cronjob.completion.succeeded[{#NAMESPACE}/{#NAME}] Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: |
Kubernetes | Kubernetes: Namespace [{#NAMESPACE}] CronJob [{#NAME}]: Completion failed | The number of jobs that have failed their execution. |
DEPENDENT | kube.cronjob.completion.failed[{#NAMESPACE}/{#NAME}] Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: |
Kubernetes | Kubernetes: Namespace [{#NAMESPACE}] Job [{#NAME}]: Failed | The number of pods which reached Phase Failed and the reason for failure. |
DEPENDENT | kube.job.statusfailed[{#NAMESPACE}/{#NAME}] Preprocessing: - PROMETHEUS PATTERN:kube_job_status_failed{namespace="{#NAMESPACE}", job_name="{#NAME}"} ⛔️ON_FAIL: |
Kubernetes | Kubernetes: Namespace [{#NAMESPACE}] Job [{#NAME}]: Succeeded | The number of pods which reached Phase Succeeded. |
DEPENDENT | kube.job.statussucceeded[{#NAMESPACE}/{#NAME}] Preprocessing: - PROMETHEUS PATTERN:kube_job_status_succeeded{namespace="{#NAMESPACE}", job_name="{#NAME}"} ⛔️ON_FAIL: |
Kubernetes | Kubernetes: Namespace [{#NAMESPACE}] Job [{#NAME}]: Completion succeeded | The number of jobs that have successfully completed their execution. |
DEPENDENT | kube.job.completion.succeeded[{#NAMESPACE}/{#NAME}] Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: |
Kubernetes | Kubernetes: Namespace [{#NAMESPACE}] Job [{#NAME}]: Completion failed | The number of jobs that have failed their execution. |
DEPENDENT | kube.job.completion.failed[{#NAMESPACE}/{#NAME}] Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: |
Kubernetes | Kubernetes: Component [{#NAME}]: Healthy | Cluster component healthy. |
DEPENDENT | kube.componentstatuses.healthy[{#NAME}] Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
Kubernetes | Kubernetes: Readyz [{#NAME}]: Healthcheck | Result of readyz healthcheck for component. |
DEPENDENT | kube.readyz.healthcheck[{#NAME}] Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
Kubernetes | Kubernetes: Livez [{#NAME}]: Healthcheck | Result of livez healthcheck for component. |
DEPENDENT | kube.livez.healthcheck[{#NAME}] Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
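The CronJob items above aggregate several Prometheus series (one kube_job_status_failed/kube_job_status_succeeded series per job run) into a single value by applying the sum function to the matched pattern. A minimal sketch of what that aggregation does, written as plain ES5 JavaScript rather than Zabbix's actual PROMETHEUS_PATTERN preprocessor (the payload and job names are invented for illustration):

    // Sum all kube_job_status_failed series whose job_name label starts with
    // the CronJob name — a simplified stand-in for PROMETHEUS_PATTERN + sum.
    function sumJobFailures(exposition, cronjob, namespace) {
        var total = 0;
        exposition.split('\n').forEach(function (line) {
            if (line.indexOf('kube_job_status_failed{') !== 0) {
                return; // skip comments, blank lines and other metrics
            }
            var m = line.match(/\{(.*)\}\s+(\S+)$/);
            if (!m) {
                return;
            }
            if (m[1].indexOf('namespace="' + namespace + '"') !== -1
                    && m[1].indexOf('job_name="' + cronjob + '-') !== -1) {
                total += parseFloat(m[2]);
            }
        });
        return total;
    }

    // Two runs of a hypothetical "backup" CronJob:
    var payload =
        'kube_job_status_failed{namespace="apps", job_name="backup-27561", reason="BackoffLimitExceeded"} 1\n' +
        'kube_job_status_failed{namespace="apps", job_name="backup-27562", reason="BackoffLimitExceeded"} 0\n';

    sumJobFailures(payload, 'backup', 'apps'); // -> 1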
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Kubernetes: NS [{#NAMESPACE}] PVC [{#NAME}]: PVC is pending | - |
min(/Kubernetes cluster state by HTTP/kube.pvc.status_phase.pending[{#NAMESPACE}/{#NAME}],2m)>0 |
WARNING | |
Kubernetes: Namespace [{#NAMESPACE}] Deployment [{#NAME}]: Deployment replicas mismatch | - |
(last(/Kubernetes cluster state by HTTP/kube.deployment.replicas[{#NAMESPACE}/{#NAME}])-last(/Kubernetes cluster state by HTTP/kube.deployment.replicas_available[{#NAMESPACE}/{#NAME}]))<>0 |
WARNING | |
Kubernetes: Namespace [{#NAMESPACE}] Pod [{#NAME}]: Pod is not healthy | - |
min(/Kubernetes cluster state by HTTP/kube.pod.phase.failed[{#NAMESPACE}/{#NAME}],10m)>0 or min(/Kubernetes cluster state by HTTP/kube.pod.phase.pending[{#NAMESPACE}/{#NAME}],10m)>0 or min(/Kubernetes cluster state by HTTP/kube.pod.phase.unknown[{#NAMESPACE}/{#NAME}],10m)>0 |
HIGH | |
Kubernetes: Namespace [{#NAMESPACE}] Pod [{#NAME}]: Pod is crash looping | - |
(last(/Kubernetes cluster state by HTTP/kube.pod.containers_restarts[{#NAMESPACE}/{#NAME}])-min(/Kubernetes cluster state by HTTP/kube.pod.containers_restarts[{#NAMESPACE}/{#NAME}],#3))>2 |
WARNING | |
Kubernetes: Namespace [{#NAMESPACE}] RS [{#NAME}]: ReplicaSet mismatch | - |
(last(/Kubernetes cluster state by HTTP/kube.replicaset.replicas[{#NAMESPACE}/{#NAME}])-last(/Kubernetes cluster state by HTTP/kube.replicaset.ready[{#NAMESPACE}/{#NAME}]))<>0 |
WARNING | |
Kubernetes: Namespace [{#NAMESPACE}] StatefulSet [{#NAME}]: StatefulSet is down | - |
(last(/Kubernetes cluster state by HTTP/kube.statefulset.replicas_ready[{#NAMESPACE}/{#NAME}]) / last(/Kubernetes cluster state by HTTP/kube.statefulset.replicas_current[{#NAMESPACE}/{#NAME}]))<>1 |
HIGH | |
Kubernetes: Namespace [{#NAMESPACE}] StatefulSet [{#NAME}]: StatefulSet replicas mismatch | - |
(last(/Kubernetes cluster state by HTTP/kube.statefulset.replicas[{#NAMESPACE}/{#NAME}])-last(/Kubernetes cluster state by HTTP/kube.statefulset.replicas_ready[{#NAMESPACE}/{#NAME}]))<>0 |
WARNING | |
Kubernetes: Component [{#NAME}] is unhealthy | - |
count(/Kubernetes cluster state by HTTP/kube.componentstatuses.healthy[{#NAME}],#3,,"True")<2 and length(last(/Kubernetes cluster state by HTTP/kube.componentstatuses.healthy[{#NAME}]))>0 |
WARNING | |
Kubernetes: Readyz [{#NAME}] is unhealthy | - |
count(/Kubernetes cluster state by HTTP/kube.readyz.healthcheck[{#NAME}],#3,,"ok")<2 and length(last(/Kubernetes cluster state by HTTP/kube.readyz.healthcheck[{#NAME}]))>0 |
WARNING | |
Kubernetes: Livez [{#NAME}] is unhealthy | - |
count(/Kubernetes cluster state by HTTP/kube.livez.healthcheck[{#NAME}],#3,,"ok")<2 and length(last(/Kubernetes cluster state by HTTP/kube.livez.healthcheck[{#NAME}]))>0 |
WARNING |
Please report any issues with the template at https://support.zabbix.com.
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums.
For Zabbix version: 6.2 and higher
The template to monitor Kubernetes Scheduler by Zabbix that works without any external scripts.
Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
Template Kubernetes Scheduler by HTTP
— collects metrics by HTTP agent from Scheduler /metrics endpoint.
This template was tested on:
See Zabbix template operation for basic instructions.
Internal service metrics are collected from the /metrics endpoint. The template requires authorization via an API token.
Don't forget to change the macros {$KUBE.SCHEDULER.SERVER.URL} and {$KUBE.API.TOKEN}. Also, see the Macros section for a list of macros used to set trigger values. NOTE: some metrics may not be collected depending on your Kubernetes Scheduler instance version and configuration.
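For reference, the authorization above amounts to sending the service account token as a Bearer header with each request to the /metrics endpoint. A hedged sketch in the JavaScript dialect used by Zabbix script items (the template itself uses a plain HTTP agent item; this only illustrates the same request):

    // Fetch the Scheduler /metrics endpoint with Bearer-token authorization.
    // The user macros resolve to the values configured in the Macros section.
    var request = new HttpRequest();
    request.addHeader('Authorization: Bearer ' + '{$KUBE.API.TOKEN}');

    var response = request.get('{$KUBE.SCHEDULER.SERVER.URL}');
    if (request.getStatus() !== 200) {
        throw 'Failed to read /metrics: HTTP ' + request.getStatus();
    }
    return response; // raw Prometheus exposition text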
No specific Zabbix configuration is required.
Name | Description | Default |
---|---|---|
{$KUBE.API.TOKEN} | API Authorization Token |
`` |
{$KUBE.SCHEDULER.ERROR} | Maximum number of scheduling failures with 'error' used for trigger |
2 |
{$KUBE.SCHEDULER.HTTP.CLIENT.ERROR} | Maximum number of HTTP client requests failures used for trigger |
2 |
{$KUBE.SCHEDULER.SERVER.URL} | Instance URL |
http://localhost:10251/metrics |
{$KUBE.SCHEDULER.UNSCHEDULABLE} | Maximum number of scheduling failures with 'unschedulable' used for trigger |
2 |
There are no template links in this template.
Name | Description | Type | Key and additional info |
---|---|---|---|
Binding histogram | Discovery raw data of binding latency. |
DEPENDENT | kubernetes.scheduler.binding.discovery Preprocessing: - PROMETHEUSTOJSON: - JAVASCRIPT: - DISCARDUNCHANGEDHEARTBEAT: Overrides: bucket item total item |
e2e scheduling histogram | Discovery raw data and percentile items of e2e scheduling latency. |
DEPENDENT | kubernetes.controller.e2escheduling.discovery Preprocessing: - PROMETHEUS TOJSON:{__name__=~ "scheduler_e2e_scheduling_duration_*", result =~ ".*"} - JAVASCRIPT: - DISCARD UNCHANGEDHEARTBEAT:3h Overrides: bucket item buckets - ITEMPROTOTYPE LIKE bucket - DISCOVERtotal item totals - ITEMPROTOTYPE NOTLIKE bucket - DISCOVER |
Scheduling algorithm histogram | Discovery raw data of scheduling algorithm latency. |
DEPENDENT | kubernetes.scheduler.schedulingalgorithm.discovery Preprocessing: - PROMETHEUS TOJSON:{__name__=~ "scheduler_scheduling_algorithm_duration_seconds_*"} - JAVASCRIPT: - DISCARD UNCHANGEDHEARTBEAT:3h Overrides: bucket item buckets - ITEMPROTOTYPE LIKE bucket - DISCOVERtotal item totals - ITEMPROTOTYPE NOTLIKE bucket - DISCOVER |
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
Kubernetes Scheduler | Kubernetes Scheduler: Virtual memory, bytes | Virtual memory size in bytes. |
DEPENDENT | kubernetes.scheduler.processvirtualmemorybytes Preprocessing: - PROMETHEUS PATTERN:process_virtual_memory_bytes ⛔️ON_FAIL: |
Kubernetes Scheduler | Kubernetes Scheduler: Resident memory, bytes | Resident memory size in bytes. |
DEPENDENT | kubernetes.scheduler.processresidentmemorybytes Preprocessing: - PROMETHEUS PATTERN:process_resident_memory_bytes ⛔️ON_FAIL: |
Kubernetes Scheduler | Kubernetes Scheduler: CPU | Total user and system CPU usage ratio. |
DEPENDENT | kubernetes.scheduler.cpu.util Preprocessing: - PROMETHEUSPATTERN: - CHANGEPER_SECOND - MULTIPLIER: |
Kubernetes Scheduler | Kubernetes Scheduler: Goroutines | Number of goroutines that currently exist. |
DEPENDENT | kubernetes.scheduler.gogoroutines Preprocessing: - PROMETHEUS PATTERN:go_goroutines : function : sum ⛔️ON_FAIL: |
Kubernetes Scheduler | Kubernetes Scheduler: Go threads | Number of OS threads created. |
DEPENDENT | kubernetes.scheduler.gothreads Preprocessing: - PROMETHEUS PATTERN:go_threads ⛔️ON_FAIL: |
Kubernetes Scheduler | Kubernetes Scheduler: Fds open | Number of open file descriptors. |
DEPENDENT | kubernetes.scheduler.openfds Preprocessing: - PROMETHEUS PATTERN:process_open_fds ⛔️ON_FAIL: |
Kubernetes Scheduler | Kubernetes Scheduler: Fds max | Maximum allowed open file descriptors. |
DEPENDENT | kubernetes.scheduler.maxfds Preprocessing: - PROMETHEUS PATTERN:process_max_fds ⛔️ON_FAIL: |
Kubernetes Scheduler | Kubernetes Scheduler: REST Client requests: 2xx, rate | Number of HTTP requests with 2xx status code per second. |
DEPENDENT | kubernetes.scheduler.clienthttprequests200.rate Preprocessing: - PROMETHEUS PATTERN:rest_client_requests_total{code =~ "2.."} : function : sum ⛔️ONFAIL: - CHANGEPER_SECOND |
Kubernetes Scheduler | Kubernetes Scheduler: REST Client requests: 3xx, rate | Number of HTTP requests with 3xx status code per second. |
DEPENDENT | kubernetes.scheduler.clienthttprequests300.rate Preprocessing: - PROMETHEUS PATTERN:rest_client_requests_total{code =~ "3.."} : function : sum ⛔️ONFAIL: - CHANGEPER_SECOND |
Kubernetes Scheduler | Kubernetes Scheduler: REST Client requests: 4xx, rate | Number of HTTP requests with 4xx status code per second. |
DEPENDENT | kubernetes.scheduler.clienthttprequests400.rate Preprocessing: - PROMETHEUS PATTERN:rest_client_requests_total{code =~ "4.."} : function : sum ⛔️ONFAIL: - CHANGEPER_SECOND |
Kubernetes Scheduler | Kubernetes Scheduler: REST Client requests: 5xx, rate | Number of HTTP requests with 5xx status code per second. |
DEPENDENT | kubernetes.scheduler.clienthttprequests500.rate Preprocessing: - PROMETHEUS PATTERN:rest_client_requests_total{code =~ "5.."} : function : sum ⛔️ONFAIL: - CHANGEPER_SECOND |
Kubernetes Scheduler | Kubernetes Scheduler: Schedule attempts: scheduled | Number of attempts to schedule pods with result "scheduled" per second. |
DEPENDENT | kubernetes.scheduler.schedulerscheduleattempts.scheduled.rate Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: - CHANGEPERSECOND |
Kubernetes Scheduler | Kubernetes Scheduler: Schedule attempts: unschedulable | Number of attempts to schedule pods with result "unschedulable" per second. |
DEPENDENT | kubernetes.scheduler.schedulerscheduleattempts.unschedulable.rate Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: - CHANGEPERSECOND |
Kubernetes Scheduler | Kubernetes Scheduler: Schedule attempts: error | Number of attempts to schedule pods with result "error" per second. |
DEPENDENT | kubernetes.scheduler.schedulerscheduleattempts.error.rate Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: - CHANGEPERSECOND |
Kubernetes Scheduler | Kubernetes Scheduler: Scheduling algorithm duration bucket, {#LE} | Scheduling algorithm latency in seconds. |
DEPENDENT | kubernetes.scheduler.schedulingalgorithmduration[{#LE}] Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: |
Kubernetes Scheduler | Kubernetes Scheduler: Scheduling algorithm duration, p90 | 90 percentile of scheduling algorithm latency in seconds. |
CALCULATED | kubernetes.scheduler.schedulingalgorithmduration_p90[{#SINGLETON}] Expression: bucket_percentile(//kubernetes.scheduler.scheduling_algorithm_duration[*],5m,90) |
Kubernetes Scheduler | Kubernetes Scheduler: Scheduling algorithm duration, p95 | 95 percentile of scheduling algorithm latency in seconds. |
CALCULATED | kubernetes.scheduler.schedulingalgorithmduration_p95[{#SINGLETON}] Expression: bucket_percentile(//kubernetes.scheduler.scheduling_algorithm_duration[*],5m,95) |
Kubernetes Scheduler | Kubernetes Scheduler: Scheduling algorithm duration, p99 | 99 percentile of scheduling algorithm latency in seconds. |
CALCULATED | kubernetes.scheduler.schedulingalgorithmduration_p99[{#SINGLETON}] Expression: bucket_percentile(//kubernetes.scheduler.scheduling_algorithm_duration[*],5m,99) |
Kubernetes Scheduler | Kubernetes Scheduler: Scheduling algorithm duration, p50 | 50 percentile of scheduling algorithm latency in seconds. |
CALCULATED | kubernetes.scheduler.schedulingalgorithmduration_p50[{#SINGLETON}] Expression: bucket_percentile(//kubernetes.scheduler.scheduling_algorithm_duration[*],5m,50) |
Kubernetes Scheduler | Kubernetes Scheduler: Binding duration bucket, {#LE} | Binding latency in seconds. |
DEPENDENT | kubernetes.scheduler.bindingduration[{#LE}] Preprocessing: - PROMETHEUS PATTERN:scheduler_binding_duration_seconds_bucket{le = "{#LE}"} ⛔️ON_FAIL: |
Kubernetes Scheduler | Kubernetes Scheduler: Binding duration, p90 | 90 percentile of binding latency in seconds. |
CALCULATED | kubernetes.scheduler.bindingdurationp90[{#SINGLETON}] Expression: bucket_percentile(//kubernetes.scheduler.binding_duration[*],5m,90) |
Kubernetes Scheduler | Kubernetes Scheduler: Binding duration, p95 | 95 percentile of binding latency in seconds. |
CALCULATED | kubernetes.scheduler.bindingdurationp95[{#SINGLETON}] Expression: bucket_percentile(//kubernetes.scheduler.binding_duration[*],5m,95) |
Kubernetes Scheduler | Kubernetes Scheduler: Binding duration, p99 | 99 percentile of binding latency in seconds. |
CALCULATED | kubernetes.scheduler.bindingdurationp99[{#SINGLETON}] Expression: bucket_percentile(//kubernetes.scheduler.binding_duration[*],5m,99) |
Kubernetes Scheduler | Kubernetes Scheduler: Binding duration, p50 | 50 percentile of binding latency in seconds. |
CALCULATED | kubernetes.scheduler.bindingdurationp50[{#SINGLETON}] Expression: bucket_percentile(//kubernetes.scheduler.binding_duration[*],5m,50) |
Kubernetes Scheduler | Kubernetes Scheduler: ["{#RESULT}"]: e2e scheduling seconds bucket, {#LE} | E2e scheduling latency in seconds (scheduling algorithm + binding) |
DEPENDENT | kubernetes.scheduler.e2eschedulingbucket[{#LE},"{#RESULT}"] Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: |
Kubernetes Scheduler | Kubernetes Scheduler: ["{#RESULT}"]: e2e scheduling, p50 | 50 percentile of e2e scheduling latency. |
CALCULATED | kubernetes.scheduler.e2eschedulingp50["{#RESULT}"] Expression: bucket_percentile(//kubernetes.scheduler.e2e_scheduling_bucket[*,"{#RESULT}"],5m,50) |
Kubernetes Scheduler | Kubernetes Scheduler: ["{#RESULT}"]: e2e scheduling, p90 | 90 percentile of e2e scheduling latency. |
CALCULATED | kubernetes.scheduler.e2eschedulingp90["{#RESULT}"] Expression: bucket_percentile(//kubernetes.scheduler.e2e_scheduling_bucket[*,"{#RESULT}"],5m,90) |
Kubernetes Scheduler | Kubernetes Scheduler: ["{#RESULT}"]: e2e scheduling, p95 | 95 percentile of e2e scheduling latency. |
CALCULATED | kubernetes.scheduler.e2eschedulingp95["{#RESULT}"] Expression: bucket_percentile(//kubernetes.scheduler.e2e_scheduling_bucket[*,"{#RESULT}"],5m,95) |
Kubernetes Scheduler | Kubernetes Scheduler: ["{#RESULT}"]: e2e scheduling, p99 | 99 percentile of e2e scheduling latency. |
CALCULATED | kubernetes.scheduler.e2eschedulingp99["{#RESULT}"] Expression: bucket_percentile(//kubernetes.scheduler.e2e_scheduling_bucket[*,"{#RESULT}"],5m,99) |
Zabbix raw items | Kubernetes Scheduler: Get Scheduler metrics | Get raw metrics from Scheduler instance /metrics endpoint. |
HTTP_AGENT | kubernetes.scheduler.getmetrics Preprocessing: - CHECK NOTSUPPORTED⛔️ON FAIL:DISCARD_VALUE -> |
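Each latency histogram above is discovered as a set of cumulative {#LE} bucket items, and the p50–p99 items are CALCULATED from those buckets with bucket_percentile(), which performs roughly the same linear interpolation as Prometheus' histogram_quantile. A rough sketch of that estimate (Zabbix evaluates this internally; the bucket bounds and counts below are invented):

    // Percentile estimation over cumulative Prometheus-style buckets.
    // buckets: array of [upperBound, cumulativeCount] sorted by bound,
    // ending with the +Inf bucket [Infinity, totalCount].
    function bucketPercentile(buckets, q) {
        var total = buckets[buckets.length - 1][1];
        if (total === 0) {
            return null; // no observations yet
        }
        var rank = (q / 100) * total, prevBound = 0, prevCount = 0;
        for (var i = 0; i < buckets.length; i++) {
            var bound = buckets[i][0], count = buckets[i][1];
            if (count >= rank) {
                if (bound === Infinity) {
                    return prevBound; // cannot interpolate into +Inf
                }
                // Interpolate linearly inside the bucket crossing the rank.
                return prevBound + (bound - prevBound) * (rank - prevCount) / (count - prevCount);
            }
            prevBound = bound;
            prevCount = count;
        }
        return null;
    }

    // Hypothetical scheduler_binding_duration_seconds buckets:
    var buckets = [[0.001, 12], [0.01, 40], [0.1, 95], [1, 100], [Infinity, 100]];
    bucketPercentile(buckets, 90); // -> 0.01 + 0.09 * (90 - 40) / (95 - 40) ≈ 0.092 s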
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Kubernetes Scheduler: Too many REST Client errors | "Kubernetes Scheduler REST Client requests are experiencing a high error rate (with 5xx HTTP code)." |
min(/Kubernetes Scheduler by HTTP/kubernetes.scheduler.client_http_requests_500.rate,5m)>{$KUBE.SCHEDULER.HTTP.CLIENT.ERROR} |
WARNING | |
Kubernetes Scheduler: Too many unschedulable pods | "Number of attempts to schedule pods with 'unschedulable' result is too high. 'unschedulable' means a pod could not be scheduled." |
min(/Kubernetes Scheduler by HTTP/kubernetes.scheduler.scheduler_schedule_attempts.unschedulable.rate,5m)>{$KUBE.SCHEDULER.UNSCHEDULABLE} |
WARNING | |
Kubernetes Scheduler: Too many schedule attempts with errors | "Number of attempts to schedule pods with 'error' result is too high. 'error' means an internal scheduler problem." |
min(/Kubernetes Scheduler by HTTP/kubernetes.scheduler.scheduler_schedule_attempts.error.rate,5m)>{$KUBE.SCHEDULER.ERROR} |
WARNING |
Please report any issues with the template at https://support.zabbix.com.
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums.
For Zabbix version: 6.2 and higher.
The template to monitor Kubernetes nodes by Zabbix that works without any external scripts.
It uses the script item to make HTTP requests to the Kubernetes API.
Install the Zabbix Helm Chart (https://git.zabbix.com/projects/ZT/repos/kubernetes-helm/browse?at=refs%2Fheads%2Frelease%2F6.2) in your Kubernetes cluster.
Set the {$KUBE.API.ENDPOINT.URL} macro, such as <scheme>://<host>:<port>/api.
Get the generated service account token using the command
kubectl get secret zabbix-service-account -n monitoring -o jsonpath={.data.token} | base64 -d
Then set it to the macro {$KUBE.API.TOKEN}.
Set {$KUBE.NODES.ENDPOINT.NAME} with the Zabbix agent's endpoint name (see kubectl -n monitoring get ep). Default: zabbix-zabbix-helm-chrt-agent.
Set up the macros to filter the metrics of discovered nodes
This template was tested on:
See Zabbix template operation for basic instructions.
Install the Zabbix Helm Chart in your Kubernetes cluster.
Set the {$KUBE.API.ENDPOINT.URL} macro, such as <scheme>://<host>:<port>/api.
Get the generated service account token using the command
kubectl get secret zabbix-service-account -n monitoring -o jsonpath={.data.token} | base64 -d
Then set it to the macro {$KUBE.API.TOKEN}.
Set {$KUBE.NODES.ENDPOINT.NAME} with the Zabbix agent's endpoint name (see kubectl -n monitoring get ep). Default: zabbix-zabbix-helm-chrt-agent.
Set up the macros to filter the metrics of discovered nodes:
Set up the macros to filter host creation based on host prototypes:
Set up macros to filter pod metrics by namespace:
Note: if you have a large cluster, it is highly recommended to set a filter for discoverable pods.
You can use the {$KUBE.NODE.FILTER.LABELS}, {$KUBE.POD.FILTER.LABELS}, {$KUBE.NODE.FILTER.ANNOTATIONS}, and {$KUBE.POD.FILTER.ANNOTATIONS} macros for advanced filtering of nodes and pods by labels and annotations. Macro values are specified as comma-separated key/value pairs, with support for regular expressions in the value.
For example: kubernetes.io/hostname: kubernetes-node[5-25], !node-role.kubernetes.io/ingress: .*. As a result, nodes 5-25 without the "ingress" role will be discovered (see the filter sketch after the note below).
See documentation for details:
Note: the discovered nodes will be created as separate hosts in Zabbix, with the Linux template automatically assigned to them.
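A minimal sketch of how such a comma-separated key/value filter string can be interpreted (the node discovery script embedded in the template does essentially this; the helper and sample values below are illustrative, not the template's code):

    // Decide whether an object's labels pass a filter string such as
    // 'kubernetes.io/hostname: worker-[0-9]+, !node-role.kubernetes.io/ingress: .*'.
    // A leading '!' excludes objects whose label value matches the regex.
    function passesFilter(labels, filterString) {
        return filterString.split(/\s*,\s*/).every(function (kv) {
            var pair = kv.split(/\s*:\s*/);
            if (pair.length < 2) {
                return true; // ignore malformed entries
            }
            var key = pair[0], re = new RegExp(pair[1]);
            if (key.charAt(0) === '!') {
                key = key.substring(1);
                return !(key in labels) || !re.test(labels[key]); // exclusion rule
            }
            return (key in labels) && re.test(labels[key]); // inclusion rule
        });
    }

    passesFilter(
        {'kubernetes.io/hostname': 'worker-07'},
        'kubernetes.io/hostname: worker-[0-9]+, !node-role.kubernetes.io/ingress: .*'
    ); // -> true: the hostname matches and there is no ingress role label to exclude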
No specific Zabbix configuration is required.
Name | Description | Default |
---|---|---|
{$KUBE.API.ENDPOINT.URL} | Kubernetes API endpoint URL in the format |
https://localhost:6443/api |
{$KUBE.API.TOKEN} | Service account bearer token |
`` |
{$KUBE.LLD.FILTER.NODE.MATCHES} | Filter of discoverable nodes |
.* |
{$KUBE.LLD.FILTER.NODE.NOT_MATCHES} | Filter to exclude discovered nodes |
CHANGE_IF_NEEDED |
{$KUBE.LLD.FILTER.NODE.ROLE.MATCHES} | Filter of discoverable nodes by role |
.* |
{$KUBE.LLD.FILTER.NODE.ROLE.NOT_MATCHES} | Filter to exclude discovered nodes by role |
CHANGE_IF_NEEDED |
{$KUBE.LLD.FILTER.NODE_HOST.MATCHES} | Filter of discoverable cluster nodes |
.* |
{$KUBE.LLD.FILTER.NODE_HOST.NOT_MATCHES} | Filter to exclude discovered cluster nodes |
CHANGE_IF_NEEDED |
{$KUBE.LLD.FILTER.NODE_HOST.ROLE.MATCHES} | Filter of discoverable cluster nodes by role |
.* |
{$KUBE.LLD.FILTER.NODE_HOST.ROLE.NOT_MATCHES} | Filter to exclude discovered cluster nodes by role |
CHANGE_IF_NEEDED |
{$KUBE.LLD.FILTER.POD.NAMESPACE.MATCHES} | Filter of discoverable pods by namespace |
.* |
{$KUBE.LLD.FILTER.POD.NAMESPACE.NOT_MATCHES} | Filter to exclude discovered pods by namespace |
CHANGE_IF_NEEDED |
{$KUBE.NODE.FILTER.ANNOTATIONS} | Annotations to filter nodes (regex in values are supported) |
`` |
{$KUBE.NODE.FILTER.LABELS} | Labels to filter nodes (regex in values are supported) |
`` |
{$KUBE.NODES.ENDPOINT.NAME} | Kubernetes nodes endpoint name. See kubectl -n monitoring get ep |
zabbix-zabbix-helm-chrt-agent |
{$KUBE.POD.FILTER.ANNOTATIONS} | Annotations to filter pods (regex in values are supported) |
`` |
{$KUBE.POD.FILTER.LABELS} | Labels to filter Pods (regex in values are supported) |
`` |
There are no template links in this template.
Name | Description | Type | Key and additional info |
---|---|---|---|
Cluster node discovery | - |
DEPENDENT | kube.nodehost.discovery Filter: AND- {#NAME} MATCHES REGEX{$KUBE.LLD.FILTER.NODE_HOST.MATCHES} - {#NAME} NOTMATCHESREGEX - {#ROLES} MATCHESREGEX - {#ROLES} NOTMATCHES_REGEX |
Node discovery | - |
DEPENDENT | kube.node.discovery Filter: AND- {#NAME} MATCHESREGEX - {#NAME} NOTMATCHESREGEX - {#ROLES} MATCHESREGEX - {#ROLES} NOTMATCHESREGEX |
Pod discovery | - |
DEPENDENT | kube.pod.discovery Preprocessing: - JAVASCRIPT - DISCARDUNCHANGEDHEARTBEAT Filter: AND- {#NODE} MATCHESREGEX - {#NODE} NOTMATCHESREGEX - {#NAMESPACE} MATCHESREGEX - {#NAMESPACE} NOTMATCHESREGEX |
Group | Name | Description | Type | Key and additional info | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Kubernetes | Kubernetes: Get nodes | Collecting and processing cluster nodes data via Kubernetes API. |
SCRIPT | kube.nodes Expression: The text is too long. Please see the template. |
||||||||||
Kubernetes | Get nodes check | Data collection check. |
DEPENDENT | kube.nodes.check Preprocessing: - JSONPATH: ⛔️ONFAIL: - DISCARDUNCHANGED_HEARTBEAT: |
||||||||||
Kubernetes | Node LLD | Generation of data for node discovery rules. |
DEPENDENT | kube.nodes.lld Preprocessing: - JAVASCRIPT:

    function parseFilters(filter) {
        var pairs = {};
        filter.split(/\s*,\s*/).forEach(function (kv) {
            if (/([\w.-]+\/[\w.-]+):\s*.+/.test(kv)) {
                var pair = kv.split(/\s*:\s*/);
                pairs[pair[0]] = pair[1];
            }
        });
        return pairs;
    }

    function filter(name, data, filters) {
        var filtered = true;
        if (typeof data === 'object') {
            Object.keys(filters).some(function (filter) {
                var exclude = filter.match(/^!(.+)/);
                if (filter in data || (exclude && exclude[1] in data)) {
                    if ((exclude && new RegExp(filters[filter]).test(data[exclude[1]]))
                            || (!exclude && !(new RegExp(filters[filter]).test(data[filter])))) {
                        Zabbix.log(4, '[ Kubernetes discovery ] Discarded "' + name + '" by filter "' + filter + ': ' + filters[filter] + '"');
                        filtered = false;
                        return true;
                    }
                }
            });
        }
        return filtered;
    }

    try {
        var input = JSON.parse(value),
            output = [],
            api_url = '{$KUBE.API.ENDPOINT.URL}',
            hostname = api_url.match(/\/\/(.+):/);

        if (typeof hostname[1] === 'undefined') {
            Zabbix.log(4, '[ Kubernetes ] Received incorrect Kubernetes API url: ' + api_url + '. Expected format: <scheme>://<host>:<port>/api');
            throw 'Cannot get hostname from the Kubernetes API url. Check debug log for more information.';
        }

        if (typeof input !== 'object' || typeof input.items === 'undefined') {
            Zabbix.log(4, '[ Kubernetes ] Received incorrect JSON: ' + value);
            throw 'Incorrect JSON. Check debug log for more information.';
        }

        var filterLabels = parseFilters('{$KUBE.NODE.FILTER.LABELS}'),
            filterAnnotations = parseFilters('{$KUBE.NODE.FILTER.ANNOTATIONS}');

        input.items.forEach(function (node) {
            if (filter(node.metadata.name, node.metadata.labels, filterLabels)
                    && filter(node.metadata.name, node.metadata.annotations, filterAnnotations)) {
                Zabbix.log(4, '[ Kubernetes discovery ] Filtered node "' + node.metadata.name + '"');

                var internalIPs = node.status.addresses.filter(function (addr) {
                    return addr.type === 'InternalIP';
                });
                var internalIP = internalIPs.length && internalIPs[0].address;

                if (internalIP in input.endpointIPs) {
                    output.push({
                        '{#NAME}': node.metadata.name,
                        '{#IP}': internalIP,
                        '{#ROLES}': node.status.roles,
                        '{#ARCH}': node.metadata.labels['kubernetes.io/arch'] || '',
                        '{#OS}': node.metadata.labels['kubernetes.io/os'] || '',
                        '{#CLUSTER_HOSTNAME}': hostname[1]
                    });
                }
                else {
                    Zabbix.log(4, '[ Kubernetes discovery ] Node "' + node.metadata.name + '" is not included in the list of endpoint IPs');
                }
            }
        });

        return JSON.stringify(output);
    }
    catch (error) {
        error += (String(error).endsWith('.')) ? '' : '.';
        Zabbix.log(3, '[ Kubernetes discovery ] ERROR: ' + error);
        throw 'Discovery error: ' + error;
    }

- DISCARD_UNCHANGED_HEARTBEAT: 3h
|||||
Kubernetes | Node [{#NAME}]: Get data | Collecting and processing cluster by node [{#NAME}] data via Kubernetes API. |
DEPENDENT | kube.node.get[{#NAME}] Preprocessing: - JSONPATH: |
||||||||||
Kubernetes | Node [{#NAME}] Addresses: External IP | Typically the IP address of the node that is externally routable (available from outside the cluster). |
DEPENDENT | kube.node.addresses.externalip[{#NAME}] Preprocessing: - JSONPATH: ⛔️ON FAIL:DISCARD_VALUE -> - DISCARDUNCHANGEDHEARTBEAT: |
||||||||||
Kubernetes | Node [{#NAME}] Addresses: Internal IP | Typically the IP address of the node that is routable only within the cluster. |
DEPENDENT | kube.node.addresses.internalip[{#NAME}] Preprocessing: - JSONPATH: ⛔️ON FAIL:DISCARD_VALUE -> - DISCARDUNCHANGEDHEARTBEAT: |
||||||||||
Kubernetes | Node [{#NAME}] Allocatable: CPU | Allocatable CPU. 'Allocatable' on a Kubernetes node is defined as the amount of compute resources that are available for pods. The scheduler does not over-subscribe 'Allocatable'. 'CPU', 'memory' and 'ephemeral-storage' are supported as of now. |
DEPENDENT | kube.node.allocatable.cpu[{#NAME}] Preprocessing: - JSONPATH: |
||||||||||
Kubernetes | Node [{#NAME}] Allocatable: Memory | Allocatable Memory. 'Allocatable' on a Kubernetes node is defined as the amount of compute resources that are available for pods. The scheduler does not over-subscribe 'Allocatable'. 'CPU', 'memory' and 'ephemeral-storage' are supported as of now. |
DEPENDENT | kube.node.allocatable.memory[{#NAME}] Preprocessing: - JSONPATH: |
||||||||||
Kubernetes | Node [{#NAME}] Allocatable: Pods | https://kubernetes.io/docs/tasks/administer-cluster/reserve-compute-resources/ |
DEPENDENT | kube.node.allocatable.pods[{#NAME}] Preprocessing: - JSONPATH: |
||||||||||
Kubernetes | Node [{#NAME}] Capacity: CPU | CPU resource capacity. https://kubernetes.io/docs/concepts/architecture/nodes/#capacity |
DEPENDENT | kube.node.capacity.cpu[{#NAME}] Preprocessing: - JSONPATH: |
||||||||||
Kubernetes | Node [{#NAME}] Capacity: Memory | Memory resource capacity. https://kubernetes.io/docs/concepts/architecture/nodes/#capacity |
DEPENDENT | kube.node.capacity.memory[{#NAME}] Preprocessing: - JSONPATH: |
||||||||||
Kubernetes | Node [{#NAME}] Capacity: Pods | https://kubernetes.io/docs/tasks/administer-cluster/reserve-compute-resources/ |
DEPENDENT | kube.node.capacity.pods[{#NAME}] Preprocessing: - JSONPATH: |
||||||||||
Kubernetes | Node [{#NAME}] Conditions: Disk pressure | True if pressure exists on the disk size - that is, if the disk capacity is low; otherwise False. |
DEPENDENT | kube.node.conditions.diskpressure[{#NAME}] Preprocessing: - JSONPATH: ⛔️ON_FAIL: - JAVASCRIPT: `return ['True', 'False', 'Unknown'].indexOf(value) + 1 || 'Problem with status processing in JS';` | |||||||||
Kubernetes | Node [{#NAME}] Conditions: Memory pressure | True if pressure exists on the node memory - that is, if the node memory is low; otherwise False. |
DEPENDENT | kube.node.conditions.memorypressure[{#NAME}] Preprocessing: - JSONPATH: ⛔️ON_FAIL: - JAVASCRIPT: `return ['True', 'False', 'Unknown'].indexOf(value) + 1 || 'Problem with status processing in JS';` | |||||||||
Kubernetes | Node [{#NAME}] Conditions: Network unavailable | True if the network for the node is not correctly configured, otherwise False. |
DEPENDENT | kube.node.conditions.networkunavailable[{#NAME}] Preprocessing: - JSONPATH: ⛔️ON_FAIL: - JAVASCRIPT: `return ['True', 'False', 'Unknown'].indexOf(value) + 1 || 'Problem with status processing in JS';` | |||||||||
Kubernetes | Node [{#NAME}] Conditions: PID pressure | True if pressure exists on the processes - that is, if there are too many processes on the node; otherwise False. |
DEPENDENT | kube.node.conditions.pidpressure[{#NAME}] Preprocessing: - JSONPATH: ⛔️ON_FAIL: - JAVASCRIPT: `return ['True', 'False', 'Unknown'].indexOf(value) + 1 || 'Problem with status processing in JS';` | |||||||||
Kubernetes | Node [{#NAME}] Conditions: Ready | True if the node is healthy and ready to accept pods, False if the node is not healthy and is not accepting pods, and Unknown if the node controller has not heard from the node in the last node-monitor-grace-period (default is 40 seconds). |
DEPENDENT | kube.node.conditions.ready[{#NAME}] Preprocessing: - JSONPATH: ⛔️ON_FAIL: - JAVASCRIPT: `return ['True', 'False', 'Unknown'].indexOf(value) + 1 || 'Problem with status processing in JS';` | |||||||||
Kubernetes | Node [{#NAME}] Info: Architecture | Node architecture. |
DEPENDENT | kube.node.info.architecture[{#NAME}] Preprocessing: - JSONPATH: - DISCARDUNCHANGEDHEARTBEAT: |
||||||||||
Kubernetes | Node [{#NAME}] Info: Container runtime | Container runtime. https://kubernetes.io/docs/setup/production-environment/container-runtimes/ |
DEPENDENT | kube.node.info.containerruntime[{#NAME}] Preprocessing: - JSONPATH: - DISCARDUNCHANGEDHEARTBEAT: |
||||||||||
Kubernetes | Node [{#NAME}] Info: Kernel version | Node kernel version. |
DEPENDENT | kube.node.info.kernelversion[{#NAME}] Preprocessing: - JSONPATH: - DISCARDUNCHANGEDHEARTBEAT: |
||||||||||
Kubernetes | Node [{#NAME}] Info: Kubelet version | Version of Kubelet. |
DEPENDENT | kube.node.info.kubeletversion[{#NAME}] Preprocessing: - JSONPATH: - DISCARDUNCHANGEDHEARTBEAT: |
||||||||||
Kubernetes | Node [{#NAME}] Info: KubeProxy version | Version of KubeProxy. |
DEPENDENT | kube.node.info.kubeproxyversion[{#NAME}] Preprocessing: - JSONPATH: - DISCARDUNCHANGEDHEARTBEAT: |
||||||||||
Kubernetes | Node [{#NAME}] Info: Operating system | Node operating system. |
DEPENDENT | kube.node.info.operatingsystem[{#NAME}] Preprocessing: - JSONPATH: - DISCARDUNCHANGEDHEARTBEAT: |
||||||||||
Kubernetes | Node [{#NAME}] Info: OS image | Node OS image. |
DEPENDENT | kube.node.info.osversion[{#NAME}] Preprocessing: - JSONPATH: - DISCARDUNCHANGEDHEARTBEAT: |
||||||||||
Kubernetes | Node [{#NAME}] Info: Roles | Node roles. |
DEPENDENT | kube.node.info.roles[{#NAME}] Preprocessing: - JSONPATH: - DISCARDUNCHANGEDHEARTBEAT: |
||||||||||
Kubernetes | Node [{#NAME}] Limits: CPU | Node CPU limits. https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/ |
DEPENDENT | kube.node.limits.cpu[{#NAME}] Preprocessing: - JSONPATH: |
||||||||||
Kubernetes | Node [{#NAME}] Limits: Memory | Node Memory limits. https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/ |
DEPENDENT | kube.node.limits.memory[{#NAME}] Preprocessing: - JSONPATH: |
||||||||||
Kubernetes | Node [{#NAME}] Requests: CPU | Node CPU requests. https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/ |
DEPENDENT | kube.node.requests.cpu[{#NAME}] Preprocessing: - JSONPATH: |
||||||||||
Kubernetes | Node [{#NAME}] Requests: Memory | Node Memory requests. https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/ |
DEPENDENT | kube.node.requests.memory[{#NAME}] Preprocessing: - JSONPATH: |
||||||||||
Kubernetes | Node [{#NAME}] Uptime | Node uptime. |
DEPENDENT | kube.node.uptime[{#NAME}] Preprocessing: - JSONPATH: ⛔️ON_FAIL: - JAVASCRIPT: |
||||||||||
Kubernetes | Node [{#NAME}] Used: Pods | Current number of pods on the node. |
DEPENDENT | kube.node.used.pods[{#NAME}] Preprocessing: - JSONPATH: |
||||||||||
Kubernetes | Node [{#NODE}] Pod [{#POD}]: Get data | Collecting and processing cluster by node [{#NODE}] data via Kubernetes API. |
DEPENDENT | kube.pod.get[{#POD}] Preprocessing: - JSONPATH: |
||||||||||
Kubernetes | Node [{#NODE}] Pod [{#POD}] Conditions: Containers ready | All containers in the Pod are ready. https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-conditions |
DEPENDENT | kube.pod.conditions.containersready[{#POD}] Preprocessing: - JSONPATH: ⛔️ON FAIL:DISCARD_VALUE -> - JAVASCRIPT: `return ['True', 'False', 'Unknown'].indexOf(value) + 1 || 'Problem with status processing in JS';` | |||||||||
Kubernetes | Node [{#NODE}] Pod [{#POD}] Conditions: Initialized | All init containers have started successfully. https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-conditions |
DEPENDENT | kube.pod.conditions.initialized[{#POD}] Preprocessing: - JSONPATH: ⛔️ON_FAIL: - JAVASCRIPT: `return ['True', 'False', 'Unknown'].indexOf(value) + 1 || 'Problem with status processing in JS';` | |||||||||
Kubernetes | Node [{#NODE}] Pod [{#POD}] Conditions: Ready | The Pod is able to serve requests and should be added to the load balancing pools of all matching Services. https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-conditions |
DEPENDENT | kube.pod.conditions.ready[{#POD}] Preprocessing: - JSONPATH: ⛔️ON_FAIL: - JAVASCRIPT: `return ['True', 'False', 'Unknown'].indexOf(value) + 1 || 'Problem with status processing in JS';` | |||||||||
Kubernetes | Node [{#NODE}] Pod [{#POD}] Conditions: Scheduled | The Pod has been scheduled to a node. https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-conditions |
DEPENDENT | kube.pod.conditions.scheduled[{#POD}] Preprocessing: - JSONPATH: ⛔️ON_FAIL: - JAVASCRIPT: `return ['True', 'False', 'Unknown'].indexOf(value) + 1 || 'Problem with status processing in JS';` | |||||||||
Kubernetes | Node [{#NODE}] Pod [{#POD}] Containers: Restarts | The number of times the container has been restarted, currently based on the number of dead containers that have not yet been removed. Note that this is calculated from dead containers. But those containers are subject to garbage collection. |
DEPENDENT | kube.pod.containers.restartcount[{#POD}] Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
||||||||||
Kubernetes | Node [{#NODE}] Pod [{#POD}] Status: Phase | The phase of a Pod is a simple, high-level summary of where the Pod is in its lifecycle. https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#pod-phase |
DEPENDENT | kube.pod.status.phase[{#POD}] Preprocessing: - JSONPATH: ⛔️ON_FAIL: - JAVASCRIPT: `return ['Pending', 'Running', 'Succeeded', 'Failed', 'Unknown'].indexOf(value) + 1 || 'Problem with status processing in JS';` | |||||||||
Kubernetes | Node [{#NODE}] Pod [{#POD}] Uptime | Pod uptime. |
DEPENDENT | kube.pod.uptime[{#POD}] Preprocessing: - JSONPATH: ⛔️ON_FAIL: - JAVASCRIPT: |
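All of the per-node and per-pod items above are dependent items: a single "Get data" request collects the JSON, and each metric is then extracted with a JSONPath expression. A toy illustration of the pattern, with an invented, heavily trimmed payload following the Kubernetes node status schema (the JSONPath step is emulated with plain property access):

    // What a JSONPATH preprocessing step extracts from the per-node JSON
    // collected by "Node [{#NAME}]: Get data" (payload trimmed and invented).
    var nodeData = {
        status: {
            capacity:    {cpu: '4', memory: '16416356Ki', pods: '110'},
            allocatable: {cpu: '3800m', memory: '15916356Ki', pods: '110'},
            nodeInfo:    {kubeletVersion: 'v1.24.3', osImage: 'Ubuntu 22.04 LTS'}
        }
    };

    // Equivalent of JSONPath $.status.allocatable.pods for "Allocatable: Pods":
    var allocatablePods = nodeData.status.allocatable.pods; // -> '110'
    // Equivalent of $.status.nodeInfo.kubeletVersion for "Info: Kubelet version":
    var kubeletVersion = nodeData.status.nodeInfo.kubeletVersion; // -> 'v1.24.3'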
Name | Description | Expression | Severity | Dependencies and additional info | ||
---|---|---|---|---|---|---|
Kubernetes: Failed to get nodes | - |
length(last(/Kubernetes nodes by HTTP/kube.nodes.check))>0 |
WARNING | |||
Node [{#NAME}] Conditions: Pressure exists on the disk size | True - pressure exists on the disk size - that is, if the disk capacity is low; otherwise False. |
last(/Kubernetes nodes by HTTP/kube.node.conditions.diskpressure[{#NAME}])=1 |
WARNING | |||
Node [{#NAME}] Conditions: Pressure exists on the node memory | True - pressure exists on the node memory - that is, if the node memory is low; otherwise False |
last(/Kubernetes nodes by HTTP/kube.node.conditions.memorypressure[{#NAME}])=1 |
WARNING | |||
Node [{#NAME}] Conditions: Network is not correctly configured | True - the network for the node is not correctly configured, otherwise False |
last(/Kubernetes nodes by HTTP/kube.node.conditions.networkunavailable[{#NAME}])=1 |
WARNING | |||
Node [{#NAME}] Conditions: Pressure exists on the processes | True - pressure exists on the processes - that is, if there are too many processes on the node; otherwise False |
last(/Kubernetes nodes by HTTP/kube.node.conditions.pidpressure[{#NAME}])=1 |
WARNING | |||
Node [{#NAME}] Conditions: Is not in Ready state | False - if the node is not healthy and is not accepting pods. Unknown - if the node controller has not heard from the node in the last node-monitor-grace-period (default is 40 seconds). |
last(/Kubernetes nodes by HTTP/kube.node.conditions.ready[{#NAME}])<>1 |
WARNING | |||
Node [{#NAME}] Limits: Total CPU limits are too high | - |
last(/Kubernetes nodes by HTTP/kube.node.limits.cpu[{#NAME}]) / last(/Kubernetes nodes by HTTP/kube.node.allocatable.cpu[{#NAME}]) > 0.9 |
WARNING | Depends on: - Node [{#NAME}] Limits: Total CPU limits are too high |
||
Node [{#NAME}] Limits: Total CPU limits are too high | - |
last(/Kubernetes nodes by HTTP/kube.node.limits.cpu[{#NAME}]) / last(/Kubernetes nodes by HTTP/kube.node.allocatable.cpu[{#NAME}]) > 1 |
AVERAGE | |||
Node [{#NAME}] Limits: Total memory limits are too high | - |
last(/Kubernetes nodes by HTTP/kube.node.limits.memory[{#NAME}]) / last(/Kubernetes nodes by HTTP/kube.node.allocatable.memory[{#NAME}]) > 0.9 |
WARNING | Depends on: - Node [{#NAME}] Limits: Total memory limits are too high |
||
Node [{#NAME}] Limits: Total memory limits are too high | - |
last(/Kubernetes nodes by HTTP/kube.node.limits.memory[{#NAME}]) / last(/Kubernetes nodes by HTTP/kube.node.allocatable.memory[{#NAME}]) > 1 |
AVERAGE | |||
Node [{#NAME}] Requests: Total CPU requests are too high | - |
last(/Kubernetes nodes by HTTP/kube.node.requests.cpu[{#NAME}]) / last(/Kubernetes nodes by HTTP/kube.node.allocatable.cpu[{#NAME}]) > 0.5 |
WARNING | Depends on: - Node [{#NAME}] Requests: Total CPU requests are too high |
||
Node [{#NAME}] Requests: Total CPU requests are too high | - |
last(/Kubernetes nodes by HTTP/kube.node.requests.cpu[{#NAME}]) / last(/Kubernetes nodes by HTTP/kube.node.allocatable.cpu[{#NAME}]) > 0.8 |
AVERAGE | |||
Node [{#NAME}] Requests: Total memory requests are too high | - |
last(/Kubernetes nodes by HTTP/kube.node.requests.memory[{#NAME}]) / last(/Kubernetes nodes by HTTP/kube.node.allocatable.memory[{#NAME}]) > 0.5 |
WARNING | Depends on: - Node [{#NAME}] Requests: Total memory requests are too high |
||
Node [{#NAME}] Requests: Total memory requests are too high | - |
last(/Kubernetes nodes by HTTP/kube.node.requests.memory[{#NAME}]) / last(/Kubernetes nodes by HTTP/kube.node.allocatable.memory[{#NAME}]) > 0.8 |
AVERAGE | |||
Node [{#NAME}]: Has been restarted | Uptime is less than 10 minutes |
last(/Kubernetes nodes by HTTP/kube.node.uptime[{#NAME}])<10 |
INFO | |||
Node [{#NAME}] Used: Kubelet too many pods | Kubelet is running at capacity. |
last(/Kubernetes nodes by HTTP/kube.node.used.pods[{#NAME}])/ last(/Kubernetes nodes by HTTP/kube.node.capacity.pods[{#NAME}]) > 0.9 |
WARNING | |||
Node [{#NODE}] Pod [{#POD}]: Pod is crash looping | Pod restarts more than 2 times in the last 3 minutes. |
(last(/Kubernetes nodes by HTTP/kube.pod.containers.restartcount[{#POD}])-min(/Kubernetes nodes by HTTP/kube.pod.containers.restartcount[{#POD}],3m))>2 |
WARNING | |||
Node [{#NODE}] Pod [{#POD}] Status: Kubernetes Pod not healthy | Pod has been in a non-ready state for longer than 10 minutes. |
count(/Kubernetes nodes by HTTP/kube.pod.status.phase[{#POD}],10m,"regexp","^(1|4|5)$")>=9 | HIGH |
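The "Pod not healthy" trigger above counts the numeric phase values produced by the item's JavaScript preprocessing, which maps the phase name to its 1-based index in the phase list, so the regexp ^(1|4|5)$ matches Pending, Failed, and Unknown. The mapping from the item definition, restated for reference:

    // The "Status: Phase" preprocessing maps a phase name to a 1-based index,
    // so the trigger regexp "^(1|4|5)$" matches Pending, Failed and Unknown.
    var PHASES = ['Pending', 'Running', 'Succeeded', 'Failed', 'Unknown'];

    function phaseToNumber(value) {
        var idx = PHASES.indexOf(value) + 1; // 0 (not found) becomes falsy
        if (!idx) {
            throw 'Problem with status processing in JS';
        }
        return idx;
    }

    phaseToNumber('Running'); // -> 2: healthy, not matched by ^(1|4|5)$
    phaseToNumber('Failed');  // -> 4: unhealthy, matched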
Please report any issues with the template at https://support.zabbix.com.
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums.
For Zabbix version: 6.2 and higher
The template to monitor Kubernetes Kubelet by Zabbix that works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
Template Kubernetes Kubelet by HTTP
— collects metrics by HTTP agent from Kubelet /metrics endpoint.
Don't forget to change the macros {$KUBE.KUBELET.URL} and {$KUBE.API.TOKEN}. NOTE: some metrics may not be collected depending on your Kubernetes instance version and configuration.
This template was tested on:
See Zabbix template operation for basic instructions.
Internal service metrics are collected from the /metrics endpoint. The template requires authorization via an API token.
Don't forget to change the macros {$KUBE.KUBELET.URL} and {$KUBE.API.TOKEN}. NOTE: some metrics may not be collected depending on your Kubernetes instance version and configuration.
No specific Zabbix configuration is required.
Name | Description | Default |
---|---|---|
{$KUBE.API.TOKEN} | Service account bearer token |
`` |
{$KUBE.KUBELET.CADVISOR.ENDPOINT} | cAdvisor metrics from Kubelet /metrics/cadvisor endpoint |
/metrics/cadvisor |
{$KUBE.KUBELET.METRIC.ENDPOINT} | Kubelet /metrics endpoint |
/metrics |
{$KUBE.KUBELET.PODS.ENDPOINT} | Kubelet /pods endpoint |
/pods |
{$KUBE.KUBELET.URL} | Instance URL |
https://localhost:10250 |
There are no template links in this template.
Name | Description | Type | Key and additional info |
---|---|---|---|
Container memory discovery | DEPENDENT | kube.kubelet.container.memory.cache.discovery Preprocessing: - PROMETHEUSTOJSON - JAVASCRIPT: - DISCARDUNCHANGEDHEARTBEAT: |
|
Pods discovery | DEPENDENT | kube.kubelet.pods.discovery Preprocessing: - JAVASCRIPT: - DISCARDUNCHANGEDHEARTBEAT: |
|
REST client requests discovery | DEPENDENT | kube.kubelet.rest.requests.discovery Preprocessing: - PROMETHEUSTOJSON - JAVASCRIPT: - DISCARDUNCHANGEDHEARTBEAT: |
|
Runtime operations discovery | DEPENDENT | kube.kubelet.runtimeoperationsbucket.discovery Preprocessing: - PROMETHEUSTOJSON: - JAVASCRIPT: - DISCARDUNCHANGEDHEARTBEAT: Overrides: bucket item total item |
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
Kubernetes | Kubernetes: Get kubelet metrics | Collecting raw Kubelet metrics from /metrics endpoint. |
HTTP_AGENT | kube.kubelet.metrics |
Kubernetes | Kubernetes: Get cadvisor metrics | Collecting raw Kubelet metrics from /metrics/cadvisor endpoint. |
HTTP_AGENT | kube.cadvisor.metrics |
Kubernetes | Kubernetes: Get pods | Collecting raw Kubelet metrics from /pods endpoint. |
HTTP_AGENT | kube.pods |
Kubernetes | Kubernetes: Pods running | The number of running pods. |
DEPENDENT | kube.kubelet.pods.running Preprocessing: - JSONPATH: |
Kubernetes | Kubernetes: Containers running | The number of running containers. |
DEPENDENT | kube.kubelet.containers.running Preprocessing: - JSONPATH: |
Kubernetes | Kubernetes: Containers last state terminated | The number of containers that were previously terminated. |
DEPENDENT | kube.kublet.containers.terminated Preprocessing: - JSONPATH: |
Kubernetes | Kubernetes: Containers restarts | The number of times the container has been restarted. |
DEPENDENT | kube.kubelet.containers.restarts Preprocessing: - JSONPATH: |
Kubernetes | Kubernetes: CPU cores, total | The number of cores in this machine (available until kubernetes v1.18). |
DEPENDENT | kube.kubelet.cpu.cores Preprocessing: - PROMETHEUS_PATTERN: |
Kubernetes | Kubernetes: Machine memory, bytes | Resident memory size in bytes. |
DEPENDENT | kube.kubelet.machine.memory Preprocessing: - PROMETHEUS_PATTERN: |
Kubernetes | Kubernetes: Virtual memory, bytes | Virtual memory size in bytes. |
DEPENDENT | kube.kubelet.virtual.memory Preprocessing: - PROMETHEUS_PATTERN: |
Kubernetes | Kubernetes: File descriptors, max | Maximum number of open file descriptors. |
DEPENDENT | kube.kubelet.processmaxfds Preprocessing: - PROMETHEUS_PATTERN: |
Kubernetes | Kubernetes: File descriptors, open | Number of open file descriptors. |
DEPENDENT | kube.kubelet.processopenfds Preprocessing: - PROMETHEUS_PATTERN: |
Kubernetes | Kubernetes: [{#OP_TYPE}] Runtime operations bucket: {#LE} | Duration in seconds of runtime operations. Broken down by operation type. |
DEPENDENT | kube.kublet.runtimeopsdurationsecondsbucket[{#LE},"{#OPTYPE}"] Preprocessing: - PROMETHEUS PATTERN:kubelet_runtime_operations_duration_seconds_bucket{le="{#LE}",operation_type="{#OP_TYPE}"} : function : sum |
Kubernetes | Kubernetes: [{#OP_TYPE}] Runtime operations total, rate | Cumulative number of runtime operations by operation type. |
DEPENDENT | kube.kublet.runtimeopstotal.rate["{#OPTYPE}"] Preprocessing: - PROMETHEUS PATTERN:kubelet_runtime_operations_total{operation_type="{#OP_TYPE}"} ⛔️ONFAIL: - CHANGEPER_SECOND |
Kubernetes | Kubernetes: [{#OP_TYPE}] Operations, p90 | 90 percentile of operation latency distribution in seconds for each verb. |
CALCULATED | kube.kublet.runtimeopsdurationsecondsp90["{#OP_TYPE}"] Expression: bucket_percentile(//kube.kublet.runtime_ops_duration_seconds_bucket[*,"{#OP_TYPE}"],5m,90) |
Kubernetes | Kubernetes: [{#OP_TYPE}] Operations, p95 | 95 percentile of operation latency distribution in seconds for each verb. |
CALCULATED | kube.kublet.runtimeopsdurationsecondsp95["{#OP_TYPE}"] Expression: bucket_percentile(//kube.kublet.runtime_ops_duration_seconds_bucket[*,"{#OP_TYPE}"],5m,95) |
Kubernetes | Kubernetes: [{#OP_TYPE}] Operations, p99 | 99 percentile of operation latency distribution in seconds for each verb. |
CALCULATED | kube.kublet.runtimeopsdurationsecondsp99["{#OP_TYPE}"] Expression: bucket_percentile(//kube.kublet.runtime_ops_duration_seconds_bucket[*,"{#OP_TYPE}"],5m,99) |
Kubernetes | Kubernetes: [{#OP_TYPE}] Operations, p50 | 50 percentile of operation latency distribution in seconds for each verb. |
CALCULATED | kube.kublet.runtimeopsdurationsecondsp50["{#OP_TYPE}"] Expression: bucket_percentile(//kube.kublet.runtime_ops_duration_seconds_bucket[*,"{#OP_TYPE}"],5m,50) |
Kubernetes | Kubernetes: Namespace [{#NAMESPACE}] Pod [{#NAME}] CPU: Load average, 10s | Pod's CPU load average over the last 10 seconds. |
DEPENDENT | kube.pod.containercpuloadaverage10s[{#NAMESPACE}/{#NAME}] Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: |
Kubernetes | Kubernetes: Namespace [{#NAMESPACE}] Pod [{#NAME}] CPU: System seconds, total | The number of cores used for system time. |
DEPENDENT | kube.pod.containercpusystemsecondstotal[{#NAMESPACE}/{#NAME}] Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: |
Kubernetes | Kubernetes: Namespace [{#NAMESPACE}] Pod [{#NAME}] CPU: User seconds, total | The number of cores used for user time. |
DEPENDENT | kube.pod.containercpuusersecondstotal[{#NAMESPACE}/{#NAME}] Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: |
Kubernetes | Kubernetes: Host [{#HOST}] Request method [{#METHOD}] Code:[{#CODE}] | Number of HTTP requests, partitioned by status code, method, and host. |
DEPENDENT | kube.kubelet.rest.requests["{#CODE}", "{#HOST}", "{#METHOD}"] Preprocessing: - PROMETHEUSPATTERN: - DISCARDUNCHANGED_HEARTBEAT: |
Kubernetes | Kubernetes: Namespace [{#NAMESPACE}] Pod [{#POD}] Container [{#CONTAINER}]: Memory page cache | Number of bytes of page cache memory. |
DEPENDENT | kube.kubelet.container.memory.cache["{#CONTAINER}", "{#NAMESPACE}", "{#POD}"] Preprocessing: - PROMETHEUSPATTERN: - DISCARDUNCHANGED_HEARTBEAT: |
Kubernetes | Kubernetes: Namespace [{#NAMESPACE}] Pod [{#POD}] Container [{#CONTAINER}]: Memory max usage | Maximum memory usage recorded in bytes. |
DEPENDENT | kube.kubelet.container.memory.maxusage["{#CONTAINER}", "{#NAMESPACE}", "{#POD}"] Preprocessing: - PROMETHEUS PATTERN:container_memory_max_usage_bytes{container="{#CONTAINER}", namespace="{#NAMESPACE}", pod="{#POD}"} - DISCARDUNCHANGEDHEARTBEAT: |
Kubernetes | Kubernetes: Namespace [{#NAMESPACE}] Pod [{#POD}] Container [{#CONTAINER}]: RSS | Size of RSS in bytes. |
DEPENDENT | kube.kubelet.container.memory.rss["{#CONTAINER}", "{#NAMESPACE}", "{#POD}"] Preprocessing: - PROMETHEUSPATTERN: - DISCARDUNCHANGED_HEARTBEAT: |
Kubernetes | Kubernetes: Namespace [{#NAMESPACE}] Pod [{#POD}] Container [{#CONTAINER}]: Swap | Container swap usage in bytes. |
DEPENDENT | kube.kubelet.container.memory.swap["{#CONTAINER}", "{#NAMESPACE}", "{#POD}"] Preprocessing: - PROMETHEUSPATTERN: - DISCARDUNCHANGED_HEARTBEAT: |
Kubernetes | Kubernetes: Namespace [{#NAMESPACE}] Pod [{#POD}] Container [{#CONTAINER}]: Usage | Current memory usage in bytes, including all memory regardless of when it was accessed. |
DEPENDENT | kube.kubelet.container.memory.usage["{#CONTAINER}", "{#NAMESPACE}", "{#POD}"] Preprocessing: - PROMETHEUSPATTERN: - DISCARDUNCHANGED_HEARTBEAT: |
Kubernetes | Kubernetes: Namespace [{#NAMESPACE}] Pod [{#POD}] Container [{#CONTAINER}]: Working set | Current working set in bytes. |
DEPENDENT | kube.kubelet.container.memory.workingset["{#CONTAINER}", "{#NAMESPACE}", "{#POD}"] Preprocessing: - PROMETHEUS PATTERN:container_memory_working_set_bytes{container="{#CONTAINER}", namespace="{#NAMESPACE}", pod="{#POD}"} - DISCARDUNCHANGEDHEARTBEAT: |
Name | Description | Expression | Severity | Dependencies and additional info |
---|
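Several of the rate items in this template (for example, the per-operation-type "Runtime operations total, rate") derive their value by applying the CHANGE_PER_SECOND step to a monotonically growing counter. A minimal sketch of that calculation over two consecutive samples (Zabbix computes this internally, and its handling of counter resets may differ):

    // CHANGE_PER_SECOND over a monotonic counter: the delta of the value
    // divided by the delta of the collection timestamps, in seconds.
    function changePerSecond(prev, curr) {
        var dt = curr.clock - prev.clock;
        if (dt <= 0 || curr.value < prev.value) {
            return null; // counter reset or bad timestamps: skip this cycle
        }
        return (curr.value - prev.value) / dt;
    }

    // kubelet_runtime_operations_total sampled 60 seconds apart:
    changePerSecond({clock: 1700000000, value: 15000},
                    {clock: 1700000060, value: 15360}); // -> 6 operations/s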
Please report any issues with the template at https://support.zabbix.com.
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums.
For Zabbix version: 6.2 and higher
The template to monitor Kubernetes Controller manager by Zabbix that works without any external scripts.
Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
Template Kubernetes Controller manager by HTTP
— collects metrics by HTTP agent from Controller manager /metrics endpoint.
This template was tested on:
See Zabbix template operation for basic instructions.
Internal service metrics are collected from the /metrics endpoint. The template requires authorization via an API token.
Don't forget to change the macros {$KUBE.CONTROLLER.SERVER.URL} and {$KUBE.API.TOKEN}. Also, see the Macros section for a list of macros used to set trigger values. NOTE: some metrics may not be collected depending on your Kubernetes Controller manager instance version and configuration.
No specific Zabbix configuration is required.
Name | Description | Default |
---|---|---|
{$KUBE.API.TOKEN} | API Authorization Token |
`` |
{$KUBE.CONTROLLER.HTTP.CLIENT.ERROR} | Maximum number of HTTP client requests failures used for trigger |
2 |
{$KUBE.CONTROLLER.SERVER.URL} | Instance URL |
http://localhost:10252/metrics |
There are no template links in this template.
Name | Description | Type | Key and additional info |
---|---|---|---|
Workqueue metrics discovery | DEPENDENT | kubernetes.controller.workqueue.discovery Preprocessing: - PROMETHEUSTOJSON: - JAVASCRIPT: - DISCARDUNCHANGEDHEARTBEAT: Overrides: bucket item total item |
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
Kubernetes Controller | Kubernetes Controller Manager: Leader election status | Gauge indicating whether the reporting system is the master of the relevant lease: 0 indicates backup, 1 indicates master. |
DEPENDENT | kubernetes.controller.leaderelectionmasterstatus Preprocessing: - PROMETHEUS PATTERN:leader_election_master_status ⛔️ON_FAIL: |
Kubernetes Controller | Kubernetes Controller Manager: Virtual memory, bytes | Virtual memory size in bytes. |
DEPENDENT | kubernetes.controller.processvirtualmemorybytes Preprocessing: - PROMETHEUS PATTERN:process_virtual_memory_bytes ⛔️ON_FAIL: |
Kubernetes Controller | Kubernetes Controller Manager: Resident memory, bytes | Resident memory size in bytes. |
DEPENDENT | kubernetes.controller.processresidentmemorybytes Preprocessing: - PROMETHEUS PATTERN:process_resident_memory_bytes ⛔️ON_FAIL: |
Kubernetes Controller | Kubernetes Controller Manager: CPU | Total user and system CPU usage ratio. |
DEPENDENT | kubernetes.controller.cpu.util Preprocessing: - PROMETHEUSPATTERN: - CHANGEPER_SECOND - MULTIPLIER: |
Kubernetes Controller | Kubernetes Controller Manager: Goroutines | Number of goroutines that currently exist. |
DEPENDENT | kubernetes.controller.gogoroutines Preprocessing: - PROMETHEUS PATTERN:go_goroutines : function : sum ⛔️ON_FAIL: |
Kubernetes Controller | Kubernetes Controller Manager: Go threads | Number of OS threads created. |
DEPENDENT | kubernetes.controller.gothreads Preprocessing: - PROMETHEUS PATTERN:go_threads ⛔️ON_FAIL: |
Kubernetes Controller | Kubernetes Controller Manager: Fds open | Number of open file descriptors. |
DEPENDENT | kubernetes.controller.openfds Preprocessing: - PROMETHEUS PATTERN:process_open_fds ⛔️ON_FAIL: |
Kubernetes Controller | Kubernetes Controller Manager: Fds max | Maximum allowed open file descriptors. |
DEPENDENT | kubernetes.controller.maxfds Preprocessing: - PROMETHEUS PATTERN:process_max_fds ⛔️ON_FAIL: |
Kubernetes Controller | Kubernetes Controller Manager: REST Client requests: 2xx, rate | Number of HTTP requests with 2xx status code per second. |
DEPENDENT | kubernetes.controller.clienthttprequests200.rate Preprocessing: - PROMETHEUS PATTERN:rest_client_requests_total{code =~ "2.."} : function : sum ⛔️ONFAIL: - CHANGEPER_SECOND |
Kubernetes Controller | Kubernetes Controller Manager: REST Client requests: 3xx, rate | Number of HTTP requests with 3xx status code per second. |
DEPENDENT | kubernetes.controller.clienthttprequests300.rate Preprocessing: - PROMETHEUS PATTERN:rest_client_requests_total{code =~ "3.."} : function : sum ⛔️ONFAIL: - CHANGEPER_SECOND |
Kubernetes Controller | Kubernetes Controller Manager: REST Client requests: 4xx, rate | Number of HTTP requests with 4xx status code per second. |
DEPENDENT | kubernetes.controller.clienthttprequests400.rate Preprocessing: - PROMETHEUS PATTERN:rest_client_requests_total{code =~ "4.."} : function : sum ⛔️ONFAIL: - CHANGEPER_SECOND |
Kubernetes Controller | Kubernetes Controller Manager: REST Client requests: 5xx, rate | Number of HTTP requests with 5xx status code per second. |
DEPENDENT | kubernetes.controller.clienthttprequests500.rate Preprocessing: - PROMETHEUS PATTERN:rest_client_requests_total{code =~ "5.."} : function : sum ⛔️ONFAIL: - CHANGEPER_SECOND |
Kubernetes Controller | Kubernetes Controller Manager: ["{#NAME}"]: Workqueue adds total, rate | Total number of adds handled by workqueue per second. |
DEPENDENT | kubernetes.controller.workqueueaddstotal["{#NAME}"] Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: - CHANGEPERSECOND |
Kubernetes Controller | Kubernetes Controller Manager: ["{#NAME}"]: Workqueue depth | Current depth of workqueue. |
DEPENDENT | kubernetes.controller.workqueuedepth["{#NAME}"] Preprocessing: - PROMETHEUS PATTERN:workqueue_depth{name = "{#NAME}"} ⛔️ON_FAIL: |
Kubernetes Controller | Kubernetes Controller Manager: ["{#NAME}"]: Workqueue unfinished work, sec | How many seconds of work has done that is in progress and hasn't been observed by work_duration. Large values indicate stuck threads. One can deduce the number of stuck threads by observing the rate at which this increases. |
DEPENDENT | kubernetes.controller.workqueueunfinishedworkseconds["{#NAME}"] Preprocessing: - PROMETHEUS PATTERN:workqueue_unfinished_work_seconds{name = "{#NAME}"} ⛔️ON_FAIL: |
Kubernetes Controller | Kubernetes Controller Manager: ["{#NAME}"]: Workqueue retries, rate | Total number of retries handled by workqueue per second. |
DEPENDENT | kubernetes.controller.workqueueretriestotal["{#NAME}"] Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: - CHANGEPERSECOND |
Kubernetes Controller | Kubernetes Controller Manager: ["{#NAME}"]: Workqueue longest running processor, sec | How many seconds has the longest running processor for workqueue been running. |
DEPENDENT | kubernetes.controller.workqueuelongestrunningprocessorseconds["{#NAME}"] Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: |
Kubernetes Controller | Kubernetes Controller Manager: ["{#NAME}"]: Workqueue work duration, p90 | 90 percentile of how long in seconds processing an item from workqueue takes, by queue. |
CALCULATED | kubernetes.controller.workqueueworkdurationsecondsp90["{#NAME}"] Expression: bucket_percentile(//kubernetes.controller.duration_seconds_bucket[*,"{#NAME}"],5m,90) |
Kubernetes Controller | Kubernetes Controller Manager: ["{#NAME}"]: Workqueue work duration, p95 | 95 percentile of how long in seconds processing an item from workqueue takes, by queue. |
CALCULATED | kubernetes.controller.workqueueworkdurationsecondsp95["{#NAME}"] Expression: bucket_percentile(//kubernetes.controller.duration_seconds_bucket[*,"{#NAME}"],5m,95) |
Kubernetes Controller | Kubernetes Controller Manager: ["{#NAME}"]: Workqueue work duration, p99 | 99 percentile of how long in seconds processing an item from workqueue takes, by queue. |
CALCULATED | kubernetes.controller.workqueueworkdurationsecondsp99["{#NAME}"] Expression: bucket_percentile(//kubernetes.controller.duration_seconds_bucket[*,"{#NAME}"],5m,99) |
Kubernetes Controller | Kubernetes Controller Manager: ["{#NAME}"]: Workqueue work duration, 50p | 50 percentiles of how long in seconds processing an item from workqueue takes, by queue. |
CALCULATED | kubernetes.controller.workqueueworkdurationsecondsp50["{#NAME}"] Expression: bucket_percentile(//kubernetes.controller.duration_seconds_bucket[*,"{#NAME}"],5m,50) |
Kubernetes Controller | Kubernetes Controller Manager: ["{#NAME}"]: Workqueue queue duration, p90 | 90 percentile of how long in seconds an item stays in workqueue before being requested, by queue. |
CALCULATED | kubernetes.controller.workqueuequeuedurationsecondsp90["{#NAME}"] Expression: bucket_percentile(//kubernetes.controller.queue_duration_seconds_bucket[*,"{#NAME}"],5m,90) |
Kubernetes Controller | Kubernetes Controller Manager: ["{#NAME}"]: Workqueue queue duration, p95 | 95 percentile of how long in seconds an item stays in workqueue before being requested, by queue. |
CALCULATED | kubernetes.controller.workqueuequeuedurationsecondsp95["{#NAME}"] Expression: bucket_percentile(//kubernetes.controller.queue_duration_seconds_bucket[*,"{#NAME}"],5m,95) |
Kubernetes Controller | Kubernetes Controller Manager: ["{#NAME}"]: Workqueue queue duration, p99 | 99 percentile of how long in seconds an item stays in workqueue before being requested, by queue. |
CALCULATED | kubernetes.controller.workqueuequeuedurationsecondsp99["{#NAME}"] Expression: bucket_percentile(//kubernetes.controller.queue_duration_seconds_bucket[*,"{#NAME}"],5m,99) |
Kubernetes Controller | Kubernetes Controller Manager: ["{#NAME}"]: Workqueue queue duration, 50p | 50 percentile of how long in seconds an item stays in workqueue before being requested. If there are no requests for 5 minute, item value will be discarded. |
CALCULATED | kubernetes.controller.workqueuequeuedurationsecondsp50["{#NAME}"] Preprocessing: - CHECKNOTSUPPORTED ⛔️ON_FAIL: Expression: bucket_percentile(//kubernetes.controller.queue_duration_seconds_bucket[*,"{#NAME}"],5m,50) |
Kubernetes Controller | Kubernetes Controller Manager: ["{#NAME}"]: Workqueue duration seconds bucket, {#LE} | How long in seconds processing an item from workqueue takes. |
DEPENDENT | kubernetes.controller.durationsecondsbucket[{#LE},"{#NAME}"] Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: |
Kubernetes Controller | Kubernetes Controller Manager: ["{#NAME}"]: Queue duration seconds bucket, {#LE} | How long in seconds an item stays in workqueue before being requested. |
DEPENDENT | kubernetes.controller.queuedurationsecondsbucket[{#LE},"{#NAME}"] Preprocessing: - PROMETHEUS PATTERN:workqueue_queue_duration_seconds_bucket{name = "{#NAME}",le = "{#LE}"} ⛔️ON_FAIL: |
Zabbix raw items | Kubernetes Controller: Get Controller metrics | Get raw metrics from Controller instance /metrics endpoint. |
HTTP_AGENT | kubernetes.controller.getmetrics Preprocessing: - CHECK NOTSUPPORTED⛔️ON FAIL:DISCARD_VALUE -> |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Kubernetes Controller Manager: Too many HTTP client errors | Kubernetes Controller Manager is experiencing a high error rate (with 5xx HTTP code). |
min(/Kubernetes Controller manager by HTTP/kubernetes.controller.client_http_requests_500.rate,5m)>{$KUBE.CONTROLLER.HTTP.CLIENT.ERROR} |
WARNING |
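The workqueue duration percentile items above are calculated from the raw `*_bucket` items with `bucket_percentile()`. A rough sketch of the underlying math (linear interpolation across cumulative histogram buckets, in the spirit of Prometheus histogram quantiles; the sample buckets are invented):

```python
def bucket_percentile(buckets, q):
    """buckets: [(le_upper_bound, cumulative_count), ...] sorted by bound; q in 0..100."""
    total = buckets[-1][1]
    rank = q / 100.0 * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if bound == float("inf") or count == prev_count:
                return prev_bound  # fall back to the highest finite boundary
            # Linear interpolation inside the bucket that contains the rank.
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / (count - prev_count)
        prev_bound, prev_count = bound, count
    return prev_bound

# Invented cumulative counts for le = 0.1, 0.5, 1.0, +Inf:
print(bucket_percentile([(0.1, 80), (0.5, 95), (1.0, 99), (float("inf"), 100)], 90))
# -> ~0.367: p90 falls inside the (0.1, 0.5] bucket
```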
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template or ask for help with it at ZABBIX forums.
For Zabbix version: 6.2 and higher
The template to monitor Kubernetes API server by Zabbix that works without any external scripts.
Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
Template Kubernetes API server by HTTP
— collects metrics by HTTP agent from API server /metrics endpoint.
This template was tested on:
See Zabbix template operation for basic instructions.
Internal service metrics are collected from the /metrics endpoint. The template requires authorization via an API token.
Don't forget to change the macros {$KUBE.API.SERVER.URL} and {$KUBE.API.TOKEN}. Also, see the Macros section for a list of macros used to set trigger values. NOTE: Some metrics may not be collected depending on your Kubernetes API server version and configuration.
No specific Zabbix configuration is required.
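Many of the rate items below aggregate several Prometheus series, e.g. summing rest_client_requests_total over every series whose code label matches "5..". A rough Python equivalent of that Prometheus-pattern-plus-sum preprocessing step (the exposition lines are invented samples):

```python
import re

TEXT = """\
rest_client_requests_total{code="200",method="GET"} 1024
rest_client_requests_total{code="201",method="POST"} 42
rest_client_requests_total{code="500",method="GET"} 3
rest_client_requests_total{code="503",method="PUT"} 1
"""

def sum_by_code(text, name, code_re):
    """Sum all samples of `name` whose code label matches the regex `code_re`."""
    pattern = re.compile(rf'^{name}\{{[^}}]*code="{code_re}"[^}}]*\}} (\S+)', re.M)
    return sum(float(m.group(1)) for m in pattern.finditer(text))

print(sum_by_code(TEXT, "rest_client_requests_total", "5.."))  # -> 4.0
```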
Name | Description | Default |
---|---|---|
{$KUBE.API.CERT.EXPIRATION} | Number of days until client certificate expiration used for the trigger |
7 |
{$KUBE.API.HTTP.CLIENT.ERROR} | Maximum number of HTTP client request failures used for the trigger |
2 |
{$KUBE.API.HTTP.SERVER.ERROR} | Maximum number of HTTP server request failures used for the trigger |
2 |
{$KUBE.API.SERVER.URL} | Instance URL |
https://localhost:6443/metrics |
{$KUBE.API.TOKEN} | API Authorization Token |
`` |
There are no template links in this template.
Name | Description | Type | Key and additional info |
---|---|---|---|
Authentication attempts discovery | Discovery of authentication attempts by result. |
DEPENDENT | kubernetes.api.authenticationattempts.discovery Preprocessing: - PROMETHEUS TOJSON:authentication_attempts{result =~ ".*"} - JAVASCRIPT: - DISCARD UNCHANGED_HEARTBEAT:3h |
Authentication requests discovery | Discovery of authenticated user requests by name. |
DEPENDENT | kubernetes.api.authenticateduserrequests.discovery Preprocessing: - PROMETHEUSTOJSON: - JAVASCRIPT: - DISCARDUNCHANGEDHEARTBEAT: |
Client certificate expiration histogram | Discovery of raw data of client certificate expiration. |
DEPENDENT | kubernetes.api.certificateexpiration.discovery Preprocessing: - PROMETHEUS TOJSON:{__name__=~ "apiserver_client_certificate_expiration_seconds_*"} - JAVASCRIPT: - DISCARD UNCHANGEDHEARTBEAT:3h Overrides: bucket item buckets - ITEMPROTOTYPE LIKE bucket - DISCOVERtotal item totals - ITEMPROTOTYPE NOTLIKE bucket - DISCOVER |
Etcd objects metrics discovery | Discovery of etcd objects by resource. |
DEPENDENT | kubernetes.api.etcdobjectcounts.discovery Preprocessing: - PROMETHEUSTOJSON: - JAVASCRIPT: - DISCARDUNCHANGEDHEARTBEAT: |
gRPC completed requests discovery | Discovery of gRPC completed requests by gRPC code. |
DEPENDENT | kubernetes.api.grpcclienthandled.discovery Preprocessing: - PROMETHEUSTOJSON: - JAVASCRIPT: - DISCARDUNCHANGEDHEARTBEAT: |
Long-running requests | Discovery of long-running requests by verb, resource and scope. |
DEPENDENT | kubernetes.api.longrunninggauge.discovery Preprocessing: - PROMETHEUS TOJSON:apiserver_longrunning_gauge{resource =~ ".*", scope =~ ".*", verb =~ ".*"} - JAVASCRIPT: - DISCARD UNCHANGED_HEARTBEAT:3h |
Request duration histogram | Discovery of raw data and percentile items of request duration. |
DEPENDENT | kubernetes.api.requestsbucket.discovery Preprocessing: - PROMETHEUS TOJSON:{__name__=~ "apiserver_request_duration_*", verb =~ ".*"} - JAVASCRIPT: - DISCARD UNCHANGEDHEARTBEAT:3h Overrides: bucket item buckets - ITEMPROTOTYPE LIKE bucket - DISCOVERtotal item totals - ITEMPROTOTYPE NOTLIKE bucket - DISCOVER |
Requests inflight discovery | Discovery of inflight requests by kind. |
DEPENDENT | kubernetes.api.inflightrequests.discovery Preprocessing: - PROMETHEUS TOJSON:apiserver_current_inflight_requests{request_kind =~ ".*"} - JAVASCRIPT: - DISCARD UNCHANGED_HEARTBEAT:3h |
Watchers metrics discovery | Discovery of watchers by kind. |
DEPENDENT | kubernetes.api.apiserverregisteredwatchers.discovery Preprocessing: - PROMETHEUSTOJSON: - JAVASCRIPT: - DISCARDUNCHANGEDHEARTBEAT: |
Workqueue metrics discovery | Discovery of workqueue metrics by name. |
DEPENDENT | kubernetes.api.workqueue.discovery Preprocessing: - PROMETHEUSTOJSON: - JAVASCRIPT: - DISCARDUNCHANGEDHEARTBEAT: |
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
Kubernetes API | Kubernetes API: Audit events, total | Accumulated number of audit events generated and sent to the audit backend. |
DEPENDENT | kubernetes.api.auditeventtotal Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: |
Kubernetes API | Kubernetes API: Virtual memory, bytes | Virtual memory size in bytes. |
DEPENDENT | kubernetes.api.processvirtualmemorybytes Preprocessing: - PROMETHEUS PATTERN:process_virtual_memory_bytes ⛔️ON_FAIL: |
Kubernetes API | Kubernetes API: Resident memory, bytes | Resident memory size in bytes. |
DEPENDENT | kubernetes.api.processresidentmemorybytes Preprocessing: - PROMETHEUS PATTERN:process_resident_memory_bytes ⛔️ON_FAIL: |
Kubernetes API | Kubernetes API: CPU | Total user and system CPU usage ratio. |
DEPENDENT | kubernetes.api.cpu.util Preprocessing: - PROMETHEUSPATTERN: - CHANGEPER_SECOND - MULTIPLIER: |
Kubernetes API | Kubernetes API: Goroutines | Number of goroutines that currently exist. |
DEPENDENT | kubernetes.api.gogoroutines Preprocessing: - PROMETHEUS PATTERN:go_goroutines : function : sum ⛔️ON_FAIL: |
Kubernetes API | Kubernetes API: Go threads | Number of OS threads created. |
DEPENDENT | kubernetes.api.gothreads Preprocessing: - PROMETHEUS PATTERN:go_threads ⛔️ON_FAIL: |
Kubernetes API | Kubernetes API: Fds open | Number of open file descriptors. |
DEPENDENT | kubernetes.api.openfds Preprocessing: - PROMETHEUS PATTERN:process_open_fds ⛔️ON_FAIL: |
Kubernetes API | Kubernetes API: Fds max | Maximum allowed open file descriptors. |
DEPENDENT | kubernetes.api.maxfds Preprocessing: - PROMETHEUS PATTERN:process_max_fds ⛔️ON_FAIL: |
Kubernetes API | Kubernetes API: gRPCs client started, rate | Total number of RPCs started per second. |
DEPENDENT | kubernetes.api.grpcclientstarted.rate Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: - CHANGEPERSECOND |
Kubernetes API | Kubernetes API: gRPCs messages received, rate | Total number of gRPC stream messages received per second. |
DEPENDENT | kubernetes.api.grpcclientmsgreceived.rate Preprocessing: - PROMETHEUS PATTERN:grpc_client_msg_received_total : function : sum ⛔️ONFAIL: - CHANGEPER_SECOND |
Kubernetes API | Kubernetes API: gRPCs messages sent, rate | Total number of gRPC stream messages sent per second. |
DEPENDENT | kubernetes.api.grpcclientmsgsent.rate Preprocessing: - PROMETHEUS PATTERN:grpc_client_msg_sent_total : function : sum ⛔️ONFAIL: - CHANGEPER_SECOND |
Kubernetes API | Kubernetes API: Request terminations, rate | Number of requests which apiserver terminated in self-defense per second. |
DEPENDENT | kubernetes.api.apiserverrequestterminations Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: - CHANGEPERSECOND |
Kubernetes API | Kubernetes API: TLS handshake errors, rate | Number of requests dropped with 'TLS handshake error from' error per second. |
DEPENDENT | kubernetes.api.apiservertlshandshakeerrorstotal.rate Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: |
Kubernetes API | Kubernetes API: API server requests: 5xx, rate | Counter of apiserver requests broken out for each HTTP response code. |
DEPENDENT | kubernetes.api.apiserverrequesttotal500.rate Preprocessing: - PROMETHEUS PATTERN:apiserver_request_total{code =~ "5.."} : function : sum ⛔️ONFAIL: - CHANGEPER_SECOND |
Kubernetes API | Kubernetes API: API server requests: 4xx, rate | Counter of apiserver requests broken out for each HTTP response code. |
DEPENDENT | kubernetes.api.apiserverrequesttotal400.rate Preprocessing: - PROMETHEUS PATTERN:apiserver_request_total{code =~ "4.."} : function : sum ⛔️ONFAIL: - CHANGEPER_SECOND |
Kubernetes API | Kubernetes API: API server requests: 3xx, rate | Counter of apiserver requests broken out for each HTTP response code. |
DEPENDENT | kubernetes.api.apiserverrequesttotal300.rate Preprocessing: - PROMETHEUS PATTERN:apiserver_request_total{code =~ "3.."} : function : sum ⛔️ONFAIL: - CHANGEPER_SECOND |
Kubernetes API | Kubernetes API: API server requests: 0, rate | Counter of apiserver requests broken out for each HTTP response code. |
DEPENDENT | kubernetes.api.apiserverrequesttotal0.rate Preprocessing: - PROMETHEUS PATTERN:apiserver_request_total{code = "0"} : function : sum ⛔️ONFAIL: - CHANGEPER_SECOND |
Kubernetes API | Kubernetes API: API server requests: 2xx, rate | Counter of apiserver requests broken out for each HTTP response code. |
DEPENDENT | kubernetes.api.apiserverrequesttotal200.rate Preprocessing: - PROMETHEUS PATTERN:apiserver_request_total{code =~ "2.."} : function : sum ⛔️ONFAIL: - CHANGEPER_SECOND |
Kubernetes API | Kubernetes API: HTTP requests: 5xx, rate | Number of HTTP requests with 5xx status code per second. |
DEPENDENT | kubernetes.api.restclientrequeststotal500.rate Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: - CHANGEPERSECOND |
Kubernetes API | Kubernetes API: HTTP requests: 4xx, rate | Number of HTTP requests with 4xx status code per second. |
DEPENDENT | kubernetes.api.restclientrequeststotal400.rate Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: - CHANGEPERSECOND |
Kubernetes API | Kubernetes API: HTTP requests: 3xx, rate | Number of HTTP requests with 3xx status code per second. |
DEPENDENT | kubernetes.api.restclientrequeststotal300.rate Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: - CHANGEPERSECOND |
Kubernetes API | Kubernetes API: HTTP requests: 2xx, rate | Number of HTTP requests with 2xx status code per second. |
DEPENDENT | kubernetes.api.restclientrequeststotal200.rate Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: - CHANGEPERSECOND |
Kubernetes API | Kubernetes API: Long-running ["{#VERB}"] requests ["{#RESOURCE}"]: {#SCOPE} | Gauge of all active long-running apiserver requests broken out by verb, resource and scope. Not all requests are tracked this way. |
DEPENDENT | kubernetes.api.longrunninggauge["{#RESOURCE}","{#SCOPE}","{#VERB}"] Preprocessing: - PROMETHEUS PATTERN:apiserver_longrunning_gauge{resource = "{#RESOURCE}", scope = "{#SCOPE}", verb = "{#VERB}"} : function : sum ⛔️ON_FAIL: |
Kubernetes API | Kubernetes API: ["{#VERB}"] Requests bucket: {#LE} | Response latency distribution in seconds for each verb. |
DEPENDENT | kubernetes.api.requestdurationsecondsbucket[{#LE},"{#VERB}"] Preprocessing: - PROMETHEUS PATTERN:apiserver_request_duration_seconds_bucket{le="{#LE}",verb="{#VERB}"} : function : sum |
Kubernetes API | Kubernetes API: ["{#VERB}"] Requests, p90 | 90 percentile of response latency distribution in seconds for each verb. |
CALCULATED | kubernetes.api.requestdurationseconds_p90["{#VERB}"] Expression: bucket_percentile(//kubernetes.api.request_duration_seconds_bucket[*,"{#VERB}"],5m,90) |
Kubernetes API | Kubernetes API: ["{#VERB}"] Requests, p95 | 95 percentile of response latency distribution in seconds for each verb. |
CALCULATED | kubernetes.api.requestdurationseconds_p95["{#VERB}"] Expression: bucket_percentile(//kubernetes.api.request_duration_seconds_bucket[*,"{#VERB}"],5m,95) |
Kubernetes API | Kubernetes API: ["{#VERB}"] Requests, p99 | 99 percentile of response latency distribution in seconds for each verb. |
CALCULATED | kubernetes.api.requestdurationseconds_p99["{#VERB}"] Expression: bucket_percentile(//kubernetes.api.request_duration_seconds_bucket[*,"{#VERB}"],5m,99) |
Kubernetes API | Kubernetes API: ["{#VERB}"] Requests, p50 | 50 percentile of response latency distribution in seconds for each verb. |
CALCULATED | kubernetes.api.requestdurationseconds_p50["{#VERB}"] Expression: bucket_percentile(//kubernetes.api.request_duration_seconds_bucket[*,"{#VERB}"],5m,50) |
Kubernetes API | Kubernetes API: Requests current: {#KIND} | Maximal number of currently used inflight request limit of this apiserver per request kind in last second. |
DEPENDENT | kubernetes.api.currentinflightrequests["{#KIND}"] Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: |
Kubernetes API | Kubernetes API: gRPCs completed: {#GRPC_CODE}, rate | Total number of RPCs completed by the client regardless of success or failure per second. |
DEPENDENT | kubernetes.api.grpcclienthandledtotal.rate["{#GRPCCODE}"] Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: - CHANGEPERSECOND |
Kubernetes API | Kubernetes API: Authentication attempts: {#RESULT}, rate | Authentication attempts by result per second. |
DEPENDENT | kubernetes.api.authenticationattempts.rate["{#RESULT}"] Preprocessing: - PROMETHEUS PATTERN:authentication_attempts{result = "{#RESULT}"} : function : sum ⛔️ONFAIL: - CHANGEPER_SECOND |
Kubernetes API | Kubernetes API: Authenticated requests: {#NAME}, rate | Counter of authenticated requests broken out by username per second. |
DEPENDENT | kubernetes.api.authenticateduserrequests.rate["{#NAME}"] Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: - CHANGEPERSECOND |
Kubernetes API | Kubernetes API: Watchers: {#KIND} | Number of currently registered watchers for a given resource. |
DEPENDENT | kubernetes.api.apiserverregisteredwatchers["{#KIND}"] Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: |
Kubernetes API | Kubernetes API: etcd objects: {#RESOURCE} | Number of stored objects at the time of last check split by kind. |
DEPENDENT | kubernetes.api.etcdobjectcounts["{#RESOURCE}"] Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: |
Kubernetes API | Kubernetes API: ["{#NAME}"] Workqueue depth | Current depth of workqueue. |
DEPENDENT | kubernetes.api.workqueuedepth["{#NAME}"] Preprocessing: - PROMETHEUS PATTERN:workqueue_depth{name = "{#NAME}"} ⛔️ON_FAIL: |
Kubernetes API | Kubernetes API: ["{#NAME}"] Workqueue adds total, rate | Total number of adds handled by workqueue per second. |
DEPENDENT | kubernetes.api.workqueueaddstotal.rate["{#NAME}"] Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: - CHANGEPERSECOND |
Kubernetes API | Kubernetes API: Certificate expiration seconds bucket, {#LE} | Distribution of the remaining lifetime on the certificate used to authenticate a request. |
DEPENDENT | kubernetes.api.clientcertificateexpirationsecondsbucket[{#LE}] Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: |
Kubernetes API | Kubernetes API: Client certificate expiration, p1 | 1st percentile of the remaining lifetime on the certificate used to authenticate a request. |
CALCULATED | kubernetes.api.clientcertificateexpiration_p1[{#SINGLETON}] Expression: bucket_percentile(//kubernetes.api.client_certificate_expiration_seconds_bucket[*],5m,1) |
Zabbix raw items | Kubernetes API: Get API instance metrics | Get raw metrics from API instance /metrics endpoint. |
HTTP_AGENT | kubernetes.api.getmetrics Preprocessing: - CHECK NOTSUPPORTED⛔️ON FAIL:DISCARD_VALUE -> |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Kubernetes API: Too many server errors | Kubernetes API server is experiencing a high error rate (with 5xx HTTP code). |
min(/Kubernetes API server by HTTP/kubernetes.api.apiserver_request_total_500.rate,5m)>{$KUBE.API.HTTP.SERVER.ERROR} |
WARNING | |
Kubernetes API: Too many client errors | Kubernetes API client is experiencing a high error rate (with 5xx HTTP code). |
min(/Kubernetes API server by HTTP/kubernetes.api.rest_client_requests_total_500.rate,5m)>{$KUBE.API.HTTP.CLIENT.ERROR} |
WARNING | |
Kubernetes API: Kubernetes client certificate is expiring | A client certificate used to authenticate to the apiserver is expiring in less than {$KUBE.API.CERT.EXPIRATION} days (see the sketch after this table). |
last(/Kubernetes API server by HTTP/kubernetes.api.client_certificate_expiration_p1[{#SINGLETON}]) > 0 and last(/Kubernetes API server by HTTP/kubernetes.api.client_certificate_expiration_p1[{#SINGLETON}]) < {$KUBE.API.CERT.EXPIRATION}*24*60*60 |
WARNING | Depends on: - Kubernetes API: Kubernetes client certificate expires soon |
Kubernetes API: Kubernetes client certificate expires soon | A client certificate used to authenticate to the apiserver is expiring in less than 24 hours. |
last(/Kubernetes API server by HTTP/kubernetes.api.client_certificate_expiration_p1[{#SINGLETON}]) > 0 and last(/Kubernetes API server by HTTP/kubernetes.api.client_certificate_expiration_p1[{#SINGLETON}]) < 24*60*60 |
WARNING |
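For clarity, the two certificate triggers above compare the p1 item (the 1st percentile of remaining certificate lifetime, in seconds) against hour- and day-sized thresholds, with the longer-horizon trigger suppressed while the 24-hour one is active. A hypothetical restatement of that logic:

```python
def cert_alert(p1_seconds, warn_days=7.0):
    """warn_days stands in for {$KUBE.API.CERT.EXPIRATION}; returns the trigger that fires."""
    if 0 < p1_seconds < 24 * 60 * 60:
        return "client certificate expires soon (less than 24 hours)"
    if 0 < p1_seconds < warn_days * 24 * 60 * 60:  # suppressed if the check above fired
        return f"client certificate is expiring (less than {warn_days:g} days)"
    return None

print(cert_alert(3 * 24 * 60 * 60))  # -> expiring in less than 7 days
```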
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template or ask for help with it at ZABBIX forums.
For Zabbix version: 6.2 and higher
Official JMX Template for Apache Kafka.
This template was tested on:
See Zabbix template operation for basic instructions.
Metrics are collected by JMX.
No specific Zabbix configuration is required.
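Most of the counters in this template (bytes in/out, messages in, ISR expands/shrinks, and so on) are read as monotonically growing JMX Count attributes and converted to rates with the change-per-second preprocessing step. The arithmetic behind that step, roughly (sample numbers invented):

```python
def change_per_second(prev_value, prev_clock, value, clock):
    """Per-second rate between two successive samples of a growing counter."""
    if clock <= prev_clock or value < prev_value:
        return None  # no elapsed time, or a counter reset: no rate for this poll
    return (value - prev_value) / (clock - prev_clock)

# BytesInPerSec "Count" sampled one minute apart (numbers invented):
print(change_per_second(1_500_000, 0, 1_800_000, 60))  # -> 5000.0 bytes/s
```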
Name | Description | Default |
---|---|---|
{$KAFKA.NETPROCAVG_IDLE.MIN.WARN} | The minimum Network processor average idle percent for trigger expression. |
30 |
{$KAFKA.PASSWORD} | - |
zabbix |
{$KAFKA.REQUESTHANDLERAVG_IDLE.MIN.WARN} | The minimum Request handler average idle percent for trigger expression. |
30 |
{$KAFKA.TOPIC.MATCHES} | Filter of discoverable topics |
.* |
{$KAFKA.TOPIC.NOT_MATCHES} | Filter to exclude discovered topics |
__consumer_offsets |
{$KAFKA.USER} | - |
zabbix |
There are no template links in this template.
Name | Description | Type | Key and additional info |
---|---|---|---|
Topic Metrics (errors) | - |
JMX | jmx.discovery[beans,"kafka.server:type=BrokerTopicMetrics,name=BytesRejectedPerSec,topic=*"] Filter: AND- {#JMXTOPIC} MATCHESREGEX - {#JMXTOPIC} NOTMATCHES_REGEX |
Topic Metrics (read) | - |
JMX | jmx.discovery[beans,"kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec,topic=*"] Filter: AND- {#JMXTOPIC} MATCHESREGEX - {#JMXTOPIC} NOTMATCHES_REGEX |
Topic Metrics (write) | - |
JMX | jmx.discovery[beans,"kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec,topic=*"] Filter: AND- {#JMXTOPIC} MATCHESREGEX - {#JMXTOPIC} NOTMATCHES_REGEX |
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
Kafka | Kafka: Leader election per second | Number of leader elections per second. |
JMX | jmx["kafka.controller:type=ControllerStats,name=LeaderElectionRateAndTimeMs","Count"] |
Kafka | Kafka: Unclean leader election per second | Number of “unclean” elections per second. |
JMX | jmx["kafka.controller:type=ControllerStats,name=UncleanLeaderElectionsPerSec","Count"] Preprocessing: - CHANGEPERSECOND |
Kafka | Kafka: Controller state on broker | One indicates that the broker is the controller for the cluster. |
JMX | jmx["kafka.controller:type=KafkaController,name=ActiveControllerCount","Value"] Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
Kafka | Kafka: Ineligible pending replica deletes | The number of ineligible pending replica deletes. |
JMX | jmx["kafka.controller:type=KafkaController,name=ReplicasIneligibleToDeleteCount","Value"] |
Kafka | Kafka: Pending replica deletes | The number of pending replica deletes. |
JMX | jmx["kafka.controller:type=KafkaController,name=ReplicasToDeleteCount","Value"] |
Kafka | Kafka: Ineligible pending topic deletes | The number of ineligible pending topic deletes. |
JMX | jmx["kafka.controller:type=KafkaController,name=TopicsIneligibleToDeleteCount","Value"] |
Kafka | Kafka: Pending topic deletes | The number of pending topic deletes. |
JMX | jmx["kafka.controller:type=KafkaController,name=TopicsToDeleteCount","Value"] |
Kafka | Kafka: Offline log directory count | The number of offline log directories (for example, after a hardware failure). |
JMX | jmx["kafka.log:type=LogManager,name=OfflineLogDirectoryCount","Value"] |
Kafka | Kafka: Offline partitions count | Number of partitions that don't have an active leader. |
JMX | jmx["kafka.controller:type=KafkaController,name=OfflinePartitionsCount","Value"] |
Kafka | Kafka: Bytes out per second | The rate at which data is fetched and read from the broker by consumers. |
JMX | jmx["kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec","Count"] Preprocessing: - CHANGEPERSECOND |
Kafka | Kafka: Bytes in per second | The rate at which data sent from producers is consumed by the broker. |
JMX | jmx["kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec","Count"] Preprocessing: - CHANGEPERSECOND |
Kafka | Kafka: Messages in per second | The rate at which individual messages are consumed by the broker. |
JMX | jmx["kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec","Count"] Preprocessing: - CHANGEPERSECOND |
Kafka | Kafka: Bytes rejected per second | The rate at which bytes are rejected by the broker. |
JMX | jmx["kafka.server:type=BrokerTopicMetrics,name=BytesRejectedPerSec","Count"] Preprocessing: - CHANGEPERSECOND |
Kafka | Kafka: Client fetch request failed per second | Number of client fetch request failures per second. |
JMX | jmx["kafka.server:type=BrokerTopicMetrics,name=FailedFetchRequestsPerSec","Count"] Preprocessing: - CHANGEPERSECOND |
Kafka | Kafka: Produce requests failed per second | Number of failed produce requests per second. |
JMX | jmx["kafka.server:type=BrokerTopicMetrics,name=FailedProduceRequestsPerSec","Count"] Preprocessing: - CHANGEPERSECOND |
Kafka | Kafka: Request handler average idle percent | Indicates the percentage of time that the request handler (IO) threads are not in use. |
JMX | jmx["kafka.server:type=KafkaRequestHandlerPool,name=RequestHandlerAvgIdlePercent","OneMinuteRate"] Preprocessing: - MULTIPLIER: |
Kafka | Kafka: Fetch-Consumer response send time, mean | Average time taken, in milliseconds, to send the response. |
JMX | jmx["kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request=FetchConsumer","Mean"] |
Kafka | Kafka: Fetch-Consumer response send time, p95 | The time taken, in milliseconds, to send the response for 95th percentile. |
JMX | jmx["kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request=FetchConsumer","95thPercentile"] |
Kafka | Kafka: Fetch-Consumer response send time, p99 | The time taken, in milliseconds, to send the response for 99th percentile. |
JMX | jmx["kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request=FetchConsumer","99thPercentile"] |
Kafka | Kafka: Fetch-Follower response send time, mean | Average time taken, in milliseconds, to send the response. |
JMX | jmx["kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request=FetchFollower","Mean"] |
Kafka | Kafka: Fetch-Follower response send time, p95 | The time taken, in milliseconds, to send the response for 95th percentile. |
JMX | jmx["kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request=FetchFollower","95thPercentile"] |
Kafka | Kafka: Fetch-Follower response send time, p99 | The time taken, in milliseconds, to send the response for 99th percentile. |
JMX | jmx["kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request=FetchFollower","99thPercentile"] |
Kafka | Kafka: Produce response send time, mean | Average time taken, in milliseconds, to send the response. |
JMX | jmx["kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request=Produce","Mean"] |
Kafka | Kafka: Produce response send time, p95 | The time taken, in milliseconds, to send the response for 95th percentile. |
JMX | jmx["kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request=Produce","95thPercentile"] |
Kafka | Kafka: Produce response send time, p99 | The time taken, in milliseconds, to send the response for 99th percentile. |
JMX | jmx["kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request=Produce","99thPercentile"] |
Kafka | Kafka: Fetch-Consumer request total time, mean | Average time in ms to serve the Fetch-Consumer request. |
JMX | jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=FetchConsumer","Mean"] |
Kafka | Kafka: Fetch-Consumer request total time, p95 | Time in ms to serve the Fetch-Consumer request for 95th percentile. |
JMX | jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=FetchConsumer","95thPercentile"] |
Kafka | Kafka: Fetch-Consumer request total time, p99 | Time in ms to serve the Fetch-Consumer request for 99th percentile. |
JMX | jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=FetchConsumer","99thPercentile"] |
Kafka | Kafka: Fetch-Follower request total time, mean | Average time in ms to serve the Fetch-Follower request. |
JMX | jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=FetchFollower","Mean"] |
Kafka | Kafka: Fetch-Follower request total time, p95 | Time in ms to serve the Fetch-Follower request for 95th percentile. |
JMX | jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=FetchFollower","95thPercentile"] |
Kafka | Kafka: Fetch-Follower request total time, p99 | Time in ms to serve the Fetch-Follower request for 99th percentile. |
JMX | jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=FetchFollower","99thPercentile"] |
Kafka | Kafka: Produce request total time, mean | Average time in ms to serve the Produce request. |
JMX | jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=Produce","Mean"] |
Kafka | Kafka: Produce request total time, p95 | Time in ms to serve the Produce requests for 95th percentile. |
JMX | jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=Produce","95thPercentile"] |
Kafka | Kafka: Produce request total time, p99 | Time in ms to serve the Produce requests for 99th percentile. |
JMX | jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=Produce","99thPercentile"] |
Kafka | Kafka: UpdateMetadata request total time, mean | Average time for a request to update metadata. |
JMX | jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=UpdateMetadata","Mean"] |
Kafka | Kafka: UpdateMetadata request total time, p95 | Time for update metadata requests for 95th percentile. |
JMX | jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=UpdateMetadata","95thPercentile"] |
Kafka | Kafka: UpdateMetadata request total time, p99 | Time for update metadata requests for 99th percentile. |
JMX | jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=UpdateMetadata","99thPercentile"] |
Kafka | Kafka: Temporary memory size in bytes (Fetch), max | The maximum of temporary memory used for converting message formats and decompressing messages. |
JMX | jmx["kafka.network:type=RequestMetrics,name=TemporaryMemoryBytes,request=Fetch","Max"] |
Kafka | Kafka: Temporary memory size in bytes (Fetch), min | The minimum of temporary memory used for converting message formats and decompressing messages. |
JMX | jmx["kafka.network:type=RequestMetrics,name=TemporaryMemoryBytes,request=Fetch","Mean"] |
Kafka | Kafka: Temporary memory size in bytes (Produce), max | The maximum of temporary memory used for converting message formats and decompressing messages. |
JMX | jmx["kafka.network:type=RequestMetrics,name=TemporaryMemoryBytes,request=Produce","Max"] |
Kafka | Kafka: Temporary memory size in bytes (Produce), avg | The amount of temporary memory used for converting message formats and decompressing messages. |
JMX | jmx["kafka.network:type=RequestMetrics,name=TemporaryMemoryBytes,request=Produce","Mean"] |
Kafka | Kafka: Temporary memory size in bytes (Produce), min | The minimum of temporary memory used for converting message formats and decompressing messages. |
JMX | jmx["kafka.network:type=RequestMetrics,name=TemporaryMemoryBytes,request=Produce","Min"] |
Kafka | Kafka: Network processor average idle percent | The average percentage of time that the network processors are idle. |
JMX | jmx["kafka.network:type=SocketServer,name=NetworkProcessorAvgIdlePercent","Value"] Preprocessing: - MULTIPLIER: |
Kafka | Kafka: Requests in producer purgatory | Number of requests waiting in producer purgatory. |
JMX | jmx["kafka.server:type=DelayedOperationPurgatory,name=PurgatorySize,delayedOperation=Fetch","Value"] |
Kafka | Kafka: Requests in fetch purgatory | Number of requests waiting in fetch purgatory. |
JMX | jmx["kafka.server:type=DelayedOperationPurgatory,name=PurgatorySize,delayedOperation=Produce","Value"] |
Kafka | Kafka: Replication maximum lag | The maximum lag between the time that messages are received by the leader replica and by the follower replicas. |
JMX | jmx["kafka.server:type=ReplicaFetcherManager,name=MaxLag,clientId=Replica","Value"] |
Kafka | Kafka: Under minimum ISR partition count | The number of partitions under the minimum In-Sync Replica (ISR) count. |
JMX | jmx["kafka.server:type=ReplicaManager,name=UnderMinIsrPartitionCount","Value"] |
Kafka | Kafka: Under replicated partitions | The number of partitions that have not been fully replicated in the follower replicas (the number of non-reassigning replicas minus the number of ISRs is greater than 0). |
JMX | jmx["kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions","Value"] |
Kafka | Kafka: ISR expands per second | The rate at which the number of ISRs in the broker increases. |
JMX | jmx["kafka.server:type=ReplicaManager,name=IsrExpandsPerSec","Count"] Preprocessing: - CHANGEPERSECOND |
Kafka | Kafka: ISR shrink per second | Rate of replicas leaving the ISR pool. |
JMX | jmx["kafka.server:type=ReplicaManager,name=IsrShrinksPerSec","Count"] Preprocessing: - CHANGEPERSECOND |
Kafka | Kafka: Leader count | The number of replicas for which this broker is the leader. |
JMX | jmx["kafka.server:type=ReplicaManager,name=LeaderCount","Value"] |
Kafka | Kafka: Partition count | The number of partitions in the broker. |
JMX | jmx["kafka.server:type=ReplicaManager,name=PartitionCount","Value"] |
Kafka | Kafka: Number of reassigning partitions | The number of reassigning leader partitions on a broker. |
JMX | jmx["kafka.server:type=ReplicaManager,name=ReassigningPartitions","Value"] |
Kafka | Kafka: Request queue size | The size of the delay queue. |
JMX | jmx["kafka.server:type=Request","queue-size"] |
Kafka | Kafka: Version | Current version of broker. |
JMX | jmx["kafka.server:type=app-info","version"] Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
Kafka | Kafka: Uptime | Service uptime in seconds (see the conversion sketch after the triggers table). |
JMX | jmx["kafka.server:type=app-info","start-time-ms"] Preprocessing: - JAVASCRIPT: |
Kafka | Kafka: ZooKeeper client request latency | Latency in milliseconds for ZooKeeper requests from broker. |
JMX | jmx["kafka.server:type=ZooKeeperClientMetrics,name=ZooKeeperRequestLatencyMs","Count"] |
Kafka | Kafka: ZooKeeper connection status | Connection status of broker's ZooKeeper session. |
JMX | jmx["kafka.server:type=SessionExpireListener,name=SessionState","Value"] Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
Kafka | Kafka: ZooKeeper disconnect rate | ZooKeeper client disconnect per second. |
JMX | jmx["kafka.server:type=SessionExpireListener,name=ZooKeeperDisconnectsPerSec","Count"] Preprocessing: - CHANGEPERSECOND |
Kafka | Kafka: ZooKeeper session expiration rate | ZooKeeper client session expiration per second. |
JMX | jmx["kafka.server:type=SessionExpireListener,name=ZooKeeperExpiresPerSec","Count"] Preprocessing: - CHANGEPERSECOND |
Kafka | Kafka: ZooKeeper readonly rate | ZooKeeper client readonly per second. |
JMX | jmx["kafka.server:type=SessionExpireListener,name=ZooKeeperReadOnlyConnectsPerSec","Count"] Preprocessing: - CHANGEPERSECOND |
Kafka | Kafka: ZooKeeper sync rate | ZooKeeper client sync per second. |
JMX | jmx["kafka.server:type=SessionExpireListener,name=ZooKeeperSyncConnectsPerSec","Count"] Preprocessing: - CHANGEPERSECOND |
Kafka | Kafka {#JMXTOPIC}: Messages in per second | The rate at which individual messages are consumed by topic. |
JMX | jmx["kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec,topic={#JMXTOPIC}","Count"] Preprocessing: - CHANGEPERSECOND |
Kafka | Kafka {#JMXTOPIC}: Bytes in per second | The rate at which data sent from producers is consumed by topic. |
JMX | jmx["kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec,topic={#JMXTOPIC}","Count"] Preprocessing: - CHANGEPERSECOND |
Kafka | Kafka {#JMXTOPIC}: Bytes out per second | The rate at which data is fetched and read from the broker by consumers (by topic). |
JMX | jmx["kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec,topic={#JMXTOPIC}","Count"] Preprocessing: - CHANGEPERSECOND |
Kafka | Kafka {#JMXTOPIC}: Bytes rejected per second | Rejected bytes rate by topic. |
JMX | jmx["kafka.server:type=BrokerTopicMetrics,name=BytesRejectedPerSec,topic={#JMXTOPIC}","Count"] Preprocessing: - CHANGEPERSECOND |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Kafka: Unclean leader election detected | Unclean leader elections occur when there is no qualified partition leader among Kafka brokers. If Kafka is configured to allow an unclean leader election, a leader is chosen from the out-of-sync replicas, and any messages that were not synced prior to the loss of the former leader are lost forever. Essentially, unclean leader elections sacrifice consistency for availability. |
last(/Apache Kafka by JMX/jmx["kafka.controller:type=ControllerStats,name=UncleanLeaderElectionsPerSec","Count"])>0 |
AVERAGE | |
Kafka: There are offline log directories | The offline log directory count metric indicates the number of log directories which are offline (for example, due to a hardware failure) so that the broker cannot store incoming messages anymore. |
last(/Apache Kafka by JMX/jmx["kafka.log:type=LogManager,name=OfflineLogDirectoryCount","Value"]) > 0 |
WARNING | |
Kafka: One or more partitions have no leader | Any partition without an active leader will be completely inaccessible, and both consumers and producers of that partition will be blocked until a leader becomes available. |
last(/Apache Kafka by JMX/jmx["kafka.controller:type=KafkaController,name=OfflinePartitionsCount","Value"]) > 0 |
WARNING | |
Kafka: Request handler average idle percent is too low | The request handler idle ratio metric indicates the percentage of time the request handlers are not in use. The lower this number, the more loaded the broker is. |
max(/Apache Kafka by JMX/jmx["kafka.server:type=KafkaRequestHandlerPool,name=RequestHandlerAvgIdlePercent","OneMinuteRate"],15m)<{$KAFKA.REQUEST_HANDLER_AVG_IDLE.MIN.WARN} |
AVERAGE | |
Kafka: Network processor average idle percent is too low | The network processor idle ratio metric indicates the percentage of time the network processors are not in use. The lower this number, the more loaded the broker is. |
max(/Apache Kafka by JMX/jmx["kafka.network:type=SocketServer,name=NetworkProcessorAvgIdlePercent","Value"],15m)<{$KAFKA.NET_PROC_AVG_IDLE.MIN.WARN} |
AVERAGE | |
Kafka: Failed to fetch info data | Zabbix has not received data for the items for the last 15 minutes. |
nodata(/Apache Kafka by JMX/jmx["kafka.network:type=SocketServer,name=NetworkProcessorAvgIdlePercent","Value"],15m)=1 |
WARNING | |
Kafka: There are partitions under the min ISR | The Under min ISR partitions metric displays the number of partitions, where the number of In-Sync Replicas (ISR) is less than the minimum number of in-sync replicas specified. The two most common causes of under-min ISR partitions are that one or more brokers is unresponsive, or the cluster is experiencing performance issues and one or more brokers are falling behind. |
last(/Apache Kafka by JMX/jmx["kafka.server:type=ReplicaManager,name=UnderMinIsrPartitionCount","Value"])>0 |
AVERAGE | |
Kafka: There are under replicated partitions | The Under replicated partitions metric displays the number of partitions that do not have enough replicas to meet the desired replication factor. A partition will also be considered under-replicated if the correct number of replicas exist, but one or more of the replicas have fallen significantly behind the partition leader. The two most common causes of under-replicated partitions are that one or more brokers is unresponsive, or the cluster is experiencing performance issues and one or more brokers have fallen behind. |
last(/Apache Kafka by JMX/jmx["kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions","Value"])>0 |
AVERAGE | |
Kafka: Version has changed | Kafka version has changed. Ack to close. |
last(/Apache Kafka by JMX/jmx["kafka.server:type=app-info","version"],#1)<>last(/Apache Kafka by JMX/jmx["kafka.server:type=app-info","version"],#2) and length(last(/Apache Kafka by JMX/jmx["kafka.server:type=app-info","version"]))>0 |
INFO | Manual close: YES |
Kafka: has been restarted | Uptime is less than 10 minutes. |
last(/Apache Kafka by JMX/jmx["kafka.server:type=app-info","start-time-ms"])<10m |
INFO | Manual close: YES |
Kafka: Broker is not connected to ZooKeeper | - |
find(/Apache Kafka by JMX/jmx["kafka.server:type=SessionExpireListener,name=SessionState","Value"],,"regexp","CONNECTED")=0 |
AVERAGE |
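As an aside, the Uptime item above derives seconds of uptime from the broker's start-time-ms attribute with a JavaScript preprocessing step; the conversion amounts to the following (sketched here in Python, sample value invented):

```python
import time

def uptime_seconds(start_time_ms):
    """Convert the app-info start-time-ms attribute into service uptime in seconds."""
    return time.time() - start_time_ms / 1000.0

# A broker started ten minutes ago reports roughly 600 seconds:
print(uptime_seconds((time.time() - 600) * 1000))
```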
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template or ask for help with it at ZABBIX forums.
For Zabbix version: 6.2 and higher.
The template to monitor Jenkins by Zabbix that works without any external scripts.
Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
This template was tested on:
See Zabbix template operation for basic instructions.
Metrics are collected by requests to the Metrics API. For common metrics: install and configure the Metrics plugin according to the official documentation. Do not forget to configure access to the Metrics Servlet by issuing an API key and changing the macro {$JENKINS.API.KEY}.
For monitoring computers and builds: create an API token for the monitoring user according to the official documentation and set the macros {$JENKINS.USER} and {$JENKINS.API.TOKEN}. Don't forget to change the macro {$JENKINS.URL}.
No specific Zabbix configuration is required.
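As an illustration, a minimal sketch of the two kinds of requests the template makes, assuming a Metrics API key and a user API token have been issued (the URL paths follow the Metrics plugin's servlet layout and the gauge name is an example; adjust to your instance):

```python
import base64
import json
import urllib.request

JENKINS_URL = "http://localhost:8080"        # {$JENKINS.URL}
API_KEY = "metrics-api-key"                  # {$JENKINS.API.KEY}
USER, TOKEN = "zabbix", "user-api-token"     # {$JENKINS.USER}, {$JENKINS.API.TOKEN}

# 1) Metrics Servlet: ping should answer "pong"; metrics come back as one JSON document.
ping = urllib.request.urlopen(f"{JENKINS_URL}/metrics/{API_KEY}/ping").read().decode().strip()
metrics = json.load(urllib.request.urlopen(f"{JENKINS_URL}/metrics/{API_KEY}/metrics"))
print(ping, metrics["gauges"]["jenkins.node.count.value"]["value"])

# 2) Computers and builds: HTTP BASIC authentication with the user's API token.
auth = base64.b64encode(f"{USER}:{TOKEN}".encode()).decode()
req = urllib.request.Request(f"{JENKINS_URL}/computer/api/json",
                             headers={"Authorization": f"Basic {auth}"})
computers = json.load(urllib.request.urlopen(req))
print(len(computers["computer"]))
```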
Name | Description | Default |
---|---|---|
{$JENKINS.API.KEY} | API key to access Metrics Servlet |
`` |
{$JENKINS.API.TOKEN} | API token for HTTP BASIC authentication. |
`` |
{$JENKINS.FILE_DESCRIPTORS.MAX.WARN} | Maximum percentage of file descriptors usage alert threshold (for trigger expression). |
85 |
{$JENKINS.JOB.HEALTH.SCORE.MIN.WARN} | Minimum job health score (for trigger expression). |
50 |
{$JENKINS.PING.REPLY} | Expected reply to the ping. |
pong |
{$JENKINS.URL} | Jenkins URL in the format |
`` |
{$JENKINS.USER} | Username for HTTP BASIC authentication |
zabbix |
There are no template links in this template.
Name | Description | Type | Key and additional info |
---|---|---|---|
Computers discovery | - |
HTTP_AGENT | jenkins.computers Preprocessing: - JSONPATH: |
Jobs discovery | - |
HTTP_AGENT | jenkins.jobs Preprocessing: - JSONPATH: |
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
Jenkins | Jenkins: Disk space check message | The message will reference the first node which fails this check. There may be other nodes that fail the check, but this health check is designed to fail fast. |
DEPENDENT | jenkins.diskspace.message Preprocessing: - JSONPATH: ⛔️ON FAIL:CUSTOM_VALUE -> - DISCARDUNCHANGEDHEARTBEAT: |
Jenkins | Jenkins: Temporary space check message | The message will reference the first node which fails this check. There may be other nodes that fail the check, but this health check is designed to fail fast. |
DEPENDENT | jenkins.temporaryspace.message Preprocessing: - JSONPATH: ⛔️ON FAIL:CUSTOM_VALUE -> - DISCARDUNCHANGEDHEARTBEAT: |
Jenkins | Jenkins: Plugins check message | The message of plugins health check. |
DEPENDENT | jenkins.plugins.message Preprocessing: - JSONPATH: ⛔️ONFAIL: - DISCARDUNCHANGED_HEARTBEAT: |
Jenkins | Jenkins: Thread deadlock check message | The message of thread deadlock health check. |
DEPENDENT | jenkins.threaddeadlock.message Preprocessing: - JSONPATH: ⛔️ON FAIL:CUSTOM_VALUE -> - DISCARDUNCHANGEDHEARTBEAT: |
Jenkins | Jenkins: Disk space check | Returns FAIL if any of the Jenkins disk space monitors are reporting the disk space as less than the configured threshold. |
DEPENDENT | jenkins.diskspace Preprocessing: - JSONPATH: - BOOL TODECIMAL- DISCARD UNCHANGED_HEARTBEAT:1h |
Jenkins | Jenkins: Plugins check | Returns FAIL if any of the Jenkins plugins failed to start. |
DEPENDENT | jenkins.plugins Preprocessing: - JSONPATH: - BOOLTODECIMAL - DISCARDUNCHANGEDHEARTBEAT: |
Jenkins | Jenkins: Temporary space check | Returns FAIL if any of the Jenkins temporary space monitors are reporting the temporary space as less than the configured threshold. |
DEPENDENT | jenkins.temporaryspace Preprocessing: - JSONPATH: - BOOL TODECIMAL- DISCARD UNCHANGED_HEARTBEAT:1h |
Jenkins | Jenkins: Thread deadlock check | Returns FAIL if there are any deadlocked threads in the Jenkins master JVM. |
DEPENDENT | jenkins.threaddeadlock Preprocessing: - JSONPATH: - BOOL TODECIMAL- DISCARD UNCHANGED_HEARTBEAT:1h |
Jenkins | Jenkins: Executors count | The number of executors available to Jenkins. This corresponds to the sum of the executors of all the on-line nodes. |
DEPENDENT | jenkins.executor.count Preprocessing: - JSONPATH: - DISCARDUNCHANGEDHEARTBEAT: |
Jenkins | Jenkins: Executors free | The number of executors available to Jenkins that are not currently in use. |
DEPENDENT | jenkins.executor.free Preprocessing: - JSONPATH: |
Jenkins | Jenkins: Executors in use | The number of executors available to Jenkins that are currently in use. |
DEPENDENT | jenkins.executor.in_use Preprocessing: - JSONPATH: |
Jenkins | Jenkins: Nodes count | The number of build nodes available to Jenkins, both on-line and off-line. |
DEPENDENT | jenkins.node.count Preprocessing: - JSONPATH: - DISCARDUNCHANGEDHEARTBEAT: |
Jenkins | Jenkins: Nodes offline | The number of build nodes available to Jenkins but currently off-line. |
DEPENDENT | jenkins.node.offline Preprocessing: - JSONPATH: - DISCARDUNCHANGEDHEARTBEAT: |
Jenkins | Jenkins: Nodes online | The number of build nodes available to Jenkins and currently on-line. |
DEPENDENT | jenkins.node.online Preprocessing: - JSONPATH: - DISCARDUNCHANGEDHEARTBEAT: |
Jenkins | Jenkins: Plugins active | The number of plugins in the Jenkins instance that started successfully. |
DEPENDENT | jenkins.plugins.active Preprocessing: - JSONPATH: - DISCARDUNCHANGEDHEARTBEAT: |
Jenkins | Jenkins: Plugins failed | The number of plugins in the Jenkins instance that failed to start. A value other than 0 is typically indicative of a potential issue within the Jenkins installation that will either be solved by explicitly disabling the plugin(s) or by resolving the plugin dependency issues. |
DEPENDENT | jenkins.plugins.failed Preprocessing: - JSONPATH: - DISCARDUNCHANGEDHEARTBEAT: |
Jenkins | Jenkins: Plugins inactive | The number of plugins in the Jenkins instance that are not currently enabled. |
DEPENDENT | jenkins.plugins.inactive Preprocessing: - JSONPATH: - DISCARDUNCHANGEDHEARTBEAT: |
Jenkins | Jenkins: Plugins with update | The number of plugins in the Jenkins instance that have a newer version reported as available in the current Jenkins update center metadata held by Jenkins. This value is not indicative of an issue with Jenkins, but high values can be used as a prompt to review the available plugin updates and see whether they contain fixes for issues that could be affecting your Jenkins instance. |
DEPENDENT | jenkins.plugins.withupdate Preprocessing: - JSONPATH: - DISCARD UNCHANGED_HEARTBEAT:1h |
Jenkins | Jenkins: Projects count | The number of projects. |
DEPENDENT | jenkins.project.count Preprocessing: - JSONPATH: - DISCARDUNCHANGEDHEARTBEAT: |
Jenkins | Jenkins: Jobs count | The number of jobs in Jenkins. |
DEPENDENT | jenkins.job.count.value Preprocessing: - JSONPATH: - DISCARDUNCHANGEDHEARTBEAT: |
Jenkins | Jenkins: Job scheduled, m1 rate | The rate at which jobs are scheduled. If a job is already in the queue and an identical request for scheduling the job is received then Jenkins will coalesce the two requests. This metric gives a reasonably pure measure of the load requirements of the Jenkins master as it is unaffected by the number of executors available to the system. |
DEPENDENT | jenkins.job.scheduled.m1.rate Preprocessing: - JSONPATH: |
Jenkins | Jenkins: Jobs scheduled, m5 rate | The rate at which jobs are scheduled. If a job is already in the queue and an identical request for scheduling the job is received then Jenkins will coalesce the two requests. This metric gives a reasonably pure measure of the load requirements of the Jenkins master as it is unaffected by the number of executors available to the system. |
DEPENDENT | jenkins.job.scheduled.m5.rate Preprocessing: - JSONPATH: |
Jenkins | Jenkins: Job blocked, m1 rate | The rate at which jobs in the build queue enter the blocked state. |
DEPENDENT | jenkins.job.blocked.m1.rate Preprocessing: - JSONPATH: |
Jenkins | Jenkins: Job blocked, m5 rate | The rate at which jobs in the build queue enter the blocked state. |
DEPENDENT | jenkins.job.blocked.m5.rate Preprocessing: - JSONPATH: |
Jenkins | Jenkins: Job blocked duration, p95 | The amount of time which jobs spend in the blocked state. |
DEPENDENT | jenkins.job.blocked.duration.p95 Preprocessing: - JSONPATH: |
Jenkins | Jenkins: Job blocked duration, median | The amount of time which jobs spend in the blocked state. |
DEPENDENT | jenkins.job.blocked.duration.p50 Preprocessing: - JSONPATH: |
Jenkins | Jenkins: Job building, m1 rate | The rate at which jobs are built. |
DEPENDENT | jenkins.job.building.m1.rate Preprocessing: - JSONPATH: |
Jenkins | Jenkins: Job building, m5 rate | The rate at which jobs are built. |
DEPENDENT | jenkins.job.building.m5.rate Preprocessing: - JSONPATH: |
Jenkins | Jenkins: Job building duration, p95 | The amount of time which jobs spend building. |
DEPENDENT | jenkins.job.building.duration.p95 Preprocessing: - JSONPATH: |
Jenkins | Jenkins: Job building duration, median | The amount of time which jobs spend building. |
DEPENDENT | jenkins.job.building.duration.p50 Preprocessing: - JSONPATH: |
Jenkins | Jenkins: Job buildable, m1 rate | The rate at which jobs in the build queue enter the buildable state. |
DEPENDENT | jenkins.job.buildable.m1.rate Preprocessing: - JSONPATH: |
Jenkins | Jenkins: Job buildable, m5 rate | The rate at which jobs in the build queue enter the buildable state. |
DEPENDENT | jenkins.job.buildable.m5.rate Preprocessing: - JSONPATH: |
Jenkins | Jenkins: Job buildable duration, p95 | The amount of time which jobs spend in the buildable state. |
DEPENDENT | jenkins.job.buildable.duration.p95 Preprocessing: - JSONPATH: |
Jenkins | Jenkins: Job buildable duration, median | The amount of time which jobs spend in the buildable state. |
DEPENDENT | jenkins.job.buildable.duration.p50 Preprocessing: - JSONPATH: |
Jenkins | Jenkins: Job queuing, m1 rate | The rate at which jobs are queued. |
DEPENDENT | jenkins.job.queuing.m1.rate Preprocessing: - JSONPATH: |
Jenkins | Jenkins: Job queuing, m5 rate | The rate at which jobs are queued. |
DEPENDENT | jenkins.job.queuing.m5.rate Preprocessing: - JSONPATH: |
Jenkins | Jenkins: Job queuing duration, p95 | The total time which jobs spend in the build queue. |
DEPENDENT | jenkins.job.queuing.duration.p95 Preprocessing: - JSONPATH: |
Jenkins | Jenkins: Job queuing duration, median | The total time which jobs spend in the build queue. |
DEPENDENT | jenkins.job.queuing.duration.p50 Preprocessing: - JSONPATH: |
Jenkins | Jenkins: Job total, m1 rate | The rate at which jobs are queued. |
DEPENDENT | jenkins.job.total.m1.rate Preprocessing: - JSONPATH: |
Jenkins | Jenkins: Job total, m5 rate | The rate at which jobs are queued. |
DEPENDENT | jenkins.job.total.m5.rate Preprocessing: - JSONPATH: |
Jenkins | Jenkins: Job total duration, p95 | The total time which jobs spend from entering the build queue to completing building. |
DEPENDENT | jenkins.job.total.duration.p95 Preprocessing: - JSONPATH: |
Jenkins | Jenkins: Job total duration, median | The total time which jobs spend from entering the build queue to completing building. |
DEPENDENT | jenkins.job.total.duration.p50 Preprocessing: - JSONPATH: |
Jenkins | Jenkins: Job waiting, m1 rate | The rate at which jobs enter the quiet period. |
DEPENDENT | jenkins.job.waiting.m1.rate Preprocessing: - JSONPATH: |
Jenkins | Jenkins: Job waiting, m5 rate | The rate at which jobs enter the quiet period. |
DEPENDENT | jenkins.job.waiting.m5.rate Preprocessing: - JSONPATH: |
Jenkins | Jenkins: Job waiting duration, p95 | The total amount of time that jobs spend in their quiet period. |
DEPENDENT | jenkins.job.waiting.duration.p95 Preprocessing: - JSONPATH: |
Jenkins | Jenkins: Job waiting duration, median | The total amount of time that jobs spend in their quiet period. |
DEPENDENT | jenkins.job.waiting.duration.p50 Preprocessing: - JSONPATH: |
Jenkins | Jenkins: Build queue, blocked | The number of jobs that are in the Jenkins build queue and currently in the blocked state. |
DEPENDENT | jenkins.queue.blocked Preprocessing: - JSONPATH: |
Jenkins | Jenkins: Build queue, size | The number of jobs that are in the Jenkins build queue. |
DEPENDENT | jenkins.queue.size Preprocessing: - JSONPATH: |
Jenkins | Jenkins: Build queue, buildable | The number of jobs that are in the Jenkins build queue and currently in the buildable state. |
DEPENDENT | jenkins.queue.buildable Preprocessing: - JSONPATH: |
Jenkins | Jenkins: Build queue, pending | The number of jobs that are in the Jenkins build queue and currently in the pending state. |
DEPENDENT | jenkins.queue.pending Preprocessing: - JSONPATH: |
Jenkins | Jenkins: Build queue, stuck | The number of jobs that are in the Jenkins build queue and currently stuck. |
DEPENDENT | jenkins.queue.stuck Preprocessing: - JSONPATH: |
Jenkins | Jenkins: HTTP active requests, rate | The number of currently active requests against the Jenkins master Web UI. |
DEPENDENT | jenkins.http.active_requests.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
Jenkins | Jenkins: HTTP response 400, rate | The rate at which the Jenkins master Web UI is responding to requests with an HTTP/400 status code. |
DEPENDENT | jenkins.http.bad_request.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
Jenkins | Jenkins: HTTP response 500, rate | The rate at which the Jenkins master Web UI is responding to requests with an HTTP/500 status code. |
DEPENDENT | jenkins.http.server_error.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
Jenkins | Jenkins: HTTP response 503, rate | The rate at which the Jenkins master Web UI is responding to requests with an HTTP/503 status code. |
DEPENDENT | jenkins.http.service_unavailable.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
Jenkins | Jenkins: HTTP response 200, rate | The rate at which the Jenkins master Web UI is responding to requests with an HTTP/200 status code. |
DEPENDENT | jenkins.http.ok.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
Jenkins | Jenkins: HTTP response other, rate | The rate at which the Jenkins master Web UI is responding to requests with a non-informational status code that is not in the list: HTTP/200, HTTP/201, HTTP/204, HTTP/304, HTTP/400, HTTP/403, HTTP/404, HTTP/500, or HTTP/503. |
DEPENDENT | jenkins.http.other.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
Jenkins | Jenkins: HTTP response 201, rate | The rate at which the Jenkins master Web UI is responding to requests with an HTTP/201 status code. |
DEPENDENT | jenkins.http.created.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
Jenkins | Jenkins: HTTP response 204, rate | The rate at which the Jenkins master Web UI is responding to requests with an HTTP/204 status code. |
DEPENDENT | jenkins.http.no_content.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
Jenkins | Jenkins: HTTP response 404, rate | The rate at which the Jenkins master Web UI is responding to requests with an HTTP/404 status code. |
DEPENDENT | jenkins.http.not_found.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
Jenkins | Jenkins: HTTP response 304, rate | The rate at which the Jenkins master Web UI is responding to requests with an HTTP/304 status code. |
DEPENDENT | jenkins.http.not_modified.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
Jenkins | Jenkins: HTTP response 403, rate | The rate at which the Jenkins master Web UI is responding to requests with an HTTP/403 status code. |
DEPENDENT | jenkins.http.forbidden.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
Jenkins | Jenkins: HTTP requests, rate | The rate at which the Jenkins master Web UI is receiving requests. |
DEPENDENT | jenkins.http.requests.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
Jenkins | Jenkins: HTTP requests, p95 | The time spent generating the corresponding responses. |
DEPENDENT | jenkins.http.requests_p95.rate Preprocessing: - JSONPATH: |
Jenkins | Jenkins: HTTP requests, median | The time spent generating the corresponding responses. |
DEPENDENT | jenkins.http.requests_p50.rate Preprocessing: - JSONPATH: |
Jenkins | Jenkins: Version | Version of Jenkins server. |
DEPENDENT | jenkins.version Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
Jenkins | Jenkins: CPU Load | The system load on the Jenkins master as reported by the JVM's Operating System JMX bean. The calculation of system load is operating system dependent. Typically this is the sum of the number of processes that are currently running plus the number that are waiting to run. This is typically comparable against the number of CPU cores. |
DEPENDENT | jenkins.system.cpu.load Preprocessing: - JSONPATH: |
Jenkins | Jenkins: Uptime | The number of seconds since the Jenkins master JVM started. |
DEPENDENT | jenkins.system.uptime Preprocessing: - JSONPATH: - MULTIPLIER: |
Jenkins | Jenkins: File descriptor ratio | The ratio of used to total file descriptors. |
DEPENDENT | jenkins.descriptor.ratio Preprocessing: - JSONPATH: - MULTIPLIER: |
Jenkins | Jenkins: Service ping | HTTP_AGENT | jenkins.ping Preprocessing: - CHECK_NOT_SUPPORTED ⛔️ON_FAIL: - REGEX: ⛔️ON_FAIL: - DISCARD_UNCHANGED_HEARTBEAT: |
|
Jenkins | Jenkins job [{#NAME}]: Get job | Raw data for a job. |
DEPENDENT | jenkins.job.get[{#NAME}] Preprocessing: - JSONPATH: ⛔️ON_FAIL: - DISCARD_UNCHANGED_HEARTBEAT: |
Jenkins | Jenkins job [{#NAME}]: Health score | Represents health of project. A number between 0-100. Job Description: {#DESCRIPTION} Job Url: {#URL} |
DEPENDENT | jenkins.build.health[{#NAME}] Preprocessing: - JSONPATH: ⛔️ON_FAIL: - DISCARD_UNCHANGED_HEARTBEAT: |
Jenkins | Jenkins job [{#NAME}]: Last Build number | Details: {#URL}/lastBuild/ |
DEPENDENT | jenkins.last_build.number[{#NAME}] Preprocessing: - JSONPATH: ⛔️ON_FAIL: DISCARD_VALUE -> - DISCARD_UNCHANGED_HEARTBEAT: |
Jenkins | Jenkins job [{#NAME}]: Last Build duration | Build duration (in seconds). |
DEPENDENT | jenkins.last_build.duration[{#NAME}] Preprocessing: - JSONPATH: ⛔️ON_FAIL: DISCARD_VALUE -> - MULTIPLIER: - DISCARD_UNCHANGED_HEARTBEAT: |
Jenkins | Jenkins job [{#NAME}]: Last Build timestamp | DEPENDENT | jenkins.last_build.timestamp[{#NAME}] Preprocessing: - JSONPATH: ⛔️ON_FAIL: DISCARD_VALUE -> - MULTIPLIER: - DISCARD_UNCHANGED_HEARTBEAT: |
|
Jenkins | Jenkins job [{#NAME}]: Last Build result | DEPENDENT | jenkins.last_build.result[{#NAME}] Preprocessing: - JSONPATH: ⛔️ON_FAIL: DISCARD_VALUE -> - DISCARD_UNCHANGED_HEARTBEAT: |
|
Jenkins | Jenkins job [{#NAME}]: Last Failed Build number | Details: {#URL}/lastFailedBuild/ |
DEPENDENT | jenkins.last_failed_build.number[{#NAME}] Preprocessing: - JSONPATH: ⛔️ON_FAIL: - DISCARD_UNCHANGED_HEARTBEAT: |
Jenkins | Jenkins job [{#NAME}]: Last Failed Build duration | Build duration (in seconds). |
DEPENDENT | jenkins.last_failed_build.duration[{#NAME}] Preprocessing: - JSONPATH: ⛔️ON_FAIL: - MULTIPLIER: - DISCARD_UNCHANGED_HEARTBEAT: |
Jenkins | Jenkins job [{#NAME}]: Last Failed Build timestamp | - |
DEPENDENT | jenkins.last_failed_build.timestamp[{#NAME}] Preprocessing: - JSONPATH: ⛔️ON_FAIL: - MULTIPLIER: - DISCARD_UNCHANGED_HEARTBEAT: |
Jenkins | Jenkins job [{#NAME}]: Last Successful Build number | Details: {#URL}/lastSuccessfulBuild/ |
DEPENDENT | jenkins.last_successful_build.number[{#NAME}] Preprocessing: - JSONPATH: ⛔️ON_FAIL: - DISCARD_UNCHANGED_HEARTBEAT: |
Jenkins | Jenkins job [{#NAME}]: Last Successful Build duration | Build duration (in seconds). |
DEPENDENT | jenkins.last_successful_build.duration[{#NAME}] Preprocessing: - JSONPATH: ⛔️ON_FAIL: - MULTIPLIER: - DISCARD_UNCHANGED_HEARTBEAT: |
Jenkins | Jenkins job [{#NAME}]: Last Successful Build timestamp | - |
DEPENDENT | jenkins.last_successful_build.timestamp[{#NAME}] Preprocessing: - JSONPATH: ⛔️ON_FAIL: - MULTIPLIER: - DISCARD_UNCHANGED_HEARTBEAT: |
Jenkins | Jenkins: Computer [{#DISPLAY_NAME}]: Get computer | Raw data for a computer. |
DEPENDENT | jenkins.computer.get[{#DISPLAY_NAME}] Preprocessing: - JSONPATH: ⛔️ON_FAIL: DISCARD_VALUE -> - DISCARD_UNCHANGED_HEARTBEAT: |
Jenkins | Jenkins: Computer [{#DISPLAY_NAME}]: Executors | The maximum number of concurrent builds that Jenkins may perform on this node. |
DEPENDENT | jenkins.computer.numExecutors[{#DISPLAY_NAME}] Preprocessing: - JSONPATH: ⛔️ON_FAIL: DISCARD_VALUE -> |
Jenkins | Jenkins: Computer [{#DISPLAY_NAME}]: State | Represents the actual online/offline state. Node description: {#DESCRIPTION} |
DEPENDENT | jenkins.computer.state[{#DISPLAY_NAME}] Preprocessing: - JSONPATH: - BOOL_TO_DECIMAL - DISCARD_UNCHANGED_HEARTBEAT: 1h |
Jenkins | Jenkins: Computer [{#DISPLAY_NAME}]: Offline cause reason | If the computer was offline (either temporarily or not), will return the cause as a string (without user info). Empty string if the system was put offline without given a cause. |
DEPENDENT | jenkins.computer.offline.reason[{#DISPLAY_NAME}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 3h |
Jenkins | Jenkins: Computer [{#DISPLAY_NAME}]: Idle | Returns true if all the executors of this computer are idle. |
DEPENDENT | jenkins.computer.idle[{#DISPLAY_NAME}] Preprocessing: - JSONPATH: - BOOL_TO_DECIMAL - DISCARD_UNCHANGED_HEARTBEAT: 1h |
Jenkins | Jenkins: Computer [{#DISPLAY_NAME}]: Temporarily offline | Returns true if this node is marked temporarily offline. |
DEPENDENT | jenkins.computer.temp_offline[{#DISPLAY_NAME}] Preprocessing: - JSONPATH: - BOOL_TO_DECIMAL - DISCARD_UNCHANGED_HEARTBEAT: |
Jenkins | Jenkins: Computer [{#DISPLAY_NAME}]: Available disk space | The available disk space of $JENKINS_HOME on agent. |
DEPENDENT | jenkins.computer.disk_space[{#DISPLAY_NAME}] Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
Jenkins | Jenkins: Computer [{#DISPLAY_NAME}]: Available temp space | The available disk space of the temporary directory. Java tools and tests/builds often create files in the temporary directory, and may not function properly if there's no available space. |
DEPENDENT | jenkins.computer.temp_space[{#DISPLAY_NAME}] Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
Jenkins | Jenkins: Computer [{#DISPLAY_NAME}]: Response time average | The round trip network response time from the master to the agent. |
DEPENDENT | jenkins.computer.response_time[{#DISPLAY_NAME}] Preprocessing: - JSONPATH: ⛔️ON_FAIL: - MULTIPLIER: |
Jenkins | Jenkins: Computer [{#DISPLAY_NAME}]: Available physical memory | The total physical memory of the system, available bytes. |
DEPENDENT | jenkins.computer.available_physical_memory[{#DISPLAY_NAME}] Preprocessing: - JSONPATH: ⛔️ON_FAIL: DISCARD_VALUE -> |
Jenkins | Jenkins: Computer [{#DISPLAY_NAME}]: Available swap space | Available swap space in bytes. |
DEPENDENT | jenkins.computer.available_swap_space[{#DISPLAY_NAME}] Preprocessing: - JSONPATH: ⛔️ON_FAIL: DISCARD_VALUE -> |
Jenkins | Jenkins: Computer [{#DISPLAY_NAME}]: Total physical memory | Total physical memory of the system, in bytes. |
DEPENDENT | jenkins.computer.total_physical_memory[{#DISPLAY_NAME}] Preprocessing: - JSONPATH: ⛔️ON_FAIL: DISCARD_VALUE -> |
Jenkins | Jenkins: Computer [{#DISPLAY_NAME}]: Total swap space | Total swap space in bytes. |
DEPENDENT | jenkins.computer.total_swap_space[{#DISPLAY_NAME}] Preprocessing: - JSONPATH: ⛔️ON_FAIL: DISCARD_VALUE -> |
Jenkins | Jenkins: Computer [{#DISPLAY_NAME}]: Clock difference | The clock difference between the master and nodes. |
DEPENDENT | jenkins.computer.clock_difference[{#DISPLAY_NAME}] Preprocessing: - JSONPATH: ⛔️ON_FAIL: - MULTIPLIER: |
Zabbix raw items | Jenkins: Get service metrics | - |
HTTP_AGENT | jenkins.get_metrics Preprocessing: - CHECK_NOT_SUPPORTED ⛔️ON_FAIL: DISCARD_VALUE -> |
Zabbix raw items | Jenkins: Get healthcheck | HTTP_AGENT | jenkins.healthcheck Preprocessing: - CHECK_NOT_SUPPORTED ⛔️ON_FAIL: |
|
Zabbix raw items | Jenkins: Get jobs info | - |
HTTP_AGENT | jenkins.job_info Preprocessing: - CHECK_NOT_SUPPORTED ⛔️ON_FAIL: DISCARD_VALUE -> |
Zabbix raw items | Jenkins: Get computer info | - |
HTTP_AGENT | jenkins.computer_info Preprocessing: - CHECK_NOT_SUPPORTED ⛔️ON_FAIL: DISCARD_VALUE -> |
Zabbix raw items | Jenkins: Get gauges | Raw items for gauges metrics. |
DEPENDENT | jenkins.gauges.raw Preprocessing: - JSONPATH: |
Zabbix raw items | Jenkins: Get meters | Raw items for meters metrics. |
DEPENDENT | jenkins.meters.raw Preprocessing: - JSONPATH: |
Zabbix raw items | Jenkins: Get timers | Raw items for timers metrics. |
DEPENDENT | jenkins.timers.raw Preprocessing: - JSONPATH: |
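For reference, a minimal sketch of what the "Get service metrics" raw item retrieves: the Jenkins Metrics plugin serves one JSON document containing the gauges, meters and timers that the dependent items above extract. The host URL and API key below are placeholders, not values from the template.

```python
# Minimal sketch of the bulk collection behind "Jenkins: Get service metrics",
# assuming the Jenkins Metrics plugin is installed and METRICS_KEY is an API
# key configured in Jenkins (both values here are hypothetical).
import requests

JENKINS_URL = "http://jenkins.example.com:8080"  # placeholder host
METRICS_KEY = "changeme"                         # placeholder API key

resp = requests.get(f"{JENKINS_URL}/metrics/{METRICS_KEY}/metrics", timeout=10)
resp.raise_for_status()
metrics = resp.json()

# The template splits this one payload into the "Get gauges/meters/timers"
# dependent items and then picks single values out of each sub-object.
print(metrics["gauges"]["jenkins.queue.size"]["value"])
print(metrics["timers"]["jenkins.job.total.duration"]["p95"])
```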
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Jenkins: Disk space is too low | Jenkins disk space monitors are reporting the disk space as less than the configured threshold. The message will reference the first node which fails this check. Health check message: {{ITEM.LASTVALUE2}.regsub("(.*)",\1)} |
last(/Jenkins by HTTP/jenkins.disk_space)=0 and length(last(/Jenkins by HTTP/jenkins.disk_space.message))>0 |
WARNING | |
Jenkins: One or more Jenkins plugins failed to start | A failure is typically indicative of a potential issue within the Jenkins installation that will either be solved by explicitly disabling the failing plugin(s) or by resolving the corresponding plugin dependency issues. Health check message: {{ITEM.LASTVALUE2}.regsub("(.*)",\1)} |
last(/Jenkins by HTTP/jenkins.plugins)=0 and length(last(/Jenkins by HTTP/jenkins.plugins.message))>0 |
INFO | Manual close: YES |
Jenkins: Temporary space is too low | Jenkins temporary space monitors are reporting the temporary space as less than the configured threshold. The message will reference the first node which fails this check. Health check message: {{ITEM.LASTVALUE2}.regsub("(.*)",\1)} |
last(/Jenkins by HTTP/jenkins.temporary_space)=0 and length(last(/Jenkins by HTTP/jenkins.temporary_space.message))>0 |
WARNING | |
Jenkins: There are deadlocked threads in Jenkins master JVM | There are deadlocked threads in the Jenkins master JVM. Health check message: {{ITEM.LASTVALUE2}.regsub("(.*)",\1)} |
last(/Jenkins by HTTP/jenkins.thread_deadlock)=0 and length(last(/Jenkins by HTTP/jenkins.thread_deadlock.message))>0 |
WARNING | |
Jenkins: Service has no online nodes | - |
last(/Jenkins by HTTP/jenkins.node.online)=0 |
AVERAGE | |
Jenkins: Version has changed | The Jenkins version has changed. Perform Ack to close. |
last(/Jenkins by HTTP/jenkins.version,#1)<>last(/Jenkins by HTTP/jenkins.version,#2) and length(last(/Jenkins by HTTP/jenkins.version))>0 |
INFO | Manual close: YES |
Jenkins: Host has been restarted | Uptime is less than 10 minutes. |
last(/Jenkins by HTTP/jenkins.system.uptime)<10m |
INFO | Manual close: YES |
Jenkins: Current number of used files is too high | - |
min(/Jenkins by HTTP/jenkins.descriptor.ratio,5m)>{$JENKINS.FILE_DESCRIPTORS.MAX.WARN} |
WARNING | |
Jenkins: Service is down | - |
last(/Jenkins by HTTP/jenkins.ping)=0 |
AVERAGE | Manual close: YES |
Jenkins job [{#NAME}]: Job is unhealthy | - |
last(/Jenkins by HTTP/jenkins.build.health[{#NAME}])<{$JENKINS.JOB.HEALTH.SCORE.MIN.WARN} |
WARNING | Manual close: YES |
Jenkins: Computer [{#DISPLAY_NAME}]: Node is down | Node down with reason: {{ITEM.LASTVALUE2}.regsub("(.*)",\1)} |
last(/Jenkins by HTTP/jenkins.computer.state[{#DISPLAY_NAME}])=1 and length(last(/Jenkins by HTTP/jenkins.computer.offline.reason[{#DISPLAY_NAME}]))>0 |
AVERAGE | Depends on: - Jenkins: Computer [{#DISPLAY_NAME}]: Node is temporarily offline - Jenkins: Service has no online nodes |
Jenkins: Computer [{#DISPLAY_NAME}]: Node is temporarily offline | Node is temporarily offline with reason: {{ITEM.LASTVALUE2}.regsub("(.*)",\1)} |
last(/Jenkins by HTTP/jenkins.computer.temp_offline[{#DISPLAY_NAME}])=1 and length(last(/Jenkins by HTTP/jenkins.computer.offline.reason[{#DISPLAY_NAME}]))>0 |
INFO | Manual close: YES |
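The computer-related items and the two node triggers above map directly onto fields of the Jenkins computer API that the "Get computer info" item polls. A hedged sketch of the same logic, with placeholder host and credentials:

```python
# Sketch of the node-state logic behind "Node is down" and "Node is
# temporarily offline", using the standard Jenkins computer API.
# Host and credentials are placeholders.
import requests

resp = requests.get(
    "http://jenkins.example.com:8080/computer/api/json",
    auth=("user", "api-token"),  # hypothetical credentials
    timeout=10,
)
resp.raise_for_status()

for node in resp.json()["computer"]:
    # temporarilyOffline maps to jenkins.computer.temp_offline,
    # offline to jenkins.computer.state, and offlineCauseReason to
    # jenkins.computer.offline.reason in the items above.
    if node["temporarilyOffline"]:
        print(f'{node["displayName"]}: temporarily offline: {node["offlineCauseReason"]}')
    elif node["offline"]:
        print(f'{node["displayName"]}: down: {node["offlineCauseReason"]}')
```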
Please report any issues with the template at https://support.zabbix.com.
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums.
For Zabbix version: 6.2 and higher
Refer to the vendor documentation.
No specific Zabbix configuration is required.
There are no template links in this template.
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
Services | IMAP service is running | - |
SIMPLE | net.tcp.service[imap] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
IMAP service is down on {HOST.NAME} | - |
max(/IMAP Service/net.tcp.service[imap],#3)=0 |
AVERAGE |
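The simple check net.tcp.service[imap] only verifies that the port accepts a TCP connection and that the service answers like an IMAP server. A rough Python equivalent, with a placeholder hostname:

```python
# Rough equivalent of net.tcp.service[imap]: connect to the IMAP port and
# look for the "* OK" greeting. The hostname is a placeholder.
import socket

def imap_service_up(host: str, port: int = 143, timeout: float = 5.0) -> bool:
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            greeting = sock.recv(128)  # e.g. b"* OK Dovecot ready."
            return greeting.startswith(b"* OK")
    except OSError:
        return False

print(imap_service_up("mail.example.com"))
```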
Please report any issues with the template at https://support.zabbix.com
For Zabbix version: 6.2 and higher
The template to monitor IIS (Internet Information Services) by Zabbix that works without any external scripts.
This template was tested on:
See Zabbix template operation for basic instructions.
You have to enable the following Windows Features (Control Panel > Programs and Features > Turn Windows features on or off) on your server
Web Server (IIS)
Web Server (IIS)\Management Tools\IIS Management Scripts and Tools
Optionally, it is possible to customize the template:
No specific Zabbix configuration is required.
Name | Description | Default |
---|---|---|
{$AGENT.TIMEOUT} | Timeout after which agent is considered unavailable. |
5m |
{$IIS.APPPOOL.MATCHES} | This macro is used in application pools discovery. Can be overridden on the host or linked template level. |
.+ |
{$IIS.APPPOOL.MONITORED} | Monitoring status for discovered application pools. Use context to avoid trigger firing for specific application pools. "1" - enabled, "0" - disabled. |
1 |
{$IIS.APPPOOL.NOT_MATCHES} | This macro is used in application pools discovery. Can be overridden on the host or linked template level. |
<CHANGE_IF_NEEDED> |
{$IIS.PORT} | Listening port. |
80 |
{$IIS.QUEUE.MAX.TIME} | The time during which the queue length may exceed the threshold. |
5m |
{$IIS.QUEUE.MAX.WARN} | Maximum application pool's request queue length for trigger expression. |
`` |
{$IIS.SERVICE} | The service (http/https/etc) for port check. See "net.tcp.service" documentation page for more information: https://www.zabbix.com/documentation/6.2/manual/config/items/itemtypes/simple_checks |
http |
There are no template links in this template.
Name | Description | Type | Key and additional info |
---|---|---|---|
Application pools discovery | - |
ZABBIX_ACTIVE | wmi.getall[root\webAdministration, select Name from ApplicationPool] Filter: AND - {#APPPOOL} NOT_MATCHES_REGEX - {#APPPOOL} MATCHES_REGEX |
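Conceptually, this discovery rule runs the WQL query above on the IIS host and then filters the returned pool names with {$IIS.APPPOOL.MATCHES} and {$IIS.APPPOOL.NOT_MATCHES}. A hedged sketch of the same flow, assuming a Windows host with the third-party wmi package (pip install wmi) available:

```python
# Hedged sketch of the application pools discovery: query the
# WebAdministration WMI namespace for pool names, then apply the
# MATCHES/NOT_MATCHES style filters. Requires Windows plus the
# third-party "wmi" package; all regexes mirror the macro defaults.
import re
import wmi

MATCHES = re.compile(r".+")                      # {$IIS.APPPOOL.MATCHES}
NOT_MATCHES = re.compile(r"<CHANGE_IF_NEEDED>")  # {$IIS.APPPOOL.NOT_MATCHES}

conn = wmi.WMI(namespace=r"root\webAdministration")
for pool in conn.ApplicationPool():              # same data as the WQL query
    name = pool.Name
    if MATCHES.search(name) and not NOT_MATCHES.search(name):
        print({"{#APPPOOL}": name})              # one LLD row per pool
```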
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
IIS | IIS: World Wide Web Publishing Service (W3SVC) state | The World Wide Web Publishing Service (W3SVC) provides web connectivity and administration of websites through the IIS snap-in. If the World Wide Web Publishing Service stops, the operating system cannot serve any form of web request. This service depends on the Windows Process Activation Service. |
ZABBIX_ACTIVE | service.info[W3SVC] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
IIS | IIS: Windows Process Activation Service (WAS) state | Windows Process Activation Service (WAS) is a tool for managing worker processes that contain applications that host Windows Communication Foundation (WCF) services. Worker processes handle requests that are sent to a Web Server for specific application pools. Each application pool sets boundaries for the applications it contains. |
ZABBIX_ACTIVE | service.info[WAS] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
IIS | IIS: {$IIS.PORT} port ping | - |
SIMPLE | net.tcp.service[{$IIS.SERVICE},,{$IIS.PORT}] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
IIS | IIS: Active agent availability | Availability of active checks on the host. The value of this item corresponds to availability icons in the host list. Possible value: 0 - unknown 1 - available 2 - not available |
INTERNAL | zabbix[host,active_agent,available] |
IIS | IIS: Uptime | Service uptime in seconds. |
ZABBIX_ACTIVE | perf_counter_en["\Web Service(_Total)\Service Uptime"] |
IIS | IIS: Bytes Received per second | The average rate per minute at which data bytes are received by the service at the Application Layer. Does not include protocol headers or control bytes. |
ZABBIX_ACTIVE | perf_counter_en["\Web Service(_Total)\Bytes Received/sec", 60] |
IIS | IIS: Bytes Sent per second | The average rate per minute at which data bytes are sent by the service. |
ZABBIX_ACTIVE | perf_counter_en["\Web Service(_Total)\Bytes Sent/sec", 60] |
IIS | IIS: Bytes Total per second | The average rate per minute of total bytes/sec transferred by the Web service (sum of bytes sent/sec and bytes received/sec). |
ZABBIX_ACTIVE | perf_counter_en["\Web Service(_Total)\Bytes Total/Sec", 60] |
IIS | IIS: Current connections | The number of active connections. |
ZABBIX_ACTIVE | perf_counter_en["\Web Service(_Total)\Current Connections"] |
IIS | IIS: Total connection attempts | The total number of connections to the Web or FTP service that have been attempted since service startup. The count is the total for all Web sites or FTP sites combined. |
ZABBIX_ACTIVE | perf_counter_en["\Web Service(_Total)\Total Connection Attempts (all instances)"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: 10m |
IIS | IIS: Connection attempts per second | The average rate per minute that connections using the Web service are being attempted. The count is the average for all Web sites combined. |
ZABBIX_ACTIVE | perf_counter_en["\Web Service(_Total)\Connection Attempts/Sec", 60] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: 10m |
IIS | IIS: Anonymous users per second | The number of requests from users over an anonymous connection per second. Average per minute. |
ZABBIX_ACTIVE | perf_counter_en["\Web Service(_Total)\Anonymous Users/sec", 60] |
IIS | IIS: NonAnonymous users per second | The number of requests from users over a non-anonymous connection per second. Average per minute. |
ZABBIX_ACTIVE | perf_counter_en["\Web Service(_Total)\NonAnonymous Users/sec", 60] |
IIS | IIS: Method GET requests per second | The rate of HTTP requests made using the GET method. GET requests are generally used for basic file retrievals or image maps, though they can be used with forms. Average per minute. |
ZABBIX_ACTIVE | perf_counter_en["\Web Service(_Total)\Get Requests/Sec", 60] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: 10m |
IIS | IIS: Method COPY requests per second | The rate of HTTP requests made using the COPY method. Copy requests are used for copying files and directories. Average per minute. |
ZABBIX_ACTIVE | perf_counter_en["\Web Service(_Total)\Copy Requests/Sec", 60] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: 10m |
IIS | IIS: Method CGI requests per second | The rate of CGI requests that are simultaneously being processed by the Web service. Average per minute. |
ZABBIX_ACTIVE | perf_counter_en["\Web Service(_Total)\CGI Requests/Sec", 60] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: 10m |
IIS | IIS: Method DELETE requests per second | The rate of HTTP requests made using the DELETE method. Average per minute. |
ZABBIX_ACTIVE | perf_counter_en["\Web Service(_Total)\Delete Requests/Sec", 60] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: 10m |
IIS | IIS: Method HEAD requests per second | The rate of HTTP requests made using the HEAD method. HEAD requests generally indicate a client is querying the state of a document they already have, to see if it needs to be refreshed. Average per minute. |
ZABBIX_ACTIVE | perf_counter_en["\Web Service(_Total)\Head Requests/Sec", 60] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: 10m |
IIS | IIS: Method ISAPI requests per second | The rate of ISAPI Extension requests that are simultaneously being processed by the Web service. Average per minute. |
ZABBIX_ACTIVE | perf_counter_en["\Web Service(_Total)\ISAPI Extension Requests/Sec", 60] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: 10m |
IIS | IIS: Method LOCK requests per second | The rate of HTTP requests made using the LOCK method. Lock requests are used to lock a file for one user so that only that user can modify the file. Average per minute. |
ZABBIX_ACTIVE | perf_counter_en["\Web Service(_Total)\Lock Requests/Sec", 60] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: 10m |
IIS | IIS: Method MKCOL requests per second | The rate of HTTP requests made using the MKCOL method. Mkcol requests are used to create directories on the server. Average per minute. |
ZABBIX_ACTIVE | perf_counter_en["\Web Service(_Total)\Mkcol Requests/Sec", 60] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: 10m |
IIS | IIS: Method MOVE requests per second | The rate of HTTP requests made using the MOVE method. Move requests are used for moving files and directories. Average per minute. |
ZABBIX_ACTIVE | perf_counter_en["\Web Service(_Total)\Move Requests/Sec", 60] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: 10m |
IIS | IIS: Method OPTIONS requests per second | The rate of HTTP requests made using the OPTIONS method. Average per minute. |
ZABBIX_ACTIVE | perf_counter_en["\Web Service(_Total)\Options Requests/Sec", 60] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: 10m |
IIS | IIS: Method POST requests per second | The rate of HTTP requests made using the POST method. Generally used for forms or gateway requests. Average per minute. |
ZABBIX_ACTIVE | perf_counter_en["\Web Service(_Total)\Post Requests/Sec", 60] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: 10m |
IIS | IIS: Method PROPFIND requests per second | The rate of HTTP requests made using the PROPFIND method. Propfind requests retrieve property values on files and directories. Average per minute. |
ZABBIX_ACTIVE | perf_counter_en["\Web Service(_Total)\Propfind Requests/Sec", 60] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: 10m |
IIS | IIS: Method PROPPATCH requests per second | The rate of HTTP requests made using the PROPPATCH method. Proppatch requests set property values on files and directories. Average per minute. |
ZABBIX_ACTIVE | perf_counter_en["\Web Service(_Total)\Proppatch Requests/Sec", 60] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: 10m |
IIS | IIS: Method PUT requests per second | The rate of HTTP requests made using the PUT method. Average per minute. |
ZABBIX_ACTIVE | perf_counter_en["\Web Service(_Total)\Put Requests/Sec", 60] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: 10m |
IIS | IIS: Method MS-SEARCH requests per second | The rate of HTTP requests made using the MS-SEARCH method. Search requests are used to query the server to find resources that match a set of conditions provided by the client. Average per minute. |
ZABBIX_ACTIVE | perf_counter_en["\Web Service(_Total)\Search Requests/Sec", 60] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: 10m |
IIS | IIS: Method TRACE requests per second | The rate of HTTP requests made using the TRACE method. Average per minute. |
ZABBIX_ACTIVE | perf_counter_en["\Web Service(_Total)\Trace Requests/Sec", 60] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: 10m |
IIS | IIS: Method UNLOCK requests per second | The rate of HTTP requests made using the UNLOCK method. Unlock requests are used to remove locks from files. Average per minute. |
ZABBIX_ACTIVE | perf_counter_en["\Web Service(_Total)\Unlock Requests/Sec", 60] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: 10m |
IIS | IIS: Method Total requests per second | The rate of all HTTP requests received. Average per minute. |
ZABBIX_ACTIVE | perf_counter_en["\Web Service(_Total)\Total Method Requests/Sec", 60] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: 10m |
IIS | IIS: Method Total Other requests per second | Total Other Request Methods is the number of HTTP requests that are not OPTIONS, GET, HEAD, POST, PUT, DELETE, TRACE, MOVE, COPY, MKCOL, PROPFIND, PROPPATCH, SEARCH, LOCK or UNLOCK methods (since service startup). Average per minute. |
ZABBIX_ACTIVE | perf_counter_en["\Web Service(_Total)\Other Request Methods/Sec", 60] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: 10m |
IIS | IIS: Locked errors per second | The rate of errors due to requests that couldn't be satisfied by the server because the requested document was locked. These are generally reported as an HTTP 423 error code to the client. Average per minute. |
ZABBIX_ACTIVE | perf_counter_en["\Web Service(_Total)\Locked Errors/Sec", 60] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: 10m |
IIS | IIS: Not Found errors per second | The rate of errors due to requests that couldn't be satisfied by the server because the requested document could not be found. These are generally reported to the client with HTTP error code 404. Average per minute. |
ZABBIX_ACTIVE | perf_counter_en["\Web Service(_Total)\Not Found Errors/Sec", 60] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: 10m |
IIS | IIS: Files cache hits percentage | The ratio of user-mode file cache hits to total cache requests (since service startup). Note: This value might be low if the Kernel URI cache hits percentage is high. |
ZABBIX_ACTIVE | perf_counter_en["\Web Service Cache\File Cache Hits %"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
IIS | IIS: URIs cache hits percentage | The ratio of user-mode URI Cache Hits to total cache requests (since service startup). |
ZABBIX_ACTIVE | perf_counter_en["\Web Service Cache\URI Cache Hits %"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
IIS | IIS: File cache misses | The total number of unsuccessful lookups in the user-mode file cache since service startup. |
ZABBIX_ACTIVE | perf_counter_en["\Web Service Cache\File Cache Misses"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
IIS | IIS: URI cache misses | The total number of unsuccessful lookups in the user-mode URI cache since service startup. |
ZABBIX_ACTIVE | perf_counter_en["\Web Service Cache\URI Cache Misses"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
IIS | IIS: {#APPPOOL} Uptime | The web application uptime period since the last restart. |
ZABBIX_ACTIVE | perf_counter_en["\APP_POOL_WAS({#APPPOOL})\Current Application Pool Uptime"] |
IIS | IIS: AppPool {#APPPOOL} state | The state of the application pool. |
ZABBIX_ACTIVE | perf_counter_en["\APP_POOL_WAS({#APPPOOL})\Current Application Pool State"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
IIS | IIS: AppPool {#APPPOOL} recycles | The number of times the application pool has been recycled since Windows Process Activation Service (WAS) started. |
ZABBIX_ACTIVE | perf_counter_en["\APP_POOL_WAS({#APPPOOL})\Total Application Pool Recycles"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
IIS | IIS: AppPool {#APPPOOL} current queue size | The number of requests in the queue. |
ZABBIX_ACTIVE | perf_counter_en["\HTTP Service Request Queues({#APPPOOL})\CurrentQueueSize"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
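Many of the perf_counter_en keys above pass 60 as the second parameter, which tells the agent to report a value averaged over the last 60 seconds of samples rather than an instantaneous reading. A conceptual sketch of that averaging (not the agent's actual implementation):

```python
# Conceptual model of perf_counter_en["...", 60]: the agent keeps sampling
# the counter every second and reports the mean of the last 60 samples.
from collections import deque

class AveragedCounter:
    def __init__(self, window_seconds: int = 60):
        self.samples = deque(maxlen=window_seconds)  # rolling window

    def add_sample(self, value: float) -> None:
        self.samples.append(value)

    def value(self) -> float:
        if not self.samples:
            return 0.0
        return sum(self.samples) / len(self.samples)

counter = AveragedCounter(60)
for raw in (120.0, 130.0, 125.0):  # e.g. "\Web Service(_Total)\Get Requests/Sec"
    counter.add_sample(raw)
print(counter.value())             # 125.0 -- the smoothed per-minute average
```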
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
IIS: The World Wide Web Publishing Service (W3SVC) is not running | The World Wide Web Publishing Service (W3SVC) is not in the running state. IIS cannot start. |
last(/IIS by Zabbix agent active/service.info[W3SVC])<>0 |
HIGH | Depends on: - IIS: Windows Process Activation Service (WAS) is not running |
IIS: Windows Process Activation Service (WAS) is not running | Windows Process Activation Service (WAS) is not in the running state. IIS cannot start. |
last(/IIS by Zabbix agent active/service.info[WAS])<>0 |
HIGH | |
IIS: Port {$IIS.PORT} is down | - |
last(/IIS by Zabbix agent active/net.tcp.service[{$IIS.SERVICE},,{$IIS.PORT}])=0 |
AVERAGE | Manual close: YES Depends on: - IIS: The World Wide Web Publishing Service (W3SVC) is not running |
IIS: Zabbix agent: active checks are not available | Active checks are considered unavailable. The agent has not sent a heartbeat for a prolonged time. |
min(/IIS by Zabbix agent active/zabbix[host,active_agent,available],{$AGENT.TIMEOUT})=2 |
HIGH | |
IIS: has been restarted | Uptime is less than 10 minutes. |
last(/IIS by Zabbix agent active/perf_counter_en["\Web Service(_Total)\Service Uptime"])<10m |
INFO | Manual close: YES |
IIS: {#APPPOOL} has been restarted | Uptime is less than 10 minutes. |
last(/IIS by Zabbix agent active/perf_counter_en["\APP_POOL_WAS({#APPPOOL})\Current Application Pool Uptime"])<10m |
INFO | Manual close: YES |
IIS: Application pool {#APPPOOL} is not in Running state | - |
last(/IIS by Zabbix agent active/perf_counter_en["\APP_POOL_WAS({#APPPOOL})\Current Application Pool State"])<>3 and {$IIS.APPPOOL.MONITORED:"{#APPPOOL}"}=1 |
HIGH | Depends on: - IIS: The World Wide Web Publishing Service (W3SVC) is not running |
IIS: Application pool {#APPPOOL} has been recycled | - |
last(/IIS by Zabbix agent active/perf_counter_en["\APP_POOL_WAS({#APPPOOL})\Total Application Pool Recycles"],#1)<>last(/IIS by Zabbix agent active/perf_counter_en["\APP_POOL_WAS({#APPPOOL})\Total Application Pool Recycles"],#2) and {$IIS.APPPOOL.MONITORED:"{#APPPOOL}"}=1 |
INFO | |
IIS: Request queue of {#APPPOOL} is too large | - |
min(/IIS by Zabbix agent active/perf_counter_en["\HTTP Service Request Queues({#APPPOOL})\CurrentQueueSize"],{$IIS.QUEUE.MAX.TIME})>{$IIS.QUEUE.MAX.WARN} |
WARNING | Depends on: - IIS: Application pool {#APPPOOL} is not in Running state |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template or ask for help with it at ZABBIX forums.
For Zabbix version: 6.2 and higher
The template to monitor IIS (Internet Information Services) by Zabbix that works without any external scripts.
This template was tested on:
See Zabbix template operation for basic instructions.
You have to enable the following Windows Features (Control Panel > Programs and Features > Turn Windows features on or off) on your server
Web Server (IIS)
Web Server (IIS)\Management Tools\IIS Management Scripts and Tools
Optionally, it is possible to customize the template:
No specific Zabbix configuration is required.
Name | Description | Default |
---|---|---|
{$IIS.APPPOOL.MATCHES} | This macro is used in application pools discovery. Can be overridden on the host or linked template level. |
.+ |
{$IIS.APPPOOL.MONITORED} | Monitoring status for discovered application pools. Use context to avoid trigger firing for specific application pools. "1" - enabled, "0" - disabled. |
1 |
{$IIS.APPPOOL.NOT_MATCHES} | This macro is used in application pools discovery. Can be overridden on the host or linked template level. |
<CHANGE_IF_NEEDED> |
{$IIS.PORT} | Listening port. |
80 |
{$IIS.QUEUE.MAX.TIME} | The time during which the queue length may exceed the threshold. |
5m |
{$IIS.QUEUE.MAX.WARN} | Maximum application pool's request queue length for trigger expression. |
`` |
{$IIS.SERVICE} | The service (http/https/etc) for port check. See "net.tcp.service" documentation page for more information: https://www.zabbix.com/documentation/6.2/manual/config/items/itemtypes/simple_checks |
http |
There are no template links in this template.
Name | Description | Type | Key and additional info |
---|---|---|---|
Application pools discovery | - |
ZABBIX_PASSIVE | wmi.getall[root\webAdministration, select Name from ApplicationPool] Filter: AND - {#APPPOOL} NOT_MATCHES_REGEX - {#APPPOOL} MATCHES_REGEX |
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
IIS | IIS: World Wide Web Publishing Service (W3SVC) state | The World Wide Web Publishing Service (W3SVC) provides web connectivity and administration of websites through the IIS snap-in. If the World Wide Web Publishing Service stops, the operating system cannot serve any form of web request. This service depends on the Windows Process Activation Service. |
ZABBIX_PASSIVE | service.info[W3SVC] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
IIS | IIS: Windows Process Activation Service (WAS) state | Windows Process Activation Service (WAS) is a tool for managing worker processes that contain applications that host Windows Communication Foundation (WCF) services. Worker processes handle requests that are sent to a Web Server for specific application pools. Each application pool sets boundaries for the applications it contains. |
ZABBIX_PASSIVE | service.info[WAS] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
IIS | IIS: {$IIS.PORT} port ping | - |
SIMPLE | net.tcp.service[{$IIS.SERVICE},,{$IIS.PORT}] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
IIS | IIS: Uptime | Service uptime in seconds. |
ZABBIX_PASSIVE | perf_counter_en["\Web Service(_Total)\Service Uptime"] |
IIS | IIS: Bytes Received per second | The average rate per minute at which data bytes are received by the service at the Application Layer. Does not include protocol headers or control bytes. |
ZABBIX_PASSIVE | perf_counter_en["\Web Service(_Total)\Bytes Received/sec", 60] |
IIS | IIS: Bytes Sent per second | The average rate per minute at which data bytes are sent by the service. |
ZABBIX_PASSIVE | perf_counter_en["\Web Service(_Total)\Bytes Sent/sec", 60] |
IIS | IIS: Bytes Total per second | The average rate per minute of total bytes/sec transferred by the Web service (sum of bytes sent/sec and bytes received/sec). |
ZABBIX_PASSIVE | perf_counter_en["\Web Service(_Total)\Bytes Total/Sec", 60] |
IIS | IIS: Current connections | The number of active connections. |
ZABBIX_PASSIVE | perf_counter_en["\Web Service(_Total)\Current Connections"] |
IIS | IIS: Total connection attempts | The total number of connections to the Web or FTP service that have been attempted since service startup. The count is the total for all Web sites or FTP sites combined. |
ZABBIX_PASSIVE | perf_counter_en["\Web Service(_Total)\Total Connection Attempts (all instances)"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: 10m |
IIS | IIS: Connection attempts per second | The average rate per minute that connections using the Web service are being attempted. The count is the average for all Web sites combined. |
ZABBIX_PASSIVE | perf_counter_en["\Web Service(_Total)\Connection Attempts/Sec", 60] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: 10m |
IIS | IIS: Anonymous users per second | The number of requests from users over an anonymous connection per second. Average per minute. |
ZABBIX_PASSIVE | perf_counter_en["\Web Service(_Total)\Anonymous Users/sec", 60] |
IIS | IIS: NonAnonymous users per second | The number of requests from users over a non-anonymous connection per second. Average per minute. |
ZABBIX_PASSIVE | perf_counter_en["\Web Service(_Total)\NonAnonymous Users/sec", 60] |
IIS | IIS: Method GET requests per second | The rate of HTTP requests made using the GET method. GET requests are generally used for basic file retrievals or image maps, though they can be used with forms. Average per minute. |
ZABBIX_PASSIVE | perf_counter_en["\Web Service(_Total)\Get Requests/Sec", 60] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: 10m |
IIS | IIS: Method COPY requests per second | The rate of HTTP requests made using the COPY method. Copy requests are used for copying files and directories. Average per minute. |
ZABBIX_PASSIVE | perf_counter_en["\Web Service(_Total)\Copy Requests/Sec", 60] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: 10m |
IIS | IIS: Method CGI requests per second | The rate of CGI requests that are simultaneously being processed by the Web service. Average per minute. |
ZABBIX_PASSIVE | perf_counter_en["\Web Service(_Total)\CGI Requests/Sec", 60] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: 10m |
IIS | IIS: Method DELETE requests per second | The rate of HTTP requests made using the DELETE method. Average per minute. |
ZABBIX_PASSIVE | perf_counter_en["\Web Service(_Total)\Delete Requests/Sec", 60] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: 10m |
IIS | IIS: Method HEAD requests per second | The rate of HTTP requests made using the HEAD method. HEAD requests generally indicate a client is querying the state of a document they already have, to see if it needs to be refreshed. Average per minute. |
ZABBIX_PASSIVE | perf_counter_en["\Web Service(_Total)\Head Requests/Sec", 60] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: 10m |
IIS | IIS: Method ISAPI requests per second | The rate of ISAPI Extension requests that are simultaneously being processed by the Web service. Average per minute. |
ZABBIX_PASSIVE | perf_counter_en["\Web Service(_Total)\ISAPI Extension Requests/Sec", 60] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: 10m |
IIS | IIS: Method LOCK requests per second | The rate of HTTP requests made using the LOCK method. Lock requests are used to lock a file for one user so that only that user can modify the file. Average per minute. |
ZABBIX_PASSIVE | perf_counter_en["\Web Service(_Total)\Lock Requests/Sec", 60] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: 10m |
IIS | IIS: Method MKCOL requests per second | The rate of HTTP requests made using the MKCOL method. Mkcol requests are used to create directories on the server. Average per minute. |
ZABBIX_PASSIVE | perf_counter_en["\Web Service(_Total)\Mkcol Requests/Sec", 60] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: 10m |
IIS | IIS: Method MOVE requests per second | The rate of HTTP requests made using the MOVE method. Move requests are used for moving files and directories. Average per minute. |
ZABBIX_PASSIVE | perf_counter_en["\Web Service(_Total)\Move Requests/Sec", 60] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: 10m |
IIS | IIS: Method OPTIONS requests per second | The rate of HTTP requests made using the OPTIONS method. Average per minute. |
ZABBIX_PASSIVE | perf_counter_en["\Web Service(_Total)\Options Requests/Sec", 60] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: 10m |
IIS | IIS: Method POST requests per second | The rate of HTTP requests made using the POST method. Generally used for forms or gateway requests. Average per minute. |
ZABBIX_PASSIVE | perf_counter_en["\Web Service(_Total)\Post Requests/Sec", 60] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: 10m |
IIS | IIS: Method PROPFIND requests per second | The rate of HTTP requests made using the PROPFIND method. Propfind requests retrieve property values on files and directories. Average per minute. |
ZABBIX_PASSIVE | perf_counter_en["\Web Service(_Total)\Propfind Requests/Sec", 60] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: 10m |
IIS | IIS: Method PROPPATCH requests per second | The rate of HTTP requests made using the PROPPATCH method. Proppatch requests set property values on files and directories. Average per minute. |
ZABBIX_PASSIVE | perf_counter_en["\Web Service(_Total)\Proppatch Requests/Sec", 60] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: 10m |
IIS | IIS: Method PUT requests per second | The rate of HTTP requests made using the PUT method. Average per minute. |
ZABBIX_PASSIVE | perf_counter_en["\Web Service(_Total)\Put Requests/Sec", 60] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: 10m |
IIS | IIS: Method MS-SEARCH requests per second | The rate of HTTP requests made using the MS-SEARCH method. Search requests are used to query the server to find resources that match a set of conditions provided by the client. Average per minute. |
ZABBIX_PASSIVE | perf_counter_en["\Web Service(_Total)\Search Requests/Sec", 60] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: 10m |
IIS | IIS: Method TRACE requests per second | The rate of HTTP requests made using the TRACE method. Average per minute. |
ZABBIX_PASSIVE | perf_counter_en["\Web Service(_Total)\Trace Requests/Sec", 60] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: 10m |
IIS | IIS: Method UNLOCK requests per second | The rate of HTTP requests made using the UNLOCK method. Unlock requests are used to remove locks from files. Average per minute. |
ZABBIX_PASSIVE | perf_counter_en["\Web Service(_Total)\Unlock Requests/Sec", 60] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: 10m |
IIS | IIS: Method Total requests per second | The rate of all HTTP requests received. Average per minute. |
ZABBIX_PASSIVE | perf_counter_en["\Web Service(_Total)\Total Method Requests/Sec", 60] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: 10m |
IIS | IIS: Method Total Other requests per second | Total Other Request Methods is the number of HTTP requests that are not OPTIONS, GET, HEAD, POST, PUT, DELETE, TRACE, MOVE, COPY, MKCOL, PROPFIND, PROPPATCH, SEARCH, LOCK or UNLOCK methods (since service startup). Average per minute. |
ZABBIX_PASSIVE | perf_counter_en["\Web Service(_Total)\Other Request Methods/Sec", 60] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: 10m |
IIS | IIS: Locked errors per second | The rate of errors due to requests that couldn't be satisfied by the server because the requested document was locked. These are generally reported as an HTTP 423 error code to the client. Average per minute. |
ZABBIX_PASSIVE | perf_counter_en["\Web Service(_Total)\Locked Errors/Sec", 60] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: 10m |
IIS | IIS: Not Found errors per second | The rate of errors due to requests that couldn't be satisfied by the server because the requested document could not be found. These are generally reported to the client with HTTP error code 404. Average per minute. |
ZABBIX_PASSIVE | perf_counter_en["\Web Service(_Total)\Not Found Errors/Sec", 60] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: 10m |
IIS | IIS: Files cache hits percentage | The ratio of user-mode file cache hits to total cache requests (since service startup). Note: This value might be low if the Kernel URI cache hits percentage is high. |
ZABBIX_PASSIVE | perf_counter_en["\Web Service Cache\File Cache Hits %"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
IIS | IIS: URIs cache hits percentage | The ratio of user-mode URI Cache Hits to total cache requests (since service startup). |
ZABBIX_PASSIVE | perf_counter_en["\Web Service Cache\URI Cache Hits %"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
IIS | IIS: File cache misses | The total number of unsuccessful lookups in the user-mode file cache since service startup. |
ZABBIX_PASSIVE | perf_counter_en["\Web Service Cache\File Cache Misses"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
IIS | IIS: URI cache misses | The total number of unsuccessful lookups in the user-mode URI cache since service startup. |
ZABBIX_PASSIVE | perf_counter_en["\Web Service Cache\URI Cache Misses"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
IIS | IIS: {#APPPOOL} Uptime | The web application uptime period since the last restart. |
ZABBIX_PASSIVE | perf_counter_en["\APP_POOL_WAS({#APPPOOL})\Current Application Pool Uptime"] |
IIS | IIS: AppPool {#APPPOOL} state | The state of the application pool. |
ZABBIX_PASSIVE | perf_counter_en["\APP_POOL_WAS({#APPPOOL})\Current Application Pool State"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
IIS | IIS: AppPool {#APPPOOL} recycles | The number of times the application pool has been recycled since Windows Process Activation Service (WAS) started. |
ZABBIX_PASSIVE | perf_counter_en["\APP_POOL_WAS({#APPPOOL})\Total Application Pool Recycles"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
IIS | IIS: AppPool {#APPPOOL} current queue size | The number of requests in the queue. |
ZABBIX_PASSIVE | perf_counter_en["\HTTP Service Request Queues({#APPPOOL})\CurrentQueueSize"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
IIS: The World Wide Web Publishing Service (W3SVC) is not running | The World Wide Web Publishing Service (W3SVC) is not in the running state. IIS cannot start. |
last(/IIS by Zabbix agent/service.info[W3SVC])<>0 |
HIGH | Depends on: - IIS: Windows Process Activation Service (WAS) is not running |
IIS: Windows Process Activation Service (WAS) is not running | Windows Process Activation Service (WAS) is not in the running state. IIS cannot start. |
last(/IIS by Zabbix agent/service.info[WAS])<>0 |
HIGH | |
IIS: Port {$IIS.PORT} is down | - |
last(/IIS by Zabbix agent/net.tcp.service[{$IIS.SERVICE},,{$IIS.PORT}])=0 |
AVERAGE | Manual close: YES Depends on: - IIS: The World Wide Web Publishing Service (W3SVC) is not running |
IIS: has been restarted | Uptime is less than 10 minutes. |
last(/IIS by Zabbix agent/perf_counter_en["\Web Service(_Total)\Service Uptime"])<10m |
INFO | Manual close: YES |
IIS: {#APPPOOL} has been restarted | Uptime is less than 10 minutes. |
last(/IIS by Zabbix agent/perf_counter_en["\APP_POOL_WAS({#APPPOOL})\Current Application Pool Uptime"])<10m |
INFO | Manual close: YES |
IIS: Application pool {#APPPOOL} is not in Running state | - |
last(/IIS by Zabbix agent/perf_counter_en["\APP_POOL_WAS({#APPPOOL})\Current Application Pool State"])<>3 and {$IIS.APPPOOL.MONITORED:"{#APPPOOL}"}=1 |
HIGH | Depends on: - IIS: The World Wide Web Publishing Service (W3SVC) is not running |
IIS: Application pool {#APPPOOL} has been recycled | - |
last(/IIS by Zabbix agent/perf_counter_en["\APP_POOL_WAS({#APPPOOL})\Total Application Pool Recycles"],#1)<>last(/IIS by Zabbix agent/perf_counter_en["\APP_POOL_WAS({#APPPOOL})\Total Application Pool Recycles"],#2) and {$IIS.APPPOOL.MONITORED:"{#APPPOOL}"}=1 |
INFO | |
IIS: Request queue of {#APPPOOL} is too large | - |
min(/IIS by Zabbix agent/perf_counter_en["\HTTP Service Request Queues({#APPPOOL})\CurrentQueueSize"],{$IIS.QUEUE.MAX.TIME})>{$IIS.QUEUE.MAX.WARN} |
WARNING | Depends on: - IIS: Application pool {#APPPOOL} is not in Running state |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template or ask for help with it at ZABBIX forums.
For Zabbix version: 6.2 and higher
Refer to the vendor documentation.
No specific Zabbix configuration is required.
There are no template links in this template.
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
Services | HTTPS service is running | - |
SIMPLE | net.tcp.service[https] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
HTTPS service is down on {HOST.NAME} | - |
max(/HTTPS Service/net.tcp.service[https],#3)=0 |
AVERAGE |
Please report any issues with the template at https://support.zabbix.com
For Zabbix version: 6.2 and higher
Refer to the vendor documentation.
No specific Zabbix configuration is required.
There are no template links in this template.
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
Services | HTTP service is running | - |
SIMPLE | net.tcp.service[http] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
HTTP service is down on {HOST.NAME} | - |
max(/HTTP Service/net.tcp.service[http],#3)=0 |
AVERAGE |
Please report any issues with the template at https://support.zabbix.com
For Zabbix version: 6.2 and higher
The template to monitor HAProxy by Zabbix that works without any external scripts.
Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
Template HAProxy by HTTP collects metrics by polling the HAProxy Stats Page with the HTTP agent remotely.
Note that this solution supports https and redirects.
This template was tested on:
See Zabbix template operation for basic instructions.
Set up the HAProxy Stats Page.
Example configuration of HAProxy:
frontend stats
bind *:8404
stats enable
stats uri /stats
stats refresh 10s
#stats auth Username:Password # Authentication credentials
If you use another location, don't forget to change the macros {$HAPROXY.STATS.SCHEME}, {HOST.CONN}, {$HAPROXY.STATS.PORT}, {$HAPROXY.STATS.PATH}.
If you want to use authentication, set the username and password in the "stats auth" option of the configuration file and in the macros {$HAPROXY.USERNAME},{$HAPROXY.PASSWORD}.
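For reference, the stats page also exports the same data as CSV (append ";csv" to the stats URI), which is an easy way to verify what the template will collect. A hedged sketch with placeholder host and credentials; the scur/slim fields are the ones behind the frontend session utilization item below:

```python
# Hedged sketch: fetch the HAProxy stats CSV export and compute the frontend
# session utilization (scur / slim * 100) the way the calculated item does.
# URL and credentials are placeholders.
import csv
import io
import requests

resp = requests.get(
    "http://haproxy.example.com:8404/stats;csv",
    auth=("Username", "Password"),  # only if "stats auth" is enabled
    timeout=10,
)
resp.raise_for_status()

# The header line starts with "# pxname,svname,..."; strip the comment marker.
reader = csv.DictReader(io.StringIO(resp.text.lstrip("# ")))
for row in reader:
    if row["svname"] == "FRONTEND" and row["slim"]:
        sutil = int(row["scur"]) / int(row["slim"]) * 100
        print(f'{row["pxname"]}: session utilization {sutil:.1f}%')
```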
No specific Zabbix configuration is required.
Name | Description | Default |
---|---|---|
{$HAPROXY.BACK_ERESP.MAX.WARN} | Maximum of responses with error on Backend for trigger expression. |
10 |
{$HAPROXY.BACK_QCUR.MAX.WARN} | Maximum number of requests on Backend unassigned in queue for trigger expression. |
10 |
{$HAPROXY.BACK_QTIME.MAX.WARN} | Maximum of average time spent in queue on Backend for trigger expression. |
10s |
{$HAPROXY.BACK_RTIME.MAX.WARN} | Maximum of average Backend response time for trigger expression. |
10s |
{$HAPROXY.FRONT_DREQ.MAX.WARN} | The HAProxy maximum denied requests for trigger expression. |
10 |
{$HAPROXY.FRONT_EREQ.MAX.WARN} | The HAProxy maximum number of request errors for trigger expression. |
10 |
{$HAPROXY.FRONT_SUTIL.MAX.WARN} | Maximum of session usage percentage on frontend for trigger expression. |
80 |
{$HAPROXY.PASSWORD} | The password of the HAProxy stats page. |
`` |
{$HAPROXY.RESPONSE_TIME.MAX.WARN} | The HAProxy stats page maximum response time in seconds for trigger expression. |
10s |
{$HAPROXY.SERVER_ERESP.MAX.WARN} | Maximum of responses with error on server for trigger expression. |
10 |
{$HAPROXY.SERVER_QCUR.MAX.WARN} | Maximum number of requests on server unassigned in queue for trigger expression. |
10 |
{$HAPROXY.SERVER_QTIME.MAX.WARN} | Maximum of average time spent in queue on server for trigger expression. |
10s |
{$HAPROXY.SERVER_RTIME.MAX.WARN} | Maximum of average server response time for trigger expression. |
10s |
{$HAPROXY.STATS.PATH} | The path of the HAProxy stats page. |
stats |
{$HAPROXY.STATS.PORT} | The port of the HAProxy stats host or container. |
8404 |
{$HAPROXY.STATS.SCHEME} | The scheme of the HAProxy stats page (http/https). |
http |
{$HAPROXY.USERNAME} | The username of the HAProxy stats page. |
`` |
There are no template links in this template.
Name | Description | Type | Key and additional info | ||
---|---|---|---|---|---|
Backend discovery | Discovery backends |
DEPENDENT | haproxy.backend.discovery Filter: AND - {#SVNAME} MATCHES_REGEX - {#MODE} MATCHES_REGEX http|tcp Overrides: Discard HTTP status codes - {#MODE} MATCHES_REGEX tcp - ITEM_PROTOTYPE LIKE Number of responses with codes - NO_DISCOVER |
Frontend discovery | Discovery frontends |
DEPENDENT | haproxy.frontend.discovery Filter: AND - {#SVNAME} MATCHES_REGEX - {#MODE} MATCHES_REGEX http|tcp Overrides: Discard HTTP status codes - {#MODE} MATCHES_REGEX tcp - ITEM_PROTOTYPE LIKE Number of responses with codes - NO_DISCOVER |
Server discovery | Discovery servers |
DEPENDENT | haproxy.server.discovery Filter: AND - {#SVNAME} NOT_MATCHES_REGEX FRONTEND|BACKEND - {#MODE} MATCHES_REGEX http|tcp Overrides: Discard HTTP status codes - {#MODE} MATCHES_REGEX tcp - ITEM_PROTOTYPE LIKE Number of responses with codes - NO_DISCOVER |
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
HAProxy | HAProxy: Version | - |
DEPENDENT | haproxy.version Preprocessing: - REGEX: ⛔️ON_FAIL: - DISCARD_UNCHANGED_HEARTBEAT: |
HAProxy | HAProxy: Uptime | - |
DEPENDENT | haproxy.uptime Preprocessing: - JAVASCRIPT: |
HAProxy | HAProxy: Service status | - |
SIMPLE | net.tcp.service["{$HAPROXY.STATS.SCHEME}","{HOST.CONN}","{$HAPROXY.STATS.PORT}"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
HAProxy | HAProxy: Service response time | - |
SIMPLE | net.tcp.service.perf["{$HAPROXY.STATS.SCHEME}","{HOST.CONN}","{$HAPROXY.STATS.PORT}"] |
HAProxy | HAProxy Backend {#PXNAME}: Status | Possible values: UP - The server is reporting as healthy. DOWN - The server is reporting as unhealthy and unable to receive requests. NOLB - You've added http-check disable-on-404 to the backend and the health checked URL has returned an HTTP 404 response. MAINT - The server has been disabled or put into maintenance mode. DRAIN - The server has been put into drain mode. no check - Health checks are not enabled for this server. |
DEPENDENT | haproxy.backend.status[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - DISCARDUNCHANGEDHEARTBEAT: |
HAProxy | HAProxy Backend {#PXNAME}: Responses time | Average backend response time (in ms) for the last 1,024 requests |
DEPENDENT | haproxy.backend.rtime[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - MULTIPLIER: |
HAProxy | HAProxy Backend {#PXNAME}: Errors connection per second | Number of requests that encountered an error attempting to connect to a backend server. |
DEPENDENT | haproxy.backend.econ.rate[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - CHANGEPERSECOND |
HAProxy | HAProxy Backend {#PXNAME}: Responses denied per second | Responses denied due to security concerns (ACL-restricted). |
DEPENDENT | haproxy.backend.dresp.rate[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - CHANGEPERSECOND |
HAProxy | HAProxy Backend {#PXNAME}: Response errors per second | Number of requests whose responses yielded an error |
DEPENDENT | haproxy.backend.eresp.rate[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - CHANGEPERSECOND |
HAProxy | HAProxy Backend {#PXNAME}: Unassigned requests | Current number of requests unassigned in queue. |
DEPENDENT | haproxy.backend.qcur[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: |
HAProxy | HAProxy Backend {#PXNAME}: Time in queue | Average time spent in queue (in ms) for the last 1,024 requests |
DEPENDENT | haproxy.backend.qtime[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - MULTIPLIER: |
HAProxy | HAProxy Backend {#PXNAME}: Redispatched requests per second | Number of times a request was redispatched to a different backend. |
DEPENDENT | haproxy.backend.wredis.rate[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
HAProxy | HAProxy Backend {#PXNAME}: Retried connections per second | Number of times a connection was retried. |
DEPENDENT | haproxy.backend.wretr.rate[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
HAProxy | HAProxy Backend {#PXNAME}: Number of responses with codes 1xx per second | Number of informational HTTP responses per second. |
DEPENDENT | haproxy.backend.hrsp_1xx.rate[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
HAProxy | HAProxy Backend {#PXNAME}: Number of responses with codes 2xx per second | Number of successful HTTP responses per second. |
DEPENDENT | haproxy.backend.hrsp_2xx.rate[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
HAProxy | HAProxy Backend {#PXNAME}: Number of responses with codes 3xx per second | Number of HTTP redirections per second. |
DEPENDENT | haproxy.backend.hrsp_3xx.rate[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
HAProxy | HAProxy Backend {#PXNAME}: Number of responses with codes 4xx per second | Number of HTTP client errors per second. |
DEPENDENT | haproxy.backend.hrsp_4xx.rate[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
HAProxy | HAProxy Backend {#PXNAME}: Number of responses with codes 5xx per second | Number of HTTP server errors per second. |
DEPENDENT | haproxy.backend.hrsp_5xx.rate[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
HAProxy | HAProxy Backend {#PXNAME}: Incoming traffic | Number of bits received by the backend |
DEPENDENT | haproxy.backend.bin.rate[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - MULTIPLIER: - CHANGE_PER_SECOND |
HAProxy | HAProxy Backend {#PXNAME}: Outgoing traffic | Number of bits sent by the backend |
DEPENDENT | haproxy.backend.bout.rate[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - MULTIPLIER: - CHANGE_PER_SECOND |
HAProxy | HAProxy Backend {#PXNAME}: Number of active servers | Number of active servers. |
DEPENDENT | haproxy.backend.act[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
HAProxy | HAProxy Backend {#PXNAME}: Number of backup servers | Number of backup servers. |
DEPENDENT | haproxy.backend.bck[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
HAProxy | HAProxy Backend {#PXNAME}: Sessions per second | Cumulative number of sessions (end-to-end connections) per second. |
DEPENDENT | haproxy.backend.stot.rate[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
HAProxy | HAProxy Backend {#PXNAME}: Weight | Total effective weight. |
DEPENDENT | haproxy.backend.weight[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
HAProxy | HAProxy Frontend {#PXNAME}: Status | Possible values: OPEN, STOP. When Status is OPEN, the frontend is operating normally and ready to receive traffic. |
DEPENDENT | haproxy.frontend.status[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
HAProxy | HAProxy Frontend {#PXNAME}: Requests rate | HTTP requests per second |
DEPENDENT | haproxy.frontend.req_rate[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: |
HAProxy | HAProxy Frontend {#PXNAME}: Sessions rate | Number of sessions created per second |
DEPENDENT | haproxy.frontend.rate[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: |
HAProxy | HAProxy Frontend {#PXNAME}: Established sessions | The current number of established sessions. |
DEPENDENT | haproxy.frontend.scur[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: |
HAProxy | HAProxy Frontend {#PXNAME}: Session limits | The most simultaneous sessions that are allowed, as defined by the maxconn setting in the frontend. |
DEPENDENT | haproxy.frontend.slim[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
HAProxy | HAProxy Frontend {#PXNAME}: Session utilization | Percentage of sessions used (scur / slim * 100). |
CALCULATED | haproxy.frontend.sutil[{#PXNAME},{#SVNAME}] Expression: last(//haproxy.frontend.scur[{#PXNAME},{#SVNAME}]) / last(//haproxy.frontend.slim[{#PXNAME},{#SVNAME}]) * 100 |
HAProxy | HAProxy Frontend {#PXNAME}: Request errors per second | Number of request errors per second. |
DEPENDENT | haproxy.frontend.ereq.rate[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
HAProxy | HAProxy Frontend {#PXNAME}: Denied requests per second | Requests denied due to security concerns (ACL-restricted) per second. |
DEPENDENT | haproxy.frontend.dreq.rate[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
HAProxy | HAProxy Frontend {#PXNAME}: Number of responses with codes 1xx per second | Number of informational HTTP responses per second. |
DEPENDENT | haproxy.frontend.hrsp_1xx.rate[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
HAProxy | HAProxy Frontend {#PXNAME}: Number of responses with codes 2xx per second | Number of successful HTTP responses per second. |
DEPENDENT | haproxy.frontend.hrsp_2xx.rate[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
HAProxy | HAProxy Frontend {#PXNAME}: Number of responses with codes 3xx per second | Number of HTTP redirections per second. |
DEPENDENT | haproxy.frontend.hrsp_3xx.rate[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
HAProxy | HAProxy Frontend {#PXNAME}: Number of responses with codes 4xx per second | Number of HTTP client errors per second. |
DEPENDENT | haproxy.frontend.hrsp_4xx.rate[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
HAProxy | HAProxy Frontend {#PXNAME}: Number of responses with codes 5xx per second | Number of HTTP server errors per second. |
DEPENDENT | haproxy.frontend.hrsp_5xx.rate[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
HAProxy | HAProxy Frontend {#PXNAME}: Incoming traffic | Number of bits received by the frontend |
DEPENDENT | haproxy.frontend.bin.rate[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - MULTIPLIER: - CHANGE_PER_SECOND |
HAProxy | HAProxy Frontend {#PXNAME}: Outgoing traffic | Number of bits sent by the frontend |
DEPENDENT | haproxy.frontend.bout.rate[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - MULTIPLIER: - CHANGE_PER_SECOND |
HAProxy | HAProxy {#PXNAME} {#SVNAME}: Status | DEPENDENT | haproxy.server.status[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
|
HAProxy | HAProxy {#PXNAME} {#SVNAME}: Responses time | Average server response time (in ms) for the last 1,024 requests. |
DEPENDENT | haproxy.server.rtime[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - MULTIPLIER: |
HAProxy | HAProxy {#PXNAME} {#SVNAME}: Errors connection per second | Number of requests that encountered an error attempting to connect to a backend server. |
DEPENDENT | haproxy.server.econ.rate[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
HAProxy | HAProxy {#PXNAME} {#SVNAME}: Responses denied per second | Responses denied due to security concerns (ACL-restricted). |
DEPENDENT | haproxy.server.dresp.rate[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
HAProxy | HAProxy {#PXNAME} {#SVNAME}: Response errors per second | Number of requests whose responses yielded an error. |
DEPENDENT | haproxy.server.eresp.rate[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
HAProxy | HAProxy {#PXNAME} {#SVNAME}: Unassigned requests | Current number of requests unassigned in queue. |
DEPENDENT | haproxy.server.qcur[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: |
HAProxy | HAProxy {#PXNAME} {#SVNAME}: Time in queue | Average time spent in queue (in ms) for the last 1,024 requests. |
DEPENDENT | haproxy.server.qtime[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - MULTIPLIER: |
HAProxy | HAProxy {#PXNAME} {#SVNAME}: Redispatched requests per second | Number of times a request was redispatched to a different backend. |
DEPENDENT | haproxy.server.wredis.rate[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
HAProxy | HAProxy {#PXNAME} {#SVNAME}: Retried connections per second | Number of times a connection was retried. |
DEPENDENT | haproxy.server.wretr.rate[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
HAProxy | HAProxy {#PXNAME} {#SVNAME}: Number of responses with codes 1xx per second | Number of informational HTTP responses per second. |
DEPENDENT | haproxy.server.hrsp_1xx.rate[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
HAProxy | HAProxy {#PXNAME} {#SVNAME}: Number of responses with codes 2xx per second | Number of successful HTTP responses per second. |
DEPENDENT | haproxy.server.hrsp_2xx.rate[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
HAProxy | HAProxy {#PXNAME} {#SVNAME}: Number of responses with codes 3xx per second | Number of HTTP redirections per second. |
DEPENDENT | haproxy.server.hrsp_3xx.rate[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
HAProxy | HAProxy {#PXNAME} {#SVNAME}: Number of responses with codes 4xx per second | Number of HTTP client errors per second. |
DEPENDENT | haproxy.server.hrsp_4xx.rate[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
HAProxy | HAProxy {#PXNAME} {#SVNAME}: Number of responses with codes 5xx per second | Number of HTTP server errors per second. |
DEPENDENT | haproxy.server.hrsp_5xx.rate[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
HAProxy | HAProxy {#PXNAME} {#SVNAME}: Incoming traffic | Number of bits received by the backend |
DEPENDENT | haproxy.server.bin.rate[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - MULTIPLIER: - CHANGE_PER_SECOND |
HAProxy | HAProxy {#PXNAME} {#SVNAME}: Outgoing traffic | Number of bits sent by the backend |
DEPENDENT | haproxy.server.bout.rate[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - MULTIPLIER: - CHANGE_PER_SECOND |
HAProxy | HAProxy {#PXNAME} {#SVNAME}: Server is active | Shows whether the server is active (marked with a Y) or a backup (marked with a -). |
DEPENDENT | haproxy.server.act[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
HAProxy | HAProxy {#PXNAME} {#SVNAME}: Server is backup | Shows whether the server is a backup (marked with a Y) or active (marked with a -). |
DEPENDENT | haproxy.server.bck[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
HAProxy | HAProxy {#PXNAME} {#SVNAME}: Sessions per second | Cumulative number of sessions (end-to-end connections) per second. |
DEPENDENT | haproxy.server.stot.rate[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
HAProxy | HAProxy {#PXNAME} {#SVNAME}: Weight | Effective weight. |
DEPENDENT | haproxy.server.weight[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
HAProxy | HAProxy {#PXNAME} {#SVNAME}: Configured maxqueue | Configured maxqueue for the server, or nothing if the value is 0 (default, meaning no limit). |
DEPENDENT | haproxy.server.qlimit[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: - MATCHES_REGEX: ⛔️ON_FAIL: |
HAProxy | HAProxy {#PXNAME} {#SVNAME}: Server was selected per second | Number of times that server was selected. |
DEPENDENT | haproxy.server.lbtot.rate[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
HAProxy | HAProxy {#PXNAME} {#SVNAME}: Status of last health check | Status of last health check, one of: UNK -> unknown INI -> initializing SOCKERR -> socket error L4OK -> check passed on layer 4, no upper layers testing enabled L4TOUT -> layer 1-4 timeout L4CON -> layer 1-4 connection problem, for example "Connection refused" (tcp rst) or "No route to host" (icmp) L6OK -> check passed on layer 6 L6TOUT -> layer 6 (SSL) timeout L6RSP -> layer 6 invalid response - protocol error L7OK -> check passed on layer 7 L7OKC -> check conditionally passed on layer 7, for example 404 with disable-on-404 L7TOUT -> layer 7 (HTTP/SMTP) timeout L7RSP -> layer 7 invalid response - protocol error L7STS -> layer 7 response error, for example HTTP 5xx Notice: If a check is currently running, the last known status will be reported, prefixed with "* ", e.g. "* L7OK". |
DEPENDENT | haproxy.server.check_status[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 10m |
Zabbix raw items | HAProxy: Get stats | HAProxy Statistics Report in CSV format |
HTTP_AGENT | haproxy.get Preprocessing: - REGEX: - CSV_TO_JSON: |
Zabbix raw items | HAProxy: Get nodes | Array for LLD rules. |
DEPENDENT | haproxy.get.nodes Preprocessing: - JAVASCRIPT: - DISCARD_UNCHANGED_HEARTBEAT: |
Zabbix raw items | HAProxy: Get stats page | HAProxy Statistics Report HTML |
HTTP_AGENT | haproxy.get_html |
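The raw items above drive everything else: the template fetches the stats page once in CSV form, and the REGEX plus CSV_TO_JSON preprocessing steps turn it into the JSON array the dependent items read. As a rough illustration of that pipeline outside Zabbix, here is a minimal Python sketch; the URL is an assumption standing in for the {$HAPROXY.STATS.SCHEME} and {$HAPROXY.STATS.PORT} macro values.

```python
import csv
import io
import urllib.request

# Assumed endpoint; substitute your {$HAPROXY.STATS.SCHEME}, host,
# {$HAPROXY.STATS.PORT} and stats path.
URL = "http://haproxy.example.com:8404/stats;csv"

raw = urllib.request.urlopen(URL, timeout=10).read().decode()

# The CSV export starts with "# pxname,svname,..."; strip the leading
# "# " the way the template's REGEX step does before CSV_TO_JSON.
text = raw[2:] if raw.startswith("# ") else raw

rows = list(csv.DictReader(io.StringIO(text)))
for row in rows:
    # Each dict mirrors the JSON objects the dependent items consume,
    # e.g. row["pxname"], row["svname"], row["status"], row["hrsp_5xx"].
    print(row["pxname"], row["svname"], row.get("status"))
```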
Name | Description | Expression | Severity | Dependencies and additional info | |
---|---|---|---|---|---|
HAProxy: Version has changed | HAProxy version has changed. Ack to close. |
last(/HAProxy by HTTP/haproxy.version,#1)<>last(/HAProxy by HTTP/haproxy.version,#2) and length(last(/HAProxy by HTTP/haproxy.version))>0 |
INFO | Manual close: YES |
|
HAProxy: has been restarted | Uptime is less than 10 minutes |
last(/HAProxy by HTTP/haproxy.uptime)<10m |
INFO | Manual close: YES |
|
HAProxy: Service is down | - |
last(/HAProxy by HTTP/net.tcp.service["{$HAPROXY.STATS.SCHEME}","{HOST.CONN}","{$HAPROXY.STATS.PORT}"])=0 |
AVERAGE | Manual close: YES |
|
HAProxy: Service response time is too high | - |
min(/HAProxy by HTTP/net.tcp.service.perf["{$HAPROXY.STATS.SCHEME}","{HOST.CONN}","{$HAPROXY.STATS.PORT}"],5m)>{$HAPROXY.RESPONSE_TIME.MAX.WARN} |
WARNING | Manual close: YES Depends on: - HAProxy: Service is down |
|
HAProxy backend {#PXNAME}: Server is DOWN | Backend is not available. |
count(/HAProxy by HTTP/haproxy.backend.status[{#PXNAME},{#SVNAME}],#5,"eq","DOWN")=5 |
AVERAGE | ||
HAProxy backend {#PXNAME}: Average response time is high | Average backend response time (in ms) for the last 1,024 requests is more than {$HAPROXY.BACK_RTIME.MAX.WARN}. |
min(/HAProxy by HTTP/haproxy.backend.rtime[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.BACK_RTIME.MAX.WARN} |
WARNING | ||
HAProxy backend {#PXNAME}: Number of responses with error is high | Number of requests on backend, whose responses yielded an error, is more than {$HAPROXY.BACK_ERESP.MAX.WARN}. |
min(/HAProxy by HTTP/haproxy.backend.eresp.rate[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.BACK_ERESP.MAX.WARN} |
WARNING | ||
HAProxy backend {#PXNAME}: Current number of requests unassigned in queue is high | Current number of requests on backend unassigned in queue is more than {$HAPROXY.BACK_QCUR.MAX.WARN}. |
min(/HAProxy by HTTP/haproxy.backend.qcur[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.BACK_QCUR.MAX.WARN} |
WARNING | ||
HAProxy backend {#PXNAME}: Average time spent in queue is high | Average time spent in queue (in ms) for the last 1,024 requests is more than {$HAPROXY.BACK_QTIME.MAX.WARN}. |
min(/HAProxy by HTTP/haproxy.backend.qtime[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.BACK_QTIME.MAX.WARN} |
WARNING | ||
HAProxy frontend {#PXNAME}: Session utilization is high | Alerting on this metric is essential to ensure your server has sufficient capacity to handle all concurrent sessions. Unlike requests, upon reaching the session limit HAProxy will deny additional clients until resource consumption drops. Furthermore, if you find your session usage percentage to be hovering above 80%, it could be time to either modify HAProxy's configuration to allow more sessions, or migrate your HAProxy server to a bigger box. |
min(/HAProxy by HTTP/haproxy.frontend.sutil[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.FRONT_SUTIL.MAX.WARN} |
WARNING | ||
HAProxy frontend {#PXNAME}: Number of request errors is high | Number of request errors is more than {$HAPROXY.FRONT_EREQ.MAX.WARN}. |
min(/HAProxy by HTTP/haproxy.frontend.ereq.rate[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.FRONT_EREQ.MAX.WARN} |
WARNING | ||
HAProxy frontend {#PXNAME}: Number of requests denied is high | Number of requests denied due to security concerns (ACL-restricted) is more than {$HAPROXY.FRONT_DREQ.MAX.WARN}. |
min(/HAProxy by HTTP/haproxy.frontend.dreq.rate[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.FRONT_DREQ.MAX.WARN} |
WARNING | ||
HAProxy {#PXNAME} {#SVNAME}: Server is DOWN | Server is not available. |
count(/HAProxy by HTTP/haproxy.server.status[{#PXNAME},{#SVNAME}],#5,"eq","DOWN")=5 |
WARNING | ||
HAProxy {#PXNAME} {#SVNAME}: Average response time is high | Average server response time (in ms) for the last 1,024 requests is more than {$HAPROXY.SERVER_RTIME.MAX.WARN}. |
min(/HAProxy by HTTP/haproxy.server.rtime[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.SERVER_RTIME.MAX.WARN} |
WARNING | ||
HAProxy {#PXNAME} {#SVNAME}: Number of responses with error is high | Number of requests on server, whose responses yielded an error, is more than {$HAPROXY.SERVER_ERESP.MAX.WARN}. |
min(/HAProxy by HTTP/haproxy.server.eresp.rate[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.SERVER_ERESP.MAX.WARN} |
WARNING | ||
HAProxy {#PXNAME} {#SVNAME}: Current number of requests unassigned in queue is high | Current number of requests unassigned in queue is more than {$HAPROXY.SERVER_QCUR.MAX.WARN}. |
min(/HAProxy by HTTP/haproxy.server.qcur[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.SERVER_QCUR.MAX.WARN} |
WARNING | ||
HAProxy {#PXNAME} {#SVNAME}: Average time spent in queue is high | Average time spent in queue (in ms) for the last 1,024 requests is more than {$HAPROXY.SERVER_QTIME.MAX.WARN}. |
min(/HAProxy by HTTP/haproxy.server.qtime[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.SERVER_QTIME.MAX.WARN} |
WARNING | ||
HAProxy {#PXNAME} {#SVNAME}: Health check error | Please check the server for faults. |
`find(/HAProxy by HTTP/haproxy.server.check_status[{#PXNAME},{#SVNAME}],#3,"regexp","(?:L[4-7]OK|^$)")=0` | WARNING | Depends on: - HAProxy {#PXNAME} {#SVNAME}: Server is DOWN |
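The health-check trigger above counts a value as healthy only if it matches the regular expression (?:L[4-7]OK|^$), i.e. an L4-L7 "OK" status or an empty string (checks disabled). A small sketch of that matching logic, with illustrative sample values:

```python
import re

# The trigger's pattern: healthy when the status is L4OK..L7OK or empty.
CHECK_OK = re.compile(r"(?:L[4-7]OK|^$)")

# Illustrative sample values; "* L7OK" is a check still in progress.
for status in ["L7OK", "* L7OK", "L4TOUT", "L7STS", ""]:
    healthy = bool(CHECK_OK.search(status))
    # find(...)=0 in the trigger means "no match" -> counts as a problem.
    print(f"{status!r}: {'ok' if healthy else 'problem'}")
```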
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template or ask for help with it at ZABBIX forums.
For Zabbix version: 6.2 and higher
The template to monitor HAProxy by Zabbix that works without any external scripts.
Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The template HAProxy by Zabbix agent collects metrics by polling the HAProxy Stats Page with Zabbix agent.
Note that this solution supports HTTPS and redirects.
This template was tested on:
See Zabbix template operation for basic instructions.
Set up the HAProxy Stats Page.
Example configuration of HAProxy:
frontend stats
bind *:8404
stats enable
stats uri /stats
stats refresh 10s
If you use another location, don't forget to change the macros {$HAPROXY.STATS.SCHEME}, {$HAPROXY.STATS.PORT}, {$HAPROXY.STATS.PATH}.
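Before linking the template, it can be worth confirming that both the HTML report and its CSV export respond at the configured location. A minimal sketch, assuming the example bind address above (adjust the values to your macros):

```python
import urllib.request

# Assumed values; adjust to your {$HAPROXY.STATS.SCHEME},
# {$HAPROXY.STATS.PORT} and {$HAPROXY.STATS.PATH} macros.
scheme, host, port, path = "http", "127.0.0.1", 8404, "stats"

# Check both the HTML report and the CSV export the items rely on.
for suffix in ("", ";csv"):
    url = f"{scheme}://{host}:{port}/{path}{suffix}"
    with urllib.request.urlopen(url, timeout=5) as resp:
        print(url, "->", resp.status)
```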
No specific Zabbix configuration is required.
Name | Description | Default |
---|---|---|
{$HAPROXY.BACK_ERESP.MAX.WARN} | Maximum of responses with error on BACKEND for trigger expression. |
10 |
{$HAPROXY.BACK_QCUR.MAX.WARN} | Maximum number of requests on BACKEND unassigned in queue for trigger expression. |
10 |
{$HAPROXY.BACK_QTIME.MAX.WARN} | Maximum of average time spent in queue on BACKEND for trigger expression. |
10s |
{$HAPROXY.BACK_RTIME.MAX.WARN} | Maximum of average BACKEND response time for trigger expression. |
10s |
{$HAPROXY.FRONT_DREQ.MAX.WARN} | The HAProxy maximum denied requests for trigger expression. |
10 |
{$HAPROXY.FRONT_EREQ.MAX.WARN} | The HAProxy maximum number of request errors for trigger expression. |
10 |
{$HAPROXY.FRONT_SUTIL.MAX.WARN} | Maximum of session usage percentage on frontend for trigger expression. |
80 |
{$HAPROXY.RESPONSE_TIME.MAX.WARN} | The HAProxy stats page maximum response time in seconds for trigger expression. |
10s |
{$HAPROXY.SERVER_ERESP.MAX.WARN} | Maximum of responses with error on server for trigger expression. |
10 |
{$HAPROXY.SERVER_QCUR.MAX.WARN} | Maximum number of requests on server unassigned in queue for trigger expression. |
10 |
{$HAPROXY.SERVER_QTIME.MAX.WARN} | Maximum of average time spent in queue on server for trigger expression. |
10s |
{$HAPROXY.SERVER_RTIME.MAX.WARN} | Maximum of average server response time for trigger expression. |
10s |
{$HAPROXY.STATS.PATH} | The path of HAProxy stats page. |
stats |
{$HAPROXY.STATS.PORT} | The port of the HAProxy stats host or container. |
8404 |
{$HAPROXY.STATS.SCHEME} | The scheme of the HAProxy stats page (http/https). |
http |
There are no template links in this template.
Name | Description | Type | Key and additional info | ||
---|---|---|---|---|---|
Backend discovery | Discovery of backends. |
DEPENDENT | haproxy.backend.discovery Filter: AND - {#SVNAME} MATCHES_REGEX - {#MODE} MATCHES_REGEX `http|tcp` Overrides: Discard HTTP status codes - {#MODE} MATCHES_REGEX `tcp` - ITEM_PROTOTYPE LIKE `Number of responses with codes` - NO_DISCOVER |
Frontend discovery | Discovery of frontends. |
DEPENDENT | haproxy.frontend.discovery Filter: AND - {#SVNAME} MATCHES_REGEX - {#MODE} MATCHES_REGEX `http|tcp` Overrides: Discard HTTP status codes - {#MODE} MATCHES_REGEX `tcp` - ITEM_PROTOTYPE LIKE `Number of responses with codes` - NO_DISCOVER |
Server discovery | Discovery of servers. |
DEPENDENT | haproxy.server.discovery Filter: AND - {#SVNAME} NOT_MATCHES_REGEX `FRONTEND|BACKEND` - {#MODE} MATCHES_REGEX `http|tcp` Overrides: Discard HTTP status codes - {#MODE} MATCHES_REGEX `tcp` - ITEM_PROTOTYPE LIKE `Number of responses with codes` - NO_DISCOVER |
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
HAProxy | HAProxy: Version | - |
DEPENDENT | haproxy.version Preprocessing: - REGEX: ⛔️ON_FAIL: - DISCARD_UNCHANGED_HEARTBEAT: |
HAProxy | HAProxy: Uptime | - |
DEPENDENT | haproxy.uptime Preprocessing: - JAVASCRIPT: |
HAProxy | HAProxy: Service status | - |
ZABBIX_PASSIVE | net.tcp.service["{$HAPROXY.STATS.SCHEME}","{HOST.CONN}","{$HAPROXY.STATS.PORT}"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
HAProxy | HAProxy: Service response time | - |
ZABBIX_PASSIVE | net.tcp.service.perf["{$HAPROXY.STATS.SCHEME}","{HOST.CONN}","{$HAPROXY.STATS.PORT}"] |
HAProxy | HAProxy Backend {#PXNAME}: Status | Possible values: UP - The server is reporting as healthy. DOWN - The server is reporting as unhealthy and unable to receive requests. NOLB - You've added http-check disable-on-404 to the backend and the health checked URL has returned an HTTP 404 response. MAINT - The server has been disabled or put into maintenance mode. DRAIN - The server has been put into drain mode. no check - Health checks are not enabled for this server. |
DEPENDENT | haproxy.backend.status[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
HAProxy | HAProxy Backend {#PXNAME}: Responses time | Average backend response time (in ms) for the last 1,024 requests |
DEPENDENT | haproxy.backend.rtime[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - MULTIPLIER: |
HAProxy | HAProxy Backend {#PXNAME}: Errors connection per second | Number of requests that encountered an error attempting to connect to a backend server. |
DEPENDENT | haproxy.backend.econ.rate[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
HAProxy | HAProxy Backend {#PXNAME}: Responses denied per second | Responses denied due to security concerns (ACL-restricted). |
DEPENDENT | haproxy.backend.dresp.rate[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
HAProxy | HAProxy Backend {#PXNAME}: Response errors per second | Number of requests whose responses yielded an error |
DEPENDENT | haproxy.backend.eresp.rate[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
HAProxy | HAProxy Backend {#PXNAME}: Unassigned requests | Current number of requests unassigned in queue. |
DEPENDENT | haproxy.backend.qcur[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: |
HAProxy | HAProxy Backend {#PXNAME}: Time in queue | Average time spent in queue (in ms) for the last 1,024 requests |
DEPENDENT | haproxy.backend.qtime[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - MULTIPLIER: |
HAProxy | HAProxy Backend {#PXNAME}: Redispatched requests per second | Number of times a request was redispatched to a different backend. |
DEPENDENT | haproxy.backend.wredis.rate[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
HAProxy | HAProxy Backend {#PXNAME}: Retried connections per second | Number of times a connection was retried. |
DEPENDENT | haproxy.backend.wretr.rate[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
HAProxy | HAProxy Backend {#PXNAME}: Number of responses with codes 1xx per second | Number of informational HTTP responses per second. |
DEPENDENT | haproxy.backend.hrsp_1xx.rate[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
HAProxy | HAProxy Backend {#PXNAME}: Number of responses with codes 2xx per second | Number of successful HTTP responses per second. |
DEPENDENT | haproxy.backend.hrsp_2xx.rate[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
HAProxy | HAProxy Backend {#PXNAME}: Number of responses with codes 3xx per second | Number of HTTP redirections per second. |
DEPENDENT | haproxy.backend.hrsp_3xx.rate[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
HAProxy | HAProxy Backend {#PXNAME}: Number of responses with codes 4xx per second | Number of HTTP client errors per second. |
DEPENDENT | haproxy.backend.hrsp_4xx.rate[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
HAProxy | HAProxy Backend {#PXNAME}: Number of responses with codes 5xx per second | Number of HTTP server errors per second. |
DEPENDENT | haproxy.backend.hrsp_5xx.rate[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
HAProxy | HAProxy Backend {#PXNAME}: Incoming traffic | Number of bits received by the backend |
DEPENDENT | haproxy.backend.bin.rate[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - MULTIPLIER: - CHANGE_PER_SECOND |
HAProxy | HAProxy Backend {#PXNAME}: Outgoing traffic | Number of bits sent by the backend |
DEPENDENT | haproxy.backend.bout.rate[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - MULTIPLIER: - CHANGE_PER_SECOND |
HAProxy | HAProxy Backend {#PXNAME}: Number of active servers | Number of active servers. |
DEPENDENT | haproxy.backend.act[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
HAProxy | HAProxy Backend {#PXNAME}: Number of backup servers | Number of backup servers. |
DEPENDENT | haproxy.backend.bck[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
HAProxy | HAProxy Backend {#PXNAME}: Sessions per second | Cumulative number of sessions (end-to-end connections) per second. |
DEPENDENT | haproxy.backend.stot.rate[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
HAProxy | HAProxy Backend {#PXNAME}: Weight | Total effective weight. |
DEPENDENT | haproxy.backend.weight[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
HAProxy | HAProxy Frontend {#PXNAME}: Status | Possible values: OPEN, STOP. When Status is OPEN, the frontend is operating normally and ready to receive traffic. |
DEPENDENT | haproxy.frontend.status[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
HAProxy | HAProxy Frontend {#PXNAME}: Requests rate | HTTP requests per second |
DEPENDENT | haproxy.frontend.req_rate[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: |
HAProxy | HAProxy Frontend {#PXNAME}: Sessions rate | Number of sessions created per second |
DEPENDENT | haproxy.frontend.rate[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: |
HAProxy | HAProxy Frontend {#PXNAME}: Established sessions | The current number of established sessions. |
DEPENDENT | haproxy.frontend.scur[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: |
HAProxy | HAProxy Frontend {#PXNAME}: Session limits | The most simultaneous sessions that are allowed, as defined by the maxconn setting in the frontend. |
DEPENDENT | haproxy.frontend.slim[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
HAProxy | HAProxy Frontend {#PXNAME}: Session utilization | Percentage of sessions used (scur / slim * 100). |
CALCULATED | haproxy.frontend.sutil[{#PXNAME},{#SVNAME}] Expression: last(//haproxy.frontend.scur[{#PXNAME},{#SVNAME}]) / last(//haproxy.frontend.slim[{#PXNAME},{#SVNAME}]) * 100 |
HAProxy | HAProxy Frontend {#PXNAME}: Request errors per second | Number of request errors per second. |
DEPENDENT | haproxy.frontend.ereq.rate[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
HAProxy | HAProxy Frontend {#PXNAME}: Denied requests per second | Requests denied due to security concerns (ACL-restricted) per second. |
DEPENDENT | haproxy.frontend.dreq.rate[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
HAProxy | HAProxy Frontend {#PXNAME}: Number of responses with codes 1xx per second | Number of informational HTTP responses per second. |
DEPENDENT | haproxy.frontend.hrsp_1xx.rate[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
HAProxy | HAProxy Frontend {#PXNAME}: Number of responses with codes 2xx per second | Number of successful HTTP responses per second. |
DEPENDENT | haproxy.frontend.hrsp_2xx.rate[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
HAProxy | HAProxy Frontend {#PXNAME}: Number of responses with codes 3xx per second | Number of HTTP redirections per second. |
DEPENDENT | haproxy.frontend.hrsp_3xx.rate[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
HAProxy | HAProxy Frontend {#PXNAME}: Number of responses with codes 4xx per second | Number of HTTP client errors per second. |
DEPENDENT | haproxy.frontend.hrsp_4xx.rate[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
HAProxy | HAProxy Frontend {#PXNAME}: Number of responses with codes 5xx per second | Number of HTTP server errors per second. |
DEPENDENT | haproxy.frontend.hrsp_5xx.rate[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
HAProxy | HAProxy Frontend {#PXNAME}: Incoming traffic | Number of bits received by the frontend |
DEPENDENT | haproxy.frontend.bin.rate[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - MULTIPLIER: - CHANGE_PER_SECOND |
HAProxy | HAProxy Frontend {#PXNAME}: Outgoing traffic | Number of bits sent by the frontend |
DEPENDENT | haproxy.frontend.bout.rate[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - MULTIPLIER: - CHANGE_PER_SECOND |
HAProxy | HAProxy {#PXNAME} {#SVNAME}: Status | DEPENDENT | haproxy.server.status[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
|
HAProxy | HAProxy {#PXNAME} {#SVNAME}: Responses time | Average server response time (in ms) for the last 1,024 requests. |
DEPENDENT | haproxy.server.rtime[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - MULTIPLIER: |
HAProxy | HAProxy {#PXNAME} {#SVNAME}: Errors connection per second | Number of requests that encountered an error attempting to connect to a backend server. |
DEPENDENT | haproxy.server.econ.rate[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
HAProxy | HAProxy {#PXNAME} {#SVNAME}: Responses denied per second | Responses denied due to security concerns (ACL-restricted). |
DEPENDENT | haproxy.server.dresp.rate[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
HAProxy | HAProxy {#PXNAME} {#SVNAME}: Response errors per second | Number of requests whose responses yielded an error. |
DEPENDENT | haproxy.server.eresp.rate[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
HAProxy | HAProxy {#PXNAME} {#SVNAME}: Unassigned requests | Current number of requests unassigned in queue. |
DEPENDENT | haproxy.server.qcur[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: |
HAProxy | HAProxy {#PXNAME} {#SVNAME}: Time in queue | Average time spent in queue (in ms) for the last 1,024 requests. |
DEPENDENT | haproxy.server.qtime[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - MULTIPLIER: |
HAProxy | HAProxy {#PXNAME} {#SVNAME}: Redispatched requests per second | Number of times a request was redispatched to a different backend. |
DEPENDENT | haproxy.server.wredis.rate[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
HAProxy | HAProxy {#PXNAME} {#SVNAME}: Retried connections per second | Number of times a connection was retried. |
DEPENDENT | haproxy.server.wretr.rate[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
HAProxy | HAProxy {#PXNAME} {#SVNAME}: Number of responses with codes 1xx per second | Number of informational HTTP responses per second. |
DEPENDENT | haproxy.server.hrsp_1xx.rate[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
HAProxy | HAProxy {#PXNAME} {#SVNAME}: Number of responses with codes 2xx per second | Number of successful HTTP responses per second. |
DEPENDENT | haproxy.server.hrsp_2xx.rate[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
HAProxy | HAProxy {#PXNAME} {#SVNAME}: Number of responses with codes 3xx per second | Number of HTTP redirections per second. |
DEPENDENT | haproxy.server.hrsp_3xx.rate[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
HAProxy | HAProxy {#PXNAME} {#SVNAME}: Number of responses with codes 4xx per second | Number of HTTP client errors per second. |
DEPENDENT | haproxy.server.hrsp_4xx.rate[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
HAProxy | HAProxy {#PXNAME} {#SVNAME}: Number of responses with codes 5xx per second | Number of HTTP server errors per second. |
DEPENDENT | haproxy.server.hrsp_5xx.rate[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
HAProxy | HAProxy {#PXNAME} {#SVNAME}: Incoming traffic | Number of bits received by the backend |
DEPENDENT | haproxy.server.bin.rate[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - MULTIPLIER: - CHANGE_PER_SECOND |
HAProxy | HAProxy {#PXNAME} {#SVNAME}: Outgoing traffic | Number of bits sent by the backend |
DEPENDENT | haproxy.server.bout.rate[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - MULTIPLIER: - CHANGE_PER_SECOND |
HAProxy | HAProxy {#PXNAME} {#SVNAME}: Server is active | Shows whether the server is active (marked with a Y) or a backup (marked with a -). |
DEPENDENT | haproxy.server.act[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
HAProxy | HAProxy {#PXNAME} {#SVNAME}: Server is backup | Shows whether the server is a backup (marked with a Y) or active (marked with a -). |
DEPENDENT | haproxy.server.bck[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
HAProxy | HAProxy {#PXNAME} {#SVNAME}: Sessions per second | Cumulative number of sessions (end-to-end connections) per second. |
DEPENDENT | haproxy.server.stot.rate[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
HAProxy | HAProxy {#PXNAME} {#SVNAME}: Weight | Effective weight. |
DEPENDENT | haproxy.server.weight[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
HAProxy | HAProxy {#PXNAME} {#SVNAME}: Configured maxqueue | Configured maxqueue for the server, or nothing if the value is 0 (default, meaning no limit). |
DEPENDENT | haproxy.server.qlimit[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: - MATCHES_REGEX: ⛔️ON_FAIL: |
HAProxy | HAProxy {#PXNAME} {#SVNAME}: Server was selected per second | Number of times that server was selected. |
DEPENDENT | haproxy.server.lbtot.rate[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
HAProxy | HAProxy {#PXNAME} {#SVNAME}: Status of last health check | Status of last health check, one of: UNK -> unknown INI -> initializing SOCKERR -> socket error L4OK -> check passed on layer 4, no upper layers testing enabled L4TOUT -> layer 1-4 timeout L4CON -> layer 1-4 connection problem, for example "Connection refused" (tcp rst) or "No route to host" (icmp) L6OK -> check passed on layer 6 L6TOUT -> layer 6 (SSL) timeout L6RSP -> layer 6 invalid response - protocol error L7OK -> check passed on layer 7 L7OKC -> check conditionally passed on layer 7, for example 404 with disable-on-404 L7TOUT -> layer 7 (HTTP/SMTP) timeout L7RSP -> layer 7 invalid response - protocol error L7STS -> layer 7 response error, for example HTTP 5xx Notice: If a check is currently running, the last known status will be reported, prefixed with "* ", e.g. "* L7OK". |
DEPENDENT | haproxy.server.check_status[{#PXNAME},{#SVNAME}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 10m |
Zabbix raw items | HAProxy: Get stats | HAProxy Statistics Report in CSV format |
ZABBIX_PASSIVE | web.page.get["{$HAPROXY.STATS.SCHEME}://{HOST.CONN}:{$HAPROXY.STATS.PORT}/{$HAPROXY.STATS.PATH};csv"] Preprocessing: - REGEX: - CSV_TO_JSON: |
Zabbix raw items | HAProxy: Get nodes | Array for LLD rules. |
DEPENDENT | haproxy.get.nodes Preprocessing: - JAVASCRIPT: - DISCARD_UNCHANGED_HEARTBEAT: |
Zabbix raw items | HAProxy: Get stats page | HAProxy Statistics Report HTML |
ZABBIX_PASSIVE | web.page.get["{$HAPROXY.STATS.SCHEME}://{HOST.CONN}:{$HAPROXY.STATS.PORT}/{$HAPROXY.STATS.PATH}"] |
Name | Description | Expression | Severity | Dependencies and additional info | |
---|---|---|---|---|---|
HAProxy: Version has changed | HAProxy version has changed. Ack to close. |
last(/HAProxy by Zabbix agent/haproxy.version,#1)<>last(/HAProxy by Zabbix agent/haproxy.version,#2) and length(last(/HAProxy by Zabbix agent/haproxy.version))>0 |
INFO | Manual close: YES |
|
HAProxy: has been restarted | Uptime is less than 10 minutes |
last(/HAProxy by Zabbix agent/haproxy.uptime)<10m |
INFO | Manual close: YES |
|
HAProxy: Service is down | - |
last(/HAProxy by Zabbix agent/net.tcp.service["{$HAPROXY.STATS.SCHEME}","{HOST.CONN}","{$HAPROXY.STATS.PORT}"])=0 |
AVERAGE | Manual close: YES |
|
HAProxy: Service response time is too high | - |
min(/HAProxy by Zabbix agent/net.tcp.service.perf["{$HAPROXY.STATS.SCHEME}","{HOST.CONN}","{$HAPROXY.STATS.PORT}"],5m)>{$HAPROXY.RESPONSE_TIME.MAX.WARN} |
WARNING | Manual close: YES Depends on: - HAProxy: Service is down |
|
HAProxy backend {#PXNAME}: Server is DOWN | Backend is not available. |
count(/HAProxy by Zabbix agent/haproxy.backend.status[{#PXNAME},{#SVNAME}],#5,"eq","DOWN")=5 |
AVERAGE | ||
HAProxy backend {#PXNAME}: Average response time is high | Average backend response time (in ms) for the last 1,024 requests is more than {$HAPROXY.BACK_RTIME.MAX.WARN}. |
min(/HAProxy by Zabbix agent/haproxy.backend.rtime[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.BACK_RTIME.MAX.WARN} |
WARNING | ||
HAProxy backend {#PXNAME}: Number of responses with error is high | Number of requests on backend, whose responses yielded an error, is more than {$HAPROXY.BACK_ERESP.MAX.WARN}. |
min(/HAProxy by Zabbix agent/haproxy.backend.eresp.rate[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.BACK_ERESP.MAX.WARN} |
WARNING | ||
HAProxy backend {#PXNAME}: Current number of requests unassigned in queue is high | Current number of requests on backend unassigned in queue is more than {$HAPROXY.BACK_QCUR.MAX.WARN}. |
min(/HAProxy by Zabbix agent/haproxy.backend.qcur[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.BACK_QCUR.MAX.WARN} |
WARNING | ||
HAProxy backend {#PXNAME}: Average time spent in queue is high | Average time spent in queue (in ms) for the last 1,024 requests is more than {$HAPROXY.BACK_QTIME.MAX.WARN}. |
min(/HAProxy by Zabbix agent/haproxy.backend.qtime[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.BACK_QTIME.MAX.WARN} |
WARNING | ||
HAProxy frontend {#PXNAME}: Session utilization is high | Alerting on this metric is essential to ensure your server has sufficient capacity to handle all concurrent sessions. Unlike requests, upon reaching the session limit HAProxy will deny additional clients until resource consumption drops. Furthermore, if you find your session usage percentage to be hovering above 80%, it could be time to either modify HAProxy's configuration to allow more sessions, or migrate your HAProxy server to a bigger box. |
min(/HAProxy by Zabbix agent/haproxy.frontend.sutil[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.FRONT_SUTIL.MAX.WARN} |
WARNING | ||
HAProxy frontend {#PXNAME}: Number of request errors is high | Number of request errors is more than {$HAPROXY.FRONT_EREQ.MAX.WARN}. |
min(/HAProxy by Zabbix agent/haproxy.frontend.ereq.rate[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.FRONT_EREQ.MAX.WARN} |
WARNING | ||
HAProxy frontend {#PXNAME}: Number of requests denied is high | Number of requests denied due to security concerns (ACL-restricted) is more than {$HAPROXY.FRONT_DREQ.MAX.WARN}. |
min(/HAProxy by Zabbix agent/haproxy.frontend.dreq.rate[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.FRONT_DREQ.MAX.WARN} |
WARNING | ||
HAProxy {#PXNAME} {#SVNAME}: Server is DOWN | Server is not available. |
count(/HAProxy by Zabbix agent/haproxy.server.status[{#PXNAME},{#SVNAME}],#5,"eq","DOWN")=5 |
WARNING | ||
HAProxy {#PXNAME} {#SVNAME}: Average response time is high | Average server response time (in ms) for the last 1,024 requests is more than {$HAPROXY.SERVER_RTIME.MAX.WARN}. |
min(/HAProxy by Zabbix agent/haproxy.server.rtime[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.SERVER_RTIME.MAX.WARN} |
WARNING | ||
HAProxy {#PXNAME} {#SVNAME}: Number of responses with error is high | Number of requests on server, whose responses yielded an error, is more than {$HAPROXY.SERVER_ERESP.MAX.WARN}. |
min(/HAProxy by Zabbix agent/haproxy.server.eresp.rate[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.SERVER_ERESP.MAX.WARN} |
WARNING | ||
HAProxy {#PXNAME} {#SVNAME}: Current number of requests unassigned in queue is high | Current number of requests unassigned in queue is more than {$HAPROXY.SERVER_QCUR.MAX.WARN}. |
min(/HAProxy by Zabbix agent/haproxy.server.qcur[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.SERVER_QCUR.MAX.WARN} |
WARNING | ||
HAProxy {#PXNAME} {#SVNAME}: Average time spent in queue is high | Average time spent in queue (in ms) for the last 1,024 requests is more than {$HAPROXY.SERVER_QTIME.MAX.WARN}. |
min(/HAProxy by Zabbix agent/haproxy.server.qtime[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.SERVER_QTIME.MAX.WARN} |
WARNING | ||
HAProxy {#PXNAME} {#SVNAME}: Health check error | Please check the server for faults. |
`find(/HAProxy by Zabbix agent/haproxy.server.check_status[{#PXNAME},{#SVNAME}],#3,"regexp","(?:L[4-7]OK|^$)")=0` | WARNING | Depends on: - HAProxy {#PXNAME} {#SVNAME}: Server is DOWN |
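The session-utilization trigger above is plain arithmetic over the calculated item: scur / slim * 100 compared against {$HAPROXY.FRONT_SUTIL.MAX.WARN} (80 by default). A worked sketch of that comparison, with illustrative numbers:

```python
def session_utilization(scur: int, slim: int) -> float:
    """Mirrors the calculated item: last(scur) / last(slim) * 100."""
    return scur / slim * 100

FRONT_SUTIL_MAX_WARN = 80  # template default

# Illustrative values: 4100 established sessions against maxconn 5000.
util = session_utilization(4100, 5000)
print(f"{util:.1f}%", "triggers" if util > FRONT_SUTIL_MAX_WARN else "ok")
# -> 82.0% triggers
```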
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template or ask for help with it at ZABBIX forums.
For Zabbix version: 6.2 and higher
The template for monitoring Hadoop over HTTP that works without any external scripts.
It collects metrics by polling the Hadoop API remotely using an HTTP agent and JSONPath preprocessing.
Zabbix server (or proxy) executes direct requests to the ResourceManager, NodeManager, NameNode, and DataNode APIs.
All metrics are collected at once, thanks to the Zabbix bulk data collection.
This template was tested on:
See Zabbix template operation for basic instructions.
You should define the IP address (or FQDN) and Web-UI port for the ResourceManager in {$HADOOP.RESOURCEMANAGER.HOST} and {$HADOOP.RESOURCEMANAGER.PORT} macros and for the NameNode in {$HADOOP.NAMENODE.HOST} and {$HADOOP.NAMENODE.PORT} macros respectively. Macros can be set in the template or overridden at the host level.
No specific Zabbix configuration is required.
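The raw items of this template poll the Hadoop web UIs directly. As a quick way to verify reachability before setting the macros, here is a minimal sketch querying the NameNode /jmx endpoint; the host and port mirror the {$HADOOP.NAMENODE.HOST} and {$HADOOP.NAMENODE.PORT} defaults listed below, and the FSNamesystem bean is one commonly exposed attribute set, shown for illustration:

```python
import json
import urllib.request

# Macro defaults from the table below; override for your cluster.
NAMENODE_HOST, NAMENODE_PORT = "NameNode", 9870

url = f"http://{NAMENODE_HOST}:{NAMENODE_PORT}/jmx"
beans = json.load(urllib.request.urlopen(url, timeout=10))["beans"]

# The FSNamesystem bean carries capacity and block counters similar to
# those the template's JSONPath preprocessing extracts.
fs = next(b for b in beans if b["name"].endswith("name=FSNamesystem"))
print("CapacityRemaining:", fs.get("CapacityRemaining"))
print("MissingBlocks:", fs.get("MissingBlocks"))
```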
Name | Description | Default |
---|---|---|
{$HADOOP.CAPACITY_REMAINING.MIN.WARN} | The Hadoop cluster capacity remaining percent for trigger expression. |
20 |
{$HADOOP.NAMENODE.HOST} | The Hadoop NameNode host IP address or FQDN. |
NameNode |
{$HADOOP.NAMENODE.PORT} | The Hadoop NameNode Web-UI port. |
9870 |
{$HADOOP.NAMENODE.RESPONSE_TIME.MAX.WARN} | The Hadoop NameNode API page maximum response time in seconds for trigger expression. |
10s |
{$HADOOP.RESOURCEMANAGER.HOST} | The Hadoop ResourceManager host IP address or FQDN. |
ResourceManager |
{$HADOOP.RESOURCEMANAGER.PORT} | The Hadoop ResourceManager Web-UI port. |
8088 |
{$HADOOP.RESOURCEMANAGER.RESPONSE_TIME.MAX.WARN} | The Hadoop ResourceManager API page maximum response time in seconds for trigger expression. |
10s |
There are no template links in this template.
Name | Description | Type | Key and additional info |
---|---|---|---|
Data node discovery | - |
HTTP_AGENT | hadoop.datanode.discovery Preprocessing: - JAVASCRIPT: |
Node manager discovery | - |
HTTP_AGENT | hadoop.nodemanager.discovery Preprocessing: - JAVASCRIPT: |
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
Hadoop | ResourceManager: Service status | Hadoop ResourceManager API port availability. |
SIMPLE | net.tcp.service["tcp","{$HADOOP.RESOURCEMANAGER.HOST}","{$HADOOP.RESOURCEMANAGER.PORT}"] Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
Hadoop | ResourceManager: Service response time | Hadoop ResourceManager API performance. |
SIMPLE | net.tcp.service.perf["tcp","{$HADOOP.RESOURCEMANAGER.HOST}","{$HADOOP.RESOURCEMANAGER.PORT}"] |
Hadoop | ResourceManager: Uptime | - |
DEPENDENT | hadoop.resourcemanager.uptime Preprocessing: - JSONPATH: - MULTIPLIER: |
Hadoop | ResourceManager: RPC queue & processing time | Average time spent on processing RPC requests. |
DEPENDENT | hadoop.resourcemanager.rpc_processing_time_avg Preprocessing: - JSONPATH: |
Hadoop | ResourceManager: Active NMs | Number of Active NodeManagers. |
DEPENDENT | hadoop.resourcemanager.num_active_nm Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
Hadoop | ResourceManager: Decommissioning NMs | Number of Decommissioning NodeManagers. |
DEPENDENT | hadoop.resourcemanager.num_decommissioning_nm Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
Hadoop | ResourceManager: Decommissioned NMs | Number of Decommissioned NodeManagers. |
DEPENDENT | hadoop.resourcemanager.num_decommissioned_nm Preprocessing: - JSONPATH: |
Hadoop | ResourceManager: Lost NMs | Number of Lost NodeManagers. |
DEPENDENT | hadoop.resourcemanager.num_lost_nm Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
Hadoop | ResourceManager: Unhealthy NMs | Number of Unhealthy NodeManagers. |
DEPENDENT | hadoop.resourcemanager.num_unhealthy_nm Preprocessing: - JSONPATH: |
Hadoop | ResourceManager: Rebooted NMs | Number of Rebooted NodeManagers. |
DEPENDENT | hadoop.resourcemanager.num_rebooted_nm Preprocessing: - JSONPATH: |
Hadoop | ResourceManager: Shutdown NMs | Number of Shutdown NodeManagers. |
DEPENDENT | hadoop.resourcemanager.num_shutdown_nm Preprocessing: - JSONPATH: |
Hadoop | NameNode: Service status | Hadoop NameNode API port availability. |
SIMPLE | net.tcp.service["tcp","{$HADOOP.NAMENODE.HOST}","{$HADOOP.NAMENODE.PORT}"] Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
Hadoop | NameNode: Service response time | Hadoop NameNode API performance. |
SIMPLE | net.tcp.service.perf["tcp","{$HADOOP.NAMENODE.HOST}","{$HADOOP.NAMENODE.PORT}"] |
Hadoop | NameNode: Uptime | - |
DEPENDENT | hadoop.namenode.uptime Preprocessing: - JSONPATH: - MULTIPLIER: |
Hadoop | NameNode: RPC queue & processing time | Average time spent on processing RPC requests. |
DEPENDENT | hadoop.namenode.rpc_processing_time_avg Preprocessing: - JSONPATH: |
Hadoop | NameNode: Block Pool Renaming | - |
DEPENDENT | hadoop.namenode.percent_block_pool_used Preprocessing: - JSONPATH: |
Hadoop | NameNode: Transactions since last checkpoint | Total number of transactions since last checkpoint. |
DEPENDENT | hadoop.namenode.transactions_since_last_checkpoint Preprocessing: - JSONPATH: |
Hadoop | NameNode: Percent capacity remaining | Available capacity in percent. |
DEPENDENT | hadoop.namenode.percent_remaining Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 6h |
Hadoop | NameNode: Capacity remaining | Available capacity. |
DEPENDENT | hadoop.namenode.capacity_remaining Preprocessing: - JSONPATH: |
Hadoop | NameNode: Corrupt blocks | Number of corrupt blocks. |
DEPENDENT | hadoop.namenode.corrupt_blocks Preprocessing: - JSONPATH: |
Hadoop | NameNode: Missing blocks | Number of missing blocks. |
DEPENDENT | hadoop.namenode.missing_blocks Preprocessing: - JSONPATH: |
Hadoop | NameNode: Failed volumes | Number of failed volumes. |
DEPENDENT | hadoop.namenode.volume_failures_total Preprocessing: - JSONPATH: |
Hadoop | NameNode: Alive DataNodes | Count of alive DataNodes. |
DEPENDENT | hadoop.namenode.num_live_data_nodes Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 6h |
Hadoop | NameNode: Dead DataNodes | Count of dead DataNodes. |
DEPENDENT | hadoop.namenode.num_dead_data_nodes Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 6h |
Hadoop | NameNode: Stale DataNodes | DataNodes that do not send a heartbeat within 30 seconds are marked as "stale". |
DEPENDENT | hadoop.namenode.num_stale_data_nodes Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 6h |
Hadoop | NameNode: Total files | Total count of files tracked by the NameNode. |
DEPENDENT | hadoop.namenode.files_total Preprocessing: - JSONPATH: |
Hadoop | NameNode: Total load | The current number of concurrent file accesses (read/write) across all DataNodes. |
DEPENDENT | hadoop.namenode.total_load Preprocessing: - JSONPATH: |
Hadoop | NameNode: Blocks allocable | Maximum number of blocks allocable. |
DEPENDENT | hadoop.namenode.block_capacity Preprocessing: - JSONPATH: |
Hadoop | NameNode: Total blocks | Count of blocks tracked by NameNode. |
DEPENDENT | hadoop.namenode.blocks_total Preprocessing: - JSONPATH: |
Hadoop | NameNode: Under-replicated blocks | The number of blocks with insufficient replication. |
DEPENDENT | hadoop.namenode.under_replicated_blocks Preprocessing: - JSONPATH: |
Hadoop | {#HOSTNAME}: RPC queue & processing time | Average time spent on processing RPC requests. |
DEPENDENT | hadoop.nodemanager.rpc_processing_time_avg[{#HOSTNAME}] Preprocessing: - JSONPATH: |
Hadoop | {#HOSTNAME}: Container launch avg duration | - |
DEPENDENT | hadoop.nodemanager.container_launch_duration_avg[{#HOSTNAME}] Preprocessing: - JSONPATH: |
Hadoop | {#HOSTNAME}: JVM Threads | The number of JVM threads. |
DEPENDENT | hadoop.nodemanager.jvm.threads[{#HOSTNAME}] Preprocessing: - JSONPATH: |
Hadoop | {#HOSTNAME}: JVM Garbage collection time | The JVM garbage collection time in milliseconds. |
DEPENDENT | hadoop.nodemanager.jvm.gc_time[{#HOSTNAME}] Preprocessing: - JSONPATH: |
Hadoop | {#HOSTNAME}: JVM Heap usage | The JVM heap usage in MBytes. |
DEPENDENT | hadoop.nodemanager.jvm.mem_heap_used[{#HOSTNAME}] Preprocessing: - JSONPATH: |
Hadoop | {#HOSTNAME}: Uptime | - |
DEPENDENT | hadoop.nodemanager.uptime[{#HOSTNAME}] Preprocessing: - JSONPATH: - MULTIPLIER: |
Hadoop | {#HOSTNAME}: State | State of the node - valid values are: NEW, RUNNING, UNHEALTHY, DECOMMISSIONING, DECOMMISSIONED, LOST, REBOOTED, SHUTDOWN. |
DEPENDENT | hadoop.nodemanager.state[{#HOSTNAME}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
Hadoop | {#HOSTNAME}: Version | - |
DEPENDENT | hadoop.nodemanager.version[{#HOSTNAME}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
Hadoop | {#HOSTNAME}: Number of containers | - |
DEPENDENT | hadoop.nodemanager.numcontainers[{#HOSTNAME}] Preprocessing: - JSONPATH: |
Hadoop | {#HOSTNAME}: Used memory | - |
DEPENDENT | hadoop.nodemanager.usedmemory[{#HOSTNAME}] Preprocessing: - JSONPATH: |
Hadoop | {#HOSTNAME}: Available memory | - |
DEPENDENT | hadoop.nodemanager.availablememory[{#HOSTNAME}] Preprocessing: - JSONPATH: |
Hadoop | {#HOSTNAME}: Remaining | Remaining disk space. |
DEPENDENT | hadoop.datanode.remaining[{#HOSTNAME}] Preprocessing: - JSONPATH: |
Hadoop | {#HOSTNAME}: Used | Used disk space. |
DEPENDENT | hadoop.datanode.dfs_used[{#HOSTNAME}] Preprocessing: - JSONPATH: |
Hadoop | {#HOSTNAME}: Number of failed volumes | Number of failed storage volumes. |
DEPENDENT | hadoop.datanode.numfailedvolumes[{#HOSTNAME}] Preprocessing: - JSONPATH: |
Hadoop | {#HOSTNAME}: JVM Threads | The number of JVM threads. |
DEPENDENT | hadoop.datanode.jvm.threads[{#HOSTNAME}] Preprocessing: - JSONPATH: |
Hadoop | {#HOSTNAME}: JVM Garbage collection time | The JVM garbage collection time in milliseconds. |
DEPENDENT | hadoop.datanode.jvm.gc_time[{#HOSTNAME}] Preprocessing: - JSONPATH: |
Hadoop | {#HOSTNAME}: JVM Heap usage | The JVM heap usage in MBytes. |
DEPENDENT | hadoop.datanode.jvm.mem_heap_used[{#HOSTNAME}] Preprocessing: - JSONPATH: |
Hadoop | {#HOSTNAME}: Uptime | - |
DEPENDENT | hadoop.datanode.uptime[{#HOSTNAME}] Preprocessing: - JSONPATH: - MULTIPLIER: |
Hadoop | {#HOSTNAME}: Version | DataNode software version. |
DEPENDENT | hadoop.datanode.version[{#HOSTNAME}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
Hadoop | {#HOSTNAME}: Admin state | Administrative state. |
DEPENDENT | hadoop.datanode.admin_state[{#HOSTNAME}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 6h |
Hadoop | {#HOSTNAME}: Oper state | Operational state. |
DEPENDENT | hadoop.datanode.oper_state[{#HOSTNAME}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 6h |
Zabbix raw items | Get ResourceManager stats | - |
HTTP_AGENT | hadoop.resourcemanager.get |
Zabbix raw items | Get NameNode stats | - |
HTTP_AGENT | hadoop.namenode.get |
Zabbix raw items | Get NodeManagers states | - |
HTTP_AGENT | hadoop.nodemanagers.get Preprocessing: - JAVASCRIPT: |
Zabbix raw items | Get DataNodes states | - |
HTTP_AGENT | hadoop.datanodes.get Preprocessing: - JAVASCRIPT: |
Zabbix raw items | Hadoop NodeManager {#HOSTNAME}: Get stats | - |
HTTP_AGENT | hadoop.nodemanager.get[{#HOSTNAME}] |
Zabbix raw items | Hadoop DataNode {#HOSTNAME}: Get stats | - |
HTTP_AGENT | hadoop.datanode.get[{#HOSTNAME}] |
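All the DEPENDENT items above follow one pattern: a single "Get ... stats" master item returns the full JMX payload, and each item extracts one attribute with a JSONPath step. A hand-rolled sketch of that fan-out against the ResourceManager (the bean and attribute names are illustrative stand-ins for the elided JSONPath parameters in the table):

```python
import json
import urllib.request

# Macro defaults ({$HADOOP.RESOURCEMANAGER.HOST}:{$HADOOP.RESOURCEMANAGER.PORT}).
RM_URL = "http://ResourceManager:8088/jmx"

beans = json.load(urllib.request.urlopen(RM_URL, timeout=10))["beans"]

def attr(bean_suffix: str, name: str):
    """Tiny stand-in for a $.beans[?(@.name=='...')].<attr> JSONPath step."""
    for bean in beans:
        if bean["name"].endswith(bean_suffix):
            return bean.get(name)
    return None

# One master fetch, many dependent values - the same fan-out the
# "Get ResourceManager stats" item feeds.
print("NumActiveNMs:", attr("name=ClusterMetrics", "NumActiveNMs"))
print("NumUnhealthyNMs:", attr("name=ClusterMetrics", "NumUnhealthyNMs"))
```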
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
ResourceManager: Service is unavailable | - |
last(/Hadoop by HTTP/net.tcp.service["tcp","{$HADOOP.RESOURCEMANAGER.HOST}","{$HADOOP.RESOURCEMANAGER.PORT}"])=0 |
AVERAGE | Manual close: YES |
ResourceManager: Service response time is too high | - |
min(/Hadoop by HTTP/net.tcp.service.perf["tcp","{$HADOOP.RESOURCEMANAGER.HOST}","{$HADOOP.RESOURCEMANAGER.PORT}"],5m)>{$HADOOP.RESOURCEMANAGER.RESPONSE_TIME.MAX.WARN} |
WARNING | Manual close: YES Depends on: - ResourceManager: Service is unavailable |
ResourceManager: Service has been restarted | Uptime is less than 10 minutes. |
last(/Hadoop by HTTP/hadoop.resourcemanager.uptime)<10m |
INFO | Manual close: YES |
ResourceManager: Failed to fetch ResourceManager API page | Zabbix has not received data for items for the last 30 minutes. |
nodata(/Hadoop by HTTP/hadoop.resourcemanager.uptime,30m)=1 |
WARNING | Manual close: YES Depends on: - ResourceManager: Service is unavailable |
ResourceManager: Cluster has no active NodeManagers | Cluster is unable to execute any jobs without at least one NodeManager. |
max(/Hadoop by HTTP/hadoop.resourcemanager.num_active_nm,5m)=0 |
HIGH | |
ResourceManager: Cluster has unhealthy NodeManagers | YARN considers any node with disk utilization exceeding the value specified under the property yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage (in yarn-site.xml) to be unhealthy. Ample disk space is critical to ensure uninterrupted operation of a Hadoop cluster, and large numbers of unhealthyNodes (the number to alert on depends on the size of your cluster) should be quickly investigated and resolved. |
min(/Hadoop by HTTP/hadoop.resourcemanager.num_unhealthy_nm,15m)>0 |
AVERAGE | |
NameNode: Service is unavailable | - |
last(/Hadoop by HTTP/net.tcp.service["tcp","{$HADOOP.NAMENODE.HOST}","{$HADOOP.NAMENODE.PORT}"])=0 |
AVERAGE | Manual close: YES |
NameNode: Service response time is too high | - |
min(/Hadoop by HTTP/net.tcp.service.perf["tcp","{$HADOOP.NAMENODE.HOST}","{$HADOOP.NAMENODE.PORT}"],5m)>{$HADOOP.NAMENODE.RESPONSE_TIME.MAX.WARN} |
WARNING | Manual close: YES Depends on: - NameNode: Service is unavailable |
NameNode: Service has been restarted | Uptime is less than 10 minutes. |
last(/Hadoop by HTTP/hadoop.namenode.uptime)<10m |
INFO | Manual close: YES |
NameNode: Failed to fetch NameNode API page | Zabbix has not received data for items for the last 30 minutes. |
nodata(/Hadoop by HTTP/hadoop.namenode.uptime,30m)=1 |
WARNING | Manual close: YES Depends on: - NameNode: Service is unavailable |
NameNode: Cluster capacity remaining is low | A good practice is to ensure that disk use never exceeds 80 percent capacity. |
max(/Hadoop by HTTP/hadoop.namenode.percent_remaining,15m)<{$HADOOP.CAPACITY_REMAINING.MIN.WARN} |
WARNING | |
NameNode: Cluster has missing blocks | A missing block is far worse than a corrupt block, because a missing block cannot be recovered by copying a replica. |
min(/Hadoop by HTTP/hadoop.namenode.missing_blocks,15m)>0 |
AVERAGE | |
NameNode: Cluster has volume failures | HDFS now allows for disks to fail in place, without affecting DataNode operations, until a threshold value is reached. This is set on each DataNode via the dfs.datanode.failed.volumes.tolerated property; it defaults to 0, meaning that any volume failure will shut down the DataNode; on a production cluster where DataNodes typically have 6, 8, or 12 disks, setting this parameter to 1 or 2 is typically the best practice. |
min(/Hadoop by HTTP/hadoop.namenode.volume_failures_total,15m)>0 |
AVERAGE | |
NameNode: Cluster has DataNodes in Dead state | The death of a DataNode causes a flurry of network activity, as the NameNode initiates replication of blocks lost on the dead nodes. |
min(/Hadoop by HTTP/hadoop.namenode.num_dead_data_nodes,5m)>0 |
AVERAGE | |
{#HOSTNAME}: Service has been restarted | Uptime is less than 10 minutes. |
last(/Hadoop by HTTP/hadoop.nodemanager.uptime[{#HOSTNAME}])<10m |
INFO | Manual close: YES |
{#HOSTNAME}: Failed to fetch NodeManager API page | Zabbix has not received data for items for the last 30 minutes. |
nodata(/Hadoop by HTTP/hadoop.nodemanager.uptime[{#HOSTNAME}],30m)=1 |
WARNING | Manual close: YES Depends on: - {#HOSTNAME}: NodeManager has state {ITEM.VALUE}. |
{#HOSTNAME}: NodeManager has state {ITEM.VALUE}. | The state is different from normal. |
last(/Hadoop by HTTP/hadoop.nodemanager.state[{#HOSTNAME}])<>"RUNNING" |
AVERAGE | |
{#HOSTNAME}: Service has been restarted | Uptime is less than 10 minutes. |
last(/Hadoop by HTTP/hadoop.datanode.uptime[{#HOSTNAME}])<10m |
INFO | Manual close: YES |
{#HOSTNAME}: Failed to fetch DataNode API page | Zabbix has not received data for items for the last 30 minutes. |
nodata(/Hadoop by HTTP/hadoop.datanode.uptime[{#HOSTNAME}],30m)=1 |
WARNING | Manual close: YES Depends on: - {#HOSTNAME}: DataNode has state {ITEM.VALUE}. |
{#HOSTNAME}: DataNode has state {ITEM.VALUE}. | The state is different from normal. |
last(/Hadoop by HTTP/hadoop.datanode.oper_state[{#HOSTNAME}])<>"Live" |
AVERAGE |
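For reference outside of Zabbix, the volume-failure counter behind the trigger above can be read from the NameNode's JMX servlet, the same HTTP endpoint this template's items query. A minimal Python sketch, assuming a hypothetical NameNode whose web UI is reachable at namenode.example.com:9870 (the Hadoop 3 default port):

```python
import json
from urllib.request import urlopen

# Hypothetical NameNode address; substitute your own host and port.
NAMENODE = "http://namenode.example.com:9870"

# The NameNode exposes metrics as JSON via the /jmx servlet; the FSNamesystem
# bean carries the VolumeFailuresTotal counter behind this trigger.
url = NAMENODE + "/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem"
with urlopen(url, timeout=10) as resp:
    beans = json.load(resp)["beans"]

volume_failures = beans[0]["VolumeFailuresTotal"]
if volume_failures > 0:
    print(f"WARNING: {volume_failures} failed volume(s) reported by the NameNode")
```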
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template or ask for help with it at ZABBIX forums.
https://hadoop.apache.org/docs/current/
For Zabbix version: 6.2 and higher.
This template is designed to monitor GitLab by Zabbix and works without any external scripts.
Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The template GitLab by HTTP collects metrics with an HTTP agent from the GitLab /-/metrics endpoint.
See https://docs.gitlab.com/ee/administration/monitoring/prometheus/gitlab_metrics.html.
This template was tested on:
See Zabbix template operation for basic instructions.
This template works with self-hosted GitLab instances. Internal service metrics are collected from the GitLab /-/metrics endpoint.
Two methods are available to access the metrics:
1. Explicitly allow access to the monitoring endpoints from the Zabbix server or proxy IP address (the GitLab monitoring IP allowlist).
2. Get a token from the Admin -> Monitoring -> Health check page (http://your.gitlab.address/admin/health_check) and use it in the macro {$GITLAB.HEALTH.TOKEN} as a variable path, like: ?token=your_token. A request sketch using this token follows below.
Remember to change the macro {$GITLAB.URL}.
Also, see the Macros section for a list of macros used to set trigger values. NOTE: some metrics may not be collected depending on your GitLab instance version and configuration. See GitLab's documentation for further information about its metric collection.
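The following is a minimal Python sketch of the request the template's HTTP agent items perform against /-/metrics, assuming the requests and prometheus_client packages are available; the URL, token, and metric family names are placeholders and vary by GitLab version:

```python
import requests  # assumed available
from prometheus_client.parser import text_string_to_metric_families

# Placeholders: substitute your instance URL and health-check token.
GITLAB_URL = "http://your.gitlab.address"
HEALTH_TOKEN = "your_token"

# Same request the template's HTTP agent items perform against /-/metrics.
resp = requests.get(f"{GITLAB_URL}/-/metrics",
                    params={"token": HEALTH_TOKEN}, timeout=10)
resp.raise_for_status()

# Walk the Prometheus exposition format and print a few families of interest
# (family names are examples; the exact set depends on your GitLab version).
for family in text_string_to_metric_families(resp.text):
    if family.name in ("user_session_logins", "http_requests"):
        for sample in family.samples:
            print(sample.name, sample.labels, sample.value)
```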
No specific Zabbix configuration is required.
Name | Description | Default |
---|---|---|
{$GITLAB.HEALTH.TOKEN} | The token path for the GitLab health check, for example: ?token=your_token |
`` |
{$GITLAB.HTTP.FAIL.MAX.WARN} | The maximum number of HTTP request failures for a trigger expression. |
2 |
{$GITLAB.OPEN.FDS.MAX.WARN} | The maximum percentage of used file descriptors for a trigger expression. |
90 |
{$GITLAB.PUMA.QUEUE.MAX.WARN} | The maximum number of Puma queued requests for a trigger expression. |
1 |
{$GITLAB.PUMA.UTILIZATION.MAX.WARN} | The maximum percentage of Puma thread utilization for a trigger expression. |
90 |
{$GITLAB.REDIS.FAIL.MAX.WARN} | The maximum number of Redis client exceptions for a trigger expression. |
2 |
{$GITLAB.UNICORN.QUEUE.MAX.WARN} | The maximum number of Unicorn queued requests for a trigger expression. |
1 |
{$GITLAB.UNICORN.UTILIZATION.MAX.WARN} | The maximum percentage of Unicorn workers utilization for a trigger expression. |
90 |
{$GITLAB.URL} | URL of a GitLab instance. |
http://localhost |
There are no template links in this template.
Name | Description | Type | Key and additional info |
---|---|---|---|
Puma metrics discovery | Discovery of Puma specific metrics when Puma is used. |
HTTP_AGENT | gitlab.puma.discovery Preprocessing: - PROMETHEUS_TO_JSON: - JAVASCRIPT: |
Unicorn metrics discovery | Discovery of Unicorn specific metrics, when Unicorn is used. |
HTTP_AGENT | gitlab.unicorn.discovery Preprocessing: - PROMETHEUS_TO_JSON: - JAVASCRIPT: |
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
GitLab | GitLab: Instance readiness check | The readiness probe checks whether the GitLab instance is ready to accept traffic via Rails Controllers. A probe sketch follows this table. |
HTTP_AGENT | gitlab.readiness Preprocessing: - CHECK_NOT_SUPPORTED ⛔️ON_FAIL: - JSONPATH: - BOOL_TO_DECIMAL ⛔️ON_FAIL: - DISCARD_UNCHANGED_HEARTBEAT: |
GitLab | GitLab: Application server status | Checks whether the application server is running. This probe is used to know if Rails Controllers are not deadlocked due to multi-threading issues. |
HTTP_AGENT | gitlab.liveness Preprocessing: - CHECK_NOT_SUPPORTED ⛔️ON_FAIL: - JSONPATH: - BOOL_TO_DECIMAL ⛔️ON_FAIL: - DISCARD_UNCHANGED_HEARTBEAT: |
GitLab | GitLab: Version | Version of the GitLab instance. |
DEPENDENT | gitlab.deployments.version Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
GitLab | GitLab: Ruby: First process start time | Minimum UNIX timestamp of ruby processes start time. |
DEPENDENT | gitlab.ruby.process_start_time_seconds.first Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 3h |
GitLab | GitLab: Ruby: Last process start time | Maximum UNIX timestamp of ruby processes start time. |
DEPENDENT | gitlab.ruby.process_start_time_seconds.last Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 3h |
GitLab | GitLab: User logins, total | Counter of how many users have logged in since GitLab was started or restarted. |
DEPENDENT | gitlab.user_session_logins_total Preprocessing: - JSONPATH: ⛔️ON_FAIL: DISCARD_VALUE -> |
GitLab | GitLab: User CAPTCHA logins failed, total | Counter of failed CAPTCHA attempts during login. |
DEPENDENT | gitlab.failed_login_captcha_total Preprocessing: - JSONPATH: ⛔️ON_FAIL: DISCARD_VALUE -> |
GitLab | GitLab: User CAPTCHA logins, total | Counter of successful CAPTCHA attempts during login. |
DEPENDENT | gitlab.successful_login_captcha_total Preprocessing: - JSONPATH: ⛔️ON_FAIL: DISCARD_VALUE -> |
GitLab | GitLab: Upload file does not exist | Number of times an upload record could not find its file. |
DEPENDENT | gitlab.upload_file_does_not_exist Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
GitLab | GitLab: Pipelines: Processing events, total | Total amount of pipeline processing events. |
DEPENDENT | gitlab.pipeline.processing_events_total Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
GitLab | GitLab: Pipelines: Created, total | Counter of pipelines created. |
DEPENDENT | gitlab.pipeline.created_total Preprocessing: - JSONPATH: ⛔️ON_FAIL: DISCARD_VALUE -> |
GitLab | GitLab: Pipelines: Auto DevOps pipelines, total | Counter of completed Auto DevOps pipelines. |
DEPENDENT | gitlab.pipeline.auto_devops_completed.total Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
GitLab | GitLab: Pipelines: Auto DevOps pipelines, failed | Counter of completed Auto DevOps pipelines with status "failed". |
DEPENDENT | gitlab.pipeline.auto_devops_completed_total.failed Preprocessing: - JSONPATH: ⛔️ON_FAIL: DISCARD_VALUE -> |
GitLab | GitLab: Pipelines: CI/CD creation duration | The sum of the time in seconds it takes to create a CI/CD pipeline. |
DEPENDENT | gitlab.pipeline.pipeline_creation Preprocessing: - JSONPATH: ⛔️ON_FAIL: DISCARD_VALUE -> |
GitLab | GitLab: Pipelines: CI/CD creation count | The count of CI/CD pipeline creation time measurements. |
DEPENDENT | gitlab.pipeline.pipeline_creation.count Preprocessing: - JSONPATH: ⛔️ON_FAIL: DISCARD_VALUE -> |
GitLab | GitLab: Database: Connection pool, busy | Connections to the main database in use where the owner is still alive. |
DEPENDENT | gitlab.database.connection_pool_busy Preprocessing: - JSONPATH: |
GitLab | GitLab: Database: Connection pool, current | Current connections to the main database in the pool. |
DEPENDENT | gitlab.database.connection_pool_connections Preprocessing: - JSONPATH: |
GitLab | GitLab: Database: Connection pool, dead | Connections to the main database in use where the owner is not alive. |
DEPENDENT | gitlab.database.connection_pool_dead Preprocessing: - JSONPATH: |
GitLab | GitLab: Database: Connection pool, idle | Connections to the main database not in use. |
DEPENDENT | gitlab.database.connection_pool_idle Preprocessing: - JSONPATH: |
GitLab | GitLab: Database: Connection pool, size | Total capacity of the main database connection pool. |
DEPENDENT | gitlab.database.connection_pool_size Preprocessing: - JSONPATH: |
GitLab | GitLab: Database: Connection pool, waiting | Threads currently waiting on this queue. |
DEPENDENT | gitlab.database.connection_pool_waiting Preprocessing: - JSONPATH: |
GitLab | GitLab: Redis: Client requests rate, queues | Number of Redis client requests per second. (Instance: queues) |
DEPENDENT | gitlab.redis.client_requests.queues.rate Preprocessing: - JSONPATH: ⛔️ON_FAIL: DISCARD_VALUE -> - CHANGE_PER_SECOND |
GitLab | GitLab: Redis: Client requests rate, cache | Number of Redis client requests per second. (Instance: cache) |
DEPENDENT | gitlab.redis.client_requests.cache.rate Preprocessing: - JSONPATH: ⛔️ON_FAIL: DISCARD_VALUE -> - CHANGE_PER_SECOND |
GitLab | GitLab: Redis: Client requests rate, shared_state | Number of Redis client requests per second. (Instance: shared_state) |
DEPENDENT | gitlab.redis.client_requests.shared_state.rate Preprocessing: - JSONPATH: ⛔️ON_FAIL: - CHANGE_PER_SECOND |
GitLab | GitLab: Redis: Client exceptions rate, queues | Number of Redis client exceptions per second. (Instance: queues) |
DEPENDENT | gitlab.redis.client_exceptions.queues.rate Preprocessing: - JSONPATH: ⛔️ON_FAIL: DISCARD_VALUE -> - CHANGE_PER_SECOND |
GitLab | GitLab: Redis: Client exceptions rate, cache | Number of Redis client exceptions per second. (Instance: cache) |
DEPENDENT | gitlab.redis.client_exceptions.cache.rate Preprocessing: - JSONPATH: ⛔️ON_FAIL: DISCARD_VALUE -> - CHANGE_PER_SECOND |
GitLab | GitLab: Redis: Client exceptions rate, shared_state | Number of Redis client exceptions per second. (Instance: shared_state) |
DEPENDENT | gitlab.redis.client_exceptions.shared_state.rate Preprocessing: - JSONPATH: ⛔️ON_FAIL: - CHANGE_PER_SECOND |
GitLab | GitLab: Cache: Misses rate, total | The cache read miss count. |
DEPENDENT | gitlab.cache.misses_total.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
GitLab | GitLab: Cache: Operations rate, total | The count of cache operations. |
DEPENDENT | gitlab.cache.operations_total.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
GitLab | GitLab: Ruby: CPU usage per second | Average CPU time used per second. |
DEPENDENT | gitlab.ruby.process_cpu_seconds.rate Preprocessing: - JSONPATH: ⛔️ON_FAIL: - CHANGE_PER_SECOND |
GitLab | GitLab: Ruby: Running threads | Number of running Ruby threads. |
DEPENDENT | gitlab.ruby.threads_running Preprocessing: - JSONPATH: |
GitLab | GitLab: Ruby: File descriptors opened, avg | Average number of opened file descriptors. |
DEPENDENT | gitlab.ruby.file_descriptors.avg Preprocessing: - JSONPATH: |
GitLab | GitLab: Ruby: File descriptors opened, max | Maximum number of opened file descriptors. |
DEPENDENT | gitlab.ruby.file_descriptors.max Preprocessing: - JSONPATH: |
GitLab | GitLab: Ruby: File descriptors opened, min | Minimum number of opened file descriptors. |
DEPENDENT | gitlab.ruby.file_descriptors.min Preprocessing: - JSONPATH: |
GitLab | GitLab: Ruby: File descriptors, max | Maximum number of open file descriptors per process. |
DEPENDENT | gitlab.ruby.process_max_fds Preprocessing: - JSONPATH: |
GitLab | GitLab: Ruby: RSS memory, avg | Average RSS Memory usage in bytes. |
DEPENDENT | gitlab.ruby.process_resident_memory_bytes.avg Preprocessing: - JSONPATH: |
GitLab | GitLab: Ruby: RSS memory, min | Minimum RSS Memory usage in bytes. |
DEPENDENT | gitlab.ruby.process_resident_memory_bytes.min Preprocessing: - JSONPATH: |
GitLab | GitLab: Ruby: RSS memory, max | Maximum RSS Memory usage in bytes. |
DEPENDENT | gitlab.ruby.process_resident_memory_bytes.max Preprocessing: - JSONPATH: |
GitLab | GitLab: HTTP requests rate, total | Number of requests received into the system. |
DEPENDENT | gitlab.http.requests.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
GitLab | GitLab: HTTP requests rate, 5xx | Number of requests that failed with a 5xx HTTP code. |
DEPENDENT | gitlab.http.requests.5xx.rate Preprocessing: - JSONPATH: ⛔️ON_FAIL: - CHANGE_PER_SECOND |
GitLab | GitLab: HTTP requests rate, 4xx | Number of requests that failed with a 4xx HTTP code. |
DEPENDENT | gitlab.http.requests.4xx.rate Preprocessing: - JSONPATH: ⛔️ON_FAIL: - CHANGE_PER_SECOND |
GitLab | GitLab: Transactions per second | Transactions per second (gitlab_transaction_* metrics). |
DEPENDENT | gitlab.transactions.rate Preprocessing: - JSONPATH: ⛔️ON_FAIL: - CHANGE_PER_SECOND |
GitLab: Puma stats | GitLab: Active connections | Number of puma threads processing a request. |
DEPENDENT | gitlab.puma.active_connections[{#SINGLETON}] Preprocessing: - JSONPATH: |
GitLab: Puma stats | GitLab: Workers | Total number of puma workers. |
DEPENDENT | gitlab.puma.workers[{#SINGLETON}] Preprocessing: - JSONPATH: |
GitLab: Puma stats | GitLab: Running workers | The number of booted puma workers. |
DEPENDENT | gitlab.puma.running_workers[{#SINGLETON}] Preprocessing: - JSONPATH: |
GitLab: Puma stats | GitLab: Stale workers | The number of old puma workers. |
DEPENDENT | gitlab.puma.stale_workers[{#SINGLETON}] Preprocessing: - JSONPATH: |
GitLab: Puma stats | GitLab: Running threads | The number of running puma threads. |
DEPENDENT | gitlab.puma.running[{#SINGLETON}] Preprocessing: - JSONPATH: |
GitLab: Puma stats | GitLab: Queued connections | The number of connections in that puma worker's "todo" set waiting for a worker thread. |
DEPENDENT | gitlab.puma.queued_connections[{#SINGLETON}] Preprocessing: - JSONPATH: |
GitLab: Puma stats | GitLab: Pool capacity | The number of requests the puma worker is capable of taking right now. |
DEPENDENT | gitlab.puma.pool_capacity[{#SINGLETON}] Preprocessing: - JSONPATH: |
GitLab: Puma stats | GitLab: Max threads | The maximum number of puma worker threads. |
DEPENDENT | gitlab.puma.max_threads[{#SINGLETON}] Preprocessing: - JSONPATH: |
GitLab: Puma stats | GitLab: Idle threads | The number of spawned puma threads which are not processing a request. |
DEPENDENT | gitlab.puma.idle_threads[{#SINGLETON}] Preprocessing: - JSONPATH: |
GitLab: Puma stats | GitLab: Killer terminations, total | The number of workers terminated by PumaWorkerKiller. |
DEPENDENT | gitlab.puma.killer_terminations_total[{#SINGLETON}] Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
GitLab: Unicorn stats | GitLab: Unicorn: Workers | The number of Unicorn workers |
DEPENDENT | gitlab.unicorn.unicorn_workers[{#SINGLETON}] Preprocessing: - JSONPATH: |
GitLab: Unicorn stats | GitLab: Unicorn: Active connections | The number of active Unicorn connections. |
DEPENDENT | gitlab.unicorn.active_connections[{#SINGLETON}] Preprocessing: - JSONPATH: |
GitLab: Unicorn stats | GitLab: Unicorn: Queued connections | The number of queued Unicorn connections. |
DEPENDENT | gitlab.unicorn.queued_connections[{#SINGLETON}] Preprocessing: - JSONPATH: |
Zabbix raw items | GitLab: Get instance metrics | - |
HTTP_AGENT | gitlab.get_metrics Preprocessing: - CHECK_NOT_SUPPORTED ⛔️ON_FAIL: DISCARD_VALUE -> - PROMETHEUS_TO_JSON |
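For a quick out-of-band check of the readiness and liveness items above, the underlying GitLab health endpoints can be probed directly. A minimal sketch with placeholder URL and token; both endpoints answer with a JSON status document when healthy:

```python
import json
from urllib.request import urlopen

GITLAB_URL = "http://your.gitlab.address"  # placeholder

# GitLab's health probes; the readiness/liveness items map onto these endpoints.
for probe in ("readiness", "liveness"):
    with urlopen(f"{GITLAB_URL}/-/{probe}?token=your_token", timeout=10) as resp:
        body = json.load(resp)
    # A healthy instance reports {"status": "ok", ...}.
    print(probe, body.get("status"))
```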
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
GitLab: GitLab instance is not able to accept traffic | - |
last(/GitLab by HTTP/gitlab.readiness)=0 |
HIGH | Depends on: - GitLab: Liveness check failed |
GitLab: Liveness check failed | The application server is not running or Rails Controllers are deadlocked. |
last(/GitLab by HTTP/gitlab.liveness)=0 |
HIGH | |
GitLab: Version has changed | The GitLab version has changed. Perform Ack to close. |
last(/GitLab by HTTP/gitlab.deployments.version,#1)<>last(/GitLab by HTTP/gitlab.deployments.version,#2) and length(last(/GitLab by HTTP/gitlab.deployments.version))>0 |
INFO | Manual close: YES |
GitLab: Too many Redis queues client exceptions | Too many Redis client exceptions during requests to the Redis instance queues. |
min(/GitLab by HTTP/gitlab.redis.client_exceptions.queues.rate,5m)>{$GITLAB.REDIS.FAIL.MAX.WARN} |
WARNING | |
GitLab: Too many Redis cache client exceptions | Too many Redis client exceptions during requests to the Redis instance cache. |
min(/GitLab by HTTP/gitlab.redis.client_exceptions.cache.rate,5m)>{$GITLAB.REDIS.FAIL.MAX.WARN} |
WARNING | |
GitLab: Too many Redis shared_state client exceptions | Too many Redis client exceptions during requests to the Redis instance shared_state. |
min(/GitLab by HTTP/gitlab.redis.client_exceptions.shared_state.rate,5m)>{$GITLAB.REDIS.FAIL.MAX.WARN} |
WARNING | |
GitLab: Failed to fetch info data | Zabbix has not received metrics data for the last 30 minutes. |
nodata(/GitLab by HTTP/gitlab.ruby.threads_running,30m)=1 |
WARNING | Manual close: YES Depends on: - GitLab: Liveness check failed |
GitLab: Current number of open files is too high | - |
min(/GitLab by HTTP/gitlab.ruby.file_descriptors.max,5m)/last(/GitLab by HTTP/gitlab.ruby.process_max_fds)*100>{$GITLAB.OPEN.FDS.MAX.WARN} |
WARNING | |
GitLab: Too many HTTP requests failures | Too many requests failed on the GitLab instance with a 5xx HTTP code. |
min(/GitLab by HTTP/gitlab.http.requests.5xx.rate,5m)>{$GITLAB.HTTP.FAIL.MAX.WARN} |
WARNING | |
GitLab: Puma instance thread utilization is too high | - |
min(/GitLab by HTTP/gitlab.puma.active_connections[{#SINGLETON}],5m)/last(/GitLab by HTTP/gitlab.puma.max_threads[{#SINGLETON}])*100>{$GITLAB.PUMA.UTILIZATION.MAX.WARN} |
WARNING | |
GitLab: Puma is queueing requests | - |
min(/GitLab by HTTP/gitlab.puma.queued_connections[{#SINGLETON}],15m)>{$GITLAB.PUMA.QUEUE.MAX.WARN} |
WARNING | |
GitLab: Unicorn worker utilization is too high | - |
min(/GitLab by HTTP/gitlab.unicorn.active_connections[{#SINGLETON}],5m)/last(/GitLab by HTTP/gitlab.unicorn.unicorn_workers[{#SINGLETON}])*100>{$GITLAB.UNICORN.UTILIZATION.MAX.WARN} |
WARNING | |
GitLab: Unicorn is queueing requests | - |
min(/GitLab by HTTP/gitlab.unicorn.queued_connections[{#SINGLETON}],5m)>{$GITLAB.UNICORN.QUEUE.MAX.WARN} |
WARNING |
Please report any issues with the template at https://support.zabbix.com.
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums.
For Zabbix version: 6.2 and higher
Official JMX template from the Zabbix distribution. It can be useful for many Java applications (JMX).
Refer to the vendor documentation.
No specific Zabbix configuration is required.
Name | Description | Default | ||||
---|---|---|---|---|---|---|
{$JMX.CPU.LOAD.MAX} | A threshold in percent for CPU utilization trigger. |
85 |
||||
{$JMX.CPU.LOAD.TIME} | The time during which the CPU utilization may exceed the threshold. |
5m |
||||
{$JMX.FILE.DESCRIPTORS.MAX} | A threshold in percent for file descriptors count trigger. |
85 |
||||
{$JMX.FILE.DESCRIPTORS.TIME} | The time during which the file descriptors count may exceed the threshold. |
3m |
||||
{$JMX.HEAP.MEM.USAGE.MAX} | A threshold in percent for Heap memory utilization trigger. |
85 |
||||
{$JMX.HEAP.MEM.USAGE.TIME} | The time during which the Heap memory utilization may exceed the threshold. |
10m |
||||
{$JMX.MEM.POOL.NAME.MATCHES} | This macro is used in memory pool discovery as a filter. |
`Old Gen|G1|Perm Gen|Code Cache|Tenured Gen` |
{$JMX.MP.USAGE.MAX} | A threshold in percent for memory pools utilization trigger. Use a context to change the threshold for a specific pool. |
85 |
||||
{$JMX.MP.USAGE.TIME} | The time during which the memory pools utilization may exceed the threshold. |
10m |
||||
{$JMX.NONHEAP.MEM.USAGE.MAX} | A threshold in percent for Non-heap memory utilization trigger. |
85 |
||||
{$JMX.NONHEAP.MEM.USAGE.TIME} | The time during which the Non-heap memory utilization may exceed the threshold. |
10m |
||||
{$JMX.PASSWORD} | JMX password. |
`` | ||||
{$JMX.USER} | JMX username. |
`` |
There are no template links in this template.
Name | Description | Type | Key and additional info |
---|---|---|---|
Garbage collector discovery | Garbage collectors metrics discovery. |
JMX | jmx.discovery["beans","java.lang:name=*,type=GarbageCollector"] |
Memory pool discovery | Memory pools metrics discovery. |
JMX | jmx.discovery["beans","java.lang:name=*,type=MemoryPool"] Filter: - {#JMXNAME} MATCHES_REGEX |
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
JMX | ClassLoading: Loaded class count | Displays the number of classes that are currently loaded in the Java virtual machine. |
JMX | jmx["java.lang:type=ClassLoading","LoadedClassCount"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
JMX | ClassLoading: Total loaded class count | Displays the total number of classes that have been loaded since the Java virtual machine has started execution. |
JMX | jmx["java.lang:type=ClassLoading","TotalLoadedClassCount"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
JMX | ClassLoading: Unloaded class count | Displays the total number of classes unloaded since the Java virtual machine has started execution. |
JMX | jmx["java.lang:type=ClassLoading","UnloadedClassCount"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
JMX | Compilation: Name of the current JIT compiler | Displays the name of the current Just-in-time (JIT) compiler. |
JMX | jmx["java.lang:type=Compilation","Name"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
JMX | Compilation: Accumulated time spent | Displays the approximate accumulated elapsed time spent in compilation, in seconds. |
JMX | jmx["java.lang:type=Compilation","TotalCompilationTime"] Preprocessing: - MULTIPLIER: - DISCARD_UNCHANGED_HEARTBEAT: |
JMX | Memory: Heap memory committed | Current heap memory allocated. This amount of memory is guaranteed for the Java virtual machine to use. |
JMX | jmx["java.lang:type=Memory","HeapMemoryUsage.committed"] |
JMX | Memory: Heap memory maximum size | Maximum amount of heap that can be used for memory management. This amount of memory is not guaranteed to be available if it is greater than the amount of committed memory. The Java virtual machine may fail to allocate memory even if the amount of used memory does not exceed this maximum size. |
JMX | jmx["java.lang:type=Memory","HeapMemoryUsage.max"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
JMX | Memory: Heap memory used | The amount of heap memory currently in use. |
JMX | jmx["java.lang:type=Memory","HeapMemoryUsage.used"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
JMX | Memory: Non-Heap memory committed | Current memory allocated outside the heap. This amount of memory is guaranteed for the Java virtual machine to use. |
JMX | jmx["java.lang:type=Memory","NonHeapMemoryUsage.committed"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
JMX | Memory: Non-Heap memory maximum size | Maximum amount of non-heap memory that can be used for memory management. This amount of memory is not guaranteed to be available if it is greater than the amount of committed memory. The Java virtual machine may fail to allocate memory even if the amount of used memory does not exceed this maximum size. |
JMX | jmx["java.lang:type=Memory","NonHeapMemoryUsage.max"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
JMX | Memory: Non-Heap memory used | Current memory usage outside the heap. |
JMX | jmx["java.lang:type=Memory","NonHeapMemoryUsage.used"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
JMX | Memory: Object pending finalization count | The approximate number of objects for which finalization is pending. |
JMX | jmx["java.lang:type=Memory","ObjectPendingFinalizationCount"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
JMX | OperatingSystem: File descriptors maximum count | This is the number of file descriptors we can have opened in the same process, as determined by the operating system. You can never have more file descriptors than this number. |
JMX | jmx["java.lang:type=OperatingSystem","MaxFileDescriptorCount"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
JMX | OperatingSystem: File descriptors opened | This is the number of opened file descriptors at the moment. If this reaches the MaxFileDescriptorCount, the application will throw an IOException: Too many open files. This could mean you are opening file descriptors and never closing them. |
JMX | jmx["java.lang:type=OperatingSystem","OpenFileDescriptorCount"] |
JMX | OperatingSystem: Process CPU Load | ProcessCpuLoad represents the CPU load in this process. |
JMX | jmx["java.lang:type=OperatingSystem","ProcessCpuLoad"] Preprocessing: - MULTIPLIER: |
JMX | Runtime: JVM uptime | - |
JMX | jmx["java.lang:type=Runtime","Uptime"] Preprocessing: - MULTIPLIER: |
JMX | Runtime: JVM name | - |
JMX | jmx["java.lang:type=Runtime","VmName"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
JMX | Runtime: JVM version | - |
JMX | jmx["java.lang:type=Runtime","VmVersion"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
JMX | Threading: Daemon thread count | Number of daemon threads running. |
JMX | jmx["java.lang:type=Threading","DaemonThreadCount"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
JMX | Threading: Peak thread count | Maximum number of threads being executed at the same time since the JVM was started or the peak was reset. |
JMX | jmx["java.lang:type=Threading","PeakThreadCount"] |
JMX | Threading: Thread count | The number of threads running at the current moment. |
JMX | jmx["java.lang:type=Threading","ThreadCount"] |
JMX | Threading: Total started thread count | The number of threads started since the JVM was launched. |
JMX | jmx["java.lang:type=Threading","TotalStartedThreadCount"] |
JMX | GarbageCollector: {#JMXNAME} number of collections per second | Displays the total number of collections that have occurred per second. |
JMX | jmx["java.lang:name={#JMXNAME},type=GarbageCollector","CollectionCount"] Preprocessing: - CHANGE_PER_SECOND |
JMX | GarbageCollector: {#JMXNAME} accumulated time spent in collection | Displays the approximate accumulated collection elapsed time, in seconds. |
JMX | jmx["java.lang:name={#JMXNAME},type=GarbageCollector","CollectionTime"] Preprocessing: - MULTIPLIER: - DISCARD_UNCHANGED_HEARTBEAT: |
JMX | Memory pool: {#JMXNAME} committed | Current memory allocated. |
JMX | jmx["java.lang:name={#JMXNAME},type=MemoryPool","Usage.committed"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
JMX | Memory pool: {#JMXNAME} maximum size | Maximum amount of memory that can be used for memory management. This amount of memory is not guaranteed to be available if it is greater than the amount of committed memory. The Java virtual machine may fail to allocate memory even if the amount of used memory does not exceed this maximum size. |
JMX | jmx["java.lang:name={#JMXNAME},type=MemoryPool","Usage.max"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
JMX | Memory pool: {#JMXNAME} used | Current memory usage. |
JMX | jmx["java.lang:name={#JMXNAME},type=MemoryPool","Usage.used"] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Compilation: {HOST.NAME} uses suboptimal JIT compiler | - |
find(/Generic Java JMX/jmx["java.lang:type=Compilation","Name"],,"like","Client")=1 |
INFO | Manual close: YES |
Memory: Heap memory usage is high | A sketch of how this condition evaluates follows this table. |
min(/Generic Java JMX/jmx["java.lang:type=Memory","HeapMemoryUsage.used"],{$JMX.HEAP.MEM.USAGE.TIME})>(last(/Generic Java JMX/jmx["java.lang:type=Memory","HeapMemoryUsage.max"])*{$JMX.HEAP.MEM.USAGE.MAX}/100) and last(/Generic Java JMX/jmx["java.lang:type=Memory","HeapMemoryUsage.max"])>0 |
WARNING | |
Memory: Non-Heap memory usage is high | - |
min(/Generic Java JMX/jmx["java.lang:type=Memory","NonHeapMemoryUsage.used"],{$JMX.NONHEAP.MEM.USAGE.TIME})>(last(/Generic Java JMX/jmx["java.lang:type=Memory","NonHeapMemoryUsage.max"])*{$JMX.NONHEAP.MEM.USAGE.MAX}/100) and last(/Generic Java JMX/jmx["java.lang:type=Memory","NonHeapMemoryUsage.max"])>0 |
WARNING | |
OperatingSystem: Opened file descriptor count is high | - |
min(/Generic Java JMX/jmx["java.lang:type=OperatingSystem","OpenFileDescriptorCount"],{$JMX.FILE.DESCRIPTORS.TIME})>(last(/Generic Java JMX/jmx["java.lang:type=OperatingSystem","MaxFileDescriptorCount"])*{$JMX.FILE.DESCRIPTORS.MAX}/100) |
WARNING | |
OperatingSystem: Process CPU Load is high | - |
min(/Generic Java JMX/jmx["java.lang:type=OperatingSystem","ProcessCpuLoad"],{$JMX.CPU.LOAD.TIME})>{$JMX.CPU.LOAD.MAX} |
AVERAGE | |
Runtime: JVM is not reachable | - |
nodata(/Generic Java JMX/jmx["java.lang:type=Runtime","Uptime"],5m)=1 |
AVERAGE | Manual close: YES |
Runtime: {HOST.NAME} runs suboptimal VM type | - |
find(/Generic Java JMX/jmx["java.lang:type=Runtime","VmName"],,"like","Server")<>1 |
INFO | Manual close: YES |
Memory pool: {#JMXNAME} memory usage is high | - |
min(/Generic Java JMX/jmx["java.lang:name={#JMXNAME},type=MemoryPool","Usage.used"],{$JMX.MP.USAGE.TIME:"{#JMXNAME}"})>(last(/Generic Java JMX/jmx["java.lang:name={#JMXNAME},type=MemoryPool","Usage.max"])*{$JMX.MP.USAGE.MAX:"{#JMXNAME}"}/100) and last(/Generic Java JMX/jmx["java.lang:name={#JMXNAME},type=MemoryPool","Usage.max"])>0 |
WARNING |
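These threshold triggers share one pattern: the worst (minimum) sample over a time window is compared against a percentage of a capacity item, so a single short dip does not fire the alert. A plain-Python illustration of how the heap-memory condition above evaluates (an explanatory sketch, not Zabbix code):

```python
# Mirrors the trigger expression: min(used over window) > max * pct / 100,
# guarded by max > 0, since HeapMemoryUsage.max can be undefined (-1).

def heap_usage_is_high(used_samples, heap_max, max_pct=85):
    """used_samples: HeapMemoryUsage.used values collected over
    {$JMX.HEAP.MEM.USAGE.TIME}; heap_max: last HeapMemoryUsage.max."""
    if heap_max <= 0:  # the trigger's "and ...max>0" guard
        return False
    # min() over the window mirrors Zabbix's min(): every sample must exceed
    # the threshold share of the maximum for the trigger to fire.
    return min(used_samples) > heap_max * max_pct / 100

# Example: used-heap samples from a window, against a 4 GiB maximum.
samples = [3.7e9, 3.8e9, 3.75e9]
print(heap_usage_is_high(samples, heap_max=4 * 1024**3))  # True: >85% throughout
```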
Please report any issues with the template at https://support.zabbix.com
For Zabbix version: 6.2 and higher
Refer to the vendor documentation.
No specific Zabbix configuration is required.
There are no template links in this template.
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
Services | FTP service is running | A minimal Python equivalent of this check follows the trigger table below. |
SIMPLE | net.tcp.service[ftp] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
FTP service is down on {HOST.NAME} | - |
max(/FTP Service/net.tcp.service[ftp],#3)=0 |
AVERAGE |
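For reference, a minimal Python sketch of what this simple check amounts to, with a placeholder host; net.tcp.service[ftp] likewise connects to the port and expects the FTP service-ready greeting:

```python
import socket

# Rough equivalent of net.tcp.service[ftp]: connect to the FTP port and
# check for the "220" service-ready greeting. The host is a placeholder.
def ftp_is_up(host="ftp.example.com", port=21, timeout=5):
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            banner = sock.recv(128).decode(errors="replace")
        return banner.startswith("220")
    except OSError:
        return False

print(ftp_is_up())
```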
Please report any issues with the template at https://support.zabbix.com
For Zabbix version: 6.2 and higher
Official Template for Microsoft Exchange Server 2016.
This template was tested on:
See Zabbix template operation for basic instructions.
Metrics are collected by Zabbix agent active.
1. Import the template into Zabbix.
2. Link the imported template to a host with MS Exchange.
Note that the template doesn't provide information about Windows services state. It is recommended to use it together with the "OS Windows by Zabbix agent active" template.
No specific Zabbix configuration is required.
Name | Description | Default |
---|---|---|
{$AGENT.TIMEOUT} | Timeout after which agent is considered unavailable. |
5m |
{$MS.EXCHANGE.DB.ACTIVE.READ.TIME} | The time during which the active database read operations latency may exceed the threshold. |
5m |
{$MS.EXCHANGE.DB.ACTIVE.READ.WARN} | Threshold for active database read operations latency trigger. |
0.02 |
{$MS.EXCHANGE.DB.ACTIVE.WRITE.TIME} | The time during which the active database write operations latency may exceed the threshold. |
10m |
{$MS.EXCHANGE.DB.ACTIVE.WRITE.WARN} | Threshold for active database write operations latency trigger. |
0.05 |
{$MS.EXCHANGE.DB.FAULTS.TIME} | The time during which the database page faults may exceed the threshold. |
5m |
{$MS.EXCHANGE.DB.FAULTS.WARN} | Threshold for database page faults trigger. |
0 |
{$MS.EXCHANGE.DB.PASSIVE.READ.TIME} | The time during which the passive database read operations latency may exceed the threshold. |
5m |
{$MS.EXCHANGE.DB.PASSIVE.READ.WARN} | Threshold for passive database read operations latency trigger. |
0.2 |
{$MS.EXCHANGE.DB.PASSIVE.WRITE.TIME} | The time during which the passive database write operations latency may exceed the threshold. |
10m |
{$MS.EXCHANGE.LDAP.TIME} | The time during which the LDAP metrics may exceed the threshold. |
5m |
{$MS.EXCHANGE.LDAP.WARN} | Threshold for LDAP triggers. |
0.05 |
{$MS.EXCHANGE.LOG.STALLS.TIME} | The time during which the log records stalled may exceed the threshold. |
10m |
{$MS.EXCHANGE.LOG.STALLS.WARN} | Threshold for log records stalled trigger. |
100 |
{$MS.EXCHANGE.PERF.INTERVAL} | Update interval for perf_counter_en items. |
60 |
{$MS.EXCHANGE.RPC.COUNT.TIME} | The time during which the RPC total requests may exceed the threshold. |
5m |
{$MS.EXCHANGE.RPC.COUNT.WARN} | Threshold for the RPC requests total trigger. |
70 |
{$MS.EXCHANGE.RPC.TIME} | The time during which the RPC requests latency may exceed the threshold. |
10m |
{$MS.EXCHANGE.RPC.WARN} | Threshold for RPC requests latency trigger. |
0.05 |
There are no template links in this template.
Name | Description | Type | Key and additional info |
---|---|---|---|
Databases discovery | Discovery of Exchange databases. |
ZABBIX_ACTIVE | perf_instance.discovery["MSExchange Active Manager"] Preprocessing: - JAVASCRIPT: |
LDAP discovery | Discovery of domain controller. |
ZABBIX_ACTIVE | perf_instance_en.discovery["MSExchange ADAccess Domain Controllers"] |
Web services discovery | Discovery of Exchange web services. |
ZABBIX_ACTIVE | perf_instance_en.discovery["Web Service"] |
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
MS Exchange | MS Exchange: Databases total mounted | Shows the number of active database copies on the server. |
ZABBIX_ACTIVE | perf_counter_en["\MSExchange Active Manager(_total)\Database Mounted"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: 3h |
MS Exchange | MS Exchange [Client Access Server]: ActiveSync: ping command pending | Shows the number of ping commands currently pending in the queue. |
ZABBIX_ACTIVE | perf_counter_en["\MSExchange ActiveSync\Ping Commands Pending", {$MS.EXCHANGE.PERF.INTERVAL}] |
MS Exchange | MS Exchange [Client Access Server]: ActiveSync: requests per second | Shows the number of HTTP requests received from the client via ASP.NET per second. Determines the current Exchange ActiveSync request rate. Used only to determine current user load. |
ZABBIX_ACTIVE | perf_counter_en["\MSExchange ActiveSync\Requests/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
MS Exchange | MS Exchange [Client Access Server]: ActiveSync: sync commands per second | Shows the number of sync commands processed per second. Clients use this command to synchronize items within a folder. |
ZABBIX_ACTIVE | perf_counter_en["\MSExchange ActiveSync\Sync Commands/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
MS Exchange | MS Exchange [Client Access Server]: Autodiscover: requests per second | Shows the number of Autodiscover service requests processed each second. Determines current user load. |
ZABBIX_ACTIVE | perf_counter_en["\MSExchangeAutodiscover\Requests/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
MS Exchange | MS Exchange [Client Access Server]: Availability Service: availability requests per second | Shows the number of requests serviced per second. The request can be only for free/busy information or include suggestions. One request may contain multiple mailboxes. Determines the rate at which Availability service requests are occurring. |
ZABBIX_ACTIVE | perf_counter_en["\MSExchange Availability Service\Availability Requests (sec)", {$MS.EXCHANGE.PERF.INTERVAL}] |
MS Exchange | MS Exchange [Client Access Server]: Outlook Web App: current unique users | Shows the number of unique users currently logged on to Outlook Web App. This value monitors the number of unique active user sessions, so that users are only removed from this counter after they log off or their session times out. Determines current user load. |
ZABBIX_ACTIVE | perf_counter_en["\MSExchange OWA\Current Unique Users", {$MS.EXCHANGE.PERF.INTERVAL}] |
MS Exchange | MS Exchange [Client Access Server]: Outlook Web App: requests per second | Shows the number of requests handled by Outlook Web App per second. Determines current user load. |
ZABBIX_ACTIVE | perf_counter_en["\MSExchange OWA\Requests/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
MS Exchange | MS Exchange [Client Access Server]: MSExchangeWS: requests per second | Shows the number of requests processed each second. Determines current user load. |
ZABBIX_ACTIVE | perf_counter_en["\MSExchangeWS\Requests/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
MS Exchange | MS Exchange: Active agent availability | Availability of active checks on the host. The value of this item corresponds to availability icons in the host list. Possible values: 0 - unknown, 1 - available, 2 - not available. |
INTERNAL | zabbix[host,active_agent,available] |
MS Exchange | Active Manager [{#INSTANCE}]: Database copy role | Database copy active or passive role. |
ZABBIX_ACTIVE | perf_counter_en["\MSExchange Active Manager({#INSTANCE})\Database Copy Role Active"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
MS Exchange | Information Store [{#INSTANCE}]: Database state | Database state. Possible values: 0: Database without any copy and dismounted. 1: Database is a primary database and mounted. 2: Database is a passive copy and the state is healthy. |
ZABBIX_ACTIVE | perf_counter_en["\MSExchangeIS Store({#INSTANCE})\Database State"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
MS Exchange | Information Store [{#INSTANCE}]: Active mailboxes count | Number of active mailboxes in this database. |
ZABBIX_ACTIVE | perf_counter_en["\MSExchangeIS Store({#INSTANCE})\Active mailboxes"] |
MS Exchange | Information Store [{#INSTANCE}]: Page faults per second | Indicates the rate of page faults that can't be serviced because there are no pages available for allocation from the database cache. If this counter is above 0, it's an indication that the MSExchange Database\I/O Database Writes (Attached) Average Latency is too high. |
ZABBIX_ACTIVE | perf_counter_en["\MSExchange Database({#INF.STORE})\Database Page Fault Stalls/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
MS Exchange | Information Store [{#INSTANCE}]: Log records stalled | Indicates the number of log records that can't be added to the log buffers per second because the log buffers are full. The average value should be below 10 per second. Spikes (maximum values) shouldn't be higher than 100 per second. |
ZABBIX_ACTIVE | perf_counter_en["\MSExchange Database({#INF.STORE})\Log Record Stalls/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
MS Exchange | Information Store [{#INSTANCE}]: Log threads waiting | Indicates the number of threads waiting to complete an update of the database by writing their data to the log. |
ZABBIX_ACTIVE | perf_counter_en["\MSExchange Database({#INF.STORE})\Log Threads Waiting", {$MS.EXCHANGE.PERF.INTERVAL}] |
MS Exchange | Information Store [{#INSTANCE}]: RPC requests per second | Shows the number of RPC operations per second for each database instance. |
ZABBIX_ACTIVE | perf_counter_en["\MSExchangeIS Store({#INSTANCE})\RPC Operations/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
MS Exchange | Information Store [{#INSTANCE}]: RPC requests latency | RPC Latency average is the average latency of RPC requests per database. Average is calculated over all RPCs since exrpc32 was loaded. Should be less than 50ms at all times, with spikes less than 100ms. |
ZABBIX_ACTIVE | perf_counter_en["\MSExchangeIS Store({#INSTANCE})\RPC Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}] Preprocessing: - MULTIPLIER: |
MS Exchange | Information Store [{#INSTANCE}]: RPC requests total | Indicates the overall RPC requests currently executing within the information store process. Should be below 70 at all times. |
ZABBIX_ACTIVE | perf_counter_en["\MSExchangeIS Store({#INSTANCE})\RPC requests", {$MS.EXCHANGE.PERF.INTERVAL}] |
MS Exchange | Database Counters [{#INSTANCE}]: Active database read operations per second | Shows the number of database read operations. |
ZABBIX_ACTIVE | perf_counter_en["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Reads (Attached)/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
MS Exchange | Database Counters [{#INSTANCE}]: Active database read operations latency | Shows the average length of time per database read operation. Should be less than 20 ms on average. |
ZABBIX_ACTIVE | perf_counter_en["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Reads (Attached) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}] Preprocessing: - MULTIPLIER: |
MS Exchange | Database Counters [{#INSTANCE}]: Passive database read operations latency | Shows the average length of time per passive database read operation. Should be less than 200ms on average. |
ZABBIX_ACTIVE | perf_counter_en["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Reads (Recovery) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}] Preprocessing: - MULTIPLIER: |
MS Exchange | Database Counters [{#INSTANCE}]: Active database write operations per second | Shows the number of database write operations per second for each attached database instance. |
ZABBIX_ACTIVE | perf_counter_en["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Writes (Attached)/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
MS Exchange | Database Counters [{#INSTANCE}]: Active database write operations latency | Shows the average length of time per database write operation. Should be less than 50ms on average. |
ZABBIX_ACTIVE | perf_counter_en["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Writes (Attached) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}] Preprocessing: - MULTIPLIER: |
MS Exchange | Database Counters [{#INSTANCE}]: Passive database write operations latency | Shows the average length of time, in ms, per passive database write operation. Should be less than the read latency for the same instance, as measured by the MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Reads (Recovery) Average Latency counter. |
ZABBIX_ACTIVE | perf_counter_en["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Writes (Recovery) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}] Preprocessing: - MULTIPLIER: |
MS Exchange | Web Service [{#INSTANCE}]: Current connections | Shows the current number of connections established to each Web Service. |
ZABBIX_ACTIVE | perf_counter_en["\Web Service({#INSTANCE})\Current Connections", {$MS.EXCHANGE.PERF.INTERVAL}] |
MS Exchange | Domain Controller [{#INSTANCE}]: Read time | Time that it takes to send an LDAP read request to the domain controller in question and get a response. Should ideally be below 50 ms; spikes below 100 ms are acceptable. |
ZABBIX_ACTIVE | perf_counter_en["\MSExchange ADAccess Domain Controllers({#INSTANCE})\LDAP Read Time", {$MS.EXCHANGE.PERF.INTERVAL}] Preprocessing: - MULTIPLIER: |
MS Exchange | Domain Controller [{#INSTANCE}]: Search time | Time that it takes to send an LDAP search request and get a response. Should ideally be below 50 ms; spikes below 100 ms are acceptable. |
ZABBIX_ACTIVE | perf_counter_en["\MSExchange ADAccess Domain Controllers({#INSTANCE})\LDAP Search Time", {$MS.EXCHANGE.PERF.INTERVAL}] Preprocessing: - MULTIPLIER: |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
MS Exchange: Zabbix agent: active checks are not available | Active checks are considered unavailable. The agent has not sent a heartbeat for a prolonged time. |
min(/Microsoft Exchange Server 2016 by Zabbix agent active/zabbix[host,active_agent,available],{$AGENT.TIMEOUT})=2 |
HIGH | |
Information Store [{#INSTANCE}]: Page faults are too high | Too many page fault stalls for database "{#INSTANCE}". This counter should be 0 on production servers. |
min(/Microsoft Exchange Server 2016 by Zabbix agent active/perf_counter_en["\MSExchange Database({#INF.STORE})\Database Page Fault Stalls/sec", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.DB.FAULTS.TIME})>{$MS.EXCHANGE.DB.FAULTS.WARN} |
AVERAGE | |
Information Store [{#INSTANCE}]: Log record stalls are too high | Too many stalled log records. The average value should be less than 10 per second. |
avg(/Microsoft Exchange Server 2016 by Zabbix agent active/perf_counter_en["\MSExchange Database({#INF.STORE})\Log Record Stalls/sec", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.LOG.STALLS.TIME})>{$MS.EXCHANGE.LOG.STALLS.WARN} |
AVERAGE | |
Information Store [{#INSTANCE}]: RPC Requests latency is too high | Should be less than 50ms at all times, with spikes less than 100ms. |
min(/Microsoft Exchange Server 2016 by Zabbix agent active/perf_counter_en["\MSExchangeIS Store({#INSTANCE})\RPC Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.RPC.TIME})>{$MS.EXCHANGE.RPC.WARN} |
WARNING | |
Information Store [{#INSTANCE}]: RPC Requests total count is too high | Should be below 70 at all times. |
min(/Microsoft Exchange Server 2016 by Zabbix agent active/perf_counter_en["\MSExchangeIS Store({#INSTANCE})\RPC requests", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.RPC.COUNT.TIME})>{$MS.EXCHANGE.RPC.COUNT.WARN} |
WARNING | |
Database Counters [{#INSTANCE}]: Average read time latency is too high | Should be less than 20ms on average. |
min(/Microsoft Exchange Server 2016 by Zabbix agent active/perf_counter_en["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Reads (Attached) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.DB.ACTIVE.READ.TIME})>{$MS.EXCHANGE.DB.ACTIVE.READ.WARN} |
WARNING | |
Database Counters [{#INSTANCE}]: Average read time latency is too high | Should be less than 200ms on average. |
min(/Microsoft Exchange Server 2016 by Zabbix agent active/perf_counter_en["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Reads (Recovery) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.DB.PASSIVE.READ.TIME})>{$MS.EXCHANGE.DB.PASSIVE.READ.WARN} |
WARNING | |
Database Counters [{#INSTANCE}]: Average write time latency is too high for {$MS.EXCHANGE.DB.ACTIVE.WRITE.TIME} | Should be less than 50ms on average. |
min(/Microsoft Exchange Server 2016 by Zabbix agent active/perf_counter_en["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Writes (Attached) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.DB.ACTIVE.WRITE.TIME})>{$MS.EXCHANGE.DB.ACTIVE.WRITE.WARN} |
WARNING | |
Database Counters [{#INSTANCE}]: Average write time latency is higher than read time latency for {$MS.EXCHANGE.DB.PASSIVE.WRITE.TIME} | Should be less than the read latency for the same instance, as measured by the MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Reads (Recovery) Average Latency counter. |
avg(/Microsoft Exchange Server 2016 by Zabbix agent active/perf_counter_en["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Writes (Recovery) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.DB.PASSIVE.WRITE.TIME})>avg(/Microsoft Exchange Server 2016 by Zabbix agent active/perf_counter_en["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Reads (Recovery) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.DB.PASSIVE.WRITE.TIME}) |
WARNING | |
Domain Controller [{#INSTANCE}]: LDAP read time is too high | Should be less than 50ms at all times, with spikes less than 100ms. |
min(/Microsoft Exchange Server 2016 by Zabbix agent active/perf_counter_en["\MSExchange ADAccess Domain Controllers({#INSTANCE})\LDAP Read Time", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.LDAP.TIME})>{$MS.EXCHANGE.LDAP.WARN} |
AVERAGE | |
Domain Controller [{#INSTANCE}]: LDAP search time is too high | Should be less than 50ms at all times, with spikes less than 100ms. |
min(/Microsoft Exchange Server 2016 by Zabbix agent active/perf_counter_en["\MSExchange ADAccess Domain Controllers({#INSTANCE})\LDAP Search Time", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.LDAP.TIME})>{$MS.EXCHANGE.LDAP.WARN} |
AVERAGE |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template or ask for help with it at ZABBIX forums.
For Zabbix version: 6.2 and higher
Official Template for Microsoft Exchange Server 2016.
This template was tested on:
See Zabbix template operation for basic instructions.
Metrics are collected by Zabbix agent.
1. Import the template into Zabbix.
2. Link the imported template to a host with MS Exchange.
Note that the template doesn't provide information about Windows services state. It is recommended to use it together with the "OS Windows by Zabbix agent" template.
No specific Zabbix configuration is required.
Name | Description | Default |
---|---|---|
{$MS.EXCHANGE.DB.ACTIVE.READ.TIME} | The time during which the active database read operations latency may exceed the threshold. |
5m |
{$MS.EXCHANGE.DB.ACTIVE.READ.WARN} | Threshold for active database read operations latency trigger. |
0.02 |
{$MS.EXCHANGE.DB.ACTIVE.WRITE.TIME} | The time during which the active database write operations latency may exceed the threshold. |
10m |
{$MS.EXCHANGE.DB.ACTIVE.WRITE.WARN} | Threshold for active database write operations latency trigger. |
0.05 |
{$MS.EXCHANGE.DB.FAULTS.TIME} | The time during which the database page faults may exceed the threshold. |
5m |
{$MS.EXCHANGE.DB.FAULTS.WARN} | Threshold for database page faults trigger. |
0 |
{$MS.EXCHANGE.DB.PASSIVE.READ.TIME} | The time during which the passive database read operations latency may exceed the threshold. |
5m |
{$MS.EXCHANGE.DB.PASSIVE.READ.WARN} | Threshold for passive database read operations latency trigger. |
0.2 |
{$MS.EXCHANGE.DB.PASSIVE.WRITE.TIME} | The time during which the passive database write operations latency may exceed the threshold. |
10m |
{$MS.EXCHANGE.LDAP.TIME} | The time during which the LDAP metrics may exceed the threshold. |
5m |
{$MS.EXCHANGE.LDAP.WARN} | Threshold for LDAP triggers. |
0.05 |
{$MS.EXCHANGE.LOG.STALLS.TIME} | The time during which the log records stalled may exceed the threshold. |
10m |
{$MS.EXCHANGE.LOG.STALLS.WARN} | Threshold for log records stalled trigger. |
100 |
{$MS.EXCHANGE.PERF.INTERVAL} | Update interval for perf_counter_en items. |
60 |
{$MS.EXCHANGE.RPC.COUNT.TIME} | The time during which the RPC total requests may exceed the threshold. |
5m |
{$MS.EXCHANGE.RPC.COUNT.WARN} | Threshold for the RPC requests total trigger. |
70 |
{$MS.EXCHANGE.RPC.TIME} | The time during which the RPC requests latency may exceed the threshold. |
10m |
{$MS.EXCHANGE.RPC.WARN} | Threshold for RPC requests latency trigger. |
0.05 |
There are no template links in this template.
Name | Description | Type | Key and additional info |
---|---|---|---|
Databases discovery | Discovery of Exchange databases. |
ZABBIX_PASSIVE | perf_instance.discovery["MSExchange Active Manager"] Preprocessing: - JAVASCRIPT: |
LDAP discovery | Discovery of domain controller. |
ZABBIX_PASSIVE | perf_instance_en.discovery["MSExchange ADAccess Domain Controllers"] |
Web services discovery | Discovery of Exchange web services. |
ZABBIX_PASSIVE | perf_instance_en.discovery["Web Service"] |
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
MS Exchange | MS Exchange: Databases total mounted | Shows the number of active database copies on the server. |
ZABBIX_PASSIVE | perf_counter_en["\MSExchange Active Manager(_total)\Database Mounted"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: 3h |
MS Exchange | MS Exchange [Client Access Server]: ActiveSync: ping command pending | Shows the number of ping commands currently pending in the queue. |
ZABBIX_PASSIVE | perf_counter_en["\MSExchange ActiveSync\Ping Commands Pending", {$MS.EXCHANGE.PERF.INTERVAL}] |
MS Exchange | MS Exchange [Client Access Server]: ActiveSync: requests per second | Shows the number of HTTP requests received from the client via ASP.NET per second. Determines the current Exchange ActiveSync request rate. Used only to determine current user load. |
ZABBIX_PASSIVE | perf_counter_en["\MSExchange ActiveSync\Requests/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
MS Exchange | MS Exchange [Client Access Server]: ActiveSync: sync commands per second | Shows the number of sync commands processed per second. Clients use this command to synchronize items within a folder. |
ZABBIX_PASSIVE | perf_counter_en["\MSExchange ActiveSync\Sync Commands/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
MS Exchange | MS Exchange [Client Access Server]: Autodiscover: requests per second | Shows the number of Autodiscover service requests processed each second. Determines current user load. |
ZABBIX_PASSIVE | perf_counter_en["\MSExchangeAutodiscover\Requests/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
MS Exchange | MS Exchange [Client Access Server]: Availability Service: availability requests per second | Shows the number of requests serviced per second. The request can be only for free/busy information or include suggestions. One request may contain multiple mailboxes. Determines the rate at which Availability service requests are occurring. |
ZABBIX_PASSIVE | perf_counter_en["\MSExchange Availability Service\Availability Requests (sec)", {$MS.EXCHANGE.PERF.INTERVAL}] |
MS Exchange | MS Exchange [Client Access Server]: Outlook Web App: current unique users | Shows the number of unique users currently logged on to Outlook Web App. This value monitors the number of unique active user sessions, so that users are only removed from this counter after they log off or their session times out. Determines current user load. |
ZABBIX_PASSIVE | perf_counter_en["\MSExchange OWA\Current Unique Users", {$MS.EXCHANGE.PERF.INTERVAL}] |
MS Exchange | MS Exchange [Client Access Server]: Outlook Web App: requests per second | Shows the number of requests handled by Outlook Web App per second. Determines current user load. |
ZABBIX_PASSIVE | perf_counter_en["\MSExchange OWA\Requests/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
MS Exchange | MS Exchange [Client Access Server]: MSExchangeWS: requests per second | Shows the number of requests processed each second. Determines current user load. |
ZABBIX_PASSIVE | perf_counter_en["\MSExchangeWS\Requests/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
MS Exchange | Active Manager [{#INSTANCE}]: Database copy role | Database copy active or passive role. |
ZABBIX_PASSIVE | perf_counter_en["\MSExchange Active Manager({#INSTANCE})\Database Copy Role Active"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
MS Exchange | Information Store [{#INSTANCE}]: Database state | Database state. Possible values: 0: Database without any copy and dismounted. 1: Database is a primary database and mounted. 2: Database is a passive copy and the state is healthy. |
ZABBIX_PASSIVE | perf_counter_en["\MSExchangeIS Store({#INSTANCE})\Database State"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
MS Exchange | Information Store [{#INSTANCE}]: Active mailboxes count | Number of active mailboxes in this database. |
ZABBIX_PASSIVE | perf_counter_en["\MSExchangeIS Store({#INSTANCE})\Active mailboxes"] |
MS Exchange | Information Store [{#INSTANCE}]: Page faults per second | Indicates the rate of page faults that can't be serviced because there are no pages available for allocation from the database cache. If this counter is above 0, it's an indication that the MSExchange Database\I/O Database Writes (Attached) Average Latency is too high. |
ZABBIX_PASSIVE | perf_counter_en["\MSExchange Database({#INF.STORE})\Database Page Fault Stalls/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
MS Exchange | Information Store [{#INSTANCE}]: Log records stalled | Indicates the number of log records that can't be added to the log buffers per second because the log buffers are full. The average value should be below 10 per second. Spikes (maximum values) shouldn't be higher than 100 per second. |
ZABBIX_PASSIVE | perf_counter_en["\MSExchange Database({#INF.STORE})\Log Record Stalls/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
MS Exchange | Information Store [{#INSTANCE}]: Log threads waiting | Indicates the number of threads waiting to complete an update of the database by writing their data to the log. |
ZABBIX_PASSIVE | perf_counter_en["\MSExchange Database({#INF.STORE})\Log Threads Waiting", {$MS.EXCHANGE.PERF.INTERVAL}] |
MS Exchange | Information Store [{#INSTANCE}]: RPC requests per second | Shows the number of RPC operations per second for each database instance. |
ZABBIX_PASSIVE | perf_counter_en["\MSExchangeIS Store({#INSTANCE})\RPC Operations/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
MS Exchange | Information Store [{#INSTANCE}]: RPC requests latency | RPC Latency average is the average latency of RPC requests per database. Average is calculated over all RPCs since exrpc32 was loaded. Should be less than 50ms at all times, with spikes less than 100ms. |
ZABBIX_PASSIVE | perf_counter_en["\MSExchangeIS Store({#INSTANCE})\RPC Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}] Preprocessing: - MULTIPLIER: |
MS Exchange | Information Store [{#INSTANCE}]: RPC requests total | Indicates the overall RPC requests currently executing within the information store process. Should be below 70 at all times. |
ZABBIX_PASSIVE | perf_counter_en["\MSExchangeIS Store({#INSTANCE})\RPC requests", {$MS.EXCHANGE.PERF.INTERVAL}] |
MS Exchange | Database Counters [{#INSTANCE}]: Active database read operations per second | Shows the number of database read operations. |
ZABBIX_PASSIVE | perfcounteren["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Reads (Attached)/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
MS Exchange | Database Counters [{#INSTANCE}]: Active database read operations latency | Shows the average length of time per database read operation. Should be less than 20 ms on average. |
ZABBIX_PASSIVE | perfcounteren["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Reads (Attached) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}] Preprocessing: - MULTIPLIER: |
MS Exchange | Database Counters [{#INSTANCE}]: Passive database read operations latency | Shows the average length of time per passive database read operation. Should be less than 200ms on average. |
ZABBIX_PASSIVE | perfcounteren["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Reads (Recovery) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}] Preprocessing: - MULTIPLIER: |
MS Exchange | Database Counters [{#INSTANCE}]: Active database write operations per second | Shows the number of database write operations per second for each attached database instance. |
ZABBIX_PASSIVE | perfcounteren["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Writes (Attached)/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
MS Exchange | Database Counters [{#INSTANCE}]: Active database write operations latency | Shows the average length of time per database write operation. Should be less than 50ms on average. |
ZABBIX_PASSIVE | perfcounteren["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Writes (Attached) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}] Preprocessing: - MULTIPLIER: |
MS Exchange | Database Counters [{#INSTANCE}]: Passive database write operations latency | Shows the average length of time, in ms, per passive database write operation. Should be less than the read latency for the same instance, as measured by the MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Reads (Recovery) Average Latency counter. |
ZABBIX_PASSIVE | perfcounteren["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Writes (Recovery) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}] Preprocessing: - MULTIPLIER: |
MS Exchange | Web Service [{#INSTANCE}]: Current connections | Shows the current number of connections established to the each Web Service. |
ZABBIX_PASSIVE | perfcounteren["\Web Service({#INSTANCE})\Current Connections", {$MS.EXCHANGE.PERF.INTERVAL}] |
MS Exchange | Domain Controller [{#INSTANCE}]: Read time | Time that it takes to send an LDAP read request to the domain controller in question and get a response. Should ideally be below 50 ms; spikes below 100 ms are acceptable. |
ZABBIX_PASSIVE | perfcounteren["\MSExchange ADAccess Domain Controllers({#INSTANCE})\LDAP Read Time", {$MS.EXCHANGE.PERF.INTERVAL}] Preprocessing: - MULTIPLIER: |
MS Exchange | Domain Controller [{#INSTANCE}]: Search time | Time that it takes to send an LDAP search request and get a response. Should ideally be below 50 ms; spikes below 100 ms are acceptable. |
ZABBIX_PASSIVE | perfcounteren["\MSExchange ADAccess Domain Controllers({#INSTANCE})\LDAP Search Time", {$MS.EXCHANGE.PERF.INTERVAL}] Preprocessing: - MULTIPLIER: |
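When tuning the macros above, it can help to read one of these counters directly through the agent before relying on the triggers below; a minimal sketch, assuming the Exchange host is registered in Zabbix as exchange-host (a hypothetical name), a discovered instance is substituted for the LLD macro, and 60 is used as an illustrative collection interval in place of {$MS.EXCHANGE.PERF.INTERVAL}:

```
zabbix_get -s exchange-host -k 'perf_counter_en["\MSExchange Database(<instance>)\Log Threads Waiting", 60]'
```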
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Information Store [{#INSTANCE}]: Page faults is too high | Too many page fault stalls for database "{#INSTANCE}". This counter should be 0 on production servers. |
min(/Microsoft Exchange Server 2016 by Zabbix agent/perf_counter_en["\MSExchange Database({#INF.STORE})\Database Page Fault Stalls/sec", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.DB.FAULTS.TIME})>{$MS.EXCHANGE.DB.FAULTS.WARN} |
AVERAGE | |
Information Store [{#INSTANCE}]: Log records stalls is too high | The number of stalled log records is too high. The average value should be less than 10 per second. |
avg(/Microsoft Exchange Server 2016 by Zabbix agent/perf_counter_en["\MSExchange Database({#INF.STORE})\Log Record Stalls/sec", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.LOG.STALLS.TIME})>{$MS.EXCHANGE.LOG.STALLS.WARN} |
AVERAGE | |
Information Store [{#INSTANCE}]: RPC Requests latency is too high | Should be less than 50ms at all times, with spikes less than 100ms. |
min(/Microsoft Exchange Server 2016 by Zabbix agent/perf_counter_en["\MSExchangeIS Store({#INSTANCE})\RPC Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.RPC.TIME})>{$MS.EXCHANGE.RPC.WARN} |
WARNING | |
Information Store [{#INSTANCE}]: RPC Requests total count is too high | Should be below 70 at all times. |
min(/Microsoft Exchange Server 2016 by Zabbix agent/perf_counter_en["\MSExchangeIS Store({#INSTANCE})\RPC requests", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.RPC.COUNT.TIME})>{$MS.EXCHANGE.RPC.COUNT.WARN} |
WARNING | |
Database Counters [{#INSTANCE}]: Average read time latency is too high | Should be less than 20ms on average. |
min(/Microsoft Exchange Server 2016 by Zabbix agent/perf_counter_en["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Reads (Attached) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.DB.ACTIVE.READ.TIME})>{$MS.EXCHANGE.DB.ACTIVE.READ.WARN} |
WARNING | |
Database Counters [{#INSTANCE}]: Average read time latency is too high | Should be less than 200ms on average. |
min(/Microsoft Exchange Server 2016 by Zabbix agent/perf_counter_en["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Reads (Recovery) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.DB.PASSIVE.READ.TIME})>{$MS.EXCHANGE.DB.PASSIVE.READ.WARN} |
WARNING | |
Database Counters [{#INSTANCE}]: Average write time latency is too high for {$MS.EXCHANGE.DB.ACTIVE.WRITE.TIME} | Should be less than 50ms on average. |
min(/Microsoft Exchange Server 2016 by Zabbix agent/perf_counter_en["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Writes (Attached) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.DB.ACTIVE.WRITE.TIME})>{$MS.EXCHANGE.DB.ACTIVE.WRITE.WARN} |
WARNING | |
Database Counters [{#INSTANCE}]: Average write time latency is higher than read time latency for {$MS.EXCHANGE.DB.PASSIVE.WRITE.TIME} | Should be less than the read latency for the same instance, as measured by the MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Reads (Recovery) Average Latency counter. |
avg(/Microsoft Exchange Server 2016 by Zabbix agent/perf_counter_en["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Writes (Recovery) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.DB.PASSIVE.WRITE.TIME})>avg(/Microsoft Exchange Server 2016 by Zabbix agent/perf_counter_en["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Reads (Recovery) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.DB.PASSIVE.WRITE.TIME}) |
WARNING | |
Domain Controller [{#INSTANCE}]: LDAP read time is too high | Should be less than 50ms at all times, with spikes less than 100ms. |
min(/Microsoft Exchange Server 2016 by Zabbix agent/perf_counter_en["\MSExchange ADAccess Domain Controllers({#INSTANCE})\LDAP Read Time", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.LDAP.TIME})>{$MS.EXCHANGE.LDAP.WARN} |
AVERAGE | |
Domain Controller [{#INSTANCE}]: LDAP search time is too high | Should be less than 50ms at all times, with spikes less than 100ms. |
min(/Microsoft Exchange Server 2016 by Zabbix agent/perf_counter_en["\MSExchange ADAccess Domain Controllers({#INSTANCE})\LDAP Search Time", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.LDAP.TIME})>{$MS.EXCHANGE.LDAP.WARN} |
AVERAGE |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template or ask for help with it at ZABBIX forums.
For Zabbix version: 6.2 and higher.
This template is designed to monitor etcd by Zabbix; it works without any external scripts.
Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The template Etcd by HTTP collects metrics with the help of the HTTP agent from the /metrics endpoint.
Refer to the vendor documentation.
For the users of etcd version <= 3.4:
Note: in etcd v3.5 some metrics have been deprecated. See more details on Upgrade etcd from 3.4 to 3.5. Please upgrade your etcd instance, or use an older Etcd by HTTP template version.
This template has been tested on:
See Zabbix template operation for basic instructions.
Follow these instructions:
1. Make sure that etcd allows the collection of metrics. You can test it by running: curl -L http://localhost:2379/metrics.
2. Check if etcd is accessible from Zabbix proxy or Zabbix server, depending on where you are planning to do the monitoring. To verify it, run curl -L http://<etcd_node_address>:2379/metrics.
3. Add the template to the etcd node. By default, the template uses a client's port. You can configure the metrics endpoint location by adding the --listen-metrics-urls flag (for more details, see the etcd documentation).

Additional points to consider (a quick command sketch follows this list):
- If you have specified a non-standard port or scheme for etcd, don't forget to change the macros {$ETCD.SCHEME} and {$ETCD.PORT}.
- You can set the {$ETCD.USERNAME} and {$ETCD.PASSWORD} macros in the template to use on a host level if necessary.
- You can verify template availability by running: zabbix_get -s etcd-host -k etcd.health.

No specific Zabbix configuration is required.
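The verification steps above can be run in sequence; this is a minimal sketch that assumes etcd runs locally on the default client port 2379 and that the monitored host is registered in Zabbix as etcd-host (both names are assumptions to adapt):

```
# 1. Confirm that the local etcd instance exposes Prometheus-format metrics.
curl -L http://localhost:2379/metrics | head

# 2. Confirm the same endpoint is reachable from the Zabbix server or proxy
#    (replace <etcd_node_address> with the real node address).
curl -L http://<etcd_node_address>:2379/metrics | head

# Optional: serve metrics on a dedicated URL instead of the client port.
etcd --listen-metrics-urls=http://0.0.0.0:2381

# 3. After the template is linked, check the health item end to end.
zabbix_get -s etcd-host -k etcd.health
```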
Name | Description | Default | |
---|---|---|---|
{$ETCD.GRPC.ERRORS.MAX.WARN} | The maximum number of gRPC request failures. |
1 |
|
{$ETCD.GRPC_CODE.MATCHES} | The filter of discoverable gRPC codes. See more details on https://github.com/grpc/grpc/blob/master/doc/statuscodes.md. |
.* |
|
{$ETCD.GRPC_CODE.NOT_MATCHES} | The filter to exclude discovered gRPC codes. See more details on https://github.com/grpc/grpc/blob/master/doc/statuscodes.md. |
CHANGE_IF_NEEDED |
|
{$ETCD.GRPC_CODE.TRIGGER.MATCHES} | The filter of discoverable gRPC codes, which will create triggers. |
`Aborted | Unavailable` |
{$ETCD.HTTP.FAIL.MAX.WARN} | The maximum number of HTTP request failures. |
2 |
|
{$ETCD.LEADER.CHANGES.MAX.WARN} | The maximum number of leader changes. |
5 |
|
{$ETCD.OPEN.FDS.MAX.WARN} | The maximum percentage of used file descriptors. |
90 |
|
{$ETCD.PASSWORD} | - |
`` | |
{$ETCD.PORT} | The port of the etcd API endpoint. |
2379 |
|
{$ETCD.PROPOSAL.FAIL.MAX.WARN} | The maximum number of proposal failures. |
2 |
|
{$ETCD.PROPOSAL.PENDING.MAX.WARN} | The maximum number of proposals in queue. |
5 |
|
{$ETCD.SCHEME} | The request scheme which may be http or https. |
http |
|
{$ETCD.USER} | - |
`` |
There are no template links in this template.
Name | Description | Type | Key and additional info |
---|---|---|---|
gRPC codes discovery | - |
DEPENDENT | etcd.grpc_code.discovery Preprocessing: - PROMETHEUS_TO_JSON: grpc_server_handled_total - JAVASCRIPT: - DISCARD_UNCHANGED_HEARTBEAT: 1h Filter: AND - {#GRPC.CODE} NOT_MATCHES_REGEX {$ETCD.GRPC_CODE.NOT_MATCHES} - {#GRPC.CODE} MATCHES_REGEX {$ETCD.GRPC_CODE.MATCHES} Overrides: trigger |
Peers discovery | - |
DEPENDENT | etcd.peer.discovery Preprocessing: - PROMETHEUS_TO_JSON: |
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
Etcd | Etcd: Service's TCP port state | - |
SIMPLE | net.tcp.service["{$ETCD.SCHEME}","{HOST.CONN}","{$ETCD.PORT}"] Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
Etcd | Etcd: Node health | - |
HTTP_AGENT | etcd.health Preprocessing: - JSONPATH: - BOOLTODECIMAL ⛔️ONFAIL: - DISCARDUNCHANGED_HEARTBEAT: |
Etcd | Etcd: Server is a leader | It defines - whether or not this member is a leader: 1 - it is; 0 - otherwise. |
DEPENDENT | etcd.is.leader Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: - DISCARDUNCHANGEDHEARTBEAT: |
Etcd | Etcd: Server has a leader | It defines - whether or not a leader exists: 1 - it exists; 0 - it does not. |
DEPENDENT | etcd.has.leader Preprocessing: - PROMETHEUSPATTERN: - DISCARDUNCHANGED_HEARTBEAT: |
Etcd | Etcd: Leader changes | The number of leader changes the member has seen since its start. |
DEPENDENT | etcd.leader.changes Preprocessing: - PROMETHEUS_PATTERN: |
Etcd | Etcd: Proposals committed per second | The number of consensus proposals committed. |
DEPENDENT | etcd.proposals.committed.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
Etcd | Etcd: Proposals applied per second | The number of consensus proposals applied. |
DEPENDENT | etcd.proposals.applied.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
Etcd | Etcd: Proposals failed per second | The number of failed proposals seen. |
DEPENDENT | etcd.proposals.failed.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
Etcd | Etcd: Proposals pending | The current number of pending proposals to commit. |
DEPENDENT | etcd.proposals.pending Preprocessing: - PROMETHEUS_PATTERN: |
Etcd | Etcd: Reads per second | The number of read actions (e.g., get/getRecursive) by this member per second. |
DEPENDENT | etcd.reads.rate Preprocessing: - PROMETHEUS_TO_JSON: - JAVASCRIPT: - CHANGE_PER_SECOND |
Etcd | Etcd: Writes per second | The number of writes (e.g., set/compareAndDelete) seen by this member per second. |
DEPENDENT | etcd.writes.rate Preprocessing: - PROMETHEUS_TO_JSON: - JAVASCRIPT: - CHANGE_PER_SECOND |
Etcd | Etcd: Client gRPC received bytes per second | The number of bytes received from gRPC clients per second. |
DEPENDENT | etcd.network.grpc.received.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
Etcd | Etcd: Client gRPC sent bytes per second | The number of bytes sent from gRPC clients per second. |
DEPENDENT | etcd.network.grpc.sent.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
Etcd | Etcd: HTTP requests received | The number of requests received into the system (successfully parsed and authenticated). |
DEPENDENT | etcd.http.requests.rate Preprocessing: - PROMETHEUS_TO_JSON: - JAVASCRIPT: - CHANGE_PER_SECOND |
Etcd | Etcd: HTTP 5XX | The number of handled failures of requests (non-watches), by the method (GET/PUT etc.), with a 5XX code, per second. |
DEPENDENT | etcd.http.requests.5xx.rate Preprocessing: - PROMETHEUS_TO_JSON: - JAVASCRIPT: - CHANGE_PER_SECOND |
Etcd | Etcd: HTTP 4XX | The number of handled failures of requests (non-watches), by the method (GET/PUT etc.), with a 4XX code, per second. |
DEPENDENT | etcd.http.requests.4xx.rate Preprocessing: - PROMETHEUS_TO_JSON: - JAVASCRIPT: - CHANGE_PER_SECOND |
Etcd | Etcd: RPCs received per second | The number of RPC stream messages received on the server. |
DEPENDENT | etcd.grpc.received.rate Preprocessing: - PROMETHEUS_TO_JSON: - JAVASCRIPT: - CHANGE_PER_SECOND |
Etcd | Etcd: RPCs sent per second | The number of gRPC stream messages sent by the server. |
DEPENDENT | etcd.grpc.sent.rate Preprocessing: - PROMETHEUS_TO_JSON: - JAVASCRIPT: - CHANGE_PER_SECOND |
Etcd | Etcd: RPCs started per second | The number of RPCs started on the server. |
DEPENDENT | etcd.grpc.started.rate Preprocessing: - PROMETHEUS_TO_JSON: - JAVASCRIPT: - CHANGE_PER_SECOND |
Etcd | Etcd: Server version | The version of the etcd server. |
DEPENDENT | etcd.server.version Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
Etcd | Etcd: Cluster version | The version of the etcd cluster. |
DEPENDENT | etcd.cluster.version Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
Etcd | Etcd: DB size | The total size of the underlying database. |
DEPENDENT | etcd.db.size Preprocessing: - PROMETHEUS_PATTERN: |
Etcd | Etcd: Keys compacted per second | The number of DB keys compacted per second. |
DEPENDENT | etcd.keys.compacted.rate Preprocessing: - PROMETHEUS_PATTERN: ⛔️ ON_FAIL: - CHANGE_PER_SECOND |
Etcd | Etcd: Keys expired per second | The number of expired keys per second. |
DEPENDENT | etcd.keys.expired.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
Etcd | Etcd: Keys total | The total number of keys. |
DEPENDENT | etcd.keys.total Preprocessing: - PROMETHEUS_PATTERN: |
Etcd | Etcd: Uptime | Etcd server uptime. |
DEPENDENT | etcd.uptime Preprocessing: - PROMETHEUS_PATTERN: - JAVASCRIPT: |
Etcd | Etcd: Virtual memory | The size of virtual memory expressed in bytes. |
DEPENDENT | etcd.virtual.bytes Preprocessing: - PROMETHEUS_PATTERN: |
Etcd | Etcd: Resident memory | The size of resident memory expressed in bytes. |
DEPENDENT | etcd.res.bytes Preprocessing: - PROMETHEUS_PATTERN: |
Etcd | Etcd: CPU | The total user and system CPU time spent in seconds. |
DEPENDENT | etcd.cpu.util Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
Etcd | Etcd: Open file descriptors | The number of open file descriptors. |
DEPENDENT | etcd.open.fds Preprocessing: - PROMETHEUS_PATTERN: |
Etcd | Etcd: Maximum open file descriptors | The maximum number of open file descriptors. |
DEPENDENT | etcd.max.fds Preprocessing: - PROMETHEUS_PATTERN: |
Etcd | Etcd: Deletes per second | The number of deletes seen by this member per second. |
DEPENDENT | etcd.delete.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
Etcd | Etcd: PUT per second | The number of puts seen by this member per second. |
DEPENDENT | etcd.put.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
Etcd | Etcd: Range per second | The number of ranges seen by this member per second. |
DEPENDENT | etcd.range.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
Etcd | Etcd: Transaction per second | The number of transactions seen by this member per second. |
DEPENDENT | etcd.txn.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
Etcd | Etcd: Pending events | The total number of pending events to be sent. |
DEPENDENT | etcd.events.sent.rate Preprocessing: - PROMETHEUS_PATTERN: |
Etcd | Etcd: RPCs completed with code {#GRPC.CODE} | The number of RPCs completed on the server with grpc_code {#GRPC.CODE}. |
DEPENDENT | etcd.grpc.handled.rate[{#GRPC.CODE}] Preprocessing: - PROMETHEUS_TO_JSON: - JAVASCRIPT: - CHANGE_PER_SECOND |
Etcd | Etcd: Etcd peer {#ETCD.PEER}: Bytes sent | The number of bytes sent to a peer with the ID {#ETCD.PEER}. |
DEPENDENT | etcd.bytes.sent.rate[{#ETCD.PEER}] Preprocessing: - PROMETHEUS_PATTERN: ⛔️ ON_FAIL: - CHANGE_PER_SECOND |
Etcd | Etcd: Etcd peer {#ETCD.PEER}: Bytes received | The number of bytes received from a peer with the ID {#ETCD.PEER}. |
DEPENDENT | etcd.bytes.received.rate[{#ETCD.PEER}] Preprocessing: - PROMETHEUS_PATTERN: ⛔️ ON_FAIL: - CHANGE_PER_SECOND |
Etcd | Etcd: Etcd peer {#ETCD.PEER}: Send failures | The number of send failures to a peer with the ID {#ETCD.PEER}. |
DEPENDENT | etcd.sent.fail.rate[{#ETCD.PEER}] Preprocessing: - PROMETHEUS_PATTERN: ⛔️ ON_FAIL: - CHANGE_PER_SECOND |
Etcd | Etcd: Etcd peer {#ETCD.PEER}: Receive failures | The number of receive failures from a peer with the ID {#ETCD.PEER}. |
DEPENDENT | etcd.received.fail.rate[{#ETCD.PEER}] Preprocessing: - PROMETHEUS_PATTERN: ⛔️ ON_FAIL: - CHANGE_PER_SECOND |
Zabbix raw items | Etcd: Get node metrics | - |
HTTP_AGENT | etcd.get_metrics |
Zabbix raw items | Etcd: Get version | - |
HTTP_AGENT | etcd.get_version |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Etcd: Service is unavailable | - |
last(/Etcd by HTTP/net.tcp.service["{$ETCD.SCHEME}","{HOST.CONN}","{$ETCD.PORT}"])=0 |
AVERAGE | Manual close: YES |
Etcd: Node healthcheck failed | See more details on https://etcd.io/docs/v3.5/op-guide/monitoring/#health-check. |
last(/Etcd by HTTP/etcd.health)=0 |
AVERAGE | Depends on: - Etcd: Service is unavailable |
Etcd: Failed to fetch info data | Zabbix has not received data for items for the last 30 minutes. |
nodata(/Etcd by HTTP/etcd.is.leader,30m)=1 |
WARNING | Manual close: YES Depends on: - Etcd: Service is unavailable |
Etcd: Member has no leader | If a member does not have a leader, it is totally unavailable. |
last(/Etcd by HTTP/etcd.has.leader)=0 |
AVERAGE | |
Etcd: Instance has seen too many leader changes | Rapid leadership changes impact the performance of etcd significantly and signal that the leader is unstable, perhaps due to network connectivity issues or excessive load. |
(max(/Etcd by HTTP/etcd.leader.changes,15m)-min(/Etcd by HTTP/etcd.leader.changes,15m))>{$ETCD.LEADER.CHANGES.MAX.WARN} |
WARNING | |
Etcd: Too many proposal failures | Normally related to two issues: temporary failures related to a leader election or longer downtime caused by a loss of quorum in the cluster. |
min(/Etcd by HTTP/etcd.proposals.failed.rate,5m)>{$ETCD.PROPOSAL.FAIL.MAX.WARN} |
WARNING | |
Etcd: Too many proposals are queued to commit | Rising pending proposals suggests there is a high client load, or the member cannot commit proposals. |
min(/Etcd by HTTP/etcd.proposals.pending,5m)>{$ETCD.PROPOSAL.PENDING.MAX.WARN} |
WARNING | |
Etcd: Too many HTTP requests failures | Too many requests failed on the etcd instance with a 5XX HTTP code. |
min(/Etcd by HTTP/etcd.http.requests.5xx.rate,5m)>{$ETCD.HTTP.FAIL.MAX.WARN} |
WARNING | |
Etcd: Server version has changed | The Etcd version has changed. Acknowledge to close manually. |
last(/Etcd by HTTP/etcd.server.version,#1)<>last(/Etcd by HTTP/etcd.server.version,#2) and length(last(/Etcd by HTTP/etcd.server.version))>0 |
INFO | Manual close: YES |
Etcd: Cluster version has changed | The Etcd version has changed. Acknowledge to close manually. |
last(/Etcd by HTTP/etcd.cluster.version,#1)<>last(/Etcd by HTTP/etcd.cluster.version,#2) and length(last(/Etcd by HTTP/etcd.cluster.version))>0 |
INFO | Manual close: YES |
Etcd: Host has been restarted | The host uptime is less than 10 minutes. |
last(/Etcd by HTTP/etcd.uptime)<10m |
INFO | Manual close: YES |
Etcd: Current number of open files is too high | Heavy usage of file descriptors (i.e., near the process's file descriptor limit) indicates a potential file descriptor exhaustion issue. If the file descriptors are exhausted, etcd may panic. |
min(/Etcd by HTTP/etcd.open.fds,5m)/last(/Etcd by HTTP/etcd.max.fds)*100>{$ETCD.OPEN.FDS.MAX.WARN} |
WARNING | |
Etcd: Too many failed gRPC requests with code: {#GRPC.CODE} | - |
min(/Etcd by HTTP/etcd.grpc.handled.rate[{#GRPC.CODE}],5m)>{$ETCD.GRPC.ERRORS.MAX.WARN} |
WARNING |
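To make the file descriptor trigger above concrete: with etcd.max.fds reporting 1024 and etcd.open.fds holding at 950 for five minutes, the expression evaluates 950 / 1024 * 100 ≈ 92.8, which exceeds the default {$ETCD.OPEN.FDS.MAX.WARN} threshold of 90 and raises the warning (the numbers here are illustrative).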
Please report any issues with the template at https://support.zabbix.com.
For Zabbix version: 6.2 and higher
The template to monitor Envoy Proxy by Zabbix that works without any external scripts.
Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
Template Envoy Proxy by HTTP collects metrics by the HTTP agent from the {$ENVOY.METRICS.PATH} endpoint (default: /stats/prometheus).
This template was tested on:
See Zabbix template operation for basic instructions.
Internal service metrics are collected from {$ENVOY.METRICS.PATH} endpoint (default: /stats/prometheus). https://www.envoyproxy.io/docs/envoy/v1.20.0/operations/stats_overview
Don't forget to change macros {$ENVOY.URL}, {$ENVOY.METRICS.PATH}.
Also, see the Macros section for a list of macros used to set trigger values.
NOTE. Some metrics may not be collected depending on your Envoy Proxy instance version and configuration.
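Before linking the template, it is worth confirming that the admin interface actually serves Prometheus-format stats; a minimal check, assuming the default {$ENVOY.URL} of http://localhost:9901 and the default metrics path:

```
curl http://localhost:9901/stats/prometheus | head
```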
No specific Zabbix configuration is required.
Name | Description | Default |
---|---|---|
{$ENVOY.CERT.MIN} | Minimum number of days before certificate expiration used for trigger expression. |
7 |
{$ENVOY.METRICS.PATH} | The path Zabbix will scrape metrics in prometheus format from. |
/stats/prometheus |
{$ENVOY.URL} | Instance URL. |
http://localhost:9901 |
There are no template links in this template.
Name | Description | Type | Key and additional info |
---|---|---|---|
Cluster metrics discovery | - |
DEPENDENT | envoy.lld.cluster Preprocessing: - PROMETHEUS_TO_JSON - JAVASCRIPT: - DISCARD_UNCHANGED_HEARTBEAT: |
HTTP metrics discovery | - |
DEPENDENT | envoy.lld.http Preprocessing: - PROMETHEUS_TO_JSON - JAVASCRIPT: - DISCARD_UNCHANGED_HEARTBEAT: |
Listeners metrics discovery | - |
DEPENDENT | envoy.lld.listeners Preprocessing: - PROMETHEUS_TO_JSON - JAVASCRIPT: - DISCARD_UNCHANGED_HEARTBEAT: |
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
Envoy Proxy | Envoy Proxy: Server state | State of the server. Live - (default) Server is live and serving traffic. Draining - Server is draining listeners in response to external health checks failing. Pre initializing - Server has not yet completed cluster manager initialization. Initializing - Server is running the cluster manager initialization callbacks (e.g., RDS). |
DEPENDENT | envoy.server.state Preprocessing: - PROMETHEUS_PATTERN: - DISCARD_UNCHANGED_HEARTBEAT: |
Envoy Proxy | Envoy Proxy: Server live | 1 if the server is not currently draining, 0 otherwise. |
DEPENDENT | envoy.server.live Preprocessing: - PROMETHEUS_PATTERN: - DISCARD_UNCHANGED_HEARTBEAT: |
Envoy Proxy | Envoy Proxy: Uptime | Current server uptime in seconds. |
DEPENDENT | envoy.server.uptime Preprocessing: - PROMETHEUS_PATTERN: ⛔️ ON_FAIL: |
Envoy Proxy | Envoy Proxy: Certificate expiration, day before | Number of days until the next certificate being managed will expire. |
DEPENDENT | envoy.server.days_until_first_cert_expiring Preprocessing: - PROMETHEUS_PATTERN: |
Envoy Proxy | Envoy Proxy: Server concurrency | Number of worker threads. |
DEPENDENT | envoy.server.concurrency Preprocessing: - PROMETHEUS_PATTERN: |
Envoy Proxy | Envoy Proxy: Memory allocated | Current amount of allocated memory in bytes. Total of both new and old Envoy processes on hot restart. |
DEPENDENT | envoy.server.memory_allocated Preprocessing: - PROMETHEUS_PATTERN: envoy_server_memory_allocated |
Envoy Proxy | Envoy Proxy: Memory heap size | Current reserved heap size in bytes. New Envoy process heap size on hot restart. |
DEPENDENT | envoy.server.memory_heap_size Preprocessing: - PROMETHEUS_PATTERN: |
Envoy Proxy | Envoy Proxy: Memory physical size | Current estimate of total bytes of the physical memory. New Envoy process physical memory size on hot restart. |
DEPENDENT | envoy.server.memory_physical_size Preprocessing: - PROMETHEUS_PATTERN: |
Envoy Proxy | Envoy Proxy: Filesystem, flushed by timer rate | Total number of times internal flush buffers are written to a file due to flush timeout per second. |
DEPENDENT | envoy.filesystem.flushed_by_timer.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
Envoy Proxy | Envoy Proxy: Filesystem, write completed rate | Total number of times a file was written per second. |
DEPENDENT | envoy.filesystem.write_completed.rate Preprocessing: - PROMETHEUS_PATTERN: envoy_filesystem_write_completed - CHANGE_PER_SECOND |
Envoy Proxy | Envoy Proxy: Filesystem, write failed rate | Total number of times an error occurred during a file write operation per second. |
DEPENDENT | envoy.filesystem.write_failed.rate Preprocessing: - PROMETHEUS_PATTERN: envoy_filesystem_write_failed - CHANGE_PER_SECOND |
Envoy Proxy | Envoy Proxy: Filesystem, reopen failed rate | Total number of times a file failed to be opened per second. |
DEPENDENT | envoy.filesystem.reopen_failed.rate Preprocessing: - PROMETHEUS_PATTERN: envoy_filesystem_reopen_failed - CHANGE_PER_SECOND |
Envoy Proxy | Envoy Proxy: Connections, total | Total connections of both new and old Envoy processes. |
DEPENDENT | envoy.server.total_connections Preprocessing: - PROMETHEUS_PATTERN: envoy_server_total_connections |
Envoy Proxy | Envoy Proxy: Connections, parent | Total connections of the old Envoy process on hot restart. |
DEPENDENT | envoy.server.parent_connections Preprocessing: - PROMETHEUS_PATTERN: envoy_server_parent_connections |
Envoy Proxy | Envoy Proxy: Clusters, warming | Number of currently warming (not active) clusters. |
DEPENDENT | envoy.cluster_manager.warming_clusters Preprocessing: - PROMETHEUS_PATTERN: |
Envoy Proxy | Envoy Proxy: Clusters, active | Number of currently active (warmed) clusters. |
DEPENDENT | envoy.cluster_manager.active_clusters Preprocessing: - PROMETHEUS_PATTERN: |
Envoy Proxy | Envoy Proxy: Clusters, added rate | Total clusters added (either via static config or CDS) per second. |
DEPENDENT | envoy.cluster_manager.cluster_added.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
Envoy Proxy | Envoy Proxy: Clusters, modified rate | Total clusters modified (via CDS) per second. |
DEPENDENT | envoy.cluster_manager.cluster_modified.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
Envoy Proxy | Envoy Proxy: Clusters, removed rate | Total clusters removed (via CDS) per second. |
DEPENDENT | envoy.cluster_manager.cluster_removed.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
Envoy Proxy | Envoy Proxy: Clusters, updates rate | Total cluster updates per second. |
DEPENDENT | envoy.cluster_manager.cluster_updated.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
Envoy Proxy | Envoy Proxy: Listeners, active | Number of currently active listeners. |
DEPENDENT | envoy.listener_manager.total_listeners_active Preprocessing: - PROMETHEUS_PATTERN: envoy_listener_manager_total_listeners_active : function : sum |
Envoy Proxy | Envoy Proxy: Listeners, draining | Number of currently draining listeners. |
DEPENDENT | envoy.listener_manager.total_listeners_draining Preprocessing: - PROMETHEUS_PATTERN: envoy_listener_manager_total_listeners_draining : function : sum |
Envoy Proxy | Envoy Proxy: Listener, warming | Number of currently warming listeners. |
DEPENDENT | envoy.listener_manager.total_listeners_warming Preprocessing: - PROMETHEUS_PATTERN: envoy_listener_manager_total_listeners_warming : function : sum |
Envoy Proxy | Envoy Proxy: Listener manager, initialized | A boolean (1 if started and 0 otherwise) that indicates whether listeners have been initialized on workers. |
DEPENDENT | envoy.listener_manager.workers_started Preprocessing: - PROMETHEUS_PATTERN: - DISCARD_UNCHANGED_HEARTBEAT: |
Envoy Proxy | Envoy Proxy: Listeners, create failure | Total failed listener object additions to workers per second. |
DEPENDENT | envoy.listener_manager.listener_create_failure.rate Preprocessing: - PROMETHEUS_PATTERN: envoy_listener_manager_listener_create_failure - CHANGE_PER_SECOND |
Envoy Proxy | Envoy Proxy: Listeners, create success | Total listener objects successfully added to workers per second. |
DEPENDENT | envoy.listener_manager.listener_create_success.rate Preprocessing: - PROMETHEUS_PATTERN: envoy_listener_manager_listener_create_success - CHANGE_PER_SECOND |
Envoy Proxy | Envoy Proxy: Listeners, added | Total listeners added (either via static config or LDS) per second. |
DEPENDENT | envoy.listener_manager.listener_added.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
Envoy Proxy | Envoy Proxy: Listeners, stopped | Total listeners stopped per second. |
DEPENDENT | envoy.listener_manager.listener_stopped.rate Preprocessing: - PROMETHEUS_PATTERN: - CHANGE_PER_SECOND |
Envoy Proxy | Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Membership, total | Current cluster membership total. |
DEPENDENT | envoy.cluster.membership_total["{#CLUSTER_NAME}"] Preprocessing: - PROMETHEUS_PATTERN: |
Envoy Proxy | Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Membership, healthy | Current cluster healthy total (inclusive of both health checking and outlier detection). |
DEPENDENT | envoy.cluster.membership_healthy["{#CLUSTER_NAME}"] Preprocessing: - PROMETHEUS_PATTERN: |
Envoy Proxy | Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Membership, unhealthy | Current cluster unhealthy. |
CALCULATED | envoy.cluster.membership_unhealthy["{#CLUSTER_NAME}"] Expression: last(//envoy.cluster.membership_total["{#CLUSTER_NAME}"]) - last(//envoy.cluster.membership_healthy["{#CLUSTER_NAME}"]) |
Envoy Proxy | Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Membership, degraded | Current cluster degraded total. |
DEPENDENT | envoy.cluster.membership_degraded["{#CLUSTER_NAME}"] Preprocessing: - PROMETHEUS_PATTERN: |
Envoy Proxy | Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Connections, total | Current cluster total connections. |
DEPENDENT | envoy.cluster.upstream_cx_total["{#CLUSTER_NAME}"] Preprocessing: - PROMETHEUS_PATTERN: envoy_cluster_upstream_cx_total{envoy_cluster_name = "{#CLUSTER_NAME}"} : function : sum |
Envoy Proxy | Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Connections, active | Current cluster total active connections. |
DEPENDENT | envoy.cluster.upstream_cx_active["{#CLUSTER_NAME}"] Preprocessing: - PROMETHEUS_PATTERN: envoy_cluster_upstream_cx_active{envoy_cluster_name = "{#CLUSTER_NAME}"} : function : sum |
Envoy Proxy | Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Requests total, rate | Current cluster request total per second. |
DEPENDENT | envoy.cluster.upstream_rq_total.rate["{#CLUSTER_NAME}"] Preprocessing: - PROMETHEUS_PATTERN: envoy_cluster_upstream_rq_total{envoy_cluster_name = "{#CLUSTER_NAME}"} : function : sum - CHANGE_PER_SECOND |
Envoy Proxy | Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Requests timeout, rate | Current cluster requests that timed out waiting for a response per second. |
DEPENDENT | envoy.cluster.upstream_rq_timeout.rate["{#CLUSTER_NAME}"] Preprocessing: - PROMETHEUS_PATTERN: envoy_cluster_upstream_rq_timeout{envoy_cluster_name = "{#CLUSTER_NAME}"} : function : sum - CHANGE_PER_SECOND |
Envoy Proxy | Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Requests completed, rate | Total upstream requests completed per second. |
DEPENDENT | envoy.cluster.upstream_rq_completed.rate["{#CLUSTER_NAME}"] Preprocessing: - PROMETHEUS_PATTERN: envoy_cluster_upstream_rq_completed{envoy_cluster_name = "{#CLUSTER_NAME}"} : function : sum - CHANGE_PER_SECOND |
Envoy Proxy | Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Requests 2xx, rate | Aggregate HTTP response codes per second. |
DEPENDENT | envoy.cluster.upstream_rq_2x.rate["{#CLUSTER_NAME}"] Preprocessing: - PROMETHEUS_PATTERN: envoy_cluster_upstream_rq_xx{envoy_cluster_name = "{#CLUSTER_NAME}", envoy_response_code_class="2"} : function : sum - CHANGE_PER_SECOND |
Envoy Proxy | Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Requests 3xx, rate | Aggregate HTTP response codes per second. |
DEPENDENT | envoy.cluster.upstream_rq_3x.rate["{#CLUSTER_NAME}"] Preprocessing: - PROMETHEUS_PATTERN: envoy_cluster_upstream_rq_xx{envoy_cluster_name = "{#CLUSTER_NAME}", envoy_response_code_class="3"} : function : sum - CHANGE_PER_SECOND |
Envoy Proxy | Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Requests 4xx, rate | Aggregate HTTP response codes per second. |
DEPENDENT | envoy.cluster.upstream_rq_4x.rate["{#CLUSTER_NAME}"] Preprocessing: - PROMETHEUS_PATTERN: envoy_cluster_upstream_rq_xx{envoy_cluster_name = "{#CLUSTER_NAME}", envoy_response_code_class="4"} : function : sum - CHANGE_PER_SECOND |
Envoy Proxy | Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Requests 5xx, rate | Aggregate HTTP response codes per second. |
DEPENDENT | envoy.cluster.upstream_rq_5x.rate["{#CLUSTER_NAME}"] Preprocessing: - PROMETHEUS_PATTERN: envoy_cluster_upstream_rq_xx{envoy_cluster_name = "{#CLUSTER_NAME}", envoy_response_code_class="5"} : function : sum - CHANGE_PER_SECOND |
Envoy Proxy | Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Requests pending | Total active requests pending a connection pool connection. |
DEPENDENT | envoy.cluster.upstream_rq_pending_active["{#CLUSTER_NAME}"] Preprocessing: - PROMETHEUS_PATTERN: |
Envoy Proxy | Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Requests active | Total active requests. |
DEPENDENT | envoy.cluster.upstream_rq_active["{#CLUSTER_NAME}"] Preprocessing: - PROMETHEUS_PATTERN: envoy_cluster_upstream_rq_active{envoy_cluster_name = "{#CLUSTER_NAME}"} : function : sum |
Envoy Proxy | Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Upstream bytes out, rate | Total sent connection bytes per second. |
DEPENDENT | envoy.cluster.upstream_cx_tx_bytes_total.rate["{#CLUSTER_NAME}"] Preprocessing: - PROMETHEUS_PATTERN: envoy_cluster_upstream_cx_tx_bytes_total{envoy_cluster_name = "{#CLUSTER_NAME}"} : function : sum - CHANGE_PER_SECOND |
Envoy Proxy | Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Upstream bytes in, rate | Total received connection bytes per second. |
DEPENDENT | envoy.cluster.upstream_cx_rx_bytes_total.rate["{#CLUSTER_NAME}"] Preprocessing: - PROMETHEUS_PATTERN: envoy_cluster_upstream_cx_rx_bytes_total{envoy_cluster_name = "{#CLUSTER_NAME}"} : function : sum - CHANGE_PER_SECOND |
Envoy Proxy | Envoy Proxy: Listener ["{#LISTENER_ADDRESS}"]: Connections, active | Total active connections. |
DEPENDENT | envoy.listener.downstream_cx_active["{#LISTENER_ADDRESS}"] Preprocessing: - PROMETHEUS_PATTERN: envoy_listener_downstream_cx_active{envoy_listener_address = "{#LISTENER_ADDRESS}"} : function : sum |
Envoy Proxy | Envoy Proxy: Listener ["{#LISTENER_ADDRESS}"]: Connections, rate | Total connections per second. |
DEPENDENT | envoy.listener.downstream_cx_total.rate["{#LISTENER_ADDRESS}"] Preprocessing: - PROMETHEUS_PATTERN: envoy_listener_downstream_cx_total{envoy_listener_address = "{#LISTENER_ADDRESS}"} : function : sum - CHANGE_PER_SECOND |
Envoy Proxy | Envoy Proxy: Listener ["{#LISTENER_ADDRESS}"]: Sockets, undergoing | Sockets currently undergoing listener filter processing. |
DEPENDENT | envoy.listener.downstream_pre_cx_active["{#LISTENER_ADDRESS}"] Preprocessing: - PROMETHEUS_PATTERN: |
Envoy Proxy | Envoy Proxy: HTTP ["{#CONN_MANAGER}"]: Requests, rate | Total requests per second. |
DEPENDENT | envoy.http.downstream_rq_total.rate["{#CONN_MANAGER}"] Preprocessing: - PROMETHEUS_PATTERN: envoy_http_downstream_rq_total{envoy_http_conn_manager_prefix = "{#CONN_MANAGER}"} : function : sum - CHANGE_PER_SECOND |
Envoy Proxy | Envoy Proxy: HTTP ["{#CONN_MANAGER}"]: Requests, active | Total active requests. |
DEPENDENT | envoy.http.downstream_rq_active["{#CONN_MANAGER}"] Preprocessing: - PROMETHEUS_PATTERN: envoy_http_downstream_rq_active{envoy_http_conn_manager_prefix = "{#CONN_MANAGER}"} : function : sum |
Envoy Proxy | Envoy Proxy: HTTP ["{#CONN_MANAGER}"]: Requests timeout, rate | Total requests closed due to a timeout on the request path per second. |
DEPENDENT | envoy.http.downstream_rq_timeout["{#CONN_MANAGER}"] Preprocessing: - PROMETHEUS_PATTERN: envoy_http_downstream_rq_timeout{envoy_http_conn_manager_prefix = "{#CONN_MANAGER}"} : function : sum - CHANGE_PER_SECOND |
Envoy Proxy | Envoy Proxy: HTTP ["{#CONN_MANAGER}"]: Connections, rate | Total connections per second. |
DEPENDENT | envoy.http.downstream_cx_total["{#CONN_MANAGER}"] Preprocessing: - PROMETHEUS_PATTERN: envoy_http_downstream_cx_total{envoy_http_conn_manager_prefix = "{#CONN_MANAGER}"} : function : sum - CHANGE_PER_SECOND |
Envoy Proxy | Envoy Proxy: HTTP ["{#CONN_MANAGER}"]: Connections, active | Total active connections. |
DEPENDENT | envoy.http.downstream_cx_active["{#CONN_MANAGER}"] Preprocessing: - PROMETHEUS_PATTERN: envoy_http_downstream_cx_active{envoy_http_conn_manager_prefix = "{#CONN_MANAGER}"} : function : sum |
Envoy Proxy | Envoy Proxy: HTTP ["{#CONN_MANAGER}"]: Bytes in, rate | Total bytes received per second. |
DEPENDENT | envoy.http.downstream_cx_rx_bytes_total.rate["{#CONN_MANAGER}"] Preprocessing: - PROMETHEUS_PATTERN: envoy_http_downstream_cx_rx_bytes_total{envoy_http_conn_manager_prefix = "{#CONN_MANAGER}"} : function : sum - CHANGE_PER_SECOND |
Envoy Proxy | Envoy Proxy: HTTP ["{#CONN_MANAGER}"]: Bytes out, rate | Total bytes sent per second. |
DEPENDENT | envoy.http.downstream_cx_tx_bytes_total.rate["{#CONN_MANAGER}"] Preprocessing: - PROMETHEUS_PATTERN: envoy_http_downstream_cx_tx_bytes_total{envoy_http_conn_manager_prefix = "{#CONN_MANAGER}"} : function : sum - CHANGE_PER_SECOND |
Zabbix raw items | Envoy Proxy: Get node metrics | Get server metrics. |
HTTP_AGENT | envoy.get_metrics Preprocessing: - CHECK_NOT_SUPPORTED ⛔️ ON_FAIL: DISCARD_VALUE -> |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Envoy Proxy: Server state is not live | - |
last(/Envoy Proxy by HTTP/envoy.server.state) > 0 |
AVERAGE | |
Envoy Proxy: Service has been restarted | Uptime is less than 10 minutes. |
last(/Envoy Proxy by HTTP/envoy.server.uptime)<10m |
INFO | Manual close: YES |
Envoy Proxy: Failed to fetch metrics data | Zabbix has not received data for items for the last 10 minutes. |
nodata(/Envoy Proxy by HTTP/envoy.server.uptime,10m)=1 |
WARNING | Manual close: YES |
Envoy Proxy: SSL certificate expires soon | Please check certificate. Less than {$ENVOY.CERT.MIN} days left until the next certificate being managed will expire. |
last(/Envoy Proxy by HTTP/envoy.server.days_until_first_cert_expiring)<{$ENVOY.CERT.MIN} |
WARNING | |
Envoy Proxy: There are unhealthy clusters | - |
last(/Envoy Proxy by HTTP/envoy.cluster.membership_unhealthy["{#CLUSTER_NAME}"]) > 0 |
AVERAGE |
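As a worked example of the calculated item behind the last trigger: Membership, unhealthy is simply membership_total minus membership_healthy, so a cluster reporting membership_total = 5 and membership_healthy = 4 yields 1 unhealthy host, and last(...) > 0 fires the trigger (the numbers here are illustrative).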
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template or ask for help with it at ZABBIX forums.
For Zabbix version: 6.2 and higher
The template to monitor Elasticsearch by Zabbix that works without any external scripts.
It works with both standalone and cluster instances.
The metrics are collected in one pass remotely using an HTTP agent.
The values are obtained from the _cluster/health, _cluster/stats, and _nodes/stats REST API requests.
This template was tested on:
See Zabbix template operation for basic instructions.
You can set the {$ELASTICSEARCH.USERNAME} and {$ELASTICSEARCH.PASSWORD} macros in the template for use on the host level. If you use an atypical location of the ES API, don't forget to change the macros {$ELASTICSEARCH.SCHEME} and {$ELASTICSEARCH.PORT}.
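A quick way to confirm that the REST API endpoints used by the template are reachable is to query them directly; a minimal sketch, assuming the default scheme and port and a hypothetical node name es-node (add -u <username>:<password> only if security is enabled):

```
curl http://es-node:9200/_cluster/health
curl http://es-node:9200/_cluster/stats
curl http://es-node:9200/_nodes/stats
```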
No specific Zabbix configuration is required.
Name | Description | Default |
---|---|---|
{$ELASTICSEARCH.FETCH_LATENCY.MAX.WARN} | Maximum of fetch latency in milliseconds for trigger expression. |
100 |
{$ELASTICSEARCH.FLUSH_LATENCY.MAX.WARN} | Maximum of flush latency in milliseconds for trigger expression. |
100 |
{$ELASTICSEARCH.HEAP_USED.MAX.CRIT} | The maximum percentage of JVM heap in use for the critical trigger expression. |
95 |
{$ELASTICSEARCH.HEAP_USED.MAX.WARN} | The maximum percentage of JVM heap in use for the warning trigger expression. |
85 |
{$ELASTICSEARCH.INDEXING_LATENCY.MAX.WARN} | Maximum of indexing latency in milliseconds for trigger expression. |
100 |
{$ELASTICSEARCH.PASSWORD} | The password of the Elasticsearch. |
`` |
{$ELASTICSEARCH.PORT} | The port of the Elasticsearch host. |
9200 |
{$ELASTICSEARCH.QUERY_LATENCY.MAX.WARN} | Maximum of query latency in milliseconds for trigger expression. |
100 |
{$ELASTICSEARCH.RESPONSE_TIME.MAX.WARN} | The ES cluster maximum response time in seconds for trigger expression. |
10s |
{$ELASTICSEARCH.SCHEME} | The scheme of the Elasticsearch (http/https). |
http |
{$ELASTICSEARCH.USERNAME} | The username of the Elasticsearch. |
`` |
There are no template links in this template.
Name | Description | Type | Key and additional info |
---|---|---|---|
Cluster nodes discovery | Discovers ES cluster nodes. |
HTTP_AGENT | es.nodes.discovery Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
ES cluster | ES: Service status | Checks if the service is running and accepting TCP connections. |
SIMPLE | net.tcp.service["{$ELASTICSEARCH.SCHEME}","{HOST.CONN}","{$ELASTICSEARCH.PORT}"] Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
ES cluster | ES: Service response time | Checks performance of the TCP service. |
SIMPLE | net.tcp.service.perf["{$ELASTICSEARCH.SCHEME}","{HOST.CONN}","{$ELASTICSEARCH.PORT}"] |
ES cluster | ES: Cluster health status | Health status of the cluster, based on the state of its primary and replica shards. Statuses are: green All shards are assigned. yellow All primary shards are assigned, but one or more replica shards are unassigned. If a node in the cluster fails, some data could be unavailable until that node is repaired. red One or more primary shards are unassigned, so some data is unavailable. This can occur briefly during cluster startup as primary shards are assigned. |
DEPENDENT | es.cluster.status Preprocessing: - JSONPATH: - JAVASCRIPT: - DISCARDUNCHANGEDHEARTBEAT: |
ES cluster | ES: Number of nodes | The number of nodes within the cluster. |
DEPENDENT | es.cluster.numberofnodes Preprocessing: - JSONPATH: - DISCARDUNCHANGEDHEARTBEAT: |
ES cluster | ES: Number of data nodes | The number of nodes that are dedicated to data nodes. |
DEPENDENT | es.cluster.numberofdatanodes Preprocessing: - JSONPATH: - DISCARD UNCHANGED_HEARTBEAT:1h |
ES cluster | ES: Number of relocating shards | The number of shards that are under relocation. |
DEPENDENT | es.cluster.relocating_shards Preprocessing: - JSONPATH: |
ES cluster | ES: Number of initializing shards | The number of shards that are under initialization. |
DEPENDENT | es.cluster.initializing_shards Preprocessing: - JSONPATH: |
ES cluster | ES: Number of unassigned shards | The number of shards that are not allocated. |
DEPENDENT | es.cluster.unassigned_shards Preprocessing: - JSONPATH: |
ES cluster | ES: Delayed unassigned shards | The number of shards whose allocation has been delayed by the timeout settings. |
DEPENDENT | es.cluster.delayedunassignedshards Preprocessing: - JSONPATH: |
ES cluster | ES: Number of pending tasks | The number of cluster-level changes that have not yet been executed. |
DEPENDENT | es.cluster.numberofpending_tasks Preprocessing: - JSONPATH: |
ES cluster | ES: Task max waiting in queue | The time expressed in seconds since the earliest initiated task is waiting for being performed. |
DEPENDENT | es.cluster.taskmaxwaitinginqueue Preprocessing: - JSONPATH: - MULTIPLIER: |
ES cluster | ES: Inactive shards percentage | The ratio of inactive shards in the cluster expressed as a percentage. |
DEPENDENT | es.cluster.inactiveshardspercentasnumber Preprocessing: - JSONPATH: - JAVASCRIPT: |
ES cluster | ES: Cluster uptime | Uptime duration in seconds since JVM has last started. |
DEPENDENT | es.nodes.jvm.max_uptime Preprocessing: - JSONPATH: - MULTIPLIER: |
ES cluster | ES: Number of non-deleted documents | The total number of non-deleted documents across all primary shards assigned to the selected nodes. This number is based on the documents in Lucene segments and may include the documents from nested fields. |
DEPENDENT | es.indices.docs.count Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
ES cluster | ES: Indices with shards assigned to nodes | The total number of indices with shards assigned to the selected nodes. |
DEPENDENT | es.indices.count Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
ES cluster | ES: Total size of all file stores | The total size in bytes of all file stores across all selected nodes. |
DEPENDENT | es.nodes.fs.total_in_bytes Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
ES cluster | ES: Total available size to JVM in all file stores | The total number of bytes available to JVM in the file stores across all selected nodes. Depending on OS or process-level restrictions, this number may be less than nodes.fs.free_in_bytes. This is the actual amount of free disk space the selected Elasticsearch nodes can use. |
DEPENDENT | es.nodes.fs.available_in_bytes Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
ES cluster | ES: Nodes with the data role | The number of selected nodes with the data role. |
DEPENDENT | es.nodes.count.data Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
ES cluster | ES: Nodes with the ingest role | The number of selected nodes with the ingest role. |
DEPENDENT | es.nodes.count.ingest Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
ES cluster | ES: Nodes with the master role | The number of selected nodes with the master role. |
DEPENDENT | es.nodes.count.master Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
ES cluster | ES {#ES.NODE}: Total size | Total size (in bytes) of all file stores. |
DEPENDENT | es.node.fs.total.total_in_bytes[{#ES.NODE}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
ES cluster | ES {#ES.NODE}: Total available size | The total number of bytes available to this Java virtual machine on all file stores. Depending on OS or process-level restrictions, this might appear less than fs.total.free_in_bytes. This is the actual amount of free disk space the Elasticsearch node can utilize. |
DEPENDENT | es.node.fs.total.available_in_bytes[{#ES.NODE}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
ES cluster | ES {#ES.NODE}: Node uptime | JVM uptime in seconds. |
DEPENDENT | es.node.jvm.uptime[{#ES.NODE}] Preprocessing: - JSONPATH: - MULTIPLIER: |
ES cluster | ES {#ES.NODE}: Maximum JVM memory available for use | The maximum amount of memory, in bytes, available for use by the heap. |
DEPENDENT | es.node.jvm.mem.heap_max_in_bytes[{#ES.NODE}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 1d |
ES cluster | ES {#ES.NODE}: Amount of JVM heap currently in use | The memory, in bytes, currently in use by the heap. |
DEPENDENT | es.node.jvm.mem.heap_used_in_bytes[{#ES.NODE}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 1h |
ES cluster | ES {#ES.NODE}: Percent of JVM heap currently in use | The percentage of memory currently in use by the heap. |
DEPENDENT | es.node.jvm.mem.heap_used_percent[{#ES.NODE}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
ES cluster | ES {#ES.NODE}: Amount of JVM heap committed | The amount of memory, in bytes, available for use by the heap. |
DEPENDENT | es.node.jvm.mem.heap_committed_in_bytes[{#ES.NODE}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 1h |
ES cluster | ES {#ES.NODE}: Number of open HTTP connections | The number of currently open HTTP connections for the node. |
DEPENDENT | es.node.http.current_open[{#ES.NODE}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 1h |
ES cluster | ES {#ES.NODE}: Rate of HTTP connections opened | The number of HTTP connections opened for the node per second. |
DEPENDENT | es.node.http.opened.rate[{#ES.NODE}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
ES cluster | ES {#ES.NODE}: Time spent throttling operations | Time in seconds spent throttling operations for the last measuring span. |
DEPENDENT | es.node.indices.indexing.throttle_time[{#ES.NODE}] Preprocessing: - JSONPATH: - MULTIPLIER: - SIMPLE_CHANGE |
ES cluster | ES {#ES.NODE}: Time spent throttling recovery operations | Time in seconds spent throttling recovery operations for the last measuring span. |
DEPENDENT | es.node.indices.recovery.throttle_time[{#ES.NODE}] Preprocessing: - JSONPATH: - MULTIPLIER: - SIMPLE_CHANGE |
ES cluster | ES {#ES.NODE}: Time spent throttling merge operations | Time in seconds spent throttling merge operations for the last measuring span. |
DEPENDENT | es.node.indices.merges.total_throttled_time[{#ES.NODE}] Preprocessing: - JSONPATH: - MULTIPLIER: - SIMPLE_CHANGE |
ES cluster | ES {#ES.NODE}: Rate of queries | The number of query operations per second. |
DEPENDENT | es.node.indices.search.query.rate[{#ES.NODE}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
ES cluster | ES {#ES.NODE}: Time spent performing query | Time in seconds spent performing query operations for the last measuring span. |
DEPENDENT | es.node.indices.search.query_time[{#ES.NODE}] Preprocessing: - JSONPATH: - MULTIPLIER: - SIMPLE_CHANGE |
ES cluster | ES {#ES.NODE}: Query latency | The average query latency calculated by sampling the total number of queries and the total elapsed time at regular intervals. |
CALCULATED | es.node.indices.search.query_latency[{#ES.NODE}] Expression: change(//es.node.indices.search.query_time_in_millis[{#ES.NODE}]) / ( change(//es.node.indices.search.query_total[{#ES.NODE}]) + (change(//es.node.indices.search.query_total[{#ES.NODE}]) = 0) ) |
ES cluster | ES {#ES.NODE}: Current query operations | The number of query operations currently running. |
DEPENDENT | es.node.indices.search.query_current[{#ES.NODE}] Preprocessing: - JSONPATH: |
ES cluster | ES {#ES.NODE}: Rate of fetch | The number of fetch operations per second. |
DEPENDENT | es.node.indices.search.fetch.rate[{#ES.NODE}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
ES cluster | ES {#ES.NODE}: Time spent performing fetch | Time in seconds spent performing fetch operations for the last measuring span. |
DEPENDENT | es.node.indices.search.fetch_time[{#ES.NODE}] Preprocessing: - JSONPATH: - MULTIPLIER: - SIMPLE_CHANGE |
ES cluster | ES {#ES.NODE}: Fetch latency | The average fetch latency calculated by sampling the total number of fetches and the total elapsed time at regular intervals. |
CALCULATED | es.node.indices.search.fetch_latency[{#ES.NODE}] Expression: change(//es.node.indices.search.fetch_time_in_millis[{#ES.NODE}]) / ( change(//es.node.indices.search.fetch_total[{#ES.NODE}]) + (change(//es.node.indices.search.fetch_total[{#ES.NODE}]) = 0) ) |
ES cluster | ES {#ES.NODE}: Current fetch operations | The number of fetch operations currently running. |
DEPENDENT | es.node.indices.search.fetch_current[{#ES.NODE}] Preprocessing: - JSONPATH: |
ES cluster | ES {#ES.NODE}: Write thread pool executor tasks completed | The number of tasks completed by the write thread pool executor. |
DEPENDENT | es.node.thread_pool.write.completed.rate[{#ES.NODE}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
ES cluster | ES {#ES.NODE}: Write thread pool active threads | The number of active threads in the write thread pool. |
DEPENDENT | es.node.thread_pool.write.active[{#ES.NODE}] Preprocessing: - JSONPATH: |
ES cluster | ES {#ES.NODE}: Write thread pool tasks in queue | The number of tasks in queue for the write thread pool. |
DEPENDENT | es.node.thread_pool.write.queue[{#ES.NODE}] Preprocessing: - JSONPATH: |
ES cluster | ES {#ES.NODE}: Write thread pool executor tasks rejected | The number of tasks rejected by the write thread pool executor. |
DEPENDENT | es.node.thread_pool.write.rejected.rate[{#ES.NODE}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
ES cluster | ES {#ES.NODE}: Search thread pool executor tasks completed | The number of tasks completed by the search thread pool executor. |
DEPENDENT | es.node.thread_pool.search.completed.rate[{#ES.NODE}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
ES cluster | ES {#ES.NODE}: Search thread pool active threads | The number of active threads in the search thread pool. |
DEPENDENT | es.node.thread_pool.search.active[{#ES.NODE}] Preprocessing: - JSONPATH: |
ES cluster | ES {#ES.NODE}: Search thread pool tasks in queue | The number of tasks in queue for the search thread pool. |
DEPENDENT | es.node.thread_pool.search.queue[{#ES.NODE}] Preprocessing: - JSONPATH: |
ES cluster | ES {#ES.NODE}: Search thread pool executor tasks rejected | The number of tasks rejected by the search thread pool executor. |
DEPENDENT | es.node.thread_pool.search.rejected.rate[{#ES.NODE}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
ES cluster | ES {#ES.NODE}: Refresh thread pool executor tasks completed | The number of tasks completed by the refresh thread pool executor. |
DEPENDENT | es.node.thread_pool.refresh.completed.rate[{#ES.NODE}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
ES cluster | ES {#ES.NODE}: Refresh thread pool active threads | The number of active threads in the refresh thread pool. |
DEPENDENT | es.node.thread_pool.refresh.active[{#ES.NODE}] Preprocessing: - JSONPATH: |
ES cluster | ES {#ES.NODE}: Refresh thread pool tasks in queue | The number of tasks in queue for the refresh thread pool. |
DEPENDENT | es.node.thread_pool.refresh.queue[{#ES.NODE}] Preprocessing: - JSONPATH: |
ES cluster | ES {#ES.NODE}: Refresh thread pool executor tasks rejected | The number of tasks rejected by the refresh thread pool executor. |
DEPENDENT | es.node.thread_pool.refresh.rejected.rate[{#ES.NODE}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
ES cluster | ES {#ES.NODE}: Indexing latency | The average indexing latency calculated from the available index_total and index_time_in_millis metrics. |
CALCULATED | es.node.indices.indexing.index_latency[{#ES.NODE}] Expression: change(//es.node.indices.indexing.index_time_in_millis[{#ES.NODE}]) / ( change(//es.node.indices.indexing.index_total[{#ES.NODE}]) + (change(//es.node.indices.indexing.index_total[{#ES.NODE}]) = 0) ) |
ES cluster | ES {#ES.NODE}: Current indexing operations | The number of indexing operations currently running. |
DEPENDENT | es.node.indices.indexing.index_current[{#ES.NODE}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 1h |
ES cluster | ES {#ES.NODE}: Flush latency | The average flush latency calculated from the available flush.total and flush.total_time_in_millis metrics. |
CALCULATED | es.node.indices.flush.latency[{#ES.NODE}] Expression: change(//es.node.indices.flush.total_time_in_millis[{#ES.NODE}]) / ( change(//es.node.indices.flush.total[{#ES.NODE}]) + (change(//es.node.indices.flush.total[{#ES.NODE}]) = 0) ) |
ES cluster | ES {#ES.NODE}: Rate of index refreshes | The number of refresh operations per second. |
DEPENDENT | es.node.indices.refresh.rate[{#ES.NODE}] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
ES cluster | ES {#ES.NODE}: Time spent performing refresh | Time in seconds spent performing refresh operations for the last measuring span. |
DEPENDENT | es.node.indices.refresh.time[{#ES.NODE}] Preprocessing: - JSONPATH: - MULTIPLIER: - SIMPLE_CHANGE |
Zabbix raw items | ES: Get cluster health | Returns the health status of a cluster. |
HTTP_AGENT | es.cluster.get_health |
Zabbix raw items | ES: Get cluster stats | Returns cluster statistics. |
HTTP_AGENT | es.cluster.get_stats |
Zabbix raw items | ES: Get nodes stats | Returns cluster nodes statistics. |
HTTP_AGENT | es.nodes.get_stats |
Zabbix raw items | ES {#ES.NODE}: Total number of query | The total number of query operations. |
DEPENDENT | es.node.indices.search.query_total[{#ES.NODE}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 1h |
Zabbix raw items | ES {#ES.NODE}: Total time spent performing query | Time in milliseconds spent performing query operations. |
DEPENDENT | es.node.indices.search.query_time_in_millis[{#ES.NODE}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 1h |
Zabbix raw items | ES {#ES.NODE}: Total number of fetch | The total number of fetch operations. |
DEPENDENT | es.node.indices.search.fetch_total[{#ES.NODE}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 1h |
Zabbix raw items | ES {#ES.NODE}: Total time spent performing fetch | Time in milliseconds spent performing fetch operations. |
DEPENDENT | es.node.indices.search.fetch_time_in_millis[{#ES.NODE}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 1h |
Zabbix raw items | ES {#ES.NODE}: Total number of indexing | The total number of indexing operations. |
DEPENDENT | es.node.indices.indexing.index_total[{#ES.NODE}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 1h |
Zabbix raw items | ES {#ES.NODE}: Total time spent performing indexing | Total time in milliseconds spent performing indexing operations. |
DEPENDENT | es.node.indices.indexing.index_time_in_millis[{#ES.NODE}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 1h |
Zabbix raw items | ES {#ES.NODE}: Total number of index flushes to disk | The total number of flush operations. |
DEPENDENT | es.node.indices.flush.total[{#ES.NODE}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
Zabbix raw items | ES {#ES.NODE}: Total time spent on flushing indices to disk | Total time in milliseconds spent performing flush operations. |
DEPENDENT | es.node.indices.flush.total_time_in_millis[{#ES.NODE}] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 1h |
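The Indexing latency and Flush latency calculated items above derive an average per-operation latency from a pair of monotonically increasing counters. Below is a minimal Python sketch of the same guarded division; the function and variable names are illustrative, not template keys.

```python
def average_latency(time_ms_prev, time_ms_cur, total_prev, total_cur):
    """Average per-operation latency between two counter samples, in ms.

    Mirrors the calculated-item expression: the `change(total) = 0` term
    keeps the divisor non-zero when no operations ran in the interval,
    in which case the numerator is also 0 and the result is 0.
    """
    delta_time = time_ms_cur - time_ms_prev
    delta_total = total_cur - total_prev
    # In Python, (delta_total == 0) is a bool that adds as 0 or 1,
    # just like the boolean subexpression in the Zabbix formula.
    return delta_time / (delta_total + (delta_total == 0))

# Example: 1200 ms spent on 300 indexing operations -> 4 ms per operation.
print(average_latency(5000, 6200, 1000, 1300))  # 4.0
```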
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
ES: Service is down | The service is unavailable or does not accept TCP connections. |
last(/Elasticsearch Cluster by HTTP/net.tcp.service["{$ELASTICSEARCH.SCHEME}","{HOST.CONN}","{$ELASTICSEARCH.PORT}"])=0 |
AVERAGE | Manual close: YES |
ES: Service response time is too high | The performance of the TCP service is very low. |
min(/Elasticsearch Cluster by HTTP/net.tcp.service.perf["{$ELASTICSEARCH.SCHEME}","{HOST.CONN}","{$ELASTICSEARCH.PORT}"],5m)>{$ELASTICSEARCH.RESPONSE_TIME.MAX.WARN} |
WARNING | Manual close: YES Depends on: - ES: Service is down |
ES: Health is YELLOW | All primary shards are assigned, but one or more replica shards are unassigned. If a node in the cluster fails, some data could be unavailable until that node is repaired. |
last(/Elasticsearch Cluster by HTTP/es.cluster.status)=1 |
AVERAGE | |
ES: Health is RED | One or more primary shards are unassigned, so some data is unavailable. This can occur briefly during cluster startup as primary shards are assigned. |
last(/Elasticsearch Cluster by HTTP/es.cluster.status)=2 |
HIGH | |
ES: Health is UNKNOWN | The health status of the cluster is unknown or cannot be obtained. |
last(/Elasticsearch Cluster by HTTP/es.cluster.status)=255 |
HIGH | |
ES: The number of nodes within the cluster has decreased | - |
change(/Elasticsearch Cluster by HTTP/es.cluster.number_of_nodes)<0 |
INFO | Manual close: YES |
ES: The number of nodes within the cluster has increased | - |
change(/Elasticsearch Cluster by HTTP/es.cluster.number_of_nodes)>0 |
INFO | Manual close: YES |
ES: Cluster has initializing shards | The cluster has had initializing shards for longer than 10 minutes. |
min(/Elasticsearch Cluster by HTTP/es.cluster.initializing_shards,10m)>0 |
AVERAGE | |
ES: Cluster has unassigned shards | The cluster has had unassigned shards for longer than 10 minutes. |
min(/Elasticsearch Cluster by HTTP/es.cluster.unassigned_shards,10m)>0 |
AVERAGE | |
ES: Cluster has been restarted | Uptime is less than 10 minutes. |
last(/Elasticsearch Cluster by HTTP/es.nodes.jvm.max_uptime)<10m |
INFO | Manual close: YES |
ES: Cluster does not have enough space for resharding | There is not enough disk space for index resharding. |
(last(/Elasticsearch Cluster by HTTP/es.nodes.fs.total_in_bytes)-last(/Elasticsearch Cluster by HTTP/es.nodes.fs.available_in_bytes))/(last(/Elasticsearch Cluster by HTTP/es.cluster.number_of_data_nodes)-1)>last(/Elasticsearch Cluster by HTTP/es.nodes.fs.available_in_bytes) |
HIGH | |
ES: Cluster has only two master nodes | The cluster has only two nodes with a master role and will be unavailable if one of them breaks. |
last(/Elasticsearch Cluster by HTTP/es.nodes.count.master)=2 |
DISASTER | |
ES {#ES.NODE}: has been restarted | Uptime is less than 10 minutes. |
last(/Elasticsearch Cluster by HTTP/es.node.jvm.uptime[{#ES.NODE}])<10m |
INFO | Manual close: YES |
ES {#ES.NODE}: Percent of JVM heap in use is high | This indicates that the rate of garbage collection isn't keeping up with the rate of garbage creation. To address this problem, you can either increase your heap size (as long as it remains below the recommended guidelines stated above), or scale out the cluster by adding more nodes. |
min(/Elasticsearch Cluster by HTTP/es.node.jvm.mem.heap_used_percent[{#ES.NODE}],1h)>{$ELASTICSEARCH.HEAP_USED.MAX.WARN} |
WARNING | Depends on: - ES {#ES.NODE}: Percent of JVM heap in use is critical |
ES {#ES.NODE}: Percent of JVM heap in use is critical | This indicates that the rate of garbage collection isn't keeping up with the rate of garbage creation. To address this problem, you can either increase your heap size (as long as it remains below the recommended guidelines stated above), or scale out the cluster by adding more nodes. |
min(/Elasticsearch Cluster by HTTP/es.node.jvm.mem.heap_used_percent[{#ES.NODE}],1h)>{$ELASTICSEARCH.HEAP_USED.MAX.CRIT} |
HIGH | |
ES {#ES.NODE}: Query latency is too high | If latency exceeds a threshold, look for potential resource bottlenecks, or investigate whether you need to optimize your queries. |
min(/Elasticsearch Cluster by HTTP/es.node.indices.search.query_latency[{#ES.NODE}],5m)>{$ELASTICSEARCH.QUERY_LATENCY.MAX.WARN} |
WARNING | |
ES {#ES.NODE}: Fetch latency is too high | The fetch phase should typically take much less time than the query phase. If you notice this metric consistently increasing, this could indicate a problem with slow disks, enriching of documents (highlighting the relevant text in search results, etc.), or requesting too many results. |
min(/Elasticsearch Cluster by HTTP/es.node.indices.search.fetch_latency[{#ES.NODE}],5m)>{$ELASTICSEARCH.FETCH_LATENCY.MAX.WARN} |
WARNING | |
ES {#ES.NODE}: Write thread pool executor has rejected tasks | The number of tasks rejected by the write thread pool executor has been above 0 for 5 minutes. |
min(/Elasticsearch Cluster by HTTP/es.node.thread_pool.write.rejected.rate[{#ES.NODE}],5m)>0 |
WARNING | |
ES {#ES.NODE}: Search thread pool executor has rejected tasks | The number of tasks rejected by the search thread pool executor has been above 0 for 5 minutes. |
min(/Elasticsearch Cluster by HTTP/es.node.thread_pool.search.rejected.rate[{#ES.NODE}],5m)>0 |
WARNING | |
ES {#ES.NODE}: Refresh thread pool executor has rejected tasks | The number of tasks rejected by the refresh thread pool executor has been above 0 for 5 minutes. |
min(/Elasticsearch Cluster by HTTP/es.node.thread_pool.refresh.rejected.rate[{#ES.NODE}],5m)>0 |
WARNING | |
ES {#ES.NODE}: Indexing latency is too high | If the latency is increasing, it may indicate that you are indexing too many documents at the same time (Elasticsearch's documentation recommends starting with a bulk indexing size of 5 to 15 megabytes and increasing slowly from there). |
min(/Elasticsearch Cluster by HTTP/es.node.indices.indexing.index_latency[{#ES.NODE}],5m)>{$ELASTICSEARCH.INDEXING_LATENCY.MAX.WARN} |
WARNING | |
ES {#ES.NODE}: Flush latency is too high | If you see this metric increasing steadily, it may indicate a problem with slow disks; this problem may escalate and eventually prevent you from being able to add new information to your index. |
min(/Elasticsearch Cluster by HTTP/es.node.indices.flush.latency[{#ES.NODE}],5m)>{$ELASTICSEARCH.FLUSH_LATENCY.MAX.WARN} |
WARNING |
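The "Cluster does not have enough space for resharding" expression above estimates whether the data currently stored could still be redistributed if the cluster lost one data node. A short Python rendering of that arithmetic (names are illustrative, not template keys):

```python
def resharding_at_risk(total_bytes: int, available_bytes: int, data_nodes: int) -> bool:
    """True when the trigger would fire: the used space, spread over one
    fewer data node, would not fit into the currently available space."""
    used = total_bytes - available_bytes
    return used / (data_nodes - 1) > available_bytes

# Example: 10 TB total, 2 TB free, 4 data nodes:
# 8 TB used / 3 remaining nodes ~= 2.67 TB > 2 TB free -> trigger fires.
print(resharding_at_risk(10 * 2**40, 2 * 2**40, 4))  # True
```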
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template or ask for help with it at ZABBIX forums.
https://www.elastic.co/guide/en/elasticsearch/reference/index.html
For Zabbix version: 6.2 and higher
The template to monitor Docker engine by Zabbix that works without any external scripts.
Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
Template Docker by Zabbix agent 2
— collects metrics by polling zabbix-agent2.
This template was tested on:
See Zabbix template operation for basic instructions.
Set up and configure zabbix-agent2 compiled with the Docker monitoring plugin.
Test availability: zabbix_get -s docker-host -k docker.info
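If zabbix_get is not at hand, the same passive check can be reproduced with a few lines of Python speaking the Zabbix agent protocol. This is a sketch under stated assumptions: agent 2 listens on the default port 10050, and the standard ZBXD header framing (4-byte magic, 1-byte flags, 8-byte little-endian length) is in use.

```python
import socket
import struct

def _recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes or raise if the agent closes early."""
    data = b""
    while len(data) < n:
        chunk = sock.recv(n - len(data))
        if not chunk:
            raise ConnectionError("agent closed the connection early")
        data += chunk
    return data

def zabbix_get(host: str, key: str, port: int = 10050) -> str:
    """Ask a passive Zabbix agent for a single item value (ZBXD framing)."""
    payload = key.encode()
    packet = b"ZBXD\x01" + struct.pack("<Q", len(payload)) + payload
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(packet)
        header = _recv_exact(sock, 13)                 # magic + flags + length
        (length,) = struct.unpack("<Q", header[5:13])
        return _recv_exact(sock, length).decode()

# Equivalent of: zabbix_get -s docker-host -k docker.info
print(zabbix_get("docker-host", "docker.info"))
```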
No specific Zabbix configuration is required.
Name | Description | Default |
---|---|---|
{$DOCKER.LLD.FILTER.CONTAINER.MATCHES} | Filter of discoverable containers |
.* |
{$DOCKER.LLD.FILTER.CONTAINER.NOT_MATCHES} | Filter to exclude discovered containers |
CHANGE_IF_NEEDED |
{$DOCKER.LLD.FILTER.IMAGE.MATCHES} | Filter of discoverable images |
.* |
{$DOCKER.LLD.FILTER.IMAGE.NOT_MATCHES} | Filter to exclude discovered images |
CHANGE_IF_NEEDED |
There are no template links in this template.
Name | Description | Type | Key and additional info |
---|---|---|---|
Containers discovery | Discovery of container metrics. Parameter: true - returns all containers; false - returns only running containers. |
ZABBIX_PASSIVE | docker.containers.discovery[false] Filter: AND - {#NAME} MATCHES_REGEX - {#NAME} NOT_MATCHES_REGEX |
Images discovery | Discovery of image metrics. |
ZABBIX_PASSIVE | docker.images.discovery Filter: AND - {#NAME} MATCHES_REGEX - {#NAME} NOT_MATCHES_REGEX |
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
Docker | Docker: Ping | - | ZABBIX_PASSIVE | docker.ping Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
Docker | Docker: Containers total | Total number of containers on this host |
DEPENDENT | docker.containers.total Preprocessing: - JSONPATH: |
Docker | Docker: Containers running | Total number of containers running on this host |
DEPENDENT | docker.containers.running Preprocessing: - JSONPATH: |
Docker | Docker: Containers stopped | Total number of containers stopped on this host |
DEPENDENT | docker.containers.stopped Preprocessing: - JSONPATH: |
Docker | Docker: Containers paused | Total number of containers paused on this host |
DEPENDENT | docker.containers.paused Preprocessing: - JSONPATH: |
Docker | Docker: Images total | Number of images with intermediate image layers |
DEPENDENT | docker.images.total Preprocessing: - JSONPATH: |
Docker | Docker: Storage driver | Docker storage driver https://docs.docker.com/storage/storagedriver/ |
DEPENDENT | docker.driver Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
Docker | Docker: Memory limit enabled | - |
DEPENDENT | docker.mem_limit.enabled Preprocessing: - JSONPATH: - BOOL_TO_DECIMAL - DISCARD_UNCHANGED_HEARTBEAT: 1d |
Docker | Docker: Swap limit enabled | - |
DEPENDENT | docker.swap_limit.enabled Preprocessing: - JSONPATH: - BOOL_TO_DECIMAL - DISCARD_UNCHANGED_HEARTBEAT: 1d |
Docker | Docker: Kernel memory enabled | - |
DEPENDENT | docker.kernel_mem.enabled Preprocessing: - JSONPATH: - BOOL_TO_DECIMAL - DISCARD_UNCHANGED_HEARTBEAT: 1d |
Docker | Docker: Kernel memory TCP enabled | - |
DEPENDENT | docker.kernel_mem_tcp.enabled Preprocessing: - JSONPATH: - BOOL_TO_DECIMAL - DISCARD_UNCHANGED_HEARTBEAT: |
Docker | Docker: CPU CFS Period enabled | https://docs.docker.com/config/containers/resource_constraints/#configure-the-default-cfs-scheduler |
DEPENDENT | docker.cpu_cfs_period.enabled Preprocessing: - JSONPATH: - BOOL_TO_DECIMAL - DISCARD_UNCHANGED_HEARTBEAT: |
Docker | Docker: CPU CFS Quota enabled | https://docs.docker.com/config/containers/resource_constraints/#configure-the-default-cfs-scheduler |
DEPENDENT | docker.cpu_cfs_quota.enabled Preprocessing: - JSONPATH: - BOOL_TO_DECIMAL - DISCARD_UNCHANGED_HEARTBEAT: |
Docker | Docker: CPU Shares enabled | https://docs.docker.com/config/containers/resource_constraints/#configure-the-default-cfs-scheduler |
DEPENDENT | docker.cpu_shares.enabled Preprocessing: - JSONPATH: - BOOL_TO_DECIMAL - DISCARD_UNCHANGED_HEARTBEAT: 1d |
Docker | Docker: CPU Set enabled | https://docs.docker.com/config/containers/resource_constraints/#configure-the-default-cfs-scheduler |
DEPENDENT | docker.cpu_set.enabled Preprocessing: - JSONPATH: - BOOL_TO_DECIMAL - DISCARD_UNCHANGED_HEARTBEAT: 1d |
Docker | Docker: Pids limit enabled | - |
DEPENDENT | docker.pids_limit.enabled Preprocessing: - JSONPATH: - BOOL_TO_DECIMAL - DISCARD_UNCHANGED_HEARTBEAT: 1d |
Docker | Docker: IPv4 Forwarding enabled | - |
DEPENDENT | docker.ipv4_forwarding.enabled Preprocessing: - JSONPATH: - BOOL_TO_DECIMAL - DISCARD_UNCHANGED_HEARTBEAT: 1d |
Docker | Docker: Debug enabled | - |
DEPENDENT | docker.debug.enabled Preprocessing: - JSONPATH: - BOOL_TO_DECIMAL - DISCARD_UNCHANGED_HEARTBEAT: |
Docker | Docker: Nfd | Number of used File Descriptors |
DEPENDENT | docker.nfd Preprocessing: - JSONPATH: |
Docker | Docker: OomKill disabled | - |
DEPENDENT | docker.oomkill.disabled Preprocessing: - JSONPATH: - BOOL_TO_DECIMAL - DISCARD_UNCHANGED_HEARTBEAT: |
Docker | Docker: Goroutines | Number of goroutines |
DEPENDENT | docker.goroutines Preprocessing: - JSONPATH: |
Docker | Docker: Logging driver | - |
DEPENDENT | docker.logging_driver Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 1d |
Docker | Docker: Cgroup driver | - |
DEPENDENT | docker.cgroup_driver Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 1d |
Docker | Docker: NEvents listener | - |
DEPENDENT | docker.nevents_listener Preprocessing: - JSONPATH: |
Docker | Docker: Kernel version | - |
DEPENDENT | docker.kernel_version Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 1d |
Docker | Docker: Operating system | - |
DEPENDENT | docker.operating_system Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 1d |
Docker | Docker: OS type | - |
DEPENDENT | docker.os_type Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 1d |
Docker | Docker: Architecture | - |
DEPENDENT | docker.architecture Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
Docker | Docker: NCPU | - |
DEPENDENT | docker.ncpu Preprocessing: - JSONPATH: |
Docker | Docker: Memory total | - |
DEPENDENT | docker.mem.total Preprocessing: - JSONPATH: |
Docker | Docker: Docker root dir | - |
DEPENDENT | docker.root_dir Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 1d |
Docker | Docker: Name | - |
DEPENDENT | docker.name Preprocessing: - JSONPATH: |
Docker | Docker: Server version | - |
DEPENDENT | docker.server_version Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 1d |
Docker | Docker: Default runtime | - |
DEPENDENT | docker.default_runtime Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 1d |
Docker | Docker: Live restore enabled | - |
DEPENDENT | docker.live_restore.enabled Preprocessing: - JSONPATH: - BOOL_TO_DECIMAL - DISCARD_UNCHANGED_HEARTBEAT: 1d |
Docker | Docker: Layers size | - |
DEPENDENT | docker.layers_size Preprocessing: - JSONPATH: |
Docker | Docker: Images size | - |
DEPENDENT | docker.images_size Preprocessing: - JSONPATH: |
Docker | Docker: Containers size | - |
DEPENDENT | docker.containers_size Preprocessing: - JSONPATH: |
Docker | Docker: Volumes size | - |
DEPENDENT | docker.volumes_size Preprocessing: - JSONPATH: |
Docker | Docker: Images available | Number of top-level images |
DEPENDENT | docker.images.top_level Preprocessing: - JSONPATH: |
Docker | Image {#NAME}: Created | - |
DEPENDENT | docker.image.created["{#ID}"] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
Docker | Image {#NAME}: Size | - |
DEPENDENT | docker.image.size["{#ID}"] Preprocessing: - JSONPATH: |
Docker | Container {#NAME}: Get stats | Get container stats based on resource usage |
ZABBIX_PASSIVE | docker.container_stats["{#NAME}"] |
Docker | Container {#NAME}: CPU total usage per second | - |
DEPENDENT | docker.container_stats.cpu_usage.total.rate["{#NAME}"] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND - MULTIPLIER: |
Docker | Container {#NAME}: CPU percent usage | - |
DEPENDENT | docker.container_stats.cpu_pct_usage["{#NAME}"] Preprocessing: - JSONPATH: |
Docker | Container {#NAME}: CPU kernelmode usage per second | - |
DEPENDENT | docker.container_stats.cpu_usage.kernel.rate["{#NAME}"] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND - MULTIPLIER: |
Docker | Container {#NAME}: CPU usermode usage per second | - |
DEPENDENT | docker.container_stats.cpu_usage.user.rate["{#NAME}"] Preprocessing: - JSONPATH: - CHANGE_PER_SECOND - MULTIPLIER: |
Docker | Container {#NAME}: Online CPUs | - |
DEPENDENT | docker.container_stats.online_cpus["{#NAME}"] Preprocessing: - JSONPATH: |
Docker | Container {#NAME}: Throttling periods | Number of periods with throttling active |
DEPENDENT | docker.container_stats.cpu_usage.throttling_periods["{#NAME}"] Preprocessing: - JSONPATH: |
Docker | Container {#NAME}: Throttled periods | Number of periods when the container hits its throttling limit |
DEPENDENT | docker.container_stats.cpu_usage.throttled_periods["{#NAME}"] Preprocessing: - JSONPATH: |
Docker | Container {#NAME}: Throttled time | Aggregate time the container was throttled for in nanoseconds |
DEPENDENT | docker.container_stats.cpu_usage.throttled_time["{#NAME}"] Preprocessing: - JSONPATH: - MULTIPLIER: |
Docker | Container {#NAME}: Memory usage | - |
DEPENDENT | docker.container_stats.memory.usage["{#NAME}"] Preprocessing: - JSONPATH: |
Docker | Container {#NAME}: Memory maximum usage | - |
DEPENDENT | docker.container_stats.memory.max_usage["{#NAME}"] Preprocessing: - JSONPATH: |
Docker | Container {#NAME}: Memory commit bytes | - |
DEPENDENT | docker.container_stats.memory.commit_bytes["{#NAME}"] Preprocessing: - JSONPATH: |
Docker | Container {#NAME}: Memory commit peak bytes | - |
DEPENDENT | docker.container_stats.memory.commit_peak_bytes["{#NAME}"] Preprocessing: - JSONPATH: |
Docker | Container {#NAME}: Memory private working set | - |
DEPENDENT | docker.container_stats.memory.private_working_set["{#NAME}"] Preprocessing: - JSONPATH: |
Docker | Container {#NAME}: Networks bytes received per second | - |
DEPENDENT | docker.networks.rx_bytes["{#NAME}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL: CUSTOM_VALUE -> 0 - CHANGE_PER_SECOND |
Docker | Container {#NAME}: Networks packets received per second | - |
DEPENDENT | docker.networks.rx_packets["{#NAME}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL: CUSTOM_VALUE -> 0 - CHANGE_PER_SECOND |
Docker | Container {#NAME}: Networks errors received per second | - |
DEPENDENT | docker.networks.rx_errors["{#NAME}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL: CUSTOM_VALUE -> 0 - CHANGE_PER_SECOND |
Docker | Container {#NAME}: Networks incoming packets dropped per second | - |
DEPENDENT | docker.networks.rx_dropped["{#NAME}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL: CUSTOM_VALUE -> 0 - CHANGE_PER_SECOND |
Docker | Container {#NAME}: Networks bytes sent per second | - |
DEPENDENT | docker.networks.tx_bytes["{#NAME}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL: CUSTOM_VALUE -> 0 - CHANGE_PER_SECOND |
Docker | Container {#NAME}: Networks packets sent per second | - |
DEPENDENT | docker.networks.tx_packets["{#NAME}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL: CUSTOM_VALUE -> 0 - CHANGE_PER_SECOND |
Docker | Container {#NAME}: Networks errors sent per second | - |
DEPENDENT | docker.networks.tx_errors["{#NAME}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL: CUSTOM_VALUE -> 0 - CHANGE_PER_SECOND |
Docker | Container {#NAME}: Networks outgoing packets dropped per second | - |
DEPENDENT | docker.networks.tx_dropped["{#NAME}"] Preprocessing: - JSONPATH: ⛔️ON_FAIL: CUSTOM_VALUE -> 0 - CHANGE_PER_SECOND |
Docker | Container {#NAME}: Get info | Return low-level information about a container |
ZABBIX_PASSIVE | docker.container_info["{#NAME}"] |
Docker | Container {#NAME}: Created | - |
DEPENDENT | docker.container_info.created["{#NAME}"] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 1d |
Docker | Container {#NAME}: Image | - |
DEPENDENT | docker.container_info.image["{#NAME}"] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 1d |
Docker | Container {#NAME}: Restart count | - |
DEPENDENT | docker.container_info.restart_count["{#NAME}"] Preprocessing: - JSONPATH: |
Docker | Container {#NAME}: Status | - |
DEPENDENT | docker.container_info.state.status["{#NAME}"] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 1h |
Docker | Container {#NAME}: Running | - |
DEPENDENT | docker.container_info.state.running["{#NAME}"] Preprocessing: - JSONPATH: - BOOL_TO_DECIMAL |
Docker | Container {#NAME}: Paused | - |
DEPENDENT | docker.container_info.state.paused["{#NAME}"] Preprocessing: - JSONPATH: - BOOL_TO_DECIMAL |
Docker | Container {#NAME}: Restarting | - |
DEPENDENT | docker.container_info.state.restarting["{#NAME}"] Preprocessing: - JSONPATH: - BOOL_TO_DECIMAL |
Docker | Container {#NAME}: OOMKilled | - |
DEPENDENT | docker.container_info.state.oomkilled["{#NAME}"] Preprocessing: - JSONPATH: - BOOL_TO_DECIMAL |
Docker | Container {#NAME}: Dead | - |
DEPENDENT | docker.container_info.state.dead["{#NAME}"] Preprocessing: - JSONPATH: - BOOL_TO_DECIMAL |
Docker | Container {#NAME}: Pid | - |
DEPENDENT | docker.container_info.state.pid["{#NAME}"] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 1d |
Docker | Container {#NAME}: Exit code | - |
DEPENDENT | docker.container_info.state.exitcode["{#NAME}"] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 1d |
Docker | Container {#NAME}: Error | - |
DEPENDENT | docker.container_info.state.error["{#NAME}"] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 1d |
Docker | Container {#NAME}: Started at | - |
DEPENDENT | docker.container_info.started["{#NAME}"] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 1d |
Docker | Container {#NAME}: Finished at | - |
DEPENDENT | docker.container_info.finished["{#NAME}"] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 1d |
Zabbix raw items | Docker: Get info | ZABBIX_PASSIVE | docker.info | |
Zabbix raw items | Docker: Get containers | ZABBIX_PASSIVE | docker.containers | |
Zabbix raw items | Docker: Get images | ZABBIX_PASSIVE | docker.images | |
Zabbix raw items | Docker: Get data_usage | ZABBIX_PASSIVE | docker.data_usage |
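Most of the Docker items above are dependent items that apply one JSONPath preprocessing step to the output of the raw docker.info master item. A minimal Python illustration of what that preprocessing extracts, assuming the usual field names of the Docker /info payload (Containers, ContainersRunning, ContainersStopped, ContainersPaused, Images):

```python
import json

# A trimmed example of what the `docker.info` master item returns.
raw = json.loads("""
{
  "Containers": 12,
  "ContainersRunning": 9,
  "ContainersStopped": 2,
  "ContainersPaused": 1,
  "Images": 34
}
""")

# Each dependent item is effectively one JSONPath lookup on this document:
metrics = {
    "docker.containers.total":   raw["Containers"],         # $.Containers
    "docker.containers.running": raw["ContainersRunning"],  # $.ContainersRunning
    "docker.containers.stopped": raw["ContainersStopped"],  # $.ContainersStopped
    "docker.containers.paused":  raw["ContainersPaused"],   # $.ContainersPaused
    "docker.images.total":       raw["Images"],             # $.Images
}
print(metrics)
```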
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Docker: Service is down | - |
last(/Docker by Zabbix agent 2/docker.ping)=0 |
AVERAGE | Manual close: YES |
Docker: Failed to fetch info data | Zabbix has not received data for items for the last 30 minutes |
nodata(/Docker by Zabbix agent 2/docker.name,30m)=1 |
WARNING | Manual close: YES Depends on: - Docker: Service is down |
Docker: Version has changed | Docker version has changed. Ack to close. |
last(/Docker by Zabbix agent 2/docker.server_version,#1)<>last(/Docker by Zabbix agent 2/docker.server_version,#2) and length(last(/Docker by Zabbix agent 2/docker.server_version))>0 |
INFO | Manual close: YES |
Container {#NAME}: Container has been stopped with error code | - |
last(/Docker by Zabbix agent 2/docker.container_info.state.exitcode["{#NAME}"])>0 and last(/Docker by Zabbix agent 2/docker.container_info.state.running["{#NAME}"])=0 |
AVERAGE | Manual close: YES |
Container {#NAME}: An error has occurred in the container | Container {#NAME} has an error. Ack to close. |
last(/Docker by Zabbix agent 2/docker.container_info.state.error["{#NAME}"],#1)<>last(/Docker by Zabbix agent 2/docker.container_info.state.error["{#NAME}"],#2) and length(last(/Docker by Zabbix agent 2/docker.container_info.state.error["{#NAME}"]))>0 |
WARNING | Manual close: YES |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template or ask for help with it at ZABBIX forums.
For Zabbix version: 6.2 and higher.
This template is designed to get metrics from the Control-M server using the Control-M Automation API with HTTP agent.
This template monitors server statistics, discovers jobs and agents using Low Level Discovery.
To use this template, the macros {$API.TOKEN}, {$API.URI.ENDPOINT}, and {$SERVER.NAME} need to be set.
See Zabbix template operation for basic instructions.
This template has been tested on:
This template is primarily intended for use in conjunction with the Control-M enterprise manager by HTTP template in order to create host prototypes.
It monitors:
However, if you wish to monitor the Control-M server separately with this template, you must set the following macros: {$API.TOKEN}, {$API.URI.ENDPOINT}, and {$SERVER.NAME}.
To access the {$API.TOKEN} macro value, use one of the following interfaces:
{$API.URI.ENDPOINT} - the Control-M Automation API endpoint for the API requests, including your server IP or DNS address, the Automation API port, and path.
For example, https://monitored.controlm.instance:8443/automation-api.
{$SERVER.NAME} - the name of the Control-M server to be monitored.
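Before linking the template, the endpoint and token can be exercised directly to confirm the macro values. Below is a hedged Python sketch, assuming the Automation API accepts the token in an x-api-key header and exposes a config/servers listing; adjust both to your installation and its documentation.

```python
import json
import urllib.request

API_ENDPOINT = "https://monitored.controlm.instance:8443/automation-api"  # {$API.URI.ENDPOINT}
API_TOKEN = "<PUT YOUR API TOKEN>"                                        # {$API.TOKEN}

# Assumption: the token is passed in the `x-api-key` request header.
# Add an SSL context if your instance uses a self-signed certificate.
req = urllib.request.Request(
    f"{API_ENDPOINT}/config/servers",
    headers={"x-api-key": API_TOKEN},
)
with urllib.request.urlopen(req, timeout=10) as resp:
    servers = json.load(resp)

# Each entry should include the name to use for {$SERVER.NAME}.
for server in servers:
    print(server.get("name"), server.get("state"), server.get("message"))
```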
Name | Description | Default |
---|---|---|
{$SERVER.NAME} | The name of the Control-M server. |
|
{$API.URI.ENDPOINT} | The API endpoint is a URI - for example, https://monitored.controlm.instance:8443/automation-api. |
|
{$API.TOKEN} | A token to use for API connections. |
Name | Description | Type | Key and additional info |
---|---|---|---|
Control-M: Get Control-M server stats | Gets the statistics of the server. | Http Agent | controlm.server.stats Preprocessing
|
Control-M: Get jobs | Gets the status of jobs. | Http Agent | controlm.jobs |
Control-M: Get agents | Gets agents for the server. | Http Agent | controlm.agents |
Control-M: Jobs statistics | Gets the statistics of jobs. | Dependent | controlm.jobs.statistics Preprocessing
|
Control-M: Jobs returned | Gets the count of returned jobs. | Dependent | controlm.jobs.statistics.returned Preprocessing
|
Control-M: Jobs total | Gets the count of total jobs. | Dependent | controlm.jobs.statistics.total Preprocessing
|
Control-M: Server state | Gets the metric of the server state. | Dependent | server.state Preprocessing
|
Control-M: Server message | Gets the metric of the server message. | Dependent | server.message Preprocessing
|
Control-M: Server version | Gets the metric of the server version. | Dependent | server.version Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Control-M: Server is down | The server is down. | last(/Control-M server by HTTP/Control-M: Server state)=0 or last(/Control-M server by HTTP/Control-M: Server state)=10 |High |
- | |
Control-M: Server disconnected | The server is disconnected. | last(/Control-M server by HTTP/Control-M: Server message,#1)="Disconnected" |High |
- | |
Control-M: Server error | The server has encountered an error. | last(/Control-M server by HTTP/Control-M: Server message,#1)<>"Connected" and last(/Control-M server by HTTP/Control-M: Server message,#1)<>"Disconnected" and last(/Control-M server by HTTP/Control-M: Server message,#1)<>"" |High |
- | |
Control-M: Server version has changed | The server version has changed. Acknowledge (Ack) to close. | last(/Control-M server by HTTP/Control-M: Server version,#1)<>last(/Control-M server by HTTP/Control-M: Server version,#2) and length(last(/Control-M server by HTTP/Control-M: Server version))>0 |Info |
- |
Name | Description | Type | Key and additional info |
---|---|---|---|
Jobs discovery | Discovers jobs on the server. | Dependent | controlm.jobs.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Job [{#JOB.ID}]: stats | Gets the statistics of a job. | Dependent | job.stats['{#JOB.ID}'] Preprocessing
|
Job [{#JOB.ID}]: status | Gets the status of a job. | Dependent | job.status['{#JOB.ID}'] Preprocessing
|
Job [{#JOB.ID}]: number of runs | Gets the number of runs for a job. | Dependent | job.numberOfRuns['{#JOB.ID}'] Preprocessing
|
Job [{#JOB.ID}]: type | Gets the job type. | Dependent | job.type['{#JOB.ID}'] Preprocessing
|
Job [{#JOB.ID}]: held status | Gets the held status of a job. | Dependent | job.held['{#JOB.ID}'] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Job [{#JOB.ID}]: status [{ITEM.VALUE}] | The job has encountered an issue. | last(/Control-M server by HTTP/Job [{#JOB.ID}]: status,#1)=1 or last(/Control-M server by HTTP/Job [{#JOB.ID}]: status,#1)=10 |Warning |
- |
Name | Description | Type | Key and additional info |
---|---|---|---|
Agent discovery | Discovers agents on the server. | Dependent | controlm.agent.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Agent [{#AGENT.NAME}]: stats | Gets the statistics of an agent. | Dependent | agent.stats['{#AGENT.NAME}'] Preprocessing
|
Agent [{#AGENT.NAME}]: status | Gets the status of an agent. | Dependent | agent.status['{#AGENT.NAME}'] Preprocessing
|
Agent [{#AGENT.NAME}]: version | Gets the version number of an agent. | Dependent | agent.version['{#AGENT.NAME}'] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Agent [{#AGENT.NAME}]: status [{ITEM.VALUE}] | The agent has encountered an issue. | last(/Control-M server by HTTP/Agent [{#AGENT.NAME}]: status,#1)=1 or last(/Control-M server by HTTP/Agent [{#AGENT.NAME}]: status,#1)=10 |Average |
- |
Agent [{#AGENT.NAME}]: status disabled | The agent is disabled. | last(/Control-M server by HTTP/Agent [{#AGENT.NAME}]: status,#1)=2 or last(/Control-M server by HTTP/Agent [{#AGENT.NAME}]: status,#1)=3 |Info |
- |
Agent [{#AGENT.NAME}]: version has changed | The agent version has changed. Acknowledge (Ack) to close. | last(/Control-M server by HTTP/Agent [{#AGENT.NAME}]: version,#1)<>last(/Control-M server by HTTP/Agent [{#AGENT.NAME}]: version,#2) |Info |
- |
Agent [{#AGENT.NAME}]: unknown version | The agent version is unknown. | last(/Control-M server by HTTP/Agent [{#AGENT.NAME}]: version,#1)="Unknown" |Warning |
- |
Please report any issues with the template at https://support.zabbix.com.
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums.
For Zabbix version: 6.2 and higher.
This template is designed to get metrics from the Control-M Enterprise Manager using the Control-M Automation API with HTTP agent.
This template monitors active Service Level Agreement (SLA) services, discovers Control-M servers using Low Level Discovery, and also creates host prototypes for them in conjunction with the Control-M server by HTTP template.
To use this template, the macros {$API.TOKEN} and {$API.URI.ENDPOINT} need to be set.
See Zabbix template operation for basic instructions.
This template has been tested on:
This template is intended to be used on Control-M Enterprise Manager instances.
It monitors:
Control-M server by HTTP template.
To use this template, you must set the macros {$API.TOKEN} and {$API.URI.ENDPOINT}.
To access the API token, use one of the following Control-M interfaces:
{$API.URI.ENDPOINT} - the Control-M Automation API endpoint for the API requests, including your server IP or DNS address, the Automation API port, and path.
For example, https://monitored.controlm.instance:8443/automation-api.
Name | Description | Default |
---|---|---|
{$API.URI.ENDPOINT} | The API endpoint is a URI - for example, https://monitored.controlm.instance:8443/automation-api. |
|
{$API.TOKEN} | A token to use for API connections. |
Name | Description | Type | Key and additional info |
---|---|---|---|
Control-M: Get Control-M servers | Gets a list of servers. | Http Agent | controlm.servers |
Control-M: Get SLA services | Gets all the SLA active services. | Http Agent | controlm.services |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Name | Description | Type | Key and additional info |
---|---|---|---|
Server discovery | Discovers the Control-M servers. | Dependent | controlm.server.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
SLA services discovery | Discovers the SLA services in the Control-M environment. | Dependent | controlm.services.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Service [{#SERVICE.NAME}, {#SERVICE.JOB}]: stats | Gets the service statistics. | Dependent | service.stats['{#SERVICE.NAME}','{#SERVICE.JOB}'] Preprocessing
|
Service [{#SERVICE.NAME}, {#SERVICE.JOB}]: status | Gets the service status. | Dependent | service.status['{#SERVICE.NAME}','{#SERVICE.JOB}'] Preprocessing
|
Service [{#SERVICE.NAME}, {#SERVICE.JOB}]: jobs 'executed' | Gets the number of jobs in the 'executed' state. |
Dependent | service.jobs.status['{#SERVICE.NAME}','{#SERVICE.JOB}',executed] Preprocessing
|
Service [{#SERVICE.NAME}, {#SERVICE.JOB}]: jobs 'waitCondition' | Gets the number of jobs in the 'waitCondition' state. |
Dependent | service.jobs.status['{#SERVICE.NAME}','{#SERVICE.JOB}',waitCondition] Preprocessing
|
Service [{#SERVICE.NAME}, {#SERVICE.JOB}]: jobs 'waitResource' | Gets the number of jobs in the 'waitResource' state. |
Dependent | service.jobs.status['{#SERVICE.NAME}','{#SERVICE.JOB}',waitResource] Preprocessing
|
Service [{#SERVICE.NAME}, {#SERVICE.JOB}]: jobs 'waitHost' | Gets the number of jobs in the 'waitHost' state. |
Dependent | service.jobs.status['{#SERVICE.NAME}','{#SERVICE.JOB}',waitHost] Preprocessing
|
Service [{#SERVICE.NAME}, {#SERVICE.JOB}]: jobs 'waitWorkload' | Gets the number of jobs in the 'waitWorkload' state. |
Dependent | service.jobs.status['{#SERVICE.NAME}','{#SERVICE.JOB}',waitWorkload] Preprocessing
|
Service [{#SERVICE.NAME}, {#SERVICE.JOB}]: jobs 'completed' | Gets the number of jobs in the 'completed' state. |
Dependent | service.jobs.status['{#SERVICE.NAME}','{#SERVICE.JOB}',completed] Preprocessing
|
Service [{#SERVICE.NAME}, {#SERVICE.JOB}]: jobs 'error' | Gets the number of jobs in the 'error' state. |
Dependent | service.jobs.status['{#SERVICE.NAME}','{#SERVICE.JOB}',error] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Service [{#SERVICE.NAME}, {#SERVICE.JOB}]: status [{ITEM.VALUE}] | The service has encountered an issue. | last(/Control-M enterprise manager by HTTP/Service [{#SERVICE.NAME}, {#SERVICE.JOB}]: status,#1)=0 or last(/Control-M enterprise manager by HTTP/Service [{#SERVICE.NAME}, {#SERVICE.JOB}]: status,#1)=10 |Average |
- |
Service [{#SERVICE.NAME}, {#SERVICE.JOB}]: status [{ITEM.VALUE}] | The service has finished its job late. | last(/Control-M enterprise manager by HTTP/Service [{#SERVICE.NAME}, {#SERVICE.JOB}]: status,#1)=3 |Warning |
- |
Service [{#SERVICE.NAME}, {#SERVICE.JOB}]: jobs in 'error' state | There are jobs present which are in the 'error' state. |
last(/Control-M enterprise manager by HTTP/Service [{#SERVICE.NAME}, {#SERVICE.JOB}]: jobs 'error',#1)>0 |Average |
- |
Please report any issues with the template at https://support.zabbix.com.
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums.
For Zabbix version: 6.2 and higher
The template to monitor HashiCorp Consul by Zabbix that works without any external scripts.
Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
Template HashiCorp Consul Cluster by HTTP
— collects metrics by HTTP agent from API endpoints.
You can find more information about the metrics in the official documentation.
This template was tested on:
See Zabbix template operation for basic instructions.
The template needs to use authorization via an API token.
Don't forget to change macros {$CONSUL.CLUSTER.URL}, {$CONSUL.TOKEN}.
Also, see the Macros section for a list of macros used to set trigger values.
This template supports Consul namespaces. You can set the macro {$CONSUL.NAMESPACE} if you are interested in only one service namespace. Do not specify this macro to get all services. In the case of the Open Source version, leave this macro empty.
NOTE. Some metrics may not be collected depending on your HashiCorp Consul instance version and configuration.
NOTE. You may also be interested in the Envoy Proxy by HTTP template.
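A quick way to verify the URL and token before applying the template is to call the same HTTP API the items use. Below is a minimal Python sketch; the X-Consul-Token header is Consul's standard token mechanism, and /v1/status/leader and /v1/health/state/any are the documented status and health endpoints.

```python
import json
import urllib.request

CONSUL_URL = "http://localhost:8500"   # {$CONSUL.CLUSTER.URL}
TOKEN = "<PUT YOUR AUTH TOKEN>"        # {$CONSUL.TOKEN}

def consul_get(path: str):
    """GET a Consul HTTP API path with token authorization."""
    req = urllib.request.Request(
        f"{CONSUL_URL}{path}",
        headers={"X-Consul-Token": TOKEN},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)

# Current cluster leader address, as used by the "Cluster leader" item.
print(consul_get("/v1/status/leader"))

# Serf health of every cluster member, the source for the node counters.
for check in consul_get("/v1/health/state/any"):
    print(check["Node"], check["CheckID"], check["Status"])
```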
No specific Zabbix configuration is required.
Name | Description | Default |
---|---|---|
{$CONSUL.API.PORT} | Consul API port. Used in node LLD. |
8500 |
{$CONSUL.API.SCHEME} | Consul API scheme. Used in node LLD. |
http |
{$CONSUL.CLUSTER.URL} | Consul cluster URL. |
http://localhost:8500 |
{$CONSUL.LLD.FILTER.NODE_NAME.MATCHES} | Filter of discoverable nodes. |
.* |
{$CONSUL.LLD.FILTER.NODE_NAME.NOT_MATCHES} | Filter to exclude discovered nodes. |
CHANGE_IF_NEEDED |
{$CONSUL.LLD.FILTER.SERVICE_NAME.MATCHES} | Filter of discoverable services. |
.* |
{$CONSUL.LLD.FILTER.SERVICE_NAME.NOT_MATCHES} | Filter to exclude discovered services. |
CHANGE_IF_NEEDED |
{$CONSUL.NAMESPACE} | Consul service namespace. Enterprise only; in the case of the Open Source version, leave this macro empty. Do not specify this macro to get all services. |
`` |
{$CONSUL.SERVICE_NODES.CRITICAL.MAX.AVG} | Maximum number of service nodes in status 'critical' for trigger expression. Can be used with context. |
0 |
{$CONSUL.TOKEN} | Consul auth token. |
<PUT YOUR AUTH TOKEN> |
There are no template links in this template.
Name | Description | Type | Key and additional info |
---|---|---|---|
Consul cluster nodes discovery | - |
DEPENDENT | consul.lld_nodes Preprocessing: - JAVASCRIPT: - DISCARD_UNCHANGED_HEARTBEAT: 3h Filter: - {#NODE_NAME} MATCHES_REGEX {$CONSUL.LLD.FILTER.NODE_NAME.MATCHES} - {#NODE_NAME} NOT_MATCHES_REGEX {$CONSUL.LLD.FILTER.NODE_NAME.NOT_MATCHES} |
Consul cluster services discovery | - |
DEPENDENT | consul.lld_services Preprocessing: - JAVASCRIPT: - DISCARD_UNCHANGED_HEARTBEAT: 3h Filter: - {#SERVICE_NAME} MATCHES_REGEX {$CONSUL.LLD.FILTER.SERVICE_NAME.MATCHES} - {#SERVICE_NAME} NOT_MATCHES_REGEX {$CONSUL.LLD.FILTER.SERVICE_NAME.NOT_MATCHES} |
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
Consul | Consul: Nodes: total | Number of nodes on current dc. |
DEPENDENT | consul.nodes_total Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 3h |
Consul | Consul: Nodes: passing | Number of agents on current dc with serf health status 'passing'. |
DEPENDENT | consul.nodes_passing Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 3h |
Consul | Consul: Nodes: critical | Number of agents on current dc with serf health status 'critical'. |
DEPENDENT | consul.nodes_critical Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 3h |
Consul | Consul: Nodes: warning | Number of agents on current dc with serf health status 'warning'. |
DEPENDENT | consul.nodes_warning Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 3h |
Consul | Consul: Services: total | Number of services on current dc. |
DEPENDENT | consul.services_total Preprocessing: - JAVASCRIPT: - DISCARD_UNCHANGED_HEARTBEAT: 3h |
Consul | Consul: Node ["{#NODE_NAME}"]: Serf Health | Node Serf Health Status. |
DEPENDENT | consul.serf.health["{#NODE_NAME}"] Preprocessing: - JSONPATH: - JAVASCRIPT: - DISCARD_UNCHANGED_HEARTBEAT: 3h |
Consul | Consul: Service ["{#SERVICE_NAME}"]: Nodes passing | - |
DEPENDENT | consul.service.nodes_passing["{#SERVICE_NAME}"] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
Consul | Consul: Service ["{#SERVICE_NAME}"]: Nodes warning | - |
DEPENDENT | consul.service.nodes_warning["{#SERVICE_NAME}"] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
Consul | Consul: Service ["{#SERVICE_NAME}"]: Nodes critical | - |
DEPENDENT | consul.service.nodes_critical["{#SERVICE_NAME}"] Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
Consul cluster | Consul cluster: Cluster leader | Current leader address. |
HTTP_AGENT | consul.get_leader Preprocessing: - CHECK_NOT_SUPPORTED ⛔️ON_FAIL: DISCARD_VALUE -> - TRIM: - DISCARD_UNCHANGED_HEARTBEAT: |
Zabbix raw items | Consul cluster: Nodes: peers | The number of Raft peers for the datacenter in which the agent is running. |
HTTP_AGENT | consul.get_peers Preprocessing: - CHECK_NOT_SUPPORTED ⛔️ON_FAIL: DISCARD_VALUE -> - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
Zabbix raw items | Consul cluster: Get nodes | Catalog of nodes registered in a given datacenter. |
HTTP_AGENT | consul.get_nodes Preprocessing: - CHECK_NOT_SUPPORTED ⛔️ON_FAIL: DISCARD_VALUE -> |
Zabbix raw items | Consul cluster: Get nodes Serf health status | Get Serf Health Status for all agents in cluster. |
HTTP_AGENT | consul.get_cluster_serf Preprocessing: - CHECK_NOT_SUPPORTED ⛔️ON_FAIL: |
Zabbix raw items | Consul cluster: Get services | Catalog of services registered in a given datacenter. |
HTTP_AGENT | consul.get_catalog_services Preprocessing: - CHECK_NOT_SUPPORTED ⛔️ON_FAIL: |
Zabbix raw items | Consul cluster: ["{#SERVICE_NAME}"]: Get raw service state | Retrieve service instances providing the service indicated on the path. |
HTTP_AGENT | consul.get_service_stats["{#SERVICE_NAME}"] Preprocessing: - CHECK_NOT_SUPPORTED ⛔️ON_FAIL: DISCARD_VALUE -> |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Consul: One or more nodes in cluster in 'critical' state | One or more agents on current dc with serf health status 'critical'. |
last(/HashiCorp Consul Cluster by HTTP/consul.nodes_critical)>0 |
AVERAGE | |
Consul: One or more nodes in cluster in 'warning' state | One or more agents on current dc with serf health status 'warning'. |
last(/HashiCorp Consul Cluster by HTTP/consul.nodes_warning)>0 |
WARNING | |
Consul: Service ["{#SERVICE_NAME}"]: Too many nodes with service status 'critical' | ||||
One or more nodes with service status 'critical'. |
last(/HashiCorp Consul Cluster by HTTP/consul.service.nodes_critical["{#SERVICE_NAME}"])>{$CONSUL.CLUSTER.SERVICE_NODES.CRITICAL.MAX.AVG:"{#SERVICE_NAME}"} |
AVERAGE | ||
Consul cluster: Leader has been changed | The leader of the Consul cluster has been changed. Ack to close. |
last(/HashiCorp Consul Cluster by HTTP/consul.get_leader,#1)<>last(/HashiCorp Consul Cluster by HTTP/consul.get_leader,#2) and length(last(/HashiCorp Consul Cluster by HTTP/consul.get_leader))>0 |
INFO | Manual close: YES |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template or ask for help with it at ZABBIX forums.
For Zabbix version: 6.2 and higher
The template to monitor HashiCorp Consul by Zabbix that works without any external scripts.
Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
Do not forget to enable the Prometheus format for exported metrics.
See documentation.
You can find more information about the metrics in the official documentation.
Template HashiCorp Consul Node by HTTP
— collects metrics by HTTP agent from /v1/agent/metrics endpoint.
This template was tested on:
See Zabbix template operation for basic instructions.
Internal service metrics are collected from the /v1/agent/metrics endpoint. Do not forget to enable the Prometheus format for exported metrics. See documentation. The template needs to use authorization via an API token.
Don't forget to change macros {$CONSUL.NODE.API.URL}, {$CONSUL.TOKEN}.
Also, see the Macros section for a list of macros used to set trigger values.
This template supports Consul namespaces. You can set the macros {$CONSUL.LLD.FILTER.SERVICE_NAMESPACE.MATCHES} and {$CONSUL.LLD.FILTER.SERVICE_NAMESPACE.NOT_MATCHES} if you want to filter discovered services by namespace. In the case of the Open Source version, the service namespace will be set to 'None'.
NOTE. Some metrics may not be collected depending on your HashiCorp Consul instance version and configuration.
NOTE. You may also be interested in the Envoy Proxy by HTTP template.
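The sketch below shows the raw collection this template performs: one authorized request to the agent's metrics endpoint in Prometheus format (exposed when telemetry is configured with a Prometheus retention time, as the linked documentation describes).

```python
import urllib.request

NODE_URL = "http://localhost:8500"     # {$CONSUL.NODE.API.URL}
TOKEN = "<PUT YOUR AUTH TOKEN>"        # {$CONSUL.TOKEN}

req = urllib.request.Request(
    f"{NODE_URL}/v1/agent/metrics?format=prometheus",
    headers={"X-Consul-Token": TOKEN},
)
with urllib.request.urlopen(req, timeout=10) as resp:
    text = resp.read().decode()

# Each PROMETHEUS_PATTERN step in the items below matches one such line,
# e.g. `consul_memberlist_health_score 0`.
for line in text.splitlines():
    if line.startswith("consul_memberlist_health_score"):
        print(line)
```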
No specific Zabbix configuration is required.
Name | Description | Default |
---|---|---|
{$CONSUL.LLD.FILTER.LOCAL_SERVICE_NAME.MATCHES} | Filter of discoverable services on local node. |
.* |
{$CONSUL.LLD.FILTER.LOCAL_SERVICE_NAME.NOT_MATCHES} | Filter to exclude discovered services on local node. |
CHANGE_IF_NEEDED |
{$CONSUL.LLD.FILTER.SERVICE_NAMESPACE.MATCHES} | Filter of discoverable services by namespace on local node. Enterprise only; in the case of the Open Source version, the namespace will be set to 'None'. |
.* |
{$CONSUL.LLD.FILTER.SERVICE_NAMESPACE.NOT_MATCHES} | Filter to exclude discovered services by namespace on local node. Enterprise only; in the case of the Open Source version, the namespace will be set to 'None'. |
CHANGE_IF_NEEDED |
{$CONSUL.NODE.API.URL} | Consul instance URL. |
http://localhost:8500 |
{$CONSUL.NODE.HEALTH_SCORE.MAX.HIGH} | Maximum acceptable value of node's health score for AVERAGE trigger expression. |
4 |
{$CONSUL.NODE.HEALTH_SCORE.MAX.WARN} | Maximum acceptable value of node's health score for WARNING trigger expression. |
2 |
{$CONSUL.OPEN.FDS.MAX.WARN} | Maximum percentage of used file descriptors. |
90 |
{$CONSUL.TOKEN} | Consul auth token. |
<PUT YOUR AUTH TOKEN> |
There are no template links in this template.
Name | Description | Type | Key and additional info |
---|---|---|---|
HTTP API methods discovery | Discovery of HTTP API method-specific metrics. |
DEPENDENT | consul.http_api_discovery Preprocessing: - PROMETHEUS_TO_JSON: - JAVASCRIPT: - DISCARD_UNCHANGED_HEARTBEAT: |
Local node services discovery | Discover metrics for services that are registered with the local agent. |
DEPENDENT | consul.node_services_lld Preprocessing: - JAVASCRIPT: - DISCARD_UNCHANGED_HEARTBEAT: Filter: - {#SERVICE_NAME} MATCHES_REGEX - {#SERVICE_NAME} NOT_MATCHES_REGEX - {#SERVICE_NAMESPACE} MATCHES_REGEX - {#SERVICE_NAMESPACE} NOT_MATCHES_REGEX Overrides: aggregated status - ITEM_PROTOTYPE LIKE State - DISCOVER checks service_check - ITEM_PROTOTYPE LIKE Check - DISCOVER |
Raft leader metrics discovery | Discover raft metrics for leader nodes. |
DEPENDENT | consul.raft.leader.discovery Preprocessing: - JAVASCRIPT: - DISCARD_UNCHANGED_HEARTBEAT: |
Raft server metrics discovery | Discover raft metrics for server nodes. |
DEPENDENT | consul.raft.server.discovery Preprocessing: - JAVASCRIPT: - DISCARD_UNCHANGED_HEARTBEAT: |
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
Consul | Consul: Role | Role of current Consul agent. |
DEPENDENT | consul.role Preprocessing: - JSONPATH: - BOOL_TO_DECIMAL - DISCARD_UNCHANGED_HEARTBEAT: |
Consul | Consul: Version | Version of Consul agent. |
DEPENDENT | consul.version Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
Consul | Consul: Number of services | Number of services on current node. |
DEPENDENT | consul.services_number Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 3h |
Consul | Consul: Number of checks | Number of checks on current node. |
DEPENDENT | consul.checks_number Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: 3h |
Consul | Consul: Number of check monitors | Number of check monitors on current node. |
DEPENDENT | consul.check_monitors_number Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
Consul | Consul: Process CPU seconds, total | Total user and system CPU time spent in seconds. |
DEPENDENT | consul.cpu_seconds_total.rate Preprocessing: - PROMETHEUS_PATTERN: ⛔️ON_FAIL: - CHANGE_PER_SECOND |
Consul | Consul: Virtual memory size | Virtual memory size in bytes. |
DEPENDENT | consul.virtual_memory_bytes Preprocessing: - PROMETHEUS_PATTERN: |
Consul | Consul: RSS memory usage | Resident memory size in bytes. |
DEPENDENT | consul.resident_memory_bytes Preprocessing: - PROMETHEUS_PATTERN: |
Consul | Consul: Goroutine count | The number of Goroutines on Consul instance. |
DEPENDENT | consul.goroutines Preprocessing: - PROMETHEUS_PATTERN: |
Consul | Consul: Open file descriptors | Number of open file descriptors. |
DEPENDENT | consul.process_open_fds Preprocessing: - PROMETHEUS_PATTERN: |
Consul | Consul: Open file descriptors, max | Maximum number of open file descriptors. |
DEPENDENT | consul.process_max_fds Preprocessing: - PROMETHEUS_PATTERN: |
Consul | Consul: Client RPC, per second | Number of times per second whenever a Consul agent in client mode makes an RPC request to a Consul server. This gives a measure of how much a given agent is loading the Consul servers. This is only generated by agents in client mode, not Consul servers. |
DEPENDENT | consul.client_rpc Preprocessing: - PROMETHEUS_PATTERN: consul_client_rpc ⛔️ON_FAIL: - CHANGE_PER_SECOND |
Consul | Consul: Client RPC failed, per second | Number of times per second whenever a Consul agent in client mode makes an RPC request to a Consul server and fails. |
DEPENDENT | consul.client_rpc_failed Preprocessing: - PROMETHEUS_PATTERN: ⛔️ON_FAIL: - CHANGE_PER_SECOND |
Consul | Consul: TCP connections, accepted per second | This metric counts the number of times a Consul agent has accepted an incoming TCP stream connection per second. |
DEPENDENT | consul.memberlist.tcp_accept Preprocessing: - PROMETHEUS_PATTERN: consul_memberlist_tcp_accept ⛔️ON_FAIL: - CHANGE_PER_SECOND |
Consul | Consul: TCP connections, per second | This metric counts the number of times a Consul agent has initiated a push/pull sync with another agent per second. |
DEPENDENT | consul.memberlist.tcp_connect Preprocessing: - PROMETHEUS_PATTERN: consul_memberlist_tcp_connect ⛔️ON_FAIL: - CHANGE_PER_SECOND |
Consul | Consul: TCP send bytes, per second | This metric measures the total number of bytes sent by a Consul agent through the TCP protocol per second. |
DEPENDENT | consul.memberlist.tcp_sent Preprocessing: - PROMETHEUS_PATTERN: consul_memberlist_tcp_sent ⛔️ON_FAIL: - CHANGE_PER_SECOND |
Consul | Consul: UDP received bytes, per second | This metric measures the total number of bytes received by a Consul agent through the UDP protocol per second. |
DEPENDENT | consul.memberlist.udp_received Preprocessing: - PROMETHEUS_PATTERN: consul_memberlist_udp_received ⛔️ON_FAIL: - CHANGE_PER_SECOND |
Consul | Consul: UDP sent bytes, per second | This metric measures the total number of bytes sent by a Consul agent through the UDP protocol per second. |
DEPENDENT | consul.memberlist.udp_sent Preprocessing: - PROMETHEUS_PATTERN: consul_memberlist_udp_sent ⛔️ON_FAIL: - CHANGE_PER_SECOND |
Consul | Consul: GC pause, p90 | The 90 percentile for the number of nanoseconds consumed by stop-the-world garbage collection (GC) pauses since Consul started, in milliseconds. |
DEPENDENT | consul.gc_pause.p90 Preprocessing: - PROMETHEUS_PATTERN: consul_runtime_gc_pause_ns{quantile="0.9"} ⛔️ON_FAIL: - JAVASCRIPT: - MULTIPLIER: |
Consul | Consul: GC pause, p50 | The 50 percentile (median) for the number of nanoseconds consumed by stop-the-world garbage collection (GC) pauses since Consul started, in milliseconds. |
DEPENDENT | consul.gc_pause.p50 Preprocessing: - PROMETHEUS_PATTERN: consul_runtime_gc_pause_ns{quantile="0.5"} ⛔️ON_FAIL: - JAVASCRIPT: - MULTIPLIER: |
Consul | Consul: Memberlist: degraded | This metric counts the number of times the Consul agent has performed failure detection on another agent at a slower probe rate. The agent uses its own health metric as an indicator to perform this action. If its health score is low, it means that the node is healthy, and vice versa. |
DEPENDENT | consul.memberlist.degraded Preprocessing: - PROMETHEUS_PATTERN: ⛔️ON_FAIL: |
Consul | Consul: Memberlist: health score | This metric describes a node's perception of its own health based on how well it is meeting the soft real-time requirements of the protocol. This metric ranges from 0 to 8, where 0 indicates "totally healthy". |
DEPENDENT | consul.memberlist.health_score Preprocessing: - PROMETHEUS_PATTERN: consul_memberlist_health_score ⛔️ON_FAIL: |
Consul | Consul: Memberlist: gossip, p90 | The 90 percentile for the number of gossips (messages) broadcasted to a set of randomly selected nodes. |
DEPENDENT | consul.memberlist.gossip.p90 Preprocessing: - PROMETHEUS_PATTERN: consul_memberlist_gossip{quantile="0.9"} ⛔️ON_FAIL: - JAVASCRIPT: |
Consul | Consul: Memberlist: gossip, p50 | The 50 percentile (median) for the number of gossips (messages) broadcasted to a set of randomly selected nodes. |
DEPENDENT | consul.memberlist.gossip.p50 Preprocessing: - PROMETHEUS_PATTERN: ⛔️ON_FAIL: - JAVASCRIPT: |
Consul | Consul: Memberlist: msg alive | This metric counts the number of alive Consul agents, that the agent has mapped out so far, based on the message information given by the network layer. |
DEPENDENT | consul.memberlist.msg.alive Preprocessing: - PROMETHEUS_PATTERN: ⛔️ON_FAIL: |
Consul | Consul: Memberlist: msg dead | This metric counts the number of times a Consul agent has marked another agent to be a dead node. |
DEPENDENT | consul.memberlist.msg.dead Preprocessing: - PROMETHEUS_PATTERN: ⛔️ON_FAIL: |
Consul | Consul: Memberlist: msg suspect | The number of times a Consul agent suspects another as failed while probing during gossip protocol. |
DEPENDENT | consul.memberlist.msg.suspect Preprocessing: - PROMETHEUS_PATTERN: ⛔️ON_FAIL: |
Consul | Consul: Memberlist: probe node, p90 | The 90 percentile for the time taken to perform a single round of failure detection on a select Consul agent. |
DEPENDENT | consul.memberlist.probe_node.p90 Preprocessing: - PROMETHEUS_PATTERN: consul_memberlist_probeNode{quantile="0.9"} ⛔️ON_FAIL: - JAVASCRIPT: |
Consul | Consul: Memberlist: probe node, p50 | The 50 percentile (median) for the time taken to perform a single round of failure detection on a select Consul agent. |
DEPENDENT | consul.memberlist.probe_node.p50 Preprocessing: - PROMETHEUS_PATTERN: consul_memberlist_probeNode{quantile="0.5"} ⛔️ON_FAIL: - JAVASCRIPT: |
Consul | Consul: Memberlist: push pull node, p90 | The 90 percentile for the number of Consul agents that have exchanged state with this agent. |
DEPENDENT | consul.memberlist.push_pull_node.p90 Preprocessing: - PROMETHEUS_PATTERN: ⛔️ON_FAIL: - JAVASCRIPT: |
Consul | Consul: Memberlist: push pull node, p50 | The 50 percentile (median) for the number of Consul agents that have exchanged state with this agent. |
DEPENDENT | consul.memberlist.push_pull_node.p50 Preprocessing: - PROMETHEUS_PATTERN: ⛔️ON_FAIL: - JAVASCRIPT: |
Consul | Consul: KV store: apply, p90 | The 90 percentile for the time it takes to complete an update to the KV store. |
DEPENDENT | consul.kvs.apply.p90 Preprocessing: - PROMETHEUS_PATTERN: ⛔️ON_FAIL: - JAVASCRIPT: |
Consul | Consul: KV store: apply, p50 | The 50 percentile (median) for the time it takes to complete an update to the KV store. |
DEPENDENT | consul.kvs.apply.p50 Preprocessing: - PROMETHEUS_PATTERN: ⛔️ON_FAIL: - JAVASCRIPT: |
Consul | Consul: KV store: apply, rate | The number of updates to the KV store per second. |
DEPENDENT | consul.kvs.apply.rate Preprocessing: - PROMETHEUS_PATTERN: ⛔️ON_FAIL: - CHANGE_PER_SECOND |
Consul | Consul: Serf member: flap, rate | Increments when an agent is marked dead and then recovers within a short time period. This can be an indicator of overloaded agents, network problems, or configuration errors where agents cannot connect to each other on the required ports. Shown as events per second. |
DEPENDENT | consul.serf.member.flap.rate Preprocessing: - PROMETHEUS_PATTERN: ⛔️ON_FAIL: - CHANGE_PER_SECOND |
Consul | Consul: Serf member: failed, rate | Increments when an agent is marked dead. This can be an indicator of overloaded agents, network problems, or configuration errors where agents cannot connect to each other on the required ports. Shown as events per second. |
DEPENDENT | consul.serf.member.failed.rate Preprocessing: - PROMETHEUS_PATTERN: ⛔️ON_FAIL: - CHANGE_PER_SECOND |
Consul | Consul: Serf member: join, rate | Increments when an agent joins the cluster. If an agent flapped or failed this counter also increments when it re-joins. Shown as events per second. |
DEPENDENT | consul.serf.member.join.rate Preprocessing: - PROMETHEUS_PATTERN: ⛔️ON_FAIL: - CHANGE_PER_SECOND |
Consul | Consul: Serf member: left, rate | Increments when an agent leaves the cluster. Shown as events per second. |
DEPENDENT | consul.serf.member.left.rate Preprocessing: - PROMETHEUS_PATTERN: ⛔️ON_FAIL: - CHANGE_PER_SECOND |
Consul | Consul: Serf member: update, rate | Increments when a Consul agent updates. Shown as events per second. |
DEPENDENT | consul.serf.member.update.rate Preprocessing: - PROMETHEUS_PATTERN: ⛔️ON_FAIL: - CHANGE_PER_SECOND |
Consul | Consul: ACL: resolves, rate | The number of ACL resolves per second. |
DEPENDENT | consul.acl.resolves.rate Preprocessing: - PROMETHEUS_PATTERN: ⛔️ON_FAIL: - CHANGE_PER_SECOND |
Consul | Consul: Catalog: register, rate | The number of catalog register operations per second. |
DEPENDENT | consul.catalog.register.rate Preprocessing: - PROMETHEUS_PATTERN: ⛔️ON_FAIL: - CHANGE_PER_SECOND |
Consul | Consul: Catalog: deregister, rate | The number of catalog deregister operations per second. |
DEPENDENT | consul.catalog.deregister.rate Preprocessing: - PROMETHEUS_PATTERN: ⛔️ON_FAIL: - CHANGE_PER_SECOND |
Consul | Consul: Snapshot: append line, p90 | The 90 percentile for the time taken by the Consul agent to append an entry into the existing log. |
DEPENDENT | consul.snapshot.append_line.p90 Preprocessing: - PROMETHEUS_PATTERN: consul_serf_snapshot_appendLine{quantile="0.9"} ⛔️ON_FAIL: - JAVASCRIPT: |
Consul | Consul: Snapshot: append line, p50 | The 50 percentile (median) for the time taken by the Consul agent to append an entry into the existing log. |
DEPENDENT | consul.snapshot.append_line.p50 Preprocessing: - PROMETHEUS_PATTERN: consul_serf_snapshot_appendLine{quantile="0.5"} ⛔️ON_FAIL: - JAVASCRIPT: |
Consul | Consul: Snapshot: append line, rate | The number of snapshot appendLine operations per second. |
DEPENDENT | consul.snapshot.append_line.rate Preprocessing: - PROMETHEUS_PATTERN: consul_serf_snapshot_appendLine_count ⛔️ON_FAIL: - CHANGE_PER_SECOND |
Consul | Consul: Snapshot: compact, p90 | The 90 percentile for the time taken by the Consul agent to compact a log. This operation occurs only when the snapshot becomes large enough to justify the compaction. |
DEPENDENT | consul.snapshot.compact.p90 Preprocessing: - PROMETHEUS_PATTERN: ⛔️ON_FAIL: - JAVASCRIPT: |
Consul | Consul: Snapshot: compact, p50 | The 50 percentile (median) for the time taken by the Consul agent to compact a log. This operation occurs only when the snapshot becomes large enough to justify the compaction. |
DEPENDENT | consul.snapshot.compact.p50 Preprocessing: - PROMETHEUS_PATTERN: ⛔️ON_FAIL: - JAVASCRIPT: |
Consul | Consul: Snapshot: compact, rate | The number of snapshot compact operations per second. |
DEPENDENT | consul.snapshot.compact.rate Preprocessing: - PROMETHEUS_PATTERN: ⛔️ON_FAIL: - CHANGE_PER_SECOND |
Consul | Consul: Get local services check | Data collection check. |
DEPENDENT | consul.getlocalservices.check Preprocessing: - JSONPATH: ⛔️ONFAIL: - DISCARDUNCHANGED_HEARTBEAT: |
Consul | Consul: ["{#SERVICE_NAME}"]: Aggregated status | Aggregated values of all health checks for the service instance. |
DEPENDENT | consul.service.aggregatedstate["{#SERVICEID}"] Preprocessing: - JSONPATH: - JAVASCRIPT: - DISCARDUNCHANGEDHEARTBEAT: |
Consul | Consul: ["{#SERVICENAME}"]: Check ["{#SERVICECHECK_NAME}"]: Status | Current state of health check for the service. |
DEPENDENT | consul.service.check.state["{#SERVICEID}/{#SERVICECHECKID}"] Preprocessing: - JSONPATH: - JAVASCRIPT: - DISCARD UNCHANGED_HEARTBEAT:3h |
Consul | Consul: ["{#SERVICENAME}"]: Check ["{#SERVICECHECK_NAME}"]: Output | Current output of health check for the service. |
DEPENDENT | consul.service.check.output["{#SERVICEID}/{#SERVICECHECKID}"] Preprocessing: - JSONPATH: - DISCARD UNCHANGED_HEARTBEAT:3h |
Consul | Consul: HTTP request: ["{#HTTP_METHOD}"], p90 | The 90 percentile of how long it takes to service the given HTTP request for the given verb. |
DEPENDENT | consul.http.api.p90["{#HTTPMETHOD}"] Preprocessing: - PROMETHEUS PATTERN:consul_api_http{method = "{#HTTP_METHOD}", quantile = "0.9"} : function : sum ⛔️ON_FAIL: |
Consul | Consul: HTTP request: ["{#HTTP_METHOD}"], p50 | The 50 percentile (median) of how long it takes to service the given HTTP request for the given verb. |
DEPENDENT | consul.http.api.p50["{#HTTPMETHOD}"] Preprocessing: - PROMETHEUS PATTERN:consul_api_http{method = "{#HTTP_METHOD}", quantile = "0.5"} : function : sum ⛔️ON_FAIL: |
Consul | Consul: HTTP request: ["{#HTTP_METHOD}"], rate | Thr number of HTTP request for the given verb per second. |
DEPENDENT | consul.http.api.rate["{#HTTPMETHOD}"] Preprocessing: - PROMETHEUS PATTERN:consul_api_http_count{method = "{#HTTP_METHOD}"} : function : sum ⛔️ONFAIL: - CHANGEPER_SECOND |
Consul | Consul: Raft state | Current state of Consul agent. |
DEPENDENT | consul.raft.state[{#SINGLETON}] Preprocessing: - JSONPATH: - DISCARDUNCHANGEDHEARTBEAT: |
Consul | Consul: Raft state: leader | Increments when a server becomes a leader. |
DEPENDENT | consul.raft.stateleader[{#SINGLETON}] Preprocessing: - PROMETHEUS PATTERN:consul_raft_state_leader ⛔️ON_FAIL: |
Consul | Consul: Raft state: candidate | The number of initiated leader elections. |
DEPENDENT | consul.raft.statecandidate[{#SINGLETON}] Preprocessing: - PROMETHEUS PATTERN:consul_raft_state_candidate ⛔️ON_FAIL: |
Consul | Consul: Raft: apply, rate | Incremented whenever a leader first passes a message into the Raft commit process (called an Apply operation). This metric describes the arrival rate of new logs into Raft per second. |
DEPENDENT | consul.raft.apply.rate[{#SINGLETON}] Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: - CHANGEPERSECOND |
Consul | Consul: Raft state: leader last contact, p90 | The 90th percentile of the time taken by a leader node to communicate with followers during a leader lease check, in milliseconds. |
DEPENDENT | consul.raft.leaderlastcontact.p90[{#SINGLETON}] Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: - JAVASCRIPT: |
Consul | Consul: Raft state: leader last contact, p50 | The 50th percentile (median) of the time taken by a leader node to communicate with followers during a leader lease check, in milliseconds. |
DEPENDENT | consul.raft.leaderlastcontact.p50[{#SINGLETON}] Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: - JAVASCRIPT: |
Consul | Consul: Raft state: commit time, p90 | The 90th percentile of the time it takes to commit a new entry to the Raft log on the leader, in milliseconds. |
DEPENDENT | consul.raft.committime.p90[{#SINGLETON}] Preprocessing: - PROMETHEUS PATTERN:consul_raft_commitTime{quantile="0.9"} ⛔️ON_FAIL: - JAVASCRIPT: |
Consul | Consul: Raft state: commit time, p50 | The 50th percentile (median) of the time it takes to commit a new entry to the Raft log on the leader, in milliseconds. |
DEPENDENT | consul.raft.committime.p50[{#SINGLETON}] Preprocessing: - PROMETHEUS PATTERN:consul_raft_commitTime{quantile="0.5"} ⛔️ON_FAIL: - JAVASCRIPT: |
Consul | Consul: Raft state: dispatch log, p90 | The 90th percentile of the time it takes for the leader to write log entries to disk, in milliseconds. |
DEPENDENT | consul.raft.dispatchlog.p90[{#SINGLETON}] Preprocessing: - PROMETHEUS PATTERN:consul_raft_leader_dispatchLog{quantile="0.9"} ⛔️ON_FAIL: - JAVASCRIPT: |
Consul | Consul: Raft state: dispatch log, p50 | The 50th percentile (median) of the time it takes for the leader to write log entries to disk, in milliseconds. |
DEPENDENT | consul.raft.dispatchlog.p50[{#SINGLETON}] Preprocessing: - PROMETHEUS PATTERN:consul_raft_leader_dispatchLog{quantile="0.5"} ⛔️ON_FAIL: - JAVASCRIPT: |
Consul | Consul: Raft state: dispatch log, rate | The number of times a Raft leader writes a log to disk per second. |
DEPENDENT | consul.raft.dispatchlog.rate[{#SINGLETON}] Preprocessing: - PROMETHEUS PATTERN:consul_raft_leader_dispatchLog_count ⛔️ONFAIL: - CHANGEPER_SECOND |
Consul | Consul: Raft state: commit, rate | The number of new entries committed to the Raft log on the leader per second. |
DEPENDENT | consul.raft.committime.rate[{#SINGLETON}] Preprocessing: - PROMETHEUS PATTERN:consul_raft_commitTime_count ⛔️ONFAIL: - CHANGEPER_SECOND |
Consul | Consul: Autopilot healthy | Tracks the overall health of the local server cluster. 1 if all servers are healthy, 0 if one or more are unhealthy. |
DEPENDENT | consul.autopilot.healthy[{#SINGLETON}] Preprocessing: - PROMETHEUSPATTERN: ⛔️ONFAIL: |
Zabbix raw items | Consul: Get instance metrics | Get raw metrics from Consul instance /metrics endpoint. |
HTTP_AGENT | consul.getmetrics Preprocessing: - CHECK NOTSUPPORTED⛔️ON FAIL:DISCARD_VALUE -> |
Zabbix raw items | Consul: Get node info | Get configuration and member information of the local agent. |
HTTP_AGENT | consul.getnodeinfo Preprocessing: - CHECKNOTSUPPORTED ⛔️ON_FAIL: |
Zabbix raw items | Consul: Get local services | Get all the services that are registered with the local agent and their status. |
SCRIPT | consul.getlocalservices Expression: The text is too long. Please see the template. |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Consul: Version has been changed | Consul version has changed. Ack to close. |
last(/HashiCorp Consul Node by HTTP/consul.version,#1)<>last(/HashiCorp Consul Node by HTTP/consul.version,#2) and length(last(/HashiCorp Consul Node by HTTP/consul.version))>0 |
INFO | Manual close: YES |
Consul: Current number of open files is too high | "Heavy file descriptor usage (i.e., near the process’s file descriptor limit) indicates a potential file descriptor exhaustion issue." |
min(/HashiCorp Consul Node by HTTP/consul.process_open_fds,5m)/last(/HashiCorp Consul Node by HTTP/consul.process_max_fds)*100>{$CONSUL.OPEN.FDS.MAX.WARN} |
WARNING | |
Consul: Node's health score is warning | This metric ranges from 0 to 8, where 0 indicates "totally healthy". This health score is used to scale the time between outgoing probes, and higher scores translate into longer probing intervals. For more details see section IV of the Lifeguard paper: https://arxiv.org/pdf/1707.00788.pdf |
max(/HashiCorp Consul Node by HTTP/consul.memberlist.health_score,#3)>{$CONSUL.NODE.HEALTH_SCORE.MAX.WARN} |
WARNING | Depends on: - Consul: Node's health score is critical |
Consul: Node's health score is critical | This metric ranges from 0 to 8, where 0 indicates "totally healthy". This health score is used to scale the time between outgoing probes, and higher scores translate into longer probing intervals. For more details see section IV of the Lifeguard paper: https://arxiv.org/pdf/1707.00788.pdf |
max(/HashiCorp Consul Node by HTTP/consul.memberlist.health_score,#3)>{$CONSUL.NODE.HEALTH_SCORE.MAX.HIGH} |
AVERAGE | |
Consul: Failed to get local services | Failed to get local services. Check debug log for more information. |
length(last(/HashiCorp Consul Node by HTTP/consul.get_local_services.check))>0 |
WARNING | |
Consul: Aggregated status is 'warning' | Aggregated state of service on the local agent is 'warning'. |
last(/HashiCorp Consul Node by HTTP/consul.service.aggregated_state["{#SERVICE_ID}"]) = 1 |
WARNING | |
Consul: Aggregated status is 'critical' | Aggregated state of service on the local agent is 'critical'. |
last(/HashiCorp Consul Node by HTTP/consul.service.aggregated_state["{#SERVICE_ID}"]) = 2 |
AVERAGE |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template or ask for help with it at ZABBIX forums.
For Zabbix version: 6.2 and higher
The template to monitor Cloudflare by Zabbix; it watches your web traffic and DNS metrics.
It works without any external scripts and uses the Script item.
See Zabbix template operation for basic instructions.
1. Create a host, for example mywebsite.com, for a site in your Cloudflare account.
2. Link the template to the host.
3. Customize the values of the {$CLOUDFLARE.API.TOKEN} and {$CLOUDFLARE.ZONE_ID} macros.
Cloudflare API Tokens are available in your Cloudflare account under My Profile > API Tokens.
Zone ID is available in your Cloudflare account under Account Home > Site.
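You can verify both macro values before linking the template with a couple of direct API calls. Below is a minimal Python sketch (a pre-flight check only; the template's Script item performs its own requests, and the placeholder token and Zone ID are assumptions to be replaced):

import requests

API_URL = "https://api.cloudflare.com/client/v4"  # {$CLOUDFLARE.API.URL}
API_TOKEN = "<change>"                            # {$CLOUDFLARE.API.TOKEN}
ZONE_ID = "<change>"                              # {$CLOUDFLARE.ZONE_ID}

headers = {"Authorization": "Bearer " + API_TOKEN}

# Check that the token is valid and active.
resp = requests.get(API_URL + "/user/tokens/verify", headers=headers, timeout=3)
print(resp.json()["result"]["status"])  # expected: "active"

# Check that the token can read the zone to be monitored.
resp = requests.get(API_URL + "/zones/" + ZONE_ID, headers=headers, timeout=3)
print(resp.json()["result"]["name"])  # expected: your site name, e.g. mywebsite.com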
No specific Zabbix configuration is required.
Name | Description | Default |
---|---|---|
{$CLOUDFLARE.API.TOKEN} | Your Cloudflare API Token. |
<change> |
{$CLOUDFLARE.API.URL} | The URL of Cloudflare API endpoint. |
https://api.cloudflare.com/client/v4 |
{$CLOUDFLARE.CACHED_BANDWIDTH.MIN.WARN} | Minimum cached bandwidth, in %. |
50 |
{$CLOUDFLARE.ERRORS.MAX.WARN} | Maximum share of responses with errors, in %. |
30 |
{$CLOUDFLARE.GET_DATA.TIMEOUT} | Response timeout for Cloudflare API. |
3s |
{$CLOUDFLARE.ZONE_ID} | Your Cloudflare Site Zone ID. |
<change> |
There are no template links in this template.
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
General | Cloudflare: Total bandwidth | The volume of all data. |
DEPENDENT | cloudflare.bandwidth.all Preprocessing: - JSONPATH: |
General | Cloudflare: Cached bandwidth | The volume of cached data. |
DEPENDENT | cloudflare.bandwidth.cached Preprocessing: - JSONPATH: |
General | Cloudflare: Uncached bandwidth | The volume of uncached data. |
DEPENDENT | cloudflare.bandwidth.uncached Preprocessing: - JSONPATH: |
General | Cloudflare: Cache hit ratio of bandwidth | The ratio of cached bandwidth to total bandwidth, in percent. |
DEPENDENT | cloudflare.bandwidth.cachehitratio Preprocessing: - JSONPATH: |
General | Cloudflare: SSL encrypted bandwidth | The volume of encrypted data. |
DEPENDENT | cloudflare.bandwidth.ssl.encrypted Preprocessing: - JSONPATH: |
General | Cloudflare: Unencrypted bandwidth | The volume of unencrypted data. |
DEPENDENT | cloudflare.bandwidth.ssl.unencrypted Preprocessing: - JSONPATH: |
General | Cloudflare: DNS queries | The total number of DNS queries. |
DEPENDENT | cloudflare.dns.query.all Preprocessing: - JSONPATH: |
General | Cloudflare: Stale DNS queries | The number of stale DNS queries. |
DEPENDENT | cloudflare.dns.query.stale Preprocessing: - JSONPATH: |
General | Cloudflare: Uncached DNS queries | The number of uncached DNS queries. |
DEPENDENT | cloudflare.dns.query.uncached Preprocessing: - JSONPATH: |
General | Cloudflare: Total page views | The total number of page views. |
DEPENDENT | cloudflare.pageviews.all Preprocessing: - JSONPATH: |
General | Cloudflare: Total requests | The total number of requests. |
DEPENDENT | cloudflare.requests.all Preprocessing: - JSONPATH: |
General | Cloudflare: Cached requests | The number of cached requests. |
DEPENDENT | cloudflare.requests.cached Preprocessing: - JSONPATH: |
General | Cloudflare: Uncached requests | The number of uncached requests. |
DEPENDENT | cloudflare.requests.uncached Preprocessing: - JSONPATH: |
General | Cloudflare: Cache hit ratio % over time | The ratio of cached requests to all requests, in percent. |
DEPENDENT | cloudflare.requests.cachehitratio Preprocessing: - JSONPATH: |
General | Cloudflare: Response codes 1xx | The number of requests with 1xx response codes. |
DEPENDENT | cloudflare.requests.response_100 Preprocessing: - JSONPATH: |
General | Cloudflare: Response codes 2xx | The number of requests with 2xx response codes. |
DEPENDENT | cloudflare.requests.response_200 Preprocessing: - JSONPATH: |
General | Cloudflare: Response codes 3xx | The number of requests with 3xx response codes. |
DEPENDENT | cloudflare.requests.response_300 Preprocessing: - JSONPATH: |
General | Cloudflare: Response codes 4xx | The number of requests with 4xx response codes. |
DEPENDENT | cloudflare.requests.response_400 Preprocessing: - JSONPATH: |
General | Cloudflare: Response codes 5xx | The number of requests with 5xx response codes. |
DEPENDENT | cloudflare.requests.response_500 Preprocessing: - JSONPATH: |
General | Cloudflare: Non-2xx responses ratio | The ratio of requests with non-2xx response codes to all requests, in percent. |
DEPENDENT | cloudflare.requests.others_ratio Preprocessing: - JSONPATH: |
General | Cloudflare: 2xx responses ratio | The ratio of requests with 2xx response codes to all requests, in percent. |
DEPENDENT | cloudflare.requests.success_ratio Preprocessing: - JSONPATH: |
General | Cloudflare: SSL encrypted requests | The number of encrypted requests. |
DEPENDENT | cloudflare.requests.ssl.encrypted Preprocessing: - JSONPATH: |
General | Cloudflare: Unencrypted requests | The number of unencrypted requests. |
DEPENDENT | cloudflare.requests.ssl.unencrypted Preprocessing: - JSONPATH: |
General | Cloudflare: Total threats | The number of all threats. |
DEPENDENT | cloudflare.threats.all Preprocessing: - JSONPATH: |
General | Cloudflare: Unique visitors | The number of unique visitor IPs. |
DEPENDENT | cloudflare.uniques.all Preprocessing: - JSONPATH: |
Zabbix raw items | Cloudflare: Get data | The JSON with result of Cloudflare API request. |
SCRIPT | cloudflare.get Expression: The text is too long. Please see the template. |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Cloudflare: Cached bandwidth is too low | max(/Cloudflare by HTTP/cloudflare.bandwidth.cache_hit_ratio,#3) < {$CLOUDFLARE.CACHED_BANDWIDTH.MIN.WARN} |
WARNING | ||
Cloudflare: Ratio of non-2xx responses is too high | A large number of errors can indicate a malfunction of the site. |
min(/Cloudflare by HTTP/cloudflare.requests.others_ratio,#3) > {$CLOUDFLARE.ERRORS.MAX.WARN} |
AVERAGE |
Please report any issues with the template at https://support.zabbix.com
For Zabbix version: 6.2 and higher
The template to monitor the TLS/SSL certificate of a website by Zabbix agent 2 that works without any external scripts.
Zabbix agent 2 with the WebCertificate plugin requests the certificate using the web.certificate.get key and returns JSON with the certificate attributes.
See Zabbix template operation for basic instructions.
1. Set up and configure zabbix-agent2 with the WebCertificate plugin.
2. Test availability: zabbix_get -s <zabbix_agent_addr> -k web.certificate.get[<website_DNS_name>]
3. Create a host for the TLS/SSL certificate with Zabbix agent interface.
4. Link the template to the host.
5. Customize the value of the {$CERT.WEBSITE.HOSTNAME} macro.
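Most of the attributes that the template's cert.* items report can be previewed without Zabbix. The following is a minimal Python sketch using only the standard library (the hostname is an assumption; substitute your {$CERT.WEBSITE.HOSTNAME} value):

import socket
import ssl
import time

host, port = "example.com", 443  # {$CERT.WEBSITE.HOSTNAME}, {$CERT.WEBSITE.PORT}

context = ssl.create_default_context()
with socket.create_connection((host, port), timeout=10) as sock:
    with context.wrap_socket(sock, server_hostname=host) as tls:
        cert = tls.getpeercert()

print("issuer: ", cert["issuer"])        # cf. 'Cert: Issuer'
print("subject:", cert["subject"])       # cf. 'Cert: Subject'
print("serial: ", cert["serialNumber"])  # cf. 'Cert: Serial number'
print("expires:", cert["notAfter"])      # cf. 'Cert: Expires on'

# The same arithmetic as the 'Cert: SSL certificate expires soon' trigger:
days_left = (ssl.cert_time_to_seconds(cert["notAfter"]) - time.time()) / 86400
print("days until expiry:", int(days_left))  # compared against {$CERT.EXPIRY.WARN}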
No specific Zabbix configuration is required.
Name | Description | Default |
---|---|---|
{$CERT.EXPIRY.WARN} | Number of days until the certificate expires. |
7 |
{$CERT.WEBSITE.HOSTNAME} | The website DNS name for the connection. |
<Put DNS name> |
{$CERT.WEBSITE.IP} | The website IP address for the connection. |
`` |
{$CERT.WEBSITE.PORT} | The TLS/SSL port number of the website. |
443 |
There are no template links in this template.
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
General | Cert: Validation result | The certificate validation result. Possible values: valid/invalid/valid-but-self-signed |
DEPENDENT | cert.validation Preprocessing: - JSONPATH: |
General | Cert: Last validation status | Last check result message. |
DEPENDENT | cert.message Preprocessing: - JSONPATH: |
General | Cert: Version | The version of the encoded certificate. |
DEPENDENT | cert.version Preprocessing: - JSONPATH: |
General | Cert: Serial number | The serial number is a positive integer assigned by the CA to each certificate. It is unique for each certificate issued by a given CA. Non-conforming CAs may issue certificates with serial numbers that are negative or zero. |
DEPENDENT | cert.serial_number Preprocessing: - JSONPATH: |
General | Cert: Signature algorithm | The algorithm identifier for the algorithm used by the CA to sign the certificate. |
DEPENDENT | cert.signature_algorithm Preprocessing: - JSONPATH: |
General | Cert: Issuer | The field identifies the entity that has signed and issued the certificate. |
DEPENDENT | cert.issuer Preprocessing: - JSONPATH: |
General | Cert: Valid from | The date on which the certificate validity period begins. |
DEPENDENT | cert.not_before Preprocessing: - JSONPATH: |
General | Cert: Expires on | The date on which the certificate validity period ends. |
DEPENDENT | cert.not_after Preprocessing: - JSONPATH: |
General | Cert: Subject | The field identifies the entity associated with the public key stored in the subject public key field. |
DEPENDENT | cert.subject Preprocessing: - JSONPATH: |
General | Cert: Subject alternative name | The subject alternative name extension allows identities to be bound to the subject of the certificate. These identities may be included in addition to or in place of the identity in the subject field of the certificate. Defined options include an Internet electronic mail address, a DNS name, an IP address, and a Uniform Resource Identifier (URI). |
DEPENDENT | cert.alternative_names Preprocessing: - JSONPATH: |
General | Cert: Public key algorithm | The digital signature algorithm is used to verify the signature of a certificate. |
DEPENDENT | cert.publickeyalgorithm Preprocessing: - JSONPATH: |
General | Cert: Fingerprint | The Certificate Signature (SHA1 Fingerprint or Thumbprint) is the hash of the entire certificate in DER form. |
DEPENDENT | cert.sha1_fingerprint Preprocessing: - JSONPATH: |
Zabbix raw items | Cert: Get | Returns the JSON with attributes of a certificate of the requested site. |
ZABBIX_PASSIVE | web.certificate.get[{$CERT.WEBSITE.HOSTNAME},{$CERT.WEBSITE.PORT},{$CERT.WEBSITE.IP}] Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Cert: SSL certificate is invalid | SSL certificate has expired or it is issued for another domain. |
find(/Website certificate by Zabbix agent 2/cert.validation,,"like","invalid")=1 |
HIGH | |
Cert: SSL certificate expires soon | The SSL certificate should be updated or it will become untrusted. |
(last(/Website certificate by Zabbix agent 2/cert.not_after) - now()) / 86400 < {$CERT.EXPIRY.WARN} |
WARNING | Depends on: - Cert: SSL certificate is invalid |
Cert: Fingerprint has changed | The SSL certificate fingerprint has changed. If you did not update the certificate, it may mean your certificate has been compromised. Ack to close. Some installations may serve multiple valid certificates; in that case, the trigger raises a false positive. You can ignore it or disable the trigger. |
last(/Website certificate by Zabbix agent 2/cert.sha1_fingerprint) <> last(/Website certificate by Zabbix agent 2/cert.sha1_fingerprint,#2) |
INFO | Manual close: YES |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template or ask for help with it at ZABBIX forums.
For Zabbix version: 6.2 and higher. The template is designed to monitor Ceph cluster by Zabbix, which works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The template Ceph by Zabbix agent 2
— collects metrics by polling zabbix-agent2.
This template was tested on:
See Zabbix template operation for basic instructions.
Test availability: zabbix_get -s ceph-host -k ceph.ping["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"]
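If the check above fails, you can query the ceph-mgr RESTful module directly with the same connection parameters the plugin uses. A minimal Python sketch (the endpoint name follows the RESTful module documentation; the self-signed certificate check is disabled here only for brevity):

import requests

CONNSTRING = "https://localhost:8003"  # {$CEPH.CONNSTRING}
USER = "zabbix"                        # {$CEPH.USER}
API_KEY = "zabbix_pass"                # {$CEPH.API.KEY}

# The RESTful module authenticates with HTTP basic auth: user name + generated key.
resp = requests.get(CONNSTRING + "/mon", auth=(USER, API_KEY),
                    verify=False, timeout=10)
resp.raise_for_status()
print(resp.json())  # the list of monitors known to the cluster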
No specific Zabbix configuration is required.
Name | Description | Default |
---|---|---|
{$CEPH.API.KEY} | - |
zabbix_pass |
{$CEPH.CONNSTRING} | - |
https://localhost:8003 |
{$CEPH.USER} | - |
zabbix |
There are no template links in this template.
Name | Description | Type | Key and additional info |
---|---|---|---|
OSD | - |
ZABBIX_PASSIVE | ceph.osd.discovery["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"] |
Pool | - |
ZABBIX_PASSIVE | ceph.pool.discovery["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"] |
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
Ceph | Ceph: Ping | ZABBIX_PASSIVE | ceph.ping["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"] Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
|
Ceph | Ceph: Number of Monitors | The number of Monitors configured in a Ceph cluster. |
DEPENDENT | ceph.nummon Preprocessing: - JSONPATH: - DISCARD UNCHANGED_HEARTBEAT:30m |
Ceph | Ceph: Overall cluster status | The overall Ceph cluster status, e.g. 0 - HEALTH_OK, 1 - HEALTH_WARN or 2 - HEALTH_ERR. |
DEPENDENT | ceph.overallstatus Preprocessing: - JSONPATH: - DISCARD UNCHANGED_HEARTBEAT:10m |
Ceph | Ceph: Minimum Mon release version | The minimum monitor release version (min_mon_release_name). |
DEPENDENT | ceph.minmonreleasename Preprocessing: - JSONPATH: - DISCARD UNCHANGED_HEARTBEAT:1h |
Ceph | Ceph: Ceph Read bandwidth | The global read bytes per second. |
DEPENDENT | ceph.rdbytes.rate Preprocessing: - JSONPATH: - CHANGE PER_SECOND |
Ceph | Ceph: Ceph Write bandwidth | The global write bytes per second. |
DEPENDENT | ceph.wrbytes.rate Preprocessing: - JSONPATH: - CHANGE PER_SECOND |
Ceph | Ceph: Ceph Read operations per sec | The global read operations per second. |
DEPENDENT | ceph.rdops.rate Preprocessing: - JSONPATH: - CHANGE PER_SECOND |
Ceph | Ceph: Ceph Write operations per sec | The global write operations per second. |
DEPENDENT | ceph.wrops.rate Preprocessing: - JSONPATH: - CHANGE PER_SECOND |
Ceph | Ceph: Total bytes available | The total bytes available in a Ceph cluster. |
DEPENDENT | ceph.totalavailbytes Preprocessing: - JSONPATH: |
Ceph | Ceph: Total bytes | The total (RAW) capacity of a Ceph cluster in bytes. |
DEPENDENT | ceph.total_bytes Preprocessing: - JSONPATH: |
Ceph | Ceph: Total bytes used | The total bytes used in a Ceph cluster. |
DEPENDENT | ceph.totalusedbytes Preprocessing: - JSONPATH: |
Ceph | Ceph: Total number of objects | The total number of objects in a Ceph cluster. |
DEPENDENT | ceph.total_objects Preprocessing: - JSONPATH: |
Ceph | Ceph: Number of Placement Groups | The total number of Placement Groups in a Ceph cluster. |
DEPENDENT | ceph.numpg Preprocessing: - JSONPATH: - DISCARD UNCHANGED_HEARTBEAT:10m |
Ceph | Ceph: Number of Placement Groups in Temporary state | The total number of Placement Groups in a pg_temp state. |
DEPENDENT | ceph.numpgtemp Preprocessing: - JSONPATH: |
Ceph | Ceph: Number of Placement Groups in Active state | The total number of Placement Groups in an active state. |
DEPENDENT | ceph.pg_states.active Preprocessing: - JSONPATH: |
Ceph | Ceph: Number of Placement Groups in Clean state | The total number of Placement Groups in a clean state. |
DEPENDENT | ceph.pg_states.clean Preprocessing: - JSONPATH: |
Ceph | Ceph: Number of Placement Groups in Peering state | The total number of Placement Groups in a peering state. |
DEPENDENT | ceph.pg_states.peering Preprocessing: - JSONPATH: |
Ceph | Ceph: Number of Placement Groups in Scrubbing state | The total number of Placement Groups in a scrubbing state. |
DEPENDENT | ceph.pg_states.scrubbing Preprocessing: - JSONPATH: |
Ceph | Ceph: Number of Placement Groups in Undersized state | The total number of Placement Groups in an undersized state. |
DEPENDENT | ceph.pg_states.undersized Preprocessing: - JSONPATH: |
Ceph | Ceph: Number of Placement Groups in Backfilling state | The total number of Placement Groups in a backfill state. |
DEPENDENT | ceph.pg_states.backfilling Preprocessing: - JSONPATH: |
Ceph | Ceph: Number of Placement Groups in degraded state | The total number of Placement Groups in a degraded state. |
DEPENDENT | ceph.pg_states.degraded Preprocessing: - JSONPATH: |
Ceph | Ceph: Number of Placement Groups in inconsistent state | The total number of Placement Groups in an inconsistent state. |
DEPENDENT | ceph.pg_states.inconsistent Preprocessing: - JSONPATH: |
Ceph | Ceph: Number of Placement Groups in Unknown state | The total number of Placement Groups in an unknown state. |
DEPENDENT | ceph.pg_states.unknown Preprocessing: - JSONPATH: |
Ceph | Ceph: Number of Placement Groups in remapped state | The total number of Placement Groups in a remapped state. |
DEPENDENT | ceph.pg_states.remapped Preprocessing: - JSONPATH: |
Ceph | Ceph: Number of Placement Groups in recovering state | The total number of Placement Groups in a recovering state. |
DEPENDENT | ceph.pg_states.recovering Preprocessing: - JSONPATH: |
Ceph | Ceph: Number of Placement Groups in backfill_toofull state | The total number of Placement Groups in a backfill_toofull state. |
DEPENDENT | ceph.pgstates.backfilltoofull Preprocessing: - JSONPATH: |
Ceph | Ceph: Number of Placement Groups in backfill_wait state | The total number of Placement Groups in a backfill_wait state. |
DEPENDENT | ceph.pgstates.backfillwait Preprocessing: - JSONPATH: |
Ceph | Ceph: Number of Placement Groups in recovery_wait state | The total number of Placement Groups in a recovery_wait state. |
DEPENDENT | ceph.pgstates.recoverywait Preprocessing: - JSONPATH: |
Ceph | Ceph: Number of Pools | The total number of pools in a Ceph cluster. |
DEPENDENT | ceph.num_pools Preprocessing: - JSONPATH: |
Ceph | Ceph: Number of OSDs | The number of the known storage daemons in a Ceph cluster. |
DEPENDENT | ceph.numosd Preprocessing: - JSONPATH: - DISCARD UNCHANGED_HEARTBEAT:10m |
Ceph | Ceph: Number of OSDs in state: UP | The total number of the online storage daemons in a Ceph cluster. |
DEPENDENT | ceph.numosdup Preprocessing: - JSONPATH: - DISCARDUNCHANGEDHEARTBEAT: |
Ceph | Ceph: Number of OSDs in state: IN | The total number of the participating storage daemons in a Ceph cluster. |
DEPENDENT | ceph.numosdin Preprocessing: - JSONPATH: - DISCARDUNCHANGEDHEARTBEAT: |
Ceph | Ceph: Ceph OSD avg fill | The average fill of OSDs. |
DEPENDENT | ceph.osd_fill.avg Preprocessing: - JSONPATH: |
Ceph | Ceph: Ceph OSD max fill | The percentage of the most filled OSD. |
DEPENDENT | ceph.osd_fill.max Preprocessing: - JSONPATH: |
Ceph | Ceph: Ceph OSD min fill | The percentage of the least filled OSD. |
DEPENDENT | ceph.osd_fill.min Preprocessing: - JSONPATH: |
Ceph | Ceph: Ceph OSD max PGs | The maximum number of Placement Groups on OSDs. |
DEPENDENT | ceph.osd_pgs.max Preprocessing: - JSONPATH: |
Ceph | Ceph: Ceph OSD min PGs | The minimum number of Placement Groups on OSDs. |
DEPENDENT | ceph.osd_pgs.min Preprocessing: - JSONPATH: |
Ceph | Ceph: Ceph OSD avg PGs | The average number of Placement Groups on OSDs. |
DEPENDENT | ceph.osd_pgs.avg Preprocessing: - JSONPATH: |
Ceph | Ceph: Ceph OSD Apply latency Avg | The average apply latency of OSDs. |
DEPENDENT | ceph.osdlatencyapply.avg Preprocessing: - JSONPATH: |
Ceph | Ceph: Ceph OSD Apply latency Max | The maximum apply latency of OSDs. |
DEPENDENT | ceph.osdlatencyapply.max Preprocessing: - JSONPATH: |
Ceph | Ceph: Ceph OSD Apply latency Min | The minimum apply latency of OSDs. |
DEPENDENT | ceph.osdlatencyapply.min Preprocessing: - JSONPATH: |
Ceph | Ceph: Ceph OSD Commit latency Avg | The average commit latency of OSDs. |
DEPENDENT | ceph.osdlatencycommit.avg Preprocessing: - JSONPATH: |
Ceph | Ceph: Ceph OSD Commit latency Max | The maximum commit latency of OSDs. |
DEPENDENT | ceph.osdlatencycommit.max Preprocessing: - JSONPATH: |
Ceph | Ceph: Ceph OSD Commit latency Min | The minimum commit latency of OSDs. |
DEPENDENT | ceph.osdlatencycommit.min Preprocessing: - JSONPATH: |
Ceph | Ceph: Ceph backfill full ratio | The backfill full ratio setting of the Ceph cluster as configured on OSDMap. |
DEPENDENT | ceph.osdbackfillfullratio Preprocessing: - JSONPATH: - DISCARDUNCHANGEDHEARTBEAT: |
Ceph | Ceph: Ceph full ratio | The full ratio setting of the Ceph cluster as configured on OSDMap. |
DEPENDENT | ceph.osdfullratio Preprocessing: - JSONPATH: - DISCARDUNCHANGEDHEARTBEAT: |
Ceph | Ceph: Ceph nearfull ratio | The near full ratio setting of the Ceph cluster as configured on OSDMap. |
DEPENDENT | ceph.osdnearfullratio Preprocessing: - JSONPATH: - DISCARDUNCHANGEDHEARTBEAT: |
Ceph | Ceph: [osd.{#OSDNAME}] OSD in | DEPENDENT | ceph.osd[{#OSDNAME},in] Preprocessing: - JSONPATH: - DISCARDUNCHANGEDHEARTBEAT: |
|
Ceph | Ceph: [osd.{#OSDNAME}] OSD up | DEPENDENT | ceph.osd[{#OSDNAME},up] Preprocessing: - JSONPATH: - DISCARDUNCHANGEDHEARTBEAT: |
|
Ceph | Ceph: [osd.{#OSDNAME}] OSD PGs | DEPENDENT | ceph.osd[{#OSDNAME},numpgs] Preprocessing: - JSONPATH: ⛔️ON FAIL:DISCARD_VALUE -> |
|
Ceph | Ceph: [osd.{#OSDNAME}] OSD fill | DEPENDENT | ceph.osd[{#OSDNAME},fill] Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
|
Ceph | Ceph: [osd.{#OSDNAME}] OSD latency apply | The time taken to flush an update to disks. |
DEPENDENT | ceph.osd[{#OSDNAME},latencyapply] Preprocessing: - JSONPATH: ⛔️ON FAIL:DISCARD_VALUE -> |
Ceph | Ceph: [osd.{#OSDNAME}] OSD latency commit | The time taken to commit an operation to the journal. |
DEPENDENT | ceph.osd[{#OSDNAME},latencycommit] Preprocessing: - JSONPATH: ⛔️ON FAIL:DISCARD_VALUE -> |
Ceph | Ceph: [{#POOLNAME}] Pool Used | The total bytes used in a pool. |
DEPENDENT | ceph.pool["{#POOLNAME}",bytes_used] Preprocessing: - JSONPATH: |
Ceph | Ceph: [{#POOLNAME}] Max available | The maximum available space in the given pool. |
DEPENDENT | ceph.pool["{#POOLNAME}",max_avail] Preprocessing: - JSONPATH: |
Ceph | Ceph: [{#POOLNAME}] Pool RAW Used | Bytes used in pool including the copies made. |
DEPENDENT | ceph.pool["{#POOLNAME}",stored_raw] Preprocessing: - JSONPATH: |
Ceph | Ceph: [{#POOLNAME}] Pool Percent Used | The percentage of the storage used per pool. |
DEPENDENT | ceph.pool["{#POOLNAME}",percent_used] Preprocessing: - JSONPATH: |
Ceph | Ceph: [{#POOLNAME}] Pool objects | The number of objects in the pool. |
DEPENDENT | ceph.pool["{#POOLNAME}",objects] Preprocessing: - JSONPATH: |
Ceph | Ceph: [{#POOLNAME}] Pool Read bandwidth | The read rate per pool (bytes per second). |
DEPENDENT | ceph.pool["{#POOLNAME}",rdbytes.rate] Preprocessing: - JSONPATH: - CHANGE PER_SECOND |
Ceph | Ceph: [{#POOLNAME}] Pool Write bandwidth | The write rate per pool (bytes per second). |
DEPENDENT | ceph.pool["{#POOLNAME}",wrbytes.rate] Preprocessing: - JSONPATH: - CHANGE PER_SECOND |
Ceph | Ceph: [{#POOLNAME}] Pool Read operations | The read rate per pool (operations per second). |
DEPENDENT | ceph.pool["{#POOLNAME}",rdops.rate] Preprocessing: - JSONPATH: - CHANGE PER_SECOND |
Ceph | Ceph: [{#POOLNAME}] Pool Write operations | The write rate per pool (operations per second). |
DEPENDENT | ceph.pool["{#POOLNAME}",wrops.rate] Preprocessing: - JSONPATH: - CHANGE PER_SECOND |
Zabbix raw items | Ceph: Get overall cluster status | ZABBIX_PASSIVE | ceph.status["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"] | |
Zabbix raw items | Ceph: Get OSD stats | ZABBIX_PASSIVE | ceph.osd.stats["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"] | |
Zabbix raw items | Ceph: Get OSD dump | ZABBIX_PASSIVE | ceph.osd.dump["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"] | |
Zabbix raw items | Ceph: Get df | ZABBIX_PASSIVE | ceph.df.details["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Ceph: Can not connect to cluster | The connection to the Ceph RESTful module is broken (this covers any error, including AUTH and configuration issues). |
last(/Ceph by Zabbix agent 2/ceph.ping["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"])=0 |
AVERAGE | |
Ceph: Cluster in ERROR state | - |
last(/Ceph by Zabbix agent 2/ceph.overall_status)=2 |
AVERAGE | Manual close: YES |
Ceph: Cluster in WARNING state | - |
last(/Ceph by Zabbix agent 2/ceph.overall_status)=1 Recovery expression: last(/Ceph by Zabbix agent 2/ceph.overall_status)=0 |
WARNING | Manual close: YES Depends on: - Ceph: Cluster in ERROR state |
Ceph: Minimum monitor release version has changed | The Ceph version has changed. Ack to close manually. |
last(/Ceph by Zabbix agent 2/ceph.min_mon_release_name,#1)<>last(/Ceph by Zabbix agent 2/ceph.min_mon_release_name,#2) and length(last(/Ceph by Zabbix agent 2/ceph.min_mon_release_name))>0 |
INFO | Manual close: YES |
Ceph: OSD osd.{#OSDNAME} is down | OSD osd.{#OSDNAME} is marked "down" in the osdmap. The OSD daemon may have been stopped, or peer OSDs may be unable to reach the OSD over the network. |
last(/Ceph by Zabbix agent 2/ceph.osd[{#OSDNAME},up]) = 0 |
AVERAGE | |
Ceph: OSD osd.{#OSDNAME} is full | - |
min(/Ceph by Zabbix agent 2/ceph.osd[{#OSDNAME},fill],15m) > last(/Ceph by Zabbix agent 2/ceph.osd_full_ratio)*100 |
AVERAGE | |
Ceph: Ceph OSD osd.{#OSDNAME} is near full | - |
min(/Ceph by Zabbix agent 2/ceph.osd[{#OSDNAME},fill],15m) > last(/Ceph by Zabbix agent 2/ceph.osd_nearfull_ratio)*100 |
WARNING | Depends on: - Ceph: OSD osd.{#OSDNAME} is full |
Please report any issues with the template at https://support.zabbix.com.
You can also provide feedback, discuss the template or ask for help at ZABBIX forums.
For Zabbix version: 6.2 and higher
See Zabbix template operation for basic instructions.
Refer to the vendor documentation.
No specific Zabbix configuration is required.
Name | Description | Default |
---|---|---|
{$ARANET.API.ENDPOINT} | Aranet Cloud API endpoint. |
https://aranet.cloud/api |
{$ARANET.API.PASSWORD} | Aranet Cloud password. |
<PUT YOUR PASSWORD> |
{$ARANET.API.SPACE_NAME} | Aranet Cloud organization name. |
<PUT YOUR SPACE NAME> |
{$ARANET.API.USERNAME} | Aranet Cloud username. |
<PUT YOUR USERNAME> |
{$ARANET.BATT.VOLTAGE.MIN.CRIT} | Battery voltage critical threshold. |
2 |
{$ARANET.BATT.VOLTAGE.MIN.WARN} | Battery voltage warning threshold. |
1 |
{$ARANET.CO2.MAX.CRIT} | CO2 critical threshold. |
1000 |
{$ARANET.CO2.MAX.WARN} | CO2 warning threshold. |
600 |
{$ARANET.HUMIDITY.MAX.WARN} | Maximum humidity threshold. |
70 |
{$ARANET.HUMIDITY.MIN.WARN} | Minimum humidity threshold. |
20 |
{$ARANET.LAST_UPDATE.MAX.WARN} | Data update delay threshold. |
1h |
{$ARANET.LLD.FILTER.GATEWAY_ID.MATCHES} | Filter of discoverable sensors by gateway id. |
.+ |
{$ARANET.LLD.FILTER.GATEWAY_NAME.MATCHES} | Filter of discoverable sensors by gateway name. |
.+ |
{$ARANET.LLD.FILTER.GATEWAY_NAME.NOT_MATCHES} | Filter to exclude discoverable sensors by gateway name. |
CHANGE_IF_NEEDED |
{$ARANET.LLD.FILTER.SENSOR_ID.MATCHES} | Filter of discoverable sensors by id. |
.+ |
{$ARANET.LLD.FILTER.SENSOR_NAME.MATCHES} | Filter of discoverable sensors by name. |
.+ |
{$ARANET.LLD.FILTER.SENSOR_NAME.NOT_MATCHES} | Filter to exclude discoverable sensors by name. |
CHANGE_IF_NEEDED |
There are no template links in this template.
Name | Description | Type | Key and additional info |
---|---|---|---|
Atmospheric pressure discovery | Discovery for Aranet Cloud atmospheric pressure sensors |
DEPENDENT | aranet.pressure.discovery Filter: AND- {#SENSORNAME} MATCHESREGEX - {#SENSORNAME} NOTMATCHESREGEX - {#SENSORID} MATCHESREGEX - {#GATEWAYNAME} MATCHESREGEX - {#GATEWAYNAME} NOTMATCHESREGEX - {#GATEWAYID} MATCHESREGEX - {#METRIC} MATCHES_REGEX |
Battery voltage discovery | Discovery for Aranet Cloud Battery voltage sensors |
DEPENDENT | aranet.battery.voltage.discovery Filter: AND- {#SENSORNAME} MATCHESREGEX - {#SENSORNAME} NOTMATCHESREGEX - {#SENSORID} MATCHESREGEX - {#GATEWAYNAME} MATCHESREGEX - {#GATEWAYNAME} NOTMATCHESREGEX - {#GATEWAYID} MATCHESREGEX - {#METRIC} MATCHES_REGEX |
CO2 discovery | Discovery for Aranet Cloud CO2 sensors |
DEPENDENT | aranet.co2.discovery Filter: AND- {#SENSORNAME} MATCHESREGEX - {#SENSORNAME} NOTMATCHESREGEX - {#SENSORID} MATCHESREGEX - {#GATEWAYNAME} MATCHESREGEX - {#GATEWAYNAME} NOTMATCHESREGEX - {#GATEWAYID} MATCHESREGEX - {#METRIC} MATCHES_REGEX |
Current discovery | Discovery for Aranet Cloud Current sensors |
DEPENDENT | aranet.current.discovery Filter: AND- {#SENSORNAME} MATCHESREGEX - {#SENSORNAME} NOTMATCHESREGEX - {#SENSORID} MATCHESREGEX - {#GATEWAYNAME} MATCHESREGEX - {#GATEWAYNAME} NOTMATCHESREGEX - {#GATEWAYID} MATCHESREGEX - {#METRIC} MATCHES_REGEX |
Differential Pressure discovery | Discovery for Aranet Cloud Differential Pressure sensors |
DEPENDENT | aranet.diffpressure.discovery Filter: AND- {#SENSOR NAME} MATCHESREGEX{$ARANET.LLD.FILTER.SENSOR_NAME.MATCHES} - {#SENSOR NAME} NOTMATCHESREGEX{$ARANET.LLD.FILTER.SENSOR_NAME.NOT_MATCHES} - {#SENSORID} MATCHESREGEX - {#GATEWAYNAME} MATCHESREGEX - {#GATEWAYNAME} NOTMATCHESREGEX - {#GATEWAYID} MATCHESREGEX - {#METRIC} MATCHESREGEX |
Distance discovery | Discovery for Aranet Cloud Distance sensors |
DEPENDENT | aranet.distance.discovery Filter: AND- {#SENSORNAME} MATCHESREGEX - {#SENSORNAME} NOTMATCHESREGEX - {#SENSORID} MATCHESREGEX - {#GATEWAYNAME} MATCHESREGEX - {#GATEWAYNAME} NOTMATCHESREGEX - {#GATEWAYID} MATCHESREGEX - {#METRIC} MATCHES_REGEX |
Humidity discovery | Discovery for Aranet Cloud humidity sensors |
DEPENDENT | aranet.humidity.discovery Filter: AND- {#SENSORNAME} MATCHESREGEX - {#SENSORNAME} NOTMATCHESREGEX - {#SENSORID} MATCHESREGEX - {#GATEWAYNAME} MATCHESREGEX - {#GATEWAYNAME} NOTMATCHESREGEX - {#GATEWAYID} MATCHESREGEX - {#METRIC} MATCHES_REGEX |
Illuminance discovery | Discovery for Aranet Cloud Illuminance sensors |
DEPENDENT | aranet.illuminance.discovery Filter: AND- {#SENSORNAME} MATCHESREGEX - {#SENSORNAME} NOTMATCHESREGEX - {#SENSORID} MATCHESREGEX - {#GATEWAYNAME} MATCHESREGEX - {#GATEWAYNAME} NOTMATCHESREGEX - {#GATEWAYID} MATCHESREGEX - {#METRIC} MATCHES_REGEX |
Last update discovery | Discovery for Aranet Cloud Last update metric |
DEPENDENT | aranet.lastupdate.discovery Filter: AND- {#SENSOR NAME} MATCHESREGEX{$ARANET.LLD.FILTER.SENSOR_NAME.MATCHES} - {#SENSOR NAME} NOTMATCHESREGEX{$ARANET.LLD.FILTER.SENSOR_NAME.NOT_MATCHES} - {#SENSORID} MATCHESREGEX - {#GATEWAYNAME} MATCHESREGEX - {#GATEWAYNAME} NOTMATCHESREGEX - {#GATEWAYID} MATCHESREGEX - {#METRIC} MATCHESREGEX |
pH discovery | Discovery for Aranet Cloud pH sensors |
DEPENDENT | aranet.ph.discovery Filter: AND- {#SENSORNAME} MATCHESREGEX - {#SENSORNAME} NOTMATCHESREGEX - {#SENSORID} MATCHESREGEX - {#GATEWAYNAME} MATCHESREGEX - {#GATEWAYNAME} NOTMATCHESREGEX - {#GATEWAYID} MATCHESREGEX - {#METRIC} MATCHES_REGEX |
Pore Electrical Conductivity discovery | Discovery for Aranet Cloud Pore Electrical Conductivity sensors |
DEPENDENT | aranet.poreelectriccond.discovery Filter: AND- {#SENSORNAME} MATCHESREGEX - {#SENSORNAME} NOTMATCHESREGEX - {#SENSORID} MATCHESREGEX - {#GATEWAYNAME} MATCHESREGEX - {#GATEWAYNAME} NOTMATCHESREGEX - {#GATEWAYID} MATCHESREGEX - {#METRIC} MATCHES_REGEX |
PPFD discovery | Discovery for Aranet Cloud PPFD sensors |
DEPENDENT | aranet.ppfd.discovery Filter: AND- {#SENSORNAME} MATCHESREGEX - {#SENSORNAME} NOTMATCHESREGEX - {#SENSORID} MATCHESREGEX - {#GATEWAYNAME} MATCHESREGEX - {#GATEWAYNAME} NOTMATCHESREGEX - {#GATEWAYID} MATCHESREGEX - {#METRIC} MATCHES_REGEX |
Pulses Cumulative discovery | Discovery for Aranet Cloud Pulses Cumulative sensors |
DEPENDENT | aranet.pulsescumulative.discovery Filter: AND- {#SENSOR NAME} MATCHESREGEX{$ARANET.LLD.FILTER.SENSOR_NAME.MATCHES} - {#SENSOR NAME} NOTMATCHESREGEX{$ARANET.LLD.FILTER.SENSOR_NAME.NOT_MATCHES} - {#SENSORID} MATCHESREGEX - {#GATEWAYNAME} MATCHESREGEX - {#GATEWAYNAME} NOTMATCHESREGEX - {#GATEWAYID} MATCHESREGEX - {#METRIC} MATCHESREGEX |
Pulses discovery | Discovery for Aranet Cloud Pulses sensors |
DEPENDENT | aranet.pulses.discovery Filter: AND- {#SENSORNAME} MATCHESREGEX - {#SENSORNAME} NOTMATCHESREGEX - {#SENSORID} MATCHESREGEX - {#GATEWAYNAME} MATCHESREGEX - {#GATEWAYNAME} NOTMATCHESREGEX - {#GATEWAYID} MATCHESREGEX - {#METRIC} MATCHES_REGEX |
RSSI discovery | Discovery for Aranet Cloud RSSI sensors |
DEPENDENT | aranet.rssi.discovery Filter: AND- {#SENSORNAME} MATCHESREGEX - {#SENSORNAME} NOTMATCHESREGEX - {#SENSORID} MATCHESREGEX - {#GATEWAYNAME} MATCHESREGEX - {#GATEWAYNAME} NOTMATCHESREGEX - {#GATEWAYID} MATCHESREGEX - {#METRIC} MATCHES_REGEX |
Soil Dielectric Permittivity discovery | Discovery for Aranet Cloud Soil Dielectric Permittivity sensors |
DEPENDENT | aranet.soildielectricperm.discovery Filter: AND- {#SENSORNAME} MATCHESREGEX - {#SENSORNAME} NOTMATCHESREGEX - {#SENSORID} MATCHESREGEX - {#GATEWAYNAME} MATCHESREGEX - {#GATEWAYNAME} NOTMATCHESREGEX - {#GATEWAYID} MATCHESREGEX - {#METRIC} MATCHES_REGEX |
Soil Electrical Conductivity discovery | Discovery for Aranet Cloud Soil Electrical Conductivity sensors |
DEPENDENT | aranet.soilelectriccond.discovery Filter: AND- {#SENSORNAME} MATCHESREGEX - {#SENSORNAME} NOTMATCHESREGEX - {#SENSORID} MATCHESREGEX - {#GATEWAYNAME} MATCHESREGEX - {#GATEWAYNAME} NOTMATCHESREGEX - {#GATEWAYID} MATCHESREGEX - {#METRIC} MATCHES_REGEX |
Temperature discovery | Discovery for Aranet Cloud temperature sensors |
DEPENDENT | aranet.temp.discovery Filter: AND- {#SENSORNAME} MATCHESREGEX - {#SENSORNAME} NOTMATCHESREGEX - {#SENSORID} MATCHESREGEX - {#GATEWAYNAME} MATCHESREGEX - {#GATEWAYNAME} NOTMATCHESREGEX - {#GATEWAYID} MATCHESREGEX - {#METRIC} MATCHES_REGEX |
Voltage discovery | Discovery for Aranet Cloud Voltage sensors |
DEPENDENT | aranet.voltage.discovery Filter: AND- {#SENSORNAME} MATCHESREGEX - {#SENSORNAME} NOTMATCHESREGEX - {#SENSORID} MATCHESREGEX - {#GATEWAYNAME} MATCHESREGEX - {#GATEWAYNAME} NOTMATCHESREGEX - {#GATEWAYID} MATCHESREGEX - {#METRIC} MATCHES_REGEX |
Volumetric Water Content discovery | Discovery for Aranet Cloud Volumetric Water Content sensors |
DEPENDENT | aranet.volumwatercontent.discovery Filter: AND- {#SENSORNAME} MATCHESREGEX - {#SENSORNAME} NOTMATCHESREGEX - {#SENSORID} MATCHESREGEX - {#GATEWAYNAME} MATCHESREGEX - {#GATEWAYNAME} NOTMATCHESREGEX - {#GATEWAYID} MATCHESREGEX - {#METRIC} MATCHES_REGEX |
Weight discovery | Discovery for Aranet Cloud Weight sensors |
DEPENDENT | aranet.weight.discovery Filter: AND- {#SENSORNAME} MATCHESREGEX - {#SENSORNAME} NOTMATCHESREGEX - {#SENSORID} MATCHESREGEX - {#GATEWAYNAME} MATCHESREGEX - {#GATEWAYNAME} NOTMATCHESREGEX - {#GATEWAYID} MATCHESREGEX - {#METRIC} MATCHES_REGEX |
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
Aranet | {#METRIC}: [{#GATEWAYNAME}] {#SENSORNAME} | - |
DEPENDENT | aranet.temp["{#GATEWAYID}", "{#SENSORID}"] Preprocessing: - JSONPATH: |
Aranet | {#METRIC}: [{#GATEWAYNAME}] {#SENSORNAME} | - |
DEPENDENT | aranet.humidity["{#GATEWAYID}", "{#SENSORID}"] Preprocessing: - JSONPATH: |
Aranet | {#METRIC}: [{#GATEWAYNAME}] {#SENSORNAME} | - |
DEPENDENT | aranet.rssi["{#GATEWAYID}", "{#SENSORID}"] Preprocessing: - JSONPATH: |
Aranet | {#METRIC}: [{#GATEWAYNAME}] {#SENSORNAME} | - |
DEPENDENT | aranet.battery.voltage["{#GATEWAYID}", "{#SENSORID}"] Preprocessing: - JSONPATH: |
Aranet | {#METRIC}: [{#GATEWAYNAME}] {#SENSORNAME} | - |
DEPENDENT | aranet.co2["{#GATEWAYID}", "{#SENSORID}"] Preprocessing: - JSONPATH: |
Aranet | {#METRIC}: [{#GATEWAYNAME}] {#SENSORNAME} | - |
DEPENDENT | aranet.pressure["{#GATEWAYID}", "{#SENSORID}"] Preprocessing: - JSONPATH: |
Aranet | {#METRIC}: [{#GATEWAYNAME}] {#SENSORNAME} | - |
DEPENDENT | aranet.voltage["{#GATEWAYID}", "{#SENSORID}"] Preprocessing: - JSONPATH: |
Aranet | {#METRIC}: [{#GATEWAYNAME}] {#SENSORNAME} | - |
DEPENDENT | aranet.weight["{#GATEWAYID}", "{#SENSORID}"] Preprocessing: - JSONPATH: |
Aranet | {#METRIC}: [{#GATEWAYNAME}] {#SENSORNAME} | - |
DEPENDENT | aranet.volumetric.water.content["{#GATEWAYID}", "{#SENSORID}"] Preprocessing: - JSONPATH: |
Aranet | {#METRIC}: [{#GATEWAYNAME}] {#SENSORNAME} | - |
DEPENDENT | aranet.ppfd["{#GATEWAYID}", "{#SENSORID}"] Preprocessing: - JSONPATH: |
Aranet | {#METRIC}: [{#GATEWAYNAME}] {#SENSORNAME} | - |
DEPENDENT | aranet.distance["{#GATEWAYID}", "{#SENSORID}"] Preprocessing: - JSONPATH: |
Aranet | {#METRIC}: [{#GATEWAYNAME}] {#SENSORNAME} | - |
DEPENDENT | aranet.illuminance["{#GATEWAYID}", "{#SENSORID}"] Preprocessing: - JSONPATH: |
Aranet | {#METRIC}: [{#GATEWAYNAME}] {#SENSORNAME} | - |
DEPENDENT | aranet.ph["{#GATEWAYID}", "{#SENSORID}"] Preprocessing: - JSONPATH: |
Aranet | {#METRIC}: [{#GATEWAYNAME}] {#SENSORNAME} | - |
DEPENDENT | aranet.current["{#GATEWAYID}", "{#SENSORID}"] Preprocessing: - JSONPATH: |
Aranet | {#METRIC}: [{#GATEWAYNAME}] {#SENSORNAME} | - |
DEPENDENT | aranet.soildielectricperm["{#GATEWAYID}", "{#SENSORID}"] Preprocessing: - JSONPATH: |
Aranet | {#METRIC}: [{#GATEWAYNAME}] {#SENSORNAME} | - |
DEPENDENT | aranet.soilelectriccond["{#GATEWAYID}", "{#SENSORID}"] Preprocessing: - JSONPATH: |
Aranet | {#METRIC}: [{#GATEWAYNAME}] {#SENSORNAME} | - |
DEPENDENT | aranet.poreelectriccond["{#GATEWAYID}", "{#SENSORID}"] Preprocessing: - JSONPATH: |
Aranet | {#METRIC}: [{#GATEWAYNAME}] {#SENSORNAME} | - |
DEPENDENT | aranet.pulses["{#GATEWAYID}", "{#SENSORID}"] Preprocessing: - JSONPATH: |
Aranet | {#METRIC}: [{#GATEWAYNAME}] {#SENSORNAME} | - |
DEPENDENT | aranet.pulsescumulative["{#GATEWAYID}", "{#SENSOR_ID}"] Preprocessing: - JSONPATH: |
Aranet | {#METRIC}: [{#GATEWAYNAME}] {#SENSORNAME} | - |
DEPENDENT | aranet.diffpressure["{#GATEWAYID}", "{#SENSOR_ID}"] Preprocessing: - JSONPATH: |
Aranet | {#METRIC}: [{#GATEWAYNAME}] {#SENSORNAME} | - |
DEPENDENT | aranet.lastupdate["{#GATEWAYID}", "{#SENSOR_ID}"] Preprocessing: - JSONPATH: - JAVASCRIPT: |
Zabbix raw items | Aranet: Sensors discovery | Discovery for Aranet Cloud sensors |
DEPENDENT | aranet.sensor.discovery Preprocessing: - JAVASCRIPT: - DISCARDUNCHANGEDHEARTBEAT: |
Zabbix raw items | Aranet: Get data | - |
SCRIPT | aranet.get_data Expression: The text is too long. Please see the template. |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
{#METRIC}: Low humidity on "[{#GATEWAYNAME}] {#SENSORNAME}" | max(/Aranet Cloud/aranet.humidity["{#GATEWAY_ID}", "{#SENSOR_ID}"],5m) < {$ARANET.HUMIDITY.MIN.WARN:"{#SENSOR_NAME}"} |
WARNING | Depends on: - {#METRIC}: High humidity on "[{#GATEWAYNAME}] {#SENSORNAME}" |
|
{#METRIC}: High humidity on "[{#GATEWAYNAME}] {#SENSORNAME}" | min(/Aranet Cloud/aranet.humidity["{#GATEWAY_ID}", "{#SENSOR_ID}"],5m) > {$ARANET.HUMIDITY.MAX.WARN:"{#SENSOR_NAME}"} |
HIGH | ||
{#METRIC}: Low battery voltage on "[{#GATEWAYNAME}] {#SENSORNAME}" | - |
max(/Aranet Cloud/aranet.battery.voltage["{#GATEWAY_ID}", "{#SENSOR_ID}"],5m) < {$ARANET.BATT.VOLTAGE.MIN.WARN:"{#SENSOR_NAME}"} |
WARNING | Depends on: - {#METRIC}: Critically low battery voltage on "[{#GATEWAYNAME}] {#SENSORNAME}" |
{#METRIC}: Critically low battery voltage on "[{#GATEWAYNAME}] {#SENSORNAME}" | - |
max(/Aranet Cloud/aranet.battery.voltage["{#GATEWAY_ID}", "{#SENSOR_ID}"],5m) < {$ARANET.BATT.VOLTAGE.MIN.CRIT:"{#SENSOR_NAME}"} |
HIGH | |
{#METRIC}: High CO2 level on "[{#GATEWAYNAME}] {#SENSORNAME}" | - |
min(/Aranet Cloud/aranet.co2["{#GATEWAY_ID}", "{#SENSOR_ID}"],5m) > {$ARANET.CO2.MAX.WARN:"{#SENSOR_NAME}"} |
WARNING | Depends on: - {#METRIC}: Critically high CO2 level on "[{#GATEWAYNAME}] {#SENSORNAME}" |
{#METRIC}: Critically high CO2 level on "[{#GATEWAYNAME}] {#SENSORNAME}" | - |
min(/Aranet Cloud/aranet.co2["{#GATEWAY_ID}", "{#SENSOR_ID}"],5m) > {$ARANET.CO2.MAX.CRIT:"{#SENSOR_NAME}"} |
HIGH | |
{#METRIC}: Sensor data "[{#GATEWAYNAME}] {#SENSORNAME}" is not updated | - |
last(/Aranet Cloud/aranet.last_update["{#GATEWAY_ID}", "{#SENSOR_ID}"]) > {$ARANET.LAST_UPDATE.MAX.WARN:"{#SENSOR_NAME}"} |
WARNING |
Please report any issues with the template at https://support.zabbix.com
For Zabbix version: 6.2 and higher
The template to monitor Apache HTTPD by Zabbix that works without any external scripts.
Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
Template Apache by HTTP
- collects metrics by polling mod_status with HTTP agent remotely:
127.0.0.1
ServerVersion: Apache/2.4.41 (Unix)
ServerMPM: event
Server Built: Aug 14 2019 00:35:10
CurrentTime: Friday, 16-Aug-2019 12:38:40 UTC
RestartTime: Wednesday, 14-Aug-2019 07:58:26 UTC
ParentServerConfigGeneration: 1
ParentServerMPMGeneration: 0
ServerUptimeSeconds: 189613
ServerUptime: 2 days 4 hours 40 minutes 13 seconds
Load1: 4.60
Load5: 1.20
Load15: 0.47
Total Accesses: 27860
Total kBytes: 33011
Total Duration: 54118
CPUUser: 18.02
CPUSystem: 31.76
CPUChildrenUser: 0
CPUChildrenSystem: 0
CPULoad: .0262535
Uptime: 189613
ReqPerSec: .146931
BytesPerSec: 178.275
BytesPerReq: 1213.33
DurationPerReq: 1.9425
BusyWorkers: 7
IdleWorkers: 93
Processes: 4
Stopping: 0
BusyWorkers: 7
IdleWorkers: 93
ConnsTotal: 13
ConnsAsyncWriting: 0
ConnsAsyncKeepAlive: 5
ConnsAsyncClosing: 0
Scoreboard: __________________________________________W_____________W___________________LW_____W______W_W_______............................................................................................................................................................................................................................................................................................................
This template was tested on:
See Zabbix template operation for basic instructions.
Setup mod_status
Check module availability: httpd -M 2>/dev/null | grep status_module
Example configuration of Apache:
<Location "/server-status">
SetHandler server-status
Require host example.com
</Location>
If you use another path, then don't forget to change the {$APACHE.STATUS.PATH} macro.
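The status page polled by the template is plain key-value text (see the sample output above), so it is easy to inspect by hand. A minimal Python sketch of fetching and parsing it (the URL mirrors the default {$APACHE.STATUS.*} macro values; in the template itself this parsing is done by a JAVASCRIPT preprocessing step):

import urllib.request

url = "http://127.0.0.1:80/server-status?auto"  # scheme://host:port/path macros

status = {}
with urllib.request.urlopen(url, timeout=10) as resp:
    for line in resp.read().decode().splitlines():
        if ": " in line:
            key, value = line.split(": ", 1)
            status[key] = value

print(status["ServerVersion"])                       # cf. 'Apache: Version'
print(status["Total Accesses"])                      # counter behind 'Requests per second'
print(status["BusyWorkers"], status["IdleWorkers"])  # cf. the worker items

# Worker states are encoded in the Scoreboard string, one character per slot:
# "_" waiting, "W" sending reply, "." open slot with no current process, etc.
board = status["Scoreboard"]
print("sending reply:", board.count("W"), "waiting:", board.count("_"))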
No specific Zabbix configuration is required.
Name | Description | Default |
---|---|---|
{$APACHE.RESPONSE_TIME.MAX.WARN} | Maximum Apache response time in seconds for trigger expression |
10 |
{$APACHE.STATUS.PATH} | The URL path |
server-status?auto |
{$APACHE.STATUS.PORT} | The port of Apache status page |
80 |
{$APACHE.STATUS.SCHEME} | Request scheme which may be http or https |
http |
There are no template links in this template.
Name | Description | Type | Key and additional info |
---|---|---|---|
Event MPM discovery | Additional metrics if event MPM is used https://httpd.apache.org/docs/current/mod/event.html |
DEPENDENT | apache.mpm.event.discovery Preprocessing: - JAVASCRIPT: - DISCARDUNCHANGEDHEARTBEAT: |
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
Apache | Apache: Service ping | - |
SIMPLE | net.tcp.service[http,"{HOST.CONN}","{$APACHE.STATUS.PORT}"] Preprocessing: - DISCARDUNCHANGEDHEARTBEAT: |
Apache | Apache: Service response time | - |
SIMPLE | net.tcp.service.perf[http,"{HOST.CONN}","{$APACHE.STATUS.PORT}"] |
Apache | Apache: Total bytes | Total bytes served |
DEPENDENT | apache.bytes Preprocessing: - JSONPATH: - MULTIPLIER: |
Apache | Apache: Bytes per second | Calculated as a change rate for the 'Total bytes' stat. BytesPerSec is not used, as it counts the average since the last Apache server start. |
DEPENDENT | apache.bytes.rate Preprocessing: - JSONPATH: - MULTIPLIER: - CHANGEPERSECOND |
Apache | Apache: Requests per second | Calculated as a change rate for the 'Total requests' stat. ReqPerSec is not used, as it counts the average since the last Apache server start. |
DEPENDENT | apache.requests.rate Preprocessing: - JSONPATH: - CHANGEPERSECOND |
Apache | Apache: Total requests | A total number of accesses |
DEPENDENT | apache.requests Preprocessing: - JSONPATH: |
Apache | Apache: Uptime | Service uptime in seconds |
DEPENDENT | apache.uptime Preprocessing: - JSONPATH: |
Apache | Apache: Version | Service version |
DEPENDENT | apache.version Preprocessing: - JSONPATH: - DISCARDUNCHANGEDHEARTBEAT: |
Apache | Apache: Total workers busy | Total number of busy worker threads/processes |
DEPENDENT | apache.workers_total.busy Preprocessing: - JSONPATH: |
Apache | Apache: Total workers idle | Total number of idle worker threads/processes |
DEPENDENT | apache.workers_total.idle Preprocessing: - JSONPATH: |
Apache | Apache: Workers closing connection | Number of workers in closing state |
DEPENDENT | apache.workers.closing Preprocessing: - JSONPATH: |
Apache | Apache: Workers DNS lookup | Number of workers in dnslookup state |
DEPENDENT | apache.workers.dnslookup Preprocessing: - JSONPATH: |
Apache | Apache: Workers finishing | Number of workers in finishing state |
DEPENDENT | apache.workers.finishing Preprocessing: - JSONPATH: |
Apache | Apache: Workers idle cleanup | Number of workers in cleanup state |
DEPENDENT | apache.workers.cleanup Preprocessing: - JSONPATH: |
Apache | Apache: Workers keepalive (read) | Number of workers in keepalive state |
DEPENDENT | apache.workers.keepalive Preprocessing: - JSONPATH: |
Apache | Apache: Workers logging | Number of workers in logging state |
DEPENDENT | apache.workers.logging Preprocessing: - JSONPATH: |
Apache | Apache: Workers reading request | Number of workers in reading state |
DEPENDENT | apache.workers.reading Preprocessing: - JSONPATH: |
Apache | Apache: Workers sending reply | Number of workers in sending state |
DEPENDENT | apache.workers.sending Preprocessing: - JSONPATH: |
Apache | Apache: Workers slot with no current process | Number of slots with no current process |
DEPENDENT | apache.workers.slot Preprocessing: - JSONPATH: |
Apache | Apache: Workers starting up | Number of workers in starting state |
DEPENDENT | apache.workers.starting Preprocessing: - JSONPATH: |
Apache | Apache: Workers waiting for connection | Number of workers in waiting state |
DEPENDENT | apache.workers.waiting Preprocessing: - JSONPATH: |
Apache | Apache: Connections async closing | Number of async connections in closing state (only applicable to event MPM) |
DEPENDENT | apache.connections[async_closing{#SINGLETON}] Preprocessing: - JSONPATH: |
Apache | Apache: Connections async keep alive | Number of async connections in keep-alive state (only applicable to event MPM) |
DEPENDENT | apache.connections[asynckeepalive{#SINGLETON}] Preprocessing: - JSONPATH: |
Apache | Apache: Connections async writing | Number of async connections in writing state (only applicable to event MPM) |
DEPENDENT | apache.connections[async_writing{#SINGLETON}] Preprocessing: - JSONPATH: |
Apache | Apache: Connections total | Number of total connections |
DEPENDENT | apache.connections[total{#SINGLETON}] Preprocessing: - JSONPATH: |
Apache | Apache: Bytes per request | Average number of bytes served per request |
DEPENDENT | apache.bytes[per_request{#SINGLETON}] Preprocessing: - JSONPATH: |
Apache | Apache: Number of async processes | Number of async processes |
DEPENDENT | apache.process[num{#SINGLETON}] Preprocessing: - JSONPATH: |
Zabbix raw items | Apache: Get status | Getting data from a machine-readable version of the Apache status page. https://httpd.apache.org/docs/current/mod/mod_status.html |
HTTP_AGENT | apache.get_status Preprocessing: - JAVASCRIPT: |
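Note how the rate items above are built: mod_status counters such as Total Accesses and Total kBytes only ever grow, so the template derives per-second rates from them with CHANGE_PER_SECOND preprocessing rather than trusting ReqPerSec/BytesPerSec, which are averages since the last server start. A minimal sketch of the same calculation (the sample values and the 60-second interval are made up for illustration):

# Two consecutive samples of the monotonically growing 'Total Accesses' counter.
prev_value, prev_ts = 27860, 1000.0
curr_value, curr_ts = 27995, 1060.0  # 60 seconds later

rate = (curr_value - prev_value) / (curr_ts - prev_ts)
print(round(rate, 3), "requests per second")  # what CHANGE_PER_SECOND yields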
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Apache: Service is down | - |
last(/Apache by HTTP/net.tcp.service[http,"{HOST.CONN}","{$APACHE.STATUS.PORT}"])=0 |
AVERAGE | Manual close: YES |
Apache: Service response time is too high | - |
min(/Apache by HTTP/net.tcp.service.perf[http,"{HOST.CONN}","{$APACHE.STATUS.PORT}"],5m)>{$APACHE.RESPONSE_TIME.MAX.WARN} |
WARNING | Manual close: YES Depends on: - Apache: Service is down |
Apache: has been restarted | Uptime is less than 10 minutes. |
last(/Apache by HTTP/apache.uptime)<10m |
INFO | Manual close: YES |
Apache: Version has changed | Apache version has changed. Ack to close. |
last(/Apache by HTTP/apache.version,#1)<>last(/Apache by HTTP/apache.version,#2) and length(last(/Apache by HTTP/apache.version))>0 |
INFO | Manual close: YES |
Apache: Failed to fetch status page | Zabbix has not received data for items for the last 30 minutes. |
nodata(/Apache by HTTP/apache.get_status,30m)=1 |
WARNING | Manual close: YES Depends on: - Apache: Service is down |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template or ask for help with it at ZABBIX forums.
For Zabbix version: 6.2 and higher.
This template is developed to monitor Apache HTTPD by Zabbix that works without any external scripts.
Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The template Apache by Zabbix agent collects metrics by polling the Apache Status module locally (127.0.0.1) with Zabbix agent:
ServerVersion: Apache/2.4.41 (Unix)
ServerMPM: event
Server Built: Aug 14 2019 00:35:10
CurrentTime: Friday, 16-Aug-2019 12:38:40 UTC
RestartTime: Wednesday, 14-Aug-2019 07:58:26 UTC
ParentServerConfigGeneration: 1
ParentServerMPMGeneration: 0
ServerUptimeSeconds: 189613
ServerUptime: 2 days 4 hours 40 minutes 13 seconds
Load1: 4.60
Load5: 1.20
Load15: 0.47
Total Accesses: 27860
Total kBytes: 33011
Total Duration: 54118
CPUUser: 18.02
CPUSystem: 31.76
CPUChildrenUser: 0
CPUChildrenSystem: 0
CPULoad: .0262535
Uptime: 189613
ReqPerSec: .146931
BytesPerSec: 178.275
BytesPerReq: 1213.33
DurationPerReq: 1.9425
BusyWorkers: 7
IdleWorkers: 93
Processes: 4
Stopping: 0
BusyWorkers: 7
IdleWorkers: 93
ConnsTotal: 13
ConnsAsyncWriting: 0
ConnsAsyncKeepAlive: 5
ConnsAsyncClosing: 0
Scoreboard: __________________________________________W_____________W___________________LW_____W______W_W_______............................................................................................................................................................................................................................................................................................................
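The template converts this machine-readable output into JSON with a JavaScript preprocessing step (the "Apache: Get status" item below). The following Python sketch is illustrative only and mirrors that transformation, including a rough tally of scoreboard states:

```python
import json

def parse_status(raw: str) -> dict:
    """Parse the 'key: value' lines of server-status?auto output.

    Illustrative sketch only: the template ships its own JavaScript
    preprocessing; this just mirrors the idea.
    """
    status = {}
    for line in raw.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that do not have a 'key: value' shape
            # Duplicate keys (BusyWorkers appears twice) keep the last value.
            status[key.strip()] = value.strip()
    # The scoreboard is one character per worker slot, e.g.
    # '_' waiting for connection, 'W' sending reply, '.' open slot.
    board = status.get("Scoreboard", "")
    status["Workers"] = {
        "waiting": board.count("_"),
        "sending": board.count("W"),
        "slot": board.count("."),
    }
    return status

sample = "BusyWorkers: 7\nIdleWorkers: 93\nScoreboard: __W."
print(json.dumps(parse_status(sample), indent=2))
```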
It also uses Zabbix agent to collect Apache Linux process statistics, such as CPU usage, memory usage, and whether the process is running.
This template was tested on:
See Zabbix template operation for basic instructions.
See the setup instructions for the Apache Status module.
Check the availability of the module with this command line: httpd -M 2>/dev/null | grep status_module
This is an example configuration of the Apache web server:
<Location "/server-status">
SetHandler server-status
Require host example.com
</Location>
If you use another path, then do not forget to change the {$APACHE.STATUS.PATH} macro.
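Before linking the template, you can verify that the status page is reachable and returns the machine-readable format. Below is a small Python check (a hypothetical helper; the URL assumes the default {$APACHE.STATUS.*} macro values):

```python
from urllib.request import urlopen

# Assumes the default macro values: scheme http, host 127.0.0.1,
# port 80, path server-status?auto. Adjust to your configuration.
url = "http://127.0.0.1:80/server-status?auto"

with urlopen(url, timeout=5) as resp:
    assert resp.status == 200, "status page is not reachable"
    body = resp.read().decode()

# The ?auto format is line-oriented and always carries a Scoreboard line.
assert "Scoreboard:" in body, "not the machine-readable (?auto) output"
print("OK:", body.splitlines()[0])
```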
Install and set up Zabbix agent.
No specific Zabbix configuration is required.
Name | Description | Default |
---|---|---|
{$APACHE.PROCESS_NAME} | The process name of the Apache web server. |
`(httpd|apache2)` |
{$APACHE.RESPONSE_TIME.MAX.WARN} | The maximum Apache response time expressed in seconds for a trigger expression. |
10 |
{$APACHE.STATUS.HOST} | The hostname or an IP address of the Apache status page. |
127.0.0.1 |
{$APACHE.STATUS.PATH} | The URL path of the Apache status page. |
server-status?auto |
{$APACHE.STATUS.PORT} | The port of the Apache status page. |
80 |
{$APACHE.STATUS.SCHEME} | The request scheme, which may be either HTTP or HTTPS. |
http |
There are no template links in this template.
Name | Description | Type | Key and additional info |
---|---|---|---|
Apache process discovery | The discovery of the Apache process summary. |
DEPENDENT | apache.proc.discovery Filter: AND- {#NAME} MATCHES_REGEX |
Event MPM discovery | The discovery of additional metrics if the event Multi-Processing Module (MPM) is used. For more details see Apache MPM event. |
DEPENDENT | apache.mpm.event.discovery Preprocessing: - JAVASCRIPT: - DISCARD_UNCHANGED_HEARTBEAT: |
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
Apache | Apache: Service ping | - |
ZABBIX_PASSIVE | net.tcp.service[http,"{$APACHE.STATUS.HOST}","{$APACHE.STATUS.PORT}"] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
Apache | Apache: Service response time | - |
ZABBIX_PASSIVE | net.tcp.service.perf[http,"{$APACHE.STATUS.HOST}","{$APACHE.STATUS.PORT}"] |
Apache | Apache: Total bytes | The total bytes served. |
DEPENDENT | apache.bytes Preprocessing: - JSONPATH: - MULTIPLIER: |
Apache | Apache: Bytes per second | Calculated as the rate of change of the "Total bytes" statistic (see the worked rate example after this table). |
DEPENDENT | apache.bytes.rate Preprocessing: - JSONPATH: - MULTIPLIER: - CHANGE_PER_SECOND |
Apache | Apache: Requests per second | Calculated as the rate of change of the "Total requests" statistic. |
DEPENDENT | apache.requests.rate Preprocessing: - JSONPATH: - CHANGE_PER_SECOND |
Apache | Apache: Total requests | The total number of the Apache server accesses. |
DEPENDENT | apache.requests Preprocessing: - JSONPATH: |
Apache | Apache: Uptime | The service uptime expressed in seconds. |
DEPENDENT | apache.uptime Preprocessing: - JSONPATH: |
Apache | Apache: Version | The Apache service version. |
DEPENDENT | apache.version Preprocessing: - JSONPATH: - DISCARD_UNCHANGED_HEARTBEAT: |
Apache | Apache: Total workers busy | The total number of busy worker threads/processes. |
DEPENDENT | apache.workers_total.busy Preprocessing: - JSONPATH: |
Apache | Apache: Total workers idle | The total number of idle worker threads/processes. |
DEPENDENT | apache.workers_total.idle Preprocessing: - JSONPATH: |
Apache | Apache: Workers closing connection | The number of workers in closing state. |
DEPENDENT | apache.workers.closing Preprocessing: - JSONPATH: |
Apache | Apache: Workers DNS lookup | The number of workers in dnslookup state. |
DEPENDENT | apache.workers.dnslookup Preprocessing: - JSONPATH: |
Apache | Apache: Workers finishing | The number of workers in finishing state. |
DEPENDENT | apache.workers.finishing Preprocessing: - JSONPATH: |
Apache | Apache: Workers idle cleanup | The number of workers in cleanup state. |
DEPENDENT | apache.workers.cleanup Preprocessing: - JSONPATH: |
Apache | Apache: Workers keepalive (read) | The number of workers in keepalive state. |
DEPENDENT | apache.workers.keepalive Preprocessing: - JSONPATH: |
Apache | Apache: Workers logging | The number of workers in logging state. |
DEPENDENT | apache.workers.logging Preprocessing: - JSONPATH: |
Apache | Apache: Workers reading request | The number of workers in reading state. |
DEPENDENT | apache.workers.reading Preprocessing: - JSONPATH: |
Apache | Apache: Workers sending reply | The number of workers in sending state. |
DEPENDENT | apache.workers.sending Preprocessing: - JSONPATH: |
Apache | Apache: Workers slot with no current process | The number of slots with no current process. |
DEPENDENT | apache.workers.slot Preprocessing: - JSONPATH: |
Apache | Apache: Workers starting up | The number of workers in starting state. |
DEPENDENT | apache.workers.starting Preprocessing: - JSONPATH: |
Apache | Apache: Workers waiting for connection | The number of workers in waiting state. |
DEPENDENT | apache.workers.waiting Preprocessing: - JSONPATH: |
Apache | Apache: Get processes summary | The aggregated data of summary metrics for all processes. |
ZABBIX_PASSIVE | proc.get[,,,summary] |
Apache | Apache: CPU utilization | The percentage of the CPU utilization by a process {#NAME}. |
ZABBIX_PASSIVE | proc.cpu.util[{#NAME}] |
Apache | Apache: Get process data | The summary metrics aggregated by a process {#NAME}. |
DEPENDENT | apache.proc.get[{#NAME}] Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
Apache | Apache: Memory usage (rss) | The summary of resident set size memory used by a process {#NAME} expressed in bytes. |
DEPENDENT | apache.proc.rss[{#NAME}] Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
Apache | Apache: Memory usage (vsize) | The summary of virtual memory used by a process {#NAME} expressed in bytes. |
DEPENDENT | apache.proc.vmem[{#NAME}] Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
Apache | Apache: Memory usage, % | The percentage of real memory used by a process {#NAME}. |
DEPENDENT | apache.proc.pmem[{#NAME}] Preprocessing: - JSONPATH: ⛔️ON_FAIL: |
Apache | Apache: Number of running processes | The number of running processes {#NAME}. |
DEPENDENT | apache.proc.num[{#NAME}] Preprocessing: - JSONPATH: ⛔️ON_FAIL: - DISCARD_UNCHANGED_HEARTBEAT: |
Apache | Apache: Connections async closing | The number of asynchronous connections in closing state (applicable only to the event MPM). |
DEPENDENT | apache.connections[async_closing{#SINGLETON}] Preprocessing: - JSONPATH: |
Apache | Apache: Connections async keepalive | The number of asynchronous connections in keepalive state (applicable only to the event MPM). |
DEPENDENT | apache.connections[async_keepalive{#SINGLETON}] Preprocessing: - JSONPATH: |
Apache | Apache: Connections async writing | The number of asynchronous connections in writing state (applicable only to the event MPM). |
DEPENDENT | apache.connections[async_writing{#SINGLETON}] Preprocessing: - JSONPATH: |
Apache | Apache: Connections total | The number of total connections. |
DEPENDENT | apache.connections[total{#SINGLETON}] Preprocessing: - JSONPATH: |
Apache | Apache: Bytes per request | The average number of bytes served per request. |
DEPENDENT | apache.bytes[per_request{#SINGLETON}] Preprocessing: - JSONPATH: |
Apache | Apache: Number of async processes | The number of asynchronous processes. |
DEPENDENT | apache.process[num{#SINGLETON}] Preprocessing: - JSONPATH: |
Zabbix raw items | Apache: Get status | Getting data from a machine-readable version of the Apache status page. For more information see Apache Module mod_status. |
ZABBIX_PASSIVE | web.page.get["{$APACHE.STATUS.SCHEME}://{$APACHE.STATUS.HOST}:{$APACHE.STATUS.PORT}/{$APACHE.STATUS.PATH}"] Preprocessing: - JAVASCRIPT: |
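The two rate items above ("Apache: Bytes per second" and "Apache: Requests per second") rely on the CHANGE_PER_SECOND preprocessing step, which divides the delta of a growing counter by the seconds elapsed between polls. A worked sketch of the arithmetic (Zabbix performs this server-side; the first value reuses the Total kBytes figure from the sample output above, the second poll is hypothetical):

```python
def change_per_second(prev_value: float, prev_ts: float,
                      value: float, ts: float) -> float:
    """CHANGE_PER_SECOND: (new value - old value) / seconds elapsed."""
    return (value - prev_value) / (ts - prev_ts)

# Two polls of 'Total kBytes' 60 s apart, first converted to bytes by
# the MULTIPLIER (1024) step, as in 'Apache: Bytes per second':
print(change_per_second(33011 * 1024, 0, 33191 * 1024, 60))  # 3072.0 bytes/s
```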
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Apache: Has been restarted | Uptime is less than 10 minutes. |
last(/Apache by Zabbix agent/apache.uptime)<10m |
INFO | Manual close: YES |
Apache: Version has changed | The Apache version has changed. Acknowledge (Ack) to close manually. |
last(/Apache by Zabbix agent/apache.version,#1)<>last(/Apache by Zabbix agent/apache.version,#2) and length(last(/Apache by Zabbix agent/apache.version))>0 |
INFO | Manual close: YES |
Apache: Process is not running | - |
last(/Apache by Zabbix agent/apache.proc.num[{#NAME}])=0 |
HIGH | |
Apache: Failed to fetch status page | Zabbix has not received any data for items for the last 30 minutes. |
nodata(/Apache by Zabbix agent/web.page.get["{$APACHE.STATUS.SCHEME}://{$APACHE.STATUS.HOST}:{$APACHE.STATUS.PORT}/{$APACHE.STATUS.PATH}"],30m)=1 and last(/Apache by Zabbix agent/apache.proc.num[{#NAME}])>0 |
WARNING | Manual close: YES Depends on: - Apache: Service is down |
Apache: Service is down | - |
last(/Apache by Zabbix agent/net.tcp.service[http,"{$APACHE.STATUS.HOST}","{$APACHE.STATUS.PORT}"])=0 and last(/Apache by Zabbix agent/apache.proc.num[{#NAME}])>0 |
AVERAGE | Manual close: YES |
Apache: Service response time is too high | - |
min(/Apache by Zabbix agent/net.tcp.service.perf[http,"{$APACHE.STATUS.HOST}","{$APACHE.STATUS.PORT}"],5m)>{$APACHE.RESPONSE_TIME.MAX.WARN} and last(/Apache by Zabbix agent/apache.proc.num[{#NAME}])>0 |
WARNING | Manual close: YES Depends on: - Apache: Service is down |
Please report any issues with the template at https://support.zabbix.com.
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums.
For Zabbix version: 6.2 and higher
Official JMX Template for Apache ActiveMQ.
This template was tested on:
See Zabbix template operation for basic instructions.
Metrics are collected by JMX.
No specific Zabbix configuration is required.
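Before linking the template, it can help to confirm that the broker's JMX port is reachable from the host running the Zabbix Java gateway. The check below is a minimal sketch: it validates TCP connectivity only, not JMX authentication, and the hostname is a placeholder:

```python
import socket

# Assumed broker host (placeholder); 1099 is the {$ACTIVEMQ.PORT} default.
host, port = "activemq.example.com", 1099

try:
    with socket.create_connection((host, port), timeout=5):
        print(f"JMX port {port} on {host} is reachable")
except OSError as exc:
    print(f"cannot reach {host}:{port}: {exc}")
```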
Name | Description | Default |
---|---|---|
{$ACTIVEMQ.BROKER.CONSUMERS.MIN.HIGH} | Minimum amount of consumers for broker. Can be used with broker name as context. |
1 |
{$ACTIVEMQ.BROKER.CONSUMERS.MIN.TIME} | Time during which there may be no consumers on destination. Can be used with broker name as context. |
5m |
{$ACTIVEMQ.BROKER.PRODUCERS.MIN.HIGH} | Minimum amount of producers for broker. Can be used with broker name as context. |
1 |
{$ACTIVEMQ.BROKER.PRODUCERS.MIN.TIME} | Time during which there may be no producers on broker. Can be used with broker name as context. |
5m |
{$ACTIVEMQ.DESTINATION.CONSUMERS.MIN.HIGH} | Minimum amount of consumers for destination. Can be used with destination name as context. |
1 |
{$ACTIVEMQ.DESTINATION.CONSUMERS.MIN.TIME} | Time during which there may be no consumers in destination. Can be used with destination name as context. |
10m |
{$ACTIVEMQ.DESTINATION.PRODUCERS.MIN.HIGH} | Minimum amount of producers for destination. Can be used with destination name as context. |
1 |
{$ACTIVEMQ.DESTINATION.PRODUCERS.MIN.TIME} | Time during which there may be no producers on destination. Can be used with destination name as context. |
10m |
{$ACTIVEMQ.EXPIRED.WARN} | Threshold for expired messages count. Can be used with destination name as context. |
0 |
{$ACTIVEMQ.LLD.FILTER.BROKER.MATCHES} | Filter to include discovered brokers |
.* |
{$ACTIVEMQ.LLD.FILTER.BROKER.NOT_MATCHES} | Filter to exclude discovered brokers |
CHANGE IF NEEDED |
{$ACTIVEMQ.LLD.FILTER.DESTINATION.MATCHES} | Filter to include discovered destinations |
.* |
{$ACTIVEMQ.LLD.FILTER.DESTINATION.NOT_MATCHES} | Filter to exclude discovered destinations |
CHANGE IF NEEDED |
{$ACTIVEMQ.MEM.MAX.HIGH} | Memory threshold for HIGH trigger. Can be used with destination or broker name as context. |
90 |
{$ACTIVEMQ.MEM.MAX.WARN} | Memory threshold for AVERAGE trigger. Can be used with destination or broker name as context. |
75 |
{$ACTIVEMQ.MEM.TIME} | Time during which the metric can be above the threshold. Can be used with destination or broker name as context. |
5m |
{$ACTIVEMQ.MSG.RATE.WARN.TIME} | The time for message enqueue/dequeue rate. Can be used with destination or broker name as context. |
15m |
{$ACTIVEMQ.PASSWORD} | Password for JMX |
activemq |
{$ACTIVEMQ.PORT} | Port for JMX |
1099 |
{$ACTIVEMQ.QUEUE.ENABLED} | Use this to disable alerting for specific destination. 1 = enabled, 0 = disabled. Can be used with destination name as context. |
1 |
{$ACTIVEMQ.QUEUE.TIME} | Time during which the QueueSize can be higher than threshold. Can be used with destination name as context. |
10m |
{$ACTIVEMQ.QUEUE.WARN} | Threshold for QueueSize. Can be used with destination name as context. |
100 |
{$ACTIVEMQ.STORE.MAX.HIGH} | Storage threshold for HIGH trigger. Can be used with broker name as context. |
90 |
{$ACTIVEMQ.STORE.MAX.WARN} | Storage threshold for AVERAGE trigger. Can be used with broker name as context. |
75 |
{$ACTIVEMQ.STORE.TIME} | Time during which the metric can be above the threshold. Can be used with destination or broker name as context. |
5m |
{$ACTIVEMQ.TEMP.MAX.HIGH} | Temp threshold for HIGH trigger. Can be used with broker name as context. |
90 |
{$ACTIVEMQ.TEMP.MAX.WARN} | Temp threshold for AVERAGE trigger. Can be used with broker name as context. |
75 |
{$ACTIVEMQ.TEMP.TIME} | Time during which the metric can be above the threshold. Can be used with destination or broker name as context. |
5m |
{$ACTIVEMQ.TOTAL.CONSUMERS.COUNT} | Attribute for TotalConsumerCount per destination. Used to suppress destination's triggers when the count of consumers on the broker is lower than threshold. |
TotalConsumerCount |
{$ACTIVEMQ.TOTAL.PRODUCERS.COUNT} | Attribute for TotalProducerCount per destination. Used to suppress destination's triggers when the count of producers on the broker is lower than threshold (see the sketch after this table). |
TotalProducerCount |
{$ACTIVEMQ.USER} | User for JMX |
admin |
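The {$ACTIVEMQ.TOTAL.CONSUMERS.COUNT} and {$ACTIVEMQ.TOTAL.PRODUCERS.COUNT} macros feed the suppression logic of the destination-level triggers: a destination alert fires only while the broker as a whole still has consumers (or producers), so a broker-wide outage raises a single broker trigger instead of one alert per destination. A Python sketch of the condition, using the default thresholds (illustrative; the real check is the Zabbix trigger expression shown further below):

```python
def destination_consumers_low(dest_consumers: int,
                              broker_total_consumers: int,
                              dest_min: int = 1,    # {$ACTIVEMQ.DESTINATION.CONSUMERS.MIN.HIGH}
                              broker_min: int = 1,  # {$ACTIVEMQ.BROKER.CONSUMERS.MIN.HIGH}
                              ) -> bool:
    """Mirror of the destination 'Consumers count is too low' condition."""
    return dest_consumers < dest_min and broker_total_consumers > broker_min

print(destination_consumers_low(0, 5))  # True: only this destination lost its consumers
print(destination_consumers_low(0, 0))  # False: broker-wide issue, handled by the broker trigger
```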
There are no template links in this template.
Name | Description | Type | Key and additional info |
---|---|---|---|
Brokers discovery | Discovery of brokers |
JMX | jmx.discovery[beans,"org.apache.activemq:type=Broker,brokerName=*"] Filter: FORMULA A and B- {#JMXBROKERNAME} MATCHES_REGEX - {#JMXBROKERNAME} NOT_MATCHES_REGEX |
Destinations discovery | Discovery of destinations |
JMX | jmx.discovery[beans,"org.apache.activemq:type=Broker,brokerName=*,destinationType=*,destinationName=*"] Filter: FORMULA A and B- {#JMXDESTINATIONNAME} MATCHES_REGEX - {#JMXDESTINATIONNAME} NOT_MATCHES_REGEX |
Group | Name | Description | Type | Key and additional info |
---|---|---|---|---|
ActiveMQ | Broker {#JMXBROKERNAME}: Version | The version of the broker. |
JMX | jmx[{#JMXOBJ},BrokerVersion] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
ActiveMQ | Broker {#JMXBROKERNAME}: Uptime | The uptime of the broker. |
JMX | jmx[{#JMXOBJ},UptimeMillis] Preprocessing: - MULTIPLIER: |
ActiveMQ | Broker {#JMXBROKERNAME}: Memory limit | Memory limit, in bytes, used for holding undelivered messages before paging to temporary storage. |
JMX | jmx[{#JMXOBJ},MemoryLimit] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
ActiveMQ | Broker {#JMXBROKERNAME}: Memory usage in percents | Percent of memory limit used. |
JMX | jmx[{#JMXOBJ}, MemoryPercentUsage] |
ActiveMQ | Broker {#JMXBROKERNAME}: Storage limit | Disk limit, in bytes, used for persistent messages before producers are blocked. |
JMX | jmx[{#JMXOBJ},StoreLimit] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
ActiveMQ | Broker {#JMXBROKERNAME}: Storage usage in percents | Percent of store limit used. |
JMX | jmx[{#JMXOBJ},StorePercentUsage] |
ActiveMQ | Broker {#JMXBROKERNAME}: Temp limit | Disk limit, in bytes, used for non-persistent messages and temporary data before producers are blocked. |
JMX | jmx[{#JMXOBJ},TempLimit] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
ActiveMQ | Broker {#JMXBROKERNAME}: Temp usage in percents | Percent of temp limit used. |
JMX | jmx[{#JMXOBJ},TempPercentUsage] |
ActiveMQ | Broker {#JMXBROKERNAME}: Messages enqueue rate | Rate of messages that have been sent to the broker. |
JMX | jmx[{#JMXOBJ},TotalEnqueueCount] Preprocessing: - CHANGE_PER_SECOND |
ActiveMQ | Broker {#JMXBROKERNAME}: Messages dequeue rate | Rate of messages that have been delivered by the broker and acknowledged by consumers. |
JMX | jmx[{#JMXOBJ},TotalDequeueCount] Preprocessing: - CHANGE_PER_SECOND |
ActiveMQ | Broker {#JMXBROKERNAME}: Consumers count total | Number of consumers attached to this broker. |
JMX | jmx[{#JMXOBJ},TotalConsumerCount] |
ActiveMQ | Broker {#JMXBROKERNAME}: Producers count total | Number of producers attached to this broker. |
JMX | jmx[{#JMXOBJ},TotalProducerCount] |
ActiveMQ | {#JMXBROKERNAME}: {#JMXDESTINATIONTYPE} {#JMXDESTINATIONNAME}: Consumers count | Number of consumers attached to this destination. |
JMX | jmx[{#JMXOBJ},ConsumerCount] |
ActiveMQ | {#JMXBROKERNAME}: {#JMXDESTINATIONTYPE} {#JMXDESTINATIONNAME}: Consumers count total on {#JMXBROKERNAME} | Number of consumers attached to the broker of this destination. Used to suppress destination's triggers when the count of consumers on the broker is lower than threshold. |
JMX | jmx["org.apache.activemq:type=Broker,brokerName={#JMXBROKERNAME}",{$ACTIVEMQ.TOTAL.CONSUMERS.COUNT: "{#JMXDESTINATIONNAME}"}] Preprocessing: - INRANGE: ⛔️ONFAIL: - DISCARDUNCHANGEDHEARTBEAT: |
ActiveMQ | {#JMXBROKERNAME}: {#JMXDESTINATIONTYPE} {#JMXDESTINATIONNAME}: Producers count | Number of producers attached to this destination. |
JMX | jmx[{#JMXOBJ},ProducerCount] |
ActiveMQ | {#JMXBROKERNAME}: {#JMXDESTINATIONTYPE} {#JMXDESTINATIONNAME}: Producers count total on {#JMXBROKERNAME} | Number of producers attached to the broker of this destination. Used to suppress destination's triggers when the count of producers on the broker is lower than threshold. |
JMX | jmx["org.apache.activemq:type=Broker,brokerName={#JMXBROKERNAME}",{$ACTIVEMQ.TOTAL.PRODUCERS.COUNT: "{#JMXDESTINATIONNAME}"}] Preprocessing: - INRANGE: ⛔️ONFAIL: - DISCARDUNCHANGEDHEARTBEAT: |
ActiveMQ | {#JMXBROKERNAME}: {#JMXDESTINATIONTYPE} {#JMXDESTINATIONNAME}: Memory usage in percents | The percentage of the memory limit used. |
JMX | jmx[{#JMXOBJ},MemoryPercentUsage] |
ActiveMQ | {#JMXBROKERNAME}: {#JMXDESTINATIONTYPE} {#JMXDESTINATIONNAME}: Messages enqueue rate | Rate of messages that have been sent to the destination. |
JMX | jmx[{#JMXOBJ},EnqueueCount] Preprocessing: - CHANGE_PER_SECOND |
ActiveMQ | {#JMXBROKERNAME}: {#JMXDESTINATIONTYPE} {#JMXDESTINATIONNAME}: Messages dequeue rate | Rate of messages that have been acknowledged (and removed) from the destination. |
JMX | jmx[{#JMXOBJ},DequeueCount] Preprocessing: - CHANGE_PER_SECOND |
ActiveMQ | {#JMXBROKERNAME}: {#JMXDESTINATIONTYPE} {#JMXDESTINATIONNAME}: Queue size | Number of messages on this destination, including any that have been dispatched but not acknowledged. |
JMX | jmx[{#JMXOBJ},QueueSize] |
ActiveMQ | {#JMXBROKERNAME}: {#JMXDESTINATIONTYPE} {#JMXDESTINATIONNAME}: Expired messages count | Number of messages that have been expired. |
JMX | jmx[{#JMXOBJ},ExpiredCount] Preprocessing: - DISCARD_UNCHANGED_HEARTBEAT: |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Broker {#JMXBROKERNAME}: Version has changed | Broker {#JMXBROKERNAME} version has changed. Ack to close. |
last(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},BrokerVersion],#1)<>last(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},BrokerVersion],#2) and length(last(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},BrokerVersion]))>0 |
INFO | Manual close: YES |
Broker {#JMXBROKERNAME}: Broker has been restarted | Uptime is less than 10 minutes. |
last(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},UptimeMillis])<10m |
INFO | Manual close: YES |
Broker {#JMXBROKERNAME}: Memory usage is too high | - |
min(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ}, MemoryPercentUsage],{$ACTIVEMQ.MEM.TIME:"{#JMXBROKERNAME}"})>{$ACTIVEMQ.MEM.MAX.WARN:"{#JMXBROKERNAME}"} |
AVERAGE | Depends on: - Broker {#JMXBROKERNAME}: Memory usage is too high |
Broker {#JMXBROKERNAME}: Memory usage is too high | - |
min(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ}, MemoryPercentUsage],{$ACTIVEMQ.MEM.TIME:"{#JMXBROKERNAME}"})>{$ACTIVEMQ.MEM.MAX.HIGH:"{#JMXBROKERNAME}"} |
HIGH | |
Broker {#JMXBROKERNAME}: Storage usage is too high | - |
min(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},StorePercentUsage],{$ACTIVEMQ.STORE.TIME:"{#JMXBROKERNAME}"})>{$ACTIVEMQ.STORE.MAX.WARN:"{#JMXBROKERNAME}"} |
AVERAGE | Depends on: - Broker {#JMXBROKERNAME}: Storage usage is too high |
Broker {#JMXBROKERNAME}: Storage usage is too high | - |
min(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},StorePercentUsage],{$ACTIVEMQ.STORE.TIME:"{#JMXBROKERNAME}"})>{$ACTIVEMQ.STORE.MAX.HIGH:"{#JMXBROKERNAME}"} |
HIGH | |
Broker {#JMXBROKERNAME}: Temp usage is too high | - |
min(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},TempPercentUsage],{$ACTIVEMQ.TEMP.TIME:"{#JMXBROKERNAME}"})>{$ACTIVEMQ.TEMP.MAX.WARN} |
AVERAGE | Depends on: - Broker {#JMXBROKERNAME}: Temp usage is too high |
Broker {#JMXBROKERNAME}: Temp usage is too high | - |
min(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},TempPercentUsage],{$ACTIVEMQ.TEMP.TIME:"{#JMXBROKERNAME}"})>{$ACTIVEMQ.TEMP.MAX.HIGH} |
HIGH | |
Broker {#JMXBROKERNAME}: Message enqueue rate is higher than dequeue rate | Enqueue rate is higher than dequeue rate. It may indicate performance problems. |
avg(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},TotalEnqueueCount],{$ACTIVEMQ.MSG.RATE.WARN.TIME:"{#JMXBROKERNAME}"})>avg(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},TotalDequeueCount],{$ACTIVEMQ.MSG.RATE.WARN.TIME:"{#JMXBROKERNAME}"}) |
AVERAGE | |
Broker {#JMXBROKERNAME}: Consumers count is too low | - |
max(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},TotalConsumerCount],{$ACTIVEMQ.BROKER.CONSUMERS.MIN.TIME:"{#JMXBROKERNAME}"})<{$ACTIVEMQ.BROKER.CONSUMERS.MIN.HIGH:"{#JMXBROKERNAME}"} |
HIGH | |
Broker {#JMXBROKERNAME}: Producers count is too low | - |
max(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},TotalProducerCount],{$ACTIVEMQ.BROKER.PRODUCERS.MIN.TIME:"{#JMXBROKERNAME}"})<{$ACTIVEMQ.BROKER.PRODUCERS.MIN.HIGH:"{#JMXBROKERNAME}"} |
HIGH | |
{#JMXBROKERNAME}: {#JMXDESTINATIONTYPE} {#JMXDESTINATIONNAME}: Consumers count is too low | - |
max(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},ConsumerCount],{$ACTIVEMQ.DESTINATION.CONSUMERS.MIN.TIME:"{#JMXDESTINATIONNAME}"})<{$ACTIVEMQ.DESTINATION.CONSUMERS.MIN.HIGH:"{#JMXDESTINATIONNAME}"} and last(/Apache ActiveMQ by JMX/jmx["org.apache.activemq:type=Broker,brokerName={#JMXBROKERNAME}",{$ACTIVEMQ.TOTAL.CONSUMERS.COUNT: "{#JMXDESTINATIONNAME}"}])>{$ACTIVEMQ.BROKER.CONSUMERS.MIN.HIGH:"{#JMXBROKERNAME}"} Recovery expression: min(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},ConsumerCount],{$ACTIVEMQ.DESTINATION.CONSUMERS.MIN.TIME:"{#JMXDESTINATIONNAME}"})>={$ACTIVEMQ.DESTINATION.CONSUMERS.MIN.HIGH:"{#JMXDESTINATIONNAME}"} |
AVERAGE | Manual close: YES |
{#JMXBROKERNAME}: {#JMXDESTINATIONTYPE} {#JMXDESTINATIONNAME}: Producers count is too low | - |
max(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},ProducerCount],{$ACTIVEMQ.DESTINATION.PRODUCERS.MIN.TIME:"{#JMXDESTINATIONNAME}"})<{$ACTIVEMQ.DESTINATION.PRODUCERS.MIN.HIGH:"{#JMXDESTINATIONNAME}"} and last(/Apache ActiveMQ by JMX/jmx["org.apache.activemq:type=Broker,brokerName={#JMXBROKERNAME}",{$ACTIVEMQ.TOTAL.PRODUCERS.COUNT: "{#JMXDESTINATIONNAME}"}])>{$ACTIVEMQ.BROKER.PRODUCERS.MIN.HIGH:"{#JMXBROKERNAME}"} Recovery expression: min(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},ProducerCount],{$ACTIVEMQ.DESTINATION.PRODUCERS.MIN.TIME:"{#JMXDESTINATIONNAME}"})>={$ACTIVEMQ.DESTINATION.PRODUCERS.MIN.HIGH:"{#JMXDESTINATIONNAME}"} |
AVERAGE | Manual close: YES |
{#JMXBROKERNAME}: {#JMXDESTINATIONTYPE} {#JMXDESTINATIONNAME}: Memory usage is too high | - |
last(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},MemoryPercentUsage])>{$ACTIVEMQ.MEM.MAX.WARN:"{#JMXDESTINATIONNAME}"} |
AVERAGE | |
{#JMXBROKERNAME}: {#JMXDESTINATIONTYPE} {#JMXDESTINATIONNAME}: Memory usage is too high | - |
last(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},MemoryPercentUsage])>{$ACTIVEMQ.MEM.MAX.HIGH:"{#JMXDESTINATIONNAME}"} |
HIGH | |
{#JMXBROKERNAME}: {#JMXDESTINATIONTYPE} {#JMXDESTINATIONNAME}: Message enqueue rate is higher than dequeue rate | Enqueue rate is higher than dequeue rate. It may indicate performance problems. |
avg(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},EnqueueCount],{$ACTIVEMQ.MSG.RATE.WARN.TIME:"{#JMXDESTINATIONNAME}"})>avg(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},DequeueCount],{$ACTIVEMQ.MSG.RATE.WARN.TIME:"{#JMXDESTINATIONNAME}"}) |
AVERAGE | |
{#JMXBROKERNAME}: {#JMXDESTINATIONTYPE} {#JMXDESTINATIONNAME}: Queue size is high | Queue size is higher than threshold. It may indicate performance problems. |
min(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},QueueSize],{$ACTIVEMQ.QUEUE.TIME:"{#JMXDESTINATIONNAME}"})>{$ACTIVEMQ.QUEUE.WARN:"{#JMXDESTINATIONNAME}"} and {$ACTIVEMQ.QUEUE.ENABLED:"{#JMXDESTINATIONNAME}"}=1 |
AVERAGE | |
{#JMXBROKERNAME}: {#JMXDESTINATIONTYPE} {#JMXDESTINATIONNAME}: Expired messages count is high | This metric represents the number of messages that expired before they could be delivered. If you expect all messages to be delivered and acknowledged within a certain amount of time, you can set an expiration for each message, and investigate if your ExpiredCount metric rises above zero. |
last(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},ExpiredCount])>{$ACTIVEMQ.EXPIRED.WARN:"{#JMXDESTINATIONNAME}"} |
AVERAGE |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template or ask for help with it at ZABBIX forums.