This template is designed for the effortless deployment of Apache Zookeeper monitoring by Zabbix via HTTP and doesn't require any external scripts.
This template works with standalone and cluster instances. Metrics are collected from each Zookeeper node by requests to AdminServer.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Enable the AdminServer and configure the parameters according to the official documentation.

Set the hostname or IP address of the Apache Zookeeper host in the {$ZOOKEEPER.HOST} macro. You can also change the {$ZOOKEEPER.COMMAND_URL}, {$ZOOKEEPER.PORT}, and {$ZOOKEEPER.SCHEME} macros if necessary.
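For reference, a minimal sketch of the corresponding AdminServer settings in `zoo.cfg`, using the same values as the macro defaults below (the configuration file path is only an example and should be adjusted to your installation):

```bash
# Append AdminServer settings to zoo.cfg (example path; adjust to your installation),
# then restart the ZooKeeper service for the change to take effect.
cat >> /opt/zookeeper/conf/zoo.cfg <<'EOF'
admin.enableServer=true
# Port of the embedded Jetty server; corresponds to {$ZOOKEEPER.PORT}
admin.serverPort=8080
# Root URL for listing and issuing AdminServer commands; corresponds to {$ZOOKEEPER.COMMAND_URL}
admin.commandURL=/commands
EOF
```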
Name | Description | Default |
---|---|---|
{$ZOOKEEPER.HOST} | The hostname or IP address of the Apache Zookeeper host. | <SET ZOOKEEPER HOST> |
{$ZOOKEEPER.PORT} | The port the embedded Jetty server listens on (admin.serverPort). | 8080 |
{$ZOOKEEPER.COMMAND_URL} | The URL for listing and issuing commands relative to the root URL (admin.commandURL). | commands |
{$ZOOKEEPER.SCHEME} | Request scheme, which may be http or https. | http |
{$ZOOKEEPER.FILE_DESCRIPTORS.MAX.WARN} | Maximum percentage of file descriptors usage alert threshold (for trigger expression). | 85 |
{$ZOOKEEPER.OUTSTANDING_REQ.MAX.WARN} | Maximum number of outstanding requests (for trigger expression). | 10 |
{$ZOOKEEPER.PENDING_SYNCS.MAX.WARN} | Maximum number of pending syncs from the followers (for trigger expression). | 10 |
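A quick way to verify that the AdminServer is reachable with these values is to request one of its commands manually. This sketch assumes the default scheme, port, and command URL above and uses the `monitor` command purely as an illustration; the template's HTTP items may request other commands:

```bash
# Replace <zookeeper-host> with the value of {$ZOOKEEPER.HOST}
curl -s http://<zookeeper-host>:8080/commands/monitor | head
```

A JSON response indicates that the AdminServer is enabled and reachable from the Zabbix server or proxy.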
Name | Description | Type | Key and additional info |
---|---|---|---|
Get server metrics | | HTTP agent | zookeeper.get_metrics |
Get connections stats | Get information on client connections to server. Note, depending on the number of client connections this operation may be expensive (i.e. impact server performance). | HTTP agent | zookeeper.get_connections_stats |
Server mode | Mode of the server. In an ensemble, this may either be leader or follower. Otherwise, it is standalone. | Dependent item | zookeeper.server_state Preprocessing |
Uptime | Uptime that a peer has been in a table leading/following/observing state. | Dependent item | zookeeper.uptime Preprocessing |
Version | Version of Zookeeper server. | Dependent item | zookeeper.version Preprocessing |
Approximate data size | Data tree size in bytes. The size includes the znode path and its value. | Dependent item | zookeeper.approximate_data_size Preprocessing |
File descriptors, max | Maximum number of file descriptors that a zookeeper server can open. | Dependent item | zookeeper.max_file_descriptor_count Preprocessing |
File descriptors, open | Number of file descriptors that a zookeeper server has open. | Dependent item | zookeeper.open_file_descriptor_count Preprocessing |
Outstanding requests | The number of queued requests when the server is under load and is receiving more sustained requests than it can process. | Dependent item | zookeeper.outstanding_requests Preprocessing |
Commit per sec | The number of commits performed per second. | Dependent item | zookeeper.commit_count.rate Preprocessing |
Diff syncs per sec | Number of diff syncs performed per second. | Dependent item | zookeeper.diff_count.rate Preprocessing |
Snap syncs per sec | Number of snap syncs performed per second. | Dependent item | zookeeper.snap_count.rate Preprocessing |
Looking per sec | Rate of transitions into looking state. | Dependent item | zookeeper.looking_count.rate Preprocessing |
Alive connections | Number of active clients connected to a zookeeper server. | Dependent item | zookeeper.num_alive_connections Preprocessing |
Global sessions | Number of global sessions. | Dependent item | zookeeper.global_sessions Preprocessing |
Local sessions | Number of local sessions. | Dependent item | zookeeper.local_sessions Preprocessing |
Drop connections per sec | Rate of connection drops. | Dependent item | zookeeper.connection_drop_count.rate Preprocessing |
Rejected connections per sec | Rate of connections rejected. | Dependent item | zookeeper.connection_rejected.rate Preprocessing |
Revalidate connections per sec | Rate of connection revalidations. | Dependent item | zookeeper.connection_revalidate_count.rate Preprocessing |
Revalidate per sec | Rate of revalidations. | Dependent item | zookeeper.revalidate_count.rate Preprocessing |
Latency, max | The maximum amount of time it takes for the server to respond to a client request. | Dependent item | zookeeper.max_latency Preprocessing |
Latency, min | The minimum amount of time it takes for the server to respond to a client request. | Dependent item | zookeeper.min_latency Preprocessing |
Latency, avg | The average amount of time it takes for the server to respond to a client request. | Dependent item | zookeeper.avg_latency Preprocessing |
Znode count | The number of znodes in the ZooKeeper namespace (the data). | Dependent item | zookeeper.znode_count Preprocessing |
Ephemeral nodes count | Number of ephemeral nodes that a zookeeper server has in its data tree. | Dependent item | zookeeper.ephemerals_count Preprocessing |
Watch count | Number of watches currently set on the local ZooKeeper process. | Dependent item | zookeeper.watch_count Preprocessing |
Packets sent per sec | The number of zookeeper packets sent from a server per second. | Dependent item | zookeeper.packets_sent Preprocessing |
Packets received per sec | The number of zookeeper packets received by a server per second. | Dependent item | zookeeper.packets_received.rate Preprocessing |
Bytes received per sec | Number of bytes received per second. | Dependent item | zookeeper.bytes_received_count.rate Preprocessing |
Election time, avg | Time between entering and leaving election. | Dependent item | zookeeper.avg_election_time Preprocessing |
Elections | Number of elections that have happened. | Dependent item | zookeeper.cnt_election_time Preprocessing |
Fsync time, avg | Time to fsync transaction log. | Dependent item | zookeeper.avg_fsynctime Preprocessing |
Fsync | Count of performed fsyncs. | Dependent item | zookeeper.cnt_fsynctime Preprocessing |
Snapshot write time, avg | Average time to write a snapshot. | Dependent item | zookeeper.avg_snapshottime Preprocessing |
Snapshot writes | Count of performed snapshot writes. | Dependent item | zookeeper.cnt_snapshottime Preprocessing |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Zookeeper: Server mode has changed | Zookeeper node state has changed. Acknowledge to close the problem manually. | last(/Zookeeper by HTTP/zookeeper.server_state,#1)<>last(/Zookeeper by HTTP/zookeeper.server_state,#2) and length(last(/Zookeeper by HTTP/zookeeper.server_state))>0 | Info | Manual close: Yes |
Zookeeper: Failed to fetch info data | Zabbix has not received data for items for the last 10 minutes. | nodata(/Zookeeper by HTTP/zookeeper.uptime,10m)=1 | Warning | Manual close: Yes |
Zookeeper: Version has changed | Zookeeper version has changed. Acknowledge to close the problem manually. | last(/Zookeeper by HTTP/zookeeper.version,#1)<>last(/Zookeeper by HTTP/zookeeper.version,#2) and length(last(/Zookeeper by HTTP/zookeeper.version))>0 | Info | Manual close: Yes |
Zookeeper: Too many file descriptors used | Number of file descriptors used more than {$ZOOKEEPER.FILE_DESCRIPTORS.MAX.WARN}% of the available number of file descriptors. | min(/Zookeeper by HTTP/zookeeper.open_file_descriptor_count,5m) * 100 / last(/Zookeeper by HTTP/zookeeper.max_file_descriptor_count) > {$ZOOKEEPER.FILE_DESCRIPTORS.MAX.WARN} | Warning | |
Zookeeper: Too many queued requests | Number of queued requests in the server. This goes up when the server receives more requests than it can process. | min(/Zookeeper by HTTP/zookeeper.outstanding_requests,5m)>{$ZOOKEEPER.OUTSTANDING_REQ.MAX.WARN} | Average | Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Leader metrics discovery | Additional metrics for leader node. | Dependent item | zookeeper.metrics.leader Preprocessing |
Name | Description | Type | Key and additional info |
---|---|---|---|
Pending syncs{#SINGLETON} | Number of pending syncs to carry out to ZooKeeper ensemble followers. | Dependent item | zookeeper.pending_syncs[{#SINGLETON}] Preprocessing |
Quorum size{#SINGLETON} | | Dependent item | zookeeper.quorum_size[{#SINGLETON}] Preprocessing |
Synced followers{#SINGLETON} | Number of synced followers reported when a node server_state is leader. | Dependent item | zookeeper.synced_followers[{#SINGLETON}] Preprocessing |
Synced non-voting follower{#SINGLETON} | Number of synced non-voting followers reported when a node server_state is leader. | Dependent item | zookeeper.synced_non_voting_followers[{#SINGLETON}] Preprocessing |
Synced observers{#SINGLETON} | Number of synced observers. | Dependent item | zookeeper.synced_observers[{#SINGLETON}] Preprocessing |
Learners{#SINGLETON} | Number of learners. | Dependent item | zookeeper.learners[{#SINGLETON}] Preprocessing |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Zookeeper: Too many pending syncs | | min(/Zookeeper by HTTP/zookeeper.pending_syncs[{#SINGLETON}],5m)>{$ZOOKEEPER.PENDING_SYNCS.MAX.WARN} | Average | Manual close: Yes |
Zookeeper: Too few active followers | The number of followers should equal the total size of your ZooKeeper ensemble, minus 1 (the leader is not included in the follower count). If the ensemble fails to maintain quorum, all automatic failover features are suspended. | last(/Zookeeper by HTTP/zookeeper.synced_followers[{#SINGLETON}]) < last(/Zookeeper by HTTP/zookeeper.quorum_size[{#SINGLETON}])-1 | Average | |
Name | Description | Type | Key and additional info |
---|---|---|---|
Clients discovery | Get list of client connections. Note, depending on the number of client connections this operation may be expensive (i.e. impact server performance). | HTTP agent | zookeeper.clients Preprocessing |
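For reference, the underlying data can be inspected manually with a request to the AdminServer `connections` command (the command name is an assumption; availability depends on the ZooKeeper version), keeping in mind the performance note above:

```bash
# Lists client connections as JSON; may be expensive with many connected clients
curl -s http://<zookeeper-host>:8080/commands/connections
```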
Name | Description | Type | Key and additional info |
---|---|---|---|
Zookeeper client {#TYPE} [{#CLIENT}]: Get client info | The item gets information about "{#CLIENT}" client of "{#TYPE}" type. | Dependent item | zookeeper.client_info[{#TYPE},{#CLIENT}] Preprocessing |
Zookeeper client {#TYPE} [{#CLIENT}]: Latency, max | The maximum amount of time it takes for the server to respond to a client request. | Dependent item | zookeeper.max_latency[{#TYPE},{#CLIENT}] Preprocessing |
Zookeeper client {#TYPE} [{#CLIENT}]: Latency, min | The minimum amount of time it takes for the server to respond to a client request. | Dependent item | zookeeper.min_latency[{#TYPE},{#CLIENT}] Preprocessing |
Zookeeper client {#TYPE} [{#CLIENT}]: Latency, avg | The average amount of time it takes for the server to respond to a client request. | Dependent item | zookeeper.avg_latency[{#TYPE},{#CLIENT}] Preprocessing |
Zookeeper client {#TYPE} [{#CLIENT}]: Packets sent per sec | The number of packets sent. | Dependent item | zookeeper.packets_sent[{#TYPE},{#CLIENT}] Preprocessing |
Zookeeper client {#TYPE} [{#CLIENT}]: Packets received per sec | The number of packets received. | Dependent item | zookeeper.packets_received[{#TYPE},{#CLIENT}] Preprocessing |
Zookeeper client {#TYPE} [{#CLIENT}]: Outstanding requests | The number of queued requests when the server is under load and is receiving more sustained requests than it can process. | Dependent item | zookeeper.outstanding_requests[{#TYPE},{#CLIENT}] Preprocessing |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed to monitor internal Zabbix metrics on the local Zabbix server.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Link this template to the local Zabbix server host.
Name | Description | Default |
---|---|---|
{$ZABBIX.PROXY.LAST_SEEN.MAX} | The maximum number of seconds that Zabbix proxy has not been seen. | 600 |
{$PROXY.GROUP.AVAIL.PERCENT.MIN} | Minimum threshold for the proxy group availability percentage triggers. | 75 |
{$PROXY.GROUP.DISCOVERY.NAME.MATCHES} | Filter to include discovered proxy groups by their name. | .* |
{$PROXY.GROUP.DISCOVERY.NAME.NOT_MATCHES} | Filter to exclude discovered proxy groups by their name. | CHANGE_IF_NEEDED |
{$ZABBIX.SERVER.UTIL.MAX} | Default maximum threshold for percentage utilization triggers (use macro context for specification). | 75 |
{$ZABBIX.SERVER.UTIL.MIN} | Default minimum threshold for percentage utilization triggers (use macro context for specification). | 65 |
{$ZABBIX.SERVER.UTIL.MAX:"value cache"} | Maximum threshold for the value cache utilization trigger. | 95 |
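The utilization thresholds support user macro context, which the trigger expressions below rely on (for example, `{$ZABBIX.SERVER.UTIL.MAX:"alert manager"}`). To change the threshold for a single process type, define a context-specific macro on the host: as a purely hypothetical example, setting `{$ZABBIX.SERVER.UTIL.MAX:"history syncer"}` to `85` would affect only the history syncer triggers, while all other triggers keep the default of `75`.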
Name | Description | Type | Key and additional info |
---|---|---|---|
Zabbix stats cluster | The master item of Zabbix cluster statistics. |
Zabbix internal | zabbix[cluster,discovery,nodes] |
Zabbix proxies stats | The master item of Zabbix proxies' statistics. |
Zabbix internal | zabbix[proxy,discovery] |
Zabbix proxy groups stats | The master item of Zabbix proxy groups' statistics. |
Zabbix internal | zabbix[proxy group,discovery] |
Queue over 10 minutes | The number of monitored items in the queue that are delayed by at least 10 minutes. |
Zabbix internal | zabbix[queue,10m] |
Queue | The number of monitored items in the queue that are delayed by at least 6 seconds. |
Zabbix internal | zabbix[queue] |
Zabbix preprocessing | The master item of Zabbix server's preprocessing statistics. |
Zabbix internal | zabbix[preprocessing] |
Utilization of alert manager internal processes, in % | The average percentage of the time during which the alert manager processes have been busy for the last minute. |
Zabbix internal | zabbix[process,alert manager,avg,busy] |
Utilization of alert syncer internal processes, in % | The average percentage of the time during which the alert syncer processes have been busy for the last minute. |
Zabbix internal | zabbix[process,alert syncer,avg,busy] |
Utilization of alerter internal processes, in % | The average percentage of the time during which the alerter processes have been busy for the last minute. |
Zabbix internal | zabbix[process,alerter,avg,busy] |
Utilization of availability manager internal processes, in % | The average percentage of the time during which the availability manager processes have been busy for the last minute. |
Zabbix internal | zabbix[process,availability manager,avg,busy] |
Utilization of configuration syncer internal processes, in % | The average percentage of the time during which the configuration syncer processes have been busy for the last minute. |
Zabbix internal | zabbix[process,configuration syncer,avg,busy] |
Utilization of configuration syncer worker internal processes, in % | The average percentage of the time during which the configuration syncer worker processes have been busy for the last minute. |
Zabbix internal | zabbix[process,configuration syncer worker,avg,busy] |
Utilization of escalator internal processes, in % | The average percentage of the time during which the escalator processes have been busy for the last minute. |
Zabbix internal | zabbix[process,escalator,avg,busy] |
Utilization of history poller internal processes, in % | The average percentage of the time during which the history poller processes have been busy for the last minute. |
Zabbix internal | zabbix[process,history poller,avg,busy] |
Utilization of ODBC poller data collector processes, in % | The average percentage of the time during which the ODBC poller processes have been busy for the last minute. |
Zabbix internal | zabbix[process,odbc poller,avg,busy] |
Utilization of history syncer internal processes, in % | The average percentage of the time during which the history syncer processes have been busy for the last minute. |
Zabbix internal | zabbix[process,history syncer,avg,busy] |
Utilization of housekeeper internal processes, in % | The average percentage of the time during which the housekeeper processes have been busy for the last minute. |
Zabbix internal | zabbix[process,housekeeper,avg,busy] |
Utilization of http poller data collector processes, in % | The average percentage of the time during which the http poller processes have been busy for the last minute. |
Zabbix internal | zabbix[process,http poller,avg,busy] |
Utilization of icmp pinger data collector processes, in % | The average percentage of the time during which the icmp pinger processes have been busy for the last minute. |
Zabbix internal | zabbix[process,icmp pinger,avg,busy] |
Utilization of ipmi manager internal processes, in % | The average percentage of the time during which the ipmi manager processes have been busy for the last minute. |
Zabbix internal | zabbix[process,ipmi manager,avg,busy] |
Utilization of ipmi poller data collector processes, in % | The average percentage of the time during which the ipmi poller processes have been busy for the last minute. |
Zabbix internal | zabbix[process,ipmi poller,avg,busy] |
Utilization of java poller data collector processes, in % | The average percentage of the time during which the java poller processes have been busy for the last minute. |
Zabbix internal | zabbix[process,java poller,avg,busy] |
Utilization of LLD manager internal processes, in % | The average percentage of the time during which the LLD manager processes have been busy for the last minute. |
Zabbix internal | zabbix[process,lld manager,avg,busy] |
Utilization of LLD worker internal processes, in % | The average percentage of the time during which the LLD worker processes have been busy for the last minute. |
Zabbix internal | zabbix[process,lld worker,avg,busy] |
Utilization of connector manager internal processes, in % | The average percentage of the time during which the connector manager processes have been busy for the last minute. |
Zabbix internal | zabbix[process,connector manager,avg,busy] |
Utilization of connector worker internal processes, in % | The average percentage of the time during which the connector worker processes have been busy for the last minute. |
Zabbix internal | zabbix[process,connector worker,avg,busy] |
Utilization of discovery manager internal processes, in % | The average percentage of the time during which the discovery manager processes have been busy for the last minute. |
Zabbix internal | zabbix[process,discovery manager,avg,busy] |
Utilization of discovery worker internal processes, in % | The average percentage of the time during which the discovery worker processes have been busy for the last minute. |
Zabbix internal | zabbix[process,discovery worker,avg,busy] |
Utilization of poller data collector processes, in % | The average percentage of the time during which the poller processes have been busy for the last minute. |
Zabbix internal | zabbix[process,poller,avg,busy] |
Utilization of preprocessing worker internal processes, in % | The average percentage of the time during which the preprocessing worker processes have been busy for the last minute. |
Zabbix internal | zabbix[process,preprocessing worker,avg,busy] |
Utilization of preprocessing manager internal processes, in % | The average percentage of the time during which the preprocessing manager processes have been busy for the last minute. |
Zabbix internal | zabbix[process,preprocessing manager,avg,busy] |
Utilization of proxy poller data collector processes, in % | The average percentage of the time during which the proxy poller processes have been busy for the last minute. |
Zabbix internal | zabbix[process,proxy poller,avg,busy] |
Utilization of proxy group manager internal processes, in % | The average percentage of the time during which the proxy group manager processes have been busy for the last minute. |
Zabbix internal | zabbix[process,proxy group manager,avg,busy] |
Utilization of report manager internal processes, in % | The average percentage of the time during which the report manager processes have been busy for the last minute. |
Zabbix internal | zabbix[process,report manager,avg,busy] |
Utilization of report writer internal processes, in % | The average percentage of the time during which the report writer processes have been busy for the last minute. |
Zabbix internal | zabbix[process,report writer,avg,busy] |
Utilization of self-monitoring internal processes, in % | The average percentage of the time during which the self-monitoring processes have been busy for the last minute. |
Zabbix internal | zabbix[process,self-monitoring,avg,busy] |
Utilization of snmp trapper data collector processes, in % | The average percentage of the time during which the snmp trapper processes have been busy for the last minute. |
Zabbix internal | zabbix[process,snmp trapper,avg,busy] |
Utilization of task manager internal processes, in % | The average percentage of the time during which the task manager processes have been busy for the last minute. |
Zabbix internal | zabbix[process,task manager,avg,busy] |
Utilization of timer internal processes, in % | The average percentage of the time during which the timer processes have been busy for the last minute. |
Zabbix internal | zabbix[process,timer,avg,busy] |
Utilization of service manager internal processes, in % | The average percentage of the time during which the service manager processes have been busy for the last minute. |
Zabbix internal | zabbix[process,service manager,avg,busy] |
Utilization of trigger housekeeper internal processes, in % | The average percentage of the time during which the trigger housekeeper processes have been busy for the last minute. |
Zabbix internal | zabbix[process,trigger housekeeper,avg,busy] |
Utilization of trapper data collector processes, in % | The average percentage of the time during which the trapper processes have been busy for the last minute. |
Zabbix internal | zabbix[process,trapper,avg,busy] |
Utilization of unreachable poller data collector processes, in % | The average percentage of the time during which the unreachable poller processes have been busy for the last minute. |
Zabbix internal | zabbix[process,unreachable poller,avg,busy] |
Utilization of vmware collector data collector processes, in % | The average percentage of the time during which the vmware collector processes have been busy for the last minute. |
Zabbix internal | zabbix[process,vmware collector,avg,busy] |
Utilization of agent poller data collector processes, in % | The average percentage of the time during which the agent poller processes have been busy for the last minute. |
Zabbix internal | zabbix[process,agent poller,avg,busy] |
Utilization of http agent poller data collector processes, in % | The average percentage of the time during which the http agent poller processes have been busy for the last minute. |
Zabbix internal | zabbix[process,http agent poller,avg,busy] |
Utilization of snmp poller data collector processes, in % | The average percentage of the time during which the snmp poller processes have been busy for the last minute. |
Zabbix internal | zabbix[process,snmp poller,avg,busy] |
Utilization of internal poller data collector processes, in % | The average percentage of the time during which the internal poller processes have been busy for the last minute. |
Zabbix internal | zabbix[process,internal poller,avg,busy] |
Utilization of browser poller data collector processes, in % | The average percentage of the time during which the browser poller processes have been busy for the last minute. |
Zabbix internal | zabbix[process,browser poller,avg,busy] |
Configuration cache, % used | The availability statistics of Zabbix configuration cache. The percentage of used data buffer. |
Zabbix internal | zabbix[rcache,buffer,pused] |
Trend function cache, % of unique requests | The effectiveness statistics of the Zabbix trend function cache. The percentage of cached items calculated from the sum of the cached items plus requests. A low percentage most likely means that the cache size can be reduced. |
Zabbix internal | zabbix[tcache,cache,pitems] |
Trend function cache, % of misses | The effectiveness statistics of the Zabbix trend function cache. The percentage of cache misses. |
Zabbix internal | zabbix[tcache,cache,pmisses] |
Value cache, % used | The availability statistics of Zabbix value cache. The percentage of used data buffer. |
Zabbix internal | zabbix[vcache,buffer,pused] |
Value cache hits | The effectiveness statistics of Zabbix value cache. The number of cache hits (history values taken from the cache). |
Zabbix internal | zabbix[vcache,cache,hits] Preprocessing
|
Value cache misses | The effectiveness statistics of Zabbix value cache. The number of cache misses (history values taken from the database). |
Zabbix internal | zabbix[vcache,cache,misses] Preprocessing
|
Value cache operating mode | The operating mode of the value cache. |
Zabbix internal | zabbix[vcache,cache,mode] |
Zabbix server check | Flag indicating whether it is a server or not. |
Zabbix internal | zabbix[triggers] Preprocessing
|
Version | The version of Zabbix server. |
Zabbix internal | zabbix[version] Preprocessing
|
VMware cache, % used | The availability statistics of Zabbix vmware cache. The percentage of used data buffer. |
Zabbix internal | zabbix[vmware,buffer,pused] |
History write cache, % used | The statistics and availability of Zabbix write cache. The percentage of used history buffer. The history cache is used to store item values. A high number indicates database performance problems. |
Zabbix internal | zabbix[wcache,history,pused] |
History index cache, % used | The statistics and availability of Zabbix write cache. The percentage of used history index buffer. The history index cache is used to index values stored in the history cache. |
Zabbix internal | zabbix[wcache,index,pused] |
Trend write cache, % used | The statistics and availability of Zabbix write cache. The percentage of used trend buffer. The trend cache stores the aggregate of all items that have received data for the current hour. |
Zabbix internal | zabbix[wcache,trend,pused] |
Number of processed values per second | The statistics and availability of Zabbix write cache. The total number of values processed by Zabbix server or Zabbix proxy, except unsupported items. |
Zabbix internal | zabbix[wcache,values] Preprocessing
|
Number of processed numeric (unsigned) values per second | The statistics and availability of Zabbix write cache. The number of processed numeric (unsigned) values. |
Zabbix internal | zabbix[wcache,values,uint] Preprocessing
|
Number of processed numeric (float) values per second | The statistics and availability of Zabbix write cache. The number of processed numeric (float) values. |
Zabbix internal | zabbix[wcache,values,float] Preprocessing
|
Number of processed log values per second | The statistics and availability of Zabbix write cache. The number of processed log values. |
Zabbix internal | zabbix[wcache,values,log] Preprocessing
|
Number of processed not supported values per second | The statistics and availability of Zabbix write cache. The number of times the item processing resulted in an item becoming unsupported or remaining in that state. |
Zabbix internal | zabbix[wcache,values,not supported] Preprocessing
|
Number of processed character values per second | The statistics and availability of Zabbix write cache. The number of processed character values. |
Zabbix internal | zabbix[wcache,values,str] Preprocessing
|
Number of processed text values per second | The statistics and availability of Zabbix write cache. The number of processed text values. |
Zabbix internal | zabbix[wcache,values,text] Preprocessing
|
Number of values synchronized with the database per second | Average quantity of values written to the database, recalculated once per minute. |
Zabbix internal | zabbix[vps,written] Preprocessing
|
LLD queue | The number of values enqueued in the low-level discovery processing queue. |
Zabbix internal | zabbix[lld_queue] |
Preprocessing queue | The number of values enqueued in the preprocessing queue. |
Dependent item | zabbix[preprocessing_queue] Preprocessing
|
Preprocessing throughput | Reflects the throughput of the preprocessing. |
Dependent item | zabbix[preprocessing_throughput] Preprocessing
|
Connector queue | The count of values enqueued in the connector queue. |
Zabbix internal | zabbix[connector_queue] |
Discovery queue | The count of values enqueued in the discovery queue. |
Zabbix internal | zabbix[discovery_queue] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Zabbix server: More than 100 items have been missing data for over 10 minutes | Indicates potential issues with network connectivity, agent failures, or unresponsive monitored resources that require attention. |
min(/Zabbix server health/zabbix[queue,10m],10m)>100 |Warning |
Manual close: Yes | |
Zabbix server: Utilization of alert manager processes is high | Indicates potential performance issues with the alert manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,alert manager,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"alert manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of alert syncer processes is high | Indicates potential performance issues with the alert syncer, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,alert syncer,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"alert syncer"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of alerter processes is high | Indicates potential performance issues with the alerter, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,alerter,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"alerter"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of availability manager processes is high | Indicates potential performance issues with the availability manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,availability manager,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"availability manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of configuration syncer processes is high | Indicates potential performance issues with the configuration syncer, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,configuration syncer,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"configuration syncer"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of configuration syncer worker processes is high | Indicates potential performance issues with the configuration syncer worker, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,configuration syncer worker,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"configuration syncer worker"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of escalator processes is high | Indicates potential performance issues with the escalator, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,escalator,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"escalator"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of history poller processes is high | Indicates potential performance issues with the history poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,history poller,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"history poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of ODBC poller processes is high | Indicates potential performance issues with the ODBC poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,odbc poller,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"ODBC poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of history syncer processes is high | Indicates potential performance issues with the history syncer, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,history syncer,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"history syncer"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of housekeeper processes is high | Indicates potential performance issues with the housekeeper, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,housekeeper,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"housekeeper"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of http poller processes is high | Indicates potential performance issues with the http poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,http poller,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"http poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of icmp pinger processes is high | Indicates potential performance issues with the icmp pinger, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,icmp pinger,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"icmp pinger"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of ipmi manager processes is high | Indicates potential performance issues with the ipmi manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,ipmi manager,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"ipmi manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of ipmi poller processes is high | Indicates potential performance issues with the ipmi poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,ipmi poller,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"ipmi poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of java poller processes is high | Indicates potential performance issues with the java poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,java poller,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"java poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of LLD manager processes is high | Indicates potential performance issues with the LLD manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,lld manager,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"LLD manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of LLD worker processes is high | Indicates potential performance issues with the LLD worker, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,lld worker,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"LLD worker"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of connector manager processes is high | Indicates potential performance issues with the connector manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,connector manager,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"connector manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of connector worker processes is high | Indicates potential performance issues with the connector worker, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,connector worker,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"connector worker"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of discovery manager processes is high | Indicates potential performance issues with the discovery manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,discovery manager,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"discovery manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of discovery worker processes is high | Indicates potential performance issues with the discovery worker, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,discovery worker,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"discovery worker"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of poller processes is high | Indicates potential performance issues with the poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,poller,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of preprocessing worker processes is high | Indicates potential performance issues with the preprocessing worker, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,preprocessing worker,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"preprocessing worker"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of preprocessing manager processes is high | Indicates potential performance issues with the preprocessing manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,preprocessing manager,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"preprocessing manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of proxy poller processes is high | Indicates potential performance issues with the proxy poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,proxy poller,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"proxy poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of proxy group manager processes is high | Indicates potential performance issues with the proxy group manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,proxy group manager,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"proxy group manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of report manager processes is high | Indicates potential performance issues with the report manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,report manager,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"report manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of report writer processes is high | Indicates potential performance issues with the report writer, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,report writer,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"report writer"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of self-monitoring processes is high | Indicates potential performance issues with the self-monitoring, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,self-monitoring,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"self-monitoring"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of snmp trapper processes is high | Indicates potential performance issues with the snmp trapper, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,snmp trapper,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"snmp trapper"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of task manager processes is high | Indicates potential performance issues with the task manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,task manager,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"task manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of timer processes is high | Indicates potential performance issues with the timer, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,timer,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"timer"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of service manager processes is high | Indicates potential performance issues with the service manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,service manager,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"service manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of trigger housekeeper processes is high | Indicates potential performance issues with the trigger housekeeper, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,trigger housekeeper,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"trigger housekeeper"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of trapper processes is high | Indicates potential performance issues with the trapper, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,trapper,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"trapper"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of unreachable poller processes is high | Indicates potential performance issues with the unreachable poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,unreachable poller,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"unreachable poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of vmware collector processes is high | Indicates potential performance issues with the vmware collector, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,vmware collector,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"vmware collector"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of agent poller processes is high | Indicates potential performance issues with the agent poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,agent poller,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"agent poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of http agent poller processes is high | Indicates potential performance issues with the http agent poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,http agent poller,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"http agent poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of snmp poller processes is high | Indicates potential performance issues with the snmp poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,snmp poller,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"snmp poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of internal poller processes is high | Indicates potential performance issues with the internal poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,internal poller,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"internal poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of browser poller processes is high | Indicates potential performance issues with the browser poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health/zabbix[process,browser poller,avg,busy],10m)>{$ZABBIX.SERVER.UTIL.MAX:"browser poller"} |Average |
Manual close: Yes | |
Zabbix server: Excessive configuration cache usage | Consider increasing |
max(/Zabbix server health/zabbix[rcache,buffer,pused],10m)>{$ZABBIX.SERVER.UTIL.MAX:"configuration cache"} |Average |
Manual close: Yes | |
Zabbix server: Excessive value cache usage | Consider increasing |
max(/Zabbix server health/zabbix[vcache,buffer,pused],10m)>{$ZABBIX.SERVER.UTIL.MAX:"value cache"} |Average |
Manual close: Yes | |
Zabbix server: Zabbix value cache working in low-memory mode | Once low-memory mode has been switched on, the value cache will remain in this state for 24 hours, even if the problem that triggered this mode is resolved sooner. |
last(/Zabbix server health/zabbix[vcache,cache,mode])=1 |High |
Manual close: Yes | |
Zabbix server: Wrong template assigned | Check that the template has been selected correctly. |
last(/Zabbix server health/zabbix[triggers])=0 |Disaster |
Manual close: Yes | |
Zabbix server: Version has changed | Zabbix server version has changed. Acknowledge to close the problem manually. |
last(/Zabbix server health/zabbix[version],#1)<>last(/Zabbix server health/zabbix[version],#2) and length(last(/Zabbix server health/zabbix[version]))>0 |Info |
Manual close: Yes | |
Zabbix server: Excessive vmware cache usage | Consider increasing |
max(/Zabbix server health/zabbix[vmware,buffer,pused],10m)>{$ZABBIX.SERVER.UTIL.MAX:"vmware cache"} |Average |
Manual close: Yes | |
Zabbix server: Excessive history cache usage | Consider increasing |
max(/Zabbix server health/zabbix[wcache,history,pused],10m)>{$ZABBIX.SERVER.UTIL.MAX:"history cache"} |Average |
Manual close: Yes | |
Zabbix server: Excessive history index cache usage | Consider increasing |
max(/Zabbix server health/zabbix[wcache,index,pused],10m)>{$ZABBIX.SERVER.UTIL.MAX:"index cache"} |Average |
Manual close: Yes | |
Zabbix server: Excessive trends cache usage | Consider increasing |
max(/Zabbix server health/zabbix[wcache,trend,pused],10m)>{$ZABBIX.SERVER.UTIL.MAX:"trend cache"} |Average |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Zabbix proxy discovery | LLD rule with item and trigger prototypes for proxy discovery. |
Dependent item | zabbix.proxy.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Proxy [{#PROXY.NAME}]: Stats | The statistics for the discovered proxy. |
Dependent item | zabbix.proxy.stats[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Mode | The mode of Zabbix proxy. |
Dependent item | zabbix.proxy.mode[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Unencrypted | The encryption status for connections from a proxy. |
Dependent item | zabbix.proxy.unencrypted[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: PSK | The encryption status for connections from a proxy. |
Dependent item | zabbix.proxy.psk[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Certificate | The encryption status for connections from a proxy. |
Dependent item | zabbix.proxy.cert[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Compression | The compression status of a proxy. |
Dependent item | zabbix.proxy.compression[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Item count | The number of enabled items on enabled hosts assigned to a proxy. |
Dependent item | zabbix.proxy.items[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Host count | The number of enabled hosts assigned to a proxy. |
Dependent item | zabbix.proxy.hosts[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Version | A version of Zabbix proxy. |
Dependent item | zabbix.proxy.version[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Last seen, in seconds | The time when a proxy was last seen by a server. |
Dependent item | zabbix.proxy.last_seen[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Compatibility | Version of proxy compared to Zabbix server version. Possible values: 0 - Undefined; 1 - Current version (proxy and server have the same major version); 2 - Outdated version (proxy version is older than server version, but is partially supported); 3 - Unsupported version (proxy version is older than server previous LTS release version or server major version is older than proxy major version). |
Dependent item | zabbix.proxy.compatibility[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Required VPS | The required performance of a proxy (the number of values that need to be collected per second). |
Dependent item | zabbix.proxy.requiredperformance[{#PROXY.NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Zabbix server: Proxy [{#PROXY.NAME}]: Zabbix proxy last seen | Zabbix proxy is not updating the configuration data. |
last(/Zabbix server health/zabbix.proxy.last_seen[{#PROXY.NAME}],#1)>{$ZABBIX.PROXY.LAST_SEEN.MAX} |Warning |
||
Zabbix server: Proxy [{#PROXY.NAME}]: Zabbix proxy never seen | Zabbix proxy is not updating the configuration data. |
last(/Zabbix server health/zabbix.proxy.last_seen[{#PROXY.NAME}],#1)=-1 |Warning |
||
Zabbix server: Proxy [{#PROXY.NAME}]: Zabbix proxy is outdated | Zabbix proxy version is older than server version, but is partially supported. Only data collection and remote execution is available. |
last(/Zabbix server health/zabbix.proxy.compatibility[{#PROXY.NAME}],#1)=2 |Warning |
||
Zabbix server: Proxy [{#PROXY.NAME}]: Zabbix proxy is not supported | Zabbix proxy version is older than server previous LTS release version or server major version is older than proxy major version. |
last(/Zabbix server health/zabbix.proxy.compatibility[{#PROXY.NAME}],#1)=3 |High |
Name | Description | Type | Key and additional info |
---|---|---|---|
Zabbix proxy groups discovery | LLD rule with item and trigger prototypes for proxy groups discovery. |
Dependent item | zabbix.proxy.groups.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Proxy group [{#PROXY.GROUP.NAME}]: Stats | The statistics for the discovered proxy group. |
Dependent item | zabbix.proxy.group.stats[{#PROXY.GROUP.NAME}] Preprocessing
|
Proxy group [{#PROXY.GROUP.NAME}]: State | State of the Zabbix proxy group. Possible values: 0 - unknown; 1 - offline; 2 - recovering; 3 - online; 4 - degrading. |
Dependent item | zabbix.proxy.group.state[{#PROXY.GROUP.NAME}] Preprocessing
|
Proxy group [{#PROXY.GROUP.NAME}]: Available proxies | Count of available proxies in the Zabbix proxy group. |
Dependent item | zabbix.proxy.group.avail.proxies[{#PROXY.GROUP.NAME}] Preprocessing
|
Proxy group [{#PROXY.GROUP.NAME}]: Available proxies, in % | Percentage of available proxies in the Zabbix proxy group. |
Dependent item | zabbix.proxy.group.avail.proxies.percent[{#PROXY.GROUP.NAME}] Preprocessing
|
Proxy group [{#PROXY.GROUP.NAME}]: Settings | The settings for the discovered proxy group. |
Dependent item | zabbix.proxy.group.settings[{#PROXY.GROUP.NAME}] Preprocessing
|
Proxy group [{#PROXY.GROUP.NAME}]: Failover period | Failover period in the Zabbix proxy group. |
Dependent item | zabbix.proxy.group.failover[{#PROXY.GROUP.NAME}] Preprocessing
|
Proxy group [{#PROXY.GROUP.NAME}]: Minimum number of proxies | Minimum number of proxies online in the Zabbix proxy group. |
Dependent item | zabbix.proxy.group.online.min[{#PROXY.GROUP.NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Zabbix server: Proxy group [{#PROXY.GROUP.NAME}]: Status "offline" | The state of the Zabbix proxy group is "offline". |
last(/Zabbix server health/zabbix.proxy.group.state[{#PROXY.GROUP.NAME}],#1)=1 |High |
||
Zabbix server: Proxy group [{#PROXY.GROUP.NAME}]: Status "degrading" | The state of the Zabbix proxy group is "degrading". |
last(/Zabbix server health/zabbix.proxy.group.state[{#PROXY.GROUP.NAME}],#1)=4 |Average |
Depends on:
|
|
Zabbix server: Proxy group [{#PROXY.GROUP.NAME}]: Status changed | The state of the Zabbix proxy group has changed. Acknowledge to close the problem manually. |
last(/Zabbix server health/zabbix.proxy.group.state[{#PROXY.GROUP.NAME}],#1)<>last(/Zabbix server health/zabbix.proxy.group.state[{#PROXY.GROUP.NAME}],#2) and length(last(/Zabbix server health/zabbix.proxy.group.state[{#PROXY.GROUP.NAME}]))>0 |Info |
Manual close: Yes Depends on:
|
|
Zabbix server: Proxy group [{#PROXY.GROUP.NAME}]: Availability too low | The availability of proxies in a proxy group is below {$PROXY.GROUP.AVAIL.PERCENT.MIN}% for at least 3 minutes. |
max(/Zabbix server health/zabbix.proxy.group.avail.proxies.percent[{#PROXY.GROUP.NAME}],3m)<{$PROXY.GROUP.AVAIL.PERCENT.MIN} |Warning |
||
Zabbix server: Proxy group [{#PROXY.GROUP.NAME}]: Failover invalid value | Proxy group failover has an invalid value. |
last(/Zabbix server health/zabbix.proxy.group.failover[{#PROXY.GROUP.NAME}],#1)=-1 |Warning |
||
Zabbix server: Proxy group [{#PROXY.GROUP.NAME}]: Minimum number of proxies invalid value | Proxy group minimum number of proxies has an invalid value. |
last(/Zabbix server health/zabbix.proxy.group.online.min[{#PROXY.GROUP.NAME}],#1)=-1 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
High availability cluster node discovery | LLD rule with item and trigger prototypes for node discovery. |
Dependent item | zabbix.nodes.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Cluster node [{#NODE.NAME}]: Stats | Provides the statistics of a node. |
Dependent item | zabbix.node.stats[{#NODE.ID}] Preprocessing
|
Cluster node [{#NODE.NAME}]: Address | The IPv4 address of a node. |
Dependent item | zabbix.node.address[{#NODE.ID}] Preprocessing
|
Cluster node [{#NODE.NAME}]: Last access time | Last access time. |
Dependent item | zabbix.node.lastaccess.time[{#NODE.ID}] Preprocessing
|
Cluster node [{#NODE.NAME}]: Last access age | The time between the database's |
Dependent item | zabbix.node.lastaccess.age[{#NODE.ID}] Preprocessing
|
Cluster node [{#NODE.NAME}]: Status | The status of a node. |
Dependent item | zabbix.node.status[{#NODE.ID}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Zabbix server: Cluster node [{#NODE.NAME}]: Status changed | The state of the node has changed. Acknowledge to close the problem manually. |
last(/Zabbix server health/zabbix.node.status[{#NODE.ID}],#1)<>last(/Zabbix server health/zabbix.node.status[{#NODE.ID}],#2) |Info |
Manual close: Yes |
This template is designed to monitor internal Zabbix metrics on the remote Zabbix server.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Specify the address of the remote Zabbix server by changing the {$ZABBIX.SERVER.ADDRESS} and {$ZABBIX.SERVER.PORT} macros. Don't forget to adjust the StatsAllowedIP parameter in the remote server's configuration file to allow the collection of statistics.
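A minimal sketch of that change on the remote (monitored) Zabbix server; the configuration file path, the service name, and the collector IP address are examples and should be adjusted to your environment:

```bash
# On the remote Zabbix server: allow the collecting server (example IP) to query internal statistics,
# then restart the server so the new StatsAllowedIP value takes effect.
echo 'StatsAllowedIP=192.0.2.10' >> /etc/zabbix/zabbix_server.conf
systemctl restart zabbix-server
```

On the collecting side, set {$ZABBIX.SERVER.ADDRESS} and {$ZABBIX.SERVER.PORT} to the remote server's address and trapper port (10051 by default).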
Name | Description | Default |
---|---|---|
{$ZABBIX.SERVER.ADDRESS} | IP/DNS/network mask list of servers to be remotely queried (default is 127.0.0.1). | |
{$ZABBIX.SERVER.PORT} | Port of server to be remotely queried (default is 10051). | |
{$ZABBIX.PROXY.LAST_SEEN.MAX} | The maximum number of seconds that Zabbix proxy has not been seen. | 600 |
{$ZABBIX.SERVER.NODATA_TIMEOUT} | The time threshold after which statistics are considered unavailable. Used in trigger expressions. | 5m |
{$ZABBIX.SERVER.UTIL.MAX} | Default maximum threshold for percentage utilization triggers (use macro context for specification). | 75 |
{$ZABBIX.SERVER.UTIL.MIN} | Default minimum threshold for percentage utilization triggers (use macro context for specification). | 65 |
{$ZABBIX.SERVER.UTIL.MAX:"value cache"} | Maximum threshold for value cache utilization triggers. | 95 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Zabbix stats | The master item of Zabbix server statistics. |
Zabbix internal | zabbix[stats,{$ZABBIX.SERVER.ADDRESS},{$ZABBIX.SERVER.PORT}] |
Zabbix proxies stats | The master item of Zabbix proxies' statistics. |
Dependent item | zabbix.proxies.stats Preprocessing
|
Queue over 10 minutes | The number of monitored items in the queue that are delayed by at least 10 minutes. |
Zabbix internal | zabbix[stats,{$ZABBIX.SERVER.ADDRESS},{$ZABBIX.SERVER.PORT},queue,10m] Preprocessing
|
Queue | The number of monitored items in the queue that are delayed by at least 6 seconds. |
Zabbix internal | zabbix[stats,{$ZABBIX.SERVER.ADDRESS},{$ZABBIX.SERVER.PORT},queue] Preprocessing
|
Utilization of alert manager internal processes, in % | The average percentage of the time during which the alert manager processes have been busy for the last minute. |
Dependent item | process.alert_manager.avg.busy Preprocessing
|
Utilization of alert syncer internal processes, in % | The average percentage of the time during which the alert syncer processes have been busy for the last minute. |
Dependent item | process.alert_syncer.avg.busy Preprocessing
|
Utilization of alerter internal processes, in % | The average percentage of the time during which the alerter processes have been busy for the last minute. |
Dependent item | process.alerter.avg.busy Preprocessing
|
Utilization of availability manager internal processes, in % | The average percentage of the time during which the availability manager processes have been busy for the last minute. |
Dependent item | process.availability_manager.avg.busy Preprocessing
|
Utilization of configuration syncer internal processes, in % | The average percentage of the time during which the configuration syncer processes have been busy for the last minute. |
Dependent item | process.configuration_syncer.avg.busy Preprocessing
|
Utilization of configuration syncer worker internal processes, in % | The average percentage of the time during which the configuration syncer worker processes have been busy for the last minute. |
Dependent item | process.configuration_syncer_worker.avg.busy Preprocessing
|
Utilization of escalator internal processes, in % | The average percentage of the time during which the escalator processes have been busy for the last minute. |
Dependent item | process.escalator.avg.busy Preprocessing
|
Utilization of history poller internal processes, in % | The average percentage of the time during which the history poller processes have been busy for the last minute. |
Dependent item | process.history_poller.avg.busy Preprocessing
|
Utilization of ODBC poller data collector processes, in % | The average percentage of the time during which the ODBC poller processes have been busy for the last minute. |
Dependent item | process.odbc_poller.avg.busy Preprocessing
|
Utilization of history syncer internal processes, in % | The average percentage of the time during which the history syncer processes have been busy for the last minute. |
Dependent item | process.history_syncer.avg.busy Preprocessing
|
Utilization of housekeeper internal processes, in % | The average percentage of the time during which the housekeeper processes have been busy for the last minute. |
Dependent item | process.housekeeper.avg.busy Preprocessing
|
Utilization of http poller data collector processes, in % | The average percentage of the time during which the http poller processes have been busy for the last minute. |
Dependent item | process.http_poller.avg.busy Preprocessing
|
Utilization of icmp pinger data collector processes, in % | The average percentage of the time during which the icmp pinger processes have been busy for the last minute. |
Dependent item | process.icmp_pinger.avg.busy Preprocessing
|
Utilization of ipmi manager internal processes, in % | The average percentage of the time during which the ipmi manager processes have been busy for the last minute. |
Dependent item | process.ipmi_manager.avg.busy Preprocessing
|
Utilization of ipmi poller data collector processes, in % | The average percentage of the time during which the ipmi poller processes have been busy for the last minute. |
Dependent item | process.ipmi_poller.avg.busy Preprocessing
|
Utilization of java poller data collector processes, in % | The average percentage of the time during which the java poller processes have been busy for the last minute. |
Dependent item | process.java_poller.avg.busy Preprocessing
|
Utilization of LLD manager internal processes, in % | The average percentage of the time during which the LLD manager processes have been busy for the last minute. |
Dependent item | process.lld_manager.avg.busy Preprocessing
|
Utilization of LLD worker internal processes, in % | The average percentage of the time during which the LLD worker processes have been busy for the last minute. |
Dependent item | process.lld_worker.avg.busy Preprocessing
|
Utilization of connector manager internal processes, in % | The average percentage of the time during which the connector manager processes have been busy for the last minute. |
Dependent item | process.connector_manager.avg.busy Preprocessing
|
Utilization of connector worker internal processes, in % | The average percentage of the time during which the connector worker processes have been busy for the last minute. |
Dependent item | process.connector_worker.avg.busy Preprocessing
|
Utilization of discovery manager internal processes, in % | The average percentage of the time during which the discovery manager processes have been busy for the last minute. |
Dependent item | process.discovery_manager.avg.busy Preprocessing
|
Utilization of discovery worker internal processes, in % | The average percentage of the time during which the discovery worker processes have been busy for the last minute. |
Dependent item | process.discovery_worker.avg.busy Preprocessing
|
Utilization of poller data collector processes, in % | The average percentage of the time during which the poller processes have been busy for the last minute. |
Dependent item | process.poller.avg.busy Preprocessing
|
Utilization of preprocessing worker internal processes, in % | The average percentage of the time during which the preprocessing worker processes have been busy for the last minute. |
Dependent item | process.preprocessing_worker.avg.busy Preprocessing
|
Utilization of preprocessing manager internal processes, in % | The average percentage of the time during which the preprocessing manager processes have been busy for the last minute. |
Dependent item | process.preprocessing_manager.avg.busy Preprocessing
|
Utilization of proxy poller data collector processes, in % | The average percentage of the time during which the proxy poller processes have been busy for the last minute. |
Dependent item | process.proxy_poller.avg.busy Preprocessing
|
Utilization of proxy group manager internal processes, in % | The average percentage of the time during which the proxy group manager processes have been busy for the last minute. |
Dependent item | process.proxy_group_manager.avg.busy Preprocessing
|
Utilization of report manager internal processes, in % | The average percentage of the time during which the report manager processes have been busy for the last minute. |
Dependent item | process.report_manager.avg.busy Preprocessing
|
Utilization of report writer internal processes, in % | The average percentage of the time during which the report writer processes have been busy for the last minute. |
Dependent item | process.report_writer.avg.busy Preprocessing
|
Utilization of self-monitoring internal processes, in % | The average percentage of the time during which the self-monitoring processes have been busy for the last minute. |
Dependent item | process.self-monitoring.avg.busy Preprocessing
|
Utilization of snmp trapper data collector processes, in % | The average percentage of the time during which the snmp trapper processes have been busy for the last minute. |
Dependent item | process.snmp_trapper.avg.busy Preprocessing
|
Utilization of task manager internal processes, in % | The average percentage of the time during which the task manager processes have been busy for the last minute. |
Dependent item | process.task_manager.avg.busy Preprocessing
|
Utilization of timer internal processes, in % | The average percentage of the time during which the timer processes have been busy for the last minute. |
Dependent item | process.timer.avg.busy Preprocessing
|
Utilization of service manager internal processes, in % | The average percentage of the time during which the service manager processes have been busy for the last minute. |
Dependent item | process.service_manager.avg.busy Preprocessing
|
Utilization of trigger housekeeper internal processes, in % | The average percentage of the time during which the trigger housekeeper processes have been busy for the last minute. |
Dependent item | process.trigger_housekeeper.avg.busy Preprocessing
|
Utilization of trapper data collector processes, in % | The average percentage of the time during which the trapper processes have been busy for the last minute. |
Dependent item | process.trapper.avg.busy Preprocessing
|
Utilization of unreachable poller data collector processes, in % | The average percentage of the time during which the unreachable poller processes have been busy for the last minute. |
Dependent item | process.unreachable_poller.avg.busy Preprocessing
|
Utilization of vmware collector data collector processes, in % | The average percentage of the time during which the vmware collector processes have been busy for the last minute. |
Dependent item | process.vmware_collector.avg.busy Preprocessing
|
Utilization of agent poller data collector processes, in % | The average percentage of the time during which the agent poller processes have been busy for the last minute. |
Dependent item | process.agent_poller.avg.busy Preprocessing
|
Utilization of http agent poller data collector processes, in % | The average percentage of the time during which the http agent poller processes have been busy for the last minute. |
Dependent item | process.http_agent_poller.avg.busy Preprocessing
|
Utilization of snmp poller data collector processes, in % | The average percentage of the time during which the snmp poller processes have been busy for the last minute. |
Dependent item | process.snmp_poller.avg.busy Preprocessing
|
Utilization of internal poller data collector processes, in % | The average percentage of the time during which the internal poller processes have been busy for the last minute. |
Dependent item | process.internal_poller.avg.busy Preprocessing
|
Utilization of browser poller data collector processes, in % | The average percentage of the time during which the browser poller processes have been busy for the last minute. |
Dependent item | process.browser_poller.avg.busy Preprocessing
|
Configuration cache, % used | The availability statistics of Zabbix configuration cache. The percentage of used data buffer. |
Dependent item | rcache.buffer.pused Preprocessing
|
Trend function cache, % of unique requests | The effectiveness statistics of the Zabbix trend function cache. The percentage of cached items calculated from the sum of the cached items plus requests. A low percentage most likely means that the cache size can be reduced. |
Dependent item | tcache.pitems Preprocessing
|
Trend function cache, % of misses | The effectiveness statistics of the Zabbix trend function cache. The percentage of cache misses. |
Dependent item | tcache.pmisses Preprocessing
|
Value cache, % used | The availability statistics of Zabbix value cache. The percentage of used data buffer. |
Dependent item | vcache.buffer.pused Preprocessing
|
Value cache hits | The effectiveness statistics of Zabbix value cache. The number of cache hits (history values taken from the cache). |
Dependent item | vcache.cache.hits Preprocessing
|
Value cache misses | The effectiveness statistics of Zabbix value cache. The number of cache misses (history values taken from the database). |
Dependent item | vcache.cache.misses Preprocessing
|
Value cache operating mode | The operating mode of the value cache. |
Dependent item | vcache.cache.mode Preprocessing
|
Zabbix server check | Flag indicating whether it is a server or not. |
Dependent item | server_check Preprocessing
|
Version | The version of Zabbix server. |
Dependent item | version Preprocessing
|
VMware cache, % used | The availability statistics of Zabbix vmware cache. The percentage of used data buffer. |
Dependent item | vmware.buffer.pused Preprocessing
|
History write cache, % used | The statistics and availability of Zabbix write cache. The percentage of used history buffer. The history cache is used to store item values. A high number indicates database performance problems. |
Dependent item | wcache.history.pused Preprocessing
|
History index cache, % used | The statistics and availability of Zabbix write cache. The percentage of used history index buffer. The history index cache is used to index values stored in the history cache. |
Dependent item | wcache.index.pused Preprocessing
|
Trend write cache, % used | The statistics and availability of Zabbix write cache. The percentage of used trend buffer. The trend cache stores the aggregate of all items that have received data for the current hour. |
Dependent item | wcache.trend.pused Preprocessing
|
Number of processed values per second | The statistics and availability of Zabbix write cache. The total number of values processed by Zabbix server or Zabbix proxy, except unsupported items. |
Dependent item | wcache.values Preprocessing
|
Number of processed numeric (unsigned) values per second | The statistics and availability of Zabbix write cache. The number of processed numeric (unsigned) values. |
Dependent item | wcache.values.uint Preprocessing
|
Number of processed numeric (float) values per second | The statistics and availability of Zabbix write cache. The number of processed numeric (float) values. |
Dependent item | wcache.values.float Preprocessing
|
Number of processed log values per second | The statistics and availability of Zabbix write cache. The number of processed log values. |
Dependent item | wcache.values.log Preprocessing
|
Number of processed not supported values per second | The statistics and availability of Zabbix write cache. The number of times the item processing resulted in an item becoming unsupported or remaining in that state. |
Dependent item | wcache.values.not_supported Preprocessing
|
Number of processed character values per second | The statistics and availability of Zabbix write cache. The number of processed character values. |
Dependent item | wcache.values.str Preprocessing
|
Number of processed text values per second | The statistics and availability of Zabbix write cache. The number of processed text values. |
Dependent item | wcache.values.text Preprocessing
|
Number of values synchronized with the database per second | Average quantity of values written to the database, recalculated once per minute. |
Dependent item | vps.written Preprocessing
|
LLD queue | The number of values enqueued in the low-level discovery processing queue. |
Dependent item | lld_queue Preprocessing
|
Preprocessing queue | The number of values enqueued in the preprocessing queue. |
Dependent item | preprocessing_queue Preprocessing
|
Preprocessing throughput | Reflects the throughput of value preprocessing. |
Dependent item | preprocessing_throughput Preprocessing
|
Connector queue | The count of values enqueued in the connector queue. |
Dependent item | connector_queue Preprocessing
|
Discovery queue | The count of values enqueued in the discovery queue. |
Dependent item | discovery_queue Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Zabbix server: More than 100 items have been missing data for over 10 minutes | Indicates potential issues with network connectivity, agent failures, or unresponsive monitored resources that require attention. |
min(/Remote Zabbix server health/zabbix[stats,{$ZABBIX.SERVER.ADDRESS},{$ZABBIX.SERVER.PORT},queue,10m],10m)>100 |Warning |
Manual close: Yes | |
Zabbix server: Utilization of alert manager processes is high | Indicates potential performance issues with the alert manager, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.alert_manager.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"alert manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of alert syncer processes is high | Indicates potential performance issues with the alert syncer, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.alert_syncer.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"alert syncer"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of alerter processes is high | Indicates potential performance issues with the alerter, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.alerter.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"alerter"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of availability manager processes is high | Indicates potential performance issues with the availability manager, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.availability_manager.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"availability manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of configuration syncer processes is high | Indicates potential performance issues with the configuration syncer, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.configuration_syncer.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"configuration syncer"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of configuration syncer worker processes is high | Indicates potential performance issues with the configuration syncer worker, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.configuration_syncer_worker.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"configuration syncer worker"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of escalator processes is high | Indicates potential performance issues with the escalator, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.escalator.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"escalator"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of history poller processes is high | Indicates potential performance issues with the history poller, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.history_poller.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"history poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of ODBC poller processes is high | Indicates potential performance issues with the ODBC poller, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.odbc_poller.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"ODBC poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of history syncer processes is high | Indicates potential performance issues with the history syncer, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.history_syncer.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"history syncer"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of housekeeper processes is high | Indicates potential performance issues with the housekeeper, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.housekeeper.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"housekeeper"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of http poller processes is high | Indicates potential performance issues with the http poller, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.http_poller.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"http poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of icmp pinger processes is high | Indicates potential performance issues with the icmp pinger, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.icmp_pinger.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"icmp pinger"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of ipmi manager processes is high | Indicates potential performance issues with the ipmi manager, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.ipmi_manager.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"ipmi manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of ipmi poller processes is high | Indicates potential performance issues with the ipmi poller, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.ipmi_poller.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"ipmi poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of java poller processes is high | Indicates potential performance issues with the java poller, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.java_poller.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"java poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of LLD manager processes is high | Indicates potential performance issues with the LLD manager, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.lld_manager.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"LLD manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of LLD worker processes is high | Indicates potential performance issues with the LLD worker, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.lld_worker.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"LLD worker"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of connector manager processes is high | Indicates potential performance issues with the connector manager, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.connector_manager.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"connector manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of connector worker processes is high | Indicates potential performance issues with the connector worker, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.connector_worker.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"connector worker"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of discovery manager processes is high | Indicates potential performance issues with the discovery manager, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.discovery_manager.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"discovery manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of discovery worker processes is high | Indicates potential performance issues with the discovery worker, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.discovery_worker.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"discovery worker"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of poller processes is high | Indicates potential performance issues with the poller, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.poller.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of preprocessing worker processes is high | Indicates potential performance issues with the preprocessing worker, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.preprocessing_worker.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"preprocessing worker"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of preprocessing manager processes is high | Indicates potential performance issues with the preprocessing manager, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.preprocessing_manager.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"preprocessing manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of proxy poller processes is high | Indicates potential performance issues with the proxy poller, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.proxy_poller.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"proxy poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of proxy group manager processes is high | Indicates potential performance issues with the proxy group manager, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.proxy_group_manager.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"proxy group manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of report manager processes is high | Indicates potential performance issues with the report manager, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.report_manager.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"report manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of report writer processes is high | Indicates potential performance issues with the report writer, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.report_writer.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"report writer"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of self-monitoring processes is high | Indicates potential performance issues with the self-monitoring, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.self-monitoring.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"self-monitoring"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of snmp trapper processes is high | Indicates potential performance issues with the snmp trapper, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.snmp_trapper.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"snmp trapper"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of task manager processes is high | Indicates potential performance issues with the task manager, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.task_manager.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"task manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of timer processes is high | Indicates potential performance issues with the timer, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.timer.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"timer"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of service manager processes is high | Indicates potential performance issues with the service manager, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.service_manager.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"service manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of trigger housekeeper processes is high | Indicates potential performance issues with the trigger housekeeper, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.trigger_housekeeper.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"trigger housekeeper"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of trapper processes is high | Indicates potential performance issues with the trapper, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.trapper.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"trapper"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of unreachable poller processes is high | Indicates potential performance issues with the unreachable poller, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.unreachable_poller.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"unreachable poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of vmware collector processes is high | Indicates potential performance issues with the vmware collector, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.vmware_collector.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"vmware collector"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of agent poller processes is high | Indicates potential performance issues with the agent poller, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.agent_poller.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"agent poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of http agent poller processes is high | Indicates potential performance issues with the http agent poller, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.http_agent_poller.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"http agent poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of snmp poller processes is high | Indicates potential performance issues with the snmp poller, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.snmp_poller.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"snmp poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of internal poller processes is high | Indicates potential performance issues with the internal poller, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.internal_poller.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"internal poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of browser poller processes is high | Indicates potential performance issues with the browser poller, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix server health/process.browser_poller.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"browser poller"} |Average |
Manual close: Yes | |
Zabbix server: Excessive configuration cache usage | Consider increasing CacheSize in the zabbix_server.conf configuration file. |
max(/Remote Zabbix server health/rcache.buffer.pused,10m)>{$ZABBIX.SERVER.UTIL.MAX:"configuration cache"} |Average |
Manual close: Yes | |
Zabbix server: Failed to fetch stats data | Zabbix has not received statistics data for {$ZABBIX.SERVER.NODATA_TIMEOUT}. |
nodata(/Remote Zabbix server health/rcache.buffer.pused,{$ZABBIX.SERVER.NODATA_TIMEOUT})=1 |Warning |
||
Zabbix server: Excessive value cache usage | Consider increasing ValueCacheSize in the zabbix_server.conf configuration file. |
max(/Remote Zabbix server health/vcache.buffer.pused,10m)>{$ZABBIX.SERVER.UTIL.MAX:"value cache"} |Average |
Manual close: Yes | |
Zabbix server: Zabbix value cache working in low-memory mode | Once low-memory mode has been switched on, the value cache will remain in this state for 24 hours, even if the problem that triggered this mode is resolved sooner. |
last(/Remote Zabbix server health/vcache.cache.mode)=1 |High |
Manual close: Yes | |
Zabbix server: Wrong template assigned | Check that the template has been selected correctly. |
last(/Remote Zabbix server health/server_check)=0 |Disaster |
Manual close: Yes | |
Zabbix server: Version has changed | Zabbix server version has changed. Acknowledge to close the problem manually. |
last(/Remote Zabbix server health/version,#1)<>last(/Remote Zabbix server health/version,#2) and length(last(/Remote Zabbix server health/version))>0 |Info |
Manual close: Yes | |
Zabbix server: Excessive vmware cache usage | Consider increasing VMwareCacheSize in the zabbix_server.conf configuration file. |
max(/Remote Zabbix server health/vmware.buffer.pused,10m)>{$ZABBIX.SERVER.UTIL.MAX:"vmware cache"} |Average |
Manual close: Yes | |
Zabbix server: Excessive history cache usage | Consider increasing HistoryCacheSize in the zabbix_server.conf configuration file. |
max(/Remote Zabbix server health/wcache.history.pused,10m)>{$ZABBIX.SERVER.UTIL.MAX:"history cache"} |Average |
Manual close: Yes | |
Zabbix server: Excessive history index cache usage | Consider increasing HistoryIndexCacheSize in the zabbix_server.conf configuration file. |
max(/Remote Zabbix server health/wcache.index.pused,10m)>{$ZABBIX.SERVER.UTIL.MAX:"index cache"} |Average |
Manual close: Yes | |
Zabbix server: Excessive trends cache usage | Consider increasing TrendCacheSize in the zabbix_server.conf configuration file. |
max(/Remote Zabbix server health/wcache.trend.pused,10m)>{$ZABBIX.SERVER.UTIL.MAX:"trend cache"} |Average |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Zabbix proxy discovery | LLD rule with item and trigger prototypes for proxy discovery. |
Dependent item | zabbix.proxy.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Proxy [{#PROXY.NAME}]: Stats | The statistics for the discovered proxy. |
Dependent item | zabbix.proxy.stats[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Mode | The mode of Zabbix proxy. |
Dependent item | zabbix.proxy.mode[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Unencrypted | The encryption status for connections from a proxy. |
Dependent item | zabbix.proxy.unencrypted[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: PSK | The encryption status for connections from a proxy. |
Dependent item | zabbix.proxy.psk[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Certificate | The encryption status for connections from a proxy. |
Dependent item | zabbix.proxy.cert[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Compression | The compression status of a proxy. |
Dependent item | zabbix.proxy.compression[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Item count | The number of enabled items on enabled hosts assigned to a proxy. |
Dependent item | zabbix.proxy.items[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Host count | The number of enabled hosts assigned to a proxy. |
Dependent item | zabbix.proxy.hosts[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Version | The version of Zabbix proxy. |
Dependent item | zabbix.proxy.version[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Last seen, in seconds | The time when a proxy was last seen by a server. |
Dependent item | zabbix.proxy.last_seen[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Compatibility | Version of proxy compared to Zabbix server version. Possible values: 0 - Undefined; 1 - Current version (proxy and server have the same major version); 2 - Outdated version (proxy version is older than server version, but is partially supported); 3 - Unsupported version (proxy version is older than server previous LTS release version or server major version is older than proxy major version). |
Dependent item | zabbix.proxy.compatibility[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Required VPS | The required performance of a proxy (the number of values that need to be collected per second). |
Dependent item | zabbix.proxy.requiredperformance[{#PROXY.NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Zabbix server: Proxy [{#PROXY.NAME}]: Zabbix proxy last seen | Zabbix proxy is not updating the configuration data. |
last(/Remote Zabbix server health/zabbix.proxy.last_seen[{#PROXY.NAME}],#1)>{$ZABBIX.PROXY.LAST_SEEN.MAX} |Warning |
||
Zabbix server: Proxy [{#PROXY.NAME}]: Zabbix proxy never seen | Zabbix proxy is not updating the configuration data. |
last(/Remote Zabbix server health/zabbix.proxy.last_seen[{#PROXY.NAME}],#1)=-1 |Warning |
||
Zabbix server: Proxy [{#PROXY.NAME}]: Zabbix proxy is outdated | Zabbix proxy version is older than server version, but is partially supported. Only data collection and remote execution is available. |
last(/Remote Zabbix server health/zabbix.proxy.compatibility[{#PROXY.NAME}],#1)=2 |Warning |
||
Zabbix server: Proxy [{#PROXY.NAME}]: Zabbix proxy is not supported | Zabbix proxy version is older than server previous LTS release version or server major version is older than proxy major version. |
last(/Remote Zabbix server health/zabbix.proxy.compatibility[{#PROXY.NAME}],#1)=3 |High |
Name | Description | Type | Key and additional info |
---|---|---|---|
High availability cluster node discovery | LLD rule with item and trigger prototypes for node discovery. |
Dependent item | zabbix.nodes.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Cluster node [{#NODE.NAME}]: Stats | Provides the statistics of a node. |
Dependent item | zabbix.node.stats[{#NODE.ID}] Preprocessing
|
Cluster node [{#NODE.NAME}]: Address | The IPv4 address of a node. |
Dependent item | zabbix.node.address[{#NODE.ID}] Preprocessing
|
Cluster node [{#NODE.NAME}]: Last access time | Last access time. |
Dependent item | zabbix.node.lastaccess.time[{#NODE.ID}] Preprocessing
|
Cluster node [{#NODE.NAME}]: Last access age | The time between the database's unix_timestamp() and the last access time. |
Dependent item | zabbix.node.lastaccess.age[{#NODE.ID}] Preprocessing
|
Cluster node [{#NODE.NAME}]: Status | The status of a node. |
Dependent item | zabbix.nodes.status[{#NODE.ID}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Zabbix server: Cluster node [{#NODE.NAME}]: Status changed | The state of the node has changed. Acknowledge to close the problem manually. |
last(/Remote Zabbix server health/zabbix.nodes.status[{#NODE.ID}],#1)<>last(/Remote Zabbix server health/zabbix.nodes.status[{#NODE.ID}],#2) |Info |
Manual close: Yes |
This template is designed to monitor Zabbix server metrics via the passive Zabbix agent.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Specify the address of the remote Zabbix server by changing the {$ZABBIX.SERVER.ADDRESS}
and {$ZABBIX.SERVER.PORT}
macros. Don't forget to adjust the StatsAllowedIP
parameter in the remote server's configuration file to allow the collection of statistics.
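Since this template relies on the passive Zabbix agent, the master item can be verified from the command line before the template is linked. A hedged example with illustrative addresses (the key parameters should match the {$ZABBIX.SERVER.ADDRESS} and {$ZABBIX.SERVER.PORT} macro values):

```
# Ask the agent on the Zabbix server host for the internal statistics
zabbix_get -s 192.0.2.20 -k 'zabbix.stats[127.0.0.1,10051]'
```

A successful call returns a JSON document with the internal statistics; an error usually indicates that StatsAllowedIP or the agent configuration still needs adjustment.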
Name | Description | Default |
---|---|---|
{$ZABBIX.SERVER.ADDRESS} | IP/DNS/network mask list of servers to be remotely queried (default is 127.0.0.1). |
|
{$ZABBIX.SERVER.PORT} | Port of server to be remotely queried (default is 10051). |
|
{$ZABBIX.PROXY.LAST_SEEN.MAX} | The maximum number of seconds that Zabbix proxy has not been seen. |
600 |
{$ZABBIX.SERVER.NODATA_TIMEOUT} | The time threshold after which statistics are considered unavailable. Used in trigger expressions. |
5m |
{$ZABBIX.SERVER.UTIL.MAX} | Default maximum threshold for percentage utilization triggers (use macro context for specification). |
75 |
{$ZABBIX.SERVER.UTIL.MIN} | Default minimum threshold for percentage utilization triggers (use macro context for specification). |
65 |
{$ZABBIX.SERVER.UTIL.MAX:"value cache"} | Maximum threshold for value cache utilization triggers. |
95 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Zabbix stats | The master item of Zabbix server statistics. |
Zabbix agent | zabbix.stats[{$ZABBIX.SERVER.ADDRESS},{$ZABBIX.SERVER.PORT}] |
Zabbix proxies stats | The master item of Zabbix proxies' statistics. |
Dependent item | zabbix.proxies.stats Preprocessing
|
Queue over 10 minutes | The number of monitored items in the queue that are delayed by at least 10 minutes. |
Zabbix agent | zabbix.stats[{$ZABBIX.SERVER.ADDRESS},{$ZABBIX.SERVER.PORT},queue,10m] Preprocessing
|
Queue | The number of monitored items in the queue that are delayed by at least 6 seconds. |
Zabbix agent | zabbix.stats[{$ZABBIX.SERVER.ADDRESS},{$ZABBIX.SERVER.PORT},queue] Preprocessing
|
Utilization of alert manager internal processes, in % | The average percentage of the time during which the alert manager processes have been busy for the last minute. |
Dependent item | process.alert_manager.avg.busy Preprocessing
|
Utilization of alert syncer internal processes, in % | The average percentage of the time during which the alert syncer processes have been busy for the last minute. |
Dependent item | process.alert_syncer.avg.busy Preprocessing
|
Utilization of alerter internal processes, in % | The average percentage of the time during which the alerter processes have been busy for the last minute. |
Dependent item | process.alerter.avg.busy Preprocessing
|
Utilization of availability manager internal processes, in % | The average percentage of the time during which the availability manager processes have been busy for the last minute. |
Dependent item | process.availability_manager.avg.busy Preprocessing
|
Utilization of configuration syncer internal processes, in % | The average percentage of the time during which the configuration syncer processes have been busy for the last minute. |
Dependent item | process.configuration_syncer.avg.busy Preprocessing
|
Utilization of configuration syncer worker internal processes, in % | The average percentage of the time during which the configuration syncer worker processes have been busy for the last minute. |
Dependent item | process.configuration_syncer_worker.avg.busy Preprocessing
|
Utilization of escalator internal processes, in % | The average percentage of the time during which the escalator processes have been busy for the last minute. |
Dependent item | process.escalator.avg.busy Preprocessing
|
Utilization of history poller internal processes, in % | The average percentage of the time during which the history poller processes have been busy for the last minute. |
Dependent item | process.history_poller.avg.busy Preprocessing
|
Utilization of ODBC poller data collector processes, in % | The average percentage of the time during which the ODBC poller processes have been busy for the last minute. |
Dependent item | process.odbc_poller.avg.busy Preprocessing
|
Utilization of history syncer internal processes, in % | The average percentage of the time during which the history syncer processes have been busy for the last minute. |
Dependent item | process.history_syncer.avg.busy Preprocessing
|
Utilization of housekeeper internal processes, in % | The average percentage of the time during which the housekeeper processes have been busy for the last minute. |
Dependent item | process.housekeeper.avg.busy Preprocessing
|
Utilization of http poller data collector processes, in % | The average percentage of the time during which the http poller processes have been busy for the last minute. |
Dependent item | process.http_poller.avg.busy Preprocessing
|
Utilization of icmp pinger data collector processes, in % | The average percentage of the time during which the icmp pinger processes have been busy for the last minute. |
Dependent item | process.icmp_pinger.avg.busy Preprocessing
|
Utilization of ipmi manager internal processes, in % | The average percentage of the time during which the ipmi manager processes have been busy for the last minute. |
Dependent item | process.ipmi_manager.avg.busy Preprocessing
|
Utilization of ipmi poller data collector processes, in % | The average percentage of the time during which the ipmi poller processes have been busy for the last minute. |
Dependent item | process.ipmi_poller.avg.busy Preprocessing
|
Utilization of java poller data collector processes, in % | The average percentage of the time during which the java poller processes have been busy for the last minute. |
Dependent item | process.java_poller.avg.busy Preprocessing
|
Utilization of LLD manager internal processes, in % | The average percentage of the time during which the LLD manager processes have been busy for the last minute. |
Dependent item | process.lld_manager.avg.busy Preprocessing
|
Utilization of LLD worker internal processes, in % | The average percentage of the time during which the LLD worker processes have been busy for the last minute. |
Dependent item | process.lld_worker.avg.busy Preprocessing
|
Utilization of connector manager internal processes, in % | The average percentage of the time during which the connector manager processes have been busy for the last minute. |
Dependent item | process.connector_manager.avg.busy Preprocessing
|
Utilization of connector worker internal processes, in % | The average percentage of the time during which the connector worker processes have been busy for the last minute. |
Dependent item | process.connector_worker.avg.busy Preprocessing
|
Utilization of discovery manager internal processes, in % | The average percentage of the time during which the discovery manager processes have been busy for the last minute. |
Dependent item | process.discovery_manager.avg.busy Preprocessing
|
Utilization of discovery worker internal processes, in % | The average percentage of the time during which the discovery worker processes have been busy for the last minute. |
Dependent item | process.discovery_worker.avg.busy Preprocessing
|
Utilization of poller data collector processes, in % | The average percentage of the time during which the poller processes have been busy for the last minute. |
Dependent item | process.poller.avg.busy Preprocessing
|
Utilization of preprocessing worker internal processes, in % | The average percentage of the time during which the preprocessing worker processes have been busy for the last minute. |
Dependent item | process.preprocessing_worker.avg.busy Preprocessing
|
Utilization of preprocessing manager internal processes, in % | The average percentage of the time during which the preprocessing manager processes have been busy for the last minute. |
Dependent item | process.preprocessing_manager.avg.busy Preprocessing
|
Utilization of proxy poller data collector processes, in % | The average percentage of the time during which the proxy poller processes have been busy for the last minute. |
Dependent item | process.proxy_poller.avg.busy Preprocessing
|
Utilization of proxy group manager internal processes, in % | The average percentage of the time during which the proxy group manager processes have been busy for the last minute. |
Dependent item | process.proxy_group_manager.avg.busy Preprocessing
|
Utilization of report manager internal processes, in % | The average percentage of the time during which the report manager processes have been busy for the last minute. |
Dependent item | process.report_manager.avg.busy Preprocessing
|
Utilization of report writer internal processes, in % | The average percentage of the time during which the report writer processes have been busy for the last minute. |
Dependent item | process.report_writer.avg.busy Preprocessing
|
Utilization of self-monitoring internal processes, in % | The average percentage of the time during which the self-monitoring processes have been busy for the last minute. |
Dependent item | process.self-monitoring.avg.busy Preprocessing
|
Utilization of snmp trapper data collector processes, in % | The average percentage of the time during which the snmp trapper processes have been busy for the last minute. |
Dependent item | process.snmp_trapper.avg.busy Preprocessing
|
Utilization of task manager internal processes, in % | The average percentage of the time during which the task manager processes have been busy for the last minute. |
Dependent item | process.task_manager.avg.busy Preprocessing
|
Utilization of timer internal processes, in % | The average percentage of the time during which the timer processes have been busy for the last minute. |
Dependent item | process.timer.avg.busy Preprocessing
|
Utilization of service manager internal processes, in % | The average percentage of the time during which the service manager processes have been busy for the last minute. |
Dependent item | process.service_manager.avg.busy Preprocessing
|
Utilization of trigger housekeeper internal processes, in % | The average percentage of the time during which the trigger housekeeper processes have been busy for the last minute. |
Dependent item | process.trigger_housekeeper.avg.busy Preprocessing
|
Utilization of trapper data collector processes, in % | The average percentage of the time during which the trapper processes have been busy for the last minute. |
Dependent item | process.trapper.avg.busy Preprocessing
|
Utilization of unreachable poller data collector processes, in % | The average percentage of the time during which the unreachable poller processes have been busy for the last minute. |
Dependent item | process.unreachable_poller.avg.busy Preprocessing
|
Utilization of vmware collector data collector processes, in % | The average percentage of the time during which the vmware collector processes have been busy for the last minute. |
Dependent item | process.vmware_collector.avg.busy Preprocessing
|
Utilization of agent poller data collector processes, in % | The average percentage of the time during which the agent poller processes have been busy for the last minute. |
Dependent item | process.agent_poller.avg.busy Preprocessing
|
Utilization of http agent poller data collector processes, in % | The average percentage of the time during which the http agent poller processes have been busy for the last minute. |
Dependent item | process.http_agent_poller.avg.busy Preprocessing
|
Utilization of snmp poller data collector processes, in % | The average percentage of the time during which the snmp poller processes have been busy for the last minute. |
Dependent item | process.snmp_poller.avg.busy Preprocessing
|
Utilization of internal poller data collector processes, in % | The average percentage of the time during which the internal poller processes have been busy for the last minute. |
Dependent item | process.internal_poller.avg.busy Preprocessing
|
Utilization of browser poller data collector processes, in % | The average percentage of the time during which the browser poller processes have been busy for the last minute. |
Dependent item | process.browser_poller.avg.busy Preprocessing
|
Configuration cache, % used | The availability statistics of Zabbix configuration cache. The percentage of used data buffer. |
Dependent item | rcache.buffer.pused Preprocessing
|
Trend function cache, % of unique requests | The effectiveness statistics of the Zabbix trend function cache. The percentage of cached items calculated from the sum of the cached items plus requests. A low percentage most likely means that the cache size can be reduced. |
Dependent item | tcache.pitems Preprocessing
|
Trend function cache, % of misses | The effectiveness statistics of the Zabbix trend function cache. The percentage of cache misses. |
Dependent item | tcache.pmisses Preprocessing
|
Value cache, % used | The availability statistics of Zabbix value cache. The percentage of used data buffer. |
Dependent item | vcache.buffer.pused Preprocessing
|
Value cache hits | The effectiveness statistics of Zabbix value cache. The number of cache hits (history values taken from the cache). |
Dependent item | vcache.cache.hits Preprocessing
|
Value cache misses | The effectiveness statistics of Zabbix value cache. The number of cache misses (history values taken from the database). |
Dependent item | vcache.cache.misses Preprocessing
|
Value cache operating mode | The operating mode of the value cache. |
Dependent item | vcache.cache.mode Preprocessing
|
Zabbix server check | Flag indicating whether it is a server or not. |
Dependent item | server_check Preprocessing
|
Version | The version of Zabbix server. |
Dependent item | version Preprocessing
|
VMware cache, % used | The availability statistics of Zabbix vmware cache. The percentage of used data buffer. |
Dependent item | vmware.buffer.pused Preprocessing
|
History write cache, % used | The statistics and availability of Zabbix write cache. The percentage of used history buffer. The history cache is used to store item values. A high number indicates database performance problems. |
Dependent item | wcache.history.pused Preprocessing
|
History index cache, % used | The statistics and availability of Zabbix write cache. The percentage of used history index buffer. The history index cache is used to index values stored in the history cache. |
Dependent item | wcache.index.pused Preprocessing
|
Trend write cache, % used | The statistics and availability of Zabbix write cache. The percentage of used trend buffer. The trend cache stores the aggregate of all items that have received data for the current hour. |
Dependent item | wcache.trend.pused Preprocessing
|
Number of processed values per second | The statistics and availability of Zabbix write cache. The total number of values processed by Zabbix server or Zabbix proxy, except unsupported items. |
Dependent item | wcache.values Preprocessing
|
Number of processed numeric (unsigned) values per second | The statistics and availability of Zabbix write cache. The number of processed numeric (unsigned) values. |
Dependent item | wcache.values.uint Preprocessing
|
Number of processed numeric (float) values per second | The statistics and availability of Zabbix write cache. The number of processed numeric (float) values. |
Dependent item | wcache.values.float Preprocessing
|
Number of processed log values per second | The statistics and availability of Zabbix write cache. The number of processed log values. |
Dependent item | wcache.values.log Preprocessing
|
Number of processed not supported values per second | The statistics and availability of Zabbix write cache. The number of times the item processing resulted in an item becoming unsupported or remaining in that state. |
Dependent item | wcache.values.not_supported Preprocessing
|
Number of processed character values per second | The statistics and availability of Zabbix write cache. The number of processed character values. |
Dependent item | wcache.values.str Preprocessing
|
Number of processed text values per second | The statistics and availability of Zabbix write cache. The number of processed text values. |
Dependent item | wcache.values.text Preprocessing
|
Number of values synchronized with the database per second | Average quantity of values written to the database, recalculated once per minute. |
Dependent item | vps.written Preprocessing
|
LLD queue | The number of values enqueued in the low-level discovery processing queue. |
Dependent item | lld_queue Preprocessing
|
Preprocessing queue | The number of values enqueued in the preprocessing queue. |
Dependent item | preprocessing_queue Preprocessing
|
Preprocessing throughput | Reflects the throughput of value preprocessing. |
Dependent item | preprocessing_throughput Preprocessing
|
Connector queue | The count of values enqueued in the connector queue. |
Dependent item | connector_queue Preprocessing
|
Discovery queue | The count of values enqueued in the discovery queue. |
Dependent item | discovery_queue Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Zabbix server: More than 100 items have been missing data for over 10 minutes | Indicates potential issues with network connectivity, agent failures, or unresponsive monitored resources that require attention. |
min(/Zabbix server health by Zabbix agent/zabbix.stats[{$ZABBIX.SERVER.ADDRESS},{$ZABBIX.SERVER.PORT},queue,10m],10m)>100 |Warning |
Manual close: Yes | |
Zabbix server: Utilization of alert manager processes is high | Indicates potential performance issues with the alert manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.alert_manager.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"alert manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of alert syncer processes is high | Indicates potential performance issues with the alert syncer, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.alert_syncer.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"alert syncer"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of alerter processes is high | Indicates potential performance issues with the alerter, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.alerter.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"alerter"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of availability manager processes is high | Indicates potential performance issues with the availability manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.availability_manager.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"availability manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of configuration syncer processes is high | Indicates potential performance issues with the configuration syncer, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.configuration_syncer.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"configuration syncer"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of configuration syncer worker processes is high | Indicates potential performance issues with the configuration syncer worker, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.configuration_syncer_worker.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"configuration syncer worker"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of escalator processes is high | Indicates potential performance issues with the escalator, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.escalator.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"escalator"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of history poller processes is high | Indicates potential performance issues with the history poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.history_poller.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"history poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of ODBC poller processes is high | Indicates potential performance issues with the ODBC poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.odbc_poller.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"ODBC poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of history syncer processes is high | Indicates potential performance issues with the history syncer, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.history_syncer.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"history syncer"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of housekeeper processes is high | Indicates potential performance issues with the housekeeper, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.housekeeper.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"housekeeper"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of http poller processes is high | Indicates potential performance issues with the http poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.http_poller.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"http poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of icmp pinger processes is high | Indicates potential performance issues with the icmp pinger, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.icmp_pinger.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"icmp pinger"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of ipmi manager processes is high | Indicates potential performance issues with the ipmi manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.ipmi_manager.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"ipmi manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of ipmi poller processes is high | Indicates potential performance issues with the ipmi poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.ipmi_poller.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"ipmi poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of java poller processes is high | Indicates potential performance issues with the java poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.java_poller.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"java poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of LLD manager processes is high | Indicates potential performance issues with the LLD manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.lld_manager.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"LLD manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of LLD worker processes is high | Indicates potential performance issues with the LLD worker, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.lld_worker.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"LLD worker"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of connector manager processes is high | Indicates potential performance issues with the connector manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.connector_manager.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"connector manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of connector worker processes is high | Indicates potential performance issues with the connector worker, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.connector_worker.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"connector worker"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of discovery manager processes is high | Indicates potential performance issues with the discovery manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.discovery_manager.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"discovery manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of discovery worker processes is high | Indicates potential performance issues with the discovery worker, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.discovery_worker.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"discovery worker"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of poller processes is high | Indicates potential performance issues with the poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.poller.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of preprocessing worker processes is high | Indicates potential performance issues with the preprocessing worker, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.preprocessing_worker.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"preprocessing worker"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of preprocessing manager processes is high | Indicates potential performance issues with the preprocessing manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.preprocessing_manager.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"preprocessing manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of proxy poller processes is high | Indicates potential performance issues with the proxy poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.proxy_poller.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"proxy poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of proxy group manager processes is high | Indicates potential performance issues with the proxy group manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.proxy_group_manager.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"proxy group manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of report manager processes is high | Indicates potential performance issues with the report manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.report_manager.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"report manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of report writer processes is high | Indicates potential performance issues with the report writer, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.report_writer.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"report writer"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of self-monitoring processes is high | Indicates potential performance issues with the self-monitoring, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.self-monitoring.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"self-monitoring"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of snmp trapper processes is high | Indicates potential performance issues with the snmp trapper, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.snmp_trapper.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"snmp trapper"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of task manager processes is high | Indicates potential performance issues with the task manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.task_manager.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"task manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of timer processes is high | Indicates potential performance issues with the timer, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.timer.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"timer"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of service manager processes is high | Indicates potential performance issues with the service manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.service_manager.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"service manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of trigger housekeeper processes is high | Indicates potential performance issues with the trigger housekeeper, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.trigger_housekeeper.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"trigger housekeeper"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of trapper processes is high | Indicates potential performance issues with the trapper, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.trapper.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"trapper"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of unreachable poller processes is high | Indicates potential performance issues with the unreachable poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.unreachable_poller.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"unreachable poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of vmware collector processes is high | Indicates potential performance issues with the vmware collector, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.vmware_collector.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"vmware collector"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of agent poller processes is high | Indicates potential performance issues with the agent poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.agent_poller.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"agent poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of http agent poller processes is high | Indicates potential performance issues with the http agent poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.http_agent_poller.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"http agent poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of snmp poller processes is high | Indicates potential performance issues with the snmp poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.snmp_poller.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"snmp poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of internal poller processes is high | Indicates potential performance issues with the internal poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.internal_poller.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"internal poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of browser poller processes is high | Indicates potential performance issues with the browser poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent/process.browser_poller.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"browser poller"} |Average |
Manual close: Yes | |
Zabbix server: Excessive configuration cache usage | Consider increasing CacheSize in the zabbix_server.conf configuration file. |
max(/Zabbix server health by Zabbix agent/rcache.buffer.pused,10m)>{$ZABBIX.SERVER.UTIL.MAX:"configuration cache"} |Average |
Manual close: Yes | |
Zabbix server: Failed to fetch stats data | Zabbix has not received statistics data for {$ZABBIX.SERVER.NODATA_TIMEOUT}. |
nodata(/Zabbix server health by Zabbix agent/rcache.buffer.pused,{$ZABBIX.SERVER.NODATA_TIMEOUT})=1 |Warning |
||
Zabbix server: Excessive value cache usage | Consider increasing ValueCacheSize in the zabbix_server.conf configuration file. |
max(/Zabbix server health by Zabbix agent/vcache.buffer.pused,10m)>{$ZABBIX.SERVER.UTIL.MAX:"value cache"} |Average |
Manual close: Yes | |
Zabbix server: Zabbix value cache working in low-memory mode | Once low-memory mode has been switched on, the value cache will remain in this state for 24 hours, even if the problem that triggered this mode is resolved sooner. |
last(/Zabbix server health by Zabbix agent/vcache.cache.mode)=1 |High |
Manual close: Yes | |
Zabbix server: Wrong template assigned | Check that the template has been selected correctly. |
last(/Zabbix server health by Zabbix agent/server_check)=0 |Disaster |
Manual close: Yes | |
Zabbix server: Version has changed | Zabbix server version has changed. Acknowledge to close the problem manually. |
last(/Zabbix server health by Zabbix agent/version,#1)<>last(/Zabbix server health by Zabbix agent/version,#2) and length(last(/Zabbix server health by Zabbix agent/version))>0 |Info |
Manual close: Yes | |
Zabbix server: Excessive vmware cache usage | Consider increasing VMwareCacheSize in the zabbix_server.conf configuration file. |
max(/Zabbix server health by Zabbix agent/vmware.buffer.pused,10m)>{$ZABBIX.SERVER.UTIL.MAX:"vmware cache"} |Average |
Manual close: Yes | |
Zabbix server: Excessive history cache usage | Consider increasing HistoryCacheSize in the zabbix_server.conf configuration file. |
max(/Zabbix server health by Zabbix agent/wcache.history.pused,10m)>{$ZABBIX.SERVER.UTIL.MAX:"history cache"} |Average |
Manual close: Yes | |
Zabbix server: Excessive history index cache usage | Consider increasing HistoryIndexCacheSize in the zabbix_server.conf configuration file. |
max(/Zabbix server health by Zabbix agent/wcache.index.pused,10m)>{$ZABBIX.SERVER.UTIL.MAX:"index cache"} |Average |
Manual close: Yes | |
Zabbix server: Excessive trends cache usage | Consider increasing TrendCacheSize in the zabbix_server.conf configuration file. |
max(/Zabbix server health by Zabbix agent/wcache.trend.pused,10m)>{$ZABBIX.SERVER.UTIL.MAX:"trend cache"} |Average |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Zabbix proxy discovery | LLD rule with item and trigger prototypes for proxy discovery. |
Dependent item | zabbix.proxy.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Proxy [{#PROXY.NAME}]: Stats | The statistics for the discovered proxy. |
Dependent item | zabbix.proxy.stats[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Mode | The mode of Zabbix proxy. |
Dependent item | zabbix.proxy.mode[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Unencrypted | The encryption status for connections from a proxy. |
Dependent item | zabbix.proxy.unencrypted[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: PSK | The encryption status for connections from a proxy. |
Dependent item | zabbix.proxy.psk[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Certificate | The encryption status for connections from a proxy. |
Dependent item | zabbix.proxy.cert[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Compression | The compression status of a proxy. |
Dependent item | zabbix.proxy.compression[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Item count | The number of enabled items on enabled hosts assigned to a proxy. |
Dependent item | zabbix.proxy.items[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Host count | The number of enabled hosts assigned to a proxy. |
Dependent item | zabbix.proxy.hosts[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Version | The version of Zabbix proxy. |
Dependent item | zabbix.proxy.version[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Last seen, in seconds | The number of seconds since the proxy was last seen by the server. |
Dependent item | zabbix.proxy.last_seen[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Compatibility | Version of proxy compared to Zabbix server version. Possible values: 0 - Undefined; 1 - Current version (proxy and server have the same major version); 2 - Outdated version (proxy version is older than server version, but is partially supported); 3 - Unsupported version (proxy version is older than server previous LTS release version or server major version is older than proxy major version). |
Dependent item | zabbix.proxy.compatibility[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Required VPS | The required performance of a proxy (the number of values that need to be collected per second). |
Dependent item | zabbix.proxy.requiredperformance[{#PROXY.NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Zabbix server: Proxy [{#PROXY.NAME}]: Zabbix proxy last seen | Zabbix proxy is not updating the configuration data. |
last(/Zabbix server health by Zabbix agent/zabbix.proxy.last_seen[{#PROXY.NAME}],#1)>{$ZABBIX.PROXY.LAST_SEEN.MAX} |Warning |
||
Zabbix server: Proxy [{#PROXY.NAME}]: Zabbix proxy never seen | Zabbix proxy is not updating the configuration data. |
last(/Zabbix server health by Zabbix agent/zabbix.proxy.last_seen[{#PROXY.NAME}],#1)=-1 |Warning |
||
Zabbix server: Proxy [{#PROXY.NAME}]: Zabbix proxy is outdated | Zabbix proxy version is older than server version, but is partially supported. Only data collection and remote execution is available. |
last(/Zabbix server health by Zabbix agent/zabbix.proxy.compatibility[{#PROXY.NAME}],#1)=2 |Warning |
||
Zabbix server: Proxy [{#PROXY.NAME}]: Zabbix proxy is not supported | Zabbix proxy version is older than server previous LTS release version or server major version is older than proxy major version. |
last(/Zabbix server health by Zabbix agent/zabbix.proxy.compatibility[{#PROXY.NAME}],#1)=3 |High |
Name | Description | Type | Key and additional info |
---|---|---|---|
High availability cluster node discovery | LLD rule with item and trigger prototypes for node discovery. |
Dependent item | zabbix.nodes.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Cluster node [{#NODE.NAME}]: Stats | Provides the statistics of a node. |
Dependent item | zabbix.node.stats[{#NODE.ID}] Preprocessing
|
Cluster node [{#NODE.NAME}]: Address | The IPv4 address of a node. |
Dependent item | zabbix.node.address[{#NODE.ID}] Preprocessing
|
Cluster node [{#NODE.NAME}]: Last access time | Last access time. |
Dependent item | zabbix.node.lastaccess.time[{#NODE.ID}] Preprocessing
|
Cluster node [{#NODE.NAME}]: Last access age | The time between the database's unix_timestamp() and the last access time. |
Dependent item | zabbix.node.lastaccess.age[{#NODE.ID}] Preprocessing
|
Cluster node [{#NODE.NAME}]: Status | The status of a node. |
Dependent item | zabbix.nodes.status[{#NODE.ID}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Zabbix server: Cluster node [{#NODE.NAME}]: Status changed | The state of the node has changed. Acknowledge to close the problem manually. |
last(/Zabbix server health by Zabbix agent/zabbix.nodes.status[{#NODE.ID}],#1)<>last(/Zabbix server health by Zabbix agent/zabbix.nodes.status[{#NODE.ID}],#2) |Info |
Manual close: Yes |
This template is designed to monitor Zabbix server metrics via the active Zabbix agent.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Specify the address of the remote Zabbix server by changing the {$ZABBIX.SERVER.ADDRESS}
and {$ZABBIX.SERVER.PORT}
macros. Don't forget to adjust the StatsAllowedIP
parameter in the remote server's configuration file to allow the collection of statistics.
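A minimal sketch of these changes, assuming the monitored Zabbix server runs at the placeholder address 192.0.2.5 and the host with the active agent at 192.0.2.10 (both addresses are examples only):

```
# On the host linked to this template, set the macros:
#   {$ZABBIX.SERVER.ADDRESS} = 192.0.2.5
#   {$ZABBIX.SERVER.PORT}    = 10051

# zabbix_server.conf on the monitored Zabbix server (192.0.2.5):
StatsAllowedIP=127.0.0.1,192.0.2.10   # allow the agent host to request internal statistics
```

If passive checks are enabled on the agent, the master item key can also be tested manually before linking the template:

```
zabbix_get -s 192.0.2.10 -k 'zabbix.stats[192.0.2.5,10051]'
```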
Name | Description | Default |
---|---|---|
{$ZABBIX.SERVER.ADDRESS} | IP/DNS/network mask list of servers to be remotely queried (default is 127.0.0.1). |
127.0.0.1 |
{$ZABBIX.SERVER.PORT} | Port of server to be remotely queried (default is 10051). |
10051 |
{$ZABBIX.PROXY.LAST_SEEN.MAX} | The maximum number of seconds that Zabbix proxy has not been seen. |
600 |
{$ZABBIX.SERVER.NODATA_TIMEOUT} | The time threshold after which statistics are considered unavailable. Used in trigger expression. |
5m |
{$ZABBIX.SERVER.UTIL.MAX} | Default maximum threshold for percentage utilization triggers (use macro context for specification). |
75 |
{$ZABBIX.SERVER.UTIL.MIN} | Default minimum threshold for percentage utilization triggers (use macro context for specification). |
65 |
{$ZABBIX.SERVER.UTIL.MAX:"value cache"} | Maximum threshold for value cache utilization triggers. |
95 |
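The utilization and cache thresholds support user macro context, so a single process type or cache can be given its own limit while everything else keeps the default. An illustrative sketch (the value 90 is an arbitrary example, not a recommendation):

```
# Host-level user macros:
{$ZABBIX.SERVER.UTIL.MAX}                   = 75   # default used by the utilization and cache triggers
{$ZABBIX.SERVER.UTIL.MAX:"history syncer"}  = 90   # overrides only the history syncer trigger
{$ZABBIX.SERVER.UTIL.MAX:"value cache"}     = 95   # context value already provided by the template
```

With these values, the expression avg(.../process.history_syncer.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"history syncer"} resolves against 90, while triggers without a matching context fall back to 75.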
Name | Description | Type | Key and additional info |
---|---|---|---|
Zabbix stats | The master item of Zabbix server statistics. |
Zabbix agent (active) | zabbix.stats[{$ZABBIX.SERVER.ADDRESS},{$ZABBIX.SERVER.PORT}] |
Zabbix proxies stats | The master item of Zabbix proxies' statistics. |
Dependent item | zabbix.proxies.stats Preprocessing
|
Queue over 10 minutes | The number of monitored items in the queue that are delayed by at least 10 minutes. |
Zabbix agent (active) | zabbix.stats[{$ZABBIX.SERVER.ADDRESS},{$ZABBIX.SERVER.PORT},queue,10m] Preprocessing
|
Queue | The number of monitored items in the queue that are delayed by at least 6 seconds. |
Zabbix agent (active) | zabbix.stats[{$ZABBIX.SERVER.ADDRESS},{$ZABBIX.SERVER.PORT},queue] Preprocessing
|
Utilization of alert manager internal processes, in % | The average percentage of the time during which the alert manager processes have been busy for the last minute. |
Dependent item | process.alert_manager.avg.busy Preprocessing
|
Utilization of alert syncer internal processes, in % | The average percentage of the time during which the alert syncer processes have been busy for the last minute. |
Dependent item | process.alert_syncer.avg.busy Preprocessing
|
Utilization of alerter internal processes, in % | The average percentage of the time during which the alerter processes have been busy for the last minute. |
Dependent item | process.alerter.avg.busy Preprocessing
|
Utilization of availability manager internal processes, in % | The average percentage of the time during which the availability manager processes have been busy for the last minute. |
Dependent item | process.availability_manager.avg.busy Preprocessing
|
Utilization of configuration syncer internal processes, in % | The average percentage of the time during which the configuration syncer processes have been busy for the last minute. |
Dependent item | process.configuration_syncer.avg.busy Preprocessing
|
Utilization of configuration syncer worker internal processes, in % | The average percentage of the time during which the configuration syncer worker processes have been busy for the last minute. |
Dependent item | process.configuration_syncer_worker.avg.busy Preprocessing
|
Utilization of escalator internal processes, in % | The average percentage of the time during which the escalator processes have been busy for the last minute. |
Dependent item | process.escalator.avg.busy Preprocessing
|
Utilization of history poller internal processes, in % | The average percentage of the time during which the history poller processes have been busy for the last minute. |
Dependent item | process.history_poller.avg.busy Preprocessing
|
Utilization of ODBC poller data collector processes, in % | The average percentage of the time during which the ODBC poller processes have been busy for the last minute. |
Dependent item | process.odbc_poller.avg.busy Preprocessing
|
Utilization of history syncer internal processes, in % | The average percentage of the time during which the history syncer processes have been busy for the last minute. |
Dependent item | process.history_syncer.avg.busy Preprocessing
|
Utilization of housekeeper internal processes, in % | The average percentage of the time during which the housekeeper processes have been busy for the last minute. |
Dependent item | process.housekeeper.avg.busy Preprocessing
|
Utilization of http poller data collector processes, in % | The average percentage of the time during which the http poller processes have been busy for the last minute. |
Dependent item | process.http_poller.avg.busy Preprocessing
|
Utilization of icmp pinger data collector processes, in % | The average percentage of the time during which the icmp pinger processes have been busy for the last minute. |
Dependent item | process.icmp_pinger.avg.busy Preprocessing
|
Utilization of ipmi manager internal processes, in % | The average percentage of the time during which the ipmi manager processes have been busy for the last minute. |
Dependent item | process.ipmi_manager.avg.busy Preprocessing
|
Utilization of ipmi poller data collector processes, in % | The average percentage of the time during which the ipmi poller processes have been busy for the last minute. |
Dependent item | process.ipmi_poller.avg.busy Preprocessing
|
Utilization of java poller data collector processes, in % | The average percentage of the time during which the java poller processes have been busy for the last minute. |
Dependent item | process.java_poller.avg.busy Preprocessing
|
Utilization of LLD manager internal processes, in % | The average percentage of the time during which the LLD manager processes have been busy for the last minute. |
Dependent item | process.lld_manager.avg.busy Preprocessing
|
Utilization of LLD worker internal processes, in % | The average percentage of the time during which the LLD worker processes have been busy for the last minute. |
Dependent item | process.lld_worker.avg.busy Preprocessing
|
Utilization of connector manager internal processes, in % | The average percentage of the time during which the connector manager processes have been busy for the last minute. |
Dependent item | process.connector_manager.avg.busy Preprocessing
|
Utilization of connector worker internal processes, in % | The average percentage of the time during which the connector worker processes have been busy for the last minute. |
Dependent item | process.connector_worker.avg.busy Preprocessing
|
Utilization of discovery manager internal processes, in % | The average percentage of the time during which the discovery manager processes have been busy for the last minute. |
Dependent item | process.discovery_manager.avg.busy Preprocessing
|
Utilization of discovery worker internal processes, in % | The average percentage of the time during which the discovery worker processes have been busy for the last minute. |
Dependent item | process.discovery_worker.avg.busy Preprocessing
|
Utilization of poller data collector processes, in % | The average percentage of the time during which the poller processes have been busy for the last minute. |
Dependent item | process.poller.avg.busy Preprocessing
|
Utilization of preprocessing worker internal processes, in % | The average percentage of the time during which the preprocessing worker processes have been busy for the last minute. |
Dependent item | process.preprocessing_worker.avg.busy Preprocessing
|
Utilization of preprocessing manager internal processes, in % | The average percentage of the time during which the preprocessing manager processes have been busy for the last minute. |
Dependent item | process.preprocessing_manager.avg.busy Preprocessing
|
Utilization of proxy poller data collector processes, in % | The average percentage of the time during which the proxy poller processes have been busy for the last minute. |
Dependent item | process.proxy_poller.avg.busy Preprocessing
|
Utilization of proxy group manager internal processes, in % | The average percentage of the time during which the proxy group manager processes have been busy for the last minute. |
Dependent item | process.proxy_group_manager.avg.busy Preprocessing
|
Utilization of report manager internal processes, in % | The average percentage of the time during which the report manager processes have been busy for the last minute. |
Dependent item | process.report_manager.avg.busy Preprocessing
|
Utilization of report writer internal processes, in % | The average percentage of the time during which the report writer processes have been busy for the last minute. |
Dependent item | process.report_writer.avg.busy Preprocessing
|
Utilization of self-monitoring internal processes, in % | The average percentage of the time during which the self-monitoring processes have been busy for the last minute. |
Dependent item | process.self-monitoring.avg.busy Preprocessing
|
Utilization of snmp trapper data collector processes, in % | The average percentage of the time during which the snmp trapper processes have been busy for the last minute. |
Dependent item | process.snmp_trapper.avg.busy Preprocessing
|
Utilization of task manager internal processes, in % | The average percentage of the time during which the task manager processes have been busy for the last minute. |
Dependent item | process.task_manager.avg.busy Preprocessing
|
Utilization of timer internal processes, in % | The average percentage of the time during which the timer processes have been busy for the last minute. |
Dependent item | process.timer.avg.busy Preprocessing
|
Utilization of service manager internal processes, in % | The average percentage of the time during which the service manager processes have been busy for the last minute. |
Dependent item | process.service_manager.avg.busy Preprocessing
|
Utilization of trigger housekeeper internal processes, in % | The average percentage of the time during which the trigger housekeeper processes have been busy for the last minute. |
Dependent item | process.trigger_housekeeper.avg.busy Preprocessing
|
Utilization of trapper data collector processes, in % | The average percentage of the time during which the trapper processes have been busy for the last minute. |
Dependent item | process.trapper.avg.busy Preprocessing
|
Utilization of unreachable poller data collector processes, in % | The average percentage of the time during which the unreachable poller processes have been busy for the last minute. |
Dependent item | process.unreachable_poller.avg.busy Preprocessing
|
Utilization of vmware collector data collector processes, in % | The average percentage of the time during which the vmware collector processes have been busy for the last minute. |
Dependent item | process.vmware_collector.avg.busy Preprocessing
|
Utilization of agent poller data collector processes, in % | The average percentage of the time during which the agent poller processes have been busy for the last minute. |
Dependent item | process.agent_poller.avg.busy Preprocessing
|
Utilization of http agent poller data collector processes, in % | The average percentage of the time during which the http agent poller processes have been busy for the last minute. |
Dependent item | process.http_agent_poller.avg.busy Preprocessing
|
Utilization of snmp poller data collector processes, in % | The average percentage of the time during which the snmp poller processes have been busy for the last minute. |
Dependent item | process.snmp_poller.avg.busy Preprocessing
|
Utilization of internal poller data collector processes, in % | The average percentage of the time during which the internal poller processes have been busy for the last minute. |
Dependent item | process.internal_poller.avg.busy Preprocessing
|
Utilization of browser poller data collector processes, in % | The average percentage of the time during which the browser poller processes have been busy for the last minute. |
Dependent item | process.browser_poller.avg.busy Preprocessing
|
Configuration cache, % used | The availability statistics of Zabbix configuration cache. The percentage of used data buffer. |
Dependent item | rcache.buffer.pused Preprocessing
|
Trend function cache, % of unique requests | The effectiveness statistics of the Zabbix trend function cache. The percentage of cached items calculated from the sum of the cached items plus requests. A low percentage most likely means that the cache size can be reduced. |
Dependent item | tcache.pitems Preprocessing
|
Trend function cache, % of misses | The effectiveness statistics of the Zabbix trend function cache. The percentage of cache misses. |
Dependent item | tcache.pmisses Preprocessing
|
Value cache, % used | The availability statistics of Zabbix value cache. The percentage of used data buffer. |
Dependent item | vcache.buffer.pused Preprocessing
|
Value cache hits | The effectiveness statistics of Zabbix value cache. The number of cache hits (history values taken from the cache). |
Dependent item | vcache.cache.hits Preprocessing
|
Value cache misses | The effectiveness statistics of Zabbix value cache. The number of cache misses (history values taken from the database). |
Dependent item | vcache.cache.misses Preprocessing
|
Value cache operating mode | The operating mode of the value cache. |
Dependent item | vcache.cache.mode Preprocessing
|
Zabbix server check | Flag indicating whether the monitored instance is a Zabbix server. |
Dependent item | server_check Preprocessing
|
Version | The version of Zabbix server. |
Dependent item | version Preprocessing
|
VMware cache, % used | The availability statistics of Zabbix vmware cache. The percentage of used data buffer. |
Dependent item | vmware.buffer.pused Preprocessing
|
History write cache, % used | The statistics and availability of Zabbix write cache. The percentage of used history buffer. The history cache is used to store item values. A high number indicates database performance problems. |
Dependent item | wcache.history.pused Preprocessing
|
History index cache, % used | The statistics and availability of Zabbix write cache. The percentage of used history index buffer. The history index cache is used to index values stored in the history cache. |
Dependent item | wcache.index.pused Preprocessing
|
Trend write cache, % used | The statistics and availability of Zabbix write cache. The percentage of used trend buffer. The trend cache stores the aggregate of all items that have received data for the current hour. |
Dependent item | wcache.trend.pused Preprocessing
|
Number of processed values per second | The statistics and availability of Zabbix write cache. The total number of values processed by Zabbix server or Zabbix proxy, except unsupported items. |
Dependent item | wcache.values Preprocessing
|
Number of processed numeric (unsigned) values per second | The statistics and availability of Zabbix write cache. The number of processed numeric (unsigned) values. |
Dependent item | wcache.values.uint Preprocessing
|
Number of processed numeric (float) values per second | The statistics and availability of Zabbix write cache. The number of processed numeric (float) values. |
Dependent item | wcache.values.float Preprocessing
|
Number of processed log values per second | The statistics and availability of Zabbix write cache. The number of processed log values. |
Dependent item | wcache.values.log Preprocessing
|
Number of processed not supported values per second | The statistics and availability of Zabbix write cache. The number of times the item processing resulted in an item becoming unsupported or remaining in that state. |
Dependent item | wcache.values.not_supported Preprocessing
|
Number of processed character values per second | The statistics and availability of Zabbix write cache. The number of processed character values. |
Dependent item | wcache.values.str Preprocessing
|
Number of processed text values per second | The statistics and availability of Zabbix write cache. The number of processed text values. |
Dependent item | wcache.values.text Preprocessing
|
Number of values synchronized with the database per second | Average quantity of values written to the database, recalculated once per minute. |
Dependent item | vps.written Preprocessing
|
LLD queue | The number of values enqueued in the low-level discovery processing queue. |
Dependent item | lld_queue Preprocessing
|
Preprocessing queue | The number of values enqueued in the preprocessing queue. |
Dependent item | preprocessing_queue Preprocessing
|
Preprocessing throughput | Reflects the throughput of the preprocessing. |
Dependent item | preprocessing_throughput Preprocessing
|
Connector queue | The count of values enqueued in the connector queue. |
Dependent item | connector_queue Preprocessing
|
Discovery queue | The count of values enqueued in the discovery queue. |
Dependent item | discovery_queue Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Zabbix server: More than 100 items have been missing data for over 10 minutes | Indicates potential issues with network connectivity, agent failures, or unresponsive monitored resources that require attention. |
min(/Zabbix server health by Zabbix agent active/zabbix.stats[{$ZABBIX.SERVER.ADDRESS},{$ZABBIX.SERVER.PORT},queue,10m],10m)>100 |Warning |
Manual close: Yes | |
Zabbix server: Utilization of alert manager processes is high | Indicates potential performance issues with the alert manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.alert_manager.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"alert manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of alert syncer processes is high | Indicates potential performance issues with the alert syncer, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.alert_syncer.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"alert syncer"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of alerter processes is high | Indicates potential performance issues with the alerter, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.alerter.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"alerter"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of availability manager processes is high | Indicates potential performance issues with the availability manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.availability_manager.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"availability manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of configuration syncer processes is high | Indicates potential performance issues with the configuration syncer, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.configuration_syncer.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"configuration syncer"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of configuration syncer worker processes is high | Indicates potential performance issues with the configuration syncer worker, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.configuration_syncer_worker.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"configuration syncer worker"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of escalator processes is high | Indicates potential performance issues with the escalator, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.escalator.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"escalator"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of history poller processes is high | Indicates potential performance issues with the history poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.history_poller.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"history poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of ODBC poller processes is high | Indicates potential performance issues with the ODBC poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.odbc_poller.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"ODBC poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of history syncer processes is high | Indicates potential performance issues with the history syncer, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.history_syncer.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"history syncer"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of housekeeper processes is high | Indicates potential performance issues with the housekeeper, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.housekeeper.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"housekeeper"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of http poller processes is high | Indicates potential performance issues with the http poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.http_poller.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"http poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of icmp pinger processes is high | Indicates potential performance issues with the icmp pinger, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.icmp_pinger.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"icmp pinger"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of ipmi manager processes is high | Indicates potential performance issues with the ipmi manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.ipmi_manager.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"ipmi manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of ipmi poller processes is high | Indicates potential performance issues with the ipmi poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.ipmi_poller.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"ipmi poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of java poller processes is high | Indicates potential performance issues with the java poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.java_poller.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"java poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of LLD manager processes is high | Indicates potential performance issues with the LLD manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.lld_manager.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"LLD manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of LLD worker processes is high | Indicates potential performance issues with the LLD worker, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.lld_worker.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"LLD worker"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of connector manager processes is high | Indicates potential performance issues with the connector manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.connector_manager.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"connector manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of connector worker processes is high | Indicates potential performance issues with the connector worker, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.connector_worker.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"connector worker"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of discovery manager processes is high | Indicates potential performance issues with the discovery manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.discovery_manager.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"discovery manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of discovery worker processes is high | Indicates potential performance issues with the discovery worker, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.discovery_worker.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"discovery worker"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of poller processes is high | Indicates potential performance issues with the poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.poller.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of preprocessing worker processes is high | Indicates potential performance issues with the preprocessing worker, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.preprocessing_worker.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"preprocessing worker"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of preprocessing manager processes is high | Indicates potential performance issues with the preprocessing manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.preprocessing_manager.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"preprocessing manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of proxy poller processes is high | Indicates potential performance issues with the proxy poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.proxy_poller.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"proxy poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of proxy group manager processes is high | Indicates potential performance issues with the proxy group manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.proxy_group_manager.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"proxy group manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of report manager processes is high | Indicates potential performance issues with the report manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.report_manager.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"report manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of report writer processes is high | Indicates potential performance issues with the report writer, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.report_writer.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"report writer"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of self-monitoring processes is high | Indicates potential performance issues with the self-monitoring, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.self-monitoring.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"self-monitoring"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of snmp trapper processes is high | Indicates potential performance issues with the snmp trapper, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.snmp_trapper.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"snmp trapper"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of task manager processes is high | Indicates potential performance issues with the task manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.task_manager.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"task manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of timer processes is high | Indicates potential performance issues with the timer, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.timer.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"timer"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of service manager processes is high | Indicates potential performance issues with the service manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.service_manager.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"service manager"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of trigger housekeeper processes is high | Indicates potential performance issues with the trigger housekeeper, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.trigger_housekeeper.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"trigger housekeeper"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of trapper processes is high | Indicates potential performance issues with the trapper, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.trapper.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"trapper"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of unreachable poller processes is high | Indicates potential performance issues with the unreachable poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.unreachable_poller.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"unreachable poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of vmware collector processes is high | Indicates potential performance issues with the vmware collector, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.vmware_collector.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"vmware collector"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of agent poller processes is high | Indicates potential performance issues with the agent poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.agent_poller.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"agent poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of http agent poller processes is high | Indicates potential performance issues with the http agent poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.http_agent_poller.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"http agent poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of snmp poller processes is high | Indicates potential performance issues with the snmp poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.snmp_poller.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"snmp poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of internal poller processes is high | Indicates potential performance issues with the internal poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.internal_poller.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"internal poller"} |Average |
Manual close: Yes | |
Zabbix server: Utilization of browser poller processes is high | Indicates potential performance issues with the browser poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix server health by Zabbix agent active/process.browser_poller.avg.busy,10m)>{$ZABBIX.SERVER.UTIL.MAX:"browser poller"} |Average |
Manual close: Yes | |
Zabbix server: Excessive configuration cache usage | Consider increasing CacheSize in the zabbix_server.conf configuration file. |
max(/Zabbix server health by Zabbix agent active/rcache.buffer.pused,10m)>{$ZABBIX.SERVER.UTIL.MAX:"configuration cache"} |Average |
Manual close: Yes | |
Zabbix server: Failed to fetch stats data | Zabbix has not received statistics data for {$ZABBIX.SERVER.NODATA_TIMEOUT}. |
nodata(/Zabbix server health by Zabbix agent active/rcache.buffer.pused,{$ZABBIX.SERVER.NODATA_TIMEOUT})=1 |Warning |
||
Zabbix server: Excessive value cache usage | Consider increasing ValueCacheSize in the zabbix_server.conf configuration file. |
max(/Zabbix server health by Zabbix agent active/vcache.buffer.pused,10m)>{$ZABBIX.SERVER.UTIL.MAX:"value cache"} |Average |
Manual close: Yes | |
Zabbix server: Zabbix value cache working in low-memory mode | Once low-memory mode has been switched on, the value cache will remain in this state for 24 hours, even if the problem that triggered this mode is resolved sooner. |
last(/Zabbix server health by Zabbix agent active/vcache.cache.mode)=1 |High |
Manual close: Yes | |
Zabbix server: Wrong template assigned | Check that the template has been selected correctly. |
last(/Zabbix server health by Zabbix agent active/server_check)=0 |Disaster |
Manual close: Yes | |
Zabbix server: Version has changed | Zabbix server version has changed. Acknowledge to close the problem manually. |
last(/Zabbix server health by Zabbix agent active/version,#1)<>last(/Zabbix server health by Zabbix agent active/version,#2) and length(last(/Zabbix server health by Zabbix agent active/version))>0 |Info |
Manual close: Yes | |
Zabbix server: Excessive vmware cache usage | Consider increasing VMwareCacheSize in the zabbix_server.conf configuration file. |
max(/Zabbix server health by Zabbix agent active/vmware.buffer.pused,10m)>{$ZABBIX.SERVER.UTIL.MAX:"vmware cache"} |Average |
Manual close: Yes | |
Zabbix server: Excessive history cache usage | Consider increasing HistoryCacheSize in the zabbix_server.conf configuration file. |
max(/Zabbix server health by Zabbix agent active/wcache.history.pused,10m)>{$ZABBIX.SERVER.UTIL.MAX:"history cache"} |Average |
Manual close: Yes | |
Zabbix server: Excessive history index cache usage | Consider increasing HistoryIndexCacheSize in the zabbix_server.conf configuration file. |
max(/Zabbix server health by Zabbix agent active/wcache.index.pused,10m)>{$ZABBIX.SERVER.UTIL.MAX:"index cache"} |Average |
Manual close: Yes | |
Zabbix server: Excessive trends cache usage | Consider increasing TrendCacheSize in the zabbix_server.conf configuration file. |
max(/Zabbix server health by Zabbix agent active/wcache.trend.pused,10m)>{$ZABBIX.SERVER.UTIL.MAX:"trend cache"} |Average |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Zabbix proxy discovery | LLD rule with item and trigger prototypes for proxy discovery. |
Dependent item | zabbix.proxy.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Proxy [{#PROXY.NAME}]: Stats | The statistics for the discovered proxy. |
Dependent item | zabbix.proxy.stats[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Mode | The mode of Zabbix proxy. |
Dependent item | zabbix.proxy.mode[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Unencrypted | The encryption status for connections from a proxy. |
Dependent item | zabbix.proxy.unencrypted[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: PSK | The encryption status for connections from a proxy. |
Dependent item | zabbix.proxy.psk[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Certificate | The encryption status for connections from a proxy. |
Dependent item | zabbix.proxy.cert[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Compression | The compression status of a proxy. |
Dependent item | zabbix.proxy.compression[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Item count | The number of enabled items on enabled hosts assigned to a proxy. |
Dependent item | zabbix.proxy.items[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Host count | The number of enabled hosts assigned to a proxy. |
Dependent item | zabbix.proxy.hosts[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Version | The version of Zabbix proxy. |
Dependent item | zabbix.proxy.version[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Last seen, in seconds | The number of seconds since the proxy was last seen by the server. |
Dependent item | zabbix.proxy.last_seen[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Compatibility | Version of proxy compared to Zabbix server version. Possible values: 0 - Undefined; 1 - Current version (proxy and server have the same major version); 2 - Outdated version (proxy version is older than server version, but is partially supported); 3 - Unsupported version (proxy version is older than server previous LTS release version or server major version is older than proxy major version). |
Dependent item | zabbix.proxy.compatibility[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Required VPS | The required performance of a proxy (the number of values that need to be collected per second). |
Dependent item | zabbix.proxy.requiredperformance[{#PROXY.NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Zabbix server: Proxy [{#PROXY.NAME}]: Zabbix proxy last seen | Zabbix proxy is not updating the configuration data. |
last(/Zabbix server health by Zabbix agent active/zabbix.proxy.last_seen[{#PROXY.NAME}],#1)>{$ZABBIX.PROXY.LAST_SEEN.MAX} |Warning |
||
Zabbix server: Proxy [{#PROXY.NAME}]: Zabbix proxy never seen | Zabbix proxy is not updating the configuration data. |
last(/Zabbix server health by Zabbix agent active/zabbix.proxy.last_seen[{#PROXY.NAME}],#1)=-1 |Warning |
||
Zabbix server: Proxy [{#PROXY.NAME}]: Zabbix proxy is outdated | Zabbix proxy version is older than server version, but is partially supported. Only data collection and remote execution is available. |
last(/Zabbix server health by Zabbix agent active/zabbix.proxy.compatibility[{#PROXY.NAME}],#1)=2 |Warning |
||
Zabbix server: Proxy [{#PROXY.NAME}]: Zabbix proxy is not supported | Zabbix proxy version is older than server previous LTS release version or server major version is older than proxy major version. |
last(/Zabbix server health by Zabbix agent active/zabbix.proxy.compatibility[{#PROXY.NAME}],#1)=3 |High |
Name | Description | Type | Key and additional info |
---|---|---|---|
High availability cluster node discovery | LLD rule with item and trigger prototypes for node discovery. |
Dependent item | zabbix.nodes.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Cluster node [{#NODE.NAME}]: Stats | Provides the statistics of a node. |
Dependent item | zabbix.node.stats[{#NODE.ID}] Preprocessing
|
Cluster node [{#NODE.NAME}]: Address | The IPv4 address of a node. |
Dependent item | zabbix.node.address[{#NODE.ID}] Preprocessing
|
Cluster node [{#NODE.NAME}]: Last access time | Last access time. |
Dependent item | zabbix.node.lastaccess.time[{#NODE.ID}] Preprocessing
|
Cluster node [{#NODE.NAME}]: Last access age | The time between the database's unix_timestamp() and the last access time. |
Dependent item | zabbix.node.lastaccess.age[{#NODE.ID}] Preprocessing
|
Cluster node [{#NODE.NAME}]: Status | The status of a node. |
Dependent item | zabbix.nodes.status[{#NODE.ID}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Zabbix server: Cluster node [{#NODE.NAME}]: Status changed | The state of the node has changed. Acknowledge to close the problem manually. |
last(/Zabbix server health by Zabbix agent active/zabbix.nodes.status[{#NODE.ID}],#1)<>last(/Zabbix server health by Zabbix agent active/zabbix.nodes.status[{#NODE.ID}],#2) |Info |
Manual close: Yes |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed to monitor internal Zabbix metrics on the local Zabbix proxy.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Link this template to the local Zabbix proxy host.
Name | Description | Default |
---|---|---|
{$ZABBIX.PROXY.UTIL.MAX} | Default maximum threshold for percentage utilization triggers (use macro context for specification). |
75 |
{$ZABBIX.PROXY.UTIL.MIN} | Default minimum threshold for percentage utilization triggers (use macro context for specification). |
65 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Queue over 10 minutes | The number of monitored items in the queue that are delayed by at least 10 minutes. |
Zabbix internal | zabbix[queue,10m] |
Queue | The number of monitored items in the queue that are delayed by at least 6 seconds. |
Zabbix internal | zabbix[queue] |
Utilization of data sender internal processes, in % | The average percentage of the time during which the data sender processes have been busy for the last minute. |
Zabbix internal | zabbix[process,data sender,avg,busy] |
Utilization of availability manager internal processes, in % | The average percentage of the time during which the availability manager processes have been busy for the last minute. |
Zabbix internal | zabbix[process,availability manager,avg,busy] |
Utilization of configuration syncer internal processes, in % | The average percentage of the time during which the configuration syncer processes have been busy for the last minute. |
Zabbix internal | zabbix[process,configuration syncer,avg,busy] |
Utilization of discovery manager internal processes, in % | The average percentage of the time during which the discovery manager processes have been busy for the last minute. |
Zabbix internal | zabbix[process,discovery manager,avg,busy] |
Utilization of discovery worker internal processes, in % | The average percentage of the time during which the discovery worker processes have been busy for the last minute. |
Zabbix internal | zabbix[process,discovery worker,avg,busy] |
Utilization of ODBC poller data collector processes, in % | The average percentage of the time during which the ODBC poller processes have been busy for the last minute. |
Zabbix internal | zabbix[process,odbc poller,avg,busy] |
Utilization of history syncer internal processes, in % | The average percentage of the time during which the history syncer processes have been busy for the last minute. |
Zabbix internal | zabbix[process,history syncer,avg,busy] |
Utilization of housekeeper internal processes, in % | The average percentage of the time during which the housekeeper processes have been busy for the last minute. |
Zabbix internal | zabbix[process,housekeeper,avg,busy] |
Utilization of http poller data collector processes, in % | The average percentage of the time during which the http poller processes have been busy for the last minute. |
Zabbix internal | zabbix[process,http poller,avg,busy] |
Utilization of icmp pinger data collector processes, in % | The average percentage of the time during which the icmp pinger processes have been busy for the last minute. |
Zabbix internal | zabbix[process,icmp pinger,avg,busy] |
Utilization of ipmi manager internal processes, in % | The average percentage of the time during which the ipmi manager processes have been busy for the last minute. |
Zabbix internal | zabbix[process,ipmi manager,avg,busy] |
Utilization of ipmi poller data collector processes, in % | The average percentage of the time during which the ipmi poller processes have been busy for the last minute. |
Zabbix internal | zabbix[process,ipmi poller,avg,busy] |
Utilization of java poller data collector processes, in % | The average percentage of the time during which the java poller processes have been busy for the last minute. |
Zabbix internal | zabbix[process,java poller,avg,busy] |
Utilization of poller data collector processes, in % | The average percentage of the time during which the poller processes have been busy for the last minute. |
Zabbix internal | zabbix[process,poller,avg,busy] |
Utilization of preprocessing worker internal processes, in % | The average percentage of the time during which the preprocessing worker processes have been busy for the last minute. |
Zabbix internal | zabbix[process,preprocessing worker,avg,busy] |
Utilization of preprocessing manager internal processes, in % | The average percentage of the time during which the preprocessing manager processes have been busy for the last minute. |
Zabbix internal | zabbix[process,preprocessing manager,avg,busy] |
Utilization of self-monitoring internal processes, in % | The average percentage of the time during which the self-monitoring processes have been busy for the last minute. |
Zabbix internal | zabbix[process,self-monitoring,avg,busy] |
Utilization of snmp trapper data collector processes, in % | The average percentage of the time during which the snmp trapper processes have been busy for the last minute. |
Zabbix internal | zabbix[process,snmp trapper,avg,busy] |
Utilization of task manager internal processes, in % | The average percentage of the time during which the task manager processes have been busy for the last minute. |
Zabbix internal | zabbix[process,task manager,avg,busy] |
Utilization of trapper data collector processes, in % | The average percentage of the time during which the trapper processes have been busy for the last minute. |
Zabbix internal | zabbix[process,trapper,avg,busy] |
Utilization of unreachable poller data collector processes, in % | The average percentage of the time during which the unreachable poller processes have been busy for the last minute. |
Zabbix internal | zabbix[process,unreachable poller,avg,busy] |
Utilization of vmware collector data collector processes, in % | The average percentage of the time during which the vmware collector processes have been busy for the last minute. |
Zabbix internal | zabbix[process,vmware collector,avg,busy] |
Utilization of agent poller data collector processes, in % | The average percentage of the time during which the agent poller processes have been busy for the last minute. |
Zabbix internal | zabbix[process,agent poller,avg,busy] |
Utilization of http agent poller data collector processes, in % | The average percentage of the time during which the http agent poller processes have been busy for the last minute. |
Zabbix internal | zabbix[process,http agent poller,avg,busy] |
Utilization of snmp poller data collector processes, in % | The average percentage of the time during which the snmp poller processes have been busy for the last minute. |
Zabbix internal | zabbix[process,snmp poller,avg,busy] |
Utilization of internal poller data collector processes, in % | The average percentage of the time during which the internal poller processes have been busy for the last minute. |
Zabbix internal | zabbix[process,internal poller,avg,busy] |
Utilization of browser poller data collector processes, in % | The average percentage of the time during which the browser poller processes have been busy for the last minute. |
Zabbix internal | zabbix[process,browser poller,avg,busy] |
Configuration cache, % used | The availability statistics of Zabbix configuration cache. The percentage of used data buffer. |
Zabbix internal | zabbix[rcache,buffer,pused] |
Zabbix proxy check | Flag indicating whether it is a proxy or not. |
Zabbix internal | zabbix[triggers] Preprocessing
|
Version | The version of Zabbix proxy. |
Zabbix internal | zabbix[version] Preprocessing
|
VMware cache, % used | The availability statistics of Zabbix vmware cache. The percentage of used data buffer. |
Zabbix internal | zabbix[vmware,buffer,pused] |
History write cache, % used | The statistics and availability of Zabbix write cache. The percentage of used history buffer. The history cache is used to store item values. A high number indicates database performance problems. |
Zabbix internal | zabbix[wcache,history,pused] |
History index cache, % used | The statistics and availability of Zabbix write cache. The percentage of used history index buffer. The history index cache is used to index values stored in the history cache. |
Zabbix internal | zabbix[wcache,index,pused] |
Proxy memory buffer, % used | Statistics and availability of the Zabbix proxy memory buffer. The percentage of the used proxy memory buffer. The proxy memory buffer is used to store new historical data and upload it to the server without accessing the database. |
Zabbix internal | zabbix[proxy_buffer,buffer,pused] |
Proxy buffer, state | The current working state of proxy buffer where the new data is being stored. Possible values: 0 - disk (also returned when memory buffer is disabled); 1 - memory. |
Zabbix internal | zabbix[proxy_buffer,state,current] Preprocessing
|
Proxy buffer, state changes | The number of state changes between disk/memory buffer modes since proxy start. |
Zabbix internal | zabbix[proxy_buffer,state,changes] Preprocessing
|
Number of processed values per second | The statistics and availability of Zabbix write cache. The total number of values processed by Zabbix server or Zabbix proxy, except unsupported items. |
Zabbix internal | zabbix[wcache,values] Preprocessing
|
Number of processed numeric (unsigned) values per second | The statistics and availability of Zabbix write cache. The number of processed numeric (unsigned) values. |
Zabbix internal | zabbix[wcache,values,uint] Preprocessing
|
Number of processed numeric (float) values per second | The statistics and availability of Zabbix write cache. The number of processed numeric (float) values. |
Zabbix internal | zabbix[wcache,values,float] Preprocessing
|
Number of processed log values per second | The statistics and availability of Zabbix write cache. The number of processed log values. |
Zabbix internal | zabbix[wcache,values,log] Preprocessing
|
Number of processed not supported values per second | The statistics and availability of Zabbix write cache. The number of times the item processing resulted in an item becoming unsupported or remaining in that state. |
Zabbix internal | zabbix[wcache,values,not supported] Preprocessing
|
Number of processed character values per second | The statistics and availability of Zabbix write cache. The number of processed character values. |
Zabbix internal | zabbix[wcache,values,str] Preprocessing
|
Number of processed text values per second | The statistics and availability of Zabbix write cache. The number of processed text values. |
Zabbix internal | zabbix[wcache,values,text] Preprocessing
|
Preprocessing queue | The number of values enqueued in the preprocessing queue. |
Zabbix internal | zabbix[preprocessing_queue] |
Discovery queue | The number of values enqueued in the discovery queue. |
Zabbix internal | zabbix[discovery_queue] |
Values waiting to be sent | The number of values in the proxy history table waiting to be sent to the server. |
Zabbix internal | zabbix[proxy_history] |
Required VPS | The required performance of a proxy (the number of values that need to be collected per second). |
Zabbix internal | zabbix[requiredperformance] Preprocessing
|
Uptime | Uptime of the Zabbix proxy process in seconds. |
Zabbix internal | zabbix[uptime] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Zabbix proxy: More than 100 items have been missing data for over 10 minutes | Indicates potential issues with network connectivity, agent failures, or unresponsive monitored resources that require attention. |
min(/Zabbix proxy health/zabbix[queue,10m],10m)>100 |Warning |
Manual close: Yes | |
Zabbix proxy: Utilization of data sender processes is high | Indicates potential performance issues with the data sender, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health/zabbix[process,data sender,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"data sender"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of availability manager processes is high | Indicates potential performance issues with the availability manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health/zabbix[process,availability manager,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"availability manager"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of configuration syncer processes is high | Indicates potential performance issues with the configuration syncer, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health/zabbix[process,configuration syncer,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"configuration syncer"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of discovery manager processes is high | Indicates potential performance issues with the discovery manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health/zabbix[process,discovery manager,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"discovery manager"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of discovery worker processes is high | Indicates potential performance issues with the discovery worker, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health/zabbix[process,discovery worker,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"discovery worker"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of ODBC poller processes is high | Indicates potential performance issues with the ODBC poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health/zabbix[process,odbc poller,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"ODBC poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of history syncer processes is high | Indicates potential performance issues with the history syncer, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health/zabbix[process,history syncer,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"history syncer"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of housekeeper processes is high | Indicates potential performance issues with the housekeeper, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health/zabbix[process,housekeeper,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"housekeeper"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of http poller processes is high | Indicates potential performance issues with the http poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health/zabbix[process,http poller,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"http poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of icmp pinger processes is high | Indicates potential performance issues with the icmp pinger, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health/zabbix[process,icmp pinger,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"icmp pinger"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of ipmi manager processes is high | Indicates potential performance issues with the ipmi manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health/zabbix[process,ipmi manager,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"ipmi manager"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of ipmi poller processes is high | Indicates potential performance issues with the ipmi poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health/zabbix[process,ipmi poller,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"ipmi poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of java poller processes is high | Indicates potential performance issues with the java poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health/zabbix[process,java poller,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"java poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of poller processes is high | Indicates potential performance issues with the poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health/zabbix[process,poller,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of preprocessing worker processes is high | Indicates potential performance issues with the preprocessing worker, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health/zabbix[process,preprocessing worker,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"preprocessing worker"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of preprocessing manager processes is high | Indicates potential performance issues with the preprocessing manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health/zabbix[process,preprocessing manager,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"preprocessing manager"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of self-monitoring processes is high | Indicates potential performance issues with the self-monitoring, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health/zabbix[process,self-monitoring,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"self-monitoring"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of snmp trapper processes is high | Indicates potential performance issues with the snmp trapper, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health/zabbix[process,snmp trapper,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"snmp trapper"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of task manager processes is high | Indicates potential performance issues with the task manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health/zabbix[process,task manager,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"task manager"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of trapper processes is high | Indicates potential performance issues with the trapper, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health/zabbix[process,trapper,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"trapper"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of unreachable poller processes is high | Indicates potential performance issues with the unreachable poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health/zabbix[process,unreachable poller,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"unreachable poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of vmware collector processes is high | Indicates potential performance issues with the vmware collector, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health/zabbix[process,vmware collector,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"vmware collector"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of agent poller processes is high | Indicates potential performance issues with the agent poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health/zabbix[process,agent poller,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"agent poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of http agent poller processes is high | Indicates potential performance issues with the http agent poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health/zabbix[process,http agent poller,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"http agent poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of snmp poller processes is high | Indicates potential performance issues with the snmp poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health/zabbix[process,snmp poller,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"snmp poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of internal poller processes is high | Indicates potential performance issues with the internal poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health/zabbix[process,internal poller,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"internal poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of browser poller processes is high | Indicates potential performance issues with the browser poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health/zabbix[process,browser poller,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"browser poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Excessive configuration cache usage | Consider increasing CacheSize in the zabbix_proxy.conf configuration file. |
max(/Zabbix proxy health/zabbix[rcache,buffer,pused],10m)>{$ZABBIX.PROXY.UTIL.MAX:"configuration cache"} |Average |
Manual close: Yes | |
Zabbix proxy: Wrong template assigned | Check that the template has been selected correctly. |
last(/Zabbix proxy health/zabbix[triggers])=1 |Disaster |
Manual close: Yes | |
Zabbix proxy: Version has changed | Zabbix proxy version has changed. Acknowledge to close the problem manually. |
last(/Zabbix proxy health/zabbix[version],#1)<>last(/Zabbix proxy health/zabbix[version],#2) and length(last(/Zabbix proxy health/zabbix[version]))>0 |Info |
Manual close: Yes | |
Zabbix proxy: Excessive vmware cache usage | Consider increasing VMwareCacheSize in the zabbix_proxy.conf configuration file. |
max(/Zabbix proxy health/zabbix[vmware,buffer,pused],10m)>{$ZABBIX.PROXY.UTIL.MAX:"vmware cache"} |Average |
Manual close: Yes | |
Zabbix proxy: Excessive history cache usage | Consider increasing HistoryCacheSize in the zabbix_proxy.conf configuration file. |
max(/Zabbix proxy health/zabbix[wcache,history,pused],10m)>{$ZABBIX.PROXY.UTIL.MAX:"history cache"} |Average |
Manual close: Yes | |
Zabbix proxy: Excessive history index cache usage | Consider increasing HistoryIndexCacheSize in the zabbix_proxy.conf configuration file. |
max(/Zabbix proxy health/zabbix[wcache,index,pused],10m)>{$ZABBIX.PROXY.UTIL.MAX:"index cache"} |Average |
Manual close: Yes | |
Zabbix proxy: Excessive proxy memory buffer usage | Consider increasing ProxyMemoryBufferSize in the zabbix_proxy.conf configuration file. |
max(/Zabbix proxy health/zabbix[proxy_buffer,buffer,pused],10m)>{$ZABBIX.PROXY.UTIL.MAX:"proxy buffer"} |Average |
Manual close: Yes | |
Zabbix proxy: {HOST.NAME} has been restarted | Uptime is less than 10 minutes. |
last(/Zabbix proxy health/zabbix[uptime])<10m |Info |
Manual close: Yes |
This template is designed to monitor internal Zabbix metrics on the remote Zabbix proxy.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Specify the address of the remote Zabbix proxy by updating the {$ZABBIX.PROXY.ADDRESS}
and {$ZABBIX.PROXY.PORT}
macros. Don't forget to adjust the StatsAllowedIP
parameter in the remote proxy's configuration file to allow the collection of statistics.
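For reference, a minimal configuration sketch for the remote proxy is shown below; the address is a placeholder and should be replaced with the IP of the Zabbix server or proxy that will query the statistics.

```
# zabbix_proxy.conf on the remote proxy being monitored (path may differ).
# Allow the querying Zabbix server/proxy to request internal statistics;
# 192.0.2.10 is a placeholder address.
StatsAllowedIP=127.0.0.1,192.0.2.10

# Restart the proxy after changing the configuration, e.g.:
# systemctl restart zabbix-proxy
```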
Name | Description | Default |
---|---|---|
{$ZABBIX.PROXY.ADDRESS} | IP/DNS/network mask list of proxies to be remotely queried (default is 127.0.0.1). |
|
{$ZABBIX.PROXY.PORT} | Port of proxy to be remotely queried (default is 10051). |
|
{$ZABBIX.PROXY.UTIL.MAX} | Default maximum threshold for percentage utilization triggers (use macro context for specification). |
75 |
{$ZABBIX.PROXY.UTIL.MIN} | Default minimum threshold for percentage utilization triggers (use macro context for specification). |
65 |
{$ZABBIX.PROXY.NODATA_TIMEOUT} | The time threshold after which statistics are considered unavailable. Used in trigger expressions. |
5m |
Name | Description | Type | Key and additional info |
---|---|---|---|
Zabbix stats | The master item of Zabbix proxy statistics. |
Zabbix internal | zabbix[stats,{$ZABBIX.PROXY.ADDRESS},{$ZABBIX.PROXY.PORT}] |
Queue over 10 minutes | The number of monitored items in the queue that are delayed by at least 10 minutes. |
Zabbix internal | zabbix[stats,{$ZABBIX.PROXY.ADDRESS},{$ZABBIX.PROXY.PORT},queue,10m] Preprocessing
|
Queue | The number of monitored items in the queue that are delayed by at least 6 seconds. |
Zabbix internal | zabbix[stats,{$ZABBIX.PROXY.ADDRESS},{$ZABBIX.PROXY.PORT},queue] Preprocessing
|
Utilization of data sender internal processes, in % | The average percentage of the time during which the data sender processes have been busy for the last minute. |
Dependent item | process.data_sender.avg.busy Preprocessing
|
Utilization of availability manager internal processes, in % | The average percentage of the time during which the availability manager processes have been busy for the last minute. |
Dependent item | process.availability_manager.avg.busy Preprocessing
|
Utilization of configuration syncer internal processes, in % | The average percentage of the time during which the configuration syncer processes have been busy for the last minute. |
Dependent item | process.configuration_syncer.avg.busy Preprocessing
|
Utilization of discovery manager internal processes, in % | The average percentage of the time during which the discovery manager processes have been busy for the last minute. |
Dependent item | process.discovery_manager.avg.busy Preprocessing
|
Utilization of discovery worker internal processes, in % | The average percentage of the time during which the discovery worker processes have been busy for the last minute. |
Dependent item | process.discovery_worker.avg.busy Preprocessing
|
Utilization of ODBC poller data collector processes, in % | The average percentage of the time during which the ODBC poller processes have been busy for the last minute. |
Dependent item | process.odbc_poller.avg.busy Preprocessing
|
Utilization of history syncer internal processes, in % | The average percentage of the time during which the history syncer processes have been busy for the last minute. |
Dependent item | process.history_syncer.avg.busy Preprocessing
|
Utilization of housekeeper internal processes, in % | The average percentage of the time during which the housekeeper processes have been busy for the last minute. |
Dependent item | process.housekeeper.avg.busy Preprocessing
|
Utilization of http poller data collector processes, in % | The average percentage of the time during which the http poller processes have been busy for the last minute. |
Dependent item | process.http_poller.avg.busy Preprocessing
|
Utilization of icmp pinger data collector processes, in % | The average percentage of the time during which the icmp pinger processes have been busy for the last minute. |
Dependent item | process.icmp_pinger.avg.busy Preprocessing
|
Utilization of ipmi manager internal processes, in % | The average percentage of the time during which the ipmi manager processes have been busy for the last minute. |
Dependent item | process.ipmi_manager.avg.busy Preprocessing
|
Utilization of ipmi poller data collector processes, in % | The average percentage of the time during which the ipmi poller processes have been busy for the last minute. |
Dependent item | process.ipmi_poller.avg.busy Preprocessing
|
Utilization of java poller data collector processes, in % | The average percentage of the time during which the java poller processes have been busy for the last minute. |
Dependent item | process.java_poller.avg.busy Preprocessing
|
Utilization of poller data collector processes, in % | The average percentage of the time during which the poller processes have been busy for the last minute. |
Dependent item | process.poller.avg.busy Preprocessing
|
Utilization of preprocessing worker internal processes, in % | The average percentage of the time during which the preprocessing worker processes have been busy for the last minute. |
Dependent item | process.preprocessing_worker.avg.busy Preprocessing
|
Utilization of preprocessing manager internal processes, in % | The average percentage of the time during which the preprocessing manager processes have been busy for the last minute. |
Dependent item | process.preprocessing_manager.avg.busy Preprocessing
|
Utilization of self-monitoring internal processes, in % | The average percentage of the time during which the self-monitoring processes have been busy for the last minute. |
Dependent item | process.self-monitoring.avg.busy Preprocessing
|
Utilization of snmp trapper data collector processes, in % | The average percentage of the time during which the snmp trapper processes have been busy for the last minute. |
Dependent item | process.snmp_trapper.avg.busy Preprocessing
|
Utilization of task manager internal processes, in % | The average percentage of the time during which the task manager processes have been busy for the last minute. |
Dependent item | process.task_manager.avg.busy Preprocessing
|
Utilization of trapper data collector processes, in % | The average percentage of the time during which the trapper processes have been busy for the last minute. |
Dependent item | process.trapper.avg.busy Preprocessing
|
Utilization of unreachable poller data collector processes, in % | The average percentage of the time during which the unreachable poller processes have been busy for the last minute. |
Dependent item | process.unreachable_poller.avg.busy Preprocessing
|
Utilization of vmware collector data collector processes, in % | The average percentage of the time during which the vmware collector processes have been busy for the last minute. |
Dependent item | process.vmware_collector.avg.busy Preprocessing
|
Utilization of agent poller data collector processes, in % | The average percentage of the time during which the agent poller processes have been busy for the last minute. |
Dependent item | process.agent_poller.avg.busy Preprocessing
|
Utilization of http agent poller data collector processes, in % | The average percentage of the time during which the http agent poller processes have been busy for the last minute. |
Dependent item | process.http_agent_poller.avg.busy Preprocessing
|
Utilization of snmp poller data collector processes, in % | The average percentage of the time during which the snmp poller processes have been busy for the last minute. |
Dependent item | process.snmp_poller.avg.busy Preprocessing
|
Utilization of internal poller data collector processes, in % | The average percentage of the time during which the internal poller processes have been busy for the last minute. |
Dependent item | process.internal_poller.avg.busy Preprocessing
|
Utilization of browser poller data collector processes, in % | The average percentage of the time during which the browser poller processes have been busy for the last minute. |
Dependent item | process.browser_poller.avg.busy Preprocessing
|
Configuration cache, % used | The availability statistics of Zabbix configuration cache. The percentage of used data buffer. |
Dependent item | rcache.buffer.pused Preprocessing
|
Zabbix proxy check | Flag indicating whether it is a proxy or not. |
Dependent item | proxy_check Preprocessing
|
Version | The version of Zabbix proxy. |
Dependent item | version Preprocessing
|
VMware cache, % used | The availability statistics of Zabbix vmware cache. The percentage of used data buffer. |
Dependent item | vmware.buffer.pused Preprocessing
|
History write cache, % used | The statistics and availability of Zabbix write cache. The percentage of used history buffer. The history cache is used to store item values. A high number indicates database performance problems. |
Dependent item | wcache.history.pused Preprocessing
|
History index cache, % used | The statistics and availability of Zabbix write cache. The percentage of used history index buffer. The history index cache is used to index values stored in the history cache. |
Dependent item | wcache.index.pused Preprocessing
|
Proxy memory buffer, % used | Statistics and availability of the Zabbix proxy memory buffer. The percentage of the used proxy memory buffer. The proxy memory buffer is used to store new historical data and upload it to the server without accessing the database. |
Dependent item | proxy_buffer.buffer.pused Preprocessing
|
Proxy buffer, state | The current working state of proxy buffer where the new data is being stored. Possible values: 0 - disk (also returned when memory buffer is disabled); 1 - memory. |
Dependent item | proxy_buffer.state.current Preprocessing
|
Proxy buffer, state changes | The number of state changes between disk/memory buffer modes since proxy start. |
Dependent item | proxy_buffer.state.changes Preprocessing
|
Number of processed values per second | The statistics and availability of Zabbix write cache. The total number of values processed by Zabbix server or Zabbix proxy, except unsupported items. |
Dependent item | wcache.values Preprocessing
|
Number of processed numeric (unsigned) values per second | The statistics and availability of Zabbix write cache. The number of processed numeric (unsigned) values. |
Dependent item | wcache.values.uint Preprocessing
|
Number of processed numeric (float) values per second | The statistics and availability of Zabbix write cache. The number of processed numeric (float) values. |
Dependent item | wcache.values.float Preprocessing
|
Number of processed log values per second | The statistics and availability of Zabbix write cache. The number of processed log values. |
Dependent item | wcache.values.log Preprocessing
|
Number of processed not supported values per second | The statistics and availability of Zabbix write cache. The number of times the item processing resulted in an item becoming unsupported or remaining in that state. |
Dependent item | wcache.values.not_supported Preprocessing
|
Number of processed character values per second | The statistics and availability of Zabbix write cache. The number of processed character values. |
Dependent item | wcache.values.str Preprocessing
|
Number of processed text values per second | The statistics and availability of Zabbix write cache. The number of processed text values. |
Dependent item | wcache.values.text Preprocessing
|
Preprocessing queue | The number of values enqueued in the preprocessing queue. |
Dependent item | preprocessing_queue Preprocessing
|
Discovery queue | The number of values enqueued in the discovery queue. |
Dependent item | discovery_queue Preprocessing
|
Required VPS | The required performance of a proxy (the number of values that need to be collected per second). |
Dependent item | requiredperformance Preprocessing
|
Uptime | Uptime of the Zabbix proxy process in seconds. |
Dependent item | uptime Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Zabbix proxy: More than 100 items have been missing data for over 10 minutes | Indicates potential issues with network connectivity, agent failures, or unresponsive monitored resources that require attention. |
min(/Remote Zabbix proxy health/zabbix[stats,{$ZABBIX.PROXY.ADDRESS},{$ZABBIX.PROXY.PORT},queue,10m],10m)>100 |Warning |
Manual close: Yes | |
Zabbix proxy: Utilization of data sender processes is high | Indicates potential performance issues with the data sender, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix proxy health/process.data_sender.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"data sender"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of availability manager processes is high | Indicates potential performance issues with the availability manager, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix proxy health/process.availability_manager.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"availability manager"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of configuration syncer processes is high | Indicates potential performance issues with the configuration syncer, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix proxy health/process.configuration_syncer.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"configuration syncer"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of discovery manager processes is high | Indicates potential performance issues with the discovery manager, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix proxy health/process.discovery_manager.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"discovery manager"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of discovery worker processes is high | Indicates potential performance issues with the discovery worker, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix proxy health/process.discovery_worker.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"discovery worker"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of ODBC poller processes is high | Indicates potential performance issues with the ODBC poller, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix proxy health/process.odbc_poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"ODBC poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of history syncer processes is high | Indicates potential performance issues with the history syncer, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix proxy health/process.history_syncer.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"history syncer"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of housekeeper processes is high | Indicates potential performance issues with the housekeeper, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix proxy health/process.housekeeper.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"housekeeper"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of http poller processes is high | Indicates potential performance issues with the http poller, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix proxy health/process.http_poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"http poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of icmp pinger processes is high | Indicates potential performance issues with the icmp pinger, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix proxy health/process.icmp_pinger.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"icmp pinger"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of ipmi manager processes is high | Indicates potential performance issues with the ipmi manager, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix proxy health/process.ipmi_manager.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"ipmi manager"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of ipmi poller processes is high | Indicates potential performance issues with the ipmi poller, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix proxy health/process.ipmi_poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"ipmi poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of java poller processes is high | Indicates potential performance issues with the java poller, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix proxy health/process.java_poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"java poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of poller processes is high | Indicates potential performance issues with the poller, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix proxy health/process.poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of preprocessing worker processes is high | Indicates potential performance issues with the preprocessing worker, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix proxy health/process.preprocessing_worker.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"preprocessing worker"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of preprocessing manager processes is high | Indicates potential performance issues with the preprocessing manager, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix proxy health/process.preprocessing_manager.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"preprocessing manager"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of self-monitoring processes is high | Indicates potential performance issues with the self-monitoring, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix proxy health/process.self-monitoring.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"self-monitoring"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of snmp trapper processes is high | Indicates potential performance issues with the snmp trapper, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix proxy health/process.snmp_trapper.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"snmp trapper"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of task manager processes is high | Indicates potential performance issues with the task manager, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix proxy health/process.task_manager.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"task manager"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of trapper processes is high | Indicates potential performance issues with the trapper, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix proxy health/process.trapper.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"trapper"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of unreachable poller processes is high | Indicates potential performance issues with the unreachable poller, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix proxy health/process.unreachable_poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"unreachable poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of vmware collector processes is high | Indicates potential performance issues with the vmware collector, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix proxy health/process.vmware_collector.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"vmware collector"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of agent poller processes is high | Indicates potential performance issues with the agent poller, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix proxy health/process.agent_poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"agent poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of http agent poller processes is high | Indicates potential performance issues with the http agent poller, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix proxy health/process.http_agent_poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"http agent poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of snmp poller processes is high | Indicates potential performance issues with the snmp poller, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix proxy health/process.snmp_poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"snmp poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of internal poller processes is high | Indicates potential performance issues with the internal poller, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix proxy health/process.internal_poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"internal poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of browser poller processes is high | Indicates potential performance issues with the browser poller, which may affect monitoring efficiency and response times. |
avg(/Remote Zabbix proxy health/process.browser_poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"browser poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Excessive configuration cache usage | Consider increasing CacheSize in the zabbix_proxy.conf configuration file. |
max(/Remote Zabbix proxy health/rcache.buffer.pused,10m)>{$ZABBIX.PROXY.UTIL.MAX:"configuration cache"} |Average |
Manual close: Yes | |
Zabbix proxy: Failed to fetch stats data | Zabbix has not received statistics data for {$ZABBIX.PROXY.NODATA_TIMEOUT}. |
nodata(/Remote Zabbix proxy health/rcache.buffer.pused,{$ZABBIX.PROXY.NODATA_TIMEOUT})=1 |Warning |
||
Zabbix proxy: Wrong template assigned | Check that the template has been selected correctly. |
last(/Remote Zabbix proxy health/proxy_check)=1 |Disaster |
Manual close: Yes | |
Zabbix proxy: Version has changed | Zabbix proxy version has changed. Acknowledge to close the problem manually. |
last(/Remote Zabbix proxy health/version,#1)<>last(/Remote Zabbix proxy health/version,#2) and length(last(/Remote Zabbix proxy health/version))>0 |Info |
Manual close: Yes | |
Zabbix proxy: Excessive vmware cache usage | Consider increasing VMwareCacheSize in the zabbix_proxy.conf configuration file. |
max(/Remote Zabbix proxy health/vmware.buffer.pused,10m)>{$ZABBIX.PROXY.UTIL.MAX:"vmware cache"} |Average |
Manual close: Yes | |
Zabbix proxy: Excessive history cache usage | Consider increasing HistoryCacheSize in the zabbix_proxy.conf configuration file. |
max(/Remote Zabbix proxy health/wcache.history.pused,10m)>{$ZABBIX.PROXY.UTIL.MAX:"history cache"} |Average |
Manual close: Yes | |
Zabbix proxy: Excessive history index cache usage | Consider increasing HistoryIndexCacheSize in the zabbix_proxy.conf configuration file. |
max(/Remote Zabbix proxy health/wcache.index.pused,10m)>{$ZABBIX.PROXY.UTIL.MAX:"index cache"} |Average |
Manual close: Yes | |
Zabbix proxy: Excessive proxy memory buffer usage | Consider increasing ProxyMemoryBufferSize in the zabbix_proxy.conf configuration file. |
max(/Remote Zabbix proxy health/proxy_buffer.buffer.pused,10m)>{$ZABBIX.PROXY.UTIL.MAX:"proxy buffer"} |Average |
Manual close: Yes | |
Zabbix proxy: {HOST.NAME} has been restarted | Uptime is less than 10 minutes. |
last(/Remote Zabbix proxy health/uptime)<10m |Info |
Manual close: Yes |
This template is designed to monitor internal Zabbix metrics on the remote Zabbix proxy via the passive Zabbix agent.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Specify the address of the remote Zabbix proxy by changing the {$ZABBIX.PROXY.ADDRESS}
and {$ZABBIX.PROXY.PORT}
macros. Don't forget to adjust the StatsAllowedIP
parameter in the remote proxy's configuration file to allow the collection of statistics.
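As a quick sanity check that the passive agent can retrieve the remote proxy's statistics, a hedged example using zabbix_get is shown below; all addresses and ports are placeholders.

```
# Run from the Zabbix server/proxy host (addresses are placeholders).
# 192.0.2.20 is the host running the passive Zabbix agent,
# 192.0.2.30:10051 is the remote proxy whose statistics are collected.
# The agent's IP must be allowed by StatsAllowedIP on the remote proxy.
zabbix_get -s 192.0.2.20 -p 10050 -k 'zabbix.stats[192.0.2.30,10051]'
```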
Name | Description | Default |
---|---|---|
{$ZABBIX.PROXY.ADDRESS} | IP/DNS/network mask list of proxies to be remotely queried (default is 127.0.0.1). |
|
{$ZABBIX.PROXY.PORT} | Port of proxy to be remotely queried (default is 10051). |
|
{$ZABBIX.PROXY.UTIL.MAX} | Default maximum threshold for percentage utilization triggers (use macro context for specification). |
75 |
{$ZABBIX.PROXY.UTIL.MIN} | Default minimum threshold for percentage utilization triggers (use macro context for specification). |
65 |
{$ZABBIX.PROXY.NODATA_TIMEOUT} | The time threshold after which statistics are considered unavailable. Used in trigger expressions. |
5m |
Name | Description | Type | Key and additional info |
---|---|---|---|
Zabbix stats | The master item of Zabbix proxy statistics. |
Zabbix agent | zabbix.stats[{$ZABBIX.PROXY.ADDRESS},{$ZABBIX.PROXY.PORT}] |
Queue over 10 minutes | The number of monitored items in the queue that are delayed by at least 10 minutes. |
Zabbix agent | zabbix.stats[{$ZABBIX.PROXY.ADDRESS},{$ZABBIX.PROXY.PORT},queue,10m] Preprocessing
|
Queue | The number of monitored items in the queue that are delayed by at least 6 seconds. |
Zabbix agent | zabbix.stats[{$ZABBIX.PROXY.ADDRESS},{$ZABBIX.PROXY.PORT},queue] Preprocessing
|
Utilization of data sender internal processes, in % | The average percentage of the time during which the data sender processes have been busy for the last minute. |
Dependent item | process.data_sender.avg.busy Preprocessing
|
Utilization of availability manager internal processes, in % | The average percentage of the time during which the availability manager processes have been busy for the last minute. |
Dependent item | process.availability_manager.avg.busy Preprocessing
|
Utilization of configuration syncer internal processes, in % | The average percentage of the time during which the configuration syncer processes have been busy for the last minute. |
Dependent item | process.configuration_syncer.avg.busy Preprocessing
|
Utilization of discovery manager internal processes, in % | The average percentage of the time during which the discovery manager processes have been busy for the last minute. |
Dependent item | process.discovery_manager.avg.busy Preprocessing
|
Utilization of discovery worker internal processes, in % | The average percentage of the time during which the discovery worker processes have been busy for the last minute. |
Dependent item | process.discovery_worker.avg.busy Preprocessing
|
Utilization of ODBC poller data collector processes, in % | The average percentage of the time during which the ODBC poller processes have been busy for the last minute. |
Dependent item | process.odbc_poller.avg.busy Preprocessing
|
Utilization of history syncer internal processes, in % | The average percentage of the time during which the history syncer processes have been busy for the last minute. |
Dependent item | process.history_syncer.avg.busy Preprocessing
|
Utilization of housekeeper internal processes, in % | The average percentage of the time during which the housekeeper processes have been busy for the last minute. |
Dependent item | process.housekeeper.avg.busy Preprocessing
|
Utilization of http poller data collector processes, in % | The average percentage of the time during which the http poller processes have been busy for the last minute. |
Dependent item | process.http_poller.avg.busy Preprocessing
|
Utilization of icmp pinger data collector processes, in % | The average percentage of the time during which the icmp pinger processes have been busy for the last minute. |
Dependent item | process.icmp_pinger.avg.busy Preprocessing
|
Utilization of ipmi manager internal processes, in % | The average percentage of the time during which the ipmi manager processes have been busy for the last minute. |
Dependent item | process.ipmi_manager.avg.busy Preprocessing
|
Utilization of ipmi poller data collector processes, in % | The average percentage of the time during which the ipmi poller processes have been busy for the last minute. |
Dependent item | process.ipmi_poller.avg.busy Preprocessing
|
Utilization of java poller data collector processes, in % | The average percentage of the time during which the java poller processes have been busy for the last minute. |
Dependent item | process.java_poller.avg.busy Preprocessing
|
Utilization of poller data collector processes, in % | The average percentage of the time during which the poller processes have been busy for the last minute. |
Dependent item | process.poller.avg.busy Preprocessing
|
Utilization of preprocessing worker internal processes, in % | The average percentage of the time during which the preprocessing worker processes have been busy for the last minute. |
Dependent item | process.preprocessing_worker.avg.busy Preprocessing
|
Utilization of preprocessing manager internal processes, in % | The average percentage of the time during which the preprocessing manager processes have been busy for the last minute. |
Dependent item | process.preprocessing_manager.avg.busy Preprocessing
|
Utilization of self-monitoring internal processes, in % | The average percentage of the time during which the self-monitoring processes have been busy for the last minute. |
Dependent item | process.self-monitoring.avg.busy Preprocessing
|
Utilization of snmp trapper data collector processes, in % | The average percentage of the time during which the snmp trapper processes have been busy for the last minute. |
Dependent item | process.snmp_trapper.avg.busy Preprocessing
|
Utilization of task manager internal processes, in % | The average percentage of the time during which the task manager processes have been busy for the last minute. |
Dependent item | process.task_manager.avg.busy Preprocessing
|
Utilization of trapper data collector processes, in % | The average percentage of the time during which the trapper processes have been busy for the last minute. |
Dependent item | process.trapper.avg.busy Preprocessing
|
Utilization of unreachable poller data collector processes, in % | The average percentage of the time during which the unreachable poller processes have been busy for the last minute. |
Dependent item | process.unreachable_poller.avg.busy Preprocessing
|
Utilization of vmware collector data collector processes, in % | The average percentage of the time during which the vmware collector processes have been busy for the last minute. |
Dependent item | process.vmware_collector.avg.busy Preprocessing
|
Utilization of agent poller data collector processes, in % | The average percentage of the time during which the agent poller processes have been busy for the last minute. |
Dependent item | process.agent_poller.avg.busy Preprocessing
|
Utilization of http agent poller data collector processes, in % | The average percentage of the time during which the http agent poller processes have been busy for the last minute. |
Dependent item | process.http_agent_poller.avg.busy Preprocessing
|
Utilization of snmp poller data collector processes, in % | The average percentage of the time during which the snmp poller processes have been busy for the last minute. |
Dependent item | process.snmp_poller.avg.busy Preprocessing
|
Utilization of internal poller data collector processes, in % | The average percentage of the time during which the internal poller processes have been busy for the last minute. |
Dependent item | process.internal_poller.avg.busy Preprocessing
|
Utilization of browser poller data collector processes, in % | The average percentage of the time during which the browser poller processes have been busy for the last minute. |
Dependent item | process.browser_poller.avg.busy Preprocessing
|
Configuration cache, % used | The availability statistics of Zabbix configuration cache. The percentage of used data buffer. |
Dependent item | rcache.buffer.pused Preprocessing
|
Zabbix proxy check | Flag indicating whether it is a proxy or not. |
Dependent item | proxy_check Preprocessing
|
Version | The version of Zabbix proxy. |
Dependent item | version Preprocessing
|
VMware cache, % used | The availability statistics of Zabbix vmware cache. The percentage of used data buffer. |
Dependent item | vmware.buffer.pused Preprocessing
|
History write cache, % used | The statistics and availability of Zabbix write cache. The percentage of used history buffer. The history cache is used to store item values. A high number indicates database performance problems. |
Dependent item | wcache.history.pused Preprocessing
|
History index cache, % used | The statistics and availability of Zabbix write cache. The percentage of used history index buffer. The history index cache is used to index values stored in the history cache. |
Dependent item | wcache.index.pused Preprocessing
|
Proxy memory buffer, % used | Statistics and availability of the Zabbix proxy memory buffer. The percentage of the used proxy memory buffer. The proxy memory buffer is used to store new historical data and upload it without accessing the database. |
Dependent item | proxy_buffer.buffer.pused Preprocessing
|
Proxy buffer, state | The current working state of proxy buffer where the new data is being stored. Possible values: 0 - disk (also returned when memory buffer is disabled); 1 - memory. |
Dependent item | proxy_buffer.state.current Preprocessing
|
Proxy buffer, state changes | The number of state changes between disk/memory buffer modes since proxy start. |
Dependent item | proxy_buffer.state.changes Preprocessing
|
Number of processed values per second | The statistics and availability of Zabbix write cache. The total number of values processed by Zabbix server or Zabbix proxy, except unsupported items. |
Dependent item | wcache.values Preprocessing
|
Number of processed numeric (unsigned) values per second | The statistics and availability of Zabbix write cache. The number of processed numeric (unsigned) values. |
Dependent item | wcache.values.uint Preprocessing
|
Number of processed numeric (float) values per second | The statistics and availability of Zabbix write cache. The number of processed numeric (float) values. |
Dependent item | wcache.values.float Preprocessing
|
Number of processed log values per second | The statistics and availability of Zabbix write cache. The number of processed log values. |
Dependent item | wcache.values.log Preprocessing
|
Number of processed not supported values per second | The statistics and availability of Zabbix write cache. The number of times the item processing resulted in an item becoming unsupported or remaining in that state. |
Dependent item | wcache.values.not_supported Preprocessing
|
Number of processed character values per second | The statistics and availability of Zabbix write cache. The number of processed character values. |
Dependent item | wcache.values.str Preprocessing
|
Number of processed text values per second | The statistics and availability of Zabbix write cache. The number of processed text values. |
Dependent item | wcache.values.text Preprocessing
|
Preprocessing queue | The number of values enqueued in the preprocessing queue. |
Dependent item | preprocessing_queue Preprocessing
|
Discovery queue | The count of values enqueued in the discovery queue. |
Dependent item | discovery_queue Preprocessing
|
Required VPS | The required performance of a proxy (the number of values that need to be collected per second). |
Dependent item | requiredperformance Preprocessing
|
Uptime | Uptime of the Zabbix proxy process in seconds. |
Dependent item | uptime Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Zabbix proxy: More than 100 items have been missing data for over 10 minutes | Indicates potential issues with network connectivity, agent failures, or unresponsive monitored resources that require attention. |
min(/Zabbix proxy health by Zabbix agent/zabbix.stats[{$ZABBIX.PROXY.ADDRESS},{$ZABBIX.PROXY.PORT},queue,10m],10m)>100 |Warning |
Manual close: Yes | |
Zabbix proxy: Utilization of data sender processes is high | Indicates potential performance issues with the data sender, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent/process.data_sender.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"data sender"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of availability manager processes is high | Indicates potential performance issues with the availability manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent/process.availability_manager.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"availability manager"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of configuration syncer processes is high | Indicates potential performance issues with the configuration syncer, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent/process.configuration_syncer.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"configuration syncer"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of discovery manager processes is high | Indicates potential performance issues with the discovery manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent/process.discovery_manager.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"discovery manager"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of discovery worker processes is high | Indicates potential performance issues with the discovery worker, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent/process.discovery_worker.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"discovery worker"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of ODBC poller processes is high | Indicates potential performance issues with the ODBC poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent/process.odbc_poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"ODBC poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of history syncer processes is high | Indicates potential performance issues with the history syncer, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent/process.history_syncer.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"history syncer"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of housekeeper processes is high | Indicates potential performance issues with the housekeeper, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent/process.housekeeper.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"housekeeper"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of http poller processes is high | Indicates potential performance issues with the http poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent/process.http_poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"http poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of icmp pinger processes is high | Indicates potential performance issues with the icmp pinger, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent/process.icmp_pinger.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"icmp pinger"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of ipmi manager processes is high | Indicates potential performance issues with the ipmi manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent/process.ipmi_manager.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"ipmi manager"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of ipmi poller processes is high | Indicates potential performance issues with the ipmi poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent/process.ipmi_poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"ipmi poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of java poller processes is high | Indicates potential performance issues with the java poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent/process.java_poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"java poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of poller processes is high | Indicates potential performance issues with the poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent/process.poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of preprocessing worker processes is high | Indicates potential performance issues with the preprocessing worker, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent/process.preprocessing_worker.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"preprocessing worker"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of preprocessing manager processes is high | Indicates potential performance issues with the preprocessing manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent/process.preprocessing_manager.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"preprocessing manager"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of self-monitoring processes is high | Indicates potential performance issues with the self-monitoring, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent/process.self-monitoring.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"self-monitoring"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of snmp trapper processes is high | Indicates potential performance issues with the snmp trapper, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent/process.snmp_trapper.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"snmp trapper"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of task manager processes is high | Indicates potential performance issues with the task manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent/process.task_manager.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"task manager"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of trapper processes is high | Indicates potential performance issues with the trapper, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent/process.trapper.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"trapper"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of unreachable poller processes is high | Indicates potential performance issues with the unreachable poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent/process.unreachable_poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"unreachable poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of vmware collector processes is high | Indicates potential performance issues with the vmware collector, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent/process.vmware_collector.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"vmware collector"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of agent poller processes is high | Indicates potential performance issues with the agent poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent/process.agent_poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"agent poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of http agent poller processes is high | Indicates potential performance issues with the http agent poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent/process.http_agent_poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"http agent poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of snmp poller processes is high | Indicates potential performance issues with the snmp poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent/process.snmp_poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"snmp poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of internal poller processes is high | Indicates potential performance issues with the internal poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent/process.internal_poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"internal poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of browser poller processes is high | Indicates potential performance issues with the browser poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent/process.browser_poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"browser poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Excessive configuration cache usage | Consider increasing CacheSize in the zabbix_proxy.conf configuration file. |
max(/Zabbix proxy health by Zabbix agent/rcache.buffer.pused,10m)>{$ZABBIX.PROXY.UTIL.MAX:"configuration cache"} |Average |
Manual close: Yes | |
Zabbix proxy: Failed to fetch stats data | Zabbix has not received statistics data for {$ZABBIX.PROXY.NODATA_TIMEOUT}. |
nodata(/Zabbix proxy health by Zabbix agent/rcache.buffer.pused,{$ZABBIX.PROXY.NODATA_TIMEOUT})=1 |Warning |
||
Zabbix proxy: Wrong template assigned | Check that the template has been selected correctly. |
last(/Zabbix proxy health by Zabbix agent/proxy_check)=1 |Disaster |
Manual close: Yes | |
Zabbix proxy: Version has changed | Zabbix proxy version has changed. Acknowledge to close the problem manually. |
last(/Zabbix proxy health by Zabbix agent/version,#1)<>last(/Zabbix proxy health by Zabbix agent/version,#2) and length(last(/Zabbix proxy health by Zabbix agent/version))>0 |Info |
Manual close: Yes | |
Zabbix proxy: Excessive vmware cache usage | Consider increasing VMwareCacheSize in the zabbix_proxy.conf configuration file. |
max(/Zabbix proxy health by Zabbix agent/vmware.buffer.pused,10m)>{$ZABBIX.PROXY.UTIL.MAX:"vmware cache"} |Average |
Manual close: Yes | |
Zabbix proxy: Excessive history cache usage | Consider increasing HistoryCacheSize in the zabbix_proxy.conf configuration file. |
max(/Zabbix proxy health by Zabbix agent/wcache.history.pused,10m)>{$ZABBIX.PROXY.UTIL.MAX:"history cache"} |Average |
Manual close: Yes | |
Zabbix proxy: Excessive history index cache usage | Consider increasing HistoryIndexCacheSize in the zabbix_proxy.conf configuration file. |
max(/Zabbix proxy health by Zabbix agent/wcache.index.pused,10m)>{$ZABBIX.PROXY.UTIL.MAX:"index cache"} |Average |
Manual close: Yes | |
Zabbix proxy: Excessive proxy memory buffer usage | Consider increasing ProxyMemoryBufferSize in the zabbix_proxy.conf configuration file. |
max(/Zabbix proxy health by Zabbix agent/proxy_buffer.buffer.pused,10m)>{$ZABBIX.PROXY.UTIL.MAX:"proxy buffer"} |Average |
Manual close: Yes | |
Zabbix proxy: {HOST.NAME} has been restarted | Uptime is less than 10 minutes. |
last(/Zabbix proxy health by Zabbix agent/uptime)<10m |Info |
Manual close: Yes |
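The cache-usage triggers above each point at a tuning parameter in the zabbix_proxy.conf configuration file. As a minimal illustration, the relevant parameters are listed below; the values are placeholders, not recommendations, and should be sized for your environment:

```
# zabbix_proxy.conf - cache and buffer sizes referenced by the triggers above
CacheSize=64M               # configuration cache
HistoryCacheSize=32M        # history write cache
HistoryIndexCacheSize=8M    # history index cache
VMwareCacheSize=16M         # VMware cache (only used when VMware collectors are started)
ProxyMemoryBufferSize=128M  # proxy memory buffer (0 disables the memory buffer)
```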
This template is designed to monitor internal Zabbix metrics on the remote Zabbix proxy via the active Zabbix agent.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Specify the address of the remote Zabbix proxy by changing the {$ZABBIX.PROXY.ADDRESS}
and {$ZABBIX.PROXY.PORT}
macros. Don't forget to adjust the StatsAllowedIP
parameter in the remote proxy's configuration file to allow the collection of statistics.
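As a hedged example, the statistics access list in the remote proxy's zabbix_proxy.conf might look like the snippet below; 192.0.2.10 stands in for the address of the Zabbix server or proxy that runs the active agent checks:

```
# zabbix_proxy.conf on the remote proxy
# Allow internal statistics (zabbix.stats) to be queried from these addresses
StatsAllowedIP=127.0.0.1,192.0.2.10
```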
Name | Description | Default |
---|---|---|
{$ZABBIX.PROXY.ADDRESS} | IP/DNS/network mask list of proxies to be remotely queried (default is 127.0.0.1). |
|
{$ZABBIX.PROXY.PORT} | Port of proxy to be remotely queried (default is 10051). |
|
{$ZABBIX.PROXY.UTIL.MAX} | Default maximum threshold for percentage utilization triggers (use macro context for specification). |
75 |
{$ZABBIX.PROXY.UTIL.MIN} | Default minimum threshold for percentage utilization triggers (use macro context for specification). |
65 |
{$ZABBIX.PROXY.NODATA_TIMEOUT} | The time threshold after which statistics are considered unavailable. Used in trigger expressions. |
5m |
Name | Description | Type | Key and additional info |
---|---|---|---|
Zabbix stats | The master item of Zabbix proxy statistics. |
Zabbix agent (active) | zabbix.stats[{$ZABBIX.PROXY.ADDRESS},{$ZABBIX.PROXY.PORT}] |
Queue over 10 minutes | The number of monitored items in the queue that are delayed by at least 10 minutes. |
Zabbix agent (active) | zabbix.stats[{$ZABBIX.PROXY.ADDRESS},{$ZABBIX.PROXY.PORT},queue,10m] Preprocessing
|
Queue | The number of monitored items in the queue that are delayed by at least 6 seconds. |
Zabbix agent (active) | zabbix.stats[{$ZABBIX.PROXY.ADDRESS},{$ZABBIX.PROXY.PORT},queue] Preprocessing
|
Utilization of data sender internal processes, in % | The average percentage of the time during which the data sender processes have been busy for the last minute. |
Dependent item | process.data_sender.avg.busy Preprocessing
|
Utilization of availability manager internal processes, in % | The average percentage of the time during which the availability manager processes have been busy for the last minute. |
Dependent item | process.availability_manager.avg.busy Preprocessing
|
Utilization of configuration syncer internal processes, in % | The average percentage of the time during which the configuration syncer processes have been busy for the last minute. |
Dependent item | process.configuration_syncer.avg.busy Preprocessing
|
Utilization of discovery manager internal processes, in % | The average percentage of the time during which the discovery manager processes have been busy for the last minute. |
Dependent item | process.discovery_manager.avg.busy Preprocessing
|
Utilization of discovery worker internal processes, in % | The average percentage of the time during which the discovery worker processes have been busy for the last minute. |
Dependent item | process.discovery_worker.avg.busy Preprocessing
|
Utilization of ODBC poller data collector processes, in % | The average percentage of the time during which the ODBC poller processes have been busy for the last minute. |
Dependent item | process.odbc_poller.avg.busy Preprocessing
|
Utilization of history syncer internal processes, in % | The average percentage of the time during which the history syncer processes have been busy for the last minute. |
Dependent item | process.history_syncer.avg.busy Preprocessing
|
Utilization of housekeeper internal processes, in % | The average percentage of the time during which the housekeeper processes have been busy for the last minute. |
Dependent item | process.housekeeper.avg.busy Preprocessing
|
Utilization of http poller data collector processes, in % | The average percentage of the time during which the http poller processes have been busy for the last minute. |
Dependent item | process.http_poller.avg.busy Preprocessing
|
Utilization of icmp pinger data collector processes, in % | The average percentage of the time during which the icmp pinger processes have been busy for the last minute. |
Dependent item | process.icmp_pinger.avg.busy Preprocessing
|
Utilization of ipmi manager internal processes, in % | The average percentage of the time during which the ipmi manager processes have been busy for the last minute. |
Dependent item | process.ipmi_manager.avg.busy Preprocessing
|
Utilization of ipmi poller data collector processes, in % | The average percentage of the time during which the ipmi poller processes have been busy for the last minute. |
Dependent item | process.ipmi_poller.avg.busy Preprocessing
|
Utilization of java poller data collector processes, in % | The average percentage of the time during which the java poller processes have been busy for the last minute. |
Dependent item | process.java_poller.avg.busy Preprocessing
|
Utilization of poller data collector processes, in % | The average percentage of the time during which the poller processes have been busy for the last minute. |
Dependent item | process.poller.avg.busy Preprocessing
|
Utilization of preprocessing worker internal processes, in % | The average percentage of the time during which the preprocessing worker processes have been busy for the last minute. |
Dependent item | process.preprocessing_worker.avg.busy Preprocessing
|
Utilization of preprocessing manager internal processes, in % | The average percentage of the time during which the preprocessing manager processes have been busy for the last minute. |
Dependent item | process.preprocessing_manager.avg.busy Preprocessing
|
Utilization of self-monitoring internal processes, in % | The average percentage of the time during which the self-monitoring processes have been busy for the last minute. |
Dependent item | process.self-monitoring.avg.busy Preprocessing
|
Utilization of snmp trapper data collector processes, in % | The average percentage of the time during which the snmp trapper processes have been busy for the last minute. |
Dependent item | process.snmp_trapper.avg.busy Preprocessing
|
Utilization of task manager internal processes, in % | The average percentage of the time during which the task manager processes have been busy for the last minute. |
Dependent item | process.task_manager.avg.busy Preprocessing
|
Utilization of trapper data collector processes, in % | The average percentage of the time during which the trapper processes have been busy for the last minute. |
Dependent item | process.trapper.avg.busy Preprocessing
|
Utilization of unreachable poller data collector processes, in % | The average percentage of the time during which the unreachable poller processes have been busy for the last minute. |
Dependent item | process.unreachable_poller.avg.busy Preprocessing
|
Utilization of vmware collector data collector processes, in % | The average percentage of the time during which the vmware collector processes have been busy for the last minute. |
Dependent item | process.vmware_collector.avg.busy Preprocessing
|
Utilization of agent poller data collector processes, in % | The average percentage of the time during which the agent poller processes have been busy for the last minute. |
Dependent item | process.agent_poller.avg.busy Preprocessing
|
Utilization of http agent poller data collector processes, in % | The average percentage of the time during which the http agent poller processes have been busy for the last minute. |
Dependent item | process.http_agent_poller.avg.busy Preprocessing
|
Utilization of snmp poller data collector processes, in % | The average percentage of the time during which the snmp poller processes have been busy for the last minute. |
Dependent item | process.snmp_poller.avg.busy Preprocessing
|
Utilization of internal poller data collector processes, in % | The average percentage of the time during which the internal poller processes have been busy for the last minute. |
Dependent item | process.internal_poller.avg.busy Preprocessing
|
Utilization of browser poller data collector processes, in % | The average percentage of the time during which the browser poller processes have been busy for the last minute. |
Dependent item | process.browser_poller.avg.busy Preprocessing
|
Configuration cache, % used | The availability statistics of Zabbix configuration cache. The percentage of used data buffer. |
Dependent item | rcache.buffer.pused Preprocessing
|
Zabbix proxy check | Flag indicating whether it is a proxy or not. |
Dependent item | proxy_check Preprocessing
|
Version | The version of Zabbix proxy. |
Dependent item | version Preprocessing
|
VMware cache, % used | The availability statistics of Zabbix vmware cache. The percentage of used data buffer. |
Dependent item | vmware.buffer.pused Preprocessing
|
History write cache, % used | The statistics and availability of Zabbix write cache. The percentage of used history buffer. The history cache is used to store item values. A high number indicates database performance problems. |
Dependent item | wcache.history.pused Preprocessing
|
History index cache, % used | The statistics and availability of Zabbix write cache. The percentage of used history index buffer. The history index cache is used to index values stored in the history cache. |
Dependent item | wcache.index.pused Preprocessing
|
Proxy memory buffer, % used | Statistics and availability of the Zabbix proxy memory buffer. The percentage of the used proxy memory buffer. The proxy memory buffer is used to store new historical data and upload it without accessing the database. |
Dependent item | proxy_buffer.buffer.pused Preprocessing
|
Proxy buffer, state | The current working state of proxy buffer where the new data is being stored. Possible values: 0 - disk (also returned when memory buffer is disabled); 1 - memory. |
Dependent item | proxy_buffer.state.current Preprocessing
|
Proxy buffer, state changes | The number of state changes between disk/memory buffer modes since proxy start. |
Dependent item | proxy_buffer.state.changes Preprocessing
|
Number of processed values per second | The statistics and availability of Zabbix write cache. The total number of values processed by Zabbix server or Zabbix proxy, except unsupported items. |
Dependent item | wcache.values Preprocessing
|
Number of processed numeric (unsigned) values per second | The statistics and availability of Zabbix write cache. The number of processed numeric (unsigned) values. |
Dependent item | wcache.values.uint Preprocessing
|
Number of processed numeric (float) values per second | The statistics and availability of Zabbix write cache. The number of processed numeric (float) values. |
Dependent item | wcache.values.float Preprocessing
|
Number of processed log values per second | The statistics and availability of Zabbix write cache. The number of processed log values. |
Dependent item | wcache.values.log Preprocessing
|
Number of processed not supported values per second | The statistics and availability of Zabbix write cache. The number of times the item processing resulted in an item becoming unsupported or remaining in that state. |
Dependent item | wcache.values.not_supported Preprocessing
|
Number of processed character values per second | The statistics and availability of Zabbix write cache. The number of processed character values. |
Dependent item | wcache.values.str Preprocessing
|
Number of processed text values per second | The statistics and availability of Zabbix write cache. The number of processed text values. |
Dependent item | wcache.values.text Preprocessing
|
Preprocessing queue | The number of values enqueued in the preprocessing queue. |
Dependent item | preprocessing_queue Preprocessing
|
Discovery queue | The count of values enqueued in the discovery queue. |
Dependent item | discovery_queue Preprocessing
|
Required VPS | The required performance of a proxy (the number of values that need to be collected per second). |
Dependent item | requiredperformance Preprocessing
|
Uptime | Uptime of the Zabbix proxy process in seconds. |
Dependent item | uptime Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Zabbix proxy: More than 100 items have been missing data for over 10 minutes | Indicates potential issues with network connectivity, agent failures, or unresponsive monitored resources that require attention. |
min(/Zabbix proxy health by Zabbix agent active/zabbix.stats[{$ZABBIX.PROXY.ADDRESS},{$ZABBIX.PROXY.PORT},queue,10m],10m)>100 |Warning |
Manual close: Yes | |
Zabbix proxy: Utilization of data sender processes is high | Indicates potential performance issues with the data sender, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent active/process.data_sender.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"data sender"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of availability manager processes is high | Indicates potential performance issues with the availability manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent active/process.availability_manager.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"availability manager"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of configuration syncer processes is high | Indicates potential performance issues with the configuration syncer, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent active/process.configuration_syncer.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"configuration syncer"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of discovery manager processes is high | Indicates potential performance issues with the discovery manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent active/process.discovery_manager.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"discovery manager"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of discovery worker processes is high | Indicates potential performance issues with the discovery worker, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent active/process.discovery_worker.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"discovery worker"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of ODBC poller processes is high | Indicates potential performance issues with the ODBC poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent active/process.odbc_poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"ODBC poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of history syncer processes is high | Indicates potential performance issues with the history syncer, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent active/process.history_syncer.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"history syncer"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of housekeeper processes is high | Indicates potential performance issues with the housekeeper, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent active/process.housekeeper.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"housekeeper"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of http poller processes is high | Indicates potential performance issues with the http poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent active/process.http_poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"http poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of icmp pinger processes is high | Indicates potential performance issues with the icmp pinger, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent active/process.icmp_pinger.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"icmp pinger"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of ipmi manager processes is high | Indicates potential performance issues with the ipmi manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent active/process.ipmi_manager.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"ipmi manager"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of ipmi poller processes is high | Indicates potential performance issues with the ipmi poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent active/process.ipmi_poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"ipmi poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of java poller processes is high | Indicates potential performance issues with the java poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent active/process.java_poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"java poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of poller processes is high | Indicates potential performance issues with the poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent active/process.poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of preprocessing worker processes is high | Indicates potential performance issues with the preprocessing worker, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent active/process.preprocessing_worker.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"preprocessing worker"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of preprocessing manager processes is high | Indicates potential performance issues with the preprocessing manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent active/process.preprocessing_manager.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"preprocessing manager"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of self-monitoring processes is high | Indicates potential performance issues with the self-monitoring, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent active/process.self-monitoring.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"self-monitoring"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of snmp trapper processes is high | Indicates potential performance issues with the snmp trapper, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent active/process.snmp_trapper.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"snmp trapper"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of task manager processes is high | Indicates potential performance issues with the task manager, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent active/process.task_manager.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"task manager"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of trapper processes is high | Indicates potential performance issues with the trapper, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent active/process.trapper.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"trapper"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of unreachable poller processes is high | Indicates potential performance issues with the unreachable poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent active/process.unreachable_poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"unreachable poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of vmware collector processes is high | Indicates potential performance issues with the vmware collector, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent active/process.vmware_collector.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"vmware collector"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of agent poller processes is high | Indicates potential performance issues with the agent poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent active/process.agent_poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"agent poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of http agent poller processes is high | Indicates potential performance issues with the http agent poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent active/process.http_agent_poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"http agent poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of snmp poller processes is high | Indicates potential performance issues with the snmp poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent active/process.snmp_poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"snmp poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of internal poller processes is high | Indicates potential performance issues with the internal poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent active/process.internal_poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"internal poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Utilization of browser poller processes is high | Indicates potential performance issues with the browser poller, which may affect monitoring efficiency and response times. |
avg(/Zabbix proxy health by Zabbix agent active/process.browser_poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"browser poller"} |Average |
Manual close: Yes | |
Zabbix proxy: Excessive configuration cache usage | Consider increasing CacheSize in the zabbix_proxy.conf configuration file. |
max(/Zabbix proxy health by Zabbix agent active/rcache.buffer.pused,10m)>{$ZABBIX.PROXY.UTIL.MAX:"configuration cache"} |Average |
Manual close: Yes | |
Zabbix proxy: Failed to fetch stats data | Zabbix has not received statistics data for {$ZABBIX.PROXY.NODATA_TIMEOUT}. |
nodata(/Zabbix proxy health by Zabbix agent active/rcache.buffer.pused,{$ZABBIX.PROXY.NODATA_TIMEOUT})=1 |Warning |
||
Zabbix proxy: Wrong template assigned | Check that the template has been selected correctly. |
last(/Zabbix proxy health by Zabbix agent active/proxy_check)=1 |Disaster |
Manual close: Yes | |
Zabbix proxy: Version has changed | Zabbix proxy version has changed. Acknowledge to close the problem manually. |
last(/Zabbix proxy health by Zabbix agent active/version,#1)<>last(/Zabbix proxy health by Zabbix agent active/version,#2) and length(last(/Zabbix proxy health by Zabbix agent active/version))>0 |Info |
Manual close: Yes | |
Zabbix proxy: Excessive vmware cache usage | Consider increasing VMwareCacheSize in the zabbix_proxy.conf configuration file. |
max(/Zabbix proxy health by Zabbix agent active/vmware.buffer.pused,10m)>{$ZABBIX.PROXY.UTIL.MAX:"vmware cache"} |Average |
Manual close: Yes | |
Zabbix proxy: Excessive history cache usage | Consider increasing HistoryCacheSize in the zabbix_proxy.conf configuration file. |
max(/Zabbix proxy health by Zabbix agent active/wcache.history.pused,10m)>{$ZABBIX.PROXY.UTIL.MAX:"history cache"} |Average |
Manual close: Yes | |
Zabbix proxy: Excessive history index cache usage | Consider increasing HistoryIndexCacheSize in the zabbix_proxy.conf configuration file. |
max(/Zabbix proxy health by Zabbix agent active/wcache.index.pused,10m)>{$ZABBIX.PROXY.UTIL.MAX:"index cache"} |Average |
Manual close: Yes | |
Zabbix proxy: Excessive proxy memory buffer usage | Consider increasing ProxyMemoryBufferSize in the zabbix_proxy.conf configuration file. |
max(/Zabbix proxy health by Zabbix agent active/proxy_buffer.buffer.pused,10m)>{$ZABBIX.PROXY.UTIL.MAX:"proxy buffer"} |Average |
Manual close: Yes | |
Zabbix proxy: {HOST.NAME} has been restarted | Uptime is less than 10 minutes. |
last(/Zabbix proxy health by Zabbix agent active/uptime)<10m |Info |
Manual close: Yes |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
Name | Description | Default |
---|---|---|
{$AGENT.TIMEOUT} | Timeout after which agent is considered unavailable. Works only for agents reachable from Zabbix server/proxy (passive mode). |
3m |
Name | Description | Type | Key and additional info |
---|---|---|---|
Version of Zabbix agent running | Zabbix agent | agent.version Preprocessing
|
|
Host name of Zabbix agent running | Zabbix agent | agent.hostname Preprocessing
|
|
Zabbix agent ping | The agent always returns "1" for this item. May be used in combination with nodata() for the availability check. |
Zabbix agent | agent.ping |
Zabbix agent availability | Used for monitoring the availability status of the agent. |
Zabbix internal | zabbix[host,agent,available] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Zabbix agent is not available | For passive agents only, host availability is used with {$AGENT.TIMEOUT} as the time threshold. |
max(/Zabbix agent/zabbix[host,agent,available],{$AGENT.TIMEOUT})=0 |Average |
Manual close: Yes |
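As a quick sanity check of a passive agent (independent of this template), the items above can be queried manually with zabbix_get from the Zabbix server or proxy host; the address and port below are placeholders:

```
zabbix_get -s 192.0.2.15 -p 10050 -k agent.ping     # expected output: 1
zabbix_get -s 192.0.2.15 -p 10050 -k agent.version  # e.g. 7.0.0
```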
Name | Description | Default |
---|---|---|
{$AGENT.NODATA_TIMEOUT} | No data timeout for active agents. Consider keeping it relatively high. |
30m |
{$AGENT.TIMEOUT} | Timeout after which agent is considered unavailable. |
5m |
Name | Description | Type | Key and additional info |
---|---|---|---|
Version of Zabbix agent running | Zabbix agent (active) | agent.version Preprocessing
|
|
Host name of Zabbix agent running | Zabbix agent (active) | agent.hostname Preprocessing
|
|
Zabbix agent ping | The agent always returns "1" for this item. May be used in combination with nodata() for the availability check. |
Zabbix agent (active) | agent.ping |
Active agent availability | Availability of active checks on the host. The value of this item corresponds to availability icons in the host list. Possible values: 0 - unknown; 1 - available; 2 - not available. |
Zabbix internal | zabbix[host,active_agent,available] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Zabbix agent is not available | For active agents, nodata() on agent.ping is used with {$AGENT.NODATA_TIMEOUT} as the time threshold. |
nodata(/Zabbix agent active/agent.ping,{$AGENT.NODATA_TIMEOUT})=1 |Average |
Manual close: Yes | |
Active checks are not available | Active checks are considered unavailable. Agent has not sent a heartbeat for a prolonged time. |
min(/Zabbix agent active/zabbix[host,active_agent,available],{$AGENT.TIMEOUT})=2 |High |
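The active-agent availability item above relies on the agent heartbeat, so the agent side must be configured for active checks. A minimal sketch of the relevant zabbix_agentd.conf lines follows; the server address and host name are assumptions:

```
# zabbix_agentd.conf (active checks)
ServerActive=192.0.2.1        # Zabbix server or proxy that receives active checks
Hostname=web01.example.com    # must match the host name configured in the frontend
HeartbeatFrequency=60         # heartbeat interval in seconds (0 disables the heartbeat)
```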
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
Official JMX Template for WildFly server.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Metrics are collected by JMX. This template works with standalone and domain instances.
Copy jboss-client.jar from /(wildfly,EAP,Jboss,AS)/bin/client into the /usr/share/zabbix-java-gateway/lib directory.
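A minimal sketch of that step, assuming a default WildFly installation under /opt/wildfly and the packaged Zabbix Java gateway paths (both paths are assumptions for your environment):

```
# Make the JBoss remoting client available to the Zabbix Java gateway
cp /opt/wildfly/bin/client/jboss-client.jar /usr/share/zabbix-java-gateway/lib/
systemctl restart zabbix-java-gateway
```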
Name | Description | Default |
---|---|---|
{$WILDFLY.USER} | zabbix |
|
{$WILDFLY.PASSWORD} | zabbix |
|
{$WILDFLY.JMX.PROTOCOL} | remote+http |
|
{$WILDFLY.DEPLOYMENT.MATCHES} | Filter of discoverable deployments |
.* |
{$WILDFLY.DEPLOYMENT.NOT_MATCHES} | Filter to exclude discovered deployments |
CHANGE_IF_NEEDED |
{$WILDFLY.CONN.USAGE.WARN.MAX} | The maximum connection usage percent for trigger expression. |
80 |
{$WILDFLY.CONN.WAIT.MAX.WARN} | The maximum number of waiting connections for trigger expression. |
300 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Launch type | The manner in which the server process was launched. Either "DOMAIN" for a domain mode server launched by a Host Controller, "STANDALONE" for a standalone server launched from the command line, or "EMBEDDED" for a standalone server launched as an embedded part of an application running in the same virtual machine. |
JMX agent | jmx["jboss.as:management-root=server","launchType"] Preprocessing
|
Name | For standalone mode: The name of this server. If not set, defaults to the runtime value of InetAddress.getLocalHost().getHostName(). For domain mode: The name given to this domain. |
JMX agent | jmx["jboss.as:management-root=server","name"] Preprocessing
|
Process type | The type of process represented by this root resource. |
JMX agent | jmx["jboss.as:management-root=server","processType"] Preprocessing
|
Runtime configuration state | The current persistent configuration state, one of starting, ok, reload-required, restart-required, stopping or stopped. |
JMX agent | jmx["jboss.as:management-root=server","runtimeConfigurationState"] Preprocessing
|
Server controller state | The current state of the server controller; either STARTING, RUNNING, RESTART_REQUIRED, RELOAD_REQUIRED or STOPPING. |
JMX agent | jmx["jboss.as:management-root=server","serverState"] Preprocessing
|
Version | The version of the WildFly Core based product release. |
JMX agent | jmx["jboss.as:management-root=server","productVersion"] Preprocessing
|
Uptime | WildFly server uptime. |
JMX agent | jmx["java.lang:type=Runtime","Uptime"] Preprocessing
|
Transactions: Total, rate | The total number of transactions (top-level and nested) created per second. |
JMX agent | jmx["jboss.as:subsystem=transactions","numberOfTransactions"] Preprocessing
|
Transactions: Aborted, rate | The number of aborted (i.e. rolled back) transactions per second. |
JMX agent | jmx["jboss.as:subsystem=transactions","numberOfAbortedTransactions"] Preprocessing
|
Transactions: Application rollbacks, rate | The number of transactions that have been rolled back by application request. This includes those that timeout, since the timeout behavior is considered an attribute of the application configuration. |
JMX agent | jmx["jboss.as:subsystem=transactions","numberOfApplicationRollbacks"] Preprocessing
|
Transactions: Committed, rate | The number of committed transactions. |
JMX agent | jmx["jboss.as:subsystem=transactions","numberOfCommittedTransactions"] Preprocessing
|
Transactions: Heuristics, rate | The number of transactions which have terminated with heuristic outcomes. |
JMX agent | jmx["jboss.as:subsystem=transactions","numberOfHeuristics"] Preprocessing
|
Transactions: Current | The number of transactions that have begun but not yet terminated. |
JMX agent | jmx["jboss.as:subsystem=transactions","numberOfInflightTransactions"] |
Transactions: Nested, rate | The total number of nested (sub) transactions created. |
JMX agent | jmx["jboss.as:subsystem=transactions","numberOfNestedTransactions"] Preprocessing
|
Transactions: ResourceRollbacks, rate | The number of transactions that rolled back due to resource (participant) failure. |
JMX agent | jmx["jboss.as:subsystem=transactions","numberOfResourceRollbacks"] Preprocessing
|
Transactions: System rollbacks, rate | The number of transactions that have been rolled back due to internal system errors. |
JMX agent | jmx["jboss.as:subsystem=transactions","numberOfSystemRollbacks"] Preprocessing
|
Transactions: Timed out, rate | The number of transactions that have rolled back due to timeout. |
JMX agent | jmx["jboss.as:subsystem=transactions","numberOfTimedOutTransactions"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
WildFly Server: Server needs to restart for configuration change. | find(/WildFly Server by JMX/jmx["jboss.as:management-root=server","runtimeConfigurationState"],,"like","ok")=0 |Warning |
|||
WildFly Server: Server controller is not in RUNNING state | find(/WildFly Server by JMX/jmx["jboss.as:management-root=server","serverState"],,"like","running")=0 |Warning |
Depends on:
|
||
WildFly Server: Version has changed | WildFly version has changed. Acknowledge to close the problem manually. |
last(/WildFly Server by JMX/jmx["jboss.as:management-root=server","productVersion"],#1)<>last(/WildFly Server by JMX/jmx["jboss.as:management-root=server","productVersion"],#2) and length(last(/WildFly Server by JMX/jmx["jboss.as:management-root=server","productVersion"]))>0 |Info |
Manual close: Yes | |
WildFly Server: Host has been restarted | Uptime is less than 10 minutes. |
last(/WildFly Server by JMX/jmx["java.lang:type=Runtime","Uptime"])<10m |Info |
Manual close: Yes | |
WildFly Server: Failed to fetch info data | Zabbix has not received data for items for the last 15 minutes |
nodata(/WildFly Server by JMX/jmx["java.lang:type=Runtime","Uptime"],15m)=1 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Deployments discovery | Discovery of deployment metrics. |
JMX agent | jmx.get[beans,"jboss.as.expr:deployment=*"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Deployment [{#DEPLOYMENT}]: Status | The current runtime status of a deployment. Possible status modes are OK, FAILED, and STOPPED. FAILED indicates a dependency is missing or a service could not start. STOPPED indicates that the deployment was not enabled or was manually stopped. |
JMX agent | jmx["{#JMXOBJ}",status] Preprocessing
|
Deployment [{#DEPLOYMENT}]: Enabled | Boolean indicating whether the deployment content is currently deployed in the runtime (or should be deployed in the runtime the next time the server starts). |
JMX agent | jmx["{#JMXOBJ}",enabled] Preprocessing
|
Deployment [{#DEPLOYMENT}]: Managed | Indicates if the deployment is managed (aka uses the ContentRepository). |
JMX agent | jmx["{#JMXOBJ}",managed] Preprocessing
|
Deployment [{#DEPLOYMENT}]: Persistent | Indicates whether the existence of the deployment should be recorded in the persistent server configuration. |
JMX agent | jmx["{#JMXOBJ}",persistent] Preprocessing
|
Deployment [{#DEPLOYMENT}]: Enabled time | The time when the deployment content was enabled in the runtime. |
JMX agent | jmx["{#JMXOBJ}",enabledTime] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
WildFly Server: Deployment [{#DEPLOYMENT}]: Deployment status has changed | Deployment status has changed. Acknowledge to close the problem manually. |
last(/WildFly Server by JMX/jmx["{#JMXOBJ}",status],#1)<>last(/WildFly Server by JMX/jmx["{#JMXOBJ}",status],#2) and length(last(/WildFly Server by JMX/jmx["{#JMXOBJ}",status]))>0 |Warning |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
JDBC metrics discovery | JMX agent | jmx.get[beans,"jboss.as:subsystem=datasources,data-source=*,statistics=jdbc"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#JMXDATASOURCE}: Cache access, rate | The number of times that the statement cache was accessed per second. |
JMX agent | jmx["{#JMXOBJ}",PreparedStatementCacheAccessCount] Preprocessing
|
{#JMXDATASOURCE}: Cache add, rate | The number of statements added to the statement cache per second. |
JMX agent | jmx["{#JMXOBJ}",PreparedStatementCacheAddCount] Preprocessing
|
{#JMXDATASOURCE}: Cache current size | The number of prepared and callable statements currently cached in the statement cache. |
JMX agent | jmx["{#JMXOBJ}",PreparedStatementCacheCurrentSize] |
{#JMXDATASOURCE}: Cache delete, rate | The number of statements discarded from the cache per second. |
JMX agent | jmx["{#JMXOBJ}",PreparedStatementCacheDeleteCount] Preprocessing
|
{#JMXDATASOURCE}: Cache hit, rate | The number of times that statements from the cache were used per second. |
JMX agent | jmx["{#JMXOBJ}",PreparedStatementCacheHitCount] Preprocessing
|
{#JMXDATASOURCE}: Cache miss, rate | The number of times that a statement request could not be satisfied with a statement from the cache per second. |
JMX agent | jmx["{#JMXOBJ}",PreparedStatementCacheMissCount] Preprocessing
|
{#JMXDATASOURCE}: Statistics enabled | Defines whether runtime statistics are enabled or not. |
JMX agent | jmx["{#JMXOBJ}",statisticsEnabled, "JDBC"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
WildFly Server: {#JMXDATASOURCE}: JDBC monitoring statistic is not enabled | last(/WildFly Server by JMX/jmx["{#JMXOBJ}",statisticsEnabled, "JDBC"])=0 |Info |
Name | Description | Type | Key and additional info |
---|---|---|---|
Pools metrics discovery | JMX agent | jmx.get[beans,"jboss.as:subsystem=datasources,data-source=*,statistics=pool"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#JMXDATASOURCE}: Connections: Active | The number of open connections. |
JMX agent | jmx["{#JMXOBJ}",ActiveCount] |
{#JMXDATASOURCE}: Connections: Available | The number of available connections. |
JMX agent | jmx["{#JMXOBJ}",AvailableCount] |
{#JMXDATASOURCE}: Blocking time, avg | The average blocking time for the pool. |
JMX agent | jmx["{#JMXOBJ}",AverageBlockingTime] |
{#JMXDATASOURCE}: Connections: Creating time, avg | The average time spent creating a physical connection. |
JMX agent | jmx["{#JMXOBJ}",AverageCreationTime] |
{#JMXDATASOURCE}: Connections: Get time, avg | The average time spent obtaining a physical connection. |
JMX agent | jmx["{#JMXOBJ}",AverageGetTime] |
{#JMXDATASOURCE}: Connections: Pool time, avg | The average time a physical connection spends in the pool. |
JMX agent | jmx["{#JMXOBJ}",AveragePoolTime] |
{#JMXDATASOURCE}: Connections: Usage time, avg | The average time spent using a physical connection |
JMX agent | jmx["{#JMXOBJ}",AverageUsageTime] |
{#JMXDATASOURCE}: Connections: Blocking failure, rate | The number of failures trying to obtain a physical connection per second. |
JMX agent | jmx["{#JMXOBJ}",BlockingFailureCount] Preprocessing
|
{#JMXDATASOURCE}: Connections: Created, rate | The number of physical connections created per second. |
JMX agent | jmx["{#JMXOBJ}",CreatedCount] Preprocessing
|
{#JMXDATASOURCE}: Connections: Destroyed, rate | The number of physical connections destroyed per second. |
JMX agent | jmx["{#JMXOBJ}",DestroyedCount] Preprocessing
|
{#JMXDATASOURCE}: Connections: Idle | The number of physical connections currently idle. |
JMX agent | jmx["{#JMXOBJ}",IdleCount] |
{#JMXDATASOURCE}: Connections: In use | The number of physical connections currently in use. |
JMX agent | jmx["{#JMXOBJ}",InUseCount] |
{#JMXDATASOURCE}: Connections: Used, max | The maximum number of connections used. |
JMX agent | jmx["{#JMXOBJ}",MaxUsedCount] |
{#JMXDATASOURCE}: Statistics enabled | Defines whether runtime statistics are enabled or not. |
JMX agent | jmx["{#JMXOBJ}",statisticsEnabled] Preprocessing
|
{#JMXDATASOURCE}: Connections: Timed out, rate | The number of connections that timed out per second. |
JMX agent | jmx["{#JMXOBJ}",TimedOut] Preprocessing
|
{#JMXDATASOURCE}: Connections: Wait | The number of requests that had to wait to obtain a physical connection. |
JMX agent | jmx["{#JMXOBJ}",WaitCount] |
{#JMXDATASOURCE}: XA: Commit time, avg | The average time for an XAResource commit invocation. |
JMX agent | jmx["{#JMXOBJ}",XACommitAverageTime] |
{#JMXDATASOURCE}: XA: Commit, rate | The number of XAResource commit invocations per second. |
JMX agent | jmx["{#JMXOBJ}",XACommitCount] Preprocessing
|
{#JMXDATASOURCE}: XA: End time, avg | The average time for an XAResource end invocation. |
JMX agent | jmx["{#JMXOBJ}",XAEndAverageTime] |
{#JMXDATASOURCE}: XA: End, rate | The number of XAResource end invocations per second. |
JMX agent | jmx["{#JMXOBJ}",XAEndCount] Preprocessing
|
{#JMXDATASOURCE}: XA: Forget time, avg | The average time for an XAResource forget invocation. |
JMX agent | jmx["{#JMXOBJ}",XAForgetAverageTime] |
{#JMXDATASOURCE}: XA: Forget, rate | The number of XAResource forget invocations per second. |
JMX agent | jmx["{#JMXOBJ}",XAForgetCount] Preprocessing
|
{#JMXDATASOURCE}: XA: Prepare time, avg | The average time for an XAResource prepare invocation. |
JMX agent | jmx["{#JMXOBJ}",XAPrepareAverageTime] |
{#JMXDATASOURCE}: XA: Prepare, rate | The number of XAResource prepare invocations per second. |
JMX agent | jmx["{#JMXOBJ}",XAPrepareCount] Preprocessing
|
{#JMXDATASOURCE}: XA: Recover time, avg | The average time for an XAResource recover invocation. |
JMX agent | jmx["{#JMXOBJ}",XARecoverAverageTime] |
{#JMXDATASOURCE}: XA: Recover, rate | The number of XAResource recover invocations per second. |
JMX agent | jmx["{#JMXOBJ}",XARecoverCount] Preprocessing
|
{#JMXDATASOURCE}: XA: Rollback time, avg | The average time for an XAResource rollback invocation. |
JMX agent | jmx["{#JMXOBJ}",XARollbackAverageTime] |
{#JMXDATASOURCE}: XA: Rollback, rate | The number of XAResource rollback invocations per second. |
JMX agent | jmx["{#JMXOBJ}",XARollbackCount] Preprocessing
|
{#JMXDATASOURCE}: XA: Start time, avg | The average time for an XAResource start invocation. |
JMX agent | jmx["{#JMXOBJ}",XAStartAverageTime] |
{#JMXDATASOURCE}: XA: Start, rate | The number of XAResource start invocations per second. |
JMX agent | jmx["{#JMXOBJ}",XAStartCount] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
WildFly Server: {#JMXDATASOURCE}: There are no active connections for 5m | max(/WildFly Server by JMX/jmx["{#JMXOBJ}",ActiveCount],5m)=0 |Warning |
|||
WildFly Server: {#JMXDATASOURCE}: Connection usage is too high | min(/WildFly Server by JMX/jmx["{#JMXOBJ}",InUseCount],5m)/last(/WildFly Server by JMX/jmx["{#JMXOBJ}",AvailableCount])*100>{$WILDFLY.CONN.USAGE.WARN.MAX} |High |
|||
WildFly Server: {#JMXDATASOURCE}: Pools monitoring statistic is not enabled | Zabbix has not received data for items for the last 15 minutes |
last(/WildFly Server by JMX/jmx["{#JMXOBJ}",statisticsEnabled])=0 |Info |
||
WildFly Server: {#JMXDATASOURCE}: There are timeout connections | last(/WildFly Server by JMX/jmx["{#JMXOBJ}",TimedOut])>0 |Warning |
|||
WildFly Server: {#JMXDATASOURCE}: Too many waiting connections | min(/WildFly Server by JMX/jmx["{#JMXOBJ}",WaitCount],5m)>{$WILDFLY.CONN.WAIT.MAX.WARN} |Warning |
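The connection-usage trigger above is plain arithmetic: the lowest in-use count over the 5-minute window is divided by the last available count and compared, as a percentage, with {$WILDFLY.CONN.USAGE.WARN.MAX}. A minimal Python sketch of that evaluation (the sample values and the 80% threshold below are illustrative assumptions, not template defaults):

```python
def pool_usage_too_high(in_use_samples, available_count, warn_max_pct=80.0):
    """Mimics the trigger: min(InUseCount, 5m) / last(AvailableCount) * 100 > threshold."""
    if not in_use_samples or available_count <= 0:
        return False
    usage_pct = min(in_use_samples) / available_count * 100
    return usage_pct > warn_max_pct

# 18-20 connections in use out of 25 available -> 72%, below an 80% threshold.
print(pool_usage_too_high([20, 19, 18, 20], available_count=25))  # False
```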
Name | Description | Type | Key and additional info |
---|---|---|---|
Undertow metrics discovery | JMX agent | jmx.get[beans,"jboss.as:subsystem=undertow,server=*,http-listener=*"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Listener {#HTTP_LISTENER}: Errors, rate | The number of 500 responses that have been sent by this listener per second. |
JMX agent | jmx["{#JMXOBJ}",errorCount] Preprocessing
|
Listener {#HTTP_LISTENER}: Requests, rate | The number of requests this listener has served per second. |
JMX agent | jmx["{#JMXOBJ}",requestCount] Preprocessing
|
Listener {#HTTP_LISTENER}: Bytes sent, rate | The number of bytes that have been sent out on this listener per second. |
JMX agent | jmx["{#JMXOBJ}",bytesSent] Preprocessing
|
Listener {#HTTP_LISTENER}: Bytes received, rate | The number of bytes that have been received by this listener per second. |
JMX agent | jmx["{#JMXOBJ}",bytesReceived] Preprocessing
|
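The listener counters above (errorCount, requestCount, bytesSent, bytesReceived) are cumulative totals; the "Change per second" preprocessing step turns consecutive samples into a rate. A minimal sketch of that calculation (illustrative only, not Zabbix internals):

```python
def change_per_second(prev_value, prev_time, curr_value, curr_time):
    """Rate between two counter samples, as the 'Change per second' step computes it."""
    elapsed = curr_time - prev_time
    if elapsed <= 0:
        raise ValueError("the second sample must be newer than the first")
    return (curr_value - prev_value) / elapsed

# requestCount grew from 1200 to 1500 over 60 seconds -> 5 requests per second.
print(change_per_second(1200, 0, 1500, 60))  # 5.0
```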
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
WildFly Server: Listener {#HTTP_LISTENER}: There are 500 responses by this listener. | last(/WildFly Server by JMX/jmx["{#JMXOBJ}",errorCount])>0 |Warning |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
Official JMX Template for WildFly Domain Controller.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Metrics are collected by JMX. This template works with the Domain Controller.
Copy jboss-client.jar from /(wildfly,EAP,Jboss,AS)/bin/client into the /usr/share/zabbix-java-gateway/lib directory, then restart the Zabbix Java gateway.
Name | Description | Default |
---|---|---|
{$WILDFLY.USER} | |
zabbix |
{$WILDFLY.PASSWORD} | |
zabbix |
{$WILDFLY.JMX.PROTOCOL} | |
remote+http |
{$WILDFLY.DEPLOYMENT.MATCHES} | Filter of discoverable deployments |
.* |
{$WILDFLY.DEPLOYMENT.NOT_MATCHES} | Filter to exclude discovered deployments |
CHANGE_IF_NEEDED |
{$WILDFLY.SERVER.MATCHES} | Filter of discoverable servers |
.* |
{$WILDFLY.SERVER.NOT_MATCHES} | Filter to exclude discovered servers |
CHANGE_IF_NEEDED |
Name | Description | Type | Key and additional info |
---|---|---|---|
Launch type | The manner in which the server process was launched. Either "DOMAIN" for a domain mode server launched by a Host Controller, "STANDALONE" for a standalone server launched from the command line, or "EMBEDDED" for a standalone server launched as an embedded part of an application running in the same virtual machine. |
JMX agent | jmx["jboss.as:management-root=server","launchType"] Preprocessing
|
Name | For standalone mode: The name of this server. If not set, defaults to the runtime value of InetAddress.getLocalHost().getHostName(). For domain mode: The name given to this domain |
JMX agent | jmx["jboss.as:management-root=server","name"] Preprocessing
|
Process type | The type of process represented by this root resource. |
JMX agent | jmx["jboss.as:management-root=server","processType"] Preprocessing
|
Version | The version of the WildFly Core based product release. |
JMX agent | jmx["jboss.as:management-root=server","productVersion"] Preprocessing
|
Uptime | WildFly server uptime. |
JMX agent | jmx["java.lang:type=Runtime","Uptime"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
WildFly Domain: WildFly: Version has changed | WildFly version has changed. Acknowledge to close the problem manually. |
last(/WildFly Domain by JMX/jmx["jboss.as:management-root=server","productVersion"],#1)<>last(/WildFly Domain by JMX/jmx["jboss.as:management-root=server","productVersion"],#2) and length(last(/WildFly Domain by JMX/jmx["jboss.as:management-root=server","productVersion"]))>0 |Info |
Manual close: Yes | |
WildFly Domain: WildFly: Host has been restarted | Uptime is less than 10 minutes. |
last(/WildFly Domain by JMX/jmx["java.lang:type=Runtime","Uptime"])<10m |Info |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Deployments discovery | Discovery deployments metrics. |
JMX agent | jmx.get[beans,"jboss.as.expr:deployment=,server-group="] |
Name | Description | Type | Key and additional info |
---|---|---|---|
WildFly deployment [{#DEPLOYMENT}]: Enabled | Boolean indicating whether the deployment content is currently deployed in the runtime (or should be deployed in the runtime the next time the server starts). |
JMX agent | jmx["{#JMXOBJ}",enabled] Preprocessing
|
WildFly deployment [{#DEPLOYMENT}]: Managed | Indicates if the deployment is managed (aka uses the ContentRepository). |
JMX agent | jmx["{#JMXOBJ}",managed] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Servers discovery | Discovery instances in domain. |
JMX agent | jmx.get[beans,"jboss.as:host=master,server-config=*"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Server {#SERVER}: Autostart | Whether or not this server should be started when the Host Controller starts. |
JMX agent | jmx["{#JMXOBJ}",autoStart] Preprocessing
|
Server {#SERVER}: Status | The current status of the server. |
JMX agent | jmx["{#JMXOBJ}",status] Preprocessing
|
Server {#SERVER}: Server group | The name of a server group from the domain model. |
JMX agent | jmx["{#JMXOBJ}",group] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
WildFly Domain: Server {#SERVER}: Server status has changed | Server status has changed. Acknowledge to close the problem manually. |
last(/WildFly Domain by JMX/jmx["{#JMXOBJ}",status],#1)<>last(/WildFly Domain by JMX/jmx["{#JMXOBJ}",status],#2) and length(last(/WildFly Domain by JMX/jmx["{#JMXOBJ}",status]))>0 |Warning |
Manual close: Yes | |
WildFly Domain: Server {#SERVER}: Server group has changed | Server group has changed. Acknowledge to close the problem manually. |
last(/WildFly Domain by JMX/jmx["{#JMXOBJ}",group],#1)<>last(/WildFly Domain by JMX/jmx["{#JMXOBJ}",group],#2) and length(last(/WildFly Domain by JMX/jmx["{#JMXOBJ}",group]))>0 |Info |
Manual close: Yes |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Install WebDriver (for more information, please refer to the Selenium WebDriver page) and run selenium-server. Then set the WebDriver interface HTTP[S] URL in the Zabbix server or proxy configuration file, for example: http://localhost:4444
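The Browser item of this template drives a real browser through the WebDriver endpoint configured above and reads the Navigation Timing counters from the loaded page. The Python/Selenium sketch below illustrates the general idea only; the endpoint URL, target site, and browser options are assumptions, and the template's own Browser-item script may differ:

```python
import json
from selenium import webdriver

# Assumed values; adjust to your selenium-server URL and monitored site.
WEBDRIVER_URL = "http://localhost:4444"
TARGET = "https://www.example.com/"

options = webdriver.ChromeOptions()
options.add_argument("--headless=new")
options.add_argument("--window-size=1920,1080")

driver = webdriver.Remote(command_executor=WEBDRIVER_URL, options=options)
try:
    driver.get(TARGET)
    # PerformanceNavigationTiming entry for the main document.
    nav = driver.execute_script(
        "return JSON.stringify(performance.getEntriesByType('navigation'))"
    )
    print(json.dumps(json.loads(nav), indent=2))
finally:
    driver.quit()
```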
Name | Description | Default |
---|---|---|
{$WEBSITE.BROWSER} | Browser to be used for data collection. |
chrome |
{$WEBSITE.DOMAIN} | The domain name. |
www.example.com |
{$WEBSITE.PATH} | The path to resource. |
|
{$WEBSITE.SCHEME} | The request scheme, which may be either HTTP or HTTPS. |
https |
{$WEBSITE.SCREEN.WIDTH} | Screen size width in pixels, used for screenshot. |
1920 |
{$WEBSITE.SCREEN.HEIGHT} | Screen size height in pixels, used for screenshot. |
1080 |
{$WEBSITE.RESOURCE.LOAD.MAX.WARN} | The maximum browser response time expressed in seconds for a trigger expression. |
5 |
{$WEBSITE.NAVIGATION.LOAD.MAX.WARN} | The maximum browser response time expressed in seconds for a trigger expression. |
5 |
{$WEBSITE.GET.DATA.INTERVAL} | Update interval for get raw data item. |
0s;m/15 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Website {$WEBSITE.DOMAIN} Get data | Returns the JSON with performance counters of the requested website. |
Browser | website.get.data Preprocessing
|
Get metrics check | Checks that the performance counters of the requested website have been received correctly. |
Dependent item | website.metrics.check Preprocessing
|
Website {$WEBSITE.DOMAIN} Screenshot | Website {$WEBSITE.DOMAIN} screenshot. |
Dependent item | website.screenshot Preprocessing
|
Website {$WEBSITE.DOMAIN} Navigation load event time | Measuring of load finished time (loadEventEnd). |
Dependent item | website.navigation.load_time Preprocessing
|
Website {$WEBSITE.DOMAIN} Navigation response time | Measuring of time spent on the response (responseEnd - responseStart). |
Dependent item | website.navigation.response_time Preprocessing
|
Website {$WEBSITE.DOMAIN} Navigation request time | Measuring of time spent on the request (responseStart - requestStart). |
Dependent item | website.navigation.request_time Preprocessing
|
Website {$WEBSITE.DOMAIN} Navigation resource fetch time | Measuring of time spent to fetch the resource (without redirects) (responseEnd - fetchStart). |
Dependent item | website.navigation.resource_fetch_time Preprocessing
|
Website {$WEBSITE.DOMAIN} Navigation service worker processing time | Measuring of the total time spent on the browser's service worker processing (fetchStart - workerStart). |
Dependent item | website.navigation.service_worker_processing_time Preprocessing
|
Website {$WEBSITE.DOMAIN} Navigation domContentLoaded time | Measuring of time spent on DOM content loading (domContentLoadedEventEnd - domContentLoadedEventStart). |
Dependent item | website.navigation.domcontentloaded_time Preprocessing
|
Website {$WEBSITE.DOMAIN} Navigation DNS lookup time | Measuring of time spent on DNS lookup (domainLookupEnd - domainLookupStart). |
Dependent item | website.navigation.dns_lookup_time Preprocessing
|
Website {$WEBSITE.DOMAIN} Navigation TCP handshake time | Measuring of time spent on TCP handshake (connectEnd - connectStart). |
Dependent item | website.navigation.tcp_handshake_time Preprocessing
|
Website {$WEBSITE.DOMAIN} Navigation TLS negotiation time | Measuring of time spent on TLS negotiation (requestStart - secureConnectionStart). |
Dependent item | website.navigation.tls_negotiation_time Preprocessing
|
Website {$WEBSITE.DOMAIN} Navigation encodedBody size | Measuring of encoded size (encodedBodySize). |
Dependent item | website.navigation.encoded_size Preprocessing
|
Website {$WEBSITE.DOMAIN} Navigation decodedBody size | Measuring of total size (decodedBodySize). |
Dependent item | website.navigation.total_size Preprocessing
|
Website {$WEBSITE.DOMAIN} Navigation transfer size | Measuring of transferred size (transferSize). |
Dependent item | website.navigation.transferred_size Preprocessing
|
Website {$WEBSITE.DOMAIN} Resource load event time | Measuring of load finished time (loadEventEnd). |
Dependent item | website.resource.load_time Preprocessing
|
Website {$WEBSITE.DOMAIN} Resource response time | Measuring of time spent on the response (responseEnd - responseStart). |
Dependent item | website.resource.response_time Preprocessing
|
Website {$WEBSITE.DOMAIN} Resource request time | Measuring of time spent on the request (responseStart - requestStart). |
Dependent item | website.resource.request_time Preprocessing
|
Website {$WEBSITE.DOMAIN} Resource fetch time | Measuring of time spent to fetch the resource (without redirects) (responseEnd - fetchStart). |
Dependent item | website.resource.fetch_time Preprocessing
|
Website {$WEBSITE.DOMAIN} Resource service worker processing time | Measuring of the total time spent on the browser's service worker processing (fetchStart - workerStart). |
Dependent item | website.resource.service_worker_processing_time Preprocessing
|
Website {$WEBSITE.DOMAIN} Resource domContentLoaded time | Measuring of time spent on DOM content loading (domContentLoadedEventEnd - domContentLoadedEventStart). |
Dependent item | website.resource.domcontentloaded_time Preprocessing
|
Website {$WEBSITE.DOMAIN} Resource DNS lookup time | Measuring of time spent on DNS lookup (domainLookupEnd - domainLookupStart). |
Dependent item | website.resource.dns_lookup_time Preprocessing
|
Website {$WEBSITE.DOMAIN} Resource TCP handshake time | Measuring of time spent on TCP handshake (connectEnd - connectStart). |
Dependent item | website.resource.tcp_handshake_time Preprocessing
|
Website {$WEBSITE.DOMAIN} Resource TLS negotiation time | Measuring of time spent on TLS negotiation (requestStart - secureConnectionStart). |
Dependent item | website.resource.tls_negotiation_time Preprocessing
|
Website {$WEBSITE.DOMAIN} Resource encodedBody size | Measuring of encoded size (encodedBodySize). |
Dependent item | website.resource.encoded_size Preprocessing
|
Website {$WEBSITE.DOMAIN} Resource decodedBody size | Measuring of total size (decodedBodySize). |
Dependent item | website.resource.total_size Preprocessing
|
Website {$WEBSITE.DOMAIN} Resource transfer size | Measuring of transferred size (transferSize). |
Dependent item | website.resource.transferred_size Preprocessing
|
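Each navigation and resource timing item above is a simple difference of two Navigation/Resource Timing attributes (the attributes are named in parentheses in the descriptions). A minimal sketch of those derivations, using invented sample values in milliseconds:

```python
def derive_timings(entry):
    """Differences used by the navigation/resource timing items above."""
    return {
        "load_time": entry["loadEventEnd"],
        "response_time": entry["responseEnd"] - entry["responseStart"],
        "request_time": entry["responseStart"] - entry["requestStart"],
        "fetch_time": entry["responseEnd"] - entry["fetchStart"],  # without redirects
        "dns_lookup_time": entry["domainLookupEnd"] - entry["domainLookupStart"],
        "tcp_handshake_time": entry["connectEnd"] - entry["connectStart"],
        "tls_negotiation_time": entry["requestStart"] - entry["secureConnectionStart"],
    }

sample = {
    "fetchStart": 10, "domainLookupStart": 12, "domainLookupEnd": 30,
    "connectStart": 30, "connectEnd": 75, "secureConnectionStart": 45,
    "requestStart": 76, "responseStart": 160, "responseEnd": 240,
    "loadEventEnd": 900,
}
print(derive_timings(sample))
```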
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Website by Browser: Failed to get metrics data | Failed to get JSON with performance counters of the requested website '{$WEBSITE.DOMAIN}'. |
length(last(/Website by Browser/website.metrics.check))>0 |High |
||
Website by Browser: Website navigation load event time is too slow | last(/Website by Browser/website.navigation.load_time)>{$WEBSITE.NAVIGATION.LOAD.MAX.WARN} |Warning |
Depends on:
|
||
Website by Browser: Website resource load event time is too slow | last(/Website by Browser/website.resource.load_time)>{$WEBSITE.RESOURCE.LOAD.MAX.WARN} |Warning |
Depends on:
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template set is designed for the effortless deployment of VMware vCenter and ESX hypervisor monitoring and doesn't require any external scripts.
For additional information, please see Zabbix documentation on VM monitoring.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Compile Zabbix server with the required options (--with-libxml2 and --with-libcurl).
Set the StartVMwareCollectors option in the Zabbix server configuration file to "1" or more.
Make sure the VMware user account used for monitoring is a member of the SystemConfiguration.ReadOnly and vStatsGroup groups.
Set the host macros (on the host or template level) required for VMware authentication: {$VMWARE.URL}, {$VMWARE.USERNAME}, {$VMWARE.PASSWORD}.
Note: To enable discovery of hardware sensors of VMware hypervisors, set the macro {$VMWARE.HV.SENSOR.DISCOVERY} to the value true on the discovered host level.
Additional resources:
Name | Description | Default |
---|---|---|
{$VMWARE.URL} | VMware service (vCenter or ESX hypervisor) SDK URL (https://servername/sdk). |
|
{$VMWARE.USERNAME} | VMware service user name. |
|
{$VMWARE.PASSWORD} | VMware service user password. |
|
{$VMWARE.PROXY} | Sets the HTTP proxy for script items. If this parameter is empty, then no proxy is used. |
|
{$VMWARE.HV.SENSOR.DISCOVERY} | Set "true"/"false" to enable or disable monitoring of hardware sensors. |
false |
{$VMWARE.HV.SENSOR.DISCOVERY.NAME.MATCHES} | Sets the regex string of hardware sensor names to be allowed in discovery. |
.* |
{$VMWARE.HV.SENSOR.DISCOVERY.NAME.NOT_MATCHES} | Sets the regex string of hardware sensor names to be ignored in discovery. |
CHANGE_IF_NEEDED |
{$VMWARE.VM.POWERSTATE} | Possibility to filter out VMs by power state. |
poweredOn|poweredOff|suspended |
{$VMWARE.DATASTORE.SPACE.CRIT} | The critical threshold of the datastore free space. |
10 |
{$VMWARE.DATASTORE.SPACE.WARN} | The warning threshold of the datastore free space. |
20 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Get alarms | Get alarm status. |
Simple check | vmware.alarms.get[{$VMWARE.URL}] |
Event log | Collect VMware event log. See also: https://www.zabbix.com/documentation/7.0/manual/config/items/preprocessing/examples#filteringvmwareeventlogrecords |
Simple check | vmware.eventlog[{$VMWARE.URL},skip] |
Full name | VMware service full name. |
Simple check | vmware.fullname[{$VMWARE.URL}] Preprocessing
|
Version | VMware service version. |
Simple check | vmware.version[{$VMWARE.URL}] Preprocessing
|
Get Overall Health VC State | Gets overall health of the system. This item works only with VMware vCenter versions above 6.5. |
Script | vmware.health.get |
Overall Health VC State error check | Data collection error check. |
Dependent item | vmware.health.check Preprocessing
|
Overall Health VC State | VMware Overall health of system. One of the following: - Gray: No health data is available for this service. - Green: Service is healthy. - Yellow: The service is in a healthy state, but experiencing some level of problems. - Orange: The service health is degraded. The service might have serious problems. - Red: The service is unavailable, not functioning properly, or will stop functioning soon. - Not available: The health status is unavailable (not supported on the vCenter or ESXi side). |
Dependent item | vmware.health.state Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
VMware FQDN: Failed to get Overall Health VC State | Failed to get data. Check debug log for more information. |
length(last(/VMware FQDN/vmware.health.check))>0 |Warning |
||
VMware FQDN: Overall Health VC State is not Green | One or more components in the appliance might be in an unusable status and the appliance might soon become unresponsive. |
last(/VMware FQDN/vmware.health.state)>0 and last(/VMware FQDN/vmware.health.state)<>6 |Average |
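The "not Green" trigger above also documents the numeric value map used by the dependent item: per its expression, 0 corresponds to Green and 6 to "Not available", and any other value raises the problem. A minimal sketch of that condition (the numeric codes of the remaining states are not spelled out here):

```python
def health_not_green(state: int) -> bool:
    """Mirrors the trigger: fire when the state is neither Green (0) nor 'Not available' (6)."""
    return state > 0 and state != 6

for value in (0, 2, 6):
    print(value, health_not_green(value))  # 0 -> False, 2 -> True, 6 -> False
```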
Name | Description | Type | Key and additional info |
---|---|---|---|
VMware alarm discovery | Discovery of alarms. |
Dependent item | vmware.alarms.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
{#VMWARE.ALARMS.NAME} | VMware alarm status. |
Dependent item | vmware.alarms.status["{#VMWARE.ALARMS.KEY}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
VMware FQDN: {#VMWARE.ALARMS.NAME} | {#VMWARE.ALARMS.DESC} |
last(/VMware FQDN/vmware.alarms.status["{#VMWARE.ALARMS.KEY}"])<>-1 |Not_classified |
Name | Description | Type | Key and additional info |
---|---|---|---|
VMware cluster discovery | Discovery of clusters. |
Simple check | vmware.cluster.discovery[{$VMWARE.URL}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Status of [{#CLUSTER.NAME}] cluster | VMware cluster status. One of the following: - Gray: Unknown; - Green: OK; - Yellow: It might have a problem; - Red: It has a problem. |
Simple check | vmware.cluster.status[{$VMWARE.URL},{#CLUSTER.NAME}] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
VMware FQDN: The [{#CLUSTER.NAME}] status is Red | A cluster enabled for DRS becomes invalid (red) when the tree is no longer internally consistent, that is, when resource constraints are not observed. See also: https://docs.vmware.com/en/VMware-vSphere/8.0/vsphere-resource-management/GUID-C7417CAA-BD38-41D0-9529-9E7A5898BB12.html |
last(/VMware FQDN/vmware.cluster.status[{$VMWARE.URL},{#CLUSTER.NAME}])=3 |High |
||
VMware FQDN: The [{#CLUSTER.NAME}] status is Yellow | A cluster becomes overcommitted (yellow) when the tree of resource pools and virtual machines is internally consistent but the cluster does not have the capacity to support all the resources reserved by the child resource pools. See also: https://docs.vmware.com/en/VMware-vSphere/8.0/vsphere-resource-management/GUID-ED8240A0-FB54-4A31-BD3D-F23FE740F10C.html |
last(/VMware FQDN/vmware.cluster.status[{$VMWARE.URL},{#CLUSTER.NAME}])=2 |Average |
Depends on:
|
Name | Description | Type | Key and additional info |
---|---|---|---|
VMware datastore discovery | Discovery of VMware datastores. |
Simple check | vmware.datastore.discovery[{$VMWARE.URL}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Average read IOPS of the datastore [{#DATASTORE}] | IOPS for a read operation from the datastore. |
Simple check | vmware.datastore.read[{$VMWARE.URL},{#DATASTORE.UUID},rps] |
Average write IOPS of the datastore [{#DATASTORE}] | IOPS for a write operation to the datastore. |
Simple check | vmware.datastore.write[{$VMWARE.URL},{#DATASTORE.UUID},rps] |
Average read latency of the datastore [{#DATASTORE}] | Amount of time for a read operation from the datastore (milliseconds). |
Simple check | vmware.datastore.read[{$VMWARE.URL},{#DATASTORE.UUID},latency] |
Free space on datastore [{#DATASTORE}] (percentage) | VMware datastore free space (percentage from the total). |
Simple check | vmware.datastore.size[{$VMWARE.URL},{#DATASTORE.UUID},pfree] |
Total size of datastore [{#DATASTORE}] | VMware datastore space in bytes. |
Simple check | vmware.datastore.size[{$VMWARE.URL},{#DATASTORE.UUID}] Preprocessing
|
Average write latency of the datastore [{#DATASTORE}] | Amount of time for a write operation to the datastore (milliseconds). |
Simple check | vmware.datastore.write[{$VMWARE.URL},{#DATASTORE.UUID},latency] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
VMware FQDN: [{#DATASTORE}]: Free space is critically low | Datastore free space has fallen below the critical threshold. |
last(/VMware FQDN/vmware.datastore.size[{$VMWARE.URL},{#DATASTORE.UUID},pfree])<{$VMWARE.DATASTORE.SPACE.CRIT} |High |
||
VMware FQDN: [{#DATASTORE}]: Free space is low | Datastore free space has fallen below the warning threshold. |
last(/VMware FQDN/vmware.datastore.size[{$VMWARE.URL},{#DATASTORE.UUID},pfree])<{$VMWARE.DATASTORE.SPACE.WARN} |Warning |
Depends on:
|
Name | Description | Type | Key and additional info |
---|---|---|---|
VMware hypervisor discovery | Discovery of hypervisors. |
Simple check | vmware.hv.discovery[{$VMWARE.URL}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
VMware VM FQDN discovery | Discovery of guest virtual machines. |
Simple check | vmware.vm.discovery[{$VMWARE.URL}] |
Name | Description | Default |
---|---|---|
{$VMWARE.URL} | VMware service (vCenter or ESX hypervisor) SDK URL (https://servername/sdk). |
|
{$VMWARE.USERNAME} | VMware service user name. |
|
{$VMWARE.PASSWORD} | VMware service user password. |
|
{$VMWARE.VM.FS.PFREE.MIN.WARN} | VMware guest free space threshold for the warning trigger. |
20 |
{$VMWARE.VM.FS.PFREE.MIN.CRIT} | VMware guest free space threshold for the critical trigger. |
10 |
{$VMWARE.VM.FS.TRIGGER.USED} | VMware guest used free space trigger. Set to "1"/"0" to enable or disable the trigger. |
0 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Snapshot consolidation needed | Displays whether snapshot consolidation is needed or not. One of the following: - True; - False. |
Simple check | vmware.vm.consolidationneeded[{$VMWARE.URL},{$VMWARE.VM.UUID}] Preprocessing
|
Snapshot count | Snapshot count of the guest VM. |
Dependent item | vmware.vm.snapshot.count Preprocessing
|
Get snapshots | Snapshots of the guest VM. |
Simple check | vmware.vm.snapshot.get[{$VMWARE.URL},{$VMWARE.VM.UUID}] Preprocessing
|
Snapshot latest date | Latest snapshot date of the guest VM. |
Dependent item | vmware.vm.snapshot.latestdate Preprocessing
|
VM state | VMware virtual machine state. One of the following: - Not running; - Resetting; - Running; - Shutting down; - Standby; - Unknown. |
Simple check | vmware.vm.state[{$VMWARE.URL},{$VMWARE.VM.UUID}] Preprocessing
|
VMware Tools status | Monitoring of VMware Tools. One of the following: - Guest tools executing scripts: VMware Tools is starting. - Guest tools not running: VMware Tools is not running. - Guest tools running: VMware Tools is running. |
Simple check | vmware.vm.tools[{$VMWARE.URL},{$VMWARE.VM.UUID},status] Preprocessing
|
VMware Tools version | Monitoring of the VMware Tools version. |
Simple check | vmware.vm.tools[{$VMWARE.URL},{$VMWARE.VM.UUID},version] Preprocessing
|
Cluster name | Cluster name of the guest VM. |
Simple check | vmware.vm.cluster.name[{$VMWARE.URL},{$VMWARE.VM.UUID}] Preprocessing
|
Number of virtual CPUs | Number of virtual CPUs assigned to the guest. |
Simple check | vmware.vm.cpu.num[{$VMWARE.URL},{$VMWARE.VM.UUID}] Preprocessing
|
CPU ready | Time that the VM was ready, but unable to get scheduled to run on the physical CPU during the last measurement interval (VMware vCenter/ESXi Server performance counter sampling interval - 20 seconds). |
Simple check | vmware.vm.cpu.ready[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
CPU usage | Current upper-bound on CPU usage. The upper-bound is based on the host the VM is currently running on, as well as limits configured on the VM itself or any parent resource pool. Valid while the VM is running. |
Simple check | vmware.vm.cpu.usage[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Datacenter name | Datacenter name of the guest VM. |
Simple check | vmware.vm.datacenter.name[{$VMWARE.URL},{$VMWARE.VM.UUID}] Preprocessing
|
Hypervisor name | Hypervisor name of the guest VM. |
Simple check | vmware.vm.hv.name[{$VMWARE.URL},{$VMWARE.VM.UUID}] Preprocessing
|
Ballooned memory | The amount of guest physical memory that is currently reclaimed through the balloon driver. |
Simple check | vmware.vm.memory.size.ballooned[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Compressed memory | The amount of memory currently in the compression cache for this VM. |
Simple check | vmware.vm.memory.size.compressed[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Private memory | Amount of memory backed by host memory and not being shared. |
Simple check | vmware.vm.memory.size.private[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Shared memory | The amount of guest physical memory shared through transparent page sharing. |
Simple check | vmware.vm.memory.size.shared[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Swapped memory | The amount of guest physical memory swapped out to the VM's swap device by ESX. |
Simple check | vmware.vm.memory.size.swapped[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Guest memory usage | The amount of guest physical memory that is being used by the VM. |
Simple check | vmware.vm.memory.size.usage.guest[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Host memory usage | The amount of host physical memory allocated to the VM, accounting for the amount saved from memory sharing with other VMs. |
Simple check | vmware.vm.memory.size.usage.host[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Memory size | Total size of configured memory. |
Simple check | vmware.vm.memory.size[{$VMWARE.URL},{$VMWARE.VM.UUID}] Preprocessing
|
Power state | The current power state of the VM. One of the following: - Powered off; - Powered on; - Suspended. |
Simple check | vmware.vm.powerstate[{$VMWARE.URL},{$VMWARE.VM.UUID}] Preprocessing
|
Committed storage space | Total storage space, in bytes, committed to this VM across all datastores. |
Simple check | vmware.vm.storage.committed[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Uncommitted storage space | Additional storage space, in bytes, potentially used by this VM on all datastores. |
Simple check | vmware.vm.storage.uncommitted[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Unshared storage space | Total storage space, in bytes, occupied by the VM across all datastores that is not shared with any other VM. |
Simple check | vmware.vm.storage.unshared[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Uptime | System uptime. |
Simple check | vmware.vm.uptime[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Guest memory swapped | Amount of guest physical memory that is swapped out to the swap space. |
Simple check | vmware.vm.guest.memory.size.swapped[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Host memory consumed | Amount of host physical memory consumed for backing guest physical memory pages. |
Simple check | vmware.vm.memory.size.consumed[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Host memory usage in percent | Percentage of host physical memory that has been consumed. |
Simple check | vmware.vm.memory.usage[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
CPU usage in percent | CPU usage as a percentage during the interval. |
Simple check | vmware.vm.cpu.usage.perf[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
CPU latency in percent | Percentage of time the VM is unable to run because it is contending for access to the physical CPU(s). |
Simple check | vmware.vm.cpu.latency[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
CPU readiness latency in percent | Percentage of time that the virtual machine was ready, but was unable to get scheduled to run on the physical CPU. |
Simple check | vmware.vm.cpu.readiness[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
CPU swap-in latency in percent | Percentage of CPU time spent waiting for a swap-in. |
Simple check | vmware.vm.cpu.swapwait[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Uptime of guest OS | Total time elapsed since the last operating system boot-up (in seconds). |
Simple check | vmware.vm.guest.osuptime[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
VMware Guest: Snapshot consolidation needed | Snapshot consolidation needed. |
last(/VMware Guest/vmware.vm.consolidationneeded[{$VMWARE.URL},{$VMWARE.VM.UUID}])=0 |Average |
Manual close: Yes | |
VMware Guest: VM is not running | VMware virtual machine is not running. |
last(/VMware Guest/vmware.vm.state[{$VMWARE.URL},{$VMWARE.VM.UUID}]) <> 2 |Average |
||
VMware Guest: VMware Tools is not running | VMware Tools is not running on the VM. |
last(/VMware Guest/vmware.vm.tools[{$VMWARE.URL},{$VMWARE.VM.UUID},status]) = 1 |Warning |
Depends on:
|
|
VMware Guest: VM has been restarted | Uptime is less than 10 minutes. |
(between(last(/VMware Guest/vmware.vm.guest.osuptime[{$VMWARE.URL},{$VMWARE.VM.UUID}]),1,10m)=1 or between(last(/VMware Guest/vmware.vm.uptime[{$VMWARE.URL},{$VMWARE.VM.UUID}]),1,10m)=1) and last(/VMware Guest/vmware.vm.powerstate[{$VMWARE.URL},{$VMWARE.VM.UUID}]) = 1 |Warning |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Network device discovery | Discovery of all network devices. |
Simple check | vmware.vm.net.if.discovery[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Number of bytes received on interface [{#IFBACKINGDEVICE}]/[{#IFDESC}] | VMware virtual machine network interface input statistics (bytes per second). |
Simple check | vmware.vm.net.if.in[{$VMWARE.URL},{$VMWARE.VM.UUID},{#IFNAME},bps] |
Number of packets received on interface [{#IFBACKINGDEVICE}]/[{#IFDESC}] | VMware virtual machine network interface input statistics (packets per second). |
Simple check | vmware.vm.net.if.in[{$VMWARE.URL},{$VMWARE.VM.UUID},{#IFNAME},pps] |
Number of bytes transmitted on interface [{#IFBACKINGDEVICE}]/[{#IFDESC}] | VMware virtual machine network interface output statistics (bytes per second). |
Simple check | vmware.vm.net.if.out[{$VMWARE.URL},{$VMWARE.VM.UUID},{#IFNAME},bps] |
Number of packets transmitted on interface [{#IFBACKINGDEVICE}]/[{#IFDESC}] | VMware virtual machine network interface output statistics (packets per second). |
Simple check | vmware.vm.net.if.out[{$VMWARE.URL},{$VMWARE.VM.UUID},{#IFNAME},pps] |
Network utilization on interface [{#IFBACKINGDEVICE}]/[{#IFDESC}] | VMware virtual machine network utilization (combined transmit and receive rates) during the interval. |
Simple check | vmware.vm.net.if.usage[{$VMWARE.URL},{$VMWARE.VM.UUID},{#IFNAME}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Disk device discovery | Discovery of all disk devices. |
Simple check | vmware.vm.vfs.dev.discovery[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Average number of bytes read from the disk [{#DISKDESC}] | VMware virtual machine disk device read statistics (bytes per second). |
Simple check | vmware.vm.vfs.dev.read[{$VMWARE.URL},{$VMWARE.VM.UUID},{#DISKNAME},bps] |
Average number of reads from the disk [{#DISKDESC}] | VMware virtual machine disk device read statistics (operations per second). |
Simple check | vmware.vm.vfs.dev.read[{$VMWARE.URL},{$VMWARE.VM.UUID},{#DISKNAME},ops] |
Average number of bytes written to the disk [{#DISKDESC}] | VMware virtual machine disk device write statistics (bytes per second). |
Simple check | vmware.vm.vfs.dev.write[{$VMWARE.URL},{$VMWARE.VM.UUID},{#DISKNAME},bps] |
Average number of writes to the disk [{#DISKDESC}] | VMware virtual machine disk device write statistics (operations per second). |
Simple check | vmware.vm.vfs.dev.write[{$VMWARE.URL},{$VMWARE.VM.UUID},{#DISKNAME},ops] |
Average number of outstanding read requests to the disk [{#DISKDESC}] | Average number of outstanding read requests to the virtual disk during the collection interval. |
Simple check | vmware.vm.storage.readoio[{$VMWARE.URL},{$VMWARE.VM.UUID},{#DISKNAME}] |
Average number of outstanding write requests to the disk [{#DISKDESC}] | Average number of outstanding write requests to the virtual disk during the collection interval. |
Simple check | vmware.vm.storage.writeoio[{$VMWARE.URL},{$VMWARE.VM.UUID},{#DISKNAME}] |
Average write latency to the disk [{#DISKDESC}] | The average time a write to the virtual disk takes. |
Simple check | vmware.vm.storage.totalwritelatency[{$VMWARE.URL},{$VMWARE.VM.UUID},{#DISKNAME}] |
Average read latency from the disk [{#DISKDESC}] | The average time a read from the virtual disk takes. |
Simple check | vmware.vm.storage.totalreadlatency[{$VMWARE.URL},{$VMWARE.VM.UUID},{#DISKNAME}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Mounted filesystem discovery | Discovery of all guest file systems. |
Simple check | vmware.vm.vfs.fs.discovery[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Free disk space on [{#FSNAME}] | VMware virtual machine file system statistics (bytes). |
Simple check | vmware.vm.vfs.fs.size[{$VMWARE.URL},{$VMWARE.VM.UUID},{#FSNAME},free] |
Free disk space on [{#FSNAME}] (percentage) | VMware virtual machine file system statistics (percentage). |
Simple check | vmware.vm.vfs.fs.size[{$VMWARE.URL},{$VMWARE.VM.UUID},{#FSNAME},pfree] |
Total disk space on [{#FSNAME}] | VMware virtual machine total disk space (bytes). |
Simple check | vmware.vm.vfs.fs.size[{$VMWARE.URL},{$VMWARE.VM.UUID},{#FSNAME},total] Preprocessing
|
Used disk space on [{#FSNAME}] | VMware virtual machine used disk space (bytes). |
Simple check | vmware.vm.vfs.fs.size[{$VMWARE.URL},{$VMWARE.VM.UUID},{#FSNAME},used] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
VMware Guest: [{#FSNAME}]: Disk space is critically low | The disk free space on [{#FSNAME}] has been less than {$VMWARE.VM.FS.PFREE.MIN.CRIT:"{#FSNAME}"}% for the last 5 minutes. |
max(/VMware Guest/vmware.vm.vfs.fs.size[{$VMWARE.URL},{$VMWARE.VM.UUID},{#FSNAME},pfree],5m)<{$VMWARE.VM.FS.PFREE.MIN.CRIT:"{#FSNAME}"} and {$VMWARE.VM.FS.TRIGGER.USED:"{#FSNAME}"}=1 |Average |
Manual close: Yes | |
VMware Guest: [{#FSNAME}]: Disk space is low | The disk free space on [{#FSNAME}] has been less than {$VMWARE.VM.FS.PFREE.MIN.WARN:"{#FSNAME}"}% for the last 5 minutes. |
max(/VMware Guest/vmware.vm.vfs.fs.size[{$VMWARE.URL},{$VMWARE.VM.UUID},{#FSNAME},pfree],5m)<{$VMWARE.VM.FS.PFREE.MIN.WARN:"{#FSNAME}"} and {$VMWARE.VM.FS.TRIGGER.USED:"{#FSNAME}"}=1 |Warning |
Manual close: Yes Depends on:
|
This template is designed for the effortless deployment of VMware ESX hypervisor monitoring and doesn't require any external scripts.
This template can be used in discovery as well as manually linked to a host.
For additional information, please see Zabbix documentation on VM monitoring.
To use this template as manually linked to a host, attach it to the host and manually set the value of the {$VMWARE.HV.UUID}
macro.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
To use this template as manually linked to a host:
Compile Zabbix server with the required options (--with-libxml2 and --with-libcurl).
Set the StartVMwareCollectors option in the Zabbix server configuration file to "1" or more.
Set the host macros {$VMWARE.URL}, {$VMWARE.USERNAME}, {$VMWARE.PASSWORD}, and {$VMWARE.HV.UUID}.
The hypervisor UUID for {$VMWARE.HV.UUID} can be obtained, for example, by running the following command on the ESXi host:
vim-cmd hostsvc/hostsummary | grep uuid
Name | Description | Default |
---|---|---|
{$VMWARE.URL} | VMware service (vCenter or ESX hypervisor) SDK URL (https://servername/sdk). |
|
{$VMWARE.USERNAME} | VMware service user name. |
|
{$VMWARE.PASSWORD} | VMware service user password. |
|
{$VMWARE.HV.SENSOR.DISCOVERY} | Set to "true"/"false" to enable or disable the monitoring of hardware sensors. |
false |
{$VMWARE.HV.SENSOR.DISCOVERY.NAME.MATCHES} | Sets the regex string of hardware sensor names to be allowed in discovery. |
.* |
{$VMWARE.HV.SENSOR.DISCOVERY.NAME.NOT_MATCHES} | Sets the regex string of hardware sensor names to be ignored in discovery. |
CHANGE_IF_NEEDED |
{$VMWARE.HV.DATASTORE.SPACE.CRIT} | The critical threshold of the datastore free space. |
10 |
{$VMWARE.HV.DATASTORE.SPACE.WARN} | The warning threshold of the datastore free space. |
20 |
{$VMWARE.HV.UUID} | UUID of hypervisor. |
Name | Description | Type | Key and additional info |
---|---|---|---|
Connection state | VMware hypervisor connection state. One of the following: - Connected; - Disconnected; - Not responding. |
Simple check | vmware.hv.connectionstate[{$VMWARE.URL},{$VMWARE.HV.UUID}] Preprocessing
|
Number of errors received | VMware hypervisor network input statistics (errors). |
Simple check | vmware.hv.network.in[{$VMWARE.URL},{$VMWARE.HV.UUID},errors] |
Number of broadcasts received | VMware hypervisor network input statistics (broadcasts). |
Simple check | vmware.hv.network.in[{$VMWARE.URL},{$VMWARE.HV.UUID},broadcast] |
Number of dropped received packets | VMware hypervisor network input statistics (packets dropped). |
Simple check | vmware.hv.network.in[{$VMWARE.URL},{$VMWARE.HV.UUID},dropped] |
Number of broadcasts transmitted | VMware hypervisor network output statistics (broadcasts). |
Simple check | vmware.hv.network.out[{$VMWARE.URL},{$VMWARE.HV.UUID},broadcast] |
Number of dropped transmitted packets | VMware hypervisor network output statistics (packets dropped). |
Simple check | vmware.hv.network.out[{$VMWARE.URL},{$VMWARE.HV.UUID},dropped] |
Number of errors transmitted | VMware hypervisor network output statistics (errors). |
Simple check | vmware.hv.network.out[{$VMWARE.URL},{$VMWARE.HV.UUID},errors] |
Hypervisor ping | Checks if the hypervisor is running and accepting ICMP pings. One of the following: - Down; - Up. |
Simple check | icmpping[] Preprocessing
|
Cluster name | Cluster name of the hypervisor. |
Simple check | vmware.hv.cluster.name[{$VMWARE.URL},{$VMWARE.HV.UUID}] Preprocessing
|
CPU usage | Aggregated CPU usage across all cores on the host in Hz. This is only available if the host is connected. |
Simple check | vmware.hv.cpu.usage[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
CPU usage in percent | CPU usage as a percentage during the interval. |
Simple check | vmware.hv.cpu.usage.perf[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
CPU utilization | CPU utilization as a percentage during the interval; depends on power management or hyper-threading. |
Simple check | vmware.hv.cpu.utilization[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
Power usage | Current power usage. |
Simple check | vmware.hv.power[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
Power usage maximum allowed | Maximum allowed power usage. |
Simple check | vmware.hv.power[{$VMWARE.URL},{$VMWARE.HV.UUID},max] Preprocessing
|
Datacenter name | Datacenter name of the hypervisor. |
Simple check | vmware.hv.datacenter.name[{$VMWARE.URL},{$VMWARE.HV.UUID}] Preprocessing
|
Full name | The complete product name, including the version information. |
Simple check | vmware.hv.fullname[{$VMWARE.URL},{$VMWARE.HV.UUID}] Preprocessing
|
CPU frequency | The speed of the CPU cores. This is an average value if there are multiple speeds. The product of CPU frequency and the number of cores is approximately equal to the sum of the MHz for all the individual cores on the host. |
Simple check | vmware.hv.hw.cpu.freq[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
CPU model | The CPU model. |
Simple check | vmware.hv.hw.cpu.model[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
CPU cores | Number of physical CPU cores on the host. Physical CPU cores are the processors contained by a CPU package. |
Simple check | vmware.hv.hw.cpu.num[{$VMWARE.URL},{$VMWARE.HV.UUID}] Preprocessing
|
CPU threads | Number of physical CPU threads on the host. |
Simple check | vmware.hv.hw.cpu.threads[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
Total memory | The physical memory size. |
Simple check | vmware.hv.hw.memory[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
Model | The system model identification. |
Simple check | vmware.hv.hw.model[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
Bios UUID | The hardware BIOS identification. |
Simple check | vmware.hv.hw.uuid[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
Vendor | The hardware vendor identification. |
Simple check | vmware.hv.hw.vendor[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
Ballooned memory | The amount of guest physical memory that is currently reclaimed through the balloon driver. Sum of all guest VMs. |
Simple check | vmware.hv.memory.size.ballooned[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
Used memory | Physical memory usage on the host. |
Simple check | vmware.hv.memory.used[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
Number of bytes received | VMware hypervisor network input statistics (bytes per second). |
Simple check | vmware.hv.network.in[{$VMWARE.URL},{$VMWARE.HV.UUID},bps] |
Number of bytes transmitted | VMware hypervisor network output statistics (bytes per second). |
Simple check | vmware.hv.network.out[{$VMWARE.URL},{$VMWARE.HV.UUID},bps] |
Overall status | The overall alarm status of the host. One of the following: - Gray: Unknown; - Green: OK; - Yellow: It might have a problem; - Red: It has a problem. |
Simple check | vmware.hv.status[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
Uptime | System uptime. |
Simple check | vmware.hv.uptime[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
Version | Dot-separated version string. |
Simple check | vmware.hv.version[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
Number of guest VMs | Number of guest virtual machines. |
Simple check | vmware.hv.vm.num[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
Get sensors | Master item for sensor data. |
Simple check | vmware.hv.sensors.get[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
VMware Hypervisor: Hypervisor is down | The service is unavailable or is not accepting ICMP pings. |
last(/VMware Hypervisor/icmpping[])=0 |Average |
Manual close: Yes | |
VMware Hypervisor: The {$VMWARE.HV.UUID} health is Red | One or more components in the appliance might be in an unusable status and the appliance might soon become unresponsive. Security patches might be available. |
last(/VMware Hypervisor/vmware.hv.status[{$VMWARE.URL},{$VMWARE.HV.UUID}])=3 |High |
||
VMware Hypervisor: The {$VMWARE.HV.UUID} health is Yellow | One or more components in the appliance might soon become overloaded. |
last(/VMware Hypervisor/vmware.hv.status[{$VMWARE.URL},{$VMWARE.HV.UUID}])=2 |Average |
Depends on:
|
|
VMware Hypervisor: Hypervisor has been restarted | Uptime is less than 10 minutes. |
last(/VMware Hypervisor/vmware.hv.uptime[{$VMWARE.URL},{$VMWARE.HV.UUID}])<10m |Warning |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Network interface discovery | Discovery of VMware hypervisor network interfaces. |
Simple check | vmware.hv.net.if.discovery[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
[{#IFNAME}] network interface speed | VMware hypervisor network interface speed. |
Simple check | vmware.hv.network.linkspeed[{$VMWARE.URL},{$VMWARE.HV.UUID},{#IFNAME}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Datastore discovery | Discovery of VMware datastores. |
Simple check | vmware.hv.datastore.discovery[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Average read IOPS of the datastore [{#DATASTORE}] | Average IOPS for a read operation from the datastore. |
Simple check | vmware.hv.datastore.read[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE.UUID},rps] |
Average write IOPS of the datastore [{#DATASTORE}] | Average IOPS for a write operation to the datastore. |
Simple check | vmware.hv.datastore.write[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE.UUID},rps] |
Average read latency of the datastore [{#DATASTORE}] | Average amount of time for a read operation from the datastore (milliseconds). |
Simple check | vmware.hv.datastore.read[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE.UUID},latency] |
Free space on datastore [{#DATASTORE}] (percentage) | VMware datastore free space (percentage from the total). |
Simple check | vmware.hv.datastore.size[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE.UUID},pfree] |
Total size of datastore [{#DATASTORE}] | VMware datastore space in bytes. |
Simple check | vmware.hv.datastore.size[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE.UUID}] Preprocessing
|
Average write latency of the datastore [{#DATASTORE}] | Average amount of time for a write operation to the datastore (milliseconds). |
Simple check | vmware.hv.datastore.write[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE.UUID},latency] |
Multipath count for datastore [{#DATASTORE}] | Number of available datastore paths. |
Simple check | vmware.hv.datastore.multipath[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE.UUID}] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
VMware Hypervisor: [{#DATASTORE}]: Free space is critically low | Datastore free space has fallen below the critical threshold. |
last(/VMware Hypervisor/vmware.hv.datastore.size[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE.UUID},pfree])<{$VMWARE.HV.DATASTORE.SPACE.CRIT} |High |
||
VMware Hypervisor: [{#DATASTORE}]: Free space is low | Datastore free space has fallen below the warning threshold. |
last(/VMware Hypervisor/vmware.hv.datastore.size[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE.UUID},pfree])<{$VMWARE.HV.DATASTORE.SPACE.WARN} |Warning |
Depends on:
|
|
VMware Hypervisor: The multipath count has been changed | The number of available datastore paths is less than registered ({#MULTIPATH.COUNT}). |
last(/VMware Hypervisor/vmware.hv.datastore.multipath[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE.UUID}],#1)<>last(/VMware Hypervisor/vmware.hv.datastore.multipath[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE.UUID}],#2) and last(/VMware Hypervisor/vmware.hv.datastore.multipath[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE.UUID}])<{#MULTIPATH.COUNT} |Average |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Serial number discovery | VMware hypervisor serial number discovery. This item works only with VMware hypervisor versions above 6.7. |
Dependent item | vmware.hv.serial.number.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Serial number | VMware hypervisor serial number. |
Simple check | vmware.hv.hw.serialnumber[{$VMWARE.URL},{#VMWARE.HV.UUID}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Healthcheck discovery | VMware Rollup Health State sensor discovery. |
Dependent item | vmware.hv.healthcheck.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Health state rollup | The host's Rollup Health State sensor value. One of the following: - Gray: Unknown; - Green: OK; - Yellow: It might have a problem; - Red: It has a problem. |
Dependent item | vmware.hv.sensor.health.state[{#SINGLETON}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
VMware Hypervisor: The {$VMWARE.HV.UUID} health is Red | One or more components in the appliance might be in an unusable status and the appliance might soon become unresponsive. Security patches might be available. |
last(/VMware Hypervisor/vmware.hv.sensor.health.state[{#SINGLETON}])=3 |High |
Depends on:
|
|
VMware Hypervisor: The {$VMWARE.HV.UUID} health is Yellow | One or more components in the appliance might soon become overloaded. |
last(/VMware Hypervisor/vmware.hv.sensor.health.state[{#SINGLETON}])=2 |Average |
Depends on:
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Sensor discovery | VMware hardware sensor discovery. The data is retrieved from numeric sensor probes and provides information about the health of the physical system. |
Dependent item | vmware.hv.sensors.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Sensor [{#NAME}] health state | VMware hardware sensor health state. One of the following: - Gray: Unknown; - Green: OK; - Yellow: It might have a problem; - Red: It has a problem. |
Dependent item | vmware.hv.sensor.state["{#NAME}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
VMware Hypervisor: Sensor [{#NAME}] health state is Red | One or more components in the appliance might be in an unusable status and the appliance might soon become unresponsive. |
last(/VMware Hypervisor/vmware.hv.sensor.state["{#NAME}"])=3 |High |
Depends on:
|
|
VMware Hypervisor: Sensor [{#NAME}] health state is Yellow | One or more components in the appliance might soon become overloaded. |
last(/VMware Hypervisor/vmware.hv.sensor.state["{#NAME}"])=2 |Average |
Depends on:
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template set is designed for the effortless deployment of VMware vCenter and ESX hypervisor monitoring and doesn't require any external scripts.
For additional information, please see Zabbix documentation on VM monitoring.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Compile Zabbix server with the required options (--with-libxml2 and --with-libcurl).
Set the StartVMwareCollectors option in the Zabbix server configuration file to "1" or more.
Make sure the VMware user account used for monitoring is a member of the SystemConfiguration.ReadOnly and vStatsGroup groups.
Set the host macros (on the host or template level) required for VMware authentication: {$VMWARE.URL}, {$VMWARE.USERNAME}, {$VMWARE.PASSWORD}.
Note: To enable discovery of hardware sensors of VMware hypervisors, set the macro {$VMWARE.HV.SENSOR.DISCOVERY} to the value true on the discovered host level.
Additional resources:
Name | Description | Default |
---|---|---|
{$VMWARE.URL} | VMware service (vCenter or ESX hypervisor) SDK URL (https://servername/sdk). |
|
{$VMWARE.USERNAME} | VMware service user name. |
|
{$VMWARE.PASSWORD} | VMware service user password. |
|
{$VMWARE.PROXY} | Sets the HTTP proxy for script items. If this parameter is empty, then no proxy is used. |
|
{$VMWARE.HV.SENSOR.DISCOVERY} | Set "true"/"false" to enable or disable monitoring of hardware sensors. |
false |
{$VMWARE.HV.SENSOR.DISCOVERY.NAME.MATCHES} | Sets the regex string of hardware sensor names to be allowed in discovery. |
.* |
{$VMWARE.HV.SENSOR.DISCOVERY.NAME.NOT_MATCHES} | Sets the regex string of hardware sensor names to be ignored in discovery. |
CHANGE_IF_NEEDED |
{$VMWARE.VM.POWERSTATE} | Possibility to filter out VMs by power state. |
poweredOn|poweredOff|suspended |
{$VMWARE.DATASTORE.SPACE.CRIT} | The critical threshold of the datastore free space. |
10 |
{$VMWARE.DATASTORE.SPACE.WARN} | The warning threshold of the datastore free space. |
20 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Get alarms | Get alarm status. |
Simple check | vmware.alarms.get[{$VMWARE.URL}] |
Event log | Collect VMware event log. See also: https://www.zabbix.com/documentation/7.0/manual/config/items/preprocessing/examples#filteringvmwareeventlogrecords |
Simple check | vmware.eventlog[{$VMWARE.URL},skip] |
Full name | VMware service full name. |
Simple check | vmware.fullname[{$VMWARE.URL}] Preprocessing
|
Version | VMware service version. |
Simple check | vmware.version[{$VMWARE.URL}] Preprocessing
|
Get Overall Health VC State | Gets overall health of the system. This item works only with VMware vCenter versions above 6.5. |
Script | vmware.health.get |
Overall Health VC State error check | Data collection error check. |
Dependent item | vmware.health.check Preprocessing
|
Overall Health VC State | VMware Overall health of system. One of the following: - Gray: No health data is available for this service. - Green: Service is healthy. - Yellow: The service is in a healthy state, but experiencing some level of problems. - Orange: The service health is degraded. The service might have serious problems. - Red: The service is unavailable, not functioning properly, or will stop functioning soon. - Not available: The health status is unavailable (not supported on the vCenter or ESXi side). |
Dependent item | vmware.health.state Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
VMware: Failed to get Overall Health VC State | Failed to get data. Check debug log for more information. |
length(last(/VMware/vmware.health.check))>0 |Warning |
||
VMware: Overall Health VC State is not Green | One or more components in the appliance might be in an unusable status and the appliance might soon become unresponsive. |
last(/VMware/vmware.health.state)>0 and last(/VMware/vmware.health.state)<>6 |Average |
Name | Description | Type | Key and additional info |
---|---|---|---|
VMware alarm discovery | Discovery of alarms. |
Dependent item | vmware.alarms.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
{#VMWARE.ALARMS.NAME} | VMware alarm status. |
Dependent item | vmware.alarms.status["{#VMWARE.ALARMS.KEY}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
VMware: {#VMWARE.ALARMS.NAME} | {#VMWARE.ALARMS.DESC} |
last(/VMware/vmware.alarms.status["{#VMWARE.ALARMS.KEY}"])<>-1 |Not_classified |
Name | Description | Type | Key and additional info |
---|---|---|---|
VMware cluster discovery | Discovery of clusters. |
Simple check | vmware.cluster.discovery[{$VMWARE.URL}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Status of [{#CLUSTER.NAME}] cluster | VMware cluster status. One of the following: - Gray: Unknown; - Green: OK; - Yellow: It might have a problem; - Red: It has a problem. |
Simple check | vmware.cluster.status[{$VMWARE.URL},{#CLUSTER.NAME}] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
VMware: The [{#CLUSTER.NAME}] status is Red | A cluster enabled for DRS becomes invalid (red) when the tree is no longer internally consistent, that is, when resource constraints are not observed. See also: https://docs.vmware.com/en/VMware-vSphere/8.0/vsphere-resource-management/GUID-C7417CAA-BD38-41D0-9529-9E7A5898BB12.html |
last(/VMware/vmware.cluster.status[{$VMWARE.URL},{#CLUSTER.NAME}])=3 |High |
||
VMware: The [{#CLUSTER.NAME}] status is Yellow | A cluster becomes overcommitted (yellow) when the tree of resource pools and virtual machines is internally consistent but the cluster does not have the capacity to support all the resources reserved by the child resource pools. See also: https://docs.vmware.com/en/VMware-vSphere/8.0/vsphere-resource-management/GUID-ED8240A0-FB54-4A31-BD3D-F23FE740F10C.html |
last(/VMware/vmware.cluster.status[{$VMWARE.URL},{#CLUSTER.NAME}])=2 |Average |
Depends on:
|
Name | Description | Type | Key and additional info |
---|---|---|---|
VMware datastore discovery | Discovery of VMware datastores. |
Simple check | vmware.datastore.discovery[{$VMWARE.URL}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Average read IOPS of the datastore [{#DATASTORE}] | IOPS for a read operation from the datastore. |
Simple check | vmware.datastore.read[{$VMWARE.URL},{#DATASTORE.UUID},rps] |
Average write IOPS of the datastore [{#DATASTORE}] | IOPS for a write operation to the datastore. |
Simple check | vmware.datastore.write[{$VMWARE.URL},{#DATASTORE.UUID},rps] |
Average read latency of the datastore [{#DATASTORE}] | Amount of time for a read operation from the datastore (milliseconds). |
Simple check | vmware.datastore.read[{$VMWARE.URL},{#DATASTORE.UUID},latency] |
Free space on datastore [{#DATASTORE}] (percentage) | VMware datastore free space (percentage from the total). |
Simple check | vmware.datastore.size[{$VMWARE.URL},{#DATASTORE.UUID},pfree] |
Total size of datastore [{#DATASTORE}] | VMware datastore space in bytes. |
Simple check | vmware.datastore.size[{$VMWARE.URL},{#DATASTORE.UUID}] Preprocessing
|
Average write latency of the datastore [{#DATASTORE}] | Amount of time for a write operation to the datastore (milliseconds). |
Simple check | vmware.datastore.write[{$VMWARE.URL},{#DATASTORE.UUID},latency] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
VMware: [{#DATASTORE}]: Free space is critically low | Datastore free space has fallen below the critical threshold. |
last(/VMware/vmware.datastore.size[{$VMWARE.URL},{#DATASTORE.UUID},pfree])<{$VMWARE.DATASTORE.SPACE.CRIT} |High |
||
VMware: [{#DATASTORE}]: Free space is low | Datastore free space has fallen below the warning threshold. |
last(/VMware/vmware.datastore.size[{$VMWARE.URL},{#DATASTORE.UUID},pfree])<{$VMWARE.DATASTORE.SPACE.WARN} |Warning |
Depends on:
|
Name | Description | Type | Key and additional info |
---|---|---|---|
VMware hypervisor discovery | Discovery of hypervisors. |
Simple check | vmware.hv.discovery[{$VMWARE.URL}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
VMware VM discovery | Discovery of guest virtual machines. |
Simple check | vmware.vm.discovery[{$VMWARE.URL}] |
Name | Description | Default |
---|---|---|
{$VMWARE.URL} | VMware service (vCenter or ESX hypervisor) SDK URL (https://servername/sdk). |
|
{$VMWARE.USERNAME} | VMware service user name. |
|
{$VMWARE.PASSWORD} | VMware service user password. |
|
{$VMWARE.VM.FS.PFREE.MIN.WARN} | VMware guest free space threshold for the warning trigger. |
20 |
{$VMWARE.VM.FS.PFREE.MIN.CRIT} | VMware guest free space threshold for the critical trigger. |
10 |
{$VMWARE.VM.FS.TRIGGER.USED} | VMware guest used free space trigger. Set to "1"/"0" to enable or disable the trigger. |
0 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Snapshot consolidation needed | Displays whether snapshot consolidation is needed or not. One of the following: - True; - False. |
Simple check | vmware.vm.consolidationneeded[{$VMWARE.URL},{$VMWARE.VM.UUID}] Preprocessing
|
Snapshot count | Snapshot count of the guest VM. |
Dependent item | vmware.vm.snapshot.count Preprocessing
|
Get snapshots | Snapshots of the guest VM. |
Simple check | vmware.vm.snapshot.get[{$VMWARE.URL},{$VMWARE.VM.UUID}] Preprocessing
|
Snapshot latest date | Latest snapshot date of the guest VM. |
Dependent item | vmware.vm.snapshot.latestdate Preprocessing
|
VM state | VMware virtual machine state. One of the following: - Not running; - Resetting; - Running; - Shutting down; - Standby; - Unknown. |
Simple check | vmware.vm.state[{$VMWARE.URL},{$VMWARE.VM.UUID}] Preprocessing
|
VMware Tools status | Monitoring of VMware Tools. One of the following: - Guest tools executing scripts: VMware Tools is starting. - Guest tools not running: VMware Tools is not running. - Guest tools running: VMware Tools is running. |
Simple check | vmware.vm.tools[{$VMWARE.URL},{$VMWARE.VM.UUID},status] Preprocessing
|
VMware Tools version | Monitoring of the VMware Tools version. |
Simple check | vmware.vm.tools[{$VMWARE.URL},{$VMWARE.VM.UUID},version] Preprocessing
|
Cluster name | Cluster name of the guest VM. |
Simple check | vmware.vm.cluster.name[{$VMWARE.URL},{$VMWARE.VM.UUID}] Preprocessing
|
Number of virtual CPUs | Number of virtual CPUs assigned to the guest. |
Simple check | vmware.vm.cpu.num[{$VMWARE.URL},{$VMWARE.VM.UUID}] Preprocessing
|
CPU ready | Time that the VM was ready, but unable to get scheduled to run on the physical CPU during the last measurement interval (VMware vCenter/ESXi Server performance counter sampling interval - 20 seconds). |
Simple check | vmware.vm.cpu.ready[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
CPU usage | Current upper-bound on CPU usage. The upper-bound is based on the host the VM is currently running on, as well as limits configured on the VM itself or any parent resource pool. Valid while the VM is running. |
Simple check | vmware.vm.cpu.usage[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Datacenter name | Datacenter name of the guest VM. |
Simple check | vmware.vm.datacenter.name[{$VMWARE.URL},{$VMWARE.VM.UUID}] Preprocessing
|
Hypervisor name | Hypervisor name of the guest VM. |
Simple check | vmware.vm.hv.name[{$VMWARE.URL},{$VMWARE.VM.UUID}] Preprocessing
|
Ballooned memory | The amount of guest physical memory that is currently reclaimed through the balloon driver. |
Simple check | vmware.vm.memory.size.ballooned[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Compressed memory | The amount of memory currently in the compression cache for this VM. |
Simple check | vmware.vm.memory.size.compressed[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Private memory | Amount of memory backed by host memory and not being shared. |
Simple check | vmware.vm.memory.size.private[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Shared memory | The amount of guest physical memory shared through transparent page sharing. |
Simple check | vmware.vm.memory.size.shared[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Swapped memory | The amount of guest physical memory swapped out to the VM's swap device by ESX. |
Simple check | vmware.vm.memory.size.swapped[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Guest memory usage | The amount of guest physical memory that is being used by the VM. |
Simple check | vmware.vm.memory.size.usage.guest[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Host memory usage | The amount of host physical memory allocated to the VM, accounting for the amount saved from memory sharing with other VMs. |
Simple check | vmware.vm.memory.size.usage.host[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Memory size | Total size of configured memory. |
Simple check | vmware.vm.memory.size[{$VMWARE.URL},{$VMWARE.VM.UUID}] Preprocessing
|
Power state | The current power state of the VM. One of the following: - Powered off; - Powered on; - Suspended. |
Simple check | vmware.vm.powerstate[{$VMWARE.URL},{$VMWARE.VM.UUID}] Preprocessing
|
Committed storage space | Total storage space, in bytes, committed to this VM across all datastores. |
Simple check | vmware.vm.storage.committed[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Uncommitted storage space | Additional storage space, in bytes, potentially used by this VM on all datastores. |
Simple check | vmware.vm.storage.uncommitted[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Unshared storage space | Total storage space, in bytes, occupied by the VM across all datastores that is not shared with any other VM. |
Simple check | vmware.vm.storage.unshared[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Uptime | System uptime. |
Simple check | vmware.vm.uptime[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Guest memory swapped | Amount of guest physical memory that is swapped out to the swap space. |
Simple check | vmware.vm.guest.memory.size.swapped[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Host memory consumed | Amount of host physical memory consumed for backing up guest physical memory pages. |
Simple check | vmware.vm.memory.size.consumed[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Host memory usage in percent | Percentage of host physical memory that has been consumed. |
Simple check | vmware.vm.memory.usage[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
CPU usage in percent | CPU usage as a percentage during the interval. |
Simple check | vmware.vm.cpu.usage.perf[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
CPU latency in percent | Percentage of time the VM is unable to run because it is contending for access to the physical CPU(s). |
Simple check | vmware.vm.cpu.latency[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
CPU readiness latency in percent | Percentage of time that the virtual machine was ready, but was unable to get scheduled to run on the physical CPU. |
Simple check | vmware.vm.cpu.readiness[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
CPU swap-in latency in percent | Percentage of CPU time spent waiting for a swap-in. |
Simple check | vmware.vm.cpu.swapwait[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Uptime of guest OS | Total time elapsed since the last operating system boot-up (in seconds). |
Simple check | vmware.vm.guest.osuptime[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
VMware Guest: Snapshot consolidation needed | Snapshot consolidation needed. |
last(/VMware Guest/vmware.vm.consolidationneeded[{$VMWARE.URL},{$VMWARE.VM.UUID}])=0 |Average |
Manual close: Yes | |
VMware Guest: VM is not running | VMware virtual machine is not running. |
last(/VMware Guest/vmware.vm.state[{$VMWARE.URL},{$VMWARE.VM.UUID}]) <> 2 |Average |
||
VMware Guest: VMware Tools is not running | VMware Tools is not running on the VM. |
last(/VMware Guest/vmware.vm.tools[{$VMWARE.URL},{$VMWARE.VM.UUID},status]) = 1 |Warning |
Depends on:
|
|
VMware Guest: VM has been restarted | Uptime is less than 10 minutes. |
(between(last(/VMware Guest/vmware.vm.guest.osuptime[{$VMWARE.URL},{$VMWARE.VM.UUID}]),1,10m)=1 or between(last(/VMware Guest/vmware.vm.uptime[{$VMWARE.URL},{$VMWARE.VM.UUID}]),1,10m)=1) and last(/VMware Guest/vmware.vm.powerstate[{$VMWARE.URL},{$VMWARE.VM.UUID}]) = 1 |Warning |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Network device discovery | Discovery of all network devices. |
Simple check | vmware.vm.net.if.discovery[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Number of bytes received on interface [{#IFBACKINGDEVICE}]/[{#IFDESC}] | VMware virtual machine network interface input statistics (bytes per second). |
Simple check | vmware.vm.net.if.in[{$VMWARE.URL},{$VMWARE.VM.UUID},{#IFNAME},bps] |
Number of packets received on interface [{#IFBACKINGDEVICE}]/[{#IFDESC}] | VMware virtual machine network interface input statistics (packets per second). |
Simple check | vmware.vm.net.if.in[{$VMWARE.URL},{$VMWARE.VM.UUID},{#IFNAME},pps] |
Number of bytes transmitted on interface [{#IFBACKINGDEVICE}]/[{#IFDESC}] | VMware virtual machine network interface output statistics (bytes per second). |
Simple check | vmware.vm.net.if.out[{$VMWARE.URL},{$VMWARE.VM.UUID},{#IFNAME},bps] |
Number of packets transmitted on interface [{#IFBACKINGDEVICE}]/[{#IFDESC}] | VMware virtual machine network interface output statistics (packets per second). |
Simple check | vmware.vm.net.if.out[{$VMWARE.URL},{$VMWARE.VM.UUID},{#IFNAME},pps] |
Network utilization on interface [{#IFBACKINGDEVICE}]/[{#IFDESC}] | VMware virtual machine network utilization (combined transmit and receive rates) during the interval. |
Simple check | vmware.vm.net.if.usage[{$VMWARE.URL},{$VMWARE.VM.UUID},{#IFNAME}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Disk device discovery | Discovery of all disk devices. |
Simple check | vmware.vm.vfs.dev.discovery[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Average number of bytes read from the disk [{#DISKDESC}] | VMware virtual machine disk device read statistics (bytes per second). |
Simple check | vmware.vm.vfs.dev.read[{$VMWARE.URL},{$VMWARE.VM.UUID},{#DISKNAME},bps] |
Average number of reads from the disk [{#DISKDESC}] | VMware virtual machine disk device read statistics (operations per second). |
Simple check | vmware.vm.vfs.dev.read[{$VMWARE.URL},{$VMWARE.VM.UUID},{#DISKNAME},ops] |
Average number of bytes written to the disk [{#DISKDESC}] | VMware virtual machine disk device write statistics (bytes per second). |
Simple check | vmware.vm.vfs.dev.write[{$VMWARE.URL},{$VMWARE.VM.UUID},{#DISKNAME},bps] |
Average number of writes to the disk [{#DISKDESC}] | VMware virtual machine disk device write statistics (operations per second). |
Simple check | vmware.vm.vfs.dev.write[{$VMWARE.URL},{$VMWARE.VM.UUID},{#DISKNAME},ops] |
Average number of outstanding read requests to the disk [{#DISKDESC}] | Average number of outstanding read requests to the virtual disk during the collection interval. |
Simple check | vmware.vm.storage.readoio[{$VMWARE.URL},{$VMWARE.VM.UUID},{#DISKNAME}] |
Average number of outstanding write requests to the disk [{#DISKDESC}] | Average number of outstanding write requests to the virtual disk during the collection interval. |
Simple check | vmware.vm.storage.writeoio[{$VMWARE.URL},{$VMWARE.VM.UUID},{#DISKNAME}] |
Average write latency to the disk [{#DISKDESC}] | The average time a write to the virtual disk takes. |
Simple check | vmware.vm.storage.totalwritelatency[{$VMWARE.URL},{$VMWARE.VM.UUID},{#DISKNAME}] |
Average read latency from the disk [{#DISKDESC}] | The average time a read from the virtual disk takes. |
Simple check | vmware.vm.storage.totalreadlatency[{$VMWARE.URL},{$VMWARE.VM.UUID},{#DISKNAME}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Mounted filesystem discovery | Discovery of all guest file systems. |
Simple check | vmware.vm.vfs.fs.discovery[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Free disk space on [{#FSNAME}] | VMware virtual machine file system statistics (bytes). |
Simple check | vmware.vm.vfs.fs.size[{$VMWARE.URL},{$VMWARE.VM.UUID},{#FSNAME},free] |
Free disk space on [{#FSNAME}] (percentage) | VMware virtual machine file system statistics (percentage). |
Simple check | vmware.vm.vfs.fs.size[{$VMWARE.URL},{$VMWARE.VM.UUID},{#FSNAME},pfree] |
Total disk space on [{#FSNAME}] | VMware virtual machine total disk space (bytes). |
Simple check | vmware.vm.vfs.fs.size[{$VMWARE.URL},{$VMWARE.VM.UUID},{#FSNAME},total] Preprocessing
|
Used disk space on [{#FSNAME}] | VMware virtual machine used disk space (bytes). |
Simple check | vmware.vm.vfs.fs.size[{$VMWARE.URL},{$VMWARE.VM.UUID},{#FSNAME},used] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
VMware Guest: [{#FSNAME}]: Disk space is critically low | The disk free space on [{#FSNAME}] has been less than {$VMWARE.VM.FS.PFREE.MIN.CRIT:"{#FSNAME}"}% for the last 5 minutes. |
max(/VMware Guest/vmware.vm.vfs.fs.size[{$VMWARE.URL},{$VMWARE.VM.UUID},{#FSNAME},pfree],5m)<{$VMWARE.VM.FS.PFREE.MIN.CRIT:"{#FSNAME}"} and {$VMWARE.VM.FS.TRIGGER.USED:"{#FSNAME}"}=1 |Average |
Manual close: Yes | |
VMware Guest: [{#FSNAME}]: Disk space is low | The disk free space on [{#FSNAME}] has been less than {$VMWARE.VM.FS.PFREE.MIN.WARN:"{#FSNAME}"}% for the last 5 minutes. |
max(/VMware Guest/vmware.vm.vfs.fs.size[{$VMWARE.URL},{$VMWARE.VM.UUID},{#FSNAME},pfree],5m)<{$VMWARE.VM.FS.PFREE.MIN.WARN:"{#FSNAME}"} and {$VMWARE.VM.FS.TRIGGER.USED:"{#FSNAME}"}=1 |Warning |
Manual close: Yes Depends on:
|
This template is designed for the effortless deployment of VMware ESX hypervisor monitoring and doesn't require any external scripts.
This template can be used in discovery as well as manually linked to a host.
For additional information, please see Zabbix documentation on VM monitoring.
To use this template as manually linked to a host, attach it to the host and manually set the value of the {$VMWARE.HV.UUID}
macro.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
To use this template as manually linked to a host:
1. Compile the Zabbix server with the required options (--with-libxml2 and --with-libcurl).
2. Set the StartVMwareCollectors option in the Zabbix server configuration file to "1" or more.
3. Set the {$VMWARE.URL}, {$VMWARE.USERNAME}, {$VMWARE.PASSWORD}, and {$VMWARE.HV.UUID} macros.
To get the UUID of the hypervisor, run the following command on it:
vim-cmd hostsvc/hostsummary | grep uuid
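If the ESXi shell is not convenient, the same UUID can be looked up through the SDK. The following is a minimal sketch, not part of the template, assuming the pyVmomi Python library and placeholder host/credentials that correspond to the {$VMWARE.URL}, {$VMWARE.USERNAME}, and {$VMWARE.PASSWORD} macros:

```python
# Hedged sketch: list hypervisors and the UUID reported by summary.hardware.uuid,
# which should match the value expected in {$VMWARE.HV.UUID}.
# Assumptions: pyVmomi is installed; host, user, and password are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()          # skip certificate checks (lab use only)
si = SmartConnect(host="vcenter.example.com",   # host part of {$VMWARE.URL}
                  user="zabbix@vsphere.local",  # {$VMWARE.USERNAME}
                  pwd="secret",                 # {$VMWARE.PASSWORD}
                  sslContext=ctx)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.HostSystem], True)
    for host in view.view:
        # Print each hypervisor name together with its hardware UUID
        print(host.name, host.summary.hardware.uuid)
finally:
    Disconnect(si)
```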
Name | Description | Default |
---|---|---|
{$VMWARE.URL} | VMware service (vCenter or ESX hypervisor) SDK URL (https://servername/sdk). |
|
{$VMWARE.USERNAME} | VMware service user name. |
|
{$VMWARE.PASSWORD} | VMware service user password. |
|
{$VMWARE.HV.SENSOR.DISCOVERY} | Set to "true"/"false" to enable or disable the monitoring of hardware sensors. |
false |
{$VMWARE.HV.SENSOR.DISCOVERY.NAME.MATCHES} | Sets the regex string of hardware sensor names to be allowed in discovery. |
.* |
{$VMWARE.HV.SENSOR.DISCOVERY.NAME.NOT_MATCHES} | Sets the regex string of hardware sensor names to be ignored in discovery. |
CHANGE_IF_NEEDED |
{$VMWARE.HV.DATASTORE.SPACE.CRIT} | The critical threshold of the datastore free space. |
10 |
{$VMWARE.HV.DATASTORE.SPACE.WARN} | The warning threshold of the datastore free space. |
20 |
{$VMWARE.HV.UUID} | UUID of hypervisor. |
Name | Description | Type | Key and additional info |
---|---|---|---|
Connection state | VMware hypervisor connection state. One of the following: - Connected; - Disconnected; - Not responding. |
Simple check | vmware.hv.connectionstate[{$VMWARE.URL},{$VMWARE.HV.UUID}] Preprocessing
|
Number of errors received | VMware hypervisor network input statistics (errors). |
Simple check | vmware.hv.network.in[{$VMWARE.URL},{$VMWARE.HV.UUID},errors] |
Number of broadcasts received | VMware hypervisor network input statistics (broadcasts). |
Simple check | vmware.hv.network.in[{$VMWARE.URL},{$VMWARE.HV.UUID},broadcast] |
Number of dropped received packets | VMware hypervisor network input statistics (packets dropped). |
Simple check | vmware.hv.network.in[{$VMWARE.URL},{$VMWARE.HV.UUID},dropped] |
Number of broadcasts transmitted | VMware hypervisor network output statistics (broadcasts). |
Simple check | vmware.hv.network.out[{$VMWARE.URL},{$VMWARE.HV.UUID},broadcast] |
Number of dropped transmitted packets | VMware hypervisor network output statistics (packets dropped). |
Simple check | vmware.hv.network.out[{$VMWARE.URL},{$VMWARE.HV.UUID},dropped] |
Number of errors transmitted | VMware hypervisor network output statistics (errors). |
Simple check | vmware.hv.network.out[{$VMWARE.URL},{$VMWARE.HV.UUID},errors] |
Hypervisor ping | Checks if the hypervisor is running and accepting ICMP pings. One of the following: - Down; - Up. |
Simple check | icmpping[] Preprocessing
|
Cluster name | Cluster name of the hypervisor. |
Simple check | vmware.hv.cluster.name[{$VMWARE.URL},{$VMWARE.HV.UUID}] Preprocessing
|
CPU usage | Aggregated CPU usage across all cores on the host in Hz. This is only available if the host is connected. |
Simple check | vmware.hv.cpu.usage[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
CPU usage in percent | CPU usage as a percentage during the interval. |
Simple check | vmware.hv.cpu.usage.perf[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
CPU utilization | CPU utilization as a percentage during the interval; the value depends on power management or hyper-threading. |
Simple check | vmware.hv.cpu.utilization[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
Power usage | Current power usage. |
Simple check | vmware.hv.power[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
Power usage maximum allowed | Maximum allowed power usage. |
Simple check | vmware.hv.power[{$VMWARE.URL},{$VMWARE.HV.UUID},max] Preprocessing
|
Datacenter name | Datacenter name of the hypervisor. |
Simple check | vmware.hv.datacenter.name[{$VMWARE.URL},{$VMWARE.HV.UUID}] Preprocessing
|
Full name | The complete product name, including the version information. |
Simple check | vmware.hv.fullname[{$VMWARE.URL},{$VMWARE.HV.UUID}] Preprocessing
|
CPU frequency | The speed of the CPU cores. This is an average value if there are multiple speeds. The product of CPU frequency and the number of cores is approximately equal to the sum of the MHz for all the individual cores on the host. |
Simple check | vmware.hv.hw.cpu.freq[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
CPU model | The CPU model. |
Simple check | vmware.hv.hw.cpu.model[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
CPU cores | Number of physical CPU cores on the host. Physical CPU cores are the processors contained by a CPU package. |
Simple check | vmware.hv.hw.cpu.num[{$VMWARE.URL},{$VMWARE.HV.UUID}] Preprocessing
|
CPU threads | Number of physical CPU threads on the host. |
Simple check | vmware.hv.hw.cpu.threads[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
Total memory | The physical memory size. |
Simple check | vmware.hv.hw.memory[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
Model | The system model identification. |
Simple check | vmware.hv.hw.model[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
Bios UUID | The hardware BIOS identification. |
Simple check | vmware.hv.hw.uuid[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
Vendor | The hardware vendor identification. |
Simple check | vmware.hv.hw.vendor[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
Ballooned memory | The amount of guest physical memory that is currently reclaimed through the balloon driver. Sum of all guest VMs. |
Simple check | vmware.hv.memory.size.ballooned[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
Used memory | Physical memory usage on the host. |
Simple check | vmware.hv.memory.used[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
Number of bytes received | VMware hypervisor network input statistics (bytes per second). |
Simple check | vmware.hv.network.in[{$VMWARE.URL},{$VMWARE.HV.UUID},bps] |
Number of bytes transmitted | VMware hypervisor network output statistics (bytes per second). |
Simple check | vmware.hv.network.out[{$VMWARE.URL},{$VMWARE.HV.UUID},bps] |
Overall status | The overall alarm status of the host. One of the following: - Gray: Unknown; - Green: OK; - Yellow: It might have a problem; - Red: It has a problem. |
Simple check | vmware.hv.status[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
Uptime | System uptime. |
Simple check | vmware.hv.uptime[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
Version | Dot-separated version string. |
Simple check | vmware.hv.version[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
Number of guest VMs | Number of guest virtual machines. |
Simple check | vmware.hv.vm.num[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
Get sensors | Master item for sensor data. |
Simple check | vmware.hv.sensors.get[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
VMware Hypervisor: Hypervisor is down | The service is unavailable or is not accepting ICMP pings. |
last(/VMware Hypervisor/icmpping[])=0 |Average |
Manual close: Yes | |
VMware Hypervisor: The {$VMWARE.HV.UUID} health is Red | One or more components in the appliance might be in an unusable status and the appliance might soon become unresponsive. Security patches might be available. |
last(/VMware Hypervisor/vmware.hv.status[{$VMWARE.URL},{$VMWARE.HV.UUID}])=3 |High |
||
VMware Hypervisor: The {$VMWARE.HV.UUID} health is Yellow | One or more components in the appliance might soon become overloaded. |
last(/VMware Hypervisor/vmware.hv.status[{$VMWARE.URL},{$VMWARE.HV.UUID}])=2 |Average |
Depends on:
|
|
VMware Hypervisor: Hypervisor has been restarted | Uptime is less than 10 minutes. |
last(/VMware Hypervisor/vmware.hv.uptime[{$VMWARE.URL},{$VMWARE.HV.UUID}])<10m |Warning |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Network interface discovery | Discovery of VMware hypervisor network interfaces. |
Simple check | vmware.hv.net.if.discovery[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
[{#IFNAME}] network interface speed | VMware hypervisor network interface speed. |
Simple check | vmware.hv.network.linkspeed[{$VMWARE.URL},{$VMWARE.HV.UUID},{#IFNAME}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Datastore discovery | Discovery of VMware datastores. |
Simple check | vmware.hv.datastore.discovery[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Average read IOPS of the datastore [{#DATASTORE}] | Average IOPS for a read operation from the datastore. |
Simple check | vmware.hv.datastore.read[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE.UUID},rps] |
Average write IOPS of the datastore [{#DATASTORE}] | Average IOPS for a write operation to the datastore. |
Simple check | vmware.hv.datastore.write[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE.UUID},rps] |
Average read latency of the datastore [{#DATASTORE}] | Average amount of time for a read operation from the datastore (milliseconds). |
Simple check | vmware.hv.datastore.read[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE.UUID},latency] |
Free space on datastore [{#DATASTORE}] (percentage) | VMware datastore free space (percentage from the total). |
Simple check | vmware.hv.datastore.size[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE.UUID},pfree] |
Total size of datastore [{#DATASTORE}] | VMware datastore space in bytes. |
Simple check | vmware.hv.datastore.size[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE.UUID}] Preprocessing
|
Average write latency of the datastore [{#DATASTORE}] | Average amount of time for a write operation to the datastore (milliseconds). |
Simple check | vmware.hv.datastore.write[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE.UUID},latency] |
Multipath count for datastore [{#DATASTORE}] | Number of available datastore paths. |
Simple check | vmware.hv.datastore.multipath[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE.UUID}] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
VMware Hypervisor: [{#DATASTORE}]: Free space is critically low | Datastore free space has fallen below the critical threshold. |
last(/VMware Hypervisor/vmware.hv.datastore.size[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE.UUID},pfree])<{$VMWARE.HV.DATASTORE.SPACE.CRIT} |High |
||
VMware Hypervisor: [{#DATASTORE}]: Free space is low | Datastore free space has fallen below the warning threshold. |
last(/VMware Hypervisor/vmware.hv.datastore.size[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE.UUID},pfree])<{$VMWARE.HV.DATASTORE.SPACE.WARN} |Warning |
Depends on:
|
|
VMware Hypervisor: The multipath count has been changed | The number of available datastore paths is less than registered ({#MULTIPATH.COUNT}). |
last(/VMware Hypervisor/vmware.hv.datastore.multipath[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE.UUID}],#1)<>last(/VMware Hypervisor/vmware.hv.datastore.multipath[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE.UUID}],#2) and last(/VMware Hypervisor/vmware.hv.datastore.multipath[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE.UUID}])<{#MULTIPATH.COUNT} |Average |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Serial number discovery | VMware hypervisor serial number discovery. This item works only with VMware hypervisor versions above 6.7. |
Dependent item | vmware.hv.serial.number.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Serial number | VMware hypervisor serial number. |
Simple check | vmware.hv.hw.serialnumber[{$VMWARE.URL},{#VMWARE.HV.UUID}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Healthcheck discovery | VMware Rollup Health State sensor discovery. |
Dependent item | vmware.hv.healthcheck.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Health state rollup | The host's Rollup Health State sensor value. One of the following: - Gray: Unknown; - Green: OK; - Yellow: It might have a problem; - Red: It has a problem. |
Dependent item | vmware.hv.sensor.health.state[{#SINGLETON}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
VMware Hypervisor: The {$VMWARE.HV.UUID} health is Red | One or more components in the appliance might be in an unusable status and the appliance might soon become unresponsive. Security patches might be available. |
last(/VMware Hypervisor/vmware.hv.sensor.health.state[{#SINGLETON}])=3 |High |
Depends on:
|
|
VMware Hypervisor: The {$VMWARE.HV.UUID} health is Yellow | One or more components in the appliance might soon become overloaded. |
last(/VMware Hypervisor/vmware.hv.sensor.health.state[{#SINGLETON}])=2 |Average |
Depends on:
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Sensor discovery | VMware hardware sensor discovery. The data is retrieved from numeric sensor probes and provides information about the health of the physical system. |
Dependent item | vmware.hv.sensors.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Sensor [{#NAME}] health state | VMware hardware sensor health state. One of the following: - Gray: Unknown; - Green: OK; - Yellow: It might have a problem; - Red: It has a problem. |
Dependent item | vmware.hv.sensor.state["{#NAME}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
VMware Hypervisor: Sensor [{#NAME}] health state is Red | One or more components in the appliance might be in an unusable status and the appliance might soon become unresponsive. |
last(/VMware Hypervisor/vmware.hv.sensor.state["{#NAME}"])=3 |High |
Depends on:
|
|
VMware Hypervisor: Sensor [{#NAME}] health state is Yellow | One or more components in the appliance might soon become overloaded. |
last(/VMware Hypervisor/vmware.hv.sensor.state["{#NAME}"])=2 |Average |
Depends on:
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed to monitor Veeam Backup Enterprise Manager. It works without any external scripts and uses the script item.
NOTE: The Veeam Backup Enterprise Manager REST API may not be available in some editions; the template will only work with the following editions of Veeam Backup and Replication:
See Veeam Data Platform Feature Comparison for more details.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
See Zabbix template operation for basic instructions.
Create a user for API access with the Portal Administrator role.
> See Veeam Help Center for more details.
Set the {$VEEAM.MANAGER.API.URL}, {$VEEAM.MANAGER.USER}, and {$VEEAM.MANAGER.PASSWORD} macros.
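For reference, the request flow behind the script item can be reproduced manually. This is a hedged sketch only; the endpoint paths and the session header are assumptions based on the Enterprise Manager RESTful API documentation, and the URL/credentials are placeholders mirroring the macros above:

```python
# Hedged sketch of a session-based request to the Enterprise Manager RESTful API.
# Assumptions: the /api/sessionMgr/?v=latest and /api/reports/summary/job_statistics
# paths and the X-RestSvcSessionId header; verify=False only for self-signed labs.
import requests

api_url = "https://localhost:9398"     # {$VEEAM.MANAGER.API.URL}
user, password = "admin", "secret"     # {$VEEAM.MANAGER.USER} / {$VEEAM.MANAGER.PASSWORD}

# Log in with Basic auth; the session id is returned in a response header.
login = requests.post(f"{api_url}/api/sessionMgr/?v=latest",
                      auth=(user, password), verify=False, timeout=10)
login.raise_for_status()
session_id = login.headers["X-RestSvcSessionId"]

# Reuse the session id for subsequent requests, e.g. the job statistics summary.
headers = {"X-RestSvcSessionId": session_id, "Accept": "application/json"}
summary = requests.get(f"{api_url}/api/reports/summary/job_statistics",
                       headers=headers, verify=False, timeout=10)
print(summary.json())                  # running/scheduled/warning/failed job counters
```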
Name | Description | Default |
---|---|---|
{$VEEAM.MANAGER.API.URL} | Veeam Backup Enterprise Manager API endpoint is a URL in the format: |
https://localhost:9398 |
{$VEEAM.MANAGER.HTTP.PROXY} | Sets the HTTP proxy to |
|
{$VEEAM.MANAGER.PASSWORD} | The password of the Veeam Backup Enterprise Manager account. |
|
{$VEEAM.MANAGER.USER} | The user name of the Veeam Backup Enterprise Manager account. |
|
{$VEEAM.MANAGER.DATA.TIMEOUT} | A response timeout for API. |
10 |
{$BACKUP.TYPE.MATCHES} | This macro is used in backup discovery rule. |
.* |
{$BACKUP.TYPE.NOT_MATCHES} | This macro is used in backup discovery rule. |
CHANGE_IF_NEEDED |
{$BACKUP.NAME.MATCHES} | This macro is used in backup discovery rule. |
.* |
{$BACKUP.NAME.NOT_MATCHES} | This macro is used in backup discovery rule. |
CHANGE_IF_NEEDED |
{$VEEAM.MANAGER.JOB.MAX.WARN} | The maximum score of warning jobs (for a trigger expression). |
10 |
{$VEEAM.MANAGER.JOB.MAX.FAIL} | The maximum score of failed jobs (for a trigger expression). |
5 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Get metrics | The result of API requests is expressed in JSON. |
Script | veeam.manager.get.metrics |
Get errors | The errors from API requests. |
Dependent item | veeam.manager.get.errors Preprocessing
|
Running Jobs | Informs about the running jobs. |
Dependent item | veeam.manager.running.jobs Preprocessing
|
Scheduled Jobs | Informs about the scheduled jobs. |
Dependent item | veeam.manager.scheduled.jobs Preprocessing
|
Scheduled Backup Jobs | Informs about the scheduled backup jobs. |
Dependent item | veeam.manager.scheduled.backup.jobs Preprocessing
|
Scheduled Replica Jobs | Informs about the scheduled replica jobs. |
Dependent item | veeam.manager.scheduled.replica.jobs Preprocessing
|
Total Job Runs | Informs about the total job runs. |
Dependent item | veeam.manager.scheduled.total.jobs Preprocessing
|
Warnings Job Runs | Informs about the warning job runs. |
Dependent item | veeam.manager.warning.jobs Preprocessing
|
Failed Job Runs | Informs about the failed job runs. |
Dependent item | veeam.manager.failed.jobs Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Veeam Backup: There are errors in requests to API | Zabbix has received errors in response to API requests. |
length(last(/Veeam Backup Enterprise Manager by HTTP/veeam.manager.get.errors))>0 |Average |
||
Veeam Backup: Warning job runs is too high | last(/Veeam Backup Enterprise Manager by HTTP/veeam.manager.warning.jobs)>{$VEEAM.MANAGER.JOB.MAX.WARN} |Warning |
Manual close: Yes | ||
Veeam Backup: Failed job runs is too high | last(/Veeam Backup Enterprise Manager by HTTP/veeam.manager.failed.jobs)>{$VEEAM.MANAGER.JOB.MAX.FAIL} |Average |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Backup Files discovery | Discovery of all backup files created on, or imported to the backup servers that are connected to Veeam Backup Enterprise Manager. |
Dependent item | veeam.backup.files.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Backup Size [{#NAME}] | Gets the backup size with the name [{#NAME}]. |
Dependent item | veeam.backup.file.size[{#NAME}] Preprocessing
|
Data Size [{#NAME}] | Gets the data size with the name [{#NAME}]. |
Dependent item | veeam.backup.data.size[{#NAME}] Preprocessing
|
Compression ratio [{#NAME}] | Gets the data compression ratio with the name [{#NAME}]. |
Dependent item | veeam.backup.compress.ratio[{#NAME}] Preprocessing
|
Deduplication Ratio [{#NAME}] | Gets the data deduplication ratio with the name [{#NAME}]. |
Dependent item | veeam.backup.deduplication.ratio[{#NAME}] Preprocessing
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed to monitor Veeam Backup and Replication. It works without any external scripts and uses the script item.
NOTE: Since the RESTful API may not be available for some editions, the template will only work with the following editions of Veeam Backup and Replication:
See Veeam Data Platform Feature Comparison for more details.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Set the {$VEEAM.API.URL}, {$VEEAM.USER}, and {$VEEAM.PASSWORD} macros.
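As an illustration only, the token-based flow that the script item relies on could look like the hedged sketch below; the token endpoint, the x-api-version header value, and the /api/v1/jobs/states path are assumptions based on the Veeam Backup and Replication REST API documentation, and the URL/credentials are placeholders mirroring the macros above:

```python
# Hedged sketch: obtain an OAuth2 token and query job states from the
# Veeam Backup and Replication REST API. All names below are placeholders.
import requests

api_url = "https://localhost:9419"                 # {$VEEAM.API.URL}
auth = requests.post(f"{api_url}/api/oauth2/token",
                     data={"grant_type": "password",
                           "username": "admin",    # {$VEEAM.USER}
                           "password": "secret"},  # {$VEEAM.PASSWORD}
                     headers={"x-api-version": "1.0-rev1"},
                     verify=False, timeout=10)
auth.raise_for_status()
token = auth.json()["access_token"]

# Query job states, the same resource the "Jobs states discovery" rule relies on.
jobs = requests.get(f"{api_url}/api/v1/jobs/states",
                    headers={"Authorization": f"Bearer {token}",
                             "x-api-version": "1.0-rev1"},
                    verify=False, timeout=10)
print(jobs.json())
```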
Name | Description | Default |
---|---|---|
{$VEEAM.API.URL} | The Veeam API endpoint is a URL in the format |
https://localhost:9419 |
{$VEEAM.HTTP.PROXY} | Sets the HTTP proxy to |
|
{$VEEAM.PASSWORD} | The password of the Veeam Backup and Replication account. |
|
{$VEEAM.USER} | The user name of the Veeam Backup and Replication account. |
|
{$VEEAM.DATA.TIMEOUT} | A response timeout for the API. |
10 |
{$CREATED.AFTER} | Returns sessions that are created after chosen days. |
7 |
{$SESSION.NAME.MATCHES} | This macro is used in discovery rule to evaluate sessions. |
.* |
{$SESSION.NAME.NOT_MATCHES} | This macro is used in discovery rule to evaluate sessions. |
CHANGE_IF_NEEDED |
{$SESSION.TYPE.MATCHES} | This macro is used in discovery rule to evaluate sessions. |
.* |
{$SESSION.TYPE.NOT_MATCHES} | This macro is used in discovery rule to evaluate sessions. |
CHANGE_IF_NEEDED |
{$PROXIES.NAME.MATCHES} | This macro is used in proxies discovery rule. |
.* |
{$PROXIES.NAME.NOT_MATCHES} | This macro is used in proxies discovery rule. |
CHANGE_IF_NEEDED |
{$PROXIES.TYPE.MATCHES} | This macro is used in proxies discovery rule. |
.* |
{$PROXIES.TYPE.NOT_MATCHES} | This macro is used in proxies discovery rule. |
CHANGE_IF_NEEDED |
{$REPOSITORIES.NAME.MATCHES} | This macro is used in repositories discovery rule. |
.* |
{$REPOSITORIES.NAME.NOT_MATCHES} | This macro is used in repositories discovery rule. |
CHANGE_IF_NEEDED |
{$REPOSITORIES.TYPE.MATCHES} | This macro is used in repositories discovery rule. |
.* |
{$REPOSITORIES.TYPE.NOT_MATCHES} | This macro is used in repositories discovery rule. |
CHANGE_IF_NEEDED |
{$JOB.NAME.MATCHES} | This macro is used in discovery rule to evaluate the states of jobs. |
.* |
{$JOB.NAME.NOT_MATCHES} | This macro is used in discovery rule to evaluate the states of jobs. |
CHANGE_IF_NEEDED |
{$JOB.TYPE.MATCHES} | This macro is used in discovery rule to evaluate the states of jobs. |
.* |
{$JOB.TYPE.NOT_MATCHES} | This macro is used in discovery rule to evaluate the states of jobs. |
CHANGE_IF_NEEDED |
{$JOB.STATUS.MATCHES} | This macro is used in discovery rule to evaluate the states of jobs. |
.* |
{$JOB.STATUS.NOT_MATCHES} | This macro is used in discovery rule to evaluate the states of jobs. |
CHANGE_IF_NEEDED |
Name | Description | Type | Key and additional info |
---|---|---|---|
Get metrics | The result of API requests is expressed in JSON. |
Script | veeam.get.metrics |
Get errors | The errors from API requests. |
Dependent item | veeam.get.errors Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Veeam Backup: There are errors in requests to API | Zabbix has received errors in response to API requests. |
length(last(/Veeam Backup and Replication by HTTP/veeam.get.errors))>0 |Average |
Name | Description | Type | Key and additional info |
---|---|---|---|
Proxies discovery | Discovery of proxies. |
Dependent item | veeam.proxies.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Server [{#NAME}]: Get data | Gets raw data collected by the proxy server. |
Dependent item | veeam.proxy.server.raw[{#NAME}] Preprocessing
|
Proxy [{#NAME}] [{#TYPE}]: Get data | Gets raw data collected by the proxy with the name [{#NAME}]. |
Dependent item | veeam.proxy.raw[{#NAME}] Preprocessing
|
Proxy [{#NAME}] [{#TYPE}]: Max Task Count | The maximum number of concurrent tasks. |
Dependent item | veeam.proxy.maxtask[{#NAME}] Preprocessing
|
Proxy [{#NAME}] [{#TYPE}]: Host name | The name of the proxy server. |
Dependent item | veeam.proxy.server.name[{#NAME}] Preprocessing
|
Proxy [{#NAME}] [{#TYPE}]: Host type | The type of the proxy server. |
Dependent item | veeam.proxy.server.type[{#NAME}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Repositories discovery | Discovery of repositories. |
Dependent item | veeam.repositories.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Repository [{#NAME}] [{#TYPE}]: Get data | Gets raw data from the repository with the name [{#NAME}]. |
Dependent item | veeam.repositories.raw[{#NAME}] Preprocessing
|
Repository [{#NAME}] [{#TYPE}]: Used space [{#PATH}] | Used space by repositories expressed in gigabytes (GB). |
Dependent item | veeam.repository.capacity[{#NAME}] Preprocessing
|
Repository [{#NAME}] [{#TYPE}]: Free space [{#PATH}] | Free space of repositories expressed in gigabytes (GB). |
Dependent item | veeam.repository.free.space[{#NAME}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Sessions discovery | Discovery of sessions. |
Dependent item | veeam.sessions.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Session [{#NAME}] [{#TYPE}]: Get data | Gets raw data from the session with the name [{#NAME}]. |
Dependent item | veeam.sessions.raw[{#ID}] Preprocessing
|
Session [{#NAME}] [{#TYPE}]: State | The state of the session. The enums used: |
Dependent item | veeam.sessions.state[{#ID}] Preprocessing
|
Session [{#NAME}] [{#TYPE}]: Result | The result of the session. The enums used: |
Dependent item | veeam.sessions.result[{#ID}] Preprocessing
|
Session [{#NAME}] [{#TYPE}]: Message | A message that explains the session result. |
Dependent item | veeam.sessions.message[{#ID}] Preprocessing
|
Session progress percent [{#NAME}] [{#TYPE}] | The progress of the session expressed as a percentage. |
Dependent item | veeam.sessions.progress.percent[{#ID}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Veeam Backup: Last result session failed | find(/Veeam Backup and Replication by HTTP/veeam.sessions.result[{#ID}],,"like","Failed")=1 |Average |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Jobs states discovery | Discovery of the jobs states. |
Dependent item | veeam.job.state.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Job states [{#NAME}] [{#TYPE}]: Get data | Gets raw data from the job states with the name [{#NAME}]. |
Dependent item | veeam.jobs.states.raw[{#ID}] Preprocessing
|
Job states [{#NAME}] [{#TYPE}]: Status | The current status of the job. The enums used: |
Dependent item | veeam.jobs.status[{#ID}] Preprocessing
|
Job states [{#NAME}] [{#TYPE}]: Last result | The result of the session. The enums used: |
Dependent item | veeam.jobs.last.result[{#ID}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Veeam Backup: Last result job failed | find(/Veeam Backup and Replication by HTTP/veeam.jobs.last.result[{#ID}],,"like","Failed")=1 |Average |
Manual close: Yes |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template to monitor HashiCorp Vault by Zabbix that works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
Template Vault by HTTP
— collects metrics by HTTP agent from /sys/metrics
API endpoint.
See https://www.vaultproject.io/api-docs/system/metrics.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
See Zabbix template operation for basic instructions.
Configure Vault API. See Vault Configuration.
Create a Vault service token and set it to the macro {$VAULT.TOKEN}.
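To verify the token and connectivity before linking the template, the two endpoints used by the HTTP agent items can be queried directly. A minimal sketch, assuming placeholder host, port, and token values that correspond to the {$VAULT.HOST}, {$VAULT.API.PORT}, and {$VAULT.TOKEN} macros:

```python
# Hedged sketch of the requests the HTTP agent items issue against Vault.
# The paths are the documented Vault system API endpoints; values are placeholders.
import requests

base = "http://vault.example.com:8200"        # {$VAULT.API.SCHEME}://{$VAULT.HOST}:{$VAULT.API.PORT}
token = "s.xxxxxxxx"                          # {$VAULT.TOKEN}

# Health endpoint: force 200 responses so sealed/standby states do not raise errors.
health = requests.get(f"{base}/v1/sys/health",
                      params={"standbyok": "true", "sealedcode": 200,
                              "uninitcode": 200, "standbycode": 200},
                      timeout=10)
print(health.json())                          # initialized, sealed, standby, version ...

# Telemetry in Prometheus text format, the source of most metric items.
metrics = requests.get(f"{base}/v1/sys/metrics",
                       params={"format": "prometheus"},
                       headers={"X-Vault-Token": token},
                       timeout=10)
print(metrics.text[:500])
```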
Name | Description | Default |
---|---|---|
{$VAULT.API.PORT} | Vault port. |
8200 |
{$VAULT.API.SCHEME} | Vault API scheme. |
http |
{$VAULT.HOST} | Vault host name. |
<PUT YOUR VAULT HOST> |
{$VAULT.OPEN.FDS.MAX.WARN} | Maximum percentage of used file descriptors for trigger expression. |
90 |
{$VAULT.LEADERSHIP.SETUP.FAILED.MAX.WARN} | Maximum number of Vault leadership setup failed. |
5 |
{$VAULT.LEADERSHIP.LOSSES.MAX.WARN} | Maximum number of Vault leadership losses. |
5 |
{$VAULT.LEADERSHIP.STEPDOWNS.MAX.WARN} | Maximum number of Vault leadership step downs. |
5 |
{$VAULT.LLD.FILTER.STORAGE.MATCHES} | Filter of discoverable storage backends. |
.+ |
{$VAULT.TOKEN} | Vault auth token. |
<PUT YOUR AUTH TOKEN> |
{$VAULT.TOKEN.ACCESSORS} | Vault accessors separated by spaces for monitoring token expiration time. |
|
{$VAULT.TOKEN.TTL.MIN.CRIT} | Token TTL critical threshold. |
3d |
{$VAULT.TOKEN.TTL.MIN.WARN} | Token TTL warning threshold. |
7d |
Name | Description | Type | Key and additional info | |||
---|---|---|---|---|---|---|
Get health | HTTP agent | vault.get_health Preprocessing
|
||||
Get leader | HTTP agent | vault.get_leader Preprocessing
|
||||
Get metrics | HTTP agent | vault.get_metrics Preprocessing
|
||||
Clear metrics | Dependent item | vault.clear_metrics Preprocessing
|
||||
Get tokens | Get information about tokens via their accessors. Accessors are defined in the macro "{$VAULT.TOKEN.ACCESSORS}". |
Script | vault.get_tokens | |||
Check WAL discovery | Dependent item | vault.checkwaldiscovery Preprocessing
|
||||
Check replication discovery | Dependent item | vault.checkreplicationdiscovery Preprocessing
|
||||
Check storage discovery | Dependent item | vault.checkstoragediscovery Preprocessing
|
Pattern: …(put | list | delete)_count$ (⛔️ Custom on fail: Discard value); JavaScript: The text is too long. Please see the template.; Discard unchanged with heartbeat: 15m
|
Check mountpoint discovery | Dependent item | vault.checkmountpointdiscovery Preprocessing
|
||||
Initialized | Initialization status. |
Dependent item | vault.health.initialized Preprocessing
|
|||
Sealed | Seal status. |
Dependent item | vault.health.sealed Preprocessing
|
|||
Standby | Standby status. |
Dependent item | vault.health.standby Preprocessing
|
|||
Performance standby | Performance standby status. |
Dependent item | vault.health.performance_standby Preprocessing
|
|||
Performance replication | Performance replication mode https://www.vaultproject.io/docs/enterprise/replication |
Dependent item | vault.health.replicationperformancemode Preprocessing
|
|||
Disaster Recovery replication | Disaster recovery replication mode https://www.vaultproject.io/docs/enterprise/replication |
Dependent item | vault.health.replicationdrmode Preprocessing
|
|||
Version | Server version. |
Dependent item | vault.health.version Preprocessing
|
|||
Healthcheck | Vault healthcheck. |
Dependent item | vault.health.check Preprocessing
|
|||
HA enabled | HA enabled status. |
Dependent item | vault.leader.ha_enabled Preprocessing
|
|||
Is leader | Leader status. |
Dependent item | vault.leader.is_self Preprocessing
|
|||
Get metrics error | Get metrics error. |
Dependent item | vault.get_metrics.error Preprocessing
|
|||
Process CPU seconds, total | Total user and system CPU time spent in seconds. |
Dependent item | vault.metrics.process.cpu.seconds.total Preprocessing
|
|||
Open file descriptors, max | Maximum number of open file descriptors. |
Dependent item | vault.metrics.process.max.fds Preprocessing
|
|||
Open file descriptors, current | Number of open file descriptors. |
Dependent item | vault.metrics.process.open.fds Preprocessing
|
|||
Process resident memory | Resident memory size in bytes. |
Dependent item | vault.metrics.process.resident_memory.bytes Preprocessing
|
|||
Uptime | Server uptime. |
Dependent item | vault.metrics.process.uptime Preprocessing
|
|||
Process virtual memory, current | Virtual memory size in bytes. |
Dependent item | vault.metrics.process.virtual_memory.bytes Preprocessing
|
|||
Process virtual memory, max | Maximum amount of virtual memory available in bytes. |
Dependent item | vault.metrics.process.virtual_memory.max.bytes Preprocessing
|
|||
Audit log requests, rate | Number of all audit log requests across all audit log devices. |
Dependent item | vault.metrics.audit.log.request.rate Preprocessing
|
|||
Audit log request failures, rate | Number of audit log request failures. |
Dependent item | vault.metrics.audit.log.request.failure.rate Preprocessing
|
|||
Audit log response, rate | Number of audit log responses across all audit log devices. |
Dependent item | vault.metrics.audit.log.response.rate Preprocessing
|
|||
Audit log response failures, rate | Number of audit log response failures. |
Dependent item | vault.metrics.audit.log.response.failure.rate Preprocessing
|
|||
Barrier DELETE ops, rate | Number of DELETE operations at the barrier. |
Dependent item | vault.metrics.barrier.delete.rate Preprocessing
|
|||
Barrier GET ops, rate | Number of GET operations at the barrier. |
Dependent item | vault.metrics.vault.barrier.get.rate Preprocessing
|
|||
Barrier LIST ops, rate | Number of LIST operations at the barrier. |
Dependent item | vault.metrics.barrier.list.rate Preprocessing
|
|||
Barrier PUT ops, rate | Number of PUT operations at the barrier. |
Dependent item | vault.metrics.barrier.put.rate Preprocessing
|
|||
Cache hit, rate | Number of times a value was retrieved from the LRU cache. |
Dependent item | vault.metrics.cache.hit.rate Preprocessing
|
|||
Cache miss, rate | Number of times a value was not in the LRU cache. This results in a read from the configured storage. |
Dependent item | vault.metrics.cache.miss.rate Preprocessing
|
|||
Cache write, rate | Number of times a value was written to the LRU cache. |
Dependent item | vault.metrics.cache.write.rate Preprocessing
|
|||
Check token, rate | Number of token checks handled by Vault core. |
Dependent item | vault.metrics.core.check.token.rate Preprocessing
|
|||
Fetch ACL and token, rate | Number of ACL and corresponding token entry fetches handled by Vault core. |
Dependent item | vault.metrics.core.fetch.aclandtoken Preprocessing
|
|||
Requests, rate | Number of requests handled by Vault core. |
Dependent item | vault.metrics.core.handle.request Preprocessing
|
|||
Leadership setup failed, counter | Cluster leadership setup failures which have occurred in a highly available Vault cluster. |
Dependent item | vault.metrics.core.leadership.setup_failed Preprocessing
|
|||
Leadership setup lost, counter | Cluster leadership losses which have occurred in a highly available Vault cluster. |
Dependent item | vault.metrics.core.leadership_lost Preprocessing
|
|||
Post-unseal ops, counter | Duration of time taken by post-unseal operations handled by Vault core. |
Dependent item | vault.metrics.core.post_unseal Preprocessing
|
|||
Pre-seal ops, counter | Duration of time taken by pre-seal operations. |
Dependent item | vault.metrics.core.pre_seal Preprocessing
|
|||
Requested seal ops, counter | Duration of time taken by requested seal operations. |
Dependent item | vault.metrics.core.sealwithrequest Preprocessing
|
|||
Seal ops, counter | Duration of time taken by seal operations. |
Dependent item | vault.metrics.core.seal Preprocessing
|
|||
Internal seal ops, counter | Duration of time taken by internal seal operations. |
Dependent item | vault.metrics.core.seal_internal Preprocessing
|
|||
Leadership step downs, counter | Cluster leadership step down. |
Dependent item | vault.metrics.core.step_down Preprocessing
|
|||
Unseal ops, counter | Duration of time taken by unseal operations. |
Dependent item | vault.metrics.core.unseal Preprocessing
|
|||
Fetch lease times, counter | Time taken to fetch lease times. |
Dependent item | vault.metrics.expire.fetch.lease.times Preprocessing
|
|||
Fetch lease times by token, counter | Time taken to fetch lease times by token. |
Dependent item | vault.metrics.expire.fetch.lease.times.by_token Preprocessing
|
|||
Number of expiring leases | Number of all leases which are eligible for eventual expiry. |
Dependent item | vault.metrics.expire.num_leases Preprocessing
|
|||
Expire revoke, count | Time taken to revoke a token. |
Dependent item | vault.metrics.expire.revoke Preprocessing
|
|||
Expire revoke force, count | Time taken to forcibly revoke a token. |
Dependent item | vault.metrics.expire.revoke.force Preprocessing
|
|||
Expire revoke prefix, count | Time taken to revoke tokens on a prefix. |
Dependent item | vault.metrics.expire.revoke.prefix Preprocessing
|
|||
Revoke secrets by token, count | Time taken to revoke all secrets issued with a given token. |
Dependent item | vault.metrics.expire.revoke.by_token Preprocessing
|
|||
Expire renew, count | Time taken to renew a lease. |
Dependent item | vault.metrics.expire.renew Preprocessing
|
|||
Renew token, count | Time taken to renew a token which does not need to invoke a logical backend. |
Dependent item | vault.metrics.expire.renew_token Preprocessing
|
|||
Register ops, count | Time taken for register operations. |
Dependent item | vault.metrics.expire.register Preprocessing
|
|||
Register auth ops, count | Time taken for register authentication operations which create lease entries without lease ID. |
Dependent item | vault.metrics.expire.register.auth Preprocessing
|
|||
Policy GET ops, rate | Number of operations to get a policy. |
Dependent item | vault.metrics.policy.get_policy.rate Preprocessing
|
|||
Policy LIST ops, rate | Number of operations to list policies. |
Dependent item | vault.metrics.policy.list_policies.rate Preprocessing
|
|||
Policy DELETE ops, rate | Number of operations to delete a policy. |
Dependent item | vault.metrics.policy.delete_policy.rate Preprocessing
|
|||
Policy SET ops, rate | Number of operations to set a policy. |
Dependent item | vault.metrics.policy.set_policy.rate Preprocessing
|
|||
Token create, count | The time taken to create a token. |
Dependent item | vault.metrics.token.create Preprocessing
|
|||
Token createAccessor, count | The time taken to create a token accessor. |
Dependent item | vault.metrics.token.createAccessor Preprocessing
|
|||
Token lookup, rate | Number of token lookups. |
Dependent item | vault.metrics.token.lookup.rate Preprocessing
|
|||
Token revoke, count | The time taken to revoke a token. |
Dependent item | vault.metrics.token.revoke Preprocessing
|
|||
Token revoke tree, count | Time taken to revoke a token tree. |
Dependent item | vault.metrics.token.revoke.tree Preprocessing
|
|||
Token store, count | Time taken to store an updated token entry without writing to the secondary index. |
Dependent item | vault.metrics.token.store Preprocessing
|
|||
Runtime allocated bytes | Number of bytes allocated by the Vault process. This could burst from time to time, but should return to a steady state value. |
Dependent item | vault.metrics.runtime.alloc.bytes Preprocessing
|
|||
Runtime freed objects | Number of freed objects. |
Dependent item | vault.metrics.runtime.free.count Preprocessing
|
|||
Runtime heap objects | Number of objects on the heap. This is a good general memory pressure indicator worth establishing a baseline and thresholds for alerting. |
Dependent item | vault.metrics.runtime.heap.objects Preprocessing
|
|||
Runtime malloc count | Cumulative count of allocated heap objects. |
Dependent item | vault.metrics.runtime.malloc.count Preprocessing
|
|||
Runtime num goroutines | Number of goroutines. This serves as a general system load indicator worth establishing a baseline and thresholds for alerting. |
Dependent item | vault.metrics.runtime.num_goroutines Preprocessing
|
|||
Runtime sys bytes | Number of bytes allocated to Vault. This includes what is being used by Vault's heap and what has been reclaimed but not given back to the operating system. |
Dependent item | vault.metrics.runtime.sys.bytes Preprocessing
|
|||
Runtime GC pause, total | The total garbage collector pause time since Vault was last started. |
Dependent item | vault.metrics.total.gc.pause Preprocessing
|
|||
Runtime GC runs, total | Total number of garbage collection runs since Vault was last started. |
Dependent item | vault.metrics.runtime.total.gc.runs Preprocessing
|
|||
Token count, total | Total number of service tokens available for use; counts all un-expired and un-revoked tokens in Vault's token store. This measurement is performed every 10 minutes. |
Dependent item | vault.metrics.token Preprocessing
|
|||
Token count by auth, total | Total number of service tokens that were created by an auth method. |
Dependent item | vault.metrics.token.by_auth Preprocessing
|
|||
Token count by policy, total | Total number of service tokens that have a policy attached. |
Dependent item | vault.metrics.token.by_policy Preprocessing
|
|||
Token count by ttl, total | Number of service tokens, grouped by the TTL range they were assigned at creation. |
Dependent item | vault.metrics.token.by_ttl Preprocessing
|
|||
Token creation, rate | Number of service or batch tokens created. |
Dependent item | vault.metrics.token.creation.rate Preprocessing
|
|||
Secret kv entries | Number of entries in each key-value secret engine. |
Dependent item | vault.metrics.secret.kv.count Preprocessing
|
|||
Token secret lease creation, rate | Counts the number of leases created by secret engines. |
Dependent item | vault.metrics.secret.lease.creation.rate Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
HashiCorp Vault: Vault server is sealed | https://www.vaultproject.io/docs/concepts/seal |
last(/HashiCorp Vault by HTTP/vault.health.sealed)=1 |Average |
||
HashiCorp Vault: Version has changed | Vault version has changed. Acknowledge to close the problem manually. |
last(/HashiCorp Vault by HTTP/vault.health.version,#1)<>last(/HashiCorp Vault by HTTP/vault.health.version,#2) and length(last(/HashiCorp Vault by HTTP/vault.health.version))>0 |Info |
Manual close: Yes | |
HashiCorp Vault: Vault server is not responding | last(/HashiCorp Vault by HTTP/vault.health.check)=0 |High |
|||
HashiCorp Vault: Failed to get metrics | length(last(/HashiCorp Vault by HTTP/vault.get_metrics.error))>0 |Warning |
Depends on:
|
||
HashiCorp Vault: Current number of open files is too high | min(/HashiCorp Vault by HTTP/vault.metrics.process.open.fds,5m)/last(/HashiCorp Vault by HTTP/vault.metrics.process.max.fds)*100>{$VAULT.OPEN.FDS.MAX.WARN} |Warning |
|||
HashiCorp Vault: Service has been restarted | Uptime is less than 10 minutes. |
last(/HashiCorp Vault by HTTP/vault.metrics.process.uptime)<10m |Info |
Manual close: Yes | |
HashiCorp Vault: High frequency of leadership setup failures | There have been more than {$VAULT.LEADERSHIP.SETUP.FAILED.MAX.WARN} Vault leadership setup failures in the past 1h. |
(max(/HashiCorp Vault by HTTP/vault.metrics.core.leadership.setup_failed,1h)-min(/HashiCorp Vault by HTTP/vault.metrics.core.leadership.setup_failed,1h))>{$VAULT.LEADERSHIP.SETUP.FAILED.MAX.WARN} |Average |
||
HashiCorp Vault: High frequency of leadership losses | There have been more than {$VAULT.LEADERSHIP.LOSSES.MAX.WARN} Vault leadership losses in the past 1h. |
(max(/HashiCorp Vault by HTTP/vault.metrics.core.leadership_lost,1h)-min(/HashiCorp Vault by HTTP/vault.metrics.core.leadership_lost,1h))>{$VAULT.LEADERSHIP.LOSSES.MAX.WARN} |Average |
||
HashiCorp Vault: High frequency of leadership step downs | There have been more than {$VAULT.LEADERSHIP.STEPDOWNS.MAX.WARN} Vault leadership step downs in the past 1h. |
(max(/HashiCorp Vault by HTTP/vault.metrics.core.step_down,1h)-min(/HashiCorp Vault by HTTP/vault.metrics.core.step_down,1h))>{$VAULT.LEADERSHIP.STEPDOWNS.MAX.WARN} |Average |
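The three leadership triggers above all use the same pattern: the change of a monotonically increasing counter over a one-hour window (max minus min) is compared against a macro threshold. A minimal sketch of that logic, with made-up sample values and a made-up threshold:

```python
# Illustration of the max(...) - min(...) window logic used by the leadership
# triggers above; the samples and the threshold value are made up.
def delta_over_window(samples):
    """Change of a monotonically increasing counter over the window."""
    return max(samples) - min(samples)

# e.g. vault.metrics.core.leadership.setup_failed sampled over the last hour
hourly_samples = [3, 3, 4, 7, 9]
VAULT_LEADERSHIP_SETUP_FAILED_MAX_WARN = 5  # placeholder for the macro value

if delta_over_window(hourly_samples) > VAULT_LEADERSHIP_SETUP_FAILED_MAX_WARN:
    print("High frequency of leadership setup failures")  # 9 - 3 = 6 > 5
```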
Name | Description | Type | Key and additional info |
---|---|---|---|
Storage metrics discovery | Storage backend metrics discovery. |
Dependent item | vault.storage.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Storage [{#STORAGE}] {#OPERATION} ops, rate | Number of {#OPERATION} operations against the {#STORAGE} storage backend. |
Dependent item | vault.metrics.storage.rate[{#STORAGE}, {#OPERATION}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Mountpoint metrics discovery | Mountpoint metrics discovery. |
Dependent item | vault.mountpoint.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Rollback attempt [{#MOUNTPOINT}] ops, rate | Number of operations to perform a rollback operation on the given mount point. |
Dependent item | vault.metrics.rollback.attempt.rate[{#MOUNTPOINT}] Preprocessing
|
Route rollback [{#MOUNTPOINT}] ops, rate | Number of operations to dispatch a rollback operation to a backend, and for that backend to process it. Rollback operations are automatically scheduled to clean up partial errors. |
Dependent item | vault.metrics.route.rollback.rate[{#MOUNTPOINT}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
WAL metrics discovery | Discovery for WAL metrics. |
Dependent item | vault.wal.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Delete WALs, count{#SINGLETON} | Time taken to delete a Write Ahead Log (WAL). |
Dependent item | vault.metrics.wal.deletewals[{#SINGLETON}] Preprocessing
|
GC deleted WAL{#SINGLETON} | Number of Write Ahead Logs (WAL) deleted during each garbage collection run. |
Dependent item | vault.metrics.wal.gc.deleted[{#SINGLETON}] Preprocessing
|
WALs on disk, total{#SINGLETON} | Total Number of Write Ahead Logs (WAL) on disk. |
Dependent item | vault.metrics.wal.gc.total[{#SINGLETON}] Preprocessing
|
Load WALs, count{#SINGLETON} | Time taken to load a Write Ahead Log (WAL). |
Dependent item | vault.metrics.wal.loadWAL[{#SINGLETON}] Preprocessing
|
Persist WALs, count{#SINGLETON} | Time taken to persist a Write Ahead Log (WAL). |
Dependent item | vault.metrics.wal.persistwals[{#SINGLETON}] Preprocessing
|
Flush ready WAL, count{#SINGLETON} | Time taken to flush a ready Write Ahead Log (WAL) to storage. |
Dependent item | vault.metrics.wal.flushready[{#SINGLETON}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Replication metrics discovery | Discovery for replication metrics. |
Dependent item | vault.replication.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Stream WAL missing guard, count{#SINGLETON} | Number of incidences where the starting Merkle Tree index used to begin streaming WAL entries is not matched/found. |
Dependent item | vault.metrics.logshipper.streamWALs.missing_guard[{#SINGLETON}] Preprocessing
|
Stream WAL guard found, count{#SINGLETON} | Number of incidences where the starting Merkle Tree index used to begin streaming WAL entries is matched/found. |
Dependent item | vault.metrics.logshipper.streamWALs.guard_found[{#SINGLETON}] Preprocessing
|
Merkle commit index{#SINGLETON} | The last committed index in the Merkle Tree. |
Dependent item | vault.metrics.replication.merkle.commit_index[{#SINGLETON}] Preprocessing
|
Last WAL{#SINGLETON} | The index of the last WAL. |
Dependent item | vault.metrics.replication.wal.last_wal[{#SINGLETON}] Preprocessing
|
Last DR WAL{#SINGLETON} | The index of the last DR WAL. |
Dependent item | vault.metrics.replication.wal.lastdrwal[{#SINGLETON}] Preprocessing
|
Last performance WAL{#SINGLETON} | The index of the last Performance WAL. |
Dependent item | vault.metrics.replication.wal.lastperformancewal[{#SINGLETON}] Preprocessing
|
Last remote WAL{#SINGLETON} | The index of the last remote WAL. |
Dependent item | vault.metrics.replication.fsm.lastremotewal[{#SINGLETON}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Token metrics discovery | Tokens metrics discovery. |
Dependent item | vault.tokens.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Token [{#TOKEN_NAME}] error | Token lookup error text. |
Dependent item | vault.tokenviaaccessor.error["{#ACCESSOR}"] Preprocessing
|
Token [{#TOKEN_NAME}] has TTL | Indicates whether the token has a TTL. |
Dependent item | vault.tokenviaaccessor.has_ttl["{#ACCESSOR}"] Preprocessing
|
Token [{#TOKEN_NAME}] TTL | The TTL period of the token. |
Dependent item | vault.tokenviaaccessor.ttl["{#ACCESSOR}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
HashiCorp Vault: Token [{#TOKEN_NAME}] lookup error occurred | length(last(/HashiCorp Vault by HTTP/vault.token_via_accessor.error["{#ACCESSOR}"]))>0 |Warning |
Depends on:
|
||
HashiCorp Vault: Token [{#TOKEN_NAME}] will expire soon | last(/HashiCorp Vault by HTTP/vault.token_via_accessor.has_ttl["{#ACCESSOR}"])=1 and last(/HashiCorp Vault by HTTP/vault.token_via_accessor.ttl["{#ACCESSOR}"])<{$VAULT.TOKEN.TTL.MIN.CRIT} |Average |
|||
HashiCorp Vault: Token [{#TOKEN_NAME}] will expire soon | last(/HashiCorp Vault by HTTP/vault.token_via_accessor.has_ttl["{#ACCESSOR}"])=1 and last(/HashiCorp Vault by HTTP/vault.token_via_accessor.ttl["{#ACCESSOR}"])<{$VAULT.TOKEN.TTL.MIN.WARN} |Warning |
Depends on:
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
Template for monitoring TrueNAS CORE by SNMP.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Name | Description | Default |
---|---|---|
{$CPU.UTIL.CRIT} | Threshold of CPU utilization for warning trigger in %. |
90 |
{$ICMP_LOSS_WARN} | Threshold of ICMP packet loss for warning trigger in %. |
20 |
{$ICMP_RESPONSE_TIME_WARN} | Threshold of average ICMP response time for warning trigger in seconds. |
0.15 |
{$IF.ERRORS.WARN} | Threshold of error packets rate for warning trigger. Can be used with interface name as context. |
2 |
{$IF.UTIL.MAX} | Threshold of interface bandwidth utilization for warning trigger in %. Can be used with interface name as context. |
90 |
{$IFCONTROL} | Macro for operational state of the interface for link down trigger. Can be used with interface name as context. |
1 |
{$LOAD_AVG_PER_CPU.MAX.WARN} | Load per CPU considered sustainable. Tune if needed. |
1.5 |
{$MEMORY.AVAILABLE.MIN} | Threshold of available memory for trigger in bytes. |
20M |
{$MEMORY.UTIL.MAX} | Threshold of memory utilization for trigger in % |
90 |
{$NET.IF.IFADMINSTATUS.MATCHES} | This macro is used in filters of network interfaces discovery rule. |
^.* |
{$NET.IF.IFADMINSTATUS.NOT_MATCHES} | Ignore down(2) administrative status |
^2$ |
{$NET.IF.IFALIAS.MATCHES} | This macro is used in filters of network interfaces discovery rule. |
.* |
{$NET.IF.IFALIAS.NOT_MATCHES} | This macro is used in filters of network interfaces discovery rule. |
CHANGE_IF_NEEDED |
{$NET.IF.IFDESCR.MATCHES} | This macro is used in filters of network interfaces discovery rule. |
.* |
{$NET.IF.IFDESCR.NOT_MATCHES} | This macro is used in filters of network interfaces discovery rule. |
CHANGE_IF_NEEDED |
{$NET.IF.IFNAME.NOT_MATCHES} | This macro is used in filters of network interfaces discovery rule. |
CHANGE_IF_NEEDED |
{$NET.IF.IFOPERSTATUS.MATCHES} | This macro is used in filters of network interfaces discovery rule. |
^.*$ |
{$NET.IF.IFOPERSTATUS.NOT_MATCHES} | Ignore notPresent(6) |
^6$ |
{$NET.IF.IFTYPE.MATCHES} | This macro is used in filters of network interfaces discovery rule. |
.* |
{$NET.IF.IFTYPE.NOT_MATCHES} | This macro is used in filters of network interfaces discovery rule. |
CHANGE_IF_NEEDED |
{$SNMP.TIMEOUT} | The time interval for SNMP availability trigger. |
5m |
{$SWAP.PFREE.MIN.WARN} | Threshold of free swap space for warning trigger in %. |
50 |
{$VFS.DEV.DEVNAME.MATCHES} | This macro is used in block devices discovery. Can be overridden on the host or linked template level |
.+ |
{$VFS.DEV.DEVNAME.NOT_MATCHES} | This macro is used in block devices discovery. Can be overridden on the host or linked template level |
Macro too long. Please see the template. |
{$DATASET.NAME.MATCHES} | This macro is used in datasets discovery. Can be overridden on the host or linked template level |
.+ |
{$DATASET.NAME.NOT_MATCHES} | This macro is used in datasets discovery. Can be overridden on the host or linked template level |
^(boot|.+\.system(.+)?$) |
{$ZPOOL.PUSED.MAX.WARN} | Threshold of used pool space for warning trigger in %. |
80 |
{$ZPOOL.FREE.MIN.WARN} | This macro is used for trigger expression. It can be overridden on the host or linked on the template level. |
5G |
{$ZPOOL.PUSED.MAX.CRIT} | Threshold of used pool space for average severity trigger in %. |
90 |
{$ZPOOL.FREE.MIN.CRIT} | This macro is used for trigger expression. It can be overridden on the host or linked on the template level. |
5G |
{$DATASET.PUSED.MAX.WARN} | Threshold of used dataset space for warning trigger in %. |
80 |
{$DATASET.FREE.MIN.WARN} | This macro is used for trigger expression. It can be overridden on the host or linked on the template level. |
5G |
{$DATASET.PUSED.MAX.CRIT} | Threshold of used dataset space for average severity trigger in %. |
90 |
{$DATASET.FREE.MIN.CRIT} | This macro is used for trigger expression. It can be overridden on the host or linked on the template level. |
5G |
{$TEMPERATURE.MAX.WARN} | This macro is used for trigger expression. It can be overridden on the host or linked on the template level. |
50 |
{$TEMPERATURE.MAX.CRIT} | This macro is used for trigger expression. It can be overridden on the host or linked on the template level. |
65 |
Name | Description | Type | Key and additional info |
---|---|---|---|
ICMP ping | Host accessibility by ICMP. 0 - ICMP ping fails. 1 - ICMP ping successful. |
Simple check | icmpping |
ICMP loss | Percentage of lost packets. |
Simple check | icmppingloss |
ICMP response time | ICMP ping response time (in seconds). |
Simple check | icmppingsec |
System contact details | MIB: SNMPv2-MIB The textual identification of the contact person for this managed node, together with information on how to contact this person. If no contact information is known, the value is the zero-length string. |
SNMP agent | system.contact Preprocessing
|
System description | MIB: SNMPv2-MIB System description of the host. |
SNMP agent | system.descr Preprocessing
|
System location | MIB: SNMPv2-MIB The physical location of this node. If the location is unknown, the value is the zero-length string. |
SNMP agent | system.location Preprocessing
|
System name | MIB: SNMPv2-MIB The host name of the system. |
SNMP agent | system.name Preprocessing
|
System object ID | MIB: SNMPv2-MIB The vendor's authoritative identification of the network management subsystem contained in the entity. This value is allocated within the SMI enterprises subtree (1.3.6.1.4.1) and provides an easy and unambiguous means for determining what kind of box is being managed. |
SNMP agent | system.objectid Preprocessing
|
Uptime | MIB: HOST-RESOURCES-MIB The amount of time since this host was last initialized. Note that this is different from sysUpTime in the SNMPv2-MIB [RFC1907] because sysUpTime is the uptime of the network management portion of the system. |
SNMP agent | system.uptime Preprocessing
|
SNMP traps (fallback) | The item is used to collect all SNMP traps unmatched by other snmptrap items. |
SNMP trap | snmptrap.fallback |
SNMP agent availability | Availability of SNMP checks on the host. The value of this item corresponds to availability icons in the host list. Possible values: 0 - not available 1 - available 2 - unknown |
Zabbix internal | zabbix[host,snmp,available] |
Interrupts per second | MIB: UCD-SNMP-MIB Number of interrupts processed. |
SNMP agent | system.cpu.intr Preprocessing
|
Context switches per second | MIB: UCD-SNMP-MIB Number of context switches. |
SNMP agent | system.cpu.switches Preprocessing
|
Load average (1m avg) | MIB: UCD-SNMP-MIB The 1 minute load averages. |
SNMP agent | system.cpu.load.avg1 |
Load average (5m avg) | MIB: UCD-SNMP-MIB The 5 minutes load averages. |
SNMP agent | system.cpu.load.avg5 |
Load average (15m avg) | MIB: UCD-SNMP-MIB The 15 minutes load averages. |
SNMP agent | system.cpu.load.avg15 |
Number of CPUs | MIB: HOST-RESOURCES-MIB The number of CPU cores, counted from the cores discovered in hrProcessorTable using LLD. |
SNMP agent | system.cpu.num Preprocessing
|
Free memory | MIB: UCD-SNMP-MIB The amount of real/physical memory currently unused or available. |
SNMP agent | vm.memory.free Preprocessing
|
Memory (buffers) | MIB: UCD-SNMP-MIB The total amount of real or virtual memory currently allocated for use as memory buffers. |
SNMP agent | vm.memory.buffers Preprocessing
|
Memory (cached) | MIB: UCD-SNMP-MIB The total amount of real or virtual memory currently allocated for use as cached memory. |
SNMP agent | vm.memory.cached Preprocessing
|
Total memory | MIB: UCD-SNMP-MIB The total memory expressed in bytes. |
SNMP agent | vm.memory.total Preprocessing
|
Available memory | Please note that memory utilization is a rough estimate, since memory available is calculated as free+buffers+cached, which is not 100% accurate, but the best we can get using SNMP. |
Calculated | vm.memory.available |
Memory utilization | Please note that memory utilization is a rough estimate, since memory available is calculated as free+buffers+cached, which is not 100% accurate, but the best we can get using SNMP. A worked sketch of this calculation follows the items table below. |
Calculated | vm.memory.util |
Total swap space | MIB: UCD-SNMP-MIB The total amount of swap space configured for this host. |
SNMP agent | system.swap.total Preprocessing
|
Free swap space | MIB: UCD-SNMP-MIB The amount of swap space currently unused or available. |
SNMP agent | system.swap.free Preprocessing
|
Free swap space in % | The free space of the swap volume/file expressed in %. |
Calculated | system.swap.pfree Preprocessing
|
ARC size | MIB: FREENAS-MIB ARC size in bytes. |
SNMP agent | truenas.zfs.arc.size Preprocessing
|
ARC metadata size | MIB: FREENAS-MIB ARC metadata size used in bytes. |
SNMP agent | truenas.zfs.arc.meta Preprocessing
|
ARC data size | MIB: FREENAS-MIB ARC data size used in bytes. |
SNMP agent | truenas.zfs.arc.data Preprocessing
|
ARC hits | MIB: FREENAS-MIB Total amount of cache hits in the ARC per second. |
SNMP agent | truenas.zfs.arc.hits Preprocessing
|
ARC misses | MIB: FREENAS-MIB Total amount of cache misses in the ARC per second. |
SNMP agent | truenas.zfs.arc.misses Preprocessing
|
ARC target size of cache | MIB: FREENAS-MIB ARC target size of cache in bytes. |
SNMP agent | truenas.zfs.arc.c Preprocessing
|
ARC target size of MRU | MIB: FREENAS-MIB ARC target size of MRU in bytes. |
SNMP agent | truenas.zfs.arc.p Preprocessing
|
ARC cache hit ratio | MIB: FREENAS-MIB ARC cache hit ratio percentage. |
SNMP agent | truenas.zfs.arc.hit.ratio |
ARC cache miss ratio | MIB: FREENAS-MIB ARC cache miss ratio percentage. |
SNMP agent | truenas.zfs.arc.miss.ratio |
L2ARC hits | MIB: FREENAS-MIB Hits to the L2 cache per second. |
SNMP agent | truenas.zfs.l2arc.hits Preprocessing
|
L2ARC misses | MIB: FREENAS-MIB Misses to the L2 cache per second. |
SNMP agent | truenas.zfs.l2arc.misses Preprocessing
|
L2ARC read rate | MIB: FREENAS-MIB Read rate from L2 cache in bytes per second. |
SNMP agent | truenas.zfs.l2arc.read Preprocessing
|
L2ARC write rate | MIB: FREENAS-MIB Write rate from L2 cache in bytes per second. |
SNMP agent | truenas.zfs.l2arc.write Preprocessing
|
L2ARC size | MIB: FREENAS-MIB L2ARC size in bytes. |
SNMP agent | truenas.zfs.l2arc.size Preprocessing
|
ZIL operations 1 second | MIB: FREENAS-MIB The ops column parsed from the command zilstat 1 1. |
SNMP agent | truenas.zfs.zil.ops1 |
ZIL operations 5 seconds | MIB: FREENAS-MIB The ops column parsed from the command zilstat 5 1. |
SNMP agent | truenas.zfs.zil.ops5 |
ZIL operations 10 seconds | MIB: FREENAS-MIB The ops column parsed from the command zilstat 10 1. |
SNMP agent | truenas.zfs.zil.ops10 |
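As noted in the memory items above, available memory is estimated as free+buffers+cached, and memory utilization and free swap space are derived from it. A minimal sketch of those calculated items, assuming the usual (total - available) / total and free / total definitions; all byte values below are made up:

```python
# Sketch of the calculated memory and swap items above; values are made up.
mem_total = 16 * 1024**3
mem_free = 2 * 1024**3
mem_buffers = 1 * 1024**3
mem_cached = 5 * 1024**3

mem_available = mem_free + mem_buffers + mem_cached       # vm.memory.available
mem_util = (mem_total - mem_available) / mem_total * 100  # vm.memory.util, %

swap_total = 4 * 1024**3
swap_free = 3 * 1024**3
swap_pfree = swap_free / swap_total * 100                 # system.swap.pfree, %

print(f"available={mem_available} B, util={mem_util:.1f}%, swap free={swap_pfree:.1f}%")
```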
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
TrueNAS CORE: Unavailable by ICMP ping | Last three attempts returned timeout. Please check device connectivity. |
max(/TrueNAS CORE by SNMP/icmpping,#3)=0 |High |
||
TrueNAS CORE: High ICMP ping loss | ICMP packets loss detected. |
min(/TrueNAS CORE by SNMP/icmppingloss,5m)>{$ICMP_LOSS_WARN} and min(/TrueNAS CORE by SNMP/icmppingloss,5m)<100 |Warning |
Depends on:
|
|
TrueNAS CORE: High ICMP ping response time | Average ICMP response time is too high. |
avg(/TrueNAS CORE by SNMP/icmppingsec,5m)>{$ICMP_RESPONSE_TIME_WARN} |Warning |
Depends on:
|
|
TrueNAS CORE: System name has changed | The name of the system has changed. Acknowledge to close the problem manually. |
last(/TrueNAS CORE by SNMP/system.name,#1)<>last(/TrueNAS CORE by SNMP/system.name,#2) and length(last(/TrueNAS CORE by SNMP/system.name))>0 |Info |
Manual close: Yes | |
TrueNAS CORE: Host has been restarted | Uptime is less than 10 minutes. |
last(/TrueNAS CORE by SNMP/system.uptime)<10m |Info |
Manual close: Yes | |
TrueNAS CORE: No SNMP data collection | SNMP is not available for polling. Please check device connectivity and SNMP settings. |
max(/TrueNAS CORE by SNMP/zabbix[host,snmp,available],{$SNMP.TIMEOUT})=0 |Warning |
Depends on:
|
|
TrueNAS CORE: Load average is too high | The load average per CPU is too high. The system may be slow to respond. |
min(/TrueNAS CORE by SNMP/system.cpu.load.avg1,5m)/last(/TrueNAS CORE by SNMP/system.cpu.num)>{$LOAD_AVG_PER_CPU.MAX.WARN} and last(/TrueNAS CORE by SNMP/system.cpu.load.avg5)>0 and last(/TrueNAS CORE by SNMP/system.cpu.load.avg15)>0 |Average |
||
TrueNAS CORE: Lack of available memory | The system is running out of memory. |
min(/TrueNAS CORE by SNMP/vm.memory.available,5m)<{$MEMORY.AVAILABLE.MIN} and last(/TrueNAS CORE by SNMP/vm.memory.total)>0 |Average |
||
TrueNAS CORE: High memory utilization | The system is running out of free memory. |
min(/TrueNAS CORE by SNMP/vm.memory.util,5m)>{$MEMORY.UTIL.MAX} |Average |
Depends on:
|
|
TrueNAS CORE: High swap space usage | If there is no swap configured, this trigger is ignored. |
min(/TrueNAS CORE by SNMP/system.swap.pfree,5m)<{$SWAP.PFREE.MIN.WARN} and last(/TrueNAS CORE by SNMP/system.swap.total)>0 |Warning |
Depends on:
|
Name | Description | Type | Key and additional info |
---|---|---|---|
CPU discovery | This discovery will create a set of per-core CPU metrics from UCD-SNMP-MIB, using {#CPU.COUNT} in preprocessing. That's the only reason LLD is used. |
Dependent item | cpu.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
CPU idle time | MIB: UCD-SNMP-MIB The time the CPU has spent doing nothing. |
SNMP agent | system.cpu.idle[{#SNMPINDEX}] |
CPU system time | MIB: UCD-SNMP-MIB The time the CPU has spent running the kernel and its processes. |
SNMP agent | system.cpu.system[{#SNMPINDEX}] Preprocessing
|
CPU user time | MIB: UCD-SNMP-MIB The time the CPU has spent running users' processes that are not niced. |
SNMP agent | system.cpu.user[{#SNMPINDEX}] Preprocessing
|
CPU nice time | MIB: UCD-SNMP-MIB The time the CPU has spent running users' processes that have been niced. |
SNMP agent | system.cpu.nice[{#SNMPINDEX}] Preprocessing
|
CPU iowait time | MIB: UCD-SNMP-MIB The amount of time the CPU has been waiting for I/O to complete. |
SNMP agent | system.cpu.iowait[{#SNMPINDEX}] Preprocessing
|
CPU interrupt time | MIB: UCD-SNMP-MIB The amount of time the CPU has been servicing hardware interrupts. |
SNMP agent | system.cpu.interrupt[{#SNMPINDEX}] Preprocessing
|
CPU utilization | The CPU utilization expressed in %. |
Dependent item | system.cpu.util[{#SNMPINDEX}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
TrueNAS CORE: High CPU utilization | The CPU utilization is too high. The system might be slow to respond. |
min(/TrueNAS CORE by SNMP/system.cpu.util[{#SNMPINDEX}],5m)>{$CPU.UTIL.CRIT} |Warning |
Depends on:
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Block devices discovery | Block devices are discovered from UCD-DISKIO-MIB::diskIOTable (http://net-snmp.sourceforge.net/docs/mibs/ucdDiskIOMIB.html#diskIOTable). |
SNMP agent | vfs.dev.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
TrueNAS CORE: [{#DEVNAME}]: Disk read rate | MIB: UCD-DISKIO-MIB The number of read accesses from this device since boot. |
SNMP agent | vfs.dev.read.rate[{#SNMPINDEX}] Preprocessing
|
TrueNAS CORE: [{#DEVNAME}]: Disk write rate | MIB: UCD-DISKIO-MIB The number of write accesses from this device since boot. |
SNMP agent | vfs.dev.write.rate[{#SNMPINDEX}] Preprocessing
|
TrueNAS CORE: [{#DEVNAME}]: Disk utilization | MIB: UCD-DISKIO-MIB The 1 minute average load of disk (%). |
SNMP agent | vfs.dev.util[{#SNMPINDEX}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Network interfaces discovery | Discovering interfaces from IF-MIB. |
SNMP agent | net.if.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Interface [{#IFNAME}({#IFALIAS})]: Inbound packets discarded | MIB: IF-MIB The number of inbound packets which were chosen to be discarded even though no errors had been detected to prevent their being deliverable to a higher-layer protocol. One possible reason for discarding such a packet could be to free up buffer space. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP agent | net.if.in.discards[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Inbound packets with errors | MIB: IF-MIB For packet-oriented interfaces, the number of inbound packets that contained errors preventing them from being deliverable to a higher-layer protocol. For character-oriented or fixed-length interfaces, the number of inbound transmission units that contained errors preventing them from being deliverable to a higher-layer protocol. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP agent | net.if.in.errors[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Bits received | MIB: IF-MIB The total number of octets received on the interface, including framing characters. This object is a 64-bit version of ifInOctets. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP agent | net.if.in[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Outbound packets discarded | MIB: IF-MIB The number of outbound packets which were chosen to be discarded even though no errors had been detected to prevent their being deliverable to a higher-layer protocol. One possible reason for discarding such a packet could be to free up buffer space. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP agent | net.if.out.discards[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Outbound packets with errors | MIB: IF-MIB For packet-oriented interfaces, the number of outbound packets that contained errors preventing them from being deliverable to a higher-layer protocol. For character-oriented or fixed-length interfaces, the number of outbound transmission units that contained errors preventing them from being deliverable to a higher-layer protocol. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP agent | net.if.out.errors[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Bits sent | MIB: IF-MIB The total number of octets transmitted out of the interface, including framing characters. This object is a 64-bit version of ifOutOctets.Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP agent | net.if.out[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Speed | MIB: IF-MIB An estimate of the interface's current bandwidth in units of 1,000,000 bits per second. If this object reports a value of |
SNMP agent | net.if.speed[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Operational status | MIB: IF-MIB The current operational state of the interface. - The testing(3) state indicates that no operational packets can be passed - If ifAdminStatus is down(2) then ifOperStatus should be down(2) - If ifAdminStatus is changed to up(1) then ifOperStatus should change to up(1) if the interface is ready to transmit and receive network traffic - It should change to dormant(5) if the interface is waiting for external actions (such as a serial line waiting for an incoming connection) - It should remain in the down(2) state if and only if there is a fault that prevents it from going to the up(1) state - It should remain in the notPresent(6) state if the interface has missing (typically, hardware) components. |
SNMP agent | net.if.status[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Interface type | MIB: IF-MIB The type of interface. Additional values for ifType are assigned by the Internet Assigned Numbers Authority (IANA), through updating the syntax of the IANAifType textual convention. |
SNMP agent | net.if.type[{#SNMPINDEX}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
TrueNAS CORE: Interface [{#IFNAME}({#IFALIAS})]: High input error rate | It recovers when it is below 80% of the |
min(/TrueNAS CORE by SNMP/net.if.in.errors[{#SNMPINDEX}],5m)>{$IF.ERRORS.WARN:"{#IFNAME}"} |Warning |
Depends on:
|
|
TrueNAS CORE: Interface [{#IFNAME}({#IFALIAS})]: High inbound bandwidth usage | The utilization of the network interface is close to its estimated maximum bandwidth. |
(avg(/TrueNAS CORE by SNMP/net.if.in[{#SNMPINDEX}],15m)>({$IF.UTIL.MAX:"{#IFNAME}"}/100)*last(/TrueNAS CORE by SNMP/net.if.speed[{#SNMPINDEX}])) and last(/TrueNAS CORE by SNMP/net.if.speed[{#SNMPINDEX}])>0 |Warning |
Depends on:
|
|
TrueNAS CORE: Interface [{#IFNAME}({#IFALIAS})]: High output error rate | It recovers when it is below 80% of the |
min(/TrueNAS CORE by SNMP/net.if.out.errors[{#SNMPINDEX}],5m)>{$IF.ERRORS.WARN:"{#IFNAME}"} |Warning |
Depends on:
|
|
TrueNAS CORE: Interface [{#IFNAME}({#IFALIAS})]: High outbound bandwidth usage | The utilization of the network interface is close to its estimated maximum bandwidth. |
(avg(/TrueNAS CORE by SNMP/net.if.out[{#SNMPINDEX}],15m)>({$IF.UTIL.MAX:"{#IFNAME}"}/100)*last(/TrueNAS CORE by SNMP/net.if.speed[{#SNMPINDEX}])) and last(/TrueNAS CORE by SNMP/net.if.speed[{#SNMPINDEX}])>0 |Warning |
Depends on:
|
|
TrueNAS CORE: Interface [{#IFNAME}({#IFALIAS})]: Ethernet has changed to lower speed than it was before | This Ethernet connection has transitioned down from its known maximum speed. This might be a sign of autonegotiation issues. Acknowledge to close the problem manually. |
change(/TrueNAS CORE by SNMP/net.if.speed[{#SNMPINDEX}])<0 and last(/TrueNAS CORE by SNMP/net.if.speed[{#SNMPINDEX}])>0 and ( last(/TrueNAS CORE by SNMP/net.if.type[{#SNMPINDEX}])=6 or last(/TrueNAS CORE by SNMP/net.if.type[{#SNMPINDEX}])=7 or last(/TrueNAS CORE by SNMP/net.if.type[{#SNMPINDEX}])=11 or last(/TrueNAS CORE by SNMP/net.if.type[{#SNMPINDEX}])=62 or last(/TrueNAS CORE by SNMP/net.if.type[{#SNMPINDEX}])=69 or last(/TrueNAS CORE by SNMP/net.if.type[{#SNMPINDEX}])=117 ) and (last(/TrueNAS CORE by SNMP/net.if.status[{#SNMPINDEX}])<>2) |Info |
Depends on:
|
|
TrueNAS CORE: Interface [{#IFNAME}({#IFALIAS})]: Link down | This trigger expression works as follows: |
{$IFCONTROL:"{#IFNAME}"}=1 and (last(/TrueNAS CORE by SNMP/net.if.status[{#SNMPINDEX}])=2) |Average |
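The bandwidth-usage triggers above fire when the 15-minute average throughput exceeds {$IF.UTIL.MAX} percent of the reported interface speed (and the speed is known). A worked sketch of that comparison with made-up figures:

```python
# Worked example of the bandwidth-usage trigger condition above; the traffic
# and speed figures are made up.
if_util_max_pct = 90                  # {$IF.UTIL.MAX}
if_speed_bps = 1_000_000_000          # net.if.speed: 1 Gbps
avg_in_bps = 950_000_000              # avg(net.if.in, 15m), already in bits/s

threshold_bps = if_util_max_pct / 100 * if_speed_bps
if avg_in_bps > threshold_bps and if_speed_bps > 0:
    print("High inbound bandwidth usage")   # 950 Mbps > 900 Mbps
```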
Name | Description | Type | Key and additional info |
---|---|---|---|
ZFS pools discovery | ZFS pools discovery from FREENAS-MIB. |
SNMP agent | truenas.zfs.pools.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Pool [{#POOLNAME}]: Total space | MIB: FREENAS-MIB The size of the storage pool in bytes. |
SNMP agent | truenas.zpool.size.total[{#POOLNAME}] Preprocessing
|
Pool [{#POOLNAME}]: Used space | MIB: FREENAS-MIB The used size of the storage pool in bytes. |
SNMP agent | truenas.zpool.used[{#POOLNAME}] Preprocessing
|
Pool [{#POOLNAME}]: Available space | MIB: FREENAS-MIB The available size of the storage pool in bytes. |
SNMP agent | truenas.zpool.avail[{#POOLNAME}] Preprocessing
|
Pool [{#POOLNAME}]: Usage in % | The used size of the storage pool in %. |
Calculated | truenas.zpool.pused[{#POOLNAME}] |
Pool [{#POOLNAME}]: Health | MIB: FREENAS-MIB The current health of the containing pool, as reported by zpool status. |
SNMP agent | truenas.zpool.health[{#POOLNAME}] Preprocessing
|
Pool [{#POOLNAME}]: Read operations rate | MIB: FREENAS-MIB The number of read I/O operations sent to the pool or device, including metadata requests (averaged since system booted). |
SNMP agent | truenas.zpool.read.ops[{#POOLNAME}] Preprocessing
|
Pool [{#POOLNAME}]: Write operations rate | MIB: FREENAS-MIB The number of write I/O operations sent to the pool or device (averaged since system booted). |
SNMP agent | truenas.zpool.write.ops[{#POOLNAME}] Preprocessing
|
Pool [{#POOLNAME}]: Read rate | MIB: FREENAS-MIB The bandwidth of all read operations (including metadata), expressed as units per second (averaged since system booted). |
SNMP agent | truenas.zpool.read.bytes[{#POOLNAME}] Preprocessing
|
Pool [{#POOLNAME}]: Write rate | MIB: FREENAS-MIB The bandwidth of all write operations, expressed as units per second (averaged since system booted). |
SNMP agent | truenas.zpool.write.bytes[{#POOLNAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
TrueNAS CORE: Pool [{#POOLNAME}]: Very high space usage | Two conditions should match: |
min(/TrueNAS CORE by SNMP/truenas.zpool.pused[{#POOLNAME}],5m) > {$ZPOOL.PUSED.MAX.CRIT:"{#POOLNAME}"} and last(/TrueNAS CORE by SNMP/truenas.zpool.avail[{#POOLNAME}]) < {$ZPOOL.FREE.MIN.CRIT:"{#POOLNAME}"} |Average |
||
TrueNAS CORE: Pool [{#POOLNAME}]: High space usage | Two conditions should match: |
min(/TrueNAS CORE by SNMP/truenas.zpool.pused[{#POOLNAME}],5m) > {$ZPOOL.PUSED.MAX.WARN:"{#POOLNAME}"} and last(/TrueNAS CORE by SNMP/truenas.zpool.avail[{#POOLNAME}]) < {$ZPOOL.FREE.MIN.WARN:"{#POOLNAME}"} |Warning |
Depends on:
|
|
TrueNAS CORE: Pool [{#POOLNAME}]: Status is not online | Please check pool status. |
last(/TrueNAS CORE by SNMP/truenas.zpool.health[{#POOLNAME}]) <> 0 |Average |
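Both pool-space triggers above require two conditions to match at once: the used percentage must exceed its threshold and the absolute free space must drop below its floor. A worked sketch with made-up sizes, assuming the usage percentage is simply used divided by total:

```python
# Worked example of the two-condition pool space triggers above; sizes and
# thresholds are made up.
GIB = 1024**3
zpool_size = 10 * 1024 * GIB           # truenas.zpool.size.total
zpool_used = 9_500 * GIB               # truenas.zpool.used
zpool_avail = zpool_size - zpool_used  # truenas.zpool.avail

pused = zpool_used / zpool_size * 100  # truenas.zpool.pused, %
ZPOOL_PUSED_MAX_CRIT = 90              # {$ZPOOL.PUSED.MAX.CRIT}
ZPOOL_FREE_MIN_CRIT = 5 * GIB          # {$ZPOOL.FREE.MIN.CRIT}

if pused > ZPOOL_PUSED_MAX_CRIT and zpool_avail < ZPOOL_FREE_MIN_CRIT:
    print("Very high space usage")
else:
    # Only the percentage condition is met here, so the trigger stays silent.
    print(f"OK: {pused:.1f}% used, {zpool_avail / GIB:.0f} GiB free")
```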
Name | Description | Type | Key and additional info |
---|---|---|---|
ZFS datasets discovery | ZFS datasets discovery from FREENAS-MIB. |
SNMP agent | truenas.zfs.dataset.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Dataset [{#DATASET_NAME}]: Total space | MIB: FREENAS-MIB The size of the dataset in bytes. |
SNMP agent | truenas.dataset.size.total[{#DATASET_NAME}] Preprocessing
|
Dataset [{#DATASET_NAME}]: Used space | MIB: FREENAS-MIB The used size of the dataset in bytes. |
SNMP agent | truenas.dataset.used[{#DATASET_NAME}] Preprocessing
|
Dataset [{#DATASET_NAME}]: Available space | MIB: FREENAS-MIB The available size of the dataset in bytes. |
SNMP agent | truenas.dataset.avail[{#DATASET_NAME}] Preprocessing
|
Dataset [{#DATASET_NAME}]: Usage in % | The used size of the dataset in %. |
Calculated | truenas.dataset.pused[{#DATASET_NAME}] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
TrueNAS CORE: Dataset [{#DATASET_NAME}]: Very high space usage | Two conditions should match: |
min(/TrueNAS CORE by SNMP/truenas.dataset.pused[{#DATASET_NAME}],5m) > {$DATASET.PUSED.MAX.CRIT:"{#DATASET_NAME}"} and last(/TrueNAS CORE by SNMP/truenas.dataset.avail[{#DATASET_NAME}]) < {$DATASET.FREE.MIN.CRIT:"{#POOLNAME}"} |Average |
||
TrueNAS CORE: Dataset [{#DATASET_NAME}]: High space usage | Two conditions should match: |
min(/TrueNAS CORE by SNMP/truenas.dataset.pused[{#DATASET_NAME}],5m) > {$DATASET.PUSED.MAX.WARN:"{#DATASET_NAME}"} and last(/TrueNAS CORE by SNMP/truenas.dataset.avail[{#DATASET_NAME}]) < {$DATASET.FREE.MIN.WARN:"{#POOLNAME}"} |Warning |
Depends on:
|
Name | Description | Type | Key and additional info |
---|---|---|---|
ZFS volumes discovery | ZFS volumes discovery from FREENAS-MIB. |
SNMP agent | truenas.zfs.zvols.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
TrueNAS CORE: ZFS volume [{#ZVOL_NAME}]: Total space | MIB: FREENAS-MIB The size of the ZFS volume in bytes. |
SNMP agent | truenas.zvol.size.total[{#ZVOL_NAME}] Preprocessing
|
TrueNAS CORE: ZFS volume [{#ZVOL_NAME}]: Used space | MIB: FREENAS-MIB The used size of the ZFS volume in bytes. |
SNMP agent | truenas.zvol.used[{#ZVOL_NAME}] Preprocessing
|
TrueNAS CORE: ZFS volume [{#ZVOL_NAME}]: Available space | MIB: FREENAS-MIB The available size of the ZFS volume in bytes. |
SNMP agent | truenas.zvol.avail[{#ZVOL_NAME}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Disks temperature discovery | Disks temperature discovery from FREENAS-MIB. |
SNMP agent | truenas.disk.temp.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Disk [{#DISK_NAME}]: Temperature | MIB: FREENAS-MIB The temperature of this HDD in mC. |
SNMP agent | truenas.disk.temp[{#DISK_NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
TrueNAS CORE: Disk [{#DISK_NAME}]: Average disk temperature is too high | Disk temperature is high. |
avg(/TrueNAS CORE by SNMP/truenas.disk.temp[{#DISK_NAME}],5m) > {$TEMPERATURE.MAX.CRIT:"{#DISK_NAME}"} |Average |
||
TrueNAS CORE: Disk [{#DISK_NAME}]: Average disk temperature is too high | Disk temperature is high. |
avg(/TrueNAS CORE by SNMP/truenas.disk.temp[{#DISK_NAME}],5m) > {$TEMPERATURE.MAX.WARN:"{#DISK_NAME}"} |Warning |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template to monitor Travis CI by Zabbix that works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
You must set the {$TRAVIS.API.TOKEN} and {$TRAVIS.API.URL} macros. {$TRAVIS.API.TOKEN} is a Travis API authentication token located in User -> Settings -> API authentication. {$TRAVIS.API.URL} can come in two different variations (a quick verification sketch follows the macros table below):
Name | Description | Default |
---|---|---|
{$TRAVIS.API.TOKEN} | Travis API Token |
|
{$TRAVIS.API.URL} | Travis API URL |
api.travis-ci.com |
{$TRAVIS.BUILDS.SUCCESS.PERCENT} | Percent of successful builds in the repo (for trigger expression) |
80 |
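Before linking the template, it can be useful to confirm that the token and URL you intend to put into the macros actually work against the Travis API. A minimal sketch, assuming the Travis API v3 conventions (an `Authorization: token ...` header and `Travis-API-Version: 3`); the token value is a placeholder:

```python
# Minimal sketch (not part of the template): verify the values you plan to put
# into {$TRAVIS.API.TOKEN} and {$TRAVIS.API.URL}.
import json
import urllib.request

TRAVIS_API_URL = "api.travis-ci.com"   # value of {$TRAVIS.API.URL}
TRAVIS_API_TOKEN = "<your-token>"      # value of {$TRAVIS.API.TOKEN}

req = urllib.request.Request(
    f"https://{TRAVIS_API_URL}/repos",
    headers={
        "Authorization": f"token {TRAVIS_API_TOKEN}",
        "Travis-API-Version": "3",
    },
)
with urllib.request.urlopen(req) as resp:
    data = json.load(resp)

# The v3 /repos response carries the accessible repositories; print their slugs.
for repo in data.get("repositories", []):
    print(repo.get("slug"))
```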
Name | Description | Type | Key and additional info |
---|---|---|---|
Get repos | Getting repos using Travis API. |
HTTP agent | travis.get_repos |
Get builds | Getting builds using Travis API. |
HTTP agent | travis.get_builds |
Get jobs | Getting jobs using Travis API. |
HTTP agent | travis.get_jobs |
Get health | Getting home JSON using Travis API. |
HTTP agent | travis.get_health Preprocessing
|
Jobs passed | Total count of passed jobs in all repos. |
Dependent item | travis.jobs.total Preprocessing
|
Jobs active | Active jobs in all repos. |
Dependent item | travis.jobs.active Preprocessing
|
Jobs in queue | Jobs in queue in all repos. |
Dependent item | travis.jobs.queue Preprocessing
|
Builds | Total count of builds in all repos. |
Dependent item | travis.builds.total Preprocessing
|
Builds duration | Sum of all builds durations in all repos. |
Dependent item | travis.builds.duration Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Travis CI: Service is unavailable | Travis API is unavailable. Please check if the correct macros are set. |
last(/Travis CI by HTTP/travis.get_health)=0 |High |
Manual close: Yes | |
Travis CI: Failed to fetch home page | Zabbix has not received any data for items for the last 30 minutes. |
nodata(/Travis CI by HTTP/travis.get_health,30m)=1 |Warning |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Repos metrics discovery | Metrics for Repos statistics. |
Dependent item | travis.repos.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Repo [{#SLUG}]: Get builds | Getting builds of {#SLUG} using Travis API. |
HTTP agent | travis.repo.get_builds[{#SLUG}] |
Repo [{#SLUG}]: Get caches | Getting caches of {#SLUG} using Travis API. |
HTTP agent | travis.repo.get_caches[{#SLUG}] |
Repo [{#SLUG}]: Cache files | Count of cache files in {#SLUG} repo. |
Dependent item | travis.repo.caches.files[{#SLUG}] Preprocessing
|
Repo [{#SLUG}]: Cache size | Total size of cache files in {#SLUG} repo. |
Dependent item | travis.repo.caches.size[{#SLUG}] Preprocessing
|
Repo [{#SLUG}]: Builds passed | Count of all passed builds in {#SLUG} repo. |
Dependent item | travis.repo.builds.passed[{#SLUG}] Preprocessing
|
Repo [{#SLUG}]: Builds failed | Count of all failed builds in {#SLUG} repo. |
Dependent item | travis.repo.builds.failed[{#SLUG}] Preprocessing
|
Repo [{#SLUG}]: Builds total | Count of total builds in {#SLUG} repo. |
Dependent item | travis.repo.builds.total[{#SLUG}] Preprocessing
|
Repo [{#SLUG}]: Builds passed, % | Percent of passed builds in {#SLUG} repo. |
Calculated | travis.repo.builds.passed.pct[{#SLUG}] |
Repo [{#SLUG}]: Description | Description of Travis repo (git project description). |
Dependent item | travis.repo.description[{#SLUG}] Preprocessing
|
Repo [{#SLUG}]: Last build duration | Last build duration in {#SLUG} repo. |
Dependent item | travis.repo.last_build.duration[{#SLUG}] Preprocessing
|
Repo [{#SLUG}]: Last build state | Last build state in {#SLUG} repo. |
Dependent item | travis.repo.last_build.state[{#SLUG}] Preprocessing
|
Repo [{#SLUG}]: Last build number | Last build number in {#SLUG} repo. |
Dependent item | travis.repo.last_build.number[{#SLUG}] Preprocessing
|
Repo [{#SLUG}]: Last build id | Last build id in {#SLUG} repo. |
Dependent item | travis.repo.last_build.id[{#SLUG}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Travis CI: Repo [{#SLUG}]: Percent of successful builds | Low successful builds rate. |
last(/Travis CI by HTTP/travis.repo.builds.passed.pct[{#SLUG}])<{$TRAVIS.BUILDS.SUCCESS.PERCENT} |Warning |
Manual close: Yes | |
Travis CI: Repo [{#SLUG}]: Last build status is 'errored' | Last build status is errored. |
find(/Travis CI by HTTP/travis.repo.last_build.state[{#SLUG}],,"like","errored")=1 |Warning |
Manual close: Yes |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for the effortless deployment of Apache Tomcat monitoring by Zabbix via JMX and doesn't require any external scripts.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Metrics are collected by JMX.
Name | Description | Default |
---|---|---|
{$TOMCAT.USER} | User for JMX |
|
{$TOMCAT.PASSWORD} | Password for JMX |
|
{$TOMCAT.LLD.FILTER.REQUEST_PROCESSOR.MATCHES} | Filter for discoverable global request processors. |
.* |
{$TOMCAT.LLD.FILTER.REQUEST_PROCESSOR.NOT_MATCHES} | Filter to exclude global request processors. |
CHANGE_IF_NEEDED |
{$TOMCAT.LLD.FILTER.MANAGER.MATCHES} | Filter for discoverable managers. |
.* |
{$TOMCAT.LLD.FILTER.MANAGER.NOT_MATCHES} | Filter to exclude managers. |
CHANGE_IF_NEEDED |
{$TOMCAT.LLD.FILTER.THREAD_POOL.MATCHES} | Filter for discoverable thread pools. |
.* |
{$TOMCAT.LLD.FILTER.THREAD_POOL.NOT_MATCHES} | Filter to exclude thread pools. |
CHANGE_IF_NEEDED |
{$TOMCAT.THREADS.MAX.PCT} | Threshold for busy worker threads trigger. Can be used with {#JMXNAME} as context. |
75 |
{$TOMCAT.THREADS.MAX.TIME} | The time during which the number of busy threads can exceed the threshold. Can be used with {#JMXNAME} as context. |
5m |
Name | Description | Type | Key and additional info |
---|---|---|---|
Version | The version of Tomcat. |
JMX agent | jmx["Catalina:type=Server",serverInfo] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Apache Tomcat: Version has been changed | The Tomcat version has changed. Acknowledge to close the problem manually. |
last(/Apache Tomcat by JMX/jmx["Catalina:type=Server",serverInfo],#1)<>last(/Apache Tomcat by JMX/jmx["Catalina:type=Server",serverInfo],#2) and length(last(/Apache Tomcat by JMX/jmx["Catalina:type=Server",serverInfo]))>0 |Info |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Global request processors discovery | Discovery for GlobalRequestProcessor |
JMX agent | jmx.discovery[beans,"Catalina:type=GlobalRequestProcessor,name=*"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#JMXNAME}: Bytes received per second | Bytes received rate by processor {#JMXNAME} |
JMX agent | jmx[{#JMXOBJ},bytesReceived] Preprocessing
|
{#JMXNAME}: Bytes sent per second | Bytes sent rate by processor {#JMXNAME} |
JMX agent | jmx[{#JMXOBJ},bytesSent] Preprocessing
|
{#JMXNAME}: Errors per second | Error rate of request processor {#JMXNAME} |
JMX agent | jmx[{#JMXOBJ},errorCount] Preprocessing
|
{#JMXNAME}: Requests per second | Rate of requests served by request processor {#JMXNAME} |
JMX agent | jmx[{#JMXOBJ},requestCount] Preprocessing
|
{#JMXNAME}: Requests processing time | The total time to process all incoming requests of request processor {#JMXNAME} |
JMX agent | jmx[{#JMXOBJ},processingTime] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Protocol handlers discovery | Discovery for ProtocolHandler |
JMX agent | jmx.discovery[attributes,"Catalina:type=ProtocolHandler,port=*"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#JMXVALUE}: Gzip compression status | Gzip compression status on {#JMXNAME}. Enabling gzip compression may save server bandwidth. |
JMX agent | jmx[{#JMXOBJ},compression] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Apache Tomcat: {#JMXVALUE}: Gzip compression is disabled | gzip compression is disabled for connector {#JMXVALUE}. |
find(/Apache Tomcat by JMX/jmx[{#JMXOBJ},compression],,"like","off") = 1 |Info |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Thread pools discovery | Discovery for ThreadPool |
JMX agent | jmx.discovery[beans,"Catalina:type=ThreadPool,name=*"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#JMXNAME}: Threads count | Amount of threads the thread pool has right now, both busy and free. |
JMX agent | jmx[{#JMXOBJ},currentThreadCount] Preprocessing
|
{#JMXNAME}: Threads limit | Limit of the threads count. When the currentThreadsBusy counter reaches the maxThreads limit, no more requests can be handled, and the application chokes. |
JMX agent | jmx[{#JMXOBJ},maxThreads] Preprocessing
|
{#JMXNAME}: Threads busy | Number of the requests that are being currently handled. |
JMX agent | jmx[{#JMXOBJ},currentThreadsBusy] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Apache Tomcat: {#JMXNAME}: Busy worker threads count is high | When the busy threads counter reaches the limit, no more requests can be handled, and the application chokes. |
min(/Apache Tomcat by JMX/jmx[{#JMXOBJ},currentThreadsBusy],{$TOMCAT.THREADS.MAX.TIME:"{#JMXNAME}"})>last(/Apache Tomcat by JMX/jmx[{#JMXOBJ},maxThreads])*{$TOMCAT.THREADS.MAX.PCT:"{#JMXNAME}"}/100 |High |
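In other words, the trigger above fires when the busy thread count has stayed above {$TOMCAT.THREADS.MAX.PCT} percent of maxThreads for the whole {$TOMCAT.THREADS.MAX.TIME} window. A worked sketch with made-up counts:

```python
# Worked example of the busy worker threads trigger above; the thread counts
# and threshold are made up.
max_threads = 200                 # jmx[{#JMXOBJ},maxThreads]
busy_threads_window_min = 160     # min(currentThreadsBusy, {$TOMCAT.THREADS.MAX.TIME})
TOMCAT_THREADS_MAX_PCT = 75       # {$TOMCAT.THREADS.MAX.PCT}

if busy_threads_window_min > max_threads * TOMCAT_THREADS_MAX_PCT / 100:
    print("Busy worker threads count is high")   # 160 > 150
```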
Name | Description | Type | Key and additional info |
---|---|---|---|
Contexts discovery | Discovery for contexts |
JMX agent | jmx.discovery[beans,"Catalina:type=Manager,host=,context="] |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#JMXHOST}{#JMXCONTEXT}: Sessions active | Active sessions of the application. |
JMX agent | jmx[{#JMXOBJ},activeSessions] |
{#JMXHOST}{#JMXCONTEXT}: Sessions active maximum so far | Maximum number of active sessions so far. |
JMX agent | jmx[{#JMXOBJ},maxActive] |
{#JMXHOST}{#JMXCONTEXT}: Sessions created per second | Rate of sessions created by this application per second. |
JMX agent | jmx[{#JMXOBJ},sessionCounter] Preprocessing
|
{#JMXHOST}{#JMXCONTEXT}: Sessions rejected per second | Rate of sessions we rejected due to maxActive being reached. |
JMX agent | jmx[{#JMXOBJ},rejectedSessions] Preprocessing
|
{#JMXHOST}{#JMXCONTEXT}: Sessions allowed maximum | The maximum number of active Sessions allowed, or -1 for no limit. |
JMX agent | jmx[{#JMXOBJ},maxActiveSessions] |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for the effortless deployment of Systemd monitoring by Zabbix via Zabbix agent 2 and doesn't require any external scripts.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Name | Description | Default |
---|---|---|
{$SYSTEMD.NAME.SOCKET.MATCHES} | Filter of systemd socket units by name. |
.+ |
{$SYSTEMD.NAME.SOCKET.NOT_MATCHES} | Filter of systemd socket units by name. |
CHANGE_IF_NEEDED |
{$SYSTEMD.ACTIVESTATE.SOCKET.MATCHES} | Filter of systemd socket units by active state. |
.+ |
{$SYSTEMD.ACTIVESTATE.SOCKET.NOT_MATCHES} | Filter of systemd socket units by active state. |
^inactive$ |
{$SYSTEMD.UNITFILESTATE.SOCKET.MATCHES} | Filter of systemd socket units by unit file state. |
^enabled$ |
{$SYSTEMD.UNITFILESTATE.SOCKET.NOT_MATCHES} | Filter of systemd socket units by unit file state. |
CHANGE_IF_NEEDED |
{$SYSTEMD.NAME.SERVICE.MATCHES} | Filter of systemd service units by name. |
.+ |
{$SYSTEMD.NAME.SERVICE.NOT_MATCHES} | Filter of systemd service units by name. |
CHANGE_IF_NEEDED |
{$SYSTEMD.ACTIVESTATE.SERVICE.MATCHES} | Filter of systemd service units by active state. |
.+ |
{$SYSTEMD.ACTIVESTATE.SERVICE.NOT_MATCHES} | Filter of systemd service units by active state. |
^inactive$ |
{$SYSTEMD.UNITFILESTATE.SERVICE.MATCHES} | Filter of systemd service units by unit file state. |
^enabled$ |
{$SYSTEMD.UNITFILESTATE.SERVICE.NOT_MATCHES} | Filter of systemd service units by unit file state. |
CHANGE_IF_NEEDED |
Name | Description | Type | Key and additional info |
---|---|---|---|
Service units discovery | Discover systemd service units and their details. |
Zabbix agent | systemd.unit.discovery[service] |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#UNIT.NAME}: Get unit info | Returns all properties of a systemd service unit. Unit description: {#UNIT.DESCRIPTION}. |
Zabbix agent | systemd.unit.get["{#UNIT.NAME}"] |
{#UNIT.NAME}: Active state | State value that reflects whether the unit is currently active or not. The following states are currently defined: "active", "reloading", "inactive", "failed", "activating", and "deactivating". |
Dependent item | systemd.service.active_state["{#UNIT.NAME}"] Preprocessing
|
{#UNIT.NAME}: Load state | State value that reflects whether the configuration file of this unit has been loaded. The following states are currently defined: "loaded", "error", and "masked". |
Dependent item | systemd.service.load_state["{#UNIT.NAME}"] Preprocessing
|
{#UNIT.NAME}: Unit file state | Encodes the install state of the unit file of FragmentPath. It currently knows the following states: "enabled", "enabled-runtime", "linked", "linked-runtime", "masked", "masked-runtime", "static", "disabled", and "invalid". |
Dependent item | systemd.service.unitfile_state["{#UNIT.NAME}"] Preprocessing
|
{#UNIT.NAME}: Active time | Number of seconds since unit entered the active state. |
Dependent item | systemd.service.uptime["{#UNIT.NAME}"] Preprocessing
|
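The items above are filled in by the Zabbix agent 2 systemd plugin. For a quick manual cross-check of the Active state value on the monitored host, you can read the same property with systemctl; a rough sketch (the unit name is a placeholder):

```python
# Rough manual cross-check of the Active state item above, reading the same
# systemd property via systemctl; "sshd.service" is only a placeholder.
import subprocess

unit = "sshd.service"
out = subprocess.run(
    ["systemctl", "show", unit, "--property=ActiveState"],
    capture_output=True, text=True, check=True,
).stdout.strip()
print(out)   # e.g. "ActiveState=active"
```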
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Systemd: {#UNIT.NAME}: Service is not running | last(/Systemd by Zabbix agent 2/systemd.service.active_state["{#UNIT.NAME}"])<>1 |Warning |
Manual close: Yes | ||
Systemd: {#UNIT.NAME} has been restarted | Uptime is less than 10 minutes. |
last(/Systemd by Zabbix agent 2/systemd.service.uptime["{#UNIT.NAME}"])<10m |Info |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Socket units discovery | Discover systemd socket units and their details. |
Zabbix agent | systemd.unit.discovery[socket] |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#UNIT.NAME}: Get unit info | Returns all properties of a systemd socket unit. Unit description: {#UNIT.DESCRIPTION}. |
Zabbix agent | systemd.unit.get["{#UNIT.NAME}",Socket] |
{#UNIT.NAME}: Connections accepted per sec | The number of accepted socket connections (NAccepted) per second. |
Dependent item | systemd.socket.conn_accepted.rate["{#UNIT.NAME}"] Preprocessing
|
{#UNIT.NAME}: Connections connected | The current number of socket connections (NConnections). |
Dependent item | systemd.socket.conn_count["{#UNIT.NAME}"] Preprocessing
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for the effortless deployment of Squid monitoring by Zabbix via SNMP and doesn't require any external scripts.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Enable SNMP support following official documentation. Required parameters in squid.conf:
snmp_port <port_number>
acl <zbx_acl_name> snmp_community <community_name>
snmp_access allow <zbx_acl_name> <zabbix_server_ip>
1. Import the template template_app_squid_snmp.yaml into Zabbix.
2. Set values for {$SQUID.SNMP.COMMUNITY}, {$SQUID.SNMP.PORT} and {$SQUID.HTTP.PORT} as configured in squid.conf.
3. Link the imported template to a host with Squid.
4. Add SNMPv2 interface to Squid host. Set Port as {$SQUID.SNMP.PORT} and SNMP community as {$SQUID.SNMP.COMMUNITY}.
Name | Description | Default |
---|---|---|
{$SQUID.SNMP.PORT} | snmp_port configured in squid.conf (Default: 3401) |
3401 |
{$SQUID.HTTP.PORT} | http_port configured in squid.conf (Default: 3128) |
3128 |
{$SQUID.SNMP.COMMUNITY} | SNMP community allowed by ACL in squid.conf |
public |
{$SQUID.FILE.DESC.WARN.MIN} | The threshold for minimum number of available file descriptors |
100 |
{$SQUID.PAGE.FAULT.WARN} | The threshold for sys page faults rate in percent of received HTTP requests |
90 |
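{$SQUID.PAGE.FAULT.WARN} is expressed as a percentage of received HTTP requests, i.e. the page-fault rate is compared against the HTTP request rate. A worked sketch of that comparison with made-up per-second rates:

```python
# Worked example of the page-fault threshold above; the per-second rates are made up.
page_faults_per_sec = 48       # squid[cacheSysPageFaults], rate
http_requests_per_sec = 50     # squid[cacheProtoClientHttpRequests], rate
SQUID_PAGE_FAULT_WARN = 90     # {$SQUID.PAGE.FAULT.WARN}, % of HTTP requests

fault_pct_of_requests = page_faults_per_sec / http_requests_per_sec * 100
if fault_pct_of_requests > SQUID_PAGE_FAULT_WARN:
    print(f"Page faults are {fault_pct_of_requests:.0f}% of HTTP requests")  # 96% > 90%
```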
Name | Description | Type | Key and additional info |
---|---|---|---|
Service ping | Simple check | net.tcp.service[tcp,,{$SQUID.HTTP.PORT}] Preprocessing
|
|
Uptime | The Uptime of the cache in timeticks (in hundredths of a second) with preprocessing |
SNMP agent | squid[cacheUptime] Preprocessing
|
Version | Cache Software Version |
SNMP agent | squid[cacheVersionId] Preprocessing
|
CPU usage | The percentage use of the CPU |
SNMP agent | squid[cacheCpuUsage] |
Memory maximum resident size | Maximum Resident Size |
SNMP agent | squid[cacheMaxResSize] Preprocessing
|
Memory maximum cache size | The value of the cache_mem parameter |
SNMP agent | squid[cacheMemMaxSize] Preprocessing
|
Memory cache usage | Total accounted memory |
SNMP agent | squid[cacheMemUsage] Preprocessing
|
Cache swap low water mark | Cache Swap Low Water Mark |
SNMP agent | squid[cacheSwapLowWM] |
Cache swap high water mark | Cache Swap High Water Mark |
SNMP agent | squid[cacheSwapHighWM] |
Cache swap directory size | The total of the cache_dir space allocated |
SNMP agent | squid[cacheSwapMaxSize] Preprocessing
|
Cache swap current size | Storage Swap Size |
SNMP agent | squid[cacheCurrentSwapSize] |
File descriptor count - current used | Number of file descriptors in use |
SNMP agent | squid[cacheCurrentFileDescrCnt] |
File descriptor count - current maximum | Highest number of file descriptors in use |
SNMP agent | squid[cacheCurrentFileDescrMax] |
File descriptor count - current reserved | Reserved number of file descriptors |
SNMP agent | squid[cacheCurrentResFileDescrCnt] |
File descriptor count - current available | Available number of file descriptors |
SNMP agent | squid[cacheCurrentUnusedFDescrCnt] |
Byte hit ratio per 1 minute | Byte Hit Ratios |
SNMP agent | squid[cacheRequestByteRatio.1] |
Byte hit ratio per 5 minutes | Byte Hit Ratios |
SNMP agent | squid[cacheRequestByteRatio.5] |
Byte hit ratio per 1 hour | Byte Hit Ratios |
SNMP agent | squid[cacheRequestByteRatio.60] |
Request hit ratio per 1 minute | Request Hit Ratios |
SNMP agent | squid[cacheRequestHitRatio.1] |
Request hit ratio per 5 minutes | Request Hit Ratios |
SNMP agent | squid[cacheRequestHitRatio.5] |
Request hit ratio per 1 hour | Request Hit Ratios |
SNMP agent | squid[cacheRequestHitRatio.60] |
Sys page faults per second | Page faults with physical I/O |
SNMP agent | squid[cacheSysPageFaults] Preprocessing
|
HTTP requests received per second | Number of HTTP requests received |
SNMP agent | squid[cacheProtoClientHttpRequests] Preprocessing
|
HTTP traffic received per second | Amount of HTTP traffic received from clients |
SNMP agent | squid[cacheHttpInKb] Preprocessing
|
HTTP traffic sent per second | Amount of HTTP traffic sent to clients |
SNMP agent | squid[cacheHttpOutKb] Preprocessing
|
HTTP Hits sent from cache per second | Number of HTTP Hits sent to clients from cache |
SNMP agent | squid[cacheHttpHits] Preprocessing
|
HTTP Errors sent per second | Number of HTTP Errors sent to clients |
SNMP agent | squid[cacheHttpErrors] Preprocessing
|
ICP messages sent per second | Number of ICP messages sent |
SNMP agent | squid[cacheIcpPktsSent] Preprocessing
|
ICP messages received per second | Number of ICP messages received |
SNMP agent | squid[cacheIcpPktsRecv] Preprocessing
|
ICP traffic transmitted per second | Amount of ICP traffic transmitted |
SNMP agent | squid[cacheIcpKbSent] Preprocessing
|
ICP traffic received per second | Amount of ICP traffic received |
SNMP agent | squid[cacheIcpKbRecv] Preprocessing
|
DNS server requests per second | Number of external DNS server requests |
SNMP agent | squid[cacheDnsRequests] Preprocessing
|
DNS server replies per second | Number of external DNS server replies |
SNMP agent | squid[cacheDnsReplies] Preprocessing
|
FQDN cache requests per second | Number of FQDN Cache requests |
SNMP agent | squid[cacheFqdnRequests] Preprocessing
|
FQDN cache hits per second | Number of FQDN Cache hits |
SNMP agent | squid[cacheFqdnHits] Preprocessing
|
FQDN cache misses per second | Number of FQDN Cache misses |
SNMP agent | squid[cacheFqdnMisses] Preprocessing
|
IP cache requests per second | Number of IP Cache requests |
SNMP agent | squid[cacheIpRequests] Preprocessing
|
IP cache hits per second | Number of IP Cache hits |
SNMP agent | squid[cacheIpHits] Preprocessing
|
IP cache misses per second | Number of IP Cache misses |
SNMP agent | squid[cacheIpMisses] Preprocessing
|
Objects count | Number of objects stored by the cache |
SNMP agent | squid[cacheNumObjCount] |
Objects LRU expiration age | Storage LRU Expiration Age |
SNMP agent | squid[cacheCurrentLRUExpiration] Preprocessing
|
Objects unlinkd requests | Requests given to unlinkd |
SNMP agent | squid[cacheCurrentUnlinkRequests] |
HTTP all service time per 5 minutes | HTTP all service time per 5 minutes |
SNMP agent | squid[cacheHttpAllSvcTime.5] Preprocessing
|
HTTP all service time per hour | HTTP all service time per hour |
SNMP agent | squid[cacheHttpAllSvcTime.60] Preprocessing
|
HTTP miss service time per 5 minutes | HTTP miss service time per 5 minutes |
SNMP agent | squid[cacheHttpMissSvcTime.5] Preprocessing
|
HTTP miss service time per hour | HTTP miss service time per hour |
SNMP agent | squid[cacheHttpMissSvcTime.60] Preprocessing
|
HTTP hit service time per 5 minutes | HTTP hit service time per 5 minutes |
SNMP agent | squid[cacheHttpHitSvcTime.5] Preprocessing
|
HTTP hit service time per hour | HTTP hit service time per hour |
SNMP agent | squid[cacheHttpHitSvcTime.60] Preprocessing
|
ICP query service time per 5 minutes | ICP query service time per 5 minutes |
SNMP agent | squid[cacheIcpQuerySvcTime.5] Preprocessing
|
ICP query service time per hour | ICP query service time per hour |
SNMP agent | squid[cacheIcpQuerySvcTime.60] Preprocessing
|
ICP reply service time per 5 minutes | ICP reply service time per 5 minutes |
SNMP agent | squid[cacheIcpReplySvcTime.5] Preprocessing
|
ICP reply service time per hour | ICP reply service time per hour |
SNMP agent | squid[cacheIcpReplySvcTime.60] Preprocessing
|
DNS service time per 5 minutes | DNS service time per 5 minutes |
SNMP agent | squid[cacheDnsSvcTime.5] Preprocessing
|
DNS service time per hour | DNS service time per hour |
SNMP agent | squid[cacheDnsSvcTime.60] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Squid: Port {$SQUID.HTTP.PORT} is down | last(/Squid by SNMP/net.tcp.service[tcp,,{$SQUID.HTTP.PORT}])=0 |Average |
Manual close: Yes | ||
Squid: Squid has been restarted | Uptime is less than 10 minutes. |
last(/Squid by SNMP/squid[cacheUptime])<10m |Info |
Manual close: Yes | |
Squid: Squid version has been changed | Squid version has changed. Acknowledge to close the problem manually. |
last(/Squid by SNMP/squid[cacheVersionId],#1)<>last(/Squid by SNMP/squid[cacheVersionId],#2) and length(last(/Squid by SNMP/squid[cacheVersionId]))>0 |Info |
Manual close: Yes | |
Squid: Swap usage is more than low watermark | last(/Squid by SNMP/squid[cacheCurrentSwapSize])>last(/Squid by SNMP/squid[cacheSwapLowWM])*last(/Squid by SNMP/squid[cacheSwapMaxSize])/100 |Warning |
|||
Squid: Swap usage is more than high watermark | last(/Squid by SNMP/squid[cacheCurrentSwapSize])>last(/Squid by SNMP/squid[cacheSwapHighWM])*last(/Squid by SNMP/squid[cacheSwapMaxSize])/100 |High |
|||
Squid: Squid is running out of file descriptors | last(/Squid by SNMP/squid[cacheCurrentUnusedFDescrCnt])<{$SQUID.FILE.DESC.WARN.MIN} |Warning |
|||
Squid: High sys page faults rate | avg(/Squid by SNMP/squid[cacheSysPageFaults],5m)>avg(/Squid by SNMP/squid[cacheProtoClientHttpRequests],5m)/100*{$SQUID.PAGE.FAULT.WARN} |Warning |
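For orientation, a worked example with Squid's usual defaults (cache_swap_low = 90, cache_swap_high = 95; illustrative numbers, not template values): if squid[cacheSwapMaxSize] reports 100 GB of allocated cache_dir space, the low-watermark trigger above fires once the current swap size exceeds 90*100/100 = 90 GB, and the high-watermark trigger once it exceeds 95*100/100 = 95 GB. Likewise, with the default {$SQUID.PAGE.FAULT.WARN} of 90, the page-fault trigger fires when the 5-minute average page-fault rate exceeds 90% of the 5-minute average rate of received HTTP requests.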
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for the effortless deployment of Microsoft SharePoint monitoring by Zabbix via HTTP and doesn't require any external scripts.
SharePoint includes a Representational State Transfer (REST) service. Developers can perform read operations from their SharePoint Add-ins, solutions, and client applications using REST web technologies and standard Open Data Protocol (OData) syntax. For details, see https://docs.microsoft.com/ru-ru/sharepoint/dev/sp-add-ins/get-to-know-the-sharepoint-rest-service?tabs=csom
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Create a new host. Define macros according to your SharePoint web portal. It is recommended to fill in the values of the filter macros to avoid collecting redundant data.
Name | Description | Default |
---|---|---|
{$SHAREPOINT.USER} | ||
{$SHAREPOINT.PASSWORD} | ||
{$SHAREPOINT.URL} | Portal page URL. For example http://sharepoint.companyname.local/ |
|
{$SHAREPOINT.LLD.FILTER.NAME.MATCHES} | Filter of discoverable directories by name. |
.* |
{$SHAREPOINT.LLD.FILTER.FULL_PATH.MATCHES} | Filter of discoverable directories by full path. |
^/ |
{$SHAREPOINT.LLD.FILTER.TYPE.MATCHES} | Filter of discoverable types. |
FOLDER |
{$SHAREPOINT.LLD.FILTER.NAME.NOT_MATCHES} | Filter to exclude discovered directories by name. |
CHANGE_IF_NEEDED |
{$SHAREPOINT.LLD.FILTER.FULL_PATH.NOT_MATCHES} | Filter to exclude discovered directories by full path. |
CHANGE_IF_NEEDED |
{$SHAREPOINT.LLD.FILTER.TYPE.NOT_MATCHES} | Filter to exclude discovered types. |
CHANGE_IF_NEEDED |
{$SHAREPOINT.ROOT} | /Shared Documents |
|
{$SHAREPOINT.LLD_INTERVAL} | 3h |
|
{$SHAREPOINT.GET_INTERVAL} | 1m |
|
{$SHAREPOINT.MAX_HEALTH_SCORE} | Must be in the range from 0 to 10. Details: https://docs.microsoft.com/en-us/openspecs/sharepoint_protocols/ms-wsshp/c60ddeb6-4113-4a73-9e97-26b5c3907d33 |
5 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Get directory structure | Used to get directory structure information |
Script | sharepoint.get_dir Preprocessing
|
Get directory structure: Status | HTTP response (status) code. Indicates whether the HTTP request was successfully completed. Additional information is available in the server log file. |
Dependent item | sharepoint.get_dir.status Preprocessing
|
Get directory structure: Exec time | The time taken to execute the script for obtaining the data structure (in ms). Less is better. |
Dependent item | sharepoint.get_dir.time Preprocessing
|
Health score | This item specifies a value between 0 and 10, where 0 represents a low load and a high ability to process requests and 10 represents a high load and that the server is throttling requests to maintain adequate throughput. |
HTTP agent | sharepoint.health_score Preprocessing
|
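As a manual cross-check of the Health score item above, the same value can usually be read from the X-SharePointHealthScore header that SharePoint front ends add to their responses. A minimal sketch, assuming NTLM authentication, the example portal URL from the {$SHAREPOINT.URL} description, and a hypothetical DOMAIN\zbx_monitor account (any REST endpoint should do):
curl -s --ntlm -u 'DOMAIN\zbx_monitor:<PASSWORD>' -D - -o /dev/null 'http://sharepoint.companyname.local/_api/web' | grep -i 'X-SharePointHealthScore'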
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
MS SharePoint: Error getting directory structure. | Error getting directory structure. Check the Zabbix server log for more details. |
last(/Microsoft SharePoint by HTTP/sharepoint.get_dir.status)<>200 |Warning |
Manual close: Yes | |
MS SharePoint: Server responds slowly to API request | last(/Microsoft SharePoint by HTTP/sharepoint.get_dir.time)>2000 |Warning |
Manual close: Yes | ||
MS SharePoint: Bad health score | last(/Microsoft SharePoint by HTTP/sharepoint.health_score)>"{$SHAREPOINT.MAX_HEALTH_SCORE}" |Average |
Name | Description | Type | Key and additional info |
---|---|---|---|
Directory discovery | Script | sharepoint.directory.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Size ({#SHAREPOINT.LLD.FULL_PATH}) | Size of: {#SHAREPOINT.LLD.FULL_PATH} |
Dependent item | sharepoint.size["{#SHAREPOINT.LLD.FULL_PATH}"] Preprocessing
|
Modified ({#SHAREPOINT.LLD.FULL_PATH}) | Date of change: {#SHAREPOINT.LLD.FULL_PATH} |
Dependent item | sharepoint.modified["{#SHAREPOINT.LLD.FULL_PATH}"] Preprocessing
|
Created ({#SHAREPOINT.LLD.FULL_PATH}) | Date of creation: {#SHAREPOINT.LLD.FULL_PATH} |
Dependent item | sharepoint.created["{#SHAREPOINT.LLD.FULL_PATH}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
MS SharePoint: Sharepoint object is changed | The modification date of the folder/file has been updated. |
last(/Microsoft SharePoint by HTTP/sharepoint.modified["{#SHAREPOINT.LLD.FULL_PATH}"],#1)<>last(/Microsoft SharePoint by HTTP/sharepoint.modified["{#SHAREPOINT.LLD.FULL_PATH}"],#2) |Info |
Manual close: Yes |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is developed to monitor the RabbitMQ messaging broker cluster with Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The template collects metrics by remotely polling the RabbitMQ management plugin with the HTTP agent.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Enable the RabbitMQ management plugin. See the RabbitMQ documentation
for the instructions.
Create a user to monitor the service:
rabbitmqctl add_user zbx_monitor <PASSWORD>
rabbitmqctl set_permissions -p / zbx_monitor "" "" ".*"
rabbitmqctl set_user_tags zbx_monitor monitoring
Set the hostname or IP address of the RabbitMQ cluster host in the {$RABBITMQ.API.CLUSTER_HOST}
macro. You can also change the port in the {$RABBITMQ.API.PORT}
macro and the scheme in the {$RABBITMQ.API.SCHEME}
macro if necessary.
Set the user name and password in the macros {$RABBITMQ.API.USER}
and {$RABBITMQ.API.PASSWORD}
.
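In the set_permissions command above, the three arguments ("" "" ".*") are the configure, write, and read permission patterns, so the monitoring user only gets read access. Before linking the template, you can verify the credentials and the management plugin by querying the same endpoint the template polls; a minimal sketch with placeholder host and password:
curl -s -u zbx_monitor:<PASSWORD> "http://<CLUSTER HOST>:15672/api/overview"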
Name | Description | Default |
---|---|---|
{$RABBITMQ.API.USER} | zbx_monitor |
|
{$RABBITMQ.API.PASSWORD} | zabbix |
|
{$RABBITMQ.API.CLUSTER_HOST} | The hostname or IP of the API endpoint for the RabbitMQ cluster. |
<SET CLUSTER API HOST> |
{$RABBITMQ.API.PORT} | The port of the RabbitMQ API endpoint. |
15672 |
{$RABBITMQ.API.SCHEME} | The request scheme, which may be HTTP or HTTPS. |
http |
{$RABBITMQ.LLD.FILTER.EXCHANGE.MATCHES} | This macro is used in the discovery of exchanges. It can be overridden at host level or its linked template level. |
.* |
{$RABBITMQ.LLD.FILTER.EXCHANGE.NOT_MATCHES} | This macro is used in the discovery of exchanges. It can be overridden at host level or its linked template level. |
CHANGE_IF_NEEDED |
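For example, to skip RabbitMQ's built-in exchanges during discovery, {$RABBITMQ.LLD.FILTER.EXCHANGE.NOT_MATCHES} could be overridden at the host level with a regular expression such as ^amq\. (an illustrative value, not a template default).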
Name | Description | Type | Key and additional info |
---|---|---|---|
Get overview | The HTTP API endpoint that returns cluster-wide metrics. |
HTTP agent | rabbitmq.get_overview |
Get exchanges | The HTTP API endpoint that returns exchanges metrics. |
HTTP agent | rabbitmq.get_exchanges |
Connections total | The total number of connections. |
Dependent item | rabbitmq.overview.object_totals.connections Preprocessing
|
Channels total | The total number of channels. |
Dependent item | rabbitmq.overview.object_totals.channels Preprocessing
|
Queues total | The total number of queues. |
Dependent item | rabbitmq.overview.object_totals.queues Preprocessing
|
Consumers total | The total number of consumers. |
Dependent item | rabbitmq.overview.object_totals.consumers Preprocessing
|
Exchanges total | The total number of exchanges. |
Dependent item | rabbitmq.overview.object_totals.exchanges Preprocessing
|
Messages total | The total number of messages (ready, plus unacknowledged). |
Dependent item | rabbitmq.overview.queue_totals.messages Preprocessing
|
Messages ready for delivery | The number of messages ready for delivery. |
Dependent item | rabbitmq.overview.queue_totals.messages.ready Preprocessing
|
Messages unacknowledged | The number of unacknowledged messages. |
Dependent item | rabbitmq.overview.queue_totals.messages.unacknowledged Preprocessing
|
Messages acknowledged | The number of messages delivered to clients and acknowledged. |
Dependent item | rabbitmq.overview.messages.ack Preprocessing
|
Messages acknowledged per second | The rate of messages (per second) delivered to clients and acknowledged. |
Dependent item | rabbitmq.overview.messages.ack.rate Preprocessing
|
Messages confirmed | The count of confirmed messages. |
Dependent item | rabbitmq.overview.messages.confirm Preprocessing
|
Messages confirmed per second | The rate of messages confirmed per second. |
Dependent item | rabbitmq.overview.messages.confirm.rate Preprocessing
|
Messages delivered | The sum of messages delivered to consumers: in acknowledgement mode and in no-acknowledgement mode; delivered to consumers in response to the |
Dependent item | rabbitmq.overview.messages.deliver_get Preprocessing
|
Messages delivered per second | The rate of the sum of messages (per second) delivered to consumers: in acknowledgement mode and in no-acknowledgement mode; delivered to consumers in response to the |
Dependent item | rabbitmq.overview.messages.deliver_get.rate Preprocessing
|
Messages published | The count of published messages. |
Dependent item | rabbitmq.overview.messages.publish Preprocessing
|
Messages published per second | The rate of messages published per second. |
Dependent item | rabbitmq.overview.messages.publish.rate Preprocessing
|
Messages publish_in | The count of messages published from the channels into this overview. |
Dependent item | rabbitmq.overview.messages.publish_in Preprocessing
|
Messages publish_in per second | The rate of messages (per second) published from the channels into this overview. |
Dependent item | rabbitmq.overview.messages.publish_in.rate Preprocessing
|
Messages publish_out | The count of messages published from this overview into queues. |
Dependent item | rabbitmq.overview.messages.publish_out Preprocessing
|
Messages publish_out per second | The rate of messages (per second) published from this overview into queues. |
Dependent item | rabbitmq.overview.messages.publish_out.rate Preprocessing
|
Messages returned unroutable | The count of messages returned to a publisher as unroutable. |
Dependent item | rabbitmq.overview.messages.return_unroutable Preprocessing
|
Messages returned unroutable per second | The rate of messages (per second) returned to a publisher as unroutable. |
Dependent item | rabbitmq.overview.messages.return_unroutable.rate Preprocessing
|
Messages returned redeliver | The count of subset of messages in the |
Dependent item | rabbitmq.overview.messages.redeliver Preprocessing
|
Messages returned redeliver per second | The rate of subset of messages (per second) in the |
Dependent item | rabbitmq.overview.messages.redeliver.rate Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
RabbitMQ cluster: Failed to fetch overview data | Zabbix has not received any data for items for the last 30 minutes. |
nodata(/RabbitMQ cluster by HTTP/rabbitmq.get_overview,30m)=1 |Warning |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Health Check 3.8.10+ discovery | Specific metrics for versions 3.8.10 and newer. |
Dependent item | rabbitmq.healthcheck.v3810.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Healthcheck: alarms in effect in the cluster{#SINGLETON} | Responds with a 200 OK if there are no alarms in effect in the cluster; otherwise, responds with a 503 Service Unavailable. |
HTTP agent | rabbitmq.healthcheck.alarms[{#SINGLETON}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
RabbitMQ cluster: There are active alarms in the cluster | This is the default API endpoint path: http://{$RABBITMQ.API.CLUSTER_HOST}:{$RABBITMQ.API.PORT}/api/index.html. |
last(/RabbitMQ cluster by HTTP/rabbitmq.healthcheck.alarms[{#SINGLETON}])=0 |Average |
Name | Description | Type | Key and additional info |
---|---|---|---|
Exchanges discovery | The metrics for an individual exchange. |
Dependent item | rabbitmq.exchanges.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Get data | The HTTP API endpoint that returns [{#VHOST}][{#EXCHANGE}][{#TYPE}] exchanges metrics |
Dependent item | rabbitmq.get_exchanges["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages acknowledged | The number of messages delivered to clients and acknowledged. |
Dependent item | rabbitmq.exchange.messages.ack["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages acknowledged per second | The rate of messages (per second) delivered to clients and acknowledged. |
Dependent item | rabbitmq.exchange.messages.ack.rate["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages confirmed | The count of confirmed messages. |
Dependent item | rabbitmq.exchange.messages.confirm["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages confirmed per second | The rate of messages confirmed per second. |
Dependent item | rabbitmq.exchange.messages.confirm.rate["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages delivered | The sum of messages delivered to consumers: in acknowledgement mode and in no-acknowledgement mode; delivered to consumers in response to the |
Dependent item | rabbitmq.exchange.messages.deliver_get["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages delivered per second | The rate of the sum of messages (per second) delivered to consumers: in acknowledgement mode and in no-acknowledgement mode; delivered to consumers in response to the |
Dependent item | rabbitmq.exchange.messages.deliver_get.rate["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages published | The count of published messages. |
Dependent item | rabbitmq.exchange.messages.publish["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages published per second | The rate of messages published per second. |
Dependent item | rabbitmq.exchange.messages.publish.rate["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages publish_in | The count of messages published from the channels into this overview. |
Dependent item | rabbitmq.exchange.messages.publish_in["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages publish_in per second | The rate of messages (per second) published from the channels into this overview. |
Dependent item | rabbitmq.exchange.messages.publish_in.rate["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages publish_out | The count of messages published from this overview into queues. |
Dependent item | rabbitmq.exchange.messages.publish_out["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages publish_out per second | The rate of messages (per second) published from this overview into queues. |
Dependent item | rabbitmq.exchange.messages.publish_out.rate["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages returned unroutable | The count of messages returned to a publisher as unroutable. |
Dependent item | rabbitmq.exchange.messages.return_unroutable["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages returned unroutable per second | The rate of messages (per second) returned to a publisher as unroutable. |
Dependent item | rabbitmq.exchange.messages.return_unroutable.rate["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages redelivered | The count of subset of messages in the |
Dependent item | rabbitmq.exchange.messages.redeliver["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages redelivered per second | The rate of subset of messages (per second) in the |
Dependent item | rabbitmq.exchange.messages.redeliver.rate["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
This template is developed to monitor a RabbitMQ messaging broker node with Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The template collects metrics by remotely polling the RabbitMQ management plugin with the HTTP agent.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Enable the RabbitMQ management plugin. See the RabbitMQ documentation
for the instructions.
Create a user to monitor the service:
rabbitmqctl add_user zbx_monitor <PASSWORD>
rabbitmqctl set_permissions -p / zbx_monitor "" "" ".*"
rabbitmqctl set_user_tags zbx_monitor monitoring
Set the hostname or IP address of the RabbitMQ node host in the {$RABBITMQ.API.HOST}
macro. You can also change the port in the {$RABBITMQ.API.PORT}
macro and the scheme in the {$RABBITMQ.API.SCHEME}
macro if necessary.
Set the user name and password in the macros {$RABBITMQ.API.USER}
and {$RABBITMQ.API.PASSWORD}
.
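To confirm that the monitoring user can reach the node-level health-check endpoints used by this template, you can query one of them directly; a minimal sketch with placeholder host and password:
curl -s -u zbx_monitor:<PASSWORD> "http://<NODE HOST>:15672/api/health/checks/local-alarms"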
Name | Description | Default |
---|---|---|
{$RABBITMQ.API.USER} | zbx_monitor |
|
{$RABBITMQ.API.PASSWORD} | zabbix |
|
{$RABBITMQ.CLUSTER.NAME} | The name of the RabbitMQ cluster. |
rabbit |
{$RABBITMQ.API.PORT} | The port of the RabbitMQ API endpoint. |
15672 |
{$RABBITMQ.API.SCHEME} | The request scheme, which may be HTTP or HTTPS. |
http |
{$RABBITMQ.API.HOST} | The hostname or IP of the API endpoint for the RabbitMQ. |
<SET NODE API HOST> |
{$RABBITMQ.LLD.FILTER.QUEUE.MATCHES} | This macro is used in the discovery of queues. It can be overridden at host level or its linked template level. |
.* |
{$RABBITMQ.LLD.FILTER.QUEUE.NOT_MATCHES} | This macro is used in the discovery of queues. It can be overridden at host level or its linked template level. |
CHANGE_IF_NEEDED |
{$RABBITMQ.RESPONSE_TIME.MAX.WARN} | The maximum response time by the RabbitMQ expressed in seconds for a trigger expression. |
10 |
{$RABBITMQ.MESSAGES.MAX.WARN} | The maximum number of messages in the queue for a trigger expression. |
1000 |
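The queue threshold supports user macro context, so it can be tuned per queue: for example, defining {$RABBITMQ.MESSAGES.MAX.WARN:"orders"} = 50000 on the host raises the limit only for a hypothetical queue named orders, while all other discovered queues keep the default of 1000.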
Name | Description | Type | Key and additional info |
---|---|---|---|
Service ping | Simple check | net.tcp.service["{$RABBITMQ.API.SCHEME}","{$RABBITMQ.API.HOST}","{$RABBITMQ.API.PORT}"] Preprocessing
|
|
Get node overview | The HTTP API endpoint that returns cluster-wide metrics. |
HTTP agent | rabbitmq.get_node_overview Preprocessing
|
Get nodes | The HTTP API endpoint that returns metrics of the nodes. |
HTTP agent | rabbitmq.get_nodes |
Get queues | The HTTP API endpoint that returns metrics of the queues. |
HTTP agent | rabbitmq.get_queues |
Management plugin version | The version of the management plugin in use. |
Dependent item | rabbitmq.node.overview.management_version Preprocessing
|
RabbitMQ version | The version of the RabbitMQ on the node, which processed this request. |
Dependent item | rabbitmq.node.overview.rabbitmq_version Preprocessing
|
Used file descriptors | The number of file descriptors currently in use. |
Dependent item | rabbitmq.node.fd_used Preprocessing
|
Free disk space | The current free disk space. |
Dependent item | rabbitmq.node.disk_free Preprocessing
|
Disk free limit | The free space limit of a disk expressed in bytes. |
Dependent item | rabbitmq.node.disk_free_limit Preprocessing
|
Memory used | The memory usage expressed in bytes. |
Dependent item | rabbitmq.node.mem_used Preprocessing
|
Memory limit | The memory limit (high watermark), expressed in bytes. |
Dependent item | rabbitmq.node.mem_limit Preprocessing
|
Runtime run queue | The average number of Erlang processes waiting to run. |
Dependent item | rabbitmq.node.run_queue Preprocessing
|
Sockets used | The number of file descriptors used as sockets. |
Dependent item | rabbitmq.node.sockets_used Preprocessing
|
Sockets available | The file descriptors available for use as sockets. |
Dependent item | rabbitmq.node.sockets_total Preprocessing
|
Number of network partitions | The number of network partitions, which this node "sees". |
Dependent item | rabbitmq.node.partitions Preprocessing
|
Is running | Whether the node is running or not. |
Dependent item | rabbitmq.node.running Preprocessing
|
Memory alarm | It checks whether the host has a memory alarm or not. |
Dependent item | rabbitmq.node.mem_alarm Preprocessing
|
Disk free alarm | It checks whether the node has a disk alarm or not. |
Dependent item | rabbitmq.node.disk_free_alarm Preprocessing
|
Uptime | Uptime expressed in milliseconds. |
Dependent item | rabbitmq.node.uptime Preprocessing
|
Service response time | Simple check | net.tcp.service.perf["{$RABBITMQ.API.SCHEME}","{$RABBITMQ.API.HOST}","{$RABBITMQ.API.PORT}"] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
RabbitMQ node: Service is down | last(/RabbitMQ node by HTTP/net.tcp.service["{$RABBITMQ.API.SCHEME}","{$RABBITMQ.API.HOST}","{$RABBITMQ.API.PORT}"])=0 |Average |
Manual close: Yes | ||
RabbitMQ node: Failed to fetch nodes data | Zabbix has not received any data for items for the last 30 minutes. |
nodata(/RabbitMQ node by HTTP/rabbitmq.get_nodes,30m)=1 |Warning |
Manual close: Yes Depends on:
|
|
RabbitMQ node: Version has changed | RabbitMQ version has changed. Acknowledge to close the problem manually. |
last(/RabbitMQ node by HTTP/rabbitmq.node.overview.rabbitmq_version,#1)<>last(/RabbitMQ node by HTTP/rabbitmq.node.overview.rabbitmq_version,#2) and length(last(/RabbitMQ node by HTTP/rabbitmq.node.overview.rabbitmq_version))>0 |Info |
Manual close: Yes | |
RabbitMQ node: Number of network partitions is too high | For more details see Detecting Network Partitions. |
min(/RabbitMQ node by HTTP/rabbitmq.node.partitions,5m)>0 |Warning |
||
RabbitMQ node: Node is not running | RabbitMQ node is not running. |
max(/RabbitMQ node by HTTP/rabbitmq.node.running,5m)=0 |Average |
Depends on:
|
|
RabbitMQ node: Memory alarm | For more details see Memory Alarms. |
last(/RabbitMQ node by HTTP/rabbitmq.node.mem_alarm)=1 |Average |
||
RabbitMQ node: Free disk space alarm | For more details see Free Disk Space Alarms. |
last(/RabbitMQ node by HTTP/rabbitmq.node.disk_free_alarm)=1 |Average |
||
RabbitMQ node: Host has been restarted | Uptime is less than 10 minutes. |
last(/RabbitMQ node by HTTP/rabbitmq.node.uptime)<10m |Info |
Manual close: Yes | |
RabbitMQ node: Service response time is too high | min(/RabbitMQ node by HTTP/net.tcp.service.perf["{$RABBITMQ.API.SCHEME}","{$RABBITMQ.API.HOST}","{$RABBITMQ.API.PORT}"],5m)>{$RABBITMQ.RESPONSE_TIME.MAX.WARN} |Warning |
Manual close: Yes Depends on:
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Health Check 3.8.10+ discovery | Specific metrics for versions 3.8.10 and newer. |
Dependent item | rabbitmq.healthcheck.v3810.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Healthcheck: local alarms in effect on this node{#SINGLETON} | It responds with a status code 200 OK if there are no local alarms in effect on the target node. Otherwise, it responds with a status code 503 Service Unavailable. |
HTTP agent | rabbitmq.healthcheck.local_alarms[{#SINGLETON}] Preprocessing
|
Healthcheck: expiration date on the certificates{#SINGLETON} | It checks the expiration date on the certificates for every listener configured to use the Transport Layer Security (TLS). It responds with a status code 200 OK if none of the certificates are due to expire within the checked period. Otherwise, it responds with a status code 503 Service Unavailable. |
HTTP agent | rabbitmq.healthcheck.certificate_expiration[{#SINGLETON}] Preprocessing
|
Healthcheck: virtual hosts on this node{#SINGLETON} | It checks whether all virtual hosts are running on the target node. It responds with a status code 200 OK if they are. Otherwise, it responds with a status code 503 Service Unavailable. |
HTTP agent | rabbitmq.healthcheck.virtual_hosts[{#SINGLETON}] Preprocessing
|
Healthcheck: classic mirrored queues without synchronized mirrors online{#SINGLETON} | It checks if there are classic mirrored queues without synchronized mirrors online (queues that would potentially lose data if the target node is shut down). It responds with a status code 200 OK if there are no such queues. Otherwise, it responds with a status code 503 Service Unavailable. |
HTTP agent | rabbitmq.healthcheck.mirror_sync[{#SINGLETON}] Preprocessing
|
Healthcheck: queues with minimum online quorum{#SINGLETON} | It checks if there are quorum queues with minimum online quorum (queues that would lose their quorum and availability if the target node is shut down). It responds with a status code 200 OK if there are no such queues. Otherwise, it responds with a status code 503 Service Unavailable. |
HTTP agent | rabbitmq.healthcheck.quorum[{#SINGLETON}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
RabbitMQ node: There are active alarms in the node | This is the default API endpoint path: http://{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/index.html. |
last(/RabbitMQ node by HTTP/rabbitmq.healthcheck.local_alarms[{#SINGLETON}])=0 |Average |
||
RabbitMQ node: There are valid TLS certificates expiring in the next month | This is the default API endpoint path: http://{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/index.html. |
last(/RabbitMQ node by HTTP/rabbitmq.healthcheck.certificate_expiration[{#SINGLETON}])=0 |Average |
||
RabbitMQ node: There are not running virtual hosts | This is the default API endpoint path: http://{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/index.html. |
last(/RabbitMQ node by HTTP/rabbitmq.healthcheck.virtual_hosts[{#SINGLETON}])=0 |Average |
||
RabbitMQ node: There are queues that could potentially lose data if this node goes offline. | This is the default API endpoint path: http://{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/index.html. |
last(/RabbitMQ node by HTTP/rabbitmq.healthcheck.mirror_sync[{#SINGLETON}])=0 |Average |
||
RabbitMQ node: There are queues that would lose their quorum and availability if this node is shut down. | This is the default API endpoint path: http://{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/index.html. |
last(/RabbitMQ node by HTTP/rabbitmq.healthcheck.quorum[{#SINGLETON}])=0 |Average |
Name | Description | Type | Key and additional info |
---|---|---|---|
Health Check 3.8.9- discovery | Specific metrics for versions up to and including 3.8.9. |
Dependent item | rabbitmq.healthcheck.v389.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Healthcheck{#SINGLETON} | It checks whether the RabbitMQ application is running, whether channels and queues can be listed successfully, and that no alarms are in effect. |
HTTP agent | rabbitmq.healthcheck[{#SINGLETON}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
RabbitMQ node: Node healthcheck failed | For more details see Health Checks. |
last(/RabbitMQ node by HTTP/rabbitmq.healthcheck[{#SINGLETON}])=0 |Average |
Name | Description | Type | Key and additional info |
---|---|---|---|
Queues discovery | The metrics for an individual queue. |
Dependent item | rabbitmq.queues.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Queue [{#VHOST}][{#QUEUE}]: Get data | The HTTP API endpoint that returns [{#VHOST}][{#QUEUE}] queue metrics |
Dependent item | rabbitmq.get_exchanges["{#VHOST}/{#QUEUE}"] Preprocessing
|
Queue [{#VHOST}][{#QUEUE}]: Messages total | The count of total messages in the queue. |
Dependent item | rabbitmq.queue.messages["{#VHOST}/{#QUEUE}"] Preprocessing
|
Queue [{#VHOST}][{#QUEUE}]: Messages per second | The count of total messages per second in the queue. |
Dependent item | rabbitmq.queue.messages.rate["{#VHOST}/{#QUEUE}"] Preprocessing
|
Queue [{#VHOST}][{#QUEUE}]: Consumers | The number of consumers. |
Dependent item | rabbitmq.queue.consumers["{#VHOST}/{#QUEUE}"] Preprocessing
|
Queue [{#VHOST}][{#QUEUE}]: Memory | The bytes of memory consumed by the Erlang process associated with the queue, including stack, heap and internal structures. |
Dependent item | rabbitmq.queue.memory["{#VHOST}/{#QUEUE}"] Preprocessing
|
Queue [{#VHOST}][{#QUEUE}]: Messages ready | The number of messages ready to be delivered to clients. |
Dependent item | rabbitmq.queue.messages_ready["{#VHOST}/{#QUEUE}"] Preprocessing
|
Queue [{#VHOST}][{#QUEUE}]: Messages ready per second | The number of messages per second ready to be delivered to clients. |
Dependent item | rabbitmq.queue.messages_ready.rate["{#VHOST}/{#QUEUE}"] Preprocessing
|
Queue [{#VHOST}][{#QUEUE}]: Messages unacknowledged | The number of messages delivered to clients but not yet acknowledged. |
Dependent item | rabbitmq.queue.messages_unacknowledged["{#VHOST}/{#QUEUE}"] Preprocessing
|
Queue [{#VHOST}][{#QUEUE}]: Messages unacknowledged per second | The number of messages per second delivered to clients but not yet acknowledged. |
Dependent item | rabbitmq.queue.messages_unacknowledged.rate["{#VHOST}/{#QUEUE}"] Preprocessing
|
Queue [{#VHOST}][{#QUEUE}]: Messages acknowledged | The number of messages delivered to clients and acknowledged. |
Dependent item | rabbitmq.queue.messages.ack["{#VHOST}/{#QUEUE}"] Preprocessing
|
Queue [{#VHOST}][{#QUEUE}]: Messages acknowledged per second | The number of messages (per second) delivered to clients and acknowledged. |
Dependent item | rabbitmq.queue.messages.ack.rate["{#VHOST}/{#QUEUE}"] Preprocessing
|
Queue [{#VHOST}][{#QUEUE}]: Messages delivered | The count of messages delivered to consumers in acknowledgement mode. |
Dependent item | rabbitmq.queue.messages.deliver["{#VHOST}/{#QUEUE}"] Preprocessing
|
Queue [{#VHOST}][{#QUEUE}]: Messages delivered per second | The count of messages (per second) delivered to consumers in acknowledgement mode. |
Dependent item | rabbitmq.queue.messages.deliver.rate["{#VHOST}/{#QUEUE}"] Preprocessing
|
Queue [{#VHOST}][{#QUEUE}]: Sum of messages delivered | The sum of messages delivered to consumers: in acknowledgement mode and in no-acknowledgement mode; delivered to consumers in response to the |
Dependent item | rabbitmq.queue.messages.deliver_get["{#VHOST}/{#QUEUE}"] Preprocessing
|
Queue [{#VHOST}][{#QUEUE}]: Sum of messages delivered per second | The rate of delivery per second. The sum of messages delivered (per second) to consumers: in acknowledgement mode and in no-acknowledgement mode; delivered to consumers in response to |
Dependent item | rabbitmq.queue.messages.deliver_get.rate["{#VHOST}/{#QUEUE}"] Preprocessing
|
Queue [{#VHOST}][{#QUEUE}]: Messages published | The count of published messages. |
Dependent item | rabbitmq.queue.messages.publish["{#VHOST}/{#QUEUE}"] Preprocessing
|
Queue [{#VHOST}][{#QUEUE}]: Messages published per second | The rate of published messages per second. |
Dependent item | rabbitmq.queue.messages.publish.rate["{#VHOST}/{#QUEUE}"] Preprocessing
|
Queue [{#VHOST}][{#QUEUE}]: Messages redelivered | The count of subset of messages in the |
Dependent item | rabbitmq.queue.messages.redeliver["{#VHOST}/{#QUEUE}"] Preprocessing
|
Queue [{#VHOST}][{#QUEUE}]: Messages redelivered per second | The rate of messages redelivered per second. |
Dependent item | rabbitmq.queue.messages.redeliver.rate["{#VHOST}/{#QUEUE}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
RabbitMQ node: Too many messages in queue [{#VHOST}][{#QUEUE}] | min(/RabbitMQ node by HTTP/rabbitmq.queue.messages["{#VHOST}/{#QUEUE}"],5m)>{$RABBITMQ.MESSAGES.MAX.WARN:"{#QUEUE}"} |Warning |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is developed to monitor the RabbitMQ messaging broker with Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The template RabbitMQ Cluster
— collects metrics by polling the RabbitMQ management plugin with the Zabbix agent.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Enable the RabbitMQ management plugin. See RabbitMQ documentation for the instructions.
Create a user to monitor the service:
rabbitmqctl add_user zbx_monitor <PASSWORD>
rabbitmqctl set_permissions -p / zbx_monitor "" "" ".*"
rabbitmqctl set_user_tags zbx_monitor monitoring
A login name and password are also supported in macros.
If your cluster consists of several nodes, it is recommended to assign the cluster
template to a separate balancing host.
In the case of a single-node installation, you can assign the cluster
template to one host with a node
template.
If you use another API endpoint, don't forget to change the {$RABBITMQ.API.CLUSTER_HOST} macro.
Install and set up the Zabbix agent.
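Once the agent is in place, the exact key used by the template can be tested with zabbix_get; a minimal sketch assuming the default macro values and an agent running locally on the API host:
zabbix_get -s 127.0.0.1 -k 'web.page.get["http://zbx_monitor:zabbix@127.0.0.1:15672/api/overview"]'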
Name | Description | Default |
---|---|---|
{$RABBITMQ.API.USER} | zbx_monitor |
|
{$RABBITMQ.API.PASSWORD} | zabbix |
|
{$RABBITMQ.API.CLUSTER_HOST} | The hostname or IP of the API endpoint for the RabbitMQ cluster. |
127.0.0.1 |
{$RABBITMQ.API.PORT} | The port of the RabbitMQ API endpoint. |
15672 |
{$RABBITMQ.API.SCHEME} | The request scheme, which may be HTTP or HTTPS. |
http |
{$RABBITMQ.LLD.FILTER.EXCHANGE.MATCHES} | This macro is used in the discovery of exchanges. It can be overridden at host level or its linked template level. |
.* |
{$RABBITMQ.LLD.FILTER.EXCHANGE.NOT_MATCHES} | This macro is used in the discovery of exchanges. It can be overridden at host level or its linked template level. |
CHANGE_IF_NEEDED |
Name | Description | Type | Key and additional info |
---|---|---|---|
Get overview | The HTTP API endpoint that returns cluster-wide metrics. |
Zabbix agent | web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.CLUSTER_HOST}:{$RABBITMQ.API.PORT}/api/overview"] Preprocessing
|
Get exchanges | The HTTP API endpoint that returns exchanges metrics. |
Zabbix agent | web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.CLUSTER_HOST}:{$RABBITMQ.API.PORT}/api/exchanges"] Preprocessing
|
Connections total | The total number of connections. |
Dependent item | rabbitmq.overview.object_totals.connections Preprocessing
|
Channels total | The total number of channels. |
Dependent item | rabbitmq.overview.object_totals.channels Preprocessing
|
Queues total | The total number of queues. |
Dependent item | rabbitmq.overview.object_totals.queues Preprocessing
|
Consumers total | The total number of consumers. |
Dependent item | rabbitmq.overview.object_totals.consumers Preprocessing
|
Exchanges total | The total number of exchanges. |
Dependent item | rabbitmq.overview.object_totals.exchanges Preprocessing
|
Messages total | The total number of messages (ready, plus unacknowledged). |
Dependent item | rabbitmq.overview.queue_totals.messages Preprocessing
|
Messages ready for delivery | The number of messages ready for delivery. |
Dependent item | rabbitmq.overview.queue_totals.messages.ready Preprocessing
|
Messages unacknowledged | The number of unacknowledged messages. |
Dependent item | rabbitmq.overview.queue_totals.messages.unacknowledged Preprocessing
|
Messages acknowledged | The number of messages delivered to clients and acknowledged. |
Dependent item | rabbitmq.overview.messages.ack Preprocessing
|
Messages acknowledged per second | The rate of messages (per second) delivered to clients and acknowledged. |
Dependent item | rabbitmq.overview.messages.ack.rate Preprocessing
|
Messages confirmed | The count of confirmed messages. |
Dependent item | rabbitmq.overview.messages.confirm Preprocessing
|
Messages confirmed per second | The rate of messages confirmed per second. |
Dependent item | rabbitmq.overview.messages.confirm.rate Preprocessing
|
Messages delivered | The sum of messages delivered to consumers: in acknowledgement mode and in no-acknowledgement mode; delivered to consumers in response to the |
Dependent item | rabbitmq.overview.messages.deliver_get Preprocessing
|
Messages delivered per second | The rate of the sum of messages (per second) delivered to consumers: in acknowledgement mode and in no-acknowledgement mode; delivered to consumers in response to the |
Dependent item | rabbitmq.overview.messages.deliver_get.rate Preprocessing
|
Messages published | The count of published messages. |
Dependent item | rabbitmq.overview.messages.publish Preprocessing
|
Messages published per second | The rate of messages published per second. |
Dependent item | rabbitmq.overview.messages.publish.rate Preprocessing
|
Messages publish_in | The count of messages published from the channels into this overview. |
Dependent item | rabbitmq.overview.messages.publish_in Preprocessing
|
Messages publish_in per second | The rate of messages (per second) published from the channels into this overview. |
Dependent item | rabbitmq.overview.messages.publish_in.rate Preprocessing
|
Messages publish_out | The count of messages published from this overview into queues. |
Dependent item | rabbitmq.overview.messages.publish_out Preprocessing
|
Messages publish_out per second | The rate of messages (per second) published from this overview into queues. |
Dependent item | rabbitmq.overview.messages.publish_out.rate Preprocessing
|
Messages returned unroutable | The count of messages returned to a publisher as unroutable. |
Dependent item | rabbitmq.overview.messages.return_unroutable Preprocessing
|
Messages returned unroutable per second | The rate of messages (per second) returned to a publisher as unroutable. |
Dependent item | rabbitmq.overview.messages.return_unroutable.rate Preprocessing
|
Messages returned redeliver | The count of subset of messages in the |
Dependent item | rabbitmq.overview.messages.redeliver Preprocessing
|
Messages returned redeliver per second | The rate of subset of messages (per second) in the |
Dependent item | rabbitmq.overview.messages.redeliver.rate Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
RabbitMQ cluster: Failed to fetch overview data | Zabbix has not received any data for items for the last 30 minutes. |
nodata(/RabbitMQ cluster by Zabbix agent/web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.CLUSTER_HOST}:{$RABBITMQ.API.PORT}/api/overview"],30m)=1 |Warning |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Health Check 3.8.10+ discovery | Specific metrics for versions 3.8.10 and newer. |
Dependent item | rabbitmq.healthcheck.v3810.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Healthcheck: alarms in effect in the cluster{#SINGLETON} | It responds with a status code 200 OK if there are no alarms in effect in the cluster. Otherwise, it responds with a status code 503 Service Unavailable. |
Zabbix agent | web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.CLUSTER_HOST}:{$RABBITMQ.API.PORT}/api/health/checks/alarms{#SINGLETON}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
RabbitMQ cluster: There are active alarms in the cluster | This is the default API endpoint path: http://{$RABBITMQ.API.CLUSTER_HOST}:{$RABBITMQ.API.PORT}/api/index.html. |
last(/RabbitMQ cluster by Zabbix agent/web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.CLUSTER_HOST}:{$RABBITMQ.API.PORT}/api/health/checks/alarms{#SINGLETON}"])=0 |Average |
Name | Description | Type | Key and additional info |
---|---|---|---|
Exchanges discovery | The metrics for an individual exchange. |
Dependent item | rabbitmq.exchanges.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Get data | The HTTP API endpoint that returns [{#VHOST}][{#EXCHANGE}][{#TYPE}] exchanges metrics |
Dependent item | rabbitmq.get_exchanges["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages acknowledged | The number of messages delivered to clients and acknowledged. |
Dependent item | rabbitmq.exchange.messages.ack["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages acknowledged per second | The rate of messages (per second) delivered to clients and acknowledged. |
Dependent item | rabbitmq.exchange.messages.ack.rate["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages confirmed | The count of confirmed messages. |
Dependent item | rabbitmq.exchange.messages.confirm["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages confirmed per second | The rate of messages confirmed per second. |
Dependent item | rabbitmq.exchange.messages.confirm.rate["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages delivered | The sum of messages delivered to consumers: in acknowledgement mode and in no-acknowledgement mode; delivered to consumers in response to the |
Dependent item | rabbitmq.exchange.messages.deliver_get["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages delivered per second | The rate of the sum of messages (per second) delivered to consumers: in acknowledgement mode and in no-acknowledgement mode; delivered to consumers in response to the |
Dependent item | rabbitmq.exchange.messages.deliver_get.rate["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages published | The count of published messages. |
Dependent item | rabbitmq.exchange.messages.publish["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages published per second | The rate of messages published per second. |
Dependent item | rabbitmq.exchange.messages.publish.rate["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages publish_in | The count of messages published from the channels into this overview. |
Dependent item | rabbitmq.exchange.messages.publish_in["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages publish_in per second | The rate of messages (per second) published from the channels into this overview. |
Dependent item | rabbitmq.exchange.messages.publish_in.rate["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages publish_out | The count of messages published from this overview into queues. |
Dependent item | rabbitmq.exchange.messages.publish_out["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages publish_out per second | The rate of messages (per second) published from this overview into queues. |
Dependent item | rabbitmq.exchange.messages.publish_out.rate["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages returned unroutable | The count of messages returned to a publisher as unroutable. |
Dependent item | rabbitmq.exchange.messages.return_unroutable["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages returned unroutable per second | The rate of messages (per second) returned to a publisher as unroutable. |
Dependent item | rabbitmq.exchange.messages.return_unroutable.rate["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages redelivered | The count of subset of messages in the |
Dependent item | rabbitmq.exchange.messages.redeliver["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages redelivered per second | The rate of subset of messages (per second) in the |
Dependent item | rabbitmq.exchange.messages.redeliver.rate["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
This template is developed to monitor RabbitMQ with Zabbix and works without any external scripts.
Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The template RabbitMQ Node
— (Zabbix version >= 4.2) collects metrics by polling the RabbitMQ management plugin with the Zabbix agent.
It also uses the Zabbix agent to collect RabbitMQ Linux process statistics, such as CPU usage, memory usage, and whether the process is running or not.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Enable the RabbitMQ management plugin. See RabbitMQ documentation for the instructions.
Create a user to monitor the service:
rabbitmqctl add_user zbx_monitor <PASSWORD>
rabbitmqctl set_permissions -p / zbx_monitor "" "" ".*"
rabbitmqctl set_user_tags zbx_monitor monitoring
A login name and password are also supported in macros.
If you use another API endpoint, don't forget to change the {$RABBITMQ.API.HOST} macro.
Install and set up the Zabbix agent.
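Besides the management API keys, this template relies on agent process and service checks; these can be verified with zabbix_get before linking the template. A minimal sketch with illustrative values (the Erlang VM process is typically named beam.smp, matching the {$RABBITMQ.PROCESS_NAME} default):
zabbix_get -s 127.0.0.1 -k 'proc.get[beam.smp,,,summary]'
zabbix_get -s 127.0.0.1 -k 'net.tcp.service[http,127.0.0.1,15672]'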
Name | Description | Default |
---|---|---|
{$RABBITMQ.API.USER} | zbx_monitor |
|
{$RABBITMQ.API.PASSWORD} | zabbix |
|
{$RABBITMQ.CLUSTER.NAME} | The name of the RabbitMQ cluster. |
rabbit |
{$RABBITMQ.API.PORT} | The port of the RabbitMQ API endpoint. |
15672 |
{$RABBITMQ.API.SCHEME} | The request scheme, which may be HTTP or HTTPS. |
http |
{$RABBITMQ.API.HOST} | The hostname or IP of the API endpoint for the RabbitMQ. |
127.0.0.1 |
{$RABBITMQ.PROCESS_NAME} | The process name filter for the RabbitMQ process discovery. |
beam.smp |
{$RABBITMQ.PROCESS.NAME.PARAMETER} | The process name of the RabbitMQ server used in the item key |
|
{$RABBITMQ.LLD.FILTER.QUEUE.MATCHES} | This macro is used in the discovery of queues. It can be overridden at host level or its linked template level. |
.* |
{$RABBITMQ.LLD.FILTER.QUEUE.NOT_MATCHES} | This macro is used in the discovery of queues. It can be overridden at host level or its linked template level. |
CHANGE_IF_NEEDED |
{$RABBITMQ.RESPONSE_TIME.MAX.WARN} | The maximum response time by the RabbitMQ expressed in seconds for a trigger expression. |
10 |
{$RABBITMQ.MESSAGES.MAX.WARN} | The maximum number of messages in the queue for a trigger expression. |
1000 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Service ping | Zabbix agent | net.tcp.service["{$RABBITMQ.API.SCHEME}","{$RABBITMQ.API.HOST}","{$RABBITMQ.API.PORT}"] Preprocessing
|
|
Get node overview | The HTTP API endpoint that returns cluster-wide metrics. |
Zabbix agent | web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/overview"] Preprocessing
|
Get nodes | The HTTP API endpoint that returns metrics of the nodes. |
Zabbix agent | web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/nodes/{$RABBITMQ.CLUSTER.NAME}@{HOST.NAME}?memory=true"] Preprocessing
|
Get queues | The HTTP API endpoint that returns metrics of the queues. |
Zabbix agent | web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/queues"] Preprocessing
|
Management plugin version | The version of the management plugin in use. |
Dependent item | rabbitmq.node.overview.management_version Preprocessing
|
RabbitMQ version | The version of the RabbitMQ on the node, which processed this request. |
Dependent item | rabbitmq.node.overview.rabbitmq_version Preprocessing
|
Used file descriptors | The number of file descriptors currently in use. |
Dependent item | rabbitmq.node.fd_used Preprocessing
|
Free disk space | The current free disk space. |
Dependent item | rabbitmq.node.disk_free Preprocessing
|
Memory used | The memory usage expressed in bytes. |
Dependent item | rabbitmq.node.mem_used Preprocessing
|
Memory limit | The memory limit (high watermark), expressed in bytes. |
Dependent item | rabbitmq.node.mem_limit Preprocessing
|
Disk free limit | The free space limit of a disk expressed in bytes. |
Dependent item | rabbitmq.node.disk_free_limit Preprocessing
|
Runtime run queue | The average number of Erlang processes waiting to run. |
Dependent item | rabbitmq.node.run_queue Preprocessing
|
Sockets used | The number of file descriptors used as sockets. |
Dependent item | rabbitmq.node.sockets_used Preprocessing
|
Sockets available | The file descriptors available for use as sockets. |
Dependent item | rabbitmq.node.sockets_total Preprocessing
|
Number of network partitions | The number of network partitions, which this node "sees". |
Dependent item | rabbitmq.node.partitions Preprocessing
|
Is running | Whether the node is running or not. |
Dependent item | rabbitmq.node.running Preprocessing
|
Memory alarm | It checks whether the host has a memory alarm or not. |
Dependent item | rabbitmq.node.mem_alarm Preprocessing
|
Disk free alarm | It checks whether the node has a disk alarm or not. |
Dependent item | rabbitmq.node.disk_free_alarm Preprocessing
|
Uptime | Uptime expressed in milliseconds. |
Dependent item | rabbitmq.node.uptime Preprocessing
|
Get processes summary | The aggregated data of summary metrics for all processes. |
Zabbix agent | proc.get[{$RABBITMQ.PROCESS.NAME.PARAMETER},,,summary] |
Service response time | Zabbix agent | net.tcp.service.perf["{$RABBITMQ.API.SCHEME}","{$RABBITMQ.API.HOST}","{$RABBITMQ.API.PORT}"] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
RabbitMQ node: Version has changed | RabbitMQ version has changed. Acknowledge to close the problem manually. |
last(/RabbitMQ node by Zabbix agent/rabbitmq.node.overview.rabbitmq_version,#1)<>last(/RabbitMQ node by Zabbix agent/rabbitmq.node.overview.rabbitmq_version,#2) and length(last(/RabbitMQ node by Zabbix agent/rabbitmq.node.overview.rabbitmq_version))>0 |Info |
Manual close: Yes | |
RabbitMQ node: Number of network partitions is too high | For more details see Detecting Network Partitions. |
min(/RabbitMQ node by Zabbix agent/rabbitmq.node.partitions,5m)>0 |Warning |
||
RabbitMQ node: Memory alarm | For more details see Memory Alarms. |
last(/RabbitMQ node by Zabbix agent/rabbitmq.node.mem_alarm)=1 |Average |
||
RabbitMQ node: Free disk space alarm | For more details see Free Disk Space Alarms. |
last(/RabbitMQ node by Zabbix agent/rabbitmq.node.disk_free_alarm)=1 |Average |
||
RabbitMQ node: Host has been restarted | Uptime is less than 10 minutes. |
last(/RabbitMQ node by Zabbix agent/rabbitmq.node.uptime)<10m |Info |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
RabbitMQ process discovery | The discovery of the RabbitMQ summary processes. |
Dependent item | rabbitmq.proc.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Get process data | The summary metrics aggregated by a process {#RABBITMQ.NAME}. |
Dependent item | rabbitmq.proc.get[{#RABBITMQ.NAME}] Preprocessing
|
Number of running processes | The number of running processes {#RABBITMQ.NAME}. |
Dependent item | rabbitmq.proc.num[{#RABBITMQ.NAME}] Preprocessing
|
Memory usage (rss) | The summary of resident set size memory used by a process {#RABBITMQ.NAME} expressed in bytes. |
Dependent item | rabbitmq.proc.rss[{#RABBITMQ.NAME}] Preprocessing
|
Memory usage (vsize) | The summary of virtual memory used by a process {#RABBITMQ.NAME} expressed in bytes. |
Dependent item | rabbitmq.proc.vmem[{#RABBITMQ.NAME}] Preprocessing
|
Memory usage, % | The percentage of real memory used by a process {#RABBITMQ.NAME}. |
Dependent item | rabbitmq.proc.pmem[{#RABBITMQ.NAME}] Preprocessing
|
CPU utilization | The percentage of the CPU utilization by a process {#RABBITMQ.NAME}. |
Zabbix agent | proc.cpu.util[{#RABBITMQ.NAME}] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
RabbitMQ node: Process is not running | last(/RabbitMQ node by Zabbix agent/rabbitmq.proc.num[{#RABBITMQ.NAME}])=0 |High |
|||
RabbitMQ node: Failed to fetch nodes data | Zabbix has not received any data for items for the last 30 minutes. |
nodata(/RabbitMQ node by Zabbix agent/web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/nodes/{$RABBITMQ.CLUSTER.NAME}@{HOST.NAME}?memory=true"],30m)=1 and last(/RabbitMQ node by Zabbix agent/rabbitmq.proc.num[{#RABBITMQ.NAME}])>0 |Warning |
Manual close: Yes Depends on:
|
|
RabbitMQ node: Service is down | last(/RabbitMQ node by Zabbix agent/net.tcp.service["{$RABBITMQ.API.SCHEME}","{$RABBITMQ.API.HOST}","{$RABBITMQ.API.PORT}"])=0 and last(/RabbitMQ node by Zabbix agent/rabbitmq.proc.num[{#RABBITMQ.NAME}])>0 |Average |
Manual close: Yes | ||
RabbitMQ node: Node is not running | RabbitMQ node is not running. |
max(/RabbitMQ node by Zabbix agent/rabbitmq.node.running,5m)=0 and last(/RabbitMQ node by Zabbix agent/rabbitmq.proc.num[{#RABBITMQ.NAME}])>0 |Average |
Depends on:
|
|
RabbitMQ node: Service response time is too high | min(/RabbitMQ node by Zabbix agent/net.tcp.service.perf["{$RABBITMQ.API.SCHEME}","{$RABBITMQ.API.HOST}","{$RABBITMQ.API.PORT}"],5m)>{$RABBITMQ.RESPONSE_TIME.MAX.WARN} and last(/RabbitMQ node by Zabbix agent/rabbitmq.proc.num[{#RABBITMQ.NAME}])>0 |Warning |
Manual close: Yes Depends on:
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Health Check 3.8.10+ discovery | Specific metrics for versions 3.8.10 and newer. |
Dependent item | rabbitmq.healthcheck.v3810.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Healthcheck: local alarms in effect on this node{#SINGLETON} | Checks whether there are local alarms in effect on the target node. It responds with a status code 200 OK if there are no alarms in effect. Otherwise, it responds with a status code 503 Service Unavailable. |
Zabbix agent | web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/health/checks/local-alarms{#SINGLETON}"] Preprocessing
|
Healthcheck: expiration date on the certificates{#SINGLETON} | It checks the expiration date on the certificates for every listener configured to use the Transport Layer Security (TLS). It responds with a status code 200 OK if all certificates are valid (not expired). Otherwise, it responds with a status code 503 Service Unavailable. |
Zabbix agent | web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/health/checks/certificate-expiration/1/months{#SINGLETON}"] Preprocessing
|
Healthcheck: virtual hosts on this node{#SINGLETON} | Checks whether all virtual hosts are running on the target node. It responds with a status code 200 OK if they are. Otherwise, it responds with a status code 503 Service Unavailable. |
Zabbix agent | web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/health/checks/virtual-hosts{#SINGLETON}"] Preprocessing
|
Healthcheck: classic mirrored queues without synchronized mirrors online{#SINGLETON} | It checks if there are classic mirrored queues without synchronized mirrors online (queues that would potentially lose data if the target node is shut down). It responds with a status code 200 OK if there are no such queues. Otherwise, it responds with a status code 503 Service Unavailable. |
Zabbix agent | web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/health/checks/node-is-mirror-sync-critical{#SINGLETON}"] Preprocessing
|
Healthcheck: queues with minimum online quorum{#SINGLETON} | It checks if there are quorum queues with minimum online quorum (queues that would lose their quorum and availability if the target node is shut down). It responds with a status code 200 OK if there are no such queues. Otherwise, it responds with a status code 503 Service Unavailable. |
Zabbix agent | web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/health/checks/node-is-quorum-critical{#SINGLETON}"] Preprocessing
|
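For reference, the health-check endpoints polled by the items above can be queried manually to confirm credentials and reachability. A hedged example, assuming the management plugin listens on 127.0.0.1:15672 and the default guest credentials; adjust the user, password, host, and port to match your macros.
# Prints 200 if no local alarms are in effect, 503 otherwise
$ curl -s -o /dev/null -w '%{http_code}\n' -u guest:guest http://127.0.0.1:15672/api/health/checks/local-alarms
# The same approach works for the other checks, for example virtual hosts
$ curl -s -u guest:guest http://127.0.0.1:15672/api/health/checks/virtual-hosts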
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
RabbitMQ node: There are active alarms in the node | It checks the active alarms in the nodes via API. This is the default API endpoint path: http://{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/index.html. |
last(/RabbitMQ node by Zabbix agent/web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/health/checks/local-alarms{#SINGLETON}"])=0 |Average |
||
RabbitMQ node: There are valid TLS certificates expiring in the next month | It checks if there are valid TLS certificates expiring in the next month. This is the default API endpoint path: http://{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/index.html. |
last(/RabbitMQ node by Zabbix agent/web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/health/checks/certificate-expiration/1/months{#SINGLETON}"])=0 |Average |
||
RabbitMQ node: There are not running virtual hosts | It checks if there are not running virtual hosts via API. This is the default API endpoint path: http://{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/index.html. |
last(/RabbitMQ node by Zabbix agent/web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/health/checks/virtual-hosts{#SINGLETON}"])=0 |Average |
||
RabbitMQ node: There are queues that could potentially lose data if this node goes offline. | It checks whether there are queues that could potentially lose data if this node goes offline via API. This is the default API endpoint path: http://{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/index.html. |
last(/RabbitMQ node by Zabbix agent/web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/health/checks/node-is-mirror-sync-critical{#SINGLETON}"])=0 |Average |
||
RabbitMQ node: There are queues that would lose their quorum and availability if this node is shut down. | It checks via API whether there are quorum queues that would lose their quorum and availability if this node is shut down. This is the default API endpoint path: http://{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/index.html. |
last(/RabbitMQ node by Zabbix agent/web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/health/checks/node-is-quorum-critical{#SINGLETON}"])=0 |Average |
Name | Description | Type | Key and additional info |
---|---|---|---|
Health Check 3.8.9- discovery | Specific metrics for versions up to and including 3.8.9. |
Dependent item | rabbitmq.healthcheck.v389.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Healthcheck{#SINGLETON} | It checks whether the RabbitMQ application is running, whether the channels and queues can be listed successfully, and that no alarms are in effect. |
Zabbix agent | web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/healthchecks/node{#SINGLETON}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
RabbitMQ node: Node healthcheck failed | For more details see Health Checks. |
last(/RabbitMQ node by Zabbix agent/web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/healthchecks/node{#SINGLETON}"])=0 |Average |
Name | Description | Type | Key and additional info |
---|---|---|---|
Queues discovery | The metrics for an individual queue. |
Dependent item | rabbitmq.queues.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Queue [{#VHOST}][{#QUEUE}]: Get data | The HTTP API endpoint that returns [{#VHOST}][{#QUEUE}] queue metrics |
Dependent item | rabbitmq.get_exchanges["{#VHOST}/{#QUEUE}"] Preprocessing
|
Queue [{#VHOST}][{#QUEUE}]: Messages total | The count of total messages in the queue. |
Dependent item | rabbitmq.queue.messages["{#VHOST}/{#QUEUE}"] Preprocessing
|
Queue [{#VHOST}][{#QUEUE}]: Messages per second | The count of total messages per second in the queue. |
Dependent item | rabbitmq.queue.messages.rate["{#VHOST}/{#QUEUE}"] Preprocessing
|
Queue [{#VHOST}][{#QUEUE}]: Consumers | The number of consumers. |
Dependent item | rabbitmq.queue.consumers["{#VHOST}/{#QUEUE}"] Preprocessing
|
Queue [{#VHOST}][{#QUEUE}]: Memory | The bytes of memory consumed by the Erlang process associated with the queue, including stack, heap and internal structures. |
Dependent item | rabbitmq.queue.memory["{#VHOST}/{#QUEUE}"] Preprocessing
|
Queue [{#VHOST}][{#QUEUE}]: Messages ready | The number of messages ready to be delivered to clients. |
Dependent item | rabbitmq.queue.messages_ready["{#VHOST}/{#QUEUE}"] Preprocessing
|
Queue [{#VHOST}][{#QUEUE}]: Messages ready per second | The number of messages per second ready to be delivered to clients. |
Dependent item | rabbitmq.queue.messages_ready.rate["{#VHOST}/{#QUEUE}"] Preprocessing
|
Queue [{#VHOST}][{#QUEUE}]: Messages unacknowledged | The number of messages delivered to clients but not yet acknowledged. |
Dependent item | rabbitmq.queue.messages_unacknowledged["{#VHOST}/{#QUEUE}"] Preprocessing
|
Queue [{#VHOST}][{#QUEUE}]: Messages unacknowledged per second | The number of messages per second delivered to clients but not yet acknowledged. |
Dependent item | rabbitmq.queue.messages_unacknowledged.rate["{#VHOST}/{#QUEUE}"] Preprocessing
|
Queue [{#VHOST}][{#QUEUE}]: Messages acknowledged | The number of messages delivered to clients and acknowledged. |
Dependent item | rabbitmq.queue.messages.ack["{#VHOST}/{#QUEUE}"] Preprocessing
|
Queue [{#VHOST}][{#QUEUE}]: Messages acknowledged per second | The number of messages (per second) delivered to clients and acknowledged. |
Dependent item | rabbitmq.queue.messages.ack.rate["{#VHOST}/{#QUEUE}"] Preprocessing
|
Queue [{#VHOST}][{#QUEUE}]: Messages delivered | The count of messages delivered to consumers in acknowledgement mode. |
Dependent item | rabbitmq.queue.messages.deliver["{#VHOST}/{#QUEUE}"] Preprocessing
|
Queue [{#VHOST}][{#QUEUE}]: Messages delivered per second | The count of messages (per second) delivered to consumers in acknowledgement mode. |
Dependent item | rabbitmq.queue.messages.deliver.rate["{#VHOST}/{#QUEUE}"] Preprocessing
|
Queue [{#VHOST}][{#QUEUE}]: Sum of messages delivered | The sum of messages delivered to consumers: in acknowledgement mode and in no-acknowledgement mode; delivered to consumers in response to the |
Dependent item | rabbitmq.queue.messages.deliver_get["{#VHOST}/{#QUEUE}"] Preprocessing
|
Queue [{#VHOST}][{#QUEUE}]: Sum of messages delivered per second | The rate of delivery per second. The sum of messages delivered (per second) to consumers: in acknowledgement mode and in no-acknowledgement mode; delivered to consumers in response to |
Dependent item | rabbitmq.queue.messages.deliver_get.rate["{#VHOST}/{#QUEUE}"] Preprocessing
|
Queue [{#VHOST}][{#QUEUE}]: Messages published | The count of published messages. |
Dependent item | rabbitmq.queue.messages.publish["{#VHOST}/{#QUEUE}"] Preprocessing
|
Queue [{#VHOST}][{#QUEUE}]: Messages published per second | The rate of published messages per second. |
Dependent item | rabbitmq.queue.messages.publish.rate["{#VHOST}/{#QUEUE}"] Preprocessing
|
Queue [{#VHOST}][{#QUEUE}]: Messages redelivered | The count of subset of messages in the |
Dependent item | rabbitmq.queue.messages.redeliver["{#VHOST}/{#QUEUE}"] Preprocessing
|
Queue [{#VHOST}][{#QUEUE}]: Messages redelivered per second | The rate of messages redelivered per second. |
Dependent item | rabbitmq.queue.messages.redeliver.rate["{#VHOST}/{#QUEUE}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
RabbitMQ node: Too many messages in queue [{#VHOST}][{#QUEUE}] | min(/RabbitMQ node by Zabbix agent/rabbitmq.queue.messages["{#VHOST}/{#QUEUE}"],5m)>{$RABBITMQ.MESSAGES.MAX.WARN:"{#QUEUE}"} |Warning |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for the effortless deployment of Proxmox VE monitoring by Zabbix via HTTP and doesn't require any external scripts.
Proxmox VE uses a REST like API. The concept is described in Resource Oriented Architecture (ROA).
Check the API documentation
for details.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Please provide the necessary access levels for both the User and the Token:
Copy the resulting Token ID and Secret into the host macros {$PVE.TOKEN.ID}
and {$PVE.TOKEN.SECRET}
.
Set the hostname or IP address of the Proxmox API VE host in the {$PVE.URL.HOST}
macro. You can also change the API port in the {$PVE.URL.PORT}
macro if necessary.
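As an illustration only (the user, realm, and token names below are hypothetical), a read-only monitoring token can be created on a Proxmox VE node roughly as follows, and the resulting credentials can be verified against the API before filling in the macros.
# Create a monitoring user and grant it a read-only role (PVEAuditor is usually sufficient)
$ pveum user add zabbix@pve --comment "Zabbix monitoring"
$ pveum acl modify / --users zabbix@pve --roles PVEAuditor
# Create an API token without privilege separation so it inherits the user's permissions
$ pveum user token add zabbix@pve monitoring --privsep 0
# Verify the token: a working setup returns HTTP 200
$ curl -k -s -o /dev/null -w '%{http_code}\n' -H 'Authorization: PVEAPIToken=zabbix@pve!monitoring=<TOKEN SECRET>' https://<PVE HOST>:8006/api2/json/version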
Name | Description | Default |
---|---|---|
{$PVE.URL.HOST} | The hostname or IP address of the Proxmox VE API host. |
<SET PVE HOST> |
{$PVE.URL.PORT} | The API uses the HTTPS protocol and the server listens to port 8006 by default. |
8006 |
{$PVE.TOKEN.ID} | API tokens allow stateless access to most parts of the REST API by another system, software or API client. |
USER@REALM!TOKENID |
{$PVE.TOKEN.SECRET} | Secret key. |
xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx |
{$PVE.ROOT.PUSE.MAX.WARN} | Maximum used root space in percentage. |
90 |
{$PVE.MEMORY.PUSE.MAX.WARN} | Maximum used memory in percentage. |
90 |
{$PVE.CPU.PUSE.MAX.WARN} | Maximum used CPU in percentage. |
90 |
{$PVE.SWAP.PUSE.MAX.WARN} | Maximum used swap space in percentage. |
90 |
{$PVE.VM.MEMORY.PUSE.MAX.WARN} | Maximum used memory in percentage. |
90 |
{$PVE.VM.CPU.PUSE.MAX.WARN} | Maximum used CPU in percentage. |
90 |
{$PVE.LXC.MEMORY.PUSE.MAX.WARN} | Maximum used memory in percentage. |
90 |
{$PVE.LXC.CPU.PUSE.MAX.WARN} | Maximum used CPU in percentage. |
90 |
{$PVE.LXC.DISK.PUSE.MAX.WARN} | Maximum used disk in percentage. |
90 |
{$PVE.STORAGE.PUSE.MAX.WARN} | Maximum used storage space in percentage. |
90 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Get cluster resources | Resources index. |
HTTP agent | proxmox.cluster.resources Preprocessing
|
Get cluster status | Get cluster status information. |
HTTP agent | proxmox.cluster.status Preprocessing
|
API service status | Get API service status. |
Script | proxmox.api.available Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Proxmox VE: API service not available | The API service is not available. Check your network and authorization settings. |
last(/Proxmox VE by HTTP/proxmox.api.available) <> 200 |High |
Name | Description | Type | Key and additional info |
---|---|---|---|
Cluster discovery | Dependent item | proxmox.cluster.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Cluster [{#RESOURCE.NAME}]: Quorate | Indicates if there is a majority of nodes online to make decisions. |
Dependent item | proxmox.cluster.quorate[{#RESOURCE.NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Proxmox VE: Cluster [{#RESOURCE.NAME}] not quorum | Proxmox VE uses a quorum-based technique to provide a consistent state among all cluster nodes. |
last(/Proxmox VE by HTTP/proxmox.cluster.quorate[{#RESOURCE.NAME}]) <> 1 |High |
Name | Description | Type | Key and additional info |
---|---|---|---|
Node discovery | Dependent item | proxmox.node.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Node [{#NODE.NAME}]: Status | Indicates if the node is online or offline. |
Dependent item | proxmox.node.online[{#NODE.NAME}] Preprocessing
|
Node [{#NODE.NAME}]: Status | Read node status. |
HTTP agent | proxmox.node.status[{#NODE.NAME}] |
Node [{#NODE.NAME}]: RRD statistics | Read node RRD statistics. |
HTTP agent | proxmox.node.rrd[{#NODE.NAME}] Preprocessing
|
Node [{#NODE.NAME}]: Time | Read server time and time zone settings. |
HTTP agent | proxmox.node.time[{#NODE.NAME}] |
Node [{#NODE.NAME}]: Uptime | The system uptime expressed in the following format: "N days, hh:mm:ss". |
Dependent item | proxmox.node.uptime[{#NODE.NAME}] Preprocessing
|
Node [{#NODE.NAME}]: PVE version | PVE manager version. |
Dependent item | proxmox.node.pveversion[{#NODE.NAME}] Preprocessing
|
Node [{#NODE.NAME}]: Kernel version | Kernel version info. |
Dependent item | proxmox.node.kernelversion[{#NODE.NAME}] Preprocessing
|
Node [{#NODE.NAME}]: Root filesystem, used | Root filesystem usage. |
Dependent item | proxmox.node.rootused[{#NODE.NAME}] Preprocessing
|
Node [{#NODE.NAME}]: Root filesystem, total | Root filesystem total. |
Dependent item | proxmox.node.roottotal[{#NODE.NAME}] Preprocessing
|
Node [{#NODE.NAME}]: Memory, used | Memory usage. |
Dependent item | proxmox.node.memused[{#NODE.NAME}] Preprocessing
|
Node [{#NODE.NAME}]: Memory, total | Memory total. |
Dependent item | proxmox.node.memtotal[{#NODE.NAME}] Preprocessing
|
Node [{#NODE.NAME}]: CPU, usage | CPU usage. |
Dependent item | proxmox.node.cpu[{#NODE.NAME}] Preprocessing
|
Node [{#NODE.NAME}]: Outgoing data, rate | Network usage. |
Dependent item | proxmox.node.netout[{#NODE.NAME}] Preprocessing
|
Node [{#NODE.NAME}]: Incoming data, rate | Network usage. |
Dependent item | proxmox.node.netin[{#NODE.NAME}] Preprocessing
|
Node [{#NODE.NAME}]: CPU, loadavg | CPU average load. |
Dependent item | proxmox.node.loadavg[{#NODE.NAME}] Preprocessing
|
Node [{#NODE.NAME}]: CPU, iowait | CPU iowait time. |
Dependent item | proxmox.node.iowait[{#NODE.NAME}] Preprocessing
|
Node [{#NODE.NAME}]: Swap filesystem, total | Swap total. |
Dependent item | proxmox.node.swaptotal[{#NODE.NAME}] Preprocessing
|
Node [{#NODE.NAME}]: Swap filesystem, used | Swap used. |
Dependent item | proxmox.node.swapused[{#NODE.NAME}] Preprocessing
|
Node [{#NODE.NAME}]: Time zone | Time zone. |
Dependent item | proxmox.node.timezone[{#NODE.NAME}] Preprocessing
|
Node [{#NODE.NAME}]: Localtime | Seconds since 1970-01-01 00:00:00 (local time). |
Dependent item | proxmox.node.localtime[{#NODE.NAME}] Preprocessing
|
Node [{#NODE.NAME}]: Time | Seconds since 1970-01-01 00:00:00 UTC. |
Dependent item | proxmox.node.utctime[{#NODE.NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Proxmox VE: Node [{#NODE.NAME}] offline | Node offline. |
last(/Proxmox VE by HTTP/proxmox.node.online[{#NODE.NAME}]) <> 1 |High |
||
Proxmox VE: Node [{#NODE.NAME}] has been restarted | Uptime is less than 10 minutes. |
last(/Proxmox VE by HTTP/proxmox.node.uptime[{#NODE.NAME}])<10m |Info |
Manual close: Yes Depends on:
|
|
Proxmox VE: Node [{#NODE.NAME}]: PVE manager has changed | The PVE manager version has changed. Acknowledge to close the problem manually. |
last(/Proxmox VE by HTTP/proxmox.node.pveversion[{#NODE.NAME}],#1)<>last(/Proxmox VE by HTTP/proxmox.node.pveversion[{#NODE.NAME}],#2) and length(last(/Proxmox VE by HTTP/proxmox.node.pveversion[{#NODE.NAME}]))>0 |Info |
Manual close: Yes | |
Proxmox VE: Node [{#NODE.NAME}]: Kernel version has changed | The kernel version has changed. Acknowledge to close the problem manually. |
last(/Proxmox VE by HTTP/proxmox.node.kernelversion[{#NODE.NAME}],#1)<>last(/Proxmox VE by HTTP/proxmox.node.kernelversion[{#NODE.NAME}],#2) and length(last(/Proxmox VE by HTTP/proxmox.node.kernelversion[{#NODE.NAME}]))>0 |Info |
Manual close: Yes | |
Proxmox VE: Node [{#NODE.NAME}] high root filesystem space usage | Root filesystem space usage. |
min(/Proxmox VE by HTTP/proxmox.node.rootused[{#NODE.NAME}],5m) / last(/Proxmox VE by HTTP/proxmox.node.roottotal[{#NODE.NAME}]) * 100 >{$PVE.ROOT.PUSE.MAX.WARN:"{#NODE.NAME}"} |Warning |
||
Proxmox VE: Node [{#NODE.NAME}] high memory usage | Memory usage. |
min(/Proxmox VE by HTTP/proxmox.node.memused[{#NODE.NAME}],5m) / last(/Proxmox VE by HTTP/proxmox.node.memtotal[{#NODE.NAME}]) * 100 >{$PVE.MEMORY.PUSE.MAX.WARN:"{#NODE.NAME}"} |Warning |
||
Proxmox VE: Node [{#NODE.NAME}] high CPU usage | CPU usage. |
min(/Proxmox VE by HTTP/proxmox.node.cpu[{#NODE.NAME}],5m) > {$PVE.CPU.PUSE.MAX.WARN:"{#NODE.NAME}"} |Warning |
||
Proxmox VE: Node [{#NODE.NAME}] high swap space usage | If there is no swap configured, this trigger is ignored. |
min(/Proxmox VE by HTTP/proxmox.node.swapused[{#NODE.NAME}],5m) / last(/Proxmox VE by HTTP/proxmox.node.swaptotal[{#NODE.NAME}]) * 100 > {$PVE.SWAP.PUSE.MAX.WARN:"{#NODE.NAME}"} and last(/Proxmox VE by HTTP/proxmox.node.swaptotal[{#NODE.NAME}]) > 0 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Storage discovery | Dependent item | proxmox.storage.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Storage [{#NODE.NAME}/{#STORAGE.NAME}]: Type | More specific type, if available. |
Dependent item | proxmox.node.plugintype[{#NODE.NAME},{#STORAGE.NAME}] Preprocessing
|
Storage [{#NODE.NAME}/{#STORAGE.NAME}]: Size | Storage size in bytes. |
Dependent item | proxmox.node.maxdisk[{#NODE.NAME},{#STORAGE.NAME}] Preprocessing
|
Storage [{#NODE.NAME}/{#STORAGE.NAME}]: Content | Allowed storage content types. |
Dependent item | proxmox.node.content[{#NODE.NAME},{#STORAGE.NAME}] Preprocessing
|
Storage [{#NODE.NAME}/{#STORAGE.NAME}]: Used | Used disk space in bytes. |
Dependent item | proxmox.node.disk[{#NODE.NAME},{#STORAGE.NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Proxmox VE: Storage [{#NODE.NAME}/{#STORAGE.NAME}] high filesystem space usage | Storage space usage is high. |
min(/Proxmox VE by HTTP/proxmox.node.disk[{#NODE.NAME},{#STORAGE.NAME}],5m) / last(/Proxmox VE by HTTP/proxmox.node.maxdisk[{#NODE.NAME},{#STORAGE.NAME}]) * 100 >{$PVE.STORAGE.PUSE.MAX.WARN:"{#NODE.NAME}/{#STORAGE.NAME}"} |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
QEMU discovery | Dependent item | proxmox.qemu.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
VM [{#NODE.NAME}/{#QEMU.NAME} ({#QEMU.ID})]: Disk write, rate | Disk write. |
Dependent item | proxmox.qemu.diskwrite[{#QEMU.ID}] Preprocessing
|
VM [{#NODE.NAME}/{#QEMU.NAME} ({#QEMU.ID})]: Disk read, rate | Disk read. |
Dependent item | proxmox.qemu.diskread[{#QEMU.ID}] Preprocessing
|
VM [{#NODE.NAME}/{#QEMU.NAME} ({#QEMU.ID})]: Memory usage | Used memory in bytes. |
Dependent item | proxmox.qemu.mem[{#QEMU.ID}] Preprocessing
|
VM [{#NODE.NAME}/{#QEMU.NAME} ({#QEMU.ID})]: Memory total | The total memory expressed in bytes. |
Dependent item | proxmox.qemu.maxmem[{#QEMU.ID}] Preprocessing
|
VM [{#NODE.NAME}/{#QEMU.NAME} ({#QEMU.ID})]: Incoming data, rate | Incoming data rate. |
Dependent item | proxmox.qemu.netin[{#QEMU.ID}] Preprocessing
|
VM [{#NODE.NAME}/{#QEMU.NAME} ({#QEMU.ID})]: Outgoing data, rate | Outgoing data rate. |
Dependent item | proxmox.qemu.netout[{#QEMU.ID}] Preprocessing
|
VM [{#NODE.NAME}/{#QEMU.NAME} ({#QEMU.ID})]: CPU usage | CPU load. |
Dependent item | proxmox.qemu.cpu[{#QEMU.ID}] Preprocessing
|
VM [{#NODE.NAME}/{#QEMU.NAME}]: Get data | Get VM status data. |
HTTP agent | proxmox.qemu.get.data[{#QEMU.ID}] |
VM [{#NODE.NAME}/{#QEMU.NAME} ({#QEMU.ID})]: Uptime | The system uptime expressed in the following format: "N days, hh:mm:ss". |
Dependent item | proxmox.qemu.uptime[{#QEMU.ID}] Preprocessing
|
VM [{#NODE.NAME}/{#QEMU.NAME} ({#QEMU.ID})]: Status | Status of Virtual Machine. |
Dependent item | proxmox.qemu.vmstatus[{#QEMU.ID}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Proxmox VE: VM [{#NODE.NAME}/{#QEMU.NAME} ({#QEMU.ID})] high memory usage | Memory usage. |
min(/Proxmox VE by HTTP/proxmox.qemu.mem[{#QEMU.ID}],5m) / last(/Proxmox VE by HTTP/proxmox.qemu.maxmem[{#QEMU.ID}]) * 100 >{$PVE.VM.MEMORY.PUSE.MAX.WARN:"{#QEMU.ID}"} |Warning |
||
Proxmox VE: VM [{#NODE.NAME}/{#QEMU.NAME} ({#QEMU.ID})] high CPU usage | CPU usage. |
min(/Proxmox VE by HTTP/proxmox.qemu.cpu[{#QEMU.ID}],5m) > {$PVE.VM.CPU.PUSE.MAX.WARN:"{#QEMU.ID}"} |Warning |
||
Proxmox VE: VM [{#NODE.NAME}/{#QEMU.NAME}] has been restarted | Uptime is less than 10 minutes. |
last(/Proxmox VE by HTTP/proxmox.qemu.uptime[{#QEMU.ID}])<10m |Info |
Manual close: Yes Depends on:
|
|
Proxmox VE: VM [{#NODE.NAME}/{#QEMU.NAME} ({#QEMU.ID})]: Not running | VM state is not "running". |
last(/Proxmox VE by HTTP/proxmox.qemu.vmstatus[{#QEMU.ID}])<>"running" |Average |
Name | Description | Type | Key and additional info |
---|---|---|---|
LXC discovery | Dependent item | proxmox.lxc.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
LXC [{#NODE.NAME}/{#LXC.NAME}]: Get data | Get LXC status data. |
HTTP agent | proxmox.lxc.get.data[{#LXC.ID}] |
LXC [{#NODE.NAME}/{#LXC.NAME} ({#LXC.ID})]: Uptime | The system uptime expressed in the following format: "N days, hh:mm:ss". |
Dependent item | proxmox.lxc.uptime[{#LXC.ID}] Preprocessing
|
LXC [{#NODE.NAME}/{#LXC.NAME} ({#LXC.ID})]: Status | Status of LXC container. |
Dependent item | proxmox.lxc.vmstatus[{#LXC.ID}] Preprocessing
|
LXC [{#NODE.NAME}/{#LXC.NAME} ({#LXC.ID})]: Disk write, rate | Disk write. |
Dependent item | proxmox.lxc.diskwrite[{#LXC.ID}] Preprocessing
|
LXC [{#NODE.NAME}/{#LXC.NAME} ({#LXC.ID})]: Disk read, rate | Disk read. |
Dependent item | proxmox.lxc.diskread[{#LXC.ID}] Preprocessing
|
LXC [{#NODE.NAME}/{#LXC.NAME} ({#LXC.ID})]: Disk space total | Total disk space. |
Dependent item | proxmox.lxc.maxdisk[{#LXC.ID}] Preprocessing
|
LXC [{#NODE.NAME}/{#LXC.NAME} ({#LXC.ID})]: Disk space usage | Disk space usage. |
Dependent item | proxmox.lxc.disk[{#LXC.ID}] Preprocessing
|
LXC [{#NODE.NAME}/{#LXC.NAME} ({#LXC.ID})]: Memory usage | Used memory in bytes. |
Dependent item | proxmox.lxc.mem[{#LXC.ID}] Preprocessing
|
LXC [{#NODE.NAME}/{#LXC.NAME} ({#LXC.ID})]: Memory total | The total memory expressed in bytes. |
Dependent item | proxmox.lxc.maxmem[{#LXC.ID}] Preprocessing
|
LXC [{#NODE.NAME}/{#LXC.NAME} ({#LXC.ID})]: Incoming data, rate | Incoming data rate. |
Dependent item | proxmox.lxc.netin[{#LXC.ID}] Preprocessing
|
LXC [{#NODE.NAME}/{#LXC.NAME} ({#LXC.ID})]: Outgoing data, rate | Outgoing data rate. |
Dependent item | proxmox.lxc.netout[{#LXC.ID}] Preprocessing
|
LXC [{#NODE.NAME}/{#LXC.NAME} ({#LXC.ID})]: CPU usage | CPU load. |
Dependent item | proxmox.lxc.cpu[{#LXC.ID}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Proxmox VE: LXC [{#NODE.NAME}/{#LXC.NAME}] has been restarted | Uptime is less than 10 minutes. |
last(/Proxmox VE by HTTP/proxmox.lxc.uptime[{#LXC.ID}])<10m |Info |
Manual close: Yes Depends on:
|
|
Proxmox VE: LXC [{#NODE.NAME}/{#LXC.NAME} ({#LXC.ID})]: Not running | LXC state is not "running". |
last(/Proxmox VE by HTTP/proxmox.lxc.vmstatus[{#LXC.ID}])<>"running" |Average |
||
Proxmox VE: LXC [{#NODE.NAME}/{#LXC.NAME} ({#LXC.ID})]: high disk space usage | Disk space usage. |
min(/Proxmox VE by HTTP/proxmox.lxc.disk[{#LXC.ID}],5m) / last(/Proxmox VE by HTTP/proxmox.lxc.maxdisk[{#LXC.ID}]) * 100 > {$PVE.LXC.DISK.PUSE.MAX.WARN:"{#LXC.ID}"} |Warning |
||
Proxmox VE: LXC [{#NODE.NAME}/{#LXC.NAME} ({#LXC.ID})] high memory usage | Memory usage. |
min(/Proxmox VE by HTTP/proxmox.lxc.mem[{#LXC.ID}],5m) / last(/Proxmox VE by HTTP/proxmox.lxc.maxmem[{#LXC.ID}]) * 100 >{$PVE.LXC.MEMORY.PUSE.MAX.WARN:"{#LXC.ID}"} |Warning |
||
Proxmox VE: LXC [{#NODE.NAME}/{#LXC.NAME} ({#LXC.ID})] high CPU usage | CPU usage. |
min(/Proxmox VE by HTTP/proxmox.lxc.cpu[{#LXC.ID}],5m) > {$PVE.LXC.CPU.PUSE.MAX.WARN:"{#LXC.ID}"} |Warning |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed to monitor processes with Zabbix agent and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection. For example, by specifying "zabbix" as the macro value, you can monitor all Zabbix processes.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Install and setup Zabbix agent.
Custom processes set in macros:
Name | Description | Default |
---|---|---|
{$PROC.NAME.MATCHES} | This macro is used in the discovery of processes. It can be overridden on a host-level or on a linked template-level. |
<CHANGE VALUE> |
{$PROC.NAME.NOT_MATCHES} | This macro is used in the discovery of processes. It can be overridden on a host-level or on a linked template-level. |
<CHANGE VALUE> |
Name | Description | Type | Key and additional info |
---|---|---|---|
Get process summary | The summary of data metrics for all processes. |
Zabbix agent | proc.get[,,,summary] |
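To see what the agent actually returns for this item, you can query it directly; a minimal check, assuming zabbix_get is installed and the agent accepts passive checks from this host.
# Aggregated summary of all processes, returned as JSON
$ zabbix_get -s 127.0.0.1 -k 'proc.get[,,,summary]'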
Name | Description | Type | Key and additional info |
---|---|---|---|
Processes discovery | Discovery of OS summary processes. |
Dependent item | custom.proc.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Process [{#NAME}]: Get data | The summary of metrics collected for the process {#NAME}. |
Dependent item | custom.proc.get[{#NAME}] Preprocessing
|
Process [{#NAME}]: Memory usage (rss) | The summary of Resident Set Size (RSS) memory used by the process {#NAME} in bytes. |
Dependent item | custom.proc.rss[{#NAME}] Preprocessing
|
Process [{#NAME}]: Memory usage (vsize) | The summary of virtual memory used by process {#NAME} in bytes. |
Dependent item | custom.proc.vmem[{#NAME}] Preprocessing
|
Process [{#NAME}]: Memory usage, % | The percentage of real memory used by the process {#NAME}. |
Dependent item | custom.proc.pmem[{#NAME}] Preprocessing
|
Process [{#NAME}]: Number of running processes | The number of running processes {#NAME}. |
Dependent item | custom.proc.num[{#NAME}] Preprocessing
|
Process [{#NAME}]: Number of threads | The number of threads {#NAME}. |
Dependent item | custom.proc.thread[{#NAME}] Preprocessing
|
Process [{#NAME}]: Number of page faults | The number of page faults {#NAME}. |
Dependent item | custom.proc.page[{#NAME}] Preprocessing
|
Process [{#NAME}]: Size of locked memory | The size of locked memory {#NAME}. |
Dependent item | custom.proc.mem.locked[{#NAME}] Preprocessing
|
Process [{#NAME}]: Swap space used | The swap space used by {#NAME}. |
Dependent item | custom.proc.swap[{#NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
OS: Process [{#NAME}]: Process is not running | last(/OS processes by Zabbix agent/custom.proc.num[{#NAME}])=0 |High |
Manual close: Yes |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is developed to monitor the FastCGI Process Manager (PHP-FPM) by Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The template PHP-FPM by HTTP
- collects metrics by polling the PHP-FPM status page with HTTP agent remotely.
Note that this solution supports HTTPS and redirects.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Note that depending on your OS distribution, the PHP-FPM executable/service name can vary. RHEL-like distributions usually name both process and service as php-fpm
, while for Debian/Ubuntu based distributions it may include the version, for example: executable name - php-fpm8.2
, systemd service name - php8.2-fpm
. Adjust the following instructions accordingly if needed.
Open the PHP-FPM configuration file and enable the status page as shown.
pm.status_path = /status
ping.path = /ping
Validate the syntax to ensure it is correct before you reload the service. Replace the <version>
in the command if needed.
$ php-fpm -t
or
$ php-fpm<version> -t
Reload the php-fpm
service to make the change active. Replace the <version>
in the command if needed.
$ systemctl reload php-fpm
or
$ systemctl reload php<version>-fpm
Next, edit the configuration of your web server.
If you use Nginx, edit the configuration file of your Nginx server block (virtual host) and add the location block below it.
# Enable php-fpm status page
location ~ ^/(status|ping)$ {
## disable access logging for request if you prefer
access_log off;
## Only allow trusted IPs for security, deny everyone else
# allow 127.0.0.1;
# allow 1.2.3.4; # your IP here
# deny all;
fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
fastcgi_index index.php;
include fastcgi_params;
## Now the port or socket of the php-fpm pool we want the status of
fastcgi_pass 127.0.0.1:9000;
# fastcgi_pass unix:/run/php-fpm/your_socket.sock;
}
If you use Apache, edit the configuration file of the virtual host and add the following location blocks.
<LocationMatch "/status">
Require ip 127.0.0.1
# Require ip 1.2.3.4 # Your IP here
# Adjust the path to the socket if needed
ProxyPass "unix:/run/php-fpm/www.sock|fcgi://localhost/status"
</LocationMatch>
<LocationMatch "/ping">
Require ip 127.0.0.1
# Require ip 1.2.3.4 # Your IP here
# Adjust the path to the socket if needed
ProxyPass "unix:/run/php-fpm/www.sock|fcgi://localhost/ping"
</LocationMatch>
Validate the web server configuration to ensure it is correct.
$ nginx -t
or
$ httpd -t
or
$ apachectl configtest
Reload the web server configuration. The command may vary depending on the OS distribution and web server.
$ systemctl reload nginx
or
$ systemctl reload httpd
or
$ systemctl reload apache2
Verify that the pages are available with these commands.
curl -L 127.0.0.1/status
curl -L 127.0.0.1/ping
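The status item of this template typically requests the machine-readable (JSON) form of the status page. Assuming the default paths and port, you can preview that output with:
$ curl -L "127.0.0.1/status?json"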
If you use another location of the status/ping pages, don't forget to change the {$PHP_FPM.STATUS.PAGE}/{$PHP_FPM.PING.PAGE}
macro.
If you use another web server port or scheme for the location of the PHP-FPM status/ping pages, don't forget to change the macros {$PHP_FPM.SCHEME}
and {$PHP_FPM.PORT}
.
Name | Description | Default |
---|---|---|
{$PHP_FPM.PORT} | The port of the PHP-FPM status host or container. |
80 |
{$PHP_FPM.SCHEME} | Request scheme which may be http or https |
http |
{$PHP_FPM.HOST} | The hostname or IP address of the PHP-FPM status for a host or container. |
localhost |
{$PHP_FPM.STATUS.PAGE} | The path of the PHP-FPM status page. |
status |
{$PHP_FPM.PING.PAGE} | The path of the PHP-FPM ping page. |
ping |
{$PHP_FPM.PING.REPLY} | The expected reply to the ping. |
pong |
{$PHP_FPM.QUEUE.WARN.MAX} | The maximum percent of the PHP-FPM queue usage for a trigger expression. |
80 |
Name | Description | Type | Key and additional info | |
---|---|---|---|---|
Get ping page | HTTP agent | php-fpm.get_ping | ||
Get status page | HTTP agent | php-fpm.get_status | ||
Ping | Dependent item | php-fpm.ping Preprocessing
|
Regular expression: {$PHP_FPM.PING.REPLY}($|\r?\n) → 1 ⛔️ Custom on fail: Set value to: 0 |
|
Processes, active | The total number of active processes. |
Dependent item | php-fpm.processes_active Preprocessing
|
|
Version | The current version of PHP. You can get it from the HTTP header "X-Powered-By"; it may not work if you have changed the default HTTP headers. |
Dependent item | php-fpm.version Preprocessing
|
|
Pool name | The name of the current pool. |
Dependent item | php-fpm.name Preprocessing
|
|
Uptime | Indicates how long this pool has been running. |
Dependent item | php-fpm.uptime Preprocessing
|
|
Start time | The time when this pool was started. |
Dependent item | php-fpm.start_time Preprocessing
|
|
Processes, total | The total number of server processes currently running. |
Dependent item | php-fpm.processes_total Preprocessing
|
|
Processes, idle | The total number of idle processes. |
Dependent item | php-fpm.processes_idle Preprocessing
|
|
Process manager | The method used by the process manager to control the number of child processes for this pool. |
Dependent item | php-fpm.process_manager Preprocessing
|
|
Processes, max active | The highest value of "active processes" since the PHP-FPM server was started. |
Dependent item | php-fpm.processes_max_active Preprocessing
|
|
Accepted connections per second | The number of accepted requests per second. |
Dependent item | php-fpm.conn_accepted.rate Preprocessing
|
|
Slow requests | The number of requests that have exceeded the request_slowlog_timeout limit. |
Dependent item | php-fpm.slow_requests Preprocessing
|
|
Listen queue | The current number of connections that have been initiated but not yet accepted. |
Dependent item | php-fpm.listen_queue Preprocessing
|
|
Listen queue, max | The maximum number of requests in the queue of pending connections since this FPM pool was started. |
Dependent item | php-fpm.listen_queue_max Preprocessing
|
|
Listen queue, len | The size of the socket queue of pending connections. |
Dependent item | php-fpm.listen_queue_len Preprocessing
|
|
Queue usage | The utilization of the queue. |
Calculated | php-fpm.listen_queue_usage | |
Max children reached | The number of times that the process limit (pm.max_children) has been reached. |
Dependent item | php-fpm.max_children Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
PHP-FPM: Service is down | last(/PHP-FPM by HTTP/php-fpm.ping)=0 or nodata(/PHP-FPM by HTTP/php-fpm.ping,3m)=1 |High |
Manual close: Yes | ||
PHP-FPM: Version has changed | The PHP-FPM version has changed. Acknowledge to close the problem manually. |
last(/PHP-FPM by HTTP/php-fpm.version,#1)<>last(/PHP-FPM by HTTP/php-fpm.version,#2) and length(last(/PHP-FPM by HTTP/php-fpm.version))>0 |Info |
Manual close: Yes | |
PHP-FPM: Failed to fetch info data | Zabbix has not received any data for items for the last 30 minutes. |
nodata(/PHP-FPM by HTTP/php-fpm.uptime,30m)=1 |Info |
Manual close: Yes Depends on:
|
|
PHP-FPM: Pool has been restarted | Uptime is less than 10 minutes. |
last(/PHP-FPM by HTTP/php-fpm.uptime)<10m |Info |
Manual close: Yes | |
PHP-FPM: Manager changed | The PHP-FPM manager has changed. Acknowledge to close the problem manually. |
last(/PHP-FPM by HTTP/php-fpm.process_manager,#1)<>last(/PHP-FPM by HTTP/php-fpm.process_manager,#2) |Info |
Manual close: Yes | |
PHP-FPM: Detected slow requests | The PHP-FPM has detected a slow request. |
min(/PHP-FPM by HTTP/php-fpm.slow_requests,#3)>0 |Warning |
||
PHP-FPM: Queue utilization is high | The queue for this pool has reached {$PHP_FPM.QUEUE.WARN.MAX}% of its maximum capacity. |
min(/PHP-FPM by HTTP/php-fpm.listen_queue_usage,15m) > {$PHP_FPM.QUEUE.WARN.MAX} |Warning |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is developed to monitor the FastCGI Process Manager (PHP-FPM) by Zabbix agent that works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The template PHP-FPM by Zabbix agent active
- collects metrics by polling the PHP-FPM status-page locally with Zabbix agent.
Note that this template doesn't support HTTPS and redirects (limitations of web.page.get
).
It also uses Zabbix agent to collect php-fpm
Linux process statistics, such as CPU usage, memory usage, and whether the process is running or not.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Note that depending on your OS distribution, the PHP-FPM executable/service name can vary. RHEL-like distributions usually name both process and service as php-fpm
, while for Debian/Ubuntu based distributions it may include the version, for example: executable name - php-fpm8.2
, systemd service name - php8.2-fpm
. Adjust the following instructions accordingly if needed.
Open the PHP-FPM configuration file and enable the status page as shown.
pm.status_path = /status
ping.path = /ping
Validate the syntax to ensure it is correct before you reload the service. Replace the <version>
in the command if needed.
$ php-fpm -t
or
$ php-fpm<version> -t
Reload the php-fpm
service to make the change active. Replace the <version>
in the command if needed.
$ systemctl reload php-fpm
or
$ systemctl reload php<version>-fpm
Next, edit the configuration of your web server.
If you use Nginx, edit the configuration file of your Nginx server block (virtual host) and add the location block below it.
# Enable php-fpm status page
location ~ ^/(status|ping)$ {
## disable access logging for request if you prefer
access_log off;
## Only allow trusted IPs for security, deny everyone else
# allow 127.0.0.1;
# allow 1.2.3.4; # your IP here
# deny all;
fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
fastcgi_index index.php;
include fastcgi_params;
## Now the port or socket of the php-fpm pool we want the status of
fastcgi_pass 127.0.0.1:9000;
# fastcgi_pass unix:/run/php-fpm/your_socket.sock;
}
If you use Apache, edit the configuration file of the virtual host and add the following location blocks.
<LocationMatch "/status">
Require ip 127.0.0.1
# Require ip 1.2.3.4 # Your IP here
# Adjust the path to the socket if needed
ProxyPass "unix:/run/php-fpm/www.sock|fcgi://localhost/status"
</LocationMatch>
<LocationMatch "/ping">
Require ip 127.0.0.1
# Require ip 1.2.3.4 # Your IP here
# Adjust the path to the socket if needed
ProxyPass "unix:/run/php-fpm/www.sock|fcgi://localhost/ping"
</LocationMatch>
Validate the web server configuration to ensure it is correct.
$ nginx -t
or
$ httpd -t
or
$ apachectl configtest
Reload the web server configuration. The command may vary depending on the OS distribution and web server.
$ systemctl reload nginx
or
$ systemctl reload httpd
or
$ systemctl reload apache2
Verify that the pages are available with these commands.
curl -L 127.0.0.1/status
curl -L 127.0.0.1/ping
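Optionally, you can preview what the agent item itself would return. Note that this template uses active checks, so zabbix_get only works if the agent also accepts passive checks; the host, path, and port below are the macro defaults.
$ zabbix_get -s 127.0.0.1 -k 'web.page.get[localhost,"status?json",80]'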
Depending on your OS distribution, the PHP-FPM process name may vary as well. Please check the actual name in the "Name" line of the /proc/<pid>/status file (https://www.zabbix.com/documentation/7.0/manual/appendix/items/procmemnumnotes) and change the {$PHP_FPM.PROCESS.NAME.PARAMETER} macro if needed.
If you use another location of the status/ping pages, don't forget to change the {$PHP_FPM.STATUS.PAGE}/{$PHP_FPM.PING.PAGE}
macro.
If you use another web server port for the location of the PHP-FPM status/ping pages, don't forget to change the macro {$PHP_FPM.PORT}
.
Name | Description | Default |
---|---|---|
{$PHP_FPM.PORT} | The port of the PHP-FPM status host or container. |
80 |
{$PHP_FPM.HOST} | The hostname or IP address of the PHP-FPM status for a host or container. |
localhost |
{$PHP_FPM.STATUS.PAGE} | The path of the PHP-FPM status page. |
status |
{$PHP_FPM.PING.PAGE} | The path of the PHP-FPM ping page. |
ping |
{$PHP_FPM.PING.REPLY} | The expected reply to the ping. |
pong |
{$PHP_FPM.QUEUE.WARN.MAX} | The maximum percent of the PHP-FPM queue usage for a trigger expression. |
80 |
{$PHP_FPM.PROCESS_NAME} | The process name filter for the PHP-FPM process discovery. May vary depending on your OS distribution. |
php-fpm |
{$PHP_FPM.PROCESS.NAME.PARAMETER} | The process name of the PHP-FPM used in the item key |
Name | Description | Type | Key and additional info | |
---|---|---|---|---|
Get processes summary | The aggregated data of summary metrics for all processes. |
Zabbix agent (active) | proc.get[{$PHP_FPM.PROCESS.NAME.PARAMETER},,,summary] | |
php-fpm_ping | Zabbix agent (active) | web.page.get["{$PHP_FPM.HOST}","{$PHP_FPM.PING.PAGE}","{$PHP_FPM.PORT}"] | ||
Get status page | Zabbix agent (active) | web.page.get["{$PHP_FPM.HOST}","{$PHP_FPM.STATUS.PAGE}?json","{$PHP_FPM.PORT}"] Preprocessing
|
||
Ping | Dependent item | php-fpm.ping Preprocessing
|
Regular expression: {$PHP_FPM.PING.REPLY}($|\r?\n) → 1 ⛔️ Custom on fail: Set value to: 0 |
|
Processes, active | The total number of active processes. |
Dependent item | php-fpm.processes_active Preprocessing
|
|
Version | The current version of PHP. You can get it from the HTTP header "X-Powered-By"; it may not work if you have changed the default HTTP headers. |
Dependent item | php-fpm.version Preprocessing
|
|
Pool name | The name of the current pool. |
Dependent item | php-fpm.name Preprocessing
|
|
Uptime | Indicates how long this pool has been running. |
Dependent item | php-fpm.uptime Preprocessing
|
|
Start time | The time when this pool was started. |
Dependent item | php-fpm.start_time Preprocessing
|
|
Processes, total | The total number of server processes currently running. |
Dependent item | php-fpm.processes_total Preprocessing
|
|
Processes, idle | The total number of idle processes. |
Dependent item | php-fpm.processes_idle Preprocessing
|
|
Queue usage | The utilization of the queue. |
Calculated | php-fpm.listen_queue_usage | |
Process manager | The method used by the process manager to control the number of child processes for this pool. |
Dependent item | php-fpm.process_manager Preprocessing
|
|
Processes, max active | The highest value of "active processes" since the PHP-FPM server was started. |
Dependent item | php-fpm.processes_max_active Preprocessing
|
|
Accepted connections per second | The number of accepted requests per second. |
Dependent item | php-fpm.conn_accepted.rate Preprocessing
|
|
Slow requests | The number of requests that have exceeded the request_slowlog_timeout limit. |
Dependent item | php-fpm.slow_requests Preprocessing
|
|
Listen queue | The current number of connections that have been initiated but not yet accepted. |
Dependent item | php-fpm.listen_queue Preprocessing
|
|
Listen queue, max | The maximum number of requests in the queue of pending connections since this FPM pool was started. |
Dependent item | php-fpm.listen_queue_max Preprocessing
|
|
Listen queue, len | The size of the socket queue of pending connections. |
Dependent item | php-fpm.listen_queue_len Preprocessing
|
|
Max children reached | The number of times that the process limit (pm.max_children) has been reached. |
Dependent item | php-fpm.max_children Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
PHP-FPM: Version has changed | The PHP-FPM version has changed. Acknowledge to close the problem manually. |
last(/PHP-FPM by Zabbix agent active/php-fpm.version,#1)<>last(/PHP-FPM by Zabbix agent active/php-fpm.version,#2) and length(last(/PHP-FPM by Zabbix agent active/php-fpm.version))>0 |Info |
Manual close: Yes | |
PHP-FPM: Pool has been restarted | Uptime is less than 10 minutes. |
last(/PHP-FPM by Zabbix agent active/php-fpm.uptime)<10m |Info |
Manual close: Yes | |
PHP-FPM: Queue utilization is high | The queue for this pool has reached {$PHP_FPM.QUEUE.WARN.MAX}% of its maximum capacity. |
min(/PHP-FPM by Zabbix agent active/php-fpm.listen_queue_usage,15m) > {$PHP_FPM.QUEUE.WARN.MAX} |Warning |
||
PHP-FPM: Manager changed | The PHP-FPM manager has changed. Acknowledge to close the problem manually. |
last(/PHP-FPM by Zabbix agent active/php-fpm.process_manager,#1)<>last(/PHP-FPM by Zabbix agent active/php-fpm.process_manager,#2) |Info |
Manual close: Yes | |
PHP-FPM: Detected slow requests | The PHP-FPM has detected a slow request. |
min(/PHP-FPM by Zabbix agent active/php-fpm.slow_requests,#3)>0 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
PHP-FPM process discovery | The discovery of the PHP-FPM summary processes. |
Dependent item | php-fpm.proc.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Get process data | The summary metrics aggregated by a process |
Dependent item | php-fpm.proc.get[{#PHP_FPM.NAME}] Preprocessing
|
Memory usage (rss) | The summary of resident set size memory used by a process |
Dependent item | php-fpm.proc.rss[{#PHP_FPM.NAME}] Preprocessing
|
Memory usage (vsize) | The summary of virtual memory used by a process |
Dependent item | php-fpm.proc.vmem[{#PHP_FPM.NAME}] Preprocessing
|
Memory usage, % | The percentage of real memory used by a process |
Dependent item | php-fpm.proc.pmem[{#PHP_FPM.NAME}] Preprocessing
|
Number of running processes | The number of running processes |
Dependent item | php-fpm.proc.num[{#PHP_FPM.NAME}] Preprocessing
|
CPU utilization | The percentage of the CPU utilization by a process |
Zabbix agent (active) | proc.cpu.util[{#PHP_FPM.NAME}] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
PHP-FPM: Process is not running | last(/PHP-FPM by Zabbix agent active/php-fpm.proc.num[{#PHP_FPM.NAME}])=0 |High |
|||
PHP-FPM: Failed to fetch info data | Zabbix has not received any data for items for the last 30 minutes. |
nodata(/PHP-FPM by Zabbix agent active/php-fpm.uptime,30m)=1 and last(/PHP-FPM by Zabbix agent active/php-fpm.proc.num[{#PHP_FPM.NAME}])>0 |Info |
Manual close: Yes | |
PHP-FPM: Service is down | (last(/PHP-FPM by Zabbix agent active/php-fpm.ping)=0 or nodata(/PHP-FPM by Zabbix agent active/php-fpm.ping,3m)=1) and last(/PHP-FPM by Zabbix agent active/php-fpm.proc.num[{#PHP_FPM.NAME}])>0 |High |
Manual close: Yes |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is developed to monitor the FastCGI Process Manager (PHP-FPM) by Zabbix agent that works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The template PHP-FPM by Zabbix agent
- collects metrics by polling the PHP-FPM status-page locally with Zabbix agent.
Note that this template doesn't support HTTPS and redirects (limitations of web.page.get
).
It also uses Zabbix agent to collect php-fpm
Linux process statistics, such as CPU usage, memory usage, and whether the process is running or not.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Note that depending on your OS distribution, the PHP-FPM executable/service name can vary. RHEL-like distributions usually name both process and service as php-fpm
, while for Debian/Ubuntu based distributions it may include the version, for example: executable name - php-fpm8.2
, systemd service name - php8.2-fpm
. Adjust the following instructions accordingly if needed.
Open the PHP-FPM configuration file and enable the status page as shown.
pm.status_path = /status
ping.path = /ping
Validate the syntax to ensure it is correct before you reload the service. Replace the <version>
in the command if needed.
$ php-fpm -t
or
$ php-fpm<version> -t
Reload the php-fpm
service to make the change active. Replace the <version>
in the command if needed.
$ systemctl reload php-fpm
or
$ systemctl reload php<version>-fpm
Next, edit the configuration of your web server.
If you use Nginx, edit the configuration file of your Nginx server block (virtual host) and add the location block below it.
# Enable php-fpm status page
location ~ ^/(status|ping)$ {
## disable access logging for request if you prefer
access_log off;
## Only allow trusted IPs for security, deny everyone else
# allow 127.0.0.1;
# allow 1.2.3.4; # your IP here
# deny all;
fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
fastcgi_index index.php;
include fastcgi_params;
## Now the port or socket of the php-fpm pool we want the status of
fastcgi_pass 127.0.0.1:9000;
# fastcgi_pass unix:/run/php-fpm/your_socket.sock;
}
If you use Apache, edit the configuration file of the virtual host and add the following location blocks.
<LocationMatch "/status">
Require ip 127.0.0.1
# Require ip 1.2.3.4 # Your IP here
# Adjust the path to the socket if needed
ProxyPass "unix:/run/php-fpm/www.sock|fcgi://localhost/status"
</LocationMatch>
<LocationMatch "/ping">
Require ip 127.0.0.1
# Require ip 1.2.3.4 # Your IP here
# Adjust the path to the socket if needed
ProxyPass "unix:/run/php-fpm/www.sock|fcgi://localhost/ping"
</LocationMatch>
Validate the web server configuration to ensure it is correct.
$ nginx -t
or
$ httpd -t
or
$ apachectl configtest
Reload the web server configuration. The command may vary depending on the OS distribution and web server.
$ systemctl reload nginx
or
$ systemctl reload httpd
or
$ systemctl reload apache2
Verify that the pages are available with these commands.
curl -L 127.0.0.1/status
curl -L 127.0.0.1/ping
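The process discovery in this template relies on the agent's process summary. To check it manually (assuming the default php-fpm process name and that zabbix_get can reach the agent):
$ zabbix_get -s 127.0.0.1 -k 'proc.get[php-fpm,,,summary]'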
Depending on your OS distribution, the PHP-FPM process name may vary as well. Please check the actual name in the "Name" line of the /proc/<pid>/status file (https://www.zabbix.com/documentation/7.0/manual/appendix/items/procmemnumnotes) and change the {$PHP_FPM.PROCESS.NAME.PARAMETER} macro if needed.
If you use another location of the status/ping pages, don't forget to change the {$PHP_FPM.STATUS.PAGE}/{$PHP_FPM.PING.PAGE}
macro.
If you use another web server port for the location of the PHP-FPM status/ping pages, don't forget to change the macro {$PHP_FPM.PORT}
.
Name | Description | Default |
---|---|---|
{$PHP_FPM.PORT} | The port of the PHP-FPM status host or container. |
80 |
{$PHP_FPM.HOST} | The hostname or IP address of the PHP-FPM status for a host or container. |
localhost |
{$PHP_FPM.STATUS.PAGE} | The path of the PHP-FPM status page. |
status |
{$PHP_FPM.PING.PAGE} | The path of the PHP-FPM ping page. |
ping |
{$PHP_FPM.PING.REPLY} | The expected reply to the ping. |
pong |
{$PHP_FPM.QUEUE.WARN.MAX} | The maximum percent of the PHP-FPM queue usage for a trigger expression. |
80 |
{$PHP_FPM.PROCESS_NAME} | The process name filter for the PHP-FPM process discovery. May vary depending on your OS distribution. |
php-fpm |
{$PHP_FPM.PROCESS.NAME.PARAMETER} | The process name of the PHP-FPM used in the item key |
Name | Description | Type | Key and additional info | |
---|---|---|---|---|
Get processes summary | The aggregated data of summary metrics for all processes. |
Zabbix agent | proc.get[{$PHP_FPM.PROCESS.NAME.PARAMETER},,,summary] | |
php-fpm_ping | Zabbix agent | web.page.get["{$PHP_FPM.HOST}","{$PHP_FPM.PING.PAGE}","{$PHP_FPM.PORT}"] | ||
Get status page | Zabbix agent | web.page.get["{$PHP_FPM.HOST}","{$PHP_FPM.STATUS.PAGE}?json","{$PHP_FPM.PORT}"] Preprocessing
|
||
Ping | Dependent item | php-fpm.ping Preprocessing
|
Regular expression: {$PHP_FPM.PING.REPLY}($|\r?\n) → 1 ⛔️ Custom on fail: Set value to: 0 |
|
Processes, active | The total number of active processes. |
Dependent item | php-fpm.processes_active Preprocessing
|
|
Version | The current version of PHP. You can get it from the HTTP header "X-Powered-By"; it may not work if you have changed the default HTTP headers. |
Dependent item | php-fpm.version Preprocessing
|
|
Pool name | The name of the current pool. |
Dependent item | php-fpm.name Preprocessing
|
|
Uptime | Indicates how long this pool has been running. |
Dependent item | php-fpm.uptime Preprocessing
|
|
Start time | The time when this pool was started. |
Dependent item | php-fpm.start_time Preprocessing
|
|
Processes, total | The total number of server processes currently running. |
Dependent item | php-fpm.processes_total Preprocessing
|
|
Processes, idle | The total number of idle processes. |
Dependent item | php-fpm.processes_idle Preprocessing
|
|
Queue usage | The utilization of the queue. |
Calculated | php-fpm.listen_queue_usage | |
Process manager | The method used by the process manager to control the number of child processes for this pool. |
Dependent item | php-fpm.process_manager Preprocessing
|
|
Processes, max active | The highest value of "active processes" since the PHP-FPM server was started. |
Dependent item | php-fpm.processes_max_active Preprocessing
|
|
Accepted connections per second | The number of accepted requests per second. |
Dependent item | php-fpm.conn_accepted.rate Preprocessing
|
|
Slow requests | The number of requests that have exceeded the request_slowlog_timeout limit. |
Dependent item | php-fpm.slow_requests Preprocessing
|
|
Listen queue | The current number of connections that have been initiated but not yet accepted. |
Dependent item | php-fpm.listen_queue Preprocessing
|
|
Listen queue, max | The maximum number of requests in the queue of pending connections since this FPM pool was started. |
Dependent item | php-fpm.listen_queue_max Preprocessing
|
|
Listen queue, len | The size of the socket queue of pending connections. |
Dependent item | php-fpm.listen_queue_len Preprocessing
|
|
Max children reached | The number of times that the process limit (pm.max_children) has been reached. |
Dependent item | php-fpm.max_children Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
PHP-FPM: Version has changed | The PHP-FPM version has changed. Acknowledge to close the problem manually. |
last(/PHP-FPM by Zabbix agent/php-fpm.version,#1)<>last(/PHP-FPM by Zabbix agent/php-fpm.version,#2) and length(last(/PHP-FPM by Zabbix agent/php-fpm.version))>0 |Info |
Manual close: Yes | |
PHP-FPM: Pool has been restarted | Uptime is less than 10 minutes. |
last(/PHP-FPM by Zabbix agent/php-fpm.uptime)<10m |Info |
Manual close: Yes | |
PHP-FPM: Queue utilization is high | The queue for this pool has reached |
min(/PHP-FPM by Zabbix agent/php-fpm.listen_queue_usage,15m) > {$PHP_FPM.QUEUE.WARN.MAX} |Warning |
||
PHP-FPM: Manager changed | The PHP-FPM manager has changed. Acknowledge to close the problem manually. |
last(/PHP-FPM by Zabbix agent/php-fpm.process_manager,#1)<>last(/PHP-FPM by Zabbix agent/php-fpm.process_manager,#2) |Info |
Manual close: Yes | |
PHP-FPM: Detected slow requests | The PHP-FPM has detected a slow request. |
min(/PHP-FPM by Zabbix agent/php-fpm.slow_requests,#3)>0 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
PHP-FPM process discovery | The discovery of the PHP-FPM summary processes. |
Dependent item | php-fpm.proc.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Get process data | The summary metrics aggregated by a process |
Dependent item | php-fpm.proc.get[{#PHP_FPM.NAME}] Preprocessing
|
Memory usage (rss) | The summary of resident set size memory used by a process |
Dependent item | php-fpm.proc.rss[{#PHP_FPM.NAME}] Preprocessing
|
Memory usage (vsize) | The summary of virtual memory used by a process |
Dependent item | php-fpm.proc.vmem[{#PHP_FPM.NAME}] Preprocessing
|
Memory usage, % | The percentage of real memory used by a process |
Dependent item | php-fpm.proc.pmem[{#PHP_FPM.NAME}] Preprocessing
|
Number of running processes | The number of running processes |
Dependent item | php-fpm.proc.num[{#PHP_FPM.NAME}] Preprocessing
|
CPU utilization | The percentage of the CPU utilization by a process |
Zabbix agent | proc.cpu.util[{#PHP_FPM.NAME}] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
PHP-FPM: Process is not running | last(/PHP-FPM by Zabbix agent/php-fpm.proc.num[{#PHP_FPM.NAME}])=0 |High |
|||
PHP-FPM: Failed to fetch info data | Zabbix has not received any data for items for the last 30 minutes. |
nodata(/PHP-FPM by Zabbix agent/php-fpm.uptime,30m)=1 and last(/PHP-FPM by Zabbix agent/php-fpm.proc.num[{#PHP_FPM.NAME}])>0 |Info |
Manual close: Yes | |
PHP-FPM: Service is down | (last(/PHP-FPM by Zabbix agent/php-fpm.ping)=0 or nodata(/PHP-FPM by Zabbix agent/php-fpm.ping,3m)=1) and last(/PHP-FPM by Zabbix agent/php-fpm.proc.num[{#PHP_FPM.NAME}])>0 |High |
Manual close: Yes |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
Template for monitoring pfSense by SNMP
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Name | Description | Default |
---|---|---|
{$IF.ERRORS.WARN} | Threshold of the error packet rate for the warning trigger. Can be used with the interface name as context (see the example after this table). |
2 |
{$IF.UTIL.MAX} | Threshold of interface bandwidth utilization for warning trigger in %. Can be used with interface name as context. |
90 |
{$IFCONTROL} | Macro for operational state of the interface for link down trigger. Can be used with interface name as context. |
1 |
{$NET.IF.IFADMINSTATUS.MATCHES} | This macro is used in filters of network interfaces discovery rule. |
^.* |
{$NET.IF.IFADMINSTATUS.NOT_MATCHES} | Ignore down(2) administrative status. |
^2$ |
{$NET.IF.IFALIAS.MATCHES} | This macro is used in filters of network interfaces discovery rule. |
.* |
{$NET.IF.IFALIAS.NOT_MATCHES} | This macro is used in filters of network interfaces discovery rule. |
CHANGE_IF_NEEDED |
{$NET.IF.IFDESCR.MATCHES} | This macro is used in filters of network interfaces discovery rule. |
.* |
{$NET.IF.IFDESCR.NOT_MATCHES} | This macro is used in filters of network interfaces discovery rule. |
CHANGE_IF_NEEDED |
{$NET.IF.IFNAME.NOT_MATCHES} | This macro is used in filters of network interfaces discovery rule. |
(^pflog[0-9.]*$|^pfsync[0-9.]*$) |
{$NET.IF.IFOPERSTATUS.MATCHES} | This macro is used in filters of network interfaces discovery rule. |
^.*$ |
{$NET.IF.IFOPERSTATUS.NOT_MATCHES} | Ignore notPresent(6). |
^6$ |
{$NET.IF.IFTYPE.MATCHES} | This macro is used in filters of network interfaces discovery rule. |
.* |
{$NET.IF.IFTYPE.NOT_MATCHES} | This macro is used in filters of network interfaces discovery rule. |
CHANGE_IF_NEEDED |
{$SNMP.TIMEOUT} | The time interval for SNMP availability trigger. |
5m |
{$STATE.TABLE.UTIL.MAX} | Threshold of state table utilization trigger in %. |
90 |
{$SOURCE.TRACKING.TABLE.UTIL.MAX} | Threshold of source tracking table utilization trigger in %. |
90 |
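Several of the threshold macros above support Zabbix user macro context, so a value can be overridden for a single interface without changing the global default. The values and the interface name `em0` below are illustrative only:

```
{$IF.ERRORS.WARN}       = 2     (global default for all interfaces)
{$IF.ERRORS.WARN:"em0"} = 10    (override for the interface named "em0", example name)
{$IF.UTIL.MAX:"em0"}    = 95    (per-interface bandwidth threshold override, example value)
```

The trigger expressions later in this section reference the macros as {$IF.ERRORS.WARN:"{#IFNAME}"}, so a context value takes precedence whenever it is defined for the discovered interface.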
Name | Description | Type | Key and additional info |
---|---|---|---|
SNMP agent availability | Availability of SNMP checks on the host. The value of this item corresponds to availability icons in the host list. Possible values: 0 - not available, 1 - available, 2 - unknown. |
Zabbix internal | zabbix[host,snmp,available] |
Packet filter running status | MIB: BEGEMOT-PF-MIB True if packet filter is currently enabled. |
SNMP agent | pfsense.pf.status |
States table current | MIB: BEGEMOT-PF-MIB Number of entries in the state table. |
SNMP agent | pfsense.state.table.count |
States table limit | MIB: BEGEMOT-PF-MIB Maximum number of 'keep state' rules in the ruleset. |
SNMP agent | pfsense.state.table.limit |
States table utilization in % | Utilization of the state table in % (see the formula sketch after this table). |
Calculated | pfsense.state.table.pused |
Source tracking table current | MIB: BEGEMOT-PF-MIB Number of entries in the source tracking table. |
SNMP agent | pfsense.source.tracking.table.count |
Source tracking table limit | MIB: BEGEMOT-PF-MIB Maximum number of 'sticky-address' or 'source-track' rules in the ruleset. |
SNMP agent | pfsense.source.tracking.table.limit |
Source tracking table utilization in % | Utilization of source tracking table in %. |
Calculated | pfsense.source.tracking.table.pused |
DHCP server status | MIB: HOST-RESOURCES-MIB The status of DHCP server process. |
SNMP agent | pfsense.dhcpd.status Preprocessing
|
DNS server status | MIB: HOST-RESOURCES-MIB The status of DNS server process. |
SNMP agent | pfsense.dns.status Preprocessing
|
State of nginx process | MIB: HOST-RESOURCES-MIB The status of nginx process. |
SNMP agent | pfsense.nginx.status Preprocessing
|
Packets matched a filter rule | MIB: BEGEMOT-PF-MIB True if the packet was logged with the specified packet filter reason code. The known codes are: match, bad-offset, fragment, short, normalize, and memory. |
SNMP agent | pfsense.packets.match Preprocessing
|
Packets with bad offset | MIB: BEGEMOT-PF-MIB True if the packet was logged with the specified packet filter reason code. The known codes are: match, bad-offset, fragment, short, normalize, and memory. |
SNMP agent | pfsense.packets.bad.offset Preprocessing
|
Fragmented packets | MIB: BEGEMOT-PF-MIB True if the packet was logged with the specified packet filter reason code. The known codes are: match, bad-offset, fragment, short, normalize, and memory. |
SNMP agent | pfsense.packets.fragment Preprocessing
|
Short packets | MIB: BEGEMOT-PF-MIB True if the packet was logged with the specified packet filter reason code. The known codes are: match, bad-offset, fragment, short, normalize, and memory. |
SNMP agent | pfsense.packets.short Preprocessing
|
Normalized packets | MIB: BEGEMOT-PF-MIB True if the packet was logged with the specified packet filter reason code. The known codes are: match, bad-offset, fragment, short, normalize, and memory. |
SNMP agent | pfsense.packets.normalize Preprocessing
|
Packets dropped due to memory limitation | MIB: BEGEMOT-PF-MIB True if the packet was logged with the specified packet filter reason code. The known codes are: match, bad-offset, fragment, short, normalize, and memory. |
SNMP agent | pfsense.packets.mem.drop Preprocessing
|
Firewall rules count | MIB: BEGEMOT-PF-MIB The number of labeled filter rules on this system. |
SNMP agent | pfsense.rules.count |
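The two utilization items above (`pfsense.state.table.pused` and `pfsense.source.tracking.table.pused`) are Calculated items built from the corresponding count and limit items. A plausible sketch of the formulas, assuming the item keys listed above (the exact expressions in the template may differ):

```
Key:     pfsense.state.table.pused
Formula: 100 * last(//pfsense.state.table.count) / last(//pfsense.state.table.limit)

Key:     pfsense.source.tracking.table.pused
Formula: 100 * last(//pfsense.source.tracking.table.count) / last(//pfsense.source.tracking.table.limit)
```

These percentages are what the {$STATE.TABLE.UTIL.MAX} and {$SOURCE.TRACKING.TABLE.UTIL.MAX} triggers below evaluate.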
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
PFSense: No SNMP data collection | SNMP is not available for polling. Please check device connectivity and SNMP settings. |
max(/PFSense by SNMP/zabbix[host,snmp,available],{$SNMP.TIMEOUT})=0 |Warning |
||
PFSense: Packet filter is not running | Please check PF status. |
last(/PFSense by SNMP/pfsense.pf.status)<>1 |High |
||
PFSense: State table usage is high | Please check the number of connections https://docs.netgate.com/pfsense/en/latest/config/advanced-firewall-nat.html#config-advanced-firewall-maxstates |
min(/PFSense by SNMP/pfsense.state.table.pused,#3)>{$STATE.TABLE.UTIL.MAX} |Warning |
||
PFSense: Source tracking table usage is high | Please check the number of sticky connections https://docs.netgate.com/pfsense/en/latest/monitoring/status/firewall-states-sources.html |
min(/PFSense by SNMP/pfsense.source.tracking.table.pused,#3)>{$SOURCE.TRACKING.TABLE.UTIL.MAX} |Warning |
||
PFSense: DHCP server is not running | Please check DHCP server settings https://docs.netgate.com/pfsense/en/latest/services/dhcp/index.html |
last(/PFSense by SNMP/pfsense.dhcpd.status)=0 |Average |
||
PFSense: DNS server is not running | Please check DNS server settings https://docs.netgate.com/pfsense/en/latest/services/dns/index.html |
last(/PFSense by SNMP/pfsense.dns.status)=0 |Average |
||
PFSense: Web server is not running | Please check nginx service status. |
last(/PFSense by SNMP/pfsense.nginx.status)=0 |Average |
Name | Description | Type | Key and additional info |
---|---|---|---|
Network interfaces discovery | Discovering interfaces from IF-MIB. |
SNMP agent | pfsense.net.if.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Interface [{#IFNAME}({#IFALIAS})]: Inbound packets discarded | MIB: IF-MIB The number of inbound packets which were chosen to be discarded even though no errors had been detected to prevent their being deliverable to a higher-layer protocol. One possible reason for discarding such a packet could be to free up buffer space. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP agent | net.if.in.discards[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Inbound packets with errors | MIB: IF-MIB For packet-oriented interfaces, the number of inbound packets that contained errors preventing them from being deliverable to a higher-layer protocol. For character-oriented or fixed-length interfaces, the number of inbound transmission units that contained errors preventing them from being deliverable to a higher-layer protocol. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP agent | net.if.in.errors[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Bits received | MIB: IF-MIB The total number of octets received on the interface, including framing characters. This object is a 64-bit version of ifInOctets. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP agent | net.if.in[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Outbound packets discarded | MIB: IF-MIB The number of outbound packets which were chosen to be discarded even though no errors had been detected to prevent their being deliverable to a higher-layer protocol. One possible reason for discarding such a packet could be to free up buffer space. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP agent | net.if.out.discards[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Outbound packets with errors | MIB: IF-MIB For packet-oriented interfaces, the number of outbound packets that contained errors preventing them from being deliverable to a higher-layer protocol. For character-oriented or fixed-length interfaces, the number of outbound transmission units that contained errors preventing them from being deliverable to a higher-layer protocol. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP agent | net.if.out.errors[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Bits sent | MIB: IF-MIB The total number of octets transmitted out of the interface, including framing characters. This object is a 64-bit version of ifOutOctets.Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP agent | net.if.out[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Speed | MIB: IF-MIB An estimate of the interface's current bandwidth in units of 1,000,000 bits per second. If this object reports a value of |
SNMP agent | net.if.speed[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Operational status | MIB: IF-MIB The current operational state of the interface. - The testing(3) state indicates that no operational packets can be passed - If ifAdminStatus is down(2) then ifOperStatus should be down(2) - If ifAdminStatus is changed to up(1) then ifOperStatus should change to up(1) if the interface is ready to transmit and receive network traffic - It should change to dormant(5) if the interface is waiting for external actions (such as a serial line waiting for an incoming connection) - It should remain in the down(2) state if and only if there is a fault that prevents it from going to the up(1) state - It should remain in the notPresent(6) state if the interface has missing (typically, hardware) components. |
SNMP agent | net.if.status[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Interface type | MIB: IF-MIB The type of interface. Additional values for ifType are assigned by the Internet Assigned Numbers Authority (IANA), through updating the syntax of the IANAifType textual convention. |
SNMP agent | net.if.type[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Rules references count | MIB: BEGEMOT-PF-MIB The number of rules referencing this interface. |
SNMP agent | net.if.rules.refs[{#SNMPINDEX}] |
Interface [{#IFNAME}({#IFALIAS})]: Inbound IPv4 traffic passed | MIB: BEGEMOT-PF-MIB IPv4 bits per second passed coming in on this interface. |
SNMP agent | net.if.in.pass.v4.bps[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Inbound IPv4 traffic blocked | MIB: BEGEMOT-PF-MIB IPv4 bits per second blocked coming in on this interface. |
SNMP agent | net.if.in.block.v4.bps[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Outbound IPv4 traffic passed | MIB: BEGEMOT-PF-MIB IPv4 bits per second passed going out on this interface. |
SNMP agent | net.if.out.pass.v4.bps[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Outbound IPv4 traffic blocked | MIB: BEGEMOT-PF-MIB IPv4 bits per second blocked going out on this interface. |
SNMP agent | net.if.out.block.v4.bps[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Inbound IPv4 packets passed | MIB: BEGEMOT-PF-MIB The number of IPv4 packets passed coming in on this interface. |
SNMP agent | net.if.in.pass.v4.pps[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Inbound IPv4 packets blocked | MIB: BEGEMOT-PF-MIB The number of IPv4 packets blocked coming in on this interface. |
SNMP agent | net.if.in.block.v4.pps[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Outbound IPv4 packets passed | MIB: BEGEMOT-PF-MIB The number of IPv4 packets passed going out on this interface. |
SNMP agent | net.if.out.pass.v4.pps[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Outbound IPv4 packets blocked | MIB: BEGEMOT-PF-MIB The number of IPv4 packets blocked going out on this interface. |
SNMP agent | net.if.out.block.v4.pps[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Inbound IPv6 traffic passed | MIB: BEGEMOT-PF-MIB IPv6 bits per second passed coming in on this interface. |
SNMP agent | net.if.in.pass.v6.bps[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Inbound IPv6 traffic blocked | MIB: BEGEMOT-PF-MIB IPv6 bits per second blocked coming in on this interface. |
SNMP agent | net.if.in.block.v6.bps[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Outbound IPv6 traffic passed | MIB: BEGEMOT-PF-MIB IPv6 bits per second passed going out on this interface. |
SNMP agent | net.if.out.pass.v6.bps[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Outbound IPv6 traffic blocked | MIB: BEGEMOT-PF-MIB IPv6 bits per second blocked going out on this interface. |
SNMP agent | net.if.out.block.v6.bps[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Inbound IPv6 packets passed | MIB: BEGEMOT-PF-MIB The number of IPv6 packets passed coming in on this interface. |
SNMP agent | net.if.in.pass.v6.pps[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Inbound IPv6 packets blocked | MIB: BEGEMOT-PF-MIB The number of IPv6 packets blocked coming in on this interface. |
SNMP agent | net.if.in.block.v6.pps[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Outbound IPv6 packets passed | MIB: BEGEMOT-PF-MIB The number of IPv6 packets passed going out on this interface. |
SNMP agent | net.if.out.pass.v6.pps[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Outbound IPv6 packets blocked | MIB: BEGEMOT-PF-MIB The number of IPv6 packets blocked going out on this interface. |
SNMP agent | net.if.out.block.v6.pps[{#SNMPINDEX}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
PFSense: Interface [{#IFNAME}({#IFALIAS})]: High input error rate | It recovers when it is below 80% of the |
min(/PFSense by SNMP/net.if.in.errors[{#SNMPINDEX}],5m)>{$IF.ERRORS.WARN:"{#IFNAME}"} |Warning |
Depends on:
|
|
PFSense: Interface [{#IFNAME}({#IFALIAS})]: High inbound bandwidth usage | The utilization of the network interface is close to its estimated maximum bandwidth. |
(avg(/PFSense by SNMP/net.if.in[{#SNMPINDEX}],15m)>({$IF.UTIL.MAX:"{#IFNAME}"}/100)*last(/PFSense by SNMP/net.if.speed[{#SNMPINDEX}])) and last(/PFSense by SNMP/net.if.speed[{#SNMPINDEX}])>0 |Warning |
Depends on:
|
|
PFSense: Interface [{#IFNAME}({#IFALIAS})]: High output error rate | It recovers when it is below 80% of the |
min(/PFSense by SNMP/net.if.out.errors[{#SNMPINDEX}],5m)>{$IF.ERRORS.WARN:"{#IFNAME}"} |Warning |
Depends on:
|
|
PFSense: Interface [{#IFNAME}({#IFALIAS})]: High outbound bandwidth usage | The utilization of the network interface is close to its estimated maximum bandwidth. |
(avg(/PFSense by SNMP/net.if.out[{#SNMPINDEX}],15m)>({$IF.UTIL.MAX:"{#IFNAME}"}/100)*last(/PFSense by SNMP/net.if.speed[{#SNMPINDEX}])) and last(/PFSense by SNMP/net.if.speed[{#SNMPINDEX}])>0 |Warning |
Depends on:
|
|
PFSense: Interface [{#IFNAME}({#IFALIAS})]: Ethernet has changed to lower speed than it was before | This Ethernet connection has transitioned down from its known maximum speed. This might be a sign of autonegotiation issues. Acknowledge to close the problem manually. |
change(/PFSense by SNMP/net.if.speed[{#SNMPINDEX}])<0 and last(/PFSense by SNMP/net.if.speed[{#SNMPINDEX}])>0 and ( last(/PFSense by SNMP/net.if.type[{#SNMPINDEX}])=6 or last(/PFSense by SNMP/net.if.type[{#SNMPINDEX}])=7 or last(/PFSense by SNMP/net.if.type[{#SNMPINDEX}])=11 or last(/PFSense by SNMP/net.if.type[{#SNMPINDEX}])=62 or last(/PFSense by SNMP/net.if.type[{#SNMPINDEX}])=69 or last(/PFSense by SNMP/net.if.type[{#SNMPINDEX}])=117 ) and (last(/PFSense by SNMP/net.if.status[{#SNMPINDEX}])<>2) |Info |
Depends on:
|
|
PFSense: Interface [{#IFNAME}({#IFALIAS})]: Link down | This trigger expression works as follows: |
{$IFCONTROL:"{#IFNAME}"}=1 and (last(/PFSense by SNMP/net.if.status[{#SNMPINDEX}])=2) |Average |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
Template for monitoring OPNsense by SNMP
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Name | Description | Default |
---|---|---|
{$IF.ERRORS.WARN} | Threshold of error packets rate for warning trigger. Can be used with interface name as context. |
2 |
{$IF.UTIL.MAX} | Threshold of interface bandwidth utilization for warning trigger in %. Can be used with interface name as context. |
90 |
{$IFCONTROL} | Macro for operational state of the interface for link down trigger. Can be used with interface name as context. |
1 |
{$NET.IF.IFADMINSTATUS.MATCHES} | This macro is used in filters of network interfaces discovery rule. |
^.* |
{$NET.IF.IFADMINSTATUS.NOT_MATCHES} | Ignore down(2) administrative status. |
^2$ |
{$NET.IF.IFALIAS.MATCHES} | This macro is used in filters of network interfaces discovery rule. |
.* |
{$NET.IF.IFALIAS.NOT_MATCHES} | This macro is used in filters of network interfaces discovery rule. |
CHANGE_IF_NEEDED |
{$NET.IF.IFDESCR.MATCHES} | This macro is used in filters of network interfaces discovery rule. |
.* |
{$NET.IF.IFDESCR.NOT_MATCHES} | This macro is used in filters of network interfaces discovery rule. |
CHANGE_IF_NEEDED |
{$NET.IF.IFNAME.NOT_MATCHES} | This macro is used in filters of network interfaces discovery rule. |
(^pflog[0-9.]*$|^pfsync[0-9.]*$) |
{$NET.IF.IFOPERSTATUS.MATCHES} | This macro is used in filters of network interfaces discovery rule. |
^.*$ |
{$NET.IF.IFOPERSTATUS.NOT_MATCHES} | Ignore notPresent(6). |
^6$ |
{$NET.IF.IFTYPE.MATCHES} | This macro is used in filters of network interfaces discovery rule. |
.* |
{$NET.IF.IFTYPE.NOT_MATCHES} | This macro is used in filters of network interfaces discovery rule. |
CHANGE_IF_NEEDED |
{$SNMP.TIMEOUT} | The time interval for SNMP availability trigger. |
5m |
{$STATE.TABLE.UTIL.MAX} | Threshold of state table utilization trigger in %. |
90 |
{$SOURCE.TRACKING.TABLE.UTIL.MAX} | Threshold of source tracking table utilization trigger in %. |
90 |
Name | Description | Type | Key and additional info |
---|---|---|---|
SNMP agent availability | Availability of SNMP checks on the host. The value of this item corresponds to availability icons in the host list. Possible values: 0 - not available, 1 - available, 2 - unknown. |
Zabbix internal | zabbix[host,snmp,available] |
Packet filter running status | MIB: BEGEMOT-PF-MIB True if packet filter is currently enabled. |
SNMP agent | opnsense.pf.status |
States table current | MIB: BEGEMOT-PF-MIB Number of entries in the state table. |
SNMP agent | opnsense.state.table.count |
States table limit | MIB: BEGEMOT-PF-MIB Maximum number of 'keep state' rules in the ruleset. |
SNMP agent | opnsense.state.table.limit |
States table utilization in % | Utilization of the state table in % (see the formula sketch after this table). |
Calculated | opnsense.state.table.pused |
Source tracking table current | MIB: BEGEMOT-PF-MIB Number of entries in the source tracking table. |
SNMP agent | opnsense.source.tracking.table.count |
Source tracking table limit | MIB: BEGEMOT-PF-MIB Maximum number of 'sticky-address' or 'source-track' rules in the ruleset. |
SNMP agent | opnsense.source.tracking.table.limit |
Source tracking table utilization in % | Utilization of source tracking table in %. |
Calculated | opnsense.source.tracking.table.pused |
DHCP server status | MIB: HOST-RESOURCES-MIB The status of DHCP server process. |
SNMP agent | opnsense.dhcpd.status Preprocessing
|
DNS server status | MIB: HOST-RESOURCES-MIB The status of DNS server process. |
SNMP agent | opnsense.dns.status Preprocessing
|
Web server status | MIB: HOST-RESOURCES-MIB The status of lighttpd process. |
SNMP agent | opnsense.lighttpd.status Preprocessing
|
Packets matched a filter rule | MIB: BEGEMOT-PF-MIB True if the packet was logged with the specified packet filter reason code. The known codes are: match, bad-offset, fragment, short, normalize, and memory. |
SNMP agent | opnsense.packets.match Preprocessing
|
Packets with bad offset | MIB: BEGEMOT-PF-MIB True if the packet was logged with the specified packet filter reason code. The known codes are: match, bad-offset, fragment, short, normalize, and memory. |
SNMP agent | opnsense.packets.bad.offset Preprocessing
|
Fragmented packets | MIB: BEGEMOT-PF-MIB True if the packet was logged with the specified packet filter reason code. The known codes are: match, bad-offset, fragment, short, normalize, and memory. |
SNMP agent | opnsense.packets.fragment Preprocessing
|
Short packets | MIB: BEGEMOT-PF-MIB True if the packet was logged with the specified packet filter reason code. The known codes are: match, bad-offset, fragment, short, normalize, and memory. |
SNMP agent | opnsense.packets.short Preprocessing
|
Normalized packets | MIB: BEGEMOT-PF-MIB True if the packet was logged with the specified packet filter reason code. The known codes are: match, bad-offset, fragment, short, normalize, and memory. |
SNMP agent | opnsense.packets.normalize Preprocessing
|
Packets dropped due to memory limitation | MIB: BEGEMOT-PF-MIB True if the packet was logged with the specified packet filter reason code. The known codes are: match, bad-offset, fragment, short, normalize, and memory. |
SNMP agent | opnsense.packets.mem.drop Preprocessing
|
Firewall rules count | MIB: BEGEMOT-PF-MIB The number of labeled filter rules on this system. |
SNMP agent | opnsense.rules.count |
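As in the pfSense template above, the two utilization items (`opnsense.state.table.pused` and `opnsense.source.tracking.table.pused`) are Calculated items derived from the count and limit items. A hedged sketch of the formulas, assuming the item keys listed above:

```
Key:     opnsense.state.table.pused
Formula: 100 * last(//opnsense.state.table.count) / last(//opnsense.state.table.limit)

Key:     opnsense.source.tracking.table.pused
Formula: 100 * last(//opnsense.source.tracking.table.count) / last(//opnsense.source.tracking.table.limit)
```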
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
OPNsense: No SNMP data collection | SNMP is not available for polling. Please check device connectivity and SNMP settings. |
max(/OPNsense by SNMP/zabbix[host,snmp,available],{$SNMP.TIMEOUT})=0 |Warning |
||
OPNsense: Packet filter is not running | Please check PF status. |
last(/OPNsense by SNMP/opnsense.pf.status)<>1 |High |
||
OPNsense: State table usage is high | Please check the number of connections. |
min(/OPNsense by SNMP/opnsense.state.table.pused,#3)>{$STATE.TABLE.UTIL.MAX} |Warning |
||
OPNsense: Source tracking table usage is high | Please check the number of sticky connections. |
min(/OPNsense by SNMP/opnsense.source.tracking.table.pused,#3)>{$SOURCE.TRACKING.TABLE.UTIL.MAX} |Warning |
||
OPNsense: DHCP server is not running | Please check DHCP server settings. |
last(/OPNsense by SNMP/opnsense.dhcpd.status)=0 |Average |
||
OPNsense: DNS server is not running | Please check DNS server settings. |
last(/OPNsense by SNMP/opnsense.dns.status)=0 |Average |
||
OPNsense: Web server is not running | Please check lighttpd service status. |
last(/OPNsense by SNMP/opnsense.lighttpd.status)=0 |Average |
Name | Description | Type | Key and additional info |
---|---|---|---|
Network interfaces discovery | Discovering interfaces from IF-MIB. |
SNMP agent | opnsense.net.if.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Interface [{#IFNAME}({#IFALIAS})]: Inbound packets discarded | MIB: IF-MIB The number of inbound packets which were chosen to be discarded even though no errors had been detected to prevent their being deliverable to a higher-layer protocol. One possible reason for discarding such a packet could be to free up buffer space. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP agent | net.if.in.discards[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Inbound packets with errors | MIB: IF-MIB For packet-oriented interfaces, the number of inbound packets that contained errors preventing them from being deliverable to a higher-layer protocol. For character-oriented or fixed-length interfaces, the number of inbound transmission units that contained errors preventing them from being deliverable to a higher-layer protocol. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP agent | net.if.in.errors[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Bits received | MIB: IF-MIB The total number of octets received on the interface, including framing characters. This object is a 64-bit version of ifInOctets. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP agent | net.if.in[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Outbound packets discarded | MIB: IF-MIB The number of outbound packets which were chosen to be discarded even though no errors had been detected to prevent their being deliverable to a higher-layer protocol. One possible reason for discarding such a packet could be to free up buffer space. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP agent | net.if.out.discards[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Outbound packets with errors | MIB: IF-MIB For packet-oriented interfaces, the number of outbound packets that contained errors preventing them from being deliverable to a higher-layer protocol. For character-oriented or fixed-length interfaces, the number of outbound transmission units that contained errors preventing them from being deliverable to a higher-layer protocol. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP agent | net.if.out.errors[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Bits sent | MIB: IF-MIB The total number of octets transmitted out of the interface, including framing characters. This object is a 64-bit version of ifOutOctets.Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP agent | net.if.out[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Speed | MIB: IF-MIB An estimate of the interface's current bandwidth in units of 1,000,000 bits per second. If this object reports a value of |
SNMP agent | net.if.speed[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Operational status | MIB: IF-MIB The current operational state of the interface. - The testing(3) state indicates that no operational packets can be passed - If ifAdminStatus is down(2) then ifOperStatus should be down(2) - If ifAdminStatus is changed to up(1) then ifOperStatus should change to up(1) if the interface is ready to transmit and receive network traffic - It should change to dormant(5) if the interface is waiting for external actions (such as a serial line waiting for an incoming connection) - It should remain in the down(2) state if and only if there is a fault that prevents it from going to the up(1) state - It should remain in the notPresent(6) state if the interface has missing (typically, hardware) components. |
SNMP agent | net.if.status[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Interface type | MIB: IF-MIB The type of interface. Additional values for ifType are assigned by the Internet Assigned Numbers Authority (IANA), through updating the syntax of the IANAifType textual convention. |
SNMP agent | net.if.type[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Rules references count | MIB: BEGEMOT-PF-MIB The number of rules referencing this interface. |
SNMP agent | net.if.rules.refs[{#SNMPINDEX}] |
Interface [{#IFNAME}({#IFALIAS})]: Inbound IPv4 traffic passed | MIB: BEGEMOT-PF-MIB IPv4 bits per second passed coming in on this interface. |
SNMP agent | net.if.in.pass.v4.bps[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Inbound IPv4 traffic blocked | MIB: BEGEMOT-PF-MIB IPv4 bits per second blocked coming in on this interface. |
SNMP agent | net.if.in.block.v4.bps[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Outbound IPv4 traffic passed | MIB: BEGEMOT-PF-MIB IPv4 bits per second passed going out on this interface. |
SNMP agent | net.if.out.pass.v4.bps[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Outbound IPv4 traffic blocked | MIB: BEGEMOT-PF-MIB IPv4 bits per second blocked going out on this interface. |
SNMP agent | net.if.out.block.v4.bps[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Inbound IPv4 packets passed | MIB: BEGEMOT-PF-MIB The number of IPv4 packets passed coming in on this interface. |
SNMP agent | net.if.in.pass.v4.pps[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Inbound IPv4 packets blocked | MIB: BEGEMOT-PF-MIB The number of IPv4 packets blocked coming in on this interface. |
SNMP agent | net.if.in.block.v4.pps[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Outbound IPv4 packets passed | MIB: BEGEMOT-PF-MIB The number of IPv4 packets passed going out on this interface. |
SNMP agent | net.if.out.pass.v4.pps[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Outbound IPv4 packets blocked | MIB: BEGEMOT-PF-MIB The number of IPv4 packets blocked going out on this interface. |
SNMP agent | net.if.out.block.v4.pps[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Inbound IPv6 traffic passed | MIB: BEGEMOT-PF-MIB IPv6 bits per second passed coming in on this interface. |
SNMP agent | net.if.in.pass.v6.bps[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Inbound IPv6 traffic blocked | MIB: BEGEMOT-PF-MIB IPv6 bits per second blocked coming in on this interface. |
SNMP agent | net.if.in.block.v6.bps[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Outbound IPv6 traffic passed | MIB: BEGEMOT-PF-MIB IPv6 bits per second passed going out on this interface. |
SNMP agent | net.if.out.pass.v6.bps[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Outbound IPv6 traffic blocked | MIB: BEGEMOT-PF-MIB IPv6 bits per second blocked going out on this interface. |
SNMP agent | net.if.out.block.v6.bps[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Inbound IPv6 packets passed | MIB: BEGEMOT-PF-MIB The number of IPv6 packets passed coming in on this interface. |
SNMP agent | net.if.in.pass.v6.pps[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Inbound IPv6 packets blocked | MIB: BEGEMOT-PF-MIB The number of IPv6 packets blocked coming in on this interface. |
SNMP agent | net.if.in.block.v6.pps[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Outbound IPv6 packets passed | MIB: BEGEMOT-PF-MIB The number of IPv6 packets passed going out on this interface. |
SNMP agent | net.if.out.pass.v6.pps[{#SNMPINDEX}] Preprocessing
|
Interface [{#IFNAME}({#IFALIAS})]: Outbound IPv6 packets blocked | MIB: BEGEMOT-PF-MIB The number of IPv6 packets blocked going out on this interface. |
SNMP agent | net.if.out.block.v6.pps[{#SNMPINDEX}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
OPNsense: Interface [{#IFNAME}({#IFALIAS})]: High input error rate | It recovers when it is below 80% of the |
min(/OPNsense by SNMP/net.if.in.errors[{#SNMPINDEX}],5m)>{$IF.ERRORS.WARN:"{#IFNAME}"} |Warning |
Depends on:
|
|
OPNsense: Interface [{#IFNAME}({#IFALIAS})]: High inbound bandwidth usage | The utilization of the network interface is close to its estimated maximum bandwidth. |
(avg(/OPNsense by SNMP/net.if.in[{#SNMPINDEX}],15m)>({$IF.UTIL.MAX:"{#IFNAME}"}/100)*last(/OPNsense by SNMP/net.if.speed[{#SNMPINDEX}])) and last(/OPNsense by SNMP/net.if.speed[{#SNMPINDEX}])>0 |Warning |
Depends on:
|
|
OPNsense: Interface [{#IFNAME}({#IFALIAS})]: High output error rate | It recovers when it is below 80% of the |
min(/OPNsense by SNMP/net.if.out.errors[{#SNMPINDEX}],5m)>{$IF.ERRORS.WARN:"{#IFNAME}"} |Warning |
Depends on:
|
|
OPNsense: Interface [{#IFNAME}({#IFALIAS})]: High outbound bandwidth usage | The utilization of the network interface is close to its estimated maximum bandwidth. |
(avg(/OPNsense by SNMP/net.if.out[{#SNMPINDEX}],15m)>({$IF.UTIL.MAX:"{#IFNAME}"}/100)*last(/OPNsense by SNMP/net.if.speed[{#SNMPINDEX}])) and last(/OPNsense by SNMP/net.if.speed[{#SNMPINDEX}])>0 |Warning |
Depends on:
|
|
OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Ethernet has changed to lower speed than it was before | This Ethernet connection has transitioned down from its known maximum speed. This might be a sign of autonegotiation issues. Acknowledge to close the problem manually. |
change(/OPNsense by SNMP/net.if.speed[{#SNMPINDEX}])<0 and last(/OPNsense by SNMP/net.if.speed[{#SNMPINDEX}])>0 and ( last(/OPNsense by SNMP/net.if.type[{#SNMPINDEX}])=6 or last(/OPNsense by SNMP/net.if.type[{#SNMPINDEX}])=7 or last(/OPNsense by SNMP/net.if.type[{#SNMPINDEX}])=11 or last(/OPNsense by SNMP/net.if.type[{#SNMPINDEX}])=62 or last(/OPNsense by SNMP/net.if.type[{#SNMPINDEX}])=69 or last(/OPNsense by SNMP/net.if.type[{#SNMPINDEX}])=117 ) and (last(/OPNsense by SNMP/net.if.status[{#SNMPINDEX}])<>2) |Info |
Depends on:
|
|
OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Link down | This trigger expression works as follows: |
{$IFCONTROL:"{#IFNAME}"}=1 and (last(/OPNsense by SNMP/net.if.status[{#SNMPINDEX}])=2) |Average |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for the effortless deployment of OpenWeatherMap monitoring by Zabbix via HTTP and doesn't require any external scripts.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Create a host.
Link the template to the host.
Customize the values of {$OPENWEATHERMAP.API.TOKEN} and {$LOCATION} macros.
OpenWeatherMap API Tokens are available in your OpenWeatherMap account https://home.openweathermap.org/api_keys.
Locations can be set in a few ways: by geo coordinates, by location name, by location ID, or by zip/post code with a country code (see the {$LOCATION} macro description below). Several locations can be added to the macro at the same time, separated by the "|" delimiter.
For example: 43.81821,7.76115|Riga|2643743|94040,us.
Please note that API requests by city name, zip code, and city ID will be deprecated soon. The language and units macros can also be customized if necessary. List of available languages: https://openweathermap.org/current#multi. Available units of measurement are: standard, metric and imperial https://openweathermap.org/current#data.
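For illustration, the template's Script item combines the macros above into requests against the endpoint defined in {$OPENWEATHERMAP.API.ENDPOINT}. The lines below are only a sketch of what such requests look like for the different location formats; the actual query construction happens inside the Script item, and `<API key>` stands for the value of {$OPENWEATHERMAP.API.TOKEN}.

```
By city ID:          https://api.openweathermap.org/data/2.5/weather?id=2643743&units=metric&lang=en&appid=<API key>
By geo coordinates:  https://api.openweathermap.org/data/2.5/weather?lat=56.95&lon=24.0833&units=metric&lang=en&appid=<API key>
By zip/post code:    https://api.openweathermap.org/data/2.5/weather?zip=94040,us&units=metric&lang=en&appid=<API key>
```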
Name | Description | Default |
---|---|---|
{$OPENWEATHERMAP.API.TOKEN} | Specify the OpenWeatherMap API key. |
|
{$LANG} | List of available languages https://openweathermap.org/current#multi. |
en |
{$LOCATION} | Locations can be set in a few ways: 1. by geo coordinates (for example: 56.95,24.0833) 2. by location name (for example: Riga) 3. by location ID (list of city IDs: http://bulk.openweathermap.org/sample/city.list.json.gz) 4. by zip/post code with a country code (for example: 94040,us). Several locations can be added to the macro at the same time, separated by the "|" delimiter. For example: 43.81821,7.76115|Riga|2643743|94040,us. Please note that API requests by city name, zip code, and city ID will be deprecated soon. |
Riga |
{$OPENWEATHERMAP.API.ENDPOINT} | OpenWeatherMap API endpoint. |
api.openweathermap.org/data/2.5/weather? |
{$UNITS} | Available units of measurement are standard, metric and imperial https://openweathermap.org/current#data. |
metric |
{$TEMP.CRIT.HIGH} | Threshold for high temperature trigger. |
30 |
{$TEMP.CRIT.LOW} | Threshold for low temperature trigger. |
-20 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Get data | JSON array with result of OpenWeatherMap API requests. |
Script | openweathermap.get.data |
Get data collection errors | Errors from get data requests by script item. |
Dependent item | openweathermap.get.errors Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
OpenWeatherMap: There are errors in requests to OpenWeatherMap API | Zabbix has received errors in requests to OpenWeatherMap API. |
length(last(/OpenWeatherMap by HTTP/openweathermap.get.errors))>0 |Average |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Locations discovery | Weather metrics discovery by location. |
Dependent item | openweathermap.locations.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
[{#LOCATION}, {#COUNTRY}]: Data | JSON with result of OpenWeatherMap API request by location. |
Dependent item | openweathermap.location.data[{#ID}] Preprocessing
|
[{#LOCATION}, {#COUNTRY}]: Atmospheric pressure | Atmospheric pressure in Pa. |
Dependent item | openweathermap.pressure[{#ID}] Preprocessing
|
[{#LOCATION}, {#COUNTRY}]: Cloudiness | Cloudiness in %. |
Dependent item | openweathermap.clouds[{#ID}] Preprocessing
|
[{#LOCATION}, {#COUNTRY}]: Humidity | Humidity in %. |
Dependent item | openweathermap.humidity[{#ID}] Preprocessing
|
[{#LOCATION}, {#COUNTRY}]: Rain volume for the last one hour | Rain volume for the last one hour in m. |
Dependent item | openweathermap.rain[{#ID}] Preprocessing
|
[{#LOCATION}, {#COUNTRY}]: Short weather status | Short weather status description. |
Dependent item | openweathermap.description[{#ID}] Preprocessing
|
[{#LOCATION}, {#COUNTRY}]: Snow volume for the last one hour | Snow volume for the last one hour in m. |
Dependent item | openweathermap.snow[{#ID}] Preprocessing
|
[{#LOCATION}, {#COUNTRY}]: Temperature | Atmospheric temperature value. |
Dependent item | openweathermap.temp[{#ID}] Preprocessing
|
[{#LOCATION}, {#COUNTRY}]: Visibility | Visibility in m. |
Dependent item | openweathermap.visibility[{#ID}] Preprocessing
|
[{#LOCATION}, {#COUNTRY}]: Wind direction | Wind direction in degrees. |
Dependent item | openweathermap.wind.direction[{#ID}] Preprocessing
|
[{#LOCATION}, {#COUNTRY}]: Wind speed | Wind speed value. |
Dependent item | openweathermap.wind.speed[{#ID}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
OpenWeatherMap: [{#LOCATION}, {#COUNTRY}]: Temperature is too high | Temperature value is too high. |
min(/OpenWeatherMap by HTTP/openweathermap.temp[{#ID}],#3)>{$TEMP.CRIT.HIGH} |Average |
Manual close: Yes | |
OpenWeatherMap: [{#LOCATION}, {#COUNTRY}]: Temperature is too low | Temperature value is too low. |
max(/OpenWeatherMap by HTTP/openweathermap.temp[{#ID}],#3)<{$TEMP.CRIT.LOW} |Average |
Manual close: Yes |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for the effortless deployment of Nutanix Prism Element monitoring and doesn't require any external scripts.
The templates "Nutanix Host Prism Element by HTTP" and "Nutanix Cluster Prism Element by HTTP" can be used in discovery, as well as manually linked to a host.
More details can be found in the official documentation:
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
{$NUTANIX.PRISM.ELEMENT.IP}
{$NUTANIX.PRISM.ELEMENT.PORT}
{$NUTANIX.USER}
{$NUTANIX.PASSWORD}
Name | Description | Default |
---|---|---|
{$NUTANIX.PRISM.ELEMENT.IP} | Set the Nutanix API IP here. |
<Put your IP here> |
{$NUTANIX.PRISM.ELEMENT.PORT} | Set the Nutanix API port here. |
9440 |
{$NUTANIX.USER} | Nutanix API username. |
<Put your API username here> |
{$NUTANIX.PASSWORD} | Nutanix API password. |
<Put your API password here> |
{$NUTANIX.PROXY} | Sets the HTTP proxy value. If this macro is empty, then no proxy is used. |
|
{$NUTANIX.CLUSTER.DISCOVERY.NAME.MATCHES} | Filter of discoverable Nutanix clusters by name. |
.* |
{$NUTANIX.CLUSTER.DISCOVERY.NAME.NOT_MATCHES} | Filter to exclude discovered Nutanix clusters by name. |
CHANGE_IF_NEEDED |
{$NUTANIX.HOST.DISCOVERY.NAME.MATCHES} | Filter of discoverable Nutanix hosts by name. |
.* |
{$NUTANIX.HOST.DISCOVERY.NAME.NOT_MATCHES} | Filter to exclude discovered Nutanix hosts by name. |
CHANGE_IF_NEEDED |
{$NUTANIX.STORAGE.CONTAINER.DISCOVERY.NAME.MATCHES} | Filter of discoverable storage containers by name. |
.* |
{$NUTANIX.STORAGE.CONTAINER.DISCOVERY.NAME.NOT_MATCHES} | Filter to exclude discovered storage containers by name. |
CHANGE_IF_NEEDED |
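The *.MATCHES / *.NOT_MATCHES macros above are regular expressions applied to the names of discovered objects. The values below are purely illustrative examples of narrowing discovery; the naming patterns are hypothetical:

```
{$NUTANIX.CLUSTER.DISCOVERY.NAME.MATCHES}  = ^prod-.*    (example: discover only clusters whose name starts with "prod-")
{$NUTANIX.HOST.DISCOVERY.NAME.NOT_MATCHES} = .*-lab-.*   (example: skip hosts whose name contains "-lab-")
```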
Name | Description | Type | Key and additional info |
---|---|---|---|
Get cluster | Get the available clusters. |
Script | nutanix.cluster.get |
Get cluster check | Data collection check. Check the latest values for details. |
Dependent item | nutanix.cluster.get.check Preprocessing
|
Get host | Get the available hosts. |
Script | nutanix.host.get |
Get host check | Data collection check. Check the latest values for details. |
Dependent item | nutanix.host.get.check Preprocessing
|
Get storage container | Get the available storage containers. |
Script | nutanix.storage.container.get |
Get storage container check | Data collection check. Check the latest values for details. |
Dependent item | nutanix.storage.container.get.check Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Nutanix: Failed to get cluster data from the API | Failed to get data from the API. Check the latest values for details. |
length(last(/Nutanix Prism Element by HTTP/nutanix.cluster.get.check))>0 |High |
||
Nutanix: Failed to get host data from the API | Failed to get data from the API. Check the latest values for details. |
length(last(/Nutanix Prism Element by HTTP/nutanix.host.get.check))>0 |High |
||
Nutanix: Failed to get storage container data from the API | Failed to get data from the API. Check the latest values for details. |
length(last(/Nutanix Prism Element by HTTP/nutanix.storage.container.get.check))>0 |High |
Name | Description | Type | Key and additional info |
---|---|---|---|
Cluster discovery | Discovery of all clusters. |
Dependent item | nutanix.cluster.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Host discovery | Discovery of all hosts. |
Dependent item | nutanix.host.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Storage container discovery | Discovery of all storage containers. |
Dependent item | nutanix.storage.container.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Container [{#STORAGE.CONTAINER.NAME}]: Space: Total, bytes | The total space of the storage container. |
Dependent item | nutanix.storage.container.capacity.bytes["{#STORAGE.CONTAINER.UUID}"] Preprocessing
|
Container [{#STORAGE.CONTAINER.NAME}]: Space: Free, bytes | The free space of the storage container. |
Dependent item | nutanix.storage.container.free.bytes["{#STORAGE.CONTAINER.UUID}"] Preprocessing
|
Container [{#STORAGE.CONTAINER.NAME}]: Replication factor | The replication factor of the storage container. |
Dependent item | nutanix.storage.container.replication.factor["{#STORAGE.CONTAINER.UUID}"] Preprocessing
|
Container [{#STORAGE.CONTAINER.NAME}]: Space: Used, bytes | The used space of the storage container. |
Dependent item | nutanix.storage.container.usage.bytes["{#STORAGE.CONTAINER.UUID}"] Preprocessing
|
This template is designed for the effortless deployment of Nutanix Cluster Prism Element monitoring and doesn't require any external scripts.
This template can be used in discovery, as well as manually linked to a host - to do so, attach it to the host and manually set the value of the {$NUTANIX.CLUSTER.UUID}
macro.
More details can be found in the official documentation:
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
{$NUTANIX.PRISM.ELEMENT.IP}
{$NUTANIX.PRISM.ELEMENT.PORT}
{$NUTANIX.USER}
{$NUTANIX.PASSWORD}
{$NUTANIX.CLUSTER.UUID}
Name | Description | Default |
---|---|---|
{$NUTANIX.PRISM.ELEMENT.IP} | Set the Nutanix API IP here. |
<Put your IP here> |
{$NUTANIX.PRISM.ELEMENT.PORT} | Set the Nutanix API port here. |
9440 |
{$NUTANIX.USER} | Nutanix API username. |
<Put your API username here> |
{$NUTANIX.PASSWORD} | Nutanix API password. |
<Put your API password here> |
{$NUTANIX.PROXY} | Sets the HTTP proxy value. If this macro is empty, then no proxy is used. |
|
{$NUTANIX.CLUSTER.UUID} | UUID of the cluster. |
|
{$NUTANIX.TIMEOUT} | API response timeout. |
10s |
{$NUTANIX.ALERT.DISCOVERY.NAME.MATCHES} | Filter of discoverable Nutanix alerts by name. |
.* |
{$NUTANIX.ALERT.DISCOVERY.NAME.NOT_MATCHES} | Filter to exclude discovered Nutanix alerts by name. |
CHANGE_IF_NEEDED |
{$NUTANIX.ALERT.DISCOVERY.STATE.MATCHES} | Filter of discoverable Nutanix alerts by state. Set "1" to discover only problem alerts or "0" for resolved ones. |
.* |
{$NUTANIX.ALERT.DISCOVERY.SEVERITY.MATCHES} | Filter of discoverable Nutanix alerts by severity. Set the severities to discover in the range 0-2: "0" - Info, "1" - Warning, "2" - Critical. |
.* |
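Taken together, the alert discovery macros above control which Nutanix alerts become discovered objects in Zabbix. As an illustration only (the values are examples, not template defaults), discovering only unresolved Warning and Critical alerts could look like this:

```
{$NUTANIX.ALERT.DISCOVERY.STATE.MATCHES}    = 1      (only alerts still in the problem state)
{$NUTANIX.ALERT.DISCOVERY.SEVERITY.MATCHES} = [12]   (only Warning and Critical severities)
```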
Name | Description | Type | Key and additional info |
---|---|---|---|
Get metric | Get data about basic metrics. |
Script | nutanix.cluster.metric.get |
Get metric check | Data collection check. Check the latest values for details. |
Dependent item | nutanix.cluster.metric.get.check Preprocessing
|
Get alert | Get data about alerts. |
Script | nutanix.cluster.alert.get |
Get alert check | Data collection check. Check the latest values for details. |
Dependent item | nutanix.cluster.alert.get.check Preprocessing
|
Content Cache: Hit rate, % | Content cache hits over all lookups. |
Dependent item | nutanix.cluster.content.cache.hit.percent Preprocessing
|
Content Cache: Logical memory usage, bytes | Logical memory used to cache data without deduplication in bytes. |
Dependent item | nutanix.cluster.content.cache.logical.memory.usage.bytes Preprocessing
|
Content Cache: Logical saved memory usage, bytes | Memory saved due to content cache deduplication in bytes. |
Dependent item | nutanix.cluster.content.cache.saved.memory.usage.bytes Preprocessing
|
Content Cache: Logical SSD usage, bytes | Logical SSD memory used to cache data without deduplication in bytes. |
Dependent item | nutanix.cluster.content.cache.logical.ssd.usage.bytes Preprocessing
|
Content Cache: Number of lookups | Number of lookups on the content cache. |
Dependent item | nutanix.cluster.content.cache.lookups.num Preprocessing
|
Content Cache: Physical memory usage, bytes | Real memory used to cache data via the content cache in bytes. |
Dependent item | nutanix.cluster.content.cache.physical.memory.usage.bytes Preprocessing
|
Content Cache: Physical SSD usage, bytes | Real SSD usage used to cache data via the content cache in bytes. |
Dependent item | nutanix.cluster.content.cache.physical.ssd.usage.bytes Preprocessing
|
Content Cache: References | Average number of content cache references. |
Dependent item | nutanix.cluster.content.cache.dedup.ref.num Preprocessing
|
Content Cache: Saved SSD usage, bytes | SSD usage saved due to content cache deduplication in bytes. |
Dependent item | nutanix.cluster.content.cache.saved.ssd.usage.bytes Preprocessing
|
Controller: Random IO | The number of random Input/Output operations from the controller. |
Dependent item | nutanix.cluster.controller.io.random Preprocessing
|
Controller: Random IO, % | The percentage of random Input/Output from the controller. |
Dependent item | nutanix.cluster.controller.io.random.percent Preprocessing
|
Controller: Sequence IO | The number of sequential Input/Output operations from the controller. |
Dependent item | nutanix.cluster.controller.io.sequence Preprocessing
|
Controller: Sequence IO, % | The percentage of sequential Input/Output from the controller. |
Dependent item | nutanix.cluster.controller.io.sequence.percent Preprocessing
|
Storage Controller: Timespan, sec | Controller timespan. |
Dependent item | nutanix.cluster.storage.controller.timespan.sec Preprocessing
|
Storage Controller: IO total, bytes | Total controller Input/Output size. |
Dependent item | nutanix.cluster.storage.controller.io.total.bytes Preprocessing
|
Storage Controller: IO total, sec | Total controller Input/Output time. |
Dependent item | nutanix.cluster.storage.controller.io.total.sec Preprocessing
|
Storage Controller: IO total read, bytes | Total controller read Input/Output size. |
Dependent item | nutanix.cluster.storage.controller.io.read.total.bytes Preprocessing
|
Storage Controller: IO total read, sec | Total controller read Input/Output time. |
Dependent item | nutanix.cluster.storage.controller.io.read.total.sec Preprocessing
|
General: Cluster operation mode | The cluster operation mode. One of the following: - NORMAL; - OVERRIDE; - READONLY; - STANDALONE; - SWITCH_TO_TWO_NODE; - UNKNOWN. |
Dependent item | nutanix.cluster.cluster.operation.mode Preprocessing
|
General: Current redundancy factor | Current value of the redundancy factor on the cluster. |
Dependent item | nutanix.cluster.redundancy.factor.current Preprocessing
|
General: Desired redundancy factor | The desired value of the redundancy factor on the cluster. |
Dependent item | nutanix.cluster.redundancy.factor.desired Preprocessing
|
General: IO | The number of Input/Output operations from the disk. |
Dependent item | nutanix.cluster.general.io Preprocessing
|
General: IOPS | Input/Output operations per second from the disk. |
Dependent item | nutanix.cluster.general.iops Preprocessing
|
General: IO, bandwidth | Data transferred in B/sec from the disk. |
Dependent item | nutanix.cluster.general.io.bandwidth Preprocessing
|
General: IO, latency | Input/Output latency from the disk. |
Dependent item | nutanix.cluster.general.io.latency Preprocessing
|
General: Random IO | The number of random Input/Output operations. |
Dependent item | nutanix.cluster.general.io.random Preprocessing
|
General: Random IO, % | The percentage of random Input/Output operations. |
Dependent item | nutanix.cluster.general.io.random.percent Preprocessing
|
General: Read IO | Total number of Input/Output read operations. |
Dependent item | nutanix.cluster.general.io.read Preprocessing
|
General: Read IOPS | Input/Output read operations per second from the disk. |
Dependent item | nutanix.cluster.general.iops.read Preprocessing
|
General: Read IO, % | The total percentage of Input/Output operations that are reads. |
Dependent item | nutanix.cluster.general.io.read.percent Preprocessing
|
General: Read IO, bandwidth | Read data transferred in B/sec from the disk. |
Dependent item | nutanix.cluster.general.io.read.bandwidth Preprocessing
|
General: Read IO, latency | Average Input/Output read latency. |
Dependent item | nutanix.cluster.general.io.read.latency Preprocessing
|
General: Sequence IO | The number of sequential Input/Output operations. |
Dependent item | nutanix.cluster.general.io.sequence Preprocessing
|
General: Sequence IO, % | The percentage of sequential Input/Output. |
Dependent item | nutanix.cluster.general.io.sequence.percent Preprocessing
|
General: Storage capacity, bytes | Total size of the datastores used by this system in bytes. |
Dependent item | nutanix.cluster.general.storage.capacity.bytes Preprocessing
|
General: Storage free, bytes | Total free space of the datastores used by this system in bytes. |
Dependent item | nutanix.cluster.general.storage.free.bytes Preprocessing
|
General: Storage logical usage, bytes | Total logical space used by the datastores of this system in bytes. |
Dependent item | nutanix.cluster.general.storage.logical.usage.bytes Preprocessing
|
General: Storage usage, bytes | Total physical datastore space used by this host and all its snapshots on the datastores. |
Dependent item | nutanix.cluster.general.storage.usage.bytes Preprocessing
|
General: Timespan, sec | Cluster timespan. |
Dependent item | nutanix.cluster.general.timespan.sec Preprocessing
|
General: IO total, sec | Total time of Input/Output operations. |
Dependent item | nutanix.cluster.general.io.total.sec Preprocessing
|
General: IO total, bytes | Total size of Input/Output operations. |
Dependent item | nutanix.cluster.general.io.total.bytes Preprocessing
|
General: IO total read, sec | Total time of Input/Output read operations. |
Dependent item | nutanix.cluster.general.io.read.total.sec Preprocessing
|
General: IO total read, bytes | Total size of Input/Output read operations. |
Dependent item | nutanix.cluster.general.io.read.total.bytes Preprocessing
|
General: Total transformed usage, bytes | Actual usage of storage. |
Dependent item | nutanix.cluster.general.transformed.usage.total.bytes Preprocessing
|
General: Total untransformed usage, bytes | Logical usage of storage (physical usage divided by the replication factor). |
Dependent item | nutanix.cluster.general.untransformed.usage.total.bytes Preprocessing
|
General: Upgrade progress | Indicates whether the cluster is currently in an update state. |
Dependent item | nutanix.cluster.general.upgrade.progress Preprocessing
|
General: Version | Current software version in the cluster. |
Dependent item | nutanix.cluster.general.upgrade.version Preprocessing
|
General: Write IO | Input/Output write operations from the disk. |
Dependent item | nutanix.cluster.general.io.write Preprocessing
|
General: Write IOPS | Total number of Input/Output write operations per second. |
Dependent item | nutanix.cluster.general.iops.write Preprocessing
|
General: Write IO, % | Total percentage of Input/Output operations that are writes. |
Dependent item | nutanix.cluster.general.io.write.percent Preprocessing
|
General: Write IO, bandwidth | Write data transferred in B/sec from the disk. |
Dependent item | nutanix.cluster.general.io.write.bandwidth Preprocessing
|
General: Write IO, latency | Average Input/Output write operation latency. |
Dependent item | nutanix.cluster.general.io.write.latency Preprocessing
|
Hypervisor: CPU usage, % | Percentage of CPU used by the Hypervisor. |
Dependent item | nutanix.cluster.hypervisor.cpu.usage.percent Preprocessing
|
Hypervisor: IOPS | Input/Output operations per second from the Hypervisor. |
Dependent item | nutanix.cluster.hypervisor.iops Preprocessing
|
Hypervisor: IO, bandwidth | Data transferred in B/sec from the Hypervisor. |
Dependent item | nutanix.cluster.hypervisor.io.bandwidth Preprocessing
|
Hypervisor: IO, latency | Input/Output operation latency from the Hypervisor. |
Dependent item | nutanix.cluster.hypervisor.io.latency Preprocessing
|
Hypervisor: Memory usage, % | Percentage of memory used by the Hypervisor. |
Dependent item | nutanix.cluster.hypervisor.memory.usage.percent Preprocessing
|
Hypervisor: IO | The number of Input/Output operations from the Hypervisor. |
Dependent item | nutanix.cluster.hypervisor.io Preprocessing
|
Hypervisor: Read IO | The number of Input/Output read operations from the Hypervisor. |
Dependent item | nutanix.cluster.hypervisor.io.read Preprocessing
|
Hypervisor: Read IOPS | Input/Output read operations per second from the Hypervisor. |
Dependent item | nutanix.cluster.hypervisor.iops.read Preprocessing
|
Hypervisor: Read IO, bandwidth | Read data transferred in B/sec from the Hypervisor. |
Dependent item | nutanix.cluster.hypervisor.io.read.bandwidth Preprocessing
|
Hypervisor: Read IO, latency | Input/Output read latency from the Hypervisor. |
Dependent item | nutanix.cluster.hypervisor.io.read.latency Preprocessing
|
Hypervisor: Timespan, sec | Hypervisor timespan. |
Dependent item | nutanix.cluster.hypervisor.timespan.sec Preprocessing
|
Hypervisor: IO total, sec | Total Input/Output operation time from the Hypervisor. |
Dependent item | nutanix.cluster.hypervisor.io.total.sec Preprocessing
|
Hypervisor: IO total, bytes | Total Input/Output operation size from the Hypervisor. |
Dependent item | nutanix.cluster.hypervisor.io.total.bytes Preprocessing
|
Hypervisor: IO total read, bytes | Total Input/Output read operation size from the Hypervisor. |
Dependent item | nutanix.cluster.hypervisor.io.read.total.bytes Preprocessing
|
Hypervisor: IO total read, sec | Total Input/Output read operation time from the Hypervisor. |
Dependent item | nutanix.cluster.hypervisor.io.read.total.sec Preprocessing
|
Hypervisor: Write IOPS | Input/Output write operations per second from the Hypervisor. |
Dependent item | nutanix.cluster.hypervisor.iops.write Preprocessing
|
Hypervisor: Write IO | Input/Output write operations from the Hypervisor. |
Dependent item | nutanix.cluster.hypervisor.io.write Preprocessing
|
Hypervisor: Write IO, bandwidth | Write data transferred in B/sec from the Hypervisor. |
Dependent item | nutanix.cluster.hypervisor.io.write.bandwidth Preprocessing
|
Hypervisor: Write IO, latency | Input/Output write latency from the Hypervisor. |
Dependent item | nutanix.cluster.hypervisor.io.write.latency Preprocessing
|
Storage Controller: IOPS | Input/Output operations per second from the Storage Controller. |
Dependent item | nutanix.cluster.storage.controller.iops Preprocessing
|
Storage Controller: IO | Input/Output operations from the Storage Controller. |
Dependent item | nutanix.cluster.storage.controller.io Preprocessing
|
Storage Controller: IO, bandwidth | Data transferred in B/sec from the Storage Controller. |
Dependent item | nutanix.cluster.storage.controller.io.bandwidth Preprocessing
|
Storage Controller: IO, latency | Input/Output latency from the Storage Controller. |
Dependent item | nutanix.cluster.storage.controller.io.latency Preprocessing
|
Storage Controller: Read IOPS | Input/Output read operations per second from the Storage Controller. |
Dependent item | nutanix.cluster.storage.controller.iops.read Preprocessing
|
Storage Controller: Read IO | Input/Output read operations from the Storage Controller. |
Dependent item | nutanix.cluster.storage.controller.io.read Preprocessing
|
Storage Controller: Read IO, % | Percentage of Input/Output operations from the Storage Controller that are reads. |
Dependent item | nutanix.cluster.storage.controller.io.read.percent Preprocessing
|
Storage Controller: Read IO, bandwidth | Read data transferred in B/sec from the Storage Controller. |
Dependent item | nutanix.cluster.storage.controller.io.read.bandwidth Preprocessing
|
Storage Controller: Read IO, latency | Input/Output read latency from the Storage Controller. |
Dependent item | nutanix.cluster.storage.controller.io.read.latency Preprocessing
|
Storage Controller: Read IO, bytes | Storage controller average read Input/Output in bytes. |
Dependent item | nutanix.cluster.storage.controller.io.read.bytes Preprocessing
|
Storage Controller: Total transformed usage, bytes | Actual usage of the Storage Controller. |
Dependent item | nutanix.cluster.storage.controller.transformed.usage.total.bytes Preprocessing
|
Storage Controller: Write IO | Input/Output write operations to the Storage Controller. |
Dependent item | nutanix.cluster.storage.controller.io.write Preprocessing
|
Storage Controller: Write IOPS | Input/Output write operations per second to the Storage Controller. |
Dependent item | nutanix.cluster.storage.controller.iops.write Preprocessing
|
Storage Controller: Write IO, % | Percentage of Input/Output operations to the Storage Controller that are writes. |
Dependent item | nutanix.cluster.storage.controller.io.write.percent Preprocessing
|
Storage Controller: Write IO, bandwidth | Write data transferred in B/sec to the Storage Controller. |
Dependent item | nutanix.cluster.storage.controller.io.write.bandwidth Preprocessing
|
Storage Controller: Write IO, latency | Input/Output write latency to the Storage Controller. |
Dependent item | nutanix.cluster.storage.controller.io.write.latency Preprocessing
|
Storage Controller: Write IO, bytes | Storage Controller average write Input/Output in bytes. |
Dependent item | nutanix.cluster.storage.controller.io.write.bytes Preprocessing
|
Storage Tier: Das-sata capacity, bytes | The total capacity of Das-sata in bytes. |
Dependent item | nutanix.cluster.storage.controller.tier.das_sata.capacity.bytes Preprocessing
|
Storage Tier: Das-sata free, bytes | The free space of Das-sata in bytes. |
Dependent item | nutanix.cluster.storage.controller.tier.das_sata.free.bytes Preprocessing
|
Storage Tier: Das-sata usage, bytes | The used space of Das-sata in bytes. |
Dependent item | nutanix.cluster.storage.controller.tier.das_sata.usage.bytes Preprocessing
|
Storage Tier: SSD capacity, bytes | The total capacity of SSD in bytes. |
Dependent item | nutanix.cluster.storage.controller.tier.ssd.capacity.bytes Preprocessing
|
Storage Tier: SSD free, bytes | The free space of SSD in bytes. |
Dependent item | nutanix.cluster.storage.controller.tier.ssd.free.bytes Preprocessing
|
Storage Tier: SSD usage, bytes | The used space of SSD in bytes. |
Dependent item | nutanix.cluster.storage.controller.tier.ssd.usage.bytes Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Nutanix: Failed to get metric data from the API | Failed to get data from the API. Check the latest values for details. |
length(last(/Nutanix Cluster Prism Element by HTTP/nutanix.cluster.metric.get.check))>0 |High |
||
Nutanix: Failed to get alert data from the API | Failed to get data from the API. Check the latest values for details. |
length(last(/Nutanix Cluster Prism Element by HTTP/nutanix.cluster.alert.get.check))>0 |High |
||
Nutanix: Redundancy factor mismatched | Current redundancy factor does not match the desired redundancy factor. |
last(/Nutanix Cluster Prism Element by HTTP/nutanix.cluster.redundancy.factor.current)<>last(/Nutanix Cluster Prism Element by HTTP/nutanix.cluster.redundancy.factor.desired) |High |
Name | Description | Type | Key and additional info |
---|---|---|---|
Alert discovery | Discovery of all alerts. Alerts will be grouped by title. For each alert, in addition to the basic information, the number of activations and the last alert ID will be available. |
Dependent item | nutanix.cluster.alert.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Alert [{#ALERT.NAME}]: Full title | The full title of the alert. |
Dependent item | nutanix.cluster.alert.title["{#ALERT.KEY}"] Preprocessing
|
Alert [{#ALERT.NAME}]: Create datetime | The alert creation date and time. |
Dependent item | nutanix.cluster.alert.created["{#ALERT.KEY}"] Preprocessing
|
Alert [{#ALERT.NAME}]: Severity | Alert severity. One of the following: - Info; - Warning; - Critical; - Unknown. |
Dependent item | nutanix.cluster.alert.severity["{#ALERT.KEY}"] Preprocessing
|
Alert [{#ALERT.NAME}]: State | Alert state. One of the following: - OK; - Problem. |
Dependent item | nutanix.cluster.alert.state["{#ALERT.KEY}"] Preprocessing
|
Alert [{#ALERT.NAME}]: Detailed message | Detailed information about the current alert. |
Dependent item | nutanix.cluster.alert.message["{#ALERT.KEY}"] Preprocessing
|
Alert [{#ALERT.NAME}]: Last alert ID | Latest ID of the alert. |
Dependent item | nutanix.cluster.alert.last_id["{#ALERT.KEY}"] Preprocessing
|
Alert [{#ALERT.NAME}]: Count alerts | The number of times this alert was triggered. |
Dependent item | nutanix.cluster.alert.count["{#ALERT.KEY}"] Preprocessing
|
This template is designed for the effortless deployment of Nutanix Host Prism Element monitoring and doesn't require any external scripts.
This template can be used with host discovery or linked to a host manually - in the latter case, attach it to the host and manually set the value of the {$NUTANIX.HOST.UUID}
macro.
More details can be found in the official documentation:
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Set the {$NUTANIX.PRISM.ELEMENT.IP}, {$NUTANIX.PRISM.ELEMENT.PORT}, {$NUTANIX.USER}, {$NUTANIX.PASSWORD}, and {$NUTANIX.HOST.UUID} macro values.
Name | Description | Default |
---|---|---|
{$NUTANIX.PRISM.ELEMENT.IP} | Set the Nutanix API IP here. |
<Put your IP here> |
{$NUTANIX.PRISM.ELEMENT.PORT} | Set the Nutanix API port here. |
9440 |
{$NUTANIX.USER} | Nutanix API username. |
<Put your API username here> |
{$NUTANIX.PASSWORD} | Nutanix API password. |
<Put your API password here> |
{$NUTANIX.PROXY} | Sets the HTTP proxy value. If this macro is empty, then no proxy is used. |
|
{$NUTANIX.HOST.UUID} | UUID of the host. |
|
{$NUTANIX.TIMEOUT} | API response timeout. |
10s |
{$NUTANIX.ALERT.DISCOVERY.NAME.MATCHES} | Filter of discoverable Nutanix alerts by name. |
.* |
{$NUTANIX.ALERT.DISCOVERY.NAME.NOT_MATCHES} | Filter to exclude discovered Nutanix alerts by name. |
CHANGE_IF_NEEDED |
{$NUTANIX.ALERT.DISCOVERY.STATE.MATCHES} | Filter of discoverable Nutanix alerts by state. Set "1" for filtering only problem alerts or "0" for resolved ones. |
.* |
{$NUTANIX.ALERT.DISCOVERY.SEVERITY.MATCHES} | Filter of discoverable Nutanix alerts by severity. Set all possible severities for filtering in the range 0-2. "0" - Info, "1" - Warning, "2" - Critical. |
.* |
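With the macros above defined, you can sanity-check API access outside Zabbix with a small script. This is only a sketch, not part of the template: the URL path follows the commonly documented Prism Element v2.0 REST layout, and every value below is a placeholder.

```python
# Minimal Prism Element API check (sketch; adjust the path to your Prism version).
import requests

PRISM_IP = "192.0.2.50"              # {$NUTANIX.PRISM.ELEMENT.IP}
PRISM_PORT = 9440                    # {$NUTANIX.PRISM.ELEMENT.PORT}
HOST_UUID = "<host-uuid>"            # {$NUTANIX.HOST.UUID}
AUTH = ("api-user", "api-password")  # {$NUTANIX.USER} / {$NUTANIX.PASSWORD}

url = f"https://{PRISM_IP}:{PRISM_PORT}/PrismGateway/services/rest/v2.0/hosts/{HOST_UUID}"
resp = requests.get(url, auth=AUTH, timeout=10, verify=False)  # verify=False only for self-signed lab certificates
resp.raise_for_status()
host = resp.json()
print(host.get("name"), host.get("state"))  # field names assumed from the "Host state" item below
```

A successful response with host JSON means the macro values are usable; anything else points to the credentials, the UUID, or network reachability.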
Name | Description | Type | Key and additional info |
---|---|---|---|
Get metric | Get data about basic metrics. |
Script | nutanix.host.metric.get |
Get metric check | Data collection check. Check the latest values for details. |
Dependent item | nutanix.host.metric.get.check Preprocessing
|
Get disk | Get data about installed disks. |
Script | nutanix.host.disk.get |
Get disk check | Data collection check. Check the latest values for details. |
Dependent item | nutanix.host.disk.get.check Preprocessing
|
Get alert | Get data about alerts. |
Script | nutanix.host.alert.get |
Get alert check | Data collection check. Check the latest values for details. |
Dependent item | nutanix.host.alert.get.check Preprocessing
|
Content Cache: Hit rate, % | Content cache hits over all lookups. |
Dependent item | nutanix.host.content.cache.hit.percent Preprocessing
|
Content Cache: Logical memory usage, bytes | Logical memory used to cache data without deduplication in bytes. |
Dependent item | nutanix.host.content.cache.logical.memory.usage.bytes Preprocessing
|
Content Cache: Logical saved memory usage, bytes | Memory saved due to content cache deduplication in bytes. |
Dependent item | nutanix.host.content.cache.saved.memory.usage.bytes Preprocessing
|
Content Cache: Logical SSD usage, bytes | Logical SSD memory used to cache data without deduplication in bytes. |
Dependent item | nutanix.host.content.cache.logical.ssd.usage.bytes Preprocessing
|
Content Cache: Number of lookups | Number of lookups on the content cache. |
Dependent item | nutanix.host.content.cache.lookups.num Preprocessing
|
Content Cache: Physical memory usage, bytes | Real memory used to cache data via the content cache in bytes. |
Dependent item | nutanix.host.content.cache.physical.memory.usage.bytes Preprocessing
|
Content Cache: Physical SSD usage, bytes | Real SSD usage used to cache data via the content cache in bytes. |
Dependent item | nutanix.host.content.cache.physical.ssd.usage.bytes Preprocessing
|
Content Cache: References | Average number of content cache references. |
Dependent item | nutanix.host.content.cache.dedup.ref.num Preprocessing
|
Content Cache: Saved SSD usage, bytes | SSD usage saved due to content cache deduplication in bytes. |
Dependent item | nutanix.host.content.cache.saved.ssd.usage.bytes Preprocessing
|
Controller: Random IO | The number of random Input/Output operations from the controller. |
Dependent item | nutanix.host.controller.io.random Preprocessing
|
Controller: Random IO, % | The percentage of random Input/Output from the controller. |
Dependent item | nutanix.host.controller.io.random.percent Preprocessing
|
Controller: Sequence IO | The number of sequential Input/Output operations from the controller. |
Dependent item | nutanix.host.controller.io.sequence Preprocessing
|
Controller: Sequence IO, % | The percentage of sequential Input/Output from the controller. |
Dependent item | nutanix.host.controller.io.sequence.percent Preprocessing
|
Storage Controller: Timespan, sec | Controller timespan. |
Dependent item | nutanix.host.storage.controller.timespan.sec Preprocessing
|
Storage Controller: IO total, bytes | Total controller Input/Output size. |
Dependent item | nutanix.host.storage.controller.io.total.bytes Preprocessing
|
Storage Controller: IO total, sec | Total controller Input/Output time. |
Dependent item | nutanix.host.storage.controller.io.total.sec Preprocessing
|
Storage Controller: IO total read, bytes | Total controller read Input/Output size. |
Dependent item | nutanix.host.storage.controller.io.read.total.bytes Preprocessing
|
Storage Controller: IO total read, sec | Total controller read Input/Output time. |
Dependent item | nutanix.host.storage.controller.io.read.total.sec Preprocessing
|
General: Boot time | The last host boot time. |
Dependent item | nutanix.host.general.boot.time Preprocessing
|
General: CPU frequency | The processor frequency. |
Dependent item | nutanix.host.general.cpu.frequency Preprocessing
|
General: CPU model | The processor model. |
Dependent item | nutanix.host.general.cpu.model Preprocessing
|
General: Host state | Displays the host state. One of the following: - NEW; - NORMAL; - MARKED_FOR_REMOVAL_BUT_NOT_DETACHABLE; - DETACHABLE. |
Dependent item | nutanix.host.general.state Preprocessing
|
General: Host type | Displays the host type. One of the following: - HYPER_CONVERGED; - COMPUTE_ONLY. |
Dependent item | nutanix.host.general.type Preprocessing
|
General: IOPS | Input/Output operations per second from the disk. |
Dependent item | nutanix.host.general.iops Preprocessing
|
General: IO | The number of Input/Output operations from the disk. |
Dependent item | nutanix.host.general.io Preprocessing
|
General: IO, bandwidth | Data transferred in B/sec from the disk. |
Dependent item | nutanix.host.general.io.bandwidth Preprocessing
|
General: IO, latency | Input/Output latency from the disk. |
Dependent item | nutanix.host.general.io.latency Preprocessing
|
General: Degrade status | Indicates whether the host is in a degraded state. One of the following: - Normal; - Degraded; - Unknown. |
Dependent item | nutanix.host.general.degraded Preprocessing
|
General: Maintenance mode | Indicates whether the host is in maintenance mode. One of the following: - Normal; - Maintenance; - Unknown. |
Dependent item | nutanix.host.general.maintenance Preprocessing
|
General: Number of virtual machines | Number of virtual machines running on this host. |
Dependent item | nutanix.host.general.vms.num Preprocessing
|
General: Random IO | The number of random Input/Output operations. |
Dependent item | nutanix.host.general.io.random Preprocessing
|
General: Random IO, % | The percentage of random Input/Output. |
Dependent item | nutanix.host.general.io.random.percent Preprocessing
|
General: Read IO | Input/Output read operations from the disk. |
Dependent item | nutanix.host.general.io.read Preprocessing
|
General: Read IOPS | Total number of Input/Output read operations per second. |
Dependent item | nutanix.host.general.iops.read Preprocessing
|
General: Read IO, % | The total percentage of Input/Output operations that are reads. |
Dependent item | nutanix.host.general.io.read.percent Preprocessing
|
General: Read IO, bandwidth | Read data transferred in B/sec from the disk. |
Dependent item | nutanix.host.general.io.read.bandwidth Preprocessing
|
General: Read IO, latency | Average Input/Output read latency. |
Dependent item | nutanix.host.general.io.read.latency Preprocessing
|
General: Reboot pending | Indicates whether a reboot of the host is pending. |
Dependent item | nutanix.host.general.reboot Preprocessing
|
General: Sequence IO | The number of sequential Input/Output operations. |
Dependent item | nutanix.host.general.io.sequence Preprocessing
|
General: Sequence IO, % | The percentage of sequential Input/Output. |
Dependent item | nutanix.host.general.io.sequence.percent Preprocessing
|
General: Storage capacity, bytes | Total size of the datastores used by this system in bytes. |
Dependent item | nutanix.host.general.storage.capacity.bytes Preprocessing
|
General: Storage free, bytes | Total free space of all the datastores used by this system in bytes. |
Dependent item | nutanix.host.general.storage.free.bytes Preprocessing
|
General: Storage logical usage, bytes | Total logical space used by the datastores of this system in bytes. |
Dependent item | nutanix.host.general.storage.logical.usage.bytes Preprocessing
|
General: Storage usage, bytes | Total physical datastore space used by this host and all its snapshots on the datastores. |
Dependent item | nutanix.host.general.storage.usage.bytes Preprocessing
|
General: Timespan, sec | Host timespan. |
Dependent item | nutanix.host.general.timespan.sec Preprocessing
|
General: Total CPU capacity | Total host CPU capacity in Hz. |
Dependent item | nutanix.host.general.cpu.capacity.hz Preprocessing
|
General: IO total, sec | Total time of Input/Output operations. |
Dependent item | nutanix.host.general.io.total.sec Preprocessing
|
General: IO total, bytes | Total size of Input/Output operations. |
Dependent item | nutanix.host.general.io.total.bytes Preprocessing
|
General: Total memory, bytes | Total host memory in bytes. |
Dependent item | nutanix.host.general.memory.total.bytes Preprocessing
|
General: IO total read, sec | Total time of Input/Output read operations. |
Dependent item | nutanix.host.general.io.read.total.sec Preprocessing
|
General: IO total read, bytes | Total size of Input/Output read operations. |
Dependent item | nutanix.host.general.io.read.total.bytes Preprocessing
|
General: Total transformed usage, bytes | Actual usage of storage. |
Dependent item | nutanix.host.general.transformed.usage.total.bytes Preprocessing
|
General: Total untransformed usage, bytes | Logical usage of storage (physical usage divided by the replication factor). |
Dependent item | nutanix.host.general.untransformed.usage.total.bytes Preprocessing
|
General: Write IO | Total number of Input/Output write operations. |
Dependent item | nutanix.host.general.io.write Preprocessing
|
General: Write IOPS | Total number of Input/Output write operations per second. |
Dependent item | nutanix.host.general.iops.write Preprocessing
|
General: Write IO, % | Total percentage of Input/Output operations that are writes. |
Dependent item | nutanix.host.general.io.write.percent Preprocessing
|
General: Write IO, bandwidth | Write data transferred in B/sec from the disk. |
Dependent item | nutanix.host.general.io.write.bandwidth Preprocessing
|
General: Write IO, latency | Average Input/Output write operation latency. |
Dependent item | nutanix.host.general.io.write.latency Preprocessing
|
Hypervisor: CPU usage, % | Percentage of CPU used by the Hypervisor. |
Dependent item | nutanix.host.hypervisor.cpu.usage.percent Preprocessing
|
Hypervisor: Full name | Full name of the Hypervisor running on the host. |
Dependent item | nutanix.host.hypervisor.name Preprocessing
|
Hypervisor: IOPS | Input/Output operations per second from the Hypervisor. |
Dependent item | nutanix.host.hypervisor.iops Preprocessing
|
Hypervisor: IO, bandwidth | Data transferred in B/sec from the Hypervisor. |
Dependent item | nutanix.host.hypervisor.io.bandwidth Preprocessing
|
Hypervisor: IO, latency | Input/Output operation latency from the Hypervisor. |
Dependent item | nutanix.host.hypervisor.io.latency Preprocessing
|
Hypervisor: Memory usage, % | Percentage of memory used by the Hypervisor. |
Dependent item | nutanix.host.hypervisor.memory.usage.percent Preprocessing
|
Hypervisor: IO | The number of Input/Output operations from the Hypervisor. |
Dependent item | nutanix.host.hypervisor.io Preprocessing
|
Hypervisor: Read IOPS | Input/Output read operations per second from the Hypervisor. |
Dependent item | nutanix.host.hypervisor.iops.read Preprocessing
|
Hypervisor: Read IO | The number of Input/Output read operations from the Hypervisor. |
Dependent item | nutanix.host.hypervisor.io.read Preprocessing
|
Hypervisor: Read IO, bandwidth | Read data transferred in B/sec from the Hypervisor. |
Dependent item | nutanix.host.hypervisor.io.read.bandwidth Preprocessing
|
Hypervisor: Read IO, latency | Input/Output read latency from the Hypervisor. |
Dependent item | nutanix.host.hypervisor.io.read.latency Preprocessing
|
Hypervisor: Received, bytes | Bytes received over the network reported by the Hypervisor. |
Dependent item | nutanix.host.hypervisor.received.bytes Preprocessing
|
Hypervisor: Timespan, sec | Hypervisor timespan. |
Dependent item | nutanix.host.hypervisor.timespan.sec Preprocessing
|
Hypervisor: IO total, sec | Total Input/Output operation time from the Hypervisor. |
Dependent item | nutanix.host.hypervisor.io.total.sec Preprocessing
|
Hypervisor: IO total, bytes | Total Input/Output operation size from the Hypervisor. |
Dependent item | nutanix.host.hypervisor.io.total.bytes Preprocessing
|
Hypervisor: IO total read, bytes | Total size of Input/Output read operations from the Hypervisor. |
Dependent item | nutanix.host.hypervisor.io.read.total.bytes Preprocessing
|
Hypervisor: IO total read, sec | Total time of Input/Output read operations from the Hypervisor. |
Dependent item | nutanix.host.hypervisor.io.read.total.sec Preprocessing
|
Hypervisor: Transmitted, bytes | Bytes transmitted over the network reported by the Hypervisor. |
Dependent item | nutanix.host.hypervisor.transmitted.bytes Preprocessing
|
Hypervisor: Write IOPS | Input/Output write operations per second from the Hypervisor. |
Dependent item | nutanix.host.hypervisor.iops.write Preprocessing
|
Hypervisor: Write IO | Input/Output write operations from the Hypervisor. |
Dependent item | nutanix.host.hypervisor.io.write Preprocessing
|
Hypervisor: Write IO, bandwidth | Write data transferred in B/sec from the Hypervisor. |
Dependent item | nutanix.host.hypervisor.io.write.bandwidth Preprocessing
|
Hypervisor: Write IO, latency | Input/Output write latency from the Hypervisor. |
Dependent item | nutanix.host.hypervisor.io.write.latency Preprocessing
|
Hypervisor: Number of CPU cores | The number of CPU cores. |
Dependent item | nutanix.host.hypervisor.cpu.cores.num Preprocessing
|
Hypervisor: Number of CPU sockets | The number of CPU sockets. |
Dependent item | nutanix.host.hypervisor.cpu.sockets.num Preprocessing
|
Hypervisor: Number of CPU threads | The number of CPU threads. |
Dependent item | nutanix.host.hypervisor.cpu.threads.num Preprocessing
|
Storage Controller: IOPS | Input/Output operations per second from the Storage Controller. |
Dependent item | nutanix.host.storage.controller.iops Preprocessing
|
Storage Controller: IO | Input/Output operations from the Storage Controller. |
Dependent item | nutanix.host.storage.controller.io Preprocessing
|
Storage Controller: IO, bandwidth | Data transferred in B/sec from the Storage Controller. |
Dependent item | nutanix.host.storage.controller.io.bandwidth Preprocessing
|
Storage Controller: IO, latency | Input/Output latency from the Storage Controller. |
Dependent item | nutanix.host.storage.controller.io.latency Preprocessing
|
Storage Controller: Read IOPS | Input/Output read operations per second from the Storage Controller. |
Dependent item | nutanix.host.storage.controller.iops.read Preprocessing
|
Storage Controller: Read IO | Input/Output read operations from the Storage Controller. |
Dependent item | nutanix.host.storage.controller.io.read Preprocessing
|
Storage Controller: Read IO, % | Percentage of Input/Output operations from the Storage Controller that are reads. |
Dependent item | nutanix.host.storage.controller.io.read.percent Preprocessing
|
Storage Controller: Read IO, bandwidth | Read data transferred in B/sec from the Storage Controller. |
Dependent item | nutanix.host.storage.controller.io.read.bandwidth Preprocessing
|
Storage Controller: Read IO, latency | Input/Output read latency from the Storage Controller. |
Dependent item | nutanix.host.storage.controller.io.read.latency Preprocessing
|
Storage Controller: Read IO, bytes | Storage Controller average read Input/Output in bytes. |
Dependent item | nutanix.host.storage.controller.io.read.bytes Preprocessing
|
Storage Controller: Total transformed usage, bytes | Actual usage of the Storage Controller. |
Dependent item | nutanix.host.storage.controller.transformed.usage.total.bytes Preprocessing
|
Storage Controller: Write IO | Input/Output write operations to the Storage Controller. |
Dependent item | nutanix.host.storage.controller.io.write Preprocessing
|
Storage Controller: Write IOPS | Input/Output write operations per second to the Storage Controller. |
Dependent item | nutanix.host.storage.controller.iops.write Preprocessing
|
Storage Controller: Write IO, % | Percentage of Input/Output operations to the Storage Controller that are writes. |
Dependent item | nutanix.host.storage.controller.io.write.percent Preprocessing
|
Storage Controller: Write IO, bandwidth | Write data transferred in B/sec to the Storage Controller. |
Dependent item | nutanix.host.storage.controller.io.write.bandwidth Preprocessing
|
Storage Controller: Write IO, latency | Input/Output write latency to the Storage Controller. |
Dependent item | nutanix.host.storage.controller.io.write.latency Preprocessing
|
Storage Controller: Write IO, bytes | Storage Controller average write Input/Output in bytes. |
Dependent item | nutanix.host.storage.controller.io.write.bytes Preprocessing
|
Storage Tier: Das-sata capacity, bytes | The total capacity of Das-sata in bytes. |
Dependent item | nutanix.host.storage.controller.tier.das_sata.capacity.bytes Preprocessing
|
Storage Tier: Das-sata free, bytes | The free space of Das-sata in bytes. |
Dependent item | nutanix.host.storage.controller.tier.das_sata.free.bytes Preprocessing
|
Storage Tier: Das-sata usage, bytes | The used space of Das-sata in bytes. |
Dependent item | nutanix.host.storage.controller.tier.das_sata.usage.bytes Preprocessing
|
Storage Tier: SSD capacity, bytes | The total capacity of SSD in bytes. |
Dependent item | nutanix.host.storage.controller.tier.ssd.capacity.bytes Preprocessing
|
Storage Tier: SSD free, bytes | The free space of SSD in bytes. |
Dependent item | nutanix.host.storage.controller.tier.ssd.free.bytes Preprocessing
|
Storage Tier: SSD usage, bytes | The used space of SSD in bytes. |
Dependent item | nutanix.host.storage.controller.tier.ssd.usage.bytes Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Nutanix: Failed to get metric data from the API | Failed to get data from the API. Check the latest values for details. |
length(last(/Nutanix Host Prism Element by HTTP/nutanix.host.metric.get.check))>0 |High |
||
Nutanix: Failed to get disk data from the API | Failed to get data from the API. Check the latest values for details. |
length(last(/Nutanix Host Prism Element by HTTP/nutanix.host.disk.get.check))>0 |High |
||
Nutanix: Failed to get alert data from the API | Failed to get data from the API. Check the latest values for details. |
length(last(/Nutanix Host Prism Element by HTTP/nutanix.host.alert.get.check))>0 |High |
||
Nutanix: Host is in degraded status | Host is in a degraded status. The host may soon become unavailable. |
last(/Nutanix Host Prism Element by HTTP/nutanix.host.general.degraded)=1 |High |
Name | Description | Type | Key and additional info |
---|---|---|---|
Disk discovery | Discovery of all disks. |
Dependent item | nutanix.host.disk.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Disk [{#DISK.SERIAL}]: Bandwidth | Bandwidth of the disk in B/sec. |
Dependent item | nutanix.host.disk.io.bandwidth["{#DISK.SERIAL}"] Preprocessing
|
Disk [{#DISK.SERIAL}]: Space: Total, bytes | The total disk space in bytes. |
Dependent item | nutanix.host.disk.capacity.bytes["{#DISK.SERIAL}"] Preprocessing
|
Disk [{#DISK.SERIAL}]: Space: Free, bytes | The free disk space in bytes. |
Dependent item | nutanix.host.disk.free.bytes["{#DISK.SERIAL}"] Preprocessing
|
Disk [{#DISK.SERIAL}]: IOPS | The number of Input/Output operations from the disk. |
Dependent item | nutanix.host.disk.iops["{#DISK.SERIAL}"] Preprocessing
|
Disk [{#DISK.SERIAL}]: IO, latency | The average Input/Output operation latency. |
Dependent item | nutanix.host.disk.io.avg.latency["{#DISK.SERIAL}"] Preprocessing
|
Disk [{#DISK.SERIAL}]: Space: Logical usage, bytes | The logical used disk space in bytes. |
Dependent item | nutanix.host.disk.logical.usage.bytes["{#DISK.SERIAL}"] Preprocessing
|
Disk [{#DISK.SERIAL}]: Online | Indicates whether the disk is online. |
Dependent item | nutanix.host.disk.online["{#DISK.SERIAL}"] Preprocessing
|
Disk [{#DISK.SERIAL}]: Status | Current disk status. One of the following: - NORMAL; - DATA_MIGRATION_INITIATED; - MARKED_FOR_REMOVAL_BUT_NOT_DETACHABLE; - DETACHABLE. |
Dependent item | nutanix.host.disk.status["{#DISK.SERIAL}"] Preprocessing
|
Disk [{#DISK.SERIAL}]: Space: Used, bytes | The used disk space in bytes. |
Dependent item | nutanix.host.disk.usage.bytes["{#DISK.SERIAL}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Alert discovery | Discovery of all alerts. Alerts will be grouped by title. For each alert, in addition to the basic information, the number of activations and the last alert ID will be available. |
Dependent item | nutanix.host.alert.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Alert [{#ALERT.NAME}]: Full title | The full title of the alert. |
Dependent item | nutanix.host.alert.title["{#ALERT.KEY}"] Preprocessing
|
Alert [{#ALERT.NAME}]: Create datetime | The alert creation date and time. |
Dependent item | nutanix.host.alert.created["{#ALERT.KEY}"] Preprocessing
|
Alert [{#ALERT.NAME}]: Severity | Alert severity. One of the following: - Info; - Warning; - Critical; - Unknown. |
Dependent item | nutanix.host.alert.severity["{#ALERT.KEY}"] Preprocessing
|
Alert [{#ALERT.NAME}]: State | Alert state. One of the following: - OK; - Problem. |
Dependent item | nutanix.host.alert.state["{#ALERT.KEY}"] Preprocessing
|
Alert [{#ALERT.NAME}]: Detailed message | Detailed information about the current alert. |
Dependent item | nutanix.host.alert.message["{#ALERT.KEY}"] Preprocessing
|
Alert [{#ALERT.NAME}]: Last alert ID | Latest ID of the alert. |
Dependent item | nutanix.host.alert.last_id["{#ALERT.KEY}"] Preprocessing
|
Alert [{#ALERT.NAME}]: Count alerts | The number of times this alert was triggered. |
Dependent item | nutanix.host.alert.count["{#ALERT.KEY}"] Preprocessing
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed to monitor HashiCorp Nomad by Zabbix. It works without any external scripts. Currently, the template supports discovery of Nomad servers and clients.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Define the {$NOMAD.ENDPOINT.API.URL} macro value with the correct web protocol, host, and port.
Prepare an ACL token with the node:read, namespace:read-job, agent:read, and management permissions applied, and define the {$NOMAD.TOKEN} macro value.
> Refer to the vendor documentation about Nomad native ACL or Nomad Vault-generated tokens if you have the HashiCorp Vault integration configured.
Additional information:
Useful links
Name | Description | Default |
---|---|---|
{$NOMAD.ENDPOINT.API.URL} | API endpoint URL for one of the Nomad cluster members. |
http://localhost:4646 |
{$NOMAD.TOKEN} | Nomad authentication token. |
<PUT YOUR AUTH TOKEN> |
{$NOMAD.DATA.TIMEOUT} | Response timeout for an API. |
15s |
{$NOMAD.HTTP.PROXY} | Sets the HTTP proxy for script and HTTP agent items. If this parameter is empty, then no proxy is used. |
|
{$NOMAD.API.RESPONSE.SUCCESS} | HTTP API successful response code. Availability triggers threshold. Change, if needed. |
200 |
{$NOMAD.SERVER.NAME.MATCHES} | The filter to include HashiCorp Nomad servers by name. |
.* |
{$NOMAD.SERVER.NAME.NOT_MATCHES} | The filter to exclude HashiCorp Nomad servers by name. |
CHANGE_IF_NEEDED |
{$NOMAD.SERVER.DC.MATCHES} | The filter to include HashiCorp Nomad servers by datacenter belonging. |
.* |
{$NOMAD.SERVER.DC.NOT_MATCHES} | The filter to exclude HashiCorp Nomad servers by datacenter belonging. |
CHANGE_IF_NEEDED |
{$NOMAD.CLIENT.NAME.MATCHES} | The filter to include HashiCorp Nomad clients by name. |
.* |
{$NOMAD.CLIENT.NAME.NOT_MATCHES} | The filter to exclude HashiCorp Nomad clients by name. |
CHANGE_IF_NEEDED |
{$NOMAD.CLIENT.DC.MATCHES} | The filter to include HashiCorp Nomad clients by datacenter belonging. |
.* |
{$NOMAD.CLIENT.DC.NOT_MATCHES} | The filter to exclude HashiCorp Nomad clients by datacenter belonging. |
CHANGE_IF_NEEDED |
{$NOMAD.CLIENT.SCHEDULE.ELIGIBILITY.MATCHES} | The filter to include HashiCorp Nomad clients by scheduling eligibility. |
.* |
{$NOMAD.CLIENT.SCHEDULE.ELIGIBILITY.NOT_MATCHES} | The filter to exclude HashiCorp Nomad clients by scheduling eligibility. |
CHANGE_IF_NEEDED |
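With the macros above defined, a quick way to check the endpoint and token before linking the template is a short script. This is a sketch, not part of the template; the paths are standard Nomad HTTP API endpoints, and whether they exactly match the requests made by the discovery items is an assumption.

```python
# Verify the Nomad API endpoint and ACL token (sketch).
import requests

API_URL = "http://localhost:4646"                # {$NOMAD.ENDPOINT.API.URL}
HEADERS = {"X-Nomad-Token": "<your ACL token>"}  # {$NOMAD.TOKEN}; omit the header if ACLs are disabled

# Server and client inventories similar to what the discovery items rely on.
for path in ("/v1/agent/members", "/v1/nodes"):
    resp = requests.get(API_URL + path, headers=HEADERS, timeout=15)
    print(path, resp.status_code)                # expect {$NOMAD.API.RESPONSE.SUCCESS} (200 by default)
```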
Name | Description | Type | Key and additional info |
---|---|---|---|
Nomad clients get | Nomad clients data in raw format. |
HTTP agent | nomad.client.nodes.get Preprocessing
|
Client nodes API response | Client nodes API response message. |
Dependent item | nomad.client.nodes.api.response Preprocessing
|
Nomad servers get | Nomad servers data in raw format. |
Script | nomad.server.nodes.get |
Server-related APIs response | Server-related ( |
Dependent item | nomad.server.api.response Preprocessing
|
Region | Current cluster region. |
Dependent item | nomad.region Preprocessing
|
Nomad servers count | Nomad servers count. |
Dependent item | nomad.servers.count Preprocessing
|
Nomad clients count | Nomad clients count. |
Dependent item | nomad.clients.count Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
HashiCorp Nomad: Client nodes API connection has failed | Client nodes API connection has failed. |
find(/HashiCorp Nomad by HTTP/nomad.client.nodes.api.response,,"like","{$NOMAD.API.RESPONSE.SUCCESS}")=0 |Average |
Manual close: Yes | |
HashiCorp Nomad: Server-related API connection has failed | Server-related API connection has failed. |
find(/HashiCorp Nomad by HTTP/nomad.server.api.response,,"like","{$NOMAD.API.RESPONSE.SUCCESS}")=0 |Average |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Clients discovery | Client nodes discovery. |
Dependent item | nomad.clients.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Servers discovery | Server nodes discovery. |
Dependent item | nomad.servers.discovery Preprocessing
|
This template is designed to monitor HashiCorp Nomad clients by Zabbix. It works without any external scripts.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Set up Nomad according to the vendor documentation.
Prepare an ACL token with the node:read and namespace:read-job permissions applied, and define the {$NOMAD.TOKEN} macro value.
> Refer to the vendor documentation about Nomad native ACL or Nomad Vault-generated tokens if you're using integration with HashiCorp Vault.
Set the {$NOMAD.CLIENT.API.SCHEME} and {$NOMAD.CLIENT.API.PORT} macros to define the common Nomad API web scheme and connection port.
Additional information:
You have to prepare an additional ACL token only if you wish to monitor Nomad clients as separate entities. If you're using clients discovery, the token will be inherited from the master host linked to the HashiCorp Nomad by HTTP template.
If you're not using ACLs, skip the second setup step.
The Nomad clients use the default web scheme (HTTP) and the default API port (4646). If you're using clients discovery and need to redefine macros for a particular host created from a prototype, use context macros such as {$NOMAD.CLIENT.API.SCHEME:NECESSARY.IP} and/or {$NOMAD.CLIENT.API.PORT:NECESSARY.IP} at the master host or template level.
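For illustration only (the IP and values below are placeholders, not defaults of this template), such a context macro override defined at the master host or template level could look like:

```
{$NOMAD.CLIENT.API.SCHEME:"192.0.2.10"} = https
{$NOMAD.CLIENT.API.PORT:"192.0.2.10"}   = 4746
```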
Useful links:
Name | Description | Default |
---|---|---|
{$NOMAD.CLIENT.API.SCHEME} | Nomad client API scheme. |
http |
{$NOMAD.CLIENT.API.PORT} | Nomad client API port. |
4646 |
{$NOMAD.TOKEN} | Nomad authentication token. |
<PUT YOUR AUTH TOKEN> |
{$NOMAD.DATA.TIMEOUT} | Response timeout for an API. |
15s |
{$NOMAD.HTTP.PROXY} | Sets the HTTP proxy for HTTP agent item. If this parameter is empty, then no proxy is used. |
|
{$NOMAD.API.RESPONSE.SUCCESS} | HTTP API successful response code. Availability triggers threshold. Change, if needed. |
200 |
{$NOMAD.CLIENT.RPC.PORT} | Nomad RPC service port. |
4647 |
{$NOMAD.CLIENT.SERF.PORT} | Nomad serf service port. |
4648 |
{$NOMAD.CLIENT.OPEN.FDS.MAX.WARN} | Maximum percentage of used file descriptors. |
90 |
{$NOMAD.DISK.NAME.MATCHES} | The filter to include HashiCorp Nomad client disks by name. |
.* |
{$NOMAD.DISK.NAME.NOT_MATCHES} | The filter to exclude HashiCorp Nomad client disks by name. |
CHANGE_IF_NEEDED |
{$NOMAD.JOB.NAME.MATCHES} | The filter to include HashiCorp Nomad client jobs by name. |
.* |
{$NOMAD.JOB.NAME.NOT_MATCHES} | The filter to exclude HashiCorp Nomad client jobs by name. |
CHANGE_IF_NEEDED |
{$NOMAD.JOB.NAMESPACE.MATCHES} | The filter to include HashiCorp Nomad client jobs by namespace. |
.* |
{$NOMAD.JOB.NAMESPACE.NOT_MATCHES} | The filter to exclude HashiCorp Nomad client jobs by namespace. |
CHANGE_IF_NEEDED |
{$NOMAD.JOB.TYPE.MATCHES} | The filter to include HashiCorp Nomad client jobs by type. |
.* |
{$NOMAD.JOB.TYPE.NOT_MATCHES} | The filter to exclude HashiCorp Nomad client jobs by type. |
CHANGE_IF_NEEDED |
{$NOMAD.JOB.TASK.GROUP.MATCHES} | The filter to include HashiCorp Nomad client jobs by task group belonging. |
.* |
{$NOMAD.JOB.TASK.GROUP.NOT_MATCHES} | The filter to exclude HashiCorp Nomad client jobs by task group belonging. |
CHANGE_IF_NEEDED |
{$NOMAD.DRIVER.NAME.MATCHES} | The filter to include HashiCorp Nomad client drivers by name. |
.* |
{$NOMAD.DRIVER.NAME.NOT_MATCHES} | The filter to exclude HashiCorp Nomad client drivers by name. |
CHANGE_IF_NEEDED |
{$NOMAD.DRIVER.DETECT.MATCHES} | The filter to include HashiCorp Nomad client drivers by detection state. Possible filtering values: |
.* |
{$NOMAD.DRIVER.DETECT.NOT_MATCHES} | The filter to exclude HashiCorp Nomad client drivers by detection state. Possible filtering values: |
CHANGE_IF_NEEDED |
{$NOMAD.CPU.UTIL.MIN} | CPU utilization threshold. Measured as a percentage. |
90 |
{$NOMAD.RAM.AVAIL.MIN} | Minimum available RAM threshold. Measured as a percentage. |
5 |
{$NOMAD.INODES.FREE.MIN.WARN} | Warning threshold of the filesystem metadata utilization. Measured as a percentage. |
20 |
{$NOMAD.INODES.FREE.MIN.CRIT} | Critical threshold of the filesystem metadata utilization. Measured as a percentage. |
10 |
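To see the raw data behind the "Telemetry get" HTTP agent item, you can query the client's metrics endpoint directly. This is a sketch, not part of the template; the address is a placeholder and the gauge name is an assumption inferred from the item keys below.

```python
# Fetch Nomad client telemetry and read one gauge (sketch).
import requests

BASE = "http://192.0.2.20:4646"                  # {$NOMAD.CLIENT.API.SCHEME}://<client IP>:{$NOMAD.CLIENT.API.PORT}
HEADERS = {"X-Nomad-Token": "<your ACL token>"}  # omit the header if ACLs are disabled

metrics = requests.get(f"{BASE}/v1/metrics", headers=HEADERS, timeout=15).json()
running = [g["Value"] for g in metrics.get("Gauges", [])
           if g.get("Name") == "nomad.client.allocations.running"]
print("allocations running:", running[0] if running else "metric not present yet")
```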
Name | Description | Type | Key and additional info |
---|---|---|---|
Telemetry get | Telemetry data in raw format. |
HTTP agent | nomad.client.data.get Preprocessing
|
Metrics | Nomad client metrics in raw format. |
Dependent item | nomad.client.metrics.get Preprocessing
|
Monitoring API response | Monitoring API response message. |
Dependent item | nomad.client.data.api.response Preprocessing
|
Service [rpc] state | Current [rpc] service state. |
Simple check | net.tcp.service[tcp,,{$NOMAD.CLIENT.RPC.PORT}] Preprocessing
|
Service [serf] state | Current [serf] service state. |
Simple check | net.tcp.service[tcp,,{$NOMAD.CLIENT.SERF.PORT}] Preprocessing
|
CPU allocated | Total amount of CPU shares the scheduler has allocated to tasks. |
Dependent item | nomad.client.allocated.cpu Preprocessing
|
CPU unallocated | Total amount of CPU shares free for the scheduler to allocate to tasks. |
Dependent item | nomad.client.unallocated.cpu Preprocessing
|
Memory allocated | Total amount of memory the scheduler has allocated to tasks. |
Dependent item | nomad.client.allocated.memory Preprocessing
|
Memory unallocated | Total amount of memory free for the scheduler to allocate to tasks. |
Dependent item | nomad.client.unallocated.memory Preprocessing
|
Disk allocated | Total amount of disk space the scheduler has allocated to tasks. |
Dependent item | nomad.client.allocated.disk Preprocessing
|
Disk unallocated | Total amount of disk space free for the scheduler to allocate to tasks. |
Dependent item | nomad.client.unallocated.disk Preprocessing
|
Allocations blocked | Number of allocations waiting for previous versions. |
Dependent item | nomad.client.allocations.blocked Preprocessing
|
Allocations migrating | Number of allocations migrating data from previous versions. |
Dependent item | nomad.client.allocations.migrating Preprocessing
|
Allocations pending | Number of allocations pending (received by the client but not yet running). |
Dependent item | nomad.client.allocations.pending Preprocessing
|
Allocations starting | Number of allocations starting. |
Dependent item | nomad.client.allocations.start Preprocessing
|
Allocations running | Number of allocations running. |
Dependent item | nomad.client.allocations.running Preprocessing
|
Allocations terminal | Number of allocations terminal. |
Dependent item | nomad.client.allocations.terminal Preprocessing
|
Allocations failed, rate | Number of allocations failed. |
Dependent item | nomad.client.allocations.failed Preprocessing
|
Allocations completed, rate | Number of allocations completed. |
Dependent item | nomad.client.allocations.complete Preprocessing
|
Allocations restarted, rate | Number of allocations restarted. |
Dependent item | nomad.client.allocations.restart Preprocessing
|
Allocations OOM killed | Number of allocations OOM killed. |
Dependent item | nomad.client.allocations.oom_killed Preprocessing
|
CPU idle utilization | CPU utilization in idle state. |
Dependent item | nomad.client.cpu.idle Preprocessing
|
CPU system utilization | CPU utilization in system space. |
Dependent item | nomad.client.cpu.system Preprocessing
|
CPU total utilization | Total CPU utilization. |
Dependent item | nomad.client.cpu.total Preprocessing
|
CPU user utilization | CPU utilization in user space. |
Dependent item | nomad.client.cpu.user Preprocessing
|
Memory available | Total amount of memory available to processes which includes free and cached memory. |
Dependent item | nomad.client.memory.available Preprocessing
|
Memory free | Amount of memory which is free. |
Dependent item | nomad.client.memory.free Preprocessing
|
Memory size | Total amount of physical memory on the node. |
Dependent item | nomad.client.memory.total Preprocessing
|
Memory used | Amount of memory used by processes. |
Dependent item | nomad.client.memory.used Preprocessing
|
Uptime | Uptime of the host running the Nomad client. |
Dependent item | nomad.client.uptime Preprocessing
|
Node info get | Node info data in raw format. |
HTTP agent | nomad.client.node.info.get Preprocessing
|
Nomad client version | Nomad client version. |
Dependent item | nomad.client.version Preprocessing
|
Nodes API response | Nodes API response message. |
Dependent item | nomad.client.node.info.api.response Preprocessing
|
Allocated jobs get | Allocated jobs data in raw format. |
HTTP agent | nomad.client.job.allocs.get Preprocessing
|
Allocations API response | Allocations API response message. |
Dependent item | nomad.client.job.allocs.api.response Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
HashiCorp Nomad Client: Monitoring API connection has failed | Monitoring API connection has failed. |
find(/HashiCorp Nomad Client by HTTP/nomad.client.data.api.response,,"like","{$NOMAD.API.RESPONSE.SUCCESS}")=0 |Average |
Manual close: Yes | |
HashiCorp Nomad Client: Service [rpc] is down | Cannot establish the connection to [rpc] service port {$NOMAD.CLIENT.RPC.PORT}. |
last(/HashiCorp Nomad Client by HTTP/net.tcp.service[tcp,,{$NOMAD.CLIENT.RPC.PORT}]) = 0 |Average |
Manual close: Yes | |
HashiCorp Nomad Client: Service [serf] is down | Cannot establish the connection to [serf] service port {$NOMAD.CLIENT.SERF.PORT}. |
last(/HashiCorp Nomad Client by HTTP/net.tcp.service[tcp,,{$NOMAD.CLIENT.SERF.PORT}]) = 0 |Average |
Manual close: Yes | |
HashiCorp Nomad Client: OOM killed allocations found | OOM killed allocations found. |
last(/HashiCorp Nomad Client by HTTP/nomad.client.allocations.oom_killed) > 0 |Warning |
Manual close: Yes | |
HashiCorp Nomad Client: High CPU utilization | CPU utilization is too high. The system might be slow to respond. |
min(/HashiCorp Nomad Client by HTTP/nomad.client.cpu.total, 10m) >= {$NOMAD.CPU.UTIL.MIN} |Average |
||
HashiCorp Nomad Client: High memory utilization | RAM utilization is too high. The system might be slow to respond. |
(min(/HashiCorp Nomad Client by HTTP/nomad.client.memory.available, 10m) / last(/HashiCorp Nomad Client by HTTP/nomad.client.memory.total))*100 <= {$NOMAD.RAM.AVAIL.MIN} |Average |
||
HashiCorp Nomad Client: The host has been restarted | The host uptime is less than 10 minutes. |
last(/HashiCorp Nomad Client by HTTP/nomad.client.uptime) < 10m |Warning |
Manual close: Yes | |
HashiCorp Nomad Client: Nomad client version has changed | Nomad client version has changed. |
change(/HashiCorp Nomad Client by HTTP/nomad.client.version)<>0 |Info |
Manual close: Yes | |
HashiCorp Nomad Client: Nodes API connection has failed | Nodes API connection has failed. |
find(/HashiCorp Nomad Client by HTTP/nomad.client.node.info.api.response,,"like","{$NOMAD.API.RESPONSE.SUCCESS}")=0 |Average |
Manual close: Yes Depends on:
|
|
HashiCorp Nomad Client: Allocations API connection has failed | Allocations API connection has failed. |
find(/HashiCorp Nomad Client by HTTP/nomad.client.job.allocs.api.response,,"like","{$NOMAD.API.RESPONSE.SUCCESS}")=0 |Average |
Manual close: Yes Depends on:
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Drivers discovery | Client drivers discovery. |
Dependent item | nomad.client.drivers.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Driver [{#DRIVER.NAME}] state | Driver [{#DRIVER.NAME}] state. |
Dependent item | nomad.client.driver.state["{#DRIVER.NAME}"] Preprocessing
|
Driver [{#DRIVER.NAME}] detection state | Driver [{#DRIVER.NAME}] detection state. |
Dependent item | nomad.client.driver.detected["{#DRIVER.NAME}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
HashiCorp Nomad Client: Driver [{#DRIVER.NAME}] is in unhealthy state | The [{#DRIVER.NAME}] driver detected, but its state is unhealthy. |
last(/HashiCorp Nomad Client by HTTP/nomad.client.driver.state["{#DRIVER.NAME}"]) = 0 and last(/HashiCorp Nomad Client by HTTP/nomad.client.driver.detected["{#DRIVER.NAME}"]) = 1 |Warning |
Manual close: Yes | |
HashiCorp Nomad Client: Driver [{#DRIVER.NAME}] detection state has changed | The [{#DRIVER.NAME}] driver detection state has changed. |
change(/HashiCorp Nomad Client by HTTP/nomad.client.driver.detected["{#DRIVER.NAME}"]) <> 0 |Info |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Physical disks discovery | Physical disks discovery. |
Dependent item | nomad.client.disk.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Disk ["{#DEV.NAME}"] space available | Amount of space which is available on ["{#DEV.NAME}"] disk. |
Dependent item | nomad.client.disk.available["{#DEV.NAME}"] Preprocessing
|
Disk ["{#DEV.NAME}"] inodes utilization | Disk space consumed by the inodes on ["{#DEV.NAME}"] disk. |
Dependent item | nomad.client.disk.inodes_percent["{#DEV.NAME}"] Preprocessing
|
Disk ["{#DEV.NAME}"] size | Total size of the ["{#DEV.NAME}"] device. |
Dependent item | nomad.client.disk.size["{#DEV.NAME}"] Preprocessing
|
Disk ["{#DEV.NAME}"] space utilization | Percentage of disk ["{#DEV.NAME}"] space used. |
Dependent item | nomad.client.disk.used_percent["{#DEV.NAME}"] Preprocessing
|
Disk ["{#DEV.NAME}"] space used | Amount of disk ["{#DEV.NAME}"] space which has been used. |
Dependent item | nomad.client.disk.used["{#DEV.NAME}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
HashiCorp Nomad Client: Running out of free inodes on [{#DEV.NAME}] device | It may become impossible to write to a disk if there are no index nodes left. |
min(/HashiCorp Nomad Client by HTTP/nomad.client.disk.inodes_percent["{#DEV.NAME}"],5m) >= {$NOMAD.INODES.FREE.MIN.WARN:"{#DEV.NAME}"} |Warning |
Manual close: Yes Depends on:
|
|
HashiCorp Nomad Client: Running out of free inodes on [{#DEV.NAME}] device | It may become impossible to write to a disk if there are no index nodes left. |
min(/HashiCorp Nomad Client by HTTP/nomad.client.disk.inodes_percent["{#DEV.NAME}"],5m) >= {$NOMAD.INODES.FREE.MIN.CRIT:"{#DEV.NAME}"} |Average |
Manual close: Yes | |
HashiCorp Nomad Client: High disk [{#DEV.NAME}] utilization | High disk [{#DEV.NAME}] utilization. |
min(/HashiCorp Nomad Client by HTTP/nomad.client.disk.used_percent["{#DEV.NAME}"],5m) >= {$NOMAD.DISK.UTIL.MIN.WARN:"{#DEV.NAME}"} |Warning |
Manual close: Yes Depends on:
|
|
HashiCorp Nomad Client: High disk [{#DEV.NAME}] utilization | High disk [{#DEV.NAME}] utilization. |
min(/HashiCorp Nomad Client by HTTP/nomad.client.disk.used_percent["{#DEV.NAME}"],5m) >= {$NOMAD.DISK.UTIL.MIN.CRIT:"{#DEV.NAME}"} |Average |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Allocated jobs discovery | Allocated jobs discovery. |
Dependent item | nomad.client.alloc.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Job ["{#JOB.NAME}"] CPU allocated | Total CPU resources allocated by the ["{#JOB.NAME}"] job across all cores. |
Dependent item | nomad.client.allocs.cpu.allocated["{#JOB.NAME}","{#JOB.TASK.GROUP}","{#JOB.NAMESPACE}"] Preprocessing
|
Job ["{#JOB.NAME}"] CPU system utilization | Total CPU resources consumed by the ["{#JOB.NAME}"] job in system space. |
Dependent item | nomad.client.allocs.cpu.system["{#JOB.NAME}","{#JOB.TASK.GROUP}","{#JOB.NAMESPACE}"] Preprocessing
|
Job ["{#JOB.NAME}"] CPU user utilization | Total CPU resources consumed by the ["{#JOB.NAME}"] job in user space. |
Dependent item | nomad.client.allocs.cpu.user["{#JOB.NAME}","{#JOB.TASK.GROUP}","{#JOB.NAMESPACE}"] Preprocessing
|
Job ["{#JOB.NAME}"] CPU total utilization | Total CPU resources consumed by the ["{#JOB.NAME}"] job across all cores. |
Dependent item | nomad.client.allocs.cpu.total_percent["{#JOB.NAME}","{#JOB.TASK.GROUP}","{#JOB.NAMESPACE}"] Preprocessing
|
Job ["{#JOB.NAME}"] CPU throttled periods time | Total number of CPU periods that the ["{#JOB.NAME}"] job was throttled. |
Dependent item | nomad.client.allocs.cpu.throttled_periods["{#JOB.NAME}","{#JOB.TASK.GROUP}","{#JOB.NAMESPACE}"] Preprocessing
|
Job ["{#JOB.NAME}"] CPU throttled time | Total time that the ["{#JOB.NAME}"] job was throttled. |
Dependent item | nomad.client.allocs.cpu.throttled_time["{#JOB.NAME}","{#JOB.TASK.GROUP}","{#JOB.NAMESPACE}"] Preprocessing
|
Job ["{#JOB.NAME}"] CPU ticks | CPU ticks consumed by the process for the ["{#JOB.NAME}"] job in the last collection interval. |
Dependent item | nomad.client.allocs.cpu.total_ticks["{#JOB.NAME}","{#JOB.TASK.GROUP}","{#JOB.NAMESPACE}"] Preprocessing
|
Job ["{#JOB.NAME}"] Memory allocated | Amount of memory allocated by the ["{#JOB.NAME}"] job. |
Dependent item | nomad.client.allocs.memory.allocated["{#JOB.NAME}","{#JOB.TASK.GROUP}","{#JOB.NAMESPACE}"] Preprocessing
|
Job ["{#JOB.NAME}"] Memory cached | Amount of memory cached by the ["{#JOB.NAME}"] job. |
Dependent item | nomad.client.allocs.memory.cache["{#JOB.NAME}","{#JOB.TASK.GROUP}","{#JOB.NAMESPACE}"] Preprocessing
|
Job ["{#JOB.NAME}"] Memory used | Total amount of memory used by the ["{#JOB.NAME}"] job. |
Dependent item | nomad.client.allocs.memory.usage["{#JOB.NAME}","{#JOB.TASK.GROUP}","{#JOB.NAMESPACE}"] Preprocessing
|
Job ["{#JOB.NAME}"] Memory swapped | Amount of memory swapped by the ["{#JOB.NAME}"] job. |
Dependent item | nomad.client.allocs.memory.swap["{#JOB.NAME}","{#JOB.TASK.GROUP}","{#JOB.NAMESPACE}"] Preprocessing
|
This template is designed to monitor HashiCorp Nomad servers by Zabbix. It works without any external scripts.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Refer to the vendor documentation. Use the {$NOMAD.SERVER.API.SCHEME} and {$NOMAD.SERVER.API.PORT} macros to define the common Nomad API web scheme and connection port.
Additional information: the default API scheme is HTTP and the default API port is 4646. If you're using servers discovery and you need to re-define macros for a particular host created from a prototype, use context macros like {$NOMAD.SERVER.API.SCHEME:NECESSARY.IP} and/or {$NOMAD.SERVER.API.PORT:NECESSARY.IP} on the master host or template level. Adjust the {$NOMAD.REDUNDANCY.MIN} macro value, based on your cluster node count, to configure the failure tolerance triggers correctly.
Useful links:
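To check manually that the Nomad HTTP API is reachable with the values you put into the macros, you can query it directly. This is only a hedged sketch: the host name and token are placeholders, and the exact endpoints polled by the "Telemetry get" and "Internal stats get" items are assumed here to be the standard /v1/metrics and /v1/agent/self API paths.
curl -s -H "X-Nomad-Token: <PUT YOUR AUTH TOKEN>" "http://nomad-server.example.com:4646/v1/metrics"
curl -s -H "X-Nomad-Token: <PUT YOUR AUTH TOKEN>" "http://nomad-server.example.com:4646/v1/agent/self"
Both calls should return JSON; a non-200 response usually points to a wrong port, scheme, or ACL token.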
Name | Description | Default |
---|---|---|
{$NOMAD.SERVER.API.SCHEME} | Nomad SERVER API scheme. |
http |
{$NOMAD.SERVER.API.PORT} | Nomad SERVER API port. |
4646 |
{$NOMAD.TOKEN} | Nomad authentication token. |
<PUT YOUR AUTH TOKEN> |
{$NOMAD.DATA.TIMEOUT} | Response timeout for an API. |
15s |
{$NOMAD.HTTP.PROXY} | Sets the HTTP proxy for HTTP agent item. If this parameter is empty, then no proxy is used. |
|
{$NOMAD.API.RESPONSE.SUCCESS} | HTTP API successful response code. Availability triggers threshold. Change, if needed. |
200 |
{$NOMAD.SERVER.RPC.PORT} | Nomad RPC service port. |
4647 |
{$NOMAD.SERVER.SERF.PORT} | Nomad serf service port. |
4648 |
{$NOMAD.REDUNDANCY.MIN} | The number of redundant servers needed to keep the cluster safe. The default value is '1' for a 3-node cluster. Change if needed. |
1 |
{$NOMAD.OPEN.FDS.MAX} | Maximum percentage of used file descriptors. |
90 |
{$NOMAD.SERVER.LEADER.LATENCY} | Leader last contact latency threshold. |
0.3s |
Name | Description | Type | Key and additional info |
---|---|---|---|
Telemetry get | Telemetry data in raw format. |
HTTP agent | nomad.server.data.get Preprocessing
|
Metrics | Nomad server metrics in raw format. |
Dependent item | nomad.server.metrics.get Preprocessing
|
Monitoring API response | Monitoring API response message. |
Dependent item | nomad.server.data.api.response Preprocessing
|
Internal stats get | Internal stats data in raw format. |
HTTP agent | nomad.server.stats.get Preprocessing
|
Internal stats API response | Internal stats API response message. |
Dependent item | nomad.server.stats.api.response Preprocessing
|
Nomad server version | Nomad server version. |
Dependent item | nomad.server.version Preprocessing
|
Nomad raft version | Nomad raft version. |
Dependent item | nomad.raft.version Preprocessing
|
Raft peers | Current cluster raft peers amount. |
Dependent item | nomad.server.raft.peers Preprocessing
|
Cluster role | Current role in the cluster. |
Dependent item | nomad.server.raft.cluster_role Preprocessing
|
CPU time, rate | Total user and system CPU time spent in seconds. |
Dependent item | nomad.server.cpu.time Preprocessing
|
Memory used | Memory utilization in bytes. |
Dependent item | nomad.server.runtime.alloc_bytes Preprocessing
|
Virtual memory size | Virtual memory size in bytes. |
Dependent item | nomad.server.virtualmemorybytes Preprocessing
|
Resident memory size | Resident memory size in bytes. |
Dependent item | nomad.server.residentmemorybytes Preprocessing
|
Heap objects | Number of objects on the heap. General memory pressure indicator. |
Dependent item | nomad.server.runtime.heap_objects Preprocessing
|
Open file descriptors | Number of open file descriptors. |
Dependent item | nomad.server.process_open_fds Preprocessing
|
Open file descriptors, max | Maximum number of open file descriptors. |
Dependent item | nomad.server.process_max_fds Preprocessing
|
Goroutines | Number of goroutines and general load pressure indicator. |
Dependent item | nomad.server.runtime.num_goroutines Preprocessing
|
Evaluations pending | Evaluations that are pending until an existing evaluation for the same job completes. |
Dependent item | nomad.server.broker.total_pending Preprocessing
|
Evaluations ready | Number of evaluations ready to be processed. |
Dependent item | nomad.server.broker.total_ready Preprocessing
|
Evaluations unacked | Evaluations dispatched for processing but incomplete. |
Dependent item | nomad.server.broker.total_unacked Preprocessing
|
CPU shares for blocked evaluations | Amount of CPU shares requested by blocked evals. |
Dependent item | nomad.server.blocked_evals.cpu Preprocessing
|
Memory shares by blocked evaluations | Amount of memory requested by blocked evals. |
Dependent item | nomad.server.blocked_evals.memory Preprocessing
|
CPU shares for blocked job evaluations | Amount of CPU shares requested by blocked evals of a job. |
Dependent item | nomad.server.blocked_evals.job.cpu Preprocessing
|
Memory shares for blocked job evaluations | Amount of memory requested by blocked evals of a job. |
Dependent item | nomad.server.blocked_evals.job.memory Preprocessing
|
Evaluations blocked | Count of evals in the blocked state for any reason (cluster resource exhaustion or quota limits). |
Dependent item | nomad.server.blockedevals.totalblocked Preprocessing
|
Evaluations escaped | Count of evals that have escaped computed node classes. This indicates a scheduler optimization was skipped and is not usually a source of concern. |
Dependent item | nomad.server.blockedevals.totalescaped Preprocessing
|
Evaluations waiting | Count of evals waiting to be enqueued. |
Dependent item | nomad.server.broker.total_waiting Preprocessing
|
Evaluations blocked due to quota limit | Count of blocked evals due to quota limits (the resources for these jobs are not counted in other blockedevals metrics, except for totalblocked). |
Dependent item | nomad.server.blockedevals.totalquota_limit Preprocessing
|
Evaluations enqueue time | Average time elapsed with evaluations waiting to be enqueued. |
Dependent item | nomad.server.broker.eval_waiting Preprocessing
|
RPC evaluation acknowledgement time | Time elapsed for Eval.Ack RPC call. |
Dependent item | nomad.server.eval.ack Preprocessing
|
RPC job summary time | Time elapsed for Job.Summary RPC call. |
Dependent item | nomad.server.jobsummary.getjob_summary Preprocessing
|
Heartbeats active | Number of active heartbeat timers. Each timer represents a Nomad client connection. |
Dependent item | nomad.server.heartbeat.active Preprocessing
|
RPC requests, rate | Number of RPC requests being handled. |
Dependent item | nomad.server.rpc.request Preprocessing
|
RPC error requests, rate | Number of RPC requests being handled that result in an error. |
Dependent item | nomad.server.rpc.request_error Preprocessing
|
RPC queries, rate | Number of RPC queries. |
Dependent item | nomad.server.rpc.query Preprocessing
|
RPC job allocations time | Time elapsed for Job.Allocations RPC call. |
Dependent item | nomad.server.job.allocations Preprocessing
|
RPC job evaluations time | Time elapsed for Job.Evaluations RPC call. |
Dependent item | nomad.server.job.evaluations Preprocessing
|
RPC get job time | Time elapsed for Job.GetJob RPC call. |
Dependent item | nomad.server.job.get_job Preprocessing
|
Plan apply time | Time elapsed to apply a plan. |
Dependent item | nomad.server.plan.apply Preprocessing
|
Plan evaluate time | Time elapsed to evaluate a plan. |
Dependent item | nomad.server.plan.evaluate Preprocessing
|
RPC plan submit time | Time elapsed for Plan.Submit RPC call. |
Dependent item | nomad.server.plan.submit Preprocessing
|
Plan raft index processing time | Time elapsed that planner waits for the raft index of the plan to be processed. |
Dependent item | nomad.server.plan.waitforindex Preprocessing
|
RPC list time | Time elapsed for Node.List RPC call. |
Dependent item | nomad.server.client.list Preprocessing
|
RPC update allocations time | Time elapsed for Node.UpdateAlloc RPC call. |
Dependent item | nomad.server.client.update_alloc Preprocessing
|
RPC update status time | Time elapsed for Node.UpdateStatus RPC call. |
Dependent item | nomad.server.client.update_status Preprocessing
|
RPC get client allocs time | Time elapsed for Node.GetClientAllocs RPC call. |
Dependent item | nomad.server.client.getclientallocs Preprocessing
|
RPC eval dequeue time | Time elapsed for Eval.Dequeue RPC call. |
Dependent item | nomad.server.client.dequeue Preprocessing
|
Vault token last renewal | Time since last successful Vault token renewal. |
Dependent item | nomad.server.vault.tokenlastrenewal Preprocessing
|
Vault token next renewal | Time until next Vault token renewal attempt. |
Dependent item | nomad.server.vault.tokennextrenewal Preprocessing
|
Vault token TTL | Time to live for Vault token. |
Dependent item | nomad.server.vault.token_ttl Preprocessing
|
Vault tokens revoked | Count of revoked tokens. |
Dependent item | nomad.server.vault.distributedtokensrevoked Preprocessing
|
Jobs dead | Number of dead jobs. |
Dependent item | nomad.server.job_status.dead Preprocessing
|
Jobs pending | Number of pending jobs. |
Dependent item | nomad.server.job_status.pending Preprocessing
|
Jobs running | Number of running jobs. |
Dependent item | nomad.server.job_status.running Preprocessing
|
Job allocations completed | Number of complete allocations for a job. |
Dependent item | nomad.server.job_summary.complete Preprocessing
|
Job allocations failed | Number of failed allocations for a job. |
Dependent item | nomad.server.job_summary.failed Preprocessing
|
Job allocations lost | Number of lost allocations for a job. |
Dependent item | nomad.server.job_summary.lost Preprocessing
|
Job allocations unknown | Number of unknown allocations for a job. |
Dependent item | nomad.server.job_summary.unknown Preprocessing
|
Job allocations queued | Number of queued allocations for a job. |
Dependent item | nomad.server.job_summary.queued Preprocessing
|
Job allocations running | Number of running allocations for a job. |
Dependent item | nomad.server.job_summary.running Preprocessing
|
Job allocations starting | Number of starting allocations for a job. |
Dependent item | nomad.server.job_summary.starting Preprocessing
|
Gossip time | Time elapsed to broadcast gossip messages. |
Dependent item | nomad.server.memberlist.gossip Preprocessing
|
Leader barrier time | Time elapsed to establish a raft barrier during leader transition. |
Dependent item | nomad.server.leader.barrier Preprocessing
|
Reconcile peer time | Time elapsed to reconcile a serf peer with state store. |
Dependent item | nomad.server.leader.reconcile_member Preprocessing
|
Total reconcile time | Time elapsed to reconcile all serf peers with state store. |
Dependent item | nomad.server.leader.reconcile Preprocessing
|
Leader last contact | Time since last contact to leader. General indicator of Raft latency. |
Dependent item | nomad.server.raft.leader.lastContact Preprocessing
|
Plan queue | Count of evals in the plan queue. |
Dependent item | nomad.server.plan.queue_depth Preprocessing
|
Worker evaluation create time | Time elapsed for worker to create an eval. |
Dependent item | nomad.server.worker.create_eval Preprocessing
|
Worker evaluation dequeue time | Time elapsed for worker to dequeue an eval. |
Dependent item | nomad.server.worker.dequeue_eval Preprocessing
|
Worker invoke scheduler time | Time elapsed for worker to invoke the scheduler. |
Dependent item | nomad.server.worker.invokeschedulerservice Preprocessing
|
Worker acknowledgement send time | Time elapsed for worker to send acknowledgement. |
Dependent item | nomad.server.worker.send_ack Preprocessing
|
Worker submit plan time | Time elapsed for worker to submit plan. |
Dependent item | nomad.server.worker.submit_plan Preprocessing
|
Worker update evaluation time | Time elapsed for worker to submit updated eval. |
Dependent item | nomad.server.worker.update_eval Preprocessing
|
Worker log replication time | Time elapsed that worker waits for the raft index of the eval to be processed. |
Dependent item | nomad.server.worker.waitforindex Preprocessing
|
Raft calls blocked, rate | Count of blocking raft API calls. |
Dependent item | nomad.server.raft.barrier Preprocessing
|
Raft commit logs enqueued | Count of logs enqueued. |
Dependent item | nomad.server.raft.commitnumlogs Preprocessing
|
Raft transactions, rate | Number of Raft transactions. |
Dependent item | nomad.server.raft.apply Preprocessing
|
Raft commit time | Time elapsed to commit writes. |
Dependent item | nomad.server.raft.commit_time Preprocessing
|
Raft transaction commit time | Raft transaction commit time. |
Dependent item | nomad.server.raft.replication.appendEntries Preprocessing
|
FSM apply time | Time elapsed to apply write to FSM. |
Dependent item | nomad.server.raft.fsm.apply Preprocessing
|
FSM enqueue time | Time elapsed to enqueue write to FSM. |
Dependent item | nomad.server.raft.fsm.enqueue Preprocessing
|
FSM autopilot time | Time elapsed to apply Autopilot raft entry. |
Dependent item | nomad.server.raft.fsm.autopilot Preprocessing
|
FSM register node time | Time elapsed to apply RegisterNode raft entry. |
Dependent item | nomad.server.raft.fsm.register_node Preprocessing
|
FSM index | Current index applied to FSM. |
Dependent item | nomad.server.raft.applied_index Preprocessing
|
Raft last index | Most recent index seen. |
Dependent item | nomad.server.raft.last_index Preprocessing
|
Dispatch log time | Time elapsed to write log, mark in flight, and start replication. |
Dependent item | nomad.server.raft.leader.dispatch_log Preprocessing
|
Logs dispatched | Count of logs dispatched. |
Dependent item | nomad.server.raft.leader.dispatchnumlogs Preprocessing
|
Heartbeat fails | Count of failing to heartbeat and starting election. |
Dependent item | nomad.server.raft.transition.heartbeat_timeout Preprocessing
|
Objects freed, rate | Count of objects freed from heap by go runtime GC. |
Dependent item | nomad.server.runtime.free_count Preprocessing
|
GC pause time | Go runtime GC pause times. |
Dependent item | nomad.server.runtime.gcpausens Preprocessing
|
GC metadata size | Go runtime GC metadata size in bytes. |
Dependent item | nomad.server.runtime.sys_bytes Preprocessing
|
GC runs | Count of go runtime GC runs. |
Dependent item | nomad.server.runtime.totalgcruns Preprocessing
|
Memberlist events | Count of memberlist events received. |
Dependent item | nomad.server.serf.queue.event Preprocessing
|
Memberlist changes | Count of memberlist changes. |
Dependent item | nomad.server.serf.queue.intent Preprocessing
|
Memberlist queries | Count of memberlist queries. |
Dependent item | nomad.server.serf.queue.queries Preprocessing
|
Snapshot index | Current snapshot index. |
Dependent item | nomad.server.state.snapshot.index Preprocessing
|
Services ready to schedule | Count of service evals ready to be scheduled. |
Dependent item | nomad.server.broker.service_ready Preprocessing
|
Services unacknowledged | Count of unacknowledged service evals. |
Dependent item | nomad.server.broker.service_unacked Preprocessing
|
System evaluations ready to schedule | Count of system evals ready to be scheduled. |
Dependent item | nomad.server.broker.system_ready Preprocessing
|
System evaluations unacknowledged | Count of unacknowledged system evals. |
Dependent item | nomad.server.broker.system_unacked Preprocessing
|
BoltDB free pages | Number of BoltDB free pages. |
Dependent item | nomad.server.raft.boltdb.numfreepages Preprocessing
|
BoltDB pending pages | Number of BoltDB pending pages. |
Dependent item | nomad.server.raft.boltdb.numpendingpages Preprocessing
|
BoltDB free page bytes | Number of free page bytes. |
Dependent item | nomad.server.raft.boltdb.freepagebytes Preprocessing
|
BoltDB freelist bytes | Number of freelist bytes. |
Dependent item | nomad.server.raft.boltdb.freelist_bytes Preprocessing
|
BoltDB read transactions, rate | Count of total read transactions. |
Dependent item | nomad.server.raft.boltdb.totalreadtxn Preprocessing
|
BoltDB open read transactions | Number of current open read transactions. |
Dependent item | nomad.server.raft.boltdb.openreadtxn Preprocessing
|
BoltDB pages in use | Number of pages in use. |
Dependent item | nomad.server.raft.boltdb.txstats.page_count Preprocessing
|
BoltDB page allocations, rate | Number of page allocations. |
Dependent item | nomad.server.raft.boltdb.txstats.page_alloc Preprocessing
|
BoltDB cursors | Count of total database cursors. |
Dependent item | nomad.server.raft.boltdb.txstats.cursor_count Preprocessing
|
BoltDB nodes, rate | Count of total database nodes. |
Dependent item | nomad.server.raft.boltdb.txstats.node_count Preprocessing
|
BoltDB node dereferences, rate | Count of total database node dereferences. |
Dependent item | nomad.server.raft.boltdb.txstats.node_deref Preprocessing
|
BoltDB rebalance operations, rate | Count of total rebalance operations. |
Dependent item | nomad.server.raft.boltdb.txstats.rebalance Preprocessing
|
BoltDB split operations, rate | Count of total split operations. |
Dependent item | nomad.server.raft.boltdb.txstats.split Preprocessing
|
BoltDB spill operations, rate | Count of total spill operations. |
Dependent item | nomad.server.raft.boltdb.txstats.spill Preprocessing
|
BoltDB write operations, rate | Count of total write operations. |
Dependent item | nomad.server.raft.boltdb.txstats.write Preprocessing
|
BoltDB rebalance time | Sample of rebalance operation times. |
Dependent item | nomad.server.raft.boltdb.txstats.rebalance_time Preprocessing
|
BoltDB spill time | Sample of spill operation times. |
Dependent item | nomad.server.raft.boltdb.txstats.spill_time Preprocessing
|
BoltDB write time | Sample of write operation times. |
Dependent item | nomad.server.raft.boltdb.txstats.write_time Preprocessing
|
Service [rpc] state | Current [rpc] service state. |
Simple check | net.tcp.service[tcp,,{$NOMAD.SERVER.RPC.PORT}] Preprocessing
|
Service [serf] state | Current [serf] service state. |
Simple check | net.tcp.service[tcp,,{$NOMAD.SERVER.SERF.PORT}] Preprocessing
|
Namespace list time | Time elapsed for Namespace.ListNamespaces. |
Dependent item | nomad.server.namespace.list_namespace Preprocessing
|
Autopilot state | Current autopilot state. |
Dependent item | nomad.server.autopilot.state Preprocessing
|
Autopilot failure tolerance | The number of redundant healthy servers that can fail without causing an outage. |
Dependent item | nomad.server.autopilot.failure_tolerance Preprocessing
|
FSM allocation client update time | Time elapsed to apply AllocClientUpdate raft entry. |
Dependent item | nomad.server.allocclientupdate Preprocessing
|
FSM apply plan results time | Time elapsed to apply ApplyPlanResults raft entry. |
Dependent item | nomad.server.fsm.applyplanresults Preprocessing
|
FSM update evaluation time | Time elapsed to apply UpdateEval raft entry. |
Dependent item | nomad.server.fsm.update_eval Preprocessing
|
FSM job registration time | Time elapsed to apply RegisterJob raft entry. |
Dependent item | nomad.server.fsm.register_job Preprocessing
|
Allocation reschedule attempts | Count of attempts to reschedule an allocation. |
Dependent item | nomad.server.scheduler.allocs.rescheduled.attempted Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
HashiCorp Nomad Server: Monitoring API connection has failed | Monitoring API connection has failed. |
find(/HashiCorp Nomad Server by HTTP/nomad.server.data.api.response,,"like","{$NOMAD.API.RESPONSE.SUCCESS}")=0 |Average |
Manual close: Yes | |
HashiCorp Nomad Server: Internal stats API connection has failed | Internal stats API connection has failed. |
find(/HashiCorp Nomad Server by HTTP/nomad.server.stats.api.response,,"like","{$NOMAD.API.RESPONSE.SUCCESS}")=0 |Average |
Manual close: Yes Depends on:
|
|
HashiCorp Nomad Server: Nomad server version has changed | Nomad server version has changed. |
change(/HashiCorp Nomad Server by HTTP/nomad.server.version)<>0 |Info |
Manual close: Yes | |
HashiCorp Nomad Server: Cluster role has changed | Cluster role has changed. |
change(/HashiCorp Nomad Server by HTTP/nomad.server.raft.cluster_role) <> 0 |Info |
Manual close: Yes | |
HashiCorp Nomad Server: Current number of open files is too high | Heavy file descriptor usage (i.e., near the process file descriptor limit) indicates a potential file descriptor exhaustion issue. |
min(/HashiCorp Nomad Server by HTTP/nomad.server.process_open_fds,5m)/last(/HashiCorp Nomad Server by HTTP/nomad.server.process_max_fds)*100>{$NOMAD.OPEN.FDS.MAX} |Warning |
||
HashiCorp Nomad Server: Dead jobs found | Jobs with the |
last(/HashiCorp Nomad Server by HTTP/nomad.server.job_status.dead) > 0 and nodata(/HashiCorp Nomad Server by HTTP/nomad.server.job_status.dead,5m) = 0 |Warning |
Manual close: Yes | |
HashiCorp Nomad Server: Leader last contact timeout exceeded | The nomad.raft.leader.lastContact metric is a general indicator of Raft latency which can be used to observe how Raft timing is performing and guide infrastructure provisioning. |
min(/HashiCorp Nomad Server by HTTP/nomad.server.raft.leader.lastContact,5m) >= {$NOMAD.SERVER.LEADER.LATENCY} and nodata(/HashiCorp Nomad Server by HTTP/nomad.server.raft.leader.lastContact,5m) = 0 |Warning |
||
HashiCorp Nomad Server: Service [rpc] is down | Cannot establish the connection to [rpc] service port {$NOMAD.SERVER.RPC.PORT}. |
last(/HashiCorp Nomad Server by HTTP/net.tcp.service[tcp,,{$NOMAD.SERVER.RPC.PORT}]) = 0 |Average |
Manual close: Yes | |
HashiCorp Nomad Server: Service [serf] is down | Cannot establish the connection to [serf] service port {$NOMAD.SERVER.SERF.PORT}. |
last(/HashiCorp Nomad Server by HTTP/net.tcp.service[tcp,,{$NOMAD.SERVER.SERF.PORT}]) = 0 |Average |
Manual close: Yes | |
HashiCorp Nomad Server: Autopilot is unhealthy | The autopilot is in an unhealthy state. The probability of a successful failover is extremely low. |
last(/HashiCorp Nomad Server by HTTP/nomad.server.autopilot.state) = 0 and nodata(/HashiCorp Nomad Server by HTTP/nomad.server.autopilot.state,5m) = 0 |Average |
Manual close: Yes | |
HashiCorp Nomad Server: Autopilot redundancy is low | The autopilot redundancy is low. |
last(/HashiCorp Nomad Server by HTTP/nomad.server.autopilot.failure_tolerance) < {$NOMAD.REDUNDANCY.MIN} and nodata(/HashiCorp Nomad Server by HTTP/nomad.server.autopilot.failure_tolerance,5m) = 0 |Warning |
Manual close: Yes |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for the effortless deployment of Nginx Plus monitoring by Zabbix via HTTP and doesn't require any external scripts.
The monitoring data of the live activity is generated by the NGINX Plus API.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Set the {$NGINX.API.ENDPOINT} macro to the NGINX Plus API URL in the format <scheme>://<host>:<port>/<location>/.
Note that, depending on the number of zones and upstreams, the discovery operation may be expensive. Therefore, use the following filters with these macros:
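As a quick sanity check of the endpoint before linking the template, you can request the API root and the basic server info. This is a hedged example: the host, port, and API version are placeholders, and the /api location must match the api directive configured in your NGINX Plus instance.
curl -s "http://nginx.example.com:80/api/"
curl -s "http://nginx.example.com:80/api/9/nginx"
The first call returns the list of available API versions; the second returns basic NGINX status information similar to what the "Get info" item collects.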
Name | Description | Default |
---|---|---|
{$NGINX.API.ENDPOINT} | NGINX Plus API URL in the format <scheme>://<host>:<port>/<location>/. |
|
{$NGINX.LLD.FILTER.HTTP.ZONE.MATCHES} | The filter to include the necessary discovered HTTP server zones. |
.* |
{$NGINX.LLD.FILTER.HTTP.ZONE.NOT_MATCHES} | The filter to exclude discovered HTTP server zones. |
CHANGE_IF_NEEDED |
{$NGINX.LLD.FILTER.HTTP.LOCATION.ZONE.MATCHES} | The filter to include the necessary discovered HTTP location zones. |
.* |
{$NGINX.LLD.FILTER.HTTP.LOCATION.ZONE.NOT_MATCHES} | The filter to exclude discovered HTTP location zones. |
CHANGE_IF_NEEDED |
{$NGINX.LLD.FILTER.HTTP.UPSTREAM.MATCHES} | The filter to include the necessary discovered HTTP upstreams. |
.* |
{$NGINX.LLD.FILTER.HTTP.UPSTREAM.NOT_MATCHES} | The filter to exclude discovered HTTP upstreams. |
CHANGE_IF_NEEDED |
{$NGINX.LLD.FILTER.STREAM.ZONE.MATCHES} | The filter to include discovered server zones of the "stream" directive. |
.* |
{$NGINX.LLD.FILTER.STREAM.ZONE.NOT_MATCHES} | The filter to exclude discovered server zones of the "stream" directive. |
CHANGE_IF_NEEDED |
{$NGINX.LLD.FILTER.STREAM.UPSTREAM.MATCHES} | The filter to include the necessary discovered upstreams of the "stream" directive. |
.* |
{$NGINX.LLD.FILTER.STREAM.UPSTREAM.NOT_MATCHES} | The filter to exclude discovered upstreams of the "stream" directive |
CHANGE_IF_NEEDED |
{$NGINX.LLD.FILTER.RESOLVER.MATCHES} | The filter to include the necessary discovered |
.* |
{$NGINX.LLD.FILTER.RESOLVER.NOT_MATCHES} | The filter to exclude discovered |
CHANGE_IF_NEEDED |
{$NGINX.DROP_RATE.MAX.WARN} | The critical rate of the dropped connections for a trigger expression. |
1 |
{$NGINX.HTTP.UPSTREAM.4XX.MAX.WARN} | The maximum percentage of errors with the status code |
5 |
{$NGINX.HTTP.UPSTREAM.5XX.MAX.WARN} | The maximum percentage of errors with the status code |
5 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Get info | Return status of the NGINX running instance. |
HTTP agent | nginx.info |
Get connections | Returns the statistics of client connections. |
HTTP agent | nginx.connections |
Get SSL | Returns the SSL statistics. |
HTTP agent | nginx.ssl |
Get requests | Returns the status of the client's HTTP requests. |
HTTP agent | nginx.requests |
Get HTTP zones | Returns the status information for each HTTP server zone. |
HTTP agent | nginx.http.server_zones |
Get HTTP location zones | Returns the status information for each HTTP location zone. |
HTTP agent | nginx.http.location_zones |
Get HTTP upstreams | Returns the status of each HTTP upstream server group and its servers. |
HTTP agent | nginx.http.upstreams |
Get Stream server zones | Returns the status information for each server zone configured in the "stream" directive. |
HTTP agent | nginx.stream.server_zones |
Get Stream upstreams | Returns status of each stream upstream server group and its servers. |
HTTP agent | nginx.stream.upstreams |
Get resolvers | Returns the status information for each Resolver zone. |
HTTP agent | nginx.resolvers |
Get info error | The description of NGINX errors. |
Dependent item | nginx.info.error Preprocessing
|
Version | A version number of NGINX. |
Dependent item | nginx.info.version Preprocessing
|
Address | The address of the server that accepted status request. |
Dependent item | nginx.info.address Preprocessing
|
Generation | The total number of configuration reloads. |
Dependent item | nginx.info.generation Preprocessing
|
Uptime | The server uptime. |
Dependent item | nginx.info.uptime Preprocessing
|
Connections accepted, rate | The total number of accepted client connections per second. |
Dependent item | nginx.connections.accepted.rate Preprocessing
|
Connections dropped | The total number of dropped client connections. |
Dependent item | nginx.connections.dropped Preprocessing
|
Connections active | The current number of active client connections. |
Dependent item | nginx.connections.active Preprocessing
|
Connections idle | The current number of idle client connections. |
Dependent item | nginx.connections.idle Preprocessing
|
SSL handshakes, rate | The total number of successful SSL handshakes per second. |
Dependent item | nginx.ssl.handshakes.rate Preprocessing
|
SSL handshakes failed, rate | The total number of failed SSL handshakes per second. |
Dependent item | nginx.ssl.handshakes_failed.rate Preprocessing
|
SSL session reuses, rate | The total number of session reuses during SSL handshake per second. |
Dependent item | nginx.ssl.session_reuses.rate Preprocessing
|
Requests total, rate | The total number of client requests per second. |
Dependent item | nginx.requests.total.rate Preprocessing
|
Requests current | The current number of client requests. |
Dependent item | nginx.requests.current Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
NGINX Plus: Server response error | length(last(/NGINX Plus by HTTP/nginx.info.error))>0 |High |
|||
NGINX Plus: Version has changed | The Nginx version has changed. Acknowledge to close the problem manually. |
last(/NGINX Plus by HTTP/nginx.info.version,#1)<>last(/NGINX Plus by HTTP/nginx.info.version,#2) and length(last(/NGINX Plus by HTTP/nginx.info.version))>0 |Info |
Manual close: Yes | |
NGINX Plus: Host has been restarted | Uptime is less than 10 minutes. |
last(/NGINX Plus by HTTP/nginx.info.uptime)<10m |Info |
Manual close: Yes | |
NGINX Plus: Failed to fetch info data | Zabbix has not received any data for metrics for the last 30 minutes |
nodata(/NGINX Plus by HTTP/nginx.info.uptime,30m)=1 |Warning |
Manual close: Yes | |
NGINX Plus: High connections drop rate | The rate of dropped connections is greater than |
min(/NGINX Plus by HTTP/nginx.connections.dropped,5m) > {$NGINX.DROP_RATE.MAX.WARN} |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
HTTP server zones discovery | Dependent item | nginx.http.server_zones.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
HTTP server zone [{#NAME}]: Raw data | The raw data of the HTTP server zone with the name |
Dependent item | nginx.http.server_zones.raw[{#NAME}] Preprocessing
|
HTTP server zone [{#NAME}]: Processing | The number of client requests that are currently being processed. |
Dependent item | nginx.http.server_zones.processing[{#NAME}] Preprocessing
|
HTTP server zone [{#NAME}]: Requests, rate | The total number of client requests received from clients per second. |
Dependent item | nginx.http.server_zones.requests.rate[{#NAME}] Preprocessing
|
HTTP server zone [{#NAME}]: Responses 1xx, rate | The number of responses with |
Dependent item | nginx.http.server_zones.responses.1xx.rate[{#NAME}] Preprocessing
|
HTTP server zone [{#NAME}]: Responses 2xx, rate | The number of responses with |
Dependent item | nginx.http.server_zones.responses.2xx.rate[{#NAME}] Preprocessing
|
HTTP server zone [{#NAME}]: Responses 3xx, rate | The number of responses with |
Dependent item | nginx.http.server_zones.responses.3xx.rate[{#NAME}] Preprocessing
|
HTTP server zone [{#NAME}]: Responses 4xx, rate | The number of responses with |
Dependent item | nginx.http.server_zones.responses.4xx.rate[{#NAME}] Preprocessing
|
HTTP server zone [{#NAME}]: Responses 5xx, rate | The number of responses with |
Dependent item | nginx.http.server_zones.responses.5xx.rate[{#NAME}] Preprocessing
|
HTTP server zone [{#NAME}]: Responses total, rate | The total number of responses sent to clients per second. |
Dependent item | nginx.http.server_zones.responses.total.rate[{#NAME}] Preprocessing
|
HTTP server zone [{#NAME}]: Discarded, rate | The total number of requests completed without sending a response per second. |
Dependent item | nginx.http.server_zones.discarded.rate[{#NAME}] Preprocessing
|
HTTP server zone [{#NAME}]: Received, rate | The total number of bytes received from clients per second. |
Dependent item | nginx.http.server_zones.received.rate[{#NAME}] Preprocessing
|
HTTP server zone [{#NAME}]: Sent, rate | The total number of bytes sent to clients per second. |
Dependent item | nginx.http.server_zones.sent.rate[{#NAME}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
HTTP location zones discovery | Dependent item | nginx.http.location_zones.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
HTTP location zone [{#NAME}]: Raw data | The raw data of the location zone with the name |
Dependent item | nginx.http.location_zones.raw[{#NAME}] Preprocessing
|
HTTP location zone [{#NAME}]: Requests, rate | The total number of client requests received from clients per second. |
Dependent item | nginx.http.location_zones.requests.rate[{#NAME}] Preprocessing
|
HTTP location zone [{#NAME}]: Responses 1xx, rate | The number of responses with |
Dependent item | nginx.http.location_zones.responses.1xx.rate[{#NAME}] Preprocessing
|
HTTP location zone [{#NAME}]: Responses 2xx, rate | The number of responses with |
Dependent item | nginx.http.location_zones.responses.2xx.rate[{#NAME}] Preprocessing
|
HTTP location zone [{#NAME}]: Responses 3xx, rate | The number of responses with |
Dependent item | nginx.http.location_zones.responses.3xx.rate[{#NAME}] Preprocessing
|
HTTP location zone [{#NAME}]: Responses 4xx, rate | The number of responses with |
Dependent item | nginx.http.location_zones.responses.4xx.rate[{#NAME}] Preprocessing
|
HTTP location zone [{#NAME}]: Responses 5xx, rate | The number of responses with |
Dependent item | nginx.http.location_zones.responses.5xx.rate[{#NAME}] Preprocessing
|
HTTP location zone [{#NAME}]: Responses total, rate | The total number of responses sent to clients per second. |
Dependent item | nginx.http.location_zones.responses.total.rate[{#NAME}] Preprocessing
|
HTTP location zone [{#NAME}]: Discarded, rate | The total number of requests completed without sending a response per second. |
Dependent item | nginx.http.location_zones.discarded.rate[{#NAME}] Preprocessing
|
HTTP location zone [{#NAME}]: Received, rate | The total number of bytes received from clients per second. |
Dependent item | nginx.http.location_zones.received.rate[{#NAME}] Preprocessing
|
HTTP location zone [{#NAME}]: Sent, rate | The total number of bytes sent to clients per second. |
Dependent item | nginx.http.location_zones.sent.rate[{#NAME}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
HTTP upstreams discovery | Dependent item | nginx.http.upstreams.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
HTTP upstream [{#NAME}]: Raw data | The raw data of the HTTP upstream with the name |
Dependent item | nginx.http.upstreams.raw[{#NAME}] Preprocessing
|
HTTP upstream [{#NAME}]: Keepalive | The current number of idle keepalive connections. |
Dependent item | nginx.http.upstreams.keepalive[{#NAME}] Preprocessing
|
HTTP upstream [{#NAME}]: Zombies | The current number of servers removed from the group but still processing active client requests. |
Dependent item | nginx.http.upstreams.zombies[{#NAME}] Preprocessing
|
HTTP upstream [{#NAME}]: Zone | The name of the shared memory zone that keeps the group's configuration and run-time state. |
Dependent item | nginx.http.upstreams.zone[{#NAME}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
HTTP upstream peers discovery | Dependent item | nginx.http.upstream.peers.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Raw data | The raw data of the HTTP upstream with the name |
Dependent item | nginx.http.upstream.peer.raw[{#UPSTREAM},{#PEER}] Preprocessing
|
HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: State | The current state, which may be one of “up”, “draining”, “down”, “unavail”, “checking”, and “unhealthy”. |
Dependent item | nginx.http.upstream.peer.state[{#UPSTREAM},{#PEER}] Preprocessing
|
HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Active | The current number of active connections. |
Dependent item | nginx.http.upstream.peer.active[{#UPSTREAM},{#PEER}] Preprocessing
|
HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Requests, rate | The total number of client requests forwarded to this server per second. |
Dependent item | nginx.http.upstream.peer.requests.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Responses 1xx, rate | The number of responses with |
Dependent item | nginx.http.upstream.peer.responses.1xx.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Responses 2xx, rate | The number of responses with |
Dependent item | nginx.http.upstream.peer.responses.2xx.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Responses 3xx, rate | The number of responses with |
Dependent item | nginx.http.upstream.peer.responses.3xx.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Responses 4xx, rate | The number of responses with |
Dependent item | nginx.http.upstream.peer.responses.4xx.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Responses 5xx, rate | The number of responses with |
Dependent item | nginx.http.upstream.peer.responses.5xx.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Responses total, rate | The total number of responses obtained from this server. |
Dependent item | nginx.http.upstream.peer.responses.total.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Sent, rate | The total number of bytes sent to this server per second. |
Dependent item | nginx.http.upstream.peer.sent.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Received, rate | The total number of bytes received from this server per second. |
Dependent item | nginx.http.upstream.peer.received.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Fails, rate | The total number of unsuccessful attempts to communicate with the server per second. |
Dependent item | nginx.http.upstream.peer.fails.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Unavail | Displays how many times the server has become unavailable for client requests (the state - “unavail”) due to the number of unsuccessful attempts reaching the |
Dependent item | nginx.http.upstream.peer.unavail.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Header time | The average time to get the response header from the server. |
Dependent item | nginx.http.upstream.peer.header_time.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Response time | The average time to get the full response from the server. |
Dependent item | nginx.http.upstream.peer.response_time.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Health checks, check | The total number of health check requests made. |
Dependent item | nginx.http.upstream.peer.health_checks.checks[{#UPSTREAM},{#PEER}] Preprocessing
|
HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Health checks, fails | The number of failed health checks. |
Dependent item | nginx.http.upstream.peer.health_checks.fails[{#UPSTREAM},{#PEER}] Preprocessing
|
HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Health checks, unhealthy | Displays how many times the server has become |
Dependent item | nginx.http.upstream.peer.health_checks.unhealthy[{#UPSTREAM},{#PEER}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
NGINX Plus: HTTP upstream server is not in UP or DOWN state. | find(/NGINX Plus by HTTP/nginx.http.upstream.peer.state[{#UPSTREAM},{#PEER}],,"like","up")=0 and find(/NGINX Plus by HTTP/nginx.http.upstream.peer.state[{#UPSTREAM},{#PEER}],,"like","down")=0 |Warning |
|||
NGINX Plus: Too many HTTP requests with code 4xx | sum(/NGINX Plus by HTTP/nginx.http.upstream.peer.responses.4xx.rate[{#UPSTREAM},{#PEER}],5m) > (sum(/NGINX Plus by HTTP/nginx.http.upstream.peer.responses.total.rate[{#UPSTREAM},{#PEER}],5m)*({$NGINX.HTTP.UPSTREAM.4XX.MAX.WARN}/100)) |Warning |
|||
NGINX Plus: Too many HTTP requests with code 5xx | sum(/NGINX Plus by HTTP/nginx.http.upstream.peer.responses.5xx.rate[{#UPSTREAM},{#PEER}],5m) > (sum(/NGINX Plus by HTTP/nginx.http.upstream.peer.responses.total.rate[{#UPSTREAM},{#PEER}],5m)*({$NGINX.HTTP.UPSTREAM.5XX.MAX.WARN}/100)) |High |
Name | Description | Type | Key and additional info |
---|---|---|---|
Stream server zones discovery | Dependent item | nginx.stream.server_zones.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Stream server zone [{#NAME}]: Raw data | The raw data of server zone with the name |
Dependent item | nginx.stream.server_zones.raw[{#NAME}] Preprocessing
|
Stream server zone [{#NAME}]: Processing | The number of client connections that are currently being processed. |
Dependent item | nginx.stream.server_zones.processing[{#NAME}] Preprocessing
|
Stream server zone [{#NAME}]: Connections, rate | The total number of connections accepted from clients per second. |
Dependent item | nginx.stream.server_zones.connections.rate[{#NAME}] Preprocessing
|
Stream server zone [{#NAME}]: Sessions 2xx, rate | The total number of sessions completed with status code |
Dependent item | nginx.stream.server_zones.sessions.2xx.rate[{#NAME}] Preprocessing
|
Stream server zone [{#NAME}]: Sessions 4xx, rate | The total number of sessions completed with status code |
Dependent item | nginx.stream.server_zones.sessions.4xx.rate[{#NAME}] Preprocessing
|
Stream server zone [{#NAME}]: Sessions 5xx, rate | The total number of sessions completed with status code |
Dependent item | nginx.stream.server_zones.sessions.5xx.rate[{#NAME}] Preprocessing
|
Stream server zone [{#NAME}]: Sessions total, rate | The total number of completed client sessions per second. |
Dependent item | nginx.stream.server_zones.sessions.total.rate[{#NAME}] Preprocessing
|
Stream server zone [{#NAME}]: Discarded, rate | The total number of connections completed without creating a session per second. |
Dependent item | nginx.stream.server_zones.discarded.rate[{#NAME}] Preprocessing
|
Stream server zone [{#NAME}]: Received, rate | The total number of bytes received from clients per second. |
Dependent item | nginx.stream.server_zones.received.rate[{#NAME}] Preprocessing
|
Stream server zone [{#NAME}]: Sent, rate | The total number of bytes sent to clients per second. |
Dependent item | nginx.stream.server_zones.sent.rate[{#NAME}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Stream upstreams discovery | Dependent item | nginx.stream.upstreams.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Stream upstream [{#NAME}]: Raw data | The raw data of the upstream with the name |
Dependent item | nginx.stream.upstreams.raw[{#NAME}] Preprocessing
|
Stream upstream [{#NAME}]: Zombies | Dependent item | nginx.stream.upstreams.zombies[{#NAME}] Preprocessing
|
|
Stream upstream [{#NAME}]: Zone | Dependent item | nginx.stream.upstreams.zone[{#NAME}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Stream upstream peers discovery | Dependent item | nginx.stream.upstream.peers.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Stream upstream [{#UPSTREAM}] peer [{#PEER}]: Raw data | The raw data of the upstream with the name |
Dependent item | nginx.stream.upstream.peer.raw[{#UPSTREAM},{#PEER}] Preprocessing
|
Stream upstream [{#UPSTREAM}] peer [{#PEER}]: State | The current state, which may be one of “up”, “draining”, “down”, “unavail”, “checking”, and “unhealthy”. |
Dependent item | nginx.stream.upstream.peer.state[{#UPSTREAM},{#PEER}] Preprocessing
|
Stream upstream [{#UPSTREAM}] peer [{#PEER}]: Active | The current number of connections. |
Dependent item | nginx.stream.upstream.peer.active[{#UPSTREAM},{#PEER}] Preprocessing
|
Stream upstream [{#UPSTREAM}] peer [{#PEER}]: Sent, rate | The total number of bytes sent to this server per second. |
Dependent item | nginx.stream.upstream.peer.sent.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
Stream upstream [{#UPSTREAM}] peer [{#PEER}]: Received, rate | The total number of bytes received from this server per second. |
Dependent item | nginx.stream.upstream.peer.received.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
Stream upstream [{#UPSTREAM}] peer [{#PEER}]: Fails, rate | The total number of unsuccessful attempts to communicate with the server per second. |
Dependent item | nginx.stream.upstream.peer.fails.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
Stream upstream [{#UPSTREAM}] peer [{#PEER}]: Unavail | Displays how many times the server has become unavailable for client requests (the state - “unavail”) due to the number of unsuccessful attempts reaching the |
Dependent item | nginx.stream.upstream.peer.unavail.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
Stream upstream [{#UPSTREAM}] peer [{#PEER}]: Connections | The total number of client connections forwarded to this server. |
Dependent item | nginx.stream.upstream.peer.connections.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
Stream upstream [{#UPSTREAM}] peer [{#PEER}]: Connect time | The average time to connect to the upstream server. |
Dependent item | nginx.stream.upstream.peer.connect_time.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
Stream upstream [{#UPSTREAM}] peer [{#PEER}]: First byte time | The average time to receive the first byte of data. |
Dependent item | nginx.stream.upstream.peer.firstbytetime.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
Stream upstream [{#UPSTREAM}] peer [{#PEER}]: Response time | The average time to receive the last byte of data. |
Dependent item | nginx.stream.upstream.peer.response_time.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
Stream upstream [{#UPSTREAM}] peer [{#PEER}]: Health checks, check | The total number of health check requests made. |
Dependent item | nginx.stream.upstream.peer.health_checks.checks[{#UPSTREAM},{#PEER}] Preprocessing
|
Stream upstream [{#UPSTREAM}] peer [{#PEER}]: Health checks, fails | The number of failed health checks. |
Dependent item | nginx.stream.upstream.peer.health_checks.fails[{#UPSTREAM},{#PEER}] Preprocessing
|
Stream upstream [{#UPSTREAM}] peer [{#PEER}]: Health checks, unhealthy | Displays how many times the server has become |
Dependent item | nginx.stream.upstream.peer.health_checks.unhealthy[{#UPSTREAM},{#PEER}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
NGINX Plus: Stream upstream server is not in UP or DOWN state. | find(/NGINX Plus by HTTP/nginx.stream.upstream.peer.state[{#UPSTREAM},{#PEER}],,"like","up")=0 and find(/NGINX Plus by HTTP/nginx.stream.upstream.peer.state[{#UPSTREAM},{#PEER}],,"like","down")=0 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Resolvers discovery | Dependent item | nginx.resolvers.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Resolver [{#NAME}]: Raw data | The raw data of the |
Dependent item | nginx.resolvers.raw[{#NAME}] Preprocessing
|
Resolver [{#NAME}]: Requests name, rate | The total number of requests to resolve names to addresses per second. |
Dependent item | nginx.resolvers.requests.name.rate[{#NAME}] Preprocessing
|
Resolver [{#NAME}]: Requests srv, rate | The total number of requests to resolve SRV records per second. |
Dependent item | nginx.resolvers.requests.srv.rate[{#NAME}] Preprocessing
|
Resolver [{#NAME}]: Requests addr, rate | The total number of requests to resolve addresses to names per second. |
Dependent item | nginx.resolvers.requests.addr.rate[{#NAME}] Preprocessing
|
Resolver [{#NAME}]: Responses noerror, rate | The total number of successful responses per second. |
Dependent item | nginx.resolvers.responses.noerror.rate[{#NAME}] Preprocessing
|
Resolver [{#NAME}]: Responses formerr, rate | The total number of |
Dependent item | nginx.resolvers.responses.formerr.rate[{#NAME}] Preprocessing
|
Resolver [{#NAME}]: Responses servfail, rate | The total number of |
Dependent item | nginx.resolvers.responses.servfail.rate[{#NAME}] Preprocessing
|
Resolver [{#NAME}]: Responses nxdomain, rate | The total number of |
Dependent item | nginx.resolvers.responses.nxdomain.rate[{#NAME}] Preprocessing
|
Resolver [{#NAME}]: Responses notimp, rate | The total number of |
Dependent item | nginx.resolvers.responses.notimp.rate[{#NAME}] Preprocessing
|
Resolver [{#NAME}]: Responses refused, rate | The total number of |
Dependent item | nginx.resolvers.responses.refused.rate[{#NAME}] Preprocessing
|
Resolver [{#NAME}]: Responses timedout, rate | The total number of timed out requests per second. |
Dependent item | nginx.resolvers.responses.timedout.rate[{#NAME}] Preprocessing
|
Resolver [{#NAME}]: Responses unknown, rate | The total number of requests completed with an unknown error per second. |
Dependent item | nginx.resolvers.responses.unknown.rate[{#NAME}] Preprocessing
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is developed to monitor Nginx by Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The template collects metrics by polling the module ngx_http_stub_status_module
with HTTP agent remotely:
Active connections: 291
server accepts handled requests
16630948 16630948 31070465
Reading: 6 Writing: 179 Waiting: 106
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Set up the ngx_http_stub_status_module.
Test the availability of the http_stub_status_module with nginx -V 2>&1 | grep -o with-http_stub_status_module.
Example configuration of Nginx:
location = /basic_status {
stub_status;
allow <IP of your Zabbix server/proxy>;
deny all;
}
Set the hostname or IP address of the Nginx stub_status host in the {$NGINX.STUB_STATUS.HOST} macro. You can also change the status page port in the {$NGINX.STUB_STATUS.PORT} macro, the status page scheme in the {$NGINX.STUB_STATUS.SCHEME} macro, and the status page path in the {$NGINX.STUB_STATUS.PATH} macro if necessary.
Example answer from Nginx:
Active connections: 291
server accepts handled requests
16630948 16630948 31070465
Reading: 6 Writing: 179 Waiting: 106
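A hedged way to reproduce this answer from the Zabbix server or proxy side (the host is a placeholder, and the port and path correspond to the macro defaults below):
curl -s "http://<SET STUB_STATUS HOST>:80/basic_status"
If the command returns the status lines shown above, the {$NGINX.STUB_STATUS.*} macros point at a working status page.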
Name | Description | Default |
---|---|---|
{$NGINX.STUB_STATUS.HOST} | The hostname or IP address of the Nginx host or Nginx container of a stub_status. |
<SET STUB_STATUS HOST> |
{$NGINX.STUB_STATUS.SCHEME} | The protocol http or https of Nginx stub_status host or container. |
http |
{$NGINX.STUB_STATUS.PATH} | The path of the |
basic_status |
{$NGINX.STUB_STATUS.PORT} | The port of the |
80 |
{$NGINX.RESPONSE_TIME.MAX.WARN} | The maximum response time of Nginx expressed in seconds for a trigger expression. |
10 |
{$NGINX.DROP_RATE.MAX.WARN} | The critical rate of the dropped connections for a trigger expression. |
1 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Get stub status page | The following status information is provided:
See also Module ngx_http_stub_status_module. |
HTTP agent | nginx.get_stub_status |
Service status | Simple check | net.tcp.service[http,"{$NGINX.STUB_STATUS.HOST}","{$NGINX.STUB_STATUS.PORT}"] Preprocessing
|
|
Service response time | Simple check | net.tcp.service.perf[http,"{$NGINX.STUB_STATUS.HOST}","{$NGINX.STUB_STATUS.PORT}"] | |
Requests total | The total number of client requests. |
Dependent item | nginx.requests.total Preprocessing
|
Requests per second | The total number of client requests. |
Dependent item | nginx.requests.total.rate Preprocessing
|
Connections accepted per second | The total number of accepted client connections. |
Dependent item | nginx.connections.accepted.rate Preprocessing
|
Connections dropped per second | The total number of dropped client connections. |
Dependent item | nginx.connections.dropped.rate Preprocessing
|
Connections handled per second | The total number of handled connections. Generally, the parameter value is the same as for the accepted connections, unless some resource limits have been reached (for example, the |
Dependent item | nginx.connections.handled.rate Preprocessing
|
Connections active | The current number of active client connections including waiting connections. |
Dependent item | nginx.connections.active Preprocessing
|
Connections reading | The current number of connections where Nginx is reading the request header. |
Dependent item | nginx.connections.reading Preprocessing
|
Connections waiting | The current number of idle client connections waiting for a request. |
Dependent item | nginx.connections.waiting Preprocessing
|
Connections writing | The current number of connections where Nginx is writing a response back to the client. |
Dependent item | nginx.connections.writing Preprocessing
|
Version | Dependent item | nginx.version Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Nginx: Failed to fetch stub status page | Zabbix has not received any data for items for the last 30 minutes. |
find(/Nginx by HTTP/nginx.get_stub_status,,"iregexp","HTTP\\/[\\d.]+\\s+200")=0 or nodata(/Nginx by HTTP/nginx.get_stub_status,30m)=1 |Warning |
Manual close: Yes Depends on:
|
|
Nginx: Service is down | last(/Nginx by HTTP/net.tcp.service[http,"{$NGINX.STUB_STATUS.HOST}","{$NGINX.STUB_STATUS.PORT}"])=0 |Average |
Manual close: Yes | ||
Nginx: Service response time is too high | min(/Nginx by HTTP/net.tcp.service.perf[http,"{$NGINX.STUB_STATUS.HOST}","{$NGINX.STUB_STATUS.PORT}"],5m)>{$NGINX.RESPONSE_TIME.MAX.WARN} |Warning |
Manual close: Yes Depends on:
|
||
Nginx: High connections drop rate | The rate of dropping connections has been greater than {$NGINX.DROP_RATE.MAX.WARN} for the last 5 minutes. |
min(/Nginx by HTTP/nginx.connections.dropped.rate,5m) > {$NGINX.DROP_RATE.MAX.WARN} |Warning |
Depends on:
|
|
Nginx: Version has changed | The Nginx version has changed. Acknowledge to close the problem manually. |
last(/Nginx by HTTP/nginx.version,#1)<>last(/Nginx by HTTP/nginx.version,#2) and length(last(/Nginx by HTTP/nginx.version))>0 |Info |
Manual close: Yes |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is developed to monitor Nginx by Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The template Nginx by Zabbix agent collects metrics by polling the module ngx_http_stub_status_module locally with Zabbix agent:
Active connections: 291
server accepts handled requests
16630948 16630948 31070465
Reading: 6 Writing: 179 Waiting: 106
Note that this template doesn't support HTTPS and redirects (limitations of web.page.get).
It also uses Zabbix agent to collect Nginx Linux process statistics, such as CPU usage, memory usage, and whether the process is running or not.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
See the setup instructions for ngx_http_stub_status_module.
Test the availability of the http_stub_status_module with nginx -V 2>&1 | grep -o with-http_stub_status_module.
Example configuration of Nginx:
location = /basic_status {
stub_status;
allow 127.0.0.1;
allow ::1;
deny all;
}
If you use another location, then don't forget to change the {$NGINX.STUB_STATUS.PATH} macro.
Example answer from Nginx:
Active connections: 291
server accepts handled requests
16630948 16630948 31070465
Reading: 6 Writing: 179 Waiting: 106
Note that this template doesn't support https and redirects (limitations of web.page.get).
Install and setup Zabbix agent.
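After the agent is installed, you can test the item keys used by this template directly on the Nginx host. This is a sketch with the default macro values: "nginx" stands in for the {$NGINX.PROCESS.NAME.PARAMETER} macro value, and the zabbix_agentd binary is assumed (for Agent 2, use zabbix_agent2 -t instead).
zabbix_agentd -t 'web.page.get["localhost","basic_status","80"]'
zabbix_agentd -t 'proc.get[nginx,,,summary]'
The first key should print the raw stub_status page; the second should print a JSON summary of the running nginx processes.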
Name | Description | Default |
---|---|---|
{$NGINX.STUB_STATUS.HOST} | The hostname or IP address of the Nginx host or Nginx container of |
localhost |
{$NGINX.STUB_STATUS.PATH} | The path of the |
basic_status |
{$NGINX.STUB_STATUS.PORT} | The port of the |
80 |
{$NGINX.RESPONSE_TIME.MAX.WARN} | The maximum response time of Nginx expressed in seconds for a trigger expression. |
10 |
{$NGINX.DROP_RATE.MAX.WARN} | The critical rate of the dropped connections for a trigger expression. |
1 |
{$NGINX.PROCESS_NAME} | The process name filter for the Nginx process discovery. |
nginx |
{$NGINX.PROCESS.NAME.PARAMETER} | The process name of the Nginx server used in the item key |
Name | Description | Type | Key and additional info |
---|---|---|---|
Get stub status page | The following status information is provided:
See also Module ngx_http_stub_status_module. |
Zabbix agent (active) | web.page.get["{$NGINX.STUB_STATUS.HOST}","{$NGINX.STUB_STATUS.PATH}","{$NGINX.STUB_STATUS.PORT}"] |
Service status | Zabbix agent (active) | net.tcp.service[http,"{$NGINX.STUB_STATUS.HOST}","{$NGINX.STUB_STATUS.PORT}"] Preprocessing
|
|
Service response time | Zabbix agent (active) | net.tcp.service.perf[http,"{$NGINX.STUB_STATUS.HOST}","{$NGINX.STUB_STATUS.PORT}"] | |
Requests total | The total number of client requests. |
Dependent item | nginx.requests.total Preprocessing
|
Requests per second | The total number of client requests. |
Dependent item | nginx.requests.total.rate Preprocessing
|
Connections accepted per second | The total number of accepted client connections. |
Dependent item | nginx.connections.accepted.rate Preprocessing
|
Connections dropped per second | The total number of dropped client connections. |
Dependent item | nginx.connections.dropped.rate Preprocessing
|
Connections handled per second | The total number of handled connections. Generally, the parameter value is the same as for the accepted connections, unless some resource limits have been reached (for example, the worker_connections limit). |
Dependent item | nginx.connections.handled.rate Preprocessing
|
Connections active | The current number of active client connections including waiting connections. |
Dependent item | nginx.connections.active Preprocessing
|
Connections reading | The current number of connections where Nginx is reading the request header. |
Dependent item | nginx.connections.reading Preprocessing
|
Connections waiting | The current number of idle client connections waiting for a request. |
Dependent item | nginx.connections.waiting Preprocessing
|
Connections writing | The current number of connections where Nginx is writing a response back to the client. |
Dependent item | nginx.connections.writing Preprocessing
|
Version | Dependent item | nginx.version Preprocessing
|
|
Get processes summary | The aggregated data of summary metrics for all processes. |
Zabbix agent (active) | proc.get[{$NGINX.PROCESS.NAME.PARAMETER},,,summary] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Nginx: Version has changed | The Nginx version has changed. Acknowledge to close the problem manually. |
last(/Nginx by Zabbix agent active/nginx.version,#1)<>last(/Nginx by Zabbix agent active/nginx.version,#2) and length(last(/Nginx by Zabbix agent active/nginx.version))>0 |Info |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Nginx process discovery | The discovery of Nginx process summary. |
Dependent item | nginx.proc.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
CPU utilization | The percentage of the CPU utilization by a process {#NGINX.NAME}. |
Zabbix agent (active) | proc.cpu.util[{#NGINX.NAME}] |
Get process data | The summary metrics aggregated by a process {#NGINX.NAME}. |
Dependent item | nginx.proc.get[{#NGINX.NAME}] Preprocessing
|
Memory usage (vsize) | The summary of virtual memory used by a process {#NGINX.NAME} expressed in bytes. |
Dependent item | nginx.proc.vmem[{#NGINX.NAME}] Preprocessing
|
Memory usage (rss) | The summary of resident set size memory used by a process {#NGINX.NAME} expressed in bytes. |
Dependent item | nginx.proc.rss[{#NGINX.NAME}] Preprocessing
|
Memory usage, % | The percentage of real memory used by a process {#NGINX.NAME}. |
Dependent item | nginx.proc.pmem[{#NGINX.NAME}] Preprocessing
|
Number of running processes | The number of running processes {#NGINX.NAME}. |
Dependent item | nginx.proc.num[{#NGINX.NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Nginx: Process is not running | last(/Nginx by Zabbix agent active/nginx.proc.num[{#NGINX.NAME}])=0 |High |
|||
Nginx: Service is down | last(/Nginx by Zabbix agent active/net.tcp.service[http,"{$NGINX.STUB_STATUS.HOST}","{$NGINX.STUB_STATUS.PORT}"])=0 and last(/Nginx by Zabbix agent active/nginx.proc.num[{#NGINX.NAME}])>0 |Average |
Manual close: Yes | ||
Nginx: High connections drop rate | The rate of dropping connections has been greater than {$NGINX.DROP_RATE.MAX.WARN} for the last 5 minutes. |
min(/Nginx by Zabbix agent active/nginx.connections.dropped.rate,5m) > {$NGINX.DROP_RATE.MAX.WARN} and last(/Nginx by Zabbix agent active/nginx.proc.num[{#NGINX.NAME}])>0 |Warning |
Depends on:
|
|
Nginx: Service response time is too high | min(/Nginx by Zabbix agent active/net.tcp.service.perf[http,"{$NGINX.STUB_STATUS.HOST}","{$NGINX.STUB_STATUS.PORT}"],5m)>{$NGINX.RESPONSE_TIME.MAX.WARN} and last(/Nginx by Zabbix agent active/nginx.proc.num[{#NGINX.NAME}])>0 |Warning |
Manual close: Yes Depends on:
|
||
Nginx: Failed to fetch stub status page | Zabbix has not received any data for items for the last 30 minutes. |
(find(/Nginx by Zabbix agent active/web.page.get["{$NGINX.STUB_STATUS.HOST}","{$NGINX.STUB_STATUS.PATH}","{$NGINX.STUB_STATUS.PORT}"],,"iregexp","HTTP\\/[\\d.]+\\s+200")=0 or nodata(/Nginx by Zabbix agent active/web.page.get["{$NGINX.STUB_STATUS.HOST}","{$NGINX.STUB_STATUS.PATH}","{$NGINX.STUB_STATUS.PORT}"],30m)) and last(/Nginx by Zabbix agent active/nginx.proc.num[{#NGINX.NAME}])>0 |Warning |
Manual close: Yes Depends on:
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is developed to monitor Nginx with Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The template Nginx by Zabbix agent collects metrics by polling the Module ngx_http_stub_status_module locally with Zabbix agent:
Active connections: 291
server accepts handled requests
16630948 16630948 31070465
Reading: 6 Writing: 179 Waiting: 106
Note that this template doesn't support HTTPS and redirects (limitations of web.page.get).
It also uses Zabbix agent to collect Nginx
Linux process statistics, such as CPU usage, memory usage and whether the process is running or not.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
See the setup instructions for ngx_http_stub_status_module.
Test the availability of the http_stub_status_module:
nginx -V 2>&1 | grep -o with-http_stub_status_module
Example configuration of Nginx:
location = /basic_status {
stub_status;
allow 127.0.0.1;
allow ::1;
deny all;
}
If you use another location, then don't forget to change the {$NGINX.STUB_STATUS.PATH} macro.
Example answer from Nginx:
Active connections: 291
server accepts handled requests
16630948 16630948 31070465
Reading: 6 Writing: 179 Waiting: 106
Note that this template doesn't support HTTPS and redirects (limitations of web.page.get).
Install and set up Zabbix agent.
Name | Description | Default |
---|---|---|
{$NGINX.STUB_STATUS.HOST} | The hostname or IP address of the Nginx host or Nginx container of stub_status. |
localhost |
{$NGINX.STUB_STATUS.PATH} | The path of the stub_status location. |
basic_status |
{$NGINX.STUB_STATUS.PORT} | The port of the Nginx stub_status host or container. |
80 |
{$NGINX.RESPONSE_TIME.MAX.WARN} | The maximum response time of Nginx expressed in seconds for a trigger expression. |
10 |
{$NGINX.DROP_RATE.MAX.WARN} | The critical rate of the dropped connections for a trigger expression. |
1 |
{$NGINX.PROCESS_NAME} | The process name filter for the Nginx process discovery. |
nginx |
{$NGINX.PROCESS.NAME.PARAMETER} | The process name of the Nginx server used in the item key proc.get. |
Name | Description | Type | Key and additional info |
---|---|---|---|
Get stub status page | The following status information is provided:
See also Module ngx_http_stub_status_module. |
Zabbix agent | web.page.get["{$NGINX.STUB_STATUS.HOST}","{$NGINX.STUB_STATUS.PATH}","{$NGINX.STUB_STATUS.PORT}"] |
Service status | Zabbix agent | net.tcp.service[http,"{$NGINX.STUB_STATUS.HOST}","{$NGINX.STUB_STATUS.PORT}"] Preprocessing
|
|
Service response time | Zabbix agent | net.tcp.service.perf[http,"{$NGINX.STUB_STATUS.HOST}","{$NGINX.STUB_STATUS.PORT}"] | |
Requests total | The total number of client requests. |
Dependent item | nginx.requests.total Preprocessing
|
Requests per second | The total number of client requests. |
Dependent item | nginx.requests.total.rate Preprocessing
|
Connections accepted per second | The total number of accepted client connections. |
Dependent item | nginx.connections.accepted.rate Preprocessing
|
Connections dropped per second | The total number of dropped client connections. |
Dependent item | nginx.connections.dropped.rate Preprocessing
|
Connections handled per second | The total number of handled connections. Generally, the parameter value is the same as for the accepted connections, unless some resource limits have been reached (for example, the worker_connections limit). |
Dependent item | nginx.connections.handled.rate Preprocessing
|
Connections active | The current number of active client connections including waiting connections. |
Dependent item | nginx.connections.active Preprocessing
|
Connections reading | The current number of connections where Nginx is reading the request header. |
Dependent item | nginx.connections.reading Preprocessing
|
Connections waiting | The current number of idle client connections waiting for a request. |
Dependent item | nginx.connections.waiting Preprocessing
|
Connections writing | The current number of connections where Nginx is writing a response back to the client. |
Dependent item | nginx.connections.writing Preprocessing
|
Version | Dependent item | nginx.version Preprocessing
|
|
Get processes summary | The aggregated data of summary metrics for all processes. |
Zabbix agent | proc.get[{$NGINX.PROCESS.NAME.PARAMETER},,,summary] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Nginx: Version has changed | The Nginx version has changed. Acknowledge to close the problem manually. |
last(/Nginx by Zabbix agent/nginx.version,#1)<>last(/Nginx by Zabbix agent/nginx.version,#2) and length(last(/Nginx by Zabbix agent/nginx.version))>0 |Info |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Nginx process discovery | The discovery of Nginx process summary. |
Dependent item | nginx.proc.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
CPU utilization | The percentage of the CPU utilization by a process {#NGINX.NAME}. |
Zabbix agent | proc.cpu.util[{#NGINX.NAME}] |
Get process data | The summary metrics aggregated by a process {#NGINX.NAME}. |
Dependent item | nginx.proc.get[{#NGINX.NAME}] Preprocessing
|
Memory usage (vsize) | The summary of virtual memory used by a process {#NGINX.NAME} expressed in bytes. |
Dependent item | nginx.proc.vmem[{#NGINX.NAME}] Preprocessing
|
Memory usage (rss) | The summary of resident set size memory used by a process {#NGINX.NAME} expressed in bytes. |
Dependent item | nginx.proc.rss[{#NGINX.NAME}] Preprocessing
|
Memory usage, % | The percentage of real memory used by a process {#NGINX.NAME}. |
Dependent item | nginx.proc.pmem[{#NGINX.NAME}] Preprocessing
|
Number of running processes | The number of running processes {#NGINX.NAME}. |
Dependent item | nginx.proc.num[{#NGINX.NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Nginx: Process is not running | last(/Nginx by Zabbix agent/nginx.proc.num[{#NGINX.NAME}])=0 |High |
|||
Nginx: Service is down | last(/Nginx by Zabbix agent/net.tcp.service[http,"{$NGINX.STUB_STATUS.HOST}","{$NGINX.STUB_STATUS.PORT}"])=0 and last(/Nginx by Zabbix agent/nginx.proc.num[{#NGINX.NAME}])>0 |Average |
Manual close: Yes | ||
Nginx: High connections drop rate | The rate of dropping connections has been greater than {$NGINX.DROP_RATE.MAX.WARN} for the last 5 minutes. |
min(/Nginx by Zabbix agent/nginx.connections.dropped.rate,5m) > {$NGINX.DROP_RATE.MAX.WARN} and last(/Nginx by Zabbix agent/nginx.proc.num[{#NGINX.NAME}])>0 |Warning |
Depends on:
|
|
Nginx: Service response time is too high | min(/Nginx by Zabbix agent/net.tcp.service.perf[http,"{$NGINX.STUB_STATUS.HOST}","{$NGINX.STUB_STATUS.PORT}"],5m)>{$NGINX.RESPONSE_TIME.MAX.WARN} and last(/Nginx by Zabbix agent/nginx.proc.num[{#NGINX.NAME}])>0 |Warning |
Manual close: Yes Depends on:
|
||
Nginx: Failed to fetch stub status page | Zabbix has not received any data for items for the last 30 minutes. |
(find(/Nginx by Zabbix agent/web.page.get["{$NGINX.STUB_STATUS.HOST}","{$NGINX.STUB_STATUS.PATH}","{$NGINX.STUB_STATUS.PORT}"],,"iregexp","HTTP\\/[\\d.]+\\s+200")=0 or nodata(/Nginx by Zabbix agent/web.page.get["{$NGINX.STUB_STATUS.HOST}","{$NGINX.STUB_STATUS.PATH}","{$NGINX.STUB_STATUS.PORT}"],30m)) and last(/Nginx by Zabbix agent/nginx.proc.num[{#NGINX.NAME}])>0 |Warning |
Manual close: Yes Depends on:
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for monitoring Nextcloud by HTTP via Zabbix, and it works without any external scripts.
Nextcloud is a suite of client-server software for creating and using file hosting services.
For more information, see the official documentation
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Set the {$NEXTCLOUD.USER.NAME}, {$NEXTCLOUD.USER.PASSWORD}, and {$NEXTCLOUD.ADDRESS} macros.
The user must be included in the Administrators group.
Name | Description | Default |
---|---|---|
{$NEXTCLOUD.SCHEMA} | HTTP or HTTPS protocol of Nextcloud. |
https |
{$NEXTCLOUD.USER.NAME} | Nextcloud username. |
root |
{$NEXTCLOUD.USER.PASSWORD} | Nextcloud user password. |
<Put the password here> |
{$NEXTCLOUD.ADDRESS} | IP or DNS name of Nextcloud server. |
127.0.0.1 |
{$NEXTCLOUD.LLD.FILTER.USER.MATCHES} | Filter of discoverable users by name. |
.* |
{$NEXTCLOUD.LLD.FILTER.USER.NOT_MATCHES} | Filter to exclude discovered users by name. |
CHANGE_IF_NEEDED |
{$NEXTCLOUD.USER.QUOTA.PUSED.MAX} | Storage utilization threshold. |
90 |
{$NEXTCLOUD.USER.MAX.INACTIVE} | How many days a user can be inactive. |
30 |
{$NEXTCLOUD.CPU.LOAD.MAX} | CPU load threshold (the number of processes in the system run queue). |
95 |
{$NEXTCLOUD.MEM.PUSED.MAX} | Memory utilization threshold. |
90 |
{$NEXTCLOUD.SWAP.PUSED.MAX} | Swap utilization threshold. |
90 |
{$NEXTCLOUD.PHP.MEM.PUSED.MAX} | PHP memory utilization threshold. |
90 |
{$NEXTCLOUD.STORAGE.FREE.MIN} | Free space threshold. |
1G |
{$NEXTCLOUD.PROXY} | Proxy HTTP(S) address. |
Name | Description | Type | Key and additional info |
---|---|---|---|
Get server information | This item provides useful server information, such as CPU load, RAM usage, disk usage, number of users, etc. https://github.com/nextcloud/serverinfo |
HTTP agent | nextcloud.serverinfo.get_data Preprocessing
|
Server information status | Server information API status |
Dependent item | nextcloud.serverinfo.status Preprocessing
|
Version | Nextcloud service version. |
Dependent item | nextcloud.serverinfo.version Preprocessing
|
Free space | The amount of free disk space. |
Dependent item | nextcloud.serverinfo.freespace Preprocessing
|
CPU load, avg 1m | The average system load (the number of processes in the system run queue), last 1 minute. |
Dependent item | nextcloud.serverinfo.cpu.avg.1m Preprocessing
|
CPU load, avg 5m | The average system load (the number of processes in the system run queue), last 5 minutes. |
Dependent item | nextcloud.serverinfo.cpu.avg.5m Preprocessing
|
CPU load, avg 15m | The average system load (the number of processes in the system run queue), last 15 minutes. |
Dependent item | nextcloud.serverinfo.cpu.avg.15m Preprocessing
|
Memory total | The size of the RAM. |
Dependent item | nextcloud.serverinfo.mem.total Preprocessing
|
Memory free | The amount of free RAM. |
Dependent item | nextcloud.serverinfo.mem.free Preprocessing
|
Memory used, in % | RAM usage, in percent. |
Dependent item | nextcloud.serverinfo.mem.pused Preprocessing
|
Swap total | The size of the swap memory. |
Dependent item | nextcloud.serverinfo.swap.total Preprocessing
|
Swap free | The amount of free swap. |
Dependent item | nextcloud.serverinfo.swap.free Preprocessing
|
Swap used, in % | Swap usage, in percent. |
Dependent item | nextcloud.serverinfo.swap.pused Preprocessing
|
Apps installed | The number of installed applications. |
Dependent item | nextcloud.serverinfo.apps.installed Preprocessing
|
Apps update available | The number of applications for which an update is available. |
Dependent item | nextcloud.serverinfo.apps.update Preprocessing
|
Web server | Web server description. |
Dependent item | nextcloud.serverinfo.apps.webserver Preprocessing
|
PHP version | PHP version |
Dependent item | nextcloud.serverinfo.php.version Preprocessing
|
PHP memory limit | By default, the PHP memory limit is generally set to 128 MB, but it can be customized based on the application's specific needs. The php.ini file is usually the standard location to set the PHP memory limit. |
Dependent item | nextcloud.serverinfo.php.memory.limit Preprocessing
|
PHP memory used | PHP memory used |
Dependent item | nextcloud.serverinfo.php.memory.used Preprocessing
|
PHP memory free | PHP free memory size. |
Dependent item | nextcloud.serverinfo.php.memory.free Preprocessing
|
PHP memory wasted | Memory allocated to the service but not in use. |
Dependent item | nextcloud.serverinfo.php.memory.wasted Preprocessing
|
PHP memory wasted, in % | Memory allocated to the service but not in use, in percent. |
Dependent item | nextcloud.serverinfo.php.memory.wasted_percentage Preprocessing
|
PHP memory used, in % | PHP memory used percentage |
Dependent item | nextcloud.serverinfo.php.memory.pused Preprocessing
|
PHP maximum execution time | By default, the maximum execution time for PHP scripts is set to 30 seconds. If a script runs for longer than 30 seconds, PHP stops the script and reports an error. You can control the amount of time PHP allows scripts to run by changing the 'max_execution_time' directive in your php.ini file. |
Dependent item | nextcloud.serverinfo.php.max_execution_time Preprocessing
|
PHP maximum upload file size | By default, the maximum upload file size for PHP scripts is set to 128 megabytes. However, you may want to change this limit. For example, you can set a lower limit to prevent users from uploading large files to your site. To do this, change the 'upload_max_filesize' and 'post_max_size' directives. |
Dependent item | nextcloud.serverinfo.php.upload_max_filesize Preprocessing
|
Database type | Database type. |
Dependent item | nextcloud.serverinfo.db.type Preprocessing
|
Database version | Database description. |
Dependent item | nextcloud.serverinfo.db.version Preprocessing
|
Database size | Size of database. |
Dependent item | nextcloud.serverinfo.db.size Preprocessing
|
Active users, last 5 minutes | The number of active users in the last 5 minutes. |
Dependent item | nextcloud.serverinfo.active_users.last5m Preprocessing
|
Active users, last 1 hour | The number of active users in the last 1 hour. |
Dependent item | nextcloud.serverinfo.active_users.last1h Preprocessing
|
Active users, last 24 hours | The number of active users in the last day. |
Dependent item | nextcloud.serverinfo.active_users.last24hours Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Nextcloud: Server information unavailable | Failed to get server information. |
last(/Nextcloud by HTTP/nextcloud.serverinfo.status)<>"OK" |High |
||
Nextcloud: Version has changed | Nextcloud version has changed. Acknowledge to close the problem manually. |
change(/Nextcloud by HTTP/nextcloud.serverinfo.version)=1 and length(last(/Nextcloud by HTTP/nextcloud.serverinfo.version))>0 |Info |
Manual close: Yes | |
Nextcloud: Disk space is low | Condition should be the following: the free disk space is less than {$NEXTCLOUD.STORAGE.FREE.MIN}. |
last(/Nextcloud by HTTP/nextcloud.serverinfo.freespace)<{$NEXTCLOUD.STORAGE.FREE.MIN} |Average |
Manual close: Yes | |
Nextcloud: CPU load is too high | High CPU load. |
min(/Nextcloud by HTTP/nextcloud.serverinfo.cpu.avg.1m,5m) > {$NEXTCLOUD.CPU.LOAD.MAX} |Average |
||
Nextcloud: High memory utilization | The system is running out of free memory. |
min(/Nextcloud by HTTP/nextcloud.serverinfo.mem.pused,5m) > {$NEXTCLOUD.MEM.PUSED.MAX} |Average |
||
Nextcloud: High swap utilization | The system is running out of free swap. |
min(/Nextcloud by HTTP/nextcloud.serverinfo.swap.pused,5m) > {$NEXTCLOUD.SWAP.PUSED.MAX} |Average |
||
Nextcloud: Number of installed apps has been changed | Applications have been installed or removed. |
change(/Nextcloud by HTTP/nextcloud.serverinfo.apps.installed)<>0 |Info |
Manual close: Yes | |
Nextcloud: Application updates are available | Updates are available for some of the installed applications. |
last(/Nextcloud by HTTP/nextcloud.serverinfo.apps.update)<>0 |Warning |
Manual close: Yes | |
Nextcloud: PHP version has changed | The PHP version has changed. Acknowledge to close the problem manually. |
change(/Nextcloud by HTTP/nextcloud.serverinfo.php.version)=1 and length(last(/Nextcloud by HTTP/nextcloud.serverinfo.php.version))>0 |Info |
Manual close: Yes | |
Nextcloud: High PHP memory utilization | PHP is running out of free memory. |
min(/Nextcloud by HTTP/nextcloud.serverinfo.php.memory.pused,5m) > {$NEXTCLOUD.PHP.MEM.PUSED.MAX} |Average |
||
Nextcloud: Database version has changed | The database version has changed. Acknowledge to close the problem manually. |
change(/Nextcloud by HTTP/nextcloud.serverinfo.db.version)=1 and length(last(/Nextcloud by HTTP/nextcloud.serverinfo.db.version))>0 |Info |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Nextcloud: User discovery | User discovery. |
HTTP agent | nextcloud.user.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
User "{#NEXTCLOUD.USER}": Get data | Get common information about user |
HTTP agent | nextcloud.user.get_data[{#NEXTCLOUD.USER}] Preprocessing
|
User "{#NEXTCLOUD.USER}": Status | User account status. |
Dependent item | nextcloud.user.enabled[{#NEXTCLOUD.USER}] Preprocessing
|
User "{#NEXTCLOUD.USER}": Storage location | The location of the user's store. |
Dependent item | nextcloud.user.storageLocation[{#NEXTCLOUD.USER}] Preprocessing
|
User "{#NEXTCLOUD.USER}": Last login | The time the user has last logged in. |
Dependent item | nextcloud.user.lastLogin[{#NEXTCLOUD.USER}] Preprocessing
|
User "{#NEXTCLOUD.USER}": Last login, days ago | The number of days since the user has last logged in. |
Dependent item | nextcloud.user.inactive[{#NEXTCLOUD.USER}] Preprocessing
|
User "{#NEXTCLOUD.USER}": Quota free space | The size of the free available space in the user's storage. |
Dependent item | nextcloud.user.quota.free[{#NEXTCLOUD.USER}] Preprocessing
|
User "{#NEXTCLOUD.USER}": Quota used space | The size of the used available space in the user storage. |
Dependent item | nextcloud.user.quota.used[{#NEXTCLOUD.USER}] Preprocessing
|
User "{#NEXTCLOUD.USER}": Quota total space | The size of space available in the user's storage. |
Dependent item | nextcloud.user.quota.total[{#NEXTCLOUD.USER}] Preprocessing
|
User "{#NEXTCLOUD.USER}": Quota used space, in % | Usage of the allocated storage space, in percent. |
Dependent item | nextcloud.user.quota.pused[{#NEXTCLOUD.USER}] Preprocessing
|
User "{#NEXTCLOUD.USER}": Quota | The size of space available in the user's storage. |
Dependent item | nextcloud.user.quota[{#NEXTCLOUD.USER}] Preprocessing
|
User "{#NEXTCLOUD.USER}": Display name | User visible name. |
Dependent item | nextcloud.user.displayname[{#NEXTCLOUD.USER}] Preprocessing
|
User "{#NEXTCLOUD.USER}": Language | User language. |
Dependent item | nextcloud.user.language[{#NEXTCLOUD.USER}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Nextcloud: User "{#NEXTCLOUD.USER}" status changed | User account status has changed. |
change(/Nextcloud by HTTP/nextcloud.user.enabled[{#NEXTCLOUD.USER}]) = 1 |Info |
||
Nextcloud: User "{#NEXTCLOUD.USER}": inactive | The user has not logged in for more than {$NEXTCLOUD.USER.MAX.INACTIVE:"{#NEXTCLOUD.USER}"} days. |
last(/Nextcloud by HTTP/nextcloud.user.inactive[{#NEXTCLOUD.USER}]) > {$NEXTCLOUD.USER.MAX.INACTIVE:"{#NEXTCLOUD.USER}"} |Info |
||
Nextcloud: User "{#NEXTCLOUD.USER}": High quota utilization | More than {$NEXTCLOUD.USER.QUOTA.PUSED.MAX:"{#NEXTCLOUD.USER}"} percent of the allocated storage space has been used. |
min(/Nextcloud by HTTP/nextcloud.user.quota.pused[{#NEXTCLOUD.USER}],5m) > {$NEXTCLOUD.USER.QUOTA.PUSED.MAX:"{#NEXTCLOUD.USER}"} |Warning |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
The registered Microsoft application must be granted the following API permissions:
Reports.Read.All - required for app usage and activity metrics
ServiceHealth.Read.All - required for service discovery and service status metrics
Set the {$MS365.APP.ID}, {$MS365.PASSWORD}, and {$MS365.TENANT.ID} macros.
Name | Description | Default |
---|---|---|
{$MS365.APP.ID} | Microsoft application ID. |
|
{$MS365.PASSWORD} | The secret for the registered Microsoft application. |
|
{$MS365.TENANT.ID} | Microsoft tenant ID. |
|
{$MS365.SERVICE.NAME.MATCHES} | This macro is used in the Microsoft cloud service discovery rule. |
.* |
{$MS365.SERVICE.NAME.NOT.MATCHES} | This macro is used in the Microsoft cloud service discovery rule. |
CHANGE_IF_NEEDED |
{$MS365.HTTP.PROXY} | Sets the HTTP proxy value. If this macro is empty, then no proxy is used. |
|
{$MS365.API.TIMEOUT} | API response timeout. |
15s |
Name | Description | Type | Key and additional info |
---|---|---|---|
Services: Get services | The list of Microsoft cloud services subscribed to by a tenant, and their health statuses. More information: https://learn.microsoft.com/en-us/graph/api/servicehealth-get?view=graph-rest-beta&tabs=http |
Script | ms365.services.get |
Teams: Get reports | Accumulated data from the Microsoft Graph API "reports" endpoints. More information: https://learn.microsoft.com/en-us/graph/api/resources/report?view=graph-rest-beta |
Script | ms365.teams.reports.get |
Outlook: Get reports | Accumulated data from the Microsoft Graph API "reports" endpoints. More information: https://learn.microsoft.com/en-us/graph/api/resources/report?view=graph-rest-beta |
Script | ms365.outlook.reports.get |
OneDrive: Get reports | Accumulated data from the Microsoft Graph API "reports" endpoints. More information: https://learn.microsoft.com/en-us/graph/api/resources/report?view=graph-rest-beta |
Script | ms365.onedrive.reports.get |
SharePoint: Get reports | Accumulated data from the Microsoft Graph API "reports" endpoints. More information: https://learn.microsoft.com/en-us/graph/api/resources/report?view=graph-rest-beta |
Script | ms365.sharepoint.reports.get |
Apps: Get reports | Accumulated data from the Microsoft Graph API "reports" endpoints. More information: https://learn.microsoft.com/en-us/graph/api/resources/report?view=graph-rest-beta |
Script | ms365.apps.reports.get |
Services: Get errors | A list of errors from API requests for Services metrics. |
Dependent item | ms365.services.errors Preprocessing
|
Teams: Get errors | A list of errors from API requests for Teams metrics. |
Dependent item | ms365.teams.errors Preprocessing
|
Outlook: Get errors | A list of errors from API requests for Outlook metrics. |
Dependent item | ms365.outlook.errors Preprocessing
|
OneDrive: Get errors | A list of errors from API requests for OneDrive metrics. |
Dependent item | ms365.onedrive.errors Preprocessing
|
SharePoint: Get errors | A list of errors from API requests for SharePoint metrics. |
Dependent item | ms365.sharepoint.errors Preprocessing
|
Apps: Get errors | A list of errors from API requests for Apps metrics. |
Dependent item | ms365.apps.errors Preprocessing
|
Teams: Device usage (users), web client | The number of unique licensed Microsoft Teams users recorded via the Teams web client over the week before the report refresh date. |
Dependent item | ms365.teams.device.users.web Preprocessing
|
Teams: Device usage (users), Android | The number of unique licensed Microsoft Teams users recorded via the Teams mobile client for Android over the week before the report refresh date. |
Dependent item | ms365.teams.device.users.android Preprocessing
|
Teams: Device usage (users), iOS | The number of unique licensed Microsoft Teams users recorded via the Teams mobile client for iOS over the week before the report refresh date. |
Dependent item | ms365.teams.device.users.ios Preprocessing
|
Teams: Device usage (users), Mac | The number of unique licensed Microsoft Teams users recorded via the Teams desktop client on a macOS computer over the week before the report refresh date. |
Dependent item | ms365.teams.device.users.mac Preprocessing
|
Teams: Device usage (users), Windows | The number of unique licensed Microsoft Teams users recorded via the Teams desktop client on a Windows-based computer over the week before the report refresh date. |
Dependent item | ms365.teams.device.users.windows Preprocessing
|
Teams: Device usage (users), Chrome OS | The number of unique licensed Microsoft Teams users recorded via the Teams desktop client on a ChromeOS computer over the week before the report refresh date. |
Dependent item | ms365.teams.device.users.chromeos Preprocessing
|
Teams: Device usage (users), Linux | The number of unique licensed Microsoft Teams users recorded via the Teams desktop client on a Linux computer over the week before the report refresh date. |
Dependent item | ms365.teams.device.users.linux Preprocessing
|
Teams: Device usage (total), report date | The date of the report of device usage of both licensed and non-licensed users. |
Dependent item | ms365.teams.device.total.report_date Preprocessing
|
Teams: Device usage (total), web client | The number of unique Microsoft Teams users (licensed or non-licensed) recorded via the Teams web client over the week before the report refresh date. |
Dependent item | ms365.teams.device.total.web Preprocessing
|
Teams: Device usage (total), Android | The number of unique Microsoft Teams users (licensed or non-licensed) recorded via the Teams mobile client for Android over the week before the report refresh date. |
Dependent item | ms365.teams.device.total.android Preprocessing
|
Teams: Device usage (total), iOS | The number of unique Microsoft Teams users (licensed or non-licensed) recorded via the Teams mobile client for iOS over the week before the report refresh date. |
Dependent item | ms365.teams.device.total.ios Preprocessing
|
Teams: Device usage (total), Mac | The number of unique Microsoft Teams users (licensed or non-licensed) recorded via the Teams desktop client on a macOS computer over the week before the report refresh date. |
Dependent item | ms365.teams.device.total.mac Preprocessing
|
Teams: Device usage (total), Windows | The number of unique Microsoft Teams users (licensed or non-licensed) recorded via the Teams desktop client on a Windows-based computer over the week before the report refresh date. |
Dependent item | ms365.teams.device.total.windows Preprocessing
|
Teams: Device usage (total), Chrome OS | The number of unique Microsoft Teams users (licensed or non-licensed) recorded via the Teams desktop client on a ChromeOS computer over the week before the report refresh date. |
Dependent item | ms365.teams.device.total.chromeos Preprocessing
|
Teams: Device usage (total), Linux | The number of unique Microsoft Teams users (licensed or non-licensed) recorded via the Teams desktop client on a Linux computer over the week before the report refresh date. |
Dependent item | ms365.teams.device.total.linux Preprocessing
|
Teams: Device usage (guests), web client | The number of unique non-licensed Microsoft Teams users (guests) recorded via the Teams web client over the week before the report refresh date. |
Dependent item | ms365.teams.device.guests.web Preprocessing
|
Teams: Device usage (guests), Android | The number of unique non-licensed Microsoft Teams users (guests) recorded via the Teams mobile client for Android over the week before the report refresh date. |
Dependent item | ms365.teams.device.guests.android Preprocessing
|
Teams: Device usage (guests), iOS | The number of unique non-licensed Microsoft Teams users (guests) recorded via the Teams mobile client for iOS over the week before the report refresh date. |
Dependent item | ms365.teams.device.guests.ios Preprocessing
|
Teams: Device usage (guests), Mac | The number of unique non-licensed Microsoft Teams users (guests) recorded via the Teams desktop client on a macOS computer over the week before the report refresh date. |
Dependent item | ms365.teams.device.guests.mac Preprocessing
|
Teams: Device usage (guests), Windows | The number of unique non-licensed Microsoft Teams users (guests) recorded via the Teams desktop client on a Windows-based computer over the week before the report refresh date. |
Dependent item | ms365.teams.device.guests.windows Preprocessing
|
Teams: Device usage (guests), Chrome OS | The number of unique non-licensed Microsoft Teams users (guests) recorded via the Teams desktop client on a ChromeOS computer over the week before the report refresh date. |
Dependent item | ms365.teams.device.guests.chromeos Preprocessing
|
Teams: Device usage (guests), Linux | The number of unique non-licensed Microsoft Teams users (guests) recorded via the Teams desktop client on a Linux computer over the week before the report refresh date. |
Dependent item | ms365.teams.device.guests.linux Preprocessing
|
Teams: User activity, report date | The date of the report of the number of activities by all users. |
Dependent item | ms365.teams.activity.user.report_date Preprocessing
|
Teams: User activity, team chat messages | The number of unique messages that were posted in team chats by licensed or non-licensed Microsoft Teams users during the week before the report refresh date. This includes original posts and replies. |
Dependent item | ms365.teams.activity.user.messages.in_team Preprocessing
|
Teams: User activity, private chat messages | The number of unique messages that were posted in private chats by licensed or non-licensed Microsoft Teams users during the week before the report refresh date. |
Dependent item | ms365.teams.activity.user.messages.private Preprocessing
|
Teams: User activity, calls | The number of 1:1 calls licensed or non-licensed Microsoft Teams users participated in during the week before the report refresh date. |
Dependent item | ms365.teams.activity.user.calls Preprocessing
|
Teams: User activity, meetings | The sum of the scheduled one-time and recurring, ad-hoc, and unclassified meetings licensed or non-licensed Microsoft Teams users participated in during the week before the report refresh date. |
Dependent item | ms365.teams.activity.user.meetings.total Preprocessing
|
Teams: User activity, organized meetings | The sum of the scheduled one-time and recurring, ad-hoc, and unclassified meetings organized by licensed or non-licensed Microsoft Teams users during the week before the report refresh date. |
Dependent item | ms365.teams.activity.user.meetings.organized Preprocessing
|
Teams: User activity, attended meetings | The sum of the scheduled one-time and recurring, ad-hoc, and unclassified meetings attended by licensed or non-licensed Microsoft Teams users during the week before the report refresh date. |
Dependent item | ms365.teams.activity.user.meetings.attended Preprocessing
|
Teams: User activity, audio duration | The sum of the audio duration of licensed or non-licensed Microsoft Teams users during the week before the report refresh date. |
Dependent item | ms365.teams.activity.user.duration.audio Preprocessing
|
Teams: User activity, video duration | The sum of the video duration of licensed or non-licensed Microsoft Teams users during the week before the report refresh date. |
Dependent item | ms365.teams.activity.user.duration.video Preprocessing
|
Teams: User activity, screen share duration | The sum of the screen share duration of licensed or non-licensed Microsoft Teams users during the week before the report refresh date. |
Dependent item | ms365.teams.activity.user.duration.screen_share Preprocessing
|
Teams: User activity, post messages | The number of post messages in all channels made by licensed or non-licensed Microsoft Teams users during the week before the report refresh date. A post is the original message in a Teams chat. |
Dependent item | ms365.teams.activity.user.messages.posts Preprocessing
|
Teams: User activity, reply messages | The number of reply messages in all channels made by licensed or non-licensed Microsoft Teams users during the week before the report refresh date. |
Dependent item | ms365.teams.activity.user.messages.replies Preprocessing
|
Teams: User count (users), team chat messages | The number of licensed Microsoft Teams users who posted or replied in team chats during the week before the report refresh date. |
Dependent item | ms365.teams.user_count.users.messages.in_team Preprocessing
|
Teams: User count (users), private chat messages | The number of licensed Microsoft Teams users who posted or replied in private chats during the week before the report refresh date. |
Dependent item | ms365.teams.user_count.users.messages.private Preprocessing
|
Teams: User count (users), calls | The number of licensed Microsoft Teams users who participated in calls during the week before the report refresh date. |
Dependent item | ms365.teams.user_count.users.calls Preprocessing
|
Teams: User count (users), meetings | The number of licensed Microsoft Teams users who participated in meetings during the week before the report refresh date. |
Dependent item | ms365.teams.user_count.users.meetings Preprocessing
|
Teams: User count (total), report date | The date of the report of the number of licensed or non-licensed Microsoft Teams users in activity. |
Dependent item | ms365.teams.user_count.total.report_date Preprocessing
|
Teams: User count (total), team chat messages | The number of licensed or non-licensed Microsoft Teams users who posted or replied in team chats during the week before the report refresh date. |
Dependent item | ms365.teams.user_count.total.messages.in_team Preprocessing
|
Teams: User count (total), private chat messages | The number of licensed or non-licensed Microsoft Teams users who posted or replied in private chats during the week before the report refresh date. |
Dependent item | ms365.teams.user_count.total.messages.private Preprocessing
|
Teams: User count (total), calls | The number of licensed or non-licensed Microsoft Teams users who participated in calls during the week before the report refresh date. |
Dependent item | ms365.teams.user_count.total.calls Preprocessing
|
Teams: User count (total), meetings | The number of licensed or non-licensed Microsoft Teams users who participated in meetings during the week before the report refresh date. |
Dependent item | ms365.teams.user_count.total.meetings Preprocessing
|
Teams: User count (guests), team chat messages | The number of non-licensed Microsoft Teams users (guests) who posted or replied in team chats during the week before the report refresh date. |
Dependent item | ms365.teams.user_count.guests.messages.in_team Preprocessing
|
Teams: User count (guests), private chat messages | The number of non-licensed Microsoft Teams users (guests) who posted or replied in private chats during the week before the report refresh date. |
Dependent item | ms365.teams.user_count.guests.messages.private Preprocessing
|
Teams: User count (guests), calls | The number of non-licensed Microsoft Teams users (guests) who participated in calls during the week before the report refresh date. |
Dependent item | ms365.teams.user_count.guests.calls Preprocessing
|
Teams: User count (guests), meetings | The number of non-licensed Microsoft Teams users (guests) who participated in meetings during the week before the report refresh date. |
Dependent item | ms365.teams.user_count.guests.meetings Preprocessing
|
Teams: Team activity, report date | The date of the report of the number of team activities. |
Dependent item | ms365.teams.activity.team.report_date Preprocessing
|
Teams: Team activity, active shared channels | The number of active shared channels across Microsoft Teams over the week before the report refresh date. |
Dependent item | ms365.teams.activity.team.active_shared_channels Preprocessing
|
Teams: Team activity, active external users | The number of active external users across Microsoft Teams over the week before the report refresh date. |
Dependent item | ms365.teams.activity.team.active_external_users Preprocessing
|
Teams: Team activity, active users | The number of active users across Microsoft Teams over the week before the report refresh date. |
Dependent item | ms365.teams.activity.team.active_users Preprocessing
|
Teams: Team activity, active channels | The number of active channels across Microsoft Teams over the week before the report refresh date. |
Dependent item | ms365.teams.activity.team.active_channels Preprocessing
|
Teams: Team activity, channel messages | The number of channel messages across Microsoft Teams over the week before the report refresh date. |
Dependent item | ms365.teams.activity.team.channel_messages Preprocessing
|
Teams: Team activity, guests | The number of guests across Microsoft Teams over the week before the report refresh date. |
Dependent item | ms365.teams.activity.team.guests Preprocessing
|
Teams: Team activity, reactions | The number of reactions across Microsoft Teams over the week before the report refresh date. |
Dependent item | ms365.teams.activity.team.reactions Preprocessing
|
Teams: Team activity, meetings organized | The number of organized meetings across Microsoft Teams over the week before the report refresh date. |
Dependent item | ms365.teams.activity.team.meetings_organized Preprocessing
|
Teams: Team activity, post messages | The number of post messages across Microsoft Teams over the week before the report refresh date. |
Dependent item | ms365.teams.activity.team.messages.posts Preprocessing
|
Teams: Team activity, reply messages | The number of reply messages across Microsoft Teams over the week before the report refresh date. |
Dependent item | ms365.teams.activity.team.messages.replies Preprocessing
|
Teams: Team activity, urgent messages | The number of urgent messages across Microsoft Teams over the week before the report refresh date. |
Dependent item | ms365.teams.activity.team.urgent_messages Preprocessing
|
Teams: Team activity, mentions | The number of mentions across Microsoft Teams over the week before the report refresh date. |
Dependent item | ms365.teams.activity.team.mentions Preprocessing
|
Outlook: Activity, report date | The date of the Outlook activity count report. |
Dependent item | ms365.outlook.activity.report_date Preprocessing
|
Outlook: Activity, emails sent | The number of times an "Email sent" action was recorded. |
Dependent item | ms365.outlook.activity.sent Preprocessing
|
Outlook: Activity, emails received | The number of times an "Email received" action was recorded. |
Dependent item | ms365.outlook.activity.received Preprocessing
|
Outlook: Activity, emails read | The number of times an "Email read" action was recorded. |
Dependent item | ms365.outlook.activity.read Preprocessing
|
Outlook: Activity, meetings created | The number of times a "Meeting request sent" action was recorded. |
Dependent item | ms365.outlook.activity.meetings_created Preprocessing
|
Outlook: Activity, meetings interacted | The number of times a meeting request accept, tentative, decline, or cancel action was recorded. |
Dependent item | ms365.outlook.activity.meetings_interacted Preprocessing
|
Outlook: User count, report date | The date of the Outlook activity user count report. |
Dependent item | ms365.outlook.user_count.report_date Preprocessing
|
Outlook: User count, emails sent | The number of users an "Email sent" action was recorded for. |
Dependent item | ms365.outlook.user_count.sent Preprocessing
|
Outlook: User count, emails received | The number of users an "Email received" action was recorded for. |
Dependent item | ms365.outlook.user_count.received Preprocessing
|
Outlook: User count, emails read | The number of users an "Email read" action was recorded for. |
Dependent item | ms365.outlook.user_count.read Preprocessing
|
Outlook: User count, meetings created | The number of users a "Meeting request sent" action was recorded for. |
Dependent item | ms365.outlook.user_count.meetings_created Preprocessing
|
Outlook: User count, meetings interacted | The number of users a meeting request accept, tentative, decline, or cancel action was recorded for. |
Dependent item | ms365.outlook.user_count.meetings_interacted Preprocessing
|
Outlook: User count, app usage, report date | The date of the report of the unique user count per app. |
Dependent item | ms365.outlook.user_count.report_date.app Preprocessing
|
Outlook: User count, mail for Mac | The number of unique users of mail for Mac. |
Dependent item | ms365.outlook.user_count.mail_for_mac Preprocessing
|
Outlook: User count, Outlook for Mac | The number of unique users of Outlook for Mac. |
Dependent item | ms365.outlook.user_count.mac Preprocessing
|
Outlook: User count, Outlook for Windows | The number of unique users of Outlook for Windows. |
Dependent item | ms365.outlook.user_count.windows Preprocessing
|
Outlook: User count, Outlook for mobile | The number of unique users of Outlook for mobile. |
Dependent item | ms365.outlook.user_count.mobile Preprocessing
|
Outlook: User count, Outlook for web | The number of unique users of Outlook for web. |
Dependent item | ms365.outlook.user_count.web Preprocessing
|
Outlook: User count, POP3 applications | The number of unique users of other POP3 applications. |
Dependent item | ms365.outlook.user_count.pop3app Preprocessing
|
Outlook: User count, IMAP4 applications | The number of unique users of other IMAP4 applications. |
Dependent item | ms365.outlook.user_count.imap4app Preprocessing
|
Outlook: User count, SMTP applications | The number of unique users of other SMTP applications. |
Dependent item | ms365.outlook.user_count.smtpapp Preprocessing
|
Outlook: Mailbox, report date | The date of the report of the unique user count per app. |
Dependent item | ms365.outlook.mailbox.report_date Preprocessing
|
Outlook: Mailbox, total | The total number of user mailboxes in your organization. |
Dependent item | ms365.outlook.mailbox.total Preprocessing
|
Outlook: Mailbox, active | The number of active user mailboxes in your organization. A mailbox is considered active if the user has sent or read any emails. |
Dependent item | ms365.outlook.mailbox.active Preprocessing
|
Outlook: Mailbox, active in % | Percentage of active user mailboxes in your organization. A mailbox is considered active if the user has sent or read any emails. |
Dependent item | ms365.outlook.mailbox.active.percentage Preprocessing
|
Outlook: Mailbox, storage, report date | The date of the mailbox storage report. |
Dependent item | ms365.outlook.storage.report_date Preprocessing
|
Outlook: Mailbox, storage used | The amount of mailbox storage used in your organization. |
Dependent item | ms365.outlook.storage.used Preprocessing
|
OneDrive: Users, report date | The date of the report of the number of active OneDrive users. |
Dependent item | ms365.onedrive.users.report_date Preprocessing
|
OneDrive: Users, viewed or edited | The number of users who have viewed or edited OneDrive files. |
Dependent item | ms365.onedrive.users.viewed_or_edited Preprocessing
|
OneDrive: Users, synced | The number of users who have synced OneDrive files. |
Dependent item | ms365.onedrive.users.synced Preprocessing
|
OneDrive: Users, shared internally | The number of users who have shared OneDrive files internally. |
Dependent item | ms365.onedrive.users.shared_internally Preprocessing
|
OneDrive: Users, shared externally | The number of users who have shared OneDrive files externally. |
Dependent item | ms365.onedrive.users.shared_externally Preprocessing
|
OneDrive: Files, activity, report date | The date of report of the number of active OneDrive files. |
Dependent item | ms365.onedrive.files.report_date Preprocessing
|
OneDrive: Files, viewed or edited | The number of viewed or edited OneDrive files. |
Dependent item | ms365.onedrive.files.viewed_or_edited Preprocessing
|
OneDrive: Files, synced | The number of synced OneDrive files. |
Dependent item | ms365.onedrive.files.synced Preprocessing
|
OneDrive: Files, shared internally | The number of internally shared OneDrive files. |
Dependent item | ms365.onedrive.files.shared_internally Preprocessing
|
OneDrive: Files, shared externally | The number of externally shared OneDrive files. |
Dependent item | ms365.onedrive.files.shared_externally Preprocessing
|
OneDrive: Business sites, report date | The date of the number of OneDrive for Business sites report. |
Dependent item | ms365.onedrive.sites.report_date Preprocessing
|
OneDrive: Business sites, total | The number of OneDrive for Business sites. |
Dependent item | ms365.onedrive.sites.total Preprocessing
|
OneDrive: Business sites, active | The number of active OneDrive for Business sites. Any site on which users have viewed, modified, uploaded, downloaded, shared, or synced files is considered an active site. |
Dependent item | ms365.onedrive.sites.active Preprocessing
|
OneDrive: File count, report date | The date of the OneDrive file count report. |
Dependent item | ms365.onedrive.file_count.report_date Preprocessing
|
OneDrive: File count, total | The total number of files across all sites. |
Dependent item | ms365.onedrive.file_count.total Preprocessing
|
OneDrive: File count, active | The total number of active files across all sites. A file is considered active if it has been saved, synced, modified, or shared. |
Dependent item | ms365.onedrive.file_count.active Preprocessing
|
OneDrive: Storage, report date | The date of the report of the amount of storage used in OneDrive for Business. |
Dependent item | ms365.onedrive.storage.report_date Preprocessing
|
OneDrive: Storage, total | The total amount of storage used in OneDrive for Business. |
Dependent item | ms365.onedrive.storage.total Preprocessing
|
SharePoint: Files, activity, report date | The date of the report of the number of active SharePoint files. |
Dependent item | ms365.sharepoint.files.report_date Preprocessing
|
SharePoint: Files, viewed or edited | The number of viewed or edited SharePoint files. |
Dependent item | ms365.sharepoint.files.viewed_or_edited Preprocessing
|
SharePoint: Files, synced | The number of files synced to a SharePoint site. |
Dependent item | ms365.sharepoint.files.synced Preprocessing
|
SharePoint: Files, shared internally | The number of internally shared SharePoint files. |
Dependent item | ms365.sharepoint.files.shared_internally Preprocessing
|
SharePoint: Files, shared externally | The number of externally shared SharePoint files. |
Dependent item | ms365.sharepoint.files.shared_externally Preprocessing
|
SharePoint: Users, report date | The date of the report of the number of active SharePoint users. |
Dependent item | ms365.sharepoint.user_count.report_date Preprocessing
|
SharePoint: Users, viewed or edited | The number of users who have viewed or edited SharePoint files. |
Dependent item | ms365.sharepoint.user_count.viewed_or_edited Preprocessing
|
SharePoint: Users, synced | The number of users who have synced SharePoint files. |
Dependent item | ms365.sharepoint.user_count.synced Preprocessing
|
SharePoint: Users, shared internally | The number of users who have shared SharePoint files internally. |
Dependent item | ms365.sharepoint.user_count.shared_internally Preprocessing
|
SharePoint: Users, shared externally | The number of users who have shared SharePoint files externally. |
Dependent item | ms365.sharepoint.user_count.shared_externally Preprocessing
|
SharePoint: Users, pages visited | The number of users who have visited unique pages. |
Dependent item | ms365.sharepoint.user_count.visited_page Preprocessing
|
SharePoint: Pages, visited, report date | The date of the report of the number of pages visited. |
Dependent item | ms365.sharepoint.pages_visited.report_date Preprocessing
|
SharePoint: Pages, visited | The number of unique pages visited by users. |
Dependent item | ms365.sharepoint.pages_visited.count Preprocessing
|
SharePoint: File count, report date | The date of the SharePoint file count report. |
Dependent item | ms365.sharepoint.file_count.report_date Preprocessing
|
SharePoint: File count, total | The total number of files across all sites. |
Dependent item | ms365.sharepoint.file_count.total Preprocessing
|
SharePoint: File count, active | The total number of active files across all sites. A file is considered active if it has been saved, synced, modified, or shared. |
Dependent item | ms365.sharepoint.file_count.active Preprocessing
|
SharePoint: Sites, report date | The date of the report of the number of SharePoint sites. |
Dependent item | ms365.sharepoint.site.report_date Preprocessing
|
SharePoint: Sites, total | The number of SharePoint sites. |
Dependent item | ms365.sharepoint.sites.total Preprocessing
|
SharePoint: Sites, active | The number of active SharePoint sites. |
Dependent item | ms365.sharepoint.sites.active Preprocessing
|
SharePoint: Storage, report date | The date of the report of the amount of storage used in SharePoint. |
Dependent item | ms365.sharepoint.storage.report_date Preprocessing
|
SharePoint: Storage, total | The total amount of storage used in SharePoint. |
Dependent item | ms365.sharepoint.storage.total Preprocessing
|
SharePoint: Pages, viewed, report date | The date of the report of the number of pages viewed. |
Dependent item | ms365.sharepoint.pages_viewed.report_date Preprocessing
|
SharePoint: Pages, view count | The number of pages viewed across all sites. |
Dependent item | ms365.sharepoint.pages_viewed.count Preprocessing
|
Apps: Users, report date | The date of the active user count report. |
Dependent item | ms365.apps.users.report_date Preprocessing
|
Apps: Users, Office 365 | The number of daily Office 365 users. |
Dependent item | ms365.apps.users.office365 Preprocessing
|
Apps: Users, Exchange | The number of daily Exchange users. |
Dependent item | ms365.apps.users.exchange Preprocessing
|
Apps: Users, OneDrive | The number of daily OneDrive users. |
Dependent item | ms365.apps.users.onedrive Preprocessing
|
Apps: Users, SharePoint | The number of daily SharePoint users. |
Dependent item | ms365.apps.users.sharepoint Preprocessing
|
Apps: Users, Skype for Business | The number of daily Skype for Business users. |
Dependent item | ms365.apps.users.skype Preprocessing
|
Apps: Users, Yammer | The number of daily Yammer users. |
Dependent item | ms365.apps.users.yammer Preprocessing
|
Apps: Users, Teams | The number of daily Teams users. |
Dependent item | ms365.apps.users.teams Preprocessing
|
Apps: Activity, report date | The date of the report of the user count by activity. |
Dependent item | ms365.apps.activity.report_date Preprocessing
|
Apps: Activity, Exchange active users | The number of active Exchange users during the week before the report date. |
Dependent item | ms365.apps.activity.exchange.users.active Preprocessing
|
Apps: Activity, Exchange inactive users | The number of inactive Exchange users during the week before the report date. |
Dependent item | ms365.apps.activity.exchange.users.inactive Preprocessing
|
Apps: Activity, OneDrive active users | The number of active OneDrive users during the week before the report date. |
Dependent item | ms365.apps.activity.onedrive.users.active Preprocessing
|
Apps: Activity, OneDrive inactive users | The number of inactive OneDrive users during the week before the report date. |
Dependent item | ms365.apps.activity.onedrive.users.inactive Preprocessing
|
Apps: Activity, SharePoint active users | The number of active SharePoint users during the week before the report date. |
Dependent item | ms365.apps.activity.sharepoint.users.active Preprocessing
|
Apps: Activity, SharePoint inactive users | The number of inactive SharePoint users during the week before the report date. |
Dependent item | ms365.apps.activity.sharepoint.users.inactive Preprocessing
|
Apps: Activity, Skype for Business active users | The number of active Skype for Business users during the week before the report date. |
Dependent item | ms365.apps.activity.skypeforbusiness.users.active Preprocessing
|
Apps: Activity, Skype for Business inactive users | The number of inactive Skype for Business users during the week before the report date. |
Dependent item | ms365.apps.activity.skypeforbusiness.users.inactive Preprocessing
|
Apps: Activity, Yammer active users | The number of active Yammer users during the week before the report date. |
Dependent item | ms365.apps.activity.yammer.users.active Preprocessing
|
Apps: Activity, Yammer inactive users | The number of inactive Yammer users during the week before the report date. |
Dependent item | ms365.apps.activity.yammer.users.inactive Preprocessing
|
Apps: Activity, Teams active users | The number of active Teams users during the week before the report date. |
Dependent item | ms365.apps.activity.teams.users.active Preprocessing
|
Apps: Activity, Teams inactive users | The number of inactive Teams users during the week before the report date. |
Dependent item | ms365.apps.activity.teams.users.inactive Preprocessing
|
Apps: Activity, Office 365 active users | The number of active Office 365 users during the week before the report date. |
Dependent item | ms365.apps.activity.office365.users.active Preprocessing
|
Apps: Activity, Office 365 inactive users | The number of inactive Office 365 users during the week before the report date. |
Dependent item | ms365.apps.activity.office365.users.inactive Preprocessing
|
Apps: Office, user count report date | The date of the report of the number of active users for each app. |
Dependent item | ms365.apps.office.user_count.report_date Preprocessing
|
Apps: Office, Outlook user count | The number of active Outlook users. |
Dependent item | ms365.apps.office.user_count.outlook Preprocessing
|
Apps: Office, Word user count | The number of active Word users. |
Dependent item | ms365.apps.office.user_count.word Preprocessing
|
Apps: Office, Excel user count | The number of active Excel users. |
Dependent item | ms365.apps.office.user_count.excel Preprocessing
|
Apps: Office, PowerPoint user count | The number of active PowerPoint users. |
Dependent item | ms365.apps.office.user_count.powerpoint Preprocessing
|
Apps: Office, OneNote user count | The number of active OneNote users. |
Dependent item | ms365.apps.office.user_count.onenote Preprocessing
|
Apps: Office, Teams user count | The number of active Teams users. |
Dependent item | ms365.apps.office.user_count.teams Preprocessing
|
Apps: Platform, user count report date | The date of the report of the number of active users per platform. |
Dependent item | ms365.apps.platform.user_count.report_date Preprocessing
|
Apps: Platform, Windows user count | The number of active users on the Windows platform. |
Dependent item | ms365.apps.platform.user_count.windows Preprocessing
|
Apps: Platform, Mac user count | The number of active users on the Mac platform. |
Dependent item | ms365.apps.platform.user_count.mac Preprocessing
|
Apps: Platform, mobile user count | The number of active users on the mobile platform. |
Dependent item | ms365.apps.platform.user_count.mobile Preprocessing
|
Apps: Platform, web user count | The number of active users on the web platform. |
Dependent item | ms365.apps.platform.user_count.web Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Microsoft 365: Services: There are errors in requests to API | Zabbix has received errors in response to API requests. |
length(last(/Microsoft 365 reports by HTTP/ms365.services.errors))>0 |Average |
||
Microsoft 365: Teams: There are errors in requests to API | Zabbix has received errors in response to API requests. |
length(last(/Microsoft 365 reports by HTTP/ms365.teams.errors))>0 |Average |
||
Microsoft 365: Outlook: There are errors in requests to API | Zabbix has received errors in response to API requests. |
length(last(/Microsoft 365 reports by HTTP/ms365.outlook.errors))>0 |Average |
||
Microsoft 365: OneDrive: There are errors in requests to API | Zabbix has received errors in response to API requests. |
length(last(/Microsoft 365 reports by HTTP/ms365.onedrive.errors))>0 |Average |
||
Microsoft 365: SharePoint: There are errors in requests to API | Zabbix has received errors in response to API requests. |
length(last(/Microsoft 365 reports by HTTP/ms365.sharepoint.errors))>0 |Average |
||
Microsoft 365: Apps: There are errors in requests to API | Zabbix has received errors in response to API requests. |
length(last(/Microsoft 365 reports by HTTP/ms365.apps.errors))>0 |Average |
Name | Description | Type | Key and additional info |
---|---|---|---|
Microsoft cloud service discovery | The list of Microsoft cloud services to which the tenant is subscribed. |
Dependent item | ms365.service.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Services: {#NAME} health status | Overall service health status of the service. More information about health status values can be found here: https://learn.microsoft.com/en-us/graph/api/resources/servicehealthissue?view=graph-rest-beta#servicehealthstatus-values |
Dependent item | ms365.service.health[{#NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Microsoft 365: Services: {#NAME} is degraded. | The service has the "Degraded" health status. |
last(/Microsoft 365 reports by HTTP/ms365.service.health[{#NAME}])=6 |Warning |
Manual close: Yes | |
Microsoft 365: Services: {#NAME} is interrupted. | The service has the "Interruption" health status. |
last(/Microsoft 365 reports by HTTP/ms365.service.health[{#NAME}])=7 |Average |
Manual close: Yes |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for the effortless deployment of Memcached monitoring by Zabbix via Zabbix agent 2 and doesn't require any external scripts.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Set up and configure zabbix-agent2 compiled with the Memcached monitoring plugin.
Test availability: zabbix_get -s memcached-host -k memcached.ping
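A quick way to confirm the plugin works end to end (a minimal sketch; the host name memcached-host and the default URI are placeholders, adjust them to your environment):
# Ping the Memcached instance through Zabbix agent 2; "1" is expected when the service responds
zabbix_get -s memcached-host -k 'memcached.ping["tcp://localhost:11211"]'
# Fetch the raw statistics blob used by the dependent items
zabbix_get -s memcached-host -k 'memcached.stats["tcp://localhost:11211"]'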
Name | Description | Default |
---|---|---|
{$MEMCACHED.CONN.URI} | Connection string in the URI format (password is not used). This param overwrites a value configured in the "Plugins.Memcached.Uri" option of the configuration file (if it's set), otherwise, the plugin's default value is used: "tcp://localhost:11211" |
tcp://localhost:11211 |
{$MEMCACHED.CONN.THROTTLED.MAX.WARN} | Maximum number of throttled connections per second |
1 |
{$MEMCACHED.CONN.QUEUED.MAX.WARN} | Maximum number of queued connections per second |
1 |
{$MEMCACHED.CONN.PRC.MAX.WARN} | Maximum percentage of connected clients |
80 |
{$MEMCACHED.MEM.PUSED.MAX.WARN} | Maximum percentage of memory used |
90 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Get status | Zabbix agent | memcached.stats["{$MEMCACHED.CONN.URI}"] | |
Ping | Zabbix agent | memcached.ping["{$MEMCACHED.CONN.URI}"] Preprocessing
|
|
Max connections | Max number of concurrent connections |
Dependent item | memcached.connections.max Preprocessing
|
Maximum number of bytes | Maximum number of bytes allowed in cache. You can adjust this setting via a config file or the command line while starting your Memcached server. |
Dependent item | memcached.config.limit_maxbytes Preprocessing
|
CPU sys | System CPU consumed by the Memcached server |
Dependent item | memcached.cpu.sys Preprocessing
|
CPU user | User CPU consumed by the Memcached server |
Dependent item | memcached.cpu.user Preprocessing
|
Queued connections per second | Number of times that memcached has hit its connections limit and disabled its listener |
Dependent item | memcached.connections.queued.rate Preprocessing
|
New connections per second | Number of connections opened per second |
Dependent item | memcached.connections.rate Preprocessing
|
Throttled connections | Number of times a client connection was throttled. When sending GETs in batch mode and the connection contains too many requests (limited by -R parameter) the connection might be throttled to prevent starvation. |
Dependent item | memcached.connections.throttled.rate Preprocessing
|
Connection structures | Number of connection structures allocated by the server |
Dependent item | memcached.connections.structures Preprocessing
|
Open connections | The number of clients presently connected |
Dependent item | memcached.connections.current Preprocessing
|
Commands: FLUSH per second | The flush_all command invalidates all items in the database. This operation incurs a performance penalty and shouldn't take place in production, so check your debug scripts. |
Dependent item | memcached.commands.flush.rate Preprocessing
|
Commands: GET per second | Number of GET requests received by server per second. |
Dependent item | memcached.commands.get.rate Preprocessing
|
Commands: SET per second | Number of SET requests received by server per second. |
Dependent item | memcached.commands.set.rate Preprocessing
|
Process id | PID of the server process |
Dependent item | memcached.process_id Preprocessing
|
Memcached version | Version of the Memcached server |
Dependent item | memcached.version Preprocessing
|
Uptime | Number of seconds since Memcached server start |
Dependent item | memcached.uptime Preprocessing
|
Bytes used | Current number of bytes used to store items. |
Dependent item | memcached.stats.bytes Preprocessing
|
Written bytes per second | The network's write rate per second in B/sec |
Dependent item | memcached.stats.bytes_written.rate Preprocessing
|
Read bytes per second | The network's read rate per second in B/sec |
Dependent item | memcached.stats.bytes_read.rate Preprocessing
|
Hits per second | Number of successful GET requests (items requested and found) per second. |
Dependent item | memcached.stats.hits.rate Preprocessing
|
Misses per second | Number of missed GET requests (items requested but not found) per second. |
Dependent item | memcached.stats.misses.rate Preprocessing
|
Evictions per second | "An eviction is when an item that still has time to live is removed from the cache because a brand new item needs to be allocated. The item is selected with a pseudo-LRU mechanism. A high number of evictions coupled with a low hit rate means your application is setting a large number of keys that are never used again." |
Dependent item | memcached.stats.evictions.rate Preprocessing
|
New items per second | Number of new items stored per second. |
Dependent item | memcached.stats.total_items.rate Preprocessing
|
Current number of items stored | Current number of items stored by this instance. |
Dependent item | memcached.stats.curr_items Preprocessing
|
Threads | Number of worker threads requested |
Dependent item | memcached.stats.threads Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Memcached: Service is down | last(/Memcached by Zabbix agent 2/memcached.ping["{$MEMCACHED.CONN.URI}"])=0 |Average |
Manual close: Yes | ||
Memcached: Failed to fetch info data | Zabbix has not received any data for items for the last 30 minutes. |
nodata(/Memcached by Zabbix agent 2/memcached.cpu.sys,30m)=1 |Warning |
Manual close: Yes Depends on:
|
|
Memcached: Too many queued connections | The max number of connections is reached and a new connection had to wait in the queue as a result. |
min(/Memcached by Zabbix agent 2/memcached.connections.queued.rate,5m)>{$MEMCACHED.CONN.QUEUED.MAX.WARN} |Warning |
||
Memcached: Too many throttled connections | Number of times a client connection was throttled is too high. |
min(/Memcached by Zabbix agent 2/memcached.connections.throttled.rate,5m)>{$MEMCACHED.CONN.THROTTLED.MAX.WARN} |Warning |
||
Memcached: Total number of connected clients is too high | When the number of connections reaches the value of the "max_connections" parameter, new connections will be rejected. |
min(/Memcached by Zabbix agent 2/memcached.connections.current,5m)/last(/Memcached by Zabbix agent 2/memcached.connections.max)*100>{$MEMCACHED.CONN.PRC.MAX.WARN} |Warning |
||
Memcached: Version has changed | The Memcached version has changed. Acknowledge to close the problem manually. |
last(/Memcached by Zabbix agent 2/memcached.version,#1)<>last(/Memcached by Zabbix agent 2/memcached.version,#2) and length(last(/Memcached by Zabbix agent 2/memcached.version))>0 |Info |
Manual close: Yes | |
Memcached: Service has been restarted | Uptime is less than 10 minutes. |
last(/Memcached by Zabbix agent 2/memcached.uptime)<10m |Info |
Manual close: Yes | |
Memcached: Memory usage is too high | min(/Memcached by Zabbix agent 2/memcached.stats.bytes,5m)/last(/Memcached by Zabbix agent 2/memcached.config.limit_maxbytes)*100>{$MEMCACHED.MEM.PUSED.MAX.WARN} |Warning |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for the effortless deployment of Mantis BT monitoring by Zabbix via HTTP and doesn't require any external scripts.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Name | Description | Default |
---|---|---|
{$MANTIS.URL} | MantisBT URL. |
|
{$MANTIS.TOKEN} | MantisBT Token. |
|
{$MANTIS.LLD.FILTER.PROJECTS.MATCHES} | Filter of discoverable projects. |
.* |
{$MANTIS.LLD.FILTER.PROJECTS.NOT_MATCHES} | Filter to exclude discovered projects. |
CHANGE_IF_NEEDED |
{$MANTIS.HTTP.PROXY} | Proxy for http requests. |
Name | Description | Type | Key and additional info |
---|---|---|---|
Get projects | Get projects from Mantis BT. |
HTTP agent | mantisbt.get.projects |
Name | Description | Type | Key and additional info |
---|---|---|---|
Projects discovery | Discovery rule for Mantis BT projects. |
Dependent item | mantisbt.projects.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Project [{#NAME}]: Get issues | Getting project issues. |
HTTP agent | mantisbt.get.issues[{#NAME}] |
Project [{#NAME}]: Total issues | Count of issues in project. |
Dependent item | mantis.project.total_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: New issues | Count of issues with 'new' status. |
Dependent item | mantis.project.status.new_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Resolved issues | Count of issues with 'resolved' status. |
Dependent item | mantis.project.status.resolved_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Closed issues | Count of issues with 'closed' status. |
Dependent item | mantis.project.status.closed_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Assigned issues | Count of issues with 'assigned' status. |
Dependent item | mantis.project.status.assigned_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Feedback issues | Count of issues with 'feedback' status. |
Dependent item | mantis.project.status.feedback_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Acknowledged issues | Count of issues with 'acknowledged' status. |
Dependent item | mantis.project.status.acknowledged_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Confirmed issues | Count of issues with 'confirmed' status. |
Dependent item | mantis.project.status.confirmed_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Open issues | Count of "open" resolution issues. |
Dependent item | mantis.project.resolution.open_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Fixed issues | Count of "fixed" resolution issues. |
Dependent item | mantis.project.resolution.fixed_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Reopened issues | Count of "reopened" resolution issues. |
Dependent item | mantis.project.resolution.reopened_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Unable to reproduce issues | Count of "unable to reproduce" resolution issues. |
Dependent item | mantis.project.resolution.unabletoreproduce_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Not fixable issues | Count of "not fixable" resolution issues. |
Dependent item | mantis.project.resolution.notfixableissues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Duplicate issues | Count of "duplicate" resolution issues. |
Dependent item | mantis.project.resolution.duplicate_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: No change required issues | Count of "no change required" resolution issues. |
Dependent item | mantis.project.resolution.nochangerequired_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Suspended issues | Count of "suspended" resolution issues. |
Dependent item | mantis.project.resolution.suspended_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Will not fix issues | Count of "wont fix" resolution issues. |
Dependent item | mantis.project.resolution.wontfixissues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Feature severity issues | Count of "feature" severity issues. |
Dependent item | mantis.project.severity.feature_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Trivial severity issues | Count of "trivial" severity issues. |
Dependent item | mantis.project.severity.trivial_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Text severity issues | Count of "text" severity issues. |
Dependent item | mantis.project.severity.text_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Tweak severity issues | Count of "tweak" severity issues. |
Dependent item | mantis.project.severity.tweak_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Minor severity issues | Count of "minor" severity issues. |
Dependent item | mantis.project.severity.minor_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Major severity issues | Count of "major" severity issues. |
Dependent item | mantis.project.severity.major_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Crash severity issues | Count of "crash" severity issues. |
Dependent item | mantis.project.severity.crash_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Block severity issues | Count of "block" severity issues. |
Dependent item | mantis.project.severity.block_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: None priority issues | Count of "none" priority issues. |
Dependent item | mantis.project.priority.none_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Low priority issues | Count of "low" priority issues. |
Dependent item | mantis.project.priority.low_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Normal priority issues | Count of "normal" priority issues. |
Dependent item | mantis.project.priority.normal_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: High priority issues | Count of "high" priority issues. |
Dependent item | mantis.project.priority.high_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Urgent priority issues | Count of "urgent" priority issues. |
Dependent item | mantis.project.priority.urgent_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Immediate priority issues | Count of "immediate" priority issues. |
Dependent item | mantis.project.priority.immediate_issues[{#NAME}] Preprocessing
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template to monitor Kubernetes state. It works without external scripts and uses the script item to make HTTP requests to the Kubernetes API.
Template Kubernetes cluster state by HTTP
- collects metrics by HTTP agent from kube-state-metrics endpoint and Kubernetes API.
Don't forget to change macros {$KUBE.API.URL} and {$KUBE.API.TOKEN}. Also, see the Macros section for a list of macros used to set trigger values.
NOTE. Some metrics may not be collected depending on your Kubernetes version and configuration.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Install the Zabbix Helm Chart in your Kubernetes cluster. Internal service metrics are collected from the kube-state-metrics endpoint.
The template needs to use authorization via an API token.
Set the {$KUBE.API.URL} macro in the format <scheme>://<host>:<port>.
Get the generated service account token using the command:
kubectl get secret zabbix-service-account -n monitoring -o jsonpath={.data.token} | base64 -d
Then set it to the macro {$KUBE.API.TOKEN}
.
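As a quick sanity check (a sketch, assuming the secret name and namespace shown above, and run from a host or pod that can reach the in-cluster API address; substitute your external API URL otherwise):
TOKEN=$(kubectl get secret zabbix-service-account -n monitoring -o jsonpath={.data.token} | base64 -d)
# "ok" from the readyz endpoint indicates the token is accepted by the API server
curl -sk -H "Authorization: Bearer $TOKEN" "https://kubernetes.default.svc.cluster.local:443/readyz"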
Set {$KUBE.STATE.ENDPOINT.NAME} to the kube-state-metrics endpoint name (see kubectl -n monitoring get ep). Default: zabbix-kube-state-metrics.
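To double-check which endpoint name to put into the macro (a sketch; the namespace and the exact name depend on how the Helm chart was installed):
# List endpoints in the monitoring namespace and look for the kube-state-metrics entry
kubectl -n monitoring get ep | grep -i state-metrics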
NOTE. If you wish to monitor the Controller Manager and Scheduler components, you might need to set their --bind-address option to an address that the Zabbix proxy can reach.
For example, for clusters created with kubeadm
it can be set in the following manifest files (changes will be applied immediately):
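For reference, on a default kubeadm installation these are the static Pod manifests on the control plane node (the paths assume the standard kubeadm layout; the kubelet re-creates the Pods automatically once the files are saved):
# Adjust the bind address option mentioned above in the "command" section of:
sudo vi /etc/kubernetes/manifests/kube-controller-manager.yaml
sudo vi /etc/kubernetes/manifests/kube-scheduler.yaml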
Depending on your Kubernetes distribution, you might need to adjust {$KUBE.CONTROL_PLANE.TAINT}
macro (for example, set it to node-role.kubernetes.io/master
for OpenShift).
NOTE. Some metrics may not be collected depending on your Kubernetes version and configuration.
Also, see the Macros section for a list of macros used to set trigger values.
Set up the macros to filter the metrics of discovered Kubelets by node names:
Set up macros to filter metrics by namespace:
Set up macros to filter node metrics by nodename:
Note: If you have a large cluster, it is highly recommended to set a filter for discoverable namespaces.
You can use the {$KUBE.KUBELET.FILTER.LABELS}
and {$KUBE.KUBELET.FILTER.ANNOTATIONS}
macros for advanced filtering of kubelets by node labels and annotations.
Notes about labels and annotations filters:
Values are given as comma-separated key: value pairs, with regular expression support in the value part (key1: value, key2: regexp).
Use ! to invert a filter (!key: value).
For example: kubernetes.io/hostname: kubernetes-node[5-25], !node-role.kubernetes.io/ingress: .*. As a result, the kubelets on nodes 5-25 without the "ingress" role will be discovered.
See the Kubernetes documentation for details about labels and annotations:
You can also set up evaluation periods for replica mismatch triggers (Deployments, ReplicaSets, StatefulSets) with the macro {$KUBE.REPLICA.MISMATCH.EVAL_PERIOD}
, which supports context and regular expressions. For example, you can create the following macros:
{$KUBE.REPLICA.MISMATCH.EVAL_PERIOD:"deployment:default:nginx-deployment"} = #3
{$KUBE.REPLICA.MISMATCH.EVAL_PERIOD:regex:"deployment:.*:.*"} = #10
or {$KUBE.REPLICA.MISMATCH.EVAL_PERIOD:regex:"^deployment.*"} = #10
{$KUBE.REPLICA.MISMATCH.EVAL_PERIOD:regex:".*:default:.*"} = 15m
Note that different context macros with regular expressions matching the same string can be applied in an undefined order, and simple context macros (without regular expressions) have higher priority. Read the Important notes section in Zabbix documentation
for details.
Name | Description | Default |
---|---|---|
{$KUBE.API.URL} | Kubernetes API endpoint URL in the format <scheme>://<host>:<port>. |
https://kubernetes.default.svc.cluster.local:443 |
{$KUBE.API.READYZ.ENDPOINT} | Kubernetes API readyz endpoint /readyz |
/readyz |
{$KUBE.API.LIVEZ.ENDPOINT} | Kubernetes API livez endpoint /livez |
/livez |
{$KUBE.API.COMPONENTSTATUSES.ENDPOINT} | Kubernetes API componentstatuses endpoint /api/v1/componentstatuses |
/api/v1/componentstatuses |
{$KUBE.API.TOKEN} | Service account bearer token. |
|
{$KUBE.HTTP.PROXY} | Sets the HTTP proxy to |
|
{$KUBE.STATE.ENDPOINT.NAME} | Kubernetes state endpoint name. |
zabbix-kube-state-metrics |
{$OPENSHIFT.STATE.ENDPOINT.NAME} | OpenShift state endpoint name. |
openshift-state-metrics |
{$KUBE.API_SERVER.SCHEME} | Kubernetes API servers metrics endpoint scheme. Used in ControlPlane LLD. |
https |
{$KUBE.API_SERVER.PORT} | Kubernetes API servers metrics endpoint port. Used in ControlPlane LLD. |
6443 |
{$KUBE.CONTROL_PLANE.TAINT} | Taint that applies to control plane nodes. Change if needed. Used in ControlPlane LLD. |
node-role.kubernetes.io/control-plane |
{$KUBE.CONTROLLER_MANAGER.SCHEME} | Kubernetes Controller manager metrics endpoint scheme. Used in ControlPlane LLD. |
https |
{$KUBE.CONTROLLER_MANAGER.PORT} | Kubernetes Controller manager metrics endpoint port. Used in ControlPlane LLD. |
10257 |
{$KUBE.SCHEDULER.SCHEME} | Kubernetes Scheduler metrics endpoint scheme. Used in ControlPlane LLD. |
https |
{$KUBE.SCHEDULER.PORT} | Kubernetes Scheduler metrics endpoint port. Used in ControlPlane LLD. |
10259 |
{$KUBE.KUBELET.SCHEME} | Kubernetes Kubelet metrics endpoint scheme. Used in Kubelet LLD. |
https |
{$KUBE.KUBELET.PORT} | Kubernetes Kubelet metrics endpoint port. Used in Kubelet LLD. |
10250 |
{$KUBE.LLD.FILTER.NAMESPACE.MATCHES} | Filter of discoverable metrics by namespace. |
.* |
{$KUBE.LLD.FILTER.NAMESPACE.NOT_MATCHES} | Filter to exclude discovered metrics by namespace. |
CHANGE_IF_NEEDED |
{$KUBE.LLD.FILTER.NODE.MATCHES} | Filter of discoverable nodes by nodename. |
.* |
{$KUBE.LLD.FILTER.NODE.NOT_MATCHES} | Filter to exclude discovered nodes by nodename. |
CHANGE_IF_NEEDED |
{$KUBE.LLD.FILTER.KUBELET_NODE.MATCHES} | Filter of discoverable Kubelets by nodename. |
.* |
{$KUBE.LLD.FILTER.KUBELET_NODE.NOT_MATCHES} | Filter to exclude discovered Kubelets by nodename. |
CHANGE_IF_NEEDED |
{$KUBE.KUBELET.FILTER.ANNOTATIONS} | Node annotations to filter Kubelets (regex in values are supported). See the template's README.md for details. |
|
{$KUBE.KUBELET.FILTER.LABELS} | Node labels to filter Kubelets (regex in values are supported). See the template's README.md for details. |
|
{$KUBE.LLD.FILTER.PV.MATCHES} | Filter of discoverable persistent volumes by name. |
.* |
{$KUBE.LLD.FILTER.PV.NOT_MATCHES} | Filter to exclude discovered persistent volumes by name. |
CHANGE_IF_NEEDED |
{$KUBE.REPLICA.MISMATCH.EVAL_PERIOD} | The evaluation period range which is used for calculation of expressions in trigger prototypes (time period or value range). Can be used with context. |
#5 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Get state metrics | Collecting Kubernetes metrics from kube-state-metrics. |
Script | kube.state.metrics |
Control plane LLD | Generation of data for Control plane discovery rules. |
Script | kube.control_plane.lld Preprocessing
|
Node LLD | Generation of data for Kubelet discovery rules. |
Script | kube.node.lld Preprocessing
|
Get component statuses | HTTP agent | kube.componentstatuses Preprocessing
|
|
Get readyz | HTTP agent | kube.readyz Preprocessing
|
|
Get livez | HTTP agent | kube.livez Preprocessing
|
|
Namespace count | The number of namespaces. |
Dependent item | kube.namespace.count Preprocessing
|
CronJob count | Number of cronjobs. |
Dependent item | kube.cronjob.count Preprocessing
|
Job count | Number of jobs (generated by cronjob + job). |
Dependent item | kube.job.count Preprocessing
|
Endpoint count | Number of endpoints. |
Dependent item | kube.endpoint.count Preprocessing
|
Deployment count | The number of deployments. |
Dependent item | kube.deployment.count Preprocessing
|
Service count | The number of services. |
Dependent item | kube.service.count Preprocessing
|
StatefulSet count | The number of statefulsets. |
Dependent item | kube.statefulset.count Preprocessing
|
Node count | The number of nodes. |
Dependent item | kube.node.count Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
API servers discovery | Dependent item | kube.api_servers.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Controller manager nodes discovery | Dependent item | kube.controller_manager.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Scheduler servers nodes discovery | Dependent item | kube.scheduler.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubelet discovery | Dependent item | kube.kubelet.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Daemonset discovery | Dependent item | kube.daemonset.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Namespace [{#NAMESPACE}] Daemonset [{#NAME}]: Ready | The number of nodes that should be running the daemon pod and have one or more running and ready. |
Dependent item | kube.daemonset.ready[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] Daemonset [{#NAME}]: Scheduled | The number of nodes that run at least one daemon pod and are supposed to. |
Dependent item | kube.daemonset.scheduled[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] Daemonset [{#NAME}]: Desired | The number of nodes that should be running the daemon pod. |
Dependent item | kube.daemonset.desired[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] Daemonset [{#NAME}]: Misscheduled | The number of nodes that run a daemon pod but are not supposed to. |
Dependent item | kube.daemonset.misscheduled[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] Daemonset [{#NAME}]: Updated number scheduled | The total number of nodes that are running updated daemon pod. |
Dependent item | kube.daemonset.updated[{#NAMESPACE}/{#NAME}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
PVC discovery | Dependent item | kube.pvc.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Namespace [{#NAMESPACE}] PVC [{#NAME}] Status phase | The current status phase of the persistent volume claim. |
Dependent item | kube.pvc.status_phase[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] PVC [{#NAME}] Requested storage | The capacity of storage requested by the persistent volume claim. |
Dependent item | kube.pvc.requested.storage[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] PVC status phase: Bound, sum | The total amount of persistent volume claims in the Bound phase. |
Dependent item | kube.pvc.status_phase.bound.sum[{#NAMESPACE}] Preprocessing
|
Namespace [{#NAMESPACE}] PVC status phase: Lost, sum | The total amount of persistent volume claims in the Lost phase. |
Dependent item | kube.pvc.status_phase.lost.sum[{#NAMESPACE}] Preprocessing
|
Namespace [{#NAMESPACE}] PVC status phase: Pending, sum | The total amount of persistent volume claims in the Pending phase. |
Dependent item | kube.pvc.status_phase.pending.sum[{#NAMESPACE}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Kubernetes cluster state: NS [{#NAMESPACE}] PVC [{#NAME}]: PVC is pending | count(/Kubernetes cluster state by HTTP/kube.pvc.status_phase[{#NAMESPACE}/{#NAME}],2m,,5)>=2 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
PV discovery | Dependent item | kube.pv.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
PV [{#NAME}] Status phase | The current status phase of the persistent volume. |
Dependent item | kube.pv.status_phase[{#NAME}] Preprocessing
|
PV [{#NAME}] Capacity bytes | A capacity of the persistent volume in bytes. |
Dependent item | kube.pv.capacity.bytes[{#NAME}] Preprocessing
|
PV status phase: Pending, sum | The total amount of persistent volumes in the Pending phase. |
Dependent item | kube.pv.status_phase.pending.sum[{#SINGLETON}] Preprocessing
|
PV status phase: Available, sum | The total amount of persistent volumes in the Available phase. |
Dependent item | kube.pv.status_phase.available.sum[{#SINGLETON}] Preprocessing
|
PV status phase: Bound, sum | The total amount of persistent volumes in the Bound phase. |
Dependent item | kube.pv.status_phase.bound.sum[{#SINGLETON}] Preprocessing
|
PV status phase: Released, sum | The total amount of persistent volumes in the Released phase. |
Dependent item | kube.pv.status_phase.released.sum[{#SINGLETON}] Preprocessing
|
PV status phase: Failed, sum | The total amount of persistent volumes in the Failed phase. |
Dependent item | kube.pv.status_phase.failed.sum[{#SINGLETON}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Kubernetes cluster state: PV [{#NAME}]: PV has failed | count(/Kubernetes cluster state by HTTP/kube.pv.status_phase[{#NAME}],2m,,3)>=2 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Deployment discovery | Dependent item | kube.deployment.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Namespace [{#NAMESPACE}] Deployment [{#NAME}]: Paused | Whether the deployment is paused and will not be processed by the deployment controller. |
Dependent item | kube.deployment.spec_paused[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] Deployment [{#NAME}]: Replicas desired | Number of desired pods for a deployment. |
Dependent item | kube.deployment.replicas_desired[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] Deployment [{#NAME}]: Rollingupdate max unavailable | Maximum number of unavailable replicas during a rolling update of a deployment. |
Dependent item | kube.deployment.rollingupdate.max_unavailable[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] Deployment [{#NAME}]: Replicas | The number of replicas per deployment. |
Dependent item | kube.deployment.replicas[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] Deployment [{#NAME}]: Replicas available | The number of available replicas per deployment. |
Dependent item | kube.deployment.replicas_available[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] Deployment [{#NAME}]: Replicas unavailable | The number of unavailable replicas per deployment. |
Dependent item | kube.deployment.replicas_unavailable[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] Deployment [{#NAME}]: Replicas updated | The number of updated replicas per deployment. |
Dependent item | kube.deployment.replicas_updated[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] Deployment [{#NAME}]: Replicas mismatched | The number of available replicas not matching the desired number of replicas. |
Dependent item | kube.deployment.replicas_mismatched[{#NAMESPACE}/{#NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Kubernetes cluster state: Namespace [{#NAMESPACE}] Deployment [{#NAME}]: Deployment replicas mismatch | Deployment has not matched the expected number of replicas during the specified trigger evaluation period. |
min(/Kubernetes cluster state by HTTP/kube.deployment.replicas_mismatched[{#NAMESPACE}/{#NAME}],{$KUBE.REPLICA.MISMATCH.EVAL_PERIOD:"deployment:{#NAMESPACE}:{#NAME}"})>0 and last(/Kubernetes cluster state by HTTP/kube.deployment.replicas_desired[{#NAMESPACE}/{#NAME}])>=0 and last(/Kubernetes cluster state by HTTP/kube.deployment.replicas_available[{#NAMESPACE}/{#NAME}])>=0 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Endpoint discovery | Dependent item | kube.endpoint.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Namespace [{#NAMESPACE}] Endpoint [{#NAME}]: Address available | Number of addresses available in endpoint. |
Dependent item | kube.endpoint.address_available[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] Endpoint [{#NAME}]: Address not ready | Number of addresses not ready in endpoint. |
Dependent item | kube.endpoint.address_not_ready[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] Endpoint [{#NAME}]: Age | Endpoint age (number of seconds since creation). |
Dependent item | kube.endpoint.age[{#NAMESPACE}/{#NAME}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Node discovery | Dependent item | kube.node.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Node [{#NAME}]: CPU allocatable | The CPU resources of a node that are available for scheduling. |
Dependent item | kube.node.cpu_allocatable[{#NAME}] Preprocessing
|
Node [{#NAME}]: Memory allocatable | The memory resources of a node that are available for scheduling. |
Dependent item | kube.node.memory_allocatable[{#NAME}] Preprocessing
|
Node [{#NAME}]: Pods allocatable | The pods resources of a node that are available for scheduling. |
Dependent item | kube.node.pods_allocatable[{#NAME}] Preprocessing
|
Node [{#NAME}]: Ephemeral storage allocatable | The allocatable ephemeral storage of a node that is available for scheduling. |
Dependent item | kube.node.ephemeral_storage_allocatable[{#NAME}] Preprocessing
|
Node [{#NAME}]: CPU capacity | The capacity for CPU resources of a node. |
Dependent item | kube.node.cpu_capacity[{#NAME}] Preprocessing
|
Node [{#NAME}]: Memory capacity | The capacity for memory resources of a node. |
Dependent item | kube.node.memory_capacity[{#NAME}] Preprocessing
|
Node [{#NAME}]: Ephemeral storage capacity | The ephemeral storage capacity of a node. |
Dependent item | kube.node.ephemeral_storage_capacity[{#NAME}] Preprocessing
|
Node [{#NAME}]: Pods capacity | The capacity for pods resources of a node. |
Dependent item | kube.node.pods_capacity[{#NAME}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Pod discovery | Dependent item | kube.pod.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Namespace [{#NAMESPACE}] Pod [{#NAME}] Phase: Pending | Pod is in pending state. |
Dependent item | kube.pod.phase.pending[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] Pod [{#NAME}] Phase: Succeeded | Pod is in succeeded state. |
Dependent item | kube.pod.phase.succeeded[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] Pod [{#NAME}] Phase: Failed | Pod is in failed state. |
Dependent item | kube.pod.phase.failed[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] Pod [{#NAME}] Phase: Unknown | Pod is in unknown state. |
Dependent item | kube.pod.phase.unknown[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] Pod [{#NAME}] Phase: Running | Pod is in running state. |
Dependent item | kube.pod.phase.running[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] Pod [{#NAME}]: Containers terminated | Describes whether the container is currently in terminated state. |
Dependent item | kube.pod.containers_terminated[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] Pod [{#NAME}]: Containers waiting | Describes whether the container is currently in waiting state. |
Dependent item | kube.pod.containers_waiting[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] Pod [{#NAME}]: Containers ready | Describes whether the containers readiness check succeeded. |
Dependent item | kube.pod.containers_ready[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] Pod [{#NAME}]: Containers restarts | The number of container restarts. |
Dependent item | kube.pod.containers_restarts[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] Pod [{#NAME}]: Containers running | Describes whether the container is currently in running state. |
Dependent item | kube.pod.containers_running[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] Pod [{#NAME}]: Ready | Describes whether the pod is ready to serve requests. |
Dependent item | kube.pod.ready[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] Pod [{#NAME}]: Scheduled | Describes the status of the scheduling process for the pod. |
Dependent item | kube.pod.scheduled[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] Pod [{#NAME}]: Unschedulable | Describes the unschedulable status for the pod. |
Dependent item | kube.pod.unschedulable[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] Pod [{#NAME}]: Containers CPU limits | The limit on CPU cores to be used by a container. |
Dependent item | kube.pod.containers.limits.cpu[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] Pod [{#NAME}]: Containers memory limits | The limit on memory to be used by a container. |
Dependent item | kube.pod.containers.limits.memory[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] Pod [{#NAME}]: Containers CPU requests | The number of requested CPU cores by a container. |
Dependent item | kube.pod.containers.requests.cpu[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] Pod [{#NAME}]: Containers memory requests | The number of requested memory bytes by a container. |
Dependent item | kube.pod.containers.requests.memory[{#NAMESPACE}/{#NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Kubernetes cluster state: Namespace [{#NAMESPACE}] Pod [{#NAME}]: Pod is not healthy | min(/Kubernetes cluster state by HTTP/kube.pod.phase.failed[{#NAMESPACE}/{#NAME}],10m)>0 or min(/Kubernetes cluster state by HTTP/kube.pod.phase.pending[{#NAMESPACE}/{#NAME}],10m)>0 or min(/Kubernetes cluster state by HTTP/kube.pod.phase.unknown[{#NAMESPACE}/{#NAME}],10m)>0 |High |
|||
Kubernetes cluster state: Namespace [{#NAMESPACE}] Pod [{#NAME}]: Pod is crash looping | Containers of the pod keep restarting. This most likely indicates that the pod is in the CrashLoopBackOff state. |
(last(/Kubernetes cluster state by HTTP/kube.pod.containers_restarts[{#NAMESPACE}/{#NAME}])-min(/Kubernetes cluster state by HTTP/kube.pod.containers_restarts[{#NAMESPACE}/{#NAME}],15m))>1 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
ReplicaSet discovery | Dependent item | kube.replicaset.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Namespace [{#NAMESPACE}] ReplicaSet [{#NAME}]: Replicas | The number of replicas per ReplicaSet. |
Dependent item | kube.replicaset.replicas[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] ReplicaSet [{#NAME}]: Desired replicas | Number of desired pods for a ReplicaSet. |
Dependent item | kube.replicaset.replicas_desired[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] ReplicaSet [{#NAME}]: Fully labeled replicas | The number of fully labeled replicas per ReplicaSet. |
Dependent item | kube.replicaset.fully_labeled_replicas[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] ReplicaSet [{#NAME}]: Ready | The number of ready replicas per ReplicaSet. |
Dependent item | kube.replicaset.ready[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] ReplicaSet [{#NAME}]: Replicas mismatched | The number of ready replicas not matching the desired number of replicas. |
Dependent item | kube.replicaset.replicas_mismatched[{#NAMESPACE}/{#NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Kubernetes cluster state: Namespace [{#NAMESPACE}] RS [{#NAME}]: ReplicaSet mismatch | ReplicaSet has not matched the expected number of replicas during the specified trigger evaluation period. |
min(/Kubernetes cluster state by HTTP/kube.replicaset.replicas_mismatched[{#NAMESPACE}/{#NAME}],{$KUBE.REPLICA.MISMATCH.EVAL_PERIOD:"replicaset:{#NAMESPACE}:{#NAME}"})>0 and last(/Kubernetes cluster state by HTTP/kube.replicaset.replicas_desired[{#NAMESPACE}/{#NAME}])>=0 and last(/Kubernetes cluster state by HTTP/kube.replicaset.ready[{#NAMESPACE}/{#NAME}])>=0 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
StatefulSet discovery | Dependent item | kube.statefulset.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Namespace [{#NAMESPACE}] StatefulSet [{#NAME}]: Replicas | The number of replicas per StatefulSet. |
Dependent item | kube.statefulset.replicas[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] StatefulSet [{#NAME}]: Desired replicas | Number of desired pods for a StatefulSet. |
Dependent item | kube.statefulset.replicas_desired[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] StatefulSet [{#NAME}]: Current replicas | The number of current replicas per StatefulSet. |
Dependent item | kube.statefulset.replicas_current[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] StatefulSet [{#NAME}]: Ready replicas | The number of ready replicas per StatefulSet. |
Dependent item | kube.statefulset.replicas_ready[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] StatefulSet [{#NAME}]: Updated replicas | The number of updated replicas per StatefulSet. |
Dependent item | kube.statefulset.replicas_updated[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] StatefulSet [{#NAME}]: Replicas mismatched | The number of ready replicas not matching the number of replicas. |
Dependent item | kube.statefulset.replicas_mismatched[{#NAMESPACE}/{#NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Kubernetes cluster state: Namespace [{#NAMESPACE}] StatefulSet [{#NAME}]: StatefulSet is down | (last(/Kubernetes cluster state by HTTP/kube.statefulset.replicas_ready[{#NAMESPACE}/{#NAME}]) / last(/Kubernetes cluster state by HTTP/kube.statefulset.replicas_current[{#NAMESPACE}/{#NAME}]))<>1 |High |
|||
Kubernetes cluster state: Namespace [{#NAMESPACE}] StatefulSet [{#NAME}]: StatefulSet replicas mismatch | StatefulSet has not matched the number of replicas during the specified trigger evaluation period. |
min(/Kubernetes cluster state by HTTP/kube.statefulset.replicas_mismatched[{#NAMESPACE}/{#NAME}],{$KUBE.REPLICA.MISMATCH.EVAL_PERIOD:"statefulset:{#NAMESPACE}:{#NAME}"})>0 and last(/Kubernetes cluster state by HTTP/kube.statefulset.replicas[{#NAMESPACE}/{#NAME}])>=0 and last(/Kubernetes cluster state by HTTP/kube.statefulset.replicas_ready[{#NAMESPACE}/{#NAME}])>=0 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
PodDisruptionBudget discovery | Dependent item | kube.pdb.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Namespace [{#NAMESPACE}] PodDisruptionBudget [{#NAME}]: Pods healthy | Current number of healthy pods. |
Dependent item | kube.pdb.pods_healthy[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] PodDisruptionBudget [{#NAME}]: Pods desired | Minimum desired number of healthy pods. |
Dependent item | kube.pdb.pods_desired[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] PodDisruptionBudget [{#NAME}]: Disruptions allowed | Number of pod disruptions that are allowed. |
Dependent item | kube.pdb.disruptions_allowed[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] PodDisruptionBudget [{#NAME}]: Pods total | Total number of pods counted by this disruption budget. |
Dependent item | kube.pdb.pods_total[{#NAMESPACE}/{#NAME}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
CronJob discovery | Dependent item | kube.cronjob.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Namespace [{#NAMESPACE}] CronJob [{#NAME}]: Suspend | Suspend flag tells the controller to suspend subsequent executions. |
Dependent item | kube.cronjob.spec_suspend[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] CronJob [{#NAME}]: Active | Active holds pointers to currently running jobs. |
Dependent item | kube.cronjob.status_active[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] CronJob [{#NAME}]: Last schedule | LastScheduleTime keeps information about the last time the job was successfully scheduled. |
Dependent item | kube.cronjob.last_schedule_time[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] CronJob [{#NAME}]: Next schedule | Next time the cronjob should be scheduled. The time after lastScheduleTime or after the cron job's creation time if it's never been scheduled. Use this to determine if the job is delayed. |
Dependent item | kube.cronjob.next_schedule_time[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] CronJob [{#NAME}]: Failed | The number of pods which reached the Failed phase and the reason for failure. |
Dependent item | kube.cronjob.status_failed[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] CronJob [{#NAME}]: Succeeded | The number of pods which reached the Succeeded phase. |
Dependent item | kube.cronjob.status_succeeded[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] CronJob [{#NAME}]: Completion succeeded | Number of jobs the execution of which has been completed. |
Dependent item | kube.cronjob.completion.succeeded[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] CronJob [{#NAME}]: Completion failed | Number of jobs the execution of which has failed. |
Dependent item | kube.cronjob.completion.failed[{#NAMESPACE}/{#NAME}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Job discovery | Dependent item | kube.job.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Namespace [{#NAMESPACE}] Job [{#NAME}]: Failed | The number of pods which reached the Failed phase and the reason for failure. |
Dependent item | kube.job.status_failed[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] Job [{#NAME}]: Succeeded | The number of pods which reached the Succeeded phase. |
Dependent item | kube.job.status_succeeded[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] Job [{#NAME}]: Completion succeeded | Number of jobs the execution of which has been completed. |
Dependent item | kube.job.completion.succeeded[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] Job [{#NAME}]: Completion failed | Number of jobs the execution of which has failed. |
Dependent item | kube.job.completion.failed[{#NAMESPACE}/{#NAME}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Component statuses discovery | Dependent item | kube.componentstatuses.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Component [{#NAME}]: Healthy | Cluster component healthy. |
Dependent item | kube.componentstatuses.healthy[{#NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Kubernetes cluster state: Component [{#NAME}] is unhealthy | count(/Kubernetes cluster state by HTTP/kube.componentstatuses.healthy[{#NAME}],#2,"ne","True")=2 and length(last(/Kubernetes cluster state by HTTP/kube.componentstatuses.healthy[{#NAME}]))>0 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Readyz discovery | Dependent item | kube.readyz.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Readyz [{#NAME}]: Healthcheck | Result of readyz healthcheck for component. |
Dependent item | kube.readyz.healthcheck[{#NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Kubernetes cluster state: Readyz [{#NAME}] is unhealthy | count(/Kubernetes cluster state by HTTP/kube.readyz.healthcheck[{#NAME}],#2,"ne","ok")=2 and length(last(/Kubernetes cluster state by HTTP/kube.readyz.healthcheck[{#NAME}]))>0 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Livez discovery | Dependent item | kube.livez.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Livez [{#NAME}]: Healthcheck | Result of livez healthcheck for component. |
Dependent item | kube.livez.healthcheck[{#NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Kubernetes cluster state: Livez [{#NAME}] is unhealthy | count(/Kubernetes cluster state by HTTP/kube.livez.healthcheck[{#NAME}],#2,"ne","ok")=2 and length(last(/Kubernetes cluster state by HTTP/kube.livez.healthcheck[{#NAME}]))>0 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
OpenShift BuildConfig discovery | Dependent item | openshift.buildconfig.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Namespace [{#NAMESPACE}] BuildConfig [{#NAME}]: Created | OpenShift BuildConfig Unix creation timestamp. |
Dependent item | openshift.buildconfig.created.time[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] BuildConfig [{#NAME}]: Generation | Sequence number representing a specific generation of the desired state. |
Dependent item | openshift.buildconfig.generation[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] BuildConfig [{#NAME}]: Latest version | The latest version of BuildConfig. |
Dependent item | openshift.buildconfig.status[{#NAMESPACE}/{#NAME}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
OpenShift Build discovery | Dependent item | openshift.build.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Namespace [{#NAMESPACE}] Build [{#NAME}]: Created | OpenShift Build Unix creation timestamp. |
Dependent item | openshift.build.created.time[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] Build [{#NAME}]: Generation | Sequence number representing a specific generation of the desired state. |
Dependent item | openshift.build.sequence.number[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] Build [{#NAME}]: Status phase | The Build phase. |
Dependent item | openshift.build.status_phase[{#NAMESPACE}/{#NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Kubernetes cluster state: Build [{#NAME}]: Build has failed | count(/Kubernetes cluster state by HTTP/openshift.build.status_phase[{#NAMESPACE}/{#NAME}],2m,"ge",6)>=2 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
OpenShift ClusterResourceQuota discovery | Dependent item | openshift.cluster.resource.quota.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Quota [{#NAME}] Resource [{#RESOURCE}]: Type [{#TYPE}] | Usage information about the resource quota. |
Dependent item | openshift.cluster.resource.quota[{#RESOURCE}/{#NAME}/{#TYPE}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
OpenShift Route discovery | Dependent item | openshift.route.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Namespace [{#NAMESPACE}] Route [{#NAME}]: Created | OpenShift Route Unix creation timestamp. |
Dependent item | openshift.route.created.time[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] Route [{#NAME}]: Status | Information about route status. |
Dependent item | openshift.route.status[{#NAMESPACE}/{#NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Kubernetes cluster state: Route [{#NAME}] with issue: Status is false | count(/Kubernetes cluster state by HTTP/openshift.route.status[{#NAMESPACE}/{#NAME}],2m,,0)>=2 |Warning |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template to monitor Kubernetes Scheduler by Zabbix that works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
Template Kubernetes Scheduler by HTTP
- collects metrics by HTTP agent from Scheduler /metrics endpoint.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Internal service metrics are collected from the /metrics endpoint. The template needs to use authorization via an API token.
Don't forget to change the macros {$KUBE.SCHEDULER.SERVER.URL} and {$KUBE.API.TOKEN}. Also, see the Macros section for a list of macros used to set trigger values.
NOTE. You might need to set the Scheduler's --bind-address option to an address that the Zabbix proxy can reach.
For example, for clusters created with kubeadm
it can be set in the following manifest file (changes will be applied immediately):
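For reference, on a default kubeadm installation the Scheduler runs as a static Pod (the path assumes the standard kubeadm layout; the kubelet restarts the Pod automatically after the file is saved):
# Adjust the bind address option mentioned above in the "command" section of:
sudo vi /etc/kubernetes/manifests/kube-scheduler.yaml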
NOTE. Some metrics may not be collected depending on your Kubernetes Scheduler instance version and configuration.
Name | Description | Default |
---|---|---|
{$KUBE.SCHEDULER.SERVER.URL} | Kubernetes Scheduler metrics endpoint URL. |
https://localhost:10259/metrics |
{$KUBE.API.TOKEN} | API Authorization Token. |
|
{$KUBE.SCHEDULER.HTTP.CLIENT.ERROR} | Maximum number of HTTP client requests failures used for trigger. |
2 |
{$KUBE.SCHEDULER.UNSCHEDULABLE} | Maximum number of scheduling failures with 'unschedulable' used for trigger. |
2 |
{$KUBE.SCHEDULER.ERROR} | Maximum number of scheduling failures with 'error' used for trigger. |
2 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Get Scheduler metrics | Get raw metrics from Scheduler instance /metrics endpoint. |
HTTP agent | kubernetes.scheduler.get_metrics Preprocessing
|
Virtual memory, bytes | Virtual memory size in bytes. |
Dependent item | kubernetes.scheduler.processvirtualmemory_bytes Preprocessing
|
Resident memory, bytes | Resident memory size in bytes. |
Dependent item | kubernetes.scheduler.processresidentmemory_bytes Preprocessing
|
CPU | Total user and system CPU usage ratio. |
Dependent item | kubernetes.scheduler.cpu.util Preprocessing
|
Goroutines | Number of goroutines that currently exist. |
Dependent item | kubernetes.scheduler.go_goroutines Preprocessing
|
Go threads | Number of OS threads created. |
Dependent item | kubernetes.scheduler.go_threads Preprocessing
|
Fds open | Number of open file descriptors. |
Dependent item | kubernetes.scheduler.open_fds Preprocessing
|
Fds max | Maximum allowed open file descriptors. |
Dependent item | kubernetes.scheduler.max_fds Preprocessing
|
REST Client requests: 2xx, rate | Number of HTTP requests with 2xx status code per second. |
Dependent item | kubernetes.scheduler.client_http_requests_200.rate Preprocessing
|
REST Client requests: 3xx, rate | Number of HTTP requests with 3xx status code per second. |
Dependent item | kubernetes.scheduler.client_http_requests_300.rate Preprocessing
|
REST Client requests: 4xx, rate | Number of HTTP requests with 4xx status code per second. |
Dependent item | kubernetes.scheduler.client_http_requests_400.rate Preprocessing
|
REST Client requests: 5xx, rate | Number of HTTP requests with 5xx status code per second. |
Dependent item | kubernetes.scheduler.client_http_requests_500.rate Preprocessing
|
Schedule attempts: scheduled | Number of attempts to schedule pods with result "scheduled" per second. |
Dependent item | kubernetes.scheduler.scheduler_schedule_attempts.scheduled.rate Preprocessing
|
Schedule attempts: unschedulable | Number of attempts to schedule pods with result "unschedulable" per second. |
Dependent item | kubernetes.scheduler.scheduler_schedule_attempts.unschedulable.rate Preprocessing
|
Schedule attempts: error | Number of attempts to schedule pods with result "error" per second. |
Dependent item | kubernetes.scheduler.scheduler_schedule_attempts.error.rate Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Kubernetes Scheduler: Too many REST Client errors | Kubernetes Scheduler REST Client requests are experiencing a high error rate (with 5xx HTTP code). |
min(/Kubernetes Scheduler by HTTP/kubernetes.scheduler.client_http_requests_500.rate,5m)>{$KUBE.SCHEDULER.HTTP.CLIENT.ERROR} |Warning |
||
Kubernetes Scheduler: Too many unschedulable pods | Number of attempts to schedule pods with 'unschedulable' result is too high. 'unschedulable' means a pod could not be scheduled. |
min(/Kubernetes Scheduler by HTTP/kubernetes.scheduler.scheduler_schedule_attempts.unschedulable.rate,5m)>{$KUBE.SCHEDULER.UNSCHEDULABLE} |Warning |
||
Kubernetes Scheduler: Too many schedule attempts with errors | Number of attempts to schedule pods with 'error' result is too high. 'error' means an internal scheduler problem. |
min(/Kubernetes Scheduler by HTTP/kubernetes.scheduler.scheduler_schedule_attempts.error.rate,5m)>{$KUBE.SCHEDULER.ERROR} |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Scheduling algorithm histogram | Discovery raw data of scheduling algorithm latency. |
Dependent item | kubernetes.scheduler.scheduling_algorithm.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Scheduling algorithm duration bucket, {#LE} | Scheduling algorithm latency in seconds. |
Dependent item | kubernetes.scheduler.schedulingalgorithmduration[{#LE}] Preprocessing
|
Scheduling algorithm duration, p90 | 90 percentile of scheduling algorithm latency in seconds. |
Calculated | kubernetes.scheduler.schedulingalgorithmduration_p90[{#SINGLETON}] |
Scheduling algorithm duration, p95 | 95 percentile of scheduling algorithm latency in seconds. |
Calculated | kubernetes.scheduler.schedulingalgorithmduration_p95[{#SINGLETON}] |
Scheduling algorithm duration, p99 | 99 percentile of scheduling algorithm latency in seconds. |
Calculated | kubernetes.scheduler.schedulingalgorithmduration_p99[{#SINGLETON}] |
Scheduling algorithm duration, p50 | 50 percentile of scheduling algorithm latency in seconds. |
Calculated | kubernetes.scheduler.schedulingalgorithmduration_p50[{#SINGLETON}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Binding histogram | Discovery raw data of binding latency. |
Dependent item | kubernetes.scheduler.binding.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Binding duration bucket, {#LE} | Binding latency in seconds. |
Dependent item | kubernetes.scheduler.binding_duration[{#LE}] Preprocessing
|
Binding duration, p90 | 90 percentile of binding latency in seconds. |
Calculated | kubernetes.scheduler.bindingdurationp90[{#SINGLETON}] |
Binding duration, p95 | 95 percentile of binding latency in seconds. |
Calculated | kubernetes.scheduler.bindingdurationp95[{#SINGLETON}] |
Binding duration, p99 | 99 percentile of binding latency in seconds. |
Calculated | kubernetes.scheduler.bindingdurationp99[{#SINGLETON}] |
Binding duration, p50 | 50 percentile of binding latency in seconds. |
Calculated | kubernetes.scheduler.bindingdurationp50[{#SINGLETON}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
e2e scheduling histogram | Discovery raw data and percentile items of e2e scheduling latency. |
Dependent item | kubernetes.controller.e2e_scheduling.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
["{#RESULT}"]: e2e scheduling seconds bucket, {#LE} | E2e scheduling latency in seconds (scheduling algorithm + binding) |
Dependent item | kubernetes.scheduler.e2eschedulingbucket[{#LE},"{#RESULT}"] Preprocessing
|
["{#RESULT}"]: e2e scheduling, p50 | 50 percentile of e2e scheduling latency. |
Calculated | kubernetes.scheduler.e2eschedulingp50["{#RESULT}"] |
["{#RESULT}"]: e2e scheduling, p90 | 90 percentile of e2e scheduling latency. |
Calculated | kubernetes.scheduler.e2eschedulingp90["{#RESULT}"] |
["{#RESULT}"]: e2e scheduling, p95 | 95 percentile of e2e scheduling latency. |
Calculated | kubernetes.scheduler.e2eschedulingp95["{#RESULT}"] |
["{#RESULT}"]: e2e scheduling, p99 | 95 percentile of e2e scheduling latency. |
Calculated | kubernetes.scheduler.e2eschedulingp99["{#RESULT}"] |
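The pNN items above are Calculated items built from the per-bucket Dependent items. A minimal sketch of such a formula, assuming Zabbix's histogram_quantile() and bucket_rate_foreach() functions and a wildcarded bucket key (the exact key parameters used by the template may differ):

```
histogram_quantile(0.90, bucket_rate_foreach(//kubernetes.scheduler.e2e_scheduling_bucket[*,"{#RESULT}"], 5m))
```

This estimates the 90th percentile of e2e scheduling latency from the Prometheus-style histogram buckets collected over the last 5 minutes.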
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template to monitor Kubernetes nodes that works without any external scripts. It uses the script item to make HTTP requests to the Kubernetes API. Install the Zabbix Helm Chart (https://git.zabbix.com/projects/ZT/repos/kubernetes-helm/browse?at=refs%2Fheads%2Frelease%2F7.0) in your Kubernetes cluster.
Change the values according to the environment in the file $HOME/zabbix_values.yaml.
For example:
Set the {$KUBE.API.URL} macro, such as <scheme>://<host>:<port>.
Get the generated service account token using the command:
kubectl get secret zabbix-service-account -n monitoring -o jsonpath={.data.token} | base64 -d
Then set it to the {$KUBE.API.TOKEN} macro.
Set up the macros to filter the metrics of discovered nodes
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Install the Zabbix Helm Chart in your Kubernetes cluster.
Set the {$KUBE.API.URL} macro, such as <scheme>://<host>:<port>.
Get the generated service account token using the command:
kubectl get secret zabbix-service-account -n monitoring -o jsonpath={.data.token} | base64 -d
Then set it to the {$KUBE.API.TOKEN} macro.
Set {$KUBE.NODES.ENDPOINT.NAME} with the Zabbix agent's endpoint name. See kubectl -n monitoring get ep. Default: zabbix-zabbix-helm-chrt-agent.
Set up the macros to filter the metrics of discovered nodes and host creation based on host prototypes:
Set up macros to filter pod metrics by namespace:
Note: if you have a large cluster, it is highly recommended to set a filter for discoverable pods.
You can use the {$KUBE.NODE.FILTER.LABELS}
, {$KUBE.POD.FILTER.LABELS}
, {$KUBE.NODE.FILTER.ANNOTATIONS}
and {$KUBE.POD.FILTER.ANNOTATIONS}
macros for advanced filtering of nodes and pods by labels and annotations.
Notes about the labels and annotations filters:
Macro values should be specified as comma-separated key: value pairs, with regular expressions supported in the value (key1: value, key2: regexp).
Use the exclamation point (!) to invert a filter (!key: value).
For example: kubernetes.io/hostname: kubernetes-node[5-25], !node-role.kubernetes.io/ingress: .*. As a result, the nodes 5-25 without the "ingress" role will be discovered.
See the Kubernetes documentation for details about labels and annotations:
Note, the discovered nodes will be created as separate hosts in Zabbix with the Linux template automatically assigned to them.
Name | Description | Default |
---|---|---|
{$KUBE.API.URL} | Kubernetes API endpoint URL in the format <scheme>://<host>:<port>. |
https://kubernetes.default.svc.cluster.local:443 |
{$KUBE.API.TOKEN} | Service account bearer token. |
|
{$KUBE.HTTP.PROXY} | Sets the HTTP proxy value. If this macro is empty, then no proxy is used. |
|
{$KUBE.NODES.ENDPOINT.NAME} | Kubernetes nodes endpoint name. See "kubectl -n monitoring get ep". |
zabbix-zabbix-helm-chrt-agent |
{$KUBE.LLD.FILTER.NODE.MATCHES} | Filter of discoverable nodes. |
.* |
{$KUBE.LLD.FILTER.NODE.NOT_MATCHES} | Filter to exclude discovered nodes. |
CHANGE_IF_NEEDED |
{$KUBE.LLD.FILTER.NODE.ROLE.MATCHES} | Filter of discoverable nodes by role. |
.* |
{$KUBE.LLD.FILTER.NODE.ROLE.NOT_MATCHES} | Filter to exclude discovered node by role. |
CHANGE_IF_NEEDED |
{$KUBE.NODE.FILTER.ANNOTATIONS} | Annotations to filter nodes (regex in values are supported). See the template's README.md for details. |
|
{$KUBE.NODE.FILTER.LABELS} | Labels to filter nodes (regex in values are supported). See the template's README.md for details. |
|
{$KUBE.POD.FILTER.ANNOTATIONS} | Annotations to filter pods (regex in values are supported). See the template's README.md for details. |
|
{$KUBE.POD.FILTER.LABELS} | Labels to filter Pods (regex in values are supported). See the template's README.md for details. |
|
{$KUBE.LLD.FILTER.POD.NAMESPACE.MATCHES} | Filter of discoverable pods by namespace. |
.* |
{$KUBE.LLD.FILTER.POD.NAMESPACE.NOT_MATCHES} | Filter to exclude discovered pods by namespace. |
CHANGE_IF_NEEDED |
Name | Description | Type | Key and additional info |
---|---|---|---|
Get nodes | Collecting and processing cluster nodes data via Kubernetes API. |
Script | kube.nodes |
Get nodes check | Data collection check. |
Dependent item | kube.nodes.check Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Kubernetes nodes: Failed to get nodes | length(last(/Kubernetes nodes by HTTP/kube.nodes.check))>0 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Node discovery | Dependent item | kube.node.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Node [{#NAME}]: Get data | Collecting and processing cluster by node [{#NAME}] data via Kubernetes API. |
Dependent item | kube.node.get[{#NAME}] Preprocessing
|
Node [{#NAME}] Addresses: External IP | Typically the IP address of the node that is externally routable (available from outside the cluster). |
Dependent item | kube.node.addresses.external_ip[{#NAME}] Preprocessing
|
Node [{#NAME}] Addresses: Internal IP | Typically the IP address of the node that is routable only within the cluster. |
Dependent item | kube.node.addresses.internal_ip[{#NAME}] Preprocessing
|
Node [{#NAME}] Allocatable: CPU | Allocatable CPU. 'Allocatable' on a Kubernetes node is defined as the amount of compute resources that are available for pods. The scheduler does not over-subscribe 'Allocatable'. 'CPU', 'memory' and 'ephemeral-storage' are supported as of now. |
Dependent item | kube.node.allocatable.cpu[{#NAME}] Preprocessing
|
Node [{#NAME}] Allocatable: Memory | Allocatable Memory. 'Allocatable' on a Kubernetes node is defined as the amount of compute resources that are available for pods. The scheduler does not over-subscribe 'Allocatable'. 'CPU', 'memory' and 'ephemeral-storage' are supported as of now. |
Dependent item | kube.node.allocatable.memory[{#NAME}] Preprocessing
|
Node [{#NAME}] Allocatable: Pods | https://kubernetes.io/docs/tasks/administer-cluster/reserve-compute-resources/ |
Dependent item | kube.node.allocatable.pods[{#NAME}] Preprocessing
|
Node [{#NAME}] Capacity: CPU | CPU resource capacity. https://kubernetes.io/docs/concepts/architecture/nodes/#capacity |
Dependent item | kube.node.capacity.cpu[{#NAME}] Preprocessing
|
Node [{#NAME}] Capacity: Memory | Memory resource capacity. https://kubernetes.io/docs/concepts/architecture/nodes/#capacity |
Dependent item | kube.node.capacity.memory[{#NAME}] Preprocessing
|
Node [{#NAME}] Capacity: Pods | https://kubernetes.io/docs/tasks/administer-cluster/reserve-compute-resources/ |
Dependent item | kube.node.capacity.pods[{#NAME}] Preprocessing
|
Node [{#NAME}] Conditions: Disk pressure | True if pressure exists on the disk size - that is, if the disk capacity is low; otherwise False. |
Dependent item | kube.node.conditions.diskpressure[{#NAME}] Preprocessing
|
Node [{#NAME}] Conditions: Memory pressure | True if pressure exists on the node memory - that is, if the node memory is low; otherwise False. |
Dependent item | kube.node.conditions.memorypressure[{#NAME}] Preprocessing
|
Node [{#NAME}] Conditions: Network unavailable | True if the network for the node is not correctly configured, otherwise False. |
Dependent item | kube.node.conditions.networkunavailable[{#NAME}] Preprocessing
|
Node [{#NAME}] Conditions: PID pressure | True if pressure exists on the processes - that is, if there are too many processes on the node; otherwise False. |
Dependent item | kube.node.conditions.pidpressure[{#NAME}] Preprocessing
|
Node [{#NAME}] Conditions: Ready | True if the node is healthy and ready to accept pods, False if the node is not healthy and is not accepting pods, and Unknown if the node controller has not heard from the node in the last node-monitor-grace-period (default is 40 seconds). |
Dependent item | kube.node.conditions.ready[{#NAME}] Preprocessing
|
Node [{#NAME}] Info: Architecture | Node architecture. |
Dependent item | kube.node.info.architecture[{#NAME}] Preprocessing
|
Node [{#NAME}] Info: Container runtime | Container runtime. https://kubernetes.io/docs/setup/production-environment/container-runtimes/ |
Dependent item | kube.node.info.containerruntime[{#NAME}] Preprocessing
|
Node [{#NAME}] Info: Kernel version | Node kernel version. |
Dependent item | kube.node.info.kernelversion[{#NAME}] Preprocessing
|
Node [{#NAME}] Info: Kubelet version | Version of Kubelet. |
Dependent item | kube.node.info.kubeletversion[{#NAME}] Preprocessing
|
Node [{#NAME}] Info: KubeProxy version | Version of KubeProxy. |
Dependent item | kube.node.info.kubeproxyversion[{#NAME}] Preprocessing
|
Node [{#NAME}] Info: Operating system | Node operating system. |
Dependent item | kube.node.info.operatingsystem[{#NAME}] Preprocessing
|
Node [{#NAME}] Info: OS image | Node OS image. |
Dependent item | kube.node.info.osversion[{#NAME}] Preprocessing
|
Node [{#NAME}] Info: Roles | Node roles. |
Dependent item | kube.node.info.roles[{#NAME}] Preprocessing
|
Node [{#NAME}] Limits: CPU | Node CPU limits. https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/ |
Dependent item | kube.node.limits.cpu[{#NAME}] Preprocessing
|
Node [{#NAME}] Limits: Memory | Node Memory limits. https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/ |
Dependent item | kube.node.limits.memory[{#NAME}] Preprocessing
|
Node [{#NAME}] Requests: CPU | Node CPU requests. https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/ |
Dependent item | kube.node.requests.cpu[{#NAME}] Preprocessing
|
Node [{#NAME}] Requests: Memory | Node Memory requests. https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/ |
Dependent item | kube.node.requests.memory[{#NAME}] Preprocessing
|
Node [{#NAME}] Uptime | Node uptime. |
Dependent item | kube.node.uptime[{#NAME}] Preprocessing
|
Node [{#NAME}] Used: Pods | Current number of pods on the node. |
Dependent item | kube.node.used.pods[{#NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Kubernetes nodes: Node [{#NAME}] Conditions: Pressure exists on the disk size | True - pressure exists on the disk size - that is, if the disk capacity is low; otherwise False. |
last(/Kubernetes nodes by HTTP/kube.node.conditions.diskpressure[{#NAME}])=1 |Warning |
||
Kubernetes nodes: Node [{#NAME}] Conditions: Pressure exists on the node memory | True - pressure exists on the node memory - that is, if the node memory is low; otherwise False |
last(/Kubernetes nodes by HTTP/kube.node.conditions.memorypressure[{#NAME}])=1 |Warning |
||
Kubernetes nodes: Node [{#NAME}] Conditions: Network is not correctly configured | True - the network for the node is not correctly configured, otherwise False |
last(/Kubernetes nodes by HTTP/kube.node.conditions.networkunavailable[{#NAME}])=1 |Warning |
||
Kubernetes nodes: Node [{#NAME}] Conditions: Pressure exists on the processes | True - pressure exists on the processes - that is, if there are too many processes on the node; otherwise False |
last(/Kubernetes nodes by HTTP/kube.node.conditions.pidpressure[{#NAME}])=1 |Warning |
||
Kubernetes nodes: Node [{#NAME}] Conditions: Is not in Ready state | False - if the node is not healthy and is not accepting pods. |
last(/Kubernetes nodes by HTTP/kube.node.conditions.ready[{#NAME}])<>1 |Warning |
||
Kubernetes nodes: Node [{#NAME}] Limits: Total CPU limits are too high | last(/Kubernetes nodes by HTTP/kube.node.limits.cpu[{#NAME}]) / last(/Kubernetes nodes by HTTP/kube.node.allocatable.cpu[{#NAME}]) > 0.9 |Warning |
Depends on:
|
||
Kubernetes nodes: Node [{#NAME}] Limits: Total CPU limits are too high | last(/Kubernetes nodes by HTTP/kube.node.limits.cpu[{#NAME}]) / last(/Kubernetes nodes by HTTP/kube.node.allocatable.cpu[{#NAME}]) > 1 |Average |
|||
Kubernetes nodes: Node [{#NAME}] Limits: Total memory limits are too high | last(/Kubernetes nodes by HTTP/kube.node.limits.memory[{#NAME}]) / last(/Kubernetes nodes by HTTP/kube.node.allocatable.memory[{#NAME}]) > 0.9 |Warning |
Depends on:
|
||
Kubernetes nodes: Node [{#NAME}] Limits: Total memory limits are too high | last(/Kubernetes nodes by HTTP/kube.node.limits.memory[{#NAME}]) / last(/Kubernetes nodes by HTTP/kube.node.allocatable.memory[{#NAME}]) > 1 |Average |
|||
Kubernetes nodes: Node [{#NAME}] Requests: Total CPU requests are too high | last(/Kubernetes nodes by HTTP/kube.node.requests.cpu[{#NAME}]) / last(/Kubernetes nodes by HTTP/kube.node.allocatable.cpu[{#NAME}]) > 0.5 |Warning |
Depends on:
|
||
Kubernetes nodes: Node [{#NAME}] Requests: Total CPU requests are too high | last(/Kubernetes nodes by HTTP/kube.node.requests.cpu[{#NAME}]) / last(/Kubernetes nodes by HTTP/kube.node.allocatable.cpu[{#NAME}]) > 0.8 |Average |
|||
Kubernetes nodes: Node [{#NAME}] Requests: Total memory requests are too high | last(/Kubernetes nodes by HTTP/kube.node.requests.memory[{#NAME}]) / last(/Kubernetes nodes by HTTP/kube.node.allocatable.memory[{#NAME}]) > 0.5 |Warning |
Depends on:
|
||
Kubernetes nodes: Node [{#NAME}] Requests: Total memory requests are too high | last(/Kubernetes nodes by HTTP/kube.node.requests.memory[{#NAME}]) / last(/Kubernetes nodes by HTTP/kube.node.allocatable.memory[{#NAME}]) > 0.8 |Average |
|||
Kubernetes nodes: Node [{#NAME}] has been restarted | Uptime is less than 10 minutes. |
last(/Kubernetes nodes by HTTP/kube.node.uptime[{#NAME}])<10 |Info |
||
Kubernetes nodes: Node [{#NAME}] Used: Kubelet too many pods | Kubelet is running at capacity. |
last(/Kubernetes nodes by HTTP/kube.node.used.pods[{#NAME}])/ last(/Kubernetes nodes by HTTP/kube.node.capacity.pods[{#NAME}]) > 0.9 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Pod discovery | Dependent item | kube.pod.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Node [{#NODE}] Namespace [{#NAMESPACE}] Pod [{#POD}]: Get data | Collecting and processing cluster by node [{#NODE}] data via Kubernetes API. |
Dependent item | kube.pod.get[{#NAMESPACE}/{#POD}] Preprocessing
|
Node [{#NODE}] Namespace [{#NAMESPACE}] Pod [{#POD}] Conditions: Containers ready | All containers in the Pod are ready. https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-conditions |
Dependent item | kube.pod.conditions.containers_ready[{#NAMESPACE}/{#POD}] Preprocessing
|
Node [{#NODE}] Namespace [{#NAMESPACE}] Pod [{#POD}] Conditions: Initialized | All init containers have started successfully. https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-conditions |
Dependent item | kube.pod.conditions.initialized[{#NAMESPACE}/{#POD}] Preprocessing
|
Node [{#NODE}] Namespace [{#NAMESPACE}] Pod [{#POD}] Conditions: Ready | The Pod is able to serve requests and should be added to the load balancing pools of all matching Services. https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-conditions |
Dependent item | kube.pod.conditions.ready[{#NAMESPACE}/{#POD}] Preprocessing
|
Node [{#NODE}] Namespace [{#NAMESPACE}] Pod [{#POD}] Conditions: Scheduled | The Pod has been scheduled to a node. https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-conditions |
Dependent item | kube.pod.conditions.scheduled[{#NAMESPACE}/{#POD}] Preprocessing
|
Node [{#NODE}] Namespace [{#NAMESPACE}] Pod [{#POD}] Containers: Restarts | The number of times the container has been restarted, currently based on the number of dead containers that have not yet been removed. Note that this is calculated from dead containers. But those containers are subject to garbage collection. |
Dependent item | kube.pod.containers.restartcount[{#NAMESPACE}/{#POD}] Preprocessing
|
Node [{#NODE}] Namespace [{#NAMESPACE}] Pod [{#POD}] Status: Phase | The phase of a Pod is a simple, high-level summary of where the Pod is in its lifecycle. https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#pod-phase |
Dependent item | kube.pod.status.phase[{#NAMESPACE}/{#POD}] Preprocessing
|
Node [{#NODE}] Namespace [{#NAMESPACE}] Pod [{#POD}]: Uptime | Pod uptime. |
Dependent item | kube.pod.uptime[{#NAMESPACE}/{#POD}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Kubernetes nodes: Node [{#NODE}] Namespace [{#NAMESPACE}] Pod [{#POD}]: Pod is crash looping | Containers of the pod keep restarting. This most likely indicates that the pod is in the CrashLoopBackOff state. |
(last(/Kubernetes nodes by HTTP/kube.pod.containers.restartcount[{#NAMESPACE}/{#POD}])-min(/Kubernetes nodes by HTTP/kube.pod.containers.restartcount[{#NAMESPACE}/{#POD}],15m))>1 |Warning |
||
Kubernetes nodes: Node [{#NODE}] Namespace [{#NAMESPACE}] Pod [{#POD}] Status: Kubernetes Pod not healthy | Pod has been in a non-ready state for longer than 10 minutes. |
count(/Kubernetes nodes by HTTP/kube.pod.status.phase[{#NAMESPACE}/{#POD}],10m, "regexp","^(1|4|5)$")>=9 |High |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template to monitor Kubernetes Kubelet by Zabbix that works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
Template Kubernetes Kubelet by HTTP
- collects metrics by HTTP agent from Kubelet /metrics endpoint.
Don't forget to change the macros {$KUBE.KUBELET.URL} and {$KUBE.API.TOKEN}.
NOTE. Some metrics may not be collected depending on your Kubernetes instance version and configuration.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Internal service metrics are collected from the /metrics endpoint. The template needs to use authorization via an API token.
Don't forget to change the macros {$KUBE.KUBELET.URL} and {$KUBE.API.TOKEN}.
NOTE. Some metrics may not be collected depending on your Kubernetes instance version and configuration.
Name | Description | Default |
---|---|---|
{$KUBE.API.TOKEN} | Service account bearer token. |
|
{$KUBE.KUBELET.URL} | Kubernetes Kubelet instance URL. |
https://localhost:10250 |
{$KUBE.KUBELET.METRIC.ENDPOINT} | Kubelet /metrics endpoint. |
/metrics |
{$KUBE.KUBELET.CADVISOR.ENDPOINT} | cAdvisor metrics from Kubelet /metrics/cadvisor endpoint. |
/metrics/cadvisor |
{$KUBE.KUBELET.PODS.ENDPOINT} | Kubelet /pods endpoint. |
/pods |
Name | Description | Type | Key and additional info |
---|---|---|---|
Get kubelet metrics | Collecting raw Kubelet metrics from /metrics endpoint. |
HTTP agent | kube.kubelet.metrics |
Get cadvisor metrics | Collecting raw Kubelet metrics from /metrics/cadvisor endpoint. |
HTTP agent | kube.cadvisor.metrics |
Get pods | Collecting raw Kubelet metrics from /pods endpoint. |
HTTP agent | kube.pods |
Pods running | The number of running pods. |
Dependent item | kube.kubelet.pods.running Preprocessing
|
Containers started | The number of started containers. |
Dependent item | kube.kubelet.containers.started Preprocessing
|
Containers ready | The number of ready containers. |
Dependent item | kube.kubelet.containers.ready Preprocessing
|
Containers last state terminated | The number of containers that were previously terminated. |
Dependent item | kube.kublet.containers.terminated Preprocessing
|
Containers restarts | The number of times the container has been restarted. |
Dependent item | kube.kubelet.containers.restarts Preprocessing
|
CPU cores, total | The number of cores in this machine (available until kubernetes v1.18). |
Dependent item | kube.kubelet.cpu.cores Preprocessing
|
Machine memory, bytes | Resident memory size in bytes. |
Dependent item | kube.kubelet.machine.memory Preprocessing
|
Virtual memory, bytes | Virtual memory size in bytes. |
Dependent item | kube.kubelet.virtual.memory Preprocessing
|
File descriptors, max | Maximum number of open file descriptors. |
Dependent item | kube.kubelet.processmaxfds Preprocessing
|
File descriptors, open | Number of open file descriptors. |
Dependent item | kube.kubelet.processopenfds Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Runtime operations discovery | Dependent item | kube.kubelet.runtimeoperationsbucket.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
[{#OP_TYPE}] Runtime operations bucket: {#LE} | Duration in seconds of runtime operations. Broken down by operation type. |
Dependent item | kube.kublet.runtimeopsdurationsecondsbucket[{#LE},"{#OP_TYPE}"] Preprocessing
|
[{#OP_TYPE}] Runtime operations total, rate | Cumulative number of runtime operations by operation type. |
Dependent item | kube.kublet.runtimeopstotal.rate["{#OP_TYPE}"] Preprocessing
|
[{#OP_TYPE}] Operations, p90 | 90 percentile of operation latency distribution in seconds for each verb. |
Calculated | kube.kublet.runtimeopsdurationsecondsp90["{#OP_TYPE}"] |
[{#OP_TYPE}] Operations, p95 | 95 percentile of operation latency distribution in seconds for each verb. |
Calculated | kube.kublet.runtimeopsdurationsecondsp95["{#OP_TYPE}"] |
[{#OP_TYPE}] Operations, p99 | 99 percentile of operation latency distribution in seconds for each verb. |
Calculated | kube.kublet.runtimeopsdurationsecondsp99["{#OP_TYPE}"] |
[{#OP_TYPE}] Operations, p50 | 50 percentile of operation latency distribution in seconds for each verb. |
Calculated | kube.kublet.runtimeopsdurationsecondsp50["{#OP_TYPE}"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Pods discovery | Dependent item | kube.kubelet.pods.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Namespace [{#NAMESPACE}] Pod [{#NAME}] CPU: Load average, 10s | Pods cpu load average over the last 10 seconds. |
Dependent item | kube.pod.containercpuloadaverage10s[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] Pod [{#NAME}] CPU: System seconds, total | System cpu time consumed. It is calculated from the cumulative value using the Change per second preprocessing step. |
Dependent item | kube.pod.containercpusystemsecondstotal[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] Pod [{#NAME}] CPU: Usage seconds, total | Consumed cpu time. It is calculated from the cumulative value using the Change per second preprocessing step. |
Dependent item | kube.pod.containercpuusagesecondstotal[{#NAMESPACE}/{#NAME}] Preprocessing
|
Namespace [{#NAMESPACE}] Pod [{#NAME}] CPU: User seconds, total | User cpu time consumed. It is calculated from the cumulative value using the Change per second preprocessing step. |
Dependent item | kube.pod.containercpuusersecondstotal[{#NAMESPACE}/{#NAME}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
REST client requests discovery | Dependent item | kube.kubelet.rest.requests.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Host [{#HOST}] Request method [{#METHOD}] Code:[{#CODE}] | Number of HTTP requests, partitioned by status code, method, and host. |
Dependent item | kube.kubelet.rest.requests["{#CODE}", "{#HOST}", "{#METHOD}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Container memory discovery | Dependent item | kube.kubelet.container.memory.cache.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Namespace [{#NAMESPACE}] Pod [{#POD}] Container [{#CONTAINER}]: Memory page cache | Number of bytes of page cache memory. |
Dependent item | kube.kubelet.container.memory.cache["{#CONTAINER}", "{#NAMESPACE}", "{#POD}"] Preprocessing
|
Namespace [{#NAMESPACE}] Pod [{#POD}] Container [{#CONTAINER}]: Memory max usage | Maximum memory usage recorded in bytes. |
Dependent item | kube.kubelet.container.memory.max_usage["{#CONTAINER}", "{#NAMESPACE}", "{#POD}"] Preprocessing
|
Namespace [{#NAMESPACE}] Pod [{#POD}] Container [{#CONTAINER}]: RSS | Size of RSS in bytes. |
Dependent item | kube.kubelet.container.memory.rss["{#CONTAINER}", "{#NAMESPACE}", "{#POD}"] Preprocessing
|
Namespace [{#NAMESPACE}] Pod [{#POD}] Container [{#CONTAINER}]: Swap | Container swap usage in bytes. |
Dependent item | kube.kubelet.container.memory.swap["{#CONTAINER}", "{#NAMESPACE}", "{#POD}"] Preprocessing
|
Namespace [{#NAMESPACE}] Pod [{#POD}] Container [{#CONTAINER}]: Usage | Current memory usage in bytes, including all memory regardless of when it was accessed. |
Dependent item | kube.kubelet.container.memory.usage["{#CONTAINER}", "{#NAMESPACE}", "{#POD}"] Preprocessing
|
Namespace [{#NAMESPACE}] Pod [{#POD}] Container [{#CONTAINER}]: Working set | Current working set in bytes. |
Dependent item | kube.kubelet.container.memory.working_set["{#CONTAINER}", "{#NAMESPACE}", "{#POD}"] Preprocessing
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template to monitor Kubernetes Controller manager by Zabbix that works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
Template Kubernetes Controller manager by HTTP
- collects metrics by HTTP agent from Controller manager /metrics endpoint.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Internal service metrics are collected from the /metrics endpoint. The template needs to use authorization via an API token.
Don't forget to change the macros {$KUBE.CONTROLLER.SERVER.URL} and {$KUBE.API.TOKEN}. Also, see the Macros section for a list of macros used to set trigger values.
NOTE. You might need to set the --bind-address option for the Controller Manager to an address where the Zabbix proxy can reach it. For example, for clusters created with kubeadm, it can be set in the following manifest file (changes will be applied immediately):
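As with the Scheduler above, a minimal sketch assuming the default kubeadm static Pod manifest path (/etc/kubernetes/manifests/kube-controller-manager.yaml) and the kube-controller-manager --bind-address flag:

```yaml
# Fragment of /etc/kubernetes/manifests/kube-controller-manager.yaml (assumed default kubeadm path)
spec:
  containers:
    - command:
        - kube-controller-manager
        # Listen on an address reachable by the Zabbix proxy
        # instead of the kubeadm default of 127.0.0.1
        - --bind-address=0.0.0.0
```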
NOTE. Some metrics may not be collected depending on your Kubernetes Controller manager instance version and configuration.
Name | Description | Default |
---|---|---|
{$KUBE.CONTROLLER.SERVER.URL} | Kubernetes Controller manager metrics endpoint URL. |
https://localhost:10257/metrics |
{$KUBE.API.TOKEN} | API Authorization Token |
|
{$KUBE.CONTROLLER.HTTP.CLIENT.ERROR} | Maximum number of HTTP client requests failures used for trigger. |
2 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes Controller: Get Controller metrics | Get raw metrics from Controller instance /metrics endpoint. |
HTTP agent | kubernetes.controller.get_metrics Preprocessing
|
Leader election status | Gauge indicating whether the reporting system is the master of the relevant lease: 0 indicates backup, 1 indicates master. |
Dependent item | kubernetes.controller.leaderelectionmaster_status Preprocessing
|
Virtual memory, bytes | Virtual memory size in bytes. |
Dependent item | kubernetes.controller.processvirtualmemory_bytes Preprocessing
|
Resident memory, bytes | Resident memory size in bytes. |
Dependent item | kubernetes.controller.processresidentmemory_bytes Preprocessing
|
CPU | Total user and system CPU usage ratio. |
Dependent item | kubernetes.controller.cpu.util Preprocessing
|
Goroutines | Number of goroutines that currently exist. |
Dependent item | kubernetes.controller.go_goroutines Preprocessing
|
Go threads | Number of OS threads created. |
Dependent item | kubernetes.controller.go_threads Preprocessing
|
Fds open | Number of open file descriptors. |
Dependent item | kubernetes.controller.open_fds Preprocessing
|
Fds max | Maximum allowed open file descriptors. |
Dependent item | kubernetes.controller.max_fds Preprocessing
|
REST Client requests: 2xx, rate | Number of HTTP requests with 2xx status code per second. |
Dependent item | kubernetes.controller.client_http_requests_200.rate Preprocessing
|
REST Client requests: 3xx, rate | Number of HTTP requests with 3xx status code per second. |
Dependent item | kubernetes.controller.client_http_requests_300.rate Preprocessing
|
REST Client requests: 4xx, rate | Number of HTTP requests with 4xx status code per second. |
Dependent item | kubernetes.controller.client_http_requests_400.rate Preprocessing
|
REST Client requests: 5xx, rate | Number of HTTP requests with 5xx status code per second. |
Dependent item | kubernetes.controller.client_http_requests_500.rate Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Kubernetes Controller manager: Too many HTTP client errors | Kubernetes Controller manager is experiencing a high error rate (with 5xx HTTP code). |
min(/Kubernetes Controller manager by HTTP/kubernetes.controller.client_http_requests_500.rate,5m)>{$KUBE.CONTROLLER.HTTP.CLIENT.ERROR} |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Workqueue metrics discovery | Dependent item | kubernetes.controller.workqueue.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
["{#NAME}"]: Workqueue adds total, rate | Total number of adds handled by workqueue per second. |
Dependent item | kubernetes.controller.workqueueaddstotal["{#NAME}"] Preprocessing
|
["{#NAME}"]: Workqueue depth | Current depth of workqueue. |
Dependent item | kubernetes.controller.workqueue_depth["{#NAME}"] Preprocessing
|
["{#NAME}"]: Workqueue unfinished work, sec | How many seconds of work has done that is in progress and hasn't been observed by work_duration. Large values indicate stuck threads. One can deduce the number of stuck threads by observing the rate at which this increases. |
Dependent item | kubernetes.controller.workqueueunfinishedwork_seconds["{#NAME}"] Preprocessing
|
["{#NAME}"]: Workqueue retries, rate | Total number of retries handled by workqueue per second. |
Dependent item | kubernetes.controller.workqueueretriestotal["{#NAME}"] Preprocessing
|
["{#NAME}"]: Workqueue longest running processor, sec | How many seconds has the longest running processor for workqueue been running. |
Dependent item | kubernetes.controller.workqueuelongestrunningprocessorseconds["{#NAME}"] Preprocessing
|
["{#NAME}"]: Workqueue work duration, p90 | 90 percentile of how long in seconds processing an item from workqueue takes, by queue. |
Calculated | kubernetes.controller.workqueueworkdurationsecondsp90["{#NAME}"] |
["{#NAME}"]: Workqueue work duration, p95 | 95 percentile of how long in seconds processing an item from workqueue takes, by queue. |
Calculated | kubernetes.controller.workqueueworkdurationsecondsp95["{#NAME}"] |
["{#NAME}"]: Workqueue work duration, p99 | 99 percentile of how long in seconds processing an item from workqueue takes, by queue. |
Calculated | kubernetes.controller.workqueueworkdurationsecondsp99["{#NAME}"] |
["{#NAME}"]: Workqueue work duration, 50p | 50 percentiles of how long in seconds processing an item from workqueue takes, by queue. |
Calculated | kubernetes.controller.workqueueworkdurationsecondsp50["{#NAME}"] |
["{#NAME}"]: Workqueue queue duration, p90 | 90 percentile of how long in seconds an item stays in workqueue before being requested, by queue. |
Calculated | kubernetes.controller.workqueuequeuedurationsecondsp90["{#NAME}"] |
["{#NAME}"]: Workqueue queue duration, p95 | 95 percentile of how long in seconds an item stays in workqueue before being requested, by queue. |
Calculated | kubernetes.controller.workqueuequeuedurationsecondsp95["{#NAME}"] |
["{#NAME}"]: Workqueue queue duration, p99 | 99 percentile of how long in seconds an item stays in workqueue before being requested, by queue. |
Calculated | kubernetes.controller.workqueuequeuedurationsecondsp99["{#NAME}"] |
["{#NAME}"]: Workqueue queue duration, 50p | 50 percentile of how long in seconds an item stays in workqueue before being requested. If there are no requests for 5 minute, item value will be discarded. |
Calculated | kubernetes.controller.workqueuequeuedurationsecondsp50["{#NAME}"] Preprocessing
|
["{#NAME}"]: Workqueue duration seconds bucket, {#LE} | How long in seconds processing an item from workqueue takes. |
Dependent item | kubernetes.controller.durationsecondsbucket[{#LE},"{#NAME}"] Preprocessing
|
["{#NAME}"]: Queue duration seconds bucket, {#LE} | How long in seconds an item stays in workqueue before being requested. |
Dependent item | kubernetes.controller.queuedurationseconds_bucket[{#LE},"{#NAME}"] Preprocessing
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template to monitor Kubernetes API server that works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
Template Kubernetes API server by HTTP
- collects metrics by HTTP agent from API server /metrics endpoint.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Internal service metrics are collected from the /metrics endpoint. The template needs to use authorization via an API token.
Don't forget to change the macros {$KUBE.API.SERVER.URL} and {$KUBE.API.TOKEN}. Also, see the Macros section for a list of macros used to set trigger values.
NOTE. Some metrics may not be collected depending on your Kubernetes API server instance version and configuration.
Name | Description | Default |
---|---|---|
{$KUBE.API.SERVER.URL} | Kubernetes API server metrics endpoint URL. |
https://localhost:6443/metrics |
{$KUBE.API.TOKEN} | API Authorization Token. |
|
{$KUBE.API.CERT.EXPIRATION} | Number of days for alert of client certificate used for trigger. |
7 |
{$KUBE.API.HTTP.CLIENT.ERROR} | Maximum number of HTTP client requests failures used for trigger. |
2 |
{$KUBE.API.HTTP.SERVER.ERROR} | Maximum number of HTTP server requests failures used for trigger. |
2 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Get API instance metrics | Get raw metrics from API instance /metrics endpoint. |
HTTP agent | kubernetes.api.get_metrics Preprocessing
|
Audit events, total | Accumulated number of audit events generated and sent to the audit backend. |
Dependent item | kubernetes.api.auditeventtotal Preprocessing
|
Virtual memory, bytes | Virtual memory size in bytes. |
Dependent item | kubernetes.api.processvirtualmemory_bytes Preprocessing
|
Resident memory, bytes | Resident memory size in bytes. |
Dependent item | kubernetes.api.processresidentmemory_bytes Preprocessing
|
CPU | Total user and system CPU usage ratio. |
Dependent item | kubernetes.api.cpu.util Preprocessing
|
Goroutines | Number of goroutines that currently exist. |
Dependent item | kubernetes.api.go_goroutines Preprocessing
|
Go threads | Number of OS threads created. |
Dependent item | kubernetes.api.go_threads Preprocessing
|
Fds open | Number of open file descriptors. |
Dependent item | kubernetes.api.open_fds Preprocessing
|
Fds max | Maximum allowed open file descriptors. |
Dependent item | kubernetes.api.max_fds Preprocessing
|
gRPCs client started, rate | Total number of RPCs started per second. |
Dependent item | kubernetes.api.grpcclientstarted.rate Preprocessing
|
gRPCs messages received, rate | Total number of gRPC stream messages received per second. |
Dependent item | kubernetes.api.grpcclientmsg_received.rate Preprocessing
|
gRPCs messages sent, rate | Total number of gRPC stream messages sent per second. |
Dependent item | kubernetes.api.grpcclientmsg_sent.rate Preprocessing
|
Request terminations, rate | Number of requests which apiserver terminated in self-defense per second. |
Dependent item | kubernetes.api.apiserverrequestterminations Preprocessing
|
TLS handshake errors, rate | Number of requests dropped with 'TLS handshake error from' error per second. |
Dependent item | kubernetes.api.apiservertlshandshakeerrorstotal.rate Preprocessing
|
API server requests: 5xx, rate | Counter of apiserver requests broken out for each HTTP response code. |
Dependent item | kubernetes.api.apiserver_request_total_500.rate Preprocessing
|
API server requests: 4xx, rate | Counter of apiserver requests broken out for each HTTP response code. |
Dependent item | kubernetes.api.apiserver_request_total_400.rate Preprocessing
|
API server requests: 3xx, rate | Counter of apiserver requests broken out for each HTTP response code. |
Dependent item | kubernetes.api.apiserver_request_total_300.rate Preprocessing
|
API server requests: 0, rate | Counter of apiserver requests broken out for each HTTP response code. |
Dependent item | kubernetes.api.apiserver_request_total_0.rate Preprocessing
|
API server requests: 2xx, rate | Counter of apiserver requests broken out for each HTTP response code. |
Dependent item | kubernetes.api.apiserver_request_total_200.rate Preprocessing
|
HTTP requests: 5xx, rate | Number of HTTP requests with 5xx status code per second. |
Dependent item | kubernetes.api.rest_client_requests_total_500.rate Preprocessing
|
HTTP requests: 4xx, rate | Number of HTTP requests with 4xx status code per second. |
Dependent item | kubernetes.api.rest_client_requests_total_400.rate Preprocessing
|
HTTP requests: 3xx, rate | Number of HTTP requests with 3xx status code per second. |
Dependent item | kubernetes.api.rest_client_requests_total_300.rate Preprocessing
|
HTTP requests: 2xx, rate | Number of HTTP requests with 2xx status code per second. |
Dependent item | kubernetes.api.rest_client_requests_total_200.rate Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Kubernetes API server: Too many server errors | Kubernetes API server is experiencing a high error rate (with 5xx HTTP code). |
min(/Kubernetes API server by HTTP/kubernetes.api.apiserver_request_total_500.rate,5m)>{$KUBE.API.HTTP.SERVER.ERROR} |Warning |
||
Kubernetes API server: Too many client errors | Kubernetes API client is experiencing a high error rate (with 5xx HTTP code). |
min(/Kubernetes API server by HTTP/kubernetes.api.rest_client_requests_total_500.rate,5m)>{$KUBE.API.HTTP.CLIENT.ERROR} |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Long-running requests | Discovery of long-running requests by verb, resource and scope. |
Dependent item | kubernetes.api.longrunning_gauge.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Long-running ["{#VERB}"] requests ["{#RESOURCE}"]: {#SCOPE} | Gauge of all active long-running apiserver requests broken out by verb, resource and scope. Not all requests are tracked this way. |
Dependent item | kubernetes.api.longrunning_gauge["{#RESOURCE}","{#SCOPE}","{#VERB}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Request duration histogram | Discovery raw data and percentile items of request duration. |
Dependent item | kubernetes.api.requests_bucket.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
["{#VERB}"] Requests bucket: {#LE} | Response latency distribution in seconds for each verb. |
Dependent item | kubernetes.api.requestdurationseconds_bucket[{#LE},"{#VERB}"] Preprocessing
|
["{#VERB}"] Requests, p90 | 90 percentile of response latency distribution in seconds for each verb. |
Calculated | kubernetes.api.requestdurationseconds_p90["{#VERB}"] |
["{#VERB}"] Requests, p95 | 95 percentile of response latency distribution in seconds for each verb. |
Calculated | kubernetes.api.requestdurationseconds_p95["{#VERB}"] |
["{#VERB}"] Requests, p99 | 99 percentile of response latency distribution in seconds for each verb. |
Calculated | kubernetes.api.requestdurationseconds_p99["{#VERB}"] |
["{#VERB}"] Requests, p50 | 50 percentile of response latency distribution in seconds for each verb. |
Calculated | kubernetes.api.requestdurationseconds_p50["{#VERB}"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Requests inflight discovery | Discovery requests inflight by kind. |
Dependent item | kubernetes.api.inflight_requests.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Requests current: {#KIND} | Maximal number of currently used inflight request limit of this apiserver per request kind in last second. |
Dependent item | kubernetes.api.currentinflightrequests["{#KIND}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
gRPC completed requests discovery | Discovery grpc completed requests by grpc code. |
Dependent item | kubernetes.api.grpcclienthandled.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
gRPCs completed: {#GRPC_CODE}, rate | Total number of RPCs completed by the client regardless of success or failure per second. |
Dependent item | kubernetes.api.grpcclienthandledtotal.rate["{#GRPCCODE}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Authentication attempts discovery | Discovery authentication attempts by result. |
Dependent item | kubernetes.api.authentication_attempts.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Authentication attempts: {#RESULT}, rate | Authentication attempts by result per second. |
Dependent item | kubernetes.api.authentication_attempts.rate["{#RESULT}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Authentication requests discovery | Discovery authentication attempts by name. |
Dependent item | kubernetes.api.authenticateduserrequests.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Authenticated requests: {#NAME}, rate | Counter of authenticated requests broken out by username per second. |
Dependent item | kubernetes.api.authenticateduserrequests.rate["{#NAME}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Watchers metrics discovery | Discovery watchers by kind. |
Dependent item | kubernetes.api.apiserverregisteredwatchers.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Watchers: {#KIND} | Number of currently registered watchers for a given resource. |
Dependent item | kubernetes.api.apiserverregisteredwatchers["{#KIND}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Etcd objects metrics discovery | Discovery etcd objects by resource. |
Dependent item | kubernetes.api.etcdobjectcounts.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
etcd objects: {#RESOURCE} | Number of stored objects at the time of last check split by kind. |
Dependent item | kubernetes.api.etcdobjectcounts["{#RESOURCE}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Workqueue metrics discovery | Discovery workqueue metrics by name. |
Dependent item | kubernetes.api.workqueue.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
["{#NAME}"] Workqueue depth | Current depth of workqueue. |
Dependent item | kubernetes.api.workqueue_depth["{#NAME}"] Preprocessing
|
["{#NAME}"] Workqueue adds total, rate | Total number of adds handled by workqueue per second. |
Dependent item | kubernetes.api.workqueueaddstotal.rate["{#NAME}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Client certificate expiration histogram | Discovery raw data of client certificate expiration |
Dependent item | kubernetes.api.certificate_expiration.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Certificate expiration seconds bucket, {#LE} | Distribution of the remaining lifetime on the certificate used to authenticate a request. |
Dependent item | kubernetes.api.clientcertificateexpirationsecondsbucket[{#LE}] Preprocessing
|
Client certificate expiration, p1 | 1 percentile of the remaining lifetime on the certificate used to authenticate a request. |
Calculated | kubernetes.api.client_certificate_expiration_p1[{#SINGLETON}] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Kubernetes API server: Kubernetes client certificate is expiring | A client certificate used to authenticate to the apiserver is expiring in {$KUBE.API.CERT.EXPIRATION} days. |
last(/Kubernetes API server by HTTP/kubernetes.api.client_certificate_expiration_p1[{#SINGLETON}]) > 0 and last(/Kubernetes API server by HTTP/kubernetes.api.client_certificate_expiration_p1[{#SINGLETON}]) < {$KUBE.API.CERT.EXPIRATION}*24*60*60 |Warning |
Depends on:
|
|
Kubernetes API server: Kubernetes client certificate expires soon | A client certificate used to authenticate to the apiserver is expiring in less than 24.0 hours. |
last(/Kubernetes API server by HTTP/kubernetes.api.client_certificate_expiration_p1[{#SINGLETON}]) > 0 and last(/Kubernetes API server by HTTP/kubernetes.api.client_certificate_expiration_p1[{#SINGLETON}]) < 24*60*60 |Warning |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for the effortless deployment of Apache Kafka monitoring by Zabbix via JMX and doesn't require any external scripts.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Metrics are collected by JMX.
Name | Description | Default |
---|---|---|
{$KAFKA.USER} | zabbix |
|
{$KAFKA.PASSWORD} | zabbix |
|
{$KAFKA.TOPIC.MATCHES} | Filter of discoverable topics |
.* |
{$KAFKA.TOPIC.NOT_MATCHES} | Filter to exclude discovered topics |
__consumer_offsets |
{$KAFKA.NETPROCAVG_IDLE.MIN.WARN} | The minimum Network processor average idle percent for trigger expression. |
30 |
{$KAFKA.REQUESTHANDLERAVG_IDLE.MIN.WARN} | The minimum Request handler average idle percent for trigger expression. |
30 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Leader election per second | Number of leader elections per second. |
JMX agent | jmx["kafka.controller:type=ControllerStats,name=LeaderElectionRateAndTimeMs","Count"] |
Unclean leader election per second | Number of “unclean” elections per second. |
JMX agent | jmx["kafka.controller:type=ControllerStats,name=UncleanLeaderElectionsPerSec","Count"] Preprocessing
|
Controller state on broker | One indicates that the broker is the controller for the cluster. |
JMX agent | jmx["kafka.controller:type=KafkaController,name=ActiveControllerCount","Value"] Preprocessing
|
Ineligible pending replica deletes | The number of ineligible pending replica deletes. |
JMX agent | jmx["kafka.controller:type=KafkaController,name=ReplicasIneligibleToDeleteCount","Value"] |
Pending replica deletes | The number of pending replica deletes. |
JMX agent | jmx["kafka.controller:type=KafkaController,name=ReplicasToDeleteCount","Value"] |
Ineligible pending topic deletes | The number of ineligible pending topic deletes. |
JMX agent | jmx["kafka.controller:type=KafkaController,name=TopicsIneligibleToDeleteCount","Value"] |
Pending topic deletes | The number of pending topic deletes. |
JMX agent | jmx["kafka.controller:type=KafkaController,name=TopicsToDeleteCount","Value"] |
Offline log directory count | The number of offline log directories (for example, after a hardware failure). |
JMX agent | jmx["kafka.log:type=LogManager,name=OfflineLogDirectoryCount","Value"] |
Offline partitions count | Number of partitions that don't have an active leader. |
JMX agent | jmx["kafka.controller:type=KafkaController,name=OfflinePartitionsCount","Value"] |
Bytes out per second | The rate at which data is fetched and read from the broker by consumers. |
JMX agent | jmx["kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec","Count"] Preprocessing
|
Bytes in per second | The rate at which data sent from producers is consumed by the broker. |
JMX agent | jmx["kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec","Count"] Preprocessing
|
Messages in per second | The rate at which individual messages are consumed by the broker. |
JMX agent | jmx["kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec","Count"] Preprocessing
|
Bytes rejected per second | The rate at which bytes are rejected per second by the broker. |
JMX agent | jmx["kafka.server:type=BrokerTopicMetrics,name=BytesRejectedPerSec","Count"] Preprocessing
|
Client fetch request failed per second | Number of client fetch request failures per second. |
JMX agent | jmx["kafka.server:type=BrokerTopicMetrics,name=FailedFetchRequestsPerSec","Count"] Preprocessing
|
Produce requests failed per second | Number of failed produce requests per second. |
JMX agent | jmx["kafka.server:type=BrokerTopicMetrics,name=FailedProduceRequestsPerSec","Count"] Preprocessing
|
Request handler average idle percent | Indicates the percentage of time that the request handler (IO) threads are not in use. |
JMX agent | jmx["kafka.server:type=KafkaRequestHandlerPool,name=RequestHandlerAvgIdlePercent","OneMinuteRate"] Preprocessing
|
Fetch-Consumer response send time, mean | Average time taken, in milliseconds, to send the response. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request=FetchConsumer","Mean"] |
Fetch-Consumer response send time, p95 | The time taken, in milliseconds, to send the response for 95th percentile. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request=FetchConsumer","95thPercentile"] |
Fetch-Consumer response send time, p99 | The time taken, in milliseconds, to send the response for 99th percentile. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request=FetchConsumer","99thPercentile"] |
Fetch-Follower response send time, mean | Average time taken, in milliseconds, to send the response. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request=FetchFollower","Mean"] |
Fetch-Follower response send time, p95 | The time taken, in milliseconds, to send the response for 95th percentile. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request=FetchFollower","95thPercentile"] |
Fetch-Follower response send time, p99 | The time taken, in milliseconds, to send the response for 99th percentile. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request=FetchFollower","99thPercentile"] |
Produce response send time, mean | Average time taken, in milliseconds, to send the response. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request=Produce","Mean"] |
Produce response send time, p95 | The time taken, in milliseconds, to send the response for 95th percentile. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request=Produce","95thPercentile"] |
Produce response send time, p99 | The time taken, in milliseconds, to send the response for 99th percentile. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request=Produce","99thPercentile"] |
Fetch-Consumer request total time, mean | Average time in ms to serve the Fetch-Consumer request. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=FetchConsumer","Mean"] |
Fetch-Consumer request total time, p95 | Time in ms to serve the Fetch-Consumer request for 95th percentile. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=FetchConsumer","95thPercentile"] |
Fetch-Consumer request total time, p99 | Time in ms to serve the Fetch-Consumer request for 99th percentile. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=FetchConsumer","99thPercentile"] |
Fetch-Follower request total time, mean | Average time in ms to serve the Fetch-Follower request. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=FetchFollower","Mean"] |
Fetch-Follower request total time, p95 | Time in ms to serve the Fetch-Follower request for 95th percentile. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=FetchFollower","95thPercentile"] |
Fetch-Follower request total time, p99 | Time in ms to serve the Fetch-Follower request for 99th percentile. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=FetchFollower","99thPercentile"] |
Produce request total time, mean | Average time in ms to serve the Produce request. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=Produce","Mean"] |
Produce request total time, p95 | Time in ms to serve the Produce requests for 95th percentile. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=Produce","95thPercentile"] |
Produce request total time, p99 | Time in ms to serve the Produce requests for 99th percentile. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=Produce","99thPercentile"] |
UpdateMetadata request total time, mean | Average time for a request to update metadata. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=UpdateMetadata","Mean"] |
UpdateMetadata request total time, p95 | Time for update metadata requests for 95th percentile. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=UpdateMetadata","95thPercentile"] |
UpdateMetadata request total time, p99 | Time for update metadata requests for 99th percentile. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=UpdateMetadata","99thPercentile"] |
Temporary memory size in bytes (Fetch), max | The maximum of temporary memory used for converting message formats and decompressing messages. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=TemporaryMemoryBytes,request=Fetch","Max"] |
Temporary memory size in bytes (Fetch), min | The minimum of temporary memory used for converting message formats and decompressing messages. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=TemporaryMemoryBytes,request=Fetch","Mean"] |
Temporary memory size in bytes (Produce), max | The maximum of temporary memory used for converting message formats and decompressing messages. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=TemporaryMemoryBytes,request=Produce","Max"] |
Temporary memory size in bytes (Produce), avg | The amount of temporary memory used for converting message formats and decompressing messages. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=TemporaryMemoryBytes,request=Produce","Mean"] |
Temporary memory size in bytes (Produce), min | The minimum of temporary memory used for converting message formats and decompressing messages. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=TemporaryMemoryBytes,request=Produce","Min"] |
Network processor average idle percent | The average percentage of time that the network processors are idle. |
JMX agent | jmx["kafka.network:type=SocketServer,name=NetworkProcessorAvgIdlePercent","Value"] Preprocessing
|
Requests in fetch purgatory | Number of requests waiting in fetch purgatory. |
JMX agent | jmx["kafka.server:type=DelayedOperationPurgatory,name=PurgatorySize,delayedOperation=Fetch","Value"] |
Requests in producer purgatory | Number of requests waiting in producer purgatory. |
JMX agent | jmx["kafka.server:type=DelayedOperationPurgatory,name=PurgatorySize,delayedOperation=Produce","Value"] |
Replication maximum lag | The maximum lag between the time that messages are received by the leader replica and by the follower replicas. |
JMX agent | jmx["kafka.server:type=ReplicaFetcherManager,name=MaxLag,clientId=Replica","Value"] |
Under minimum ISR partition count | The number of partitions under the minimum In-Sync Replica (ISR) count. |
JMX agent | jmx["kafka.server:type=ReplicaManager,name=UnderMinIsrPartitionCount","Value"] |
Under replicated partitions | The number of partitions that have not been fully replicated in the follower replicas (i.e. the number of non-reassigning replicas minus the number of in-sync replicas is greater than 0). |
JMX agent | jmx["kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions","Value"] |
ISR expands per second | The rate at which the number of ISRs in the broker increases. |
JMX agent | jmx["kafka.server:type=ReplicaManager,name=IsrExpandsPerSec","Count"] Preprocessing
|
ISR shrink per second | Rate of replicas leaving the ISR pool. |
JMX agent | jmx["kafka.server:type=ReplicaManager,name=IsrShrinksPerSec","Count"] Preprocessing
|
Leader count | The number of replicas for which this broker is the leader. |
JMX agent | jmx["kafka.server:type=ReplicaManager,name=LeaderCount","Value"] |
Partition count | The number of partitions in the broker. |
JMX agent | jmx["kafka.server:type=ReplicaManager,name=PartitionCount","Value"] |
Number of reassigning partitions | The number of reassigning leader partitions on a broker. |
JMX agent | jmx["kafka.server:type=ReplicaManager,name=ReassigningPartitions","Value"] |
Request queue size | The size of the delay queue. |
JMX agent | jmx["kafka.server:type=Request","queue-size"] |
Version | Current version of broker. |
JMX agent | jmx["kafka.server:type=app-info","version"] Preprocessing
|
Uptime | The service uptime expressed in seconds. |
JMX agent | jmx["kafka.server:type=app-info","start-time-ms"] Preprocessing
|
ZooKeeper client request latency | Latency in milliseconds for ZooKeeper requests from broker. |
JMX agent | jmx["kafka.server:type=ZooKeeperClientMetrics,name=ZooKeeperRequestLatencyMs","Count"] |
ZooKeeper connection status | Connection status of broker's ZooKeeper session. |
JMX agent | jmx["kafka.server:type=SessionExpireListener,name=SessionState","Value"] Preprocessing
|
ZooKeeper disconnect rate | ZooKeeper client disconnect per second. |
JMX agent | jmx["kafka.server:type=SessionExpireListener,name=ZooKeeperDisconnectsPerSec","Count"] Preprocessing
|
ZooKeeper session expiration rate | ZooKeeper client session expiration per second. |
JMX agent | jmx["kafka.server:type=SessionExpireListener,name=ZooKeeperExpiresPerSec","Count"] Preprocessing
|
ZooKeeper readonly rate | ZooKeeper client readonly per second. |
JMX agent | jmx["kafka.server:type=SessionExpireListener,name=ZooKeeperReadOnlyConnectsPerSec","Count"] Preprocessing
|
ZooKeeper sync rate | ZooKeeper client sync per second. |
JMX agent | jmx["kafka.server:type=SessionExpireListener,name=ZooKeeperSyncConnectsPerSec","Count"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Apache Kafka: Unclean leader election detected | Unclean leader elections occur when there is no qualified partition leader among Kafka brokers. If Kafka is configured to allow an unclean leader election, a leader is chosen from the out-of-sync replicas, and any messages that were not synced prior to the loss of the former leader are lost forever. Essentially, unclean leader elections sacrifice consistency for availability. |
last(/Apache Kafka by JMX/jmx["kafka.controller:type=ControllerStats,name=UncleanLeaderElectionsPerSec","Count"])>0 |Average |
||
Apache Kafka: There are offline log directories | The offline log directory count metric indicates the number of log directories which are offline (for example, due to a hardware failure), so the broker can no longer store incoming messages in them. |
last(/Apache Kafka by JMX/jmx["kafka.log:type=LogManager,name=OfflineLogDirectoryCount","Value"]) > 0 |Warning |
||
Apache Kafka: One or more partitions have no leader | Any partition without an active leader will be completely inaccessible, and both consumers and producers of that partition will be blocked until a leader becomes available. |
last(/Apache Kafka by JMX/jmx["kafka.controller:type=KafkaController,name=OfflinePartitionsCount","Value"]) > 0 |Warning |
||
Apache Kafka: Request handler average idle percent is too low | The request handler idle ratio metric indicates the percentage of time the request handlers are not in use. The lower this number, the more loaded the broker is. |
max(/Apache Kafka by JMX/jmx["kafka.server:type=KafkaRequestHandlerPool,name=RequestHandlerAvgIdlePercent","OneMinuteRate"],15m)<{$KAFKA.REQUEST_HANDLER_AVG_IDLE.MIN.WARN} |Average |
||
Apache Kafka: Network processor average idle percent is too low | The network processor idle ratio metric indicates the percentage of time the network processors are not in use. The lower this number, the more loaded the broker is. |
max(/Apache Kafka by JMX/jmx["kafka.network:type=SocketServer,name=NetworkProcessorAvgIdlePercent","Value"],15m)<{$KAFKA.NET_PROC_AVG_IDLE.MIN.WARN} |Average |
||
Apache Kafka: Failed to fetch info data | Zabbix has not received data for items for the last 15 minutes |
nodata(/Apache Kafka by JMX/jmx["kafka.network:type=SocketServer,name=NetworkProcessorAvgIdlePercent","Value"],15m)=1 |Warning |
||
Apache Kafka: There are partitions under the min ISR | The Under min ISR partitions metric displays the number of partitions, where the number of In-Sync Replicas (ISR) is less than the minimum number of in-sync replicas specified. The two most common causes of under-min ISR partitions are that one or more brokers is unresponsive, or the cluster is experiencing performance issues and one or more brokers are falling behind. |
last(/Apache Kafka by JMX/jmx["kafka.server:type=ReplicaManager,name=UnderMinIsrPartitionCount","Value"])>0 |Average |
||
Apache Kafka: There are under replicated partitions | The Under replicated partitions metric displays the number of partitions that do not have enough replicas to meet the desired replication factor. A partition will also be considered under-replicated if the correct number of replicas exist, but one or more of the replicas have fallen significantly behind the partition leader. The two most common causes of under-replicated partitions are that one or more brokers is unresponsive, or the cluster is experiencing performance issues and one or more brokers have fallen behind. |
last(/Apache Kafka by JMX/jmx["kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions","Value"])>0 |Average |
||
Apache Kafka: Version has changed | The Kafka version has changed. Acknowledge to close the problem manually. |
last(/Apache Kafka by JMX/jmx["kafka.server:type=app-info","version"],#1)<>last(/Apache Kafka by JMX/jmx["kafka.server:type=app-info","version"],#2) and length(last(/Apache Kafka by JMX/jmx["kafka.server:type=app-info","version"]))>0 |Info |
Manual close: Yes | |
Apache Kafka: Kafka service has been restarted | Uptime is less than 10 minutes. |
last(/Apache Kafka by JMX/jmx["kafka.server:type=app-info","start-time-ms"])<10m |Info |
Manual close: Yes | |
Apache Kafka: Broker is not connected to ZooKeeper | find(/Apache Kafka by JMX/jmx["kafka.server:type=SessionExpireListener,name=SessionState","Value"],,"regexp","CONNECTED")=0 |Average |
Name | Description | Type | Key and additional info |
---|---|---|---|
Topic Metrics (write) | JMX agent | jmx.discovery[beans,"kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec,topic=*"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Kafka {#JMXTOPIC}: Messages in per second | The rate at which individual messages are consumed by topic. |
JMX agent | jmx["kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec,topic={#JMXTOPIC}","Count"] Preprocessing
|
Kafka {#JMXTOPIC}: Bytes in per second | The rate at which data sent from producers is consumed by topic. |
JMX agent | jmx["kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec,topic={#JMXTOPIC}","Count"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Topic Metrics (read) | JMX agent | jmx.discovery[beans,"kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec,topic=*"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Kafka {#JMXTOPIC}: Bytes out per second | The rate at which data is fetched and read from the broker by consumers (by topic). |
JMX agent | jmx["kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec,topic={#JMXTOPIC}","Count"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Topic Metrics (errors) | JMX agent | jmx.discovery[beans,"kafka.server:type=BrokerTopicMetrics,name=BytesRejectedPerSec,topic=*"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Kafka {#JMXTOPIC}: Bytes rejected per second | Rejected bytes rate by topic. |
JMX agent | jmx["kafka.server:type=BrokerTopicMetrics,name=BytesRejectedPerSec,topic={#JMXTOPIC}","Count"] Preprocessing
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is used for monitoring Jira Data Center health. It is designed for standalone operation for on-premises Jira installations.
This template uses a single data source, JMX, which requires JMX RMI setup of your Jira application and Java Gateway setup on the Zabbix side. If you need "Garbage collector" and "Web server" monitoring, add "Generic Java JMX" and "Apache Tomcat by JMX" templates on the same host.
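A quick way to confirm the prerequisite is in place is to check that the Zabbix Java gateway host can actually reach the JMX RMI endpoint exposed by Jira. Below is a minimal Python sketch; the host name and port are placeholders (there is no fixed default port, so use whatever you configured for remote JMX), and it only verifies TCP reachability, not JMX authentication.

```python
import socket

# Placeholder values - substitute your Jira host and the JMX RMI port
# configured when remote JMX was enabled (no fixed default exists).
JIRA_HOST = "jira.example.com"
JMX_RMI_PORT = 8686

try:
    # A plain TCP connect only proves the port is open from this host;
    # it does not authenticate or speak the RMI/JMX protocol.
    with socket.create_connection((JIRA_HOST, JMX_RMI_PORT), timeout=5):
        print(f"JMX RMI port {JMX_RMI_PORT} on {JIRA_HOST} is reachable")
except OSError as exc:
    print(f"Cannot reach {JIRA_HOST}:{JMX_RMI_PORT}: {exc}")
```

Run this from the host where the Java gateway is installed; authentication itself is handled by the JMX macros described below.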
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Metrics are collected by JMX.
If JMX authentication is used, set the {$JMX.USER} and {$JMX.PASSWORD} macros.
Name | Description | Default |
---|---|---|
{$JMX.USER} | User for JMX. |
|
{$JMX.PASSWORD} | Password for JMX. |
|
{$JIRA_DC.LICENSE.USER.CAPACITY.WARN} | User capacity warning threshold (%). |
80 |
{$JIRA_DC.DB.CONNECTION.USAGE.WARN} | Warning threshold for database connections usage (%). |
80 |
{$JIRA_DC.ISSUE.LATENCY.WARN} | Warning threshold for issue operation latency (in seconds). |
5 |
{$JIRA_DC.STORAGE.LATENCY.WARN} | Warning threshold for storage write operation latency (in seconds). |
5 |
{$JIRA_DC.INDEXING.LATENCY.WARN} | Warning threshold for indexing operation latency (in seconds). |
5 |
{$JIRA_DC.LLD.FILTER.MATCHES.HOMEFOLDERS} | Used for storage metric discovery. |
local|share |
{$JIRA_DC.LLD.FILTER.NOT.MATCHES.HOMEFOLDERS} | Used for storage metric discovery. |
NO MATCH |
{$JIRA_DC.LLD.FILTER.MATCHES.INDEXING} | Used for indexing metric discovery. |
.* |
{$JIRA_DC.LLD.FILTER.NOT.MATCHES.INDEXING} | Used for indexing metric discovery. |
NO MATCH |
{$JIRA_DC.LLD.FILTER.MATCHES.ISSUE} | Used for issue discovery. |
.* |
{$JIRA_DC.LLD.FILTER.NOT.MATCHES.ISSUE} | Used for issue discovery. |
NO MATCH |
{$JIRA_DC.LLD.FILTER.MATCHES.MAIL} | Used for mail server connection metric discovery. |
.* |
{$JIRA_DC.LLD.FILTER.NOT.MATCHES.MAIL} | Used for mail server connection metric discovery. |
NO MATCH |
{$JIRA_DC.LLD.FILTER.MATCHES.LICENSE} | Used for license discovery. |
.* |
{$JIRA_DC.LLD.FILTER.NOT.MATCHES.LICENSE} | Used for license discovery. |
NO MATCH |
Name | Description | Type | Key and additional info |
---|---|---|---|
DB: Connections: State | The state of the database connection. |
JMX agent | jmx["com.atlassian.jira:type=metrics,category00=db,category01=connection,category02=state,name=value",Value] |
DB: Connections: Failed per minute | The count of database connection failures registered in one minute. Units: fpm - fails per minute. |
JMX agent | jmx["com.atlassian.jira:type=metrics,category00=db,category01=connection,category02=failures,name=counter",Count] Preprocessing
|
DB: Pool: Connections: Idle | Idle connection count of the database pool. |
JMX agent | jmx["com.atlassian.jira:type=metrics,category00=db,category01=connection,category02=pool,category03=numIdle,name=value",Value] |
DB: Pool: Connections: Active | Active connection count of the database pool. |
JMX agent | jmx["com.atlassian.jira:type=metrics,category00=db,category01=connection,category02=pool,category03=numActive,name=value",Value] |
DB: Reads | Database read operations from Jira per second. Units: rps - read operations per second. |
JMX agent | jmx["com.atlassian.jira:type=db.reads",invocation.count] Preprocessing
|
DB: Writes | Database write operations from Jira per second. Units: wps - write operations per second. |
JMX agent | jmx["com.atlassian.jira:type=db.writes",invocation.count] Preprocessing
|
DB: Connections: Limit | Total allowed database connection count. |
JMX agent | jmx["com.atlassian.jira:name=BasicDataSource,connectionpool=connections",MaxTotal] |
DB: Connections: Active | Active database connection count. |
JMX agent | jmx["com.atlassian.jira:name=BasicDataSource,connectionpool=connections",NumActive] |
DB: Connections: Latency | The latest measure of latency when querying the database. |
JMX agent | jmx["com.atlassian.jira:type=metrics,category00=db,category01=connection,category02=latency,name=value",Value] |
License: Users: Get | License data for the discovery rule. |
JMX agent | jmx.discovery[attributes,"com.atlassian.jira:type=jira.license"] Preprocessing
|
HTTP: Pool: Connections: Active | The latest measure of the number of active connections in the HTTP connection pool. |
JMX agent | jmx["com.atlassian.jira:type=metrics,category00=http,category01=connection,category02=pool,category03=numActive,name=value",Value] |
HTTP: Pool: Connections: Idle | The latest measure of the number of idle connections in the HTTP connection pool. |
JMX agent | jmx["com.atlassian.jira:type=metrics,category00=http,category01=connection,category02=pool,category03=numIdle,name=value",Value] |
HTTP: Sessions: Active | The latest measure of the number of active user sessions. |
JMX agent | jmx["com.atlassian.jira:type=metrics,category00=http,category01=connection,category02=sessions,category03=active,name=value",Value] |
HTTP: Requests per minute | The latest measure of the total number of HTTP requests per minute. |
JMX agent | jmx["com.atlassian.jira:type=metrics,category00=http,category01=requests,name=value",Value] |
Mail: Queue | The latest measure of the number of items in a mail queue. |
JMX agent | jmx["com.atlassian.jira:type=metrics,category00=mail,category01=queue,category02=numItems,name=value",Value] |
Mail: Queue: Error | The latest measure of the number of items in an error mail queue. |
JMX agent | jmx["com.atlassian.jira:type=metrics,category00=mail,category01=queue,category02=numErrors,name=value",Value] |
Mail: Sent per minute | The latest measure of the number of emails sent by the SMTP server per minute. |
JMX agent | jmx["com.atlassian.jira:type=metrics,category00=mail,category01=queue,category02=numEmailsSentPerMin,name=value",Value] |
Mail: Processed per minute | The latest measure of the number of items processed by a mail queue per minute. |
JMX agent | jmx["com.atlassian.jira:type=metrics,category00=mail,category01=queue,category02=numItemsProcessedPerMin,name=value",Value] |
Mail: Queue: Processing state | The latest indicator of the state of a mail queue job. |
JMX agent | jmx["com.atlassian.jira:type=metrics,category00=mail,category01=queue,category02=jobRunning,name=value",Value] |
Entity: Issues | The number of issues. |
JMX agent | jmx["com.atlassian.jira:type=entity.issues.total",Value] |
Entity: Attachments | The number of attachments. |
JMX agent | jmx["com.atlassian.jira:type=entity.attachments.total",Value] |
Entity: Components | The number of components. |
JMX agent | jmx["com.atlassian.jira:type=entity.components.total",Value] |
Entity: Custom fields | The number of custom fields. |
JMX agent | jmx["com.atlassian.jira:type=entity.customfields.total",Value] |
Entity: Filters | The number of filters. |
JMX agent | jmx["com.atlassian.jira:type=entity.filters.total",Value] |
Entity: Versions created | The number of versions created. |
JMX agent | jmx["com.atlassian.jira:type=entity.versions.total",Value] |
Issue: Search per minute | Issue searches performed per minute. |
JMX agent | jmx["com.atlassian.jira:type=issue.search.count",Value] Preprocessing
|
Issue: Created per minute | Issues created per minute. |
JMX agent | jmx["com.atlassian.jira:type=issue.created.count",Value] Preprocessing
|
Issue: Updates per minute | Issue updates performed per minute. |
JMX agent | jmx["com.atlassian.jira:type=issue.updated.count",Value] Preprocessing
|
Quicksearch: Concurrent searches | The number of concurrent searches that are being performed in real-time by using the quick search. |
JMX agent | jmx["com.atlassian.jira:type=quicksearch.concurrent.search",Value] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Jira Data Center: DB: Connection lost | Database connection lost |
max(/Jira Data Center by JMX/jmx["com.atlassian.jira:type=metrics,category00=db,category01=connection,category02=state,name=value",Value],3m)=0 |Average |
Manual close: Yes | |
Jira Data Center: DB: Pool: Out of idle connections | Fires when the database pool has had no idle connections for 5 minutes. |
min(/Jira Data Center by JMX/jmx["com.atlassian.jira:type=metrics,category00=db,category01=connection,category02=pool,category03=numIdle,name=value",Value],5m)<=0 |Warning |
Manual close: Yes | |
Jira Data Center: DB: Connection usage is near the limit | 100*min(/Jira Data Center by JMX/jmx["com.atlassian.jira:name=BasicDataSource,connectionpool=connections",NumActive],5m)/last(/Jira Data Center by JMX/jmx["com.atlassian.jira:name=BasicDataSource,connectionpool=connections",MaxTotal])>{$JIRA_DC.DB.CONNECTION.USAGE.WARN} |Warning |
Manual close: Yes | ||
Jira Data Center: DB: Connection limit reached | min(/Jira Data Center by JMX/jmx["com.atlassian.jira:name=BasicDataSource,connectionpool=connections",NumActive],5m)=last(/Jira Data Center by JMX/jmx["com.atlassian.jira:name=BasicDataSource,connectionpool=connections",MaxTotal]) |Warning |
Manual close: Yes | ||
Jira Data Center: HTTP: Pool: Out of idle connections | All available connections are utilized. It can cause outages for users as the system is unable to serve their requests. |
min(/Jira Data Center by JMX/jmx["com.atlassian.jira:type=metrics,category00=http,category01=connection,category02=pool,category03=numIdle,name=value",Value],5m)<=0 |Warning |
Manual close: Yes | |
Jira Data Center: Mail: Queue: Doesn’t empty over an extended period | Might indicate SMTP performance or connection problems. |
min(/Jira Data Center by JMX/jmx["com.atlassian.jira:type=metrics,category00=mail,category01=queue,category02=numItems,name=value",Value],30m)>0 |Warning |
Manual close: Yes Depends on:
|
|
Jira Data Center: Mail: Error queue contains one or more items | A mail queue attempts to resend items up to 10 times. If the operation fails for the 11th time, the items are put into an error mail queue. |
max(/Jira Data Center by JMX/jmx["com.atlassian.jira:type=metrics,category00=mail,category01=queue,category02=numErrors,name=value",Value],5m)>0 |Warning |
Manual close: Yes | |
Jira Data Center: Mail: Queue job is not running | It should be running when its queue is not empty. |
max(/Jira Data Center by JMX/jmx["com.atlassian.jira:type=metrics,category00=mail,category01=queue,category02=jobRunning,name=value",Value],15m)=0 and min(/Jira Data Center by JMX/jmx["com.atlassian.jira:type=metrics,category00=mail,category01=queue,category02=numItems,name=value",Value],15m)>0 |Average |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Storage discovery | Discovery of the Jira storage metrics. |
JMX agent | jmx.discovery[beans,"com.atlassian.jira:type=metrics,category00=home,category01=,category02=write,category03=latency,,name=value"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Storage [{#JMXCATEGORY01}]: Latency | The median latency of writing a small file (~30 bytes) to |
JMX agent | jmx["{#JMXOBJ}",Value] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Jira Data Center: Storage [{#JMXCATEGORY01}]: Slow performance | Fires when latency grows above the threshold: |
min(/Jira Data Center by JMX/jmx["{#JMXOBJ}",Value],5m)>{$JIRA_DC.STORAGE.LATENCY.WARN:"{#JMXCATEGORY01}"} |Warning |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Mail server discovery | Discovery of the Jira connected mail servers. |
JMX agent | jmx.discovery[beans,"com.atlassian.jira:type=metrics,category00=mail,category01=,category02=connection,category03=state,name="] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Mail [{#JMXCATEGORY01},{#JMXNAME}]: Connection state | Shows connection state of Jira to discovered mail server: |
JMX agent | jmx["{#JMXOBJ}",Connected] Preprocessing
|
Mail [{#JMXCATEGORY01},{#JMXNAME}]: Failures per minute | Count of failed connections to discovered mail server |
JMX agent | jmx["{#JMXOBJ}",TotalFailures] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Jira Data Center: Mail [{#JMXCATEGORY01}-{#JMXNAME}]: Server disconnected | Trigger is fired when discovered mail server |
max(/Jira Data Center by JMX/jmx["{#JMXOBJ}",Connected],5m)=0 |Average |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Indexing latency discovery | Discovery of the Jira indexing metrics. |
JMX agent | jmx.discovery[beans,"com.atlassian.jira:type=metrics,category00=indexing,name=*"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Indexing [{#JMXNAME}]: Latency | Average time spent on indexing operations. |
JMX agent | jmx["com.atlassian.jira:type=metrics,category00=indexing,name={#JMXNAME}",Mean] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Jira Data Center: Indexing [{#JMXNAME}]: Slow performance | Fires when latency grows above the threshold: |
min(/Jira Data Center by JMX/jmx["com.atlassian.jira:type=metrics,category00=indexing,name={#JMXNAME}",Mean],5m)>{$JIRA_DC.INDEXING.LATENCY.WARN:"{#JMXNAME}"} |Warning |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Issue latency discovery | Discovery of the Jira issue latency metrics. |
JMX agent | jmx.discovery[beans,"com.atlassian.jira:type=metrics,category00=issue,name=*"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Issue [{#JMXNAME}]: Latency | Average time spent on issue |
JMX agent | jmx["{#JMXOBJ}",Mean] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Jira Data Center: Issue [{#JMXNAME}]: Slow operations | Fires when latency grows above the threshold: |
min(/Jira Data Center by JMX/jmx["{#JMXOBJ}",Mean],5m)>{$JIRA_DC.ISSUE.LATENCY.WARN:"{#JMXNAME}"} |Warning |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
License discovery | Discovery of the Jira licenses. |
Dependent item | jmx.license.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
License [{#LICENSE.TYPE}]: Users: Current | Current user count for |
Dependent item | jmx.license.get.user.current["{#LICENSE.TYPE}"] Preprocessing
|
License [{#LICENSE.TYPE}]: Users: Maximum | User count limit for
|
Dependent item | jmx.license.get.user.max["{#LICENSE.TYPE}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Jira Data Center: License [{#LICENSE.TYPE}]: Low user capacity | Fires when relative user quantity grows above the threshold: |
last(/Jira Data Center by JMX/jmx.license.get.user.max["{#LICENSE.TYPE}"])>=0 * (100*last(/Jira Data Center by JMX/jmx.license.get.user.current["{#LICENSE.TYPE}"])/last(/Jira Data Center by JMX/jmx.license.get.user.max["{#LICENSE.TYPE}"])>{$JIRA_DC.LICENSE.USER.CAPACITY.WARN:"{#LICENSE.TYPE}"}) |Warning |
Manual close: Yes Depends on:
|
|
Jira Data Center: License [{#LICENSE.TYPE}]: User count reached the limit | Fires when user quantity reaches the limit. |
last(/Jira Data Center by JMX/jmx.license.get.user.max["{#LICENSE.TYPE}"])>=0 * ((last(/Jira Data Center by JMX/jmx.license.get.user.max["{#LICENSE.TYPE}"])-last(/Jira Data Center by JMX/jmx.license.get.user.current["{#LICENSE.TYPE}"]))<=0) |Average |
Manual close: Yes |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template to monitor Jenkins by Zabbix that works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Metrics are collected by requests to the Metrics API. For common metrics: install and configure the Metrics plugin according to the official documentation. Do not forget to configure access to the Metrics Servlet by issuing an API key and setting the {$JENKINS.API.KEY} macro.
For monitoring computers and builds: create an API token for the monitoring user according to the official documentation and set the {$JENKINS.USER} and {$JENKINS.API.TOKEN} macros. Don't forget to set the {$JENKINS.URL} macro.
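As a sanity check of both sets of credentials, the two endpoints the template polls can be queried directly. Below is a minimal Python sketch; the URL, Metrics Servlet key, user, and API token are placeholders, and the gauge name `jenkins.executor.count.value` is assumed from the Metrics plugin output and may differ in your version.

```python
import base64
import json
import urllib.request

# Placeholders - mirror the {$JENKINS.URL}, {$JENKINS.API.KEY},
# {$JENKINS.USER} and {$JENKINS.API.TOKEN} macro values.
JENKINS_URL = "http://jenkins.example.com:8080"
METRICS_KEY = "metrics-servlet-api-key"
USER = "zabbix"
API_TOKEN = "jenkins-api-token"

# 1. Common metrics come from the Metrics Servlet (gauges, meters, timers).
with urllib.request.urlopen(f"{JENKINS_URL}/metrics/{METRICS_KEY}/metrics") as resp:
    metrics = json.load(resp)
# Gauge name assumed from the Metrics plugin; adjust if your version differs.
print("Executors:", metrics["gauges"]["jenkins.executor.count.value"]["value"])

# 2. Jobs and computers come from the JSON API with HTTP basic authentication.
request = urllib.request.Request(f"{JENKINS_URL}/api/json?tree=jobs[name,url]")
credentials = base64.b64encode(f"{USER}:{API_TOKEN}".encode()).decode()
request.add_header("Authorization", f"Basic {credentials}")
with urllib.request.urlopen(request) as resp:
    jobs = json.load(resp)["jobs"]
print("Jobs:", [job["name"] for job in jobs])
```

If both requests return JSON, the macros below can be filled in with the same values.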
Name | Description | Default |
---|---|---|
{$JENKINS.URL} | Jenkins URL in the format |
|
{$JENKINS.API.KEY} | API key to access Metrics Servlet |
|
{$JENKINS.USER} | Username for HTTP BASIC authentication |
zabbix |
{$JENKINS.API.TOKEN} | API token for HTTP BASIC authentication. |
|
{$JENKINS.PING.REPLY} | Expected reply to the ping. |
pong |
{$JENKINS.FILE_DESCRIPTORS.MAX.WARN} | Maximum percentage of file descriptors usage alert threshold (for trigger expression). |
85 |
{$JENKINS.JOB.HEALTH.SCORE.MIN.WARN} | Minimum job's health score (for trigger expression). |
50 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Get service metrics | HTTP agent | jenkins.get_metrics Preprocessing
|
|
Get healthcheck | HTTP agent | jenkins.healthcheck Preprocessing
|
|
Get jobs info | HTTP agent | jenkins.job_info Preprocessing
|
|
Get computer info | HTTP agent | jenkins.computer_info Preprocessing
|
|
Disk space check message | The message will reference the first node which fails this check. There may be other nodes that fail the check, but this health check is designed to fail fast. |
Dependent item | jenkins.disk_space.message Preprocessing
|
Temporary space check message | The message will reference the first node which fails this check. There may be other nodes that fail the check, but this health check is designed to fail fast. |
Dependent item | jenkins.temporary_space.message Preprocessing
|
Plugins check message | The message of plugins health check. |
Dependent item | jenkins.plugins.message Preprocessing
|
Thread deadlock check message | The message of thread deadlock health check. |
Dependent item | jenkins.thread_deadlock.message Preprocessing
|
Disk space check | Returns FAIL if any of the Jenkins disk space monitors are reporting the disk space as less than the configured threshold. |
Dependent item | jenkins.disk_space Preprocessing
|
Plugins check | Returns FAIL if any of the Jenkins plugins failed to start. |
Dependent item | jenkins.plugins Preprocessing
|
Temporary space check | Returns FAIL if any of the Jenkins temporary space monitors are reporting the temporary space as less than the configured threshold. |
Dependent item | jenkins.temporary_space Preprocessing
|
Thread deadlock check | Returns FAIL if there are any deadlocked threads in the Jenkins master JVM. |
Dependent item | jenkins.thread_deadlock Preprocessing
|
Get gauges | Raw items for gauges metrics. |
Dependent item | jenkins.gauges.raw Preprocessing
|
Executors count | The number of executors available to Jenkins. This corresponds to the sum of all the executors of all the online nodes. |
Dependent item | jenkins.executor.count Preprocessing
|
Executors free | The number of executors available to Jenkins that are not currently in use. |
Dependent item | jenkins.executor.free Preprocessing
|
Executors in use | The number of executors available to Jenkins that are currently in use. |
Dependent item | jenkins.executor.in_use Preprocessing
|
Nodes count | The number of build nodes available to Jenkins, both online and offline. |
Dependent item | jenkins.node.count Preprocessing
|
Nodes offline | The number of build nodes available to Jenkins but currently offline. |
Dependent item | jenkins.node.offline Preprocessing
|
Nodes online | The number of build nodes available to Jenkins and currently online. |
Dependent item | jenkins.node.online Preprocessing
|
Plugins active | The number of plugins in the Jenkins instance that started successfully. |
Dependent item | jenkins.plugins.active Preprocessing
|
Plugins failed | The number of plugins in the Jenkins instance that failed to start. A value other than 0 is typically indicative of a potential issue within the Jenkins installation that will either be solved by explicitly disabling the plugin(s) or by resolving the plugin dependency issues. |
Dependent item | jenkins.plugins.failed Preprocessing
|
Plugins inactive | The number of plugins in the Jenkins instance that are not currently enabled. |
Dependent item | jenkins.plugins.inactive Preprocessing
|
Plugins with update | The number of plugins in the Jenkins instance that have a newer version reported as available in the current Jenkins update center metadata held by Jenkins. This value is not indicative of an issue with Jenkins but high values can be used as a trigger to review the plugins with updates with a view to seeing whether those updates potentially contain fixes for issues that could be affecting your Jenkins instance. |
Dependent item | jenkins.plugins.with_update Preprocessing
|
Projects count | The number of projects. |
Dependent item | jenkins.project.count Preprocessing
|
Jobs count | The number of jobs in Jenkins. |
Dependent item | jenkins.job.count.value Preprocessing
|
Get meters | Raw items for meters metrics. |
Dependent item | jenkins.meters.raw Preprocessing
|
Job scheduled, m1 rate | The rate at which jobs are scheduled. If a job is already in the queue and an identical request for scheduling the job is received then Jenkins will coalesce the two requests. This metric gives a reasonably pure measure of the load requirements of the Jenkins master as it is unaffected by the number of executors available to the system. |
Dependent item | jenkins.job.scheduled.m1.rate Preprocessing
|
Jobs scheduled, m5 rate | The rate at which jobs are scheduled. If a job is already in the queue and an identical request for scheduling the job is received then Jenkins will coalesce the two requests. This metric gives a reasonably pure measure of the load requirements of the Jenkins master as it is unaffected by the number of executors available to the system. |
Dependent item | jenkins.job.scheduled.m5.rate Preprocessing
|
Get timers | Raw items for timers metrics. |
Dependent item | jenkins.timers.raw Preprocessing
|
Job blocked, m1 rate | The rate at which jobs in the build queue enter the blocked state. |
Dependent item | jenkins.job.blocked.m1.rate Preprocessing
|
Job blocked, m5 rate | The rate at which jobs in the build queue enter the blocked state. |
Dependent item | jenkins.job.blocked.m5.rate Preprocessing
|
Job blocked duration, p95 | The amount of time which jobs spend in the blocked state. |
Dependent item | jenkins.job.blocked.duration.p95 Preprocessing
|
Job blocked duration, median | The amount of time which jobs spend in the blocked state. |
Dependent item | jenkins.job.blocked.duration.p50 Preprocessing
|
Job building, m1 rate | The rate at which jobs are built. |
Dependent item | jenkins.job.building.m1.rate Preprocessing
|
Job building, m5 rate | The rate at which jobs are built. |
Dependent item | jenkins.job.building.m5.rate Preprocessing
|
Job building duration, p95 | The amount of time which jobs spend building. |
Dependent item | jenkins.job.building.duration.p95 Preprocessing
|
Job building duration, median | The amount of time which jobs spend building. |
Dependent item | jenkins.job.building.duration.p50 Preprocessing
|
Job buildable, m1 rate | The rate at which jobs in the build queue enter the buildable state. |
Dependent item | jenkins.job.buildable.m1.rate Preprocessing
|
Job buildable, m5 rate | The rate at which jobs in the build queue enter the buildable state. |
Dependent item | jenkins.job.buildable.m5.rate Preprocessing
|
Job buildable duration, p95 | The amount of time which jobs spend in the buildable state. |
Dependent item | jenkins.job.buildable.duration.p95 Preprocessing
|
Job buildable duration, median | The amount of time which jobs spend in the buildable state. |
Dependent item | jenkins.job.buildable.duration.p50 Preprocessing
|
Job queuing, m1 rate | The rate at which jobs are queued. |
Dependent item | jenkins.job.queuing.m1.rate Preprocessing
|
Job queuing, m5 rate | The rate at which jobs are queued. |
Dependent item | jenkins.job.queuing.m5.rate Preprocessing
|
Job queuing duration, p95 | The total time which jobs spend in the build queue. |
Dependent item | jenkins.job.queuing.duration.p95 Preprocessing
|
Job queuing duration, median | The total time which jobs spend in the build queue. |
Dependent item | jenkins.job.queuing.duration.p50 Preprocessing
|
Job total, m1 rate | The rate at which jobs are queued. |
Dependent item | jenkins.job.total.m1.rate Preprocessing
|
Job total, m5 rate | The rate at which jobs are queued. |
Dependent item | jenkins.job.total.m5.rate Preprocessing
|
Job total duration, p95 | The total time which jobs spend from entering the build queue to completing building. |
Dependent item | jenkins.job.total.duration.p95 Preprocessing
|
Job total duration, median | The total time which jobs spend from entering the build queue to completing building. |
Dependent item | jenkins.job.total.duration.p50 Preprocessing
|
Job waiting, m1 rate | The rate at which jobs enter the quiet period. |
Dependent item | jenkins.job.waiting.m1.rate Preprocessing
|
Job waiting, m5 rate | The rate at which jobs enter the quiet period. |
Dependent item | jenkins.job.waiting.m5.rate Preprocessing
|
Job waiting duration, p95 | The total amount of time that jobs spend in their quiet period. |
Dependent item | jenkins.job.waiting.duration.p95 Preprocessing
|
Job waiting duration, median | The total amount of time that jobs spend in their quiet period. |
Dependent item | jenkins.job.waiting.duration.p50 Preprocessing
|
Build queue, blocked | The number of jobs that are in the Jenkins build queue and currently in the blocked state. |
Dependent item | jenkins.queue.blocked Preprocessing
|
Build queue, size | The number of jobs that are in the Jenkins build queue. |
Dependent item | jenkins.queue.size Preprocessing
|
Build queue, buildable | The number of jobs that are in the Jenkins build queue and currently in the buildable state. |
Dependent item | jenkins.queue.buildable Preprocessing
|
Build queue, pending | The number of jobs that are in the Jenkins build queue and currently in the pending state. |
Dependent item | jenkins.queue.pending Preprocessing
|
Build queue, stuck | The number of jobs that are in the Jenkins build queue and currently in the stuck state. |
Dependent item | jenkins.queue.stuck Preprocessing
|
HTTP active requests, rate | The number of currently active requests against the Jenkins master Web UI. |
Dependent item | jenkins.http.active_requests.rate Preprocessing
|
HTTP response 400, rate | The rate at which the Jenkins master Web UI is responding to requests with an HTTP/400 status code. |
Dependent item | jenkins.http.bad_request.rate Preprocessing
|
HTTP response 500, rate | The rate at which the Jenkins master Web UI is responding to requests with an HTTP/500 status code. |
Dependent item | jenkins.http.server_error.rate Preprocessing
|
HTTP response 503, rate | The rate at which the Jenkins master Web UI is responding to requests with an HTTP/503 status code. |
Dependent item | jenkins.http.service_unavailable.rate Preprocessing
|
HTTP response 200, rate | The rate at which the Jenkins master Web UI is responding to requests with an HTTP/200 status code. |
Dependent item | jenkins.http.ok.rate Preprocessing
|
HTTP response other, rate | The rate at which the Jenkins master Web UI is responding to requests with a non-informational status code that is not in the list: HTTP/200, HTTP/201, HTTP/204, HTTP/304, HTTP/400, HTTP/403, HTTP/404, HTTP/500, or HTTP/503. |
Dependent item | jenkins.http.other.rate Preprocessing
|
HTTP response 201, rate | The rate at which the Jenkins master Web UI is responding to requests with an HTTP/201 status code. |
Dependent item | jenkins.http.created.rate Preprocessing
|
HTTP response 204, rate | The rate at which the Jenkins master Web UI is responding to requests with an HTTP/204 status code. |
Dependent item | jenkins.http.no_content.rate Preprocessing
|
HTTP response 404, rate | The rate at which the Jenkins master Web UI is responding to requests with an HTTP/404 status code. |
Dependent item | jenkins.http.not_found.rate Preprocessing
|
HTTP response 304, rate | The rate at which the Jenkins master Web UI is responding to requests with an HTTP/304 status code. |
Dependent item | jenkins.http.not_modified.rate Preprocessing
|
HTTP response 403, rate | The rate at which the Jenkins master Web UI is responding to requests with an HTTP/403 status code. |
Dependent item | jenkins.http.forbidden.rate Preprocessing
|
HTTP requests, rate | The rate at which the Jenkins master Web UI is receiving requests. |
Dependent item | jenkins.http.requests.rate Preprocessing
|
HTTP requests, p95 | The time spent generating the corresponding responses. |
Dependent item | jenkins.http.requests_p95.rate Preprocessing
|
HTTP requests, median | The time spent generating the corresponding responses. |
Dependent item | jenkins.http.requests_p50.rate Preprocessing
|
Version | Version of Jenkins server. |
Dependent item | jenkins.version Preprocessing
|
CPU Load | The system load on the Jenkins master as reported by the JVM's Operating System JMX bean. The calculation of system load is operating system dependent. Typically this is the sum of the number of processes that are currently running plus the number that are waiting to run. This is typically comparable against the number of CPU cores. |
Dependent item | jenkins.system.cpu.load Preprocessing
|
Uptime | The number of seconds since the Jenkins master JVM started. |
Dependent item | jenkins.system.uptime Preprocessing
|
File descriptor ratio | The ratio of used to total file descriptors |
Dependent item | jenkins.descriptor.ratio Preprocessing
|
Service ping | HTTP agent | jenkins.ping Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Jenkins: Disk space is too low | Jenkins disk space monitors are reporting the disk space as less than the configured threshold. The message will reference the first node which fails this check. |
last(/Jenkins by HTTP/jenkins.disk_space)=0 and length(last(/Jenkins by HTTP/jenkins.disk_space.message))>0 |Warning |
||
Jenkins: One or more Jenkins plugins failed to start | A failure is typically indicative of a potential issue within the Jenkins installation that will either be solved by explicitly disabling the failing plugin(s) or by resolving the corresponding plugin dependency issues. |
last(/Jenkins by HTTP/jenkins.plugins)=0 and length(last(/Jenkins by HTTP/jenkins.plugins.message))>0 |Info |
Manual close: Yes | |
Jenkins: Temporary space is too low | Jenkins temporary space monitors are reporting the temporary space as less than the configured threshold. The message will reference the first node which fails this check. |
last(/Jenkins by HTTP/jenkins.temporary_space)=0 and length(last(/Jenkins by HTTP/jenkins.temporary_space.message))>0 |Warning |
||
Jenkins: There are deadlocked threads in Jenkins master JVM | There are deadlocked threads in the Jenkins master JVM. |
last(/Jenkins by HTTP/jenkins.thread_deadlock)=0 and length(last(/Jenkins by HTTP/jenkins.thread_deadlock.message))>0 |Warning |
||
Jenkins: Service has no online nodes | last(/Jenkins by HTTP/jenkins.node.online)=0 |Average |
|||
Jenkins: Version has changed | The Jenkins version has changed. Acknowledge to close the problem manually. |
last(/Jenkins by HTTP/jenkins.version,#1)<>last(/Jenkins by HTTP/jenkins.version,#2) and length(last(/Jenkins by HTTP/jenkins.version))>0 |Info |
Manual close: Yes | |
Jenkins: Host has been restarted | Uptime is less than 10 minutes. |
last(/Jenkins by HTTP/jenkins.system.uptime)<10m |Info |
Manual close: Yes | |
Jenkins: Current number of used files is too high | min(/Jenkins by HTTP/jenkins.descriptor.ratio,5m)>{$JENKINS.FILE_DESCRIPTORS.MAX.WARN} |Warning |
|||
Jenkins: Service is down | last(/Jenkins by HTTP/jenkins.ping)=0 |Average |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Jobs discovery | HTTP agent | jenkins.jobs Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Job [{#NAME}]: Get job | Raw data for a job. |
Dependent item | jenkins.job.get[{#NAME}] Preprocessing
|
Job [{#NAME}]: Health score | Represents the health of the project as a number between 0 and 100. Job description: {#DESCRIPTION} Job URL: {#URL} |
Dependent item | jenkins.build.health[{#NAME}] Preprocessing
|
Job [{#NAME}]: Last Build number | Details: {#URL}/lastBuild/ |
Dependent item | jenkins.last_build.number[{#NAME}] Preprocessing
|
Job [{#NAME}]: Last Build duration | Build duration (in seconds). |
Dependent item | jenkins.last_build.duration[{#NAME}] Preprocessing
|
Job [{#NAME}]: Last Build timestamp | Dependent item | jenkins.last_build.timestamp[{#NAME}] Preprocessing
|
|
Job [{#NAME}]: Last Build result | Dependent item | jenkins.last_build.result[{#NAME}] Preprocessing
|
|
Job [{#NAME}]: Last Failed Build number | Details: {#URL}/lastFailedBuild/ |
Dependent item | jenkins.last_failed_build.number[{#NAME}] Preprocessing
|
Job [{#NAME}]: Last Failed Build duration | Build duration (in seconds). |
Dependent item | jenkins.last_failed_build.duration[{#NAME}] Preprocessing
|
Job [{#NAME}]: Last Failed Build timestamp | Dependent item | jenkins.lastfailedbuild.timestamp[{#NAME}] Preprocessing
|
|
Job [{#NAME}]: Last Successful Build number | Details: {#URL}/lastSuccessfulBuild/ |
Dependent item | jenkins.last_successful_build.number[{#NAME}] Preprocessing
|
Job [{#NAME}]: Last Successful Build duration | Build duration (in seconds). |
Dependent item | jenkins.last_successful_build.duration[{#NAME}] Preprocessing
|
Job [{#NAME}]: Last Successful Build timestamp | Dependent item | jenkins.lastsuccessfulbuild.timestamp[{#NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Jenkins: Job [{#NAME}]: Job is unhealthy | last(/Jenkins by HTTP/jenkins.build.health[{#NAME}])<{$JENKINS.JOB.HEALTH.SCORE.MIN.WARN} |Warning |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Computers discovery | HTTP agent | jenkins.computers Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Computer [{#DISPLAY_NAME}]: Get computer | Raw data for a computer. |
Dependent item | jenkins.computer.get[{#DISPLAY_NAME}] Preprocessing
|
Computer [{#DISPLAY_NAME}]: Executors | The maximum number of concurrent builds that Jenkins may perform on this node. |
Dependent item | jenkins.computer.numExecutors[{#DISPLAY_NAME}] Preprocessing
|
Computer [{#DISPLAY_NAME}]: State | Represents the actual online/offline state. Node description: {#DESCRIPTION} |
Dependent item | jenkins.computer.state[{#DISPLAY_NAME}] Preprocessing
|
Computer [{#DISPLAY_NAME}]: Offline cause reason | If the computer is offline (either temporarily or not), returns the cause as a string (without user info). Returns an empty string if the system was put offline without a given cause. |
Dependent item | jenkins.computer.offline.reason[{#DISPLAY_NAME}] Preprocessing
|
Computer [{#DISPLAY_NAME}]: Idle | Returns true if all the executors of this computer are idle. |
Dependent item | jenkins.computer.idle[{#DISPLAY_NAME}] Preprocessing
|
Computer [{#DISPLAY_NAME}]: Temporarily offline | Returns true if this node is marked temporarily offline. |
Dependent item | jenkins.computer.temp_offline[{#DISPLAY_NAME}] Preprocessing
|
Computer [{#DISPLAY_NAME}]: Available disk space | The available disk space of $JENKINS_HOME on agent. |
Dependent item | jenkins.computer.disk_space[{#DISPLAY_NAME}] Preprocessing
|
Computer [{#DISPLAY_NAME}]: Available temp space | The available disk space of the temporary directory. Java tools and tests/builds often create files in the temporary directory, and may not function properly if there's no available space. |
Dependent item | jenkins.computer.temp_space[{#DISPLAY_NAME}] Preprocessing
|
Computer [{#DISPLAY_NAME}]: Response time average | The round trip network response time from the master to the agent |
Dependent item | jenkins.computer.response_time[{#DISPLAY_NAME}] Preprocessing
|
Computer [{#DISPLAY_NAME}]: Available physical memory | The available physical memory of the system, in bytes. |
Dependent item | jenkins.computer.available_physical_memory[{#DISPLAY_NAME}] Preprocessing
|
Computer [{#DISPLAY_NAME}]: Available swap space | Available swap space in bytes. |
Dependent item | jenkins.computer.available_swap_space[{#DISPLAY_NAME}] Preprocessing
|
Computer [{#DISPLAY_NAME}]: Total physical memory | Total physical memory of the system, in bytes. |
Dependent item | jenkins.computer.total_physical_memory[{#DISPLAY_NAME}] Preprocessing
|
Computer [{#DISPLAY_NAME}]: Total swap space | Total swap space of the system, in bytes. |
Dependent item | jenkins.computer.total_swap_space[{#DISPLAY_NAME}] Preprocessing
|
Computer [{#DISPLAY_NAME}]: Clock difference | The clock difference between the master and nodes. |
Dependent item | jenkins.computer.clock_difference[{#DISPLAY_NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Jenkins: Computer [{#DISPLAY_NAME}]: Node is down | Node down with reason: {{ITEM.LASTVALUE2}.regsub("(.*)",\1)} |
last(/Jenkins by HTTP/jenkins.computer.state[{#DISPLAY_NAME}])=1 and length(last(/Jenkins by HTTP/jenkins.computer.offline.reason[{#DISPLAY_NAME}]))>0 |Average |
Depends on:
|
|
Jenkins: Computer [{#DISPLAY_NAME}]: Node is temporarily offline | Node is temporarily Offline with reason: {{ITEM.LASTVALUE2}.regsub("(.*)",\1)} |
last(/Jenkins by HTTP/jenkins.computer.temp_offline[{#DISPLAY_NAME}])=1 and length(last(/Jenkins by HTTP/jenkins.computer.offline.reason[{#DISPLAY_NAME}]))>0 |Info |
Manual close: Yes |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template to monitor IIS (Internet Information Services) by Zabbix that works without any external scripts.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
You have to enable the following Windows Features (Control Panel > Programs and Features > Turn Windows features on or off) on your server; a scripted alternative is sketched after this list:
Web Server (IIS)
Web Server (IIS)\Management Tools\IIS Management Scripts and Tools
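If you prefer to script this step, the same features can be enabled from an elevated prompt with DISM. Below is a minimal Python sketch; the DISM feature names are assumptions, so verify them against your Windows version with `dism /online /get-features` before running it.

```python
import subprocess

# Assumed DISM feature names for "Web Server (IIS)" and
# "IIS Management Scripts and Tools"; verify with: dism /online /get-features
FEATURES = ["IIS-WebServerRole", "IIS-ManagementScriptingTools"]

for feature in FEATURES:
    # /all also enables any required parent features; run from an
    # elevated (administrator) prompt.
    subprocess.run(
        ["dism", "/online", "/enable-feature", f"/featurename:{feature}", "/all"],
        check=True,
    )
```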
Optionally, it is possible to customize the template:
Name | Description | Default |
---|---|---|
{$IIS.PORT} | Listening port. |
80 |
{$IIS.SERVICE} | The service (http/https/etc) for port check. See "net.tcp.service" documentation page for more information: https://www.zabbix.com/documentation/7.0/manual/config/items/itemtypes/simple_checks |
http |
{$IIS.QUEUE.MAX.WARN} | Maximum application pool's request queue length for trigger expression. |
|
{$IIS.QUEUE.MAX.TIME} | The time during which the queue length may exceed the threshold. |
5m |
{$IIS.APPPOOL.NOT_MATCHES} | This macro is used in application pools discovery. Can be overridden on the host or linked template level. |
<CHANGE_IF_NEEDED> |
{$IIS.APPPOOL.MATCHES} | This macro is used in application pools discovery. Can be overridden on the host or linked template level. |
.+ |
{$IIS.APPPOOL.MONITORED} | Monitoring status for discovered application pools. Use context to avoid trigger firing for specific application pools. "1" - enabled, "0" - disabled. |
1 |
{$AGENT.TIMEOUT} | Timeout after which agent is considered unavailable. |
5m |
Name | Description | Type | Key and additional info |
---|---|---|---|
World Wide Web Publishing Service (W3SVC) state | The World Wide Web Publishing Service (W3SVC) provides web connectivity and administration of websites through the IIS snap-in. If the World Wide Web Publishing Service stops, the operating system cannot serve any form of web request. This service depends on the "Windows Process Activation Service". |
Zabbix agent (active) | service.info[W3SVC] Preprocessing
|
Windows Process Activation Service (WAS) state | Windows Process Activation Service (WAS) is a tool for managing worker processes that contain applications that host Windows Communication Foundation (WCF) services. Worker processes handle requests that are sent to a Web Server for specific application pools. Each application pool sets boundaries for the applications it contains. |
Zabbix agent (active) | service.info[WAS] Preprocessing
|
{$IIS.PORT} port ping | Simple check | net.tcp.service[{$IIS.SERVICE},,{$IIS.PORT}] Preprocessing
|
|
Uptime | The service uptime expressed in seconds. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Service Uptime"] |
Bytes Received per second | The average rate per minute at which data bytes are received by the service at the Application Layer. Does not include protocol headers or control bytes. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Bytes Received/sec", 60] |
Bytes Sent per second | The average rate per minute at which data bytes are sent by the service. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Bytes Sent/sec", 60] |
Bytes Total per second | The average rate per minute of total bytes/sec transferred by the Web service (sum of bytes sent/sec and bytes received/sec). |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Bytes Total/Sec", 60] |
Current connections | The number of active connections. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Current Connections"] |
Total connection attempts | The total number of connections to the Web or FTP service that have been attempted since service startup. The count is the total for all Web sites or FTP sites combined. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Total Connection Attempts (all instances)"] Preprocessing
|
Connection attempts per second | The average rate per minute that connections using the Web service are being attempted. The count is the average for all Web sites combined. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Connection Attempts/Sec", 60] Preprocessing
|
Anonymous users per second | The number of requests from users over an anonymous connection per second. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Anonymous Users/sec", 60] |
NonAnonymous users per second | The number of requests from users over a non-anonymous connection per second. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\NonAnonymous Users/sec", 60] |
Method GET requests per second | The rate of HTTP requests made using the GET method. GET requests are generally used for basic file retrievals or image maps, though they can be used with forms. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Get Requests/Sec", 60] Preprocessing
|
Method COPY requests per second | The rate of HTTP requests made using the COPY method. Copy requests are used for copying files and directories. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Copy Requests/Sec", 60] Preprocessing
|
Method CGI requests per second | The rate of CGI requests that are simultaneously being processed by the Web service. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\CGI Requests/Sec", 60] Preprocessing
|
Method DELETE requests per second | The rate of HTTP requests made using the DELETE method. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Delete Requests/Sec", 60] Preprocessing
|
Method HEAD requests per second | The rate of HTTP requests made using the HEAD method. HEAD requests generally indicate a client is querying the state of a document they already have to see if it needs to be refreshed. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Head Requests/Sec", 60] Preprocessing
|
Method ISAPI requests per second | The rate of ISAPI Extension requests that are simultaneously being processed by the Web service. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\ISAPI Extension Requests/Sec", 60] Preprocessing
|
Method LOCK requests per second | The rate of HTTP requests made using the LOCK method. Lock requests are used to lock a file for one user so that only that user can modify the file. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Lock Requests/Sec", 60] Preprocessing
|
Method MKCOL requests per second | The rate of HTTP requests made using the MKCOL method. Mkcol requests are used to create directories on the server. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Mkcol Requests/Sec", 60] Preprocessing
|
Method MOVE requests per second | The rate of HTTP requests made using the MOVE method. Move requests are used for moving files and directories. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Move Requests/Sec", 60] Preprocessing
|
Method OPTIONS requests per second | The rate of HTTP requests made using the OPTIONS method. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Options Requests/Sec", 60] Preprocessing
|
Method POST requests per second | The rate of HTTP requests made using the POST method. Generally used for forms or gateway requests. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Post Requests/Sec", 60] Preprocessing
|
Method PROPFIND requests per second | The rate of HTTP requests made using the PROPFIND method. Propfind requests retrieve property values on files and directories. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Propfind Requests/Sec", 60] Preprocessing
|
Method PROPPATCH requests per second | The rate of HTTP requests made using the PROPPATCH method. Proppatch requests set property values on files and directories. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Proppatch Requests/Sec", 60] Preprocessing
|
Method PUT requests per second | The rate of HTTP requests made using the PUT method. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Put Requests/Sec", 60] Preprocessing
|
Method MS-SEARCH requests per second | The rate of HTTP requests made using the MS-SEARCH method. Search requests are used to query the server to find resources that match a set of conditions provided by the client. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Search Requests/Sec", 60] Preprocessing
|
Method TRACE requests per second | The rate of HTTP requests made using the TRACE method. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Trace Requests/Sec", 60] Preprocessing
|
Method UNLOCK requests per second | The rate of HTTP requests made using the UNLOCK method. Unlock requests are used to remove locks from files. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Unlock Requests/Sec", 60] Preprocessing
|
Method Total requests per second | The rate of all HTTP requests received. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Total Method Requests/Sec", 60] Preprocessing
|
Method Total Other requests per second | Total Other Request Methods is the number of HTTP requests that are not OPTIONS, GET, HEAD, POST, PUT, DELETE, TRACE, MOVE, COPY, MKCOL, PROPFIND, PROPPATCH, SEARCH, LOCK or UNLOCK methods (since service startup). Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Other Request Methods/Sec", 60] Preprocessing
|
Locked errors per second | The rate of errors due to requests that couldn't be satisfied by the server because the requested document was locked. These are generally reported as an HTTP 423 error code to the client. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Locked Errors/Sec", 60] Preprocessing
|
Not Found errors per second | The rate of errors due to requests that couldn't be satisfied by the server because the requested document could not be found. These are generally reported to the client with HTTP error code 404. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Not Found Errors/Sec", 60] Preprocessing
|
Files cache hits percentage | The ratio of user-mode file cache hits to total cache requests (since service startup). Note: This value might be low if the Kernel URI cache hits percentage is high. |
Zabbix agent (active) | perf_counter_en["\Web Service Cache\File Cache Hits %"] Preprocessing
|
URIs cache hits percentage | The ratio of user-mode URI Cache Hits to total cache requests (since service startup) |
Zabbix agent (active) | perf_counter_en["\Web Service Cache\URI Cache Hits %"] Preprocessing
|
File cache misses | The total number of unsuccessful lookups in the user-mode file cache since service startup. |
Zabbix agent (active) | perf_counter_en["\Web Service Cache\File Cache Misses"] Preprocessing
|
URI cache misses | The total number of unsuccessful lookups in the user-mode URI cache since service startup. |
Zabbix agent (active) | perf_counter_en["\Web Service Cache\URI Cache Misses"] Preprocessing
|
Active agent availability | Availability of active checks on the host. The value of this item corresponds to availability icons in the host list. Possible values: 0 - unknown 1 - available 2 - not available |
Zabbix internal | zabbix[host,active_agent,available] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
IIS: The World Wide Web Publishing Service (W3SVC) is not running | The World Wide Web Publishing Service (W3SVC) is not in the running state. IIS cannot start. |
last(/IIS by Zabbix agent active/service.info[W3SVC])<>0 |High |
Depends on:
|
|
IIS: Windows Process Activation Service (WAS) is not running | Windows Process Activation Service (WAS) is not in the running state. IIS cannot start. |
last(/IIS by Zabbix agent active/service.info[WAS])<>0 |High |
||
IIS: Port {$IIS.PORT} is down | last(/IIS by Zabbix agent active/net.tcp.service[{$IIS.SERVICE},,{$IIS.PORT}])=0 |Average |
Manual close: Yes Depends on:
|
||
IIS: Service has been restarted | Uptime is less than 10 minutes. |
last(/IIS by Zabbix agent active/perf_counter_en["\Web Service(_Total)\Service Uptime"])<10m |Info |
Manual close: Yes | |
IIS: Active checks are not available | Active checks are considered unavailable. The agent has not sent a heartbeat for a prolonged time. |
min(/IIS by Zabbix agent active/zabbix[host,active_agent,available],{$AGENT.TIMEOUT})=2 |High |
Name | Description | Type | Key and additional info |
---|---|---|---|
Application pools discovery | Zabbix agent (active) | wmi.getall[root\webAdministration, select Name from ApplicationPool] |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#APPPOOL} Uptime | The web application uptime period since the last restart. |
Zabbix agent (active) | perf_counter_en["\APP_POOL_WAS({#APPPOOL})\Current Application Pool Uptime"] |
AppPool {#APPPOOL} state | The state of the application pool. |
Zabbix agent (active) | perf_counter_en["\APP_POOL_WAS({#APPPOOL})\Current Application Pool State"] Preprocessing
|
AppPool {#APPPOOL} recycles | The number of times the application pool has been recycled since Windows Process Activation Service (WAS) started. |
Zabbix agent (active) | perf_counter_en["\APP_POOL_WAS({#APPPOOL})\Total Application Pool Recycles"] Preprocessing
|
AppPool {#APPPOOL} current queue size | The number of requests in the queue. |
Zabbix agent (active) | perf_counter_en["\HTTP Service Request Queues({#APPPOOL})\CurrentQueueSize"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
IIS: {#APPPOOL} has been restarted | Uptime is less than 10 minutes. |
last(/IIS by Zabbix agent active/perf_counter_en["\APP_POOL_WAS({#APPPOOL})\Current Application Pool Uptime"])<10m |Info |
Manual close: Yes | |
IIS: Application pool {#APPPOOL} is not in Running state | last(/IIS by Zabbix agent active/perf_counter_en["\APP_POOL_WAS({#APPPOOL})\Current Application Pool State"])<>3 and {$IIS.APPPOOL.MONITORED:"{#APPPOOL}"}=1 |High |
Depends on:
|
||
IIS: Application pool {#APPPOOL} has been recycled | last(/IIS by Zabbix agent active/perf_counter_en["\APP_POOL_WAS({#APPPOOL})\Total Application Pool Recycles"],#1)<>last(/IIS by Zabbix agent active/perf_counter_en["\APP_POOL_WAS({#APPPOOL})\Total Application Pool Recycles"],#2) and {$IIS.APPPOOL.MONITORED:"{#APPPOOL}"}=1 |Info |
|||
IIS: Request queue of {#APPPOOL} is too large | min(/IIS by Zabbix agent active/perf_counter_en["\HTTP Service Request Queues({#APPPOOL})\CurrentQueueSize"],{$IIS.QUEUE.MAX.TIME})>{$IIS.QUEUE.MAX.WARN} |Warning |
Depends on:
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template to monitor IIS (Internet Information Services) by Zabbix that works without any external scripts.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
You have to enable the following Windows Features (Control Panel > Programs and Features > Turn Windows features on or off) on your server
Web Server (IIS)
Web Server (IIS)\Management Tools\IIS Management Scripts and Tools
Optionally, it is possible to customize the template:
Name | Description | Default |
---|---|---|
{$IIS.PORT} | Listening port. |
80 |
{$IIS.SERVICE} | The service (http/https/etc) for port check. See "net.tcp.service" documentation page for more information: https://www.zabbix.com/documentation/7.0/manual/config/items/itemtypes/simple_checks |
http |
{$IIS.QUEUE.MAX.WARN} | Maximum application pool's request queue length for trigger expression. |
|
{$IIS.QUEUE.MAX.TIME} | The time during which the queue length may exceed the threshold. |
5m |
{$IIS.APPPOOL.NOT_MATCHES} | This macro is used in application pools discovery. Can be overridden on the host or linked template level. |
<CHANGE_IF_NEEDED> |
{$IIS.APPPOOL.MATCHES} | This macro is used in application pools discovery. Can be overridden on the host or linked template level. |
.+ |
{$IIS.APPPOOL.MONITORED} | Monitoring status for discovered application pools. Use context to avoid trigger firing for specific application pools. "1" - enabled, "0" - disabled. |
1 |
Name | Description | Type | Key and additional info |
---|---|---|---|
World Wide Web Publishing Service (W3SVC) state | The World Wide Web Publishing Service (W3SVC) provides web connectivity and administration of websites through the IIS snap-in. If the World Wide Web Publishing Service stops, the operating system cannot serve any form of web request. This service is dependent on "Windows Process Activation Service". |
Zabbix agent | service.info[W3SVC] Preprocessing
|
Windows Process Activation Service (WAS) state | Windows Process Activation Service (WAS) is a tool for managing worker processes that contain applications that host Windows Communication Foundation (WCF) services. Worker processes handle requests that are sent to a Web Server for specific application pools. Each application pool sets boundaries for the applications it contains. |
Zabbix agent | service.info[WAS] Preprocessing
|
{$IIS.PORT} port ping | Simple check | net.tcp.service[{$IIS.SERVICE},,{$IIS.PORT}] Preprocessing
|
|
Uptime | The service uptime expressed in seconds. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Service Uptime"] |
Bytes Received per second | The average rate per minute at which data bytes are received by the service at the Application Layer. Does not include protocol headers or control bytes. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Bytes Received/sec", 60] |
Bytes Sent per second | The average rate per minute at which data bytes are sent by the service. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Bytes Sent/sec", 60] |
Bytes Total per second | The average rate per minute of total bytes/sec transferred by the Web service (sum of bytes sent/sec and bytes received/sec). |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Bytes Total/Sec", 60] |
Current connections | The number of active connections. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Current Connections"] |
Total connection attempts | The total number of connections to the Web or FTP service that have been attempted since service startup. The count is the total for all Web sites or FTP sites combined. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Total Connection Attempts (all instances)"] Preprocessing
|
Connection attempts per second | The average rate per minute that connections using the Web service are being attempted. The count is the average for all Web sites combined. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Connection Attempts/Sec", 60] Preprocessing
|
Anonymous users per second | The number of requests from users over an anonymous connection per second. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Anonymous Users/sec", 60] |
NonAnonymous users per second | The number of requests from users over a non-anonymous connection per second. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\NonAnonymous Users/sec", 60] |
Method GET requests per second | The rate of HTTP requests made using the GET method. GET requests are generally used for basic file retrievals or image maps, though they can be used with forms. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Get Requests/Sec", 60] Preprocessing
|
Method COPY requests per second | The rate of HTTP requests made using the COPY method. Copy requests are used for copying files and directories. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Copy Requests/Sec", 60] Preprocessing
|
Method CGI requests per second | The rate of CGI requests that are simultaneously being processed by the Web service. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\CGI Requests/Sec", 60] Preprocessing
|
Method DELETE requests per second | The rate of HTTP requests made using the DELETE method. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Delete Requests/Sec", 60] Preprocessing
|
Method HEAD requests per second | The rate of HTTP requests made using the HEAD method. HEAD requests generally indicate a client is querying the state of a document they already have to see if it needs to be refreshed. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Head Requests/Sec", 60] Preprocessing
|
Method ISAPI requests per second | The rate of ISAPI Extension requests that are simultaneously being processed by the Web service. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\ISAPI Extension Requests/Sec", 60] Preprocessing
|
Method LOCK requests per second | The rate of HTTP requests made using the LOCK method. Lock requests are used to lock a file for one user so that only that user can modify the file. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Lock Requests/Sec", 60] Preprocessing
|
Method MKCOL requests per second | The rate of HTTP requests made using the MKCOL method. Mkcol requests are used to create directories on the server. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Mkcol Requests/Sec", 60] Preprocessing
|
Method MOVE requests per second | The rate of HTTP requests made using the MOVE method. Move requests are used for moving files and directories. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Move Requests/Sec", 60] Preprocessing
|
Method OPTIONS requests per second | The rate of HTTP requests made using the OPTIONS method. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Options Requests/Sec", 60] Preprocessing
|
Method POST requests per second | The rate of HTTP requests made using the POST method. Generally used for forms or gateway requests. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Post Requests/Sec", 60] Preprocessing
|
Method PROPFIND requests per second | The rate of HTTP requests made using the PROPFIND method. Propfind requests retrieve property values on files and directories. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Propfind Requests/Sec", 60] Preprocessing
|
Method PROPPATCH requests per second | The rate of HTTP requests made using the PROPPATCH method. Proppatch requests set property values on files and directories. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Proppatch Requests/Sec", 60] Preprocessing
|
Method PUT requests per second | The rate of HTTP requests made using the PUT method. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Put Requests/Sec", 60] Preprocessing
|
Method MS-SEARCH requests per second | The rate of HTTP requests made using the MS-SEARCH method. Search requests are used to query the server to find resources that match a set of conditions provided by the client. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Search Requests/Sec", 60] Preprocessing
|
Method TRACE requests per second | The rate of HTTP requests made using the TRACE method. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Trace Requests/Sec", 60] Preprocessing
|
Method UNLOCK requests per second | The rate of HTTP requests made using the UNLOCK method. Unlock requests are used to remove locks from files. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Unlock Requests/Sec", 60] Preprocessing
|
Method Total requests per second | The rate of all HTTP requests received. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Total Method Requests/Sec", 60] Preprocessing
|
Method Total Other requests per second | Total Other Request Methods is the number of HTTP requests that are not OPTIONS, GET, HEAD, POST, PUT, DELETE, TRACE, MOVE, COPY, MKCOL, PROPFIND, PROPPATCH, SEARCH, LOCK or UNLOCK methods (since service startup). Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Other Request Methods/Sec", 60] Preprocessing
|
Locked errors per second | The rate of errors due to requests that couldn't be satisfied by the server because the requested document was locked. These are generally reported as an HTTP 423 error code to the client. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Locked Errors/Sec", 60] Preprocessing
|
Not Found errors per second | The rate of errors due to requests that couldn't be satisfied by the server because the requested document could not be found. These are generally reported to the client with HTTP error code 404. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Not Found Errors/Sec", 60] Preprocessing
|
Files cache hits percentage | The ratio of user-mode file cache hits to total cache requests (since service startup). Note: This value might be low if the Kernel URI cache hits percentage is high. |
Zabbix agent | perf_counter_en["\Web Service Cache\File Cache Hits %"] Preprocessing
|
URIs cache hits percentage | The ratio of user-mode URI Cache Hits to total cache requests (since service startup) |
Zabbix agent | perf_counter_en["\Web Service Cache\URI Cache Hits %"] Preprocessing
|
File cache misses | The total number of unsuccessful lookups in the user-mode file cache since service startup. |
Zabbix agent | perf_counter_en["\Web Service Cache\File Cache Misses"] Preprocessing
|
URI cache misses | The total number of unsuccessful lookups in the user-mode URI cache since service startup. |
Zabbix agent | perf_counter_en["\Web Service Cache\URI Cache Misses"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
IIS: The World Wide Web Publishing Service (W3SVC) is not running | The World Wide Web Publishing Service (W3SVC) is not in the running state. IIS cannot start. |
last(/IIS by Zabbix agent/service.info[W3SVC])<>0 |High |
Depends on:
|
|
IIS: Windows Process Activation Service (WAS) is not running | Windows Process Activation Service (WAS) is not in the running state. IIS cannot start. |
last(/IIS by Zabbix agent/service.info[WAS])<>0 |High |
||
IIS: Port {$IIS.PORT} is down | last(/IIS by Zabbix agent/net.tcp.service[{$IIS.SERVICE},,{$IIS.PORT}])=0 |Average |
Manual close: Yes Depends on:
|
||
IIS: Service has been restarted | Uptime is less than 10 minutes. |
last(/IIS by Zabbix agent/perf_counter_en["\Web Service(_Total)\Service Uptime"])<10m |Info |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Application pools discovery | Zabbix agent | wmi.getall[root\webAdministration, select Name from ApplicationPool] |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#APPPOOL} Uptime | The web application uptime period since the last restart. |
Zabbix agent | perf_counter_en["\APP_POOL_WAS({#APPPOOL})\Current Application Pool Uptime"] |
AppPool {#APPPOOL} state | The state of the application pool. |
Zabbix agent | perf_counter_en["\APP_POOL_WAS({#APPPOOL})\Current Application Pool State"] Preprocessing
|
AppPool {#APPPOOL} recycles | The number of times the application pool has been recycled since Windows Process Activation Service (WAS) started. |
Zabbix agent | perf_counter_en["\APP_POOL_WAS({#APPPOOL})\Total Application Pool Recycles"] Preprocessing
|
AppPool {#APPPOOL} current queue size | The number of requests in the queue. |
Zabbix agent | perf_counter_en["\HTTP Service Request Queues({#APPPOOL})\CurrentQueueSize"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
IIS: {#APPPOOL} has been restarted | Uptime is less than 10 minutes. |
last(/IIS by Zabbix agent/perf_counter_en["\APP_POOL_WAS({#APPPOOL})\Current Application Pool Uptime"])<10m |Info |
Manual close: Yes | |
IIS: Application pool {#APPPOOL} is not in Running state | last(/IIS by Zabbix agent/perf_counter_en["\APP_POOL_WAS({#APPPOOL})\Current Application Pool State"])<>3 and {$IIS.APPPOOL.MONITORED:"{#APPPOOL}"}=1 |High |
Depends on:
|
||
IIS: Application pool {#APPPOOL} has been recycled | last(/IIS by Zabbix agent/perf_counter_en["\APP_POOL_WAS({#APPPOOL})\Total Application Pool Recycles"],#1)<>last(/IIS by Zabbix agent/perf_counter_en["\APP_POOL_WAS({#APPPOOL})\Total Application Pool Recycles"],#2) and {$IIS.APPPOOL.MONITORED:"{#APPPOOL}"}=1 |Info |
|||
IIS: Request queue of {#APPPOOL} is too large | min(/IIS by Zabbix agent/perf_counter_en["\HTTP Service Request Queues({#APPPOOL})\CurrentQueueSize"],{$IIS.QUEUE.MAX.TIME})>{$IIS.QUEUE.MAX.WARN} |Warning |
Depends on:
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template to monitor HAProxy by Zabbix that works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The template collects metrics by polling the HAProxy stats page with HTTP agent.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Set up the HAProxy stats page. If you want to use authentication, set the username and password in the stats auth option of the configuration file.
The example configuration of HAProxy:
frontend stats
bind *:8404
stats enable
stats uri /stats
stats refresh 10s
#stats auth Username:Password # Authentication credentials
Set the hostname or IP address of the HAProxy stats host or container in the {$HAPROXY.STATS.HOST}
macro. You can also change the status page port in the {$HAPROXY.STATS.PORT}
macro, the status page scheme in the {$HAPROXY.STATS.SCHEME}
macro and the status page path in the {$HAPROXY.STATS.PATH}
macro if necessary.
If you have enabled authentication in the HAProxy configuration file in step 1, set the username and password in the {$HAPROXY.USERNAME}
and {$HAPROXY.PASSWORD}
macros.
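Before linking the template, it may help to verify that the stats page responds and returns CSV data. A minimal sketch using curl, assuming the example configuration above (replace the host and credentials with your own, and omit -u if authentication is disabled):
curl -u Username:Password "http://<haproxy-host>:8404/stats;csv"
The template's master item retrieves the same statistics report in CSV format.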
Name | Description | Default |
---|---|---|
{$HAPROXY.STATS.SCHEME} | The scheme of HAProxy stats page (http/https). |
http |
{$HAPROXY.STATS.HOST} | The hostname or IP address of the HAProxy stats host or container. |
<SET HAPROXY HOST> |
{$HAPROXY.STATS.PORT} | The port of the HAProxy stats host or container. |
8404 |
{$HAPROXY.STATS.PATH} | The path of the HAProxy stats page. |
stats |
{$HAPROXY.USERNAME} | The username of the HAProxy stats page. |
|
{$HAPROXY.PASSWORD} | The password of the HAProxy stats page. |
|
{$HAPROXY.RESPONSE_TIME.MAX.WARN} | The HAProxy stats page maximum response time in seconds for trigger expression. |
10s |
{$HAPROXY.FRONT_DREQ.MAX.WARN} | The HAProxy maximum denied requests for trigger expression. |
10 |
{$HAPROXY.FRONT_EREQ.MAX.WARN} | The HAProxy maximum number of request errors for trigger expression. |
10 |
{$HAPROXY.BACK_QCUR.MAX.WARN} | Maximum number of requests on Backend unassigned in queue for trigger expression. |
10 |
{$HAPROXY.BACK_RTIME.MAX.WARN} | Maximum of average Backend response time for trigger expression. |
10s |
{$HAPROXY.BACK_QTIME.MAX.WARN} | Maximum of average time spent in queue on Backend for trigger expression. |
10s |
{$HAPROXY.BACK_ERESP.MAX.WARN} | Maximum of responses with error on Backend for trigger expression. |
10 |
{$HAPROXY.SERVER_QCUR.MAX.WARN} | Maximum number of requests on server unassigned in queue for trigger expression. |
10 |
{$HAPROXY.SERVER_RTIME.MAX.WARN} | Maximum of average server response time for trigger expression. |
10s |
{$HAPROXY.SERVER_QTIME.MAX.WARN} | Maximum of average time spent in queue on server for trigger expression. |
10s |
{$HAPROXY.SERVER_ERESP.MAX.WARN} | Maximum of responses with error on server for trigger expression. |
10 |
{$HAPROXY.FRONT_SUTIL.MAX.WARN} | Maximum of session usage percentage on frontend for trigger expression. |
80 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Get stats | HAProxy Statistics Report in CSV format |
HTTP agent | haproxy.get Preprocessing
|
Get nodes | Array for LLD rules. |
Dependent item | haproxy.get.nodes Preprocessing
|
Get stats page | HAProxy Statistics Report HTML |
HTTP agent | haproxy.get_html |
Version | Dependent item | haproxy.version Preprocessing
|
|
Uptime | Dependent item | haproxy.uptime Preprocessing
|
|
Service status | Simple check | net.tcp.service["{$HAPROXY.STATS.SCHEME}","{$HAPROXY.STATS.HOST}","{$HAPROXY.STATS.PORT}"] Preprocessing
|
|
Service response time | Simple check | net.tcp.service.perf["{$HAPROXY.STATS.SCHEME}","{$HAPROXY.STATS.HOST}","{$HAPROXY.STATS.PORT}"] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
HAProxy: Version has changed | HAProxy version has changed. Acknowledge to close the problem manually. |
last(/HAProxy by HTTP/haproxy.version,#1)<>last(/HAProxy by HTTP/haproxy.version,#2) and length(last(/HAProxy by HTTP/haproxy.version))>0 |Info |
Manual close: Yes | |
HAProxy: Service has been restarted | Uptime is less than 10 minutes. |
last(/HAProxy by HTTP/haproxy.uptime)<10m |Info |
Manual close: Yes | |
HAProxy: Service is down | last(/HAProxy by HTTP/net.tcp.service["{$HAPROXY.STATS.SCHEME}","{$HAPROXY.STATS.HOST}","{$HAPROXY.STATS.PORT}"])=0 |Average |
Manual close: Yes | ||
HAProxy: Service response time is too high | min(/HAProxy by HTTP/net.tcp.service.perf["{$HAPROXY.STATS.SCHEME}","{$HAPROXY.STATS.HOST}","{$HAPROXY.STATS.PORT}"],5m)>{$HAPROXY.RESPONSE_TIME.MAX.WARN} |Warning |
Manual close: Yes Depends on:
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Backend discovery | Discovery backends |
Dependent item | haproxy.backend.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Backend {#PXNAME}: Raw data | The raw data of the Backend with the name {#PXNAME}. |
Dependent item | haproxy.backend.raw[{#PXNAME},{#SVNAME}] Preprocessing
|
Backend {#PXNAME}: Status | Possible values: UP - The server is reporting as healthy. DOWN - The server is reporting as unhealthy and unable to receive requests. NOLB - You've added http-check disable-on-404 to the backend and the health checked URL has returned an HTTP 404 response. MAINT - The server has been disabled or put into maintenance mode. DRAIN - The server has been put into drain mode. no check - Health checks are not enabled for this server. |
Dependent item | haproxy.backend.status[{#PXNAME},{#SVNAME}] Preprocessing
|
Backend {#PXNAME}: Responses time | Average backend response time (in ms) for the last 1,024 requests |
Dependent item | haproxy.backend.rtime[{#PXNAME},{#SVNAME}] Preprocessing
|
Backend {#PXNAME}: Errors connection per second | Number of requests that encountered an error attempting to connect to a backend server. |
Dependent item | haproxy.backend.econ.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Backend {#PXNAME}: Responses denied per second | Responses denied due to security concerns (ACL-restricted). |
Dependent item | haproxy.backend.dresp.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Backend {#PXNAME}: Response errors per second | Number of requests whose responses yielded an error |
Dependent item | haproxy.backend.eresp.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Backend {#PXNAME}: Unassigned requests | Current number of requests unassigned in queue. |
Dependent item | haproxy.backend.qcur[{#PXNAME},{#SVNAME}] Preprocessing
|
Backend {#PXNAME}: Time in queue | Average time spent in queue (in ms) for the last 1,024 requests |
Dependent item | haproxy.backend.qtime[{#PXNAME},{#SVNAME}] Preprocessing
|
Backend {#PXNAME}: Redispatched requests per second | Number of times a request was redispatched to a different backend. |
Dependent item | haproxy.backend.wredis.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Backend {#PXNAME}: Retried connections per second | Number of times a connection was retried. |
Dependent item | haproxy.backend.wretr.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Backend {#PXNAME}: Number of responses with codes 1xx per second | Number of informational HTTP responses per second. |
Dependent item | haproxy.backend.hrsp_1xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Backend {#PXNAME}: Number of responses with codes 2xx per second | Number of successful HTTP responses per second. |
Dependent item | haproxy.backend.hrsp_2xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Backend {#PXNAME}: Number of responses with codes 3xx per second | Number of HTTP redirections per second. |
Dependent item | haproxy.backend.hrsp_3xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Backend {#PXNAME}: Number of responses with codes 4xx per second | Number of HTTP client errors per second. |
Dependent item | haproxy.backend.hrsp_4xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Backend {#PXNAME}: Number of responses with codes 5xx per second | Number of HTTP server errors per second. |
Dependent item | haproxy.backend.hrsp_5xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Backend {#PXNAME}: Incoming traffic | Number of bits received by the backend |
Dependent item | haproxy.backend.bin.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Backend {#PXNAME}: Outgoing traffic | Number of bits sent by the backend |
Dependent item | haproxy.backend.bout.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Backend {#PXNAME}: Number of active servers | Number of active servers. |
Dependent item | haproxy.backend.act[{#PXNAME},{#SVNAME}] Preprocessing
|
Backend {#PXNAME}: Number of backup servers | Number of backup servers. |
Dependent item | haproxy.backend.bck[{#PXNAME},{#SVNAME}] Preprocessing
|
Backend {#PXNAME}: Sessions per second | Cumulative number of sessions (end-to-end connections) per second. |
Dependent item | haproxy.backend.stot.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Backend {#PXNAME}: Weight | Total effective weight. |
Dependent item | haproxy.backend.weight[{#PXNAME},{#SVNAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
HAProxy: backend {#PXNAME}: Server is DOWN | Backend is not available. |
count(/HAProxy by HTTP/haproxy.backend.status[{#PXNAME},{#SVNAME}],#5,"eq","DOWN")=5 |Average |
||
HAProxy: backend {#PXNAME}: Average response time is high | Average backend response time (in ms) for the last 1,024 requests is more than {$HAPROXY.BACK_RTIME.MAX.WARN}. |
min(/HAProxy by HTTP/haproxy.backend.rtime[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.BACK_RTIME.MAX.WARN} |Warning |
||
HAProxy: backend {#PXNAME}: Number of responses with error is high | Number of requests on backend, whose responses yielded an error, is more than {$HAPROXY.BACK_ERESP.MAX.WARN}. |
min(/HAProxy by HTTP/haproxy.backend.eresp.rate[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.BACK_ERESP.MAX.WARN} |Warning |
||
HAProxy: backend {#PXNAME}: Current number of requests unassigned in queue is high | Current number of requests on backend unassigned in queue is more than {$HAPROXY.BACK_QCUR.MAX.WARN}. |
min(/HAProxy by HTTP/haproxy.backend.qcur[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.BACK_QCUR.MAX.WARN} |Warning |
||
HAProxy: backend {#PXNAME}: Average time spent in queue is high | Average time spent in queue (in ms) for the last 1,024 requests is more than {$HAPROXY.BACK_QTIME.MAX.WARN}. |
min(/HAProxy by HTTP/haproxy.backend.qtime[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.BACK_QTIME.MAX.WARN} |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Frontend discovery | Discovery frontends |
Dependent item | haproxy.frontend.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Frontend {#PXNAME}: Raw data | The raw data of the Frontend with the name {#PXNAME}. |
Dependent item | haproxy.frontend.raw[{#PXNAME},{#SVNAME}] Preprocessing
|
Frontend {#PXNAME}: Status | Possible values: OPEN, STOP. When Status is OPEN, the frontend is operating normally and ready to receive traffic. |
Dependent item | haproxy.frontend.status[{#PXNAME},{#SVNAME}] Preprocessing
|
Frontend {#PXNAME}: Requests rate | HTTP requests per second |
Dependent item | haproxy.frontend.req_rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Frontend {#PXNAME}: Sessions rate | Number of sessions created per second |
Dependent item | haproxy.frontend.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Frontend {#PXNAME}: Established sessions | The current number of established sessions. |
Dependent item | haproxy.frontend.scur[{#PXNAME},{#SVNAME}] Preprocessing
|
Frontend {#PXNAME}: Session limits | The maximum number of simultaneous sessions allowed, as defined by the maxconn setting in the frontend. |
Dependent item | haproxy.frontend.slim[{#PXNAME},{#SVNAME}] Preprocessing
|
Frontend {#PXNAME}: Session utilization | Percentage of sessions used (scur / slim * 100). |
Calculated | haproxy.frontend.sutil[{#PXNAME},{#SVNAME}] |
Frontend {#PXNAME}: Request errors per second | Number of request errors per second. |
Dependent item | haproxy.frontend.ereq.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Frontend {#PXNAME}: Denied requests per second | Requests denied due to security concerns (ACL-restricted) per second. |
Dependent item | haproxy.frontend.dreq.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Frontend {#PXNAME}: Number of responses with codes 1xx per second | Number of informational HTTP responses per second. |
Dependent item | haproxy.frontend.hrsp_1xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Frontend {#PXNAME}: Number of responses with codes 2xx per second | Number of successful HTTP responses per second. |
Dependent item | haproxy.frontend.hrsp_2xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Frontend {#PXNAME}: Number of responses with codes 3xx per second | Number of HTTP redirections per second. |
Dependent item | haproxy.frontend.hrsp_3xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Frontend {#PXNAME}: Number of responses with codes 4xx per second | Number of HTTP client errors per second. |
Dependent item | haproxy.frontend.hrsp_4xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Frontend {#PXNAME}: Number of responses with codes 5xx per second | Number of HTTP server errors per second. |
Dependent item | haproxy.frontend.hrsp_5xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Frontend {#PXNAME}: Incoming traffic | Number of bits received by the frontend |
Dependent item | haproxy.frontend.bin.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Frontend {#PXNAME}: Outgoing traffic | Number of bits sent by the frontend |
Dependent item | haproxy.frontend.bout.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
HAProxy: frontend {#PXNAME}: Session utilization is high | Alerting on this metric is essential to ensure your server has sufficient capacity to handle all concurrent sessions. Unlike requests, upon reaching the session limit HAProxy will deny additional clients until resource consumption drops. Furthermore, if you find your session usage percentage to be hovering above 80%, it could be time to either modify HAProxy's configuration to allow more sessions, or migrate your HAProxy server to a bigger box. |
min(/HAProxy by HTTP/haproxy.frontend.sutil[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.FRONT_SUTIL.MAX.WARN} |Warning |
||
HAProxy: frontend {#PXNAME}: Number of request errors is high | Number of request errors is more than {$HAPROXY.FRONT_EREQ.MAX.WARN}. |
min(/HAProxy by HTTP/haproxy.frontend.ereq.rate[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.FRONT_EREQ.MAX.WARN} |Warning |
||
HAProxy: frontend {#PXNAME}: Number of requests denied is high | Number of requests denied due to security concerns (ACL-restricted) is more than {$HAPROXY.FRONT_DREQ.MAX.WARN}. |
min(/HAProxy by HTTP/haproxy.frontend.dreq.rate[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.FRONT_DREQ.MAX.WARN} |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Server discovery | Discovery servers |
Dependent item | haproxy.server.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Server {#PXNAME} {#SVNAME}: Raw data | The raw data of the Server named {#SVNAME}. |
Dependent item | haproxy.server.raw[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Status | Dependent item | haproxy.server.status[{#PXNAME},{#SVNAME}] Preprocessing
|
|
{#PXNAME} {#SVNAME}: Responses time | Average server response time (in ms) for the last 1,024 requests. |
Dependent item | haproxy.server.rtime[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Errors connection per second | Number of requests that encountered an error attempting to connect to a backend server. |
Dependent item | haproxy.server.econ.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Responses denied per second | Responses denied due to security concerns (ACL-restricted). |
Dependent item | haproxy.server.dresp.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Response errors per second | Number of requests whose responses yielded an error. |
Dependent item | haproxy.server.eresp.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Unassigned requests | Current number of requests unassigned in queue. |
Dependent item | haproxy.server.qcur[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Time in queue | Average time spent in queue (in ms) for the last 1,024 requests. |
Dependent item | haproxy.server.qtime[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Redispatched requests per second | Number of times a request was redispatched to a different backend. |
Dependent item | haproxy.server.wredis.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Retried connections per second | Number of times a connection was retried. |
Dependent item | haproxy.server.wretr.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Number of responses with codes 1xx per second | Number of informational HTTP responses per second. |
Dependent item | haproxy.server.hrsp_1xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Number of responses with codes 2xx per second | Number of successful HTTP responses per second. |
Dependent item | haproxy.server.hrsp_2xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Number of responses with codes 3xx per second | Number of HTTP redirections per second. |
Dependent item | haproxy.server.hrsp_3xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Number of responses with codes 4xx per second | Number of HTTP client errors per second. |
Dependent item | haproxy.server.hrsp_4xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Number of responses with codes 5xx per second | Number of HTTP server errors per second. |
Dependent item | haproxy.server.hrsp_5xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Incoming traffic | Number of bits received by the backend |
Dependent item | haproxy.server.bin.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Outgoing traffic | Number of bits sent by the backend |
Dependent item | haproxy.server.bout.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Server is active | Shows whether the server is active (marked with a Y) or a backup (marked with a -). |
Dependent item | haproxy.server.act[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Server is backup | Shows whether the server is a backup (marked with a Y) or active (marked with a -). |
Dependent item | haproxy.server.bck[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Sessions per second | Cumulative number of sessions (end-to-end connections) per second. |
Dependent item | haproxy.server.stot.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Weight | Effective weight. |
Dependent item | haproxy.server.weight[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Configured maxqueue | Configured maxqueue for the server, or nothing if the value is 0 (default, meaning no limit). |
Dependent item | haproxy.server.qlimit[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Server was selected per second | Number of times that server was selected. |
Dependent item | haproxy.server.lbtot.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Status of last health check | Status of last health check, one of: UNK -> unknown INI -> initializing SOCKERR -> socket error L4OK -> check passed on layer 4, no upper layers testing enabled L4TOUT -> layer 1-4 timeout L4CON -> layer 1-4 connection problem, for example "Connection refused" (tcp rst) or "No route to host" (icmp) L6OK -> check passed on layer 6 L6TOUT -> layer 6 (SSL) timeout L6RSP -> layer 6 invalid response - protocol error L7OK -> check passed on layer 7 L7OKC -> check conditionally passed on layer 7, for example 404 with disable-on-404 L7TOUT -> layer 7 (HTTP/SMTP) timeout L7RSP -> layer 7 invalid response - protocol error L7STS -> layer 7 response error, for example HTTP 5xx Notice: If a check is currently running, the last known status will be reported, prefixed with "* ". e. g. "* L7OK". |
Dependent item | haproxy.server.check_status[{#PXNAME},{#SVNAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
HAProxy: {#PXNAME} {#SVNAME}: Server is DOWN | Server is not available. |
count(/HAProxy by HTTP/haproxy.server.status[{#PXNAME},{#SVNAME}],#5,"eq","DOWN")=5 |Warning |
||
HAProxy: {#PXNAME} {#SVNAME}: Average response time is high | Average server response time (in ms) for the last 1,024 requests is more than {$HAPROXY.SERVER_RTIME.MAX.WARN}. |
min(/HAProxy by HTTP/haproxy.server.rtime[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.SERVER_RTIME.MAX.WARN} |Warning |
||
HAProxy: {#PXNAME} {#SVNAME}: Number of responses with error is high | Number of requests on server, whose responses yielded an error, is more than {$HAPROXY.SERVER_ERESP.MAX.WARN}. |
min(/HAProxy by HTTP/haproxy.server.eresp.rate[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.SERVER_ERESP.MAX.WARN} |Warning |
||
HAProxy: {#PXNAME} {#SVNAME}: Current number of requests unassigned in queue is high | Current number of requests unassigned in queue is more than {$HAPROXY.SERVER_QCUR.MAX.WARN}. |
min(/HAProxy by HTTP/haproxy.server.qcur[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.SERVER_QCUR.MAX.WARN} |Warning |
||
HAProxy: {#PXNAME} {#SVNAME}: Average time spent in queue is high | Average time spent in queue (in ms) for the last 1,024 requests is more than {$HAPROXY.SERVER_QTIME.MAX.WARN}. |
min(/HAProxy by HTTP/haproxy.server.qtime[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.SERVER_QTIME.MAX.WARN} |Warning |
||
HAProxy: {#PXNAME} {#SVNAME}: Health check error | Please check the server for faults. |
find(/HAProxy by HTTP/haproxy.server.check_status[{#PXNAME},{#SVNAME}],#3,"regexp","(?:L[4-7]OK|^$)")=0 |Warning |
Depends on:
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template to monitor HAProxy by Zabbix that works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The template collects metrics by polling the HAProxy stats page with Zabbix agent.
Note that this template doesn't support authentication and redirects (limitations of web.page.get).
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Set up the HAProxy stats page. The example configuration of HAProxy:
frontend stats
bind *:8404
stats enable
stats uri /stats
stats refresh 10s
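Because this template collects the stats page through the Zabbix agent, the master item can be checked from the Zabbix server or proxy with zabbix_get. A minimal sketch, assuming the agent runs on the HAProxy host and the example configuration and default macros above:
zabbix_get -s <haproxy-host> -k 'web.page.get["http://localhost:8404/stats;csv"]'
The key mirrors the template's Get stats item with the macros resolved to their default values.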
Set the hostname or IP address of the HAProxy stats host or container in the {$HAPROXY.STATS.HOST}
macro. You can also change the status page port in the {$HAPROXY.STATS.PORT}
macro, the status page scheme in the {$HAPROXY.STATS.SCHEME}
macro and the status page path in the {$HAPROXY.STATS.PATH}
macro if necessary.
Name | Description | Default |
---|---|---|
{$HAPROXY.STATS.SCHEME} | The scheme of HAProxy stats page (http/https). |
http |
{$HAPROXY.STATS.HOST} | The hostname or IP address of the HAProxy stats host or container. |
localhost |
{$HAPROXY.STATS.PORT} | The port of the HAProxy stats host or container. |
8404 |
{$HAPROXY.STATS.PATH} | The path of the HAProxy stats page. |
stats |
{$HAPROXY.RESPONSE_TIME.MAX.WARN} | The HAProxy stats page maximum response time in seconds for trigger expression. |
10s |
{$HAPROXY.FRONT_DREQ.MAX.WARN} | The HAProxy maximum denied requests for trigger expression. |
10 |
{$HAPROXY.FRONT_EREQ.MAX.WARN} | The HAProxy maximum number of request errors for trigger expression. |
10 |
{$HAPROXY.BACK_QCUR.MAX.WARN} | Maximum number of requests on BACKEND unassigned in queue for trigger expression. |
10 |
{$HAPROXY.BACK_RTIME.MAX.WARN} | Maximum of average BACKEND response time for trigger expression. |
10s |
{$HAPROXY.BACK_QTIME.MAX.WARN} | Maximum of average time spent in queue on BACKEND for trigger expression. |
10s |
{$HAPROXY.BACK_ERESP.MAX.WARN} | Maximum of responses with error on BACKEND for trigger expression. |
10 |
{$HAPROXY.SERVER_QCUR.MAX.WARN} | Maximum number of requests on server unassigned in queue for trigger expression. |
10 |
{$HAPROXY.SERVER_RTIME.MAX.WARN} | Maximum of average server response time for trigger expression. |
10s |
{$HAPROXY.SERVER_QTIME.MAX.WARN} | Maximum of average time spent in queue on server for trigger expression. |
10s |
{$HAPROXY.SERVER_ERESP.MAX.WARN} | Maximum of responses with error on server for trigger expression. |
10 |
{$HAPROXY.FRONT_SUTIL.MAX.WARN} | Maximum of session usage percentage on frontend for trigger expression. |
80 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Get stats | HAProxy Statistics Report in CSV format |
Zabbix agent | web.page.get["{$HAPROXY.STATS.SCHEME}://{$HAPROXY.STATS.HOST}:{$HAPROXY.STATS.PORT}/{$HAPROXY.STATS.PATH};csv"] Preprocessing
|
Get nodes | Array for LLD rules. |
Dependent item | haproxy.get.nodes Preprocessing
|
Get stats page | HAProxy Statistics Report HTML |
Zabbix agent | web.page.get["{$HAPROXY.STATS.SCHEME}://{$HAPROXY.STATS.HOST}:{$HAPROXY.STATS.PORT}/{$HAPROXY.STATS.PATH}"] |
Version | Dependent item | haproxy.version Preprocessing
|
|
Uptime | Dependent item | haproxy.uptime Preprocessing
|
|
Service status | Zabbix agent | net.tcp.service["{$HAPROXY.STATS.SCHEME}","{$HAPROXY.STATS.HOST}","{$HAPROXY.STATS.PORT}"] Preprocessing
|
|
Service response time | Zabbix agent | net.tcp.service.perf["{$HAPROXY.STATS.SCHEME}","{$HAPROXY.STATS.HOST}","{$HAPROXY.STATS.PORT}"] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
HAProxy: Version has changed | HAProxy version has changed. Acknowledge to close the problem manually. |
last(/HAProxy by Zabbix agent/haproxy.version,#1)<>last(/HAProxy by Zabbix agent/haproxy.version,#2) and length(last(/HAProxy by Zabbix agent/haproxy.version))>0 |Info |
Manual close: Yes | |
HAProxy: Service has been restarted | Uptime is less than 10 minutes. |
last(/HAProxy by Zabbix agent/haproxy.uptime)<10m |Info |
Manual close: Yes | |
HAProxy: Service is down | last(/HAProxy by Zabbix agent/net.tcp.service["{$HAPROXY.STATS.SCHEME}","{$HAPROXY.STATS.HOST}","{$HAPROXY.STATS.PORT}"])=0 |Average |
Manual close: Yes | ||
HAProxy: Service response time is too high | min(/HAProxy by Zabbix agent/net.tcp.service.perf["{$HAPROXY.STATS.SCHEME}","{$HAPROXY.STATS.HOST}","{$HAPROXY.STATS.PORT}"],5m)>{$HAPROXY.RESPONSE_TIME.MAX.WARN} |Warning |
Manual close: Yes Depends on:
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Backend discovery | Discovery backends |
Dependent item | haproxy.backend.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Backend {#PXNAME}: Raw data | The raw data of the Backend with the name {#PXNAME}. |
Dependent item | haproxy.backend.raw[{#PXNAME},{#SVNAME}] Preprocessing
|
Backend {#PXNAME}: Status | Possible values: UP - The server is reporting as healthy. DOWN - The server is reporting as unhealthy and unable to receive requests. NOLB - You've added http-check disable-on-404 to the backend and the health checked URL has returned an HTTP 404 response. MAINT - The server has been disabled or put into maintenance mode. DRAIN - The server has been put into drain mode. no check - Health checks are not enabled for this server. |
Dependent item | haproxy.backend.status[{#PXNAME},{#SVNAME}] Preprocessing
|
Backend {#PXNAME}: Responses time | Average backend response time (in ms) for the last 1,024 requests |
Dependent item | haproxy.backend.rtime[{#PXNAME},{#SVNAME}] Preprocessing
|
Backend {#PXNAME}: Errors connection per second | Number of requests that encountered an error attempting to connect to a backend server. |
Dependent item | haproxy.backend.econ.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Backend {#PXNAME}: Responses denied per second | Responses denied due to security concerns (ACL-restricted). |
Dependent item | haproxy.backend.dresp.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Backend {#PXNAME}: Response errors per second | Number of requests whose responses yielded an error |
Dependent item | haproxy.backend.eresp.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Backend {#PXNAME}: Unassigned requests | Current number of requests unassigned in queue. |
Dependent item | haproxy.backend.qcur[{#PXNAME},{#SVNAME}] Preprocessing
|
Backend {#PXNAME}: Time in queue | Average time spent in queue (in ms) for the last 1,024 requests |
Dependent item | haproxy.backend.qtime[{#PXNAME},{#SVNAME}] Preprocessing
|
Backend {#PXNAME}: Redispatched requests per second | Number of times a request was redispatched to a different backend. |
Dependent item | haproxy.backend.wredis.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Backend {#PXNAME}: Retried connections per second | Number of times a connection was retried. |
Dependent item | haproxy.backend.wretr.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Backend {#PXNAME}: Number of responses with codes 1xx per second | Number of informational HTTP responses per second. |
Dependent item | haproxy.backend.hrsp_1xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Backend {#PXNAME}: Number of responses with codes 2xx per second | Number of successful HTTP responses per second. |
Dependent item | haproxy.backend.hrsp_2xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Backend {#PXNAME}: Number of responses with codes 3xx per second | Number of HTTP redirections per second. |
Dependent item | haproxy.backend.hrsp_3xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Backend {#PXNAME}: Number of responses with codes 4xx per second | Number of HTTP client errors per second. |
Dependent item | haproxy.backend.hrsp_4xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Backend {#PXNAME}: Number of responses with codes 5xx per second | Number of HTTP server errors per second. |
Dependent item | haproxy.backend.hrsp_5xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Backend {#PXNAME}: Incoming traffic | Number of bits received by the backend |
Dependent item | haproxy.backend.bin.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Backend {#PXNAME}: Outgoing traffic | Number of bits sent by the backend |
Dependent item | haproxy.backend.bout.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Backend {#PXNAME}: Number of active servers | Number of active servers. |
Dependent item | haproxy.backend.act[{#PXNAME},{#SVNAME}] Preprocessing
|
Backend {#PXNAME}: Number of backup servers | Number of backup servers. |
Dependent item | haproxy.backend.bck[{#PXNAME},{#SVNAME}] Preprocessing
|
Backend {#PXNAME}: Sessions per second | Cumulative number of sessions (end-to-end connections) per second. |
Dependent item | haproxy.backend.stot.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Backend {#PXNAME}: Weight | Total effective weight. |
Dependent item | haproxy.backend.weight[{#PXNAME},{#SVNAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
HAProxy: backend {#PXNAME}: Server is DOWN | Backend is not available. |
count(/HAProxy by Zabbix agent/haproxy.backend.status[{#PXNAME},{#SVNAME}],#5,"eq","DOWN")=5 |Average |
||
HAProxy: backend {#PXNAME}: Average response time is high | Average backend response time (in ms) for the last 1,024 requests is more than {$HAPROXY.BACK_RTIME.MAX.WARN}. |
min(/HAProxy by Zabbix agent/haproxy.backend.rtime[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.BACK_RTIME.MAX.WARN} |Warning |
||
HAProxy: backend {#PXNAME}: Number of responses with error is high | Number of requests on backend, whose responses yielded an error, is more than {$HAPROXY.BACK_ERESP.MAX.WARN}. |
min(/HAProxy by Zabbix agent/haproxy.backend.eresp.rate[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.BACK_ERESP.MAX.WARN} |Warning |
||
HAProxy: backend {#PXNAME}: Current number of requests unassigned in queue is high | Current number of requests on backend unassigned in queue is more than {$HAPROXY.BACK_QCUR.MAX.WARN}. |
min(/HAProxy by Zabbix agent/haproxy.backend.qcur[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.BACK_QCUR.MAX.WARN} |Warning |
||
HAProxy: backend {#PXNAME}: Average time spent in queue is high | Average time spent in queue (in ms) for the last 1,024 requests is more than {$HAPROXY.BACK_QTIME.MAX.WARN}. |
min(/HAProxy by Zabbix agent/haproxy.backend.qtime[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.BACK_QTIME.MAX.WARN} |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Frontend discovery | Discovery frontends |
Dependent item | haproxy.frontend.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Frontend {#PXNAME}: Raw data | The raw data of the frontend with the name {#PXNAME}. |
Dependent item | haproxy.frontend.raw[{#PXNAME},{#SVNAME}] Preprocessing
|
Frontend {#PXNAME}: Status | Possible values: OPEN, STOP. When Status is OPEN, the frontend is operating normally and ready to receive traffic. |
Dependent item | haproxy.frontend.status[{#PXNAME},{#SVNAME}] Preprocessing
|
Frontend {#PXNAME}: Requests rate | HTTP requests per second |
Dependent item | haproxy.frontend.req_rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Frontend {#PXNAME}: Sessions rate | Number of sessions created per second |
Dependent item | haproxy.frontend.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Frontend {#PXNAME}: Established sessions | The current number of established sessions. |
Dependent item | haproxy.frontend.scur[{#PXNAME},{#SVNAME}] Preprocessing
|
Frontend {#PXNAME}: Session limits | The most simultaneous sessions that are allowed, as defined by the maxconn setting in the frontend. |
Dependent item | haproxy.frontend.slim[{#PXNAME},{#SVNAME}] Preprocessing
|
Frontend {#PXNAME}: Session utilization | Percentage of sessions used (scur / slim * 100); see the calculation sketch after this item list. |
Calculated | haproxy.frontend.sutil[{#PXNAME},{#SVNAME}] |
Frontend {#PXNAME}: Request errors per second | Number of request errors per second. |
Dependent item | haproxy.frontend.ereq.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Frontend {#PXNAME}: Denied requests per second | Requests denied due to security concerns (ACL-restricted) per second. |
Dependent item | haproxy.frontend.dreq.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Frontend {#PXNAME}: Number of responses with codes 1xx per second | Number of informational HTTP responses per second. |
Dependent item | haproxy.frontend.hrsp_1xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Frontend {#PXNAME}: Number of responses with codes 2xx per second | Number of successful HTTP responses per second. |
Dependent item | haproxy.frontend.hrsp_2xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Frontend {#PXNAME}: Number of responses with codes 3xx per second | Number of HTTP redirections per second. |
Dependent item | haproxy.frontend.hrsp_3xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Frontend {#PXNAME}: Number of responses with codes 4xx per second | Number of HTTP client errors per second. |
Dependent item | haproxy.frontend.hrsp_4xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Frontend {#PXNAME}: Number of responses with codes 5xx per second | Number of HTTP server errors per second. |
Dependent item | haproxy.frontend.hrsp_5xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Frontend {#PXNAME}: Incoming traffic | Number of bits received by the frontend |
Dependent item | haproxy.frontend.bin.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Frontend {#PXNAME}: Outgoing traffic | Number of bits sent by the frontend |
Dependent item | haproxy.frontend.bout.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
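As a rough illustration of where the Session utilization value and its trigger come from, the sketch below reads the HAProxy stats CSV and computes scur / slim * 100 for each frontend. It is not part of the template; the stats URL is an assumption and must be adapted to your own stats configuration.

```python
# A minimal sketch (not part of the template) of the "Session utilization" calculation:
# scur / slim * 100 per FRONTEND row of the HAProxy stats CSV.
# The stats endpoint URL below is an assumption; adjust it to your setup.
import csv
import io
import urllib.request

STATS_CSV_URL = "http://127.0.0.1:8404/stats;csv"  # assumed HAProxy stats page in CSV mode

def frontend_session_utilization(url: str = STATS_CSV_URL) -> dict:
    raw = urllib.request.urlopen(url, timeout=5).read().decode()
    # The header line starts with "# pxname,svname,..."; strip the leading "# ".
    reader = csv.DictReader(io.StringIO(raw.lstrip("# ")))
    util = {}
    for row in reader:
        if row.get("svname") == "FRONTEND" and row.get("slim"):
            scur, slim = int(row["scur"] or 0), int(row["slim"])
            if slim > 0:
                util[row["pxname"]] = round(scur / slim * 100, 2)
    return util

if __name__ == "__main__":
    for frontend, pct in frontend_session_utilization().items():
        print(f"{frontend}: {pct}% of allowed sessions in use")
```

The template performs the same division with a calculated item (haproxy.frontend.sutil), so the script is only meant to show how scur and slim relate to the utilization percentage used in the trigger below.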
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
HAProxy: frontend {#PXNAME}: Session utilization is high | Alerting on this metric is essential to ensure your server has sufficient capacity to handle all concurrent sessions. Unlike requests, upon reaching the session limit HAProxy will deny additional clients until resource consumption drops. Furthermore, if you find your session usage percentage to be hovering above 80%, it could be time to either modify HAProxy's configuration to allow more sessions, or migrate your HAProxy server to a bigger box. |
min(/HAProxy by Zabbix agent/haproxy.frontend.sutil[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.FRONT_SUTIL.MAX.WARN} |Warning |
||
HAProxy: frontend {#PXNAME}: Number of request errors is high | Number of request errors is more than {$HAPROXY.FRONT_EREQ.MAX.WARN}. |
min(/HAProxy by Zabbix agent/haproxy.frontend.ereq.rate[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.FRONT_EREQ.MAX.WARN} |Warning |
||
HAProxy: frontend {#PXNAME}: Number of requests denied is high | Number of requests denied due to security concerns (ACL-restricted) is more than {$HAPROXY.FRONT_DREQ.MAX.WARN}. |
min(/HAProxy by Zabbix agent/haproxy.frontend.dreq.rate[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.FRONT_DREQ.MAX.WARN} |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Server discovery | Discovery servers |
Dependent item | haproxy.server.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Server {#PXNAME} {#SVNAME}: Raw data | The raw data of the server named {#SVNAME}. |
Dependent item | haproxy.server.raw[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Status | Dependent item | haproxy.server.status[{#PXNAME},{#SVNAME}] Preprocessing
|
|
{#PXNAME} {#SVNAME}: Responses time | Average server response time (in ms) for the last 1,024 requests. |
Dependent item | haproxy.server.rtime[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Errors connection per second | Number of requests that encountered an error attempting to connect to a backend server. |
Dependent item | haproxy.server.econ.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Responses denied per second | Responses denied due to security concerns (ACL-restricted). |
Dependent item | haproxy.server.dresp.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Response errors per second | Number of requests whose responses yielded an error. |
Dependent item | haproxy.server.eresp.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Unassigned requests | Current number of requests unassigned in queue. |
Dependent item | haproxy.server.qcur[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Time in queue | Average time spent in queue (in ms) for the last 1,024 requests. |
Dependent item | haproxy.server.qtime[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Redispatched requests per second | Number of times a request was redispatched to a different backend. |
Dependent item | haproxy.server.wredis.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Retried connections per second | Number of times a connection was retried. |
Dependent item | haproxy.server.wretr.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Number of responses with codes 1xx per second | Number of informational HTTP responses per second. |
Dependent item | haproxy.server.hrsp_1xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Number of responses with codes 2xx per second | Number of successful HTTP responses per second. |
Dependent item | haproxy.server.hrsp_2xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Number of responses with codes 3xx per second | Number of HTTP redirections per second. |
Dependent item | haproxy.server.hrsp_3xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Number of responses with codes 4xx per second | Number of HTTP client errors per second. |
Dependent item | haproxy.server.hrsp_4xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Number of responses with codes 5xx per second | Number of HTTP server errors per second. |
Dependent item | haproxy.server.hrsp_5xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Incoming traffic | Number of bits received by the backend |
Dependent item | haproxy.server.bin.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Outgoing traffic | Number of bits sent by the backend |
Dependent item | haproxy.server.bout.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Server is active | Shows whether the server is active (marked with a Y) or a backup (marked with a -). |
Dependent item | haproxy.server.act[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Server is backup | Shows whether the server is a backup (marked with a Y) or active (marked with a -). |
Dependent item | haproxy.server.bck[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Sessions per second | Cumulative number of sessions (end-to-end connections) per second. |
Dependent item | haproxy.server.stot.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Weight | Effective weight. |
Dependent item | haproxy.server.weight[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Configured maxqueue | Configured maxqueue for the server, or nothing if the value is 0 (default, meaning no limit). |
Dependent item | haproxy.server.qlimit[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Server was selected per second | Number of times that server was selected. |
Dependent item | haproxy.server.lbtot.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
{#PXNAME} {#SVNAME}: Status of last health check | Status of the last health check, one of: UNK -> unknown; INI -> initializing; SOCKERR -> socket error; L4OK -> check passed on layer 4, no upper layers testing enabled; L4TOUT -> layer 1-4 timeout; L4CON -> layer 1-4 connection problem, for example "Connection refused" (tcp rst) or "No route to host" (icmp); L6OK -> check passed on layer 6; L6TOUT -> layer 6 (SSL) timeout; L6RSP -> layer 6 invalid response - protocol error; L7OK -> check passed on layer 7; L7OKC -> check conditionally passed on layer 7, for example 404 with disable-on-404; L7TOUT -> layer 7 (HTTP/SMTP) timeout; L7RSP -> layer 7 invalid response - protocol error; L7STS -> layer 7 response error, for example HTTP 5xx. Notice: if a check is currently running, the last known status will be reported, prefixed with "* ", e.g. "* L7OK". |
Dependent item | haproxy.server.check_status[{#PXNAME},{#SVNAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
HAProxy: {#PXNAME} {#SVNAME}: Server is DOWN | Server is not available. |
count(/HAProxy by Zabbix agent/haproxy.server.status[{#PXNAME},{#SVNAME}],#5,"eq","DOWN")=5 |Warning |
||
HAProxy: {#PXNAME} {#SVNAME}: Average response time is high | Average server response time (in ms) for the last 1,024 requests is more than {$HAPROXY.SERVER_RTIME.MAX.WARN}. |
min(/HAProxy by Zabbix agent/haproxy.server.rtime[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.SERVER_RTIME.MAX.WARN} |Warning |
||
HAProxy: {#PXNAME} {#SVNAME}: Number of responses with error is high | Number of requests on server, whose responses yielded an error, is more than {$HAPROXY.SERVER_ERESP.MAX.WARN}. |
min(/HAProxy by Zabbix agent/haproxy.server.eresp.rate[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.SERVER_ERESP.MAX.WARN} |Warning |
||
HAProxy: {#PXNAME} {#SVNAME}: Current number of requests unassigned in queue is high | Current number of requests unassigned in queue is more than {$HAPROXY.SERVER_QCUR.MAX.WARN}. |
min(/HAProxy by Zabbix agent/haproxy.server.qcur[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.SERVER_QCUR.MAX.WARN} |Warning |
||
HAProxy: {#PXNAME} {#SVNAME}: Average time spent in queue is high | Average time spent in queue (in ms) for the last 1,024 requests is more than {$HAPROXY.SERVER_QTIME.MAX.WARN}. |
min(/HAProxy by Zabbix agent/haproxy.server.qtime[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.SERVER_QTIME.MAX.WARN} |Warning |
||
HAProxy: {#PXNAME} {#SVNAME}: Health check error | Please check the server for faults. |
find(/HAProxy by Zabbix agent/haproxy.server.check_status[{#PXNAME},{#SVNAME}],#3,"regexp","(?:L[4-7]OK|^$)")=0 |Warning |
Depends on:
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template for monitoring Hadoop over HTTP that works without any external scripts. It collects metrics by polling the Hadoop API remotely using an HTTP agent and JSONPath preprocessing. Zabbix server (or proxy) execute direct requests to ResourceManager, NodeManagers, NameNode, DataNodes APIs. All metrics are collected at once, thanks to the Zabbix bulk data collection.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
You should define the IP address (or FQDN) and Web-UI port for the ResourceManager in {$HADOOP.RESOURCEMANAGER.HOST} and {$HADOOP.RESOURCEMANAGER.PORT} macros and for the NameNode in {$HADOOP.NAMENODE.HOST} and {$HADOOP.NAMENODE.PORT} macros respectively. Macros can be set in the template or overridden at the host level.
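For orientation, the sketch below issues the same kind of HTTP requests that the template's HTTP agent items send to the ResourceManager and NameNode web endpoints and picks out a few values that the template extracts with JSONPath preprocessing. The hostnames and ports mirror the macro defaults and are placeholders only.

```python
# A rough sketch of the requests behind the HTTP agent items; hosts and ports are
# placeholders that mirror the {$HADOOP.*} macro defaults.
import json
import urllib.request

RM = "http://ResourceManager:8088"   # {$HADOOP.RESOURCEMANAGER.HOST}:{$HADOOP.RESOURCEMANAGER.PORT}
NN = "http://NameNode:9870"          # {$HADOOP.NAMENODE.HOST}:{$HADOOP.NAMENODE.PORT}

def get_json(url: str) -> dict:
    with urllib.request.urlopen(url, timeout=10) as resp:
        return json.load(resp)

# ResourceManager cluster metrics: NodeManager counters (active, unhealthy, lost, ...).
cluster = get_json(f"{RM}/ws/v1/cluster/metrics")["clusterMetrics"]
print("Active NMs:", cluster["activeNodes"])
print("Unhealthy NMs:", cluster["unhealthyNodes"])

# NameNode JMX: the FSNamesystemState bean holds DataNode and volume counters.
bean = get_json(f"{NN}/jmx?qry=Hadoop:service=NameNode,name=FSNamesystemState")["beans"][0]
print("Live DataNodes:", bean["NumLiveDataNodes"])
print("Volume failures:", bean["VolumeFailuresTotal"])
```

In the template itself, one HTTP agent item fetches each JSON document in bulk, and the individual metrics are dependent items that select single fields via JSONPath.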
Name | Description | Default |
---|---|---|
{$HADOOP.RESOURCEMANAGER.HOST} | The Hadoop ResourceManager host IP address or FQDN. |
ResourceManager |
{$HADOOP.RESOURCEMANAGER.PORT} | The Hadoop ResourceManager Web-UI port. |
8088 |
{$HADOOP.RESOURCEMANAGER.RESPONSE_TIME.MAX.WARN} | The Hadoop ResourceManager API page maximum response time in seconds for trigger expression. |
10s |
{$HADOOP.NAMENODE.HOST} | The Hadoop NameNode host IP address or FQDN. |
NameNode |
{$HADOOP.NAMENODE.PORT} | The Hadoop NameNode Web-UI port. |
9870 |
{$HADOOP.NAMENODE.RESPONSE_TIME.MAX.WARN} | The Hadoop NameNode API page maximum response time in seconds for trigger expression. |
10s |
{$HADOOP.CAPACITY_REMAINING.MIN.WARN} | The Hadoop cluster capacity remaining percent for trigger expression. |
20 |
Name | Description | Type | Key and additional info |
---|---|---|---|
ResourceManager: Service status | Hadoop ResourceManager API port availability. |
Simple check | net.tcp.service["tcp","{$HADOOP.RESOURCEMANAGER.HOST}","{$HADOOP.RESOURCEMANAGER.PORT}"] Preprocessing
|
ResourceManager: Service response time | Hadoop ResourceManager API performance. |
Simple check | net.tcp.service.perf["tcp","{$HADOOP.RESOURCEMANAGER.HOST}","{$HADOOP.RESOURCEMANAGER.PORT}"] |
Get ResourceManager stats | HTTP agent | hadoop.resourcemanager.get | |
ResourceManager: Uptime | Dependent item | hadoop.resourcemanager.uptime Preprocessing
|
|
ResourceManager: Get info | Dependent item | hadoop.resourcemanager.info Preprocessing
|
|
ResourceManager: RPC queue & processing time | Average time spent on processing RPC requests. |
Dependent item | hadoop.resourcemanager.rpcprocessingtime_avg Preprocessing
|
ResourceManager: Active NMs | Number of Active NodeManagers. |
Dependent item | hadoop.resourcemanager.num_active_nm Preprocessing
|
ResourceManager: Decommissioning NMs | Number of Decommissioning NodeManagers. |
Dependent item | hadoop.resourcemanager.num_decommissioning_nm Preprocessing
|
ResourceManager: Decommissioned NMs | Number of Decommissioned NodeManagers. |
Dependent item | hadoop.resourcemanager.num_decommissioned_nm Preprocessing
|
ResourceManager: Lost NMs | Number of Lost NodeManagers. |
Dependent item | hadoop.resourcemanager.num_lost_nm Preprocessing
|
ResourceManager: Unhealthy NMs | Number of Unhealthy NodeManagers. |
Dependent item | hadoop.resourcemanager.num_unhealthy_nm Preprocessing
|
ResourceManager: Rebooted NMs | Number of Rebooted NodeManagers. |
Dependent item | hadoop.resourcemanager.num_rebooted_nm Preprocessing
|
ResourceManager: Shutdown NMs | Number of Shutdown NodeManagers. |
Dependent item | hadoop.resourcemanager.num_shutdown_nm Preprocessing
|
NameNode: Service status | Hadoop NameNode API port availability. |
Simple check | net.tcp.service["tcp","{$HADOOP.NAMENODE.HOST}","{$HADOOP.NAMENODE.PORT}"] Preprocessing
|
NameNode: Service response time | Hadoop NameNode API performance. |
Simple check | net.tcp.service.perf["tcp","{$HADOOP.NAMENODE.HOST}","{$HADOOP.NAMENODE.PORT}"] |
Get NameNode stats | HTTP agent | hadoop.namenode.get | |
NameNode: Uptime | Dependent item | hadoop.namenode.uptime Preprocessing
|
|
NameNode: Get info | Dependent item | hadoop.namenode.info Preprocessing
|
|
NameNode: RPC queue & processing time | Average time spent on processing RPC requests. |
Dependent item | hadoop.namenode.rpcprocessingtime_avg Preprocessing
|
NameNode: Block Pool Renaming | Dependent item | hadoop.namenode.percentblockpool_used Preprocessing
|
|
NameNode: Transactions since last checkpoint | Total number of transactions since last checkpoint. |
Dependent item | hadoop.namenode.transactionssincelast_checkpoint Preprocessing
|
NameNode: Percent capacity remaining | Available capacity in percent. |
Dependent item | hadoop.namenode.percent_remaining Preprocessing
|
NameNode: Capacity remaining | Available capacity. |
Dependent item | hadoop.namenode.capacity_remaining Preprocessing
|
NameNode: Corrupt blocks | Number of corrupt blocks. |
Dependent item | hadoop.namenode.corrupt_blocks Preprocessing
|
NameNode: Missing blocks | Number of missing blocks. |
Dependent item | hadoop.namenode.missing_blocks Preprocessing
|
NameNode: Failed volumes | Number of failed volumes. |
Dependent item | hadoop.namenode.volume_failures_total Preprocessing
|
NameNode: Alive DataNodes | Count of alive DataNodes. |
Dependent item | hadoop.namenode.num_live_data_nodes Preprocessing
|
NameNode: Dead DataNodes | Count of dead DataNodes. |
Dependent item | hadoop.namenode.num_dead_data_nodes Preprocessing
|
NameNode: Stale DataNodes | DataNodes that do not send a heartbeat within 30 seconds are marked as "stale". |
Dependent item | hadoop.namenode.num_stale_data_nodes Preprocessing
|
NameNode: Total files | Total count of files tracked by the NameNode. |
Dependent item | hadoop.namenode.files_total Preprocessing
|
NameNode: Total load | The current number of concurrent file accesses (read/write) across all DataNodes. |
Dependent item | hadoop.namenode.total_load Preprocessing
|
NameNode: Blocks allocable | Maximum number of blocks allocable. |
Dependent item | hadoop.namenode.block_capacity Preprocessing
|
NameNode: Total blocks | Count of blocks tracked by NameNode. |
Dependent item | hadoop.namenode.blocks_total Preprocessing
|
NameNode: Under-replicated blocks | The number of blocks with insufficient replication. |
Dependent item | hadoop.namenode.underreplicatedblocks Preprocessing
|
Get NodeManagers states | HTTP agent | hadoop.nodemanagers.get Preprocessing
|
|
Get DataNodes states | HTTP agent | hadoop.datanodes.get Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Hadoop: ResourceManager: Service is unavailable | last(/Hadoop by HTTP/net.tcp.service["tcp","{$HADOOP.RESOURCEMANAGER.HOST}","{$HADOOP.RESOURCEMANAGER.PORT}"])=0 |Average |
Manual close: Yes | ||
Hadoop: ResourceManager: Service response time is too high | min(/Hadoop by HTTP/net.tcp.service.perf["tcp","{$HADOOP.RESOURCEMANAGER.HOST}","{$HADOOP.RESOURCEMANAGER.PORT}"],5m)>{$HADOOP.RESOURCEMANAGER.RESPONSE_TIME.MAX.WARN} |Warning |
Manual close: Yes Depends on:
|
||
Hadoop: ResourceManager: Service has been restarted | Uptime is less than 10 minutes. |
last(/Hadoop by HTTP/hadoop.resourcemanager.uptime)<10m |Info |
Manual close: Yes | |
Hadoop: ResourceManager: Failed to fetch ResourceManager API page | Zabbix has not received any data for items for the last 30 minutes. |
nodata(/Hadoop by HTTP/hadoop.resourcemanager.uptime,30m)=1 |Warning |
Manual close: Yes Depends on:
|
|
Hadoop: ResourceManager: Cluster has no active NodeManagers | Cluster is unable to execute any jobs without at least one NodeManager. |
max(/Hadoop by HTTP/hadoop.resourcemanager.num_active_nm,5m)=0 |High |
||
Hadoop: ResourceManager: Cluster has unhealthy NodeManagers | YARN considers any node with disk utilization exceeding the value specified under the property yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage (in yarn-site.xml) to be unhealthy. Ample disk space is critical to ensure uninterrupted operation of a Hadoop cluster, and large numbers of unhealthyNodes (the number to alert on depends on the size of your cluster) should be quickly investigated and resolved. |
min(/Hadoop by HTTP/hadoop.resourcemanager.num_unhealthy_nm,15m)>0 |Average |
||
Hadoop: NameNode: Service is unavailable | last(/Hadoop by HTTP/net.tcp.service["tcp","{$HADOOP.NAMENODE.HOST}","{$HADOOP.NAMENODE.PORT}"])=0 |Average |
Manual close: Yes | ||
Hadoop: NameNode: Service response time is too high | min(/Hadoop by HTTP/net.tcp.service.perf["tcp","{$HADOOP.NAMENODE.HOST}","{$HADOOP.NAMENODE.PORT}"],5m)>{$HADOOP.NAMENODE.RESPONSE_TIME.MAX.WARN} |Warning |
Manual close: Yes Depends on:
|
||
Hadoop: NameNode: Service has been restarted | Uptime is less than 10 minutes. |
last(/Hadoop by HTTP/hadoop.namenode.uptime)<10m |Info |
Manual close: Yes | |
Hadoop: NameNode: Failed to fetch NameNode API page | Zabbix has not received any data for items for the last 30 minutes. |
nodata(/Hadoop by HTTP/hadoop.namenode.uptime,30m)=1 |Warning |
Manual close: Yes Depends on:
|
|
Hadoop: NameNode: Cluster capacity remaining is low | A good practice is to ensure that disk use never exceeds 80 percent capacity. |
max(/Hadoop by HTTP/hadoop.namenode.percent_remaining,15m)<{$HADOOP.CAPACITY_REMAINING.MIN.WARN} |Warning |
||
Hadoop: NameNode: Cluster has missing blocks | A missing block is far worse than a corrupt block, because a missing block cannot be recovered by copying a replica. |
min(/Hadoop by HTTP/hadoop.namenode.missing_blocks,15m)>0 |Average |
||
Hadoop: NameNode: Cluster has volume failures | HDFS now allows for disks to fail in place, without affecting DataNode operations, until a threshold value is reached. This is set on each DataNode via the dfs.datanode.failed.volumes.tolerated property; it defaults to 0, meaning that any volume failure will shut down the DataNode; on a production cluster where DataNodes typically have 6, 8, or 12 disks, setting this parameter to 1 or 2 is typically the best practice. |
min(/Hadoop by HTTP/hadoop.namenode.volume_failures_total,15m)>0 |Average |
||
Hadoop: NameNode: Cluster has DataNodes in Dead state | The death of a DataNode causes a flurry of network activity, as the NameNode initiates replication of blocks lost on the dead nodes. |
min(/Hadoop by HTTP/hadoop.namenode.num_dead_data_nodes,5m)>0 |Average |
Name | Description | Type | Key and additional info |
---|---|---|---|
Node manager discovery | HTTP agent | hadoop.nodemanager.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Hadoop NodeManager {#HOSTNAME}: Get stats | HTTP agent | hadoop.nodemanager.get[{#HOSTNAME}] | |
{#HOSTNAME}: RPC queue & processing time | Average time spent on processing RPC requests. |
Dependent item | hadoop.nodemanager.rpcprocessingtime_avg[{#HOSTNAME}] Preprocessing
|
{#HOSTNAME}: Container launch avg duration | Dependent item | hadoop.nodemanager.containerlaunchduration_avg[{#HOSTNAME}] Preprocessing
|
|
{#HOSTNAME}: JVM Threads | The number of JVM threads. |
Dependent item | hadoop.nodemanager.jvm.threads[{#HOSTNAME}] Preprocessing
|
{#HOSTNAME}: JVM Garbage collection time | The JVM garbage collection time in milliseconds. |
Dependent item | hadoop.nodemanager.jvm.gc_time[{#HOSTNAME}] Preprocessing
|
{#HOSTNAME}: JVM Heap usage | The JVM heap usage in MBytes. |
Dependent item | hadoop.nodemanager.jvm.memheapused[{#HOSTNAME}] Preprocessing
|
{#HOSTNAME}: Uptime | Dependent item | hadoop.nodemanager.uptime[{#HOSTNAME}] Preprocessing
|
|
Hadoop NodeManager {#HOSTNAME}: Get raw info | Dependent item | hadoop.nodemanager.raw_info[{#HOSTNAME}] Preprocessing
|
|
{#HOSTNAME}: State | State of the node - valid values are: NEW, RUNNING, UNHEALTHY, DECOMMISSIONING, DECOMMISSIONED, LOST, REBOOTED, SHUTDOWN. |
Dependent item | hadoop.nodemanager.state[{#HOSTNAME}] Preprocessing
|
{#HOSTNAME}: Version | Dependent item | hadoop.nodemanager.version[{#HOSTNAME}] Preprocessing
|
|
{#HOSTNAME}: Number of containers | Dependent item | hadoop.nodemanager.numcontainers[{#HOSTNAME}] Preprocessing
|
|
{#HOSTNAME}: Used memory | Dependent item | hadoop.nodemanager.usedmemory[{#HOSTNAME}] Preprocessing
|
|
{#HOSTNAME}: Available memory | Dependent item | hadoop.nodemanager.availablememory[{#HOSTNAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Hadoop: {#HOSTNAME}: Service has been restarted | Uptime is less than 10 minutes. |
last(/Hadoop by HTTP/hadoop.nodemanager.uptime[{#HOSTNAME}])<10m |Info |
Manual close: Yes | |
Hadoop: {#HOSTNAME}: Failed to fetch NodeManager API page | Zabbix has not received any data for items for the last 30 minutes. |
nodata(/Hadoop by HTTP/hadoop.nodemanager.uptime[{#HOSTNAME}],30m)=1 |Warning |
Manual close: Yes Depends on:
|
|
Hadoop: {#HOSTNAME}: NodeManager has state {ITEM.VALUE}. | The state is different from normal. |
last(/Hadoop by HTTP/hadoop.nodemanager.state[{#HOSTNAME}])<>"RUNNING" |Average |
Name | Description | Type | Key and additional info |
---|---|---|---|
Data node discovery | HTTP agent | hadoop.datanode.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Hadoop DataNode {#HOSTNAME}: Get stats | HTTP agent | hadoop.datanode.get[{#HOSTNAME}] | |
{#HOSTNAME}: Remaining | Remaining disk space. |
Dependent item | hadoop.datanode.remaining[{#HOSTNAME}] Preprocessing
|
{#HOSTNAME}: Used | Used disk space. |
Dependent item | hadoop.datanode.dfs_used[{#HOSTNAME}] Preprocessing
|
{#HOSTNAME}: Number of failed volumes | Number of failed storage volumes. |
Dependent item | hadoop.datanode.numfailedvolumes[{#HOSTNAME}] Preprocessing
|
{#HOSTNAME}: JVM Threads | The number of JVM threads. |
Dependent item | hadoop.datanode.jvm.threads[{#HOSTNAME}] Preprocessing
|
{#HOSTNAME}: JVM Garbage collection time | The JVM garbage collection time in milliseconds. |
Dependent item | hadoop.datanode.jvm.gc_time[{#HOSTNAME}] Preprocessing
|
{#HOSTNAME}: JVM Heap usage | The JVM heap usage in MBytes. |
Dependent item | hadoop.datanode.jvm.memheapused[{#HOSTNAME}] Preprocessing
|
{#HOSTNAME}: Uptime | Dependent item | hadoop.datanode.uptime[{#HOSTNAME}] Preprocessing
|
|
Hadoop DataNode {#HOSTNAME}: Get raw info | Dependent item | hadoop.datanode.raw_info[{#HOSTNAME}] Preprocessing
|
|
{#HOSTNAME}: Version | DataNode software version. |
Dependent item | hadoop.datanode.version[{#HOSTNAME}] Preprocessing
|
{#HOSTNAME}: Admin state | Administrative state. |
Dependent item | hadoop.datanode.admin_state[{#HOSTNAME}] Preprocessing
|
{#HOSTNAME}: Oper state | Operational state. |
Dependent item | hadoop.datanode.oper_state[{#HOSTNAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Hadoop: {#HOSTNAME}: Service has been restarted | Uptime is less than 10 minutes. |
last(/Hadoop by HTTP/hadoop.datanode.uptime[{#HOSTNAME}])<10m |Info |
Manual close: Yes | |
Hadoop: {#HOSTNAME}: Failed to fetch DataNode API page | Zabbix has not received any data for items for the last 30 minutes. |
nodata(/Hadoop by HTTP/hadoop.datanode.uptime[{#HOSTNAME}],30m)=1 |Warning |
Manual close: Yes Depends on:
|
|
Hadoop: {#HOSTNAME}: DataNode has state {ITEM.VALUE}. | The state is different from normal. |
last(/Hadoop by HTTP/hadoop.datanode.oper_state[{#HOSTNAME}])<>"Live" |Average |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed to monitor GitLab by Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The template GitLab by HTTP
— collects metrics by an HTTP agent from the GitLab /-/metrics
endpoint.
See https://docs.gitlab.com/ee/administration/monitoring/prometheus/gitlab_metrics.html.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
This template works with self-hosted GitLab instances. Internal service metrics are collected from the GitLab /-/metrics
endpoint.
To access the metrics, the following two methods are available: either explicitly allow the Zabbix server (or proxy) IP address in the GitLab monitoring whitelist, or use a health check access token. The token is available on the Admin -> Monitoring -> Health check page: http://your.gitlab.address/admin/health_check. Use this token in the {$GITLAB.HEALTH.TOKEN} macro as a variable path, like: ?token=your_token.
Remember to change the {$GITLAB.URL} macro.
Also, see the Macros section for a list of macros used to set trigger values.
NOTE: Some metrics may not be collected depending on your GitLab instance version and configuration. See GitLab's documentation for further information about its metric collection.
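The sketch below illustrates the requests behind the readiness check and the metrics collection, assuming an instance reachable at the {$GITLAB.URL} default and a health check token; the exact metric names exposed at /-/metrics depend on the GitLab version and configuration.

```python
# A minimal sketch of the "Instance readiness check" and "Get instance metrics" requests.
# The URL and token are placeholders for {$GITLAB.URL} and {$GITLAB.HEALTH.TOKEN}.
import urllib.request

GITLAB_URL = "http://localhost"      # {$GITLAB.URL}
HEALTH_TOKEN = "?token=your_token"   # {$GITLAB.HEALTH.TOKEN}; may be empty if the IP is whitelisted

def fetch(path: str) -> str:
    with urllib.request.urlopen(f"{GITLAB_URL}{path}{HEALTH_TOKEN}", timeout=10) as resp:
        return resp.read().decode()

print(fetch("/-/readiness"))   # JSON status document, e.g. {"status":"ok", ...}

# /-/metrics returns Prometheus text format; the dependent items extract single series
# from it. The metric name below is an example and may differ between GitLab versions.
for line in fetch("/-/metrics").splitlines():
    if line.startswith("ruby_process_start_time_seconds"):
        print(line)
```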
Name | Description | Default |
---|---|---|
{$GITLAB.URL} | URL of a GitLab instance. |
http://localhost |
{$GITLAB.HEALTH.TOKEN} | The token path for GitLab health check. Example: ?token=your_token |
|
{$GITLAB.UNICORN.UTILIZATION.MAX.WARN} | The maximum percentage of Unicorn workers utilization for a trigger expression. |
90 |
{$GITLAB.PUMA.UTILIZATION.MAX.WARN} | The maximum percentage of Puma thread utilization for a trigger expression. |
90 |
{$GITLAB.HTTP.FAIL.MAX.WARN} | The maximum number of HTTP request failures for a trigger expression. |
2 |
{$GITLAB.REDIS.FAIL.MAX.WARN} | The maximum number of Redis client exceptions for a trigger expression. |
2 |
{$GITLAB.UNICORN.QUEUE.MAX.WARN} | The maximum number of Unicorn queued requests for a trigger expression. |
1 |
{$GITLAB.PUMA.QUEUE.MAX.WARN} | The maximum number of Puma queued requests for a trigger expression. |
1 |
{$GITLAB.OPEN.FDS.MAX.WARN} | The maximum percentage of used file descriptors for a trigger expression. |
90 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Get instance metrics | HTTP agent | gitlab.get_metrics Preprocessing
|
|
Instance readiness check | The readiness probe checks whether the GitLab instance is ready to accept traffic via Rails Controllers. |
HTTP agent | gitlab.readiness Preprocessing
|
Application server status | Checks whether the application server is running. This probe is used to check that the Rails Controllers are not deadlocked due to multi-threading. |
HTTP agent | gitlab.liveness Preprocessing
|
Version | Version of the GitLab instance. |
Dependent item | gitlab.deployments.version Preprocessing
|
Ruby: First process start time | Minimum UNIX timestamp of ruby processes start time. |
Dependent item | gitlab.ruby.processstarttime_seconds.first Preprocessing
|
Ruby: Last process start time | Maximum UNIX timestamp of ruby processes start time. |
Dependent item | gitlab.ruby.processstarttime_seconds.last Preprocessing
|
User logins, total | Counter of how many users have logged in since GitLab was started or restarted. |
Dependent item | gitlab.usersessionlogins_total Preprocessing
|
User CAPTCHA logins failed, total | Counter of failed CAPTCHA attempts during login. |
Dependent item | gitlab.failedlogincaptcha_total Preprocessing
|
User CAPTCHA logins, total | Counter of successful CAPTCHA attempts during login. |
Dependent item | gitlab.successfullogincaptcha_total Preprocessing
|
Upload file does not exist | Number of times an upload record could not find its file. |
Dependent item | gitlab.uploadfiledoesnotexist Preprocessing
|
Pipelines: Processing events, total | Total amount of pipeline processing events. |
Dependent item | gitlab.pipeline.processingeventstotal Preprocessing
|
Pipelines: Created, total | Counter of pipelines created. |
Dependent item | gitlab.pipeline.created_total Preprocessing
|
Pipelines: Auto DevOps pipelines, total | Counter of completed Auto DevOps pipelines. |
Dependent item | gitlab.pipeline.autodevopscompleted.total Preprocessing
|
Pipelines: Auto DevOps pipelines, failed | Counter of completed Auto DevOps pipelines with status "failed". |
Dependent item | gitlab.pipeline.autodevopscompleted_total.failed Preprocessing
|
Pipelines: CI/CD creation duration | The sum of the time in seconds it takes to create a CI/CD pipeline. |
Dependent item | gitlab.pipeline.pipeline_creation Preprocessing
|
Pipelines: CI/CD creation count | The number of CI/CD pipeline creation time measurements. |
Dependent item | gitlab.pipeline.pipeline_creation.count Preprocessing
|
Database: Connection pool, busy | Connections to the main database in use where the owner is still alive. |
Dependent item | gitlab.database.connectionpoolbusy Preprocessing
|
Database: Connection pool, current | Current connections to the main database in the pool. |
Dependent item | gitlab.database.connectionpoolconnections Preprocessing
|
Database: Connection pool, dead | Connections to the main database in use where the owner is not alive. |
Dependent item | gitlab.database.connectionpooldead Preprocessing
|
Database: Connection pool, idle | Connections to the main database not in use. |
Dependent item | gitlab.database.connectionpoolidle Preprocessing
|
Database: Connection pool, size | Total capacity of the main database connection pool. |
Dependent item | gitlab.database.connectionpoolsize Preprocessing
|
Database: Connection pool, waiting | Threads currently waiting on this queue. |
Dependent item | gitlab.database.connectionpoolwaiting Preprocessing
|
Redis: Client requests rate, queues | Number of Redis client requests per second. (Instance: queues) |
Dependent item | gitlab.redis.client_requests.queues.rate Preprocessing
|
Redis: Client requests rate, cache | Number of Redis client requests per second. (Instance: cache) |
Dependent item | gitlab.redis.client_requests.cache.rate Preprocessing
|
Redis: Client requests rate, shared_state | Number of Redis client requests per second. (Instance: shared_state) |
Dependent item | gitlab.redis.client_requests.shared_state.rate Preprocessing
|
Redis: Client exceptions rate, queues | Number of Redis client exceptions per second. (Instance: queues) |
Dependent item | gitlab.redis.client_exceptions.queues.rate Preprocessing
|
Redis: Client exceptions rate, cache | Number of Redis client exceptions per second. (Instance: cache) |
Dependent item | gitlab.redis.client_exceptions.cache.rate Preprocessing
|
Redis: client exceptions rate, shared_state | Number of Redis client exceptions per second. (Instance: shared_state) |
Dependent item | gitlab.redis.client_exceptions.shared_state.rate Preprocessing
|
Cache: Misses rate, total | The cache read miss count. |
Dependent item | gitlab.cache.misses_total.rate Preprocessing
|
Cache: Operations rate, total | The count of cache operations. |
Dependent item | gitlab.cache.operations_total.rate Preprocessing
|
Ruby: CPU usage per second | Average CPU time util in seconds. |
Dependent item | gitlab.ruby.processcpuseconds.rate Preprocessing
|
Ruby: Running_threads | Number of running Ruby threads. |
Dependent item | gitlab.ruby.threads_running Preprocessing
|
Ruby: File descriptors opened, avg | Average number of opened file descriptors. |
Dependent item | gitlab.ruby.file_descriptors.avg Preprocessing
|
Ruby: File descriptors opened, max | Maximum number of opened file descriptors. |
Dependent item | gitlab.ruby.file_descriptors.max Preprocessing
|
Ruby: File descriptors opened, min | Minimum number of opened file descriptors. |
Dependent item | gitlab.ruby.file_descriptors.min Preprocessing
|
Ruby: File descriptors, max | Maximum number of open file descriptors per process. |
Dependent item | gitlab.ruby.process_max_fds Preprocessing
|
Ruby: RSS memory, avg | Average RSS Memory usage in bytes. |
Dependent item | gitlab.ruby.processresidentmemory_bytes.avg Preprocessing
|
Ruby: RSS memory, min | Minimum RSS Memory usage in bytes. |
Dependent item | gitlab.ruby.processresidentmemory_bytes.min Preprocessing
|
Ruby: RSS memory, max | Maximum RSS Memory usage in bytes. |
Dependent item | gitlab.ruby.processresidentmemory_bytes.max Preprocessing
|
HTTP requests rate, total | Number of requests received into the system. |
Dependent item | gitlab.http.requests.rate Preprocessing
|
HTTP requests rate, 5xx | Number of request handling failures with HTTP code 5xx. |
Dependent item | gitlab.http.requests.5xx.rate Preprocessing
|
HTTP requests rate, 4xx | Number of request handling failures with HTTP code 4xx. |
Dependent item | gitlab.http.requests.4xx.rate Preprocessing
|
Transactions per second | Transactions per second (gitlab_transaction_* metrics). |
Dependent item | gitlab.transactions.rate Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
GitLab: Gitlab instance is not able to accept traffic | last(/GitLab by HTTP/gitlab.readiness)=0 |High |
Depends on:
|
||
GitLab: Liveness check was failed | The application server is not running or Rails Controllers are deadlocked. |
last(/GitLab by HTTP/gitlab.liveness)=0 |High |
||
GitLab: Version has changed | The GitLab version has changed. Acknowledge to close the problem manually. |
last(/GitLab by HTTP/gitlab.deployments.version,#1)<>last(/GitLab by HTTP/gitlab.deployments.version,#2) and length(last(/GitLab by HTTP/gitlab.deployments.version))>0 |Info |
Manual close: Yes | |
GitLab: Too many Redis queues client exceptions | "Too many Redis client exceptions during the requests to Redis instance queues." |
min(/GitLab by HTTP/gitlab.redis.client_exceptions.queues.rate,5m)>{$GITLAB.REDIS.FAIL.MAX.WARN} |Warning |
||
GitLab: Too many Redis cache client exceptions | "Too many Redis client exceptions during the requests to Redis instance cache." |
min(/GitLab by HTTP/gitlab.redis.client_exceptions.cache.rate,5m)>{$GITLAB.REDIS.FAIL.MAX.WARN} |Warning |
||
GitLab: Too many Redis shared_state client exceptions | "Too many Redis client exceptions during the requests to Redis instance shared_state." |
min(/GitLab by HTTP/gitlab.redis.client_exceptions.shared_state.rate,5m)>{$GITLAB.REDIS.FAIL.MAX.WARN} |Warning |
||
GitLab: Failed to fetch info data | Zabbix has not received any metrics data for the last 30 minutes. |
nodata(/GitLab by HTTP/gitlab.ruby.threads_running,30m)=1 |Warning |
Manual close: Yes Depends on:
|
|
GitLab: Current number of open files is too high | min(/GitLab by HTTP/gitlab.ruby.file_descriptors.max,5m)/last(/GitLab by HTTP/gitlab.ruby.process_max_fds)*100>{$GITLAB.OPEN.FDS.MAX.WARN} |Warning |
|||
GitLab: Too many HTTP requests failures | "Too many requests failed on GitLab instance with 5xx HTTP code" |
min(/GitLab by HTTP/gitlab.http.requests.5xx.rate,5m)>{$GITLAB.HTTP.FAIL.MAX.WARN} |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Unicorn metrics discovery | Discovery of Unicorn-specific metrics when Unicorn is used. |
HTTP agent | gitlab.unicorn.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Unicorn: Workers | The number of Unicorn workers |
Dependent item | gitlab.unicorn.unicorn_workers[{#SINGLETON}] Preprocessing
|
Unicorn: Active connections | The number of active Unicorn connections. |
Dependent item | gitlab.unicorn.active_connections[{#SINGLETON}] Preprocessing
|
Unicorn: Queued connections | The number of queued Unicorn connections. |
Dependent item | gitlab.unicorn.queued_connections[{#SINGLETON}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
GitLab: Unicorn worker utilization is too high | min(/GitLab by HTTP/gitlab.unicorn.active_connections[{#SINGLETON}],5m)/last(/GitLab by HTTP/gitlab.unicorn.unicorn_workers[{#SINGLETON}])*100>{$GITLAB.UNICORN.UTILIZATION.MAX.WARN} |Warning |
|||
GitLab: Unicorn is queueing requests | min(/GitLab by HTTP/gitlab.unicorn.queued_connections[{#SINGLETON}],5m)>{$GITLAB.UNICORN.QUEUE.MAX.WARN} |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Puma metrics discovery | Discovery of Puma specific metrics when Puma is used. |
HTTP agent | gitlab.puma.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Active connections | Number of puma threads processing a request. |
Dependent item | gitlab.puma.active_connections[{#SINGLETON}] Preprocessing
|
Workers | Total number of puma workers. |
Dependent item | gitlab.puma.workers[{#SINGLETON}] Preprocessing
|
Running workers | The number of booted puma workers. |
Dependent item | gitlab.puma.running_workers[{#SINGLETON}] Preprocessing
|
Stale workers | The number of old puma workers. |
Dependent item | gitlab.puma.stale_workers[{#SINGLETON}] Preprocessing
|
Running threads | The number of running puma threads. |
Dependent item | gitlab.puma.running[{#SINGLETON}] Preprocessing
|
Queued connections | The number of connections in that puma worker's "todo" set waiting for a worker thread. |
Dependent item | gitlab.puma.queued_connections[{#SINGLETON}] Preprocessing
|
Pool capacity | The number of requests the puma worker is capable of taking right now. |
Dependent item | gitlab.puma.pool_capacity[{#SINGLETON}] Preprocessing
|
Max threads | The maximum number of puma worker threads. |
Dependent item | gitlab.puma.max_threads[{#SINGLETON}] Preprocessing
|
Idle threads | The number of spawned puma threads which are not processing a request. |
Dependent item | gitlab.puma.idle_threads[{#SINGLETON}] Preprocessing
|
Killer terminations, total | The number of workers terminated by PumaWorkerKiller. |
Dependent item | gitlab.puma.killerterminationstotal[{#SINGLETON}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
GitLab: Puma instance thread utilization is too high | min(/GitLab by HTTP/gitlab.puma.active_connections[{#SINGLETON}],5m)/last(/GitLab by HTTP/gitlab.puma.max_threads[{#SINGLETON}])*100>{$GITLAB.PUMA.UTILIZATION.MAX.WARN} |Warning |
|||
GitLab: Puma is queueing requests | min(/GitLab by HTTP/gitlab.puma.queued_connections[{#SINGLETON}],15m)>{$GITLAB.PUMA.QUEUE.MAX.WARN} |Warning |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for the effortless deployment of GitHub repository monitoring by Zabbix via GitHub REST API and doesn't require any external scripts.
For more details about GitHub REST API, refer to the official documentation.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
GitHub limits the number of REST API requests that you can make within a specific amount of time, which also depends on whether you are authenticated or not, the plan, and the token type used. Many REST API endpoints require authentication or return additional information if you are authenticated. Additionally, you can make more requests per hour when you are authenticated.
Additional information is available in the official documentation:
One of the simplest ways to send authenticated requests is to use a personal access token - either a classic or a fine-grained one.
Classic personal access token
You can create a new classic personal access token by following the instructions in the official documentation.
For public repositories, no additional permission scopes are required. For monitoring to work on private repositories, the repo
scope must be set to have full control of private repositories.
Additional information about OAuth scopes is available in the official documentation.
Note that authenticated users must have admin access to the repository and the repo
scope must be set to get information about self-hosted runners.
Fine-grained personal access token
Alternatively, you can use a fine-grained personal access token.
In order to use fine-grained tokens to monitor organization-owned repositories, organizations must opt in to fine-grained personal access tokens and set up a personal access token policy.
The fine-grained token needs to have the required permissions set to provide read access to the repository resources (see the official documentation for the exact permission scopes).
Then, set up the template:
Set the access token in the {$GITHUB.API.TOKEN} macro.
Change the API URL in the {$GITHUB.API.URL} macro if needed (for self-hosted installations).
Set the repository owner in the {$GITHUB.REPO.OWNER} macro.
Set the repository name in the {$GITHUB.REPO.NAME} macro.
Adjust the discovery filters if needed, using the following macros: for branches - {$GITHUB.BRANCH.NAME.MATCHES}, {$GITHUB.BRANCH.NAME.NOT_MATCHES}; for workflows - {$GITHUB.WORKFLOW.NAME.MATCHES}, {$GITHUB.WORKFLOW.NAME.NOT_MATCHES}, {$GITHUB.WORKFLOW.STATE.MATCHES}, {$GITHUB.WORKFLOW.STATE.NOT_MATCHES}; for self-hosted runners - {$GITHUB.RUNNER.NAME.MATCHES}, {$GITHUB.RUNNER.NAME.NOT_MATCHES}, {$GITHUB.RUNNER.OS.MATCHES}, {$GITHUB.RUNNER.OS.NOT_MATCHES}.
Note: Update intervals and timeouts for script items can be changed individually via the {$GITHUB.INTERVAL} and {$GITHUB.TIMEOUT} macros with context. Depending on the repository being monitored, they can be adjusted if needed (if you are exceeding rate limits, you can increase the update intervals of some script items to stay within the per-hour request limits). Be aware that this may also affect the triggers (check whether the item is used in triggers and adjust thresholds and/or evaluation periods if needed).
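The sketch below shows an authenticated request with the same headers the script items send, plus the request-limit utilization calculation (used requests divided by the request limit, in percent). The token, owner, and repository values are placeholders for the corresponding macros.

```python
# A hedged sketch of authenticated GitHub REST API requests as issued by the script items.
# Token, owner, and repository are placeholders for the {$GITHUB.*} macros.
import json
import urllib.request

API_URL = "https://api.github.com/"    # {$GITHUB.API.URL}
TOKEN = "<SET THE ACCESS TOKEN>"       # {$GITHUB.API.TOKEN}
OWNER, REPO = "<OWNER>", "<REPO>"      # {$GITHUB.REPO.OWNER}, {$GITHUB.REPO.NAME}

def gh_get(path: str) -> dict:
    req = urllib.request.Request(
        API_URL + path,
        headers={
            "Authorization": f"Bearer {TOKEN}",
            "Accept": "application/vnd.github+json",
            "X-GitHub-Api-Version": "2022-11-28",   # {$GITHUB.API_VERSION}
            "User-Agent": "Zabbix/7.0",             # {$GITHUB.USER_AGENT}
        },
    )
    with urllib.request.urlopen(req, timeout=15) as resp:
        return json.load(resp)

# GET /rate_limit does not count against the limit; "core" covers most REST endpoints.
core = gh_get("rate_limit")["resources"]["core"]
print(f"Request limit utilization: {core['used'] / core['limit'] * 100:.1f}%")

# General repository information, as used by the "Get repository" script item.
repo = gh_get(f"repos/{OWNER}/{REPO}")
print("Stargazers:", repo["stargazers_count"], "Forks:", repo["forks_count"])
```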
Name | Description | Default |
---|---|---|
{$GITHUB.API.URL} | Set the API URL here. |
https://api.github.com/ |
{$GITHUB.USER_AGENT} | The user agent that is used in headers for HTTP requests. |
Zabbix/7.0 |
{$GITHUB.API_VERSION} | The API version that is used in headers for HTTP requests. |
2022-11-28 |
{$GITHUB.REPO.OWNER} | Set the repository owner here. |
<SET THE REPO OWNER> |
{$GITHUB.REPO.NAME} | Set the repository name here. |
<SET THE REPO NAME> |
{$GITHUB.API.TOKEN} | Set the access token here. |
|
{$GITHUB.INTERVAL} | The update interval for the script items that retrieve data from the API. Can be used with context if needed (check the context values in relevant items). |
1m |
{$GITHUB.INTERVAL:regex:"get(tags|releases|issues)count"} | The update interval for the script items that retrieve the number of tags, releases, issues, and pull requests (total, open, closed). |
1h |
{$GITHUB.INTERVAL:"get_repo"} | The update interval for the script item that retrieves the repository information. |
15m |
{$GITHUB.INTERVAL:"get_(branches|workflows)"} | The update interval for the script items that retrieve the branches and workflows. Used only for related metric discovery. |
1h |
{$GITHUB.INTERVAL:"get_runners"} | The update interval for the script item that retrieves the information about self-hosted runners. |
15m |
{$GITHUB.INTERVAL:regex:"getlastrun:.+"} | The update interval for the script items that retrieve the information about the last workflow run results. |
15m |
{$GITHUB.INTERVAL:regex:"getcommitscount:.+"} | The update interval for the script items that retrieve the commits count in discovered branches. |
1h |
{$GITHUB.TIMEOUT} | The timeout threshold for the script items that retrieve data from the API. Can be used with context if needed (check the context values in relevant items). |
15s |
{$GITHUB.HTTP_PROXY} | The HTTP proxy for script items (set if needed). If the macro is empty, then no proxy is used. |
|
{$GITHUB.RESULTS_PER_PAGE} | The number of results to fetch per page. Can be used with context and adjusted if needed (check the context values in script parameters of relevant items). |
100 |
{$GITHUB.WORKFLOW.NAME.MATCHES} | The repository workflow name regex filter to use in workflow-related metric discovery - for including. |
.+ |
{$GITHUB.WORKFLOW.NAME.NOT_MATCHES} | The repository workflow name regex filter to use in workflow-related metric discovery - for excluding. |
CHANGE_IF_NEEDED |
{$GITHUB.WORKFLOW.STATE.MATCHES} | The repository workflow state regex filter to use in workflow-related metric discovery - for including. |
active |
{$GITHUB.WORKFLOW.STATE.NOT_MATCHES} | The repository workflow state regex filter to use in workflow-related metric discovery - for excluding. |
CHANGE_IF_NEEDED |
{$GITHUB.BRANCH.NAME.MATCHES} | The repository branch name regex filter to use in branch-related metric discovery - for including. |
.+ |
{$GITHUB.BRANCH.NAME.NOT_MATCHES} | The repository branch name regex filter to use in branch-related metric discovery - for excluding. |
CHANGE_IF_NEEDED |
{$GITHUB.RUNNER.NAME.MATCHES} | The repository self-hosted runner name regex filter to use in discovering metrics related to the self-hosted runner - for including. |
.+ |
{$GITHUB.RUNNER.NAME.NOT_MATCHES} | The repository self-hosted runner name regex filter to use in discovering metrics related to the self-hosted runner - for excluding. |
CHANGE_IF_NEEDED |
{$GITHUB.RUNNER.OS.MATCHES} | The repository self-hosted runner OS regex filter to use in discovering metrics related to the self-hosted runner - for including. |
.+ |
{$GITHUB.RUNNER.OS.NOT_MATCHES} | The repository self-hosted runner OS regex filter to use in discovering metrics related to the self-hosted runner - for excluding. |
CHANGE_IF_NEEDED |
{$GITHUB.REQUESTS.UTIL.WARN} | The threshold percentage of utilized API requests in a Warning trigger expression. |
80 |
{$GITHUB.REQUESTS.UTIL.HIGH} | The threshold percentage of utilized API requests in a High trigger expression. |
90 |
{$GITHUB.WORKFLOW.STATUS.QUEUED.THRESH} | The time threshold used in the trigger of a workflow run that has been in the queue for too long. Can be used with context if needed. |
1h |
{$GITHUB.WORKFLOW.STATUS.IN_PROGRESS.THRESH} | The time threshold used in the trigger of a workflow run that has been in progress for too long. Can be used with context if needed. |
24h |
Name | Description | Type | Key and additional info |
---|---|---|---|
Get self-hosted runners | Get the self-hosted runners of the repository. Note that admin access to the repository is required to use this endpoint: https://docs.github.com/en/rest/actions/self-hosted-runners?apiVersion=2022-11-28#list-self-hosted-runners-for-a-repository |
Script | github.repo.runners.get Preprocessing
|
Get self-hosted runner check | Carry out a self-hosted runners data collection check. |
Dependent item | github.repo.runners.get.check Preprocessing
|
Number of releases | The number of releases in the repository. Note that this number also includes draft releases. Information about endpoint: https://docs.github.com/en/rest/releases/releases?apiVersion=2022-11-28#list-releases |
Script | github.repo.releases.count Preprocessing
|
Number of tags | The number of tags in the repository. Information about endpoint: https://docs.github.com/en/rest/repos/repos?apiVersion=2022-11-28#list-repository-tags |
Script | github.repo.tags.count Preprocessing
|
Get issue count | Get the count of issues and pull requests in the repository (total, open, closed). Information about endpoint for issues: https://docs.github.com/en/rest/issues/issues?apiVersion=2022-11-28#list-repository-issues Information about endpoint for pull requests: https://docs.github.com/en/rest/pulls/pulls?apiVersion=2022-11-28#list-pull-requests |
Script | github.repo.issues.get Preprocessing
|
Number of issues | The total number of issues in the repository. |
Dependent item | github.repo.issues.total Preprocessing
|
Number of open issues | The number of open issues in the repository. |
Dependent item | github.repo.issues.open Preprocessing
|
Number of closed issues | The number of closed issues in the repository. |
Dependent item | github.repo.issues.closed Preprocessing
|
Number of PRs | The total number of pull requests in the repository. |
Dependent item | github.repo.pr.total Preprocessing
|
Number of open PRs | The number of open pull requests in the repository. |
Dependent item | github.repo.pr.open Preprocessing
|
Number of closed PRs | The number of closed pull requests in the repository. |
Dependent item | github.repo.pr.closed Preprocessing
|
Request limit | API request limit. Information about request limits in GitHub REST API documentation: https://docs.github.com/en/rest/using-the-rest-api/rate-limits-for-the-rest-api?apiVersion=2022-11-28 |
Dependent item | github.repo.requests.limit Preprocessing
|
Requests used | The number of used API requests. Information about request limits in GitHub REST API documentation: https://docs.github.com/en/rest/using-the-rest-api/rate-limits-for-the-rest-api?apiVersion=2022-11-28 |
Dependent item | github.repo.requests.used Preprocessing
|
Request limit utilization, in % | The calculated utilization of the API request limit in %. Information about request limits in GitHub REST API documentation: https://docs.github.com/en/rest/using-the-rest-api/rate-limits-for-the-rest-api?apiVersion=2022-11-28 |
Dependent item | github.repo.requests.util Preprocessing
|
Get repository | Get the general repository information. If the repository is not a fork, the community profile metrics are also retrieved. Information about endpoint: https://docs.github.com/en/rest/repos/repos?apiVersion=2022-11-28#get-a-repository Information about community profile metrics: https://docs.github.com/en/rest/metrics/community?apiVersion=2022-11-28#get-community-profile-metrics |
Script | github.repo.repository.get |
Get repository data check | Data collection check. |
Dependent item | github.repo.repository.get.check Preprocessing
|
Repository is a fork | Indicates whether the repository is a fork. |
Dependent item | github.repo.repository.is_fork Preprocessing
|
Repository size | The size of the repository. |
Dependent item | github.repo.repository.size Preprocessing
|
Repository stargazers | The number of GitHub users who have starred the repository. |
Dependent item | github.repo.repository.stargazers Preprocessing
|
Repository watchers | The number of GitHub users who are subscribed to the repository. |
Dependent item | github.repo.repository.watchers Preprocessing
|
Repository forks | The number of repository forks. |
Dependent item | github.repo.repository.forks.count Preprocessing
|
Get workflows | Get the repository workflows. Information about endpoint: https://docs.github.com/en/rest/actions/workflows?apiVersion=2022-11-28#list-repository-workflows |
Script | github.repo.workflows.get Preprocessing
|
Get branches | Get the repository branches. Information about endpoint: https://docs.github.com/en/rest/branches/branches?apiVersion=2022-11-28#list-branches |
Script | github.repo.branches.get Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
GitHub: No access to repository self-hosted runners | Admin access to the repository is required to use this endpoint: |
find(/GitHub repository by HTTP/github.repo.runners.get.check,,"iregexp","Must have admin rights to Repository")=1 |Average |
||
GitHub: The total number of issues has increased | The total number of issues has increased, which means that a new issue (or multiple issues) has been opened. |
last(/GitHub repository by HTTP/github.repo.issues.total)>last(/GitHub repository by HTTP/github.repo.issues.total,#2) |Warning |
||
GitHub: The total number of PRs has increased | The total number of pull requests has increased, which means that a new pull request (or multiple pull requests) has been opened. |
last(/GitHub repository by HTTP/github.repo.pr.total)>last(/GitHub repository by HTTP/github.repo.pr.total,#2) |Info |
||
GitHub: API request limit utilization is high | The API request limit utilization is high. It can be lowered by increasing the update intervals for script items (by setting up higher values in corresponding context macros). |
max(/GitHub repository by HTTP/github.repo.requests.util,1h)>{$GITHUB.REQUESTS.UTIL.WARN} |Warning |
Depends on:
|
|
GitHub: API request limit utilization is very high | The API request limit utilization is very high. It can be lowered by increasing the update intervals for script items (by setting up higher values in corresponding context macros). |
max(/GitHub repository by HTTP/github.repo.requests.util,1h)>{$GITHUB.REQUESTS.UTIL.HIGH} |Average |
||
GitHub: There are errors in requests to API | Errors have been received in response to API requests. Check the latest values for details. |
length(last(/GitHub repository by HTTP/github.repo.repository.get.check))>0 |Average |
Name | Description | Type | Key and additional info |
---|---|---|---|
Workflow discovery | Discovers repository workflows. By default, only the active workflows are discovered. Information about endpoint: https://docs.github.com/en/rest/actions/workflows?apiVersion=2022-11-28#list-repository-workflows |
Dependent item | github.repo.workflows.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Workflow [{#WORKFLOW_NAME}]: Get last run | Get the data about the last workflow run. Information about endpoint: https://docs.github.com/en/rest/actions/workflow-runs?apiVersion=2022-11-28#list-workflow-runs-for-a-workflow |
Script | github.repo.workflows.last_run.get[{#WORKFLOW_NAME}] Preprocessing
|
Workflow [{#WORKFLOW_NAME}]: Last run status | The status of the last workflow run. Possible values: 0 - queued 1 - in_progress 2 - completed 10 - unknown |
Dependent item | github.repo.workflows.last_run.status[{#WORKFLOW_NAME}] Preprocessing
|
Workflow [{#WORKFLOW_NAME}]: Last run conclusion | The conclusion of the last workflow run. Possible values: 0 - success 1 - failure 2 - neutral 3 - cancelled 4 - skipped 5 - timed_out 6 - action_required 10 - unknown |
Dependent item | github.repo.workflows.last_run.conclusion[{#WORKFLOW_NAME}] Preprocessing
|
Workflow [{#WORKFLOW_NAME}]: Last run start date | The date when the last workflow run was started. |
Dependent item | github.repo.workflows.last_run.start_date[{#WORKFLOW_NAME}] Preprocessing
|
Workflow [{#WORKFLOW_NAME}]: Last run update date | The date when the last workflow run was updated. |
Dependent item | github.repo.workflows.last_run.update_date[{#WORKFLOW_NAME}] Preprocessing
|
Workflow [{#WORKFLOW_NAME}]: Last run duration | The duration of the last workflow run. |
Dependent item | github.repo.workflows.last_run.duration[{#WORKFLOW_NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
GitHub: Workflow [{#WORKFLOW_NAME}]: The workflow has been in the queue for too long | The last workflow run has been in the "queued" status for too long. This may mean that it has failed to be assigned to a runner. The default threshold is provided as an example and can be adjusted for relevant workflows with context macros (see the example after this table). |
last(/GitHub repository by HTTP/github.repo.workflows.last_run.status[{#WORKFLOW_NAME}])=0 and changecount(/GitHub repository by HTTP/github.repo.workflows.last_run.status[{#WORKFLOW_NAME}],{$GITHUB.WORKFLOW.STATUS.QUEUED.THRESH:"workflow_queued:{#WORKFLOW_NAME}"})=0 |Warning |
||
GitHub: Workflow [{#WORKFLOW_NAME}]: The workflow has been in progress for too long | The last workflow run has been in the "in_progress" status for too long. The default threshold is provided as an example and can be adjusted for relevant workflows with context macros. |
last(/GitHub repository by HTTP/github.repo.workflows.last_run.status[{#WORKFLOW_NAME}])=1 and changecount(/GitHub repository by HTTP/github.repo.workflows.last_run.status[{#WORKFLOW_NAME}],{$GITHUB.WORKFLOW.STATUS.IN_PROGRESS.THRESH:"workflow_in_progress:{#WORKFLOW_NAME}"})=0 |Warning |
||
GitHub: Workflow [{#WORKFLOW_NAME}]: The workflow has failed | The last workflow run has returned a "failure" conclusion. |
last(/GitHub repository by HTTP/github.repo.workflows.last_run.conclusion[{#WORKFLOW_NAME}])=1 |Warning |
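For example, to give a single workflow a longer allowance in the "queued" state, the threshold macro used in the trigger expressions above can be overridden with a context at the host level. This is a hypothetical override that assumes a workflow named "Build"; the value is the evaluation period consumed by the changecount() function:
{$GITHUB.WORKFLOW.STATUS.QUEUED.THRESH:"workflow_queued:Build"} = 30m
The "in progress" threshold can be adjusted the same way via {$GITHUB.WORKFLOW.STATUS.IN_PROGRESS.THRESH} with a "workflow_in_progress:<workflow name>" context.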
Name | Description | Type | Key and additional info |
---|---|---|---|
Branch discovery | Discovers repository branches. Information about endpoint: https://docs.github.com/en/rest/branches/branches?apiVersion=2022-11-28#list-branches |
Dependent item | github.repo.branches.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Branch [{#BRANCH_NAME}]: Number of commits | Get the number of commits in the branch. Information about endpoint: https://docs.github.com/en/rest/commits/commits?apiVersion=2022-11-28#list-commits |
Script | github.repo.branches.commits.total[{#BRANCH_NAME}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Self-hosted runner discovery | Discovers self-hosted runners of the repository. Note that admin access to the repository is required to use this endpoint: https://docs.github.com/en/rest/actions/self-hosted-runners?apiVersion=2022-11-28#list-self-hosted-runners-for-a-repository |
Dependent item | github.repo.runners.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Runner [{#RUNNER_NAME}]: Busy | Indicates whether the runner is currently executing a job. |
Dependent item | github.repo.runners.busy[{#RUNNER_NAME}] Preprocessing
|
Runner [{#RUNNER_NAME}]: Online | Indicates whether the runner is connected to GitHub and is ready to execute jobs. |
Dependent item | github.repo.runners.online[{#RUNNER_NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
GitHub: Runner [{#RUNNER_NAME}]: The runner has become offline | The runner was online previously, but is currently not connected to GitHub. This could be because the machine is offline, the self-hosted runner application is not running on the machine, or the self-hosted runner application cannot communicate with GitHub. |
last(/GitHub repository by HTTP/github.repo.runners.online[{#RUNNER_NAME}],#2)=1 and last(/GitHub repository by HTTP/github.repo.runners.online[{#RUNNER_NAME}])=0 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Discovery of community profile metrics | Discovers community profile metrics (the repository must not be a fork). Information about community profile metrics: https://docs.github.com/en/rest/metrics/community?apiVersion=2022-11-28#get-community-profile-metrics |
Dependent item | github.repo.community_profile.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Health percentage score | The health percentage score is defined as a percentage of how many of the recommended community health files are present. For more information, see the documentation: https://docs.github.com/en/communities/setting-up-your-project-for-healthy-contributions/about-community-profiles-for-public-repositories |
Dependent item | github.repo.repository.health[{#SINGLETON}] Preprocessing
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
Official JMX Template from Zabbix distribution. Could be useful for many Java Applications (JMX).
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Refer to the vendor documentation.
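For example, remote JMX can usually be enabled on the monitored JVM with the standard system properties below so that the Zabbix Java gateway can reach it. This is a minimal, unauthenticated sketch for test environments only; the port 12345 and the application JAR name are placeholders, and production setups should enable authentication and SSL:
java -Dcom.sun.management.jmxremote \
     -Dcom.sun.management.jmxremote.port=12345 \
     -Dcom.sun.management.jmxremote.authenticate=false \
     -Dcom.sun.management.jmxremote.ssl=false \
     -jar <your-application>.jar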
Name | Description | Default |
---|---|---|
{$JMX.NONHEAP.MEM.USAGE.MAX} | A threshold in percent for Non-heap memory utilization trigger. |
85 |
{$JMX.NONHEAP.MEM.USAGE.TIME} | The time during which the Non-heap memory utilization may exceed the threshold. |
10m |
{$JMX.HEAP.MEM.USAGE.MAX} | A threshold in percent for Heap memory utilization trigger. |
85 |
{$JMX.HEAP.MEM.USAGE.TIME} | The time during which the Heap memory utilization may exceed the threshold. |
10m |
{$JMX.MP.USAGE.MAX} | A threshold in percent for memory pools utilization trigger. Use a context to change the threshold for a specific pool (see the example after this table). |
85 |
{$JMX.MP.USAGE.TIME} | The time during which the memory pools utilization may exceed the threshold. |
10m |
{$JMX.FILE.DESCRIPTORS.MAX} | A threshold in percent for file descriptors count trigger. |
85 |
{$JMX.FILE.DESCRIPTORS.TIME} | The time during which the file descriptors count may exceed the threshold. |
3m |
{$JMX.CPU.LOAD.MAX} | A threshold in percent for CPU utilization trigger. |
85 |
{$JMX.CPU.LOAD.TIME} | The time during which the CPU utilization may exceed the threshold. |
5m |
{$JMX.MEM.POOL.NAME.MATCHES} | This macro is used in the memory pool discovery as a filter. |
Old Gen|G1|Perm Gen|Code Cache|Tenured Gen |
{$JMX.USER} | JMX username. |
|
{$JMX.PASSWORD} | JMX password. |
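For example, the memory pool utilization threshold can be raised for one pool only by defining context macros at the host level. This is a hypothetical override that assumes a discovered pool named "Old Gen" (i.e. the value of {#JMXNAME}):
{$JMX.MP.USAGE.MAX:"Old Gen"} = 90
{$JMX.MP.USAGE.TIME:"Old Gen"} = 15m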
Name | Description | Type | Key and additional info |
---|---|---|---|
ClassLoading: Loaded class count | Displays number of classes that are currently loaded in the Java virtual machine. |
JMX agent | jmx["java.lang:type=ClassLoading","LoadedClassCount"] Preprocessing
|
ClassLoading: Total loaded class count | Displays the total number of classes that have been loaded since the Java virtual machine has started execution. |
JMX agent | jmx["java.lang:type=ClassLoading","TotalLoadedClassCount"] Preprocessing
|
ClassLoading: Unloaded class count | Displays the total number of classes unloaded since the Java virtual machine has started execution. |
JMX agent | jmx["java.lang:type=ClassLoading","UnloadedClassCount"] Preprocessing
|
Compilation: Name of the current JIT compiler | Displays the name of the current JIT compiler. |
JMX agent | jmx["java.lang:type=Compilation","Name"] Preprocessing
|
Compilation: Accumulated time spent | Displays the approximate accumulated elapsed time spent in compilation, in seconds. |
JMX agent | jmx["java.lang:type=Compilation","TotalCompilationTime"] Preprocessing
|
Memory: Heap memory committed | Current heap memory allocated. This amount of memory is guaranteed for the Java virtual machine to use. |
JMX agent | jmx["java.lang:type=Memory","HeapMemoryUsage.committed"] |
Memory: Heap memory maximum size | Maximum amount of heap that can be used for memory management. This amount of memory is not guaranteed to be available if it is greater than the amount of committed memory. The Java virtual machine may fail to allocate memory even if the amount of used memory does not exceed this maximum size. |
JMX agent | jmx["java.lang:type=Memory","HeapMemoryUsage.max"] Preprocessing
|
Memory: Heap memory used | Current memory usage inside the heap. |
JMX agent | jmx["java.lang:type=Memory","HeapMemoryUsage.used"] Preprocessing
|
Memory: Non-Heap memory committed | Current memory allocated outside the heap. This amount of memory is guaranteed for the Java virtual machine to use. |
JMX agent | jmx["java.lang:type=Memory","NonHeapMemoryUsage.committed"] Preprocessing
|
Memory: Non-Heap memory maximum size | Maximum amount of non-heap memory that can be used for memory management. This amount of memory is not guaranteed to be available if it is greater than the amount of committed memory. The Java virtual machine may fail to allocate memory even if the amount of used memory does not exceed this maximum size. |
JMX agent | jmx["java.lang:type=Memory","NonHeapMemoryUsage.max"] Preprocessing
|
Memory: Non-Heap memory used | Current memory usage outside the heap |
JMX agent | jmx["java.lang:type=Memory","NonHeapMemoryUsage.used"] Preprocessing
|
Memory: Object pending finalization count | The approximate number of objects for which finalization is pending. |
JMX agent | jmx["java.lang:type=Memory","ObjectPendingFinalizationCount"] Preprocessing
|
OperatingSystem: File descriptors maximum count | This is the number of file descriptors we can have opened in the same process, as determined by the operating system. You can never have more file descriptors than this number. |
JMX agent | jmx["java.lang:type=OperatingSystem","MaxFileDescriptorCount"] Preprocessing
|
OperatingSystem: File descriptors opened | This is the number of opened file descriptors at the moment, if this reaches the MaxFileDescriptorCount, the application will throw an IOException: Too many open files. This could mean you are opening file descriptors and never closing them. |
JMX agent | jmx["java.lang:type=OperatingSystem","OpenFileDescriptorCount"] |
OperatingSystem: Process CPU Load | ProcessCpuLoad represents the CPU load in this process. |
JMX agent | jmx["java.lang:type=OperatingSystem","ProcessCpuLoad"] Preprocessing
|
Runtime: JVM uptime | JMX agent | jmx["java.lang:type=Runtime","Uptime"] Preprocessing
|
|
Runtime: JVM name | JMX agent | jmx["java.lang:type=Runtime","VmName"] Preprocessing
|
|
Runtime: JVM version | JMX agent | jmx["java.lang:type=Runtime","VmVersion"] Preprocessing
|
|
Threading: Daemon thread count | Number of daemon threads running. |
JMX agent | jmx["java.lang:type=Threading","DaemonThreadCount"] Preprocessing
|
Threading: Peak thread count | Maximum number of threads being executed at the same time since the JVM was started or the peak was reset. |
JMX agent | jmx["java.lang:type=Threading","PeakThreadCount"] |
Threading: Thread count | The number of threads running at the current moment. |
JMX agent | jmx["java.lang:type=Threading","ThreadCount"] |
Threading: Total started thread count | The number of threads started since the JVM was launched. |
JMX agent | jmx["java.lang:type=Threading","TotalStartedThreadCount"] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Generic Java JMX: Compilation: {HOST.NAME} uses suboptimal JIT compiler | find(/Generic Java JMX/jmx["java.lang:type=Compilation","Name"],,"like","Client")=1 |Info |
Manual close: Yes | ||
Generic Java JMX: Memory: Heap memory usage is high | min(/Generic Java JMX/jmx["java.lang:type=Memory","HeapMemoryUsage.used"],{$JMX.HEAP.MEM.USAGE.TIME})>(last(/Generic Java JMX/jmx["java.lang:type=Memory","HeapMemoryUsage.max"])*{$JMX.HEAP.MEM.USAGE.MAX}/100) and last(/Generic Java JMX/jmx["java.lang:type=Memory","HeapMemoryUsage.max"])>0 |Warning |
|||
Generic Java JMX: Memory: Non-Heap memory usage is high | min(/Generic Java JMX/jmx["java.lang:type=Memory","NonHeapMemoryUsage.used"],{$JMX.NONHEAP.MEM.USAGE.TIME})>(last(/Generic Java JMX/jmx["java.lang:type=Memory","NonHeapMemoryUsage.max"])*{$JMX.NONHEAP.MEM.USAGE.MAX}/100) and last(/Generic Java JMX/jmx["java.lang:type=Memory","NonHeapMemoryUsage.max"])>0 |Warning |
|||
Generic Java JMX: OperatingSystem: Opened file descriptor count is high | min(/Generic Java JMX/jmx["java.lang:type=OperatingSystem","OpenFileDescriptorCount"],{$JMX.FILE.DESCRIPTORS.TIME})>(last(/Generic Java JMX/jmx["java.lang:type=OperatingSystem","MaxFileDescriptorCount"])*{$JMX.FILE.DESCRIPTORS.MAX}/100) |Warning |
|||
Generic Java JMX: OperatingSystem: Process CPU Load is high | min(/Generic Java JMX/jmx["java.lang:type=OperatingSystem","ProcessCpuLoad"],{$JMX.CPU.LOAD.TIME})>{$JMX.CPU.LOAD.MAX} |Average |
|||
Generic Java JMX: Runtime: JVM is not reachable | nodata(/Generic Java JMX/jmx["java.lang:type=Runtime","Uptime"],5m)=1 |Average |
Manual close: Yes | ||
Generic Java JMX: Runtime: {HOST.NAME} runs suboptimal VM type | find(/Generic Java JMX/jmx["java.lang:type=Runtime","VmName"],,"like","Server")<>1 |Info |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Garbage collector discovery | Garbage collectors metrics discovery. |
JMX agent | jmx.discovery["beans","java.lang:name=*,type=GarbageCollector"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
GarbageCollector: {#JMXNAME} number of collections per second | Displays the total number of collections that have occurred per second. |
JMX agent | jmx["java.lang:name={#JMXNAME},type=GarbageCollector","CollectionCount"] Preprocessing
|
GarbageCollector: {#JMXNAME} accumulated time spent in collection | Displays the approximate accumulated collection elapsed time, in seconds. |
JMX agent | jmx["java.lang:name={#JMXNAME},type=GarbageCollector","CollectionTime"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Memory pool discovery | Memory pools metrics discovery. |
JMX agent | jmx.discovery["beans","java.lang:name=*,type=MemoryPool"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Memory pool: {#JMXNAME} committed | Current memory allocated. |
JMX agent | jmx["java.lang:name={#JMXNAME},type=MemoryPool","Usage.committed"] Preprocessing
|
Memory pool: {#JMXNAME} maximum size | Maximum amount of memory that can be used for memory management. This amount of memory is not guaranteed to be available if it is greater than the amount of committed memory. The Java virtual machine may fail to allocate memory even if the amount of used memory does not exceed this maximum size. |
JMX agent | jmx["java.lang:name={#JMXNAME},type=MemoryPool","Usage.max"] Preprocessing
|
Memory pool: {#JMXNAME} used | Current memory usage. |
JMX agent | jmx["java.lang:name={#JMXNAME},type=MemoryPool","Usage.used"] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Generic Java JMX: Memory pool: {#JMXNAME} memory usage is high | min(/Generic Java JMX/jmx["java.lang:name={#JMXNAME},type=MemoryPool","Usage.used"],{$JMX.MP.USAGE.TIME:"{#JMXNAME}"})>(last(/Generic Java JMX/jmx["java.lang:name={#JMXNAME},type=MemoryPool","Usage.max"])*{$JMX.MP.USAGE.MAX:"{#JMXNAME}"}/100) and last(/Generic Java JMX/jmx["java.lang:name={#JMXNAME},type=MemoryPool","Usage.max"])>0 |Warning |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
Official Template for Microsoft Exchange Server 2016.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Metrics are collected by Zabbix agent active.
1. Import the template into Zabbix.
2. Link the imported template to a host with MS Exchange.
Note that the template doesn't provide information about the state of Windows services. It is recommended to use it together with the "OS Windows by Zabbix agent active" template.
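As a quick sanity check, any of the counter keys below can be tested locally on the Exchange host with the agent's test mode. This is a sketch; the counter comes from the items below, and the quoting may need adjustment for your shell:
zabbix_agentd.exe -t "perf_counter_en[\"\MSExchange Active Manager(_total)\Database Mounted\"]"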
Name | Description | Default |
---|---|---|
{$MS.EXCHANGE.PERF.INTERVAL} | Update interval for perf_counter_en items. |
60 |
{$MS.EXCHANGE.DB.FAULTS.TIME} | The time during which the database page faults may exceed the threshold. |
5m |
{$MS.EXCHANGE.DB.FAULTS.WARN} | Threshold for database page faults trigger. |
0 |
{$MS.EXCHANGE.LOG.STALLS.TIME} | The time during which the log records stalled may exceed the threshold. |
10m |
{$MS.EXCHANGE.LOG.STALLS.WARN} | Threshold for log records stalled trigger. |
100 |
{$MS.EXCHANGE.DB.ACTIVE.READ.TIME} | The time during which the active database read operations latency may exceed the threshold. |
5m |
{$MS.EXCHANGE.DB.ACTIVE.READ.WARN} | Threshold for active database read operations latency trigger. |
0.02 |
{$MS.EXCHANGE.DB.ACTIVE.WRITE.TIME} | The time during which the active database write operations latency may exceed the threshold. |
10m |
{$MS.EXCHANGE.DB.ACTIVE.WRITE.WARN} | Threshold for active database write operations latency trigger. |
0.05 |
{$MS.EXCHANGE.DB.PASSIVE.READ.TIME} | The time during which the passive database read operations latency may exceed the threshold. |
5m |
{$MS.EXCHANGE.DB.PASSIVE.READ.WARN} | Threshold for passive database read operations latency trigger. |
0.2 |
{$MS.EXCHANGE.DB.PASSIVE.WRITE.TIME} | The time during which the passive database write operations latency may exceed the threshold. |
10m |
{$MS.EXCHANGE.RPC.TIME} | The time during which the RPC requests latency may exceed the threshold. |
10m |
{$MS.EXCHANGE.RPC.WARN} | Threshold for RPC requests latency trigger. |
0.05 |
{$MS.EXCHANGE.RPC.COUNT.TIME} | The time during which the RPC total requests may exceed the threshold. |
5m |
{$MS.EXCHANGE.RPC.COUNT.WARN} | Threshold for the RPC requests total trigger. |
70 |
{$MS.EXCHANGE.LDAP.TIME} | The time during which the LDAP metrics may exceed the threshold. |
5m |
{$MS.EXCHANGE.LDAP.WARN} | Threshold for LDAP triggers. |
0.05 |
{$AGENT.TIMEOUT} | Timeout after which agent is considered unavailable. |
5m |
Name | Description | Type | Key and additional info |
---|---|---|---|
Databases total mounted | Shows the number of active database copies on the server. |
Zabbix agent (active) | perf_counter_en["\MSExchange Active Manager(_total)\Database Mounted"] Preprocessing
|
ActiveSync: ping command pending | Shows the number of ping commands currently pending in the queue. |
Zabbix agent (active) | perf_counter_en["\MSExchange ActiveSync\Ping Commands Pending", {$MS.EXCHANGE.PERF.INTERVAL}] |
ActiveSync: requests per second | Shows the number of HTTP requests received from the client via ASP.NET per second. Determines the current Exchange ActiveSync request rate. Used only to determine current user load. |
Zabbix agent (active) | perf_counter_en["\MSExchange ActiveSync\Requests/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
ActiveSync: sync commands per second | Shows the number of sync commands processed per second. Clients use this command to synchronize items within a folder. |
Zabbix agent (active) | perf_counter_en["\MSExchange ActiveSync\Sync Commands/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
Autodiscover: requests per second | Shows the number of Autodiscover service requests processed each second. Determines current user load. |
Zabbix agent (active) | perf_counter_en["\MSExchangeAutodiscover\Requests/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
Availability Service: availability requests per second | Shows the number of requests serviced per second. The request can be only for free/ busy information or include suggestions. One request may contain multiple mailboxes. Determines the rate at which Availability service requests are occurring. |
Zabbix agent (active) | perf_counter_en["\MSExchange Availability Service\Availability Requests (sec)", {$MS.EXCHANGE.PERF.INTERVAL}] |
Outlook Web App: current unique users | Shows the number of unique users currently logged on to Outlook Web App. This value monitors the number of unique active user sessions, so that users are only removed from this counter after they log off or their session times out. Determines current user load. |
Zabbix agent (active) | perf_counter_en["\MSExchange OWA\Current Unique Users", {$MS.EXCHANGE.PERF.INTERVAL}] |
Outlook Web App: requests per second | Shows the number of requests handled by Outlook Web App per second. Determines current user load. |
Zabbix agent (active) | perf_counter_en["\MSExchange OWA\Requests/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
MSExchangeWS: requests per second | Shows the number of requests processed each second. Determines current user load. |
Zabbix agent (active) | perf_counter_en["\MSExchangeWS\Requests/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
Active agent availability | Availability of active checks on the host. The value of this item corresponds to availability icons in the host list. Possible values: 0 - unknown 1 - available 2 - not available |
Zabbix internal | zabbix[host,active_agent,available] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
MS Exchange 2016: Active checks are not available | Active checks are considered unavailable. Agent is not sending heartbeat for prolonged time. |
min(/Microsoft Exchange Server 2016 by Zabbix agent active/zabbix[host,active_agent,available],{$AGENT.TIMEOUT})=2 |High |
Name | Description | Type | Key and additional info |
---|---|---|---|
Databases discovery | Discovery of Exchange databases. |
Zabbix agent (active) | perf_instance.discovery["MSExchange Active Manager"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Active Manager [{#INSTANCE}]: Database copy role | Database copy active or passive role. |
Zabbix agent (active) | perf_counter_en["\MSExchange Active Manager({#INSTANCE})\Database Copy Role Active"] Preprocessing
|
Information Store [{#INSTANCE}]: Database state | Database state. Possible values: 0: Database without any copy and dismounted. 1: Database is a primary database and mounted. 2: Database is a passive copy and the state is healthy. |
Zabbix agent (active) | perf_counter_en["\MSExchangeIS Store({#INSTANCE})\Database State"] Preprocessing
|
Information Store [{#INSTANCE}]: Active mailboxes count | Number of active mailboxes in this database. |
Zabbix agent (active) | perf_counter_en["\MSExchangeIS Store({#INSTANCE})\Active mailboxes"] |
Information Store [{#INSTANCE}]: Page faults per second | Indicates the rate of page faults that can't be serviced because there are no pages available for allocation from the database cache. If this counter is above 0, it's an indication that the MSExchange Database\I/O Database Writes (Attached) Average Latency is too high. |
Zabbix agent (active) | perf_counter_en["\MSExchange Database({#INF.STORE})\Database Page Fault Stalls/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
Information Store [{#INSTANCE}]: Log records stalled | Indicates the number of log records that can't be added to the log buffers per second because the log buffers are full. The average value should be below 10 per second. Spikes (maximum values) shouldn't be higher than 100 per second. |
Zabbix agent (active) | perf_counter_en["\MSExchange Database({#INF.STORE})\Log Record Stalls/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
Information Store [{#INSTANCE}]: Log threads waiting | Indicates the number of threads waiting to complete an update of the database by writing their data to the log. |
Zabbix agent (active) | perf_counter_en["\MSExchange Database({#INF.STORE})\Log Threads Waiting", {$MS.EXCHANGE.PERF.INTERVAL}] |
Information Store [{#INSTANCE}]: RPC requests per second | Shows the number of RPC operations per second for each database instance. |
Zabbix agent (active) | perf_counter_en["\MSExchangeIS Store({#INSTANCE})\RPC Operations/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
Information Store [{#INSTANCE}]: RPC requests latency | RPC Latency average is the average latency of RPC requests per database. Average is calculated over all RPCs since exrpc32 was loaded. Should be less than 50ms at all times, with spikes less than 100ms. |
Zabbix agent (active) | perf_counter_en["\MSExchangeIS Store({#INSTANCE})\RPC Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}] Preprocessing
|
Information Store [{#INSTANCE}]: RPC requests total | Indicates the overall RPC requests currently executing within the information store process. Should be below 70 at all times. |
Zabbix agent (active) | perf_counter_en["\MSExchangeIS Store({#INSTANCE})\RPC requests", {$MS.EXCHANGE.PERF.INTERVAL}] |
Database Counters [{#INSTANCE}]: Active database read operations per second | Shows the number of database read operations. |
Zabbix agent (active) | perf_counter_en["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Reads (Attached)/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
Database Counters [{#INSTANCE}]: Active database read operations latency | Shows the average length of time per database read operation. Should be less than 20 ms on average. |
Zabbix agent (active) | perf_counter_en["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Reads (Attached) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}] Preprocessing
|
Database Counters [{#INSTANCE}]: Passive database read operations latency | Shows the average length of time per passive database read operation. Should be less than 200ms on average. |
Zabbix agent (active) | perf_counter_en["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Reads (Recovery) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}] Preprocessing
|
Database Counters [{#INSTANCE}]: Active database write operations per second | Shows the number of database write operations per second for each attached database instance. |
Zabbix agent (active) | perf_counter_en["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Writes (Attached)/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
Database Counters [{#INSTANCE}]: Active database write operations latency | Shows the average length of time per database write operation. Should be less than 50ms on average. |
Zabbix agent (active) | perf_counter_en["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Writes (Attached) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}] Preprocessing
|
Database Counters [{#INSTANCE}]: Passive database write operations latency | Shows the average length of time, in ms, per passive database write operation. Should be less than the read latency for the same instance, as measured by the MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Reads (Recovery) Average Latency counter. |
Zabbix agent (active) | perf_counter_en["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Writes (Recovery) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
MS Exchange 2016: Information Store [{#INSTANCE}]: Page faults is too high | Too many page fault stalls for database "{#INSTANCE}". This counter should be 0 on production servers. |
min(/Microsoft Exchange Server 2016 by Zabbix agent active/perf_counter_en["\MSExchange Database({#INF.STORE})\Database Page Fault Stalls/sec", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.DB.FAULTS.TIME})>{$MS.EXCHANGE.DB.FAULTS.WARN} |Average |
||
MS Exchange 2016: Information Store [{#INSTANCE}]: Log records stalls is too high | The rate of stalled log records is too high. The average value should be below 10 per second. |
avg(/Microsoft Exchange Server 2016 by Zabbix agent active/perf_counter_en["\MSExchange Database({#INF.STORE})\Log Record Stalls/sec", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.LOG.STALLS.TIME})>{$MS.EXCHANGE.LOG.STALLS.WARN} |Average |
||
MS Exchange 2016: Information Store [{#INSTANCE}]: RPC Requests latency is too high | Should be less than 50ms at all times, with spikes less than 100ms. |
min(/Microsoft Exchange Server 2016 by Zabbix agent active/perf_counter_en["\MSExchangeIS Store({#INSTANCE})\RPC Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.RPC.TIME})>{$MS.EXCHANGE.RPC.WARN} |Warning |
||
MS Exchange 2016: Information Store [{#INSTANCE}]: RPC Requests total count is too high | Should be below 70 at all times. |
min(/Microsoft Exchange Server 2016 by Zabbix agent active/perf_counter_en["\MSExchangeIS Store({#INSTANCE})\RPC requests", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.RPC.COUNT.TIME})>{$MS.EXCHANGE.RPC.COUNT.WARN} |Warning |
||
MS Exchange 2016: Database Counters [{#INSTANCE}]: Average read time latency is too high | Should be less than 20ms on average. |
min(/Microsoft Exchange Server 2016 by Zabbix agent active/perf_counter_en["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Reads (Attached) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.DB.ACTIVE.READ.TIME})>{$MS.EXCHANGE.DB.ACTIVE.READ.WARN} |Warning |
||
MS Exchange 2016: Database Counters [{#INSTANCE}]: Average read time latency is too high | Should be less than 200ms on average. |
min(/Microsoft Exchange Server 2016 by Zabbix agent active/perf_counter_en["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Reads (Recovery) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.DB.PASSIVE.READ.TIME})>{$MS.EXCHANGE.DB.PASSIVE.READ.WARN} |Warning |
||
MS Exchange 2016: Database Counters [{#INSTANCE}]: Average write time latency is too high for {$MS.EXCHANGE.DB.ACTIVE.WRITE.TIME} | Should be less than 50ms on average. |
min(/Microsoft Exchange Server 2016 by Zabbix agent active/perf_counter_en["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Writes (Attached) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.DB.ACTIVE.WRITE.TIME})>{$MS.EXCHANGE.DB.ACTIVE.WRITE.WARN} |Warning |
||
MS Exchange 2016: Database Counters [{#INSTANCE}]: Average write time latency is higher than read time latency for {$MS.EXCHANGE.DB.PASSIVE.WRITE.TIME} | Should be less than the read latency for the same instance, as measured by the MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Reads (Recovery) Average Latency counter. |
avg(/Microsoft Exchange Server 2016 by Zabbix agent active/perf_counter_en["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Writes (Recovery) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.DB.PASSIVE.WRITE.TIME})>avg(/Microsoft Exchange Server 2016 by Zabbix agent active/perf_counter_en["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Reads (Recovery) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.DB.PASSIVE.WRITE.TIME}) |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Web services discovery | Discovery of Exchange web services. |
Zabbix agent (active) | perf_instance_en.discovery["Web Service"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Web Service [{#INSTANCE}]: Current connections | Shows the current number of connections established to each Web Service. |
Zabbix agent (active) | perf_counter_en["\Web Service({#INSTANCE})\Current Connections", {$MS.EXCHANGE.PERF.INTERVAL}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
LDAP discovery | Discovery of domain controller. |
Zabbix agent (active) | perf_instance_en.discovery["MSExchange ADAccess Domain Controllers"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Domain Controller [{#INSTANCE}]: Read time | Time that it takes to send an LDAP read request to the domain controller in question and get a response. Should ideally be below 50 ms; spikes below 100 ms are acceptable. |
Zabbix agent (active) | perf_counter_en["\MSExchange ADAccess Domain Controllers({#INSTANCE})\LDAP Read Time", {$MS.EXCHANGE.PERF.INTERVAL}] Preprocessing
|
Domain Controller [{#INSTANCE}]: Search time | Time that it takes to send an LDAP search request and get a response. Should ideally be below 50 ms; spikes below 100 ms are acceptable. |
Zabbix agent (active) | perf_counter_en["\MSExchange ADAccess Domain Controllers({#INSTANCE})\LDAP Search Time", {$MS.EXCHANGE.PERF.INTERVAL}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
MS Exchange 2016: Domain Controller [{#INSTANCE}]: LDAP read time is too high | Should be less than 50ms at all times, with spikes less than 100ms. |
min(/Microsoft Exchange Server 2016 by Zabbix agent active/perf_counter_en["\MSExchange ADAccess Domain Controllers({#INSTANCE})\LDAP Read Time", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.LDAP.TIME})>{$MS.EXCHANGE.LDAP.WARN} |Average |
||
MS Exchange 2016: Domain Controller [{#INSTANCE}]: LDAP search time is too high | Should be less than 50ms at all times, with spikes less than 100ms. |
min(/Microsoft Exchange Server 2016 by Zabbix agent active/perf_counter_en["\MSExchange ADAccess Domain Controllers({#INSTANCE})\LDAP Search Time", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.LDAP.TIME})>{$MS.EXCHANGE.LDAP.WARN} |Average |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
Official Template for Microsoft Exchange Server 2016.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Metrics are collected by Zabbix agent.
1. Import the template into Zabbix.
2. Link the imported template to a host with MS Exchange.
Note that the template doesn't provide information about the state of Windows services. It is recommended to use it together with the "OS Windows by Zabbix agent" template.
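Because this version of the template uses passive checks, an item key can also be polled directly from the Zabbix server or proxy with zabbix_get. This is a sketch; <exchange-host> is a placeholder for the monitored host, and the quoting may need adjustment for your shell:
zabbix_get -s <exchange-host> -k "perf_counter_en[\"\MSExchange Active Manager(_total)\Database Mounted\"]"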
Name | Description | Default |
---|---|---|
{$MS.EXCHANGE.PERF.INTERVAL} | Update interval for perf_counter_en items. |
60 |
{$MS.EXCHANGE.DB.FAULTS.TIME} | The time during which the database page faults may exceed the threshold. |
5m |
{$MS.EXCHANGE.DB.FAULTS.WARN} | Threshold for database page faults trigger. |
0 |
{$MS.EXCHANGE.LOG.STALLS.TIME} | The time during which the log records stalled may exceed the threshold. |
10m |
{$MS.EXCHANGE.LOG.STALLS.WARN} | Threshold for log records stalled trigger. |
100 |
{$MS.EXCHANGE.DB.ACTIVE.READ.TIME} | The time during which the active database read operations latency may exceed the threshold. |
5m |
{$MS.EXCHANGE.DB.ACTIVE.READ.WARN} | Threshold for active database read operations latency trigger. |
0.02 |
{$MS.EXCHANGE.DB.ACTIVE.WRITE.TIME} | The time during which the active database write operations latency may exceed the threshold. |
10m |
{$MS.EXCHANGE.DB.ACTIVE.WRITE.WARN} | Threshold for active database write operations latency trigger. |
0.05 |
{$MS.EXCHANGE.DB.PASSIVE.READ.TIME} | The time during which the passive database read operations latency may exceed the threshold. |
5m |
{$MS.EXCHANGE.DB.PASSIVE.READ.WARN} | Threshold for passive database read operations latency trigger. |
0.2 |
{$MS.EXCHANGE.DB.PASSIVE.WRITE.TIME} | The time during which the passive database write operations latency may exceed the threshold. |
10m |
{$MS.EXCHANGE.RPC.TIME} | The time during which the RPC requests latency may exceed the threshold. |
10m |
{$MS.EXCHANGE.RPC.WARN} | Threshold for RPC requests latency trigger. |
0.05 |
{$MS.EXCHANGE.RPC.COUNT.TIME} | The time during which the RPC total requests may exceed the threshold. |
5m |
{$MS.EXCHANGE.RPC.COUNT.WARN} | Threshold for the RPC requests total trigger. |
70 |
{$MS.EXCHANGE.LDAP.TIME} | The time during which the LDAP metrics may exceed the threshold. |
5m |
{$MS.EXCHANGE.LDAP.WARN} | Threshold for LDAP triggers. |
0.05 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Databases total mounted | Shows the number of active database copies on the server. |
Zabbix agent | perf_counter_en["\MSExchange Active Manager(_total)\Database Mounted"] Preprocessing
|
ActiveSync: ping command pending | Shows the number of ping commands currently pending in the queue. |
Zabbix agent | perf_counter_en["\MSExchange ActiveSync\Ping Commands Pending", {$MS.EXCHANGE.PERF.INTERVAL}] |
ActiveSync: requests per second | Shows the number of HTTP requests received from the client via ASP.NET per second. Determines the current Exchange ActiveSync request rate. Used only to determine current user load. |
Zabbix agent | perf_counter_en["\MSExchange ActiveSync\Requests/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
ActiveSync: sync commands per second | Shows the number of sync commands processed per second. Clients use this command to synchronize items within a folder. |
Zabbix agent | perf_counter_en["\MSExchange ActiveSync\Sync Commands/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
Autodiscover: requests per second | Shows the number of Autodiscover service requests processed each second. Determines current user load. |
Zabbix agent | perf_counter_en["\MSExchangeAutodiscover\Requests/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
Availability Service: availability requests per second | Shows the number of requests serviced per second. The request can be only for free/ busy information or include suggestions. One request may contain multiple mailboxes. Determines the rate at which Availability service requests are occurring. |
Zabbix agent | perf_counter_en["\MSExchange Availability Service\Availability Requests (sec)", {$MS.EXCHANGE.PERF.INTERVAL}] |
Outlook Web App: current unique users | Shows the number of unique users currently logged on to Outlook Web App. This value monitors the number of unique active user sessions, so that users are only removed from this counter after they log off or their session times out. Determines current user load. |
Zabbix agent | perf_counter_en["\MSExchange OWA\Current Unique Users", {$MS.EXCHANGE.PERF.INTERVAL}] |
Outlook Web App: requests per second | Shows the number of requests handled by Outlook Web App per second. Determines current user load. |
Zabbix agent | perf_counter_en["\MSExchange OWA\Requests/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
MSExchangeWS: requests per second | Shows the number of requests processed each second. Determines current user load. |
Zabbix agent | perf_counter_en["\MSExchangeWS\Requests/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Databases discovery | Discovery of Exchange databases. |
Zabbix agent | perf_instance.discovery["MSExchange Active Manager"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Active Manager [{#INSTANCE}]: Database copy role | Database copy active or passive role. |
Zabbix agent | perf_counter_en["\MSExchange Active Manager({#INSTANCE})\Database Copy Role Active"] Preprocessing
|
Information Store [{#INSTANCE}]: Database state | Database state. Possible values: 0: Database without any copy and dismounted. 1: Database is a primary database and mounted. 2: Database is a passive copy and the state is healthy. |
Zabbix agent | perf_counter_en["\MSExchangeIS Store({#INSTANCE})\Database State"] Preprocessing
|
Information Store [{#INSTANCE}]: Active mailboxes count | Number of active mailboxes in this database. |
Zabbix agent | perf_counter_en["\MSExchangeIS Store({#INSTANCE})\Active mailboxes"] |
Information Store [{#INSTANCE}]: Page faults per second | Indicates the rate of page faults that can't be serviced because there are no pages available for allocation from the database cache. If this counter is above 0, it's an indication that the MSExchange Database\I/O Database Writes (Attached) Average Latency is too high. |
Zabbix agent | perf_counter_en["\MSExchange Database({#INF.STORE})\Database Page Fault Stalls/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
Information Store [{#INSTANCE}]: Log records stalled | Indicates the number of log records that can't be added to the log buffers per second because the log buffers are full. The average value should be below 10 per second. Spikes (maximum values) shouldn't be higher than 100 per second. |
Zabbix agent | perf_counter_en["\MSExchange Database({#INF.STORE})\Log Record Stalls/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
Information Store [{#INSTANCE}]: Log threads waiting | Indicates the number of threads waiting to complete an update of the database by writing their data to the log. |
Zabbix agent | perf_counter_en["\MSExchange Database({#INF.STORE})\Log Threads Waiting", {$MS.EXCHANGE.PERF.INTERVAL}] |
Information Store [{#INSTANCE}]: RPC requests per second | Shows the number of RPC operations per second for each database instance. |
Zabbix agent | perf_counter_en["\MSExchangeIS Store({#INSTANCE})\RPC Operations/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
Information Store [{#INSTANCE}]: RPC requests latency | RPC Latency average is the average latency of RPC requests per database. Average is calculated over all RPCs since exrpc32 was loaded. Should be less than 50ms at all times, with spikes less than 100ms. |
Zabbix agent | perf_counter_en["\MSExchangeIS Store({#INSTANCE})\RPC Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}] Preprocessing
|
Information Store [{#INSTANCE}]: RPC requests total | Indicates the overall RPC requests currently executing within the information store process. Should be below 70 at all times. |
Zabbix agent | perf_counter_en["\MSExchangeIS Store({#INSTANCE})\RPC requests", {$MS.EXCHANGE.PERF.INTERVAL}] |
Database Counters [{#INSTANCE}]: Active database read operations per second | Shows the number of database read operations. |
Zabbix agent | perf_counter_en["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Reads (Attached)/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
Database Counters [{#INSTANCE}]: Active database read operations latency | Shows the average length of time per database read operation. Should be less than 20 ms on average. |
Zabbix agent | perf_counter_en["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Reads (Attached) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}] Preprocessing
|
Database Counters [{#INSTANCE}]: Passive database read operations latency | Shows the average length of time per passive database read operation. Should be less than 200ms on average. |
Zabbix agent | perf_counter_en["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Reads (Recovery) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}] Preprocessing
|
Database Counters [{#INSTANCE}]: Active database write operations per second | Shows the number of database write operations per second for each attached database instance. |
Zabbix agent | perf_counter_en["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Writes (Attached)/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
Database Counters [{#INSTANCE}]: Active database write operations latency | Shows the average length of time per database write operation. Should be less than 50ms on average. |
Zabbix agent | perf_counter_en["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Writes (Attached) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}] Preprocessing
|
Database Counters [{#INSTANCE}]: Passive database write operations latency | Shows the average length of time, in ms, per passive database write operation. Should be less than the read latency for the same instance, as measured by the MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Reads (Recovery) Average Latency counter. |
Zabbix agent | perf_counter_en["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Writes (Recovery) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
MS Exchange 2016: Information Store [{#INSTANCE}]: Page faults is too high | Too many page fault stalls for database "{#INSTANCE}". This counter should be 0 on production servers. |
min(/Microsoft Exchange Server 2016 by Zabbix agent/perf_counter_en["\MSExchange Database({#INF.STORE})\Database Page Fault Stalls/sec", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.DB.FAULTS.TIME})>{$MS.EXCHANGE.DB.FAULTS.WARN} |Average |
||
MS Exchange 2016: Information Store [{#INSTANCE}]: Log records stalls is too high | The rate of stalled log records is too high. The average value should be below 10 per second. |
avg(/Microsoft Exchange Server 2016 by Zabbix agent/perf_counter_en["\MSExchange Database({#INF.STORE})\Log Record Stalls/sec", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.LOG.STALLS.TIME})>{$MS.EXCHANGE.LOG.STALLS.WARN} |Average |
||
MS Exchange 2016: Information Store [{#INSTANCE}]: RPC Requests latency is too high | Should be less than 50ms at all times, with spikes less than 100ms. |
min(/Microsoft Exchange Server 2016 by Zabbix agent/perf_counter_en["\MSExchangeIS Store({#INSTANCE})\RPC Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.RPC.TIME})>{$MS.EXCHANGE.RPC.WARN} |Warning |
||
MS Exchange 2016: Information Store [{#INSTANCE}]: RPC Requests total count is too high | Should be below 70 at all times. |
min(/Microsoft Exchange Server 2016 by Zabbix agent/perf_counter_en["\MSExchangeIS Store({#INSTANCE})\RPC requests", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.RPC.COUNT.TIME})>{$MS.EXCHANGE.RPC.COUNT.WARN} |Warning |
||
MS Exchange 2016: Database Counters [{#INSTANCE}]: Average read time latency is too high | Should be less than 20ms on average. |
min(/Microsoft Exchange Server 2016 by Zabbix agent/perf_counter_en["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Reads (Attached) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.DB.ACTIVE.READ.TIME})>{$MS.EXCHANGE.DB.ACTIVE.READ.WARN} |Warning |
||
MS Exchange 2016: Database Counters [{#INSTANCE}]: Average read time latency is too high | Should be less than 200ms on average. |
min(/Microsoft Exchange Server 2016 by Zabbix agent/perf_counter_en["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Reads (Recovery) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.DB.PASSIVE.READ.TIME})>{$MS.EXCHANGE.DB.PASSIVE.READ.WARN} |Warning |
||
MS Exchange 2016: Database Counters [{#INSTANCE}]: Average write time latency is too high for {$MS.EXCHANGE.DB.ACTIVE.WRITE.TIME} | Should be less than 50ms on average. |
min(/Microsoft Exchange Server 2016 by Zabbix agent/perf_counter_en["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Writes (Attached) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.DB.ACTIVE.WRITE.TIME})>{$MS.EXCHANGE.DB.ACTIVE.WRITE.WARN} |Warning |
||
MS Exchange 2016: Database Counters [{#INSTANCE}]: Average write time latency is higher than read time latency for {$MS.EXCHANGE.DB.PASSIVE.WRITE.TIME} | Should be less than the read latency for the same instance, as measured by the MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Reads (Recovery) Average Latency counter. |
avg(/Microsoft Exchange Server 2016 by Zabbix agent/perf_counter_en["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Writes (Recovery) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.DB.PASSIVE.WRITE.TIME})>avg(/Microsoft Exchange Server 2016 by Zabbix agent/perf_counter_en["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Reads (Recovery) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.DB.PASSIVE.WRITE.TIME}) |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Web services discovery | Discovery of Exchange web services. |
Zabbix agent | perf_instance_en.discovery["Web Service"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Web Service [{#INSTANCE}]: Current connections | Shows the current number of connections established to each Web Service. |
Zabbix agent | perf_counter_en["\Web Service({#INSTANCE})\Current Connections", {$MS.EXCHANGE.PERF.INTERVAL}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
LDAP discovery | Discovery of domain controller. |
Zabbix agent | perf_instance_en.discovery["MSExchange ADAccess Domain Controllers"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Domain Controller [{#INSTANCE}]: Read time | Time that it takes to send an LDAP read request to the domain controller in question and get a response. Should ideally be below 50 ms; spikes below 100 ms are acceptable. |
Zabbix agent | perf_counter_en["\MSExchange ADAccess Domain Controllers({#INSTANCE})\LDAP Read Time", {$MS.EXCHANGE.PERF.INTERVAL}] Preprocessing
|
Domain Controller [{#INSTANCE}]: Search time | Time that it takes to send an LDAP search request and get a response. Should ideally be below 50 ms; spikes below 100 ms are acceptable. |
Zabbix agent | perf_counter_en["\MSExchange ADAccess Domain Controllers({#INSTANCE})\LDAP Search Time", {$MS.EXCHANGE.PERF.INTERVAL}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
MS Exchange 2016: Domain Controller [{#INSTANCE}]: LDAP read time is too high | Should be less than 50ms at all times, with spikes less than 100ms. |
min(/Microsoft Exchange Server 2016 by Zabbix agent/perf_counter_en["\MSExchange ADAccess Domain Controllers({#INSTANCE})\LDAP Read Time", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.LDAP.TIME})>{$MS.EXCHANGE.LDAP.WARN} |Average |
||
MS Exchange 2016: Domain Controller [{#INSTANCE}]: LDAP search time is too high | Should be less than 50ms at all times, with spikes less than 100ms. |
min(/Microsoft Exchange Server 2016 by Zabbix agent/perf_counter_en["\MSExchange ADAccess Domain Controllers({#INSTANCE})\LDAP Search Time", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.LDAP.TIME})>{$MS.EXCHANGE.LDAP.WARN} |Average |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed to monitor etcd by Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The template Etcd by HTTP collects metrics with the help of the HTTP agent from the /metrics endpoint.
Refer to the vendor documentation.
For the users of etcd version <= 3.4: in etcd v3.5 some metrics have been deprecated. See more details in Upgrade etcd from 3.4 to 3.5. Please upgrade your etcd instance, or use an older version of the Etcd by HTTP template.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Make sure that etcd allows the collection of metrics. You can test it by running: curl -L http://localhost:2379/metrics
Check if etcd is accessible from the Zabbix proxy or Zabbix server, depending on where you are planning to do the monitoring. To verify it, run: curl -L http://<etcd_node_address>:2379/metrics
Add the template to the etcd node. Set the hostname or IP address of the etcd host in the {$ETCD.HOST} macro. By default, the template uses the client port.
You can configure the metrics endpoint location by adding the --listen-metrics-urls flag, as shown in the example below. For more details, see the etcd documentation.
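For example, a dedicated metrics listener can be exposed like this (a sketch; the port 2381 is an arbitrary example, adjust it to your environment):
etcd --listen-metrics-urls=http://0.0.0.0:2381
curl -L http://<etcd_node_address>:2381/metrics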
Additional points to consider:
If a non-default port or scheme is used for etcd, don't forget to change the {$ETCD.SCHEME} and {$ETCD.PORT} macros.
Set the {$ETCD.USERNAME} and {$ETCD.PASSWORD} macros in the template to use them on a host level if necessary.
To test availability, run: zabbix_get -s etcd-host -k etcd.health
Name | Description | Default |
---|---|---|
{$ETCD.HOST} | The hostname or IP address of the etcd host. |
<SET ETCD HOST> |
{$ETCD.PORT} | The port of the etcd API endpoint. |
2379 |
{$ETCD.SCHEME} | The request scheme which may be http or https. |
http |
{$ETCD.USER} | ||
{$ETCD.PASSWORD} | ||
{$ETCD.LEADER.CHANGES.MAX.WARN} | The maximum number of leader changes. |
5 |
{$ETCD.PROPOSAL.FAIL.MAX.WARN} | The maximum number of proposal failures. |
2 |
{$ETCD.HTTP.FAIL.MAX.WARN} | The maximum number of HTTP request failures. |
2 |
{$ETCD.PROPOSAL.PENDING.MAX.WARN} | The maximum number of proposals in queue. |
5 |
{$ETCD.OPEN.FDS.MAX.WARN} | The maximum percentage of used file descriptors. |
90 |
{$ETCD.GRPC_CODE.MATCHES} | The filter of discoverable gRPC codes. See more details on https://github.com/grpc/grpc/blob/master/doc/statuscodes.md. |
.* |
{$ETCD.GRPC_CODE.NOT_MATCHES} | The filter to exclude discovered gRPC codes. See more details on https://github.com/grpc/grpc/blob/master/doc/statuscodes.md. |
CHANGE_IF_NEEDED |
{$ETCD.GRPC.ERRORS.MAX.WARN} | The maximum number of gRPC request failures. |
1 |
{$ETCD.GRPC_CODE.TRIGGER.MATCHES} | The filter of discoverable gRPC codes, which will create triggers. |
Aborted|Unavailable |
Name | Description | Type | Key and additional info |
---|---|---|---|
Service's TCP port state | Simple check | net.tcp.service["{$ETCD.SCHEME}","{$ETCD.HOST}","{$ETCD.PORT}"] Preprocessing
|
|
Get node metrics | HTTP agent | etcd.get_metrics | |
Node health | HTTP agent | etcd.health Preprocessing
|
|
Server is a leader | It defines - whether or not this member is a leader: 1 - it is; 0 - otherwise. |
Dependent item | etcd.is.leader Preprocessing
|
Server has a leader | It defines - whether or not a leader exists: 1 - it exists; 0 - it does not. |
Dependent item | etcd.has.leader Preprocessing
|
Leader changes | The number of leader changes the member has seen since its start. |
Dependent item | etcd.leader.changes Preprocessing
|
Proposals committed per second | The number of consensus proposals committed. |
Dependent item | etcd.proposals.committed.rate Preprocessing
|
Proposals applied per second | The number of consensus proposals applied. |
Dependent item | etcd.proposals.applied.rate Preprocessing
|
Proposals failed per second | The number of failed proposals seen. |
Dependent item | etcd.proposals.failed.rate Preprocessing
|
Proposals pending | The current number of pending proposals to commit. |
Dependent item | etcd.proposals.pending Preprocessing
|
Reads per second | The number of read actions by |
Dependent item | etcd.reads.rate Preprocessing
|
Writes per second | The number of writes (e.g., |
Dependent item | etcd.writes.rate Preprocessing
|
Client gRPC received bytes per second | The number of bytes received from gRPC clients per second. |
Dependent item | etcd.network.grpc.received.rate Preprocessing
|
Client gRPC sent bytes per second | The number of bytes sent from gRPC clients per second. |
Dependent item | etcd.network.grpc.sent.rate Preprocessing
|
HTTP requests received | The number of requests received into the system (successfully parsed and |
Dependent item | etcd.http.requests.rate Preprocessing
|
HTTP 5XX | The number of handled failures of requests (non-watches), by the method ( |
Dependent item | etcd.http.requests.5xx.rate Preprocessing
|
HTTP 4XX | The number of handled failures of requests (non-watches), by the method ( |
Dependent item | etcd.http.requests.4xx.rate Preprocessing
|
RPCs received per second | The number of RPC stream messages received on the server. |
Dependent item | etcd.grpc.received.rate Preprocessing
|
RPCs sent per second | The number of gRPC stream messages sent by the server. |
Dependent item | etcd.grpc.sent.rate Preprocessing
|
RPCs started per second | The number of RPCs started on the server. |
Dependent item | etcd.grpc.started.rate Preprocessing
|
Get version | HTTP agent | etcd.get_version | |
Server version | The version of the etcd server. |
Dependent item | etcd.server.version Preprocessing
|
Cluster version | The version of the etcd cluster. |
Dependent item | etcd.cluster.version Preprocessing
|
DB size | The total size of the underlying database. |
Dependent item | etcd.db.size Preprocessing
|
Keys compacted per second | The number of DB keys compacted per second. |
Dependent item | etcd.keys.compacted.rate Preprocessing
|
Keys expired per second | The number of expired keys per second. |
Dependent item | etcd.keys.expired.rate Preprocessing
|
Keys total | The total number of keys. |
Dependent item | etcd.keys.total Preprocessing
|
Uptime | Etcd server uptime. |
Dependent item | etcd.uptime Preprocessing
|
Virtual memory | The size of virtual memory expressed in bytes. |
Dependent item | etcd.virtual.bytes Preprocessing
|
Resident memory | The size of resident memory expressed in bytes. |
Dependent item | etcd.res.bytes Preprocessing
|
CPU | The total user and system CPU time spent in seconds. |
Dependent item | etcd.cpu.util Preprocessing
|
Open file descriptors | The number of open file descriptors. |
Dependent item | etcd.open.fds Preprocessing
|
Maximum open file descriptors | The maximum number of open file descriptors. |
Dependent item | etcd.max.fds Preprocessing
|
Deletes per second | The number of deletes seen by this member per second. |
Dependent item | etcd.delete.rate Preprocessing
|
PUT per second | The number of puts seen by this member per second. |
Dependent item | etcd.put.rate Preprocessing
|
Range per second | The number of ranges seen by this member per second. |
Dependent item | etcd.range.rate Preprocessing
|
Transaction per second | The number of transactions seen by this member per second. |
Dependent item | etcd.txn.rate Preprocessing
|
Pending events | The total number of pending events to be sent. |
Dependent item | etcd.events.sent.rate Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Etcd: Service is unavailable | last(/Etcd by HTTP/net.tcp.service["{$ETCD.SCHEME}","{$ETCD.HOST}","{$ETCD.PORT}"])=0 |Average |
Manual close: Yes | ||
Etcd: Node healthcheck failed | See more details on https://etcd.io/docs/v3.5/op-guide/monitoring/#health-check. |
last(/Etcd by HTTP/etcd.health)=0 |Average |
Depends on:
|
|
Etcd: Failed to fetch info data | Zabbix has not received any data for items for the last 30 minutes. |
nodata(/Etcd by HTTP/etcd.is.leader,30m)=1 |Warning |
Manual close: Yes Depends on:
|
|
Etcd: Member has no leader | If a member does not have a leader, it is totally unavailable. |
last(/Etcd by HTTP/etcd.has.leader)=0 |Average |
||
Etcd: Instance has seen too many leader changes | Rapid leadership changes significantly impact the performance of etcd. They may also indicate that the leader is unstable, possibly because of network connectivity issues or excessive load. |
(max(/Etcd by HTTP/etcd.leader.changes,15m)-min(/Etcd by HTTP/etcd.leader.changes,15m))>{$ETCD.LEADER.CHANGES.MAX.WARN} |Warning |
||
Etcd: Too many proposal failures | Normally related to two issues: temporary failures related to a leader election or longer downtime caused by a loss of quorum in the cluster. |
min(/Etcd by HTTP/etcd.proposals.failed.rate,5m)>{$ETCD.PROPOSAL.FAIL.MAX.WARN} |Warning |
||
Etcd: Too many proposals are queued to commit | Rising pending proposals suggests there is a high client load, or the member cannot commit proposals. |
min(/Etcd by HTTP/etcd.proposals.pending,5m)>{$ETCD.PROPOSAL.PENDING.MAX.WARN} |Warning |
||
Etcd: Too many HTTP requests failures | Too many HTTP requests have failed on etcd (5XX response codes). |
min(/Etcd by HTTP/etcd.http.requests.5xx.rate,5m)>{$ETCD.HTTP.FAIL.MAX.WARN} |Warning |
||
Etcd: Server version has changed | Etcd version has changed. Acknowledge to close the problem manually. |
last(/Etcd by HTTP/etcd.server.version,#1)<>last(/Etcd by HTTP/etcd.server.version,#2) and length(last(/Etcd by HTTP/etcd.server.version))>0 |Info |
Manual close: Yes | |
Etcd: Cluster version has changed | Etcd version has changed. Acknowledge to close the problem manually. |
last(/Etcd by HTTP/etcd.cluster.version,#1)<>last(/Etcd by HTTP/etcd.cluster.version,#2) and length(last(/Etcd by HTTP/etcd.cluster.version))>0 |Info |
Manual close: Yes | |
Etcd: Host has been restarted | Uptime is less than 10 minutes. |
last(/Etcd by HTTP/etcd.uptime)<10m |Info |
Manual close: Yes | |
Etcd: Current number of open files is too high | Heavy file descriptor usage (i.e., close to the limit of the process's file descriptors) indicates a potential file descriptor exhaustion issue. |
min(/Etcd by HTTP/etcd.open.fds,5m)/last(/Etcd by HTTP/etcd.max.fds)*100>{$ETCD.OPEN.FDS.MAX.WARN} |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
gRPC codes discovery | Dependent item | etcd.grpc_code.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
RPCs completed with code {#GRPC.CODE} | The number of RPCs completed on the server with grpc_code {#GRPC.CODE}. |
Dependent item | etcd.grpc.handled.rate[{#GRPC.CODE}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Etcd: Too many failed gRPC requests with code: {#GRPC.CODE} | min(/Etcd by HTTP/etcd.grpc.handled.rate[{#GRPC.CODE}],5m)>{$ETCD.GRPC.ERRORS.MAX.WARN} |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Peers discovery | Dependent item | etcd.peer.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Etcd peer {#ETCD.PEER}: Bytes sent | The number of bytes sent to a peer with the ID {#ETCD.PEER}. |
Dependent item | etcd.bytes.sent.rate[{#ETCD.PEER}] Preprocessing
|
Etcd peer {#ETCD.PEER}: Bytes received | The number of bytes received from a peer with the ID {#ETCD.PEER}. |
Dependent item | etcd.bytes.received.rate[{#ETCD.PEER}] Preprocessing
|
Etcd peer {#ETCD.PEER}: Send failures | The number of send failures from a peer with the ID {#ETCD.PEER}. |
Dependent item | etcd.sent.fail.rate[{#ETCD.PEER}] Preprocessing
|
Etcd peer {#ETCD.PEER}: Receive failures | The number of receive failures from a peer with the ID {#ETCD.PEER}. |
Dependent item | etcd.received.fail.rate[{#ETCD.PEER}] Preprocessing
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template to monitor Envoy Proxy by Zabbix that works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
Template Envoy Proxy by HTTP
- collects metrics by HTTP agent from the {$ENVOY.METRICS.PATH} endpoint (default: /stats/prometheus).
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Internal service metrics are collected from the {$ENVOY.METRICS.PATH} endpoint (default: /stats/prometheus). See https://www.envoyproxy.io/docs/envoy/v1.20.0/operations/stats_overview for details.
Don't forget to change the {$ENVOY.URL} and {$ENVOY.METRICS.PATH} macros. Also, see the Macros section for a list of macros used to set trigger values.
NOTE. Some metrics may not be collected depending on your Envoy Proxy instance version and configuration.
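Before linking the template, it can help to fetch the metrics endpoint manually and confirm the expected Prometheus-format output is there; a minimal sketch, assuming the default admin address from {$ENVOY.URL} and the standard envoy_server_* gauge names:

```python
import urllib.request

# Hypothetical values for the {$ENVOY.URL} and {$ENVOY.METRICS.PATH} macros.
ENVOY_URL = "http://localhost:9901"
METRICS_PATH = "/stats/prometheus"

with urllib.request.urlopen(ENVOY_URL + METRICS_PATH) as resp:
    text = resp.read().decode()

# Print a couple of server-level gauges the template turns into items.
for line in text.splitlines():
    if line.startswith(("envoy_server_uptime", "envoy_server_live")):
        print(line)
```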
Name | Description | Default |
---|---|---|
{$ENVOY.URL} | Instance URL. |
http://localhost:9901 |
{$ENVOY.METRICS.PATH} | The path Zabbix will scrape metrics in prometheus format from. |
/stats/prometheus |
{$ENVOY.CERT.MIN} | Minimum number of days before certificate expiration used for trigger expression. |
7 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Get node metrics | Get server metrics. |
HTTP agent | envoy.get_metrics Preprocessing
|
Server state | State of the server. Live - (default) Server is live and serving traffic. Draining - Server is draining listeners in response to external health checks failing. Pre initializing - Server has not yet completed cluster manager initialization. Initializing - Server is running the cluster manager initialization callbacks (e.g., RDS). |
Dependent item | envoy.server.state Preprocessing
|
Server live | 1 if the server is not currently draining, 0 otherwise. |
Dependent item | envoy.server.live Preprocessing
|
Uptime | Current server uptime in seconds. |
Dependent item | envoy.server.uptime Preprocessing
|
Certificate expiration, day before | Number of days until the next certificate being managed will expire. |
Dependent item | envoy.server.days_until_first_cert_expiring Preprocessing
|
Server concurrency | Number of worker threads. |
Dependent item | envoy.server.concurrency Preprocessing
|
Memory allocated | Current amount of allocated memory in bytes. Total of both new and old Envoy processes on hot restart. |
Dependent item | envoy.server.memory_allocated Preprocessing
|
Memory heap size | Current reserved heap size in bytes. New Envoy process heap size on hot restart. |
Dependent item | envoy.server.memoryheapsize Preprocessing
|
Memory physical size | Current estimate of total bytes of the physical memory. New Envoy process physical memory size on hot restart. |
Dependent item | envoy.server.memoryphysicalsize Preprocessing
|
Filesystem, flushed by timer rate | Total number of times internal flush buffers are written to a file due to flush timeout per second. |
Dependent item | envoy.filesystem.flushedbytimer.rate Preprocessing
|
Filesystem, write completed rate | Total number of times a file was written per second. |
Dependent item | envoy.filesystem.write_completed.rate Preprocessing
|
Filesystem, write failed rate | Total number of times an error occurred during a file write operation per second. |
Dependent item | envoy.filesystem.write_failed.rate Preprocessing
|
Filesystem, reopen failed rate | Total number of times a file failed to be opened per second. |
Dependent item | envoy.filesystem.reopen_failed.rate Preprocessing
|
Connections, total | Total connections of both new and old Envoy processes. |
Dependent item | envoy.server.total_connections Preprocessing
|
Connections, parent | Total connections of the old Envoy process on hot restart. |
Dependent item | envoy.server.parent_connections Preprocessing
|
Clusters, warming | Number of currently warming (not active) clusters. |
Dependent item | envoy.clustermanager.warmingclusters Preprocessing
|
Clusters, active | Number of currently active (warmed) clusters. |
Dependent item | envoy.clustermanager.activeclusters Preprocessing
|
Clusters, added rate | Total clusters added (either via static config or CDS) per second. |
Dependent item | envoy.clustermanager.clusteradded.rate Preprocessing
|
Clusters, modified rate | Total clusters modified (via CDS) per second. |
Dependent item | envoy.clustermanager.clustermodified.rate Preprocessing
|
Clusters, removed rate | Total clusters removed (via CDS) per second. |
Dependent item | envoy.clustermanager.clusterremoved.rate Preprocessing
|
Clusters, updates rate | Total cluster updates per second. |
Dependent item | envoy.clustermanager.clusterupdated.rate Preprocessing
|
Listeners, active | Number of currently active listeners. |
Dependent item | envoy.listenermanager.totallisteners_active Preprocessing
|
Listeners, draining | Number of currently draining listeners. |
Dependent item | envoy.listenermanager.totallisteners_draining Preprocessing
|
Listener, warming | Number of currently warming listeners. |
Dependent item | envoy.listenermanager.totallisteners_warming Preprocessing
|
Listener manager, initialized | A boolean (1 if started and 0 otherwise) that indicates whether listeners have been initialized on workers. |
Dependent item | envoy.listenermanager.workersstarted Preprocessing
|
Listeners, create failure | Total failed listener object additions to workers per second. |
Dependent item | envoy.listenermanager.listenercreate_failure.rate Preprocessing
|
Listeners, create success | Total listener objects successfully added to workers per second. |
Dependent item | envoy.listenermanager.listenercreate_success.rate Preprocessing
|
Listeners, added | Total listeners added (either via static config or LDS) per second. |
Dependent item | envoy.listenermanager.listeneradded.rate Preprocessing
|
Listeners, stopped | Total listeners stopped per second. |
Dependent item | envoy.listenermanager.listenerstopped.rate Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Envoy Proxy: Server state is not live | last(/Envoy Proxy by HTTP/envoy.server.state) > 0 |Average |
|||
Envoy Proxy: Service has been restarted | Uptime is less than 10 minutes. |
last(/Envoy Proxy by HTTP/envoy.server.uptime)<10m |Info |
Manual close: Yes | |
Envoy Proxy: Failed to fetch metrics data | Zabbix has not received data for items for the last 10 minutes. |
nodata(/Envoy Proxy by HTTP/envoy.server.uptime,10m)=1 |Warning |
Manual close: Yes | |
Envoy Proxy: SSL certificate expires soon | Please check the certificate. Fewer than {$ENVOY.CERT.MIN} days are left until the next certificate being managed expires. |
last(/Envoy Proxy by HTTP/envoy.server.days_until_first_cert_expiring)<{$ENVOY.CERT.MIN} |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Cluster metrics discovery | Dependent item | envoy.lld.cluster Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Cluster ["{#CLUSTER_NAME}"]: Membership, total | Current cluster membership total. |
Dependent item | envoy.cluster.membership_total["{#CLUSTER_NAME}"] Preprocessing
|
Cluster ["{#CLUSTER_NAME}"]: Membership, healthy | Current cluster healthy total (inclusive of both health checking and outlier detection). |
Dependent item | envoy.cluster.membership_healthy["{#CLUSTER_NAME}"] Preprocessing
|
Cluster ["{#CLUSTER_NAME}"]: Membership, unhealthy | Current cluster unhealthy total. |
Calculated | envoy.cluster.membership_unhealthy["{#CLUSTER_NAME}"] |
Cluster ["{#CLUSTER_NAME}"]: Membership, degraded | Current cluster degraded total. |
Dependent item | envoy.cluster.membership_degraded["{#CLUSTER_NAME}"] Preprocessing
|
Cluster ["{#CLUSTER_NAME}"]: Connections, total | Current cluster total connections. |
Dependent item | envoy.cluster.upstreamcxtotal["{#CLUSTER_NAME}"] Preprocessing
|
Cluster ["{#CLUSTER_NAME}"]: Connections, active | Current cluster total active connections. |
Dependent item | envoy.cluster.upstreamcxactive["{#CLUSTER_NAME}"] Preprocessing
|
Cluster ["{#CLUSTER_NAME}"]: Requests total, rate | Current cluster request total per second. |
Dependent item | envoy.cluster.upstreamrqtotal.rate["{#CLUSTER_NAME}"] Preprocessing
|
Cluster ["{#CLUSTER_NAME}"]: Requests timeout, rate | Current cluster requests that timed out waiting for a response per second. |
Dependent item | envoy.cluster.upstreamrqtimeout.rate["{#CLUSTER_NAME}"] Preprocessing
|
Cluster ["{#CLUSTER_NAME}"]: Requests completed, rate | Total upstream requests completed per second. |
Dependent item | envoy.cluster.upstreamrqcompleted.rate["{#CLUSTER_NAME}"] Preprocessing
|
Cluster ["{#CLUSTER_NAME}"]: Requests 2xx, rate | Aggregate HTTP response codes per second. |
Dependent item | envoy.cluster.upstreamrq2x.rate["{#CLUSTER_NAME}"] Preprocessing
|
Cluster ["{#CLUSTER_NAME}"]: Requests 3xx, rate | Aggregate HTTP response codes per second. |
Dependent item | envoy.cluster.upstreamrq3x.rate["{#CLUSTER_NAME}"] Preprocessing
|
Cluster ["{#CLUSTER_NAME}"]: Requests 4xx, rate | Aggregate HTTP response codes per second. |
Dependent item | envoy.cluster.upstreamrq4x.rate["{#CLUSTER_NAME}"] Preprocessing
|
Cluster ["{#CLUSTER_NAME}"]: Requests 5xx, rate | Aggregate HTTP response codes per second. |
Dependent item | envoy.cluster.upstreamrq5x.rate["{#CLUSTER_NAME}"] Preprocessing
|
Cluster ["{#CLUSTER_NAME}"]: Requests pending | Total active requests pending a connection pool connection. |
Dependent item | envoy.cluster.upstreamrqpendingactive["{#CLUSTERNAME}"] Preprocessing
|
Cluster ["{#CLUSTER_NAME}"]: Requests active | Total active requests. |
Dependent item | envoy.cluster.upstreamrqactive["{#CLUSTER_NAME}"] Preprocessing
|
Cluster ["{#CLUSTER_NAME}"]: Upstream bytes out, rate | Total sent connection bytes per second. |
Dependent item | envoy.cluster.upstreamcxtxbytestotal.rate["{#CLUSTER_NAME}"] Preprocessing
|
Cluster ["{#CLUSTER_NAME}"]: Upstream bytes in, rate | Total received connection bytes per second. |
Dependent item | envoy.cluster.upstreamcxrxbytestotal.rate["{#CLUSTER_NAME}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Envoy Proxy: There are unhealthy clusters | last(/Envoy Proxy by HTTP/envoy.cluster.membership_unhealthy["{#CLUSTER_NAME}"]) > 0 |Average |
Name | Description | Type | Key and additional info |
---|---|---|---|
Listeners metrics discovery | Dependent item | envoy.lld.listeners Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Listener ["{#LISTENER_ADDRESS}"]: Connections, active | Total active connections. |
Dependent item | envoy.listener.downstreamcxactive["{#LISTENER_ADDRESS}"] Preprocessing
|
Listener ["{#LISTENER_ADDRESS}"]: Connections, rate | Total connections per second. |
Dependent item | envoy.listener.downstreamcxtotal.rate["{#LISTENER_ADDRESS}"] Preprocessing
|
Listener ["{#LISTENER_ADDRESS}"]: Sockets, undergoing | Sockets currently undergoing listener filter processing. |
Dependent item | envoy.listener.downstreamprecxactive["{#LISTENERADDRESS}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
HTTP metrics discovery | Dependent item | envoy.lld.http Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
HTTP ["{#CONN_MANAGER}"]: Requests, rate | Total active connections per second. |
Dependent item | envoy.http.downstreamrqtotal.rate["{#CONN_MANAGER}"] Preprocessing
|
HTTP ["{#CONN_MANAGER}"]: Requests, active | Total active requests. |
Dependent item | envoy.http.downstreamrqactive["{#CONN_MANAGER}"] Preprocessing
|
HTTP ["{#CONN_MANAGER}"]: Requests timeout, rate | Total requests closed due to a timeout on the request path per second. |
Dependent item | envoy.http.downstreamrqtimeout["{#CONN_MANAGER}"] Preprocessing
|
HTTP ["{#CONN_MANAGER}"]: Connections, rate | Total connections per second. |
Dependent item | envoy.http.downstreamcxtotal["{#CONN_MANAGER}"] Preprocessing
|
HTTP ["{#CONN_MANAGER}"]: Connections, active | Total active connections. |
Dependent item | envoy.http.downstreamcxactive["{#CONN_MANAGER}"] Preprocessing
|
HTTP ["{#CONN_MANAGER}"]: Bytes in, rate | Total bytes received per second. |
Dependent item | envoy.http.downstreamcxrxbytestotal.rate["{#CONN_MANAGER}"] Preprocessing
|
HTTP ["{#CONN_MANAGER}"]: Bytes out, rate | Total bytes sent per second. |
Dependent item | envoy.http.downstreamcxtxbytestota.rate["{#CONN_MANAGER}"] Preprocessing
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template to monitor Elasticsearch by Zabbix that works without any external scripts.
It works with both standalone and cluster instances.
The metrics are collected in one pass remotely using an HTTP agent.
They are gathered via the _cluster/health, _cluster/stats, and _nodes/stats REST API requests.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Set the hostname or IP address of the Elasticsearch host in the {$ELASTICSEARCH.HOST}
macro.
Set the login and password in the {$ELASTICSEARCH.USERNAME}
and {$ELASTICSEARCH.PASSWORD}
macros.
If you use an atypical location of the ES API, don't forget to change the {$ELASTICSEARCH.SCHEME} and {$ELASTICSEARCH.PORT} macros.
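To verify the macros before linking the template, you can query the same _cluster/health endpoint the template relies on; a minimal sketch with hypothetical credentials (the textual status maps to the numeric values used by the triggers below: green=0, yellow=1, red=2):

```python
import base64
import json
import urllib.request

# Hypothetical values for the {$ELASTICSEARCH.*} macros.
SCHEME, HOST, PORT = "http", "es.example.com", 9200
USERNAME, PASSWORD = "zabbix", "secret"

request = urllib.request.Request(f"{SCHEME}://{HOST}:{PORT}/_cluster/health")
token = base64.b64encode(f"{USERNAME}:{PASSWORD}".encode()).decode()
request.add_header("Authorization", f"Basic {token}")

health = json.loads(urllib.request.urlopen(request).read())
status_map = {"green": 0, "yellow": 1, "red": 2}  # unknown statuses map to 255
print(health["status"], status_map.get(health["status"], 255))
```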
Name | Description | Default |
---|---|---|
{$ELASTICSEARCH.USERNAME} | The Elasticsearch username. |
|
{$ELASTICSEARCH.PASSWORD} | The Elasticsearch password. |
|
{$ELASTICSEARCH.HOST} | The hostname or IP address of the Elasticsearch host. |
<SET ELASTICSEARCH HOST> |
{$ELASTICSEARCH.PORT} | The port of the Elasticsearch host. |
9200 |
{$ELASTICSEARCH.SCHEME} | The scheme of the Elasticsearch (http/https). |
http |
{$ELASTICSEARCH.RESPONSE_TIME.MAX.WARN} | The ES cluster maximum response time in seconds for trigger expression. |
10s |
{$ELASTICSEARCH.QUERY_LATENCY.MAX.WARN} | Maximum of query latency in milliseconds for trigger expression. |
100 |
{$ELASTICSEARCH.FETCH_LATENCY.MAX.WARN} | Maximum of fetch latency in milliseconds for trigger expression. |
100 |
{$ELASTICSEARCH.INDEXING_LATENCY.MAX.WARN} | Maximum of indexing latency in milliseconds for trigger expression. |
100 |
{$ELASTICSEARCH.FLUSH_LATENCY.MAX.WARN} | Maximum of flush latency in milliseconds for trigger expression. |
100 |
{$ELASTICSEARCH.HEAP_USED.MAX.WARN} | The maximum percentage of JVM heap in use for the warning trigger expression. |
85 |
{$ELASTICSEARCH.HEAP_USED.MAX.CRIT} | The maximum percentage of JVM heap in use for the critical trigger expression. |
95 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Service status | Checks if the service is running and accepting TCP connections. |
Simple check | net.tcp.service["{$ELASTICSEARCH.SCHEME}","{$ELASTICSEARCH.HOST}","{$ELASTICSEARCH.PORT}"] Preprocessing
|
Service response time | Checks performance of the TCP service. |
Simple check | net.tcp.service.perf["{$ELASTICSEARCH.SCHEME}","{$ELASTICSEARCH.HOST}","{$ELASTICSEARCH.PORT}"] |
Get cluster health | Returns the health status of a cluster. |
HTTP agent | es.cluster.get_health |
Cluster health status | Health status of the cluster, based on the state of its primary and replica shards. Statuses are: green - all shards are assigned; yellow - all primary shards are assigned, but one or more replica shards are unassigned (if a node in the cluster fails, some data could be unavailable until that node is repaired); red - one or more primary shards are unassigned, so some data is unavailable (this can occur briefly during cluster startup as primary shards are assigned). |
Dependent item | es.cluster.status Preprocessing
|
Number of nodes | The number of nodes within the cluster. |
Dependent item | es.cluster.number_of_nodes Preprocessing
|
Number of data nodes | The number of nodes that are dedicated to data nodes. |
Dependent item | es.cluster.number_of_data_nodes Preprocessing
|
Number of relocating shards | The number of shards that are under relocation. |
Dependent item | es.cluster.relocating_shards Preprocessing
|
Number of initializing shards | The number of shards that are under initialization. |
Dependent item | es.cluster.initializing_shards Preprocessing
|
Number of unassigned shards | The number of shards that are not allocated. |
Dependent item | es.cluster.unassigned_shards Preprocessing
|
Delayed unassigned shards | The number of shards whose allocation has been delayed by the timeout settings. |
Dependent item | es.cluster.delayed_unassigned_shards Preprocessing
|
Number of pending tasks | The number of cluster-level changes that have not yet been executed. |
Dependent item | es.cluster.number_of_pending_tasks Preprocessing
|
Task max waiting in queue | The time in seconds that the earliest initiated task has been waiting to be performed. |
Dependent item | es.cluster.task_max_waiting_in_queue Preprocessing
|
Inactive shards percentage | The ratio of inactive shards in the cluster expressed as a percentage. |
Dependent item | es.cluster.inactive_shards_percent_as_number Preprocessing
|
Get cluster stats | Returns cluster statistics. |
HTTP agent | es.cluster.get_stats |
Cluster uptime | Uptime duration in seconds since JVM has last started. |
Dependent item | es.nodes.jvm.max_uptime Preprocessing
|
Number of non-deleted documents | The total number of non-deleted documents across all primary shards assigned to the selected nodes. This number is based on the documents in Lucene segments and may include the documents from nested fields. |
Dependent item | es.indices.docs.count Preprocessing
|
Indices with shards assigned to nodes | The total number of indices with shards assigned to the selected nodes. |
Dependent item | es.indices.count Preprocessing
|
Total size of all file stores | The total size in bytes of all file stores across all selected nodes. |
Dependent item | es.nodes.fs.total_in_bytes Preprocessing
|
Total available size to JVM in all file stores | The total number of bytes available to JVM in the file stores across all selected nodes. Depending on OS or process-level restrictions, this number may be less than nodes.fs.free_in_bytes. This is the actual amount of free disk space the selected Elasticsearch nodes can use. |
Dependent item | es.nodes.fs.available_in_bytes Preprocessing
|
Nodes with the data role | The number of selected nodes with the data role. |
Dependent item | es.nodes.count.data Preprocessing
|
Nodes with the ingest role | The number of selected nodes with the ingest role. |
Dependent item | es.nodes.count.ingest Preprocessing
|
Nodes with the master role | The number of selected nodes with the master role. |
Dependent item | es.nodes.count.master Preprocessing
|
Get nodes stats | Returns cluster nodes statistics. |
HTTP agent | es.nodes.get_stats |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Elasticsearch: Service is down | The service is unavailable or does not accept TCP connections. |
last(/Elasticsearch Cluster by HTTP/net.tcp.service["{$ELASTICSEARCH.SCHEME}","{$ELASTICSEARCH.HOST}","{$ELASTICSEARCH.PORT}"])=0 |Average |
Manual close: Yes | |
Elasticsearch: Service response time is too high | The performance of the TCP service is very low. |
min(/Elasticsearch Cluster by HTTP/net.tcp.service.perf["{$ELASTICSEARCH.SCHEME}","{$ELASTICSEARCH.HOST}","{$ELASTICSEARCH.PORT}"],5m)>{$ELASTICSEARCH.RESPONSE_TIME.MAX.WARN} |Warning |
Manual close: Yes Depends on:
|
|
Elasticsearch: Health is YELLOW | All primary shards are assigned, but one or more replica shards are unassigned. |
last(/Elasticsearch Cluster by HTTP/es.cluster.status)=1 |Average |
||
Elasticsearch: Health is RED | One or more primary shards are unassigned, so some data is unavailable. |
last(/Elasticsearch Cluster by HTTP/es.cluster.status)=2 |High |
||
Elasticsearch: Health is UNKNOWN | The health status of the cluster is unknown or cannot be obtained. |
last(/Elasticsearch Cluster by HTTP/es.cluster.status)=255 |High |
||
Elasticsearch: The number of nodes within the cluster has decreased | change(/Elasticsearch Cluster by HTTP/es.cluster.number_of_nodes)<0 |Info |
Manual close: Yes | ||
Elasticsearch: The number of nodes within the cluster has increased | change(/Elasticsearch Cluster by HTTP/es.cluster.number_of_nodes)>0 |Info |
Manual close: Yes | ||
Elasticsearch: Cluster has the initializing shards | The cluster has had initializing shards for longer than 10 minutes. |
min(/Elasticsearch Cluster by HTTP/es.cluster.initializing_shards,10m)>0 |Average |
||
Elasticsearch: Cluster has the unassigned shards | The cluster has had unassigned shards for longer than 10 minutes. |
min(/Elasticsearch Cluster by HTTP/es.cluster.unassigned_shards,10m)>0 |Average |
||
Elasticsearch: Cluster has been restarted | Uptime is less than 10 minutes. |
last(/Elasticsearch Cluster by HTTP/es.nodes.jvm.max_uptime)<10m |Info |
Manual close: Yes | |
Elasticsearch: Cluster does not have enough space for resharding | There is not enough disk space for index resharding. |
(last(/Elasticsearch Cluster by HTTP/es.nodes.fs.total_in_bytes)-last(/Elasticsearch Cluster by HTTP/es.nodes.fs.available_in_bytes))/(last(/Elasticsearch Cluster by HTTP/es.cluster.number_of_data_nodes)-1)>last(/Elasticsearch Cluster by HTTP/es.nodes.fs.available_in_bytes) |High |
||
Elasticsearch: Cluster has only two master nodes | The cluster has only two nodes with a master role and will be unavailable if one of them breaks. |
last(/Elasticsearch Cluster by HTTP/es.nodes.count.master)=2 |Disaster |
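The "not enough space for resharding" expression above is a rough heuristic: it takes the data currently stored in the cluster, spreads it over one fewer data node, and checks whether the result still fits into the free space the cluster reports. A small worked sketch of the same arithmetic, with illustrative numbers:

```python
# Illustrative values for es.nodes.fs.total_in_bytes, es.nodes.fs.available_in_bytes
# and es.cluster.number_of_data_nodes.
total_in_bytes = 3 * 1024**4        # 3 TiB of file store capacity
available_in_bytes = 500 * 1024**3  # 500 GiB still free
data_nodes = 3

used = total_in_bytes - available_in_bytes
# Used data averaged over the nodes that would remain after losing one data node.
per_remaining_node = used / (data_nodes - 1)

if per_remaining_node > available_in_bytes:
    print("Trigger condition met: not enough space for resharding")
else:
    print("Enough headroom to reshard after losing one data node")
```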
Name | Description | Type | Key and additional info |
---|---|---|---|
Cluster nodes discovery | Discovery of ES cluster nodes. |
HTTP agent | es.nodes.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
ES {#ES.NODE}: Get data | Returns cluster nodes statistics. |
Dependent item | es.node.get.data[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Total size | Total size (in bytes) of all file stores. |
Dependent item | es.node.fs.total.totalinbytes[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Total available size | The total number of bytes available to this Java virtual machine on all file stores. Depending on OS or process-level restrictions, this might appear less than fs.total.free_in_bytes. This is the actual amount of free disk space the Elasticsearch node can utilize. |
Dependent item | es.node.fs.total.availableinbytes[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Node uptime | JVM uptime in seconds. |
Dependent item | es.node.jvm.uptime[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Maximum JVM memory available for use | The maximum amount of memory, in bytes, available for use by the heap. |
Dependent item | es.node.jvm.mem.heapmaxin_bytes[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Amount of JVM heap currently in use | The memory, in bytes, currently in use by the heap. |
Dependent item | es.node.jvm.mem.heapusedin_bytes[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Percent of JVM heap currently in use | The percentage of memory currently in use by the heap. |
Dependent item | es.node.jvm.mem.heap_used_percent[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Amount of JVM heap committed | The amount of memory, in bytes, available for use by the heap. |
Dependent item | es.node.jvm.mem.heapcommittedin_bytes[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Number of open HTTP connections | The number of currently open HTTP connections for the node. |
Dependent item | es.node.http.current_open[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Rate of HTTP connections opened | The number of HTTP connections opened for the node per second. |
Dependent item | es.node.http.opened.rate[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Time spent throttling operations | Time in seconds spent throttling operations for the last measuring span. |
Dependent item | es.node.indices.indexing.throttle_time[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Time spent throttling recovery operations | Time in seconds spent throttling recovery operations for the last measuring span. |
Dependent item | es.node.indices.recovery.throttle_time[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Time spent throttling merge operations | Time in seconds spent throttling merge operations for the last measuring span. |
Dependent item | es.node.indices.merges.totalthrottledtime[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Rate of queries | The number of query operations per second. |
Dependent item | es.node.indices.search.query.rate[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Total number of query | The total number of query operations. |
Dependent item | es.node.indices.search.query_total[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Time spent performing query | Time in seconds spent performing query operations for the last measuring span. |
Dependent item | es.node.indices.search.query_time[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Total time spent performing query | Time in milliseconds spent performing query operations. |
Dependent item | es.node.indices.search.querytimein_millis[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Query latency | The average query latency calculated by sampling the total number of queries and the total elapsed time at regular intervals. |
Calculated | es.node.indices.search.query_latency[{#ES.NODE}] |
ES {#ES.NODE}: Current query operations | The number of query operations currently running. |
Dependent item | es.node.indices.search.query_current[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Rate of fetch | The number of fetch operations per second. |
Dependent item | es.node.indices.search.fetch.rate[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Total number of fetch | The total number of fetch operations. |
Dependent item | es.node.indices.search.fetch_total[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Time spent performing fetch | Time in seconds spent performing fetch operations for the last measuring span. |
Dependent item | es.node.indices.search.fetch_time[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Total time spent performing fetch | Time in milliseconds spent performing fetch operations. |
Dependent item | es.node.indices.search.fetchtimein_millis[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Fetch latency | The average fetch latency calculated by sampling the total number of fetches and the total elapsed time at regular intervals. |
Calculated | es.node.indices.search.fetch_latency[{#ES.NODE}] |
ES {#ES.NODE}: Current fetch operations | The number of fetch operations currently running. |
Dependent item | es.node.indices.search.fetch_current[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Write thread pool executor tasks completed | The number of tasks completed by the write thread pool executor. |
Dependent item | es.node.thread_pool.write.completed.rate[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Write thread pool active threads | The number of active threads in the write thread pool. |
Dependent item | es.node.thread_pool.write.active[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Write thread pool tasks in queue | The number of tasks in queue for the write thread pool. |
Dependent item | es.node.thread_pool.write.queue[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Write thread pool executor tasks rejected | The number of tasks rejected by the write thread pool executor. |
Dependent item | es.node.thread_pool.write.rejected.rate[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Search thread pool executor tasks completed | The number of tasks completed by the search thread pool executor. |
Dependent item | es.node.thread_pool.search.completed.rate[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Search thread pool active threads | The number of active threads in the search thread pool. |
Dependent item | es.node.thread_pool.search.active[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Search thread pool tasks in queue | The number of tasks in queue for the search thread pool. |
Dependent item | es.node.thread_pool.search.queue[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Search thread pool executor tasks rejected | The number of tasks rejected by the search thread pool executor. |
Dependent item | es.node.thread_pool.search.rejected.rate[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Refresh thread pool executor tasks completed | The number of tasks completed by the refresh thread pool executor. |
Dependent item | es.node.thread_pool.refresh.completed.rate[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Refresh thread pool active threads | The number of active threads in the refresh thread pool. |
Dependent item | es.node.thread_pool.refresh.active[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Refresh thread pool tasks in queue | The number of tasks in queue for the refresh thread pool. |
Dependent item | es.node.thread_pool.refresh.queue[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Refresh thread pool executor tasks rejected | The number of tasks rejected by the refresh thread pool executor. |
Dependent item | es.node.thread_pool.refresh.rejected.rate[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Total number of indexing | The total number of indexing operations. |
Dependent item | es.node.indices.indexing.index_total[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Total time spent performing indexing | Total time in milliseconds spent performing indexing operations. |
Dependent item | es.node.indices.indexing.indextimein_millis[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Indexing latency | The average indexing latency calculated from the available index_total and index_time_in_millis metrics. |
Calculated | es.node.indices.indexing.index_latency[{#ES.NODE}] |
ES {#ES.NODE}: Current indexing operations | The number of indexing operations currently running. |
Dependent item | es.node.indices.indexing.index_current[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Total number of index flushes to disk | The total number of flush operations. |
Dependent item | es.node.indices.flush.total[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Total time spent on flushing indices to disk | Total time in milliseconds spent performing flush operations. |
Dependent item | es.node.indices.flush.totaltimein_millis[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Flush latency | The average flush latency calculated from the available flush.total and flush.total_time_in_millis metrics. |
Calculated | es.node.indices.flush.latency[{#ES.NODE}] |
ES {#ES.NODE}: Rate of index refreshes | The number of refresh operations per second. |
Dependent item | es.node.indices.refresh.rate[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Time spent performing refresh | Time in seconds spent performing refresh operations for the last measuring span. |
Dependent item | es.node.indices.refresh.time[{#ES.NODE}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Elasticsearch: ES {#ES.NODE}: Node has been restarted | Uptime is less than 10 minutes. |
last(/Elasticsearch Cluster by HTTP/es.node.jvm.uptime[{#ES.NODE}])<10m |Info |
Manual close: Yes | |
Elasticsearch: ES {#ES.NODE}: Percent of JVM heap in use is high | This indicates that the rate of garbage collection isn't keeping up with the rate of garbage creation. |
min(/Elasticsearch Cluster by HTTP/es.node.jvm.mem.heap_used_percent[{#ES.NODE}],1h)>{$ELASTICSEARCH.HEAP_USED.MAX.WARN} |Warning |
Depends on:
|
|
Elasticsearch: ES {#ES.NODE}: Percent of JVM heap in use is critical | This indicates that the rate of garbage collection isn't keeping up with the rate of garbage creation. |
min(/Elasticsearch Cluster by HTTP/es.node.jvm.mem.heap_used_percent[{#ES.NODE}],1h)>{$ELASTICSEARCH.HEAP_USED.MAX.CRIT} |High |
||
Elasticsearch: ES {#ES.NODE}: Query latency is too high | If latency exceeds a threshold, look for potential resource bottlenecks, or investigate whether you need to optimize your queries. |
min(/Elasticsearch Cluster by HTTP/es.node.indices.search.query_latency[{#ES.NODE}],5m)>{$ELASTICSEARCH.QUERY_LATENCY.MAX.WARN} |Warning |
||
Elasticsearch: ES {#ES.NODE}: Fetch latency is too high | The fetch phase should typically take much less time than the query phase. If you notice this metric consistently increasing, investigate possible causes such as slow disks or overly expensive result processing. |
min(/Elasticsearch Cluster by HTTP/es.node.indices.search.fetch_latency[{#ES.NODE}],5m)>{$ELASTICSEARCH.FETCH_LATENCY.MAX.WARN} |Warning |
||
Elasticsearch: ES {#ES.NODE}: Write thread pool executor has the rejected tasks | The number of tasks rejected by the write thread pool executor is over 0 for 5m. |
min(/Elasticsearch Cluster by HTTP/es.node.thread_pool.write.rejected.rate[{#ES.NODE}],5m)>0 |Warning |
||
Elasticsearch: ES {#ES.NODE}: Search thread pool executor has the rejected tasks | The number of tasks rejected by the search thread pool executor is over 0 for 5m. |
min(/Elasticsearch Cluster by HTTP/es.node.thread_pool.search.rejected.rate[{#ES.NODE}],5m)>0 |Warning |
||
Elasticsearch: ES {#ES.NODE}: Refresh thread pool executor has the rejected tasks | The number of tasks rejected by the refresh thread pool executor is over 0 for 5m. |
min(/Elasticsearch Cluster by HTTP/es.node.thread_pool.refresh.rejected.rate[{#ES.NODE}],5m)>0 |Warning |
||
Elasticsearch: ES {#ES.NODE}: Indexing latency is too high | If the latency is increasing, it may indicate that you are indexing too many documents at the same time (Elasticsearch's documentation recommends starting with a moderate bulk indexing size and increasing it gradually). |
min(/Elasticsearch Cluster by HTTP/es.node.indices.indexing.index_latency[{#ES.NODE}],5m)>{$ELASTICSEARCH.INDEXING_LATENCY.MAX.WARN} |Warning |
||
Elasticsearch: ES {#ES.NODE}: Flush latency is too high | If you see this metric increasing steadily, it may indicate a problem with slow disks; this problem may escalate and eventually prevent you from adding new data to the index. |
min(/Elasticsearch Cluster by HTTP/es.node.indices.flush.latency[{#ES.NODE}],5m)>{$ELASTICSEARCH.FLUSH_LATENCY.MAX.WARN} |Warning |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template to monitor Docker engine by Zabbix that works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
Template Docker by Zabbix agent 2
— collects metrics by polling zabbix-agent2.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Set up and configure Zabbix agent 2 compiled with the Docker monitoring plugin. The user that Zabbix agent 2 runs as should have access permissions to the Docker socket.
Test availability: zabbix_get -s docker-host -k docker.info
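If zabbix_get is not at hand, the same data the docker.info key relies on can be pulled straight from the Docker Engine API over the socket; a minimal sketch, assuming the default /var/run/docker.sock path and a user with permission to read it:

```python
import http.client
import json
import socket


class DockerSocketConnection(http.client.HTTPConnection):
    """HTTP connection over the Docker UNIX socket."""

    def __init__(self, socket_path="/var/run/docker.sock"):
        super().__init__("localhost")
        self.socket_path = socket_path

    def connect(self):
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self.sock.connect(self.socket_path)


conn = DockerSocketConnection()
conn.request("GET", "/info")  # /info is the engine endpoint behind the docker.info key
info = json.loads(conn.getresponse().read())
print(info["ServerVersion"], info["Containers"], info["ContainersRunning"])
```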
Name | Description | Default |
---|---|---|
{$DOCKER.LLD.FILTER.CONTAINER.MATCHES} | Filter of discoverable containers. |
.* |
{$DOCKER.LLD.FILTER.CONTAINER.NOT_MATCHES} | Filter to exclude discovered containers. |
CHANGE_IF_NEEDED |
{$DOCKER.LLD.FILTER.IMAGE.MATCHES} | Filter of discoverable images. |
.* |
{$DOCKER.LLD.FILTER.IMAGE.NOT_MATCHES} | Filter to exclude discovered images. |
CHANGE_IF_NEEDED |
Name | Description | Type | Key and additional info |
---|---|---|---|
Ping | Zabbix agent | docker.ping Preprocessing
|
|
Get info | Zabbix agent | docker.info | |
Get containers | Zabbix agent | docker.containers | |
Get images | Zabbix agent | docker.images | |
Get data_usage | Zabbix agent | docker.data_usage | |
Containers total | Total number of containers on this host. |
Dependent item | docker.containers.total Preprocessing
|
Containers running | Total number of containers running on this host. |
Dependent item | docker.containers.running Preprocessing
|
Containers stopped | Total number of containers stopped on this host. |
Dependent item | docker.containers.stopped Preprocessing
|
Containers paused | Total number of containers paused on this host. |
Dependent item | docker.containers.paused Preprocessing
|
Images total | Number of images with intermediate image layers. |
Dependent item | docker.images.total Preprocessing
|
Storage driver | Docker storage driver. https://docs.docker.com/storage/storagedriver/ |
Dependent item | docker.driver Preprocessing
|
Memory limit enabled | Dependent item | docker.mem_limit.enabled Preprocessing
|
|
Swap limit enabled | Dependent item | docker.swap_limit.enabled Preprocessing
|
|
Kernel memory enabled | Dependent item | docker.kernel_mem.enabled Preprocessing
|
|
Kernel memory TCP enabled | Dependent item | docker.kernelmemtcp.enabled Preprocessing
|
|
CPU CFS Period enabled | https://docs.docker.com/config/containers/resource_constraints/#configure-the-default-cfs-scheduler |
Dependent item | docker.cpucfsperiod.enabled Preprocessing
|
CPU CFS Quota enabled | https://docs.docker.com/config/containers/resource_constraints/#configure-the-default-cfs-scheduler |
Dependent item | docker.cpucfsquota.enabled Preprocessing
|
CPU Shares enabled | https://docs.docker.com/config/containers/resource_constraints/#configure-the-default-cfs-scheduler |
Dependent item | docker.cpu_shares.enabled Preprocessing
|
CPU Set enabled | https://docs.docker.com/config/containers/resource_constraints/#configure-the-default-cfs-scheduler |
Dependent item | docker.cpu_set.enabled Preprocessing
|
Pids limit enabled | Dependent item | docker.pids_limit.enabled Preprocessing
|
|
IPv4 Forwarding enabled | Dependent item | docker.ipv4_forwarding.enabled Preprocessing
|
|
Debug enabled | Dependent item | docker.debug.enabled Preprocessing
|
|
Nfd | Number of used File Descriptors. |
Dependent item | docker.nfd Preprocessing
|
OomKill disabled | Dependent item | docker.oomkill.disabled Preprocessing
|
|
Goroutines | Number of goroutines. |
Dependent item | docker.goroutines Preprocessing
|
Logging driver | Dependent item | docker.logging_driver Preprocessing
|
|
Cgroup driver | Dependent item | docker.cgroup_driver Preprocessing
|
|
NEvents listener | Dependent item | docker.nevents_listener Preprocessing
|
|
Kernel version | Dependent item | docker.kernel_version Preprocessing
|
|
Operating system | Dependent item | docker.operating_system Preprocessing
|
|
OS type | Dependent item | docker.os_type Preprocessing
|
|
Architecture | Dependent item | docker.architecture Preprocessing
|
|
NCPU | Dependent item | docker.ncpu Preprocessing
|
|
Memory total | Dependent item | docker.mem.total Preprocessing
|
|
Docker root dir | Dependent item | docker.root_dir Preprocessing
|
|
Name | Dependent item | docker.name Preprocessing
|
|
Server version | Dependent item | docker.server_version Preprocessing
|
|
Default runtime | Dependent item | docker.default_runtime Preprocessing
|
|
Live restore enabled | Dependent item | docker.live_restore.enabled Preprocessing
|
|
Layers size | Dependent item | docker.layers_size Preprocessing
|
|
Images size | Dependent item | docker.images_size Preprocessing
|
|
Containers size | Dependent item | docker.containers_size Preprocessing
|
|
Volumes size | Dependent item | docker.volumes_size Preprocessing
|
|
Images available | Number of top-level images. |
Dependent item | docker.images.top_level Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Docker: Service is down | last(/Docker by Zabbix agent 2/docker.ping)=0 |Average |
Manual close: Yes | ||
Docker: Failed to fetch info data | Zabbix has not received data for items for the last 30 minutes. |
nodata(/Docker by Zabbix agent 2/docker.name,30m)=1 |Warning |
Manual close: Yes Depends on:
|
|
Docker: Version has changed | Docker version has changed. Acknowledge to close the problem manually. |
last(/Docker by Zabbix agent 2/docker.server_version,#1)<>last(/Docker by Zabbix agent 2/docker.server_version,#2) and length(last(/Docker by Zabbix agent 2/docker.server_version))>0 |Info |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Images discovery | Discovery of images metrics. |
Zabbix agent | docker.images.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Image {#NAME}: Created | Dependent item | docker.image.created["{#ID}"] Preprocessing
|
|
Image {#NAME}: Size | Dependent item | docker.image.size["{#ID}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Containers discovery | Discovery of container metrics. Parameter: true - returns all containers; false - returns only running containers. |
Zabbix agent | docker.containers.discovery[false] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Container {#NAME}: Get stats | Get container stats based on resource usage. |
Zabbix agent | docker.container_stats["{#NAME}"] |
Container {#NAME}: CPU total usage per second | Dependent item | docker.containerstats.cpuusage.total.rate["{#NAME}"] Preprocessing
|
|
Container {#NAME}: CPU percent usage | Dependent item | docker.containerstats.cpupct_usage["{#NAME}"] Preprocessing
|
|
Container {#NAME}: CPU kernelmode usage per second | Dependent item | docker.containerstats.cpuusage.kernel.rate["{#NAME}"] Preprocessing
|
|
Container {#NAME}: CPU usermode usage per second | Dependent item | docker.containerstats.cpuusage.user.rate["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Online CPUs | Dependent item | docker.containerstats.onlinecpus["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Throttling periods | Number of periods with throttling active. |
Dependent item | docker.containerstats.cpuusage.throttling_periods["{#NAME}"] Preprocessing
|
Container {#NAME}: Throttled periods | Number of periods when the container hits its throttling limit. |
Dependent item | docker.containerstats.cpuusage.throttled_periods["{#NAME}"] Preprocessing
|
Container {#NAME}: Throttled time | Aggregate time the container was throttled for in nanoseconds. |
Dependent item | docker.containerstats.cpuusage.throttled_time["{#NAME}"] Preprocessing
|
Container {#NAME}: Memory usage | Dependent item | docker.container_stats.memory.usage["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Memory maximum usage | Dependent item | docker.containerstats.memory.maxusage["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Memory commit bytes | Dependent item | docker.containerstats.memory.commitbytes["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Memory commit peak bytes | Dependent item | docker.containerstats.memory.commitpeak_bytes["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Memory private working set | Dependent item | docker.containerstats.memory.privateworking_set["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Current PIDs count | Current number of PIDs the container has created. |
Dependent item | docker.containerstats.pidsstats.current["{#NAME}"] Preprocessing
|
Container {#NAME}: Networks bytes received per second | Dependent item | docker.networks.rx_bytes["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Networks packets received per second | Dependent item | docker.networks.rx_packets["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Networks errors received per second | Dependent item | docker.networks.rx_errors["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Networks incoming packets dropped per second | Dependent item | docker.networks.rx_dropped["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Networks bytes sent per second | Dependent item | docker.networks.tx_bytes["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Networks packets sent per second | Dependent item | docker.networks.tx_packets["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Networks errors sent per second | Dependent item | docker.networks.tx_errors["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Networks outgoing packets dropped per second | Dependent item | docker.networks.tx_dropped["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Get info | Return low-level information about a container. |
Zabbix agent | docker.container_info["{#NAME}",full] |
Container {#NAME}: Created | Dependent item | docker.container_info.created["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Image | Dependent item | docker.container_info.image["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Restart count | Dependent item | docker.containerinfo.restartcount["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Status | Dependent item | docker.container_info.state.status["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Health status | Container's health status. |
Dependent item | docker.container_info.state.health["{#NAME}"] Preprocessing
|
Container {#NAME}: Health failing streak | Dependent item | docker.container_info.state.health.failing["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Running | Dependent item | docker.container_info.state.running["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Paused | Dependent item | docker.container_info.state.paused["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Restarting | Dependent item | docker.container_info.state.restarting["{#NAME}"] Preprocessing
|
|
Container {#NAME}: OOMKilled | Dependent item | docker.container_info.state.oomkilled["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Dead | Dependent item | docker.container_info.state.dead["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Pid | Dependent item | docker.container_info.state.pid["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Exit code | Dependent item | docker.container_info.state.exitcode["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Error | Dependent item | docker.container_info.state.error["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Started at | Dependent item | docker.container_info.started["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Finished at | Time at which the container last terminated. |
Dependent item | docker.container_info.finished["{#NAME}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Docker: Container {#NAME}: Health state container is unhealthy | Container health state is unhealthy. |
count(/Docker by Zabbix agent 2/docker.container_info.state.health["{#NAME}"],2m,,2)>=2 |High |
||
Docker: Container {#NAME}: Container has been stopped with error code | last(/Docker by Zabbix agent 2/docker.container_info.state.exitcode["{#NAME}"])>0 and last(/Docker by Zabbix agent 2/docker.container_info.state.running["{#NAME}"])=0 |Average |
Manual close: Yes | ||
Docker: Container {#NAME}: An error has occurred in the container | Container {#NAME} has an error. Acknowledge to close the problem manually. |
last(/Docker by Zabbix agent 2/docker.container_info.state.error["{#NAME}"],#1)<>last(/Docker by Zabbix agent 2/docker.container_info.state.error["{#NAME}"],#2) and length(last(/Docker by Zabbix agent 2/docker.container_info.state.error["{#NAME}"]))>0 |Warning |
Manual close: Yes |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template to monitor Control-M by Zabbix that works without any external scripts.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
This template is intended to be used on Control-M Enterprise Manager instances.
It monitors active SLA services and Control-M servers, and creates host prototypes for the discovered servers using the Control-M server by HTTP template.
To use this template, you must set the {$API.TOKEN} and {$API.URI.ENDPOINT} macros.
To access the API token, use one of the following Control-M interfaces:
{$API.URI.ENDPOINT}
- is the Control-M Automation API endpoint for the API requests, including your server IP or DNS address, the Automation API port, and the path.
For example, https://monitored.controlm.instance:8443/automation-api.
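As a quick sanity check of the two macros, you can call the Automation API directly. A minimal sketch with hypothetical values; it assumes the token is passed in the x-api-key header and that /config/servers is the endpoint behind the "Get Control-M servers" item, so verify both against your Control-M Automation API version:

```python
import json
import urllib.request

# Hypothetical values for {$API.URI.ENDPOINT} and {$API.TOKEN}.
API_ENDPOINT = "https://monitored.controlm.instance:8443/automation-api"
API_TOKEN = "<set the token here>"

request = urllib.request.Request(
    f"{API_ENDPOINT}/config/servers",   # assumed path for the server list
    headers={"x-api-key": API_TOKEN},   # assumed token header
)
# For self-signed certificates an SSL context may be needed here.
servers = json.loads(urllib.request.urlopen(request).read())
for server in servers:
    # Field names are illustrative; state/message/version mirror the template items.
    print(server.get("name"), server.get("state"), server.get("message"))
```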
Name | Description | Default |
---|---|---|
{$API.URI.ENDPOINT} | The API endpoint is a URI - for example, https://monitored.controlm.instance:8443/automation-api. |
<set the api uri endpoint here> |
{$API.TOKEN} | A token to use for API connections. |
<set the token here> |
Name | Description | Type | Key and additional info |
---|---|---|---|
Get Control-M servers | Gets a list of servers. |
HTTP agent | controlm.servers |
Get SLA services | Gets all the SLA active services. |
HTTP agent | controlm.services |
Name | Description | Type | Key and additional info |
---|---|---|---|
Server discovery | Discovers the Control-M servers. |
Dependent item | controlm.server.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
SLA services discovery | Discovers the SLA services in the Control-M environment. |
Dependent item | controlm.services.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Service [{#SERVICE.NAME}, {#SERVICE.JOB}]: stats | Gets the service statistics. |
Dependent item | service.stats['{#SERVICE.NAME}','{#SERVICE.JOB}'] Preprocessing
|
Service [{#SERVICE.NAME}, {#SERVICE.JOB}]: status | Gets the service status. |
Dependent item | service.status['{#SERVICE.NAME}','{#SERVICE.JOB}'] Preprocessing
|
Service [{#SERVICE.NAME}, {#SERVICE.JOB}]: jobs 'executed' | Gets the number of jobs in the state - |
Dependent item | service.jobs.status['{#SERVICE.NAME}','{#SERVICE.JOB}',executed] Preprocessing
|
Service [{#SERVICE.NAME}, {#SERVICE.JOB}]: jobs 'waitCondition' | Gets the number of jobs in the state - |
Dependent item | service.jobs.status['{#SERVICE.NAME}','{#SERVICE.JOB}',waitCondition] Preprocessing
|
Service [{#SERVICE.NAME}, {#SERVICE.JOB}]: jobs 'waitResource' | Gets the number of jobs in the state - |
Dependent item | service.jobs.status['{#SERVICE.NAME}','{#SERVICE.JOB}',waitResource] Preprocessing
|
Service [{#SERVICE.NAME}, {#SERVICE.JOB}]: jobs 'waitHost' | Gets the number of jobs in the state - |
Dependent item | service.jobs.status['{#SERVICE.NAME}','{#SERVICE.JOB}',waitHost] Preprocessing
|
Service [{#SERVICE.NAME}, {#SERVICE.JOB}]: jobs 'waitWorkload' | Gets the number of jobs in the state - |
Dependent item | service.jobs.status['{#SERVICE.NAME}','{#SERVICE.JOB}',waitWorkload] Preprocessing
|
Service [{#SERVICE.NAME}, {#SERVICE.JOB}]: jobs 'completed' | Gets the number of jobs in the state - |
Dependent item | service.jobs.status['{#SERVICE.NAME}','{#SERVICE.JOB}',completed] Preprocessing
|
Service [{#SERVICE.NAME}, {#SERVICE.JOB}]: jobs 'error' | Gets the number of jobs in the state - |
Dependent item | service.jobs.status['{#SERVICE.NAME}','{#SERVICE.JOB}',error] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Control-M: Service [{#SERVICE.NAME}, {#SERVICE.JOB}]: status [{ITEM.VALUE}] | The service has encountered an issue. |
last(/Control-M enterprise manager by HTTP/service.status['{#SERVICE.NAME}','{#SERVICE.JOB}'],#1)=0 or last(/Control-M enterprise manager by HTTP/service.status['{#SERVICE.NAME}','{#SERVICE.JOB}'],#1)=10 |Average |
Manual close: Yes | |
Control-M: Service [{#SERVICE.NAME}, {#SERVICE.JOB}]: status [{ITEM.VALUE}] | The service has finished its job late. |
last(/Control-M enterprise manager by HTTP/service.status['{#SERVICE.NAME}','{#SERVICE.JOB}'],#1)=3 |Warning |
Manual close: Yes | |
Control-M: Service [{#SERVICE.NAME}, {#SERVICE.JOB}]: jobs in 'error' state | There are services present which are in the state - 'error'. |
last(/Control-M enterprise manager by HTTP/service.jobs.status['{#SERVICE.NAME}','{#SERVICE.JOB}',error],#1)>0 |Average |
This template is designed to get metrics from the Control-M server using the Control-M Automation API with HTTP agent.
This template monitors server statistics, discovers jobs and agents using Low Level Discovery.
To use this template, macros {$API.TOKEN}, {$API.URI.ENDPOINT}, and {$SERVER.NAME} need to be set.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
This template is primarily intended to be used in conjunction with the Control-M enterprise manager by HTTP
template in order to create host prototypes.
It monitors Control-M server statistics, and discovers jobs and agents on the server.
However, if you wish to monitor the Control-M server separately with this template, you must set the following macros: {$API.TOKEN}, {$API.URI.ENDPOINT}, and {$SERVER.NAME}.
To obtain the token for the {$API.TOKEN}
macro, use one of the following interfaces:
{$API.URI.ENDPOINT}
- is the Control-M Automation API endpoint for API requests, including your server IP address or DNS name, the Automation API port, and the path.
For example: https://monitored.controlm.instance:8443/automation-api.
{$SERVER.NAME}
- is the name of the Control-M server to be monitored.
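As a quick check that the value you plan to use for {$SERVER.NAME} is valid, you can ask the Automation API for that server's agents, which mirrors what the Get agents item collects. A minimal curl sketch follows; the /config/server/{server}/agents resource path and the Bearer-token header are assumptions about your Control-M Automation API setup, and all values are placeholders.

```bash
#!/usr/bin/env bash
# Placeholders - replace with your endpoint, token, and server name.
API_URI_ENDPOINT="https://monitored.controlm.instance:8443/automation-api"
API_TOKEN="<set the token here>"
SERVER_NAME="<set the server name here>"

# Assumed resource: lists the agents registered to the given Control-M server;
# an error response usually means the server name or token is wrong.
curl -sk -H "Authorization: Bearer ${API_TOKEN}" \
  "${API_URI_ENDPOINT}/config/server/${SERVER_NAME}/agents"
```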
Name | Description | Default |
---|---|---|
{$SERVER.NAME} | The name of the Control-M server. |
<set the server name here> |
{$API.URI.ENDPOINT} | The API endpoint is a URI - for example, https://monitored.controlm.instance:8443/automation-api. |
<set the api uri endpoint here> |
{$API.TOKEN} | A token to use for API connections. |
<set the token here> |
Name | Description | Type | Key and additional info |
---|---|---|---|
Get Control-M server stats | Gets the statistics of the server. |
HTTP agent | controlm.server.stats Preprocessing
|
Get jobs | Gets the status of jobs. |
HTTP agent | controlm.jobs |
Get agents | Gets agents for the server. |
HTTP agent | controlm.agents |
Jobs statistics | Gets the statistics of jobs. |
Dependent item | controlm.jobs.statistics Preprocessing
|
Jobs returned | Gets the count of returned jobs. |
Dependent item | controlm.jobs.statistics.returned Preprocessing
|
Jobs total | Gets the count of total jobs. |
Dependent item | controlm.jobs.statistics.total Preprocessing
|
Server state | Gets the metric of the server state. |
Dependent item | server.state Preprocessing
|
Server message | Gets the metric of the server message. |
Dependent item | server.message Preprocessing
|
Server version | Gets the metric of the server version. |
Dependent item | server.version Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Control-M: Server is down | The server is down. |
last(/Control-M server by HTTP/server.state)=0 or last(/Control-M server by HTTP/server.state)=10 |High |
||
Control-M: Server disconnected | The server is disconnected. |
last(/Control-M server by HTTP/server.message,#1)="Disconnected" |High |
||
Control-M: Server error | The server has encountered an error. |
last(/Control-M server by HTTP/server.message,#1)<>"Connected" and last(/Control-M server by HTTP/server.message,#1)<>"Disconnected" and last(/Control-M server by HTTP/server.message,#1)<>"" |High |
||
Control-M: Server version has changed | The server version has changed. Acknowledge to close the problem manually. |
last(/Control-M server by HTTP/server.version,#1)<>last(/Control-M server by HTTP/server.version,#2) and length(last(/Control-M server by HTTP/server.version))>0 |Info |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Jobs discovery | Discovers jobs on the server. |
Dependent item | controlm.jobs.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Job [{#JOB.ID}]: stats | Gets the statistics of a job. |
Dependent item | job.stats['{#JOB.ID}'] Preprocessing
|
Job [{#JOB.ID}]: status | Gets the status of a job. |
Dependent item | job.status['{#JOB.ID}'] Preprocessing
|
Job [{#JOB.ID}]: number of runs | Gets the number of runs for a job. |
Dependent item | job.numberOfRuns['{#JOB.ID}'] Preprocessing
|
Job [{#JOB.ID}]: type | Gets the job type. |
Dependent item | job.type['{#JOB.ID}'] Preprocessing
|
Job [{#JOB.ID}]: held status | Gets the held status of a job. |
Dependent item | job.held['{#JOB.ID}'] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Control-M: Job [{#JOB.ID}]: status [{ITEM.VALUE}] | The job has encountered an issue. |
last(/Control-M server by HTTP/job.status['{#JOB.ID}'],#1)=1 or last(/Control-M server by HTTP/job.status['{#JOB.ID}'],#1)=10 |Warning |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Agent discovery | Discovers agents on the server. |
Dependent item | controlm.agent.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Agent [{#AGENT.NAME}]: stats | Gets the statistics of an agent. |
Dependent item | agent.stats['{#AGENT.NAME}'] Preprocessing
|
Agent [{#AGENT.NAME}]: status | Gets the status of an agent. |
Dependent item | agent.status['{#AGENT.NAME}'] Preprocessing
|
Agent [{#AGENT.NAME}]: version | Gets the version number of an agent. |
Dependent item | agent.version['{#AGENT.NAME}'] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Control-M: Agent [{#AGENT.NAME}]: status [{ITEM.VALUE}] | The agent has encountered an issue. |
last(/Control-M server by HTTP/agent.status['{#AGENT.NAME}'],#1)=1 or last(/Control-M server by HTTP/agent.status['{#AGENT.NAME}'],#1)=10 |Average |
Manual close: Yes | |
Control-M: Agent [{#AGENT.NAME}]: status disabled | The agent is disabled. |
last(/Control-M server by HTTP/agent.status['{#AGENT.NAME}'],#1)=2 or last(/Control-M server by HTTP/agent.status['{#AGENT.NAME}'],#1)=3 |Info |
Manual close: Yes | |
Control-M: Agent [{#AGENT.NAME}]: version has changed | The agent version has changed. Acknowledge to close the problem manually. |
last(/Control-M server by HTTP/agent.version['{#AGENT.NAME}'],#1)<>last(/Control-M server by HTTP/agent.version['{#AGENT.NAME}'],#2) |Info |
Manual close: Yes | |
Control-M: Agent [{#AGENT.NAME}]: unknown version | The agent version is unknown. |
last(/Control-M server by HTTP/agent.version['{#AGENT.NAME}'],#1)="Unknown" |Warning |
Manual close: Yes |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template to monitor HashiCorp Consul by Zabbix that works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
Template HashiCorp Consul Cluster by HTTP
— collects metrics by HTTP agent from API endpoints.
More information about the metrics can be found in the official documentation.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
The template requires authorization via an API token.
Don't forget to change the {$CONSUL.CLUSTER.URL} and {$CONSUL.TOKEN} macros. Also, see the Macros section for a list of macros used to set trigger values.
This template supports Consul namespaces. You can set the {$CONSUL.NAMESPACE} macro if you are interested in only one service namespace. Leave this macro unspecified to get all services. For the Open Source version, leave this macro empty.
NOTE: Some metrics may not be collected depending on your HashiCorp Consul instance version and configuration. NOTE: You may also be interested in the Envoy Proxy by HTTP template.
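To check that {$CONSUL.CLUSTER.URL} and {$CONSUL.TOKEN} are usable before linking the template, you can query the standard Consul HTTP API status and catalog endpoints directly. This is a sketch only; the URL and token below are placeholders.

```bash
#!/usr/bin/env bash
# Placeholders - replace with your cluster URL and ACL token.
CONSUL_CLUSTER_URL="http://localhost:8500"
CONSUL_TOKEN="<PUT YOUR AUTH TOKEN>"

# Current Raft leader address (the same information as the "Cluster leader" item).
curl -s -H "X-Consul-Token: ${CONSUL_TOKEN}" "${CONSUL_CLUSTER_URL}/v1/status/leader"

# Catalog of registered services (used by services discovery).
curl -s -H "X-Consul-Token: ${CONSUL_TOKEN}" "${CONSUL_CLUSTER_URL}/v1/catalog/services"
```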
Name | Description | Default |
---|---|---|
{$CONSUL.CLUSTER.URL} | Consul cluster URL. |
http://localhost:8500 |
{$CONSUL.TOKEN} | Consul auth token. |
<PUT YOUR AUTH TOKEN> |
{$CONSUL.NAMESPACE} | Consul service namespace. Enterprise only; for the Open Source version, leave this macro empty. Leave this macro unspecified to get all services. |
|
{$CONSUL.API.SCHEME} | Consul API scheme. Used in node LLD. |
http |
{$CONSUL.API.PORT} | Consul API port. Used in node LLD. |
8500 |
{$CONSUL.LLD.FILTER.NODE_NAME.MATCHES} | Filter of discoverable nodes. |
.* |
{$CONSUL.LLD.FILTER.NODE_NAME.NOT_MATCHES} | Filter to exclude discovered nodes. |
CHANGE IF NEEDED |
{$CONSUL.LLD.FILTER.SERVICE_NAME.MATCHES} | Filter of discoverable services. |
.* |
{$CONSUL.LLD.FILTER.SERVICE_NAME.NOT_MATCHES} | Filter to exclude discovered services. |
CHANGE IF NEEDED |
{$CONSUL.SERVICE_NODES.CRITICAL.MAX.AVG} | Maximum number of service nodes in status 'critical' for trigger expression. Can be used with context. |
0 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Cluster leader | Current leader address. |
HTTP agent | consul.get_leader Preprocessing
|
Nodes: peers | The number of Raft peers for the datacenter in which the agent is running. |
HTTP agent | consul.get_peers Preprocessing
|
Get nodes | Catalog of nodes registered in a given datacenter. |
HTTP agent | consul.get_nodes Preprocessing
|
Get nodes Serf health status | Get Serf Health Status for all agents in cluster. |
HTTP agent | consul.getclusterserf Preprocessing
|
Nodes: total | Number of nodes on current dc. |
Dependent item | consul.nodes_total Preprocessing
|
Nodes: passing | Number of agents on current dc with serf health status 'passing'. |
Dependent item | consul.nodes_passing Preprocessing
|
Nodes: critical | Number of agents on current dc with serf health status 'critical'. |
Dependent item | consul.nodes_critical Preprocessing
|
Nodes: warning | Number of agents on current dc with serf health status 'warning'. |
Dependent item | consul.nodes_warning Preprocessing
|
Get services | Catalog of services registered in a given datacenter. |
HTTP agent | consul.getcatalogservices Preprocessing
|
Services: total | Number of services on current dc. |
Dependent item | consul.services_total Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
HashiCorp Consul Cluster: Leader has been changed | The Consul cluster leader has changed. Acknowledge to close the problem manually. |
last(/HashiCorp Consul Cluster by HTTP/consul.get_leader,#1)<>last(/HashiCorp Consul Cluster by HTTP/consul.get_leader,#2) and length(last(/HashiCorp Consul Cluster by HTTP/consul.get_leader))>0 |Info |
Manual close: Yes | |
HashiCorp Consul Cluster: One or more nodes in cluster in 'critical' state | One or more agents on current dc with serf health status 'critical'. |
last(/HashiCorp Consul Cluster by HTTP/consul.nodes_critical)>0 |Average |
||
HashiCorp Consul Cluster: One or more nodes in cluster in 'warning' state | One or more agents on current dc with serf health status 'warning'. |
last(/HashiCorp Consul Cluster by HTTP/consul.nodes_warning)>0 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Consul cluster nodes discovery | Dependent item | consul.lld_nodes Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Node ["{#NODE_NAME}"]: Serf Health | Node Serf Health Status. |
Dependent item | consul.serf.health["{#NODE_NAME}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Consul cluster services discovery | Dependent item | consul.lld_services Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Service ["{#SERVICE_NAME}"]: Nodes passing | The number of nodes with service status |
Dependent item | consul.service.nodespassing["{#SERVICENAME}"] Preprocessing
|
Service ["{#SERVICE_NAME}"]: Nodes warning | The number of nodes with service status |
Dependent item | consul.service.nodeswarning["{#SERVICENAME}"] Preprocessing
|
Service ["{#SERVICE_NAME}"]: Nodes critical | The number of nodes with service status |
Dependent item | consul.service.nodescritical["{#SERVICENAME}"] Preprocessing
|
["{#SERVICE_NAME}"]: Get raw service state | Retrieve service instances providing the service indicated on the path. |
HTTP agent | consul.getservicestats["{#SERVICE_NAME}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
HashiCorp Consul Cluster: Service ["{#SERVICE_NAME}"]: Too many nodes with service status 'critical' | One or more nodes with service status 'critical'. |
last(/HashiCorp Consul Cluster by HTTP/consul.service.nodes_critical["{#SERVICE_NAME}"])>{$CONSUL.SERVICE_NODES.CRITICAL.MAX.AVG:"{#SERVICE_NAME}"} |Average |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template to monitor HashiCorp Consul by Zabbix that works without any external scripts.
Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
Do not forget to enable the Prometheus format for exported metrics.
See documentation.
More information about the metrics can be found in the official documentation.
Template HashiCorp Consul Node by HTTP
— collects metrics by HTTP agent from /v1/agent/metrics endpoint.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Internal service metrics are collected from the /v1/agent/metrics endpoint. Do not forget to enable the Prometheus format for exported metrics. See documentation. The template requires authorization via an API token.
Don't forget to change the {$CONSUL.NODE.API.URL} and {$CONSUL.TOKEN} macros.
Also, see the Macros section for a list of macros used to set trigger values.
More information about the metrics can be found in the official documentation.
This template supports Consul namespaces. You can set the {$CONSUL.LLD.FILTER.SERVICE_NAMESPACE.MATCHES} and {$CONSUL.LLD.FILTER.SERVICE_NAMESPACE.NOT_MATCHES} macros if you want to filter discovered services by namespace.
For the Open Source version, the service namespace will be set to 'None'.
NOTE: Some metrics may not be collected depending on your HashiCorp Consul instance version and configuration.
NOTE: You may also be interested in the Envoy Proxy by HTTP template.
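To verify the agent URL, token, and Prometheus output before linking the template, you can call the /v1/agent/metrics and /v1/agent/self endpoints of the standard Consul HTTP API; the URL and token below are placeholders.

```bash
#!/usr/bin/env bash
# Placeholders - replace with your node API URL and ACL token.
CONSUL_NODE_API_URL="http://localhost:8500"
CONSUL_TOKEN="<PUT YOUR AUTH TOKEN>"

# Internal telemetry in Prometheus format - the "Get instance metrics" item
# relies on this endpoint returning data.
curl -s -H "X-Consul-Token: ${CONSUL_TOKEN}" \
  "${CONSUL_NODE_API_URL}/v1/agent/metrics?format=prometheus" | head

# Configuration and member information of the local agent (role, version, etc.).
curl -s -H "X-Consul-Token: ${CONSUL_TOKEN}" \
  "${CONSUL_NODE_API_URL}/v1/agent/self" | head -c 500; echo
```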
Name | Description | Default |
---|---|---|
{$CONSUL.NODE.API.URL} | Consul instance URL. |
http://localhost:8500 |
{$CONSUL.TOKEN} | Consul auth token. |
<PUT YOUR AUTH TOKEN> |
{$CONSUL.OPEN.FDS.MAX.WARN} | Maximum percentage of used file descriptors. |
90 |
{$CONSUL.LLD.FILTER.LOCAL_SERVICE_NAME.MATCHES} | Filter of discoverable services on the local node. |
.* |
{$CONSUL.LLD.FILTER.LOCAL_SERVICE_NAME.NOT_MATCHES} | Filter to exclude discovered services on the local node. |
CHANGE IF NEEDED |
{$CONSUL.LLD.FILTER.SERVICE_NAMESPACE.MATCHES} | Filter of discoverable services by namespace on the local node. Enterprise only; for the Open Source version, the namespace will be set to 'None'. |
.* |
{$CONSUL.LLD.FILTER.SERVICE_NAMESPACE.NOT_MATCHES} | Filter to exclude discovered services by namespace on the local node. Enterprise only; for the Open Source version, the namespace will be set to 'None'. |
CHANGE IF NEEDED |
{$CONSUL.NODE.HEALTH_SCORE.MAX.WARN} | Maximum acceptable value of node's health score for WARNING trigger expression. |
2 |
{$CONSUL.NODE.HEALTH_SCORE.MAX.HIGH} | Maximum acceptable value of node's health score for AVERAGE trigger expression. |
4 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Get instance metrics | Get raw metrics from Consul instance /metrics endpoint. |
HTTP agent | consul.get_metrics Preprocessing
|
Get node info | Get configuration and member information of the local agent. |
HTTP agent | consul.getnodeinfo Preprocessing
|
Role | Role of current Consul agent. |
Dependent item | consul.role Preprocessing
|
Version | Version of Consul agent. |
Dependent item | consul.version Preprocessing
|
Number of services | Number of services on current node. |
Dependent item | consul.services_number Preprocessing
|
Number of checks | Number of checks on current node. |
Dependent item | consul.checks_number Preprocessing
|
Number of check monitors | Number of check monitors on current node. |
Dependent item | consul.checkmonitorsnumber Preprocessing
|
Process CPU seconds, total | Total user and system CPU time spent in seconds. |
Dependent item | consul.cpusecondstotal.rate Preprocessing
|
Virtual memory size | Virtual memory size in bytes. |
Dependent item | consul.virtualmemorybytes Preprocessing
|
RSS memory usage | Resident memory size in bytes. |
Dependent item | consul.residentmemorybytes Preprocessing
|
Goroutine count | The number of Goroutines on Consul instance. |
Dependent item | consul.goroutines Preprocessing
|
Open file descriptors | Number of open file descriptors. |
Dependent item | consul.process_open_fds Preprocessing
|
Open file descriptors, max | Maximum number of open file descriptors. |
Dependent item | consul.process_max_fds Preprocessing
|
Client RPC, per second | Number of times per second whenever a Consul agent in client mode makes an RPC request to a Consul server. This gives a measure of how much a given agent is loading the Consul servers. This is only generated by agents in client mode, not Consul servers. |
Dependent item | consul.client_rpc Preprocessing
|
Client RPC failed, per second | Number of times per second whenever a Consul agent in client mode makes an RPC request to a Consul server and fails. |
Dependent item | consul.clientrpcfailed Preprocessing
|
TCP connections, accepted per second | This metric counts the number of times a Consul agent has accepted an incoming TCP stream connection per second. |
Dependent item | consul.memberlist.tcp_accept Preprocessing
|
TCP connections, per second | This metric counts the number of times a Consul agent has initiated a push/pull sync with another agent per second. |
Dependent item | consul.memberlist.tcp_connect Preprocessing
|
TCP send bytes, per second | This metric measures the total number of bytes sent by a Consul agent through the TCP protocol per second. |
Dependent item | consul.memberlist.tcp_sent Preprocessing
|
UDP received bytes, per second | This metric measures the total number of bytes received by a Consul agent through the UDP protocol per second. |
Dependent item | consul.memberlist.udp_received Preprocessing
|
UDP sent bytes, per second | This metric measures the total number of bytes sent by a Consul agent through the UDP protocol per second. |
Dependent item | consul.memberlist.udp_sent Preprocessing
|
GC pause, p90 | The 90 percentile for the number of nanoseconds consumed by stop-the-world garbage collection (GC) pauses since Consul started, in milliseconds. |
Dependent item | consul.gc_pause.p90 Preprocessing
|
GC pause, p50 | The 50 percentile (median) for the number of nanoseconds consumed by stop-the-world garbage collection (GC) pauses since Consul started, in milliseconds. |
Dependent item | consul.gc_pause.p50 Preprocessing
|
Memberlist: degraded | This metric counts the number of times the Consul agent has performed failure detection on another agent at a slower probe rate. The agent uses its own health metric as an indicator to perform this action. If its health score is low, it means that the node is healthy, and vice versa. |
Dependent item | consul.memberlist.degraded Preprocessing
|
Memberlist: health score | This metric describes a node's perception of its own health based on how well it is meeting the soft real-time requirements of the protocol. This metric ranges from 0 to 8, where 0 indicates "totally healthy". |
Dependent item | consul.memberlist.health_score Preprocessing
|
Memberlist: gossip, p90 | The 90 percentile for the number of gossips (messages) broadcasted to a set of randomly selected nodes. |
Dependent item | consul.memberlist.dispatch_log.p90 Preprocessing
|
Memberlist: gossip, p50 | The 50 percentile (median) for the number of gossips (messages) broadcasted to a set of randomly selected nodes. |
Dependent item | consul.memberlist.gossip.p50 Preprocessing
|
Memberlist: msg alive | This metric counts the number of alive Consul agents, that the agent has mapped out so far, based on the message information given by the network layer. |
Dependent item | consul.memberlist.msg.alive Preprocessing
|
Memberlist: msg dead | This metric counts the number of times a Consul agent has marked another agent to be a dead node. |
Dependent item | consul.memberlist.msg.dead Preprocessing
|
Memberlist: msg suspect | The number of times a Consul agent suspects another as failed while probing during gossip protocol. |
Dependent item | consul.memberlist.msg.suspect Preprocessing
|
Memberlist: probe node, p90 | The 90 percentile for the time taken to perform a single round of failure detection on a select Consul agent. |
Dependent item | consul.memberlist.probe_node.p90 Preprocessing
|
Memberlist: probe node, p50 | The 50 percentile (median) for the time taken to perform a single round of failure detection on a select Consul agent. |
Dependent item | consul.memberlist.probe_node.p50 Preprocessing
|
Memberlist: push pull node, p90 | The 90 percentile for the number of Consul agents that have exchanged state with this agent. |
Dependent item | consul.memberlist.pushpullnode.p90 Preprocessing
|
Memberlist: push pull node, p50 | The 50 percentile (median) for the number of Consul agents that have exchanged state with this agent. |
Dependent item | consul.memberlist.pushpullnode.p50 Preprocessing
|
KV store: apply, p90 | The 90 percentile for the time it takes to complete an update to the KV store. |
Dependent item | consul.kvs.apply.p90 Preprocessing
|
KV store: apply, p50 | The 50 percentile (median) for the time it takes to complete an update to the KV store. |
Dependent item | consul.kvs.apply.p50 Preprocessing
|
KV store: apply, rate | The number of updates to the KV store per second. |
Dependent item | consul.kvs.apply.rate Preprocessing
|
Serf member: flap, rate | Increments when an agent is marked dead and then recovers within a short time period. This can be an indicator of overloaded agents, network problems, or configuration errors where agents cannot connect to each other on the required ports. Shown as events per second. |
Dependent item | consul.serf.member.flap.rate Preprocessing
|
Serf member: failed, rate | Increments when an agent is marked dead. This can be an indicator of overloaded agents, network problems, or configuration errors where agents cannot connect to each other on the required ports. Shown as events per second. |
Dependent item | consul.serf.member.failed.rate Preprocessing
|
Serf member: join, rate | Increments when an agent joins the cluster. If an agent flapped or failed this counter also increments when it re-joins. Shown as events per second. |
Dependent item | consul.serf.member.join.rate Preprocessing
|
Serf member: left, rate | Increments when an agent leaves the cluster. Shown as events per second. |
Dependent item | consul.serf.member.left.rate Preprocessing
|
Serf member: update, rate | Increments when a Consul agent updates. Shown as events per second. |
Dependent item | consul.serf.member.update.rate Preprocessing
|
ACL: resolves, rate | The number of ACL resolves per second. |
Dependent item | consul.acl.resolves.rate Preprocessing
|
Catalog: register, rate | The number of catalog register operation per second. |
Dependent item | consul.catalog.register.rate Preprocessing
|
Catalog: deregister, rate | The number of catalog deregister operation per second. |
Dependent item | consul.catalog.deregister.rate Preprocessing
|
Snapshot: append line, p90 | The 90 percentile for the time taken by the Consul agent to append an entry into the existing log. |
Dependent item | consul.snapshot.append_line.p90 Preprocessing
|
Snapshot: append line, p50 | The 50 percentile (median) for the time taken by the Consul agent to append an entry into the existing log. |
Dependent item | consul.snapshot.append_line.p50 Preprocessing
|
Snapshot: append line, rate | The number of snapshot appendLine operations per second. |
Dependent item | consul.snapshot.append_line.rate Preprocessing
|
Snapshot: compact, p90 | The 90 percentile for the time taken by the Consul agent to compact a log. This operation occurs only when the snapshot becomes large enough to justify the compaction. |
Dependent item | consul.snapshot.compact.p90 Preprocessing
|
Snapshot: compact, p50 | The 50 percentile (median) for the time taken by the Consul agent to compact a log. This operation occurs only when the snapshot becomes large enough to justify the compaction. |
Dependent item | consul.snapshot.compact.p50 Preprocessing
|
Snapshot: compact, rate | The number of snapshot compact operations per second. |
Dependent item | consul.snapshot.compact.rate Preprocessing
|
Get local services | Get all the services that are registered with the local agent and their status. |
Script | consul.get_local_services |
Get local services check | Data collection check. |
Dependent item | consul.get_local_services.check Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
HashiCorp Consul Node: Version has been changed | Consul version has changed. Acknowledge to close the problem manually. |
last(/HashiCorp Consul Node by HTTP/consul.version,#1)<>last(/HashiCorp Consul Node by HTTP/consul.version,#2) and length(last(/HashiCorp Consul Node by HTTP/consul.version))>0 |Info |
Manual close: Yes | |
HashiCorp Consul Node: Current number of open files is too high | "Heavy file descriptor usage (i.e., near the process’s file descriptor limit) indicates a potential file descriptor exhaustion issue." |
min(/HashiCorp Consul Node by HTTP/consul.process_open_fds,5m)/last(/HashiCorp Consul Node by HTTP/consul.process_max_fds)*100>{$CONSUL.OPEN.FDS.MAX.WARN} |Warning |
||
HashiCorp Consul Node: Node's health score is warning | This metric ranges from 0 to 8, where 0 indicates "totally healthy". |
max(/HashiCorp Consul Node by HTTP/consul.memberlist.health_score,#3)>{$CONSUL.NODE.HEALTH_SCORE.MAX.WARN} |Warning |
Depends on:
|
|
HashiCorp Consul Node: Node's health score is critical | This metric ranges from 0 to 8, where 0 indicates "totally healthy". |
max(/HashiCorp Consul Node by HTTP/consul.memberlist.health_score,#3)>{$CONSUL.NODE.HEALTH_SCORE.MAX.HIGH} |Average |
||
HashiCorp Consul Node: Failed to get local services | Failed to get local services. Check debug log for more information. |
length(last(/HashiCorp Consul Node by HTTP/consul.get_local_services.check))>0 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Local node services discovery | Discover metrics for services that are registered with the local agent. |
Dependent item | consul.nodeserviceslld Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
["{#SERVICE_NAME}"]: Aggregated status | Aggregated values of all health checks for the service instance. |
Dependent item | consul.service.aggregated_state["{#SERVICE_ID}"] Preprocessing
|
["{#SERVICE_NAME}"]: Check ["{#SERVICE_CHECK_NAME}"]: Status | Current state of health check for the service. |
Dependent item | consul.service.check.state["{#SERVICE_ID}/{#SERVICE_CHECK_ID}"] Preprocessing
|
["{#SERVICE_NAME}"]: Check ["{#SERVICE_CHECK_NAME}"]: Output | Current output of health check for the service. |
Dependent item | consul.service.check.output["{#SERVICE_ID}/{#SERVICE_CHECK_ID}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
HashiCorp Consul Node: Aggregated status is 'warning' | Aggregated state of service on the local agent is 'warning'. |
last(/HashiCorp Consul Node by HTTP/consul.service.aggregated_state["{#SERVICE_ID}"]) = 1 |Warning |
||
HashiCorp Consul Node: Aggregated status is 'critical' | Aggregated state of service on the local agent is 'critical'. |
last(/HashiCorp Consul Node by HTTP/consul.service.aggregated_state["{#SERVICE_ID}"]) = 2 |Average |
Name | Description | Type | Key and additional info |
---|---|---|---|
HTTP API methods discovery | Discovers metrics specific to HTTP API methods. |
Dependent item | consul.httpapidiscovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
HTTP request: ["{#HTTP_METHOD}"], p90 | The 90 percentile of how long it takes to service the given HTTP request for the given verb. |
Dependent item | consul.http.api.p90["{#HTTP_METHOD}"] Preprocessing
|
HTTP request: ["{#HTTP_METHOD}"], p50 | The 50 percentile (median) of how long it takes to service the given HTTP request for the given verb. |
Dependent item | consul.http.api.p50["{#HTTP_METHOD}"] Preprocessing
|
HTTP request: ["{#HTTP_METHOD}"], rate | The number of HTTP request for the given verb per second. |
Dependent item | consul.http.api.rate["{#HTTP_METHOD}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Raft server metrics discovery | Discover raft metrics for server nodes. |
Dependent item | consul.raft.server.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Raft state | Current state of Consul agent. |
Dependent item | consul.raft.state[{#SINGLETON}] Preprocessing
|
Raft state: leader | Increments when a server becomes a leader. |
Dependent item | consul.raft.state_leader[{#SINGLETON}] Preprocessing
|
Raft state: candidate | The number of initiated leader elections. |
Dependent item | consul.raft.state_candidate[{#SINGLETON}] Preprocessing
|
Raft: apply, rate | Incremented whenever a leader first passes a message into the Raft commit process (called an Apply operation). This metric describes the arrival rate of new logs into Raft per second. |
Dependent item | consul.raft.apply.rate[{#SINGLETON}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Raft leader metrics discovery | Discover raft metrics for leader nodes. |
Dependent item | consul.raft.leader.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Raft state: leader last contact, p90 | The 90 percentile of how long it takes a leader node to communicate with followers during a leader lease check, in milliseconds. |
Dependent item | consul.raft.leaderlastcontact.p90[{#SINGLETON}] Preprocessing
|
Raft state: leader last contact, p50 | The 50 percentile (median) of how long it takes a leader node to communicate with followers during a leader lease check, in milliseconds. |
Dependent item | consul.raft.leaderlastcontact.p50[{#SINGLETON}] Preprocessing
|
Raft state: commit time, p90 | The 90 percentile time it takes to commit a new entry to the raft log on the leader, in milliseconds. |
Dependent item | consul.raft.commit_time.p90[{#SINGLETON}] Preprocessing
|
Raft state: commit time, p50 | The 50 percentile (median) time it takes to commit a new entry to the raft log on the leader, in milliseconds. |
Dependent item | consul.raft.commit_time.p50[{#SINGLETON}] Preprocessing
|
Raft state: dispatch log, p90 | The 90 percentile time it takes for the leader to write log entries to disk, in milliseconds. |
Dependent item | consul.raft.dispatch_log.p90[{#SINGLETON}] Preprocessing
|
Raft state: dispatch log, p50 | The 50 percentile (median) time it takes for the leader to write log entries to disk, in milliseconds. |
Dependent item | consul.raft.dispatch_log.p50[{#SINGLETON}] Preprocessing
|
Raft state: dispatch log, rate | The number of times a Raft leader writes a log to disk per second. |
Dependent item | consul.raft.dispatch_log.rate[{#SINGLETON}] Preprocessing
|
Raft state: commit, rate | The number of commits of new entries to the Raft log on the leader per second. |
Dependent item | consul.raft.commit_time.rate[{#SINGLETON}] Preprocessing
|
Autopilot healthy | Tracks the overall health of the local server cluster. 1 if all servers are healthy, 0 if one or more are unhealthy. |
Dependent item | consul.autopilot.healthy[{#SINGLETON}] Preprocessing
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for the effortless deployment of Cloudflare monitoring by Zabbix via HTTP and doesn't require any external scripts.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
1. Create a host, for example mywebsite.com, for a site in your Cloudflare account.
2. Link the template to the host.
3. Customize the values of {$CLOUDFLARE.API.TOKEN}, {$CLOUDFLARE.ZONE_ID} macros.
Cloudflare API Tokens are available in your Cloudflare account under My Profile > API Tokens.
Zone ID is available in your Cloudflare account under Account Home > Site.
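Before filling in the macros, you can confirm the token and Zone ID against the Cloudflare API itself. A minimal curl sketch using the token-verification and zone-details endpoints of the v4 API; the token and Zone ID values below are placeholders.

```bash
#!/usr/bin/env bash
# Placeholders - replace with your API token and Zone ID.
CLOUDFLARE_API_URL="https://api.cloudflare.com/client/v4"
CLOUDFLARE_API_TOKEN="<change>"
CLOUDFLARE_ZONE_ID="<change>"

# Verifies that the token itself is valid and active.
curl -s -H "Authorization: Bearer ${CLOUDFLARE_API_TOKEN}" \
  "${CLOUDFLARE_API_URL}/user/tokens/verify"

# Verifies that the token can read the zone identified by the Zone ID.
curl -s -H "Authorization: Bearer ${CLOUDFLARE_API_TOKEN}" \
  "${CLOUDFLARE_API_URL}/zones/${CLOUDFLARE_ZONE_ID}"
```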
Name | Description | Default |
---|---|---|
{$CLOUDFLARE.API.URL} | The URL of Cloudflare API endpoint. |
https://api.cloudflare.com/client/v4 |
{$CLOUDFLARE.API.TOKEN} | Your Cloudflare API Token. |
<change> |
{$CLOUDFLARE.ZONE_ID} | Your Cloudflare Site Zone ID. |
<change> |
{$CLOUDFLARE.ERRORS.MAX.WARN} | Maximum responses with errors in %. |
30 |
{$CLOUDFLARE.CACHED_BANDWIDTH.MIN.WARN} | Minimum of cached bandwidth in %. |
50 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Total bandwidth | The volume of all data. |
Dependent item | cloudflare.bandwidth.all Preprocessing
|
Cached bandwidth | The volume of cached data. |
Dependent item | cloudflare.bandwidth.cached Preprocessing
|
Uncached bandwidth | The volume of uncached data. |
Dependent item | cloudflare.bandwidth.uncached Preprocessing
|
Cache hit ratio of bandwidth | The ratio of the amount of cached bandwidth to the total bandwidth, as a percentage. |
Dependent item | cloudflare.bandwidth.cachehitratio Preprocessing
|
SSL encrypted bandwidth | The volume of encrypted data. |
Dependent item | cloudflare.bandwidth.ssl.encrypted Preprocessing
|
Unencrypted bandwidth | The volume of unencrypted data. |
Dependent item | cloudflare.bandwidth.ssl.unencrypted Preprocessing
|
DNS queries | The amount of all DNS queries. |
Dependent item | cloudflare.dns.query.all Preprocessing
|
Stale DNS queries | The number of stale DNS queries. |
Dependent item | cloudflare.dns.query.stale Preprocessing
|
Uncached DNS queries | The number of uncached DNS queries. |
Dependent item | cloudflare.dns.query.uncached Preprocessing
|
Get data | The JSON with result of Cloudflare API request. |
Script | cloudflare.get |
Total page views | The amount of all pageviews. |
Dependent item | cloudflare.pageviews.all Preprocessing
|
Total requests | The amount of all requests. |
Dependent item | cloudflare.requests.all Preprocessing
|
Cached requests | Dependent item | cloudflare.requests.cached Preprocessing
|
|
Uncached requests | The number of uncached requests. |
Dependent item | cloudflare.requests.uncached Preprocessing
|
Cache hit ratio % over time | The ratio of the number of cached requests to all requests, as a percentage. |
Dependent item | cloudflare.requests.cachehitratio Preprocessing
|
Response codes 1xx | The number of requests with 1xx response codes. |
Dependent item | cloudflare.requests.response_100 Preprocessing
|
Response codes 2xx | The number of requests with 2xx response codes. |
Dependent item | cloudflare.requests.response_200 Preprocessing
|
Response codes 3xx | The number of requests with 3xx response codes. |
Dependent item | cloudflare.requests.response_300 Preprocessing
|
Response codes 4xx | The number of requests with 4xx response codes. |
Dependent item | cloudflare.requests.response_400 Preprocessing
|
Response codes 5xx | The number of requests with 5xx response codes. |
Dependent item | cloudflare.requests.response_500 Preprocessing
|
Non-2xx responses ratio | The ratio of the number of requests with non-2xx response codes to all requests, as a percentage. |
Dependent item | cloudflare.requests.others_ratio Preprocessing
|
2xx responses ratio | The ratio of the number of requests with 2xx response codes to all requests, as a percentage. |
Dependent item | cloudflare.requests.success_ratio Preprocessing
|
SSL encrypted requests | The number of encrypted requests. |
Dependent item | cloudflare.requests.ssl.encrypted Preprocessing
|
Unencrypted requests | The number of unencrypted requests. |
Dependent item | cloudflare.requests.ssl.unencrypted Preprocessing
|
Total threats | The number of all threats. |
Dependent item | cloudflare.threats.all Preprocessing
|
Unique visitors | The number of unique visitor IPs. |
Dependent item | cloudflare.uniques.all Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Cloudflare: Cached bandwidth is too low | max(/Cloudflare by HTTP/cloudflare.bandwidth.cache_hit_ratio,#3) < {$CLOUDFLARE.CACHED_BANDWIDTH.MIN.WARN} |Warning |
|||
Cloudflare: Ratio of non-2xx responses is too high | A large number of errors can indicate a malfunction of the site. |
min(/Cloudflare by HTTP/cloudflare.requests.others_ratio,#3) > {$CLOUDFLARE.ERRORS.MAX.WARN} |Average |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template to monitor TLS/SSL certificate on the website by Zabbix agent 2 that works without any external scripts. Zabbix agent 2 with the WebCertificate plugin requests certificate using the web.certificate.get key and returns JSON with certificate attributes.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
1. Set up and configure zabbix-agent2 with the WebCertificate plugin.
2. Test availability: zabbix_get -s <zabbix_agent_addr> -k web.certificate.get[<website_DNS_name>]
3. Create a host for the TLS/SSL certificate with Zabbix agent interface.
4. Link the template to the host.
5. Customize the value of {$CERT.WEBSITE.HOSTNAME} macro.
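For example, the availability test from step 2 could look like this when run from the Zabbix server or proxy; the agent address 192.0.2.10 and the site example.com are placeholders.

```bash
# Availability test for the web.certificate.get key (step 2); adjust the
# agent address and website DNS name to your environment.
zabbix_get -s 192.0.2.10 -k 'web.certificate.get[example.com]'

# The same check with an explicit TLS port, matching {$CERT.WEBSITE.PORT}.
zabbix_get -s 192.0.2.10 -k 'web.certificate.get[example.com,443]'
```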
Name | Description | Default |
---|---|---|
{$CERT.EXPIRY.WARN} | Number of days until the certificate expires. |
7 |
{$CERT.WEBSITE.HOSTNAME} | The website DNS name for the connection. |
<Put DNS name> |
{$CERT.WEBSITE.PORT} | The TLS/SSL port number of the website. |
443 |
{$CERT.WEBSITE.IP} | The website IP address for the connection. |
Name | Description | Type | Key and additional info |
---|---|---|---|
Get | Returns the JSON with attributes of a certificate of the requested site. |
Zabbix agent (active) | web.certificate.get[{$CERT.WEBSITE.HOSTNAME},{$CERT.WEBSITE.PORT},{$CERT.WEBSITE.IP}] Preprocessing
|
Validation result | The certificate validation result. Possible values: valid/invalid/valid-but-self-signed |
Dependent item | cert.validation Preprocessing
|
Last validation status | Last check result message. |
Dependent item | cert.message Preprocessing
|
Version | The version of the encoded certificate. |
Dependent item | cert.version Preprocessing
|
Serial number | The serial number is a positive integer assigned by the CA to each certificate. It is unique for each certificate issued by a given CA. Non-conforming CAs may issue certificates with serial numbers that are negative or zero. |
Dependent item | cert.serial_number Preprocessing
|
Signature algorithm | The algorithm identifier for the algorithm used by the CA to sign the certificate. |
Dependent item | cert.signature_algorithm Preprocessing
|
Issuer | The field identifies the entity that has signed and issued the certificate. |
Dependent item | cert.issuer Preprocessing
|
Valid from | The date on which the certificate validity period begins. |
Dependent item | cert.not_before Preprocessing
|
Expires on | The date on which the certificate validity period ends. |
Dependent item | cert.not_after Preprocessing
|
Subject | The field identifies the entity associated with the public key stored in the subject public key field. |
Dependent item | cert.subject Preprocessing
|
Subject alternative name | The subject alternative name extension allows identities to be bound to the subject of the certificate. These identities may be included in addition to or in place of the identity in the subject field of the certificate. Defined options include an Internet electronic mail address, a DNS name, an IP address, and a Uniform Resource Identifier (URI). |
Dependent item | cert.alternative_names Preprocessing
|
Public key algorithm | The digital signature algorithm is used to verify the signature of a certificate. |
Dependent item | cert.publickeyalgorithm Preprocessing
|
Fingerprint | The Certificate Signature (SHA1 Fingerprint or Thumbprint) is the hash of the entire certificate in DER form. |
Dependent item | cert.sha1_fingerprint Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Certificate: SSL certificate is invalid | SSL certificate has expired or it is issued for another domain. |
find(/Website certificate by Zabbix agent 2 active/cert.validation,,"like","invalid")=1 |High |
||
Certificate: SSL certificate expires soon | The SSL certificate should be updated or it will become untrusted. |
(last(/Website certificate by Zabbix agent 2 active/cert.not_after) - now()) / 86400 < {$CERT.EXPIRY.WARN} |Warning |
Depends on:
|
|
Certificate: Fingerprint has changed | The SSL certificate fingerprint has changed. If you did not update the certificate, it may mean your certificate has been hacked. Acknowledge to close the problem manually. |
last(/Website certificate by Zabbix agent 2 active/cert.sha1_fingerprint) <> last(/Website certificate by Zabbix agent 2 active/cert.sha1_fingerprint,#2) |Info |
Manual close: Yes |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template to monitor TLS/SSL certificate on the website by Zabbix agent 2 that works without any external scripts. Zabbix agent 2 with the WebCertificate plugin requests certificate using the web.certificate.get key and returns JSON with certificate attributes.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
1. Set up and configure zabbix-agent2 with the WebCertificate plugin.
2. Test availability: zabbix_get -s <zabbix_agent_addr> -k web.certificate.get[<website_DNS_name>]
3. Create a host for the TLS/SSL certificate with Zabbix agent interface.
4. Link the template to the host.
5. Customize the value of {$CERT.WEBSITE.HOSTNAME} macro.
Name | Description | Default |
---|---|---|
{$CERT.EXPIRY.WARN} | Number of days until the certificate expires. |
7 |
{$CERT.WEBSITE.HOSTNAME} | The website DNS name for the connection. |
<Put DNS name> |
{$CERT.WEBSITE.PORT} | The TLS/SSL port number of the website. |
443 |
{$CERT.WEBSITE.IP} | The website IP address for the connection. |
Name | Description | Type | Key and additional info |
---|---|---|---|
Get | Returns the JSON with attributes of a certificate of the requested site. |
Zabbix agent | web.certificate.get[{$CERT.WEBSITE.HOSTNAME},{$CERT.WEBSITE.PORT},{$CERT.WEBSITE.IP}] Preprocessing
|
Validation result | The certificate validation result. Possible values: valid/invalid/valid-but-self-signed |
Dependent item | cert.validation Preprocessing
|
Last validation status | Last check result message. |
Dependent item | cert.message Preprocessing
|
Version | The version of the encoded certificate. |
Dependent item | cert.version Preprocessing
|
Serial number | The serial number is a positive integer assigned by the CA to each certificate. It is unique for each certificate issued by a given CA. Non-conforming CAs may issue certificates with serial numbers that are negative or zero. |
Dependent item | cert.serial_number Preprocessing
|
Signature algorithm | The algorithm identifier for the algorithm used by the CA to sign the certificate. |
Dependent item | cert.signature_algorithm Preprocessing
|
Issuer | The field identifies the entity that has signed and issued the certificate. |
Dependent item | cert.issuer Preprocessing
|
Valid from | The date on which the certificate validity period begins. |
Dependent item | cert.not_before Preprocessing
|
Expires on | The date on which the certificate validity period ends. |
Dependent item | cert.not_after Preprocessing
|
Subject | The field identifies the entity associated with the public key stored in the subject public key field. |
Dependent item | cert.subject Preprocessing
|
Subject alternative name | The subject alternative name extension allows identities to be bound to the subject of the certificate. These identities may be included in addition to or in place of the identity in the subject field of the certificate. Defined options include an Internet electronic mail address, a DNS name, an IP address, and a Uniform Resource Identifier (URI). |
Dependent item | cert.alternative_names Preprocessing
|
Public key algorithm | The digital signature algorithm is used to verify the signature of a certificate. |
Dependent item | cert.publickeyalgorithm Preprocessing
|
Fingerprint | The Certificate Signature (SHA1 Fingerprint or Thumbprint) is the hash of the entire certificate in DER form. |
Dependent item | cert.sha1_fingerprint Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Certificate: SSL certificate is invalid | SSL certificate has expired or it is issued for another domain. |
find(/Website certificate by Zabbix agent 2/cert.validation,,"like","invalid")=1 |High |
||
Certificate: SSL certificate expires soon | The SSL certificate should be updated or it will become untrusted. |
(last(/Website certificate by Zabbix agent 2/cert.not_after) - now()) / 86400 < {$CERT.EXPIRY.WARN} |Warning |
Depends on:
|
|
Certificate: Fingerprint has changed | The SSL certificate fingerprint has changed. If you did not update the certificate, it may mean your certificate has been hacked. Acknowledge to close the problem manually. |
last(/Website certificate by Zabbix agent 2/cert.sha1_fingerprint) <> last(/Website certificate by Zabbix agent 2/cert.sha1_fingerprint,#2) |Info |
Manual close: Yes |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template to monitor Ceph cluster by Zabbix that works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
Template Ceph by Zabbix agent 2
— collects metrics by polling zabbix-agent2.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Test availability: zabbix_get -s ceph-host -k ceph.ping["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"]
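With the default macro values listed below, the availability test expands to the following; the host name ceph-host is a placeholder.

```bash
# Availability test using the default macro values:
#   {$CEPH.CONNSTRING} = https://localhost:8003
#   {$CEPH.USER}       = zabbix
#   {$CEPH.API.KEY}    = zabbix_pass
zabbix_get -s ceph-host -k 'ceph.ping["https://localhost:8003","zabbix","zabbix_pass"]'
# A non-error response indicates the plugin can reach the Ceph RESTful module.
```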
Name | Description | Default |
---|---|---|
{$CEPH.USER} | zabbix |
|
{$CEPH.API.KEY} | zabbix_pass |
|
{$CEPH.CONNSTRING} | https://localhost:8003 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Get overall cluster status | Zabbix agent | ceph.status["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"] | |
Get OSD stats | Zabbix agent | ceph.osd.stats["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"] | |
Get OSD dump | Zabbix agent | ceph.osd.dump["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"] | |
Get df | Zabbix agent | ceph.df.details["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"] | |
Ping | Zabbix agent | ceph.ping["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"] Preprocessing
|
|
Number of Monitors | The number of Monitors configured in a Ceph cluster. |
Dependent item | ceph.num_mon Preprocessing
|
Overall cluster status | The overall Ceph cluster status, e.g., 0 - HEALTH_OK, 1 - HEALTH_WARN or 2 - HEALTH_ERR. |
Dependent item | ceph.overall_status Preprocessing
|
Minimum Mon release version | min_mon_release_name |
Dependent item | ceph.min_mon_release_name Preprocessing
|
Ceph Read bandwidth | The global read bytes per second. |
Dependent item | ceph.rd_bytes.rate Preprocessing
|
Ceph Write bandwidth | The global write bytes per second. |
Dependent item | ceph.wr_bytes.rate Preprocessing
|
Ceph Read operations per sec | The global read operations per second. |
Dependent item | ceph.rd_ops.rate Preprocessing
|
Ceph Write operations per sec | The global write operations per second. |
Dependent item | ceph.wr_ops.rate Preprocessing
|
Total bytes available | The total bytes available in a Ceph cluster. |
Dependent item | ceph.totalavailbytes Preprocessing
|
Total bytes | The total (RAW) capacity of a Ceph cluster in bytes. |
Dependent item | ceph.total_bytes Preprocessing
|
Total bytes used | The total bytes used in a Ceph cluster. |
Dependent item | ceph.totalusedbytes Preprocessing
|
Total number of objects | The total number of objects in a Ceph cluster. |
Dependent item | ceph.total_objects Preprocessing
|
Number of Placement Groups | The total number of Placement Groups in a Ceph cluster. |
Dependent item | ceph.num_pg Preprocessing
|
Number of Placement Groups in Temporary state | The total number of Placement Groups in a pg_temp state |
Dependent item | ceph.numpgtemp Preprocessing
|
Number of Placement Groups in Active state | The total number of Placement Groups in an active state. |
Dependent item | ceph.pg_states.active Preprocessing
|
Number of Placement Groups in Clean state | The total number of Placement Groups in a clean state. |
Dependent item | ceph.pg_states.clean Preprocessing
|
Number of Placement Groups in Peering state | The total number of Placement Groups in a peering state. |
Dependent item | ceph.pg_states.peering Preprocessing
|
Number of Placement Groups in Scrubbing state | The total number of Placement Groups in a scrubbing state. |
Dependent item | ceph.pg_states.scrubbing Preprocessing
|
Number of Placement Groups in Undersized state | The total number of Placement Groups in an undersized state. |
Dependent item | ceph.pg_states.undersized Preprocessing
|
Number of Placement Groups in Backfilling state | The total number of Placement Groups in a backfill state. |
Dependent item | ceph.pg_states.backfilling Preprocessing
|
Number of Placement Groups in degraded state | The total number of Placement Groups in a degraded state. |
Dependent item | ceph.pg_states.degraded Preprocessing
|
Number of Placement Groups in inconsistent state | The total number of Placement Groups in an inconsistent state. |
Dependent item | ceph.pg_states.inconsistent Preprocessing
|
Number of Placement Groups in Unknown state | The total number of Placement Groups in an unknown state. |
Dependent item | ceph.pg_states.unknown Preprocessing
|
Number of Placement Groups in remapped state | The total number of Placement Groups in a remapped state. |
Dependent item | ceph.pg_states.remapped Preprocessing
|
Number of Placement Groups in recovering state | The total number of Placement Groups in a recovering state. |
Dependent item | ceph.pg_states.recovering Preprocessing
|
Number of Placement Groups in backfill_toofull state | The total number of Placement Groups in a backfill_toofull state. |
Dependent item | ceph.pgstates.backfilltoofull Preprocessing
|
Number of Placement Groups in backfill_wait state | The total number of Placement Groups in a backfill_wait state. |
Dependent item | ceph.pgstates.backfillwait Preprocessing
|
Number of Placement Groups in recovery_wait state | The total number of Placement Groups in a recovery_wait state. |
Dependent item | ceph.pgstates.recoverywait Preprocessing
|
Number of Pools | The total number of pools in a Ceph cluster. |
Dependent item | ceph.num_pools Preprocessing
|
Number of OSDs | The number of the known storage daemons in a Ceph cluster. |
Dependent item | ceph.num_osd Preprocessing
|
Number of OSDs in state: UP | The total number of the online storage daemons in a Ceph cluster. |
Dependent item | ceph.numosdup Preprocessing
|
Number of OSDs in state: IN | The total number of the participating storage daemons in a Ceph cluster. |
Dependent item | ceph.numosdin Preprocessing
|
Ceph OSD avg fill | The average fill of OSDs. |
Dependent item | ceph.osd_fill.avg Preprocessing
|
Ceph OSD max fill | The percentage of the most filled OSD. |
Dependent item | ceph.osd_fill.max Preprocessing
|
Ceph OSD min fill | The percentage fill of the minimum filled OSD. |
Dependent item | ceph.osd_fill.min Preprocessing
|
Ceph OSD max PGs | The maximum amount of Placement Groups on OSDs. |
Dependent item | ceph.osd_pgs.max Preprocessing
|
Ceph OSD min PGs | The minimum amount of Placement Groups on OSDs. |
Dependent item | ceph.osd_pgs.min Preprocessing
|
Ceph OSD avg PGs | The average amount of Placement Groups on OSDs. |
Dependent item | ceph.osd_pgs.avg Preprocessing
|
Ceph OSD Apply latency Avg | The average apply latency of OSDs. |
Dependent item | ceph.osdlatencyapply.avg Preprocessing
|
Ceph OSD Apply latency Max | The maximum apply latency of OSDs. |
Dependent item | ceph.osdlatencyapply.max Preprocessing
|
Ceph OSD Apply latency Min | The minimum apply latency of OSDs. |
Dependent item | ceph.osdlatencyapply.min Preprocessing
|
Ceph OSD Commit latency Avg | The average commit latency of OSDs. |
Dependent item | ceph.osdlatencycommit.avg Preprocessing
|
Ceph OSD Commit latency Max | The maximum commit latency of OSDs. |
Dependent item | ceph.osdlatencycommit.max Preprocessing
|
Ceph OSD Commit latency Min | The minimum commit latency of OSDs. |
Dependent item | ceph.osdlatencycommit.min Preprocessing
|
Ceph backfill full ratio | The backfill full ratio setting of the Ceph cluster as configured on OSDMap. |
Dependent item | ceph.osd_backfill_full_ratio Preprocessing
|
Ceph full ratio | The full ratio setting of the Ceph cluster as configured on OSDMap. |
Dependent item | ceph.osd_full_ratio Preprocessing
|
Ceph nearfull ratio | The near full ratio setting of the Ceph cluster as configured on OSDMap. |
Dependent item | ceph.osd_nearfull_ratio Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Ceph: Can not connect to cluster | The connection to the Ceph RESTful module is broken (this covers any error presented, including AUTH and configuration issues). |
last(/Ceph by Zabbix agent 2/ceph.ping["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"])=0 |Average |
||
Ceph: Cluster in ERROR state | last(/Ceph by Zabbix agent 2/ceph.overall_status)=2 |Average |
Manual close: Yes | ||
Ceph: Cluster in WARNING state | last(/Ceph by Zabbix agent 2/ceph.overall_status)=1 |Warning |
Manual close: Yes Depends on:
|
||
Ceph: Minimum monitor release version has changed | A Ceph version has changed. Acknowledge to close the problem manually. |
last(/Ceph by Zabbix agent 2/ceph.min_mon_release_name,#1)<>last(/Ceph by Zabbix agent 2/ceph.min_mon_release_name,#2) and length(last(/Ceph by Zabbix agent 2/ceph.min_mon_release_name))>0 |Info |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
OSD | Zabbix agent | ceph.osd.discovery["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
[osd.{#OSDNAME}] OSD in | Dependent item | ceph.osd[{#OSDNAME},in] Preprocessing
|
|
[osd.{#OSDNAME}] OSD up | Dependent item | ceph.osd[{#OSDNAME},up] Preprocessing
|
|
[osd.{#OSDNAME}] OSD PGs | Dependent item | ceph.osd[{#OSDNAME},num_pgs] Preprocessing
|
|
[osd.{#OSDNAME}] OSD fill | Dependent item | ceph.osd[{#OSDNAME},fill] Preprocessing
|
|
[osd.{#OSDNAME}] OSD latency apply | The time taken to flush an update to disks. |
Dependent item | ceph.osd[{#OSDNAME},latency_apply] Preprocessing
|
[osd.{#OSDNAME}] OSD latency commit | The time taken to commit an operation to the journal. |
Dependent item | ceph.osd[{#OSDNAME},latency_commit] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Ceph: OSD osd.{#OSDNAME} is down | OSD osd.{#OSDNAME} is marked "down" in the osdmap. |
last(/Ceph by Zabbix agent 2/ceph.osd[{#OSDNAME},up]) = 0 |Average |
||
Ceph: OSD osd.{#OSDNAME} is full | min(/Ceph by Zabbix agent 2/ceph.osd[{#OSDNAME},fill],15m) > last(/Ceph by Zabbix agent 2/ceph.osd_full_ratio)*100 |Average |
|||
Ceph: Ceph OSD osd.{#OSDNAME} is near full | min(/Ceph by Zabbix agent 2/ceph.osd[{#OSDNAME},fill],15m) > last(/Ceph by Zabbix agent 2/ceph.osd_nearfull_ratio)*100 |Warning |
Depends on:
|
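For clarity, the OSD fill triggers above compare the OSD fill percentage against the cluster ratios scaled to percent. Below is a minimal sketch of that comparison with hypothetical values; the real check is performed by the trigger expressions themselves, not by external code:

# Hypothetical values illustrating the "OSD is full / near full" trigger logic.
osd_fill_percent = 96.0       # ceph.osd[{#OSDNAME},fill], in percent
full_ratio = 0.95             # ceph.osd_full_ratio
nearfull_ratio = 0.85         # ceph.osd_nearfull_ratio

is_full = osd_fill_percent > full_ratio * 100            # 96.0 > 95.0 -> True
is_near_full = osd_fill_percent > nearfull_ratio * 100   # 96.0 > 85.0 -> True
print(is_full, is_near_full)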
Name | Description | Type | Key and additional info |
---|---|---|---|
Pool | Zabbix agent | ceph.pool.discovery["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
[{#POOLNAME}] Pool Used | The total bytes used in a pool. |
Dependent item | ceph.pool["{#POOLNAME}",bytes_used] Preprocessing
|
[{#POOLNAME}] Max available | The maximum available space in the given pool. |
Dependent item | ceph.pool["{#POOLNAME}",max_avail] Preprocessing
|
[{#POOLNAME}] Pool RAW Used | Bytes used in pool including the copies made. |
Dependent item | ceph.pool["{#POOLNAME}",stored_raw] Preprocessing
|
[{#POOLNAME}] Pool Percent Used | The percentage of the storage used per pool. |
Dependent item | ceph.pool["{#POOLNAME}",percent_used] Preprocessing
|
[{#POOLNAME}] Pool objects | The number of objects in the pool. |
Dependent item | ceph.pool["{#POOLNAME}",objects] Preprocessing
|
[{#POOLNAME}] Pool Read bandwidth | The read rate per pool (bytes per second). |
Dependent item | ceph.pool["{#POOLNAME}",rd_bytes.rate] Preprocessing
|
[{#POOLNAME}] Pool Write bandwidth | The write rate per pool (bytes per second). |
Dependent item | ceph.pool["{#POOLNAME}",wr_bytes.rate] Preprocessing
|
[{#POOLNAME}] Pool Read operations | The read rate per pool (operations per second). |
Dependent item | ceph.pool["{#POOLNAME}",rd_ops.rate] Preprocessing
|
[{#POOLNAME}] Pool Write operations | The write rate per pool (operations per second). |
Dependent item | ceph.pool["{#POOLNAME}",wr_ops.rate] Preprocessing
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Refer to the vendor documentation.
Name | Description | Default |
---|---|---|
{$ARANET.API.ENDPOINT} | Aranet Cloud API endpoint. |
https://aranet.cloud/api |
{$ARANET.API.USERNAME} | Aranet Cloud username. |
<PUT YOUR USERNAME> |
{$ARANET.API.PASSWORD} | Aranet Cloud password. |
<PUT YOUR PASSWORD> |
{$ARANET.API.SPACE_NAME} | Aranet Cloud organization name. |
<PUT YOUR SPACE NAME> |
{$ARANET.LLD.FILTER.SENSOR_NAME.MATCHES} | Filter of discoverable sensors by name. |
.+ |
{$ARANET.LLD.FILTER.SENSOR_NAME.NOT_MATCHES} | Filter to exclude discoverable sensors by name. |
CHANGE_IF_NEEDED |
{$ARANET.LLD.FILTER.SENSOR_ID.MATCHES} | Filter of discoverable sensors by id. |
.+ |
{$ARANET.LLD.FILTER.GATEWAY_NAME.MATCHES} | Filter of discoverable sensors by gateway name. |
.+ |
{$ARANET.LLD.FILTER.GATEWAY_NAME.NOT_MATCHES} | Filter to exclude discoverable sensors by gateway name. |
CHANGE_IF_NEEDED |
{$ARANET.LLD.FILTER.GATEWAY_ID.MATCHES} | Filter of discoverable sensors by gateway id. |
.+ |
{$ARANET.BATT.VOLTAGE.MIN.WARN} | Battery voltage warning threshold. |
1 |
{$ARANET.BATT.VOLTAGE.MIN.CRIT} | Battery voltage critical threshold. |
2 |
{$ARANET.HUMIDITY.MIN.WARN} | Minimum humidity threshold. |
20 |
{$ARANET.HUMIDITY.MAX.WARN} | Maximum humidity threshold. |
70 |
{$ARANET.CO2.MAX.WARN} | CO2 warning threshold. |
600 |
{$ARANET.CO2.MAX.CRIT} | CO2 critical threshold. |
1000 |
{$ARANET.LAST_UPDATE.MAX.WARN} | Data update delay threshold. |
1h |
Name | Description | Type | Key and additional info |
---|---|---|---|
Sensors discovery | Discovery for Aranet Cloud sensors |
Dependent item | aranet.sensor.discovery Preprocessing
|
Get data | Script | aranet.get_data |
Name | Description | Type | Key and additional info |
---|---|---|---|
Temperature discovery | Discovery for Aranet Cloud temperature sensors |
Dependent item | aranet.temp.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAY_NAME}] {#SENSOR_NAME} | Dependent item | aranet.temp["{#GATEWAY_ID}", "{#SENSOR_ID}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Humidity discovery | Discovery for Aranet Cloud humidity sensors |
Dependent item | aranet.humidity.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAY_NAME}] {#SENSOR_NAME} | Dependent item | aranet.humidity["{#GATEWAY_ID}", "{#SENSOR_ID}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Aranet: {#METRIC}: Low humidity on "[{#GATEWAY_NAME}] {#SENSOR_NAME}" | max(/Aranet Cloud/aranet.humidity["{#GATEWAY_ID}", "{#SENSOR_ID}"],5m) < {$ARANET.HUMIDITY.MIN.WARN:"{#SENSOR_NAME}"} |Warning |
Depends on:
|
||
Aranet: {#METRIC}: High humidity on "[{#GATEWAY_NAME}] {#SENSOR_NAME}" | min(/Aranet Cloud/aranet.humidity["{#GATEWAY_ID}", "{#SENSOR_ID}"],5m) > {$ARANET.HUMIDITY.MAX.WARN:"{#SENSOR_NAME}"} |High |
Name | Description | Type | Key and additional info |
---|---|---|---|
RSSI discovery | Discovery for Aranet Cloud RSSI sensors |
Dependent item | aranet.rssi.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAY_NAME}] {#SENSOR_NAME} | Dependent item | aranet.rssi["{#GATEWAY_ID}", "{#SENSOR_ID}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Battery voltage discovery | Discovery for Aranet Cloud Battery voltage sensors |
Dependent item | aranet.battery.voltage.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAY_NAME}] {#SENSOR_NAME} | Dependent item | aranet.battery.voltage["{#GATEWAY_ID}", "{#SENSOR_ID}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Aranet: {#METRIC}: Low battery voltage on "[{#GATEWAY_NAME}] {#SENSOR_NAME}" | max(/Aranet Cloud/aranet.battery.voltage["{#GATEWAY_ID}", "{#SENSOR_ID}"],5m) < {$ARANET.BATT.VOLTAGE.MIN.WARN:"{#SENSOR_NAME}"} |Warning |
Depends on:
|
||
Aranet: {#METRIC}: Critically low battery voltage on "[{#GATEWAY_NAME}] {#SENSOR_NAME}" | max(/Aranet Cloud/aranet.battery.voltage["{#GATEWAY_ID}", "{#SENSOR_ID}"],5m) < {$ARANET.BATT.VOLTAGE.MIN.CRIT:"{#SENSOR_NAME}"} |High |
Name | Description | Type | Key and additional info |
---|---|---|---|
CO2 discovery | Discovery for Aranet Cloud CO2 sensors |
Dependent item | aranet.co2.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAY_NAME}] {#SENSOR_NAME} | Dependent item | aranet.co2["{#GATEWAY_ID}", "{#SENSOR_ID}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Aranet: {#METRIC}: High CO2 level on "[{#GATEWAY_NAME}] {#SENSOR_NAME}" | min(/Aranet Cloud/aranet.co2["{#GATEWAY_ID}", "{#SENSOR_ID}"],5m) > {$ARANET.CO2.MAX.WARN:"{#SENSOR_NAME}"} |Warning |
Depends on:
|
||
Aranet: {#METRIC}: Critically high CO2 level on "[{#GATEWAY_NAME}] {#SENSOR_NAME}" | min(/Aranet Cloud/aranet.co2["{#GATEWAY_ID}", "{#SENSOR_ID}"],5m) > {$ARANET.CO2.MAX.CRIT:"{#SENSOR_NAME}"} |High |
Name | Description | Type | Key and additional info |
---|---|---|---|
Atmospheric pressure discovery | Discovery for Aranet Cloud atmospheric pressure sensors |
Dependent item | aranet.pressure.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAY_NAME}] {#SENSOR_NAME} | Dependent item | aranet.pressure["{#GATEWAY_ID}", "{#SENSOR_ID}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Voltage discovery | Discovery for Aranet Cloud Voltage sensors |
Dependent item | aranet.voltage.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAY_NAME}] {#SENSOR_NAME} | Dependent item | aranet.voltage["{#GATEWAY_ID}", "{#SENSOR_ID}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Weight discovery | Discovery for Aranet Cloud Weight sensors |
Dependent item | aranet.weight.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAY_NAME}] {#SENSOR_NAME} | Dependent item | aranet.weight["{#GATEWAY_ID}", "{#SENSOR_ID}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Volumetric Water Content discovery | Discovery for Aranet Cloud Volumetric Water Content sensors |
Dependent item | aranet.volumwatercontent.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAY_NAME}] {#SENSOR_NAME} | Dependent item | aranet.volumetric.water.content["{#GATEWAY_ID}", "{#SENSOR_ID}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
PPFD discovery | Discovery for Aranet Cloud PPFD sensors |
Dependent item | aranet.ppfd.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAY_NAME}] {#SENSOR_NAME} | Dependent item | aranet.ppfd["{#GATEWAY_ID}", "{#SENSOR_ID}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Distance discovery | Discovery for Aranet Cloud Distance sensors |
Dependent item | aranet.distance.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAY_NAME}] {#SENSOR_NAME} | Dependent item | aranet.distance["{#GATEWAY_ID}", "{#SENSOR_ID}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Illuminance discovery | Discovery for Aranet Cloud Illuminance sensors |
Dependent item | aranet.illuminance.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAY_NAME}] {#SENSOR_NAME} | Dependent item | aranet.illuminance["{#GATEWAY_ID}", "{#SENSOR_ID}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
pH discovery | Discovery for Aranet Cloud pH sensors |
Dependent item | aranet.ph.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAY_NAME}] {#SENSOR_NAME} | Dependent item | aranet.ph["{#GATEWAY_ID}", "{#SENSOR_ID}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Current discovery | Discovery for Aranet Cloud Current sensors |
Dependent item | aranet.current.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAY_NAME}] {#SENSOR_NAME} | Dependent item | aranet.current["{#GATEWAY_ID}", "{#SENSOR_ID}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Soil Dielectric Permittivity discovery | Discovery for Aranet Cloud Soil Dielectric Permittivity sensors |
Dependent item | aranet.soildielectricperm.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAY_NAME}] {#SENSOR_NAME} | Dependent item | aranet.soildielectricperm["{#GATEWAY_ID}", "{#SENSOR_ID}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Soil Electrical Conductivity discovery | Discovery for Aranet Cloud Soil Electrical Conductivity sensors |
Dependent item | aranet.soilelectriccond.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAY_NAME}] {#SENSOR_NAME} | Dependent item | aranet.soilelectriccond["{#GATEWAY_ID}", "{#SENSOR_ID}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Pore Electrical Conductivity discovery | Discovery for Aranet Cloud Pore Electrical Conductivity sensors |
Dependent item | aranet.poreelectriccond.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAY_NAME}] {#SENSOR_NAME} | Dependent item | aranet.poreelectriccond["{#GATEWAY_ID}", "{#SENSOR_ID}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Pulses discovery | Discovery for Aranet Cloud Pulses sensors |
Dependent item | aranet.pulses.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAY_NAME}] {#SENSOR_NAME} | Dependent item | aranet.pulses["{#GATEWAY_ID}", "{#SENSOR_ID}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Pulses Cumulative discovery | Discovery for Aranet Cloud Pulses Cumulative sensors |
Dependent item | aranet.pulses_cumulative.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAY_NAME}] {#SENSOR_NAME} | Dependent item | aranet.pulses_cumulative["{#GATEWAY_ID}", "{#SENSOR_ID}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Differential Pressure discovery | Discovery for Aranet Cloud Differential Pressure sensors |
Dependent item | aranet.diff_pressure.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAY_NAME}] {#SENSOR_NAME} | Dependent item | aranet.diff_pressure["{#GATEWAY_ID}", "{#SENSOR_ID}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Last update discovery | Discovery for Aranet Cloud Last update metric |
Dependent item | aranet.last_update.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAY_NAME}] {#SENSOR_NAME} | Dependent item | aranet.last_update["{#GATEWAY_ID}", "{#SENSOR_ID}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Aranet: {#METRIC}: Sensor data "[{#GATEWAY_NAME}] {#SENSOR_NAME}" is not updated | last(/Aranet Cloud/aranet.last_update["{#GATEWAY_ID}", "{#SENSOR_ID}"]) > {$ARANET.LAST_UPDATE.MAX.WARN:"{#SENSOR_NAME}"} |Warning |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for the effortless deployment of Apache monitoring by Zabbix via HTTP and doesn't require any external scripts.
The template collects metrics by polling mod_status
with HTTP agent remotely:
127.0.0.1
ServerVersion: Apache/2.4.41 (Unix)
ServerMPM: event
Server Built: Aug 14 2019 00:35:10
CurrentTime: Friday, 16-Aug-2019 12:38:40 UTC
RestartTime: Wednesday, 14-Aug-2019 07:58:26 UTC
ParentServerConfigGeneration: 1
ParentServerMPMGeneration: 0
ServerUptimeSeconds: 189613
ServerUptime: 2 days 4 hours 40 minutes 13 seconds
Load1: 4.60
Load5: 1.20
Load15: 0.47
Total Accesses: 27860
Total kBytes: 33011
Total Duration: 54118
CPUUser: 18.02
CPUSystem: 31.76
CPUChildrenUser: 0
CPUChildrenSystem: 0
CPULoad: .0262535
Uptime: 189613
ReqPerSec: .146931
BytesPerSec: 178.275
BytesPerReq: 1213.33
DurationPerReq: 1.9425
BusyWorkers: 7
IdleWorkers: 93
Processes: 4
Stopping: 0
BusyWorkers: 7
IdleWorkers: 93
ConnsTotal: 13
ConnsAsyncWriting: 0
ConnsAsyncKeepAlive: 5
ConnsAsyncClosing: 0
Scoreboard: ...
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
See the setup instructions for mod_status.
Check the availability of the module with this command line:
httpd -M 2>/dev/null | grep status_module
This is an example configuration of the Apache web server:
<Location "/server-status">
SetHandler server-status
Require host example.com
</Location>
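Once mod_status is reachable, the machine-readable page can be checked manually before linking the template. Below is a minimal sketch in Python; it assumes the status page is served at http://127.0.0.1:80/server-status?auto (the default macro values), so adjust the scheme, host, port, and path to match your setup:

import urllib.request

# Build the URL from the same pieces the template macros use.
scheme, host, port, path = "http", "127.0.0.1", 80, "server-status?auto"
url = f"{scheme}://{host}:{port}/{path}"

with urllib.request.urlopen(url, timeout=5) as resp:
    body = resp.read().decode()

# The ?auto page is a list of "Key: value" lines; parse it into a dict.
status = {}
for line in body.splitlines():
    key, sep, value = line.partition(":")
    if sep:
        status[key.strip()] = value.strip()

print(status.get("ServerVersion"), status.get("BusyWorkers"), status.get("IdleWorkers"))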
Set the hostname or IP address of the Apache status page host in the {$APACHE.STATUS.HOST}
macro. You can also change the status page port in the {$APACHE.STATUS.PORT}
macro and status page path in the {$APACHE.STATUS.PATH}
macro if necessary.
Name | Description | Default |
---|---|---|
{$APACHE.STATUS.HOST} | The hostname or IP address of the Apache status page host. |
<SET APACHE HOST> |
{$APACHE.STATUS.PORT} | The port of the Apache status page. |
80 |
{$APACHE.STATUS.PATH} | The URL path. |
server-status?auto |
{$APACHE.STATUS.SCHEME} | The request scheme, which may be either HTTP or HTTPS. |
http |
{$APACHE.RESPONSE_TIME.MAX.WARN} | The maximum Apache response time expressed in seconds for a trigger expression. |
10 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Get status | Getting data from a machine-readable version of the Apache status page. For more information see Apache Module mod_status. |
HTTP agent | apache.get_status Preprocessing
|
Service ping | Simple check | net.tcp.service[http,"{$APACHE.STATUS.HOST}","{$APACHE.STATUS.PORT}"] Preprocessing
|
|
Service response time | Simple check | net.tcp.service.perf[http,"{$APACHE.STATUS.HOST}","{$APACHE.STATUS.PORT}"] | |
Total bytes | The total bytes served. |
Dependent item | apache.bytes Preprocessing
|
Bytes per second | It is calculated as a rate of change for total bytes statistics.
|
Dependent item | apache.bytes.rate Preprocessing
|
Requests per second | It is calculated as a rate of change for the "Total requests" statistics.
|
Dependent item | apache.requests.rate Preprocessing
|
Total requests | The total number of the Apache server accesses. |
Dependent item | apache.requests Preprocessing
|
Uptime | The service uptime expressed in seconds. |
Dependent item | apache.uptime Preprocessing
|
Version | The Apache service version. |
Dependent item | apache.version Preprocessing
|
Total workers busy | The total number of busy worker threads/processes. |
Dependent item | apache.workers_total.busy Preprocessing
|
Total workers idle | The total number of idle worker threads/processes. |
Dependent item | apache.workers_total.idle Preprocessing
|
Workers closing connection | The number of workers in closing state. |
Dependent item | apache.workers.closing Preprocessing
|
Workers DNS lookup | The number of workers in dnslookup state. |
Dependent item | apache.workers.dnslookup Preprocessing
|
Workers finishing | The number of workers in finishing state. |
Dependent item | apache.workers.finishing Preprocessing
|
Workers idle cleanup | The number of workers in cleanup state. |
Dependent item | apache.workers.cleanup Preprocessing
|
Workers keepalive (read) | The number of workers in keepalive state. |
Dependent item | apache.workers.keepalive Preprocessing
|
Workers logging | The number of workers in logging state. |
Dependent item | apache.workers.logging Preprocessing
|
Workers reading request | The number of workers in reading state. |
Dependent item | apache.workers.reading Preprocessing
|
Workers sending reply | The number of workers in sending state. |
Dependent item | apache.workers.sending Preprocessing
|
Workers slot with no current process | The number of slots with no current process. |
Dependent item | apache.workers.slot Preprocessing
|
Workers starting up | The number of workers in starting state. |
Dependent item | apache.workers.starting Preprocessing
|
Workers waiting for connection | The number of workers in waiting state. |
Dependent item | apache.workers.waiting Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Apache: Failed to fetch status page | Zabbix has not received any data for items for the last 30 minutes. |
nodata(/Apache by HTTP/apache.get_status,30m)=1 |Warning |
Manual close: Yes Depends on:
|
|
Apache: Service is down | last(/Apache by HTTP/net.tcp.service[http,"{$APACHE.STATUS.HOST}","{$APACHE.STATUS.PORT}"])=0 |Average |
Manual close: Yes | ||
Apache: Service response time is too high | min(/Apache by HTTP/net.tcp.service.perf[http,"{$APACHE.STATUS.HOST}","{$APACHE.STATUS.PORT}"],5m)>{$APACHE.RESPONSE_TIME.MAX.WARN} |Warning |
Manual close: Yes Depends on:
|
||
Apache: Service has been restarted | Uptime is less than 10 minutes. |
last(/Apache by HTTP/apache.uptime)<10m |Info |
Manual close: Yes | |
Apache: Version has changed | Apache version has changed. Acknowledge to close the problem manually. |
last(/Apache by HTTP/apache.version,#1)<>last(/Apache by HTTP/apache.version,#2) and length(last(/Apache by HTTP/apache.version))>0 |Info |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Event MPM discovery | The discovery of additional metrics if the event Multi-Processing Module (MPM) is used. For more details see Apache MPM event. |
Dependent item | apache.mpm.event.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Connections async closing | The number of asynchronous connections in closing state (applicable only to the event MPM). |
Dependent item | apache.connections[async_closing{#SINGLETON}] Preprocessing
|
Connections async keepalive | The number of asynchronous connections in keepalive state (applicable only to the event MPM). |
Dependent item | apache.connections[async_keepalive{#SINGLETON}] Preprocessing
|
Connections async writing | The number of asynchronous connections in writing state (applicable only to the event MPM). |
Dependent item | apache.connections[async_writing{#SINGLETON}] Preprocessing
|
Connections total | The number of total connections. |
Dependent item | apache.connections[total{#SINGLETON}] Preprocessing
|
Bytes per request | The average number of bytes served per request. |
Dependent item | apache.bytes[per_request{#SINGLETON}] Preprocessing
|
Number of async processes | The number of asynchronous processes. |
Dependent item | apache.process[num{#SINGLETON}] Preprocessing
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for the effortless deployment of Apache monitoring by Zabbix via Zabbix agent and doesn't require any external scripts.
The template Apache by Zabbix agent active
- collects metrics by polling mod_status locally with Zabbix agent:
127.0.0.1
ServerVersion: Apache/2.4.41 (Unix)
ServerMPM: event
Server Built: Aug 14 2019 00:35:10
CurrentTime: Friday, 16-Aug-2019 12:38:40 UTC
RestartTime: Wednesday, 14-Aug-2019 07:58:26 UTC
ParentServerConfigGeneration: 1
ParentServerMPMGeneration: 0
ServerUptimeSeconds: 189613
ServerUptime: 2 days 4 hours 40 minutes 13 seconds
Load1: 4.60
Load5: 1.20
Load15: 0.47
Total Accesses: 27860
Total kBytes: 33011
Total Duration: 54118
CPUUser: 18.02
CPUSystem: 31.76
CPUChildrenUser: 0
CPUChildrenSystem: 0
CPULoad: .0262535
Uptime: 189613
ReqPerSec: .146931
BytesPerSec: 178.275
BytesPerReq: 1213.33
DurationPerReq: 1.9425
BusyWorkers: 7
IdleWorkers: 93
Processes: 4
Stopping: 0
BusyWorkers: 7
IdleWorkers: 93
ConnsTotal: 13
ConnsAsyncWriting: 0
ConnsAsyncKeepAlive: 5
ConnsAsyncClosing: 0
Scoreboard: ...
It also uses Zabbix agent to collect Apache
Linux process statistics such as CPU usage, memory usage, and whether the process is running or not.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
See the setup instructions for mod_status.
Check the availability of the module with this command line: httpd -M 2>/dev/null | grep status_module
This is an example configuration of the Apache web server:
<Location "/server-status">
SetHandler server-status
Require host example.com
</Location>
If you use another path, then do not forget to change the {$APACHE.STATUS.PATH}
macro.
Install and set up Zabbix agent.
Name | Description | Default |
---|---|---|
{$APACHE.STATUS.HOST} | The hostname or IP address of the Apache status page. |
127.0.0.1 |
{$APACHE.STATUS.PORT} | The port of the Apache status page. |
80 |
{$APACHE.STATUS.PATH} | The URL path. |
server-status?auto |
{$APACHE.STATUS.SCHEME} | The request scheme, which may be either HTTP or HTTPS. |
http |
{$APACHE.RESPONSE_TIME.MAX.WARN} | The maximum Apache response time expressed in seconds for a trigger expression. |
10 |
{$APACHE.PROCESS_NAME} | The process name filter for the Apache process discovery. |
(httpd|apache2) |
{$APACHE.PROCESS.NAME.PARAMETER} | The process name of the Apache web server used in the item key proc.get. |
Name | Description | Type | Key and additional info |
---|---|---|---|
Get status | Getting data from a machine-readable version of the Apache status page. For more information see Apache Module mod_status. |
Zabbix agent (active) | web.page.get["{$APACHE.STATUS.SCHEME}://{$APACHE.STATUS.HOST}:{$APACHE.STATUS.PORT}/{$APACHE.STATUS.PATH}"] Preprocessing
|
Service ping | Zabbix agent (active) | net.tcp.service[http,"{$APACHE.STATUS.HOST}","{$APACHE.STATUS.PORT}"] Preprocessing
|
|
Service response time | Zabbix agent (active) | net.tcp.service.perf[http,"{$APACHE.STATUS.HOST}","{$APACHE.STATUS.PORT}"] | |
Total bytes | The total bytes served. |
Dependent item | apache.bytes Preprocessing
|
Bytes per second | It is calculated as a rate of change for total bytes statistics.
|
Dependent item | apache.bytes.rate Preprocessing
|
Requests per second | It is calculated as a rate of change for the "Total requests" statistics.
|
Dependent item | apache.requests.rate Preprocessing
|
Total requests | The total number of the Apache server accesses. |
Dependent item | apache.requests Preprocessing
|
Uptime | The service uptime expressed in seconds. |
Dependent item | apache.uptime Preprocessing
|
Version | The Apache service version. |
Dependent item | apache.version Preprocessing
|
Total workers busy | The total number of busy worker threads/processes. |
Dependent item | apache.workers_total.busy Preprocessing
|
Total workers idle | The total number of idle worker threads/processes. |
Dependent item | apache.workers_total.idle Preprocessing
|
Workers closing connection | The number of workers in closing state. |
Dependent item | apache.workers.closing Preprocessing
|
Workers DNS lookup | The number of workers in dnslookup state. |
Dependent item | apache.workers.dnslookup Preprocessing
|
Workers finishing | The number of workers in finishing state. |
Dependent item | apache.workers.finishing Preprocessing
|
Workers idle cleanup | The number of workers in cleanup state. |
Dependent item | apache.workers.cleanup Preprocessing
|
Workers keepalive (read) | The number of workers in keepalive state. |
Dependent item | apache.workers.keepalive Preprocessing
|
Workers logging | The number of workers in logging state. |
Dependent item | apache.workers.logging Preprocessing
|
Workers reading request | The number of workers in reading state. |
Dependent item | apache.workers.reading Preprocessing
|
Workers sending reply | The number of workers in sending state. |
Dependent item | apache.workers.sending Preprocessing
|
Workers slot with no current process | The number of slots with no current process. |
Dependent item | apache.workers.slot Preprocessing
|
Workers starting up | The number of workers in starting state. |
Dependent item | apache.workers.starting Preprocessing
|
Workers waiting for connection | The number of workers in waiting state. |
Dependent item | apache.workers.waiting Preprocessing
|
Get processes summary | The aggregated data of summary metrics for all processes. |
Zabbix agent (active) | proc.get[{$APACHE.PROCESS.NAME.PARAMETER},,,summary] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Apache: Service has been restarted | Uptime is less than 10 minutes. |
last(/Apache by Zabbix agent active/apache.uptime)<10m |Info |
Manual close: Yes | |
Apache: Version has changed | Apache version has changed. Acknowledge to close the problem manually. |
last(/Apache by Zabbix agent active/apache.version,#1)<>last(/Apache by Zabbix agent active/apache.version,#2) and length(last(/Apache by Zabbix agent active/apache.version))>0 |Info |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Event MPM discovery | The discovery of additional metrics if the event Multi-Processing Module (MPM) is used. For more details see Apache MPM event. |
Dependent item | apache.mpm.event.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Connections async closing | The number of asynchronous connections in closing state (applicable only to the event MPM). |
Dependent item | apache.connections[async_closing{#SINGLETON}] Preprocessing
|
Connections async keepalive | The number of asynchronous connections in keepalive state (applicable only to the event MPM). |
Dependent item | apache.connections[async_keepalive{#SINGLETON}] Preprocessing
|
Connections async writing | The number of asynchronous connections in writing state (applicable only to the event MPM). |
Dependent item | apache.connections[async_writing{#SINGLETON}] Preprocessing
|
Connections total | The number of total connections. |
Dependent item | apache.connections[total{#SINGLETON}] Preprocessing
|
Bytes per request | The average number of bytes served per request. |
Dependent item | apache.bytes[per_request{#SINGLETON}] Preprocessing
|
Number of async processes | The number of asynchronous processes. |
Dependent item | apache.process[num{#SINGLETON}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Apache process discovery | The discovery of the Apache process summary. |
Dependent item | apache.proc.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
CPU utilization | The percentage of the CPU utilization by a process {#APACHE.NAME}. |
Zabbix agent (active) | proc.cpu.util[{#APACHE.NAME}] |
Get process data | The summary metrics aggregated by a process {#APACHE.NAME}. |
Dependent item | apache.proc.get[{#APACHE.NAME}] Preprocessing
|
Memory usage (rss) | The summary of resident set size memory used by a process {#APACHE.NAME} expressed in bytes. |
Dependent item | apache.proc.rss[{#APACHE.NAME}] Preprocessing
|
Memory usage (vsize) | The summary of virtual memory used by a process {#APACHE.NAME} expressed in bytes. |
Dependent item | apache.proc.vmem[{#APACHE.NAME}] Preprocessing
|
Memory usage, % | The percentage of real memory used by a process {#APACHE.NAME}. |
Dependent item | apache.proc.pmem[{#APACHE.NAME}] Preprocessing
|
Number of running processes | The number of running processes {#APACHE.NAME}. |
Dependent item | apache.proc.num[{#APACHE.NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Apache: Process is not running | last(/Apache by Zabbix agent active/apache.proc.num[{#APACHE.NAME}])=0 |High |
|||
Apache: Service is down | last(/Apache by Zabbix agent active/net.tcp.service[http,"{$APACHE.STATUS.HOST}","{$APACHE.STATUS.PORT}"])=0 and last(/Apache by Zabbix agent active/apache.proc.num[{#APACHE.NAME}])>0 |Average |
Manual close: Yes | ||
Apache: Failed to fetch status page | Zabbix has not received any data for items for the last 30 minutes. |
nodata(/Apache by Zabbix agent active/web.page.get["{$APACHE.STATUS.SCHEME}://{$APACHE.STATUS.HOST}:{$APACHE.STATUS.PORT}/{$APACHE.STATUS.PATH}"],30m)=1 and last(/Apache by Zabbix agent active/apache.proc.num[{#APACHE.NAME}])>0 |Warning |
Manual close: Yes Depends on:
|
|
Apache: Service response time is too high | min(/Apache by Zabbix agent active/net.tcp.service.perf[http,"{$APACHE.STATUS.HOST}","{$APACHE.STATUS.PORT}"],5m)>{$APACHE.RESPONSE_TIME.MAX.WARN} and last(/Apache by Zabbix agent active/apache.proc.num[{#APACHE.NAME}])>0 |Warning |
Manual close: Yes Depends on:
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for the effortless deployment of Apache monitoring by Zabbix via Zabbix agent and doesn't require any external scripts.
The template Apache by Zabbix agent
- collects metrics by polling mod_status locally with Zabbix agent:
127.0.0.1
ServerVersion: Apache/2.4.41 (Unix)
ServerMPM: event
Server Built: Aug 14 2019 00:35:10
CurrentTime: Friday, 16-Aug-2019 12:38:40 UTC
RestartTime: Wednesday, 14-Aug-2019 07:58:26 UTC
ParentServerConfigGeneration: 1
ParentServerMPMGeneration: 0
ServerUptimeSeconds: 189613
ServerUptime: 2 days 4 hours 40 minutes 13 seconds
Load1: 4.60
Load5: 1.20
Load15: 0.47
Total Accesses: 27860
Total kBytes: 33011
Total Duration: 54118
CPUUser: 18.02
CPUSystem: 31.76
CPUChildrenUser: 0
CPUChildrenSystem: 0
CPULoad: .0262535
Uptime: 189613
ReqPerSec: .146931
BytesPerSec: 178.275
BytesPerReq: 1213.33
DurationPerReq: 1.9425
BusyWorkers: 7
IdleWorkers: 93
Processes: 4
Stopping: 0
BusyWorkers: 7
IdleWorkers: 93
ConnsTotal: 13
ConnsAsyncWriting: 0
ConnsAsyncKeepAlive: 5
ConnsAsyncClosing: 0
Scoreboard: ...
It also uses Zabbix agent to collect Apache
Linux process statistics such as CPU usage, memory usage, and whether the process is running or not.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
See the setup instructions for mod_status.
Check the availability of the module with this command line: httpd -M 2>/dev/null | grep status_module
This is an example configuration of the Apache web server:
<Location "/server-status">
SetHandler server-status
Require host example.com
</Location>
If you use another path, then do not forget to change the {$APACHE.STATUS.PATH}
macro.
Install and set up Zabbix agent.
Name | Description | Default |
---|---|---|
{$APACHE.STATUS.HOST} | The hostname or IP address of the Apache status page. |
127.0.0.1 |
{$APACHE.STATUS.PORT} | The port of the Apache status page. |
80 |
{$APACHE.STATUS.PATH} | The URL path. |
server-status?auto |
{$APACHE.STATUS.SCHEME} | The request scheme, which may be either HTTP or HTTPS. |
http |
{$APACHE.RESPONSE_TIME.MAX.WARN} | The maximum Apache response time expressed in seconds for a trigger expression. |
10 |
{$APACHE.PROCESS_NAME} | The process name filter for the Apache process discovery. |
(httpd|apache2) |
{$APACHE.PROCESS.NAME.PARAMETER} | The process name of the Apache web server used in the item key proc.get. |
Name | Description | Type | Key and additional info |
---|---|---|---|
Get status | Getting data from a machine-readable version of the Apache status page. For more information see Apache Module mod_status. |
Zabbix agent | web.page.get["{$APACHE.STATUS.SCHEME}://{$APACHE.STATUS.HOST}:{$APACHE.STATUS.PORT}/{$APACHE.STATUS.PATH}"] Preprocessing
|
Service ping | Zabbix agent | net.tcp.service[http,"{$APACHE.STATUS.HOST}","{$APACHE.STATUS.PORT}"] Preprocessing
|
|
Service response time | Zabbix agent | net.tcp.service.perf[http,"{$APACHE.STATUS.HOST}","{$APACHE.STATUS.PORT}"] | |
Total bytes | The total bytes served. |
Dependent item | apache.bytes Preprocessing
|
Bytes per second | It is calculated as a rate of change for total bytes statistics.
|
Dependent item | apache.bytes.rate Preprocessing
|
Requests per second | It is calculated as a rate of change for the "Total requests" statistics.
|
Dependent item | apache.requests.rate Preprocessing
|
Total requests | The total number of the Apache server accesses. |
Dependent item | apache.requests Preprocessing
|
Uptime | The service uptime expressed in seconds. |
Dependent item | apache.uptime Preprocessing
|
Version | The Apache service version. |
Dependent item | apache.version Preprocessing
|
Total workers busy | The total number of busy worker threads/processes. |
Dependent item | apache.workers_total.busy Preprocessing
|
Total workers idle | The total number of idle worker threads/processes. |
Dependent item | apache.workers_total.idle Preprocessing
|
Workers closing connection | The number of workers in closing state. |
Dependent item | apache.workers.closing Preprocessing
|
Workers DNS lookup | The number of workers in dnslookup state. |
Dependent item | apache.workers.dnslookup Preprocessing
|
Workers finishing | The number of workers in finishing state. |
Dependent item | apache.workers.finishing Preprocessing
|
Workers idle cleanup | The number of workers in cleanup state. |
Dependent item | apache.workers.cleanup Preprocessing
|
Workers keepalive (read) | The number of workers in keepalive state. |
Dependent item | apache.workers.keepalive Preprocessing
|
Workers logging | The number of workers in logging state. |
Dependent item | apache.workers.logging Preprocessing
|
Workers reading request | The number of workers in reading state. |
Dependent item | apache.workers.reading Preprocessing
|
Workers sending reply | The number of workers in sending state. |
Dependent item | apache.workers.sending Preprocessing
|
Workers slot with no current process | The number of slots with no current process. |
Dependent item | apache.workers.slot Preprocessing
|
Workers starting up | The number of workers in starting state. |
Dependent item | apache.workers.starting Preprocessing
|
Workers waiting for connection | The number of workers in waiting state. |
Dependent item | apache.workers.waiting Preprocessing
|
Get processes summary | The aggregated data of summary metrics for all processes. |
Zabbix agent | proc.get[{$APACHE.PROCESS.NAME.PARAMETER},,,summary] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Apache: Service has been restarted | Uptime is less than 10 minutes. |
last(/Apache by Zabbix agent/apache.uptime)<10m |Info |
Manual close: Yes | |
Apache: Version has changed | Apache version has changed. Acknowledge to close the problem manually. |
last(/Apache by Zabbix agent/apache.version,#1)<>last(/Apache by Zabbix agent/apache.version,#2) and length(last(/Apache by Zabbix agent/apache.version))>0 |Info |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Event MPM discovery | The discovery of additional metrics if the event Multi-Processing Module (MPM) is used. For more details see Apache MPM event. |
Dependent item | apache.mpm.event.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Connections async closing | The number of asynchronous connections in closing state (applicable only to the event MPM). |
Dependent item | apache.connections[async_closing{#SINGLETON}] Preprocessing
|
Connections async keepalive | The number of asynchronous connections in keepalive state (applicable only to the event MPM). |
Dependent item | apache.connections[async_keepalive{#SINGLETON}] Preprocessing
|
Connections async writing | The number of asynchronous connections in writing state (applicable only to the event MPM). |
Dependent item | apache.connections[async_writing{#SINGLETON}] Preprocessing
|
Connections total | The number of total connections. |
Dependent item | apache.connections[total{#SINGLETON}] Preprocessing
|
Bytes per request | The average number of bytes served per request. |
Dependent item | apache.bytes[per_request{#SINGLETON}] Preprocessing
|
Number of async processes | The number of asynchronous processes. |
Dependent item | apache.process[num{#SINGLETON}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Apache process discovery | The discovery of the Apache process summary. |
Dependent item | apache.proc.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
CPU utilization | The percentage of the CPU utilization by a process {#APACHE.NAME}. |
Zabbix agent | proc.cpu.util[{#APACHE.NAME}] |
Get process data | The summary metrics aggregated by a process {#APACHE.NAME}. |
Dependent item | apache.proc.get[{#APACHE.NAME}] Preprocessing
|
Memory usage (rss) | The summary of resident set size memory used by a process {#APACHE.NAME} expressed in bytes. |
Dependent item | apache.proc.rss[{#APACHE.NAME}] Preprocessing
|
Memory usage (vsize) | The summary of virtual memory used by a process {#APACHE.NAME} expressed in bytes. |
Dependent item | apache.proc.vmem[{#APACHE.NAME}] Preprocessing
|
Memory usage, % | The percentage of real memory used by a process {#APACHE.NAME}. |
Dependent item | apache.proc.pmem[{#APACHE.NAME}] Preprocessing
|
Number of running processes | The number of running processes {#APACHE.NAME}. |
Dependent item | apache.proc.num[{#APACHE.NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Apache: Process is not running | last(/Apache by Zabbix agent/apache.proc.num[{#APACHE.NAME}])=0 |High |
|||
Apache: Service is down | last(/Apache by Zabbix agent/net.tcp.service[http,"{$APACHE.STATUS.HOST}","{$APACHE.STATUS.PORT}"])=0 and last(/Apache by Zabbix agent/apache.proc.num[{#APACHE.NAME}])>0 |Average |
Manual close: Yes | ||
Apache: Failed to fetch status page | Zabbix has not received any data for items for the last 30 minutes. |
nodata(/Apache by Zabbix agent/web.page.get["{$APACHE.STATUS.SCHEME}://{$APACHE.STATUS.HOST}:{$APACHE.STATUS.PORT}/{$APACHE.STATUS.PATH}"],30m)=1 and last(/Apache by Zabbix agent/apache.proc.num[{#APACHE.NAME}])>0 |Warning |
Manual close: Yes Depends on:
|
|
Apache: Service response time is too high | min(/Apache by Zabbix agent/net.tcp.service.perf[http,"{$APACHE.STATUS.HOST}","{$APACHE.STATUS.PORT}"],5m)>{$APACHE.RESPONSE_TIME.MAX.WARN} and last(/Apache by Zabbix agent/apache.proc.num[{#APACHE.NAME}])>0 |Warning |
Manual close: Yes Depends on:
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for the effortless deployment of Apache ActiveMQ monitoring by Zabbix via JMX and doesn't require any external scripts.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Metrics are collected by JMX.
Name | Description | Default |
---|---|---|
{$ACTIVEMQ.USER} | User for JMX |
admin |
{$ACTIVEMQ.PASSWORD} | Password for JMX |
activemq |
{$ACTIVEMQ.PORT} | Port for JMX |
1099 |
{$ACTIVEMQ.LLD.FILTER.BROKER.MATCHES} | Filter to include discovered brokers |
.* |
{$ACTIVEMQ.LLD.FILTER.BROKER.NOT_MATCHES} | Filter to exclude discovered brokers |
CHANGE IF NEEDED |
{$ACTIVEMQ.LLD.FILTER.DESTINATION.MATCHES} | Filter to include discovered destinations |
.* |
{$ACTIVEMQ.LLD.FILTER.DESTINATION.NOT_MATCHES} | Filter to exclude discovered destinations |
CHANGE IF NEEDED |
{$ACTIVEMQ.MSG.RATE.WARN.TIME} | The time for message enqueue/dequeue rate. Can be used with destination or broker name as context. |
15m |
{$ACTIVEMQ.MEM.MAX.WARN} | Memory threshold for AVERAGE trigger. Can be used with destination or broker name as context. |
75 |
{$ACTIVEMQ.MEM.MAX.HIGH} | Memory threshold for HIGH trigger. Can be used with destination or broker name as context. |
90 |
{$ACTIVEMQ.MEM.TIME} | Time during which the metric can be above the threshold. Can be used with destination or broker name as context. |
5m |
{$ACTIVEMQ.STORE.MAX.WARN} | Storage threshold for AVERAGE trigger. Can be used with broker name as context. |
75 |
{$ACTIVEMQ.STORE.TIME} | Time during which the metric can be above the threshold. Can be used with destination or broker name as context. |
5m |
{$ACTIVEMQ.STORE.MAX.HIGH} | Storage threshold for HIGH trigger. Can be used with broker name as context. |
90 |
{$ACTIVEMQ.TEMP.MAX.WARN} | Temp threshold for AVERAGE trigger. Can be used with broker name as context. |
75 |
{$ACTIVEMQ.TEMP.MAX.HIGH} | Temp threshold for HIGH trigger. Can be used with broker name as context. |
90 |
{$ACTIVEMQ.TEMP.TIME} | Time during which the metric can be above the threshold. Can be used with destination or broker name as context. |
5m |
{$ACTIVEMQ.DESTINATION.CONSUMERS.MIN.TIME} | Time during which there may be no consumers in destination. Can be used with destination name as context. |
10m |
{$ACTIVEMQ.DESTINATION.CONSUMERS.MIN.HIGH} | Minimum amount of consumers for destination. Can be used with destination name as context. |
1 |
{$ACTIVEMQ.DESTINATION.PRODUCERS.MIN.TIME} | Time during which there may be no producers on destination. Can be used with destination name as context. |
10m |
{$ACTIVEMQ.DESTINATION.PRODUCERS.MIN.HIGH} | Minimum amount of producers for destination. Can be used with destination name as context. |
1 |
{$ACTIVEMQ.BROKER.CONSUMERS.MIN.TIME} | Time during which there may be no consumers on destination. Can be used with broker name as context. |
5m |
{$ACTIVEMQ.BROKER.CONSUMERS.MIN.HIGH} | Minimum amount of consumers for broker. Can be used with broker name as context. |
1 |
{$ACTIVEMQ.BROKER.PRODUCERS.MIN.TIME} | Time during which there may be no producers on broker. Can be used with broker name as context. |
5m |
{$ACTIVEMQ.BROKER.PRODUCERS.MIN.HIGH} | Minimum amount of producers for broker. Can be used with broker name as context. |
1 |
{$ACTIVEMQ.TOTAL.CONSUMERS.COUNT} | Attribute for TotalConsumerCount per destination. Used to suppress destination's triggers when the count of consumers on the broker is lower than threshold. |
TotalConsumerCount |
{$ACTIVEMQ.TOTAL.PRODUCERS.COUNT} | Attribute for TotalProducerCount per destination. Used to suppress destination's triggers when the count of producers on the broker is lower than threshold. |
TotalProducerCount |
{$ACTIVEMQ.QUEUE.TIME} | Time during which the QueueSize can be higher than threshold. Can be used with destination name as context. |
10m |
{$ACTIVEMQ.QUEUE.WARN} | Threshold for QueueSize. Can be used with destination name as context. |
100 |
{$ACTIVEMQ.QUEUE.ENABLED} | Use this to disable alerting for specific destination. 1 = enabled, 0 = disabled. Can be used with destination name as context. |
1 |
{$ACTIVEMQ.EXPIRED.WARN} | Threshold for expired messages count. Can be used with destination name as context. |
0 |
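The consumer/producer suppression described above works by combining a destination-level check with a broker-level check in the same trigger: the destination alert only fires while the broker as a whole still has enough consumers or producers. Below is a minimal sketch of that logic with hypothetical values; the actual evaluation is done by the trigger expressions listed later in this section:

# Hypothetical illustration of destination-trigger suppression.
destination_consumers = 0       # jmx[{#JMXOBJ},ConsumerCount]
destination_min = 1             # {$ACTIVEMQ.DESTINATION.CONSUMERS.MIN.HIGH}
broker_total_consumers = 5      # TotalConsumerCount on the broker
broker_min = 1                  # {$ACTIVEMQ.BROKER.CONSUMERS.MIN.HIGH}

# The destination alert fires only if the broker-level count is still above its threshold;
# otherwise the broker-level trigger is expected to fire instead.
fire_destination_alert = destination_consumers < destination_min and broker_total_consumers > broker_min
print(fire_destination_alert)   # True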
Name | Description | Type | Key and additional info |
---|---|---|---|
Brokers discovery | Discovery of brokers |
JMX agent | jmx.discovery[beans,"org.apache.activemq:type=Broker,brokerName=*"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Broker {#JMXBROKERNAME}: Version | The version of the broker. |
JMX agent | jmx[{#JMXOBJ},BrokerVersion] Preprocessing
|
Broker {#JMXBROKERNAME}: Uptime | The uptime of the broker. |
JMX agent | jmx[{#JMXOBJ},UptimeMillis] Preprocessing
|
Broker {#JMXBROKERNAME}: Memory limit | Memory limit, in bytes, used for holding undelivered messages before paging to temporary storage. |
JMX agent | jmx[{#JMXOBJ},MemoryLimit] Preprocessing
|
Broker {#JMXBROKERNAME}: Memory usage in percents | Percent of memory limit used. |
JMX agent | jmx[{#JMXOBJ}, MemoryPercentUsage] |
Broker {#JMXBROKERNAME}: Storage limit | Disk limit, in bytes, used for persistent messages before producers are blocked. |
JMX agent | jmx[{#JMXOBJ},StoreLimit] Preprocessing
|
Broker {#JMXBROKERNAME}: Storage usage in percents | Percent of store limit used. |
JMX agent | jmx[{#JMXOBJ},StorePercentUsage] |
Broker {#JMXBROKERNAME}: Temp limit | Disk limit, in bytes, used for non-persistent messages and temporary data before producers are blocked. |
JMX agent | jmx[{#JMXOBJ},TempLimit] Preprocessing
|
Broker {#JMXBROKERNAME}: Temp usage in percents | Percent of temp limit used. |
JMX agent | jmx[{#JMXOBJ},TempPercentUsage] |
Broker {#JMXBROKERNAME}: Messages enqueue rate | Rate of messages that have been sent to the broker. |
JMX agent | jmx[{#JMXOBJ},TotalEnqueueCount] Preprocessing
|
Broker {#JMXBROKERNAME}: Messages dequeue rate | Rate of messages that have been delivered by the broker and acknowledged by consumers. |
JMX agent | jmx[{#JMXOBJ},TotalDequeueCount] Preprocessing
|
Broker {#JMXBROKERNAME}: Consumers count total | Number of consumers attached to this broker. |
JMX agent | jmx[{#JMXOBJ},TotalConsumerCount] |
Broker {#JMXBROKERNAME}: Producers count total | Number of producers attached to this broker. |
JMX agent | jmx[{#JMXOBJ},TotalProducerCount] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Apache ActiveMQ: Broker {#JMXBROKERNAME}: Version has been changed | The Broker {#JMXBROKERNAME} version has changed. Acknowledge to close the problem manually. |
last(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},BrokerVersion],#1)<>last(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},BrokerVersion],#2) and length(last(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},BrokerVersion]))>0 |Info |
Manual close: Yes | |
Apache ActiveMQ: Broker {#JMXBROKERNAME}: Broker has been restarted | Uptime is less than 10 minutes. |
last(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},UptimeMillis])<10m |Info |
Manual close: Yes | |
Apache ActiveMQ: Broker {#JMXBROKERNAME}: Memory usage is too high | min(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ}, MemoryPercentUsage],{$ACTIVEMQ.MEM.TIME:"{#JMXBROKERNAME}"})>{$ACTIVEMQ.MEM.MAX.WARN:"{#JMXBROKERNAME}"} |Average |
Depends on:
|
||
Apache ActiveMQ: Broker {#JMXBROKERNAME}: Memory usage is too high | min(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ}, MemoryPercentUsage],{$ACTIVEMQ.MEM.TIME:"{#JMXBROKERNAME}"})>{$ACTIVEMQ.MEM.MAX.HIGH:"{#JMXBROKERNAME}"} |High |
|||
Apache ActiveMQ: Broker {#JMXBROKERNAME}: Storage usage is too high | min(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},StorePercentUsage],{$ACTIVEMQ.STORE.TIME:"{#JMXBROKERNAME}"})>{$ACTIVEMQ.STORE.MAX.WARN:"{#JMXBROKERNAME}"} |Average |
Depends on:
|
||
Apache ActiveMQ: Broker {#JMXBROKERNAME}: Storage usage is too high | min(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},StorePercentUsage],{$ACTIVEMQ.STORE.TIME:"{#JMXBROKERNAME}"})>{$ACTIVEMQ.STORE.MAX.HIGH:"{#JMXBROKERNAME}"} |High |
|||
Apache ActiveMQ: Broker {#JMXBROKERNAME}: Temp usage is too high | min(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},TempPercentUsage],{$ACTIVEMQ.TEMP.TIME:"{#JMXBROKERNAME}"})>{$ACTIVEMQ.TEMP.MAX.WARN} |Average |
Depends on:
|
||
Apache ActiveMQ: Broker {#JMXBROKERNAME}: Temp usage is too high | min(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},TempPercentUsage],{$ACTIVEMQ.TEMP.TIME:"{#JMXBROKERNAME}"})>{$ACTIVEMQ.TEMP.MAX.HIGH} |High |
|||
Apache ActiveMQ: Broker {#JMXBROKERNAME}: Message enqueue rate is higher than dequeue rate | Enqueue rate is higher than dequeue rate. It may indicate performance problems. |
avg(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},TotalEnqueueCount],{$ACTIVEMQ.MSG.RATE.WARN.TIME:"{#JMXBROKERNAME}"})>avg(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},TotalDequeueCount],{$ACTIVEMQ.MSG.RATE.WARN.TIME:"{#JMXBROKERNAME}"}) |Average |
||
Apache ActiveMQ: Broker {#JMXBROKERNAME}: Consumers count is too low | max(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},TotalConsumerCount],{$ACTIVEMQ.BROKER.CONSUMERS.MIN.TIME:"{#JMXBROKERNAME}"})<{$ACTIVEMQ.BROKER.CONSUMERS.MIN.HIGH:"{#JMXBROKERNAME}"} |High |
|||
Apache ActiveMQ: Broker {#JMXBROKERNAME}: Producers count is too low | max(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},TotalProducerCount],{$ACTIVEMQ.BROKER.PRODUCERS.MIN.TIME:"{#JMXBROKERNAME}"})<{$ACTIVEMQ.BROKER.PRODUCERS.MIN.HIGH:"{#JMXBROKERNAME}"} |High |
Name | Description | Type | Key and additional info |
---|---|---|---|
Destinations discovery | Discovery of destinations |
JMX agent | jmx.discovery[beans,"org.apache.activemq:type=Broker,brokerName=*,destinationType=*,destinationName=*"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#JMXBROKERNAME}: {#JMXDESTINATIONTYPE} {#JMXDESTINATIONNAME}: Consumers count | Number of consumers attached to this destination. |
JMX agent | jmx[{#JMXOBJ},ConsumerCount] |
{#JMXBROKERNAME}: {#JMXDESTINATIONTYPE} {#JMXDESTINATIONNAME}: Consumers count total on {#JMXBROKERNAME} | Number of consumers attached to the broker of this destination. Used to suppress destination's triggers when the count of consumers on the broker is lower than threshold. |
JMX agent | jmx["org.apache.activemq:type=Broker,brokerName={#JMXBROKERNAME}",{$ACTIVEMQ.TOTAL.CONSUMERS.COUNT: "{#JMXDESTINATIONNAME}"}] Preprocessing
|
{#JMXBROKERNAME}: {#JMXDESTINATIONTYPE} {#JMXDESTINATIONNAME}: Producers count | Number of producers attached to this destination. |
JMX agent | jmx[{#JMXOBJ},ProducerCount] |
{#JMXBROKERNAME}: {#JMXDESTINATIONTYPE} {#JMXDESTINATIONNAME}: Producers count total on {#JMXBROKERNAME} | Number of producers attached to the broker of this destination. Used to suppress destination's triggers when the count of producers on the broker is lower than threshold. |
JMX agent | jmx["org.apache.activemq:type=Broker,brokerName={#JMXBROKERNAME}",{$ACTIVEMQ.TOTAL.PRODUCERS.COUNT: "{#JMXDESTINATIONNAME}"}] Preprocessing
|
{#JMXBROKERNAME}: {#JMXDESTINATIONTYPE} {#JMXDESTINATIONNAME}: Memory usage in percents | The percentage of the memory limit used. |
JMX agent | jmx[{#JMXOBJ},MemoryPercentUsage] |
{#JMXBROKERNAME}: {#JMXDESTINATIONTYPE} {#JMXDESTINATIONNAME}: Messages enqueue rate | Rate of messages that have been sent to the destination. |
JMX agent | jmx[{#JMXOBJ},EnqueueCount] Preprocessing
|
{#JMXBROKERNAME}: {#JMXDESTINATIONTYPE} {#JMXDESTINATIONNAME}: Messages dequeue rate | Rate of messages that have been acknowledged (and removed) from the destination. |
JMX agent | jmx[{#JMXOBJ},DequeueCount] Preprocessing
|
{#JMXBROKERNAME}: {#JMXDESTINATIONTYPE} {#JMXDESTINATIONNAME}: Queue size | Number of messages on this destination, including any that have been dispatched but not acknowledged. |
JMX agent | jmx[{#JMXOBJ},QueueSize] |
{#JMXBROKERNAME}: {#JMXDESTINATIONTYPE} {#JMXDESTINATIONNAME}: Expired messages count | Number of messages that have been expired. |
JMX agent | jmx[{#JMXOBJ},ExpiredCount] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Apache ActiveMQ: {#JMXBROKERNAME}: {#JMXDESTINATIONTYPE} {#JMXDESTINATIONNAME}: Consumers count is too low | max(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},ConsumerCount],{$ACTIVEMQ.DESTINATION.CONSUMERS.MIN.TIME:"{#JMXDESTINATIONNAME}"})<{$ACTIVEMQ.DESTINATION.CONSUMERS.MIN.HIGH:"{#JMXDESTINATIONNAME}"} and last(/Apache ActiveMQ by JMX/jmx["org.apache.activemq:type=Broker,brokerName={#JMXBROKERNAME}",{$ACTIVEMQ.TOTAL.CONSUMERS.COUNT: "{#JMXDESTINATIONNAME}"}])>{$ACTIVEMQ.BROKER.CONSUMERS.MIN.HIGH:"{#JMXBROKERNAME}"} |Average |
Manual close: Yes | ||
Apache ActiveMQ: {#JMXBROKERNAME}: {#JMXDESTINATIONTYPE} {#JMXDESTINATIONNAME}: Producers count is too low | max(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},ProducerCount],{$ACTIVEMQ.DESTINATION.PRODUCERS.MIN.TIME:"{#JMXDESTINATIONNAME}"})<{$ACTIVEMQ.DESTINATION.PRODUCERS.MIN.HIGH:"{#JMXDESTINATIONNAME}"} and last(/Apache ActiveMQ by JMX/jmx["org.apache.activemq:type=Broker,brokerName={#JMXBROKERNAME}",{$ACTIVEMQ.TOTAL.PRODUCERS.COUNT: "{#JMXDESTINATIONNAME}"}])>{$ACTIVEMQ.BROKER.PRODUCERS.MIN.HIGH:"{#JMXBROKERNAME}"} |Average |
Manual close: Yes | ||
Apache ActiveMQ: {#JMXBROKERNAME}: {#JMXDESTINATIONTYPE} {#JMXDESTINATIONNAME}: Memory usage is too high | last(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},MemoryPercentUsage])>{$ACTIVEMQ.MEM.MAX.WARN:"{#JMXDESTINATIONNAME}"} |Average |
|||
Apache ActiveMQ: {#JMXBROKERNAME}: {#JMXDESTINATIONTYPE} {#JMXDESTINATIONNAME}: Memory usage is too high | last(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},MemoryPercentUsage])>{$ACTIVEMQ.MEM.MAX.HIGH:"{#JMXDESTINATIONNAME}"} |High |
|||
Apache ActiveMQ: {#JMXBROKERNAME}: {#JMXDESTINATIONTYPE} {#JMXDESTINATIONNAME}: Message enqueue rate is higher than dequeue rate | Enqueue rate is higher than dequeue rate. It may indicate performance problems. |
avg(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},EnqueueCount],{$ACTIVEMQ.MSG.RATE.WARN.TIME:"{#JMXDESTINATIONNAME}"})>avg(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},DequeueCount],{$ACTIVEMQ.MSG.RATE.WARN.TIME:"{#JMXDESTINATIONNAME}"}) |Average |
||
Apache ActiveMQ: {#JMXBROKERNAME}: {#JMXDESTINATIONTYPE} {#JMXDESTINATIONNAME}: Queue size is high | Queue size is higher than threshold. It may indicate performance problems. |
min(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},QueueSize],{$ACTIVEMQ.QUEUE.TIME:"{#JMXDESTINATIONNAME}"})>{$ACTIVEMQ.QUEUE.WARN:"{#JMXDESTINATIONNAME}"} and {$ACTIVEMQ.QUEUE.ENABLED:"{#JMXDESTINATIONNAME}"}=1 |Average |
||
Apache ActiveMQ: {#JMXBROKERNAME}: {#JMXDESTINATIONTYPE} {#JMXDESTINATIONNAME}: Expired messages count is high | This metric represents the number of messages that expired before they could be delivered. If you expect all messages to be delivered and acknowledged within a certain amount of time, you can set an expiration for each message, and investigate if your ExpiredCount metric rises above zero. |
last(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},ExpiredCount])>{$ACTIVEMQ.EXPIRED.WARN:"{#JMXDESTINATIONNAME}"} |Average |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for the effortless deployment of Acronis Cyber Protect Cloud monitoring by Zabbix via HTTP and doesn't require any external scripts.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
This is a master template that needs to be assigned to a host, and it will automatically create MSP host prototype, which will monitor Acronis Cyber Protect Cloud metrics.
Before using this template, you must create a new MSP-level API client for Zabbix to use. To do that, sign in to your Acronis Cyber Protect Cloud web interface, navigate to Settings
-> API clients
and create a new API client.
You will be shown credentials for this API client. These credentials need to be entered in the following user macros of this template:
{$ACRONIS.CPC.AUTH.CLIENT.ID}
- enter Client ID
here;
{$ACRONIS.CPC.AUTH.SECRET}
- enter Secret
here;
{$ACRONIS.CPC.DATACENTER.URL}
- enter Data center URL here.
This is all the configuration needed for this integration.
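Before entering the credentials into the macros, you may want to verify that the API client works against your data center. The following is a minimal, illustrative sketch only; it assumes the standard OAuth2 client-credentials token endpoint under the /api/2 sub-path (the default of {$ACRONIS.CPC.PATH.ACCOUNT.MANAGEMENT}), so confirm the exact endpoint and payload against the Acronis Cyber Protect Cloud API documentation.

```python
# Minimal sketch for verifying the API client credentials, assuming the
# OAuth2 client-credentials token endpoint is {datacenter}/api/2/idp/token.
# Verify the endpoint path against the Acronis API documentation.
import requests

DATACENTER_URL = "https://eu2-cloud.acronis.com"  # value of {$ACRONIS.CPC.DATACENTER.URL}
CLIENT_ID = "<client id>"                         # value of {$ACRONIS.CPC.AUTH.CLIENT.ID}
CLIENT_SECRET = "<secret>"                        # value of {$ACRONIS.CPC.AUTH.SECRET}

response = requests.post(
    f"{DATACENTER_URL}/api/2/idp/token",
    auth=(CLIENT_ID, CLIENT_SECRET),              # HTTP Basic auth with the API client credentials
    data={"grant_type": "client_credentials"},
    timeout=30,
)
response.raise_for_status()
payload = response.json()
# A successful response contains an access token and its lifetime in seconds.
print("token received, expires_in:", payload.get("expires_in"))
```

If the request succeeds, the same Client ID and Secret can be used in the user macros listed above.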
Name | Description | Default |
---|---|---|
{$ACRONIS.CPC.DATACENTER.URL} | Acronis Cyber Protect Cloud datacenter URL, e.g., https://eu2-cloud.acronis.com. |
|
{$ACRONIS.CPC.AUTH.INTERVAL} | API token regeneration interval, in minutes. By default, Acronis Cyber Protect Cloud tokens expire after 2 hours. |
110m |
{$ACRONIS.CPC.HTTP.PROXY} | Sets the HTTP proxy for the authorization item. Host prototypes will also use this value for HTTP proxy. If this parameter is empty, then no proxy is used. |
|
{$ACRONIS.CPC.AUTH.CLIENT.ID} | Client ID for API user access. |
|
{$ACRONIS.CPC.AUTH.SECRET} | Secret for API user access. |
|
{$ACRONIS.CPC.PATH.ACCOUNT.MANAGEMENT} | Sub-path for the Account Management API. |
/api/2 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Get access token | Authorizes the API user and receives an access token. |
HTTP agent | acronis.cpc.accountmanager.gettoken Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Acronis CPC: MSP Discovery | Discovers MSP and creates host prototype based on that. |
Dependent item | acronis.cpc.lld.msp_discovery |
This template is designed for the effortless deployment of Acronis Cyber Protect Cloud MSP monitoring by Zabbix via HTTP and doesn't require any external scripts.
Zabbix version: 7.0 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
This template is not meant to be used independently. A host with the Acronis Cyber Protect Cloud by HTTP
template will request an API token and automatically create a host prototype with this template assigned to it.
If needed, you can specify an HTTP proxy for the template to use by changing the value of {$ACRONIS.CPC.HTTP.PROXY}
user macro.
Device discovery trigger prototypes that check for scheduled services which have failed to run have the following trigger time offset user macros:
{$ACRONIS.CPC.OFFSET.SCHEDULED.ANTIMALWARE}
{$ACRONIS.CPC.OFFSET.SCHEDULED.BACKUP}
{$ACRONIS.CPC.OFFSET.SCHEDULED.VULNERABILITY}
{$ACRONIS.CPC.OFFSET.SCHEDULED.PATCH}
Using these macros, their respective triggers can be offset in both directions. For example, if you want the trigger to fire only when the current time is at least 3 minutes past the next scheduled antimalware scan, set the value of the {$ACRONIS.CPC.OFFSET.SCHEDULED.ANTIMALWARE} user macro to -180.
This is the default behaviour.
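To illustrate how the offset is applied: each of these triggers compares the timestamp of the next scheduled run with now() plus the offset, so a negative offset delays the alert. The sketch below reproduces that comparison outside Zabbix, assuming Unix timestamps; the authoritative logic is the trigger expressions listed in the trigger tables further down.

```python
# Illustrative sketch of the scheduled-run check used by these triggers:
# the trigger fires when last(next_scheduled) < now() + offset.
import time

def scheduled_run_overdue(next_scheduled_ts: float, offset_seconds: int = -180) -> bool:
    """Return True when the scheduled run should already have happened.

    With the default offset of -180, the check only fires once the current
    time is at least 3 minutes past the scheduled timestamp, which tolerates
    small scheduling delays before alerting.
    """
    return next_scheduled_ts < time.time() + offset_seconds

now = time.time()
print(scheduled_run_overdue(now - 300))  # True: scheduled 5 minutes ago and still not rescheduled
print(scheduled_run_overdue(now - 60))   # False: only 1 minute overdue, within the tolerance window
```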
Name | Description | Default |
---|---|---|
{$ACRONIS.CPC.DATACENTER.URL} | Acronis Cyber Protect Cloud datacenter URL, e.g., https://eu2-cloud.acronis.com. |
|
{$ACRONIS.CPC.HTTP.PROXY} | Sets the HTTP proxy for the authorization item. Host prototypes will also use this value for HTTP proxy. If this parameter is empty, then no proxy is used. |
|
{$ACRONIS.CPC.CYBERFIT.WARN} | CyberFit score threshold for "warning" severity trigger. |
669 |
{$ACRONIS.CPC.CYBERFIT.HIGH} | CyberFit score threshold for "high" severity trigger. |
579 |
{$ACRONIS.CPC.OFFSET.SCHEDULED.ANTIMALWARE} | Offset time in seconds for scheduled antimalware scan trigger check. |
-180 |
{$ACRONIS.CPC.OFFSET.SCHEDULED.BACKUP} | Offset time in seconds for scheduled backup run trigger check. |
-180 |
{$ACRONIS.CPC.OFFSET.SCHEDULED.VULNERABILITY} | Offset time in seconds for scheduled vulnerability assessment run trigger check. |
-180 |
{$ACRONIS.CPC.OFFSET.SCHEDULED.PATCH} | Offset time in seconds for scheduled patch management run trigger check. |
-180 |
{$ACRONIS.CPC.DEVICE.RESOURCE.TYPE} | Comma-separated list of resource types used for device retrieval. |
resource.machine |
{$ACRONIS.CPC.ALERT.DISCOVERY.CATEGORY.MATCHES} | Sets the alert category regex filter to use in alert discovery for including. |
.* |
{$ACRONIS.CPC.ALERT.DISCOVERY.CATEGORY.NOT_MATCHES} | Sets the alert category regex filter to use in alert discovery for excluding. |
CHANGE_IF_NEEDED |
{$ACRONIS.CPC.ALERT.DISCOVERY.SEVERITY.MATCHES} | Sets the alert severity regex filter to use in alert discovery for including. |
.* |
{$ACRONIS.CPC.ALERT.DISCOVERY.SEVERITY.NOT_MATCHES} | Sets the alert severity regex filter to use in alert discovery for excluding. |
CHANGE_IF_NEEDED |
{$ACRONIS.CPC.ALERT.DISCOVERY.RESOURCE.MATCHES} | Sets the alert resource name regex filter to use in alert discovery for including. |
.* |
{$ACRONIS.CPC.ALERT.DISCOVERY.RESOURCE.NOT_MATCHES} | Sets the alert resource name regex filter to use in alert discovery for excluding. |
CHANGE_IF_NEEDED |
{$ACRONIS.CPC.CUSTOMER.DISCOVERY.KIND.MATCHES} | Sets the customer kind regex filter to use in customer discovery for including. |
customer |
{$ACRONIS.CPC.CUSTOMER.DISCOVERY.NAME.MATCHES} | Sets the customer name regex filter to use in customer discovery for including. |
.* |
{$ACRONIS.CPC.CUSTOMER.DISCOVERY.NAME.NOT_MATCHES} | Sets the customer name regex filter to use in customer discovery for excluding. |
CHANGE_IF_NEEDED |
{$ACRONIS.CPC.DEVICE.DISCOVERY.TENANT.MATCHES} | Sets the tenant name regex filter to use in device discovery for including. |
.* |
{$ACRONIS.CPC.DEVICE.DISCOVERY.TENANT.NOT_MATCHES} | Sets the tenant name regex filter to use in device discovery for excluding. |
CHANGE_IF_NEEDED |
{$ACRONIS.CPC.ACCESS_TOKEN} | API access token. |
|
{$ACRONIS.CPC.PATH.ACCOUNT.MANAGEMENT} | Sub-path for the Account Management API. |
/api/2 |
{$ACRONIS.CPC.PATH.RESOURCE.MANAGEMENT} | Sub-path for the Resource Management API. |
/api/resource_management/v4 |
{$ACRONIS.CPC.PATH.ALERTS} | Sub-path for the Alerts API. |
/api/alert_manager/v1 |
{$ACRONIS.CPC.PATH.AGENTS} | Sub-path for the Agents API. |
/api/agent_manager/v2 |
{$ACRONIS.CPC.MSP.TENANT.UUID} | UUID for MSP. |
Name | Description | Type | Key and additional info |
---|---|---|---|
Register integration | Registers integration on Acronis services. |
Script | acronis.cpc.register.integration |
Get alerts | Fetches all alerts. |
HTTP agent | acronis.cpc.alerts.get Preprocessing
|
Get customers | Fetches all customers. |
HTTP agent | acronis.cpc.customers.get Preprocessing
|
Get devices | Fetches all devices. |
HTTP agent | acronis.cpc.devices.get Preprocessing
|
Alerts with "ok" severity | Gets count of alerts with "ok" severity. |
Dependent item | acronis.cpc.alerts.severity.ok Preprocessing
|
Alerts with "warning" severity | Gets count of alerts with "warning" severity. |
Dependent item | acronis.cpc.alerts.severity.warn Preprocessing
|
Alerts with "error" severity | Gets count of alerts with "error" severity. |
Dependent item | acronis.cpc.alerts.severity.err Preprocessing
|
Alerts with "critical" severity | Gets count of alerts with "critical" severity. |
Dependent item | acronis.cpc.alerts.severity.crit Preprocessing
|
Alerts with "information" severity | Gets count of alerts with "information" severity. |
Dependent item | acronis.cpc.alerts.severity.info Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Acronis CPC: Alerts discovery | Discovers alerts. |
Dependent item | acronis.cpc.alerts.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Alert [{#TYPE}]:[{#ALERT_ID}]: Alert severity | Severity for the alert. |
Dependent item | acronis.cpc.alert.severity[{#ALERT_ID}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Acronis: Alert [{#TYPE}]:[{#ALERT_ID}]: Alert has "critical" severity | Alert has "critical" severity. |
last(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.alert.severity[{#ALERT_ID}])=3 |High |
Manual close: Yes | |
Acronis: Alert [{#TYPE}]:[{#ALERT_ID}]: Alert has "error" severity | Alert has "error" severity. |
last(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.alert.severity[{#ALERT_ID}])=2 |Average |
Manual close: Yes Depends on:
|
|
Acronis: Alert [{#TYPE}]:[{#ALERT_ID}]: Alert has "warning" severity | Alert has "warning" severity. |
last(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.alert.severity[{#ALERT_ID}])=1 |Warning |
Manual close: Yes Depends on:
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Acronis CPC: Customer discovery | Discovers customers. |
Dependent item | acronis.cpc.customer.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Customer [{#NAME}]: Enabled status | Enabled status for customer (true or false). |
Dependent item | acronis.cpc.customer.status[{#NAME}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Acronis CPC: Device discovery | Discovers devices. |
Dependent item | acronis.cpc.device.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Device [{#NAME}]:[{#ID}]: Raw data resources status | Gets statuses for device resources. |
HTTP agent | acronis.cpc.device.res.status.raw[{#NAME}] Preprocessing
|
Device [{#NAME}]:[{#ID}]: CyberFit score | Acronis "CyberFit" score for the device. Value of "-1" is assigned if "CyberFit" could not be found for device. |
Dependent item | acronis.cpc.device.cyberfit[{#NAME}] Preprocessing
|
Device [{#NAME}]:[{#ID}]: Agent version | Agent version for the device. |
Dependent item | acronis.cpc.device.agent.version[{#NAME}] Preprocessing
|
Device [{#NAME}]:[{#ID}]: Agent enabled | Agent status (enabled or disabled) for the device. |
Dependent item | acronis.cpc.device.agent.enabled[{#NAME}] Preprocessing
|
Device [{#NAME}]:[{#ID}]: Agent online | Agent reachability for the device. |
Dependent item | acronis.cpc.device.agent.online[{#NAME}] Preprocessing
|
Device [{#NAME}]:[{#ID}]: Protection status | Protection status for device. |
Dependent item | acronis.cpc.device.protection.status[{#NAME}] Preprocessing
|
Device [{#NAME}]:[{#ID}]: Protection plan name | Protection plan name for device. |
Dependent item | acronis.cpc.device.protection.name[{#NAME}] Preprocessing
|
Device [{#NAME}]:[{#ID}]: Previous successful antimalware protection scan | Previous successful antimalware protection scan for device. |
Dependent item | acronis.cpc.device.protection.scan.prev.ok[{#NAME}] Preprocessing
|
Device [{#NAME}]:[{#ID}]: Previous antimalware protection scan | Previous antimalware protection scan for device. |
Dependent item | acronis.cpc.device.protection.scan.prev[{#NAME}] Preprocessing
|
Device [{#NAME}]:[{#ID}]: Next antimalware protection scan | Next scheduled antimalware protection scan for device. |
Dependent item | acronis.cpc.device.protection.scan.next[{#NAME}] Preprocessing
|
Device [{#NAME}]:[{#ID}]: Previous successful machine backup run | Previous successful machine backup run for device. |
Dependent item | acronis.cpc.device.backup.prev.ok[{#NAME}] Preprocessing
|
Device [{#NAME}]:[{#ID}]: Previous machine backup run | Previous machine backup run for device. |
Dependent item | acronis.cpc.device.backup.prev[{#NAME}] Preprocessing
|
Device [{#NAME}]:[{#ID}]: Next machine backup run | Next scheduled machine backup run for device. |
Dependent item | acronis.cpc.device.backup.next[{#NAME}] Preprocessing
|
Device [{#NAME}]:[{#ID}]: Previous successful vulnerability assessment | Previous successful vulnerability assessment for device. |
Dependent item | acronis.cpc.device.vuln.prev.ok[{#NAME}] Preprocessing
|
Device [{#NAME}]:[{#ID}]: Previous vulnerability assessment | Previous vulnerability assessment for device. |
Dependent item | acronis.cpc.device.vuln.prev[{#NAME}] Preprocessing
|
Device [{#NAME}]:[{#ID}]: Next vulnerability assessment | Next scheduled vulnerability assessment for device. |
Dependent item | acronis.cpc.device.vuln.next[{#NAME}] Preprocessing
|
Device [{#NAME}]:[{#ID}]: Previous successful patch management run | Previous successful patch management run for device. |
Dependent item | acronis.cpc.device.patch.prev.ok[{#NAME}] Preprocessing
|
Device [{#NAME}]:[{#ID}]: Previous patch management run | Previous patch management run for device. |
Dependent item | acronis.cpc.device.patch.prev[{#NAME}] Preprocessing
|
Device [{#NAME}]:[{#ID}]: Next patch management run | Next scheduled patch management run for device. |
Dependent item | acronis.cpc.device.patch.next[{#NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Acronis: Device [{#NAME}]:[{#ID}]: CyberFit score critical | CyberFit score for this device is critical for at least 3 minutes. |
min(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.device.cyberfit[{#NAME}],3m) < {$ACRONIS.CPC.CYBERFIT.HIGH} and max(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.device.cyberfit[{#NAME}],3m) <> -1 |High |
Manual close: Yes | |
Acronis: Device [{#NAME}]:[{#ID}]: CyberFit score low | CyberFit score for this device is low for at least 3 minutes. |
min(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.device.cyberfit[{#NAME}],3m) < {$ACRONIS.CPC.CYBERFIT.WARN} and max(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.device.cyberfit[{#NAME}],3m) <> -1 |Warning |
Manual close: Yes Depends on:
|
|
Acronis: Device [{#NAME}]:[{#ID}]: Agent disabled | Agent for this device is disabled for at least 3 minutes. |
max(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.device.agent.enabled[{#NAME}],3m) < 1 |Info |
Manual close: Yes | |
Acronis: Device [{#NAME}]:[{#ID}]: Protection status "error" | Device has "error" protection status. |
last(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.device.protection.status[{#NAME}])="error" |Average |
Manual close: Yes | |
Acronis: Device [{#NAME}]:[{#ID}]: Protection status "warning" | Device has "warning" protection status. |
last(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.device.protection.status[{#NAME}])="warning" |Warning |
Manual close: Yes Depends on:
|
|
Acronis: Device [{#NAME}]:[{#ID}]: Previous protection scan not successful | Previous antimalware protection scan was not successful. |
last(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.device.protection.scan.prev.ok[{#NAME}])<>last(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.device.protection.scan.prev[{#NAME}]) |Average |
Manual close: Yes | |
Acronis: Device [{#NAME}]:[{#ID}]: Scheduled antimalware scan failed to run | Scheduled antimalware scan failed to run. |
last(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.device.protection.scan.next[{#NAME}]) < (now() + {$ACRONIS.CPC.OFFSET.SCHEDULED.ANTIMALWARE}) |Warning |
Manual close: Yes | |
Acronis: Device [{#NAME}]:[{#ID}]: Previous machine backup run not successful | Previous machine backup did not run successfully. |
max(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.device.backup.prev.ok[{#NAME}],1m)<>max(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.device.backup.prev[{#NAME}],1m) |Average |
Manual close: Yes | |
Acronis: Device [{#NAME}]:[{#ID}]: Scheduled machine backup failed to run | Scheduled machine backup failed to run. |
last(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.device.backup.next[{#NAME}]) < (now() + {$ACRONIS.CPC.OFFSET.SCHEDULED.BACKUP}) |Warning |
Manual close: Yes | |
Acronis: Device [{#NAME}]:[{#ID}]: Previous vulnerability assessment not successful | Previous vulnerability assessment did not run successfully. |
max(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.device.vuln.prev.ok[{#NAME}],1m)<>max(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.device.vuln.prev[{#NAME}],1m) |Average |
Manual close: Yes | |
Acronis: Device [{#NAME}]:[{#ID}]: Scheduled vulnerability assessment failed to run | Scheduled vulnerability assessment failed to run. |
last(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.device.vuln.next[{#NAME}]) < (now() + {$ACRONIS.CPC.OFFSET.SCHEDULED.VULNERABILITY}) |Warning |
Manual close: Yes | |
Acronis: Device [{#NAME}]:[{#ID}]: Previous patch management run not successful | Previous patch management run was not successful. |
max(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.device.patch.prev.ok[{#NAME}],1m)<>max(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.device.patch.prev[{#NAME}],1m) |Average |
Manual close: Yes | |
Acronis: Device [{#NAME}]:[{#ID}]: Scheduled patch management failed to run | Scheduled patch management failed to run. |
last(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.device.patch.next[{#NAME}]) < (now() + {$ACRONIS.CPC.OFFSET.SCHEDULED.PATCH}) |Warning |
Manual close: Yes |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums