This template is designed for the effortless deployment of Apache Zookeeper monitoring by Zabbix via HTTP and doesn't require any external scripts.
This template works with standalone and cluster instances. Metrics are collected from each Zookeeper node by requests to AdminServer.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Enable the AdminServer and configure the parameters according to the official documentation.
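A minimal zoo.cfg fragment for this (a sketch with the default values shown; adjust to your environment) could look like:

```
# zoo.cfg - AdminServer settings this template relies on (defaults shown)
admin.enableServer=true
admin.serverPort=8080
admin.commandURL=/commands
```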
Set the hostname or IP address of the Apache Zookeeper host in the {$ZOOKEEPER.HOST} macro. You can also change the {$ZOOKEEPER.COMMAND_URL}, {$ZOOKEEPER.PORT}, and {$ZOOKEEPER.SCHEME} macros if necessary.
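As a quick sanity check before linking the template, you can query the AdminServer from the machine that will run the HTTP agent checks (the hostname below is a placeholder and should match the macro values above):

```
# List the commands exposed by the AdminServer
curl http://<zookeeper-host>:8080/commands

# The monitor command returns the kind of server metrics this template parses
curl http://<zookeeper-host>:8080/commands/monitor
```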
Name | Description | Default |
---|---|---|
{$ZOOKEEPER.HOST} | The hostname or IP address of the Apache Zookeeper host. | <SET ZOOKEEPER HOST> |
{$ZOOKEEPER.PORT} | The port the embedded Jetty server listens on (admin.serverPort). | 8080 |
{$ZOOKEEPER.COMMAND_URL} | The URL for listing and issuing commands relative to the root URL (admin.commandURL). | commands |
{$ZOOKEEPER.SCHEME} | Request scheme which may be http or https. | http |
{$ZOOKEEPER.FILE_DESCRIPTORS.MAX.WARN} | Maximum percentage of file descriptors usage alert threshold (for trigger expression). | 85 |
{$ZOOKEEPER.OUTSTANDING_REQ.MAX.WARN} | Maximum number of outstanding requests (for trigger expression). | 10 |
{$ZOOKEEPER.PENDING_SYNCS.MAX.WARN} | Maximum number of pending syncs from the followers (for trigger expression). | 10 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Zookeeper: Get server metrics | | HTTP agent | zookeeper.get_metrics |
Zookeeper: Get connections stats | Get information on client connections to server. Note, depending on the number of client connections this operation may be expensive (i.e. impact server performance). | HTTP agent | zookeeper.get_connections_stats |
Zookeeper: Server mode | Mode of the server. In an ensemble, this may either be leader or follower. Otherwise, it is standalone. | Dependent item | zookeeper.server_state Preprocessing |
Zookeeper: Uptime | Uptime that a peer has been in a table leading/following/observing state. | Dependent item | zookeeper.uptime Preprocessing |
Zookeeper: Version | Version of Zookeeper server. | Dependent item | zookeeper.version Preprocessing |
Zookeeper: Approximate data size | Data tree size in bytes. The size includes the znode path and its value. | Dependent item | zookeeper.approximate_data_size Preprocessing |
Zookeeper: File descriptors, max | Maximum number of file descriptors that a zookeeper server can open. | Dependent item | zookeeper.max_file_descriptor_count Preprocessing |
Zookeeper: File descriptors, open | Number of file descriptors that a zookeeper server has open. | Dependent item | zookeeper.open_file_descriptor_count Preprocessing |
Zookeeper: Outstanding requests | The number of queued requests when the server is under load and is receiving more sustained requests than it can process. | Dependent item | zookeeper.outstanding_requests Preprocessing |
Zookeeper: Commit per sec | The number of commits performed per second. | Dependent item | zookeeper.commit_count.rate Preprocessing |
Zookeeper: Diff syncs per sec | Number of diff syncs performed per second. | Dependent item | zookeeper.diff_count.rate Preprocessing |
Zookeeper: Snap syncs per sec | Number of snap syncs performed per second. | Dependent item | zookeeper.snap_count.rate Preprocessing |
Zookeeper: Looking per sec | Rate of transitions into looking state. | Dependent item | zookeeper.looking_count.rate Preprocessing |
Zookeeper: Alive connections | Number of active clients connected to a zookeeper server. | Dependent item | zookeeper.num_alive_connections Preprocessing |
Zookeeper: Global sessions | Number of global sessions. | Dependent item | zookeeper.global_sessions Preprocessing |
Zookeeper: Local sessions | Number of local sessions. | Dependent item | zookeeper.local_sessions Preprocessing |
Zookeeper: Drop connections per sec | Rate of connection drops. | Dependent item | zookeeper.connection_drop_count.rate Preprocessing |
Zookeeper: Rejected connections per sec | Rate of connection rejections. | Dependent item | zookeeper.connection_rejected.rate Preprocessing |
Zookeeper: Revalidate connections per sec | Rate of connection revalidations. | Dependent item | zookeeper.connection_revalidate_count.rate Preprocessing |
Zookeeper: Revalidate per sec | Rate of revalidations. | Dependent item | zookeeper.revalidate_count.rate Preprocessing |
Zookeeper: Latency, max | The maximum amount of time it takes for the server to respond to a client request. | Dependent item | zookeeper.max_latency Preprocessing |
Zookeeper: Latency, min | The minimum amount of time it takes for the server to respond to a client request. | Dependent item | zookeeper.min_latency Preprocessing |
Zookeeper: Latency, avg | The average amount of time it takes for the server to respond to a client request. | Dependent item | zookeeper.avg_latency Preprocessing |
Zookeeper: Znode count | The number of znodes in the ZooKeeper namespace (the data). | Dependent item | zookeeper.znode_count Preprocessing |
Zookeeper: Ephemeral nodes count | Number of ephemeral nodes that a zookeeper server has in its data tree. | Dependent item | zookeeper.ephemerals_count Preprocessing |
Zookeeper: Watch count | Number of watches currently set on the local ZooKeeper process. | Dependent item | zookeeper.watch_count Preprocessing |
Zookeeper: Packets sent per sec | The number of zookeeper packets sent from a server per second. | Dependent item | zookeeper.packets_sent Preprocessing |
Zookeeper: Packets received per sec | The number of zookeeper packets received by a server per second. | Dependent item | zookeeper.packets_received.rate Preprocessing |
Zookeeper: Bytes received per sec | Number of bytes received per second. | Dependent item | zookeeper.bytes_received_count.rate Preprocessing |
Zookeeper: Election time, avg | Time between entering and leaving election. | Dependent item | zookeeper.avg_election_time Preprocessing |
Zookeeper: Elections | Number of elections that have happened. | Dependent item | zookeeper.cnt_election_time Preprocessing |
Zookeeper: Fsync time, avg | Time to fsync the transaction log. | Dependent item | zookeeper.avg_fsynctime Preprocessing |
Zookeeper: Fsync | Count of performed fsyncs. | Dependent item | zookeeper.cnt_fsynctime Preprocessing |
Zookeeper: Snapshot write time, avg | Average time to write a snapshot. | Dependent item | zookeeper.avg_snapshottime Preprocessing |
Zookeeper: Snapshot writes | Count of performed snapshot writes. | Dependent item | zookeeper.cnt_snapshottime Preprocessing |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Zookeeper: Server mode has changed | Zookeeper node state has changed. Acknowledge to close the problem manually. | last(/Zookeeper by HTTP/zookeeper.server_state,#1)<>last(/Zookeeper by HTTP/zookeeper.server_state,#2) and length(last(/Zookeeper by HTTP/zookeeper.server_state))>0 | Info | Manual close: Yes |
Zookeeper: Failed to fetch info data | Zabbix has not received data for items for the last 10 minutes. | nodata(/Zookeeper by HTTP/zookeeper.uptime,10m)=1 | Warning | Manual close: Yes |
Zookeeper: Version has changed | Zookeeper version has changed. Acknowledge to close the problem manually. | last(/Zookeeper by HTTP/zookeeper.version,#1)<>last(/Zookeeper by HTTP/zookeeper.version,#2) and length(last(/Zookeeper by HTTP/zookeeper.version))>0 | Info | Manual close: Yes |
Zookeeper: Too many file descriptors used | Number of file descriptors used is more than {$ZOOKEEPER.FILE_DESCRIPTORS.MAX.WARN}% of the available number of file descriptors. | min(/Zookeeper by HTTP/zookeeper.open_file_descriptor_count,5m) * 100 / last(/Zookeeper by HTTP/zookeeper.max_file_descriptor_count) > {$ZOOKEEPER.FILE_DESCRIPTORS.MAX.WARN} | Warning | |
Zookeeper: Too many queued requests | Number of queued requests in the server. This goes up when the server receives more requests than it can process. | min(/Zookeeper by HTTP/zookeeper.outstanding_requests,5m)>{$ZOOKEEPER.OUTSTANDING_REQ.MAX.WARN} | Average | Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Leader metrics discovery | Additional metrics for leader node. | Dependent item | zookeeper.metrics.leader Preprocessing |
Name | Description | Type | Key and additional info |
---|---|---|---|
Zookeeper: Pending syncs{#SINGLETON} | Number of pending syncs to carry out to ZooKeeper ensemble followers. | Dependent item | zookeeper.pending_syncs[{#SINGLETON}] Preprocessing |
Zookeeper: Quorum size{#SINGLETON} | | Dependent item | zookeeper.quorum_size[{#SINGLETON}] Preprocessing |
Zookeeper: Synced followers{#SINGLETON} | Number of synced followers reported when a node server_state is leader. | Dependent item | zookeeper.synced_followers[{#SINGLETON}] Preprocessing |
Zookeeper: Synced non-voting follower{#SINGLETON} | Number of synced non-voting followers reported when a node server_state is leader. | Dependent item | zookeeper.synced_non_voting_followers[{#SINGLETON}] Preprocessing |
Zookeeper: Synced observers{#SINGLETON} | Number of synced observers. | Dependent item | zookeeper.synced_observers[{#SINGLETON}] Preprocessing |
Zookeeper: Learners{#SINGLETON} | Number of learners. | Dependent item | zookeeper.learners[{#SINGLETON}] Preprocessing |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Zookeeper: Too many pending syncs | | min(/Zookeeper by HTTP/zookeeper.pending_syncs[{#SINGLETON}],5m)>{$ZOOKEEPER.PENDING_SYNCS.MAX.WARN} | Average | Manual close: Yes |
Zookeeper: Too few active followers | The number of followers should equal the total size of your ZooKeeper ensemble, minus 1 (the leader is not included in the follower count). If the ensemble fails to maintain quorum, all automatic failover features are suspended. | last(/Zookeeper by HTTP/zookeeper.synced_followers[{#SINGLETON}]) < last(/Zookeeper by HTTP/zookeeper.quorum_size[{#SINGLETON}])-1 | Average | |
Name | Description | Type | Key and additional info |
---|---|---|---|
Clients discovery | Get list of client connections. Note, depending on the number of client connections this operation may be expensive (i.e. impact server performance). | HTTP agent | zookeeper.clients Preprocessing |
Name | Description | Type | Key and additional info |
---|---|---|---|
Zookeeper client {#TYPE} [{#CLIENT}]: Get client info | The item gets information about "{#CLIENT}" client of "{#TYPE}" type. | Dependent item | zookeeper.client_info[{#TYPE},{#CLIENT}] Preprocessing |
Zookeeper client {#TYPE} [{#CLIENT}]: Latency, max | The maximum amount of time it takes for the server to respond to a client request. | Dependent item | zookeeper.max_latency[{#TYPE},{#CLIENT}] Preprocessing |
Zookeeper client {#TYPE} [{#CLIENT}]: Latency, min | The minimum amount of time it takes for the server to respond to a client request. | Dependent item | zookeeper.min_latency[{#TYPE},{#CLIENT}] Preprocessing |
Zookeeper client {#TYPE} [{#CLIENT}]: Latency, avg | The average amount of time it takes for the server to respond to a client request. | Dependent item | zookeeper.avg_latency[{#TYPE},{#CLIENT}] Preprocessing |
Zookeeper client {#TYPE} [{#CLIENT}]: Packets sent per sec | The number of packets sent. | Dependent item | zookeeper.packets_sent[{#TYPE},{#CLIENT}] Preprocessing |
Zookeeper client {#TYPE} [{#CLIENT}]: Packets received per sec | The number of packets received. | Dependent item | zookeeper.packets_received[{#TYPE},{#CLIENT}] Preprocessing |
Zookeeper client {#TYPE} [{#CLIENT}]: Outstanding requests | The number of queued requests when the server is under load and is receiving more sustained requests than it can process. | Dependent item | zookeeper.outstanding_requests[{#TYPE},{#CLIENT}] Preprocessing |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed to monitor internal Zabbix metrics on the remote Zabbix server.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Specify the address of the remote Zabbix server by changing the {$ZABBIX.SERVER.ADDRESS} and {$ZABBIX.SERVER.PORT} macros. Don't forget to adjust the StatsAllowedIP parameter in the remote server's configuration file to allow the collection of statistics.
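For example, the change on the remote (monitored) Zabbix server could look like the snippet below; the IP address is a placeholder for the Zabbix server that collects the statistics, and the remote server needs to be restarted for the change to take effect:

```
# zabbix_server.conf on the remote Zabbix server being monitored
# Allow internal statistics to be queried from the collecting Zabbix server
StatsAllowedIP=192.0.2.10
```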
Name | Description | Default |
---|---|---|
{$ZABBIX.SERVER.ADDRESS} | IP/DNS/network mask list of servers to be remotely queried (default is 127.0.0.1). | |
{$ZABBIX.SERVER.PORT} | Port of server to be remotely queried (default is 10051). | |
{$PROXY.LAST_SEEN.MAX} | The maximum number of seconds that Zabbix proxy has not been seen. | 600 |
{$ZABBIX.SERVER.NODATA_TIMEOUT} | The time threshold after which statistics are considered unavailable. Used in trigger expression. | 5m |
Name | Description | Type | Key and additional info |
---|---|---|---|
Remote Zabbix server: Zabbix stats | The master item of Zabbix server statistics. | Zabbix internal | zabbix[stats,{$ZABBIX.SERVER.ADDRESS},{$ZABBIX.SERVER.PORT}] |
Zabbix proxies stats | The master item of Zabbix proxies' statistics. | Dependent item | zabbix.proxies.stats Preprocessing |
Remote Zabbix server: Zabbix stats queue over 10m | The number of monitored items in the queue, which are delayed at least by 10 minutes. | Zabbix internal | zabbix[stats,{$ZABBIX.SERVER.ADDRESS},{$ZABBIX.SERVER.PORT},queue,10m] Preprocessing |
Remote Zabbix server: Zabbix stats queue | The number of monitored items in the queue, which are delayed at least by 6 seconds. | Zabbix internal | zabbix[stats,{$ZABBIX.SERVER.ADDRESS},{$ZABBIX.SERVER.PORT},queue] Preprocessing |
Remote Zabbix server: Utilization of alert manager internal processes, in % | The average percentage of the time during which the alert manager processes have been busy for the last minute. | Dependent item | process.alert_manager.avg.busy Preprocessing |
Remote Zabbix server: Utilization of alert syncer internal processes, in % | The average percentage of the time during which the alert syncer processes have been busy for the last minute. | Dependent item | process.alert_syncer.avg.busy Preprocessing |
Remote Zabbix server: Utilization of alerter internal processes, in % | The average percentage of the time during which the alerter processes have been busy for the last minute. | Dependent item | process.alerter.avg.busy Preprocessing |
Remote Zabbix server: Utilization of availability manager internal processes, in % | The average percentage of the time during which the availability manager processes have been busy for the last minute. | Dependent item | process.availability_manager.avg.busy Preprocessing |
Remote Zabbix server: Utilization of configuration syncer internal processes, in % | The average percentage of the time during which the configuration syncer processes have been busy for the last minute. | Dependent item | process.configuration_syncer.avg.busy Preprocessing |
Remote Zabbix server: Utilization of discoverer data collector processes, in % | The average percentage of the time during which the discoverer processes have been busy for the last minute. | Dependent item | process.discoverer.avg.busy Preprocessing |
Remote Zabbix server: Utilization of escalator internal processes, in % | The average percentage of the time during which the escalator processes have been busy for the last minute. | Dependent item | process.escalator.avg.busy Preprocessing |
Remote Zabbix server: Utilization of history poller data collector processes, in % | The average percentage of the time during which the history poller processes have been busy for the last minute. | Dependent item | process.history_poller.avg.busy Preprocessing |
Remote Zabbix server: Utilization of ODBC poller data collector processes, in % | The average percentage of the time during which the ODBC poller processes have been busy for the last minute. | Dependent item | process.odbc_poller.avg.busy Preprocessing |
Remote Zabbix server: Utilization of history syncer internal processes, in % | The average percentage of the time during which the history syncer processes have been busy for the last minute. | Dependent item | process.history_syncer.avg.busy Preprocessing |
Remote Zabbix server: Utilization of housekeeper internal processes, in % | The average percentage of the time during which the housekeeper processes have been busy for the last minute. | Dependent item | process.housekeeper.avg.busy Preprocessing |
Remote Zabbix server: Utilization of http poller data collector processes, in % | The average percentage of the time during which the http poller processes have been busy for the last minute. | Dependent item | process.http_poller.avg.busy Preprocessing |
Remote Zabbix server: Utilization of icmp pinger data collector processes, in % | The average percentage of the time during which the icmp pinger processes have been busy for the last minute. | Dependent item | process.icmp_pinger.avg.busy Preprocessing |
Remote Zabbix server: Utilization of ipmi manager internal processes, in % | The average percentage of the time during which the ipmi manager processes have been busy for the last minute. | Dependent item | process.ipmi_manager.avg.busy Preprocessing |
Remote Zabbix server: Utilization of ipmi poller data collector processes, in % | The average percentage of the time during which the ipmi poller processes have been busy for the last minute. | Dependent item | process.ipmi_poller.avg.busy Preprocessing |
Remote Zabbix server: Utilization of java poller data collector processes, in % | The average percentage of the time during which the java poller processes have been busy for the last minute. | Dependent item | process.java_poller.avg.busy Preprocessing |
Remote Zabbix server: Utilization of LLD manager internal processes, in % | The average percentage of the time during which the lld manager processes have been busy for the last minute. | Dependent item | process.lld_manager.avg.busy Preprocessing |
Remote Zabbix server: Utilization of LLD worker internal processes, in % | The average percentage of the time during which the lld worker processes have been busy for the last minute. | Dependent item | process.lld_worker.avg.busy Preprocessing |
Remote Zabbix server: Utilization of connector manager internal processes, in % | The average percentage of the time during which the connector manager processes have been busy for the last minute. | Dependent item | process.connector_manager.avg.busy Preprocessing |
Remote Zabbix server: Utilization of connector worker internal processes, in % | The average percentage of the time during which the connector worker processes have been busy for the last minute. | Dependent item | process.connector_worker.avg.busy Preprocessing |
Remote Zabbix server: Utilization of poller data collector processes, in % | The average percentage of the time during which the poller processes have been busy for the last minute. | Dependent item | process.poller.avg.busy Preprocessing |
Remote Zabbix server: Utilization of preprocessing worker internal processes, in % | The average percentage of the time during which the preprocessing worker processes have been busy for the last minute. | Dependent item | process.preprocessing_worker.avg.busy Preprocessing |
Remote Zabbix server: Utilization of preprocessing manager internal processes, in % | The average percentage of the time during which the preprocessing manager processes have been busy for the last minute. | Dependent item | process.preprocessing_manager.avg.busy Preprocessing |
Remote Zabbix server: Utilization of proxy poller data collector processes, in % | The average percentage of the time during which the proxy poller processes have been busy for the last minute. | Dependent item | process.proxy_poller.avg.busy Preprocessing |
Remote Zabbix server: Utilization of report manager internal processes, in % | The average percentage of the time during which the report manager processes have been busy for the last minute. | Dependent item | process.report_manager.avg.busy Preprocessing |
Remote Zabbix server: Utilization of report writer internal processes, in % | The average percentage of the time during which the report writer processes have been busy for the last minute. | Dependent item | process.report_writer.avg.busy Preprocessing |
Remote Zabbix server: Utilization of self-monitoring internal processes, in % | The average percentage of the time during which the self-monitoring processes have been busy for the last minute. | Dependent item | process.self-monitoring.avg.busy Preprocessing |
Remote Zabbix server: Utilization of snmp trapper data collector processes, in % | The average percentage of the time during which the snmp trapper processes have been busy for the last minute. | Dependent item | process.snmp_trapper.avg.busy Preprocessing |
Remote Zabbix server: Utilization of task manager internal processes, in % | The average percentage of the time during which the task manager processes have been busy for the last minute. | Dependent item | process.task_manager.avg.busy Preprocessing |
Remote Zabbix server: Utilization of timer internal processes, in % | The average percentage of the time during which the timer processes have been busy for the last minute. | Dependent item | process.timer.avg.busy Preprocessing |
Remote Zabbix server: Utilization of service manager internal processes, in % | The average percentage of the time during which the service manager processes have been busy for the last minute. | Dependent item | process.service_manager.avg.busy Preprocessing |
Remote Zabbix server: Utilization of trigger housekeeper internal processes, in % | The average percentage of the time during which the trigger housekeeper processes have been busy for the last minute. | Dependent item | process.trigger_housekeeper.avg.busy Preprocessing |
Remote Zabbix server: Utilization of trapper data collector processes, in % | The average percentage of the time during which the trapper processes have been busy for the last minute. | Dependent item | process.trapper.avg.busy Preprocessing |
Remote Zabbix server: Utilization of unreachable poller data collector processes, in % | The average percentage of the time during which the unreachable poller processes have been busy for the last minute. | Dependent item | process.unreachable_poller.avg.busy Preprocessing |
Remote Zabbix server: Utilization of vmware data collector processes, in % | The average percentage of the time during which the vmware collector processes have been busy for the last minute. | Dependent item | process.vmware_collector.avg.busy Preprocessing |
Remote Zabbix server: Configuration cache, % used | The availability statistics of Zabbix configuration cache. The percentage of used data buffer. | Dependent item | rcache.buffer.pused Preprocessing |
Remote Zabbix server: Trend function cache, % of unique requests | The effectiveness statistics of Zabbix trend function cache. The percentage of cached items calculated from the sum of cached items plus requests. Low percentage most likely means that the cache size can be reduced. | Dependent item | tcache.pitems Preprocessing |
Remote Zabbix server: Trend function cache, % of misses | The effectiveness statistics of Zabbix trend function cache. The percentage of cache misses. | Dependent item | tcache.pmisses Preprocessing |
Remote Zabbix server: Value cache, % used | The availability statistics of Zabbix value cache. The percentage of used data buffer. | Dependent item | vcache.buffer.pused Preprocessing |
Remote Zabbix server: Value cache hits | The effectiveness statistics of Zabbix value cache. The number of cache hits (history values taken from the cache). | Dependent item | vcache.cache.hits Preprocessing |
Remote Zabbix server: Value cache misses | The effectiveness statistics of Zabbix value cache. The number of cache misses (history values taken from the database). | Dependent item | vcache.cache.misses Preprocessing |
Remote Zabbix server: Value cache operating mode | The operating mode of the value cache. | Dependent item | vcache.cache.mode Preprocessing |
Remote Zabbix server: Version | A version of Zabbix server. | Dependent item | version Preprocessing |
Remote Zabbix server: VMware cache, % used | The availability statistics of Zabbix vmware cache. The percentage of used data buffer. | Dependent item | vmware.buffer.pused Preprocessing |
Remote Zabbix server: History write cache, % used | The statistics and availability of Zabbix write cache. The percentage of used history buffer. The history cache is used to store item values. A high number indicates performance problems on the database side. | Dependent item | wcache.history.pused Preprocessing |
Remote Zabbix server: History index cache, % used | The statistics and availability of Zabbix write cache. The percentage of used history index buffer. The history index cache is used to index values stored in the history cache. | Dependent item | wcache.index.pused Preprocessing |
Remote Zabbix server: Trend write cache, % used | The statistics and availability of Zabbix write cache. The percentage of used trend buffer. The trend cache stores the aggregate of all items that have received data for the current hour. | Dependent item | wcache.trend.pused Preprocessing |
Remote Zabbix server: Number of processed values per second | The statistics and availability of Zabbix write cache. The total number of values processed by Zabbix server or Zabbix proxy, except unsupported items. | Dependent item | wcache.values Preprocessing |
Remote Zabbix server: Number of processed numeric (float) values per second | The statistics and availability of Zabbix write cache. The number of processed float values. | Dependent item | wcache.values.float Preprocessing |
Remote Zabbix server: Number of processed log values per second | The statistics and availability of Zabbix write cache. The number of processed log values. | Dependent item | wcache.values.log Preprocessing |
Remote Zabbix server: Number of processed not supported values per second | The statistics and availability of Zabbix write cache. The number of times the item processing resulted in an item becoming unsupported or keeping that state. | Dependent item | wcache.values.not_supported Preprocessing |
Remote Zabbix server: Number of processed character values per second | The statistics and availability of Zabbix write cache. The number of processed character/string values. | Dependent item | wcache.values.str Preprocessing |
Remote Zabbix server: Number of processed text values per second | The statistics and availability of Zabbix write cache. The number of processed text values. | Dependent item | wcache.values.text Preprocessing |
Remote Zabbix server: LLD queue | The count of values enqueued in the low-level discovery processing queue. | Dependent item | lld_queue Preprocessing |
Remote Zabbix server: Preprocessing queue | The count of values enqueued in the preprocessing queue. | Dependent item | preprocessing_queue Preprocessing |
Remote Zabbix server: Connector queue | The count of values enqueued in the connector queue. | Dependent item | connector_queue Preprocessing |
Remote Zabbix server: Number of processed numeric (unsigned) values per second | The statistics and availability of Zabbix write cache. The number of processed numeric (unsigned) values. | Dependent item | wcache.values.uint Preprocessing |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Remote Zabbix server: More than 100 items having missing data for more than 10 minutes | The | min(/Remote Zabbix server health/zabbix[stats,{$ZABBIX.SERVER.ADDRESS},{$ZABBIX.SERVER.PORT},queue,10m],10m)>100 | Warning | Manual close: Yes |
Remote Zabbix server: Utilization of alert manager processes is high | | avg(/Remote Zabbix server health/process.alert_manager.avg.busy,10m)>75 | Average | Manual close: Yes |
Remote Zabbix server: Utilization of alert syncer processes is high | | avg(/Remote Zabbix server health/process.alert_syncer.avg.busy,10m)>75 | Average | Manual close: Yes |
Remote Zabbix server: Utilization of alerter processes is high | | avg(/Remote Zabbix server health/process.alerter.avg.busy,10m)>75 | Average | Manual close: Yes |
Remote Zabbix server: Utilization of availability manager processes is high | | avg(/Remote Zabbix server health/process.availability_manager.avg.busy,10m)>75 | Average | Manual close: Yes |
Remote Zabbix server: Utilization of configuration syncer processes is high | | avg(/Remote Zabbix server health/process.configuration_syncer.avg.busy,10m)>75 | Average | Manual close: Yes |
Remote Zabbix server: Utilization of discoverer processes is high | | avg(/Remote Zabbix server health/process.discoverer.avg.busy,10m)>75 | Average | Manual close: Yes |
Remote Zabbix server: Utilization of escalator processes is high | | avg(/Remote Zabbix server health/process.escalator.avg.busy,10m)>75 | Average | Manual close: Yes |
Remote Zabbix server: Utilization of history poller processes is high | | avg(/Remote Zabbix server health/process.history_poller.avg.busy,10m)>75 | Average | Manual close: Yes |
Remote Zabbix server: Utilization of ODBC poller processes is high | | avg(/Remote Zabbix server health/process.odbc_poller.avg.busy,10m)>75 | Average | Manual close: Yes |
Remote Zabbix server: Utilization of history syncer processes is high | | avg(/Remote Zabbix server health/process.history_syncer.avg.busy,10m)>75 | Average | Manual close: Yes |
Remote Zabbix server: Utilization of housekeeper processes is high | | avg(/Remote Zabbix server health/process.housekeeper.avg.busy,10m)>75 | Average | Manual close: Yes |
Remote Zabbix server: Utilization of http poller processes is high | | avg(/Remote Zabbix server health/process.http_poller.avg.busy,10m)>75 | Average | Manual close: Yes |
Remote Zabbix server: Utilization of icmp pinger processes is high | | avg(/Remote Zabbix server health/process.icmp_pinger.avg.busy,10m)>75 | Average | Manual close: Yes |
Remote Zabbix server: Utilization of ipmi manager processes is high | | avg(/Remote Zabbix server health/process.ipmi_manager.avg.busy,10m)>75 | Average | Manual close: Yes |
Remote Zabbix server: Utilization of ipmi poller processes is high | | avg(/Remote Zabbix server health/process.ipmi_poller.avg.busy,10m)>75 | Average | Manual close: Yes |
Remote Zabbix server: Utilization of java poller processes is high | | avg(/Remote Zabbix server health/process.java_poller.avg.busy,10m)>75 | Average | Manual close: Yes |
Remote Zabbix server: Utilization of lld manager processes is high | | avg(/Remote Zabbix server health/process.lld_manager.avg.busy,10m)>75 | Average | Manual close: Yes |
Remote Zabbix server: Utilization of lld worker processes is high | | avg(/Remote Zabbix server health/process.lld_worker.avg.busy,10m)>75 | Average | Manual close: Yes |
Remote Zabbix server: Utilization of connector manager processes is high | | avg(/Remote Zabbix server health/process.connector_manager.avg.busy,10m)>75 | Average | Manual close: Yes |
Remote Zabbix server: Utilization of connector worker processes is high | | avg(/Remote Zabbix server health/process.connector_worker.avg.busy,10m)>75 | Average | Manual close: Yes |
Remote Zabbix server: Utilization of poller processes is high | | avg(/Remote Zabbix server health/process.poller.avg.busy,10m)>75 | Average | Manual close: Yes |
Remote Zabbix server: Utilization of preprocessing worker processes is high | | avg(/Remote Zabbix server health/process.preprocessing_worker.avg.busy,10m)>75 | Average | Manual close: Yes |
Remote Zabbix server: Utilization of preprocessing manager processes is high | | avg(/Remote Zabbix server health/process.preprocessing_manager.avg.busy,10m)>75 | Average | Manual close: Yes |
Remote Zabbix server: Utilization of proxy poller processes is high | | avg(/Remote Zabbix server health/process.proxy_poller.avg.busy,10m)>75 | Average | Manual close: Yes |
Remote Zabbix server: Utilization of report manager processes is high | | avg(/Remote Zabbix server health/process.report_manager.avg.busy,10m)>75 | Average | Manual close: Yes |
Remote Zabbix server: Utilization of report writer processes is high | | avg(/Remote Zabbix server health/process.report_writer.avg.busy,10m)>75 | Average | Manual close: Yes |
Remote Zabbix server: Utilization of self-monitoring processes is high | | avg(/Remote Zabbix server health/process.self-monitoring.avg.busy,10m)>75 | Average | Manual close: Yes |
Remote Zabbix server: Utilization of snmp trapper processes is high | | avg(/Remote Zabbix server health/process.snmp_trapper.avg.busy,10m)>75 | Average | Manual close: Yes |
Remote Zabbix server: Utilization of task manager processes is high | | avg(/Remote Zabbix server health/process.task_manager.avg.busy,10m)>75 | Average | Manual close: Yes |
Remote Zabbix server: Utilization of timer processes is high | | avg(/Remote Zabbix server health/process.timer.avg.busy,10m)>75 | Average | Manual close: Yes |
Remote Zabbix server: Utilization of service manager processes is high | | avg(/Remote Zabbix server health/process.service_manager.avg.busy,10m)>75 | Average | Manual close: Yes |
Remote Zabbix server: Utilization of trigger housekeeper processes is high | | avg(/Remote Zabbix server health/process.trigger_housekeeper.avg.busy,10m)>75 | Average | Manual close: Yes |
Remote Zabbix server: Utilization of trapper processes is high | | avg(/Remote Zabbix server health/process.trapper.avg.busy,10m)>75 | Average | Manual close: Yes |
Remote Zabbix server: Utilization of unreachable poller processes is high | | avg(/Remote Zabbix server health/process.unreachable_poller.avg.busy,10m)>75 | Average | Manual close: Yes |
Remote Zabbix server: Utilization of vmware collector processes is high | | avg(/Remote Zabbix server health/process.vmware_collector.avg.busy,10m)>75 | Average | Manual close: Yes |
Remote Zabbix server: More than 75% used in the configuration cache | Consider increasing | max(/Remote Zabbix server health/rcache.buffer.pused,10m)>75 | Average | Manual close: Yes |
Remote Zabbix server: Failed to fetch stats data | Zabbix has not received statistics data for {$ZABBIX.SERVER.NODATA_TIMEOUT}. | nodata(/Remote Zabbix server health/rcache.buffer.pused,{$ZABBIX.SERVER.NODATA_TIMEOUT})=1 | Warning | |
Remote Zabbix server: More than 95% used in the value cache | Consider increasing | max(/Remote Zabbix server health/vcache.buffer.pused,10m)>95 | Average | Manual close: Yes |
Remote Zabbix server: Zabbix value cache working in low memory mode | Once the low memory mode has been switched on, the value cache will remain in this state for 24 hours, even if the problem that triggered this mode is resolved sooner. | last(/Remote Zabbix server health/vcache.cache.mode)=1 | High | Manual close: Yes |
Remote Zabbix server: Version has changed | Zabbix server version has changed. Acknowledge to close the problem manually. | last(/Remote Zabbix server health/version,#1)<>last(/Remote Zabbix server health/version,#2) and length(last(/Remote Zabbix server health/version))>0 | Info | Manual close: Yes |
Remote Zabbix server: More than 75% used in the vmware cache | Consider increasing | max(/Remote Zabbix server health/vmware.buffer.pused,10m)>75 | Average | Manual close: Yes |
Remote Zabbix server: More than 75% used in the history cache | Consider increasing | max(/Remote Zabbix server health/wcache.history.pused,10m)>75 | Average | Manual close: Yes |
Remote Zabbix server: More than 75% used in the history index cache | Consider increasing | max(/Remote Zabbix server health/wcache.index.pused,10m)>75 | Average | Manual close: Yes |
Remote Zabbix server: More than 75% used in the trends cache | Consider increasing | max(/Remote Zabbix server health/wcache.trend.pused,10m)>75 | Average | Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Zabbix proxy discovery | LLD rule with item and trigger prototypes for the proxy discovery. | Dependent item | zabbix.proxy.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Proxy [{#PROXY.NAME}]: Stats | The statistics for the discovered proxy. | Dependent item | zabbix.proxy.stats[{#PROXY.NAME}] Preprocessing |
Proxy [{#PROXY.NAME}]: Mode | The mode of Zabbix proxy. | Dependent item | zabbix.proxy.mode[{#PROXY.NAME}] Preprocessing |
Proxy [{#PROXY.NAME}]: Unencrypted | The encryption status for connections from a proxy. | Dependent item | zabbix.proxy.unencrypted[{#PROXY.NAME}] Preprocessing |
Proxy [{#PROXY.NAME}]: PSK | The encryption status for connections from a proxy. | Dependent item | zabbix.proxy.psk[{#PROXY.NAME}] Preprocessing |
Proxy [{#PROXY.NAME}]: Certificate | The encryption status for connections from a proxy. | Dependent item | zabbix.proxy.cert[{#PROXY.NAME}] Preprocessing |
Proxy [{#PROXY.NAME}]: Compression | The compression status of a proxy. | Dependent item | zabbix.proxy.compression[{#PROXY.NAME}] Preprocessing |
Proxy [{#PROXY.NAME}]: Item count | The number of enabled items on enabled hosts assigned to a proxy. | Dependent item | zabbix.proxy.items[{#PROXY.NAME}] Preprocessing |
Proxy [{#PROXY.NAME}]: Host count | The number of enabled hosts assigned to a proxy. | Dependent item | zabbix.proxy.hosts[{#PROXY.NAME}] Preprocessing |
Proxy [{#PROXY.NAME}]: Version | A version of Zabbix proxy. | Dependent item | zabbix.proxy.version[{#PROXY.NAME}] Preprocessing |
Proxy [{#PROXY.NAME}]: Last seen, in seconds | The time when a proxy was last seen by a server. | Dependent item | zabbix.proxy.last_seen[{#PROXY.NAME}] Preprocessing |
Proxy [{#PROXY.NAME}]: Compatibility | Version of proxy compared to Zabbix server version. Possible values: 0 - Undefined; 1 - Current version (proxy and server have the same major version); 2 - Outdated version (proxy version is older than server version, but is partially supported); 3 - Unsupported version (proxy version is older than server previous LTS release version or server major version is older than proxy major version). | Dependent item | zabbix.proxy.compatibility[{#PROXY.NAME}] Preprocessing |
Proxy [{#PROXY.NAME}]: Required VPS | The required performance of a proxy (the number of values that need to be collected per second). | Dependent item | zabbix.proxy.requiredperformance[{#PROXY.NAME}] Preprocessing |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Proxy [{#PROXY.NAME}]: Proxy last seen | Zabbix proxy is not updating the configuration data. | last(/Remote Zabbix server health/zabbix.proxy.last_seen[{#PROXY.NAME}],#1)>{$PROXY.LAST_SEEN.MAX} | Warning | |
Proxy [{#PROXY.NAME}]: Zabbix proxy never seen | Zabbix proxy is not updating the configuration data. | last(/Remote Zabbix server health/zabbix.proxy.last_seen[{#PROXY.NAME}],#1)=-1 | Warning | |
Proxy [{#PROXY.NAME}]: Zabbix proxy is outdated | Zabbix proxy version is older than server version, but is partially supported. Only data collection and remote execution is available. | last(/Remote Zabbix server health/zabbix.proxy.compatibility[{#PROXY.NAME}],#1)=2 | Warning | |
Proxy [{#PROXY.NAME}]: Zabbix proxy is not supported | Zabbix proxy version is older than server previous LTS release version or server major version is older than proxy major version. | last(/Remote Zabbix server health/zabbix.proxy.compatibility[{#PROXY.NAME}],#1)=3 | High | |
Name | Description | Type | Key and additional info |
---|---|---|---|
High availability cluster node discovery | LLD rule with item and trigger prototypes for the node discovery. | Dependent item | zabbix.nodes.discovery Preprocessing |
Name | Description | Type | Key and additional info |
---|---|---|---|
Cluster node [{#NODE.NAME}]: Stats | Provides the statistics of a node. | Dependent item | zabbix.nodes.stats[{#NODE.ID}] Preprocessing |
Cluster node [{#NODE.NAME}]: Address | The IPv4 address of a node. | Dependent item | zabbix.nodes.address[{#NODE.ID}] Preprocessing |
Cluster node [{#NODE.NAME}]: Last access time | Last access time. | Dependent item | zabbix.nodes.lastaccess.time[{#NODE.ID}] Preprocessing |
Cluster node [{#NODE.NAME}]: Last access age | The time between the database's | Dependent item | zabbix.nodes.lastaccess.age[{#NODE.ID}] Preprocessing |
Cluster node [{#NODE.NAME}]: Status | The status of a node. | Dependent item | zabbix.nodes.status[{#NODE.ID}] Preprocessing |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Cluster node [{#NODE.NAME}]: Status changed | The state of the node has changed. Acknowledge to close the problem manually. | last(/Remote Zabbix server health/zabbix.nodes.status[{#NODE.ID}],#1)<>last(/Remote Zabbix server health/zabbix.nodes.status[{#NODE.ID}],#2) | Info | Manual close: Yes |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed to monitor internal Zabbix metrics on the local Zabbix server.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Link this template to the local Zabbix server host.
Name | Description | Default |
---|---|---|
{$PROXY.LAST_SEEN.MAX} | The maximum number of seconds that Zabbix proxy has not been seen. | 600 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Zabbix server: Zabbix stats cluster | The master item of Zabbix cluster statistics. | Zabbix internal | zabbix[cluster,discovery,nodes] |
Zabbix server: Zabbix proxies stats | The master item of Zabbix proxies' statistics. | Zabbix internal | zabbix[proxy,discovery] |
Zabbix server: Queue over 10 minutes | The number of monitored items in the queue, which are delayed at least by 10 minutes. | Zabbix internal | zabbix[queue,10m] |
Zabbix server: Queue | The number of monitored items in the queue, which are delayed at least by 6 seconds. | Zabbix internal | zabbix[queue] |
Zabbix server: Utilization of alert manager internal processes, in % | The average percentage of the time during which the alert manager processes have been busy for the last minute. | Zabbix internal | zabbix[process,alert manager,avg,busy] |
Zabbix server: Utilization of alert syncer internal processes, in % | The average percentage of the time during which the alert syncer processes have been busy for the last minute. | Zabbix internal | zabbix[process,alert syncer,avg,busy] |
Zabbix server: Utilization of alerter internal processes, in % | The average percentage of the time during which the alerter processes have been busy for the last minute. | Zabbix internal | zabbix[process,alerter,avg,busy] |
Zabbix server: Utilization of availability manager internal processes, in % | The average percentage of the time during which the availability manager processes have been busy for the last minute. | Zabbix internal | zabbix[process,availability manager,avg,busy] |
Zabbix server: Utilization of configuration syncer internal processes, in % | The average percentage of the time during which the configuration syncer processes have been busy for the last minute. | Zabbix internal | zabbix[process,configuration syncer,avg,busy] |
Zabbix server: Utilization of discoverer data collector processes, in % | The average percentage of the time during which the discoverer processes have been busy for the last minute. | Zabbix internal | zabbix[process,discoverer,avg,busy] |
Zabbix server: Utilization of escalator internal processes, in % | The average percentage of the time during which the escalator processes have been busy for the last minute. | Zabbix internal | zabbix[process,escalator,avg,busy] |
Zabbix server: Utilization of history poller data collector processes, in % | The average percentage of the time during which the history poller processes have been busy for the last minute. | Zabbix internal | zabbix[process,history poller,avg,busy] |
Zabbix server: Utilization of ODBC poller data collector processes, in % | The average percentage of the time during which the ODBC poller processes have been busy for the last minute. | Zabbix internal | zabbix[process,odbc poller,avg,busy] |
Zabbix server: Utilization of history syncer internal processes, in % | The average percentage of the time during which the history syncer processes have been busy for the last minute. | Zabbix internal | zabbix[process,history syncer,avg,busy] |
Zabbix server: Utilization of housekeeper internal processes, in % | The average percentage of the time during which the housekeeper processes have been busy for the last minute. | Zabbix internal | zabbix[process,housekeeper,avg,busy] |
Zabbix server: Utilization of http poller data collector processes, in % | The average percentage of the time during which the http poller processes have been busy for the last minute. | Zabbix internal | zabbix[process,http poller,avg,busy] |
Zabbix server: Utilization of icmp pinger data collector processes, in % | The average percentage of the time during which the icmp pinger processes have been busy for the last minute. | Zabbix internal | zabbix[process,icmp pinger,avg,busy] |
Zabbix server: Utilization of ipmi manager internal processes, in % | The average percentage of the time during which the ipmi manager processes have been busy for the last minute. | Zabbix internal | zabbix[process,ipmi manager,avg,busy] |
Zabbix server: Utilization of ipmi poller data collector processes, in % | The average percentage of the time during which the ipmi poller processes have been busy for the last minute. | Zabbix internal | zabbix[process,ipmi poller,avg,busy] |
Zabbix server: Utilization of java poller data collector processes, in % | The average percentage of the time during which the java poller processes have been busy for the last minute. | Zabbix internal | zabbix[process,java poller,avg,busy] |
Zabbix server: Utilization of LLD manager internal processes, in % | The average percentage of the time during which the lld manager processes have been busy for the last minute. | Zabbix internal | zabbix[process,lld manager,avg,busy] |
Zabbix server: Utilization of LLD worker internal processes, in % | The average percentage of the time during which the lld worker processes have been busy for the last minute. | Zabbix internal | zabbix[process,lld worker,avg,busy] |
Zabbix server: Utilization of connector manager internal processes, in % | The average percentage of the time during which the connector manager processes have been busy for the last minute. | Zabbix internal | zabbix[process,connector manager,avg,busy] |
Zabbix server: Utilization of connector worker internal processes, in % | The average percentage of the time during which the connector worker processes have been busy for the last minute. | Zabbix internal | zabbix[process,connector worker,avg,busy] |
Zabbix server: Utilization of poller data collector processes, in % | The average percentage of the time during which the poller processes have been busy for the last minute. | Zabbix internal | zabbix[process,poller,avg,busy] |
Zabbix server: Utilization of preprocessing worker internal processes, in % | The average percentage of the time during which the preprocessing worker processes have been busy for the last minute. | Zabbix internal | zabbix[process,preprocessing worker,avg,busy] |
Zabbix server: Utilization of preprocessing manager internal processes, in % | The average percentage of the time during which the preprocessing manager processes have been busy for the last minute. | Zabbix internal | zabbix[process,preprocessing manager,avg,busy] |
Zabbix server: Utilization of proxy poller data collector processes, in % | The average percentage of the time during which the proxy poller processes have been busy for the last minute. | Zabbix internal | zabbix[process,proxy poller,avg,busy] |
Zabbix server: Utilization of report manager internal processes, in % | The average percentage of the time during which the report manager processes have been busy for the last minute. | Zabbix internal | zabbix[process,report manager,avg,busy] |
Zabbix server: Utilization of report writer internal processes, in % | The average percentage of the time during which the report writer processes have been busy for the last minute. | Zabbix internal | zabbix[process,report writer,avg,busy] |
Zabbix server: Utilization of self-monitoring internal processes, in % | The average percentage of the time during which the self-monitoring processes have been busy for the last minute. | Zabbix internal | zabbix[process,self-monitoring,avg,busy] |
Zabbix server: Utilization of snmp trapper data collector processes, in % | The average percentage of the time during which the snmp trapper processes have been busy for the last minute. | Zabbix internal | zabbix[process,snmp trapper,avg,busy] |
Zabbix server: Utilization of task manager internal processes, in % | The average percentage of the time during which the task manager processes have been busy for the last minute. | Zabbix internal | zabbix[process,task manager,avg,busy] |
Zabbix server: Utilization of timer internal processes, in % | The average percentage of the time during which the timer processes have been busy for the last minute. | Zabbix internal | zabbix[process,timer,avg,busy] |
Zabbix server: Utilization of service manager internal processes, in % | The average percentage of the time during which the service manager processes have been busy for the last minute. | Zabbix internal | zabbix[process,service manager,avg,busy] |
Zabbix server: Utilization of trigger housekeeper internal processes, in % | The average percentage of the time during which the trigger housekeeper processes have been busy for the last minute. | Zabbix internal | zabbix[process,trigger housekeeper,avg,busy] |
Zabbix server: Utilization of trapper data collector processes, in % | The average percentage of the time during which the trapper processes have been busy for the last minute. | Zabbix internal | zabbix[process,trapper,avg,busy] |
Zabbix server: Utilization of unreachable poller data collector processes, in % | The average percentage of the time during which the unreachable poller processes have been busy for the last minute. | Zabbix internal | zabbix[process,unreachable poller,avg,busy] |
Zabbix server: Utilization of vmware data collector processes, in % | The average percentage of the time during which the vmware collector processes have been busy for the last minute. | Zabbix internal | zabbix[process,vmware collector,avg,busy] |
Zabbix server: Configuration cache, % used | The availability statistics of Zabbix configuration cache. The percentage of used data buffer. | Zabbix internal | zabbix[rcache,buffer,pused] |
Zabbix server: Trend function cache, % of unique requests | The effectiveness statistics of Zabbix trend function cache. The percentage of cached items calculated from the sum of cached items plus requests. Low percentage most likely means that the cache size can be reduced. | Zabbix internal | zabbix[tcache,cache,pitems] |
Zabbix server: Trend function cache, % of misses | The effectiveness statistics of Zabbix trend function cache. The percentage of cache misses. | Zabbix internal | zabbix[tcache,cache,pmisses] |
Zabbix server: Value cache, % used | The availability statistics of Zabbix value cache. The percentage of used data buffer. | Zabbix internal | zabbix[vcache,buffer,pused] |
Zabbix server: Value cache hits | The effectiveness statistics of Zabbix value cache. The number of cache hits (history values taken from the cache). | Zabbix internal | zabbix[vcache,cache,hits] Preprocessing |
Zabbix server: Value cache misses | The effectiveness statistics of Zabbix value cache. The number of cache misses (history values taken from the database). | Zabbix internal | zabbix[vcache,cache,misses] Preprocessing |
Zabbix server: Value cache operating mode | The operating mode of the value cache. | Zabbix internal | zabbix[vcache,cache,mode] |
Zabbix server: Version | A version of Zabbix server. | Zabbix internal | zabbix[version] Preprocessing |
Zabbix server: VMware cache, % used | The availability statistics of Zabbix vmware cache. The percentage of used data buffer. | Zabbix internal | zabbix[vmware,buffer,pused] |
Zabbix server: History write cache, % used | The statistics and availability of Zabbix write cache. The percentage of used history buffer. The history cache is used to store item values. A high number indicates performance problems on the database side. | Zabbix internal | zabbix[wcache,history,pused] |
Zabbix server: History index cache, % used | The statistics and availability of Zabbix write cache. The percentage of used history index buffer. The history index cache is used to index values stored in the history cache. | Zabbix internal | zabbix[wcache,index,pused] |
Zabbix server: Trend write cache, % used | The statistics and availability of Zabbix write cache. The percentage of used trend buffer. The trend cache stores the aggregate of all items that have received data for the current hour. | Zabbix internal | zabbix[wcache,trend,pused] |
Zabbix server: Number of processed values per second | The statistics and availability of Zabbix write cache. The total number of values processed by Zabbix server or Zabbix proxy, except unsupported items. | Zabbix internal | zabbix[wcache,values] Preprocessing |
Zabbix server: Number of processed numeric (float) values per second | The statistics and availability of Zabbix write cache. The number of processed float values. | Zabbix internal | zabbix[wcache,values,float] Preprocessing |
Zabbix server: Number of processed log values per second | The statistics and availability of Zabbix write cache. The number of processed log values. | Zabbix internal | zabbix[wcache,values,log] Preprocessing |
Zabbix server: Number of processed not supported values per second | The statistics and availability of Zabbix write cache. The number of times the item processing resulted in an item becoming unsupported or keeping that state. | Zabbix internal | zabbix[wcache,values,not supported] Preprocessing |
Zabbix server: Number of processed character values per second | The statistics and availability of Zabbix write cache. The number of processed character/string values. | Zabbix internal | zabbix[wcache,values,str] Preprocessing |
Zabbix server: Number of processed text values per second | The statistics and availability of Zabbix write cache. The number of processed text values. | Zabbix internal | zabbix[wcache,values,text] Preprocessing |
Zabbix server: LLD queue | The count of values enqueued in the low-level discovery processing queue. | Zabbix internal | zabbix[lld_queue] |
Zabbix server: Preprocessing queue | The count of values enqueued in the preprocessing queue. | Zabbix internal | zabbix[preprocessing_queue] |
Zabbix server: Connector queue | The count of values enqueued in the connector queue. | Zabbix internal | zabbix[connector_queue] |
Zabbix server: Number of processed numeric (unsigned) values per second | The statistics and availability of Zabbix write cache. The number of processed numeric (unsigned) values. | Zabbix internal | zabbix[wcache,values,uint] Preprocessing |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Zabbix server: More than 100 items having missing data for more than 10 minutes | The | min(/Zabbix server health/zabbix[queue,10m],10m)>100 | Warning | Manual close: Yes |
Zabbix server: Utilization of alert manager processes is high | | avg(/Zabbix server health/zabbix[process,alert manager,avg,busy],10m)>75 | Average | Manual close: Yes |
Zabbix server: Utilization of alert syncer processes is high | | avg(/Zabbix server health/zabbix[process,alert syncer,avg,busy],10m)>75 | Average | Manual close: Yes |
Zabbix server: Utilization of alerter processes is high | | avg(/Zabbix server health/zabbix[process,alerter,avg,busy],10m)>75 | Average | Manual close: Yes |
Zabbix server: Utilization of availability manager processes is high | | avg(/Zabbix server health/zabbix[process,availability manager,avg,busy],10m)>75 | Average | Manual close: Yes |
Zabbix server: Utilization of configuration syncer processes is high | | avg(/Zabbix server health/zabbix[process,configuration syncer,avg,busy],10m)>75 | Average | Manual close: Yes |
Zabbix server: Utilization of discoverer processes is high | | avg(/Zabbix server health/zabbix[process,discoverer,avg,busy],10m)>75 | Average | Manual close: Yes |
Zabbix server: Utilization of escalator processes is high | | avg(/Zabbix server health/zabbix[process,escalator,avg,busy],10m)>75 | Average | Manual close: Yes |
Zabbix server: Utilization of history poller processes is high | | avg(/Zabbix server health/zabbix[process,history poller,avg,busy],10m)>75 | Average | Manual close: Yes |
Zabbix server: Utilization of ODBC poller processes is high | | avg(/Zabbix server health/zabbix[process,odbc poller,avg,busy],10m)>75 | Average | Manual close: Yes |
Zabbix server: Utilization of history syncer processes is high | | avg(/Zabbix server health/zabbix[process,history syncer,avg,busy],10m)>75 | Average | Manual close: Yes |
Zabbix server: Utilization of housekeeper processes is high | | avg(/Zabbix server health/zabbix[process,housekeeper,avg,busy],10m)>75 | Average | Manual close: Yes |
Zabbix server: Utilization of http poller processes is high | | avg(/Zabbix server health/zabbix[process,http poller,avg,busy],10m)>75 | Average | Manual close: Yes |
Zabbix server: Utilization of icmp pinger processes is high | | avg(/Zabbix server health/zabbix[process,icmp pinger,avg,busy],10m)>75 | Average | Manual close: Yes |
Zabbix server: Utilization of ipmi manager processes is high | | avg(/Zabbix server health/zabbix[process,ipmi manager,avg,busy],10m)>75 | Average | Manual close: Yes |
Zabbix server: Utilization of ipmi poller processes is high | | avg(/Zabbix server health/zabbix[process,ipmi poller,avg,busy],10m)>75 | Average | Manual close: Yes |
Zabbix server: Utilization of java poller processes is high | | avg(/Zabbix server health/zabbix[process,java poller,avg,busy],10m)>75 | Average | Manual close: Yes |
Zabbix server: Utilization of lld manager processes is high | | avg(/Zabbix server health/zabbix[process,lld manager,avg,busy],10m)>75 | Average | Manual close: Yes |
Zabbix server: Utilization of lld worker processes is high | | avg(/Zabbix server health/zabbix[process,lld worker,avg,busy],10m)>75 | Average | Manual close: Yes |
Zabbix server: Utilization of connector manager processes is high | | avg(/Zabbix server health/zabbix[process,connector manager,avg,busy],10m)>75 | Average | Manual close: Yes |
Zabbix server: Utilization of connector worker processes is high | | avg(/Zabbix server health/zabbix[process,connector worker,avg,busy],10m)>75 | Average | Manual close: Yes |
Zabbix server: Utilization of poller processes is high | | avg(/Zabbix server health/zabbix[process,poller,avg,busy],10m)>75 | Average | Manual close: Yes |
Zabbix server: Utilization of preprocessing worker processes is high | | avg(/Zabbix server health/zabbix[process,preprocessing worker,avg,busy],10m)>75 | Average | Manual close: Yes |
Zabbix server: Utilization of preprocessing manager processes is high | | avg(/Zabbix server health/zabbix[process,preprocessing manager,avg,busy],10m)>75 | Average | Manual close: Yes |
Zabbix server: Utilization of proxy poller processes is high | | avg(/Zabbix server health/zabbix[process,proxy poller,avg,busy],10m)>75 | Average | Manual close: Yes |
Zabbix server: Utilization of report manager processes is high | | avg(/Zabbix server health/zabbix[process,report manager,avg,busy],10m)>75 | Average | Manual close: Yes |
Zabbix server: Utilization of report writer processes is high | | avg(/Zabbix server health/zabbix[process,report writer,avg,busy],10m)>75 | Average |
Manual close: Yes | ||
Zabbix server: Utilization of self-monitoring processes is high | avg(/Zabbix server health/zabbix[process,self-monitoring,avg,busy],10m)>75 |Average |
Manual close: Yes | ||
Zabbix server: Utilization of snmp trapper processes is high | avg(/Zabbix server health/zabbix[process,snmp trapper,avg,busy],10m)>75 |Average |
Manual close: Yes | ||
Zabbix server: Utilization of task manager processes is high | avg(/Zabbix server health/zabbix[process,task manager,avg,busy],10m)>75 |Average |
Manual close: Yes | ||
Zabbix server: Utilization of timer processes is high | avg(/Zabbix server health/zabbix[process,timer,avg,busy],10m)>75 |Average |
Manual close: Yes | ||
Zabbix server: Utilization of service manager processes is high | avg(/Zabbix server health/zabbix[process,service manager,avg,busy],10m)>75 |Average |
Manual close: Yes | ||
Zabbix server: Utilization of trigger housekeeper processes is high | avg(/Zabbix server health/zabbix[process,trigger housekeeper,avg,busy],10m)>75 |Average |
Manual close: Yes | ||
Zabbix server: Utilization of trapper processes is high | avg(/Zabbix server health/zabbix[process,trapper,avg,busy],10m)>75 |Average |
Manual close: Yes | ||
Zabbix server: Utilization of unreachable poller processes is high | avg(/Zabbix server health/zabbix[process,unreachable poller,avg,busy],10m)>75 |Average |
Manual close: Yes | ||
Zabbix server: Utilization of vmware collector processes is high | avg(/Zabbix server health/zabbix[process,vmware collector,avg,busy],10m)>75 |Average |
Manual close: Yes | ||
Zabbix server: More than 75% used in the configuration cache | Consider increasing CacheSize in the zabbix_server.conf configuration file (see the example excerpt after this table). |
max(/Zabbix server health/zabbix[rcache,buffer,pused],10m)>75 |Average |
Manual close: Yes | |
Zabbix server: More than 95% used in the value cache | Consider increasing ValueCacheSize in the zabbix_server.conf configuration file. |
max(/Zabbix server health/zabbix[vcache,buffer,pused],10m)>95 |Average |
Manual close: Yes | |
Zabbix server: Zabbix value cache working in low memory mode | Once the low memory mode has been switched on, the value cache will remain in this state for 24 hours, even if the problem that triggered this mode is resolved sooner. |
last(/Zabbix server health/zabbix[vcache,cache,mode])=1 |High |
Manual close: Yes | |
Zabbix server: Version has changed | Zabbix server version has changed. Acknowledge to close the problem manually. |
last(/Zabbix server health/zabbix[version],#1)<>last(/Zabbix server health/zabbix[version],#2) and length(last(/Zabbix server health/zabbix[version]))>0 |Info |
Manual close: Yes | |
Zabbix server: More than 75% used in the vmware cache | Consider increasing VMwareCacheSize in the zabbix_server.conf configuration file. |
max(/Zabbix server health/zabbix[vmware,buffer,pused],10m)>75 |Average |
Manual close: Yes | |
Zabbix server: More than 75% used in the history cache | Consider increasing HistoryCacheSize in the zabbix_server.conf configuration file. |
max(/Zabbix server health/zabbix[wcache,history,pused],10m)>75 |Average |
Manual close: Yes | |
Zabbix server: More than 75% used in the history index cache | Consider increasing HistoryIndexCacheSize in the zabbix_server.conf configuration file. |
max(/Zabbix server health/zabbix[wcache,index,pused],10m)>75 |Average |
Manual close: Yes | |
Zabbix server: More than 75% used in the trends cache | Consider increasing TrendCacheSize in the zabbix_server.conf configuration file. |
max(/Zabbix server health/zabbix[wcache,trend,pused],10m)>75 |Average |
Manual close: Yes |
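The cache triggers above all point at sizing parameters in zabbix_server.conf. As a quick reference, the excerpt below is a hypothetical sketch of those parameters; the option names are the real configuration keys, but the sizes are placeholders and should be tuned to the installation (changes require a server restart).

```
# Hypothetical zabbix_server.conf excerpt - sizes are illustrative only
CacheSize=64M              # configuration cache
ValueCacheSize=256M        # value cache (also related to low memory mode)
VMwareCacheSize=16M        # vmware cache
HistoryCacheSize=128M      # history write cache
HistoryIndexCacheSize=32M  # history index cache
TrendCacheSize=32M         # trends cache
```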
Name | Description | Type | Key and additional info |
---|---|---|---|
Zabbix proxy discovery | LLD rule with item and trigger prototypes for the proxy discovery. |
Dependent item | zabbix.proxy.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Proxy [{#PROXY.NAME}]: Stats | The statistics for the discovered proxy. |
Dependent item | zabbix.proxy.stats[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Mode | The mode of Zabbix proxy. |
Dependent item | zabbix.proxy.mode[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Unencrypted | The encryption status for connections from a proxy. |
Dependent item | zabbix.proxy.unencrypted[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: PSK | The encryption status for connections from a proxy. |
Dependent item | zabbix.proxy.psk[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Certificate | The encryption status for connections from a proxy. |
Dependent item | zabbix.proxy.cert[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Compression | The compression status of a proxy. |
Dependent item | zabbix.proxy.compression[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Item count | The number of enabled items on enabled hosts assigned to a proxy. |
Dependent item | zabbix.proxy.items[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Host count | The number of enabled hosts assigned to a proxy. |
Dependent item | zabbix.proxy.hosts[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Version | A version of Zabbix proxy. |
Dependent item | zabbix.proxy.version[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Last seen, in seconds | The time (in seconds) since the proxy was last seen by the server. |
Dependent item | zabbix.proxy.last_seen[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Compatibility | Version of proxy compared to Zabbix server version. Possible values: 0 - Undefined; 1 - Current version (proxy and server have the same major version); 2 - Outdated version (proxy version is older than server version, but is partially supported); 3 - Unsupported version (proxy version is older than server previous LTS release version or server major version is older than proxy major version). |
Dependent item | zabbix.proxy.compatibility[{#PROXY.NAME}] Preprocessing
|
Proxy [{#PROXY.NAME}]: Required VPS | The required performance of a proxy (the number of values that need to be collected per second). |
Dependent item | zabbix.proxy.requiredperformance[{#PROXY.NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Proxy [{#PROXY.NAME}]: Proxy last seen | Zabbix proxy is not updating the configuration data. |
last(/Zabbix server health/zabbix.proxy.last_seen[{#PROXY.NAME}],#1)>{$PROXY.LAST_SEEN.MAX} |Warning |
||
Proxy [{#PROXY.NAME}]: Zabbix proxy never seen | Zabbix proxy is not updating the configuration data. |
last(/Zabbix server health/zabbix.proxy.last_seen[{#PROXY.NAME}],#1)=-1 |Warning |
||
Proxy [{#PROXY.NAME}]: Zabbix proxy is outdated | Zabbix proxy version is older than the server version but is partially supported. Only data collection and remote execution are available. |
last(/Zabbix server health/zabbix.proxy.compatibility[{#PROXY.NAME}],#1)=2 |Warning |
||
Proxy [{#PROXY.NAME}]: Zabbix proxy is not supported | Zabbix proxy version is older than server previous LTS release version or server major version is older than proxy major version. |
last(/Zabbix server health/zabbix.proxy.compatibility[{#PROXY.NAME}],#1)=3 |High |
Name | Description | Type | Key and additional info |
---|---|---|---|
High availability cluster node discovery | LLD rule with item and trigger prototypes for the node discovery. |
Dependent item | zabbix.nodes.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Cluster node [{#NODE.NAME}]: Stats | Provides the statistics of a node. |
Dependent item | zabbix.nodes.stats[{#NODE.ID}] Preprocessing
|
Cluster node [{#NODE.NAME}]: Address | The IPv4 address of a node. |
Dependent item | zabbix.nodes.address[{#NODE.ID}] Preprocessing
|
Cluster node [{#NODE.NAME}]: Last access time | Last access time. |
Dependent item | zabbix.nodes.lastaccess.time[{#NODE.ID}] Preprocessing
|
Cluster node [{#NODE.NAME}]: Last access age | The time between the database's unix_timestamp() and the last access time. |
Dependent item | zabbix.nodes.lastaccess.age[{#NODE.ID}] Preprocessing
|
Cluster node [{#NODE.NAME}]: Status | The status of a node. |
Dependent item | zabbix.nodes.status[{#NODE.ID}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Cluster node [{#NODE.NAME}]: Status changed | The state of the node has changed. Acknowledge to close the problem manually. |
last(/Zabbix server health/zabbix.nodes.status[{#NODE.ID}],#1)<>last(/Zabbix server health/zabbix.nodes.status[{#NODE.ID}],#2) |Info |
Manual close: Yes |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
Name | Description | Default |
---|---|---|
{$ZABBIX.PROXY.ADDRESS} | IP/DNS/network mask list of proxies to be remotely queried (default is 127.0.0.1). See the proxy-side configuration note after this table. |
|
{$ZABBIX.PROXY.PORT} | Port of proxy to be remotely queried (default is 10051). |
|
{$ZABBIX.PROXY.UTIL.MAX} | Maximum average percentage of time processes busy in the last minute (default is 75). |
75 |
{$ZABBIX.PROXY.UTIL.MIN} | Minimum average percentage of time processes busy in the last minute (default is 65). |
65 |
{$ZABBIX.PROXY.NODATA_TIMEOUT} | The time threshold after which statistics are considered unavailable. Used in trigger expression. |
5m |
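The zabbix[stats,...] items in this template query the remote proxy's internal statistics over the network, so the proxy must allow the requesting address. A minimal sketch, assuming the Zabbix server sits at the placeholder address 192.0.2.10 and the proxy listens on the default port 10051:

```
# Hypothetical excerpt from the remote proxy's zabbix_proxy.conf:
# permit the Zabbix server to request internal statistics remotely.
StatsAllowedIP=192.0.2.10

# The master item on the monitoring side then uses the key
#   zabbix[stats,{$ZABBIX.PROXY.ADDRESS},{$ZABBIX.PROXY.PORT}]
# with the macros above pointing at the proxy's address and port.
```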
Name | Description | Type | Key and additional info |
---|---|---|---|
Remote Zabbix proxy: Zabbix stats | The master item for remote Zabbix proxy statistics. |
Zabbix internal | zabbix[stats,{$ZABBIX.PROXY.ADDRESS},{$ZABBIX.PROXY.PORT}] |
Remote Zabbix proxy: Zabbix stats queue over 10m | Number of monitored items in the queue which are delayed by at least 10 minutes. |
Zabbix internal | zabbix[stats,{$ZABBIX.PROXY.ADDRESS},{$ZABBIX.PROXY.PORT},queue,10m] Preprocessing
|
Remote Zabbix proxy: Zabbix stats queue | Number of monitored items in the queue which are delayed by at least 6 seconds. |
Zabbix internal | zabbix[stats,{$ZABBIX.PROXY.ADDRESS},{$ZABBIX.PROXY.PORT},queue] Preprocessing
|
Remote Zabbix proxy: Utilization of data sender internal processes, in % | Average percentage of time data sender processes have been busy in the last minute. |
Dependent item | process.data_sender.avg.busy Preprocessing
|
Remote Zabbix proxy: Utilization of availability manager internal processes, in % | Average percentage of time availability manager processes have been busy in the last minute. |
Dependent item | process.availability_manager.avg.busy Preprocessing
|
Remote Zabbix proxy: Utilization of configuration syncer internal processes, in % | Average percentage of time configuration syncer processes have been busy in the last minute. |
Dependent item | process.configuration_syncer.avg.busy Preprocessing
|
Remote Zabbix proxy: Utilization of discoverer data collector processes, in % | Average percentage of time discoverer processes have been busy in the last minute. |
Dependent item | process.discoverer.avg.busy Preprocessing
|
Remote Zabbix proxy: Utilization of ODBC poller data collector processes, in % | Average percentage of time ODBC poller processes have been busy in the last minute. |
Dependent item | process.odbc_poller.avg.busy Preprocessing
|
Remote Zabbix proxy: Utilization of history poller data collector processes, in % | Average percentage of time history poller processes have been busy in the last minute. |
Dependent item | process.history_poller.avg.busy Preprocessing
|
Remote Zabbix proxy: Utilization of history syncer internal processes, in % | Average percentage of time history syncer processes have been busy in the last minute. |
Dependent item | process.history_syncer.avg.busy Preprocessing
|
Remote Zabbix proxy: Utilization of housekeeper internal processes, in % | Average percentage of time housekeeper processes have been busy in the last minute. |
Dependent item | process.housekeeper.avg.busy Preprocessing
|
Remote Zabbix proxy: Utilization of http poller data collector processes, in % | Average percentage of time http poller processes have been busy in the last minute. |
Dependent item | process.http_poller.avg.busy Preprocessing
|
Remote Zabbix proxy: Utilization of icmp pinger data collector processes, in % | Average percentage of time icmp pinger processes have been busy in the last minute. |
Dependent item | process.icmp_pinger.avg.busy Preprocessing
|
Remote Zabbix proxy: Utilization of ipmi manager internal processes, in % | Average percentage of time ipmi manager processes have been busy in the last minute. |
Dependent item | process.ipmi_manager.avg.busy Preprocessing
|
Remote Zabbix proxy: Utilization of ipmi poller data collector processes, in % | Average percentage of time ipmi poller processes have been busy in the last minute. |
Dependent item | process.ipmi_poller.avg.busy Preprocessing
|
Remote Zabbix proxy: Utilization of java poller data collector processes, in % | Average percentage of time java poller processes have been busy in the last minute. |
Dependent item | process.java_poller.avg.busy Preprocessing
|
Remote Zabbix proxy: Utilization of poller data collector processes, in % | Average percentage of time poller processes have been busy in the last minute. |
Dependent item | process.poller.avg.busy Preprocessing
|
Remote Zabbix proxy: Utilization of preprocessing worker internal processes, in % | Average percentage of time preprocessing worker processes have been busy in the last minute. |
Dependent item | process.preprocessing_worker.avg.busy Preprocessing
|
Remote Zabbix proxy: Utilization of preprocessing manager internal processes, in % | Average percentage of time preprocessing manager processes have been busy in the last minute. |
Dependent item | process.preprocessing_manager.avg.busy Preprocessing
|
Remote Zabbix proxy: Utilization of self-monitoring internal processes, in % | Average percentage of time self-monitoring processes have been busy in the last minute. |
Dependent item | process.self-monitoring.avg.busy Preprocessing
|
Remote Zabbix proxy: Utilization of snmp trapper data collector processes, in % | Average percentage of time snmp trapper processes have been busy in the last minute. |
Dependent item | process.snmp_trapper.avg.busy Preprocessing
|
Remote Zabbix proxy: Utilization of task manager internal processes, in % | Average percentage of time task manager processes have been busy in the last minute. |
Dependent item | process.task_manager.avg.busy Preprocessing
|
Remote Zabbix proxy: Utilization of trapper data collector processes, in % | Average percentage of time trapper processes have been busy in the last minute. |
Dependent item | process.trapper.avg.busy Preprocessing
|
Remote Zabbix proxy: Utilization of unreachable poller data collector processes, in % | Average percentage of time unreachable poller processes have been busy in the last minute. |
Dependent item | process.unreachable_poller.avg.busy Preprocessing
|
Remote Zabbix proxy: Utilization of vmware data collector processes, in % | Average percentage of time vmware collector processes have been busy in the last minute. |
Dependent item | process.vmware_collector.avg.busy Preprocessing
|
Remote Zabbix proxy: Configuration cache, % used | Availability statistics of Zabbix configuration cache. Percentage of used buffer. |
Dependent item | rcache.buffer.pused Preprocessing
|
Remote Zabbix proxy: Version | Version of Zabbix proxy. |
Dependent item | version Preprocessing
|
Remote Zabbix proxy: VMware cache, % used | Availability statistics of Zabbix vmware cache. Percentage of used buffer. |
Dependent item | vmware.buffer.pused Preprocessing
|
Remote Zabbix proxy: History write cache, % used | Statistics and availability of Zabbix write cache. Percentage of used history buffer. History cache is used to store item values. A high number indicates performance problems on the database side. |
Dependent item | wcache.history.pused Preprocessing
|
Remote Zabbix proxy: History index cache, % used | Statistics and availability of Zabbix write cache. Percentage of used history index buffer. History index cache is used to index values stored in history cache. |
Dependent item | wcache.index.pused Preprocessing
|
Remote Zabbix proxy: Number of processed values per second | Statistics and availability of Zabbix write cache. Total number of values processed by Zabbix server or Zabbix proxy, except unsupported items. |
Dependent item | wcache.values Preprocessing
|
Remote Zabbix proxy: Number of processed numeric (float) values per second | Statistics and availability of Zabbix write cache. Number of processed float values. |
Dependent item | wcache.values.float Preprocessing
|
Remote Zabbix proxy: Number of processed log values per second | Statistics and availability of Zabbix write cache. Number of processed log values. |
Dependent item | wcache.values.log Preprocessing
|
Remote Zabbix proxy: Number of processed not supported values per second | Statistics and availability of Zabbix write cache. Number of times item processing resulted in item becoming unsupported or keeping that state. |
Dependent item | wcache.values.not_supported Preprocessing
|
Remote Zabbix proxy: Number of processed character values per second | Statistics and availability of Zabbix write cache. Number of processed character/string values. |
Dependent item | wcache.values.str Preprocessing
|
Remote Zabbix proxy: Number of processed text values per second | Statistics and availability of Zabbix write cache. Number of processed text values. |
Dependent item | wcache.values.text Preprocessing
|
Remote Zabbix proxy: Preprocessing queue | Count of values enqueued in the preprocessing queue. |
Dependent item | preprocessing_queue Preprocessing
|
Remote Zabbix proxy: Number of processed numeric (unsigned) values per second | Statistics and availability of Zabbix write cache. Number of processed numeric (unsigned) values. |
Dependent item | wcache.values.uint Preprocessing
|
Remote Zabbix proxy: Required performance | Required performance of Zabbix proxy: the number of new values expected per second. |
Dependent item | requiredperformance Preprocessing
|
Remote Zabbix proxy: Uptime | Uptime of Zabbix proxy process in seconds. |
Dependent item | uptime Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Remote Zabbix proxy: More than 100 items having missing data for more than 10 minutes | The zabbix[stats,{$ZABBIX.PROXY.ADDRESS},{$ZABBIX.PROXY.PORT},queue,10m] item collects data about how many items have been missing data for more than 10 minutes. |
min(/Remote Zabbix proxy health/zabbix[stats,{$ZABBIX.PROXY.ADDRESS},{$ZABBIX.PROXY.PORT},queue,10m],10m)>100 |Warning |
Manual close: Yes | |
Remote Zabbix proxy: Utilization of data sender processes is high | avg(/Remote Zabbix proxy health/process.data_sender.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"data sender"} |Average |
Manual close: Yes | ||
Remote Zabbix proxy: Utilization of availability manager processes is high | avg(/Remote Zabbix proxy health/process.availability_manager.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"availability manager"} |Average |
Manual close: Yes | ||
Remote Zabbix proxy: Utilization of configuration syncer processes is high | avg(/Remote Zabbix proxy health/process.configuration_syncer.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"configuration syncer"} |Average |
Manual close: Yes | ||
Remote Zabbix proxy: Utilization of discoverer processes is high | avg(/Remote Zabbix proxy health/process.discoverer.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"discoverer"} |Average |
Manual close: Yes | ||
Remote Zabbix proxy: Utilization of ODBC poller processes is high | avg(/Remote Zabbix proxy health/process.odbc_poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"ODBC poller"} |Average |
Manual close: Yes | ||
Remote Zabbix proxy: Utilization of history poller processes is high | avg(/Remote Zabbix proxy health/process.history_poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"history poller"} |Average |
Manual close: Yes | ||
Remote Zabbix proxy: Utilization of history syncer processes is high | avg(/Remote Zabbix proxy health/process.history_syncer.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"history syncer"} |Average |
Manual close: Yes | ||
Remote Zabbix proxy: Utilization of housekeeper processes is high | avg(/Remote Zabbix proxy health/process.housekeeper.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"housekeeper"} |Average |
Manual close: Yes | ||
Remote Zabbix proxy: Utilization of http poller processes is high | avg(/Remote Zabbix proxy health/process.http_poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"http poller"} |Average |
Manual close: Yes | ||
Remote Zabbix proxy: Utilization of icmp pinger processes is high | avg(/Remote Zabbix proxy health/process.icmp_pinger.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"icmp pinger"} |Average |
Manual close: Yes | ||
Remote Zabbix proxy: Utilization of ipmi manager processes is high | avg(/Remote Zabbix proxy health/process.ipmi_manager.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"ipmi manager"} |Average |
Manual close: Yes | ||
Remote Zabbix proxy: Utilization of ipmi poller processes is high | avg(/Remote Zabbix proxy health/process.ipmi_poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"ipmi poller"} |Average |
Manual close: Yes | ||
Remote Zabbix proxy: Utilization of java poller processes is high | avg(/Remote Zabbix proxy health/process.java_poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"java poller"} |Average |
Manual close: Yes | ||
Remote Zabbix proxy: Utilization of poller processes is high | avg(/Remote Zabbix proxy health/process.poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"poller"} |Average |
Manual close: Yes | ||
Remote Zabbix proxy: Utilization of preprocessing worker processes is high | avg(/Remote Zabbix proxy health/process.preprocessing_worker.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"preprocessing worker"} |Average |
Manual close: Yes | ||
Remote Zabbix proxy: Utilization of preprocessing manager processes is high | avg(/Remote Zabbix proxy health/process.preprocessing_manager.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"preprocessing manager"} |Average |
Manual close: Yes | ||
Remote Zabbix proxy: Utilization of self-monitoring processes is high | avg(/Remote Zabbix proxy health/process.self-monitoring.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"self-monitoring"} |Average |
Manual close: Yes | ||
Remote Zabbix proxy: Utilization of snmp trapper processes is high | avg(/Remote Zabbix proxy health/process.snmp_trapper.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"snmp trapper"} |Average |
Manual close: Yes | ||
Remote Zabbix proxy: Utilization of task manager processes is high | avg(/Remote Zabbix proxy health/process.task_manager.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"task manager"} |Average |
Manual close: Yes | ||
Remote Zabbix proxy: Utilization of trapper processes is high | avg(/Remote Zabbix proxy health/process.trapper.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"trapper"} |Average |
Manual close: Yes | ||
Remote Zabbix proxy: Utilization of unreachable poller processes is high | avg(/Remote Zabbix proxy health/process.unreachable_poller.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"unreachable poller"} |Average |
Manual close: Yes | ||
Remote Zabbix proxy: Utilization of vmware collector processes is high | avg(/Remote Zabbix proxy health/process.vmware_collector.avg.busy,10m)>{$ZABBIX.PROXY.UTIL.MAX:"vmware collector"} |Average |
Manual close: Yes | ||
Remote Zabbix proxy: More than {$ZABBIX.PROXY.UTIL.MAX}% used in the configuration cache | Consider increasing CacheSize in the zabbix_proxy.conf configuration file. |
max(/Remote Zabbix proxy health/rcache.buffer.pused,10m)>{$ZABBIX.PROXY.UTIL.MAX} |Average |
Manual close: Yes | |
Remote Zabbix proxy: Failed to fetch stats data | Zabbix has not received statistics data for {$ZABBIX.PROXY.NODATA_TIMEOUT}. |
nodata(/Remote Zabbix proxy health/rcache.buffer.pused,{$ZABBIX.PROXY.NODATA_TIMEOUT})=1 |Warning |
||
Remote Zabbix proxy: Version has changed | Zabbix proxy version has changed. Acknowledge to close the problem manually. |
last(/Remote Zabbix proxy health/version,#1)<>last(/Remote Zabbix proxy health/version,#2) and length(last(/Remote Zabbix proxy health/version))>0 |Info |
Manual close: Yes | |
Remote Zabbix proxy: More than {$ZABBIX.PROXY.UTIL.MAX}% used in the vmware cache | Consider increasing VMwareCacheSize in the zabbix_proxy.conf configuration file. |
max(/Remote Zabbix proxy health/vmware.buffer.pused,10m)>{$ZABBIX.PROXY.UTIL.MAX} |Average |
Manual close: Yes | |
Remote Zabbix proxy: More than {$ZABBIX.PROXY.UTIL.MAX}% used in the history cache | Consider increasing HistoryCacheSize in the zabbix_proxy.conf configuration file. |
max(/Remote Zabbix proxy health/wcache.history.pused,10m)>{$ZABBIX.PROXY.UTIL.MAX} |Average |
Manual close: Yes | |
Remote Zabbix proxy: More than {$ZABBIX.PROXY.UTIL.MAX}% used in the history index cache | Consider increasing HistoryIndexCacheSize in the zabbix_proxy.conf configuration file. |
max(/Remote Zabbix proxy health/wcache.index.pused,10m)>{$ZABBIX.PROXY.UTIL.MAX} |Average |
Manual close: Yes | |
Remote Zabbix proxy: {HOST.NAME} has been restarted | Uptime is less than 10 minutes. |
last(/Remote Zabbix proxy health/uptime)<10m |Info |
Manual close: Yes |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
Name | Description | Default |
---|---|---|
{$ZABBIX.PROXY.UTIL.MAX} | Maximum average percentage of time processes busy in the last minute (default is 75). Can be overridden per process type via macro context, as shown after this table. |
75 |
{$ZABBIX.PROXY.UTIL.MIN} | Minimum average percentage of time processes busy in the last minute (default is 65). |
65 |
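The utilization triggers below reference the threshold macro with context, e.g. {$ZABBIX.PROXY.UTIL.MAX:"poller"}, so a single process type can get its own limit while all others keep the default. A hypothetical per-host override (the process name and value are examples only):

```
# User macro overrides on the host (illustrative values):
{$ZABBIX.PROXY.UTIL.MAX}="75"                   # default threshold for all process types
{$ZABBIX.PROXY.UTIL.MAX:"history syncer"}="90"  # raised limit for history syncer only
```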
Name | Description | Type | Key and additional info |
---|---|---|---|
Zabbix proxy: Queue over 10 minutes | Number of monitored items in the queue which are delayed by at least 10 minutes. |
Zabbix internal | zabbix[queue,10m] |
Zabbix proxy: Queue | Number of monitored items in the queue which are delayed by at least 6 seconds. |
Zabbix internal | zabbix[queue] |
Zabbix proxy: Utilization of data sender internal processes, in % | Average percentage of time data sender processes have been busy in the last minute. |
Zabbix internal | zabbix[process,data sender,avg,busy] |
Zabbix proxy: Utilization of availability manager internal processes, in % | Average percentage of time availability manager processes have been busy in the last minute. |
Zabbix internal | zabbix[process,availability manager,avg,busy] |
Zabbix proxy: Utilization of configuration syncer internal processes, in % | Average percentage of time configuration syncer processes have been busy in the last minute. |
Zabbix internal | zabbix[process,configuration syncer,avg,busy] |
Zabbix proxy: Utilization of discoverer data collector processes, in % | Average percentage of time discoverer processes have been busy in the last minute. |
Zabbix internal | zabbix[process,discoverer,avg,busy] |
Zabbix proxy: Utilization of ODBC poller data collector processes, in % | Average percentage of time ODBC poller processes have been busy in the last minute. |
Zabbix internal | zabbix[process,odbc poller,avg,busy] |
Zabbix proxy: Utilization of history poller data collector processes, in % | Average percentage of time history poller processes have been busy in the last minute. |
Zabbix internal | zabbix[process,history poller,avg,busy] |
Zabbix proxy: Utilization of history syncer internal processes, in % | Average percentage of time history syncer processes have been busy in the last minute. |
Zabbix internal | zabbix[process,history syncer,avg,busy] |
Zabbix proxy: Utilization of housekeeper internal processes, in % | Average percentage of time housekeeper processes have been busy in the last minute. |
Zabbix internal | zabbix[process,housekeeper,avg,busy] |
Zabbix proxy: Utilization of http poller data collector processes, in % | Average percentage of time http poller processes have been busy in the last minute. |
Zabbix internal | zabbix[process,http poller,avg,busy] |
Zabbix proxy: Utilization of icmp pinger data collector processes, in % | Average percentage of time icmp pinger processes have been busy in the last minute. |
Zabbix internal | zabbix[process,icmp pinger,avg,busy] |
Zabbix proxy: Utilization of ipmi manager internal processes, in % | Average percentage of time ipmi manager processes have been busy in the last minute. |
Zabbix internal | zabbix[process,ipmi manager,avg,busy] |
Zabbix proxy: Utilization of ipmi poller data collector processes, in % | Average percentage of time ipmi poller processes have been busy in the last minute. |
Zabbix internal | zabbix[process,ipmi poller,avg,busy] |
Zabbix proxy: Utilization of java poller data collector processes, in % | Average percentage of time java poller processes have been busy in the last minute. |
Zabbix internal | zabbix[process,java poller,avg,busy] |
Zabbix proxy: Utilization of poller data collector processes, in % | Average percentage of time poller processes have been busy in the last minute. |
Zabbix internal | zabbix[process,poller,avg,busy] |
Zabbix proxy: Utilization of preprocessing worker internal processes, in % | Average percentage of time preprocessing worker processes have been busy in the last minute. |
Zabbix internal | zabbix[process,preprocessing worker,avg,busy] |
Zabbix proxy: Utilization of preprocessing manager internal processes, in % | Average percentage of time preprocessing manager processes have been busy in the last minute. |
Zabbix internal | zabbix[process,preprocessing manager,avg,busy] |
Zabbix proxy: Utilization of self-monitoring internal processes, in % | Average percentage of time self-monitoring processes have been busy in the last minute. |
Zabbix internal | zabbix[process,self-monitoring,avg,busy] |
Zabbix proxy: Utilization of snmp trapper data collector processes, in % | Average percentage of time snmp trapper processes have been busy in the last minute. |
Zabbix internal | zabbix[process,snmp trapper,avg,busy] |
Zabbix proxy: Utilization of task manager internal processes, in % | Average percentage of time task manager processes have been busy in the last minute. |
Zabbix internal | zabbix[process,task manager,avg,busy] |
Zabbix proxy: Utilization of trapper data collector processes, in % | Average percentage of time trapper processes have been busy in the last minute. |
Zabbix internal | zabbix[process,trapper,avg,busy] |
Zabbix proxy: Utilization of unreachable poller data collector processes, in % | Average percentage of time unreachable poller processes have been busy in the last minute. |
Zabbix internal | zabbix[process,unreachable poller,avg,busy] |
Zabbix proxy: Utilization of vmware data collector processes, in % | Average percentage of time vmware collector processes have been busy in the last minute. |
Zabbix internal | zabbix[process,vmware collector,avg,busy] |
Zabbix proxy: Configuration cache, % used | Availability statistics of Zabbix configuration cache. Percentage of used buffer. |
Zabbix internal | zabbix[rcache,buffer,pused] |
Zabbix proxy: Version | Version of Zabbix proxy. |
Zabbix internal | zabbix[version] Preprocessing
|
Zabbix proxy: VMware cache, % used | Availability statistics of Zabbix vmware cache. Percentage of used buffer. |
Zabbix internal | zabbix[vmware,buffer,pused] |
Zabbix proxy: History write cache, % used | Statistics and availability of Zabbix write cache. Percentage of used history buffer. History cache is used to store item values. A high number indicates performance problems on the database side. |
Zabbix internal | zabbix[wcache,history,pused] |
Zabbix proxy: History index cache, % used | Statistics and availability of Zabbix write cache. Percentage of used history index buffer. History index cache is used to index values stored in history cache. |
Zabbix internal | zabbix[wcache,index,pused] |
Zabbix proxy: Number of processed values per second | Statistics and availability of Zabbix write cache. Total number of values processed by Zabbix server or Zabbix proxy, except unsupported items. |
Zabbix internal | zabbix[wcache,values] Preprocessing
|
Zabbix proxy: Number of processed numeric (float) values per second | Statistics and availability of Zabbix write cache. Number of processed float values. |
Zabbix internal | zabbix[wcache,values,float] Preprocessing
|
Zabbix proxy: Number of processed log values per second | Statistics and availability of Zabbix write cache. Number of processed log values. |
Zabbix internal | zabbix[wcache,values,log] Preprocessing
|
Zabbix proxy: Number of processed not supported values per second | Statistics and availability of Zabbix write cache. Number of times item processing resulted in item becoming unsupported or keeping that state. |
Zabbix internal | zabbix[wcache,values,not supported] Preprocessing
|
Zabbix proxy: Number of processed character values per second | Statistics and availability of Zabbix write cache. Number of processed character/string values. |
Zabbix internal | zabbix[wcache,values,str] Preprocessing
|
Zabbix proxy: Number of processed text values per second | Statistics and availability of Zabbix write cache. Number of processed text values. |
Zabbix internal | zabbix[wcache,values,text] Preprocessing
|
Zabbix proxy: Preprocessing queue | Count of values enqueued in the preprocessing queue. |
Zabbix internal | zabbix[preprocessing_queue] |
Zabbix proxy: Number of processed numeric (unsigned) values per second | Statistics and availability of Zabbix write cache. Number of processed numeric (unsigned) values. |
Zabbix internal | zabbix[wcache,values,uint] Preprocessing
|
Zabbix proxy: Values waiting to be sent | Number of values in the proxy history table waiting to be sent to the server. |
Zabbix internal | zabbix[proxy_history] |
Zabbix proxy: Required performance | Required performance of Zabbix proxy: the number of new values expected per second. |
Zabbix internal | zabbix[requiredperformance] |
Zabbix proxy: Uptime | Uptime of Zabbix proxy process in seconds. |
Zabbix internal | zabbix[uptime] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Zabbix proxy: More than 100 items having missing data for more than 10 minutes | The zabbix[queue,10m] item collects data about how many items have been missing data for more than 10 minutes. |
min(/Zabbix proxy health/zabbix[queue,10m],10m)>100 |Warning |
Manual close: Yes | |
Zabbix proxy: Utilization of data sender processes is high | avg(/Zabbix proxy health/zabbix[process,data sender,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"data sender"} |Average |
Manual close: Yes | ||
Zabbix proxy: Utilization of availability manager processes is high | avg(/Zabbix proxy health/zabbix[process,availability manager,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"availability manager"} |Average |
Manual close: Yes | ||
Zabbix proxy: Utilization of configuration syncer processes is high | avg(/Zabbix proxy health/zabbix[process,configuration syncer,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"configuration syncer"} |Average |
Manual close: Yes | ||
Zabbix proxy: Utilization of discoverer processes is high | avg(/Zabbix proxy health/zabbix[process,discoverer,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"discoverer"} |Average |
Manual close: Yes | ||
Zabbix proxy: Utilization of ODBC poller processes is high | avg(/Zabbix proxy health/zabbix[process,odbc poller,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"ODBC poller"} |Average |
Manual close: Yes | ||
Zabbix proxy: Utilization of history poller processes is high | avg(/Zabbix proxy health/zabbix[process,history poller,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"history poller"} |Average |
Manual close: Yes | ||
Zabbix proxy: Utilization of history syncer processes is high | avg(/Zabbix proxy health/zabbix[process,history syncer,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"history syncer"} |Average |
Manual close: Yes | ||
Zabbix proxy: Utilization of housekeeper processes is high | avg(/Zabbix proxy health/zabbix[process,housekeeper,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"housekeeper"} |Average |
Manual close: Yes | ||
Zabbix proxy: Utilization of http poller processes is high | avg(/Zabbix proxy health/zabbix[process,http poller,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"http poller"} |Average |
Manual close: Yes | ||
Zabbix proxy: Utilization of icmp pinger processes is high | avg(/Zabbix proxy health/zabbix[process,icmp pinger,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"icmp pinger"} |Average |
Manual close: Yes | ||
Zabbix proxy: Utilization of ipmi manager processes is high | avg(/Zabbix proxy health/zabbix[process,ipmi manager,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"ipmi manager"} |Average |
Manual close: Yes | ||
Zabbix proxy: Utilization of ipmi poller processes is high | avg(/Zabbix proxy health/zabbix[process,ipmi poller,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"ipmi poller"} |Average |
Manual close: Yes | ||
Zabbix proxy: Utilization of java poller processes is high | avg(/Zabbix proxy health/zabbix[process,java poller,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"java poller"} |Average |
Manual close: Yes | ||
Zabbix proxy: Utilization of poller processes is high | avg(/Zabbix proxy health/zabbix[process,poller,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"poller"} |Average |
Manual close: Yes | ||
Zabbix proxy: Utilization of preprocessing worker processes is high | avg(/Zabbix proxy health/zabbix[process,preprocessing worker,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"preprocessing worker"} |Average |
Manual close: Yes | ||
Zabbix proxy: Utilization of preprocessing manager processes is high | avg(/Zabbix proxy health/zabbix[process,preprocessing manager,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"preprocessing manager"} |Average |
Manual close: Yes | ||
Zabbix proxy: Utilization of self-monitoring processes is high | avg(/Zabbix proxy health/zabbix[process,self-monitoring,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"self-monitoring"} |Average |
Manual close: Yes | ||
Zabbix proxy: Utilization of snmp trapper processes is high | avg(/Zabbix proxy health/zabbix[process,snmp trapper,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"snmp trapper"} |Average |
Manual close: Yes | ||
Zabbix proxy: Utilization of task manager processes is high | avg(/Zabbix proxy health/zabbix[process,task manager,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"task manager"} |Average |
Manual close: Yes | ||
Zabbix proxy: Utilization of trapper processes is high | avg(/Zabbix proxy health/zabbix[process,trapper,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"trapper"} |Average |
Manual close: Yes | ||
Zabbix proxy: Utilization of unreachable poller processes is high | avg(/Zabbix proxy health/zabbix[process,unreachable poller,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"unreachable poller"} |Average |
Manual close: Yes | ||
Zabbix proxy: Utilization of vmware collector processes is high | avg(/Zabbix proxy health/zabbix[process,vmware collector,avg,busy],10m)>{$ZABBIX.PROXY.UTIL.MAX:"vmware collector"} |Average |
Manual close: Yes | ||
Zabbix proxy: More than {$ZABBIX.PROXY.UTIL.MAX}% used in the configuration cache | Consider increasing CacheSize in the zabbix_proxy.conf configuration file. |
max(/Zabbix proxy health/zabbix[rcache,buffer,pused],10m)>{$ZABBIX.PROXY.UTIL.MAX} |Average |
Manual close: Yes | |
Zabbix proxy: Version has changed | Zabbix proxy version has changed. Acknowledge to close the problem manually. |
last(/Zabbix proxy health/zabbix[version],#1)<>last(/Zabbix proxy health/zabbix[version],#2) and length(last(/Zabbix proxy health/zabbix[version]))>0 |Info |
Manual close: Yes | |
Zabbix proxy: More than {$ZABBIX.PROXY.UTIL.MAX}% used in the vmware cache | Consider increasing VMwareCacheSize in the zabbix_proxy.conf configuration file. |
max(/Zabbix proxy health/zabbix[vmware,buffer,pused],10m)>{$ZABBIX.PROXY.UTIL.MAX} |Average |
Manual close: Yes | |
Zabbix proxy: More than {$ZABBIX.PROXY.UTIL.MAX}% used in the history cache | Consider increasing HistoryCacheSize in the zabbix_proxy.conf configuration file. |
max(/Zabbix proxy health/zabbix[wcache,history,pused],10m)>{$ZABBIX.PROXY.UTIL.MAX} |Average |
Manual close: Yes | |
Zabbix proxy: More than {$ZABBIX.PROXY.UTIL.MAX}% used in the history index cache | Consider increasing HistoryIndexCacheSize in the zabbix_proxy.conf configuration file. |
max(/Zabbix proxy health/zabbix[wcache,index,pused],10m)>{$ZABBIX.PROXY.UTIL.MAX} |Average |
Manual close: Yes | |
Zabbix proxy: {HOST.NAME} has been restarted | Uptime is less than 10 minutes. |
last(/Zabbix proxy health/zabbix[uptime])<10m |Info |
Manual close: Yes |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
Name | Description | Default |
---|---|---|
{$AGENT.TIMEOUT} | Timeout after which agent is considered unavailable. Works only for agents reachable from Zabbix server/proxy (passive mode). |
3m |
Name | Description | Type | Key and additional info |
---|---|---|---|
Zabbix agent: Version of Zabbix agent running | Zabbix agent | agent.version Preprocessing
|
|
Zabbix agent: Host name of Zabbix agent running | Zabbix agent | agent.hostname Preprocessing
|
|
Zabbix agent: Zabbix agent ping | The agent always returns 1 for this item. It can be used in combination with nodata() for an availability check. |
Zabbix agent | agent.ping |
Zabbix agent: Zabbix agent availability | Monitoring the availability status of the agent. |
Zabbix internal | zabbix[host,agent,available] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Zabbix agent: Zabbix agent is not available | For passive-only agents, host availability is used with {$AGENT.TIMEOUT} as the time threshold. |
max(/Zabbix agent/zabbix[host,agent,available],{$AGENT.TIMEOUT})=0 |Average |
Manual close: Yes |
Name | Description | Default |
---|---|---|
{$AGENT.NODATA_TIMEOUT} | No data timeout for active agents. Consider keeping it relatively high. |
30m |
{$AGENT.TIMEOUT} | Timeout after which agent is considered unavailable. |
5m |
Name | Description | Type | Key and additional info |
---|---|---|---|
Zabbix agent active: Version of Zabbix agent running | Zabbix agent (active) | agent.version Preprocessing
|
|
Zabbix agent active: Host name of Zabbix agent running | Zabbix agent (active) | agent.hostname Preprocessing
|
|
Zabbix agent active: Zabbix agent ping | The agent always returns 1 for this item. It can be used in combination with nodata() for an availability check. |
Zabbix agent (active) | agent.ping |
Zabbix agent active: Active agent availability | Availability of active checks on the host. The value of this item corresponds to the availability icons in the host list. Possible values: 0 - unknown; 1 - available; 2 - not available. |
Zabbix internal | zabbix[host,active_agent,available] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Zabbix agent active: Zabbix agent is not available | For active agents, nodata() with agent.ping is used with {$AGENT.NODATA_TIMEOUT} as the time threshold. |
nodata(/Zabbix agent active/agent.ping,{$AGENT.NODATA_TIMEOUT})=1 |Average |
Manual close: Yes | |
Zabbix agent active: Active checks are not available | Active checks are considered unavailable: the agent has not sent a heartbeat for a prolonged time (see the agent configuration sketch after this table). |
min(/Zabbix agent active/zabbix[host,active_agent,available],{$AGENT.TIMEOUT})=2 |High |
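Active-check availability (zabbix[host,active_agent,available]) relies on heartbeats sent by the agent, while the nodata() trigger reacts to agent.ping values stopping. A minimal agent-side sketch, assuming Zabbix agent 6.2 or newer and placeholder host names:

```
# Hypothetical zabbix_agentd.conf excerpt for active checks
ServerActive=zabbix.example.com   # placeholder server address
Hostname=web01.example.com        # must match the host name configured in Zabbix
HeartbeatFrequency=60             # heartbeat interval in seconds; 0 disables heartbeats
```

If heartbeats are disabled or the agent predates them, the internal availability item cannot report the agent as available, and only the nodata()-based check remains meaningful.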
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
Official JMX Template for WildFly server.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Metrics are collected by JMX. This template works with standalone and domain instances.
Copy the JMX client library from /(wildfly,EAP,Jboss,AS)/bin/client into the /usr/share/zabbix-java-gateway/lib directory of the Zabbix Java gateway. A JMX endpoint example follows the macros table below.
Name | Description | Default |
---|---|---|
{$WILDFLY.USER} | zabbix |
|
{$WILDFLY.PASSWORD} | zabbix |
|
{$WILDFLY.JMX.PROTOCOL} | remote+http |
|
{$WILDFLY.DEPLOYMENT.MATCHES} | Filter of discoverable deployments |
.* |
{$WILDFLY.DEPLOYMENT.NOT_MATCHES} | Filter to exclude discovered deployments |
CHANGE_IF_NEEDED |
{$WILDFLY.CONN.USAGE.WARN.MAX} | The maximum connection usage percent for trigger expression. |
80 |
{$WILDFLY.CONN.WAIT.MAX.WARN} | The maximum number of waiting connections for trigger expression. |
300 |
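The template connects through the Zabbix Java gateway using the protocol set in {$WILDFLY.JMX.PROTOCOL}. As a rough sketch (not taken from the template itself), a JMX endpoint for a standalone WildFly instance with the default management port would look like this; the host name is a placeholder:

```
# Hypothetical JMX endpoint, assuming the default management port 9990:
service:jmx:remote+http://wildfly.example.com:9990
# {$WILDFLY.USER} / {$WILDFLY.PASSWORD} should match a WildFly management user,
# for example one created with add-user.sh.
```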
Name | Description | Type | Key and additional info |
---|---|---|---|
WildFly: Launch type | The manner in which the server process was launched. Either "DOMAIN" for a domain mode server launched by a Host Controller, "STANDALONE" for a standalone server launched from the command line, or "EMBEDDED" for a standalone server launched as an embedded part of an application running in the same virtual machine. |
JMX agent | jmx["jboss.as:management-root=server","launchType"] Preprocessing
|
WildFly: Name | For standalone mode: The name of this server. If not set, defaults to the runtime value of InetAddress.getLocalHost().getHostName(). For domain mode: The name given to this domain. |
JMX agent | jmx["jboss.as:management-root=server","name"] Preprocessing
|
WildFly: Process type | The type of process represented by this root resource. |
JMX agent | jmx["jboss.as:management-root=server","processType"] Preprocessing
|
WildFly: Runtime configuration state | The current persistent configuration state, one of starting, ok, reload-required, restart-required, stopping or stopped. |
JMX agent | jmx["jboss.as:management-root=server","runtimeConfigurationState"] Preprocessing
|
WildFly: Server controller state | The current state of the server controller; either STARTING, RUNNING, RESTART_REQUIRED, RELOAD_REQUIRED or STOPPING. |
JMX agent | jmx["jboss.as:management-root=server","serverState"] Preprocessing
|
WildFly: Version | The version of the WildFly Core based product release. |
JMX agent | jmx["jboss.as:management-root=server","productVersion"] Preprocessing
|
WildFly: Uptime | WildFly server uptime. |
JMX agent | jmx["java.lang:type=Runtime","Uptime"] Preprocessing
|
WildFly: Transactions: Total, rate | The total number of transactions (top-level and nested) created per second. |
JMX agent | jmx["jboss.as:subsystem=transactions","numberOfTransactions"] Preprocessing
|
WildFly: Transactions: Aborted, rate | The number of aborted (i.e. rolled back) transactions per second. |
JMX agent | jmx["jboss.as:subsystem=transactions","numberOfAbortedTransactions"] Preprocessing
|
WildFly: Transactions: Application rollbacks, rate | The number of transactions that have been rolled back by application request. This includes those that timeout, since the timeout behavior is considered an attribute of the application configuration. |
JMX agent | jmx["jboss.as:subsystem=transactions","numberOfApplicationRollbacks"] Preprocessing
|
WildFly: Transactions: Committed, rate | The number of committed transactions per second. |
JMX agent | jmx["jboss.as:subsystem=transactions","numberOfCommittedTransactions"] Preprocessing
|
WildFly: Transactions: Heuristics, rate | The number of transactions which have terminated with heuristic outcomes. |
JMX agent | jmx["jboss.as:subsystem=transactions","numberOfHeuristics"] Preprocessing
|
WildFly: Transactions: Current | The number of transactions that have begun but not yet terminated. |
JMX agent | jmx["jboss.as:subsystem=transactions","numberOfInflightTransactions"] |
WildFly: Transactions: Nested, rate | The total number of nested (sub) transactions created per second. |
JMX agent | jmx["jboss.as:subsystem=transactions","numberOfNestedTransactions"] Preprocessing
|
WildFly: Transactions: ResourceRollbacks, rate | The number of transactions that rolled back due to resource (participant) failure. |
JMX agent | jmx["jboss.as:subsystem=transactions","numberOfResourceRollbacks"] Preprocessing
|
WildFly: Transactions: System rollbacks, rate | The number of transactions that have been rolled back due to internal system errors. |
JMX agent | jmx["jboss.as:subsystem=transactions","numberOfSystemRollbacks"] Preprocessing
|
WildFly: Transactions: Timed out, rate | The number of transactions that have rolled back due to timeout. |
JMX agent | jmx["jboss.as:subsystem=transactions","numberOfTimedOutTransactions"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
WildFly: Server needs to restart for configuration change. | find(/WildFly Server by JMX/jmx["jboss.as:management-root=server","runtimeConfigurationState"],,"like","ok")=0 |Warning |
|||
WildFly: Server controller is not in RUNNING state | find(/WildFly Server by JMX/jmx["jboss.as:management-root=server","serverState"],,"like","running")=0 |Warning |
Depends on:
|
||
WildFly: Version has changed | WildFly version has changed. Acknowledge to close the problem manually. |
last(/WildFly Server by JMX/jmx["jboss.as:management-root=server","productVersion"],#1)<>last(/WildFly Server by JMX/jmx["jboss.as:management-root=server","productVersion"],#2) and length(last(/WildFly Server by JMX/jmx["jboss.as:management-root=server","productVersion"]))>0 |Info |
Manual close: Yes | |
WildFly: Host has been restarted | Uptime is less than 10 minutes. |
last(/WildFly Server by JMX/jmx["java.lang:type=Runtime","Uptime"])<10m |Info |
Manual close: Yes | |
WildFly: Failed to fetch info data | Zabbix has not received data for items for the last 15 minutes. |
nodata(/WildFly Server by JMX/jmx["java.lang:type=Runtime","Uptime"],15m)=1 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Deployments discovery | Discovery of deployment metrics. Deployments can be filtered with the {$WILDFLY.DEPLOYMENT.MATCHES} and {$WILDFLY.DEPLOYMENT.NOT_MATCHES} macros (see the example below). |
JMX agent | jmx.get[beans,"jboss.as.expr:deployment=*"] |
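Which deployments are discovered is controlled by the filter macros defined earlier. A purely illustrative override (the regular expressions are examples, not template defaults):

```
# Example macro values - discover only .war deployments and skip internal ones:
{$WILDFLY.DEPLOYMENT.MATCHES}=".*\.war"
{$WILDFLY.DEPLOYMENT.NOT_MATCHES}=".*-internal.*"
```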
Name | Description | Type | Key and additional info |
---|---|---|---|
WildFly deployment [{#DEPLOYMENT}]: Status | The current runtime status of a deployment. Possible status modes are OK, FAILED, and STOPPED. FAILED indicates a dependency is missing or a service could not start. STOPPED indicates that the deployment was not enabled or was manually stopped. |
JMX agent | jmx["{#JMXOBJ}",status] Preprocessing
|
WildFly deployment [{#DEPLOYMENT}]: Enabled | Boolean indicating whether the deployment content is currently deployed in the runtime (or should be deployed in the runtime the next time the server starts). |
JMX agent | jmx["{#JMXOBJ}",enabled] Preprocessing
|
WildFly deployment [{#DEPLOYMENT}]: Managed | Indicates if the deployment is managed (aka uses the ContentRepository). |
JMX agent | jmx["{#JMXOBJ}",managed] Preprocessing
|
WildFly deployment [{#DEPLOYMENT}]: Persistent | Indicates whether the existence of the deployment is recorded in the persistent server configuration. |
JMX agent | jmx["{#JMXOBJ}",persistent] Preprocessing
|
WildFly deployment [{#DEPLOYMENT}]: Enabled time | The time at which the deployment was enabled. |
JMX agent | jmx["{#JMXOBJ}",enabledTime] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
WildFly deployment [{#DEPLOYMENT}]: Deployment status has changed | Deployment status has changed. Acknowledge to close the problem manually. |
last(/WildFly Server by JMX/jmx["{#JMXOBJ}",status],#1)<>last(/WildFly Server by JMX/jmx["{#JMXOBJ}",status],#2) and length(last(/WildFly Server by JMX/jmx["{#JMXOBJ}",status]))>0 |Warning |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
JDBC metrics discovery | JMX agent | jmx.get[beans,"jboss.as:subsystem=datasources,data-source=*,statistics=jdbc"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
WildFly {#JMXDATASOURCE}: Cache access, rate | The number of times that the statement cache was accessed per second. |
JMX agent | jmx["{#JMXOBJ}",PreparedStatementCacheAccessCount] Preprocessing
|
WildFly {#JMXDATASOURCE}: Cache add, rate | The number of statements added to the statement cache per second. |
JMX agent | jmx["{#JMXOBJ}",PreparedStatementCacheAddCount] Preprocessing
|
WildFly {#JMXDATASOURCE}: Cache current size | The number of prepared and callable statements currently cached in the statement cache. |
JMX agent | jmx["{#JMXOBJ}",PreparedStatementCacheCurrentSize] |
WildFly {#JMXDATASOURCE}: Cache delete, rate | The number of statements discarded from the cache per second. |
JMX agent | jmx["{#JMXOBJ}",PreparedStatementCacheDeleteCount] Preprocessing
|
WildFly {#JMXDATASOURCE}: Cache hit, rate | The number of times that statements from the cache were used per second. |
JMX agent | jmx["{#JMXOBJ}",PreparedStatementCacheHitCount] Preprocessing
|
WildFly {#JMXDATASOURCE}: Cache miss, rate | The number of times that a statement request could not be satisfied with a statement from the cache per second. |
JMX agent | jmx["{#JMXOBJ}",PreparedStatementCacheMissCount] Preprocessing
|
WildFly {#JMXDATASOURCE}: Statistics enabled | Defines whether runtime statistics are enabled or not. |
JMX agent | jmx["{#JMXOBJ}",statisticsEnabled, "JDBC"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
WildFly {#JMXDATASOURCE}: JDBC monitoring statistic is not enabled | last(/WildFly Server by JMX/jmx["{#JMXOBJ}",statisticsEnabled, "JDBC"])=0 |Info |
Name | Description | Type | Key and additional info |
---|---|---|---|
Pools metrics discovery | JMX agent | jmx.get[beans,"jboss.as:subsystem=datasources,data-source=*,statistics=pool"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
WildFly {#JMXDATASOURCE}: Connections: Active | The number of open connections. |
JMX agent | jmx["{#JMXOBJ}",ActiveCount] |
WildFly {#JMXDATASOURCE}: Connections: Available | The number of available connections. |
JMX agent | jmx["{#JMXOBJ}",AvailableCount] |
WildFly {#JMXDATASOURCE}: Blocking time, avg | The average blocking time for the pool. |
JMX agent | jmx["{#JMXOBJ}",AverageBlockingTime] |
WildFly {#JMXDATASOURCE}: Connections: Creating time, avg | The average time spent creating a physical connection. |
JMX agent | jmx["{#JMXOBJ}",AverageCreationTime] |
WildFly {#JMXDATASOURCE}: Connections: Get time, avg | The average time spent obtaining a physical connection. |
JMX agent | jmx["{#JMXOBJ}",AverageGetTime] |
WildFly {#JMXDATASOURCE}: Connections: Pool time, avg | The average time for a physical connection spent in the pool. |
JMX agent | jmx["{#JMXOBJ}",AveragePoolTime] |
WildFly {#JMXDATASOURCE}: Connections: Usage time, avg | The average time spent using a physical connection. |
JMX agent | jmx["{#JMXOBJ}",AverageUsageTime] |
WildFly {#JMXDATASOURCE}: Connections: Blocking failure, rate | The number of failures trying to obtain a physical connection per second. |
JMX agent | jmx["{#JMXOBJ}",BlockingFailureCount] Preprocessing
|
WildFly {#JMXDATASOURCE}: Connections: Created, rate | The number of connections created per second. |
JMX agent | jmx["{#JMXOBJ}",CreatedCount] Preprocessing
|
WildFly {#JMXDATASOURCE}: Connections: Destroyed, rate | The number of connections destroyed per second. |
JMX agent | jmx["{#JMXOBJ}",DestroyedCount] Preprocessing
|
WildFly {#JMXDATASOURCE}: Connections: Idle | The number of physical connections currently idle. |
JMX agent | jmx["{#JMXOBJ}",IdleCount] |
WildFly {#JMXDATASOURCE}: Connections: In use | The number of physical connections currently in use. |
JMX agent | jmx["{#JMXOBJ}",InUseCount] |
WildFly {#JMXDATASOURCE}: Connections: Used, max | The maximum number of connections used. |
JMX agent | jmx["{#JMXOBJ}",MaxUsedCount] |
WildFly {#JMXDATASOURCE}: Statistics enabled | Defines whether runtime statistics are enabled or not. |
JMX agent | jmx["{#JMXOBJ}",statisticsEnabled] Preprocessing
|
WildFly {#JMXDATASOURCE}: Connections: Timed out, rate | The number of timed-out connections per second. |
JMX agent | jmx["{#JMXOBJ}",TimedOut] Preprocessing
|
WildFly {#JMXDATASOURCE}: Connections: Wait | The number of requests that had to wait to obtain a physical connection. |
JMX agent | jmx["{#JMXOBJ}",WaitCount] |
WildFly {#JMXDATASOURCE}: XA: Commit time, avg | The average time for a XAResource commit invocation. |
JMX agent | jmx["{#JMXOBJ}",XACommitAverageTime] |
WildFly {#JMXDATASOURCE}: XA: Commit, rate | The number of XAResource commit invocations per second. |
JMX agent | jmx["{#JMXOBJ}",XACommitCount] Preprocessing
|
WildFly {#JMXDATASOURCE}: XA: End time, avg | The average time for a XAResource end invocation. |
JMX agent | jmx["{#JMXOBJ}",XAEndAverageTime] |
WildFly {#JMXDATASOURCE}: XA: End, rate | The number of XAResource end invocations per second. |
JMX agent | jmx["{#JMXOBJ}",XAEndCount] Preprocessing
|
WildFly {#JMXDATASOURCE}: XA: Forget time, avg | The average time for a XAResource forget invocation. |
JMX agent | jmx["{#JMXOBJ}",XAForgetAverageTime] |
WildFly {#JMXDATASOURCE}: XA: Forget, rate | The number of XAResource forget invocations per second. |
JMX agent | jmx["{#JMXOBJ}",XAForgetCount] Preprocessing
|
WildFly {#JMXDATASOURCE}: XA: Prepare time, avg | The average time for a XAResource prepare invocation. |
JMX agent | jmx["{#JMXOBJ}",XAPrepareAverageTime] |
WildFly {#JMXDATASOURCE}: XA: Prepare, rate | The number of XAResource prepare invocations per second. |
JMX agent | jmx["{#JMXOBJ}",XAPrepareCount] Preprocessing
|
WildFly {#JMXDATASOURCE}: XA: Recover time, avg | The average time for a XAResource recover invocation. |
JMX agent | jmx["{#JMXOBJ}",XARecoverAverageTime] |
WildFly {#JMXDATASOURCE}: XA: Recover, rate | The number of XAResource recover invocations per second. |
JMX agent | jmx["{#JMXOBJ}",XARecoverCount] Preprocessing
|
WildFly {#JMXDATASOURCE}: XA: Rollback time, avg | The average time for a XAResource rollback invocation. |
JMX agent | jmx["{#JMXOBJ}",XARollbackAverageTime] |
WildFly {#JMXDATASOURCE}: XA: Rollback, rate | The number of XAResource rollback invocations per second. |
JMX agent | jmx["{#JMXOBJ}",XARollbackCount] Preprocessing
|
WildFly {#JMXDATASOURCE}: XA: Start time, avg | The average time for a XAResource start invocation. |
JMX agent | jmx["{#JMXOBJ}",XAStartAverageTime] |
WildFly {#JMXDATASOURCE}: XA: Start, rate | The number of XAResource start invocations per second. |
JMX agent | jmx["{#JMXOBJ}",XAStartCount] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
WildFly {#JMXDATASOURCE}: There are no active connections for 5m | max(/WildFly Server by JMX/jmx["{#JMXOBJ}",ActiveCount],5m)=0 |Warning |
|||
WildFly {#JMXDATASOURCE}: Connection usage is too high | min(/WildFly Server by JMX/jmx["{#JMXOBJ}",InUseCount],5m)/last(/WildFly Server by JMX/jmx["{#JMXOBJ}",AvailableCount])*100>{$WILDFLY.CONN.USAGE.WARN.MAX} |High |
|||
WildFly {#JMXDATASOURCE}: Pools monitoring statistic is not enabled | Runtime statistics for the connection pool are not enabled. |
last(/WildFly Server by JMX/jmx["{#JMXOBJ}",statisticsEnabled])=0 |Info |
||
WildFly {#JMXDATASOURCE}: There are timeout connections | last(/WildFly Server by JMX/jmx["{#JMXOBJ}",TimedOut])>0 |Warning |
|||
WildFly {#JMXDATASOURCE}: Too many waiting connections | min(/WildFly Server by JMX/jmx["{#JMXOBJ}",WaitCount],5m)>{$WILDFLY.CONN.WAIT.MAX.WARN} |Warning |
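For clarity, the "Connection usage is too high" trigger above compares the in-use count against the available count as a percentage. A minimal Python sketch of that calculation; the 80% default here is only an illustrative threshold, not the macro default:

```python
def connection_usage_too_high(in_use: int, available: int,
                              warn_max_pct: float = 80.0) -> bool:
    """Mirrors InUseCount / AvailableCount * 100 > {$WILDFLY.CONN.USAGE.WARN.MAX}.
    The 80.0 default is illustrative only."""
    if available == 0:
        return False  # avoid division by zero; no meaningful usage figure in this case
    return in_use / available * 100 > warn_max_pct

print(connection_usage_too_high(in_use=45, available=50))  # True  (90% used)
print(connection_usage_too_high(in_use=10, available=50))  # False (20% used)
```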
Name | Description | Type | Key and additional info |
---|---|---|---|
Undertow metrics discovery | JMX agent | jmx.get[beans,"jboss.as:subsystem=undertow,server=*,http-listener=*"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
WildFly listener {#HTTP_LISTENER}: Errors, rate | The number of 500 responses that have been sent by this listener per second. |
JMX agent | jmx["{#JMXOBJ}",errorCount] Preprocessing
|
WildFly listener {#HTTP_LISTENER}: Requests, rate | The number of requests this listener has served per second. |
JMX agent | jmx["{#JMXOBJ}",requestCount] Preprocessing
|
WildFly listener {#HTTP_LISTENER}: Bytes sent, rate | The number of bytes that have been sent out on this listener per second. |
JMX agent | jmx["{#JMXOBJ}",bytesSent] Preprocessing
|
WildFly listener {#HTTP_LISTENER}: Bytes received, rate | The number of bytes that have been received by this listener per second. |
JMX agent | jmx["{#JMXOBJ}",bytesReceived] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
WildFly listener {#HTTP_LISTENER}: There are 500 responses by this listener. | last(/WildFly Server by JMX/jmx["{#JMXOBJ}",errorCount])>0 |Warning |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
Official JMX Template for WildFly Domain Controller.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Metrics are collected by JMX. This template works with Domain Controller.
Copy jboss-client.jar from /(wildfly,EAP,Jboss,AS)/bin/client to the directory /usr/share/zabbix-java-gateway/lib and restart the Zabbix Java gateway service.
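If you script this preparation step, the following minimal Python sketch copies the client JAR into the Java gateway library directory; the source path is a placeholder and must be adjusted to your WildFly/EAP installation.

```python
import shutil
from pathlib import Path

# Placeholder paths - adjust to your WildFly/EAP installation and gateway layout.
source_jar = Path("/opt/wildfly/bin/client/jboss-client.jar")
gateway_lib = Path("/usr/share/zabbix-java-gateway/lib")

gateway_lib.mkdir(parents=True, exist_ok=True)
copied = Path(shutil.copy2(source_jar, gateway_lib))
print(f"Copied {source_jar} -> {copied}")
# Remember to restart the zabbix-java-gateway service afterwards.
```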
Name | Description | Default |
---|---|---|
{$WILDFLY.USER} | zabbix |
|
{$WILDFLY.PASSWORD} | zabbix |
|
{$WILDFLY.JMX.PROTOCOL} | remote+http |
|
{$WILDFLY.DEPLOYMENT.MATCHES} | Filter of discoverable deployments |
.* |
{$WILDFLY.DEPLOYMENT.NOT_MATCHES} | Filter to exclude discovered deployments |
CHANGE_IF_NEEDED |
{$WILDFLY.SERVER.MATCHES} | Filter of discoverable servers |
.* |
{$WILDFLY.SERVER.NOT_MATCHES} | Filter to exclude discovered servers |
CHANGE_IF_NEEDED |
Name | Description | Type | Key and additional info |
---|---|---|---|
WildFly: Launch type | The manner in which the server process was launched. Either "DOMAIN" for a domain mode server launched by a Host Controller, "STANDALONE" for a standalone server launched from the command line, or "EMBEDDED" for a standalone server launched as an embedded part of an application running in the same virtual machine. |
JMX agent | jmx["jboss.as:management-root=server","launchType"] Preprocessing
|
WildFly: Name | For standalone mode: The name of this server. If not set, defaults to the runtime value of InetAddress.getLocalHost().getHostName(). For domain mode: The name given to this domain |
JMX agent | jmx["jboss.as:management-root=server","name"] Preprocessing
|
WildFly: Process type | The type of process represented by this root resource. |
JMX agent | jmx["jboss.as:management-root=server","processType"] Preprocessing
|
WildFly: Version | The version of the WildFly Core based product release. |
JMX agent | jmx["jboss.as:management-root=server","productVersion"] Preprocessing
|
WildFly: Uptime | WildFly server uptime. |
JMX agent | jmx["java.lang:type=Runtime","Uptime"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
WildFly: Version has changed | WildFly version has changed. Acknowledge to close the problem manually. |
last(/WildFly Domain by JMX/jmx["jboss.as:management-root=server","productVersion"],#1)<>last(/WildFly Domain by JMX/jmx["jboss.as:management-root=server","productVersion"],#2) and length(last(/WildFly Domain by JMX/jmx["jboss.as:management-root=server","productVersion"]))>0 |Info |
Manual close: Yes | |
WildFly: Host has been restarted | Uptime is less than 10 minutes. |
last(/WildFly Domain by JMX/jmx["java.lang:type=Runtime","Uptime"])<10m |Info |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Deployments discovery | Discovery of deployment metrics. |
JMX agent | jmx.get[beans,"jboss.as.expr:deployment=,server-group="] |
Name | Description | Type | Key and additional info |
---|---|---|---|
WildFly deployment [{#DEPLOYMENT}]: Enabled | Boolean indicating whether the deployment content is currently deployed in the runtime (or should be deployed in the runtime the next time the server starts). |
JMX agent | jmx["{#JMXOBJ}",enabled] Preprocessing
|
WildFly deployment [{#DEPLOYMENT}]: Managed | Indicates if the deployment is managed (aka uses the ContentRepository). |
JMX agent | jmx["{#JMXOBJ}",managed] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Servers discovery | Discovery of instances in the domain. |
JMX agent | jmx.get[beans,"jboss.as:host=master,server-config=*"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
WildFly domain: Server {#SERVER}: Autostart | Whether or not this server should be started when the Host Controller starts. |
JMX agent | jmx["{#JMXOBJ}",autoStart] Preprocessing
|
WildFly domain: Server {#SERVER}: Status | The current status of the server. |
JMX agent | jmx["{#JMXOBJ}",status] Preprocessing
|
WildFly domain: Server {#SERVER}: Server group | The name of a server group from the domain model. |
JMX agent | jmx["{#JMXOBJ}",group] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
WildFly domain: Server {#SERVER}: Server status has changed | Server status has changed. Acknowledge to close the problem manually. |
last(/WildFly Domain by JMX/jmx["{#JMXOBJ}",status],#1)<>last(/WildFly Domain by JMX/jmx["{#JMXOBJ}",status],#2) and length(last(/WildFly Domain by JMX/jmx["{#JMXOBJ}",status]))>0 |Warning |
Manual close: Yes | |
WildFly domain: Server {#SERVER}: Server group has changed | Server group has changed. Acknowledge to close the problem manually. |
last(/WildFly Domain by JMX/jmx["{#JMXOBJ}",group],#1)<>last(/WildFly Domain by JMX/jmx["{#JMXOBJ}",group],#2) and length(last(/WildFly Domain by JMX/jmx["{#JMXOBJ}",group]))>0 |Info |
Manual close: Yes |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for the effortless deployment of both VMware vCenter and ESX hypervisor monitoring and doesn't require any external scripts.
The "VMware Hypervisor" and "VMware Guest" templates are used by discovery and normally should not be manually linked to a host. For additional information please check https://www.zabbix.com/documentation/6.4/manual/vm_monitoring
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Compile Zabbix with the required options (--with-libxml2 and --with-libcurl).
Set the StartVMwareCollectors option in the Zabbix server configuration file to "1" or more.
Set the host macros required for VMware authentication: {$VMWARE.URL}, {$VMWARE.USERNAME}, {$VMWARE.PASSWORD}.
Note: To enable discovery of hardware sensors of VMware Hypervisors, set the macro {$VMWARE.HV.SENSOR.DISCOVERY} to the value true on the discovered host level.
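Before setting {$VMWARE.URL}, it can be useful to verify that the SDK endpoint answers at all. A minimal Python sketch, assuming a placeholder vCenter address and skipping TLS verification only for this one-off check:

```python
import ssl
import urllib.error
import urllib.request

sdk_url = "https://vcenter.example.com/sdk"  # placeholder for the {$VMWARE.URL} value

# Self-signed certificates are common on vCenter/ESXi, so skip verification
# for this one-off reachability check only.
context = ssl.create_default_context()
context.check_hostname = False
context.verify_mode = ssl.CERT_NONE

try:
    with urllib.request.urlopen(sdk_url, context=context, timeout=10) as response:
        print(f"Reachable: HTTP {response.status}")
except urllib.error.HTTPError as exc:
    # Any HTTP status code means the endpoint is listening, even if it rejects plain GETs.
    print(f"Reachable: HTTP {exc.code}")
except (urllib.error.URLError, OSError) as exc:
    print(f"Not reachable: {exc}")
```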
Name | Description | Default |
---|---|---|
{$VMWARE.URL} | VMware service (vCenter or ESX hypervisor) SDK URL (https://servername/sdk) |
|
{$VMWARE.USERNAME} | VMware service user name |
|
{$VMWARE.PASSWORD} | VMware service {$USERNAME} user password |
|
{$VMWARE.HV.SENSOR.DISCOVERY} | Set "true"/"false" to enable or disable monitoring of hardware sensors. |
false |
{$VMWARE.HV.SENSOR.DISCOVERY.NAME.MATCHES} | Sets the regex string of hardware sensor names to allow in discovery. |
.* |
{$VMWARE.HV.SENSOR.DISCOVERY.NAME.NOT_MATCHES} | Sets the regex string of hardware sensor names to ignore in discovery. |
CHANGE_IF_NEEDED |
{$VMWARE.DATASTORE.SPACE.CRIT} | The critical threshold of the datastore free space. |
10 |
{$VMWARE.DATASTORE.SPACE.WARN} | The warning threshold of the datastore free space. |
20 |
Name | Description | Type | Key and additional info |
---|---|---|---|
VMware: Event log | Collect VMware event log. See also: https://www.zabbix.com/documentation/6.4/manual/config/items/preprocessing/examples#filteringvmwareeventlogrecords |
Simple check | vmware.eventlog[{$VMWARE.URL},skip] |
VMware: Full name | VMware service full name. |
Simple check | vmware.fullname[{$VMWARE.URL}] Preprocessing
|
VMware: Version | VMware service version. |
Simple check | vmware.version[{$VMWARE.URL}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Discover VMware clusters | Discovery of clusters |
Simple check | vmware.cluster.discovery[{$VMWARE.URL}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
VMware: Status of "{#CLUSTER.NAME}" cluster | VMware cluster status. |
Simple check | vmware.cluster.status[{$VMWARE.URL},{#CLUSTER.NAME}] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
VMware: The {#CLUSTER.NAME} status is Red | A cluster enabled for DRS becomes invalid (red) when the tree is no longer internally consistent, that is, resource constraints are not observed. See also: https://docs.vmware.com/en/VMware-vSphere/8.0/vsphere-resource-management/GUID-C7417CAA-BD38-41D0-9529-9E7A5898BB12.html |
last(/VMware FQDN/vmware.cluster.status[{$VMWARE.URL},{#CLUSTER.NAME}])=3 |High |
||
VMware: The {#CLUSTER.NAME} status is Yellow | A cluster becomes overcommitted (yellow) when the tree of resource pools and virtual machines is internally consistent but the cluster does not have the capacity to support all resources reserved by the child resource pools. See also: https://docs.vmware.com/en/VMware-vSphere/8.0/vsphere-resource-management/GUID-ED8240A0-FB54-4A31-BD3D-F23FE740F10C.html |
last(/VMware FQDN/vmware.cluster.status[{$VMWARE.URL},{#CLUSTER.NAME}])=2 |Average |
Depends on:
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Discover VMware datastores | Simple check | vmware.datastore.discovery[{$VMWARE.URL}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
VMware: Average read latency of the datastore {#DATASTORE} | Amount of time for a read operation from the datastore (milliseconds). |
Simple check | vmware.datastore.read[{$VMWARE.URL},{#DATASTORE},latency] |
VMware: Free space on datastore {#DATASTORE} (percentage) | VMware datastore free space as a percentage of the total. |
Simple check | vmware.datastore.size[{$VMWARE.URL},{#DATASTORE},pfree] |
VMware: Total size of datastore {#DATASTORE} | VMware datastore space in bytes. |
Simple check | vmware.datastore.size[{$VMWARE.URL},{#DATASTORE}] |
VMware: Average write latency of the datastore {#DATASTORE} | Amount of time for a write operation to the datastore (milliseconds). |
Simple check | vmware.datastore.write[{$VMWARE.URL},{#DATASTORE},latency] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
VMware: {#DATASTORE}: Free space is critically low | Datastore free space has fallen below critical threshold. |
last(/VMware FQDN/vmware.datastore.size[{$VMWARE.URL},{#DATASTORE},pfree])<{$VMWARE.DATASTORE.SPACE.CRIT} |High |
||
VMware: {#DATASTORE}: Free space is low | Datastore free space has fallen below warning threshold. |
last(/VMware FQDN/vmware.datastore.size[{$VMWARE.URL},{#DATASTORE},pfree])<{$VMWARE.DATASTORE.SPACE.WARN} |Warning |
Depends on:
|
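The two free-space triggers above compare the pfree item against the warning and critical macros. A minimal Python sketch of that evaluation; the defaults reflect the macro defaults of 20 and 10 percent listed in the macro table:

```python
def datastore_space_severity(pfree: float, warn: float = 20.0, crit: float = 10.0) -> str:
    """Evaluates the free-space triggers for the 'pfree' item value.
    Defaults match {$VMWARE.DATASTORE.SPACE.WARN} / {$VMWARE.DATASTORE.SPACE.CRIT}."""
    if pfree < crit:
        return "High"     # "Free space is critically low"
    if pfree < warn:
        return "Warning"  # "Free space is low" (depends on the High trigger)
    return "OK"

print(datastore_space_severity(5.0))   # High
print(datastore_space_severity(15.0))  # Warning
print(datastore_space_severity(40.0))  # OK
```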
Name | Description | Type | Key and additional info |
---|---|---|---|
Discover VMware hypervisors | Discovery of hypervisors. |
Simple check | vmware.hv.discovery[{$VMWARE.URL}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Discover VMware VMs FQDN | Discovery of guest virtual machines. |
Simple check | vmware.vm.discovery[{$VMWARE.URL}] |
Name | Description | Default |
---|---|---|
{$VMWARE.URL} | VMware service (vCenter or ESX hypervisor) SDK URL (https://servername/sdk) |
|
{$VMWARE.USERNAME} | VMware service user name |
|
{$VMWARE.PASSWORD} | VMware service {$USERNAME} user password |
Name | Description | Type | Key and additional info |
---|---|---|---|
VMware: Cluster name | Cluster name of the guest VM. |
Simple check | vmware.vm.cluster.name[{$VMWARE.URL},{$VMWARE.VM.UUID}] Preprocessing
|
VMware: Number of virtual CPUs | Number of virtual CPUs assigned to the guest. |
Simple check | vmware.vm.cpu.num[{$VMWARE.URL},{$VMWARE.VM.UUID}] Preprocessing
|
VMware: CPU ready | Time that the virtual machine was ready, but could not get scheduled to run on the physical CPU during the last measurement interval (the VMware vCenter/ESXi Server performance counter sampling interval is 20 seconds). |
Simple check | vmware.vm.cpu.ready[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware: CPU usage | Current upper-bound on CPU usage. The upper-bound is based on the host the virtual machine is currently running on, as well as limits configured on the virtual machine itself or any parent resource pool. Valid while the virtual machine is running. |
Simple check | vmware.vm.cpu.usage[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware: Datacenter name | Datacenter name of the guest VM. |
Simple check | vmware.vm.datacenter.name[{$VMWARE.URL},{$VMWARE.VM.UUID}] Preprocessing
|
VMware: Hypervisor name | Hypervisor name of the guest VM. |
Simple check | vmware.vm.hv.name[{$VMWARE.URL},{$VMWARE.VM.UUID}] Preprocessing
|
VMware: Ballooned memory | The amount of guest physical memory that is currently reclaimed through the balloon driver. |
Simple check | vmware.vm.memory.size.ballooned[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware: Compressed memory | The amount of memory currently in the compression cache for this VM. |
Simple check | vmware.vm.memory.size.compressed[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware: Private memory | Amount of memory backed by host memory and not being shared. |
Simple check | vmware.vm.memory.size.private[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware: Shared memory | The amount of guest physical memory shared through transparent page sharing. |
Simple check | vmware.vm.memory.size.shared[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware: Swapped memory | The amount of guest physical memory swapped out to the VM's swap device by ESX. |
Simple check | vmware.vm.memory.size.swapped[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware: Guest memory usage | The amount of guest physical memory that is being used by the VM. |
Simple check | vmware.vm.memory.size.usage.guest[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware: Host memory usage | The amount of host physical memory allocated to the VM, accounting for savings from memory sharing with other VMs. |
Simple check | vmware.vm.memory.size.usage.host[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware: Memory size | Total size of configured memory. |
Simple check | vmware.vm.memory.size[{$VMWARE.URL},{$VMWARE.VM.UUID}] Preprocessing
|
VMware: Power state | The current power state of the virtual machine. |
Simple check | vmware.vm.powerstate[{$VMWARE.URL},{$VMWARE.VM.UUID}] Preprocessing
|
VMware: Committed storage space | Total storage space, in bytes, committed to this virtual machine across all datastores. |
Simple check | vmware.vm.storage.committed[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware: Uncommitted storage space | Additional storage space, in bytes, potentially used by this virtual machine on all datastores. |
Simple check | vmware.vm.storage.uncommitted[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware: Unshared storage space | Total storage space, in bytes, occupied by the virtual machine across all datastores, that is not shared with any other virtual machine. |
Simple check | vmware.vm.storage.unshared[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware: Uptime | System uptime. |
Simple check | vmware.vm.uptime[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware: Guest memory swapped | Amount of guest physical memory that is swapped out to the swap space. |
Simple check | vmware.vm.guest.memory.size.swapped[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware: Host memory consumed | Amount of host physical memory consumed for backing up guest physical memory pages. |
Simple check | vmware.vm.memory.size.consumed[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware: Host memory usage in percents | Percentage of host physical memory that has been consumed. |
Simple check | vmware.vm.memory.usage[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware: CPU usage in percents | CPU usage as a percentage during the interval. |
Simple check | vmware.vm.cpu.usage.perf[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware: CPU latency in percents | Percentage of time the virtual machine is unable to run because it is contending for access to the physical CPU(s). |
Simple check | vmware.vm.cpu.latency[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware: CPU readiness latency in percents | Percentage of time that the virtual machine was ready, but could not get scheduled to run on the physical CPU. |
Simple check | vmware.vm.cpu.readiness[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware: CPU swap-in latency in percents | Percentage of CPU time spent waiting for swap-in. |
Simple check | vmware.vm.cpu.swapwait[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware: Uptime of guest OS | Total time elapsed since the last operating system boot-up (in seconds). |
Simple check | vmware.vm.guest.osuptime[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
VMware: VM has been restarted | Uptime is less than 10 minutes. |
last(/VMware Guest/vmware.vm.guest.osuptime[{$VMWARE.URL},{$VMWARE.VM.UUID}])<10m |Warning |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Network device discovery | Discovery of all network devices. |
Simple check | vmware.vm.net.if.discovery[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
VMware: Number of bytes received on interface {#IFDESC} | VMware virtual machine network interface input statistics (bytes per second). |
Simple check | vmware.vm.net.if.in[{$VMWARE.URL},{$VMWARE.VM.UUID},{#IFNAME},bps] |
VMware: Number of packets received on interface {#IFDESC} | VMware virtual machine network interface input statistics (packets per second). |
Simple check | vmware.vm.net.if.in[{$VMWARE.URL},{$VMWARE.VM.UUID},{#IFNAME},pps] |
VMware: Number of bytes transmitted on interface {#IFDESC} | VMware virtual machine network interface output statistics (bytes per second). |
Simple check | vmware.vm.net.if.out[{$VMWARE.URL},{$VMWARE.VM.UUID},{#IFNAME},bps] |
VMware: Number of packets transmitted on interface {#IFDESC} | VMware virtual machine network interface output statistics (packets per second). |
Simple check | vmware.vm.net.if.out[{$VMWARE.URL},{$VMWARE.VM.UUID},{#IFNAME},pps] |
VMware: Network utilization on interface {#IFDESC} | VMware virtual machine network utilization (combined transmit-rates and receive-rates) during the interval. |
Simple check | vmware.vm.net.if.usage[{$VMWARE.URL},{$VMWARE.VM.UUID},{#IFNAME}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Disk device discovery | Discovery of all disk devices. |
Simple check | vmware.vm.vfs.dev.discovery[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
VMware: Average number of bytes read from the disk {#DISKDESC} | VMware virtual machine disk device read statistics (bytes per second). |
Simple check | vmware.vm.vfs.dev.read[{$VMWARE.URL},{$VMWARE.VM.UUID},{#DISKNAME},bps] |
VMware: Average number of reads from the disk {#DISKDESC} | VMware virtual machine disk device read statistics (operations per second). |
Simple check | vmware.vm.vfs.dev.read[{$VMWARE.URL},{$VMWARE.VM.UUID},{#DISKNAME},ops] |
VMware: Average number of bytes written to the disk {#DISKDESC} | VMware virtual machine disk device write statistics (bytes per second). |
Simple check | vmware.vm.vfs.dev.write[{$VMWARE.URL},{$VMWARE.VM.UUID},{#DISKNAME},bps] |
VMware: Average number of writes to the disk {#DISKDESC} | VMware virtual machine disk device write statistics (operations per second). |
Simple check | vmware.vm.vfs.dev.write[{$VMWARE.URL},{$VMWARE.VM.UUID},{#DISKNAME},ops] |
VMware: Average number of outstanding read requests to the disk {#DISKDESC} | Average number of outstanding read requests to the virtual disk during the collection interval. |
Simple check | vmware.vm.storage.readoio[{$VMWARE.URL},{$VMWARE.VM.UUID},{#DISKNAME}] |
VMware: Average number of outstanding write requests to the disk {#DISKDESC} | Average number of outstanding write requests to the virtual disk during the collection interval. |
Simple check | vmware.vm.storage.writeoio[{$VMWARE.URL},{$VMWARE.VM.UUID},{#DISKNAME}] |
VMware: Average write latency to the disk {#DISKDESC} | The average time a write to the virtual disk takes. |
Simple check | vmware.vm.storage.totalwritelatency[{$VMWARE.URL},{$VMWARE.VM.UUID},{#DISKNAME}] |
VMware: Average read latency to the disk {#DISKDESC} | The average time a read from the virtual disk takes. |
Simple check | vmware.vm.storage.totalreadlatency[{$VMWARE.URL},{$VMWARE.VM.UUID},{#DISKNAME}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Mounted filesystem discovery | Discovery of all guest file systems. |
Simple check | vmware.vm.vfs.fs.discovery[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
VMware: Free disk space on {#FSNAME} | VMware virtual machine file system statistics (bytes). |
Simple check | vmware.vm.vfs.fs.size[{$VMWARE.URL},{$VMWARE.VM.UUID},{#FSNAME},free] |
VMware: Free disk space on {#FSNAME} (percentage) | VMware virtual machine file system statistics (percentages). |
Simple check | vmware.vm.vfs.fs.size[{$VMWARE.URL},{$VMWARE.VM.UUID},{#FSNAME},pfree] |
VMware: Total disk space on {#FSNAME} | VMware virtual machine total disk space (bytes). |
Simple check | vmware.vm.vfs.fs.size[{$VMWARE.URL},{$VMWARE.VM.UUID},{#FSNAME},total] Preprocessing
|
VMware: Used disk space on {#FSNAME} | VMware virtual machine used disk space (bytes). |
Simple check | vmware.vm.vfs.fs.size[{$VMWARE.URL},{$VMWARE.VM.UUID},{#FSNAME},used] |
Name | Description | Default |
---|---|---|
{$VMWARE.URL} | VMware service (vCenter or ESX hypervisor) SDK URL (https://servername/sdk). |
|
{$VMWARE.USERNAME} | VMware service user name. |
|
{$VMWARE.PASSWORD} | VMware service {$USERNAME} user password. |
|
{$VMWARE.HV.SENSOR.DISCOVERY} | Set "true"/"false" to enable or disable monitoring of hardware sensors. |
false |
{$VMWARE.HV.SENSOR.DISCOVERY.NAME.MATCHES} | Sets the regex string of hardware sensor names to allow in discovery. |
.* |
{$VMWARE.HV.SENSOR.DISCOVERY.NAME.NOT_MATCHES} | Sets the regex string of hardware sensor names to ignore in discovery. |
CHANGE_IF_NEEDED |
{$VMWARE.HV.DATASTORE.SPACE.CRIT} | The critical threshold of the datastore free space. |
10 |
{$VMWARE.HV.DATASTORE.SPACE.WARN} | The warning threshold of the datastore free space. |
20 |
Name | Description | Type | Key and additional info |
---|---|---|---|
VMware: Hypervisor ping | Checks if the hypervisor is running and accepting ICMP pings. |
Simple check | icmpping[] Preprocessing
|
VMware: Cluster name | Cluster name of the guest VM. |
Simple check | vmware.hv.cluster.name[{$VMWARE.URL},{$VMWARE.HV.UUID}] Preprocessing
|
VMware: CPU usage | Aggregated CPU usage across all cores on the host in Hz. This is only available if the host is connected. |
Simple check | vmware.hv.cpu.usage[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware: CPU usage in percents | CPU usage as a percentage during the interval. |
Simple check | vmware.hv.cpu.usage.perf[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware: CPU utilization | CPU usage as a percentage during the interval; the value depends on power management or hyper-threading (HT). |
Simple check | vmware.hv.cpu.utilization[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware: Power usage | Current power usage. |
Simple check | vmware.hv.power[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware: Power usage maximum allowed | Maximum allowed power usage. |
Simple check | vmware.hv.power[{$VMWARE.URL},{$VMWARE.HV.UUID},max] Preprocessing
|
VMware: Datacenter name | Datacenter name of the hypervisor. |
Simple check | vmware.hv.datacenter.name[{$VMWARE.URL},{$VMWARE.HV.UUID}] Preprocessing
|
VMware: Full name | The complete product name, including the version information. |
Simple check | vmware.hv.fullname[{$VMWARE.URL},{$VMWARE.HV.UUID}] Preprocessing
|
VMware: CPU frequency | The speed of the CPU cores. This is an average value if there are multiple speeds. The product of CPU frequency and number of cores is approximately equal to the sum of the MHz for all the individual cores on the host. |
Simple check | vmware.hv.hw.cpu.freq[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware: CPU model | The CPU model. |
Simple check | vmware.hv.hw.cpu.model[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware: CPU cores | Number of physical CPU cores on the host. Physical CPU cores are the processors contained by a CPU package. |
Simple check | vmware.hv.hw.cpu.num[{$VMWARE.URL},{$VMWARE.HV.UUID}] Preprocessing
|
VMware: CPU threads | Number of physical CPU threads on the host. |
Simple check | vmware.hv.hw.cpu.threads[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware: Total memory | The physical memory size. |
Simple check | vmware.hv.hw.memory[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware: Model | The system model identification. |
Simple check | vmware.hv.hw.model[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware: Bios UUID | The hardware BIOS identification. |
Simple check | vmware.hv.hw.uuid[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware: Vendor | The hardware vendor identification. |
Simple check | vmware.hv.hw.vendor[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware: Ballooned memory | The amount of guest physical memory that is currently reclaimed through the balloon driver. Sum of all guest VMs. |
Simple check | vmware.hv.memory.size.ballooned[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware: Used memory | Physical memory usage on the host. |
Simple check | vmware.hv.memory.used[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware: Number of bytes received | VMware hypervisor network input statistics (bytes per second). |
Simple check | vmware.hv.network.in[{$VMWARE.URL},{$VMWARE.HV.UUID},bps] |
VMware: Number of bytes transmitted | VMware hypervisor network output statistics (bytes per second). |
Simple check | vmware.hv.network.out[{$VMWARE.URL},{$VMWARE.HV.UUID},bps] |
VMware: Overall status | The overall alarm status of the host: gray - unknown, green - ok, red - it has a problem, yellow - it might have a problem. |
Simple check | vmware.hv.status[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware: Uptime | System uptime. |
Simple check | vmware.hv.uptime[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware: Version | Dot-separated version string. |
Simple check | vmware.hv.version[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware: Number of guest VMs | Number of guest virtual machines. |
Simple check | vmware.hv.vm.num[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware: Get sensors | Master item for sensors data. |
Simple check | vmware.hv.sensors.get[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
VMware: Hypervisor is down | The service is unavailable or does not accept ICMP ping. |
last(/VMware Hypervisor/icmpping[])=0 |Average |
Manual close: Yes | |
VMware: The {$VMWARE.HV.UUID} health is Red | One or more components in the appliance might be in an unusable status and the appliance might become unresponsive soon. Security patches might be available. |
last(/VMware Hypervisor/vmware.hv.status[{$VMWARE.URL},{$VMWARE.HV.UUID}])=3 |High |
||
VMware: The {$VMWARE.HV.UUID} health is Yellow | One or more components in the appliance might become overloaded soon. |
last(/VMware Hypervisor/vmware.hv.status[{$VMWARE.URL},{$VMWARE.HV.UUID}])=2 |Average |
Depends on:
|
|
VMware: Hypervisor has been restarted | Uptime is less than 10 minutes. |
last(/VMware Hypervisor/vmware.hv.uptime[{$VMWARE.URL},{$VMWARE.HV.UUID}])<10m |Warning |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Datastore discovery | Simple check | vmware.hv.datastore.discovery[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
VMware: Average read latency of the datastore {#DATASTORE} | Average amount of time for a read operation from the datastore (milliseconds). |
Simple check | vmware.hv.datastore.read[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE},latency] |
VMware: Free space on datastore {#DATASTORE} (percentage) | VMware datastore free space as a percentage of the total. |
Simple check | vmware.hv.datastore.size[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE},pfree] |
VMware: Total size of datastore {#DATASTORE} | VMware datastore space in bytes. |
Simple check | vmware.hv.datastore.size[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE}] |
VMware: Average write latency of the datastore {#DATASTORE} | Average amount of time for a write operation to the datastore (milliseconds). |
Simple check | vmware.hv.datastore.write[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE},latency] |
VMware: Multipath count for datastore {#DATASTORE} | Number of available datastore paths. |
Simple check | vmware.hv.datastore.multipath[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE}] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
VMware: {#DATASTORE}: Free space is critically low | Datastore free space has fallen below critical threshold. |
last(/VMware Hypervisor/vmware.hv.datastore.size[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE},pfree])<{$VMWARE.HV.DATASTORE.SPACE.CRIT} |High |
||
VMware: {#DATASTORE}: Free space is low | Datastore free space has fallen below warning threshold. |
last(/VMware Hypervisor/vmware.hv.datastore.size[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE},pfree])<{$VMWARE.HV.DATASTORE.SPACE.WARN} |Warning |
Depends on:
|
|
VMware: The multipath count has been changed | The number of available datastore paths is less than the number registered at discovery ({#MULTIPATH.COUNT}). |
last(/VMware Hypervisor/vmware.hv.datastore.multipath[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE}],#1)<>last(/VMware Hypervisor/vmware.hv.datastore.multipath[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE}],#2) and last(/VMware Hypervisor/vmware.hv.datastore.multipath[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE}])<{#MULTIPATH.COUNT} |Average |
Manual close: Yes |
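The multipath trigger above fires when the path count changed between the last two polls and the latest value is below the count registered at discovery. A minimal Python sketch of that logic, with hypothetical samples:

```python
def multipath_problem(history: list[int], registered: int) -> bool:
    """Mirrors the trigger: the last two values differ and the latest value is
    below the path count registered at discovery time ({#MULTIPATH.COUNT})."""
    if len(history) < 2:
        return False
    latest, previous = history[-1], history[-2]
    return latest != previous and latest < registered

# Hypothetical samples: 4 paths registered, one path dropped on the latest poll
print(multipath_problem([4, 3], registered=4))  # True  - problem
print(multipath_problem([3, 3], registered=4))  # False - no change since last poll
```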
Name | Description | Type | Key and additional info |
---|---|---|---|
Healthcheck discovery | VMware Rollup Health State sensor discovery. |
Dependent item | vmware.hv.healthcheck.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
VMware: Health state rollup | The host health state rollup sensor value: gray - unknown, green - ok, red - it has a problem, yellow - it might have a problem. |
Dependent item | vmware.hv.sensor.health.state[{#SINGLETON}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
VMware: The {$VMWARE.HV.UUID} health is Red | One or more components in the appliance might be in an unusable status and the appliance might become unresponsive soon. Security patches might be available. |
last(/VMware Hypervisor/vmware.hv.sensor.health.state[{#SINGLETON}])="Red" |High |
Depends on:
|
|
VMware: The {$VMWARE.HV.UUID} health is Yellow | One or more components in the appliance might become overloaded soon. |
last(/VMware Hypervisor/vmware.hv.sensor.health.state[{#SINGLETON}])="Yellow" |Average |
Depends on:
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Sensor discovery | VMware hardware sensor discovery. The data is retrieved from numeric sensor probes and provides information about the health of the physical system. |
Dependent item | vmware.hv.sensors.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
VMware: Sensor [{#NAME}] health state | VMware hardware sensor health state, one of the following: Unknown, Green, Yellow, Red. |
Dependent item | vmware.hv.sensor.state["{#NAME}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
VMware: Sensor [{#NAME}] health state is Red | One or more components in the appliance might be in an unusable status and the appliance might become unresponsive soon. |
last(/VMware Hypervisor/vmware.hv.sensor.state["{#NAME}"])=3 |High |
Depends on:
|
|
VMware: Sensor [{#NAME}] health state is Yellow | One or more components in the appliance might become overloaded soon. |
last(/VMware Hypervisor/vmware.hv.sensor.state["{#NAME}"])=2 |Average |
Depends on:
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for the effortless deployment of both VMware vCenter and ESX hypervisor monitoring and doesn't require any external scripts.
The "VMware Hypervisor" and "VMware Guest" templates are used by discovery and normally should not be manually linked to a host. For additional information please check https://www.zabbix.com/documentation/6.4/manual/vm_monitoring
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Compile Zabbix with the required options (--with-libxml2 and --with-libcurl).
Set the StartVMwareCollectors option in the Zabbix server configuration file to "1" or more.
Set the host macros required for VMware authentication: {$VMWARE.URL}, {$VMWARE.USERNAME}, {$VMWARE.PASSWORD}.
Note: To enable discovery of hardware sensors of VMware Hypervisors, set the macro {$VMWARE.HV.SENSOR.DISCOVERY} to the value true on the discovered host level.
Name | Description | Default |
---|---|---|
{$VMWARE.URL} | VMware service (vCenter or ESX hypervisor) SDK URL (https://servername/sdk) |
|
{$VMWARE.USERNAME} | VMware service user name |
|
{$VMWARE.PASSWORD} | VMware service {$USERNAME} user password |
|
{$VMWARE.HV.SENSOR.DISCOVERY} | Set "true"/"false" to enable or disable monitoring of hardware sensors. |
false |
{$VMWARE.HV.SENSOR.DISCOVERY.NAME.MATCHES} | Sets the regex string of hardware sensor names to allow in discovery. |
.* |
{$VMWARE.HV.SENSOR.DISCOVERY.NAME.NOT_MATCHES} | Sets the regex string of hardware sensor names to ignore in discovery. |
CHANGE_IF_NEEDED |
{$VMWARE.DATASTORE.SPACE.CRIT} | The critical threshold of the datastore free space. |
10 |
{$VMWARE.DATASTORE.SPACE.WARN} | The warning threshold of the datastore free space. |
20 |
Name | Description | Type | Key and additional info |
---|---|---|---|
VMware: Event log | Collect VMware event log. See also: https://www.zabbix.com/documentation/6.4/manual/config/items/preprocessing/examples#filteringvmwareeventlogrecords |
Simple check | vmware.eventlog[{$VMWARE.URL},skip] |
VMware: Full name | VMware service full name. |
Simple check | vmware.fullname[{$VMWARE.URL}] Preprocessing
|
VMware: Version | VMware service version. |
Simple check | vmware.version[{$VMWARE.URL}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Discover VMware clusters | Discovery of clusters |
Simple check | vmware.cluster.discovery[{$VMWARE.URL}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
VMware: Status of "{#CLUSTER.NAME}" cluster | VMware cluster status. |
Simple check | vmware.cluster.status[{$VMWARE.URL},{#CLUSTER.NAME}] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
VMware: The {#CLUSTER.NAME} status is Red | A cluster enabled for DRS becomes invalid (red) when the tree is no longer internally consistent, that is, resource constraints are not observed. See also: https://docs.vmware.com/en/VMware-vSphere/8.0/vsphere-resource-management/GUID-C7417CAA-BD38-41D0-9529-9E7A5898BB12.html |
last(/VMware/vmware.cluster.status[{$VMWARE.URL},{#CLUSTER.NAME}])=3 |High |
||
VMware: The {#CLUSTER.NAME} status is Yellow | A cluster becomes overcommitted (yellow) when the tree of resource pools and virtual machines is internally consistent but the cluster does not have the capacity to support all resources reserved by the child resource pools. See also: https://docs.vmware.com/en/VMware-vSphere/8.0/vsphere-resource-management/GUID-ED8240A0-FB54-4A31-BD3D-F23FE740F10C.html |
last(/VMware/vmware.cluster.status[{$VMWARE.URL},{#CLUSTER.NAME}])=2 |Average |
Depends on:
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Discover VMware datastores | Simple check | vmware.datastore.discovery[{$VMWARE.URL}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
VMware: Average read latency of the datastore {#DATASTORE} | Amount of time for a read operation from the datastore (milliseconds). |
Simple check | vmware.datastore.read[{$VMWARE.URL},{#DATASTORE},latency] |
VMware: Free space on datastore {#DATASTORE} (percentage) | VMware datastore free space as a percentage of the total. |
Simple check | vmware.datastore.size[{$VMWARE.URL},{#DATASTORE},pfree] |
VMware: Total size of datastore {#DATASTORE} | VMware datastore space in bytes. |
Simple check | vmware.datastore.size[{$VMWARE.URL},{#DATASTORE}] |
VMware: Average write latency of the datastore {#DATASTORE} | Amount of time for a write operation to the datastore (milliseconds). |
Simple check | vmware.datastore.write[{$VMWARE.URL},{#DATASTORE},latency] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
VMware: {#DATASTORE}: Free space is critically low | Datastore free space has fallen below critical threshold. |
last(/VMware/vmware.datastore.size[{$VMWARE.URL},{#DATASTORE},pfree])<{$VMWARE.DATASTORE.SPACE.CRIT} |High |
||
VMware: {#DATASTORE}: Free space is low | Datastore free space has fallen below warning threshold. |
last(/VMware/vmware.datastore.size[{$VMWARE.URL},{#DATASTORE},pfree])<{$VMWARE.DATASTORE.SPACE.WARN} |Warning |
Depends on:
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Discover VMware hypervisors | Discovery of hypervisors. |
Simple check | vmware.hv.discovery[{$VMWARE.URL}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Discover VMware VMs | Discovery of guest virtual machines. |
Simple check | vmware.vm.discovery[{$VMWARE.URL}] |
Name | Description | Default |
---|---|---|
{$VMWARE.URL} | VMware service (vCenter or ESX hypervisor) SDK URL (https://servername/sdk) |
|
{$VMWARE.USERNAME} | VMware service user name |
|
{$VMWARE.PASSWORD} | VMware service {$USERNAME} user password |
Name | Description | Type | Key and additional info |
---|---|---|---|
VMware: Cluster name | Cluster name of the guest VM. |
Simple check | vmware.vm.cluster.name[{$VMWARE.URL},{$VMWARE.VM.UUID}] Preprocessing
|
VMware: Number of virtual CPUs | Number of virtual CPUs assigned to the guest. |
Simple check | vmware.vm.cpu.num[{$VMWARE.URL},{$VMWARE.VM.UUID}] Preprocessing
|
VMware: CPU ready | Time that the virtual machine was ready, but could not get scheduled to run on the physical CPU during the last measurement interval (the VMware vCenter/ESXi Server performance counter sampling interval is 20 seconds). |
Simple check | vmware.vm.cpu.ready[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware: CPU usage | Current upper-bound on CPU usage. The upper-bound is based on the host the virtual machine is currently running on, as well as limits configured on the virtual machine itself or any parent resource pool. Valid while the virtual machine is running. |
Simple check | vmware.vm.cpu.usage[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware: Datacenter name | Datacenter name of the guest VM. |
Simple check | vmware.vm.datacenter.name[{$VMWARE.URL},{$VMWARE.VM.UUID}] Preprocessing
|
VMware: Hypervisor name | Hypervisor name of the guest VM. |
Simple check | vmware.vm.hv.name[{$VMWARE.URL},{$VMWARE.VM.UUID}] Preprocessing
|
VMware: Ballooned memory | The amount of guest physical memory that is currently reclaimed through the balloon driver. |
Simple check | vmware.vm.memory.size.ballooned[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware: Compressed memory | The amount of memory currently in the compression cache for this VM. |
Simple check | vmware.vm.memory.size.compressed[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware: Private memory | Amount of memory backed by host memory and not being shared. |
Simple check | vmware.vm.memory.size.private[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware: Shared memory | The amount of guest physical memory shared through transparent page sharing. |
Simple check | vmware.vm.memory.size.shared[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware: Swapped memory | The amount of guest physical memory swapped out to the VM's swap device by ESX. |
Simple check | vmware.vm.memory.size.swapped[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware: Guest memory usage | The amount of guest physical memory that is being used by the VM. |
Simple check | vmware.vm.memory.size.usage.guest[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware: Host memory usage | The amount of host physical memory allocated to the VM, accounting for savings from memory sharing with other VMs. |
Simple check | vmware.vm.memory.size.usage.host[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware: Memory size | Total size of configured memory. |
Simple check | vmware.vm.memory.size[{$VMWARE.URL},{$VMWARE.VM.UUID}] Preprocessing
|
VMware: Power state | The current power state of the virtual machine. |
Simple check | vmware.vm.powerstate[{$VMWARE.URL},{$VMWARE.VM.UUID}] Preprocessing
|
VMware: Committed storage space | Total storage space, in bytes, committed to this virtual machine across all datastores. |
Simple check | vmware.vm.storage.committed[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware: Uncommitted storage space | Additional storage space, in bytes, potentially used by this virtual machine on all datastores. |
Simple check | vmware.vm.storage.uncommitted[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware: Unshared storage space | Total storage space, in bytes, occupied by the virtual machine across all datastores, that is not shared with any other virtual machine. |
Simple check | vmware.vm.storage.unshared[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware: Uptime | System uptime. |
Simple check | vmware.vm.uptime[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware: Guest memory swapped | Amount of guest physical memory that is swapped out to the swap space. |
Simple check | vmware.vm.guest.memory.size.swapped[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware: Host memory consumed | Amount of host physical memory consumed for backing up guest physical memory pages. |
Simple check | vmware.vm.memory.size.consumed[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware: Host memory usage in percents | Percentage of host physical memory that has been consumed. |
Simple check | vmware.vm.memory.usage[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware: CPU usage in percents | CPU usage as a percentage during the interval. |
Simple check | vmware.vm.cpu.usage.perf[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware: CPU latency in percents | Percentage of time the virtual machine is unable to run because it is contending for access to the physical CPU(s). |
Simple check | vmware.vm.cpu.latency[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware: CPU readiness latency in percents | Percentage of time that the virtual machine was ready, but could not get scheduled to run on the physical CPU. |
Simple check | vmware.vm.cpu.readiness[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware: CPU swap-in latency in percents | Percentage of CPU time spent waiting for swap-in. |
Simple check | vmware.vm.cpu.swapwait[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
VMware: Uptime of guest OS | Total time elapsed since the last operating system boot-up (in seconds). |
Simple check | vmware.vm.guest.osuptime[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
VMware: VM has been restarted | Uptime is less than 10 minutes. |
last(/VMware Guest/vmware.vm.guest.osuptime[{$VMWARE.URL},{$VMWARE.VM.UUID}])<10m |Warning |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Network device discovery | Discovery of all network devices. |
Simple check | vmware.vm.net.if.discovery[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
VMware: Number of bytes received on interface {#IFDESC} | VMware virtual machine network interface input statistics (bytes per second). |
Simple check | vmware.vm.net.if.in[{$VMWARE.URL},{$VMWARE.VM.UUID},{#IFNAME},bps] |
VMware: Number of packets received on interface {#IFDESC} | VMware virtual machine network interface input statistics (packets per second). |
Simple check | vmware.vm.net.if.in[{$VMWARE.URL},{$VMWARE.VM.UUID},{#IFNAME},pps] |
VMware: Number of bytes transmitted on interface {#IFDESC} | VMware virtual machine network interface output statistics (bytes per second). |
Simple check | vmware.vm.net.if.out[{$VMWARE.URL},{$VMWARE.VM.UUID},{#IFNAME},bps] |
VMware: Number of packets transmitted on interface {#IFDESC} | VMware virtual machine network interface output statistics (packets per second). |
Simple check | vmware.vm.net.if.out[{$VMWARE.URL},{$VMWARE.VM.UUID},{#IFNAME},pps] |
VMware: Network utilization on interface {#IFDESC} | VMware virtual machine network utilization (combined transmit-rates and receive-rates) during the interval. |
Simple check | vmware.vm.net.if.usage[{$VMWARE.URL},{$VMWARE.VM.UUID},{#IFNAME}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Disk device discovery | Discovery of all disk devices. |
Simple check | vmware.vm.vfs.dev.discovery[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
VMware: Average number of bytes read from the disk {#DISKDESC} | VMware virtual machine disk device read statistics (bytes per second). |
Simple check | vmware.vm.vfs.dev.read[{$VMWARE.URL},{$VMWARE.VM.UUID},{#DISKNAME},bps] |
VMware: Average number of reads from the disk {#DISKDESC} | VMware virtual machine disk device read statistics (operations per second). |
Simple check | vmware.vm.vfs.dev.read[{$VMWARE.URL},{$VMWARE.VM.UUID},{#DISKNAME},ops] |
VMware: Average number of bytes written to the disk {#DISKDESC} | VMware virtual machine disk device write statistics (bytes per second). |
Simple check | vmware.vm.vfs.dev.write[{$VMWARE.URL},{$VMWARE.VM.UUID},{#DISKNAME},bps] |
VMware: Average number of writes to the disk {#DISKDESC} | VMware virtual machine disk device write statistics (operations per second). |
Simple check | vmware.vm.vfs.dev.write[{$VMWARE.URL},{$VMWARE.VM.UUID},{#DISKNAME},ops] |
VMware: Average number of outstanding read requests to the disk {#DISKDESC} | Average number of outstanding read requests to the virtual disk during the collection interval. |
Simple check | vmware.vm.storage.readoio[{$VMWARE.URL},{$VMWARE.VM.UUID},{#DISKNAME}] |
VMware: Average number of outstanding write requests to the disk {#DISKDESC} | Average number of outstanding write requests to the virtual disk during the collection interval. |
Simple check | vmware.vm.storage.writeoio[{$VMWARE.URL},{$VMWARE.VM.UUID},{#DISKNAME}] |
VMware: Average write latency to the disk {#DISKDESC} | The average time a write to the virtual disk takes. |
Simple check | vmware.vm.storage.totalwritelatency[{$VMWARE.URL},{$VMWARE.VM.UUID},{#DISKNAME}] |
VMware: Average read latency to the disk {#DISKDESC} | The average time a read from the virtual disk takes. |
Simple check | vmware.vm.storage.totalreadlatency[{$VMWARE.URL},{$VMWARE.VM.UUID},{#DISKNAME}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Mounted filesystem discovery | Discovery of all guest file systems. |
Simple check | vmware.vm.vfs.fs.discovery[{$VMWARE.URL},{$VMWARE.VM.UUID}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
VMware: Free disk space on {#FSNAME} | VMware virtual machine file system statistics (bytes). |
Simple check | vmware.vm.vfs.fs.size[{$VMWARE.URL},{$VMWARE.VM.UUID},{#FSNAME},free] |
VMware: Free disk space on {#FSNAME} (percentage) | VMware virtual machine file system statistics (percentages). |
Simple check | vmware.vm.vfs.fs.size[{$VMWARE.URL},{$VMWARE.VM.UUID},{#FSNAME},pfree] |
VMware: Total disk space on {#FSNAME} | VMware virtual machine total disk space (bytes). |
Simple check | vmware.vm.vfs.fs.size[{$VMWARE.URL},{$VMWARE.VM.UUID},{#FSNAME},total] Preprocessing
|
VMware: Used disk space on {#FSNAME} | VMware virtual machine used disk space (bytes). |
Simple check | vmware.vm.vfs.fs.size[{$VMWARE.URL},{$VMWARE.VM.UUID},{#FSNAME},used] |
Name | Description | Default |
---|---|---|
{$VMWARE.URL} | VMware service (vCenter or ESX hypervisor) SDK URL (https://servername/sdk). |
|
{$VMWARE.USERNAME} | VMware service user name. |
|
{$VMWARE.PASSWORD} | VMware service {$USERNAME} user password. |
|
{$VMWARE.HV.SENSOR.DISCOVERY} | Set "true"/"false" to enable or disable monitoring of hardware sensors. |
false |
{$VMWARE.HV.SENSOR.DISCOVERY.NAME.MATCHES} | Sets the regex string of hardware sensor names to allow in discovery. |
.* |
{$VMWARE.HV.SENSOR.DISCOVERY.NAME.NOT_MATCHES} | Sets the regex string of hardware sensor names to ignore in discovery. |
CHANGE_IF_NEEDED |
{$VMWARE.HV.DATASTORE.SPACE.CRIT} | The critical threshold of the datastore free space. |
10 |
{$VMWARE.HV.DATASTORE.SPACE.WARN} | The warning threshold of the datastore free space. |
20 |
Name | Description | Type | Key and additional info |
---|---|---|---|
VMware: Hypervisor ping | Checks if the hypervisor is running and accepting ICMP pings. |
Simple check | icmpping[] Preprocessing
|
VMware: Cluster name | Cluster name of the guest VM. |
Simple check | vmware.hv.cluster.name[{$VMWARE.URL},{$VMWARE.HV.UUID}] Preprocessing
|
VMware: CPU usage | Aggregated CPU usage across all cores on the host in Hz. This is only available if the host is connected. |
Simple check | vmware.hv.cpu.usage[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware: CPU usage in percents | CPU usage as a percentage during the interval. |
Simple check | vmware.hv.cpu.usage.perf[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware: CPU utilization | CPU usage as a percentage during the interval; the value depends on power management or hyper-threading (HT). |
Simple check | vmware.hv.cpu.utilization[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware: Power usage | Current power usage. |
Simple check | vmware.hv.power[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware: Power usage maximum allowed | Maximum allowed power usage. |
Simple check | vmware.hv.power[{$VMWARE.URL},{$VMWARE.HV.UUID},max] Preprocessing
|
VMware: Datacenter name | Datacenter name of the hypervisor. |
Simple check | vmware.hv.datacenter.name[{$VMWARE.URL},{$VMWARE.HV.UUID}] Preprocessing
|
VMware: Full name | The complete product name, including the version information. |
Simple check | vmware.hv.fullname[{$VMWARE.URL},{$VMWARE.HV.UUID}] Preprocessing
|
VMware: CPU frequency | The speed of the CPU cores. This is an average value if there are multiple speeds. The product of CPU frequency and number of cores is approximately equal to the sum of the MHz for all the individual cores on the host. |
Simple check | vmware.hv.hw.cpu.freq[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware: CPU model | The CPU model. |
Simple check | vmware.hv.hw.cpu.model[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware: CPU cores | Number of physical CPU cores on the host. Physical CPU cores are the processors contained by a CPU package. |
Simple check | vmware.hv.hw.cpu.num[{$VMWARE.URL},{$VMWARE.HV.UUID}] Preprocessing
|
VMware: CPU threads | Number of physical CPU threads on the host. |
Simple check | vmware.hv.hw.cpu.threads[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware: Total memory | The physical memory size. |
Simple check | vmware.hv.hw.memory[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware: Model | The system model identification. |
Simple check | vmware.hv.hw.model[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware: Bios UUID | The hardware BIOS identification. |
Simple check | vmware.hv.hw.uuid[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware: Vendor | The hardware vendor identification. |
Simple check | vmware.hv.hw.vendor[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware: Ballooned memory | The amount of guest physical memory that is currently reclaimed through the balloon driver. Sum of all guest VMs. |
Simple check | vmware.hv.memory.size.ballooned[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware: Used memory | Physical memory usage on the host. |
Simple check | vmware.hv.memory.used[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware: Number of bytes received | VMware hypervisor network input statistics (bytes per second). |
Simple check | vmware.hv.network.in[{$VMWARE.URL},{$VMWARE.HV.UUID},bps] |
VMware: Number of bytes transmitted | VMware hypervisor network output statistics (bytes per second). |
Simple check | vmware.hv.network.out[{$VMWARE.URL},{$VMWARE.HV.UUID},bps] |
VMware: Overall status | The overall alarm status of the host: gray - unknown, green - ok, red - it has a problem, yellow - it might have a problem. |
Simple check | vmware.hv.status[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware: Uptime | System uptime. |
Simple check | vmware.hv.uptime[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware: Version | Dot-separated version string. |
Simple check | vmware.hv.version[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware: Number of guest VMs | Number of guest virtual machines. |
Simple check | vmware.hv.vm.num[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
VMware: Get sensors | Master item for sensors data. |
Simple check | vmware.hv.sensors.get[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
VMware: Hypervisor is down | The service is unavailable or does not accept ICMP ping. |
last(/VMware Hypervisor/icmpping[])=0 |Average |
Manual close: Yes | |
VMware: The {$VMWARE.HV.UUID} health is Red | One or more components in the appliance might be in an unusable status and the appliance might become unresponsive soon. Security patches might be available. |
last(/VMware Hypervisor/vmware.hv.status[{$VMWARE.URL},{$VMWARE.HV.UUID}])=3 |High |
||
VMware: The {$VMWARE.HV.UUID} health is Yellow | One or more components in the appliance might become overloaded soon. |
last(/VMware Hypervisor/vmware.hv.status[{$VMWARE.URL},{$VMWARE.HV.UUID}])=2 |Average |
Depends on:
|
|
VMware: Hypervisor has been restarted | Uptime is less than 10 minutes. |
last(/VMware Hypervisor/vmware.hv.uptime[{$VMWARE.URL},{$VMWARE.HV.UUID}])<10m |Warning |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Datastore discovery | Simple check | vmware.hv.datastore.discovery[{$VMWARE.URL},{$VMWARE.HV.UUID}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
VMware: Average read latency of the datastore {#DATASTORE} | Average amount of time for a read operation from the datastore (milliseconds). |
Simple check | vmware.hv.datastore.read[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE},latency] |
VMware: Free space on datastore {#DATASTORE} (percentage) | VMware datastore free space in percentage from total. |
Simple check | vmware.hv.datastore.size[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE},pfree] |
VMware: Total size of datastore {#DATASTORE} | VMware datastore space in bytes. |
Simple check | vmware.hv.datastore.size[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE}] |
VMware: Average write latency of the datastore {#DATASTORE} | Average amount of time for a write operation to the datastore (milliseconds). |
Simple check | vmware.hv.datastore.write[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE},latency] |
VMware: Multipath count for datastore {#DATASTORE} | Number of available datastore paths. |
Simple check | vmware.hv.datastore.multipath[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE}] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
VMware: {#DATASTORE}: Free space is critically low | Datastore free space has fallen below critical threshold. |
last(/VMware Hypervisor/vmware.hv.datastore.size[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE},pfree])<{$VMWARE.HV.DATASTORE.SPACE.CRIT} |High |
||
VMware: {#DATASTORE}: Free space is low | Datastore free space has fallen below warning threshold. |
last(/VMware Hypervisor/vmware.hv.datastore.size[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE},pfree])<{$VMWARE.HV.DATASTORE.SPACE.WARN} |Warning |
Depends on:
|
|
VMware: The multipath count has been changed | The number of available datastore paths is less than the number registered ({#MULTIPATH.COUNT}). |
last(/VMware Hypervisor/vmware.hv.datastore.multipath[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE}],#1)<>last(/VMware Hypervisor/vmware.hv.datastore.multipath[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE}],#2) and last(/VMware Hypervisor/vmware.hv.datastore.multipath[{$VMWARE.URL},{$VMWARE.HV.UUID},{#DATASTORE}])<{#MULTIPATH.COUNT} |Average |
Manual close: Yes |
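Read literally, the multipath trigger above fires only when the path count has just changed (the two latest values differ) and the current count is below the number of paths registered at discovery time ({#MULTIPATH.COUNT}). A small Python illustration of the same logic (names and values are illustrative only):

```python
# Illustration of the multipath trigger logic, not the actual Zabbix evaluation code.
def multipath_problem(history, registered_paths):
    """history[-1] is the newest sample, history[-2] the previous one.
    Mirrors: last(#1) <> last(#2) and last(#1) < {#MULTIPATH.COUNT}."""
    if len(history) < 2:
        return False
    just_changed = history[-1] != history[-2]
    below_registered = history[-1] < registered_paths
    return just_changed and below_registered

# Example: 4 paths were registered at discovery; the newest sample reports only 3.
print(multipath_problem([4, 4, 3], registered_paths=4))  # True
```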
Name | Description | Type | Key and additional info |
---|---|---|---|
Healthcheck discovery | VMware Rollup Health State sensor discovery. |
Dependent item | vmware.hv.healthcheck.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
VMware: Health state rollup | The host health state rollup sensor value: gray - unknown, green - ok, red - it has a problem, yellow - it might have a problem. |
Dependent item | vmware.hv.sensor.health.state[{#SINGLETON}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
VMware: The {$VMWARE.HV.UUID} health is Red | One or more components in the appliance might be in an unusable status and the appliance might become unresponsive soon. Security patches might be available. |
last(/VMware Hypervisor/vmware.hv.sensor.health.state[{#SINGLETON}])="Red" |High |
Depends on:
|
|
VMware: The {$VMWARE.HV.UUID} health is Yellow | One or more components in the appliance might become overloaded soon. |
last(/VMware Hypervisor/vmware.hv.sensor.health.state[{#SINGLETON}])="Yellow" |Average |
Depends on:
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Sensor discovery | VMware hardware sensor discovery. The data is retrieved from numeric sensor probes and provides information about the health of the physical system. |
Dependent item | vmware.hv.sensors.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
VMware: Sensor [{#NAME}] health state | VMware hardware sensor health state. One of the following: Unknown, Green, Yellow, Red. |
Dependent item | vmware.hv.sensor.state["{#NAME}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
VMware: Sensor [{#NAME}] health state is Red | One or more components in the appliance might be in an unusable status and the appliance might become unresponsive soon. |
last(/VMware Hypervisor/vmware.hv.sensor.state["{#NAME}"])=3 |High |
Depends on:
|
|
VMware: Sensor [{#NAME}] health state is Yellow | One or more components in the appliance might become overloaded soon. |
last(/VMware Hypervisor/vmware.hv.sensor.state["{#NAME}"])=2 |Average |
Depends on:
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed to monitor Veeam Backup Enterprise Manager. It works without any external scripts and uses the script item.
NOTE: The Veeam Backup Enterprise Manager REST API may not be available for some editions; the template will only work with the following editions of Veeam Backup and Replication:
See Veeam Data Platform Feature Comparison for more details.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
See Zabbix template operation for basic instructions.
Create a user with the Portal Administrator role.
> See Veeam Help Center for more details.
Set the macros {$VEEAM.MANAGER.API.URL}, {$VEEAM.MANAGER.USER}, and {$VEEAM.MANAGER.PASSWORD}.
Name | Description | Default |
---|---|---|
{$VEEAM.MANAGER.API.URL} | Veeam Backup Enterprise Manager API endpoint is a URL in the format: |
https://localhost:9398 |
{$VEEAM.MANAGER.HTTP.PROXY} | Sets the HTTP proxy to |
|
{$VEEAM.MANAGER.PASSWORD} | The |
|
{$VEEAM.MANAGER.USER} | The |
|
{$VEEAM.MANAGER.DATA.TIMEOUT} | A response timeout for API. |
10 |
{$BACKUP.TYPE.MATCHES} | This macro is used in backup discovery rule. |
.* |
{$BACKUP.TYPE.NOT_MATCHES} | This macro is used in backup discovery rule. |
CHANGE_IF_NEEDED |
{$BACKUP.NAME.MATCHES} | This macro is used in backup discovery rule. |
.* |
{$BACKUP.NAME.NOT_MATCHES} | This macro is used in backup discovery rule. |
CHANGE_IF_NEEDED |
{$VEEAM.MANAGER.JOB.MAX.WARN} | The maximum score of warning jobs (for a trigger expression). |
10 |
{$VEEAM.MANAGER.JOB.MAX.FAIL} | The maximum score of failed jobs (for a trigger expression). |
5 |
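A quick way to validate the values you plan to put into {$VEEAM.MANAGER.API.URL}, {$VEEAM.MANAGER.USER}, and {$VEEAM.MANAGER.PASSWORD} is to open a REST session manually. Below is a minimal sketch with the Python requests library; the session endpoint path and the X-RestSvcSessionId header are assumptions based on Veeam's public Enterprise Manager REST documentation and are not taken from this template.

```python
# Rough pre-flight check of the Enterprise Manager REST API (not part of the template).
import requests

api_url = "https://localhost:9398"         # intended value of {$VEEAM.MANAGER.API.URL}
user, password = "veeam_user", "secret"    # {$VEEAM.MANAGER.USER} / {$VEEAM.MANAGER.PASSWORD}

resp = requests.post(
    f"{api_url}/api/sessionMngr/?v=latest",  # assumed session endpoint, see Veeam REST docs
    auth=(user, password),
    verify=False,                            # only for self-signed certificates
    timeout=10,                              # mirrors {$VEEAM.MANAGER.DATA.TIMEOUT}
)
resp.raise_for_status()
print("Session opened, id:", resp.headers.get("X-RestSvcSessionId"))
```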
Name | Description | Type | Key and additional info |
---|---|---|---|
Veeam Manager: Get metrics | The result of API requests is expressed in JSON. |
Script | veeam.manager.get.metrics |
Veeam Manager: Get errors | The errors from API requests. |
Dependent item | veeam.manager.get.errors Preprocessing
|
Veeam Manager: Running Jobs | Informs about the running jobs. |
Dependent item | veeam.manager.running.jobs Preprocessing
|
Veeam Manager: Scheduled Jobs | Informs about the scheduled jobs. |
Dependent item | veeam.manager.scheduled.jobs Preprocessing
|
Veeam Manager: Scheduled Backup Jobs | Informs about the scheduled backup jobs. |
Dependent item | veeam.manager.scheduled.backup.jobs Preprocessing
|
Veeam Manager: Scheduled Replica Jobs | Informs about the scheduled replica jobs. |
Dependent item | veeam.manager.scheduled.replica.jobs Preprocessing
|
Veeam Manager: Total Job Runs | Informs about the total job runs. |
Dependent item | veeam.manager.scheduled.total.jobs Preprocessing
|
Veeam Manager: Warnings Job Runs | Informs about the warning job runs. |
Dependent item | veeam.manager.warning.jobs Preprocessing
|
Veeam Manager: Failed Job Runs | Informs about the failed job runs. |
Dependent item | veeam.manager.failed.jobs Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Veeam Manager: There are errors in requests to API | Zabbix has received errors in response to API requests. |
length(last(/Veeam Backup Enterprise Manager by HTTP/veeam.manager.get.errors))>0 |Average |
||
Veeam Manager: Warning job runs is too high | last(/Veeam Backup Enterprise Manager by HTTP/veeam.manager.warning.jobs)>{$VEEAM.MANAGER.JOB.MAX.WARN} |Warning |
Manual close: Yes | ||
Veeam Manager: Failed job runs is too high | last(/Veeam Backup Enterprise Manager by HTTP/veeam.manager.failed.jobs)>{$VEEAM.MANAGER.JOB.MAX.FAIL} |Average |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Backup Files discovery | Discovery of all backup files created on, or imported to, the backup servers connected to Veeam Backup Enterprise Manager. |
Dependent item | veeam.backup.files.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Veeam Manager: Backup Size [{#NAME}] | Gets the backup size with the name |
Dependent item | veeam.backup.file.size[{#NAME}] Preprocessing
|
Veeam Manager: Data Size [{#NAME}] | Gets the data size with the name |
Dependent item | veeam.backup.data.size[{#NAME}] Preprocessing
|
Veeam Manager: Compression ratio [{#NAME}] | Gets the data compression ratio with the name |
Dependent item | veeam.backup.compress.ratio[{#NAME}] Preprocessing
|
Veeam Manager: Deduplication Ratio [{#NAME}] | Gets the data deduplication ratio with the name |
Dependent item | veeam.backup.deduplication.ratio[{#NAME}] Preprocessing
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed to monitor Veeam Backup and Replication. It works without any external scripts and uses the script item.
NOTE: Since the RESTful API may not be available for some editions, the template will only work with the following editions of Veeam Backup and Replication:
See Veeam Data Platform Feature Comparison for more details.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Set the macros {$VEEAM.API.URL}, {$VEEAM.USER}, and {$VEEAM.PASSWORD}.
Name | Description | Default |
---|---|---|
{$VEEAM.API.URL} | The Veeam API endpoint is a URL in the format |
https://localhost:9419 |
{$VEEAM.HTTP.PROXY} | Sets the HTTP proxy to |
|
{$VEEAM.PASSWORD} | The |
|
{$VEEAM.USER} | The |
|
{$VEEAM.DATA.TIMEOUT} | A response timeout for the API. |
10 |
{$CREATED.AFTER} | Returns sessions that are created after chosen days. |
7 |
{$SESSION.NAME.MATCHES} | This macro is used in discovery rule to evaluate sessions. |
.* |
{$SESSION.NAME.NOT_MATCHES} | This macro is used in discovery rule to evaluate sessions. |
CHANGE_IF_NEEDED |
{$SESSION.TYPE.MATCHES} | This macro is used in discovery rule to evaluate sessions. |
.* |
{$SESSION.TYPE.NOT_MATCHES} | This macro is used in discovery rule to evaluate sessions. |
CHANGE_IF_NEEDED |
{$PROXIES.NAME.MATCHES} | This macro is used in proxies discovery rule. |
.* |
{$PROXIES.NAME.NOT_MATCHES} | This macro is used in proxies discovery rule. |
CHANGE_IF_NEEDED |
{$PROXIES.TYPE.MATCHES} | This macro is used in proxies discovery rule. |
.* |
{$PROXIES.TYPE.NOT_MATCHES} | This macro is used in proxies discovery rule. |
CHANGE_IF_NEEDED |
{$REPOSITORIES.NAME.MATCHES} | This macro is used in repositories discovery rule. |
.* |
{$REPOSITORIES.NAME.NOT_MATCHES} | This macro is used in repositories discovery rule. |
CHANGE_IF_NEEDED |
{$REPOSITORIES.TYPE.MATCHES} | This macro is used in repositories discovery rule. |
.* |
{$REPOSITORIES.TYPE.NOT_MATCHES} | This macro is used in repositories discovery rule. |
CHANGE_IF_NEEDED |
{$JOB.NAME.MATCHES} | This macro is used in discovery rule to evaluate the states of jobs. |
.* |
{$JOB.NAME.NOT_MATCHES} | This macro is used in discovery rule to evaluate the states of jobs. |
CHANGE_IF_NEEDED |
{$JOB.TYPE.MATCHES} | This macro is used in discovery rule to evaluate the states of jobs. |
.* |
{$JOB.TYPE.NOT_MATCHES} | This macro is used in discovery rule to evaluate the states of jobs. |
CHANGE_IF_NEEDED |
{$JOB.STATUS.MATCHES} | This macro is used in discovery rule to evaluate the states of jobs. |
.* |
{$JOB.STATUS.NOT_MATCHES} | This macro is used in discovery rule to evaluate the states of jobs. |
CHANGE_IF_NEEDED |
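Similarly, the Veeam Backup and Replication REST API can be checked with a manual token request before filling in {$VEEAM.API.URL}, {$VEEAM.USER}, and {$VEEAM.PASSWORD}. Below is a minimal sketch with the Python requests library; the token path and the x-api-version header value are assumptions taken from Veeam's public REST documentation and may differ between product versions.

```python
# Rough pre-flight check of the Veeam Backup and Replication REST API (not part of the template).
import requests

api_url = "https://localhost:9419"        # intended value of {$VEEAM.API.URL}
user, password = "veeam_user", "secret"   # {$VEEAM.USER} / {$VEEAM.PASSWORD}

resp = requests.post(
    f"{api_url}/api/oauth2/token",                 # assumed token endpoint
    headers={"x-api-version": "1.1-rev0"},         # assumption: adjust to your server version
    data={"grant_type": "password", "username": user, "password": password},
    verify=False,                                  # only for self-signed certificates
    timeout=10,                                    # mirrors {$VEEAM.DATA.TIMEOUT}
)
resp.raise_for_status()
print("Access token received:", "access_token" in resp.json())
```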
Name | Description | Type | Key and additional info |
---|---|---|---|
Veeam: Get metrics | The result of API requests is expressed in JSON. |
Script | veeam.get.metrics |
Veeam: Get errors | The errors from API requests. |
Dependent item | veeam.get.errors Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Veeam: There are errors in requests to API | Zabbix has received errors in response to API requests. |
length(last(/Veeam Backup and Replication by HTTP/veeam.get.errors))>0 |Average |
Name | Description | Type | Key and additional info |
---|---|---|---|
Proxies discovery | Discovery of proxies. |
Dependent item | veeam.proxies.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Veeam: Server [{#NAME}]: Get data | Gets raw data collected by the proxy server. |
Dependent item | veeam.proxy.server.raw[{#NAME}] Preprocessing
|
Veeam: Proxy [{#NAME}] [{#TYPE}]: Get data | Gets raw data collected by the proxy with the name |
Dependent item | veeam.proxy.raw[{#NAME}] Preprocessing
|
Veeam: Proxy [{#NAME}] [{#TYPE}]: Max Task Count | The maximum number of concurrent tasks. |
Dependent item | veeam.proxy.maxtask[{#NAME}] Preprocessing
|
Veeam: Proxy [{#NAME}] [{#TYPE}]: Host name | The name of the proxy server. |
Dependent item | veeam.proxy.server.name[{#NAME}] Preprocessing
|
Veeam: Proxy [{#NAME}] [{#TYPE}]: Host type | The type of the proxy server. |
Dependent item | veeam.proxy.server.type[{#NAME}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Repositories discovery | Discovery of repositories. |
Dependent item | veeam.repositories.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Veeam: Repository [{#NAME}] [{#TYPE}]: Get data | Gets raw data from repository with the name: |
Dependent item | veeam.repositories.raw[{#NAME}] Preprocessing
|
Veeam: Repository [{#NAME}] [{#TYPE}]: Used space [{#PATH}] | Space used by repositories, expressed in gigabytes (GB). |
Dependent item | veeam.repository.capacity[{#NAME}] Preprocessing
|
Veeam: Repository [{#NAME}] [{#TYPE}]: Free space [{#PATH}] | Free space of repositories expressed in gigabytes (GB). |
Dependent item | veeam.repository.free.space[{#NAME}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Sessions discovery | Discovery of sessions. |
Dependent item | veeam.sessions.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Veeam: Session [{#NAME}] [{#TYPE}]: Get data | Gets raw data from session with the name: |
Dependent item | veeam.sessions.raw[{#ID}] Preprocessing
|
Veeam: Session [{#NAME}] [{#TYPE}]: State | The state of the session. The enums used: |
Dependent item | veeam.sessions.state[{#ID}] Preprocessing
|
Veeam: Session [{#NAME}] [{#TYPE}]: Result | The result of the session. The enums used: |
Dependent item | veeam.sessions.result[{#ID}] Preprocessing
|
Veeam: Session [{#NAME}] [{#TYPE}]: Message | A message that explains the session result. |
Dependent item | veeam.sessions.message[{#ID}] Preprocessing
|
Veeam: Session progress percent [{#NAME}] [{#TYPE}] | The progress of the session expressed as a percentage. |
Dependent item | veeam.sessions.progress.percent[{#ID}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Veeam: Last result session failed | find(/Veeam Backup and Replication by HTTP/veeam.sessions.result[{#ID}],,"like","Failed")=1 |Average |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Jobs states discovery | Discovery of the jobs states. |
Dependent item | veeam.job.state.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Veeam: Job states [{#NAME}] [{#TYPE}]: Get data | Gets raw data from the job states with the name |
Dependent item | veeam.jobs.states.raw[{#ID}] Preprocessing
|
Veeam: Job states [{#NAME}] [{#TYPE}]: Status | The current status of the job. The enums used: |
Dependent item | veeam.jobs.status[{#ID}] Preprocessing
|
Veeam: Job states [{#NAME}] [{#TYPE}]: Last result | The result of the session. The enums used: |
Dependent item | veeam.jobs.last.result[{#ID}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Veeam: Last result job failed | find(/Veeam Backup and Replication by HTTP/veeam.jobs.last.result[{#ID}],,"like","Failed")=1 |Average |
Manual close: Yes |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed to monitor HashiCorp Vault by Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The template Vault by HTTP collects metrics by HTTP agent from the /sys/metrics API endpoint.
See https://www.vaultproject.io/api-docs/system/metrics.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
See Zabbix template operation for basic instructions.
Configure Vault API. See Vault Configuration.
Create a Vault service token and set it to the macro {$VAULT.TOKEN}.
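To confirm the token works before setting {$VAULT.TOKEN}, you can request the same /sys/metrics endpoint the template polls. A minimal sketch with the Python requests library; the host, port, and token below are placeholders for the corresponding macros:

```python
# Rough check of the /sys/metrics endpoint the template collects from (not part of the template).
import requests

scheme, host, port = "http", "vault.example.com", 8200  # {$VAULT.API.SCHEME}/{$VAULT.HOST}/{$VAULT.API.PORT}
token = "s.xxxxxxxxxxxxxxxx"                             # intended value of {$VAULT.TOKEN}

resp = requests.get(
    f"{scheme}://{host}:{port}/v1/sys/metrics",
    params={"format": "prometheus"},
    headers={"X-Vault-Token": token},
    timeout=10,
)
resp.raise_for_status()
# A healthy response is Prometheus text exposition; metric names start with "vault_".
print("\n".join(resp.text.splitlines()[:5]))
```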
Name | Description | Default |
---|---|---|
{$VAULT.API.PORT} | Vault port. |
8200 |
{$VAULT.API.SCHEME} | Vault API scheme. |
http |
{$VAULT.HOST} | Vault host name. |
<PUT YOUR VAULT HOST> |
{$VAULT.OPEN.FDS.MAX.WARN} | Maximum percentage of used file descriptors for trigger expression. |
90 |
{$VAULT.LEADERSHIP.SETUP.FAILED.MAX.WARN} | Maximum number of Vault leadership setup failures. |
5 |
{$VAULT.LEADERSHIP.LOSSES.MAX.WARN} | Maximum number of Vault leadership losses. |
5 |
{$VAULT.LEADERSHIP.STEPDOWNS.MAX.WARN} | Maximum number of Vault leadership step downs. |
5 |
{$VAULT.LLD.FILTER.STORAGE.MATCHES} | Filter of discoverable storage backends. |
.+ |
{$VAULT.TOKEN} | Vault auth token. |
<PUT YOUR AUTH TOKEN> |
{$VAULT.TOKEN.ACCESSORS} | Vault accessors separated by spaces for monitoring token expiration time. |
|
{$VAULT.TOKEN.TTL.MIN.CRIT} | Token TTL critical threshold. |
3d |
{$VAULT.TOKEN.TTL.MIN.WARN} | Token TTL warning threshold. |
7d |
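The {$VAULT.TOKEN.ACCESSORS} macro lists the accessors whose expiration the template tracks through the "Vault: Get tokens" script item. The same kind of lookup can be reproduced manually; below is a minimal sketch using Vault's token lookup-accessor API (the accessor value is a made-up placeholder):

```python
# Sketch of a token lookup by accessor (illustrative; the template uses its own script item).
import requests

vault_addr = "http://vault.example.com:8200"  # built from {$VAULT.API.SCHEME}/{$VAULT.HOST}/{$VAULT.API.PORT}
token = "s.xxxxxxxxxxxxxxxx"                   # {$VAULT.TOKEN}; must be allowed to look up accessors
accessor = "hbH9example000accessor"            # one entry from {$VAULT.TOKEN.ACCESSORS}

resp = requests.post(
    f"{vault_addr}/v1/auth/token/lookup-accessor",
    headers={"X-Vault-Token": token},
    json={"accessor": accessor},
    timeout=10,
)
resp.raise_for_status()
data = resp.json()["data"]
print("display_name:", data.get("display_name"), "ttl:", data.get("ttl"))
```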
Name | Description | Type | Key and additional info | |||
---|---|---|---|---|---|---|
Vault: Get health | HTTP agent | vault.get_health Preprocessing
|
||||
Vault: Get leader | HTTP agent | vault.get_leader Preprocessing
|
||||
Vault: Get metrics | HTTP agent | vault.get_metrics Preprocessing
|
||||
Vault: Clear metrics | Dependent item | vault.clear_metrics Preprocessing
|
||||
Vault: Get tokens | Get information about tokens via their accessors. Accessors are defined in the macro "{$VAULT.TOKEN.ACCESSORS}". |
Script | vault.get_tokens | |||
Vault: Check WAL discovery | Dependent item | vault.checkwaldiscovery Preprocessing
|
||||
Vault: Check replication discovery | Dependent item | vault.checkreplicationdiscovery Preprocessing
|
||||
Vault: Check storage discovery | Dependent item | vault.checkstoragediscovery Preprocessing
|
put | list | delete)_count$"} ⛔️ Custom on fail: Discard value. JavaScript: The text is too long. Please see the template. Discard unchanged with heartbeat: 15m |
|
Vault: Check mountpoint discovery | Dependent item | vault.checkmountpointdiscovery Preprocessing
|
||||
Vault: Initialized | Initialization status. |
Dependent item | vault.health.initialized Preprocessing
|
|||
Vault: Sealed | Seal status. |
Dependent item | vault.health.sealed Preprocessing
|
|||
Vault: Standby | Standby status. |
Dependent item | vault.health.standby Preprocessing
|
|||
Vault: Performance standby | Performance standby status. |
Dependent item | vault.health.performance_standby Preprocessing
|
|||
Vault: Performance replication | Performance replication mode https://www.vaultproject.io/docs/enterprise/replication |
Dependent item | vault.health.replicationperformancemode Preprocessing
|
|||
Vault: Disaster Recovery replication | Disaster recovery replication mode https://www.vaultproject.io/docs/enterprise/replication |
Dependent item | vault.health.replicationdrmode Preprocessing
|
|||
Vault: Version | Server version. |
Dependent item | vault.health.version Preprocessing
|
|||
Vault: Healthcheck | Vault healthcheck. |
Dependent item | vault.health.check Preprocessing
|
|||
Vault: HA enabled | HA enabled status. |
Dependent item | vault.leader.ha_enabled Preprocessing
|
|||
Vault: Is leader | Leader status. |
Dependent item | vault.leader.is_self Preprocessing
|
|||
Vault: Get metrics error | Get metrics error. |
Dependent item | vault.get_metrics.error Preprocessing
|
|||
Vault: Process CPU seconds, total | Total user and system CPU time spent in seconds. |
Dependent item | vault.metrics.process.cpu.seconds.total Preprocessing
|
|||
Vault: Open file descriptors, max | Maximum number of open file descriptors. |
Dependent item | vault.metrics.process.max.fds Preprocessing
|
|||
Vault: Open file descriptors, current | Number of open file descriptors. |
Dependent item | vault.metrics.process.open.fds Preprocessing
|
|||
Vault: Process resident memory | Resident memory size in bytes. |
Dependent item | vault.metrics.process.resident_memory.bytes Preprocessing
|
|||
Vault: Uptime | Server uptime. |
Dependent item | vault.metrics.process.uptime Preprocessing
|
|||
Vault: Process virtual memory, current | Virtual memory size in bytes. |
Dependent item | vault.metrics.process.virtual_memory.bytes Preprocessing
|
|||
Vault: Process virtual memory, max | Maximum amount of virtual memory available in bytes. |
Dependent item | vault.metrics.process.virtual_memory.max.bytes Preprocessing
|
|||
Vault: Audit log requests, rate | Number of all audit log requests across all audit log devices. |
Dependent item | vault.metrics.audit.log.request.rate Preprocessing
|
|||
Vault: Audit log request failures, rate | Number of audit log request failures. |
Dependent item | vault.metrics.audit.log.request.failure.rate Preprocessing
|
|||
Vault: Audit log response, rate | Number of audit log responses across all audit log devices. |
Dependent item | vault.metrics.audit.log.response.rate Preprocessing
|
|||
Vault: Audit log response failures, rate | Number of audit log response failures. |
Dependent item | vault.metrics.audit.log.response.failure.rate Preprocessing
|
|||
Vault: Barrier DELETE ops, rate | Number of DELETE operations at the barrier. |
Dependent item | vault.metrics.barrier.delete.rate Preprocessing
|
|||
Vault: Barrier GET ops, rate | Number of GET operations at the barrier. |
Dependent item | vault.metrics.vault.barrier.get.rate Preprocessing
|
|||
Vault: Barrier LIST ops, rate | Number of LIST operations at the barrier. |
Dependent item | vault.metrics.barrier.list.rate Preprocessing
|
|||
Vault: Barrier PUT ops, rate | Number of PUT operations at the barrier. |
Dependent item | vault.metrics.barrier.put.rate Preprocessing
|
|||
Vault: Cache hit, rate | Number of times a value was retrieved from the LRU cache. |
Dependent item | vault.metrics.cache.hit.rate Preprocessing
|
|||
Vault: Cache miss, rate | Number of times a value was not in the LRU cache. This results in a read from the configured storage. |
Dependent item | vault.metrics.cache.miss.rate Preprocessing
|
|||
Vault: Cache write, rate | Number of times a value was written to the LRU cache. |
Dependent item | vault.metrics.cache.write.rate Preprocessing
|
|||
Vault: Check token, rate | Number of token checks handled by Vault core. |
Dependent item | vault.metrics.core.check.token.rate Preprocessing
|
|||
Vault: Fetch ACL and token, rate | Number of ACL and corresponding token entry fetches handled by Vault core. |
Dependent item | vault.metrics.core.fetch.aclandtoken Preprocessing
|
|||
Vault: Requests, rate | Number of requests handled by Vault core. |
Dependent item | vault.metrics.core.handle.request Preprocessing
|
|||
Vault: Leadership setup failed, counter | Cluster leadership setup failures which have occurred in a highly available Vault cluster. |
Dependent item | vault.metrics.core.leadership.setup_failed Preprocessing
|
|||
Vault: Leadership setup lost, counter | Cluster leadership losses which have occurred in a highly available Vault cluster. |
Dependent item | vault.metrics.core.leadership_lost Preprocessing
|
|||
Vault: Post-unseal ops, counter | Duration of time taken by post-unseal operations handled by Vault core. |
Dependent item | vault.metrics.core.post_unseal Preprocessing
|
|||
Vault: Pre-seal ops, counter | Duration of time taken by pre-seal operations. |
Dependent item | vault.metrics.core.pre_seal Preprocessing
|
|||
Vault: Requested seal ops, counter | Duration of time taken by requested seal operations. |
Dependent item | vault.metrics.core.sealwithrequest Preprocessing
|
|||
Vault: Seal ops, counter | Duration of time taken by seal operations. |
Dependent item | vault.metrics.core.seal Preprocessing
|
|||
Vault: Internal seal ops, counter | Duration of time taken by internal seal operations. |
Dependent item | vault.metrics.core.seal_internal Preprocessing
|
|||
Vault: Leadership step downs, counter | Cluster leadership step downs. |
Dependent item | vault.metrics.core.step_down Preprocessing
|
|||
Vault: Unseal ops, counter | Duration of time taken by unseal operations. |
Dependent item | vault.metrics.core.unseal Preprocessing
|
|||
Vault: Fetch lease times, counter | Time taken to fetch lease times. |
Dependent item | vault.metrics.expire.fetch.lease.times Preprocessing
|
|||
Vault: Fetch lease times by token, counter | Time taken to fetch lease times by token. |
Dependent item | vault.metrics.expire.fetch.lease.times.by_token Preprocessing
|
|||
Vault: Number of expiring leases | Number of all leases which are eligible for eventual expiry. |
Dependent item | vault.metrics.expire.num_leases Preprocessing
|
|||
Vault: Expire revoke, count | Time taken to revoke a token. |
Dependent item | vault.metrics.expire.revoke Preprocessing
|
|||
Vault: Expire revoke force, count | Time taken to forcibly revoke a token. |
Dependent item | vault.metrics.expire.revoke.force Preprocessing
|
|||
Vault: Expire revoke prefix, count | Time taken to revoke tokens on a prefix. |
Dependent item | vault.metrics.expire.revoke.prefix Preprocessing
|
|||
Vault: Revoke secrets by token, count | Time taken to revoke all secrets issued with a given token. |
Dependent item | vault.metrics.expire.revoke.by_token Preprocessing
|
|||
Vault: Expire renew, count | Time taken to renew a lease. |
Dependent item | vault.metrics.expire.renew Preprocessing
|
|||
Vault: Renew token, count | Time taken to renew a token which does not need to invoke a logical backend. |
Dependent item | vault.metrics.expire.renew_token Preprocessing
|
|||
Vault: Register ops, count | Time taken for register operations. |
Dependent item | vault.metrics.expire.register Preprocessing
|
|||
Vault: Register auth ops, count | Time taken for register authentication operations which create lease entries without lease ID. |
Dependent item | vault.metrics.expire.register.auth Preprocessing
|
|||
Vault: Policy GET ops, rate | Number of operations to get a policy. |
Dependent item | vault.metrics.policy.get_policy.rate Preprocessing
|
|||
Vault: Policy LIST ops, rate | Number of operations to list policies. |
Dependent item | vault.metrics.policy.list_policies.rate Preprocessing
|
|||
Vault: Policy DELETE ops, rate | Number of operations to delete a policy. |
Dependent item | vault.metrics.policy.delete_policy.rate Preprocessing
|
|||
Vault: Policy SET ops, rate | Number of operations to set a policy. |
Dependent item | vault.metrics.policy.set_policy.rate Preprocessing
|
|||
Vault: Token create, count | The time taken to create a token. |
Dependent item | vault.metrics.token.create Preprocessing
|
|||
Vault: Token createAccessor, count | The time taken to create a token accessor. |
Dependent item | vault.metrics.token.createAccessor Preprocessing
|
|||
Vault: Token lookup, rate | Number of token lookups. |
Dependent item | vault.metrics.token.lookup.rate Preprocessing
|
|||
Vault: Token revoke, count | The time taken to revoke a token. |
Dependent item | vault.metrics.token.revoke Preprocessing
|
|||
Vault: Token revoke tree, count | Time taken to revoke a token tree. |
Dependent item | vault.metrics.token.revoke.tree Preprocessing
|
|||
Vault: Token store, count | Time taken to store an updated token entry without writing to the secondary index. |
Dependent item | vault.metrics.token.store Preprocessing
|
|||
Vault: Runtime allocated bytes | Number of bytes allocated by the Vault process. This could burst from time to time, but should return to a steady state value. |
Dependent item | vault.metrics.runtime.alloc.bytes Preprocessing
|
|||
Vault: Runtime freed objects | Number of freed objects. |
Dependent item | vault.metrics.runtime.free.count Preprocessing
|
|||
Vault: Runtime heap objects | Number of objects on the heap. This is a good general memory pressure indicator worth establishing a baseline and thresholds for alerting. |
Dependent item | vault.metrics.runtime.heap.objects Preprocessing
|
|||
Vault: Runtime malloc count | Cumulative count of allocated heap objects. |
Dependent item | vault.metrics.runtime.malloc.count Preprocessing
|
|||
Vault: Runtime num goroutines | Number of goroutines. This serves as a general system load indicator worth establishing a baseline and thresholds for alerting. |
Dependent item | vault.metrics.runtime.num_goroutines Preprocessing
|
|||
Vault: Runtime sys bytes | Number of bytes allocated to Vault. This includes what is being used by Vault's heap and what has been reclaimed but not given back to the operating system. |
Dependent item | vault.metrics.runtime.sys.bytes Preprocessing
|
|||
Vault: Runtime GC pause, total | The total garbage collector pause time since Vault was last started. |
Dependent item | vault.metrics.total.gc.pause Preprocessing
|
|||
Vault: Runtime GC runs, total | Total number of garbage collection runs since Vault was last started. |
Dependent item | vault.metrics.runtime.total.gc.runs Preprocessing
|
|||
Vault: Token count, total | Total number of service tokens available for use; counts all un-expired and un-revoked tokens in Vault's token store. This measurement is performed every 10 minutes. |
Dependent item | vault.metrics.token Preprocessing
|
|||
Vault: Token count by auth, total | Total number of service tokens that were created by an auth method. |
Dependent item | vault.metrics.token.by_auth Preprocessing
|
|||
Vault: Token count by policy, total | Total number of service tokens that have a policy attached. |
Dependent item | vault.metrics.token.by_policy Preprocessing
|
|||
Vault: Token count by ttl, total | Number of service tokens, grouped by the TTL range they were assigned at creation. |
Dependent item | vault.metrics.token.by_ttl Preprocessing
|
|||
Vault: Token creation, rate | Number of service or batch tokens created. |
Dependent item | vault.metrics.token.creation.rate Preprocessing
|
|||
Vault: Secret kv entries | Number of entries in each key-value secret engine. |
Dependent item | vault.metrics.secret.kv.count Preprocessing
|
|||
Vault: Token secret lease creation, rate | Counts the number of leases created by secret engines. |
Dependent item | vault.metrics.secret.lease.creation.rate Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Vault: Vault server is sealed | https://www.vaultproject.io/docs/concepts/seal |
last(/HashiCorp Vault by HTTP/vault.health.sealed)=1 |Average |
||
Vault: Version has changed | Vault version has changed. Acknowledge to close the problem manually. |
last(/HashiCorp Vault by HTTP/vault.health.version,#1)<>last(/HashiCorp Vault by HTTP/vault.health.version,#2) and length(last(/HashiCorp Vault by HTTP/vault.health.version))>0 |Info |
Manual close: Yes | |
Vault: Vault server is not responding | last(/HashiCorp Vault by HTTP/vault.health.check)=0 |High |
|||
Vault: Failed to get metrics | length(last(/HashiCorp Vault by HTTP/vault.get_metrics.error))>0 |Warning |
Depends on:
|
||
Vault: Current number of open files is too high | min(/HashiCorp Vault by HTTP/vault.metrics.process.open.fds,5m)/last(/HashiCorp Vault by HTTP/vault.metrics.process.max.fds)*100>{$VAULT.OPEN.FDS.MAX.WARN} |Warning |
|||
Vault: Vault has been restarted | Uptime is less than 10 minutes. |
last(/HashiCorp Vault by HTTP/vault.metrics.process.uptime)<10m |Info |
Manual close: Yes | |
Vault: High frequency of leadership setup failures | There have been more than {$VAULT.LEADERSHIP.SETUP.FAILED.MAX.WARN} Vault leadership setup failures in the past 1h. |
(max(/HashiCorp Vault by HTTP/vault.metrics.core.leadership.setup_failed,1h)-min(/HashiCorp Vault by HTTP/vault.metrics.core.leadership.setup_failed,1h))>{$VAULT.LEADERSHIP.SETUP.FAILED.MAX.WARN} |Average |
||
Vault: High frequency of leadership losses | There have been more than {$VAULT.LEADERSHIP.LOSSES.MAX.WARN} Vault leadership losses in the past 1h. |
(max(/HashiCorp Vault by HTTP/vault.metrics.core.leadership_lost,1h)-min(/HashiCorp Vault by HTTP/vault.metrics.core.leadership_lost,1h))>{$VAULT.LEADERSHIP.LOSSES.MAX.WARN} |Average |
||
Vault: High frequency of leadership step downs | There have been more than {$VAULT.LEADERSHIP.STEPDOWNS.MAX.WARN} Vault leadership step downs in the past 1h. |
(max(/HashiCorp Vault by HTTP/vault.metrics.core.step_down,1h)-min(/HashiCorp Vault by HTTP/vault.metrics.core.step_down,1h))>{$VAULT.LEADERSHIP.STEPDOWNS.MAX.WARN} |Average |
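The three leadership triggers above work on cumulative counters: max(...,1h)-min(...,1h) approximates how many setup failures, losses, or step downs occurred during the last hour, and the trigger fires when that delta exceeds the corresponding macro. A tiny illustration of the arithmetic (the sample values are made up):

```python
# Illustration only: how the leadership triggers turn a cumulative counter into "events per hour".
samples_last_hour = [17, 17, 18, 21, 23]  # hypothetical readings of vault.metrics.core.leadership_lost
delta = max(samples_last_hour) - min(samples_last_hour)
threshold = 5                              # {$VAULT.LEADERSHIP.LOSSES.MAX.WARN}
print(f"losses in the last hour: {delta}; trigger fires: {delta > threshold}")
```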
Name | Description | Type | Key and additional info |
---|---|---|---|
Storage metrics discovery | Storage backend metrics discovery. |
Dependent item | vault.storage.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Vault: Storage [{#STORAGE}] {#OPERATION} ops, rate | Number of {#OPERATION} operations against the {#STORAGE} storage backend. |
Dependent item | vault.metrics.storage.rate[{#STORAGE}, {#OPERATION}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Mountpoint metrics discovery | Mountpoint metrics discovery. |
Dependent item | vault.mountpoint.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Vault: Rollback attempt [{#MOUNTPOINT}] ops, rate | Number of operations to perform a rollback operation on the given mount point. |
Dependent item | vault.metrics.rollback.attempt.rate[{#MOUNTPOINT}] Preprocessing
|
Vault: Route rollback [{#MOUNTPOINT}] ops, rate | Number of operations to dispatch a rollback operation to a backend, and for that backend to process it. Rollback operations are automatically scheduled to clean up partial errors. |
Dependent item | vault.metrics.route.rollback.rate[{#MOUNTPOINT}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
WAL metrics discovery | Discovery for WAL metrics. |
Dependent item | vault.wal.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Vault: Delete WALs, count{#SINGLETON} | Time taken to delete a Write Ahead Log (WAL). |
Dependent item | vault.metrics.wal.deletewals[{#SINGLETON}] Preprocessing
|
Vault: GC deleted WAL{#SINGLETON} | Number of Write Ahead Logs (WAL) deleted during each garbage collection run. |
Dependent item | vault.metrics.wal.gc.deleted[{#SINGLETON}] Preprocessing
|
Vault: WALs on disk, total{#SINGLETON} | Total number of Write Ahead Logs (WAL) on disk. |
Dependent item | vault.metrics.wal.gc.total[{#SINGLETON}] Preprocessing
|
Vault: Load WALs, count{#SINGLETON} | Time taken to load a Write Ahead Log (WAL). |
Dependent item | vault.metrics.wal.loadWAL[{#SINGLETON}] Preprocessing
|
Vault: Persist WALs, count{#SINGLETON} | Time taken to persist a Write Ahead Log (WAL). |
Dependent item | vault.metrics.wal.persistwals[{#SINGLETON}] Preprocessing
|
Vault: Flush ready WAL, count{#SINGLETON} | Time taken to flush a ready Write Ahead Log (WAL) to storage. |
Dependent item | vault.metrics.wal.flushready[{#SINGLETON}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Replication metrics discovery | Discovery for replication metrics. |
Dependent item | vault.replication.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Vault: Stream WAL missing guard, count{#SINGLETON} | Number of incidences where the starting Merkle Tree index used to begin streaming WAL entries is not matched/found. |
Dependent item | vault.metrics.logshipper.streamWALs.missing_guard[{#SINGLETON}] Preprocessing
|
Vault: Stream WAL guard found, count{#SINGLETON} | Number of incidences where the starting Merkle Tree index used to begin streaming WAL entries is matched/found. |
Dependent item | vault.metrics.logshipper.streamWALs.guard_found[{#SINGLETON}] Preprocessing
|
Vault: Merkle commit index{#SINGLETON} | The last committed index in the Merkle Tree. |
Dependent item | vault.metrics.replication.merkle.commit_index[{#SINGLETON}] Preprocessing
|
Vault: Last WAL{#SINGLETON} | The index of the last WAL. |
Dependent item | vault.metrics.replication.wal.last_wal[{#SINGLETON}] Preprocessing
|
Vault: Last DR WAL{#SINGLETON} | The index of the last DR WAL. |
Dependent item | vault.metrics.replication.wal.lastdrwal[{#SINGLETON}] Preprocessing
|
Vault: Last performance WAL{#SINGLETON} | The index of the last Performance WAL. |
Dependent item | vault.metrics.replication.wal.lastperformancewal[{#SINGLETON}] Preprocessing
|
Vault: Last remote WAL{#SINGLETON} | The index of the last remote WAL. |
Dependent item | vault.metrics.replication.fsm.lastremotewal[{#SINGLETON}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Token metrics discovery | Tokens metrics discovery. |
Dependent item | vault.tokens.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Vault: Token [{#TOKEN_NAME}] error | Token lookup error text. |
Dependent item | vault.token_via_accessor.error["{#ACCESSOR}"] Preprocessing
|
Vault: Token [{#TOKEN_NAME}] has TTL | The Token has TTL. |
Dependent item | vault.token_via_accessor.has_ttl["{#ACCESSOR}"] Preprocessing
|
Vault: Token [{#TOKEN_NAME}] TTL | The TTL period of the token. |
Dependent item | vault.token_via_accessor.ttl["{#ACCESSOR}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Vault: Token [{#TOKEN_NAME}] lookup error occurred | length(last(/HashiCorp Vault by HTTP/vault.token_via_accessor.error["{#ACCESSOR}"]))>0 |Warning |
Depends on:
|
||
Vault: Token [{#TOKEN_NAME}] will expire soon | last(/HashiCorp Vault by HTTP/vault.token_via_accessor.has_ttl["{#ACCESSOR}"])=1 and last(/HashiCorp Vault by HTTP/vault.token_via_accessor.ttl["{#ACCESSOR}"])<{$VAULT.TOKEN.TTL.MIN.CRIT} |Average |
|||
Vault: Token [{#TOKEN_NAME}] will expire soon | last(/HashiCorp Vault by HTTP/vault.token_via_accessor.has_ttl["{#ACCESSOR}"])=1 and last(/HashiCorp Vault by HTTP/vault.token_via_accessor.ttl["{#ACCESSOR}"])<{$VAULT.TOKEN.TTL.MIN.WARN} |Warning |
Depends on:
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
Template for monitoring TrueNAS CORE by SNMP.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Name | Description | Default |
---|---|---|
{$CPU.UTIL.CRIT} | Threshold of CPU utilization for warning trigger in %. |
90 |
{$ICMP_LOSS_WARN} | Threshold of ICMP packet loss for the warning trigger in %. |
20 |
{$ICMP_RESPONSE_TIME_WARN} | Threshold of average ICMP response time for the warning trigger in seconds. |
0.15 |
{$IF.ERRORS.WARN} | Threshold of error packets rate for warning trigger. Can be used with interface name as context. |
2 |
{$IF.UTIL.MAX} | Threshold of interface bandwidth utilization for warning trigger in %. Can be used with interface name as context. |
90 |
{$IFCONTROL} | Macro for operational state of the interface for link down trigger. Can be used with interface name as context. |
1 |
{$LOAD_AVG_PER_CPU.MAX.WARN} | Load per CPU considered sustainable. Tune if needed. |
1.5 |
{$MEMORY.AVAILABLE.MIN} | Threshold of available memory for trigger in bytes. |
20M |
{$MEMORY.UTIL.MAX} | Threshold of memory utilization for trigger in % |
90 |
{$NET.IF.IFADMINSTATUS.MATCHES} | This macro is used in filters of network interfaces discovery rule. |
^.* |
{$NET.IF.IFADMINSTATUS.NOT_MATCHES} | Ignore down(2) administrative status |
^2$ |
{$NET.IF.IFALIAS.MATCHES} | This macro is used in filters of network interfaces discovery rule. |
.* |
{$NET.IF.IFALIAS.NOT_MATCHES} | This macro is used in filters of network interfaces discovery rule. |
CHANGE_IF_NEEDED |
{$NET.IF.IFDESCR.MATCHES} | This macro is used in filters of network interfaces discovery rule. |
.* |
{$NET.IF.IFDESCR.NOT_MATCHES} | This macro is used in filters of network interfaces discovery rule. |
CHANGE_IF_NEEDED |
{$NET.IF.IFNAME.NOT_MATCHES} | This macro is used in filters of network interfaces discovery rule. |
CHANGE_IF_NEEDED |
{$NET.IF.IFOPERSTATUS.MATCHES} | This macro is used in filters of network interfaces discovery rule. |
^.*$ |
{$NET.IF.IFOPERSTATUS.NOT_MATCHES} | Ignore notPresent(6) |
^6$ |
{$NET.IF.IFTYPE.MATCHES} | This macro is used in filters of network interfaces discovery rule. |
.* |
{$NET.IF.IFTYPE.NOT_MATCHES} | This macro is used in filters of network interfaces discovery rule. |
CHANGE_IF_NEEDED |
{$SNMP.TIMEOUT} | The time interval for SNMP availability trigger. |
5m |
{$SWAP.PFREE.MIN.WARN} | Threshold of free swap space for warning trigger in %. |
50 |
{$VFS.DEV.DEVNAME.MATCHES} | This macro is used in block devices discovery. Can be overridden on the host or linked template level |
.+ |
{$VFS.DEV.DEVNAME.NOT_MATCHES} | This macro is used in block devices discovery. Can be overridden on the host or linked template level |
Macro too long. Please see the template. |
{$DATASET.NAME.MATCHES} | This macro is used in datasets discovery. Can be overridden on the host or linked template level |
.+ |
{$DATASET.NAME.NOT_MATCHES} | This macro is used in datasets discovery. Can be overridden on the host or linked template level |
^(boot|.+\.system(.+)?$) |
{$ZPOOL.PUSED.MAX.WARN} | Threshold of used pool space for warning trigger in %. |
80 |
{$ZPOOL.FREE.MIN.WARN} | This macro is used for trigger expression. It can be overridden on the host or linked on the template level. |
5G |
{$ZPOOL.PUSED.MAX.CRIT} | Threshold of used pool space for average severity trigger in %. |
90 |
{$ZPOOL.FREE.MIN.CRIT} | This macro is used for trigger expression. It can be overridden on the host or linked on the template level. |
5G |
{$DATASET.PUSED.MAX.WARN} | Threshold of used dataset space for warning trigger in %. |
80 |
{$DATASET.FREE.MIN.WARN} | This macro is used for trigger expression. It can be overridden on the host or linked on the template level. |
5G |
{$DATASET.PUSED.MAX.CRIT} | Threshold of used dataset space for average severity trigger in %. |
90 |
{$DATASET.FREE.MIN.CRIT} | This macro is used for trigger expression. It can be overridden on the host or linked on the template level. |
5G |
{$TEMPERATURE.MAX.WARN} | This macro is used for trigger expression. It can be overridden on the host or linked on the template level. |
50 |
{$TEMPERATURE.MAX.CRIT} | This macro is used for trigger expression. It can be overridden on the host or linked on the template level. |
65 |
Name | Description | Type | Key and additional info |
---|---|---|---|
TrueNAS CORE: ICMP ping | Host accessibility by ICMP. 0 - ICMP ping fails. 1 - ICMP ping successful. |
Simple check | icmpping |
TrueNAS CORE: ICMP loss | Percentage of lost packets. |
Simple check | icmppingloss |
TrueNAS CORE: ICMP response time | ICMP ping response time (in seconds). |
Simple check | icmppingsec |
TrueNAS CORE: System contact details | MIB: SNMPv2-MIB The textual identification of the contact person for this managed node, together with information on how to contact this person. If no contact information is known, the value is the zero-length string. |
SNMP agent | system.contact Preprocessing
|
TrueNAS CORE: System description | MIB: SNMPv2-MIB System description of the host. |
SNMP agent | system.descr Preprocessing
|
TrueNAS CORE: System location | MIB: SNMPv2-MIB The physical location of this node. If the location is unknown, the value is the zero-length string. |
SNMP agent | system.location Preprocessing
|
TrueNAS CORE: System name | MIB: SNMPv2-MIB The host name of the system. |
SNMP agent | system.name Preprocessing
|
TrueNAS CORE: System object ID | MIB: SNMPv2-MIB The vendor's authoritative identification of the network management subsystem contained in the entity. This value is allocated within the SMI enterprises subtree (1.3.6.1.4.1) and provides an easy and unambiguous means for determining what kind of box is being managed. |
SNMP agent | system.objectid Preprocessing
|
TrueNAS CORE: Uptime | MIB: HOST-RESOURCES-MIB The amount of time since this host was last initialized. Note that this is different from sysUpTime in the SNMPv2-MIB [RFC1907] because sysUpTime is the uptime of the network management portion of the system. |
SNMP agent | system.uptime Preprocessing
|
TrueNAS CORE: SNMP traps (fallback) | The item is used to collect all SNMP traps unmatched by other snmptrap items. |
SNMP trap | snmptrap.fallback |
TrueNAS CORE: SNMP agent availability | Availability of SNMP checks on the host. The value of this item corresponds to availability icons in the host list. Possible values: 0 - not available, 1 - available, 2 - unknown. |
Zabbix internal | zabbix[host,snmp,available] |
TrueNAS CORE: Interrupts per second | MIB: UCD-SNMP-MIB Number of interrupts processed. |
SNMP agent | system.cpu.intr Preprocessing
|
TrueNAS CORE: Context switches per second | MIB: UCD-SNMP-MIB Number of context switches. |
SNMP agent | system.cpu.switches Preprocessing
|
TrueNAS CORE: Load average (1m avg) | MIB: UCD-SNMP-MIB The 1 minute load average. |
SNMP agent | system.cpu.load.avg1 |
TrueNAS CORE: Load average (5m avg) | MIB: UCD-SNMP-MIB The 5 minute load average. |
SNMP agent | system.cpu.load.avg5 |
TrueNAS CORE: Load average (15m avg) | MIB: UCD-SNMP-MIB The 15 minute load average. |
SNMP agent | system.cpu.load.avg15 |
TrueNAS CORE: Number of CPUs | MIB: HOST-RESOURCES-MIB The number of CPU cores, counted from the entries discovered in hrProcessorTable using LLD. |
SNMP agent | system.cpu.num Preprocessing
|
TrueNAS CORE: Free memory | MIB: UCD-SNMP-MIB The amount of real/physical memory currently unused or available. |
SNMP agent | vm.memory.free Preprocessing
|
TrueNAS CORE: Memory (buffers) | MIB: UCD-SNMP-MIB The total amount of real or virtual memory currently allocated for use as memory buffers. |
SNMP agent | vm.memory.buffers Preprocessing
|
TrueNAS CORE: Memory (cached) | MIB: UCD-SNMP-MIB The total amount of real or virtual memory currently allocated for use as cached memory. |
SNMP agent | vm.memory.cached Preprocessing
|
TrueNAS CORE: Total memory | MIB: UCD-SNMP-MIB The total memory expressed in bytes. |
SNMP agent | vm.memory.total Preprocessing
|
TrueNAS CORE: Available memory | Please note that memory utilization is a rough estimate, since memory available is calculated as free+buffers+cached, which is not 100% accurate, but the best we can get using SNMP. |
Calculated | vm.memory.available |
TrueNAS CORE: Memory utilization | Please note that memory utilization is a rough estimate, since memory available is calculated as free+buffers+cached, which is not 100% accurate, but the best we can get using SNMP (see the sketch after this items table). |
Calculated | vm.memory.util |
TrueNAS CORE: Total swap space | MIB: UCD-SNMP-MIB The total amount of swap space configured for this host. |
SNMP agent | system.swap.total Preprocessing
|
TrueNAS CORE: Free swap space | MIB: UCD-SNMP-MIB The amount of swap space currently unused or available. |
SNMP agent | system.swap.free Preprocessing
|
TrueNAS CORE: Free swap space in % | The free space of the swap volume/file expressed in %. |
Calculated | system.swap.pfree Preprocessing
|
TrueNAS CORE: ARC size | MIB: FREENAS-MIB ARC size in bytes. |
SNMP agent | truenas.zfs.arc.size Preprocessing
|
TrueNAS CORE: ARC metadata size | MIB: FREENAS-MIB ARC metadata size used in bytes. |
SNMP agent | truenas.zfs.arc.meta Preprocessing
|
TrueNAS CORE: ARC data size | MIB: FREENAS-MIB ARC data size used in bytes. |
SNMP agent | truenas.zfs.arc.data Preprocessing
|
TrueNAS CORE: ARC hits | MIB: FREENAS-MIB Total amount of cache hits in the ARC per second. |
SNMP agent | truenas.zfs.arc.hits Preprocessing
|
TrueNAS CORE: ARC misses | MIB: FREENAS-MIB Total amount of cache misses in the ARC per second. |
SNMP agent | truenas.zfs.arc.misses Preprocessing
|
TrueNAS CORE: ARC target size of cache | MIB: FREENAS-MIB ARC target size of cache in bytes. |
SNMP agent | truenas.zfs.arc.c Preprocessing
|
TrueNAS CORE: ARC target size of MRU | MIB: FREENAS-MIB ARC target size of MRU in bytes. |
SNMP agent | truenas.zfs.arc.p Preprocessing
|
TrueNAS CORE: ARC cache hit ratio | MIB: FREENAS-MIB ARC cache hit ratio percentage. |
SNMP agent | truenas.zfs.arc.hit.ratio |
TrueNAS CORE: ARC cache miss ratio | MIB: FREENAS-MIB ARC cache miss ratio percentage. |
SNMP agent | truenas.zfs.arc.miss.ratio |
TrueNAS CORE: L2ARC hits | MIB: FREENAS-MIB Hits to the L2 cache per second. |
SNMP agent | truenas.zfs.l2arc.hits Preprocessing
|
TrueNAS CORE: L2ARC misses | MIB: FREENAS-MIB Misses to the L2 cache per second. |
SNMP agent | truenas.zfs.l2arc.misses Preprocessing
|
TrueNAS CORE: L2ARC read rate | MIB: FREENAS-MIB Read rate from L2 cache in bytes per second. |
SNMP agent | truenas.zfs.l2arc.read Preprocessing
|
TrueNAS CORE: L2ARC write rate | MIB: FREENAS-MIB Write rate from L2 cache in bytes per second. |
SNMP agent | truenas.zfs.l2arc.write Preprocessing
|
TrueNAS CORE: L2ARC size | MIB: FREENAS-MIB L2ARC size in bytes. |
SNMP agent | truenas.zfs.l2arc.size Preprocessing
|
TrueNAS CORE: ZIL operations 1 second | MIB: FREENAS-MIB The ops column parsed from the command zilstat 1 1. |
SNMP agent | truenas.zfs.zil.ops1 |
TrueNAS CORE: ZIL operations 5 seconds | MIB: FREENAS-MIB The ops column parsed from the command zilstat 5 1. |
SNMP agent | truenas.zfs.zil.ops5 |
TrueNAS CORE: ZIL operations 10 seconds | MIB: FREENAS-MIB The ops column parsed from the command zilstat 10 1. |
SNMP agent | truenas.zfs.zil.ops10 |
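As noted for the "Available memory" and "Memory utilization" items above, both values are derived from the free, buffers, and cached figures reported over SNMP. Below is a small sketch of the arithmetic behind those calculated items; the byte values are made up, and the utilization formula follows the usual free+buffers+cached convention described in the item, so check the template for the exact expression.

```python
# Illustration of the calculated items vm.memory.available and vm.memory.util (values are made up).
free, buffers, cached = 2_000_000_000, 500_000_000, 6_000_000_000  # bytes, from UCD-SNMP-MIB
total = 16_000_000_000                                             # vm.memory.total

available = free + buffers + cached                 # vm.memory.available
utilization = (total - available) * 100 / total     # vm.memory.util, in %
print(f"available={available} bytes, utilization={utilization:.1f}%")
```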
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
TrueNAS CORE: Unavailable by ICMP ping | Last three attempts returned timeout. Please check device connectivity. |
max(/TrueNAS CORE by SNMP/icmpping,#3)=0 |High |
||
TrueNAS CORE: High ICMP ping loss | ICMP packets loss detected. |
min(/TrueNAS CORE by SNMP/icmppingloss,5m)>{$ICMP_LOSS_WARN} and min(/TrueNAS CORE by SNMP/icmppingloss,5m)<100 |Warning |
Depends on:
|
|
TrueNAS CORE: High ICMP ping response time | Average ICMP response time is too high. |
avg(/TrueNAS CORE by SNMP/icmppingsec,5m)>{$ICMP_RESPONSE_TIME_WARN} |Warning |
Depends on:
|
|
TrueNAS CORE: System name has changed | The name of the system has changed. Acknowledge to close the problem manually. |
last(/TrueNAS CORE by SNMP/system.name,#1)<>last(/TrueNAS CORE by SNMP/system.name,#2) and length(last(/TrueNAS CORE by SNMP/system.name))>0 |Info |
Manual close: Yes | |
TrueNAS CORE: Host has been restarted | Uptime is less than 10 minutes. |
last(/TrueNAS CORE by SNMP/system.uptime)<10m |Info |
Manual close: Yes | |
TrueNAS CORE: No SNMP data collection | SNMP is not available for polling. Please check device connectivity and SNMP settings. |
max(/TrueNAS CORE by SNMP/zabbix[host,snmp,available],{$SNMP.TIMEOUT})=0 |Warning |
Depends on:
|
|
TrueNAS CORE: Load average is too high | The load average per CPU is too high. The system may be slow to respond. |
min(/TrueNAS CORE by SNMP/system.cpu.load.avg1,5m)/last(/TrueNAS CORE by SNMP/system.cpu.num)>{$LOAD_AVG_PER_CPU.MAX.WARN} and last(/TrueNAS CORE by SNMP/system.cpu.load.avg5)>0 and last(/TrueNAS CORE by SNMP/system.cpu.load.avg15)>0 |Average |
||
TrueNAS CORE: Lack of available memory | The system is running out of memory. |
min(/TrueNAS CORE by SNMP/vm.memory.available,5m)<{$MEMORY.AVAILABLE.MIN} and last(/TrueNAS CORE by SNMP/vm.memory.total)>0 |Average |
||
TrueNAS CORE: High memory utilization | The system is running out of free memory. |
min(/TrueNAS CORE by SNMP/vm.memory.util,5m)>{$MEMORY.UTIL.MAX} |Average |
Depends on:
|
|
TrueNAS CORE: High swap space usage | If there is no swap configured, this trigger is ignored. |
min(/TrueNAS CORE by SNMP/system.swap.pfree,5m)<{$SWAP.PFREE.MIN.WARN} and last(/TrueNAS CORE by SNMP/system.swap.total)>0 |Warning |
Depends on:
|
Name | Description | Type | Key and additional info |
---|---|---|---|
CPU discovery | This discovery creates a set of per-core CPU metrics from UCD-SNMP-MIB, using {#CPU.COUNT} in preprocessing. That is the only reason LLD is used. |
Dependent item | cpu.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
TrueNAS CORE: CPU idle time | MIB: UCD-SNMP-MIB The time the CPU has spent doing nothing. |
SNMP agent | system.cpu.idle[{#SNMPINDEX}] |
TrueNAS CORE: CPU system time | MIB: UCD-SNMP-MIB The time the CPU has spent running the kernel and its processes. |
SNMP agent | system.cpu.system[{#SNMPINDEX}] Preprocessing
|
TrueNAS CORE: CPU user time | MIB: UCD-SNMP-MIB The time the CPU has spent running users' processes that are not niced. |
SNMP agent | system.cpu.user[{#SNMPINDEX}] Preprocessing
|
TrueNAS CORE: CPU nice time | MIB: UCD-SNMP-MIB The time the CPU has spent running users' processes that have been niced. |
SNMP agent | system.cpu.nice[{#SNMPINDEX}] Preprocessing
|
TrueNAS CORE: CPU iowait time | MIB: UCD-SNMP-MIB The amount of time the CPU has been waiting for I/O to complete. |
SNMP agent | system.cpu.iowait[{#SNMPINDEX}] Preprocessing
|
TrueNAS CORE: CPU interrupt time | MIB: UCD-SNMP-MIB The amount of time the CPU has been servicing hardware interrupts. |
SNMP agent | system.cpu.interrupt[{#SNMPINDEX}] Preprocessing
|
TrueNAS CORE: CPU utilization | The CPU utilization expressed in %. |
Dependent item | system.cpu.util[{#SNMPINDEX}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
TrueNAS CORE: High CPU utilization | The CPU utilization is too high. The system might be slow to respond. |
min(/TrueNAS CORE by SNMP/system.cpu.util[{#SNMPINDEX}],5m)>{$CPU.UTIL.CRIT} |Warning |
Depends on:
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Block devices discovery | Block devices are discovered from UCD-DISKIO-MIB::diskIOTable (http://net-snmp.sourceforge.net/docs/mibs/ucdDiskIOMIB.html#diskIOTable). |
SNMP agent | vfs.dev.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
TrueNAS CORE: [{#DEVNAME}]: Disk read rate | MIB: UCD-DISKIO-MIB The number of read accesses from this device since boot. |
SNMP agent | vfs.dev.read.rate[{#SNMPINDEX}] Preprocessing
|
TrueNAS CORE: [{#DEVNAME}]: Disk write rate | MIB: UCD-DISKIO-MIB The number of write accesses from this device since boot. |
SNMP agent | vfs.dev.write.rate[{#SNMPINDEX}] Preprocessing
|
TrueNAS CORE: [{#DEVNAME}]: Disk utilization | MIB: UCD-DISKIO-MIB The 1-minute average disk load (%). |
SNMP agent | vfs.dev.util[{#SNMPINDEX}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Network interfaces discovery | Discovering interfaces from IF-MIB. |
SNMP agent | net.if.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
TrueNAS CORE: Interface [{#IFNAME}({#IFALIAS})]: Inbound packets discarded | MIB: IF-MIB The number of inbound packets which were chosen to be discarded even though no errors had been detected to prevent their being deliverable to a higher-layer protocol. One possible reason for discarding such a packet could be to free up buffer space. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP agent | net.if.in.discards[{#SNMPINDEX}] Preprocessing
|
TrueNAS CORE: Interface [{#IFNAME}({#IFALIAS})]: Inbound packets with errors | MIB: IF-MIB For packet-oriented interfaces, the number of inbound packets that contained errors preventing them from being deliverable to a higher-layer protocol. For character-oriented or fixed-length interfaces, the number of inbound transmission units that contained errors preventing them from being deliverable to a higher-layer protocol. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP agent | net.if.in.errors[{#SNMPINDEX}] Preprocessing
|
TrueNAS CORE: Interface [{#IFNAME}({#IFALIAS})]: Bits received | MIB: IF-MIB The total number of octets received on the interface, including framing characters. This object is a 64-bit version of ifInOctets. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP agent | net.if.in[{#SNMPINDEX}] Preprocessing
|
TrueNAS CORE: Interface [{#IFNAME}({#IFALIAS})]: Outbound packets discarded | MIB: IF-MIB The number of outbound packets which were chosen to be discarded even though no errors had been detected to prevent their being deliverable to a higher-layer protocol. One possible reason for discarding such a packet could be to free up buffer space. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP agent | net.if.out.discards[{#SNMPINDEX}] Preprocessing
|
TrueNAS CORE: Interface [{#IFNAME}({#IFALIAS})]: Outbound packets with errors | MIB: IF-MIB For packet-oriented interfaces, the number of outbound packets that contained errors preventing them from being deliverable to a higher-layer protocol. For character-oriented or fixed-length interfaces, the number of outbound transmission units that contained errors preventing them from being deliverable to a higher-layer protocol. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP agent | net.if.out.errors[{#SNMPINDEX}] Preprocessing
|
TrueNAS CORE: Interface [{#IFNAME}({#IFALIAS})]: Bits sent | MIB: IF-MIB The total number of octets transmitted out of the interface, including framing characters. This object is a 64-bit version of ifOutOctets. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP agent | net.if.out[{#SNMPINDEX}] Preprocessing
|
TrueNAS CORE: Interface [{#IFNAME}({#IFALIAS})]: Speed | MIB: IF-MIB An estimate of the interface's current bandwidth in units of 1,000,000 bits per second. If this object reports a value of |
SNMP agent | net.if.speed[{#SNMPINDEX}] Preprocessing
|
TrueNAS CORE: Interface [{#IFNAME}({#IFALIAS})]: Operational status | MIB: IF-MIB The current operational state of the interface. - The testing(3) state indicates that no operational packets can be passed - If ifAdminStatus is down(2) then ifOperStatus should be down(2) - If ifAdminStatus is changed to up(1) then ifOperStatus should change to up(1) if the interface is ready to transmit and receive network traffic - It should change to dormant(5) if the interface is waiting for external actions (such as a serial line waiting for an incoming connection) - It should remain in the down(2) state if and only if there is a fault that prevents it from going to the up(1) state - It should remain in the notPresent(6) state if the interface has missing (typically, hardware) components. |
SNMP agent | net.if.status[{#SNMPINDEX}] Preprocessing
|
TrueNAS CORE: Interface [{#IFNAME}({#IFALIAS})]: Interface type | MIB: IF-MIB The type of interface. Additional values for ifType are assigned by the Internet Assigned Numbers Authority (IANA), through updating the syntax of the IANAifType textual convention. |
SNMP agent | net.if.type[{#SNMPINDEX}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
TrueNAS CORE: Interface [{#IFNAME}({#IFALIAS})]: High input error rate | It recovers when it is below 80% of the |
min(/TrueNAS CORE by SNMP/net.if.in.errors[{#SNMPINDEX}],5m)>{$IF.ERRORS.WARN:"{#IFNAME}"} |Warning |
Depends on:
|
|
TrueNAS CORE: Interface [{#IFNAME}({#IFALIAS})]: High inbound bandwidth usage | The utilization of the network interface is close to its estimated maximum bandwidth. |
(avg(/TrueNAS CORE by SNMP/net.if.in[{#SNMPINDEX}],15m)>({$IF.UTIL.MAX:"{#IFNAME}"}/100)*last(/TrueNAS CORE by SNMP/net.if.speed[{#SNMPINDEX}])) and last(/TrueNAS CORE by SNMP/net.if.speed[{#SNMPINDEX}])>0 |Warning |
Depends on:
|
|
TrueNAS CORE: Interface [{#IFNAME}({#IFALIAS})]: High output error rate | It recovers when it is below 80% of the |
min(/TrueNAS CORE by SNMP/net.if.out.errors[{#SNMPINDEX}],5m)>{$IF.ERRORS.WARN:"{#IFNAME}"} |Warning |
Depends on:
|
|
TrueNAS CORE: Interface [{#IFNAME}({#IFALIAS})]: High outbound bandwidth usage | The utilization of the network interface is close to its estimated maximum bandwidth. |
(avg(/TrueNAS CORE by SNMP/net.if.out[{#SNMPINDEX}],15m)>({$IF.UTIL.MAX:"{#IFNAME}"}/100)*last(/TrueNAS CORE by SNMP/net.if.speed[{#SNMPINDEX}])) and last(/TrueNAS CORE by SNMP/net.if.speed[{#SNMPINDEX}])>0 |Warning |
Depends on:
|
|
TrueNAS CORE: Interface [{#IFNAME}({#IFALIAS})]: Ethernet has changed to lower speed than it was before | This Ethernet connection has transitioned down from its known maximum speed. This might be a sign of autonegotiation issues. Acknowledge to close the problem manually. |
change(/TrueNAS CORE by SNMP/net.if.speed[{#SNMPINDEX}])<0 and last(/TrueNAS CORE by SNMP/net.if.speed[{#SNMPINDEX}])>0 and ( last(/TrueNAS CORE by SNMP/net.if.type[{#SNMPINDEX}])=6 or last(/TrueNAS CORE by SNMP/net.if.type[{#SNMPINDEX}])=7 or last(/TrueNAS CORE by SNMP/net.if.type[{#SNMPINDEX}])=11 or last(/TrueNAS CORE by SNMP/net.if.type[{#SNMPINDEX}])=62 or last(/TrueNAS CORE by SNMP/net.if.type[{#SNMPINDEX}])=69 or last(/TrueNAS CORE by SNMP/net.if.type[{#SNMPINDEX}])=117 ) and (last(/TrueNAS CORE by SNMP/net.if.status[{#SNMPINDEX}])<>2) |Info |
Depends on:
|
|
TrueNAS CORE: Interface [{#IFNAME}({#IFALIAS})]: Link down | This trigger expression works as follows: |
{$IFCONTROL:"{#IFNAME}"}=1 and (last(/TrueNAS CORE by SNMP/net.if.status[{#SNMPINDEX}])=2) |Average |
Name | Description | Type | Key and additional info |
---|---|---|---|
ZFS pools discovery | ZFS pools discovery from FREENAS-MIB. |
SNMP agent | truenas.zfs.pools.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
TrueNAS CORE: Pool [{#POOLNAME}]: Total space | MIB: FREENAS-MIB The size of the storage pool in bytes. |
SNMP agent | truenas.zpool.size.total[{#POOLNAME}] Preprocessing
|
TrueNAS CORE: Pool [{#POOLNAME}]: Used space | MIB: FREENAS-MIB The used size of the storage pool in bytes. |
SNMP agent | truenas.zpool.used[{#POOLNAME}] Preprocessing
|
TrueNAS CORE: Pool [{#POOLNAME}]: Available space | MIB: FREENAS-MIB The available size of the storage pool in bytes. |
SNMP agent | truenas.zpool.avail[{#POOLNAME}] Preprocessing
|
TrueNAS CORE: Pool [{#POOLNAME}]: Usage in % | The used size of the storage pool in %. |
Calculated | truenas.zpool.pused[{#POOLNAME}] |
TrueNAS CORE: Pool [{#POOLNAME}]: Health | MIB: FREENAS-MIB The current health of the containing pool, as reported by zpool status. |
SNMP agent | truenas.zpool.health[{#POOLNAME}] Preprocessing
|
TrueNAS CORE: Pool [{#POOLNAME}]: Read operations rate | MIB: FREENAS-MIB The number of read I/O operations sent to the pool or device, including metadata requests (averaged since system booted). |
SNMP agent | truenas.zpool.read.ops[{#POOLNAME}] Preprocessing
|
TrueNAS CORE: Pool [{#POOLNAME}]: Write operations rate | MIB: FREENAS-MIB The number of write I/O operations sent to the pool or device (averaged since system booted). |
SNMP agent | truenas.zpool.write.ops[{#POOLNAME}] Preprocessing
|
TrueNAS CORE: Pool [{#POOLNAME}]: Read rate | MIB: FREENAS-MIB The bandwidth of all read operations (including metadata), expressed as units per second (averaged since system booted). |
SNMP agent | truenas.zpool.read.bytes[{#POOLNAME}] Preprocessing
|
TrueNAS CORE: Pool [{#POOLNAME}]: Write rate | MIB: FREENAS-MIB The bandwidth of all write operations, expressed as units per second (averaged since system booted). |
SNMP agent | truenas.zpool.write.bytes[{#POOLNAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
TrueNAS CORE: Pool [{#POOLNAME}]: Very high space usage | Two conditions should match: |
min(/TrueNAS CORE by SNMP/truenas.zpool.pused[{#POOLNAME}],5m) > {$ZPOOL.PUSED.MAX.CRIT:"{#POOLNAME}"} and last(/TrueNAS CORE by SNMP/truenas.zpool.avail[{#POOLNAME}]) < {$ZPOOL.FREE.MIN.CRIT:"{#POOLNAME}"} |Average |
||
TrueNAS CORE: Pool [{#POOLNAME}]: High space usage | Two conditions should match: |
min(/TrueNAS CORE by SNMP/truenas.zpool.pused[{#POOLNAME}],5m) > {$ZPOOL.PUSED.MAX.WARN:"{#POOLNAME}"} and last(/TrueNAS CORE by SNMP/truenas.zpool.avail[{#POOLNAME}]) < {$ZPOOL.FREE.MIN.WARN:"{#POOLNAME}"} |Warning |
Depends on:
|
|
TrueNAS CORE: Pool [{#POOLNAME}]: Status is not online | Please check pool status. |
last(/TrueNAS CORE by SNMP/truenas.zpool.health[{#POOLNAME}]) <> 0 |Average |
Name | Description | Type | Key and additional info |
---|---|---|---|
ZFS datasets discovery | ZFS datasets discovery from FREENAS-MIB. |
SNMP agent | truenas.zfs.dataset.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
TrueNAS CORE: Dataset [{#DATASET_NAME}]: Total space | MIB: FREENAS-MIB The size of the dataset in bytes. |
SNMP agent | truenas.dataset.size.total[{#DATASET_NAME}] Preprocessing
|
TrueNAS CORE: Dataset [{#DATASET_NAME}]: Used space | MIB: FREENAS-MIB The used size of the dataset in bytes. |
SNMP agent | truenas.dataset.used[{#DATASET_NAME}] Preprocessing
|
TrueNAS CORE: Dataset [{#DATASET_NAME}]: Available space | MIB: FREENAS-MIB The available size of the dataset in bytes. |
SNMP agent | truenas.dataset.avail[{#DATASET_NAME}] Preprocessing
|
TrueNAS CORE: Dataset [{#DATASET_NAME}]: Usage in % | The used size of the dataset in %. |
Calculated | truenas.dataset.pused[{#DATASET_NAME}] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
TrueNAS CORE: Dataset [{#DATASET_NAME}]: Very high space usage | Two conditions should match: |
min(/TrueNAS CORE by SNMP/truenas.dataset.pused[{#DATASET_NAME}],5m) > {$DATASET.PUSED.MAX.CRIT:"{#DATASET_NAME}"} and last(/TrueNAS CORE by SNMP/truenas.dataset.avail[{#DATASET_NAME}]) < {$DATASET.FREE.MIN.CRIT:"{#POOLNAME}"} |Average |
||
TrueNAS CORE: Dataset [{#DATASET_NAME}]: High space usage | Two conditions should match: |
min(/TrueNAS CORE by SNMP/truenas.dataset.pused[{#DATASET_NAME}],5m) > {$DATASET.PUSED.MAX.WARN:"{#DATASET_NAME}"} and last(/TrueNAS CORE by SNMP/truenas.dataset.avail[{#DATASET_NAME}]) < {$DATASET.FREE.MIN.WARN:"{#POOLNAME}"} |Warning |
Depends on:
|
Name | Description | Type | Key and additional info |
---|---|---|---|
ZFS volumes discovery | ZFS volumes discovery from FREENAS-MIB. |
SNMP agent | truenas.zfs.zvols.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
TrueNAS CORE: ZFS volume [{#ZVOL_NAME}]: Total space | MIB: FREENAS-MIB The size of the ZFS volume in bytes. |
SNMP agent | truenas.zvol.size.total[{#ZVOL_NAME}] Preprocessing
|
TrueNAS CORE: ZFS volume [{#ZVOL_NAME}]: Used space | MIB: FREENAS-MIB The used size of the ZFS volume in bytes. |
SNMP agent | truenas.zvol.used[{#ZVOL_NAME}] Preprocessing
|
TrueNAS CORE: ZFS volume [{#ZVOL_NAME}]: Available space | MIB: FREENAS-MIB The available space of the ZFS volume in bytes. |
SNMP agent | truenas.zvol.avail[{#ZVOL_NAME}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Disks temperature discovery | Disks temperature discovery from FREENAS-MIB. |
SNMP agent | truenas.disk.temp.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
TrueNAS CORE: Disk [{#DISK_NAME}]: Temperature | MIB: FREENAS-MIB The temperature of this HDD in mC (millidegrees Celsius). |
SNMP agent | truenas.disk.temp[{#DISK_NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
TrueNAS CORE: Disk [{#DISK_NAME}]: Average disk temperature is too high | Disk temperature is high. |
avg(/TrueNAS CORE by SNMP/truenas.disk.temp[{#DISK_NAME}],5m) > {$TEMPERATURE.MAX.CRIT:"{#DISK_NAME}"} |Average |
||
TrueNAS CORE: Disk [{#DISK_NAME}]: Average disk temperature is too high | Disk temperature is high. |
avg(/TrueNAS CORE by SNMP/truenas.disk.temp[{#DISK_NAME}],5m) > {$TEMPERATURE.MAX.WARN:"{#DISK_NAME}"} |Warning |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template to monitor Travis CI by Zabbix that works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
You must set the {$TRAVIS.API.TOKEN} and {$TRAVIS.API.URL} macros. {$TRAVIS.API.TOKEN} is a Travis API authentication token located in User -> Settings -> API authentication. {$TRAVIS.API.URL} can be set in two different variations, depending on which Travis CI endpoint your repositories use.
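An optional way to verify the token and URL before linking the template is to query the Travis API v3 directly. This is only a sketch: the header names are the standard Travis API v3 request headers, and the token and URL placeholders stand for the values you put into the macros:
curl -s -H "Travis-API-Version: 3" -H "Authorization: token <TRAVIS.API.TOKEN>" "https://<TRAVIS.API.URL>/repos"
A 200 response with a JSON list of repositories indicates that the credentials and endpoint are usable.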
Name | Description | Default |
---|---|---|
{$TRAVIS.API.TOKEN} | Travis API Token |
|
{$TRAVIS.API.URL} | Travis API URL |
api.travis-ci.com |
{$TRAVIS.BUILDS.SUCCESS.PERCENT} | Percent of successful builds in the repo (for trigger expression) |
80 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Travis: Get repos | Getting repos using Travis API. |
HTTP agent | travis.get_repos |
Travis: Get builds | Getting builds using Travis API. |
HTTP agent | travis.get_builds |
Travis: Get jobs | Getting jobs using Travis API. |
HTTP agent | travis.get_jobs |
Travis: Get health | Getting home JSON using Travis API. |
HTTP agent | travis.get_health Preprocessing
|
Travis: Jobs passed | Total count of passed jobs in all repos. |
Dependent item | travis.jobs.total Preprocessing
|
Travis: Jobs active | Active jobs in all repos. |
Dependent item | travis.jobs.active Preprocessing
|
Travis: Jobs in queue | Jobs in queue in all repos. |
Dependent item | travis.jobs.queue Preprocessing
|
Travis: Builds | Total count of builds in all repos. |
Dependent item | travis.builds.total Preprocessing
|
Travis: Builds duration | Sum of all builds durations in all repos. |
Dependent item | travis.builds.duration Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Travis: Service is unavailable | Travis API is unavailable. Please check if the correct macros are set. |
last(/Travis CI by HTTP/travis.get_health)=0 |High |
Manual close: Yes | |
Travis: Failed to fetch home page | Zabbix has not received any data for items for the last 30 minutes. |
nodata(/Travis CI by HTTP/travis.get_health,30m)=1 |Warning |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Repos metrics discovery | Metrics for Repos statistics. |
Dependent item | travis.repos.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Travis: Repo [{#SLUG}]: Get builds | Getting builds of {#SLUG} using Travis API. |
HTTP agent | travis.repo.get_builds[{#SLUG}] |
Travis: Repo [{#SLUG}]: Get caches | Getting caches of {#SLUG} using Travis API. |
HTTP agent | travis.repo.get_caches[{#SLUG}] |
Travis: Repo [{#SLUG}]: Cache files | Count of cache files in {#SLUG} repo. |
Dependent item | travis.repo.caches.files[{#SLUG}] Preprocessing
|
Travis: Repo [{#SLUG}]: Cache size | Total size of cache files in {#SLUG} repo. |
Dependent item | travis.repo.caches.size[{#SLUG}] Preprocessing
|
Travis: Repo [{#SLUG}]: Builds passed | Count of all passed builds in {#SLUG} repo. |
Dependent item | travis.repo.builds.passed[{#SLUG}] Preprocessing
|
Travis: Repo [{#SLUG}]: Builds failed | Count of all failed builds in {#SLUG} repo. |
Dependent item | travis.repo.builds.failed[{#SLUG}] Preprocessing
|
Travis: Repo [{#SLUG}]: Builds total | Count of total builds in {#SLUG} repo. |
Dependent item | travis.repo.builds.total[{#SLUG}] Preprocessing
|
Travis: Repo [{#SLUG}]: Builds passed, % | Percent of passed builds in {#SLUG} repo. |
Calculated | travis.repo.builds.passed.pct[{#SLUG}] |
Travis: Repo [{#SLUG}]: Description | Description of Travis repo (git project description). |
Dependent item | travis.repo.description[{#SLUG}] Preprocessing
|
Travis: Repo [{#SLUG}]: Last build duration | Last build duration in {#SLUG} repo. |
Dependent item | travis.repo.last_build.duration[{#SLUG}] Preprocessing
|
Travis: Repo [{#SLUG}]: Last build state | Last build state in {#SLUG} repo. |
Dependent item | travis.repo.last_build.state[{#SLUG}] Preprocessing
|
Travis: Repo [{#SLUG}]: Last build number | Last build number in {#SLUG} repo. |
Dependent item | travis.repo.last_build.number[{#SLUG}] Preprocessing
|
Travis: Repo [{#SLUG}]: Last build id | Last build id in {#SLUG} repo. |
Dependent item | travis.repo.last_build.id[{#SLUG}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Travis: Repo [{#SLUG}]: Percent of successful builds | The rate of successful builds is low. |
last(/Travis CI by HTTP/travis.repo.builds.passed.pct[{#SLUG}])<{$TRAVIS.BUILDS.SUCCESS.PERCENT} |Warning |
Manual close: Yes | |
Travis: Repo [{#SLUG}]: Last build status is 'errored' | Last build status is errored. |
find(/Travis CI by HTTP/travis.repo.last_build.state[{#SLUG}],,"like","errored")=1 |Warning |
Manual close: Yes |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for the effortless deployment of Apache Tomcat monitoring by Zabbix via JMX and doesn't require any external scripts.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Metrics are collected by JMX.
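A minimal sketch of enabling remote JMX access on Tomcat, assuming a Unix installation where extra JVM options are placed in bin/setenv.sh (the port number and the password/access file locations are placeholders; align the JMX credentials with the {$TOMCAT.USER} and {$TOMCAT.PASSWORD} macros and make sure the Zabbix Java gateway can reach the chosen port):
CATALINA_OPTS="$CATALINA_OPTS \
  -Dcom.sun.management.jmxremote \
  -Dcom.sun.management.jmxremote.port=12345 \
  -Dcom.sun.management.jmxremote.ssl=false \
  -Dcom.sun.management.jmxremote.authenticate=true \
  -Dcom.sun.management.jmxremote.password.file=$CATALINA_BASE/conf/jmxremote.password \
  -Dcom.sun.management.jmxremote.access.file=$CATALINA_BASE/conf/jmxremote.access"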
Name | Description | Default |
---|---|---|
{$TOMCAT.USER} | User for JMX |
|
{$TOMCAT.PASSWORD} | Password for JMX |
|
{$TOMCAT.LLD.FILTER.REQUEST_PROCESSOR.MATCHES} | Filter for discoverable global request processors. |
.* |
{$TOMCAT.LLD.FILTER.REQUEST_PROCESSOR.NOT_MATCHES} | Filter to exclude global request processors. |
CHANGE_IF_NEEDED |
{$TOMCAT.LLD.FILTER.MANAGER.MATCHES} | Filter for discoverable managers. |
.* |
{$TOMCAT.LLD.FILTER.MANAGER.NOT_MATCHES} | Filter to exclude managers. |
CHANGE_IF_NEEDED |
{$TOMCAT.LLD.FILTER.THREAD_POOL.MATCHES} | Filter for discoverable thread pools. |
.* |
{$TOMCAT.LLD.FILTER.THREAD_POOL.NOT_MATCHES} | Filter to exclude thread pools. |
CHANGE_IF_NEEDED |
{$TOMCAT.THREADS.MAX.PCT} | Threshold for busy worker threads trigger. Can be used with {#JMXNAME} as context. |
75 |
{$TOMCAT.THREADS.MAX.TIME} | The time during which the number of busy threads can exceed the threshold. Can be used with {#JMXNAME} as context. |
5m |
Name | Description | Type | Key and additional info |
---|---|---|---|
Tomcat: Version | The version of Tomcat. |
JMX agent | jmx["Catalina:type=Server",serverInfo] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Tomcat: Version has been changed | The Tomcat version has changed. Acknowledge to close the problem manually. |
last(/Apache Tomcat by JMX/jmx["Catalina:type=Server",serverInfo],#1)<>last(/Apache Tomcat by JMX/jmx["Catalina:type=Server",serverInfo],#2) and length(last(/Apache Tomcat by JMX/jmx["Catalina:type=Server",serverInfo]))>0 |Info |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Global request processors discovery | Discovery for GlobalRequestProcessor |
JMX agent | jmx.discovery[beans,"Catalina:type=GlobalRequestProcessor,name=*"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#JMXNAME}: Bytes received per second | Bytes received rate by processor {#JMXNAME} |
JMX agent | jmx[{#JMXOBJ},bytesReceived] Preprocessing
|
{#JMXNAME}: Bytes sent per second | Bytes sent rate by processor {#JMXNAME} |
JMX agent | jmx[{#JMXOBJ},bytesSent] Preprocessing
|
{#JMXNAME}: Errors per second | Error rate of request processor {#JMXNAME} |
JMX agent | jmx[{#JMXOBJ},errorCount] Preprocessing
|
{#JMXNAME}: Requests per second | Rate of requests served by request processor {#JMXNAME} |
JMX agent | jmx[{#JMXOBJ},requestCount] Preprocessing
|
{#JMXNAME}: Requests processing time | The total time to process all incoming requests of request processor {#JMXNAME} |
JMX agent | jmx[{#JMXOBJ},processingTime] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Protocol handlers discovery | Discovery for ProtocolHandler |
JMX agent | jmx.discovery[attributes,"Catalina:type=ProtocolHandler,port=*"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#JMXVALUE}: Gzip compression status | Gzip compression status on {#JMXNAME}. Enabling gzip compression may save server bandwidth. |
JMX agent | jmx[{#JMXOBJ},compression] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
{#JMXVALUE}: Gzip compression is disabled | gzip compression is disabled for connector {#JMXVALUE}. |
find(/Apache Tomcat by JMX/jmx[{#JMXOBJ},compression],,"like","off") = 1 |Info |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Thread pools discovery | Discovery for ThreadPool |
JMX agent | jmx.discovery[beans,"Catalina:type=ThreadPool,name=*"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#JMXNAME}: Threads count | The number of threads the thread pool currently has, both busy and free. |
JMX agent | jmx[{#JMXOBJ},currentThreadCount] Preprocessing
|
{#JMXNAME}: Threads limit | Limit of the thread count. When the currentThreadsBusy counter reaches the maxThreads limit, no more requests can be handled, and the application chokes. |
JMX agent | jmx[{#JMXOBJ},maxThreads] Preprocessing
|
{#JMXNAME}: Threads busy | Number of the requests that are being currently handled. |
JMX agent | jmx[{#JMXOBJ},currentThreadsBusy] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
{#JMXNAME}: Busy worker threads count is high | When the busy threads counter reaches the limit, no more requests can be handled, and the application chokes. |
min(/Apache Tomcat by JMX/jmx[{#JMXOBJ},currentThreadsBusy],{$TOMCAT.THREADS.MAX.TIME:"{#JMXNAME}"})>last(/Apache Tomcat by JMX/jmx[{#JMXOBJ},maxThreads])*{$TOMCAT.THREADS.MAX.PCT:"{#JMXNAME}"}/100 |High |
Name | Description | Type | Key and additional info |
---|---|---|---|
Contexts discovery | Discovery for contexts |
JMX agent | jmx.discovery[beans,"Catalina:type=Manager,host=*,context=*"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#JMXHOST}{#JMXCONTEXT}: Sessions active | Active sessions of the application. |
JMX agent | jmx[{#JMXOBJ},activeSessions] |
{#JMXHOST}{#JMXCONTEXT}: Sessions active maximum so far | Maximum number of active sessions so far. |
JMX agent | jmx[{#JMXOBJ},maxActive] |
{#JMXHOST}{#JMXCONTEXT}: Sessions created per second | Rate of sessions created by this application per second. |
JMX agent | jmx[{#JMXOBJ},sessionCounter] Preprocessing
|
{#JMXHOST}{#JMXCONTEXT}: Sessions rejected per second | Rate of sessions we rejected due to maxActive being reached. |
JMX agent | jmx[{#JMXOBJ},rejectedSessions] Preprocessing
|
{#JMXHOST}{#JMXCONTEXT}: Sessions allowed maximum | The maximum number of active Sessions allowed, or -1 for no limit. |
JMX agent | jmx[{#JMXOBJ},maxActiveSessions] |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for the effortless deployment of Systemd monitoring by Zabbix via Zabbix agent 2 and doesn't require any external scripts.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
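An optional sanity check before linking the template, assuming Zabbix agent 2 with its built-in Systemd plugin runs on the monitored host and accepts passive checks: query the keys this template uses with zabbix_get. The host address and the unit name below are placeholders:
zabbix_get -s 127.0.0.1 -k 'systemd.unit.discovery[service]'
zabbix_get -s 127.0.0.1 -k 'systemd.unit.get["sshd.service"]'
Both calls should return JSON; an error usually means the agent 2 Systemd plugin is unavailable or the agent cannot talk to D-Bus.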
Name | Description | Default |
---|---|---|
{$SYSTEMD.NAME.SOCKET.MATCHES} | Filter of systemd socket units by name |
.* |
{$SYSTEMD.NAME.SOCKET.NOT_MATCHES} | Filter of systemd socket units by name |
CHANGE_IF_NEEDED |
{$SYSTEMD.ACTIVESTATE.SOCKET.MATCHES} | Filter of systemd socket units by active state |
active |
{$SYSTEMD.ACTIVESTATE.SOCKET.NOT_MATCHES} | Filter of systemd socket units by active state |
CHANGE_IF_NEEDED |
{$SYSTEMD.UNITFILESTATE.SOCKET.MATCHES} | Filter of systemd socket units by unit file state |
enabled |
{$SYSTEMD.UNITFILESTATE.SOCKET.NOT_MATCHES} | Filter of systemd socket units by unit file state |
CHANGE_IF_NEEDED |
{$SYSTEMD.NAME.SERVICE.MATCHES} | Filter of systemd service units by name |
.* |
{$SYSTEMD.NAME.SERVICE.NOT_MATCHES} | Filter of systemd service units by name |
CHANGE_IF_NEEDED |
{$SYSTEMD.ACTIVESTATE.SERVICE.MATCHES} | Filter of systemd service units by active state |
active |
{$SYSTEMD.ACTIVESTATE.SERVICE.NOT_MATCHES} | Filter of systemd service units by active state |
CHANGE_IF_NEEDED |
{$SYSTEMD.UNITFILESTATE.SERVICE.MATCHES} | Filter of systemd service units by unit file state |
enabled |
{$SYSTEMD.UNITFILESTATE.SERVICE.NOT_MATCHES} | Filter of systemd service units by unit file state |
CHANGE_IF_NEEDED |
Name | Description | Type | Key and additional info |
---|---|---|---|
Service units discovery | Discover systemd service units and their details. |
Zabbix agent | systemd.unit.discovery[service] |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#UNIT.NAME}: Get unit info | Returns all properties of a systemd service unit. Unit description: {#UNIT.DESCRIPTION}. |
Zabbix agent | systemd.unit.get["{#UNIT.NAME}"] |
{#UNIT.NAME}: Active state | State value that reflects whether the unit is currently active or not. The following states are currently defined: "active", "reloading", "inactive", "failed", "activating", and "deactivating". |
Dependent item | systemd.service.active_state["{#UNIT.NAME}"] Preprocessing
|
{#UNIT.NAME}: Load state | State value that reflects whether the configuration file of this unit has been loaded. The following states are currently defined: "loaded", "error", and "masked". |
Dependent item | systemd.service.load_state["{#UNIT.NAME}"] Preprocessing
|
{#UNIT.NAME}: Unit file state | Encodes the install state of the unit file of FragmentPath. It currently knows the following states: "enabled", "enabled-runtime", "linked", "linked-runtime", "masked", "masked-runtime", "static", "disabled", and "invalid". |
Dependent item | systemd.service.unitfile_state["{#UNIT.NAME}"] Preprocessing
|
{#UNIT.NAME}: Active time | Number of seconds since the unit entered the active state. |
Dependent item | systemd.service.uptime["{#UNIT.NAME}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
{#UNIT.NAME}: Service is not running | last(/Systemd by Zabbix agent 2/systemd.service.active_state["{#UNIT.NAME}"])<>1 |Warning |
Manual close: Yes | ||
{#UNIT.NAME}: has been restarted | Uptime is less than 10 minutes. |
last(/Systemd by Zabbix agent 2/systemd.service.uptime["{#UNIT.NAME}"])<10m |Info |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Socket units discovery | Discover systemd socket units and their details. |
Zabbix agent | systemd.unit.discovery[socket] |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#UNIT.NAME}: Get unit info | Returns all properties of a systemd socket unit. Unit description: {#UNIT.DESCRIPTION}. |
Zabbix agent | systemd.unit.get["{#UNIT.NAME}",Socket] |
{#UNIT.NAME}: Connections accepted per sec | The number of accepted socket connections (NAccepted) per second. |
Dependent item | systemd.socket.conn_accepted.rate["{#UNIT.NAME}"] Preprocessing
|
{#UNIT.NAME}: Connections connected | The current number of socket connections (NConnections). |
Dependent item | systemd.socket.conn_count["{#UNIT.NAME}"] Preprocessing
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for the effortless deployment of Squid monitoring by Zabbix via SNMP and doesn't require any external scripts.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Enable SNMP support following the official documentation. Required parameters in squid.conf:
snmp_port <port_number>
acl <zbx_acl_name> snmp_community <community_name>
snmp_access allow <zbx_acl_name> <zabbix_server_ip>
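For example, a minimal sketch with placeholder values (choose your own ACL name, community, and Zabbix server address, and keep the port and community consistent with the {$SQUID.SNMP.PORT} and {$SQUID.SNMP.COMMUNITY} macros below):
snmp_port 3401
acl zbx_snmp snmp_community public
snmp_access allow zbx_snmp 192.0.2.10
snmp_access deny all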
1. Import the template template_app_squid_snmp.yaml into Zabbix.
2. Set values for {$SQUID.SNMP.COMMUNITY}, {$SQUID.SNMP.PORT} and {$SQUID.HTTP.PORT} as configured in squid.conf.
3. Link the imported template to a host with Squid.
4. Add SNMPv2 interface to Squid host. Set Port as {$SQUID.SNMP.PORT} and SNMP community as {$SQUID.SNMP.COMMUNITY}.
Name | Description | Default |
---|---|---|
{$SQUID.SNMP.PORT} | snmp_port configured in squid.conf (Default: 3401) |
3401 |
{$SQUID.HTTP.PORT} | http_port configured in squid.conf (Default: 3128) |
3128 |
{$SQUID.SNMP.COMMUNITY} | SNMP community allowed by ACL in squid.conf |
public |
{$SQUID.FILE.DESC.WARN.MIN} | The threshold for minimum number of available file descriptors |
100 |
{$SQUID.PAGE.FAULT.WARN} | The threshold for sys page faults rate in percent of received HTTP requests |
90 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Squid: Service ping | Simple check | net.tcp.service[tcp,,{$SQUID.HTTP.PORT}] Preprocessing
|
|
Squid: Uptime | The Uptime of the cache in timeticks (in hundredths of a second) with preprocessing |
SNMP agent | squid[cacheUptime] Preprocessing
|
Squid: Version | Cache Software Version |
SNMP agent | squid[cacheVersionId] Preprocessing
|
Squid: CPU usage | The percentage use of the CPU |
SNMP agent | squid[cacheCpuUsage] |
Squid: Memory maximum resident size | Maximum Resident Size |
SNMP agent | squid[cacheMaxResSize] Preprocessing
|
Squid: Memory maximum cache size | The value of the cache_mem parameter |
SNMP agent | squid[cacheMemMaxSize] Preprocessing
|
Squid: Memory cache usage | Total accounted memory |
SNMP agent | squid[cacheMemUsage] Preprocessing
|
Squid: Cache swap low water mark | Cache Swap Low Water Mark |
SNMP agent | squid[cacheSwapLowWM] |
Squid: Cache swap high water mark | Cache Swap High Water Mark |
SNMP agent | squid[cacheSwapHighWM] |
Squid: Cache swap directory size | The total of the cache_dir space allocated |
SNMP agent | squid[cacheSwapMaxSize] Preprocessing
|
Squid: Cache swap current size | Storage Swap Size |
SNMP agent | squid[cacheCurrentSwapSize] |
Squid: File descriptor count - current used | Number of file descriptors in use |
SNMP agent | squid[cacheCurrentFileDescrCnt] |
Squid: File descriptor count - current maximum | Highest number of file descriptors in use |
SNMP agent | squid[cacheCurrentFileDescrMax] |
Squid: File descriptor count - current reserved | Reserved number of file descriptors |
SNMP agent | squid[cacheCurrentResFileDescrCnt] |
Squid: File descriptor count - current available | Available number of file descriptors |
SNMP agent | squid[cacheCurrentUnusedFDescrCnt] |
Squid: Byte hit ratio per 1 minute | Byte Hit Ratios |
SNMP agent | squid[cacheRequestByteRatio.1] |
Squid: Byte hit ratio per 5 minutes | Byte Hit Ratios |
SNMP agent | squid[cacheRequestByteRatio.5] |
Squid: Byte hit ratio per 1 hour | Byte Hit Ratios |
SNMP agent | squid[cacheRequestByteRatio.60] |
Squid: Request hit ratio per 1 minute | Request Hit Ratios |
SNMP agent | squid[cacheRequestHitRatio.1] |
Squid: Request hit ratio per 5 minutes | Request Hit Ratios |
SNMP agent | squid[cacheRequestHitRatio.5] |
Squid: Request hit ratio per 1 hour | Request Hit Ratios |
SNMP agent | squid[cacheRequestHitRatio.60] |
Squid: Sys page faults per second | Page faults with physical I/O |
SNMP agent | squid[cacheSysPageFaults] Preprocessing
|
Squid: HTTP requests received per second | Number of HTTP requests received |
SNMP agent | squid[cacheProtoClientHttpRequests] Preprocessing
|
Squid: HTTP traffic received per second | Amount of HTTP traffic received from clients |
SNMP agent | squid[cacheHttpInKb] Preprocessing
|
Squid: HTTP traffic sent per second | Amount of HTTP traffic sent to clients |
SNMP agent | squid[cacheHttpOutKb] Preprocessing
|
Squid: HTTP Hits sent from cache per second | Number of HTTP Hits sent to clients from cache |
SNMP agent | squid[cacheHttpHits] Preprocessing
|
Squid: HTTP Errors sent per second | Number of HTTP Errors sent to clients |
SNMP agent | squid[cacheHttpErrors] Preprocessing
|
Squid: ICP messages sent per second | Number of ICP messages sent |
SNMP agent | squid[cacheIcpPktsSent] Preprocessing
|
Squid: ICP messages received per second | Number of ICP messages received |
SNMP agent | squid[cacheIcpPktsRecv] Preprocessing
|
Squid: ICP traffic transmitted per second | Amount of ICP traffic transmitted |
SNMP agent | squid[cacheIcpKbSent] Preprocessing
|
Squid: ICP traffic received per second | Amount of ICP traffic received |
SNMP agent | squid[cacheIcpKbRecv] Preprocessing
|
Squid: DNS server requests per second | Number of external DNS server requests |
SNMP agent | squid[cacheDnsRequests] Preprocessing
|
Squid: DNS server replies per second | Number of external DNS server replies |
SNMP agent | squid[cacheDnsReplies] Preprocessing
|
Squid: FQDN cache requests per second | Number of FQDN Cache requests |
SNMP agent | squid[cacheFqdnRequests] Preprocessing
|
Squid: FQDN cache hits per second | Number of FQDN Cache hits |
SNMP agent | squid[cacheFqdnHits] Preprocessing
|
Squid: FQDN cache misses per second | Number of FQDN Cache misses |
SNMP agent | squid[cacheFqdnMisses] Preprocessing
|
Squid: IP cache requests per second | Number of IP Cache requests |
SNMP agent | squid[cacheIpRequests] Preprocessing
|
Squid: IP cache hits per second | Number of IP Cache hits |
SNMP agent | squid[cacheIpHits] Preprocessing
|
Squid: IP cache misses per second | Number of IP Cache misses |
SNMP agent | squid[cacheIpMisses] Preprocessing
|
Squid: Objects count | Number of objects stored by the cache |
SNMP agent | squid[cacheNumObjCount] |
Squid: Objects LRU expiration age | Storage LRU Expiration Age |
SNMP agent | squid[cacheCurrentLRUExpiration] Preprocessing
|
Squid: Objects unlinkd requests | Requests given to unlinkd |
SNMP agent | squid[cacheCurrentUnlinkRequests] |
Squid: HTTP all service time per 5 minutes | HTTP all service time per 5 minutes |
SNMP agent | squid[cacheHttpAllSvcTime.5] Preprocessing
|
Squid: HTTP all service time per hour | HTTP all service time per hour |
SNMP agent | squid[cacheHttpAllSvcTime.60] Preprocessing
|
Squid: HTTP miss service time per 5 minutes | HTTP miss service time per 5 minutes |
SNMP agent | squid[cacheHttpMissSvcTime.5] Preprocessing
|
Squid: HTTP miss service time per hour | HTTP miss service time per hour |
SNMP agent | squid[cacheHttpMissSvcTime.60] Preprocessing
|
Squid: HTTP hit service time per 5 minutes | HTTP hit service time per 5 minutes |
SNMP agent | squid[cacheHttpHitSvcTime.5] Preprocessing
|
Squid: HTTP hit service time per hour | HTTP hit service time per hour |
SNMP agent | squid[cacheHttpHitSvcTime.60] Preprocessing
|
Squid: ICP query service time per 5 minutes | ICP query service time per 5 minutes |
SNMP agent | squid[cacheIcpQuerySvcTime.5] Preprocessing
|
Squid: ICP query service time per hour | ICP query service time per hour |
SNMP agent | squid[cacheIcpQuerySvcTime.60] Preprocessing
|
Squid: ICP reply service time per 5 minutes | ICP reply service time per 5 minutes |
SNMP agent | squid[cacheIcpReplySvcTime.5] Preprocessing
|
Squid: ICP reply service time per hour | ICP reply service time per hour |
SNMP agent | squid[cacheIcpReplySvcTime.60] Preprocessing
|
Squid: DNS service time per 5 minutes | DNS service time per 5 minutes |
SNMP agent | squid[cacheDnsSvcTime.5] Preprocessing
|
Squid: DNS service time per hour | DNS service time per hour |
SNMP agent | squid[cacheDnsSvcTime.60] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Squid: Port {$SQUID.HTTP.PORT} is down | last(/Squid by SNMP/net.tcp.service[tcp,,{$SQUID.HTTP.PORT}])=0 |Average |
Manual close: Yes | ||
Squid: Squid has been restarted | Uptime is less than 10 minutes. |
last(/Squid by SNMP/squid[cacheUptime])<10m |Info |
Manual close: Yes | |
Squid: Squid version has been changed | Squid version has changed. Acknowledge to close the problem manually. |
last(/Squid by SNMP/squid[cacheVersionId],#1)<>last(/Squid by SNMP/squid[cacheVersionId],#2) and length(last(/Squid by SNMP/squid[cacheVersionId]))>0 |Info |
Manual close: Yes | |
Squid: Swap usage is more than low watermark | last(/Squid by SNMP/squid[cacheCurrentSwapSize])>last(/Squid by SNMP/squid[cacheSwapLowWM])*last(/Squid by SNMP/squid[cacheSwapMaxSize])/100 |Warning |
|||
Squid: Swap usage is more than high watermark | last(/Squid by SNMP/squid[cacheCurrentSwapSize])>last(/Squid by SNMP/squid[cacheSwapHighWM])*last(/Squid by SNMP/squid[cacheSwapMaxSize])/100 |High |
|||
Squid: Squid is running out of file descriptors | last(/Squid by SNMP/squid[cacheCurrentUnusedFDescrCnt])<{$SQUID.FILE.DESC.WARN.MIN} |Warning |
|||
Squid: High sys page faults rate | avg(/Squid by SNMP/squid[cacheSysPageFaults],5m)>avg(/Squid by SNMP/squid[cacheProtoClientHttpRequests],5m)/100*{$SQUID.PAGE.FAULT.WARN} |Warning |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for the effortless deployment of Microsoft SharePoint monitoring by Zabbix via HTTP and doesn't require any external scripts.
SharePoint includes a Representational State Transfer (REST) service. Developers can perform read operations from their SharePoint Add-ins, solutions, and client applications, using REST web technologies and standard Open Data Protocol (OData) syntax. Details in https://docs.microsoft.com/ru-ru/sharepoint/dev/sp-add-ins/get-to-know-the-sharepoint-rest-service?tabs=csom
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Create a new host. Define the macros according to your SharePoint web portal. It is recommended to fill in the values of the filter macros to avoid collecting redundant data.
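As an optional pre-check, SharePoint reports its current load in the X-SharePointHealthScore response header, so a single request against the portal can confirm that the URL and credentials are usable before the template is linked. This is only a rough sketch assuming NTLM authentication; the URL, user, and password placeholders correspond to the macro values (a DOMAIN\ prefix on the user may be required in your environment):
curl -I --ntlm -u "<SHAREPOINT.USER>:<SHAREPOINT.PASSWORD>" "http://sharepoint.companyname.local/_api/web"
Look for the X-SharePointHealthScore header in the response; a value between 0 and 10 is the same health score this template monitors.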
Name | Description | Default |
---|---|---|
{$SHAREPOINT.USER} | ||
{$SHAREPOINT.PASSWORD} | ||
{$SHAREPOINT.URL} | Portal page URL. For example http://sharepoint.companyname.local/ |
|
{$SHAREPOINT.LLD.FILTER.NAME.MATCHES} | Filter of discoverable dictionaries by name. |
.* |
{$SHAREPOINT.LLD.FILTER.FULL_PATH.MATCHES} | Filter of discoverable dictionaries by full path. |
^/ |
{$SHAREPOINT.LLD.FILTER.TYPE.MATCHES} | Filter of discoverable types. |
FOLDER |
{$SHAREPOINT.LLD.FILTER.NAME.NOT_MATCHES} | Filter to exclude discovered dictionaries by name. |
CHANGE_IF_NEEDED |
{$SHAREPOINT.LLD.FILTER.FULL_PATH.NOT_MATCHES} | Filter to exclude discovered dictionaries by full path. |
CHANGE_IF_NEEDED |
{$SHAREPOINT.LLD.FILTER.TYPE.NOT_MATCHES} | Filter to exclude discovered types. |
CHANGE_IF_NEEDED |
{$SHAREPOINT.ROOT} | /Shared Documents |
|
{$SHAREPOINT.LLD_INTERVAL} | 3h |
|
{$SHAREPOINT.GET_INTERVAL} | 1m |
|
{$SHAREPOINT.MAX_HEALTH_SCORE} | Must be in the range from 0 to 10. Details: https://docs.microsoft.com/en-us/openspecs/sharepoint_protocols/ms-wsshp/c60ddeb6-4113-4a73-9e97-26b5c3907d33 |
5 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Sharepoint: Get directory structure | Used to get directory structure information |
Script | sharepoint.get_dir Preprocessing
|
Sharepoint: Get directory structure: Status | HTTP response (status) code. Indicates whether the HTTP request was successfully completed. Additional information is available in the server log file. |
Dependent item | sharepoint.get_dir.status Preprocessing
|
Sharepoint: Get directory structure: Exec time | The time taken to execute the script for obtaining the data structure (in ms). Less is better. |
Dependent item | sharepoint.get_dir.time Preprocessing
|
Sharepoint: Health score | This item specifies a value between 0 and 10, where 0 represents a low load and a high ability to process requests and 10 represents a high load and that the server is throttling requests to maintain adequate throughput. |
HTTP agent | sharepoint.health_score Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Sharepoint: Error getting directory structure. | Error getting directory structure. Check the Zabbix server log for more details. |
last(/Microsoft SharePoint by HTTP/sharepoint.get_dir.status)<>200 |Warning |
Manual close: Yes | |
Sharepoint: Server responds slowly to API request | last(/Microsoft SharePoint by HTTP/sharepoint.get_dir.time)>2000 |Warning |
Manual close: Yes | ||
Sharepoint: Bad health score | last(/Microsoft SharePoint by HTTP/sharepoint.health_score)>"{$SHAREPOINT.MAX_HEALTH_SCORE}" |Average |
Name | Description | Type | Key and additional info |
---|---|---|---|
Directory discovery | Script | sharepoint.directory.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Sharepoint: Size ({#SHAREPOINT.LLD.FULL_PATH}) | Size of: {#SHAREPOINT.LLD.FULL_PATH} |
Dependent item | sharepoint.size["{#SHAREPOINT.LLD.FULL_PATH}"] Preprocessing
|
Sharepoint: Modified ({#SHAREPOINT.LLD.FULL_PATH}) | Date of change: {#SHAREPOINT.LLD.FULL_PATH} |
Dependent item | sharepoint.modified["{#SHAREPOINT.LLD.FULL_PATH}"] Preprocessing
|
Sharepoint: Created ({#SHAREPOINT.LLD.FULL_PATH}) | Date of creation: {#SHAREPOINT.LLD.FULL_PATH} |
Dependent item | sharepoint.created["{#SHAREPOINT.LLD.FULL_PATH}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Sharepoint: Sharepoint object is changed | The modification date of the folder/file has been updated. |
last(/Microsoft SharePoint by HTTP/sharepoint.modified["{#SHAREPOINT.LLD.FULL_PATH}"],#1)<>last(/Microsoft SharePoint by HTTP/sharepoint.modified["{#SHAREPOINT.LLD.FULL_PATH}"],#2) |Info |
Manual close: Yes |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is developed to monitor the messaging broker RabbitMQ cluster by Zabbix that works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The template collects metrics by polling RabbitMQ management plugin with HTTP agent remotely.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Enable the RabbitMQ management plugin. See the RabbitMQ documentation
for the instructions.
Create a user to monitor the service:
rabbitmqctl add_user zbx_monitor <PASSWORD>
rabbitmqctl set_permissions -p / zbx_monitor "" "" ".*"
rabbitmqctl set_user_tags zbx_monitor monitoring
Set the hostname or IP address of the RabbitMQ cluster host in the {$RABBITMQ.API.CLUSTER_HOST}
macro. You can also change the port in the {$RABBITMQ.API.PORT}
macro and the scheme in the {$RABBITMQ.API.SCHEME}
macro if necessary.
Set the user name and password in the macros {$RABBITMQ.API.USER}
and {$RABBITMQ.API.PASSWORD}
.
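An optional way to confirm the credentials and the management API endpoint before linking the template (the host, port, user, and password placeholders are the macro values you set above; /api/overview is the endpoint this template polls for cluster-wide metrics):
curl -u zbx_monitor:<PASSWORD> http://<CLUSTER_HOST>:15672/api/overview
A JSON document with cluster totals confirms that the management plugin is reachable and the monitoring user has sufficient permissions.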
Name | Description | Default |
---|---|---|
{$RABBITMQ.API.USER} | zbx_monitor |
|
{$RABBITMQ.API.PASSWORD} | zabbix |
|
{$RABBITMQ.API.CLUSTER_HOST} | The hostname or IP of the API endpoint for the RabbitMQ cluster. |
<SET CLUSTER API HOST> |
{$RABBITMQ.API.PORT} | The port of the RabbitMQ API endpoint. |
15672 |
{$RABBITMQ.API.SCHEME} | The request scheme, which may be HTTP or HTTPS. |
http |
{$RABBITMQ.LLD.FILTER.EXCHANGE.MATCHES} | This macro is used in the discovery of exchanges. It can be overridden at host level or its linked template level. |
.* |
{$RABBITMQ.LLD.FILTER.EXCHANGE.NOT_MATCHES} | This macro is used in the discovery of exchanges. It can be overridden at host level or its linked template level. |
CHANGE_IF_NEEDED |
Name | Description | Type | Key and additional info |
---|---|---|---|
RabbitMQ: Get overview | The HTTP API endpoint that returns cluster-wide metrics. |
HTTP agent | rabbitmq.get_overview |
RabbitMQ: Get exchanges | The HTTP API endpoint that returns exchanges metrics. |
HTTP agent | rabbitmq.get_exchanges |
RabbitMQ: Connections total | The total number of connections. |
Dependent item | rabbitmq.overview.object_totals.connections Preprocessing
|
RabbitMQ: Channels total | The total number of channels. |
Dependent item | rabbitmq.overview.object_totals.channels Preprocessing
|
RabbitMQ: Queues total | The total number of queues. |
Dependent item | rabbitmq.overview.object_totals.queues Preprocessing
|
RabbitMQ: Consumers total | The total number of consumers. |
Dependent item | rabbitmq.overview.object_totals.consumers Preprocessing
|
RabbitMQ: Exchanges total | The total number of exchanges. |
Dependent item | rabbitmq.overview.object_totals.exchanges Preprocessing
|
RabbitMQ: Messages total | The total number of messages (ready, plus unacknowledged). |
Dependent item | rabbitmq.overview.queue_totals.messages Preprocessing
|
RabbitMQ: Messages ready for delivery | The number of messages ready for delivery. |
Dependent item | rabbitmq.overview.queue_totals.messages.ready Preprocessing
|
RabbitMQ: Messages unacknowledged | The number of unacknowledged messages. |
Dependent item | rabbitmq.overview.queue_totals.messages.unacknowledged Preprocessing
|
RabbitMQ: Messages acknowledged | The number of messages delivered to clients and acknowledged. |
Dependent item | rabbitmq.overview.messages.ack Preprocessing
|
RabbitMQ: Messages acknowledged per second | The rate of messages (per second) delivered to clients and acknowledged. |
Dependent item | rabbitmq.overview.messages.ack.rate Preprocessing
|
RabbitMQ: Messages confirmed | The count of confirmed messages. |
Dependent item | rabbitmq.overview.messages.confirm Preprocessing
|
RabbitMQ: Messages confirmed per second | The rate of messages confirmed per second. |
Dependent item | rabbitmq.overview.messages.confirm.rate Preprocessing
|
RabbitMQ: Messages delivered | The sum of messages delivered to consumers: in acknowledgement mode and in no-acknowledgement mode; delivered to consumers in response to the |
Dependent item | rabbitmq.overview.messages.deliver_get Preprocessing
|
RabbitMQ: Messages delivered per second | The rate of the sum of messages (per second) delivered to consumers: in acknowledgement mode and in no-acknowledgement mode; delivered to consumers in response to the |
Dependent item | rabbitmq.overview.messages.deliver_get.rate Preprocessing
|
RabbitMQ: Messages published | The count of published messages. |
Dependent item | rabbitmq.overview.messages.publish Preprocessing
|
RabbitMQ: Messages published per second | The rate of messages published per second. |
Dependent item | rabbitmq.overview.messages.publish.rate Preprocessing
|
RabbitMQ: Messages publish_in | The count of messages published from the channels into this overview. |
Dependent item | rabbitmq.overview.messages.publish_in Preprocessing
|
RabbitMQ: Messages publish_in per second | The rate of messages (per second) published from the channels into this overview. |
Dependent item | rabbitmq.overview.messages.publish_in.rate Preprocessing
|
RabbitMQ: Messages publish_out | The count of messages published from this overview into queues. |
Dependent item | rabbitmq.overview.messages.publish_out Preprocessing
|
RabbitMQ: Messages publish_out per second | The rate of messages (per second) published from this overview into queues. |
Dependent item | rabbitmq.overview.messages.publish_out.rate Preprocessing
|
RabbitMQ: Messages returned unroutable | The count of messages returned to a publisher as unroutable. |
Dependent item | rabbitmq.overview.messages.return_unroutable Preprocessing
|
RabbitMQ: Messages returned unroutable per second | The rate of messages (per second) returned to a publisher as unroutable. |
Dependent item | rabbitmq.overview.messages.return_unroutable.rate Preprocessing
|
RabbitMQ: Messages returned redeliver | The count of subset of messages in the |
Dependent item | rabbitmq.overview.messages.redeliver Preprocessing
|
RabbitMQ: Messages returned redeliver per second | The rate of subset of messages (per second) in the |
Dependent item | rabbitmq.overview.messages.redeliver.rate Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
RabbitMQ: Failed to fetch overview data | Zabbix has not received any data for items for the last 30 minutes. |
nodata(/RabbitMQ cluster by HTTP/rabbitmq.get_overview,30m)=1 |Warning |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Health Check 3.8.10+ discovery | Specific metrics for versions 3.8.10 and newer. |
Dependent item | rabbitmq.healthcheck.v3810.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
RabbitMQ: Healthcheck: alarms in effect in the cluster{#SINGLETON} | Responds with a 200 OK if there are no alarms in effect in the cluster, otherwise responds with a 503 Service Unavailable. |
HTTP agent | rabbitmq.healthcheck.alarms[{#SINGLETON}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
RabbitMQ: There are active alarms in the cluster | This is the default API endpoint path: http://{$RABBITMQ.API.CLUSTER_HOST}:{$RABBITMQ.API.PORT}/api/index.html. |
last(/RabbitMQ cluster by HTTP/rabbitmq.healthcheck.alarms[{#SINGLETON}])=0 |Average |
Name | Description | Type | Key and additional info |
---|---|---|---|
Exchanges discovery | The metrics for an individual exchange. |
Dependent item | rabbitmq.exchanges.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Get data | The HTTP API endpoint that returns [{#VHOST}][{#EXCHANGE}][{#TYPE}] exchanges metrics |
Dependent item | rabbitmq.get_exchanges["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages acknowledged | The number of messages delivered to clients and acknowledged. |
Dependent item | rabbitmq.exchange.messages.ack["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages acknowledged per second | The rate of messages (per second) delivered to clients and acknowledged. |
Dependent item | rabbitmq.exchange.messages.ack.rate["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages confirmed | The count of confirmed messages. |
Dependent item | rabbitmq.exchange.messages.confirm["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages confirmed per second | The rate of messages confirmed per second. |
Dependent item | rabbitmq.exchange.messages.confirm.rate["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages delivered | The sum of messages delivered to consumers: in acknowledgement mode and in no-acknowledgement mode; delivered to consumers in response to the |
Dependent item | rabbitmq.exchange.messages.deliver_get["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages delivered per second | The rate of the sum of messages (per second) delivered to consumers: in acknowledgement mode and in no-acknowledgement mode; delivered to consumers in response to the |
Dependent item | rabbitmq.exchange.messages.deliver_get.rate["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages published | The count of published messages. |
Dependent item | rabbitmq.exchange.messages.publish["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages published per second | The rate of messages published per second. |
Dependent item | rabbitmq.exchange.messages.publish.rate["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages publish_in | The count of messages published from the channels into this exchange. |
Dependent item | rabbitmq.exchange.messages.publish_in["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages publish_in per second | The rate of messages (per second) published from the channels into this exchange. |
Dependent item | rabbitmq.exchange.messages.publish_in.rate["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages publish_out | The count of messages published from this exchange into queues. |
Dependent item | rabbitmq.exchange.messages.publish_out["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages publish_out per second | The rate of messages (per second) published from this exchange into queues. |
Dependent item | rabbitmq.exchange.messages.publish_out.rate["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages returned unroutable | The count of messages returned to a publisher as unroutable. |
Dependent item | rabbitmq.exchange.messages.return_unroutable["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages returned unroutable per second | The rate of messages (per second) returned to a publisher as unroutable. |
Dependent item | rabbitmq.exchange.messages.return_unroutable.rate["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages redelivered | The count of the subset of messages in deliver_get which had the redelivered flag set. |
Dependent item | rabbitmq.exchange.messages.redeliver["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages redelivered per second | The rate (per second) of the subset of messages in deliver_get which had the redelivered flag set. |
Dependent item | rabbitmq.exchange.messages.redeliver.rate["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
This template is developed to monitor a RabbitMQ messaging broker node by Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The template collects metrics remotely by polling the RabbitMQ management plugin with the HTTP agent.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Enable the RabbitMQ management plugin. See the RabbitMQ documentation for the instructions.
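If the management plugin is not yet enabled, it can typically be switched on with the standard plugin tool (shown here only as an illustrative sketch; run it on the RabbitMQ node with sufficient privileges):
rabbitmq-plugins enable rabbitmq_management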
Create a user to monitor the service:
rabbitmqctl add_user zbx_monitor <PASSWORD>
rabbitmqctl set_permissions -p / zbx_monitor "" "" ".*"
rabbitmqctl set_user_tags zbx_monitor monitoring
Set the hostname or IP address of the RabbitMQ node host in the {$RABBITMQ.API.HOST} macro. You can also change the port in the {$RABBITMQ.API.PORT} macro and the scheme in the {$RABBITMQ.API.SCHEME} macro if necessary.
Set the user name and password in the macros {$RABBITMQ.API.USER} and {$RABBITMQ.API.PASSWORD}.
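Before linking the template, it can be useful to confirm that the monitoring user can actually reach the management API from the Zabbix server or proxy. A minimal manual check, with the host and password below as placeholders for your own values, could look like this:
curl -u zbx_monitor:<PASSWORD> http://<RABBITMQ_NODE_HOST>:15672/api/overview
A JSON document with cluster-wide statistics indicates that the credentials and the API-related macros are set correctly.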
Name | Description | Default |
---|---|---|
{$RABBITMQ.API.USER} | zbx_monitor |
|
{$RABBITMQ.API.PASSWORD} | zabbix |
|
{$RABBITMQ.CLUSTER.NAME} | The name of the RabbitMQ cluster. |
rabbit |
{$RABBITMQ.API.PORT} | The port of the RabbitMQ API endpoint. |
15672 |
{$RABBITMQ.API.SCHEME} | The request scheme, which may be HTTP or HTTPS. |
http |
{$RABBITMQ.API.HOST} | The hostname or IP of the API endpoint for the RabbitMQ. |
<SET NODE API HOST> |
{$RABBITMQ.LLD.FILTER.QUEUE.MATCHES} | This macro is used in the discovery of queues. It can be overridden at host level or its linked template level. |
.* |
{$RABBITMQ.LLD.FILTER.QUEUE.NOT_MATCHES} | This macro is used in the discovery of queues. It can be overridden at host level or its linked template level. |
CHANGE_IF_NEEDED |
{$RABBITMQ.RESPONSE_TIME.MAX.WARN} | The maximum response time by the RabbitMQ expressed in seconds for a trigger expression. |
10 |
{$RABBITMQ.MESSAGES.MAX.WARN} | The maximum number of messages in the queue for a trigger expression. |
1000 |
Name | Description | Type | Key and additional info |
---|---|---|---|
RabbitMQ: Service ping | Simple check | net.tcp.service["{$RABBITMQ.API.SCHEME}","{$RABBITMQ.API.HOST}","{$RABBITMQ.API.PORT}"] Preprocessing
|
|
RabbitMQ: Get node overview | The HTTP API endpoint that returns cluster-wide metrics. |
HTTP agent | rabbitmq.get_node_overview Preprocessing
|
RabbitMQ: Get nodes | The HTTP API endpoint that returns metrics of the nodes. |
HTTP agent | rabbitmq.get_nodes |
RabbitMQ: Get queues | The HTTP API endpoint that returns metrics of the queues. |
HTTP agent | rabbitmq.get_queues |
RabbitMQ: Management plugin version | The version of the management plugin in use. |
Dependent item | rabbitmq.node.overview.management_version Preprocessing
|
RabbitMQ: RabbitMQ version | The version of the RabbitMQ on the node, which processed this request. |
Dependent item | rabbitmq.node.overview.rabbitmq_version Preprocessing
|
RabbitMQ: Used file descriptors | The number of used file descriptors. |
Dependent item | rabbitmq.node.fd_used Preprocessing
|
RabbitMQ: Free disk space | The current free disk space. |
Dependent item | rabbitmq.node.disk_free Preprocessing
|
RabbitMQ: Disk free limit | The free space limit of a disk expressed in bytes. |
Dependent item | rabbitmq.node.disk_free_limit Preprocessing
|
RabbitMQ: Memory used | The memory usage expressed in bytes. |
Dependent item | rabbitmq.node.mem_used Preprocessing
|
RabbitMQ: Memory limit | The memory usage with high watermark properties expressed in bytes. |
Dependent item | rabbitmq.node.mem_limit Preprocessing
|
RabbitMQ: Runtime run queue | The average number of Erlang processes waiting to run. |
Dependent item | rabbitmq.node.run_queue Preprocessing
|
RabbitMQ: Sockets used | The number of file descriptors used as sockets. |
Dependent item | rabbitmq.node.sockets_used Preprocessing
|
RabbitMQ: Sockets available | The file descriptors available for use as sockets. |
Dependent item | rabbitmq.node.sockets_total Preprocessing
|
RabbitMQ: Number of network partitions | The number of network partitions, which this node "sees". |
Dependent item | rabbitmq.node.partitions Preprocessing
|
RabbitMQ: Is running | Shows whether the node is running or not. |
Dependent item | rabbitmq.node.running Preprocessing
|
RabbitMQ: Memory alarm | It checks whether the host has a memory alarm or not. |
Dependent item | rabbitmq.node.mem_alarm Preprocessing
|
RabbitMQ: Disk free alarm | It checks whether the node has a disk alarm or not. |
Dependent item | rabbitmq.node.disk_free_alarm Preprocessing
|
RabbitMQ: Uptime | Uptime expressed in milliseconds. |
Dependent item | rabbitmq.node.uptime Preprocessing
|
RabbitMQ: Service response time | Simple check | net.tcp.service.perf["{$RABBITMQ.API.SCHEME}","{$RABBITMQ.API.HOST}","{$RABBITMQ.API.PORT}"] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
RabbitMQ: Service is down | last(/RabbitMQ node by HTTP/net.tcp.service["{$RABBITMQ.API.SCHEME}","{$RABBITMQ.API.HOST}","{$RABBITMQ.API.PORT}"])=0 |Average |
Manual close: Yes | ||
RabbitMQ: Failed to fetch nodes data | Zabbix has not received any data for items for the last 30 minutes. |
nodata(/RabbitMQ node by HTTP/rabbitmq.get_nodes,30m)=1 |Warning |
Manual close: Yes Depends on:
|
|
RabbitMQ: Version has changed | RabbitMQ version has changed. Acknowledge to close the problem manually. |
last(/RabbitMQ node by HTTP/rabbitmq.node.overview.rabbitmq_version,#1)<>last(/RabbitMQ node by HTTP/rabbitmq.node.overview.rabbitmq_version,#2) and length(last(/RabbitMQ node by HTTP/rabbitmq.node.overview.rabbitmq_version))>0 |Info |
Manual close: Yes | |
RabbitMQ: Number of network partitions is too high | For more details see Detecting Network Partitions. |
min(/RabbitMQ node by HTTP/rabbitmq.node.partitions,5m)>0 |Warning |
||
RabbitMQ: Node is not running | RabbitMQ node is not running. |
max(/RabbitMQ node by HTTP/rabbitmq.node.running,5m)=0 |Average |
Depends on:
|
|
RabbitMQ: Memory alarm | For more details see Memory Alarms. |
last(/RabbitMQ node by HTTP/rabbitmq.node.mem_alarm)=1 |Average |
||
RabbitMQ: Free disk space alarm | For more details see Free Disk Space Alarms. |
last(/RabbitMQ node by HTTP/rabbitmq.node.disk_free_alarm)=1 |Average |
||
RabbitMQ: Host has been restarted | Uptime is less than 10 minutes. |
last(/RabbitMQ node by HTTP/rabbitmq.node.uptime)<10m |Info |
Manual close: Yes | |
RabbitMQ: Service response time is too high | min(/RabbitMQ node by HTTP/net.tcp.service.perf["{$RABBITMQ.API.SCHEME}","{$RABBITMQ.API.HOST}","{$RABBITMQ.API.PORT}"],5m)>{$RABBITMQ.RESPONSE_TIME.MAX.WARN} |Warning |
Manual close: Yes Depends on:
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Health Check 3.8.10+ discovery | Specific metrics for versions 3.8.10 and newer. |
Dependent item | rabbitmq.healthcheck.v3810.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
RabbitMQ: Healthcheck: local alarms in effect on this node{#SINGLETON} | It responds with a status code 200 OK if there are no local alarms in effect on the target node. Otherwise, it responds with a status code 503 Service Unavailable. |
HTTP agent | rabbitmq.healthcheck.local_alarms[{#SINGLETON}] Preprocessing
|
RabbitMQ: Healthcheck: expiration date on the certificates{#SINGLETON} | It checks the expiration date on the certificates for every listener configured to use the Transport Layer Security (TLS). It responds with a status code 200 OK if all the certificates are valid (have not expired). Otherwise, it responds with a status code 503 Service Unavailable. |
HTTP agent | rabbitmq.healthcheck.certificate_expiration[{#SINGLETON}] Preprocessing
|
RabbitMQ: Healthcheck: virtual hosts on this node{#SINGLETON} | It responds with a status code 200 OK if all virtual hosts are running on the target node. Otherwise, it responds with a status code 503 Service Unavailable. |
HTTP agent | rabbitmq.healthcheck.virtual_hosts[{#SINGLETON}] Preprocessing
|
RabbitMQ: Healthcheck: classic mirrored queues without synchronized mirrors online{#SINGLETON} | It checks if there are classic mirrored queues without synchronized mirrors online (queues that would potentially lose data if the target node is shut down). It responds with a status code 200 OK if there are no such classic mirrored queues. Otherwise, it responds with a status code 503 Service Unavailable. |
HTTP agent | rabbitmq.healthcheck.mirror_sync[{#SINGLETON}] Preprocessing
|
RabbitMQ: Healthcheck: queues with minimum online quorum{#SINGLETON} | It checks if there are quorum queues with minimum online quorum (queues that would lose their quorum and availability if the target node is shut down). It responds with a status code 200 OK if there are no such quorum queues. Otherwise, it responds with a status code 503 Service Unavailable. |
HTTP agent | rabbitmq.healthcheck.quorum[{#SINGLETON}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
RabbitMQ: There are active alarms in the node | This is the default API endpoint path: http://{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/index.html. |
last(/RabbitMQ node by HTTP/rabbitmq.healthcheck.local_alarms[{#SINGLETON}])=0 |Average |
||
RabbitMQ: There are valid TLS certificates expiring in the next month | This is the default API endpoint path: http://{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/index.html. |
last(/RabbitMQ node by HTTP/rabbitmq.healthcheck.certificate_expiration[{#SINGLETON}])=0 |Average |
||
RabbitMQ: There are not running virtual hosts | This is the default API endpoint path: http://{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/index.html. |
last(/RabbitMQ node by HTTP/rabbitmq.healthcheck.virtual_hosts[{#SINGLETON}])=0 |Average |
||
RabbitMQ: There are queues that could potentially lose data if this node goes offline. | This is the default API endpoint path: http://{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/index.html. |
last(/RabbitMQ node by HTTP/rabbitmq.healthcheck.mirror_sync[{#SINGLETON}])=0 |Average |
||
RabbitMQ: There are queues that would lose their quorum and availability if this node is shut down. | This is the default API endpoint path: http://{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/index.html. |
last(/RabbitMQ node by HTTP/rabbitmq.healthcheck.quorum[{#SINGLETON}])=0 |Average |
Name | Description | Type | Key and additional info |
---|---|---|---|
Health Check 3.8.9- discovery | Specific metrics for versions up to and including 3.8.9. |
Dependent item | rabbitmq.healthcheck.v389.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
RabbitMQ: Healthcheck{#SINGLETON} | It checks whether the RabbitMQ application is running, whether the channels and queues can be listed successfully, and that no alarms are in effect. |
HTTP agent | rabbitmq.healthcheck[{#SINGLETON}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
RabbitMQ: Node healthcheck failed | For more details see Health Checks. |
last(/RabbitMQ node by HTTP/rabbitmq.healthcheck[{#SINGLETON}])=0 |Average |
Name | Description | Type | Key and additional info |
---|---|---|---|
Queues discovery | The metrics for an individual queue. |
Dependent item | rabbitmq.queues.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Get data | The HTTP API endpoint that returns [{#VHOST}][{#QUEUE}] queue metrics |
Dependent item | rabbitmq.get_exchanges["{#VHOST}/{#QUEUE}"] Preprocessing
|
RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages total | The count of total messages in the queue. |
Dependent item | rabbitmq.queue.messages["{#VHOST}/{#QUEUE}"] Preprocessing
|
RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages per second | The count of total messages per second in the queue. |
Dependent item | rabbitmq.queue.messages.rate["{#VHOST}/{#QUEUE}"] Preprocessing
|
RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Consumers | The number of consumers. |
Dependent item | rabbitmq.queue.consumers["{#VHOST}/{#QUEUE}"] Preprocessing
|
RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Memory | The bytes of memory consumed by the Erlang process associated with the queue, including stack, heap and internal structures. |
Dependent item | rabbitmq.queue.memory["{#VHOST}/{#QUEUE}"] Preprocessing
|
RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages ready | The number of messages ready to be delivered to clients. |
Dependent item | rabbitmq.queue.messages_ready["{#VHOST}/{#QUEUE}"] Preprocessing
|
RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages ready per second | The number of messages per second ready to be delivered to clients. |
Dependent item | rabbitmq.queue.messages_ready.rate["{#VHOST}/{#QUEUE}"] Preprocessing
|
RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages unacknowledged | The number of messages delivered to clients but not yet acknowledged. |
Dependent item | rabbitmq.queue.messages_unacknowledged["{#VHOST}/{#QUEUE}"] Preprocessing
|
RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages unacknowledged per second | The number of messages per second delivered to clients but not yet acknowledged. |
Dependent item | rabbitmq.queue.messages_unacknowledged.rate["{#VHOST}/{#QUEUE}"] Preprocessing
|
RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages acknowledged | The number of messages delivered to clients and acknowledged. |
Dependent item | rabbitmq.queue.messages.ack["{#VHOST}/{#QUEUE}"] Preprocessing
|
RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages acknowledged per second | The number of messages (per second) delivered to clients and acknowledged. |
Dependent item | rabbitmq.queue.messages.ack.rate["{#VHOST}/{#QUEUE}"] Preprocessing
|
RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages delivered | The count of messages delivered to consumers in acknowledgement mode. |
Dependent item | rabbitmq.queue.messages.deliver["{#VHOST}/{#QUEUE}"] Preprocessing
|
RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages delivered per second | The count of messages (per second) delivered to consumers in acknowledgement mode. |
Dependent item | rabbitmq.queue.messages.deliver.rate["{#VHOST}/{#QUEUE}"] Preprocessing
|
RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Sum of messages delivered | The sum of messages delivered to consumers: in acknowledgement mode and in no-acknowledgement mode; delivered to consumers in response to basic.get: in acknowledgement mode and in no-acknowledgement mode. |
Dependent item | rabbitmq.queue.messages.deliver_get["{#VHOST}/{#QUEUE}"] Preprocessing
|
RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Sum of messages delivered per second | The rate of delivery per second. The sum of messages delivered (per second) to consumers: in acknowledgement mode and in no-acknowledgement mode; delivered to consumers in response to basic.get: in acknowledgement mode and in no-acknowledgement mode. |
Dependent item | rabbitmq.queue.messages.deliver_get.rate["{#VHOST}/{#QUEUE}"] Preprocessing
|
RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages published | The count of published messages. |
Dependent item | rabbitmq.queue.messages.publish["{#VHOST}/{#QUEUE}"] Preprocessing
|
RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages published per second | The rate of published messages per second. |
Dependent item | rabbitmq.queue.messages.publish.rate["{#VHOST}/{#QUEUE}"] Preprocessing
|
RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages redelivered | The count of the subset of messages in deliver_get which had the redelivered flag set. |
Dependent item | rabbitmq.queue.messages.redeliver["{#VHOST}/{#QUEUE}"] Preprocessing
|
RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages redelivered per second | The rate of messages redelivered per second. |
Dependent item | rabbitmq.queue.messages.redeliver.rate["{#VHOST}/{#QUEUE}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
RabbitMQ: Too many messages in queue [{#VHOST}][{#QUEUE}] | min(/RabbitMQ node by HTTP/rabbitmq.queue.messages["{#VHOST}/{#QUEUE}"],5m)>{$RABBITMQ.MESSAGES.MAX.WARN:"{#QUEUE}"} |Warning |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is developed to monitor the RabbitMQ messaging broker by Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The template RabbitMQ Cluster collects metrics by polling the RabbitMQ management plugin with Zabbix agent.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Enable the RabbitMQ management plugin. See RabbitMQ documentation for the instructions.
Create a user to monitor the service:
rabbitmqctl add_user zbx_monitor <PASSWORD>
rabbitmqctl set_permissions -p / zbx_monitor "" "" ".*"
rabbitmqctl set_user_tags zbx_monitor monitoring
A login name and password are also supported in the {$RABBITMQ.API.USER} and {$RABBITMQ.API.PASSWORD} macros.
If your cluster consists of several nodes, it is recommended to assign the cluster template to a separate balancing host. In the case of a single-node installation, you can assign the cluster template to one host together with a node template.
If you use another API endpoint, then don't forget to change the {$RABBITMQ.API.CLUSTER_HOST} macro.
Install and set up Zabbix agent.
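Since this template collects data through the Zabbix agent web.page.get item, a quick agent-side check with zabbix_get can confirm the setup once the agent is installed. The agent host name and credentials below are placeholders; the key mirrors the one used by the template:
zabbix_get -s <AGENT_HOST> -k 'web.page.get["http://zbx_monitor:<PASSWORD>@127.0.0.1:15672/api/overview"]'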
Name | Description | Default |
---|---|---|
{$RABBITMQ.API.USER} | zbx_monitor |
|
{$RABBITMQ.API.PASSWORD} | zabbix |
|
{$RABBITMQ.API.CLUSTER_HOST} | The hostname or IP of the API endpoint for the RabbitMQ cluster. |
127.0.0.1 |
{$RABBITMQ.API.PORT} | The port of the RabbitMQ API endpoint. |
15672 |
{$RABBITMQ.API.SCHEME} | The request scheme, which may be HTTP or HTTPS. |
http |
{$RABBITMQ.LLD.FILTER.EXCHANGE.MATCHES} | This macro is used in the discovery of exchanges. It can be overridden at host level or its linked template level. |
.* |
{$RABBITMQ.LLD.FILTER.EXCHANGE.NOT_MATCHES} | This macro is used in the discovery of exchanges. It can be overridden at host level or its linked template level. |
CHANGE_IF_NEEDED |
Name | Description | Type | Key and additional info |
---|---|---|---|
RabbitMQ: Get overview | The HTTP API endpoint that returns cluster-wide metrics. |
Zabbix agent | web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.CLUSTER_HOST}:{$RABBITMQ.API.PORT}/api/overview"] Preprocessing
|
RabbitMQ: Get exchanges | The HTTP API endpoint that returns exchanges metrics. |
Zabbix agent | web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.CLUSTER_HOST}:{$RABBITMQ.API.PORT}/api/exchanges"] Preprocessing
|
RabbitMQ: Connections total | The total number of connections. |
Dependent item | rabbitmq.overview.object_totals.connections Preprocessing
|
RabbitMQ: Channels total | The total number of channels. |
Dependent item | rabbitmq.overview.object_totals.channels Preprocessing
|
RabbitMQ: Queues total | The total number of queues. |
Dependent item | rabbitmq.overview.object_totals.queues Preprocessing
|
RabbitMQ: Consumers total | The total number of consumers. |
Dependent item | rabbitmq.overview.object_totals.consumers Preprocessing
|
RabbitMQ: Exchanges total | The total number of exchanges. |
Dependent item | rabbitmq.overview.object_totals.exchanges Preprocessing
|
RabbitMQ: Messages total | The total number of messages (ready, plus unacknowledged). |
Dependent item | rabbitmq.overview.queue_totals.messages Preprocessing
|
RabbitMQ: Messages ready for delivery | The number of messages ready for delivery. |
Dependent item | rabbitmq.overview.queue_totals.messages.ready Preprocessing
|
RabbitMQ: Messages unacknowledged | The number of unacknowledged messages. |
Dependent item | rabbitmq.overview.queue_totals.messages.unacknowledged Preprocessing
|
RabbitMQ: Messages acknowledged | The number of messages delivered to clients and acknowledged. |
Dependent item | rabbitmq.overview.messages.ack Preprocessing
|
RabbitMQ: Messages acknowledged per second | The rate of messages (per second) delivered to clients and acknowledged. |
Dependent item | rabbitmq.overview.messages.ack.rate Preprocessing
|
RabbitMQ: Messages confirmed | The count of confirmed messages. |
Dependent item | rabbitmq.overview.messages.confirm Preprocessing
|
RabbitMQ: Messages confirmed per second | The rate of messages confirmed per second. |
Dependent item | rabbitmq.overview.messages.confirm.rate Preprocessing
|
RabbitMQ: Messages delivered | The sum of messages delivered to consumers: in acknowledgement mode and in no-acknowledgement mode; delivered to consumers in response to basic.get: in acknowledgement mode and in no-acknowledgement mode. |
Dependent item | rabbitmq.overview.messages.deliver_get Preprocessing
|
RabbitMQ: Messages delivered per second | The rate of the sum of messages (per second) delivered to consumers: in acknowledgement mode and in no-acknowledgement mode; delivered to consumers in response to basic.get: in acknowledgement mode and in no-acknowledgement mode. |
Dependent item | rabbitmq.overview.messages.deliver_get.rate Preprocessing
|
RabbitMQ: Messages published | The count of published messages. |
Dependent item | rabbitmq.overview.messages.publish Preprocessing
|
RabbitMQ: Messages published per second | The rate of messages published per second. |
Dependent item | rabbitmq.overview.messages.publish.rate Preprocessing
|
RabbitMQ: Messages publish_in | The count of messages published from the channels into this overview. |
Dependent item | rabbitmq.overview.messages.publish_in Preprocessing
|
RabbitMQ: Messages publish_in per second | The rate of messages (per second) published from the channels into this overview. |
Dependent item | rabbitmq.overview.messages.publish_in.rate Preprocessing
|
RabbitMQ: Messages publish_out | The count of messages published from this overview into queues. |
Dependent item | rabbitmq.overview.messages.publish_out Preprocessing
|
RabbitMQ: Messages publish_out per second | The rate of messages (per second) published from this overview into queues. |
Dependent item | rabbitmq.overview.messages.publish_out.rate Preprocessing
|
RabbitMQ: Messages returned unroutable | The count of messages returned to a publisher as unroutable. |
Dependent item | rabbitmq.overview.messages.return_unroutable Preprocessing
|
RabbitMQ: Messages returned unroutable per second | The rate of messages (per second) returned to a publisher as unroutable. |
Dependent item | rabbitmq.overview.messages.return_unroutable.rate Preprocessing
|
RabbitMQ: Messages returned redeliver | The count of the subset of messages in deliver_get which had the redelivered flag set. |
Dependent item | rabbitmq.overview.messages.redeliver Preprocessing
|
RabbitMQ: Messages returned redeliver per second | The rate (per second) of the subset of messages in deliver_get which had the redelivered flag set. |
Dependent item | rabbitmq.overview.messages.redeliver.rate Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
RabbitMQ: Failed to fetch overview data | Zabbix has not received any data for items for the last 30 minutes. |
nodata(/RabbitMQ cluster by Zabbix agent/web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.CLUSTER_HOST}:{$RABBITMQ.API.PORT}/api/overview"],30m)=1 |Warning |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Health Check 3.8.10+ discovery | Specific metrics for versions 3.8.10 and newer. |
Dependent item | rabbitmq.healthcheck.v3810.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
RabbitMQ: Healthcheck: alarms in effect in the cluster{#SINGLETON} | It responds with a status code 200 OK if there are no alarms in effect in the cluster. Otherwise, it responds with a status code 503 Service Unavailable. |
Zabbix agent | web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.CLUSTER_HOST}:{$RABBITMQ.API.PORT}/api/health/checks/alarms{#SINGLETON}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
RabbitMQ: There are active alarms in the cluster | This is the default API endpoint path: http://{$RABBITMQ.API.CLUSTER_HOST}:{$RABBITMQ.API.PORT}/api/index.html. |
last(/RabbitMQ cluster by Zabbix agent/web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.CLUSTER_HOST}:{$RABBITMQ.API.PORT}/api/health/checks/alarms{#SINGLETON}"])=0 |Average |
Name | Description | Type | Key and additional info |
---|---|---|---|
Exchanges discovery | The metrics for an individual exchange. |
Dependent item | rabbitmq.exchanges.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Get data | The HTTP API endpoint that returns [{#VHOST}][{#EXCHANGE}][{#TYPE}] exchanges metrics |
Dependent item | rabbitmq.get_exchanges["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages acknowledged | The number of messages delivered to clients and acknowledged. |
Dependent item | rabbitmq.exchange.messages.ack["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages acknowledged per second | The rate of messages (per second) delivered to clients and acknowledged. |
Dependent item | rabbitmq.exchange.messages.ack.rate["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages confirmed | The count of confirmed messages. |
Dependent item | rabbitmq.exchange.messages.confirm["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages confirmed per second | The rate of messages confirmed per second. |
Dependent item | rabbitmq.exchange.messages.confirm.rate["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages delivered | The sum of messages delivered to consumers: in acknowledgement mode and in no-acknowledgement mode; delivered to consumers in response to basic.get: in acknowledgement mode and in no-acknowledgement mode. |
Dependent item | rabbitmq.exchange.messages.deliver_get["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages delivered per second | The rate of the sum of messages (per second) delivered to consumers: in acknowledgement mode and in no-acknowledgement mode; delivered to consumers in response to basic.get: in acknowledgement mode and in no-acknowledgement mode. |
Dependent item | rabbitmq.exchange.messages.deliver_get.rate["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages published | The count of published messages. |
Dependent item | rabbitmq.exchange.messages.publish["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages published per second | The rate of messages published per second. |
Dependent item | rabbitmq.exchange.messages.publish.rate["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages publish_in | The count of messages published from the channels into this exchange. |
Dependent item | rabbitmq.exchange.messages.publish_in["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages publish_in per second | The rate of messages (per second) published from the channels into this exchange. |
Dependent item | rabbitmq.exchange.messages.publish_in.rate["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages publish_out | The count of messages published from this exchange into queues. |
Dependent item | rabbitmq.exchange.messages.publish_out["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages publish_out per second | The rate of messages (per second) published from this exchange into queues. |
Dependent item | rabbitmq.exchange.messages.publish_out.rate["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages returned unroutable | The count of messages returned to a publisher as unroutable. |
Dependent item | rabbitmq.exchange.messages.return_unroutable["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages returned unroutable per second | The rate of messages (per second) returned to a publisher as unroutable. |
Dependent item | rabbitmq.exchange.messages.return_unroutable.rate["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages redelivered | The count of the subset of messages in deliver_get which had the redelivered flag set. |
Dependent item | rabbitmq.exchange.messages.redeliver["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
RabbitMQ: Exchange [{#VHOST}][{#EXCHANGE}][{#TYPE}]: Messages redelivered per second | The rate (per second) of the subset of messages in deliver_get which had the redelivered flag set. |
Dependent item | rabbitmq.exchange.messages.redeliver.rate["{#VHOST}/{#EXCHANGE}/{#TYPE}"] Preprocessing
|
This template is developed to monitor RabbitMQ by Zabbix and works without any external scripts.
Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The template RabbitMQ Node (Zabbix version >= 4.2) collects metrics by polling the RabbitMQ management plugin with Zabbix agent.
It also uses Zabbix agent to collect RabbitMQ Linux process statistics, such as CPU usage, memory usage, and whether the process is running or not.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Enable the RabbitMQ management plugin. See RabbitMQ documentation for the instructions.
Create a user to monitor the service:
rabbitmqctl add_user zbx_monitor <PASSWORD>
rabbitmqctl set_permissions -p / zbx_monitor "" "" ".*"
rabbitmqctl set_user_tags zbx_monitor monitoring
A login name and password are also supported in the {$RABBITMQ.API.USER} and {$RABBITMQ.API.PASSWORD} macros.
If you use another API endpoint, then don't forget to change the {$RABBITMQ.API.HOST} macro.
Install and set up Zabbix agent.
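Besides the management API items, this template discovers the local RabbitMQ server process through the agent, so a simple agent-side process check can help verify that part of the setup. The agent host name below is a placeholder; beam.smp is the default value of {$RABBITMQ.PROCESS_NAME}:
zabbix_get -s <AGENT_HOST> -k 'proc.num[beam.smp]'
A value greater than zero means the agent can see the RabbitMQ server process.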
Name | Description | Default |
---|---|---|
{$RABBITMQ.API.USER} | zbx_monitor |
|
{$RABBITMQ.API.PASSWORD} | zabbix |
|
{$RABBITMQ.CLUSTER.NAME} | The name of the RabbitMQ cluster. |
rabbit |
{$RABBITMQ.API.PORT} | The port of the RabbitMQ API endpoint. |
15672 |
{$RABBITMQ.API.SCHEME} | The request scheme, which may be HTTP or HTTPS. |
http |
{$RABBITMQ.API.HOST} | The hostname or IP of the API endpoint for the RabbitMQ. |
127.0.0.1 |
{$RABBITMQ.PROCESS_NAME} | The process name filter for the RabbitMQ process discovery. |
beam.smp |
{$RABBITMQ.PROCESS.NAME.PARAMETER} | The process name of the RabbitMQ server used in the item key |
|
{$RABBITMQ.LLD.FILTER.QUEUE.MATCHES} | This macro is used in the discovery of queues. It can be overridden at host level or its linked template level. |
.* |
{$RABBITMQ.LLD.FILTER.QUEUE.NOT_MATCHES} | This macro is used in the discovery of queues. It can be overridden at host level or its linked template level. |
CHANGE_IF_NEEDED |
{$RABBITMQ.RESPONSE_TIME.MAX.WARN} | The maximum response time by the RabbitMQ expressed in seconds for a trigger expression. |
10 |
{$RABBITMQ.MESSAGES.MAX.WARN} | The maximum number of messages in the queue for a trigger expression. |
1000 |
Name | Description | Type | Key and additional info |
---|---|---|---|
RabbitMQ: Service ping | Zabbix agent | net.tcp.service["{$RABBITMQ.API.SCHEME}","{$RABBITMQ.API.HOST}","{$RABBITMQ.API.PORT}"] Preprocessing
|
|
RabbitMQ: Get node overview | The HTTP API endpoint that returns cluster-wide metrics. |
Zabbix agent | web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/overview"] Preprocessing
|
RabbitMQ: Get nodes | The HTTP API endpoint that returns metrics of the nodes. |
Zabbix agent | web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/nodes/{$RABBITMQ.CLUSTER.NAME}@{HOST.NAME}?memory=true"] Preprocessing
|
RabbitMQ: Get queues | The HTTP API endpoint that returns metrics of the queues. |
Zabbix agent | web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/queues"] Preprocessing
|
RabbitMQ: Management plugin version | The version of the management plugin in use. |
Dependent item | rabbitmq.node.overview.management_version Preprocessing
|
RabbitMQ: RabbitMQ version | The version of the RabbitMQ on the node, which processed this request. |
Dependent item | rabbitmq.node.overview.rabbitmq_version Preprocessing
|
RabbitMQ: Used file descriptors | The number of used file descriptors. |
Dependent item | rabbitmq.node.fd_used Preprocessing
|
RabbitMQ: Free disk space | The current free disk space. |
Dependent item | rabbitmq.node.disk_free Preprocessing
|
RabbitMQ: Memory used | The memory usage expressed in bytes. |
Dependent item | rabbitmq.node.mem_used Preprocessing
|
RabbitMQ: Memory limit | The memory usage with high watermark properties expressed in bytes. |
Dependent item | rabbitmq.node.mem_limit Preprocessing
|
RabbitMQ: Disk free limit | The free space limit of a disk expressed in bytes. |
Dependent item | rabbitmq.node.disk_free_limit Preprocessing
|
RabbitMQ: Runtime run queue | The average number of Erlang processes waiting to run. |
Dependent item | rabbitmq.node.run_queue Preprocessing
|
RabbitMQ: Sockets used | The number of file descriptors used as sockets. |
Dependent item | rabbitmq.node.sockets_used Preprocessing
|
RabbitMQ: Sockets available | The file descriptors available for use as sockets. |
Dependent item | rabbitmq.node.sockets_total Preprocessing
|
RabbitMQ: Number of network partitions | The number of network partitions, which this node "sees". |
Dependent item | rabbitmq.node.partitions Preprocessing
|
RabbitMQ: Is running | Shows whether the node is running or not. |
Dependent item | rabbitmq.node.running Preprocessing
|
RabbitMQ: Memory alarm | It checks whether the host has a memory alarm or not. |
Dependent item | rabbitmq.node.mem_alarm Preprocessing
|
RabbitMQ: Disk free alarm | It checks whether the node has a disk alarm or not. |
Dependent item | rabbitmq.node.disk_free_alarm Preprocessing
|
RabbitMQ: Uptime | Uptime expressed in milliseconds. |
Dependent item | rabbitmq.node.uptime Preprocessing
|
RabbitMQ: Get processes summary | The aggregated data of summary metrics for all processes. |
Zabbix agent | proc.get[{$RABBITMQ.PROCESS.NAME.PARAMETER},,,summary] |
RabbitMQ: Service response time | Zabbix agent | net.tcp.service.perf["{$RABBITMQ.API.SCHEME}","{$RABBITMQ.API.HOST}","{$RABBITMQ.API.PORT}"] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
RabbitMQ: Version has changed | RabbitMQ version has changed. Acknowledge to close the problem manually. |
last(/RabbitMQ node by Zabbix agent/rabbitmq.node.overview.rabbitmq_version,#1)<>last(/RabbitMQ node by Zabbix agent/rabbitmq.node.overview.rabbitmq_version,#2) and length(last(/RabbitMQ node by Zabbix agent/rabbitmq.node.overview.rabbitmq_version))>0 |Info |
Manual close: Yes | |
RabbitMQ: Number of network partitions is too high | For more details see Detecting Network Partitions. |
min(/RabbitMQ node by Zabbix agent/rabbitmq.node.partitions,5m)>0 |Warning |
||
RabbitMQ: Memory alarm | For more details see Memory Alarms. |
last(/RabbitMQ node by Zabbix agent/rabbitmq.node.mem_alarm)=1 |Average |
||
RabbitMQ: Free disk space alarm | For more details see Free Disk Space Alarms. |
last(/RabbitMQ node by Zabbix agent/rabbitmq.node.disk_free_alarm)=1 |Average |
||
RabbitMQ: Host has been restarted | Uptime is less than 10 minutes. |
last(/RabbitMQ node by Zabbix agent/rabbitmq.node.uptime)<10m |Info |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
RabbitMQ process discovery | The discovery of the RabbitMQ summary processes. |
Dependent item | rabbitmq.proc.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
RabbitMQ: Get process data | The summary metrics aggregated by a process {#RABBITMQ.NAME}. |
Dependent item | rabbitmq.proc.get[{#RABBITMQ.NAME}] Preprocessing
|
RabbitMQ: Number of running processes | The number of running processes {#RABBITMQ.NAME}. |
Dependent item | rabbitmq.proc.num[{#RABBITMQ.NAME}] Preprocessing
|
RabbitMQ: Memory usage (rss) | The summary of resident set size memory used by a process {#RABBITMQ.NAME} expressed in bytes. |
Dependent item | rabbitmq.proc.rss[{#RABBITMQ.NAME}] Preprocessing
|
RabbitMQ: Memory usage (vsize) | The summary of virtual memory used by a process {#RABBITMQ.NAME} expressed in bytes. |
Dependent item | rabbitmq.proc.vmem[{#RABBITMQ.NAME}] Preprocessing
|
RabbitMQ: Memory usage, % | The percentage of real memory used by a process {#RABBITMQ.NAME}. |
Dependent item | rabbitmq.proc.pmem[{#RABBITMQ.NAME}] Preprocessing
|
RabbitMQ: CPU utilization | The percentage of the CPU utilization by a process {#RABBITMQ.NAME}. |
Zabbix agent | proc.cpu.util[{#RABBITMQ.NAME}] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
RabbitMQ: Process is not running | last(/RabbitMQ node by Zabbix agent/rabbitmq.proc.num[{#RABBITMQ.NAME}])=0 |High |
|||
RabbitMQ: Failed to fetch nodes data | Zabbix has not received any data for items for the last 30 minutes. |
nodata(/RabbitMQ node by Zabbix agent/web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/nodes/{$RABBITMQ.CLUSTER.NAME}@{HOST.NAME}?memory=true"],30m)=1 and last(/RabbitMQ node by Zabbix agent/rabbitmq.proc.num[{#RABBITMQ.NAME}])>0 |Warning |
Manual close: Yes Depends on:
|
|
RabbitMQ: Service is down | last(/RabbitMQ node by Zabbix agent/net.tcp.service["{$RABBITMQ.API.SCHEME}","{$RABBITMQ.API.HOST}","{$RABBITMQ.API.PORT}"])=0 and last(/RabbitMQ node by Zabbix agent/rabbitmq.proc.num[{#RABBITMQ.NAME}])>0 |Average |
Manual close: Yes | ||
RabbitMQ: Node is not running | RabbitMQ node is not running. |
max(/RabbitMQ node by Zabbix agent/rabbitmq.node.running,5m)=0 and last(/RabbitMQ node by Zabbix agent/rabbitmq.proc.num[{#RABBITMQ.NAME}])>0 |Average |
Depends on:
|
|
RabbitMQ: Service response time is too high | min(/RabbitMQ node by Zabbix agent/net.tcp.service.perf["{$RABBITMQ.API.SCHEME}","{$RABBITMQ.API.HOST}","{$RABBITMQ.API.PORT}"],5m)>{$RABBITMQ.RESPONSE_TIME.MAX.WARN} and last(/RabbitMQ node by Zabbix agent/rabbitmq.proc.num[{#RABBITMQ.NAME}])>0 |Warning |
Manual close: Yes Depends on:
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Health Check 3.8.10+ discovery | Specific metrics for versions 3.8.10 and newer. |
Dependent item | rabbitmq.healthcheck.v3810.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
RabbitMQ: Healthcheck: local alarms in effect on this node{#SINGLETON} | It responds with a status code 200 OK if there are no local alarms in effect on the target node. Otherwise, it responds with a status code 503 Service Unavailable. |
Zabbix agent | web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/health/checks/local-alarms{#SINGLETON}"] Preprocessing
|
RabbitMQ: Healthcheck: expiration date on the certificates{#SINGLETON} | It checks the expiration date on the certificates for every listener configured to use the Transport Layer Security (TLS). It responds with a status code 200 OK if all the certificates are valid (have not expired). Otherwise, it responds with a status code 503 Service Unavailable. |
Zabbix agent | web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/health/checks/certificate-expiration/1/months{#SINGLETON}"] Preprocessing
|
RabbitMQ: Healthcheck: virtual hosts on this node{#SINGLETON} | It responds with a status code 200 OK if all virtual hosts are running on the target node. Otherwise, it responds with a status code 503 Service Unavailable. |
Zabbix agent | web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/health/checks/virtual-hosts{#SINGLETON}"] Preprocessing
|
RabbitMQ: Healthcheck: classic mirrored queues without synchronized mirrors online{#SINGLETON} | It checks if there are classic mirrored queues without synchronized mirrors online (queues that would potentially lose data if the target node is shut down). It responds with a status code 200 OK if there are no such classic mirrored queues. Otherwise, it responds with a status code 503 Service Unavailable. |
Zabbix agent | web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/health/checks/node-is-mirror-sync-critical{#SINGLETON}"] Preprocessing
|
RabbitMQ: Healthcheck: queues with minimum online quorum{#SINGLETON} | It checks if there are quorum queues with minimum online quorum (queues that would lose their quorum and availability if the target node is shut down). It responds with a status code 200 OK if there are no such quorum queues. Otherwise, it responds with a status code 503 Service Unavailable. |
Zabbix agent | web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/health/checks/node-is-quorum-critical{#SINGLETON}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
RabbitMQ: There are active alarms in the node | It checks the active alarms in the nodes via API. This is the default API endpoint path: http://{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/index.html. |
last(/RabbitMQ node by Zabbix agent/web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/health/checks/local-alarms{#SINGLETON}"])=0 |Average |
||
RabbitMQ: There are valid TLS certificates expiring in the next month | It checks if there are valid TLS certificates expiring in the next month. This is the default API endpoint path: http://{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/index.html. |
last(/RabbitMQ node by Zabbix agent/web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/health/checks/certificate-expiration/1/months{#SINGLETON}"])=0 |Average |
||
RabbitMQ: There are not running virtual hosts | It checks if there are not running virtual hosts via API. This is the default API endpoint path: http://{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/index.html. |
last(/RabbitMQ node by Zabbix agent/web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/health/checks/virtual-hosts{#SINGLETON}"])=0 |Average |
||
RabbitMQ: There are queues that could potentially lose data if this node goes offline. | It checks whether there are queues that could potentially lose data if this node goes offline via API. This is the default API endpoint path: http://{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/index.html. |
last(/RabbitMQ node by Zabbix agent/web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/health/checks/node-is-mirror-sync-critical{#SINGLETON}"])=0 |Average |
||
RabbitMQ: There are queues that would lose their quorum and availability if this node is shut down. | It checks via API whether there are queues that would lose their quorum and availability if this node is shut down. This is the default API endpoint path: http://{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/index.html. |
last(/RabbitMQ node by Zabbix agent/web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/health/checks/node-is-quorum-critical{#SINGLETON}"])=0 |Average |
Name | Description | Type | Key and additional info |
---|---|---|---|
Health Check 3.8.9- discovery | Specific metrics for versions up to and including 3.8.9. |
Dependent item | rabbitmq.healthcheck.v389.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
RabbitMQ: Healthcheck{#SINGLETON} | It checks whether the RabbitMQ application is running, whether the channels and queues can be listed successfully, and that no alarms are in effect. |
Zabbix agent | web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/healthchecks/node{#SINGLETON}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
RabbitMQ: Node healthcheck failed | For more details see Health Checks. |
last(/RabbitMQ node by Zabbix agent/web.page.get["{$RABBITMQ.API.SCHEME}://{$RABBITMQ.API.USER}:{$RABBITMQ.API.PASSWORD}@{$RABBITMQ.API.HOST}:{$RABBITMQ.API.PORT}/api/healthchecks/node{#SINGLETON}"])=0 |Average |
Name | Description | Type | Key and additional info |
---|---|---|---|
Queues discovery | The metrics for an individual queue. |
Dependent item | rabbitmq.queues.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Get data | The HTTP API endpoint that returns [{#VHOST}][{#QUEUE}] queue metrics |
Dependent item | rabbitmq.get_exchanges["{#VHOST}/{#QUEUE}"] Preprocessing
|
RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages total | The count of total messages in the queue. |
Dependent item | rabbitmq.queue.messages["{#VHOST}/{#QUEUE}"] Preprocessing
|
RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages per second | The count of total messages per second in the queue. |
Dependent item | rabbitmq.queue.messages.rate["{#VHOST}/{#QUEUE}"] Preprocessing
|
RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Consumers | The number of consumers. |
Dependent item | rabbitmq.queue.consumers["{#VHOST}/{#QUEUE}"] Preprocessing
|
RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Memory | The bytes of memory consumed by the Erlang process associated with the queue, including stack, heap and internal structures. |
Dependent item | rabbitmq.queue.memory["{#VHOST}/{#QUEUE}"] Preprocessing
|
RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages ready | The number of messages ready to be delivered to clients. |
Dependent item | rabbitmq.queue.messages_ready["{#VHOST}/{#QUEUE}"] Preprocessing
|
RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages ready per second | The number of messages per second ready to be delivered to clients. |
Dependent item | rabbitmq.queue.messages_ready.rate["{#VHOST}/{#QUEUE}"] Preprocessing
|
RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages unacknowledged | The number of messages delivered to clients but not yet acknowledged. |
Dependent item | rabbitmq.queue.messages_unacknowledged["{#VHOST}/{#QUEUE}"] Preprocessing
|
RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages unacknowledged per second | The number of messages per second delivered to clients but not yet acknowledged. |
Dependent item | rabbitmq.queue.messages_unacknowledged.rate["{#VHOST}/{#QUEUE}"] Preprocessing
|
RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages acknowledged | The number of messages delivered to clients and acknowledged. |
Dependent item | rabbitmq.queue.messages.ack["{#VHOST}/{#QUEUE}"] Preprocessing
|
RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages acknowledged per second | The number of messages (per second) delivered to clients and acknowledged. |
Dependent item | rabbitmq.queue.messages.ack.rate["{#VHOST}/{#QUEUE}"] Preprocessing
|
RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages delivered | The count of messages delivered to consumers in acknowledgement mode. |
Dependent item | rabbitmq.queue.messages.deliver["{#VHOST}/{#QUEUE}"] Preprocessing
|
RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages delivered per second | The count of messages (per second) delivered to consumers in acknowledgement mode. |
Dependent item | rabbitmq.queue.messages.deliver.rate["{#VHOST}/{#QUEUE}"] Preprocessing
|
RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Sum of messages delivered | The sum of messages delivered to consumers: in acknowledgement mode and in no-acknowledgement mode; delivered to consumers in response to basic.get: in acknowledgement mode and in no-acknowledgement mode. |
Dependent item | rabbitmq.queue.messages.deliver_get["{#VHOST}/{#QUEUE}"] Preprocessing
|
RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Sum of messages delivered per second | The rate of delivery per second. The sum of messages delivered (per second) to consumers: in acknowledgement mode and in no-acknowledgement mode; delivered to consumers in response to basic.get: in acknowledgement mode and in no-acknowledgement mode. |
Dependent item | rabbitmq.queue.messages.deliver_get.rate["{#VHOST}/{#QUEUE}"] Preprocessing
|
RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages published | The count of published messages. |
Dependent item | rabbitmq.queue.messages.publish["{#VHOST}/{#QUEUE}"] Preprocessing
|
RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages published per second | The rate of published messages per second. |
Dependent item | rabbitmq.queue.messages.publish.rate["{#VHOST}/{#QUEUE}"] Preprocessing
|
RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages redelivered | The count of the subset of messages in deliver_get which had the redelivered flag set. |
Dependent item | rabbitmq.queue.messages.redeliver["{#VHOST}/{#QUEUE}"] Preprocessing
|
RabbitMQ: Queue [{#VHOST}][{#QUEUE}]: Messages redelivered per second | The rate of messages redelivered per second. |
Dependent item | rabbitmq.queue.messages.redeliver.rate["{#VHOST}/{#QUEUE}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
RabbitMQ: Too many messages in queue [{#VHOST}][{#QUEUE}] | min(/RabbitMQ node by Zabbix agent/rabbitmq.queue.messages["{#VHOST}/{#QUEUE}"],5m)>{$RABBITMQ.MESSAGES.MAX.WARN:"{#QUEUE}"} |Warning |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for the effortless deployment of Proxmox VE monitoring by Zabbix via HTTP and doesn't require any external scripts.
Proxmox VE uses a REST-like API. The concept is described in Resource Oriented Architecture (ROA).
Check the API documentation for details.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Please provide the necessary access levels for both the User and the Token:
Copy the resulting Token ID and Secret into the host macros {$PVE.TOKEN.ID} and {$PVE.TOKEN.SECRET}.
Set the hostname or IP address of the Proxmox VE API host in the {$PVE.URL.HOST} macro. You can also change the API port in the {$PVE.URL.PORT} macro if necessary.
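To confirm that the token works before the template is assigned, the API can be queried directly. Proxmox VE accepts API tokens in an Authorization header of the form PVEAPIToken=USER@REALM!TOKENID=SECRET; the host and token values below are placeholders, and -k skips certificate verification for self-signed setups:
curl -k -H 'Authorization: PVEAPIToken=USER@REALM!TOKENID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx' https://<PVE_HOST>:8006/api2/json/cluster/resources
A JSON list of cluster resources indicates that the token has sufficient access for the template's items.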
Name | Description | Default |
---|---|---|
{$PVE.URL.HOST} | The hostname or IP address of the Proxmox VE API host. |
<SET PVE HOST> |
{$PVE.URL.PORT} | The API uses the HTTPS protocol and the server listens on port 8006 by default. |
8006 |
{$PVE.TOKEN.ID} | API tokens allow stateless access to most parts of the REST API by another system, software or API client. |
USER@REALM!TOKENID |
{$PVE.TOKEN.SECRET} | Secret key. |
xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx |
{$PVE.ROOT.PUSE.MAX.WARN} | Maximum used root space in percentage. |
90 |
{$PVE.MEMORY.PUSE.MAX.WARN} | Maximum used memory in percentage. |
90 |
{$PVE.CPU.PUSE.MAX.WARN} | Maximum used CPU in percentage. |
90 |
{$PVE.SWAP.PUSE.MAX.WARN} | Maximum used swap space in percentage. |
90 |
{$PVE.VM.MEMORY.PUSE.MAX.WARN} | Maximum used memory in percentage. |
90 |
{$PVE.VM.CPU.PUSE.MAX.WARN} | Maximum used CPU in percentage. |
90 |
{$PVE.LXC.MEMORY.PUSE.MAX.WARN} | Maximum used memory in percentage. |
90 |
{$PVE.LXC.CPU.PUSE.MAX.WARN} | Maximum used CPU in percentage. |
90 |
{$PVE.STORAGE.PUSE.MAX.WARN} | Maximum used storage space in percentage. |
90 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Proxmox: Get cluster resources | Resources index. |
HTTP agent | proxmox.cluster.resources Preprocessing
|
Proxmox: Get cluster status | Get cluster status information. |
HTTP agent | proxmox.cluster.status Preprocessing
|
Proxmox: API service status | Get API service status. |
Script | proxmox.api.available Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Proxmox: API service not available | The API service is not available. Check your network and authorization settings. |
last(/Proxmox VE by HTTP/proxmox.api.available) <> 200 |High |
Name | Description | Type | Key and additional info |
---|---|---|---|
Cluster discovery | Dependent item | proxmox.cluster.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Proxmox: Cluster [{#RESOURCE.NAME}]: Quorate | Indicates if there is a majority of nodes online to make decisions. |
Dependent item | proxmox.cluster.quorate[{#RESOURCE.NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Proxmox: Cluster [{#RESOURCE.NAME}] not quorum | Proxmox VE uses a quorum-based technique to provide a consistent state among all cluster nodes. |
last(/Proxmox VE by HTTP/proxmox.cluster.quorate[{#RESOURCE.NAME}]) <> 1 |High |
Name | Description | Type | Key and additional info |
---|---|---|---|
Node discovery | Dependent item | proxmox.node.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Proxmox: Node [{#NODE.NAME}]: Status | Indicates if the node is online or offline. |
Dependent item | proxmox.node.online[{#NODE.NAME}] Preprocessing
|
Proxmox: Node [{#NODE.NAME}]: Status | Read node status. |
HTTP agent | proxmox.node.status[{#NODE.NAME}] |
Proxmox: Node [{#NODE.NAME}]: RRD statistics | Read node RRD statistics. |
HTTP agent | proxmox.node.rrd[{#NODE.NAME}] Preprocessing
|
Proxmox: Node [{#NODE.NAME}]: Time | Read server time and time zone settings. |
HTTP agent | proxmox.node.time[{#NODE.NAME}] |
Proxmox: Node [{#NODE.NAME}]: Uptime | The system uptime expressed in the following format: "N days, hh:mm:ss". |
Dependent item | proxmox.node.uptime[{#NODE.NAME}] Preprocessing
|
Proxmox: Node [{#NODE.NAME}]: PVE version | PVE manager version. |
Dependent item | proxmox.node.pveversion[{#NODE.NAME}] Preprocessing
|
Proxmox: Node [{#NODE.NAME}]: Kernel version | Kernel version info. |
Dependent item | proxmox.node.kernelversion[{#NODE.NAME}] Preprocessing
|
Proxmox: Node [{#NODE.NAME}]: Root filesystem, used | Root filesystem usage. |
Dependent item | proxmox.node.rootused[{#NODE.NAME}] Preprocessing
|
Proxmox: Node [{#NODE.NAME}]: Root filesystem, total | Root filesystem total. |
Dependent item | proxmox.node.roottotal[{#NODE.NAME}] Preprocessing
|
Proxmox: Node [{#NODE.NAME}]: Memory, used | Memory usage. |
Dependent item | proxmox.node.memused[{#NODE.NAME}] Preprocessing
|
Proxmox: Node [{#NODE.NAME}]: Memory, total | Memory total. |
Dependent item | proxmox.node.memtotal[{#NODE.NAME}] Preprocessing
|
Proxmox: Node [{#NODE.NAME}]: CPU, usage | CPU usage. |
Dependent item | proxmox.node.cpu[{#NODE.NAME}] Preprocessing
|
Proxmox: Node [{#NODE.NAME}]: Outgoing data, rate | Network usage. |
Dependent item | proxmox.node.netout[{#NODE.NAME}] Preprocessing
|
Proxmox: Node [{#NODE.NAME}]: Incoming data, rate | Network usage. |
Dependent item | proxmox.node.netin[{#NODE.NAME}] Preprocessing
|
Proxmox: Node [{#NODE.NAME}]: CPU, loadavg | CPU average load. |
Dependent item | proxmox.node.loadavg[{#NODE.NAME}] Preprocessing
|
Proxmox: Node [{#NODE.NAME}]: CPU, iowait | CPU iowait time. |
Dependent item | proxmox.node.iowait[{#NODE.NAME}] Preprocessing
|
Proxmox: Node [{#NODE.NAME}]: Swap filesystem, total | Swap total. |
Dependent item | proxmox.node.swaptotal[{#NODE.NAME}] Preprocessing
|
Proxmox: Node [{#NODE.NAME}]: Swap filesystem, used | Swap used. |
Dependent item | proxmox.node.swapused[{#NODE.NAME}] Preprocessing
|
Proxmox: Node [{#NODE.NAME}]: Time zone | Time zone. |
Dependent item | proxmox.node.timezone[{#NODE.NAME}] Preprocessing
|
Proxmox: Node [{#NODE.NAME}]: Localtime | Seconds since 1970-01-01 00:00:00 (local time). |
Dependent item | proxmox.node.localtime[{#NODE.NAME}] Preprocessing
|
Proxmox: Node [{#NODE.NAME}]: Time | Seconds since 1970-01-01 00:00:00 UTC. |
Dependent item | proxmox.node.utctime[{#NODE.NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Proxmox: Node [{#NODE.NAME}] offline | Node offline. |
last(/Proxmox VE by HTTP/proxmox.node.online[{#NODE.NAME}]) <> 1 |High |
||
Proxmox: Node [{#NODE.NAME}]: has been restarted | Uptime is less than 10 minutes. |
last(/Proxmox VE by HTTP/proxmox.node.uptime[{#NODE.NAME}])<10m |Info |
Manual close: Yes Depends on:
|
|
Proxmox: Node [{#NODE.NAME}]: PVE manager has changed | The PVE manager version has changed. Acknowledge to close the problem manually. |
last(/Proxmox VE by HTTP/proxmox.node.pveversion[{#NODE.NAME}],#1)<>last(/Proxmox VE by HTTP/proxmox.node.pveversion[{#NODE.NAME}],#2) and length(last(/Proxmox VE by HTTP/proxmox.node.pveversion[{#NODE.NAME}]))>0 |Info |
Manual close: Yes | |
Proxmox: Node [{#NODE.NAME}]: Kernel version has changed | The kernel version has changed. Acknowledge to close the problem manually. |
last(/Proxmox VE by HTTP/proxmox.node.kernelversion[{#NODE.NAME}],#1)<>last(/Proxmox VE by HTTP/proxmox.node.kernelversion[{#NODE.NAME}],#2) and length(last(/Proxmox VE by HTTP/proxmox.node.kernelversion[{#NODE.NAME}]))>0 |Info |
Manual close: Yes | |
Proxmox: Node [{#NODE.NAME}] high root filesystem space usage | Root filesystem space usage. |
min(/Proxmox VE by HTTP/proxmox.node.rootused[{#NODE.NAME}],5m) / last(/Proxmox VE by HTTP/proxmox.node.roottotal[{#NODE.NAME}]) * 100 >{$PVE.ROOT.PUSE.MAX.WARN:"{#NODE.NAME}"} |Warning |
||
Proxmox: Node [{#NODE.NAME}] high memory usage | Memory usage. |
min(/Proxmox VE by HTTP/proxmox.node.memused[{#NODE.NAME}],5m) / last(/Proxmox VE by HTTP/proxmox.node.memtotal[{#NODE.NAME}]) * 100 >{$PVE.MEMORY.PUSE.MAX.WARN:"{#NODE.NAME}"} |Warning |
||
Proxmox: Node [{#NODE.NAME}] high CPU usage | CPU usage. |
min(/Proxmox VE by HTTP/proxmox.node.cpu[{#NODE.NAME}],5m) > {$PVE.CPU.PUSE.MAX.WARN:"{#NODE.NAME}"} |Warning |
||
Proxmox: Node [{#NODE.NAME}] high swap space usage | If there is no swap configured, this trigger is ignored. |
min(/Proxmox VE by HTTP/proxmox.node.swapused[{#NODE.NAME}],5m) / last(/Proxmox VE by HTTP/proxmox.node.swaptotal[{#NODE.NAME}]) * 100 > {$PVE.SWAP.PUSE.MAX.WARN:"{#NODE.NAME}"} and last(/Proxmox VE by HTTP/proxmox.node.swaptotal[{#NODE.NAME}]) > 0 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Storage discovery | Dependent item | proxmox.storage.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Proxmox: Storage [{#NODE.NAME}/{#STORAGE.NAME}]: Type | More specific type, if available. |
Dependent item | proxmox.node.plugintype[{#NODE.NAME},{#STORAGE.NAME}] Preprocessing
|
Proxmox: Storage [{#NODE.NAME}/{#STORAGE.NAME}]: Size | Storage size in bytes. |
Dependent item | proxmox.node.maxdisk[{#NODE.NAME},{#STORAGE.NAME}] Preprocessing
|
Proxmox: Storage [{#NODE.NAME}/{#STORAGE.NAME}]: Content | Allowed storage content types. |
Dependent item | proxmox.node.content[{#NODE.NAME},{#STORAGE.NAME}] Preprocessing
|
Proxmox: Storage [{#NODE.NAME}/{#STORAGE.NAME}]: Used | Used disk space in bytes. |
Dependent item | proxmox.node.disk[{#NODE.NAME},{#STORAGE.NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Proxmox: Storage [{#NODE.NAME}/{#STORAGE.NAME}] high filesystem space usage | Storage space usage is high. |
min(/Proxmox VE by HTTP/proxmox.node.disk[{#NODE.NAME},{#STORAGE.NAME}],5m) / last(/Proxmox VE by HTTP/proxmox.node.maxdisk[{#NODE.NAME},{#STORAGE.NAME}]) * 100 >{$PVE.STORAGE.PUSE.MAX.WARN:"{#NODE.NAME}/{#STORAGE.NAME}"} |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
QEMU discovery | Dependent item | proxmox.qemu.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Proxmox: VM [{#NODE.NAME}/{#QEMU.NAME} ({#QEMU.ID})]: Disk write, rate | Disk write. |
Dependent item | proxmox.qemu.diskwrite[{#QEMU.ID}] Preprocessing
|
Proxmox: VM [{#NODE.NAME}/{#QEMU.NAME} ({#QEMU.ID})]: Disk read, rate | Disk read. |
Dependent item | proxmox.qemu.diskread[{#QEMU.ID}] Preprocessing
|
Proxmox: VM [{#NODE.NAME}/{#QEMU.NAME} ({#QEMU.ID})]: Memory usage | Used memory in bytes. |
Dependent item | proxmox.qemu.mem[{#QEMU.ID}] Preprocessing
|
Proxmox: VM [{#NODE.NAME}/{#QEMU.NAME} ({#QEMU.ID})]: Memory total | The total memory expressed in bytes. |
Dependent item | proxmox.qemu.maxmem[{#QEMU.ID}] Preprocessing
|
Proxmox: VM [{#NODE.NAME}/{#QEMU.NAME} ({#QEMU.ID})]: Incoming data, rate | Incoming data rate. |
Dependent item | proxmox.qemu.netin[{#QEMU.ID}] Preprocessing
|
Proxmox: VM [{#NODE.NAME}/{#QEMU.NAME} ({#QEMU.ID})]: Outgoing data, rate | Outgoing data rate. |
Dependent item | proxmox.qemu.netout[{#QEMU.ID}] Preprocessing
|
Proxmox: VM [{#NODE.NAME}/{#QEMU.NAME} ({#QEMU.ID})]: CPU usage | CPU load. |
Dependent item | proxmox.qemu.cpu[{#QEMU.ID}] Preprocessing
|
Proxmox: VM [{#NODE.NAME}/{#QEMU.NAME}]: Get data | Get VM status data. |
HTTP agent | proxmox.qemu.get.data[{#QEMU.ID}] |
Proxmox: VM [{#NODE.NAME}/{#QEMU.NAME} ({#QEMU.ID})]: Uptime | The system uptime expressed in the following format: "N days, hh:mm:ss". |
Dependent item | proxmox.qemu.uptime[{#QEMU.ID}] Preprocessing
|
Proxmox: VM [{#NODE.NAME}/{#QEMU.NAME} ({#QEMU.ID})]: Status | Status of Virtual Machine. |
Dependent item | proxmox.qemu.vmstatus[{#QEMU.ID}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Proxmox: VM [{#NODE.NAME}/{#QEMU.NAME} ({#QEMU.ID})] high memory usage | Memory usage. |
min(/Proxmox VE by HTTP/proxmox.qemu.mem[{#QEMU.ID}],5m) / last(/Proxmox VE by HTTP/proxmox.qemu.maxmem[{#QEMU.ID}]) * 100 >{$PVE.VM.MEMORY.PUSE.MAX.WARN:"{#QEMU.ID}"} |Warning |
||
Proxmox: VM [{#NODE.NAME}/{#QEMU.NAME} ({#QEMU.ID})] high CPU usage | CPU usage. |
min(/Proxmox VE by HTTP/proxmox.qemu.cpu[{#QEMU.ID}],5m) > {$PVE.VM.CPU.PUSE.MAX.WARN:"{#QEMU.ID}"} |Warning |
||
Proxmox: VM [{#NODE.NAME}/{#QEMU.NAME}]: has been restarted | Uptime is less than 10 minutes. |
last(/Proxmox VE by HTTP/proxmox.qemu.uptime[{#QEMU.ID}])<10m |Info |
Manual close: Yes Depends on:
|
|
Proxmox: VM [{#NODE.NAME}/{#QEMU.NAME} ({#QEMU.ID})]: Not running | VM state is not "running". |
last(/Proxmox VE by HTTP/proxmox.qemu.vmstatus[{#QEMU.ID}])<>"running" |Average |
Name | Description | Type | Key and additional info |
---|---|---|---|
LXC discovery | Dependent item | proxmox.lxc.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Proxmox: LXC [{#NODE.NAME}/{#LXC.NAME}]: Get data | Get LXC status data. |
HTTP agent | proxmox.lxc.get.data[{#LXC.ID}] |
Proxmox: LXC [{#NODE.NAME}/{#LXC.NAME} ({#LXC.ID})]: Uptime | The system uptime expressed in the following format: "N days, hh:mm:ss". |
Dependent item | proxmox.lxc.uptime[{#LXC.ID}] Preprocessing
|
Proxmox: LXC [{#NODE.NAME}/{#LXC.NAME} ({#LXC.ID})]: Status | Status of LXC container. |
Dependent item | proxmox.lxc.vmstatus[{#LXC.ID}] Preprocessing
|
Proxmox: LXC [{#NODE.NAME}/{#LXC.NAME} ({#LXC.ID})]: Disk write, rate | Disk write. |
Dependent item | proxmox.lxc.diskwrite[{#LXC.ID}] Preprocessing
|
Proxmox: LXC [{#NODE.NAME}/{#LXC.NAME} ({#LXC.ID})]: Disk read, rate | Disk read. |
Dependent item | proxmox.lxc.diskread[{#LXC.ID}] Preprocessing
|
Proxmox: LXC [{#NODE.NAME}/{#LXC.NAME} ({#LXC.ID})]: Memory usage | Used memory in bytes. |
Dependent item | proxmox.lxc.mem[{#LXC.ID}] Preprocessing
|
Proxmox: LXC [{#NODE.NAME}/{#LXC.NAME} ({#LXC.ID})]: Memory total | The total memory expressed in bytes. |
Dependent item | proxmox.lxc.maxmem[{#LXC.ID}] Preprocessing
|
Proxmox: LXC [{#NODE.NAME}/{#LXC.NAME} ({#LXC.ID})]: Incoming data, rate | Incoming data rate. |
Dependent item | proxmox.lxc.netin[{#LXC.ID}] Preprocessing
|
Proxmox: LXC [{#NODE.NAME}/{#LXC.NAME} ({#LXC.ID})]: Outgoing data, rate | Outgoing data rate. |
Dependent item | proxmox.lxc.netout[{#LXC.ID}] Preprocessing
|
Proxmox: LXC [{#NODE.NAME}/{#LXC.NAME} ({#LXC.ID})]: CPU usage | CPU load. |
Dependent item | proxmox.lxc.cpu[{#LXC.ID}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Proxmox: LXC [{#NODE.NAME}/{#LXC.NAME}]: has been restarted | Uptime is less than 10 minutes. |
last(/Proxmox VE by HTTP/proxmox.lxc.uptime[{#LXC.ID}])<10m |Info |
Manual close: Yes Depends on:
|
|
Proxmox: LXC [{#NODE.NAME}/{#LXC.NAME} ({#LXC.ID})]: Not running | LXC state is not "running". |
last(/Proxmox VE by HTTP/proxmox.lxc.vmstatus[{#LXC.ID}])<>"running" |Average |
||
Proxmox: LXC [{#NODE.NAME}/{#LXC.NAME} ({#LXC.ID})] high memory usage | Memory usage. |
min(/Proxmox VE by HTTP/proxmox.lxc.mem[{#LXC.ID}],5m) / last(/Proxmox VE by HTTP/proxmox.lxc.maxmem[{#LXC.ID}]) * 100 >{$PVE.LXC.MEMORY.PUSE.MAX.WARN:"{#LXC.ID}"} |Warning |
||
Proxmox: LXC [{#NODE.NAME}/{#LXC.NAME} ({#LXC.ID})] high CPU usage | CPU usage. |
min(/Proxmox VE by HTTP/proxmox.lxc.cpu[{#LXC.ID}],5m) > {$PVE.LXC.CPU.PUSE.MAX.WARN:"{#LXC.ID}"} |Warning |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed to monitor processes with Zabbix; it works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection. For example, by specifying "zabbix" as the macro value, you can monitor all Zabbix processes.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Install and setup Zabbix agent.
Custom processes set in macros:
Name | Description | Default |
---|---|---|
{$PROC.NAME.MATCHES} | This macro is used in the discovery of processes. It can be overridden on a host-level or on a linked template-level. |
<CHANGE VALUE> |
{$PROC.NAME.NOT_MATCHES} | This macro is used in the discovery of processes. It can be overridden on a host-level or on a linked template-level. |
<CHANGE VALUE> |
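For example, to discover only Zabbix processes as mentioned above, the macros could be set like this (illustrative values; ^$ matches nothing and therefore excludes nothing):
{$PROC.NAME.MATCHES} = zabbix
{$PROC.NAME.NOT_MATCHES} = ^$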
Name | Description | Type | Key and additional info |
---|---|---|---|
OS: Get process summary | The summary of data metrics for all processes. |
Zabbix agent | proc.get[,,,summary] |
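If needed, the master item can be checked manually from the Zabbix server or proxy with zabbix_get; a quick sketch (the agent address is illustrative):
$ zabbix_get -s 127.0.0.1 -k 'proc.get[,,,summary]'
The key returns a JSON array with one summary object per process name, which the discovery rule and the dependent items then parse.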
Name | Description | Type | Key and additional info |
---|---|---|---|
Processes discovery | Discovery of OS summary processes. |
Dependent item | custom.proc.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Process [{#NAME}]: Get data | Summary metrics collected for the process {#NAME}. |
Dependent item | custom.proc.get[{#NAME}] Preprocessing
|
Process [{#NAME}]: Memory usage (rss) | The summary of Resident Set Size (RSS) memory used by the process {#NAME} in bytes. |
Dependent item | custom.proc.rss[{#NAME}] Preprocessing
|
Process [{#NAME}]: Memory usage (vsize) | The summary of virtual memory used by process {#NAME} in bytes. |
Dependent item | custom.proc.vmem[{#NAME}] Preprocessing
|
Process [{#NAME}]: Memory usage, % | The percentage of real memory used by the process {#NAME}. |
Dependent item | custom.proc.pmem[{#NAME}] Preprocessing
|
Process [{#NAME}]: Number of running processes | The number of running processes {#NAME}. |
Dependent item | custom.proc.num[{#NAME}] Preprocessing
|
Process [{#NAME}]: Number of threads | The number of threads {#NAME}. |
Dependent item | custom.proc.thread[{#NAME}] Preprocessing
|
Process [{#NAME}]: Number of page faults | The number of page faults {#NAME}. |
Dependent item | custom.proc.page[{#NAME}] Preprocessing
|
Process [{#NAME}]: Size of locked memory | The size of locked memory {#NAME}. |
Dependent item | custom.proc.mem.locked[{#NAME}] Preprocessing
|
Process [{#NAME}]: Swap space used | The swap space used by {#NAME}. |
Dependent item | custom.proc.swap[{#NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Process [{#NAME}]: Process is not running | last(/OS processes by Zabbix agent/custom.proc.num[{#NAME}])=0 |High |
Manual close: Yes |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is developed to monitor the FastCGI Process Manager (PHP-FPM) by Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The template PHP-FPM by Zabbix agent
- collects metrics by polling PHP-FPM status-page with HTTP agent remotely.
Note that this solution supports HTTPS and redirects.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Note that depending on your OS distribution, the PHP-FPM executable/service name can vary. RHEL-like distributions usually name both process and service as php-fpm
, while for Debian/Ubuntu based distributions it may include the version, for example: executable name - php-fpm8.2
, systemd service name - php8.2-fpm
. Adjust the following instructions accordingly if needed.
Open the PHP-FPM configuration file and enable the status page as shown.
pm.status_path = /status
ping.path = /ping
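These directives belong in the PHP-FPM pool configuration file. The paths below are typical defaults and are given here only as assumed examples; they may differ on your system:
/etc/php-fpm.d/www.conf              (RHEL-like distributions)
/etc/php/8.2/fpm/pool.d/www.conf     (Debian/Ubuntu; adjust the version)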
Validate the syntax to ensure it is correct before you reload the service. Replace the <version>
in the command if needed.
$ php-fpm -t
or
$ php-fpm<version> -t
Reload the php-fpm
service to make the change active. Replace the <version>
in the command if needed.
$ systemctl reload php-fpm
or
$ systemctl reload php<version>-fpm
Next, edit the configuration of your web server.
If you use Nginx, edit the configuration file of your Nginx server block (virtual host) and add the location block below it.
# Enable php-fpm status page
location ~ ^/(status|ping)$ {
## disable access logging for request if you prefer
access_log off;
## Only allow trusted IPs for security, deny everyone else
# allow 127.0.0.1;
# allow 1.2.3.4; # your IP here
# deny all;
fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
fastcgi_index index.php;
include fastcgi_params;
## Now the port or socket of the php-fpm pool we want the status of
fastcgi_pass 127.0.0.1:9000;
# fastcgi_pass unix:/run/php-fpm/your_socket.sock;
}
If you use Apache, edit the configuration file of the virtual host and add the following location blocks.
<LocationMatch "/status">
Require ip 127.0.0.1
# Require ip 1.2.3.4 # Your IP here
# Adjust the path to the socket if needed
ProxyPass "unix:/run/php-fpm/www.sock|fcgi://localhost/status"
</LocationMatch>
<LocationMatch "/ping">
Require ip 127.0.0.1
# Require ip 1.2.3.4 # Your IP here
# Adjust the path to the socket if needed
ProxyPass "unix:/run/php-fpm/www.sock|fcgi://localhost/ping"
</LocationMatch>
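Note that the ProxyPass directives above rely on mod_proxy and mod_proxy_fcgi. On Debian/Ubuntu-based systems they can usually be enabled as shown below; on RHEL-like systems they are typically loaded by default.
$ a2enmod proxy proxy_fcgi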
$ nginx -t
or
$ httpd -t
or
$ apachectl configtest
Reload the web server configuration. The command may vary depending on the OS distribution and web server.
$ systemctl reload nginx
or
$ systemctl reload httpd
or
$ systemctl reload apache2
Verify that the pages are available with these commands.
curl -L 127.0.0.1/status
curl -L 127.0.0.1/ping
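If everything is configured correctly, the ping page returns the value expected by {$PHP_FPM.PING.REPLY} and the status page returns the pool statistics. The output below is only an approximate illustration:
$ curl -L 127.0.0.1/ping
pong
$ curl -L "127.0.0.1/status?json"
{"pool":"www","process manager":"dynamic","listen queue":0,"idle processes":4,"active processes":1,"accepted conn":128, ...}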
If you use another location of the status/ping pages, don't forget to change the {$PHP_FPM.STATUS.PAGE}/{$PHP_FPM.PING.PAGE}
macro.
If you use another web server port or scheme for the location of the PHP-FPM status/ping pages, don't forget to change the macros {$PHP_FPM.SCHEME}
and {$PHP_FPM.PORT}
.
Name | Description | Default |
---|---|---|
{$PHP_FPM.PORT} | The port of the PHP-FPM status host or container. |
80 |
{$PHP_FPM.SCHEME} | Request scheme which may be http or https |
http |
{$PHP_FPM.HOST} | The hostname or IP address of the PHP-FPM status for a host or container. |
localhost |
{$PHP_FPM.STATUS.PAGE} | The path of the PHP-FPM status page. |
status |
{$PHP_FPM.PING.PAGE} | The path of the PHP-FPM ping page. |
ping |
{$PHP_FPM.PING.REPLY} | The expected reply to the ping. |
pong |
{$PHP_FPM.QUEUE.WARN.MAX} | The maximum percent of the PHP-FPM queue usage for a trigger expression. |
80 |
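Together these macros presumably form the request URLs used by the HTTP agent items, roughly in the following shape (a sketch, not the literal item configuration):
{$PHP_FPM.SCHEME}://{$PHP_FPM.HOST}:{$PHP_FPM.PORT}/{$PHP_FPM.STATUS.PAGE}?json
{$PHP_FPM.SCHEME}://{$PHP_FPM.HOST}:{$PHP_FPM.PORT}/{$PHP_FPM.PING.PAGE}
With the defaults this resolves to http://localhost:80/status?json and http://localhost:80/ping.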
Name | Description | Type | Key and additional info | |
---|---|---|---|---|
PHP-FPM: Get ping page | HTTP agent | php-fpm.get_ping | ||
PHP-FPM: Get status page | HTTP agent | php-fpm.get_status | ||
PHP-FPM: Ping | Dependent item | php-fpm.ping Preprocessing
|
\r?\n) 1 ⛔️ Custom on fail: Set value to: 0 |
|
PHP-FPM: Processes, active | The total number of active processes. |
Dependent item | php-fpm.processes_active Preprocessing
|
|
PHP-FPM: Version | The current version of PHP. It is taken from the HTTP header "X-Powered-By"; this may not work if you have changed the default HTTP headers. |
Dependent item | php-fpm.version Preprocessing
|
|
PHP-FPM: Pool name | The name of the current pool. |
Dependent item | php-fpm.name Preprocessing
|
|
PHP-FPM: Uptime | Indicates how long this pool has been running. |
Dependent item | php-fpm.uptime Preprocessing
|
|
PHP-FPM: Start time | The time when this pool was started. |
Dependent item | php-fpm.start_time Preprocessing
|
|
PHP-FPM: Processes, total | The total number of server processes currently running. |
Dependent item | php-fpm.processes_total Preprocessing
|
|
PHP-FPM: Processes, idle | The total number of idle processes. |
Dependent item | php-fpm.processes_idle Preprocessing
|
|
PHP-FPM: Process manager | The method used by the process manager to control the number of child processes for this pool. |
Dependent item | php-fpm.process_manager Preprocessing
|
|
PHP-FPM: Processes, max active | The highest value of "active processes" since the PHP-FPM server was started. |
Dependent item | php-fpm.processes_max_active Preprocessing
|
|
PHP-FPM: Accepted connections per second | The number of accepted requests per second. |
Dependent item | php-fpm.conn_accepted.rate Preprocessing
|
|
PHP-FPM: Slow requests | The number of requests that have exceeded your |
Dependent item | php-fpm.slow_requests Preprocessing
|
|
PHP-FPM: Listen queue | The current number of connections that have been initiated but not yet accepted. |
Dependent item | php-fpm.listen_queue Preprocessing
|
|
PHP-FPM: Listen queue, max | The maximum number of requests in the queue of pending connections since this FPM pool was started. |
Dependent item | php-fpm.listen_queue_max Preprocessing
|
|
PHP-FPM: Listen queue, len | The size of the socket queue of pending connections. |
Dependent item | php-fpm.listen_queue_len Preprocessing
|
|
PHP-FPM: Queue usage | The utilization of the queue. |
Calculated | php-fpm.listen_queue_usage | |
PHP-FPM: Max children reached | The number of times that |
Dependent item | php-fpm.max_children Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
PHP-FPM: Service is down | last(/PHP-FPM by HTTP/php-fpm.ping)=0 or nodata(/PHP-FPM by HTTP/php-fpm.ping,3m)=1 |High |
Manual close: Yes | ||
PHP-FPM: Version has changed | The PHP-FPM version has changed. Acknowledge to close the problem manually. |
last(/PHP-FPM by HTTP/php-fpm.version,#1)<>last(/PHP-FPM by HTTP/php-fpm.version,#2) and length(last(/PHP-FPM by HTTP/php-fpm.version))>0 |Info |
Manual close: Yes | |
PHP-FPM: Failed to fetch info data | Zabbix has not received any data for items for the last 30 minutes. |
nodata(/PHP-FPM by HTTP/php-fpm.uptime,30m)=1 |Info |
Manual close: Yes Depends on:
|
|
PHP-FPM: Pool has been restarted | Uptime is less than 10 minutes. |
last(/PHP-FPM by HTTP/php-fpm.uptime)<10m |Info |
Manual close: Yes | |
PHP-FPM: Manager changed | The PHP-FPM manager has changed. Acknowledge to close the problem manually. |
last(/PHP-FPM by HTTP/php-fpm.process_manager,#1)<>last(/PHP-FPM by HTTP/php-fpm.process_manager,#2) |Info |
Manual close: Yes | |
PHP-FPM: Detected slow requests | The PHP-FPM has detected a slow request. |
min(/PHP-FPM by HTTP/php-fpm.slow_requests,#3)>0 |Warning |
||
PHP-FPM: Queue utilization is high | The queue for this pool has reached |
min(/PHP-FPM by HTTP/php-fpm.listen_queue_usage,15m) > {$PHP_FPM.QUEUE.WARN.MAX} |Warning |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is developed to monitor the FastCGI Process Manager (PHP-FPM) by Zabbix agent that works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The template PHP-FPM by Zabbix agent
- collects metrics by polling the PHP-FPM status-page locally with Zabbix agent.
Note that this template doesn't support HTTPS and redirects (limitations of web.page.get
).
It also uses Zabbix agent to collect php-fpm
Linux process statistics, such as CPU usage, memory usage, and whether the process is running or not.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Note that depending on your OS distribution, the PHP-FPM executable/service name can vary. RHEL-like distributions usually name both process and service as php-fpm
, while for Debian/Ubuntu based distributions it may include the version, for example: executable name - php-fpm8.2
, systemd service name - php8.2-fpm
. Adjust the following instructions accordingly if needed.
Open the PHP-FPM configuration file and enable the status page as shown.
pm.status_path = /status
ping.path = /ping
Validate the syntax to ensure it is correct before you reload the service. Replace the <version>
in the command if needed.
$ php-fpm -t
or
$ php-fpm<version> -t
Reload the php-fpm
service to make the change active. Replace the <version>
in the command if needed.
$ systemctl reload php-fpm
or
$ systemctl reload php<version>-fpm
Next, edit the configuration of your web server.
If you use Nginx, edit the configuration file of your Nginx server block (virtual host) and add the location block below it.
# Enable php-fpm status page
location ~ ^/(status|ping)$ {
## disable access logging for request if you prefer
access_log off;
## Only allow trusted IPs for security, deny everyone else
# allow 127.0.0.1;
# allow 1.2.3.4; # your IP here
# deny all;
fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
fastcgi_index index.php;
include fastcgi_params;
## Now the port or socket of the php-fpm pool we want the status of
fastcgi_pass 127.0.0.1:9000;
# fastcgi_pass unix:/run/php-fpm/your_socket.sock;
}
If you use Apache, edit the configuration file of the virtual host and add the following location blocks.
<LocationMatch "/status">
Require ip 127.0.0.1
# Require ip 1.2.3.4 # Your IP here
# Adjust the path to the socket if needed
ProxyPass "unix:/run/php-fpm/www.sock|fcgi://localhost/status"
</LocationMatch>
<LocationMatch "/ping">
Require ip 127.0.0.1
# Require ip 1.2.3.4 # Your IP here
# Adjust the path to the socket if needed
ProxyPass "unix:/run/php-fpm/www.sock|fcgi://localhost/ping"
</LocationMatch>
$ nginx -t
or
$ httpd -t
or
$ apachectl configtest
Reload the web server configuration. The command may vary depending on the OS distribution and web server.
$ systemctl reload nginx
or
$ systemctl reload httpd
or
$ systemctl reload apache2
Verify that the pages are available with these commands.
curl -L 127.0.0.1/status
curl -L 127.0.0.1/ping
Depending on your OS distribution, the PHP-FPM process name may vary as well. Please check the actual name in the "Name" line of the /proc/<pid>/status file (https://www.zabbix.com/documentation/6.4/manual/appendix/items/proc_mem_num_notes) and change the {$PHP_FPM.PROCESS.NAME.PARAMETER} macro if needed.
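A quick way to check the actual process name is to read it from the status file of a running PHP-FPM process, for example:
$ grep '^Name:' /proc/"$(pgrep -o php-fpm)"/status
Name:   php-fpm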
If you use another location of the status/ping pages, don't forget to change the {$PHP_FPM.STATUS.PAGE}/{$PHP_FPM.PING.PAGE}
macro.
If you use another web server port for the location of the PHP-FPM status/ping pages, don't forget to change the macro {$PHP_FPM.PORT}
.
Name | Description | Default |
---|---|---|
{$PHP_FPM.PORT} | The port of the PHP-FPM status host or container. |
80 |
{$PHP_FPM.HOST} | The hostname or IP address of the PHP-FPM status for a host or container. |
localhost |
{$PHP_FPM.STATUS.PAGE} | The path of the PHP-FPM status page. |
status |
{$PHP_FPM.PING.PAGE} | The path of the PHP-FPM ping page. |
ping |
{$PHP_FPM.PING.REPLY} | The expected reply to the ping. |
pong |
{$PHP_FPM.QUEUE.WARN.MAX} | The maximum percent of the PHP-FPM queue usage for a trigger expression. |
80 |
{$PHP_FPM.PROCESS_NAME} | The process name filter for the PHP-FPM process discovery. May vary depending on your OS distribution. |
php-fpm |
{$PHP_FPM.PROCESS.NAME.PARAMETER} | The process name of the PHP-FPM used in the item key |
Name | Description | Type | Key and additional info | |
---|---|---|---|---|
PHP-FPM: Get processes summary | The aggregated data of summary metrics for all processes. |
Zabbix agent | proc.get[{$PHP_FPM.PROCESS.NAME.PARAMETER},,,summary] | |
PHP-FPM: php-fpm_ping | Zabbix agent | web.page.get["{$PHP_FPM.HOST}","{$PHP_FPM.PING.PAGE}","{$PHP_FPM.PORT}"] | ||
PHP-FPM: Get status page | Zabbix agent | web.page.get["{$PHP_FPM.HOST}","{$PHP_FPM.STATUS.PAGE}?json","{$PHP_FPM.PORT}"] Preprocessing
|
||
PHP-FPM: Ping | Dependent item | php-fpm.ping Preprocessing
|
\r?\n) 1 ⛔️ Custom on fail: Set value to: 0 |
|
PHP-FPM: Processes, active | The total number of active processes. |
Dependent item | php-fpm.processes_active Preprocessing
|
|
PHP-FPM: Version | The current version of PHP. It is taken from the HTTP header "X-Powered-By"; this may not work if you have changed the default HTTP headers. |
Dependent item | php-fpm.version Preprocessing
|
|
PHP-FPM: Pool name | The name of the current pool. |
Dependent item | php-fpm.name Preprocessing
|
|
PHP-FPM: Uptime | Indicates how long this pool has been running. |
Dependent item | php-fpm.uptime Preprocessing
|
|
PHP-FPM: Start time | The time when this pool was started. |
Dependent item | php-fpm.start_time Preprocessing
|
|
PHP-FPM: Processes, total | The total number of server processes currently running. |
Dependent item | php-fpm.processes_total Preprocessing
|
|
PHP-FPM: Processes, idle | The total number of idle processes. |
Dependent item | php-fpm.processes_idle Preprocessing
|
|
PHP-FPM: Queue usage | The utilization of the queue. |
Calculated | php-fpm.listen_queue_usage | |
PHP-FPM: Process manager | The method used by the process manager to control the number of child processes for this pool. |
Dependent item | php-fpm.process_manager Preprocessing
|
|
PHP-FPM: Processes, max active | The highest value of "active processes" since the PHP-FPM server was started. |
Dependent item | php-fpm.processes_max_active Preprocessing
|
|
PHP-FPM: Accepted connections per second | The number of accepted requests per second. |
Dependent item | php-fpm.conn_accepted.rate Preprocessing
|
|
PHP-FPM: Slow requests | The number of requests that have exceeded your |
Dependent item | php-fpm.slow_requests Preprocessing
|
|
PHP-FPM: Listen queue | The current number of connections that have been initiated but not yet accepted. |
Dependent item | php-fpm.listen_queue Preprocessing
|
|
PHP-FPM: Listen queue, max | The maximum number of requests in the queue of pending connections since this FPM pool was started. |
Dependent item | php-fpm.listen_queue_max Preprocessing
|
|
PHP-FPM: Listen queue, len | The size of the socket queue of pending connections. |
Dependent item | php-fpm.listen_queue_len Preprocessing
|
|
PHP-FPM: Max children reached | The number of times that |
Dependent item | php-fpm.max_children Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
PHP-FPM: Version has changed | The PHP-FPM version has changed. Acknowledge to close the problem manually. |
last(/PHP-FPM by Zabbix agent/php-fpm.version,#1)<>last(/PHP-FPM by Zabbix agent/php-fpm.version,#2) and length(last(/PHP-FPM by Zabbix agent/php-fpm.version))>0 |Info |
Manual close: Yes | |
PHP-FPM: Pool has been restarted | Uptime is less than 10 minutes. |
last(/PHP-FPM by Zabbix agent/php-fpm.uptime)<10m |Info |
Manual close: Yes | |
PHP-FPM: Queue utilization is high | The queue for this pool has reached |
min(/PHP-FPM by Zabbix agent/php-fpm.listen_queue_usage,15m) > {$PHP_FPM.QUEUE.WARN.MAX} |Warning |
||
PHP-FPM: Manager changed | The PHP-FPM manager has changed. Acknowledge to close the problem manually. |
last(/PHP-FPM by Zabbix agent/php-fpm.process_manager,#1)<>last(/PHP-FPM by Zabbix agent/php-fpm.process_manager,#2) |Info |
Manual close: Yes | |
PHP-FPM: Detected slow requests | The PHP-FPM has detected a slow request. |
min(/PHP-FPM by Zabbix agent/php-fpm.slow_requests,#3)>0 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
PHP-FPM process discovery | The discovery of the PHP-FPM summary processes. |
Dependent item | php-fpm.proc.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
PHP-FPM: Get process data | The summary metrics aggregated by a process |
Dependent item | php-fpm.proc.get[{#PHP_FPM.NAME}] Preprocessing
|
PHP-FPM: Memory usage (rss) | The summary of resident set size memory used by a process |
Dependent item | php-fpm.proc.rss[{#PHP_FPM.NAME}] Preprocessing
|
PHP-FPM: Memory usage (vsize) | The summary of virtual memory used by a process |
Dependent item | php-fpm.proc.vmem[{#PHP_FPM.NAME}] Preprocessing
|
PHP-FPM: Memory usage, % | The percentage of real memory used by a process |
Dependent item | php-fpm.proc.pmem[{#PHP_FPM.NAME}] Preprocessing
|
PHP-FPM: Number of running processes | The number of running processes |
Dependent item | php-fpm.proc.num[{#PHP_FPM.NAME}] Preprocessing
|
PHP-FPM: CPU utilization | The percentage of the CPU utilization by a process |
Zabbix agent | proc.cpu.util[{#PHP_FPM.NAME}] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
PHP-FPM: Process is not running | last(/PHP-FPM by Zabbix agent/php-fpm.proc.num[{#PHP_FPM.NAME}])=0 |High |
|||
PHP-FPM: Failed to fetch info data | Zabbix has not received any data for items for the last 30 minutes. |
nodata(/PHP-FPM by Zabbix agent/php-fpm.uptime,30m)=1 and last(/PHP-FPM by Zabbix agent/php-fpm.proc.num[{#PHP_FPM.NAME}])>0 |Info |
Manual close: Yes | |
PHP-FPM: Service is down | (last(/PHP-FPM by Zabbix agent/php-fpm.ping)=0 or nodata(/PHP-FPM by Zabbix agent/php-fpm.ping,3m)=1) and last(/PHP-FPM by Zabbix agent/php-fpm.proc.num[{#PHP_FPM.NAME}])>0 |High |
Manual close: Yes |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
Template for monitoring pfSense by SNMP
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Name | Description | Default |
---|---|---|
{$IF.ERRORS.WARN} | Threshold of error packets rate for warning trigger. Can be used with interface name as context. |
2 |
{$IF.UTIL.MAX} | Threshold of interface bandwidth utilization for warning trigger in %. Can be used with interface name as context. |
90 |
{$IFCONTROL} | Macro for operational state of the interface for link down trigger. Can be used with interface name as context. |
1 |
{$NET.IF.IFADMINSTATUS.MATCHES} | This macro is used in filters of network interfaces discovery rule. |
^.* |
{$NET.IF.IFADMINSTATUS.NOT_MATCHES} | Ignore down(2) administrative status. |
^2$ |
{$NET.IF.IFALIAS.MATCHES} | This macro is used in filters of network interfaces discovery rule. |
.* |
{$NET.IF.IFALIAS.NOT_MATCHES} | This macro is used in filters of network interfaces discovery rule. |
CHANGE_IF_NEEDED |
{$NET.IF.IFDESCR.MATCHES} | This macro is used in filters of network interfaces discovery rule. |
.* |
{$NET.IF.IFDESCR.NOT_MATCHES} | This macro is used in filters of network interfaces discovery rule. |
CHANGE_IF_NEEDED |
{$NET.IF.IFNAME.NOT_MATCHES} | This macro is used in filters of network interfaces discovery rule. |
(^pflog[0-9.]*$|^pfsync[0-9.]*$) |
{$NET.IF.IFOPERSTATUS.MATCHES} | This macro is used in filters of network interfaces discovery rule. |
^.*$ |
{$NET.IF.IFOPERSTATUS.NOT_MATCHES} | Ignore notPresent(6). |
^6$ |
{$NET.IF.IFTYPE.MATCHES} | This macro is used in filters of network interfaces discovery rule. |
.* |
{$NET.IF.IFTYPE.NOT_MATCHES} | This macro is used in filters of network interfaces discovery rule. |
CHANGE_IF_NEEDED |
{$SNMP.TIMEOUT} | The time interval for SNMP availability trigger. |
5m |
{$STATE.TABLE.UTIL.MAX} | Threshold of state table utilization trigger in %. |
90 |
{$SOURCE.TRACKING.TABLE.UTIL.MAX} | Threshold of source tracking table utilization trigger in %. |
90 |
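Most of the firewall metrics in this template come from BEGEMOT-PF-MIB, so it is worth confirming that the pfSense SNMP service exposes that subtree before linking the template. A sketch assuming SNMP v2c with the community "public"; verify the OID against your MIB files:
$ snmpwalk -v2c -c public <pfsense-ip> 1.3.6.1.4.1.12325.1.200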
Name | Description | Type | Key and additional info |
---|---|---|---|
PFSense: SNMP agent availability | Availability of SNMP checks on the host. The value of this item corresponds to availability icons in the host list. Possible values: 0 - not available, 1 - available, 2 - unknown. |
Zabbix internal | zabbix[host,snmp,available] |
PFSense: Packet filter running status | MIB: BEGEMOT-PF-MIB True if packet filter is currently enabled. |
SNMP agent | pfsense.pf.status |
PFSense: States table current | MIB: BEGEMOT-PF-MIB Number of entries in the state table. |
SNMP agent | pfsense.state.table.count |
PFSense: States table limit | MIB: BEGEMOT-PF-MIB Maximum number of 'keep state' rules in the ruleset. |
SNMP agent | pfsense.state.table.limit |
PFSense: States table utilization in % | Utilization of state table in %. A calculation sketch is shown after this item table. |
Calculated | pfsense.state.table.pused |
PFSense: Source tracking table current | MIB: BEGEMOT-PF-MIB Number of entries in the source tracking table. |
SNMP agent | pfsense.source.tracking.table.count |
PFSense: Source tracking table limit | MIB: BEGEMOT-PF-MIB Maximum number of 'sticky-address' or 'source-track' rules in the ruleset. |
SNMP agent | pfsense.source.tracking.table.limit |
PFSense: Source tracking table utilization in % | Utilization of source tracking table in %. |
Calculated | pfsense.source.tracking.table.pused |
PFSense: DHCP server status | MIB: HOST-RESOURCES-MIB The status of DHCP server process. |
SNMP agent | pfsense.dhcpd.status Preprocessing
|
PFSense: DNS server status | MIB: HOST-RESOURCES-MIB The status of DNS server process. |
SNMP agent | pfsense.dns.status Preprocessing
|
PFSense: State of nginx process | MIB: HOST-RESOURCES-MIB The status of nginx process. |
SNMP agent | pfsense.nginx.status Preprocessing
|
PFSense: Packets matched a filter rule | MIB: BEGEMOT-PF-MIB True if the packet was logged with the specified packet filter reason code. The known codes are: match, bad-offset, fragment, short, normalize, and memory. |
SNMP agent | pfsense.packets.match Preprocessing
|
PFSense: Packets with bad offset | MIB: BEGEMOT-PF-MIB True if the packet was logged with the specified packet filter reason code. The known codes are: match, bad-offset, fragment, short, normalize, and memory. |
SNMP agent | pfsense.packets.bad.offset Preprocessing
|
PFSense: Fragmented packets | MIB: BEGEMOT-PF-MIB True if the packet was logged with the specified packet filter reason code. The known codes are: match, bad-offset, fragment, short, normalize, and memory. |
SNMP agent | pfsense.packets.fragment Preprocessing
|
PFSense: Short packets | MIB: BEGEMOT-PF-MIB True if the packet was logged with the specified packet filter reason code. The known codes are: match, bad-offset, fragment, short, normalize, and memory. |
SNMP agent | pfsense.packets.short Preprocessing
|
PFSense: Normalized packets | MIB: BEGEMOT-PF-MIB True if the packet was logged with the specified packet filter reason code. The known codes are: match, bad-offset, fragment, short, normalize, and memory. |
SNMP agent | pfsense.packets.normalize Preprocessing
|
PFSense: Packets dropped due to memory limitation | MIB: BEGEMOT-PF-MIB True if the packet was logged with the specified packet filter reason code. The known codes are: match, bad-offset, fragment, short, normalize, and memory. |
SNMP agent | pfsense.packets.mem.drop Preprocessing
|
PFSense: Firewall rules count | MIB: BEGEMOT-PF-MIB The number of labeled filter rules on this system. |
SNMP agent | pfsense.rules.count |
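The two utilization items above (pfsense.state.table.pused and pfsense.source.tracking.table.pused) are calculated items; this document does not show their formulas, but assuming they simply divide the current count by the limit, the state table formula would look roughly like this in Zabbix calculated-item syntax:
last(//pfsense.state.table.count) / last(//pfsense.state.table.limit) * 100
The source tracking variant presumably substitutes the corresponding source tracking keys.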
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
PFSense: No SNMP data collection | SNMP is not available for polling. Please check device connectivity and SNMP settings. |
max(/PFSense by SNMP/zabbix[host,snmp,available],{$SNMP.TIMEOUT})=0 |Warning |
||
PFSense: Packet filter is not running | Please check PF status. |
last(/PFSense by SNMP/pfsense.pf.status)<>1 |High |
||
PFSense: State table usage is high | Please check the number of connections https://docs.netgate.com/pfsense/en/latest/config/advanced-firewall-nat.html#config-advanced-firewall-maxstates |
min(/PFSense by SNMP/pfsense.state.table.pused,#3)>{$STATE.TABLE.UTIL.MAX} |Warning |
||
PFSense: Source tracking table usage is high | Please check the number of sticky connections https://docs.netgate.com/pfsense/en/latest/monitoring/status/firewall-states-sources.html |
min(/PFSense by SNMP/pfsense.source.tracking.table.pused,#3)>{$SOURCE.TRACKING.TABLE.UTIL.MAX} |Warning |
||
PFSense: DHCP server is not running | Please check DHCP server settings https://docs.netgate.com/pfsense/en/latest/services/dhcp/index.html |
last(/PFSense by SNMP/pfsense.dhcpd.status)=0 |Average |
||
PFSense: DNS server is not running | Please check DNS server settings https://docs.netgate.com/pfsense/en/latest/services/dns/index.html |
last(/PFSense by SNMP/pfsense.dns.status)=0 |Average |
||
PFSense: Web server is not running | Please check nginx service status. |
last(/PFSense by SNMP/pfsense.nginx.status)=0 |Average |
Name | Description | Type | Key and additional info |
---|---|---|---|
Network interfaces discovery | Discovering interfaces from IF-MIB. |
SNMP agent | pfsense.net.if.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
PFSense: Interface [{#IFNAME}({#IFALIAS})]: Inbound packets discarded | MIB: IF-MIB The number of inbound packets which were chosen to be discarded even though no errors had been detected to prevent their being deliverable to a higher-layer protocol. One possible reason for discarding such a packet could be to free up buffer space. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP agent | net.if.in.discards[{#SNMPINDEX}] Preprocessing
|
PFSense: Interface [{#IFNAME}({#IFALIAS})]: Inbound packets with errors | MIB: IF-MIB For packet-oriented interfaces, the number of inbound packets that contained errors preventing them from being deliverable to a higher-layer protocol. For character-oriented or fixed-length interfaces, the number of inbound transmission units that contained errors preventing them from being deliverable to a higher-layer protocol. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP agent | net.if.in.errors[{#SNMPINDEX}] Preprocessing
|
PFSense: Interface [{#IFNAME}({#IFALIAS})]: Bits received | MIB: IF-MIB The total number of octets received on the interface, including framing characters. This object is a 64-bit version of ifInOctets. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP agent | net.if.in[{#SNMPINDEX}] Preprocessing
|
PFSense: Interface [{#IFNAME}({#IFALIAS})]: Outbound packets discarded | MIB: IF-MIB The number of outbound packets which were chosen to be discarded even though no errors had been detected to prevent their being deliverable to a higher-layer protocol. One possible reason for discarding such a packet could be to free up buffer space. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP agent | net.if.out.discards[{#SNMPINDEX}] Preprocessing
|
PFSense: Interface [{#IFNAME}({#IFALIAS})]: Outbound packets with errors | MIB: IF-MIB For packet-oriented interfaces, the number of outbound packets that contained errors preventing them from being deliverable to a higher-layer protocol. For character-oriented or fixed-length interfaces, the number of outbound transmission units that contained errors preventing them from being deliverable to a higher-layer protocol. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP agent | net.if.out.errors[{#SNMPINDEX}] Preprocessing
|
PFSense: Interface [{#IFNAME}({#IFALIAS})]: Bits sent | MIB: IF-MIB The total number of octets transmitted out of the interface, including framing characters. This object is a 64-bit version of ifOutOctets. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP agent | net.if.out[{#SNMPINDEX}] Preprocessing
|
PFSense: Interface [{#IFNAME}({#IFALIAS})]: Speed | MIB: IF-MIB An estimate of the interface's current bandwidth in units of 1,000,000 bits per second. If this object reports a value of |
SNMP agent | net.if.speed[{#SNMPINDEX}] Preprocessing
|
PFSense: Interface [{#IFNAME}({#IFALIAS})]: Operational status | MIB: IF-MIB The current operational state of the interface. - The testing(3) state indicates that no operational packets can be passed - If ifAdminStatus is down(2) then ifOperStatus should be down(2) - If ifAdminStatus is changed to up(1) then ifOperStatus should change to up(1) if the interface is ready to transmit and receive network traffic - It should change to dormant(5) if the interface is waiting for external actions (such as a serial line waiting for an incoming connection) - It should remain in the down(2) state if and only if there is a fault that prevents it from going to the up(1) state - It should remain in the notPresent(6) state if the interface has missing (typically, hardware) components. |
SNMP agent | net.if.status[{#SNMPINDEX}] Preprocessing
|
PFSense: Interface [{#IFNAME}({#IFALIAS})]: Interface type | MIB: IF-MIB The type of interface. Additional values for ifType are assigned by the Internet Assigned Numbers Authority (IANA), through updating the syntax of the IANAifType textual convention. |
SNMP agent | net.if.type[{#SNMPINDEX}] Preprocessing
|
PFSense: Interface [{#IFNAME}({#IFALIAS})]: Rules references count | MIB: BEGEMOT-PF-MIB The number of rules referencing this interface. |
SNMP agent | net.if.rules.refs[{#SNMPINDEX}] |
PFSense: Interface [{#IFNAME}({#IFALIAS})]: Inbound IPv4 traffic passed | MIB: BEGEMOT-PF-MIB IPv4 bits per second passed coming in on this interface. |
SNMP agent | net.if.in.pass.v4.bps[{#SNMPINDEX}] Preprocessing
|
PFSense: Interface [{#IFNAME}({#IFALIAS})]: Inbound IPv4 traffic blocked | MIB: BEGEMOT-PF-MIB IPv4 bits per second blocked coming in on this interface. |
SNMP agent | net.if.in.block.v4.bps[{#SNMPINDEX}] Preprocessing
|
PFSense: Interface [{#IFNAME}({#IFALIAS})]: Outbound IPv4 traffic passed | MIB: BEGEMOT-PF-MIB IPv4 bits per second passed going out on this interface. |
SNMP agent | net.if.out.pass.v4.bps[{#SNMPINDEX}] Preprocessing
|
PFSense: Interface [{#IFNAME}({#IFALIAS})]: Outbound IPv4 traffic blocked | MIB: BEGEMOT-PF-MIB IPv4 bits per second blocked going out on this interface. |
SNMP agent | net.if.out.block.v4.bps[{#SNMPINDEX}] Preprocessing
|
PFSense: Interface [{#IFNAME}({#IFALIAS})]: Inbound IPv4 packets passed | MIB: BEGEMOT-PF-MIB The number of IPv4 packets passed coming in on this interface. |
SNMP agent | net.if.in.pass.v4.pps[{#SNMPINDEX}] Preprocessing
|
PFSense: Interface [{#IFNAME}({#IFALIAS})]: Inbound IPv4 packets blocked | MIB: BEGEMOT-PF-MIB The number of IPv4 packets blocked coming in on this interface. |
SNMP agent | net.if.in.block.v4.pps[{#SNMPINDEX}] Preprocessing
|
PFSense: Interface [{#IFNAME}({#IFALIAS})]: Outbound IPv4 packets passed | MIB: BEGEMOT-PF-MIB The number of IPv4 packets passed going out on this interface. |
SNMP agent | net.if.out.pass.v4.pps[{#SNMPINDEX}] Preprocessing
|
PFSense: Interface [{#IFNAME}({#IFALIAS})]: Outbound IPv4 packets blocked | MIB: BEGEMOT-PF-MIB The number of IPv4 packets blocked going out on this interface. |
SNMP agent | net.if.out.block.v4.pps[{#SNMPINDEX}] Preprocessing
|
PFSense: Interface [{#IFNAME}({#IFALIAS})]: Inbound IPv6 traffic passed | MIB: BEGEMOT-PF-MIB IPv6 bits per second passed coming in on this interface. |
SNMP agent | net.if.in.pass.v6.bps[{#SNMPINDEX}] Preprocessing
|
PFSense: Interface [{#IFNAME}({#IFALIAS})]: Inbound IPv6 traffic blocked | MIB: BEGEMOT-PF-MIB IPv6 bits per second blocked coming in on this interface. |
SNMP agent | net.if.in.block.v6.bps[{#SNMPINDEX}] Preprocessing
|
PFSense: Interface [{#IFNAME}({#IFALIAS})]: Outbound IPv6 traffic passed | MIB: BEGEMOT-PF-MIB IPv6 bits per second passed going out on this interface. |
SNMP agent | net.if.out.pass.v6.bps[{#SNMPINDEX}] Preprocessing
|
PFSense: Interface [{#IFNAME}({#IFALIAS})]: Outbound IPv6 traffic blocked | MIB: BEGEMOT-PF-MIB IPv6 bits per second blocked going out on this interface. |
SNMP agent | net.if.out.block.v6.bps[{#SNMPINDEX}] Preprocessing
|
PFSense: Interface [{#IFNAME}({#IFALIAS})]: Inbound IPv6 packets passed | MIB: BEGEMOT-PF-MIB The number of IPv6 packets passed coming in on this interface. |
SNMP agent | net.if.in.pass.v6.pps[{#SNMPINDEX}] Preprocessing
|
PFSense: Interface [{#IFNAME}({#IFALIAS})]: Inbound IPv6 packets blocked | MIB: BEGEMOT-PF-MIB The number of IPv6 packets blocked coming in on this interface. |
SNMP agent | net.if.in.block.v6.pps[{#SNMPINDEX}] Preprocessing
|
PFSense: Interface [{#IFNAME}({#IFALIAS})]: Outbound IPv6 packets passed | MIB: BEGEMOT-PF-MIB The number of IPv6 packets passed going out on this interface. |
SNMP agent | net.if.out.pass.v6.pps[{#SNMPINDEX}] Preprocessing
|
PFSense: Interface [{#IFNAME}({#IFALIAS})]: Outbound IPv6 packets blocked | MIB: BEGEMOT-PF-MIB The number of IPv6 packets blocked going out on this interface. |
SNMP agent | net.if.out.block.v6.pps[{#SNMPINDEX}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
PFSense: Interface [{#IFNAME}({#IFALIAS})]: High input error rate | It recovers when it is below 80% of the |
min(/PFSense by SNMP/net.if.in.errors[{#SNMPINDEX}],5m)>{$IF.ERRORS.WARN:"{#IFNAME}"} |Warning |
Depends on:
|
|
PFSense: Interface [{#IFNAME}({#IFALIAS})]: High inbound bandwidth usage | The utilization of the network interface is close to its estimated maximum bandwidth. |
(avg(/PFSense by SNMP/net.if.in[{#SNMPINDEX}],15m)>({$IF.UTIL.MAX:"{#IFNAME}"}/100)*last(/PFSense by SNMP/net.if.speed[{#SNMPINDEX}])) and last(/PFSense by SNMP/net.if.speed[{#SNMPINDEX}])>0 |Warning |
Depends on:
|
|
PFSense: Interface [{#IFNAME}({#IFALIAS})]: High output error rate | It recovers when it is below 80% of the |
min(/PFSense by SNMP/net.if.out.errors[{#SNMPINDEX}],5m)>{$IF.ERRORS.WARN:"{#IFNAME}"} |Warning |
Depends on:
|
|
PFSense: Interface [{#IFNAME}({#IFALIAS})]: High outbound bandwidth usage | The utilization of the network interface is close to its estimated maximum bandwidth. |
(avg(/PFSense by SNMP/net.if.out[{#SNMPINDEX}],15m)>({$IF.UTIL.MAX:"{#IFNAME}"}/100)*last(/PFSense by SNMP/net.if.speed[{#SNMPINDEX}])) and last(/PFSense by SNMP/net.if.speed[{#SNMPINDEX}])>0 |Warning |
Depends on:
|
|
PFSense: Interface [{#IFNAME}({#IFALIAS})]: Ethernet has changed to lower speed than it was before | This Ethernet connection has transitioned down from its known maximum speed. This might be a sign of autonegotiation issues. Acknowledge to close the problem manually. |
change(/PFSense by SNMP/net.if.speed[{#SNMPINDEX}])<0 and last(/PFSense by SNMP/net.if.speed[{#SNMPINDEX}])>0 and ( last(/PFSense by SNMP/net.if.type[{#SNMPINDEX}])=6 or last(/PFSense by SNMP/net.if.type[{#SNMPINDEX}])=7 or last(/PFSense by SNMP/net.if.type[{#SNMPINDEX}])=11 or last(/PFSense by SNMP/net.if.type[{#SNMPINDEX}])=62 or last(/PFSense by SNMP/net.if.type[{#SNMPINDEX}])=69 or last(/PFSense by SNMP/net.if.type[{#SNMPINDEX}])=117 ) and (last(/PFSense by SNMP/net.if.status[{#SNMPINDEX}])<>2) |Info |
Depends on:
|
|
PFSense: Interface [{#IFNAME}({#IFALIAS})]: Link down | This trigger expression works as follows: |
{$IFCONTROL:"{#IFNAME}"}=1 and (last(/PFSense by SNMP/net.if.status[{#SNMPINDEX}])=2) |Average |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
Template for monitoring OPNsense by SNMP
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Name | Description | Default |
---|---|---|
{$IF.ERRORS.WARN} | Threshold of error packets rate for warning trigger. Can be used with interface name as context. |
2 |
{$IF.UTIL.MAX} | Threshold of interface bandwidth utilization for warning trigger in %. Can be used with interface name as context. |
90 |
{$IFCONTROL} | Macro for operational state of the interface for link down trigger. Can be used with interface name as context. |
1 |
{$NET.IF.IFADMINSTATUS.MATCHES} | This macro is used in filters of network interfaces discovery rule. |
^.* |
{$NET.IF.IFADMINSTATUS.NOT_MATCHES} | Ignore down(2) administrative status. |
^2$ |
{$NET.IF.IFALIAS.MATCHES} | This macro is used in filters of network interfaces discovery rule. |
.* |
{$NET.IF.IFALIAS.NOT_MATCHES} | This macro is used in filters of network interfaces discovery rule. |
CHANGE_IF_NEEDED |
{$NET.IF.IFDESCR.MATCHES} | This macro is used in filters of network interfaces discovery rule. |
.* |
{$NET.IF.IFDESCR.NOT_MATCHES} | This macro is used in filters of network interfaces discovery rule. |
CHANGE_IF_NEEDED |
{$NET.IF.IFNAME.NOT_MATCHES} | This macro is used in filters of network interfaces discovery rule. |
(^pflog[0-9.]*$|^pfsync[0-9.]*$) |
{$NET.IF.IFOPERSTATUS.MATCHES} | This macro is used in filters of network interfaces discovery rule. |
^.*$ |
{$NET.IF.IFOPERSTATUS.NOT_MATCHES} | Ignore notPresent(6). |
^6$ |
{$NET.IF.IFTYPE.MATCHES} | This macro is used in filters of network interfaces discovery rule. |
.* |
{$NET.IF.IFTYPE.NOT_MATCHES} | This macro is used in filters of network interfaces discovery rule. |
CHANGE_IF_NEEDED |
{$SNMP.TIMEOUT} | The time interval for SNMP availability trigger. |
5m |
{$STATE.TABLE.UTIL.MAX} | Threshold of state table utilization trigger in %. |
90 |
{$SOURCE.TRACKING.TABLE.UTIL.MAX} | Threshold of source tracking table utilization trigger in %. |
90 |
Name | Description | Type | Key and additional info |
---|---|---|---|
OPNsense: SNMP agent availability | Availability of SNMP checks on the host. The value of this item corresponds to availability icons in the host list. Possible values: 0 - not available, 1 - available, 2 - unknown. |
Zabbix internal | zabbix[host,snmp,available] |
OPNsense: Packet filter running status | MIB: BEGEMOT-PF-MIB True if packet filter is currently enabled. |
SNMP agent | opnsense.pf.status |
OPNsense: States table current | MIB: BEGEMOT-PF-MIB Number of entries in the state table. |
SNMP agent | opnsense.state.table.count |
OPNsense: States table limit | MIB: BEGEMOT-PF-MIB Maximum number of 'keep state' rules in the ruleset. |
SNMP agent | opnsense.state.table.limit |
OPNsense: States table utilization in % | Utilization of state table in %. |
Calculated | opnsense.state.table.pused |
OPNsense: Source tracking table current | MIB: BEGEMOT-PF-MIB Number of entries in the source tracking table. |
SNMP agent | opnsense.source.tracking.table.count |
OPNsense: Source tracking table limit | MIB: BEGEMOT-PF-MIB Maximum number of 'sticky-address' or 'source-track' rules in the ruleset. |
SNMP agent | opnsense.source.tracking.table.limit |
OPNsense: Source tracking table utilization in % | Utilization of source tracking table in %. |
Calculated | opnsense.source.tracking.table.pused |
OPNsense: DHCP server status | MIB: HOST-RESOURCES-MIB The status of DHCP server process. |
SNMP agent | opnsense.dhcpd.status Preprocessing
|
OPNsense: DNS server status | MIB: HOST-RESOURCES-MIB The status of DNS server process. |
SNMP agent | opnsense.dns.status Preprocessing
|
OPNsense: Web server status | MIB: HOST-RESOURCES-MIB The status of lighttpd process. |
SNMP agent | opnsense.lighttpd.status Preprocessing
|
OPNsense: Packets matched a filter rule | MIB: BEGEMOT-PF-MIB True if the packet was logged with the specified packet filter reason code. The known codes are: match, bad-offset, fragment, short, normalize, and memory. |
SNMP agent | opnsense.packets.match Preprocessing
|
OPNsense: Packets with bad offset | MIB: BEGEMOT-PF-MIB True if the packet was logged with the specified packet filter reason code. The known codes are: match, bad-offset, fragment, short, normalize, and memory. |
SNMP agent | opnsense.packets.bad.offset Preprocessing
|
OPNsense: Fragmented packets | MIB: BEGEMOT-PF-MIB True if the packet was logged with the specified packet filter reason code. The known codes are: match, bad-offset, fragment, short, normalize, and memory. |
SNMP agent | opnsense.packets.fragment Preprocessing
|
OPNsense: Short packets | MIB: BEGEMOT-PF-MIB True if the packet was logged with the specified packet filter reason code. The known codes are: match, bad-offset, fragment, short, normalize, and memory. |
SNMP agent | opnsense.packets.short Preprocessing
|
OPNsense: Normalized packets | MIB: BEGEMOT-PF-MIB True if the packet was logged with the specified packet filter reason code. The known codes are: match, bad-offset, fragment, short, normalize, and memory. |
SNMP agent | opnsense.packets.normalize Preprocessing
|
OPNsense: Packets dropped due to memory limitation | MIB: BEGEMOT-PF-MIB True if the packet was logged with the specified packet filter reason code. The known codes are: match, bad-offset, fragment, short, normalize, and memory. |
SNMP agent | opnsense.packets.mem.drop Preprocessing
|
OPNsense: Firewall rules count | MIB: BEGEMOT-PF-MIB The number of labeled filter rules on this system. |
SNMP agent | opnsense.rules.count |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
OPNsense: No SNMP data collection | SNMP is not available for polling. Please check device connectivity and SNMP settings. |
max(/OPNsense by SNMP/zabbix[host,snmp,available],{$SNMP.TIMEOUT})=0 |Warning |
||
OPNsense: Packet filter is not running | Please check PF status. |
last(/OPNsense by SNMP/opnsense.pf.status)<>1 |High |
||
OPNsense: State table usage is high | Please check the number of connections. |
min(/OPNsense by SNMP/opnsense.state.table.pused,#3)>{$STATE.TABLE.UTIL.MAX} |Warning |
||
OPNsense: Source tracking table usage is high | Please check the number of sticky connections. |
min(/OPNsense by SNMP/opnsense.source.tracking.table.pused,#3)>{$SOURCE.TRACKING.TABLE.UTIL.MAX} |Warning |
||
OPNsense: DHCP server is not running | Please check DHCP server settings. |
last(/OPNsense by SNMP/opnsense.dhcpd.status)=0 |Average |
||
OPNsense: DNS server is not running | Please check DNS server settings. |
last(/OPNsense by SNMP/opnsense.dns.status)=0 |Average |
||
OPNsense: Web server is not running | Please check lighttpd service status. |
last(/OPNsense by SNMP/opnsense.lighttpd.status)=0 |Average |
Name | Description | Type | Key and additional info |
---|---|---|---|
Network interfaces discovery | Discovering interfaces from IF-MIB. |
SNMP agent | opnsense.net.if.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Inbound packets discarded | MIB: IF-MIB The number of inbound packets which were chosen to be discarded even though no errors had been detected to prevent their being deliverable to a higher-layer protocol. One possible reason for discarding such a packet could be to free up buffer space. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP agent | net.if.in.discards[{#SNMPINDEX}] Preprocessing
|
OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Inbound packets with errors | MIB: IF-MIB For packet-oriented interfaces, the number of inbound packets that contained errors preventing them from being deliverable to a higher-layer protocol. For character-oriented or fixed-length interfaces, the number of inbound transmission units that contained errors preventing them from being deliverable to a higher-layer protocol. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP agent | net.if.in.errors[{#SNMPINDEX}] Preprocessing
|
OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Bits received | MIB: IF-MIB The total number of octets received on the interface, including framing characters. This object is a 64-bit version of ifInOctets. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP agent | net.if.in[{#SNMPINDEX}] Preprocessing
|
OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Outbound packets discarded | MIB: IF-MIB The number of outbound packets which were chosen to be discarded even though no errors had been detected to prevent their being deliverable to a higher-layer protocol. One possible reason for discarding such a packet could be to free up buffer space. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP agent | net.if.out.discards[{#SNMPINDEX}] Preprocessing
|
OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Outbound packets with errors | MIB: IF-MIB For packet-oriented interfaces, the number of outbound packets that contained errors preventing them from being deliverable to a higher-layer protocol. For character-oriented or fixed-length interfaces, the number of outbound transmission units that contained errors preventing them from being deliverable to a higher-layer protocol. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP agent | net.if.out.errors[{#SNMPINDEX}] Preprocessing
|
OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Bits sent | MIB: IF-MIB The total number of octets transmitted out of the interface, including framing characters. This object is a 64-bit version of ifOutOctets. Discontinuities in the value of this counter can occur at re-initialization of the management system, and at other times as indicated by the value of ifCounterDiscontinuityTime. |
SNMP agent | net.if.out[{#SNMPINDEX}] Preprocessing
|
OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Speed | MIB: IF-MIB An estimate of the interface's current bandwidth in units of 1,000,000 bits per second. If this object reports a value of |
SNMP agent | net.if.speed[{#SNMPINDEX}] Preprocessing
|
OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Operational status | MIB: IF-MIB The current operational state of the interface. - The testing(3) state indicates that no operational packets can be passed - If ifAdminStatus is down(2) then ifOperStatus should be down(2) - If ifAdminStatus is changed to up(1) then ifOperStatus should change to up(1) if the interface is ready to transmit and receive network traffic - It should change to dormant(5) if the interface is waiting for external actions (such as a serial line waiting for an incoming connection) - It should remain in the down(2) state if and only if there is a fault that prevents it from going to the up(1) state - It should remain in the notPresent(6) state if the interface has missing (typically, hardware) components. |
SNMP agent | net.if.status[{#SNMPINDEX}] Preprocessing
|
OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Interface type | MIB: IF-MIB The type of interface. Additional values for ifType are assigned by the Internet Assigned Numbers Authority (IANA), through updating the syntax of the IANAifType textual convention. |
SNMP agent | net.if.type[{#SNMPINDEX}] Preprocessing
|
OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Rules references count | MIB: BEGEMOT-PF-MIB The number of rules referencing this interface. |
SNMP agent | net.if.rules.refs[{#SNMPINDEX}] |
OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Inbound IPv4 traffic passed | MIB: BEGEMOT-PF-MIB IPv4 bits per second passed coming in on this interface. |
SNMP agent | net.if.in.pass.v4.bps[{#SNMPINDEX}] Preprocessing
|
OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Inbound IPv4 traffic blocked | MIB: BEGEMOT-PF-MIB IPv4 bits per second blocked coming in on this interface. |
SNMP agent | net.if.in.block.v4.bps[{#SNMPINDEX}] Preprocessing
|
OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Outbound IPv4 traffic passed | MIB: BEGEMOT-PF-MIB IPv4 bits per second passed going out on this interface. |
SNMP agent | net.if.out.pass.v4.bps[{#SNMPINDEX}] Preprocessing
|
OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Outbound IPv4 traffic blocked | MIB: BEGEMOT-PF-MIB IPv4 bits per second blocked going out on this interface. |
SNMP agent | net.if.out.block.v4.bps[{#SNMPINDEX}] Preprocessing
|
OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Inbound IPv4 packets passed | MIB: BEGEMOT-PF-MIB The number of IPv4 packets passed coming in on this interface. |
SNMP agent | net.if.in.pass.v4.pps[{#SNMPINDEX}] Preprocessing
|
OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Inbound IPv4 packets blocked | MIB: BEGEMOT-PF-MIB The number of IPv4 packets blocked coming in on this interface. |
SNMP agent | net.if.in.block.v4.pps[{#SNMPINDEX}] Preprocessing
|
OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Outbound IPv4 packets passed | MIB: BEGEMOT-PF-MIB The number of IPv4 packets passed going out on this interface. |
SNMP agent | net.if.out.pass.v4.pps[{#SNMPINDEX}] Preprocessing
|
OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Outbound IPv4 packets blocked | MIB: BEGEMOT-PF-MIB The number of IPv4 packets blocked going out on this interface. |
SNMP agent | net.if.out.block.v4.pps[{#SNMPINDEX}] Preprocessing
|
OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Inbound IPv6 traffic passed | MIB: BEGEMOT-PF-MIB IPv6 bits per second passed coming in on this interface. |
SNMP agent | net.if.in.pass.v6.bps[{#SNMPINDEX}] Preprocessing
|
OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Inbound IPv6 traffic blocked | MIB: BEGEMOT-PF-MIB IPv6 bits per second blocked coming in on this interface. |
SNMP agent | net.if.in.block.v6.bps[{#SNMPINDEX}] Preprocessing
|
OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Outbound IPv6 traffic passed | MIB: BEGEMOT-PF-MIB IPv6 bits per second passed going out on this interface. |
SNMP agent | net.if.out.pass.v6.bps[{#SNMPINDEX}] Preprocessing
|
OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Outbound IPv6 traffic blocked | MIB: BEGEMOT-PF-MIB IPv6 bits per second blocked going out on this interface. |
SNMP agent | net.if.out.block.v6.bps[{#SNMPINDEX}] Preprocessing
|
OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Inbound IPv6 packets passed | MIB: BEGEMOT-PF-MIB The number of IPv6 packets passed coming in on this interface. |
SNMP agent | net.if.in.pass.v6.pps[{#SNMPINDEX}] Preprocessing
|
OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Inbound IPv6 packets blocked | MIB: BEGEMOT-PF-MIB The number of IPv6 packets blocked coming in on this interface. |
SNMP agent | net.if.in.block.v6.pps[{#SNMPINDEX}] Preprocessing
|
OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Outbound IPv6 packets passed | MIB: BEGEMOT-PF-MIB The number of IPv6 packets passed going out on this interface. |
SNMP agent | net.if.out.pass.v6.pps[{#SNMPINDEX}] Preprocessing
|
OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Outbound IPv6 packets blocked | MIB: BEGEMOT-PF-MIB The number of IPv6 packets blocked going out on this interface. |
SNMP agent | net.if.out.block.v6.pps[{#SNMPINDEX}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
OPNsense: Interface [{#IFNAME}({#IFALIAS})]: High input error rate | It recovers when it is below 80% of the |
min(/OPNsense by SNMP/net.if.in.errors[{#SNMPINDEX}],5m)>{$IF.ERRORS.WARN:"{#IFNAME}"} |Warning |
Depends on:
|
|
OPNsense: Interface [{#IFNAME}({#IFALIAS})]: High inbound bandwidth usage | The utilization of the network interface is close to its estimated maximum bandwidth. |
(avg(/OPNsense by SNMP/net.if.in[{#SNMPINDEX}],15m)>({$IF.UTIL.MAX:"{#IFNAME}"}/100)*last(/OPNsense by SNMP/net.if.speed[{#SNMPINDEX}])) and last(/OPNsense by SNMP/net.if.speed[{#SNMPINDEX}])>0 |Warning |
Depends on:
|
|
OPNsense: Interface [{#IFNAME}({#IFALIAS})]: High output error rate | It recovers when it is below 80% of the |
min(/OPNsense by SNMP/net.if.out.errors[{#SNMPINDEX}],5m)>{$IF.ERRORS.WARN:"{#IFNAME}"} |Warning |
Depends on:
|
|
OPNsense: Interface [{#IFNAME}({#IFALIAS})]: High outbound bandwidth usage | The utilization of the network interface is close to its estimated maximum bandwidth. |
(avg(/OPNsense by SNMP/net.if.out[{#SNMPINDEX}],15m)>({$IF.UTIL.MAX:"{#IFNAME}"}/100)*last(/OPNsense by SNMP/net.if.speed[{#SNMPINDEX}])) and last(/OPNsense by SNMP/net.if.speed[{#SNMPINDEX}])>0 |Warning |
Depends on:
|
|
OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Ethernet has changed to lower speed than it was before | This Ethernet connection has transitioned down from its known maximum speed. This might be a sign of autonegotiation issues. Acknowledge to close the problem manually. |
change(/OPNsense by SNMP/net.if.speed[{#SNMPINDEX}])<0 and last(/OPNsense by SNMP/net.if.speed[{#SNMPINDEX}])>0 and ( last(/OPNsense by SNMP/net.if.type[{#SNMPINDEX}])=6 or last(/OPNsense by SNMP/net.if.type[{#SNMPINDEX}])=7 or last(/OPNsense by SNMP/net.if.type[{#SNMPINDEX}])=11 or last(/OPNsense by SNMP/net.if.type[{#SNMPINDEX}])=62 or last(/OPNsense by SNMP/net.if.type[{#SNMPINDEX}])=69 or last(/OPNsense by SNMP/net.if.type[{#SNMPINDEX}])=117 ) and (last(/OPNsense by SNMP/net.if.status[{#SNMPINDEX}])<>2) |Info |
Depends on:
|
|
OPNsense: Interface [{#IFNAME}({#IFALIAS})]: Link down | This trigger expression works as follows: |
{$IFCONTROL:"{#IFNAME}"}=1 and (last(/OPNsense by SNMP/net.if.status[{#SNMPINDEX}])=2) |Average |
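For reference, a minimal sketch of the logic behind the "High inbound/outbound bandwidth usage" triggers above: they fire only when the interface reports a non-zero speed and the 15-minute traffic average exceeds {$IF.UTIL.MAX} percent of that speed (the 90 used below is an assumed threshold, not a value taken from this template):

```python
def high_bandwidth_usage(avg_bps_15m: float, if_speed_bps: float, util_max_pct: float = 90.0) -> bool:
    # Mirrors: avg(net.if.in[...],15m) > ({$IF.UTIL.MAX}/100) * last(net.if.speed[...])
    #          and last(net.if.speed[...]) > 0
    return if_speed_bps > 0 and avg_bps_15m > (util_max_pct / 100.0) * if_speed_bps

# Example: ~950 Mbps average on a 1 Gbps link with a 90% threshold -> True
print(high_bandwidth_usage(950e6, 1e9))
```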
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for the effortless deployment of OpenWeatherMap monitoring by Zabbix via HTTP and doesn't require any external scripts.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Create a host.
Link the template to the host.
Customize the values of {$OPENWEATHERMAP.API.TOKEN} and {$LOCATION} macros.
OpenWeatherMap API Tokens are available in your OpenWeatherMap account https://home.openweathermap.org/api_keys.
Locations can be set in a few ways:
1. by geo coordinates (for example: 56.95,24.0833)
2. by location name (for example: Riga)
3. by location ID (see the list of city IDs: http://bulk.openweathermap.org/sample/city.list.json.gz)
4. by zip/post code with a country code (for example: 94040,us)
Several locations can be added to the macro at the same time, separated by the | delimiter.
For example: 43.81821,7.76115|Riga|2643743|94040,us
Please note that API requests by city name, zip code and city ID will be deprecated soon. The language and units macros can also be customized if necessary. List of available languages: https://openweathermap.org/current#multi. Available units of measurement are: standard, metric and imperial https://openweathermap.org/current#data.
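To illustrate how the macros described above combine into a request, here is a minimal sketch of one OpenWeatherMap call per configured location. It assumes a city-name location and the public q, units, lang and appid query parameters; the exact request built by the template's Script item (including how coordinate, ID and zip-code locations are handled) is not shown in this document and may differ:

```python
import requests  # illustration only; the template itself uses a Zabbix Script item

def get_weather(token: str, location: str, units: str = "metric", lang: str = "en") -> dict:
    # One request per {$LOCATION} entry; 'q' covers the city-name form only.
    resp = requests.get(
        "https://api.openweathermap.org/data/2.5/weather",
        params={"q": location, "units": units, "lang": lang, "appid": token},
        timeout=3,  # {$OPENWEATHERMAP.DATA.TIMEOUT} default is 3s
    )
    resp.raise_for_status()
    return resp.json()

# get_weather("<YOUR_API_TOKEN>", "Riga")
```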
Name | Description | Default |
---|---|---|
{$OPENWEATHERMAP.API.TOKEN} | Specify the OpenWeatherMap API key. |
|
{$LANG} | List of available languages https://openweathermap.org/current#multi. |
en |
{$LOCATION} | Locations can be set in a few ways: 1. by geo coordinates (for example: 56.95,24.0833) 2. by location name (for example: Riga) 3. by location ID (see the list of city IDs: http://bulk.openweathermap.org/sample/city.list.json.gz) 4. by zip/post code with a country code (for example: 94040,us). Several locations can be added to the macro at the same time, separated by the | delimiter (for example: 43.81821,7.76115|Riga|2643743|94040,us). Please note that API requests by city name, zip code and city ID will be deprecated soon. |
Riga |
{$OPENWEATHERMAP.API.ENDPOINT} | OpenWeatherMap API endpoint. |
api.openweathermap.org/data/2.5/weather? |
{$UNITS} | Available units of measurement are standard, metric and imperial https://openweathermap.org/current#data. |
metric |
{$OPENWEATHERMAP.DATA.TIMEOUT} | Response timeout for OpenWeatherMap API. |
3s |
{$TEMP.CRIT.HIGH} | Threshold for high temperature trigger. |
30 |
{$TEMP.CRIT.LOW} | Threshold for low temperature trigger. |
-20 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Openweathermap: Get data | JSON array with result of OpenWeatherMap API requests. |
Script | openweathermap.get.data |
Openweathermap: Get data collection errors | Errors from get data requests by script item. |
Dependent item | openweathermap.get.errors Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Openweathermap: There are errors in requests to OpenWeatherMap API | Zabbix has received errors in requests to OpenWeatherMap API. |
length(last(/OpenWeatherMap by HTTP/openweathermap.get.errors))>0 |Average |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Locations discovery | Weather metrics discovery by location. |
Dependent item | openweathermap.locations.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
[{#LOCATION}, {#COUNTRY}]: Data | JSON with result of OpenWeatherMap API request by location. |
Dependent item | openweathermap.location.data[{#ID}] Preprocessing
|
[{#LOCATION}, {#COUNTRY}]: Atmospheric pressure | Atmospheric pressure in Pa. |
Dependent item | openweathermap.pressure[{#ID}] Preprocessing
|
[{#LOCATION}, {#COUNTRY}]: Cloudiness | Cloudiness in %. |
Dependent item | openweathermap.clouds[{#ID}] Preprocessing
|
[{#LOCATION}, {#COUNTRY}]: Humidity | Humidity in %. |
Dependent item | openweathermap.humidity[{#ID}] Preprocessing
|
[{#LOCATION}, {#COUNTRY}]: Rain volume for the last one hour | Rain volume for the last one hour in m. |
Dependent item | openweathermap.rain[{#ID}] Preprocessing
|
[{#LOCATION}, {#COUNTRY}]: Short weather status | Short weather status description. |
Dependent item | openweathermap.description[{#ID}] Preprocessing
|
[{#LOCATION}, {#COUNTRY}]: Snow volume for the last one hour | Snow volume for the last one hour in m. |
Dependent item | openweathermap.snow[{#ID}] Preprocessing
|
[{#LOCATION}, {#COUNTRY}]: Temperature | Atmospheric temperature value. |
Dependent item | openweathermap.temp[{#ID}] Preprocessing
|
[{#LOCATION}, {#COUNTRY}]: Visibility | Visibility in m. |
Dependent item | openweathermap.visibility[{#ID}] Preprocessing
|
[{#LOCATION}, {#COUNTRY}]: Wind direction | Wind direction in degrees. |
Dependent item | openweathermap.wind.direction[{#ID}] Preprocessing
|
[{#LOCATION}, {#COUNTRY}]: Wind speed | Wind speed value. |
Dependent item | openweathermap.wind.speed[{#ID}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
[{#LOCATION}, {#COUNTRY}]: Temperature is too high | Temperature value is too high. |
min(/OpenWeatherMap by HTTP/openweathermap.temp[{#ID}],#3)>{$TEMP.CRIT.HIGH} |Average |
Manual close: Yes | |
[{#LOCATION}, {#COUNTRY}]: Temperature is too low | Temperature value is too low. |
max(/OpenWeatherMap by HTTP/openweathermap.temp[{#ID}],#3)<{$TEMP.CRIT.LOW} |Average |
Manual close: Yes |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed to monitor HashiCorp Nomad by Zabbix. It works without any external scripts. Currently, the template supports Nomad server and client discovery.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
1. Define the {$NOMAD.ENDPOINT.API.URL} macro value with the correct web protocol, host and port.
2. Prepare an authentication token with the node:read, namespace:read-job, agent:read and management permissions applied, and define it in the {$NOMAD.TOKEN} macro value.
> Refer to the vendor documentation about Nomad native ACL or Nomad Vault-generated tokens if you have the HashiCorp Vault integration configured.
Additional information:
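As additional context, a minimal sketch of the kind of authenticated call the template's HTTP agent and Script items make against {$NOMAD.ENDPOINT.API.URL}. The X-Nomad-Token header and the /v1/nodes and /v1/agent/members endpoints are standard Nomad API features; whether the template queries exactly these paths is an assumption:

```python
import requests  # illustration only; the template uses Zabbix HTTP agent and Script items

NOMAD_API_URL = "http://localhost:4646"  # {$NOMAD.ENDPOINT.API.URL}
NOMAD_TOKEN = "<PUT YOUR AUTH TOKEN>"    # {$NOMAD.TOKEN}

def nomad_get(path: str):
    # Authenticated Nomad API call with the ACL token passed via X-Nomad-Token.
    resp = requests.get(f"{NOMAD_API_URL}{path}",
                        headers={"X-Nomad-Token": NOMAD_TOKEN},
                        timeout=15)  # {$NOMAD.DATA.TIMEOUT} default is 15s
    resp.raise_for_status()
    return resp.json()

# nomad_get("/v1/nodes")          # client nodes; requires node:read
# nomad_get("/v1/agent/members")  # server members; requires agent:read
```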
Useful links
Name | Description | Default |
---|---|---|
{$NOMAD.ENDPOINT.API.URL} | API endpoint URL for one of the Nomad cluster members. |
http://localhost:4646 |
{$NOMAD.TOKEN} | Nomad authentication token. |
<PUT YOUR AUTH TOKEN> |
{$NOMAD.DATA.TIMEOUT} | Response timeout for an API. |
15s |
{$NOMAD.HTTP.PROXY} | Sets the HTTP proxy for script and HTTP agent items. If this parameter is empty, then no proxy is used. |
|
{$NOMAD.API.RESPONSE.SUCCESS} | HTTP API successful response code. Availability triggers threshold. Change, if needed. |
200 |
{$NOMAD.SERVER.NAME.MATCHES} | The filter to include HashiCorp Nomad servers by name. |
.* |
{$NOMAD.SERVER.NAME.NOT_MATCHES} | The filter to exclude HashiCorp Nomad servers by name. |
CHANGE_IF_NEEDED |
{$NOMAD.SERVER.DC.MATCHES} | The filter to include HashiCorp Nomad servers by datacenter belonging. |
.* |
{$NOMAD.SERVER.DC.NOT_MATCHES} | The filter to exclude HashiCorp Nomad servers by datacenter belonging. |
CHANGE_IF_NEEDED |
{$NOMAD.CLIENT.NAME.MATCHES} | The filter to include HashiCorp Nomad clients by name. |
.* |
{$NOMAD.CLIENT.NAME.NOT_MATCHES} | The filter to exclude HashiCorp Nomad clients by name. |
CHANGE_IF_NEEDED |
{$NOMAD.CLIENT.DC.MATCHES} | The filter to include HashiCorp Nomad clients by datacenter belonging. |
.* |
{$NOMAD.CLIENT.DC.NOT_MATCHES} | The filter to exclude HashiCorp Nomad clients by datacenter belonging. |
CHANGE_IF_NEEDED |
{$NOMAD.CLIENT.SCHEDULE.ELIGIBILITY.MATCHES} | The filter to include HashiCorp Nomad clients by scheduling eligibility. |
.* |
{$NOMAD.CLIENT.SCHEDULE.ELIGIBILITY.NOT_MATCHES} | The filter to exclude HashiCorp Nomad clients by scheduling eligibility. |
CHANGE_IF_NEEDED |
Name | Description | Type | Key and additional info |
---|---|---|---|
HashiCorp Nomad: Nomad clients get | Nomad clients data in raw format. |
HTTP agent | nomad.client.nodes.get Preprocessing
|
HashiCorp Nomad: Client nodes API response | Client nodes API response message. |
Dependent item | nomad.client.nodes.api.response Preprocessing
|
HashiCorp Nomad: Nomad servers get | Nomad servers data in raw format. |
Script | nomad.server.nodes.get |
HashiCorp Nomad: Server-related APIs response | Server-related ( |
Dependent item | nomad.server.api.response Preprocessing
|
HashiCorp Nomad: Region | Current cluster region. |
Dependent item | nomad.region Preprocessing
|
HashiCorp Nomad: Nomad servers count | Nomad servers count. |
Dependent item | nomad.servers.count Preprocessing
|
HashiCorp Nomad: Nomad clients count | Nomad clients count. |
Dependent item | nomad.clients.count Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
HashiCorp Nomad: Client nodes API connection has failed | Client nodes API connection has failed. |
find(/HashiCorp Nomad by HTTP/nomad.client.nodes.api.response,,"like","{$NOMAD.API.RESPONSE.SUCCESS}")=0 |Average |
Manual close: Yes | |
HashiCorp Nomad: Server-related API connection has failed | Server-related API connection has failed. |
find(/HashiCorp Nomad by HTTP/nomad.server.api.response,,"like","{$NOMAD.API.RESPONSE.SUCCESS}")=0 |Average |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Clients discovery | Client nodes discovery. |
Dependent item | nomad.clients.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Servers discovery | Server nodes discovery. |
Dependent item | nomad.servers.discovery Preprocessing
|
This template is designed to monitor HashiCorp Nomad clients by Zabbix. It works without any external scripts.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
1. Set up the Nomad client according to the vendor documentation.
2. Prepare an authentication token with the node:read and namespace:read-job permissions applied, and define it in the {$NOMAD.TOKEN} macro value.
> Refer to the vendor documentation about Nomad native ACL or Nomad Vault-generated tokens if you're using integration with HashiCorp Vault.
3. Set up the {$NOMAD.CLIENT.API.SCHEME} and {$NOMAD.CLIENT.API.PORT} macros to define the common Nomad API web schema and connection port.
Additional information:
You have to prepare an additional ACL token only if you wish to monitor Nomad clients as separate entities. If you're using clients discovery, the token will be inherited from the master host linked to the HashiCorp Nomad by HTTP template.
If you're not using ACLs, skip the second setup step.
The Nomad clients use the default web schema (HTTP) and the default API port (4646). If you're using clients discovery and need to redefine macros for a particular host created from a prototype, use context macros such as {$NOMAD.CLIENT.API.SCHEME:NECESSARY.IP} and/or {$NOMAD.CLIENT.API.PORT:NECESSARY.IP} at the master host or template level.
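A minimal sketch of fetching the client telemetry that the "Telemetry get" item collects, assuming the standard Nomad /v1/metrics endpoint (which returns JSON by default) and the scheme/port macros described above; the exact path and parameters used by the template are an assumption:

```python
import requests  # illustration only; the template's 'Telemetry get' item is a Zabbix HTTP agent item

def client_metrics(host: str, scheme: str = "http", port: int = 4646, token: str = "") -> dict:
    # scheme/port defaults mirror {$NOMAD.CLIENT.API.SCHEME} / {$NOMAD.CLIENT.API.PORT}.
    headers = {"X-Nomad-Token": token} if token else {}
    resp = requests.get(f"{scheme}://{host}:{port}/v1/metrics",
                        headers=headers,
                        timeout=15)  # {$NOMAD.DATA.TIMEOUT} default is 15s
    resp.raise_for_status()
    return resp.json()

# client_metrics("192.0.2.10")
```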
Useful links:
Name | Description | Default |
---|---|---|
{$NOMAD.CLIENT.API.SCHEME} | Nomad client API scheme. |
http |
{$NOMAD.CLIENT.API.PORT} | Nomad client API port. |
4646 |
{$NOMAD.TOKEN} | Nomad authentication token. |
<PUT YOUR AUTH TOKEN> |
{$NOMAD.DATA.TIMEOUT} | Response timeout for an API. |
15s |
{$NOMAD.HTTP.PROXY} | Sets the HTTP proxy for HTTP agent item. If this parameter is empty, then no proxy is used. |
|
{$NOMAD.API.RESPONSE.SUCCESS} | HTTP API successful response code. Availability triggers threshold. Change, if needed. |
200 |
{$NOMAD.CLIENT.RPC.PORT} | Nomad RPC service port. |
4647 |
{$NOMAD.CLIENT.SERF.PORT} | Nomad serf service port. |
4648 |
{$NOMAD.CLIENT.OPEN.FDS.MAX.WARN} | Maximum percentage of used file descriptors. |
90 |
{$NOMAD.DISK.NAME.MATCHES} | The filter to include HashiCorp Nomad client disks by name. |
.* |
{$NOMAD.DISK.NAME.NOT_MATCHES} | The filter to exclude HashiCorp Nomad client disks by name. |
CHANGE_IF_NEEDED |
{$NOMAD.JOB.NAME.MATCHES} | The filter to include HashiCorp Nomad client jobs by name. |
.* |
{$NOMAD.JOB.NAME.NOT_MATCHES} | The filter to exclude HashiCorp Nomad client jobs by name. |
CHANGE_IF_NEEDED |
{$NOMAD.JOB.NAMESPACE.MATCHES} | The filter to include HashiCorp Nomad client jobs by namespace. |
.* |
{$NOMAD.JOB.NAMESPACE.NOT_MATCHES} | The filter to exclude HashiCorp Nomad client jobs by namespace. |
CHANGE_IF_NEEDED |
{$NOMAD.JOB.TYPE.MATCHES} | The filter to include HashiCorp Nomad client jobs by type. |
.* |
{$NOMAD.JOB.TYPE.NOT_MATCHES} | The filter to exclude HashiCorp Nomad client jobs by type. |
CHANGE_IF_NEEDED |
{$NOMAD.JOB.TASK.GROUP.MATCHES} | The filter to include HashiCorp Nomad client jobs by task group belonging. |
.* |
{$NOMAD.JOB.TASK.GROUP.NOT_MATCHES} | The filter to exclude HashiCorp Nomad client jobs by task group belonging. |
CHANGE_IF_NEEDED |
{$NOMAD.DRIVER.NAME.MATCHES} | The filter to include HashiCorp Nomad client drivers by name. |
.* |
{$NOMAD.DRIVER.NAME.NOT_MATCHES} | The filter to exclude HashiCorp Nomad client drivers by name. |
CHANGE_IF_NEEDED |
{$NOMAD.DRIVER.DETECT.MATCHES} | The filter to include HashiCorp Nomad client drivers by detection state. Possible filtering values: |
.* |
{$NOMAD.DRIVER.DETECT.NOT_MATCHES} | The filter to exclude HashiCorp Nomad client drivers by detection state. Possible filtering values: |
CHANGE_IF_NEEDED |
{$NOMAD.CPU.UTIL.MIN} | CPU utilization threshold. Measured as a percentage. |
90 |
{$NOMAD.RAM.AVAIL.MIN} | Minimum available RAM threshold. Measured as a percentage. |
5 |
{$NOMAD.INODES.FREE.MIN.WARN} | Warning threshold of the filesystem metadata utilization. Measured as a percentage. |
20 |
{$NOMAD.INODES.FREE.MIN.CRIT} | Critical threshold of the filesystem metadata utilization. Measured as a percentage. |
10 |
Name | Description | Type | Key and additional info |
---|---|---|---|
HashiCorp Nomad Client: Telemetry get | Telemetry data in raw format. |
HTTP agent | nomad.client.data.get Preprocessing
|
HashiCorp Nomad Client: Metrics | Nomad client metrics in raw format. |
Dependent item | nomad.client.metrics.get Preprocessing
|
HashiCorp Nomad Client: Monitoring API response | Monitoring API response message. |
Dependent item | nomad.client.data.api.response Preprocessing
|
HashiCorp Nomad Client: Service [rpc] state | Current [rpc] service state. |
Simple check | net.tcp.service[tcp,,{$NOMAD.CLIENT.RPC.PORT}] Preprocessing
|
HashiCorp Nomad Client: Service [serf] state | Current [serf] service state. |
Simple check | net.tcp.service[tcp,,{$NOMAD.CLIENT.SERF.PORT}] Preprocessing
|
HashiCorp Nomad Client: CPU allocated | Total amount of CPU shares the scheduler has allocated to tasks. |
Dependent item | nomad.client.allocated.cpu Preprocessing
|
HashiCorp Nomad Client: CPU unallocated | Total amount of CPU shares free for the scheduler to allocate to tasks. |
Dependent item | nomad.client.unallocated.cpu Preprocessing
|
HashiCorp Nomad Client: Memory allocated | Total amount of memory the scheduler has allocated to tasks. |
Dependent item | nomad.client.allocated.memory Preprocessing
|
HashiCorp Nomad Client: Memory unallocated | Total amount of memory free for the scheduler to allocate to tasks. |
Dependent item | nomad.client.unallocated.memory Preprocessing
|
HashiCorp Nomad Client: Disk allocated | Total amount of disk space the scheduler has allocated to tasks. |
Dependent item | nomad.client.allocated.disk Preprocessing
|
HashiCorp Nomad Client: Disk unallocated | Total amount of disk space free for the scheduler to allocate to tasks. |
Dependent item | nomad.client.unallocated.disk Preprocessing
|
HashiCorp Nomad Client: Allocations blocked | Number of allocations waiting for previous versions. |
Dependent item | nomad.client.allocations.blocked Preprocessing
|
HashiCorp Nomad Client: Allocations migrating | Number of allocations migrating data from previous versions. |
Dependent item | nomad.client.allocations.migrating Preprocessing
|
HashiCorp Nomad Client: Allocations pending | Number of allocations pending (received by the client but not yet running). |
Dependent item | nomad.client.allocations.pending Preprocessing
|
HashiCorp Nomad Client: Allocations starting | Number of allocations starting. |
Dependent item | nomad.client.allocations.start Preprocessing
|
HashiCorp Nomad Client: Allocations running | Number of allocations running. |
Dependent item | nomad.client.allocations.running Preprocessing
|
HashiCorp Nomad Client: Allocations terminal | Number of allocations terminal. |
Dependent item | nomad.client.allocations.terminal Preprocessing
|
HashiCorp Nomad Client: Allocations failed, rate | Number of allocations failed. |
Dependent item | nomad.client.allocations.failed Preprocessing
|
HashiCorp Nomad Client: Allocations completed, rate | Number of allocations completed. |
Dependent item | nomad.client.allocations.complete Preprocessing
|
HashiCorp Nomad Client: Allocations restarted, rate | Number of allocations restarted. |
Dependent item | nomad.client.allocations.restart Preprocessing
|
HashiCorp Nomad Client: Allocations OOM killed | Number of allocations OOM killed. |
Dependent item | nomad.client.allocations.oom_killed Preprocessing
|
HashiCorp Nomad Client: CPU idle utilization | CPU utilization in idle state. |
Dependent item | nomad.client.cpu.idle Preprocessing
|
HashiCorp Nomad Client: CPU system utilization | CPU utilization in system space. |
Dependent item | nomad.client.cpu.system Preprocessing
|
HashiCorp Nomad Client: CPU total utilization | Total CPU utilization. |
Dependent item | nomad.client.cpu.total Preprocessing
|
HashiCorp Nomad Client: CPU user utilization | CPU utilization in user space. |
Dependent item | nomad.client.cpu.user Preprocessing
|
HashiCorp Nomad Client: Memory available | Total amount of memory available to processes which includes free and cached memory. |
Dependent item | nomad.client.memory.available Preprocessing
|
HashiCorp Nomad Client: Memory free | Amount of memory which is free. |
Dependent item | nomad.client.memory.free Preprocessing
|
HashiCorp Nomad Client: Memory size | Total amount of physical memory on the node. |
Dependent item | nomad.client.memory.total Preprocessing
|
HashiCorp Nomad Client: Memory used | Amount of memory used by processes. |
Dependent item | nomad.client.memory.used Preprocessing
|
HashiCorp Nomad Client: Uptime | Uptime of the host running the Nomad client. |
Dependent item | nomad.client.uptime Preprocessing
|
HashiCorp Nomad Client: Node info get | Node info data in raw format. |
HTTP agent | nomad.client.node.info.get Preprocessing
|
HashiCorp Nomad Client: Nomad client version | Nomad client version. |
Dependent item | nomad.client.version Preprocessing
|
HashiCorp Nomad Client: Nodes API response | Nodes API response message. |
Dependent item | nomad.client.node.info.api.response Preprocessing
|
HashiCorp Nomad Client: Allocated jobs get | Allocated jobs data in raw format. |
HTTP agent | nomad.client.job.allocs.get Preprocessing
|
HashiCorp Nomad Client: Allocations API response | Allocations API response message. |
Dependent item | nomad.client.job.allocs.api.response Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
HashiCorp Nomad Client: Monitoring API connection has failed | Monitoring API connection has failed. |
find(/HashiCorp Nomad Client by HTTP/nomad.client.data.api.response,,"like","{$NOMAD.API.RESPONSE.SUCCESS}")=0 |Average |
Manual close: Yes | |
HashiCorp Nomad Client: Service [rpc] is down | Cannot establish the connection to [rpc] service port {$NOMAD.CLIENT.RPC.PORT}. |
last(/HashiCorp Nomad Client by HTTP/net.tcp.service[tcp,,{$NOMAD.CLIENT.RPC.PORT}]) = 0 |Average |
Manual close: Yes | |
HashiCorp Nomad Client: Service [serf] is down | Cannot establish the connection to [serf] service port {$NOMAD.CLIENT.SERF.PORT}. |
last(/HashiCorp Nomad Client by HTTP/net.tcp.service[tcp,,{$NOMAD.CLIENT.SERF.PORT}]) = 0 |Average |
Manual close: Yes | |
HashiCorp Nomad Client: OOM killed allocations found | OOM killed allocations found. |
last(/HashiCorp Nomad Client by HTTP/nomad.client.allocations.oom_killed) > 0 |Warning |
Manual close: Yes | |
HashiCorp Nomad Client: High CPU utilization | CPU utilization is too high. The system might be slow to respond. |
min(/HashiCorp Nomad Client by HTTP/nomad.client.cpu.total, 10m) >= {$NOMAD.CPU.UTIL.MIN} |Average |
||
HashiCorp Nomad Client: High memory utilization | RAM utilization is too high. The system might be slow to respond. |
(min(/HashiCorp Nomad Client by HTTP/nomad.client.memory.available, 10m) / last(/HashiCorp Nomad Client by HTTP/nomad.client.memory.total))*100 <= {$NOMAD.RAM.AVAIL.MIN} |Average |
||
HashiCorp Nomad Client: The host has been restarted | The host uptime is less than 10 minutes. |
last(/HashiCorp Nomad Client by HTTP/nomad.client.uptime) < 10m |Warning |
Manual close: Yes | |
HashiCorp Nomad Client: Nomad client version has changed | Nomad client version has changed. |
change(/HashiCorp Nomad Client by HTTP/nomad.client.version)<>0 |Info |
Manual close: Yes | |
HashiCorp Nomad Client: Nodes API connection has failed | Nodes API connection has failed. |
find(/HashiCorp Nomad Client by HTTP/nomad.client.node.info.api.response,,"like","{$NOMAD.API.RESPONSE.SUCCESS}")=0 |Average |
Manual close: Yes Depends on:
|
|
HashiCorp Nomad Client: Allocations API connection has failed | Allocations API connection has failed. |
find(/HashiCorp Nomad Client by HTTP/nomad.client.job.allocs.api.response,,"like","{$NOMAD.API.RESPONSE.SUCCESS}")=0 |Average |
Manual close: Yes Depends on:
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Drivers discovery | Client drivers discovery. |
Dependent item | nomad.client.drivers.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
HashiCorp Nomad Client: Driver [{#DRIVER.NAME}] state | Driver [{#DRIVER.NAME}] state. |
Dependent item | nomad.client.driver.state["{#DRIVER.NAME}"] Preprocessing
|
HashiCorp Nomad Client: Driver [{#DRIVER.NAME}] detection state | Driver [{#DRIVER.NAME}] detection state. |
Dependent item | nomad.client.driver.detected["{#DRIVER.NAME}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
HashiCorp Nomad Client: Driver [{#DRIVER.NAME}] is in unhealthy state | The [{#DRIVER.NAME}] driver detected, but its state is unhealthy. |
last(/HashiCorp Nomad Client by HTTP/nomad.client.driver.state["{#DRIVER.NAME}"]) = 0 and last(/HashiCorp Nomad Client by HTTP/nomad.client.driver.detected["{#DRIVER.NAME}"]) = 1 |Warning |
Manual close: Yes | |
HashiCorp Nomad Client: Driver [{#DRIVER.NAME}] detection state has changed | The [{#DRIVER.NAME}] driver detection state has changed. |
change(/HashiCorp Nomad Client by HTTP/nomad.client.driver.detected["{#DRIVER.NAME}"]) <> 0 |Info |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Physical disks discovery | Physical disks discovery. |
Dependent item | nomad.client.disk.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
HashiCorp Nomad Client: Disk ["{#DEV.NAME}"] space available | Amount of space which is available on ["{#DEV.NAME}"] disk. |
Dependent item | nomad.client.disk.available["{#DEV.NAME}"] Preprocessing
|
HashiCorp Nomad Client: Disk ["{#DEV.NAME}"] inodes utilization | Disk space consumed by the inodes on ["{#DEV.NAME}"] disk. |
Dependent item | nomad.client.disk.inodes_percent["{#DEV.NAME}"] Preprocessing
|
HashiCorp Nomad Client: Disk ["{#DEV.NAME}"] size | Total size of the ["{#DEV.NAME}"] device. |
Dependent item | nomad.client.disk.size["{#DEV.NAME}"] Preprocessing
|
HashiCorp Nomad Client: Disk ["{#DEV.NAME}"] space utilization | Percentage of disk ["{#DEV.NAME}"] space used. |
Dependent item | nomad.client.disk.used_percent["{#DEV.NAME}"] Preprocessing
|
HashiCorp Nomad Client: Disk ["{#DEV.NAME}"] space used | Amount of disk ["{#DEV.NAME}"] space which has been used. |
Dependent item | nomad.client.disk.used["{#DEV.NAME}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
HashiCorp Nomad Client: Running out of free inodes on [{#DEV.NAME}] device | It may become impossible to write to a disk if there are no index nodes left. |
min(/HashiCorp Nomad Client by HTTP/nomad.client.disk.inodes_percent["{#DEV.NAME}"],5m) >= {$NOMAD.INODES.FREE.MIN.WARN:"{#DEV.NAME}"} |Warning |
Manual close: Yes Depends on:
|
|
HashiCorp Nomad Client: Running out of free inodes on [{#DEV.NAME}] device | It may become impossible to write to a disk if there are no index nodes left. |
min(/HashiCorp Nomad Client by HTTP/nomad.client.disk.inodes_percent["{#DEV.NAME}"],5m) >= {$NOMAD.INODES.FREE.MIN.CRIT:"{#DEV.NAME}"} |Average |
Manual close: Yes | |
HashiCorp Nomad Client: High disk [{#DEV.NAME}] utilization | High disk [{#DEV.NAME}] utilization. |
min(/HashiCorp Nomad Client by HTTP/nomad.client.disk.used_percent["{#DEV.NAME}"],5m) >= {$NOMAD.DISK.UTIL.MIN.WARN:"{#DEV.NAME}"} |Warning |
Manual close: Yes Depends on:
|
|
HashiCorp Nomad Client: High disk [{#DEV.NAME}] utilization | High disk [{#DEV.NAME}] utilization. |
min(/HashiCorp Nomad Client by HTTP/nomad.client.disk.used_percent["{#DEV.NAME}"],5m) >= {$NOMAD.DISK.UTIL.MIN.CRIT:"{#DEV.NAME}"} |Average |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Allocated jobs discovery | Allocated jobs discovery. |
Dependent item | nomad.client.alloc.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
HashiCorp Nomad Client: Job ["{#JOB.NAME}"] CPU allocated | Total CPU resources allocated by the ["{#JOB.NAME}"] job across all cores. |
Dependent item | nomad.client.allocs.cpu.allocated["{#JOB.NAME}","{#JOB.TASK.GROUP}","{#JOB.NAMESPACE}"] Preprocessing
|
HashiCorp Nomad Client: Job ["{#JOB.NAME}"] CPU system utilization | Total CPU resources consumed by the ["{#JOB.NAME}"] job in system space. |
Dependent item | nomad.client.allocs.cpu.system["{#JOB.NAME}","{#JOB.TASK.GROUP}","{#JOB.NAMESPACE}"] Preprocessing
|
HashiCorp Nomad Client: Job ["{#JOB.NAME}"] CPU user utilization | Total CPU resources consumed by the ["{#JOB.NAME}"] job in user space. |
Dependent item | nomad.client.allocs.cpu.user["{#JOB.NAME}","{#JOB.TASK.GROUP}","{#JOB.NAMESPACE}"] Preprocessing
|
HashiCorp Nomad Client: Job ["{#JOB.NAME}"] CPU total utilization | Total CPU resources consumed by the ["{#JOB.NAME}"] job across all cores. |
Dependent item | nomad.client.allocs.cpu.total_percent["{#JOB.NAME}","{#JOB.TASK.GROUP}","{#JOB.NAMESPACE}"] Preprocessing
|
HashiCorp Nomad Client: Job ["{#JOB.NAME}"] CPU throttled periods time | Total number of CPU periods that the ["{#JOB.NAME}"] job was throttled. |
Dependent item | nomad.client.allocs.cpu.throttled_periods["{#JOB.NAME}","{#JOB.TASK.GROUP}","{#JOB.NAMESPACE}"] Preprocessing
|
HashiCorp Nomad Client: Job ["{#JOB.NAME}"] CPU throttled time | Total time that the ["{#JOB.NAME}"] job was throttled. |
Dependent item | nomad.client.allocs.cpu.throttled_time["{#JOB.NAME}","{#JOB.TASK.GROUP}","{#JOB.NAMESPACE}"] Preprocessing
|
HashiCorp Nomad Client: Job ["{#JOB.NAME}"] CPU ticks | CPU ticks consumed by the process for the ["{#JOB.NAME}"] job in the last collection interval. |
Dependent item | nomad.client.allocs.cpu.total_ticks["{#JOB.NAME}","{#JOB.TASK.GROUP}","{#JOB.NAMESPACE}"] Preprocessing
|
HashiCorp Nomad Client: Job ["{#JOB.NAME}"] Memory allocated | Amount of memory allocated by the ["{#JOB.NAME}"] job. |
Dependent item | nomad.client.allocs.memory.allocated["{#JOB.NAME}","{#JOB.TASK.GROUP}","{#JOB.NAMESPACE}"] Preprocessing
|
HashiCorp Nomad Client: Job ["{#JOB.NAME}"] Memory cached | Amount of memory cached by the ["{#JOB.NAME}"] job. |
Dependent item | nomad.client.allocs.memory.cache["{#JOB.NAME}","{#JOB.TASK.GROUP}","{#JOB.NAMESPACE}"] Preprocessing
|
HashiCorp Nomad Client: Job ["{#JOB.NAME}"] Memory used | Total amount of memory used by the ["{#JOB.NAME}"] job. |
Dependent item | nomad.client.allocs.memory.usage["{#JOB.NAME}","{#JOB.TASK.GROUP}","{#JOB.NAMESPACE}"] Preprocessing
|
HashiCorp Nomad Client: Job ["{#JOB.NAME}"] Memory swapped | Amount of memory swapped by the ["{#JOB.NAME}"] job. |
Dependent item | nomad.client.allocs.memory.swap["{#JOB.NAME}","{#JOB.TASK.GROUP}","{#JOB.NAMESPACE}"] Preprocessing
|
This template is designed to monitor HashiCorp Nomad servers by Zabbix. It works without any external scripts.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
1. Set up the Nomad server according to the vendor documentation.
2. Set up the {$NOMAD.SERVER.API.SCHEME} and {$NOMAD.SERVER.API.PORT} macros to define the common Nomad API web schema and connection port.
Additional information:
The Nomad servers use the default web schema (HTTP) and the default API port (4646). If you're using servers discovery and need to redefine macros for a particular host created from a prototype, use context macros such as {$NOMAD.SERVER.API.SCHEME:NECESSARY.IP} and/or {$NOMAD.SERVER.API.PORT:NECESSARY.IP} at the master host or template level.
Adjust the {$NOMAD.REDUNDANCY.MIN} macro value, based on the number of nodes in your cluster, to configure the failure tolerance triggers correctly.
Useful links:
Name | Description | Default |
---|---|---|
{$NOMAD.SERVER.API.SCHEME} | Nomad SERVER API scheme. |
http |
{$NOMAD.SERVER.API.PORT} | Nomad SERVER API port. |
4646 |
{$NOMAD.TOKEN} | Nomad authentication token. |
<PUT YOUR AUTH TOKEN> |
{$NOMAD.DATA.TIMEOUT} | Response timeout for an API. |
15s |
{$NOMAD.HTTP.PROXY} | Sets the HTTP proxy for HTTP agent item. If this parameter is empty, then no proxy is used. |
|
{$NOMAD.API.RESPONSE.SUCCESS} | HTTP API successful response code. Availability triggers threshold. Change, if needed. |
200 |
{$NOMAD.SERVER.RPC.PORT} | Nomad RPC service port. |
4647 |
{$NOMAD.SERVER.SERF.PORT} | Nomad serf service port. |
4648 |
{$NOMAD.REDUNDANCY.MIN} | Number of redundant servers required to keep the cluster safe. The default value is '1' for a 3-node cluster. Change if needed. |
1 |
{$NOMAD.OPEN.FDS.MAX} | Maximum percentage of used file descriptors. |
90 |
{$NOMAD.SERVER.LEADER.LATENCY} | Leader last contact latency threshold. |
0.3s |
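The {$NOMAD.REDUNDANCY.MIN} macro above should reflect how many servers the cluster can lose while still keeping a Raft quorum. A minimal sketch of that arithmetic (general Raft math, offered only as a guide for choosing the value, not taken from the template):

```python
def raft_failure_tolerance(raft_peers: int) -> int:
    # A Raft cluster needs floor(n/2) + 1 voting peers for quorum;
    # everything beyond that is the failure tolerance.
    quorum = raft_peers // 2 + 1
    return raft_peers - quorum

# A 3-node cluster tolerates 1 failure, a 5-node cluster tolerates 2:
print(raft_failure_tolerance(3), raft_failure_tolerance(5))
```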
Name | Description | Type | Key and additional info |
---|---|---|---|
HashiCorp Nomad Server: Telemetry get | Telemetry data in raw format. |
HTTP agent | nomad.server.data.get Preprocessing
|
HashiCorp Nomad Server: Metrics | Nomad server metrics in raw format. |
Dependent item | nomad.server.metrics.get Preprocessing
|
HashiCorp Nomad Server: Monitoring API response | Monitoring API response message. |
Dependent item | nomad.server.data.api.response Preprocessing
|
HashiCorp Nomad Server: Internal stats get | Internal stats data in raw format. |
HTTP agent | nomad.server.stats.get Preprocessing
|
HashiCorp Nomad Server: Internal stats API response | Internal stats API response message. |
Dependent item | nomad.server.stats.api.response Preprocessing
|
HashiCorp Nomad Server: Nomad server version | Nomad server version. |
Dependent item | nomad.server.version Preprocessing
|
HashiCorp Nomad Server: Nomad raft version | Nomad raft version. |
Dependent item | nomad.raft.version Preprocessing
|
HashiCorp Nomad Server: Raft peers | Current cluster raft peers amount. |
Dependent item | nomad.server.raft.peers Preprocessing
|
HashiCorp Nomad Server: Cluster role | Current role in the cluster. |
Dependent item | nomad.server.raft.cluster_role Preprocessing
|
HashiCorp Nomad Server: CPU time, rate | Total user and system CPU time spent in seconds. |
Dependent item | nomad.server.cpu.time Preprocessing
|
HashiCorp Nomad Server: Memory used | Memory utilization in bytes. |
Dependent item | nomad.server.runtime.alloc_bytes Preprocessing
|
HashiCorp Nomad Server: Virtual memory size | Virtual memory size in bytes. |
Dependent item | nomad.server.virtualmemorybytes Preprocessing
|
HashiCorp Nomad Server: Resident memory size | Resident memory size in bytes. |
Dependent item | nomad.server.residentmemorybytes Preprocessing
|
HashiCorp Nomad Server: Heap objects | Number of objects on the heap. General memory pressure indicator. |
Dependent item | nomad.server.runtime.heap_objects Preprocessing
|
HashiCorp Nomad Server: Open file descriptors | Number of open file descriptors. |
Dependent item | nomad.server.processopenfds Preprocessing
|
HashiCorp Nomad Server: Open file descriptors, max | Maximum number of open file descriptors. |
Dependent item | nomad.server.processmaxfds Preprocessing
|
HashiCorp Nomad Server: Goroutines | Number of goroutines and general load pressure indicator. |
Dependent item | nomad.server.runtime.num_goroutines Preprocessing
|
HashiCorp Nomad Server: Evaluations pending | Evaluations that are pending until an existing evaluation for the same job completes. |
Dependent item | nomad.server.broker.total_pending Preprocessing
|
HashiCorp Nomad Server: Evaluations ready | Number of evaluations ready to be processed. |
Dependent item | nomad.server.broker.total_ready Preprocessing
|
HashiCorp Nomad Server: Evaluations unacked | Evaluations dispatched for processing but incomplete. |
Dependent item | nomad.server.broker.total_unacked Preprocessing
|
HashiCorp Nomad Server: CPU shares for blocked evaluations | Amount of CPU shares requested by blocked evals. |
Dependent item | nomad.server.blocked_evals.cpu Preprocessing
|
HashiCorp Nomad Server: Memory shares by blocked evaluations | Amount of memory requested by blocked evals. |
Dependent item | nomad.server.blocked_evals.memory Preprocessing
|
HashiCorp Nomad Server: CPU shares for blocked job evaluations | Amount of CPU shares requested by blocked evals of a job. |
Dependent item | nomad.server.blocked_evals.job.cpu Preprocessing
|
HashiCorp Nomad Server: Memory shares for blocked job evaluations | Amount of memory requested by blocked evals of a job. |
Dependent item | nomad.server.blocked_evals.job.memory Preprocessing
|
HashiCorp Nomad Server: Evaluations blocked | Count of evals in the blocked state for any reason (cluster resource exhaustion or quota limits). |
Dependent item | nomad.server.blockedevals.totalblocked Preprocessing
|
HashiCorp Nomad Server: Evaluations escaped | Count of evals that have escaped computed node classes. This indicates a scheduler optimization was skipped and is not usually a source of concern. |
Dependent item | nomad.server.blockedevals.totalescaped Preprocessing
|
HashiCorp Nomad Server: Evaluations waiting | Count of evals waiting to be enqueued. |
Dependent item | nomad.server.broker.total_waiting Preprocessing
|
HashiCorp Nomad Server: Evaluations blocked due to quota limit | Count of blocked evals due to quota limits (the resources for these jobs are not counted in other blockedevals metrics, except for totalblocked). |
Dependent item | nomad.server.blockedevals.totalquota_limit Preprocessing
|
HashiCorp Nomad Server: Evaluations enqueue time | Average time elapsed with evaluations waiting to be enqueued. |
Dependent item | nomad.server.broker.eval_waiting Preprocessing
|
HashiCorp Nomad Server: RPC evaluation acknowledgement time | Time elapsed for Eval.Ack RPC call. |
Dependent item | nomad.server.eval.ack Preprocessing
|
HashiCorp Nomad Server: RPC job summary time | Time elapsed for Job.Summary RPC call. |
Dependent item | nomad.server.jobsummary.getjob_summary Preprocessing
|
HashiCorp Nomad Server: Heartbeats active | Number of active heartbeat timers. Each timer represents a Nomad client connection. |
Dependent item | nomad.server.heartbeat.active Preprocessing
|
HashiCorp Nomad Server: RPC requests, rate | Number of RPC requests being handled. |
Dependent item | nomad.server.rpc.request Preprocessing
|
HashiCorp Nomad Server: RPC error requests, rate | Number of RPC requests being handled that result in an error. |
Dependent item | nomad.server.rpc.request_error Preprocessing
|
HashiCorp Nomad Server: RPC queries, rate | Number of RPC queries. |
Dependent item | nomad.server.rpc.query Preprocessing
|
HashiCorp Nomad Server: RPC job allocations time | Time elapsed for Job.Allocations RPC call. |
Dependent item | nomad.server.job.allocations Preprocessing
|
HashiCorp Nomad Server: RPC job evaluations time | Time elapsed for Job.Evaluations RPC call. |
Dependent item | nomad.server.job.evaluations Preprocessing
|
HashiCorp Nomad Server: RPC get job time | Time elapsed for Job.GetJob RPC call. |
Dependent item | nomad.server.job.get_job Preprocessing
|
HashiCorp Nomad Server: Plan apply time | Time elapsed to apply a plan. |
Dependent item | nomad.server.plan.apply Preprocessing
|
HashiCorp Nomad Server: Plan evaluate time | Time elapsed to evaluate a plan. |
Dependent item | nomad.server.plan.evaluate Preprocessing
|
HashiCorp Nomad Server: RPC plan submit time | Time elapsed for Plan.Submit RPC call. |
Dependent item | nomad.server.plan.submit Preprocessing
|
HashiCorp Nomad Server: Plan raft index processing time | Time elapsed that planner waits for the raft index of the plan to be processed. |
Dependent item | nomad.server.plan.waitforindex Preprocessing
|
HashiCorp Nomad Server: RPC list time | Time elapsed for Node.List RPC call. |
Dependent item | nomad.server.client.list Preprocessing
|
HashiCorp Nomad Server: RPC update allocations time | Time elapsed for Node.UpdateAlloc RPC call. |
Dependent item | nomad.server.client.update_alloc Preprocessing
|
HashiCorp Nomad Server: RPC update status time | Time elapsed for Node.UpdateStatus RPC call. |
Dependent item | nomad.server.client.update_status Preprocessing
|
HashiCorp Nomad Server: RPC get client allocs time | Time elapsed for Node.GetClientAllocs RPC call. |
Dependent item | nomad.server.client.getclientallocs Preprocessing
|
HashiCorp Nomad Server: RPC eval dequeue time | Time elapsed for Eval.Dequeue RPC call. |
Dependent item | nomad.server.client.dequeue Preprocessing
|
HashiCorp Nomad Server: Vault token last renewal | Time since last successful Vault token renewal. |
Dependent item | nomad.server.vault.tokenlastrenewal Preprocessing
|
HashiCorp Nomad Server: Vault token next renewal | Time until next Vault token renewal attempt. |
Dependent item | nomad.server.vault.tokennextrenewal Preprocessing
|
HashiCorp Nomad Server: Vault token TTL | Time to live for Vault token. |
Dependent item | nomad.server.vault.token_ttl Preprocessing
|
HashiCorp Nomad Server: Vault tokens revoked | Count of revoked tokens. |
Dependent item | nomad.server.vault.distributedtokensrevoked Preprocessing
|
HashiCorp Nomad Server: Jobs dead | Number of dead jobs. |
Dependent item | nomad.server.job_status.dead Preprocessing
|
HashiCorp Nomad Server: Jobs pending | Number of pending jobs. |
Dependent item | nomad.server.job_status.pending Preprocessing
|
HashiCorp Nomad Server: Jobs running | Number of running jobs. |
Dependent item | nomad.server.job_status.running Preprocessing
|
HashiCorp Nomad Server: Job allocations completed | Number of complete allocations for a job. |
Dependent item | nomad.server.job_summary.complete Preprocessing
|
HashiCorp Nomad Server: Job allocations failed | Number of failed allocations for a job. |
Dependent item | nomad.server.job_summary.failed Preprocessing
|
HashiCorp Nomad Server: Job allocations lost | Number of lost allocations for a job. |
Dependent item | nomad.server.job_summary.lost Preprocessing
|
HashiCorp Nomad Server: Job allocations unknown | Number of unknown allocations for a job. |
Dependent item | nomad.server.job_summary.unknown Preprocessing
|
HashiCorp Nomad Server: Job allocations queued | Number of queued allocations for a job. |
Dependent item | nomad.server.job_summary.queued Preprocessing
|
HashiCorp Nomad Server: Job allocations running | Number of running allocations for a job. |
Dependent item | nomad.server.job_summary.running Preprocessing
|
HashiCorp Nomad Server: Job allocations starting | Number of starting allocations for a job. |
Dependent item | nomad.server.job_summary.starting Preprocessing
|
HashiCorp Nomad Server: Gossip time | Time elapsed to broadcast gossip messages. |
Dependent item | nomad.server.memberlist.gossip Preprocessing
|
HashiCorp Nomad Server: Leader barrier time | Time elapsed to establish a raft barrier during leader transition. |
Dependent item | nomad.server.leader.barrier Preprocessing
|
HashiCorp Nomad Server: Reconcile peer time | Time elapsed to reconcile a serf peer with state store. |
Dependent item | nomad.server.leader.reconcile_member Preprocessing
|
HashiCorp Nomad Server: Total reconcile time | Time elapsed to reconcile all serf peers with state store. |
Dependent item | nomad.server.leader.reconcile Preprocessing
|
HashiCorp Nomad Server: Leader last contact | Time since last contact to leader. General indicator of Raft latency. |
Dependent item | nomad.server.raft.leader.lastContact Preprocessing
|
HashiCorp Nomad Server: Plan queue | Count of evals in the plan queue. |
Dependent item | nomad.server.plan.queue_depth Preprocessing
|
HashiCorp Nomad Server: Worker evaluation create time | Time elapsed for worker to create an eval. |
Dependent item | nomad.server.worker.create_eval Preprocessing
|
HashiCorp Nomad Server: Worker evaluation dequeue time | Time elapsed for worker to dequeue an eval. |
Dependent item | nomad.server.worker.dequeue_eval Preprocessing
|
HashiCorp Nomad Server: Worker invoke scheduler time | Time elapsed for worker to invoke the scheduler. |
Dependent item | nomad.server.worker.invokeschedulerservice Preprocessing
|
HashiCorp Nomad Server: Worker acknowledgement send time | Time elapsed for worker to send acknowledgement. |
Dependent item | nomad.server.worker.send_ack Preprocessing
|
HashiCorp Nomad Server: Worker submit plan time | Time elapsed for worker to submit plan. |
Dependent item | nomad.server.worker.submit_plan Preprocessing
|
HashiCorp Nomad Server: Worker update evaluation time | Time elapsed for worker to submit updated eval. |
Dependent item | nomad.server.worker.update_eval Preprocessing
|
HashiCorp Nomad Server: Worker log replication time | Time elapsed that worker waits for the raft index of the eval to be processed. |
Dependent item | nomad.server.worker.waitforindex Preprocessing
|
HashiCorp Nomad Server: Raft calls blocked, rate | Count of blocking raft API calls. |
Dependent item | nomad.server.raft.barrier Preprocessing
|
HashiCorp Nomad Server: Raft commit logs enqueued | Count of logs enqueued. |
Dependent item | nomad.server.raft.commitnumlogs Preprocessing
|
HashiCorp Nomad Server: Raft transactions, rate | Number of Raft transactions. |
Dependent item | nomad.server.raft.apply Preprocessing
|
HashiCorp Nomad Server: Raft commit time | Time elapsed to commit writes. |
Dependent item | nomad.server.raft.commit_time Preprocessing
|
HashiCorp Nomad Server: Raft transaction commit time | Raft transaction commit time. |
Dependent item | nomad.server.raft.replication.appendEntries Preprocessing
|
HashiCorp Nomad Server: FSM apply time | Time elapsed to apply write to FSM. |
Dependent item | nomad.server.raft.fsm.apply Preprocessing
|
HashiCorp Nomad Server: FSM enqueue time | Time elapsed to enqueue write to FSM. |
Dependent item | nomad.server.raft.fsm.enqueue Preprocessing
|
HashiCorp Nomad Server: FSM autopilot time | Time elapsed to apply Autopilot raft entry. |
Dependent item | nomad.server.raft.fsm.autopilot Preprocessing
|
HashiCorp Nomad Server: FSM register node time | Time elapsed to apply RegisterNode raft entry. |
Dependent item | nomad.server.raft.fsm.register_node Preprocessing
|
HashiCorp Nomad Server: FSM index | Current index applied to FSM. |
Dependent item | nomad.server.raft.applied_index Preprocessing
|
HashiCorp Nomad Server: Raft last index | Most recent index seen. |
Dependent item | nomad.server.raft.last_index Preprocessing
|
HashiCorp Nomad Server: Dispatch log time | Time elapsed to write log, mark in flight, and start replication. |
Dependent item | nomad.server.raft.leader.dispatch_log Preprocessing
|
HashiCorp Nomad Server: Logs dispatched | Count of logs dispatched. |
Dependent item | nomad.server.raft.leader.dispatchnumlogs Preprocessing
|
HashiCorp Nomad Server: Heartbeat fails | Count of failing to heartbeat and starting election. |
Dependent item | nomad.server.raft.transition.heartbeat_timeout Preprocessing
|
HashiCorp Nomad Server: Objects freed, rate | Count of objects freed from heap by go runtime GC. |
Dependent item | nomad.server.runtime.free_count Preprocessing
|
HashiCorp Nomad Server: GC pause time | Go runtime GC pause times. |
Dependent item | nomad.server.runtime.gcpausens Preprocessing
|
HashiCorp Nomad Server: GC metadata size | Go runtime GC metadata size in bytes. |
Dependent item | nomad.server.runtime.sys_bytes Preprocessing
|
HashiCorp Nomad Server: GC runs | Count of go runtime GC runs. |
Dependent item | nomad.server.runtime.totalgcruns Preprocessing
|
HashiCorp Nomad Server: Memberlist events | Count of memberlist events received. |
Dependent item | nomad.server.serf.queue.event Preprocessing
|
HashiCorp Nomad Server: Memberlist changes | Count of memberlist changes. |
Dependent item | nomad.server.serf.queue.intent Preprocessing
|
HashiCorp Nomad Server: Memberlist queries | Count of memberlist queries. |
Dependent item | nomad.server.serf.queue.queries Preprocessing
|
HashiCorp Nomad Server: Snapshot index | Current snapshot index. |
Dependent item | nomad.server.state.snapshot.index Preprocessing
|
HashiCorp Nomad Server: Services ready to schedule | Count of service evals ready to be scheduled. |
Dependent item | nomad.server.broker.service_ready Preprocessing
|
HashiCorp Nomad Server: Services unacknowledged | Count of unacknowledged service evals. |
Dependent item | nomad.server.broker.service_unacked Preprocessing
|
HashiCorp Nomad Server: System evaluations ready to schedule | Count of system evals ready to be scheduled. |
Dependent item | nomad.server.broker.system_ready Preprocessing
|
HashiCorp Nomad Server: System evaluations unacknowledged | Count of unacknowledged system evals. |
Dependent item | nomad.server.broker.system_unacked Preprocessing
|
HashiCorp Nomad Server: BoltDB free pages | Number of BoltDB free pages. |
Dependent item | nomad.server.raft.boltdb.num_free_pages Preprocessing
|
HashiCorp Nomad Server: BoltDB pending pages | Number of BoltDB pending pages. |
Dependent item | nomad.server.raft.boltdb.num_pending_pages Preprocessing
|
HashiCorp Nomad Server: BoltDB free page bytes | Number of free page bytes. |
Dependent item | nomad.server.raft.boltdb.free_page_bytes Preprocessing
|
HashiCorp Nomad Server: BoltDB freelist bytes | Number of freelist bytes. |
Dependent item | nomad.server.raft.boltdb.freelist_bytes Preprocessing
|
HashiCorp Nomad Server: BoltDB read transactions, rate | Count of total read transactions. |
Dependent item | nomad.server.raft.boltdb.total_read_txn Preprocessing
|
HashiCorp Nomad Server: BoltDB open read transactions | Number of current open read transactions. |
Dependent item | nomad.server.raft.boltdb.open_read_txn Preprocessing
|
HashiCorp Nomad Server: BoltDB pages in use | Number of pages in use. |
Dependent item | nomad.server.raft.boltdb.txstats.page_count Preprocessing
|
HashiCorp Nomad Server: BoltDB page allocations, rate | Number of page allocations. |
Dependent item | nomad.server.raft.boltdb.txstats.page_alloc Preprocessing
|
HashiCorp Nomad Server: BoltDB cursors | Count of total database cursors. |
Dependent item | nomad.server.raft.boltdb.txstats.cursor_count Preprocessing
|
HashiCorp Nomad Server: BoltDB nodes, rate | Count of total database nodes. |
Dependent item | nomad.server.raft.boltdb.txstats.node_count Preprocessing
|
HashiCorp Nomad Server: BoltDB node dereferences, rate | Count of total database node dereferences. |
Dependent item | nomad.server.raft.boltdb.txstats.node_deref Preprocessing
|
HashiCorp Nomad Server: BoltDB rebalance operations, rate | Count of total rebalance operations. |
Dependent item | nomad.server.raft.boltdb.txstats.rebalance Preprocessing
|
HashiCorp Nomad Server: BoltDB split operations, rate | Count of total split operations. |
Dependent item | nomad.server.raft.boltdb.txstats.split Preprocessing
|
HashiCorp Nomad Server: BoltDB spill operations, rate | Count of total spill operations. |
Dependent item | nomad.server.raft.boltdb.txstats.spill Preprocessing
|
HashiCorp Nomad Server: BoltDB write operations, rate | Count of total write operations. |
Dependent item | nomad.server.raft.boltdb.txstats.write Preprocessing
|
HashiCorp Nomad Server: BoltDB rebalance time | Sample of rebalance operation times. |
Dependent item | nomad.server.raft.boltdb.txstats.rebalance_time Preprocessing
|
HashiCorp Nomad Server: BoltDB spill time | Sample of spill operation times. |
Dependent item | nomad.server.raft.boltdb.txstats.spill_time Preprocessing
|
HashiCorp Nomad Server: BoltDB write time | Sample of write operation times. |
Dependent item | nomad.server.raft.boltdb.txstats.write_time Preprocessing
|
HashiCorp Nomad Server: Service [rpc] state | Current [rpc] service state. |
Simple check | net.tcp.service[tcp,,{$NOMAD.SERVER.RPC.PORT}] Preprocessing
|
HashiCorp Nomad Server: Service [serf] state | Current [serf] service state. |
Simple check | net.tcp.service[tcp,,{$NOMAD.SERVER.SERF.PORT}] Preprocessing
|
HashiCorp Nomad Server: Namespace list time | Time elapsed for Namespace.ListNamespaces. |
Dependent item | nomad.server.namespace.list_namespace Preprocessing
|
HashiCorp Nomad Server: Autopilot state | Current autopilot state. |
Dependent item | nomad.server.autopilot.state Preprocessing
|
HashiCorp Nomad Server: Autopilot failure tolerance | The number of redundant healthy servers that can fail without causing an outage. |
Dependent item | nomad.server.autopilot.failure_tolerance Preprocessing
|
HashiCorp Nomad Server: FSM allocation client update time | Time elapsed to apply AllocClientUpdate raft entry. |
Dependent item | nomad.server.alloc_client_update Preprocessing
|
HashiCorp Nomad Server: FSM apply plan results time | Time elapsed to apply ApplyPlanResults raft entry. |
Dependent item | nomad.server.fsm.apply_plan_results Preprocessing
|
HashiCorp Nomad Server: FSM update evaluation time | Time elapsed to apply UpdateEval raft entry. |
Dependent item | nomad.server.fsm.update_eval Preprocessing
|
HashiCorp Nomad Server: FSM job registration time | Time elapsed to apply RegisterJob raft entry. |
Dependent item | nomad.server.fsm.register_job Preprocessing
|
HashiCorp Nomad Server: Allocation reschedule attempts | Count of attempts to reschedule an allocation. |
Dependent item | nomad.server.scheduler.allocs.rescheduled.attempted Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
HashiCorp Nomad Server: Monitoring API connection has failed | Monitoring API connection has failed. |
find(/HashiCorp Nomad Server by HTTP/nomad.server.data.api.response,,"like","{$NOMAD.API.RESPONSE.SUCCESS}")=0 |Average |
Manual close: Yes | |
HashiCorp Nomad Server: Internal stats API connection has failed | Internal stats API connection has failed. |
find(/HashiCorp Nomad Server by HTTP/nomad.server.stats.api.response,,"like","{$NOMAD.API.RESPONSE.SUCCESS}")=0 |Average |
Manual close: Yes Depends on:
|
|
HashiCorp Nomad Server: Nomad server version has changed | Nomad server version has changed. |
change(/HashiCorp Nomad Server by HTTP/nomad.server.version)<>0 |Info |
Manual close: Yes | |
HashiCorp Nomad Server: Cluster role has changed | Cluster role has changed. |
change(/HashiCorp Nomad Server by HTTP/nomad.server.raft.cluster_role) <> 0 |Info |
Manual close: Yes | |
HashiCorp Nomad Server: Current number of open files is too high | Heavy file descriptor usage (i.e., near the process file descriptor limit) indicates a potential file descriptor exhaustion issue. |
min(/HashiCorp Nomad Server by HTTP/nomad.server.process_open_fds,5m)/last(/HashiCorp Nomad Server by HTTP/nomad.server.process_max_fds)*100>{$NOMAD.OPEN.FDS.MAX} |Warning |
||
HashiCorp Nomad Server: Dead jobs found | Jobs with the "dead" status have been found. |
last(/HashiCorp Nomad Server by HTTP/nomad.server.job_status.dead) > 0 and nodata(/HashiCorp Nomad Server by HTTP/nomad.server.job_status.dead,5m) = 0 |Warning |
Manual close: Yes | |
HashiCorp Nomad Server: Leader last contact timeout exceeded | The nomad.raft.leader.lastContact metric is a general indicator of Raft latency which can be used to observe how Raft timing is performing and guide infrastructure provisioning. |
min(/HashiCorp Nomad Server by HTTP/nomad.server.raft.leader.lastContact,5m) >= {$NOMAD.SERVER.LEADER.LATENCY} and nodata(/HashiCorp Nomad Server by HTTP/nomad.server.raft.leader.lastContact,5m) = 0 |Warning |
||
HashiCorp Nomad Server: Service [rpc] is down | Cannot establish the connection to [rpc] service port {$NOMAD.SERVER.RPC.PORT}. |
last(/HashiCorp Nomad Server by HTTP/net.tcp.service[tcp,,{$NOMAD.SERVER.RPC.PORT}]) = 0 |Average |
Manual close: Yes | |
HashiCorp Nomad Server: Service [serf] is down | Cannot establish the connection to [serf] service port {$NOMAD.SERVER.SERF.PORT}. |
last(/HashiCorp Nomad Server by HTTP/net.tcp.service[tcp,,{$NOMAD.SERVER.SERF.PORT}]) = 0 |Average |
Manual close: Yes | |
HashiCorp Nomad Server: Autopilot is unhealthy | The autopilot is in unhealthy state. The successful failover probability is extremely low. |
last(/HashiCorp Nomad Server by HTTP/nomad.server.autopilot.state) = 0 and nodata(/HashiCorp Nomad Server by HTTP/nomad.server.autopilot.state,5m) = 0 |Average |
Manual close: Yes | |
HashiCorp Nomad Server: Autopilot redundancy is low | The autopilot redundancy is low. |
last(/HashiCorp Nomad Server by HTTP/nomad.server.autopilot.failure_tolerance) < {$NOMAD.REDUNDANCY.MIN} and nodata(/HashiCorp Nomad Server by HTTP/nomad.server.autopilot.failure_tolerance,5m) = 0 |Warning |
Manual close: Yes |
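The items and triggers above rely on Nomad's HTTP API (the monitoring and internal stats endpoints). As a quick connectivity check, you can query the metrics endpoint directly (a minimal sketch, assuming the default HTTP port 4646 and placeholder host and token values; omit the token header if ACLs are disabled):
# Agent telemetry in JSON
curl -s -H "X-Nomad-Token: <token>" "http://<nomad-server>:4646/v1/metrics"
# The same data in Prometheus text format
curl -s "http://<nomad-server>:4646/v1/metrics?format=prometheus"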
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for the effortless deployment of Nginx Plus monitoring by Zabbix via HTTP and doesn't require any external scripts.
The monitoring data of the live activity is generated by the NGINX Plus API.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Set the {$NGINX.API.ENDPOINT} macro to the NGINX Plus API URL in the format <scheme>://<host>:<port>/<location>/.
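For example, you can confirm that the endpoint is reachable before linking the template to it (an illustrative check only; the host, API version, and location below are placeholders for your own configuration):
curl -s "http://localhost/api/9/nginx"
curl -s "http://localhost/api/9/http/server_zones"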
Note that, depending on the number of zones and upstreams, the discovery operation may be expensive. Therefore, use the following filters with these macros:
Name | Description | Default |
---|---|---|
{$NGINX.API.ENDPOINT} | NGINX Plus API URL in the format <scheme>://<host>:<port>/<location>/. |
|
{$NGINX.LLD.FILTER.HTTP.ZONE.MATCHES} | The filter to include the necessary discovered HTTP server zones. |
.* |
{$NGINX.LLD.FILTER.HTTP.ZONE.NOT_MATCHES} | The filter to exclude discovered HTTP server zones. |
CHANGE_IF_NEEDED |
{$NGINX.LLD.FILTER.HTTP.LOCATION.ZONE.MATCHES} | The filter to include the necessary discovered HTTP location zones. |
.* |
{$NGINX.LLD.FILTER.HTTP.LOCATION.ZONE.NOT_MATCHES} | The filter to exclude discovered HTTP location zones. |
CHANGE_IF_NEEDED |
{$NGINX.LLD.FILTER.HTTP.UPSTREAM.MATCHES} | The filter to include the necessary discovered HTTP upstreams. |
.* |
{$NGINX.LLD.FILTER.HTTP.UPSTREAM.NOT_MATCHES} | The filter to exclude discovered HTTP upstreams. |
CHANGE_IF_NEEDED |
{$NGINX.LLD.FILTER.STREAM.ZONE.MATCHES} | The filter to include discovered server zones of the "stream" directive. |
.* |
{$NGINX.LLD.FILTER.STREAM.ZONE.NOT_MATCHES} | The filter to exclude discovered server zones of the "stream" directive. |
CHANGE_IF_NEEDED |
{$NGINX.LLD.FILTER.STREAM.UPSTREAM.MATCHES} | The filter to include the necessary discovered upstreams of the "stream" directive. |
.* |
{$NGINX.LLD.FILTER.STREAM.UPSTREAM.NOT_MATCHES} | The filter to exclude discovered upstreams of the "stream" directive |
CHANGE_IF_NEEDED |
{$NGINX.LLD.FILTER.RESOLVER.MATCHES} | The filter to include the necessary discovered resolvers. |
.* |
{$NGINX.LLD.FILTER.RESOLVER.NOT_MATCHES} | The filter to exclude discovered resolvers. |
CHANGE_IF_NEEDED |
{$NGINX.DROP_RATE.MAX.WARN} | The critical rate of the dropped connections for a trigger expression. |
1 |
{$NGINX.HTTP.UPSTREAM.4XX.MAX.WARN} | The maximum percentage of errors with the status code 4xx for a trigger expression. |
5 |
{$NGINX.HTTP.UPSTREAM.5XX.MAX.WARN} | The maximum percentage of errors with the status code 5xx for a trigger expression. |
5 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Nginx: Get info | Returns the status of the running NGINX instance. |
HTTP agent | nginx.info |
Nginx: Get connections | Returns the statistics of client connections. |
HTTP agent | nginx.connections |
Nginx: Get SSL | Returns the SSL statistics. |
HTTP agent | nginx.ssl |
Nginx: Get requests | Returns the status of the client's HTTP requests. |
HTTP agent | nginx.requests |
Nginx: Get HTTP zones | Returns the status information for each HTTP server zone. |
HTTP agent | nginx.http.server_zones |
Nginx: Get HTTP location zones | Returns the status information for each HTTP location zone. |
HTTP agent | nginx.http.location_zones |
Nginx: Get HTTP upstreams | Returns the status of each HTTP upstream server group and its servers. |
HTTP agent | nginx.http.upstreams |
Nginx: Get Stream server zones | Returns the status information for each server zone configured in the "stream" directive. |
HTTP agent | nginx.stream.server_zones |
Nginx: Get Stream upstreams | Returns status of each stream upstream server group and its servers. |
HTTP agent | nginx.stream.upstreams |
Nginx: Get resolvers | Returns the status information for each Resolver zone. |
HTTP agent | nginx.resolvers |
Nginx: Get info error | The description of NGINX errors. |
Dependent item | nginx.info.error Preprocessing
|
Nginx: Version | A version number of NGINX. |
Dependent item | nginx.info.version Preprocessing
|
Nginx: Address | The address of the server that accepted the status request. |
Dependent item | nginx.info.address Preprocessing
|
Nginx: Generation | The total number of configuration reloads. |
Dependent item | nginx.info.generation Preprocessing
|
Nginx: Uptime | The server uptime. |
Dependent item | nginx.info.uptime Preprocessing
|
Nginx: Connections accepted, rate | The total number of accepted client connections per second. |
Dependent item | nginx.connections.accepted.rate Preprocessing
|
Nginx: Connections dropped | The total number of dropped client connections. |
Dependent item | nginx.connections.dropped Preprocessing
|
Nginx: Connections active | The current number of active client connections. |
Dependent item | nginx.connections.active Preprocessing
|
Nginx: Connections idle | The current number of idle client connections. |
Dependent item | nginx.connections.idle Preprocessing
|
Nginx: SSL handshakes, rate | The total number of successful SSL handshakes per second. |
Dependent item | nginx.ssl.handshakes.rate Preprocessing
|
Nginx: SSL handshakes failed, rate | The total number of failed SSL handshakes per second. |
Dependent item | nginx.ssl.handshakes_failed.rate Preprocessing
|
Nginx: SSL session reuses, rate | The total number of session reuses during SSL handshake per second. |
Dependent item | nginx.ssl.session_reuses.rate Preprocessing
|
Nginx: Requests total, rate | The total number of client requests per second. |
Dependent item | nginx.requests.total.rate Preprocessing
|
Nginx: Requests current | The current number of client requests. |
Dependent item | nginx.requests.current Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Nginx: Server response error | length(last(/NGINX Plus by HTTP/nginx.info.error))>0 |High |
|||
Nginx: Version has changed | The Nginx version has changed. Acknowledge to close the problem manually. |
last(/NGINX Plus by HTTP/nginx.info.version,#1)<>last(/NGINX Plus by HTTP/nginx.info.version,#2) and length(last(/NGINX Plus by HTTP/nginx.info.version))>0 |Info |
Manual close: Yes | |
Nginx: Host has been restarted | Uptime is less than 10 minutes. |
last(/NGINX Plus by HTTP/nginx.info.uptime)<10m |Info |
Manual close: Yes | |
Nginx: Failed to fetch info data | Zabbix has not received any data for metrics for the last 30 minutes |
nodata(/NGINX Plus by HTTP/nginx.info.uptime,30m)=1 |Warning |
Manual close: Yes | |
Nginx: High connections drop rate | The rate of dropped connections is greater than {$NGINX.DROP_RATE.MAX.WARN} for the last 5 minutes. |
min(/NGINX Plus by HTTP/nginx.connections.dropped,5m) > {$NGINX.DROP_RATE.MAX.WARN} |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
HTTP server zones discovery | Dependent item | nginx.http.server_zones.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Nginx: HTTP server zone [{#NAME}]: Raw data | The raw data of the HTTP server zone with the name {#NAME}. |
Dependent item | nginx.http.server_zones.raw[{#NAME}] Preprocessing
|
Nginx: HTTP server zone [{#NAME}]: Processing | The number of client requests that are currently being processed. |
Dependent item | nginx.http.server_zones.processing[{#NAME}] Preprocessing
|
Nginx: HTTP server zone [{#NAME}]: Requests, rate | The total number of client requests received from clients per second. |
Dependent item | nginx.http.server_zones.requests.rate[{#NAME}] Preprocessing
|
Nginx: HTTP server zone [{#NAME}]: Responses 1xx, rate | The number of responses with 1xx status codes per second. |
Dependent item | nginx.http.server_zones.responses.1xx.rate[{#NAME}] Preprocessing
|
Nginx: HTTP server zone [{#NAME}]: Responses 2xx, rate | The number of responses with 2xx status codes per second. |
Dependent item | nginx.http.server_zones.responses.2xx.rate[{#NAME}] Preprocessing
|
Nginx: HTTP server zone [{#NAME}]: Responses 3xx, rate | The number of responses with 3xx status codes per second. |
Dependent item | nginx.http.server_zones.responses.3xx.rate[{#NAME}] Preprocessing
|
Nginx: HTTP server zone [{#NAME}]: Responses 4xx, rate | The number of responses with 4xx status codes per second. |
Dependent item | nginx.http.server_zones.responses.4xx.rate[{#NAME}] Preprocessing
|
Nginx: HTTP server zone [{#NAME}]: Responses 5xx, rate | The number of responses with 5xx status codes per second. |
Dependent item | nginx.http.server_zones.responses.5xx.rate[{#NAME}] Preprocessing
|
Nginx: HTTP server zone [{#NAME}]: Responses total, rate | The total number of responses sent to clients per second. |
Dependent item | nginx.http.server_zones.responses.total.rate[{#NAME}] Preprocessing
|
Nginx: HTTP server zone [{#NAME}]: Discarded, rate | The total number of requests completed without sending a response per second. |
Dependent item | nginx.http.server_zones.discarded.rate[{#NAME}] Preprocessing
|
Nginx: HTTP server zone [{#NAME}]: Received, rate | The total number of bytes received from clients per second. |
Dependent item | nginx.http.server_zones.received.rate[{#NAME}] Preprocessing
|
Nginx: HTTP server zone [{#NAME}]: Sent, rate | The total number of bytes sent to clients per second. |
Dependent item | nginx.http.server_zones.sent.rate[{#NAME}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
HTTP location zones discovery | Dependent item | nginx.http.location_zones.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Nginx: HTTP location zone [{#NAME}]: Raw data | The raw data of the location zone with the name {#NAME}. |
Dependent item | nginx.http.location_zones.raw[{#NAME}] Preprocessing
|
Nginx: HTTP location zone [{#NAME}]: Requests, rate | The total number of client requests received from clients per second. |
Dependent item | nginx.http.location_zones.requests.rate[{#NAME}] Preprocessing
|
Nginx: HTTP location zone [{#NAME}]: Responses 1xx, rate | The number of responses with 1xx status codes per second. |
Dependent item | nginx.http.location_zones.responses.1xx.rate[{#NAME}] Preprocessing
|
Nginx: HTTP location zone [{#NAME}]: Responses 2xx, rate | The number of responses with 2xx status codes per second. |
Dependent item | nginx.http.location_zones.responses.2xx.rate[{#NAME}] Preprocessing
|
Nginx: HTTP location zone [{#NAME}]: Responses 3xx, rate | The number of responses with 3xx status codes per second. |
Dependent item | nginx.http.location_zones.responses.3xx.rate[{#NAME}] Preprocessing
|
Nginx: HTTP location zone [{#NAME}]: Responses 4xx, rate | The number of responses with 4xx status codes per second. |
Dependent item | nginx.http.location_zones.responses.4xx.rate[{#NAME}] Preprocessing
|
Nginx: HTTP location zone [{#NAME}]: Responses 5xx, rate | The number of responses with 5xx status codes per second. |
Dependent item | nginx.http.location_zones.responses.5xx.rate[{#NAME}] Preprocessing
|
Nginx: HTTP location zone [{#NAME}]: Responses total, rate | The total number of responses sent to clients per second. |
Dependent item | nginx.http.location_zones.responses.total.rate[{#NAME}] Preprocessing
|
Nginx: HTTP location zone [{#NAME}]: Discarded, rate | The total number of requests completed without sending a response per second. |
Dependent item | nginx.http.location_zones.discarded.rate[{#NAME}] Preprocessing
|
Nginx: HTTP location zone [{#NAME}]: Received, rate | The total number of bytes received from clients per second. |
Dependent item | nginx.http.location_zones.received.rate[{#NAME}] Preprocessing
|
Nginx: HTTP location zone [{#NAME}]: Sent, rate | The total number of bytes sent to clients per second. |
Dependent item | nginx.http.location_zones.sent.rate[{#NAME}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
HTTP upstreams discovery | Dependent item | nginx.http.upstreams.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Nginx: HTTP upstream [{#NAME}]: Raw data | The raw data of the HTTP upstream with the name {#NAME}. |
Dependent item | nginx.http.upstreams.raw[{#NAME}] Preprocessing
|
Nginx: HTTP upstream [{#NAME}]: Keepalive | The current number of idle keepalive connections. |
Dependent item | nginx.http.upstreams.keepalive[{#NAME}] Preprocessing
|
Nginx: HTTP upstream [{#NAME}]: Zombies | The current number of servers removed from the group but still processing active client requests. |
Dependent item | nginx.http.upstreams.zombies[{#NAME}] Preprocessing
|
Nginx: HTTP upstream [{#NAME}]: Zone | The name of the shared memory zone that keeps the group's configuration and run-time state. |
Dependent item | nginx.http.upstreams.zone[{#NAME}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
HTTP upstream peers discovery | Dependent item | nginx.http.upstream.peers.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Nginx: HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Raw data | The raw data of the HTTP upstream with the name {#UPSTREAM} and the peer {#PEER}. |
Dependent item | nginx.http.upstream.peer.raw[{#UPSTREAM},{#PEER}] Preprocessing
|
Nginx: HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: State | The current state, which may be one of “up”, “draining”, “down”, “unavail”, “checking”, and “unhealthy”. |
Dependent item | nginx.http.upstream.peer.state[{#UPSTREAM},{#PEER}] Preprocessing
|
Nginx: HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Active | The current number of active connections. |
Dependent item | nginx.http.upstream.peer.active[{#UPSTREAM},{#PEER}] Preprocessing
|
Nginx: HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Requests, rate | The total number of client requests forwarded to this server per second. |
Dependent item | nginx.http.upstream.peer.requests.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
Nginx: HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Responses 1xx, rate | The number of responses with 1xx status codes per second. |
Dependent item | nginx.http.upstream.peer.responses.1xx.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
Nginx: HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Responses 2xx, rate | The number of responses with 2xx status codes per second. |
Dependent item | nginx.http.upstream.peer.responses.2xx.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
Nginx: HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Responses 3xx, rate | The number of responses with 3xx status codes per second. |
Dependent item | nginx.http.upstream.peer.responses.3xx.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
Nginx: HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Responses 4xx, rate | The number of responses with 4xx status codes per second. |
Dependent item | nginx.http.upstream.peer.responses.4xx.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
Nginx: HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Responses 5xx, rate | The number of responses with 5xx status codes per second. |
Dependent item | nginx.http.upstream.peer.responses.5xx.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
Nginx: HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Responses total, rate | The total number of responses obtained from this server. |
Dependent item | nginx.http.upstream.peer.responses.total.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
Nginx: HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Sent, rate | The total number of bytes sent to this server per second. |
Dependent item | nginx.http.upstream.peer.sent.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
Nginx: HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Received, rate | The total number of bytes received from this server per second. |
Dependent item | nginx.http.upstream.peer.received.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
Nginx: HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Fails, rate | The total number of unsuccessful attempts to communicate with the server per second. |
Dependent item | nginx.http.upstream.peer.fails.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
Nginx: HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Unavail | Displays how many times the server has become unavailable for client requests (the state - “unavail”) due to the number of unsuccessful attempts reaching the max_fails threshold. |
Dependent item | nginx.http.upstream.peer.unavail.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
Nginx: HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Header time | The average time to get the response header from the server. |
Dependent item | nginx.http.upstream.peer.header_time.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
Nginx: HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Response time | The average time to get the full response from the server. |
Dependent item | nginx.http.upstream.peer.response_time.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
Nginx: HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Health checks, check | The total number of health check requests made. |
Dependent item | nginx.http.upstream.peer.health_checks.checks[{#UPSTREAM},{#PEER}] Preprocessing
|
Nginx: HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Health checks, fails | The number of failed health checks. |
Dependent item | nginx.http.upstream.peer.health_checks.fails[{#UPSTREAM},{#PEER}] Preprocessing
|
Nginx: HTTP upstream [{#UPSTREAM}] peer [{#PEER}]: Health checks, unhealthy | Displays how many times the server has become unhealthy (the state - “unhealthy”). |
Dependent item | nginx.http.upstream.peer.health_checks.unhealthy[{#UPSTREAM},{#PEER}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Nginx: HTTP upstream server is not in UP or DOWN state. | find(/NGINX Plus by HTTP/nginx.http.upstream.peer.state[{#UPSTREAM},{#PEER}],,"like","up")=0 and find(/NGINX Plus by HTTP/nginx.http.upstream.peer.state[{#UPSTREAM},{#PEER}],,"like","down")=0 |Warning |
|||
Nginx: Too many HTTP requests with code 4xx | sum(/NGINX Plus by HTTP/nginx.http.upstream.peer.responses.4xx.rate[{#UPSTREAM},{#PEER}],5m) > (sum(/NGINX Plus by HTTP/nginx.http.upstream.peer.responses.total.rate[{#UPSTREAM},{#PEER}],5m)*({$NGINX.HTTP.UPSTREAM.4XX.MAX.WARN}/100)) |Warning |
|||
Nginx: Too many HTTP requests with code 5xx | sum(/NGINX Plus by HTTP/nginx.http.upstream.peer.responses.5xx.rate[{#UPSTREAM},{#PEER}],5m) > (sum(/NGINX Plus by HTTP/nginx.http.upstream.peer.responses.total.rate[{#UPSTREAM},{#PEER}],5m)*({$NGINX.HTTP.UPSTREAM.5XX.MAX.WARN}/100)) |High |
Name | Description | Type | Key and additional info |
---|---|---|---|
Stream server zones discovery | Dependent item | nginx.stream.server_zones.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Nginx: Stream server zone [{#NAME}]: Raw data | The raw data of the server zone with the name {#NAME}. |
Dependent item | nginx.stream.server_zones.raw[{#NAME}] Preprocessing
|
Nginx: Stream server zone [{#NAME}]: Processing | The number of client connections that are currently being processed. |
Dependent item | nginx.stream.server_zones.processing[{#NAME}] Preprocessing
|
Nginx: Stream server zone [{#NAME}]: Connections, rate | The total number of connections accepted from clients per second. |
Dependent item | nginx.stream.server_zones.connections.rate[{#NAME}] Preprocessing
|
Nginx: Stream server zone [{#NAME}]: Sessions 2xx, rate | The total number of sessions completed with status code 2xx per second. |
Dependent item | nginx.stream.server_zones.sessions.2xx.rate[{#NAME}] Preprocessing
|
Nginx: Stream server zone [{#NAME}]: Sessions 4xx, rate | The total number of sessions completed with status code 4xx per second. |
Dependent item | nginx.stream.server_zones.sessions.4xx.rate[{#NAME}] Preprocessing
|
Nginx: Stream server zone [{#NAME}]: Sessions 5xx, rate | The total number of sessions completed with status code 5xx per second. |
Dependent item | nginx.stream.server_zones.sessions.5xx.rate[{#NAME}] Preprocessing
|
Nginx: Stream server zone [{#NAME}]: Sessions total, rate | The total number of completed client sessions per second. |
Dependent item | nginx.stream.server_zones.sessions.total.rate[{#NAME}] Preprocessing
|
Nginx: Stream server zone [{#NAME}]: Discarded, rate | The total number of connections completed without creating a session per second. |
Dependent item | nginx.stream.server_zones.discarded.rate[{#NAME}] Preprocessing
|
Nginx: Stream server zone [{#NAME}]: Received, rate | The total number of bytes received from clients per second. |
Dependent item | nginx.stream.server_zones.received.rate[{#NAME}] Preprocessing
|
Nginx: Stream server zone [{#NAME}]: Sent, rate | The total number of bytes sent to clients per second. |
Dependent item | nginx.stream.server_zones.sent.rate[{#NAME}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Stream upstreams discovery | Dependent item | nginx.stream.upstreams.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Nginx: Stream upstream [{#NAME}]: Raw data | The raw data of the upstream with the name {#NAME}. |
Dependent item | nginx.stream.upstreams.raw[{#NAME}] Preprocessing
|
Nginx: Stream upstream [{#NAME}]: Zombies | Dependent item | nginx.stream.upstreams.zombies[{#NAME}] Preprocessing
|
|
Nginx: Stream upstream [{#NAME}]: Zone | Dependent item | nginx.stream.upstreams.zone[{#NAME}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Stream upstream peers discovery | Dependent item | nginx.stream.upstream.peers.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Nginx: Stream upstream [{#UPSTREAM}] peer [{#PEER}]: Raw data | The raw data of the upstream with the name {#UPSTREAM} and the peer {#PEER}. |
Dependent item | nginx.stream.upstream.peer.raw[{#UPSTREAM},{#PEER}] Preprocessing
|
Nginx: Stream upstream [{#UPSTREAM}] peer [{#PEER}]: State | The current state, which may be one of “up”, “draining”, “down”, “unavail”, “checking”, and “unhealthy”. |
Dependent item | nginx.stream.upstream.peer.state[{#UPSTREAM},{#PEER}] Preprocessing
|
Nginx: Stream upstream [{#UPSTREAM}] peer [{#PEER}]: Active | The current number of connections. |
Dependent item | nginx.stream.upstream.peer.active[{#UPSTREAM},{#PEER}] Preprocessing
|
Nginx: Stream upstream [{#UPSTREAM}] peer [{#PEER}]: Sent, rate | The total number of bytes sent to this server per second. |
Dependent item | nginx.stream.upstream.peer.sent.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
Nginx: Stream upstream [{#UPSTREAM}] peer [{#PEER}]: Received, rate | The total number of bytes received from this server per second. |
Dependent item | nginx.stream.upstream.peer.received.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
Nginx: Stream upstream [{#UPSTREAM}] peer [{#PEER}]: Fails, rate | The total number of unsuccessful attempts to communicate with the server per second. |
Dependent item | nginx.stream.upstream.peer.fails.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
Nginx: Stream upstream [{#UPSTREAM}] peer [{#PEER}]: Unavail | Displays how many times the server has become unavailable for client requests (the state - “unavail”) due to the number of unsuccessful attempts reaching the max_fails threshold. |
Dependent item | nginx.stream.upstream.peer.unavail.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
Nginx: Stream upstream [{#UPSTREAM}] peer [{#PEER}]: Connections | The total number of client connections forwarded to this server. |
Dependent item | nginx.stream.upstream.peer.connections.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
Nginx: Stream upstream [{#UPSTREAM}] peer [{#PEER}]: Connect time | The average time to connect to the upstream server. |
Dependent item | nginx.stream.upstream.peer.connect_time.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
Nginx: Stream upstream [{#UPSTREAM}] peer [{#PEER}]: First byte time | The average time to receive the first byte of data. |
Dependent item | nginx.stream.upstream.peer.first_byte_time.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
Nginx: Stream upstream [{#UPSTREAM}] peer [{#PEER}]: Response time | The average time to receive the last byte of data. |
Dependent item | nginx.stream.upstream.peer.response_time.rate[{#UPSTREAM},{#PEER}] Preprocessing
|
Nginx: Stream upstream [{#UPSTREAM}] peer [{#PEER}]: Health checks, check | The total number of health check requests made. |
Dependent item | nginx.stream.upstream.peer.health_checks.checks[{#UPSTREAM},{#PEER}] Preprocessing
|
Nginx: Stream upstream [{#UPSTREAM}] peer [{#PEER}]: Health checks, fails | The number of failed health checks. |
Dependent item | nginx.stream.upstream.peer.health_checks.fails[{#UPSTREAM},{#PEER}] Preprocessing
|
Nginx: Stream upstream [{#UPSTREAM}] peer [{#PEER}]: Health checks, unhealthy | Displays how many times the server has become unhealthy (the state - “unhealthy”). |
Dependent item | nginx.stream.upstream.peer.health_checks.unhealthy[{#UPSTREAM},{#PEER}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Nginx: Stream upstream server is not in UP or DOWN state. | find(/NGINX Plus by HTTP/nginx.stream.upstream.peer.state[{#UPSTREAM},{#PEER}],,"like","up")=0 and find(/NGINX Plus by HTTP/nginx.stream.upstream.peer.state[{#UPSTREAM},{#PEER}],,"like","down")=0 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Resolvers discovery | Dependent item | nginx.resolvers.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Nginx: Resolver [{#NAME}]: Raw data | The raw data of the resolver with the name {#NAME}. |
Dependent item | nginx.resolvers.raw[{#NAME}] Preprocessing
|
Nginx: Resolver [{#NAME}]: Requests name, rate | The total number of requests to resolve names to addresses per second. |
Dependent item | nginx.resolvers.requests.name.rate[{#NAME}] Preprocessing
|
Nginx: Resolver [{#NAME}]: Requests srv, rate | The total number of requests to resolve SRV records per second. |
Dependent item | nginx.resolvers.requests.srv.rate[{#NAME}] Preprocessing
|
Nginx: Resolver [{#NAME}]: Requests addr, rate | The total number of requests to resolve addresses to names per second. |
Dependent item | nginx.resolvers.requests.addr.rate[{#NAME}] Preprocessing
|
Nginx: Resolver [{#NAME}]: Responses noerror, rate | The total number of successful responses per second. |
Dependent item | nginx.resolvers.responses.noerror.rate[{#NAME}] Preprocessing
|
Nginx: Resolver [{#NAME}]: Responses formerr, rate | The total number of FORMERR (format error) responses per second. |
Dependent item | nginx.resolvers.responses.formerr.rate[{#NAME}] Preprocessing
|
Nginx: Resolver [{#NAME}]: Responses servfail, rate | The total number of SERVFAIL (server failure) responses per second. |
Dependent item | nginx.resolvers.responses.servfail.rate[{#NAME}] Preprocessing
|
Nginx: Resolver [{#NAME}]: Responses nxdomain, rate | The total number of NXDOMAIN (host not found) responses per second. |
Dependent item | nginx.resolvers.responses.nxdomain.rate[{#NAME}] Preprocessing
|
Nginx: Resolver [{#NAME}]: Responses notimp, rate | The total number of NOTIMP (unimplemented) responses per second. |
Dependent item | nginx.resolvers.responses.notimp.rate[{#NAME}] Preprocessing
|
Nginx: Resolver [{#NAME}]: Responses refused, rate | The total number of REFUSED (operation refused) responses per second. |
Dependent item | nginx.resolvers.responses.refused.rate[{#NAME}] Preprocessing
|
Nginx: Resolver [{#NAME}]: Responses timedout, rate | The total number of timed out requests per second. |
Dependent item | nginx.resolvers.responses.timedout.rate[{#NAME}] Preprocessing
|
Nginx: Resolver [{#NAME}]: Responses unknown, rate | The total number of requests completed with an unknown error per second. |
Dependent item | nginx.resolvers.responses.unknown.rate[{#NAME}] Preprocessing
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is developed to monitor Nginx with Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The template collects metrics by polling the module ngx_http_stub_status_module
with HTTP agent remotely:
Active connections: 291
server accepts handled requests
16630948 16630948 31070465
Reading: 6 Writing: 179 Waiting: 106
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Set up the ngx_http_stub_status_module. Test the availability of the http_stub_status_module with:
nginx -V 2>&1 | grep -o with-http_stub_status_module
Example configuration of Nginx:
location = /basic_status {
stub_status;
allow <IP of your Zabbix server/proxy>;
deny all;
}
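With this location configured, you can verify that the status page responds before applying the template (an illustrative check; replace the host and path with your own values):
curl -s http://<nginx-host>/basic_status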
Set the hostname or IP address of the Nginx stub_status host or container in the {$NGINX.STUB_STATUS.HOST} macro. You can also change the status page port in the {$NGINX.STUB_STATUS.PORT} macro, the status page scheme in the {$NGINX.STUB_STATUS.SCHEME} macro, and the status page path in the {$NGINX.STUB_STATUS.PATH} macro if necessary.
Example answer from Nginx:
Active connections: 291
server accepts handled requests
16630948 16630948 31070465
Reading: 6 Writing: 179 Waiting: 106
Name | Description | Default |
---|---|---|
{$NGINX.STUB_STATUS.HOST} | The hostname or IP address of the Nginx host or Nginx container of a stub_status. |
<SET STUB_STATUS HOST> |
{$NGINX.STUB_STATUS.SCHEME} | The protocol http or https of Nginx stub_status host or container. |
http |
{$NGINX.STUB_STATUS.PATH} | The path of the stub_status page. |
basic_status |
{$NGINX.STUB_STATUS.PORT} | The port of the stub_status host or container. |
80 |
{$NGINX.RESPONSE_TIME.MAX.WARN} | The maximum response time of Nginx expressed in seconds for a trigger expression. |
10 |
{$NGINX.DROP_RATE.MAX.WARN} | The critical rate of the dropped connections for a trigger expression. |
1 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Nginx: Get stub status page | The following status information is provided:
See also Module ngx_http_stub_status_module. |
HTTP agent | nginx.get_stub_status |
Nginx: Service status | Simple check | net.tcp.service[http,"{$NGINX.STUB_STATUS.HOST}","{$NGINX.STUB_STATUS.PORT}"] Preprocessing
|
|
Nginx: Service response time | Simple check | net.tcp.service.perf[http,"{$NGINX.STUB_STATUS.HOST}","{$NGINX.STUB_STATUS.PORT}"] | |
Nginx: Requests total | The total number of client requests. |
Dependent item | nginx.requests.total Preprocessing
|
Nginx: Requests per second | The total number of client requests. |
Dependent item | nginx.requests.total.rate Preprocessing
|
Nginx: Connections accepted per second | The total number of accepted client connections. |
Dependent item | nginx.connections.accepted.rate Preprocessing
|
Nginx: Connections dropped per second | The total number of dropped client connections. |
Dependent item | nginx.connections.dropped.rate Preprocessing
|
Nginx: Connections handled per second | The total number of handled connections. Generally, the parameter value is the same as for the accepted connections, unless some resource limits have been reached (for example, the worker_connections limit). |
Dependent item | nginx.connections.handled.rate Preprocessing
|
Nginx: Connections active | The current number of active client connections including waiting connections. |
Dependent item | nginx.connections.active Preprocessing
|
Nginx: Connections reading | The current number of connections where Nginx is reading the request header. |
Dependent item | nginx.connections.reading Preprocessing
|
Nginx: Connections waiting | The current number of idle client connections waiting for a request. |
Dependent item | nginx.connections.waiting Preprocessing
|
Nginx: Connections writing | The current number of connections where Nginx is writing a response back to the client. |
Dependent item | nginx.connections.writing Preprocessing
|
Nginx: Version | Dependent item | nginx.version Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Nginx: Failed to fetch stub status page | Zabbix has not received any data for items for the last 30 minutes. |
find(/Nginx by HTTP/nginx.get_stub_status,,"iregexp","HTTP\/[\d.]+\s+200")=0 or nodata(/Nginx by HTTP/nginx.get_stub_status,30m)=1 |Warning |
Manual close: Yes Depends on:
|
|
Nginx: Service is down | last(/Nginx by HTTP/net.tcp.service[http,"{$NGINX.STUB_STATUS.HOST}","{$NGINX.STUB_STATUS.PORT}"])=0 |Average |
Manual close: Yes | ||
Nginx: Service response time is too high | min(/Nginx by HTTP/net.tcp.service.perf[http,"{$NGINX.STUB_STATUS.HOST}","{$NGINX.STUB_STATUS.PORT}"],5m)>{$NGINX.RESPONSE_TIME.MAX.WARN} |Warning |
Manual close: Yes Depends on:
|
||
Nginx: High connections drop rate | The rate of dropping connections has been greater than {$NGINX.DROP_RATE.MAX.WARN} for the last 5 minutes. |
min(/Nginx by HTTP/nginx.connections.dropped.rate,5m) > {$NGINX.DROP_RATE.MAX.WARN} |Warning |
Depends on:
|
|
Nginx: Version has changed | The Nginx version has changed. Acknowledge to close the problem manually. |
last(/Nginx by HTTP/nginx.version,#1)<>last(/Nginx by HTTP/nginx.version,#2) and length(last(/Nginx by HTTP/nginx.version))>0 |Info |
Manual close: Yes |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is developed to monitor Nginx with Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The template Nginx by Zabbix agent collects metrics by polling the module ngx_http_stub_status_module locally with Zabbix agent:
Active connections: 291
server accepts handled requests
16630948 16630948 31070465
Reading: 6 Writing: 179 Waiting: 106
Note that this template doesn't support HTTPS and redirects (limitations of web.page.get).
It also uses Zabbix agent to collect Nginx
Linux process statistics, such as CPU usage, memory usage and whether the process is running or not.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
See the setup instructions for ngx_http_stub_status_module.
Test the availability of the http_stub_status_module with:
nginx -V 2>&1 | grep -o with-http_stub_status_module
Example configuration of Nginx:
location = /basic_status {
stub_status;
allow 127.0.0.1;
allow ::1;
deny all;
}
If you use another location, then don't forget to change the {$NGINX.STUB_STATUS.PATH} macro.
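Since access in the example above is limited to localhost, a quick test from the Nginx host itself could look like this (an illustrative check; adjust the path if you use another location):
curl -s http://127.0.0.1/basic_status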
Example answer from Nginx:
Active connections: 291
server accepts handled requests
16630948 16630948 31070465
Reading: 6 Writing: 179 Waiting: 106
Note that this template doesn't support https and redirects (limitations of web.page.get).
Install and setup Zabbix agent.
Name | Description | Default |
---|---|---|
{$NGINX.STUB_STATUS.HOST} | The hostname or IP address of the Nginx host or Nginx container of stub_status. |
localhost |
{$NGINX.STUB_STATUS.PATH} | The path of the stub_status page. |
basic_status |
{$NGINX.STUB_STATUS.PORT} | The port of the stub_status host or container. |
80 |
{$NGINX.RESPONSE_TIME.MAX.WARN} | The maximum response time of Nginx expressed in seconds for a trigger expression. |
10 |
{$NGINX.DROP_RATE.MAX.WARN} | The critical rate of the dropped connections for a trigger expression. |
1 |
{$NGINX.PROCESS_NAME} | The process name filter for the Nginx process discovery. |
nginx |
{$NGINX.PROCESS.NAME.PARAMETER} | The process name of the Nginx server used in the proc.get item key. |
Name | Description | Type | Key and additional info |
---|---|---|---|
Nginx: Get stub status page | The following status information is provided:
See also Module ngx_http_stub_status_module. |
Zabbix agent | web.page.get["{$NGINX.STUB_STATUS.HOST}","{$NGINX.STUB_STATUS.PATH}","{$NGINX.STUB_STATUS.PORT}"] |
Nginx: Service status | Zabbix agent | net.tcp.service[http,"{$NGINX.STUB_STATUS.HOST}","{$NGINX.STUB_STATUS.PORT}"] Preprocessing
|
|
Nginx: Service response time | Zabbix agent | net.tcp.service.perf[http,"{$NGINX.STUB_STATUS.HOST}","{$NGINX.STUB_STATUS.PORT}"] | |
Nginx: Requests total | The total number of client requests. |
Dependent item | nginx.requests.total Preprocessing
|
Nginx: Requests per second | The total number of client requests. |
Dependent item | nginx.requests.total.rate Preprocessing
|
Nginx: Connections accepted per second | The total number of accepted client connections. |
Dependent item | nginx.connections.accepted.rate Preprocessing
|
Nginx: Connections dropped per second | The total number of dropped client connections. |
Dependent item | nginx.connections.dropped.rate Preprocessing
|
Nginx: Connections handled per second | The total number of handled connections. Generally, the parameter value is the same as for the accepted connections, unless some resource limits have been reached (for example, the worker_connections limit). |
Dependent item | nginx.connections.handled.rate Preprocessing
|
Nginx: Connections active | The current number of active client connections including waiting connections. |
Dependent item | nginx.connections.active Preprocessing
|
Nginx: Connections reading | The current number of connections where Nginx is reading the request header. |
Dependent item | nginx.connections.reading Preprocessing
|
Nginx: Connections waiting | The current number of idle client connections waiting for a request. |
Dependent item | nginx.connections.waiting Preprocessing
|
Nginx: Connections writing | The current number of connections where Nginx is writing a response back to the client. |
Dependent item | nginx.connections.writing Preprocessing
|
Nginx: Version | Dependent item | nginx.version Preprocessing
|
|
Nginx: Get processes summary | The aggregated data of summary metrics for all processes. |
Zabbix agent | proc.get[{$NGINX.PROCESS.NAME.PARAMETER},,,summary] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Nginx: Version has changed | The Nginx version has changed. Acknowledge to close the problem manually. |
last(/Nginx by Zabbix agent/nginx.version,#1)<>last(/Nginx by Zabbix agent/nginx.version,#2) and length(last(/Nginx by Zabbix agent/nginx.version))>0 |Info |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Nginx process discovery | The discovery of Nginx process summary. |
Dependent item | nginx.proc.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Nginx: CPU utilization | The percentage of the CPU utilization by a process {#NGINX.NAME}. |
Zabbix agent | proc.cpu.util[{#NGINX.NAME}] |
Nginx: Get process data | The summary metrics aggregated by a process {#NGINX.NAME}. |
Dependent item | nginx.proc.get[{#NGINX.NAME}] Preprocessing
|
Nginx: Memory usage (vsize) | The summary of virtual memory used by a process {#NGINX.NAME} expressed in bytes. |
Dependent item | nginx.proc.vmem[{#NGINX.NAME}] Preprocessing
|
Nginx: Memory usage (rss) | The summary of resident set size memory used by a process {#NGINX.NAME} expressed in bytes. |
Dependent item | nginx.proc.rss[{#NGINX.NAME}] Preprocessing
|
Nginx: Memory usage, % | The percentage of real memory used by a process {#NGINX.NAME}. |
Dependent item | nginx.proc.pmem[{#NGINX.NAME}] Preprocessing
|
Nginx: Number of running processes | The number of running processes {#NGINX.NAME}. |
Dependent item | nginx.proc.num[{#NGINX.NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Nginx: Process is not running | last(/Nginx by Zabbix agent/nginx.proc.num[{#NGINX.NAME}])=0 |High |
|||
Nginx: Service is down | last(/Nginx by Zabbix agent/net.tcp.service[http,"{$NGINX.STUB_STATUS.HOST}","{$NGINX.STUB_STATUS.PORT}"])=0 and last(/Nginx by Zabbix agent/nginx.proc.num[{#NGINX.NAME}])>0 |Average |
Manual close: Yes | ||
Nginx: High connections drop rate | The rate of dropping connections has been greater than {$NGINX.DROP_RATE.MAX.WARN} for the last 5 minutes. |
min(/Nginx by Zabbix agent/nginx.connections.dropped.rate,5m) > {$NGINX.DROP_RATE.MAX.WARN} and last(/Nginx by Zabbix agent/nginx.proc.num[{#NGINX.NAME}])>0 |Warning |
Depends on:
|
|
Nginx: Service response time is too high | min(/Nginx by Zabbix agent/net.tcp.service.perf[http,"{$NGINX.STUB_STATUS.HOST}","{$NGINX.STUB_STATUS.PORT}"],5m)>{$NGINX.RESPONSE_TIME.MAX.WARN} and last(/Nginx by Zabbix agent/nginx.proc.num[{#NGINX.NAME}])>0 |Warning |
Manual close: Yes Depends on:
|
||
Nginx: Failed to fetch stub status page | Zabbix has not received any data for items for the last 30 minutes. |
(find(/Nginx by Zabbix agent/web.page.get["{$NGINX.STUB_STATUS.HOST}","{$NGINX.STUB_STATUS.PATH}","{$NGINX.STUB_STATUS.PORT}"],,"iregexp","HTTP\/[\d.]+\s+200")=0 or nodata(/Nginx by Zabbix agent/web.page.get["{$NGINX.STUB_STATUS.HOST}","{$NGINX.STUB_STATUS.PATH}","{$NGINX.STUB_STATUS.PORT}"],30m)) and last(/Nginx by Zabbix agent/nginx.proc.num[{#NGINX.NAME}])>0 |Warning |
Manual close: Yes Depends on:
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for monitoring Nextcloud by HTTP via Zabbix, and it works without any external scripts.
Nextcloud is a suite of client-server software for creating and using file hosting services.
For more information, see the official documentation
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Set the {$NEXTCLOUD.USER.NAME}, {$NEXTCLOUD.USER.PASSWORD}, and {$NEXTCLOUD.ADDRESS} macros.
The user must be included in the Administrators group.
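The template collects its data from the serverinfo app's OCS API. To confirm that the address and credentials are set correctly, you can query the endpoint manually (a hedged example; the host is a placeholder, and the OCS-APIRequest header is required for OCS calls):
curl -u "<user>:<password>" -H "OCS-APIRequest: true" "https://<nextcloud-address>/ocs/v2.php/apps/serverinfo/api/v1/info?format=json"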
Name | Description | Default |
---|---|---|
{$NEXTCLOUD.SCHEMA} | HTTP or HTTPS protocol of Nextcloud. |
https |
{$NEXTCLOUD.USER.NAME} | Nextcloud username. |
root |
{$NEXTCLOUD.USER.PASSWORD} | Nextcloud user password. |
<Put the password here> |
{$NEXTCLOUD.ADDRESS} | IP or DNS name of Nextcloud server. |
127.0.0.1 |
{$NEXTCLOUD.LLD.FILTER.USER.MATCHES} | Filter of discoverable users by name. |
.* |
{$NEXTCLOUD.LLD.FILTER.USER.NOT_MATCHES} | Filter to exclude discovered users by name. |
CHANGE_IF_NEEDED |
{$NEXTCLOUD.USER.QUOTA.PUSED.MAX} | Storage utilization threshold. |
90 |
{$NEXTCLOUD.USER.MAX.INACTIVE} | How many days a user can be inactive. |
30 |
{$NEXTCLOUD.CPU.LOAD.MAX} | CPU load threshold (the number of processes in the system run queue). |
95 |
{$NEXTCLOUD.MEM.PUSED.MAX} | Memory utilization threshold. |
90 |
{$NEXTCLOUD.SWAP.PUSED.MAX} | Swap utilization threshold. |
90 |
{$NEXTCLOUD.PHP.MEM.PUSED.MAX} | PHP memory utilization threshold. |
90 |
{$NEXTCLOUD.STORAGE.FREE.MIN} | Free space threshold. |
1G |
{$NEXTCLOUD.PROXY} | Proxy HTTP(S) address. |
Name | Description | Type | Key and additional info |
---|---|---|---|
Nextcloud: Get server information | This item provides useful server information, such as CPU load, RAM usage, disk usage, number of users, etc. https://github.com/nextcloud/serverinfo |
HTTP agent | nextcloud.serverinfo.get_data Preprocessing
|
Nextcloud: Server information status | Server information API status |
Dependent item | nextcloud.serverinfo.status Preprocessing
|
Nextcloud: Version | Nextcloud service version. |
Dependent item | nextcloud.serverinfo.version Preprocessing
|
Nextcloud: Free space | The amount of free disk space. |
Dependent item | nextcloud.serverinfo.freespace Preprocessing
|
Nextcloud: CPU load, avg 1m | The average system load (the number of processes in the system run queue), last 1 minute. |
Dependent item | nextcloud.serverinfo.cpu.avg.1m Preprocessing
|
Nextcloud: CPU load, avg 5m | The average system load (the number of processes in the system run queue), last 5 minutes. |
Dependent item | nextcloud.serverinfo.cpu.avg.5m Preprocessing
|
Nextcloud: CPU load, avg 15m | The average system load (the number of processes in the system run queue), last 15 minutes. |
Dependent item | nextcloud.serverinfo.cpu.avg.15m Preprocessing
|
Nextcloud: Memory total | The size of the RAM. |
Dependent item | nextcloud.serverinfo.mem.total Preprocessing
|
Nextcloud: Memory free | The amount of free RAM. |
Dependent item | nextcloud.serverinfo.mem.free Preprocessing
|
Nextcloud: Memory used, in % | RAM usage, in percent. |
Dependent item | nextcloud.serverinfo.mem.pused Preprocessing
|
Nextcloud: Swap total | The size of the swap memory. |
Dependent item | nextcloud.serverinfo.swap.total Preprocessing
|
Nextcloud: Swap free | The amount of free swap. |
Dependent item | nextcloud.serverinfo.swap.free Preprocessing
|
Nextcloud: Swap used, in % | Swap usage, in percent. |
Dependent item | nextcloud.serverinfo.swap.pused Preprocessing
|
Nextcloud: Apps installed | The number of installed applications. |
Dependent item | nextcloud.serverinfo.apps.installed Preprocessing
|
Nextcloud: Apps update available | The number of applications for which an update is available. |
Dependent item | nextcloud.serverinfo.apps.update Preprocessing
|
Nextcloud: Web server | Web server description. |
Dependent item | nextcloud.serverinfo.apps.webserver Preprocessing
|
Nextcloud: PHP version | PHP version |
Dependent item | nextcloud.serverinfo.php.version Preprocessing
|
Nextcloud: PHP memory limit | By default, the PHP memory limit is generally set to 128 MB, but it can be customized based on the application's specific needs. The php.ini file is usually the standard location to set the PHP memory limit. |
Dependent item | nextcloud.serverinfo.php.memory.limit Preprocessing
|
Nextcloud: PHP memory used | PHP memory used |
Dependent item | nextcloud.serverinfo.php.memory.used Preprocessing
|
Nextcloud: PHP memory free | PHP free memory size. |
Dependent item | nextcloud.serverinfo.php.memory.free Preprocessing
|
Nextcloud: PHP memory wasted | Memory allocated to the service but not in use. |
Dependent item | nextcloud.serverinfo.php.memory.wasted Preprocessing
|
Nextcloud: PHP memory wasted, in % | Memory allocated to the service but not in use, in percent. |
Dependent item | nextcloud.serverinfo.php.memory.wasted_percentage Preprocessing
|
Nextcloud: PHP memory used, in % | PHP memory used percentage |
Dependent item | nextcloud.serverinfo.php.memory.pused Preprocessing
|
Nextcloud: PHP maximum execution time | By default, the maximum execution time for PHP scripts is set to 30 seconds. If a script runs for longer than 30 seconds, PHP stops the script and reports an error. You can control the amount of time PHP allows scripts to run by changing the 'max_execution_time' directive in your php.ini file. |
Dependent item | nextcloud.serverinfo.php.max_execution_time Preprocessing
|
Nextcloud: PHP maximum upload file size | By default, the maximum upload file size for PHP scripts is set to 128 megabytes. However, you may want to change this limit. For example, you can set a lower limit to prevent users from uploading large files to your site. To do this, change the 'upload_max_filesize' and 'post_max_size' directives. |
Dependent item | nextcloud.serverinfo.php.upload_max_filesize Preprocessing
|
Nextcloud: Database type | Database type. |
Dependent item | nextcloud.serverinfo.db.type Preprocessing
|
Nextcloud: Database version | Database description. |
Dependent item | nextcloud.serverinfo.db.version Preprocessing
|
Nextcloud: Database size | Size of database. |
Dependent item | nextcloud.serverinfo.db.size Preprocessing
|
Nextcloud: Active users, last 5 minutes | The number of active users in the last 5 minutes. |
Dependent item | nextcloud.serverinfo.active_users.last5m Preprocessing
|
Nextcloud: Active users, last 1 hour | The number of active users in the last 1 hour. |
Dependent item | nextcloud.serverinfo.active_users.last1h Preprocessing
|
Nextcloud: Active users, last 24 hours | The number of active users in the last day. |
Dependent item | nextcloud.serverinfo.active_users.last24hours Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Nextcloud: Server information unavailable | Failed to get server information. |
last(/Nextcloud by HTTP/nextcloud.serverinfo.status)<>"OK" |High |
||
Nextcloud: Version has changed | Nextcloud version has changed. Acknowledge to close the problem manually. |
change(/Nextcloud by HTTP/nextcloud.serverinfo.version)=1 and length(last(/Nextcloud by HTTP/nextcloud.serverinfo.version))>0 |Info |
Manual close: Yes | |
Nextcloud: Disk space is low | Condition should be the following: |
last(/Nextcloud by HTTP/nextcloud.serverinfo.freespace)<{$NEXTCLOUD.STORAGE.FREE.MIN} |Average |
Manual close: Yes | |
Nextcloud: CPU load is too high | High CPU load. |
min(/Nextcloud by HTTP/nextcloud.serverinfo.cpu.avg.1m,5m) > {$NEXTCLOUD.CPU.LOAD.MAX} |Average |
||
Nextcloud: High memory utilization | The system is running out of free memory. |
min(/Nextcloud by HTTP/nextcloud.serverinfo.mem.pused,5m) > {$NEXTCLOUD.MEM.PUSED.MAX} |Average |
||
Nextcloud: High swap utilization | The system is running out of free swap. |
min(/Nextcloud by HTTP/nextcloud.serverinfo.swap.pused,5m) > {$NEXTCLOUD.SWAP.PUSED.MAX} |Average |
||
Nextcloud: Number of installed apps has been changed | Applications have been installed or removed. |
change(/Nextcloud by HTTP/nextcloud.serverinfo.apps.installed)<>0 |Info |
Manual close: Yes | |
Nextcloud: Application updates are available | Updates are available for some of the installed applications. |
last(/Nextcloud by HTTP/nextcloud.serverinfo.apps.update)<>0 |Warning |
Manual close: Yes | |
Nextcloud: PHP version has changed | The PHP version has changed. Acknowledge to close the problem manually. |
change(/Nextcloud by HTTP/nextcloud.serverinfo.php.version)=1 and length(last(/Nextcloud by HTTP/nextcloud.serverinfo.php.version))>0 |Info |
Manual close: Yes | |
Nextcloud: High PHP memory utilization | PHP is running out of free memory. |
min(/Nextcloud by HTTP/nextcloud.serverinfo.php.memory.pused,5m) > {$NEXTCLOUD.PHP.MEM.PUSED.MAX} |Average |
||
Nextcloud: Database version has changed | The Database version has changed. Acknowledge to close the problem manually. |
change(/Nextcloud by HTTP/nextcloud.serverinfo.db.version)=1 and length(last(/Nextcloud by HTTP/nextcloud.serverinfo.db.version))>0 |Info |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Nextcloud: User discovery | User discovery. |
HTTP agent | nextcloud.user.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Nextcloud: User "{#NEXTCLOUD.USER}": Get data | Get common information about user |
HTTP agent | nextcloud.user.get_data[{#NEXTCLOUD.USER}] Preprocessing
|
Nextcloud: User "{#NEXTCLOUD.USER}": Status | User account status. |
Dependent item | nextcloud.user.enabled[{#NEXTCLOUD.USER}] Preprocessing
|
Nextcloud: User "{#NEXTCLOUD.USER}": Storage location | The location of the user's store. |
Dependent item | nextcloud.user.storageLocation[{#NEXTCLOUD.USER}] Preprocessing
|
Nextcloud: User "{#NEXTCLOUD.USER}": Last login | The time the user has last logged in. |
Dependent item | nextcloud.user.lastLogin[{#NEXTCLOUD.USER}] Preprocessing
|
Nextcloud: User "{#NEXTCLOUD.USER}": Last login, days ago | The number of days since the user has last logged in. |
Dependent item | nextcloud.user.inactive[{#NEXTCLOUD.USER}] Preprocessing
|
Nextcloud: User "{#NEXTCLOUD.USER}": Quota free space | The size of the free available space in the user's storage. |
Dependent item | nextcloud.user.quota.free[{#NEXTCLOUD.USER}] Preprocessing
|
Nextcloud: User "{#NEXTCLOUD.USER}": Quota used space | The size of the used available space in the user storage. |
Dependent item | nextcloud.user.quota.used[{#NEXTCLOUD.USER}] Preprocessing
|
Nextcloud: User "{#NEXTCLOUD.USER}": Quota total space | The size of space available in the user's storage. |
Dependent item | nextcloud.user.quota.total[{#NEXTCLOUD.USER}] Preprocessing
|
Nextcloud: User "{#NEXTCLOUD.USER}": Quota used space, in % | Usage of the allocated storage space, in percent. |
Dependent item | nextcloud.user.quota.pused[{#NEXTCLOUD.USER}] Preprocessing
|
Nextcloud: User "{#NEXTCLOUD.USER}": Quota | The size of space available in the user's storage. |
Dependent item | nextcloud.user.quota[{#NEXTCLOUD.USER}] Preprocessing
|
Nextcloud: User "{#NEXTCLOUD.USER}": Display name | User visible name. |
Dependent item | nextcloud.user.displayname[{#NEXTCLOUD.USER}] Preprocessing
|
Nextcloud: User "{#NEXTCLOUD.USER}": Language | User language. |
Dependent item | nextcloud.user.language[{#NEXTCLOUD.USER}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Nextcloud: User "{#NEXTCLOUD.USER}" status changed | User account status has changed. |
change(/Nextcloud by HTTP/nextcloud.user.enabled[{#NEXTCLOUD.USER}]) = 1 |Info |
||
Nextcloud: User "{#NEXTCLOUD.USER}": inactive | The user has not logged in for more than {$NEXTCLOUD.USER.MAX.INACTIVE:"{#NEXTCLOUD.USER}"} days. |
last(/Nextcloud by HTTP/nextcloud.user.inactive[{#NEXTCLOUD.USER}]) > {$NEXTCLOUD.USER.MAX.INACTIVE:"{#NEXTCLOUD.USER}"} |Info |
||
Nextcloud: User "{#NEXTCLOUD.USER}": High quota utilization | More than {$NEXTCLOUD.USER.QUOTA.PUSED.MAX:"{#NEXTCLOUD.USER}"} percent of the allocated storage space has been used. |
min(/Nextcloud by HTTP/nextcloud.user.quota.pused[{#NEXTCLOUD.USER}],5m) > {$NEXTCLOUD.USER.QUOTA.PUSED.MAX:"{#NEXTCLOUD.USER}"} |Warning |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for the effortless deployment of Memcached monitoring by Zabbix via Zabbix agent 2 and doesn't require any external scripts.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Set up and configure zabbix-agent2 compiled with the Memcached monitoring plugin.
Test availability: zabbix_get -s memcached-host -k memcached.ping
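A minimal configuration sketch, assuming a default zabbix-agent2 setup (the option name matches the "Plugins.Memcached.Uri" plugin option described in the macros below; adjust the host and URI to your environment):
Plugins.Memcached.Uri=tcp://localhost:11211
zabbix_get -s memcached-host -k 'memcached.ping["tcp://localhost:11211"]'
The ping key returns 1 when the instance responds; the "Memcached: Service is down" trigger fires when it returns 0.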
Name | Description | Default |
---|---|---|
{$MEMCACHED.CONN.URI} | Connection string in the URI format (password is not used). This param overwrites a value configured in the "Plugins.Memcached.Uri" option of the configuration file (if it's set), otherwise, the plugin's default value is used: "tcp://localhost:11211" |
tcp://localhost:11211 |
{$MEMCACHED.CONN.THROTTLED.MAX.WARN} | Maximum number of throttled connections per second |
1 |
{$MEMCACHED.CONN.QUEUED.MAX.WARN} | Maximum number of queued connections per second |
1 |
{$MEMCACHED.CONN.PRC.MAX.WARN} | Maximum percentage of connected clients |
80 |
{$MEMCACHED.MEM.PUSED.MAX.WARN} | Maximum percentage of memory used |
90 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Memcached: Get status | Zabbix agent | memcached.stats["{$MEMCACHED.CONN.URI}"] | |
Memcached: Ping | Zabbix agent | memcached.ping["{$MEMCACHED.CONN.URI}"] Preprocessing
|
|
Memcached: Max connections | Max number of concurrent connections |
Dependent item | memcached.connections.max Preprocessing
|
Memcached: Maximum number of bytes | Maximum number of bytes allowed in cache. You can adjust this setting via a config file or the command line while starting your Memcached server. |
Dependent item | memcached.config.limit_maxbytes Preprocessing
|
Memcached: CPU sys | System CPU consumed by the Memcached server |
Dependent item | memcached.cpu.sys Preprocessing
|
Memcached: CPU user | User CPU consumed by the Memcached server |
Dependent item | memcached.cpu.user Preprocessing
|
Memcached: Queued connections per second | Number of times that memcached has hit its connections limit and disabled its listener |
Dependent item | memcached.connections.queued.rate Preprocessing
|
Memcached: New connections per second | Number of connections opened per second |
Dependent item | memcached.connections.rate Preprocessing
|
Memcached: Throttled connections | Number of times a client connection was throttled. When sending GETs in batch mode and the connection contains too many requests (limited by -R parameter) the connection might be throttled to prevent starvation. |
Dependent item | memcached.connections.throttled.rate Preprocessing
|
Memcached: Connection structures | Number of connection structures allocated by the server |
Dependent item | memcached.connections.structures Preprocessing
|
Memcached: Open connections | The number of clients presently connected |
Dependent item | memcached.connections.current Preprocessing
|
Memcached: Commands: FLUSH per second | The flush_all command invalidates all items in the database. This operation incurs a performance penalty and shouldn't take place in production, so check your debug scripts. |
Dependent item | memcached.commands.flush.rate Preprocessing
|
Memcached: Commands: GET per second | Number of GET requests received by server per second. |
Dependent item | memcached.commands.get.rate Preprocessing
|
Memcached: Commands: SET per second | Number of SET requests received by server per second. |
Dependent item | memcached.commands.set.rate Preprocessing
|
Memcached: Process id | PID of the server process |
Dependent item | memcached.process_id Preprocessing
|
Memcached: Memcached version | Version of the Memcached server |
Dependent item | memcached.version Preprocessing
|
Memcached: Uptime | Number of seconds since Memcached server start |
Dependent item | memcached.uptime Preprocessing
|
Memcached: Bytes used | Current number of bytes used to store items. |
Dependent item | memcached.stats.bytes Preprocessing
|
Memcached: Written bytes per second | The network's write rate, in bytes per second. |
Dependent item | memcached.stats.bytes_written.rate Preprocessing
|
Memcached: Read bytes per second | The network's read rate, in bytes per second. |
Dependent item | memcached.stats.bytes_read.rate Preprocessing
|
Memcached: Hits per second | Number of successful GET requests (items requested and found) per second. |
Dependent item | memcached.stats.hits.rate Preprocessing
|
Memcached: Misses per second | Number of missed GET requests (items requested but not found) per second. |
Dependent item | memcached.stats.misses.rate Preprocessing
|
Memcached: Evictions per second | "An eviction is when an item that still has time to live is removed from the cache because a brand new item needs to be allocated. The item is selected with a pseudo-LRU mechanism. A high number of evictions coupled with a low hit rate means your application is setting a large number of keys that are never used again." |
Dependent item | memcached.stats.evictions.rate Preprocessing
|
Memcached: New items per second | Number of new items stored per second. |
Dependent item | memcached.stats.total_items.rate Preprocessing
|
Memcached: Current number of items stored | Current number of items stored by this instance. |
Dependent item | memcached.stats.curr_items Preprocessing
|
Memcached: Threads | Number of worker threads requested |
Dependent item | memcached.stats.threads Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Memcached: Service is down | last(/Memcached by Zabbix agent 2/memcached.ping["{$MEMCACHED.CONN.URI}"])=0 |Average |
Manual close: Yes | ||
Memcached: Failed to fetch info data | Zabbix has not received any data for items for the last 30 minutes. |
nodata(/Memcached by Zabbix agent 2/memcached.cpu.sys,30m)=1 |Warning |
Manual close: Yes Depends on:
|
|
Memcached: Too many queued connections | The max number of connections is reached and a new connection had to wait in the queue as a result. |
min(/Memcached by Zabbix agent 2/memcached.connections.queued.rate,5m)>{$MEMCACHED.CONN.QUEUED.MAX.WARN} |Warning |
||
Memcached: Too many throttled connections | Number of times a client connection was throttled is too high. |
min(/Memcached by Zabbix agent 2/memcached.connections.throttled.rate,5m)>{$MEMCACHED.CONN.THROTTLED.MAX.WARN} |Warning |
||
Memcached: Total number of connected clients is too high | When the number of connections reaches the value of the "max_connections" parameter, new connections will be rejected. |
min(/Memcached by Zabbix agent 2/memcached.connections.current,5m)/last(/Memcached by Zabbix agent 2/memcached.connections.max)*100>{$MEMCACHED.CONN.PRC.MAX.WARN} |Warning |
||
Memcached: Version has changed | The Memcached version has changed. Acknowledge to close the problem manually. |
last(/Memcached by Zabbix agent 2/memcached.version,#1)<>last(/Memcached by Zabbix agent 2/memcached.version,#2) and length(last(/Memcached by Zabbix agent 2/memcached.version))>0 |Info |
Manual close: Yes | |
Memcached: has been restarted | Uptime is less than 10 minutes. |
last(/Memcached by Zabbix agent 2/memcached.uptime)<10m |Info |
Manual close: Yes | |
Memcached: Memory usage is too high | min(/Memcached by Zabbix agent 2/memcached.stats.bytes,5m)/last(/Memcached by Zabbix agent 2/memcached.config.limit_maxbytes)*100>{$MEMCACHED.MEM.PUSED.MAX.WARN} |Warning |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for the effortless deployment of Mantis BT monitoring by Zabbix via HTTP and doesn't require any external scripts.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
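Before linking the template, you may want to check the URL and token manually. A minimal sketch, assuming the standard MantisBT REST API path (substitute your own URL and token):
curl -H "Authorization: <MantisBT token>" "<MantisBT URL>/api/rest/projects"
A successful response is a JSON list of projects, the same data collected by the "Mantis BT: Get projects" item.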
Name | Description | Default |
---|---|---|
{$MANTIS.URL} | MantisBT URL. |
|
{$MANTIS.TOKEN} | MantisBT Token. |
|
{$MANTIS.LLD.FILTER.PROJECTS.MATCHES} | Filter of discoverable projects. |
.* |
{$MANTIS.LLD.FILTER.PROJECTS.NOT_MATCHES} | Filter to exclude discovered projects. |
CHANGE_IF_NEEDED |
{$MANTIS.HTTP.PROXY} | Proxy for http requests. |
Name | Description | Type | Key and additional info |
---|---|---|---|
Mantis BT: Get projects | Get projects from Mantis BT. |
HTTP agent | mantisbt.get.projects |
Name | Description | Type | Key and additional info |
---|---|---|---|
Projects discovery | Discovery rule for Mantis BT projects. |
Dependent item | mantisbt.projects.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Project [{#NAME}]: Get issues | Getting project issues. |
HTTP agent | mantisbt.get.issues[{#NAME}] |
Project [{#NAME}]: Total issues | Count of issues in project. |
Dependent item | mantis.project.total_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: New issues | Count of issues with 'new' status. |
Dependent item | mantis.project.status.new_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Resolved issues | Count of issues with 'resolved' status. |
Dependent item | mantis.project.status.resolved_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Closed issues | Count of issues with 'closed' status. |
Dependent item | mantis.project.status.closed_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Assigned issues | Count of issues with 'assigned' status. |
Dependent item | mantis.project.status.assigned_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Feedback issues | Count of issues with 'feedback' status. |
Dependent item | mantis.project.status.feedback_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Acknowledged issues | Count of issues with 'acknowledged' status. |
Dependent item | mantis.project.status.acknowledged_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Confirmed issues | Count of issues with 'confirmed' status. |
Dependent item | mantis.project.status.confirmed_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Open issues | Count of "open" resolution issues. |
Dependent item | mantis.project.resolution.open_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Fixed issues | Count of "fixed" resolution issues. |
Dependent item | mantis.project.resolution.fixed_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Reopened issues | Count of "reopened" resolution issues. |
Dependent item | mantis.project.resolution.reopened_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Unable to reproduce issues | Count of "unable to reproduce" resolution issues. |
Dependent item | mantis.project.resolution.unabletoreproduce_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Not fixable issues | Count of "not fixable" resolution issues. |
Dependent item | mantis.project.resolution.notfixableissues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Duplicate issues | Count of "duplicate" resolution issues. |
Dependent item | mantis.project.resolution.duplicate_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: No change required issues | Count of "no change required" resolution issues. |
Dependent item | mantis.project.resolution.nochangerequired_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Suspended issues | Count of "suspended" resolution issues. |
Dependent item | mantis.project.resolution.suspended_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Will not fix issues | Count of "wont fix" resolution issues. |
Dependent item | mantis.project.resolution.wontfixissues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Feature severity issues | Count of "feature" severity issues. |
Dependent item | mantis.project.severity.feature_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Trivial severity issues | Count of "trivial" severity issues. |
Dependent item | mantis.project.severity.trivial_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Text severity issues | Count of "text" severity issues. |
Dependent item | mantis.project.severity.text_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Tweak severity issues | Count of "tweak" severity issues. |
Dependent item | mantis.project.severity.tweak_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Minor severity issues | Count of "minor" severity issues. |
Dependent item | mantis.project.severity.minor_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Major severity issues | Count of "major" severity issues. |
Dependent item | mantis.project.severity.major_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Crash severity issues | Count of "crash" severity issues. |
Dependent item | mantis.project.severity.crash_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Block severity issues | Count of "block" severity issues. |
Dependent item | mantis.project.severity.block_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: None priority issues | Count of "none" priority issues. |
Dependent item | mantis.project.priority.none_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Low priority issues | Count of "low" priority issues. |
Dependent item | mantis.project.priority.low_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Normal priority issues | Count of "normal" priority issues. |
Dependent item | mantis.project.priority.normal_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: High priority issues | Count of "high" priority issues. |
Dependent item | mantis.project.priority.high_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Urgent priority issues | Count of "urgent" priority issues. |
Dependent item | mantis.project.priority.urgent_issues[{#NAME}] Preprocessing
|
Project [{#NAME}]: Immediate priority issues | Count of "immediate" priority issues. |
Dependent item | mantis.project.priority.immediate_issues[{#NAME}] Preprocessing
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template to monitor Kubernetes state. It works without external scripts and uses the script item to make HTTP requests to the Kubernetes API.
Template Kubernetes cluster state by HTTP
- collects metrics by HTTP agent from kube-state-metrics endpoint and Kubernetes API.
Don't forget to change macros {$KUBE.API.URL} and {$KUBE.API.TOKEN}. Also, see the Macros section for a list of macros used to set trigger values.
NOTE. Some metrics may not be collected depending on your Kubernetes version and configuration.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Install the Zabbix Helm Chart in your Kubernetes cluster. Internal service metrics are collected from kube-state-metrics endpoint.
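A minimal installation sketch with placeholder repository and chart names (only the helm flags are standard; substitute the actual Zabbix Helm Chart repository; the monitoring namespace matches the kubectl examples below):
helm repo add <zabbix-repo> <zabbix-chart-repository-url>
helm install zabbix <zabbix-repo>/<zabbix-chart> --namespace monitoring --create-namespace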
Template needs to use authorization via API token.
Set the {$KUBE.API.URL} macro, such as <scheme>://<host>:<port>.
Get the generated service account token using the command:
kubectl get secret zabbix-service-account -n monitoring -o jsonpath={.data.token} | base64 -d
Then set it to the macro {$KUBE.API.TOKEN}.
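To confirm that the URL and token work from the Zabbix server or proxy host, you can query the readyz endpoint used by the template (a sketch; substitute your own endpoint):
TOKEN=$(kubectl get secret zabbix-service-account -n monitoring -o jsonpath={.data.token} | base64 -d)
curl -k -H "Authorization: Bearer $TOKEN" <scheme>://<host>:<port>/readyz
A plain "ok" response indicates the API server accepted the request.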
Set {$KUBE.STATE.ENDPOINT.NAME} with the Kube state metrics endpoint name (see kubectl -n monitoring get ep). Default: zabbix-kube-state-metrics.
NOTE. If you wish to monitor Controller Manager and Scheduler components, you might need to set the --binding-address option for them to the address where Zabbix proxy can reach them.
For example, for clusters created with kubeadm, it can be set in the following manifest files (changes will be applied immediately):
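Assuming a standard kubeadm layout, the static pod manifests are typically:
/etc/kubernetes/manifests/kube-controller-manager.yaml
/etc/kubernetes/manifests/kube-scheduler.yaml
Paths may differ in other distributions.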
Depending on your Kubernetes distribution, you might need to adjust the {$KUBE.CONTROL_PLANE.TAINT} macro (for example, set it to node-role.kubernetes.io/master for OpenShift).
NOTE. Some metrics may not be collected depending on your Kubernetes version and configuration.
Also, see the Macros section for a list of macros used to set trigger values.
Set up the macros to filter the metrics of discovered Kubelets by node names: {$KUBE.LLD.FILTER.KUBELET_NODE.MATCHES} and {$KUBE.LLD.FILTER.KUBELET_NODE.NOT_MATCHES}.
Set up macros to filter metrics by namespace: {$KUBE.LLD.FILTER.NAMESPACE.MATCHES} and {$KUBE.LLD.FILTER.NAMESPACE.NOT_MATCHES}.
Set up macros to filter node metrics by nodename: {$KUBE.LLD.FILTER.NODE.MATCHES} and {$KUBE.LLD.FILTER.NODE.NOT_MATCHES}.
Note: If you have a large cluster, it is highly recommended to set a filter for discoverable namespaces.
You can use the {$KUBE.KUBELET.FILTER.LABELS} and {$KUBE.KUBELET.FILTER.ANNOTATIONS} macros for advanced filtering of kubelets by node labels and annotations.
Notes about labels and annotations filters:
Filter values are set as comma-separated key: value pairs, where values can be regular expressions (key1: value, key2: regexp). Use an exclamation mark (!) to invert a filter (!key: value).
For example: kubernetes.io/hostname: kubernetes-node[5-25], !node-role.kubernetes.io/ingress: .*. As a result, the kubelets on nodes 5-25 without the "ingress" role will be discovered.
See the Kubernetes documentation for details about labels and annotations.
You can also set up evaluation periods for replica mismatch triggers (Deployments, ReplicaSets, StatefulSets) with the macro {$KUBE.REPLICA.MISMATCH.EVAL_PERIOD}, which supports context and regular expressions. For example, you can create the following macros:
{$KUBE.REPLICA.MISMATCH.EVAL_PERIOD:"deployment:default:nginx-deployment"} = #3
{$KUBE.REPLICA.MISMATCH.EVAL_PERIOD:regex:"deployment:.*:.*"} = #10
or {$KUBE.REPLICA.MISMATCH.EVAL_PERIOD:regex:"^deployment.*"} = #10
{$KUBE.REPLICA.MISMATCH.EVAL_PERIOD:regex:".*:default:.*"} = 15m
Note that different context macros with regular expressions matching the same string can be applied in an undefined order, and simple context macros (without regular expressions) have higher priority. Read the Important notes section in the Zabbix documentation for details.
Name | Description | Default |
---|---|---|
{$KUBE.API.URL} | Kubernetes API endpoint URL in the format <scheme>://<host>:<port>. |
https://kubernetes.default.svc.cluster.local:443 |
{$KUBE.API.READYZ.ENDPOINT} | Kubernetes API readyz endpoint /readyz |
/readyz |
{$KUBE.API.LIVEZ.ENDPOINT} | Kubernetes API livez endpoint /livez |
/livez |
{$KUBE.API.COMPONENTSTATUSES.ENDPOINT} | Kubernetes API componentstatuses endpoint /api/v1/componentstatuses |
/api/v1/componentstatuses |
{$KUBE.API.TOKEN} | Service account bearer token. |
|
{$KUBE.HTTP.PROXY} | Sets the HTTP proxy for requests. If this macro is empty, then no proxy is used. |
|
{$KUBE.STATE.ENDPOINT.NAME} | Kubernetes state endpoint name. |
zabbix-kube-state-metrics |
{$OPENSHIFT.STATE.ENDPOINT.NAME} | OpenShift state endpoint name. |
openshift-state-metrics |
{$KUBE.API_SERVER.SCHEME} | Kubernetes API servers metrics endpoint scheme. Used in ControlPlane LLD. |
https |
{$KUBE.API_SERVER.PORT} | Kubernetes API servers metrics endpoint port. Used in ControlPlane LLD. |
6443 |
{$KUBE.CONTROL_PLANE.TAINT} | Taint that applies to control plane nodes. Change if needed. Used in ControlPlane LLD. |
node-role.kubernetes.io/control-plane |
{$KUBE.CONTROLLER_MANAGER.SCHEME} | Kubernetes Controller manager metrics endpoint scheme. Used in ControlPlane LLD. |
https |
{$KUBE.CONTROLLER_MANAGER.PORT} | Kubernetes Controller manager metrics endpoint port. Used in ControlPlane LLD. |
10257 |
{$KUBE.SCHEDULER.SCHEME} | Kubernetes Scheduler metrics endpoint scheme. Used in ControlPlane LLD. |
https |
{$KUBE.SCHEDULER.PORT} | Kubernetes Scheduler metrics endpoint port. Used in ControlPlane LLD. |
10259 |
{$KUBE.KUBELET.SCHEME} | Kubernetes Kubelet metrics endpoint scheme. Used in Kubelet LLD. |
https |
{$KUBE.KUBELET.PORT} | Kubernetes Kubelet metrics endpoint port. Used in Kubelet LLD. |
10250 |
{$KUBE.LLD.FILTER.NAMESPACE.MATCHES} | Filter of discoverable metrics by namespace. |
.* |
{$KUBE.LLD.FILTER.NAMESPACE.NOT_MATCHES} | Filter to exclude discovered metrics by namespace. |
CHANGE_IF_NEEDED |
{$KUBE.LLD.FILTER.NODE.MATCHES} | Filter of discoverable nodes by nodename. |
.* |
{$KUBE.LLD.FILTER.NODE.NOT_MATCHES} | Filter to exclude discovered nodes by nodename. |
CHANGE_IF_NEEDED |
{$KUBE.LLD.FILTER.KUBELET_NODE.MATCHES} | Filter of discoverable Kubelets by nodename. |
.* |
{$KUBE.LLD.FILTER.KUBELET_NODE.NOT_MATCHES} | Filter to exclude discovered Kubelets by nodename. |
CHANGE_IF_NEEDED |
{$KUBE.KUBELET.FILTER.ANNOTATIONS} | Node annotations to filter Kubelets (regex in values are supported). See the template's README.md for details. |
|
{$KUBE.KUBELET.FILTER.LABELS} | Node labels to filter Kubelets (regex in values are supported). See the template's README.md for details. |
|
{$KUBE.LLD.FILTER.PV.MATCHES} | Filter of discoverable persistent volumes by name. |
.* |
{$KUBE.LLD.FILTER.PV.NOT_MATCHES} | Filter to exclude discovered persistent volumes by name. |
CHANGE_IF_NEEDED |
{$KUBE.REPLICA.MISMATCH.EVAL_PERIOD} | The evaluation period range which is used for calculation of expressions in trigger prototypes (time period or value range). Can be used with context. |
#5 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes: Get state metrics | Collecting Kubernetes metrics from kube-state-metrics. |
Script | kube.state.metrics |
Kubernetes: Control plane LLD | Generation of data for Control plane discovery rules. |
Script | kube.control_plane.lld Preprocessing
|
Kubernetes: Node LLD | Generation of data for Kubelet discovery rules. |
Script | kube.node.lld Preprocessing
|
Kubernetes: Get component statuses | HTTP agent | kube.componentstatuses Preprocessing
|
|
Kubernetes: Get readyz | HTTP agent | kube.readyz Preprocessing
|
|
Kubernetes: Get livez | HTTP agent | kube.livez Preprocessing
|
|
Kubernetes: Namespace count | The number of namespaces. |
Dependent item | kube.namespace.count Preprocessing
|
Kubernetes: CronJob count | Number of cronjobs. |
Dependent item | kube.cronjob.count Preprocessing
|
Kubernetes: Job count | Number of jobs (generated by cronjob + job). |
Dependent item | kube.job.count Preprocessing
|
Kubernetes: Endpoint count | Number of endpoints. |
Dependent item | kube.endpoint.count Preprocessing
|
Kubernetes: Deployment count | The number of deployments. |
Dependent item | kube.deployment.count Preprocessing
|
Kubernetes: Service count | The number of services. |
Dependent item | kube.service.count Preprocessing
|
Kubernetes: StatefulSet count | The number of statefulsets. |
Dependent item | kube.statefulset.count Preprocessing
|
Kubernetes: Node count | The number of nodes. |
Dependent item | kube.node.count Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
API servers discovery | Dependent item | kube.api_servers.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Controller manager nodes discovery | Dependent item | kube.controller_manager.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Scheduler servers nodes discovery | Dependent item | kube.scheduler.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubelet discovery | Dependent item | kube.kubelet.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Daemonset discovery | Dependent item | kube.daemonset.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes: Namespace [{#NAMESPACE}] Daemonset [{#NAME}]: Ready | The number of nodes that should be running the daemon pod and have one or more running and ready. |
Dependent item | kube.daemonset.ready[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] Daemonset [{#NAME}]: Scheduled | The number of nodes that run at least one daemon pod and are supposed to. |
Dependent item | kube.daemonset.scheduled[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] Daemonset [{#NAME}]: Desired | The number of nodes that should be running the daemon pod. |
Dependent item | kube.daemonset.desired[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] Daemonset [{#NAME}]: Misscheduled | The number of nodes that run a daemon pod but are not supposed to. |
Dependent item | kube.daemonset.misscheduled[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] Daemonset [{#NAME}]: Updated number scheduled | The total number of nodes that are running an updated daemon pod. |
Dependent item | kube.daemonset.updated[{#NAMESPACE}/{#NAME}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
PVC discovery | Dependent item | kube.pvc.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes: Namespace [{#NAMESPACE}] PVC [{#NAME}] Status phase | The current status phase of the persistent volume claim. |
Dependent item | kube.pvc.status_phase[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] PVC [{#NAME}] Requested storage | The capacity of storage requested by the persistent volume claim. |
Dependent item | kube.pvc.requested.storage[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] PVC status phase: Bound, sum | The total amount of persistent volume claims in the Bound phase. |
Dependent item | kube.pvc.status_phase.bound.sum[{#NAMESPACE}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] PVC status phase: Lost, sum | The total amount of persistent volume claims in the Lost phase. |
Dependent item | kube.pvc.status_phase.lost.sum[{#NAMESPACE}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] PVC status phase: Pending, sum | The total amount of persistent volume claims in the Pending phase. |
Dependent item | kube.pvc.status_phase.pending.sum[{#NAMESPACE}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Kubernetes: NS [{#NAMESPACE}] PVC [{#NAME}]: PVC is pending | count(/Kubernetes cluster state by HTTP/kube.pvc.status_phase[{#NAMESPACE}/{#NAME}],2m,,5)>=2 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
PV discovery | Dependent item | kube.pv.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes: PV [{#NAME}] Status phase | The current status phase of the persistent volume. |
Dependent item | kube.pv.status_phase[{#NAME}] Preprocessing
|
Kubernetes: PV [{#NAME}] Capacity bytes | The capacity of the persistent volume in bytes. |
Dependent item | kube.pv.capacity.bytes[{#NAME}] Preprocessing
|
Kubernetes: PV status phase: Pending, sum | The total amount of persistent volumes in the Pending phase. |
Dependent item | kube.pv.status_phase.pending.sum[{#SINGLETON}] Preprocessing
|
Kubernetes: PV status phase: Available, sum | The total amount of persistent volumes in the Available phase. |
Dependent item | kube.pv.status_phase.available.sum[{#SINGLETON}] Preprocessing
|
Kubernetes: PV status phase: Bound, sum | The total amount of persistent volumes in the Bound phase. |
Dependent item | kube.pv.status_phase.bound.sum[{#SINGLETON}] Preprocessing
|
Kubernetes: PV status phase: Released, sum | The total amount of persistent volumes in the Released phase. |
Dependent item | kube.pv.status_phase.released.sum[{#SINGLETON}] Preprocessing
|
Kubernetes: PV status phase: Failed, sum | The total amount of persistent volumes in the Failed phase. |
Dependent item | kube.pv.status_phase.failed.sum[{#SINGLETON}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Kubernetes: PV [{#NAME}]: PV has failed | count(/Kubernetes cluster state by HTTP/kube.pv.status_phase[{#NAME}],2m,,3)>=2 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Deployment discovery | Dependent item | kube.deployment.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes: Namespace [{#NAMESPACE}] Deployment [{#NAME}]: Paused | Whether the deployment is paused and will not be processed by the deployment controller. |
Dependent item | kube.deployment.spec_paused[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] Deployment [{#NAME}]: Replicas desired | Number of desired pods for a deployment. |
Dependent item | kube.deployment.replicas_desired[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] Deployment [{#NAME}]: Rollingupdate max unavailable | Maximum number of unavailable replicas during a rolling update of a deployment. |
Dependent item | kube.deployment.rollingupdate.max_unavailable[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] Deployment [{#NAME}]: Replicas | The number of replicas per deployment. |
Dependent item | kube.deployment.replicas[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] Deployment [{#NAME}]: Replicas available | The number of available replicas per deployment. |
Dependent item | kube.deployment.replicas_available[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] Deployment [{#NAME}]: Replicas unavailable | The number of unavailable replicas per deployment. |
Dependent item | kube.deployment.replicas_unavailable[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] Deployment [{#NAME}]: Replicas updated | The number of updated replicas per deployment. |
Dependent item | kube.deployment.replicas_updated[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] Deployment [{#NAME}]: Replicas mismatched | The number of available replicas not matching the desired number of replicas. |
Dependent item | kube.deployment.replicas_mismatched[{#NAMESPACE}/{#NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Kubernetes: Namespace [{#NAMESPACE}] Deployment [{#NAME}]: Deployment replicas mismatch | Deployment has not matched the expected number of replicas during the specified trigger evaluation period. |
min(/Kubernetes cluster state by HTTP/kube.deployment.replicas_mismatched[{#NAMESPACE}/{#NAME}],{$KUBE.REPLICA.MISMATCH.EVAL_PERIOD:"deployment:{#NAMESPACE}:{#NAME}"})>0 and last(/Kubernetes cluster state by HTTP/kube.deployment.replicas_desired[{#NAMESPACE}/{#NAME}])>=0 and last(/Kubernetes cluster state by HTTP/kube.deployment.replicas_available[{#NAMESPACE}/{#NAME}])>=0 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Endpoint discovery | Dependent item | kube.endpoint.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes: Namespace [{#NAMESPACE}] Endpoint [{#NAME}]: Address available | Number of addresses available in endpoint. |
Dependent item | kube.endpoint.address_available[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] Endpoint [{#NAME}]: Address not ready | Number of addresses not ready in endpoint. |
Dependent item | kube.endpoint.addressnotready[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] Endpoint [{#NAME}]: Age | Endpoint age (number of seconds since creation). |
Dependent item | kube.endpoint.age[{#NAMESPACE}/{#NAME}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Node discovery | Dependent item | kube.node.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes: Node [{#NAME}]: CPU allocatable | The CPU resources of a node that are available for scheduling. |
Dependent item | kube.node.cpu_allocatable[{#NAME}] Preprocessing
|
Kubernetes: Node [{#NAME}]: Memory allocatable | The memory resources of a node that are available for scheduling. |
Dependent item | kube.node.memory_allocatable[{#NAME}] Preprocessing
|
Kubernetes: Node [{#NAME}]: Pods allocatable | The pods resources of a node that are available for scheduling. |
Dependent item | kube.node.pods_allocatable[{#NAME}] Preprocessing
|
Kubernetes: Node [{#NAME}]: Ephemeral storage allocatable | The allocatable ephemeral storage of a node that is available for scheduling. |
Dependent item | kube.node.ephemeralstorageallocatable[{#NAME}] Preprocessing
|
Kubernetes: Node [{#NAME}]: CPU capacity | The capacity for CPU resources of a node. |
Dependent item | kube.node.cpu_capacity[{#NAME}] Preprocessing
|
Kubernetes: Node [{#NAME}]: Memory capacity | The capacity for memory resources of a node. |
Dependent item | kube.node.memory_capacity[{#NAME}] Preprocessing
|
Kubernetes: Node [{#NAME}]: Ephemeral storage capacity | The ephemeral storage capacity of a node. |
Dependent item | kube.node.ephemeralstoragecapacity[{#NAME}] Preprocessing
|
Kubernetes: Node [{#NAME}]: Pods capacity | The capacity for pods resources of a node. |
Dependent item | kube.node.pods_capacity[{#NAME}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Pod discovery | Dependent item | kube.pod.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes: Namespace [{#NAMESPACE}] Pod [{#NAME}] Phase: Pending | Pod is in pending state. |
Dependent item | kube.pod.phase.pending[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] Pod [{#NAME}] Phase: Succeeded | Pod is in succeeded state. |
Dependent item | kube.pod.phase.succeeded[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] Pod [{#NAME}] Phase: Failed | Pod is in failed state. |
Dependent item | kube.pod.phase.failed[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] Pod [{#NAME}] Phase: Unknown | Pod is in unknown state. |
Dependent item | kube.pod.phase.unknown[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] Pod [{#NAME}] Phase: Running | Pod is in running state. |
Dependent item | kube.pod.phase.running[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] Pod [{#NAME}]: Containers terminated | Describes whether the container is currently in terminated state. |
Dependent item | kube.pod.containers_terminated[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] Pod [{#NAME}]: Containers waiting | Describes whether the container is currently in waiting state. |
Dependent item | kube.pod.containers_waiting[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] Pod [{#NAME}]: Containers ready | Describes whether the container's readiness check succeeded. |
Dependent item | kube.pod.containers_ready[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] Pod [{#NAME}]: Containers restarts | The number of container restarts. |
Dependent item | kube.pod.containers_restarts[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] Pod [{#NAME}]: Containers running | Describes whether the container is currently in running state. |
Dependent item | kube.pod.containers_running[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] Pod [{#NAME}]: Ready | Describes whether the pod is ready to serve requests. |
Dependent item | kube.pod.ready[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] Pod [{#NAME}]: Scheduled | Describes the status of the scheduling process for the pod. |
Dependent item | kube.pod.scheduled[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] Pod [{#NAME}]: Unschedulable | Describes the unschedulable status for the pod. |
Dependent item | kube.pod.unschedulable[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] Pod [{#NAME}]: Containers CPU limits | The limit on CPU cores to be used by a container. |
Dependent item | kube.pod.containers.limits.cpu[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] Pod [{#NAME}]: Containers memory limits | The limit on memory to be used by a container. |
Dependent item | kube.pod.containers.limits.memory[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] Pod [{#NAME}]: Containers CPU requests | The number of requested CPU cores by a container. |
Dependent item | kube.pod.containers.requests.cpu[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] Pod [{#NAME}]: Containers memory requests | The number of requested memory bytes by a container. |
Dependent item | kube.pod.containers.requests.memory[{#NAMESPACE}/{#NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Kubernetes: Namespace [{#NAMESPACE}] Pod [{#NAME}]: Pod is not healthy | min(/Kubernetes cluster state by HTTP/kube.pod.phase.failed[{#NAMESPACE}/{#NAME}],10m)>0 or min(/Kubernetes cluster state by HTTP/kube.pod.phase.pending[{#NAMESPACE}/{#NAME}],10m)>0 or min(/Kubernetes cluster state by HTTP/kube.pod.phase.unknown[{#NAMESPACE}/{#NAME}],10m)>0 |High |
|||
Kubernetes: Namespace [{#NAMESPACE}] Pod [{#NAME}]: Pod is crash looping | Containers of the pod keep restarting. This most likely indicates that the pod is in the CrashLoopBackOff state. |
(last(/Kubernetes cluster state by HTTP/kube.pod.containers_restarts[{#NAMESPACE}/{#NAME}])-min(/Kubernetes cluster state by HTTP/kube.pod.containers_restarts[{#NAMESPACE}/{#NAME}],15m))>1 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
ReplicaSet discovery | Dependent item | kube.replicaset.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes: Namespace [{#NAMESPACE}] ReplicaSet [{#NAME}]: Replicas | The number of replicas per ReplicaSet. |
Dependent item | kube.replicaset.replicas[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] ReplicaSet [{#NAME}]: Desired replicas | Number of desired pods for a ReplicaSet. |
Dependent item | kube.replicaset.replicas_desired[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] ReplicaSet [{#NAME}]: Fully labeled replicas | The number of fully labeled replicas per ReplicaSet. |
Dependent item | kube.replicaset.fullylabeledreplicas[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] ReplicaSet [{#NAME}]: Ready | The number of ready replicas per ReplicaSet. |
Dependent item | kube.replicaset.ready[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] ReplicaSet [{#NAME}]: Replicas mismatched | The number of ready replicas not matching the desired number of replicas. |
Dependent item | kube.replicaset.replicas_mismatched[{#NAMESPACE}/{#NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Kubernetes: Namespace [{#NAMESPACE}] RS [{#NAME}]: ReplicaSet mismatch | ReplicaSet has not matched the expected number of replicas during the specified trigger evaluation period. |
min(/Kubernetes cluster state by HTTP/kube.replicaset.replicas_mismatched[{#NAMESPACE}/{#NAME}],{$KUBE.REPLICA.MISMATCH.EVAL_PERIOD:"replicaset:{#NAMESPACE}:{#NAME}"})>0 and last(/Kubernetes cluster state by HTTP/kube.replicaset.replicas_desired[{#NAMESPACE}/{#NAME}])>=0 and last(/Kubernetes cluster state by HTTP/kube.replicaset.ready[{#NAMESPACE}/{#NAME}])>=0 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
StatefulSet discovery | Dependent item | kube.statefulset.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes: Namespace [{#NAMESPACE}] StatefulSet [{#NAME}]: Replicas | The number of replicas per StatefulSet. |
Dependent item | kube.statefulset.replicas[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] StatefulSet [{#NAME}]: Desired replicas | Number of desired pods for a StatefulSet. |
Dependent item | kube.statefulset.replicas_desired[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] StatefulSet [{#NAME}]: Current replicas | The number of current replicas per StatefulSet. |
Dependent item | kube.statefulset.replicas_current[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] StatefulSet [{#NAME}]: Ready replicas | The number of ready replicas per StatefulSet. |
Dependent item | kube.statefulset.replicas_ready[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] StatefulSet [{#NAME}]: Updated replicas | The number of updated replicas per StatefulSet. |
Dependent item | kube.statefulset.replicas_updated[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] StatefulSet [{#NAME}]: Replicas mismatched | The number of ready replicas not matching the number of replicas. |
Dependent item | kube.statefulset.replicas_mismatched[{#NAMESPACE}/{#NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Kubernetes: Namespace [{#NAMESPACE}] StatefulSet [{#NAME}]: StatefulSet is down | (last(/Kubernetes cluster state by HTTP/kube.statefulset.replicas_ready[{#NAMESPACE}/{#NAME}]) / last(/Kubernetes cluster state by HTTP/kube.statefulset.replicas_current[{#NAMESPACE}/{#NAME}]))<>1 |High |
|||
Kubernetes: Namespace [{#NAMESPACE}] StatefulSet [{#NAME}]: StatefulSet replicas mismatch | StatefulSet has not matched the number of replicas during the specified trigger evaluation period. |
min(/Kubernetes cluster state by HTTP/kube.statefulset.replicas_mismatched[{#NAMESPACE}/{#NAME}],{$KUBE.REPLICA.MISMATCH.EVAL_PERIOD:"statefulset:{#NAMESPACE}:{#NAME}"})>0 and last(/Kubernetes cluster state by HTTP/kube.statefulset.replicas[{#NAMESPACE}/{#NAME}])>=0 and last(/Kubernetes cluster state by HTTP/kube.statefulset.replicas_ready[{#NAMESPACE}/{#NAME}])>=0 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
PodDisruptionBudget discovery | Dependent item | kube.pdb.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes: Namespace [{#NAMESPACE}] PodDisruptionBudget [{#NAME}]: Pods healthy | Current number of healthy pods. |
Dependent item | kube.pdb.pods_healthy[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] PodDisruptionBudget [{#NAME}]: Pods desired | Minimum desired number of healthy pods. |
Dependent item | kube.pdb.pods_desired[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] PodDisruptionBudget [{#NAME}]: Disruptions allowed | Number of pod disruptions that are allowed. |
Dependent item | kube.pdb.disruptions_allowed[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] PodDisruptionBudget [{#NAME}]: Pods total | Total number of pods counted by this disruption budget. |
Dependent item | kube.pdb.pods_total[{#NAMESPACE}/{#NAME}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
CronJob discovery | Dependent item | kube.cronjob.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes: Namespace [{#NAMESPACE}] CronJob [{#NAME}]: Suspend | Suspend flag tells the controller to suspend subsequent executions. |
Dependent item | kube.cronjob.spec_suspend[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] CronJob [{#NAME}]: Active | Active holds pointers to currently running jobs. |
Dependent item | kube.cronjob.status_active[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] CronJob [{#NAME}]: Last schedule | LastScheduleTime keeps information about when the job was last successfully scheduled. |
Dependent item | kube.cronjob.lastscheduletime[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] CronJob [{#NAME}]: Next schedule | Next time the cronjob should be scheduled. The time after lastScheduleTime or after the cron job's creation time if it's never been scheduled. Use this to determine if the job is delayed. |
Dependent item | kube.cronjob.nextscheduletime[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] CronJob [{#NAME}]: Failed | The number of pods which reached the Failed phase and the reason for failure. |
Dependent item | kube.cronjob.status_failed[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] CronJob [{#NAME}]: Succeeded | The number of pods which reached the Succeeded phase. |
Dependent item | kube.cronjob.status_succeeded[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] CronJob [{#NAME}]: Completion succeeded | Number of jobs whose execution has completed. |
Dependent item | kube.cronjob.completion.succeeded[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] CronJob [{#NAME}]: Completion failed | Number of jobs whose execution has failed. |
Dependent item | kube.cronjob.completion.failed[{#NAMESPACE}/{#NAME}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Job discovery | Dependent item | kube.job.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes: Namespace [{#NAMESPACE}] Job [{#NAME}]: Failed | The number of pods which reached the Failed phase and the reason for failure. |
Dependent item | kube.job.status_failed[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] Job [{#NAME}]: Succeeded | The number of pods which reached the Succeeded phase. |
Dependent item | kube.job.status_succeeded[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] Job [{#NAME}]: Completion succeeded | Number of jobs whose execution has completed. |
Dependent item | kube.job.completion.succeeded[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] Job [{#NAME}]: Completion failed | Number of jobs whose execution has failed. |
Dependent item | kube.job.completion.failed[{#NAMESPACE}/{#NAME}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Component statuses discovery | Dependent item | kube.componentstatuses.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes: Component [{#NAME}]: Healthy | Cluster component healthy. |
Dependent item | kube.componentstatuses.healthy[{#NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Kubernetes: Component [{#NAME}] is unhealthy | count(/Kubernetes cluster state by HTTP/kube.componentstatuses.healthy[{#NAME}],#3,,"True")<2 and length(last(/Kubernetes cluster state by HTTP/kube.componentstatuses.healthy[{#NAME}]))>0 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Readyz discovery | Dependent item | kube.readyz.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes: Readyz [{#NAME}]: Healthcheck | Result of readyz healthcheck for component. |
Dependent item | kube.readyz.healthcheck[{#NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Kubernetes: Readyz [{#NAME}] is unhealthy | count(/Kubernetes cluster state by HTTP/kube.readyz.healthcheck[{#NAME}],#3,,"ok")<2 and length(last(/Kubernetes cluster state by HTTP/kube.readyz.healthcheck[{#NAME}]))>0 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Livez discovery | Dependent item | kube.livez.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes: Livez [{#NAME}]: Healthcheck | Result of livez healthcheck for component. |
Dependent item | kube.livez.healthcheck[{#NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Kubernetes: Livez [{#NAME}] is unhealthy | count(/Kubernetes cluster state by HTTP/kube.livez.healthcheck[{#NAME}],#3,,"ok")<2 and length(last(/Kubernetes cluster state by HTTP/kube.livez.healthcheck[{#NAME}]))>0 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
OpenShift BuildConfig discovery | Dependent item | openshift.buildconfig.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
OpenShift: Namespace [{#NAMESPACE}] BuildConfig [{#NAME}]: Created | OpenShift BuildConfig Unix creation timestamp. |
Dependent item | openshift.buildconfig.created.time[{#NAMESPACE}/{#NAME}] Preprocessing
|
OpenShift: Namespace [{#NAMESPACE}] BuildConfig [{#NAME}]: Generation | Sequence number representing a specific generation of the desired state. |
Dependent item | openshift.buildconfig.generation[{#NAMESPACE}/{#NAME}] Preprocessing
|
OpenShift: Namespace [{#NAMESPACE}] BuildConfig [{#NAME}]: Latest version | The latest version of BuildConfig. |
Dependent item | openshift.buildconfig.status[{#NAMESPACE}/{#NAME}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
OpenShift Build discovery | Dependent item | openshift.build.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
OpenShift: Namespace [{#NAMESPACE}] Build [{#NAME}]: Created | OpenShift Build Unix creation timestamp. |
Dependent item | openshift.build.created.time[{#NAMESPACE}/{#NAME}] Preprocessing
|
OpenShift: Namespace [{#NAMESPACE}] Build [{#NAME}]: Generation | Sequence number representing a specific generation of the desired state. |
Dependent item | openshift.build.sequence.number[{#NAMESPACE}/{#NAME}] Preprocessing
|
OpenShift: Namespace [{#NAMESPACE}] Build [{#NAME}]: Status phase | The Build phase. |
Dependent item | openshift.build.status_phase[{#NAMESPACE}/{#NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
OpenShift: Build [{#NAME}]: Build has failed | count(/Kubernetes cluster state by HTTP/openshift.build.status_phase[{#NAMESPACE}/{#NAME}],2m,"ge",6)>=2 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
OpenShift ClusterResourceQuota discovery | Dependent item | openshift.cluster.resource.quota.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
OpenShift: Quota [{#NAME}] Resource [{#RESOURCE}]: Type [{#TYPE}] | Usage of the resource quota. |
Dependent item | openshift.cluster.resource.quota[{#RESOURCE}/{#NAME}/{#TYPE}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
OpenShift Route discovery | Dependent item | openshift.route.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
OpenShift: Namespace [{#NAMESPACE}] Route [{#NAME}]: Created | OpenShift Route Unix creation timestamp. |
Dependent item | openshift.route.created.time[{#NAMESPACE}/{#NAME}] Preprocessing
|
OpenShift: Namespace [{#NAMESPACE}] Route [{#NAME}]: Status | Information about route status. |
Dependent item | openshift.route.status[{#NAMESPACE}/{#NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
OpenShift: Route [{#NAME}] with issue: Status is false | count(/Kubernetes cluster state by HTTP/openshift.route.status[{#NAMESPACE}/{#NAME}],2m,,0)>=2 |Warning |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template to monitor Kubernetes Scheduler by Zabbix that works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
Template Kubernetes Scheduler by HTTP
- collects metrics by HTTP agent from Scheduler /metrics endpoint.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Internal service metrics are collected from the /metrics endpoint. The template needs to use authorization via API token.
Don't forget to change the macros {$KUBE.SCHEDULER.SERVER.URL} and {$KUBE.API.TOKEN}. Also, see the Macros section for a list of macros used to set trigger values.
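To verify the endpoint and token before linking the template, you can query the default metrics URL (a sketch; substitute your own URL and token):
curl -k -H "Authorization: Bearer <token>" https://localhost:10259/metrics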
NOTE. You might need to set the --binding-address option for Scheduler to the address where Zabbix proxy can reach it.
For example, for clusters created with kubeadm, it can be set in the following manifest file (changes will be applied immediately):
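For a standard kubeadm installation, this is typically /etc/kubernetes/manifests/kube-scheduler.yaml; paths may differ in other distributions.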
NOTE. Some metrics may not be collected depending on your Kubernetes Scheduler instance version and configuration.
Name | Description | Default |
---|---|---|
{$KUBE.SCHEDULER.SERVER.URL} | Kubernetes Scheduler metrics endpoint URL. |
https://localhost:10259/metrics |
{$KUBE.API.TOKEN} | API Authorization Token. |
|
{$KUBE.SCHEDULER.HTTP.CLIENT.ERROR} | Maximum number of HTTP client requests failures used for trigger. |
2 |
{$KUBE.SCHEDULER.UNSCHEDULABLE} | Maximum number of scheduling failures with 'unschedulable' used for trigger. |
2 |
{$KUBE.SCHEDULER.ERROR} | Maximum number of scheduling failures with 'error' used for trigger. |
2 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes Scheduler: Get Scheduler metrics | Get raw metrics from Scheduler instance /metrics endpoint. |
HTTP agent | kubernetes.scheduler.get_metrics Preprocessing
|
Kubernetes Scheduler: Virtual memory, bytes | Virtual memory size in bytes. |
Dependent item | kubernetes.scheduler.processvirtualmemory_bytes Preprocessing
|
Kubernetes Scheduler: Resident memory, bytes | Resident memory size in bytes. |
Dependent item | kubernetes.scheduler.processresidentmemory_bytes Preprocessing
|
Kubernetes Scheduler: CPU | Total user and system CPU usage ratio. |
Dependent item | kubernetes.scheduler.cpu.util Preprocessing
|
Kubernetes Scheduler: Goroutines | Number of goroutines that currently exist. |
Dependent item | kubernetes.scheduler.go_goroutines Preprocessing
|
Kubernetes Scheduler: Go threads | Number of OS threads created. |
Dependent item | kubernetes.scheduler.go_threads Preprocessing
|
Kubernetes Scheduler: Fds open | Number of open file descriptors. |
Dependent item | kubernetes.scheduler.open_fds Preprocessing
|
Kubernetes Scheduler: Fds max | Maximum allowed open file descriptors. |
Dependent item | kubernetes.scheduler.max_fds Preprocessing
|
Kubernetes Scheduler: REST Client requests: 2xx, rate | Number of HTTP requests with 2xx status code per second. |
Dependent item | kubernetes.scheduler.clienthttprequests_200.rate Preprocessing
|
Kubernetes Scheduler: REST Client requests: 3xx, rate | Number of HTTP requests with 3xx status code per second. |
Dependent item | kubernetes.scheduler.clienthttprequests_300.rate Preprocessing
|
Kubernetes Scheduler: REST Client requests: 4xx, rate | Number of HTTP requests with 4xx status code per second. |
Dependent item | kubernetes.scheduler.clienthttprequests_400.rate Preprocessing
|
Kubernetes Scheduler: REST Client requests: 5xx, rate | Number of HTTP requests with 5xx status code per second. |
Dependent item | kubernetes.scheduler.clienthttprequests_500.rate Preprocessing
|
Kubernetes Scheduler: Schedule attempts: scheduled | Number of attempts to schedule pods with result "scheduled" per second. |
Dependent item | kubernetes.scheduler.schedulerscheduleattempts.scheduled.rate Preprocessing
|
Kubernetes Scheduler: Schedule attempts: unschedulable | Number of attempts to schedule pods with result "unschedulable" per second. |
Dependent item | kubernetes.scheduler.schedulerscheduleattempts.unschedulable.rate Preprocessing
|
Kubernetes Scheduler: Schedule attempts: error | Number of attempts to schedule pods with result "error" per second. |
Dependent item | kubernetes.scheduler.schedulerscheduleattempts.error.rate Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Kubernetes Scheduler: Too many REST Client errors | Kubernetes Scheduler REST Client is experiencing a high error rate of requests (with 5xx HTTP code). |
min(/Kubernetes Scheduler by HTTP/kubernetes.scheduler.client_http_requests_500.rate,5m)>{$KUBE.SCHEDULER.HTTP.CLIENT.ERROR} |Warning |
||
Kubernetes Scheduler: Too many unschedulable pods | Number of attempts to schedule pods with 'unschedulable' result is too high. 'unschedulable' means a pod could not be scheduled. |
min(/Kubernetes Scheduler by HTTP/kubernetes.scheduler.scheduler_schedule_attempts.unschedulable.rate,5m)>{$KUBE.SCHEDULER.UNSCHEDULABLE} |Warning |
||
Kubernetes Scheduler: Too many schedule attempts with errors | Number of attempts to schedule pods with 'error' result is too high. 'error' means an internal scheduler problem. |
min(/Kubernetes Scheduler by HTTP/kubernetes.scheduler.scheduler_schedule_attempts.error.rate,5m)>{$KUBE.SCHEDULER.ERROR} |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Scheduling algorithm histogram | Discovery raw data of scheduling algorithm latency. |
Dependent item | kubernetes.scheduler.scheduling_algorithm.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes Scheduler: Scheduling algorithm duration bucket, {#LE} | Scheduling algorithm latency in seconds. |
Dependent item | kubernetes.scheduler.schedulingalgorithmduration[{#LE}] Preprocessing
|
Kubernetes Scheduler: Scheduling algorithm duration, p90 | 90 percentile of scheduling algorithm latency in seconds. |
Calculated | kubernetes.scheduler.schedulingalgorithmduration_p90[{#SINGLETON}] |
Kubernetes Scheduler: Scheduling algorithm duration, p95 | 95 percentile of scheduling algorithm latency in seconds. |
Calculated | kubernetes.scheduler.schedulingalgorithmduration_p95[{#SINGLETON}] |
Kubernetes Scheduler: Scheduling algorithm duration, p99 | 99 percentile of scheduling algorithm latency in seconds. |
Calculated | kubernetes.scheduler.schedulingalgorithmduration_p99[{#SINGLETON}] |
Kubernetes Scheduler: Scheduling algorithm duration, p50 | 50 percentile of scheduling algorithm latency in seconds. |
Calculated | kubernetes.scheduler.schedulingalgorithmduration_p50[{#SINGLETON}] |
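For reference, the calculated percentile items above are typically built from the histogram bucket item; in Zabbix 6.0 and later this is done with the bucket_percentile() function. A hedged sketch of what the p90 formula can look like (the exact item key and evaluation period in the shipped template may differ):

```
bucket_percentile(//kubernetes.scheduler.scheduling_algorithm_duration[*],5m,90)
```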
Name | Description | Type | Key and additional info |
---|---|---|---|
Binding histogram | Discovery raw data of binding latency. |
Dependent item | kubernetes.scheduler.binding.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes Scheduler: Binding duration bucket, {#LE} | Binding latency in seconds. |
Dependent item | kubernetes.scheduler.binding_duration[{#LE}] Preprocessing
|
Kubernetes Scheduler: Binding duration, p90 | 90 percentile of binding latency in seconds. |
Calculated | kubernetes.scheduler.bindingdurationp90[{#SINGLETON}] |
Kubernetes Scheduler: Binding duration, p95 | 95 percentile of binding latency in seconds. |
Calculated | kubernetes.scheduler.bindingdurationp95[{#SINGLETON}] |
Kubernetes Scheduler: Binding duration, p99 | 99 percentile of binding latency in seconds. |
Calculated | kubernetes.scheduler.bindingdurationp99[{#SINGLETON}] |
Kubernetes Scheduler: Binding duration, p50 | 50 percentile of binding latency in seconds. |
Calculated | kubernetes.scheduler.bindingdurationp50[{#SINGLETON}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
e2e scheduling histogram | Discovery raw data and percentile items of e2e scheduling latency. |
Dependent item | kubernetes.controller.e2e_scheduling.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes Scheduler: ["{#RESULT}"]: e2e scheduling seconds bucket, {#LE} | E2e scheduling latency in seconds (scheduling algorithm + binding) |
Dependent item | kubernetes.scheduler.e2eschedulingbucket[{#LE},"{#RESULT}"] Preprocessing
|
Kubernetes Scheduler: ["{#RESULT}"]: e2e scheduling, p50 | 50 percentile of e2e scheduling latency. |
Calculated | kubernetes.scheduler.e2eschedulingp50["{#RESULT}"] |
Kubernetes Scheduler: ["{#RESULT}"]: e2e scheduling, p90 | 90 percentile of e2e scheduling latency. |
Calculated | kubernetes.scheduler.e2eschedulingp90["{#RESULT}"] |
Kubernetes Scheduler: ["{#RESULT}"]: e2e scheduling, p95 | 95 percentile of e2e scheduling latency. |
Calculated | kubernetes.scheduler.e2eschedulingp95["{#RESULT}"] |
Kubernetes Scheduler: ["{#RESULT}"]: e2e scheduling, p99 | 95 percentile of e2e scheduling latency. |
Calculated | kubernetes.scheduler.e2eschedulingp99["{#RESULT}"] |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template to monitor Kubernetes nodes that works without any external scripts. It uses the script item to make HTTP requests to the Kubernetes API.
Install the Zabbix Helm Chart (https://git.zabbix.com/projects/ZT/repos/kubernetes-helm/browse?at=refs%2Fheads%2Frelease%2F6.4) in your Kubernetes cluster.
Change the values according to the environment in the file $HOME/zabbix_values.yaml.
For example:
Set the {$KUBE.API.URL} macro, such as <scheme>://<host>:<port>.
Get the generated service account token using the command
kubectl get secret zabbix-service-account -n monitoring -o jsonpath={.data.token} | base64 -d
and then set it to the {$KUBE.API.TOKEN} macro.
Set up the macros to filter the metrics of discovered nodes
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Install the Zabbix Helm Chart in your Kubernetes cluster.
Set the {$KUBE.API.URL} macro, such as <scheme>://<host>:<port>.
Get the generated service account token using the command
kubectl get secret zabbix-service-account -n monitoring -o jsonpath={.data.token} | base64 -d
and then set it to the {$KUBE.API.TOKEN} macro.
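Before assigning the macros, you can verify that the token and the API URL work (a hedged sketch; replace <scheme>://<host>:<port> with your API server address):

```
# Store the service account token in a shell variable and query the API version.
TOKEN=$(kubectl get secret zabbix-service-account -n monitoring -o jsonpath={.data.token} | base64 -d)
curl -sk -H "Authorization: Bearer $TOKEN" <scheme>://<host>:<port>/version
```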
Set {$KUBE.NODES.ENDPOINT.NAME} to the Zabbix agent's endpoint name (see kubectl -n monitoring get ep). Default: zabbix-zabbix-helm-chrt-agent.
Set up the macros to filter the metrics of discovered nodes and host creation based on host prototypes:
Set up macros to filter pod metrics by namespace:
Note: if you have a large cluster, it is highly recommended to set a filter for discoverable pods.
You can use the {$KUBE.NODE.FILTER.LABELS}, {$KUBE.POD.FILTER.LABELS}, {$KUBE.NODE.FILTER.ANNOTATIONS}, and {$KUBE.POD.FILTER.ANNOTATIONS} macros for advanced filtering of nodes and pods by labels and annotations.
Notes about labels and annotations filters:
- Macro values should be specified as comma-separated key: value pairs; regular expressions are supported in the values (key1: value, key2: regexp).
- An exclamation mark (!) can be used to invert the filter (!key: value).
For example: kubernetes.io/hostname: kubernetes-node[5-25], !node-role.kubernetes.io/ingress: .*. As a result, the nodes 5-25 without the "ingress" role will be discovered.
See the Kubernetes documentation for details about labels and annotations:
Note, the discovered nodes will be created as separate hosts in Zabbix with the Linux template automatically assigned to them.
Name | Description | Default |
---|---|---|
{$KUBE.API.URL} | Kubernetes API endpoint URL in the format <scheme>://<host>:<port>. |
https://kubernetes.default.svc.cluster.local:443 |
{$KUBE.API.TOKEN} | Service account bearer token. |
|
{$KUBE.HTTP.PROXY} | Sets the HTTP proxy to |
|
{$KUBE.NODES.ENDPOINT.NAME} | Kubernetes nodes endpoint name. See "kubectl -n monitoring get ep". |
zabbix-zabbix-helm-chrt-agent |
{$KUBE.LLD.FILTER.NODE.MATCHES} | Filter of discoverable nodes. |
.* |
{$KUBE.LLD.FILTER.NODE.NOT_MATCHES} | Filter to exclude discovered nodes. |
CHANGE_IF_NEEDED |
{$KUBE.LLD.FILTER.NODE.ROLE.MATCHES} | Filter of discoverable nodes by role. |
.* |
{$KUBE.LLD.FILTER.NODE.ROLE.NOT_MATCHES} | Filter to exclude discovered node by role. |
CHANGE_IF_NEEDED |
{$KUBE.NODE.FILTER.ANNOTATIONS} | Annotations to filter nodes (regex in values are supported). See the template's README.md for details. |
|
{$KUBE.NODE.FILTER.LABELS} | Labels to filter nodes (regex in values are supported). See the template's README.md for details. |
|
{$KUBE.POD.FILTER.ANNOTATIONS} | Annotations to filter pods (regex in values are supported). See the template's README.md for details. |
|
{$KUBE.POD.FILTER.LABELS} | Labels to filter Pods (regex in values are supported). See the template's README.md for details. |
|
{$KUBE.LLD.FILTER.POD.NAMESPACE.MATCHES} | Filter of discoverable pods by namespace. |
.* |
{$KUBE.LLD.FILTER.POD.NAMESPACE.NOT_MATCHES} | Filter to exclude discovered pods by namespace. |
CHANGE_IF_NEEDED |
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes: Get nodes | Collecting and processing cluster nodes data via Kubernetes API. |
Script | kube.nodes |
Get nodes check | Data collection check. |
Dependent item | kube.nodes.check Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Kubernetes: Failed to get nodes | length(last(/Kubernetes nodes by HTTP/kube.nodes.check))>0 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Node discovery | Dependent item | kube.node.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Node [{#NAME}]: Get data | Collecting and processing cluster by node [{#NAME}] data via Kubernetes API. |
Dependent item | kube.node.get[{#NAME}] Preprocessing
|
Node [{#NAME}] Addresses: External IP | Typically the IP address of the node that is externally routable (available from outside the cluster). |
Dependent item | kube.node.addresses.external_ip[{#NAME}] Preprocessing
|
Node [{#NAME}] Addresses: Internal IP | Typically the IP address of the node that is routable only within the cluster. |
Dependent item | kube.node.addresses.internal_ip[{#NAME}] Preprocessing
|
Node [{#NAME}] Allocatable: CPU | Allocatable CPU. 'Allocatable' on a Kubernetes node is defined as the amount of compute resources that are available for pods. The scheduler does not over-subscribe 'Allocatable'. 'CPU', 'memory' and 'ephemeral-storage' are supported as of now. |
Dependent item | kube.node.allocatable.cpu[{#NAME}] Preprocessing
|
Node [{#NAME}] Allocatable: Memory | Allocatable Memory. 'Allocatable' on a Kubernetes node is defined as the amount of compute resources that are available for pods. The scheduler does not over-subscribe 'Allocatable'. 'CPU', 'memory' and 'ephemeral-storage' are supported as of now. |
Dependent item | kube.node.allocatable.memory[{#NAME}] Preprocessing
|
Node [{#NAME}] Allocatable: Pods | https://kubernetes.io/docs/tasks/administer-cluster/reserve-compute-resources/ |
Dependent item | kube.node.allocatable.pods[{#NAME}] Preprocessing
|
Node [{#NAME}] Capacity: CPU | CPU resource capacity. https://kubernetes.io/docs/concepts/architecture/nodes/#capacity |
Dependent item | kube.node.capacity.cpu[{#NAME}] Preprocessing
|
Node [{#NAME}] Capacity: Memory | Memory resource capacity. https://kubernetes.io/docs/concepts/architecture/nodes/#capacity |
Dependent item | kube.node.capacity.memory[{#NAME}] Preprocessing
|
Node [{#NAME}] Capacity: Pods | https://kubernetes.io/docs/tasks/administer-cluster/reserve-compute-resources/ |
Dependent item | kube.node.capacity.pods[{#NAME}] Preprocessing
|
Node [{#NAME}] Conditions: Disk pressure | True if pressure exists on the disk size - that is, if the disk capacity is low; otherwise False. |
Dependent item | kube.node.conditions.diskpressure[{#NAME}] Preprocessing
|
Node [{#NAME}] Conditions: Memory pressure | True if pressure exists on the node memory - that is, if the node memory is low; otherwise False. |
Dependent item | kube.node.conditions.memorypressure[{#NAME}] Preprocessing
|
Node [{#NAME}] Conditions: Network unavailable | True if the network for the node is not correctly configured, otherwise False. |
Dependent item | kube.node.conditions.networkunavailable[{#NAME}] Preprocessing
|
Node [{#NAME}] Conditions: PID pressure | True if pressure exists on the processes - that is, if there are too many processes on the node; otherwise False. |
Dependent item | kube.node.conditions.pidpressure[{#NAME}] Preprocessing
|
Node [{#NAME}] Conditions: Ready | True if the node is healthy and ready to accept pods, False if the node is not healthy and is not accepting pods, and Unknown if the node controller has not heard from the node in the last node-monitor-grace-period (default is 40 seconds). |
Dependent item | kube.node.conditions.ready[{#NAME}] Preprocessing
|
Node [{#NAME}] Info: Architecture | Node architecture. |
Dependent item | kube.node.info.architecture[{#NAME}] Preprocessing
|
Node [{#NAME}] Info: Container runtime | Container runtime. https://kubernetes.io/docs/setup/production-environment/container-runtimes/ |
Dependent item | kube.node.info.containerruntime[{#NAME}] Preprocessing
|
Node [{#NAME}] Info: Kernel version | Node kernel version. |
Dependent item | kube.node.info.kernelversion[{#NAME}] Preprocessing
|
Node [{#NAME}] Info: Kubelet version | Version of Kubelet. |
Dependent item | kube.node.info.kubeletversion[{#NAME}] Preprocessing
|
Node [{#NAME}] Info: KubeProxy version | Version of KubeProxy. |
Dependent item | kube.node.info.kubeproxyversion[{#NAME}] Preprocessing
|
Node [{#NAME}] Info: Operating system | Node operating system. |
Dependent item | kube.node.info.operatingsystem[{#NAME}] Preprocessing
|
Node [{#NAME}] Info: OS image | Node OS image. |
Dependent item | kube.node.info.osversion[{#NAME}] Preprocessing
|
Node [{#NAME}] Info: Roles | Node roles. |
Dependent item | kube.node.info.roles[{#NAME}] Preprocessing
|
Node [{#NAME}] Limits: CPU | Node CPU limits. https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/ |
Dependent item | kube.node.limits.cpu[{#NAME}] Preprocessing
|
Node [{#NAME}] Limits: Memory | Node Memory limits. https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/ |
Dependent item | kube.node.limits.memory[{#NAME}] Preprocessing
|
Node [{#NAME}] Requests: CPU | Node CPU requests. https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/ |
Dependent item | kube.node.requests.cpu[{#NAME}] Preprocessing
|
Node [{#NAME}] Requests: Memory | Node Memory requests. https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/ |
Dependent item | kube.node.requests.memory[{#NAME}] Preprocessing
|
Node [{#NAME}] Uptime | Node uptime. |
Dependent item | kube.node.uptime[{#NAME}] Preprocessing
|
Node [{#NAME}] Used: Pods | Current number of pods on the node. |
Dependent item | kube.node.used.pods[{#NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Node [{#NAME}] Conditions: Pressure exists on the disk size | True - pressure exists on the disk size - that is, if the disk capacity is low; otherwise False. |
last(/Kubernetes nodes by HTTP/kube.node.conditions.diskpressure[{#NAME}])=1 |Warning |
||
Node [{#NAME}] Conditions: Pressure exists on the node memory | True - pressure exists on the node memory - that is, if the node memory is low; otherwise False |
last(/Kubernetes nodes by HTTP/kube.node.conditions.memorypressure[{#NAME}])=1 |Warning |
||
Node [{#NAME}] Conditions: Network is not correctly configured | True - the network for the node is not correctly configured, otherwise False |
last(/Kubernetes nodes by HTTP/kube.node.conditions.networkunavailable[{#NAME}])=1 |Warning |
||
Node [{#NAME}] Conditions: Pressure exists on the processes | True - pressure exists on the processes - that is, if there are too many processes on the node; otherwise False |
last(/Kubernetes nodes by HTTP/kube.node.conditions.pidpressure[{#NAME}])=1 |Warning |
||
Node [{#NAME}] Conditions: Is not in Ready state | False - if the node is not healthy and is not accepting pods. |
last(/Kubernetes nodes by HTTP/kube.node.conditions.ready[{#NAME}])<>1 |Warning |
||
Node [{#NAME}] Limits: Total CPU limits are too high | last(/Kubernetes nodes by HTTP/kube.node.limits.cpu[{#NAME}]) / last(/Kubernetes nodes by HTTP/kube.node.allocatable.cpu[{#NAME}]) > 0.9 |Warning |
Depends on:
|
||
Node [{#NAME}] Limits: Total CPU limits are too high | last(/Kubernetes nodes by HTTP/kube.node.limits.cpu[{#NAME}]) / last(/Kubernetes nodes by HTTP/kube.node.allocatable.cpu[{#NAME}]) > 1 |Average |
|||
Node [{#NAME}] Limits: Total memory limits are too high | last(/Kubernetes nodes by HTTP/kube.node.limits.memory[{#NAME}]) / last(/Kubernetes nodes by HTTP/kube.node.allocatable.memory[{#NAME}]) > 0.9 |Warning |
Depends on:
|
||
Node [{#NAME}] Limits: Total memory limits are too high | last(/Kubernetes nodes by HTTP/kube.node.limits.memory[{#NAME}]) / last(/Kubernetes nodes by HTTP/kube.node.allocatable.memory[{#NAME}]) > 1 |Average |
|||
Node [{#NAME}] Requests: Total CPU requests are too high | last(/Kubernetes nodes by HTTP/kube.node.requests.cpu[{#NAME}]) / last(/Kubernetes nodes by HTTP/kube.node.allocatable.cpu[{#NAME}]) > 0.5 |Warning |
Depends on:
|
||
Node [{#NAME}] Requests: Total CPU requests are too high | last(/Kubernetes nodes by HTTP/kube.node.requests.cpu[{#NAME}]) / last(/Kubernetes nodes by HTTP/kube.node.allocatable.cpu[{#NAME}]) > 0.8 |Average |
|||
Node [{#NAME}] Requests: Total memory requests are too high | last(/Kubernetes nodes by HTTP/kube.node.requests.memory[{#NAME}]) / last(/Kubernetes nodes by HTTP/kube.node.allocatable.memory[{#NAME}]) > 0.5 |Warning |
Depends on:
|
||
Node [{#NAME}] Requests: Total memory requests are too high | last(/Kubernetes nodes by HTTP/kube.node.requests.memory[{#NAME}]) / last(/Kubernetes nodes by HTTP/kube.node.allocatable.memory[{#NAME}]) > 0.8 |Average |
|||
Node [{#NAME}]: Has been restarted | Uptime is less than 10 minutes. |
last(/Kubernetes nodes by HTTP/kube.node.uptime[{#NAME}])<10 |Info |
||
Node [{#NAME}] Used: Kubelet too many pods | Kubelet is running at capacity. |
last(/Kubernetes nodes by HTTP/kube.node.used.pods[{#NAME}])/ last(/Kubernetes nodes by HTTP/kube.node.capacity.pods[{#NAME}]) > 0.9 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Pod discovery | Dependent item | kube.pod.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Node [{#NODE}] Pod [{#POD}]: Get data | Collecting and processing cluster by node [{#NODE}] data via Kubernetes API. |
Dependent item | kube.pod.get[{#POD}] Preprocessing
|
Node [{#NODE}] Pod [{#POD}] Conditions: Containers ready | All containers in the Pod are ready. https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-conditions |
Dependent item | kube.pod.conditions.containers_ready[{#POD}] Preprocessing
|
Node [{#NODE}] Pod [{#POD}] Conditions: Initialized | All init containers have started successfully. https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-conditions |
Dependent item | kube.pod.conditions.initialized[{#POD}] Preprocessing
|
Node [{#NODE}] Pod [{#POD}] Conditions: Ready | The Pod is able to serve requests and should be added to the load balancing pools of all matching Services. https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-conditions |
Dependent item | kube.pod.conditions.ready[{#POD}] Preprocessing
|
Node [{#NODE}] Pod [{#POD}] Conditions: Scheduled | The Pod has been scheduled to a node. https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-conditions |
Dependent item | kube.pod.conditions.scheduled[{#POD}] Preprocessing
|
Node [{#NODE}] Pod [{#POD}] Containers: Restarts | The number of times the container has been restarted, currently based on the number of dead containers that have not yet been removed. Note that this is calculated from dead containers. But those containers are subject to garbage collection. |
Dependent item | kube.pod.containers.restartcount[{#POD}] Preprocessing
|
Node [{#NODE}] Pod [{#POD}] Status: Phase | The phase of a Pod is a simple, high-level summary of where the Pod is in its lifecycle. https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#pod-phase |
Dependent item | kube.pod.status.phase[{#POD}] Preprocessing
|
Node [{#NODE}] Pod [{#POD}] Uptime | Pod uptime. |
Dependent item | kube.pod.uptime[{#POD}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Node [{#NODE}] Pod [{#POD}]: Pod is crash looping | Containers of the pod keep restarting. This most likely indicates that the pod is in the CrashLoopBackOff state. |
(last(/Kubernetes nodes by HTTP/kube.pod.containers.restartcount[{#POD}])-min(/Kubernetes nodes by HTTP/kube.pod.containers.restartcount[{#POD}],15m))>1 |Warning |
||
Node [{#NODE}] Pod [{#POD}] Status: Kubernetes Pod not healthy | Pod has been in a non-ready state for longer than 10 minutes. |
count(/Kubernetes nodes by HTTP/kube.pod.status.phase[{#POD}],10m, "regexp","^(1|4|5)$")>=9 |High |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template to monitor Kubernetes Kubelet by Zabbix that works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
Template Kubernetes Kubelet by HTTP
- collects metrics by HTTP agent from Kubelet /metrics endpoint.
Don't forget to change the macros {$KUBE.KUBELET.URL} and {$KUBE.API.TOKEN}.
NOTE. Some metrics may not be collected depending on your Kubernetes instance version and configuration.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Internal service metrics are collected from the /metrics endpoint. The template uses authorization via an API token.
Don't forget to change the macros {$KUBE.KUBELET.URL} and {$KUBE.API.TOKEN}.
NOTE. Some metrics may not be collected depending on your Kubernetes instance version and configuration.
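A hedged connectivity check against the Kubelet endpoints used by this template (assumes the service account token is stored in $TOKEN, the default port 10250, and a placeholder node address):

```
curl -sk -H "Authorization: Bearer $TOKEN" https://<node-address>:10250/metrics | head
curl -sk -H "Authorization: Bearer $TOKEN" https://<node-address>:10250/metrics/cadvisor | head
curl -sk -H "Authorization: Bearer $TOKEN" https://<node-address>:10250/pods | head
```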
Name | Description | Default |
---|---|---|
{$KUBE.API.TOKEN} | Service account bearer token. |
|
{$KUBE.KUBELET.URL} | Kubernetes Kubelet instance URL. |
https://localhost:10250 |
{$KUBE.KUBELET.METRIC.ENDPOINT} | Kubelet /metrics endpoint. |
/metrics |
{$KUBE.KUBELET.CADVISOR.ENDPOINT} | cAdvisor metrics from Kubelet /metrics/cadvisor endpoint. |
/metrics/cadvisor |
{$KUBE.KUBELET.PODS.ENDPOINT} | Kubelet /pods endpoint. |
/pods |
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes: Get kubelet metrics | Collecting raw Kubelet metrics from /metrics endpoint. |
HTTP agent | kube.kubelet.metrics |
Kubernetes: Get cadvisor metrics | Collecting raw Kubelet metrics from /metrics/cadvisor endpoint. |
HTTP agent | kube.cadvisor.metrics |
Kubernetes: Get pods | Collecting raw Kubelet metrics from /pods endpoint. |
HTTP agent | kube.pods |
Kubernetes: Pods running | The number of running pods. |
Dependent item | kube.kubelet.pods.running Preprocessing
|
Kubernetes: Containers running | The number of running containers. |
Dependent item | kube.kubelet.containers.running Preprocessing
|
Kubernetes: Containers last state terminated | The number of containers that were previously terminated. |
Dependent item | kube.kublet.containers.terminated Preprocessing
|
Kubernetes: Containers restarts | The number of times the container has been restarted. |
Dependent item | kube.kubelet.containers.restarts Preprocessing
|
Kubernetes: CPU cores, total | The number of cores in this machine (available until kubernetes v1.18). |
Dependent item | kube.kubelet.cpu.cores Preprocessing
|
Kubernetes: Machine memory, bytes | Resident memory size in bytes. |
Dependent item | kube.kubelet.machine.memory Preprocessing
|
Kubernetes: Virtual memory, bytes | Virtual memory size in bytes. |
Dependent item | kube.kubelet.virtual.memory Preprocessing
|
Kubernetes: File descriptors, max | Maximum number of open file descriptors. |
Dependent item | kube.kubelet.processmaxfds Preprocessing
|
Kubernetes: File descriptors, open | Number of open file descriptors. |
Dependent item | kube.kubelet.processopenfds Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Runtime operations discovery | Dependent item | kube.kubelet.runtimeoperationsbucket.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes: [{#OP_TYPE}] Runtime operations bucket: {#LE} | Duration in seconds of runtime operations. Broken down by operation type. |
Dependent item | kube.kublet.runtimeopsdurationsecondsbucket[{#LE},"{#OP_TYPE}"] Preprocessing
|
Kubernetes: [{#OP_TYPE}] Runtime operations total, rate | Cumulative number of runtime operations by operation type. |
Dependent item | kube.kublet.runtimeopstotal.rate["{#OP_TYPE}"] Preprocessing
|
Kubernetes: [{#OP_TYPE}] Operations, p90 | 90 percentile of operation latency distribution in seconds for each verb. |
Calculated | kube.kublet.runtimeopsdurationsecondsp90["{#OP_TYPE}"] |
Kubernetes: [{#OP_TYPE}] Operations, p95 | 95 percentile of operation latency distribution in seconds for each verb. |
Calculated | kube.kublet.runtimeopsdurationsecondsp95["{#OP_TYPE}"] |
Kubernetes: [{#OP_TYPE}] Operations, p99 | 99 percentile of operation latency distribution in seconds for each verb. |
Calculated | kube.kublet.runtimeopsdurationsecondsp99["{#OP_TYPE}"] |
Kubernetes: [{#OP_TYPE}] Operations, p50 | 50 percentile of operation latency distribution in seconds for each verb. |
Calculated | kube.kublet.runtimeopsdurationsecondsp50["{#OP_TYPE}"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Pods discovery | Dependent item | kube.kubelet.pods.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes: Namespace [{#NAMESPACE}] Pod [{#NAME}] CPU: Load average, 10s | Pods cpu load average over the last 10 seconds. |
Dependent item | kube.pod.containercpuloadaverage10s[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] Pod [{#NAME}] CPU: System seconds, total | System cpu time consumed. It is calculated from the cumulative value using the |
Dependent item | kube.pod.containercpusystemsecondstotal[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] Pod [{#NAME}] CPU: Usage seconds, total | Consumed cpu time. It is calculated from the cumulative value using the |
Dependent item | kube.pod.containercpuusagesecondstotal[{#NAMESPACE}/{#NAME}] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] Pod [{#NAME}] CPU: User seconds, total | User cpu time consumed. It is calculated from the cumulative value using the |
Dependent item | kube.pod.containercpuusersecondstotal[{#NAMESPACE}/{#NAME}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
REST client requests discovery | Dependent item | kube.kubelet.rest.requests.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes: Host [{#HOST}] Request method [{#METHOD}] Code:[{#CODE}] | Number of HTTP requests, partitioned by status code, method, and host. |
Dependent item | kube.kubelet.rest.requests["{#CODE}", "{#HOST}", "{#METHOD}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Container memory discovery | Dependent item | kube.kubelet.container.memory.cache.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes: Namespace [{#NAMESPACE}] Pod [{#POD}] Container [{#CONTAINER}]: Memory page cache | Number of bytes of page cache memory. |
Dependent item | kube.kubelet.container.memory.cache["{#CONTAINER}", "{#NAMESPACE}", "{#POD}"] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] Pod [{#POD}] Container [{#CONTAINER}]: Memory max usage | Maximum memory usage recorded in bytes. |
Dependent item | kube.kubelet.container.memory.max_usage["{#CONTAINER}", "{#NAMESPACE}", "{#POD}"] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] Pod [{#POD}] Container [{#CONTAINER}]: RSS | Size of RSS in bytes. |
Dependent item | kube.kubelet.container.memory.rss["{#CONTAINER}", "{#NAMESPACE}", "{#POD}"] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] Pod [{#POD}] Container [{#CONTAINER}]: Swap | Container swap usage in bytes. |
Dependent item | kube.kubelet.container.memory.swap["{#CONTAINER}", "{#NAMESPACE}", "{#POD}"] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] Pod [{#POD}] Container [{#CONTAINER}]: Usage | Current memory usage in bytes, including all memory regardless of when it was accessed. |
Dependent item | kube.kubelet.container.memory.usage["{#CONTAINER}", "{#NAMESPACE}", "{#POD}"] Preprocessing
|
Kubernetes: Namespace [{#NAMESPACE}] Pod [{#POD}] Container [{#CONTAINER}]: Working set | Current working set in bytes. |
Dependent item | kube.kubelet.container.memory.working_set["{#CONTAINER}", "{#NAMESPACE}", "{#POD}"] Preprocessing
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template to monitor Kubernetes Controller manager by Zabbix that works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
Template Kubernetes Controller manager by HTTP
- collects metrics by HTTP agent from Controller manager /metrics endpoint.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Internal service metrics are collected from the /metrics endpoint. The template uses authorization via an API token.
Don't forget to change the macros {$KUBE.CONTROLLER.SERVER.URL} and {$KUBE.API.TOKEN}. Also, see the Macros section for a list of macros used to set trigger values.
NOTE. You might need to set the --bind-address option for the Controller Manager to an address where the Zabbix proxy can reach it. For example, for clusters created with kubeadm it can be set in the following manifest file (changes are applied immediately):
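A minimal sketch of that change, analogous to the Scheduler case, assuming the default kubeadm manifest path and bind address (verify both in your cluster before editing):

```
sudo sed -i 's/--bind-address=127.0.0.1/--bind-address=0.0.0.0/' \
  /etc/kubernetes/manifests/kube-controller-manager.yaml
```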
NOTE. Some metrics may not be collected depending on your Kubernetes Controller manager instance version and configuration.
Name | Description | Default |
---|---|---|
{$KUBE.CONTROLLER.SERVER.URL} | Kubernetes Controller manager metrics endpoint URL. |
https://localhost:10257/metrics |
{$KUBE.API.TOKEN} | API Authorization Token |
|
{$KUBE.CONTROLLER.HTTP.CLIENT.ERROR} | Maximum number of HTTP client requests failures used for trigger. |
2 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes Controller: Get Controller metrics | Get raw metrics from Controller instance /metrics endpoint. |
HTTP agent | kubernetes.controller.get_metrics Preprocessing
|
Kubernetes Controller Manager: Leader election status | Gauge of if the reporting system is master of the relevant lease, 0 indicates backup, 1 indicates master. |
Dependent item | kubernetes.controller.leaderelectionmaster_status Preprocessing
|
Kubernetes Controller Manager: Virtual memory, bytes | Virtual memory size in bytes. |
Dependent item | kubernetes.controller.processvirtualmemory_bytes Preprocessing
|
Kubernetes Controller Manager: Resident memory, bytes | Resident memory size in bytes. |
Dependent item | kubernetes.controller.processresidentmemory_bytes Preprocessing
|
Kubernetes Controller Manager: CPU | Total user and system CPU usage ratio. |
Dependent item | kubernetes.controller.cpu.util Preprocessing
|
Kubernetes Controller Manager: Goroutines | Number of goroutines that currently exist. |
Dependent item | kubernetes.controller.go_goroutines Preprocessing
|
Kubernetes Controller Manager: Go threads | Number of OS threads created. |
Dependent item | kubernetes.controller.go_threads Preprocessing
|
Kubernetes Controller Manager: Fds open | Number of open file descriptors. |
Dependent item | kubernetes.controller.open_fds Preprocessing
|
Kubernetes Controller Manager: Fds max | Maximum allowed open file descriptors. |
Dependent item | kubernetes.controller.max_fds Preprocessing
|
Kubernetes Controller Manager: REST Client requests: 2xx, rate | Number of HTTP requests with 2xx status code per second. |
Dependent item | kubernetes.controller.clienthttprequests_200.rate Preprocessing
|
Kubernetes Controller Manager: REST Client requests: 3xx, rate | Number of HTTP requests with 3xx status code per second. |
Dependent item | kubernetes.controller.clienthttprequests_300.rate Preprocessing
|
Kubernetes Controller Manager: REST Client requests: 4xx, rate | Number of HTTP requests with 4xx status code per second. |
Dependent item | kubernetes.controller.clienthttprequests_400.rate Preprocessing
|
Kubernetes Controller Manager: REST Client requests: 5xx, rate | Number of HTTP requests with 5xx status code per second. |
Dependent item | kubernetes.controller.clienthttprequests_500.rate Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Kubernetes Controller Manager: Too many HTTP client errors | Kubernetes Controller manager is experiencing a high error rate (with 5xx HTTP code). |
min(/Kubernetes Controller manager by HTTP/kubernetes.controller.client_http_requests_500.rate,5m)>{$KUBE.CONTROLLER.HTTP.CLIENT.ERROR} |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Workqueue metrics discovery | Dependent item | kubernetes.controller.workqueue.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes Controller Manager: ["{#NAME}"]: Workqueue adds total, rate | Total number of adds handled by workqueue per second. |
Dependent item | kubernetes.controller.workqueueaddstotal["{#NAME}"] Preprocessing
|
Kubernetes Controller Manager: ["{#NAME}"]: Workqueue depth | Current depth of workqueue. |
Dependent item | kubernetes.controller.workqueue_depth["{#NAME}"] Preprocessing
|
Kubernetes Controller Manager: ["{#NAME}"]: Workqueue unfinished work, sec | How many seconds of work has done that is in progress and hasn't been observed by work_duration. Large values indicate stuck threads. One can deduce the number of stuck threads by observing the rate at which this increases. |
Dependent item | kubernetes.controller.workqueueunfinishedwork_seconds["{#NAME}"] Preprocessing
|
Kubernetes Controller Manager: ["{#NAME}"]: Workqueue retries, rate | Total number of retries handled by workqueue per second. |
Dependent item | kubernetes.controller.workqueueretriestotal["{#NAME}"] Preprocessing
|
Kubernetes Controller Manager: ["{#NAME}"]: Workqueue longest running processor, sec | How many seconds has the longest running processor for workqueue been running. |
Dependent item | kubernetes.controller.workqueuelongestrunningprocessorseconds["{#NAME}"] Preprocessing
|
Kubernetes Controller Manager: ["{#NAME}"]: Workqueue work duration, p90 | 90 percentile of how long in seconds processing an item from workqueue takes, by queue. |
Calculated | kubernetes.controller.workqueueworkdurationsecondsp90["{#NAME}"] |
Kubernetes Controller Manager: ["{#NAME}"]: Workqueue work duration, p95 | 95 percentile of how long in seconds processing an item from workqueue takes, by queue. |
Calculated | kubernetes.controller.workqueueworkdurationsecondsp95["{#NAME}"] |
Kubernetes Controller Manager: ["{#NAME}"]: Workqueue work duration, p99 | 99 percentile of how long in seconds processing an item from workqueue takes, by queue. |
Calculated | kubernetes.controller.workqueueworkdurationsecondsp99["{#NAME}"] |
Kubernetes Controller Manager: ["{#NAME}"]: Workqueue work duration, 50p | 50 percentiles of how long in seconds processing an item from workqueue takes, by queue. |
Calculated | kubernetes.controller.workqueueworkdurationsecondsp50["{#NAME}"] |
Kubernetes Controller Manager: ["{#NAME}"]: Workqueue queue duration, p90 | 90 percentile of how long in seconds an item stays in workqueue before being requested, by queue. |
Calculated | kubernetes.controller.workqueuequeuedurationsecondsp90["{#NAME}"] |
Kubernetes Controller Manager: ["{#NAME}"]: Workqueue queue duration, p95 | 95 percentile of how long in seconds an item stays in workqueue before being requested, by queue. |
Calculated | kubernetes.controller.workqueuequeuedurationsecondsp95["{#NAME}"] |
Kubernetes Controller Manager: ["{#NAME}"]: Workqueue queue duration, p99 | 99 percentile of how long in seconds an item stays in workqueue before being requested, by queue. |
Calculated | kubernetes.controller.workqueuequeuedurationsecondsp99["{#NAME}"] |
Kubernetes Controller Manager: ["{#NAME}"]: Workqueue queue duration, 50p | 50 percentile of how long in seconds an item stays in workqueue before being requested. If there are no requests for 5 minute, item value will be discarded. |
Calculated | kubernetes.controller.workqueuequeuedurationsecondsp50["{#NAME}"] Preprocessing
|
Kubernetes Controller Manager: ["{#NAME}"]: Workqueue duration seconds bucket, {#LE} | How long in seconds processing an item from workqueue takes. |
Dependent item | kubernetes.controller.durationsecondsbucket[{#LE},"{#NAME}"] Preprocessing
|
Kubernetes Controller Manager: ["{#NAME}"]: Queue duration seconds bucket, {#LE} | How long in seconds an item stays in workqueue before being requested. |
Dependent item | kubernetes.controller.queuedurationseconds_bucket[{#LE},"{#NAME}"] Preprocessing
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template to monitor Kubernetes API server that works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
Template Kubernetes API server by HTTP
- collects metrics by HTTP agent from API server /metrics endpoint.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Internal service metrics are collected from the /metrics endpoint. The template uses authorization via an API token.
Don't forget to change the macros {$KUBE.API.SERVER.URL} and {$KUBE.API.TOKEN}. Also, see the Macros section for a list of macros used to set trigger values.
NOTE. Some metrics may not be collected depending on your Kubernetes API server instance version and configuration.
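One way to confirm that the token can read the metrics before filling in the macros is to query the endpoint through the API server (a hedged example; the host and port are placeholders):

```
# Using kubectl with a context that has sufficient RBAC permissions:
kubectl get --raw /metrics | head
# Or directly with the token that will be set in {$KUBE.API.TOKEN}:
curl -sk -H "Authorization: Bearer $TOKEN" https://<api-server>:6443/metrics | head
```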
Name | Description | Default |
---|---|---|
{$KUBE.API.SERVER.URL} | Kubernetes API server metrics endpoint URL. |
https://localhost:6443/metrics |
{$KUBE.API.TOKEN} | API Authorization Token. |
|
{$KUBE.API.CERT.EXPIRATION} | Number of days before the client certificate expires; used for the trigger. |
7 |
{$KUBE.API.HTTP.CLIENT.ERROR} | Maximum number of HTTP client requests failures used for trigger. |
2 |
{$KUBE.API.HTTP.SERVER.ERROR} | Maximum number of HTTP server requests failures used for trigger. |
2 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes API: Get API instance metrics | Get raw metrics from API instance /metrics endpoint. |
HTTP agent | kubernetes.api.get_metrics Preprocessing
|
Kubernetes API: Audit events, total | Accumulated number of audit events generated and sent to the audit backend. |
Dependent item | kubernetes.api.auditeventtotal Preprocessing
|
Kubernetes API: Virtual memory, bytes | Virtual memory size in bytes. |
Dependent item | kubernetes.api.processvirtualmemory_bytes Preprocessing
|
Kubernetes API: Resident memory, bytes | Resident memory size in bytes. |
Dependent item | kubernetes.api.processresidentmemory_bytes Preprocessing
|
Kubernetes API: CPU | Total user and system CPU usage ratio. |
Dependent item | kubernetes.api.cpu.util Preprocessing
|
Kubernetes API: Goroutines | Number of goroutines that currently exist. |
Dependent item | kubernetes.api.go_goroutines Preprocessing
|
Kubernetes API: Go threads | Number of OS threads created. |
Dependent item | kubernetes.api.go_threads Preprocessing
|
Kubernetes API: Fds open | Number of open file descriptors. |
Dependent item | kubernetes.api.open_fds Preprocessing
|
Kubernetes API: Fds max | Maximum allowed open file descriptors. |
Dependent item | kubernetes.api.max_fds Preprocessing
|
Kubernetes API: gRPCs client started, rate | Total number of RPCs started per second. |
Dependent item | kubernetes.api.grpcclientstarted.rate Preprocessing
|
Kubernetes API: gRPCs messages received, rate | Total number of gRPC stream messages received per second. |
Dependent item | kubernetes.api.grpcclientmsg_received.rate Preprocessing
|
Kubernetes API: gRPCs messages sent, rate | Total number of gRPC stream messages sent per second. |
Dependent item | kubernetes.api.grpcclientmsg_sent.rate Preprocessing
|
Kubernetes API: Request terminations, rate | Number of requests which apiserver terminated in self-defense per second. |
Dependent item | kubernetes.api.apiserverrequestterminations Preprocessing
|
Kubernetes API: TLS handshake errors, rate | Number of requests dropped with 'TLS handshake error from' error per second. |
Dependent item | kubernetes.api.apiservertlshandshakeerrorstotal.rate Preprocessing
|
Kubernetes API: API server requests: 5xx, rate | Counter of apiserver requests broken out for each HTTP response code. |
Dependent item | kubernetes.api.apiserverrequesttotal_500.rate Preprocessing
|
Kubernetes API: API server requests: 4xx, rate | Counter of apiserver requests broken out for each HTTP response code. |
Dependent item | kubernetes.api.apiserverrequesttotal_400.rate Preprocessing
|
Kubernetes API: API server requests: 3xx, rate | Counter of apiserver requests broken out for each HTTP response code. |
Dependent item | kubernetes.api.apiserverrequesttotal_300.rate Preprocessing
|
Kubernetes API: API server requests: 0, rate | Counter of apiserver requests broken out for each HTTP response code. |
Dependent item | kubernetes.api.apiserverrequesttotal_0.rate Preprocessing
|
Kubernetes API: API server requests: 2xx, rate | Counter of apiserver requests broken out for each HTTP response code. |
Dependent item | kubernetes.api.apiserverrequesttotal_200.rate Preprocessing
|
Kubernetes API: HTTP requests: 5xx, rate | Number of HTTP requests with 5xx status code per second. |
Dependent item | kubernetes.api.restclientrequeststotal500.rate Preprocessing
|
Kubernetes API: HTTP requests: 4xx, rate | Number of HTTP requests with 4xx status code per second. |
Dependent item | kubernetes.api.restclientrequeststotal400.rate Preprocessing
|
Kubernetes API: HTTP requests: 3xx, rate | Number of HTTP requests with 3xx status code per second. |
Dependent item | kubernetes.api.restclientrequeststotal300.rate Preprocessing
|
Kubernetes API: HTTP requests: 2xx, rate | Number of HTTP requests with 2xx status code per second. |
Dependent item | kubernetes.api.restclientrequeststotal200.rate Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Kubernetes API: Too many server errors | Kubernetes API server is experiencing a high error rate (with 5xx HTTP code). |
min(/Kubernetes API server by HTTP/kubernetes.api.apiserver_request_total_500.rate,5m)>{$KUBE.API.HTTP.SERVER.ERROR} |Warning |
||
Kubernetes API: Too many client errors | Kubernetes API client is experiencing a high error rate (with 5xx HTTP code). |
min(/Kubernetes API server by HTTP/kubernetes.api.rest_client_requests_total_500.rate,5m)>{$KUBE.API.HTTP.CLIENT.ERROR} |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Long-running requests | Discovery of long-running requests by verb, resource and scope. |
Dependent item | kubernetes.api.longrunning_gauge.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes API: Long-running ["{#VERB}"] requests ["{#RESOURCE}"]: {#SCOPE} | Gauge of all active long-running apiserver requests broken out by verb, resource and scope. Not all requests are tracked this way. |
Dependent item | kubernetes.api.longrunning_gauge["{#RESOURCE}","{#SCOPE}","{#VERB}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Request duration histogram | Discovery raw data and percentile items of request duration. |
Dependent item | kubernetes.api.requests_bucket.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes API: ["{#VERB}"] Requests bucket: {#LE} | Response latency distribution in seconds for each verb. |
Dependent item | kubernetes.api.requestdurationseconds_bucket[{#LE},"{#VERB}"] Preprocessing
|
Kubernetes API: ["{#VERB}"] Requests, p90 | 90 percentile of response latency distribution in seconds for each verb. |
Calculated | kubernetes.api.requestdurationseconds_p90["{#VERB}"] |
Kubernetes API: ["{#VERB}"] Requests, p95 | 95 percentile of response latency distribution in seconds for each verb. |
Calculated | kubernetes.api.requestdurationseconds_p95["{#VERB}"] |
Kubernetes API: ["{#VERB}"] Requests, p99 | 99 percentile of response latency distribution in seconds for each verb. |
Calculated | kubernetes.api.requestdurationseconds_p99["{#VERB}"] |
Kubernetes API: ["{#VERB}"] Requests, p50 | 50 percentile of response latency distribution in seconds for each verb. |
Calculated | kubernetes.api.requestdurationseconds_p50["{#VERB}"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Requests inflight discovery | Discovery requests inflight by kind. |
Dependent item | kubernetes.api.inflight_requests.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes API: Requests current: {#KIND} | Maximal number of currently used inflight request limit of this apiserver per request kind in last second. |
Dependent item | kubernetes.api.currentinflightrequests["{#KIND}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
gRPC completed requests discovery | Discovery grpc completed requests by grpc code. |
Dependent item | kubernetes.api.grpcclienthandled.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes API: gRPCs completed: {#GRPC_CODE}, rate | Total number of RPCs completed by the client regardless of success or failure per second. |
Dependent item | kubernetes.api.grpcclienthandledtotal.rate["{#GRPCCODE}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Authentication attempts discovery | Discovery authentication attempts by result. |
Dependent item | kubernetes.api.authentication_attempts.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes API: Authentication attempts: {#RESULT}, rate | Authentication attempts by result per second. |
Dependent item | kubernetes.api.authentication_attempts.rate["{#RESULT}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Authentication requests discovery | Discovery authentication attempts by name. |
Dependent item | kubernetes.api.authenticateduserrequests.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes API: Authenticated requests: {#NAME}, rate | Counter of authenticated requests broken out by username per second. |
Dependent item | kubernetes.api.authenticateduserrequests.rate["{#NAME}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Watchers metrics discovery | Discovery watchers by kind. |
Dependent item | kubernetes.api.apiserverregisteredwatchers.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes API: Watchers: {#KIND} | Number of currently registered watchers for a given resource. |
Dependent item | kubernetes.api.apiserverregisteredwatchers["{#KIND}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Etcd objects metrics discovery | Discovery etcd objects by resource. |
Dependent item | kubernetes.api.etcdobjectcounts.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes API: etcd objects: {#RESOURCE} | Number of stored objects at the time of last check split by kind. |
Dependent item | kubernetes.api.etcdobjectcounts["{#RESOURCE}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Workqueue metrics discovery | Discovery workqueue metrics by name. |
Dependent item | kubernetes.api.workqueue.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes API: ["{#NAME}"] Workqueue depth | Current depth of workqueue. |
Dependent item | kubernetes.api.workqueue_depth["{#NAME}"] Preprocessing
|
Kubernetes API: ["{#NAME}"] Workqueue adds total, rate | Total number of adds handled by workqueue per second. |
Dependent item | kubernetes.api.workqueueaddstotal.rate["{#NAME}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Client certificate expiration histogram | Discovery raw data of client certificate expiration |
Dependent item | kubernetes.api.certificate_expiration.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Kubernetes API: Certificate expiration seconds bucket, {#LE} | Distribution of the remaining lifetime on the certificate used to authenticate a request. |
Dependent item | kubernetes.api.clientcertificateexpirationsecondsbucket[{#LE}] Preprocessing
|
Kubernetes API: Client certificate expiration, p1 | 1 percentile of the remaining lifetime on the certificate used to authenticate a request. |
Calculated | kubernetes.api.clientcertificateexpiration_p1[{#SINGLETON}] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Kubernetes API: Kubernetes client certificate is expiring | A client certificate used to authenticate to the apiserver is expiring in {$KUBE.API.CERT.EXPIRATION} days. |
last(/Kubernetes API server by HTTP/kubernetes.api.client_certificate_expiration_p1[{#SINGLETON}]) > 0 and last(/Kubernetes API server by HTTP/kubernetes.api.client_certificate_expiration_p1[{#SINGLETON}]) < {$KUBE.API.CERT.EXPIRATION}*24*60*60 |Warning |
Depends on:
|
|
Kubernetes API: Kubernetes client certificate expires soon | A client certificate used to authenticate to the apiserver is expiring in less than 24.0 hours. |
last(/Kubernetes API server by HTTP/kubernetes.api.client_certificate_expiration_p1[{#SINGLETON}]) > 0 and last(/Kubernetes API server by HTTP/kubernetes.api.client_certificate_expiration_p1[{#SINGLETON}]) < 24*60*60 |Warning |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for the effortless deployment of Apache Kafka monitoring by Zabbix via JMX and doesn't require any external scripts.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Metrics are collected by JMX.
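For JMX collection the broker has to expose a JMX port. A hedged example of enabling it with the environment variables read by the standard Apache Kafka start scripts (the port and the credential file paths are illustrative; they correspond to the {$KAFKA.USER}/{$KAFKA.PASSWORD} macros only if you configure those credentials in the files):

```
# kafka-run-class.sh picks up JMX_PORT and KAFKA_JMX_OPTS.
export JMX_PORT=9999
export KAFKA_JMX_OPTS="-Dcom.sun.management.jmxremote \
  -Dcom.sun.management.jmxremote.authenticate=true \
  -Dcom.sun.management.jmxremote.ssl=false \
  -Dcom.sun.management.jmxremote.password.file=/etc/kafka/jmxremote.password \
  -Dcom.sun.management.jmxremote.access.file=/etc/kafka/jmxremote.access"
bin/kafka-server-start.sh config/server.properties
```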
Name | Description | Default |
---|---|---|
{$KAFKA.USER} | zabbix |
|
{$KAFKA.PASSWORD} | zabbix |
|
{$KAFKA.TOPIC.MATCHES} | Filter of discoverable topics |
.* |
{$KAFKA.TOPIC.NOT_MATCHES} | Filter to exclude discovered topics |
__consumer_offsets |
{$KAFKA.NETPROCAVG_IDLE.MIN.WARN} | The minimum Network processor average idle percent for trigger expression. |
30 |
{$KAFKA.REQUESTHANDLERAVG_IDLE.MIN.WARN} | The minimum Request handler average idle percent for trigger expression. |
30 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Kafka: Leader election per second | Number of leader elections per second. |
JMX agent | jmx["kafka.controller:type=ControllerStats,name=LeaderElectionRateAndTimeMs","Count"] |
Kafka: Unclean leader election per second | Number of “unclean” elections per second. |
JMX agent | jmx["kafka.controller:type=ControllerStats,name=UncleanLeaderElectionsPerSec","Count"] Preprocessing
|
Kafka: Controller state on broker | One indicates that the broker is the controller for the cluster. |
JMX agent | jmx["kafka.controller:type=KafkaController,name=ActiveControllerCount","Value"] Preprocessing
|
Kafka: Ineligible pending replica deletes | The number of ineligible pending replica deletes. |
JMX agent | jmx["kafka.controller:type=KafkaController,name=ReplicasIneligibleToDeleteCount","Value"] |
Kafka: Pending replica deletes | The number of pending replica deletes. |
JMX agent | jmx["kafka.controller:type=KafkaController,name=ReplicasToDeleteCount","Value"] |
Kafka: Ineligible pending topic deletes | The number of ineligible pending topic deletes. |
JMX agent | jmx["kafka.controller:type=KafkaController,name=TopicsIneligibleToDeleteCount","Value"] |
Kafka: Pending topic deletes | The number of pending topic deletes. |
JMX agent | jmx["kafka.controller:type=KafkaController,name=TopicsToDeleteCount","Value"] |
Kafka: Offline log directory count | The number of offline log directories (for example, after a hardware failure). |
JMX agent | jmx["kafka.log:type=LogManager,name=OfflineLogDirectoryCount","Value"] |
Kafka: Offline partitions count | Number of partitions that don't have an active leader. |
JMX agent | jmx["kafka.controller:type=KafkaController,name=OfflinePartitionsCount","Value"] |
Kafka: Bytes out per second | The rate at which data is fetched and read from the broker by consumers. |
JMX agent | jmx["kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec","Count"] Preprocessing
|
Kafka: Bytes in per second | The rate at which data sent from producers is consumed by the broker. |
JMX agent | jmx["kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec","Count"] Preprocessing
|
Kafka: Messages in per second | The rate at which individual messages are consumed by the broker. |
JMX agent | jmx["kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec","Count"] Preprocessing
|
Kafka: Bytes rejected per second | The rate at which bytes are rejected by the broker. |
JMX agent | jmx["kafka.server:type=BrokerTopicMetrics,name=BytesRejectedPerSec","Count"] Preprocessing
|
Kafka: Client fetch request failed per second | Number of client fetch request failures per second. |
JMX agent | jmx["kafka.server:type=BrokerTopicMetrics,name=FailedFetchRequestsPerSec","Count"] Preprocessing
|
Kafka: Produce requests failed per second | Number of failed produce requests per second. |
JMX agent | jmx["kafka.server:type=BrokerTopicMetrics,name=FailedProduceRequestsPerSec","Count"] Preprocessing
|
Kafka: Request handler average idle percent | Indicates the percentage of time that the request handler (IO) threads are not in use. |
JMX agent | jmx["kafka.server:type=KafkaRequestHandlerPool,name=RequestHandlerAvgIdlePercent","OneMinuteRate"] Preprocessing
|
Kafka: Fetch-Consumer response send time, mean | Average time taken, in milliseconds, to send the response. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request=FetchConsumer","Mean"] |
Kafka: Fetch-Consumer response send time, p95 | The time taken, in milliseconds, to send the response for 95th percentile. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request=FetchConsumer","95thPercentile"] |
Kafka: Fetch-Consumer response send time, p99 | The time taken, in milliseconds, to send the response for 99th percentile. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request=FetchConsumer","99thPercentile"] |
Kafka: Fetch-Follower response send time, mean | Average time taken, in milliseconds, to send the response. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request=FetchFollower","Mean"] |
Kafka: Fetch-Follower response send time, p95 | The time taken, in milliseconds, to send the response for 95th percentile. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request=FetchFollower","95thPercentile"] |
Kafka: Fetch-Follower response send time, p99 | The time taken, in milliseconds, to send the response for 99th percentile. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request=FetchFollower","99thPercentile"] |
Kafka: Produce response send time, mean | Average time taken, in milliseconds, to send the response. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request=Produce","Mean"] |
Kafka: Produce response send time, p95 | The time taken, in milliseconds, to send the response for 95th percentile. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request=Produce","95thPercentile"] |
Kafka: Produce response send time, p99 | The time taken, in milliseconds, to send the response for 99th percentile. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=ResponseSendTimeMs,request=Produce","99thPercentile"] |
Kafka: Fetch-Consumer request total time, mean | Average time in ms to serve the Fetch-Consumer request. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=FetchConsumer","Mean"] |
Kafka: Fetch-Consumer request total time, p95 | Time in ms to serve the Fetch-Consumer request for 95th percentile. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=FetchConsumer","95thPercentile"] |
Kafka: Fetch-Consumer request total time, p99 | Time in ms to serve the Fetch-Consumer request for 99th percentile. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=FetchConsumer","99thPercentile"] |
Kafka: Fetch-Follower request total time, mean | Average time in ms to serve the Fetch-Follower request. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=FetchFollower","Mean"] |
Kafka: Fetch-Follower request total time, p95 | Time in ms to serve the Fetch-Follower request for 95th percentile. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=FetchFollower","95thPercentile"] |
Kafka: Fetch-Follower request total time, p99 | Time in ms to serve the Fetch-Follower request for 99th percentile. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=FetchFollower","99thPercentile"] |
Kafka: Produce request total time, mean | Average time in ms to serve the Produce request. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=Produce","Mean"] |
Kafka: Produce request total time, p95 | Time in ms to serve the Produce requests for 95th percentile. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=Produce","95thPercentile"] |
Kafka: Produce request total time, p99 | Time in ms to serve the Produce requests for 99th percentile. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=Produce","99thPercentile"] |
Kafka: UpdateMetadata request total time, mean | Average time for a request to update metadata. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=UpdateMetadata","Mean"] |
Kafka: UpdateMetadata request total time, p95 | Time for update metadata requests for 95th percentile. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=UpdateMetadata","95thPercentile"] |
Kafka: UpdateMetadata request total time, p99 | Time for update metadata requests for 99th percentile. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=TotalTimeMs,request=UpdateMetadata","99thPercentile"] |
Kafka: Temporary memory size in bytes (Fetch), max | The maximum of temporary memory used for converting message formats and decompressing messages. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=TemporaryMemoryBytes,request=Fetch","Max"] |
Kafka: Temporary memory size in bytes (Fetch), min | The minimum of temporary memory used for converting message formats and decompressing messages. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=TemporaryMemoryBytes,request=Fetch","Mean"] |
Kafka: Temporary memory size in bytes (Produce), max | The maximum of temporary memory used for converting message formats and decompressing messages. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=TemporaryMemoryBytes,request=Produce","Max"] |
Kafka: Temporary memory size in bytes (Produce), avg | The amount of temporary memory used for converting message formats and decompressing messages. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=TemporaryMemoryBytes,request=Produce","Mean"] |
Kafka: Temporary memory size in bytes (Produce), min | The minimum of temporary memory used for converting message formats and decompressing messages. |
JMX agent | jmx["kafka.network:type=RequestMetrics,name=TemporaryMemoryBytes,request=Produce","Min"] |
Kafka: Network processor average idle percent | The average percentage of time that the network processors are idle. |
JMX agent | jmx["kafka.network:type=SocketServer,name=NetworkProcessorAvgIdlePercent","Value"] Preprocessing
|
Kafka: Requests in producer purgatory | Number of requests waiting in producer purgatory. |
JMX agent | jmx["kafka.server:type=DelayedOperationPurgatory,name=PurgatorySize,delayedOperation=Fetch","Value"] |
Kafka: Requests in fetch purgatory | Number of requests waiting in fetch purgatory. |
JMX agent | jmx["kafka.server:type=DelayedOperationPurgatory,name=PurgatorySize,delayedOperation=Produce","Value"] |
Kafka: Replication maximum lag | The maximum lag between the time that messages are received by the leader replica and by the follower replicas. |
JMX agent | jmx["kafka.server:type=ReplicaFetcherManager,name=MaxLag,clientId=Replica","Value"] |
Kafka: Under minimum ISR partition count | The number of partitions under the minimum In-Sync Replica (ISR) count. |
JMX agent | jmx["kafka.server:type=ReplicaManager,name=UnderMinIsrPartitionCount","Value"] |
Kafka: Under replicated partitions | The number of partitions that have not been fully replicated in the follower replicas (the number of non-reassigning replicas - the number of ISR > 0). |
JMX agent | jmx["kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions","Value"] |
Kafka: ISR expands per second | The rate at which the number of ISRs in the broker increases. |
JMX agent | jmx["kafka.server:type=ReplicaManager,name=IsrExpandsPerSec","Count"] Preprocessing
|
Kafka: ISR shrink per second | Rate of replicas leaving the ISR pool. |
JMX agent | jmx["kafka.server:type=ReplicaManager,name=IsrShrinksPerSec","Count"] Preprocessing
|
Kafka: Leader count | The number of replicas for which this broker is the leader. |
JMX agent | jmx["kafka.server:type=ReplicaManager,name=LeaderCount","Value"] |
Kafka: Partition count | The number of partitions in the broker. |
JMX agent | jmx["kafka.server:type=ReplicaManager,name=PartitionCount","Value"] |
Kafka: Number of reassigning partitions | The number of reassigning leader partitions on a broker. |
JMX agent | jmx["kafka.server:type=ReplicaManager,name=ReassigningPartitions","Value"] |
Kafka: Request queue size | The size of the delay queue. |
JMX agent | jmx["kafka.server:type=Request","queue-size"] |
Kafka: Version | Current version of broker. |
JMX agent | jmx["kafka.server:type=app-info","version"] Preprocessing
|
Kafka: Uptime | The service uptime expressed in seconds. |
JMX agent | jmx["kafka.server:type=app-info","start-time-ms"] Preprocessing
|
Kafka: ZooKeeper client request latency | Latency in milliseconds for ZooKeeper requests from broker. |
JMX agent | jmx["kafka.server:type=ZooKeeperClientMetrics,name=ZooKeeperRequestLatencyMs","Count"] |
Kafka: ZooKeeper connection status | Connection status of broker's ZooKeeper session. |
JMX agent | jmx["kafka.server:type=SessionExpireListener,name=SessionState","Value"] Preprocessing
|
Kafka: ZooKeeper disconnect rate | ZooKeeper client disconnect per second. |
JMX agent | jmx["kafka.server:type=SessionExpireListener,name=ZooKeeperDisconnectsPerSec","Count"] Preprocessing
|
Kafka: ZooKeeper session expiration rate | ZooKeeper client session expiration per second. |
JMX agent | jmx["kafka.server:type=SessionExpireListener,name=ZooKeeperExpiresPerSec","Count"] Preprocessing
|
Kafka: ZooKeeper readonly rate | ZooKeeper client readonly per second. |
JMX agent | jmx["kafka.server:type=SessionExpireListener,name=ZooKeeperReadOnlyConnectsPerSec","Count"] Preprocessing
|
Kafka: ZooKeeper sync rate | ZooKeeper client sync per second. |
JMX agent | jmx["kafka.server:type=SessionExpireListener,name=ZooKeeperSyncConnectsPerSec","Count"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Kafka: Unclean leader election detected | Unclean leader elections occur when there is no qualified partition leader among Kafka brokers. If Kafka is configured to allow an unclean leader election, a leader is chosen from the out-of-sync replicas, and any messages that were not synced prior to the loss of the former leader are lost forever. Essentially, unclean leader elections sacrifice consistency for availability. |
last(/Apache Kafka by JMX/jmx["kafka.controller:type=ControllerStats,name=UncleanLeaderElectionsPerSec","Count"])>0 |Average |
||
Kafka: There are offline log directories | The offline log directory count metric indicates the number of log directories which are offline (for example, due to a hardware failure), meaning the broker can no longer store incoming messages in them. |
last(/Apache Kafka by JMX/jmx["kafka.log:type=LogManager,name=OfflineLogDirectoryCount","Value"]) > 0 |Warning |
||
Kafka: One or more partitions have no leader | Any partition without an active leader will be completely inaccessible, and both consumers and producers of that partition will be blocked until a leader becomes available. |
last(/Apache Kafka by JMX/jmx["kafka.controller:type=KafkaController,name=OfflinePartitionsCount","Value"]) > 0 |Warning |
||
Kafka: Request handler average idle percent is too low | The request handler idle ratio metric indicates the percentage of time the request handlers are not in use. The lower this number, the more loaded the broker is. |
max(/Apache Kafka by JMX/jmx["kafka.server:type=KafkaRequestHandlerPool,name=RequestHandlerAvgIdlePercent","OneMinuteRate"],15m)<{$KAFKA.REQUEST_HANDLER_AVG_IDLE.MIN.WARN} |Average |
||
Kafka: Network processor average idle percent is too low | The network processor idle ratio metric indicates the percentage of time the network processors are not in use. The lower this number, the more loaded the broker is. |
max(/Apache Kafka by JMX/jmx["kafka.network:type=SocketServer,name=NetworkProcessorAvgIdlePercent","Value"],15m)<{$KAFKA.NET_PROC_AVG_IDLE.MIN.WARN} |Average |
||
Kafka: Failed to fetch info data | Zabbix has not received data for items for the last 15 minutes |
nodata(/Apache Kafka by JMX/jmx["kafka.network:type=SocketServer,name=NetworkProcessorAvgIdlePercent","Value"],15m)=1 |Warning |
||
Kafka: There are partitions under the min ISR | The Under min ISR partitions metric displays the number of partitions, where the number of In-Sync Replicas (ISR) is less than the minimum number of in-sync replicas specified. The two most common causes of under-min ISR partitions are that one or more brokers is unresponsive, or the cluster is experiencing performance issues and one or more brokers are falling behind. |
last(/Apache Kafka by JMX/jmx["kafka.server:type=ReplicaManager,name=UnderMinIsrPartitionCount","Value"])>0 |Average |
||
Kafka: There are under replicated partitions | The Under replicated partitions metric displays the number of partitions that do not have enough replicas to meet the desired replication factor. A partition will also be considered under-replicated if the correct number of replicas exist, but one or more of the replicas have fallen significantly behind the partition leader. The two most common causes of under-replicated partitions are that one or more brokers is unresponsive, or the cluster is experiencing performance issues and one or more brokers have fallen behind. |
last(/Apache Kafka by JMX/jmx["kafka.server:type=ReplicaManager,name=UnderReplicatedPartitions","Value"])>0 |Average |
||
Kafka: Version has changed | The Kafka version has changed. Acknowledge to close the problem manually. |
last(/Apache Kafka by JMX/jmx["kafka.server:type=app-info","version"],#1)<>last(/Apache Kafka by JMX/jmx["kafka.server:type=app-info","version"],#2) and length(last(/Apache Kafka by JMX/jmx["kafka.server:type=app-info","version"]))>0 |Info |
Manual close: Yes | |
Kafka: has been restarted | Uptime is less than 10 minutes. |
last(/Apache Kafka by JMX/jmx["kafka.server:type=app-info","start-time-ms"])<10m |Info |
Manual close: Yes | |
Kafka: Broker is not connected to ZooKeeper | find(/Apache Kafka by JMX/jmx["kafka.server:type=SessionExpireListener,name=SessionState","Value"],,"regexp","CONNECTED")=0 |Average |
Name | Description | Type | Key and additional info |
---|---|---|---|
Topic Metrics (write) | JMX agent | jmx.discovery[beans,"kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec,topic=*"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Kafka {#JMXTOPIC}: Messages in per second | The rate at which individual messages are consumed by topic. |
JMX agent | jmx["kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec,topic={#JMXTOPIC}","Count"] Preprocessing
|
Kafka {#JMXTOPIC}: Bytes in per second | The rate at which data sent from producers is consumed by topic. |
JMX agent | jmx["kafka.server:type=BrokerTopicMetrics,name=BytesInPerSec,topic={#JMXTOPIC}","Count"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Topic Metrics (read) | JMX agent | jmx.discovery[beans,"kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec,topic=*"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Kafka {#JMXTOPIC}: Bytes out per second | The rate at which data is fetched and read from the broker by consumers (by topic). |
JMX agent | jmx["kafka.server:type=BrokerTopicMetrics,name=BytesOutPerSec,topic={#JMXTOPIC}","Count"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Topic Metrics (errors) | JMX agent | jmx.discovery[beans,"kafka.server:type=BrokerTopicMetrics,name=BytesRejectedPerSec,topic=*"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Kafka {#JMXTOPIC}: Bytes rejected per second | Rejected bytes rate by topic. |
JMX agent | jmx["kafka.server:type=BrokerTopicMetrics,name=BytesRejectedPerSec,topic={#JMXTOPIC}","Count"] Preprocessing
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is used for monitoring Jira Data Center health. It is designed for standalone operation for on-premises Jira installations.
This template uses a single data source, JMX, which requires JMX RMI setup of your Jira application and Java Gateway setup on the Zabbix side. If you need "Garbage collector" and "Web server" monitoring, add "Generic Java JMX" and "Apache Tomcat by JMX" templates on the same host.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Metrics are collected by JMX.
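Most of the discovery rules further down (storage, mail server, and indexing discovery, for example) are jmx.discovery items built on JMX object-name patterns under the com.atlassian.jira domain. To see which metric beans your Jira instance actually exposes before relying on those rules, you can list them with the same kind of pattern through the JDK's JMX API; the service URL and the no-authentication assumption below are placeholders for your own JMX RMI setup:

```java
import java.util.Set;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class JiraMBeanList {
    public static void main(String[] args) throws Exception {
        // Placeholder URL: point it at the JMX RMI port configured for your Jira node.
        String url = "service:jmx:rmi:///jndi/rmi://jira-node:31999/jmxrmi";
        try (JMXConnector connector = JMXConnectorFactory.connect(new JMXServiceURL(url))) {
            MBeanServerConnection connection = connector.getMBeanServerConnection();
            // Same object-name pattern style the template's jmx.discovery keys use.
            Set<ObjectName> beans = connection.queryNames(new ObjectName("com.atlassian.jira:type=metrics,*"), null);
            beans.forEach(name -> System.out.println(name.getCanonicalName()));
        }
    }
}
```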
If JMX authentication is required, set the credentials in the {$JMX.USER} and {$JMX.PASSWORD} macros.
Name | Description | Default |
---|---|---|
{$JMX.USER} | User for JMX. |
|
{$JMX.PASSWORD} | Password for JMX. |
|
{$JIRA_DC.LICENSE.USER.CAPACITY.WARN} | User capacity warning threshold (%). |
80 |
{$JIRA_DC.DB.CONNECTION.USAGE.WARN} | Warning threshold for database connections usage (%). |
80 |
{$JIRA_DC.ISSUE.LATENCY.WARN} | Warning threshold for issue operation latency (in seconds). |
5 |
{$JIRA_DC.STORAGE.LATENCY.WARN} | Warning threshold for storage write operation latency (in seconds). |
5 |
{$JIRA_DC.INDEXING.LATENCY.WARN} | Warning threshold for indexing operation latency (in seconds). |
5 |
{$JIRA_DC.LLD.FILTER.MATCHES.HOMEFOLDERS} | Used for storage metric discovery. |
local|share |
{$JIRA_DC.LLD.FILTER.NOT.MATCHES.HOMEFOLDERS} | Used for storage metric discovery. |
NO MATCH |
{$JIRA_DC.LLD.FILTER.MATCHES.INDEXING} | Used for indexing metric discovery. |
.* |
{$JIRA_DC.LLD.FILTER.NOT.MATCHES.INDEXING} | Used for indexing metric discovery. |
NO MATCH |
{$JIRA_DC.LLD.FILTER.MATCHES.ISSUE} | Used for issue discovery. |
.* |
{$JIRA_DC.LLD.FILTER.NOT.MATCHES.ISSUE} | Used for issue discovery. |
NO MATCH |
{$JIRA_DC.LLD.FILTER.MATCHES.MAIL} | Used for mail server connection metric discovery. |
.* |
{$JIRA_DC.LLD.FILTER.NOT.MATCHES.MAIL} | Used for mail server connection metric discovery. |
NO MATCH |
{$JIRA_DC.LLD.FILTER.MATCHES.LICENSE} | Used for license discovery. |
.* |
{$JIRA_DC.LLD.FILTER.NOT.MATCHES.LICENSE} | Used for license discovery. |
NO MATCH |
Name | Description | Type | Key and additional info |
---|---|---|---|
DB: Connections: State | The state of the database connection. |
JMX agent | jmx["com.atlassian.jira:type=metrics,category00=db,category01=connection,category02=state,name=value",Value] |
DB: Connections: Failed per minute | The count of database connection failures registered in one minute. Units: fpm - fails per minute. |
JMX agent | jmx["com.atlassian.jira:type=metrics,category00=db,category01=connection,category02=failures,name=counter",Count] Preprocessing
|
DB: Pool: Connections: Idle | Idle connection count of the database pool. |
JMX agent | jmx["com.atlassian.jira:type=metrics,category00=db,category01=connection,category02=pool,category03=numIdle,name=value",Value] |
DB: Pool: Connections: Active | Active connection count of the database pool. |
JMX agent | jmx["com.atlassian.jira:type=metrics,category00=db,category01=connection,category02=pool,category03=numActive,name=value",Value] |
DB: Reads | Database read operations from Jira per second. Units: rps - read operations per second. |
JMX agent | jmx["com.atlassian.jira:type=db.reads",invocation.count] Preprocessing
|
DB: Writes | Database write operations from Jira per second. Units: wps - write operations per second. |
JMX agent | jmx["com.atlassian.jira:type=db.writes",invocation.count] Preprocessing
|
DB: Connections: Limit | Total allowed database connection count. |
JMX agent | jmx["com.atlassian.jira:name=BasicDataSource,connectionpool=connections",MaxTotal] |
DB: Connections: Active | Active database connection count. |
JMX agent | jmx["com.atlassian.jira:name=BasicDataSource,connectionpool=connections",NumActive] |
DB: Connections: Latency | The latest measure of latency when querying the database. |
JMX agent | jmx["com.atlassian.jira:type=metrics,category00=db,category01=connection,category02=latency,name=value",Value] |
License: Users: Get | License data for the discovery rule. |
JMX agent | jmx.discovery[attributes,"com.atlassian.jira:type=jira.license"] Preprocessing
|
HTTP: Pool: Connections: Active | The latest measure of the number of active connections in the HTTP connection pool. |
JMX agent | jmx["com.atlassian.jira:type=metrics,category00=http,category01=connection,category02=pool,category03=numActive,name=value",Value] |
HTTP: Pool: Connections: Idle | The latest measure of the number of idle connections in the HTTP connection pool. |
JMX agent | jmx["com.atlassian.jira:type=metrics,category00=http,category01=connection,category02=pool,category03=numIdle,name=value",Value] |
HTTP: Sessions: Active | The latest measure of the number of active user sessions. |
JMX agent | jmx["com.atlassian.jira:type=metrics,category00=http,category01=connection,category02=sessions,category03=active,name=value",Value] |
HTTP: Requests per minute | The latest measure of the total number of HTTP requests per minute. |
JMX agent | jmx["com.atlassian.jira:type=metrics,category00=http,category01=requests,name=value",Value] |
Mail: Queue | The latest measure of the number of items in a mail queue. |
JMX agent | jmx["com.atlassian.jira:type=metrics,category00=mail,category01=queue,category02=numItems,name=value",Value] |
Mail: Queue: Error | The latest measure of the number of items in an error mail queue. |
JMX agent | jmx["com.atlassian.jira:type=metrics,category00=mail,category01=queue,category02=numErrors,name=value",Value] |
Mail: Sent per minute | The latest measure of the number of emails sent by the SMTP server per minute. |
JMX agent | jmx["com.atlassian.jira:type=metrics,category00=mail,category01=queue,category02=numEmailsSentPerMin,name=value",Value] |
Mail: Processed per minute | The latest measure of the number of items processed by a mail queue per minute. |
JMX agent | jmx["com.atlassian.jira:type=metrics,category00=mail,category01=queue,category02=numItemsProcessedPerMin,name=value",Value] |
Mail: Queue: Processing state | The latest indicator of the state of a mail queue job. |
JMX agent | jmx["com.atlassian.jira:type=metrics,category00=mail,category01=queue,category02=jobRunning,name=value",Value] |
Entity: Issues | The number of issues. |
JMX agent | jmx["com.atlassian.jira:type=entity.issues.total",Value] |
Entity: Attachments | The number of attachments. |
JMX agent | jmx["com.atlassian.jira:type=entity.attachments.total",Value] |
Entity: Components | The number of components. |
JMX agent | jmx["com.atlassian.jira:type=entity.components.total",Value] |
Entity: Custom fields | The number of custom fields. |
JMX agent | jmx["com.atlassian.jira:type=entity.customfields.total",Value] |
Entity: Filters | The number of filters. |
JMX agent | jmx["com.atlassian.jira:type=entity.filters.total",Value] |
Entity: Versions created | The number of versions created. |
JMX agent | jmx["com.atlassian.jira:type=entity.versions.total",Value] |
Issue: Search per minute | Issue searches performed per minute. |
JMX agent | jmx["com.atlassian.jira:type=issue.search.count",Value] Preprocessing
|
Issue: Created per minute | Issues created per minute. |
JMX agent | jmx["com.atlassian.jira:type=issue.created.count",Value] Preprocessing
|
Issue: Updates per minute | Issue updates performed per minute. |
JMX agent | jmx["com.atlassian.jira:type=issue.updated.count",Value] Preprocessing
|
Quicksearch: Concurrent searches | The number of concurrent searches that are being performed in real-time by using the quick search. |
JMX agent | jmx["com.atlassian.jira:type=quicksearch.concurrent.search",Value] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
DB: Connection lost | Database connection lost |
max(/Jira Data Center by JMX/jmx["com.atlassian.jira:type=metrics,category00=db,category01=connection,category02=state,name=value",Value],3m)=0 |Average |
Manual close: Yes | |
DB: Pool: Out of idle connections | Fires when the database pool has had no idle connections for 5 minutes. |
min(/Jira Data Center by JMX/jmx["com.atlassian.jira:type=metrics,category00=db,category01=connection,category02=pool,category03=numIdle,name=value",Value],5m)<=0 |Warning |
Manual close: Yes | |
DB: Connection usage is near the limit | 100*min(/Jira Data Center by JMX/jmx["com.atlassian.jira:name=BasicDataSource,connectionpool=connections",NumActive],5m)/last(/Jira Data Center by JMX/jmx["com.atlassian.jira:name=BasicDataSource,connectionpool=connections",MaxTotal])>{$JIRA_DC.DB.CONNECTION.USAGE.WARN} |Warning |
Manual close: Yes | ||
DB: Connection limit reached | min(/Jira Data Center by JMX/jmx["com.atlassian.jira:name=BasicDataSource,connectionpool=connections",NumActive],5m)=last(/Jira Data Center by JMX/jmx["com.atlassian.jira:name=BasicDataSource,connectionpool=connections",MaxTotal]) |Warning |
Manual close: Yes | ||
HTTP: Pool: Out of idle connections | All available connections are utilized. It can cause outages for users as the system is unable to serve their requests. |
min(/Jira Data Center by JMX/jmx["com.atlassian.jira:type=metrics,category00=http,category01=connection,category02=pool,category03=numIdle,name=value",Value],5m)<=0 |Warning |
Manual close: Yes | |
Mail: Queue: Doesn’t empty over an extended period | Might indicate SMTP performance or connection problems. |
min(/Jira Data Center by JMX/jmx["com.atlassian.jira:type=metrics,category00=mail,category01=queue,category02=numItems,name=value",Value],30m)>0 |Warning |
Manual close: Yes Depends on:
|
|
Mail: Error queue contains one or more items | A mail queue attempts to resend items up to 10 times. If the operation fails for the 11th time, the items are put into an error mail queue. |
max(/Jira Data Center by JMX/jmx["com.atlassian.jira:type=metrics,category00=mail,category01=queue,category02=numErrors,name=value",Value],5m)>0 |Warning |
Manual close: Yes | |
Mail: Queue job is not running | It should be running when its queue is not empty. |
max(/Jira Data Center by JMX/jmx["com.atlassian.jira:type=metrics,category00=mail,category01=queue,category02=jobRunning,name=value",Value],15m)=0 and min(/Jira Data Center by JMX/jmx["com.atlassian.jira:type=metrics,category00=mail,category01=queue,category02=numItems,name=value",Value],15m)>0 |Average |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Storage discovery | Discovery of the Jira storage metrics. |
JMX agent | jmx.discovery[beans,"com.atlassian.jira:type=metrics,category00=home,category01=,category02=write,category03=latency,,name=value"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Storage [{#JMXCATEGORY01}]: Latency | The median latency of writing a small file (~30 bytes) to |
JMX agent | jmx["{#JMXOBJ}",Value] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Storage [{#JMXCATEGORY01}]: Slow performance | Fires when latency grows above the threshold: |
min(/Jira Data Center by JMX/jmx["{#JMXOBJ}",Value],5m)>{$JIRA_DC.STORAGE.LATENCY.WARN:"{#JMXCATEGORY01}"} |Warning |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Mail server discovery | Discovery of the Jira connected mail servers. |
JMX agent | jmx.discovery[beans,"com.atlassian.jira:type=metrics,category00=mail,category01=,category02=connection,category03=state,name="] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Mail [{#JMXCATEGORY01},{#JMXNAME}]: Connection state | Shows connection state of Jira to discovered mail server: |
JMX agent | jmx["{#JMXOBJ}",Connected] Preprocessing
|
Mail [{#JMXCATEGORY01},{#JMXNAME}]: Failures per minute | Count of failed connections to discovered mail server |
JMX agent | jmx["{#JMXOBJ}",TotalFailures] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Mail [{#JMXCATEGORY01}-{#JMXNAME}]: Server disconnected | Trigger is fired when discovered mail server |
max(/Jira Data Center by JMX/jmx["{#JMXOBJ}",Connected],5m)=0 |Average |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Indexing latency discovery | Discovery of the Jira indexing metrics. |
JMX agent | jmx.discovery[beans,"com.atlassian.jira:type=metrics,category00=indexing,name=*"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Indexing [{#JMXNAME}]: Latency | Average time spent on indexing operations. |
JMX agent | jmx["com.atlassian.jira:type=metrics,category00=indexing,name={#JMXNAME}",Mean] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Indexing [{#JMXNAME}]: Slow performance | Fires when latency grows above the threshold: |
min(/Jira Data Center by JMX/jmx["com.atlassian.jira:type=metrics,category00=indexing,name={#JMXNAME}",Mean],5m)>{$JIRA_DC.INDEXING.LATENCY.WARN:"{#JMXNAME}"} |Warning |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Issue latency discovery | Discovery of the Jira issue latency metrics. |
JMX agent | jmx.discovery[beans,"com.atlassian.jira:type=metrics,category00=issue,name=*"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Issue [{#JMXNAME}]: Latency | Average time spent on issue |
JMX agent | jmx["{#JMXOBJ}",Mean] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Issue [{#JMXNAME}]: Slow operations | Fires when latency grows above the threshold: |
min(/Jira Data Center by JMX/jmx["{#JMXOBJ}",Mean],5m)>{$JIRA_DC.ISSUE.LATENCY.WARN:"{#JMXNAME}"} |Warning |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
License discovery | Discovery of the Jira licenses. |
Dependent item | jmx.license.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
License [{#LICENSE.TYPE}]: Users: Current | Current user count for |
Dependent item | jmx.license.get.user.current["{#LICENSE.TYPE}"] Preprocessing
|
License [{#LICENSE.TYPE}]: Users: Maximum | User count limit for
|
Dependent item | jmx.license.get.user.max["{#LICENSE.TYPE}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
License [{#LICENSE.TYPE}]: Low user capacity | Fires when relative user quantity grows above the threshold: |
last(/Jira Data Center by JMX/jmx.license.get.user.max["{#LICENSE.TYPE}"])>=0 * (100*last(/Jira Data Center by JMX/jmx.license.get.user.current["{#LICENSE.TYPE}"])/last(/Jira Data Center by JMX/jmx.license.get.user.max["{#LICENSE.TYPE}"])>{$JIRA_DC.LICENSE.USER.CAPACITY.WARN:"{#LICENSE.TYPE}"}) |Warning |
Manual close: Yes Depends on:
|
|
License [{#LICENSE.TYPE}]: User count reached the limit | Fires when user quantity reaches the limit. |
last(/Jira Data Center by JMX/jmx.license.get.user.max["{#LICENSE.TYPE}"])>=0 * ((last(/Jira Data Center by JMX/jmx.license.get.user.max["{#LICENSE.TYPE}"])-last(/Jira Data Center by JMX/jmx.license.get.user.current["{#LICENSE.TYPE}"]))<=0) |Average |
Manual close: Yes |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template to monitor Jenkins by Zabbix that works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Metrics are collected by requests to the Metrics API. For common metrics: install and configure the Metrics plugin according to its official documentation. Do not forget to configure access to the Metrics Servlet by issuing an API key and setting the {$JENKINS.API.KEY} macro.
For monitoring computers and builds: create an API token for the monitoring user according to the official documentation and set the {$JENKINS.USER} and {$JENKINS.API.TOKEN} macros. Don't forget to set the {$JENKINS.URL} macro as well.
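Both groups of items are plain HTTP requests, so the access configured above can be verified with any HTTP client before the template is linked. The sketch below assumes the standard Metrics plugin URL layout ({$JENKINS.URL}/metrics/&lt;api key&gt;/ping) and uses placeholder values for {$JENKINS.URL} and {$JENKINS.API.KEY}; the reply should match the {$JENKINS.PING.REPLY} macro ("pong" by default):

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class JenkinsMetricsPing {
    public static void main(String[] args) throws Exception {
        // Placeholders: substitute the values you set in {$JENKINS.URL} and {$JENKINS.API.KEY}.
        String jenkinsUrl = "http://jenkins.example.com:8080";
        String apiKey = "REPLACE_WITH_METRICS_API_KEY";

        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(jenkinsUrl + "/metrics/" + apiKey + "/ping"))
                .GET()
                .build();

        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        // The template's jenkins.ping item expects the reply defined in {$JENKINS.PING.REPLY}.
        System.out.println(response.statusCode() + " " + response.body().trim());
    }
}
```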
Name | Description | Default |
---|---|---|
{$JENKINS.URL} | Jenkins URL in the format |
|
{$JENKINS.API.KEY} | API key to access Metrics Servlet |
|
{$JENKINS.USER} | Username for HTTP BASIC authentication |
zabbix |
{$JENKINS.API.TOKEN} | API token for HTTP BASIC authentication. |
|
{$JENKINS.PING.REPLY} | Expected reply to the ping. |
pong |
{$JENKINS.FILE_DESCRIPTORS.MAX.WARN} | Maximum percentage of file descriptors usage alert threshold (for trigger expression). |
85 |
{$JENKINS.JOB.HEALTH.SCORE.MIN.WARN} | Minimum job's health score (for trigger expression). |
50 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Jenkins: Get service metrics | HTTP agent | jenkins.get_metrics Preprocessing
|
|
Jenkins: Get healthcheck | HTTP agent | jenkins.healthcheck Preprocessing
|
|
Jenkins: Get jobs info | HTTP agent | jenkins.job_info Preprocessing
|
|
Jenkins: Get computer info | HTTP agent | jenkins.computer_info Preprocessing
|
|
Jenkins: Disk space check message | The message will reference the first node which fails this check. There may be other nodes that fail the check, but this health check is designed to fail fast. |
Dependent item | jenkins.disk_space.message Preprocessing
|
Jenkins: Temporary space check message | The message will reference the first node which fails this check. There may be other nodes that fail the check, but this health check is designed to fail fast. |
Dependent item | jenkins.temporary_space.message Preprocessing
|
Jenkins: Plugins check message | The message of plugins health check. |
Dependent item | jenkins.plugins.message Preprocessing
|
Jenkins: Thread deadlock check message | The message of thread deadlock health check. |
Dependent item | jenkins.thread_deadlock.message Preprocessing
|
Jenkins: Disk space check | Returns FAIL if any of the Jenkins disk space monitors are reporting the disk space as less than the configured threshold. |
Dependent item | jenkins.disk_space Preprocessing
|
Jenkins: Plugins check | Returns FAIL if any of the Jenkins plugins failed to start. |
Dependent item | jenkins.plugins Preprocessing
|
Jenkins: Temporary space check | Returns FAIL if any of the Jenkins temporary space monitors are reporting the temporary space as less than the configured threshold. |
Dependent item | jenkins.temporary_space Preprocessing
|
Jenkins: Thread deadlock check | Returns FAIL if there are any deadlocked threads in the Jenkins master JVM. |
Dependent item | jenkins.thread_deadlock Preprocessing
|
Jenkins: Get gauges | Raw items for gauges metrics. |
Dependent item | jenkins.gauges.raw Preprocessing
|
Jenkins: Executors count | The number of executors available to Jenkins. This corresponds to the sum of all the executors of all the online nodes. |
Dependent item | jenkins.executor.count Preprocessing
|
Jenkins: Executors free | The number of executors available to Jenkins that are not currently in use. |
Dependent item | jenkins.executor.free Preprocessing
|
Jenkins: Executors in use | The number of executors available to Jenkins that are currently in use. |
Dependent item | jenkins.executor.in_use Preprocessing
|
Jenkins: Nodes count | The number of build nodes available to Jenkins, both online and offline. |
Dependent item | jenkins.node.count Preprocessing
|
Jenkins: Nodes offline | The number of build nodes available to Jenkins but currently offline. |
Dependent item | jenkins.node.offline Preprocessing
|
Jenkins: Nodes online | The number of build nodes available to Jenkins and currently online. |
Dependent item | jenkins.node.online Preprocessing
|
Jenkins: Plugins active | The number of plugins in the Jenkins instance that started successfully. |
Dependent item | jenkins.plugins.active Preprocessing
|
Jenkins: Plugins failed | The number of plugins in the Jenkins instance that failed to start. A value other than 0 is typically indicative of a potential issue within the Jenkins installation that will either be solved by explicitly disabling the plugin(s) or by resolving the plugin dependency issues. |
Dependent item | jenkins.plugins.failed Preprocessing
|
Jenkins: Plugins inactive | The number of plugins in the Jenkins instance that are not currently enabled. |
Dependent item | jenkins.plugins.inactive Preprocessing
|
Jenkins: Plugins with update | The number of plugins in the Jenkins instance that have a newer version reported as available in the current Jenkins update center metadata held by Jenkins. This value is not indicative of an issue with Jenkins but high values can be used as a trigger to review the plugins with updates with a view to seeing whether those updates potentially contain fixes for issues that could be affecting your Jenkins instance. |
Dependent item | jenkins.plugins.with_update Preprocessing
|
Jenkins: Projects count | The number of projects. |
Dependent item | jenkins.project.count Preprocessing
|
Jenkins: Jobs count | The number of jobs in Jenkins. |
Dependent item | jenkins.job.count.value Preprocessing
|
Jenkins: Get meters | Raw items for meters metrics. |
Dependent item | jenkins.meters.raw Preprocessing
|
Jenkins: Job scheduled, m1 rate | The rate at which jobs are scheduled. If a job is already in the queue and an identical request for scheduling the job is received then Jenkins will coalesce the two requests. This metric gives a reasonably pure measure of the load requirements of the Jenkins master as it is unaffected by the number of executors available to the system. |
Dependent item | jenkins.job.scheduled.m1.rate Preprocessing
|
Jenkins: Jobs scheduled, m5 rate | The rate at which jobs are scheduled. If a job is already in the queue and an identical request for scheduling the job is received then Jenkins will coalesce the two requests. This metric gives a reasonably pure measure of the load requirements of the Jenkins master as it is unaffected by the number of executors available to the system. |
Dependent item | jenkins.job.scheduled.m5.rate Preprocessing
|
Jenkins: Get timers | Raw items for timers metrics. |
Dependent item | jenkins.timers.raw Preprocessing
|
Jenkins: Job blocked, m1 rate | The rate at which jobs in the build queue enter the blocked state. |
Dependent item | jenkins.job.blocked.m1.rate Preprocessing
|
Jenkins: Job blocked, m5 rate | The rate at which jobs in the build queue enter the blocked state. |
Dependent item | jenkins.job.blocked.m5.rate Preprocessing
|
Jenkins: Job blocked duration, p95 | The amount of time which jobs spend in the blocked state. |
Dependent item | jenkins.job.blocked.duration.p95 Preprocessing
|
Jenkins: Job blocked duration, median | The amount of time which jobs spend in the blocked state. |
Dependent item | jenkins.job.blocked.duration.p50 Preprocessing
|
Jenkins: Job building, m1 rate | The rate at which jobs are built. |
Dependent item | jenkins.job.building.m1.rate Preprocessing
|
Jenkins: Job building, m5 rate | The rate at which jobs are built. |
Dependent item | jenkins.job.building.m5.rate Preprocessing
|
Jenkins: Job building duration, p95 | The amount of time which jobs spend building. |
Dependent item | jenkins.job.building.duration.p95 Preprocessing
|
Jenkins: Job building duration, median | The amount of time which jobs spend building. |
Dependent item | jenkins.job.building.duration.p50 Preprocessing
|
Jenkins: Job buildable, m1 rate | The rate at which jobs in the build queue enter the buildable state. |
Dependent item | jenkins.job.buildable.m1.rate Preprocessing
|
Jenkins: Job buildable, m5 rate | The rate at which jobs in the build queue enter the buildable state. |
Dependent item | jenkins.job.buildable.m5.rate Preprocessing
|
Jenkins: Job buildable duration, p95 | The amount of time which jobs spend in the buildable state. |
Dependent item | jenkins.job.buildable.duration.p95 Preprocessing
|
Jenkins: Job buildable duration, median | The amount of time which jobs spend in the buildable state. |
Dependent item | jenkins.job.buildable.duration.p50 Preprocessing
|
Jenkins: Job queuing, m1 rate | The rate at which jobs are queued. |
Dependent item | jenkins.job.queuing.m1.rate Preprocessing
|
Jenkins: Job queuing, m5 rate | The rate at which jobs are queued. |
Dependent item | jenkins.job.queuing.m5.rate Preprocessing
|
Jenkins: Job queuing duration, p95 | The total time which jobs spend in the build queue. |
Dependent item | jenkins.job.queuing.duration.p95 Preprocessing
|
Jenkins: Job queuing duration, median | The total time which jobs spend in the build queue. |
Dependent item | jenkins.job.queuing.duration.p50 Preprocessing
|
Jenkins: Job total, m1 rate | The rate at which jobs are queued. |
Dependent item | jenkins.job.total.m1.rate Preprocessing
|
Jenkins: Job total, m5 rate | The rate at which jobs are queued. |
Dependent item | jenkins.job.total.m5.rate Preprocessing
|
Jenkins: Job total duration, p95 | The total time which jobs spend from entering the build queue to completing building. |
Dependent item | jenkins.job.total.duration.p95 Preprocessing
|
Jenkins: Job total duration, median | The total time which jobs spend from entering the build queue to completing building. |
Dependent item | jenkins.job.total.duration.p50 Preprocessing
|
Jenkins: Job waiting, m1 rate | The rate at which jobs enter the quiet period. |
Dependent item | jenkins.job.waiting.m1.rate Preprocessing
|
Jenkins: Job waiting, m5 rate | The rate at which jobs enter the quiet period. |
Dependent item | jenkins.job.waiting.m5.rate Preprocessing
|
Jenkins: Job waiting duration, p95 | The total amount of time that jobs spend in their quiet period. |
Dependent item | jenkins.job.waiting.duration.p95 Preprocessing
|
Jenkins: Job waiting duration, median | The total amount of time that jobs spend in their quiet period. |
Dependent item | jenkins.job.waiting.duration.p50 Preprocessing
|
Jenkins: Build queue, blocked | The number of jobs that are in the Jenkins build queue and currently in the blocked state. |
Dependent item | jenkins.queue.blocked Preprocessing
|
Jenkins: Build queue, size | The number of jobs that are in the Jenkins build queue. |
Dependent item | jenkins.queue.size Preprocessing
|
Jenkins: Build queue, buildable | The number of jobs that are in the Jenkins build queue and currently in the buildable state. |
Dependent item | jenkins.queue.buildable Preprocessing
|
Jenkins: Build queue, pending | The number of jobs that are in the Jenkins build queue and currently in the pending state. |
Dependent item | jenkins.queue.pending Preprocessing
|
Jenkins: Build queue, stuck | The number of jobs that are in the Jenkins build queue and currently in the stuck state. |
Dependent item | jenkins.queue.stuck Preprocessing
|
Jenkins: HTTP active requests, rate | The number of currently active requests against the Jenkins master Web UI. |
Dependent item | jenkins.http.active_requests.rate Preprocessing
|
Jenkins: HTTP response 400, rate | The rate at which the Jenkins master Web UI is responding to requests with an HTTP/400 status code. |
Dependent item | jenkins.http.bad_request.rate Preprocessing
|
Jenkins: HTTP response 500, rate | The rate at which the Jenkins master Web UI is responding to requests with an HTTP/500 status code. |
Dependent item | jenkins.http.server_error.rate Preprocessing
|
Jenkins: HTTP response 503, rate | The rate at which the Jenkins master Web UI is responding to requests with an HTTP/503 status code. |
Dependent item | jenkins.http.service_unavailable.rate Preprocessing
|
Jenkins: HTTP response 200, rate | The rate at which the Jenkins master Web UI is responding to requests with an HTTP/200 status code. |
Dependent item | jenkins.http.ok.rate Preprocessing
|
Jenkins: HTTP response other, rate | The rate at which the Jenkins master Web UI is responding to requests with a non-informational status code that is not in the list: HTTP/200, HTTP/201, HTTP/204, HTTP/304, HTTP/400, HTTP/403, HTTP/404, HTTP/500, or HTTP/503. |
Dependent item | jenkins.http.other.rate Preprocessing
|
Jenkins: HTTP response 201, rate | The rate at which the Jenkins master Web UI is responding to requests with an HTTP/201 status code. |
Dependent item | jenkins.http.created.rate Preprocessing
|
Jenkins: HTTP response 204, rate | The rate at which the Jenkins master Web UI is responding to requests with an HTTP/204 status code. |
Dependent item | jenkins.http.no_content.rate Preprocessing
|
Jenkins: HTTP response 404, rate | The rate at which the Jenkins master Web UI is responding to requests with an HTTP/404 status code. |
Dependent item | jenkins.http.not_found.rate Preprocessing
|
Jenkins: HTTP response 304, rate | The rate at which the Jenkins master Web UI is responding to requests with an HTTP/304 status code. |
Dependent item | jenkins.http.not_modified.rate Preprocessing
|
Jenkins: HTTP response 403, rate | The rate at which the Jenkins master Web UI is responding to requests with an HTTP/403 status code. |
Dependent item | jenkins.http.forbidden.rate Preprocessing
|
Jenkins: HTTP requests, rate | The rate at which the Jenkins master Web UI is receiving requests. |
Dependent item | jenkins.http.requests.rate Preprocessing
|
Jenkins: HTTP requests, p95 | The time spent generating the corresponding responses. |
Dependent item | jenkins.http.requests_p95.rate Preprocessing
|
Jenkins: HTTP requests, median | The time spent generating the corresponding responses. |
Dependent item | jenkins.http.requests_p50.rate Preprocessing
|
Jenkins: Version | Version of Jenkins server. |
Dependent item | jenkins.version Preprocessing
|
Jenkins: CPU Load | The system load on the Jenkins master as reported by the JVM's Operating System JMX bean. The calculation of system load is operating system dependent. Typically this is the sum of the number of processes that are currently running plus the number that are waiting to run. This is typically comparable against the number of CPU cores. |
Dependent item | jenkins.system.cpu.load Preprocessing
|
Jenkins: Uptime | The number of seconds since the Jenkins master JVM started. |
Dependent item | jenkins.system.uptime Preprocessing
|
Jenkins: File descriptor ratio | The ratio of used to total file descriptors |
Dependent item | jenkins.descriptor.ratio Preprocessing
|
Jenkins: Service ping | HTTP agent | jenkins.ping Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Jenkins: Disk space is too low | Jenkins disk space monitors are reporting the disk space as less than the configured threshold. The message will reference the first node which fails this check. |
last(/Jenkins by HTTP/jenkins.disk_space)=0 and length(last(/Jenkins by HTTP/jenkins.disk_space.message))>0 |Warning |
||
Jenkins: One or more Jenkins plugins failed to start | A failure is typically indicative of a potential issue within the Jenkins installation that will either be solved by explicitly disabling the failing plugin(s) or by resolving the corresponding plugin dependency issues. |
last(/Jenkins by HTTP/jenkins.plugins)=0 and length(last(/Jenkins by HTTP/jenkins.plugins.message))>0 |Info |
Manual close: Yes | |
Jenkins: Temporary space is too low | Jenkins temporary space monitors are reporting the temporary space as less than the configured threshold. The message will reference the first node which fails this check. |
last(/Jenkins by HTTP/jenkins.temporary_space)=0 and length(last(/Jenkins by HTTP/jenkins.temporary_space.message))>0 |Warning |
||
Jenkins: There are deadlocked threads in Jenkins master JVM | There are deadlocked threads in the Jenkins master JVM. |
last(/Jenkins by HTTP/jenkins.thread_deadlock)=0 and length(last(/Jenkins by HTTP/jenkins.thread_deadlock.message))>0 |Warning |
||
Jenkins: Service has no online nodes | last(/Jenkins by HTTP/jenkins.node.online)=0 |Average |
|||
Jenkins: Version has changed | The Jenkins version has changed. Acknowledge to close the problem manually. |
last(/Jenkins by HTTP/jenkins.version,#1)<>last(/Jenkins by HTTP/jenkins.version,#2) and length(last(/Jenkins by HTTP/jenkins.version))>0 |Info |
Manual close: Yes | |
Jenkins: Host has been restarted | Uptime is less than 10 minutes. |
last(/Jenkins by HTTP/jenkins.system.uptime)<10m |Info |
Manual close: Yes | |
Jenkins: Current number of used files is too high | min(/Jenkins by HTTP/jenkins.descriptor.ratio,5m)>{$JENKINS.FILE_DESCRIPTORS.MAX.WARN} |Warning |
|||
Jenkins: Service is down | last(/Jenkins by HTTP/jenkins.ping)=0 |Average |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Jobs discovery | HTTP agent | jenkins.jobs Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Jenkins job [{#NAME}]: Get job | Raw data for a job. |
Dependent item | jenkins.job.get[{#NAME}] Preprocessing
|
Jenkins job [{#NAME}]: Health score | Represents the health of the project as a number between 0 and 100. Job description: {#DESCRIPTION} Job URL: {#URL} |
Dependent item | jenkins.build.health[{#NAME}] Preprocessing
|
Jenkins job [{#NAME}]: Last Build number | Details: {#URL}/lastBuild/ |
Dependent item | jenkins.last_build.number[{#NAME}] Preprocessing
|
Jenkins job [{#NAME}]: Last Build duration | Build duration (in seconds). |
Dependent item | jenkins.last_build.duration[{#NAME}] Preprocessing
|
Jenkins job [{#NAME}]: Last Build timestamp | Dependent item | jenkins.last_build.timestamp[{#NAME}] Preprocessing
|
|
Jenkins job [{#NAME}]: Last Build result | Dependent item | jenkins.last_build.result[{#NAME}] Preprocessing
|
|
Jenkins job [{#NAME}]: Last Failed Build number | Details: {#URL}/lastFailedBuild/ |
Dependent item | jenkins.last_failed_build.number[{#NAME}] Preprocessing
|
Jenkins job [{#NAME}]: Last Failed Build duration | Build duration (in seconds). |
Dependent item | jenkins.last_failed_build.duration[{#NAME}] Preprocessing
|
Jenkins job [{#NAME}]: Last Failed Build timestamp | Dependent item | jenkins.last_failed_build.timestamp[{#NAME}] Preprocessing
|
|
Jenkins job [{#NAME}]: Last Successful Build number | Details: {#URL}/lastSuccessfulBuild/ |
Dependent item | jenkins.last_successful_build.number[{#NAME}] Preprocessing
|
Jenkins job [{#NAME}]: Last Successful Build duration | Build duration (in seconds). |
Dependent item | jenkins.last_successful_build.duration[{#NAME}] Preprocessing
|
Jenkins job [{#NAME}]: Last Successful Build timestamp | Dependent item | jenkins.last_successful_build.timestamp[{#NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Jenkins job [{#NAME}]: Job is unhealthy | last(/Jenkins by HTTP/jenkins.build.health[{#NAME}])<{$JENKINS.JOB.HEALTH.SCORE.MIN.WARN} |Warning |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Computers discovery | HTTP agent | jenkins.computers Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Jenkins: Computer [{#DISPLAY_NAME}]: Get computer | Raw data for a computer. |
Dependent item | jenkins.computer.get[{#DISPLAY_NAME}] Preprocessing
|
Jenkins: Computer [{#DISPLAY_NAME}]: Executors | The maximum number of concurrent builds that Jenkins may perform on this node. |
Dependent item | jenkins.computer.numExecutors[{#DISPLAY_NAME}] Preprocessing
|
Jenkins: Computer [{#DISPLAY_NAME}]: State | Represents the actual online/offline state. Node description: {#DESCRIPTION} |
Dependent item | jenkins.computer.state[{#DISPLAY_NAME}] Preprocessing
|
Jenkins: Computer [{#DISPLAY_NAME}]: Offline cause reason | If the computer is offline (either temporarily or not), returns the cause as a string (without user info). Empty string if the system was put offline without a given cause. |
Dependent item | jenkins.computer.offline.reason[{#DISPLAY_NAME}] Preprocessing
|
Jenkins: Computer [{#DISPLAY_NAME}]: Idle | Returns true if all the executors of this computer are idle. |
Dependent item | jenkins.computer.idle[{#DISPLAY_NAME}] Preprocessing
|
Jenkins: Computer [{#DISPLAY_NAME}]: Temporarily offline | Returns true if this node is marked temporarily offline. |
Dependent item | jenkins.computer.temp_offline[{#DISPLAY_NAME}] Preprocessing
|
Jenkins: Computer [{#DISPLAY_NAME}]: Available disk space | The available disk space of $JENKINS_HOME on agent. |
Dependent item | jenkins.computer.disk_space[{#DISPLAY_NAME}] Preprocessing
|
Jenkins: Computer [{#DISPLAY_NAME}]: Available temp space | The available disk space of the temporary directory. Java tools and tests/builds often create files in the temporary directory, and may not function properly if there's no available space. |
Dependent item | jenkins.computer.temp_space[{#DISPLAY_NAME}] Preprocessing
|
Jenkins: Computer [{#DISPLAY_NAME}]: Response time average | The round trip network response time from the master to the agent. |
Dependent item | jenkins.computer.response_time[{#DISPLAY_NAME}] Preprocessing
|
Jenkins: Computer [{#DISPLAY_NAME}]: Available physical memory | The total physical memory of the system, available bytes. |
Dependent item | jenkins.computer.available_physical_memory[{#DISPLAY_NAME}] Preprocessing
|
Jenkins: Computer [{#DISPLAY_NAME}]: Available swap space | Available swap space in bytes. |
Dependent item | jenkins.computer.available_swap_space[{#DISPLAY_NAME}] Preprocessing
|
Jenkins: Computer [{#DISPLAY_NAME}]: Total physical memory | Total physical memory of the system, in bytes. |
Dependent item | jenkins.computer.total_physical_memory[{#DISPLAY_NAME}] Preprocessing
|
Jenkins: Computer [{#DISPLAY_NAME}]: Total swap space | Total swap space in bytes. |
Dependent item | jenkins.computer.total_swap_space[{#DISPLAY_NAME}] Preprocessing
|
Jenkins: Computer [{#DISPLAY_NAME}]: Clock difference | The clock difference between the master and nodes. |
Dependent item | jenkins.computer.clock_difference[{#DISPLAY_NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Jenkins: Computer [{#DISPLAY_NAME}]: Node is down | Node down with reason: {{ITEM.LASTVALUE2}.regsub("(.*)",\1)} |
last(/Jenkins by HTTP/jenkins.computer.state[{#DISPLAY_NAME}])=1 and length(last(/Jenkins by HTTP/jenkins.computer.offline.reason[{#DISPLAY_NAME}]))>0 |Average |
Depends on:
|
|
Jenkins: Computer [{#DISPLAY_NAME}]: Node is temporarily offline | Node is temporarily Offline with reason: {{ITEM.LASTVALUE2}.regsub("(.*)",\1)} |
last(/Jenkins by HTTP/jenkins.computer.temp_offline[{#DISPLAY_NAME}])=1 and length(last(/Jenkins by HTTP/jenkins.computer.offline.reason[{#DISPLAY_NAME}]))>0 |Info |
Manual close: Yes |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template to monitor IIS (Internet Information Services) by Zabbix that works without any external scripts.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
You have to enable the following Windows Features (Control Panel > Programs and Features > Turn Windows features on or off) on your server:
Web Server (IIS)
Web Server (IIS)\Management Tools\IIS Management Scripts and Tools
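Besides the performance counters collected by the Zabbix agent, the template also runs a simple check, net.tcp.service[{$IIS.SERVICE},,{$IIS.PORT}], against the listening port. If you want to confirm the port is reachable from the Zabbix server or proxy first, the same kind of TCP probe can be sketched in a few lines; the host below is a placeholder and the port stands in for {$IIS.PORT}:

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class IisPortProbe {
    public static void main(String[] args) {
        // Placeholders: the IIS host and the port from {$IIS.PORT} (80 by default).
        String host = "iis.example.com";
        int port = 80;

        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), 3000); // 3-second timeout
            System.out.println("Port " + port + " on " + host + " accepts TCP connections.");
        } catch (IOException e) {
            System.out.println("Port check failed: " + e.getMessage());
        }
    }
}
```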
Optionally, it is possible to customize the template:
Name | Description | Default |
---|---|---|
{$IIS.PORT} | Listening port. |
80 |
{$IIS.SERVICE} | The service (http/https/etc) for port check. See "net.tcp.service" documentation page for more information: https://www.zabbix.com/documentation/6.4/manual/config/items/itemtypes/simple_checks |
http |
{$IIS.QUEUE.MAX.WARN} | Maximum application pool's request queue length for trigger expression. |
|
{$IIS.QUEUE.MAX.TIME} | The time during which the queue length may exceed the threshold. |
5m |
{$IIS.APPPOOL.NOT_MATCHES} | This macro is used in application pools discovery. Can be overridden on the host or linked template level. |
<CHANGE_IF_NEEDED> |
{$IIS.APPPOOL.MATCHES} | This macro is used in application pools discovery. Can be overridden on the host or linked template level. |
.+ |
{$IIS.APPPOOL.MONITORED} | Monitoring status for discovered application pools. Use context to avoid trigger firing for specific application pools. "1" - enabled, "0" - disabled. |
1 |
{$AGENT.TIMEOUT} | Timeout after which agent is considered unavailable. |
5m |
Name | Description | Type | Key and additional info |
---|---|---|---|
IIS: World Wide Web Publishing Service (W3SVC) state | The World Wide Web Publishing Service (W3SVC) provides web connectivity and administration of websites through the IIS snap-in. If the World Wide Web Publishing Service stops, the operating system cannot serve any form of web request. This service is dependent on "Windows Process Activation Service". |
Zabbix agent (active) | service.info[W3SVC] Preprocessing
|
IIS: Windows Process Activation Service (WAS) state | Windows Process Activation Service (WAS) is a tool for managing worker processes that contain applications that host Windows Communication Foundation (WCF) services. Worker processes handle requests that are sent to a Web Server for specific application pools. Each application pool sets boundaries for the applications it contains. |
Zabbix agent (active) | service.info[WAS] Preprocessing
|
IIS: {$IIS.PORT} port ping | Simple check | net.tcp.service[{$IIS.SERVICE},,{$IIS.PORT}] Preprocessing
|
|
IIS: Uptime | The service uptime expressed in seconds. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Service Uptime"] |
IIS: Bytes Received per second | The average rate per minute at which data bytes are received by the service at the Application Layer. Does not include protocol headers or control bytes. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Bytes Received/sec", 60] |
IIS: Bytes Sent per second | The average rate per minute at which data bytes are sent by the service. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Bytes Sent/sec", 60] |
IIS: Bytes Total per second | The average rate per minute of total bytes/sec transferred by the Web service (sum of bytes sent/sec and bytes received/sec). |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Bytes Total/Sec", 60] |
IIS: Current connections | The number of active connections. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Current Connections"] |
IIS: Total connection attempts | The total number of connections to the Web or FTP service that have been attempted since service startup. The count is the total for all Web sites or FTP sites combined. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Total Connection Attempts (all instances)"] Preprocessing
|
IIS: Connection attempts per second | The average rate per minute that connections using the Web service are being attempted. The count is the average for all Web sites combined. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Connection Attempts/Sec", 60] Preprocessing
|
IIS: Anonymous users per second | The number of requests from users over an anonymous connection per second. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Anonymous Users/sec", 60] |
IIS: NonAnonymous users per second | The number of requests from users over a non-anonymous connection per second. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\NonAnonymous Users/sec", 60] |
IIS: Method GET requests per second | The rate of HTTP requests made using the GET method. GET requests are generally used for basic file retrievals or image maps, though they can be used with forms. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Get Requests/Sec", 60] Preprocessing
|
IIS: Method COPY requests per second | The rate of HTTP requests made using the COPY method. Copy requests are used for copying files and directories. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Copy Requests/Sec", 60] Preprocessing
|
IIS: Method CGI requests per second | The rate of CGI requests that are simultaneously being processed by the Web service. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\CGI Requests/Sec", 60] Preprocessing
|
IIS: Method DELETE requests per second | The rate of HTTP requests using the DELETE method made. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Delete Requests/Sec", 60] Preprocessing
|
IIS: Method HEAD requests per second | The rate of HTTP requests using the HEAD method made. HEAD requests generally indicate a client is querying the state of a document they already have to see if it needs to be refreshed. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Head Requests/Sec", 60] Preprocessing
|
IIS: Method ISAPI requests per second | The rate of ISAPI Extension requests that are simultaneously being processed by the Web service. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\ISAPI Extension Requests/Sec", 60] Preprocessing
|
IIS: Method LOCK requests per second | The rate of HTTP requests made using the LOCK method. Lock requests are used to lock a file for one user so that only that user can modify the file. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Lock Requests/Sec", 60] Preprocessing
|
IIS: Method MKCOL requests per second | The rate of HTTP requests using the MKCOL method made. Mkcol requests are used to create directories on the server. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Mkcol Requests/Sec", 60] Preprocessing
|
IIS: Method MOVE requests per second | The rate of HTTP requests using the MOVE method made. Move requests are used for moving files and directories. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Move Requests/Sec", 60] Preprocessing
|
IIS: Method OPTIONS requests per second | The rate of HTTP requests using the OPTIONS method made. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Options Requests/Sec", 60] Preprocessing
|
IIS: Method POST requests per second | Rate of HTTP requests using POST method. Generally used for forms or gateway requests. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Post Requests/Sec", 60] Preprocessing
|
IIS: Method PROPFIND requests per second | The rate of HTTP requests using the PROPFIND method made. Propfind requests retrieve property values on files and directories. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Propfind Requests/Sec", 60] Preprocessing
|
IIS: Method PROPPATCH requests per second | The rate of HTTP requests using the PROPPATCH method made. Proppatch requests set property values on files and directories. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Proppatch Requests/Sec", 60] Preprocessing
|
IIS: Method PUT requests per second | The rate of HTTP requests using the PUT method made. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Put Requests/Sec", 60] Preprocessing
|
IIS: Method MS-SEARCH requests per second | The rate of HTTP requests using the MS-SEARCH method made. Search requests are used to query the server to find resources that match a set of conditions provided by the client. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Search Requests/Sec", 60] Preprocessing
|
IIS: Method TRACE requests per second | The rate of HTTP requests using the TRACE method made. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Trace Requests/Sec", 60] Preprocessing
|
IIS: Method UNLOCK requests per second | The rate of HTTP requests using the UNLOCK method made. Unlock requests are used to remove locks from files. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Unlock Requests/Sec", 60] Preprocessing
|
IIS: Method Total requests per second | The rate of all HTTP requests received. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Total Method Requests/Sec", 60] Preprocessing
|
IIS: Method Total Other requests per second | Total Other Request Methods is the number of HTTP requests that are not OPTIONS, GET, HEAD, POST, PUT, DELETE, TRACE, MOVE, COPY, MKCOL, PROPFIND, PROPPATCH, SEARCH, LOCK or UNLOCK methods (since service startup). Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Other Request Methods/Sec", 60] Preprocessing
|
IIS: Locked errors per second | The rate of errors due to requests that couldn't be satisfied by the server because the requested document was locked. These are generally reported as an HTTP 423 error code to the client. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Locked Errors/Sec", 60] Preprocessing
|
IIS: Not Found errors per second | The rate of errors due to requests that couldn't be satisfied by the server because the requested document could not be found. These are generally reported to the client with HTTP error code 404. Average per minute. |
Zabbix agent (active) | perf_counter_en["\Web Service(_Total)\Not Found Errors/Sec", 60] Preprocessing
|
IIS: Files cache hits percentage | The ratio of user-mode file cache hits to total cache requests (since service startup). Note: This value might be low if the Kernel URI cache hits percentage is high. |
Zabbix agent (active) | perf_counter_en["\Web Service Cache\File Cache Hits %"] Preprocessing
|
IIS: URIs cache hits percentage | The ratio of user-mode URI Cache Hits to total cache requests (since service startup) |
Zabbix agent (active) | perf_counter_en["\Web Service Cache\URI Cache Hits %"] Preprocessing
|
IIS: File cache misses | The total number of unsuccessful lookups in the user-mode file cache since service startup. |
Zabbix agent (active) | perf_counter_en["\Web Service Cache\File Cache Misses"] Preprocessing
|
IIS: URI cache misses | The total number of unsuccessful lookups in the user-mode URI cache since service startup. |
Zabbix agent (active) | perf_counter_en["\Web Service Cache\URI Cache Misses"] Preprocessing
|
IIS: Active agent availability | Availability of active checks on the host. The value of this item corresponds to availability icons in the host list. Possible values: 0 - unknown, 1 - available, 2 - not available. |
Zabbix internal | zabbix[host,active_agent,available] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
IIS: The World Wide Web Publishing Service (W3SVC) is not running | The World Wide Web Publishing Service (W3SVC) is not in the running state. IIS cannot start. |
last(/IIS by Zabbix agent active/service.info[W3SVC])<>0 |High |
Depends on:
|
|
IIS: Windows process Activation Service (WAS) is not running | Windows Process Activation Service (WAS) is not in the running state. IIS cannot start. |
last(/IIS by Zabbix agent active/service.info[WAS])<>0 |High |
||
IIS: Port {$IIS.PORT} is down | last(/IIS by Zabbix agent active/net.tcp.service[{$IIS.SERVICE},,{$IIS.PORT}])=0 |Average |
Manual close: Yes Depends on:
|
||
IIS: has been restarted | Uptime is less than 10 minutes. |
last(/IIS by Zabbix agent active/perf_counter_en["\Web Service(_Total)\Service Uptime"])<10m |Info |
Manual close: Yes | |
IIS: Active checks are not available | Active checks are considered unavailable. Agent is not sending heartbeat for prolonged time. |
min(/IIS by Zabbix agent active/zabbix[host,active_agent,available],{$AGENT.TIMEOUT})=2 |High |
Name | Description | Type | Key and additional info |
---|---|---|---|
Application pools discovery | Zabbix agent (active) | wmi.getall[root\webAdministration, select Name from ApplicationPool] |
Name | Description | Type | Key and additional info |
---|---|---|---|
IIS: {#APPPOOL} Uptime | The web application uptime period since the last restart. |
Zabbix agent (active) | perf_counter_en["\APP_POOL_WAS({#APPPOOL})\Current Application Pool Uptime"] |
IIS: AppPool {#APPPOOL} state | The state of the application pool. |
Zabbix agent (active) | perf_counter_en["\APP_POOL_WAS({#APPPOOL})\Current Application Pool State"] Preprocessing
|
IIS: AppPool {#APPPOOL} recycles | The number of times the application pool has been recycled since Windows Process Activation Service (WAS) started. |
Zabbix agent (active) | perf_counter_en["\APP_POOL_WAS({#APPPOOL})\Total Application Pool Recycles"] Preprocessing
|
IIS: AppPool {#APPPOOL} current queue size | The number of requests in the queue. |
Zabbix agent (active) | perf_counter_en["\HTTP Service Request Queues({#APPPOOL})\CurrentQueueSize"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
IIS: {#APPPOOL} has been restarted | Uptime is less than 10 minutes. |
last(/IIS by Zabbix agent active/perf_counter_en["\APP_POOL_WAS({#APPPOOL})\Current Application Pool Uptime"])<10m |Info |
Manual close: Yes | |
IIS: Application pool {#APPPOOL} is not in Running state | last(/IIS by Zabbix agent active/perf_counter_en["\APP_POOL_WAS({#APPPOOL})\Current Application Pool State"])<>3 and {$IIS.APPPOOL.MONITORED:"{#APPPOOL}"}=1 |High |
Depends on:
|
||
IIS: Application pool {#APPPOOL} has been recycled | last(/IIS by Zabbix agent active/perf_counter_en["\APP_POOL_WAS({#APPPOOL})\Total Application Pool Recycles"],#1)<>last(/IIS by Zabbix agent active/perf_counter_en["\APP_POOL_WAS({#APPPOOL})\Total Application Pool Recycles"],#2) and {$IIS.APPPOOL.MONITORED:"{#APPPOOL}"}=1 |Info |
|||
IIS: Request queue of {#APPPOOL} is too large | min(/IIS by Zabbix agent active/perf_counter_en["\HTTP Service Request Queues({#APPPOOL})\CurrentQueueSize"],{$IIS.QUEUE.MAX.TIME})>{$IIS.QUEUE.MAX.WARN} |Warning |
Depends on:
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template to monitor IIS (Internet Information Services) by Zabbix that works without any external scripts.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
You have to enable the following Windows Features (Control Panel > Programs and Features > Turn Windows features on or off) on your server
Web Server (IIS)
Web Server (IIS)\Management Tools\IIS Management Scripts and Tools
Optionally, it is possible to customize the template:
Name | Description | Default |
---|---|---|
{$IIS.PORT} | Listening port. |
80 |
{$IIS.SERVICE} | The service (http/https/etc) for port check. See "net.tcp.service" documentation page for more information: https://www.zabbix.com/documentation/6.4/manual/config/items/itemtypes/simple_checks |
http |
{$IIS.QUEUE.MAX.WARN} | Maximum application pool's request queue length for trigger expression. |
|
{$IIS.QUEUE.MAX.TIME} | The time during which the queue length may exceed the threshold. |
5m |
{$IIS.APPPOOL.NOT_MATCHES} | This macro is used in application pools discovery. Can be overridden on the host or linked template level. |
<CHANGE_IF_NEEDED> |
{$IIS.APPPOOL.MATCHES} | This macro is used in application pools discovery. Can be overridden on the host or linked template level. |
.+ |
{$IIS.APPPOOL.MONITORED} | Monitoring status for discovered application pools. Use context to avoid trigger firing for specific application pools. "1" - enabled, "0" - disabled. |
1 |
Name | Description | Type | Key and additional info |
---|---|---|---|
IIS: World Wide Web Publishing Service (W3SVC) state | The World Wide Web Publishing Service (W3SVC) provides web connectivity and administration of websites through the IIS snap-in. If the World Wide Web Publishing Service stops, the operating system cannot serve any form of web request. This service is dependent on "Windows Process Activation Service". |
Zabbix agent | service.info[W3SVC] Preprocessing
|
IIS: Windows Process Activation Service (WAS) state | Windows Process Activation Service (WAS) is a tool for managing worker processes that contain applications that host Windows Communication Foundation (WCF) services. Worker processes handle requests that are sent to a Web Server for specific application pools. Each application pool sets boundaries for the applications it contains. |
Zabbix agent | service.info[WAS] Preprocessing
|
IIS: {$IIS.PORT} port ping | Simple check | net.tcp.service[{$IIS.SERVICE},,{$IIS.PORT}] Preprocessing
|
|
IIS: Uptime | The service uptime expressed in seconds. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Service Uptime"] |
IIS: Bytes Received per second | The average rate per minute at which data bytes are received by the service at the Application Layer. Does not include protocol headers or control bytes. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Bytes Received/sec", 60] |
IIS: Bytes Sent per second | The average rate per minute at which data bytes are sent by the service. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Bytes Sent/sec", 60] |
IIS: Bytes Total per second | The average rate per minute of total bytes/sec transferred by the Web service (sum of bytes sent/sec and bytes received/sec). |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Bytes Total/Sec", 60] |
IIS: Current connections | The number of active connections. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Current Connections"] |
IIS: Total connection attempts | The total number of connections to the Web or FTP service that have been attempted since service startup. The count is the total for all Web sites or FTP sites combined. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Total Connection Attempts (all instances)"] Preprocessing
|
IIS: Connection attempts per second | The average rate per minute that connections using the Web service are being attempted. The count is the average for all Web sites combined. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Connection Attempts/Sec", 60] Preprocessing
|
IIS: Anonymous users per second | The number of requests from users over an anonymous connection per second. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Anonymous Users/sec", 60] |
IIS: NonAnonymous users per second | The number of requests from users over a non-anonymous connection per second. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\NonAnonymous Users/sec", 60] |
IIS: Method GET requests per second | The rate of HTTP requests made using the GET method. GET requests are generally used for basic file retrievals or image maps, though they can be used with forms. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Get Requests/Sec", 60] Preprocessing
|
IIS: Method COPY requests per second | The rate of HTTP requests made using the COPY method. Copy requests are used for copying files and directories. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Copy Requests/Sec", 60] Preprocessing
|
IIS: Method CGI requests per second | The rate of CGI requests that are simultaneously being processed by the Web service. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\CGI Requests/Sec", 60] Preprocessing
|
IIS: Method DELETE requests per second | The rate of HTTP requests using the DELETE method made. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Delete Requests/Sec", 60] Preprocessing
|
IIS: Method HEAD requests per second | The rate of HTTP requests using the HEAD method made. HEAD requests generally indicate a client is querying the state of a document they already have to see if it needs to be refreshed. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Head Requests/Sec", 60] Preprocessing
|
IIS: Method ISAPI requests per second | The rate of ISAPI Extension requests that are simultaneously being processed by the Web service. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\ISAPI Extension Requests/Sec", 60] Preprocessing
|
IIS: Method LOCK requests per second | The rate of HTTP requests made using the LOCK method. Lock requests are used to lock a file for one user so that only that user can modify the file. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Lock Requests/Sec", 60] Preprocessing
|
IIS: Method MKCOL requests per second | The rate of HTTP requests using the MKCOL method made. Mkcol requests are used to create directories on the server. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Mkcol Requests/Sec", 60] Preprocessing
|
IIS: Method MOVE requests per second | The rate of HTTP requests using the MOVE method made. Move requests are used for moving files and directories. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Move Requests/Sec", 60] Preprocessing
|
IIS: Method OPTIONS requests per second | The rate of HTTP requests using the OPTIONS method made. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Options Requests/Sec", 60] Preprocessing
|
IIS: Method POST requests per second | Rate of HTTP requests using POST method. Generally used for forms or gateway requests. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Post Requests/Sec", 60] Preprocessing
|
IIS: Method PROPFIND requests per second | The rate of HTTP requests using the PROPFIND method made. Propfind requests retrieve property values on files and directories. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Propfind Requests/Sec", 60] Preprocessing
|
IIS: Method PROPPATCH requests per second | The rate of HTTP requests using the PROPPATCH method made. Proppatch requests set property values on files and directories. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Proppatch Requests/Sec", 60] Preprocessing
|
IIS: Method PUT requests per second | The rate of HTTP requests using the PUT method made. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Put Requests/Sec", 60] Preprocessing
|
IIS: Method MS-SEARCH requests per second | The rate of HTTP requests using the MS-SEARCH method made. Search requests are used to query the server to find resources that match a set of conditions provided by the client. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Search Requests/Sec", 60] Preprocessing
|
IIS: Method TRACE requests per second | The rate of HTTP requests using the TRACE method made. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Trace Requests/Sec", 60] Preprocessing
|
IIS: Method UNLOCK requests per second | The rate of HTTP requests using the UNLOCK method made. Unlock requests are used to remove locks from files. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Unlock Requests/Sec", 60] Preprocessing
|
IIS: Method Total requests per second | The rate of all HTTP requests received. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Total Method Requests/Sec", 60] Preprocessing
|
IIS: Method Total Other requests per second | Total Other Request Methods is the number of HTTP requests that are not OPTIONS, GET, HEAD, POST, PUT, DELETE, TRACE, MOVE, COPY, MKCOL, PROPFIND, PROPPATCH, SEARCH, LOCK or UNLOCK methods (since service startup). Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Other Request Methods/Sec", 60] Preprocessing
|
IIS: Locked errors per second | The rate of errors due to requests that couldn't be satisfied by the server because the requested document was locked. These are generally reported as an HTTP 423 error code to the client. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Locked Errors/Sec", 60] Preprocessing
|
IIS: Not Found errors per second | The rate of errors due to requests that couldn't be satisfied by the server because the requested document could not be found. These are generally reported to the client with HTTP error code 404. Average per minute. |
Zabbix agent | perf_counter_en["\Web Service(_Total)\Not Found Errors/Sec", 60] Preprocessing
|
IIS: Files cache hits percentage | The ratio of user-mode file cache hits to total cache requests (since service startup). Note: This value might be low if the Kernel URI cache hits percentage is high. |
Zabbix agent | perf_counter_en["\Web Service Cache\File Cache Hits %"] Preprocessing
|
IIS: URIs cache hits percentage | The ratio of user-mode URI Cache Hits to total cache requests (since service startup) |
Zabbix agent | perf_counter_en["\Web Service Cache\URI Cache Hits %"] Preprocessing
|
IIS: File cache misses | The total number of unsuccessful lookups in the user-mode file cache since service startup. |
Zabbix agent | perf_counter_en["\Web Service Cache\File Cache Misses"] Preprocessing
|
IIS: URI cache misses | The total number of unsuccessful lookups in the user-mode URI cache since service startup. |
Zabbix agent | perf_counter_en["\Web Service Cache\URI Cache Misses"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
IIS: The World Wide Web Publishing Service (W3SVC) is not running | The World Wide Web Publishing Service (W3SVC) is not in the running state. IIS cannot start. |
last(/IIS by Zabbix agent/service.info[W3SVC])<>0 |High |
Depends on:
|
|
IIS: Windows process Activation Service (WAS) is not running | Windows Process Activation Service (WAS) is not in the running state. IIS cannot start. |
last(/IIS by Zabbix agent/service.info[WAS])<>0 |High |
||
IIS: Port {$IIS.PORT} is down | last(/IIS by Zabbix agent/net.tcp.service[{$IIS.SERVICE},,{$IIS.PORT}])=0 |Average |
Manual close: Yes Depends on:
|
||
IIS: has been restarted | Uptime is less than 10 minutes. |
last(/IIS by Zabbix agent/perf_counter_en["\Web Service(_Total)\Service Uptime"])<10m |Info |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Application pools discovery | Zabbix agent | wmi.getall[root\webAdministration, select Name from ApplicationPool] |
Name | Description | Type | Key and additional info |
---|---|---|---|
IIS: {#APPPOOL} Uptime | The web application uptime period since the last restart. |
Zabbix agent | perf_counter_en["\APP_POOL_WAS({#APPPOOL})\Current Application Pool Uptime"] |
IIS: AppPool {#APPPOOL} state | The state of the application pool. |
Zabbix agent | perf_counter_en["\APP_POOL_WAS({#APPPOOL})\Current Application Pool State"] Preprocessing
|
IIS: AppPool {#APPPOOL} recycles | The number of times the application pool has been recycled since Windows Process Activation Service (WAS) started. |
Zabbix agent | perf_counter_en["\APP_POOL_WAS({#APPPOOL})\Total Application Pool Recycles"] Preprocessing
|
IIS: AppPool {#APPPOOL} current queue size | The number of requests in the queue. |
Zabbix agent | perf_counter_en["\HTTP Service Request Queues({#APPPOOL})\CurrentQueueSize"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
IIS: {#APPPOOL} has been restarted | Uptime is less than 10 minutes. |
last(/IIS by Zabbix agent/perf_counter_en["\APP_POOL_WAS({#APPPOOL})\Current Application Pool Uptime"])<10m |Info |
Manual close: Yes | |
IIS: Application pool {#APPPOOL} is not in Running state | last(/IIS by Zabbix agent/perf_counter_en["\APP_POOL_WAS({#APPPOOL})\Current Application Pool State"])<>3 and {$IIS.APPPOOL.MONITORED:"{#APPPOOL}"}=1 |High |
Depends on:
|
||
IIS: Application pool {#APPPOOL} has been recycled | last(/IIS by Zabbix agent/perf_counter_en["\APP_POOL_WAS({#APPPOOL})\Total Application Pool Recycles"],#1)<>last(/IIS by Zabbix agent/perf_counter_en["\APP_POOL_WAS({#APPPOOL})\Total Application Pool Recycles"],#2) and {$IIS.APPPOOL.MONITORED:"{#APPPOOL}"}=1 |Info |
|||
IIS: Request queue of {#APPPOOL} is too large | min(/IIS by Zabbix agent/perf_counter_en["\HTTP Service Request Queues({#APPPOOL})\CurrentQueueSize"],{$IIS.QUEUE.MAX.TIME})>{$IIS.QUEUE.MAX.WARN} |Warning |
Depends on:
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template to monitor HAProxy by Zabbix that works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The template collects metrics by polling the HAProxy stats page with HTTP agent.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Set up the HAProxy stats page. If you want to use authentication, set the username and password in the stats auth option of the configuration file.
The example configuration of HAProxy:
frontend stats
bind *:8404
stats enable
stats uri /stats
stats refresh 10s
#stats auth Username:Password # Authentication credentials
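For reference, the haproxy.get item simply requests this page with the ";csv" suffix. Below is a minimal Python sketch of the same request; the host, port, path and credentials are placeholders standing in for the {$HAPROXY.STATS.*}, {$HAPROXY.USERNAME} and {$HAPROXY.PASSWORD} macros, not values defined by this template.
import base64
import csv
import io
import urllib.request

# Placeholder values; substitute your {$HAPROXY.STATS.*} and auth macro values.
SCHEME, HOST, PORT, PATH = "http", "haproxy.example.com", 8404, "stats"
USERNAME, PASSWORD = "", ""  # leave empty if "stats auth" is not configured

url = f"{SCHEME}://{HOST}:{PORT}/{PATH};csv"  # the ";csv" suffix requests CSV output
request = urllib.request.Request(url)
if USERNAME:
    token = base64.b64encode(f"{USERNAME}:{PASSWORD}".encode()).decode()
    request.add_header("Authorization", f"Basic {token}")

with urllib.request.urlopen(request, timeout=5) as response:
    body = response.read().decode()

# The header line starts with "# pxname,svname,..."; strip the "# " so csv sees clean field names.
rows = list(csv.DictReader(io.StringIO(body.lstrip("# "))))
for row in rows:
    print(row["pxname"], row["svname"], row.get("status", ""))
If the request returns 401, re-check the stats auth line; if it times out, re-check the bind address and port in the frontend above.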
Set the hostname or IP address of the HAProxy stats host or container in the {$HAPROXY.STATS.HOST}
macro. You can also change the status page port in the {$HAPROXY.STATS.PORT}
macro, the status page scheme in the {$HAPROXY.STATS.SCHEME}
macro and the status page path in the {$HAPROXY.STATS.PATH}
macro if necessary.
If you have enabled authentication in the HAProxy configuration file in step 1, set the username and password in the {$HAPROXY.USERNAME}
and {$HAPROXY.PASSWORD}
macros.
Name | Description | Default |
---|---|---|
{$HAPROXY.STATS.SCHEME} | The scheme of HAProxy stats page (http/https). |
http |
{$HAPROXY.STATS.HOST} | The hostname or IP address of the HAProxy stats host or container. |
<SET HAPROXY HOST> |
{$HAPROXY.STATS.PORT} | The port of the HAProxy stats host or container. |
8404 |
{$HAPROXY.STATS.PATH} | The path of the HAProxy stats page. |
stats |
{$HAPROXY.USERNAME} | The username of the HAProxy stats page. |
|
{$HAPROXY.PASSWORD} | The password of the HAProxy stats page. |
|
{$HAPROXY.RESPONSE_TIME.MAX.WARN} | The HAProxy stats page maximum response time in seconds for trigger expression. |
10s |
{$HAPROXY.FRONT_DREQ.MAX.WARN} | The HAProxy maximum denied requests for trigger expression. |
10 |
{$HAPROXY.FRONT_EREQ.MAX.WARN} | The HAProxy maximum number of request errors for trigger expression. |
10 |
{$HAPROXY.BACK_QCUR.MAX.WARN} | Maximum number of requests on Backend unassigned in queue for trigger expression. |
10 |
{$HAPROXY.BACK_RTIME.MAX.WARN} | Maximum of average Backend response time for trigger expression. |
10s |
{$HAPROXY.BACK_QTIME.MAX.WARN} | Maximum of average time spent in queue on Backend for trigger expression. |
10s |
{$HAPROXY.BACK_ERESP.MAX.WARN} | Maximum of responses with error on Backend for trigger expression. |
10 |
{$HAPROXY.SERVER_QCUR.MAX.WARN} | Maximum number of requests on server unassigned in queue for trigger expression. |
10 |
{$HAPROXY.SERVER_RTIME.MAX.WARN} | Maximum of average server response time for trigger expression. |
10s |
{$HAPROXY.SERVER_QTIME.MAX.WARN} | Maximum of average time spent in queue on server for trigger expression. |
10s |
{$HAPROXY.SERVER_ERESP.MAX.WARN} | Maximum of responses with error on server for trigger expression. |
10 |
{$HAPROXY.FRONT_SUTIL.MAX.WARN} | Maximum of session usage percentage on frontend for trigger expression. |
80 |
Name | Description | Type | Key and additional info |
---|---|---|---|
HAProxy: Get stats | HAProxy Statistics Report in CSV format |
HTTP agent | haproxy.get Preprocessing
|
HAProxy: Get nodes | Array for LLD rules. |
Dependent item | haproxy.get.nodes Preprocessing
|
HAProxy: Get stats page | HAProxy Statistics Report HTML |
HTTP agent | haproxy.get_html |
HAProxy: Version | Dependent item | haproxy.version Preprocessing
|
|
HAProxy: Uptime | Dependent item | haproxy.uptime Preprocessing
|
|
HAProxy: Service status | Simple check | net.tcp.service["{$HAPROXY.STATS.SCHEME}","{$HAPROXY.STATS.HOST}","{$HAPROXY.STATS.PORT}"] Preprocessing
|
|
HAProxy: Service response time | Simple check | net.tcp.service.perf["{$HAPROXY.STATS.SCHEME}","{$HAPROXY.STATS.HOST}","{$HAPROXY.STATS.PORT}"] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
HAProxy: Version has changed | HAProxy version has changed. Acknowledge to close the problem manually. |
last(/HAProxy by HTTP/haproxy.version,#1)<>last(/HAProxy by HTTP/haproxy.version,#2) and length(last(/HAProxy by HTTP/haproxy.version))>0 |Info |
Manual close: Yes | |
HAProxy: has been restarted | Uptime is less than 10 minutes. |
last(/HAProxy by HTTP/haproxy.uptime)<10m |Info |
Manual close: Yes | |
HAProxy: Service is down | last(/HAProxy by HTTP/net.tcp.service["{$HAPROXY.STATS.SCHEME}","{$HAPROXY.STATS.HOST}","{$HAPROXY.STATS.PORT}"])=0 |Average |
Manual close: Yes | ||
HAProxy: Service response time is too high | min(/HAProxy by HTTP/net.tcp.service.perf["{$HAPROXY.STATS.SCHEME}","{$HAPROXY.STATS.HOST}","{$HAPROXY.STATS.PORT}"],5m)>{$HAPROXY.RESPONSE_TIME.MAX.WARN} |Warning |
Manual close: Yes Depends on:
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Backend discovery | Discovery backends |
Dependent item | haproxy.backend.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
HAProxy Backend {#PXNAME}: Raw data | The raw data of the Backend with the name |
Dependent item | haproxy.backend.raw[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Backend {#PXNAME}: Status | Possible values: UP - The server is reporting as healthy. DOWN - The server is reporting as unhealthy and unable to receive requests. NOLB - You've added http-check disable-on-404 to the backend and the health checked URL has returned an HTTP 404 response. MAINT - The server has been disabled or put into maintenance mode. DRAIN - The server has been put into drain mode. no check - Health checks are not enabled for this server. |
Dependent item | haproxy.backend.status[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Backend {#PXNAME}: Responses time | Average backend response time (in ms) for the last 1,024 requests |
Dependent item | haproxy.backend.rtime[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Backend {#PXNAME}: Errors connection per second | Number of requests that encountered an error attempting to connect to a backend server. |
Dependent item | haproxy.backend.econ.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Backend {#PXNAME}: Responses denied per second | Responses denied due to security concerns (ACL-restricted). |
Dependent item | haproxy.backend.dresp.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Backend {#PXNAME}: Response errors per second | Number of requests whose responses yielded an error |
Dependent item | haproxy.backend.eresp.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Backend {#PXNAME}: Unassigned requests | Current number of requests unassigned in queue. |
Dependent item | haproxy.backend.qcur[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Backend {#PXNAME}: Time in queue | Average time spent in queue (in ms) for the last 1,024 requests |
Dependent item | haproxy.backend.qtime[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Backend {#PXNAME}: Redispatched requests per second | Number of times a request was redispatched to a different backend. |
Dependent item | haproxy.backend.wredis.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Backend {#PXNAME}: Retried connections per second | Number of times a connection was retried. |
Dependent item | haproxy.backend.wretr.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Backend {#PXNAME}: Number of responses with codes 1xx per second | Number of informational HTTP responses per second. |
Dependent item | haproxy.backend.hrsp_1xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Backend {#PXNAME}: Number of responses with codes 2xx per second | Number of successful HTTP responses per second. |
Dependent item | haproxy.backend.hrsp_2xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Backend {#PXNAME}: Number of responses with codes 3xx per second | Number of HTTP redirections per second. |
Dependent item | haproxy.backend.hrsp_3xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Backend {#PXNAME}: Number of responses with codes 4xx per second | Number of HTTP client errors per second. |
Dependent item | haproxy.backend.hrsp_4xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Backend {#PXNAME}: Number of responses with codes 5xx per second | Number of HTTP server errors per second. |
Dependent item | haproxy.backend.hrsp_5xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Backend {#PXNAME}: Incoming traffic | Number of bits received by the backend |
Dependent item | haproxy.backend.bin.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Backend {#PXNAME}: Outgoing traffic | Number of bits sent by the backend |
Dependent item | haproxy.backend.bout.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Backend {#PXNAME}: Number of active servers | Number of active servers. |
Dependent item | haproxy.backend.act[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Backend {#PXNAME}: Number of backup servers | Number of backup servers. |
Dependent item | haproxy.backend.bck[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Backend {#PXNAME}: Sessions per second | Cumulative number of sessions (end-to-end connections) per second. |
Dependent item | haproxy.backend.stot.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Backend {#PXNAME}: Weight | Total effective weight. |
Dependent item | haproxy.backend.weight[{#PXNAME},{#SVNAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
HAProxy backend {#PXNAME}: Server is DOWN | Backend is not available. |
count(/HAProxy by HTTP/haproxy.backend.status[{#PXNAME},{#SVNAME}],#5,"eq","DOWN")=5 |Average |
||
HAProxy backend {#PXNAME}: Average response time is high | Average backend response time (in ms) for the last 1,024 requests is more than {$HAPROXY.BACK_RTIME.MAX.WARN}. |
min(/HAProxy by HTTP/haproxy.backend.rtime[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.BACK_RTIME.MAX.WARN} |Warning |
||
HAProxy backend {#PXNAME}: Number of responses with error is high | Number of requests on backend, whose responses yielded an error, is more than {$HAPROXY.BACK_ERESP.MAX.WARN}. |
min(/HAProxy by HTTP/haproxy.backend.eresp.rate[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.BACK_ERESP.MAX.WARN} |Warning |
||
HAProxy backend {#PXNAME}: Current number of requests unassigned in queue is high | Current number of requests on backend unassigned in queue is more than {$HAPROXY.BACK_QCUR.MAX.WARN}. |
min(/HAProxy by HTTP/haproxy.backend.qcur[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.BACK_QCUR.MAX.WARN} |Warning |
||
HAProxy backend {#PXNAME}: Average time spent in queue is high | Average time spent in queue (in ms) for the last 1,024 requests is more than {$HAPROXY.BACK_QTIME.MAX.WARN}. |
min(/HAProxy by HTTP/haproxy.backend.qtime[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.BACK_QTIME.MAX.WARN} |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Frontend discovery | Discovery frontends |
Dependent item | haproxy.frontend.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
HAProxy Frontend {#PXNAME}: Raw data | The raw data of the Frontend with the name |
Dependent item | haproxy.frontend.raw[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Frontend {#PXNAME}: Status | Possible values: OPEN, STOP. When Status is OPEN, the frontend is operating normally and ready to receive traffic. |
Dependent item | haproxy.frontend.status[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Frontend {#PXNAME}: Requests rate | HTTP requests per second |
Dependent item | haproxy.frontend.req_rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Frontend {#PXNAME}: Sessions rate | Number of sessions created per second |
Dependent item | haproxy.frontend.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Frontend {#PXNAME}: Established sessions | The current number of established sessions. |
Dependent item | haproxy.frontend.scur[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Frontend {#PXNAME}: Session limits | The most simultaneous sessions that are allowed, as defined by the maxconn setting in the frontend. |
Dependent item | haproxy.frontend.slim[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Frontend {#PXNAME}: Session utilization | Percentage of sessions used (scur / slim * 100). |
Calculated | haproxy.frontend.sutil[{#PXNAME},{#SVNAME}] |
HAProxy Frontend {#PXNAME}: Request errors per second | Number of request errors per second. |
Dependent item | haproxy.frontend.ereq.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Frontend {#PXNAME}: Denied requests per second | Requests denied due to security concerns (ACL-restricted) per second. |
Dependent item | haproxy.frontend.dreq.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Frontend {#PXNAME}: Number of responses with codes 1xx per second | Number of informational HTTP responses per second. |
Dependent item | haproxy.frontend.hrsp_1xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Frontend {#PXNAME}: Number of responses with codes 2xx per second | Number of successful HTTP responses per second. |
Dependent item | haproxy.frontend.hrsp_2xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Frontend {#PXNAME}: Number of responses with codes 3xx per second | Number of HTTP redirections per second. |
Dependent item | haproxy.frontend.hrsp_3xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Frontend {#PXNAME}: Number of responses with codes 4xx per second | Number of HTTP client errors per second. |
Dependent item | haproxy.frontend.hrsp_4xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Frontend {#PXNAME}: Number of responses with codes 5xx per second | Number of HTTP server errors per second. |
Dependent item | haproxy.frontend.hrsp_5xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Frontend {#PXNAME}: Incoming traffic | Number of bits received by the frontend |
Dependent item | haproxy.frontend.bin.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Frontend {#PXNAME}: Outgoing traffic | Number of bits sent by the frontend |
Dependent item | haproxy.frontend.bout.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
HAProxy frontend {#PXNAME}: Session utilization is high | Alerting on this metric is essential to ensure your server has sufficient capacity to handle all concurrent sessions. Unlike requests, upon reaching the session limit HAProxy will deny additional clients until resource consumption drops. Furthermore, if you find your session usage percentage to be hovering above 80%, it could be time to either modify HAProxy's configuration to allow more sessions, or migrate your HAProxy server to a bigger box. |
min(/HAProxy by HTTP/haproxy.frontend.sutil[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.FRONT_SUTIL.MAX.WARN} |Warning |
||
HAProxy frontend {#PXNAME}: Number of request errors is high | Number of request errors is more than {$HAPROXY.FRONT_EREQ.MAX.WARN}. |
min(/HAProxy by HTTP/haproxy.frontend.ereq.rate[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.FRONT_EREQ.MAX.WARN} |Warning |
||
HAProxy frontend {#PXNAME}: Number of requests denied is high | Number of requests denied due to security concerns (ACL-restricted) is more than {$HAPROXY.FRONT_DREQ.MAX.WARN}. |
min(/HAProxy by HTTP/haproxy.frontend.dreq.rate[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.FRONT_DREQ.MAX.WARN} |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Server discovery | Discovery servers |
Dependent item | haproxy.server.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
HAProxy Server {#PXNAME} {#SVNAME}: Raw data | The raw data of the Server named |
Dependent item | haproxy.server.raw[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Status | Dependent item | haproxy.server.status[{#PXNAME},{#SVNAME}] Preprocessing
|
|
HAProxy {#PXNAME} {#SVNAME}: Responses time | Average server response time (in ms) for the last 1,024 requests. |
Dependent item | haproxy.server.rtime[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Errors connection per second | Number of requests that encountered an error attempting to connect to a backend server. |
Dependent item | haproxy.server.econ.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Responses denied per second | Responses denied due to security concerns (ACL-restricted). |
Dependent item | haproxy.server.dresp.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Response errors per second | Number of requests whose responses yielded an error. |
Dependent item | haproxy.server.eresp.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Unassigned requests | Current number of requests unassigned in queue. |
Dependent item | haproxy.server.qcur[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Time in queue | Average time spent in queue (in ms) for the last 1,024 requests. |
Dependent item | haproxy.server.qtime[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Redispatched requests per second | Number of times a request was redispatched to a different backend. |
Dependent item | haproxy.server.wredis.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Retried connections per second | Number of times a connection was retried. |
Dependent item | haproxy.server.wretr.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Number of responses with codes 1xx per second | Number of informational HTTP responses per second. |
Dependent item | haproxy.server.hrsp_1xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Number of responses with codes 2xx per second | Number of successful HTTP responses per second. |
Dependent item | haproxy.server.hrsp_2xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Number of responses with codes 3xx per second | Number of HTTP redirections per second. |
Dependent item | haproxy.server.hrsp_3xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Number of responses with codes 4xx per second | Number of HTTP client errors per second. |
Dependent item | haproxy.server.hrsp_4xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Number of responses with codes 5xx per second | Number of HTTP server errors per second. |
Dependent item | haproxy.server.hrsp_5xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Incoming traffic | Number of bits received by the backend |
Dependent item | haproxy.server.bin.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Outgoing traffic | Number of bits sent by the backend |
Dependent item | haproxy.server.bout.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Server is active | Shows whether the server is active (marked with a Y) or a backup (marked with a -). |
Dependent item | haproxy.server.act[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Server is backup | Shows whether the server is a backup (marked with a Y) or active (marked with a -). |
Dependent item | haproxy.server.bck[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Sessions per second | Cumulative number of sessions (end-to-end connections) per second. |
Dependent item | haproxy.server.stot.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Weight | Effective weight. |
Dependent item | haproxy.server.weight[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Configured maxqueue | Configured maxqueue for the server, or nothing if the value is 0 (default, meaning no limit). |
Dependent item | haproxy.server.qlimit[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Server was selected per second | Number of times that server was selected. |
Dependent item | haproxy.server.lbtot.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Status of last health check | Status of last health check, one of: UNK -> unknown INI -> initializing SOCKERR -> socket error L4OK -> check passed on layer 4, no upper layers testing enabled L4TOUT -> layer 1-4 timeout L4CON -> layer 1-4 connection problem, for example "Connection refused" (tcp rst) or "No route to host" (icmp) L6OK -> check passed on layer 6 L6TOUT -> layer 6 (SSL) timeout L6RSP -> layer 6 invalid response - protocol error L7OK -> check passed on layer 7 L7OKC -> check conditionally passed on layer 7, for example 404 with disable-on-404 L7TOUT -> layer 7 (HTTP/SMTP) timeout L7RSP -> layer 7 invalid response - protocol error L7STS -> layer 7 response error, for example HTTP 5xx Notice: If a check is currently running, the last known status will be reported, prefixed with "* ". e. g. "* L7OK". |
Dependent item | haproxy.server.check_status[{#PXNAME},{#SVNAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
HAProxy {#PXNAME} {#SVNAME}: Server is DOWN | Server is not available. |
count(/HAProxy by HTTP/haproxy.server.status[{#PXNAME},{#SVNAME}],#5,"eq","DOWN")=5 |Warning |
||
HAProxy {#PXNAME} {#SVNAME}: Average response time is high | Average server response time (in ms) for the last 1,024 requests is more than {$HAPROXY.SERVER_RTIME.MAX.WARN}. |
min(/HAProxy by HTTP/haproxy.server.rtime[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.SERVER_RTIME.MAX.WARN} |Warning |
||
HAProxy {#PXNAME} {#SVNAME}: Number of responses with error is high | Number of requests on server, whose responses yielded an error, is more than {$HAPROXY.SERVER_ERESP.MAX.WARN}. |
min(/HAProxy by HTTP/haproxy.server.eresp.rate[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.SERVER_ERESP.MAX.WARN} |Warning |
||
HAProxy {#PXNAME} {#SVNAME}: Current number of requests unassigned in queue is high | Current number of requests unassigned in queue is more than {$HAPROXY.SERVER_QCUR.MAX.WARN}. |
min(/HAProxy by HTTP/haproxy.server.qcur[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.SERVER_QCUR.MAX.WARN} |Warning |
||
HAProxy {#PXNAME} {#SVNAME}: Average time spent in queue is high | Average time spent in queue (in ms) for the last 1,024 requests is more than {$HAPROXY.SERVER_QTIME.MAX.WARN}. |
min(/HAProxy by HTTP/haproxy.server.qtime[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.SERVER_QTIME.MAX.WARN} |Warning |
||
HAProxy {#PXNAME} {#SVNAME}: Health check error | Please check the server for faults. |
find(/HAProxy by HTTP/haproxy.server.check_status[{#PXNAME},{#SVNAME}],#3,"regexp","(?:L[4-7]OK|^$)")=0 |Warning |
Depends on:
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template to monitor HAProxy by Zabbix that works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The template collects metrics by polling the HAProxy stats page with Zabbix agent.
Note that this template doesn't support authentication or redirects (limitations of web.page.get).
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Set up the HAProxy stats page. The example configuration of HAProxy:
frontend stats
bind *:8404
stats enable
stats uri /stats
stats refresh 10s
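Roughly speaking, the discovery rules in this template group the returned CSV rows by the svname column: FRONTEND rows feed frontend discovery, BACKEND rows feed backend discovery, and the remaining rows are treated as servers. The following is a minimal Python sketch of that grouping, assuming the CSV body from the stats page (with the ";csv" suffix) is already available in the hypothetical variable stats_csv:
import csv
import io

# stats_csv is assumed to already hold the body returned by the stats page with the ";csv" suffix.
stats_csv = "# pxname,svname,status\nstats,FRONTEND,OPEN\napp,BACKEND,UP\napp,web1,UP\n"

nodes = list(csv.DictReader(io.StringIO(stats_csv.lstrip("# "))))

frontends = [n for n in nodes if n["svname"] == "FRONTEND"]
backends = [n for n in nodes if n["svname"] == "BACKEND"]
servers = [n for n in nodes if n["svname"] not in ("FRONTEND", "BACKEND")]

print(len(frontends), "frontends,", len(backends), "backends,", len(servers), "servers")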
Set the hostname or IP address of the HAProxy stats host or container in the {$HAPROXY.STATS.HOST}
macro. You can also change the status page port in the {$HAPROXY.STATS.PORT}
macro, the status page scheme in the {$HAPROXY.STATS.SCHEME}
macro and the status page path in the {$HAPROXY.STATS.PATH}
macro if necessary.
Name | Description | Default |
---|---|---|
{$HAPROXY.STATS.SCHEME} | The scheme of HAProxy stats page (http/https). |
http |
{$HAPROXY.STATS.HOST} | The hostname or IP address of the HAProxy stats host or container. |
localhost |
{$HAPROXY.STATS.PORT} | The port of the HAProxy stats host or container. |
8404 |
{$HAPROXY.STATS.PATH} | The path of HAProxy stats page. |
stats |
{$HAPROXY.RESPONSE_TIME.MAX.WARN} | The HAProxy stats page maximum response time in seconds for trigger expression. |
10s |
{$HAPROXY.FRONT_DREQ.MAX.WARN} | The HAProxy maximum denied requests for trigger expression. |
10 |
{$HAPROXY.FRONT_EREQ.MAX.WARN} | The HAProxy maximum number of request errors for trigger expression. |
10 |
{$HAPROXY.BACK_QCUR.MAX.WARN} | Maximum number of requests on BACKEND unassigned in queue for trigger expression. |
10 |
{$HAPROXY.BACK_RTIME.MAX.WARN} | Maximum of average BACKEND response time for trigger expression. |
10s |
{$HAPROXY.BACK_QTIME.MAX.WARN} | Maximum of average time spent in queue on BACKEND for trigger expression. |
10s |
{$HAPROXY.BACK_ERESP.MAX.WARN} | Maximum of responses with error on BACKEND for trigger expression. |
10 |
{$HAPROXY.SERVER_QCUR.MAX.WARN} | Maximum number of requests on server unassigned in queue for trigger expression. |
10 |
{$HAPROXY.SERVER_RTIME.MAX.WARN} | Maximum of average server response time for trigger expression. |
10s |
{$HAPROXY.SERVER_QTIME.MAX.WARN} | Maximum of average time spent in queue on server for trigger expression. |
10s |
{$HAPROXY.SERVER_ERESP.MAX.WARN} | Maximum of responses with error on server for trigger expression. |
10 |
{$HAPROXY.FRONT_SUTIL.MAX.WARN} | Maximum of session usage percentage on frontend for trigger expression. |
80 |
Name | Description | Type | Key and additional info |
---|---|---|---|
HAProxy: Get stats | HAProxy Statistics Report in CSV format |
Zabbix agent | web.page.get["{$HAPROXY.STATS.SCHEME}://{$HAPROXY.STATS.HOST}:{$HAPROXY.STATS.PORT}/{$HAPROXY.STATS.PATH};csv"] Preprocessing
|
HAProxy: Get nodes | Array for LLD rules. |
Dependent item | haproxy.get.nodes Preprocessing
|
HAProxy: Get stats page | HAProxy Statistics Report HTML |
Zabbix agent | web.page.get["{$HAPROXY.STATS.SCHEME}://{$HAPROXY.STATS.HOST}:{$HAPROXY.STATS.PORT}/{$HAPROXY.STATS.PATH}"] |
HAProxy: Version | Dependent item | haproxy.version Preprocessing
|
|
HAProxy: Uptime | Dependent item | haproxy.uptime Preprocessing
|
|
HAProxy: Service status | Zabbix agent | net.tcp.service["{$HAPROXY.STATS.SCHEME}","{$HAPROXY.STATS.HOST}","{$HAPROXY.STATS.PORT}"] Preprocessing
|
|
HAProxy: Service response time | Zabbix agent | net.tcp.service.perf["{$HAPROXY.STATS.SCHEME}","{$HAPROXY.STATS.HOST}","{$HAPROXY.STATS.PORT}"] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
HAProxy: Version has changed | HAProxy version has changed. Acknowledge to close the problem manually. |
last(/HAProxy by Zabbix agent/haproxy.version,#1)<>last(/HAProxy by Zabbix agent/haproxy.version,#2) and length(last(/HAProxy by Zabbix agent/haproxy.version))>0 |Info |
Manual close: Yes | |
HAProxy: has been restarted | Uptime is less than 10 minutes. |
last(/HAProxy by Zabbix agent/haproxy.uptime)<10m |Info |
Manual close: Yes | |
HAProxy: Service is down | last(/HAProxy by Zabbix agent/net.tcp.service["{$HAPROXY.STATS.SCHEME}","{$HAPROXY.STATS.HOST}","{$HAPROXY.STATS.PORT}"])=0 |Average |
Manual close: Yes | ||
HAProxy: Service response time is too high | min(/HAProxy by Zabbix agent/net.tcp.service.perf["{$HAPROXY.STATS.SCHEME}","{$HAPROXY.STATS.HOST}","{$HAPROXY.STATS.PORT}"],5m)>{$HAPROXY.RESPONSE_TIME.MAX.WARN} |Warning |
Manual close: Yes Depends on:
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Backend discovery | Discovery backends |
Dependent item | haproxy.backend.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
HAProxy Backend {#PXNAME}: Raw data | The raw data of the Backend with the name |
Dependent item | haproxy.backend.raw[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Backend {#PXNAME}: Status | Possible values: UP - The server is reporting as healthy. DOWN - The server is reporting as unhealthy and unable to receive requests. NOLB - You've added http-check disable-on-404 to the backend and the health checked URL has returned an HTTP 404 response. MAINT - The server has been disabled or put into maintenance mode. DRAIN - The server has been put into drain mode. no check - Health checks are not enabled for this server. |
Dependent item | haproxy.backend.status[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Backend {#PXNAME}: Responses time | Average backend response time (in ms) for the last 1,024 requests |
Dependent item | haproxy.backend.rtime[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Backend {#PXNAME}: Errors connection per second | Number of requests that encountered an error attempting to connect to a backend server. |
Dependent item | haproxy.backend.econ.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Backend {#PXNAME}: Responses denied per second | Responses denied due to security concerns (ACL-restricted). |
Dependent item | haproxy.backend.dresp.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Backend {#PXNAME}: Response errors per second | Number of requests whose responses yielded an error |
Dependent item | haproxy.backend.eresp.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Backend {#PXNAME}: Unassigned requests | Current number of requests unassigned in queue. |
Dependent item | haproxy.backend.qcur[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Backend {#PXNAME}: Time in queue | Average time spent in queue (in ms) for the last 1,024 requests |
Dependent item | haproxy.backend.qtime[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Backend {#PXNAME}: Redispatched requests per second | Number of times a request was redispatched to a different backend. |
Dependent item | haproxy.backend.wredis.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Backend {#PXNAME}: Retried connections per second | Number of times a connection was retried. |
Dependent item | haproxy.backend.wretr.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Backend {#PXNAME}: Number of responses with codes 1xx per second | Number of informational HTTP responses per second. |
Dependent item | haproxy.backend.hrsp_1xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Backend {#PXNAME}: Number of responses with codes 2xx per second | Number of successful HTTP responses per second. |
Dependent item | haproxy.backend.hrsp_2xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Backend {#PXNAME}: Number of responses with codes 3xx per second | Number of HTTP redirections per second. |
Dependent item | haproxy.backend.hrsp_3xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Backend {#PXNAME}: Number of responses with codes 4xx per second | Number of HTTP client errors per second. |
Dependent item | haproxy.backend.hrsp_4xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Backend {#PXNAME}: Number of responses with codes 5xx per second | Number of HTTP server errors per second. |
Dependent item | haproxy.backend.hrsp_5xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Backend {#PXNAME}: Incoming traffic | Number of bits received by the backend |
Dependent item | haproxy.backend.bin.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Backend {#PXNAME}: Outgoing traffic | Number of bits sent by the backend |
Dependent item | haproxy.backend.bout.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Backend {#PXNAME}: Number of active servers | Number of active servers. |
Dependent item | haproxy.backend.act[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Backend {#PXNAME}: Number of backup servers | Number of backup servers. |
Dependent item | haproxy.backend.bck[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Backend {#PXNAME}: Sessions per second | Cumulative number of sessions (end-to-end connections) per second. |
Dependent item | haproxy.backend.stot.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Backend {#PXNAME}: Weight | Total effective weight. |
Dependent item | haproxy.backend.weight[{#PXNAME},{#SVNAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
HAProxy backend {#PXNAME}: Server is DOWN | Backend is not available. |
count(/HAProxy by Zabbix agent/haproxy.backend.status[{#PXNAME},{#SVNAME}],#5,"eq","DOWN")=5 |Average |
||
HAProxy backend {#PXNAME}: Average response time is high | Average backend response time (in ms) for the last 1,024 requests is more than {$HAPROXY.BACK_RTIME.MAX.WARN}. |
min(/HAProxy by Zabbix agent/haproxy.backend.rtime[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.BACK_RTIME.MAX.WARN} |Warning |
||
HAProxy backend {#PXNAME}: Number of responses with error is high | Number of requests on backend, whose responses yielded an error, is more than {$HAPROXY.BACK_ERESP.MAX.WARN}. |
min(/HAProxy by Zabbix agent/haproxy.backend.eresp.rate[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.BACK_ERESP.MAX.WARN} |Warning |
||
HAProxy backend {#PXNAME}: Current number of requests unassigned in queue is high | Current number of requests on backend unassigned in queue is more than {$HAPROXY.BACK_QCUR.MAX.WARN}. |
min(/HAProxy by Zabbix agent/haproxy.backend.qcur[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.BACK_QCUR.MAX.WARN} |Warning |
||
HAProxy backend {#PXNAME}: Average time spent in queue is high | Average time spent in queue (in ms) for the last 1,024 requests is more than {$HAPROXY.BACK_QTIME.MAX.WARN}. |
min(/HAProxy by Zabbix agent/haproxy.backend.qtime[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.BACK_QTIME.MAX.WARN} |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Frontend discovery | Discovery of frontends |
Dependent item | haproxy.frontend.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
HAProxy Frontend {#PXNAME}: Raw data | The raw data of the Frontend with the name {#PXNAME}. |
Dependent item | haproxy.frontend.raw[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Frontend {#PXNAME}: Status | Possible values: OPEN, STOP. When Status is OPEN, the frontend is operating normally and ready to receive traffic. |
Dependent item | haproxy.frontend.status[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Frontend {#PXNAME}: Requests rate | HTTP requests per second |
Dependent item | haproxy.frontend.req_rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Frontend {#PXNAME}: Sessions rate | Number of sessions created per second |
Dependent item | haproxy.frontend.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Frontend {#PXNAME}: Established sessions | The current number of established sessions. |
Dependent item | haproxy.frontend.scur[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Frontend {#PXNAME}: Session limits | The most simultaneous sessions that are allowed, as defined by the maxconn setting in the frontend. |
Dependent item | haproxy.frontend.slim[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Frontend {#PXNAME}: Session utilization | Percentage of sessions used (scur / slim * 100). |
Calculated | haproxy.frontend.sutil[{#PXNAME},{#SVNAME}] |
HAProxy Frontend {#PXNAME}: Request errors per second | Number of request errors per second. |
Dependent item | haproxy.frontend.ereq.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Frontend {#PXNAME}: Denied requests per second | Requests denied due to security concerns (ACL-restricted) per second. |
Dependent item | haproxy.frontend.dreq.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Frontend {#PXNAME}: Number of responses with codes 1xx per second | Number of informational HTTP responses per second. |
Dependent item | haproxy.frontend.hrsp_1xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Frontend {#PXNAME}: Number of responses with codes 2xx per second | Number of successful HTTP responses per second. |
Dependent item | haproxy.frontend.hrsp_2xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Frontend {#PXNAME}: Number of responses with codes 3xx per second | Number of HTTP redirections per second. |
Dependent item | haproxy.frontend.hrsp_3xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Frontend {#PXNAME}: Number of responses with codes 4xx per second | Number of HTTP client errors per second. |
Dependent item | haproxy.frontend.hrsp_4xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Frontend {#PXNAME}: Number of responses with codes 5xx per second | Number of HTTP server errors per second. |
Dependent item | haproxy.frontend.hrsp_5xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Frontend {#PXNAME}: Incoming traffic | Number of bits received by the frontend |
Dependent item | haproxy.frontend.bin.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy Frontend {#PXNAME}: Outgoing traffic | Number of bits sent by the frontend |
Dependent item | haproxy.frontend.bout.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
HAProxy frontend {#PXNAME}: Session utilization is high | Alerting on this metric is essential to ensure your server has sufficient capacity to handle all concurrent sessions. Unlike requests, upon reaching the session limit HAProxy will deny additional clients until resource consumption drops. Furthermore, if you find your session usage percentage to be hovering above 80%, it could be time to either modify HAProxy's configuration to allow more sessions, or migrate your HAProxy server to a bigger box. |
min(/HAProxy by Zabbix agent/haproxy.frontend.sutil[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.FRONT_SUTIL.MAX.WARN} |Warning |
||
HAProxy frontend {#PXNAME}: Number of request errors is high | Number of request errors is more than {$HAPROXY.FRONT_EREQ.MAX.WARN}. |
min(/HAProxy by Zabbix agent/haproxy.frontend.ereq.rate[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.FRONT_EREQ.MAX.WARN} |Warning |
||
HAProxy frontend {#PXNAME}: Number of requests denied is high | Number of requests denied due to security concerns (ACL-restricted) is more than {$HAPROXY.FRONT_DREQ.MAX.WARN}. |
min(/HAProxy by Zabbix agent/haproxy.frontend.dreq.rate[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.FRONT_DREQ.MAX.WARN} |Warning |
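The `Session utilization` calculated item and the utilization trigger above reduce to simple arithmetic on two statistics fields (scur and slim). A small Python sketch with illustrative values only:

```python
# Sketch of the "Session utilization" calculated item: current sessions (scur)
# as a percentage of the configured session limit (slim). Values are made up.
scur = 120   # current established sessions on the frontend
slim = 2000  # session limit (maxconn) reported for the frontend

sutil = scur / slim * 100 if slim else 0.0  # avoid division by zero when no limit is reported
print(f"Session utilization: {sutil:.1f}%")  # compare against {$HAPROXY.FRONT_SUTIL.MAX.WARN}
```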
Name | Description | Type | Key and additional info |
---|---|---|---|
Server discovery | Discovery of servers |
Dependent item | haproxy.server.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
HAProxy Server {#PXNAME} {#SVNAME}: Raw data | The raw data of the Server named {#SVNAME}. |
Dependent item | haproxy.server.raw[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Status | Dependent item | haproxy.server.status[{#PXNAME},{#SVNAME}] Preprocessing
|
|
HAProxy {#PXNAME} {#SVNAME}: Responses time | Average server response time (in ms) for the last 1,024 requests. |
Dependent item | haproxy.server.rtime[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Errors connection per second | Number of requests that encountered an error attempting to connect to a backend server. |
Dependent item | haproxy.server.econ.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Responses denied per second | Responses denied due to security concerns (ACL-restricted). |
Dependent item | haproxy.server.dresp.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Response errors per second | Number of requests whose responses yielded an error. |
Dependent item | haproxy.server.eresp.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Unassigned requests | Current number of requests unassigned in queue. |
Dependent item | haproxy.server.qcur[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Time in queue | Average time spent in queue (in ms) for the last 1,024 requests. |
Dependent item | haproxy.server.qtime[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Redispatched requests per second | Number of times a request was redispatched to a different backend. |
Dependent item | haproxy.server.wredis.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Retried connections per second | Number of times a connection was retried. |
Dependent item | haproxy.server.wretr.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Number of responses with codes 1xx per second | Number of informational HTTP responses per second. |
Dependent item | haproxy.server.hrsp_1xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Number of responses with codes 2xx per second | Number of successful HTTP responses per second. |
Dependent item | haproxy.server.hrsp_2xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Number of responses with codes 3xx per second | Number of HTTP redirections per second. |
Dependent item | haproxy.server.hrsp_3xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Number of responses with codes 4xx per second | Number of HTTP client errors per second. |
Dependent item | haproxy.server.hrsp_4xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Number of responses with codes 5xx per second | Number of HTTP server errors per second. |
Dependent item | haproxy.server.hrsp_5xx.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Incoming traffic | Number of bits received by the backend |
Dependent item | haproxy.server.bin.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Outgoing traffic | Number of bits sent by the backend |
Dependent item | haproxy.server.bout.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Server is active | Shows whether the server is active (marked with a Y) or a backup (marked with a -). |
Dependent item | haproxy.server.act[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Server is backup | Shows whether the server is a backup (marked with a Y) or active (marked with a -). |
Dependent item | haproxy.server.bck[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Sessions per second | Cumulative number of sessions (end-to-end connections) per second. |
Dependent item | haproxy.server.stot.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Weight | Effective weight. |
Dependent item | haproxy.server.weight[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Configured maxqueue | Configured maxqueue for the server, or nothing if the value is 0 (default, meaning no limit). |
Dependent item | haproxy.server.qlimit[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Server was selected per second | Number of times that server was selected. |
Dependent item | haproxy.server.lbtot.rate[{#PXNAME},{#SVNAME}] Preprocessing
|
HAProxy {#PXNAME} {#SVNAME}: Status of last health check | Status of last health check, one of: UNK -> unknown; INI -> initializing; SOCKERR -> socket error; L4OK -> check passed on layer 4, no upper layers testing enabled; L4TOUT -> layer 1-4 timeout; L4CON -> layer 1-4 connection problem, for example "Connection refused" (tcp rst) or "No route to host" (icmp); L6OK -> check passed on layer 6; L6TOUT -> layer 6 (SSL) timeout; L6RSP -> layer 6 invalid response - protocol error; L7OK -> check passed on layer 7; L7OKC -> check conditionally passed on layer 7, for example 404 with disable-on-404; L7TOUT -> layer 7 (HTTP/SMTP) timeout; L7RSP -> layer 7 invalid response - protocol error; L7STS -> layer 7 response error, for example HTTP 5xx. Notice: If a check is currently running, the last known status will be reported, prefixed with "* ", e.g. "* L7OK". |
Dependent item | haproxy.server.check_status[{#PXNAME},{#SVNAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
HAProxy {#PXNAME} {#SVNAME}: Server is DOWN | Server is not available. |
count(/HAProxy by Zabbix agent/haproxy.server.status[{#PXNAME},{#SVNAME}],#5,"eq","DOWN")=5 |Warning |
||
HAProxy {#PXNAME} {#SVNAME}: Average response time is high | Average server response time (in ms) for the last 1,024 requests is more than {$HAPROXY.SERVER_RTIME.MAX.WARN}. |
min(/HAProxy by Zabbix agent/haproxy.server.rtime[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.SERVER_RTIME.MAX.WARN} |Warning |
||
HAProxy {#PXNAME} {#SVNAME}: Number of responses with error is high | Number of requests on server, whose responses yielded an error, is more than {$HAPROXY.SERVER_ERESP.MAX.WARN}. |
min(/HAProxy by Zabbix agent/haproxy.server.eresp.rate[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.SERVER_ERESP.MAX.WARN} |Warning |
||
HAProxy {#PXNAME} {#SVNAME}: Current number of requests unassigned in queue is high | Current number of requests unassigned in queue is more than {$HAPROXY.SERVER_QCUR.MAX.WARN}. |
min(/HAProxy by Zabbix agent/haproxy.server.qcur[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.SERVER_QCUR.MAX.WARN} |Warning |
||
HAProxy {#PXNAME} {#SVNAME}: Average time spent in queue is high | Average time spent in queue (in ms) for the last 1,024 requests is more than {$HAPROXY.SERVER_QTIME.MAX.WARN}. |
min(/HAProxy by Zabbix agent/haproxy.server.qtime[{#PXNAME},{#SVNAME}],5m)>{$HAPROXY.SERVER_QTIME.MAX.WARN} |Warning |
||
HAProxy {#PXNAME} {#SVNAME}: Health check error | Please check the server for faults. |
find(/HAProxy by Zabbix agent/haproxy.server.check_status[{#PXNAME},{#SVNAME}],#3,"regexp","(?:L[4-7]OK|^$)")=0 |Warning |
Depends on:
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template for monitoring Hadoop over HTTP works without any external scripts. It collects metrics by polling the Hadoop API remotely using an HTTP agent and JSONPath preprocessing. The Zabbix server (or proxy) executes direct requests to the ResourceManager, NodeManager, NameNode, and DataNode APIs. All metrics are collected at once, thanks to Zabbix bulk data collection.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
You should define the IP address (or FQDN) and Web-UI port for the ResourceManager in {$HADOOP.RESOURCEMANAGER.HOST} and {$HADOOP.RESOURCEMANAGER.PORT} macros and for the NameNode in {$HADOOP.NAMENODE.HOST} and {$HADOOP.NAMENODE.PORT} macros respectively. Macros can be set in the template or overridden at the host level.
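For reference, below is a minimal Python sketch of the kind of polling the HTTP agent items perform. Hadoop daemons publish their metrics as JSON on the /jmx endpoint; the host names and ports are placeholders matching the macros above, the bean names come from Hadoop's standard JMX output, and the exact query string used by the template's items may differ.

```python
# A sketch of polling the Hadoop JSON metrics endpoint (/jmx) that the HTTP
# agent items rely on. Hosts and ports are placeholders matching the
# {$HADOOP.RESOURCEMANAGER.*} and {$HADOOP.NAMENODE.*} macros.
import json
import urllib.request

RESOURCEMANAGER = "http://ResourceManager:8088"
NAMENODE = "http://NameNode:9870"

def get_beans(base_url):
    """Return the list of JMX beans published by a Hadoop daemon."""
    with urllib.request.urlopen(f"{base_url}/jmx") as resp:
        return json.load(resp)["beans"]

# Example: number of active NodeManagers from the ResourceManager ClusterMetrics
# bean (the template extracts such values with JSONPath preprocessing).
for bean in get_beans(RESOURCEMANAGER):
    if bean.get("name") == "Hadoop:service=ResourceManager,name=ClusterMetrics":
        print("Active NodeManagers:", bean.get("NumActiveNMs"))

# Example: remaining capacity from the NameNode FSNamesystemState bean.
for bean in get_beans(NAMENODE):
    if bean.get("name") == "Hadoop:service=NameNode,name=FSNamesystemState":
        print("Capacity remaining (bytes):", bean.get("CapacityRemaining"))
```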
Name | Description | Default |
---|---|---|
{$HADOOP.RESOURCEMANAGER.HOST} | The Hadoop ResourceManager host IP address or FQDN. |
ResourceManager |
{$HADOOP.RESOURCEMANAGER.PORT} | The Hadoop ResourceManager Web-UI port. |
8088 |
{$HADOOP.RESOURCEMANAGER.RESPONSE_TIME.MAX.WARN} | The Hadoop ResourceManager API page maximum response time in seconds for trigger expression. |
10s |
{$HADOOP.NAMENODE.HOST} | The Hadoop NameNode host IP address or FQDN. |
NameNode |
{$HADOOP.NAMENODE.PORT} | The Hadoop NameNode Web-UI port. |
9870 |
{$HADOOP.NAMENODE.RESPONSE_TIME.MAX.WARN} | The Hadoop NameNode API page maximum response time in seconds for trigger expression. |
10s |
{$HADOOP.CAPACITY_REMAINING.MIN.WARN} | The Hadoop cluster capacity remaining percent for trigger expression. |
20 |
Name | Description | Type | Key and additional info |
---|---|---|---|
ResourceManager: Service status | Hadoop ResourceManager API port availability. |
Simple check | net.tcp.service["tcp","{$HADOOP.RESOURCEMANAGER.HOST}","{$HADOOP.RESOURCEMANAGER.PORT}"] Preprocessing
|
ResourceManager: Service response time | Hadoop ResourceManager API performance. |
Simple check | net.tcp.service.perf["tcp","{$HADOOP.RESOURCEMANAGER.HOST}","{$HADOOP.RESOURCEMANAGER.PORT}"] |
Hadoop: Get ResourceManager stats | HTTP agent | hadoop.resourcemanager.get | |
ResourceManager: Uptime | Dependent item | hadoop.resourcemanager.uptime Preprocessing
|
|
ResourceManager: Get info | Dependent item | hadoop.resourcemanager.info Preprocessing
|
|
ResourceManager: RPC queue & processing time | Average time spent on processing RPC requests. |
Dependent item | hadoop.resourcemanager.rpcprocessingtime_avg Preprocessing
|
ResourceManager: Active NMs | Number of Active NodeManagers. |
Dependent item | hadoop.resourcemanager.num_active_nm Preprocessing
|
ResourceManager: Decommissioning NMs | Number of Decommissioning NodeManagers. |
Dependent item | hadoop.resourcemanager.num_decommissioning_nm Preprocessing
|
ResourceManager: Decommissioned NMs | Number of Decommissioned NodeManagers. |
Dependent item | hadoop.resourcemanager.num_decommissioned_nm Preprocessing
|
ResourceManager: Lost NMs | Number of Lost NodeManagers. |
Dependent item | hadoop.resourcemanager.num_lost_nm Preprocessing
|
ResourceManager: Unhealthy NMs | Number of Unhealthy NodeManagers. |
Dependent item | hadoop.resourcemanager.num_unhealthy_nm Preprocessing
|
ResourceManager: Rebooted NMs | Number of Rebooted NodeManagers. |
Dependent item | hadoop.resourcemanager.num_rebooted_nm Preprocessing
|
ResourceManager: Shutdown NMs | Number of Shutdown NodeManagers. |
Dependent item | hadoop.resourcemanager.num_shutdown_nm Preprocessing
|
NameNode: Service status | Hadoop NameNode API port availability. |
Simple check | net.tcp.service["tcp","{$HADOOP.NAMENODE.HOST}","{$HADOOP.NAMENODE.PORT}"] Preprocessing
|
NameNode: Service response time | Hadoop NameNode API performance. |
Simple check | net.tcp.service.perf["tcp","{$HADOOP.NAMENODE.HOST}","{$HADOOP.NAMENODE.PORT}"] |
Hadoop: Get NameNode stats | HTTP agent | hadoop.namenode.get | |
NameNode: Uptime | Dependent item | hadoop.namenode.uptime Preprocessing
|
|
NameNode: Get info | Dependent item | hadoop.namenode.info Preprocessing
|
|
NameNode: RPC queue & processing time | Average time spent on processing RPC requests. |
Dependent item | hadoop.namenode.rpcprocessingtime_avg Preprocessing
|
NameNode: Block Pool Renaming | Dependent item | hadoop.namenode.percentblockpool_used Preprocessing
|
|
NameNode: Transactions since last checkpoint | Total number of transactions since last checkpoint. |
Dependent item | hadoop.namenode.transactionssincelast_checkpoint Preprocessing
|
NameNode: Percent capacity remaining | Available capacity in percent. |
Dependent item | hadoop.namenode.percent_remaining Preprocessing
|
NameNode: Capacity remaining | Available capacity. |
Dependent item | hadoop.namenode.capacity_remaining Preprocessing
|
NameNode: Corrupt blocks | Number of corrupt blocks. |
Dependent item | hadoop.namenode.corrupt_blocks Preprocessing
|
NameNode: Missing blocks | Number of missing blocks. |
Dependent item | hadoop.namenode.missing_blocks Preprocessing
|
NameNode: Failed volumes | Number of failed volumes. |
Dependent item | hadoop.namenode.volume_failures_total Preprocessing
|
NameNode: Alive DataNodes | Count of alive DataNodes. |
Dependent item | hadoop.namenode.num_live_data_nodes Preprocessing
|
NameNode: Dead DataNodes | Count of dead DataNodes. |
Dependent item | hadoop.namenode.num_dead_data_nodes Preprocessing
|
NameNode: Stale DataNodes | DataNodes that do not send a heartbeat within 30 seconds are marked as "stale". |
Dependent item | hadoop.namenode.num_stale_data_nodes Preprocessing
|
NameNode: Total files | Total count of files tracked by the NameNode. |
Dependent item | hadoop.namenode.files_total Preprocessing
|
NameNode: Total load | The current number of concurrent file accesses (read/write) across all DataNodes. |
Dependent item | hadoop.namenode.total_load Preprocessing
|
NameNode: Blocks allocable | Maximum number of blocks allocable. |
Dependent item | hadoop.namenode.block_capacity Preprocessing
|
NameNode: Total blocks | Count of blocks tracked by NameNode. |
Dependent item | hadoop.namenode.blocks_total Preprocessing
|
NameNode: Under-replicated blocks | The number of blocks with insufficient replication. |
Dependent item | hadoop.namenode.underreplicatedblocks Preprocessing
|
Hadoop: Get NodeManagers states | HTTP agent | hadoop.nodemanagers.get Preprocessing
|
|
Hadoop: Get DataNodes states | HTTP agent | hadoop.datanodes.get Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
ResourceManager: Service is unavailable | last(/Hadoop by HTTP/net.tcp.service["tcp","{$HADOOP.RESOURCEMANAGER.HOST}","{$HADOOP.RESOURCEMANAGER.PORT}"])=0 |Average |
Manual close: Yes | ||
ResourceManager: Service response time is too high | min(/Hadoop by HTTP/net.tcp.service.perf["tcp","{$HADOOP.RESOURCEMANAGER.HOST}","{$HADOOP.RESOURCEMANAGER.PORT}"],5m)>{$HADOOP.RESOURCEMANAGER.RESPONSE_TIME.MAX.WARN} |Warning |
Manual close: Yes Depends on:
|
||
ResourceManager: Service has been restarted | Uptime is less than 10 minutes. |
last(/Hadoop by HTTP/hadoop.resourcemanager.uptime)<10m |Info |
Manual close: Yes | |
ResourceManager: Failed to fetch ResourceManager API page | Zabbix has not received any data for items for the last 30 minutes. |
nodata(/Hadoop by HTTP/hadoop.resourcemanager.uptime,30m)=1 |Warning |
Manual close: Yes Depends on:
|
|
ResourceManager: Cluster has no active NodeManagers | Cluster is unable to execute any jobs without at least one NodeManager. |
max(/Hadoop by HTTP/hadoop.resourcemanager.num_active_nm,5m)=0 |High |
||
ResourceManager: Cluster has unhealthy NodeManagers | YARN considers any node with disk utilization exceeding the value specified under the property yarn.nodemanager.disk-health-checker.max-disk-utilization-per-disk-percentage (in yarn-site.xml) to be unhealthy. Ample disk space is critical to ensure uninterrupted operation of a Hadoop cluster, and large numbers of unhealthy nodes (the number to alert on depends on the size of your cluster) should be quickly investigated and resolved. |
min(/Hadoop by HTTP/hadoop.resourcemanager.num_unhealthy_nm,15m)>0 |Average |
||
NameNode: Service is unavailable | last(/Hadoop by HTTP/net.tcp.service["tcp","{$HADOOP.NAMENODE.HOST}","{$HADOOP.NAMENODE.PORT}"])=0 |Average |
Manual close: Yes | ||
NameNode: Service response time is too high | min(/Hadoop by HTTP/net.tcp.service.perf["tcp","{$HADOOP.NAMENODE.HOST}","{$HADOOP.NAMENODE.PORT}"],5m)>{$HADOOP.NAMENODE.RESPONSE_TIME.MAX.WARN} |Warning |
Manual close: Yes Depends on:
|
||
NameNode: Service has been restarted | Uptime is less than 10 minutes. |
last(/Hadoop by HTTP/hadoop.namenode.uptime)<10m |Info |
Manual close: Yes | |
NameNode: Failed to fetch NameNode API page | Zabbix has not received any data for items for the last 30 minutes. |
nodata(/Hadoop by HTTP/hadoop.namenode.uptime,30m)=1 |Warning |
Manual close: Yes Depends on:
|
|
NameNode: Cluster capacity remaining is low | A good practice is to ensure that disk use never exceeds 80 percent capacity. |
max(/Hadoop by HTTP/hadoop.namenode.percent_remaining,15m)<{$HADOOP.CAPACITY_REMAINING.MIN.WARN} |Warning |
||
NameNode: Cluster has missing blocks | A missing block is far worse than a corrupt block, because a missing block cannot be recovered by copying a replica. |
min(/Hadoop by HTTP/hadoop.namenode.missing_blocks,15m)>0 |Average |
||
NameNode: Cluster has volume failures | HDFS now allows for disks to fail in place, without affecting DataNode operations, until a threshold value is reached. This is set on each DataNode via the dfs.datanode.failed.volumes.tolerated property; it defaults to 0, meaning that any volume failure will shut down the DataNode; on a production cluster where DataNodes typically have 6, 8, or 12 disks, setting this parameter to 1 or 2 is typically the best practice. |
min(/Hadoop by HTTP/hadoop.namenode.volume_failures_total,15m)>0 |Average |
||
NameNode: Cluster has DataNodes in Dead state | The death of a DataNode causes a flurry of network activity, as the NameNode initiates replication of blocks lost on the dead nodes. |
min(/Hadoop by HTTP/hadoop.namenode.num_dead_data_nodes,5m)>0 |Average |
Name | Description | Type | Key and additional info |
---|---|---|---|
Node manager discovery | HTTP agent | hadoop.nodemanager.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Hadoop NodeManager {#HOSTNAME}: Get stats | HTTP agent | hadoop.nodemanager.get[{#HOSTNAME}] | |
{#HOSTNAME}: RPC queue & processing time | Average time spent on processing RPC requests. |
Dependent item | hadoop.nodemanager.rpcprocessingtime_avg[{#HOSTNAME}] Preprocessing
|
{#HOSTNAME}: Container launch avg duration | Dependent item | hadoop.nodemanager.containerlaunchduration_avg[{#HOSTNAME}] Preprocessing
|
|
{#HOSTNAME}: JVM Threads | The number of JVM threads. |
Dependent item | hadoop.nodemanager.jvm.threads[{#HOSTNAME}] Preprocessing
|
{#HOSTNAME}: JVM Garbage collection time | The JVM garbage collection time in milliseconds. |
Dependent item | hadoop.nodemanager.jvm.gc_time[{#HOSTNAME}] Preprocessing
|
{#HOSTNAME}: JVM Heap usage | The JVM heap usage in MBytes. |
Dependent item | hadoop.nodemanager.jvm.memheapused[{#HOSTNAME}] Preprocessing
|
{#HOSTNAME}: Uptime | Dependent item | hadoop.nodemanager.uptime[{#HOSTNAME}] Preprocessing
|
|
Hadoop NodeManager {#HOSTNAME}: Get raw info | Dependent item | hadoop.nodemanager.raw_info[{#HOSTNAME}] Preprocessing
|
|
{#HOSTNAME}: State | State of the node - valid values are: NEW, RUNNING, UNHEALTHY, DECOMMISSIONING, DECOMMISSIONED, LOST, REBOOTED, SHUTDOWN. |
Dependent item | hadoop.nodemanager.state[{#HOSTNAME}] Preprocessing
|
{#HOSTNAME}: Version | Dependent item | hadoop.nodemanager.version[{#HOSTNAME}] Preprocessing
|
|
{#HOSTNAME}: Number of containers | Dependent item | hadoop.nodemanager.numcontainers[{#HOSTNAME}] Preprocessing
|
|
{#HOSTNAME}: Used memory | Dependent item | hadoop.nodemanager.usedmemory[{#HOSTNAME}] Preprocessing
|
|
{#HOSTNAME}: Available memory | Dependent item | hadoop.nodemanager.availablememory[{#HOSTNAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
{#HOSTNAME}: Service has been restarted | Uptime is less than 10 minutes. |
last(/Hadoop by HTTP/hadoop.nodemanager.uptime[{#HOSTNAME}])<10m |Info |
Manual close: Yes | |
{#HOSTNAME}: Failed to fetch NodeManager API page | Zabbix has not received any data for items for the last 30 minutes. |
nodata(/Hadoop by HTTP/hadoop.nodemanager.uptime[{#HOSTNAME}],30m)=1 |Warning |
Manual close: Yes Depends on:
|
|
{#HOSTNAME}: NodeManager has state {ITEM.VALUE}. | The state is different from normal. |
last(/Hadoop by HTTP/hadoop.nodemanager.state[{#HOSTNAME}])<>"RUNNING" |Average |
Name | Description | Type | Key and additional info |
---|---|---|---|
Data node discovery | HTTP agent | hadoop.datanode.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Hadoop DataNode {#HOSTNAME}: Get stats | HTTP agent | hadoop.datanode.get[{#HOSTNAME}] | |
{#HOSTNAME}: Remaining | Remaining disk space. |
Dependent item | hadoop.datanode.remaining[{#HOSTNAME}] Preprocessing
|
{#HOSTNAME}: Used | Used disk space. |
Dependent item | hadoop.datanode.dfs_used[{#HOSTNAME}] Preprocessing
|
{#HOSTNAME}: Number of failed volumes | Number of failed storage volumes. |
Dependent item | hadoop.datanode.numfailedvolumes[{#HOSTNAME}] Preprocessing
|
{#HOSTNAME}: JVM Threads | The number of JVM threads. |
Dependent item | hadoop.datanode.jvm.threads[{#HOSTNAME}] Preprocessing
|
{#HOSTNAME}: JVM Garbage collection time | The JVM garbage collection time in milliseconds. |
Dependent item | hadoop.datanode.jvm.gc_time[{#HOSTNAME}] Preprocessing
|
{#HOSTNAME}: JVM Heap usage | The JVM heap usage in MBytes. |
Dependent item | hadoop.datanode.jvm.memheapused[{#HOSTNAME}] Preprocessing
|
{#HOSTNAME}: Uptime | Dependent item | hadoop.datanode.uptime[{#HOSTNAME}] Preprocessing
|
|
Hadoop DataNode {#HOSTNAME}: Get raw info | Dependent item | hadoop.datanode.raw_info[{#HOSTNAME}] Preprocessing
|
|
{#HOSTNAME}: Version | DataNode software version. |
Dependent item | hadoop.datanode.version[{#HOSTNAME}] Preprocessing
|
{#HOSTNAME}: Admin state | Administrative state. |
Dependent item | hadoop.datanode.admin_state[{#HOSTNAME}] Preprocessing
|
{#HOSTNAME}: Oper state | Operational state. |
Dependent item | hadoop.datanode.oper_state[{#HOSTNAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
{#HOSTNAME}: Service has been restarted | Uptime is less than 10 minutes. |
last(/Hadoop by HTTP/hadoop.datanode.uptime[{#HOSTNAME}])<10m |Info |
Manual close: Yes | |
{#HOSTNAME}: Failed to fetch DataNode API page | Zabbix has not received any data for items for the last 30 minutes. |
nodata(/Hadoop by HTTP/hadoop.datanode.uptime[{#HOSTNAME}],30m)=1 |Warning |
Manual close: Yes Depends on:
|
|
{#HOSTNAME}: DataNode has state {ITEM.VALUE}. | The state is different from normal. |
last(/Hadoop by HTTP/hadoop.datanode.oper_state[{#HOSTNAME}])<>"Live" |Average |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed to monitor GitLab by Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The template GitLab by HTTP collects metrics via an HTTP agent from the GitLab /-/metrics endpoint.
See https://docs.gitlab.com/ee/administration/monitoring/prometheus/gitlab_metrics.html.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
This template works with self-hosted GitLab instances. Internal service metrics are collected from the GitLab /-/metrics
endpoint.
To access the metrics, the following two methods are available:
1. Allow the IP address of your Zabbix server (or proxy) to access GitLab's monitoring endpoints via the monitoring IP allowlist.
2. Get a token from the Admin -> Monitoring -> Health check page (http://your.gitlab.address/admin/health_check) and use it in the {$GITLAB.HEALTH.TOKEN} macro as a variable path, like: ?token=your_token.
Remember to change the {$GITLAB.URL} macro.
Also, see the Macros section for a list of macros used to set trigger values.
NOTE: Some metrics may not be collected depending on your GitLab instance version and configuration. See GitLab's documentation for further information about its metric collection.
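For reference, below is a minimal Python sketch of the kind of requests the template's HTTP agent items issue, assuming {$GITLAB.URL} is http://localhost and the health-check token (if used) is passed as a query string; the probe paths are GitLab's documented monitoring endpoints.

```python
# A sketch of the requests behind the HTTP agent items: the health probes and
# the Prometheus-format metrics endpoint. The base URL and token are
# placeholders corresponding to {$GITLAB.URL} and {$GITLAB.HEALTH.TOKEN}.
import urllib.request

GITLAB_URL = "http://localhost"
HEALTH_TOKEN = "?token=your_token"  # leave empty if the Zabbix IP is allowlisted instead

def fetch(path):
    with urllib.request.urlopen(f"{GITLAB_URL}{path}{HEALTH_TOKEN}") as resp:
        return resp.read().decode("utf-8")

print(fetch("/-/readiness"))  # JSON readiness probe
print(fetch("/-/liveness"))   # JSON liveness probe

# /-/metrics returns Prometheus text format: "<name>{<labels>} <value>" per line.
# The template extracts individual values from these lines with Zabbix preprocessing.
for line in fetch("/-/metrics").splitlines():
    if line.startswith("puma_workers"):
        print(line)
```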
Name | Description | Default |
---|---|---|
{$GITLAB.URL} | URL of a GitLab instance. |
http://localhost |
{$GITLAB.HEALTH.TOKEN} | The token path for the GitLab health check. Example: ?token=your_token |
|
{$GITLAB.UNICORN.UTILIZATION.MAX.WARN} | The maximum percentage of Unicorn workers utilization for a trigger expression. |
90 |
{$GITLAB.PUMA.UTILIZATION.MAX.WARN} | The maximum percentage of Puma thread utilization for a trigger expression. |
90 |
{$GITLAB.HTTP.FAIL.MAX.WARN} | The maximum number of HTTP request failures for a trigger expression. |
2 |
{$GITLAB.REDIS.FAIL.MAX.WARN} | The maximum number of Redis client exceptions for a trigger expression. |
2 |
{$GITLAB.UNICORN.QUEUE.MAX.WARN} | The maximum number of Unicorn queued requests for a trigger expression. |
1 |
{$GITLAB.PUMA.QUEUE.MAX.WARN} | The maximum number of Puma queued requests for a trigger expression. |
1 |
{$GITLAB.OPEN.FDS.MAX.WARN} | The maximum percentage of used file descriptors for a trigger expression. |
90 |
Name | Description | Type | Key and additional info |
---|---|---|---|
GitLab: Get instance metrics | HTTP agent | gitlab.get_metrics Preprocessing
|
|
GitLab: Instance readiness check | The readiness probe checks whether the GitLab instance is ready to accept traffic via Rails Controllers. |
HTTP agent | gitlab.readiness Preprocessing
|
GitLab: Application server status | Checks whether the application server is running. This probe is used to check that Rails Controllers are not deadlocked due to multi-threading issues. |
HTTP agent | gitlab.liveness Preprocessing
|
GitLab: Version | Version of the GitLab instance. |
Dependent item | gitlab.deployments.version Preprocessing
|
GitLab: Ruby: First process start time | Minimum UNIX timestamp of ruby processes start time. |
Dependent item | gitlab.ruby.processstarttime_seconds.first Preprocessing
|
GitLab: Ruby: Last process start time | Maximum UNIX timestamp of ruby processes start time. |
Dependent item | gitlab.ruby.processstarttime_seconds.last Preprocessing
|
GitLab: User logins, total | Counter of how many users have logged in since GitLab was started or restarted. |
Dependent item | gitlab.usersessionlogins_total Preprocessing
|
GitLab: User CAPTCHA logins failed, total | Counter of failed CAPTCHA attempts during login. |
Dependent item | gitlab.failedlogincaptcha_total Preprocessing
|
GitLab: User CAPTCHA logins, total | Counter of successful CAPTCHA attempts during login. |
Dependent item | gitlab.successfullogincaptcha_total Preprocessing
|
GitLab: Upload file does not exist | Number of times an upload record could not find its file. |
Dependent item | gitlab.uploadfiledoesnotexist Preprocessing
|
GitLab: Pipelines: Processing events, total | Total amount of pipeline processing events. |
Dependent item | gitlab.pipeline.processingeventstotal Preprocessing
|
GitLab: Pipelines: Created, total | Counter of pipelines created. |
Dependent item | gitlab.pipeline.created_total Preprocessing
|
GitLab: Pipelines: Auto DevOps pipelines, total | Counter of completed Auto DevOps pipelines. |
Dependent item | gitlab.pipeline.autodevopscompleted.total Preprocessing
|
GitLab: Pipelines: Auto DevOps pipelines, failed | Counter of completed Auto DevOps pipelines with status "failed". |
Dependent item | gitlab.pipeline.autodevopscompleted_total.failed Preprocessing
|
GitLab: Pipelines: CI/CD creation duration | The sum of the time in seconds it takes to create a CI/CD pipeline. |
Dependent item | gitlab.pipeline.pipeline_creation Preprocessing
|
GitLab: Pipelines: CI/CD creation count | The count of the time it takes to create a CI/CD pipeline. |
Dependent item | gitlab.pipeline.pipeline_creation.count Preprocessing
|
GitLab: Database: Connection pool, busy | Connections to the main database in use where the owner is still alive. |
Dependent item | gitlab.database.connectionpoolbusy Preprocessing
|
GitLab: Database: Connection pool, current | Current connections to the main database in the pool. |
Dependent item | gitlab.database.connectionpoolconnections Preprocessing
|
GitLab: Database: Connection pool, dead | Connections to the main database in use where the owner is not alive. |
Dependent item | gitlab.database.connectionpooldead Preprocessing
|
GitLab: Database: Connection pool, idle | Connections to the main database not in use. |
Dependent item | gitlab.database.connectionpoolidle Preprocessing
|
GitLab: Database: Connection pool, size | Total connection pool capacity of the main database. |
Dependent item | gitlab.database.connectionpoolsize Preprocessing
|
GitLab: Database: Connection pool, waiting | Threads currently waiting on this queue. |
Dependent item | gitlab.database.connectionpoolwaiting Preprocessing
|
GitLab: Redis: Client requests rate, queues | Number of Redis client requests per second. (Instance: queues) |
Dependent item | gitlab.redis.client_requests.queues.rate Preprocessing
|
GitLab: Redis: Client requests rate, cache | Number of Redis client requests per second. (Instance: cache) |
Dependent item | gitlab.redis.client_requests.cache.rate Preprocessing
|
GitLab: Redis: Client requests rate, shared_state | Number of Redis client requests per second. (Instance: shared_state) |
Dependent item | gitlab.redis.client_requests.shared_state.rate Preprocessing
|
GitLab: Redis: Client exceptions rate, queues | Number of Redis client exceptions per second. (Instance: queues) |
Dependent item | gitlab.redis.client_exceptions.queues.rate Preprocessing
|
GitLab: Redis: Client exceptions rate, cache | Number of Redis client exceptions per second. (Instance: cache) |
Dependent item | gitlab.redis.client_exceptions.cache.rate Preprocessing
|
GitLab: Redis: client exceptions rate, shared_state | Number of Redis client exceptions per second. (Instance: shared_state) |
Dependent item | gitlab.redis.client_exceptions.shared_state.rate Preprocessing
|
GitLab: Cache: Misses rate, total | The cache read miss count. |
Dependent item | gitlab.cache.misses_total.rate Preprocessing
|
GitLab: Cache: Operations rate, total | The count of cache operations. |
Dependent item | gitlab.cache.operations_total.rate Preprocessing
|
GitLab: Ruby: CPU usage per second | Average CPU time utilization in seconds. |
Dependent item | gitlab.ruby.processcpuseconds.rate Preprocessing
|
GitLab: Ruby: Running_threads | Number of running Ruby threads. |
Dependent item | gitlab.ruby.threads_running Preprocessing
|
GitLab: Ruby: File descriptors opened, avg | Average number of opened file descriptors. |
Dependent item | gitlab.ruby.file_descriptors.avg Preprocessing
|
GitLab: Ruby: File descriptors opened, max | Maximum number of opened file descriptors. |
Dependent item | gitlab.ruby.file_descriptors.max Preprocessing
|
GitLab: Ruby: File descriptors opened, min | Minimum number of opened file descriptors. |
Dependent item | gitlab.ruby.file_descriptors.min Preprocessing
|
GitLab: Ruby: File descriptors, max | Maximum number of open file descriptors per process. |
Dependent item | gitlab.ruby.process_max_fds Preprocessing
|
GitLab: Ruby: RSS memory, avg | Average RSS Memory usage in bytes. |
Dependent item | gitlab.ruby.processresidentmemory_bytes.avg Preprocessing
|
GitLab: Ruby: RSS memory, min | Minimum RSS Memory usage in bytes. |
Dependent item | gitlab.ruby.processresidentmemory_bytes.min Preprocessing
|
GitLab: Ruby: RSS memory, max | Maximum RSS Memory usage in bytes. |
Dependent item | gitlab.ruby.processresidentmemory_bytes.max Preprocessing
|
GitLab: HTTP requests rate, total | Number of requests received into the system. |
Dependent item | gitlab.http.requests.rate Preprocessing
|
GitLab: HTTP requests rate, 5xx | Number of handle failures of requests with HTTP-code 5xx. |
Dependent item | gitlab.http.requests.5xx.rate Preprocessing
|
GitLab: HTTP requests rate, 4xx | Number of handle failures of requests with code 4XX. |
Dependent item | gitlab.http.requests.4xx.rate Preprocessing
|
GitLab: Transactions per second | Transactions per second (gitlab_transaction_* metrics). |
Dependent item | gitlab.transactions.rate Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
GitLab: Gitlab instance is not able to accept traffic | last(/GitLab by HTTP/gitlab.readiness)=0 |High |
Depends on:
|
||
GitLab: Liveness check was failed | The application server is not running or Rails Controllers are deadlocked. |
last(/GitLab by HTTP/gitlab.liveness)=0 |High |
||
GitLab: Version has changed | The GitLab version has changed. Acknowledge to close the problem manually. |
last(/GitLab by HTTP/gitlab.deployments.version,#1)<>last(/GitLab by HTTP/gitlab.deployments.version,#2) and length(last(/GitLab by HTTP/gitlab.deployments.version))>0 |Info |
Manual close: Yes | |
GitLab: Too many Redis queues client exceptions | "Too many Redis client exceptions during the requests to Redis instance queues." |
min(/GitLab by HTTP/gitlab.redis.client_exceptions.queues.rate,5m)>{$GITLAB.REDIS.FAIL.MAX.WARN} |Warning |
||
GitLab: Too many Redis cache client exceptions | "Too many Redis client exceptions during the requests to Redis instance cache." |
min(/GitLab by HTTP/gitlab.redis.client_exceptions.cache.rate,5m)>{$GITLAB.REDIS.FAIL.MAX.WARN} |Warning |
||
GitLab: Too many Redis shared_state client exceptions | "Too many Redis client exceptions during the requests to Redis instance shared_state." |
min(/GitLab by HTTP/gitlab.redis.client_exceptions.shared_state.rate,5m)>{$GITLAB.REDIS.FAIL.MAX.WARN} |Warning |
||
GitLab: Failed to fetch info data | Zabbix has not received any metrics data for the last 30 minutes. |
nodata(/GitLab by HTTP/gitlab.ruby.threads_running,30m)=1 |Warning |
Manual close: Yes Depends on:
|
|
GitLab: Current number of open files is too high | min(/GitLab by HTTP/gitlab.ruby.file_descriptors.max,5m)/last(/GitLab by HTTP/gitlab.ruby.process_max_fds)*100>{$GITLAB.OPEN.FDS.MAX.WARN} |Warning |
|||
GitLab: Too many HTTP requests failures | "Too many requests failed on GitLab instance with 5xx HTTP code" |
min(/GitLab by HTTP/gitlab.http.requests.5xx.rate,5m)>{$GITLAB.HTTP.FAIL.MAX.WARN} |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Unicorn metrics discovery | Discovery of Unicorn-specific metrics when Unicorn is used. |
HTTP agent | gitlab.unicorn.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
GitLab: Unicorn: Workers | The number of Unicorn workers |
Dependent item | gitlab.unicorn.unicorn_workers[{#SINGLETON}] Preprocessing
|
GitLab: Unicorn: Active connections | The number of active Unicorn connections. |
Dependent item | gitlab.unicorn.active_connections[{#SINGLETON}] Preprocessing
|
GitLab: Unicorn: Queued connections | The number of queued Unicorn connections. |
Dependent item | gitlab.unicorn.queued_connections[{#SINGLETON}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
GitLab: Unicorn worker utilization is too high | min(/GitLab by HTTP/gitlab.unicorn.active_connections[{#SINGLETON}],5m)/last(/GitLab by HTTP/gitlab.unicorn.unicorn_workers[{#SINGLETON}])*100>{$GITLAB.UNICORN.UTILIZATION.MAX.WARN} |Warning |
|||
GitLab: Unicorn is queueing requests | min(/GitLab by HTTP/gitlab.unicorn.queued_connections[{#SINGLETON}],5m)>{$GITLAB.UNICORN.QUEUE.MAX.WARN} |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Puma metrics discovery | Discovery of Puma specific metrics when Puma is used. |
HTTP agent | gitlab.puma.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
GitLab: Active connections | Number of puma threads processing a request. |
Dependent item | gitlab.puma.active_connections[{#SINGLETON}] Preprocessing
|
GitLab: Workers | Total number of puma workers. |
Dependent item | gitlab.puma.workers[{#SINGLETON}] Preprocessing
|
GitLab: Running workers | The number of booted puma workers. |
Dependent item | gitlab.puma.running_workers[{#SINGLETON}] Preprocessing
|
GitLab: Stale workers | The number of old puma workers. |
Dependent item | gitlab.puma.stale_workers[{#SINGLETON}] Preprocessing
|
GitLab: Running threads | The number of running puma threads. |
Dependent item | gitlab.puma.running[{#SINGLETON}] Preprocessing
|
GitLab: Queued connections | The number of connections in that puma worker's "todo" set waiting for a worker thread. |
Dependent item | gitlab.puma.queued_connections[{#SINGLETON}] Preprocessing
|
GitLab: Pool capacity | The number of requests the puma worker is capable of taking right now. |
Dependent item | gitlab.puma.pool_capacity[{#SINGLETON}] Preprocessing
|
GitLab: Max threads | The maximum number of puma worker threads. |
Dependent item | gitlab.puma.max_threads[{#SINGLETON}] Preprocessing
|
GitLab: Idle threads | The number of spawned puma threads which are not processing a request. |
Dependent item | gitlab.puma.idle_threads[{#SINGLETON}] Preprocessing
|
GitLab: Killer terminations, total | The number of workers terminated by PumaWorkerKiller. |
Dependent item | gitlab.puma.killerterminationstotal[{#SINGLETON}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
GitLab: Puma instance thread utilization is too high | min(/GitLab by HTTP/gitlab.puma.active_connections[{#SINGLETON}],5m)/last(/GitLab by HTTP/gitlab.puma.max_threads[{#SINGLETON}])*100>{$GITLAB.PUMA.UTILIZATION.MAX.WARN} |Warning |
|||
GitLab: Puma is queueing requests | min(/GitLab by HTTP/gitlab.puma.queued_connections[{#SINGLETON}],15m)>{$GITLAB.PUMA.QUEUE.MAX.WARN} |Warning |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
Official JMX template from the Zabbix distribution. It can be useful for many Java applications (JMX).
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Refer to the vendor documentation.
Name | Description | Default |
---|---|---|
{$JMX.NONHEAP.MEM.USAGE.MAX} | A threshold in percent for Non-heap memory utilization trigger. |
85 |
{$JMX.NONHEAP.MEM.USAGE.TIME} | The time during which the Non-heap memory utilization may exceed the threshold. |
10m |
{$JMX.HEAP.MEM.USAGE.MAX} | A threshold in percent for Heap memory utilization trigger. |
85 |
{$JMX.HEAP.MEM.USAGE.TIME} | The time during which the Heap memory utilization may exceed the threshold. |
10m |
{$JMX.MP.USAGE.MAX} | A threshold in percent for memory pools utilization trigger. Use a context to change the threshold for a specific pool. |
85 |
{$JMX.MP.USAGE.TIME} | The time during which the memory pools utilization may exceed the threshold. |
10m |
{$JMX.FILE.DESCRIPTORS.MAX} | A threshold in percent for file descriptors count trigger. |
85 |
{$JMX.FILE.DESCRIPTORS.TIME} | The time during which the file descriptors count may exceed the threshold. |
3m |
{$JMX.CPU.LOAD.MAX} | A threshold in percent for CPU utilization trigger. |
85 |
{$JMX.CPU.LOAD.TIME} | The time during which the CPU utilization may exceed the threshold. |
5m |
{$JMX.MEM.POOL.NAME.MATCHES} | This macro is used as a filter in memory pool discovery. |
Old Gen|G1|Perm Gen|Code Cache|Tenured Gen |
{$JMX.USER} | JMX username. |
|
{$JMX.PASSWORD} | JMX password. |
Name | Description | Type | Key and additional info |
---|---|---|---|
ClassLoading: Loaded class count | Displays number of classes that are currently loaded in the Java virtual machine. |
JMX agent | jmx["java.lang:type=ClassLoading","LoadedClassCount"] Preprocessing
|
ClassLoading: Total loaded class count | Displays the total number of classes that have been loaded since the Java virtual machine has started execution. |
JMX agent | jmx["java.lang:type=ClassLoading","TotalLoadedClassCount"] Preprocessing
|
ClassLoading: Unloaded class count | Displays the total number of classes unloaded since the Java virtual machine has started execution. |
JMX agent | jmx["java.lang:type=ClassLoading","UnloadedClassCount"] Preprocessing
|
Compilation: Name of the current JIT compiler | Displays the name of the Just-In-Time (JIT) compiler. |
JMX agent | jmx["java.lang:type=Compilation","Name"] Preprocessing
|
Compilation: Accumulated time spent | Displays the approximate accumulated elapsed time spent in compilation, in seconds. |
JMX agent | jmx["java.lang:type=Compilation","TotalCompilationTime"] Preprocessing
|
Memory: Heap memory committed | Current heap memory allocated. This amount of memory is guaranteed for the Java virtual machine to use. |
JMX agent | jmx["java.lang:type=Memory","HeapMemoryUsage.committed"] |
Memory: Heap memory maximum size | Maximum amount of heap that can be used for memory management. This amount of memory is not guaranteed to be available if it is greater than the amount of committed memory. The Java virtual machine may fail to allocate memory even if the amount of used memory does not exceed this maximum size. |
JMX agent | jmx["java.lang:type=Memory","HeapMemoryUsage.max"] Preprocessing
|
Memory: Heap memory used | Current heap memory usage. |
JMX agent | jmx["java.lang:type=Memory","HeapMemoryUsage.used"] Preprocessing
|
Memory: Non-Heap memory committed | Current memory allocated outside the heap. This amount of memory is guaranteed for the Java virtual machine to use. |
JMX agent | jmx["java.lang:type=Memory","NonHeapMemoryUsage.committed"] Preprocessing
|
Memory: Non-Heap memory maximum size | Maximum amount of non-heap memory that can be used for memory management. This amount of memory is not guaranteed to be available if it is greater than the amount of committed memory. The Java virtual machine may fail to allocate memory even if the amount of used memory does not exceed this maximum size. |
JMX agent | jmx["java.lang:type=Memory","NonHeapMemoryUsage.max"] Preprocessing
|
Memory: Non-Heap memory used | Current memory usage outside the heap |
JMX agent | jmx["java.lang:type=Memory","NonHeapMemoryUsage.used"] Preprocessing
|
Memory: Object pending finalization count | The approximate number of objects for which finalization is pending. |
JMX agent | jmx["java.lang:type=Memory","ObjectPendingFinalizationCount"] Preprocessing
|
OperatingSystem: File descriptors maximum count | This is the number of file descriptors we can have opened in the same process, as determined by the operating system. You can never have more file descriptors than this number. |
JMX agent | jmx["java.lang:type=OperatingSystem","MaxFileDescriptorCount"] Preprocessing
|
OperatingSystem: File descriptors opened | This is the number of opened file descriptors at the moment; if this reaches the MaxFileDescriptorCount, the application will throw an IOException: Too many open files. This could mean you are opening file descriptors and never closing them. |
JMX agent | jmx["java.lang:type=OperatingSystem","OpenFileDescriptorCount"] |
OperatingSystem: Process CPU Load | ProcessCpuLoad represents the CPU load in this process. |
JMX agent | jmx["java.lang:type=OperatingSystem","ProcessCpuLoad"] Preprocessing
|
Runtime: JVM uptime | JMX agent | jmx["java.lang:type=Runtime","Uptime"] Preprocessing
|
|
Runtime: JVM name | JMX agent | jmx["java.lang:type=Runtime","VmName"] Preprocessing
|
|
Runtime: JVM version | JMX agent | jmx["java.lang:type=Runtime","VmVersion"] Preprocessing
|
|
Threading: Daemon thread count | Number of daemon threads running. |
JMX agent | jmx["java.lang:type=Threading","DaemonThreadCount"] Preprocessing
|
Threading: Peak thread count | Maximum number of threads being executed at the same time since the JVM was started or the peak was reset. |
JMX agent | jmx["java.lang:type=Threading","PeakThreadCount"] |
Threading: Thread count | The number of threads running at the current moment. |
JMX agent | jmx["java.lang:type=Threading","ThreadCount"] |
Threading: Total started thread count | The number of threads started since the JVM was launched. |
JMX agent | jmx["java.lang:type=Threading","TotalStartedThreadCount"] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Compilation: {HOST.NAME} uses suboptimal JIT compiler | find(/Generic Java JMX/jmx["java.lang:type=Compilation","Name"],,"like","Client")=1 |Info |
Manual close: Yes | ||
Memory: Heap memory usage is high | min(/Generic Java JMX/jmx["java.lang:type=Memory","HeapMemoryUsage.used"],{$JMX.HEAP.MEM.USAGE.TIME})>(last(/Generic Java JMX/jmx["java.lang:type=Memory","HeapMemoryUsage.max"])*{$JMX.HEAP.MEM.USAGE.MAX}/100) and last(/Generic Java JMX/jmx["java.lang:type=Memory","HeapMemoryUsage.max"])>0 |Warning |
|||
Memory: Non-Heap memory usage is high | min(/Generic Java JMX/jmx["java.lang:type=Memory","NonHeapMemoryUsage.used"],{$JMX.NONHEAP.MEM.USAGE.TIME})>(last(/Generic Java JMX/jmx["java.lang:type=Memory","NonHeapMemoryUsage.max"])*{$JMX.NONHEAP.MEM.USAGE.MAX}/100) and last(/Generic Java JMX/jmx["java.lang:type=Memory","NonHeapMemoryUsage.max"])>0 |Warning |
|||
OperatingSystem: Opened file descriptor count is high | min(/Generic Java JMX/jmx["java.lang:type=OperatingSystem","OpenFileDescriptorCount"],{$JMX.FILE.DESCRIPTORS.TIME})>(last(/Generic Java JMX/jmx["java.lang:type=OperatingSystem","MaxFileDescriptorCount"])*{$JMX.FILE.DESCRIPTORS.MAX}/100) |Warning |
|||
OperatingSystem: Process CPU Load is high | min(/Generic Java JMX/jmx["java.lang:type=OperatingSystem","ProcessCpuLoad"],{$JMX.CPU.LOAD.TIME})>{$JMX.CPU.LOAD.MAX} |Average |
|||
Runtime: JVM is not reachable | nodata(/Generic Java JMX/jmx["java.lang:type=Runtime","Uptime"],5m)=1 |Average |
Manual close: Yes | ||
Runtime: {HOST.NAME} runs suboptimal VM type | find(/Generic Java JMX/jmx["java.lang:type=Runtime","VmName"],,"like","Server")<>1 |Info |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Garbage collector discovery | Garbage collectors metrics discovery. |
JMX agent | jmx.discovery["beans","java.lang:name=*,type=GarbageCollector"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
GarbageCollector: {#JMXNAME} number of collections per second | Displays the total number of collections that have occurred per second. |
JMX agent | jmx["java.lang:name={#JMXNAME},type=GarbageCollector","CollectionCount"] Preprocessing
|
GarbageCollector: {#JMXNAME} accumulated time spent in collection | Displays the approximate accumulated collection elapsed time, in seconds. |
JMX agent | jmx["java.lang:name={#JMXNAME},type=GarbageCollector","CollectionTime"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Memory pool discovery | Memory pools metrics discovery. |
JMX agent | jmx.discovery["beans","java.lang:name=*,type=MemoryPool"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Memory pool: {#JMXNAME} committed | Current memory allocated. |
JMX agent | jmx["java.lang:name={#JMXNAME},type=MemoryPool","Usage.committed"] Preprocessing
|
Memory pool: {#JMXNAME} maximum size | Maximum amount of memory that can be used for memory management. This amount of memory is not guaranteed to be available if it is greater than the amount of committed memory. The Java virtual machine may fail to allocate memory even if the amount of used memory does not exceed this maximum size. |
JMX agent | jmx["java.lang:name={#JMXNAME},type=MemoryPool","Usage.max"] Preprocessing
|
Memory pool: {#JMXNAME} used | Current memory usage. |
JMX agent | jmx["java.lang:name={#JMXNAME},type=MemoryPool","Usage.used"] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Memory pool: {#JMXNAME} memory usage is high | min(/Generic Java JMX/jmx["java.lang:name={#JMXNAME},type=MemoryPool","Usage.used"],{$JMX.MP.USAGE.TIME:"{#JMXNAME}"})>(last(/Generic Java JMX/jmx["java.lang:name={#JMXNAME},type=MemoryPool","Usage.max"])*{$JMX.MP.USAGE.MAX:"{#JMXNAME}"}/100) and last(/Generic Java JMX/jmx["java.lang:name={#JMXNAME},type=MemoryPool","Usage.max"])>0 |Warning |
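The per-pool threshold in the expression above comes from the context macro {$JMX.MP.USAGE.MAX:"{#JMXNAME}"}, so it can be overridden for a single pool without touching the others. A minimal sketch, assuming a discovered pool named "Old Gen" and an illustrative threshold of 90 (neither is a template default):
{$JMX.MP.USAGE.MAX:"Old Gen"} = 90
Define such a macro on the host (or on a linked template) and it will take precedence over the plain {$JMX.MP.USAGE.MAX} value for that pool only.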
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
Official Template for Microsoft Exchange Server 2016.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Metrics are collected by Zabbix agent active.
1. Import the template into Zabbix.
2. Link the imported template to a host with MS Exchange.
Note that the template doesn't provide information about the state of Windows services. It is recommended to use it together with the "OS Windows by Zabbix agent active" template.
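Before linking the template, you can verify on the Exchange host itself that the performance counters used below exist. A hedged sketch using the built-in Windows typeperf utility (the counter path is taken from the items in this template; note that typeperf shows localized counter names, while the template's perf_counter_en items always use the English names):
typeperf "\MSExchange Active Manager(_total)\Database Mounted" -sc 1
If the command returns a numeric sample, the corresponding template item should be collectable by the active agent.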
Name | Description | Default |
---|---|---|
{$MS.EXCHANGE.PERF.INTERVAL} | Update interval for perf_counter_en items. |
60 |
{$MS.EXCHANGE.DB.FAULTS.TIME} | The time during which the database page faults may exceed the threshold. |
5m |
{$MS.EXCHANGE.DB.FAULTS.WARN} | Threshold for database page faults trigger. |
0 |
{$MS.EXCHANGE.LOG.STALLS.TIME} | The time during which the log records stalled may exceed the threshold. |
10m |
{$MS.EXCHANGE.LOG.STALLS.WARN} | Threshold for log records stalled trigger. |
100 |
{$MS.EXCHANGE.DB.ACTIVE.READ.TIME} | The time during which the active database read operations latency may exceed the threshold. |
5m |
{$MS.EXCHANGE.DB.ACTIVE.READ.WARN} | Threshold for active database read operations latency trigger. |
0.02 |
{$MS.EXCHANGE.DB.ACTIVE.WRITE.TIME} | The time during which the active database write operations latency may exceed the threshold. |
10m |
{$MS.EXCHANGE.DB.ACTIVE.WRITE.WARN} | Threshold for active database write operations latency trigger. |
0.05 |
{$MS.EXCHANGE.DB.PASSIVE.READ.TIME} | The time during which the passive database read operations latency may exceed the threshold. |
5m |
{$MS.EXCHANGE.DB.PASSIVE.READ.WARN} | Threshold for passive database read operations latency trigger. |
0.2 |
{$MS.EXCHANGE.DB.PASSIVE.WRITE.TIME} | The time during which the passive database write operations latency may exceed the threshold. |
10m |
{$MS.EXCHANGE.RPC.TIME} | The time during which the RPC requests latency may exceed the threshold. |
10m |
{$MS.EXCHANGE.RPC.WARN} | Threshold for RPC requests latency trigger. |
0.05 |
{$MS.EXCHANGE.RPC.COUNT.TIME} | The time during which the RPC total requests may exceed the threshold. |
5m |
{$MS.EXCHANGE.RPC.COUNT.WARN} | Threshold for the total RPC requests trigger. |
70 |
{$MS.EXCHANGE.LDAP.TIME} | The time during which the LDAP metrics may exceed the threshold. |
5m |
{$MS.EXCHANGE.LDAP.WARN} | Threshold for LDAP triggers. |
0.05 |
{$AGENT.TIMEOUT} | Timeout after which the agent is considered unavailable. |
5m |
Name | Description | Type | Key and additional info |
---|---|---|---|
MS Exchange: Databases total mounted | Shows the number of active database copies on the server. |
Zabbix agent (active) | perfcounteren["\MSExchange Active Manager(_total)\Database Mounted"] Preprocessing
|
MS Exchange [Client Access Server]: ActiveSync: ping command pending | Shows the number of ping commands currently pending in the queue. |
Zabbix agent (active) | perfcounteren["\MSExchange ActiveSync\Ping Commands Pending", {$MS.EXCHANGE.PERF.INTERVAL}] |
MS Exchange [Client Access Server]: ActiveSync: requests per second | Shows the number of HTTP requests received from the client via ASP.NET per second. Determines the current Exchange ActiveSync request rate. Used only to determine current user load. |
Zabbix agent (active) | perfcounteren["\MSExchange ActiveSync\Requests/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
MS Exchange [Client Access Server]: ActiveSync: sync commands per second | Shows the number of sync commands processed per second. Clients use this command to synchronize items within a folder. |
Zabbix agent (active) | perfcounteren["\MSExchange ActiveSync\Sync Commands/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
MS Exchange [Client Access Server]: Autodiscover: requests per second | Shows the number of Autodiscover service requests processed each second. Determines current user load. |
Zabbix agent (active) | perfcounteren["\MSExchangeAutodiscover\Requests/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
MS Exchange [Client Access Server]: Availability Service: availability requests per second | Shows the number of requests serviced per second. The request can be only for free/busy information or include suggestions. One request may contain multiple mailboxes. Determines the rate at which Availability service requests are occurring. |
Zabbix agent (active) | perfcounteren["\MSExchange Availability Service\Availability Requests (sec)", {$MS.EXCHANGE.PERF.INTERVAL}] |
MS Exchange [Client Access Server]: Outlook Web App: current unique users | Shows the number of unique users currently logged on to Outlook Web App. This value monitors the number of unique active user sessions, so that users are only removed from this counter after they log off or their session times out. Determines current user load. |
Zabbix agent (active) | perfcounteren["\MSExchange OWA\Current Unique Users", {$MS.EXCHANGE.PERF.INTERVAL}] |
MS Exchange [Client Access Server]: Outlook Web App: requests per second | Shows the number of requests handled by Outlook Web App per second. Determines current user load. |
Zabbix agent (active) | perfcounteren["\MSExchange OWA\Requests/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
MS Exchange [Client Access Server]: MSExchangeWS: requests per second | Shows the number of requests processed each second. Determines current user load. |
Zabbix agent (active) | perfcounteren["\MSExchangeWS\Requests/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
MS Exchange: Active agent availability | Availability of active checks on the host. The value of this item corresponds to availability icons in the host list. Possible values: 0 - unknown; 1 - available; 2 - not available. |
Zabbix internal | zabbix[host,active_agent,available] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
MS Exchange: Active checks are not available | Active checks are considered unavailable: the agent has not sent a heartbeat for a prolonged time. |
min(/Microsoft Exchange Server 2016 by Zabbix agent active/zabbix[host,active_agent,available],{$AGENT.TIMEOUT})=2 |High |
Name | Description | Type | Key and additional info |
---|---|---|---|
Databases discovery | Discovery of Exchange databases. |
Zabbix agent (active) | perf_instance.discovery["MSExchange Active Manager"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Active Manager [{#INSTANCE}]: Database copy role | Database copy active or passive role. |
Zabbix agent (active) | perfcounteren["\MSExchange Active Manager({#INSTANCE})\Database Copy Role Active"] Preprocessing
|
Information Store [{#INSTANCE}]: Database state | Database state. Possible values: 0: Database without any copy and dismounted. 1: Database is a primary database and mounted. 2: Database is a passive copy and the state is healthy. |
Zabbix agent (active) | perfcounteren["\MSExchangeIS Store({#INSTANCE})\Database State"] Preprocessing
|
Information Store [{#INSTANCE}]: Active mailboxes count | Number of active mailboxes in this database. |
Zabbix agent (active) | perfcounteren["\MSExchangeIS Store({#INSTANCE})\Active mailboxes"] |
Information Store [{#INSTANCE}]: Page faults per second | Indicates the rate of page faults that can't be serviced because there are no pages available for allocation from the database cache. If this counter is above 0, it's an indication that the MSExchange Database\I/O Database Writes (Attached) Average Latency is too high. |
Zabbix agent (active) | perfcounteren["\MSExchange Database({#INF.STORE})\Database Page Fault Stalls/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
Information Store [{#INSTANCE}]: Log records stalled | Indicates the number of log records that can't be added to the log buffers per second because the log buffers are full. The average value should be below 10 per second. Spikes (maximum values) shouldn't be higher than 100 per second. |
Zabbix agent (active) | perfcounteren["\MSExchange Database({#INF.STORE})\Log Record Stalls/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
Information Store [{#INSTANCE}]: Log threads waiting | Indicates the number of threads waiting to complete an update of the database by writing their data to the log. |
Zabbix agent (active) | perfcounteren["\MSExchange Database({#INF.STORE})\Log Threads Waiting", {$MS.EXCHANGE.PERF.INTERVAL}] |
Information Store [{#INSTANCE}]: RPC requests per second | Shows the number of RPC operations per second for each database instance. |
Zabbix agent (active) | perfcounteren["\MSExchangeIS Store({#INSTANCE})\RPC Operations/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
Information Store [{#INSTANCE}]: RPC requests latency | RPC Latency average is the average latency of RPC requests per database. Average is calculated over all RPCs since exrpc32 was loaded. Should be less than 50ms at all times, with spikes less than 100ms. |
Zabbix agent (active) | perfcounteren["\MSExchangeIS Store({#INSTANCE})\RPC Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}] Preprocessing
|
Information Store [{#INSTANCE}]: RPC requests total | Indicates the overall RPC requests currently executing within the information store process. Should be below 70 at all times. |
Zabbix agent (active) | perfcounteren["\MSExchangeIS Store({#INSTANCE})\RPC requests", {$MS.EXCHANGE.PERF.INTERVAL}] |
Database Counters [{#INSTANCE}]: Active database read operations per second | Shows the number of database read operations. |
Zabbix agent (active) | perfcounteren["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Reads (Attached)/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
Database Counters [{#INSTANCE}]: Active database read operations latency | Shows the average length of time per database read operation. Should be less than 20 ms on average. |
Zabbix agent (active) | perfcounteren["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Reads (Attached) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}] Preprocessing
|
Database Counters [{#INSTANCE}]: Passive database read operations latency | Shows the average length of time per passive database read operation. Should be less than 200ms on average. |
Zabbix agent (active) | perfcounteren["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Reads (Recovery) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}] Preprocessing
|
Database Counters [{#INSTANCE}]: Active database write operations per second | Shows the number of database write operations per second for each attached database instance. |
Zabbix agent (active) | perfcounteren["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Writes (Attached)/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
Database Counters [{#INSTANCE}]: Active database write operations latency | Shows the average length of time per database write operation. Should be less than 50ms on average. |
Zabbix agent (active) | perfcounteren["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Writes (Attached) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}] Preprocessing
|
Database Counters [{#INSTANCE}]: Passive database write operations latency | Shows the average length of time, in ms, per passive database write operation. Should be less than the read latency for the same instance, as measured by the MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Reads (Recovery) Average Latency counter. |
Zabbix agent (active) | perfcounteren["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Writes (Recovery) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Information Store [{#INSTANCE}]: Page faults is too high | Too many page fault stalls for database "{#INSTANCE}". This counter should be 0 on production servers. |
min(/Microsoft Exchange Server 2016 by Zabbix agent active/perf_counter_en["\MSExchange Database({#INF.STORE})\Database Page Fault Stalls/sec", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.DB.FAULTS.TIME})>{$MS.EXCHANGE.DB.FAULTS.WARN} |Average |
||
Information Store [{#INSTANCE}]: Log records stalls is too high | The rate of stalled log records is too high. The average value should be below 10 per second; spikes should not exceed 100 per second. |
avg(/Microsoft Exchange Server 2016 by Zabbix agent active/perf_counter_en["\MSExchange Database({#INF.STORE})\Log Record Stalls/sec", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.LOG.STALLS.TIME})>{$MS.EXCHANGE.LOG.STALLS.WARN} |Average |
||
Information Store [{#INSTANCE}]: RPC Requests latency is too high | Should be less than 50ms at all times, with spikes less than 100ms. |
min(/Microsoft Exchange Server 2016 by Zabbix agent active/perf_counter_en["\MSExchangeIS Store({#INSTANCE})\RPC Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.RPC.TIME})>{$MS.EXCHANGE.RPC.WARN} |Warning |
||
Information Store [{#INSTANCE}]: RPC Requests total count is too high | Should be below 70 at all times. |
min(/Microsoft Exchange Server 2016 by Zabbix agent active/perf_counter_en["\MSExchangeIS Store({#INSTANCE})\RPC requests", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.RPC.COUNT.TIME})>{$MS.EXCHANGE.RPC.COUNT.WARN} |Warning |
||
Database Counters [{#INSTANCE}]: Average read time latency is too high | Should be less than 20ms on average. |
min(/Microsoft Exchange Server 2016 by Zabbix agent active/perf_counter_en["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Reads (Attached) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.DB.ACTIVE.READ.TIME})>{$MS.EXCHANGE.DB.ACTIVE.READ.WARN} |Warning |
||
Database Counters [{#INSTANCE}]: Average read time latency is too high | Should be less than 200ms on average. |
min(/Microsoft Exchange Server 2016 by Zabbix agent active/perf_counter_en["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Reads (Recovery) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.DB.PASSIVE.READ.TIME})>{$MS.EXCHANGE.DB.PASSIVE.READ.WARN} |Warning |
||
Database Counters [{#INSTANCE}]: Average write time latency is too high for {$MS.EXCHANGE.DB.ACTIVE.WRITE.TIME} | Should be less than 50ms on average. |
min(/Microsoft Exchange Server 2016 by Zabbix agent active/perf_counter_en["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Writes (Attached) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.DB.ACTIVE.WRITE.TIME})>{$MS.EXCHANGE.DB.ACTIVE.WRITE.WARN} |Warning |
||
Database Counters [{#INSTANCE}]: Average write time latency is higher than read time latency for {$MS.EXCHANGE.DB.PASSIVE.WRITE.TIME} | Should be less than the read latency for the same instance, as measured by the MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Reads (Recovery) Average Latency counter. |
avg(/Microsoft Exchange Server 2016 by Zabbix agent active/perf_counter_en["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Writes (Recovery) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.DB.PASSIVE.WRITE.TIME})>avg(/Microsoft Exchange Server 2016 by Zabbix agent active/perf_counter_en["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Reads (Recovery) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.DB.PASSIVE.WRITE.TIME}) |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Web services discovery | Discovery of Exchange web services. |
Zabbix agent (active) | perfinstanceen.discovery["Web Service"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Web Service [{#INSTANCE}]: Current connections | Shows the current number of connections established to each Web Service. |
Zabbix agent (active) | perfcounteren["\Web Service({#INSTANCE})\Current Connections", {$MS.EXCHANGE.PERF.INTERVAL}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
LDAP discovery | Discovery of domain controller. |
Zabbix agent (active) | perfinstanceen.discovery["MSExchange ADAccess Domain Controllers"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Domain Controller [{#INSTANCE}]: Read time | Time that it takes to send an LDAP read request to the domain controller in question and get a response. Should ideally be below 50 ms; spikes below 100 ms are acceptable. |
Zabbix agent (active) | perfcounteren["\MSExchange ADAccess Domain Controllers({#INSTANCE})\LDAP Read Time", {$MS.EXCHANGE.PERF.INTERVAL}] Preprocessing
|
Domain Controller [{#INSTANCE}]: Search time | Time that it takes to send an LDAP search request and get a response. Should ideally be below 50 ms; spikes below 100 ms are acceptable. |
Zabbix agent (active) | perfcounteren["\MSExchange ADAccess Domain Controllers({#INSTANCE})\LDAP Search Time", {$MS.EXCHANGE.PERF.INTERVAL}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Domain Controller [{#INSTANCE}]: LDAP read time is too high | Should be less than 50ms at all times, with spikes less than 100ms. |
min(/Microsoft Exchange Server 2016 by Zabbix agent active/perf_counter_en["\MSExchange ADAccess Domain Controllers({#INSTANCE})\LDAP Read Time", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.LDAP.TIME})>{$MS.EXCHANGE.LDAP.WARN} |Average |
||
Domain Controller [{#INSTANCE}]: LDAP search time is too high | Should be less than 50ms at all times, with spikes less than 100ms. |
min(/Microsoft Exchange Server 2016 by Zabbix agent active/perf_counter_en["\MSExchange ADAccess Domain Controllers({#INSTANCE})\LDAP Search Time", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.LDAP.TIME})>{$MS.EXCHANGE.LDAP.WARN} |Average |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
Official Template for Microsoft Exchange Server 2016.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Metrics are collected by Zabbix agent.
1. Import the template into Zabbix.
2. Link the imported template to a host with MS Exchange.
Note that the template doesn't provide information about the state of Windows services. It is recommended to use it together with the "OS Windows by Zabbix agent" template.
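Since this variant of the template relies on passive checks, item collection can be verified from the Zabbix server or proxy with zabbix_get. A minimal sketch (replace <exchange-host> with the address of the monitored host):
zabbix_get -s <exchange-host> -k 'perf_counter_en["\MSExchange Active Manager(_total)\Database Mounted"]'
A numeric response confirms that the agent can read the Exchange performance counters used by this template.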
Name | Description | Default |
---|---|---|
{$MS.EXCHANGE.PERF.INTERVAL} | Update interval for perf_counter_en items. |
60 |
{$MS.EXCHANGE.DB.FAULTS.TIME} | The time during which the database page faults may exceed the threshold. |
5m |
{$MS.EXCHANGE.DB.FAULTS.WARN} | Threshold for database page faults trigger. |
0 |
{$MS.EXCHANGE.LOG.STALLS.TIME} | The time during which the log records stalled may exceed the threshold. |
10m |
{$MS.EXCHANGE.LOG.STALLS.WARN} | Threshold for log records stalled trigger. |
100 |
{$MS.EXCHANGE.DB.ACTIVE.READ.TIME} | The time during which the active database read operations latency may exceed the threshold. |
5m |
{$MS.EXCHANGE.DB.ACTIVE.READ.WARN} | Threshold for active database read operations latency trigger. |
0.02 |
{$MS.EXCHANGE.DB.ACTIVE.WRITE.TIME} | The time during which the active database write operations latency may exceed the threshold. |
10m |
{$MS.EXCHANGE.DB.ACTIVE.WRITE.WARN} | Threshold for active database write operations latency trigger. |
0.05 |
{$MS.EXCHANGE.DB.PASSIVE.READ.TIME} | The time during which the passive database read operations latency may exceed the threshold. |
5m |
{$MS.EXCHANGE.DB.PASSIVE.READ.WARN} | Threshold for passive database read operations latency trigger. |
0.2 |
{$MS.EXCHANGE.DB.PASSIVE.WRITE.TIME} | The time during which the passive database write operations latency may exceed the threshold. |
10m |
{$MS.EXCHANGE.RPC.TIME} | The time during which the RPC requests latency may exceed the threshold. |
10m |
{$MS.EXCHANGE.RPC.WARN} | Threshold for RPC requests latency trigger. |
0.05 |
{$MS.EXCHANGE.RPC.COUNT.TIME} | The time during which the RPC total requests may exceed the threshold. |
5m |
{$MS.EXCHANGE.RPC.COUNT.WARN} | Threshold for the total RPC requests trigger. |
70 |
{$MS.EXCHANGE.LDAP.TIME} | The time during which the LDAP metrics may exceed the threshold. |
5m |
{$MS.EXCHANGE.LDAP.WARN} | Threshold for LDAP triggers. |
0.05 |
Name | Description | Type | Key and additional info |
---|---|---|---|
MS Exchange: Databases total mounted | Shows the number of active database copies on the server. |
Zabbix agent | perfcounteren["\MSExchange Active Manager(_total)\Database Mounted"] Preprocessing
|
MS Exchange [Client Access Server]: ActiveSync: ping command pending | Shows the number of ping commands currently pending in the queue. |
Zabbix agent | perfcounteren["\MSExchange ActiveSync\Ping Commands Pending", {$MS.EXCHANGE.PERF.INTERVAL}] |
MS Exchange [Client Access Server]: ActiveSync: requests per second | Shows the number of HTTP requests received from the client via ASP.NET per second. Determines the current Exchange ActiveSync request rate. Used only to determine current user load. |
Zabbix agent | perfcounteren["\MSExchange ActiveSync\Requests/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
MS Exchange [Client Access Server]: ActiveSync: sync commands per second | Shows the number of sync commands processed per second. Clients use this command to synchronize items within a folder. |
Zabbix agent | perfcounteren["\MSExchange ActiveSync\Sync Commands/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
MS Exchange [Client Access Server]: Autodiscover: requests per second | Shows the number of Autodiscover service requests processed each second. Determines current user load. |
Zabbix agent | perfcounteren["\MSExchangeAutodiscover\Requests/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
MS Exchange [Client Access Server]: Availability Service: availability requests per second | Shows the number of requests serviced per second. The request can be only for free/busy information or include suggestions. One request may contain multiple mailboxes. Determines the rate at which Availability service requests are occurring. |
Zabbix agent | perfcounteren["\MSExchange Availability Service\Availability Requests (sec)", {$MS.EXCHANGE.PERF.INTERVAL}] |
MS Exchange [Client Access Server]: Outlook Web App: current unique users | Shows the number of unique users currently logged on to Outlook Web App. This value monitors the number of unique active user sessions, so that users are only removed from this counter after they log off or their session times out. Determines current user load. |
Zabbix agent | perfcounteren["\MSExchange OWA\Current Unique Users", {$MS.EXCHANGE.PERF.INTERVAL}] |
MS Exchange [Client Access Server]: Outlook Web App: requests per second | Shows the number of requests handled by Outlook Web App per second. Determines current user load. |
Zabbix agent | perfcounteren["\MSExchange OWA\Requests/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
MS Exchange [Client Access Server]: MSExchangeWS: requests per second | Shows the number of requests processed each second. Determines current user load. |
Zabbix agent | perfcounteren["\MSExchangeWS\Requests/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Databases discovery | Discovery of Exchange databases. |
Zabbix agent | perf_instance.discovery["MSExchange Active Manager"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Active Manager [{#INSTANCE}]: Database copy role | Database copy active or passive role. |
Zabbix agent | perfcounteren["\MSExchange Active Manager({#INSTANCE})\Database Copy Role Active"] Preprocessing
|
Information Store [{#INSTANCE}]: Database state | Database state. Possible values: 0: Database without any copy and dismounted. 1: Database is a primary database and mounted. 2: Database is a passive copy and the state is healthy. |
Zabbix agent | perfcounteren["\MSExchangeIS Store({#INSTANCE})\Database State"] Preprocessing
|
Information Store [{#INSTANCE}]: Active mailboxes count | Number of active mailboxes in this database. |
Zabbix agent | perfcounteren["\MSExchangeIS Store({#INSTANCE})\Active mailboxes"] |
Information Store [{#INSTANCE}]: Page faults per second | Indicates the rate of page faults that can't be serviced because there are no pages available for allocation from the database cache. If this counter is above 0, it's an indication that the MSExchange Database\I/O Database Writes (Attached) Average Latency is too high. |
Zabbix agent | perfcounteren["\MSExchange Database({#INF.STORE})\Database Page Fault Stalls/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
Information Store [{#INSTANCE}]: Log records stalled | Indicates the number of log records that can't be added to the log buffers per second because the log buffers are full. The average value should be below 10 per second. Spikes (maximum values) shouldn't be higher than 100 per second. |
Zabbix agent | perfcounteren["\MSExchange Database({#INF.STORE})\Log Record Stalls/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
Information Store [{#INSTANCE}]: Log threads waiting | Indicates the number of threads waiting to complete an update of the database by writing their data to the log. |
Zabbix agent | perfcounteren["\MSExchange Database({#INF.STORE})\Log Threads Waiting", {$MS.EXCHANGE.PERF.INTERVAL}] |
Information Store [{#INSTANCE}]: RPC requests per second | Shows the number of RPC operations per second for each database instance. |
Zabbix agent | perfcounteren["\MSExchangeIS Store({#INSTANCE})\RPC Operations/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
Information Store [{#INSTANCE}]: RPC requests latency | RPC Latency average is the average latency of RPC requests per database. Average is calculated over all RPCs since exrpc32 was loaded. Should be less than 50ms at all times, with spikes less than 100ms. |
Zabbix agent | perfcounteren["\MSExchangeIS Store({#INSTANCE})\RPC Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}] Preprocessing
|
Information Store [{#INSTANCE}]: RPC requests total | Indicates the overall RPC requests currently executing within the information store process. Should be below 70 at all times. |
Zabbix agent | perfcounteren["\MSExchangeIS Store({#INSTANCE})\RPC requests", {$MS.EXCHANGE.PERF.INTERVAL}] |
Database Counters [{#INSTANCE}]: Active database read operations per second | Shows the number of database read operations. |
Zabbix agent | perfcounteren["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Reads (Attached)/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
Database Counters [{#INSTANCE}]: Active database read operations latency | Shows the average length of time per database read operation. Should be less than 20 ms on average. |
Zabbix agent | perfcounteren["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Reads (Attached) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}] Preprocessing
|
Database Counters [{#INSTANCE}]: Passive database read operations latency | Shows the average length of time per passive database read operation. Should be less than 200ms on average. |
Zabbix agent | perfcounteren["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Reads (Recovery) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}] Preprocessing
|
Database Counters [{#INSTANCE}]: Active database write operations per second | Shows the number of database write operations per second for each attached database instance. |
Zabbix agent | perfcounteren["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Writes (Attached)/sec", {$MS.EXCHANGE.PERF.INTERVAL}] |
Database Counters [{#INSTANCE}]: Active database write operations latency | Shows the average length of time per database write operation. Should be less than 50ms on average. |
Zabbix agent | perfcounteren["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Writes (Attached) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}] Preprocessing
|
Database Counters [{#INSTANCE}]: Passive database write operations latency | Shows the average length of time, in ms, per passive database write operation. Should be less than the read latency for the same instance, as measured by the MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Reads (Recovery) Average Latency counter. |
Zabbix agent | perfcounteren["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Writes (Recovery) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Information Store [{#INSTANCE}]: Page faults is too high | Too many page fault stalls for database "{#INSTANCE}". This counter should be 0 on production servers. |
min(/Microsoft Exchange Server 2016 by Zabbix agent/perf_counter_en["\MSExchange Database({#INF.STORE})\Database Page Fault Stalls/sec", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.DB.FAULTS.TIME})>{$MS.EXCHANGE.DB.FAULTS.WARN} |Average |
||
Information Store [{#INSTANCE}]: Log records stalls is too high | The rate of stalled log records is too high. The average value should be below 10 per second; spikes should not exceed 100 per second. |
avg(/Microsoft Exchange Server 2016 by Zabbix agent/perf_counter_en["\MSExchange Database({#INF.STORE})\Log Record Stalls/sec", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.LOG.STALLS.TIME})>{$MS.EXCHANGE.LOG.STALLS.WARN} |Average |
||
Information Store [{#INSTANCE}]: RPC Requests latency is too high | Should be less than 50ms at all times, with spikes less than 100ms. |
min(/Microsoft Exchange Server 2016 by Zabbix agent/perf_counter_en["\MSExchangeIS Store({#INSTANCE})\RPC Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.RPC.TIME})>{$MS.EXCHANGE.RPC.WARN} |Warning |
||
Information Store [{#INSTANCE}]: RPC Requests total count is too high | Should be below 70 at all times. |
min(/Microsoft Exchange Server 2016 by Zabbix agent/perf_counter_en["\MSExchangeIS Store({#INSTANCE})\RPC requests", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.RPC.COUNT.TIME})>{$MS.EXCHANGE.RPC.COUNT.WARN} |Warning |
||
Database Counters [{#INSTANCE}]: Average read time latency is too high | Should be less than 20ms on average. |
min(/Microsoft Exchange Server 2016 by Zabbix agent/perf_counter_en["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Reads (Attached) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.DB.ACTIVE.READ.TIME})>{$MS.EXCHANGE.DB.ACTIVE.READ.WARN} |Warning |
||
Database Counters [{#INSTANCE}]: Average read time latency is too high | Should be less than 200ms on average. |
min(/Microsoft Exchange Server 2016 by Zabbix agent/perf_counter_en["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Reads (Recovery) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.DB.PASSIVE.READ.TIME})>{$MS.EXCHANGE.DB.PASSIVE.READ.WARN} |Warning |
||
Database Counters [{#INSTANCE}]: Average write time latency is too high for {$MS.EXCHANGE.DB.ACTIVE.WRITE.TIME} | Should be less than 50ms on average. |
min(/Microsoft Exchange Server 2016 by Zabbix agent/perf_counter_en["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Writes (Attached) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.DB.ACTIVE.WRITE.TIME})>{$MS.EXCHANGE.DB.ACTIVE.WRITE.WARN} |Warning |
||
Database Counters [{#INSTANCE}]: Average write time latency is higher than read time latency for {$MS.EXCHANGE.DB.PASSIVE.WRITE.TIME} | Should be less than the read latency for the same instance, as measured by the MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Reads (Recovery) Average Latency counter. |
avg(/Microsoft Exchange Server 2016 by Zabbix agent/perf_counter_en["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Writes (Recovery) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.DB.PASSIVE.WRITE.TIME})>avg(/Microsoft Exchange Server 2016 by Zabbix agent/perf_counter_en["\MSExchange Database ==> Instances({#INF.STORE}/_Total)\I/O Database Reads (Recovery) Average Latency", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.DB.PASSIVE.WRITE.TIME}) |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Web services discovery | Discovery of Exchange web services. |
Zabbix agent | perfinstanceen.discovery["Web Service"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Web Service [{#INSTANCE}]: Current connections | Shows the current number of connections established to each Web Service. |
Zabbix agent | perfcounteren["\Web Service({#INSTANCE})\Current Connections", {$MS.EXCHANGE.PERF.INTERVAL}] |
Name | Description | Type | Key and additional info |
---|---|---|---|
LDAP discovery | Discovery of domain controller. |
Zabbix agent | perfinstanceen.discovery["MSExchange ADAccess Domain Controllers"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Domain Controller [{#INSTANCE}]: Read time | Time that it takes to send an LDAP read request to the domain controller in question and get a response. Should ideally be below 50 ms; spikes below 100 ms are acceptable. |
Zabbix agent | perfcounteren["\MSExchange ADAccess Domain Controllers({#INSTANCE})\LDAP Read Time", {$MS.EXCHANGE.PERF.INTERVAL}] Preprocessing
|
Domain Controller [{#INSTANCE}]: Search time | Time that it takes to send an LDAP search request and get a response. Should ideally be below 50 ms; spikes below 100 ms are acceptable. |
Zabbix agent | perfcounteren["\MSExchange ADAccess Domain Controllers({#INSTANCE})\LDAP Search Time", {$MS.EXCHANGE.PERF.INTERVAL}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Domain Controller [{#INSTANCE}]: LDAP read time is too high | Should be less than 50ms at all times, with spikes less than 100ms. |
min(/Microsoft Exchange Server 2016 by Zabbix agent/perf_counter_en["\MSExchange ADAccess Domain Controllers({#INSTANCE})\LDAP Read Time", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.LDAP.TIME})>{$MS.EXCHANGE.LDAP.WARN} |Average |
||
Domain Controller [{#INSTANCE}]: LDAP search time is too high | Should be less than 50ms at all times, with spikes less than 100ms. |
min(/Microsoft Exchange Server 2016 by Zabbix agent/perf_counter_en["\MSExchange ADAccess Domain Controllers({#INSTANCE})\LDAP Search Time", {$MS.EXCHANGE.PERF.INTERVAL}],{$MS.EXCHANGE.LDAP.TIME})>{$MS.EXCHANGE.LDAP.WARN} |Average |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed to monitor etcd by Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The template Etcd by HTTP collects metrics with the help of the HTTP agent from the /metrics endpoint.
Refer to the vendor documentation.
For users of etcd version <= 3.4: in etcd v3.5 some metrics have been deprecated. See more details in Upgrade etcd from 3.4 to 3.5. Please upgrade your etcd instance, or use an older version of the Etcd by HTTP template.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Make sure that etcd allows the collection of metrics. You can test it by running: curl -L http://localhost:2379/metrics.
Check if etcd is accessible from the Zabbix proxy or the Zabbix server, depending on where you are planning to do the monitoring. To verify it, run: curl -L http://<etcd_node_address>:2379/metrics.
Add the template to the etcd node. Set the hostname or IP address of the etcd host in the {$ETCD.HOST} macro. By default, the template uses the client port.
You can configure the metrics endpoint location by adding the --listen-metrics-urls flag. For more details, see the etcd documentation.
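For example, to expose metrics on a dedicated address in addition to the client URL, etcd could be started with an extra metrics URL; the address and port below are assumptions for illustration, not defaults:
etcd --listen-metrics-urls=http://0.0.0.0:2381 ...
If you do this, adjust {$ETCD.PORT} (and {$ETCD.SCHEME}, if needed) on the host accordingly.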
Additional points to consider:
- If you use a non-default port or scheme for etcd, don't forget to change the {$ETCD.SCHEME} and {$ETCD.PORT} macros.
- You can set the {$ETCD.USERNAME} and {$ETCD.PASSWORD} macros in the template to use on a host level if necessary.
- To test availability, run: zabbix_get -s etcd-host -k etcd.health.
Name | Description | Default |
---|---|---|
{$ETCD.HOST} | The hostname or IP address of the etcd host. |
<SET ETCD HOST> |
{$ETCD.PORT} | The port of the etcd API endpoint. |
2379 |
{$ETCD.SCHEME} | The request scheme which may be http or https. |
http |
{$ETCD.USER} | ||
{$ETCD.PASSWORD} | ||
{$ETCD.LEADER.CHANGES.MAX.WARN} | The maximum number of leader changes. |
5 |
{$ETCD.PROPOSAL.FAIL.MAX.WARN} | The maximum number of proposal failures. |
2 |
{$ETCD.HTTP.FAIL.MAX.WARN} | The maximum number of HTTP request failures. |
2 |
{$ETCD.PROPOSAL.PENDING.MAX.WARN} | The maximum number of proposals in queue. |
5 |
{$ETCD.OPEN.FDS.MAX.WARN} | The maximum percentage of used file descriptors. |
90 |
{$ETCD.GRPC_CODE.MATCHES} | The filter of discoverable gRPC codes. See more details on https://github.com/grpc/grpc/blob/master/doc/statuscodes.md. |
.* |
{$ETCD.GRPC_CODE.NOT_MATCHES} | The filter to exclude discovered gRPC codes. See more details on https://github.com/grpc/grpc/blob/master/doc/statuscodes.md. |
CHANGE_IF_NEEDED |
{$ETCD.GRPC.ERRORS.MAX.WARN} | The maximum number of gRPC request failures. |
1 |
{$ETCD.GRPC_CODE.TRIGGER.MATCHES} | The filter of discoverable gRPC codes, which will create triggers. |
Aborted|Unavailable |
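The gRPC code macros above are treated as regular expressions. As an illustrative (non-default) example, to also raise failure triggers for the DeadlineExceeded code, {$ETCD.GRPC_CODE.TRIGGER.MATCHES} could be set on the host to:
Aborted|Unavailable|DeadlineExceeded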
Name | Description | Type | Key and additional info |
---|---|---|---|
Etcd: Service's TCP port state | Simple check | net.tcp.service["{$ETCD.SCHEME}","{$ETCD.HOST}","{$ETCD.PORT}"] Preprocessing
|
|
Etcd: Get node metrics | HTTP agent | etcd.get_metrics | |
Etcd: Node health | HTTP agent | etcd.health Preprocessing
|
|
Etcd: Server is a leader | Defines whether or not this member is a leader: 1 - it is; 0 - otherwise. |
Dependent item | etcd.is.leader Preprocessing
|
Etcd: Server has a leader | Defines whether or not a leader exists: 1 - it exists; 0 - it does not. |
Dependent item | etcd.has.leader Preprocessing
|
Etcd: Leader changes | The number of leader changes the member has seen since its start. |
Dependent item | etcd.leader.changes Preprocessing
|
Etcd: Proposals committed per second | The number of consensus proposals committed. |
Dependent item | etcd.proposals.committed.rate Preprocessing
|
Etcd: Proposals applied per second | The number of consensus proposals applied. |
Dependent item | etcd.proposals.applied.rate Preprocessing
|
Etcd: Proposals failed per second | The number of failed proposals seen. |
Dependent item | etcd.proposals.failed.rate Preprocessing
|
Etcd: Proposals pending | The current number of pending proposals to commit. |
Dependent item | etcd.proposals.pending Preprocessing
|
Etcd: Reads per second | The number of read actions by |
Dependent item | etcd.reads.rate Preprocessing
|
Etcd: Writes per second | The number of writes (e.g., |
Dependent item | etcd.writes.rate Preprocessing
|
Etcd: Client gRPC received bytes per second | The number of bytes received from gRPC clients per second. |
Dependent item | etcd.network.grpc.received.rate Preprocessing
|
Etcd: Client gRPC sent bytes per second | The number of bytes sent from gRPC clients per second. |
Dependent item | etcd.network.grpc.sent.rate Preprocessing
|
Etcd: HTTP requests received | The number of requests received into the system (successfully parsed and |
Dependent item | etcd.http.requests.rate Preprocessing
|
Etcd: HTTP 5XX | The number of handled failures of requests (non-watches), by the method ( |
Dependent item | etcd.http.requests.5xx.rate Preprocessing
|
Etcd: HTTP 4XX | The number of handled failures of requests (non-watches), by the method ( |
Dependent item | etcd.http.requests.4xx.rate Preprocessing
|
Etcd: RPCs received per second | The number of RPC stream messages received on the server. |
Dependent item | etcd.grpc.received.rate Preprocessing
|
Etcd: RPCs sent per second | The number of gRPC stream messages sent by the server. |
Dependent item | etcd.grpc.sent.rate Preprocessing
|
Etcd: RPCs started per second | The number of RPCs started on the server. |
Dependent item | etcd.grpc.started.rate Preprocessing
|
Etcd: Get version | HTTP agent | etcd.get_version | |
Etcd: Server version | The version of the etcd server. |
Dependent item | etcd.server.version Preprocessing
|
Etcd: Cluster version | The version of the etcd cluster. |
Dependent item | etcd.cluster.version Preprocessing
|
Etcd: DB size | The total size of the underlying database. |
Dependent item | etcd.db.size Preprocessing
|
Etcd: Keys compacted per second | The number of DB keys compacted per second. |
Dependent item | etcd.keys.compacted.rate Preprocessing
|
Etcd: Keys expired per second | The number of expired keys per second. |
Dependent item | etcd.keys.expired.rate Preprocessing
|
Etcd: Keys total | The total number of keys. |
Dependent item | etcd.keys.total Preprocessing
|
Etcd: Uptime |
|
Dependent item | etcd.uptime Preprocessing
|
Etcd: Virtual memory | The size of virtual memory expressed in bytes. |
Dependent item | etcd.virtual.bytes Preprocessing
|
Etcd: Resident memory | The size of resident memory expressed in bytes. |
Dependent item | etcd.res.bytes Preprocessing
|
Etcd: CPU | The total user and system CPU time spent in seconds. |
Dependent item | etcd.cpu.util Preprocessing
|
Etcd: Open file descriptors | The number of open file descriptors. |
Dependent item | etcd.open.fds Preprocessing
|
Etcd: Maximum open file descriptors | The maximum number of open file descriptors. |
Dependent item | etcd.max.fds Preprocessing
|
Etcd: Deletes per second | The number of deletes seen by this member per second. |
Dependent item | etcd.delete.rate Preprocessing
|
Etcd: PUT per second | The number of puts seen by this member per second. |
Dependent item | etcd.put.rate Preprocessing
|
Etcd: Range per second | The number of ranges seen by this member per second. |
Dependent item | etcd.range.rate Preprocessing
|
Etcd: Transaction per second | The number of transactions seen by this member per second. |
Dependent item | etcd.txn.rate Preprocessing
|
Etcd: Pending events | The total number of pending events to be sent. |
Dependent item | etcd.events.sent.rate Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Etcd: Service is unavailable | last(/Etcd by HTTP/net.tcp.service["{$ETCD.SCHEME}","{$ETCD.HOST}","{$ETCD.PORT}"])=0 |Average |
Manual close: Yes | ||
Etcd: Node healthcheck failed | See more details on https://etcd.io/docs/v3.5/op-guide/monitoring/#health-check. |
last(/Etcd by HTTP/etcd.health)=0 |Average |
Depends on:
|
|
Etcd: Failed to fetch info data | Zabbix has not received any data for items for the last 30 minutes. |
nodata(/Etcd by HTTP/etcd.is.leader,30m)=1 |Warning |
Manual close: Yes Depends on:
|
|
Etcd: Member has no leader | If a member does not have a leader, it is totally unavailable. |
last(/Etcd by HTTP/etcd.has.leader)=0 |Average |
||
Etcd: Instance has seen too many leader changes | Rapid leadership changes impact the performance of etcd significantly. |
(max(/Etcd by HTTP/etcd.leader.changes,15m)-min(/Etcd by HTTP/etcd.leader.changes,15m))>{$ETCD.LEADER.CHANGES.MAX.WARN} |Warning |
||
Etcd: Too many proposal failures | Normally related to two issues: temporary failures related to a leader election or longer downtime caused by a loss of quorum in the cluster. |
min(/Etcd by HTTP/etcd.proposals.failed.rate,5m)>{$ETCD.PROPOSAL.FAIL.MAX.WARN} |Warning |
||
Etcd: Too many proposals are queued to commit | Rising pending proposals suggests there is a high client load, or the member cannot commit proposals. |
min(/Etcd by HTTP/etcd.proposals.pending,5m)>{$ETCD.PROPOSAL.PENDING.MAX.WARN} |Warning |
||
Etcd: Too many HTTP requests failures | Too many requests failed on the etcd instance. |
min(/Etcd by HTTP/etcd.http.requests.5xx.rate,5m)>{$ETCD.HTTP.FAIL.MAX.WARN} |Warning |
||
Etcd: Server version has changed | Etcd version has changed. Acknowledge to close the problem manually. |
last(/Etcd by HTTP/etcd.server.version,#1)<>last(/Etcd by HTTP/etcd.server.version,#2) and length(last(/Etcd by HTTP/etcd.server.version))>0 |Info |
Manual close: Yes | |
Etcd: Cluster version has changed | Etcd version has changed. Acknowledge to close the problem manually. |
last(/Etcd by HTTP/etcd.cluster.version,#1)<>last(/Etcd by HTTP/etcd.cluster.version,#2) and length(last(/Etcd by HTTP/etcd.cluster.version))>0 |Info |
Manual close: Yes | |
Etcd: Host has been restarted | Uptime is less than 10 minutes. |
last(/Etcd by HTTP/etcd.uptime)<10m |Info |
Manual close: Yes | |
Etcd: Current number of open files is too high | Heavy file descriptor usage (i.e., close to the process's file descriptor limit) indicates a potential file descriptor exhaustion issue. |
min(/Etcd by HTTP/etcd.open.fds,5m)/last(/Etcd by HTTP/etcd.max.fds)*100>{$ETCD.OPEN.FDS.MAX.WARN} |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
gRPC codes discovery | Dependent item | etcd.grpc_code.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Etcd: RPCs completed with code {#GRPC.CODE} | The number of RPCs completed on the server with grpc_code {#GRPC.CODE}. |
Dependent item | etcd.grpc.handled.rate[{#GRPC.CODE}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Etcd: Too many failed gRPC requests with code: {#GRPC.CODE} | min(/Etcd by HTTP/etcd.grpc.handled.rate[{#GRPC.CODE}],5m)>{$ETCD.GRPC.ERRORS.MAX.WARN} |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Peers discovery | Dependent item | etcd.peer.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Etcd: Etcd peer {#ETCD.PEER}: Bytes sent | The number of bytes sent to a peer with the ID {#ETCD.PEER}. |
Dependent item | etcd.bytes.sent.rate[{#ETCD.PEER}] Preprocessing
|
Etcd: Etcd peer {#ETCD.PEER}: Bytes received | The number of bytes received from a peer with the ID {#ETCD.PEER}. |
Dependent item | etcd.bytes.received.rate[{#ETCD.PEER}] Preprocessing
|
Etcd: Etcd peer {#ETCD.PEER}: Send failures | The number of sent failures from a peer with the ID {#ETCD.PEER}. |
Dependent item | etcd.sent.fail.rate[{#ETCD.PEER}] Preprocessing
|
Etcd: Etcd peer {#ETCD.PEER}: Receive failures | The number of received failures from a peer with the ID {#ETCD.PEER}. |
Dependent item | etcd.received.fail.rate[{#ETCD.PEER}] Preprocessing
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed to monitor Envoy Proxy by Zabbix and works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
The template Envoy Proxy by HTTP collects metrics with the HTTP agent from the {$ENVOY.METRICS.PATH} endpoint (default: /stats/prometheus).
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Internal service metrics are collected from {$ENVOY.METRICS.PATH} endpoint (default: /stats/prometheus). https://www.envoyproxy.io/docs/envoy/v1.20.0/operations/stats_overview
Don't forget to change macros {$ENVOY.URL}, {$ENVOY.METRICS.PATH}. Also, see the Macros section for a list of macros used to set trigger values.
NOTE. Some metrics may not be collected depending on your Envoy Proxy instance version and configuration.
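To confirm the endpoint is reachable before linking the template, you can query the Envoy admin interface directly; this sketch assumes the default macro values, so adjust the address and path if you changed {$ENVOY.URL} or {$ENVOY.METRICS.PATH}:
curl -s http://localhost:9901/stats/prometheus | head
The output should contain Prometheus-formatted envoy_* metrics.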
Name | Description | Default |
---|---|---|
{$ENVOY.URL} | Instance URL. |
http://localhost:9901 |
{$ENVOY.METRICS.PATH} | The path from which Zabbix will scrape metrics in Prometheus format. |
/stats/prometheus |
{$ENVOY.CERT.MIN} | Minimum number of days before certificate expiration used for trigger expression. |
7 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Envoy Proxy: Get node metrics | Get server metrics. |
HTTP agent | envoy.get_metrics Preprocessing
|
Envoy Proxy: Server state | State of the server. Live - (default) Server is live and serving traffic. Draining - Server is draining listeners in response to external health checks failing. Pre initializing - Server has not yet completed cluster manager initialization. Initializing - Server is running the cluster manager initialization callbacks (e.g., RDS). |
Dependent item | envoy.server.state Preprocessing
|
Envoy Proxy: Server live | 1 if the server is not currently draining, 0 otherwise. |
Dependent item | envoy.server.live Preprocessing
|
Envoy Proxy: Uptime | Current server uptime in seconds. |
Dependent item | envoy.server.uptime Preprocessing
|
Envoy Proxy: Certificate expiration, day before | Number of days until the next certificate being managed will expire. |
Dependent item | envoy.server.days_until_first_cert_expiring Preprocessing
|
Envoy Proxy: Server concurrency | Number of worker threads. |
Dependent item | envoy.server.concurrency Preprocessing
|
Envoy Proxy: Memory allocated | Current amount of allocated memory in bytes. Total of both new and old Envoy processes on hot restart. |
Dependent item | envoy.server.memory_allocated Preprocessing
|
Envoy Proxy: Memory heap size | Current reserved heap size in bytes. New Envoy process heap size on hot restart. |
Dependent item | envoy.server.memoryheapsize Preprocessing
|
Envoy Proxy: Memory physical size | Current estimate of total bytes of the physical memory. New Envoy process physical memory size on hot restart. |
Dependent item | envoy.server.memoryphysicalsize Preprocessing
|
Envoy Proxy: Filesystem, flushed by timer rate | Total number of times internal flush buffers are written to a file due to flush timeout per second. |
Dependent item | envoy.filesystem.flushedbytimer.rate Preprocessing
|
Envoy Proxy: Filesystem, write completed rate | Total number of times a file was written per second. |
Dependent item | envoy.filesystem.write_completed.rate Preprocessing
|
Envoy Proxy: Filesystem, write failed rate | Total number of times an error occurred during a file write operation per second. |
Dependent item | envoy.filesystem.write_failed.rate Preprocessing
|
Envoy Proxy: Filesystem, reopen failed rate | Total number of times a file was failed to be opened per second. |
Dependent item | envoy.filesystem.reopen_failed.rate Preprocessing
|
Envoy Proxy: Connections, total | Total connections of both new and old Envoy processes. |
Dependent item | envoy.server.total_connections Preprocessing
|
Envoy Proxy: Connections, parent | Total connections of the old Envoy process on hot restart. |
Dependent item | envoy.server.parent_connections Preprocessing
|
Envoy Proxy: Clusters, warming | Number of currently warming (not active) clusters. |
Dependent item | envoy.clustermanager.warmingclusters Preprocessing
|
Envoy Proxy: Clusters, active | Number of currently active (warmed) clusters. |
Dependent item | envoy.clustermanager.activeclusters Preprocessing
|
Envoy Proxy: Clusters, added rate | Total clusters added (either via static config or CDS) per second. |
Dependent item | envoy.clustermanager.clusteradded.rate Preprocessing
|
Envoy Proxy: Clusters, modified rate | Total clusters modified (via CDS) per second. |
Dependent item | envoy.clustermanager.clustermodified.rate Preprocessing
|
Envoy Proxy: Clusters, removed rate | Total clusters removed (via CDS) per second. |
Dependent item | envoy.clustermanager.clusterremoved.rate Preprocessing
|
Envoy Proxy: Clusters, updates rate | Total cluster updates per second. |
Dependent item | envoy.clustermanager.clusterupdated.rate Preprocessing
|
Envoy Proxy: Listeners, active | Number of currently active listeners. |
Dependent item | envoy.listenermanager.totallisteners_active Preprocessing
|
Envoy Proxy: Listeners, draining | Number of currently draining listeners. |
Dependent item | envoy.listenermanager.totallisteners_draining Preprocessing
|
Envoy Proxy: Listener, warming | Number of currently warming listeners. |
Dependent item | envoy.listenermanager.totallisteners_warming Preprocessing
|
Envoy Proxy: Listener manager, initialized | A boolean (1 if started and 0 otherwise) that indicates whether listeners have been initialized on workers. |
Dependent item | envoy.listenermanager.workersstarted Preprocessing
|
Envoy Proxy: Listeners, create failure | Total failed listener object additions to workers per second. |
Dependent item | envoy.listenermanager.listenercreate_failure.rate Preprocessing
|
Envoy Proxy: Listeners, create success | Total listener objects successfully added to workers per second. |
Dependent item | envoy.listenermanager.listenercreate_success.rate Preprocessing
|
Envoy Proxy: Listeners, added | Total listeners added (either via static config or LDS) per second. |
Dependent item | envoy.listenermanager.listeneradded.rate Preprocessing
|
Envoy Proxy: Listeners, stopped | Total listeners stopped per second. |
Dependent item | envoy.listenermanager.listenerstopped.rate Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Envoy Proxy: Server state is not live | last(/Envoy Proxy by HTTP/envoy.server.state) > 0 |Average |
|||
Envoy Proxy: Service has been restarted | Uptime is less than 10 minutes. |
last(/Envoy Proxy by HTTP/envoy.server.uptime)<10m |Info |
Manual close: Yes | |
Envoy Proxy: Failed to fetch metrics data | Zabbix has not received data for items for the last 10 minutes. |
nodata(/Envoy Proxy by HTTP/envoy.server.uptime,10m)=1 |Warning |
Manual close: Yes | |
Envoy Proxy: SSL certificate expires soon | Please check certificate. Less than {$ENVOY.CERT.MIN} days left until the next certificate being managed will expire. |
last(/Envoy Proxy by HTTP/envoy.server.days_until_first_cert_expiring)<{$ENVOY.CERT.MIN} |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Cluster metrics discovery | Dependent item | envoy.lld.cluster Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Membership, total | Current cluster membership total. |
Dependent item | envoy.cluster.membershiptotal["{#CLUSTERNAME}"] Preprocessing
|
Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Membership, healthy | Current cluster healthy total (inclusive of both health checking and outlier detection). |
Dependent item | envoy.cluster.membershiphealthy["{#CLUSTERNAME}"] Preprocessing
|
Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Membership, unhealthy | Current cluster unhealthy. |
Calculated | envoy.cluster.membership_unhealthy["{#CLUSTER_NAME}"] |
Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Membership, degraded | Current cluster degraded total. |
Dependent item | envoy.cluster.membershipdegraded["{#CLUSTERNAME}"] Preprocessing
|
Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Connections, total | Current cluster total connections. |
Dependent item | envoy.cluster.upstream_cx_total["{#CLUSTER_NAME}"] Preprocessing
|
Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Connections, active | Current cluster total active connections. |
Dependent item | envoy.cluster.upstream_cx_active["{#CLUSTER_NAME}"] Preprocessing
|
Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Requests total, rate | Current cluster request total per second. |
Dependent item | envoy.cluster.upstream_rq_total.rate["{#CLUSTER_NAME}"] Preprocessing
|
Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Requests timeout, rate | Current cluster requests that timed out waiting for a response per second. |
Dependent item | envoy.cluster.upstream_rq_timeout.rate["{#CLUSTER_NAME}"] Preprocessing
|
Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Requests completed, rate | Total upstream requests completed per second. |
Dependent item | envoy.cluster.upstream_rq_completed.rate["{#CLUSTER_NAME}"] Preprocessing
|
Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Requests 2xx, rate | Aggregate HTTP response codes per second. |
Dependent item | envoy.cluster.upstream_rq_2x.rate["{#CLUSTER_NAME}"] Preprocessing
|
Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Requests 3xx, rate | Aggregate HTTP response codes per second. |
Dependent item | envoy.cluster.upstream_rq_3x.rate["{#CLUSTER_NAME}"] Preprocessing
|
Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Requests 4xx, rate | Aggregate HTTP response codes per second. |
Dependent item | envoy.cluster.upstream_rq_4x.rate["{#CLUSTER_NAME}"] Preprocessing
|
Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Requests 5xx, rate | Aggregate HTTP response codes per second. |
Dependent item | envoy.cluster.upstream_rq_5x.rate["{#CLUSTER_NAME}"] Preprocessing
|
Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Requests pending | Total active requests pending a connection pool connection. |
Dependent item | envoy.cluster.upstream_rq_pending_active["{#CLUSTER_NAME}"] Preprocessing
|
Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Requests active | Total active requests. |
Dependent item | envoy.cluster.upstream_rq_active["{#CLUSTER_NAME}"] Preprocessing
|
Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Upstream bytes out, rate | Total sent connection bytes per second. |
Dependent item | envoy.cluster.upstream_cx_tx_bytes_total.rate["{#CLUSTER_NAME}"] Preprocessing
|
Envoy Proxy: Cluster ["{#CLUSTER_NAME}"]: Upstream bytes in, rate | Total received connection bytes per second. |
Dependent item | envoy.cluster.upstream_cx_rx_bytes_total.rate["{#CLUSTER_NAME}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Envoy Proxy: There are unhealthy clusters | last(/Envoy Proxy by HTTP/envoy.cluster.membership_unhealthy["{#CLUSTER_NAME}"]) > 0 |Average |
Name | Description | Type | Key and additional info |
---|---|---|---|
Listeners metrics discovery | Dependent item | envoy.lld.listeners Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Envoy Proxy: Listener ["{#LISTENER_ADDRESS}"]: Connections, active | Total active connections. |
Dependent item | envoy.listener.downstream_cx_active["{#LISTENER_ADDRESS}"] Preprocessing
|
Envoy Proxy: Listener ["{#LISTENER_ADDRESS}"]: Connections, rate | Total connections per second. |
Dependent item | envoy.listener.downstream_cx_total.rate["{#LISTENER_ADDRESS}"] Preprocessing
|
Envoy Proxy: Listener ["{#LISTENER_ADDRESS}"]: Sockets, undergoing | Sockets currently undergoing listener filter processing. |
Dependent item | envoy.listener.downstream_pre_cx_active["{#LISTENER_ADDRESS}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
HTTP metrics discovery | Dependent item | envoy.lld.http Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Envoy Proxy: HTTP ["{#CONN_MANAGER}"]: Requests, rate | Total active connections per second. |
Dependent item | envoy.http.downstreamrqtotal.rate["{#CONN_MANAGER}"] Preprocessing
|
Envoy Proxy: HTTP ["{#CONN_MANAGER}"]: Requests, active | Total active requests. |
Dependent item | envoy.http.downstreamrqactive["{#CONN_MANAGER}"] Preprocessing
|
Envoy Proxy: HTTP ["{#CONN_MANAGER}"]: Requests timeout, rate | Total requests closed due to a timeout on the request path per second. |
Dependent item | envoy.http.downstreamrqtimeout["{#CONN_MANAGER}"] Preprocessing
|
Envoy Proxy: HTTP ["{#CONN_MANAGER}"]: Connections, rate | Total connections per second. |
Dependent item | envoy.http.downstreamcxtotal["{#CONN_MANAGER}"] Preprocessing
|
Envoy Proxy: HTTP ["{#CONN_MANAGER}"]: Connections, active | Total active connections. |
Dependent item | envoy.http.downstreamcxactive["{#CONN_MANAGER}"] Preprocessing
|
Envoy Proxy: HTTP ["{#CONN_MANAGER}"]: Bytes in, rate | Total bytes received per second. |
Dependent item | envoy.http.downstreamcxrxbytestotal.rate["{#CONN_MANAGER}"] Preprocessing
|
Envoy Proxy: HTTP ["{#CONN_MANAGER}"]: Bytes out, rate | Total bytes sent per second. |
Dependent item | envoy.http.downstreamcxtxbytestota.rate["{#CONN_MANAGER}"] Preprocessing
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template to monitor Elasticsearch by Zabbix that works without any external scripts.
It works with both standalone and cluster instances.
The metrics are collected in one pass remotely using an HTTP agent.
The values are obtained from the _cluster/health, _cluster/stats, and _nodes/stats REST API requests.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Set the hostname or IP address of the Elasticsearch host in the {$ELASTICSEARCH.HOST}
macro.
Set the login and password in the {$ELASTICSEARCH.USERNAME}
and {$ELASTICSEARCH.PASSWORD}
macros.
If you use a non-default location of the ES API, don't forget to change the {$ELASTICSEARCH.SCHEME} and {$ELASTICSEARCH.PORT} macros.
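Before linking the template, it can help to verify that the macro values reach the same REST endpoints the template polls. A minimal sketch, assuming placeholder host, port, scheme, and credentials (substitute your own macro values):

```python
# Minimal sketch: verify the Elasticsearch endpoints this template polls,
# using the same values you would put into the macros. Host, port, scheme
# and credentials below are placeholders (assumptions), not template defaults.
import requests

SCHEME, HOST, PORT = "http", "es.example.com", 9200   # {$ELASTICSEARCH.*} values
AUTH = ("zabbix", "secret")                            # username/password macros

BASE = f"{SCHEME}://{HOST}:{PORT}"

for path in ("_cluster/health", "_cluster/stats", "_nodes/stats"):
    resp = requests.get(f"{BASE}/{path}", auth=AUTH, timeout=10)
    resp.raise_for_status()
    print(path, "->", resp.status_code)

# The cluster health status used by the "ES: Cluster health status" item:
print(requests.get(f"{BASE}/_cluster/health", auth=AUTH, timeout=10).json()["status"])
```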
Name | Description | Default |
---|---|---|
{$ELASTICSEARCH.USERNAME} | The username for Elasticsearch. |
|
{$ELASTICSEARCH.PASSWORD} | The password for Elasticsearch. |
|
{$ELASTICSEARCH.HOST} | The hostname or IP address of the Elasticsearch host. |
<SET ELASTICSEARCH HOST> |
{$ELASTICSEARCH.PORT} | The port of the Elasticsearch host. |
9200 |
{$ELASTICSEARCH.SCHEME} | The scheme of the Elasticsearch (http/https). |
http |
{$ELASTICSEARCH.RESPONSE_TIME.MAX.WARN} | The ES cluster maximum response time in seconds for trigger expression. |
10s |
{$ELASTICSEARCH.QUERY_LATENCY.MAX.WARN} | Maximum of query latency in milliseconds for trigger expression. |
100 |
{$ELASTICSEARCH.FETCH_LATENCY.MAX.WARN} | Maximum of fetch latency in milliseconds for trigger expression. |
100 |
{$ELASTICSEARCH.INDEXING_LATENCY.MAX.WARN} | Maximum of indexing latency in milliseconds for trigger expression. |
100 |
{$ELASTICSEARCH.FLUSH_LATENCY.MAX.WARN} | Maximum of flush latency in milliseconds for trigger expression. |
100 |
{$ELASTICSEARCH.HEAP_USED.MAX.WARN} | The maximum percentage of JVM heap in use for the warning trigger expression. |
85 |
{$ELASTICSEARCH.HEAP_USED.MAX.CRIT} | The maximum percentage of JVM heap in use for the critical trigger expression. |
95 |
Name | Description | Type | Key and additional info |
---|---|---|---|
ES: Service status | Checks if the service is running and accepting TCP connections. |
Simple check | net.tcp.service["{$ELASTICSEARCH.SCHEME}","{$ELASTICSEARCH.HOST}","{$ELASTICSEARCH.PORT}"] Preprocessing
|
ES: Service response time | Checks performance of the TCP service. |
Simple check | net.tcp.service.perf["{$ELASTICSEARCH.SCHEME}","{$ELASTICSEARCH.HOST}","{$ELASTICSEARCH.PORT}"] |
ES: Get cluster health | Returns the health status of a cluster. |
HTTP agent | es.cluster.get_health |
ES: Cluster health status | Health status of the cluster, based on the state of its primary and replica shards. Statuses are: green - all shards are assigned; yellow - all primary shards are assigned, but one or more replica shards are unassigned (if a node in the cluster fails, some data could be unavailable until that node is repaired); red - one or more primary shards are unassigned, so some data is unavailable (this can occur briefly during cluster startup as primary shards are assigned). |
Dependent item | es.cluster.status Preprocessing
|
ES: Number of nodes | The number of nodes within the cluster. |
Dependent item | es.cluster.number_of_nodes Preprocessing
|
ES: Number of data nodes | The number of nodes that are dedicated data nodes. |
Dependent item | es.cluster.number_of_data_nodes Preprocessing
|
ES: Number of relocating shards | The number of shards that are under relocation. |
Dependent item | es.cluster.relocating_shards Preprocessing
|
ES: Number of initializing shards | The number of shards that are under initialization. |
Dependent item | es.cluster.initializing_shards Preprocessing
|
ES: Number of unassigned shards | The number of shards that are not allocated. |
Dependent item | es.cluster.unassigned_shards Preprocessing
|
ES: Delayed unassigned shards | The number of shards whose allocation has been delayed by the timeout settings. |
Dependent item | es.cluster.delayed_unassigned_shards Preprocessing
|
ES: Number of pending tasks | The number of cluster-level changes that have not yet been executed. |
Dependent item | es.cluster.number_of_pending_tasks Preprocessing
|
ES: Task max waiting in queue | The time in seconds that the earliest-initiated task has been waiting to be performed. |
Dependent item | es.cluster.task_max_waiting_in_queue Preprocessing
|
ES: Inactive shards percentage | The ratio of inactive shards in the cluster expressed as a percentage. |
Dependent item | es.cluster.inactive_shards_percent_as_number Preprocessing
|
ES: Get cluster stats | Returns cluster statistics. |
HTTP agent | es.cluster.get_stats |
ES: Cluster uptime | Uptime duration in seconds since JVM has last started. |
Dependent item | es.nodes.jvm.max_uptime Preprocessing
|
ES: Number of non-deleted documents | The total number of non-deleted documents across all primary shards assigned to the selected nodes. This number is based on the documents in Lucene segments and may include the documents from nested fields. |
Dependent item | es.indices.docs.count Preprocessing
|
ES: Indices with shards assigned to nodes | The total number of indices with shards assigned to the selected nodes. |
Dependent item | es.indices.count Preprocessing
|
ES: Total size of all file stores | The total size in bytes of all file stores across all selected nodes. |
Dependent item | es.nodes.fs.total_in_bytes Preprocessing
|
ES: Total available size to JVM in all file stores | The total number of bytes available to JVM in the file stores across all selected nodes. Depending on OS or process-level restrictions, this number may be less than nodes.fs.free_in_bytes. This is the actual amount of free disk space the selected Elasticsearch nodes can use. |
Dependent item | es.nodes.fs.available_in_bytes Preprocessing
|
ES: Nodes with the data role | The number of selected nodes with the data role. |
Dependent item | es.nodes.count.data Preprocessing
|
ES: Nodes with the ingest role | The number of selected nodes with the ingest role. |
Dependent item | es.nodes.count.ingest Preprocessing
|
ES: Nodes with the master role | The number of selected nodes with the master role. |
Dependent item | es.nodes.count.master Preprocessing
|
ES: Get nodes stats | Returns cluster nodes statistics. |
HTTP agent | es.nodes.get_stats |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
ES: Service is down | The service is unavailable or does not accept TCP connections. |
last(/Elasticsearch Cluster by HTTP/net.tcp.service["{$ELASTICSEARCH.SCHEME}","{$ELASTICSEARCH.HOST}","{$ELASTICSEARCH.PORT}"])=0 |Average |
Manual close: Yes | |
ES: Service response time is too high | The performance of the TCP service is very low. |
min(/Elasticsearch Cluster by HTTP/net.tcp.service.perf["{$ELASTICSEARCH.SCHEME}","{$ELASTICSEARCH.HOST}","{$ELASTICSEARCH.PORT}"],5m)>{$ELASTICSEARCH.RESPONSE_TIME.MAX.WARN} |Warning |
Manual close: Yes Depends on:
|
|
ES: Health is YELLOW | All primary shards are assigned, but one or more replica shards are unassigned. |
last(/Elasticsearch Cluster by HTTP/es.cluster.status)=1 |Average |
||
ES: Health is RED | One or more primary shards are unassigned, so some data is unavailable. |
last(/Elasticsearch Cluster by HTTP/es.cluster.status)=2 |High |
||
ES: Health is UNKNOWN | The health status of the cluster is unknown or cannot be obtained. |
last(/Elasticsearch Cluster by HTTP/es.cluster.status)=255 |High |
||
ES: The number of nodes within the cluster has decreased | change(/Elasticsearch Cluster by HTTP/es.cluster.number_of_nodes)<0 |Info |
Manual close: Yes | ||
ES: The number of nodes within the cluster has increased | change(/Elasticsearch Cluster by HTTP/es.cluster.number_of_nodes)>0 |Info |
Manual close: Yes | ||
ES: Cluster has the initializing shards | The cluster has had initializing shards for longer than 10 minutes. |
min(/Elasticsearch Cluster by HTTP/es.cluster.initializing_shards,10m)>0 |Average |
||
ES: Cluster has the unassigned shards | The cluster has had unassigned shards for longer than 10 minutes. |
min(/Elasticsearch Cluster by HTTP/es.cluster.unassigned_shards,10m)>0 |Average |
||
ES: Cluster has been restarted | Uptime is less than 10 minutes. |
last(/Elasticsearch Cluster by HTTP/es.nodes.jvm.max_uptime)<10m |Info |
Manual close: Yes | |
ES: Cluster does not have enough space for resharding | There is not enough disk space for index resharding. |
(last(/Elasticsearch Cluster by HTTP/es.nodes.fs.total_in_bytes)-last(/Elasticsearch Cluster by HTTP/es.nodes.fs.available_in_bytes))/(last(/Elasticsearch Cluster by HTTP/es.cluster.number_of_data_nodes)-1)>last(/Elasticsearch Cluster by HTTP/es.nodes.fs.available_in_bytes) |High |
||
ES: Cluster has only two master nodes | The cluster has only two nodes with a master role and will be unavailable if one of them breaks. |
last(/Elasticsearch Cluster by HTTP/es.nodes.count.master)=2 |Disaster |
Name | Description | Type | Key and additional info |
---|---|---|---|
Cluster nodes discovery | Discovers ES cluster nodes. |
HTTP agent | es.nodes.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
ES {#ES.NODE}: Get data | Returns cluster nodes statistics. |
Dependent item | es.node.get.data[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Total size | Total size (in bytes) of all file stores. |
Dependent item | es.node.fs.total.total_in_bytes[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Total available size | The total number of bytes available to this Java virtual machine on all file stores. Depending on OS or process-level restrictions, this might appear less than fs.total.free_in_bytes. This is the actual amount of free disk space the Elasticsearch node can utilize. |
Dependent item | es.node.fs.total.available_in_bytes[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Node uptime | JVM uptime in seconds. |
Dependent item | es.node.jvm.uptime[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Maximum JVM memory available for use | The maximum amount of memory, in bytes, available for use by the heap. |
Dependent item | es.node.jvm.mem.heap_max_in_bytes[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Amount of JVM heap currently in use | The memory, in bytes, currently in use by the heap. |
Dependent item | es.node.jvm.mem.heap_used_in_bytes[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Percent of JVM heap currently in use | The percentage of memory currently in use by the heap. |
Dependent item | es.node.jvm.mem.heap_used_percent[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Amount of JVM heap committed | The amount of memory, in bytes, available for use by the heap. |
Dependent item | es.node.jvm.mem.heap_committed_in_bytes[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Number of open HTTP connections | The number of currently open HTTP connections for the node. |
Dependent item | es.node.http.current_open[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Rate of HTTP connections opened | The number of HTTP connections opened for the node per second. |
Dependent item | es.node.http.opened.rate[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Time spent throttling operations | Time in seconds spent throttling operations for the last measuring span. |
Dependent item | es.node.indices.indexing.throttle_time[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Time spent throttling recovery operations | Time in seconds spent throttling recovery operations for the last measuring span. |
Dependent item | es.node.indices.recovery.throttle_time[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Time spent throttling merge operations | Time in seconds spent throttling merge operations for the last measuring span. |
Dependent item | es.node.indices.merges.total_throttled_time[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Rate of queries | The number of query operations per second. |
Dependent item | es.node.indices.search.query.rate[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Total number of query | The total number of query operations. |
Dependent item | es.node.indices.search.query_total[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Time spent performing query | Time in seconds spent performing query operations for the last measuring span. |
Dependent item | es.node.indices.search.query_time[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Total time spent performing query | Time in milliseconds spent performing query operations. |
Dependent item | es.node.indices.search.query_time_in_millis[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Query latency | The average query latency calculated by sampling the total number of queries and the total elapsed time at regular intervals (see the latency sketch after this table). |
Calculated | es.node.indices.search.query_latency[{#ES.NODE}] |
ES {#ES.NODE}: Current query operations | The number of query operations currently running. |
Dependent item | es.node.indices.search.query_current[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Rate of fetch | The number of fetch operations per second. |
Dependent item | es.node.indices.search.fetch.rate[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Total number of fetch | The total number of fetch operations. |
Dependent item | es.node.indices.search.fetch_total[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Time spent performing fetch | Time in seconds spent performing fetch operations for the last measuring span. |
Dependent item | es.node.indices.search.fetch_time[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Total time spent performing fetch | Time in milliseconds spent performing fetch operations. |
Dependent item | es.node.indices.search.fetch_time_in_millis[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Fetch latency | The average fetch latency calculated by sampling the total number of fetches and the total elapsed time at regular intervals. |
Calculated | es.node.indices.search.fetch_latency[{#ES.NODE}] |
ES {#ES.NODE}: Current fetch operations | The number of fetch operations currently running. |
Dependent item | es.node.indices.search.fetch_current[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Write thread pool executor tasks completed | The number of tasks completed by the write thread pool executor. |
Dependent item | es.node.thread_pool.write.completed.rate[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Write thread pool active threads | The number of active threads in the write thread pool. |
Dependent item | es.node.thread_pool.write.active[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Write thread pool tasks in queue | The number of tasks in queue for the write thread pool. |
Dependent item | es.node.thread_pool.write.queue[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Write thread pool executor tasks rejected | The number of tasks rejected by the write thread pool executor. |
Dependent item | es.node.thread_pool.write.rejected.rate[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Search thread pool executor tasks completed | The number of tasks completed by the search thread pool executor. |
Dependent item | es.node.thread_pool.search.completed.rate[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Search thread pool active threads | The number of active threads in the search thread pool. |
Dependent item | es.node.thread_pool.search.active[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Search thread pool tasks in queue | The number of tasks in queue for the search thread pool. |
Dependent item | es.node.thread_pool.search.queue[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Search thread pool executor tasks rejected | The number of tasks rejected by the search thread pool executor. |
Dependent item | es.node.thread_pool.search.rejected.rate[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Refresh thread pool executor tasks completed | The number of tasks completed by the refresh thread pool executor. |
Dependent item | es.node.thread_pool.refresh.completed.rate[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Refresh thread pool active threads | The number of active threads in the refresh thread pool. |
Dependent item | es.node.thread_pool.refresh.active[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Refresh thread pool tasks in queue | The number of tasks in queue for the refresh thread pool. |
Dependent item | es.node.thread_pool.refresh.queue[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Refresh thread pool executor tasks rejected | The number of tasks rejected by the refresh thread pool executor. |
Dependent item | es.node.thread_pool.refresh.rejected.rate[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Total number of indexing | The total number of indexing operations. |
Dependent item | es.node.indices.indexing.index_total[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Total time spent performing indexing | Total time in milliseconds spent performing indexing operations. |
Dependent item | es.node.indices.indexing.index_time_in_millis[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Indexing latency | The average indexing latency calculated from the available index_total and index_time_in_millis metrics. |
Calculated | es.node.indices.indexing.index_latency[{#ES.NODE}] |
ES {#ES.NODE}: Current indexing operations | The number of indexing operations currently running. |
Dependent item | es.node.indices.indexing.index_current[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Total number of index flushes to disk | The total number of flush operations. |
Dependent item | es.node.indices.flush.total[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Total time spent on flushing indices to disk | Total time in milliseconds spent performing flush operations. |
Dependent item | es.node.indices.flush.total_time_in_millis[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Flush latency | The average flush latency calculated from the available flush.total and flush.total_time_in_millis metrics. |
Calculated | es.node.indices.flush.latency[{#ES.NODE}] |
ES {#ES.NODE}: Rate of index refreshes | The number of refresh operations per second. |
Dependent item | es.node.indices.refresh.rate[{#ES.NODE}] Preprocessing
|
ES {#ES.NODE}: Time spent performing refresh | Time in seconds spent performing refresh operations for the last measuring span. |
Dependent item | es.node.indices.refresh.time[{#ES.NODE}] Preprocessing
|
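The Query latency, Fetch latency, Indexing latency, and Flush latency calculated items above all follow the same pattern: the average per-operation latency over one sampling interval is the change in the total elapsed time counter divided by the change in the operation counter. A minimal sketch of that arithmetic, with made-up sample values:

```python
# Minimal sketch of the latency arithmetic behind the calculated items:
# average latency = delta(total elapsed ms) / delta(operation count) between
# two consecutive samples. The sample values below are made up for illustration.
def average_latency(time_ms_prev, time_ms_now, count_prev, count_now):
    """Average per-operation latency (ms) over one sampling interval."""
    ops = count_now - count_prev
    if ops <= 0:
        return 0.0
    return (time_ms_now - time_ms_prev) / ops

# e.g. query counters sampled one minute apart
print(average_latency(120_000, 126_000, 4_000, 4_100))  # -> 60.0 ms per query
```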
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
ES {#ES.NODE}: has been restarted | Uptime is less than 10 minutes. |
last(/Elasticsearch Cluster by HTTP/es.node.jvm.uptime[{#ES.NODE}])<10m |Info |
Manual close: Yes | |
ES {#ES.NODE}: Percent of JVM heap in use is high | This indicates that the rate of garbage collection isn't keeping up with the rate of garbage creation. |
min(/Elasticsearch Cluster by HTTP/es.node.jvm.mem.heap_used_percent[{#ES.NODE}],1h)>{$ELASTICSEARCH.HEAP_USED.MAX.WARN} |Warning |
Depends on:
|
|
ES {#ES.NODE}: Percent of JVM heap in use is critical | This indicates that the rate of garbage collection isn't keeping up with the rate of garbage creation. |
min(/Elasticsearch Cluster by HTTP/es.node.jvm.mem.heap_used_percent[{#ES.NODE}],1h)>{$ELASTICSEARCH.HEAP_USED.MAX.CRIT} |High |
||
ES {#ES.NODE}: Query latency is too high | If latency exceeds a threshold, look for potential resource bottlenecks, or investigate whether you need to optimize your queries. |
min(/Elasticsearch Cluster by HTTP/es.node.indices.search.query_latency[{#ES.NODE}],5m)>{$ELASTICSEARCH.QUERY_LATENCY.MAX.WARN} |Warning |
||
ES {#ES.NODE}: Fetch latency is too high | The fetch phase should typically take much less time than the query phase. If you notice this metric consistently increasing, |
min(/Elasticsearch Cluster by HTTP/es.node.indices.search.fetch_latency[{#ES.NODE}],5m)>{$ELASTICSEARCH.FETCH_LATENCY.MAX.WARN} |Warning |
||
ES {#ES.NODE}: Write thread pool executor has the rejected tasks | The number of tasks rejected by the write thread pool executor is over 0 for 5m. |
min(/Elasticsearch Cluster by HTTP/es.node.thread_pool.write.rejected.rate[{#ES.NODE}],5m)>0 |Warning |
||
ES {#ES.NODE}: Search thread pool executor has the rejected tasks | The number of tasks rejected by the search thread pool executor is over 0 for 5m. |
min(/Elasticsearch Cluster by HTTP/es.node.thread_pool.search.rejected.rate[{#ES.NODE}],5m)>0 |Warning |
||
ES {#ES.NODE}: Refresh thread pool executor has the rejected tasks | The number of tasks rejected by the refresh thread pool executor is over 0 for 5m. |
min(/Elasticsearch Cluster by HTTP/es.node.thread_pool.refresh.rejected.rate[{#ES.NODE}],5m)>0 |Warning |
||
ES {#ES.NODE}: Indexing latency is too high | If the latency is increasing, it may indicate that you are indexing too many documents at the same time (Elasticsearch's documentation |
min(/Elasticsearch Cluster by HTTP/es.node.indices.indexing.index_latency[{#ES.NODE}],5m)>{$ELASTICSEARCH.INDEXING_LATENCY.MAX.WARN} |Warning |
||
ES {#ES.NODE}: Flush latency is too high | If you see this metric increasing steadily, it may indicate a problem with slow disks; this problem may escalate |
min(/Elasticsearch Cluster by HTTP/es.node.indices.flush.latency[{#ES.NODE}],5m)>{$ELASTICSEARCH.FLUSH_LATENCY.MAX.WARN} |Warning |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template to monitor Docker engine by Zabbix that works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
Template Docker by Zabbix agent 2
— collects metrics by polling zabbix-agent2.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Set up and configure Zabbix agent 2 compiled with the Docker monitoring plugin. The user by which the Zabbix agent 2 is running should have access permissions to the Docker socket.
Test availability: zabbix_get -s docker-host -k docker.info
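As an optional cross-check (not part of the template), you can confirm that the account running Zabbix agent 2 can reach the Docker socket with the Docker SDK for Python; run it as the same user the agent runs under:

```python
# Optional sanity check (not part of the template): confirm that the user
# running Zabbix agent 2 can reach the Docker socket, using the Docker SDK
# for Python. Run it as the same user the agent runs under.
import docker

client = docker.from_env()          # honours DOCKER_HOST / default unix socket
info = client.info()                # same data as the docker.info agent key
print(info["ServerVersion"], info["Containers"], info["ContainersRunning"])
```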
Name | Description | Default |
---|---|---|
{$DOCKER.LLD.FILTER.CONTAINER.MATCHES} | Filter of discoverable containers. |
.* |
{$DOCKER.LLD.FILTER.CONTAINER.NOT_MATCHES} | Filter to exclude discovered containers. |
CHANGE_IF_NEEDED |
{$DOCKER.LLD.FILTER.IMAGE.MATCHES} | Filter of discoverable images. |
.* |
{$DOCKER.LLD.FILTER.IMAGE.NOT_MATCHES} | Filter to exclude discovered images. |
CHANGE_IF_NEEDED |
Name | Description | Type | Key and additional info |
---|---|---|---|
Docker: Ping | Zabbix agent | docker.ping Preprocessing
|
|
Docker: Get info | Zabbix agent | docker.info | |
Docker: Get containers | Zabbix agent | docker.containers | |
Docker: Get images | Zabbix agent | docker.images | |
Docker: Get data_usage | Zabbix agent | docker.data_usage | |
Docker: Containers total | Total number of containers on this host. |
Dependent item | docker.containers.total Preprocessing
|
Docker: Containers running | Total number of containers running on this host. |
Dependent item | docker.containers.running Preprocessing
|
Docker: Containers stopped | Total number of containers stopped on this host. |
Dependent item | docker.containers.stopped Preprocessing
|
Docker: Containers paused | Total number of containers paused on this host. |
Dependent item | docker.containers.paused Preprocessing
|
Docker: Images total | Number of images with intermediate image layers. |
Dependent item | docker.images.total Preprocessing
|
Docker: Storage driver | Docker storage driver. https://docs.docker.com/storage/storagedriver/ |
Dependent item | docker.driver Preprocessing
|
Docker: Memory limit enabled | Dependent item | docker.mem_limit.enabled Preprocessing
|
|
Docker: Swap limit enabled | Dependent item | docker.swap_limit.enabled Preprocessing
|
|
Docker: Kernel memory enabled | Dependent item | docker.kernel_mem.enabled Preprocessing
|
|
Docker: Kernel memory TCP enabled | Dependent item | docker.kernel_mem_tcp.enabled Preprocessing
|
|
Docker: CPU CFS Period enabled | https://docs.docker.com/config/containers/resource_constraints/#configure-the-default-cfs-scheduler |
Dependent item | docker.cpu_cfs_period.enabled Preprocessing
|
Docker: CPU CFS Quota enabled | https://docs.docker.com/config/containers/resource_constraints/#configure-the-default-cfs-scheduler |
Dependent item | docker.cpu_cfs_quota.enabled Preprocessing
|
Docker: CPU Shares enabled | https://docs.docker.com/config/containers/resource_constraints/#configure-the-default-cfs-scheduler |
Dependent item | docker.cpu_shares.enabled Preprocessing
|
Docker: CPU Set enabled | https://docs.docker.com/config/containers/resource_constraints/#configure-the-default-cfs-scheduler |
Dependent item | docker.cpu_set.enabled Preprocessing
|
Docker: Pids limit enabled | Dependent item | docker.pids_limit.enabled Preprocessing
|
|
Docker: IPv4 Forwarding enabled | Dependent item | docker.ipv4_forwarding.enabled Preprocessing
|
|
Docker: Debug enabled | Dependent item | docker.debug.enabled Preprocessing
|
|
Docker: Nfd | Number of used File Descriptors. |
Dependent item | docker.nfd Preprocessing
|
Docker: OomKill disabled | Dependent item | docker.oomkill.disabled Preprocessing
|
|
Docker: Goroutines | Number of goroutines. |
Dependent item | docker.goroutines Preprocessing
|
Docker: Logging driver | Dependent item | docker.logging_driver Preprocessing
|
|
Docker: Cgroup driver | Dependent item | docker.cgroup_driver Preprocessing
|
|
Docker: NEvents listener | Dependent item | docker.nevents_listener Preprocessing
|
|
Docker: Kernel version | Dependent item | docker.kernel_version Preprocessing
|
|
Docker: Operating system | Dependent item | docker.operating_system Preprocessing
|
|
Docker: OS type | Dependent item | docker.os_type Preprocessing
|
|
Docker: Architecture | Dependent item | docker.architecture Preprocessing
|
|
Docker: NCPU | Dependent item | docker.ncpu Preprocessing
|
|
Docker: Memory total | Dependent item | docker.mem.total Preprocessing
|
|
Docker: Docker root dir | Dependent item | docker.root_dir Preprocessing
|
|
Docker: Name | Dependent item | docker.name Preprocessing
|
|
Docker: Server version | Dependent item | docker.server_version Preprocessing
|
|
Docker: Default runtime | Dependent item | docker.default_runtime Preprocessing
|
|
Docker: Live restore enabled | Dependent item | docker.live_restore.enabled Preprocessing
|
|
Docker: Layers size | Dependent item | docker.layers_size Preprocessing
|
|
Docker: Images size | Dependent item | docker.images_size Preprocessing
|
|
Docker: Containers size | Dependent item | docker.containers_size Preprocessing
|
|
Docker: Volumes size | Dependent item | docker.volumes_size Preprocessing
|
|
Docker: Images available | Number of top-level images. |
Dependent item | docker.images.top_level Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Docker: Service is down | last(/Docker by Zabbix agent 2/docker.ping)=0 |Average |
Manual close: Yes | ||
Docker: Failed to fetch info data | Zabbix has not received data for items for the last 30 minutes. |
nodata(/Docker by Zabbix agent 2/docker.name,30m)=1 |Warning |
Manual close: Yes Depends on:
|
|
Docker: Version has changed | Docker version has changed. Acknowledge to close the problem manually. |
last(/Docker by Zabbix agent 2/docker.server_version,#1)<>last(/Docker by Zabbix agent 2/docker.server_version,#2) and length(last(/Docker by Zabbix agent 2/docker.server_version))>0 |Info |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Images discovery | Discovery of images metrics. |
Zabbix agent | docker.images.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Image {#NAME}: Created | Dependent item | docker.image.created["{#ID}"] Preprocessing
|
|
Image {#NAME}: Size | Dependent item | docker.image.size["{#ID}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Containers discovery | Discovery of containers metrics. Parameter: true - returns all containers; false - returns only running containers (see the sketch after this table). |
Zabbix agent | docker.containers.discovery[false] |
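A minimal sketch of what the discovery parameter toggles, using the Docker SDK for Python purely as an illustration (the agent key itself queries the Docker API directly):

```python
# Minimal sketch of the semantics of the discovery parameter, shown with the
# Docker SDK for Python (an assumption used for illustration only):
# all=True lists every container, all=False lists only running containers.
import docker

client = docker.from_env()
all_containers = client.containers.list(all=True)    # like docker.containers.discovery[true]
running_only = client.containers.list(all=False)     # like docker.containers.discovery[false]
print(len(all_containers), len(running_only))
```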
Name | Description | Type | Key and additional info |
---|---|---|---|
Container {#NAME}: Get stats | Get container stats based on resource usage. |
Zabbix agent | docker.container_stats["{#NAME}"] |
Container {#NAME}: CPU total usage per second | Dependent item | docker.container_stats.cpu_usage.total.rate["{#NAME}"] Preprocessing
|
|
Container {#NAME}: CPU percent usage | Dependent item | docker.container_stats.cpu_pct_usage["{#NAME}"] Preprocessing
|
|
Container {#NAME}: CPU kernelmode usage per second | Dependent item | docker.container_stats.cpu_usage.kernel.rate["{#NAME}"] Preprocessing
|
|
Container {#NAME}: CPU usermode usage per second | Dependent item | docker.container_stats.cpu_usage.user.rate["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Online CPUs | Dependent item | docker.container_stats.online_cpus["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Throttling periods | Number of periods with throttling active. |
Dependent item | docker.container_stats.cpu_usage.throttling_periods["{#NAME}"] Preprocessing
|
Container {#NAME}: Throttled periods | Number of periods when the container hits its throttling limit. |
Dependent item | docker.container_stats.cpu_usage.throttled_periods["{#NAME}"] Preprocessing
|
Container {#NAME}: Throttled time | Aggregate time the container was throttled for in nanoseconds. |
Dependent item | docker.container_stats.cpu_usage.throttled_time["{#NAME}"] Preprocessing
|
Container {#NAME}: Memory usage | Dependent item | docker.container_stats.memory.usage["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Memory maximum usage | Dependent item | docker.container_stats.memory.max_usage["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Memory commit bytes | Dependent item | docker.container_stats.memory.commit_bytes["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Memory commit peak bytes | Dependent item | docker.container_stats.memory.commit_peak_bytes["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Memory private working set | Dependent item | docker.container_stats.memory.private_working_set["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Current PIDs count | Current number of PIDs the container has created. |
Dependent item | docker.container_stats.pids_stats.current["{#NAME}"] Preprocessing
|
Container {#NAME}: Networks bytes received per second | Dependent item | docker.networks.rx_bytes["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Networks packets received per second | Dependent item | docker.networks.rx_packets["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Networks errors received per second | Dependent item | docker.networks.rx_errors["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Networks incoming packets dropped per second | Dependent item | docker.networks.rx_dropped["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Networks bytes sent per second | Dependent item | docker.networks.tx_bytes["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Networks packets sent per second | Dependent item | docker.networks.tx_packets["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Networks errors sent per second | Dependent item | docker.networks.tx_errors["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Networks outgoing packets dropped per second | Dependent item | docker.networks.tx_dropped["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Get info | Return low-level information about a container. |
Zabbix agent | docker.container_info["{#NAME}",full] |
Container {#NAME}: Created | Dependent item | docker.container_info.created["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Image | Dependent item | docker.container_info.image["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Restart count | Dependent item | docker.container_info.restart_count["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Status | Dependent item | docker.container_info.state.status["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Health status | Container's |
Dependent item | docker.container_info.state.health["{#NAME}"] Preprocessing
|
Container {#NAME}: Health failing streak | Dependent item | docker.container_info.state.health.failing["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Running | Dependent item | docker.container_info.state.running["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Paused | Dependent item | docker.container_info.state.paused["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Restarting | Dependent item | docker.container_info.state.restarting["{#NAME}"] Preprocessing
|
|
Container {#NAME}: OOMKilled | Dependent item | docker.container_info.state.oomkilled["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Dead | Dependent item | docker.container_info.state.dead["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Pid | Dependent item | docker.container_info.state.pid["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Exit code | Dependent item | docker.container_info.state.exitcode["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Error | Dependent item | docker.container_info.state.error["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Started at | Dependent item | docker.container_info.started["{#NAME}"] Preprocessing
|
|
Container {#NAME}: Finished at | Time at which the container last terminated. |
Dependent item | docker.container_info.finished["{#NAME}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Container {#NAME}: Health state container is unhealthy | Container health state is unhealthy. |
count(/Docker by Zabbix agent 2/docker.container_info.state.health["{#NAME}"],2m,,2)>=2 |High |
||
Container {#NAME}: Container has been stopped with error code | last(/Docker by Zabbix agent 2/docker.container_info.state.exitcode["{#NAME}"])>0 and last(/Docker by Zabbix agent 2/docker.container_info.state.running["{#NAME}"])=0 |Average |
Manual close: Yes | ||
Container {#NAME}: An error has occurred in the container | Container {#NAME} has an error. Acknowledge to close the problem manually. |
last(/Docker by Zabbix agent 2/docker.container_info.state.error["{#NAME}"],#1)<>last(/Docker by Zabbix agent 2/docker.container_info.state.error["{#NAME}"],#2) and length(last(/Docker by Zabbix agent 2/docker.container_info.state.error["{#NAME}"]))>0 |Warning |
Manual close: Yes |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template to monitor Control-M by Zabbix that works without any external scripts.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
This template is intended to be used on Control-M Enterprise Manager instances.
It monitors:
Control-M server by HTTP
template. To use this template, you must set the {$API.TOKEN} and {$API.URI.ENDPOINT} macros.
To access the API token, use one of the following Control-M interfaces:
{$API.URI.ENDPOINT}
- is the Control-M Automation API endpoint for the API requests, including your server IP or DNS address, the Automation API port, and the path.
For example, https://monitored.controlm.instance:8443/automation-api
.
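A minimal sketch of an Automation API request using the two macros; note that both the header name (x-api-key) and the /config/servers path are illustrative assumptions and are not taken from this template definition:

```python
# Minimal sketch (assumptions flagged): query the Control-M Automation API
# with the values you would place in {$API.URI.ENDPOINT} and {$API.TOKEN}.
# Both the header name ("x-api-key") and the "/config/servers" path are
# illustrative assumptions, not taken from this template definition.
import requests

API_ENDPOINT = "https://monitored.controlm.instance:8443/automation-api"
API_TOKEN = "<set the token here>"

resp = requests.get(
    f"{API_ENDPOINT}/config/servers",
    headers={"x-api-key": API_TOKEN},
    timeout=10,
    verify=True,  # point this to a CA bundle if the instance uses a private CA
)
resp.raise_for_status()
print(resp.json())
```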
Name | Description | Default |
---|---|---|
{$API.URI.ENDPOINT} | The API endpoint is a URI - for example, |
<set the api uri endpoint here> |
{$API.TOKEN} | A token to use for API connections. |
<set the token here> |
Name | Description | Type | Key and additional info |
---|---|---|---|
Control-M: Get Control-M servers | Gets a list of servers. |
HTTP agent | controlm.servers |
Control-M: Get SLA services | Gets all the SLA active services. |
HTTP agent | controlm.services |
Name | Description | Type | Key and additional info |
---|---|---|---|
Server discovery | Discovers the Control-M servers. |
Dependent item | controlm.server.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
SLA services discovery | Discovers the SLA services in the Control-M environment. |
Dependent item | controlm.services.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Service [{#SERVICE.NAME}, {#SERVICE.JOB}]: stats | Gets the service statistics. |
Dependent item | service.stats['{#SERVICE.NAME}','{#SERVICE.JOB}'] Preprocessing
|
Service [{#SERVICE.NAME}, {#SERVICE.JOB}]: status | Gets the service status. |
Dependent item | service.status['{#SERVICE.NAME}','{#SERVICE.JOB}'] Preprocessing
|
Service [{#SERVICE.NAME}, {#SERVICE.JOB}]: jobs 'executed' | Gets the number of jobs in the state - |
Dependent item | service.jobs.status['{#SERVICE.NAME}','{#SERVICE.JOB}',executed] Preprocessing
|
Service [{#SERVICE.NAME}, {#SERVICE.JOB}]: jobs 'waitCondition' | Gets the number of jobs in the state - |
Dependent item | service.jobs.status['{#SERVICE.NAME}','{#SERVICE.JOB}',waitCondition] Preprocessing
|
Service [{#SERVICE.NAME}, {#SERVICE.JOB}]: jobs 'waitResource' | Gets the number of jobs in the state - |
Dependent item | service.jobs.status['{#SERVICE.NAME}','{#SERVICE.JOB}',waitResource] Preprocessing
|
Service [{#SERVICE.NAME}, {#SERVICE.JOB}]: jobs 'waitHost' | Gets the number of jobs in the state - |
Dependent item | service.jobs.status['{#SERVICE.NAME}','{#SERVICE.JOB}',waitHost] Preprocessing
|
Service [{#SERVICE.NAME}, {#SERVICE.JOB}]: jobs 'waitWorkload' | Gets the number of jobs in the state - |
Dependent item | service.jobs.status['{#SERVICE.NAME}','{#SERVICE.JOB}',waitWorkload] Preprocessing
|
Service [{#SERVICE.NAME}, {#SERVICE.JOB}]: jobs 'completed' | Gets the number of jobs in the state - |
Dependent item | service.jobs.status['{#SERVICE.NAME}','{#SERVICE.JOB}',completed] Preprocessing
|
Service [{#SERVICE.NAME}, {#SERVICE.JOB}]: jobs 'error' | Gets the number of jobs in the state - |
Dependent item | service.jobs.status['{#SERVICE.NAME}','{#SERVICE.JOB}',error] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Service [{#SERVICE.NAME}, {#SERVICE.JOB}]: status [{ITEM.VALUE}] | The service has encountered an issue. |
last(/Control-M enterprise manager by HTTP/service.status['{#SERVICE.NAME}','{#SERVICE.JOB}'],#1)=0 or last(/Control-M enterprise manager by HTTP/service.status['{#SERVICE.NAME}','{#SERVICE.JOB}'],#1)=10 |Average |
Manual close: Yes | |
Service [{#SERVICE.NAME}, {#SERVICE.JOB}]: status [{ITEM.VALUE}] | The service has finished its job late. |
last(/Control-M enterprise manager by HTTP/service.status['{#SERVICE.NAME}','{#SERVICE.JOB}'],#1)=3 |Warning |
Manual close: Yes | |
Service [{#SERVICE.NAME}, {#SERVICE.JOB}]: jobs in 'error' state | There are services present which are in the state - |
last(/Control-M enterprise manager by HTTP/service.jobs.status['{#SERVICE.NAME}','{#SERVICE.JOB}',error],#1)>0 |Average |
This template is designed to get metrics from the Control-M server using the Control-M Automation API with HTTP agent.
This template monitors server statistics, discovers jobs and agents using Low Level Discovery.
To use this template, macros {$API.TOKEN}, {$API.URI.ENDPOINT}, and {$SERVER.NAME} need to be set.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
This template is primarily intended for use in conjunction with the Control-M enterprise manager by HTTP
template in order to create host prototypes.
It monitors:
However, if you wish to monitor the Control-M server separately with this template, you must set the following macros: {$API.TOKEN}, {$API.URI.ENDPOINT}, and {$SERVER.NAME}.
To access the {$API.TOKEN}
macro, use one of the following interfaces:
{$API.URI.ENDPOINT}
- is the Control-M Automation API endpoint for the API requests, including your server IP or DNS address, the Automation API port, and the path.
For example, https://monitored.controlm.instance:8443/automation-api
.
{$SERVER.NAME}
- is the name of the Control-M server to be monitored.
Name | Description | Default |
---|---|---|
{$SERVER.NAME} | The name of the Control-M server. |
<set the server name here> |
{$API.URI.ENDPOINT} | The API endpoint is a URI - for example, |
<set the api uri endpoint here> |
{$API.TOKEN} | A token to use for API connections. |
<set the token here> |
Name | Description | Type | Key and additional info |
---|---|---|---|
Control-M: Get Control-M server stats | Gets the statistics of the server. |
HTTP agent | controlm.server.stats Preprocessing
|
Control-M: Get jobs | Gets the status of jobs. |
HTTP agent | controlm.jobs |
Control-M: Get agents | Gets agents for the server. |
HTTP agent | controlm.agents |
Control-M: Jobs statistics | Gets the statistics of jobs. |
Dependent item | controlm.jobs.statistics Preprocessing
|
Control-M: Jobs returned | Gets the count of returned jobs. |
Dependent item | controlm.jobs.statistics.returned Preprocessing
|
Control-M: Jobs total | Gets the count of total jobs. |
Dependent item | controlm.jobs.statistics.total Preprocessing
|
Control-M: Server state | Gets the metric of the server state. |
Dependent item | server.state Preprocessing
|
Control-M: Server message | Gets the metric of the server message. |
Dependent item | server.message Preprocessing
|
Control-M: Server version | Gets the metric of the server version. |
Dependent item | server.version Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Control-M: Server is down | The server is down. |
last(/Control-M server by HTTP/server.state)=0 or last(/Control-M server by HTTP/server.state)=10 |High |
||
Control-M: Server disconnected | The server is disconnected. |
last(/Control-M server by HTTP/server.message,#1)="Disconnected" |High |
||
Control-M: Server error | The server has encountered an error. |
last(/Control-M server by HTTP/server.message,#1)<>"Connected" and last(/Control-M server by HTTP/server.message,#1)<>"Disconnected" and last(/Control-M server by HTTP/server.message,#1)<>"" |High |
||
Control-M: Server version has changed | The server version has changed. Acknowledge to close the problem manually. |
last(/Control-M server by HTTP/server.version,#1)<>last(/Control-M server by HTTP/server.version,#2) and length(last(/Control-M server by HTTP/server.version))>0 |Info |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Jobs discovery | Discovers jobs on the server. |
Dependent item | controlm.jobs.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Job [{#JOB.ID}]: stats | Gets the statistics of a job. |
Dependent item | job.stats['{#JOB.ID}'] Preprocessing
|
Job [{#JOB.ID}]: status | Gets the status of a job. |
Dependent item | job.status['{#JOB.ID}'] Preprocessing
|
Job [{#JOB.ID}]: number of runs | Gets the number of runs for a job. |
Dependent item | job.numberOfRuns['{#JOB.ID}'] Preprocessing
|
Job [{#JOB.ID}]: type | Gets the job type. |
Dependent item | job.type['{#JOB.ID}'] Preprocessing
|
Job [{#JOB.ID}]: held status | Gets the held status of a job. |
Dependent item | job.held['{#JOB.ID}'] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Job [{#JOB.ID}]: status [{ITEM.VALUE}] | The job has encountered an issue. |
last(/Control-M server by HTTP/job.status['{#JOB.ID}'],#1)=1 or last(/Control-M server by HTTP/job.status['{#JOB.ID}'],#1)=10 |Warning |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Agent discovery | Discovers agents on the server. |
Dependent item | controlm.agent.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Agent [{#AGENT.NAME}]: stats | Gets the statistics of an agent. |
Dependent item | agent.stats['{#AGENT.NAME}'] Preprocessing
|
Agent [{#AGENT.NAME}]: status | Gets the status of an agent. |
Dependent item | agent.status['{#AGENT.NAME}'] Preprocessing
|
Agent [{#AGENT.NAME}]: version | Gets the version number of an agent. |
Dependent item | agent.version['{#AGENT.NAME}'] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Agent [{#AGENT.NAME}]: status [{ITEM.VALUE}] | The agent has encountered an issue. |
last(/Control-M server by HTTP/agent.status['{#AGENT.NAME}'],#1)=1 or last(/Control-M server by HTTP/agent.status['{#AGENT.NAME}'],#1)=10 |Average |
Manual close: Yes | |
Agent [{#AGENT.NAME}]: status disabled | The agent is disabled. |
last(/Control-M server by HTTP/agent.status['{#AGENT.NAME}'],#1)=2 or last(/Control-M server by HTTP/agent.status['{#AGENT.NAME}'],#1)=3 |Info |
Manual close: Yes | |
Agent [{#AGENT.NAME}]: version has changed | The agent version has changed. Acknowledge to close the problem manually. |
last(/Control-M server by HTTP/agent.version['{#AGENT.NAME}'],#1)<>last(/Control-M server by HTTP/agent.version['{#AGENT.NAME}'],#2) |Info |
Manual close: Yes | |
Agent [{#AGENT.NAME}]: unknown version | The agent version is unknown. |
last(/Control-M server by HTTP/agent.version['{#AGENT.NAME}'],#1)="Unknown" |Warning |
Manual close: Yes |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template to monitor HashiCorp Consul by Zabbix that works without any external scripts.
Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
Template HashiCorp Consul Cluster by HTTP
— collects metrics by HTTP agent from API endpoints.
You can find more information about the metrics in the official documentation.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
The template needs to use authorization via an API token.
Don't forget to change macros {$CONSUL.CLUSTER.URL}, {$CONSUL.TOKEN}.
Also, see the Macros section for a list of macros used to set trigger values.
This template supports Consul namespaces. You can set the {$CONSUL.NAMESPACE} macro if you are interested in only one service namespace. Do not specify this macro to get all services.
If you use the Open Source version, leave this macro empty.
NOTE. Some metrics may not be collected depending on your HashiCorp Consul instance version and configuration.
NOTE. You may also be interested in the Envoy Proxy by HTTP template.
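To verify the macro values, you can call the same Consul HTTP API status and catalog endpoints the cluster items rely on. A minimal sketch, assuming the defaults listed below:

```python
# Minimal sketch: the same Consul HTTP API calls the cluster items rely on,
# using the values you would put in {$CONSUL.CLUSTER.URL} and {$CONSUL.TOKEN}.
import requests

CLUSTER_URL = "http://localhost:8500"        # {$CONSUL.CLUSTER.URL}
HEADERS = {"X-Consul-Token": "<PUT YOUR AUTH TOKEN>"}

leader = requests.get(f"{CLUSTER_URL}/v1/status/leader", headers=HEADERS, timeout=5).json()
peers = requests.get(f"{CLUSTER_URL}/v1/status/peers", headers=HEADERS, timeout=5).json()
nodes = requests.get(f"{CLUSTER_URL}/v1/catalog/nodes", headers=HEADERS, timeout=5).json()

print("leader:", leader)
print("raft peers:", len(peers))
print("catalog nodes:", len(nodes))
```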
Name | Description | Default |
---|---|---|
{$CONSUL.CLUSTER.URL} | Consul cluster URL. |
http://localhost:8500 |
{$CONSUL.TOKEN} | Consul auth token. |
<PUT YOUR AUTH TOKEN> |
{$CONSUL.NAMESPACE} | Consul service namespace. Enterprise only, in case of Open Source version leave this macro empty. Do not specify this macro to get all of services. |
|
{$CONSUL.API.SCHEME} | Consul API scheme. Used in the node LLD. |
http |
{$CONSUL.API.PORT} | Consul API port. Used in the node LLD. |
8500 |
{$CONSUL.LLD.FILTER.NODE_NAME.MATCHES} | Filter of discoverable nodes. |
.* |
{$CONSUL.LLD.FILTER.NODE_NAME.NOT_MATCHES} | Filter to exclude discovered nodes. |
CHANGE IF NEEDED |
{$CONSUL.LLD.FILTER.SERVICE_NAME.MATCHES} | Filter of discoverable services. |
.* |
{$CONSUL.LLD.FILTER.SERVICE_NAME.NOT_MATCHES} | Filter to exclude discovered services. |
CHANGE IF NEEDED |
{$CONSUL.SERVICE_NODES.CRITICAL.MAX.AVG} | Maximum number of service nodes in status 'critical' for trigger expression. Can be used with context. |
0 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Consul cluster: Cluster leader | Current leader address. |
HTTP agent | consul.get_leader Preprocessing
|
Consul cluster: Nodes: peers | The number of Raft peers for the datacenter in which the agent is running. |
HTTP agent | consul.get_peers Preprocessing
|
Consul cluster: Get nodes | Catalog of nodes registered in a given datacenter. |
HTTP agent | consul.get_nodes Preprocessing
|
Consul cluster: Get nodes Serf health status | Get Serf Health Status for all agents in cluster. |
HTTP agent | consul.get_cluster_serf Preprocessing
|
Consul: Nodes: total | Number of nodes on current dc. |
Dependent item | consul.nodes_total Preprocessing
|
Consul: Nodes: passing | Number of agents on current dc with serf health status 'passing'. |
Dependent item | consul.nodes_passing Preprocessing
|
Consul: Nodes: critical | Number of agents on current dc with serf health status 'critical'. |
Dependent item | consul.nodes_critical Preprocessing
|
Consul: Nodes: warning | Number of agents on current dc with serf health status 'warning'. |
Dependent item | consul.nodes_warning Preprocessing
|
Consul cluster: Get services | Catalog of services registered in a given datacenter. |
HTTP agent | consul.get_catalog_services Preprocessing
|
Consul: Services: total | Number of services on current dc. |
Dependent item | consul.services_total Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Consul cluster: Leader has been changed | Consul cluster leader has been changed. Acknowledge to close the problem manually. |
last(/HashiCorp Consul Cluster by HTTP/consul.get_leader,#1)<>last(/HashiCorp Consul Cluster by HTTP/consul.get_leader,#2) and length(last(/HashiCorp Consul Cluster by HTTP/consul.get_leader))>0 |Info |
Manual close: Yes | |
Consul: One or more nodes in cluster in 'critical' state | One or more agents on current dc with serf health status 'critical'. |
last(/HashiCorp Consul Cluster by HTTP/consul.nodes_critical)>0 |Average |
||
Consul: One or more nodes in cluster in 'warning' state | One or more agents on current dc with serf health status 'warning'. |
last(/HashiCorp Consul Cluster by HTTP/consul.nodes_warning)>0 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Consul cluster nodes discovery | Dependent item | consul.lld_nodes Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Consul: Node ["{#NODE_NAME}"]: Serf Health | Node Serf Health Status. |
Dependent item | consul.serf.health["{#NODE_NAME}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Consul cluster services discovery | Dependent item | consul.lld_services Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Consul: Service ["{#SERVICE_NAME}"]: Nodes passing | The number of nodes with service status |
Dependent item | consul.service.nodes_passing["{#SERVICE_NAME}"] Preprocessing
|
Consul: Service ["{#SERVICE_NAME}"]: Nodes warning | The number of nodes with service status |
Dependent item | consul.service.nodes_warning["{#SERVICE_NAME}"] Preprocessing
|
Consul: Service ["{#SERVICE_NAME}"]: Nodes critical | The number of nodes with service status |
Dependent item | consul.service.nodes_critical["{#SERVICE_NAME}"] Preprocessing
|
Consul cluster: ["{#SERVICE_NAME}"]: Get raw service state | Retrieve service instances providing the service indicated on the path. |
HTTP agent | consul.get_service_stats["{#SERVICE_NAME}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Consul: Service ["{#SERVICE_NAME}"]: Too many nodes with service status 'critical' | One or more nodes with service status 'critical'. |
last(/HashiCorp Consul Cluster by HTTP/consul.service.nodes_critical["{#SERVICE_NAME}"])>{$CONSUL.CLUSTER.SERVICE_NODES.CRITICAL.MAX.AVG:"{#SERVICE_NAME}"} |Average |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template to monitor HashiCorp Consul by Zabbix that works without any external scripts.
Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
Do not forget to enable the Prometheus format for exported metrics.
See documentation.
You can find more information about the metrics in the official documentation.
Template HashiCorp Consul Node by HTTP
— collects metrics by HTTP agent from /v1/agent/metrics endpoint.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Internal service metrics are collected from the /v1/agent/metrics endpoint. Do not forget to enable the Prometheus format for exported metrics. See the documentation. The template needs to use authorization via an API token.
Don't forget to change macros {$CONSUL.NODE.API.URL}, {$CONSUL.TOKEN}.
Also, see the Macros section for a list of macros used to set trigger values.
You can find more information about the metrics in the official documentation.
This template supports Consul namespaces. You can set the {$CONSUL.LLD.FILTER.SERVICE_NAMESPACE.MATCHES} and {$CONSUL.LLD.FILTER.SERVICE_NAMESPACE.NOT_MATCHES} macros if you want to filter discovered services by namespace.
In the case of the Open Source version, the service namespace will be set to 'None'.
NOTE. Some metrics may not be collected depending on your HashiCorp Consul instance version and configuration.
NOTE. You maybe are interested in Envoy Proxy by HTTP template.
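Before linking the template, you can check the API URL and token with a request similar to the one issued by the HTTP agent item. This is a minimal sketch in Python; the URL and token values are placeholders standing in for the {$CONSUL.NODE.API.URL} and {$CONSUL.TOKEN} macros.

import requests

CONSUL_URL = "http://localhost:8500"      # assumption: value you plan to put in {$CONSUL.NODE.API.URL}
CONSUL_TOKEN = "<PUT YOUR AUTH TOKEN>"    # ACL token with read access to agent metrics

resp = requests.get(
    f"{CONSUL_URL}/v1/agent/metrics",
    params={"format": "prometheus"},      # requires the Prometheus format to be enabled in telemetry
    headers={"X-Consul-Token": CONSUL_TOKEN},
    timeout=5,
)
resp.raise_for_status()

# Print the first few lines to confirm Prometheus-formatted metrics are returned.
for line in resp.text.splitlines()[:10]:
    print(line)

If the request returns 403, check the token's ACL permissions; if it returns plain JSON instead of Prometheus text, the telemetry configuration does not allow the Prometheus format.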
Name | Description | Default |
---|---|---|
{$CONSUL.NODE.API.URL} | Consul instance URL. |
http://localhost:8500 |
{$CONSUL.TOKEN} | Consul auth token. |
<PUT YOUR AUTH TOKEN> |
{$CONSUL.OPEN.FDS.MAX.WARN} | Maximum percentage of used file descriptors. |
90 |
{$CONSUL.LLD.FILTER.LOCALSERVICENAME.MATCHES} | Filter of discoverable services on the local node. |
.* |
{$CONSUL.LLD.FILTER.LOCALSERVICENAME.NOT_MATCHES} | Filter to exclude discovered services on the local node. |
CHANGE IF NEEDED |
{$CONSUL.LLD.FILTER.SERVICE_NAMESPACE.MATCHES} | Filter of discoverable services by namespace on the local node. Enterprise only; in case of the Open Source version, the namespace will be set to 'None'. |
.* |
{$CONSUL.LLD.FILTER.SERVICE_NAMESPACE.NOT_MATCHES} | Filter to exclude discovered services by namespace on the local node. Enterprise only; in case of the Open Source version, the namespace will be set to 'None'. |
CHANGE IF NEEDED |
{$CONSUL.NODE.HEALTH_SCORE.MAX.WARN} | Maximum acceptable value of node's health score for WARNING trigger expression. |
2 |
{$CONSUL.NODE.HEALTH_SCORE.MAX.HIGH} | Maximum acceptable value of node's health score for AVERAGE trigger expression. |
4 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Consul: Get instance metrics | Get raw metrics from Consul instance /metrics endpoint. |
HTTP agent | consul.get_metrics Preprocessing
|
Consul: Get node info | Get configuration and member information of the local agent. |
HTTP agent | consul.getnodeinfo Preprocessing
|
Consul: Role | Role of current Consul agent. |
Dependent item | consul.role Preprocessing
|
Consul: Version | Version of Consul agent. |
Dependent item | consul.version Preprocessing
|
Consul: Number of services | Number of services on current node. |
Dependent item | consul.services_number Preprocessing
|
Consul: Number of checks | Number of checks on current node. |
Dependent item | consul.checks_number Preprocessing
|
Consul: Number of check monitors | Number of check monitors on current node. |
Dependent item | consul.checkmonitorsnumber Preprocessing
|
Consul: Process CPU seconds, total | Total user and system CPU time spent in seconds. |
Dependent item | consul.cpusecondstotal.rate Preprocessing
|
Consul: Virtual memory size | Virtual memory size in bytes. |
Dependent item | consul.virtualmemorybytes Preprocessing
|
Consul: RSS memory usage | Resident memory size in bytes. |
Dependent item | consul.residentmemorybytes Preprocessing
|
Consul: Goroutine count | The number of Goroutines on Consul instance. |
Dependent item | consul.goroutines Preprocessing
|
Consul: Open file descriptors | Number of open file descriptors. |
Dependent item | consul.processopenfds Preprocessing
|
Consul: Open file descriptors, max | Maximum number of open file descriptors. |
Dependent item | consul.processmaxfds Preprocessing
|
Consul: Client RPC, per second | The number of times per second that a Consul agent in client mode makes an RPC request to a Consul server. This gives a measure of how much a given agent is loading the Consul servers. This is only generated by agents in client mode, not Consul servers. |
Dependent item | consul.client_rpc Preprocessing
|
Consul: Client RPC failed, per second | The number of times per second that a Consul agent in client mode makes an RPC request to a Consul server and fails. |
Dependent item | consul.clientrpcfailed Preprocessing
|
Consul: TCP connections, accepted per second | This metric counts the number of times a Consul agent has accepted an incoming TCP stream connection per second. |
Dependent item | consul.memberlist.tcp_accept Preprocessing
|
Consul: TCP connections, per second | This metric counts the number of times a Consul agent has initiated a push/pull sync with another agent per second. |
Dependent item | consul.memberlist.tcp_connect Preprocessing
|
Consul: TCP send bytes, per second | This metric measures the total number of bytes sent by a Consul agent through the TCP protocol per second. |
Dependent item | consul.memberlist.tcp_sent Preprocessing
|
Consul: UDP received bytes, per second | This metric measures the total number of bytes received by a Consul agent through the UDP protocol per second. |
Dependent item | consul.memberlist.udp_received Preprocessing
|
Consul: UDP sent bytes, per second | This metric measures the total number of bytes sent by a Consul agent through the UDP protocol per second. |
Dependent item | consul.memberlist.udp_sent Preprocessing
|
Consul: GC pause, p90 | The 90 percentile of the time consumed by stop-the-world garbage collection (GC) pauses since Consul started, in milliseconds. |
Dependent item | consul.gc_pause.p90 Preprocessing
|
Consul: GC pause, p50 | The 50 percentile (median) of the time consumed by stop-the-world garbage collection (GC) pauses since Consul started, in milliseconds. |
Dependent item | consul.gc_pause.p50 Preprocessing
|
Consul: Memberlist: degraded | This metric counts the number of times the Consul agent has performed failure detection on another agent at a slower probe rate. The agent uses its own health metric as an indicator to perform this action. If its health score is low, it means that the node is healthy, and vice versa. |
Dependent item | consul.memberlist.degraded Preprocessing
|
Consul: Memberlist: health score | This metric describes a node's perception of its own health based on how well it is meeting the soft real-time requirements of the protocol. This metric ranges from 0 to 8, where 0 indicates "totally healthy". |
Dependent item | consul.memberlist.health_score Preprocessing
|
Consul: Memberlist: gossip, p90 | The 90 percentile for the number of gossips (messages) broadcasted to a set of randomly selected nodes. |
Dependent item | consul.memberlist.dispatch_log.p90 Preprocessing
|
Consul: Memberlist: gossip, p50 | The 50 percentile (median) for the number of gossips (messages) broadcasted to a set of randomly selected nodes. |
Dependent item | consul.memberlist.gossip.p50 Preprocessing
|
Consul: Memberlist: msg alive | This metric counts the number of alive Consul agents, that the agent has mapped out so far, based on the message information given by the network layer. |
Dependent item | consul.memberlist.msg.alive Preprocessing
|
Consul: Memberlist: msg dead | This metric counts the number of times a Consul agent has marked another agent to be a dead node. |
Dependent item | consul.memberlist.msg.dead Preprocessing
|
Consul: Memberlist: msg suspect | The number of times a Consul agent suspects another as failed while probing during gossip protocol. |
Dependent item | consul.memberlist.msg.suspect Preprocessing
|
Consul: Memberlist: probe node, p90 | The 90 percentile for the time taken to perform a single round of failure detection on a select Consul agent. |
Dependent item | consul.memberlist.probe_node.p90 Preprocessing
|
Consul: Memberlist: probe node, p50 | The 50 percentile (median) for the time taken to perform a single round of failure detection on a select Consul agent. |
Dependent item | consul.memberlist.probe_node.p50 Preprocessing
|
Consul: Memberlist: push pull node, p90 | The 90 percentile for the number of Consul agents that have exchanged state with this agent. |
Dependent item | consul.memberlist.pushpullnode.p90 Preprocessing
|
Consul: Memberlist: push pull node, p50 | The 50 percentile (median) for the number of Consul agents that have exchanged state with this agent. |
Dependent item | consul.memberlist.pushpullnode.p50 Preprocessing
|
Consul: KV store: apply, p90 | The 90 percentile for the time it takes to complete an update to the KV store. |
Dependent item | consul.kvs.apply.p90 Preprocessing
|
Consul: KV store: apply, p50 | The 50 percentile (median) for the time it takes to complete an update to the KV store. |
Dependent item | consul.kvs.apply.p50 Preprocessing
|
Consul: KV store: apply, rate | The number of updates to the KV store per second. |
Dependent item | consul.kvs.apply.rate Preprocessing
|
Consul: Serf member: flap, rate | Increments when an agent is marked dead and then recovers within a short time period. This can be an indicator of overloaded agents, network problems, or configuration errors where agents cannot connect to each other on the required ports. Shown as events per second. |
Dependent item | consul.serf.member.flap.rate Preprocessing
|
Consul: Serf member: failed, rate | Increments when an agent is marked dead. This can be an indicator of overloaded agents, network problems, or configuration errors where agents cannot connect to each other on the required ports. Shown as events per second. |
Dependent item | consul.serf.member.failed.rate Preprocessing
|
Consul: Serf member: join, rate | Increments when an agent joins the cluster. If an agent flapped or failed this counter also increments when it re-joins. Shown as events per second. |
Dependent item | consul.serf.member.join.rate Preprocessing
|
Consul: Serf member: left, rate | Increments when an agent leaves the cluster. Shown as events per second. |
Dependent item | consul.serf.member.left.rate Preprocessing
|
Consul: Serf member: update, rate | Increments when a Consul agent updates. Shown as events per second. |
Dependent item | consul.serf.member.update.rate Preprocessing
|
Consul: ACL: resolves, rate | The number of ACL resolves per second. |
Dependent item | consul.acl.resolves.rate Preprocessing
|
Consul: Catalog: register, rate | The number of catalog register operations per second. |
Dependent item | consul.catalog.register.rate Preprocessing
|
Consul: Catalog: deregister, rate | The number of catalog deregister operations per second. |
Dependent item | consul.catalog.deregister.rate Preprocessing
|
Consul: Snapshot: append line, p90 | The 90 percentile for the time taken by the Consul agent to append an entry into the existing log. |
Dependent item | consul.snapshot.append_line.p90 Preprocessing
|
Consul: Snapshot: append line, p50 | The 50 percentile (median) for the time taken by the Consul agent to append an entry into the existing log. |
Dependent item | consul.snapshot.append_line.p50 Preprocessing
|
Consul: Snapshot: append line, rate | The number of snapshot appendLine operations per second. |
Dependent item | consul.snapshot.append_line.rate Preprocessing
|
Consul: Snapshot: compact, p90 | The 90 percentile for the time taken by the Consul agent to compact a log. This operation occurs only when the snapshot becomes large enough to justify the compaction. |
Dependent item | consul.snapshot.compact.p90 Preprocessing
|
Consul: Snapshot: compact, p50 | The 50 percentile (median) for the time taken by the Consul agent to compact a log. This operation occurs only when the snapshot becomes large enough to justify the compaction. |
Dependent item | consul.snapshot.compact.p50 Preprocessing
|
Consul: Snapshot: compact, rate | The number of snapshot compact operations per second. |
Dependent item | consul.snapshot.compact.rate Preprocessing
|
Consul: Get local services | Get all the services that are registered with the local agent and their status. |
Script | consul.getlocalservices |
Consul: Get local services check | Data collection check. |
Dependent item | consul.getlocalservices.check Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Consul: Version has been changed | Consul version has changed. Acknowledge to close the problem manually. |
last(/HashiCorp Consul Node by HTTP/consul.version,#1)<>last(/HashiCorp Consul Node by HTTP/consul.version,#2) and length(last(/HashiCorp Consul Node by HTTP/consul.version))>0 |Info |
Manual close: Yes | |
Consul: Current number of open files is too high | Heavy file descriptor usage (i.e., near the process's file descriptor limit) indicates a potential file descriptor exhaustion issue. |
min(/HashiCorp Consul Node by HTTP/consul.process_open_fds,5m)/last(/HashiCorp Consul Node by HTTP/consul.process_max_fds)*100>{$CONSUL.OPEN.FDS.MAX.WARN} |Warning |
||
Consul: Node's health score is warning | This metric ranges from 0 to 8, where 0 indicates "totally healthy". |
max(/HashiCorp Consul Node by HTTP/consul.memberlist.health_score,#3)>{$CONSUL.NODE.HEALTH_SCORE.MAX.WARN} |Warning |
Depends on:
|
|
Consul: Node's health score is critical | This metric ranges from 0 to 8, where 0 indicates "totally healthy". |
max(/HashiCorp Consul Node by HTTP/consul.memberlist.health_score,#3)>{$CONSUL.NODE.HEALTH_SCORE.MAX.HIGH} |Average |
||
Consul: Failed to get local services | Failed to get local services. Check debug log for more information. |
length(last(/HashiCorp Consul Node by HTTP/consul.get_local_services.check))>0 |Warning |
Name | Description | Type | Key and additional info |
---|---|---|---|
Local node services discovery | Discover metrics for services that are registered with the local agent. |
Dependent item | consul.nodeserviceslld Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Consul: ["{#SERVICE_NAME}"]: Aggregated status | Aggregated values of all health checks for the service instance. |
Dependent item | consul.service.aggregatedstate["{#SERVICEID}"] Preprocessing
|
Consul: ["{#SERVICENAME}"]: Check ["{#SERVICECHECK_NAME}"]: Status | Current state of health check for the service. |
Dependent item | consul.service.check.state["{#SERVICEID}/{#SERVICECHECK_ID}"] Preprocessing
|
Consul: ["{#SERVICENAME}"]: Check ["{#SERVICECHECK_NAME}"]: Output | Current output of health check for the service. |
Dependent item | consul.service.check.output["{#SERVICEID}/{#SERVICECHECK_ID}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Consul: Aggregated status is 'warning' | Aggregated state of service on the local agent is 'warning'. |
last(/HashiCorp Consul Node by HTTP/consul.service.aggregated_state["{#SERVICE_ID}"]) = 1 |Warning |
||
Consul: Aggregated status is 'critical' | Aggregated state of service on the local agent is 'critical'. |
last(/HashiCorp Consul Node by HTTP/consul.service.aggregated_state["{#SERVICE_ID}"]) = 2 |Average |
Name | Description | Type | Key and additional info |
---|---|---|---|
HTTP API methods discovery | Discovery of HTTP API method-specific metrics. |
Dependent item | consul.httpapidiscovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Consul: HTTP request: ["{#HTTP_METHOD}"], p90 | The 90 percentile of how long it takes to service the given HTTP request for the given verb. |
Dependent item | consul.http.api.p90["{#HTTP_METHOD}"] Preprocessing
|
Consul: HTTP request: ["{#HTTP_METHOD}"], p50 | The 50 percentile (median) of how long it takes to service the given HTTP request for the given verb. |
Dependent item | consul.http.api.p50["{#HTTP_METHOD}"] Preprocessing
|
Consul: HTTP request: ["{#HTTP_METHOD}"], rate | The number of HTTP request for the given verb per second. |
Dependent item | consul.http.api.rate["{#HTTP_METHOD}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Raft server metrics discovery | Discover raft metrics for server nodes. |
Dependent item | consul.raft.server.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Consul: Raft state | Current state of Consul agent. |
Dependent item | consul.raft.state[{#SINGLETON}] Preprocessing
|
Consul: Raft state: leader | Increments when a server becomes a leader. |
Dependent item | consul.raft.state_leader[{#SINGLETON}] Preprocessing
|
Consul: Raft state: candidate | The number of initiated leader elections. |
Dependent item | consul.raft.state_candidate[{#SINGLETON}] Preprocessing
|
Consul: Raft: apply, rate | Incremented whenever a leader first passes a message into the Raft commit process (called an Apply operation). This metric describes the arrival rate of new logs into Raft per second. |
Dependent item | consul.raft.apply.rate[{#SINGLETON}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Raft leader metrics discovery | Discover raft metrics for leader nodes. |
Dependent item | consul.raft.leader.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Consul: Raft state: leader last contact, p90 | The 90 percentile of how long it takes a leader node to communicate with followers during a leader lease check, in milliseconds. |
Dependent item | consul.raft.leaderlastcontact.p90[{#SINGLETON}] Preprocessing
|
Consul: Raft state: leader last contact, p50 | The 50 percentile (median) of how long it takes a leader node to communicate with followers during a leader lease check, in milliseconds. |
Dependent item | consul.raft.leaderlastcontact.p50[{#SINGLETON}] Preprocessing
|
Consul: Raft state: commit time, p90 | The 90 percentile time it takes to commit a new entry to the raft log on the leader, in milliseconds. |
Dependent item | consul.raft.commit_time.p90[{#SINGLETON}] Preprocessing
|
Consul: Raft state: commit time, p50 | The 50 percentile (median) time it takes to commit a new entry to the raft log on the leader, in milliseconds. |
Dependent item | consul.raft.commit_time.p50[{#SINGLETON}] Preprocessing
|
Consul: Raft state: dispatch log, p90 | The 90 percentile time it takes for the leader to write log entries to disk, in milliseconds. |
Dependent item | consul.raft.dispatch_log.p90[{#SINGLETON}] Preprocessing
|
Consul: Raft state: dispatch log, p50 | The 50 percentile (median) time it takes for the leader to write log entries to disk, in milliseconds. |
Dependent item | consul.raft.dispatch_log.p50[{#SINGLETON}] Preprocessing
|
Consul: Raft state: dispatch log, rate | The number of times a Raft leader writes a log to disk per second. |
Dependent item | consul.raft.dispatch_log.rate[{#SINGLETON}] Preprocessing
|
Consul: Raft state: commit, rate | The number of new entries committed to the Raft log on the leader per second. |
Dependent item | consul.raft.commit_time.rate[{#SINGLETON}] Preprocessing
|
Consul: Autopilot healthy | Tracks the overall health of the local server cluster. 1 if all servers are healthy, 0 if one or more are unhealthy. |
Dependent item | consul.autopilot.healthy[{#SINGLETON}] Preprocessing
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for the effortless deployment of Cloudflare monitoring by Zabbix via HTTP and doesn't require any external scripts.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
1. Create a host, for example mywebsite.com, for a site in your Cloudflare account.
2. Link the template to the host.
3. Customize the values of the {$CLOUDFLARE.API.TOKEN} and {$CLOUDFLARE.ZONE_ID} macros.
Cloudflare API Tokens are available in your Cloudflare account under My Profile > API Tokens.
Zone ID is available in your Cloudflare account under Account Home > Site.
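Before setting the macros, you can verify the API token itself against Cloudflare's token verification endpoint. This is a minimal sketch in Python; the base URL matches the {$CLOUDFLARE.API.URL} default, and the token value is a placeholder for {$CLOUDFLARE.API.TOKEN}.

import requests

API_URL = "https://api.cloudflare.com/client/v4"   # default of {$CLOUDFLARE.API.URL}
API_TOKEN = "<change>"                              # the value you plan to use for {$CLOUDFLARE.API.TOKEN}

# Cloudflare API tokens are sent as a Bearer token.
resp = requests.get(
    f"{API_URL}/user/tokens/verify",
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    timeout=3,
)
data = resp.json()
print(data.get("success"), data.get("messages"))    # expect success == True for a valid, active token

If the call reports success, the same token can be placed into the macro; the Zone ID is checked separately by the template's own requests.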
Name | Description | Default |
---|---|---|
{$CLOUDFLARE.API.URL} | The URL of Cloudflare API endpoint. |
https://api.cloudflare.com/client/v4 |
{$CLOUDFLARE.API.TOKEN} | Your Cloudflare API Token. |
<change> |
{$CLOUDFLARE.ZONE_ID} | Your Cloudflare Site Zone ID. |
<change> |
{$CLOUDFLARE.GET_DATA.TIMEOUT} | Response timeout for Cloudflare API. |
3s |
{$CLOUDFLARE.ERRORS.MAX.WARN} | Maximum responses with errors in %. |
30 |
{$CLOUDFLARE.CACHED_BANDWIDTH.MIN.WARN} | Minimum of cached bandwidth in %. |
50 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Cloudflare: Total bandwidth | The volume of all data. |
Dependent item | cloudflare.bandwidth.all Preprocessing
|
Cloudflare: Cached bandwidth | The volume of cached data. |
Dependent item | cloudflare.bandwidth.cached Preprocessing
|
Cloudflare: Uncached bandwidth | The volume of uncached data. |
Dependent item | cloudflare.bandwidth.uncached Preprocessing
|
Cloudflare: Cache hit ratio of bandwidth | The ratio of cached bandwidth to total bandwidth, in percent. |
Dependent item | cloudflare.bandwidth.cachehitratio Preprocessing
|
Cloudflare: SSL encrypted bandwidth | The volume of encrypted data. |
Dependent item | cloudflare.bandwidth.ssl.encrypted Preprocessing
|
Cloudflare: Unencrypted bandwidth | The volume of unencrypted data. |
Dependent item | cloudflare.bandwidth.ssl.unencrypted Preprocessing
|
Cloudflare: DNS queries | The amount of all DNS queries. |
Dependent item | cloudflare.dns.query.all Preprocessing
|
Cloudflare: Stale DNS queries | The number of stale DNS queries. |
Dependent item | cloudflare.dns.query.stale Preprocessing
|
Cloudflare: Uncached DNS queries | The number of uncached DNS queries. |
Dependent item | cloudflare.dns.query.uncached Preprocessing
|
Cloudflare: Get data | The JSON with result of Cloudflare API request. |
Script | cloudflare.get |
Cloudflare: Total page views | The amount of all pageviews. |
Dependent item | cloudflare.pageviews.all Preprocessing
|
Cloudflare: Total requests | The amount of all requests. |
Dependent item | cloudflare.requests.all Preprocessing
|
Cloudflare: Cached requests | The number of cached requests. |
Dependent item | cloudflare.requests.cached Preprocessing
|
Cloudflare: Uncached requests | The number of uncached requests. |
Dependent item | cloudflare.requests.uncached Preprocessing
|
Cloudflare: Cache hit ratio % over time | The ratio of cached requests to all requests, in percent. |
Dependent item | cloudflare.requests.cachehitratio Preprocessing
|
Cloudflare: Response codes 1xx | The number of requests with 1xx response codes. |
Dependent item | cloudflare.requests.response_100 Preprocessing
|
Cloudflare: Response codes 2xx | The number of requests with 2xx response codes. |
Dependent item | cloudflare.requests.response_200 Preprocessing
|
Cloudflare: Response codes 3xx | The number of requests with 3xx response codes. |
Dependent item | cloudflare.requests.response_300 Preprocessing
|
Cloudflare: Response codes 4xx | The number of requests with 4xx response codes. |
Dependent item | cloudflare.requests.response_400 Preprocessing
|
Cloudflare: Response codes 5xx | The number of requests with 5xx response codes. |
Dependent item | cloudflare.requests.response_500 Preprocessing
|
Cloudflare: Non-2xx responses ratio | The ratio of requests with non-2xx response codes to all requests, in percent. |
Dependent item | cloudflare.requests.others_ratio Preprocessing
|
Cloudflare: 2xx responses ratio | The ratio of requests with 2xx response codes to all requests, in percent. |
Dependent item | cloudflare.requests.success_ratio Preprocessing
|
Cloudflare: SSL encrypted requests | The number of encrypted requests. |
Dependent item | cloudflare.requests.ssl.encrypted Preprocessing
|
Cloudflare: Unencrypted requests | The number of unencrypted requests. |
Dependent item | cloudflare.requests.ssl.unencrypted Preprocessing
|
Cloudflare: Total threats | The number of all threats. |
Dependent item | cloudflare.threats.all Preprocessing
|
Cloudflare: Unique visitors | The number of unique visitor IPs. |
Dependent item | cloudflare.uniques.all Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Cloudflare: Cached bandwidth is too low | max(/Cloudflare by HTTP/cloudflare.bandwidth.cache_hit_ratio,#3) < {$CLOUDFLARE.CACHED_BANDWIDTH.MIN.WARN} |Warning |
|||
Cloudflare: Ratio of non-2xx responses is too high | A large number of errors can indicate a malfunction of the site. |
min(/Cloudflare by HTTP/cloudflare.requests.others_ratio,#3) > {$CLOUDFLARE.ERRORS.MAX.WARN} |Average |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template to monitor a TLS/SSL certificate on a website by Zabbix agent 2; it works without any external scripts. Zabbix agent 2 with the WebCertificate plugin requests the certificate using the web.certificate.get key and returns JSON with the certificate attributes.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
1. Set up and configure zabbix-agent2 with the WebCertificate plugin.
2. Test availability: zabbix_get -s <zabbix_agent_addr> -k web.certificate.get[<website_DNS_name>]
3. Create a host for the TLS/SSL certificate with Zabbix agent interface.
4. Link the template to the host.
5. Customize the value of {$CERT.WEBSITE.HOSTNAME} macro.
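As an independent cross-check of the expiry arithmetic used by the "Cert: SSL certificate expires soon" trigger (days left = (not_after - now) / 86400), the sketch below reads the certificate directly with Python's ssl module. The host and port stand in for the {$CERT.WEBSITE.HOSTNAME} and {$CERT.WEBSITE.PORT} macros; it assumes the server presents a certificate chain that validates.

import socket
import ssl
from datetime import datetime, timezone

HOST = "<Put DNS name>"   # {$CERT.WEBSITE.HOSTNAME}
PORT = 443                # {$CERT.WEBSITE.PORT}

ctx = ssl.create_default_context()
with socket.create_connection((HOST, PORT), timeout=10) as sock:
    with ctx.wrap_socket(sock, server_hostname=HOST) as tls:
        cert = tls.getpeercert()

# 'notAfter' looks like 'Jun  1 12:00:00 2025 GMT'.
not_after = datetime.strptime(cert["notAfter"], "%b %d %H:%M:%S %Y %Z").replace(tzinfo=timezone.utc)
days_left = (not_after - datetime.now(timezone.utc)).days
print(f"{HOST}: certificate expires in {days_left} days")

If the number of days printed here and the value derived from cert.not_after in Zabbix diverge, check that the agent connects to the same endpoint (host, port, and optional {$CERT.WEBSITE.IP}).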
Name | Description | Default |
---|---|---|
{$CERT.EXPIRY.WARN} | Number of days until the certificate expires. |
7 |
{$CERT.WEBSITE.HOSTNAME} | The website DNS name for the connection. |
<Put DNS name> |
{$CERT.WEBSITE.PORT} | The TLS/SSL port number of the website. |
443 |
{$CERT.WEBSITE.IP} | The website IP address for the connection. |
Name | Description | Type | Key and additional info |
---|---|---|---|
Cert: Get | Returns the JSON with attributes of a certificate of the requested site. |
Zabbix agent | web.certificate.get[{$CERT.WEBSITE.HOSTNAME},{$CERT.WEBSITE.PORT},{$CERT.WEBSITE.IP}] Preprocessing
|
Cert: Validation result | The certificate validation result. Possible values: valid/invalid/valid-but-self-signed |
Dependent item | cert.validation Preprocessing
|
Cert: Last validation status | Last check result message. |
Dependent item | cert.message Preprocessing
|
Cert: Version | The version of the encoded certificate. |
Dependent item | cert.version Preprocessing
|
Cert: Serial number | The serial number is a positive integer assigned by the CA to each certificate. It is unique for each certificate issued by a given CA. Non-conforming CAs may issue certificates with serial numbers that are negative or zero. |
Dependent item | cert.serial_number Preprocessing
|
Cert: Signature algorithm | The algorithm identifier for the algorithm used by the CA to sign the certificate. |
Dependent item | cert.signature_algorithm Preprocessing
|
Cert: Issuer | The field identifies the entity that has signed and issued the certificate. |
Dependent item | cert.issuer Preprocessing
|
Cert: Valid from | The date on which the certificate validity period begins. |
Dependent item | cert.not_before Preprocessing
|
Cert: Expires on | The date on which the certificate validity period ends. |
Dependent item | cert.not_after Preprocessing
|
Cert: Subject | The field identifies the entity associated with the public key stored in the subject public key field. |
Dependent item | cert.subject Preprocessing
|
Cert: Subject alternative name | The subject alternative name extension allows identities to be bound to the subject of the certificate. These identities may be included in addition to or in place of the identity in the subject field of the certificate. Defined options include an Internet electronic mail address, a DNS name, an IP address, and a Uniform Resource Identifier (URI). |
Dependent item | cert.alternative_names Preprocessing
|
Cert: Public key algorithm | The digital signature algorithm used to verify the signature of a certificate. |
Dependent item | cert.publickeyalgorithm Preprocessing
|
Cert: Fingerprint | The Certificate Signature (SHA1 Fingerprint or Thumbprint) is the hash of the entire certificate in DER form. |
Dependent item | cert.sha1_fingerprint Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Cert: SSL certificate is invalid | SSL certificate has expired or it is issued for another domain. |
find(/Website certificate by Zabbix agent 2/cert.validation,,"like","invalid")=1 |High |
||
Cert: SSL certificate expires soon | The SSL certificate should be updated or it will become untrusted. |
(last(/Website certificate by Zabbix agent 2/cert.not_after) - now()) / 86400 < {$CERT.EXPIRY.WARN} |Warning |
Depends on:
|
|
Cert: Fingerprint has changed | The SSL certificate fingerprint has changed. If you did not update the certificate, it may mean your certificate has been hacked. Acknowledge to close the problem manually. |
last(/Website certificate by Zabbix agent 2/cert.sha1_fingerprint) <> last(/Website certificate by Zabbix agent 2/cert.sha1_fingerprint,#2) |Info |
Manual close: Yes |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
The template to monitor a Ceph cluster by Zabbix; it works without any external scripts. Most of the metrics are collected in one go, thanks to Zabbix bulk data collection.
Template Ceph by Zabbix agent 2
— collects metrics by polling zabbix-agent2.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Test availability: zabbix_get -s ceph-host -k ceph.ping["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"]
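To check that the Ceph RESTful module is reachable with the same credentials outside of Zabbix, you can send an authenticated request to the API directly. This is a rough connectivity probe only, sketched in Python; the /server path and HTTP Basic authentication are assumptions based on the Ceph RESTful module documentation, not something the template itself calls, and the values mirror the macro defaults below.

import requests

CONN_STRING = "https://localhost:8003"   # {$CEPH.CONNSTRING}
USER = "zabbix"                          # {$CEPH.USER}
API_KEY = "zabbix_pass"                  # {$CEPH.API.KEY}

# The RESTful module usually runs with a self-signed certificate, hence verify=False.
resp = requests.get(
    f"{CONN_STRING}/server",
    auth=(USER, API_KEY),
    verify=False,
    timeout=10,
)
print(resp.status_code)   # 200 means the module answered and the credentials were accepted

A 401 response points at a wrong user/key pair; a connection error points at the module not being enabled or listening on a different address than {$CEPH.CONNSTRING}.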
Name | Description | Default |
---|---|---|
{$CEPH.USER} | zabbix |
|
{$CEPH.API.KEY} | zabbix_pass |
|
{$CEPH.CONNSTRING} | https://localhost:8003 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Ceph: Get overall cluster status | Zabbix agent | ceph.status["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"] | |
Ceph: Get OSD stats | Zabbix agent | ceph.osd.stats["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"] | |
Ceph: Get OSD dump | Zabbix agent | ceph.osd.dump["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"] | |
Ceph: Get df | Zabbix agent | ceph.df.details["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"] | |
Ceph: Ping | Zabbix agent | ceph.ping["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"] Preprocessing
|
|
Ceph: Number of Monitors | The number of Monitors configured in a Ceph cluster. |
Dependent item | ceph.num_mon Preprocessing
|
Ceph: Overall cluster status | The overall Ceph cluster status, e.g. 0 - HEALTH_OK, 1 - HEALTH_WARN or 2 - HEALTH_ERR. |
Dependent item | ceph.overall_status Preprocessing
|
Ceph: Minimum Mon release version | The minimum monitor release version (min_mon_release_name). |
Dependent item | ceph.minmonrelease_name Preprocessing
|
Ceph: Ceph Read bandwidth | The global read bytes per second. |
Dependent item | ceph.rd_bytes.rate Preprocessing
|
Ceph: Ceph Write bandwidth | The global write bytes per second. |
Dependent item | ceph.wr_bytes.rate Preprocessing
|
Ceph: Ceph Read operations per sec | The global read operations per second. |
Dependent item | ceph.rd_ops.rate Preprocessing
|
Ceph: Ceph Write operations per sec | The global write operations per second. |
Dependent item | ceph.wr_ops.rate Preprocessing
|
Ceph: Total bytes available | The total bytes available in a Ceph cluster. |
Dependent item | ceph.totalavailbytes Preprocessing
|
Ceph: Total bytes | The total (RAW) capacity of a Ceph cluster in bytes. |
Dependent item | ceph.total_bytes Preprocessing
|
Ceph: Total bytes used | The total bytes used in a Ceph cluster. |
Dependent item | ceph.totalusedbytes Preprocessing
|
Ceph: Total number of objects | The total number of objects in a Ceph cluster. |
Dependent item | ceph.total_objects Preprocessing
|
Ceph: Number of Placement Groups | The total number of Placement Groups in a Ceph cluster. |
Dependent item | ceph.num_pg Preprocessing
|
Ceph: Number of Placement Groups in Temporary state | The total number of Placement Groups in a pg_temp state |
Dependent item | ceph.numpgtemp Preprocessing
|
Ceph: Number of Placement Groups in Active state | The total number of Placement Groups in an active state. |
Dependent item | ceph.pg_states.active Preprocessing
|
Ceph: Number of Placement Groups in Clean state | The total number of Placement Groups in a clean state. |
Dependent item | ceph.pg_states.clean Preprocessing
|
Ceph: Number of Placement Groups in Peering state | The total number of Placement Groups in a peering state. |
Dependent item | ceph.pg_states.peering Preprocessing
|
Ceph: Number of Placement Groups in Scrubbing state | The total number of Placement Groups in a scrubbing state. |
Dependent item | ceph.pg_states.scrubbing Preprocessing
|
Ceph: Number of Placement Groups in Undersized state | The total number of Placement Groups in an undersized state. |
Dependent item | ceph.pg_states.undersized Preprocessing
|
Ceph: Number of Placement Groups in Backfilling state | The total number of Placement Groups in a backfill state. |
Dependent item | ceph.pg_states.backfilling Preprocessing
|
Ceph: Number of Placement Groups in degraded state | The total number of Placement Groups in a degraded state. |
Dependent item | ceph.pg_states.degraded Preprocessing
|
Ceph: Number of Placement Groups in inconsistent state | The total number of Placement Groups in an inconsistent state. |
Dependent item | ceph.pg_states.inconsistent Preprocessing
|
Ceph: Number of Placement Groups in Unknown state | The total number of Placement Groups in an unknown state. |
Dependent item | ceph.pg_states.unknown Preprocessing
|
Ceph: Number of Placement Groups in remapped state | The total number of Placement Groups in a remapped state. |
Dependent item | ceph.pg_states.remapped Preprocessing
|
Ceph: Number of Placement Groups in recovering state | The total number of Placement Groups in a recovering state. |
Dependent item | ceph.pg_states.recovering Preprocessing
|
Ceph: Number of Placement Groups in backfill_toofull state | The total number of Placement Groups in a backfill_toofull state. |
Dependent item | ceph.pgstates.backfilltoofull Preprocessing
|
Ceph: Number of Placement Groups in backfill_wait state | The total number of Placement Groups in a backfill_wait state. |
Dependent item | ceph.pgstates.backfillwait Preprocessing
|
Ceph: Number of Placement Groups in recovery_wait state | The total number of Placement Groups in a recovery_wait state. |
Dependent item | ceph.pgstates.recoverywait Preprocessing
|
Ceph: Number of Pools | The total number of pools in a Ceph cluster. |
Dependent item | ceph.num_pools Preprocessing
|
Ceph: Number of OSDs | The number of the known storage daemons in a Ceph cluster. |
Dependent item | ceph.num_osd Preprocessing
|
Ceph: Number of OSDs in state: UP | The total number of the online storage daemons in a Ceph cluster. |
Dependent item | ceph.numosdup Preprocessing
|
Ceph: Number of OSDs in state: IN | The total number of the participating storage daemons in a Ceph cluster. |
Dependent item | ceph.numosdin Preprocessing
|
Ceph: Ceph OSD avg fill | The average fill of OSDs. |
Dependent item | ceph.osd_fill.avg Preprocessing
|
Ceph: Ceph OSD max fill | The percentage of the most filled OSD. |
Dependent item | ceph.osd_fill.max Preprocessing
|
Ceph: Ceph OSD min fill | The percentage fill of the minimum filled OSD. |
Dependent item | ceph.osd_fill.min Preprocessing
|
Ceph: Ceph OSD max PGs | The maximum amount of Placement Groups on OSDs. |
Dependent item | ceph.osd_pgs.max Preprocessing
|
Ceph: Ceph OSD min PGs | The minimum amount of Placement Groups on OSDs. |
Dependent item | ceph.osd_pgs.min Preprocessing
|
Ceph: Ceph OSD avg PGs | The average amount of Placement Groups on OSDs. |
Dependent item | ceph.osd_pgs.avg Preprocessing
|
Ceph: Ceph OSD Apply latency Avg | The average apply latency of OSDs. |
Dependent item | ceph.osdlatencyapply.avg Preprocessing
|
Ceph: Ceph OSD Apply latency Max | The maximum apply latency of OSDs. |
Dependent item | ceph.osdlatencyapply.max Preprocessing
|
Ceph: Ceph OSD Apply latency Min | The minimum apply latency of OSDs. |
Dependent item | ceph.osdlatencyapply.min Preprocessing
|
Ceph: Ceph OSD Commit latency Avg | The average commit latency of OSDs. |
Dependent item | ceph.osdlatencycommit.avg Preprocessing
|
Ceph: Ceph OSD Commit latency Max | The maximum commit latency of OSDs. |
Dependent item | ceph.osdlatencycommit.max Preprocessing
|
Ceph: Ceph OSD Commit latency Min | The minimum commit latency of OSDs. |
Dependent item | ceph.osdlatencycommit.min Preprocessing
|
Ceph: Ceph backfill full ratio | The backfill full ratio setting of the Ceph cluster as configured on OSDMap. |
Dependent item | ceph.osdbackfillfullratio Preprocessing
|
Ceph: Ceph full ratio | The full ratio setting of the Ceph cluster as configured on OSDMap. |
Dependent item | ceph.osdfullratio Preprocessing
|
Ceph: Ceph nearfull ratio | The near full ratio setting of the Ceph cluster as configured on OSDMap. |
Dependent item | ceph.osdnearfullratio Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Ceph: Can not connect to cluster | The connection to the Ceph RESTful module is broken (any error is included, such as AUTH or configuration issues). |
last(/Ceph by Zabbix agent 2/ceph.ping["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"])=0 |Average |
||
Ceph: Cluster in ERROR state | last(/Ceph by Zabbix agent 2/ceph.overall_status)=2 |Average |
Manual close: Yes | ||
Ceph: Cluster in WARNING state | last(/Ceph by Zabbix agent 2/ceph.overall_status)=1 |Warning |
Manual close: Yes Depends on:
|
||
Ceph: Minimum monitor release version has changed | A Ceph version has changed. Acknowledge to close the problem manually. |
last(/Ceph by Zabbix agent 2/ceph.min_mon_release_name,#1)<>last(/Ceph by Zabbix agent 2/ceph.min_mon_release_name,#2) and length(last(/Ceph by Zabbix agent 2/ceph.min_mon_release_name))>0 |Info |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
OSD | Zabbix agent | ceph.osd.discovery["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Ceph: [osd.{#OSDNAME}] OSD in | Dependent item | ceph.osd[{#OSDNAME},in] Preprocessing
|
|
Ceph: [osd.{#OSDNAME}] OSD up | Dependent item | ceph.osd[{#OSDNAME},up] Preprocessing
|
|
Ceph: [osd.{#OSDNAME}] OSD PGs | Dependent item | ceph.osd[{#OSDNAME},num_pgs] Preprocessing
|
|
Ceph: [osd.{#OSDNAME}] OSD fill | Dependent item | ceph.osd[{#OSDNAME},fill] Preprocessing
|
|
Ceph: [osd.{#OSDNAME}] OSD latency apply | The time taken to flush an update to disks. |
Dependent item | ceph.osd[{#OSDNAME},latency_apply] Preprocessing
|
Ceph: [osd.{#OSDNAME}] OSD latency commit | The time taken to commit an operation to the journal. |
Dependent item | ceph.osd[{#OSDNAME},latency_commit] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Ceph: OSD osd.{#OSDNAME} is down | OSD osd.{#OSDNAME} is marked "down" in the osdmap. |
last(/Ceph by Zabbix agent 2/ceph.osd[{#OSDNAME},up]) = 0 |Average |
||
Ceph: OSD osd.{#OSDNAME} is full | min(/Ceph by Zabbix agent 2/ceph.osd[{#OSDNAME},fill],15m) > last(/Ceph by Zabbix agent 2/ceph.osd_full_ratio)*100 |Average |
|||
Ceph: Ceph OSD osd.{#OSDNAME} is near full | min(/Ceph by Zabbix agent 2/ceph.osd[{#OSDNAME},fill],15m) > last(/Ceph by Zabbix agent 2/ceph.osd_nearfull_ratio)*100 |Warning |
Depends on:
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Pool | Zabbix agent | ceph.pool.discovery["{$CEPH.CONNSTRING}","{$CEPH.USER}","{$CEPH.API.KEY}"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Ceph: [{#POOLNAME}] Pool Used | The total bytes used in a pool. |
Dependent item | ceph.pool["{#POOLNAME}",bytes_used] Preprocessing
|
Ceph: [{#POOLNAME}] Max available | The maximum available space in the given pool. |
Dependent item | ceph.pool["{#POOLNAME}",max_avail] Preprocessing
|
Ceph: [{#POOLNAME}] Pool RAW Used | Bytes used in pool including the copies made. |
Dependent item | ceph.pool["{#POOLNAME}",stored_raw] Preprocessing
|
Ceph: [{#POOLNAME}] Pool Percent Used | The percentage of the storage used per pool. |
Dependent item | ceph.pool["{#POOLNAME}",percent_used] Preprocessing
|
Ceph: [{#POOLNAME}] Pool objects | The number of objects in the pool. |
Dependent item | ceph.pool["{#POOLNAME}",objects] Preprocessing
|
Ceph: [{#POOLNAME}] Pool Read bandwidth | The read rate per pool (bytes per second). |
Dependent item | ceph.pool["{#POOLNAME}",rd_bytes.rate] Preprocessing
|
Ceph: [{#POOLNAME}] Pool Write bandwidth | The write rate per pool (bytes per second). |
Dependent item | ceph.pool["{#POOLNAME}",wr_bytes.rate] Preprocessing
|
Ceph: [{#POOLNAME}] Pool Read operations | The read rate per pool (operations per second). |
Dependent item | ceph.pool["{#POOLNAME}",rd_ops.rate] Preprocessing
|
Ceph: [{#POOLNAME}] Pool Write operations | The write rate per pool (operations per second). |
Dependent item | ceph.pool["{#POOLNAME}",wr_ops.rate] Preprocessing
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Refer to the vendor documentation.
Name | Description | Default |
---|---|---|
{$ARANET.API.ENDPOINT} | Aranet Cloud API endpoint. |
https://aranet.cloud/api |
{$ARANET.API.USERNAME} | Aranet Cloud username. |
<PUT YOUR USERNAME> |
{$ARANET.API.PASSWORD} | Aranet Cloud password. |
<PUT YOUR PASSWORD> |
{$ARANET.API.SPACE_NAME} | Aranet Cloud organization name. |
<PUT YOUR SPACE NAME> |
{$ARANET.LLD.FILTER.SENSOR_NAME.MATCHES} | Filter of discoverable sensors by name. |
.+ |
{$ARANET.LLD.FILTER.SENSOR_NAME.NOT_MATCHES} | Filter to exclude discoverable sensors by name. |
CHANGE_IF_NEEDED |
{$ARANET.LLD.FILTER.SENSOR_ID.MATCHES} | Filter of discoverable sensors by id. |
.+ |
{$ARANET.LLD.FILTER.GATEWAY_NAME.MATCHES} | Filter of discoverable sensors by gateway name. |
.+ |
{$ARANET.LLD.FILTER.GATEWAY_NAME.NOT_MATCHES} | Filter to exclude discoverable sensors by gateway name. |
CHANGE_IF_NEEDED |
{$ARANET.LLD.FILTER.GATEWAY_ID.MATCHES} | Filter of discoverable sensors by gateway id. |
.+ |
{$ARANET.BATT.VOLTAGE.MIN.WARN} | Battery voltage warning threshold. |
1 |
{$ARANET.BATT.VOLTAGE.MIN.CRIT} | Battery voltage critical threshold. |
2 |
{$ARANET.HUMIDITY.MIN.WARN} | Minimum humidity threshold. |
20 |
{$ARANET.HUMIDITY.MAX.WARN} | Maximum humidity threshold. |
70 |
{$ARANET.CO2.MAX.WARN} | CO2 warning threshold. |
600 |
{$ARANET.CO2.MAX.CRIT} | CO2 critical threshold. |
1000 |
{$ARANET.LAST_UPDATE.MAX.WARN} | Data update delay threshold. |
1h |
Name | Description | Type | Key and additional info |
---|---|---|---|
Aranet: Sensors discovery | Discovery for Aranet Cloud sensors |
Dependent item | aranet.sensor.discovery Preprocessing
|
Aranet: Get data | Script | aranet.get_data |
Name | Description | Type | Key and additional info |
---|---|---|---|
Temperature discovery | Discovery for Aranet Cloud temperature sensors |
Dependent item | aranet.temp.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAYNAME}] {#SENSORNAME} | Dependent item | aranet.temp["{#GATEWAYID}", "{#SENSORID}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Humidity discovery | Discovery for Aranet Cloud humidity sensors |
Dependent item | aranet.humidity.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAYNAME}] {#SENSORNAME} | Dependent item | aranet.humidity["{#GATEWAYID}", "{#SENSORID}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
{#METRIC}: Low humidity on "[{#GATEWAYNAME}] {#SENSORNAME}" | max(/Aranet Cloud/aranet.humidity["{#GATEWAY_ID}", "{#SENSOR_ID}"],5m) < {$ARANET.HUMIDITY.MIN.WARN:"{#SENSOR_NAME}"} |Warning |
Depends on:
|
||
{#METRIC}: High humidity on "[{#GATEWAYNAME}] {#SENSORNAME}" | min(/Aranet Cloud/aranet.humidity["{#GATEWAY_ID}", "{#SENSOR_ID}"],5m) > {$ARANET.HUMIDITY.MAX.WARN:"{#SENSOR_NAME}"} |High |
Name | Description | Type | Key and additional info |
---|---|---|---|
RSSI discovery | Discovery for Aranet Cloud RSSI sensors |
Dependent item | aranet.rssi.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAYNAME}] {#SENSORNAME} | Dependent item | aranet.rssi["{#GATEWAYID}", "{#SENSORID}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Battery voltage discovery | Discovery for Aranet Cloud Battery voltage sensors |
Dependent item | aranet.battery.voltage.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAYNAME}] {#SENSORNAME} | Dependent item | aranet.battery.voltage["{#GATEWAYID}", "{#SENSORID}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
{#METRIC}: Low battery voltage on "[{#GATEWAYNAME}] {#SENSORNAME}" | max(/Aranet Cloud/aranet.battery.voltage["{#GATEWAY_ID}", "{#SENSOR_ID}"],5m) < {$ARANET.BATT.VOLTAGE.MIN.WARN:"{#SENSOR_NAME}"} |Warning |
Depends on:
|
||
{#METRIC}: Critically low battery voltage on "[{#GATEWAYNAME}] {#SENSORNAME}" | max(/Aranet Cloud/aranet.battery.voltage["{#GATEWAY_ID}", "{#SENSOR_ID}"],5m) < {$ARANET.BATT.VOLTAGE.MIN.CRIT:"{#SENSOR_NAME}"} |High |
Name | Description | Type | Key and additional info |
---|---|---|---|
CO2 discovery | Discovery for Aranet Cloud CO2 sensors |
Dependent item | aranet.co2.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAYNAME}] {#SENSORNAME} | Dependent item | aranet.co2["{#GATEWAYID}", "{#SENSORID}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
{#METRIC}: High CO2 level on "[{#GATEWAYNAME}] {#SENSORNAME}" | min(/Aranet Cloud/aranet.co2["{#GATEWAY_ID}", "{#SENSOR_ID}"],5m) > {$ARANET.CO2.MAX.WARN:"{#SENSOR_NAME}"} |Warning |
Depends on:
|
||
{#METRIC}: Critically high CO2 level on "[{#GATEWAYNAME}] {#SENSORNAME}" | min(/Aranet Cloud/aranet.co2["{#GATEWAY_ID}", "{#SENSOR_ID}"],5m) > {$ARANET.CO2.MAX.CRIT:"{#SENSOR_NAME}"} |High |
Name | Description | Type | Key and additional info |
---|---|---|---|
Atmospheric pressure discovery | Discovery for Aranet Cloud atmospheric pressure sensors |
Dependent item | aranet.pressure.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAYNAME}] {#SENSORNAME} | Dependent item | aranet.pressure["{#GATEWAYID}", "{#SENSORID}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Voltage discovery | Discovery for Aranet Cloud Voltage sensors |
Dependent item | aranet.voltage.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAYNAME}] {#SENSORNAME} | Dependent item | aranet.voltage["{#GATEWAYID}", "{#SENSORID}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Weight discovery | Discovery for Aranet Cloud Weight sensors |
Dependent item | aranet.weight.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAYNAME}] {#SENSORNAME} | Dependent item | aranet.weight["{#GATEWAYID}", "{#SENSORID}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Volumetric Water Content discovery | Discovery for Aranet Cloud Volumetric Water Content sensors |
Dependent item | aranet.volumwatercontent.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAYNAME}] {#SENSORNAME} | Dependent item | aranet.volumetric.water.content["{#GATEWAYID}", "{#SENSORID}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
PPFD discovery | Discovery for Aranet Cloud PPFD sensors |
Dependent item | aranet.ppfd.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAYNAME}] {#SENSORNAME} | Dependent item | aranet.ppfd["{#GATEWAYID}", "{#SENSORID}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Distance discovery | Discovery for Aranet Cloud Distance sensors |
Dependent item | aranet.distance.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAYNAME}] {#SENSORNAME} | Dependent item | aranet.distance["{#GATEWAYID}", "{#SENSORID}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Illuminance discovery | Discovery for Aranet Cloud Illuminance sensors |
Dependent item | aranet.illuminance.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAYNAME}] {#SENSORNAME} | Dependent item | aranet.illuminance["{#GATEWAYID}", "{#SENSORID}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
pH discovery | Discovery for Aranet Cloud pH sensors |
Dependent item | aranet.ph.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAYNAME}] {#SENSORNAME} | Dependent item | aranet.ph["{#GATEWAYID}", "{#SENSORID}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Current discovery | Discovery for Aranet Cloud Current sensors |
Dependent item | aranet.current.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAYNAME}] {#SENSORNAME} | Dependent item | aranet.current["{#GATEWAYID}", "{#SENSORID}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Soil Dielectric Permittivity discovery | Discovery for Aranet Cloud Soil Dielectric Permittivity sensors |
Dependent item | aranet.soildielectricperm.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAYNAME}] {#SENSORNAME} | Dependent item | aranet.soildielectricperm["{#GATEWAYID}", "{#SENSORID}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Soil Electrical Conductivity discovery | Discovery for Aranet Cloud Soil Electrical Conductivity sensors |
Dependent item | aranet.soilelectriccond.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAYNAME}] {#SENSORNAME} | Dependent item | aranet.soilelectriccond["{#GATEWAYID}", "{#SENSORID}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Pore Electrical Conductivity discovery | Discovery for Aranet Cloud Pore Electrical Conductivity sensors |
Dependent item | aranet.poreelectriccond.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAYNAME}] {#SENSORNAME} | Dependent item | aranet.poreelectriccond["{#GATEWAYID}", "{#SENSORID}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Pulses discovery | Discovery for Aranet Cloud Pulses sensors |
Dependent item | aranet.pulses.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAYNAME}] {#SENSORNAME} | Dependent item | aranet.pulses["{#GATEWAYID}", "{#SENSORID}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Pulses Cumulative discovery | Discovery for Aranet Cloud Pulses Cumulative sensors |
Dependent item | aranet.pulses_cumulative.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAYNAME}] {#SENSORNAME} | Dependent item | aranet.pulsescumulative["{#GATEWAYID}", "{#SENSOR_ID}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Differential Pressure discovery | Discovery for Aranet Cloud Differential Pressure sensors |
Dependent item | aranet.diff_pressure.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAYNAME}] {#SENSORNAME} | Dependent item | aranet.diffpressure["{#GATEWAYID}", "{#SENSOR_ID}"] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Last update discovery | Discovery for Aranet Cloud Last update metric |
Dependent item | aranet.last_update.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#METRIC}: [{#GATEWAYNAME}] {#SENSORNAME} | Dependent item | aranet.lastupdate["{#GATEWAYID}", "{#SENSOR_ID}"] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
{#METRIC}: Sensor data "[{#GATEWAYNAME}] {#SENSORNAME}" is not updated | last(/Aranet Cloud/aranet.last_update["{#GATEWAY_ID}", "{#SENSOR_ID}"]) > {$ARANET.LAST_UPDATE.MAX.WARN:"{#SENSOR_NAME}"} |Warning |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for the effortless deployment of Apache monitoring by Zabbix via HTTP and doesn't require any external scripts.
The template collects metrics by polling mod_status
with HTTP agent remotely:
127.0.0.1
ServerVersion: Apache/2.4.41 (Unix)
ServerMPM: event
Server Built: Aug 14 2019 00:35:10
CurrentTime: Friday, 16-Aug-2019 12:38:40 UTC
RestartTime: Wednesday, 14-Aug-2019 07:58:26 UTC
ParentServerConfigGeneration: 1
ParentServerMPMGeneration: 0
ServerUptimeSeconds: 189613
ServerUptime: 2 days 4 hours 40 minutes 13 seconds
Load1: 4.60
Load5: 1.20
Load15: 0.47
Total Accesses: 27860
Total kBytes: 33011
Total Duration: 54118
CPUUser: 18.02
CPUSystem: 31.76
CPUChildrenUser: 0
CPUChildrenSystem: 0
CPULoad: .0262535
Uptime: 189613
ReqPerSec: .146931
BytesPerSec: 178.275
BytesPerReq: 1213.33
DurationPerReq: 1.9425
BusyWorkers: 7
IdleWorkers: 93
Processes: 4
Stopping: 0
BusyWorkers: 7
IdleWorkers: 93
ConnsTotal: 13
ConnsAsyncWriting: 0
ConnsAsyncKeepAlive: 5
ConnsAsyncClosing: 0
Scoreboard: ...
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Enable and configure the mod_status module. Check the availability of the module with this command line:
httpd -M 2>/dev/null | grep status_module
This is an example configuration of the Apache web server:
<Location "/server-status">
SetHandler server-status
Require host example.com
</Location>
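Once mod_status is configured, you can confirm that the machine-readable status page responds before linking the template. This is a minimal sketch in Python; the URL is assembled from the macro defaults listed below and should be adjusted to your host.

import requests

SCHEME = "http"                 # {$APACHE.STATUS.SCHEME}
HOST = "127.0.0.1"              # {$APACHE.STATUS.HOST}
PORT = 80                       # {$APACHE.STATUS.PORT}
PATH = "server-status?auto"     # {$APACHE.STATUS.PATH}

resp = requests.get(f"{SCHEME}://{HOST}:{PORT}/{PATH}", timeout=10)
resp.raise_for_status()

# The ?auto output is a list of "Key: value" lines; pick out a couple of fields
# that the template also parses (see the example output above).
status = {}
for line in resp.text.splitlines():
    if ": " in line:
        key, value = line.split(": ", 1)
        status[key] = value

print("Total Accesses:", status.get("Total Accesses"))
print("BusyWorkers:", status.get("BusyWorkers"))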
Set the hostname or IP address of the Apache status page host in the {$APACHE.STATUS.HOST} macro. You can also change the status page port in the {$APACHE.STATUS.PORT} macro and the status page path in the {$APACHE.STATUS.PATH} macro if necessary.
Name | Description | Default |
---|---|---|
{$APACHE.STATUS.HOST} | The hostname or IP address of the Apache status page host. |
<SET APACHE HOST> |
{$APACHE.STATUS.PORT} | The port of the Apache status page. |
80 |
{$APACHE.STATUS.PATH} | The URL path. |
server-status?auto |
{$APACHE.STATUS.SCHEME} | The request scheme, which may be either HTTP or HTTPS. |
http |
{$APACHE.RESPONSE_TIME.MAX.WARN} | The maximum Apache response time expressed in seconds for a trigger expression. |
10 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Apache: Get status | Getting data from a machine-readable version of the Apache status page. For more information see Apache Module mod_status. |
HTTP agent | apache.get_status Preprocessing
|
Apache: Service ping | Simple check | net.tcp.service[http,"{$APACHE.STATUS.HOST}","{$APACHE.STATUS.PORT}"] Preprocessing
|
|
Apache: Service response time | Simple check | net.tcp.service.perf[http,"{$APACHE.STATUS.HOST}","{$APACHE.STATUS.PORT}"] | |
Apache: Total bytes | The total bytes served. |
Dependent item | apache.bytes Preprocessing
|
Apache: Bytes per second | It is calculated as a rate of change for the "Total bytes" statistics.
|
Dependent item | apache.bytes.rate Preprocessing
|
Apache: Requests per second | It is calculated as a rate of change for the "Total requests" statistics.
|
Dependent item | apache.requests.rate Preprocessing
|
Apache: Total requests | The total number of the Apache server accesses. |
Dependent item | apache.requests Preprocessing
|
Apache: Uptime | The service uptime expressed in seconds. |
Dependent item | apache.uptime Preprocessing
|
Apache: Version | The Apache service version. |
Dependent item | apache.version Preprocessing
|
Apache: Total workers busy | The total number of busy worker threads/processes. |
Dependent item | apache.workers_total.busy Preprocessing
|
Apache: Total workers idle | The total number of idle worker threads/processes. |
Dependent item | apache.workers_total.idle Preprocessing
|
Apache: Workers closing connection | The number of workers in closing state. |
Dependent item | apache.workers.closing Preprocessing
|
Apache: Workers DNS lookup | The number of workers in dnslookup state. |
Dependent item | apache.workers.dnslookup Preprocessing
|
Apache: Workers finishing | The number of workers in finishing state. |
Dependent item | apache.workers.finishing Preprocessing
|
Apache: Workers idle cleanup | The number of workers in cleanup state. |
Dependent item | apache.workers.cleanup Preprocessing
|
Apache: Workers keepalive (read) | The number of workers in keepalive state. |
Dependent item | apache.workers.keepalive Preprocessing
|
Apache: Workers logging | The number of workers in logging state. |
Dependent item | apache.workers.logging Preprocessing
|
Apache: Workers reading request | The number of workers in reading state. |
Dependent item | apache.workers.reading Preprocessing
|
Apache: Workers sending reply | The number of workers in sending state. |
Dependent item | apache.workers.sending Preprocessing
|
Apache: Workers slot with no current process | The number of slots with no current process. |
Dependent item | apache.workers.slot Preprocessing
|
Apache: Workers starting up | The number of workers in starting state. |
Dependent item | apache.workers.starting Preprocessing
|
Apache: Workers waiting for connection | The number of workers in waiting state. |
Dependent item | apache.workers.waiting Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Apache: Failed to fetch status page | Zabbix has not received any data for items for the last 30 minutes. |
nodata(/Apache by HTTP/apache.get_status,30m)=1 |Warning |
Manual close: Yes Depends on:
|
|
Apache: Service is down | last(/Apache by HTTP/net.tcp.service[http,"{$APACHE.STATUS.HOST}","{$APACHE.STATUS.PORT}"])=0 |Average |
Manual close: Yes | ||
Apache: Service response time is too high | min(/Apache by HTTP/net.tcp.service.perf[http,"{$APACHE.STATUS.HOST}","{$APACHE.STATUS.PORT}"],5m)>{$APACHE.RESPONSE_TIME.MAX.WARN} |Warning |
Manual close: Yes Depends on:
|
||
Apache: Host has been restarted | Uptime is less than 10 minutes. |
last(/Apache by HTTP/apache.uptime)<10m |Info |
Manual close: Yes | |
Apache: Version has changed | Apache version has changed. Acknowledge to close the problem manually. |
last(/Apache by HTTP/apache.version,#1)<>last(/Apache by HTTP/apache.version,#2) and length(last(/Apache by HTTP/apache.version))>0 |Info |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Event MPM discovery | The discovery of additional metrics if the event Multi-Processing Module (MPM) is used. For more details see Apache MPM event. |
Dependent item | apache.mpm.event.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Apache: Connections async closing | The number of asynchronous connections in closing state (applicable only to the event MPM). |
Dependent item | apache.connections[async_closing{#SINGLETON}] Preprocessing
|
Apache: Connections async keepalive | The number of asynchronous connections in keepalive state (applicable only to the event MPM). |
Dependent item | apache.connections[asynckeepalive{#SINGLETON}] Preprocessing
|
Apache: Connections async writing | The number of asynchronous connections in writing state (applicable only to the event MPM). |
Dependent item | apache.connections[async_writing{#SINGLETON}] Preprocessing
|
Apache: Connections total | The number of total connections. |
Dependent item | apache.connections[total{#SINGLETON}] Preprocessing
|
Apache: Bytes per request | The average number of bytes served per request. |
Dependent item | apache.bytes[per_request{#SINGLETON}] Preprocessing
|
Apache: Number of async processes | The number of asynchronous processes. |
Dependent item | apache.process[num{#SINGLETON}] Preprocessing
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for the effortless deployment of Apache monitoring by Zabbix via Zabbix agent and doesn't require any external scripts.
The template Apache by Zabbix agent collects metrics by polling mod_status locally with Zabbix agent:
127.0.0.1
ServerVersion: Apache/2.4.41 (Unix)
ServerMPM: event
Server Built: Aug 14 2019 00:35:10
CurrentTime: Friday, 16-Aug-2019 12:38:40 UTC
RestartTime: Wednesday, 14-Aug-2019 07:58:26 UTC
ParentServerConfigGeneration: 1
ParentServerMPMGeneration: 0
ServerUptimeSeconds: 189613
ServerUptime: 2 days 4 hours 40 minutes 13 seconds
Load1: 4.60
Load5: 1.20
Load15: 0.47
Total Accesses: 27860
Total kBytes: 33011
Total Duration: 54118
CPUUser: 18.02
CPUSystem: 31.76
CPUChildrenUser: 0
CPUChildrenSystem: 0
CPULoad: .0262535
Uptime: 189613
ReqPerSec: .146931
BytesPerSec: 178.275
BytesPerReq: 1213.33
DurationPerReq: 1.9425
BusyWorkers: 7
IdleWorkers: 93
Processes: 4
Stopping: 0
BusyWorkers: 7
IdleWorkers: 93
ConnsTotal: 13
ConnsAsyncWriting: 0
ConnsAsyncKeepAlive: 5
ConnsAsyncClosing: 0
Scoreboard: ...
It also uses Zabbix agent to collect Apache
Linux process statistics such as CPU usage, memory usage, and whether the process is running or not.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
See the setup instructions for mod_status.
Check the availability of the module with this command line: httpd -M 2>/dev/null | grep status_module
This is an example configuration of the Apache web server:
<Location "/server-status">
SetHandler server-status
Require host example.com
</Location>
If you use another path, then do not forget to change the {$APACHE.STATUS.PATH}
macro.
Install and setup Zabbix agent.
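Once the agent is installed and running, you can optionally verify that it can reach the status page by testing the same item key the template uses. This is only an illustrative sketch assuming the default macro values (local host, port 80, path server-status?auto); adjust it to your configuration:
# Query the Apache status page through the Zabbix agent on the monitored host
zabbix_get -s 127.0.0.1 -k 'web.page.get["http://127.0.0.1:80/server-status?auto"]'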
Name | Description | Default |
---|---|---|
{$APACHE.STATUS.HOST} | The hostname or IP address of the Apache status page. |
127.0.0.1 |
{$APACHE.STATUS.PORT} | The port of the Apache status page. |
80 |
{$APACHE.STATUS.PATH} | The URL path. |
server-status?auto |
{$APACHE.STATUS.SCHEME} | The request scheme, which may be either HTTP or HTTPS. |
http |
{$APACHE.RESPONSE_TIME.MAX.WARN} | The maximum Apache response time expressed in seconds for a trigger expression. |
10 |
{$APACHE.PROCESS_NAME} | The process name filter for the Apache process discovery. |
(httpd|apache2) |
{$APACHE.PROCESS.NAME.PARAMETER} | The process name of the Apache web server used in the proc.get item key. |
Name | Description | Type | Key and additional info |
---|---|---|---|
Apache: Get status | Getting data from a machine-readable version of the Apache status page. For more information see Apache Module mod_status. |
Zabbix agent | web.page.get["{$APACHE.STATUS.SCHEME}://{$APACHE.STATUS.HOST}:{$APACHE.STATUS.PORT}/{$APACHE.STATUS.PATH}"] Preprocessing
|
Apache: Service ping | Zabbix agent | net.tcp.service[http,"{$APACHE.STATUS.HOST}","{$APACHE.STATUS.PORT}"] Preprocessing
|
|
Apache: Service response time | Zabbix agent | net.tcp.service.perf[http,"{$APACHE.STATUS.HOST}","{$APACHE.STATUS.PORT}"] | |
Apache: Total bytes | The total bytes served. |
Dependent item | apache.bytes Preprocessing
|
Apache: Bytes per second | It is calculated as a rate of change for total bytes statistics.
|
Dependent item | apache.bytes.rate Preprocessing
|
Apache: Requests per second | It is calculated as a rate of change for the "Total requests" statistics.
|
Dependent item | apache.requests.rate Preprocessing
|
Apache: Total requests | The total number of the Apache server accesses. |
Dependent item | apache.requests Preprocessing
|
Apache: Uptime | The service uptime expressed in seconds. |
Dependent item | apache.uptime Preprocessing
|
Apache: Version | The Apache service version. |
Dependent item | apache.version Preprocessing
|
Apache: Total workers busy | The total number of busy worker threads/processes. |
Dependent item | apache.workers_total.busy Preprocessing
|
Apache: Total workers idle | The total number of idle worker threads/processes. |
Dependent item | apache.workers_total.idle Preprocessing
|
Apache: Workers closing connection | The number of workers in closing state. |
Dependent item | apache.workers.closing Preprocessing
|
Apache: Workers DNS lookup | The number of workers in dnslookup state. |
Dependent item | apache.workers.dnslookup Preprocessing
|
Apache: Workers finishing | The number of workers in finishing state. |
Dependent item | apache.workers.finishing Preprocessing
|
Apache: Workers idle cleanup | The number of workers in cleanup state. |
Dependent item | apache.workers.cleanup Preprocessing
|
Apache: Workers keepalive (read) | The number of workers in keepalive state. |
Dependent item | apache.workers.keepalive Preprocessing
|
Apache: Workers logging | The number of workers in logging state. |
Dependent item | apache.workers.logging Preprocessing
|
Apache: Workers reading request | The number of workers in reading state. |
Dependent item | apache.workers.reading Preprocessing
|
Apache: Workers sending reply | The number of workers in sending state. |
Dependent item | apache.workers.sending Preprocessing
|
Apache: Workers slot with no current process | The number of slots with no current process. |
Dependent item | apache.workers.slot Preprocessing
|
Apache: Workers starting up | The number of workers in starting state. |
Dependent item | apache.workers.starting Preprocessing
|
Apache: Workers waiting for connection | The number of workers in waiting state. |
Dependent item | apache.workers.waiting Preprocessing
|
Apache: Get processes summary | The aggregated data of summary metrics for all processes. |
Zabbix agent | proc.get[{$APACHE.PROCESS.NAME.PARAMETER},,,summary] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Apache: Host has been restarted | Uptime is less than 10 minutes. |
last(/Apache by Zabbix agent/apache.uptime)<10m |Info |
Manual close: Yes | |
Apache: Version has changed | Apache version has changed. Acknowledge to close the problem manually. |
last(/Apache by Zabbix agent/apache.version,#1)<>last(/Apache by Zabbix agent/apache.version,#2) and length(last(/Apache by Zabbix agent/apache.version))>0 |Info |
Manual close: Yes |
Name | Description | Type | Key and additional info |
---|---|---|---|
Event MPM discovery | The discovery of additional metrics if the event Multi-Processing Module (MPM) is used. For more details see Apache MPM event. |
Dependent item | apache.mpm.event.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Apache: Connections async closing | The number of asynchronous connections in closing state (applicable only to the event MPM). |
Dependent item | apache.connections[async_closing{#SINGLETON}] Preprocessing
|
Apache: Connections async keepalive | The number of asynchronous connections in keepalive state (applicable only to the event MPM). |
Dependent item | apache.connections[asynckeepalive{#SINGLETON}] Preprocessing
|
Apache: Connections async writing | The number of asynchronous connections in writing state (applicable only to the event MPM). |
Dependent item | apache.connections[async_writing{#SINGLETON}] Preprocessing
|
Apache: Connections total | The number of total connections. |
Dependent item | apache.connections[total{#SINGLETON}] Preprocessing
|
Apache: Bytes per request | The average number of bytes served per request. |
Dependent item | apache.bytes[per_request{#SINGLETON}] Preprocessing
|
Apache: Number of async processes | The number of asynchronous processes. |
Dependent item | apache.process[num{#SINGLETON}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Apache process discovery | The discovery of the Apache process summary. |
Dependent item | apache.proc.discovery |
Name | Description | Type | Key and additional info |
---|---|---|---|
Apache: CPU utilization | The percentage of the CPU utilization by a process {#APACHE.NAME}. |
Zabbix agent | proc.cpu.util[{#APACHE.NAME}] |
Apache: Get process data | The summary metrics aggregated by a process {#APACHE.NAME}. |
Dependent item | apache.proc.get[{#APACHE.NAME}] Preprocessing
|
Apache: Memory usage (rss) | The summary of resident set size memory used by a process {#APACHE.NAME} expressed in bytes. |
Dependent item | apache.proc.rss[{#APACHE.NAME}] Preprocessing
|
Apache: Memory usage (vsize) | The summary of virtual memory used by a process {#APACHE.NAME} expressed in bytes. |
Dependent item | apache.proc.vmem[{#APACHE.NAME}] Preprocessing
|
Apache: Memory usage, % | The percentage of real memory used by a process {#APACHE.NAME}. |
Dependent item | apache.proc.pmem[{#APACHE.NAME}] Preprocessing
|
Apache: Number of running processes | The number of running processes {#APACHE.NAME}. |
Dependent item | apache.proc.num[{#APACHE.NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Apache: Process is not running | last(/Apache by Zabbix agent/apache.proc.num[{#APACHE.NAME}])=0 |High |
|||
Apache: Service is down | last(/Apache by Zabbix agent/net.tcp.service[http,"{$APACHE.STATUS.HOST}","{$APACHE.STATUS.PORT}"])=0 and last(/Apache by Zabbix agent/apache.proc.num[{#APACHE.NAME}])>0 |Average |
Manual close: Yes | ||
Apache: Failed to fetch status page | Zabbix has not received any data for items for the last 30 minutes. |
nodata(/Apache by Zabbix agent/web.page.get["{$APACHE.STATUS.SCHEME}://{$APACHE.STATUS.HOST}:{$APACHE.STATUS.PORT}/{$APACHE.STATUS.PATH}"],30m)=1 and last(/Apache by Zabbix agent/apache.proc.num[{#APACHE.NAME}])>0 |Warning |
Manual close: Yes Depends on:
|
|
Apache: Service response time is too high | min(/Apache by Zabbix agent/net.tcp.service.perf[http,"{$APACHE.STATUS.HOST}","{$APACHE.STATUS.PORT}"],5m)>{$APACHE.RESPONSE_TIME.MAX.WARN} and last(/Apache by Zabbix agent/apache.proc.num[{#APACHE.NAME}])>0 |Warning |
Manual close: Yes Depends on:
|
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for the effortless deployment of Apache ActiveMQ monitoring by Zabbix via JMX and doesn't require any external scripts.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
Metrics are collected by JMX.
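As an illustrative sketch (not part of the template itself), remote JMX access on the default port referenced by the {$ACTIVEMQ.PORT} macro can be enabled in the broker's conf/activemq.xml; the credentials must then match the {$ACTIVEMQ.USER} and {$ACTIVEMQ.PASSWORD} macros, and the Zabbix Java gateway must be able to reach the broker host:
<!-- inside the <broker> element of conf/activemq.xml -->
<managementContext>
  <managementContext createConnector="true" connectorPort="1099"/>
</managementContext>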
Name | Description | Default |
---|---|---|
{$ACTIVEMQ.USER} | User for JMX |
admin |
{$ACTIVEMQ.PASSWORD} | Password for JMX |
activemq |
{$ACTIVEMQ.PORT} | Port for JMX |
1099 |
{$ACTIVEMQ.LLD.FILTER.BROKER.MATCHES} | Filter to include discovered brokers |
.* |
{$ACTIVEMQ.LLD.FILTER.BROKER.NOT_MATCHES} | Filter to exclude discovered brokers |
CHANGE IF NEEDED |
{$ACTIVEMQ.LLD.FILTER.DESTINATION.MATCHES} | Filter to include discovered destinations |
.* |
{$ACTIVEMQ.LLD.FILTER.DESTINATION.NOT_MATCHES} | Filter to exclude discovered destinations |
CHANGE IF NEEDED |
{$ACTIVEMQ.MSG.RATE.WARN.TIME} | The time for message enqueue/dequeue rate. Can be used with destination or broker name as context. |
15m |
{$ACTIVEMQ.MEM.MAX.WARN} | Memory threshold for AVERAGE trigger. Can be used with destination or broker name as context. |
75 |
{$ACTIVEMQ.MEM.MAX.HIGH} | Memory threshold for HIGH trigger. Can be used with destination or broker name as context. |
90 |
{$ACTIVEMQ.MEM.TIME} | Time during which the metric can be above the threshold. Can be used with destination or broker name as context. |
5m |
{$ACTIVEMQ.STORE.MAX.WARN} | Storage threshold for AVERAGE trigger. Can be used with broker name as context. |
75 |
{$ACTIVEMQ.STORE.TIME} | Time during which the metric can be above the threshold. Can be used with destination or broker name as context. |
5m |
{$ACTIVEMQ.STORE.MAX.HIGH} | Storage threshold for HIGH trigger. Can be used with broker name as context. |
90 |
{$ACTIVEMQ.TEMP.MAX.WARN} | Temp threshold for AVERAGE trigger. Can be used with broker name as context. |
75 |
{$ACTIVEMQ.TEMP.MAX.HIGH} | Temp threshold for HIGH trigger. Can be used with broker name as context. |
90 |
{$ACTIVEMQ.TEMP.TIME} | Time during which the metric can be above the threshold. Can be used with destination or broker name as context. |
5m |
{$ACTIVEMQ.DESTINATION.CONSUMERS.MIN.TIME} | Time during which there may be no consumers in destination. Can be used with destination name as context. |
10m |
{$ACTIVEMQ.DESTINATION.CONSUMERS.MIN.HIGH} | Minimum amount of consumers for destination. Can be used with destination name as context. |
1 |
{$ACTIVEMQ.DESTINATION.PRODUCERS.MIN.TIME} | Time during which there may be no producers on destination. Can be used with destination name as context. |
10m |
{$ACTIVEMQ.DESTINATION.PRODUCERS.MIN.HIGH} | Minimum amount of producers for destination. Can be used with destination name as context. |
1 |
{$ACTIVEMQ.BROKER.CONSUMERS.MIN.TIME} | Time during which there may be no consumers on the broker. Can be used with broker name as context. |
5m |
{$ACTIVEMQ.BROKER.CONSUMERS.MIN.HIGH} | Minimum amount of consumers for broker. Can be used with broker name as context. |
1 |
{$ACTIVEMQ.BROKER.PRODUCERS.MIN.TIME} | Time during which there may be no producers on broker. Can be used with broker name as context. |
5m |
{$ACTIVEMQ.BROKER.PRODUCERS.MIN.HIGH} | Minimum amount of producers for broker. Can be used with broker name as context. |
1 |
{$ACTIVEMQ.TOTAL.CONSUMERS.COUNT} | Attribute for TotalConsumerCount per destination. Used to suppress destination's triggers when the count of consumers on the broker is lower than threshold. |
TotalConsumerCount |
{$ACTIVEMQ.TOTAL.PRODUCERS.COUNT} | Attribute for TotalProducerCount per destination. Used to suppress destination's triggers when the count of producers on the broker is lower than threshold. |
TotalProducerCount |
{$ACTIVEMQ.QUEUE.TIME} | Time during which the QueueSize can be higher than threshold. Can be used with destination name as context. |
10m |
{$ACTIVEMQ.QUEUE.WARN} | Threshold for QueueSize. Can be used with destination name as context. |
100 |
{$ACTIVEMQ.QUEUE.ENABLED} | Use this to disable alerting for specific destination. 1 = enabled, 0 = disabled. Can be used with destination name as context. |
1 |
{$ACTIVEMQ.EXPIRED.WARN} | Threshold for expired messages count. Can be used with destination name as context. |
0 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Brokers discovery | Discovery of brokers |
JMX agent | jmx.discovery[beans,"org.apache.activemq:type=Broker,brokerName=*"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
Broker {#JMXBROKERNAME}: Version | The version of the broker. |
JMX agent | jmx[{#JMXOBJ},BrokerVersion] Preprocessing
|
Broker {#JMXBROKERNAME}: Uptime | The uptime of the broker. |
JMX agent | jmx[{#JMXOBJ},UptimeMillis] Preprocessing
|
Broker {#JMXBROKERNAME}: Memory limit | Memory limit, in bytes, used for holding undelivered messages before paging to temporary storage. |
JMX agent | jmx[{#JMXOBJ},MemoryLimit] Preprocessing
|
Broker {#JMXBROKERNAME}: Memory usage in percents | Percent of memory limit used. |
JMX agent | jmx[{#JMXOBJ}, MemoryPercentUsage] |
Broker {#JMXBROKERNAME}: Storage limit | Disk limit, in bytes, used for persistent messages before producers are blocked. |
JMX agent | jmx[{#JMXOBJ},StoreLimit] Preprocessing
|
Broker {#JMXBROKERNAME}: Storage usage in percents | Percent of store limit used. |
JMX agent | jmx[{#JMXOBJ},StorePercentUsage] |
Broker {#JMXBROKERNAME}: Temp limit | Disk limit, in bytes, used for non-persistent messages and temporary data before producers are blocked. |
JMX agent | jmx[{#JMXOBJ},TempLimit] Preprocessing
|
Broker {#JMXBROKERNAME}: Temp usage in percents | Percent of temp limit used. |
JMX agent | jmx[{#JMXOBJ},TempPercentUsage] |
Broker {#JMXBROKERNAME}: Messages enqueue rate | Rate of messages that have been sent to the broker. |
JMX agent | jmx[{#JMXOBJ},TotalEnqueueCount] Preprocessing
|
Broker {#JMXBROKERNAME}: Messages dequeue rate | Rate of messages that have been delivered by the broker and acknowledged by consumers. |
JMX agent | jmx[{#JMXOBJ},TotalDequeueCount] Preprocessing
|
Broker {#JMXBROKERNAME}: Consumers count total | Number of consumers attached to this broker. |
JMX agent | jmx[{#JMXOBJ},TotalConsumerCount] |
Broker {#JMXBROKERNAME}: Producers count total | Number of producers attached to this broker. |
JMX agent | jmx[{#JMXOBJ},TotalProducerCount] |
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Broker {#JMXBROKERNAME}: Version has been changed | The Broker {#JMXBROKERNAME} version has changed. Acknowledge to close the problem manually. |
last(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},BrokerVersion],#1)<>last(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},BrokerVersion],#2) and length(last(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},BrokerVersion]))>0 |Info |
Manual close: Yes | |
Broker {#JMXBROKERNAME}: Broker has been restarted | Uptime is less than 10 minutes. |
last(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},UptimeMillis])<10m |Info |
Manual close: Yes | |
Broker {#JMXBROKERNAME}: Memory usage is too high | min(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ}, MemoryPercentUsage],{$ACTIVEMQ.MEM.TIME:"{#JMXBROKERNAME}"})>{$ACTIVEMQ.MEM.MAX.WARN:"{#JMXBROKERNAME}"} |Average |
Depends on:
|
||
Broker {#JMXBROKERNAME}: Memory usage is too high | min(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ}, MemoryPercentUsage],{$ACTIVEMQ.MEM.TIME:"{#JMXBROKERNAME}"})>{$ACTIVEMQ.MEM.MAX.HIGH:"{#JMXBROKERNAME}"} |High |
|||
Broker {#JMXBROKERNAME}: Storage usage is too high | min(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},StorePercentUsage],{$ACTIVEMQ.STORE.TIME:"{#JMXBROKERNAME}"})>{$ACTIVEMQ.STORE.MAX.WARN:"{#JMXBROKERNAME}"} |Average |
Depends on:
|
||
Broker {#JMXBROKERNAME}: Storage usage is too high | min(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},StorePercentUsage],{$ACTIVEMQ.STORE.TIME:"{#JMXBROKERNAME}"})>{$ACTIVEMQ.STORE.MAX.HIGH:"{#JMXBROKERNAME}"} |High |
|||
Broker {#JMXBROKERNAME}: Temp usage is too high | min(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},TempPercentUsage],{$ACTIVEMQ.TEMP.TIME:"{#JMXBROKERNAME}"})>{$ACTIVEMQ.TEMP.MAX.WARN} |Average |
Depends on:
|
||
Broker {#JMXBROKERNAME}: Temp usage is too high | min(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},TempPercentUsage],{$ACTIVEMQ.TEMP.TIME:"{#JMXBROKERNAME}"})>{$ACTIVEMQ.TEMP.MAX.HIGH} |High |
|||
Broker {#JMXBROKERNAME}: Message enqueue rate is higher than dequeue rate | Enqueue rate is higher than dequeue rate. It may indicate performance problems. |
avg(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},TotalEnqueueCount],{$ACTIVEMQ.MSG.RATE.WARN.TIME:"{#JMXBROKERNAME}"})>avg(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},TotalDequeueCount],{$ACTIVEMQ.MSG.RATE.WARN.TIME:"{#JMXBROKERNAME}"}) |Average |
||
Broker {#JMXBROKERNAME}: Consumers count is too low | max(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},TotalConsumerCount],{$ACTIVEMQ.BROKER.CONSUMERS.MIN.TIME:"{#JMXBROKERNAME}"})<{$ACTIVEMQ.BROKER.CONSUMERS.MIN.HIGH:"{#JMXBROKERNAME}"} |High |
|||
Broker {#JMXBROKERNAME}: Producers count is too low | max(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},TotalProducerCount],{$ACTIVEMQ.BROKER.PRODUCERS.MIN.TIME:"{#JMXBROKERNAME}"})<{$ACTIVEMQ.BROKER.PRODUCERS.MIN.HIGH:"{#JMXBROKERNAME}"} |High |
Name | Description | Type | Key and additional info |
---|---|---|---|
Destinations discovery | Discovery of destinations |
JMX agent | jmx.discovery[beans,"org.apache.activemq:type=Broker,brokerName=*,destinationType=*,destinationName=*"] |
Name | Description | Type | Key and additional info |
---|---|---|---|
{#JMXBROKERNAME}: {#JMXDESTINATIONTYPE} {#JMXDESTINATIONNAME}: Consumers count | Number of consumers attached to this destination. |
JMX agent | jmx[{#JMXOBJ},ConsumerCount] |
{#JMXBROKERNAME}: {#JMXDESTINATIONTYPE} {#JMXDESTINATIONNAME}: Consumers count total on {#JMXBROKERNAME} | Number of consumers attached to the broker of this destination. Used to suppress destination's triggers when the count of consumers on the broker is lower than threshold. |
JMX agent | jmx["org.apache.activemq:type=Broker,brokerName={#JMXBROKERNAME}",{$ACTIVEMQ.TOTAL.CONSUMERS.COUNT: "{#JMXDESTINATIONNAME}"}] Preprocessing
|
{#JMXBROKERNAME}: {#JMXDESTINATIONTYPE} {#JMXDESTINATIONNAME}: Producers count | Number of producers attached to this destination. |
JMX agent | jmx[{#JMXOBJ},ProducerCount] |
{#JMXBROKERNAME}: {#JMXDESTINATIONTYPE} {#JMXDESTINATIONNAME}: Producers count total on {#JMXBROKERNAME} | Number of producers attached to the broker of this destination. Used to suppress destination's triggers when the count of producers on the broker is lower than threshold. |
JMX agent | jmx["org.apache.activemq:type=Broker,brokerName={#JMXBROKERNAME}",{$ACTIVEMQ.TOTAL.PRODUCERS.COUNT: "{#JMXDESTINATIONNAME}"}] Preprocessing
|
{#JMXBROKERNAME}: {#JMXDESTINATIONTYPE} {#JMXDESTINATIONNAME}: Memory usage in percents | The percentage of the memory limit used. |
JMX agent | jmx[{#JMXOBJ},MemoryPercentUsage] |
{#JMXBROKERNAME}: {#JMXDESTINATIONTYPE} {#JMXDESTINATIONNAME}: Messages enqueue rate | Rate of messages that have been sent to the destination. |
JMX agent | jmx[{#JMXOBJ},EnqueueCount] Preprocessing
|
{#JMXBROKERNAME}: {#JMXDESTINATIONTYPE} {#JMXDESTINATIONNAME}: Messages dequeue rate | Rate of messages that have been acknowledged (and removed) from the destination. |
JMX agent | jmx[{#JMXOBJ},DequeueCount] Preprocessing
|
{#JMXBROKERNAME}: {#JMXDESTINATIONTYPE} {#JMXDESTINATIONNAME}: Queue size | Number of messages on this destination, including any that have been dispatched but not acknowledged. |
JMX agent | jmx[{#JMXOBJ},QueueSize] |
{#JMXBROKERNAME}: {#JMXDESTINATIONTYPE} {#JMXDESTINATIONNAME}: Expired messages count | Number of messages that have been expired. |
JMX agent | jmx[{#JMXOBJ},ExpiredCount] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
{#JMXBROKERNAME}: {#JMXDESTINATIONTYPE} {#JMXDESTINATIONNAME}: Consumers count is too low | max(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},ConsumerCount],{$ACTIVEMQ.DESTINATION.CONSUMERS.MIN.TIME:"{#JMXDESTINATIONNAME}"})<{$ACTIVEMQ.DESTINATION.CONSUMERS.MIN.HIGH:"{#JMXDESTINATIONNAME}"} and last(/Apache ActiveMQ by JMX/jmx["org.apache.activemq:type=Broker,brokerName={#JMXBROKERNAME}",{$ACTIVEMQ.TOTAL.CONSUMERS.COUNT: "{#JMXDESTINATIONNAME}"}])>{$ACTIVEMQ.BROKER.CONSUMERS.MIN.HIGH:"{#JMXBROKERNAME}"} |Average |
Manual close: Yes | ||
{#JMXBROKERNAME}: {#JMXDESTINATIONTYPE} {#JMXDESTINATIONNAME}: Producers count is too low | max(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},ProducerCount],{$ACTIVEMQ.DESTINATION.PRODUCERS.MIN.TIME:"{#JMXDESTINATIONNAME}"})<{$ACTIVEMQ.DESTINATION.PRODUCERS.MIN.HIGH:"{#JMXDESTINATIONNAME}"} and last(/Apache ActiveMQ by JMX/jmx["org.apache.activemq:type=Broker,brokerName={#JMXBROKERNAME}",{$ACTIVEMQ.TOTAL.PRODUCERS.COUNT: "{#JMXDESTINATIONNAME}"}])>{$ACTIVEMQ.BROKER.PRODUCERS.MIN.HIGH:"{#JMXBROKERNAME}"} |Average |
Manual close: Yes | ||
{#JMXBROKERNAME}: {#JMXDESTINATIONTYPE} {#JMXDESTINATIONNAME}: Memory usage is too high | last(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},MemoryPercentUsage])>{$ACTIVEMQ.MEM.MAX.WARN:"{#JMXDESTINATIONNAME}"} |Average |
|||
{#JMXBROKERNAME}: {#JMXDESTINATIONTYPE} {#JMXDESTINATIONNAME}: Memory usage is too high | last(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},MemoryPercentUsage])>{$ACTIVEMQ.MEM.MAX.HIGH:"{#JMXDESTINATIONNAME}"} |High |
|||
{#JMXBROKERNAME}: {#JMXDESTINATIONTYPE} {#JMXDESTINATIONNAME}: Message enqueue rate is higher than dequeue rate | Enqueue rate is higher than dequeue rate. It may indicate performance problems. |
avg(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},EnqueueCount],{$ACTIVEMQ.MSG.RATE.WARN.TIME:"{#JMXDESTINATIONNAME}"})>avg(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},DequeueCount],{$ACTIVEMQ.MSG.RATE.WARN.TIME:"{#JMXDESTINATIONNAME}"}) |Average |
||
{#JMXBROKERNAME}: {#JMXDESTINATIONTYPE} {#JMXDESTINATIONNAME}: Queue size is high | Queue size is higher than threshold. It may indicate performance problems. |
min(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},QueueSize],{$ACTIVEMQ.QUEUE.TIME:"{#JMXDESTINATIONNAME}"})>{$ACTIVEMQ.QUEUE.WARN:"{#JMXDESTINATIONNAME}"} and {$ACTIVEMQ.QUEUE.ENABLED:"{#JMXDESTINATIONNAME}"}=1 |Average |
||
{#JMXBROKERNAME}: {#JMXDESTINATIONTYPE} {#JMXDESTINATIONNAME}: Expired messages count is high | This metric represents the number of messages that expired before they could be delivered. If you expect all messages to be delivered and acknowledged within a certain amount of time, you can set an expiration for each message, and investigate if your ExpiredCount metric rises above zero. |
last(/Apache ActiveMQ by JMX/jmx[{#JMXOBJ},ExpiredCount])>{$ACTIVEMQ.EXPIRED.WARN:"{#JMXDESTINATIONNAME}"} |Average |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums
This template is designed for the effortless deployment of Acronis Cyber Protect Cloud monitoring by Zabbix via HTTP and doesn't require any external scripts.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
This is a master template that needs to be assigned to a host; it will automatically create an MSP host prototype, which will monitor Acronis Cyber Protect Cloud metrics.
Before using this template, you need to create a new MSP-level API client for Zabbix to use. To do that, sign in to your Acronis Cyber Protect Cloud web interface, navigate to Settings
-> API clients
and create a new API client.
You will be shown credentials for this API client. These credentials need to be entered in the following user macros of this template:
{$ACRONIS.CPC.AUTH.CLIENT.ID} - enter the Client ID here;
{$ACRONIS.CPC.AUTH.SECRET} - enter the Secret here;
{$ACRONIS.CPC.DATACENTER.URL} - enter the Data center URL here.
This is all the configuration needed for this integration.
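To check the credentials before assigning the template, you can optionally request a token manually. This is only a hypothetical sketch: it assumes the standard Acronis OAuth2 client-credentials token endpoint under the Account Management API sub-path ({$ACRONIS.CPC.PATH.ACCOUNT.MANAGEMENT}, /api/2 by default); consult the Acronis API documentation for the exact endpoint of your data center, and replace the URL, client ID, and secret with your own values:
# Hypothetical example of requesting an access token with the API client credentials
curl -u "<CLIENT_ID>:<SECRET>" -d "grant_type=client_credentials" "https://eu2-cloud.acronis.com/api/2/idp/token"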
Name | Description | Default |
---|---|---|
{$ACRONIS.CPC.DATACENTER.URL} | Acronis Cyber Protect Cloud datacenter URL, e.g., https://eu2-cloud.acronis.com. |
|
{$ACRONIS.CPC.AUTH.INTERVAL} | API token regeneration interval, in minutes. By default, Acronis Cyber Protect Cloud tokens expire after 2 hours. |
110m |
{$ACRONIS.CPC.HTTP.PROXY} | Sets the HTTP proxy for the authorization item. Host prototypes will also use this value for HTTP proxy. If this parameter is empty, then no proxy is used. |
|
{$ACRONIS.CPC.AUTH.CLIENT.ID} | Client ID for API user access. |
|
{$ACRONIS.CPC.AUTH.SECRET} | Secret for API user access. |
|
{$ACRONIS.CPC.PATH.ACCOUNT.MANAGEMENT} | Sub-path for the Account Management API. |
/api/2 |
Name | Description | Type | Key and additional info |
---|---|---|---|
Acronis CPC: Get access token | Authorizes API user and receives access token. |
HTTP agent | acronis.cpc.accountmanager.gettoken Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Acronis CPC: MSP Discovery | Discovers MSP and creates host prototype based on that. |
Dependent item | acronis.cpc.lld.msp_discovery |
This template is designed for the effortless deployment of Acronis Cyber Protect Cloud MSP monitoring by Zabbix via HTTP and doesn't require any external scripts.
Zabbix version: 6.4 and higher.
This template has been tested on:
Zabbix should be configured according to the instructions in the Templates out of the box section.
This template is not meant to be used independently. A host with the Acronis Cyber Protect Cloud by HTTP
template will request API token and automatically create a host prototype with this template assigned to it.
If needed, you can specify an HTTP proxy for the template to use by changing the value of {$ACRONIS.CPC.HTTP.PROXY}
user macro.
Device discovery trigger prototypes that check for scheduled services which have failed to run have the following trigger time offset user macros:
{$ACRONIS.CPC.OFFSET.SCHEDULED.ANTIMALWARE}
{$ACRONIS.CPC.OFFSET.SCHEDULED.BACKUP}
{$ACRONIS.CPC.OFFSET.SCHEDULED.VULNERABILITY}
{$ACRONIS.CPC.OFFSET.SCHEDULED.PATCH}
Using these macros, their respective triggers can be offset in both directions. For example, if you wish to make
sure that the trigger fires only when the current time is at least 3 minutes past the next scheduled antimalware
scan, then set the value of the {$ACRONIS.CPC.OFFSET.SCHEDULED.ANTIMALWARE}
user macro to -180.
This is the default behaviour.
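For illustration, the scheduled-run trigger prototypes further below have the form
last(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.device.protection.scan.next[{#NAME}]) < (now() + {$ACRONIS.CPC.OFFSET.SCHEDULED.ANTIMALWARE})
so with the default offset of -180 the problem is raised only once the scheduled start time is more than 180 seconds (3 minutes) in the past, while a positive offset would allow the problem to be raised up to that many seconds before the scheduled time.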
Name | Description | Default |
---|---|---|
{$ACRONIS.CPC.DATACENTER.URL} | Acronis Cyber Protect Cloud datacenter URL, e.g., https://eu2-cloud.acronis.com. |
|
{$ACRONIS.CPC.HTTP.PROXY} | Sets the HTTP proxy for the authorization item. Host prototypes will also use this value for HTTP proxy. If this parameter is empty, then no proxy is used. |
|
{$ACRONIS.CPC.CYBERFIT.WARN} | CyberFit score threshold for "warning" severity trigger. |
669 |
{$ACRONIS.CPC.CYBERFIT.HIGH} | CyberFit score threshold for "high" severity trigger. |
579 |
{$ACRONIS.CPC.OFFSET.SCHEDULED.ANTIMALWARE} | Offset time in seconds for scheduled antimalware scan trigger check. |
-180 |
{$ACRONIS.CPC.OFFSET.SCHEDULED.BACKUP} | Offset time in seconds for scheduled backup run trigger check. |
-180 |
{$ACRONIS.CPC.OFFSET.SCHEDULED.VULNERABILITY} | Offset time in seconds for scheduled vulnerability assessment run trigger check. |
-180 |
{$ACRONIS.CPC.OFFSET.SCHEDULED.PATCH} | Offset time in seconds for scheduled patch management run trigger check. |
-180 |
{$ACRONIS.CPC.DEVICE.RESOURCE.TYPE} | Comma separated list of resource types for devices retrieval. |
resource.machine |
{$ACRONIS.CPC.ALERT.DISCOVERY.CATEGORY.MATCHES} | Sets the alert category regex filter to use in alert discovery for including. |
.* |
{$ACRONIS.CPC.ALERT.DISCOVERY.CATEGORY.NOT_MATCHES} | Sets the alert category regex filter to use in alert discovery for excluding. |
CHANGE_IF_NEEDED |
{$ACRONIS.CPC.ALERT.DISCOVERY.SEVERITY.MATCHES} | Sets the alert severity regex filter to use in alert discovery for including. |
.* |
{$ACRONIS.CPC.ALERT.DISCOVERY.SEVERITY.NOT_MATCHES} | Sets the alert severity regex filter to use in alert discovery for excluding. |
CHANGE_IF_NEEDED |
{$ACRONIS.CPC.ALERT.DISCOVERY.RESOURCE.MATCHES} | Sets the alert resource name regex filter to use in alert discovery for including. |
.* |
{$ACRONIS.CPC.ALERT.DISCOVERY.RESOURCE.NOT_MATCHES} | Sets the alert resource name regex filter to use in alert discovery for excluding. |
CHANGE_IF_NEEDED |
{$ACRONIS.CPC.CUSTOMER.DISCOVERY.KIND.MATCHES} | Sets the customer kind regex filter to use in customer discovery for including. |
customer |
{$ACRONIS.CPC.CUSTOMER.DISCOVERY.NAME.MATCHES} | Sets the customer name regex filter to use in customer discovery for including. |
.* |
{$ACRONIS.CPC.CUSTOMER.DISCOVERY.NAME.NOT_MATCHES} | Sets the customer name regex filter to use in customer discovery for excluding. |
CHANGE_IF_NEEDED |
{$ACRONIS.CPC.DEVICE.DISCOVERY.TENANT.MATCHES} | Sets the tenant name regex filter to use in device discovery for including. |
.* |
{$ACRONIS.CPC.DEVICE.DISCOVERY.TENANT.NOT_MATCHES} | Sets the tenant name regex filter to use in device discovery for excluding. |
CHANGE_IF_NEEDED |
{$ACRONIS.CPC.ACCESS_TOKEN} | API access token. |
|
{$ACRONIS.CPC.PATH.ACCOUNT.MANAGEMENT} | Sub-path for the Account Management API. |
/api/2 |
{$ACRONIS.CPC.PATH.RESOURCE.MANAGEMENT} | Sub-path for the Resource Management API. |
/api/resource_management/v4 |
{$ACRONIS.CPC.PATH.ALERTS} | Sub-path for the Alerts API. |
/api/alert_manager/v1 |
{$ACRONIS.CPC.PATH.AGENTS} | Sub-path for the Agents API. |
/api/agent_manager/v2 |
{$ACRONIS.CPC.MSP.TENANT.UUID} | UUID for MSP. |
Name | Description | Type | Key and additional info |
---|---|---|---|
Acronis CPC: Register integration | Registers integration on Acronis services. |
Script | acronis.cpc.register.integration |
Acronis CPC: Get alerts | Fetches all alerts. |
HTTP agent | acronis.cpc.alerts.get Preprocessing
|
Acronis CPC: Get customers | Fetches all customers. |
HTTP agent | acronis.cpc.customers.get Preprocessing
|
Acronis CPC: Get devices | Fetches all devices. |
HTTP agent | acronis.cpc.devices.get Preprocessing
|
Acronis CPC: Alerts with "ok" severity | Gets count of alerts with "ok" severity. |
Dependent item | acronis.cpc.alerts.severity.ok Preprocessing
|
Acronis CPC: Alerts with "warning" severity | Gets count of alerts with "warning" severity. |
Dependent item | acronis.cpc.alerts.severity.warn Preprocessing
|
Acronis CPC: Alerts with "error" severity | Gets count of alerts with "error" severity. |
Dependent item | acronis.cpc.alerts.severity.err Preprocessing
|
Acronis CPC: Alerts with "critical" severity | Gets count of alerts with "critical" severity. |
Dependent item | acronis.cpc.alerts.severity.crit Preprocessing
|
Acronis CPC: Alerts with "information" severity | Gets count of alerts with "information" severity. |
Dependent item | acronis.cpc.alerts.severity.info Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Acronis CPC: Alerts discovery | Discovers alerts. |
Dependent item | acronis.cpc.alerts.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Alert [{#TYPE}]:[{#ALERT_ID}]: Alert severity | Severity for the alert. |
Dependent item | acronis.cpc.alert.severity[{#ALERT_ID}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Alert [{#TYPE}]:[{#ALERT_ID}]: Alert has "critical" severity | Alert has "critical" severity. |
last(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.alert.severity[{#ALERT_ID}])=3 |High |
Manual close: Yes | |
Alert [{#TYPE}]:[{#ALERT_ID}]: Alert has "error" severity | Alert has "error" severity. |
last(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.alert.severity[{#ALERT_ID}])=2 |Average |
Manual close: Yes Depends on:
|
|
Alert [{#TYPE}]:[{#ALERT_ID}]: Alert has "warning" severity | Alert has "warning" severity. |
last(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.alert.severity[{#ALERT_ID}])=1 |Warning |
Manual close: Yes Depends on:
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Acronis CPC: Customer discovery | Discovers customers. |
Dependent item | acronis.cpc.customer.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Customer [{#NAME}]: Enabled status | Enabled status for customer (true or false). |
Dependent item | acronis.cpc.customer.status[{#NAME}] Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Acronis CPC: Device discovery | Discovers devices. |
Dependent item | acronis.cpc.device.discovery Preprocessing
|
Name | Description | Type | Key and additional info |
---|---|---|---|
Device [{#NAME}]:[{#ID}]: Raw data resources status | Gets statuses for device resources. |
HTTP agent | acronis.cpc.device.res.status.raw[{#NAME}] Preprocessing
|
Device [{#NAME}]:[{#ID}]: CyberFit score | Acronis "CyberFit" score for the device. Value of "-1" is assigned if "CyberFit" could not be found for device. |
Dependent item | acronis.cpc.device.cyberfit[{#NAME}] Preprocessing
|
Device [{#NAME}]:[{#ID}]: Agent version | Agent version for the device. |
Dependent item | acronis.cpc.device.agent.version[{#NAME}] Preprocessing
|
Device [{#NAME}]:[{#ID}]: Agent enabled | Agent status (enabled or disabled) for the device. |
Dependent item | acronis.cpc.device.agent.enabled[{#NAME}] Preprocessing
|
Device [{#NAME}]:[{#ID}]: Agent online | Agent reachability for the device. |
Dependent item | acronis.cpc.device.agent.online[{#NAME}] Preprocessing
|
Device [{#NAME}]:[{#ID}]: Protection status | Protection status for device. |
Dependent item | acronis.cpc.device.protection.status[{#NAME}] Preprocessing
|
Device [{#NAME}]:[{#ID}]: Protection plan name | Protection plan name for device. |
Dependent item | acronis.cpc.device.protection.name[{#NAME}] Preprocessing
|
Device [{#NAME}]:[{#ID}]: Previous successful antimalware protection scan | Previous successful antimalware protection scan for device. |
Dependent item | acronis.cpc.device.protection.scan.prev.ok[{#NAME}] Preprocessing
|
Device [{#NAME}]:[{#ID}]: Previous antimalware protection scan | Previous antimalware protection scan for device. |
Dependent item | acronis.cpc.device.protection.scan.prev[{#NAME}] Preprocessing
|
Device [{#NAME}]:[{#ID}]: Next antimalware protection scan | Next scheduled antimalware protection scan for device. |
Dependent item | acronis.cpc.device.protection.scan.next[{#NAME}] Preprocessing
|
Device [{#NAME}]:[{#ID}]: Previous successful machine backup run | Previous successful machine backup run for device. |
Dependent item | acronis.cpc.device.backup.prev.ok[{#NAME}] Preprocessing
|
Device [{#NAME}]:[{#ID}]: Previous machine backup run | Previous machine backup run for device. |
Dependent item | acronis.cpc.device.backup.prev[{#NAME}] Preprocessing
|
Device [{#NAME}]:[{#ID}]: Next machine backup run | Next scheduled machine backup run for device. |
Dependent item | acronis.cpc.device.backup.next[{#NAME}] Preprocessing
|
Device [{#NAME}]:[{#ID}]: Previous successful vulnerability assessment | Previous successful vulnerability assessment for device. |
Dependent item | acronis.cpc.device.vuln.prev.ok[{#NAME}] Preprocessing
|
Device [{#NAME}]:[{#ID}]: Previous vulnerability assessment | Previous vulnerability assessment for device. |
Dependent item | acronis.cpc.device.vuln.prev[{#NAME}] Preprocessing
|
Device [{#NAME}]:[{#ID}]: Next vulnerability assessment | Next scheduled vulnerability assessment for device. |
Dependent item | acronis.cpc.device.vuln.next[{#NAME}] Preprocessing
|
Device [{#NAME}]:[{#ID}]: Previous successful patch management run | Previous successful patch management run for device. |
Dependent item | acronis.cpc.device.patch.prev.ok[{#NAME}] Preprocessing
|
Device [{#NAME}]:[{#ID}]: Previous patch management run | Previous patch management run for device. |
Dependent item | acronis.cpc.device.patch.prev[{#NAME}] Preprocessing
|
Device [{#NAME}]:[{#ID}]: Next patch management run | Next scheduled patch management run for device. |
Dependent item | acronis.cpc.device.patch.next[{#NAME}] Preprocessing
|
Name | Description | Expression | Severity | Dependencies and additional info |
---|---|---|---|---|
Device [{#NAME}]:[{#ID}]: CyberFit score critical | CyberFit score for this device is critical for at least 3 minutes. |
min(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.device.cyberfit[{#NAME}],3m) < {$ACRONIS.CPC.CYBERFIT.HIGH} and max(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.device.cyberfit[{#NAME}],3m) <> -1 |High |
Manual close: Yes | |
Device [{#NAME}]:[{#ID}]: CyberFit score low | CyberFit score for this device is low for at least 3 minutes. |
min(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.device.cyberfit[{#NAME}],3m) < {$ACRONIS.CPC.CYBERFIT.WARN} and max(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.device.cyberfit[{#NAME}],3m) <> -1 |Warning |
Manual close: Yes Depends on:
|
|
Device [{#NAME}]:[{#ID}]: Agent disabled | Agent for this device is disabled for at least 3 minutes. |
max(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.device.agent.enabled[{#NAME}],3m) < 1 |Info |
Manual close: Yes | |
Device [{#NAME}]:[{#ID}]: Protection status "error" | Device has "error" protection status. |
last(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.device.protection.status[{#NAME}])="error" |Average |
Manual close: Yes | |
Device [{#NAME}]:[{#ID}]: Protection status "warning" | Device has "warning" protection status. |
last(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.device.protection.status[{#NAME}])="warning" |Warning |
Manual close: Yes Depends on:
|
|
Device [{#NAME}]:[{#ID}]: Previous protection scan not successful | The previous antimalware protection scan did not run successfully. |
last(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.device.protection.scan.prev.ok[{#NAME}])<>last(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.device.protection.scan.prev[{#NAME}]) |Average |
Manual close: Yes | |
Device [{#NAME}]:[{#ID}]: Scheduled antimalware scan failed to run | Scheduled antimalware scan failed to run. |
last(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.device.protection.scan.next[{#NAME}]) < (now() + {$ACRONIS.CPC.OFFSET.SCHEDULED.ANTIMALWARE}) |Warning |
Manual close: Yes | |
Device [{#NAME}]:[{#ID}]: Previous machine backup run not successful | Previous machine backup did not run successfully. |
max(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.device.backup.prev.ok[{#NAME}],1m)<>max(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.device.backup.prev[{#NAME}],1m) |Average |
Manual close: Yes | |
Device [{#NAME}]:[{#ID}]: Scheduled machine backup failed to run | Scheduled machine backup failed to run. |
last(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.device.backup.next[{#NAME}]) < (now() + {$ACRONIS.CPC.OFFSET.SCHEDULED.BACKUP}) |Warning |
Manual close: Yes | |
Device [{#NAME}]:[{#ID}]: Previous vulnerability assessment not successful | Previous vulnerability assessment did not run successfully. |
max(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.device.vuln.prev.ok[{#NAME}],1m)<>max(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.device.vuln.prev[{#NAME}],1m) |Average |
Manual close: Yes | |
Device [{#NAME}]:[{#ID}]: Scheduled vulnerability assessment failed to run | Scheduled vulnerability assessment failed to run. |
last(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.device.vuln.next[{#NAME}]) < (now() + {$ACRONIS.CPC.OFFSET.SCHEDULED.VULNERABILITY}) |Warning |
Manual close: Yes | |
Device [{#NAME}]:[{#ID}]: Previous patch management run not successful | Previous patch management run did not run successfully. |
max(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.device.patch.prev.ok[{#NAME}],1m)<>max(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.device.patch.prev[{#NAME}],1m) |Average |
Manual close: Yes | |
Device [{#NAME}]:[{#ID}]: Scheduled patch management failed to run | Scheduled patch management failed to run. |
last(/Acronis Cyber Protect Cloud MSP by HTTP/acronis.cpc.device.patch.next[{#NAME}]) < (now() + {$ACRONIS.CPC.OFFSET.SCHEDULED.PATCH}) |Warning |
Manual close: Yes |
Please report any issues with the template at https://support.zabbix.com
You can also provide feedback, discuss the template, or ask for help at ZABBIX forums